A Python-based tool to aggregate and track cloud usage costs from AWS, GCP, and Azure. Collects daily service-level cost data and stores it in PostgreSQL for analysis and reporting.
- Multi-Cloud Support: Collects costs from AWS, GCP, and Azure
- Service-Level Granularity: Tracks costs broken down by individual services
- Daily Cost Tracking: Maintains daily cost history for trend analysis
- Automatic Upserts: Handles cost updates as cloud providers finalize billing data
- T-2 Day Lookback: Accounts for cloud billing data materialization delays (costs from 2 days ago)
- Historical Backfill: Supports backfilling up to 90 days of historical data
- Normalized Pricing: All costs stored in USD for consistency
- Credit Handling: Excludes credits and refunds to show actual usage costs
- Parallel Collection: Collects from all providers simultaneously for speed
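As a rough illustration of the T-2 lookback, the default collection window can be computed like this (a minimal sketch; the helper name and location are assumptions, and the real date_utils.py may differ):

```python
from datetime import date, timedelta

def default_collection_date(lookback_days: int = 2) -> date:
    """Return the usage date to collect, lagging today by `lookback_days`
    so that provider billing data has had time to materialize."""
    return date.today() - timedelta(days=lookback_days)

# Example: if today is 2024-11-10, costs are collected for 2024-11-08.
print(default_collection_date())
```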
cloud_cost_aggregator/
├── collectors/ # Cloud provider collectors
│ ├── base_collector.py # Base class for all collectors
│ ├── aws_collector.py # AWS Cost Explorer integration
│ ├── gcp_collector.py # GCP BigQuery billing export
│ └── azure_collector.py # Azure Sponsorship portal API
├── database/ # Database layer
│ ├── connection.py # Database connection management
│ ├── models.py # SQLAlchemy models
│ └── schema.sql # PostgreSQL schema
├── utils/ # Utility functions
│ ├── logger.py # Logging configuration
│ └── date_utils.py # Date range utilities
├── aggregator.py # Main aggregation orchestrator
├── config.py # Configuration management
├── main.py # CLI entry point
└── requirements.txt # Python dependencies
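To give a sense of how the collectors plug together, here is a rough sketch of the kind of interface base_collector.py might define (names and signatures are assumptions, not the actual code):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date

@dataclass
class CostRecord:
    # Hypothetical record shape matching the cloud_costs table
    cloud_provider: str   # 'aws', 'gcp', or 'azure'
    service_name: str
    cost_usd: float
    usage_date: date

class BaseCollector(ABC):
    """Assumed shared contract for the provider-specific collectors."""

    @abstractmethod
    def test_connection(self) -> bool:
        """Verify credentials and API reachability."""

    @abstractmethod
    def collect(self, start_date: date, end_date: date) -> list[CostRecord]:
        """Return daily, service-level costs in USD for the date range."""
```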
- Python 3.9+
- PostgreSQL 12+
- Cloud provider accounts with appropriate permissions:
- AWS: Cost Explorer API access
- GCP: BigQuery billing export enabled
- Azure: Valid sponsorship or subscription
git clone https://github.com/yourusername/cloud-cost-aggregator.git
cd cloud-cost-aggregator
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Create the database:
# Using createdb
createdb cloud_costs
# Or using psql
psql -U postgres -c "CREATE DATABASE cloud_costs;"
Initialize the schema:
# Using psql
psql -U postgres -d cloud_costs -f database/schema.sql
# Or using the CLI
python main.py --init-db
Copy the example environment file:
cp .env.example .env
Edit .env with your actual credentials (see Configuration section below).
Edit the .env file with your credentials:
# PostgreSQL Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=cloud_costs
DB_USER=postgres
DB_PASSWORD=your_password_here
# AWS Credentials
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_REGION=us-east-1
# GCP Credentials
GCP_BILLING_ACCOUNT_ID=XXXXXX-YYYYYY-ZZZZZZ
GCP_PROJECT_ID=your-project-id
GCP_CREDENTIALS_PATH=/path/to/service-account.json
GCP_BIGQUERY_DATASET=billing_export
# Azure Credentials
AZURE_TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_SECRET=your-secret-value-here
AZURE_SUBSCRIPTION_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# Azure Sponsorship Cookies (for Azure for Students/Startups)
AZURE_SPONSORSHIP_COOKIES=your_cookies_here
# Optional Configuration
LOG_LEVEL=INFO
LOOKBACK_DAYS=2
BACKFILL_DAYS=90
- Create an IAM user with Cost Explorer permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ce:GetCostAndUsage",
"ce:GetCostForecast"
],
"Resource": "*"
}
]
}
- Enable Cost Explorer in the AWS Console (if not already enabled)
- Create an access key and add the credentials to the .env file
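For reference, the kind of Cost Explorer call the AWS collector needs (daily costs grouped by service, with credits and refunds filtered out) looks roughly like this. This is a sketch using boto3; the actual implementation in aws_collector.py may differ:

```python
import boto3

ce = boto3.client("ce", region_name="us-east-1")

# Daily costs grouped by service, excluding credits and refunds
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-11-01", "End": "2024-11-02"},  # End is exclusive
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    Filter={"Not": {"Dimensions": {"Key": "RECORD_TYPE", "Values": ["Credit", "Refund"]}}},
)

for day in response["ResultsByTime"]:
    for group in day["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        print(day["TimePeriod"]["Start"], service, round(amount, 2))
```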
- Enable billing export to BigQuery:
# Create dataset for billing export
bq mk --dataset --location=US billing_export
Then in the GCP Console:
- Go to Billing → Billing export → BigQuery export
- Enable "Detailed usage cost" export
- Set the dataset to billing_export
- Create service account and grant permissions:
# Create service account
gcloud iam service-accounts create cloud-cost-reader \
--display-name="Cloud Cost Reader"
# Grant BigQuery permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.user"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/bigquery.dataViewer"
# Create and download key
gcloud iam service-accounts keys create gcp-credentials.json \
--iam-account=cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com
- Update .env with the path to gcp-credentials.json
Note: It can take up to 24 hours for billing data to appear in BigQuery after enabling export.
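For reference, a query against the billing export might look roughly like this. This is a sketch using google-cloud-bigquery; the export table name is derived from your billing account ID, and the real gcp_collector.py may differ:

```python
import datetime
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("/path/to/service-account.json")

# Standard export tables are named gcp_billing_export_v1_<billing account ID with dashes as underscores>
query = """
    SELECT
      service.description AS service_name,
      DATE(usage_start_time) AS usage_date,
      SUM(cost) AS cost_usd  -- raw usage cost; credits are not applied, matching the tool's credit handling
    FROM `your-project-id.billing_export.gcp_billing_export_v1_XXXXXX_YYYYYY_ZZZZZZ`
    WHERE DATE(usage_start_time) = @usage_date
    GROUP BY service_name, usage_date
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("usage_date", "DATE", datetime.date(2024, 11, 1))]
)
for row in client.query(query, job_config=job_config).result():
    print(row.usage_date, row.service_name, round(row.cost_usd, 2))
```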
# Create service principal
az ad sp create-for-rbac --name "cloud-cost-reader"
# Grant Cost Management Reader role
az role assignment create \
--assignee <client-id> \
--role "Cost Management Reader" \
--scope /subscriptions/<subscription-id>
Azure Sponsorship accounts don't support the Cost Management API, so we use a cookie-based approach:
- Log in to Azure Sponsorship Portal
- Open browser DevTools (F12) → Network tab
- Refresh the page
- Find any API request and copy the entire Cookie header value
- Add it to .env as AZURE_SPONSORSHIP_COOKIES
Note: Cookies expire periodically and need to be refreshed.
Before running cost collection, verify all cloud provider connections:
python main.py --test-connections
Test specific providers:
python main.py --test-connections --providers aws,gcp
Run daily collection with T-2 lookback (recommended for cron jobs):
python main.py
This collects costs from 2 days ago, accounting for billing data materialization delays.
Backfill 90 days of historical data:
python main.py --backfill
Custom backfill period:
python main.py --backfill --start-date 2024-10-01 --end-date 2024-11-01
Collect costs for a specific date range:
python main.py --start-date 2024-11-01 --end-date 2024-11-08
Collect from specific providers only:
python main.py --providers aws,azure
Create database tables:
python main.py --init-db
Add to crontab (crontab -e):
# Run daily at 2 AM
0 2 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1
Run twice daily to catch cost updates:
# Morning run at 2 AM
0 2 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1
# Evening run at 2 PM
0 14 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1
- Open Task Scheduler
- Create Basic Task
- Set trigger: Daily at 2:00 AM
- Action: Start a program
- Program: C:\path\to\venv\Scripts\python.exe
- Arguments: main.py
- Start in: C:\path\to\cloud-cost-aggregator
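The command-line flags shown in the usage examples above could be wired up roughly like this (a sketch using argparse; the actual main.py may organize this differently):

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Aggregate cloud costs from AWS, GCP, and Azure")
    parser.add_argument("--init-db", action="store_true", help="create database tables")
    parser.add_argument("--test-connections", action="store_true", help="verify cloud provider credentials")
    parser.add_argument("--backfill", action="store_true", help="backfill historical data (default 90 days)")
    parser.add_argument("--start-date", help="start date, YYYY-MM-DD")
    parser.add_argument("--end-date", help="end date, YYYY-MM-DD")
    parser.add_argument("--providers", help="comma-separated subset, e.g. aws,gcp,azure")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    providers = args.providers.split(",") if args.providers else ["aws", "gcp", "azure"]
    print(f"Collecting from: {providers}")
```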
CREATE TABLE cloud_costs (
id SERIAL PRIMARY KEY,
cloud_provider VARCHAR(50) NOT NULL,
service_name VARCHAR(255) NOT NULL,
cost_usd DECIMAL(12, 2) NOT NULL,
usage_date DATE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(cloud_provider, service_name, usage_date)
);
The UNIQUE constraint enables automatic upserts when costs are updated.
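A minimal sketch of the upsert that the UNIQUE constraint enables, shown here with psycopg2 (the real database layer uses SQLAlchemy models, so the actual code likely differs):

```python
import psycopg2

UPSERT_SQL = """
    INSERT INTO cloud_costs (cloud_provider, service_name, cost_usd, usage_date)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (cloud_provider, service_name, usage_date)
    DO UPDATE SET cost_usd = EXCLUDED.cost_usd,
                  updated_at = CURRENT_TIMESTAMP;
"""

with psycopg2.connect(dbname="cloud_costs", user="postgres", password="...", host="localhost") as conn:
    with conn.cursor() as cur:
        # Re-running collection for the same day overwrites the earlier figure
        cur.execute(UPSERT_SQL, ("aws", "EC2", 12.34, "2024-11-01"))
```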
| Column | Type | Description |
|---|---|---|
| id | SERIAL | Primary key |
| cloud_provider | VARCHAR(50) | 'aws', 'gcp', or 'azure' |
| service_name | VARCHAR(255) | Service name (e.g., 'EC2', 'S3') |
| cost_usd | DECIMAL(12, 2) | Cost in USD (2 decimal places) |
| usage_date | DATE | Date the cost occurred |
| created_at | TIMESTAMP | Record creation timestamp |
| updated_at | TIMESTAMP | Last update timestamp |
The schema includes helpful views for common queries:
- daily_cost_summary - Daily totals by provider
- service_cost_summary - Service-level aggregates
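For quick reporting, these views can be pulled straight into pandas (a sketch; adjust the connection string to your setup):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql+psycopg2://postgres:your_password_here@localhost:5432/cloud_costs")

# Daily totals by provider, as exposed by the daily_cost_summary view
df = pd.read_sql("SELECT * FROM daily_cost_summary", engine)
print(df.head())
```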
Total costs by provider (last 30 days):
SELECT
cloud_provider,
SUM(cost_usd) as total_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY cloud_provider
ORDER BY total_cost DESC;
Top 10 most expensive services:
SELECT
cloud_provider,
service_name,
SUM(cost_usd) as total_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY cloud_provider, service_name
ORDER BY total_cost DESC
LIMIT 10;
Daily cost trend:
SELECT
usage_date,
cloud_provider,
SUM(cost_usd) as daily_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY usage_date, cloud_provider
ORDER BY usage_date DESC, cloud_provider;
Monthly cost comparison:
SELECT
DATE_TRUNC('month', usage_date) as month,
cloud_provider,
SUM(cost_usd) as monthly_cost
FROM cloud_costs
GROUP BY month, cloud_provider
ORDER BY month DESC, cloud_provider;
The Azure collector normalizes service names for better grouping:
- Azure OpenAI: All GPT, ChatGPT, Davinci, Embedding, and Ada models
- Azure Speech-to-Text: All speech-to-text and STT services
- Azure Text-to-Speech: All text-to-speech, TTS, and neural voice services
All other service names are preserved as returned by the respective APIs.
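The grouping described above could be implemented with a simple keyword map, roughly like this (a sketch with illustrative keywords only; the actual rules in azure_collector.py may be more involved):

```python
# Keyword-based normalization of Azure service names (illustrative keywords only)
NORMALIZATION_RULES = [
    (("gpt", "chatgpt", "davinci", "embedding", "ada"), "Azure OpenAI"),
    (("speech-to-text", "speech to text", "stt"), "Azure Speech-to-Text"),
    (("text-to-speech", "text to speech", "tts", "neural voice"), "Azure Text-to-Speech"),
]

def normalize_service_name(raw_name: str) -> str:
    lowered = raw_name.lower()
    for keywords, normalized in NORMALIZATION_RULES:
        if any(keyword in lowered for keyword in keywords):
            return normalized
    return raw_name  # everything else is kept as returned by the API

print(normalize_service_name("Azure OpenAI GPT-4o Input Tokens"))  # -> Azure OpenAI
print(normalize_service_name("Standard Neural Voice Characters"))  # -> Azure Text-to-Speech
```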
Issue: AWS costs showing as $0 or negative values
Solution: The tool automatically excludes credits and refunds using the Filter parameter. This shows actual usage costs. AWS data can take 24-48 hours to fully materialize.
Issue: BigQuery billing export tables not found
Solution:
- Verify billing export is enabled in GCP Console
- Wait up to 24 hours for initial data to populate
- Check that the dataset name matches GCP_BIGQUERY_DATASET in .env
- Verify the service account has the bigquery.user and bigquery.dataViewer roles
Issue: Azure Sponsorship API returns authentication errors
Solution:
- Cookies have expired - refresh them from browser
- Log in to Azure Sponsorship portal
- Open DevTools (F12) → Network tab
- Find an API request and copy the Cookie header
- Update AZURE_SPONSORSHIP_COOKIES in .env
Issue: Azure returns aggregated data for date ranges
Solution: The collector automatically makes separate API calls for each day in the range to ensure daily granularity.
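The per-day splitting works roughly like this (a sketch; the helper name is an assumption, not the actual code in date_utils.py):

```python
from datetime import date, timedelta

def daily_ranges(start: date, end: date):
    """Yield each day in [start, end] so the Azure collector can request one day at a time."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

for day in daily_ranges(date(2024, 11, 1), date(2024, 11, 3)):
    print(f"requesting Azure costs for {day}")  # one API call per day
```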
Issue: Cannot connect to PostgreSQL
Solution:
# Check PostgreSQL is running
pg_isready
# Test connection manually
psql -h localhost -U postgres -d cloud_costs
# Verify credentials in .env match your PostgreSQL setup
Issue: API rate limits exceeded
Solution:
- Run providers separately: --providers aws, then --providers gcp, etc.
- The tool already uses parallel collection, which should be within limits
- Check your cloud provider's API quota limits
Running this tool incurs minimal cloud costs:
- AWS Cost Explorer API: $0.01 per API request (typically 1-2 requests per run)
- GCP BigQuery: Billed per query (usually < $0.01 per run for small billing exports)
- Azure: Sponsorship portal API is free
Estimated cost: $0.50-$1.00 per month in API charges for daily runs.
- Never commit the .env file - it contains sensitive credentials
- .env is already in .gitignore - verify before committing
- Store credentials securely (use secret management tools in production)
- Rotate Azure sponsorship cookies regularly
- Use IAM roles with minimal required permissions
- Enable MFA on cloud provider accounts
- Consider using AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault for production
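As one example of the production recommendation above, database credentials could be pulled from AWS Secrets Manager at startup instead of a local .env file (a sketch using boto3; the secret name and payload shape are placeholders):

```python
import json
import boto3

def load_db_credentials(secret_id: str = "cloud-cost-aggregator/db") -> dict:
    """Fetch DB credentials from AWS Secrets Manager instead of a local .env file."""
    client = boto3.client("secretsmanager", region_name="us-east-1")
    secret = client.get_secret_value(SecretId=secret_id)
    return json.loads(secret["SecretString"])  # e.g. {"DB_USER": "...", "DB_PASSWORD": "..."}

creds = load_db_credentials()
```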
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - See LICENSE file for details
For issues, questions, or contributions:
- Open an issue on GitHub
- Check existing issues for solutions
- Refer to cloud provider documentation for API-specific questions
Future enhancements:
- Add support for Oracle Cloud Infrastructure (OCI)
- Implement cost anomaly detection and alerts
- Add web dashboard for visualization
- Support for multiple AWS accounts
- Export to CSV/Excel
- Slack/email notifications for cost spikes
- Budget alerts and thresholds
- Cost forecasting using historical data
- Tag-based cost allocation
- Docker containerization
Built to help teams track and optimize their multi-cloud spending.
Last Updated: November 2024