Cloud Cost Aggregator

A Python-based tool to aggregate and track cloud usage costs from AWS, GCP, and Azure. Collects daily service-level cost data and stores it in PostgreSQL for analysis and reporting.

Features

  • Multi-Cloud Support: Collects costs from AWS, GCP, and Azure
  • Service-Level Granularity: Tracks costs broken down by individual services
  • Daily Cost Tracking: Maintains daily cost history for trend analysis
  • Automatic Upserts: Handles cost updates as cloud providers finalize billing data
  • T-2 Day Lookback: Accounts for cloud billing data materialization delays (costs from 2 days ago)
  • Historical Backfill: Supports backfilling up to 90 days of historical data
  • Normalized Pricing: All costs stored in USD for consistency
  • Credit Handling: Excludes credits and refunds to show actual usage costs
  • Parallel Collection: Collects from all providers simultaneously for speed

Architecture

cloud_cost_aggregator/
├── collectors/              # Cloud provider collectors
│   ├── base_collector.py   # Base class for all collectors
│   ├── aws_collector.py    # AWS Cost Explorer integration
│   ├── gcp_collector.py    # GCP BigQuery billing export
│   └── azure_collector.py  # Azure Sponsorship portal API
├── database/               # Database layer
│   ├── connection.py       # Database connection management
│   ├── models.py          # SQLAlchemy models
│   └── schema.sql         # PostgreSQL schema
├── utils/                 # Utility functions
│   ├── logger.py         # Logging configuration
│   └── date_utils.py     # Date range utilities
├── aggregator.py         # Main aggregation orchestrator
├── config.py            # Configuration management
├── main.py             # CLI entry point
└── requirements.txt    # Python dependencies
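
The collector interface itself is not shown above; as a rough sketch (hypothetical names, inferred from the layout), base_collector.py defines a common interface that aggregator.py fans out over in parallel:

from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from datetime import date

class BaseCollector(ABC):
    """Common interface each provider collector implements (sketch)."""

    provider: str  # 'aws', 'gcp', or 'azure'

    @abstractmethod
    def collect(self, start: date, end: date) -> list[dict]:
        """Return rows like {'service_name', 'cost_usd', 'usage_date'}."""

def collect_all(collectors: list[BaseCollector], start: date, end: date) -> list[dict]:
    # Parallel collection: one worker per provider, as described under Features
    with ThreadPoolExecutor(max_workers=len(collectors)) as pool:
        futures = [pool.submit(c.collect, start, end) for c in collectors]
        return [row for f in futures for row in f.result()]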

Prerequisites

  • Python 3.9+
  • PostgreSQL 12+
  • Cloud provider accounts with appropriate permissions:
    • AWS: Cost Explorer API access
    • GCP: BigQuery billing export enabled
    • Azure: Valid sponsorship or subscription

Installation

1. Clone the repository

git clone https://github.com/yourusername/cloud-cost-aggregator.git
cd cloud-cost-aggregator

2. Create virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Set up the PostgreSQL database

Create the database:

# Using createdb
createdb cloud_costs

# Or using psql
psql -U postgres -c "CREATE DATABASE cloud_costs;"

Initialize the schema:

# Using psql
psql -U postgres -d cloud_costs -f database/schema.sql

# Or using the CLI
python main.py --init-db

5. Configure environment variables

Copy the example environment file:

cp .env.example .env

Edit .env with your actual credentials (see Configuration section below).

Configuration

Environment Variables

Edit the .env file with your credentials:

# PostgreSQL Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=cloud_costs
DB_USER=postgres
DB_PASSWORD=your_password_here

# AWS Credentials
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AWS_REGION=us-east-1

# GCP Credentials
GCP_BILLING_ACCOUNT_ID=XXXXXX-YYYYYY-ZZZZZZ
GCP_PROJECT_ID=your-project-id
GCP_CREDENTIALS_PATH=/path/to/service-account.json
GCP_BIGQUERY_DATASET=billing_export

# Azure Credentials
AZURE_TENANT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
AZURE_CLIENT_SECRET=your-secret-value-here
AZURE_SUBSCRIPTION_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

# Azure Sponsorship Cookies (for Azure for Students/Startups)
AZURE_SPONSORSHIP_COOKIES=your_cookies_here

# Optional Configuration
LOG_LEVEL=INFO
LOOKBACK_DAYS=2
BACKFILL_DAYS=90
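
A minimal sketch of how config.py might read these values (assuming python-dotenv, a common choice for .env loading; the actual implementation may differ):

import os
from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # reads .env from the project root

DB_CONFIG = {
    "host": os.getenv("DB_HOST", "localhost"),
    "port": int(os.getenv("DB_PORT", "5432")),
    "dbname": os.getenv("DB_NAME", "cloud_costs"),
    "user": os.getenv("DB_USER", "postgres"),
    "password": os.getenv("DB_PASSWORD", ""),
}
LOOKBACK_DAYS = int(os.getenv("LOOKBACK_DAYS", "2"))
BACKFILL_DAYS = int(os.getenv("BACKFILL_DAYS", "90"))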

AWS Setup

  1. Create IAM user with Cost Explorer permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ce:GetCostAndUsage",
        "ce:GetCostForecast"
      ],
      "Resource": "*"
    }
  ]
}
  2. Enable Cost Explorer in the AWS Console (if not already enabled)
  3. Create access key and add credentials to .env file
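
For reference, a minimal boto3 call in the spirit of aws_collector.py (the exact parameters used by the collector may differ):

import boto3

ce = boto3.client("ce", region_name="us-east-1")  # Cost Explorer lives in us-east-1
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-11-01", "End": "2024-11-02"},  # End is exclusive
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    # Exclude credits and refunds to show actual usage costs (see Features)
    Filter={"Not": {"Dimensions": {"Key": "RECORD_TYPE",
                                   "Values": ["Credit", "Refund"]}}},
)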

GCP Setup

  1. Enable billing export to BigQuery:
# Create dataset for billing export
bq mk --dataset --location=US billing_export

Then in GCP Console:

  • Go to Billing → Billing export → BigQuery export
  • Enable "Detailed usage cost" export
  • Set dataset to: billing_export
  2. Create service account and grant permissions:
# Create service account
gcloud iam service-accounts create cloud-cost-reader \
  --display-name="Cloud Cost Reader"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.user"

gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Create and download key
gcloud iam service-accounts keys create gcp-credentials.json \
  --iam-account=cloud-cost-reader@YOUR_PROJECT_ID.iam.gserviceaccount.com
  3. Update .env with the path to gcp-credentials.json

Note: It can take up to 24 hours for billing data to appear in BigQuery after enabling export.
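
A simplified sketch of the BigQuery query gcp_collector.py likely runs (the export table name depends on your billing account ID and export type, so check your dataset; credits handling is omitted here):

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("gcp-credentials.json")

# Table name varies: gcp_billing_export_v1_<billing-account-id-with-underscores>
table = "your-project-id.billing_export.gcp_billing_export_v1_XXXXXX_YYYYYY_ZZZZZZ"
query = f"""
    SELECT service.description AS service_name, SUM(cost) AS cost_usd
    FROM `{table}`
    WHERE DATE(usage_start_time) = @usage_date
    GROUP BY service_name
"""
job = client.query(query, job_config=bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("usage_date", "DATE", "2024-11-01")]
))
for row in job.result():
    print(row.service_name, row.cost_usd)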

Azure Setup

For Standard Subscriptions:

# Create service principal
az ad sp create-for-rbac --name "cloud-cost-reader"

# Grant Cost Management Reader role
az role assignment create \
  --assignee <client-id> \
  --role "Cost Management Reader" \
  --scope /subscriptions/<subscription-id>

For Azure Sponsorships (Students/Startups):

Azure Sponsorship accounts don't support the Cost Management API, so we use a cookie-based approach:

  1. Log in to Azure Sponsorship Portal
  2. Open browser DevTools (F12) → Network tab
  3. Refresh the page
  4. Find any API request and copy the entire Cookie header value
  5. Add to .env as AZURE_SPONSORSHIP_COOKIES

Note: Cookies expire periodically and need to be refreshed.
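
The request itself is plain HTTP with the cookie attached; a rough sketch follows (the endpoint below is a placeholder, not a documented API; the real URL lives in azure_collector.py):

import os
import requests

cookies = os.getenv("AZURE_SPONSORSHIP_COOKIES", "")
response = requests.get(
    "https://www.microsoftazuresponsorships.com/...",  # placeholder endpoint
    headers={"Cookie": cookies},
    timeout=30,
)
response.raise_for_status()  # a 401 here usually means the cookies expired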

Usage

Test Connections

Before running cost collection, verify all cloud provider connections:

python main.py --test-connections

Test specific providers:

python main.py --test-connections --providers aws,gcp

Daily Cost Collection

Run daily collection with T-2 lookback (recommended for cron jobs):

python main.py

This collects costs from 2 days ago, accounting for billing data materialization delays.
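
The target date is simply today minus LOOKBACK_DAYS:

from datetime import date, timedelta

LOOKBACK_DAYS = 2  # see Configuration
target_date = date.today() - timedelta(days=LOOKBACK_DAYS)
print(target_date)  # e.g. 2024-11-06 when run on 2024-11-08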

Historical Backfill

Backfill 90 days of historical data:

python main.py --backfill

Custom backfill period:

python main.py --backfill --start-date 2024-10-01 --end-date 2024-11-01

Custom Date Ranges

Collect costs for specific date range:

python main.py --start-date 2024-11-01 --end-date 2024-11-08

Provider Selection

Collect from specific providers only:

python main.py --providers aws,azure

Initialize Database

Create database tables:

python main.py --init-db

Setting Up Automation

Linux/macOS (Cron)

Add to crontab (crontab -e):

# Run daily at 2 AM
0 2 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1

Run twice daily to catch cost updates:

# Morning run at 2 AM
0 2 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1

# Evening run at 2 PM
0 14 * * * cd /path/to/cloud-cost-aggregator && /path/to/venv/bin/python main.py >> /var/log/cloud-costs.log 2>&1

Windows (Task Scheduler)

  1. Open Task Scheduler
  2. Create Basic Task
  3. Set trigger: Daily at 2:00 AM
  4. Action: Start a program
    • Program: C:\path\to\venv\Scripts\python.exe
    • Arguments: main.py
    • Start in: C:\path\to\cloud-cost-aggregator

Database Schema

Main Table: cloud_costs

CREATE TABLE cloud_costs (
    id SERIAL PRIMARY KEY,
    cloud_provider VARCHAR(50) NOT NULL,
    service_name VARCHAR(255) NOT NULL,
    cost_usd DECIMAL(12, 2) NOT NULL,
    usage_date DATE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(cloud_provider, service_name, usage_date)
);

The UNIQUE constraint enables automatic upserts when costs are updated.
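
For illustration, the upsert boils down to an INSERT ... ON CONFLICT; a minimal sketch with psycopg2 (the project uses SQLAlchemy models, which may express this differently):

from datetime import date
import psycopg2

UPSERT_SQL = """
INSERT INTO cloud_costs (cloud_provider, service_name, cost_usd, usage_date)
VALUES (%s, %s, %s, %s)
ON CONFLICT (cloud_provider, service_name, usage_date)
DO UPDATE SET cost_usd = EXCLUDED.cost_usd,
              updated_at = CURRENT_TIMESTAMP;
"""

with psycopg2.connect(dbname="cloud_costs", user="postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(UPSERT_SQL, ("aws", "EC2", 12.34, date(2024, 11, 6)))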

Column           Type           Description
id               SERIAL         Primary key
cloud_provider   VARCHAR(50)    'aws', 'gcp', or 'azure'
service_name     VARCHAR(255)   Service name (e.g., 'EC2', 'S3')
cost_usd         DECIMAL(12,2)  Cost in USD (2 decimal places)
usage_date       DATE           Date the cost occurred
created_at       TIMESTAMP      Record creation timestamp
updated_at       TIMESTAMP      Last update timestamp

Views

The schema includes helpful views for common queries:

  • daily_cost_summary - Daily totals by provider
  • service_cost_summary - Service-level aggregates

Querying Cost Data

Example SQL Queries

Total costs by provider (last 30 days):

SELECT
    cloud_provider,
    SUM(cost_usd) as total_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY cloud_provider
ORDER BY total_cost DESC;

Top 10 most expensive services:

SELECT
    cloud_provider,
    service_name,
    SUM(cost_usd) as total_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY cloud_provider, service_name
ORDER BY total_cost DESC
LIMIT 10;

Daily cost trend:

SELECT
    usage_date,
    cloud_provider,
    SUM(cost_usd) as daily_cost
FROM cloud_costs
WHERE usage_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY usage_date, cloud_provider
ORDER BY usage_date DESC, cloud_provider;

Monthly cost comparison:

SELECT
    DATE_TRUNC('month', usage_date) as month,
    cloud_provider,
    SUM(cost_usd) as monthly_cost
FROM cloud_costs
GROUP BY month, cloud_provider
ORDER BY month DESC, cloud_provider;
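
For analysis in Python, the same data loads cleanly into pandas (pandas and SQLAlchemy are assumed here and are not necessarily listed in requirements.txt):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:your_password_here@localhost:5432/cloud_costs")
df = pd.read_sql(
    "SELECT usage_date, cloud_provider, SUM(cost_usd) AS daily_cost "
    "FROM cloud_costs GROUP BY usage_date, cloud_provider",
    engine,
)
# One column per provider, one row per day
print(df.pivot(index="usage_date", columns="cloud_provider", values="daily_cost"))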

Service Name Normalization

Azure Services

The Azure collector normalizes service names for better grouping; see the sketch after this list:

  • Azure OpenAI: All GPT, ChatGPT, Davinci, Embedding, and Ada models
  • Azure Speech-to-Text: All speech-to-text and STT services
  • Azure Text-to-Speech: All text-to-speech, TTS, and neural voice services
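
A sketch of the normalization logic (hypothetical function name; the actual rules live in azure_collector.py):

def normalize_azure_service(raw_name: str) -> str:
    """Map raw Azure meter/service names onto the groups listed above."""
    name = raw_name.lower()
    if any(k in name for k in ("gpt", "chatgpt", "davinci", "embedding", "ada")):
        return "Azure OpenAI"
    if "speech to text" in name or "speech-to-text" in name or "stt" in name:
        return "Azure Speech-to-Text"
    if any(k in name for k in ("text to speech", "text-to-speech", "tts", "neural voice")):
        return "Azure Text-to-Speech"
    return raw_name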

AWS & GCP

Service names are preserved as returned by the respective APIs.

Troubleshooting

AWS: No data or incorrect costs

Issue: AWS costs showing as $0 or negative values

Solution: The tool automatically excludes credits and refunds using the Filter parameter. This shows actual usage costs. AWS data can take 24-48 hours to fully materialize.

GCP: "does not match any table"

Issue: BigQuery billing export tables not found

Solution:

  • Verify billing export is enabled in GCP Console
  • Wait up to 24 hours for initial data to populate
  • Check dataset name matches GCP_BIGQUERY_DATASET in .env
  • Verify service account has bigquery.user and bigquery.dataViewer roles

Azure: 401 Unauthorized or empty data

Issue: Azure Sponsorship API returns authentication errors

Solution:

  1. Cookies have expired - refresh them from browser
  2. Log in to Azure Sponsorship portal
  3. Open DevTools (F12) → Network tab
  4. Find an API request and copy the Cookie header
  5. Update AZURE_SPONSORSHIP_COOKIES in .env

Issue: Azure returns aggregated data for date ranges

Solution: The collector automatically makes separate API calls for each day in the range to ensure daily granularity.

Database Connection Failed

Issue: Cannot connect to PostgreSQL

Solution:

# Check PostgreSQL is running
pg_isready

# Test connection manually
psql -h localhost -U postgres -d cloud_costs

# Verify credentials in .env match your PostgreSQL setup

Rate Limiting

Issue: API rate limits exceeded

Solution:

  • Run providers separately: --providers aws, then --providers gcp, etc.
  • Parallel collection issues only a few requests per provider per run, which normally stays well within quotas
  • Check your cloud provider's API quota limits

Cost Considerations

Running this tool incurs minimal cloud costs:

  • AWS Cost Explorer API: $0.01 per API request (typically 1-2 requests per run)
  • GCP BigQuery: billed per query (usually under $0.01 per run for small billing datasets)
  • Azure: the Sponsorship portal API is free

Estimated total: $0.50-$1.00 per month in API charges for daily runs.

Security Notes

  • Never commit .env file - it contains sensitive credentials
  • .env is already in .gitignore - verify before committing
  • Store credentials securely (use secret management tools in production)
  • Rotate Azure sponsorship cookies regularly
  • Use IAM roles with minimal required permissions
  • Enable MFA on cloud provider accounts
  • Consider using AWS Secrets Manager, GCP Secret Manager, or Azure Key Vault for production

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests if applicable
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

MIT License - See LICENSE file for details

Support

For issues, questions, or contributions:

  • Open an issue on GitHub
  • Check existing issues for solutions
  • Refer to cloud provider documentation for API-specific questions

Roadmap

Future enhancements:

  • Add support for Oracle Cloud Infrastructure (OCI)
  • Implement cost anomaly detection and alerts
  • Add web dashboard for visualization
  • Support for multiple AWS accounts
  • Export to CSV/Excel
  • Slack/email notifications for cost spikes
  • Budget alerts and thresholds
  • Cost forecasting using historical data
  • Tag-based cost allocation
  • Docker containerization

Acknowledgments

Built to help teams track and optimize their multi-cloud spending.


Last Updated: November 2024
