Skip to content

dastan08/Automated-Expense-Reporting-System

Repository files navigation

📧 Gmail Order Parser — Food & Grocery Spend Tracker

A production-quality Python tool that automatically fetches order confirmation emails from Zomato, Swiggy, Swiggy Instamart, Blinkit, and Zepto via Gmail API, parses them into structured data, and generates comprehensive Excel reports with charts and CSV exports.


✨ Features

Feature Details
Platforms Zomato, Swiggy, Swiggy Instamart, Blinkit, Zepto
Email parsing Regex + BeautifulSoup HTML parsing + MIME decoding
Data extracted Order ID, Payment ID, Amount, Date, Store, Status
Deduplication SQLite cache prevents re-parsing emails
Cancellation detection Detects cancelled/refunded orders
Excel reports 6-sheet workbook with charts, filters, currency formatting
CSV exports Per-platform, per-month, and full-dataset CSVs
Scheduling APScheduler cron-based auto-runs
CLI Rich Click-based commands
Unit tests 35 passing pytest tests

📁 Project Structure

gmail_order_parser/
├── main.py                    # CLI entry point
├── gmail_auth.py              # OAuth2 Gmail authentication
├── gmail_fetcher.py           # Email fetching + routing
├── database.py                # SQLite caching layer
├── scheduler.py               # APScheduler daemon
├── requirements.txt
├── .env.example               # Sample environment config
│
├── config/
│   ├── settings.py            # Typed config from .env
│   └── logger.py              # Colored console + file logging
│
├── parsers/
│   ├── __init__.py            # Parser registry
│   ├── base_parser.py         # Abstract base with shared utilities
│   ├── zomato_parser.py
│   ├── swiggy_parser.py
│   ├── instamart_parser.py
│   ├── blinkit_parser.py
│   └── zepto_parser.py
│
├── reports/
│   ├── excel_generator.py     # Full Excel workbook generator
│   └── csv_exporter.py        # CSV export module
│
├── tests/
│   └── test_parsers.py        # 35 unit tests
│
├── scripts/
│   └── generate_sample_report.py  # Demo without Gmail access
│
├── data/                      # SQLite database (auto-created)
├── reports/                   # Generated Excel files
├── exports/                   # Generated CSV files
└── logs/                      # Log files

🚀 Quick Start

1. Clone & Install

git clone <repo-url>
cd gmail_order_parser
pip install -r requirements.txt

2. Configure Environment

cp .env.example .env
# Edit .env as needed (see Configuration section)

3. Set Up Gmail OAuth2

Step-by-step OAuth Setup:

  1. Go to Google Cloud Console
  2. Create a new project (or select existing)
  3. Navigate to APIs & Services → Library
  4. Search for Gmail API and click Enable
  5. Go to APIs & Services → Credentials
  6. Click Create Credentials → OAuth 2.0 Client IDs
  7. Application type: Desktop app
  8. Name it anything (e.g., "Gmail Order Parser")
  9. Click CreateDownload JSON
  10. Rename the downloaded file to credentials.json
  11. Place it in the config/ directory
gmail_order_parser/
└── config/
    └── credentials.json   ← put it here

First Run Authentication:

python main.py fetch-emails

A browser window will open asking you to authorize the app. Sign in with the Gmail account containing your order emails. The token is saved to config/token.json for future runs.


💻 CLI Commands

# Fetch and parse new emails from Gmail
python main.py fetch-emails

# Fetch with custom options
python main.py fetch-emails --max 200 --from-date 2024-01-01

# Generate Excel report
python main.py generate-report

# Generate report for one platform only
python main.py generate-report --platform Zomato

# Export all orders to CSV
python main.py export-csv

# Print monthly summary to console
python main.py monthly-summary

# Filter by year
python main.py monthly-summary --year 2024

# Run everything in one command
python main.py run-all

# Start the scheduler daemon
python main.py schedule

# Show database statistics
python main.py stats

📊 Excel Report Structure

The generated .xlsx workbook contains 6 sheets:

Sheet Contents
Dashboard KPI cards, platform breakdown
Raw Orders All orders with filters, colored rows for cancellations
Monthly Summary Month-wise spend totals with grand total row
Platform Summary Per-platform orders, spend, cancelled/refunded counts
Payment Methods Payment method breakdown with % of total
Charts Bar chart (monthly spend) + Pie chart (platform split)

Formatting Features

  • ✅ Frozen header rows
  • ✅ Auto-filters on all columns
  • ✅ Currency formatting (₹#,##0.00)
  • ✅ Alternating row colors
  • ✅ Platform-colored rows
  • ✅ Cancelled orders highlighted in pink
  • ✅ Refunded orders highlighted in orange

⚙️ Configuration (.env)

# Path to OAuth credentials
GOOGLE_CREDENTIALS_PATH=config/credentials.json
GOOGLE_TOKEN_PATH=config/token.json

# Database location
DB_PATH=data/orders.db

# Output directories
REPORTS_DIR=reports
EXPORTS_DIR=exports
LOGS_DIR=logs

# Logging: DEBUG, INFO, WARNING, ERROR
LOG_LEVEL=INFO

# Cron for auto-schedule (default: 8 AM daily)
SCHEDULE_CRON=0 8 * * *

# Max emails per run (0 = unlimited)
MAX_EMAILS=500

# Only fetch emails after this date
FETCH_FROM_DATE=2024-01-01

# Platforms to scan
ENABLED_PLATFORMS=zomato,swiggy,instamart,blinkit,zepto

🧪 Running Tests

python -m pytest tests/ -v

All 35 tests cover:

  • Parser can_parse() matching for each platform
  • Order data extraction (amount, payment method, type)
  • Cancellation/refund detection
  • Base parser utilities (amount extraction, date parsing, HTML stripping)

🎲 Demo (No Gmail Required)

Generate a sample report with 80 realistic synthetic orders:

python scripts/generate_sample_report.py

Output: reports/sample_report.xlsx + 18 CSV files in exports/


➕ Adding a New Platform

  1. Create parsers/newplatform_parser.py inheriting BaseParser
  2. Implement can_parse() and parse_order()
  3. Register in parsers/__init__.py:
    from parsers.newplatform_parser import NewPlatformParser
    PARSER_REGISTRY.append(NewPlatformParser())
  4. Add sender domains to config/settings.pyPLATFORM_SENDERS
  5. Add unit tests in tests/test_parsers.py

🔒 Security Notes

  • Credentials and token files are in config/ — add to .gitignore
  • The tool uses gmail.readonly scope — it cannot modify or delete emails
  • All data is stored locally in SQLite — nothing is sent to external servers

📦 Dependencies

google-auth, google-auth-oauthlib, google-api-python-client  → Gmail API
pandas                                                         → Data processing
openpyxl                                                       → Excel generation
beautifulsoup4, lxml                                           → HTML parsing
python-dotenv                                                  → Config management
APScheduler                                                    → Scheduling
click                                                          → CLI
colorlog                                                       → Colored logging
tqdm                                                           → Progress bars

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages