A complete data pipeline project using Python, SQLite, and SQL for generating synthetic e-commerce data, managing it in a database, and performing analytical queries.
This project demonstrates:
- Data Generation: Creating realistic synthetic e-commerce data using Python Faker
- Database Management: Setting up and managing SQLite database
- SQL Queries: Writing complex queries with JOINs and aggregations
ecommerce-data-pipeline/
├── generate_ecommerce_data.py # Generate synthetic data
├── create_database.py # Create database and import data
├── query_database.py # Execute SQL queries
├── requirements.txt # Python dependencies
└── README.md # This file
pip install -r requirements.txtRun the data generation script to create CSV files with synthetic e-commerce data:
python generate_ecommerce_data.pyThis will generate:
customers.csv(100 records)categories.csv(10 records)products.csv(50 records)orders.csv(200 records)order_items.csv(500 records)
Create the SQLite database and import data from CSV files:
python create_database.pyThis creates ecommerce.db with 5 tables and imports all data.
Run SQL queries to analyze the data:
python query_database.pyThis generates 5 output CSV files with analytical results.
- customer_id (Primary Key)
- customer_name
- email (Unique)
- phone
- address
- city
- country
- registration_date
- category_id (Primary Key)
- category_name
- description
- product_id (Primary Key)
- product_name
- category_id (Foreign Key)
- price
- stock_quantity
- description
- order_id (Primary Key)
- customer_id (Foreign Key)
- order_date
- status (Pending, Shipped, Delivered, Cancelled)
- order_item_id (Primary Key)
- order_id (Foreign Key)
- product_id (Foreign Key)
- quantity
- unit_price
- total_price
- Customer Orders Summary - Total orders per customer
- Detailed Order Information - Complete order details with products
- Sales by Category - Performance metrics per category
- Top Customers - Best customers by spending
- Product Performance - Sales metrics per product
After running the queries, you'll get:
output_customer_orders.csvoutput_order_details.csvoutput_sales_by_category.csvoutput_top_customers.csvoutput_product_performance.csv
- Python 3.7+
- faker
- pandas
Created for the Cursor IDE A-SDLC Exercise
MIT License