# SQL + Python Pipeline for Olist E-Commerce Data

## Connecting Python to PostgreSQL

This notebook connects Python to a **PostgreSQL database** using SQLAlchemy with the `psycopg2` driver.

- Establishes a **database engine** and connection.
- Executes SQL queries directly from Python without using GUI tools.
- Retrieves query results as **Pandas DataFrames** for further analysis.
- Combines **SQL and Python in a single workflow**, which keeps the analysis reproducible and easy to automate.


In [2]:
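# Imports
import pandas as pd
import psycopg2  # not called directly; SQLAlchemy loads it as the PostgreSQL driver
from sqlalchemy import create_engine

# Connection settings for PostgreSQL
db_user = "postgres"
db_pass = "123456"  # replace with your own password
db_host = "localhost"
db_port = "5432"
db_name = "Olist_Ecommerce_ver2"  # the name of your database

# Build the engine; the actual connection is opened on first use
engine = create_engine(f"postgresql+psycopg2://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}")

# Test the connection by reading a few rows from the orders table
df_test = pd.read_sql("SELECT * FROM orders LIMIT 5;", engine)
df_test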


| | order_id | customer_id | order_status | order_purchase_timestamp | order_approved_at | order_delivered_carrier_date | order_delivered_customer_date | order_estimated_delivery_date |
|---|---|---|---|---|---|---|---|---|
| 0 | e481f51cbdc54678b7cc49136f2d6af7 | 9ef432eb6251297304e76186b10a928d | delivered | 2017-10-02 10:56:33 | 2017-10-02 11:07:15 | 2017-10-04 19:55:00 | 2017-10-10 21:25:13 | 2017-10-18 |
| 1 | 53cdb2fc8bc7dce0b6741e2150273451 | b0830fb4747a6c6d20dea0b8c802d7ef | delivered | 2018-07-24 20:41:37 | 2018-07-26 03:24:27 | 2018-07-26 14:31:00 | 2018-08-07 15:27:45 | 2018-08-13 |
| 2 | 47770eb9100c2d0c44946d9cf07ec65d | 41ce2a54c0b03bf3443c3d931a367089 | delivered | 2018-08-08 08:38:49 | 2018-08-08 08:55:23 | 2018-08-08 13:50:00 | 2018-08-17 18:06:29 | 2018-09-04 |
| 3 | 949d5b44dbf5de918fe9c16f97b45f8a | f88197465ea7920adcdbec7375364d82 | delivered | 2017-11-18 19:28:06 | 2017-11-18 19:45:59 | 2017-11-22 13:39:59 | 2017-12-02 00:28:42 | 2017-12-15 |
| 4 | ad21c59c0840e6cb83a9ceb5573f8159 | 8ab97904e6daea8866dbdbc4fb7aad2c | delivered | 2018-02-13 21:18:39 | 2018-02-13 22:20:29 | 2018-02-14 19:46:34 | 2018-02-16 18:17:02 | 2018-02-26 |
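
The connection cell hard-codes the password and drops it into the URL unescaped, so a password containing `@` or `/` would break the URL. One possible alternative, sketched below with the standard library only, reads the settings from environment variables and percent-encodes the user and password. The helper `build_postgres_url` and the `OLIST_DB_*` variable names are hypothetical, introduced here for illustration; they are not part of the original notebook.

```python
import os
from urllib.parse import quote


def build_postgres_url(user, password, host, port, database):
    """Return a postgresql+psycopg2 URL with user and password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including "/" and "@"
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# OLIST_DB_* are hypothetical variable names. The defaults mirror the values
# used in the connection cell, except the password, which has no default here.
db_url = build_postgres_url(
    user=os.environ.get("OLIST_DB_USER", "postgres"),
    password=os.environ.get("OLIST_DB_PASS", ""),
    host=os.environ.get("OLIST_DB_HOST", "localhost"),
    port=os.environ.get("OLIST_DB_PORT", "5432"),
    database=os.environ.get("OLIST_DB_NAME", "Olist_Ecommerce_ver2"),
)
```

The resulting `db_url` string can be passed to `create_engine(db_url)` in place of the f-string, which keeps credentials out of the notebook itself.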


## Top Customers Query Demo

In [8]:
query_top_customers = """
SELECT o.customer_id, SUM(op.payment_value) AS total_spent
FROM orders o
JOIN order_payments op ON o.order_id = op.order_id
GROUP BY o.customer_id
ORDER BY total_spent DESC
LIMIT 10;
"""

df_top_customers = pd.read_sql(query_top_customers, engine)
df_top_customers


| | customer_id | total_spent |
|---|---|---|
| 0 | 1617b1357756262bfa56ab541c47bc16 | 13664.08 |
| 1 | ec5b2ba62e574342386871631fafd3fc | 7274.88 |
| 2 | c6e2731c5b391845f6800c97401a43a9 | 6929.31 |
| 3 | f48d464a0baaea338cb25f816991ab1f | 6922.21 |
| 4 | 3fd6777bbce08a352fddd04e4a7cc8f6 | 6726.66 |
| 5 | 05455dfa7cd02f13d132aa7a6a9729c6 | 6081.54 |
| 6 | df55c14d1476a9a3467f131269c2477f | 4950.34 |
| 7 | e0a2412720e9ea4f26c1ac985f6a7358 | 4809.44 |
| 8 | 24bbf5fd2f2e1b359ee7de94defc4a15 | 4764.34 |
| 9 | 3d979689f636322c62418b6346b1c6d2 | 4681.78 |
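
One caveat about the grouping key: in the public Olist dataset, `customer_id` is generated per order, while `customer_unique_id` in the customers table identifies the actual shopper. If this database keeps that schema (an assumption, since the `customers` table is not shown in this notebook), grouping by `customer_id` ranks individual orders rather than repeat customers. The sketch below illustrates the difference on a few invented rows, using an in-memory SQLite database instead of PostgreSQL so that it runs without a server.

```python
import sqlite3

# Toy rows invented for illustration only; they are not taken from the Olist data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT, customer_unique_id TEXT);
CREATE TABLE orders (order_id TEXT, customer_id TEXT);
CREATE TABLE order_payments (order_id TEXT, payment_value REAL);

-- Shopper u1 placed two orders, so u1 appears under two customer_id values.
INSERT INTO customers VALUES ('c1', 'u1'), ('c2', 'u1'), ('c3', 'u2');
INSERT INTO orders VALUES ('o1', 'c1'), ('o2', 'c2'), ('o3', 'c3');
INSERT INTO order_payments VALUES ('o1', 100.0), ('o2', 50.0), ('o3', 120.0);
""")

# Same aggregation as the query above, but grouped by the shopper-level key
# (assumes a customers table with a customer_unique_id column).
query_top_unique_customers = """
SELECT c.customer_unique_id, SUM(op.payment_value) AS total_spent
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_payments op ON o.order_id = op.order_id
GROUP BY c.customer_unique_id
ORDER BY total_spent DESC
LIMIT 10;
"""

rows = conn.execute(query_top_unique_customers).fetchall()
print(rows)
conn.close()
```

With these toy rows, shopper `u1` ranks first with a combined total of 150.0, whereas grouping by `customer_id` would have put `c3` (120.0) on top. If the real database has the same tables, the query string should also work unchanged with `pd.read_sql(query_top_unique_customers, engine)`.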


## Conclusion

This notebook demonstrates a **Python–PostgreSQL connection** and shows how to run SQL queries directly from Python. The example retrieves the **top 10 customers by total spending** and returns them as a Pandas DataFrame, ready for further analysis.
