# Purpose

Save the analysis results back to SQL Server with `to_sql()` or export them to CSV with `to_csv()`.

# Primary Goals

| # | Purpose                    | Description                                                                    |
| - | -------------------------- | ------------------------------------------------------------------------------ |
| 1 | **Save to SQL**            | Push cleaned or aggregated data back to a new or existing table in SQL Server. |
| 2 | **Save to CSV**            | Export DataFrame as a `.csv` file for use in Excel, BI tools, or archiving.    |
| 3 | **Optional backup/export** | Store analysis output for reproducibility or version tracking.                 |


#  Relational Mapping Plan (Join Strategy)

| # | Table         | Join Key(s)                                                | Join Type | Join Target Table | Purpose                          |
| - | ------------- | ---------------------------------------------------------- | --------- | ----------------- | -------------------------------- |
| 1 | `orders`      | `customer_id`                                              | INNER     | `customers`       | Get customer info for each order |
| 2 | `order_items` | `order_id`                                                 | INNER     | `orders`          | Get item info per order          |
| 3 | `products`    | `product_id`                                               | INNER     | `order_items`     | Get product category & price     |
| 4 | `payments`    | `order_id`                                                 | INNER     | `orders`          | Add payment type and value       |
| 5 | `sellers`     | `seller_id`                                                | INNER     | `order_items`     | Get seller info per item         |
| 6 | `geolocation` | `geolocation_zip_code_prefix` = `customer_zip_code_prefix` | LEFT      | `customers`       | Add region info via zip code     |
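
The difference between the INNER and LEFT join types in the table only shows up when a key has no match on the other side. The following is a minimal pandas sketch of that behavior, not the notebook's actual data: the two small DataFrames, their zip code prefixes, and their coordinates are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables; all values here are invented.
customers = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "customer_zip_code_prefix": [28013, 1001, 99999],
})
geolocation = pd.DataFrame({
    "geolocation_zip_code_prefix": [28013, 1001],
    "geolocation_lat": [-21.76, -23.55],
})

# INNER keeps only customers whose zip prefix exists in geolocation.
inner = customers.merge(
    geolocation,
    left_on="customer_zip_code_prefix",
    right_on="geolocation_zip_code_prefix",
    how="inner",
)

# LEFT keeps every customer; unmatched rows get NaN in the geolocation columns.
left = customers.merge(
    geolocation,
    left_on="customer_zip_code_prefix",
    right_on="geolocation_zip_code_prefix",
    how="left",
)

print(len(inner), len(left))
```

With a LEFT join, customer `c3` survives with a missing latitude, whereas an INNER join would silently drop that customer from the result.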



#  Join Path Summary


customers → orders → order_items → products  
customers → geolocation  
orders → payments  
order_items → sellers  


Each arrow (→) indicates a relationship via a shared key, typically an id field (a zip code prefix in the case of geolocation). This structure lets us build a comprehensive dataset that connects:

- Who the customer is
- What they ordered
- Which items were included
- What category they belong to
- Who sold them
- How they paid
- Where they are located

# Master Join

The master join is built step by step. The query connects:

- `customers` → `orders` via `customer_id`
- `orders` → `payments` via `order_id`
- `orders` → `order_items` via `order_id`
- `order_items` → `products` via `product_id`
- `order_items` → `sellers` via `seller_id`
- `customers` → `geolocation` via `customer_zip_code_prefix`

# Step-by-step full join from raw tables in SQL Server using Python

In [7]:
pip install pandas sqlalchemy pyodbc matplotlib seaborn

Note: you may need to restart the kernel to use updated packages.





In [9]:
import pandas as pd
from sqlalchemy import create_engine, event

In [3]:
# Step 1: Connect to SQL Server
db_engine = create_engine(
    "mssql+pyodbc://NARENDRA\\SQLEXPRESS/sql_to_python?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)
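
The connection string above packs the server, database, and driver options into a single URL. As an alternative sketch, SQLAlchemy's `URL.create` (available in SQLAlchemy 1.4 and later) assembles the same pieces from named arguments, so spaces in the driver name do not have to be hand-encoded as `+`. The host, database, and driver values are copied from the cell above. The variable name `connection_url` is an assumption, and nothing here opens a connection.

```python
from sqlalchemy.engine import URL

# Build the connection URL from named parts instead of one long string.
connection_url = URL.create(
    "mssql+pyodbc",
    host="NARENDRA\\SQLEXPRESS",  # server\instance, as in the cell above
    database="sql_to_python",
    query={
        "driver": "ODBC Driver 17 for SQL Server",
        "trusted_connection": "yes",
    },
)

# create_engine(connection_url) accepts this object in place of the string form.
print(connection_url.drivername, connection_url.database)
```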

In [4]:
# Step 2: Build full join query
query = """
SELECT
    c.customer_id,
    c.customer_city,
    c.customer_state,
    o.order_id,
    o.order_status,
    o.order_purchase_timestamp,
    pmt.payment_type,
    pmt.payment_value,
    oi.product_id,
    oi.price AS item_price,
    pr.[product category],
    s.seller_id,
    s.seller_city,
    g.geolocation_lat,
    g.geolocation_lng
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN payments pmt ON o.order_id = pmt.order_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products pr ON oi.product_id = pr.product_id
JOIN sellers s ON oi.seller_id = s.seller_id
LEFT JOIN geolocation g ON c.customer_zip_code_prefix = g.geolocation_zip_code_prefix
"""

In [5]:
# Step 3: Load result into a DataFrame
df = pd.read_sql(query, db_engine)

In [6]:
# Step 4: Preview the result before saving it or pushing it back to SQL
df.head()

|   | customer_id | customer_city | customer_state | order_id | order_status | order_purchase_timestamp | payment_type | payment_value | product_id | item_price | product category | seller_id | seller_city | geolocation_lat | geolocation_lng |
| - | ----------- | ------------- | -------------- | -------- | ------------ | ------------------------ | ------------ | ------------- | ---------- | ---------- | ---------------- | --------- | ----------- | --------------- | --------------- |
| 0 | 3ce436f183e68e07877b285a838db11a | campos dos goytacazes | RJ | 00010242fe8c5a6d1ba2dd792cb16214 | delivered | 2017-09-13 08:59:02 | credit_card | 72.19 | 4244733e06e7ecb4970a6e2683c13e61 | 58.9 | Cool Stuff | 48436dade18ac8b2bce089ec2a041202 | volta redonda | -21.758076 | -41.312633 |
| 1 | 3ce436f183e68e07877b285a838db11a | campos dos goytacazes | RJ | 00010242fe8c5a6d1ba2dd792cb16214 | delivered | 2017-09-13 08:59:02 | credit_card | 72.19 | 4244733e06e7ecb4970a6e2683c13e61 | 58.9 | Cool Stuff | 48436dade18ac8b2bce089ec2a041202 | volta redonda | -21.758843 | -41.306754 |
| 2 | 3ce436f183e68e07877b285a838db11a | campos dos goytacazes | RJ | 00010242fe8c5a6d1ba2dd792cb16214 | delivered | 2017-09-13 08:59:02 | credit_card | 72.19 | 4244733e06e7ecb4970a6e2683c13e61 | 58.9 | Cool Stuff | 48436dade18ac8b2bce089ec2a041202 | volta redonda | -21.767046 | -41.311328 |
| 3 | 3ce436f183e68e07877b285a838db11a | campos dos goytacazes | RJ | 00010242fe8c5a6d1ba2dd792cb16214 | delivered | 2017-09-13 08:59:02 | credit_card | 72.19 | 4244733e06e7ecb4970a6e2683c13e61 | 58.9 | Cool Stuff | 48436dade18ac8b2bce089ec2a041202 | volta redonda | -21.771661 | -41.312119 |
| 4 | 3ce436f183e68e07877b285a838db11a | campos dos goytacazes | RJ | 00010242fe8c5a6d1ba2dd792cb16214 | delivered | 2017-09-13 08:59:02 | credit_card | 72.19 | 4244733e06e7ecb4970a6e2683c13e61 | 58.9 | Cool Stuff | 48436dade18ac8b2bce089ec2a041202 | volta redonda | -21.763006 | -41.306182 |
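
The preview shows the same order line five times, differing only in `geolocation_lat` and `geolocation_lng`, which suggests that `geolocation` holds several rows per zip code prefix and that the join multiplies rows. One possible way to avoid this, sketched here on made-up data, is to collapse `geolocation` to one row per prefix before joining. Averaging the coordinates is an assumption for illustration; taking the first row per prefix would work as well.

```python
import pandas as pd

# Toy data: three geolocation rows share one zip prefix (values are invented).
geolocation = pd.DataFrame({
    "geolocation_zip_code_prefix": [28013, 28013, 28013, 1001],
    "geolocation_lat": [-21.75, -21.76, -21.77, -23.55],
    "geolocation_lng": [-41.31, -41.30, -41.32, -46.63],
})
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "customer_zip_code_prefix": [28013, 1001],
})

join_keys = dict(
    left_on="customer_zip_code_prefix",
    right_on="geolocation_zip_code_prefix",
    how="left",
)

# Joining directly repeats customer c1 once per matching geolocation row.
fanned_out = customers.merge(geolocation, **join_keys)

# Collapse to one row per zip prefix (mean coordinates), then join.
geo_unique = geolocation.groupby(
    "geolocation_zip_code_prefix", as_index=False
)[["geolocation_lat", "geolocation_lng"]].mean()
one_per_customer = customers.merge(geo_unique, **join_keys)

print(len(fanned_out), len(one_per_customer))
```

The same idea can be expressed in SQL by joining against a grouped subquery instead of the raw `geolocation` table.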


In [None]:
# Save to new table if desired
df.to_sql("combined_data", db_engine, if_exists="replace", index=False)
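
The `if_exists` argument decides what happens when the target table already exists: `"replace"` drops and recreates it, `"append"` adds rows to it, and the default `"fail"` raises an error. The sketch below illustrates the first two on an in-memory SQLite database rather than SQL Server, so it runs without a server. The sample rows are made up, and the SQLite connection is an assumption chosen only to keep the example self-contained.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for SQL Server in this illustration.
conn = sqlite3.connect(":memory:")
sample = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "payment_value": [72.19, 10.00],
})

def row_count(connection):
    """Return the number of rows currently in combined_data."""
    result = pd.read_sql("SELECT COUNT(*) AS n FROM combined_data", connection)
    return int(result["n"].iloc[0])

# Writing twice with "replace" leaves only the latest copy of the data.
sample.to_sql("combined_data", conn, if_exists="replace", index=False)
sample.to_sql("combined_data", conn, if_exists="replace", index=False)
rows_after_replace = row_count(conn)

# "append" keeps the existing rows and adds the new ones.
sample.to_sql("combined_data", conn, if_exists="append", index=False)
rows_after_append = row_count(conn)

print(rows_after_replace, rows_after_append)
```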

In [10]:
from sqlalchemy import create_engine, event

# Step 1: Create the engine with fast_executemany enabled
# (the mssql+pyodbc dialect accepts this flag since SQLAlchemy 1.3)
conn_str = "mssql+pyodbc://NARENDRA\\SQLEXPRESS/sql_to_python?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
engine = create_engine(conn_str, fast_executemany=True)

# Step 2: Fallback for older SQLAlchemy versions without the engine flag:
# set fast_executemany on the pyodbc cursor before each bulk execute.
# With the flag above, this hook is redundant but harmless.
@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    if executemany:
        cursor.fast_executemany = True

# ✅ Step 3: Push the existing DataFrame to SQL in chunks of 10,000 rows
df.to_sql("combined_data_fast", engine, if_exists="replace", index=False, chunksize=10000)

print("✅ Data pushed to SQL successfully using fast mode.")

✅ Data pushed to SQL successfully using fast mode.
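
The CSV export named in the goals follows the same pattern with `to_csv()`. The sketch below is a minimal illustration on a small stand-in DataFrame; the file name `combined_data.csv` and the temporary directory are assumptions, and in the notebook the call would be made on `df` itself. Passing `index=False` keeps the row index out of the file, which avoids an extra `Unnamed: 0` column when the file is read back.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the joined DataFrame (two columns from the preview).
sample = pd.DataFrame({
    "customer_city": ["campos dos goytacazes", "volta redonda"],
    "payment_value": [72.19, 58.90],
})

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), "combined_data.csv")

# index=False leaves the DataFrame index out of the file.
sample.to_csv(out_path, index=False)

# Read the file back to confirm the columns and row count survived.
round_trip = pd.read_csv(out_path)
print(list(round_trip.columns), len(round_trip))
```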
