# **📌 Data Loading: Inserting Cleaned Data into AWS MySQL RDS**
This notebook takes the **cleaned dataset** and inserts it into a **MySQL database** hosted on AWS RDS.  

## **🔹 Steps Overview**
1. **Establish a database connection** using `pymysql`.  
2. **Insert data** into the following tables:
   - `Item` → Stores item metadata (ID, description, type, unit, location).  
   - `Inventory` → Stores inventory details (stock, cost, purchase date, vendor).  
   - `Sales` → Stores sales transactions (item, price, customer, quantity sold).  


### **🔹 Database Connection**
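The script connects to the **AWS RDS instance** using credentials imported from a separate config file (`config.py`), which keeps them out of the notebook itself. If the connection fails, the cell raises a `RuntimeError` that includes the underlying error message, and execution stops.  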
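
The contents of `config.py` are not shown in this notebook. As a minimal sketch, assuming the credentials are supplied through environment variables (the environment variable names and the `load_rds_settings` helper are hypothetical), such a module might be built around something like this:

```python
import os

# The four names the notebook imports from config.py. Reading them from
# environment variables of the same name is an assumption of this sketch.
REQUIRED_SETTINGS = ("RDS_HOST", "RDS_USER", "RDS_PASSWORD", "RDS_DATABASE")


def load_rds_settings(environ=None):
    """Return the four RDS settings as a dict, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing RDS settings: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

Inside `config.py`, each module-level name could then be assigned from the returned dict (for example `RDS_HOST = load_rds_settings()["RDS_HOST"]`), so the existing `from config import ...` line would keep working without a password being stored next to the notebook.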

### **🔹 Data Insertion Strategy**
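- **`ON DUPLICATE KEY UPDATE`** ensures existing records in `Item` are updated rather than duplicated. `Inventory` and `Sales` use plain `INSERT` statements, so re-running the notebook appends their rows again.  
- **Looping over DataFrame rows** inserts one record per row into `Item`, `Inventory`, and `Sales`. All inserts run in a single transaction (`autocommit=False`): any error triggers a rollback, and the changes are committed only after all three tables have loaded.  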
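
The `Item` upsert can also be written so that each value is passed only once, by referring to the inserted row with `VALUES(column)` in the `UPDATE` clause, and sent as one batch with `cursor.executemany`. The following is a sketch rather than the notebook's own method: `upsert_items` and `ITEM_COLUMNS` are hypothetical names, the rows are assumed to be plain dicts (for example from `df_cleaned.to_dict("records")`), and `item_id` is assumed to be the primary or unique key of `Item`. MySQL 8.0.20 and later deprecate `VALUES()` in this position in favor of row aliases, although it still works.

```python
ITEM_COLUMNS = ("item_id", "description", "item_type", "unit", "location")

# Five placeholders only: the UPDATE clause reuses the inserted values
# through VALUES(col) instead of binding each of them a second time.
ITEM_UPSERT_SQL = (
    "INSERT INTO Item (item_id, description, item_type, unit, location) "
    "VALUES (%s, %s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE "
    "description=VALUES(description), item_type=VALUES(item_type), "
    "unit=VALUES(unit), location=VALUES(location)"
)


def upsert_items(cursor, rows):
    """Upsert one Item record per distinct item_id; return how many were sent."""
    # Keep the last occurrence of each item_id so that several transactions
    # for the same item do not produce redundant statements.
    latest = {row["item_id"]: row for row in rows}
    params = [tuple(row[col] for col in ITEM_COLUMNS) for row in latest.values()]
    if params:
        cursor.executemany(ITEM_UPSERT_SQL, params)
    return len(params)
```

With the notebook's objects, a call might look like `upsert_items(mycursor, df_cleaned.to_dict("records"))`, followed by `mydb.commit()` once all tables have loaded.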


In [41]:
import pandas as pd
import pymysql

# Restore cleaned dataframe
%store -r df_cleaned  

# Ensure df_cleaned is loaded
if 'df_cleaned' not in locals():
    raise ValueError("DataFrame 'df_cleaned' not found. Ensure data_cleaning notebook was executed.")

# Import RDS credentials
from config import RDS_HOST, RDS_USER, RDS_PASSWORD, RDS_DATABASE

# ✅ Establish Database Connection
try:
    mydb = pymysql.connect(
        host=RDS_HOST,
        user=RDS_USER,
        password=RDS_PASSWORD,
        database=RDS_DATABASE,
        cursorclass=pymysql.cursors.DictCursor,
        autocommit=False  # Enable transactions
    )
    mycursor = mydb.cursor()  # Create cursor
    print("✅ Database connection established!")
except pymysql.MySQLError as e:
    # Chain the original exception so its traceback is preserved
    raise RuntimeError(f"❌ Database connection failed: {e}") from e

# ✅ Insert Data into Item Table
# The statement is the same for every row, so build it once before the loop
item_sql = """INSERT INTO Item (item_id, description, item_type, unit, location)
              VALUES (%s, %s, %s, %s, %s)
              ON DUPLICATE KEY UPDATE description=%s, item_type=%s, unit=%s, location=%s"""

for _, row in df_cleaned.iterrows():
    
    item_values = (
        row['item_id'], row['description'], row['item_type'],
        row['unit'], row['location'], row['description'], 
        row['item_type'], row['unit'], row['location']
    )

    try:
        mycursor.execute(item_sql, item_values)
    except pymysql.MySQLError as e:
        print(f"❌ Error inserting/updating Item: {e}. Values: {item_values}")
        mydb.rollback()
        raise

# ✅ Insert Data into Inventory Table
# The statement is the same for every row, so build it once before the loop
inventory_sql = """INSERT INTO Inventory (item_id, quantity_on_hand, cost, purchase_date, vendor)
                   VALUES (%s, %s, %s, %s, %s)"""

for _, row in df_cleaned.iterrows():
    
    inventory_values = (row['item_id'], row['quantity_on_hand'], row['cost'], row['purchase_date'], row['vendor'])

    try:
        mycursor.execute(inventory_sql, inventory_values)
    except pymysql.MySQLError as e:
        print(f"❌ Error inserting into Inventory: {e}. Values: {inventory_values}")
        mydb.rollback()
        raise

# ✅ Insert Data into Sales Table
# The statement is the same for every row, so build it once before the loop
sales_sql = """INSERT INTO Sales (item_id, price, date_sold, cust, quantity_sold)
               VALUES (%s, %s, %s, %s, %s)"""

for _, row in df_cleaned.iterrows():
    
    sales_values = (row['item_id'], row['price'], row['date_sold'], row['cust'], row['quantity_sold'])

    try:
        mycursor.execute(sales_sql, sales_values)
    except pymysql.MySQLError as e:
        print(f"❌ Error inserting into Sales: {e}. Values: {sales_values}")
        mydb.rollback()
        raise

# ✅ Commit the changes to save data
mydb.commit()
print("🎉 Data loaded successfully!")

# ✅ Close connection
if mydb.open:
    mycursor.close()
    mydb.close()
    print("🔌 Database connection closed.")


✅ Database connection established!
🎉 Data loaded successfully!
🔌 Database connection closed.


## **🏁 Finalizing Data Loading Process**

The data has been loaded into the MySQL RDS database, which completes the load step of the **ETL (Extract, Transform, Load) process**. The cleaned dataset now resides in the `Item`, `Inventory`, and `Sales` tables and is ready for further analysis.  
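
One way to confirm the load is to count the rows in each table before the connection is closed. This is a sketch and not part of the original notebook: `count_rows` is a hypothetical helper, and it assumes a cursor that returns dict rows, as the `DictCursor` configured in the connection does.

```python
def count_rows(cursor, tables=("Item", "Inventory", "Sales")):
    """Return {table: row count} for a cursor that yields dict rows."""
    counts = {}
    for table in tables:
        # Table names cannot be bound as query parameters, so only fixed,
        # trusted names such as the defaults should be passed in here.
        cursor.execute(f"SELECT COUNT(*) AS n FROM {table}")
        counts[table] = cursor.fetchone()["n"]
    return counts
```

Called as `count_rows(mycursor)` after `mydb.commit()`, the `Inventory` and `Sales` counts should grow by `len(df_cleaned)` on each run, while `Item` grows only by the number of previously unseen `item_id` values.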

