# ETL (LOADING) PROCESS
In this section, we load the transformed data into its final storage location. The transformed data is already available as CSV files in the `Transformed` directory, so we write it to the `loaded` directory.

Here we save the data as `Parquet` files. Parquet is a compressed, columnar format that preserves column types, so the files are typically smaller than the equivalent CSV and faster to read for analytical queries.
We write the files with the `pandas` library, which relies on a Parquet engine such as `pyarrow`, so install `pyarrow` first:

```bash
pip install pyarrow
```

```python
# Import necessary libraries
import pandas as pd
import os

# Ensure output directory exists
os.makedirs("loaded", exist_ok=True)

# Load CSVs
full_data = pd.read_csv("Transformed/transformed_full.csv")
incremental_data = pd.read_csv("Transformed/transformed_incremental.csv")

# Save as Parquet
full_data.to_parquet("loaded/full_data.parquet", index=False)
incremental_data.to_parquet("loaded/incremental_data.parquet", index=False)

# Preview parquet file for full data
preview_full = pd.read_parquet("loaded/full_data.parquet")
preview_full.head()
```

| | order_id | customer_name | product | quantity | unit_price | order_date | region | total_spend | customer_tier |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Diana | Tablet | 2.0 | 500.0 | 2024-01-20 00:00:00 | South | 1000.0 | Gold |
| 1 | 2 | Eve | Laptop | 2.0 | 500.0 | 2024-04-29 00:00:00 | North | 1000.0 | Gold |
| 2 | 3 | Charlie | Laptop | 2.0 | 250.0 | 2024-01-08 00:00:00 | Unknown | 500.0 | Silver |
| 3 | 4 | Eve | Laptop | 2.0 | 750.0 | 2024-01-07 00:00:00 | West | 1500.0 | Platinum |
| 4 | 5 | Eve | Tablet | 3.0 | 500.0 | 2024-03-07 00:00:00 | South | 1500.0 | Platinum |


```python
# Preview parquet file for incremental data
preview_incremental = pd.read_parquet("loaded/incremental_data.parquet")
preview_incremental.head()
```

| | order_id | quantity | unit_price | order_date | prod_Laptop | prod_Tablet | reg_Central | reg_North | total_spend |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 101 | 1.5 | 900.0 | 2024-05-09 00:00:00 | True | False | True | False | 1350.0 |
| 1 | 102 | 1.0 | 300.0 | 2024-05-07 00:00:00 | True | False | True | False | 300.0 |
| 2 | 103 | 1.0 | 600.0 | 2024-05-04 00:00:00 | True | False | True | False | 600.0 |
| 3 | 104 | 1.5 | 300.0 | 2024-05-26 00:00:00 | False | True | True | False | 450.0 |
| 4 | 105 | 2.0 | 600.0 | 2024-05-21 00:00:00 | False | True | False | True | 1200.0 |
