# Data Integration & Preprocessing for Customer Satisfaction Analysis

This notebook is part of a larger project exploring customer satisfaction in Brazilian e-commerce using the [Olist dataset](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce).  
It builds on the cleaned datasets prepared in the previous notebook by joining them into a single analytical table, performing additional data cleaning, and engineering features to support the upcoming analysis.


**Goals of this notebook:**

- Join the cleaned tables into a consolidated dataset at the order level  
- Clean and standardize the merged dataset (e.g., data types, missing values)  
- Create new features to capture relevant aspects of orders, products, payments, etc.
- Export the final dataset for analysis

**This notebook is preceded and followed by:**

- [Data Cleaning Notebook](./01_data_cleaning.ipynb): loads and prepares the individual raw datasets for integration  
- [Exploratory Analysis Notebook](./03_customer-satisfaction-analysis.ipynb): investigates customer satisfaction patterns and key influencing factors

---

In [None]:
import duckdb

## **Data Integration**

### Load Cleaned CSV Files into DuckDB Tables

In [None]:
# Load Cleaned CSV Files into DuckDB Tables

# Path to the folder containing cleaned CSV files
data_path = "../data/cleaned"

# Connect to an in-memory DuckDB database
con = duckdb.connect(database=":memory:")

# Load each cleaned dataset into its own DuckDB table.
# Each table is named after its CSV file (e.g. customers.csv -> customers).
table_names = [
    "customers",
    "orders",
    "items",
    "payments",
    "reviews",
    "products",
    "sellers",
]

for table_name in table_names:
    con.execute(f"""
        CREATE TABLE {table_name} AS
        SELECT * FROM read_csv_auto('{data_path}/{table_name}.csv');
    """)
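
Before joining, it can help to confirm that every table was actually populated. The following is a minimal sketch of such a check, not part of the original notebook: the helper name `table_row_counts` is hypothetical, and it only assumes that the connection object supports `con.execute(sql).fetchone()`, as the DuckDB connection created above does.

```python
def table_row_counts(con, table_names):
    """Return {table_name: row_count}; raise ValueError if any table is empty."""
    counts = {}
    for name in table_names:
        # Identifiers cannot be bound as query parameters, so the table name
        # is formatted into the SQL string. This is acceptable only because
        # the names come from a fixed list defined in the notebook.
        row = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        counts[name] = row[0]

    # An empty table usually points to a wrong path or a failed load.
    empty = [name for name, n in counts.items() if n == 0]
    if empty:
        raise ValueError(f"Tables loaded with zero rows: {empty}")
    return counts


# Hypothetical usage with the connection created above:
# table_row_counts(
#     con,
#     ["customers", "orders", "items", "payments",
#      "reviews", "products", "sellers"],
# )
```

Raising on empty tables makes a wrong `data_path` or a missing file fail here, rather than later as an unexpectedly empty join result.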