# ETL Process with Clustering
This notebook demonstrates an ETL (Extract, Transform, Load) process followed by clustering analysis using KMeans.

## Step 1: Import Required Libraries
We start by importing the necessary libraries for data manipulation, database interaction, and clustering.

In [None]:
import pandas as pd
import sqlalchemy
from sklearn import cluster

## Step 2: Load SQL Query
Read the SQL query from an external file. This query extracts seller revenue and sales data from the database.

In [None]:
# Open the SQL file and read its content
with open("../sql/etl.sql") as open_file:
    query = open_file.read()

# Print the query to verify its content
print(query)

## Step 3: Extract Data from Database
Use the SQL query to extract data from the SQLite database into a Pandas DataFrame.

In [None]:
# Create a connection to the SQLite database
engine = sqlalchemy.create_engine("sqlite:///../data/olist.db")

# Execute the SQL query and load the result into a DataFrame
df = pd.read_sql_query(query, con=engine)

# Display the first few rows of the DataFrame
df.head()
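
Before clustering, it can help to confirm that the query returned the expected columns and that they contain no missing values, since KMeans does not accept NaN inputs. The following is a minimal sketch on a hypothetical stand-in DataFrame: the `seller_id` column and all values are assumptions for illustration, and the real schema is whatever `etl.sql` returns.

```python
import pandas as pd

# Hypothetical stand-in for the query result; the real columns come from etl.sql.
sample = pd.DataFrame(
    {
        "seller_id": ["a", "b", "c"],  # assumed identifier column
        "total_revenue": [1200.0, None, 310.5],
        "qt_salles": [10, 4, 2],
    }
)

features = ["total_revenue", "qt_salles"]

# Fail early if the query did not return the columns used for clustering.
missing = [col for col in features if col not in sample.columns]
if missing:
    raise KeyError(f"Missing expected columns: {missing}")

# Count missing values per feature, then keep only complete rows.
null_counts = sample[features].isna().sum()
clean = sample.dropna(subset=features)

print(null_counts)
print(f"{len(clean)} of {len(sample)} rows are complete")
```

Dropping incomplete rows is only one option; whether to drop or impute depends on why the values are missing.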

## Step 4: Perform Clustering
Apply KMeans clustering to group sellers based on their total revenue and number of sales.

In [None]:
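# Initialize the KMeans model with 4 clusters.
# random_state (an arbitrary seed) makes the labels reproducible across runs;
# n_init is set explicitly so the behavior does not depend on the scikit-learn version.
kmean = cluster.KMeans(n_clusters=4, n_init=10, random_state=42)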

# Fit the model using 'total_revenue' and 'qt_salles' columns
kmean.fit(df[["total_revenue", "qt_salles"]])
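
KMeans relies on Euclidean distance, so a feature with a much larger numeric range (here, likely `total_revenue`) can dominate the result. One common remedy, not used in the cell above, is to standardize the features first. The sketch below illustrates this on synthetic data with two clusters; the values, the cluster count, and the seed are assumptions for illustration.

```python
import pandas as pd
from sklearn import cluster, preprocessing

# Synthetic stand-in data; the values are illustrative only.
sample = pd.DataFrame(
    {
        "total_revenue": [100.0, 120.0, 90.0, 5000.0, 5200.0, 4800.0],
        "qt_salles": [2, 3, 1, 40, 45, 38],
    }
)

features = ["total_revenue", "qt_salles"]

# Standardize each feature to zero mean and unit variance so that
# both contribute comparably to the distance computation.
scaler = preprocessing.StandardScaler()
scaled = scaler.fit_transform(sample[features])

# Fit KMeans on the scaled features and get one label per row.
model = cluster.KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(scaled)

print(labels)
```

Whether scaling improves the grouping depends on the data; comparing cluster summaries with and without it is a reasonable check.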

### Add Cluster Labels to DataFrame
Assign the cluster labels generated by KMeans to a new column in the DataFrame.

In [None]:
# Add the cluster labels to the DataFrame
df["cluster"] = kmean.labels_

# Display the updated DataFrame
df
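
Cluster labels are arbitrary integers, so a per-cluster summary is usually needed to interpret them. The following sketch, using a small hypothetical DataFrame with the same column names, counts the sellers in each cluster and averages their features.

```python
import pandas as pd

# Hypothetical labeled data with the same column names as the notebook.
sample = pd.DataFrame(
    {
        "total_revenue": [100.0, 120.0, 5000.0, 5200.0],
        "qt_salles": [2, 4, 40, 44],
        "cluster": [0, 0, 1, 1],
    }
)

# One row per cluster: seller count (non-null revenue values) and feature means.
summary = sample.groupby("cluster").agg(
    sellers=("total_revenue", "count"),
    mean_revenue=("total_revenue", "mean"),
    mean_sales=("qt_salles", "mean"),
)

print(summary)
```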

## Step 5: Save Results to Database
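Write the clustered data to the `sellers_cluster` table for further analysis or reporting. Because `if_exists="replace"` is used, any existing table with that name is dropped and recreated.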

In [None]:
# Save the DataFrame to a new table in the database
df.to_sql("sellers_cluster", con=engine,
          index=False,
          if_exists="replace")
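
A quick way to confirm the write succeeded is to read the table back and compare row counts. The sketch below uses an in-memory SQLite database through the standard-library `sqlite3` module so that it does not touch `olist.db`; the sample rows and the `seller_id` column are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical clustered result; the real DataFrame comes from the steps above.
sample = pd.DataFrame({"seller_id": ["a", "b", "c"], "cluster": [0, 1, 0]})

# In-memory database used only for this sketch.
conn = sqlite3.connect(":memory:")
sample.to_sql("sellers_cluster", con=conn, index=False, if_exists="replace")

# Read the row count back and compare it with the DataFrame length.
row_count = pd.read_sql_query(
    "SELECT COUNT(*) AS n FROM sellers_cluster", con=conn
)["n"].iloc[0]

# Count rows per cluster directly in SQL.
per_cluster = pd.read_sql_query(
    "SELECT cluster, COUNT(*) AS n FROM sellers_cluster "
    "GROUP BY cluster ORDER BY cluster",
    con=conn,
)
conn.close()

print(row_count == len(sample))
print(per_cluster)
```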

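## Conclusion
The ETL process is complete: seller data was extracted from the database, sellers were grouped into four clusters based on total revenue and number of sales, and the labeled results were saved to the `sellers_cluster` table.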