# Assignment 7.2: Clustering Sales Data Using K-Means
# Ziyad Salem
## Objective
The objective of this assignment is to apply K-Means clustering to a weekly sales transaction dataset to identify groups of products with similar sales behavior. Because no product metadata is available, clustering relies solely on numerical sales patterns.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


In [None]:
df = pd.read_csv("Sales_Transactions_Dataset_Weekly.csv")
df.head()


## Initial Data Inspection
The dataset contains weekly sales figures for roughly 800 products over one year, with one row per product. No categorical or descriptive product information is available, so unsupervised clustering is a natural fit.



In [None]:
# A bare expression is echoed only when it is the last line of a cell, so print the shape
print(df.shape)
df.info()
df.describe()


In [None]:
sales_data = df.drop(columns=['Product_Code'], errors='ignore')


The `Product_Code` identifier column is dropped because it labels a product rather than measuring its sales, and clustering should be based only on sales behavior.
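
Dropping only `Product_Code` keeps every other column as a clustering feature. If the CSV also carries derived columns (for example per-product minimum, maximum, or pre-normalized values), those would be clustered alongside the raw weekly counts. The sketch below shows one way to keep only the raw weekly columns. It uses a small made-up frame, and both the `W<number>` naming pattern and the extra column names are assumptions to verify against `df.columns`.

```python
import pandas as pd

# Hypothetical miniature frame; all column names here are assumptions for illustration
demo = pd.DataFrame({
    "Product_Code": ["P1", "P2", "P3"],
    "W0": [11, 7, 0],
    "W1": [12, 6, 1],
    "MIN": [11, 6, 0],
    "MAX": [12, 7, 1],
    "Normalized 0": [0.0, 1.0, 0.0],
})

# Keep only columns whose entire name is "W" followed by digits
weekly_only = demo.filter(regex=r"^W\d+$")
print(list(weekly_only.columns))
```

If the real file follows the same pattern, `df.filter(regex=r"^W\d+$")` could be used in place of the `drop` call above.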


In [None]:
scaler = StandardScaler()
sales_scaled = scaler.fit_transform(sales_data)


`StandardScaler` rescales each weekly column to zero mean and unit variance, so that no single week dominates the Euclidean distance calculations K-Means relies on.
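
To make the effect of standardization concrete, the following minimal sketch compares `StandardScaler` with the same calculation done by hand. The small array is made up for illustration and is not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up sales for 4 products over 3 weeks (rows = products, columns = weeks)
X = np.array([
    [10.0, 200.0, 5.0],
    [12.0, 180.0, 7.0],
    [8.0, 220.0, 6.0],
    [14.0, 240.0, 2.0],
])

scaled = StandardScaler().fit_transform(X)

# Same result by hand: subtract each column's mean, divide by its population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
```

Without this step, the second column (values near 200) would contribute far more to squared distances than the other two.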


In [None]:
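inertia = []

# Fit K-Means for k = 1..10 and record the within-cluster sum of squares (inertia)
for k in range(1, 11):
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(sales_scaled)
    inertia.append(kmeans.inertia_)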


In [None]:
plt.figure()
plt.plot(range(1, 11), inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.show()
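
Reading the elbow off an inertia curve can be subjective, so a second criterion is often useful. The sketch below computes the silhouette score for several values of k. It is a complementary check rather than part of the original analysis, and it runs on synthetic blobs (an assumption standing in for `sales_scaled`) so that it is self-contained.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for sales_scaled: three well-separated groups
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

scores = {}
for k in range(2, 7):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with highest silhouette:", best_k)
```

On the real data, replacing `X` with `sales_scaled` would give a score per k to compare against the elbow plot. Higher values indicate tighter, better-separated clusters.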


In [None]:
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
clusters = kmeans.fit_predict(sales_scaled)


In [None]:
df['Cluster'] = clusters
df.head()


In [None]:
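# numeric_only=True skips the string Product_Code column, which cannot be averaged
cluster_summary = df.groupby('Cluster').mean(numeric_only=True)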
cluster_summary


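Each cluster groups products whose weekly sales profiles are close to one another in the standardized feature space. Comparing the per-cluster averages in `cluster_summary` shows how the groups differ, and differences in average sales levels indicate distinct demand behaviors.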
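
Cluster centroids are expressed in standardized units, which are hard to interpret directly. The sketch below shows one way to map centroids back to units sold using `inverse_transform`. The data are synthetic (two made-up volume tiers), and the `demo_` names are hypothetical, chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 20 low-volume and 20 high-volume products over 6 weeks
low = rng.poisson(lam=3, size=(20, 6))
high = rng.poisson(lam=30, size=(20, 6))
demo_sales = np.vstack([low, high]).astype(float)

demo_scaler = StandardScaler()
demo_scaled = demo_scaler.fit_transform(demo_sales)

demo_kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(demo_scaled)

# Centroids live in standardized space; map them back to units sold per week
centroids_units = demo_scaler.inverse_transform(demo_kmeans.cluster_centers_)

for label, centroid in enumerate(centroids_units):
    size = int((demo_kmeans.labels_ == label).sum())
    print(f"cluster {label}: {size} products, mean weekly sales {centroid.mean():.1f}")
```

Applied to the notebook's objects, the equivalent call would be `scaler.inverse_transform(kmeans.cluster_centers_)`, which gives one row of typical weekly sales per cluster.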


## Conclusion
K-Means clustering grouped the products by weekly sales behavior without requiring labeled data. The resulting clusters can help a business understand demand patterns, plan inventory, and design targeted sales strategies.
