# **Edagotti Naresh**

**mail:naresh21375019@gmail.com**

# **TASK--1**

# **Clustering/Pattern Mining**

**Task:** **Use any unsupervised technique to extract patterns or segregate data into groups.**

**User Story:** The user should be able to provide a data point (a row), and the program should identify
which group that data point belongs to and why.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder

In [None]:
data=pd.read_excel('/content/Copy of Online Retail.xlsx')

In [None]:
data.head()

In [None]:
data.isnull().sum()

In [None]:
# Select the columns for clustering
columns_for_clustering = ['Quantity', 'UnitPrice']
data_for_clustering = data[columns_for_clustering]

# Initialize a list to store the inertia values
inertia = []

# Perform the elbow method for different values of K
for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(data_for_clustering)
    inertia.append(kmeans.inertia_)

# Plot the inertia values
plt.plot(range(1, 11), inertia, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()

In [None]:
inertia

# **How we selected K-value**
**The inertia values suggest that 3 or 4 clusters is a reasonable choice: inertia drops sharply when moving from a single cluster up to 3 or 4 clusters.**
**Beyond 4 clusters, each additional cluster reduces inertia only slightly, which indicates diminishing returns in cluster quality. K = 4 is used in the next step.**
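
The "diminishing returns" reading of the elbow plot can also be checked numerically. The sketch below is a minimal illustration on synthetic data (the `make_blobs` points stand in for the `Quantity` and `UnitPrice` columns, which is an assumption for the example only). It prints the percentage drop in inertia for each additional cluster; the elbow is roughly where these drops become small.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the two clustering columns (assumption, not the retail data).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Same elbow loop as in the notebook, with n_init set explicitly.
inertia = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X)
    inertia.append(km.inertia_)

inertia = np.array(inertia)

# Percentage drop in inertia when moving from k-1 to k clusters.
pct_drop = 100 * (inertia[:-1] - inertia[1:]) / inertia[:-1]
for k, drop in zip(range(2, 11), pct_drop):
    print(f"k={k}: inertia falls by {drop:.1f}%")
```

On real data the drops rarely show a perfectly clean break, so this is best read alongside the plot rather than as a replacement for it.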

In [None]:
# Select the columns for clustering
columns_for_clustering = ['Quantity', 'UnitPrice']
data_for_clustering = data[columns_for_clustering]

# Perform K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=42)  # Replace 4 with your desired number of clusters (K)
data['ClusterLabel'] = kmeans.fit_predict(data_for_clustering)

# Visualize the clustering results (scatter plot)
plt.figure(figsize=(8, 6))
plt.scatter(data['Quantity'], data['UnitPrice'], c=data['ClusterLabel'], cmap='viridis')
plt.xlabel('Quantity')
plt.ylabel('UnitPrice')
plt.title('Clustering Results')
plt.show()

In [None]:
data['ClusterLabel'].value_counts()

In [None]:
# Inspect the rows assigned to cluster label 2
cluster_2_data = data[data['ClusterLabel'] == 2]
cluster_2_data

In [None]:
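# Build the new point with the same column names used for fitting
new_data_point = pd.DataFrame([[80000, 1.5]], columns=columns_for_clustering)  # Replace with the values of your new data point
cluster_label = kmeans.predict(new_data_point)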

In [None]:
cluster_label

# **Conclusion**
**The data point [80000, 1.5] (Quantity = 80000, UnitPrice = 1.5) is assigned to cluster 2 by the K-Means model. K-Means assigns a point to the cluster whose centroid is nearest in Euclidean distance, so this point lies closer to the centroid of cluster 2 than to any other centroid in the ('Quantity', 'UnitPrice') feature space. Because the features are not scaled, the feature with the larger numeric values tends to dominate the distance; for this point the very large Quantity is likely the deciding factor.**
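
To make the "why" part of the user story concrete, the distance from a point to every centroid can be reported next to the assigned label. The sketch below uses synthetic data and a hypothetical helper named `explain_assignment` (neither is part of the original notebook); it only illustrates the nearest-centroid rule that K-Means prediction follows.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for (Quantity, UnitPrice) rows: three well-separated groups.
X = np.vstack([
    rng.normal(loc=[5, 2], scale=0.5, size=(50, 2)),
    rng.normal(loc=[50, 1], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 20], scale=0.5, size=(50, 2)),
])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)


def explain_assignment(model, point):
    """Hypothetical helper: return the nearest cluster and all centroid distances."""
    point = np.asarray(point, dtype=float).reshape(1, -1)
    # Euclidean distance from the point to each cluster centroid.
    distances = np.linalg.norm(model.cluster_centers_ - point, axis=1)
    label = int(np.argmin(distances))
    return label, distances


label, distances = explain_assignment(kmeans, [48, 1.2])
for i, (centroid, dist) in enumerate(zip(kmeans.cluster_centers_, distances)):
    marker = "  <- assigned" if i == label else ""
    print(f"cluster {i}: centroid={np.round(centroid, 2)}, distance={dist:.2f}{marker}")
```

The same helper could be pointed at the notebook's fitted `kmeans` object and the point `[80000, 1.5]` to show how much closer it is to its assigned centroid than to the others.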

# **TASK--2**

# **Prediction**
# **Task:**
**User Story:** The user should be able to provide a Customer ID and a date, and the program should
predict the quantity.


In [None]:
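# Fill missing CustomerID values with the column mean.
# Note: CustomerID is an identifier, so the mean is only a placeholder value.
mean_customer_id = data['CustomerID'].mean()
data['CustomerID'] = data['CustomerID'].fillna(mean_customer_id)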

In [None]:
data.isnull().sum()

In [None]:
# Ensure 'InvoiceDate' is in datetime format
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])

# Feature engineering (extract relevant features)
data['DayOfWeek'] = data['InvoiceDate'].dt.dayofweek
data['Month'] = data['InvoiceDate'].dt.month
data['Year'] = data['InvoiceDate'].dt.year
data['Hour'] = data['InvoiceDate'].dt.hour  # Add hour component
data['Minute'] = data['InvoiceDate'].dt.minute

In [None]:
from sklearn.model_selection import train_test_split
# Split the dataset into training and testing sets
X = data[['CustomerID', 'DayOfWeek', 'Month', 'Year', 'Hour', 'Minute']]  # Features
y = data['Quantity']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
from sklearn.linear_model import LinearRegression
model=LinearRegression()
model.fit(X_train,y_train)

In [None]:
# User input: CustomerID and Date for prediction
user_customer_id = 12680
user_date = pd.to_datetime('2011-12-09 08:55')

# Prepare input features for prediction
user_input = pd.DataFrame({'CustomerID': [user_customer_id],
                           'DayOfWeek': [user_date.dayofweek],
                           'Month': [user_date.month],
                           'Year': [user_date.year],
                           'Hour': [user_date.hour],
                           'Minute': [user_date.minute]})

In [None]:
# Predict the quantity
predicted_quantity = model.predict(user_input)[0]

print(f"Predicted Quantity for Customer ID {user_customer_id} on {user_date}: {predicted_quantity}")

# **Conclusion:**
**Based on the input provided (Customer ID 12680 on December 9, 2011, at 08:55 AM), the linear regression model predicts a quantity of approximately 16.09 units. The prediction uses the Customer ID together with features derived from the provided date and time: day of the week, month, year, hour, and minute.**
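
A single prediction says little about how reliable the model is, so it can help to score the held-out test split as well. The sketch below is a minimal illustration on made-up data: the synthetic frame, the `FEATURES` list, and the `build_features` helper are assumptions introduced for the example, not part of the original notebook. It wraps the date-to-feature step in one function and reports mean absolute error and R² on the test set.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

FEATURES = ['CustomerID', 'DayOfWeek', 'Month', 'Year', 'Hour', 'Minute']


def build_features(customer_id, when):
    """Hypothetical helper: turn a Customer ID and a date into one feature row."""
    when = pd.to_datetime(when)
    return pd.DataFrame([{
        'CustomerID': customer_id,
        'DayOfWeek': when.dayofweek,
        'Month': when.month,
        'Year': when.year,
        'Hour': when.hour,
        'Minute': when.minute,
    }], columns=FEATURES)


# Synthetic stand-in for the retail data (made-up values, for illustration only).
rng = np.random.default_rng(42)
n = 500
minutes = rng.integers(0, 365 * 24 * 60, size=n)
frame = pd.DataFrame({
    'CustomerID': rng.integers(12000, 18000, size=n),
    'InvoiceDate': pd.Timestamp('2011-01-01') + pd.to_timedelta(minutes, unit='min'),
    'Quantity': rng.integers(1, 50, size=n),
})
frame['DayOfWeek'] = frame['InvoiceDate'].dt.dayofweek
frame['Month'] = frame['InvoiceDate'].dt.month
frame['Year'] = frame['InvoiceDate'].dt.year
frame['Hour'] = frame['InvoiceDate'].dt.hour
frame['Minute'] = frame['InvoiceDate'].dt.minute

X = frame[FEATURES]
y = frame['Quantity']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Score the model on rows it has not seen during training.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")

# Predict for one customer and timestamp, as in the user story.
user_row = build_features(12680, '2011-12-09 08:55')
predicted_quantity = float(model.predict(user_row)[0])
print(f"Predicted quantity: {predicted_quantity:.2f}")
```

On the synthetic data the scores are not meaningful, since the quantities are random; the point is the evaluation pattern, which could be applied to the notebook's own `X_test` and `y_test`.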

# **TASK--3**

# **Recommendation**
**Task:** **Recommend an item to the given customer ID for a given date.**

**User Story:** The user should be able to provide a Customer ID and a date, and the program should
recommend an item to purchase.

In [None]:
data=pd.read_excel('/content/Copy of Online Retail.xlsx')

In [None]:
from surprise import Dataset, Reader, KNNBasic

# Ensure 'InvoiceDate' is in datetime format
data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])

# Filter data for the specified date and drop rows that have no CustomerID
user_date = pd.to_datetime('2011-12-09') # Replace with the user's desired date
data_for_date = data[data['InvoiceDate'].dt.date == user_date.date()].dropna(subset=['CustomerID'])

# Create a Surprise Reader object
# Note: 'Quantity' is passed as the rating below, so predicted ratings are clipped to this (0, 1) scale
reader = Reader(rating_scale=(0, 1))

# Load the dataset for Surprise
data_surprise = Dataset.load_from_df(data_for_date[['CustomerID', 'StockCode', 'Quantity']], reader)

# Build an item-based collaborative filtering model (K-nearest neighbors)
sim_options = {
    'name': 'cosine',
    'user_based': False  # Item-based
}
model = KNNBasic(sim_options=sim_options)

# Train the model
trainset = data_surprise.build_full_trainset()
model.fit(trainset)

# User input: CustomerID
user_customer_id = 13113

# Get the list of items that the user has not purchased yet
items_purchased_by_user = data_for_date[data_for_date['CustomerID'] == user_customer_id]['StockCode'].tolist()
items_not_purchased = data_for_date[~data_for_date['StockCode'].isin(items_purchased_by_user)]['StockCode'].unique()

# Generate recommendations for the user
recommendations = []
for item in items_not_purchased:
    predicted_rating = model.predict(user_customer_id, item).est
    recommendations.append({'StockCode': item, 'PredictedRating': predicted_rating})

# Sort recommendations by predicted rating in descending order
recommendations.sort(key=lambda x: x['PredictedRating'], reverse=True)

# Display recommended items
print("Recommended Items:")
for i, rec in enumerate(recommendations[:10]):  # Display the top 10 recommendations
    print(f"{i + 1}. StockCode: {rec['StockCode']} (Predicted Rating: {rec['PredictedRating']:.2f})")


# **Conclusion:**

**For Customer ID 13113 and the given date (2011-12-09), the program lists the top 10 items, each with a predicted rating of 1.00. Because Quantity is used as the rating on a (0, 1) scale, predictions are clipped to at most 1.00, so identical scores of 1.00 do not by themselves rank the items or indicate confidence. The model is trained only on purchases made on the selected date, so the recommendations reflect co-purchase patterns from that day rather than the customer's full purchase history.**
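
One way to get scores that actually separate items is to work with a true binary purchase matrix. The sketch below is an alternative illustration, not the Surprise-based method used above: it builds a customer-by-item 0/1 matrix from a tiny made-up purchase log, computes item-to-item cosine similarity with NumPy, and scores unpurchased items by their similarity to what the customer already bought. The data and the `recommend` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny made-up purchase log (illustration only, not the retail data).
purchases = pd.DataFrame({
    'CustomerID': [1, 1, 2, 2, 2, 3, 3, 4],
    'StockCode':  ['A', 'B', 'A', 'B', 'C', 'B', 'C', 'D'],
})

# Binary customer x item matrix: 1 if the customer bought the item at least once.
matrix = pd.crosstab(purchases['CustomerID'], purchases['StockCode']).clip(upper=1)

# Cosine similarity between item columns.
item_vectors = matrix.to_numpy(dtype=float).T
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
unit_vectors = item_vectors / norms
similarity = pd.DataFrame(unit_vectors @ unit_vectors.T,
                          index=matrix.columns, columns=matrix.columns)


def recommend(customer_id, top_n=3):
    """Hypothetical helper: score unpurchased items by similarity to purchased ones."""
    owned = matrix.loc[customer_id]
    # Each item's score is the sum of its similarities to the items already bought.
    scores = similarity.dot(owned)
    # Keep only items the customer has not bought and that have a positive score.
    candidates = scores[(owned == 0) & (scores > 0)]
    return candidates.sort_values(ascending=False).head(top_n)


print(recommend(1))
```

In this toy log, customer 1 bought items A and B, and item C is the only unpurchased item that shares buyers with them, so it is the only recommendation returned.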