# Customer Recommendation System

Recommender systems aid users in efficiently locating the most pertinent items on e-commerce platforms like Amazon or content on streaming services like YouTube and Netflix. By enhancing customer satisfaction and engagement, effective recommender systems can substantially contribute to the success of these platforms. (Evelyn, 2022)

Recommender systems can be either personalized or non-personalized. While non-personalized systems are simpler, personalized systems typically offer superior performance by tailoring recommendations to individual user preferences. Collaborative filtering, a common technique for personalized recommendation, analyzes user interaction data to predict ratings for items. This method is considered a regression task, as it involves estimating numerical ratings.  
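Framed as regression, a classic user-based formulation (shown for illustration only; the notebook below works with item-item similarities instead) predicts user $u$'s rating of item $i$ from two parts. The first is $u$'s mean rating. The second is a similarity-weighted average of how $u$'s nearest neighbours rated $i$ relative to their own means:

$$
\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in N_i(u)} \operatorname{sim}(u, v)\,\left(r_{vi} - \bar{r}_v\right)}{\sum_{v \in N_i(u)} \left|\operatorname{sim}(u, v)\right|}
$$

Here $N_i(u)$ is the set of the $k$ users most similar to $u$ who have rated $i$, $\bar{r}_u$ and $\bar{r}_v$ are mean ratings, and $\operatorname{sim}$ is a similarity measure such as cosine similarity.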

There are two primary types of collaborative filtering:  
- *user-based*
- *item-based*
 
User-based collaborative filtering assumes that users with similar rating patterns will have similar preferences for other items. It primarily focuses on identifying similarities between users. However, in some cases, user preferences may be too complex to directly compare. In such instances, item-based collaborative filtering becomes more suitable. This method compares the similarity between items rather than users. (Evelyn, 2022)
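The sketch below makes the distinction concrete on a hypothetical 4 × 4 rating matrix (users × items, where 0 means "not rated"). User-based filtering compares rows and item-based filtering compares columns, in both cases with scikit-learn's `cosine_similarity`, which the notebook also imports. A missing rating is then estimated with a plain similarity-weighted average. `predict_item_based` is an illustrative helper and is not part of the original notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: rows are users, columns are items, 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# User-based: compare rows (users) with each other
user_sim = cosine_similarity(ratings)
# Item-based: compare columns (items) by transposing the matrix
item_sim = cosine_similarity(ratings.T)

def predict_item_based(ratings, item_sim, user, item):
    """Similarity-weighted average of the user's ratings on the other items they rated."""
    rated = np.flatnonzero(ratings[user])
    rated = rated[rated != item]
    weights = item_sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[user, rated] / weights.sum())

# User 0 has not rated item 2; estimate that rating from items 0, 1 and 3
pred = predict_item_based(ratings, item_sim, user=0, item=2)
print(round(pred, 2))  # prints 2.09
```

Item 2 is most similar to item 3, which user 0 rated only 1, so the estimate comes out low even though user 0 rated items 0 and 1 highly.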


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity
from datetime import datetime
```

### Cleaning and Preprocessing Data

```python
# pd.read_excel reads only the first worksheet by default (sheet_name=0)
df = pd.read_excel('online_retail_II.xlsx')
df.head()
```

| | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 489434 | 85048 | 15CM CHRISTMAS GLASS BALL 20 LIGHTS | 12 | 2009-12-01 07:45:00 | 6.95 | 13085.0 | United Kingdom |
| 1 | 489434 | 79323P | PINK CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 13085.0 | United Kingdom |
| 2 | 489434 | 79323W | WHITE CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 13085.0 | United Kingdom |
| 3 | 489434 | 22041 | RECORD FRAME 7" SINGLE SIZE | 48 | 2009-12-01 07:45:00 | 2.1 | 13085.0 | United Kingdom |
| 4 | 489434 | 21232 | STRAWBERRY CERAMIC TRINKET BOX | 24 | 2009-12-01 07:45:00 | 1.25 | 13085.0 | United Kingdom |


### Data Dictionary

**Invoice**: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a cancellation.

**StockCode**: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product; some codes carry a letter suffix (e.g. 79323P and 79323W above).

**Description**: Product (item) name. Nominal.

**Quantity**: The quantity of each product (item) per transaction. Numeric.

**InvoiceDate**: Invoice date and time. Numeric. The day and time when a transaction was generated.

**Price**: Unit price. Numeric. Product price per unit in sterling (£).

**Customer ID**: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.

**Country**: Country name. Nominal. The name of the country where a customer resides.  

```python
df.shape
```

```
(525461, 8)
```

```python
# Checking if the DataFrame contains any duplicate rows
duplicate_rows = df[df.duplicated()]
duplicate_rows.shape
```

```
(6865, 8)
```

```python
# Checking for null values
df.isna().sum()
```

```
Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107927
Country             0
dtype: int64
```

```python
# Dropping duplicate rows
# Since there is no way to impute Customer ID or Description, rows with NaN values are dropped as well
df.drop_duplicates(inplace=True)
df.dropna(inplace=True)
df.shape
```

```
(410763, 8)
```
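The cleaning above removes duplicates and rows with missing values, but it keeps cancelled invoices. These are the Invoice codes starting with 'C', and they carry negative quantities. Later, `pivot_table`'s default mean aggregation mixes them with regular purchases. The minimal sketch below drops such rows from a hypothetical toy frame that mimics the Invoice, Quantity and Customer ID columns. All invoice numbers and IDs in it are made up. The Invoice column has object dtype (see `df.info()` below), so the sketch assumes it may hold a mix of integers and strings and casts to `str` before checking the prefix.

```python
import pandas as pd

# Hypothetical toy rows; the Invoice column mixes ints and 'C'-prefixed strings
sales = pd.DataFrame({
    'Invoice': [100001, 100001, 'C100002', 100003],
    'Description': ['PINK CHERRY LIGHTS', 'CAT BOWL ', 'PINK CHERRY LIGHTS', 'CAT BOWL '],
    'Quantity': [12, 6, -12, 4],
    'Customer ID': [1.0, 1.0, 1.0, 2.0],
})

# Cast to str so the prefix check works on mixed int/str values
is_cancelled = sales['Invoice'].astype(str).str.startswith('C')
# Keep only regular purchases with a positive quantity
purchases = sales[~is_cancelled & (sales['Quantity'] > 0)]
print(len(purchases))  # prints 3
```

Applied to the notebook's `df`, the same mask would be `df[~df['Invoice'].astype(str).str.startswith('C') & (df['Quantity'] > 0)]`. Its effect on the row count is not measured here.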

```python
# Customer ID is a nominal identifier rather than a quantity, so store it with object dtype
df['Customer ID'] = df['Customer ID'].astype('object')
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Index: 410763 entries, 0 to 525460
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   Invoice      410763 non-null  object        
 1   StockCode    410763 non-null  object        
 2   Description  410763 non-null  object        
 3   Quantity     410763 non-null  int64         
 4   InvoiceDate  410763 non-null  datetime64[ns]
 5   Price        410763 non-null  float64       
 6   Customer ID  410763 non-null  object        
 7   Country      410763 non-null  object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(5)
memory usage: 28.2+ MB
```


One approach to building a customer purchase recommendation system uses K-Nearest Neighbors (KNN), as described in a Kaggle notebook by Prajapati (2023).
The code can be broken down into four sections:  
- Creating an interaction matrix of Customer ID by product Description
- KNN-based similarity computation for product recommendation
- An input loop for interactive use
- Error handling and edge-case checks

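Each customer purchases (interacts with) only a small subset of the products, so the interaction matrix is sparse: most of its entries are zero. Item-based KNN with the cosine distance suits this setting for two reasons. Zero entries contribute nothing to the dot product, so two products come out as similar only when the same customers bought them. The normalization by vector length also means a product's overall sales volume does not by itself make it look similar to others.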
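The pivot table in the next cell is stored as a dense DataFrame. As an alternative (this is not what the Kaggle notebook does), the same item × customer matrix can be held in SciPy's CSR format, which stores only the non-zero entries. scikit-learn's `NearestNeighbors` with `metric='cosine'` and `algorithm='brute'` accepts sparse input. The toy purchase log and its column names `customer`, `product` and `quantity` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical long-format purchase log: one row per (customer, product, quantity)
log = pd.DataFrame({
    'customer': [1, 1, 2, 2, 3, 3, 4],
    'product':  ['mug', 'plate', 'mug', 'plate', 'lamp', 'lights', 'lights'],
    'quantity': [2, 1, 4, 3, 1, 3, 2],
})

# Map customer and product labels to integer positions
cust_codes, customers = pd.factorize(log['customer'])
prod_codes, products = pd.factorize(log['product'])

# Item x customer CSR matrix; repeated (row, col) pairs would be summed,
# unlike pivot_table's default mean
item_matrix = csr_matrix(
    (log['quantity'].to_numpy(dtype=float), (prod_codes, cust_codes)),
    shape=(len(products), len(customers)),
)

knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(item_matrix)

# Two nearest neighbours of 'mug': the first is 'mug' itself
mug = products.get_loc('mug')
distances, indices = knn.kneighbors(item_matrix[mug], n_neighbors=2)
print(products[indices[0, 1]], round(distances[0, 1], 3))
```

'mug' and 'plate' were bought by the same two customers in similar proportions, so their cosine distance is small. 'lamp' and 'lights' share no customers with 'mug', so each sits at distance 1 from it.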

```python
# Creating the interaction matrix: one row per customer, one column per product.
# Note: pivot_table aggregates repeated (customer, product) pairs with the mean by default.
im = df.pivot_table(index='Customer ID', columns='Description', values='Quantity', fill_value=0)

# Item-based KNN: each product is represented by its vector of customer quantities
item_matrix = im.T  # transpose once instead of on every lookup
knn = NearestNeighbors(metric='cosine', algorithm='brute')
knn.fit(item_matrix)

def recommend_products(customer_id, top_n=5):
    # Checking if Customer ID exists
    if customer_id not in im.index:
        return f"Customer ID {customer_id} not found in the dataset."

    # Get products the customer has interacted with
    cust_data = im.loc[customer_id]
    pur_prods = cust_data[cust_data > 0].index.tolist()

    if not pur_prods:
        return f"No purchase data available for Customer ID {customer_id}."

    print(f"\nProducts purchased by Customer ID {customer_id}:")
    print(pur_prods)

    # Finding similar products (for all purchased products)
    sim_prods = {}
    for prod in pur_prods:
        if prod not in im.columns:  # Skip if product is not in the DataFrame
            print(f"Product '{prod}' not found in data. Skipping.")
            continue

        index = im.columns.get_loc(prod)

        try:
            distances, indices = knn.kneighbors(item_matrix.iloc[index].values.reshape(1, -1), n_neighbors=top_n + 1)
        except ValueError as e:
            print(f"Error finding neighbors for product '{prod}': {e}")
            continue

        # Starting from position 1 to skip the product itself
        for dist, idx in zip(distances.flatten()[1:], indices.flatten()[1:]):
            sim_prod = im.columns[idx]
            # Keep the smallest distance when a product neighbours several purchased products
            sim_prods[sim_prod] = min(dist, sim_prods.get(sim_prod, np.inf))

    if not sim_prods:
        return f"No similar products found for Customer ID {customer_id}."

    # Sorting by cosine distance (ascending, so the most similar products come first)
    recommended_products = sorted(sim_prods.items(), key=lambda x: x[1])

    print(f"\nTop {top_n} recommended products for Customer ID {customer_id}:")
    return [product for product, _ in recommended_products[:top_n]]

# Loop so that users can query the recommender repeatedly without re-running the cell
while True:
    try:
        customer_id_input = int(input("\nEnter Customer ID (eg. 13085): "))
        top_n_input = int(input("Enter the number of recommended products to display: "))

        recommended = recommend_products(customer_id_input, top_n=top_n_input)

        if isinstance(recommended, str):
            print(recommended)
        else:
            print("Recommended Products:")
            print(recommended)
    except ValueError:
        print("Invalid input. Please enter a valid Customer ID and number.")

    # Ask the user if they want to continue
    another_query = input("\nDo you want to see recommendations for another customer? (yes/no): ").strip().lower()
    if another_query not in ['yes', 'y']:
        print("Exiting the recommendation system. Goodbye!")
        break
```



```
Enter Customer ID (eg. 13085):  13085
Enter the number of recommended products to display:  3

Products purchased by Customer ID 13085:
['  DOORMAT UNION JACK GUNS AND ROSES', ' WHITE CHERRY LIGHTS', '15CM CHRISTMAS GLASS BALL 20 LIGHTS', '72 SWEETHEART FAIRY CAKE CASES', 'BLACK RECORD COVER FRAME', 'CAT BOWL ', 'DOG BOWL , CHASING BALL DESIGN', 'DOOR MAT UNION FLAG', 'DOORMAT NEIGHBOURHOOD WITCH ', 'FANCY FONT HOME SWEET HOME DOORMAT', 'FRYING PAN BLUE POLKADOT ', 'FRYING PAN PINK POLKADOT ', 'HEART MEASURING SPOONS LARGE', 'HOOK, 1 HANGER ,MAGIC GARDEN', 'HOOK, 3 HANGER ,MAGIC GARDEN', 'LOVE HEART SOCK HANGER', 'LUNCHBOX WITH CUTLERY FAIRY CAKES ', 'MILK PAN BLUE RETROSPOT', 'MILK PAN PINK RETROSPOT', 'PINK  HEART CONFETTI IN TUBE', 'PINK  HEART SHAPE LOVE BUCKET ', 'PINK CHERRY LIGHTS', 'PINK DOUGHNUT TRINKET POT ', 'RECORD FRAME 7" SINGLE SIZE ', 'RED DAISY PAPER LAMPSHADE', 'RED HEART SHAPE LOVE BUCKET ', 'ROUND SNACK BOXES SET OF 4 FRUITS ', 'ROUND SNACK BOXES SET OF4 WOODLAND ', 'SAVE THE PLANET MUG', 'SET 10 LIGHTS NIGHT OWL', 'STRAWBERRY CERAMIC TRINKET BOX', 'UNION JA

Do you want to see recommendations for another customer? (yes/no):  no

Exiting the recommendation system. Goodbye!
```


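Testing the system with Customer ID 13085 confirms that the pipeline runs end to end: the customer is found, their purchase history is listed, and the interactive loop exits cleanly. A single run does not, however, measure how good the recommendations are. That would require an offline evaluation against held-out purchases.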
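One such offline check is leave-one-out hit rate. Hide one purchased product per customer, recommend from the remaining purchases, and count how often the hidden product lands in the top N. The sketch below is hypothetical throughout. It uses a toy binary purchase matrix, the helper `hit_rate_at_n` is illustrative, and unseen products are scored by their summed cosine similarity to the customer's remaining purchases. That scoring is a simplified stand-in for `recommend_products`, not the same algorithm.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical customer x product purchase matrix (1 = bought)
purchases = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

def hit_rate_at_n(matrix, top_n=2, seed=0):
    """Leave-one-out hit rate for a simple item-based cosine recommender."""
    rng = np.random.default_rng(seed)
    hits, trials = 0, 0
    for user in range(matrix.shape[0]):
        bought = np.flatnonzero(matrix[user])
        if len(bought) < 2:  # need at least one visible purchase after hiding one
            continue
        held_out = rng.choice(bought)
        train = matrix.copy()
        train[user, held_out] = 0  # hide the held-out purchase
        item_sim = cosine_similarity(train.T)  # item x item similarities
        scores = item_sim @ train[user]  # summed similarity to visible purchases
        scores[train[user] > 0] = -np.inf  # never recommend visible purchases
        top = np.argsort(-scores)[:top_n]
        hits += int(held_out in top)
        trials += 1
    return hits / trials if trials else float('nan')

print(hit_rate_at_n(purchases, top_n=2))
```

With `top_n` equal to the number of products, every held-out item is trivially recovered, so the metric is only informative for small N relative to the catalogue.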

### Recommendations  
- For larger datasets, brute-force KNN compares each query product with every other product in the catalogue, which becomes computationally expensive as the numbers of products and customers grow. SVD (Singular Value Decomposition) and related matrix-factorization models can scale better on large datasets, since they compress the sparse interaction matrix into a small number of latent factors.
- KNN struggles with the cold-start problem: new customers or products have no interaction history from which similarities can be computed. Other approaches can serve these cases, such as deep learning models or other filtering techniques (e.g. content-based or popularity-based recommendations) used alongside collaborative filtering.
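Here is a minimal sketch of the matrix-factorization idea from the first bullet, assuming scikit-learn's `TruncatedSVD` (which accepts SciPy sparse matrices) and a hypothetical toy customer × product matrix. The matrix is approximated by a rank-2 product of customer and product factors, and each customer's unbought products are ranked by their reconstructed scores. `recommend_svd` is an illustrative helper, not a tuned model.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Hypothetical customer x product quantities (rows: customers, columns: products)
dense = np.array([
    [3, 2, 0, 0, 1],
    [4, 3, 0, 0, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 4, 5, 0],
    [2, 0, 0, 0, 2],
], dtype=float)
interactions = csr_matrix(dense)

# Project onto k = 2 latent factors
svd = TruncatedSVD(n_components=2, random_state=0)
customer_factors = svd.fit_transform(interactions)  # shape (n_customers, k)
product_factors = svd.components_                   # shape (k, n_products)

# Low-rank reconstruction: predicted affinity of every customer for every product
scores = customer_factors @ product_factors

def recommend_svd(customer, top_n=2):
    """Rank products the customer has not bought by reconstructed score."""
    candidate_scores = scores[customer].copy()
    candidate_scores[dense[customer] > 0] = -np.inf  # exclude products already bought
    return np.argsort(-candidate_scores)[:top_n].tolist()

print(recommend_svd(customer=1))
```

Unlike brute-force KNN, scoring a customer here costs one product of a k-dimensional factor row with the k × n_products factor matrix, independent of how many other customers exist.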

## Dataset
- Chen, D. (2014). Online Retail II [UCI Machine Learning Repository]. https://archive.ics.uci.edu/ml/datasets/online+retail+II

## References  
- Evelyn [eve.9512] (2022). Collaborative filtering in recommender system: An overview. Medium. https://medium.com/@toprak.mhmt/collaborative-filtering-3ceb89080ade
- Prajapati (2023). Customer Purchase Recommendation System KNN. Kaggle. https://www.kaggle.com/code/utisop/customer-purchase-recommendation-system-knn