This notebook walks through three recommendation techniques for an e-commerce website. The goal is to build a recommendation system that suggests products based on user behavior, including purchase history and browsing patterns.

The three techniques are:

1. Popularity-Based Filtering: a simple method that needs no user-specific data or complex algorithms. It points users towards the most popular or most highly rated items.
2. Content-Based Filtering: uses product attributes such as category, brand, and price to suggest items similar to those a user has previously purchased or viewed.
3. Collaborative Filtering: uses user behavior data to find patterns and recommends items based on the preferences of similar users.

For each technique, I give an overview, explain how it works, and implement it in Python with Pandas, NumPy, and Scikit-learn.

In addition, I implement keyword-based search.

## Import basic libraries and data

In [2]:
import numpy as np
import pandas as pd

In [19]:
#import the data
df = pd.read_csv('/kaggle/input/bigbasket-entire-product-list-28k-datapoints/BigBasket Products.csv')
df2 = pd.read_csv('/kaggle/input/amazon-ratings/ratings_Beauty.csv')

In [20]:
df2 = df2.head(30000)

In [21]:
df2.ProductId.value_counts()

B00006IV2F    704
B0000632EN    686
B00005O0MZ    585
B0000530ED    584
B00004TUBL    558
             ... 
B00005B701      1
9790792115      1
9790791968      1
B00005BHRE      1
0205616461      1
Name: ProductId, Length: 1858, dtype: int64

In [22]:
df.drop('index', axis=1, inplace=True)
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27555 entries, 0 to 27554
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   product       27554 non-null  object 
 1   category      27555 non-null  object 
 2   sub_category  27555 non-null  object 
 3   brand         27554 non-null  object 
 4   sale_price    27555 non-null  float64
 5   market_price  27555 non-null  float64
 6   type          27555 non-null  object 
 7   rating        18929 non-null  float64
 8   description   27440 non-null  object 
dtypes: float64(3), object(6)
memory usage: 1.9+ MB


Sample of input data

In [9]:
df.head()

Unnamed: 0,product,category,sub_category,brand,sale_price,market_price,type,rating,description
0,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...
1,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ..."
2,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m..."
3,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...
4,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...


The index column is unnecessary, so it was dropped above.

In [25]:
df2.ProductId.unique()

array(['0205616461', '0558925278', '0733001998', ..., 'B00007LB75',
       'B00007LVDA', 'B00007M0CP'], dtype=object)

In [29]:
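# The two datasets share no key, so attach one ratings ProductId to each of the
# first len(unique_product_ids) catalogue rows; this lets IDs be mapped to names later
unique_product_ids = df2['ProductId'].unique()

for index, product_id in enumerate(unique_product_ids):
    df.at[index, 'ProductID'] = product_id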

Drop the duplicates: a preview of the data shows many duplicate products with identical names and descriptions.

In [31]:
df.head()

Unnamed: 0,product,category,sub_category,brand,sale_price,market_price,type,rating,description,ProductID
0,Garlic Oil - Vegetarian Capsule 500 mg,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,220.0,220.0,Hair Oil & Serum,4.1,This Product contains Garlic Oil that is known...,205616461
1,Water Bottle - Orange,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,180.0,180.0,Water & Fridge Bottles,2.3,"Each product is microwave safe (without lid), ...",558925278
2,"Brass Angle Deep - Plain, No.2",Cleaning & Household,Pooja Needs,Trm,119.0,250.0,Lamp & Lamp Oil,3.4,"A perfect gift for all occasions, be it your m...",733001998
3,Cereal Flip Lid Container/Storage Jar - Assort...,Cleaning & Household,Bins & Bathroom Ware,Nakoda,149.0,176.0,"Laundry, Storage Baskets",3.7,Multipurpose container with an attractive desi...,737104473
4,Creme Soft Soap - For Hands & Body,Beauty & Hygiene,Bath & Hand Wash,Nivea,162.0,162.0,Bathing Bars & Soaps,4.4,Nivea Creme Soft Soap gives your skin the best...,762451459


In [32]:
df.drop_duplicates(inplace = True, subset=['product'])

In [9]:
df.head()
save_df = df.copy()

Look at how many categories and subcategories we have.

In [33]:
print('category values')
df.category.value_counts()


category values


Beauty & Hygiene            6825
Gourmet & World Food        4040
Kitchen, Garden & Pets      3185
Cleaning & Household        2408
Snacks & Branded Foods      2401
Foodgrains, Oil & Masala    1964
Bakery, Cakes & Dairy        746
Beverages                    740
Baby Care                    544
Fruits & Vegetables          350
Eggs, Meat & Fish            338
Name: category, dtype: int64

In [34]:
print('subcategory values')
df.sub_category.value_counts()

subcategory values


Skin Care                   2079
Health & Medicine           1019
Storage & Accessories        875
Fragrances & Deos            853
Bath & Hand Wash             852
                            ... 
Herbs & Seasonings            16
Water                          7
Flower Bouquets, Bunches       7
Pork & Other Meats             3
Marinades                      1
Name: sub_category, Length: 90, dtype: int64

### Popularity-Based Filtering
Popularity-based filtering recommends popular items, which here means the items with the highest ratings. Items are suggested in descending order of rating.

Trending Now: Top 5 Items in Each Category
For new users exploring the website, we show the top 5 items in each of the following categories:

1. Beauty & Hygiene
2. Gourmet & World Food
3. Kitchen, Garden & Pets
4. Snacks & Branded Foods
5. Foodgrains, Oil & Masala
6. Cleaning & Household
7. Beverages
8. Bakery, Cakes & Dairy
9. Baby Care
10. Fruits & Vegetables
11. Eggs, Meat & Fish

In [36]:
# Group the DataFrame by category, and sort by rating
grouped = df.sort_values(by='rating', ascending=False).groupby('category')

# Define a function to return the top 5 products from each group
def top_5(group):
    return group.head(5)

# Apply the function to each group, and concatenate the results
result = grouped.apply(top_5).reset_index(drop=True)

# Display the resulting DataFrame
print(result[['category', 'sub_category', 'product', 'rating']])

                    category              sub_category  \
0                  Baby Care           Diapers & Wipes   
1                  Baby Care          Baby Accessories   
2                  Baby Care       Baby Bath & Hygiene   
3                  Baby Care           Diapers & Wipes   
4                  Baby Care           Diapers & Wipes   
5      Bakery, Cakes & Dairy                     Dairy   
6      Bakery, Cakes & Dairy     Cookies, Rusk & Khari   
7      Bakery, Cakes & Dairy          Cakes & Pastries   
8      Bakery, Cakes & Dairy          Cakes & Pastries   
9      Bakery, Cakes & Dairy                     Dairy   
10          Beauty & Hygiene                 Skin Care   
11          Beauty & Hygiene                 Skin Care   
12          Beauty & Hygiene         Health & Medicine   
13          Beauty & Hygiene         Fragrances & Deos   
14          Beauty & Hygiene          Feminine Hygiene   
15                 Beverages                       Tea   
16            

Now we have the result, ready to display on the website.
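
As a minimal sketch, the same top-N-per-category selection can be written without a helper function, because `groupby(...).head(n)` keeps the first n rows of each group in their sorted order. The `toy` frame and the `top_n_per_category` name below are invented for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy catalogue, invented for illustration
toy = pd.DataFrame({
    "category": ["A", "A", "A", "B", "B"],
    "product": ["a1", "a2", "a3", "b1", "b2"],
    "rating": [4.5, 3.0, 4.9, 2.0, None],
})

def top_n_per_category(frame, n=2):
    # Missing ratings are sorted last by default, so an unrated product is
    # only picked when its category has fewer than n rated products
    ranked = frame.sort_values(by="rating", ascending=False)
    top = ranked.groupby("category").head(n)
    # Re-sort so the rows of each category appear together, best first
    return top.sort_values(by=["category", "rating"], ascending=[True, False])

top = top_n_per_category(toy, n=2)
print(top[["category", "product", "rating"]])
```

Note that unrated products can still appear in small categories, so it may be worth dropping rows with a missing rating first if that is not wanted.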

### Keyword-Based Filtering
Keyword-based filtering lets users enter one or more keywords describing the products they want. The system returns the products whose descriptions best match those keywords, which puts the user in control of the search.

In [37]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity



To transform textual data into numerical representations, I've employed the TF-IDF (Term Frequency-Inverse Document Frequency) approach using the TfidfVectorizer object. This crucial step allows us to quantify the importance of words within the product descriptions.

The cosine_similarity function, courtesy of sklearn.metrics.pairwise, calculates the cosine similarity between user-input text and the descriptions of each product in our dataset. This similarity metric forms the basis for recommending products that closely align with the user's specified interests.

Finally, the recommend_products function wraps the whole process and returns the details of the most similar products, ranked by the calculated cosine similarities.

In [38]:
# Create a TF-IDF vectorizer
vectorizer = TfidfVectorizer(stop_words='english')

# Fit and transform the 'description' column
tfidf_matrix = vectorizer.fit_transform(df['description'].values.astype('U'))

def recommend_products(input_text, top_n=5):
    # Transform the input text using the vectorizer
    input_vector = vectorizer.transform([input_text])

    # Calculate the cosine similarity between input and dataset
    similarities = cosine_similarity(input_vector, tfidf_matrix)

    # Get the indices of the top N most similar products
    most_similar_indices = similarities.argsort()[0, ::-1][:top_n]

    # Return the details of the top N most similar products
    most_similar_products = df.iloc[most_similar_indices]
    return most_similar_products

# Test the function
user_input = input("Enter your text: ")
recommended_products = recommend_products(user_input)

print("Top Recommended Products:")
for index, product in recommended_products.iterrows():
    print(f"{product['product']} - {product['brand']} - {product['category']}")


Enter your text:  Gel


Recommended Product:
Name: Taft Ultra Wet Gel - Ultra Strong 4
Brand: Schwarzkopf
Category: Beauty & Hygiene


A new version that can return more than one item; here the number of recommendations is set to 5.

In [39]:
def recommend_products(input_text, num_recommendations=5):
    # Transform the input text using the vectorizer
    input_vector = vectorizer.transform([input_text])

    # Calculate the cosine similarity between input and dataset
    similarities = cosine_similarity(input_vector, tfidf_matrix)

    # Get the indices of the top N most similar products
    #The argsort function is used to get the indices of the top num_recommendations most similar products. 
    top_indices = similarities.argsort()[0][-num_recommendations:][::-1]

    # Return the details of the top N most similar products
    top_products = df.iloc[top_indices]
    return top_products

# Test the function
user_input = input("Enter your text: ")
recommended_products = recommend_products(user_input)
print("Recommended Products:")
for index, product in recommended_products.iterrows():
    print("Name:", product['product'])
    print("Brand:", product['brand'])
    print("Category:", product['category'])
    print("-----------------------------")

Enter your text:  Gel


Recommended Products:
Name: Taft Ultra Wet Gel - Ultra Strong 4
Brand: Schwarzkopf
Category: Beauty & Hygiene
-----------------------------
Name: Passion For Vanilla Shampoo & Shower Gel
Brand: Nike
Category: Beauty & Hygiene
-----------------------------
Name: Pure Aloe Soothing Gel
Brand: Prakrta
Category: Beauty & Hygiene
-----------------------------
Name: Vegan Sports Energy Gel - Chocolate Flavour
Brand: FAST&UP
Category: Beauty & Hygiene
-----------------------------
Name: Vegan Sports Energy Gel - Strawberry Banana Flavour
Brand: FAST&UP
Category: Beauty & Hygiene
-----------------------------
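
One caveat: if none of the query words appear in the fitted vocabulary, every similarity score is 0, and `argsort` still returns some products. The sketch below shows one possible guard that drops zero-score matches. It runs on a hypothetical toy corpus; `toy_descriptions`, `toy_vectorizer`, and `search` are names invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy descriptions, invented for illustration
toy_descriptions = [
    "aloe soothing gel for skin",
    "strong hold hair gel",
    "stainless steel water bottle",
]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_matrix = toy_vectorizer.fit_transform(toy_descriptions)

def search(query, top_n=5):
    # Score every description against the query
    scores = cosine_similarity(toy_vectorizer.transform([query]), toy_matrix)[0]
    # Rank from most to least similar and keep at most top_n candidates
    ranked = np.argsort(scores)[::-1][:top_n]
    # Drop candidates that share no words with the query
    return [int(i) for i in ranked if scores[i] > 0]

print(search("gel"))       # row positions of the two gel descriptions
print(search("umbrella"))  # no matching words, so an empty list
```

With this guard, an empty result can be handled explicitly, for example by falling back to the popularity-based list.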


## Content-Based Filtering
Content-based filtering leverages the specific characteristics or details of items that users have searched for to recommend similar items. By analyzing the attributes of the items a user has shown interest in, this approach tailors suggestions to their unique preferences. It's all about delivering a personalized and relevant user experience based on the content that resonates with each individual.

The process has three steps:

1. Calculate item similarity: compute similarity scores between each pair of items based on their characteristics. Cosine similarity is a common choice.
2. Generate recommendations: select the top N items with the highest similarity scores. Alternatively, more advanced techniques such as collaborative filtering or matrix factorization can be used.
3. Evaluate recommendations: metrics such as precision, recall, and F1 score quantify how effective the recommendations are. User studies can also gauge user satisfaction and help fine-tune the algorithm.

Together, these steps give a content-based system that computes item similarity, tailors recommendations to each user, and is evaluated for continuous improvement.
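
The evaluation step is not implemented in this notebook, since the product data has no record of which recommendations a user found relevant. As a rough sketch under that assumption, precision, recall, and F1 for a single user could be computed as follows. The `recommended` list and `relevant` set are hypothetical, and the function names are my own.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendations."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: a ranked recommendation list and the items the user liked
recommended = ["p1", "p2", "p3", "p4", "p5"]
relevant = {"p2", "p5", "p9"}

precision, recall = precision_recall_at_k(recommended, relevant, k=5)
f1 = f1_from(precision, recall)
print(precision, recall, f1)
```

In practice these values would be averaged over many users, with the relevant items taken from held-out purchases or clicks.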

### Recommend based on clicked product


In [40]:
import re
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity

I drop the rows that contain null values so that every remaining product has complete content.

In [41]:
df = df.dropna()
df = df.reset_index(drop=True)
df[['category', 'sub_category','brand', 'product','type','description']]

Unnamed: 0,category,sub_category,brand,product,type,description
0,Beauty & Hygiene,Hair Care,Sri Sri Ayurveda,Garlic Oil - Vegetarian Capsule 500 mg,Hair Oil & Serum,This Product contains Garlic Oil that is known...
1,"Kitchen, Garden & Pets",Storage & Accessories,Mastercook,Water Bottle - Orange,Water & Fridge Bottles,"Each product is microwave safe (without lid), ..."
2,Cleaning & Household,Pooja Needs,Trm,"Brass Angle Deep - Plain, No.2",Lamp & Lamp Oil,"A perfect gift for all occasions, be it your m..."
3,Cleaning & Household,Bins & Bathroom Ware,Nakoda,Cereal Flip Lid Container/Storage Jar - Assort...,"Laundry, Storage Baskets",Multipurpose container with an attractive desi...
4,Beauty & Hygiene,Bath & Hand Wash,Nivea,Creme Soft Soap - For Hands & Body,Bathing Bars & Soaps,Nivea Creme Soft Soap gives your skin the best...
...,...,...,...,...,...,...
1236,"Kitchen, Garden & Pets",Kitchen Accessories,Adithya,"Cotton Kitchen Apron - Red Checked, Free Size",Kitchen Tools & Other Accessories,This superior 100% cotton with reinforced ties...
1237,Beauty & Hygiene,Hair Care,Soulflower,Onion Herbal Hair Growth Oil,Hair Oil & Serum,A real onion herbal hair oil treatment that ac...
1238,"Kitchen, Garden & Pets",Cookware & Non Stick,Omega,Stainless Steel Kadai With Copper Bottom No.11,Kadai & Fry Pans,"An offering of the brand Omega, this Copper Bo..."
1239,Snacks & Branded Foods,Frozen Veggies & Snacks,Jain Farm Fresh,Aamrus - Kesar Mango Pulp,Frozen Vegetables,Jain FarmFresh Aamrus is 100% Natural frozen M...


To extract meaningful features from the text data in the description column of a DataFrame, I employ the TfidfVectorizer from scikit-learn. This technique transforms the textual content into a sparse matrix of TF-IDF features.

TF-IDF feature extraction:

- Objective: transform the text data into a numerical representation that captures the significance of words in each document.
- Process: the TfidfVectorizer treats each row in the DataFrame as a document and creates a column for each unique word in the corpus.
- Parameter settings: the stop_words parameter is set to 'english', filtering out common English words for more meaningful feature extraction.

TF-IDF (Term Frequency-Inverse Document Frequency):

- Definition: TF-IDF is a numerical technique that assigns weights to words based on their frequency in a document and their rarity across the entire corpus. It highlights essential words while down-weighting common, less informative ones.
- Application: widely used in natural language processing and information retrieval, TF-IDF serves as a preprocessing step that extracts useful features from text data for subsequent analysis.

This approach enhances the interpretability and relevance of the textual data, laying the groundwork for more effective analysis and recommendation system implementation.
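
To make the weighting concrete, here is a small sketch on a hypothetical three-document corpus (`toy_corpus` is invented for illustration). It shows the shape of the resulting matrix and that a word appearing in more documents receives a smaller inverse document frequency than a word appearing in only one.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, invented for illustration
toy_corpus = [
    "herbal hair oil",
    "coconut hair oil",
    "steel water bottle",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_features = toy_tfidf.fit_transform(toy_corpus)

# One row per document, one column per unique word
print(toy_features.shape)

# vocabulary_ maps each word to its column; idf_ holds the per-column IDF weight
idf = {word: toy_tfidf.idf_[col] for word, col in toy_tfidf.vocabulary_.items()}

# "hair" occurs in two documents, "herbal" in one, so "herbal" is weighted higher
print(idf["hair"] < idf["herbal"])
```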

In [42]:
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(df['description'])
tfidf_matrix.shape

(1241, 8521)

Cosine Similarity Calculation Using TF-IDF Matrix
To determine the similarity between all pairs of documents in the corpus, we use the TF-IDF matrix generated from the text data:

- Objective: calculate the cosine similarity between each pair of documents to identify how similar they are.
- Technique: the linear_kernel function from scikit-learn computes the dot product between all pairs of rows in the TF-IDF matrix, where each row represents one document. Because TfidfVectorizer L2-normalizes each row by default, this dot product equals the cosine similarity.
- Matrix representation: the resulting cosine_sim matrix is symmetric, and each element is the cosine similarity score between two documents.
- Application: commonly used in natural language processing and information retrieval, this technique finds the documents most similar to a given query document. It is a fundamental step in content-based filtering, where items are recommended based on their textual characteristics.

This gives a complete picture of the similarity relationships within the corpus, which the recommendation function below relies on.
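
The following sketch checks, on a hypothetical toy corpus (`toy_docs` is invented for illustration), that `linear_kernel` and `cosine_similarity` agree on a TF-IDF matrix built with the default settings, and that the result is symmetric with ones on the diagonal.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy corpus, invented for illustration
toy_docs = [
    "onion herbal hair oil",
    "natural nourishing hair oil",
    "stainless steel kadai",
]

# Default TfidfVectorizer settings L2-normalize every row
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_docs)

dot_products = linear_kernel(toy_matrix, toy_matrix)
cosines = cosine_similarity(toy_matrix, toy_matrix)

same = np.allclose(dot_products, cosines)
symmetric = np.allclose(dot_products, dot_products.T)
print(same, symmetric)
```

If the vectorizer were created with `norm=None`, the two functions would no longer agree, and `cosine_similarity` would be the one to use.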

In [43]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim

array([[1.        , 0.01709015, 0.01069189, ..., 0.        , 0.00987216,
        0.        ],
       [0.01709015, 1.        , 0.00806329, ..., 0.05285553, 0.        ,
        0.        ],
       [0.01069189, 0.00806329, 1.        , ..., 0.06368878, 0.        ,
        0.05110836],
       ...,
       [0.        , 0.05285553, 0.06368878, ..., 1.        , 0.        ,
        0.        ],
       [0.00987216, 0.        , 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.05110836, ..., 0.        , 0.        ,
        1.        ]])

In [44]:
mapping = pd.Series(df.index,index = df['product'])
mapping

product
Garlic Oil - Vegetarian Capsule 500 mg                        0
Water Bottle - Orange                                         1
Brass Angle Deep - Plain, No.2                                2
Cereal Flip Lid Container/Storage Jar - Assorted Colour       3
Creme Soft Soap - For Hands & Body                            4
                                                           ... 
Cotton Kitchen Apron - Red Checked, Free Size              1236
Onion Herbal Hair Growth Oil                               1237
Stainless Steel Kadai With Copper Bottom No.11             1238
Aamrus - Kesar Mango Pulp                                  1239
Rice - Vermicelli                                          1240
Length: 1241, dtype: int64

Define a recommend function that takes a product name as its parameter and returns a Series of related products.


In [45]:
def recommend_product_based_on_click(product_input):
    # Look up the row index of the clicked product
    product_index = mapping[product_input]
    # Pair every product index with its similarity to the clicked product
    similarity_score = list(enumerate(cosine_sim[product_index]))
    # Sort by similarity score in descending order
    similarity_score = sorted(similarity_score, key=lambda x: x[1], reverse=True)
    # Keep the 14 most similar products, skipping the first entry (normally the product itself)
    similarity_score = similarity_score[1:15]
    # Return the product names at those row indices
    product_indices = [i[0] for i in similarity_score]
    return df['product'].iloc[product_indices]


Try the function by passing a product name as the parameter.

In [46]:
recommend_product_based_on_click('Onion Herbal Hair Growth Oil')

1124    Hair growth Oil With Dandruff & Hair Fall Trea...
897                           Natural Nourishing Hair Oil
95      Cold Pressed Bhringraj Cooling Oil For Hair Fa...
858     Anti Dandruff Conditioner - With Tea Tree & Gi...
56                             Argan-Liquid Gold Hair Spa
330                                 Hair Fall Control Oil
250                                  Hair Spa Oil Therapy
247                                Anti-Hair Fall Shampoo
1031           Flex Body Building Shampoo - Normal To Dry
8       Biotin & Collagen Volumizing Hair Shampoo + Bi...
135                             Ylang Ylang Essential Oil
724                                           Avocado Oil
589     Bio Green - Apple Fresh Daily Purifying Shampo...
912                                 Hair Gel Regular Hold
Name: product, dtype: object

## Collaborative Filtering
Collaborative filtering relies on user behavior data to identify patterns and make recommendations based on similar user preferences.

In this case, I use another dataset, because this technique needs the purchase history of customers.

The ratings data was imported and cropped to its first 30,000 rows above.

In [47]:
import sklearn
from sklearn.decomposition import TruncatedSVD

In [48]:
df2.head()

Unnamed: 0,UserId,ProductId,Rating,Timestamp
0,A39HTATAQ9V7YF,205616461,5.0,1369699200
1,A3JM6GV9MNOF9X,558925278,3.0,1355443200
2,A1Z513UWSAAO0F,558925278,5.0,1404691200
3,A1WMRR494NWEWV,733001998,4.0,1382572800
4,A3IAAVS479H7M7,737104473,1.0,1274227200


Create a user-item matrix from the dataset. This matrix represents the user-item interactions, where rows correspond to users, columns correspond to items, and the values represent the ratings given by users to items.

In [49]:
user_item_matrix = df2.pivot_table(
    index='UserId', columns='ProductId', values='Rating', fill_value=0)
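
To see what this matrix looks like, here is a small sketch on hypothetical ratings (`toy_ratings` is invented for illustration). It also measures how sparse the matrix is, since `fill_value=0` makes "not rated" look the same as a rating of 0.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration
toy_ratings = pd.DataFrame({
    "UserId": ["u1", "u1", "u2", "u3"],
    "ProductId": ["p1", "p2", "p2", "p3"],
    "Rating": [5.0, 3.0, 4.0, 1.0],
})

# Rows are users, columns are products, missing pairs are filled with 0
toy_user_item = toy_ratings.pivot_table(
    index="UserId", columns="ProductId", values="Rating", fill_value=0)
print(toy_user_item)

# Fraction of user-product pairs that have no rating
sparsity = (toy_user_item == 0).to_numpy().mean()
print(sparsity)
```

If a user rated the same product more than once, `pivot_table` would average those ratings, because its default aggregation is the mean.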

Singular Value Decomposition (SVD) helps to reduce the dimensionality of a matrix.

SVD is a matrix factorization method that decomposes a matrix into three matrices: U, Σ, and V^T. For a given matrix M, the SVD factorizes it as M = U * Σ * V^T, where U and V are orthogonal matrices and Σ is a diagonal matrix with singular values.

TruncatedSVD is similar to traditional SVD but produces an approximation of the original matrix by only considering the top-k singular values and their corresponding singular vectors.
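
The factorization and its truncation can be illustrated with plain NumPy on a hypothetical toy matrix. This sketch uses `np.linalg.svd` rather than `TruncatedSVD` so that the three factors are visible; the matrix `M` and the choice `k = 2` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 4 x 3 ratings matrix, invented for illustration
M = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.0, 4.0],
])

# M = U * diag(singular_values) * Vt, singular values in descending order
U, singular_values, Vt = np.linalg.svd(M, full_matrices=False)
M_full = U @ np.diag(singular_values) @ Vt

# Keep only the top-k singular values and vectors for a rank-k approximation
k = 2
M_k = U[:, :k] @ np.diag(singular_values[:k]) @ Vt[:k, :]

print(np.allclose(M, M_full))     # the full factorization reproduces M
print(np.linalg.norm(M - M_k))    # error left by dropping the smallest singular value
```

The approximation error (in the Frobenius norm) equals the discarded singular value here, which is why dropping small singular values loses little information.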

In [82]:
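# Transpose so that rows are products and columns are users
X = user_item_matrix.T
X1 = X


# Reduce each product's rating vector to 10 latent components
SVD = TruncatedSVD(n_components=10)
decomposed_matrix = SVD.fit_transform(X)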

In [83]:
X

UserId,A00205921JHJK5X9LNP42,A00473363TJ8YSZ3YAGG9,A01437583CZ7V02UKZQ5S,A01907982I6OHXDYN5HD6,A020135981U0UNEAE4JV,A024581134CV80ZBLIZTZ,A03056581JJIOL5FSKJY7,A03099101ZRK4K607JVHH,A03454732N8VEYJAMGTTH,A03900532XT2E5T10WV0U,...,AZZDA9BRMPP1B,AZZHB6U54UDYW,AZZHJZP4GQPPZ,AZZNK89PXD006,AZZOFVMQC0BJG,AZZQXL8VDCFTV,AZZSAMMJPJKJ1,AZZTJQ7CQZUD8,AZZVCBG5G4EV8,AZZWPNME0GQZ2
ProductId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0205616461,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0558925278,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0733001998,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0737104473,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0762451459,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
B00007L1IE,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
B00007L64J,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
B00007LB75,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
B00007LVDA,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [84]:
# Pearson correlation between every pair of products in the reduced space
correlation_matrix = np.corrcoef(decomposed_matrix)
correlation_matrix

array([[ 1.        ,  0.1967911 , -0.24982777, ...,  0.53395213,
         0.41098679, -0.58812897],
       [ 0.1967911 ,  1.        ,  0.70876563, ...,  0.0523039 ,
        -0.06108128,  0.40500971],
       [-0.24982777,  0.70876563,  1.        , ..., -0.34151872,
        -0.68382501,  0.80306823],
       ...,
       [ 0.53395213,  0.0523039 , -0.34151872, ...,  1.        ,
         0.72789028, -0.74256   ],
       [ 0.41098679, -0.06108128, -0.68382501, ...,  0.72789028,
         1.        , -0.76633062],
       [-0.58812897,  0.40500971,  0.80306823, ..., -0.74256   ,
        -0.76633062,  1.        ]])

In [85]:
i = "B0000530LL"

product_names = list(X.index)
product_ID = product_names.index(i)

In [78]:
i = "A01907982I6OHXDYN5HD6"

# User IDs are the columns of X (its index holds the product IDs)
user_names = list(X.columns)
user_ID = user_names.index(i)

In [87]:
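# Correlations of the chosen product with every other product
# (despite the variable name, these are product-to-product correlations)
correlation_user_ID = correlation_matrix[product_ID]
correlation_user_ID.shape

# Keep the products whose correlation with the chosen product exceeds 0.5
Recommend = list(X.index[correlation_user_ID > 0.5])


# Removes the item already bought by the customer
# Recommend.remove(i)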

In [88]:
correlation_user_ID

array([-0.83925602,  0.15753317,  0.59250329, ..., -0.80145411,
       -0.69078861,  0.90061587])

In [89]:
Recommend[0:9]

['0733001998',
 '0737104473',
 '0762451459',
 '130414089X',
 '1304196070',
 '1304482634',
 '1451646526',
 '322700075X',
 '3227001055']

In [90]:
result = Recommend[0:10]

In [91]:
result

['0733001998',
 '0737104473',
 '0762451459',
 '130414089X',
 '1304196070',
 '1304482634',
 '1451646526',
 '322700075X',
 '3227001055',
 '4057362886']
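
The list above is in index order, not ranked by correlation, and it may include the chosen product itself. As a sketch of one way to tidy this up, the steps can be wrapped in a function that excludes the query product and sorts by correlation. The function name, its parameters, and the toy matrix are all invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

def recommend_similar_products(item_user, product_id, n_components=3,
                               threshold=0.5, top_n=10):
    """Rank products by correlation with product_id in the reduced space."""
    svd = TruncatedSVD(n_components=n_components, random_state=0)
    reduced = svd.fit_transform(item_user)
    # Row i holds the correlations of product i with every product
    corr = np.corrcoef(reduced)
    position = item_user.index.get_loc(product_id)
    scores = pd.Series(corr[position], index=item_user.index)
    # Exclude the query product, then keep only strong positive correlations
    scores = scores.drop(product_id)
    scores = scores[scores > threshold]
    return scores.sort_values(ascending=False).head(top_n)

# Hypothetical product x user ratings (rows are products, as in X above)
toy_item_user = pd.DataFrame(
    [[5, 4, 0, 0, 0, 1],
     [5, 4, 0, 0, 0, 1],
     [0, 0, 5, 4, 0, 0],
     [0, 0, 4, 5, 1, 0],
     [1, 0, 0, 0, 5, 4]],
    index=["p1", "p2", "p3", "p4", "p5"],
    columns=["u1", "u2", "u3", "u4", "u5", "u6"],
)

similar = recommend_similar_products(toy_item_user, "p1")
print(similar)
```

Here `p2` has the same rating pattern as `p1`, so it should come out with a correlation close to 1.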

In [92]:
# Create a dictionary mapping ProductID to product name
product_names_dict = df.set_index('ProductID')['product'].to_dict()

# Look up the name of each recommended product ID (None if the ID is missing)
product_names = [product_names_dict.get(e) for e in result]

In [71]:
product_names_dict.get('0205616461')

'Garlic Oil - Vegetarian Capsule 500 mg'

In [93]:
product_names_dict.get('3227001055')

'Dog Supplement - Absolute Skin + Coat Tablet'

In [94]:
def get_product_name(product_id):
    if product_id in product_names_dict:
        return product_names_dict[product_id]
    else:
        return "Product not found"

In [95]:
for item in result:
    product_name = get_product_name(item)
    print(product_name)

Brass Angle Deep - Plain, No.2
Cereal Flip Lid Container/Storage Jar - Assorted Colour
Creme Soft Soap - For Hands & Body
Hand Sanitizer - 70% Alcohol Base
Salted Pumpkin
Instant Noodles - Chicken Satay Flavor
Acne & Oil Control Face Wash
Atta Chalan - Stainless Steel, Size- No.8
Dog Supplement - Absolute Skin + Coat Tablet
Product not found
