## **Recommender System**

Often, we see ads of products similar to the products we just bought, like or view. Of course that does not happen magically or we might think there is someone doing that manually. some other examples can bee seen in online music player, youtube, netflix, etc., when we see recommendation based on what we already listened, watched, or liked. In reality, that occurance is caused by machine learning algorithms called recommender system. 

So, in order assure that we targetted product promotions to the right customers, recommender system may be the proper solution. In this notebook, the implementation of recommender system using **Content Based and Collaborative Filtering** will be built. This can be used in e-commerce so that the system will recommend product based on product similarity. In this case, the system will focus to only fashion products.

Dataset used: https://www.kaggle.com/datasets/aldoattallah/fashion-ecommerce

### **Content-based Filtering**

**Import Packages**

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# pd.set_option("display.max_columns", None)

**Data Loading**

In [2]:
df = pd.read_csv("fashion-dataset-clean.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,transaction_id,payment_method,payment_status,promo_amount,shipment_fee,total_amount,customer_id,first_name,last_name,...,quantity,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,0,186e2bee-0637-4710-8981-50c2d737bc42,Debit Card,Success,1415,10000,199832,5868,Titin,Pratiwi,...,1,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
1,1,81f60282-96c5-45f5-8a24-18e8111ccd08,OVO,Success,0,10000,232512,82831,Ibrani,Thamrin,...,1,222512,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
2,2,a144e124-1ad6-425b-9f64-b01f05c697ff,Gopay,Success,0,10000,255159,47013,Sabar,Saragih,...,1,245159,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
3,3,e620a19d-982d-4fc2-9715-29fda7f42269,Credit Card,Success,0,0,263371,17135,Sabri,Wacana,...,1,263371,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
4,4,bbe1053a-9738-4438-bea4-0a3abcaf6afb,Gopay,Success,0,10000,2413496,70185,Erik,Prasetyo,...,8,300437,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual


**Filter Products-related Columns**

In [3]:
products = df.loc[:,"product_gender":].drop("quantity", axis=1)
products

Unnamed: 0,product_gender,product_id,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,Men,54728,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
1,Men,54728,222512,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
2,Men,54728,245159,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
3,Men,54728,263371,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
4,Men,54728,300437,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
...,...,...,...,...,...,...,...,...,...,...,...
851382,Women,52323,55893,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851383,Women,52323,341441,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851384,Women,52323,319042,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851385,Women,52323,477400,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual


**Duplicate Check**

In [4]:
products.duplicated().sum()

25

In [5]:
products.drop_duplicates(keep='first', inplace=True)
products.duplicated().sum()

0

Unique Value Checking

In [6]:
print(f"Product Name: {len(products.product_name.unique())}")
print(f"Product ID: {len(products.product_id.unique())}")


Product Name: 31094
Product ID: 44384


For recommender system using content-based filtering, We are going to drop duplicated product_name and keep only the first one in the data.

In [7]:
products_clean = products.drop_duplicates(subset='product_name', keep='first')
products_clean

Unnamed: 0,product_gender,product_id,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,Men,54728,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
17,Men,16193,145526,Puma Men Knitted Vest Green Sweater,Apparel,Topwear,Sweaters,Green,Fall,2011,Casual
41,Women,53686,135174,Kiara Women Purple & Yellow Handbag,Accessories,Bags,Handbags,Purple,Summer,2012,Casual
62,Women,20228,271012,Wrangler Women Cable Red Sweater,Apparel,Topwear,Sweaters,Red,Fall,2011,Casual
81,Women,55220,198753,Lakme Absolute Forever Silk Chestnut Lip Liner 03,Personal Care,Lips,Lip Liner,Brown,Spring,2017,Casual
...,...,...,...,...,...,...,...,...,...,...,...
851284,Women,56009,225237,Colorbar Velvet Matte Diva Lipstick 02 M,Personal Care,Lips,Lipstick,Red,Spring,2017,Casual
851292,Men,15020,372619,ADIDAS Men Solid Maroon Jackets,Apparel,Topwear,Jackets,Maroon,Fall,2011,Sports
851317,Men,38577,311271,Nike Men Breakline Navy Blue Track Pants,Apparel,Bottomwear,Track Pants,Navy Blue,Summer,2012,Sports
851327,Men,6965,328629,s.Oliver Men's Generous Grey T-shirt,Apparel,Topwear,Tshirts,Grey,Summer,2011,Casual


Now we can proceed with deleteing unnecessary columns

In [8]:
products_clean = products_clean.drop(['product_id', 'item_price', 'year', 'article_type', 'base_color', 'season', 'usage'], axis=1)
products_clean.head()

Unnamed: 0,product_gender,product_name,master_category,sub_category
0,Men,Vans Men Black Shoes,Footwear,Shoes
17,Men,Puma Men Knitted Vest Green Sweater,Apparel,Topwear
41,Women,Kiara Women Purple & Yellow Handbag,Accessories,Bags
62,Women,Wrangler Women Cable Red Sweater,Apparel,Topwear
81,Women,Lakme Absolute Forever Silk Chestnut Lip Liner 03,Personal Care,Lips


Since the product attribute is already separated in multiple columns, we can just go ahead creating data dummies for encoding.

In [9]:
products_dummies = pd.get_dummies(products_clean, columns=['product_gender','master_category', 'sub_category'], dtype=int).reset_index(drop=True)


In [10]:
products_dummies.head()

Unnamed: 0,product_name,product_gender_Boys,product_gender_Girls,product_gender_Men,product_gender_Unisex,product_gender_Women,master_category_Accessories,master_category_Apparel,master_category_Footwear,master_category_Free Items,...,sub_category_Sports Equipment,sub_category_Stoles,sub_category_Ties,sub_category_Topwear,sub_category_Umbrellas,sub_category_Vouchers,sub_category_Wallets,sub_category_Watches,sub_category_Water Bottle,sub_category_Wristbands
0,Vans Men Black Shoes,0,0,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Puma Men Knitted Vest Green Sweater,0,0,1,0,0,0,1,0,0,...,0,0,0,1,0,0,0,0,0,0
2,Kiara Women Purple & Yellow Handbag,0,0,0,0,1,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Wrangler Women Cable Red Sweater,0,0,0,0,1,0,1,0,0,...,0,0,0,1,0,0,0,0,0,0
4,Lakme Absolute Forever Silk Chestnut Lip Liner 03,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Saving Matrix

In [11]:
products_dummies.to_csv('deployment/products_matrix.csv')

Model Function

In [12]:
def product_based(product):
    '''
    This function receives product name and return some similar products to recommend
    '''
    def cosine_sim(vect1,vect2):
        '''
        this function is built inside the parent function. This return cosine similarity of two vectors
        '''
        norm_1 = np.linalg.norm(vect1)
        norm_2 = np.linalg.norm(vect2)

        cos_sim = (vect1 @ vect2) / (norm_1 * norm_2)
        return cos_sim
    
    # matrix sereis
    product_matrix = pd.read_csv("deployment/products_matrix.csv", index_col='product_name')
    # consine similarity series
    cossim = pd.Series([cosine_sim(product_matrix.loc[product], x) for x in product_matrix.values],
                       index=product_matrix.index).drop(index=product)
    print(f'You like {product}, you may also like:')
    for i,pro in enumerate(cossim.sort_values(ascending=False)[:10].index):
        print(f'{i+1}. {pro}')

Usage

In [43]:
product_based("Kiara Women Purple & Yellow Handbag")

You like Kiara Women Purple & Yellow Handbag, you may also like:
1. Royal Diadem Red Earrings
2. Royal Diadem Set of 2 Golden Bangles
3. Murcia Women Grey Casual Handbag
4. Rocia Women Gold Handbag
5. Catwalk Women Black Heels
6. Belmonte Men Bright Assorted Steel Cufflinks
7. FNF Green & Black Wedding Collection Sari
8. FNF Maroon Printed Sari
9. Baggit Women Beige Handbag
10. Catwalk Women Red Casual Shoes


In this example, the products, that are recommended, are similar with product "Kiara Women Purple & Yellow Handbag".

### **User-Based Filtering**

First, the user data needs to be sampled to only 100000 users data due to lack of memory capacity. 

In [14]:
users = df[['customer_id', 'product_name']].sample(100000, random_state=4).copy()
users

Unnamed: 0,customer_id,product_name
33035,34656,Nike Unisex Tennis White Caps
190348,13240,Chhota Bheem Kids Girl I Love to Play Green Ts...
595675,18699,Fabindia Men White Pyjamas
2501,17502,ADIDAS Women Grey Capris
584301,33852,Puma Men Ftpa Socks White Socks
...,...,...
79007,36369,Lino Perros Men Red Suspenders
412570,47588,John Players Men Check Orange Shirt
155983,40472,Nike Men Air Dictate MSL White Sports Shoes
69606,43145,Proline Men Navy Striped Polo T-shirt


Dropping Duplicates

In [15]:
users.drop_duplicates(keep='first', inplace=True)
users

Unnamed: 0,customer_id,product_name
33035,34656,Nike Unisex Tennis White Caps
190348,13240,Chhota Bheem Kids Girl I Love to Play Green Ts...
595675,18699,Fabindia Men White Pyjamas
2501,17502,ADIDAS Women Grey Capris
584301,33852,Puma Men Ftpa Socks White Socks
...,...,...
79007,36369,Lino Perros Men Red Suspenders
412570,47588,John Players Men Check Orange Shirt
155983,40472,Nike Men Air Dictate MSL White Sports Shoes
69606,43145,Proline Men Navy Striped Polo T-shirt


Adding label for Value assigning

In [16]:
users['Buy']=1
users

Unnamed: 0,customer_id,product_name,Buy
33035,34656,Nike Unisex Tennis White Caps,1
190348,13240,Chhota Bheem Kids Girl I Love to Play Green Ts...,1
595675,18699,Fabindia Men White Pyjamas,1
2501,17502,ADIDAS Women Grey Capris,1
584301,33852,Puma Men Ftpa Socks White Socks,1
...,...,...,...
79007,36369,Lino Perros Men Red Suspenders,1
412570,47588,John Players Men Check Orange Shirt,1
155983,40472,Nike Men Air Dictate MSL White Sports Shoes,1
69606,43145,Proline Men Navy Striped Polo T-shirt,1


Create Pivot Table

In [17]:
pivot = pd.pivot_table(users, values='Buy', columns='product_name', index='customer_id') # creating pivot table to create matrix
pivot

product_name,109F Blue A-Line Dress,109F Red & White A-Line Dress,109F Women Beige Embroidered Top,109F Women Black & Cream Dress,109F Women Black & Cream-Coloured Colourblocked Printed Tunic,109F Women Black & White Top,109F Women Black Embellished Tunic,109F Women Black Printed Kaftan Tunic,109F Women Black Printed Tunic,109F Women Blue & Green Polka Dot Print Tunic,...,s.Oliver Women's Blue Tops,s.Oliver Women's Department Refined Red T-shirt,s.Oliver Women's Green Blouse Shirt,s.Oliver Women's Purple Blouse Top,s.Oliver Women's Sky Blue Top,s.Oliver Women's Striped Light Blue Top,s.Oliver Women's Tank Brown Top,s.Oliver Women's White Blouse Top,test dispName,united Colors Of Benetton Women Grey Tight
customer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
3,,,,,,,,,,,...,,,,,,,,,,
15,,,,,,,,,,,...,,,,,,,,,,
17,,,,,,,,,,,...,,,,,,,,,,
18,,,,,,,,,,,...,,,,,,,,,,
20,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99989,,,,,,,,,,,...,,,,,,,,,,
99991,,,,,,,,,,,...,,,,,,,,,,
99992,,,,,,,,,,,...,,,,,,,,,,
99995,,,,,,,,,,,...,,,,,,,,,,


Creating Normlaized Matrix

In [18]:
normalized_matrix = pivot.divide(pivot.mean(axis=1), axis=0).fillna(0) # normalisasi matrix
normalized_matrix

product_name,109F Blue A-Line Dress,109F Red & White A-Line Dress,109F Women Beige Embroidered Top,109F Women Black & Cream Dress,109F Women Black & Cream-Coloured Colourblocked Printed Tunic,109F Women Black & White Top,109F Women Black Embellished Tunic,109F Women Black Printed Kaftan Tunic,109F Women Black Printed Tunic,109F Women Blue & Green Polka Dot Print Tunic,...,s.Oliver Women's Blue Tops,s.Oliver Women's Department Refined Red T-shirt,s.Oliver Women's Green Blouse Shirt,s.Oliver Women's Purple Blouse Top,s.Oliver Women's Sky Blue Top,s.Oliver Women's Striped Light Blue Top,s.Oliver Women's Tank Brown Top,s.Oliver Women's White Blouse Top,test dispName,united Colors Of Benetton Women Grey Tight
customer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
20,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
99991,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
99992,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
99995,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Users Similarity Table

In [19]:
cossim = cosine_similarity(normalized_matrix) # using consine_similarity 
users_data = pd.DataFrame(cossim, index=normalized_matrix.index, columns=normalized_matrix.index) # matrix dataframe
users_data.sample(5, axis=1).round(2)

customer_id,96435,22686,19856,60365,98052
customer_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,0.0,0.0,0.0,0.0,0.0
15,0.0,0.0,0.0,0.0,0.0
17,0.0,0.0,0.0,0.0,0.0
18,0.0,0.0,0.0,0.0,0.0
20,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...
99989,0.0,0.0,0.0,0.0,0.0
99991,0.0,0.0,0.0,0.0,0.0
99992,0.0,0.0,0.0,0.0,0.0
99995,0.0,0.0,0.0,0.0,0.0


Saving to h5 to compress file size

In [20]:
# using .h5 to reduce file size
users_data.to_hdf('deployment/users_matrix.h5', key='df', mode='w', complevel=9)

Saving Reference File

In [21]:
# creating file for references
cust_product = df[['customer_id', 'product_name']].copy()
cust_product.to_csv('deployment/data_cust_product.csv') 

For similar users, it's only selected the top first position of consine similarity.

In [40]:
def collaborative(id):
    sim_users = users_data[id].sort_values(ascending=False)[1:].index[0]
    data = cust_product[(cust_product['customer_id'] == sim_users)].product_name[:10]
    print(sim_users)
    return data.values

Usage

In [42]:
collaborative(22686)

55664


array(['Arrow Men Cream Cap', 'Puma Men Cell Tolero Black Casual Shoes',
       'Alma Women Black Kurta',
       'Fila Women Grey Speed Lite Sports Shoes',
       'Wrangler Men Bull Rider Grey T-shirt',
       'Gini and Jony Boys Check Blue Shirt',
       'Peter England Unisex Statements Black Passport Holder',
       'U.S. Polo Assn. Men Checks Navy Blue Shirt',
       'Ed Hardy Love & Luck Women Mermaid Fragrance Gift Set',
       'Estd. 1977 Men Brown Sandals'], dtype=object)

In this example, user with id 22686, have similarity with user with id 55664. The items that user 55664 bought will be recommended to user 55664 