## **Recommender System**

We often see ads for products similar to ones we have just bought, liked, or viewed. This does not happen by magic, and nobody is picking those ads by hand. The same thing happens on online music players, YouTube, Netflix, and similar services, which recommend content based on what we have already listened to, watched, or liked. These recommendations are produced by machine learning algorithms known as recommender systems.

To make sure product promotions are targeted at the right customers, a recommender system can be a suitable solution. This notebook builds a recommender system using **Content-Based Filtering**. It could be used in e-commerce so that the system recommends products based on product similarity. Here, the system focuses only on fashion products.

Dataset used: https://www.kaggle.com/datasets/aldoattallah/fashion-ecommerce

**Import Packages**

In [60]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pd.set_option("display.max_columns", None)

**Data Loading**

In [61]:
df = pd.read_csv("fashion-dataset-clean.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,transaction_id,payment_method,payment_status,promo_amount,shipment_fee,total_amount,customer_id,first_name,last_name,customer_gender,device_type,home_location_lat,home_location_long,home_location,age,product_gender,product_id,quantity,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,0,186e2bee-0637-4710-8981-50c2d737bc42,Debit Card,Success,1415,10000,199832,5868,Titin,Pratiwi,F,Android,-6.122897,106.8765,Jakarta Raya,23,Men,54728,1,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
1,1,81f60282-96c5-45f5-8a24-18e8111ccd08,OVO,Success,0,10000,232512,82831,Ibrani,Thamrin,M,Android,-2.764594,116.34979,Kalimantan Selatan,22,Men,54728,1,222512,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
2,2,a144e124-1ad6-425b-9f64-b01f05c697ff,Gopay,Success,0,10000,255159,47013,Sabar,Saragih,M,Android,-7.98389,110.658139,Yogyakarta,25,Men,54728,1,245159,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
3,3,e620a19d-982d-4fc2-9715-29fda7f42269,Credit Card,Success,0,0,263371,17135,Sabri,Wacana,M,iOS,0.92124,121.202319,Sulawesi Tengah,13,Men,54728,1,263371,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
4,4,bbe1053a-9738-4438-bea4-0a3abcaf6afb,Gopay,Success,0,10000,2413496,70185,Erik,Prasetyo,M,Android,-1.272805,100.850657,Sumatera Barat,27,Men,54728,8,300437,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual


**Filter Products-related Columns**

In [62]:
products = df.loc[:,"product_gender":].drop("quantity", axis=1)
products

Unnamed: 0,product_gender,product_id,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,Men,54728,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
1,Men,54728,222512,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
2,Men,54728,245159,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
3,Men,54728,263371,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
4,Men,54728,300437,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
...,...,...,...,...,...,...,...,...,...,...,...
851382,Women,52323,55893,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851383,Women,52323,341441,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851384,Women,52323,319042,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual
851385,Women,52323,477400,ToniQ Women Off White Bangle,Accessories,Jewellery,Bangle,White,Winter,2016,Casual


**Duplicate Check**

In [63]:
products.duplicated().sum()

25

In [64]:
products.drop_duplicates(keep='first', inplace=True)
products.duplicated().sum()

0

**Unique Value Check**

In [65]:
print(f"Product Name: {len(products.product_name.unique())}")
print(f"Product ID: {len(products.product_id.unique())}")


Product Name: 31094
Product ID: 44384
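
The counts above show more product IDs than product names, so some names must be shared by several IDs. A minimal sketch of how one might inspect that overlap before dropping duplicates, using a small made-up frame (`toy`, its rows, and the variable names are illustrative assumptions, not data from the notebook):

```python
import pandas as pd

# Toy data, made up for illustration; only the column names mirror the notebook.
toy = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "product_name": [
        "Brand Men Black Shoes",
        "Brand Men Black Shoes",
        "Brand Women Red Sweater",
        "Brand Men Brown Wallet",
    ],
})

# Number of distinct IDs behind each name; values above 1 mark shared names.
ids_per_name = toy.groupby("product_name")["product_id"].nunique()
shared_names = ids_per_name[ids_per_name > 1]

# Keep only the first row for each name.
toy_clean = toy.drop_duplicates(subset="product_name", keep="first")
```

Keeping the first row per name is a simple choice; it silently discards the other IDs that carry the same name, which is acceptable here only because recommendations are shown by name.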


Since we are building the recommender system with content-based filtering, we drop rows with a duplicated product_name and keep only the first occurrence of each name.

In [66]:
products_clean = products.drop_duplicates(subset='product_name', keep='first')
products_clean

Unnamed: 0,product_gender,product_id,item_price,product_name,master_category,sub_category,article_type,base_color,season,year,usage
0,Men,54728,191247,Vans Men Black Shoes,Footwear,Shoes,Casual Shoes,Black,Summer,2012,Casual
17,Men,16193,145526,Puma Men Knitted Vest Green Sweater,Apparel,Topwear,Sweaters,Green,Fall,2011,Casual
41,Women,53686,135174,Kiara Women Purple & Yellow Handbag,Accessories,Bags,Handbags,Purple,Summer,2012,Casual
62,Women,20228,271012,Wrangler Women Cable Red Sweater,Apparel,Topwear,Sweaters,Red,Fall,2011,Casual
81,Women,55220,198753,Lakme Absolute Forever Silk Chestnut Lip Liner 03,Personal Care,Lips,Lip Liner,Brown,Spring,2017,Casual
...,...,...,...,...,...,...,...,...,...,...,...
851284,Women,56009,225237,Colorbar Velvet Matte Diva Lipstick 02 M,Personal Care,Lips,Lipstick,Red,Spring,2017,Casual
851292,Men,15020,372619,ADIDAS Men Solid Maroon Jackets,Apparel,Topwear,Jackets,Maroon,Fall,2011,Sports
851317,Men,38577,311271,Nike Men Breakline Navy Blue Track Pants,Apparel,Bottomwear,Track Pants,Navy Blue,Summer,2012,Sports
851327,Men,6965,328629,s.Oliver Men's Generous Grey T-shirt,Apparel,Topwear,Tshirts,Grey,Summer,2011,Casual


Now we can proceed with deleting the unnecessary columns.

In [67]:
products_clean = products_clean.drop(['product_id', 'item_price', 'year', 'article_type', 'base_color', 'season', 'usage'], axis=1)
products_clean.head()

Unnamed: 0,product_gender,product_name,master_category,sub_category
0,Men,Vans Men Black Shoes,Footwear,Shoes
17,Men,Puma Men Knitted Vest Green Sweater,Apparel,Topwear
41,Women,Kiara Women Purple & Yellow Handbag,Accessories,Bags
62,Women,Wrangler Women Cable Red Sweater,Apparel,Topwear
81,Women,Lakme Absolute Forever Silk Chestnut Lip Liner 03,Personal Care,Lips


Since the product attributes are already separated into multiple columns, we can go straight to creating dummy variables (one-hot encoding).

In [69]:
products_dummies = pd.get_dummies(products_clean, columns=['product_gender','master_category', 'sub_category'], dtype=int).reset_index(drop=True)


In [70]:
products_dummies.head()

Unnamed: 0,product_name,product_gender_Boys,product_gender_Girls,product_gender_Men,product_gender_Unisex,product_gender_Women,master_category_Accessories,master_category_Apparel,master_category_Footwear,master_category_Free Items,master_category_Home,master_category_Personal Care,master_category_Sporting Goods,sub_category_Accessories,sub_category_Apparel Set,sub_category_Bags,sub_category_Bath and Body,sub_category_Beauty Accessories,sub_category_Belts,sub_category_Bottomwear,sub_category_Cufflinks,sub_category_Dress,sub_category_Eyes,sub_category_Eyewear,sub_category_Flip Flops,sub_category_Fragrance,sub_category_Free Gifts,sub_category_Gloves,sub_category_Hair,sub_category_Headwear,sub_category_Home Furnishing,sub_category_Innerwear,sub_category_Jewellery,sub_category_Lips,sub_category_Loungewear and Nightwear,sub_category_Makeup,sub_category_Mufflers,sub_category_Nails,sub_category_Perfumes,sub_category_Sandal,sub_category_Saree,sub_category_Scarves,sub_category_Shoe Accessories,sub_category_Shoes,sub_category_Skin,sub_category_Skin Care,sub_category_Socks,sub_category_Sports Accessories,sub_category_Sports Equipment,sub_category_Stoles,sub_category_Ties,sub_category_Topwear,sub_category_Umbrellas,sub_category_Vouchers,sub_category_Wallets,sub_category_Watches,sub_category_Water Bottle,sub_category_Wristbands
0,Vans Men Black Shoes,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Puma Men Knitted Vest Green Sweater,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,Kiara Women Purple & Yellow Handbag,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Wrangler Women Cable Red Sweater,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
4,Lakme Absolute Forever Silk Chestnut Lip Liner 03,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
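
To make the encoding concrete: with three categorical columns and no missing values, every encoded row contains exactly three 1s, so the dot product of two rows counts their shared attributes and the cosine similarity is that count divided by 3. A minimal sketch on made-up products (the `toy` frame and its values are assumptions for illustration only):

```python
import pandas as pd

# Made-up products; the column names follow the notebook.
toy = pd.DataFrame({
    "product_name": ["A Men Wallet", "B Men Belt", "C Women Handbag"],
    "product_gender": ["Men", "Men", "Women"],
    "master_category": ["Accessories", "Accessories", "Accessories"],
    "sub_category": ["Wallets", "Belts", "Bags"],
})

# One-hot encode the three attribute columns, as in the notebook.
toy_dummies = pd.get_dummies(
    toy,
    columns=["product_gender", "master_category", "sub_category"],
    dtype=int,
)
vectors = toy_dummies.set_index("product_name")

# The dot product of two one-hot rows counts the attributes they share.
shared = int(vectors.loc["A Men Wallet"] @ vectors.loc["B Men Belt"])

# Each row has norm sqrt(3), so cosine similarity is shared / 3.
cosine = shared / 3
```

One consequence of this encoding is that many products tie at the same score, so the order among products with identical attributes is arbitrary.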


Save the dummy-encoded data to a file

In [71]:
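# index=False keeps the row index out of the CSV, so it is not read back later as an extra feature column
products_dummies.to_csv('products_vector.csv', index=False)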

In [72]:
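def recommend_system(product):
    """Print the 10 products most similar to `product` by cosine similarity."""
    def cosine_sim(vect1,vect2):
        # Cosine similarity: dot product divided by the product of the vector norms
        norm_1 = np.linalg.norm(vect1)
        norm_2 = np.linalg.norm(vect2)

        cos_sim = (vect1 @ vect2) / (norm_1 * norm_2)
        return cos_sim
    # Load the saved product vectors, indexed by product name
    product_vector = pd.read_csv("products_vector.csv", index_col='product_name')
    # Score every product against the query, then remove the query itself
    cossim = pd.Series([cosine_sim(product_vector.loc[product], x) for x in product_vector.values],
                       index=product_vector.index).drop(index=product)
    print(f'You like {product}, you may also like:')
    # Show the ten highest-scoring products
    for i,pro in enumerate(cossim.sort_values(ascending=False)[:10].index):
        print(f'{i+1}. {pro}')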

In [82]:
recommend_system("Reid & Taylor Men Casual Brown Wallets")

You like Reid & Taylor Men Casual Brown Wallets, you may also like:
1. United Colors of Benetton Men Solid Green Wallets
2. Reid & Taylor Men Casual Black Wallets
3. Van Heusen Men Blue Wallet
4. American Tourister Men Black Slim Fold Wallet
5. Hidekraft Men Dark Brown Wallet
6. Reid & Taylor Men Solid Brown Wallets
7. Arrow Men Two Toned Wallet
8. OTLS Men Laurel Brown Wallet
9. New Hide Men Brown Wallet
10. Allen Solly Men Black Card Holder
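
The function above scores products one at a time in a Python loop. As a possible alternative, the `cosine_similarity` function imported at the top of the notebook can score all products in one call. The sketch below is not the notebook's implementation: `recommend_vectorized`, its parameters, and the toy vectors are hypothetical names introduced for illustration.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def recommend_vectorized(product, vectors, top_n=10):
    """Hypothetical helper: return the top_n most similar product names with scores.

    `vectors` is assumed to be a DataFrame of numeric features indexed by product name.
    """
    # Query as a 1 x n_features array, compared against every row at once.
    query = vectors.loc[[product]].to_numpy()
    scores = cosine_similarity(query, vectors.to_numpy())[0]
    # Attach names to the scores and drop the query product itself.
    ranked = pd.Series(scores, index=vectors.index).drop(index=product)
    return ranked.sort_values(ascending=False).head(top_n)


# Made-up one-hot vectors for illustration only.
toy_vectors = pd.DataFrame(
    [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 1]],
    index=["Wallet A", "Wallet B", "Belt C", "Handbag D"],
    columns=["gender_Men", "gender_Women", "sub_Wallets", "sub_Other"],
)

top = recommend_vectorized("Wallet A", toy_vectors, top_n=2)
```

Returning a Series instead of printing keeps the scores available to the caller, which makes ties and cut-offs easier to inspect.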


Since this recommender system is built from plain Python functions, it can be used elsewhere by importing it from model.py.