# [Content-Based Filtering](https://medium.com/mlearning-ai/recommendation-systems-content-based-filtering-e19e3b0a309e)

In [1]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [2]:
products = pd.read_csv('Accessories_Final.csv', usecols=['name', 'features'])
products

Unnamed: 0,name,features
0,Honor X5 Bluetooth Earbuds - White,"\n- Press control\n- Proximity discovery, dual..."
1,"Oraimo Wireless Earphones, Black - OEB-E03D",NONE
2,Nothing Ear 2 Wireless Earphones - Black,-Built-in 13mm drivers dilvers dynamic sound
3,"JBL Tour Pro Plus Wireless Earphones, Black",\n- Driver size: 6.8mm \n- IPX5 waterproof\n- ...
4,Xiaomi Redmi Buds 4 Active Wireless Earphones ...,NONE
...,...,...
10982,Wild Wolf Hunter Mouse Pad,Brand: Wild Wolf\nModel: Hunter\n\nType: Mouse...
10983,Logitech Mouse Pad (Copy),Brand: Logitech (Copy)\n\nType: Mouse Pad\n\nH...
10984,Mouse Pad With Rest,Brand: Generic\n\nType: Mouse Pad\n\n\nDimensi...
10985,Protective Sticker For Laptop Keyboard 15-17,Protects Your Laptop Screen from Scratches and...


In [3]:
products['features']

0        \n- Press control\n- Proximity discovery, dual...
1                                                     NONE
2             -Built-in 13mm drivers dilvers dynamic sound
3        \n- Driver size: 6.8mm \n- IPX5 waterproof\n- ...
4                                                     NONE
                               ...                        
10982    Brand: Wild Wolf\nModel: Hunter\n\nType: Mouse...
10983    Brand: Logitech (Copy)\n\nType: Mouse Pad\n\nH...
10984    Brand: Generic\n\nType: Mouse Pad\n\n\nDimensi...
10985    Protects Your Laptop Screen from Scratches and...
10986    Brand: Other\n\nType: Mouse Pad\n\nDimensions:...
Name: features, Length: 10987, dtype: object

## TF-IDF Vectorization of Product Features
The product feature text is converted to TF-IDF (Term Frequency-Inverse Document Frequency) vectors with scikit-learn's `TfidfVectorizer`. Setting `stop_words='english'` drops common English stop words during vectorization. The `fillna("")` call replaces missing values in the `features` column with empty strings, since the vectorizer expects every document to be a string. Rows that hold the literal string `NONE`, as in the output above, are not missing values as far as pandas is concerned, so `fillna` leaves them unchanged. The resulting sparse matrix, `tfidf_matrix`, has one row per product and one column per vocabulary term, and is the input to the similarity computation in the next step.

In [4]:
tfidf = TfidfVectorizer(stop_words='english')
products['features'] = products['features'].fillna("")
tfidf_matrix = tfidf.fit_transform(products['features'])

## Computing Cosine Similarity Matrix
The pairwise similarity matrix is computed with scikit-learn's `linear_kernel`, which takes the dot product between every pair of rows of the TF-IDF matrix. Because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), these dot products are equal to the cosine similarities. The result, stored in `cosine_similarity`, is a dense 10987 × 10987 array in which entry (i, j) scores how closely the feature text of product i matches that of product j. The recommendation function further down ranks products by these scores.

In [5]:
cosine_similarity = linear_kernel(tfidf_matrix, tfidf_matrix)
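The claim that a plain dot product is enough here can be checked on a small example. The sketch below uses made-up documents and imports scikit-learn's own `cosine_similarity` under the alias `sk_cosine_similarity`, because the notebook already uses `cosine_similarity` as a variable name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity as sk_cosine_similarity
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical documents, not taken from the dataset
docs = [
    "bluetooth wireless earbuds",
    "bluetooth speaker waterproof",
    "rubber mouse pad",
]

# Rows are L2-normalized because norm='l2' is the TfidfVectorizer default
matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)

dot = linear_kernel(matrix, matrix)         # plain dot products
cos = sk_cosine_similarity(matrix, matrix)  # normalizes again, then dot products

same = np.allclose(dot, cos)
print(same)
```

`linear_kernel` skips the extra normalization pass, which is the usual reason to prefer it on TF-IDF output.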

## Creating Product Index Series

In [6]:
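# Map each product name to its row position in `products`.
# Note: drop_duplicates() compares the Series values (the row positions), which are
# already unique, so products that share a name are all kept.
indecies = pd.Series(products.index, index = products['name']).drop_duplicates()
indecies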

name
Honor X5 Bluetooth Earbuds - White                           0
Oraimo Wireless Earphones, Black - OEB-E03D                  1
Nothing Ear 2 Wireless Earphones - Black                     2
JBL Tour Pro Plus Wireless Earphones, Black                  3
Xiaomi Redmi Buds 4 Active Wireless Earphones - Black        4
                                                         ...  
Wild Wolf Hunter Mouse Pad                               10982
Logitech Mouse Pad (Copy)                                10983
Mouse Pad With Rest                                      10984
Protective Sticker For Laptop Keyboard 15-17             10985
Mouse Pad Normal Color                                   10986
Length: 10987, dtype: int64

In [7]:
indecies['HK8 PRO MAX ULTRA Smartwatch AMOLED Screen 2.12 Inch 485 * 520 Pixels - Wearfit PRO - Orange']

10738
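One caveat with a name-to-position Series is repeated product names: `drop_duplicates()` looks at the values, not at the index labels. The sketch below uses a made-up three-row frame (`toy`, `lookup`, and `deduped` are illustrative names) to show the difference and one possible way to keep only the first row per name with `Index.duplicated`.

```python
import pandas as pd

# Hypothetical frame in which two products share a name
toy = pd.DataFrame({"name": ["Mouse Pad", "Keyboard", "Mouse Pad"]})
lookup = pd.Series(toy.index, index=toy["name"])

# The values 0, 1, 2 are all distinct, so nothing is dropped
unchanged = lookup.drop_duplicates()

# Index.duplicated() compares the names instead; keep the first row per name
deduped = lookup[~lookup.index.duplicated(keep="first")]

print(len(unchanged), len(deduped))  # 3 2
print(type(unchanged["Mouse Pad"]))  # a Series holding both positions
print(deduped["Mouse Pad"])          # 0
```

With a repeated name, the plain lookup returns a Series rather than a single integer, which changes what a later positional lookup such as `cosine_similarity[idx]` selects.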

## Product Recommendation Function Based on Cosine Similarity


In [8]:
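def get_recommendation(name, cosine_similarity = cosine_similarity):
    # Row position of the requested product
    idx = indecies[name]
    # Pair every row position with its similarity to the requested product
    sim_scores = enumerate(cosine_similarity[idx])
    # Highest similarity first
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse = True)
    # Skip the first entry (normally the product itself) and keep the next ten
    sim_scores = sim_scores[1:11]
    for i in sim_scores:
        print(i)
    sim_index = [i[0] for i in sim_scores]
    print(products['name'].iloc[sim_index])

get_recommendation('Honor X5 Bluetooth Earbuds - White')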

(4361, 0.22820825154777793)
(120, 0.19086389725617797)
(45, 0.18886992567078797)
(1123, 0.17702746123598212)
(10, 0.17383017033553902)
(11, 0.17275437622248924)
(6083, 0.16236456282367312)
(320, 0.15989139507026948)
(7704, 0.1591794656166611)
(691, 0.1568554637626664)
4361    XT90 Wireless BT Waterproof Headphone With Tou...
120        In Ear Wired Earphone with Mic, White - G-H6YN
45      Lenovo Thinkplus In-Ear Wireless Earphones wit...
1123         X8 Ultra Plus Smart Watch, 2.08 Inch - Black
10      Anker Soundcore Liberty 3 Pro Bluetooth Earpho...
11      Anker Soundcore Liberty 3 Pro Bluetooth Earpho...
6083    VIDVIE (HS653) In Ear Headset 3.5mm Wired Hand...
320     Wired Metal Gaming Keyboard And Mouse Combo - ...
7704    Honor Choice Earbuds X5 ANC Active Noise Cance...
691     Microsoft Ergonomic Wired Mouse, Black - RJG-0...
Name: name, dtype: object
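The function above prints its results and assumes the requested product sorts first in its own similarity row. As an alternative sketch, not the notebook's method, the hypothetical `recommend` function below returns a DataFrame, excludes the requested product by position, and computes only the one similarity row it needs instead of reading from a precomputed square matrix. The toy frame and all names in it are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def recommend(name, frame, matrix, top_n=10):
    """Return the top_n rows of `frame` most similar to the product called `name`."""
    matches = np.flatnonzero(frame["name"].to_numpy() == name)
    if len(matches) == 0:
        raise KeyError(f"unknown product name: {name!r}")
    idx = matches[0]  # first row with that name

    # Similarity of this one product to every product (a single row, not n x n)
    scores = linear_kernel(matrix[[idx]], matrix).ravel()

    # Sort by descending score, then drop the product itself by position
    order = np.argsort(-scores, kind="stable")
    order = order[order != idx][:top_n]
    return pd.DataFrame(
        {"name": frame["name"].iloc[order].to_numpy(), "score": scores[order]},
        index=order,
    )


# Hypothetical data: two products with identical feature text
toy = pd.DataFrame({
    "name": ["Earbuds A", "Earbuds B", "Mouse Pad"],
    "features": ["bluetooth wireless earbuds", "bluetooth wireless earbuds", "rubber mouse pad"],
})
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["features"])

result = recommend("Earbuds B", toy, toy_matrix, top_n=2)
print(result)
```

Here "Earbuds A" and "Earbuds B" tie at the top of the row for "Earbuds B", so slicing off the first sorted entry would discard "Earbuds A" and recommend "Earbuds B" to itself; filtering by position avoids that. Under the same assumptions the function could be called with `laptops_products` or `mobiles_products` and their TF-IDF matrices.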


In [9]:
laptops = pd.read_csv('LaptopsFinal.csv')
laptops.head()

Unnamed: 0,brand,name,model-name,image-src,price,sale,rate,no-reviews,display-size,ram,HDD,processor,graphics-card,link-href,operating-system,SSD,rate-weight,site
0,Lenovo,"Lenovo Legion 7 82n600Q3ED Ryzen 9 5900HX, 32G...","Lenovo Legion 7 82n600Q3ED Ryzen 9 5900HX, 32G...",https://dream2000.com/media/catalog/product/ca...,95810.0,11.53,0.0,0.0,16 inches,32 GP,1TB,AMD,NVIDIA® GeForce RTX,https://dream2000.com/en/lenovo-legion-7-82n60...,NONE,NONE,0.0,dream2000
1,Hp,"Hp Pavilion Laptop, Intel® Core™ i5-1135G7 , 8...","Hp Pavilion Laptop, Intel® Core™ i5-1135G7 , 8...",https://dream2000.com/media/catalog/product/ca...,27300.0,11.36,0.0,0.0,15.6 inches,8 GP,512GB,intel Core i5,NVIDIA® GeForce MX,https://dream2000.com/en/hp-pavilion-laptop-co...,NONE,NONE,0.0,dream2000
2,HP,HP ProBook 450 G9 Intel® Core™ i7-1255U - 8G -...,HP ProBook 450 G9 Intel® Core™ i7-1255U - 8G -...,https://dream2000.com/media/catalog/product/ca...,37000.0,11.48,0.0,0.0,15.6 inches,NONE,512GB,intel Core i7,NVIDIA® GeForce MX,https://dream2000.com/en/hp-probook-450-g9-int...,NONE,NONE,0.0,dream2000
3,Asus,Asus Zenbook UX5304VA-OLED517W Intel® Core™i7-...,Asus Zenbook UX5304VA-OLED517W Intel® Core™i7-...,https://dream2000.com/media/catalog/product/ca...,55499.0,11.48,0.0,0.0,13.3 inches,16 GP,512GB,NONE,Internal Intel card,https://dream2000.com/en/asus-zenbook-ux5304va...,Windows,NONE,0.0,dream2000
4,Asus,"Asus ROG Ally Rayzen Z1, 16GB, 512GB, AMD Rad...","Asus ROG Ally Rayzen Z1, 16GB, 512GB, AMD Rad...",https://dream2000.com/media/catalog/product/ca...,36999.0,11.48,0.0,0.0,NONE,16 GP,512GB,NONE,AMD Radeon™,https://dream2000.com/en/asus-rog-ally-rayzen-...,Windows,NONE,0.0,dream2000


In [10]:
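# Concatenate the spec columns into one text field. Note that `+` joins the strings
# without a separator, so neighbouring values fuse together (e.g. "inches32").
laptops['features'] = laptops['display-size'] + laptops['ram'] + laptops['HDD'] + laptops['processor'] + laptops['graphics-card'] + laptops['operating-system'] + laptops['SSD']
# .copy() so the later assignment to 'features' does not write into a slice of `laptops`
laptops_products = laptops[['name', 'features']].copy()
laptops_products.head()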

Unnamed: 0,name,features
0,"Lenovo Legion 7 82n600Q3ED Ryzen 9 5900HX, 32G...",16 inches32 GP1TBAMDNVIDIA® GeForce RTXNONENONE
1,"Hp Pavilion Laptop, Intel® Core™ i5-1135G7 , 8...",15.6 inches8 GP512GBintel Core i5NVIDIA® GeFor...
2,HP ProBook 450 G9 Intel® Core™ i7-1255U - 8G -...,15.6 inchesNONE512GBintel Core i7NVIDIA® GeFor...
3,Asus Zenbook UX5304VA-OLED517W Intel® Core™i7-...,13.3 inches16 GP512GBNONEInternal Intel cardWi...
4,"Asus ROG Ally Rayzen Z1, 16GB, 512GB, AMD Rad...",NONE16 GP512GBNONEAMD Radeon™WindowsNONE
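The fused strings in the output above matter for TF-IDF, because the vectorizer splits text on non-word characters and so treats a run like `GP1TBAMD` as one term. The sketch below shows one possible alternative: joining the columns with spaces and skipping placeholder values. The two-row frame, the `join_specs` helper, and the decision to treat the literal string `NONE` as missing are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical two-row stand-in for a few of the laptop spec columns
toy_laptops = pd.DataFrame({
    "display-size": ["16 inches", "NONE"],
    "ram": ["32 GP", "16 GP"],
    "processor": ["AMD", None],
})
spec_columns = ["display-size", "ram", "processor"]


def join_specs(row):
    """Join the usable spec values of one row with single spaces."""
    # Assumption: the literal string "NONE" is a missing-value placeholder
    parts = [str(v) for v in row if pd.notna(v) and str(v) != "NONE"]
    return " ".join(parts)


toy_laptops["features"] = toy_laptops[spec_columns].apply(join_specs, axis=1)
print(toy_laptops["features"].tolist())
# ['16 inches 32 GP AMD', '16 GP']

# Compare what the vectorizer's tokenizer sees in each case
analyzer = TfidfVectorizer(stop_words="english").build_analyzer()
fused_tokens = analyzer("16 inches32 GP1TBAMD")
spaced_tokens = analyzer(toy_laptops["features"].iloc[0])
print(fused_tokens)
print(spaced_tokens)
```

With a separator, values such as `AMD` or `32` become terms of their own and can match across products; without one they only match when the whole fused run is identical. Joining row by row also avoids the effect of `+`, where a single `NaN` in any column turns the whole concatenated string into `NaN`.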


In [11]:
tfidflaptop = TfidfVectorizer(stop_words='english')
laptops_products.loc[:, 'features'] = laptops_products['features'].fillna("")
tfidf_matrix_laptops = tfidflaptop.fit_transform(laptops_products['features'])

In [12]:
cosine_similarity_laptops = linear_kernel(tfidf_matrix_laptops, tfidf_matrix_laptops)

In [13]:
indecies_laptops = pd.Series(laptops_products.index, index = laptops_products['name']).drop_duplicates()
indecies_laptops

name
Lenovo Legion 7 82n600Q3ED Ryzen 9 5900HX, 32GB Ram, 2x 1TB SSD, NVIDIA GeForce RTX 3080 - Storm Grey                                                                                                      0
Hp Pavilion Laptop, Intel® Core™ i5-1135G7 , 8GB Ram, 512GB SSD, Nvidia MX350 2GB, 15.6", 15-EG0062NE                                                                                                      1
HP ProBook 450 G9 Intel® Core™ i7-1255U - 8G - 512G SSD - NVIDIA® GeForce® MX570 - 15.6" FHD - Silver                                                                                                      2
Asus Zenbook UX5304VA-OLED517W Intel® Core™i7-1355U,16GB, 512GB SSD, Intel® Iris Xe Graphics, 13.3" 2.8K OLED, Win11 - Silver                                                                              3
Asus ROG Ally Rayzen Z1, 16GB, 512GB,  AMD Radeon, 7 inch 120HZ, Win11 - RC71L-NH022W                                                                                          

In [14]:
indecies_laptops['HP Pavilion x360 -14-ek0010ne Intel® Core™ i7-1255U - 16GB - 512GB - Intel Iris Xe – 14" FHD IPS - Win 11 - Silver']

38

In [23]:
def get_recommendation_laptops(name, cosine_similarity_laptops = cosine_similarity_laptops):
    idx = indecies_laptops[name]
    sim_scores = enumerate(cosine_similarity_laptops[idx])
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse = True)
    sim_scores = sim_scores[1:11]
    for i in sim_scores:
        print(i)
    sim_index = [i[0] for i in sim_scores]
    print(laptops_products['name'].iloc[sim_index])
get_recommendation_laptops('Asus B9450FA-BM1050R Intel® Core™ i7-10510U, 16GB, 1TB SSD, UHD Graphics 620, 14" FHD, Win 10 - Black')

(29, 0.7109200665204017)
(25, 0.6926377205565998)
(69, 0.6926377205565998)
(21, 0.6379009872039112)
(22, 0.6379009872039112)
(54, 0.570825130493434)
(82, 0.5538710989791849)
(97, 0.5188353151206623)
(15, 0.4607222896299425)
(79, 0.4607222896299425)
29    HP 15S-FQ5042NE Intel® Core™i7-1255U - 16GB - ...
25    MSI Modern 14 B11M Intel® Core™ i7-1195G75 - 8...
69    Lenovo Flex 5 14IAU7 - 82R70055ED Intel® Core™...
21    Lenovo IdeaPad 1 15IAU7 82QD008NED Intel® Core...
22    Lenovo IdeaPad 1 15IAU7 82QD008LED Intel® Core...
54    HP 250 G7 Notebook PC , Intel® Core™ i3-1005G1...
82    Hp Envy 13 -ba1018ne Intel® Core™ i7-1165G7, 1...
97    Asus laptop X543MA-GQ001W Intel Celeron N4020,...
15    Asus B9400CEA-KC007R Intel® Core™ i7-1165G7G, ...
79    Asus UX3402ZA-OLED007W intel® Core™ i7-1260P, ...
Name: name, dtype: object


In [16]:
mobiles = pd.read_csv('MobilesFinal.csv')
mobiles.head()

Unnamed: 0,brand,name,model-name,img,price,sale,rate,no-rates,screen-size,ram,...,processor,battery,prim-cam,second-cam,SIM-count,network,operating-sys,link,rate-weight,site
0,Samsung,Samsung Galaxy S23 Ultra - 12GB RAM - 256GB,Samsung Galaxy S23 Ultra - 12GB RAM - 256GB,https://smhttp-ssl-73217.nexcesscdn.net/pub/me...,54999.0,12.7,0.0,0.0,6.8 inches,12 GB,...,Octa-core (1x3.36 GHz Cortex-X3 & 2x2.8 GHz Co...,5000 mAh,200 MP,12 MP,Dual,4G,"Android 13, One UI 5.1",https://2b.com.eg/en/samsung-galaxy-s23-ultra-...,0.0,2b
1,Vivo,Vivo Y35 - 8GB RAM - 128GB,Vivo Y35 - 8GB RAM - 128GB,https://smhttp-ssl-73217.nexcesscdn.net/pub/me...,8699.0,13.0,0.0,0.0,6.5 inches,8 GB,...,Octa-core (4x2.4 GHz Kryo 265 Gold & 4x1.9 GHz...,5000 mAh,50 MP,16 MP,Dual,4G,"Android 11, Funtouch 12",https://2b.com.eg/en/vivo-y35-8gb-ram-128gb.html,0.0,2b
2,Nokia,Nokia C10 - 2GB RAM - 32GB,Nokia C10 - 2GB RAM - 32GB,https://smhttp-ssl-73217.nexcesscdn.net/pub/me...,2499.0,16.67,0.0,0.0,6.5 inches,2 GB,...,Quad-core 1.3 GHz Cortex-A7,3000 mAh,5 MP,5 MP,Dual,NONE,Android 11 (Go edition),https://2b.com.eg/en/nokia-c10-2gb-ram-32gb.html,0.0,2b
3,Vivo,vivo Y73 - 8GB RAM - 128GB,vivo Y73 - 8GB RAM - 128GB,https://smhttp-ssl-73217.nexcesscdn.net/pub/me...,8899.0,11.88,0.0,0.0,6.44 inches,8 GB,...,Octa-core (2x2.05 GHz Cortex-A76 & 6x2.0 GHz C...,4000 mAh,64 MP,16 MP,Dual,4G,"Android 11, Funtouch 11.1",https://2b.com.eg/en/vivo-y73-8gb-ram-128gb.html,0.0,2b
4,Apple,Apple iPhone 13 - 128GB - Face ID (12 Month Wa...,Apple iPhone 13 - 128GB - Face ID (12 Month Wa...,https://smhttp-ssl-73217.nexcesscdn.net/pub/me...,34999.0,12.5,0.0,0.0,6.1 inches,4 GB,...,Hexa-core (2x3.22 GHz Avalanche + 4xX.X GHz Bl...,3240 mAh,12 MP,12 MP,Single,4G,iOS 15,https://2b.com.eg/en/apple-iphone-13-128gb-fac...,0.0,2b


In [17]:
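# Concatenate the spec columns into one text field. As with the laptops, `+` joins
# the strings without a separator, so neighbouring values fuse (e.g. "inches12").
mobiles['features'] = mobiles['screen-size'] + mobiles['ram'] + mobiles['internal-memory'] + mobiles['processor'] + mobiles['battery'] + mobiles['prim-cam'] + mobiles['second-cam'] + mobiles['SIM-count'] + mobiles['network'] + mobiles['operating-sys']
# .copy() so the later assignment to 'features' does not write into a slice of `mobiles`
mobiles_products = mobiles[['name', 'features']].copy()
mobiles_products.head()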

Unnamed: 0,name,features
0,Samsung Galaxy S23 Ultra - 12GB RAM - 256GB,6.8 inches12 GB256 GBOcta-core (1x3.36 GHz Cor...
1,Vivo Y35 - 8GB RAM - 128GB,6.5 inches8 GB128 GBOcta-core (4x2.4 GHz Kryo ...
2,Nokia C10 - 2GB RAM - 32GB,6.5 inches2 GB32 GBQuad-core 1.3 GHz Cortex-A7...
3,vivo Y73 - 8GB RAM - 128GB,6.44 inches8 GB128 GBOcta-core (2x2.05 GHz Cor...
4,Apple iPhone 13 - 128GB - Face ID (12 Month Wa...,6.1 inches4 GB128 GBHexa-core (2x3.22 GHz Aval...


In [18]:
tfidfmobile = TfidfVectorizer(stop_words='english')
mobiles_products.loc[:, 'features'] = mobiles_products['features'].fillna("")
tfidf_matrix_mobiles = tfidfmobile.fit_transform(mobiles_products['features'])

In [19]:
cosine_similarity_mobiles = linear_kernel(tfidf_matrix_mobiles, tfidf_matrix_mobiles)

In [20]:
indecies_mobiles = pd.Series(mobiles_products.index, index = mobiles_products['name']).drop_duplicates()
indecies_mobiles

name
Samsung Galaxy S23 Ultra - 12GB RAM - 256GB                                         0
Vivo Y35 - 8GB RAM - 128GB                                                          1
Nokia C10 - 2GB RAM - 32GB                                                          2
vivo Y73 - 8GB RAM - 128GB                                                          3
Apple iPhone 13 - 128GB - Face ID (12 Month Warranty)                               4
                                                                                 ... 
Samsung Galaxy A03 Core Dual SIM (32GB / 2GB Ram / 6.5 Inch / 4G LTE) - Black    2814
Realme C11 / 2021 Dual SIM (32GB / 2GB Ram / 6.52 Inch / 4G LTE) - Lake Blue     2815
Nokia C10 Dual SIM (32GB / 2GB Ram / 6.52 Inch / 3G) - Light Purple              2816
Nokia 105 Dual SIM (4MB / 4MB Ram / FM / 1.77 Inch / 2G) - Black                 2817
ITEL IT2160 Dual SIM (4MB / 4MB Ram / FM / 1.77 Inch / 2G) - Black               2818
Length: 2819, dtype: int64

In [21]:
indecies_mobiles['Realme C11 / 2021 Dual SIM (32GB / 2GB Ram / 6.52 Inch / 4G LTE) - Lake Blue']

2815

In [22]:
def get_recommendation_mobiles(name, cosine_similarity_mobiles = cosine_similarity_mobiles):
    idx = indecies_mobiles[name]
    sim_scores = enumerate(cosine_similarity_mobiles[idx])
    sim_scores = sorted(sim_scores, key = lambda x: x[1], reverse = True)
    sim_scores = sim_scores[1:11]
    for i in sim_scores:
        print(i)
    sim_index = [i[0] for i in sim_scores]
    print(mobiles_products['name'].iloc[sim_index])
get_recommendation_mobiles('Realme C11 / 2021 Dual SIM (32GB / 2GB Ram / 6.52 Inch / 4G LTE) - Lake Blue')

(2814, 0.661066862950192)
(2813, 0.6391199547420322)
(30, 0.5476875816742359)
(31, 0.5476875816742359)
(32, 0.5476875816742359)
(29, 0.5429582725392261)
(9, 0.5362303782314091)
(129, 0.5362303782314091)
(2810, 0.5338908422453764)
(22, 0.5291626715477451)
2814    Samsung Galaxy A03 Core Dual SIM (32GB / 2GB R...
2813    Realme Narzo 50i Prime Dual SIM (64GB / 4GB Ra...
30         Xiaomi Poco C40 - 3GB RAM - 32GB - Poco Yellow
31         Xiaomi Poco C40 - 3GB RAM - 32GB - Coral Green
32         Xiaomi Poco C40 - 3GB RAM - 32GB - Power Black
29                       Xiaomi Poco C40 - 3GB RAM - 32GB
9               Realme Narzo 50A - 4GB RAM - 64GB - Green
129              Realme Narzo 50A - 4GB RAM - 64GB - Blue
2810    Realme Narzo 50A Dual SIM (128GB / 4GB Ram / 6...
22      Realme C55 -  8GB RAM - 256GB - SUN Shower (12...
Name: name, dtype: object
