# Content-based RS
Content-based recommender system (RS) that uses TF-IDF and cosine similarity to find products with similar descriptions.

Article: https://heartbeat.fritz.ai/recommender-systems-with-python-part-i-content-based-filtering-5df4940bd831
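TF-IDF weights each term by how often it occurs in a description and down-weights terms that occur in many descriptions; cosine similarity then compares the resulting vectors. The sketch below rebuilds the TF-IDF matrix by hand on a hypothetical three-document corpus and checks it against scikit-learn. It assumes `TfidfVectorizer`'s default settings (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, L2 normalisation of each row), which may differ if those parameters are changed.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, not taken from the e-commerce dataset
docs = ["baby sunshade hoody", "sunshade hoody", "sport briefs"]

# Raw term counts: one row per document, one column per vocabulary term
counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)               # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1   # smoothed idf, scikit-learn's default
tfidf = counts * idf                              # term frequency times idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)  # L2-normalise each row

# TfidfVectorizer with default settings should reproduce the same matrix
sk = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(tfidf, sk))  # True

# For unit-length rows, cosine similarity reduces to a plain dot product
sim = tfidf @ tfidf.T
print(sim.round(2))
```

Documents that share no terms (here the first and third) get a similarity of exactly 0.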

## Import libraries

In [0]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
# Load the data
ds = pd.read_csv("/content/drive/My Drive/Colab Data/e-commerce.csv")

In [4]:
ds.head()

|   | id | description |
|---|----|-------------|
| 0 | 1 | Active classic boxers - There's a reason why o... |
| 1 | 2 | Active sport boxer briefs - Skinning up Glory ... |
| 2 | 3 | Active sport briefs - These superbreathable no... |
| 3 | 4 | Alpine guide pants - Skin in, climb ice, switc... |
| 4 | 5 | Alpine wind jkt - On high ridges, steep ice an... |


In [5]:
ds.shape

(500, 2)
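The shape confirms 500 products with two columns, `id` and `description`. Before vectorising, a couple of cheap checks can save debugging later: the `results` dictionary built below is keyed by `id`, so ids should be unique, and `TfidfVectorizer` cannot process missing (NaN) descriptions. A minimal sketch on a hypothetical three-row stand-in for `ds` (placeholder text, not real data):

```python
import pandas as pd

# Hypothetical miniature catalogue with the same columns as ds (placeholder text)
ds = pd.DataFrame({
    "id": [1, 2, 3],
    "description": ["Active classic boxers - placeholder", None, "Alpine wind jkt - placeholder"],
})

# results is keyed by id, so every id must be unique
assert ds["id"].is_unique

# TfidfVectorizer fails on missing documents: count them, then replace with ""
n_missing = int(ds["description"].isna().sum())
ds["description"] = ds["description"].fillna("")
print(n_missing)  # 1
```

An empty description becomes an all-zero TF-IDF row, so that product simply scores 0 against every other product.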

## Create the TF-IDF matrix

In [0]:
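# Word unigrams, bigrams and trigrams, with English stop words removed.
# min_df=1 keeps every term that occurs in at least one description
# (newer scikit-learn releases reject the integer value min_df=0).
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')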
tfidf_matrix = tf.fit_transform(ds['description'])

In [7]:
tfidf_matrix

<500x52262 sparse matrix of type '<class 'numpy.float64'>'
	with 148989 stored elements in Compressed Sparse Row format>

In [0]:
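# TF-IDF rows are L2-normalised, so the linear kernel (dot product) equals cosine similarity
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)
results = {}
for idx, row in ds.iterrows():
    # Indices of the 99 highest scores for this item, best first
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    similar_items = [(cosine_similarities[idx][i], ds['id'][i]) for i in similar_indices]
    # Drop the first entry, which is normally the item itself (score 1.0)
    results[row['id']] = similar_items[1:]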
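Two details of the loop are easy to misread. First, `linear_kernel` is only a valid stand-in for cosine similarity because `TfidfVectorizer` normalises each row to unit length. Second, `argsort()[:-100:-1]` reads the ascending sort order backwards and stops before the 100th-from-last element, so it yields 99 indices; after `[1:]` removes the item itself, each product keeps 98 neighbours. A small sketch on hypothetical toy descriptions:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy descriptions standing in for ds['description']
docs = ["baby sunshade hoody", "sunshade hoody", "sport briefs", "baby dress"]
X = TfidfVectorizer().fit_transform(docs)

# Rows are L2-normalised, so the linear kernel and cosine similarity agree
print(np.allclose(linear_kernel(X, X), cosine_similarity(X, X)))  # True

# argsort() sorts ascending; [:-k:-1] reads it backwards, giving the k-1 best indices
scores = linear_kernel(X, X)[0]
top = scores.argsort()[:-3:-1]
print(top)  # [0 1]: the query document itself ranks first
print(len(np.arange(500)[:-100:-1]))  # 99
```

The `[1:]` step assumes the item always ranks first for itself; two products with identical descriptions would tie, in which case dropping the item by id is the safer choice.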

In [0]:
def item(item_id):
    # Product name: the part of the description before the first ' - '
    return ds.loc[ds['id'] == item_id]['description'].tolist()[0].split(' - ')[0]
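`item()` filters the whole DataFrame on every call. Since names are looked up repeatedly when printing recommendations, one option is to build an id-to-name dictionary once. A minimal sketch; the stand-in `ds` uses placeholder descriptions, and `names` and `item_name` are hypothetical helpers, not part of the notebook:

```python
import pandas as pd

# Hypothetical stand-in with the same columns as the notebook's ds
ds = pd.DataFrame({
    "id": [1, 2, 3],
    "description": [
        "Active classic boxers - placeholder text",
        "Active sport boxer briefs - placeholder text",
        "Alpine guide pants - placeholder text",
    ],
})

# Build the id -> product-name map once; the name is the text before the first " - "
names = dict(zip(ds["id"], ds["description"].str.split(" - ").str[0]))

def item_name(item_id):
    """Dictionary lookup instead of filtering the DataFrame on every call."""
    return names[item_id]

print(item_name(2))  # Active sport boxer briefs
```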

## Recommendation function

In [0]:
def recommend(item_id, num):
    # Print the top `num` precomputed neighbours of item_id with their similarity scores
    print("Recommending " + str(num) + " products similar to " + item(item_id) + "...")
    print("-------")
    recs = results[item_id][:num]
    for rec in recs:  # each rec is a (score, id) tuple
        print("Recommended: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")

In [12]:
recommend(item_id=11, num=10)

Recommending 10 products similar to Baby sunshade top...
-------
Recommended: Sunshade hoody (score:0.21330296021085024)
Recommended: Baby baggies apron dress (score:0.10975311296284812)
Recommended: Runshade t-shirt (score:0.09988151262780731)
Recommended: Runshade t-shirt (score:0.09530698241688207)
Recommended: Runshade top (score:0.08510550093018411)
Recommended: Sunshade shirt (score:0.08352790846323879)
Recommended: Lw sun hoody (score:0.08154548183209832)
Recommended: Cap 3 crew (score:0.08153818909197846)
Recommended: Active sport briefs (score:0.08109685448621314)
Recommended: L/s runshade top (score:0.07984113879741114)
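`recommend()` prints its results, which makes them hard to reuse or test. The sketch below puts the whole pipeline together on a hypothetical four-product catalogue (invented descriptions, product names borrowed from the output above) and returns `(name, score)` pairs instead. `recommend_list` is a hypothetical helper; it also drops the query item by position rather than assuming it ranks first.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical toy catalogue with the notebook's columns (id, description)
ds = pd.DataFrame({
    "id": [11, 12, 13, 14],
    "description": [
        "Baby sunshade top - sun protection for babies",
        "Sunshade hoody - sun protection hoody",
        "Active sport briefs - breathable briefs",
        "Baby baggies apron dress - dress for babies",
    ],
})

tfidf = TfidfVectorizer(stop_words="english").fit_transform(ds["description"])
sim = linear_kernel(tfidf, tfidf)
names = ds["description"].str.split(" - ").str[0]

def recommend_list(item_id, num):
    """Return up to `num` (name, score) pairs, most similar first, excluding the item itself."""
    pos = np.flatnonzero(ds["id"].to_numpy() == item_id)[0]  # row position of the query item
    order = sim[pos].argsort()[::-1]                         # all rows, highest score first
    picks = [i for i in order if i != pos][:num]             # drop the query item explicitly
    return [(names.iloc[i], float(sim[pos, i])) for i in picks]

for name, score in recommend_list(11, 2):
    print(f"Recommended: {name} (score: {score:.3f})")
```

Returning data instead of printing lets the caller format, filter or evaluate the recommendations separately.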
