
In [27]:
import pandas as pd
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [2]:
from google.colab import drive
drive.mount("/content/drive")

Mounted at /content/drive


In [4]:
path = "/content/drive/MyDrive/Recommender_systems"

In [5]:
# Source: Addressing Marketing Bias in Product Recommendations - Mengting Wan, Jianmo Ni, Rishabh Misra, Julian McAuley - WSDM, 2020
# https://github.com/MengtingWan/merketBias

df = pd.read_json(path + "/modcloth_final_data.json", lines=True)

In [6]:
df.head()

Unnamed: 0,item_id,waist,size,quality,cup size,hips,bra size,category,bust,height,user_name,length,fit,user_id,shoe size,shoe width,review_summary,review_text
0,123373,29.0,7,5.0,d,38.0,34.0,new,36.0,5ft 6in,Emily,just right,small,991571,,,,
1,123373,31.0,13,3.0,b,30.0,36.0,new,,5ft 2in,sydneybraden2001,just right,small,587883,,,,
2,123373,30.0,7,2.0,b,,32.0,new,,5ft 7in,Ugggh,slightly long,small,395665,9.0,,,
3,123373,,21,5.0,dd/e,,,new,,,alexmeyer626,just right,fit,875643,,,,
4,123373,,18,5.0,b,,36.0,new,,5ft 2in,dberrones1,slightly long,small,944840,,,,


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82790 entries, 0 to 82789
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   item_id         82790 non-null  int64  
 1   waist           2882 non-null   float64
 2   size            82790 non-null  int64  
 3   quality         82722 non-null  float64
 4   cup size        76535 non-null  object 
 5   hips            56064 non-null  float64
 6   bra size        76772 non-null  float64
 7   category        82790 non-null  object 
 8   bust            11854 non-null  object 
 9   height          81683 non-null  object 
 10  user_name       82790 non-null  object 
 11  length          82755 non-null  object 
 12  fit             82790 non-null  object 
 13  user_id         82790 non-null  int64  
 14  shoe size       27915 non-null  float64
 15  shoe width      18607 non-null  object 
 16  review_summary  76065 non-null  object 
 17  review_text     76065 non-null 

In [13]:
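# Keep only rows that have a user, an item, a quality rating and a review text.
df = df.dropna(subset=["user_id", "item_id", "quality", "review_text"])
# reset_index() stores the old index in a new "index" column; running this cell
# a second time adds "level_0" as well. Use reset_index(drop=True) to avoid both.
df = df.reset_index()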

In [14]:
len(df.item_id.unique())

1322

In [15]:
len(df.user_id.unique())

44811

In [16]:
len(df)

76000
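
The three counts above already say a lot about the data. As a rough sketch, assuming every row is a distinct (user, item) pair (which is an upper bound, since the same user may have reviewed an item more than once), the density of the user-item matrix works out to roughly 0.13%:

```python
# Counts reported by the cells above.
n_items = 1322
n_users = 44811
n_interactions = 76000

# Number of cells in a full user-item matrix.
possible_pairs = n_users * n_items

# Assumption: each row is a distinct (user, item) pair, so this is an upper bound.
density = n_interactions / possible_pairs

print(f"possible user-item pairs: {possible_pairs:,}")
print(f"density: {density:.4%}")
print(f"average reviews per user: {n_interactions / n_users:.2f}")
print(f"average reviews per item: {n_interactions / n_items:.1f}")
```

With fewer than two reviews per user on average, most users have very little history, which is one reason to start with a non-personalized baseline.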

# Non-personalized recommender systems

Popularity-based recommender systems follow the trend: every customer is shown the items that are most popular right now, with no personalization involved.

### Frequency of purchase

Popularity-based recommenders work by suggesting the most frequently purchased products to customers. This idea can be turned into at least two concrete implementations. The one used in this notebook:

- Check which articles are bought most often across all customers. Recommend these articles to each customer.

The dataset contains reviews rather than purchases, so the number of reviews per item serves as a proxy for purchase frequency.

Source: https://towardsdatascience.com/how-to-build-popularity-based-recommenders-with-polars-cc7920ad3f68

In [17]:
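# Count the reviews per item and sort from most to least reviewed.
# Note: after the aggregation the "user_id" column holds that count, not a user id.
items_popularity = df.groupby("item_id")["user_id"].count().sort_values(ascending=False)
items_popularity = items_popularity.reset_index()
items_popularity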

Unnamed: 0,item_id,user_id
0,539980,2007
1,668696,1555
2,397005,1506
3,175771,1438
4,407134,1437
...,...,...
1317,542404,1
1318,541405,1
1319,214259,1
1320,536646,1


In [18]:
items_popularity.iloc[:3]["item_id"].to_list()

[539980, 668696, 397005]
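
The ranking step can also be wrapped in a small reusable function. The sketch below is an illustration on made-up toy data, not part of the original notebook: the name `top_n_popular` is hypothetical, and it counts distinct users per item (`nunique`) instead of rows (`count`), so a user who reviews the same item twice is counted only once.

```python
import pandas as pd


def top_n_popular(interactions: pd.DataFrame, n: int = 3) -> list:
    """Return the ids of the n items reviewed by the most distinct users.

    Hypothetical helper; expects the columns "item_id" and "user_id".
    """
    users_per_item = interactions.groupby("item_id")["user_id"].nunique()
    # nlargest sorts by value in descending order and keeps the top n entries.
    return users_per_item.nlargest(n).index.to_list()


# Toy interactions: user 1 reviewed item 10 twice.
toy = pd.DataFrame({
    "item_id": [10, 10, 10, 20, 20, 30, 10],
    "user_id": [1, 2, 3, 1, 2, 3, 1],
})

print(top_n_popular(toy, n=2))
```

Whether duplicates should count is a design choice: counting rows rewards items that attract repeat reviews, while counting distinct users measures reach.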

In [21]:
popular_items = items_popularity.iloc[:3]["item_id"].to_list()

def present_recommended_products(popular_items: list):
    """Print each recommended item with its category and up to three random reviews.

    Reads the item data from the global DataFrame `df`.
    """
    print("**Currently trending products**")
    print()

    for index, item_id_ in enumerate(popular_items):
        # All rows (reviews) that belong to the current item.
        slice_df = df[df["item_id"] == item_id_]
        print(f"Recommended item {index+1}/{len(popular_items)}: product {item_id_}")

        # Use the first category recorded for the item.
        category = slice_df["category"].unique()[0]
        print(f"{category=}")

        # Show a random sample of at most three distinct reviews.
        slice_with_reviews = slice_df[~slice_df["review_text"].isna()]
        reviews_for_slice = list(slice_with_reviews["review_text"].unique())
        if len(reviews_for_slice) > 0:
            reviews = random.sample(reviews_for_slice, min(len(reviews_for_slice), 3))
            print("User reviews:")
            for review in reviews:
                print("-", review)
            print("...")
        else:
            print("There are no reviews for this product yet.")
        print()

present_recommended_products(popular_items)

**Currently trending products**

Recommended item 1/3: product 539980
category='tops'
User reviews:
- Sent this one back, just wasn't as roomy as I was wanting.
- Nice lightweight cardigan. Color is vibrant.
- Awesome cardigan! Pretty color and very versatile. Wash with care :)
...

Recommended item 2/3: product 668696
category='bottoms'
User reviews:
- Perfect
- It's exactly what I wanted! It's like a character skirt for ballet dancers!
- I absolutely love this skirt! It's full and the fabric is really good quality. I ordered a medium, however the waist is fairly loose but not so loose that it can't be fixed by a belt. Besides I think the smaller size would be too small. The waistband is a bit odd, it gapes because the top of it is longer than the bottom. The length is a bit longer than I am used to (the skirt falls a few inches below my knees) but I was already expecting that when I bought it. It still looks great regardless, so the length doesn't bother me. Quite happy with this dre

## Content-based personalized recommender systems

In [23]:
df.head(2)

Unnamed: 0,level_0,index,item_id,waist,size,quality,cup size,hips,bra size,category,bust,height,user_name,length,fit,user_id,shoe size,shoe width,review_summary,review_text
0,6722,6725,152702,27.0,4,4.0,b,37.0,32.0,new,,5ft 6in,avNYC,just right,small,668176,9.0,average,Too much ruching,"I liked the color, the silhouette, and the fab..."
1,6723,6726,152702,26.0,4,5.0,c,36.0,34.0,new,,5ft 6in,lanwei91,slightly short,fit,320759,7.5,,Suits my body type!,From the other reviews it seems like this dres...


In [24]:
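# Select the review columns for rows that have a review text (single .loc call instead of chained indexing).
df_reviews = df.loc[df["review_text"].notna(), ["item_id", "review_text", "category"]]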
df_reviews.head()

Unnamed: 0,item_id,review_text,category
0,152702,"I liked the color, the silhouette, and the fab...",new
1,152702,From the other reviews it seems like this dres...,new
2,152702,I love the design and fit of this dress! I wo...,new
3,152702,I bought this dress for work it is flattering...,new
4,152702,This is a very professional look. It is Great ...,new


In [25]:
len(df_reviews)

76000

In [26]:
# Concatenate all reviews of an item into one text document per (item_id, category) pair.
df_grouped = df_reviews.groupby(["item_id", "category"]).agg({'review_text': ' '.join})
df_grouped = df_grouped.reset_index()
df_grouped.head()

Unnamed: 0,item_id,category,review_text
0,152702,new,"I liked the color, the silhouette, and the fab..."
1,153494,new,I wanted to fit in this dress so bad so I made...
2,153798,new,Unfortunately the fabric is soooo thin and wri...
3,154411,new,My only complaint is that people notice when I...
4,154882,new,Most of the other reviews said size up one but...
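
With one text document per item, the `TfidfVectorizer` and `linear_kernel` imported at the top of the notebook can be used to compare items by the wording of their reviews. The following is a minimal sketch on made-up toy data, not the notebook's own implementation: `toy_grouped`, `similar_items`, and the choice of `stop_words="english"` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy stand-in for df_grouped: one concatenated review document per item.
toy_grouped = pd.DataFrame({
    "item_id": [1, 2, 3, 4],
    "category": ["tops", "tops", "bottoms", "bottoms"],
    "review_text": [
        "soft warm cardigan lovely color",
        "warm cardigan soft fabric",
        "pleated skirt loose waist",
        "skirt waist fits nicely",
    ],
})

# Turn each document into a TF-IDF vector (assumption: English stop words removed).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(toy_grouped["review_text"])

# TfidfVectorizer L2-normalizes rows by default, so the plain dot product
# computed by linear_kernel equals the cosine similarity.
similarity = linear_kernel(tfidf_matrix, tfidf_matrix)


def similar_items(item_id: int, n: int = 2) -> list:
    """Return the n items whose reviews are most similar to those of item_id."""
    position = toy_grouped.index[toy_grouped["item_id"] == item_id][0]
    scores = pd.Series(similarity[position], index=toy_grouped["item_id"])
    # Drop the item itself, then take the highest remaining scores.
    return scores.drop(item_id).nlargest(n).index.to_list()


print(similar_items(1, n=1))
```

On the real data the same steps would be applied to `df_grouped["review_text"]`; the similarity matrix then has one row and one column per grouped item.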
