## Simple Recommender 
Based only on overall popularity and ratings.

A raw rating average does not take popularity into account. If a single person rates a pattern and gives it a 5, that pattern would rank above one with thousands of ratings and an average of 4.8. We need to consider both the average rating and the number of ratings, so we use a weighted average.

Ravelry is interesting because it has several popularity metrics besides rating count. Popularity can also be measured by the number of people who have a pattern in their queue (planning to make it in the near future) and by the number of projects completed or attempted.

Also note: I only took patterns with average ratings of 4 and 5 stars.

In [51]:
import pandas as pd
import numpy as np

In [53]:
# import pattern data

patterns = pd.read_csv('data/patterns_5star_not_clothing.csv')
df = patterns.copy()
print(df.shape)
df.head()

(100, 35)


| | pattern_id | name | name_permalink | favorites_count | projects_count | difficulty_average | difficulty_count | rating_average | rating_count | pattern_type_id | ... | row_gauge | free | downloadable | categories | yarn_weight_description | yarn_weight_id | yarn_weight_name | yarn_weight_ply | yarn_weight_wpi | yarn_weight_knit_gauge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | The Flower Basket Shawl (S-2014) | the-flower-basket-shawl-s-2014 | 4526 | 2009 | 3.887821 | 936.0 | 4.466368 | 892.0 | 10 | ... | 24.0 | False | True | ['scarf', 'neck-torso', 'accessories', 'neck-t... | Fingering (14 wpi) | 5 | Fingering | 4.0 | 14.0 | 28.0 |
| 1 | 13 | Marilyn's Not-So-Shrunken Cardigan | marilyns-not-so-shrunken-cardigan | 731 | 159 | 3.0 | 78.0 | 4.266667 | 75.0 | 7 | ... | 32.0 | False | True | ['scarf', 'neck-torso', 'accessories', 'neck-t... | DK / Sport | 3 | DK / Sport | | | |
| 2 | 16 | Child's Placket Neck Pullover | childs-placket-neck-pullover | 6627 | 2845 | 2.97295 | 1183.0 | 4.153497 | 1101.0 | 17 | ... | 32.0 | True | True | ['scarf', 'neck-torso', 'accessories', 'neck-t... | DK (11 wpi) | 11 | DK | 8.0 | 11.0 | 22.0 |
| 3 | 17 | Pomatomus | pomatomus | 11631 | 5127 | 4.91164 | 2354.0 | 4.471111 | 2250.0 | 2 | ... | 12.0 | True | True | ['scarf', 'neck-torso', 'accessories', 'neck-t... | Fingering (14 wpi) | 5 | Fingering | 4.0 | 14.0 | 28.0 |
| 4 | 20 | Amelia Earhart Aviator Cap | amelia-earhart-aviator-cap | 3920 | 873 | 2.926714 | 423.0 | 4.287958 | 382.0 | 3 | ... | | True | True | ['scarf', 'neck-torso', 'accessories', 'neck-t... | Worsted (9 wpi) | 12 | Worsted | 10.0 | 9.0 | 20.0 |


In [16]:
# Only need the rating_count and rating_average columns for simple recommender

df = df[['pattern_id', 'rating_count', 'rating_average']]
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   pattern_id      100 non-null    int64  
 1   rating_count    99 non-null     float64
 2   rating_average  100 non-null    float64
dtypes: float64(2), int64(1)
memory usage: 2.5 KB


In [17]:
# Drop rows with missing values (NaN); here that is the one pattern with no rating_count

df = df.dropna()
print(df.shape)

(99, 3)


#### Calculate Weighted Average
To calculate the weighted average, we need the overall average rating across all patterns (`mean_average` below), plus the rating count (`df.rating_count`) and rating average (`df.rating_average`) for each pattern.

$$WR = \frac{c}{c+m} R + \frac{m}{c+m} M$$

- c: number of ratings for the pattern (`v` in the code below)
- M: mean rating across the whole dataset (`mean_average`, passed as `C` in the code below)
- R: average rating for the pattern
- m: minimum number of ratings required to be on the chart (here, the 90th percentile of rating counts, so that only the top 10 percent qualify)

##### (reference: https://www.datacamp.com/community/tutorials/recommender-systems-python)
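
To make the effect of the formula concrete, here is a small worked sketch of the example from the introduction: one pattern with a single 5-star rating versus one with 1000 ratings averaging 4.8. The values of `M` and `m` are rounded assumptions for illustration, not the values computed from the dataset, and this standalone `weighted_rating` takes plain numbers rather than a DataFrame row.

```python
def weighted_rating(c, R, m, M):
    """Shrink a pattern's average rating R toward the global mean M.

    c: number of ratings for the pattern
    m: minimum rating count (acts as the weight given to the global mean)
    """
    return (c / (c + m)) * R + (m / (c + m)) * M


# Assumed, rounded values for illustration only
M = 4.2    # hypothetical mean rating across all patterns
m = 1000   # hypothetical rating-count threshold

one_rating = weighted_rating(c=1, R=5.0, m=m, M=M)
many_ratings = weighted_rating(c=1000, R=4.8, m=m, M=M)

# The single 5-star rating barely moves the score away from M,
# while the well-rated, popular pattern scores clearly higher.
print(round(one_rating, 3), round(many_ratings, 3))
```

With these assumed numbers, the pattern with one rating stays close to the global mean, so it no longer outranks the popular pattern.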

In [27]:
# Calculate the overall pattern average (M)

mean_average = df['rating_average'].mean()
mean_average

4.198661551852408

In [28]:
# Only recommend the top segment of patterns: look at the top 10% by rating count.
# m is the minimum rating count needed to get into that segment (the 90th percentile).

# To inspect the upper percentiles in more detail:
# print(df['rating_count'].quantile(np.arange(.9, 1, .01)))
m = df['rating_count'].quantile(0.90)
print(m)

1048.2000000000003
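
The threshold is not a whole number because `Series.quantile` interpolates linearly between neighbouring values by default. A minimal sketch on a made-up series (the `counts` values are assumptions for illustration) shows the behaviour:

```python
import pandas as pd

# Hypothetical rating counts, for illustration only
counts = pd.Series([10, 20, 30, 40, 50])

# The 0.90 quantile falls between the 4th and 5th sorted values,
# so pandas interpolates linearly between 40 and 50.
threshold = counts.quantile(0.90)

# Rows at or above the threshold form the "top 10%" segment
above = counts[counts >= threshold]

print(threshold)
print(above.tolist())
```

Because the comparison is `>=` against an interpolated value, only counts strictly larger than the lower neighbour can pass the filter.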


In [29]:
top_10percent_count_patterns = df.copy().loc[df['rating_count'] >= m]
top_10percent_count_patterns.shape

(10, 3)

In [30]:
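def weighted_rating(x, m=m, C=mean_average):
    # v is the pattern's rating count (c in the formula above)
    v = x['rating_count']
    # R is the pattern's average rating; C is the mean rating across all patterns (M above)
    R = x['rating_average']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)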

In [36]:
top_10percent_count_patterns['score'] = top_10percent_count_patterns.apply(weighted_rating, axis=1)

In [37]:
top_10percent_count_patterns = top_10percent_count_patterns.sort_values('score', ascending=False)

In [38]:
top_10percent_count_patterns

| | pattern_id | rating_count | rating_average | score |
|---|---|---|---|---|
| 29 | 54 | 3226.0 | 4.532858 | 4.4509 |
| 93 | 140 | 3186.0 | 4.525738 | 4.444768 |
| 11 | 29 | 8501.0 | 4.432537 | 4.406865 |
| 3 | 17 | 2250.0 | 4.471111 | 4.384524 |
| 68 | 108 | 2242.0 | 4.461195 | 4.377557 |
| 42 | 71 | 1858.0 | 4.450484 | 4.359658 |
| 18 | 38 | 1115.0 | 4.401794 | 4.303364 |
| 19 | 40 | 4367.0 | 4.265858 | 4.252851 |
| 54 | 88 | 6619.0 | 4.225412 | 4.221755 |
| 2 | 16 | 1101.0 | 4.153497 | 4.175524 |
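
The `apply` call used above evaluates the formula one row at a time. As an alternative, the same score can be computed on whole columns at once. This is a minimal sketch on made-up data: the `toy` frame and the values of `m` and `M` are assumptions for illustration, not the notebook's data.

```python
import pandas as pd

# Made-up stand-in for the filtered patterns frame
toy = pd.DataFrame({
    "pattern_id": [1, 2, 3],
    "rating_count": [2000.0, 1000.0, 4000.0],
    "rating_average": [4.5, 4.8, 4.3],
})

m = 1000.0  # assumed rating-count threshold
M = 4.2     # assumed mean rating across all patterns

# Column-wise version of WR = c/(c+m) * R + m/(c+m) * M
c = toy["rating_count"]
toy["score"] = (c / (c + m)) * toy["rating_average"] + (m / (c + m)) * M

# Highest weighted score first
toy = toy.sort_values("score", ascending=False)
print(toy)
```

On a frame of this size the difference is negligible; the column-wise form mainly avoids the per-row Python function call on larger datasets.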


In [64]:
top_patterns = top_10percent_count_patterns.merge(patterns[['pattern_id','name','photos_url']], on='pattern_id', how = 'left')

In [65]:
top_patterns[['name', 'score', 'rating_count', 'photos_url']]

| | name | score | rating_count | photos_url |
|---|---|---|---|---|
| 0 | Felted Clogs (AC33e) | 4.4509 | 3226.0 | https://images4-g.ravelrycache.com/uploads/FTk... |
| 1 | Irish Hiking Scarf | 4.444768 | 3186.0 | https://images4-f.ravelrycache.com/uploads/cas... |
| 2 | Clapotis | 4.406865 | 8501.0 | https://images4-g.ravelrycache.com/uploads/cas... |
| 3 | Pomatomus | 4.384524 | 2250.0 | https://images4-g.ravelrycache.com/uploads/fre... |
| 4 | #13 Central Park Hoodie | 4.377557 | 2242.0 | https://images4-g.ravelrycache.com/uploads/mck... |
| 5 | Endpaper Mitts | 4.359658 | 1858.0 | https://images4-f.ravelrycache.com/uploads/yar... |
| 6 | Icarus Shawl | 4.303364 | 1115.0 | https://images4-g.ravelrycache.com/uploads/cas... |
| 7 | Jaywalker | 4.252851 | 4367.0 | https://images4-g.ravelrycache.com/uploads/San... |
| 8 | Calorimetry | 4.221755 | 6619.0 | https://images4-f.ravelrycache.com/uploads/cas... |
| 9 | Child's Placket Neck Pullover | 4.175524 | 1101.0 | https://images4-g.ravelrycache.com/uploads/Col... |


### Simple recommender on most projects
(currently a Ravelry search feature)

### Simple recommender on most favourited projects
(currently a Ravelry search feature)

In [None]:
df[['pattern_id', 'name', 'favorites_count']]
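
A "most favourited" recommender amounts to sorting by `favorites_count` and keeping the top rows. The sketch below shows one way to do that with `nlargest`; the `toy` frame, its names, and the cutoff of 3 are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for the patterns frame
toy = pd.DataFrame({
    "pattern_id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "favorites_count": [731, 11631, 4526, 6627],
})

# Keep the 3 patterns with the highest favourites count, largest first
top_favorited = toy.nlargest(3, "favorites_count")[
    ["pattern_id", "name", "favorites_count"]
]
print(top_favorited)
```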

### Favourites to projects ratio
(everything above the 90th percentile project count threshold)

In [46]:
df.columns

Index(['pattern_id', 'name', 'name_permalink', 'favorites_count',
       'projects_count', 'difficulty_average', 'difficulty_count',
       'rating_average', 'rating_count', 'pattern_type_id',
       'pattern_type_names', 'pattern_type_clothing', 'photos_url', 'craft_id',
       'url', 'pattern_attributes', 'yardage_max', 'yardage',
       'yardage_description', 'generally_available', 'published', 'gauge',
       'gauge_pattern', 'gauge_divisor', 'row_gauge', 'free', 'downloadable',
       'categories', 'yarn_weight_description', 'yarn_weight_id',
       'yarn_weight_name', 'yarn_weight_ply', 'yarn_weight_wpi',
       'yarn_weight_knit_gauge'],
      dtype='object')

In [45]:
m = df['projects_count'].quantile(0.90)  # the column is 'projects_count' (plural), per df.columns above
print(m)
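
One possible way to carry this idea through is sketched below: filter to patterns at or above the 90th percentile of `projects_count`, then compute favourites divided by projects. The `toy` data and the `fav_to_project_ratio` column name are hypothetical, and sorting with the highest ratio first is an assumption, since the notebook does not say how the ratio should be ranked.

```python
import pandas as pd

# Made-up data: 20 patterns with evenly spaced project counts
projects = list(range(100, 2100, 100))
favorites = [2 * p for p in projects]
favorites[-2] = 5700  # pattern 19: many favourites relative to projects
favorites[-1] = 5000  # pattern 20

toy = pd.DataFrame({
    "pattern_id": list(range(1, 21)),
    "favorites_count": favorites,
    "projects_count": projects,
})

# Minimum project count needed to be in the top 10%
threshold = toy["projects_count"].quantile(0.90)

# Keep only the most-made patterns, then compute the ratio
popular = toy.loc[toy["projects_count"] >= threshold].copy()
popular["fav_to_project_ratio"] = (
    popular["favorites_count"] / popular["projects_count"]
)

# Assumed ordering: highest ratio first
popular = popular.sort_values("fav_to_project_ratio", ascending=False)
print(popular)
```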