# Ravelry API - Content Based Recommendation System
___

This notebook reads in the cleaned Ravelry dataframe and transforms it into a cosine distance matrix.  That matrix is the basis for a content-based recommender that compares patterns by their features.

The notebook includes code to search the dataset for pattern names, and a function that takes a pattern name and prints the five most similar patterns along with links to each pattern's Ravelry page.

### Contents:
- [Import Data & Prepare Features](#Import-Data-&-Prepare-Features)
- [Calculate Cosine Distances and Build Recommender Dataframe](#Calculate-Cosine-Distances-and-Build-Recommender-Dataframe)
- [User Interface and Recommender Function](#User-Interface-and-Recommender-Function)

|Function|Argument|Description|
|---|---|---|
|**display_recs**|*str* - user input|If the user's pattern is in the `rav_rec_df` dataframe, prints the five most similar patterns along with URL links to each pattern's Ravelry page.|

In [1]:
import pandas as pd
import numpy as np

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics.pairwise import pairwise_distances

from scipy import sparse

## Import Data & Prepare Features
___

In [2]:
rav_clean_df = pd.read_csv('../data/rav_clean.csv')
rav_clean_df.drop(columns = ['id', 'notes', 'gauge', 'gauge_divisor'], inplace = True)

In [3]:
rav_clean_df.head()

,name,author,difficulty_avg,max_yardage,price,projects_count,queued_projects_count,rating_avg,yarn_weight,type,gauge_per_inch
0,Musselburgh,Ysolda Teague,2.46,610.0,6.0,23656,7700,4.89,Fingering,hat,6.0
1,Classic Ribbed Hat,Purl Soho,1.92,305.0,0.0,10353,5382,4.83,DK,hat,8.0
2,Alpine Bloom Hat,Caitlin Hunter,3.37,230.0,5.0,1520,2364,4.84,Sport,hat,6.0
3,Classic Cuffed Hat,Purl Soho,1.87,328.0,0.0,9097,5043,4.7,Worsted,hat,5.0
4,February Hat,Kate Gagnon Osborn,2.62,213.0,0.0,3888,3532,4.75,Worsted,hat,4.5


**Features to One Hot Encode**
* author
* yarn_weight
* type

**Features to Scale**
* difficulty_avg
* gauge_per_inch
* max_yardage
* price
* projects_count
* queued_projects_count
* rating_avg
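
To make the two transformations concrete, here is a minimal by-hand sketch of what one-hot encoding (with the first category dropped) and standard scaling do to a column.  It uses plain Python rather than scikit-learn, and the helper names `one_hot_drop_first` and `standardize` are hypothetical, introduced only for illustration.  The toy values come from the first three rows shown above.

```python
from statistics import mean, pstdev

def one_hot_drop_first(values):
    """One-hot encode a list of categories, dropping the first (sorted) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category is encoded as all zeros
    return kept, [[1.0 if v == c else 0.0 for c in kept] for v in values]

def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy values taken from the first three rows of the dataframe above
yarn_weights = ['Fingering', 'DK', 'Sport']
prices = [6.0, 0.0, 5.0]

kept_categories, encoded = one_hot_drop_first(yarn_weights)
scaled_prices = standardize(prices)

print(kept_categories)
print(encoded)
print(scaled_prices)
```

Dropping one category per feature avoids a redundant column, and scaling keeps large-valued features such as `projects_count` from dominating the distance calculation.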

In [4]:
# Instantiate the transformers

ohe = OneHotEncoder(handle_unknown='ignore',
                    drop='first',
                    sparse_output=False)

sc = StandardScaler()

In [5]:
# One-hot encode the categorical columns and scale the numeric columns listed above.
# The remaining 'name' column is passed through unchanged so it can become the index.
ctx = ColumnTransformer(
    transformers=[
        ('one_hot', ohe, ['author', 'yarn_weight', 'type']),
        ('sc', sc, ['difficulty_avg', 'gauge_per_inch',
                    'max_yardage', 'price', 'projects_count',
                    'queued_projects_count', 'rating_avg']),
    ], remainder='passthrough', verbose_feature_names_out=False
)


In [6]:
rav_clean_enc = ctx.fit_transform(rav_clean_df)

In [7]:
rav_clean_enc = pd.DataFrame(rav_clean_enc,
                             columns = ctx.get_feature_names_out(),
                            )

rav_clean_enc.set_index(['name'], inplace = True)

In [8]:
rav_clean_enc.tail()

name,author_13th Raven Designs,author_A Little Knitty Designs,author_A Whimsical Wood Yarn Co.,author_A. Karen Alfke,author_A.Opie Designs,author_Abi Gregorio,author_Abundant Earth Fiber Mill,author_Adrian Bizilia,author_Adriana nanoadri,author_Adrienna Slagle,...,yarn_weight_Worsted,type_pullover,type_socks,difficulty_avg,gauge_per_inch,max_yardage,price,projects_count,queued_projects_count,rating_avg
Bray,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,1.334409,-0.734384,0.889821,2.223824,0.041732,0.942706,0.279886
Barbet Turtleneck,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,-0.693428,-0.991796,1.054351,-1.219539,-0.24238,-0.370049,0.528803
Stonewall,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,0.668552,-0.734384,1.479093,0.899454,-0.105823,0.901682,-0.23217
Park Pullover,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,-0.436165,-0.734384,0.531591,0.367057,-0.238714,-0.404995,0.230103
Construction Trucks Sweater,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,0.040528,-0.21956,0.407902,0.104832,-0.24238,-0.406515,-0.104156


## Calculate Cosine Distances and Build Recommender Dataframe
___

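Because the text `name` column was passed through the ColumnTransformer, the encoded values are stored with an object dtype, which cannot be converted directly to a sparse matrix.  For this to work, I need to convert `rav_clean_enc` to a float array first.  Referenced [this](https://stackoverflow.com/questions/57434284/covert-to-sparse-matrix-typeerror-no-supported-conversion-for-types-dtype) Stack Overflow post.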

In [9]:
rav_clean_array = np.array(rav_clean_enc, dtype = float)

In [10]:
rav_clean_sparse = sparse.csr_matrix(rav_clean_array)

In [11]:
rav_clean_sparse

<6000x1700 sparse matrix of type '<class 'numpy.float64'>'
	with 57942 stored elements in Compressed Sparse Row format>

In [12]:
distances = pairwise_distances(rav_clean_sparse, metric = 'cosine')


In [13]:
distances

array([[0.        , 0.04279796, 0.34281604, ..., 0.88940528, 1.18609681,
        1.21929265],
       [0.04279796, 0.        , 0.24174123, ..., 0.92278574, 1.25261274,
        1.26548653],
       [0.34281604, 0.24174123, 0.        , ..., 0.81577667, 1.25470714,
        1.26680367],
       ...,
       [0.88940528, 0.92278574, 0.81577667, ..., 0.        , 0.66569043,
        0.50751494],
       [1.18609681, 1.25261274, 1.25470714, ..., 0.66569043, 0.        ,
        0.59168414],
       [1.21929265, 1.26548653, 1.26680367, ..., 0.50751494, 0.59168414,
        0.        ]])
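
Some of the distances above are greater than 1.  Cosine distance is 1 minus the cosine similarity, so it ranges from 0 (same direction) to 2 (opposite direction), and values above 1 can occur because the standardized features take negative values.  The following is a small sketch of the calculation in plain Python; `cosine_distance` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
import math

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Vectors pointing the same way: distance close to 0
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])

# Vectors at right angles: distance of 1
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])

# Vectors pointing in opposite directions: distance close to 2
opposite = cosine_distance([1.0, -1.0], [-1.0, 1.0])

print(same_direction, orthogonal, opposite)
```

Only the direction of each feature vector matters here, not its length, which is why the first pair is treated as identical.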

In [14]:
rav_rec_df = pd.DataFrame(distances, columns = rav_clean_enc.index, index = rav_clean_enc.index)

In [15]:
rav_rec_df.head()

name,Musselburgh,Classic Ribbed Hat,Alpine Bloom Hat,Classic Cuffed Hat,February Hat,October Hat,Manhattan Hat,My Baker's Hat,Berry Baby Hat,Basic Baby Hat,...,Sister Snowflakes,Galloway Pullover,Afterlight,Rolling Rock,Amélie,Bray,Barbet Turtleneck,Stonewall,Park Pullover,Construction Trucks Sweater
Musselburgh,0.0,0.042798,0.342816,0.050619,0.176796,0.23613,0.603041,0.757712,0.072313,0.03784,...,1.118049,1.04527,0.818167,0.68837,0.950549,0.858998,1.145848,0.889405,1.186097,1.219293
Classic Ribbed Hat,0.042798,0.0,0.241741,0.016628,0.097727,0.125131,0.612153,0.686783,0.027674,0.028956,...,1.059942,1.087065,0.796243,0.660808,0.965564,0.93285,1.138027,0.922786,1.252613,1.265487
Alpine Bloom Hat,0.342816,0.241741,0.0,0.23114,0.17109,0.176727,0.620753,0.790955,0.18352,0.32009,...,0.917184,1.066382,0.695963,0.429249,0.809358,0.745822,1.237184,0.815777,1.254707,1.266804
Classic Cuffed Hat,0.050619,0.016628,0.23114,0.0,0.058249,0.132604,0.569388,0.612777,0.012422,0.037381,...,1.058588,1.082498,0.816892,0.645843,0.984453,0.884834,1.053949,0.855001,1.210474,1.204937
February Hat,0.176796,0.097727,0.17109,0.058249,0.0,0.106845,0.573871,0.535229,0.042721,0.127939,...,0.984164,1.08273,0.813877,0.60929,0.960694,0.868106,0.993357,0.814008,1.221639,1.178729
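
Each row of this dataframe holds one pattern's distance to every pattern, so recommendations come from sorting a row and keeping the smallest values.  The sketch below shows that idea on a handful of distances copied from the Musselburgh row above.  The helper `top_matches` is hypothetical, and it skips the query by name rather than by position, which is one possible way to stay correct if another pattern ties at distance 0.  Because it sees only part of the row, its result is not the true top three over the full dataset.

```python
def top_matches(distance_row, query, n=5):
    """Return the n names closest to `query`, skipping the query itself by name."""
    others = [(dist, name) for name, dist in distance_row.items() if name != query]
    return [name for dist, name in sorted(others)[:n]]

# Distances from 'Musselburgh', copied from the first row of the table above
musselburgh_row = {
    'Musselburgh': 0.0,
    'Classic Ribbed Hat': 0.042798,
    'Alpine Bloom Hat': 0.342816,
    'Classic Cuffed Hat': 0.050619,
    'February Hat': 0.176796,
    'October Hat': 0.23613,
    'Basic Baby Hat': 0.03784,
}

closest = top_matches(musselburgh_row, 'Musselburgh', n=3)
print(closest)
```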


## User Interface and Recommender Function
___

In [16]:
# Make a list of pattern names that can be searched
patterns = rav_rec_df.index

In [17]:
patterns

Index(['Musselburgh', 'Classic Ribbed Hat', 'Alpine Bloom Hat',
       'Classic Cuffed Hat', 'February Hat', 'October Hat', 'Manhattan Hat',
       'My Baker's Hat', 'Berry Baby Hat', 'Basic Baby Hat',
       ...
       'Sister Snowflakes', 'Galloway Pullover', 'Afterlight', 'Rolling Rock',
       'Amélie', 'Bray', 'Barbet Turtleneck', 'Stonewall', 'Park Pullover',
       'Construction Trucks Sweater'],
      dtype='object', name='name', length=6000)

In [18]:
def display_recs(user_pattern):
    '''
    Print the five patterns most similar to 'user_pattern', each with a link to its Ravelry page.

    The function looks up 'user_pattern' in the 'rav_rec_df' distance matrix, sorts its
    distances in ascending order, and keeps the five closest patterns after skipping the
    first entry, which is expected to be the pattern itself at distance 0.  Each name in
    'top_five' is converted to its Ravelry URL equivalent using a character replacement
    table and printed next to the resulting link.  If the pattern is not found in
    'rav_rec_df', the KeyError is caught and a message asking the user to check their
    input is printed instead.
    '''

    # Use try here to catch any typos or non-existent patterns the user may enter
    try:
        top_five = list(rav_rec_df[user_pattern].sort_values().iloc[1:6].index)
    except KeyError:
        print('Please check to see if your pattern is typed correctly.  It must be written exactly as the designer writes it.')
        return

    # Make a dictionary to deal with characters in the pattern name, but not in the url address.
    replacements = {
        '#': '',
        '&': '',
        ' ': '-',
        '/': '-',
        '!': '',
        '@': '-',
        '~': '-',
        ',': '',
        "'": ''
    }

    # Make a table of the replacements dictionary using .maketrans
    replacement_table = str.maketrans(replacements)

    # Iterate through top five patterns to transform names into their url equivalents
    for patt in top_five:
        url_ready = patt.translate(replacement_table).lower()
        print(f'{patt}: https://www.ravelry.com/patterns/library/{url_ready}')
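
The name-to-URL step can also be pulled out into a small standalone helper, which makes it easy to check on its own.  This is a sketch only: `pattern_url` and the constant names are hypothetical, the replacement rules simply mirror the dictionary in `display_recs`, and whether every generated address matches Ravelry's actual permalink has not been verified here.

```python
# Hypothetical helper; the replacement rules mirror the dictionary used in display_recs
REPLACEMENTS = str.maketrans({
    '#': '', '&': '', '!': '', ',': '', "'": '',
    ' ': '-', '/': '-', '@': '-', '~': '-',
})
BASE_URL = 'https://www.ravelry.com/patterns/library/'

def pattern_url(name):
    """Build a Ravelry-style library URL from a pattern name."""
    return BASE_URL + name.translate(REPLACEMENTS).lower()

simple_url = pattern_url("My Baker's Hat")

# A name containing ' / ' turns into three consecutive hyphens under these rules
slash_url = pattern_url('Los Monos Locos / The Crazy Monkeys')

print(simple_url)
print(slash_url)
```

Names with several adjacent special characters, such as the second example, are the ones most worth checking by hand against the live site.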

In [19]:
search_term = input('Search for a pattern! ').lower()

# regex=False treats the search term as literal text rather than a regular expression
matches = list(patterns[patterns.str.lower().str.contains(search_term, regex=False)])

if matches:
    print(matches)
else:
    print('Nothing Found')

Search for a pattern!  monkey


['Monkey Socks', 'No Purl Monkeys', 'Los Monos Locos / The Crazy Monkeys', 'Sock Monkey Slipper Socks']
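
One caveat with this search: pandas' `str.contains` interprets its pattern as a regular expression unless `regex=False` is passed, so a search term containing a character such as `(` can raise an error.  A literal, case-insensitive substring search can be sketched in plain Python as follows.  The helper `search_patterns` and the name `'Hat (Basic)'` are hypothetical and used only for illustration.

```python
def search_patterns(names, term):
    """Return names containing `term` as a literal, case-insensitive substring."""
    needle = term.strip().lower()
    if not needle:
        # An empty search would match everything, so return nothing instead
        return []
    return [name for name in names if needle in name.lower()]

# 'Hat (Basic)' is a made-up name used to show that special characters are safe
sample_names = ['Monkey Socks', 'No Purl Monkeys', 'Musselburgh', 'Hat (Basic)']

monkey_hits = search_patterns(sample_names, 'monkey')
paren_hits = search_patterns(sample_names, '(')

print(monkey_hits)
print(paren_hits)
```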


In [None]:
# Asks user for a pattern they like and uses display_recs function to return five urls of similar patterns.
user_input = input('Type a knitting pattern you like: ')
display_recs(user_input)

In [None]:
#rav_rec_df.to_pickle('../data/rav_rec.pkl')