# 01. Stay Kind! Recommender
> Author: [Dawn Graham](https://dawngraham.github.io/)

#### Limitations
- Zines are available in limited quantities. Some may not be replaced after they run out.
- Availability windows may not overlap. For example, a given zine may have been on sale at the same time as only 40% of the other zines, so customers were not all choosing from the same selection.
- Most sales are made in-person, with cash. While some people may be repeat customers, this may not be captured in the records.
- Not all available zines are displayed due to limited table space. Often the display is different at the beginning of the day than the end of the day.
- People who purchase in person have had a chance to browse through a zine, but a purchase is not always an indicator of how much they like it. They may also be buying zines as gifts.

## Import Libraries

In [1]:
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances

## Import Data

In [2]:
items = pd.read_csv('../data/items.csv')
items.head()

Unnamed: 0,SKU,Item
0,IEY1012,Ima Eat You! December 2010
1,IEY1105,Ima Eat You! May 2011
2,IEY1106,Ima Eat You! June 2011
3,IEY1107,Ima Eat You! July 2011
4,IEY1503,Ima Eat You! March 2015


In [3]:
df = pd.read_csv('../data/purchased.csv')
df.head()

Unnamed: 0,SKU,Item,Customer
0,IEY1500,Ima Eat You! 2015 Zine Pack,86
1,IEY1506,Ima Eat You! June 2015,541
2,IEY1507,Ima Eat You! July 2015,541
3,IEY1506,Ima Eat You! June 2015,524
4,IEY1507,Ima Eat You! July 2015,524


## Create pivot table

In [4]:
# Add `1` to all rows to indicate customer purchased item
df['Purchased'] = 1

In [5]:
pivot = pd.pivot_table(df, index='Item', columns='Customer', values='Purchased')
pivot.head()

Customer,0,1,2,3,4,5,6,7,8,9,...,624,625,626,627,628,629,630,631,632,633
1 1/2 Weeks,,,,,,,,,,,...,,,,,,,,,,
9 Love Potions,,,,,,,,,,,...,,,,,,,,,,
A Critique of Ally Politics,,,,,,,,,,,...,,,,,,,,,,
Alex Issue 10,,,,,,,,,,,...,,,,1.0,,,,,,
Alex Issue 9,,,,1.0,,,,,,,...,,,,,1.0,,,,,
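
To make the shape of this table concrete, here is a minimal sketch on made-up data (the `toy` frame and its values are illustrative assumptions, not rows from `purchased.csv`). It suggests why a constant `Purchased` column produces `1.0` where a customer bought an item and `NaN` elsewhere: `pd.pivot_table` aggregates with the mean by default, so repeated purchases of the same item by the same customer still come out as `1.0`.

```python
import pandas as pd

# Toy purchase log (made-up data for illustration only)
toy = pd.DataFrame({
    'Item': ['Zine A', 'Zine A', 'Zine B', 'Zine B'],
    'Customer': [1, 2, 2, 2],  # customer 2 bought Zine B twice
})
toy['Purchased'] = 1

# Default aggfunc is the mean, so the duplicate purchase averages to 1.0
toy_pivot = pd.pivot_table(toy, index='Item', columns='Customer', values='Purchased')

# Item/customer pairs with no purchase are NaN until they are filled
toy_filled = toy_pivot.fillna(0)
print(toy_filled)
```

If purchase counts mattered rather than a bought/not-bought flag, passing `aggfunc='sum'` would be one option, though that is not what this notebook does.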


## Create sparse matrix

In [6]:
sparse_pivot = sparse.csr_matrix(pivot.fillna(0))

## Calculate cosine distance (1 - cosine similarity)

In [7]:
recommender = pairwise_distances(sparse_pivot, metric='cosine')
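
Note that `pairwise_distances` with `metric='cosine'` returns cosine *distance*, which is one minus cosine similarity, so smaller values mean more similar items. A small sketch on a made-up item-by-customer matrix (the `toy` array is an assumption for illustration) shows the relationship:

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import pairwise_distances, cosine_similarity

# Rows are items, columns are customers (toy data, not the real pivot)
toy = sparse.csr_matrix(np.array([
    [1, 1, 0, 0],  # item 0
    [1, 1, 0, 0],  # item 1: exactly the same buyers as item 0
    [0, 0, 1, 1],  # item 2: no buyers in common with item 0
    [1, 0, 0, 0],  # item 3: shares one buyer with item 0
], dtype=float))

dist = pairwise_distances(toy, metric='cosine')
sim = cosine_similarity(toy)

# Cosine distance is 1 - cosine similarity
assert np.allclose(dist, 1 - sim)
print(dist.round(3))
```

In this toy case, items with identical buyers sit at distance 0, items with no shared buyers sit at distance 1, and partial overlap falls in between.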

## Create distances DataFrame

In [8]:
recommender_df = pd.DataFrame(recommender, columns=pivot.index, index=pivot.index)
recommender_df.head()

Item,1 1/2 Weeks,9 Love Potions,A Critique of Ally Politics,Alex Issue 10,Alex Issue 9,All That There Is,Amor Y Sacrificio,Ang Pangalawang Pagtatapat: The Second Confessions,Anxiety Toolkit,Ask me about my preferred pronouns,...,"Sometimes, Cigarettes",Stay Kind! Button,Stay Kind! Sticker,Stories from the Inside Issue 1,Tender Vol. 2,The Goods Zine Vol. 1,The Goods Zine Vol. 2,The House Our Gods Destroyed,The List Zine,The Pen Pal Network News Issue #2
1 1/2 Weeks,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,0.925875,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9 Love Potions,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.952435,1.0,...,1.0,0.971613,0.974575,1.0,1.0,1.0,1.0,1.0,1.0,1.0
A Critique of Ally Politics,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.926279,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Alex Issue 10,1.0,1.0,1.0,0.0,1.0,1.0,0.833333,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
Alex Issue 9,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.955545,1.0,1.0,...,0.904654,0.92514,0.955301,1.0,1.0,1.0,1.0,1.0,1.0,1.0
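
Each item is at distance `0.0` from itself (the diagonal above), so any lookup of nearest neighbours has to leave the item itself out. One way to do that by label rather than by position is sketched below; `closest_items` and `toy_dist` are hypothetical names introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

def closest_items(distances: pd.DataFrame, item: str, n: int = 10) -> pd.Series:
    """Return the n items with the smallest distance to `item`, excluding `item` itself."""
    # Drop the item's own entry by label, then keep the n smallest distances
    return distances[item].drop(labels=item).nsmallest(n)

# Toy symmetric distance matrix (made-up values)
toy_dist = pd.DataFrame(
    [[0.0, 0.3, 1.0],
     [0.3, 0.0, 0.8],
     [1.0, 0.8, 0.0]],
    index=['A', 'B', 'C'], columns=['A', 'B', 'C'],
)

result = closest_items(toy_dist, 'A', n=2)
print(result)
```

Dropping by label may be slightly more robust than skipping the first sorted entry, since another item bought by exactly the same customers would also have distance 0 and could sort ahead of the item itself.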


## Evaluate recommender performance

In [9]:
search = 'confessions'

for item in items.loc[items['Item'].str.contains(search, case=False), 'Item']:
    print(item)
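    try:
        # count() ignores NaN, so this is the number of customers who bought the item
        print('Number sold:', pivot.loc[item].count())
        print('')
        print('10 Closest Items')
        # Position 0 is the item itself (distance 0.0), so skip it
        print(recommender_df[item].sort_values()[1:11])
    except KeyError:
        # Item is listed in items.csv but never appears in purchased.csv
        print('Number sold: 0')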
    print('')
    print('*******************************************************************************************')
    print('')

Confessions
Number sold: 32

10 Closest Items
Item
Ang Pangalawang Pagtatapat: The Second Confessions    0.336511
Kulay Ng Balat: The Third Confessions                 0.875000
Dream Life Personal Planner                           0.875000
Penises Are Confusing #21: Learning Curve             0.875000
How to Get Along with Introverts                      0.888197
Gimme Gimme: My Relationship with Blow Jobs           0.897938
BiH, A Love Story                                     0.897938
Parts of the Whole                                    0.927831
Practicing > Preaching                                0.927831
Ima Eat You! December 2015                            0.937500
Name: Confessions, dtype: float64

*******************************************************************************************

Ang Pangalawang Pagtatapat: The Second Confessions
Number sold: 23

10 Closest Items
Item
Confessions                                0.336511
Kulay Ng Balat: The Third Confessions      0.7788