## Association
Association rules help us find items that frequently occur together, for example in the same shopping basket. A set of items that are bought together is called an itemset.
Itemsets help discover relationships between items, which can serve as a basis for strategies such as combo offers or placing two items next to each other on retail shelves to attract customers.

## Support 
Support is the fraction of baskets (transactions) in which the items appear together: the number of baskets containing both X and Y divided by the total number of baskets.
<br>Support(X,Y) = N(xy)/N
<br>N(xy): number of baskets in which items x and y appear together
<br>N: total number of baskets

<br>The Apriori algorithm uses support to limit the number of itemset combinations it has to examine. If support is set to 0.01, only itemsets that occur in at least 1% of transactions are considered. This reduces the search space of the Apriori algorithm and therefore the computational power required.
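
A minimal pure-Python sketch of how support is computed and how a minimum support threshold prunes the search, using a hypothetical list of five baskets (not the groceries data). It relies on the Apriori property that any superset of an infrequent itemset is also infrequent, so candidate pairs are built only from frequent single items. Only the first two levels are shown, and this is an illustration of the idea rather than mlxtend's implementation.

```python
from itertools import combinations

# Hypothetical toy baskets (not from groceries.csv)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "yogurt"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

min_support = 0.5

# Level 1: keep only single items that meet the threshold
items = sorted(set().union(*baskets))
frequent_items = [i for i in items if support({i}, baskets) >= min_support]

# Level 2: build candidate pairs only from frequent items (Apriori pruning)
pair_support = {p: support(p, baskets) for p in combinations(frequent_items, 2)}
frequent_pairs = {p: s for p, s in pair_support.items() if s >= min_support}

print(frequent_items)  # ['bread', 'butter', 'milk']  (yogurt has support 0.4)
print(frequent_pairs)  # {('bread', 'butter'): 0.6, ('bread', 'milk'): 0.6}
```

No pair containing yogurt is ever counted, which is where the savings come from on a real catalogue with thousands of rare items.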

## Confidence 
Confidence measures the proportion of transactions containing X that also contain Y. X is called the antecedent and Y the consequent.
<br>Confidence(X->Y) = N(xy)/N(x)
<br>N(x): number of baskets containing x
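
The confidence formula can be checked by hand on a small hypothetical set of baskets. The sketch below counts N(xy) and N(x) directly; note that confidence is not symmetric, so X->Y and Y->X usually differ.

```python
# Hypothetical toy baskets (not from groceries.csv)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "yogurt"},
]

def count(itemset, baskets):
    """Number of baskets containing every item in `itemset`."""
    return sum(set(itemset) <= b for b in baskets)

def confidence(x, y, baskets):
    """Confidence(X -> Y) = N(xy) / N(x)."""
    return count(set(x) | set(y), baskets) / count(x, baskets)

print(confidence({"butter"}, {"bread"}, baskets))  # 3 / 3 = 1.0
print(confidence({"bread"}, {"butter"}, baskets))  # 3 / 4 = 0.75
```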

## Lift
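Lift(X->Y) = Support(X,Y)/(Support(X)*Support(Y))
<br>A lift value greater than 1 means that X and Y are bought together more often than would be expected if they were independent, so the presence of one product makes the other more likely.
<br>A lift value of 1 means the two items are independent. A lift value less than 1 means they occur together less often than expected, for example because they are substitutes or because buying one item reduces the purchase of the other.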
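
A short sketch of the lift formula on the same kind of hypothetical baskets, showing one pair with lift above 1 and one below 1. Because the formula divides by Support(X)*Support(Y), lift is symmetric: Lift(X->Y) equals Lift(Y->X).

```python
# Hypothetical toy baskets (not from groceries.csv)
baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "yogurt"},
    {"bread", "butter"},
    {"milk", "bread", "butter", "yogurt"},
]

def support(itemset, baskets):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(set(itemset) <= b for b in baskets) / len(baskets)

def lift(x, y, baskets):
    """Lift(X -> Y) = Support(X, Y) / (Support(X) * Support(Y))."""
    joint = support(set(x) | set(y), baskets)
    return joint / (support(x, baskets) * support(y, baskets))

print(round(lift({"butter"}, {"bread"}, baskets), 4))  # 0.6 / (0.6 * 0.8) = 1.25
print(round(lift({"milk"}, {"butter"}, baskets), 4))   # 0.4 / (0.8 * 0.6) = 0.8333
```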


In [1]:
all_txns = []
# raw string so the Windows backslashes are not treated as escape sequences
with open(r'C:\Term 3\Supervised learning with python\Codes-Data-Files\Machine Learning (Codes and Data Files)\Data\groceries.csv') as f:
    # read all lines of the file
    content = f.readlines()
    # remove whitespace (including the newline) from the beginning and end of each line
    txns = [x.strip() for x in content]
    # split each line on commas to get one list of items per transaction
    for each_txn in txns:
        all_txns.append(each_txn.split(","))
all_txns[0:5]

[['citrus fruit', 'semi-finished bread', 'margarine', 'ready soups'],
 ['tropical fruit', 'yogurt', 'coffee'],
 ['whole milk'],
 ['pip fruit', 'yogurt', 'cream cheese ', 'meat spreads'],
 ['other vegetables',
  'whole milk',
  'condensed milk',
  'long life bakery product']]

In [2]:
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [3]:
one_hot_encoding = TransactionEncoder()
one_hot_txns = one_hot_encoding.fit(all_txns).transform(all_txns)
one_hot_txns = one_hot_txns.astype(int)

In [4]:
one_hot_txns

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 1, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 1, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [5]:
one_hot_txns = pd.DataFrame(one_hot_txns, columns = one_hot_encoding.columns_)
one_hot_txns

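| | Instant food products | UHT-milk | abrasive cleaner | artif. sweetener | baby cosmetics | baby food | bags | baking powder | bathroom cleaner | beef | ... | turkey | vinegar | waffles | whipped/sour cream | whisky | white bread | white wine | whole milk | yogurt | zwieback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9830 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 9831 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9832 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 9833 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |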


In [6]:
one_hot_txns.shape

(9835, 171)

In [7]:
# keep only the itemsets whose support is at least the threshold (here 2% of transactions)
frquent_itemsets = apriori(one_hot_txns,min_support = 0.02,use_colnames = True)
frquent_itemsets.sort_values('support',ascending=False)



| | support | itemsets |
|---|---|---|
| 57 | 0.255516 | (whole milk) |
| 39 | 0.193493 | (other vegetables) |
| 43 | 0.183935 | (rolls/buns) |
| 49 | 0.174377 | (soda) |
| 58 | 0.139502 | (yogurt) |
| ... | ... | ... |
| 75 | 0.020539 | (whole milk, frankfurter) |
| 60 | 0.020437 | (bottled beer, whole milk) |
| 76 | 0.020437 | (frozen vegetables, whole milk) |
| 96 | 0.020437 | (pip fruit, tropical fruit) |


In [8]:
# generate rules from the frequent itemsets and keep only those with lift >= 1
rules = association_rules(frquent_itemsets,metric = "lift",min_threshold = 1)
rules.sort_values("confidence",ascending=False)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 121 | (other vegetables, yogurt) | (whole milk) | 0.043416 | 0.255516 | 0.022267 | 0.512881 | 2.007235 | 0.011174 | 1.528340 | 0.524577 |
| 16 | (butter) | (whole milk) | 0.055414 | 0.255516 | 0.027555 | 0.497248 | 1.946053 | 0.013395 | 1.480817 | 0.514659 |
| 25 | (curd) | (whole milk) | 0.053279 | 0.255516 | 0.026131 | 0.490458 | 1.919481 | 0.012517 | 1.461085 | 0.505984 |
| 114 | (root vegetables, other vegetables) | (whole milk) | 0.047382 | 0.255516 | 0.023183 | 0.489270 | 1.914833 | 0.011076 | 1.457687 | 0.501524 |
| 115 | (root vegetables, whole milk) | (other vegetables) | 0.048907 | 0.193493 | 0.023183 | 0.474012 | 2.449770 | 0.013719 | 1.533320 | 0.622230 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124 | (whole milk) | (other vegetables, yogurt) | 0.255516 | 0.043416 | 0.022267 | 0.087147 | 2.007235 | 0.011174 | 1.047905 | 0.674027 |
| 75 | (whole milk) | (pork) | 0.255516 | 0.057651 | 0.022166 | 0.086749 | 1.504719 | 0.007435 | 1.031862 | 0.450546 |
| 0 | (whole milk) | (beef) | 0.255516 | 0.052466 | 0.021251 | 0.083168 | 1.585180 | 0.007845 | 1.033487 | 0.495856 |
| 30 | (whole milk) | (frankfurter) | 0.255516 | 0.058973 | 0.020539 | 0.080382 | 1.363029 | 0.005470 | 1.023280 | 0.357751 |


## Limitations
Association rules are simple and easy to interpret, but they have a limitation: they do not take customer ratings into account. A customer may like one product in an itemset but dislike another product in the same itemset.

# Collaborative Filtering
<br>Collaborative filtering builds on the notion of similarity. For example, two users are similar if they bought the same products and gave those products similar ratings. If, later on, one of them buys a product and rates it highly, we can recommend that product to the other user.
<br>There are two types of collaborative filtering:
<br>1. Item-based similarity: find the k nearest items based on the common users who have rated those items.
<br>2. User-based similarity: find the k nearest users based on the common items they have rated.
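
To make the two variants concrete, here is a minimal sketch on a hypothetical 3-user by 4-item rating matrix, where 0 means "not rated". It uses cosine similarity, the same measure the notebook applies to users below: user-based similarity compares rows, item-based similarity compares columns. The helper `cosine_sim` is written for illustration only.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, 0 = not rated
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
])

def cosine_sim(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1, norms)  # leave all-zero rows as zeros
    return unit @ unit.T

user_sim = cosine_sim(ratings)    # user-based: compare rows (users)
item_sim = cosine_sim(ratings.T)  # item-based: compare columns (items)
np.fill_diagonal(user_sim, 0)     # ignore self-similarity
np.fill_diagonal(item_sim, 0)

print(user_sim.argmax(axis=1))  # [1 0 0]: nearest neighbour of each user
print(item_sim.argmax(axis=1))  # [1 0 3 2]: nearest neighbour of each item
```

Users 0 and 1 rate items in almost the same way, so each is the other's nearest neighbour; items 2 and 3 are liked by the same user, so they pair up on the item side.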

In [9]:
import pandas as pd
rating_df = pd.read_csv(r"C:/Term 3/Supervised learning with python/Codes-Data-Files/Machine Learning (Codes and Data Files)/Data/ratings.csv")
rating_df.head(10)

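| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
| 5 | 1 | 70 | 3.0 | 964982400 |
| 6 | 1 | 101 | 5.0 | 964980868 |
| 7 | 1 | 110 | 4.0 | 964982176 |
| 8 | 1 | 151 | 5.0 | 964984041 |
| 9 | 1 | 157 | 5.0 | 964984100 |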


In [10]:
rating_df.shape

(100836, 4)

To find similar users and similar movies, we first convert the ratings into a user-movie matrix: each row represents a user, each column a movie, and each cell the rating that user gave the movie (empty if the user has not rated it).
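
The reshaping can be illustrated with `DataFrame.pivot` on a tiny hypothetical ratings table (the user and movie IDs below are made up). Missing combinations become NaN, which the notebook then replaces with 0 so that distance functions can be applied.

```python
import pandas as pd

# Hypothetical long-format ratings with the same columns as ratings.csv (minus timestamp)
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# One row per user, one column per movie, NaN where the user has not rated the movie
matrix = toy.pivot(index="userId", columns="movieId", values="rating")
print(matrix)

# Treat "not rated" as 0 so that similarity/distance functions can be applied
filled = matrix.fillna(0)
print(filled)
```

Filling with 0 is a simplification: it treats "not rated" as a very low rating, which biases similarity measures toward users who rated many of the same movies.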

In [11]:
user_movie_df = rating_df.pivot(index="userId", columns="movieId", values="rating").reset_index(drop=True)
user_movie_df.index = rating_df.userId.unique()
user_movie_df.iloc[0:5, 0:15]

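| movieId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.0 | NaN | 4.0 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |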


In [12]:
user_movie_df.fillna(0,inplace=True)
user_movie_df.iloc[0:5, 0:15]

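| movieId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |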


In [13]:
from sklearn.metrics import pairwise_distances
from scipy.spatial.distance import cosine, correlation
user_sim = 1-pairwise_distances(user_movie_df.values,metric="cosine")
user_sim

array([[1.        , 0.02728287, 0.05972026, ..., 0.29109737, 0.09357193,
        0.14532081],
       [0.02728287, 1.        , 0.        , ..., 0.04621095, 0.0275654 ,
        0.10242675],
       [0.05972026, 0.        , 1.        , ..., 0.02112846, 0.        ,
        0.03211875],
       ...,
       [0.29109737, 0.04621095, 0.02112846, ..., 1.        , 0.12199271,
        0.32205486],
       [0.09357193, 0.0275654 , 0.        , ..., 0.12199271, 1.        ,
        0.05322546],
       [0.14532081, 0.10242675, 0.03211875, ..., 0.32205486, 0.05322546,
        1.        ]])

In [14]:
# zero the diagonal so that a user is not picked as their own most similar user
np.fill_diagonal(user_sim, 0)

In [15]:
user_sim_df = pd.DataFrame(user_sim)
user_sim_df.index = rating_df.userId.unique()
user_sim_df.columns = rating_df.userId.unique()
user_sim_df.iloc[0:5,0:15]
# each value in cell represents similarity between two users

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.027283 | 0.05972 | 0.194395 | 0.12908 | 0.128152 | 0.158744 | 0.136968 | 0.064263 | 0.016875 | 0.132499 | 0.016458 | 0.092971 | 0.113238 | 0.160689 |
| 2 | 0.027283 | 0.0 | 0.0 | 0.003726 | 0.016614 | 0.025333 | 0.027585 | 0.027257 | 0.0 | 0.067445 | 0.044419 | 0.0 | 0.043918 | 0.016901 | 0.119778 |
| 3 | 0.05972 | 0.0 | 0.0 | 0.002251 | 0.00502 | 0.003936 | 0.0 | 0.004941 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.003064 | 0.017251 |
| 4 | 0.194395 | 0.003726 | 0.002251 | 0.0 | 0.128659 | 0.088491 | 0.11512 | 0.062969 | 0.011361 | 0.031163 | 0.054767 | 0.049945 | 0.076949 | 0.048989 | 0.071551 |
| 5 | 0.12908 | 0.016614 | 0.00502 | 0.128659 | 0.0 | 0.300349 | 0.108342 | 0.429075 | 0.0 | 0.030611 | 0.183805 | 0.05886 | 0.017157 | 0.221711 | 0.110152 |


In [16]:
user_sim_df.idxmax(axis=1).iloc[0:5]  # ID of the most similar user for each of the first five users

1    266
2    366
3    313
4    391
5    470
dtype: int64

In [17]:
user_movie_df.loc[[3, 313]]  # user 3 and its most similar user, 313 (the index holds user IDs)

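| movieId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 193565 | 193567 | 193571 | 193573 | 193579 | 193581 | 193583 | 193585 | 193587 | 193609 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 313 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 4.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |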


In [18]:
movies_df = pd.read_csv(r"C:/Term 3/Supervised learning with python/Codes-Data-Files/Machine Learning (Codes and Data Files)/Data/movies.csv")

In [19]:
movies_df.head()

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [20]:
movies_df.drop("genres", axis = 1, inplace=True)

In [21]:
def get_user_similar_movie(user1, user2):
    # movies rated by both users; the *_x columns come from user1, the *_y columns from user2
    common_movies = rating_df[rating_df["userId"]==user1].merge(rating_df[rating_df["userId"]==user2], on = "movieId",how="inner")
    # attach the movie titles
    return common_movies.merge(movies_df,on = "movieId",how = "inner")

In [22]:
common_movies = get_user_similar_movie(1,266)
# movies that both users rated 4.0 or higher
common_movies[(common_movies['rating_x']>=4.0) & (common_movies['rating_y']>=4.0)]

| | userId_x | movieId | rating_x | timestamp_x | userId_y | rating_y | timestamp_y | title |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 6 | 4.0 | 964982224 | 266 | 4.0 | 944980835 | Heat (1995) |
| 2 | 1 | 50 | 5.0 | 964982931 | 266 | 4.0 | 944980462 | Usual Suspects, The (1995) |
| 3 | 1 | 110 | 4.0 | 964982176 | 266 | 5.0 | 945670079 | Braveheart (1995) |
| 5 | 1 | 235 | 4.0 | 964980908 | 266 | 4.0 | 945669162 | Ed Wood (1994) |
| 6 | 1 | 260 | 5.0 | 964981680 | 266 | 4.0 | 945670679 | Star Wars: Episode IV - A New Hope (1977) |
| 9 | 1 | 356 | 4.0 | 964980962 | 266 | 4.0 | 945669632 | Forrest Gump (1994) |
| 12 | 1 | 457 | 5.0 | 964981909 | 266 | 4.0 | 945670028 | Fugitive, The (1993) |
| 13 | 1 | 480 | 4.0 | 964982346 | 266 | 4.0 | 945670079 | Jurassic Park (1993) |
| 14 | 1 | 592 | 4.0 | 964982271 | 266 | 4.0 | 945670386 | Batman (1989) |
| 15 | 1 | 608 | 5.0 | 964982931 | 266 | 5.0 | 945670764 | Fargo (1996) |


## Challenges with user similarity recommender systems
User-based similarity does not work for new users: until a user has rated (or bought) some items, there is nothing to compare them with, so no similar users can be found. This is known as the cold-start problem.
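
Because this notebook fills missing ratings with 0, a brand-new user becomes an all-zero row in the user-movie matrix. The sketch below uses a small hypothetical matrix and a hand-written cosine similarity that leaves zero-norm rows as zeros (an assumption made for illustration). It shows that such a user is equally similar (0) to everyone, so `argmax`, like `idxmax`, simply returns the first user rather than a meaningful neighbour.

```python
import numpy as np

# Hypothetical user x movie matrix (0 = not rated); the last row is a new user
ratings = np.array([
    [4.0, 0.0, 5.0],
    [5.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],   # new user: no ratings yet
])

norms = np.linalg.norm(ratings, axis=1, keepdims=True)
unit = ratings / np.where(norms == 0, 1, norms)  # all-zero row stays all zero
sim = unit @ unit.T
np.fill_diagonal(sim, 0)  # ignore self-similarity, as in the notebook

print(sim[2])           # all zeros: the new user matches nobody
print(sim[2].argmax())  # 0: argmax just returns the first index, not a real neighbour
```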

In [23]:
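# movie x user matrix; pivot sorts the movieIds, so row i is the i-th smallest rated movieId
rating_mat = rating_df.pivot(index = "movieId", columns = "userId",values = "rating").reset_index(drop=True)
# treat "not rated" as 0
rating_mat.fillna(0,inplace=True)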

In [24]:
# 1 - correlation distance = Pearson correlation between the rating vectors of two movies
movie_sim = 1 - pairwise_distances(rating_mat.values,metric = "correlation")
movie_sim_df = pd.DataFrame(movie_sim)
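
The cell above turns a correlation distance into a similarity with `1 - distance`; for `metric="correlation"`, scikit-learn's `pairwise_distances` uses SciPy's definition. As a sanity check on a hypothetical 5-movie by 4-user matrix, the sketch below confirms that this equals the Pearson correlation between movie rows. It also shows that two movies rated only by the same single user correlate perfectly, whatever ratings they received. This is a likely (not verified here) explanation for the block of 1.000000 values among the last movies in the output that follows.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical movie x user matrix: rows = movies, columns = users, 0 = not rated
rating_mat = np.array([
    [4.0, 5.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [0.0, 0.0, 0.0, 3.0],   # rated only by the last user
    [0.0, 0.0, 0.0, 4.5],   # also rated only by the last user
])

# 1 - correlation distance between rows ...
movie_sim = 1 - cdist(rating_mat, rating_mat, metric="correlation")
# ... equals the Pearson correlation between the rows
pearson = np.corrcoef(rating_mat)
print(np.allclose(movie_sim, pearson))  # True

# Two movies rated by the same single user correlate perfectly
print(round(movie_sim[3, 4], 6))  # 1.0
```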

In [26]:
movie_sim_df

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 9714 | 9715 | 9716 | 9717 | 9718 | 9719 | 9720 | 9721 | 9722 | 9723 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.000000 | 0.231327 | 0.173213 | -0.028917 | 0.192474 | 0.192686 | 0.143743 | 0.085477 | 0.177245 | 0.183382 | ... | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 | -0.028906 |
| 1 | 0.231327 | 1.000000 | 0.191945 | 0.071269 | 0.200526 | 0.158341 | 0.127569 | 0.141540 | -0.021045 | 0.285086 | ... | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 | -0.018291 |
| 2 | 0.173213 | 0.191945 | 1.000000 | 0.067143 | 0.370171 | 0.196442 | 0.351513 | 0.296897 | 0.275812 | 0.136916 | ... | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 | -0.011729 |
| 3 | -0.028917 | 0.071269 | 0.067143 | 1.000000 | 0.167910 | 0.053755 | 0.258075 | 0.148726 | -0.016025 | 0.056000 | ... | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 | -0.004138 |
| 4 | 0.192474 | 0.200526 | 0.370171 | 0.167910 | 1.000000 | 0.215503 | 0.429890 | 0.265777 | 0.308085 | 0.110833 | ... | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 | -0.011456 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9719 | -0.028906 | -0.018291 | -0.011729 | -0.004138 | -0.011456 | -0.017712 | -0.012033 | -0.004383 | -0.006359 | -0.020524 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -0.001642 |
| 9720 | -0.028906 | -0.018291 | -0.011729 | -0.004138 | -0.011456 | -0.017712 | -0.012033 | -0.004383 | -0.006359 | -0.020524 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -0.001642 |
| 9721 | -0.028906 | -0.018291 | -0.011729 | -0.004138 | -0.011456 | -0.017712 | -0.012033 | -0.004383 | -0.006359 | -0.020524 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -0.001642 |
| 9722 | -0.028906 | -0.018291 | -0.011729 | -0.004138 | -0.011456 | -0.017712 | -0.012033 | -0.004383 | -0.006359 | -0.020524 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -0.001642 |


In [29]:
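def get_similar_movies(movieId, topN = 5):
    # movie_sim_df rows follow the sorted movieIds that have ratings, which need not match
    # movies_df row positions (movies_df may also list movies nobody has rated)
    rated_ids = np.sort(rating_df.movieId.unique())
    movieidx = np.flatnonzero(rated_ids == movieId)[0]
    sims = pd.Series(movie_sim_df.iloc[movieidx].values, index=rated_ids)
    # map similarities onto movies by movieId (not by row position) without modifying movies_df
    top_n = movies_df.assign(similarity=movies_df.movieId.map(sims))
    return top_n.sort_values('similarity', ascending = False)[0:topN]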
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     

In [30]:
get_similar_movies(858)

| | movieId | title | similarity |
|---|---|---|---|
| 659 | 858 | Godfather, The (1972) | 1.0 |
| 921 | 1220 | Blues Brothers, The (1980) | 0.76939 |
| 913 | 1212 | Third Man, The (1949) | 0.560246 |
| 895 | 1192 | Paris Is Burning (1990) | 0.496048 |
| 827 | 1088 | Dirty Dancing (1987) | 0.442128 |
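The body of `get_similar_movies` is not shown in this section. The sketch below shows one way an item-to-item ranking like the one above could be produced. It assumes cosine similarity over per-movie rating vectors, and the notebook's own function may instead use another measure such as correlation. The names `cosine_similarity` and `rank_similar_items` and the toy ratings are hypothetical and are not taken from the notebook or the MovieLens data. The queried movie compares with itself, so it ranks first with a similarity of 1.0, which matches the first row of the output above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length rating vectors (0 when either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_items(item_vectors, target_id, top_n=5):
    """Hypothetical helper: return (item_id, similarity) pairs, most similar first.

    item_vectors maps an item id to a list of ratings, one entry per user
    (0 means the user did not rate the item). The target item is included,
    so it appears first with similarity 1.0 (up to floating-point rounding).
    """
    target = item_vectors[target_id]
    scores = [(item_id, cosine_similarity(target, vec))
              for item_id, vec in item_vectors.items()]
    # Sort by similarity, highest first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy item-by-user rating matrix (hypothetical values, for illustration only)
ratings = {
    858: [5, 4, 0, 5],
    1220: [4, 4, 0, 4],
    1088: [0, 1, 5, 0],
}
print(rank_similar_items(ratings, 858, top_n=3))
```

Cosine similarity disregards how many ratings a movie has, so two movies rated by only a handful of the same users can still score highly. This may help explain surprising neighbours in a list like the one above.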
