# Amazon Recommendation System

Amazon's recommendation system follows the principle of item-based recommendations: it measures the similarity between pairs of products and then recommends the most similar products to each user. How best to measure the similarity between two products has long been a major focus of researchers.
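
To make the idea of product similarity concrete, here is a minimal sketch that compares products by the ratings they received. It assumes cosine similarity, which is one common choice and not necessarily the measure Amazon uses; the function name `item_cosine_similarity` and the toy matrix are made up for illustration.

```python
import numpy as np

def item_cosine_similarity(ratings):
    """Cosine similarity between the columns (products) of a user x product matrix."""
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for products nobody rated
    normalized = ratings / norms
    return normalized.T @ normalized

# Toy matrix (assumption): rows are users, columns are products, 0 means "not rated".
toy_ratings = np.array([
    [5, 4, 0],
    [4, 5, 0],
    [0, 0, 3],
])
similarity = item_cosine_similarity(toy_ratings)
print(similarity.round(3))
```

The first two products are rated by the same users with similar scores, so their similarity is close to 1, while the third product shares no raters with them and gets a similarity of 0.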

A website like Amazon, however, needs more criteria than similarity alone to recommend products, such as the quality of the product. A good-quality product tends to collect good reviews, so we can use both the similarity score and product reviews to generate recommendations. In the section below, I will take you through how to create an Amazon Recommendation System using Python.

## Amazon Recommendation System using Python
I will try to use as few Python libraries as I can for creating this recommendation system. To work with the data, I will be using only the pandas and NumPy libraries. So let’s import the data and see how to create an Amazon Recommendation System using Python:

In [24]:
import pandas as pd
import numpy as np

In [25]:
data=pd.read_csv('ratings_Electronics (1).csv')

In [26]:
data.head()

Unnamed: 0,AKM1MP6P0OYPR,0132793040,5.0,1365811200
0,A2CX7LUOHB2NDG,321732944,5.0,1341100800
1,A2NWSAGRHCP8N5,439886341,1.0,1367193600
2,A2WNBOD3WNDNKT,439886341,3.0,1374451200
3,A1GI0U4ZRJA8WN,439886341,1.0,1334707200
4,A1QGNMC6O1VW39,511189877,5.0,1397433600


#### The dataset that I am using here does not have column names (pandas has taken the first record as the header), so let’s give the most appropriate names to these columns:

In [27]:
data.columns=['user_id','product_id','ratings','timestamp']
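
Renaming the columns after loading still leaves the first record consumed as the header, as the `data.head()` output above shows. One alternative, sketched here on a few records copied from that output and read from an in-memory string instead of the real file, is to pass `header=None` and `names=` to `pd.read_csv`. The `dtype` argument is an extra assumption on my part, added to keep leading zeros in product IDs.

```python
import io
import pandas as pd

# Three records copied from the output above, standing in for the real CSV file.
sample_csv = io.StringIO(
    "AKM1MP6P0OYPR,0132793040,5.0,1365811200\n"
    "A2CX7LUOHB2NDG,321732944,5.0,1341100800\n"
    "A2NWSAGRHCP8N5,439886341,1.0,1367193600\n"
)

columns = ["user_id", "product_id", "ratings", "timestamp"]
ratings = pd.read_csv(
    sample_csv,
    header=None,                 # the file has no header row
    names=columns,               # so supply the column names directly
    dtype={"product_id": str},   # keep IDs such as 0132793040 as text
)
print(ratings)
```

With this approach the first record stays in the data as row 0 instead of becoming the column labels.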

#### Since the dataset is very large, let me use a sample of the data (the first 10% of the rows):

In [28]:
df=data[:int(len(data)*0.1)]
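
Note that the slice above keeps the first 10% of the rows, which is not a random sample. If the file is ordered in some way, a random sample may be more representative. Here is a small sketch of both options on a toy DataFrame; the toy data and the `random_state` value are arbitrary choices for illustration.

```python
import pandas as pd

# Toy stand-in for the ratings table (assumption: 100 made-up rows).
toy = pd.DataFrame({
    "user_id": [f"U{i}" for i in range(100)],
    "ratings": [i % 5 + 1 for i in range(100)],
})

# What the walkthrough does: the first 10% of the rows, in file order.
first_tenth = toy[: int(len(toy) * 0.1)]

# Alternative: a reproducible random 10% sample.
random_tenth = toy.sample(frac=0.1, random_state=42)

print(len(first_tenth), len(random_tenth))
```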

## Now let’s prepare the dataset for creating a recommendation system:

In [29]:
counts=df['user_id'].value_counts()

In [30]:
counts

A5JLAU2ARJ0BO     384
A231WM2Z2JL0U3    249
A25HBO5V8S8SEA    163
A6FIAB28IS79      113
AT6CZDCP4TRGA     112
                 ... 
A1B5G76Q4UYQRW      1
A29IZM5WJG0Z4L      1
A2GAME6E4MX8QP      1
A2TJZ1AOM8A45L      1
AYJHBMM59I47X       1
Name: user_id, Length: 606149, dtype: int64

In [31]:
# Keep only the users who have given at least 50 ratings
data=df[df['user_id'].isin(counts[counts>=50].index)]

In [32]:
data

Unnamed: 0,user_id,product_id,ratings,timestamp
2161,A5JLAU2ARJ0BO,1400532655,1.0,1291334400
7380,A2AEZQ3DGBBLPR,B000000O48,5.0,1038873600
7447,A2R6RA8FRBS608,B000001OL6,4.0,1209513600
7788,A11D1KHM7DVOQK,B000001OMN,2.0,1167350400
8731,A6FIAB28IS79,B00000J05A,3.0,985564800
...,...,...,...,...
778202,A3OXHLG6DIBRW8,B000BSOBG0,4.0,1267315200
779340,AKT8TGIT6VVZ5,B000BTFZZA,5.0,1147737600
780931,A149RNR5RH19YY,B000BTL0OA,5.0,1141689600
781739,A1RPTVW5VEOSI,B000BTPVHW,3.0,1172102400
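
The filtering step is easier to follow on a tiny example. This sketch repeats the same `value_counts` and `isin` pattern on made-up data, with a threshold of 2 instead of 50 so the effect is visible; the name `min_ratings` is my own.

```python
import pandas as pd

# Made-up ratings: user A has 3 ratings, B has 1, C has 2.
toy = pd.DataFrame({
    "user_id": ["A", "A", "A", "B", "C", "C"],
    "product_id": ["p1", "p2", "p3", "p1", "p2", "p3"],
    "ratings": [5.0, 3.0, 4.0, 1.0, 2.0, 5.0],
})

min_ratings = 2  # the walkthrough uses 50
counts = toy["user_id"].value_counts()
active_users = counts[counts >= min_ratings].index
filtered = toy[toy["user_id"].isin(active_users)]
print(filtered)
```

User B has only one rating, so all of B's rows are dropped while every row of A and C is kept.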


In [33]:
data.groupby('product_id')['ratings'].mean().sort_values(ascending=False)

product_id
B00004Y284    5.0
B0000513SA    5.0
B000068BRE    5.0
B0000DJEIP    5.0
B00007J8SB    5.0
             ... 
B0001LGXO0    1.0
B0001PFO3C    1.0
B0001WNKBI    1.0
B0001X6GEK    1.0
1400532655    1.0
Name: ratings, Length: 2207, dtype: float64

In [35]:
# User x product rating matrix; products a user has not rated are filled with 0
final_ratings=data.pivot(index='user_id',columns='product_id',values='ratings').fillna(0)
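
One caveat: `DataFrame.pivot` raises a `ValueError` when the same user has rated the same product more than once. If that can happen in your data, `pivot_table` with an aggregation function is a possible alternative. The sketch below uses made-up data and averages duplicate ratings, which is only one of several reasonable choices.

```python
import pandas as pd

# Made-up ratings in which user A rated product p1 twice.
toy = pd.DataFrame({
    "user_id": ["A", "A", "B", "B"],
    "product_id": ["p1", "p1", "p1", "p2"],
    "ratings": [4.0, 2.0, 5.0, 3.0],
})

try:
    toy.pivot(index="user_id", columns="product_id", values="ratings")
    pivot_failed = False
except ValueError:
    # pivot cannot reshape when an (index, column) pair is duplicated
    pivot_failed = True

# pivot_table resolves duplicates by aggregating them, here with the mean.
matrix = toy.pivot_table(
    index="user_id", columns="product_id", values="ratings", aggfunc="mean"
).fillna(0)
print(matrix)
```

Keep in mind that filling missing values with 0 makes "not rated" indistinguishable from a rating of 0, which matters if the matrix is later used to compute averages.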

In [36]:
num_of_ratings=np.count_nonzero(final_ratings)

In [37]:
possible_ratings=final_ratings.shape[0]*final_ratings.shape[1]

In [38]:
density=num_of_ratings/possible_ratings

In [39]:
density *= 100  # express the density as a percentage
final_ratings_T=final_ratings.transpose()  # products as rows, users as columns
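
The density calculated above is the share of user-product cells that actually contain a rating. As a sketch, the same calculation can be wrapped in a small helper and printed; the function name `rating_density` and the toy matrix are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rating_density(matrix):
    """Percentage of cells in a user x product matrix that hold a non-zero rating."""
    filled = np.count_nonzero(matrix.to_numpy())
    return 100.0 * filled / matrix.size

# Toy matrix (assumption): 2 users, 4 products, 3 ratings in total.
toy_matrix = pd.DataFrame(
    [[5.0, 0.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 0.0]],
    index=["A", "B"],
    columns=["p1", "p2", "p3", "p4"],
)
density = rating_density(toy_matrix)
print(f"density: {density:.2f}%")
```

A low density means most users have rated only a small fraction of the products, which is typical for rating data.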

In [44]:
grouped=data.groupby('product_id').agg({'user_id':'count'}).reset_index()
grouped

Unnamed: 0,product_id,user_id
0,1400532655,1
1,B000000O48,1
2,B000001OL6,1
3,B000001OMN,1
4,B00000J05A,1
...,...,...
2202,B000BSOBG0,1
2203,B000BTFZZA,1
2204,B000BTL0OA,1
2205,B000BTPVHW,1


In [45]:
grouped.rename(columns = {'user_id': 'score'},inplace=True)

In [46]:
grouped

Unnamed: 0,product_id,score
0,1400532655,1
1,B000000O48,1
2,B000001OL6,1
3,B000001OMN,1
4,B00000J05A,1
...,...,...
2202,B000BSOBG0,1
2203,B000BTFZZA,1
2204,B000BTL0OA,1
2205,B000BTPVHW,1


In [47]:
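# Rank products by how many users rated them (score); ties are broken by product_id
training_data=grouped.sort_values(['score','product_id'],ascending=[False,True])
training_data['Rank'] = training_data['score'].rank(ascending=False, method='first')
# Keep the five most-rated products
recommendations = training_data.head()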

In [48]:
recommendations

Unnamed: 0,product_id,score,Rank
113,B00004SB92,6,1.0
1099,B00008OE6I,5,2.0
368,B00005AW1H,4,3.0
612,B0000645C9,4,4.0
976,B00007KDVI,4,5.0
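
The score above only counts how many users rated a product, so a product with many poor ratings can still rank first. Since the introduction suggests also taking review quality into account, here is one possible sketch that combines the rating count with the mean rating. The toy data, the `min_score` threshold, and the choice to sort by mean rating first are all assumptions.

```python
import pandas as pd

# Made-up ratings: p1 is popular but poorly rated, p3 has a single 5-star rating.
toy = pd.DataFrame({
    "user_id": ["A", "B", "C", "A", "B", "C"],
    "product_id": ["p1", "p1", "p1", "p2", "p2", "p3"],
    "ratings": [2.0, 3.0, 1.0, 5.0, 4.0, 5.0],
})

# Number of ratings and average rating per product.
summary = (
    toy.groupby("product_id")["ratings"]
    .agg(score="count", mean_rating="mean")
    .reset_index()
)

min_score = 2  # hypothetical cut-off: ignore products with too few ratings
ranked = (
    summary[summary["score"] >= min_score]
    .sort_values(["mean_rating", "score"], ascending=[False, False])
    .reset_index(drop=True)
)
print(ranked)
```

Requiring a minimum number of ratings keeps a product with a single perfect rating from outranking products that many users liked.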


In [49]:
recommend_products = recommendations.copy()  # work on a copy of the top-5 table
recommend_products['user_id'] = 11  # placeholder user id
column = recommend_products.columns.tolist()


In [50]:
column[-1:] 

['user_id']

In [51]:
def recommend(user_id):
    # Work on a copy so the shared top-5 table is left unchanged
    recommend_products = recommendations.copy()
    recommend_products['user_id'] = user_id
    # Move the user_id column to the front
    column = recommend_products.columns.tolist()
    column = column[-1:] + column[:-1]
    return recommend_products[column]


In [52]:
print(recommend(11))

      user_id  product_id  score  Rank
113        11  B00004SB92      6   1.0
1099       11  B00008OE6I      5   2.0
368        11  B00005AW1H      4   3.0
612        11  B0000645C9      4   4.0
976        11  B00007KDVI      4   5.0
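
Because this is a popularity-based recommender, every user receives the same five products. A small step toward personalization is to leave out the products a user has already rated. The sketch below shows that idea on made-up data; the function name `recommend_unseen` and its `top_n` parameter are hypothetical and not part of the walkthrough above.

```python
import pandas as pd

def recommend_unseen(ratings, user_id, top_n=5):
    """Most-rated products that the given user has not rated yet."""
    seen = set(ratings.loc[ratings["user_id"] == user_id, "product_id"])
    popularity = (
        ratings[~ratings["product_id"].isin(seen)]
        .groupby("product_id")["user_id"].count()
        .rename("score")
        .reset_index()
        .sort_values(["score", "product_id"], ascending=[False, True])
        .head(top_n)
        .reset_index(drop=True)
    )
    popularity.insert(0, "user_id", user_id)
    return popularity

# Made-up ratings table with the same column names as in the walkthrough.
toy = pd.DataFrame({
    "user_id": ["A", "A", "B", "B", "C", "C", "D"],
    "product_id": ["p1", "p2", "p1", "p3", "p1", "p3", "p2"],
    "ratings": [5.0, 4.0, 3.0, 5.0, 4.0, 2.0, 1.0],
})
print(recommend_unseen(toy, "D"))
```

User D has already rated p2, so only p1 and p3 are suggested, ordered by how many users rated them.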
