## Analysis of Instacart data to build a recommender system that makes three types of recommendations to the user:
### Recommendations based on the item the user added to the cart, using an unsupervised KNN algorithm.
### Popularity-based recommendations.
### Personalized recommendations based on the items a user reorders most.
##### Done by: Bhakti Mehta

#### Import libraries:

In [1]:
import pandas as pd
import numpy as np
import csv
import sklearn
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

#### Ingestion of Instacart Products and Users Orders data:

In [2]:
orders = pd.read_csv('orders.csv')

In [3]:
products=pd.read_csv('products.csv')

In [4]:
order_products=pd.read_csv('order_products__train.csv')

#### Data Cleaning and Manipulation:

In [5]:
products=products.drop(['aisle_id','department_id'],axis=1)

In [6]:
orders=orders.drop(['eval_set','order_number','order_dow','order_hour_of_day','days_since_prior_order'],axis=1)

In [7]:
order_products=order_products.drop(['add_to_cart_order'],axis=1)

In [8]:
order_products=order_products.merge(orders,on='order_id',how='inner')

In [9]:
order_products=order_products.drop(['order_id'],axis=1)

In [10]:
order_products=order_products.merge(products,on='product_id',how='left')

In [11]:
order_products=order_products.drop(['product_id'],axis=1)
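
The cleaning cells above can also be read as one chain: attach `user_id` from the orders, attach `product_name` from the products, then drop the id columns. The following is a minimal sketch of that flow on toy frames; the values and the name `prepared` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for orders.csv, products.csv and order_products__train.csv
# (all values are invented for illustration).
orders = pd.DataFrame({'order_id': [1, 2], 'user_id': [10, 20]})
products = pd.DataFrame({'product_id': [100, 200],
                         'product_name': ['Limes', 'Arugula']})
order_products = pd.DataFrame({'order_id': [1, 1, 2],
                               'product_id': [100, 200, 100],
                               'reordered': [0, 1, 1]})

prepared = (
    order_products
    .merge(orders, on='order_id', how='inner')     # attach user_id
    .merge(products, on='product_id', how='left')  # attach product_name
    .drop(['order_id', 'product_id'], axis=1)      # keep only the modelling columns
)
print(prepared)
```

The result has the same three columns as the prepared data in the notebook: `reordered`, `user_id` and `product_name`.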

#### The cleaned and prepared data is shown below:

In [12]:
order_products.head()

|   | reordered | user_id | product_name |
|---|---|---|---|
| 0 | 0 | 42756 | Shelled Pistachios |
| 1 | 0 | 42756 | Organic Biologique Limes |
| 2 | 0 | 42756 | Organic Raw Unfiltered Apple Cider Vinegar |
| 3 | 1 | 42756 | Organic Baby Arugula |
| 4 | 0 | 42756 | Organic Hot House Tomato |


#### Since the full dataset is too large to process in memory, we limit the training data to the first 100,000 rows:

In [13]:
len(order_products)

422309

In [14]:
order_products=order_products[0:100000]

In [15]:
# Keep a copy of the long-format data for the popularity and reorder sections
pop_products=order_products.copy()

## Recommendations based on the item added to cart by the user using Unsupervised KNN Algorithm

#### Implementation of KNN Algorithm:

In [16]:
order_products=order_products.pivot_table(values='reordered', index='product_name', columns='user_id')

In [17]:
order_products=order_products.fillna(-1)

##### 0: the user has ordered the product once; 1: the user has ordered the product more than once; -1: the user has never ordered the product

In [18]:
order_products.head()

| product_name / user_id | 7 | 30 | 34 | 49 | 55 | 56 | 66 | 70 | 74 | 79 | ... | 63030 | 63033 | 63046 | 63056 | 63064 | 63068 | 63080 | 63084 | 63086 | 63094 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #2 Coffee Filters | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 0% Fat Black Cherry Greek Yogurt y | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 0% Fat Blueberry Greek Yogurt | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 0% Fat Free Organic Milk | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
| 0% Fat Organic Greek Vanilla Yogurt | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 |
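
One consequence of filling missing entries with -1 is worth checking before trusting the neighbours: because almost every entry in a product row is -1, two products bought by completely different users still point in nearly the same direction, so their cosine distance is close to 0. The sketch below illustrates this with two hypothetical product vectors and compares it with an alternative encoding (an assumption, not what the notebook uses) in which "never ordered" is 0.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

n_users = 1000  # hypothetical number of users

# Encoding used in the notebook: -1 = never ordered, 1 = reordered.
a = np.full(n_users, -1.0)
a[0] = 1.0   # only user 0 reordered product A
b = np.full(n_users, -1.0)
b[1] = 1.0   # only user 1 reordered product B
dist_minus_one = cosine_distance(a, b)

# Assumed alternative encoding: 0 = never ordered, 1 = ordered.
a0 = np.zeros(n_users)
a0[0] = 1.0
b0 = np.zeros(n_users)
b0[1] = 1.0
dist_zero = cosine_distance(a0, b0)

print(dist_minus_one, dist_zero)
```

With the -1 fill the two unrelated products look almost identical, while with the 0 fill they are maximally distant. Note that a 0 fill would collide with the notebook's use of 0 for "ordered once", so adopting it would require re-encoding purchases as well.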


In [19]:
order_products_matrix=csr_matrix(order_products.values)
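
Converting the -1-filled pivot table to `csr_matrix` does not save memory, because every -1 is stored as an explicit entry. As a hedged alternative, the sparse matrix can be built directly from the long-format rows so that the dense pivot is never materialised. This sketch assumes a different encoding from the notebook (1 = purchased, implicit 0 = never purchased) and uses invented toy rows.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy long-format purchases (values invented for illustration).
rows = pd.DataFrame({
    'product_name': ['Limes', 'Arugula', 'Limes', 'Tomato'],
    'user_id': [10, 10, 20, 30],
    'reordered': [0, 1, 1, 0],
})

# Categorical codes give each product and user a compact integer index.
product_cat = rows['product_name'].astype('category')
user_cat = rows['user_id'].astype('category')

# Store 1.0 for every purchase; absent (product, user) pairs stay implicit zeros.
# Duplicate pairs, if any, would be summed by the constructor.
values = [1.0] * len(rows)
matrix = csr_matrix(
    (values, (product_cat.cat.codes.values, user_cat.cat.codes.values)),
    shape=(len(product_cat.cat.categories), len(user_cat.cat.categories)),
)

# Row i of the matrix corresponds to product_names[i].
product_names = list(product_cat.cat.categories)
print(matrix.toarray())
print(product_names)
```

Only the purchases that actually occurred are stored, so memory grows with the number of rows rather than with products times users.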

In [20]:
model_knn=NearestNeighbors(metric='cosine',algorithm='brute')

In [21]:
model_knn.fit(order_products_matrix)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='cosine',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

In [22]:
query_index=np.random.choice(order_products.shape[0])

In [23]:
distances, indices = model_knn.kneighbors(order_products.iloc[query_index,:].values.reshape(1,-1), n_neighbors=6)


In [24]:
# The first neighbour is the query product itself; the rest are the recommendations
for i, idx in enumerate(indices.flatten()):
    if i == 0:
        print('Recommendations for {0} : \n'.format(order_products.index[idx]))
    else:
        print('{0} : {1}'.format(i,order_products.index[idx]))

Recommendations for Original Chai Tea Latte : 

1 : Magic Tape Clear
2 : Macaroni & Soy Cheeze Pasta
3 : Mach3 Razor Replacement Cartridges
4 : Madras Sambar Toor Dal with Vegetables
5 : Macaroni & Cheese Dinner Nickelodeon Sponge Bob Square Pants Shapes
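
To recommend for the item a user actually added to the cart, rather than for a randomly chosen row, the query can be wrapped in a small helper. This is a minimal sketch under stated assumptions: `recommend_similar` is a hypothetical name, and the toy product-by-user matrix uses a 0/1 encoding invented for illustration.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend_similar(product_name, pivot, model, k=5):
    """Return up to k products nearest to product_name, excluding itself."""
    query = pivot.loc[[product_name]].values       # shape (1, n_users)
    n = min(k + 1, len(pivot))                     # +1 because the query matches itself
    _, indices = model.kneighbors(query, n_neighbors=n)
    names = [pivot.index[i] for i in indices.flatten()]
    return [name for name in names if name != product_name][:k]

# Toy product-by-user matrix (rows: products, columns: user ids).
toy = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [0, 1, 1, 1]],
    index=['Chai Latte', 'Green Tea', 'Razor', 'Shaving Cream'],
    columns=[7, 30, 34, 49],
    dtype=float,
)
model = NearestNeighbors(metric='cosine', algorithm='brute').fit(toy.values)
print(recommend_similar('Chai Latte', toy, model, k=2))
```

Filtering by name instead of dropping the first result keeps the helper correct even when another product has a vector identical to the query.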


## Popularity Based Recommendations based on Department selected by the User

In [25]:
my_products=pd.read_csv('products.csv')

In [26]:
my_departments=pd.read_csv('departments.csv')

In [27]:
my_departments.head()

|   | department_id | department |
|---|---|---|
| 0 | 1 | frozen |
| 1 | 2 | other |
| 2 | 3 | bakery |
| 3 | 4 | produce |
| 4 | 5 | alcohol |


In [28]:
my_products=my_products.merge(my_departments,on='department_id',how='left')

In [29]:
my_products.drop('aisle_id',axis=1,inplace=True)

In [30]:
pop_products=pop_products.merge(my_products,on='product_name',how='left')

In [31]:
pop_products.to_csv('products_data.csv')

In [32]:
def Popular_in_department(dept_name):
    # Keep only the purchases in the selected department
    dept_selected=pop_products[pop_products['department']==dept_name]
    # Count how many times each product was purchased
    dept_selected=pd.DataFrame(dept_selected.groupby('product_name')['reordered'].count())
    dept_selected=dept_selected.sort_values('reordered',ascending=False)
    dept_selected=dept_selected.rename(columns={'reordered':'count'})
    print('Items recommended based on the popularity in the {0} department are :\n'.format(dept_name))
    print(list(dept_selected.index[0:5]))
    return dept_selected

#### Demonstration: if the user selects the department 'alcohol', the top 5 most popular products are recommended:

In [33]:
Popular_items=pd.DataFrame(Popular_in_department('alcohol'))

Items recommended based on the popularity in the alcohol department are :

['Sauvignon Blanc', 'Pinot Noir', 'Beer', 'Vodka', 'Chardonnay']
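
The same popularity ranking can be expressed more compactly with `value_counts`, which counts and sorts in one step. The sketch below is an illustrative variant, not the notebook's function: `top_in_department` is a hypothetical name, it takes the purchases frame as an argument instead of reading a global, and the toy data is invented.

```python
import pandas as pd

def top_in_department(purchases, dept_name, n=5):
    """Return the n most frequently purchased product names in a department."""
    in_dept = purchases[purchases['department'] == dept_name]
    return in_dept['product_name'].value_counts().head(n).index.tolist()

# Toy purchases (values invented for illustration).
toy = pd.DataFrame({
    'product_name': ['Beer', 'Beer', 'Vodka', 'Bagel', 'Beer', 'Vodka', 'Gin'],
    'department': ['alcohol', 'alcohol', 'alcohol', 'bakery',
                   'alcohol', 'alcohol', 'alcohol'],
})
print(top_in_department(toy, 'alcohol', n=2))
```

Returning the list instead of printing it makes the result easier to reuse, for example as a fallback when a user has no purchase history.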


## Recommendations Based on the User's Purchase History, i.e. the User's Reorders

In [34]:
reordered_products=pop_products[pop_products['reordered']==1]

In [35]:
def Reordered_in_department(current_user,dept_name):
    # Apply both filters together: the selected department and the current user
    dept_selected=reordered_products[(reordered_products['department']==dept_name) &
                                     (reordered_products['user_id']==current_user)]
    # Count how many times the user reordered each product
    dept_selected=pd.DataFrame(dept_selected.groupby(['product_name'])['reordered'].count())
    dept_selected=dept_selected.sort_values('reordered',ascending=False)
    dept_selected=dept_selected.rename(columns={'reordered':'count'})
    print('Items recommended based on your previous orders in the {0} department are : \n'.format(dept_name))
    print(list(dept_selected.index[0:5]))
    return dept_selected

#### Demonstration: if the customer with user_id 61770 selects the department 'alcohol', the following recommendations are given based on their reorders:

In [36]:
Reordered_items=pd.DataFrame(Reordered_in_department(61770,'alcohol'))

Items recommended based on your previous orders in the alcohol department are : 

['Beer', 'Extra IPA Beer', 'Hell or High Watermelon Wheat, Cans', 'Ksa Ko?Lsch Style Ale', 'Mighty Dry Hard Cider']
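
A personalized recommendation of this kind depends on three conditions holding at once: the row is a reorder, it belongs to the current user, and it falls in the selected department. The sketch below combines them in a single boolean mask on toy data so the effect of each condition is easy to verify. `top_reorders` is a hypothetical helper and the rows are invented for illustration.

```python
import pandas as pd

def top_reorders(purchases, user_id, dept_name, n=5):
    """Count a user's reordered products within one department."""
    mask = ((purchases['reordered'] == 1)
            & (purchases['user_id'] == user_id)
            & (purchases['department'] == dept_name))
    return purchases.loc[mask, 'product_name'].value_counts().head(n)

# Toy purchases (values invented for illustration).
toy = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 2],
    'product_name': ['Beer', 'Beer', 'Bagel', 'Cider', 'Beer'],
    'department': ['alcohol', 'alcohol', 'bakery', 'alcohol', 'alcohol'],
    'reordered': [1, 1, 1, 0, 1],
})
result = top_reorders(toy, 1, 'alcohol')
print(result)
```

Here the bagel is excluded by the department condition, the cider by the reorder condition, and the last row by the user condition, leaving only the two beer reorders of user 1.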
