# Recommendation System for Purchase Data
#### We will build collaborative filtering models for recommending products to customers using purchase data.
Steps in constructing a recommendation system with Python and machine learning include:
- Transforming and normalizing data
- Training models
- Evaluating model performance
- Selecting the optimal model
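
As a rough illustration of the "transforming and normalizing" step, the sketch below pivots a toy purchase table into a customer-by-product matrix and min-max scales each product column. The toy values and the choice of min-max scaling are assumptions made for illustration; the notebook does not specify its normalization at this point.

```python
import pandas as pd

# Toy user-item purchase counts (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    "customerId": [0, 0, 1, 1, 2],
    "productId": [2, 20, 2, 23, 2],
    "purchase_count": [2, 1, 4, 1, 6],
})

# Pivot to a customer x product matrix; NaN marks "never purchased"
matrix = toy.pivot(index="customerId", columns="productId", values="purchase_count")

# Min-max scale each product column to the range [0, 1].
# Columns with a single buyer have max == min, so they come out as NaN.
normalized = (matrix - matrix.min()) / (matrix.max() - matrix.min())

print(normalized)
```

Scaling per product keeps frequently bought items from dominating the target simply because their raw counts are larger.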

**1. Import modules:**
   - pandas and numpy for data manipulation
   - turicreate for performing model selection and evaluation
   - sklearn for splitting the data into training and test sets

In [3]:
import pandas as pd
import numpy as np

import time
#import turicreate as tc #Turi Create simplifies the development of custom machine learning models.

from sklearn.model_selection import train_test_split

import sys
sys.path.append("..")
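
To show what `train_test_split` does with a user-item table, here is a minimal sketch on a toy frame. The toy rows, the 80/20 split, and `random_state=42` are assumptions chosen for illustration, not settings taken from this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy user-item table (hypothetical values)
toy = pd.DataFrame({
    "customerId": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "productId": [2, 20, 2, 23, 68, 111, 29, 86, 107, 152],
    "purchase_count": [2, 1, 1, 1, 2, 1, 1, 1, 1, 1],
})

# Hold out 20% of the rows for testing; random_state makes the split repeatable
train, test = train_test_split(toy, test_size=0.2, random_state=42)

print(train.shape, test.shape)
```

The split is made over rows, so a customer can appear in both sets with different products.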

**2. Load data:**

In [4]:
customers = pd.read_csv('./data/recommend_1.csv')
transactions = pd.read_csv('./data/trx_data.csv')

In [6]:
print(customers.shape)
customers.head()

(1000, 1)


Unnamed: 0,customerId
0,1553
1,20400
2,19750
3,6334
4,27773


In [10]:
print(transactions.shape)
transactions.head()

(62483, 2)


Unnamed: 0,customerId,products
0,0,20
1,1,2|2|23|68|68|111|29|86|107|152
2,2,111|107|29|11|11|11|33|23
3,3,164|227
4,5,2|2


**3. Data preparation:**
<br>Our goal here is to split each `|`-delimited list in the products column into one row per product, then count how many times each customer bought each product.

**3.1. Create data with user, item, and target field:**
   - This table will be the input to our models later
   - In this case, the user is customerId, the item is productId, and the target is purchase_count
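
The target table can be sketched on a toy frame as follows. This version uses `str.split` with `DataFrame.explode`, which is one possible way to do the reshaping and not necessarily the approach this notebook takes; the two toy rows are shortened, illustrative values.

```python
import pandas as pd

# Toy transactions: one '|'-delimited products string per row (illustrative values)
toy = pd.DataFrame({
    "customerId": [1, 5],
    "products": ["2|2|23", "2|2"],
})

# Split each string into a list, then give every product its own row
long = toy.assign(productId=toy["products"].str.split("|")).explode("productId")
long["productId"] = long["productId"].astype("int64")

# Count how many times each customer bought each product
counts = (
    long.groupby(["customerId", "productId"])
        .size()
        .reset_index(name="purchase_count")
)

print(counts)
```

Each row of `counts` is one (user, item, target) triple, which is the shape a collaborative filtering model expects as input.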

In [None]:
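# Reshape to long format; 'products' still holds unsplit strings at this point
data = pd.melt(transactions, id_vars=['customerId'], value_name='products')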

In [9]:
transactions_melted = pd.melt(transactions.set_index('customerId'))
transactions_melted

Unnamed: 0,variable,value
0,products,20
1,products,2|2|23|68|68|111|29|86|107|152
2,products,111|107|29|11|11|11|33|23
3,products,164|227
4,products,2|2
...,...,...
62478,products,103|103|8|48|126
62479,products,124|37|37|78|124|124|37|8|35|8
62480,products,24
62481,products,167


In [41]:
# Inspect a row whose products value is still an unsplit string
transactions.loc[61071]

customerId                      0
products      198|260|157|136|136
Name: 61071, dtype: object

In [35]:
transactions[transactions['products'] == '198|260|157|136|136']

Unnamed: 0,customerId,products
61071,0,198|260|157|136|136


In [26]:
# Split each '|'-delimited products string into a list before expanding it into columns.
# Without this step, apply(pd.Series) leaves strings such as '198|260|157|136|136' intact
# and the int64 cast at the end raises a ValueError.
data = pd.melt(transactions.set_index('customerId')['products'].str.split('|').apply(pd.Series).reset_index(),
             id_vars=['customerId'],
             value_name='products') \
    .dropna().drop(['variable'], axis=1) \
    .groupby(['customerId', 'products']) \
    .agg({'products': 'count'}) \
    .rename(columns={'products': 'purchase_count'}) \
    .reset_index() \
    .rename(columns={'products': 'productId'})

# Product IDs are still strings after the split; cast them to integers
data['productId'] = data['productId'].astype(np.int64)

In [None]:
print(data.shape)
data.head()