<a href="https://colab.research.google.com/github/HassanSherwani/Product_Purchase_Frequency/blob/master/Product_Freq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Problem Statement:

Determine how often a given product has been sold in the past days.

# 1)- Importing key modules

In [0]:
# Let's be rebels and ignore warnings for now
import warnings
warnings.filterwarnings('ignore')
warnings.filterwarnings("ignore",category=DeprecationWarning)

In [0]:
import pandas as pd
import numpy as np
import time
from sklearn.model_selection import train_test_split
import sys

In [3]:
! pip install turicreate

Collecting turicreate
  Downloading https://files.pythonhosted.org/packages/4f/ef/1847a704548ad4cbcabe09b3882181c190f5b696da8b2d082521c33ec187/turicreate-5.4-cp36-cp36m-manylinux1_x86_64.whl (87.4MB)
    100% |████████████████████████████████| 87.4MB 140kB/s
Collecting coremltools==2.1.0 (from turicreate)
  Downloading https://files.pythonhosted.org/packages/b9/9d/7ec5a2480c6afce4fcb99de1650b7abfd1457b2ef1de5ce39bf7bee8a8ae/coremltools-2.1.0-cp36-none-manylinux1_x86_64.whl (2.7MB)
    100% |████████████████████████████████| 2.7MB 12.1MB/s
Collecting mxnet<1.2.0,>=1.1.0 (from turicreate)
  Downloading https://files.pythonhosted.org/packages/96/98/c9877e100c3d1ac92263bfaba7bb8a49294e099046592040a2ff8620ac61/mxnet-1.1.0.post0-py2.py3-none-manylinux1_x86_64.whl (23.8MB)
    100% |████████████████████████████████| 23.8MB 2.2MB/s
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet<1.2.0,>=1.1.0->turicreate)
  Downloading https://files.pythonhosted.org/packages/5

In [0]:
import turicreate as tc

# 2)-Loading Data

In [0]:
url = 'https://raw.githubusercontent.com/HassanSherwani/Product_Purchase_Frequency/master/20190207_transactions%20.json'

In [0]:
transactions = pd.read_json(url, lines= True)

# 3)-Exploring dataset

In [7]:
transactions.head()

,id,products
0,0,"[185, 30, 77, 188, 78, 125, 45, 155, 241, 229,..."
1,1,"[119, 148, 108, 34, 157, 82, 113, 45, 165]"
2,2,"[173, 103, 229, 240]"
3,3,[91]
4,4,"[175, 192, 54, 172]"


In [8]:
transactions.shape

(2500, 2)

In [9]:
print(transactions['products'][1415])

[250, 236, 242, 229, 92, 2, 71, 172, 109, 247, 171, 209, 90, 139, 188, 191, 145, 214, 216, 237]


In [10]:
transactions.info() # checking missing values

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2500 entries, 0 to 2499
Data columns (total 2 columns):
id          2500 non-null int64
products    2500 non-null object
dtypes: int64(1), object(1)
memory usage: 138.6+ KB


In [11]:
transactions.describe()

,id
count,2500.0
mean,1249.5
std,721.83216
min,0.0
25%,624.75
50%,1249.5
75%,1874.25
max,2499.0


# 4)- Adding Features:

### 4.1)-Create a separate dataframe for recommending users

In [0]:
customers=transactions['id']

In [13]:
customers.head()

0    0
1    1
2    2
3    3
4    4
Name: id, dtype: int64

In [0]:
import random
random.shuffle(customers) 

Shuffling gives us the customer ids in random order. This helps us avoid sampling bias in our modeling.
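`random.shuffle` reorders the Series in place, and because `customers` is taken directly from `transactions['id']`, the ids stored in `transactions` can be reordered too (the id 148 shown for the first transaction in section 4.2 is consistent with this). A minimal sketch of a non-mutating alternative, assuming a small stand-in frame named `transactions_demo` and an arbitrary seed (both are illustrative, not part of the original notebook):

```python
import pandas as pd

# Hypothetical stand-in for the real transactions frame.
transactions_demo = pd.DataFrame(
    {"id": range(10), "products": [[i] for i in range(10)]}
)

# sample(frac=1) returns a shuffled copy; the source column is left untouched.
shuffled_ids = (
    transactions_demo["id"]
    .sample(frac=1, random_state=42)  # seed chosen only for reproducibility
    .reset_index(drop=True)
)

# Take the first few shuffled ids, as the notebook does with customers[:1000].
first_three = shuffled_ids[:3]
```

With this approach the shuffled ids can be sliced the same way while `transactions_demo['id']` keeps its original order.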

In [0]:
cust_2_rec=customers[:1000]

In [16]:
cust_2_rec.head()

0     148
1    1674
2     337
3    1390
4     544
Name: id, dtype: int64

### 4.2)- Break down each list of items in the products column into rows and count the number of products bought by a user

In [17]:


pd.melt(transactions.head(2).set_index('id')['products'].apply(pd.Series).reset_index(), 
             id_vars=['id'],
             value_name='products') \
    .dropna().drop(['variable'], axis=1) \
    .groupby(['id', 'products']) \
    .agg({'products': 'count'}) \
    .rename(columns={'products': 'purchase_count'}) \
    .reset_index() \
    .rename(columns={'products': 'productId'})

,id,productId,purchase_count
0,148,30.0,1
1,148,45.0,1
2,148,77.0,1
3,148,78.0,1
4,148,89.0,2
5,148,125.0,1
6,148,133.0,1
7,148,155.0,1
8,148,161.0,1
9,148,185.0,1
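The same list-to-rows reshaping can also be sketched with `DataFrame.explode`, which is available in pandas 0.25 and later (the pandas version used in this notebook may be older, so treat this as an alternative rather than a drop-in replacement). The toy frame below is made up for illustration:

```python
import pandas as pd

# Toy transactions: user 0 bought product 30 twice.
toy = pd.DataFrame({"id": [0, 1], "products": [[185, 30, 30], [119]]})

counts = (
    toy.explode("products")            # one row per (id, product) occurrence
       .dropna(subset=["products"])    # empty lists explode to NaN
       .groupby(["id", "products"])
       .size()                         # occurrences per (id, product) pair
       .reset_index(name="purchase_count")
       .rename(columns={"products": "productId"})
)
# explode leaves an object column, so cast the ids back to integers.
counts["productId"] = counts["productId"].astype("int64")
```

This avoids building the wide intermediate frame that `apply(pd.Series)` creates, and it keeps product ids as integers instead of floats.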


# 5)- Creating features for user_id, product_id and purchase count

### 5-a)- Purchase count

In [18]:
s=time.time()

data = pd.melt(transactions.set_index('id')['products'].apply(pd.Series).reset_index(), 
             id_vars=['id'],
             value_name='products') \
    .dropna().drop(['variable'], axis=1) \
    .groupby(['id', 'products']) \
    .agg({'products': 'count'}) \
    .rename(columns={'products': 'purchase_count'}) \
    .reset_index() \
    .rename(columns={'products': 'productId'})
data['productId'] = data['productId'].astype(np.int64)

print("Execution time:", round((time.time()-s)/60,2), "minutes")

Execution time: 0.01 minutes


In [19]:
data.shape

(24811, 3)

In [20]:
data.head()

,id,productId,purchase_count
0,0,3,1
1,0,12,1
2,0,48,1
3,0,63,1
4,0,83,1


### 5-b)-Dummy as target

The dummy marks whether a customer bought a given item or not.
If a customer bought an item, purchase_dummy is set to 1.

In [0]:
 
def create_data_dummy(data):
    data_dummy = data.copy()
    data_dummy['purchase_dummy'] = 1
    return data_dummy

In [0]:
data_dummy = create_data_dummy(data)

In [23]:
data_dummy.head()

,id,productId,purchase_count,purchase_dummy
0,0,3,1,1
1,0,12,1,1
2,0,48,1,1
3,0,63,1,1
4,0,83,1,1


There is a reason to consider the dummy instead of normalizing per user.
Normalizing the purchase count by each user would not work well, because customers buy at different frequencies and do not share the same taste.

### 5-c)-Normalize item

We normalize the purchase frequency of each item across users by first creating a user-item matrix.

In [0]:
df_matrix = pd.pivot_table(data, values='purchase_count', index='id', columns='productId')

In [25]:
df_matrix.head()

id \ productId,1,2,3,4,5,6,7,8,9,10,...,241,242,243,244,245,246,247,248,249,250
0,,,1.0,,,,,,,,...,,,1.0,1.0,,,,,,
1,,,,,1.0,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,1.0,,1.0,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,1.0,,,,,,


The NaN tells us that the item represented by the column was not purchased in that specific transaction.

In [26]:
df_matrix.shape

(2378, 250)

In [27]:
df_matrix_norm = (df_matrix-df_matrix.min())/(df_matrix.max()-df_matrix.min())

df_matrix_norm.head()

id \ productId,1,2,3,4,5,6,7,8,9,10,...,241,242,243,244,245,246,247,248,249,250
0,,,0.0,,,,,,,,...,,,0.0,0.0,,,,,,
1,,,,,0.0,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,0.0,,0.0,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,0.0,,,,,,


In [28]:
print(df_matrix_norm.shape)

(2378, 250)


### 5-d)- create a table for input to the modeling

In [0]:
d = df_matrix_norm.reset_index()
d.index.names = ['scaled_purchase_freq']
data_norm = pd.melt(d, id_vars=['id'], value_name='scaled_purchase_freq').dropna()

In [30]:
data_norm.head()

,id,productId,scaled_purchase_freq
60,63,1,0.0
65,68,1,0.0
70,73,1,0.0
73,76,1,0.0
97,102,1,0.0


In [31]:
data_norm.shape

(22530, 3)

### 5-e)- A function for normalizing data

In [0]:
def normalize_data(data):
    df_matrix = pd.pivot_table(data, values='purchase_count', index='id', columns='productId')
    df_matrix_norm = (df_matrix-df_matrix.min())/(df_matrix.max()-df_matrix.min())
    d = df_matrix_norm.reset_index()
    d.index.names = ['scaled_purchase_freq']
    return pd.melt(d, id_vars=['id'], value_name='scaled_purchase_freq').dropna()


We have normalized the items according to their purchase history on a 0–1 scale, with 1 being the highest purchase count observed for an item and 0 being the lowest purchase count observed for that item. Items a customer never bought stay NaN and are dropped.
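To make the per-item min-max scaling concrete, here is a small sketch on made-up purchase counts (the toy values are assumptions for illustration). It applies the same formula as `normalize_data` and shows one edge case: an item whose observed counts are all equal has a zero range, so its scaled values come out as NaN.

```python
import pandas as pd

# Toy purchase counts: item 1 varies across users, item 2 is always 3.
toy = pd.DataFrame({
    "id": [0, 0, 1, 2],
    "productId": [1, 2, 1, 2],
    "purchase_count": [1, 3, 2, 3],
})

# Rows are users, columns are items; missing pairs become NaN.
matrix = pd.pivot_table(toy, values="purchase_count",
                        index="id", columns="productId")

# Column-wise min-max scaling, identical to the formula used above.
scaled = (matrix - matrix.min()) / (matrix.max() - matrix.min())
```

For item 1 the user with the lowest count maps to 0 and the user with the highest count maps to 1. Item 2 has max equal to min, so the division is 0/0 and every entry is NaN, which the later `dropna()` would remove.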

# 6)-Split train and test set

We now have three dataframes: purchase counts (data), purchase dummy (data_dummy), and scaled purchase counts (data_norm).

In [33]:
train, test = train_test_split(data, test_size = .2)
print(train.shape, test.shape)

(19848, 3) (4963, 3)


In [0]:
#convert dataframe to SFrame
train_data = tc.SFrame(train)
test_data = tc.SFrame(test)

An SFrame is a tabular, column-mutable dataframe object that can scale to big data. <br>
https://turi.com/products/create/docs/generated/graphlab.SFrame.html

In [35]:
train_data

id,productId,purchase_count
1200,199,1
1561,131,1
2323,36,1
2482,76,1
602,27,1
1187,158,1
922,110,1
1738,226,1
2369,226,1
106,132,1


In [36]:
test_data

id,productId,purchase_count
992,180,1
1095,122,1
1982,17,1
169,21,1
1933,92,1
755,77,1
999,165,1
687,150,1
888,34,1
1890,237,1


### 6.2)- Define a split_data function 

In [0]:
def split_data(data):

    train, test = train_test_split(data, test_size = .2)
    train_data = tc.SFrame(train)
    test_data = tc.SFrame(test)
    return train_data, test_data

### 6.3)-Apply for both dummy table and scaled/normalized purchase table

In [0]:
train_data_dummy, test_data_dummy = split_data(data_dummy)
train_data_norm, test_data_norm = split_data(data_norm)

# 7)-Building Model

### 7.1)- Parameters to define field names for purchase count as target feature

In [0]:
user_id = 'id'
item_id = 'productId'
users_to_recommend=list(cust_2_rec)
target = 'purchase_count'
n_rec = 10 # number of items to recommend
n_display = 30 # 1st 30 rows to display

In [40]:
popularity_model = tc.popularity_recommender.create(train_data, 
                                                    user_id=user_id, 
                                                    item_id=item_id, 
                                                    target=target)

In [41]:
popularity_recomm = popularity_model.recommend(users=users_to_recommend, k=n_rec)
popularity_recomm.print_rows(n_display)

+------+-----------+--------------------+------+
|  id  | productId |       score        | rank |
+------+-----------+--------------------+------+
| 148  |    207    | 1.0987654320987654 |  1   |
| 148  |     96    | 1.0933333333333333 |  2   |
| 148  |    201    | 1.0759493670886076 |  3   |
| 148  |    153    | 1.0740740740740742 |  4   |
| 148  |    152    | 1.0729166666666667 |  5   |
| 148  |     85    | 1.0721649484536082 |  6   |
| 148  |    150    | 1.0674157303370786 |  7   |
| 148  |    122    | 1.0666666666666667 |  8   |
| 148  |     26    | 1.0617283950617284 |  9   |
| 148  |    166    | 1.0588235294117647 |  10  |
| 1674 |    207    | 1.0987654320987654 |  1   |
| 1674 |     96    | 1.0933333333333333 |  2   |
| 1674 |    201    | 1.0759493670886076 |  3   |
| 1674 |    153    | 1.0740740740740742 |  4   |
| 1674 |    152    | 1.0729166666666667 |  5   |
| 1674 |     85    | 1.0721649484536082 |  6   |
| 1674 |    150    | 1.0674157303370786 |  7   |
| 1674 |    122    |

### Note 

With this model, we recommend items by their popularity score. As the output above shows, we print the first 30 rows of the recommendations, which cover 1000 users with 10 recommendations each.<br> These 30 rows correspond to 3 users and their recommended items, along with each item's score and its rank (rank 1 has the highest score).
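As a rough illustration of what a popularity recommender does, the sketch below scores each item by its mean target value and recommends the top-scoring items a user has not bought yet. It is a plain pandas approximation on toy data, not Turi Create's implementation; the function name `popularity_recommend` and the assumption that already-purchased items are excluded are illustrative.

```python
import pandas as pd

def popularity_recommend(train, users, k, target):
    """Rank items by mean target value and skip items the user already has."""
    scores = (train.groupby("productId")[target]
                   .mean()
                   .sort_values(ascending=False))
    rows = []
    for user in users:
        # Items this user already bought in the training data.
        seen = set(train.loc[train["id"] == user, "productId"])
        ranked = [item for item in scores.index if item not in seen][:k]
        for rank, item in enumerate(ranked, start=1):
            rows.append({"id": user, "productId": item,
                         "score": scores.loc[item], "rank": rank})
    return pd.DataFrame(rows, columns=["id", "productId", "score", "rank"])

# Toy training data (made up): item 3 has the highest mean purchase count.
toy_train = pd.DataFrame({
    "id": [0, 1, 0, 1],
    "productId": [1, 1, 2, 3],
    "purchase_count": [2, 1, 1, 3],
})
recs = popularity_recommend(toy_train, users=[0, 2], k=2,
                            target="purchase_count")
```

User 2 has no purchases in the toy data, so that user simply receives the two globally most popular items.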

## Checking most frequent items(as per assignment) 

In [42]:
train.groupby(by=item_id)['purchase_count'].mean().sort_values(ascending=False).head(20)

productId
207    1.098765
96     1.093333
201    1.075949
153    1.074074
152    1.072917
85     1.072165
150    1.067416
122    1.066667
26     1.061728
166    1.058824
112    1.058140
71     1.056338
7      1.055556
117    1.055556
63     1.054945
50     1.054945
35     1.054054
88     1.054054
120    1.051948
125    1.051948
Name: purchase_count, dtype: float64

**Products 207, 96, 201, 153, and 152 have the highest mean purchase count across customers in the training set.**
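Note that a mean purchase count per buying customer is not the same thing as total units sold or the number of customers who bought an item. The sketch below, on made-up numbers, computes all three per item so the rankings can be compared; which one best answers "how often has a product been sold" is a judgment call.

```python
import pandas as pd

# Toy data: item 96 is bought by three customers, item 207 by two.
toy = pd.DataFrame({
    "id": [0, 1, 2, 0, 1],
    "productId": [96, 96, 96, 207, 207],
    "purchase_count": [1, 1, 1, 2, 1],
})

# mean: average count per buying customer
# sum: total units sold
# count: number of (customer, item) rows, i.e. distinct buyers here
summary = toy.groupby("productId")["purchase_count"].agg(["mean", "sum", "count"])
```

In this toy case item 207 wins on the mean, both items tie on total units, and item 96 has more buyers.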

### 7.2)- Purchase dummy as target feature

In [0]:
user_id = 'id'
item_id = 'productId'
users_to_recommend=list(cust_2_rec)
target = 'purchase_dummy'
n_rec = 10 # number of items to recommend
n_display = 30 # 1st 30 rows to display

In [44]:
popularity_model_dummy = tc.popularity_recommender.create(train_data_dummy, 
                                                    user_id=user_id, 
                                                    item_id=item_id, 
                                                    target=target)

In [45]:
popularity_recomm_4_dummy = popularity_model_dummy.recommend(users=users_to_recommend, k=n_rec)
popularity_recomm_4_dummy.print_rows(n_display)

+------+-----------+-------+------+
|  id  | productId | score | rank |
+------+-----------+-------+------+
| 148  |    158    |  1.0  |  1   |
| 148  |    111    |  1.0  |  2   |
| 148  |     71    |  1.0  |  3   |
| 148  |    182    |  1.0  |  4   |
| 148  |     87    |  1.0  |  5   |
| 148  |    245    |  1.0  |  6   |
| 148  |    131    |  1.0  |  7   |
| 148  |     7     |  1.0  |  8   |
| 148  |    163    |  1.0  |  9   |
| 148  |     86    |  1.0  |  10  |
| 1674 |    158    |  1.0  |  1   |
| 1674 |    111    |  1.0  |  2   |
| 1674 |     71    |  1.0  |  3   |
| 1674 |    182    |  1.0  |  4   |
| 1674 |     87    |  1.0  |  5   |
| 1674 |    245    |  1.0  |  6   |
| 1674 |    131    |  1.0  |  7   |
| 1674 |     7     |  1.0  |  8   |
| 1674 |    163    |  1.0  |  9   |
| 1674 |     86    |  1.0  |  10  |
| 337  |    111    |  1.0  |  1   |
| 337  |     71    |  1.0  |  2   |
| 337  |    182    |  1.0  |  3   |
| 337  |     87    |  1.0  |  4   |
| 337  |    245    |  1.0  |

### 7.3)- Applying 'scaled_purchase_freq' as target feature on model

In [0]:
user_id = 'id'
item_id = 'productId'
users_to_recommend=list(cust_2_rec)
target = 'scaled_purchase_freq'
n_rec = 10 # number of items to recommend
n_display = 30 # 1st 30 rows to display

In [47]:
popularity_model_scaled = tc.popularity_recommender.create(train_data_norm, 
                                                    user_id=user_id, 
                                                    item_id=item_id, 
                                                    target=target)

In [48]:
popularity_recomm_4_scaled = popularity_model_scaled.recommend(users=users_to_recommend, k=n_rec)
popularity_recomm_4_scaled.print_rows(n_display)

+------+-----------+----------------------+------+
|  id  | productId |        score         | rank |
+------+-----------+----------------------+------+
| 148  |     96    | 0.09523809523809523  |  1   |
| 148  |    201    | 0.08450704225352113  |  2   |
| 148  |    207    | 0.07894736842105263  |  3   |
| 148  |    153    | 0.06578947368421052  |  4   |
| 148  |     49    | 0.06451612903225806  |  5   |
| 148  |    120    | 0.06329113924050633  |  6   |
| 148  |    129    |        0.0625        |  7   |
| 148  |     26    | 0.06097560975609756  |  8   |
| 148  |     35    | 0.057971014492753624 |  9   |
| 148  |    166    | 0.056179775280898875 |  10  |
| 1674 |     96    | 0.09523809523809523  |  1   |
| 1674 |    201    | 0.08450704225352113  |  2   |
| 1674 |    207    | 0.07894736842105263  |  3   |
| 1674 |    153    | 0.06578947368421052  |  4   |
| 1674 |     49    | 0.06451612903225806  |  5   |
| 1674 |    120    | 0.06329113924050633  |  6   |
| 1674 |    129    |        0.0

Great! All three models are worked out, and we have the purchase frequency of each item customer-wise. <br>
Which one should we trust, and which should we discard?

# 8)- Evaluate models

### 8.1)- Validation for Popularity Model on Purchase Counts

In [0]:
models_counts = [popularity_model]

In [0]:
model_names=['Popularity Model on Purchase Counts']

In [51]:
eval_counts = tc.recommender.util.compare_models(test_data, models_counts, model_names)

PROGRESS: Evaluate model Popularity Model on Purchase Counts



Precision and recall summary statistics by cutoff
+--------+----------------------+----------------------+
| cutoff |    mean_precision    |     mean_recall      |
+--------+----------------------+----------------------+
|   1    | 0.012545739675901724 | 0.004783063251437531 |
|   2    | 0.012284370099320435 | 0.010877330545391188 |
|   3    | 0.010106290294476395 | 0.012462972643317648 |
|   4    | 0.009670674333507599 | 0.016313817738281965 |
|   5    | 0.01003659174072137  | 0.01935441714584421  |
|   6    | 0.010454783063251429 | 0.024546959400592448 |
|   7    | 0.011052199238294388 | 0.030150225275682643 |
|   8    | 0.011173549398849984 | 0.03432342618176377  |
|   9    | 0.010977522216414006 | 0.03813506584024101  |
|   10   | 0.011134343962362795 | 0.04263497871704885  |
+--------+----------------------+----------------------+
[10 rows x 3 columns]


Overall RMSE: 0.15738715427700764

Per User RMSE (best)
+-----+------+-------+
|  id | rmse | count |
+-----+------+-------+
| 

### 8.2)- Validation of Popularity Model on Purchase Counts (dummy)

In [52]:
models_counts = [popularity_model_dummy]
model_names=['Popularity Model on Dummy Purchase Counts']
eval_counts_dummy = tc.recommender.util.compare_models(test_data_dummy, models_counts, model_names)

PROGRESS: Evaluate model Popularity Model on Dummy Purchase Counts



Precision and recall summary statistics by cutoff
+--------+----------------------+-----------------------+
| cutoff |    mean_precision    |      mean_recall      |
+--------+----------------------+-----------------------+
|   1    | 0.01781037192247252  | 0.0070892264711018066 |
|   2    | 0.014667365112624425 |  0.010990130293430453 |
|   3    | 0.013095861707700376 |  0.014443072497027446 |
|   4    | 0.012964903090623361 |  0.01794029118544573  |
|   5    | 0.012467260345730775 |  0.02134147355467424  |
|   6    | 0.01248472149467434  |  0.025663731530677567 |
|   7    | 0.012347526752974642 |  0.030100110587276633 |
|   8    | 0.01231011000523835  |  0.03493684884465402  |
|   9    | 0.01193178511146032  |  0.037491165490117856 |
|   10   | 0.011681508643268708 |  0.040854515369968473 |
+--------+----------------------+-----------------------+
[10 rows x 3 columns]


Overall RMSE: 0.0

Per User RMSE (best)
+-----+------+-------+
|  id | rmse | count |
+-----+------+-------+
| 87

### 8.3)- Validation for Popularity Model on Scaled Purchase Counts

In [53]:
models_counts = [popularity_model_scaled]
model_names=['Popularity Model on Scaled Purchase Counts']
eval_counts_norm = tc.recommender.util.compare_models(test_data_norm, models_counts, model_names)

PROGRESS: Evaluate model Popularity Model on Scaled Purchase Counts



Precision and recall summary statistics by cutoff
+--------+----------------------+-----------------------+
| cutoff |    mean_precision    |      mean_recall      |
+--------+----------------------+-----------------------+
|   1    | 0.008017103153393896 | 0.0028683413504364875 |
|   2    |  0.0082843399251737  | 0.0058703010867628596 |
|   3    | 0.010689470871191866 |  0.012328523071441307 |
|   4    | 0.011357562800641392 |  0.018421521468020695 |
|   5    | 0.011223944414751474 |  0.02283601842661172  |
|   6    | 0.011491181186531287 |  0.029361685882263133 |
|   7    | 0.010994884324654482 |  0.03243172736759052  |
|   8    | 0.010422234099412075 |  0.03534015423379397  |
|   9    | 0.010273769226201062 |  0.03964712020564503  |
|   10   | 0.010154997327632288 |  0.042951948283321856 |
+--------+----------------------+-----------------------+
[10 rows x 3 columns]


Overall RMSE: 0.16004053225406084

Per User RMSE (best)
+-----+------+-------+
|  id | rmse | count |
+-----+----

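The dummy purchase count model gives the lowest RMSE (0.0, which is expected because every target value is 1) and the highest mean precision at cutoff 1 (about 0.018, versus 0.013 for raw counts and 0.008 for scaled counts). So we use the dummy purchase data for our final step.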
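For readers unfamiliar with these metrics, here is a minimal hand-rolled sketch of precision and recall at a cutoff k for a single user, plus RMSE. The helper names and toy values are assumptions for illustration; Turi Create's own evaluation may handle edge cases (such as users with no held-out items) differently.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Return (precision, recall) for one user's top-k recommendations."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy example: 4 ranked recommendations, 2 items actually held out for the user.
recommended = [207, 96, 201, 153]
held_out = {96, 34}
p_at_2, r_at_2 = precision_recall_at_k(recommended, held_out, k=2)

# When every target is 1 and every prediction is 1, RMSE is trivially zero.
dummy_rmse = rmse([1.0, 1.0, 1.0], [1, 1, 1])
```

The last line illustrates why an RMSE of 0.0 on a constant target says little about ranking quality, so precision and recall are the more informative comparison here.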

# 9) Submission

In [54]:
users_to_recommend = list(cust_2_rec)

final_model = tc.item_similarity_recommender.create(tc.SFrame(data_dummy), 
                                            user_id=user_id, 
                                            item_id=item_id, 
                                            target='purchase_dummy')


In [55]:
recom = final_model.recommend(users=users_to_recommend, k=n_rec)
recom.print_rows(n_display)

+------+-----------+----------------------+------+
|  id  | productId |        score         | rank |
+------+-----------+----------------------+------+
| 148  |    248    | 0.025486195087432863 |  1   |
| 148  |    168    | 0.02515336275100708  |  2   |
| 148  |    126    | 0.024275465806325277 |  3   |
| 148  |    214    | 0.021062183380126952 |  4   |
| 148  |    110    | 0.02066227197647095  |  5   |
| 148  |    216    | 0.020616809527079265 |  6   |
| 148  |    243    | 0.02060778538386027  |  7   |
| 148  |     61    | 0.02012509902318319  |  8   |
| 148  |    141    | 0.02008877197901408  |  9   |
| 148  |    139    | 0.019966328144073488 |  10  |
| 1674 |     17    | 0.027234587404463027 |  1   |
| 1674 |     79    | 0.025663250022464328 |  2   |
| 1674 |     80    | 0.02526219023598565  |  3   |
| 1674 |    112    | 0.025098118517133925 |  4   |
| 1674 |     40    | 0.024198671181996662 |  5   |
| 1674 |    177    | 0.02356364991929796  |  6   |
| 1674 |    122    | 0.02353888

### 9.2)- Checking most frequent items in final model

In [56]:
data_dummy.groupby(by=item_id)['purchase_count'].mean().sort_values(ascending=False).head(20)

productId
96     1.080808
207    1.076923
153    1.071429
201    1.069767
26     1.063158
152    1.060345
125    1.058824
112    1.058824
85     1.058824
131    1.057143
35     1.056180
63     1.054545
44     1.054545
129    1.053763
101    1.053763
117    1.052632
150    1.051724
122    1.051724
156    1.051546
120    1.050000
Name: purchase_count, dtype: float64

**RESULT: <BR>
Products 96, 207, 153, 201 and 26 (and so on) have the highest mean purchase count across customers.**

# 10)- Bonus Part- Recommending products to customers

In [57]:
# Convert the recommendations to a pandas DataFrame
df_rec = recom.to_dataframe()
df_rec.head()

,id,productId,score,rank
0,148,248,0.025486,1
1,148,168,0.025153,2
2,148,126,0.024275,3
3,148,214,0.021062,4
4,148,110,0.020662,5


In [58]:
print(df_rec.shape)

(10000, 4)


In [0]:
# Group by the user id column name (user_id), not the Python builtin id
df_rec['recommendedProducts'] = df_rec.groupby([user_id])[item_id].transform(lambda x: '|'.join(x.astype(str)))
df_output = df_rec[['id', 'recommendedProducts']].drop_duplicates().sort_values('id').set_index('id')

In [60]:
recomendation = final_model.recommend(users=users_to_recommend, k=n_rec)

In [0]:
df_rec = recomendation.to_dataframe()


In [0]:
df_rec['recommendedProducts'] = df_rec.groupby([user_id])[item_id] \
        .transform(lambda x: '|'.join(x.astype(str)))

In [0]:

df_output = df_rec[['id', 'recommendedProducts']].drop_duplicates() \
        .sort_values('id').set_index('id')

In [64]:
df_output.head()

id,recommendedProducts
0,125|158|122|214|225|208|137|85|209|87
3,245|163|102|155|82|36|175|48|86|170
5,18|106|152|139|37|79|131|122|87|142
9,141|137|206|227|231|130|7|117|175|50
10,240|152|66|90|108|39|28|159|178|6
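The table above is never actually written to disk in the notebook. A minimal sketch of producing the same pipe-joined layout and serializing it as CSV follows; the toy recommendations are made up, and the in-memory buffer stands in for a real file so the example has no side effects.

```python
import io
import pandas as pd

# Toy recommendations in the same shape as df_rec (values are made up).
df_rec_demo = pd.DataFrame({
    "id": [3, 3, 0, 0],
    "productId": [245, 163, 125, 158],
    "rank": [1, 2, 1, 2],
})

# One row per user, with products joined by '|' in rank order.
df_output_demo = (
    df_rec_demo.sort_values(["id", "rank"])
               .groupby("id")["productId"]
               .agg(lambda x: "|".join(x.astype(str)))
               .rename("recommendedProducts")
               .to_frame()
)

# Write to an in-memory buffer; pass a file path instead to create a file.
buffer = io.StringIO()
df_output_demo.to_csv(buffer)
csv_text = buffer.getvalue()
```

Replacing `buffer` with a path such as `'recommendations.csv'` (a hypothetical file name) would write the result to disk.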
