# Model and Conclusion

# Setup

In [1]:
# dependencies
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline 

import seaborn as sns

import missingno as msno

In [2]:
import turicreate as tc

# Get data

In [3]:
# pandas: the whole table is held in memory (limited by RAM)
# df = pd.read_csv('data/clean_table.csv')

In [29]:
# SFrame - better when you have to scale up
df = tc.SFrame.read_csv('./data/base_table.csv')

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[int,str,str,int,str,int,str,int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------


# Data Exploration
## Standard analysis

In [30]:
df.shape

(3386, 10)

In [31]:
# pandas only
# df.dtypes

In [32]:
df.head(5)

X1,activity,category,start_date[ms],start_date,end_date[ms]
0,Trello,Personal Adjusting,1540159273005,Mon Oct 22 00:01:13 GMT+02:00 2018 ...,1540159869559
1,Series / Docu,Entertainment,1540159869559,Mon Oct 22 00:11:09 GMT+02:00 2018 ...,1540162820068
2,Sleep,Refresh,1540162820068,Mon Oct 22 01:00:20 GMT+02:00 2018 ...,1540189458018
3,Moving - youtube,Transport,1540189458018,Mon Oct 22 08:24:18 GMT+02:00 2018 ...,1540189949037
4,Trello,Personal Adjusting,1540189949037,Mon Oct 22 08:32:29 GMT+02:00 2018 ...,1540190444165

end_date,activityDuration[m],id,user_id
Mon Oct 22 00:11:09 GMT+02:00 2018 ...,9,1,1
Mon Oct 22 01:00:20 GMT+02:00 2018 ...,49,2,1
Mon Oct 22 08:24:18 GMT+02:00 2018 ...,443,3,1
Mon Oct 22 08:32:29 GMT+02:00 2018 ...,8,4,1
Mon Oct 22 08:40:44 GMT+02:00 2018 ...,8,5,1


# Is the data good enough for the model?

Conclusion: A recommender system needs an interaction table with a user ID, an item ID (here: the activity) and, optionally, ratings.

- We have user IDs (`user_id`).
- We have item IDs (`activity`).
- We have no ratings.

Without ratings, the model can only look for patterns in the similarity between activities.
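
These three points can be checked by counting distinct users and items and looking for a rating column. Below is a minimal sketch in plain Python, using the five rows from the `df.head(5)` output as sample input. The helper name `summarize_interactions` and the column name `rating` are hypothetical and not part of Turi Create.

```python
# Sample rows copied from the df.head(5) output (only the relevant columns).
rows = [
    {"user_id": 1, "activity": "Trello"},
    {"user_id": 1, "activity": "Series / Docu"},
    {"user_id": 1, "activity": "Sleep"},
    {"user_id": 1, "activity": "Moving - youtube"},
    {"user_id": 1, "activity": "Trello"},
]


def summarize_interactions(rows, user_col="user_id", item_col="activity",
                           rating_col="rating"):
    """Hypothetical helper: report what a recommender has to work with."""
    users = {r[user_col] for r in rows}
    items = {r[item_col] for r in rows}
    has_ratings = any(rating_col in r for r in rows)
    return {"n_users": len(users), "n_items": len(items),
            "has_ratings": has_ratings}


summary = summarize_interactions(rows)
print(summary)
```

If `n_users` is very small on the full table, similarity computed across users has little to work with, which is worth knowing before training.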



# Split dataset

In [16]:
# Split per user; the user and item columns must match the ones used for the model
train, test = tc.recommender.util.random_split_by_user(df, 'user_id', 'activity')

# Build model pipeline

In [17]:
model = tc.recommender.create(train, 'user_id','activity', ranking=False)

# Declare hyperparameter

# Fit and tune with cross-validation

# Evaluate Model

In [34]:
results = model.recommend(exclude_known=False)
results

user_id,activity,score,rank
1,Work - Task planning,0.9883720930232558,1
1,Org,0.9883720930232558,2
1,WC,0.9883720930232558,3
1,Work - Purpose,0.9883720930232558,4
1,Social - Real Life,0.9883720930232558,5
1,Food,0.9883720930232558,6
1,Clean,0.9883720930232558,7
1,Moving - youtube,0.9883720930232558,8
1,Sleep,0.9883720930232558,9
1,Trello,0.9883720930232558,10


In [42]:
n_rows = 10
activities = ['Sleep']
similar_items = model.get_similar_items(activities, k=n_rows)
similar_items.print_rows(n_rows)

+----------+----------------------+-------+------+
| activity |       similar        | score | rank |
+----------+----------------------+-------+------+
|  Sleep   |        Trello        |  1.0  |  1   |
|  Sleep   |   Moving - youtube   |  1.0  |  2   |
|  Sleep   |        Clean         |  1.0  |  3   |
|  Sleep   |         Food         |  1.0  |  4   |
|  Sleep   |  Social - Real Life  |  1.0  |  5   |
|  Sleep   |    Work - Purpose    |  1.0  |  6   |
|  Sleep   |          WC          |  1.0  |  7   |
|  Sleep   |         Org          |  1.0  |  8   |
|  Sleep   | Work - Task planning |  1.0  |  9   |
|  Sleep   |      Work - Org      |  1.0  |  10  |
+----------+----------------------+-------+------+
[10 rows x 4 columns]



# Select winner model

# Save winning model

In [40]:
model.save("free_recommendation.model")

# Load winning model

In [41]:
model = tc.load_model("free_recommendation.model")

# Communicate Results (Conclusion)

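The model is a recommender that scores activities by their similarity to one another.

Unfortunately, the similarity score for every pair of activities is 1.0, and every recommendation gets the same score (about 0.988). The model cannot tell activities apart, so the output is an arbitrary listing of activities rather than a real recommendation.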
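
One plausible explanation, stated as an assumption: the notebook does not print which model was chosen, but if it is an item-similarity model with a Jaccard-style measure, and the activities were all logged by the same user, then every activity has the same set of users and every pairwise similarity is 1. A minimal sketch in plain Python with made-up toy data:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data (illustrative, not from the real table):
# the set of users who logged each activity.
single_user = {"Sleep": {1}, "Trello": {1}, "Food": {1}}
several_users = {"Sleep": {1, 2, 3}, "Trello": {1}, "Food": {2, 3}}


def similarities_to(item, users_by_item):
    """Similarity of `item` to every other item."""
    return {other: jaccard(users_by_item[item], users)
            for other, users in users_by_item.items() if other != item}


print(similarities_to("Sleep", single_user))
print(similarities_to("Sleep", several_users))
```

With a single user the scores carry no information. They only start to differ once the "user" dimension has more than one distinct value.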

### Conclusion:
- Our recommendation is too unspecific. We will instead recommend based on newly defined time frames.
- We need a new set of features:
  - we will turn every activity into a feature column (object to numerical)
  - we will look at the data on a weekly basis
  - we will add a time slot for every half hour (48 per day | 336 per week)
- We can now evaluate against a simple expectation: after sleep, the recommendation should be one of prepare food, learning language or clean.
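
The slot arithmetic can be sketched as follows. This is a minimal sketch, assuming slots are numbered from Monday 00:00 and that a fixed UTC+2 offset applies (taken from the `GMT+02:00` shown in the `start_date` column; daylight saving changes are ignored). The function name `weekly_slot` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

SLOTS_PER_DAY = 48                   # one slot per half hour
SLOTS_PER_WEEK = SLOTS_PER_DAY * 7   # 48 * 7 slots in a week

# Assumption: fixed offset taken from the "GMT+02:00" in the start_date column.
LOCAL_TZ = timezone(timedelta(hours=2))


def weekly_slot(timestamp_ms):
    """Map a millisecond Unix timestamp to a half-hour slot of the week."""
    t = datetime.fromtimestamp(timestamp_ms / 1000, tz=LOCAL_TZ)
    slot_of_day = t.hour * 2 + t.minute // 30
    # weekday() is 0 for Monday, so Monday 00:00 falls into slot 0.
    return t.weekday() * SLOTS_PER_DAY + slot_of_day


print(SLOTS_PER_WEEK)
print(weekly_slot(1540159273005))  # start_date[ms] of row 0 (Mon 00:01:13)
print(weekly_slot(1540189458018))  # start_date[ms] of row 3 (Mon 08:24:18)
```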
  
Our new recommendation engine will recommend based on a given time slot.
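```python
import turicreate as tc

# train model: the time slot takes the place of the user
# (assumes df has a 'time_slot' column, which still has to be built)
model = tc.recommender.create(
    df, 'time_slot', 'activity', ranking=False)

# get recommendation
n_rows = 10
activities = ['Sleep']
similar_items = model.get_similar_items(activities, k=n_rows)
```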
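
The evaluation idea from the conclusion can be checked mechanically once a ranked list of similar activities exists. A minimal sketch in plain Python follows. The expected activity names and the helper `hits_expected` are hypothetical placeholders, since the exact labels used in the data are not shown here.

```python
# Hypothetical labels for the activities expected after 'Sleep';
# replace them with the exact activity names used in the data.
EXPECTED_AFTER_SLEEP = {"Prepare food", "Learning language", "Clean"}


def hits_expected(recommended, expected, top_k=3):
    """Return the expected activities found among the first top_k entries."""
    return [name for name in recommended[:top_k] if name in expected]


# Example input in ranked order (illustrative only).
recommended = ["Clean", "Trello", "Prepare food", "WC"]
hits = hits_expected(recommended, EXPECTED_AFTER_SLEEP)
print(hits)
```

With a trained model, `recommended` could be filled from the `similar` column of the similar-items result. An empty `hits` list would then signal that the model misses the expected pattern.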


Observation_data   
user_id = The time slot takes the role of the user, so recommendations are based on the time slot.  
item_id = Still the activity.  
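
To make this mapping concrete, each logged interval can be expanded into one (`time_slot`, `activity`) row per half-hour slot it touches. This is a minimal sketch, assuming Monday-based weekly slots and a fixed UTC+2 offset (no daylight saving handling). `slot_of_week` and `expand_to_slots` are hypothetical helpers, and the sample timestamps are taken from the `df.head(5)` output.

```python
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=2))  # assumption: fixed offset, no DST
HALF_HOUR_MS = 30 * 60 * 1000


def slot_of_week(timestamp_ms):
    """Half-hour slot of the week, counting from Monday 00:00."""
    t = datetime.fromtimestamp(timestamp_ms / 1000, tz=LOCAL_TZ)
    return t.weekday() * 48 + t.hour * 2 + t.minute // 30


def expand_to_slots(activity, start_ms, end_ms):
    """Yield one row per half-hour slot that the interval touches."""
    # Align to the start of the half hour containing start_ms. This works
    # because the assumed offset is a whole multiple of 30 minutes.
    t = start_ms - (start_ms % HALF_HOUR_MS)
    while t < end_ms:
        yield {"time_slot": slot_of_week(t), "activity": activity}
        t += HALF_HOUR_MS


# Row 1 of the table: 'Series / Docu', Mon 00:11:09 to Mon 01:00:20.
rows = list(expand_to_slots("Series / Docu", 1540159869559, 1540162820068))
for row in rows:
    print(row)
```

The resulting rows could then be loaded into an SFrame and used as the observation data, with `time_slot` as the user column and `activity` as the item column.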