## Recommender System

The goal of this document is to provide a way to recommend treatments to users.  For each condition, we can see what treatments have worked for other patients.  We can also go one step further and say, if Treatment/Tag A has worked for you, then other people who have had success with Treatment/Tag A have also had success with Treatment/Tag B.

The same will also be possible in reverse.  Some Treatments/Tags may cause Conditions/Symptoms to worsen, and we may be able to recommend against those Treatments/Tags.
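
As a rough illustration of the idea above, the sketch below looks at users who were helped by one treatment and reports how often each other treatment helped or worsened things for those same users. The `effects` table, the `related_outcomes` helper, the +1/-1/0 coding, and the 0.5 cut-off are all assumptions made for this example; none of them come from the Flaredown data.

```python
import pandas as pd

# Hypothetical data: rows are users, columns are treatments/tags.
# +1 = helped, -1 = worsened the condition, 0 = no data or no clear effect.
effects = pd.DataFrame(
    {
        "Treatment A": [1, 1, 1, 0, -1],
        "Treatment B": [1, 1, 0, 0, -1],
        "Treatment C": [-1, -1, 0, 1, 1],
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)


def related_outcomes(effects, known_success):
    """Among users helped by `known_success`, give the share helped or
    worsened by every other treatment/tag."""
    peers = effects[effects[known_success] == 1].drop(columns=[known_success])
    return pd.DataFrame({
        "helped_share": (peers == 1).mean(),
        "worsened_share": (peers == -1).mean(),
    })


summary = related_outcomes(effects, "Treatment A")

# Illustrative cut-off: flag anything that helped (or hurt) most peers.
recommend = summary[summary["helped_share"] > 0.5].index.tolist()
avoid = summary[summary["worsened_share"] > 0.5].index.tolist()
print(summary)
print(recommend, avoid)
```

In practice the +1/-1 labels would have to come from the correlation analysis described in the next paragraph, so this only shows the shape of the final lookup.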

Our ability to make these recommendations hinges on our ability to determine when a Treatment/Tag is working, which we will measure as the correlation between a Treatment/Tag and our target Condition.  Each correlation will have a p-value, which needs to be low for the correlation to be useful.  Correlations without low p-values will be discarded, leaving us fewer condition-treatment combinations to work with.  The most direct way to lower the p-values of real effects is to increase the number of samples, in this case the number of users.
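
To make the correlation and p-value concrete, here is a minimal sketch on synthetic data. The notebook does not say how the p-values will be computed, so this example swaps in a permutation test built on `np.corrcoef`; the helper name `correlation_with_p_value`, the simulated effect size, and the sample sizes are all assumptions.

```python
import numpy as np


def correlation_with_p_value(x, y, n_permutations=2000, seed=0):
    """Pearson correlation plus a two-sided permutation p-value."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    rng = np.random.RandomState(seed)
    # Shuffling y breaks any real link with x, so the shuffled correlations
    # show how strong a correlation chance alone tends to produce.
    hits = 0
    for _ in range(n_permutations):
        if abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= abs(r):
            hits += 1
    p = (hits + 1.0) / (n_permutations + 1.0)
    return r, p


# Synthetic check-ins: taken is 1 on days the treatment was used, and the
# symptom score (0 to 4) is simulated to be lower on those days.
rng = np.random.RandomState(1)
taken = rng.randint(0, 2, size=200)
score = np.clip(np.round(2.5 - 0.8 * taken + rng.normal(0, 1, size=200)), 0, 4)

r_small, p_small = correlation_with_p_value(taken[:15], score[:15])
r_large, p_large = correlation_with_p_value(taken, score)
print("15 samples:  r=%.2f p=%.4f" % (r_small, p_small))
print("200 samples: r=%.2f p=%.4f" % (r_large, p_large))
```

Running the same helper on the first 15 samples and on all 200 gives a feel for how the sample count affects the p-value for an effect of this size.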

In this notebook I'll focus on depression because it is well represented in the data and has a large number of associated treatments.  The same code can be used on any condition or symptom, but only where enough samples have been recorded.

In [2]:
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("flaredown_trackable_data_080316.csv")
df['checkin_date'] = pd.to_datetime(df['checkin_date'])

In [3]:
# Keep only the user/day groups that include a Depression rating.
just_depressed_users = df.groupby(['user_id', 'checkin_date']).filter(lambda x: 'Depression' in x['trackable_name'].values)
#print just_depressed_users.head(20)
#print just_depressed_users[just_depressed_users['trackable_type'] == 'Treatment'].head(20)
def add_depression_score(x):
    # Return the Depression rating recorded in this user/day group.
    return x[x['trackable_name'] == 'Depression']['trackable_value'].values[0]

#just_depressed_users['depression_score'] = just_depressed_users.groupby(['user_id', 'checkin_date']).transform(add_depression_score)
depression_days = just_depressed_users.groupby(['user_id', 'checkin_date'])
depression_scores = depression_days.apply(add_depression_score)
#print depression_scores

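# Keep only Treatment and Tag rows (treatments first, then tags), then one-hot encode the trackable names.
is_treatment = just_depressed_users['trackable_type'] == 'Treatment'
is_tag = just_depressed_users['trackable_type'] == 'Tag'
just_depressed_users = pd.concat([just_depressed_users[is_treatment], just_depressed_users[is_tag]])
just_depressed_users = pd.get_dummies(just_depressed_users, columns=['trackable_name'])
print(just_depressed_users)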

        user_id checkin_date  trackable_id trackable_type  trackable_value  \
181          25   2015-05-26          1470      Treatment         100.0 mg   
182          25   2015-05-26          5681      Treatment          20.0 mg   
183          25   2015-05-26          7404      Treatment         56.25 mg   
195          25   2015-05-27          1470      Treatment         100.0 mg   
196          25   2015-05-27          5681      Treatment          10.0 mg   
567          65   2015-05-26          1470      Treatment         100.0 mg   
568          65   2015-05-26          5681      Treatment          20.0 mg   
569          65   2015-05-26          7404      Treatment         56.25 mg   
581          65   2015-05-27          1470      Treatment         100.0 mg   
582          65   2015-05-27          5681      Treatment          10.0 mg   
615          72   2015-05-25          4544      Treatment         600.0 mg   
616          72   2015-05-25          5805      Treatment       

In [4]:
# The dataframe is now one-hot encoded and tells us whether a user used a treatment or tag on a specific day.
# For each row, multiply the indicator columns by (1 + the depression rating for that user/day).
# That gives 0 where the treatment/tag was not recorded, and 1 to 5 where it was.
# TODO: this is pretty slow, maybe merge/apply instead
for index, row in just_depressed_users.iterrows():
    depression = depression_scores[int(row['user_id']), row['checkin_date']]
    # The one-hot columns start at position 5, after the original columns.
    for i in range(5, len(row)):
        just_depressed_users.loc[index, just_depressed_users.columns[i]] = row.iloc[i] * (int(depression) + 1)

print(just_depressed_users.head(100))

KeyboardInterrupt: 
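
The TODO in the cell above suggests a merge instead of the row-by-row loop. A minimal sketch of that approach on a small hypothetical frame follows; the treatment and tag names, the `scores` table, and the `depression_score` column name are made up for illustration and are not taken from the Flaredown file.

```python
import pandas as pd

# Hypothetical stand-in for the Treatment/Tag rows of the check-in data.
rows = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "checkin_date": pd.to_datetime(
        ["2015-05-26", "2015-05-26", "2015-05-27", "2015-05-26", "2015-05-26"]),
    "trackable_name": ["Treatment A", "Tag X", "Treatment A", "Treatment B", "Tag X"],
})

# Hypothetical depression rating (0 to 4) for each user/day.
scores = pd.DataFrame({
    "user_id": [1, 1, 2],
    "checkin_date": pd.to_datetime(["2015-05-26", "2015-05-27", "2015-05-26"]),
    "depression_score": [3, 0, 2],
})

keys = ["user_id", "checkin_date"]

# One-hot encode the names; astype(int) because newer pandas returns booleans.
dummies = pd.get_dummies(rows["trackable_name"]).astype(int)
one_hot = pd.concat([rows[keys], dummies], axis=1)

# Attach each row's rating with one merge instead of a per-row lookup.
merged = one_hot.merge(scores, on=keys, how="left")

# Scale every indicator column by (rating + 1) in a single vectorized step.
weighted = merged[list(dummies.columns)].multiply(merged["depression_score"] + 1, axis=0)
result = pd.concat([merged[keys], weighted], axis=1)
print(result)
```

With the notebook's own objects, the `scores` table could presumably be built from `depression_scores` with something like `depression_scores.rename('depression_score').reset_index()`, converting the ratings to integers first if they were read in as strings.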

In [None]:
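def merge_depression_ratings(x):
    # x is a single row; look up the depression rating for its user and check-in date.
    print(x)
    depression = depression_scores[int(x['user_id']), x['checkin_date']]
    print(depression)
    return depression

# axis=1 passes each row (rather than each column) to the function.
just_depressed_users.apply(merge_depression_ratings, axis=1)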