## Finding Effective Treatments


Many of the analysis that we want to perform hinge on being able to determine which treatments or tags are having an effect on how a user feels.
In this document we will go over a few different ways to measure this.

There are two major challenges that we need to deal with.  First is that the effects that we are trying to measure may have unmeasured delays or cycles.  For example, a person who begins taking Synthroid for a thyroid condition is not expected to see improvement for several weeks.
The second major challenge is that this data is inherently noisey.  The noise is expected from user reported data.  Nobody uses the app every single day, and may not give perfectly accurate representations of their symptoms.  

We should also be aware that we may be dealing with some biases.  For example people who are drawn to use the app are likely suffering more than the average person, are more likely in a younger age group, and have likely already had some trouble with diagnosis.

I would like to make it clear that this document will not be attempting to measure the effectiveness of a treatment within a population, but rather for specific users.  To measure the effectiveness in a population, we would be creating a Patient Reported Outcome measure(more can be read about PRO standards [here.](http://www.ncbi.nlm.nih.gov/books/NBK126186/))  It is important to realize that Flaredown was developed as a tool for helping it's patients identify patterns in the flare-ups of auto-immune diseases and not specifically designed for the task of PRO gathering.  While create a new PRO measure using the Flaredown data is an interesting project, this is outside of the scope of this document.

NOTE ABOUT VERSIONS : this notebook now includes profile information in it's output.  For this reason you will need to use the datafile dated 083016 or later

## Correlation

The first thing we will try in finding effective treatments will be simply a correlation between treatments/tags and the severity of conditions/symptoms.

We will take a look at our most well represented condition, Depression, and see if it has a correlation between being on a treatment or tag on a specific day, and their depression rating for that day.

In [1]:
import numpy as np
import pandas as pd
from scipy.stats.stats import pearsonr
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv("flaredown_trackable_data_083016.csv")
df['checkin_date'] = pd.to_datetime(df['checkin_date'])

In [2]:
P_THRESHOLD = 0.05

just_depressed_users = df.groupby(['user_id', 'checkin_date']).filter(lambda x: 'Depression' in x['trackable_name'].values)

#create a table of depression scores by user/day
def add_depression_score(x):
    return x[x['trackable_name'] == 'Depression']['trackable_value'].values[0]
depression_days = just_depressed_users.groupby(['user_id', 'checkin_date'])

depression_scores = depression_days.apply(add_depression_score)
depression_scores = depression_scores.reset_index()
depression_scores.columns = ["user_id", "checkin_date", "depression_score"]

just_depressed_users = just_depressed_users[just_depressed_users['trackable_type'] == 'Treatment'].append(just_depressed_users[just_depressed_users['trackable_type'] == 'Tag'])
just_depressed_users = pd.get_dummies(just_depressed_users, columns=['trackable_name'])

just_depressed_users = just_depressed_users.merge(depression_scores, on=['user_id','checkin_date'])
just_depressed_users['depression_score'] = pd.to_numeric(just_depressed_users['depression_score'])

#check the correlation between a treatment being used and the depression score for each user
userlist = set(just_depressed_users['user_id'][0:200]) #just printing out first 200 users so it's not too long to read
for userid in userlist:  
    user = just_depressed_users[just_depressed_users['user_id'] == userid]
    first = True
    for column in user.columns[8:len(user.columns)-1]:
        corr,p =  pearsonr(user['depression_score'], user[column])
        if (p < P_THRESHOLD):
            print "found correlation between " + column + " and depression of " + str(corr) + " at p value " + str(p)

found correlation between trackable_name_tooth infection and depression of 0.488312241178 at p value 0.0113742772503
found correlation between trackable_name_Netflix day and depression of 0.430006581784 at p value 0.00885606973451
found correlation between trackable_name_tired and depression of 1.0 at p value 0.0
found correlation between trackable_name_Adderall and depression of -1.0 at p value 0.0
found correlation between trackable_name_tired and depression of 1.0 at p value 0.0
found correlation between trackable_name_tooth infection and depression of 0.488312241178 at p value 0.0113742772503


What makes this difficult is that we are trying to determine if a treatment/tag is working for an individual, not the whole group.  Which means that our sample size for each user depends on how many days they have logged in the app.  This causes fairly low sample sizes, and inacceptably high p values, especially for this shotgun approach.  We also need to consider, does the treatment make the user feel better the same day, what if the treatment always makes the user feel better the next day?  What if a treatment takes a week to kick in?  Or what if the treatment, such as Humira for Crohn's, functions on a two week cycle?

As can be seen above, the vast majority of users are not getting any correlations with an acceptable p value, so this strict method of determining if a treatment/tag is working can only be considered inconclusive.

This way of thinking may prove fruitful when the app has been around for longer, but at this time the number of logged days per user is too low.


## Comparison of Before and After Treatment

In order to simplify this metric, lets take a look at the day that a treatment started.  If a user suffers from a condition, and then starts logging a new treatment or tag, does the condition get better after that point?  

We will need to identify the date that a treatment or tag started, and then calculate the mean condition score before that date and the mean condition score after that date.  Hopefully the score after the date will be improved (lower).

Using this metric helps us get around a few quirks of the data.  Since we are taking the mean, we can ignore any cyclical nature of the condition.  Also it accounts for users who have started taking a treatment but are not consistantly logging it (which our exploration shows is very common).

In [3]:
#create a dataframe of treatments with their start date
treatments = df[df['trackable_type'] == "Treatment"]
treatment_start_dates = treatments.groupby(['user_id', 'trackable_name'])['checkin_date'].min().reset_index()
treatment_end_dates = treatments.groupby(['user_id', 'trackable_name'])['checkin_date'].max().reset_index()
treatment_dates = treatment_start_dates.merge(treatment_end_dates, on=['user_id','trackable_name'])

#create a dataframe of the conditions and symptoms by user with their start and end dates
conditions = df[(df['trackable_type'] == "Condition") | (df['trackable_type'] == "Symptom")]
conditions_start_end = conditions.groupby(['user_id', 'trackable_name'])['checkin_date'].min().reset_index()
conditions_start_end = conditions_start_end.merge(conditions.groupby(['user_id', 'trackable_name'])['checkin_date'].max().reset_index(), on=['user_id','trackable_name'])

#remove symptoms/conditions that haven't been recorded for at least a couple weeks (a couple weeks is arbitrary, feel free to adjust this value in the future)
conditions_start_end = conditions_start_end[(conditions_start_end['checkin_date_y'] - conditions_start_end['checkin_date_x']).astype('timedelta64[D]') > 14]

#merge the two dfs together
dates_df = conditions_start_end.merge(treatment_dates, on='user_id')

#changing column names because they are confusing post merge
dates_df.columns = ['user_id','symptom_name','symptom_start','symptom_end', 'treatment_name','treatment_start', 'treatment_end']

#remove symptoms/conditions that started logging around the same time as the treatment started logging
dates_df = dates_df[(dates_df['treatment_start'] - dates_df['symptom_start']).astype('timedelta64[D]') > 7 ]
#remove symptoms/conditions that stopped logging before the treatment started logging
dates_df = dates_df[(dates_df['symptom_end'] - dates_df['treatment_start']).astype('timedelta64[D]') > 7 ]

#iterate through and find the trackable_values before and after the treatment date
def findBeforeAndAfterTreatment(dates_df_local, orig_df):
    new_df = pd.DataFrame(columns=['user_id', 'age', 'sex', 'country', 'condition','treatment','before_value','after_value','effectiveness'])
    for row in dates_df_local.iterrows():
        row = row[1]
        df_user_symptom = orig_df[(orig_df['user_id'] == row['user_id']) & (orig_df['trackable_name'] == row['symptom_name'])]
        values_before = df_user_symptom[(df_user_symptom['checkin_date'] < row['treatment_start'])]['trackable_value'].values
        values_after = df_user_symptom[(df_user_symptom['checkin_date'] >= row['treatment_start']) & (df_user_symptom['checkin_date'] <= row['treatment_end'])]['trackable_value'].values
        values_before = [x for x in values_before if str(x) != 'nan']
        values_after = [x for x in values_after if str(x) != 'nan']
        values_before = map(int,values_before)
        values_after = map(int,values_after)
        mean_before = np.mean(values_before)
        mean_after = np.mean(values_after)
        new_df = new_df.append({'user_id' : row['user_id'], 'age' : orig_df[orig_df['user_id'] == row['user_id']]['age'].values[0], 'sex' : orig_df[orig_df['user_id'] == row['user_id']]['sex'].values[0], 'country' : orig_df[orig_df['user_id'] == row['user_id']]['country'].values[0], 'condition' : row['symptom_name'], 'treatment' : row['treatment_name'], 'before_value' : mean_before, 'after_value' : mean_after, 'effectiveness' : mean_before - mean_after}, ignore_index=True)
    return new_df

treatments_only = findBeforeAndAfterTreatment(dates_df, df)
print treatments_only.head(10)
print "users that started taking a treatment after logging conditions " + str(len(treatments_only))
print "users that report lower condition ratings while on a treatment " + str(len(treatments_only[treatments_only['effectiveness'] > 0]))

   user_id  age     sex country         condition     treatment  before_value  \
0       20   49    male      US           Fatigue      Provigil      1.750000   
1       20   49    male      US  major somnolence      Provigil      2.000000   
2       20   49    male      US        sleepiness      Provigil      1.750000   
3       52   44  female      US           Allergy  Escitalopram      1.083333   
4       52   44  female      US           Allergy     Magnesium      0.800000   
5       52   44  female      US           Anxiety  Escitalopram      2.066667   
6       52   44  female      US           Anxiety     Magnesium      1.782609   
7       52   44  female      US      Constipation  Escitalopram      0.666667   
8       52   44  female      US      Constipation     Magnesium      0.550000   
9       52   44  female      US        Depression  Escitalopram      0.600000   

   after_value  effectiveness  
0     1.444444       0.305556  
1     1.333333       0.666667  
2     1.7777

Now that's more like it!  We have found quite a large number of users that are rating their symptoms as less severe while on a treatment. 

## Working in the Tags

Tags may hold the highest value for the recommender system.  For treatment recommendations, users would be best served by talking to their doctor.  But for tags such as "stressed" or "good sleep", it can be very difficult to tell what is going to work for different people.

The above code works on an assumption that a person has been using a Treatment during the interval between the first time they logged it and the last time they logged it.  The same can not reasonably be said about tags.  If a person tags "good sleep", we can not assume that they continue to sleep well for any duration.  For this reason we will need to use a different system to try and determine what is working for each user.  Instead we will use the difference between their average symptom rating, and the average between the symptom rating for the day of the tag, as well as the next day.  This will capture at least some delayed effect from the tag.

In [6]:
import datetime, time
tags = df[df['trackable_type'] == "Tag"]
conditions = df[(df['trackable_type'] == "Condition") | (df['trackable_type'] == "Symptom")]
    
tag_dates = tags.groupby(['user_id', 'trackable_name'])['checkin_date'].apply(lambda x: x.tolist()).reset_index()

#iterating through and building a new df again, you weren't in a hurry right?
tag_df = pd.DataFrame(columns=['user_id', 'age', 'sex', 'country', 'condition','treatment','before_value','after_value','effectiveness'])
for row in tag_dates.iterrows():
    values = row[1]
    user_condition_rows = df[(df['user_id'] == values['user_id']) & (df['trackable_type'] == 'Condition')]
    conditions = set(user_condition_rows['trackable_name'])
    for condition in conditions:
        condition_values_unaligned = []
        condition_values_aligned = []
        condition_values = user_condition_rows[user_condition_rows['trackable_name'] == condition]['trackable_value'].values
        condition_dates = user_condition_rows[user_condition_rows['trackable_name'] == condition]['checkin_date'].values
        for i in range(len(condition_values)-1):
            matched = False
            for tag_date in values['checkin_date']:
                #ended up with dates in different formats, just converting them both to days for comparison
                condition_day = float(condition_dates[i]) / 86400000000000
                tag_day = time.mktime(tag_date.timetuple()) / 86400
                if condition_day - tag_day < 2:
                    matched = True
            if matched:
                condition_values_aligned.append(int(condition_values[i]))
            else:
                condition_values_unaligned.append(int(condition_values[i]))
        if (len(condition_values_aligned) > 0 and len(condition_values_unaligned) > 0): #don't record if we haven't seen at least one of each
            #sometimes treatments are recorded as both treatments and tags for the same user, need to find those cases or we end up with duplicate entries
            user_treatments = df[(df['user_id'] == values['user_id']) & (df['trackable_type'] == 'Treatment')]
            if not (values['trackable_name'] in user_treatments.values):
                before_value = np.mean(condition_values_unaligned)
                after_value = np.mean(condition_values_aligned)
                effectiveness = before_value - after_value
                tag_df = tag_df.append({'user_id' : values['user_id'], 'age' : df[df['user_id'] == values['user_id']]['age'].values[0], 'sex' : df[df['user_id'] == values['user_id']]['sex'].values[0], 'country' : df[df['user_id'] == values['user_id']]['country'].values[0], 'condition' : condition, 'treatment' : values['trackable_name'], 'before_value' : before_value, 'after_value' : after_value, 'effectiveness' : effectiveness}, ignore_index=True)
print tag_df.head(10)

   user_id  age     sex country                     condition  \
0        7   28  female      US  Generalized anxiety disorder   
1        7   28  female      US                 Gastroparesis   
2        7   28  female      US  Generalized anxiety disorder   
3        7   28  female      US                 Gastroparesis   
4        7   28  female      US  Generalized anxiety disorder   
5        7   28  female      US                 Gastroparesis   
6        7   28  female      US  Generalized anxiety disorder   
7        7   28  female      US                 Gastroparesis   
8        7   28  female      US  Generalized anxiety disorder   
9        7   28  female      US                 Gastroparesis   

               treatment  before_value  after_value  effectiveness  
0  Confronted a coworker      0.000000     1.333333      -1.333333  
1  Confronted a coworker      0.000000     1.500000      -1.500000  
2  Wrong about something      0.000000     1.333333      -1.333333  
3  Wrong

In [7]:
#since this takes a while to run, append the two dfs together and write to a csv file
treatments_and_tags = pd.concat([treatments_only,tag_df])
treatments_and_tags.to_csv('effectiveness.csv', index=False)