# Data Exploration
The purpose of this notebook is to begin our investigation of the KAIRX resident experiment. We will see how many records we have, how many records we can match across surveys, and which analyses are plausible to complete.

In [1]:
%matplotlib inline
import pandas as pd
import json
import numpy as np
import matplotlib.pyplot as plt
from textwrap import wrap

# Clinical questionnaire data (header=1: column names are on the second row of each file)
cq_control = pd.read_csv("./data/KaiRx_Clinical_Questionnaire__Control.csv", header=1)
cq_control['trt'] = 'control'
print('Clinical Questionnaire Control (n=', 
      cq_control.shape[0], 
      '). There are ', 
      cq_control['Please enter your Duke email address: '].notna().sum(),
      ' valid email addresses to merge to.')
cq_interve = pd.read_csv("./data/KaiRx_Clinical_Questionnaire__Intervention.csv", header=1)
cq_interve['trt'] = 'intervention'
print('Clinical Questionnaire Intervention (n=', 
      cq_interve.shape[0], 
      '). There are ', 
      cq_interve['Please enter your Duke email address: '].notna().sum(),
      ' valid email addresses to merge to.')

# Stack both arms; pd.concat replaces DataFrame.append, which was removed in pandas 2.0
cq = pd.concat([cq_control, cq_interve], ignore_index=True)


# Usability survey data
us_control = pd.read_csv("./data/KaiRx_Usability_Survey_and_Feedback__Control.csv", header=1)
us_control['trt'] = 'control'
print('Usability Questionnaire Control (n=', 
      us_control.shape[0], 
      '). There are ', 
      us_control['Email address:'].notna().sum(),
      ' valid email addresses to merge to.')
us_interve = pd.read_csv("./data/KaiRx_Usability_Survey_and_Feedback__Intervention.csv", header=1)
us_interve['trt'] = 'intervention'
print('Usability Questionnaire Intervention (n=', 
      us_interve.shape[0], 
      '). There are ', 
      us_interve['Email address:'].notna().sum(),
      ' valid email addresses to merge to.')

us = pd.concat([us_control, us_interve], ignore_index=True)

Clinical Questionnaire Control (n= 14 ). There are  12  valid email addresses to merge to.
Clinical Questionnaire Intervention (n= 17 ). There are  17  valid email addresses to merge to.
Usability Questionnaire Control (n= 11 ). There are  11  valid email addresses to merge to.
Usability Questionnaire Intervention (n= 16 ). There are  16  valid email addresses to merge to.


> #### Results
From the output above, we have a total of 14 + 17 = 31 clinical questionnaire records. *However*, two of the control records are missing email addresses, so these two records will have to be dropped if we merge with the usability questionnaire. We also have 11 + 16 = 27 usability records, of which only 11 are control records. Hopefully all 27 usability records can be matched back to the clinical questionnaires (let's see).
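
The same per-arm tally can be produced in a single table rather than one print per file. The following is a minimal sketch on a toy table; the column name `email` and the addresses are hypothetical stand-ins, not the real survey export.

```python
import pandas as pd

# Toy stand-in for a stacked survey table (hypothetical columns and addresses).
toy = pd.DataFrame({
    "trt": ["control", "control", "control", "intervention", "intervention"],
    "email": ["a@example.edu", None, "b@example.edu", "c@example.edu", "d@example.edu"],
})

grouped = toy.groupby("trt")["email"]

# size() counts every row in the arm; count() skips missing emails.
summary = pd.DataFrame({
    "n_records": grouped.size(),
    "n_with_email": grouped.count(),
})
summary["n_missing_email"] = summary["n_records"] - summary["n_with_email"]
print(summary)
```

Records counted under `n_missing_email` are the ones that cannot take part in an email-based merge.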

### Merging Clinical and Usability Records
The intent of collecting email addresses was to be able to do more advanced modeling of results (adjusting for potential confounders). First, we'll see how viable it is to merge these two data sources and note any problems that occur.

In [2]:
# Right join: keep every usability record and attach clinical records that share its email
merged_email = cq[['Please enter your Duke email address: ', 'StartDate', 'trt']].merge(
    us[['Email address:', 'LocationLatitude', 'trt']],
    left_on='Please enter your Duke email address: ',
    right_on='Email address:',
    how='right')
print('There are', merged_email.shape[0], ' merged records available.')

There are 29  merged records available.


> #### Results
Digging deeper into potential issues:

1. It appears that one user didn't answer any clinical questionnaire, but did answer the usability questions.
2. It appears that one user answered the clinical questionnaire under both control and intervention, but answered the usability survey only under control.
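
Both kinds of problem can be surfaced programmatically. The sketch below is one possible approach, not necessarily how the issues were found here: it uses the `indicator=True` option of `DataFrame.merge` to flag unmatched rows and `value_counts` to find repeated emails. The toy tables, the column name `email`, and the addresses are hypothetical.

```python
import pandas as pd

# Hypothetical toy tables; names and addresses are illustrative only.
cq_toy = pd.DataFrame({
    "email": ["a@example.edu", "b@example.edu", "b@example.edu"],
    "trt": ["control", "control", "intervention"],
})
us_toy = pd.DataFrame({
    "email": ["a@example.edu", "b@example.edu", "c@example.edu"],
    "trt": ["control", "control", "control"],
})

# indicator=True adds a '_merge' column recording which side each row came from.
check = cq_toy.merge(us_toy, on="email", how="right",
                     suffixes=("_cq", "_us"), indicator=True)

# Usability respondents with no clinical record (scenario 1).
no_clinical = check.loc[check["_merge"] == "right_only", "email"].tolist()

# Emails that occur more than once in the clinical data (scenario 2).
email_counts = cq_toy["email"].value_counts()
repeated = email_counts[email_counts > 1].index.tolist()

print(no_clinical, repeated)
```

A repeated email on one side is what makes a right join return more rows than the right-hand table contains.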

#### Moving forward
After an email discussion of the issues above, we decided to count the user in scenario (2) only once, under control. We will remove the user in scenario (1) altogether because no clinical questions were answered: the record would be dropped from the clinical questionnaire analyses anyway, and it could bias the usability results if the user didn't really understand or use the tool properly. The following is how we modify these records:

In [7]:
cq_questions_of_interest = ['Q_TotalDuration', 
                            'trt',
                            'Please enter your Duke email address: ', 
                            'Meet Mr. Smith. He is a 62 year old gentleman living in Durham, NC with his family. He has been s...', 
                            'Task 1: Diabetes 1. Which medication(s) is the patient currently taking for diabetes?', 
                            '2. Approximately how long has he been taking his diabetes medication(s)?', 
                            '3. Has he been compliant with his diabetes medication? How do you know? ', 
                            '4. You ask him about his diabetes medications and you can trust that he is telling you the truth...', 
                            '5. You see that he is currently on this maximum dose. You are wondering how recently this last ch...', 
                            '6. You realize that he has already taken the maximum dose of metformin for awhile without much ch...', 
                            '7. What was the starting dose of allopurinol prescribed for Mr. Smith? What was the last prescrib...', 
                            '8. Who prescribed his last refill of allopurinol? ', 
                            '9. You see a prescription but wonder if he has picked up his last prescription of allopurinol. Ba...', 
                            '10. He did not pick up his last allopurinol prescription at the pharmacy. What other agents was h...-a. No other medications have been prescribed for gout.', 
                            '10. He did not pick up his last allopurinol prescription at the pharmacy. What other agents was h...-b. Probenecid', 
                            '10. He did not pick up his last allopurinol prescription at the pharmacy. What other agents was h...-c. Naproxen', 
                            '10. He did not pick up his last allopurinol prescription at the pharmacy. What other agents was h...-d. Bupropion', 
                            '10. He did not pick up his last allopurinol prescription at the pharmacy. What other agents was h...-e. Colchicine', 
                            '11. After asking the patient about his gout, he explains that he has not picked up his gout medic...', 
                            'Task 2: Hypertension 1. Mr. Smith was diagnosed with hypertension 20 years ago. What was the firs...', 
                            '2. The patient’s blood pressure did not decrease in the 3 years after he was prescribed the thiaz...', 
                            '3. Over time, Mr. Smith’s blood pressure still continued to increase despite his compliance with...', 
                            '4. What do you suspect happened with his new antihypertensive regimen of HCTZ + ACE inhibitor tha...', 
                            '5. Despite Mr. Smith’s diligent adherence to his exercise and diet regimen and his past medicatio...', 
                            'Task 3: Depression 1. You continue with your check-up.The patient has a long-standing history o...-a. Selective serotonin reuptake inhibitors (SSRIs)', 
                            'Task 3: Depression 1. You continue with your check-up.The patient has a long-standing history o...-b. Selective norepinephrine reuptake inhibitors (SNRIs)', 
                            'Task 3: Depression 1. You continue with your check-up.The patient has a long-standing history o...-c. Tricyclic antidepressants (TCAs)', 
                            'Task 3: Depression 1. You continue with your check-up.The patient has a long-standing history o...-d. Monoamine oxidase inhibitors (MAOIs)', 
                            'Task 3: Depression 1. You continue with your check-up.The patient has a long-standing history o...-e. Atypical antidepressants', 
                            '2. Which drugs have been escalated in dose at least twice?(Check all that apply.)-a. Bupropion', 
                            '2. Which drugs have been escalated in dose at least twice?(Check all that apply.)-b. Sertraline', 
                            '2. Which drugs have been escalated in dose at least twice?(Check all that apply.)-c. Fluoxetine', 
                            '2. Which drugs have been escalated in dose at least twice?(Check all that apply.)-d. Duloxetine', 
                            '3. You ask Mr. Smith about the details of his depression medication changes in the past but he ha...', 
                            '4. You’d like to know what date that provider changed the medication so you can investigate the n...', 
                            '5. Why do you think the provider chose bupropion to prescribe in 2006? ',
                            'Score-sum']


us_questions_of_interest = ['Thank you for completing the interactive portion of this module! There is one last portion to com...',
       'Ease of Use (SUS)-I think that I would like to use this application frequently.',
       'Ease of Use (SUS)-I found the application unnecessarily complex.',
       'Ease of Use (SUS)-I thought the application was easy to use.',
       'Ease of Use (SUS)-I think that I would need the support of a technical person to be able to use this application.',
       'Ease of Use (SUS)-I found the various functions in the application were well integrated.',
       'Ease of Use (SUS)-I thought there was too much inconsistency in this application.',
       'Ease of Use (SUS)-I would imagine that most people would learn to use this application very quickly.',
       'Ease of Use (SUS)-I found the application very cumbersome to use.',
       'Ease of Use (SUS)-I felt very confident using the application.',
       'Ease of Use (SUS)-I needed to learn a lot of things before I could get going with this application.',
       'Cognitive Workload (NASA-TLX)-How mentally demanding was the task?',
       'Cognitive Workload (NASA-TLX)-How much time pressure did you feel?',
       'Cognitive Workload (NASA-TLX)-How successful were you in accomplishing what you were asked to do?',
       'Cognitive Workload (NASA-TLX)-How hard did you have to work to accomplish your level of performance?',
       'Cognitive Workload (NASA-TLX)-How insecure, discouraged, irritated, stressed, and annoyed were you?',
       'User Satisfaction-I believe this medication display would help improve my ability to manage patients’  medications.',
       'User Satisfaction-I believe this medication display would aid me in providing a better treatment regimen for my patients.',
       'User Satisfaction-I believe this medication display would improve patient outcomes.',
       'Please leave any comments, suggestions, questions in the space below: ',
       'Current Residency Year:', 
       'Email address:',
       'trt']

In [8]:
# Scenario (1): drop the usability respondent who answered no clinical questions
us_keep = us.loc[us['Email address:'] != 'peter.hu@duke.edu', us_questions_of_interest]
cq_keep = cq[cq_questions_of_interest]

# Scenario (2): for the user who answered under both arms, keep only the control record
is_dup_user = cq_keep['Please enter your Duke email address: '] == 'ahk18@duke.edu'
cq_keep = cq_keep[~is_dup_user | (cq_keep['trt'] != 'intervention')]


merged_email = cq_keep.merge(us_keep,
                             left_on=['Please enter your Duke email address: ','trt'],
                             right_on=['Email address:','trt'])

merged_email.to_csv('./data/01_filtered_combined.csv',index=False)
print('The final main cohort consists of ', merged_email.shape[0], 'records.  These are mapped to the usability survey')

The final main cohort consists of  25 records.  These are mapped to the usability survey


Notice that the final cohort consists of **25 records**. This is because we also had to remove the two clinical questionnaire control records with missing email addresses.
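
As an optional safeguard, a merge like the final one can be asked to fail loudly if any (email, treatment) pair occurs more than once on either side. This is a minimal sketch using the `validate` argument of `DataFrame.merge`; the toy tables and the column names `email`, `score`, and `sus_item` are hypothetical.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical toy tables; names and values are illustrative only.
cq_toy = pd.DataFrame({
    "email": ["a@example.edu", "b@example.edu", "b@example.edu"],
    "trt": ["control", "control", "intervention"],
    "score": [10, 12, 11],
})
us_toy = pd.DataFrame({
    "email": ["a@example.edu", "b@example.edu"],
    "trt": ["control", "control"],
    "sus_item": [4, 5],
})

# Each (email, trt) pair is unique on both sides, so validation passes.
merged_toy = cq_toy.merge(us_toy, on=["email", "trt"], validate="one_to_one")

# Merging on email alone hits the repeated address and is rejected.
try:
    cq_toy.merge(us_toy, on="email", validate="one_to_one")
    email_only_ok = True
except MergeError:
    email_only_ok = False

print(len(merged_toy), email_only_ok)
```

Under this check, a duplicate key raises an error at merge time instead of silently inflating the record count.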