# Data Analysis

Questions:

- Does X treatment affect Y symptom positively/negatively/not at all? What are the most strongly-correlated symptoms and treatments?
- Are there subsets within our current diagnoses that could more accurately represent symptoms and predict effective treatments?
- Can we reliably predict what triggers a flare for a given user or all users with a certain condition?
- Could we recommend treatments more effectively based on similarity between users, rather than on specific symptoms and conditions (Netflix-style recommendations, but for treatments)?
- Can we quantify a patient’s level of disease activity based on their symptoms? How different is it from our existing measures?
- Can we predict which symptom should be treated to have the greatest effect on a given illness?
- How accurately can we guess a condition based on a user’s symptoms?
- Can we detect new interactions between treatments?

<a href="https://www.kaggle.com/flaredown/flaredown-autoimmune-symptom-tracker?select=export.csv">Source</a>

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [2]:
df = pd.read_csv(r'C:\Users\Chunna\Documents\Data_Analyst_Portfolio\Chronic_Illness\Chronic_Illness.csv', low_memory=False)
df

Unnamed: 0,user_id,age,sex,country,checkin_date,trackable_id,trackable_type,trackable_name,trackable_value
0,QEVuQwEABlEzkh7fsBBjEe26RyIVcg==,,,,2015-11-26,1069,Condition,Ulcerative colitis,0
1,QEVuQwEAWRNGnuTRqXG2996KSkTIEw==,32.0,male,US,2015-11-26,1069,Condition,Ulcerative colitis,0
2,QEVuQwEA+WkNxtp/qkHvN2YmTBBDqg==,2.0,female,CA,2017-04-28,3168,Condition,pain in left upper arm felt like i was getting...,4
3,QEVuQwEA+WkNxtp/qkHvN2YmTBBDqg==,2.0,female,CA,2017-04-28,3169,Condition,hip pain when gettin up,3
4,QEVuQwEA+WkNxtp/qkHvN2YmTBBDqg==,2.0,female,CA,2017-04-28,3170,Condition,pain in hand joints,4
...,...,...,...,...,...,...,...,...,...
7976218,QEVuQwEAtlfm8VyoxZ9biWjDHb74gQ==,22.0,female,GB,2019-12-04,1,Tag,tired,
7976219,QEVuQwEAtlfm8VyoxZ9biWjDHb74gQ==,22.0,female,GB,2019-12-04,2,Tag,stressed,
7976220,QEVuQwEAtlfm8VyoxZ9biWjDHb74gQ==,22.0,female,GB,2019-12-04,9002,Food,soup,
7976221,QEVuQwEAtlfm8VyoxZ9biWjDHb74gQ==,22.0,female,GB,2019-12-04,9139,Food,yogurt,
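
The `Unnamed: 0` column in the output above suggests the CSV stores its row index as an unnamed first column. A minimal sketch of a load that reuses that column as the index, parses `checkin_date`, and reads `trackable_value` as text; the inline CSV and the user IDs `u1`..`u3` are made-up stand-ins for the real file:

```python
import io

import pandas as pd

# Tiny stand-in for the export; the header starts with an empty field,
# which is what pandas otherwise names "Unnamed: 0".
csv_text = """,user_id,age,sex,country,checkin_date,trackable_id,trackable_type,trackable_name,trackable_value
0,u1,,,,2015-11-26,1069,Condition,Ulcerative colitis,0
1,u2,32.0,male,US,2015-11-26,1069,Condition,Ulcerative colitis,0
2,u3,22.0,female,GB,2019-12-04,1,Tag,tired,
"""

toy = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,                          # use the unnamed first column as the index
    parse_dates=["checkin_date"],         # datetime64 instead of plain strings
    dtype={"trackable_value": "string"},  # mixed content, so keep it as text for now
)

print(toy.dtypes)
```

Declaring the dtype of the mixed column up front is one alternative to `low_memory=False`; whether it suits the full file is untested here.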


In [3]:
# Check for Nulls
null_count = df.isnull().sum().sum()
null_count

1666202

In [4]:
# Count rows
row_count = df.shape[0]
row_count

7976223

In [5]:
# Null cells per 100 rows (a row can hold several nulls, so this is not the share of rows with a null)
round(null_count / row_count * 100, 2)

20.89
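
The ratio above divides the number of null cells by the number of rows. A short sketch of three related measures that are easier to interpret: the share of null cells, the share of rows with at least one null, and the share per column. The toy frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented).
toy = pd.DataFrame({
    "age": [32.0, np.nan, 22.0, np.nan],
    "sex": ["male", None, "female", None],
    "trackable_value": ["0", "3", None, None],
})

null_cells = toy.isnull().sum().sum()

# Share of all cells that are null.
pct_cells = round(null_cells / toy.size * 100, 2)

# Share of rows containing at least one null.
pct_rows = round(toy.isnull().any(axis=1).mean() * 100, 2)

# Share of nulls within each column.
pct_by_col = (toy.isnull().mean() * 100).round(2)

print(pct_cells, pct_rows)
print(pct_by_col)
```

On this toy frame, null cells divided by rows would give 150, which shows why that ratio is not bounded by 100.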

In [6]:
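# Case deletion (listwise). Note: dropna() returns a new DataFrame and the
# result is not assigned here, so df itself still contains its null rows.
df.dropna()
df.sample(5)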

Unnamed: 0,user_id,age,sex,country,checkin_date,trackable_id,trackable_type,trackable_name,trackable_value
2846526,QEVuQwEAQQ3ad6+Yz7a7qu5MgGd8/w==,40.0,female,US,2018-07-30,5676,Treatment,Metformin,250mg qhs
3185704,QEVuQwEAnZmjZgZjRqV7rHnJsIn3Hg==,21.0,doesnt_say,GB,2018-09-24,397,Condition,Fibromyalgia,3
3713564,QEVuQwEAmudAnVOr31mxzTwu7H4/zA==,21.0,female,US,2019-01-10,2055,Condition,Stomach problems,3
6709026,QEVuQwEANsxIbSg0M10KHWTMvK0NWQ==,36.0,female,US,2019-11-04,219825,Weather,icon,clear-day
69284,QEVuQwEAwSdAon7QCJUAUcdkUDK/5g==,33.0,other,GB,2015-07-01,145,Symptom,Muscle pain,3
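
`DataFrame.dropna()` returns a new frame rather than modifying the original, so the result has to be assigned to take effect. A minimal sketch on a made-up frame; the names `complete` and `has_age` are hypothetical:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "age": [32.0, np.nan, 22.0],
    "trackable_value": ["0", "3", None],
})

toy.dropna()                 # result discarded: toy is unchanged
rows_before = len(toy)

complete = toy.dropna()      # keep only rows with no nulls at all
rows_after = len(complete)

# Narrower alternative: only require the columns a given analysis uses.
has_age = toy.dropna(subset=["age"])

print(rows_before, rows_after, len(has_age))
```

In the table shown earlier, the `Tag` and `Food` rows have an empty `trackable_value`, so dropping every row with any null would presumably remove those rows as well; the `subset` form may be the safer choice.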


In [7]:
# Unique values in each column
for col in df:
    print(df[col].unique())

['QEVuQwEABlEzkh7fsBBjEe26RyIVcg==' 'QEVuQwEAWRNGnuTRqXG2996KSkTIEw=='
 'QEVuQwEA+WkNxtp/qkHvN2YmTBBDqg==' ... 'QEVuQwEAe+Me/lpz+GEjbH7bMu3UeA=='
 'QEVuQwEA/3ZnJPHdtLiOWBy1VAYIYA==' 'QEVuQwEAtlfm8VyoxZ9biWjDHb74gQ==']
[         nan  3.20000e+01  2.00000e+00  3.10000e+01  2.90000e+01
  3.30000e+01  3.70000e+01  2.40000e+01  4.20000e+01  4.50000e+01
  3.40000e+01  4.70000e+01  2.50000e+01  1.18000e+02  4.60000e+01
  5.10000e+01  3.80000e+01  4.40000e+01  6.20000e+01  2.00000e+01
  7.60000e+01  5.00000e+01  5.30000e+01  3.00000e+01  5.60000e+01
  3.50000e+01  2.20000e+01  6.30000e+01  5.80000e+01  2.30000e+01
  4.30000e+01  1.90000e+01  2.80000e+01  3.90000e+01  4.10000e+01
  5.50000e+01  6.50000e+01  2.70000e+01  3.60000e+01  4.80000e+01
  2.10000e+01  6.40000e+01  6.00000e+01  5.90000e+01  9.00000e+01
  2.60000e+01  7.00000e+01  5.40000e+01  6.90000e+01  4.00000e+01
  7.40000e+01  1.70000e+01  6.60000e+01  4.90000e+01  1.80000e+01
  7.10000e+01  3.00000e+00  7.80000e+01  6.10000e+01  5.
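
Printing every unique value gets unwieldy at this size. A sketch of a more compact summary, plus a check for implausible ages such as the 2 and 118 visible in the output above. The toy data is invented, and the 13 to 100 bounds are an assumption, not something taken from the dataset:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "age": [32.0, 2.0, 118.0, np.nan, 22.0],
    "trackable_type": ["Condition", "Condition", "Symptom", "Tag", "Food"],
})

# One row per column: number of distinct non-null values and number of nulls.
summary = pd.DataFrame({
    "n_unique": toy.nunique(),
    "n_null": toy.isnull().sum(),
})

# Frequency table for a low-cardinality column.
type_counts = toy["trackable_type"].value_counts()

# Flag non-null ages outside an assumed plausible range (13 to 100 inclusive).
implausible_age = toy["age"].notna() & ~toy["age"].between(13, 100)

print(summary)
print(type_counts)
print(toy[implausible_age])
```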

### Does X treatment affect Y symptom positively/negatively/not at all? What are the most strongly-correlated symptoms and treatments?

In [15]:
symptom = df['trackable_type']=='Symptom'
treatment = df['trackable_type']=='Treatment'
df_corr = df[symptom]
df_corr = pd.pivot_table(
    df_corr,
    values='trackable_value',
    index=['user_id', 'age', 'sex', 'country'],
    columns='trackable_type',
)

df_corr
#plt.matshow(df_corr.corr())
#plt.show()

DataError: No numeric types to aggregate
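
The error most likely arises because `trackable_value` is stored as text (it also holds values such as `250mg qhs` and `clear-day`), while `pivot_table` averages by default and needs numbers. Separately, after filtering to symptoms, `trackable_type` has a single value, so pivoting on `trackable_name` is probably closer to the intent. A sketch under those assumptions, on made-up data and with `user_id` alone as the index for brevity:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": ["a", "a", "a", "b", "b", "c", "c"],
    "trackable_type": ["Symptom", "Symptom", "Treatment",
                       "Symptom", "Symptom", "Symptom", "Symptom"],
    "trackable_name": ["Muscle pain", "Fatigue", "Metformin",
                       "Muscle pain", "Fatigue", "Muscle pain", "Fatigue"],
    "trackable_value": ["3", "4", "250mg qhs", "1", "2", "2", "3"],
})

# Keep symptom rows and convert their ratings to numbers;
# anything non-numeric becomes NaN instead of raising.
symptoms = toy[toy["trackable_type"] == "Symptom"].copy()
symptoms["trackable_value"] = pd.to_numeric(
    symptoms["trackable_value"], errors="coerce"
)

# One row per user, one column per symptom, mean rating in each cell.
df_corr = symptoms.pivot_table(
    values="trackable_value",
    index="user_id",
    columns="trackable_name",
    aggfunc="mean",
)

# Pairwise correlation between symptom columns.
corr = df_corr.corr()
print(corr)
```

This only correlates symptoms with each other. Relating treatments to symptoms would need the treatment rows encoded numerically first (for example as taken or not taken per check-in), which is a design decision left open here.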