# Physical Activity Analysis

The dataset is the **Physical Activity and Transit Survey (PAT)**, conducted by the NYC Department of Health and Mental Hygiene in 2010-2011. It has two major parts: a **telephone survey of physical activity and health** (N=3811) and a **weeklong accelerometer device study** (N=679). Note: the accelerometer participants are a sub-sample of the phone survey participants.

[Link to survey](https://www.nyc.gov/site/doh/data/data-sets/physical-activity-and-transit-survey-public-use-data.page)

In [2]:
import numpy as np
import pandas as pd

In [101]:
# Adjust settings to display all rows and columns

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [48]:
import sas7bdat_converter

# Each entry maps a SAS input file to the CSV file it is exported to
file_dicts = [
    {
        'sas7bdat_file': '/Users/alex/Downloads/accl_public.sas7bdat',
        'export_file': '/Users/alex/PythonTest/accl_public.csv',
    },
    {
        'sas7bdat_file': '/Users/alex/Downloads/minutes_file.sas7bdat',
        'export_file': '/Users/alex/PythonTest/minutes_file.csv',
    },
]
sas7bdat_converter.batch_to_csv(file_dicts)

In [124]:
# Missing values: the 10 columns with the most missing entries
df.isna().sum().sort_values(ascending=False)[:10]

stairs9        24
Tot_sed        12
Mental          8
Energy          6
bmicat4         5
cholesterol     4
Sleep           3
Phys_health     3
screentime      2
Arthritis       2
dtype: int64

Features for which about half of the responses are missing are excluded from the analysis.
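One way to apply such a rule is to compute the share of missing values per column and keep only the columns below a cutoff. The sketch below runs on a small made-up dataframe; the 0.5 threshold and the helper name `drop_sparse_columns` are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep only columns whose share of missing values is below max_missing."""
    missing_share = frame.isna().mean()  # fraction of NaN per column
    return frame.loc[:, missing_share < max_missing]


# Toy data (hypothetical values, not from the PAT survey)
toy = pd.DataFrame({
    "Sleep": [1, 2, np.nan, 4],             # 25% missing -> kept
    "stairs9": [np.nan, np.nan, np.nan, 3], # 75% missing -> dropped
    "Energy": [2, 2, 3, 1],                 # nothing missing -> kept
})
kept = drop_sparse_columns(toy)
print(list(kept.columns))
```

Using the missing share rather than the raw count keeps the cutoff meaningful if the number of rows changes.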

In [119]:
len(df)

516

In [155]:
survey_df = pd.read_csv('pat_w1w2.csv')
weekly_df = pd.read_csv('accl_public.csv')

In [157]:
# Combine the survey Q&A and the weekly accelerometer data into one dataframe


def create_df(survey_df,weekly_df):

        # Select the columns to keep in each dataframe
        survey_df = survey_df.loc[:,['PATCID','agegroup','dem3','bmicat4','status1','chronic1','chronic2','chronic3','chronic4',\
                                      'chronic5','chronic6','status2','status3','tobacco1','alcohol1',\
                                      'status4','status5','status6','gpaqsedall','gpaq18pa','gpaq8totmin',\
                                      'gpaqadd1_weekday','habits1','habits2',\
                                      'habits5','habits7','stairs9','MVPA']]

        weekly_df = weekly_df.loc[:,['patcid','Valid_Days','MVPA_bout','counts_avg','Sed_avg','Light_avg','moderate_avg','vigorous_avg']]


        # Set index to unique participant identifier
        survey_df = survey_df.astype({'PATCID': 'int32'})
        survey_df.set_index('PATCID',inplace=True)

        weekly_df = weekly_df.astype({'patcid': 'int32'})
        weekly_df.set_index('patcid',inplace=True)

        # Merge dataframes to include only those who participated in both the phone survey and accelerometer study
        l2 = weekly_df.index.unique().tolist()
        l1 = survey_df.index.unique().tolist()
        common_list = list(set(l2).intersection(l1))
        survey_df_ref = survey_df[survey_df.index.isin(common_list)]

        df = pd.concat([survey_df_ref,weekly_df],axis = 1)

        # For the merged dataframe, df, keep only the people who wore the accelerometer on at least 6 days of the week
        # (.copy() avoids modifying a slice when columns are renamed in place below)
        df = df.loc[df.Valid_Days.isin([6, 7])].copy()

        # Rename columns to readable names (keys that are not present in df are ignored by rename)
        df.rename(columns={"agegroup": "Age", "dem3": "Gender", "status1": "Health", "status2": "Phys_health", "status3": "Mental",
                           "status4": "Sleep", "status5": "Energy", "status6": "Impairment", "chronic1": "Hypertension",
                           "chronic2": "cholesterol", "chronic3": "Diabetes", "chronic4": "Asthma", "chronic5": "Arthritis",
                           "chronic6": "Depression", "gpaqsedall": "Tot_sed", "habits1": "Exer_routine",
                           "habits3_walking": "Walking", "habits3_wtlift": "Weightlift",
                           "habits3_run": "Running", "habits3_tread": "Treadmill", "habits3_aerob": "Aerobics", "habits3_bike": "Biking",
                           "habits5": "Phys_Activity", "habits7": "Diet", "transport7mpd": "commute", "gpaqadd1_weekday": "screentime",
                           "gpaq18pa": "phys7", "gpaq8totmin": "chores", "habits2": "gym_member"},
                  inplace=True)

        return df


In [158]:
df = create_df(survey_df,weekly_df)
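For comparison, the merge-and-filter step can also be written as an inner join on the participant ID. This is a minimal sketch on made-up rows, assuming both tables are already indexed by participant ID; the function name `merge_valid_wearers` and the `min_days` parameter are hypothetical and not part of the notebook.

```python
import pandas as pd


def merge_valid_wearers(survey: pd.DataFrame, accel: pd.DataFrame, min_days: int = 6) -> pd.DataFrame:
    """Inner-join survey and accelerometer rows on their index, then keep
    participants with at least min_days valid wear days."""
    merged = survey.join(accel, how="inner")
    return merged.loc[merged["Valid_Days"] >= min_days]


# Toy data (hypothetical IDs and values)
survey = pd.DataFrame(
    {"Age": [1, 2, 3]},
    index=pd.Index([101, 102, 103], name="PATCID"),
)
accel = pd.DataFrame(
    {"Valid_Days": [7, 5, 6], "counts_avg": [310.0, 250.0, 280.0]},
    index=pd.Index([101, 102, 104], name="PATCID"),
)
merged = merge_valid_wearers(survey, accel)
print(merged)
```

An inner join drops participants that appear in only one table, so no separate intersection step is needed.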

In [161]:
# Display correlations between all the features in the dataframe
df.corr().style.background_gradient(cmap='coolwarm')

,Age,Gender,bmicat4,Health,Hypertension,cholesterol,Diabetes,Asthma,Arthritis,Depression,Phys_health,Mental,tobacco1,alcohol1,Sleep,Energy,Impairment,Tot_sed,phys7,chores,screentime,Exer_routine,gym_member,Phys_Activity,Diet,stairs9,MVPA,Valid_Days,MVPA_bout,counts_avg,Sed_avg,Light_avg,moderate_avg,vigorous_avg
Age,1.0,-0.036486,0.072103,0.179486,-0.368748,-0.105065,-0.233157,-0.040335,-0.328115,-0.030366,0.156027,-0.027668,0.011767,0.004079,-0.058951,-0.023441,-0.188309,0.026901,0.144688,-0.100141,0.086161,-0.048889,0.074652,0.025079,-0.14699,-0.098466,-0.180206,0.039073,-0.174001,-0.341539,0.158914,-0.131201,-0.290717,-0.159923
Gender,-0.036486,1.0,-0.041325,0.071493,-0.028044,-0.069175,0.116443,-0.029537,-0.117345,-0.056703,0.079052,0.036751,-0.042894,0.190856,0.104578,-0.096384,-0.02896,-0.000974,0.077453,0.072293,-0.003573,0.041913,-0.021167,0.093937,-0.015696,-0.119616,-0.04919,0.100655,-0.059256,-0.171218,-0.001595,-0.008997,-0.177355,-0.03844
bmicat4,0.072103,-0.041325,1.0,0.143546,-0.16865,-0.049672,-0.134925,-0.059198,-0.19693,0.017192,-0.019763,0.028535,-0.068186,-0.010573,-0.041808,-0.043728,-0.151192,0.043835,0.081689,-0.02851,0.187653,0.141804,0.052174,0.278642,0.283979,-0.022164,-0.064386,-0.010716,-0.164885,-0.155004,-0.0544,0.004225,-0.144321,-0.126481
Health,0.179486,0.071493,0.143546,1.0,-0.279678,-0.233502,-0.233293,-0.213739,-0.275883,-0.276317,0.472587,0.266559,-0.155363,0.071015,0.134026,-0.475308,-0.372946,0.028335,0.22002,-0.034848,0.141871,0.180134,0.182566,0.32061,0.369288,-0.104654,-0.086367,-0.018422,-0.14837,-0.263767,0.009452,-0.137911,-0.199406,-0.137207
Hypertension,-0.368748,-0.028044,-0.16865,-0.279678,1.0,0.170661,0.211152,0.086682,0.250427,0.048125,-0.189477,-0.037634,-0.055927,0.006103,-0.040967,0.180401,0.233081,0.036636,-0.130726,0.018013,-0.127001,-0.070492,-0.071961,-0.128615,-0.046046,0.027646,0.081469,0.067149,0.14024,0.231129,-0.019023,0.096411,0.218321,0.102123
cholesterol,-0.105065,-0.069175,-0.049672,-0.233502,0.170661,1.0,0.063915,0.07821,0.142174,0.176204,-0.140747,-0.140706,0.053129,-0.019583,-0.079743,0.139299,0.092098,-0.045347,-0.117725,-0.003715,-0.063434,-0.071574,0.001707,-0.12604,-0.185295,-0.021606,-0.002863,0.002624,0.066512,0.117507,-0.008001,0.064994,0.101307,0.054944
Diabetes,-0.233157,0.116443,-0.134925,-0.233293,0.211152,0.063915,1.0,0.096376,0.102054,0.065267,-0.03971,-0.030574,0.016233,-0.017162,0.010524,0.057373,0.089856,-0.018948,-0.057321,0.03761,-0.112027,-0.037924,-0.017221,-0.030263,-0.051052,0.076456,0.084114,0.102776,0.092505,0.165729,-0.025955,0.115614,0.128436,0.055416
Asthma,-0.040335,-0.029537,-0.059198,-0.213739,0.086682,0.07821,0.096376,1.0,0.164233,0.25438,-0.134969,-0.208795,0.118182,-0.020299,-0.151027,0.198728,0.163736,-0.022312,-0.094468,-0.054824,-0.033427,-0.076613,-0.02654,-0.114357,-0.138388,0.075226,-0.013521,0.015353,0.105042,0.087719,0.0276,0.005244,0.099443,0.040106
Arthritis,-0.328115,-0.117345,-0.19693,-0.275883,0.250427,0.142174,0.102054,0.164233,1.0,0.153966,-0.295565,-0.113106,-0.000923,-0.090026,-0.059295,0.209482,0.297066,-0.009709,-0.100781,0.057606,-0.150694,-0.062366,-0.050259,-0.085816,-0.08399,0.094958,0.060146,0.098642,0.175843,0.229531,-0.004231,0.079994,0.227776,0.109029
Depression,-0.030366,-0.056703,0.017192,-0.276317,0.048125,0.176204,0.065267,0.25438,0.153966,1.0,-0.252194,-0.414168,0.232077,0.025829,-0.116672,0.218316,0.213621,-0.136522,-0.027071,0.038051,-0.146842,-0.123135,-0.014716,-0.1557,-0.15723,0.050156,0.010704,0.046049,0.004588,0.054644,0.062409,0.097843,0.030211,0.033899


 <font size="3"> The correlation matrix suggests several interesting patterns: </font> 
 
*Note: Because the response scales and their direction differ from question to question, the sign of a coefficient does not by itself indicate the direction of the underlying relationship.*
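The effect of coding direction on the sign can be seen in a small made-up example, where the same yes/no condition is coded two ways and correlated with an activity measure. The values and both coding schemes are illustrative assumptions, not the actual PAT codebook.

```python
import pandas as pd

# Toy activity measure (hypothetical values)
activity = pd.Series([100, 200, 300, 400, 500])

# The same underlying condition, coded two ways:
#   coding A: 1 = has condition, 2 = does not
#   coding B: 1 = has condition, 0 = does not
coded_a = pd.Series([1, 1, 2, 2, 2])
coded_b = coded_a.map({1: 1, 2: 0})

r_a = activity.corr(coded_a)  # Pearson correlation under coding A
r_b = activity.corr(coded_b)  # Pearson correlation under coding B
print(round(r_a, 2), round(r_b, 2))
```

Both coefficients describe the same pattern (people with the condition are less active here), yet they have opposite signs because only the coding direction changed.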

- Participants with higher average daily accelerometer activity counts tended to have less hypertension (corr. 0.23) and arthritis (corr. 0.23).

- Participants with more screen time tended to register less activity on the accelerometer. They also reported being more sedentary (corr. 0.23), having a worse diet (corr. 0.2), and having a higher BMI (corr. 0.19).

- Participants with a higher body mass index were more likely to have arthritis (corr. -0.2).

- Participants who reported feeling the most healthy also reported being physically active (corr. 0.32) and having a good diet (corr. 0.37). Unsurprisingly, these individuals were also less likely to report chronic physical conditions and depression (corr. between -0.21 and -0.28).

- Participants with poor mental health had worse sleep (corr. 0.39), had markedly less energy (corr. -0.36), were less physically active (corr. 0.19), and had worse diets (corr. 0.17).

- Tobacco users had worse diets (corr. -0.21) and were more depressed (corr. 0.23).
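Observations like the ones above can also be pulled out of the matrix programmatically instead of being read off the heatmap. The following is a rough sketch on made-up data; the helper name `top_correlations` is hypothetical, and ranking by absolute value is an assumption about what counts as "interesting".

```python
import numpy as np
import pandas as pd


def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n feature pairs with the largest absolute Pearson correlation."""
    corr = frame.corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is excluded
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by magnitude but keep the signed values
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)


# Toy data (hypothetical values)
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # perfectly correlated with a
    "c": [5, 3, 4, 1, 2],
})
top = top_correlations(toy, n=2)
print(top)
```

With the real data, the call would be `top_correlations(df, n=20)` or similar; the signs still need to be read against each question's coding direction.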
