This analysis is based on a publicly accessible dataset on the CDC website: the 2023 National Health Interview Survey (NHIS).
This project (which is primarily for personal study) uses odds ratios (OR) from a logistic regression to explain how a set of variables relates to the general health status of children (up to 17 years) living with or without mental conditions. It also uses incidence rate ratios (IRR) from a Poisson regression to compare, across the same groups used in the odds ratio analysis, the number of days of school missed due to illness or injury in the past 12 months.

The work proceeds in three stages: selecting a subset of the data, cleaning and pre-processing it, and running the final analysis.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
child_summary = pd.read_csv('child23.csv')
child_summary.head(3)

Unnamed: 0,URBRRL,RATCAT_C,INCTCFLG_C,IMPINCFLG_C,PPSU,PSTRAT,HISPALLP_C,RACEALLP_C,SCHDYMSSTC_C,AFNOW,...,PHSTAT_C,HHSTAT_C,INTV_MON,RECTYPE,IMPNUM_C,RELCHPARENTP1_C,RELCHPARENTP2_C,WTFA_C,HHX,POVRATTC_C
0,4,14,0,0,2,122,3,2,0.0,2.0,...,1,1,1,20,1,1,1,13012.875,H045277,6.73
1,4,10,0,0,2,122,2,1,0.0,1.0,...,1,1,1,20,1,1,1,16680.509,H021192,3.43
2,4,5,0,0,2,122,2,1,5.0,2.0,...,1,1,1,20,1,1,4,5404.923,H025576,1.27


##### Sample Child respondent is the household respondent: The table was filtered to keep only responses where the Sample Child respondent is also the household respondent

In [3]:
#HHRESPSC_FLG-Sample Child respondent is the household respondent
#The table will be filtered based on respondents being the household respondent of surveys

child_summary['HHRESPSC_FLG'].value_counts()

1.0    6634
Name: HHRESPSC_FLG, dtype: int64

In [4]:
child_summary_updated_1 = child_summary[child_summary['HHRESPSC_FLG'].notnull()].copy()
child_summary_updated_1.shape

#Table has been filtered to rows where the respondent is the household respondent

(6634, 370)

##### Sex: Only males and females were included in this analysis

In [5]:
child_summary_updated_1['SEX_C'].value_counts()

#select only male and female

1    3426
2    3203
7       5
Name: SEX_C, dtype: int64

In [6]:
#selecting only male and female

child_summary_updated_2 = child_summary_updated_1[child_summary_updated_1['SEX_C'] != 7].copy()
child_summary_updated_2['SEX_C'].value_counts()

#only males and females are included in this study

1    3426
2    3203
Name: SEX_C, dtype: int64

In [7]:
#Using sex labels instead of numeric codes
child_summary_updated_2['SEX_C'] = child_summary_updated_2['SEX_C'].apply(lambda x: 'Male' if x==1 else 'Female')
child_summary_updated_2['SEX_C'].value_counts()

Male      3426
Female    3203
Name: SEX_C, dtype: int64

In [8]:
child_summary_updated_2['SEX_C'] = child_summary_updated_2['SEX_C'].astype('category')

##### Adults in Family: These were binned as seen below

In [9]:
#using descriptive category labels

child_summary_updated_2['PCNTADLT_C'] = child_summary_updated_2['PCNTADLT_C'].apply(
                                                                            lambda x: '0 adults' if x==0 else 
                                                                                      '1 adult' if x==1 else
                                                                                      '2 adults' if x==2 else
                                                                                      '3+ adults')
child_summary_updated_2['PCNTADLT_C'].value_counts()

2 adults     4205
3+ adults    1363
1 adult      1060
0 adults        1
Name: PCNTADLT_C, dtype: int64

In [10]:
child_summary_updated_2['PCNTADLT_C'] = child_summary_updated_2['PCNTADLT_C'].astype('category')

##### Kids in Family: These were binned as seen below

In [11]:
#using descriptive category labels

child_summary_updated_2['PCNTKIDS_C'] = child_summary_updated_2['PCNTKIDS_C'].apply(
                                                                            lambda x: '1 child' if x==1 else
                                                                                      '2 children' if x==2 else
                                                                                      '3+ children')
child_summary_updated_2['PCNTKIDS_C'].value_counts()

1 child        2799
2 children     2438
3+ children    1392
Name: PCNTKIDS_C, dtype: int64

In [12]:
child_summary_updated_2['PCNTKIDS_C'] = child_summary_updated_2['PCNTKIDS_C'].astype('category')

##### How often seems anxious, nervous, or worried

In [13]:
child_summary_updated_2['ANXFREQ_C'] = child_summary_updated_2['ANXFREQ_C'].apply(lambda x: "Not Anxious" if x==5 
                                                                   else 'Refused' if x==7
                                                                  else 'Anxious' if x==1 or x==2 or x==3 or x==4
                                                                  else "Don't Know" if x==9
                                                                  else np.nan)
child_summary_updated_2['ANXFREQ_C'].value_counts()

Anxious        2476
Not Anxious    2423
Don't Know       11
Refused           1
Name: ANXFREQ_C, dtype: int64

In [14]:
child_summary_updated_2['ANXFREQ_C'] = child_summary_updated_2['ANXFREQ_C'].astype('category')

##### How often seems sad or depressed

In [15]:
child_summary_updated_2['DEPFREQ_C'].value_counts()

5.0    3316
4.0     996
3.0     312
2.0     203
1.0      70
9.0      12
7.0       2
Name: DEPFREQ_C, dtype: int64

In [16]:
child_summary_updated_2['DEPFREQ_C'] = child_summary_updated_2['DEPFREQ_C'].apply(lambda x: "Not depressed" if x==5 
                                                                   else 'Refused' if x==7
                                                                  else 'Depressed' if x==1 or x==2 or x==3 or x==4
                                                                  else "Don't Know" if x==9
                                                                  else np.nan)
child_summary_updated_2['DEPFREQ_C'].value_counts()

Not depressed    3316
Depressed        1581
Don't Know         12
Refused             2
Name: DEPFREQ_C, dtype: int64

##### General health status: Binning health status into "Excellent, Very good, Good" and "Fair, Poor"

In [17]:
child_summary_updated_2['PHSTAT_C'].value_counts()

1    4344
2    1389
3     745
4     128
5      20
7       2
9       1
Name: PHSTAT_C, dtype: int64

In [18]:
child_summary_updated_2['PHSTAT_C'] = child_summary_updated_2['PHSTAT_C'].apply(lambda x: "Excellent, Very good, Good" 
                                                                                 if x==1 or x==2 or x==3
                                                                                 else "Fair, Poor" if x==4 or x==5
                                                                                 else 'Refused' if x==7
                                                                                 else "Don't know" if x==9
                                                                                 else np.nan)

In [19]:
child_summary_updated_2['PHSTAT_C'].value_counts()

Excellent, Very good, Good    6478
Fair, Poor                     148
Refused                          2
Don't know                       1
Name: PHSTAT_C, dtype: int64

#### The function below recodes the values of the columns listed in columns_to_use

In [20]:
columns_to_use = ['HICOV_C', 'SCHSPEDEV_C', 'SCHSPED_C', 'SCHSPEDEM_C', 
                    'COMSUPPORT_C', 'JAILEV1_C', 'VIOLENEV_C',  'MENTDEPEV_C',
                    'ALCDRUGEV_C', 'MHRX_C' , 'MHTHRPY_C', 'BNEEDS_C', 'PARWKFT1_C',
                    'NATUSBORN_C', 'CITZNSTP_C', 'FSNAP12M_C', 'FWIC12M_C', 'ADHDEV_C', 'ADHDNW_C',
                     'ASDEV_C', 'ASDNW_C', 'DDEV_C', 'DDNW_C', 'DISAB5_C']


In [21]:
def clean_1(dataframe, columns):
    for col in columns:
        dataframe[col] = dataframe[col].apply(lambda x: 'Yes' 
                                               if x==1 else 'No' 
                                               if x==2 else "Don't Know" 
                                               if x==9 else 'Not ascertained' 
                                               if x==8 else 'Refused' 
                                               if x==7 else np.nan)

In [22]:
clean_1(child_summary_updated_2, columns_to_use)
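
The chained conditional in `clean_1` can also be written as a dictionary lookup. The sketch below is an alternative, not the notebook's own code. It assumes the same code meanings used in `clean_1` (1 = Yes, 2 = No, 7 = Refused, 8 = Not ascertained, 9 = Don't Know), and the names `YES_NO_CODES` and `recode_yes_no` are hypothetical. Unlike `clean_1`, it returns a new DataFrame instead of modifying its argument.

```python
import pandas as pd

# Assumed code meanings, copied from clean_1 above.
YES_NO_CODES = {1: 'Yes', 2: 'No', 7: 'Refused', 8: 'Not ascertained', 9: "Don't Know"}

def recode_yes_no(dataframe, columns, codes=YES_NO_CODES):
    """Return a copy of `dataframe` with `columns` recoded via `codes`.

    Values that are not keys of `codes` become NaN, matching the
    final `else np.nan` branch of clean_1.
    """
    out = dataframe.copy()
    for col in columns:
        out[col] = out[col].map(codes)
    return out

# Tiny made-up frame, only to show the behaviour (5 is not a known code).
demo = pd.DataFrame({'HICOV_C': [1, 2, 9, 5], 'AGEP_C': [3, 7, 12, 15]})
recoded = recode_yes_no(demo, ['HICOV_C'])
print(recoded['HICOV_C'].tolist())
```

Keeping the codes in one dictionary makes it easier to check them against the survey codebook in a single place.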

###### Coverage status as used in Health United States

In [23]:
child_summary_updated_2['NOTCOV_C'] = child_summary_updated_2['NOTCOV_C'].apply(lambda x: 'Not covered' 
                                                                               if x==1 else 'Covered' 
                                                                               if x==2 else "Don't Know" 
                                                                               if x==9 else np.nan)
child_summary_updated_2['NOTCOV_C'].value_counts()

Covered        6376
Not covered     235
Don't Know       18
Name: NOTCOV_C, dtype: int64

##### REGION

In [24]:
child_summary_updated_2['REGION'] = child_summary_updated_2['REGION'].apply(lambda x: 'Northeast' 
                                                                               if x==1 else 'Midwest' 
                                                                               if x==2 else "South" 
                                                                               if x==3 else 'West')
child_summary_updated_2['REGION'].value_counts()

South        2484
West         1792
Midwest      1392
Northeast     961
Name: REGION, dtype: int64

##### Highest level of education of all SC's parents: Binning the education levels into High school or less, Some college degree, and Bachelor's degree & above

In [25]:
child_summary_updated_2['MAXPAREDUP_C'] = child_summary_updated_2['MAXPAREDUP_C'].apply(
                                lambda x: 'High school or less' if x==1 or x==2 or x==3 or x==4 
                                else "Some college degree" if x==5 or x==6 or x==7 
                                else "Bachelor's degree & above" if x==8 or x==9 or x==10 else np.nan)

child_summary_updated_2['MAXPAREDUP_C'].value_counts()

Bachelor's degree & above    3236
Some college degree          1662
High school or less          1561
Name: MAXPAREDUP_C, dtype: int64

##### Hispanic origin detail

In [26]:
child_summary_updated_2['HISDETP_C'] = child_summary_updated_2['HISDETP_C'].apply(lambda x: 'Hispanic (Mexican/Mexican American)' 
                                               if x==1 else 'Hispanic (all other groups)' 
                                               if x==2 else "Don't Know" 
                                               if x==9 else 'Not ascertained' 
                                               if x==8 else 'Not Hispanic' 
                                               if x==3 else np.nan)

child_summary_updated_2['HISDETP_C'].value_counts()

Not Hispanic                           4849
Hispanic (Mexican/Mexican American)    1072
Hispanic (all other groups)             683
Not ascertained                          20
Don't Know                                5
Name: HISDETP_C, dtype: int64

In [27]:
#Selecting the columns to use

all_columns = ['REGION', 'SEX_C', 'AGEP_C', 'PCNTADLT_C', 
               'PCNTKIDS_C', 'ADHDEV_C', 'ADHDNW_C', 
              'ASDEV_C', 'ASDNW_C','DDEV_C', 'DDNW_C', 'DISAB5_C',
              'SCHDYMSSTC_C', 'SCHSPEDEV_C', 'SCHSPED_C', 
              'SCHSPEDEM_C', 'NOTCOV_C', 'HICOV_C',
              'COMSUPPORT_C', 'JAILEV1_C', 'VIOLENEV_C',
              'MENTDEPEV_C', 'ALCDRUGEV_C', 'LASTDR_C', 
              'MHRX_C', 'MHTHRPY_C', 'BNEEDS_C', 'PARWKFT1_C',
              'MAXPAREDUP_C', 'POVRATTC_C', 'NATUSBORN_C',
              'CITZNSTP_C', 'FSNAP12M_C', 'FWIC12M_C', 
              'HISDETP_C', 'RATCAT_C', 'PHSTAT_C', 'ANXFREQ_C',
              'DEPFREQ_C']

In [28]:
all_data = child_summary_updated_2[all_columns].copy()
all_data.head(2)

Unnamed: 0,REGION,SEX_C,AGEP_C,PCNTADLT_C,PCNTKIDS_C,ADHDEV_C,ADHDNW_C,ASDEV_C,ASDNW_C,DDEV_C,...,POVRATTC_C,NATUSBORN_C,CITZNSTP_C,FSNAP12M_C,FWIC12M_C,HISDETP_C,RATCAT_C,PHSTAT_C,ANXFREQ_C,DEPFREQ_C
0,South,Female,14,2 adults,2 children,No,,No,,No,...,6.73,Yes,Yes,No,No,Not Hispanic,14,"Excellent, Very good, Good",Not Anxious,Not depressed
2,South,Male,15,2 adults,1 child,No,,No,,No,...,1.27,Yes,Yes,No,No,Not Hispanic,5,"Excellent, Very good, Good",Not Anxious,Not depressed


In [29]:
#Renaming the variables

all_data_updated_1 = all_data.rename({'REGION':'Region', 
                                        'SEX_C':'Sex', 
                                        'AGEP_C':'Age',
                                        'PCNTADLT_C':'Num_of_adults',
                                        'PCNTKIDS_C':'Num_of_children',
                                        'ADHDEV_C':'Ever_had_ADD_ADHD',
                                        'ADHDNW_C':'Curr_has_ADD_ADHD',
                                        'ASDEV_C':'Ever_had_autism',
                                        'ASDNW_C':'Curr_has_autism',
                                        'DDEV_C':'Ever_had_devel_delay',
                                        'DDNW_C':'Curr_has_devel_delay',
                                        'DISAB5_C':'Disability',
                                        'SCHDYMSSTC_C':'Days_of_school_missed', 
                                        'SCHSPEDEV_C':'Special_Edu',
                                        'SCHSPED_C':'Cur_Special_Edu',
                                        'SCHSPEDEM_C':'Services_for_mental health',
                                        'NOTCOV_C':'Coverage_status',
                                        'HICOV_C':'Health_insurance',
                                        'COMSUPPORT_C':'Community_support',
                                        'JAILEV1_C':'SFP_incarcerated',
                                        'VIOLENEV_C':'Victim_or_witnessed_violence',
                                        'MENTDEPEV_C':'LWAA_D',
                                        'ALCDRUGEV_C':'LWAMD',
                                        'LASTDR_C':'TSLSD', 
                                        'MHRX_C':'TMFMH',
                                        'MHTHRPY_C':'Received_couns/therapy',
                                        'BNEEDS_C':'Lacking_basic_needs',
                                        'PARWKFT1_C':'Parent_works',
                                        'MAXPAREDUP_C':'Parent_edu',
                                        'POVRATTC_C':'Fam_Poverty_ratio',
                                        'NATUSBORN_C':'Born_in_US',
                                        'CITZNSTP_C':'Citizenship',
                                        'FSNAP12M_C':'Food_stamps',
                                        'FWIC12M_C':'WIC_benefits',
                                        'HISDETP_C':'Origin',
                                        'RATCAT_C':'Ratio_of_family_income_to_poverty', 
                                        'PHSTAT_C':'General_health_status', 
                                        'ANXFREQ_C':'Anxious_Nervous_Worried', 
                                        'DEPFREQ_C':'Sad_Depressed' 
                                        }, axis=1).copy()
all_data_updated_1.head(2)

Unnamed: 0,Region,Sex,Age,Num_of_adults,Num_of_children,Ever_had_ADD_ADHD,Curr_has_ADD_ADHD,Ever_had_autism,Curr_has_autism,Ever_had_devel_delay,...,Fam_Poverty_ratio,Born_in_US,Citizenship,Food_stamps,WIC_benefits,Origin,Ratio_of_family_income_to_poverty,General_health_status,Anxious_Nervous_Worried,Sad_Depressed
0,South,Female,14,2 adults,2 children,No,,No,,No,...,6.73,Yes,Yes,No,No,Not Hispanic,14,"Excellent, Very good, Good",Not Anxious,Not depressed
2,South,Male,15,2 adults,1 child,No,,No,,No,...,1.27,Yes,Yes,No,No,Not Hispanic,5,"Excellent, Very good, Good",Not Anxious,Not depressed


##### More pre-processing

In [30]:
#selecting variables I want to use

general_health_features = ['Region', 
                           'Sex',
                           'Age',
                           'Num_of_children',
                           'Days_of_school_missed', 
                           'Special_Edu',
                           'Coverage_status',
                           'Health_insurance', 
                           'TSLSD',
                           'Lacking_basic_needs',
                           'Parent_works',
                           'Parent_edu', 
                           'Fam_Poverty_ratio',
                           'Born_in_US', 
                           'Citizenship',
                           'Origin', 
                           'Food_stamps',
                           'WIC_benefits',
                           'Ever_had_ADD_ADHD',
                           'Ever_had_autism',
                           'Ever_had_devel_delay',
                           'SFP_incarcerated',
                           'LWAA_D',
                           'LWAMD',
                           'Victim_or_witnessed_violence',
                          'General_health_status',
                          'Ratio_of_family_income_to_poverty']

general_health_data = all_data_updated_1[general_health_features].copy() #only these variables will be used.

##### Selecting only 'yes' or 'no' values in the variables contained in columns_yes_no using a function I created

In [31]:
#Columns with NO, YES, AND OTHER VALUES. THESE COLUMNS MUST CONTAIN ONLY YES OR NO VALUES.

columns_yes_no = ['Ever_had_ADD_ADHD','Ever_had_autism',
                 'Ever_had_devel_delay','SFP_incarcerated', 
                 'LWAA_D', 'LWAMD', 'Victim_or_witnessed_violence', 
                 'Special_Edu', 'Health_insurance',
                 'Lacking_basic_needs', 'Parent_works', 'Born_in_US', 'Citizenship', 
                 'Food_stamps', 'WIC_benefits' ]

In [32]:
#This function selects only responses that are YES or No in the features contained in the list called columns_yes_no
#Yes or No are then classified as categorical values.

def clean(df,columns):
    for col in columns:
        keep = df[col]
        keep_1 = df.loc[(keep == 'No') | (keep == 'Yes'), col].astype('category')
        df[col] = keep_1

In [33]:
#applying the function

clean(general_health_data, columns_yes_no)
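
The effect of `clean` on a single column can be reproduced with `Series.where` and `Series.isin`, which keep matching values and set everything else to NaN. This is a minimal sketch on made-up data, and `keep_yes_no` is a hypothetical helper name rather than part of the notebook:

```python
import pandas as pd

def keep_yes_no(series):
    """Keep 'Yes'/'No', turn every other value into NaN, and return a categorical."""
    return series.where(series.isin(['Yes', 'No'])).astype('category')

# Made-up responses, including two non-substantive answers.
demo = pd.Series(['Yes', 'No', "Don't Know", 'Refused', 'Yes'])
cleaned = keep_yes_no(demo)
print(cleaned.value_counts())
print(cleaned.isna().sum())
```

The values set to NaN here are the ones that a later `dropna` call removes from the modelling data.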

###### Coverage status: selecting only 'Covered' or 'Not covered' values in this column and setting the column as a categorical variable

In [34]:
coverage = general_health_data['Coverage_status']

In [35]:
coverage_1 = general_health_data.loc[(coverage=='Covered') | (coverage=='Not covered'),
                                     'Coverage_status'].astype('category')

In [36]:
general_health_data['Coverage_status'] = coverage_1

###### Highest lev. of edu. of parents: Setting this variable as a categorical variable

In [37]:
general_health_data['Parent_edu'] = general_health_data['Parent_edu'].astype('category')

###### Hispanic origin detail: Selecting origin as 'Not Hispanic' or 'Hispanic (Mexican/Mexican American)' or 'Hispanic (all other groups)'

In [38]:
origin = general_health_data['Origin']

In [39]:
origin_1 = general_health_data.loc[(origin=='Not Hispanic')|
                                   (origin=='Hispanic (Mexican/Mexican American)')|
                                   (origin=='Hispanic (all other groups)'),
                                       'Origin'].astype('category')

In [40]:
general_health_data['Origin'] = origin_1

# DEPENDENT VARIABLE
##### General health status: This is used as a dependent variable for the logistic regression model. 0 is chosen for "Excellent, Very good, Good" status while 1 is for "Fair, Poor"

In [41]:
gen_health = general_health_data['General_health_status']

In [42]:
gen_health_1 = general_health_data.loc[(gen_health=='Excellent, Very good, Good')|
                                       (gen_health=='Fair, Poor'),
                                       'General_health_status']

In [43]:
general_health_data['General_health_status'] = gen_health_1.apply(lambda x: 0 
                                                                  if x=='Excellent, Very good, Good' 
                                                                  else 1).astype('int')
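
The 0/1 coding can also be expressed as a dictionary lookup. The sketch below, on a made-up series, is an alternative to the filter-then-`apply` steps above and not the notebook's own code. Categories missing from the dictionary (here 'Refused' and "Don't know") come out as NaN, which is the same end state the notebook reaches through index alignment when the filtered series is assigned back to the column.

```python
import pandas as pd

# Made-up series containing every label produced by the earlier binning step.
status = pd.Series(['Excellent, Very good, Good', 'Fair, Poor', 'Refused', "Don't know"])

# 0 = "Excellent, Very good, Good", 1 = "Fair, Poor"; other labels are unmapped.
encoded = status.map({'Excellent, Very good, Good': 0, 'Fair, Poor': 1})
print(encoded.tolist())
```

Because of the NaN entries the column is stored as floats until the rows with missing values are dropped.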

##### Handling duplicates and missing values

In [44]:
#dropping duplicated observations

general_health_data.drop_duplicates(inplace=True)

In [45]:
#dropping any row with missing values (dropna works on rows by default)

general_health_data.dropna(inplace=True)

# MODEL 1

# ODDS RATIO FOR GENERAL HEALTH STATUS MODEL

In [46]:
import statsmodels.formula.api as smf 

In [47]:
#formula for Logistic model

formula=('General_health_status ~ Region + Sex + Age +  '
                            'Days_of_school_missed +'
                            'Health_insurance + TSLSD + '
                            'Lacking_basic_needs + '
                            'Fam_Poverty_ratio + '
                            'Food_stamps + Ever_had_ADD_ADHD + Ever_had_autism + Ever_had_devel_delay +'
                            'WIC_benefits')

In [48]:
#Fitting the model

reg = smf.logit(formula=formula ,data=general_health_data).fit()

Optimization terminated successfully.
         Current function value: 0.079850
         Iterations 10


In [49]:
#Results of the model

print(reg.summary())

                             Logit Regression Results                            
Dep. Variable:     General_health_status   No. Observations:                 3548
Model:                             Logit   Df Residuals:                     3532
Method:                              MLE   Df Model:                           15
Date:                   Sun, 18 Aug 2024   Pseudo R-squ.:                  0.1859
Time:                           15:07:49   Log-Likelihood:                -283.31
converged:                          True   LL-Null:                       -348.00
Covariance Type:               nonrobust   LLR p-value:                 2.807e-20
                                  coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------------------
Intercept                      -5.8207      1.453     -4.005      0.000      -8.669      -2.972
Region[T.Northeast]             0.0334      0.523      0

In [50]:
#Saving the coefficients of the variables
params = reg.params

In [51]:
#Bringing together the coefficients with their confidence intervals

results_log = reg.conf_int()
results_log['OR'] = params
results_log.columns = ['Lower CI', 'Upper CI', 'Odds Ratio']

In [52]:
#Adding the pvalues of the variables

results_log_1 = np.exp(results_log) #calculates the OR for the variables
results_log_1['p_values'] = round(reg.pvalues,4)
results_log_1 = results_log_1[['Odds Ratio', 'Lower CI', 'Upper CI', 'p_values']].copy()
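
The exponentiation step can be checked by hand against the regression summary. The sketch below uses only the intercept row printed in the summary above (coefficient -5.8207, 95% CI -8.669 to -2.972). Because those inputs are rounded to a few decimals, the results agree only approximately with the odds-ratio table printed further down. The helper name `to_ratio` is hypothetical.

```python
import math

def to_ratio(coef, lower, upper):
    """Exponentiate a log-odds coefficient and its confidence limits."""
    return math.exp(coef), math.exp(lower), math.exp(upper)

# Intercept row of the logit summary (rounded values as printed).
odds_ratio, ci_low, ci_high = to_ratio(-5.8207, -8.669, -2.972)
print(f"OR = {odds_ratio:.6f}, 95% CI = ({ci_low:.6f}, {ci_high:.6f})")
```

Exponentiating the confidence limits, rather than computing an interval on the ratio scale directly, is why the intervals are not symmetric around the odds ratio.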

In [53]:
print("RESULTS")
print("Statistically significant features: ")
print()
print(f"{results_log_1[results_log_1['p_values']<=0.05]}")

RESULTS
Statistically significant features: 

                             Odds Ratio  Lower CI   Upper CI  p_values
Intercept                      0.002965  0.000172   0.051176    0.0001
Food_stamps[T.Yes]             2.210545  1.208131   4.044685    0.0101
Ever_had_autism[T.Yes]         6.493672  3.122297  13.505370    0.0000
Ever_had_devel_delay[T.Yes]    4.959847  2.559875   9.609880    0.0000
WIC_benefits[T.Yes]            3.407517  1.507031   7.704670    0.0032
Age                            1.132742  1.055137   1.216055    0.0006
Days_of_school_missed          1.023833  1.012565   1.035227    0.0000


In [54]:
round(reg.llr_pvalue,4)

0.0

# RESULT

With a significance level of 0.05, the likelihood-ratio p-value of the fitted model (2.807e-20, which rounds to 0.0000) indicates that the model fits the data significantly better than a null model containing only the intercept. The model is therefore worth interpreting.

The following discussion focuses on children with autism or developmental delay:

Looking only at the statistically significant variables, the odds of 'Fair, Poor' health for children receiving 'Food_stamps' are about 2.2 times the odds for those not receiving them (OR = 2.21, 95% CI 1.21 to 4.04). Similarly, children receiving benefits from the WIC program had higher odds of 'Fair, Poor' health (OR = 3.41, 95% CI 1.51 to 7.70).

Children who have 'Ever had autism' had higher odds of 'Fair, Poor' health than children who have not (OR = 6.49, 95% CI 3.12 to 13.51). The same pattern holds for children who have 'Ever had developmental delay' (OR = 4.96, 95% CI 2.56 to 9.61).
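
For the two continuous predictors in the table (Age and Days_of_school_missed), the odds ratio is per one-unit increase, so the effect over several units compounds multiplicatively: an increase of k units multiplies the odds by OR^k, given that the model treats the log-odds as linear in the predictor. A small worked sketch using the odds ratios from the table above; the choice of 5 years and 10 days is purely illustrative, and `odds_ratio_for_change` is a hypothetical helper name.

```python
# Per-unit odds ratios copied from the table of significant features.
OR_AGE = 1.132742            # per additional year of age
OR_DAYS_MISSED = 1.023833    # per additional day of school missed

def odds_ratio_for_change(per_unit_or, units):
    """Odds ratio implied by the model for a change of `units` in the predictor."""
    return per_unit_or ** units

print(round(odds_ratio_for_change(OR_AGE, 5), 2))
print(round(odds_ratio_for_change(OR_DAYS_MISSED, 10), 2))
```

This is why a per-day odds ratio close to 1 can still correspond to a noticeable difference between children who missed few days and children who missed many.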

# MODEL 2

# IRR FOR DAYS OF SCHOOL MISSED

In [55]:
formula_p=('Days_of_school_missed ~ Region + Sex + Age +  '
                            'Num_of_children +'
                            'Special_Edu + Coverage_status +'
                            'Health_insurance + TSLSD + '
                            'Lacking_basic_needs + '
                            'Parent_works + '
                            'Parent_edu + Fam_Poverty_ratio + '
                            'Born_in_US + Citizenship + '
                            'Origin + Food_stamps + Ever_had_ADD_ADHD + Ever_had_autism + Ever_had_devel_delay +'
                            'WIC_benefits + SFP_incarcerated + LWAA_D + LWAMD + Victim_or_witnessed_violence')

In [56]:
poisson = smf.poisson(formula=formula_p, data=general_health_data).fit()

Optimization terminated successfully.
         Current function value: 6.242849
         Iterations 6


In [57]:
print(poisson.summary())

                            Poisson Regression Results                           
Dep. Variable:     Days_of_school_missed   No. Observations:                 3548
Model:                           Poisson   Df Residuals:                     3518
Method:                              MLE   Df Model:                           29
Date:                   Sun, 18 Aug 2024   Pseudo R-squ.:                 0.04530
Time:                           15:07:50   Log-Likelihood:                -22150.
converged:                          True   LL-Null:                       -23201.
Covariance Type:               nonrobust   LLR p-value:                     0.000
                                            coef    std err          z      P>|z|      [0.025      0.975]
---------------------------------------------------------------------------------------------------------
Intercept                                 1.5401      0.099     15.606      0.000       1.347       1.734
Region[T.Northeast]       

In [58]:
#Saving the coefficients of the variables
poisson_params = poisson.params

In [59]:
#Bringing together the coefficients with their confidence intervals

poisson_df = poisson.conf_int()
poisson_df['IRR'] = poisson_params
poisson_df = poisson_df[['IRR',0,1]].copy()
poisson_df.columns = ['IRR','Lower CI','Upper CI']

In [60]:
poisson_df = np.exp(poisson_df) #calculates the IRR for the variables
poisson_df['pvalues'] = round(poisson.pvalues,4) #getting the pvalues of the variables

In [61]:
print("RESULTS")
print("Statistically significant features: ")
print()
print(poisson_df[poisson_df['pvalues']<=0.05])

RESULTS
Statistically significant features: 

                                            IRR  Lower CI  Upper CI  pvalues
Intercept                              4.665099  3.844645  5.660638   0.0000
Region[T.South]                        0.896591  0.862296  0.932249   0.0000
Sex[T.Male]                            0.832602  0.808442  0.857484   0.0000
Num_of_children[T.2 children]          0.888425  0.859424  0.918406   0.0000
Num_of_children[T.3+ children]         0.867460  0.832234  0.904177   0.0000
Special_Edu[T.Yes]                     1.389764  1.333411  1.448497   0.0000
Coverage_status[T.Not covered]         0.682463  0.592768  0.785730   0.0000
Health_insurance[T.Yes]                0.504505  0.441245  0.576834   0.0000
Lacking_basic_needs[T.Yes]             0.753184  0.688505  0.823938   0.0000
Parent_works[T.Yes]                    0.958434  0.921735  0.996594   0.0331
Parent_edu[T.Some college degree]      1.227360  1.182236  1.274206   0.0000
Born_in_US[T.Yes]             

# RESULT

Using the same significance level of 0.05, the LLR p-value of this Poisson model (reported as 0.000) indicates that the model fits significantly better than an intercept-only model and is worth interpreting.

The following discussion focuses on children with autism, developmental delay, and ADD/ADHD:

The incidence of days of school missed due to illness or injury was higher among children who have 'Ever had ADD/ADHD'. A similar trend was observed for children who have 'Ever had a developmental delay'.

Children who have 'Lived with anyone with alcohol/drug problem' and children who have been a victim of or witnessed violence had a higher incidence of days of school missed due to illness or injury than those who have not.

The incidence of days of school missed due to illness or injury was lower for children who have 'Ever had autism'.
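
An IRR can be read as a percentage difference in the expected number of days missed relative to the reference group, holding the other predictors fixed: the difference is (IRR - 1) x 100. The sketch below applies this to three IRRs copied from the table of significant features above; `percent_change` is a hypothetical helper name.

```python
# IRRs copied from the table of significant features above.
irrs = {
    'Special_Edu[T.Yes]': 1.389764,
    'Sex[T.Male]': 0.832602,
    'Health_insurance[T.Yes]': 0.504505,
}

def percent_change(irr):
    """Percentage difference in the expected count relative to the reference group."""
    return (irr - 1.0) * 100.0

for name, irr in irrs.items():
    print(f"{name}: {percent_change(irr):+.1f}%")
```

An IRR above 1 gives a positive percentage (more days missed than the reference group) and an IRR below 1 gives a negative one.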