### Project Objective

Investigate whether there is any measurable difference in progress based on the eligibility category. There are three eligibility categories:

+ Developmental evaluation (delay)
+ Diagnosed condition
+ Diagnosed condition, developmental evaluation (both delay and diagnosis)

#### Preliminary Analysis

1. Start by looking at overall progress by eligibility category (column AI of the "Elig Timeline Rpt 2018-2022" tab).
2. Factor in the time of service ("ECO with Exit21-22" tab).
3. Do the above comparison by POE as well (column A of the "ECO with Exit21-22" tab).

#### Additional Analysis

Additional areas you can look into are listed below:

+ Does typical time of service differ for different eligibility categories?
+ Do exit reasons vary by eligibility category? Do more children in one eligibility category age out compared to leaving for other reasons?
+ "ECO with Exit21-22" contains the entry ECO scores (columns D, E, and F) and exit scores (columns H, I, and J). Analyze these scores by looking at the typical improvement seen for each entry rating compared to the time of service. What percentage of children entering with a score of 1 also exit with a score of 1? How many improve to a 2 or a 3? How does this vary by time of service?
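
The entry-versus-exit question in the last bullet can be approached with a row-normalized cross-tabulation. The sketch below uses toy data only; `entry_rating`, `exit_rating`, `months_of_service`, and the service bands are hypothetical names and cut points chosen for illustration, not the real headers of columns D through J.

```python
import pandas as pd

# Toy data only. Column names are hypothetical stand-ins for the real
# entry/exit ECO score columns and a time-of-service column.
toy = pd.DataFrame({
    "entry_rating": [1, 1, 1, 1, 2, 2, 3, 3],
    "exit_rating": [1, 2, 3, 1, 2, 3, 3, 2],
    "months_of_service": [7, 14, 20, 9, 8, 18, 12, 25],
})

# Share of children at each exit rating, for each entry rating.
# normalize="index" makes every row sum to 1.
transition = pd.crosstab(toy["entry_rating"], toy["exit_rating"], normalize="index")
print(transition)

# Same table split by time of service. The bands are an assumption:
# (6, 12], (12, 24], (24, 36] months.
toy["service_band"] = pd.cut(
    toy["months_of_service"],
    bins=[6, 12, 24, 36],
    labels=["6-12", "12-24", "24-36"],
).astype(str)
by_band = pd.crosstab(
    [toy["service_band"], toy["entry_rating"]],
    toy["exit_rating"],
    normalize="index",
)
print(by_band)
```

Reading across the row for entry rating 1 in `transition` gives the share that stayed at 1 and the shares that improved to 2 or 3; `by_band` shows how those shares shift with time of service.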

### Part 2

Does typical time of service differ for different eligibility categories?

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

In [None]:
%matplotlib inline

In [None]:
#reading in the data from the eco_exit sheet 
eco_21_22_exit = pd.read_excel("../Data/TEIS-NSS Project Data 10-2022.xlsx", sheet_name="ECO with Exit21-22", nrows=8632)
eco_21_22_exit

In [None]:
#rename the child id column so it matches the timeline data sheet and i can join them later
eco_21_22_exit.rename(columns = {'CHILD_ID':'Child ID'}, inplace = True)
eco_21_22_exit

In [None]:
#reading in the second sheet we need to do the analysis
eco_21_22_data = pd.read_excel("../Data/TEIS-NSS Project Data 10-2022.xlsx", sheet_name="Elig Timeline Rpt 2018-2022")
eco_21_22_data

In [None]:
#renaming the columns because the . does not work well when calling on it later 
eco_21_22_data.rename(columns = {'Init. Elig. Category': 'Init_Elig_Cat'}, inplace=True)
eco_21_22_data

In [None]:
#making a new dataframe out of the information in the timeline sheet that only has the two columns i am interested in
df2_data = eco_21_22_data[['Child ID','Init_Elig_Cat']]
df2_data

In [None]:
#making a new dataframe with only the columns i am interested in from the eco sheet (rows with missing values are dropped)
df1_exit = eco_21_22_exit[['Child ID','Days btw Initial and Exit', 'Days btw I-IFSP to Exit ECO', '<Calc> Entrance Age (months)']].dropna()
df1_exit

In [None]:
#renamed the age column to be easier to call in later on graphs
df1_exit = df1_exit.rename(columns={'<Calc> Entrance Age (months)': 'Entrance Age'})
df1_exit

In [None]:
#making sure to exclude any data that's negative or not at least 6 months (183 days)
df1_exit = df1_exit.loc[(df1_exit['Days btw Initial and Exit']>=183)]
df1_exit = df1_exit.loc[(df1_exit['Days btw I-IFSP to Exit ECO']>0)]

In [None]:
#checking the dataframe now to make sure it got rid of those
df1_exit
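
Looking at the dataframe by eye works, but the check can also be made explicit. The sketch below repeats the two filters on toy data (the values are invented; only the column names come from the notebook) and asserts that nothing below the thresholds survived.

```python
import pandas as pd

# Toy data with the notebook's column names; the values are made up.
toy = pd.DataFrame({
    "Child ID": [1, 2, 3, 4, 5],
    "Days btw Initial and Exit": [400, 120, 250, 183, -15],
    "Days btw I-IFSP to Exit ECO": [380, 100, -5, 170, 10],
})

# Same two conditions as the filtering cell, combined into one mask.
at_least_six_months = toy["Days btw Initial and Exit"] >= 183
positive_ifsp_gap = toy["Days btw I-IFSP to Exit ECO"] > 0
kept = toy.loc[at_least_six_months & positive_ifsp_gap]

# Report how many rows were removed, then verify the result.
print(f"rows before: {len(toy)}, rows after: {len(kept)}")
assert kept["Days btw Initial and Exit"].min() >= 183
assert (kept["Days btw I-IFSP to Exit ECO"] > 0).all()
```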

In [None]:
#now i am merging the two dataframes together on child id (default inner join, so only children in both sheets are kept)
df_merge1 = df2_data.merge(df1_exit, on = 'Child ID')
df_merge1
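
One thing worth checking before trusting the merged row count: if a child appears more than once in the timeline sheet, an inner merge repeats that child's exit row and skews the averages. Whether the real sheet contains repeated IDs is not known here; the sketch below uses toy data to show the check, and keeping the first row per child is an arbitrary assumption.

```python
import pandas as pd

# Toy stand-ins for df2_data and df1_exit; values are invented.
elig = pd.DataFrame({
    "Child ID": [1, 1, 2, 3],
    "Init_Elig_Cat": ["Delay", "Delay", "Diagnosis", "Delay"],
})
exits = pd.DataFrame({
    "Child ID": [1, 2, 4],
    "Days btw Initial and Exit": [400, 250, 300],
})

# Count repeated child IDs on the eligibility side.
n_repeated = elig["Child ID"].duplicated().sum()
print(f"repeated child IDs: {n_repeated}")

# A plain merge repeats child 1's exit row once per eligibility row.
naive = elig.merge(exits, on="Child ID")

# Keep one eligibility row per child (assumption: the first one),
# then let pandas raise if either side still has duplicate keys.
elig_unique = elig.drop_duplicates(subset="Child ID", keep="first")
merged = elig_unique.merge(exits, on="Child ID", how="inner", validate="one_to_one")
print(len(naive), len(merged))
```

With `validate="one_to_one"`, pandas raises a `MergeError` instead of silently multiplying rows, so the check stays in place if the data changes.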


In [None]:
#i then thought it would be nice to look at the averages b/c the question wants to know time in program based on eligibility category. Used a groupby and selected the columns with a list (double brackets) - a bare tuple of columns gives a warning in older pandas and an error in pandas 2.0+
df1_merge_group = df_merge1.groupby('Init_Elig_Cat', as_index=False)[['Days btw Initial and Exit', 'Days btw I-IFSP to Exit ECO', 'Entrance Age']].mean()
df1_merge_group

<b>CALLOUT:</b>
   + It's interesting that the average entrance age (in months) is close to 3 months for every eligibility category.
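
Since the question asks about *typical* time of service, the median and the group size may be worth reporting next to the mean, because a few very long stays can pull the mean up. A minimal sketch on toy data (values invented, category labels borrowed from the project description):

```python
import pandas as pd

# Toy data; only the column names match the notebook.
toy = pd.DataFrame({
    "Init_Elig_Cat": [
        "Developmental evaluation",
        "Developmental evaluation",
        "Developmental evaluation",
        "Diagnosed condition",
        "Diagnosed condition",
    ],
    "Days btw Initial and Exit": [200, 300, 1000, 400, 600],
})

# Count, mean, and median days of service per eligibility category.
summary = (
    toy.groupby("Init_Elig_Cat")["Days btw Initial and Exit"]
    .agg(["count", "mean", "median"])
)
print(summary)
```

In this toy example both categories have the same mean (500 days), but the medians differ (300 versus 500), which is the kind of gap a mean-only table would hide.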

In [None]:
#wanted to make columns in the dataframe that also gave me the avg months in the program (days / 30, so approximate) in case that was interesting to see when making graphs - i also rounded it because i didn't think the long line of numbers was nice looking
df1_merge_group['Avg Months btw I and E'] = df1_merge_group['Days btw Initial and Exit']/30
df1_merge_group.round()

In [None]:
#adding the second column 
df1_merge_group['Avg Months btw I-IFSP and E ECO'] = df1_merge_group['Days btw I-IFSP to Exit ECO']/30
df1_merge_group.round()

In [None]:
#final step: saving the rounded values back into the dataframe
df1_merge_group = df1_merge_group.round()
df1_merge_group