# Project Objective
Investigate whether there is any measurable difference in progress based on the eligibility category. There are three options:
+ Developmental evaluation (delay)
+ Diagnosed condition
+ Diagnosed condition, developmental evaluation (by both delay and diagnosis)

*Preliminary Analysis*
1. Start by looking at overall progress by eligibility category (column AI of the "Elig Timeline Rpt 2018-2022" tab)
2. Factor in the time of service ("ECO with Exit21-22" tab)
3. Do the above comparison by POE as well (column A of the "ECO with Exit21-22" tab)
*Additional Analysis*
Additional areas you can look into are listed below:
+ Does typical time of service differ for different eligibility categories?
+ Do exit reasons vary by eligibility category? Do more children in one eligibility category age out compared to leaving for other reasons?
+ "ECO with Exit21-22" contains the entry ECO scores (columns D, E, and F) and exit scores (columns H, I, and J). Analyze these scores by looking at the typical improvement seen for each entry rating compared to the time of service. What percentage of children entering with a score of 1 also exit with a score of 1? How many improve to a 2 or a 3? How does this vary by time of service?
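The entry-versus-exit question in the last bullet can be approached with a row-normalised cross-tabulation. The sketch below is only an illustration: the `toy` frame and its values are made up, and only the two column names are taken from the "ECO with Exit21-22" tab.

```python
import pandas as pd

# Toy data standing in for the "ECO with Exit21-22" tab (illustrative values only).
toy = pd.DataFrame({
    "Ent SOCIAL_SCALE":  [1, 1, 1, 1, 2, 2, 3, 3],
    "Exit SOCIAL_SCALE": [1, 2, 3, 1, 2, 4, 3, 5],
})

# Row-normalised cross-tabulation: each row sums to 100 and shows where
# children with a given entry rating ended up at exit.
transition_pct = pd.crosstab(
    toy["Ent SOCIAL_SCALE"], toy["Exit SOCIAL_SCALE"], normalize="index"
) * 100

print(transition_pct.round(1))
```

Reading across the row for entry rating 1 gives the share of children who stayed at 1 and the share who moved to 2 or 3. To see how this varies by time of service, one option is to add a binned duration column and pass it as a second row key to `pd.crosstab`.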

<b> Part 2 </b>
    
Does typical time of service differ for different eligibility categories?
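One way to read "typical" is to compare the median alongside the mean, since a few very long service periods can pull the mean up. The following is a minimal sketch on made-up data; the `toy` frame is hypothetical and only the column names follow the notebook.

```python
import pandas as pd

# Toy data (illustrative values only); column names follow the notebook.
toy = pd.DataFrame({
    "Init_Elig_Cat": ["Developmental Evaluation"] * 3 + ["Diagnosed Condition"] * 3,
    "Days btw Initial and Exit": [200, 400, 900, 300, 700, 800],
})

# Count, mean and median days of service for each eligibility category.
summary = (
    toy.groupby("Init_Elig_Cat")["Days btw Initial and Exit"]
       .agg(["count", "mean", "median"])
)

print(summary)
```

Including the count makes it easier to judge how much weight to give each category's average.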

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

In [None]:
%matplotlib inline

<b> eco_21_22_exit: has negative numbers in "Days btw I-IFSP to Exit ECO" </b>
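A quick way to see how many rows are affected is to count the negative durations before deciding how to treat them. This is a sketch on hypothetical toy data; only the column names come from the workbook.

```python
import pandas as pd

# Toy data (illustrative values only); the column name follows the workbook.
toy = pd.DataFrame({
    "Child ID": [1, 2, 3, 4],
    "Days btw I-IFSP to Exit ECO": [120, -15, 0, 400],
})

col = "Days btw I-IFSP to Exit ECO"

# Rows where the exit ECO date falls before the initial IFSP date.
negative_rows = toy[toy[col] < 0]
print(f"{len(negative_rows)} of {len(toy)} rows have a negative duration")

# Keep only strictly positive durations (one possible cleaning rule).
cleaned = toy[toy[col] > 0]
```

Dropping these rows is an assumption about how to handle them; they may instead point to data-entry errors worth reporting back.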

In [None]:
eco_21_22_exit = pd.read_excel("../Data/TEIS-NSS Project Data 10-2022.xlsx", sheet_name="ECO with Exit21-22", nrows=8632)
eco_21_22_exit

In [None]:
eco_21_22_exit.head()

In [None]:
eco_21_22_exit.info()

In [None]:
eco_21_22_exit.rename(columns = {'CHILD_ID':'Child ID'}, inplace = True)
eco_21_22_exit

<b> eco_21_22_data </b>

In [None]:
eco_21_22_data = pd.read_excel("../Data/TEIS-NSS Project Data 10-2022.xlsx", sheet_name="Elig Timeline Rpt 2018-2022")
eco_21_22_data

In [None]:
# The sheet is already loaded in the previous cell, so there is no need to read it again.
eco_21_22_data.head()

In [None]:
eco_21_22_data.rename(columns = {'Init. Elig. Category': 'Init_Elig_Cat'}, inplace=True)
eco_21_22_data

In [None]:
eco_21_22_data.info()

<b> Making new dataframes from eco_21_22_data and eco_21_22_exit </b>
    

In [None]:
df2_data = eco_21_22_data[['Child ID','Init_Elig_Cat']]
df2_data

In [None]:
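# 'DISTRICT' and '<Calc> Months in Program' are included because later cells cast and average them.
df1_exit = eco_21_22_exit[['Child ID','DISTRICT','<Calc> Months in Program','Days btw Initial and Exit', 'Days btw I-IFSP to Exit ECO','Ent SOCIAL_SCALE','Ent KNOWLEDGE_SCALE', 'Ent APPROPRIATE_ACTION_SCALE', 'Exit SOCIAL_SCALE','Exit KNOWLEDGE_SCALE','Exit APPROPRIATE_ACTION_SCALE']].dropna()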
df1_exit

In [None]:
df1_exit.tail()

1. Need to filter df2 to pull the 2021 and 2022 data
2. Merge df1 and df2 on Child ID
3. Can then start looking at performance progress (average progress grouped by district and eligibility category)
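The plan above can be sketched on toy data as follows. The frames `elig` and `scores` and the `Social progress` column are hypothetical names used for illustration. The sketch also assumes that an inner merge against the 21-22 exit tab is enough to restrict the eligibility records to that period.

```python
import pandas as pd

# Toy stand-ins for the eligibility and exit tabs (illustrative values only).
elig = pd.DataFrame({
    "Child ID": [1, 2, 3, 4],
    "Init_Elig_Cat": ["Developmental Evaluation", "Diagnosed Condition",
                      "Developmental Evaluation", "Diagnosed Condition"],
})
scores = pd.DataFrame({
    "Child ID": [1, 2, 3],
    "Ent SOCIAL_SCALE": [2, 3, 1],
    "Exit SOCIAL_SCALE": [4, 4, 4],
})

# The inner merge keeps only children that appear in both tabs.
merged = elig.merge(scores, on="Child ID", how="inner")

# Progress = exit rating minus entry rating (hypothetical helper column).
merged["Social progress"] = merged["Exit SOCIAL_SCALE"] - merged["Ent SOCIAL_SCALE"]

# Average progress per eligibility category.
avg_progress = merged.groupby("Init_Elig_Cat")["Social progress"].mean()
print(avg_progress)
```

Grouping by `['DISTRICT', 'Init_Elig_Cat']` instead would give the same comparison broken down by district.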


In [None]:
df1_exit = df1_exit.astype({'Child ID':'int','DISTRICT':'str','<Calc> Months in Program':'int','Ent SOCIAL_SCALE':'int','Ent KNOWLEDGE_SCALE':'int', 'Ent APPROPRIATE_ACTION_SCALE':'int', 'Exit SOCIAL_SCALE':'int','Exit KNOWLEDGE_SCALE':'int','Exit APPROPRIATE_ACTION_SCALE':'int'})
df1_exit

In [None]:
df1_exit.info()

In [None]:
df2_dd = df2_data.loc[df2_data.Init_Elig_Cat == 'Developmental Evaluation', :]
df2_dd

<b> Merging the two new dataframes: df2_dd holds only the rows with Init_Elig_Cat = Developmental Evaluation, and df1_exit holds the progress scores </b>

The new dataframe is df_merge.

In [None]:
df_merge = df2_dd.merge(df1_exit, on='Child ID')
df_merge

In [None]:
print("Mean of '<Calc> Months in Program' for Developmental Evaluation:")
print(df_merge['<Calc> Months in Program'].mean())

In [None]:
print(df_merge['Ent SOCIAL_SCALE'].sum())

In [None]:
print(df_merge['Exit SOCIAL_SCALE'].sum())


In [None]:
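# Total change in social scale (exit sum minus entry sum); originally typed by hand as 20247-14449
df_merge['Exit SOCIAL_SCALE'].sum() - df_merge['Ent SOCIAL_SCALE'].sum()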

### Starting from the top

In [None]:
df2_data = eco_21_22_data[['Child ID','Init_Elig_Cat']]
df2_data

In [None]:
df1_exit = eco_21_22_exit[['Child ID','Days btw Initial and Exit', 'Days btw I-IFSP to Exit ECO','Ent SOCIAL_SCALE','Ent KNOWLEDGE_SCALE', 'Ent APPROPRIATE_ACTION_SCALE', 'Exit SOCIAL_SCALE','Exit KNOWLEDGE_SCALE','Exit APPROPRIATE_ACTION_SCALE']].dropna()
df1_exit

In [None]:
df1_exit = df1_exit.loc[(df1_exit['Days btw Initial and Exit']>=183)]
df1_exit = df1_exit.loc[(df1_exit['Days btw I-IFSP to Exit ECO']>0)]

In [None]:
df1_exit

In [None]:
df_merge1 = df2_data.merge(df1_exit, on = 'Child ID')
df_merge1
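Because the eligibility tab spans 2018 to 2022, a child may appear in it more than once, and each repeat would duplicate that child's exit row in the merge and skew the averages. The sketch below shows one way to check for this on toy data. The frames are hypothetical, and keeping the first record per child is an assumption rather than a rule from the source.

```python
import pandas as pd

# Toy stand-ins (illustrative values only): child 1 has two eligibility records.
elig = pd.DataFrame({
    "Child ID": [1, 1, 2],
    "Init_Elig_Cat": ["Developmental Evaluation", "Developmental Evaluation",
                      "Diagnosed Condition"],
})
exits = pd.DataFrame({"Child ID": [1, 2], "Days btw Initial and Exit": [300, 600]})

# A plain merge repeats child 1's exit row once per eligibility record.
naive = elig.merge(exits, on="Child ID")

# IDs that occur more than once in the eligibility data.
dup_ids = elig.loc[elig["Child ID"].duplicated(), "Child ID"].unique()

# Keep one record per child (assumption: the first one), then let pandas
# raise an error if the merge keys are still not unique on either side.
elig_unique = elig.drop_duplicates(subset="Child ID", keep="first")
checked = elig_unique.merge(exits, on="Child ID", validate="one_to_one")

print(len(naive), len(checked), dup_ids)
```

If `dup_ids` is empty on the real data, the plain merge is already safe and the de-duplication step can be skipped.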

In [None]:
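# Select several columns with a list (double brackets); tuple-style selection fails in recent pandas.
df1_merge_group = df_merge1.groupby('Init_Elig_Cat', as_index=False)[['Days btw Initial and Exit','Days btw I-IFSP to Exit ECO']].mean()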

In [None]:
df1_merge_group

In [None]:
453/30


In [None]:
339/30

In [None]:
687/30

In [None]:
559/30

In [None]:
623/30

In [None]:
499/30

In [None]:
df1_merge_group.info()

In [None]:
df1_merge_group['Avg Months btw I and E'] = df1_merge_group['Days btw Initial and Exit']/30
df1_merge_group

In [None]:
df1_merge_group['Avg Months btw I-IFSP and E ECO'] = df1_merge_group['Days btw I-IFSP to Exit ECO']/30
df1_merge_group.round()

In [None]:
df1_merge_group_re = df1_merge_group.reindex(columns= ['Init_Elig_Cat', 'Days btw Initial and Exit', 'Avg Months btw I and E', 'Days btw I-IFSP to Exit ECO', 'Avg Months btw I-IFSP and E ECO'])
df1_merge_group_re.round()

In [None]:
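# 'Both' is short for 'Diagnosed Condition, Developmental Evaluation'
Init_Elig_Cat = ['Developmental Evaluation', 'Diagnosed Condition', 'Both']
# Average months between initial and exit, rounded from 453/30, 687/30 and 623/30 above
Months = [15,23,21]
# Highlight the category with the longest time of service
colors = ['grey', 'blue', 'grey']


plt.bar(Init_Elig_Cat, Months, color = colors)
plt.title('Time of Service Per Category')
plt.xlabel('Category')
plt.ylabel('Months')
plt.show()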



In [None]:
# TODO: plot this with seaborn and combine both measures in one chart (days btw initial and exit, days btw I-IFSP and exit ECO)
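A grouped bar chart is one way to show both measures side by side. The sketch below reshapes a summary table to long form and passes it to `sns.barplot`. The `summary` values are placeholders rather than results from the data; on the real data `df1_merge_group` would take its place.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder summary table (illustrative values only, not real results).
summary = pd.DataFrame({
    "Init_Elig_Cat": ["Developmental Evaluation", "Diagnosed Condition", "Both"],
    "Avg Months btw I and E": [10.0, 20.0, 18.0],
    "Avg Months btw I-IFSP and E ECO": [8.0, 16.0, 14.0],
})

# Long form: one row per (category, measure) pair, which is what seaborn expects.
long_form = summary.melt(
    id_vars="Init_Elig_Cat", var_name="Measure", value_name="Months"
)

# One group of bars per category, one bar per measure.
fig, ax = plt.subplots(figsize=(8, 4))
sns.barplot(data=long_form, x="Init_Elig_Cat", y="Months", hue="Measure", ax=ax)
ax.set_title("Time of Service Per Category")
ax.set_xlabel("Category")
ax.set_ylabel("Months")
```

With `%matplotlib inline` active, the figure is displayed when the cell finishes running.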

In [None]:
#maybe look at doing a line chart.. figma one or whatever it's called 