The publicly reported data are based on Early Childhood Outcomes (ECO) data, which are collected every six months in conjunction with the Individualized Family Service Plan (IFSP) review cycle and measured on three outcomes. For each outcome, a child is placed into one of five progress categories:
1. Children who did not improve functioning
2. Children who improved functioning, but not sufficient to move nearer to functioning comparable to same-aged peers
3. Children who improved functioning to a level nearer to same-aged peers, but did not reach it
4. Children who improved functioning to reach a level comparable to same-aged peers
5. Children who maintained functioning at a level comparable to same-aged peers

The "Summary Statements" tab of the provided Excel spreadsheet shows, for each outcome, the count and percentage of children in each category, along with the percentage of children who substantially increased their rate of growth and the percentage who were functioning within age expectations. These calculations are based on data in the "ECO with Exit21-22" tab.
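
As a rough illustration of how two such percentages can be derived from the five category counts, the sketch below assumes the standard OSEP summary-statement definitions (categories 1 to 5 corresponding to OSEP categories a to e). The spreadsheet's own formulas are not reproduced in this notebook, so treat the definitions as an assumption; the counts are invented for the example.

```python
def summary_statements(counts):
    """counts: dict mapping progress category (1-5) to a number of children."""
    c1, c2, c3, c4, c5 = (counts.get(k, 0) for k in range(1, 6))

    # Statement 1 (assumed definition): among children in categories 1-4,
    # the share in categories 3 and 4 (substantially increased rate of growth).
    denom_1 = c1 + c2 + c3 + c4
    ss1 = (c3 + c4) / denom_1 if denom_1 else float('nan')

    # Statement 2 (assumed definition): among all children, the share in
    # categories 4 and 5 (functioning within age expectations at exit).
    total = denom_1 + c5
    ss2 = (c4 + c5) / total if total else float('nan')
    return ss1, ss2


# Hypothetical counts, for illustration only.
example_counts = {1: 10, 2: 40, 3: 50, 4: 60, 5: 40}
ss1, ss2 = summary_statements(example_counts)
print(f"Substantially increased rate of growth: {ss1:.1%}")
print(f"Functioning within age expectations:    {ss2:.1%}")
```

Comparing the function's results against the "Summary Statements" tab for one outcome is a quick way to confirm whether the spreadsheet uses the same definitions.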

Your primary objective in this project is to investigate whether there is any measurable difference in progress based on eligibility category. There are three eligibility categories:
* Developmental evaluation (delay)
* Diagnosed condition
* Diagnosed condition, developmental evaluation (by both delay and diagnosis)

Start by looking at overall progress by eligibility category. This information is contained in column AI of the "Elig Timeline Rpt 2018-2022" tab. After looking at overall rates, factor in the time of service, which is contained in the "ECO with Exit21-22" tab.
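
One possible way to frame the overall comparison is sketched below on a small invented table. The column names mirror those in the loaded data, but measuring "progress" as exit score minus entry score is an assumption made for illustration, not the official progress-category calculation.

```python
import pandas as pd

# Invented rows; only the column names follow the real data.
toy = pd.DataFrame({
    'Init. Elig. Category': ['Developmental Evaluation', 'Developmental Evaluation',
                             'Diagnosed Condition', 'Diagnosed Condition',
                             'Diagnosed Condition, Developmental Evaluation'],
    'Ent SOCIAL_SCALE':  [2, 3, 1, 4, 2],
    'Exit SOCIAL_SCALE': [4, 3, 2, 6, 2],
})

# Assumed progress measure: change in rating between entry and exit.
toy['social_change'] = toy['Exit SOCIAL_SCALE'] - toy['Ent SOCIAL_SCALE']

# Per category: number of children, mean change, and percent who improved.
by_category = toy.groupby('Init. Elig. Category')['social_change'].agg(
    n='count',
    mean_change='mean',
    pct_improved=lambda s: (s > 0).mean() * 100,
)
print(by_category)
```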

Each child is associated with a Point of Entry (POE) office, as indicated in column A of the "ECO with Exit21-22" tab. Do the above comparison by POE as well, similar to the calculations in the "ECO by POE" tab.
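
A POE-level breakdown could be laid out as a pivot table, with one row per office and one column per eligibility category. The sketch below uses invented values; `social_change` is a hypothetical precomputed exit-minus-entry column, and the district code `MD` is a placeholder that may not exist in the real data.

```python
import pandas as pd

# Invented rows; 'social_change' and the 'MD' code are illustrative assumptions.
toy = pd.DataFrame({
    'DISTRICT': ['ET', 'ET', 'ET', 'MD', 'MD', 'MD'],
    'Init. Elig. Category': ['Developmental Evaluation', 'Diagnosed Condition',
                             'Developmental Evaluation', 'Developmental Evaluation',
                             'Diagnosed Condition', 'Diagnosed Condition'],
    'social_change': [2, 1, 0, 3, 0, 2],
})

# Mean change for each POE (rows) and eligibility category (columns).
poe_table = toy.pivot_table(index='DISTRICT',
                            columns='Init. Elig. Category',
                            values='social_change',
                            aggfunc='mean')
print(poe_table)
```

Passing `aggfunc='count'` instead shows how many children sit behind each mean, which matters when some POE and category combinations are small.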

After answering the above questions, you can look into these additional areas:
* Does typical time of service differ for different eligibility categories?
* Do exit reasons vary by eligibility category? Do more children in one eligibility category age out compared to leaving for other reasons?
* In the "ECO with Exit21-22" sheet, columns D, E, and F contain the entry ECO scores, and columns H, I, and J contain the exit scores. Analyze these scores by looking at the typical improvement seen for each entry rating compared to the time of service. For example, what percentage of children entering with a score of 1 also exit with a score of 1? How many improve to a 2 or a 3? How does this vary by time of service? 
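
For the entry-versus-exit question, a row-normalized cross-tabulation gives the share of children at each entry rating who finish at each exit rating. The sketch below uses invented rows and assumes that time of service can be approximated as the gap between `ECO_Exit_DATE` and `ECO_Entry_DATE`; the spreadsheet may define it differently.

```python
import pandas as pd

# Invented rows; column names follow the "ECO with Exit21-22" sheet.
toy = pd.DataFrame({
    'ECO_Entry_DATE': pd.to_datetime(['2021-01-10', '2021-02-01', '2021-03-15', '2021-04-01']),
    'ECO_Exit_DATE':  pd.to_datetime(['2021-07-10', '2022-02-01', '2021-09-15', '2022-04-01']),
    'Ent SOCIAL_SCALE':  [1, 1, 1, 2],
    'Exit SOCIAL_SCALE': [1, 3, 2, 2],
})

# Assumed time-of-service measure: days between the entry and exit ratings.
toy['days_in_service'] = (toy['ECO_Exit_DATE'] - toy['ECO_Entry_DATE']).dt.days

# Each row sums to 1: the share of children with a given entry rating
# who end up at each exit rating.
transitions = pd.crosstab(toy['Ent SOCIAL_SCALE'],
                          toy['Exit SOCIAL_SCALE'],
                          normalize='index')
print(transitions)
```

Repeating the cross-tabulation on subsets of `days_in_service` (for example, under and over one year) is one way to see how the pattern shifts with time of service.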

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
eco = pd.read_excel('../Data/TEIS_data.xlsx', sheet_name = 'ECO with Exit21-22')
eco.head()

In [None]:
elig = pd.read_excel('../Data/TEIS_data.xlsx', sheet_name = 'Elig Timeline Rpt 2018-2022')
elig.head()

In [29]:
elig['Init. Elig. Category'].value_counts()

Developmental Evaluation                         28317
Diagnosed Condition, Developmental Evaluation     5298
Diagnosed Condition                               4675
Name: Init. Elig. Category, dtype: int64

In [None]:
elig.info()

In [55]:
eco.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8636 entries, 0 to 8635
Data columns (total 60 columns):
 #   Column                                Non-Null Count  Dtype         
---  ------                                --------------  -----         
 0   DISTRICT                              8633 non-null   object        
 1   CHILD_ID                              8636 non-null   float64       
 2   ECO_Entry_DATE                        5721 non-null   datetime64[ns]
 3   Ent SOCIAL_SCALE                      5721 non-null   float64       
 4   Ent KNOWLEDGE_SCALE                   5721 non-null   float64       
 5   Ent APPROPRIATE_ACTION_SCALE          5721 non-null   float64       
 6   ECO_Exit_DATE                         5721 non-null   datetime64[ns]
 7   Exit SOCIAL_SCALE                     5721 non-null   float64       
 8   Exit KNOWLEDGE_SCALE                  5721 non-null   float64       
 9   Exit APPROPRIATE_ACTION_SCALE         5721 non-null   float64       
 10  

In [None]:
elig.dtypes

In [None]:
eco.dtypes

In [None]:
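# Rename elig's key columns first so that 'CHILD_ID' exists before the cast below.
elig = elig.rename(columns={'Child ID': 'CHILD_ID', 'District': 'DISTRICT'})
eco['CHILD_ID'] = eco['CHILD_ID'].fillna(0)
elig['CHILD_ID'] = elig['CHILD_ID'].astype('float64')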

In [45]:
elig.rename(columns = {'Child ID':'CHILD_ID', 'District': 'DISTRICT'}, inplace=True)

In [46]:
elig.head()

Unnamed: 0,DISTRICT,CHILD_ID,Child Status,Child Phase,Init. IFSP Due Date,IFSP Due Date,IFSP Late Reason,Active Ref. Date,Parent Consent/Intake Date,Date Dev. Evaluator Rec'd. Ref.,...,# Days Parent Consent to Dev. Eval.,# Days Dev. Evaluator Assigned to Dev. Eval.,1st Date Requested to Receipt of Med. Records,# Days Dev. Eval. to Elig. Det.,# Days Ref. to Elig. Det.,Init. Elig. Decision,Init. Elig. Category,Init. Elig. Det. Date,Exit Date,Exit Reason
0,ET,453926.0,Inactive,IFSP,2019-03-04,2019-03-27,System,2019-01-18,2019-02-04,2019-01-22 00:00:00,...,17.0,30.0,,4.0,38.0,Eligible,Developmental Evaluation,2019-02-25,2020-01-11,618 - Part B eligible
1,ET,431729.0,Inactive,IFSP,2018-10-26,2018-10-26,,2018-09-11,2018-09-25,2018-09-12 00:00:00,...,0.0,13.0,,6.0,20.0,Eligible,"Diagnosed Condition, Developmental Evaluation",2018-10-01,2020-12-11,618 - Part B eligibility not determined
2,ET,462474.0,Inactive,IFSP,2019-06-28,2019-06-14,,2019-05-14,2019-05-30,2019-05-15 00:00:00,...,0.0,15.0,0.0,1.0,17.0,Eligible,Developmental Evaluation,2019-05-31,2020-11-12,618 - Part B eligibility not determined
3,ET,446841.0,Inactive,Eligibility,2018-11-23,NaT,,2018-10-09,2018-10-25,2018-10-11 00:00:00,...,0.0,14.0,,1.0,17.0,Eligible,Diagnosed Condition,2018-10-26,2020-08-12,Parent decline
4,ET,459629.0,Inactive,IFSP,2019-05-16,2019-04-24,,2019-04-01,2019-04-10,2019-04-02 00:00:00,...,0.0,8.0,0.0,2.0,11.0,Eligible,Developmental Evaluation,2019-04-12,2019-09-26,618 - Parent withdraw


In [48]:
eco_essential = eco[['DISTRICT', 'CHILD_ID','Ent SOCIAL_SCALE', 'Ent KNOWLEDGE_SCALE', 'Ent APPROPRIATE_ACTION_SCALE', 'Exit SOCIAL_SCALE', 'Exit KNOWLEDGE_SCALE', 'Exit APPROPRIATE_ACTION_SCALE']]
eco_essential.head()

Unnamed: 0,DISTRICT,CHILD_ID,Ent SOCIAL_SCALE,Ent KNOWLEDGE_SCALE,Ent APPROPRIATE_ACTION_SCALE,Exit SOCIAL_SCALE,Exit KNOWLEDGE_SCALE,Exit APPROPRIATE_ACTION_SCALE
0,ET,500335.0,5.0,3.0,4.0,6.0,3.0,4.0
1,ET,479453.0,7.0,7.0,7.0,7.0,7.0,7.0
2,ET,510663.0,,,,,,
3,ET,452482.0,2.0,3.0,3.0,5.0,3.0,5.0
4,ET,506507.0,,,,,,


In [27]:
eco_essential.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8636 entries, 0 to 8635
Data columns (total 7 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   CHILD_ID                       8636 non-null   float64
 1   Ent SOCIAL_SCALE               5721 non-null   float64
 2   Ent KNOWLEDGE_SCALE            5721 non-null   float64
 3   Ent APPROPRIATE_ACTION_SCALE   5721 non-null   float64
 4   Exit SOCIAL_SCALE              5721 non-null   float64
 5   Exit KNOWLEDGE_SCALE           5721 non-null   float64
 6   Exit APPROPRIATE_ACTION_SCALE  5721 non-null   float64
dtypes: float64(7)
memory usage: 472.4 KB


In [47]:
elig_essential = elig[['DISTRICT', 'CHILD_ID', 'Init. Elig. Category']]
elig_essential.head()

Unnamed: 0,DISTRICT,CHILD_ID,Init. Elig. Category
0,ET,453926.0,Developmental Evaluation
1,ET,431729.0,"Diagnosed Condition, Developmental Evaluation"
2,ET,462474.0,Developmental Evaluation
3,ET,446841.0,Diagnosed Condition
4,ET,459629.0,Developmental Evaluation


In [25]:
elig_essential.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67610 entries, 0 to 67609
Data columns (total 2 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   CHILD_ID              67610 non-null  float64
 1   Init. Elig. Category  38290 non-null  object 
dtypes: float64(1), object(1)
memory usage: 1.0+ MB


In [49]:
essentials_1 = pd.merge(elig_essential, eco_essential, on=['CHILD_ID', 'DISTRICT'], how = 'outer')
essentials_1.head()

Unnamed: 0,DISTRICT,CHILD_ID,Init. Elig. Category,Ent SOCIAL_SCALE,Ent KNOWLEDGE_SCALE,Ent APPROPRIATE_ACTION_SCALE,Exit SOCIAL_SCALE,Exit KNOWLEDGE_SCALE,Exit APPROPRIATE_ACTION_SCALE
0,ET,453926.0,Developmental Evaluation,,,,,,
1,ET,431729.0,"Diagnosed Condition, Developmental Evaluation",,,,,,
2,ET,462474.0,Developmental Evaluation,,,,,,
3,ET,446841.0,Diagnosed Condition,,,,,,
4,ET,459629.0,Developmental Evaluation,,,,,,


In [50]:
essentials_1.isna().sum()

DISTRICT                             3
CHILD_ID                             0
Init. Elig. Category             29333
Ent SOCIAL_SCALE                 61902
Ent KNOWLEDGE_SCALE              61902
Ent APPROPRIATE_ACTION_SCALE     61902
Exit SOCIAL_SCALE                61902
Exit KNOWLEDGE_SCALE             61902
Exit APPROPRIATE_ACTION_SCALE    61902
dtype: int64
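
The large NaN counts follow from the outer join, which keeps every row from both frames whether or not it has a match. A hedged way to see how many rows actually matched is the `indicator=True` option of `pd.merge`, sketched here on small invented frames (`elig_toy` and `eco_toy` are stand-ins, not the real data).

```python
import pandas as pd

# Invented stand-ins for elig_essential and eco_essential.
elig_toy = pd.DataFrame({
    'DISTRICT': ['ET', 'ET', 'ET'],
    'CHILD_ID': [1.0, 2.0, 3.0],
    'Init. Elig. Category': ['Developmental Evaluation', 'Diagnosed Condition', None],
})
eco_toy = pd.DataFrame({
    'DISTRICT': ['ET', 'ET'],
    'CHILD_ID': [2.0, 4.0],
    'Ent SOCIAL_SCALE': [3.0, 5.0],
})

# indicator=True adds a '_merge' column: 'left_only', 'right_only', or 'both'.
checked = pd.merge(elig_toy, eco_toy, on=['CHILD_ID', 'DISTRICT'],
                   how='outer', indicator=True)
merge_counts = checked['_merge'].value_counts()
print(merge_counts)

# Keep only children present in both frames.
matched = checked[checked['_merge'] == 'both'].drop(columns='_merge')
print(matched)
```

If only matched children are needed for the analysis, `how='inner'` gives the same rows as the `'both'` filter without the intermediate step.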

In [51]:
essentials_1['Init. Elig. Category'].value_counts()

Developmental Evaluation                         28317
Diagnosed Condition, Developmental Evaluation     5298
Diagnosed Condition                               4675
Name: Init. Elig. Category, dtype: int64

In [52]:
essentials_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 67623 entries, 0 to 67622
Data columns (total 9 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   DISTRICT                       67620 non-null  object 
 1   CHILD_ID                       67623 non-null  float64
 2   Init. Elig. Category           38290 non-null  object 
 3   Ent SOCIAL_SCALE               5721 non-null   float64
 4   Ent KNOWLEDGE_SCALE            5721 non-null   float64
 5   Ent APPROPRIATE_ACTION_SCALE   5721 non-null   float64
 6   Exit SOCIAL_SCALE              5721 non-null   float64
 7   Exit KNOWLEDGE_SCALE           5721 non-null   float64
 8   Exit APPROPRIATE_ACTION_SCALE  5721 non-null   float64
dtypes: float64(7), object(2)
memory usage: 5.2+ MB


In [56]:
essentials_1['Ent SOCIAL_SCALE'].describe()

count    5721.000000
mean        3.283167
std         1.930219
min         1.000000
25%         2.000000
50%         3.000000
75%         5.000000
max         7.000000
Name: Ent SOCIAL_SCALE, dtype: float64

In [60]:
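# 'Developmental Condition' is not one of the three labels in value_counts() above, so no rows match and the mean is NaN.
essentials_1.loc[essentials_1['Init. Elig. Category'] == 'Developmental Condition', 'Ent SOCIAL_SCALE'].mean()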

nan

In [59]:
essentials_1['Ent SOCIAL_SCALE'].mean()


3.2831672784478236

In [61]:
essentials_drop = essentials_1.dropna()

In [62]:
essentials_drop.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5717 entries, 5 to 67600
Data columns (total 9 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   DISTRICT                       5717 non-null   object 
 1   CHILD_ID                       5717 non-null   float64
 2   Init. Elig. Category           5717 non-null   object 
 3   Ent SOCIAL_SCALE               5717 non-null   float64
 4   Ent KNOWLEDGE_SCALE            5717 non-null   float64
 5   Ent APPROPRIATE_ACTION_SCALE   5717 non-null   float64
 6   Exit SOCIAL_SCALE              5717 non-null   float64
 7   Exit KNOWLEDGE_SCALE           5717 non-null   float64
 8   Exit APPROPRIATE_ACTION_SCALE  5717 non-null   float64
dtypes: float64(7), object(2)
memory usage: 446.6+ KB


In [63]:
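# As before, 'Developmental Condition' matches no category label, so the result is still NaN after dropna().
essentials_drop.loc[essentials_drop['Init. Elig. Category'] == 'Developmental Condition', 'Ent SOCIAL_SCALE'].mean()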

nan
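
The NaN results above come from filtering on a label that does not appear in the column. One way to avoid typing labels by hand is to check the label against the column's unique values, or to compute every category's mean at once with `groupby`. The sketch below uses an invented frame with the same column names.

```python
import pandas as pd

# Invented rows; only the column names follow the real data.
toy = pd.DataFrame({
    'Init. Elig. Category': ['Developmental Evaluation',
                             'Developmental Evaluation',
                             'Diagnosed Condition'],
    'Ent SOCIAL_SCALE': [2.0, 4.0, 5.0],
})

# Check a label before filtering on it.
labels = set(toy['Init. Elig. Category'].unique())
wanted = 'Developmental Condition'
label_exists = wanted in labels
print(f"{wanted!r} present: {label_exists}")

# Mean entry rating for every category, with no hand-typed labels.
entry_means = toy.groupby('Init. Elig. Category')['Ent SOCIAL_SCALE'].mean()
print(entry_means)
```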