The data reported to the public is based on Early Childhood Outcomes (ECO) data, which is collected every six months in conjunction with the Individualized Family Service Plan (IFSP) review cycle and covers three outcomes. For each outcome, a child is placed into one of five progress categories:
1. Children who did not improve functioning
2. Children who improved functioning, but not sufficiently to move nearer to functioning comparable to same-aged peers
3. Children who improved functioning to a level nearer to same-aged peers, but did not reach it
4. Children who improved functioning to reach a level comparable to same-aged peers
5. Children who maintained functioning at a level comparable to same-aged peers

The "Summary Statements" tab of the Excel spreadsheet you have been provided shows, for each outcome, the count and percentage of children in each progress category, along with the percentage of children who substantially increased their rate of growth and the percentage who were functioning within age expectations. These calculations are based on the data in the "ECO with Exit21-22" tab.
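The two summary statements are conventionally derived from the five progress-category counts, with categories 1 to 5 labelled a to e: SS1 = (c + d) / (a + b + c + d) and SS2 = (d + e) / (a + b + c + d + e). The sketch below assumes the "Summary Statements" tab follows these standard formulas; the function name is illustrative and the counts are placeholders, not values from the data.

```python
def summary_statements(counts):
    """Compute the two ECO summary statements from progress-category counts.

    counts: mapping with keys 'a'..'e' for progress categories 1..5.
    Returns (ss1, ss2) as percentages.
    """
    a, b, c, d, e = (counts[k] for k in "abcde")
    # SS1: of children who entered below age expectations (a-d),
    # the share who substantially increased their rate of growth (c + d).
    ss1 = 100 * (c + d) / (a + b + c + d)
    # SS2: share of all children functioning within age expectations at exit.
    ss2 = 100 * (d + e) / (a + b + c + d + e)
    return ss1, ss2


# Placeholder counts for illustration only.
example = {"a": 5, "b": 20, "c": 40, "d": 25, "e": 10}
ss1, ss2 = summary_statements(example)
print(f"SS1 = {ss1:.1f}%, SS2 = {ss2:.1f}%")
```

Category e is excluded from the SS1 denominator because those children entered already functioning at age expectations.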

Your primary objective in this project is to investigate whether there is any measurable difference in progress across eligibility categories. There are three categories:
* Developmental evaluation (delay)
* Diagnosed condition
* Diagnosed condition, developmental evaluation (by both delay and diagnosis)

Start by looking at overall progress by eligibility category. The eligibility category is contained in column AI of the "Elig Timeline Rpt 2018-2022" tab. After comparing overall rates, factor in the time of service, which is contained in the "ECO with Exit21-22" tab.
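A minimal sketch of both comparisons, assuming the ECO and eligibility tabs have already been merged on child ID and that each child has one progress category (a to e) for the outcome being analyzed. Only `Init. Elig. Category` comes from the notebook; `Progress Category` and `Months of Service` are hypothetical column names, and the rows are toy values.

```python
import pandas as pd

# Toy stand-in for the merged data ('Progress Category' and
# 'Months of Service' are hypothetical column names).
df = pd.DataFrame({
    "Init. Elig. Category": [
        "Developmental Evaluation",
        "Developmental Evaluation",
        "Developmental Evaluation",
        "Diagnosed Condition",
        "Diagnosed Condition",
        "Diagnosed Condition, Developmental Evaluation",
    ],
    "Progress Category": ["c", "d", "e", "b", "c", "a"],
    "Months of Service": [8, 20, 26, 14, 30, 5],
})

# Overall: share of children in each progress category per eligibility
# category (normalize="index" makes each row sum to 1).
overall = pd.crosstab(
    df["Init. Elig. Category"], df["Progress Category"], normalize="index"
)

# Time of service: bin months into bands, then repeat the comparison
# within each band. Bins are right-closed: (0, 12], (12, 24], (24, 36].
df["Service Band"] = pd.cut(
    df["Months of Service"],
    bins=[0, 12, 24, 36],
    labels=["up to 12 mo", "13-24 mo", "25-36 mo"],
).astype(str)
by_time = pd.crosstab(
    [df["Init. Elig. Category"], df["Service Band"]],
    df["Progress Category"],
    normalize="index",
)

print(overall.round(2))
print(by_time.round(2))
```

The band edges are arbitrary choices for illustration; choose them based on the actual distribution of service time.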

Each child is associated with a Point of Entry (POE) office, as indicated in column A of the "ECO with Exit21-22" tab. Repeat the above comparison by POE, similar to the calculations in the "ECO by POE" tab.
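One way to sketch the per-POE comparison is to compute both summary statements for every (POE, eligibility category) pair. This assumes the standard summary-statement formulas; `POE` and `Progress Category` are hypothetical column names, and `POE-1`/`POE-2` are placeholder office labels.

```python
import pandas as pd

# Toy data ('POE' and 'Progress Category' are hypothetical column names).
df = pd.DataFrame({
    "POE": ["POE-1", "POE-1", "POE-1", "POE-1", "POE-2", "POE-2", "POE-2"],
    "Init. Elig. Category": [
        "Developmental Evaluation", "Developmental Evaluation",
        "Diagnosed Condition", "Diagnosed Condition",
        "Developmental Evaluation", "Developmental Evaluation",
        "Diagnosed Condition",
    ],
    "Progress Category": ["c", "d", "b", "e", "a", "d", "c"],
})

# Count children per progress category for each (POE, eligibility) pair,
# adding any category that never occurs as a zero column.
counts = pd.crosstab(
    [df["POE"], df["Init. Elig. Category"]], df["Progress Category"]
).reindex(columns=list("abcde"), fill_value=0)

below = counts[["a", "b", "c", "d"]].sum(axis=1)  # entered below age expectations
total = counts.sum(axis=1)

summary = pd.DataFrame({
    "children": total,
    # SS1: (c + d) / (a + b + c + d); NaN where no child entered below expectations
    "SS1 %": 100 * (counts["c"] + counts["d"]) / below.where(below > 0),
    # SS2: (d + e) / (a + b + c + d + e)
    "SS2 %": 100 * (counts["d"] + counts["e"]) / total,
})
print(summary.round(1))
```

Keeping the `children` column next to the percentages makes it easier to spot POE and category pairs whose rates rest on very few children.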

Once these questions are answered, you can look into the following additional areas:
* Does typical time of service differ for different eligibility categories?
* Do exit reasons vary by eligibility category? Do more children in one eligibility category age out compared to leaving for other reasons?
* In the "ECO with Exit21-22" tab, columns D, E, and F contain the entry ECO scores, and columns H, I, and J contain the exit scores. Analyze these scores by examining the typical improvement for each entry rating relative to time of service. For example, what percentage of children entering with a score of 1 also exit with a score of 1? How many improve to a 2 or a 3? How does this vary by time of service?
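The entry/exit question can be framed as a transition table: rows are entry scores, columns are exit scores, and each row shows how children with that entry score were distributed at exit. The sketch below handles one outcome on toy data; `Entry Score`, `Exit Score`, and `Months of Service` are hypothetical column names standing in for one of columns D to F, its matching column among H to J, and the time of service.

```python
import pandas as pd

# Toy data for a single outcome (hypothetical column names).
df = pd.DataFrame({
    "Entry Score": [1, 1, 1, 1, 2, 2, 3, 3],
    "Exit Score": [1, 2, 3, 1, 2, 4, 3, 5],
    "Months of Service": [6, 18, 30, 9, 12, 27, 20, 33],
})

# Row-normalized transition table: for each entry score, the share of
# children who exit with each score.
transitions = pd.crosstab(df["Entry Score"], df["Exit Score"], normalize="index")

# Share of children whose exit score equals their entry score.
unchanged = (df["Entry Score"] == df["Exit Score"]).groupby(df["Entry Score"]).mean()

# The same table within time-of-service bands (right-closed bins).
df["Service Band"] = pd.cut(
    df["Months of Service"],
    bins=[0, 12, 24, 36],
    labels=["up to 12 mo", "13-24 mo", "25-36 mo"],
).astype(str)
by_band = pd.crosstab(
    [df["Service Band"], df["Entry Score"]], df["Exit Score"], normalize="index"
)

print(transitions.round(2))
print(unchanged)
print(by_band.round(2))
```

The same `pd.crosstab(..., normalize="index")` pattern applies to exit reasons by eligibility category.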

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the ECO entry/exit scores for 2021-22
eco = pd.read_excel('../Data/TEIS_data.xlsx', sheet_name='ECO with Exit21-22')
eco

In [None]:
# Load the eligibility timeline report (eligibility category is in column AI)
elig = pd.read_excel('../Data/TEIS_data.xlsx', sheet_name='Elig Timeline Rpt 2018-2022')
elig

In [None]:
elig.head()

In [None]:
eco.head()

In [None]:
elig['Init. Elig. Category'].value_counts()['Developmental Evaluation']

In [None]:
elig['Init. Elig. Category'].value_counts()['Diagnosed Condition']

In [None]:
elig['Init. Elig. Category'].value_counts()['Diagnosed Condition, Developmental Evaluation']

In [None]:
elig.info()

In [None]:
elig['District'].value_counts()['ET']

In [None]:
print(elig.dtypes)
print(eco.dtypes)

In [None]:
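# Missing IDs are filled with a placeholder of 0 so the column can be cast
# to an integer type before merging with the eligibility tab.
eco['CHILD_ID'] = eco['CHILD_ID'].fillna(0).astype(np.int64)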
eco
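`CHILD_ID` most likely arrives as a float column because some rows have blank IDs, and NaN forces a float dtype (an assumption based on the `fillna(0)` step above). The sketch below contrasts that sentinel approach with pandas' nullable `Int64` dtype, which keeps blanks as missing rather than turning them into a shared ID of 0.

```python
import numpy as np
import pandas as pd

# A numeric column with a blank is read as float64 because NaN is a float.
ids = pd.Series([101, 102, np.nan])
print(ids.dtype)  # float64

# Sentinel approach (as in the notebook): blanks become ID 0.
sentinel = ids.fillna(0).astype(np.int64)

# Nullable alternative: blanks stay missing (<NA>) and the dtype is integer.
nullable = ids.astype("Int64")

print(sentinel.tolist())
print(nullable.isna().sum())
```

With the sentinel, every row that lacked an ID shares the value 0, so filter those rows out before any analysis that groups or counts by `CHILD_ID`.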

In [None]:
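# Align the ID column name with the ECO tab, then join on child ID.
# how='outer' keeps children who appear in only one of the two tabs.
elig.rename(columns={'Child ID': 'CHILD_ID'}, inplace=True)
eco_elig = pd.merge(elig, eco, on=['CHILD_ID'], how='outer')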
eco_elig
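Because an outer join keeps children who appear in only one tab, it helps to check how many rows actually matched. The sketch below uses pandas' `indicator=True` option on toy frames standing in for `elig` and `eco`. It also checks for repeated `CHILD_ID` values, since a child listed more than once in either tab would be duplicated by the merge; whether that happens in this workbook is not confirmed here.

```python
import pandas as pd

# Toy frames standing in for `elig` (after the rename) and `eco`.
elig = pd.DataFrame({
    "CHILD_ID": [1, 2, 3],
    "Init. Elig. Category": [
        "Developmental Evaluation", "Diagnosed Condition", "Diagnosed Condition",
    ],
})
eco = pd.DataFrame({"CHILD_ID": [2, 3, 4], "Progress Category": ["c", "d", "e"]})

# Repeated IDs on either side would multiply rows in the merge.
dupes = elig["CHILD_ID"].duplicated().sum() + eco["CHILD_ID"].duplicated().sum()

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = pd.merge(elig, eco, on="CHILD_ID", how="outer", indicator=True)
print(merged["_merge"].value_counts())

# Only rows present in both tabs can be used to compare progress
# across eligibility categories.
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
print(matched)
```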