# AEPS Improvement by Point of Entry

This notebook creates two graphs included in the final presentation to DDID:
- Slide 12 -- Lowest Items: Average Improvement by Point of Entry
- Slide 14 -- Highest Items: Average Improvement by Point of Entry

This notebook also includes other analyses of AEPS scores focused on Point of Entry.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt

In [None]:
%matplotlib inline

In [None]:
# Read in csv created by running aeps_cleaning notebook.
aeps_df = pd.read_csv('../data/aeps_cleansed_data.csv')

### POE Name and Abbreviation
Changing main dataframe to have two columns with POE info:
- **poe_name:** Full name of POE. Renamed existing column 'TEIS Point of Entry Office (POE)'.
- **poe_abbr:** Two letter abbreviation for POE.
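
As a minimal sketch on toy data (not the real AEPS file), the abbreviation lookup can be checked with a left merge that flags any POE name missing from the lookup table. The `tests` and `lookup` frames and the 'Unknown Office' value are invented for illustration.

```python
import pandas as pd

# Toy records; the third name is deliberately absent from the lookup table.
tests = pd.DataFrame({'poe_name': ['East Tennessee', 'First Tennessee', 'Unknown Office']})
lookup = pd.DataFrame({'poe_name': ['East Tennessee', 'First Tennessee'],
                       'poe_abbr': ['ET', 'FT']})

# validate='m:1' raises if a POE name appears more than once in the lookup;
# indicator=True adds a _merge column marking rows that found no match.
merged = tests.merge(lookup, how='left', on='poe_name', validate='m:1', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'poe_name'].tolist()
print(unmatched)
```

Rows listed in `unmatched` would end up with a missing `poe_abbr` and would be dropped by any later groupby on that column.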

In [None]:
aeps_df = aeps_df.rename(columns = {'TEIS Point of Entry Office (POE)': 'poe_name'})

In [None]:
aeps_df['poe_name'].value_counts(dropna = False)

In [None]:
poe_data = [['East Tennessee', 'ET'],
            ['Greater Nashville Tennessee', 'GN'],
            ['Southcentral Tennessee', 'SC'],
            ['First Tennessee', 'FT'],
            ['Memphis/Delta Tennessee', 'MD'],
            ['Upper Cumberland Tennessee', 'UC'],
            ['Southeast Tennessee', 'SE'],
            ['Southwest Tennessee', 'SW'],
            ['Northwest Tennessee', 'NW']]
poe_abbr_df = pd.DataFrame(data = poe_data, columns=['poe_name', 'poe_abbr'])

In [None]:
aeps_df = pd.merge(aeps_df, poe_abbr_df, how = 'left', on = 'poe_name')

### Rename Item and Test Date Columns

In [None]:
aeps_df = aeps_df.rename(columns = {'Test Date': 'test_date'})

In [None]:
aeps_df = aeps_df.rename(columns = {'fm_B4.0':'FM B4.0', 'fm_B5.0':'FM B5.0', 'cog_D2.0':'COG D2.0', 'cog_E2.0':'COG E2.0', 
                                    'cog_E4.0':'COG E4.0', 'cog_F1.0':'COG F1.0', 'cog_G1.0':'COG G1.0', 
                                    'cog_G2.0':'COG G2.0', 'cog_G3.0':'COG G3.0', 'cog_G4.0':'COG G4.0', 
                                    'cog_G5.0':'COG G5.0', 'cog_G6.0':'COG G6.0', 'sc_B1.0':'SC B1.0', 
                                    'sc_B2.0':'SC B2.0', 'sc_D1.0':'SC D1.0', 'sc_D2.0':'SC D2.0', 'sc_D3.0':'SC D3.0'})

### First and Last Test Scores
This section creates dataframes containing only first tests and last tests.
It also filters out records for children who do not have a first and last test at least 183 days apart.
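
The filtering rule can be sketched on toy data as follows. The child IDs and dates are invented, and the named aggregation is just one compact way to get both dates at once; the cells below build the same result with separate min/max frames.

```python
import pandas as pd

# Toy test records: child 1 has tests exactly 183 days apart, child 2 has
# tests 60 days apart, and child 3 has a single test.
toy = pd.DataFrame({
    'Child ID': [1, 1, 2, 2, 3],
    'test_date': pd.to_datetime(['2020-01-01', '2020-07-02',
                                 '2020-01-01', '2020-03-01',
                                 '2020-05-01']),
})

# One row per child with the first and last test dates.
span = (toy.groupby('Child ID')['test_date']
           .agg(first_test_date='min', last_test_date='max')
           .reset_index())
span['diff_days'] = (span['last_test_date'] - span['first_test_date']).dt.days

# Keep only children whose first and last tests are at least 183 days apart.
keep_ids = span.loc[span['diff_days'] >= 183, 'Child ID']
print(span)
print(keep_ids.tolist())
```

A child with only one test gets `diff_days` of 0 and is excluded by the same rule.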

In [None]:
#Makes a copy of aeps_df to use in this section.
test_dates = aeps_df.copy()

In [None]:
test_dates['test_date'] = pd.to_datetime(test_dates['test_date'], errors = 'coerce')

In [None]:
test_dates = test_dates.dropna(subset=['test_date'])
print('Count of all tests: ',test_dates.shape[0])

In [None]:
#First and last tests for each Child ID are determined by finding the min and max test_date for that Child ID.
#If a Child ID has only one test, it is counted as both a first test and a last test (instead of only as 
#a first test). This won't matter a few steps later, once we filter out children that don't have a first and 
#last test at least 183 days apart.

#first_test_list is a dataframe that includes only Child ID and test_date, where test_date contains the date of the 
#child's first test. last_test_list is similar except that test_date contains the date of the child's last test.
first_test_list = test_dates.groupby('Child ID')['test_date'].min().reset_index()
last_test_list = test_dates.groupby('Child ID')['test_date'].max().reset_index()

In [None]:
#Create first_test_data, a dataframe that contains only the rows from test_dates that represent first tests.
first_test_data = pd.merge(first_test_list, test_dates, how = 'inner', on = ['Child ID', 'test_date'])
print('Count of first tests: ',first_test_data.shape[0])

#Create last_test_data, a dataframe that contains only the rows from test_dates that represent last tests.
last_test_data = pd.merge(last_test_list, test_dates, how = 'inner', on = ['Child ID', 'test_date'])
print('Count of last tests: ',last_test_data.shape[0])

In [None]:
#Create a dataframe containing one row for each unique Child ID that shows the dates of their first and last tests
#and the number of days between their first and last tests.

first_test_temp_list = first_test_list.rename(columns = {'test_date': 'first_test_date'})
last_test_temp_list = last_test_list.rename(columns = {'test_date': 'last_test_date'})

child_fl_test_dates = pd.merge(first_test_temp_list, last_test_temp_list, on='Child ID')
child_fl_test_dates['diff_days'] = (child_fl_test_dates['last_test_date'] - child_fl_test_dates['first_test_date']).dt.days
print('Count of unique Child IDs: ',child_fl_test_dates.shape[0])
child_fl_test_dates.head()

In [None]:
child_keep_list = child_fl_test_dates.loc[child_fl_test_dates.diff_days >= 183]
print('Count of Child IDs that have a first and last test at least 183 days apart: ',child_keep_list.shape[0])

In [None]:
#Filter main aeps_df dataframe to only include rows for children with at least 183 days between first and last tests.
#The .copy() makes the filtered frame independent, so columns added later do not trigger SettingWithCopyWarning.
print('Count of unique Child IDs before filtering:', aeps_df['Child ID'].nunique())
print('Count of all tests before filtering: ', aeps_df.shape[0])
aeps_df = aeps_df.loc[aeps_df['Child ID'].isin(child_keep_list['Child ID'])].copy()
print('Count of unique Child IDs after filtering:', aeps_df['Child ID'].nunique())
print('Count of all tests after filtering: ', aeps_df.shape[0])

In [None]:
#Filter first_test_data and last_test_data to only include rows for children with at least 183 days
#between first and last tests.
print('Count of first tests before filtering: ', first_test_data.shape[0])
first_test_data = first_test_data.loc[first_test_data['Child ID'].isin(child_keep_list['Child ID'])]
print('Count of first tests after filtering: ', first_test_data.shape[0])

print('Count of last tests before filtering: ', last_test_data.shape[0])
last_test_data = last_test_data.loc[last_test_data['Child ID'].isin(child_keep_list['Child ID'])]
print('Count of last tests after filtering: ', last_test_data.shape[0])

### Average Scores for First and Last Tests

In [None]:
#These lists are used throughout notebook to select a subset of columns for specific tests.
outcome_b_items = ['FM B4.0', 'FM B5.0', 'COG D2.0', 'COG E2.0', 'COG E4.0', 'COG F1.0', 'COG G1.0', 'COG G2.0', 
                   'COG G3.0', 'COG G4.0', 'COG G5.0', 'COG G6.0', 'SC B1.0', 'SC B2.0', 'SC D1.0', 'SC D2.0', 'SC D3.0']
outcome_b_low_items = ['FM B5.0', 'COG G2.0', 'SC D3.0']
outcome_b_high_items = ['FM B4.0', 'COG E2.0', 'SC B1.0']

In [None]:
poe_ft_scores = first_test_data[['poe_name', 'poe_abbr'] + outcome_b_items]
poe_lt_scores = last_test_data[['poe_name', 'poe_abbr'] + outcome_b_items]

In [None]:
#Create dataframes containing the average scores for each test grouped by POE.
poe_ft_avg_scores = poe_ft_scores.groupby(['poe_name', 'poe_abbr']).agg('mean')
poe_ft_avg_scores = poe_ft_avg_scores.loc[:,'FM B4.0':].reset_index()

poe_lt_avg_scores = poe_lt_scores.groupby(['poe_name', 'poe_abbr']).agg('mean')
poe_lt_avg_scores = poe_lt_avg_scores.loc[:,'FM B4.0':].reset_index()

poe_lt_avg_scores.head()

In [None]:
poe_ft_avg_scores_melt = pd.melt(poe_ft_avg_scores, 
                                 id_vars=['poe_name', 'poe_abbr'],
                                 var_name='item',
                                 value_name='ft_avg_score')
poe_ft_avg_scores_melt.head()

poe_lt_avg_scores_melt = pd.melt(poe_lt_avg_scores, 
                                 id_vars=['poe_name', 'poe_abbr'],
                                 var_name='item',
                                 value_name='lt_avg_score')
poe_lt_avg_scores_melt.head()

### Difference in Average Scores
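
The whole chain (group means, melt to long form, merge, subtract) can be sketched on toy data. The scores below are invented and the helper `avg_long` is a hypothetical name used only for this illustration.

```python
import pandas as pd

# Toy first-test and last-test scores for two POEs and two items.
first = pd.DataFrame({'poe_abbr': ['ET', 'ET', 'GN'],
                      'FM B4.0': [0, 1, 1],
                      'SC B1.0': [1, 1, 0]})
last = pd.DataFrame({'poe_abbr': ['ET', 'ET', 'GN'],
                     'FM B4.0': [1, 2, 1],
                     'SC B1.0': [2, 2, 1]})

def avg_long(df, value_name):
    """Average each item by POE, then reshape to one row per (POE, item)."""
    means = df.groupby('poe_abbr').mean().reset_index()
    return means.melt(id_vars='poe_abbr', var_name='item', value_name=value_name)

# Align first-test and last-test averages on POE and item, then subtract.
diff = avg_long(first, 'ft_avg_score').merge(avg_long(last, 'lt_avg_score'),
                                             on=['poe_abbr', 'item'])
diff['avg_score_diff'] = diff['lt_avg_score'] - diff['ft_avg_score']
print(diff)
```

A positive `avg_score_diff` means the POE's average on that item was higher on last tests than on first tests.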

In [None]:
#Merge results from prior step to create data frame that shows the first test average score, last test average score,
#and average score difference for each combination of POE and item.

poe_avg_scores_diff = pd.merge(poe_ft_avg_scores_melt, 
                                   poe_lt_avg_scores_melt, 
                                   how = 'inner', 
                                   on = ['poe_name','poe_abbr', 'item'])

poe_avg_scores_diff['avg_score_diff'] = poe_avg_scores_diff['lt_avg_score'] - poe_avg_scores_diff['ft_avg_score']
poe_avg_scores_diff.head()

In [None]:
#Create dataframes that include a subset of rows for three low scoring items and for three high scoring items.

poe_low_avg_scores_diff = poe_avg_scores_diff.loc[poe_avg_scores_diff['item'].isin(outcome_b_low_items)]
#poe_low_avg_scores_diff.item.value_counts()

poe_high_avg_scores_diff = poe_avg_scores_diff.loc[poe_avg_scores_diff['item'].isin(outcome_b_high_items)]
#poe_high_avg_scores_diff.item.value_counts()

### Create graphs for lowest and highest items: Average Improvement by Point of Entry
#### These graphs appear on slides 12 and 14 of the final presentation.
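
The cells below place a single x-axis caption by hand with `plt.text`. As an alternative sketch, a figure-level label can serve the same purpose without hard-coded coordinates. This assumes matplotlib 3.4 or later (for `supxlabel`) and seaborn 0.11.2 or later (for `FacetGrid.figure`); the toy data is invented.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch also runs outside a notebook
import pandas as pd
import seaborn as sns

# Toy long-form data: one row per (POE, item) pair.
toy = pd.DataFrame({
    'poe_name': ['East', 'West'] * 3,
    'item': ['A', 'A', 'B', 'B', 'C', 'C'],
    'avg_score_diff': [0.2, 0.4, 0.1, 0.3, 0.5, 0.6],
})

grid = sns.FacetGrid(toy, col='item')
grid.map_dataframe(sns.barplot, x='avg_score_diff', y='poe_name')

# Blank the per-facet x labels, then add one label centered under the whole figure.
grid.set_xlabels('')
shared_label = grid.figure.supxlabel('Difference in Average Score')
```

Depending on the figure size, the bottom margin may need a small adjustment so the shared label does not touch the tick labels.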

In [None]:
#palette = ['#ff0000', '#27365a', '#a6a6a6']
poe_low_avg_diff_fg = sns.FacetGrid(poe_low_avg_scores_diff,col='item', 
                                    hue='item', 
                                    palette = ['#C9CDD6', '#818AF9', '#394768'])
poe_low_avg_diff_plot = poe_low_avg_diff_fg.map_dataframe(sns.barplot,x='avg_score_diff',y='poe_name')

#Normal xlabels are not used because they overlap when they appear under each of the three facet graphs.
#poe_low_avg_diff_plot.set_xlabels('Average Change in Score', 
#                                  color = '#FF0000', 
#                                  font='serif', 
#                                  fontweight = 'bold',
#                                  fontsize = 14)
poe_low_avg_diff_plot.set_xlabels('')
plt.text(x=-0.4, y=10.5, s='Difference in Average Score',
         color = '#FF0000', font='serif', fontsize = 14, fontweight = 'bold')

poe_low_avg_diff_plot.set_ylabels('Point of Entry', 
                                  color = '#FF0000', 
                                  font='serif', 
                                  fontweight = 'bold',
                                  fontsize = 14)

poe_low_avg_diff_fg.set_titles("{col_name}", color = '#FF0000', font='serif', fontweight = 'bold', size = 14)

poe_low_avg_diff_fg.fig.suptitle('Lowest Items: Average Improvement by Point of Entry', 
                                 color = '#FF0000', 
                                 font='serif', 
                                 fontsize = 20,
                                 fontweight = 'bold',
                                 y=1.17);

In [None]:
poe_high_avg_diff_fg = sns.FacetGrid(poe_high_avg_scores_diff,col='item', 
                                    hue='item', 
                                    palette = ['#C9CDD6', '#818AF9', '#394768'])
poe_high_avg_diff_plot = poe_high_avg_diff_fg.map_dataframe(sns.barplot,x='avg_score_diff',y='poe_name')

poe_high_avg_diff_plot.set_xlabels('')
plt.text(x=-1.6, y=10.5, s='Difference in Average Score',
         color = '#FF0000', font='serif', fontsize = 14, fontweight = 'bold')

poe_high_avg_diff_plot.set_ylabels('Point of Entry', 
                                  color = '#FF0000', 
                                  font='serif', 
                                  fontweight = 'bold',
                                  fontsize = 14)
poe_high_avg_diff_fg.set_titles("{col_name}", color = '#FF0000', font='serif', fontweight = 'bold', size = 14)
poe_high_avg_diff_fg.fig.suptitle('Highest Items: Average Improvement by Point of Entry', 
                                 color = '#FF0000', 
                                 font='serif', 
                                 fontsize = 20,
                                 fontweight = 'bold',
                                 y=1.17)

plt.savefig('../data/High_Items_Avg_Change_By_POE.png');

In [None]:
#Create three csv files with data that can be sent to Shannon at DDID. Two of the files contain the data used to 
#create the two graphs (see above) included in the final presentation. The third file contains similar data
#but includes all of the items relevant to Outcome B.

all_item_save = poe_avg_scores_diff.rename(columns = {'ft_avg_score': 'first_test_avg_score',
                                                      'lt_avg_score': 'last_test_avg_score'})
all_item_save.to_csv('../data/All_Items_Improv_by_POE.csv')

low_item_save = poe_low_avg_scores_diff.rename(columns = {'ft_avg_score': 'first_test_avg_score',
                                                          'lt_avg_score': 'last_test_avg_score'})
low_item_save.to_csv('../data/Slide12_Low_Items_Improv_by_POE.csv')

high_item_save = poe_high_avg_scores_diff.rename(columns = {'ft_avg_score': 'first_test_avg_score',
                                                          'lt_avg_score': 'last_test_avg_score'})
high_item_save.to_csv('../data/Slide14_High_Items_Improv_by_POE.csv')

# Additional analysis not used in final presentation

### Average Scores by POE and Item

In [None]:
poe_scores = aeps_df[['poe_name', 'poe_abbr'] + outcome_b_items]

In [None]:
poe_avg_scores = poe_scores.groupby(['poe_name', 'poe_abbr']).agg('mean')
poe_avg_scores = poe_avg_scores.loc[:,'FM B4.0':].reset_index()

In [None]:
poe_avg_scores_melt = pd.melt(poe_avg_scores, 
                              id_vars=['poe_name', 'poe_abbr'],
                              var_name='item',
                              value_name='avg_score')
poe_avg_scores_melt.head()

In [None]:
poe_avg_fg = sns.FacetGrid(poe_avg_scores_melt,col='poe_abbr',col_wrap=5)
poe_avg_plot = poe_avg_fg.map_dataframe(sns.barplot,x='avg_score',y='item')

poe_avg_fg.fig.suptitle('All Tests: Average Scores by POE and Item (version 1)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05)

plt.savefig('../data/Avg_Scores_By_POE.png');

In [None]:
poe_avg_fg_2 = sns.FacetGrid(poe_avg_scores_melt,col='item',col_wrap=6)
poe_avg_plot_2 = poe_avg_fg_2.map_dataframe(sns.barplot,x='avg_score',y='poe_abbr')

poe_avg_fg_2.fig.suptitle('All Tests: Average Scores by POE and Item (version 2)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05)

plt.savefig('../data/Avg_Scores_By_POE_2.png');

### Average Scores for First and Last Tests

In [None]:
poe_ft_avg_fg = sns.FacetGrid(poe_ft_avg_scores_melt,col='poe_abbr',col_wrap=5)
poe_ft_avg_plot = poe_ft_avg_fg.map_dataframe(sns.barplot,x='ft_avg_score',y='item')

poe_ft_avg_fg.fig.suptitle('First Tests: Average Scores by POE and Item (version 1)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05);

In [None]:
poe_ft_avg_fg = sns.FacetGrid(poe_ft_avg_scores_melt,col='item',col_wrap=6)
poe_ft_avg_plot = poe_ft_avg_fg.map_dataframe(sns.barplot,x='ft_avg_score',y='poe_abbr')

poe_ft_avg_fg.fig.suptitle('First Tests: Average Scores by POE and Item (version 2)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05);

In [None]:
poe_lt_avg_fg = sns.FacetGrid(poe_lt_avg_scores_melt,col='poe_abbr',col_wrap=5)
poe_lt_avg_plot = poe_lt_avg_fg.map_dataframe(sns.barplot,x='lt_avg_score',y='item')

poe_lt_avg_fg.fig.suptitle('Last Tests: Average Scores by POE and Item (version 1)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05);

In [None]:
poe_lt_avg_fg = sns.FacetGrid(poe_lt_avg_scores_melt,col='item',col_wrap=6)
poe_lt_avg_plot = poe_lt_avg_fg.map_dataframe(sns.barplot,x='lt_avg_score',y='poe_abbr')

poe_lt_avg_fg.fig.suptitle('Last Tests: Average Scores by POE and Item (version 2)', 
                        color = '#FF0000', 
                        font='serif', 
                        fontsize = 20,
                        fontweight = 'bold',
                        y=1.05);

# Domain Score Analysis

### Domain Percentage Boxplots
The following graph shows the distribution of FM Percentage values across all tests and all children.
- FM Percentage is the sum of the scores for all FM domain items as a percentage of the maximum possible total score for all FM domain items. 
- FM Percentage **includes all FM domain items, not just those relevant to Outcome B.**
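
As a rough sketch of that definition, a domain percentage can be computed from item scores as shown here. The item names are placeholders, and the maximum item score of 2 is an assumption; the scale of the 'FM Percentage' column in the source file (0 to 1 or 0 to 100) is not confirmed here.

```python
import pandas as pd

MAX_ITEM_SCORE = 2  # assumption: each item is scored 0, 1, or 2

# Toy scores for three children on three placeholder items.
scores = pd.DataFrame({'item_a': [2, 0, 1],
                       'item_b': [2, 1, 1],
                       'item_c': [2, 0, 0]})

# Sum of item scores as a percentage of the maximum possible total.
raw = scores.sum(axis=1)
pct = 100 * raw / (MAX_ITEM_SCORE * scores.shape[1])
print(pct.round(2).tolist())
```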

In [None]:
sns.boxplot(data = aeps_df,
            x = 'FM Percentage');

In [None]:
#The following graph shows the distribution of COG Percentage values across all tests 
#and all children. It includes all COG domain items, not just those relevant to Outcome B.
sns.boxplot(data = aeps_df,
            x = 'Cog Percentage');

In [None]:
#The following graph shows the distribution of SC Percentage values across all tests 
#and all children. It includes all SC domain items, not just those relevant to Outcome B.
sns.boxplot(data = aeps_df,
            x = 'SC Percentage');

In [None]:
#The following graph shows the distribution of FM Percentage values across all tests 
#and all children, broken out by POE. It includes all FM domain items, not just those 
#relevant to Outcome B.

sns.boxplot(data = aeps_df,
            x = 'FM Percentage',
            y = 'poe_name');

In [None]:
#The following graph shows the distribution of COG Percentage values across all tests 
#and all children, broken out by POE. It includes all COG domain items, not just those 
#relevant to Outcome B.

sns.boxplot(data = aeps_df,
            x = 'Cog Percentage',
            y = 'poe_name');

In [None]:
#The following graph shows the distribution of SC Percentage values across all tests 
#and all children, broken out by POE. It includes all SC domain items, not just those 
#relevant to Outcome B.

sns.boxplot(data = aeps_df,
            x = 'SC Percentage',
            y = 'poe_name');

#### The following graphs only include items for each domain that are relevant to Outcome B.
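
When most children score zero on a small set of items, a boxplot marks every nonzero value as an outlier. The sketch below shows why, using the usual 1.5 × IQR whisker rule (the default in seaborn and matplotlib) on invented scores.

```python
import numpy as np

# Toy scores: nine zeros and one small nonzero value.
scores = np.array([0] * 9 + [0.25])

q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1

# Points beyond q3 + 1.5 * IQR are drawn as outliers.
upper_fence = q3 + 1.5 * iqr
outliers = scores[scores > upper_fence]
print(upper_fence, outliers.tolist())
```

Because both quartiles are zero, the box and whiskers collapse to a line at zero and every positive score falls beyond the upper fence.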

In [None]:
#Compute an Outcome B score for the FM domain using only the FM items relevant to Outcome B.
#The divisor of 4 assumes each of the two items has a maximum score of 2, so objb_fm_pct is a
#fraction between 0 and 1. The next cell plots its distribution across all tests and all
#children, broken out by POE.

#Note that the scores for FM B4.0 and FM B5.0 are so low that any score above zero 
#is considered an outlier!

aeps_df['objb_fm_raw'] = aeps_df['FM B4.0']+aeps_df['FM B5.0']
aeps_df['objb_fm_pct'] = aeps_df['objb_fm_raw']/4

In [None]:
sns.boxplot(data = aeps_df,
            x = 'objb_fm_pct',
            y = 'poe_name');

In [None]:
#The following graph shows the distribution of COG Percentage values across all tests 
#and all children, broken out by POE. It includes only COG domain items that are relevant
#to Outcome B.

aeps_df['objb_cog_raw'] = (aeps_df['COG D2.0'] + aeps_df['COG E2.0'] + aeps_df['COG E4.0'] 
                           + aeps_df['COG F1.0'] + aeps_df['COG G1.0'] + aeps_df['COG G2.0'] 
                           + aeps_df['COG G3.0'] + aeps_df['COG G4.0'] + aeps_df['COG G5.0'] 
                           + aeps_df['COG G6.0'])
aeps_df['objb_cog_pct'] = aeps_df['objb_cog_raw'] / 20

In [None]:
sns.boxplot(data = aeps_df,
            x = 'objb_cog_pct',
            y = 'poe_name');

In [None]:
#The following graph shows the distribution of SC Percentage values across all tests 
#and all children, broken out by POE. It includes only SC domain items that are relevant
#to Outcome B.

aeps_df['objb_sc_raw'] = aeps_df['SC B1.0'] + aeps_df['SC B2.0'] + aeps_df['SC D1.0'] + aeps_df['SC D2.0'] + aeps_df['SC D3.0']
aeps_df['objb_sc_pct'] = aeps_df['objb_sc_raw'] / 10

In [None]:
sns.boxplot(data = aeps_df,
            x = 'objb_sc_pct',
            y = 'poe_name');