# Meaning/Usage of Applicants Gap

*this kernel is a continuation of the [last one][1]*

We estimated the applicants gap... Okay. But what does it really mean?

It is the difference between what we expected and what really happened. What does this difference mean, though?

It shows any relation that wasn't captured by our model. That is, it measures the effect of factors not present in the following list:

- Charter school?
- Ethnicity
- English language learners?
- Students with disabilities?
- Economic need of students?
- Scores in the NYS yearly tests
- Chronic absenteeism rate

This is a vague definition, I know, but the effects that the gap captures are relevant. They may include things such as:

- Lack of awareness in schools
- Shortage of means with which students can apply to SHSAT easily
- Students underestimating their chances of receiving an offer from Specialized High Schools
- Etc<sup>1</sup>

When a big gap is present, several of these factors may be at play. And this is where PASSNYC can come in. If something works differently at a school, PASSNYC can go there, understand what is happening, and help where it can.

<sub>1: One important factor that I didn't mention is noise. This relationship is better understood in the following kernel:</sub>

[1]: https://www.kaggle.com/araraonline/measuring-the-applicants-gap-in-2017
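
To make the definition above more concrete, here is a minimal sketch of the gap as "expected minus actual". The data is made up, only two of the listed factors are used, and ordinary least squares stands in for the real model from the previous kernel, so treat every name and number here as an assumption.

```python
import numpy as np

# Hypothetical toy data for 6 schools (not real values).
# Columns: economic need index, chronic absenteeism rate.
X = np.array([
    [0.2, 0.05],
    [0.4, 0.10],
    [0.5, 0.15],
    [0.6, 0.20],
    [0.8, 0.25],
    [0.9, 0.30],
])
# Made-up share of students who sat for the SHSAT at each school.
actual_pct = np.array([0.45, 0.38, 0.20, 0.28, 0.18, 0.15])

# Ordinary least squares with an intercept stands in for the real model.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, actual_pct, rcond=None)
expected_pct = design @ coef

# Gap = expected minus actual; a positive gap means fewer testers than expected.
gap = expected_pct - actual_pct

# The school with the largest gap is the one the factors explain worst.
largest_gap_school = int(np.argmax(gap))
```

In this toy setup, the third school has far fewer testers than its two factors suggest, so it ends up with the largest gap. Whatever causes that shortfall is, by construction, something outside the factors given to the model.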

# But, how should we use the gap, exactly?

Well, the specifics are up to PASSNYC. I can give some suggestions, though.

- As a way of filtering schools

   PASSNYC won't be very effective if it targets schools that already have a high number of applicants (compared to what would be expected of them). So, one might filter out schools that have a small or negative gap.
   
- Alongside an attractiveness score (please take a look at [this kernel][1] for an example of what I'm talking about)

   The applicants gap can possibly be seen as an estimate of the number of students PASSNYC could convert into applicants if it intervenes in a school. If this potential is combined with a measure of attractiveness, the result is a value that balances what PASSNYC could achieve at a school against how much it wants to be there.

- If everything goes wrong, it can still be used as an aid for choosing schools. Let's say, for example, that PASSNYC discovers that what the gap measures is some sort of characteristic that we can't even dream of changing. Well... Now we know which schools have more or less of it.<sup>1</sup>

<sub>1: In this situation, though, I would suggest using the model directly instead of the measured gap. The model could give the influence of characteristics that PASSNYC *is* able to change (as opposed to the hypothetical gap failure).</sub>
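
The first two suggestions can be sketched together. This is only an illustration: the school names, the `attractiveness` column, the `MIN_GAP` threshold, and the choice of multiplying gap by attractiveness are all assumptions, not something PASSNYC or the previous kernels prescribe.

```python
import pandas as pd

# Hypothetical schools with a percentage gap and an attractiveness score.
schools = pd.DataFrame(
    {
        "gap_pct": [0.12, -0.03, 0.01, 0.08],
        "attractiveness": [0.5, 0.9, 0.7, 0.2],
    },
    index=["School A", "School B", "School C", "School D"],
)

# Suggestion 1: drop schools whose gap is too small to be interesting.
MIN_GAP = 0.05  # hypothetical threshold
candidates = schools[schools["gap_pct"] >= MIN_GAP].copy()

# Suggestion 2: combine potential (gap) and desire (attractiveness).
# A plain product is one simple option among many.
candidates["priority"] = candidates["gap_pct"] * candidates["attractiveness"]
ranked = candidates.sort_values("priority", ascending=False)
```

Here School B is very attractive but already has more testers than expected, so it is filtered out, while School A survives the filter and ranks first. A weighted sum or a rank-based combination would work just as well; the product simply makes a school score low whenever either ingredient is low.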

# This gap is nonsensical by itself, can you make it a bit more palpable?

Sure thing! Here is a list of schools freshly taken from the [NYC Open Data][1] and [NYC InfoHub][2]. It already contains the measured gaps in counts and percentages, alongside a ratio of the estimated gap divided by the number of students who sat for the SHSAT.

(Schools with up to 5 applicants were treated as having exactly 5 applicants, as their low counts already put a spotlight on them.)

[1]: https://opendata.cityofnewyork.us/
[2]: https://infohub.nyced.org/
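
Before the real data, here is a small worked example of the three measures on made-up numbers. The column names and values are hypothetical, and it assumes that a missing tester count marks a school with up to 5 applicants, which is how the `fillna(5)` in the cells that follow treats it.

```python
import numpy as np
import pandas as pd

# Two hypothetical schools (values are made up).
toy = pd.DataFrame(
    {
        "students": [100, 200],        # students in HS admissions
        "testers": [np.nan, 50],       # NaN: count not reported (assumed <= 5)
        "expected_pct": [0.20, 0.30],  # share of testers the model expects
    },
    index=["School X", "School Y"],
)

# Treat unreported counts as exactly 5 testers.
testers = toy["testers"].fillna(5)

# Expected number of testers, then the gap in counts and in shares.
expected_cnt = toy["expected_pct"] * toy["students"]
gap_cnt = expected_cnt - testers
gap_pct = toy["expected_pct"] - testers / toy["students"]

# Gap relative to the number of students who actually sat for the test.
gap_ratio = gap_cnt / testers
```

School X is expected to have about 20 testers but is counted as having 5, so its gap is about 15 students, 0.15 as a share, and 3 as a ratio. School Y has a gap of about 10 students, but since 50 students already sat for the test, its ratio is only about 0.2.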

In [None]:
import numpy as np
import pandas as pd

pd.set_option('display.max_columns', None)

In [None]:
# School-level data for 2017
base_df = pd.read_pickle('../data/process/schools2017.pkl')

# Expected share of SHSAT testers per school, split across two files
ex1 = pd.read_csv('../data/keep/expected_testers.csv', index_col=0)
ex2 = pd.read_csv('../data/keep/expected_testers2.csv', index_col=0)
ex = pd.concat([ex1, ex2])

In [None]:
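# Number of students taking part in HS admissions at each school
hs_counts = base_df['# Students in HS Admissions']

# Actual testers; missing counts are treated as exactly 5 (see note above)
cnt_shsat_testers = base_df['# SHSAT Testers'].fillna(5)
pct_shsat_testers = cnt_shsat_testers / hs_counts
# Expected testers according to the model, as a count and as a share
ex_cnt_shsat_testers = ex['Expected % of SHSAT Testers'] * hs_counts
ex_pct_shsat_testers = ex['Expected % of SHSAT Testers']

# Gaps: expected minus actual, so positive means fewer testers than expected
df_cnt_shsat_testers = ex_cnt_shsat_testers - cnt_shsat_testers
df_pct_shsat_testers = ex_pct_shsat_testers - pct_shsat_testers
# Gap in counts relative to the number of students who actually sat the test
df_ratio = df_cnt_shsat_testers / cnt_shsat_testers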

In [None]:
table = pd.DataFrame({
    'Estimated Sit #': ex_cnt_shsat_testers,
    'Actual Sit #': cnt_shsat_testers,
    'Diff Sit #': df_cnt_shsat_testers,
    
    'Estimated Sit %': ex_pct_shsat_testers,
    'Actual Sit %': pct_shsat_testers,
    'Diff Sit %': df_pct_shsat_testers,
    
    'Difference Ratio': df_ratio,
})
joined = base_df.join(table)
joined.head()

In [None]:
# joined.to_csv('deliver.csv')