## MealsCount Algorithm (v2)
  
This notebook details the implementation of an algorithm that groups schools (within a given school district) to maximize the federal funds received through the [**C**ommunity **E**ligibility **P**rovision](https://www.fns.usda.gov/school-meals/community-eligibility-provision). The groupings generated by the algorithm are near-optimal, optimality being constrained by the need to limit computational complexity.  
  
### Background  
  
Currently, the Federal government, through the [Food and Nutrition Service](https://www.usda.gov/topics/food-and-nutrition) of the US Dept of Agriculture, offers multiple programs to provide free and/or subsidized meals to school-going children. These programs are targeted at students from low-income families. The CEP is one such program.  
  
School districts apply to enrol their schools in CEP once every 4 years (or each year, under certain circumstances). **CEP eligibility criteria** are listed in detail [here](https://www.cde.ca.gov/ls/nu/sn/cepfactsheet.asp). The program allows schools within a school district to enrol individually or in groups (a minimum of 2 schools per group, up to the total number of schools in the district). There is no limit on the number of groups per school district. A school can only be part of one group (?). Further, groups may contain schools of different types (charter, non-charter and so on).  
  
### Problem Statement  
  
While schools enrolled in CEP **must** serve meals to **all** students, the percentage of such meals covered by federal funds is computed based on the Identified Student Percentage (ISP) of each school. Specifically, it is given by the below formula:  
> *__% Meals Covered__* = *__ISP__ X __1.6__*  
  
This implies that in order to be fully (100%) funded the school (or school group) must have an ISP of at least 62.5% (since *62.5 X 1.6 = 100*). For schools (or school groups) with less than 62.5% ISP, the percentage of meals covered by federal funds decreases on a sliding scale until it reaches a minimum of 64% (since a minimum ISP of 40% is required for CEP enrolment, and *40 X 1.6 = 64*). Any meals not funded by CEP will have to be paid for by the student, or by the school itself in case of the student's inability to pay for the same. The latter is more common than not and leaves schools burdened with debt from partially subsidized meals. It is therefore in the school's best interest to meet the 62.5% ISP threshold for full coverage, either by itself, or as part of a school group.  
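
The sliding scale described above can be sketched as a small function. This is a minimal illustration rather than part of the notebook's code: the name `pct_meals_covered` and its default arguments are hypothetical, and the 40% minimum and the 1.6 multiplier are taken from the text.

```python
def pct_meals_covered(isp_pct, min_isp_pct=40.0, multiplier=1.6):
    """Percentage of meals covered by federal funds for a given ISP (in %).

    Hypothetical helper: below the minimum ISP the school (or group) is
    not CEP eligible, so nothing is covered; coverage is capped at 100%.
    """
    if isp_pct < min_isp_pct:
        return 0.0
    return min(isp_pct * multiplier, 100.0)


# A few points on the sliding scale
for isp in (30.0, 40.0, 50.0, 62.5, 75.0):
    print(isp, round(pct_meals_covered(isp), 2))
```

Any ISP at or above 62.5% yields full coverage, which is why a group ISP far above that threshold brings no additional funding.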
  
Currently, school groups within a school district are generated manually (through school officials interacting with spreadsheet data). This often results in sub-optimal groupings leading to either many schools not qualifying for CEP entirely, or failing to get adequate funding for meals served.  
  
### The MealsCount Solution    
  
The MealsCount approach to address the sub-optimalities mentioned above is to use an algorithm to generate the school groups. The algorithm is designed with the following optimization criteria:    
  
1. Maximize the percentage of meals funded by CEP, on a per school basis
2. Maximize the number of schools (i.e.: number of students) enrolled in CEP, on a per district basis  
  
In concrete terms, (1) attempts to generate school groups whose aggregated ISP is at 62.5%, and neither much lower nor much higher than that. (2) attempts to increase the percentage of schools that belong to a CEP-eligible group (i.e.: a group whose aggregated ISP is 40% or more, ideally no more than 62.5%) so that it is at or near 100% for the school district.

### Algorithm Design   
  
Generating sets of unique groups (i.e.: school groups in our case) from within a large set (i.e.: school district) is, at its core, a combinatorics problem. More specifically, it falls in the realm of [combinatorial optimization](https://en.wikipedia.org/wiki/Combinatorial_optimization). A set with *__n__* elements has *__2<sup>n</sup>__* unique subsets. A typical school district has anywhere from 15 to 30 schools, resulting in anywhere from 32K (2<sup>15</sup>) to 1B (2<sup>30</sup>) unique groups that would have to be searched for the above optimization criteria. At this size the problem is not trivial but nevertheless manageable. However, it is not uncommon to find school districts with anywhere from 100 to 1000 schools (e.g.: LA Unified has 1000+ schools). This leads to an unimaginably large search space, rendering a brute-force solution infeasible. Any practical solution to the problem will only be **near-optimal**.   
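
To make the growth of the search space concrete, the short sketch below counts the subsets for a few district sizes. The helper `num_subsets` is a hypothetical name used only for this illustration.

```python
def num_subsets(n):
    """Number of distinct subsets of a set with n elements (2**n)."""
    return 2 ** n


for n in (15, 30, 100, 1000):
    count = num_subsets(n)
    # Print the number of decimal digits instead of the full value for large n
    print("n={:>4}: {} digits".format(n, len(str(count))))
```

For 15 and 30 schools the counts are 32,768 and 1,073,741,824; for 1000 schools the count has over 300 decimal digits, which is what rules out an exhaustive search.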

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import pprint

# display related
from IPython.display import display, HTML

In [2]:
import backend_utils as bu
import config_parser as cp

In [3]:
CWD = os.getcwd()

DATADIR = "data"
DATAFILE = "calpads_sample_data.xlsx"

CONFIG_FILE = "config.json"

##### Algorithm Inputs  
  
The algorithm takes the following inputs:  
  
* School District Input: this contains information needed to compute per-school ISP  
* Configuration: this contains school meal rates, ISP thresholds among other information

In [4]:
data_in = bu.mcXLSchoolDistInput(os.path.join(DATADIR,DATAFILE))
df = data_in.to_frame()
df.head(n=3)

Unnamed: 0,school_code,school_name,total_enrolled,frpm,foster,homeless,migrant,direct_cert,frpm_nodup,el,frpm_el_nodup,school_type
0,1000001,School NC01,37,4,27,0,0,6,29,5,30,non-charter
1,1000002,School NC02,1111,503,2,7,0,215,527,122,556,non-charter
2,1000003,School NC03,2332,897,2,14,0,440,979,169,1037,non-charter


In [5]:
cfg = cp.mcModelConfig(CONFIG_FILE)
cfg.show()



MealsCount Model Configuration
------------------------------
Version: 1.0
Min CEP Threshold (%): 0.4
Max CEP Threshold (%): 0.625
CEP Rates Table:
         nslp_lunch_free_rate  nslp_lunch_paid_rate  sbp_bkfst_free_rate  \
default                  3.23                  0.31                 1.75   
AK                       5.24                  0.50                 2.79   
HI                       3.78                  0.36                 2.03   
PR                       3.78                  0.36                 2.03   

         sbp_bkfst_paid_rate  
default                 0.30  
AK                      0.45  
HI                      0.34  
PR                      0.34  


##### Compute ISP  
  
The ISP for each school in the school district is computed from the CALPADS school district input data as below:  
> ISP = (Foster + Homeless + Migrant + Direct Certification) / Total Enrollment

In [6]:
# remove aggregated records
df = df[df['school_name']!='total']

In [7]:
total_eligible = (df['foster'] + df['homeless'] + df['migrant'] + df['direct_cert'])
s = (total_eligible/df['total_enrolled']) * 100
df = df.assign(total_eligible=total_eligible)
df = df.assign(isp=s)
df.loc[:,'isp'] = np.around(df['isp'].astype(np.double),2)

In [8]:
df.head(n=3)

Unnamed: 0,school_code,school_name,total_enrolled,frpm,foster,homeless,migrant,direct_cert,frpm_nodup,el,frpm_el_nodup,school_type,total_eligible,isp
0,1000001,School NC01,37,4,27,0,0,6,29,5,30,non-charter,33,89.19
1,1000002,School NC02,1111,503,2,7,0,215,527,122,556,non-charter,224,20.16
2,1000003,School NC03,2332,897,2,14,0,440,979,169,1037,non-charter,456,19.55


Sort schools within the district by their ISP in descending order (higher ISP schools appear earlier than lower ISP ones).

In [9]:
KEEP_COLS = ['school_code','total_enrolled','total_eligible','isp']

# remove cols not needed for further analysis
drop_cols = [s for s in df.columns.tolist() if s not in set(KEEP_COLS)]
df.drop(drop_cols,axis=1,inplace=True)

In [10]:
df.sort_values('isp',ascending=False,inplace=True)
df.reset_index(inplace=True)
df.drop('index',axis=1,inplace=True)

Compute cumulative ISPs for the entire district.

In [11]:
df = df.assign(cum_isp=np.around((df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100,2))
df.head(n=3)

Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
0,1000001,37,33,89.19,89.19
1,1000027,24,13,54.17,75.41
2,1000022,366,190,51.91,55.27
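
The `cum_isp` values in the output above can be reproduced by hand. The following is a minimal sketch in plain Python (no pandas) using the three rows shown; the variable names are illustrative only.

```python
from itertools import accumulate

# (total_enrolled, total_eligible) for the three highest-ISP schools shown above
schools = [(37, 33), (24, 13), (366, 190)]

# Running totals of enrolment and of identified (eligible) students
cum_enrolled = list(accumulate(enrolled for enrolled, _ in schools))
cum_eligible = list(accumulate(eligible for _, eligible in schools))

# Cumulative ISP = running eligible total / running enrolment total
cum_isp = [round(100 * g / e, 2) for e, g in zip(cum_enrolled, cum_eligible)]
print(cum_isp)  # [89.19, 75.41, 55.27]
```

Because the schools are sorted by ISP in descending order, each additional school can only pull the cumulative ISP down (or leave it unchanged).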


###### Binning Strategies  

In [12]:
NUM_ISP_BINS_MAX = 10
NUM_ISP_BINS_MIN = 5
NUM_ISP_BINS_DEFAULT = NUM_ISP_BINS_MAX

In [13]:
%%capture
'''
# generating bins of fixed width
groups = df.groupby(pd.cut(df['isp'], NUM_ISP_BINS_DEFAULT))
grp_counts = pd.DataFrame(groups.size()).rename(columns={0:'count'})
ivals = ['{0:.2f}-{1:.2f}'.format(s.left,s.right) for s in grp_counts.index.values]
size = [round(s.right-s.left,2) for s in grp_counts.index.values]
grp_counts = grp_counts.assign(ival=ivals)
grp_counts = grp_counts.assign(size=size)
grp_counts.reset_index(inplace=True)
grp_counts.drop('isp',axis=1,inplace=True)
grp_counts.T
'''

'''
# generating bins with uniform distribution => variable bin width
def ival_str(x):
    s = '{}-{}'.format(x.min(),x.max())
    return s

def ival_size(x):
    return round(x.max()-x.min(),2)

groups = df.groupby(pd.cut(df.index, NUM_ISP_BINS_DEFAULT,precision=0))
grp_counts = pd.DataFrame(groups.size()).reset_index().drop(['index'],axis=1).rename(columns={0:'count'})
ivals = pd.Series(groups['isp'].agg([('isp',ival_str)])['isp'].values)
size = pd.Series(groups['isp'].agg([('isp',ival_size)])['isp'].values)
grp_counts = grp_counts.assign(ival=ivals)
grp_counts = grp_counts.assign(size=size)
grp_counts.T
'''

##### Binning Schools  
  
We first bin schools based on the combined ISP (i.e.:`cum_isp`) required for CEP eligibility at 100% funding level.

In [14]:
def group_isp(x):
    return round((x.total_eligible.sum()/x.total_enrolled.sum())*100,2)

In [15]:
def group_summary(groups):
    group_df = pd.DataFrame(groups.size()).rename(columns={0:'count'})
    group_df = group_df.assign(grp_isp=groups.apply(group_isp).values)
    group_df = group_df.assign(grp_total_enrolled=groups['total_enrolled'].agg(['sum']).values)
    group_df = group_df.assign(grp_total_eligible=groups['total_eligible'].agg(['sum']).values)
    return group_df

In [16]:
bins = [0.,cfg.max_cep_thold_pct()*100,100.]
groups = df.groupby(pd.cut(df['cum_isp'], bins))
group_df = group_summary(groups)
group_df

Unnamed: 0_level_0,count,grp_isp,grp_total_enrolled,grp_total_eligible
cum_isp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"(0.0, 62.5]",31,32.02,40258,12891
"(62.5, 100.0]",2,75.41,61,46


We then compute the impact of the rest of the schools, each one taken individually, if they were to be brought into the group that meets the 62.5% cutoff needed for 100% funding level. Our aim is to continue to maintain the aggregate ISP of this group at 62.5 (or higher).

In [17]:
def isp_impact(x,dst_grp_total_enrolled,dst_grp_total_eligible,dst_grp_isp):
    # ISP of the destination group if each school in x (taken individually)
    # were added to it; dst_grp_isp is accepted for reference but is not used
    new_total_enrolled = x['total_enrolled'] + dst_grp_total_enrolled
    new_isp = np.around((((x['total_eligible'] + dst_grp_total_eligible)/new_total_enrolled)*100).astype(np.double),2)

    return pd.DataFrame({'new_isp':new_isp})

ivals = group_df.index.tolist()
dst = group_df.loc[ivals[1],:]
grp = groups.get_group(ivals[0])

tmp_df = isp_impact(grp,dst['grp_total_enrolled'],dst['grp_total_eligible'],dst['grp_isp'])   
tmp_df.T

Unnamed: 0,2,3,4,5,6,7,8,9,10,11,...,23,24,25,26,27,28,29,30,31,32
new_isp,55.27,53.11,53.52,58.33,49.29,46.75,49.69,47.47,47.06,46.89,...,33.02,45.65,23.04,20.98,18.51,17.15,15.77,37.42,14.72,13.65
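
As a check on the first two `new_isp` values above, here is a minimal scalar sketch of the same computation. The function `isp_after_adding` is a hypothetical stand-in for `isp_impact`; the group totals (61 enrolled, 46 eligible) and the two candidate schools are taken from the outputs in this notebook.

```python
FULL_FUNDING_ISP = 62.5  # ISP (%) needed for 100% funding


def isp_after_adding(grp_enrolled, grp_eligible, sch_enrolled, sch_eligible):
    """Group ISP (%) if a single school were added to the group."""
    return round(100 * (grp_eligible + sch_eligible) / (grp_enrolled + sch_enrolled), 2)


grp_enrolled, grp_eligible = 61, 46  # the (62.5, 100.0] group
candidates = {"1000022": (366, 190), "1000017": (792, 407)}

for code, (enrolled, eligible) in candidates.items():
    new_isp = isp_after_adding(grp_enrolled, grp_eligible, enrolled, eligible)
    print(code, new_isp, new_isp >= FULL_FUNDING_ISP)
```

Both candidates would pull the group below 62.5%, which is consistent with no school being added to the fully funded group in this sample.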


In [18]:
# Function to create a summary dataframe for the school group specified as input
def school_group_summary(sch_grp_df):
        summary = sch_grp_df[['total_enrolled','total_eligible']].aggregate(['sum'])        
        summary = summary.assign(grp_isp=round((summary['total_eligible']/summary['total_enrolled'])*100,2))            
        summary = summary.assign(size=sch_grp_df.shape[0])
        grp_isp = summary.loc['sum','grp_isp']
        free_rate = round(grp_isp * 1.6,2) if grp_isp >= (cfg.min_cep_thold_pct()*100) else 0.0
        free_rate = 100. if free_rate > 100. else free_rate
        summary = summary.assign(free_rate=free_rate)
        paid_rate = (100.0 - free_rate)
        summary = summary.assign(paid_rate=paid_rate)
        
        return summary

In [19]:
df_cep_0 = groups.get_group(ivals[1]).apply(list).apply(pd.Series)

idx = tmp_df[tmp_df['new_isp'] >= (cfg.max_cep_thold_pct()*100)].index
if len(idx) > 0:
    schools_to_add = df.loc[idx,:]
    df_cep_0 = pd.concat([df_cep_0, schools_to_add],axis=0)
    
display(HTML('GRP 0'))
display(HTML(school_group_summary(df_cep_0).to_html()))
display(HTML(df_cep_0.to_html()))    

Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,61,46,75.41,2,100.0,0.0


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
0,1000001,37,33,89.19,89.19
1,1000027,24,13,54.17,75.41


With the remaining schools the objective is no longer about getting to 100% funding; rather, it has to do with maximizing CEP eligibility while continuing to achieve as high a funding rate as possible (on a per school basis). High ISP schools are still prioritized overall (as before), except that now multiple school group combinations are generated, some favoring CEP coverage and others favoring a higher funding level for high ISP schools.

In [20]:
# drop schools that are already part of a group
drop_idx = df_cep_0.index
df.drop(drop_idx,axis=0,inplace=True)

In [21]:
df = df.assign(cum_isp=np.around((df['total_eligible'].cumsum()/df['total_enrolled'].cumsum()).astype(np.double)*100,2))
df.head(n=3)

Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
2,1000022,366,190,51.91,51.91
3,1000017,792,407,51.39,51.55
4,1000029,507,258,50.89,51.35


In [22]:
df.tail(n=3)

Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
30,1000018,94,12,12.77,34.63
31,1000023,1712,215,12.56,33.62
32,1000009,3031,376,12.41,32.02


**Note**: With the first two schools removed (as part of the 62.5% grouping), the cumulative ISPs of the remaining schools have changed noticeably (for example, the cumulative ISP at index 2 drops from 55.27 to 51.91). The size and direction of the change will vary from case to case, but it underlines the need to recalculate cumulative ISPs at each step in the iteration.

Moving forward, we group the remaining schools so that each group keeps adding schools (in descending ISP order) until the group's cumulative ISP falls by a significant amount below the ISP of the group's top school, and never below the 40% CEP minimum. This amount is configurable and by default corresponds to a 5% drop in funding level, which translates to 3.125 percentage points of ISP (since 3.125 * 1.6 = 5). 
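
The grouping rule described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the notebook's pandas implementation: the names `isp` and `greedy_groups` are hypothetical, cumulative ISPs are left unrounded, and each group is forced to contain at least its top school so that the loop always makes progress. The sample rows are the seven highest-ISP schools remaining at this point in the notebook.

```python
MIN_ISP = 40.0     # minimum group ISP (%) for CEP eligibility
ISP_WIDTH = 3.125  # allowed drop, in ISP points, below the group's top ISP


def isp(enrolled, eligible):
    return 100.0 * eligible / enrolled


def greedy_groups(schools, width=ISP_WIDTH, min_isp=MIN_ISP):
    """schools: list of (school_code, total_enrolled, total_eligible)."""
    remaining = sorted(schools, key=lambda s: isp(s[1], s[2]), reverse=True)
    groups = []
    while remaining and isp(remaining[0][1], remaining[0][2]) >= min_isp:
        top = isp(remaining[0][1], remaining[0][2])
        threshold = max(top - width, min_isp)
        enrolled = eligible = size = 0
        for _, sch_enrolled, sch_eligible in remaining:
            new_isp = isp(enrolled + sch_enrolled, eligible + sch_eligible)
            # Always keep the top school; afterwards stop as soon as the
            # running group ISP would fall to or below the threshold
            if size > 0 and new_isp <= threshold:
                break
            enrolled += sch_enrolled
            eligible += sch_eligible
            size += 1
        groups.append(remaining[:size])
        remaining = remaining[size:]
    return groups, remaining  # remaining = schools left without a CEP group


sample = [
    ("1000022", 366, 190), ("1000017", 792, 407), ("1000029", 507, 258),
    ("1000020", 131, 66), ("1000025", 643, 301), ("1000028", 2649, 1221),
    ("2000002", 420, 193),
]
groups, leftover = greedy_groups(sample)
print([len(g) for g in groups], len(leftover))
```

On this sample the first group contains the same five schools as the first group produced with `ISP_WIDTH = 3.125` in the notebook; the second group differs only because the sample stops after seven schools.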

In [23]:
# width of ISP percentages allowed per school group
ISP_WIDTH = 3.125

In [24]:
def group_schools(df2):
    
    # recalculate cumulative-isp
    df2 = df2.assign(
        cum_isp=np.around((df2['total_eligible'].cumsum()/df2['total_enrolled'].cumsum()).astype(np.double)*100,2))

    top_isp = df2.iloc[0]['isp']
    # if the top percentage is less than that needed for 
    # CEP eligibility we have nothing more to do
    if top_isp < (cfg.min_cep_thold_pct()*100):
        return None
    
    isp_thold = (top_isp - ISP_WIDTH) if (top_isp-ISP_WIDTH) >= (cfg.min_cep_thold_pct()*100) else (cfg.min_cep_thold_pct()*100)
    grps2 = df2.groupby(pd.cut(df2['cum_isp'], [0.,isp_thold,top_isp]))
    
    return grps2        

In [25]:
def generate_school_groups(df1):

    grpno=1
    sch_grps = []
    top_isp = df1.iloc[0]['isp']
    
    while top_isp >= (cfg.min_cep_thold_pct()*100):
    
        grps1 = group_schools(df1)    
    
        if grps1 is not None:
            ivals = pd.DataFrame(grps1.size()).index.tolist()
            df_cep_grp = grps1.get_group(ivals[-1]) 
            grp_summary = school_group_summary(df_cep_grp)
            display(HTML('GRP {}'.format(grpno)))
            display(HTML(grp_summary.to_html()))
            display(HTML(df_cep_grp.to_html()))
            sch_grps.append(df_cep_grp)
            df1.drop(df_cep_grp.index.tolist(),axis=0,inplace=True)        
            # stop once every school has been assigned to a group
            top_isp = df1.iloc[0]['isp'] if len(df1) > 0 else 0.0
            grpno += 1

    display(HTML('INELIGIBLE FOR CEP:'))
    display(HTML(school_group_summary(df1).to_html()))
    df1 = df1.assign(
        cum_isp=np.around((df1['total_eligible'].cumsum()/df1['total_enrolled'].cumsum()).astype(np.double)*100,2))
    display(HTML(df1.to_html()))
    sch_grps.append(df1)
    
    return sch_grps

In [26]:
df_copy = df.copy()
school_groups = generate_school_groups(df_copy)

Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,2439,1222,50.1,5,80.16,19.84


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
2,1000022,366,190,51.91,51.91
3,1000017,792,407,51.39,51.55
4,1000029,507,258,50.89,51.35
5,1000020,131,66,50.38,51.28
6,1000025,643,301,46.81,50.1


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,14794,6396,43.23,10,69.17,30.83


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
7,1000028,2649,1221,46.09,46.09
8,2000002,420,193,45.95,46.07
9,1000011,967,442,45.71,45.99
10,1000014,789,354,44.87,45.8
11,1000006,856,384,44.86,45.66
12,1000004,854,378,44.26,45.48
13,1000024,2442,1028,42.1,44.56
14,1000007,2377,990,41.65,43.95
15,1000026,1812,742,40.95,43.54
16,1000005,1628,664,40.79,43.23


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,23025,5273,22.9,16,0.0,100.0


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
17,1000015,1588,634,39.92,39.92
18,2000001,460,182,39.57,39.84
19,2000003,246,96,39.02,39.76
20,1000016,858,329,38.34,39.37
21,1000016,1795,665,37.05,38.53
22,1000012,1016,366,36.02,38.1
23,1000013,2089,664,31.79,36.46
24,1000008,77,17,22.08,36.33
25,1000002,1111,224,20.16,34.38
26,1000003,2332,456,19.55,31.39


With different ISP widths one can trade off higher CEP coverage against higher funding for high ISP schools.

In [27]:
ISP_WIDTH = (10/1.6)

df_copy = df.copy()
school_groups = generate_school_groups(df_copy)

Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,11416,5222,45.74,12,73.18,26.82


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
2,1000022,366,190,51.91,51.91
3,1000017,792,407,51.39,51.55
4,1000029,507,258,50.89,51.35
5,1000020,131,66,50.38,51.28
6,1000025,643,301,46.81,50.1
7,1000028,2649,1221,46.09,48.01
8,2000002,420,193,45.95,47.86
9,1000011,967,442,45.71,47.54
10,1000014,789,354,44.87,47.25
11,1000006,856,384,44.86,47.0


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,8969,3637,40.55,7,64.88,35.12


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
14,1000007,2377,990,41.65,41.65
15,1000026,1812,742,40.95,41.35
16,1000005,1628,664,40.79,41.19
17,1000015,1588,634,39.92,40.92
18,2000001,460,182,39.57,40.84
19,2000003,246,96,39.02,40.78
20,1000016,858,329,38.34,40.55


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,19873,4032,20.29,12,0.0,100.0


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
21,1000016,1795,665,37.05,37.05
22,1000012,1016,366,36.02,36.68
23,1000013,2089,664,31.79,34.59
24,1000008,77,17,22.08,34.4
25,1000002,1111,224,20.16,31.8
26,1000003,2332,456,19.55,28.41
27,1000021,2403,410,17.06,25.89
28,1000019,2505,394,15.73,23.98
29,1000010,1708,233,13.64,22.81
30,1000018,94,12,12.77,22.74


In [28]:
ISP_WIDTH = (20/1.6)

df_copy = df.copy()
school_groups = generate_school_groups(df_copy)

Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,26473,10795,40.78,24,65.25,34.75


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
2,1000022,366,190,51.91,51.91
3,1000017,792,407,51.39,51.55
4,1000029,507,258,50.89,51.35
5,1000020,131,66,50.38,51.28
6,1000025,643,301,46.81,50.1
7,1000028,2649,1221,46.09,48.01
8,2000002,420,193,45.95,47.86
9,1000011,967,442,45.71,47.54
10,1000014,789,354,44.87,47.25
11,1000006,856,384,44.86,47.0


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,13785,2096,15.2,7,0.0,100.0


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
26,1000003,2332,456,19.55,19.55
27,1000021,2403,410,17.06,18.29
28,1000019,2505,394,15.73,17.4
29,1000010,1708,233,13.64,16.69
30,1000018,94,12,12.77,16.64
31,1000023,1712,215,12.56,15.99
32,1000009,3031,376,12.41,15.2


In [29]:
ISP_WIDTH = (1/1.6)

df_copy = df.copy()
school_groups = generate_school_groups(df_copy)

Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,1665,855,51.35,3,82.16,17.84


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
2,1000022,366,190,51.91,51.91
3,1000017,792,407,51.39,51.55
4,1000029,507,258,50.89,51.35


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,131,66,50.38,1,80.61,19.39


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
5,1000020,131,66,50.38,50.38


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,3712,1715,46.2,3,73.92,26.08


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
6,1000025,643,301,46.81,46.81
7,1000028,2649,1221,46.09,46.23
8,2000002,420,193,45.95,46.2


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,2612,1180,45.18,3,72.29,27.71


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
9,1000011,967,442,45.71,45.71
10,1000014,789,354,44.87,45.33
11,1000006,856,384,44.86,45.18


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,854,378,44.26,1,70.82,29.18


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
12,1000004,854,378,44.26,44.26


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,6631,2760,41.62,3,66.59,33.41


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
13,1000024,2442,1028,42.1,42.1
14,1000007,2377,990,41.65,41.88
15,1000026,1812,742,40.95,41.62


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,3922,1576,40.18,4,64.29,35.71


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
16,1000005,1628,664,40.79,40.79
17,1000015,1588,634,39.92,40.36
18,2000001,460,182,39.57,40.26
19,2000003,246,96,39.02,40.18


Unnamed: 0,total_enrolled,total_eligible,grp_isp,size,free_rate,paid_rate
sum,20731,4361,21.04,13,0.0,100.0


Unnamed: 0,school_code,total_enrolled,total_eligible,isp,cum_isp
20,1000016,858,329,38.34,38.34
21,1000016,1795,665,37.05,37.47
22,1000012,1016,366,36.02,37.07
23,1000013,2089,664,31.79,35.15
24,1000008,77,17,22.08,34.98
25,1000002,1111,224,20.16,32.61
26,1000003,2332,456,19.55,29.33
27,1000021,2403,410,17.06,26.8
28,1000019,2505,394,15.73,24.85
29,1000010,1708,233,13.64,23.64


##### ISP Width
  
The ISP width, which we've been using to determine the granularity of the generated groups (those with less than 100% funding), results in a trade-off between the percentage of schools eligible for CEP and the funding level of high ISP schools; the two have something of an inverse relationship. In other words, the **more schools** we tack onto a single group so as to get them to enrol in CEP, the lower the group ISP and consequently the **lower the funding level**. We can adjust this by creating many smaller groups with higher ISP, as we did when the funding step size was set to 1 and 5 (`ISP_WIDTH` of 0.625 and 3.125), as opposed to 10 and 20 (`ISP_WIDTH` of 6.25 and 12.5). With step sizes of 10 and 20, only 12 and 7 schools, respectively, did not qualify for CEP; with step sizes of 1 and 5, 13 and 16 schools, respectively, were ineligible.  
  
Essentially, we control the group size by controlling the range of the CEP funding level for each school group; this is denoted by *CEP Funding Step Size* in the below equation. The larger the step size, the larger the group size and the lower the group ISP and, consequently, the funding level for the group.
  
> *__ISP Width__* = *CEP Funding Step Size* / *1.6*    
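
The conversion above, applied to the four step sizes used in this notebook, can be sketched as follows. The helper name `isp_width` is hypothetical; the 1.6 multiplier comes from the funding formula.

```python
def isp_width(funding_step_pct, multiplier=1.6):
    """ISP width (in ISP points) for a given CEP funding step size (%)."""
    return funding_step_pct / multiplier


for step in (1, 5, 10, 20):
    print("funding step {:>2}% -> ISP width {}".format(step, round(isp_width(step), 3)))
```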