In [None]:
import numpy as np
import sys
sys.path.append('../')

from src.d04_modeling.propositional_classifier import PropositionalClassifier, andClassifier, orClassifier
from src.d06_reporting.sample_evaluation import SampleEvaluation

# 1. Methodology

We propose modeling the tiebreaker as a classifier over the set of geographic blocks. Each block is identified by its geoid. The general architecture of the tiebreaker is shown in the following diagram:

![General Tiebreaker](images/methodology-general_tiebreaker.png)

The classification is divided into two steps, eligibility and qualification. Each has a different rationale and function:

1. The eligibility step selects the blocks that should be considered for the tiebreaker. This is a qualitative step: it takes non-demographic block features into account and aligns the tiebreaker with broader policy goals. These criteria need not depend on the definition of focal students, so an "optimal" tiebreaker would not necessarily satisfy them. For example, in this step we can consider criteria such as:
    
    a. Is the block in a neighborhood with low access to resources?
    
    b. Is the block in a previously red-lined zone?
    
    c. Is the block in a neighborhood with high percentage of focal students?
    
    d. Does the block contain a federal public housing project?
    

2. The qualification step optimizes the block selection by maximizing the number of focal students receiving a tiebreaker while minimizing the number of non-focal students receiving a tiebreaker. This is a mathematical step, and it crucially depends on the definition of focal students. Only eligible blocks are considered.
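
The two steps above can be sketched on toy data as follows. This is a minimal illustration, not the project's `andClassifier`/`orClassifier` implementation: the arrays, the function name `two_step_tiebreaker`, and the use of `>=` for the threshold comparison are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data, one entry per block (not the project's real data).
housing = np.array([1, 0, 0, 1, 0])        # 1 if the block contains public housing
redline = np.array([0, 1, 0, 0, 0])        # 1 if the block is in a previously red-lined zone
n_focal = np.array([30, 10, 40, 5, 2])     # focal students per block
n_total = np.array([50, 40, 50, 40, 20])   # all students per block


def two_step_tiebreaker(eligible, n_focal, n_total, threshold):
    """Return a boolean mask of the blocks that receive the tiebreaker."""
    # Qualification: share of focal students in each block.
    pct_focal = n_focal / n_total
    qualified = pct_focal >= threshold  # assumption: inclusive comparison
    # Only eligible blocks are considered.
    return eligible & qualified


# Eligibility: a block satisfying either criterion is eligible.
eligible = (housing == 1) | (redline == 1)
selected = two_step_tiebreaker(eligible, n_focal, n_total, threshold=0.35)
print(selected)
```

In this toy example, the third block has the highest share of focal students but is not selected, because it fails the eligibility step.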

### Q: Why do we need both of these steps?

Both of these steps are, in our view, necessary.

The eligibility step ensures that focal students with greater access to resources do not receive the same priority as others. It also allows us to incorporate criteria (such as public housing) that appear in the policy text but are not directly part of the definition of focal students.

The qualification step provides a degree of mathematical optimality. It also makes the system harder to game: the eligibility step is deterministic and often based on criteria that do not depend on demographic changes. A non-focal family that knows the eligibility criteria and wants to take advantage of the tiebreaker could move to a block that satisfies them; if there were no qualification step, the family would be certain to receive the tiebreaker. With the qualification step, however, the percentage of focal students in a block is recomputed (hopefully annually or biannually), so such moves, by lowering the block's share of focal students, could cause the block to lose its tiebreaker.

### Q: Where can I see more of your thought process/explanation of the tiebreaker?

Our development of the tiebreaker is documented in the following notebooks:

1. and-or-classifier (this notebook is very helpful for understanding how the tiebreaker is initialized)
2. evaluate-models
3. neighborhood-proxy
4. eligibility

## 1.1. Data

We propose that the tiebreaker use the intersection group (students who are both AALPI and FRL) as the focal group.

In [None]:
#'Focal' ---> students that are either FRL or  AALPI      (~64%)

#'AALPI' ---> students that are        AALPI              (~36%)
#'FRL'   ---> students that are        FRL                (~52%)
#'AA'    ---> students that are        African-American   (~6%)

#'Both'  ---> students that are both   FRL and AALPI      (~24%)
#'AAFRL' ---> students that are both   FRL and AA         (~4%)

proposed_focal_group = 'Both'

# Remark: we strongly encourage you to try a narrower definition of the focal group, 'AAFRL' for example.

We use data for grades TK-5:

In [None]:
#'tk5'   --->  data for TK-5  students
#'tk12'  --->  data for TK-12 students

proposed_grades_range = 'tk5'

## 1.2. Proposed Equity Tiebreaker

Our equity tiebreaker has two eligibility criteria. A block satisfying EITHER of them is eligible:

1. The block contains a federal (non-senior) public housing project
2. The block is in a previously red-lined zone

![Proposed Tiebreaker](images/methodology-proposed_tiebreaker.png)

We chose these criteria to be consistent with the policy text:

“The equity tiebreaker will be applied to applicants who either reside in Federal public housing or in historically underserved areas of San Francisco.”

In [None]:
proposed_eligibility_classifier = orClassifier(["Housing", "Redline"], binary_var=[0,1])
proposed_equity_tiebreaker = andClassifier(["pct" + proposed_focal_group],
                                           positive_group="n" + proposed_focal_group,
                                           frl_key=proposed_grades_range,
                                           eligibility_classifier=proposed_eligibility_classifier)

All points on the following curve (known as the ROC curve) are optimal:

In [None]:
proposed_equity_tiebreaker.plot_roc(np.linspace(0.,1., 100))

This curve illustrates the trade-off between false positives (non-focal students receiving the tiebreaker) and true positives (focal students receiving the tiebreaker). Note that we can achieve a true positive rate of at most around 63%. That is, only 63% of the focal students live in a previously red-lined zone or in a block with public housing. This shows how the eligibility criteria bring the focal student definition back to the stricter criteria set out in the policy text, since we are not targeting focal students outside the eligible areas.
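
To make the cap on the true positive rate concrete, here is a small sketch that computes student-weighted ROC points by hand on toy data. It is an illustration under stated assumptions, not the code behind `plot_roc`: the helper `roc_points`, the toy arrays, and the inclusive threshold comparison are all hypothetical.

```python
import numpy as np


def roc_points(eligible, n_focal, n_total, thresholds):
    """Student-weighted (FPR, TPR) pairs for a list of thresholds."""
    n_nonfocal = n_total - n_focal
    pct_focal = n_focal / n_total
    fpr, tpr = [], []
    for t in thresholds:
        # A block is selected if it is eligible and meets the threshold.
        selected = eligible & (pct_focal >= t)
        # Share of all focal students who live in selected blocks.
        tpr.append(n_focal[selected].sum() / n_focal.sum())
        # Share of all non-focal students who live in selected blocks.
        fpr.append(n_nonfocal[selected].sum() / n_nonfocal.sum())
    return np.array(fpr), np.array(tpr)


# Hypothetical toy data, one entry per block.
eligible = np.array([True, True, False, True, False])
n_focal = np.array([30, 10, 40, 5, 2])
n_total = np.array([50, 40, 50, 40, 20])

fpr, tpr = roc_points(eligible, n_focal, n_total, np.linspace(0.0, 1.0, 5))

# The best achievable TPR is the share of focal students in eligible blocks.
max_tpr = n_focal[eligible].sum() / n_focal.sum()
print(tpr.max(), max_tpr)
```

Even at a threshold of 0, where every eligible block is selected, the true positive rate cannot exceed the share of focal students who live in eligible blocks.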

The equity tiebreaker then requires a threshold: the percentage of focal students a block needs in order to qualify.

In [None]:
#Test several values. Our recommendations depend on the focal group. Some examples:

#'Focal' ---> ~70% (0.70)
#'Both'  ---> ~35% (0.35)
#'AAFRL' ---> ~5%  (0.05)

proposed_threshold = 0.35

We can view the result on the map (yellow blocks receive the tiebreaker):

In [None]:
proposed_equity_tiebreaker.plot_map(proposed_threshold)

## 1.3. Benchmark Tiebreaker

To make sure the eligibility criteria are not too restrictive, we compare the performance of our proposed tiebreaker against a benchmark tiebreaker. In the benchmark tiebreaker, all blocks are eligible, so it simply optimizes block classification based on the percentage of focal students.
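
The comparison can be sketched on toy data by evaluating the same threshold with and without an eligibility restriction. The helper `rates` and the arrays below are hypothetical, and the rates are assumed to be student-weighted; this is not the project's `andClassifier` API.

```python
import numpy as np


def rates(eligible, n_focal, n_total, threshold):
    """Return the student-weighted (TPR, FPR) at a given threshold."""
    n_nonfocal = n_total - n_focal
    selected = eligible & (n_focal / n_total >= threshold)
    tpr = n_focal[selected].sum() / n_focal.sum()
    fpr = n_nonfocal[selected].sum() / n_nonfocal.sum()
    return tpr, fpr


# Hypothetical toy data, one entry per block.
n_focal = np.array([30, 10, 40, 5, 2])
n_total = np.array([50, 40, 50, 40, 20])
restricted = np.array([True, True, False, True, False])  # eligibility criteria applied
benchmark = np.ones_like(restricted)                     # every block is eligible

tpr_r, fpr_r = rates(restricted, n_focal, n_total, threshold=0.35)
tpr_b, fpr_b = rates(benchmark, n_focal, n_total, threshold=0.35)
print(f"restricted: TPR={tpr_r:.2f}, FPR={fpr_r:.2f}")
print(f"benchmark:  TPR={tpr_b:.2f}, FPR={fpr_b:.2f}")
```

At a fixed threshold, the benchmark selects a superset of the restricted tiebreaker's blocks, so both its true positive rate and its false positive rate are at least as high. The gap between the two indicates how much the eligibility criteria cost in coverage.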

In [None]:
proposed_benchmark_tiebreaker = andClassifier(["pct"+proposed_focal_group], positive_group="n"+proposed_focal_group)

In [None]:
proposed_benchmark_tiebreaker.plot_roc(np.linspace(0.,1., 100))

In [None]:
proposed_benchmark_tiebreaker.plot_map(proposed_threshold)

In [None]:
proposed_benchmark_tiebreaker.fpr(proposed_threshold)