In [None]:
import pandas as pd
import numpy as np

import sys
sys.path.append('../')

# Background

The Stanford research team constructed the new diversity index defined by district staff. The diversity index is defined for each census block group and is the average of four scores: a free or reduced-price lunch (FRL) score, a neighborhood socioeconomic status (SES) score, an academic score, and an AALPI (African American, Latinx, and Pacific Islander) score.
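As a minimal sketch of the averaging step, assume the four component scores have already been computed for each block group. The index is then an unweighted row-wise mean. The column names (`frl`, `ses`, `academic`, `aalpi`), the block group IDs, and the values below are hypothetical toy data, not SFUSD data.

```python
import pandas as pd

# Hypothetical per-block-group component scores (made-up toy values).
scores = pd.DataFrame(
    {
        "frl": [0.5, 1.0, 0.2],
        "ses": [0.4, 0.8, 1.0],
        "academic": [0.3, 1.0, 0.1],
        "aalpi": [0.6, 0.4, 1.0],
    },
    index=pd.Index([101, 102, 103], name="block_group"),
)

# Diversity index: unweighted mean of the four component scores for each block group.
components = ["frl", "ses", "academic", "aalpi"]
scores["diversity_index"] = scores[components].mean(axis=1)
print(scores["diversity_index"])
```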

In [None]:
from src.d01_data.student_data_api import StudentDataApi, _block_features, _census_block_column

student_data_api = StudentDataApi()
df_students = student_data_api.get_data().set_index('studentno')
np.random.seed(1992)  # fixed seed so the same student is sampled on every run
# np.random.seed(2021)
studentno = np.random.choice(df_students.index)  # pick one student at random

census_blockgroup = df_students.loc[studentno, _census_block_column]
census_block = df_students.loc[studentno, 'census_block']
print(census_block)
df_students.loc[studentno, _block_features]
# df_students.loc[studentno]

In [None]:
from src.d01_data.block_data_api import BlockDataApi
block_data_api = BlockDataApi()
df1 = block_data_api.get_data(sfha=False).set_index('Block')

In [None]:
# df2 = block_data_api.get_data(True).set_index('Block')
# print(df1.loc[int(census_block)].reset_index().to_string())
block_acs_metrics = df1.loc[int(census_block), :].copy()
print(block_acs_metrics.reset_index().to_string())

In [None]:
print(block_acs_metrics.index[89:97].to_list())

## FRL score

The FRL score measures the percentage of students in block $b\in B$ who are eligible for free or
reduced-price lunch ($FRL\%(b)$), as given by SFUSD Student Nutrition Services, normalized by the
maximum percentage over all blocks:

$$FRLScore(b) = \frac{FRL\%(b)}{\underset{b'\in B}{\max} FRL\%(b')}$$
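A minimal sketch of this max-normalization in pandas, using a hypothetical helper `max_normalize` and made-up FRL percentages for three hypothetical blocks. The block with the largest percentage always receives a score of 1, and all other blocks are scaled relative to it.

```python
import pandas as pd


def max_normalize(pct: pd.Series) -> pd.Series:
    """Divide each block's percentage by the largest percentage over all blocks."""
    return pct / pct.max()


# Hypothetical FRL percentages keyed by block ID (illustrative values only).
frl_pct = pd.Series({1001: 20.0, 1002: 80.0, 1003: 50.0}, name="frl_pct")
frl_score = max_normalize(frl_pct)
print(frl_score)
```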


In [None]:
# Where can we find this raw data?

# FRLxEthncity SY16-SY19 - FRL data by block averaged over 4 years, broken down by ethnicity 
# (appears to be racex categorization, not resolved_ethnicity)

## SES score

The neighborhood socioeconomic status score ($SESScore(b)$) uses American Community Survey (ACS)
5-year estimates for 2013-17, including the median household income in the block ($HHInc(b)$), the
poverty level ($Pov\%(b)$), and adult educational attainment ($BachDeg\%(b)$), measured as the
percentage of residents aged 25 or older in the block who have a bachelor's degree.

We first define an intermediate SES metric, which is then normalized by its maximum over all blocks to give the SES score:

$$SESMetric(b) = 1 - \frac{HHInc(b)}{\underset{b'\in B}{\max} HHInc(b')} + \frac{Pov\%(b)}{\underset{b'\in B}{\max} Pov\%(b')} + 1 - \frac{BachDeg\%(b)}{\underset{b'\in B}{\max} BachDeg\%(b')}$$

$$SESScore(b) = \frac{SESMetric(b)}{\underset{b'\in B}{\max} SESMetric(b')}$$
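The two formulas can be sketched in pandas as follows. This assumes a hypothetical table with one row per block and columns `hh_inc`, `pov_pct`, and `bach_pct`. The names and values are illustrative, not actual ACS data, and the sketch is an independent restatement of the equations, not the implementation behind `BlockDataApi.get_ses_score`.

```python
import pandas as pd

# Hypothetical ACS inputs per block (illustrative values, not real ACS data).
acs = pd.DataFrame(
    {
        "hh_inc": [50_000.0, 100_000.0, 25_000.0],  # median household income
        "pov_pct": [10.0, 5.0, 20.0],  # % of residents below the poverty level
        "bach_pct": [40.0, 80.0, 20.0],  # % of adults 25+ with a bachelor's degree
    },
    index=pd.Index([1001, 1002, 1003], name="block"),
)

# SESMetric(b): higher income and education lower the metric, higher poverty raises it.
ses_metric = (
    (1 - acs["hh_inc"] / acs["hh_inc"].max())
    + acs["pov_pct"] / acs["pov_pct"].max()
    + (1 - acs["bach_pct"] / acs["bach_pct"].max())
)

# SESScore(b): normalize the metric by its maximum over all blocks.
ses_score = ses_metric / ses_metric.max()
print(ses_score)
```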


In [None]:
block_ses = block_data_api.get_ses_score()

expected = df_students.loc[studentno, 'Nhood SES Score']
result = block_ses.loc[int(census_blockgroup), 'score']
assert abs(expected - result) < 1e-6, "%.6f <> %.6f" % (expected, result)

## Academic Score

The block group academic score ($AcademicScore(b)$) measures the percentage of students with
level 1 test scores ($L1\%(b)$), normalized by the maximum percentage over all blocks:

$$AcademicScore(b) = \frac{L1\%(b)}{\underset{b'\in B}{\max} L1\%(b')}$$
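A sketch of how $L1\%(b)$ and the normalized score might be computed from per-student records. It assumes a hypothetical table holding each student's block and test performance level. The column names, the 1-4 level scale, and the values are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-student records: home block and test performance level (1-4).
students = pd.DataFrame(
    {
        "block": [1001, 1001, 1001, 1001, 1002, 1002, 1003, 1003],
        "test_level": [1, 2, 1, 3, 1, 1, 4, 2],
    }
)

# L1%(b): percentage of students in each block whose test level is 1.
l1_pct = students["test_level"].eq(1).groupby(students["block"]).mean() * 100

# AcademicScore(b): normalize by the maximum L1% over all blocks.
academic_score = l1_pct / l1_pct.max()
print(academic_score)
```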


In [None]:
block_academics = block_data_api.get_academic_score()

expected = df_students.loc[studentno, 'Academic Score']
result = block_academics.loc[int(census_blockgroup), 'score']
assert abs(expected - result) < 1e-6, "%.6f <> %.6f" % (expected, result)

## AALPI Score

The AALPI score measures the percentage of students in block $b$ from the historically underserved
African American, Latino, and Pacific Islander ethnic groups ($AALPI\%(b)$), normalized by the
maximum percentage over all blocks:

$$AALPIScore(b) = \frac{AALPI\%(b)}{\underset{b'\in B}{\max} AALPI\%(b')}$$
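This sketch works from aggregated counts instead of per-student rows. It assumes hypothetical per-block counts of all students (`n_students`) and of AALPI students (`n_aalpi`). The numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical per-block counts (illustrative values only).
counts = pd.DataFrame(
    {"n_students": [40, 25, 10], "n_aalpi": [10, 20, 2]},
    index=pd.Index([1001, 1002, 1003], name="block"),
)

# AALPI%(b): share of students in the block who are AALPI, as a percentage.
aalpi_pct = 100 * counts["n_aalpi"] / counts["n_students"]

# AALPIScore(b): normalize by the maximum AALPI% over all blocks.
aalpi_score = aalpi_pct / aalpi_pct.max()
print(aalpi_score)
```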


In [None]:
block_aalpi = block_data_api.get_aalpi_score()
expected = df_students.loc[studentno, 'AALPI Score']
result = block_aalpi.loc[int(census_blockgroup), 'score']
assert abs(expected - result) < 1e-6, "%.6f <> %.6f" % (expected, result)

## SES Index

The SES index uses only the socioeconomic and free or reduced-price lunch components of the
diversity index, and is defined as follows:

$$SESIndex(b) = \frac{FRLScore(b) + SESMetric(b)}{4}$$
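Assuming nonnegative inputs, each of the three terms in $SESMetric(b)$ lies in $[0, 1]$. That gives $SESMetric(b) \le 3$, and $FRLScore(b) \le 1$, so dividing their sum by 4 keeps the index in $[0, 1]$. In effect, the index averages four sub-scores. Here is a small sketch with hypothetical values for two blocks:

```python
import pandas as pd

# Hypothetical component values for two blocks (illustrative only).
frl_score = pd.Series({1001: 0.5, 1002: 1.0})
# SESMetric(b) is the *unnormalized* sum of three terms, each in [0, 1].
ses_metric = pd.Series({1001: 1.5, 1002: 3.0})

# FRLScore (at most 1) plus SESMetric (at most 3) is at most 4,
# so dividing by 4 keeps the SES index in [0, 1].
ses_index = (frl_score + ses_metric) / 4
print(ses_index)
```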
