# ISF 189 Lab

Welcome to lab! Please read this lab in its entirety, as the analysis will make a lot more sense with the background context provided.

This lab is intended to be a hands-on introduction to data science as it can be applied to SES (socio-economic status). 

SES is commonly measured using variables such as salary, parental education, and occupational status. Historically, income has been treated as the defining factor of SES, and many research studies show that SES as defined by income is the leading factor in school performance as measured by test scores.

Today we are going to use real-world data to suggest that this may not be the case. You will get the chance to explore different variables for defining SES and to toggle how much importance to assign to each variable. Data science is not just about plugging in numbers; it is about making many decisions and backing up those decisions with quantitative reasoning. These decisions can have implications for how the world treats SES and for public policy suggestions.

In addition, we are going to suggest that the defining variable for SES might not even be income, but rather family wealth, which may provide a more accurate prediction of school performance.

There is no standard way of measuring SES. In this lab, we are going to define SES in the following ways:

1. SES is a positive number that represents the ratio $\text{individual SES} / \text{median SES}$, where 1 means that this individual is exactly at the median SES. This number will always be positive because we are assuming that there will be no cases of negative wealth.

2. individual SES is the weighted sum $p_i \times \text{income score} + p_e \times \text{years of education}$, where $p_i$ is the importance we assign to income on a scale of 0 to 1 and $p_e$ is the importance we assign to years of education on a scale of 0 to 1. Throughout this lab we will be changing the values of $p_i$ and $p_e$, but it is important to remember that these values *must add up to 1*. Our default values at first are $p_i$ = 0.8 and $p_e$ = 0.2.

3. median SES is calculated from the median income in America, which is 51,939 dollars according to the U.S. Census Bureau (2013), and the median years of education, which is #FILL IN VALUE based on #CITE SOURCE

4. income score is a score assigned to each income bracket to ensure that all our factors are on the same scale. Our income scores are defined as the following:

<table>
  <tr>
    <th>Income Score</th>
    <th>Income Bracket (\$) </th>
  </tr>
  <tr>
    <td>1</td>
    <td>less than \$10,000</td>
  </tr>
 <tr>
    <td>2</td>
    <td>\$15,000 - \$19,999</td>
  </tr>
  <tr>
    <td>3</td>
    <td>\$20,000 - \$24,999</td>
  </tr>
  <tr>
    <td>4</td>
    <td>\$25,000 - \$29,999</td>
  </tr>
  <tr>
    <td>5</td>
    <td>\$30,000 - \$34,999</td>
  </tr>
    <tr>
    <td>6</td>
    <td>\$35,000 - \$39,999</td>
  </tr>
    <tr>
    <td>7</td>
    <td>\$40,000 - \$44,999</td>
  </tr>
    <tr>
    <td>8</td>
    <td>\$45,000 - \$49,999</td>
  </tr>
    <tr>
    <td>9</td>
    <td>\$50,000 - \$59,999</td>
  </tr>
   <tr>
    <td>10</td>
    <td>\$60,000 - \$74,999</td>
  </tr> 
   <tr>
    <td>11</td>
    <td>\$75,000 - \$99,999</td>
  </tr>   
   <tr>
    <td>12</td>
    <td>\$100,000 - \$124,999</td>
  </tr> 
   <tr>
    <td>13</td>
    <td>\$125,000 - \$149,999</td>
  </tr> 
   <tr>
    <td>14</td>
    <td>\$150,000 - \$199,999</td>
  </tr>
   <tr>
    <td>15</td>
    <td>\$200,000 or more</td>
  </tr> 
</table>
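To make the definitions above concrete, here is a minimal sketch of the SES formula and the SES ratio. The function names and the example numbers are hypothetical and chosen only for illustration; the income score is passed in directly rather than looked up from the table.

```python
def individual_ses(income_score, years_of_education, p_i=0.8, p_e=0.2):
    """Weighted SES: p_i * income score + p_e * years of education."""
    # The lab requires the two weights to add up to 1.
    if abs(p_i + p_e - 1.0) > 1e-9:
        raise ValueError("p_i and p_e must add up to 1")
    return p_i * income_score + p_e * years_of_education


def ses_ratio(individual, median):
    """Ratio of one SES value to a median SES (1 means exactly at the median)."""
    return individual / median


# Hypothetical person: income score 10 and 14 years of education.
person = individual_ses(10, 14)
# Hypothetical reference point: income score 8 and 12 years of education.
reference = individual_ses(8, 12)
ratio = ses_ratio(person, reference)
print(round(person, 2), round(reference, 2), round(ratio, 3))
```

A ratio above 1 means the person is above the reference SES, and a ratio below 1 means the person is below it.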

As you work through the lab, there will be lab assistants in the room to answer any of your questions. If you get stuck at any point, feel free to ask a neighbor or one of the lab assistants for help.

## What This Lab Will Cover
1. Running Jupyter Notebooks
2. Data Analysis
3. Interpretation

## What you need to do:
* Read the content, complete the questions 
* Analyze the data
* Submit the assignment

# 1. Running Jupyter Notebooks

You are currently working in a Jupyter Notebook. A Notebook allows text and code to be combined into one document. Each rectangular section of a notebook is called a "cell." There are two types of cells in this notebook: text cells and code cells. 

Jupyter allows you to run simulations and regressions in real time. To do this, select a code cell, and click the "run cell" button at the top that looks like ▶| to confirm any changes. Alternatively, you can hold down the `shift` key and then press `return` or `enter`.

In the following simulations, anytime you see `In [ ]` you should click the "run cell" button to see output.

## 1.1 Importing Modules

First we need to import some modules so that we can call the functions from within. We are going to use these functions to manipulate data tables and conduct math operations. Run the code cell below to import these modules.

In [23]:
from datascience import *
import numpy as np
%matplotlib inline

## 1.2 Reading Data
In order to examine how the weighting of these variables affects SES, we are going to examine datasets from the 5-year American Community Survey (ACS). The American Community Survey is an ongoing statistical survey by the U.S. Census Bureau. In these tables, we have consolidated household income (in inflation-adjusted dollars), years of education (educational attainment), and Food Stamps/Supplemental Nutrition Assistance Program (SNAP) status data for 4 different ethnicities. Run the code cells below to see what the data looks like.
<br>Source: https://factfinder.census.gov

In [25]:
educ = Table.read_table('data/totaleduc.csv')
educbyrace = Table.read_table('data/educbyrace.csv')
income = Table.read_table('data/totalincome.csv')
incomebyrace = Table.read_table('data/incomebyrace.csv')

In [26]:
educbyrace

Educational Attainment,Total Estimate (African American),Total Estimate (Asian population),Total Estimate (Hispanic/Latino),Total Estimate (White)
Total,25578326,12076055,31653207,163862749
Less than 9th grade,1108681,991370,6381512,7372882
"9th to 12th grade, no diploma",2795627,635932,4381659,5017591
Regular high school diploma,6866990,1705157,7448990,9154147
GED or alternative credential,1195082,164285,1289000,1453285
"Some college, no degree",6335160,1466673,5575654,7042327
Associate's degree,2101773,795105,1899028,2694133
Bachelor's degree,3266913,3622202,3222634,6844836
Graduate or professional degree,1908100,2695331,1454730,4150061


# 2. Data Analysis

## 2.1 SES with Income

Now we are going to plot a linear regression line for the data points. Recall that a linear regression line is the line that minimizes the sum of the squared vertical distances between the y-value of each data point and the y-value of the line at that x-value. We compute the r-value as a measure of correlation. An r-value of 1 means that the data is perfectly positively linearly correlated, an r-value of -1 means that the data is perfectly negatively linearly correlated, and an r-value of 0 means that there is no linear correlation.

Run the code cell below to plot the regression line and compute the r-value.

In [None]:
#take in cell data
#plot data points on graph
#run regression line
#compute r-value
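As a rough sketch of the computation described above, the following code fits a least-squares line and computes the r-value by hand. The lists `ses_values` and `test_scores` are made-up numbers for five hypothetical students, not data from this lab, and the helper names are our own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: one SES value and one test score per student.
ses_values = [6.0, 7.5, 8.0, 9.5, 11.0]
test_scores = [61, 70, 68, 80, 88]

r = pearson_r(ses_values, test_scores)
slope, intercept = least_squares_line(ses_values, test_scores)
print(round(r, 3), round(slope, 3), round(intercept, 3))
```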

In the plot above, we defined SES with $p_i$ = 0.8 and $p_e$ = 0.2. Try toggling the slider below to change up the values and find the maximum r-value. Remember that $p_i$ + $p_e$ = 1.

In [None]:
#slider here
#as values change, the regression line should change <-- need to double check to make sure that this is actually going to happen
#based on how we defined SES and p_i and p_e
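One non-interactive way to think about the slider is as a sweep over $p_i$ from 0 to 1, recomputing SES and the r-value at each step. The sketch below does this for made-up data; `income_scores`, `education_years`, and `test_scores` are hypothetical values, and the step size of 0.1 is an arbitrary choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five students (not from the ACS tables).
income_scores = [3, 6, 9, 11, 14]
education_years = [12, 16, 12, 14, 18]
test_scores = [60, 75, 70, 80, 95]

best_p_i, best_r = None, -2.0
for step in range(11):
    p_i = step / 10
    p_e = 1 - p_i  # the weights must add up to 1
    ses = [p_i * inc + p_e * edu
           for inc, edu in zip(income_scores, education_years)]
    r = pearson_r(ses, test_scores)
    if r > best_r:
        best_p_i, best_r = p_i, r

print(best_p_i, round(1 - best_p_i, 1), round(best_r, 4))
```

The printed values follow the same order as the answer format requested in this section: $p_i$, then $p_e$, then r.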

Enter in your values of $p_i$ and $p_e$ and the maximum r-value that you were able to find below with $p_i$ as the first value and r as the last value.

Ex.
0.8, 0.2, 0.9999

## 2.2 SES with Wealth

Now we are going to do the same thing with family wealth and see which SES definition serves as a better indicator for school performance.

Run the code below to generate a regression line and a default r-value.

In [None]:
#generate regression with wealth on x axis
#compute r value

Toggle $p_w$ where $p_w$ represents the importance of wealth in our definition of SES to find the maximum r-value. Recall that $p_w$ + $p_e$ = 1.

In [None]:
#Toggle thing should change regression line and r-value

Enter in your values for $p_w$, $p_e$, and $r$.

Did income or wealth give you a higher r-value? Write a paragraph explaining your results and possible reasons for this result.

Write up for each of the following:

1. What are some other factors that can influence SES? How would you measure them?

2. What are some aspects other than test scores and school performance that can be influenced by SES? How would you measure them?

## 2.3 Income Inequality Benchmarks: the Lorenz Curve & Gini Coefficient
We can depict the concentration of wealth using two interlinked methods: the Lorenz curve and the Gini coefficient.  The Lorenz curve is a graphical illustration of this concentration, and the Gini coefficient is a numeric representation (a ratio).  
The Lorenz curve lines up the population from poorest to richest on the x-axis, and the y-axis graphs the cumulative income of everyone up to that point on the x-axis.  To make these comparable across countries, both axes are translated into percentages.  
(0,0) and (100,100) are always the end points of the curve: 0% of the people own nothing, while 100% of the people own 100% of the wealth.  

Let's line up our individual income earners to look at inequality just among income earners.  Let's start with a fresh table that has a column of incomes.  Let's put the table of incomes in order from lowest to highest.  Then let's make a new column that adds up all incomes up to and including that row.  Here's an example:

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh">Income</th>
    <th class="tg-baqh">Cumulative Income</th>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">3</td>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">6</td>
  </tr>
  <tr>
    <td class="tg-baqh">4</td>
    <td class="tg-baqh">10</td>
  </tr>
  <tr>
    <td class="tg-baqh">7</td>
    <td class="tg-baqh">17</td>
  </tr>
  <tr>
    <td class="tg-baqh">8</td>
    <td class="tg-baqh">25</td>
  </tr>
</table>
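The example table can be reproduced in a few lines. This is only a sketch using the five example incomes; `incomes` is a hypothetical unsorted starting list, and `itertools.accumulate` is one of several ways to build a running total.

```python
from itertools import accumulate

# The five example incomes, deliberately out of order.
incomes = [8, 3, 7, 3, 4]

# Step 1: order the incomes from lowest to highest.
ordered_incomes = sorted(incomes)

# Step 2: running total of incomes up to and including each row.
cumulative_income = list(accumulate(ordered_incomes))

for income, running_total in zip(ordered_incomes, cumulative_income):
    print(income, running_total)
```

The last running total is the total income of everyone in the table, which is 25 for this example.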

We want to calculate the household income per capita and append it to our data table. In order to do that, we will have to calculate the total household income and the total number of people for each household. 
We will also do this for households that do not have income earners.

In [None]:
# DRAFT, depending on data


# First, add a column 'Household Size' to your data table, 
# which has the number of people in the household for each row
# hint: you may have this in the household composition lab (Lab 4)
# Next, using your income_earners table, group by Household ID and find
# the sum of incomes for each household. Turn that into a table with two
# columns, 'Household ID' and 'Household Income'
# Now combine the two tables using the join_keeping_all_main_table_rows
# function, and use these values for your arguments:
# main_table = your data table
# joined = your table with household incomes
# fill_in_value = 0 (households without any income earners have zero dollars)
# combining_column_label = 'Household ID'
# Finally, you will create a new column, 'Household Income per Capita,' which is 
# equal to 'Household Income' divided by 'Household Size'

In [None]:
# First, you are going to want to get your incomes in order
# this should be a list or an array
ordered_incomes = 

In [None]:
# This for loop creates a list called cumulative_step, which stores the values for the 'Cumulative Income' column
total = 0
cumulative_step = []
for x in ordered_incomes:
    total = x + total
    cumulative_step.append(total)

In [None]:
lorenz_table = Table().with_columns(
    ['Income', ordered_incomes],
    ['Cumulative Income', cumulative_step])

Now we will need to turn these into percentages.  Go to the last row of the table.  This will tell you the total amount of income in the economy.  We will divide the Cumulative Income column by this number.  In the example, that is 25.  Although my table shows the percent sign, you do not have to include the percent sign in your table.  In fact, there will be one less step for graphing if you do not include the percent sign.

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh">Income</th>
    <th class="tg-baqh">Cumulative Income</th>
    <th class="tg-yw4l">Percentage Income</th>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">3</td>
    <td class="tg-yw4l">12%</td>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">6</td>
    <td class="tg-yw4l">24%</td>
  </tr>
  <tr>
    <td class="tg-baqh">4</td>
    <td class="tg-baqh">10</td>
    <td class="tg-yw4l">40%</td>
  </tr>
  <tr>
    <td class="tg-baqh">7</td>
    <td class="tg-baqh">17</td>
    <td class="tg-yw4l">68%</td>
  </tr>
  <tr>
    <td class="tg-baqh">8</td>
    <td class="tg-baqh">25</td>
    <td class="tg-yw4l">100%</td>
  </tr>
</table>


In [None]:
# This for loop creates a list called percents, which stores the values for the 'Percentage Income' column
# Define total_income, which is the sum of all of the incomes in the table
total_income = 
percents = []
for x in cumulative_step:
    percent = x / total_income
    percents.append(percent)

In [None]:
# Add 'percents' to 'lorenz_table'
lorenz_table = 

Now we need to put the population in terms of percentage.  Add a new column that counts the income earners.  As in this table, there may be some people that have the same income.  It does not matter which order these go in.  Again, go to the last row and divide the count column by the total number of income earners.

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh">Income</th>
    <th class="tg-baqh">Cumulative Income</th>
    <th class="tg-yw4l">Percentage Income</th>
    <th class="tg-yw4l">Count</th>
    <th class="tg-yw4l">Percentage Population</th>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">3</td>
    <td class="tg-yw4l">12%</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">20%</td>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">6</td>
    <td class="tg-yw4l">24%</td>
    <td class="tg-yw4l">2</td>
    <td class="tg-yw4l">40%</td>
  </tr>
  <tr>
    <td class="tg-baqh">4</td>
    <td class="tg-baqh">10</td>
    <td class="tg-yw4l">40%</td>
    <td class="tg-yw4l">3</td>
    <td class="tg-yw4l">60%</td>
  </tr>
  <tr>
    <td class="tg-baqh">7</td>
    <td class="tg-baqh">17</td>
    <td class="tg-yw4l">68%</td>
    <td class="tg-yw4l">4</td>
    <td class="tg-yw4l">80%</td>
  </tr>
  <tr>
    <td class="tg-baqh">8</td>
    <td class="tg-baqh">25</td>
    <td class="tg-yw4l">100%</td>
    <td class="tg-yw4l">5</td>
    <td class="tg-yw4l">100%</td>
  </tr>
</table>

In [None]:
# This for loop creates two lists, spot and pop_percent, which
# contain the cumulative number of people and the cumulative population percentage
# Define total_pop, which is the number of people in this table
total_pop = 
count = 0
spot = []
pop_percent = []
for i in ordered_incomes:
    count += 1
    spot.append(count)
    pop_percent.append(count / total_pop)

We are missing the (0,0) row at the beginning.  Add this row in.  You have not been using percentage signs in your table, so you don't need to add them in for these zeros either.

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh">Income</th>
    <th class="tg-baqh">Cumulative Income</th>
    <th class="tg-yw4l">Percentage Income</th>
    <th class="tg-yw4l">Count</th>
    <th class="tg-yw4l">Percentage Population</th>
  </tr>
  <tr>
    <td class="tg-yw4l"></td>
    <td class="tg-yw4l"></td>
    <td class="tg-yw4l">0%</td>
    <td class="tg-yw4l"></td>
    <td class="tg-yw4l">0%</td>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">3</td>
    <td class="tg-yw4l">12%</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">20%</td>
  </tr>
  <tr>
    <td class="tg-baqh">3</td>
    <td class="tg-baqh">6</td>
    <td class="tg-yw4l">24%</td>
    <td class="tg-yw4l">2</td>
    <td class="tg-yw4l">40%</td>
  </tr>
  <tr>
    <td class="tg-baqh">4</td>
    <td class="tg-baqh">10</td>
    <td class="tg-yw4l">40%</td>
    <td class="tg-yw4l">3</td>
    <td class="tg-yw4l">60%</td>
  </tr>
  <tr>
    <td class="tg-baqh">7</td>
    <td class="tg-baqh">17</td>
    <td class="tg-yw4l">68%</td>
    <td class="tg-yw4l">4</td>
    <td class="tg-yw4l">80%</td>
  </tr>
  <tr>
    <td class="tg-baqh">8</td>
    <td class="tg-baqh">25</td>
    <td class="tg-yw4l">100%</td>
    <td class="tg-yw4l">5</td>
    <td class="tg-yw4l">100%</td>
  </tr>
</table>
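Putting the steps so far together for the five example incomes, the two percentage columns (including the added zero row) can be sketched as follows. The variable names `pct_income`, `pct_population`, and `lorenz_points` are our own, and the values are stored as fractions between 0 and 1 rather than with percent signs.

```python
from itertools import accumulate

ordered_incomes = [3, 3, 4, 7, 8]

cumulative = list(accumulate(ordered_incomes))
total_income = cumulative[-1]     # last row of the cumulative column
total_pop = len(ordered_incomes)  # number of income earners

# Prepend 0 so that the curve starts at the (0, 0) point.
pct_income = [0.0] + [c / total_income for c in cumulative]
pct_population = [0.0] + [(k + 1) / total_pop for k in range(total_pop)]

# Each pair is one point (x, y) on the Lorenz curve.
lorenz_points = list(zip(pct_population, pct_income))
for x, y in lorenz_points:
    print(round(x, 2), round(y, 2))
```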

You are ready to graph the Lorenz curve!

In [None]:
# Because we will only be using the columns 'Percentage Income' and 'Percentage Population,'
# it does not matter what you fill in for the other column values of the inserted row

Make a line graph with percentage population on the x axis and percentage income on the y axis.

In [None]:
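import matplotlib.pyplot as plots

x_values = lorenz_table['Percentage Population']
y_values = lorenz_table['Percentage Income']

# Line graph: population share on the x-axis, income share on the y-axis
plots.plot(x_values, y_values, marker='o')
plots.xlabel('Percentage Population')
plots.ylabel('Percentage Income')
plots.title('Lorenz Curve')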

## Calculate the Gini coefficient:
The Gini coefficient measures the area between the Lorenz curve and the 45-degree line of perfect equality.  In this lab we approximate it as the sum of the differences between Percent Population and Percent Income, divided by the sum of Percent Population.  This number is between 0 and 1.  Consider whether a smaller number represents more or less inequality, and how the formula relates to the graph. The higher the Gini coefficient, the more unequally the quantity in question is distributed across the population.  

$$ \frac{\sum_{i=1}^{N} (\% Pop_i - \% Inc_i)}{\sum_{i=1}^{N} \% Pop_i}$$

Note that if there were perfect equality, $\% Pop_i = \% Inc_i$.
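As a worked example of the formula above, here is a minimal sketch applied to the five-person example table. The function name `gini_from_percentages` is our own, and the values are entered as fractions rather than percentages, which does not change the ratio.

```python
def gini_from_percentages(pct_population, pct_income):
    """Sum of (%Pop_i - %Inc_i) divided by the sum of %Pop_i."""
    differences = sum(p - q for p, q in zip(pct_population, pct_income))
    return differences / sum(pct_population)


# The five rows of the example table (the zero row adds nothing to either sum).
pct_population = [0.2, 0.4, 0.6, 0.8, 1.0]
pct_income = [0.12, 0.24, 0.40, 0.68, 1.00]

gini = gini_from_percentages(pct_population, pct_income)
print(round(gini, 4))

# With perfect equality every difference is zero, so the result is 0.
print(gini_from_percentages(pct_population, pct_population))
```

For the example, the differences add up to 0.56 and the population percentages add up to 3.0, so the result is about 0.19.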

# Initial SES Comparison

We are going to analyze SES in relation to race, first by determining how the average SES (income) for a specific race (e.g. African American) compares to the average SES (income) for all US citizens.

Then we are going to find the same ratio again, but this time with average African American SES (wealth) in relation to average SES (wealth) for all US citizens.

This may give an indication of how including wealth as a factor can dramatically change SES index values.

Henceforth, all references to SES will refer to the standard income definition of SES.

Then we are going to change the weightings used in SES and see how that impacts our SES index and performance ranking.

First we are going to determine the ratio:

$\text{average SES for African Americans} / \text{average SES for all US citizens}$

We are going to look specifically at 2 variables: income and years of education. 
The following table displays information for income for African-Americans.

In [41]:
aaincome = incomebyrace.select(0,1)
aaincome.show()

Income Bracket ($),Total Estimate (African American)
Total,14186983
"less than 10,000",2055646
"10,000 - 14,999",1200275
"15,000 - 19,999",1036925
"20,000 - 24,999",963763
"25,000 - 29,999",871472
"30,000 - 34,999",852511
"35,000 - 39,999",733425
"40,000 - 44,999",685742
"45,000 - 49,999",574233


In [38]:
#median income for African Americans: $37,500

7093491.5

The following table displays the same information regarding income for all US citizens in aggregate: 

In [34]:
income.show()
#median income for all US citizens: $55,000 

Income Bracket ($),Total Estimate
Total,116926305
"less than 10,000",8421482
"10,000 - 14,999",6161477
"15,000 - 19,999",6139644
"20,000 - 24,999",6227524
"25,000 - 29,999",5852048
"30,000 - 34,999",5951926
"35,000 - 39,999",5436927
"40,000 - 44,999",5428388
"45,000 - 49,999",4807116


In [39]:
educ.show()

Educational Attainment,Male,Female,Total Estimate,Cumulative Population
Total,102007511,109455011,211462522,211462522
No schooling completed,1433002,1541446,2974448,2974448
Nursery to 4th grade,828764,844074,1672838,4647286
5th and 6th grade,1744550,1706996,3451546,8098832
7th and 8th grade,1997412,1997625,3995037,12093869
9th grade,1814562,1682023,3496585,15590454
10th grade,2120597,2026065,4146662,19737116
11th grade,2391837,2306193,4698030,24435146
"12th grade, no diploma",1991470,1802478,3793948,28229094
High school graduate (includes equivalency),28952834,29769694,58722528,86951622


In [28]:
# Calculate the median years of education for the total US population
toteduc = int(educ.column('Total Estimate')[0].replace(',',''))
medi = toteduc/2
medi

105731261.0

Which population bracket does this fall under?

If we want to roughly quantify the median years of education, it falls between "Some college, less than 1 year" and "Some college, 1 or more years, no degree," which is approximately 12.5 years.
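The bracket lookup can also be done in code: walk down the cumulative counts until the running total reaches half of the population. The sketch below uses a small made-up table; the labels and counts are hypothetical, and `median_bracket` is our own helper rather than a `datascience` function.

```python
from itertools import accumulate


def median_bracket(labels, counts):
    """Return the label of the bracket that contains the median person."""
    half = sum(counts) / 2
    for label, running_total in zip(labels, accumulate(counts)):
        if running_total >= half:
            return label
    raise ValueError("labels and counts must be non-empty")


# Hypothetical brackets and counts, ordered from least to most education.
labels = ["Less than high school", "High school graduate",
          "Some college", "Bachelor's degree or higher"]
counts = [20, 25, 30, 25]

print(median_bracket(labels, counts))
```

Here half of the population is 50, and the running totals are 20, 45, 75, and 100, so the median person falls in the third bracket.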


In [1]:
#Assign this to the variable 'mededuc'
mededuc = 12.5
# Repeat for AA

We have already crunched some numbers for you, to get the following data:

median income for African Americans: \$37,500
<br>
median years of school for African Americans: 13 years of school

median income for all US citizens: \$55,000
<br>
median years of school for all US citizens: 12.5 years of school

Source: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_B15002&prodType=table

From here we're going to use weightings of $p_i$ = 0.8 and $p_e$ = 0.2 to calculate our SES. Recall that our formula for SES is:

$p_i \times \text{income score} + p_e \times \text{years of education}$


Recall the income score chart:
<table>
  <tr>
    <th>Income Score</th>
    <th>Income Bracket (\$) </th>
  </tr>
  <tr>
    <td>1</td>
    <td>less than \$10,000</td>
  </tr>
 <tr>
    <td>2</td>
    <td>\$15,000 - \$19,999</td>
  </tr>
  <tr>
    <td>3</td>
    <td>\$20,000 - \$24,999</td>
  </tr>
  <tr>
    <td>4</td>
    <td>\$25,000 - \$29,999</td>
  </tr>
  <tr>
    <td>5</td>
    <td>\$30,000 - \$34,999</td>
  </tr>
    <tr>
    <td>6</td>
    <td>\$35,000 - \$39,999</td>
  </tr>
    <tr>
    <td>7</td>
    <td>\$40,000 - \$44,999</td>
  </tr>
    <tr>
    <td>8</td>
    <td>\$45,000 - \$49,999</td>
  </tr>
    <tr>
    <td>9</td>
    <td>\$50,000 - \$59,999</td>
  </tr>
   <tr>
    <td>10</td>
    <td>\$60,000 - \$74,999</td>
  </tr> 
   <tr>
    <td>11</td>
    <td>\$75,000 - \$99,999</td>
  </tr>   
   <tr>
    <td>12</td>
    <td>\$100,000 - \$124,999</td>
  </tr> 
   <tr>
    <td>13</td>
    <td>\$125,000 - \$149,999</td>
  </tr> 
   <tr>
    <td>14</td>
    <td>\$150,000 - \$199,999</td>
  </tr>
   <tr>
    <td>15</td>
    <td>\$200,000 or more</td>
  </tr> 
</table>

Using the information above, find the SES for the median African American. Enter this value in the cell below.

Find the SES for the median US citizen. Enter this value in the cell below.

Find SES (income). Recall the formula for SES (income):

$p_i \times \text{income score} + p_e \times \text{years of education}$

We will now calculate SES (wealth). In this lab, we measure the level of wealth for a race as the proportion of households that use the Food Stamp / SNAP program. This is a table of SNAP recipients for all US citizens.

In [None]:
	Estimate	Margin of Error
Total:                                                          	116,926,305	+/-226,951
Household received Food Stamps/SNAP in the past 12 months:      	15,399,651	+/-26,351
Income in the past 12 months below poverty level                	7,892,966	+/-21,830
Income in the past 12 months at or above poverty level          	7,506,685	+/-19,400
Household did not receive Food Stamps/SNAP in the past 12 months:	101,526,654	+/-241,611
Income in the past 12 months below poverty level                	8,918,629	+/-20,355
Income in the past 12 months at or above poverty level          	92,608,025	+/-253,203
		
		
portion of population who receive food stamps for all US citizens	0.131703905	

Source: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_B22003&prodType=table

The table below shows the proportion of African American households that received food stamps.

In [None]:
	United States	
	Estimate	Margin of Error
Total:                                                          	14,186,983	+/-27,959
Household received Food Stamps/SNAP in the past 12 months       	4,016,335	+/-15,107
Household did not receive Food Stamps/SNAP in the past 12 months	10,170,648	+/-23,261
		
portion of population who received food stamps for African Americans	0.283100008	

Source: https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_15_5YR_B22005B&prodType=table

Let us now calculate SES (wealth) with the new formula, using the wealth scores defined below: 

US proportion =               0.131703905 --> score of 1.32
<br>
African American proportion = 0.283100008 --> score of 2.83
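The proportions and scores above can be checked from the household counts in the two SNAP tables. In this sketch the function names are our own, and the scaling factor of 10 is an assumption inferred from the mapping 0.1317 to 1.32; the lab does not state the rule explicitly.

```python
def snap_proportion(recipient_households, total_households):
    """Share of households that received Food Stamps/SNAP."""
    return recipient_households / total_households


def wealth_score(proportion, scale=10):
    """Assumed rule: multiply the proportion by 10 and round to 2 decimals."""
    return round(proportion * scale, 2)


# Household counts copied from the two tables above.
us_proportion = snap_proportion(15_399_651, 116_926_305)
aa_proportion = snap_proportion(4_016_335, 14_186_983)

print(round(us_proportion, 6), wealth_score(us_proportion))
print(round(aa_proportion, 6), wealth_score(aa_proportion))
```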

Recall the formula: 
<br>
<br>
$p_w \times \text{wealth score} + p_e \times \text{years of education}$, 
<br>
<br>
where $p_w$ and $p_e$ retain the same weights as before, 0.8 and 0.2 respectively.

Enter in your value for SES (wealth) below.

Compare your values for SES (income) and SES (wealth). What are the implications of having different SES values?

# Changing SES Weights

Moving forward, we will only be using SES (income). Whenever the rest of this lab refers to SES, it means SES (income), defined as:
<br>
<br>
$p_i \times \text{income score} + p_e \times \text{years of education}$
<br>
<br>
Up to this point, we have maintained weights of $p_i$ = 0.8 and $p_e$ = 0.2. These weights are arbitrary: they only need to sum to 1.0, and they reflect how much importance we assign to each factor in our SES index value. In this case, we say that 80% of our SES index is determined by income and 20% is determined by years of education.

We will now experiment with toggling between different weightings for SES. 

We are going to take the case of Jim, an African American with an income of \$35,000 and 16 years of education. First we will compute the ratio $\text{individual SES} / \text{median US SES}$, and then we will compute $\text{individual SES} / \text{median African American SES}$.


Recall the information that was provided earlier:

median income for African Americans: \$37,500
<br>
median years of school for African Americans: 13 years of school

median income for all US citizens: \$55,000
<br>
median years of school for all US citizens: 12.5 years of school

Compute his $\text{individual SES} / \text{median US SES}$ using weightings of 0.8 and 0.2. The income score table has been copied below for your convenience. Is he overperforming or underperforming?

<table>
  <tr>
    <th>Income Score</th>
    <th>Income Bracket (\$) </th>
  </tr>
  <tr>
    <td>1</td>
    <td>less than \$10,000</td>
  </tr>
 <tr>
    <td>2</td>
    <td>\$15,000 - \$19,999</td>
  </tr>
  <tr>
    <td>3</td>
    <td>\$20,000 - \$24,999</td>
  </tr>
  <tr>
    <td>4</td>
    <td>\$25,000 - \$29,999</td>
  </tr>
  <tr>
    <td>5</td>
    <td>\$30,000 - \$34,999</td>
  </tr>
    <tr>
    <td>6</td>
    <td>\$35,000 - \$39,999</td>
  </tr>
    <tr>
    <td>7</td>
    <td>\$40,000 - \$44,999</td>
  </tr>
    <tr>
    <td>8</td>
    <td>\$45,000 - \$49,999</td>
  </tr>
    <tr>
    <td>9</td>
    <td>\$50,000 - \$59,999</td>
  </tr>
   <tr>
    <td>10</td>
    <td>\$60,000 - \$74,999</td>
  </tr> 
   <tr>
    <td>11</td>
    <td>\$75,000 - \$99,999</td>
  </tr>   
   <tr>
    <td>12</td>
    <td>\$100,000 - \$124,999</td>
  </tr> 
   <tr>
    <td>13</td>
    <td>\$125,000 - \$149,999</td>
  </tr> 
   <tr>
    <td>14</td>
    <td>\$150,000 - \$199,999</td>
  </tr>
   <tr>
    <td>15</td>
    <td>\$200,000 or more</td>
  </tr> 
</table>

Compute his $\text{individual SES} / \text{median African American SES}$. Is he overperforming or underperforming?

Now let's change the weights to 0.5 and 0.5. Compute his $\text{individual SES} / \text{median US SES}$. Is he overperforming or underperforming?

Compute his $\text{individual SES} / \text{median African American SES}$ with weights of 0.5 and 0.5. Is he overperforming or underperforming?

How does the importance that we place on certain factors affect Jim's socio-economic status?

Let's change the weightings further: $p_i$ = 0.2 and $p_e$ = 0.8. Calculate the median SES for African Americans.

Calculate the median SES for all US citizens using the same weights of 0.2 and 0.8.

Are African Americans overperforming or underperforming in relation to all US citizens?

How does this compare to the values that you found previously with weights of $p_i$ = 0.8 and $p_e$ = 0.2?

Suppose there was a public policy initiative to provide increased funding to households of low socio-economic status. The board of this policy is debating whether they should focus specifically on publicizing this initiative to a certain African American community. How would their decision change based on the SES values that we found?

Data science involves making decisions like assigning weights to factors to determine SES indices. These decisions are often ambiguous and do not readily provide a clear answer, but the insights and the conclusions that we draw can influence public policy decisions. One important objective of this lab was to demonstrate that data science isn't always clear cut and involves making judgment calls as well.