# Gentrification Project
# CS 467/567
- Notion link: https://www.notion.so/Big-Data-Energy-Project-f6bb9a10236f476cbea3d0d4796962f6?n=page_invite  

# Core causes for Gentrification
 - Rent/Wage gap
 - Real estate investments
 - Higher income residents moving in
 - Pricier business locations
 - Education level
 - Racial demographic
 - Historic conditions (policies and practices that could make communities susceptible)
 - Rising median household cost

# Characteristics of Gentrification
 - Increased investment in neighborhood amenities, like transit and parks
 - Industrial land could change to restaurants and storefronts
 - Investors flipping properties for large profits
 - High-end development which leads to landlords looking for higher-paying tenants
 - Higher median household cost

# Measurements of Gentrification
 - Census data to measure changes in neighborhood composition by income, race, education, and housing value

## Current datasets

### All datasets provided by the US Census

- Population/Racial demographic
 - [https://data.census.gov/cedsci/table?q=race&g=0500000US35001&tid=ACSDT1Y2019.B02001](https://data.census.gov/cedsci/table?q=race&g=0500000US35001&tid=ACSDT1Y2019.B02001)
- Median house price
 - [https://data.census.gov/cedsci/table?q=B25077%3A MEDIAN VALUE (DOLLARS)&g=0500000US35001&tid=ACSDT1Y2019.B25077](https://data.census.gov/cedsci/table?q=B25077%3A%20MEDIAN%20VALUE%20%28DOLLARS%29&g=0500000US35001&tid=ACSDT1Y2019.B25077)
- Education Data
 - https://data.census.gov/cedsci/table?q=Educational%20Attainment&g=0500000US35001&tid=ACSDT1Y2019.C15003
- Income/Employment data
 - [https://data.census.gov/cedsci/table?q=Income (Households, Families, Individuals)&g=0500000US35001&tid=ACSST1Y2019.S1901&hidePreview=true](https://data.census.gov/cedsci/table?q=Income%20%28Households,%20Families,%20Individuals%29&g=0500000US35001&tid=ACSST1Y2019.S1901&hidePreview=true)
    
# Ideas
 - Use data by county, zip code, or full city? I lean toward county, but it has to line up with the other data
  - LA County data: https://data.census.gov/cedsci/profile?g=0600000US0603791750
 - Potential Causes:
  - Size of household
  - Proximity to city transit
  - What is an "attractive" business
  - Non-English speakers
  - Cause? The price of inner-city housing declines to the point where it becomes desirable for outsiders to buy it and convert it to a higher-value use
  - As wealthier homeowners move in, old houses get fixed up, the aesthetics of the neighborhood improve, and more businesses spring up to serve the new residents.  All of this makes the neighborhood even more attractive to potential (wealthy) buyers. Once a neighborhood begins to gentrify, it can take on a new character that attracts like-minded people
  - Neighborhoods impacted by gentrification have been shaped historically by decades of discriminatory public policies and private real estate practices that undermined property values, facilitated substandard living conditions, and generated racially segregated housing patterns. These neighborhoods’ lower property values, location in the urban core near good jobs and transit, and historical and cultural character are all factors that are making them more attractive to newcomers and susceptible to redevelopment

## Links
 - https://www.urbandisplacement.org/about/what-are-gentrification-and-displacement/ 
 - https://www.vox.com/22629826/gentrification-definition-housing-racism-segregation-cities
 - https://dspace.mit.edu/handle/1721.1/123884 
 - https://www.elca.org/JLE/Articles/1135
 - https://sites.utexas.edu/gentrificationproject/understanding-gentrification-and-displacement/

# TODO
 - Albuquerque Data http://data.cabq.gov/business/busregistration/
 - Categorize zip code and housing prices
 - Cluster the zip codes by average home price, then find businesses in those zip codes
 - Determine core causes of gentrification 
 - Collect Education data
 - Collect Employment data
 - Displacement vs Gentrified

In [178]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [179]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

In [180]:
years = [*range(2010,2020)]
race = {}
median_house = {}
education = {}
income = {}

# Create and clean racial demographic data

In [181]:
race_clean_col = {'id': 'ID',
             'Geographic Area Name': 'County',
             'Estimate!!Total': 'Total Population',
             'Estimate!!Total:': 'Total Population',
             'Estimate!!Total!!White alone': 'White alone',
             'Estimate!!Total:!!White alone': 'White alone',
             'Estimate!!Total!!Black or African American alone': 'Black or African American alone',
             'Estimate!!Total:!!Black or African American alone': 'Black or African American alone',
             'Estimate!!Total!!American Indian and Alaska Native alone': 'American Indian and Alaska Native along',
             'Estimate!!Total:!!American Indian and Alaska Native alone': 'American Indian and Alaska Native along',
             'Estimate!!Total!!Asian alone': 'Asain alone',
             'Estimate!!Total:!!Asian alone': 'Asain alone',
             'Estimate!!Total!!Native Hawaiian and Other Pacific Islander alone': 'Native Hawaiian and Other Pacific Islander alone',
             'Estimate!!Total:!!Native Hawaiian and Other Pacific Islander alone': 'Native Hawaiian and Other Pacific Islander alone',
             'Estimate!!Total!!Some other race alone': 'Some other race alone',
             'Estimate!!Total:!!Some other race alone': 'Some other race alone',
             'Estimate!!Total!!Two or more races': 'Two or more races',
             'Estimate!!Total:!!Two or more races:': 'Two or more races',
             'Estimate!!Total!!Two or more races!!Two races including Some other race': 'Two races including some other race',
             'Estimate!!Total:!!Two or more races:!!Two races including Some other race': 'Two races including some other race',
             'Estimate!!Total!!Two or more races!!Two races excluding Some other race, and three or more races': 'Two races excluding Some other race, and three or more races',
             'Estimate!!Total:!!Two or more races:!!Two races excluding Some other race, and three or more races': 'Two races excluding Some other race, and three or more races'
             }

In [182]:
# Read in ACS race Demographics by Year
for year in years: # Each year from 2010-2019
  new_df = pd.read_csv(f'/content/drive/MyDrive/GentrificationProj/Race/ACS_county{year}.csv', delimiter=',', header=1) # read in the CSV file
  new_df.dataframeName = f"ACS County Demographics {year}"  # Name of the dataframe
  cols = range(3, new_df.shape[1], 2)                       # Indices of margin of error columns
  new_df.drop(new_df.columns[cols],axis=1,inplace=True)     # Remove all margin of error columns
  new_df.rename(columns=race_clean_col,inplace=True)            # Clean the column names so they are more readable
  new_df.dropna(inplace=True)                               # Remove NA Elements
  new_df = new_df[~new_df["County"].str.contains('Puerto Rico')] # Remove Puerto Rico, only want to look at mainland US.
  race[year] = new_df                                        # Assign to dictionary
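
The four loading cells in this notebook repeat the same cleaning steps, so they could share one helper. The following is a minimal sketch, not the notebook's method: `clean_acs_csv` is a hypothetical name, and it assumes margin-of-error columns can be recognized by the text `Margin of Error` in their header (the notebook drops every other column by position instead). The sample file is made up purely to show the two-header-row layout.

```python
import io
import pandas as pd

def clean_acs_csv(source, rename_map):
    """Load one ACS export and apply the cleaning steps used in this notebook.

    Assumption: margin-of-error columns contain 'Margin of Error' in their
    header. The notebook drops them by column position instead.
    """
    df = pd.read_csv(source, header=1)               # second row holds the readable labels
    moe_cols = [c for c in df.columns if 'Margin of Error' in c]
    df = df.drop(columns=moe_cols)                   # keep estimates only
    df = df.rename(columns=rename_map)               # shorter, readable column names
    df = df.dropna()                                 # drop rows with missing values
    df = df[~df['County'].str.contains('Puerto Rico')]
    return df.reset_index(drop=True)

# Tiny made-up file in the two-header-row layout, for illustration only.
sample = io.StringIO(
    "GEO_ID,NAME,B01_001E,B01_001M\n"
    "id,Geographic Area Name,Estimate!!Total,Margin of Error!!Total\n"
    "0500000US00001,\"Example County, Alabama\",1000,50\n"
    "0500000US00002,\"Example Municipio, Puerto Rico\",2000,60\n"
)
demo = clean_acs_csv(sample, {'id': 'ID',
                              'Geographic Area Name': 'County',
                              'Estimate!!Total': 'Total Population'})
print(demo)
```

Selecting columns by name avoids silently dropping the wrong column if the export layout differs between years.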

# Create and clean median house price data

In [183]:
median_clean_col = {'id': 'ID',
                    'Geographic Area Name': 'County',
                    'Estimate!!Median value (dollars)': 'Median House Value (Dollars)'
                    }

In [184]:
# Read in ACS median house values by year
for year in years: # Each year from 2010-2019
  new_df = pd.read_csv(f'/content/drive/MyDrive/GentrificationProj/MedianValues/Median_{year}.csv', delimiter=',', header=1) # read in the CSV file
  new_df.dataframeName = f"Median House Price {year}"  # Name of the dataframe
  cols = range(3, new_df.shape[1], 2)                       # Indices of margin of error columns
  new_df.drop(new_df.columns[cols],axis=1,inplace=True)     # Remove all margin of error columns
  new_df.rename(columns=median_clean_col,inplace=True)            # Clean the column names so they are more readable
  new_df.dropna(inplace=True)                               # Remove NA Elements
  new_df = new_df[~new_df["County"].str.contains('Puerto Rico')] # Remove Puerto Rico, only want to look at mainland US
  median_house[year] = new_df                                        # Assign to dictionary

# Create and clean education data

In [185]:
education_clean_col = {'id': 'ID',
                       'Geographic Area Name': 'County',
                       'NAME': 'County',
                       'Estimate!!Total': 'Total',
                       'Total!!Estimate!!Population 18 to 24 years': 'Population 18 to 24 years',
                       'Estimate!!Total!!Population 18 to 24 years': 'Population 18 to 24 years',
                       'Estimate!!Total!!AGE BY EDUCATIONAL ATTAINMENT!!Population 18 to 24 years': 'Population 18 to 24 years',
                       'Total!!Estimate!!Population 18 to 24 years!!Less than high school graduate': 'Less than high school graduate (Ages 18-24)',
                       'Total!!Estimate!!Population 18 to 24 years!!High school graduate (includes equivalency)': 'High school graduate (includes equivalency) (Ages 18-24)',
                       'Total!!Estimate!!Population 18 to 24 years!!Some college or associate\'s degree': 'Some college or associate\'s degree (Ages 18-24)',
                       'Total!!Estimate!!Population 18 to 24 years!!Bachelor\'s degree or higher': 'Bachelor\'s degree or higher (Ages 18-24)',
                       'Total!!Estimate!!Bachelor\'s degree or higher': 'Bachelor\'s degree or higher (Ages 18-24)',
                       'Estimate!!Total!!Population 18 to 24 years!!Bachelor\'s degree or higher': 'Bachelor\'s degree or higher (Ages 18-24)',
                       'Estimate!!Total!!AGE BY EDUCATIONAL ATTAINMENT!!Population 18 to 24 years!!Bachelor\'s degree or higher': 'Bachelor\'s degree or higher (Ages 18-24)',
                       'Total!!Estimate!!Population 25 years and over': 'Population 25 years and over',
                       'Estimate!!Total!!Population 25 years and over': 'Population 25 years and over',
                       'Estimate!!Total!!AGE BY EDUCATIONAL ATTAINMENT!!Population 25 years and over': 'Population 25 years and over',
                       'Total!!Estimate!!Population 25 years and over!!Less than 9th grade': 'Less than 9th grade (Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!9th to 12th grade, no diploma': '9th to 12th grade(Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!High school graduate (includes equivalency)': 'High school graduate (includes equivalency) (Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!Some college, no degree': 'Some college, no degree (Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!Associate\'s degree': 'Associate\'s degree (Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!Bachelor\'s degree': 'Bachelor\'s degree (Ages 25 and over)',
                       'Total!!Estimate!!Bachelor\'s degree': 'Bachelor\'s degree (Ages 25 and over)',
                       'Estimate!!Total!!Population 25 years and over!!Bachelor\'s degree': 'Bachelor\'s degree (Ages 25 and over)',
                       'Estimate!!Total!!AGE BY EDUCATIONAL ATTAINMENT!!Population 25 years and over!!Bachelor\'s degree': 'Bachelor\'s degree (Ages 25 and over)',
                       'Total!!Estimate!!Graduate or professional degree': 'Graduate or professional degree (Ages 25 and over)',
                       'Estimate!!Total!!Population 25 years and over!!Graduate or professional degree': 'Graduate or professional degree (Ages 25 and over)',
                       'Estimate!!Total!!AGE BY EDUCATIONAL ATTAINMENT!!Population 25 years and over!!Graduate or professional degree': 'Graduate or professional degree (Ages 25 and over)',
                       'Total!!Estimate!!Population 25 years and over!!Graduate or professional degree': 'Graduate or professional degree (Ages 25 and over)',
                      }

In [186]:
# Read in ACS educational attainment by year
for year in years: # Each year from 2010-2019
  new_df = pd.read_csv(f'/content/drive/MyDrive/GentrificationProj/Education/Education_{year}.csv', delimiter=',', header=1) # read in the CSV file
  new_df.dataframeName = f"Education of {year}"  # Name of the dataframe
  cols = range(3, new_df.shape[1], 2)                       # Indices of margin of error columns
  new_df.drop(new_df.columns[cols],axis=1,inplace=True)     # Remove all margin of error columns
  new_df.rename(columns=education_clean_col,inplace=True)   # Clean the column names so they are more readable
  new_df.dropna(inplace=True)                               # Remove NA Elements
  new_df['Bachelor\'s degree or higher (Ages 25 and over)'] = new_df['Bachelor\'s degree (Ages 25 and over)'] + new_df['Graduate or professional degree (Ages 25 and over)']
  if(year >= 2015):
      drop_cols = range(15,386,1)
      percent18 = new_df['Bachelor\'s degree or higher (Ages 18-24)'] / new_df['Population 18 to 24 years'] * 100
      percent25 = new_df['Bachelor\'s degree or higher (Ages 25 and over)'] / new_df['Population 25 years and over']* 100
      # NOTE: the two percentages have different denominators, so their sum is an index, not a true population share
      new_df['Bachelor\'s degree or higher (%)'] =  percent18 + percent25
      new_df.drop(new_df.columns[drop_cols],axis=1,inplace=True) # Remove all unnecessary columns
      drop_cols2 = range(3,6,1)
      drop_cols3 = range(5,10,1)
      new_df.drop(new_df.columns[drop_cols2],axis=1,inplace=True) # Remove all unnecessary columns
      new_df.drop(new_df.columns[drop_cols3],axis=1,inplace=True) # Remove all unnecessary columns
      drop_cols4 = range(2,8,1)
      new_df.drop(new_df.columns[drop_cols4],axis=1,inplace=True) # Remove all unnecessary columns
  else:
      new_df['Bachelor\'s degree or higher (%)'] = new_df['Bachelor\'s degree or higher (Ages 18-24)'] + new_df['Bachelor\'s degree or higher (Ages 25 and over)']
      drop_cols = range(2,117,1)
      new_df.drop(new_df.columns[drop_cols],axis=1,inplace=True) # Remove all unnecessary columns
  new_df = new_df[~new_df["County"].str.contains('Puerto Rico')] # Remove Puerto Rico, only want to look at mainland US
  education[year] = new_df                                        # Assign to dictionary
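
The cell above builds `Bachelor's degree or higher (%)` by adding the 18-24 percentage to the 25-and-over percentage. As an alternative, here is a minimal sketch of a pooled share that divides all degree holders by all adults. It is not what the notebook computes: `pooled_bachelors_share` is a hypothetical helper, it assumes the columns hold counts (as in the 2015 and later files), and the toy numbers are invented.

```python
import pandas as pd

def pooled_bachelors_share(df):
    """Percent of adults (18 and over) holding a bachelor's degree or higher.

    Assumes the four input columns are counts, not percentages.
    """
    holders = (df["Bachelor's degree or higher (Ages 18-24)"]
               + df["Bachelor's degree or higher (Ages 25 and over)"])
    adults = df['Population 18 to 24 years'] + df['Population 25 years and over']
    return holders * 100 / adults

# Invented counts for a single county, for illustration only.
toy = pd.DataFrame({
    'Population 18 to 24 years': [100],
    "Bachelor's degree or higher (Ages 18-24)": [10],
    'Population 25 years and over': [900],
    "Bachelor's degree or higher (Ages 25 and over)": [270],
})
toy["Bachelor's degree or higher (%)"] = pooled_bachelors_share(toy)
print(toy["Bachelor's degree or higher (%)"].iloc[0])
```

For these toy counts the pooled share is about 28, whereas summing the two group percentages (10 and 30) gives 40.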

# Create and clean income data

In [187]:
income_clean_col = {'id': 'ID',
                    'Geographic Area Name': 'County',
                    'Households!!Estimate!!Total': 'Total households',
                    'Estimate!!Households!!Total': 'Total households',
                    'Households!!Estimate!!Median income (dollars)': 'Median Income (Dollars)',
                    'Estimate!!Households!!Median income (dollars)': 'Median Income (Dollars)'
                    }

In [188]:
# Read in ACS household income by year
for year in years: # Each year from 2010-2019
  new_df = pd.read_csv(f'/content/drive/MyDrive/GentrificationProj/Income/Income_{year}.csv', delimiter=',', header=1) # read in the CSV file
  new_df.dataframeName = f"Income for {year}"  # Name of the dataframe
  cols = range(3, new_df.shape[1], 2)                       # Indices of margin of error columns
  new_df.drop(new_df.columns[cols],axis=1,inplace=True)     # Remove all margin of error columns
  drop_cols1 = range(3,13,1)
  new_df.drop(new_df.columns[drop_cols1],axis=1,inplace=True) # Remove all unnecessary columns
  drop_cols2 = range(4,56,1)
  new_df.drop(new_df.columns[drop_cols2],axis=1,inplace=True) # Remove all unnecessary columns
  new_df.rename(columns=income_clean_col,inplace=True)      # Clean the column names so they are more readable
  new_df.dropna(inplace=True)                               # Remove NA Elements
  new_df = new_df[~new_df["County"].str.contains('Puerto Rico')] # Remove Puerto Rico, only want to look at mainland US
  print(new_df.columns[3])
  income[year] = new_df                                          # Assign to dictionary

Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)
Median Income (Dollars)


# Checking data

In [189]:
race[2015].head()

Unnamed: 0,ID,County,Total Population,White alone,Black or African American alone,American Indian and Alaska Native along,Asain alone,Native Hawaiian and Other Pacific Islander alone,Some other race alone,Two or more races,Two races including some other race,"Two races excluding Some other race, and three or more races"
0,0500000US01003,"Baldwin County, Alabama",203709.0,174923.0,23049.0,1917.0,303.0,0.0,1383.0,2134.0,812.0,1322.0
1,0500000US01015,"Calhoun County, Alabama",115620.0,86053.0,22709.0,176.0,1229.0,0.0,1787.0,3666.0,84.0,3582.0
2,0500000US01043,"Cullman County, Alabama",82005.0,80309.0,475.0,99.0,218.0,0.0,598.0,306.0,0.0,306.0
3,0500000US01049,"DeKalb County, Alabama",71130.0,60748.0,1274.0,541.0,146.0,2319.0,4605.0,1497.0,103.0,1394.0
4,0500000US01051,"Elmore County, Alabama",81468.0,61760.0,17971.0,7.0,207.0,0.0,166.0,1357.0,0.0,1357.0


In [190]:
median_house[2016].head()

Unnamed: 0,ID,County,Median House Value (Dollars)
0,0500000US01003,"Baldwin County, Alabama",189100
1,0500000US01015,"Calhoun County, Alabama",115600
2,0500000US01043,"Cullman County, Alabama",133100
3,0500000US01049,"DeKalb County, Alabama",101600
4,0500000US01051,"Elmore County, Alabama",171000


In [191]:
education[2010].head()

Unnamed: 0,ID,County,Bachelor's degree or higher (%)
0,0500000US01003,"Baldwin County, Alabama",37.9
1,0500000US01015,"Calhoun County, Alabama",18.5
2,0500000US01043,"Cullman County, Alabama",15.2
3,0500000US01049,"DeKalb County, Alabama",14.5
4,0500000US01051,"Elmore County, Alabama",25.4


In [192]:
income[2016].head()

Unnamed: 0,ID,County,Total households,Median Income (Dollars)
0,0500000US01003,"Baldwin County, Alabama",76779,56732
1,0500000US01015,"Calhoun County, Alabama",43972,41687
2,0500000US01043,"Cullman County, Alabama",30299,39411
3,0500000US01049,"DeKalb County, Alabama",25383,35963
4,0500000US01051,"Elmore County, Alabama",29350,52579


# Merge datasets

In [193]:
merged_df = {}

for year in years:
  df = pd.concat([median_house[year].set_index('ID'),
                  income[year].set_index('ID'),
                  education[year].set_index('ID'),
                  race[year].set_index('ID'),
                  ],
                               
                              axis=1,
                              join='inner')
  
  df = df.loc[:,~df.columns.duplicated()] # Remove duplicate columns
  merged_df[year] = df
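
To make the effect of the merge cell concrete, here is a small sketch on invented data. It uses the same `pd.concat(..., axis=1, join='inner')` call followed by the duplicate-column filter. The county names and values are made up.

```python
import pandas as pd

# Invented frames: 'c' appears only in the house-value table.
house = pd.DataFrame({'ID': ['a', 'b', 'c'],
                      'County': ['A County', 'B County', 'C County'],
                      'Median House Value (Dollars)': [100000, 150000, 200000]})
income = pd.DataFrame({'ID': ['a', 'b'],
                       'County': ['A County', 'B County'],
                       'Median Income (Dollars)': [40000, 50000]})

# join='inner' keeps only IDs present in every table.
merged = pd.concat([house.set_index('ID'), income.set_index('ID')],
                   axis=1, join='inner')

# Each table carries its own 'County' column; keep the first copy only.
merged = merged.loc[:, ~merged.columns.duplicated()]
print(merged)
```

Because the join is an inner join, any county missing from one table in a given year is dropped from that year's merged frame, which is worth keeping in mind when comparing years.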

In [194]:
merged_df[2015].head()

ID,County,Median House Value (Dollars),Total households,Median Income (Dollars),Bachelor's degree or higher (%),Total Population,White alone,Black or African American alone,American Indian and Alaska Native along,Asain alone,Native Hawaiian and Other Pacific Islander alone,Some other race alone,Two or more races,Two races including some other race,"Two races excluding Some other race, and three or more races"
0500000US01003,"Baldwin County, Alabama",177800,72269,52003,38.754795,203709.0,174923.0,23049.0,1917.0,303.0,0.0,1383.0,2134.0,812.0,1322.0
0500000US01015,"Calhoun County, Alabama",111400,44323,42346,23.29171,115620.0,86053.0,22709.0,176.0,1229.0,0.0,1787.0,3666.0,84.0,3582.0
0500000US01043,"Cullman County, Alabama",128600,30798,37862,21.185035,82005.0,80309.0,475.0,99.0,218.0,0.0,598.0,306.0,0.0,306.0
0500000US01049,"DeKalb County, Alabama",101000,26247,36559,12.853441,71130.0,60748.0,1274.0,541.0,146.0,2319.0,4605.0,1497.0,103.0,1394.0
0500000US01051,"Elmore County, Alabama",159800,29615,52502,25.562724,81468.0,61760.0,17971.0,7.0,207.0,0.0,166.0,1357.0,0.0,1357.0


# Find Eligible Counties

## Eligibility
## Methods
Based on [Gentrification and Disinvestment 2020](https://s3.us-west-2.amazonaws.com/secure.notion-static.com/ac5291fd-dcc5-4927-a73a-e2ce3858efa1/Gentrification-and-Opportunity-Zones-2020-v9.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20211205%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20211205T201043Z&X-Amz-Expires=86400&X-Amz-Signature=6b110a3bd2669cc2ec39b8835a4a1376f7d3b6adb33d40db0d3c97ef6abce4bd&X-Amz-SignedHeaders=host&response-content-disposition=filename%20%3D%22Gentrification-and-Opportunity-Zones-2020-v9.pdf%22&x-id=GetObject)

### Criteria of Gentrification
Eligible
* **Population** > 500
* **Median Home Value** < 40th percentile
* **Median Household Income** < 40th percentile

Gentrified
* **Increase in Median Home Value** > 60th percentile
* **Increase in College Educated** > 60th percentile
* **Increase in Median Household Income**

In [195]:
# Flag eligible counties
for year in years:
  df = merged_df[year]
  home_val_40perc = df['Median House Value (Dollars)'].quantile(0.4) # 40th percentile
  med_income_40perc = df['Median Income (Dollars)'].quantile(0.4) # 40th percentile

  ls = []
  for (i, row) in df.iterrows():
    population = row['Total Population']
    med_home_val = row['Median House Value (Dollars)'] 
    med_income = row['Median Income (Dollars)']
    
    is_pop_met = population > 500
    is_home_met = med_home_val < home_val_40perc
    is_income_met = med_income < med_income_40perc
    
    if(is_pop_met and is_home_met and is_income_met):
      ls.append(1)
    else:
      ls.append(0)
      
  df['Eligible'] = ls
  merged_df[year] = df
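
The row-by-row loop above can also be written with boolean column operations. The sketch below is one possible vectorized form under the same three criteria. `flag_eligible` is a hypothetical helper, and the toy frame is invented.

```python
import pandas as pd

def flag_eligible(df, min_population=500, quantile=0.4):
    """Return a copy of df with an 'Eligible' column (1 = eligible, 0 = not)."""
    home = df['Median House Value (Dollars)']
    income = df['Median Income (Dollars)']
    eligible = ((df['Total Population'] > min_population)
                & (home < home.quantile(quantile))       # below the 40th percentile
                & (income < income.quantile(quantile)))  # below the 40th percentile
    return df.assign(Eligible=eligible.astype(int))

# Invented values, for illustration only.
toy = pd.DataFrame({
    'Total Population': [400, 1000, 2000, 3000, 4000],
    'Median House Value (Dollars)': [50000, 60000, 150000, 200000, 250000],
    'Median Income (Dollars)': [20000, 25000, 50000, 60000, 70000],
})
flags = flag_eligible(toy)['Eligible'].tolist()
print(flags)  # [0, 1, 0, 0, 0]
```

Only the second row passes all three tests: the first row fails the population threshold, and the remaining rows sit above both 40th-percentile cutoffs.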

In [199]:
# Check eligible column
merged_df[2014].head()

ID,County,Median House Value (Dollars),Total households,Median Income (Dollars),Bachelor's degree or higher (%),Total Population,White alone,Black or African American alone,American Indian and Alaska Native along,Asain alone,Native Hawaiian and Other Pacific Islander alone,Some other race alone,Two or more races,Two races including some other race,"Two races excluding Some other race, and three or more races",Eligible
0500000US01003,"Baldwin County, Alabama",180100,71307,48461,39.1,200111.0,173137.0,19588.0,1621.0,639.0,0.0,1305.0,3821.0,1918.0,1903.0,0
0500000US01049,"DeKalb County, Alabama",90500,24720,35023,14.1,71065.0,59354.0,896.0,1126.0,416.0,0.0,7892.0,1381.0,63.0,1318.0,1
0500000US01051,"Elmore County, Alabama",150300,28352,55530,29.1,80977.0,60220.0,18242.0,219.0,64.0,0.0,1122.0,1110.0,166.0,944.0,0
0500000US01055,"Etowah County, Alabama",99700,39714,40529,23.1,103531.0,83905.0,16122.0,338.0,686.0,55.0,757.0,1668.0,0.0,1668.0,1
0500000US01069,"Houston County, Alabama",117100,38825,39543,24.6,104193.0,72551.0,28680.0,221.0,573.0,0.0,598.0,1570.0,125.0,1445.0,1


In [200]:
# Observe how many counties are eligible
# 0 = Not eligible
# 1 = Eligible
merged_df[2015].Eligible.value_counts()

0    552
1    229
Name: Eligible, dtype: int64

# Create eligible dataset

In [213]:
# Create dataset with only eligible counties
eligible_df = {}
for year in years:
  df = merged_df[year]
  df = df[df.Eligible == 1]
  eligible_df[year] = df

In [214]:
# Check eligible dataset
eligible_df[2014].head()

ID,County,Median House Value (Dollars),Total households,Median Income (Dollars),Bachelor's degree or higher (%),Total Population,White alone,Black or African American alone,American Indian and Alaska Native along,Asain alone,Native Hawaiian and Other Pacific Islander alone,Some other race alone,Two or more races,Two races including some other race,"Two races excluding Some other race, and three or more races",Eligible
0500000US01049,"DeKalb County, Alabama",90500,24720,35023,14.1,71065.0,59354.0,896.0,1126.0,416.0,0.0,7892.0,1381.0,63.0,1318.0,1
0500000US01055,"Etowah County, Alabama",99700,39714,40529,23.1,103531.0,83905.0,16122.0,338.0,686.0,55.0,757.0,1668.0,0.0,1668.0,1
0500000US01069,"Houston County, Alabama",117100,38825,39543,24.6,104193.0,72551.0,28680.0,221.0,573.0,0.0,598.0,1570.0,125.0,1445.0,1
0500000US01073,"Jefferson County, Alabama",142500,261980,44646,38.6,660793.0,346240.0,277340.0,2147.0,10715.0,284.0,14420.0,9647.0,542.0,9105.0,1
0500000US01077,"Lauderdale County, Alabama",114000,38093,40309,27.9,93096.0,80893.0,8534.0,188.0,635.0,0.0,36.0,2810.0,137.0,2673.0,1
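
The "Gentrified" criteria listed under Criteria of Gentrification are not yet implemented in this notebook. The following is a rough sketch of how they might be applied between a start year and an end year. It rests on several assumptions that the criteria list does not settle: increases are simple differences, percentiles are taken over all counties present in both years, and the income criterion means any positive increase. `flag_gentrified` is a hypothetical name, and the toy frames are invented.

```python
import pandas as pd

HOME = 'Median House Value (Dollars)'
INCOME = 'Median Income (Dollars)'
COLLEGE = "Bachelor's degree or higher (%)"

def flag_gentrified(start, end):
    """Flag counties eligible in `start` that meet the 'Gentrified' criteria by `end`.

    Both frames are indexed by county ID; `start` carries an 'Eligible' column.
    Assumptions: see the paragraph above (differences, shared counties, income > 0).
    """
    common = start.index.intersection(end.index)      # counties present in both years
    cols = [HOME, INCOME, COLLEGE]
    change = end.loc[common, cols] - start.loc[common, cols]
    gentrified = ((start.loc[common, 'Eligible'] == 1)
                  & (change[HOME] > change[HOME].quantile(0.6))
                  & (change[COLLEGE] > change[COLLEGE].quantile(0.6))
                  & (change[INCOME] > 0))
    return gentrified.astype(int).rename('Gentrified')

# Invented data for five counties, for illustration only.
ids = ['a', 'b', 'c', 'd', 'e']
start = pd.DataFrame({HOME: [100000] * 5, INCOME: [30000] * 5,
                      COLLEGE: [10.0] * 5, 'Eligible': [1, 1, 0, 1, 1]}, index=ids)
end = pd.DataFrame({HOME: [110000, 120000, 130000, 140000, 150000],
                    INCOME: [35000, 35000, 35000, 29000, 35000],
                    COLLEGE: [11.0, 12.0, 13.0, 14.0, 15.0]}, index=ids)
result = flag_gentrified(start, end)
print(result.tolist())  # [0, 0, 0, 0, 1]
```

In the toy data, counties `d` and `e` clear both 60th-percentile cutoffs, but `d` loses income over the period, so only `e` is flagged. With the notebook's data the call might look like `flag_gentrified(merged_df[2010], merged_df[2019])`, though the choice of years is also an assumption.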


# Visualizing Data

In [227]:
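for year in years:
  # eligible_df is a dict keyed by year, so index into it before calling .iloc
  abq = eligible_df[year].iloc[[0], :]  # NOTE: row 0 is the first eligible county, not necessarily Albuquerque
  print(abq)
#plt.plot(Year, income[2016].iloc)
#plt.title('Median Income Value Vs Year')
#plt.xlabel('Year')
#plt.ylabel('Median Income Value')
#plt.show()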
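
The commented-out plot aims at "Median Income Value Vs Year" for a single county. Below is a minimal sketch of one way to build that series from a dict of per-year DataFrames indexed by county ID, as `merged_df` is. `county_series` and `plot_series` are hypothetical helpers, and the toy frames are invented.

```python
import pandas as pd

def county_series(frames_by_year, county_id, column):
    """Collect one column for one county across years, skipping years where it is absent."""
    values = {year: df.loc[county_id, column]
              for year, df in sorted(frames_by_year.items())
              if county_id in df.index}
    return pd.Series(values, name=column)

def plot_series(series, title):
    """Line plot of a year-indexed series."""
    import matplotlib.pyplot as plt  # imported here so county_series works without matplotlib
    plt.plot(series.index, series.values, marker='o')
    plt.title(title)
    plt.xlabel('Year')
    plt.ylabel(series.name)
    plt.show()

# Invented frames: county 'x' is missing from 2012.
toy = {
    2010: pd.DataFrame({'Median Income (Dollars)': [40000]}, index=['x']),
    2011: pd.DataFrame({'Median Income (Dollars)': [41000]}, index=['x']),
    2012: pd.DataFrame({'Median Income (Dollars)': [43000]}, index=['y']),
}
series = county_series(toy, 'x', 'Median Income (Dollars)')
print(series)
# plot_series(series, 'Median Income Value Vs Year')
```

Looking a county up by ID rather than by row position avoids depending on row order. The ID to use would come from the `ID` column, for example the `0500000US35001` geography that appears in the Census links near the top of this notebook, provided that county is present in the frames.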