# Socioeconomic scores: Area Deprivation Index (ADI)

The Area Deprivation Index (ADI) is a composite index that ranks neighborhoods by socioeconomic disadvantage within a region of interest (state or national level). It was developed by the University of Wisconsin-Madison.

- The composite index is calculated from inputs such as income, education, employment, and housing quality.
- It is originally aggregated at the Census block group level (the neighborhood unit), but a mapping file for zip codes is available.
- At the national level, ADI scores are expressed as percentiles (1-100).
- At the state level, ADI scores are expressed as deciles (1-10), where 1 represents the least disadvantaged neighborhoods and 10 the most disadvantaged.
 
 
You can find more information about the ADI here: https://www.neighborhoodatlas.medicine.wisc.edu/

### 1. Loading ADI dataset

We previously downloaded the ADI dataset for the state of Wisconsin only from https://www.neighborhoodatlas.medicine.wisc.edu/ and exported it to a CSV file. The columns are:
- ZIP_4: The 9-digit zip code ID
- FIPS: The block group Census ID
- GISJOIN: Key linkage field to the block group shapefile served by NHGIS
- ADI_NATRANK: National percentile of block group ADI score
- ADI_STATERANK: State-specific decile of block group ADI score
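Because several of these columns are identifiers or may mix numbers with text codes, it can help to read them as strings. The sketch below is not part of the original workflow: it uses a small invented CSV in place of the real export, and it assumes the file uses the column names listed above.

```python
import io
import pandas as pd

# Toy CSV standing in for the real export; all values are invented for illustration.
toy_csv = io.StringIO(
    "ZIP_4,FIPS,GISJOIN,ADI_NATRANK,ADI_STATERANK\n"
    "537031234,550250001001,G55002500001001,35,4\n"
    "537035678,550250001002,G55002500001002,GQ,GQ\n"
)

# Read identifier and rank columns as text so IDs keep their exact digits
# and non-numeric rank codes are preserved as-is.
expected_columns = ["ZIP_4", "FIPS", "GISJOIN", "ADI_NATRANK", "ADI_STATERANK"]
toy = pd.read_csv(toy_csv, dtype={col: str for col in expected_columns})

# Check that every expected column is present before going further.
missing = set(expected_columns) - set(toy.columns)
print("Missing columns:", missing)
print(toy.dtypes)
```

With the real file, the same `dtype` argument could be passed to the `pd.read_csv` call in the next cell.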

In [None]:
import pandas as pd

SOCIOECONOMIC_FILE = './socioeconomic_scores_zipcode.csv'

df_socioec_scores = pd.read_csv(SOCIOECONOMIC_FILE)
df_socioec_scores.head(5)

### 2. Cleaning the dataset

This dataset poses some interesting challenges, and we only need 2 columns:
- ZIP_4: We need to truncate it to the 5-digit zip code to align it with the UWWisconsin dataset.
- ADI_STATERANK: Besides the 10 deciles, it contains non-numeric codes such as GQ, PH, and GQ-PH, which we need to remove. More here: https://www.neighborhoodatlas.medicine.wisc.edu/

An initial solution to clean this dataset (open to discussion):
- Group the dataset by 5-digit zip code and assign the average rank within each group.

In [None]:
def clean_adi(df_original):
    
    # Let's create the cleaned dataset
    df_clean = df_original.copy()
    
    # Extract only 5 digits from zipcode
    df_clean['ZIP'] = df_clean['ZIP_4'].astype(str).str[:5]
    
    # Remove rows that don't have a valid decile (non-numeric codes or missing values)
    invalid_codes = ['GQ', 'PH', 'GQ-PH']
    has_valid_rank = (~df_clean['ADI_STATERANK'].isin(invalid_codes)) & df_clean['ADI_STATERANK'].notna()
    df_clean = df_clean.loc[has_valid_rank].copy()

    # Transform state rank to integer
    df_clean['ADI_STATERANK'] = df_clean['ADI_STATERANK'].astype(int)

    # IMPORTANT: Assign each zipcode the average rank of its block groups
    df_clean = df_clean.groupby('ZIP', as_index=False)['ADI_STATERANK'].mean()
    
    
    return df_clean[['ZIP','ADI_STATERANK']]

In [None]:
df_clean = clean_adi(df_socioec_scores)
df_clean.head(5)

### 3. Creating a Choropleth to show ADI by zip codes

We will use the Folium library to plot the zipcodes and their corresponding ADI value. For this task we will need a ZIPCODE GeoJSON file for Wisconsin.


In [None]:
# Install folium library
!pip install folium

In [None]:
import folium
import pandas as pd
import json
import requests

# GeoJSON file definition
wisconsin_geojson = "https://raw.githubusercontent.com/OpenDataDE/State-zip-code-GeoJSON/master/wi_wisconsin_zip_codes_geo.min.json"


# Creating the map centered at Wisconsin state
m = folium.Map(location=[44.808444, -89.673194], 
               tiles="cartodbpositron", 
               zoom_start=6.8)

# Creating the Choropleth (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(geo_data=json.loads(requests.get(wisconsin_geojson).text),
             data=df_clean,
             columns=['ZIP', 'ADI_STATERANK'],
             key_on='feature.properties.ZCTA5CE10', 
             fill_color='YlOrRd', fill_opacity=1, line_opacity=0.2,
             legend_name='Area Deprivation Index : 1(Least disadvantaged)-10(Most disadvantaged)').add_to(m)

m


In [None]:
m.save(outfile = './choropleth_wisconsin.html' )

### 4. Final thoughts on this dataset

Finally, let's reflect on the following questions before using this dataset for our Datadive event!
- Is the zip code the best neighborhood unit to match information from UWWisconsin calls?
- Can we find another way to aggregate the scores (currently: average)?
- Is this information enough to draw conclusions for our analysis? Do we need an additional socioeconomic dataset?
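To make the aggregation question concrete, here is a minimal sketch comparing the mean with two possible alternatives (median and maximum) on invented block group ranks. The toy values and the output column names `mean_rank`, `median_rank`, and `max_rank` are assumptions for illustration only; which statistic suits the analysis remains an open question.

```python
import pandas as pd

# Toy block group ranks for two zip codes (values invented for illustration).
toy = pd.DataFrame({
    "ZIP": ["53703", "53703", "53703", "54601", "54601"],
    "ADI_STATERANK": [2, 3, 10, 6, 8],
})

# Compute several candidate aggregations side by side for each zip code.
summary = (
    toy.groupby("ZIP")["ADI_STATERANK"]
       .agg(mean_rank="mean", median_rank="median", max_rank="max")
       .reset_index()
)
print(summary)
```

In this toy example, zip code 53703 has a mean of 5 but a maximum of 10, which shows how an average can hide a single highly disadvantaged block group.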


## Service Sites per Zip Code ##

In [None]:
sites = pd.read_csv('uwwi_dataset_sites.csv')
sites.head()

In [None]:
sites_zip = sites[['Agency_Id', 'SiteAddressus_SiteAddressus_zip']]

In [None]:
sites_zip.isna().sum()

## 4.7% of sites list no zip code. Removing these sites for now. ##

In [None]:
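# Drop sites without a zip code; .copy() gives an independent DataFrame so later column assignments are safe
sites_zip = sites_zip.dropna().copy()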

In [None]:
sites_zip.columns = ['Agency_Id', 'Zip']
sites_zip.head()

In [None]:
%%capture --no-display
sites_zip['Zip-5'] = sites_zip['Zip'].astype(str).str[:5]

In [None]:
sites_zip 

In [None]:
site_count = sites_zip[['Agency_Id', 'Zip-5']].groupby('Zip-5').agg('count').reset_index()
site_count

In [None]:
site_count.columns = ['Zip','Number of Sites']

In [None]:
site_count[site_count['Number of Sites'] >300]

In [None]:
site_count.describe()

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.hist(site_count['Number of Sites'])
plt.show()

Zip codes by quartile of site count:
- First quartile: 1 site
- Second quartile: 1-3 sites
- Third quartile: 3-12 sites
- Fourth quartile: 12-347 sites

Creating Site Density Categories 1-4 using fairly arbitrary bins:

In [None]:
site_count['Category'] = pd.cut(site_count['Number of Sites'], bins=[0, 50, 100, 200, 350], labels = [1, 2, 3, 4])

In [None]:
site_count['Category'].value_counts()
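To show how the bin edges above behave, here is a small sketch on invented counts. By default `pd.cut` builds right-inclusive intervals, so a zip code with exactly 50 sites falls in category 1 and one with 51 sites falls in category 2. Counts outside the range (0, 350] get no category.

```python
import pandas as pd

bins = [0, 50, 100, 200, 350]
labels = [1, 2, 3, 4]

# Toy site counts chosen to sit on and around the bin edges (invented values).
toy_counts = pd.Series([1, 50, 51, 100, 101, 200, 347])
toy_categories = pd.cut(toy_counts, bins=bins, labels=labels)
print(toy_categories.tolist())  # [1, 1, 2, 2, 3, 3, 4]

# Values at or below the first edge, or above the last edge, become NaN.
out_of_range = pd.cut(pd.Series([0, 351]), bins=bins, labels=labels)
print(out_of_range.isna().tolist())  # [True, True]
```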

In [None]:
# Creating the map centered at Wisconsin state
m = folium.Map(location=[44.808444, -89.673194], 
               tiles="cartodbpositron", 
               zoom_start=6.8)

# Creating the Choropleth
folium.Choropleth(geo_data=json.loads(requests.get(wisconsin_geojson).text),
             data=site_count,
             columns=['Zip', 'Number of Sites'],
             key_on='feature.properties.ZCTA5CE10', 
             fill_color='YlOrRd', fill_opacity=1, line_opacity=0.2, nan_fill_color="White",
             legend_name='Number of Sites').add_to(m)

m

In [None]:
# Creating the map centered at Wisconsin state
cat_map = folium.Map(location=[44.808444, -89.673194], 
               tiles="cartodbpositron", 
               zoom_start=6.8)

# Creating the Choropleth
folium.Choropleth(geo_data=json.loads(requests.get(wisconsin_geojson).text),
             data=site_count,
             columns=['Zip', 'Category'],
             key_on='feature.properties.ZCTA5CE10', 
             fill_color='YlOrRd', fill_opacity=1, line_opacity=0.2, nan_fill_color="White",
             legend_name='Site Density Category: 1(Low)-4(High)').add_to(cat_map)

cat_map

In [None]:
m.save(outfile = './sites_per_zip.html' )

In [None]:
site_count.columns=['ZIP', 'Number of Sites', 'Site Density Category']
site_count

In [None]:
site_count.info()

In [None]:
zip_pop = pd.read_csv('./population_zip.csv')

In [None]:
zip_pop.head()

In [None]:
zip_pop['ZIP'] = zip_pop['ZIP'].astype(str)

In [None]:
site_pop = site_count.merge(zip_pop, on='ZIP')

In [None]:
site_pop.info()
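The default inner merge above silently drops any zip code that has sites but no population row. A minimal sketch of one way to check for that uses the `indicator` option of `merge`. The toy data below is invented, and whether the real `population_zip.csv` has such gaps is not known here.

```python
import pandas as pd

# Toy inputs: zip code 54999 has sites but no population record (invented values).
toy_sites = pd.DataFrame({"ZIP": ["53703", "54601", "54999"],
                          "Number of Sites": [12, 3, 1]})
toy_pop = pd.DataFrame({"ZIP": ["53703", "54601"],
                        "Population": [30000, 15000]})

# A left merge keeps every site row; the _merge column records where each row matched.
checked = toy_sites.merge(toy_pop, on="ZIP", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "ZIP"].tolist()
print("Zip codes without population data:", unmatched)
```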

In [None]:
site_pop['Sites Per Capita'] = (site_pop['Number of Sites'] / site_pop['Population'])
site_pop.head()

In [None]:
site_pop.describe()

In [None]:
site_pop.to_csv('./site_pop.csv', index=False)

In [None]:
site_pop.info()

In [None]:
import numpy as np

site_pop['Log Sites Per Capita'] = np.log10(site_pop['Sites Per Capita'])
site_pop.describe()
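If any zip code had a population of 0, the division above would produce infinite values, and these would carry through the log transform and the summary statistics. The sketch below shows one possible guard on invented data. It assumes a `Population` column as used above, and it treats zero-population zip codes as missing, which is an illustrative choice only.

```python
import numpy as np
import pandas as pd

# Toy data: the last zip code has zero population (invented values).
toy = pd.DataFrame({
    "ZIP": ["53703", "54601", "54999"],
    "Number of Sites": [12, 3, 1],
    "Population": [30000, 15000, 0],
})

# Replace non-positive populations with NaN so the ratio is NaN instead of inf.
population = toy["Population"].where(toy["Population"] > 0)
toy["Sites Per Capita"] = toy["Number of Sites"] / population
toy["Log Sites Per Capita"] = np.log10(toy["Sites Per Capita"])
print(toy)
```

Rows with NaN would then show up in the choropleth with the `nan_fill_color` rather than distorting the color scale.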

In [None]:
# Creating the map centered at Wisconsin state
percapita_map = folium.Map(location=[44.808444, -89.673194], 
               tiles="cartodbpositron", 
               zoom_start=6.8)

# Creating the Choropleth
folium.Choropleth(geo_data=json.loads(requests.get(wisconsin_geojson).text),
             data=site_pop,
             columns=['ZIP', 'Log Sites Per Capita'],
             key_on='feature.properties.ZCTA5CE10', 
             fill_color='YlOrRd', fill_opacity=1, line_opacity=0.2, nan_fill_color="White",
             legend_name='Sites Per Capita (log scale)').add_to(percapita_map)

percapita_map

In [None]:
percapita_map.save(outfile = './per_capita_map.html' )

## Identifying cold zones ##

In [None]:
sites_with_adi = site_count.merge(df_clean, on='ZIP', how='inner')
sites_with_adi

Creating a map and list of "cold" neighborhoods with a low number of service sites and a high Area Deprivation Index:

In [None]:
cold_zips = sites_with_adi[(sites_with_adi['Site Density Category'] == 1) & (sites_with_adi['ADI_STATERANK'] >= 8)]

In [None]:
cold_zips
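Note that `site_count` only contains zip codes with at least one site, so the inner merge above cannot flag a disadvantaged zip code that has no sites at all. The sketch below shows one possible variation on invented data: it starts from the ADI table, uses a left merge, and treats missing counts as 0. The `<= 50` threshold mirrors the upper edge of category 1, and the ADI threshold of 8 is the one used above.

```python
import pandas as pd

# Toy inputs: zip code 54999 is highly disadvantaged and has no sites (invented values).
toy_adi = pd.DataFrame({"ZIP": ["53703", "54601", "54999"],
                        "ADI_STATERANK": [3.2, 8.5, 9.1]})
toy_sites = pd.DataFrame({"ZIP": ["53703", "54601"],
                          "Number of Sites": [120, 4]})

# Keep every zip code with an ADI score; zip codes absent from the site table get 0 sites.
merged = toy_adi.merge(toy_sites, on="ZIP", how="left")
merged["Number of Sites"] = merged["Number of Sites"].fillna(0).astype(int)

# Cold zones: few sites (at most 50, like category 1) and a high ADI.
toy_cold = merged[(merged["Number of Sites"] <= 50) & (merged["ADI_STATERANK"] >= 8)]
print(toy_cold)
```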

Map of Cold Zones

In [None]:
# Creating the map centered at Wisconsin state
cold_map = folium.Map(location=[44.808444, -89.673194], 
               tiles="cartodbpositron", 
               zoom_start=6.8)

# Creating the Choropleth
folium.Choropleth(geo_data=json.loads(requests.get(wisconsin_geojson).text),
             data=cold_zips,
             columns=['ZIP', 'ADI_STATERANK'],
             key_on='feature.properties.ZCTA5CE10', 
             fill_color='OrRd', fill_opacity=1, line_opacity=0.2, nan_fill_color="White",
             legend_name='Area Deprivation Index : 1(Least disadvantaged)-10(Most disadvantaged)').add_to(cold_map)

cold_map

Merging with city list to get a list of "cold" cities:

In [None]:
sites_cities = sites[['SiteAddressus_SiteAddressus_zip', 'SiteAddressus_SiteAddressus_city']]

In [None]:
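sites_cities.columns = ['ZIP', 'City']
# Use the same 5-digit zip format as cold_zips so the merge keys match
sites_cities = sites_cities.dropna(subset=['ZIP']).copy()
sites_cities['ZIP'] = sites_cities['ZIP'].astype(str).str[:5]
sites_cities['City'] = sites_cities['City'].str.title()
sites_cities = sites_cities.drop_duplicates(subset='ZIP')
sites_cities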

In [None]:
cold_cities = cold_zips.merge(sites_cities, on='ZIP')
cold_cities

In [None]:
cold_cities['City'].unique()

In [None]:
cold_map.save(outfile = './cold_zones.html' )

In [None]:
cold_cities = cold_cities.drop(columns = ['Site Density Category'])
cold_cities

In [None]:
cold_cities.to_csv('./cold_cities.csv', index=False)