# Exploring Lexington's Redlined Neighborhoods: Wealth

This notebook covers the steps taken to collect and pre-process data for the redlining-wealth-health-lexington GitHub repo.

Required environment dependencies (all actually loaded into the environment):  
**jupyter notebook** conda install -c conda-forge jupyter  
**numpy** conda install -c conda-forge numpy  
**geopandas** conda install --channel conda-forge geopandas  
**pandas** (version 0.24 or later) conda install -c conda-forge pandas  
**shapely** (interface to GEOS) conda install -c conda-forge shapely  
**fiona** (interface to GDAL) conda install -c conda-forge fiona  
**matplotlib** (>=2.2.0) conda install -c conda-forge matplotlib  
**pyproj** (interface to PROJ; version 2.2.0 or later) conda install -c conda-forge pyproj

Further, optional dependencies (actually loaded into the environment):  
**rtree** (optional; spatial index to improve performance and required for overlay operations; interface to libspatialindex) conda install -c conda-forge rtree  
**psycopg2** (optional; for PostGIS connection) conda install -c conda-forge psycopg2  
**GeoAlchemy2** (optional; for writing to PostGIS) conda install -c conda-forge geoalchemy2  
**geopy** (optional; for geocoding) conda install -c conda-forge geopy  

For plotting, this additional package may be used:  
**mapclassify** (>=2.2.0)  

## Intro to US Census Bureau data
[Video](https://www.youtube.com/watch?v=1LZPYS0cR68) detailing how to access data for specific census tracts.  
US Census data can be found [here](https://mtgis-portal.geo.census.gov/arcgis/apps/MapSeries/index.html?appid=2566121a73de463995ed2b2fd7ff6eb7) on the 2020 Census Bureau Demographic Data Map Viewer or [here](https://data.census.gov).

Data within the US Census Bureau is organized in a hierarchy:  
Nation -> State -> County -> Tract -> Block Group -> Block  
A nation is composed of many states, which are composed of many counties, and so on.
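
The hierarchy above is also encoded in the GEOID that the Census Bureau attaches to each geography: a 15-digit block GEOID is the state (2 digits), county (3), tract (6), and block (4) codes concatenated, and the first digit of the block code is the block group. The sketch below splits such an identifier into its levels. The function name and the example GEOID are illustrative only (21 and 067 are the FIPS codes for Kentucky and Fayette County; the tract and block digits are made up).

```python
from typing import NamedTuple


class BlockGeoid(NamedTuple):
    state: str        # 2-digit state FIPS code
    county: str       # 3-digit county FIPS code
    tract: str        # 6-digit census tract code
    block_group: str  # first digit of the block code
    block: str        # 4-digit block code


def split_block_geoid(geoid: str) -> BlockGeoid:
    """Split a 15-digit census block GEOID into its hierarchy levels."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    return BlockGeoid(
        state=geoid[0:2],
        county=geoid[2:5],
        tract=geoid[5:11],
        block_group=geoid[11],
        block=geoid[11:15],
    )


# Illustrative GEOID: Kentucky (21), Fayette County (067), made-up tract and block.
parts = split_block_geoid("210670001001000")
print(parts)
```

Keeping GEOIDs as strings matters here, since reading them as integers would drop leading zeros in states whose FIPS code starts with 0.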

These sites will be needed to determine the exact census tracts for the data of interest. The tract identifiers may also turn out to be the key for performing table joins.

data.census.gov  
From here, click Advanced Search and select which filters to work through.  
Geography example:  
Click Advanced Search -> Geography filter -> select the hierarchy level to pull -> State -> County -> within (Tract)  
To pull data only from the 2020 census, click the Years filter and select the year of interest  
To pull data only from a specific survey, click the Surveys filter and select the survey of interest  
Click Search (the data tables resulting from these filters will populate)  
The table can then be downloaded as a CSV or viewed in map format  

For example, I chose  
Advanced Search -> Geography -> Block -> Kentucky -> Fayette -> all blocks within Fayette County  

Promising surveys:  
**DP4 Profile of Selected Housing Characteristics: 2000** Decennial Survey  
**DP04 Selected Housing Characteristics (2019: ACS (American Community Survey) 5-Year Estimates Data Profiles)**  
Housing units, structure type, year built, number of rooms, year householder moved into unit (to establish long-term owners), vehicles, house heating, value of specified owner-occupied units, gross rent

**DP02 Selected Social Characteristics in the United States (2019: ACS 5-Year Estimates Data Profiles)**  
Educational attainment, disability status, residence length, with a computer, with broadband internet


## Intro to Lexington's Data Portal
**GOOD DATA FILE for neighborhoods, but no common field to merge on.**
Census 2020 by Race for 18 and Over by Precinct in GeoJSON format, accessed 1/27/2022 from [here](https://data-lfucg.hub.arcgis.com/datasets/a1e93b1a225a4bb79baa00190e2b212c_0/explore?location=38.029745%2C-84.464568%2C13.69)

This is an alternative option for finding census tracts, as they should be included in this GeoJSON from the 2020 census.
This data set contains the names of the neighborhoods.
This data set also reflects a change in how race is identified. In 2020 there was an option to identify as more than one race (multiracial). This will need to be considered when exploring the racial makeup of the precincts. Can use the dominant identifiers or be more specific, i.e. white vs. Black or African American vs. *race 1 and race 2*.

Would like to know what CODE stands for, for example CODE C129 and NAME IDLE HOUR. Is C129 part of a precinct number?

[**GOOD DATA FILE for Census Tracts 2010**](https://data-lfucg.hub.arcgis.com/datasets/6cb9e8be350d45d19f1e517d2e9c4e4b_0/explore?location=38.041424%2C-84.451398%2C12.05) Good example of tract number identification from the 2010 census tracts.
It appears that **NAME10** in the Census Tracts 2010 file corresponds to **BASENAME** for census tracts in the US Census Bureau 2019 data. The caveat in using this data is that it is dated. Should find a more up-to-date data file.
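
If the NAME10-to-BASENAME correspondence holds, the join could be sketched roughly as below with pandas. This is a minimal sketch under assumptions: only the column names NAME10 and BASENAME come from the notes above, while the helper names, the normalization rule (treating "2" and "2.00" as the same tract), and the sample values in the usage lines are hypothetical. The `indicator=True` column makes it easy to see which tracts failed to match.

```python
import pandas as pd


def normalize_tract_name(value) -> str:
    """Return a tract name as a plain string, e.g. 1.0 -> "1", "1.10" -> "1.1"."""
    text = str(value).strip()
    if "." in text:
        # Drop trailing zeros after the decimal point, then a dangling dot.
        text = text.rstrip("0").rstrip(".")
    return text


def join_tracts(tracts_2010: pd.DataFrame, tracts_2019: pd.DataFrame) -> pd.DataFrame:
    """Outer-join the two tables on a normalized tract name.

    The `_merge` column reports whether each row matched in both tables,
    or was found only in the left (2010) or right (2019) table.
    """
    left = tracts_2010.assign(tract_key=tracts_2010["NAME10"].map(normalize_tract_name))
    right = tracts_2019.assign(tract_key=tracts_2019["BASENAME"].map(normalize_tract_name))
    return left.merge(right, on="tract_key", how="outer", indicator=True)


# Hypothetical sample values, only to show the shape of the result.
lex_tracts = pd.DataFrame({"NAME10": ["1.01", "2", "3"]})
census_tracts = pd.DataFrame({"BASENAME": ["1.01", "2.00", "4"]})
print(join_tracts(lex_tracts, census_tracts)["_merge"].value_counts())
```

Any rows flagged `left_only` or `right_only` would be the first place to look when checking whether the two fields really describe the same tracts.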

[**GOOD DATA FILE for Census Tracts & Race**](https://data-lfucg.hub.arcgis.com/datasets/8c1d1363e6ce4822b818a51469f4f502_0/explore)
Census 2020 Precinct_p1_race

[**GOOD DATA FILE for Landuse (Vacancies)**](https://data-lfucg.hub.arcgis.com/datasets/8113df2403c048a0837a0bec68906f3a_0/explore?filters=eyJMVTIwMDUiOlsiU0YiLCJNRiIsIkQiLCJDT00iLCJWQUMiXX0%3D&location=38.031300%2C-84.466517%2C13.19)
The only issue is that the land use data may be from 2005.

[**Good DATA FILE for Parks**](https://data-lfucg.hub.arcgis.com/datasets/764044274e974dbba9069d9dab7dcb34_0/explore?location=38.037063%2C-84.482212%2C13.06)
Park type, acreage. Need to check whether this is more up to date than the data used in the previous park project in the lex_redlined_and_parks personal repo.

[**Property values Option B**](https://qpublic.schneidercorp.com/Application.aspx?AppID=1019&LayerID=21445&PageTypeID=4&PageID=9143&Q=1076636217&KeyValue=40100350)
Allows 100 free lookups in the first month and 50 free lookups in each additional month.
Could limit the lookups to one HOLC A area and one HOLC D area?

[**Occupancy Status by Precinct Census 2020**](https://data-lfucg.hub.arcgis.com/datasets/4ce0a61b95764366a1971f5960a10690_0/explore?location=38.028065%2C-84.471929%2C13.69)
Properties reported as occupied vs. vacant

[**Greenway**](https://data-lfucg.hub.arcgis.com/datasets/79605c14aebf4bbb98c6df79c6e539ec_0/explore?location=38.022719%2C-84.458785%2C16.91)
Acreage of green space

[**Railway**](https://data-lfucg.hub.arcgis.com/datasets/7671e2d17391430ebca3e88e547a0cb0_0/explore?location=38.056548%2C-84.478677%2C13.08)
Line work for railways in Lexington

Impervious Area 2007 (private upload; can download as shapefile)  
Tree Canopy (private upload; can download as shapefile)

Other data sets of interest for another day:
- Various school zones (districts?)
- Library
- Bicycle Network

### Data to find & starting points
A fair amount of data for Lexington can be found in the city's ArcGIS Hub data portal,
https://data-lfucg.hub.arcgis.com/, referred to from here on as LEX.

- timeline (this will be more about journal articles or papers/maps referencing the development of Lexington through time)  
- Race/Segregation (Lexington data hub)  
- Median Household income  
- % of population in poverty
- Mortgage approval rates through time (HMDA, Home Mortgage Disclosure Act, data by census tract number and race of applicant; not sure whether approvals/denials are included, and the data set is too large to open in Excel)
- Percent homeownership today (US Census Bureau American Community Survey, https://www.census.gov/programs-surveys/acs/data.html)
- Property values today (county clerk's office?). DP04 from the Census Bureau has the owner-reported value of the house!
- Quality of property 
    - NLCD Tree Land Cover data set, or tree coverage from LEX
    - impervious surface coverage from LEX
    - distance to park (ref previous project)
    - vacant lots nearby
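
Since the HMDA file is noted above as too large to open in Excel, one option is to stream it row by row instead of loading it whole. The sketch below uses only the standard library and tallies rows per value of one column, with an optional filter. The column names `census_tract` and `action_taken`, and the code "1" used in the usage comment, are assumptions that would need to be checked against the data dictionary of the file actually downloaded.

```python
import csv
from collections import Counter


def count_rows_by_column(csv_path, column, keep=None):
    """Count rows per distinct value of `column`, reading one row at a time.

    `keep` is an optional function that takes a row (a dict of strings)
    and returns True for rows that should be counted.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if keep is None or keep(row):
                counts[row[column]] += 1
    return counts


# Hypothetical usage: the file name, column names, and code value are assumptions.
# applications = count_rows_by_column("hmda_kentucky.csv", "census_tract")
# originated = count_rows_by_column(
#     "hmda_kentucky.csv",
#     "census_tract",
#     keep=lambda row: row["action_taken"] == "1",
# )
```

Dividing the filtered count by the total count per tract would give a rough approval rate, once the meaning of each outcome code has been confirmed.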

### Inspiration for maps
- [ramp styling 2020 census by race](https://mtgis-portal.geo.census.gov/arcgis/apps/MapSeries/index.html?appid=2566121a73de463995ed2b2fd7ff6eb7)
- [overlap of HOLC designations & secondary ramp map](https://mtgis-portal.geo.census.gov/arcgis/apps/MapSeries/index.html?appid=2566121a73de463995ed2b2fd7ff6eb7)


### End of Day Thoughts:
- Determine what years of data are available for the topics of interest, then select the census tracts that best correspond to these.
- Open the Home Mortgage Disclosure Act data to explore what attributes are within.
- Download other data sets of interest and determine what attributes are within (use Jupyter packages; will need to install the packages into the environment first).
- Determine how many parcels are within the tracts and contact the Fayette County Clerk's office.
- Overlay census tracts with the redlining GeoJSON to determine what tracts to pull (quick viz from the past QGIS lex_parks project).
- Find HOLC descriptions for individual grade blocks, for example D3 or D4, and the exact quoted text used to describe each block.
- With every data download from Lexington's data portal, need to pull the terms of use.
- EMAIL LOUISVILLE REP about how they found some of their data?
- Would be interesting to look at impervious surfaces vs. tree canopy as a way to analyze the simple pleasure of green. Look for the demarcation of affluent areas by satellite or aerial imagery? (Inspiration: Bunge 2011 from map719.) Green grass on a colored aerial photograph almost perfectly maps the territory of the affluent: their campuses, expressways, downtown business center, parks, and homes. The affluent are surrounded by private parks.

In [25]:
# Exploring Lexington's 2020 Census data by race
%matplotlib inline

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.geometry import mapping
import shapely.speedups # can help speed up processes
import matplotlib.pyplot as plt
import timeit

# change default figsize
plt.rcParams['figure.figsize'] = (15, 12)

Import the data locally and time how long it takes.

In [26]:
%%time
lex_redlined = gpd.read_file('../project-files/KYLexington1936.geojson')

CRSError: Invalid projection: epsg:4326: (Internal Proj Error: proj_create: no database context specified)

In [27]:
%%time
lex_race_18up = gpd.read_file('../project-files/Census_2020_Race_18_&_Over_By_Precinct.geojson')

# Even if we wanted to remove pyproj, it is a dependency of one of the required packages.
# The error thrown seems to be related to a mix-up in which PROJ database is being called.
# The documentation describes needing to find and reassign the database, but it is unclear how or where to do so.

CRSError: Invalid projection: epsg:4326: (Internal Proj Error: proj_create: no database context specified)
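
A "no database context specified" error is commonly reported when PROJ cannot locate its `proj.db` file, or when more than one PROJ installation is on the machine. That is a possible cause here, not a confirmed diagnosis. The sketch below, using only the standard library, looks for `proj.db` under a given prefix and sets the environment variables PROJ reads to find it. The helper names are hypothetical, and the variables would need to be set before geopandas or pyproj is imported (in practice, after restarting the kernel).

```python
import os
from pathlib import Path


def find_proj_data_dirs(prefix):
    """Return every directory under `prefix` that contains a proj.db file."""
    return sorted({path.parent for path in Path(prefix).rglob("proj.db")})


def point_proj_at(data_dir):
    """Set the environment variables PROJ uses to locate proj.db."""
    os.environ["PROJ_LIB"] = str(data_dir)   # name used by older PROJ releases
    os.environ["PROJ_DATA"] = str(data_dir)  # name used by newer PROJ releases


# Hypothetical usage in a fresh kernel, before importing geopandas:
# import sys
# candidates = find_proj_data_dirs(sys.prefix)
# print(candidates)  # more than one entry may point to mixed installations
# if candidates:
#     point_proj_at(candidates[0])
```

pyproj also exposes `pyproj.datadir.get_data_dir()` and `pyproj.datadir.set_data_dir()` for inspecting and changing the directory it uses, which may be the reassignment step the documentation refers to.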