# Mattox Capstone Project Outline
#### An Examination of the Impacts of Sex Offender Residence Restrictions in St. Louis, MO

### 1. Initial Data Ingest, Carpentry, and Database Loading
Pull in all the datasets we will need, transform the data into a format that will facilitate our later analysis, and then store the results in a database for easy access later.

[Public School Shapefile](Public_Schools.ipynb)  
[Private School Shapefile](Private_Schools.ipynb)  
[Childcare Facility Spreadsheet](Childcare_Facilities.ipynb)  
[Sex Offender Registry Spreadsheet](MSOR.ipynb)  
\- [Fixing Entries That Failed Geocoding](Failed_Geocoding_MSOR.ipynb)  
[Parcel Value](Parcel_Value.ipynb)

(no longer necessary) [Zoning Shapefile + Zoning Codes](Zoning.ipynb) 

### 2. Combine Data Layers
Load our processed geodata from our PostGIS database, then derive new data layers that show the impacts we want to assess.

[Geodata Fusion](Geodata%20Fusion.ipynb)  

### 3. Conduct Analysis on Processed Data
Examine the results of the geodata fusion work to quantify the impacts of sex offender residence restrictions.

[Analysis - Residential Area](Analysis%20-%20Residential%20Area.ipynb)  
[Analysis - Sex Offender Potential Noncompliance](Analysis%20-%20SO%20Potential%20Noncompliance.ipynb)

### 4. View Results
[Map of St. Louis Sex Offender Locations and Status](stl_so.html)

---

## To Do

**Start working with the data**  
✓ Reduce public school data to STL only  
✓ Reduce private school data to STL only  
✓ Initial carpentry for childcare facilities  
---- Reduce to STL only  
---- Convert address info to geocode-compatible format  
---- Geocode addresses to get lat/lon  
☐ Examine childcare facility entries that failed geocoding  
---- Clean up text addresses for better compatibility with geocoder  
---? Would it be better to manually fix these ~two dozen items?  
✓ Initial carpentry for sex offender registry  
---- Reduce to STL only   
---- Convert address info to geocode-compatible format  
---- Geocode addresses to get lat/lon  
☐ Remove sex offender registry entries that occur more than once (one person, multiple crimes)  
---- Columns to consider: name, address, city, date of birth  
✓ Examine sex offender registry entries that failed geocoding  
---- Clean up text addresses for better compatibility with geocoder  
---- Run in two passes:  
---- 1. Remove floor, apartment, etc., then run through geocoder  
---- 2. Remove AVE, RD, DR, etc. along with cardinal directions, then run through geocoder again  
✓ Merge zoning shapefile data with codes  
---- Classify as residential/non-residential  
---- Maintain original (translated) categorization
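
The two-pass address clean-up listed above for failed geocodes could look roughly like the sketch below. The token lists and the function names (`clean_pass_one`, `clean_pass_two`) are assumptions for illustration, not the exact rules used in `Failed_Geocoding_MSOR.ipynb`, and the geocoder call itself is left out.

```python
import re

# Pass 1 (assumed token list): unit designators such as floor, apartment, suite.
UNIT_RE = re.compile(
    r"\b(?:APT|APARTMENT|UNIT|FL|FLOOR|STE|SUITE)\b\.?\s*#?\s*\w*|#\s*\w+",
    re.IGNORECASE,
)

# Pass 2 (assumed token list): street-type suffixes and cardinal directions.
SUFFIX_RE = re.compile(
    r"\b(?:AVE|AVENUE|RD|ROAD|DR|DRIVE|ST|STREET|BLVD|"
    r"N|S|E|W|NORTH|SOUTH|EAST|WEST)\b\.?",
    re.IGNORECASE,
)


def _tidy(text):
    """Collapse repeated whitespace and trim stray commas/spaces."""
    return re.sub(r"\s+", " ", text).strip(" ,")


def clean_pass_one(address):
    """Remove floor/apartment/unit details before the first retry."""
    return _tidy(UNIT_RE.sub(" ", address))


def clean_pass_two(address):
    """Additionally remove street suffixes and directions for the second retry."""
    return _tidy(SUFFIX_RE.sub(" ", clean_pass_one(address)))
```

Each pass would be followed by another run through the geocoder. Pass 2 is deliberately aggressive (it also strips a leading `ST` as in "ST CHARLES"), so it only makes sense for entries that still fail after pass 1.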

**Plot points on interactive map (folium)**  
✓ Plot public schools    
✓ Plot private schools  
✓ Plot childcare facilities  
✓ Plot sex offender locations

**Expand point data to reflect restricted area (circular buffer)**  
✓ Public schools - CRS conversion/understanding  
✓ Public schools - expand to circle  
✓ Private schools - CRS conversion/understanding  
✓ Private schools - expand to circle  
✓ Childcare facilities - CRS conversion/understanding  
✓ Childcare facilities - expand to circle  

**Back up project to GitHub**  
✓ Set up a GitHub account  
✓ Figure out how to connect my Mizzou work to my GitHub repo  
✓ Commit/push my code & data to GitHub

**Load geodata into postgres/GIS database**  
✓ DB test completed in standalone notebook (access, load, retrieve)  
✓ Public school geodata loaded  
✓ Private school geodata loaded  
✓ Childcare facility geodata loaded  
☐ Append once-failed, now-successful childcare facilities into DB  
✓ Sex offender geodata loaded  
✓ Append once-failed, now-successful sex offender registry entries    
---- Pass \#1  
---- Pass \#2  
✓ Zoning data loaded  
✓ Parcel data loaded  

**Fuse geodata**  
✓ Fuse all restricted areas together to create a simplified view  
✓ Load fused data into PostGIS  
✓ Subtract restricted areas from residential  
✓ Load end results into PostGIS   
✓ Find sex offenders residing in restricted area (potentially noncompliant) vs. outside restricted area (compliant)   
✓ Create mask of St. Louis City i.e. polygon of MO & IL with STL removed  
---- Save as shapefile
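
The compliant vs. potentially noncompliant split above can be sanity-checked without any GIS library. The sketch below swaps the polygon approach (buffer, dissolve, point-in-polygon) for a plain great-circle distance test against each facility point, which should give nearly the same answer for circular 1,000 ft buffers around points. The function names and the `(name, lat, lon)` tuple layout are assumptions for illustration.

```python
import math

FEET_PER_METER = 3.28084
RESTRICTED_RADIUS_FT = 1000.0   # buffer radius applied to schools and childcare facilities
EARTH_RADIUS_M = 6371008.8      # mean Earth radius (spherical approximation)


def distance_ft(lat1, lon1, lat2, lon2):
    """Haversine (great-circle) distance between two lat/lon points, in feet."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a)) * FEET_PER_METER


def classify(offender_lat, offender_lon, facilities, radius_ft=RESTRICTED_RADIUS_FT):
    """Label one residence by whether any facility lies within radius_ft.

    facilities: iterable of (name, lat, lon) tuples (hypothetical layout).
    """
    for _name, lat, lon in facilities:
        if distance_ft(offender_lat, offender_lon, lat, lon) <= radius_ft:
            return "potentially noncompliant"
    return "compliant"
```

Results may differ slightly from the PostGIS output near the buffer edge, since this uses a spherical approximation rather than a projected CRS.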

**Cost data**  
✓ Find good source of price (rent and/or real estate purchase) data  
✓ Combine value assessment data with parcel geometry  
✓ Merge with zoning info  
✓ Downselect to residential data only   
---- Use residential zoning polygons as bounding boxes  
✓ Investigate 0 value RESIDENTIAL parcels  
✓ Use parcel data exclusively (no more zoning data)?   
---- NumResBldgs as basis  
------ Need to compare to zoning  
✓ Load parcel data into PostGIS  
✓ Subtract restricted areas from residential parcels  
☐ Separate out apartments vs. single-person homes  
---- Set value threshold, compare before/after  
---- Also check zoning codes  
☐ Map property value to some annual cost of living  
---- What is residual penalty?  
------ E.g. SOs pay XX% more property tax, etc.    

**Analyze results of geo work**  
✓ Quantify total residential area    
✓ Quantify total restricted area   
✓ Quantify residential ZONE area available to sex offenders (total res - total restricted)  
✓ Quantify residential PARCEL area available to sex offenders (total res - total restricted)  
✓ Quantify potentially non-compliant sex offenders (within restricted area; filter/group by offense/level)  
✓ Assess value statistics of entire dataset (all residential) vs. non-restricted residential  
---- Mean & median value, etc.  
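
The value comparison in the last item could be sketched as below, using only the standard library. `value_summary` and `compare_values` are hypothetical helper names, and the inputs are assumed to be plain lists of parcel values (for example, one of the value columns pulled from `stlresparcels` and `stlnonrestrictedresparcels`).

```python
from statistics import mean, median


def value_summary(values):
    """Count, mean, and median of the parcel values, ignoring missing entries."""
    clean = [v for v in values if v is not None]
    return {"count": len(clean), "mean": mean(clean), "median": median(clean)}


def compare_values(all_residential, non_restricted):
    """Summarize both groups and report the relative shift in median value."""
    a = value_summary(all_residential)
    b = value_summary(non_restricted)
    shift = 100 * (b["median"] - a["median"]) / a["median"]
    return {"all_residential": a, "non_restricted": b, "median_shift_pct": shift}
```

Zero-value parcels would need to be handled before calling this, since they pull both the mean and the median down.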

**Retool MSOR data source and pipeline**  
✓ Switch to `msor_offense.xlsx` as source  
✓ Load in new source  
✓ Filter down to STL  
✓ Generate `randomid`  
✓ Remove entries with no address (`address` == None)  
✓ Run through geocoder  
✓ Output ALL results to CSV (so I don't have to geocode again)  
✓ Filter down to successful geocodes  
✓ Store success in DB table  
✓ Store failures in a different DB table  
✓ Run failures through [Failed_Geocoding_MSOR.ipynb](Failed_Geocoding_MSOR.ipynb)
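
A minimal sketch of the retooled pipeline steps above: skip entries without an address, assign an ID, geocode, write every result to CSV, and split successes from failures. The geocoder is passed in as a callable because the actual service is not named here. `run_geocoding`, the column names, and the use of `uuid` for `randomid` are assumptions for illustration.

```python
import csv
import uuid


def run_geocoding(rows, geocode, out_csv):
    """Geocode registry rows once and keep every outcome.

    rows:    list of dicts with an 'address' key (None when missing)
    geocode: callable taking an address string, returning (lat, lon) or None
    out_csv: path of the CSV that stores ALL results, so geocoding is not repeated
    """
    successes, failures = [], []
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["randomid", "address", "lat", "lon"])
        writer.writeheader()
        for row in rows:
            if row.get("address") is None:
                continue  # entries with no address are dropped before geocoding
            result = geocode(row["address"])
            record = {
                "randomid": uuid.uuid4().hex,  # assumed ID scheme
                "address": row["address"],
                "lat": result[0] if result else None,
                "lon": result[1] if result else None,
            }
            writer.writerow(record)
            (successes if result else failures).append(record)
    return successes, failures
```

The two returned lists correspond to the success table and the failure table. The failures are what would then go through the clean-up notebook.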

**Pinnacle output: Folium map**  
✓ St. Louis City mask  
✓ Restricted zones  
✓ Compliant sex offenders w/ labels (crime, tier, maybe ID)  
✓ Potentially noncompliant sex offenders w/ labels (crime, tier, maybe ID)  
✓ Custom icons  
✓ Layer/group toggles  
✓ Popup size (max_width) and multi-line display  
✖ Icon sizes?  
---- May not be possible as it's based on the native JPG/PNG size for the icon  
✓ Output to HTML  
✓ Apply custom page title for standalone viewing  
✓ Dig deeper into potentially non-compliant sex offenders  
---- Spatial (map) layer of SO tier (w/ compliant vs noncompliant)  
✓ Provide more options for exploring restricted areas  
---- Spatial (map) layer of restricted area by source type (public school, private school, childcare facility)  
------ Maybe plot the school (point) locations with some kind of school-esque icon  
☐ Add year of offense for space time plot?  
---- Does the SO registry have `offense_year`?  
------ This info exists, but in `msor_offense.xlsx` rather than `msor.xlsx`. Would need to change to this other input and re-run geocoding (no primary/foreign key to tie to).  
---- Show time slider that layers on SOs added each year  
------ Probably doable with `TimestampedGeoJson`  
------ Need to switch to `msor_offense.xlsx` first  
------ https://towardsdatascience.com/visualizing-nyc-bike-data-on-interactive-and-animated-maps-with-folium-plugins-c2d7645cd19b  
☐ Spatial clusters for SO locations?  
---- Look at associated demographics, e.g. low-income areas, near train tracks or other undesirable features  
✓ Find easy hosting option that makes this available to audience  
---- Copy to `~/public_html` (use terminal)  
---- From directory with HTML file: `cp stl_so.html ~/public_html`  

**Pinnacle output: Presentation (PPTX/slideshow)**  
-- Turn key notebooks into charts  
---- 30 min presentation  
✓ Intro to self (education, jobs)  
☐ Overall question  
✓ Summary of the data (sources, formats, quality)    
☐ Challenges & how overcame  
☐ Visualization & statistics (can break out to HTML page)  
☐ Outcomes, how it answers the question  
☐ Next steps (where to go from here, what could the next person do)  
☐ Animate where appropriate  

**Opportunities for additional cleanup and bonus work**  
✓ Create project outline notebook    
✓ Learn how to identify idle database connections  
✓ Learn how to KILL idle database connections  
✓ Update area subtraction to exclude any parcel polygons that overlap any part of a restricted zone   
---- https://gis.stackexchange.com/questions/418283/find-and-remove-overlapping-polygons-with-geopandas  
✓ Explore and test `to_postgis(if_exists='append')`  
✓ How to move files with git + server combo?  
☐ Clean up Test_Folium_Mask notebook and update links to it    
☐ Database improvement: one table for all flat/dissolved geometry  
---- Need to add columns for name, description  
✓ Mask everything outside STL City on folium plots  
---- Grey fill on map of MO + IL, subtract STL City from this  
☐ Work up a small polygon AOI for closer inspection and streamline a process to trim data down to this area  
☐ Examine parcels that are in residential zones but do not have any residential buildings  
---- Find the difference/overlay outer between parcels and residential zoning  
---- Codes, types, building examples  

---

## Database Table Summary Info

PostGIS database `cappsds_psmd39` on `pgsql.dsa.lan` contains the following tables:  

`country_borders`  
&emsp; _default inclusion_     
 `gadm_admin_borders`  
 &emsp; _default inclusion_  
 `geonames_feature`  
 &emsp; _default inclusion_  
 `msorfailedgeocoding`  
 &emsp; Items from the Missouri Sex Offender Registry that failed geocoding.  
 `msorfailedgeocodingv2`  
 &emsp; Items from the Missouri Sex Offender Registry that failed geocoding even after the initial clean-up. This is a subset of `msorfailedgeocoding`.   
 `spatial_ref_sys`  
 &emsp; _default inclusion_   
 `stlchildcare`  
 &emsp; Childcare facilities in St. Louis with restricted buffer circles (1,000ft. radius) applied   
 `stlnonrestrictedresparcels`   
 &emsp; Polygons of residential parcels and remaining area in St. Louis that sex offenders are able to reside in.   
 &emsp; That is, `stlresparcels` - `stlrestrictedflat`.  
 `stlnonrestrictedreszones`   
 &emsp; (Multi)polygons of zone area in St. Louis that sex offenders are able to reside in.   
 &emsp; That is, (residential component of `stlzoning`) - (`stlrestrictedflat`).  
 `stlpubschools`   
 &emsp; Public schools in St. Louis with restricted buffer circles (1,000ft. radius) applied   
 `stlpvtschools`  
 &emsp; Private schools in St. Louis with restricted buffer circles (1,000ft. radius) applied   
 `stlresparcels`  
 &emsp; Polygons of residential parcels (i.e. individual lots of property) in St. Louis. Also includes identifiers (handle, parcel ID, GIS parcel number) and multiple value columns (assessed total, billing total, land appraisal). A parcel counts as residential if it contains at least one residential building.  
 `stlrestrictedflat`  
 &emsp; Multipolygon of combined restricted areas.   
 &emsp; That is, `stlchildcare` + `stlpubschools` + `stlpvtschools`, which was then dissolved.  
 `stlsexoffenders`  
 &emsp; Point (lat/lon) locations of registered sex offenders in St. Louis.  
 `stlzoning`  
 &emsp; (Multi)polygons of zones (e.g. residential, commercial, unrestricted) in St. Louis