# GeoSpatial Public Policy Analysis - PyCon 2020
#### This repository contains the materials for the tutorial Geospatial Public Policy Analysis with GeoPandas for PyCon 2020, to be delivered virtually.

<details>
    <summary><strong>Goal</strong></summary>
    The goal of this notebook is to process NCES IPEDS tabular data into an analytical file (subset) that will be used in another notebook to visualize education deserts. 
    <ul>
        <li> Measurable goals for this notebook:</li>
        <li> 1. Identify <i>variables of interest</i> in the dataset <strong>processed_data.csv</strong>.</li>
        <li> 2. Create a <i>working dataset</i> containing those variables of interest.</li>
    </ul>
</details>

<details>
    <summary><strong>Context</strong></summary>
    We've downloaded raw data from NCES IPEDS <a href="https://nces.ed.gov/ipeds/use-the-data/"><strong>https://nces.ed.gov/ipeds/use-the-data/</strong></a>. It has been pre-processed slightly for the purposes of this workshop. 
    <ul>
        <li>This notebook will create an analytical file to be used in the 01_Data_Visualization notebook.</li>
        <li>The pre-processed master file contains the coordinates of higher education institutions as well as other institutional characteristics.</li>
        <li>The pre-processed master file also contains graduation rate, demographic, and enrollment statistics.</li>
    </ul>
</details>

In [2]:
import pandas as pd
from pathlib import Path
from tools import tree
from datetime import datetime as dt
today = dt.today().strftime("%d-%b-%y")

today

'14-Apr-20'

In [3]:
RAW_DATA = Path("../data/raw/")
INTERIM_DATA = Path("../data/interim/")
PROCESSED_DATA = Path("../data/processed/")
FINAL_DATA = Path("../data/final/")
EXTERNAL_DATA = Path("../data/external/")
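
The path constants above assume the `../data/` folders already exist. A minimal sketch of creating that layout up front, using a temporary directory as a stand-in for `../data` (the `base` and `folders` names are illustrative, not part of the notebook):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for ../data so the sketch does not touch the real project tree
    base = Path(tmp) / "data"
    folders = {
        name: base / name
        for name in ("raw", "interim", "processed", "final", "external")
    }
    for folder in folders.values():
        # parents=True creates base as well; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
    created = sorted(p.name for p in base.iterdir())
    print(created)
```

In the notebook itself, calling `PROCESSED_DATA.mkdir(parents=True, exist_ok=True)` before the final write would serve the same purpose.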

In [5]:
tree(EXTERNAL_DATA)

+ ../data/external
    + 4-8-2020---748.zip
    + processed
            + acs5_2018_race_counties-checkpoint.csv
        + acs5_2018_race_counties.csv
    + raw
        + acs5_2018_race_counties.csv
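
The `tree` helper comes from the repository's local `tools` module, whose source is not shown here. As a rough idea of what such a helper might do, here is a hypothetical stand-in (`tree_sketch` is an assumed name, and the real implementation may differ) that walks a directory with `pathlib` and produces a similar indented listing:

```python
import tempfile
from pathlib import Path


def tree_sketch(root, indent=0):
    """Return an indented listing with one '+ name' line per file or folder."""
    root = Path(root)
    # Show the full path for the top-level folder and the bare name below it
    label = root if indent == 0 else root.name
    lines = [" " * indent + f"+ {label}"]
    if root.is_dir():
        for child in sorted(root.iterdir()):
            lines.extend(tree_sketch(child, indent + 4))
    return lines


# Demonstrate on a throwaway folder rather than the real ../data/external
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "external"
    (demo / "raw").mkdir(parents=True)
    (demo / "raw" / "example.csv").write_text("geoid\n1001\n")
    listing = tree_sketch(demo)
    print("\n".join(listing))
```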


In [6]:
data = pd.read_csv(EXTERNAL_DATA / 'processed' / 'acs5_2018_race_counties.csv')
data.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| geoid | 1001 | 1003 | 1005 | 1007 | 1009 |
| name | Autauga County, Alabama | Baldwin County, Alabama | Barbour County, Alabama | Bibb County, Alabama | Blount County, Alabama |
| universe | 55200 | 208107 | 25782 | 22527 | 57645 |
| universe_annotation | | | | | |
| universe_moe | -5.55556e+08 | -5.55556e+08 | -5.55556e+08 | -5.55556e+08 | -5.55556e+08 |
| universe_moe_annotation | `*****` | `*****` | `*****` | `*****` | `*****` |
| white_alone | 41412 | 172768 | 11898 | 16801 | 50232 |
| white_alone_annotation | | | | | |
| white_alone_moe | 59 | 227 | 22 | 22 | 229 |
| white_alone_moe_annotation | | | | | |
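
The `universe_moe` value of -5.55556e+08, paired with the `*****` annotation, looks like a placeholder rather than a real margin of error. This notebook simply drops the margin-of-error columns, but if they were kept, one option would be to turn the placeholder into a missing value. The sketch below runs on a toy frame and assumes the underlying value is -555555555, which is inferred from the rounded display and not confirmed by the data shown:

```python
import numpy as np
import pandas as pd

# Assumed placeholder value, inferred from the displayed -5.55556e+08
SENTINEL = -555555555

# Toy frame mimicking two rows of the county file
toy = pd.DataFrame(
    {
        "universe": [55200, 208107],
        "universe_moe": [SENTINEL, SENTINEL],
        "white_alone_moe": [59, 227],
    }
)

# Only touch margin-of-error columns, leaving the estimates alone
moe_cols = [col for col in toy.columns if col.endswith("_moe")]
cleaned = toy.copy()
cleaned[moe_cols] = cleaned[moe_cols].replace(SENTINEL, np.nan)
print(cleaned)
```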


In [11]:
voi = [col for col in data.columns if "moe" not in col and "annotation" not in col]
voi

['geoid',
 'name',
 'universe',
 'white_alone',
 'black_alone',
 'american_indian_and_alaska_native',
 'asian_alone',
 'native_hawaiian_and_pacific_islander',
 'other_alone',
 'two_or_more_races',
 'latino_alone',
 'state',
 'county',
 'asians_all',
 'other_all']
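
The list comprehension above keeps every column whose name contains neither `moe` nor `annotation`. A slightly stricter variant, sketched here on a toy frame with a handful of the same column names, matches on the `_moe` and `_annotation` suffixes so that a column with `moe` elsewhere in its name would not be dropped by accident. On these columns both approaches agree:

```python
import pandas as pd

# Empty toy frame; only the column names matter for this sketch
toy = pd.DataFrame(
    columns=[
        "geoid",
        "name",
        "universe",
        "universe_annotation",
        "universe_moe",
        "universe_moe_annotation",
        "white_alone",
        "white_alone_moe",
    ]
)

# Substring test, as used in the notebook
by_substring = [
    col for col in toy.columns if "moe" not in col and "annotation" not in col
]

# Suffix test: str.endswith accepts a tuple of candidate suffixes
by_suffix = [
    col for col in toy.columns if not col.endswith(("_moe", "_annotation"))
]

print(by_suffix)
```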

In [13]:
working_data = data[voi].copy()

In [14]:
working_data.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| geoid | 1001 | 1003 | 1005 | 1007 | 1009 |
| name | Autauga County, Alabama | Baldwin County, Alabama | Barbour County, Alabama | Bibb County, Alabama | Blount County, Alabama |
| universe | 55200 | 208107 | 25782 | 22527 | 57645 |
| white_alone | 41412 | 172768 | 11898 | 16801 | 50232 |
| black_alone | 10475 | 19529 | 12199 | 4974 | 820 |
| american_indian_and_alaska_native | 159 | 1398 | 63 | 8 | 124 |
| asian_alone | 568 | 1668 | 85 | 37 | 198 |
| native_hawaiian_and_pacific_islander | 5 | 9 | 1 | 0 | 18 |
| other_alone | 41 | 410 | 86 | 0 | 174 |
| two_or_more_races | 1012 | 2972 | 344 | 160 | 818 |


In [15]:
working_data.to_csv(PROCESSED_DATA / 'counties.csv', encoding='utf-8', index=False)
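
One thing to watch when this file is read back for mapping: `geoid` is stored as an integer (1001 for Autauga County), while five-digit county FIPS codes are usually written with a leading zero (01001). If the county geometries key on the zero-padded string, a merge on `geoid` would need matching keys. A minimal sketch, using an in-memory CSV in place of `counties.csv`:

```python
import io

import pandas as pd

# In-memory stand-in for the first rows of counties.csv
csv_text = (
    "geoid,name,universe\n"
    '1001,"Autauga County, Alabama",55200\n'
    '1003,"Baldwin County, Alabama",208107\n'
)

# Read geoid as text so it is never treated as a number
counties = pd.read_csv(io.StringIO(csv_text), dtype={"geoid": str})

# Left-pad to five characters to restore the leading zero
counties["geoid"] = counties["geoid"].str.zfill(5)
print(counties["geoid"].tolist())
```

Whether padding is needed depends on how the geometry file encodes its county identifiers, so it is worth checking that column's type before merging.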