### Quality of Life Explorer 

##### Import Libraries

- To run this script we need the libraries below:
- Pandas: for creating DataFrames
- NumPy: for computations
- os: for changing the working directory
- censusdata: a library for accessing US Census Bureau data; we use some of its functions to preview variables *
- census: the main library for accessing the US Census Bureau API *
- pd.set_option lets us view the entire output in the Jupyter notebook instead of a truncated version



 * You need to install these packages before you can use them. census: pip install census | censusdata: pip install CensusData

In [1]:
import pandas as pd
import numpy as np
import os
import censusdata as cs
from census import Census
pd.set_option('display.max_rows',None, 'display.max_columns',None,'display.max_colwidth', None)

##### Set Directory 

- Jupyter notebooks have default directories 
- Use os.chdir to set your preferred directory

In [2]:
# path = "C:\\Users\\padu\\Desktop\\UrbanInstitute\\CensusData"
# os.chdir(path)

In [3]:
pwd

'C:\\Users\\padu\\Downloads'

##### Pass your API Key

- Go to this [website](https://api.census.gov/data/key_signup.html 'sign up for your personal API key') and sign up for your personal API key

- Once you get your API key, call the Census function and pass it your API key, e.g. Census("API Key")

In [4]:
c = Census("e0577a26a616f4dda60446eae987e3b6d0d944a3")
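The cell above passes the key as a literal string. As an alternative, here is a minimal sketch that reads the key from an environment variable, so it does not have to be stored in the notebook. The helper name load_census_key and the variable name CENSUS_API_KEY are assumptions made for illustration; they are not part of the census library.

```python
import os


def load_census_key(env_var="CENSUS_API_KEY"):
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    pick whatever name suits your setup.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Census API key")
    return key
```

With the variable set, the client could then be created with c = Census(load_census_key()).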

#### Preview ACS variable 
- Use the censustable function from the censusdata library to preview the census variables
- To preview a table, specify the dataset (e.g. ACS 5-year estimates, acs5), then the year of interest (2019), then the table number ('B21001')
- Use the printtable function to format the output nicely
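The preview steps above can be sketched as a small helper. The function name preview_table and its defaults are assumptions for illustration; censustable and printtable are the censusdata functions named above. Calling the helper needs the CensusData package and network access, so the import is placed inside the function.

```python
def preview_table(table_id, year=2019, src="acs5"):
    """Print the variables of one ACS table (hypothetical helper)."""
    # Imported here so that defining the helper does not require the package.
    import censusdata as cs

    # censustable looks up the variable IDs and labels for the table.
    table = cs.censustable(src, year, table_id)
    # printtable formats them as a readable listing.
    cs.printtable(table)
```

For example, preview_table('B21001') would list the variables of table B21001 in the 2019 ACS 5-year estimates.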

#### Downloading ACS data 
- We use the census library to download ACS variables from the census API.
- We are able to access the census API because we passed our unique API key in the step above
- To access the data with the census library, we have to specify the dataset of interest (acs5) and call the 'get' function
- The get function needs a few key parameters: NAME, ACS variable ID, geography of interest, and year of interest

    - Name: The NAME field returns the name of each geography, here the census block group
    
    - ACS Variable: The variable ID has to be specific: estimates end in E, margins of error end in M
    
    - Geography: The geography is a Python dictionary, i.e. key-value pairs. We are interested in all block groups in Mecklenburg County. To get those, we set 'for' to the geography name followed by an asterisk (since we want all block groups), set 'in' to the State ID (the NC ID is 37) and the County ID (the Mecklenburg County ID is 119), and then pass the year (e.g. 2019)
    
- The get function returns a list of dictionaries (one per block group), so we wrap it in a pandas DataFrame for further analysis
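To make the parameters above concrete, here is a minimal sketch of how the field list and the geography dictionary could be built. The helper names acs_fields and block_groups_in_county are assumptions for illustration and are not part of the census library.

```python
def acs_fields(base_ids):
    """Return NAME plus the estimate (E) and margin-of-error (M) ID for each base variable."""
    fields = ["NAME"]
    for base in base_ids:
        fields.extend([base + "E", base + "M"])
    return tuple(fields)


def block_groups_in_county(state_fips, county_fips):
    """Geography dictionary selecting every block group in one county."""
    return {"for": "block group:*",
            "in": f"state:{state_fips} county:{county_fips}"}


# Median age and median household income for Mecklenburg County, NC.
fields = acs_fields(["B01002_001", "B19013_001"])
geo = block_groups_in_county("37", "119")
print(fields)
print(geo)
```

With the client c created earlier, pd.DataFrame(c.acs5.get(fields, geo, year=2019)) would then mirror the call made in the cell below for a single year.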

In [5]:
Years = [2013,2014,2015,2016,2017,2018,2019,2020]
# Years = [2020]

for year in Years:
    print(year)
    

    Data = pd.DataFrame(c.acs5.get(('NAME',
                                    ######################## MEDIAN AGE ##################################
                                    'B01002_001E','B01002_001M',
                                    ################## HISPANIC OR LATINO ORIGIN BY RACE ################
                                    'B03002_006E','B03002_006M','B03002_004E','B03002_004M','B03002_003E',
                                    'B03002_003M','B03002_012E','B03002_012M','B03002_001E','B03002_001M',
                                    ##################EDUCATIONAL ATTAINMENT ##########################
                                    'B15003_023E','B15003_023M','B15003_024E','B15003_024M','B15003_025E',
                                    'B15003_025M','B15003_022E','B15003_022M',
                                    'B15003_001E','B15003_001M',
                                    ####################MEDIAN HOUSEHOLD INCOME ##########################
                                    'B19013_001E','B19013_001M',
                                    ############################GROSS RENT ###############################
                                    'B25063_001E','B25063_001M',
                                    ############################TENURE####################################
                                   'B25003_002E','B25003_002M','B25003_001E','B25003_001M'
                                    ########################## POVERTY####################################
                                   # 'B17020_001E', 'B17020_001M','B17020_002E','B17020_002M'
                                   ),{'for':'block group:*',
                                       'in':'state:37 county:119'}, year =year))
    
    Data.rename(columns={
                     ###################################### MEDIAN AGE ################################################,
                    'B01002_001E': 'Median_AgeE','B01002_001M':'Median_AgeM',
                     ############################## HISPANIC OR LATINO ORIGIN BY RACE ###############################,
                     'B03002_006E':'AsianAloneE','B03002_006M':'AsianAloneM','B03002_004E':'BlackAloneE',
                     'B03002_004M':'BlackAloneM','B03002_003E':'WhiteAloneE','B03002_003M':'WhiteAloneM',
                     'B03002_012E':'HispanicAloneE', 'B03002_012M':'HispanicAloneM','B03002_001E':'TotalRaceE',
                     'B03002_001M':'TotalRaceM',
                     ##################################EDUCATIONAL ATTAINMENT ############################,
                     'B15003_022E':'BachelorsDegreeE','B15003_022M':'BachelorsDegreeM','B15003_023E':'MastersDegreeE',
                     'B15003_023M':'MastersDegreeM','B15003_024E':'ProfessionalSchoolDegreeE',
                     'B15003_024M':'ProfessionalSchoolDegreeM','B15003_025E':'DoctorateDegreeE',
                     'B15003_025M':'DoctorateDegreeM','B15003_001E':'TotalEducationAttainmentE',
                     'B15003_001M':'TotalEducationAttainmentM',
                     ########################################MEDIAN HOUSEHOLDINCOME###################################,
                     'B19013_001E':'MedianHouseholdIncomeE','B19013_001M':'MedianHouseholdIncomeM',
                     ########################GROSS RENT ####################################,
                     'B25063_001E':'TotalGrossRentE','B25063_001M':'TotalGrossRentM',
                     ###################################Owneroccupied####################################
                     'B25003_002E':'OwnerOccupiedE', 'B25003_002M':'OwnerOccupiedM',
                     'B25003_001E':'TotalOwnerOccupiedE', 'B25003_001M':'TotalOwnerOccupiedM'
                    ################################### POVERTY ###############################
#                      'B17020_001E': 'TotalBelowPovertyE','B17020_002E': 'BelowPovertyE',
#                      'B17020_001M': 'TotalBelowPovertyM','B17020_002M': 'BelowPovertyM'
            
                    
                    },
           inplace = True)
    
    col = []

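    # Collect the estimate and margin-of-error columns (everything except the identifiers).
    # The list is only used by the commented-out integer conversion below.
    for column in Data.columns:
        if column not in ('NAME','GEO_ID','state','county','tract','block group'):
            col.append(column)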

        
#     for row in col:
#         Data[row]= Data[row].apply(np.int64)
    
    # The Census API uses large negative sentinel values where an estimate is unavailable;
    # treat them as missing.
    Data.replace([-666666666.0, -222222222.0, -333333333.0], np.nan, inplace=True)
    
    Data['GEOID'] = (Data['state'] + Data['county'] + Data['tract'] + Data['block group']).astype('int64')
    
    NPA = pd.read_csv('https://raw.githubusercontent.com/MLProject20/Data/main/NPA_Census_Crosswalk.csv')

#     NPA = pd.read_csv('https://raw.githubusercontent.com/MLProject20/Data/main/NPA_Census_Crosswalk_2020.csv')
    
    NPAData = pd.merge(NPA,Data, how = "left", left_on = ['GEOID2'], right_on= ['GEOID'])
    
    Data = NPAData.groupby('NPA').sum()
    
    Data['PercentAsian'] = (Data['AsianAloneE']/Data['TotalRaceE'])
    Data['PercentBlack'] = (Data['BlackAloneE']/Data['TotalRaceE'])
    Data['PercentWhite'] = (Data['WhiteAloneE']/Data['TotalRaceE'])
    Data['PercentHispanic'] = (Data['HispanicAloneE']/Data['TotalRaceE'])
    
    BachelorsE = ['BachelorsDegreeE','MastersDegreeE','ProfessionalSchoolDegreeE','DoctorateDegreeE']


    Data['AdultsWithAtLeastBachelorsA'] = Data[BachelorsE].sum(axis=1)
    
    Data['PercentAdultsWithAtLeastBachelors'] = (Data['AdultsWithAtLeastBachelorsA']/Data['TotalEducationAttainmentE'])
    Data['PercentOwnerOccupied'] = Data['OwnerOccupiedE']/Data['TotalOwnerOccupiedE']
    
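    # Add a year-suffixed copy of each percentage column, scaled from a fraction to 0-100.
    for column in list(Data.columns):
        if column.startswith('Per'):
            Data[column + str(year)] = Data[column] * 100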
            
    Percentages = ['PercentAsian'+str(year), 'PercentBlack'+str(year),'PercentWhite'+str(year),
                   'PercentHispanic'+str(year), 'PercentAdultsWithAtLeastBachelors'+str(year),
             'PercentOwnerOccupied'+str(year)]
            
    Data[Percentages].to_csv('QualityOfLifeExplorer'+ str(year)+'.csv')

    # Data

2013
2014
2015
2016
2017
2018
2019
2020


#### Veterans