# Part 1 – Introduction

### Definitions of 'Urban' and 'Rural'

>The Census Bureau’s urban areas represent densely developed territory, and encompass residential, commercial, and other non-residential urban land uses. The Census Bureau delineates urban areas after each decennial census by applying specified criteria to decennial census and other data.

According to the US Census Bureau, under the [2010 Census](https://www.census.gov/geo/reference/ua/urban-rural-2010.html) criteria an urban area must encompass at least 2,500 people, at least 1,500 of whom reside outside institutional group quarters. The Census Bureau defines two types of urban areas, categorized by population: **Urbanized Areas** (UAs) of 50,000 or more people, and **Urban Clusters** (UCs) of at least 2,500 and fewer than 50,000 people. After the 2010 Census, [around 500 UAs and 3,000 UCs](https://www.census.gov/geo/reference/ua/uafacts.html) were defined in the US, and around 80 percent of the US population lived within urban areas.  

>'Rural' encompasses all population, housing, and territory not included within an urban area.  

Interestingly, "rural" lacks its own discrete definition. The Census Bureau defines "rural" only by exclusion: it is whatever lies outside an urban area.
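
The population thresholds above can be written as a small lookup. This is only a sketch: the function name `classify_census_area` is hypothetical, and it assumes the area has already been delineated, so it illustrates the 2010 categories rather than reproducing the Census Bureau's delineation method.

```python
def classify_census_area(population):
    """Return the 2010 Census category for an already-delineated area.

    Hypothetical helper: it applies only the population thresholds
    quoted in the text and skips the delineation criteria entirely.
    """
    if population >= 50_000:
        return "Urbanized Area"
    if population >= 2_500:
        return "Urban Cluster"
    # Anything that does not qualify as urban is rural by exclusion.
    return "Rural"


for pop in (120_000, 8_000, 900):
    print(pop, classify_census_area(pop))
```

Note that "Rural" is simply the fall-through case, which mirrors how the Census Bureau defines it.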

Another way of measuring how 'urban' an area is uses [**Metropolitan** and **Micropolitan** areas](https://www.census.gov/library/publications/2012/dec/c2010sr-01.html). For many inquiries these are better suited than states, which are overly large, or counties, which are small and numerous. These alternate areas are "socially and economically integrated groupings of one or more counties" and "provide appropriately detailed geographic analysis as well as good mapping units for a national overview."

### Other Definitions

In England and Wales, the units are called **'Output Areas'**. Here, 'urban' does not have a single definition because the criteria vary. The Office for National Statistics classifies an area as 'urban' on the basis of having [greater than 10,000 people](https://www.ons.gov.uk/methodology/geography/geographicalproducts/ruralurbanclassifications/2011ruralurbanclassification). Around 80% of England's population lived in urban areas in 2011. 
As in the US, 'rural' in the UK is classified as simply *not being urban*. Rural areas are, however, divided into further categories (town and fringe, village, hamlet and isolated dwellings), and each type is additionally classed as ["sparse" or "less sparse"](https://webarchive.nationalarchives.gov.uk/20160107121407/http://www.ons.gov.uk/ons/guide-method/geography/products/area-classifications/rural-urban-definition-and-la/rural-urban-definition--england-and-wales-/index.html).

A more theoretical definition of urban areas considers not only a high population count but also a high population *density*. This carries the connotation of a large built environment: skyscrapers, mass transportation, extensive commercial land use, and bustling city streets. Just outside this "urban core", the greater urbanized area is understood to be surrounded by areas of varying population, mixed land uses (including some residential), and satellite cities, all of which are connected to the core.  

### Data Source

The data used for this lab is provided by the State of Washington Office of Financial Management's [**Small Area Estimates Program** (SAEP)](https://www.ofm.wa.gov/washington-data-research/population-demographics/population-estimates/small-area-estimates-program).

>The SAEP estimates include both intercensal and postcensal estimates. The data for years 2000-2009 are considered intercensal estimates whereas the data for years 2011-present are considered postcensal estimates. The two estimate series differ in their development and revision cycles but both series are based on 2010 census blocks.  

Three variables are estimated: group quarter populations, housing units, and household populations. The figures are kept in un-rounded form as a reminder that they are estimates; they are neither precise statistics nor definitive counts. The process is repeated yearly, and each year's estimates build on those of the previous year.  

Potential issues with this dataset stem from the error inherent in estimation, specifically in the calculations and in the original data collection.

### Findings

We define a block group as urban if it has at least 1,000 people per square mile. Based on the 2018 population estimates, we calculated that 73.06% of the population in Washington is urban and 2.28% of the land area in Washington is urbanized. The large majority of block groups (4,716) did not change category between 2008 and 2018; only 65 block groups 'urbanized', while 2 de-urbanized in the same time period. 
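
To see how a large share of the population can occupy a small share of the land, consider a worked example with three made-up block groups. The numbers below are purely illustrative and are not taken from the SAEP data.

```python
# Hypothetical block groups: (population, land area in square miles).
block_groups = [
    (3000, 0.5),   # dense: 6,000 people per square mile
    (1500, 1.0),   # dense: 1,500 people per square mile
    (500, 48.5),   # sparse: roughly 10 people per square mile
]

URBAN_THRESHOLD = 1000  # people per square mile, as used in this lab

# Keep only the block groups that meet the density threshold.
urban = [(pop, area) for pop, area in block_groups
         if area > 0 and pop / area >= URBAN_THRESHOLD]

urban_pop_share = 100 * sum(p for p, _ in urban) / sum(p for p, _ in block_groups)
urban_land_share = 100 * sum(a for _, a in urban) / sum(a for _, a in block_groups)

print(f"{urban_pop_share:.1f}% of people, {urban_land_share:.1f}% of land")
# prints "90.0% of people, 3.0% of land"
```

One sparse but very large block group dominates the land total while contributing little population, which is the same pattern behind the statewide figures.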


# Part 2 – Basic Processing with Python


In [38]:
# Imports
import geopandas as gpd
import pandas as pd
import dbfread as dbf

# Reading in shapefile data
saep_bg10 = gpd.read_file("LAB3-Emmi/saep_bg10/saep_bg10.shp")
saep_bg10.head()

# Reading in DBF data and converting it into a pandas DataFrame
WashingtonFIPS = dbf.DBF('LAB3-Emmi/WashingtonFIPS.dbf')
fips_df = pd.DataFrame(iter(WashingtonFIPS))
fips_df.head()

# Creating shapefiles and gathering population sums for each county

# Creating empty dataframe to store population sums 
counties = pd.DataFrame(columns=['county_name', 'county_population'])

# Iterate through the fips_df to get individual county name and number
for index, row in fips_df.iterrows():
    fips = row.FIPSCounty
    name = row.CountyName
    
    # Using the fips from fips_df
    # Select using loc to select the rows of saep_bg10 where the 
    # COUNTYFP10 == fips of current county
    # i.e. selects the rows from saep_bg10 for each county
    polygons = saep_bg10.loc[saep_bg10['COUNTYFP10'] == fips]
    
    # Adding the county name and the sum of the 2017 populations of 
    # the block groups selected above
    counties.loc[index] = [name, polygons['POP2017'].sum()]
    
    # Builds the shapefile name for each county; uncomment the to_file 
    # call below to write that county's polygons to the shapefile
    filename = "counties/" + name + ".shp"
    # polygons.to_file(filename)
    
# Sorting the counties by the population
pop_list = counties.sort_values(by=['county_population'], ascending=False)
pop_list.head(10)

| | county_name | county_population |
|---|---|---|
| 16 | King | 2153700.0 |
| 26 | Pierce | 859400.0 |
| 30 | Snohomish | 789400.0 |
| 31 | Spokane | 499800.0 |
| 5 | Clark | 471000.0 |
| 33 | Thurston | 276900.0 |
| 17 | Kitsap | 264300.0 |
| 38 | Yakima | 253000.0 |
| 36 | Whatcom | 216300.0 |
| 2 | Benton | 193500.0 |


The table above ranks the ten most populous counties in Washington by total 2017 population, in descending order, according to this dataset.
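
The same ranking can be produced without an explicit loop by grouping on the county code. The sketch below uses small made-up tables standing in for `saep_bg10` and `fips_df`; the populations are invented, and only the column names are taken from the cell above.

```python
import pandas as pd

# Toy stand-ins for saep_bg10 and fips_df (populations are made up).
block_groups = pd.DataFrame({
    "COUNTYFP10": ["033", "033", "053", "063"],
    "POP2017": [1200.0, 800.0, 950.0, 400.0],
})
fips = pd.DataFrame({
    "FIPSCounty": ["033", "053", "063"],
    "CountyName": ["King", "Pierce", "Spokane"],
})

# Sum the block-group populations per county code in one pass.
county_pop = (
    block_groups.groupby("COUNTYFP10", as_index=False)["POP2017"].sum()
    .rename(columns={"POP2017": "county_population"})
)

# Attach the county names, then rank from largest to smallest.
pop_list = (
    fips.merge(county_pop, left_on="FIPSCounty", right_on="COUNTYFP10", how="left")
    .rename(columns={"CountyName": "county_name"})
    [["county_name", "county_population"]]
    .sort_values("county_population", ascending=False)
    .reset_index(drop=True)
)
print(pop_list)
```

One difference from the loop: with `how="left"`, a county that has no matching block groups would show a missing value rather than a sum of 0.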

# Part 3 – Urban vs. Rural

In [39]:
# Setup
import geopandas as gpd
import pandas as pd
import dbfread as dbf

# Reading the shapefile as a GeoDataFrame
wa_blocks = gpd.read_file("saep_bg10/saep_bg10.shp")

# Step 1: Determining urban/rural classification based on population density

# Creating new columns with dummy values
wa_blocks['density'] = None
wa_blocks['urb_or_rur'] = None

for index, row in wa_blocks.iterrows():
    if row['ALANDMI'] == 0: # Some block groups have a land area of 0
        wa_blocks.loc[index, 'density'] = 0 # Avoids dividing by zero
    else:
        wa_blocks.loc[index, 'density'] = (row['POP2018'] / row['ALANDMI']) # calculate the density field
            
for index, row in wa_blocks.iterrows():
    if row['density'] >= 1000: # Population density is greater than or equal to 1000 people per square mile
        wa_blocks.loc[index, 'urb_or_rur'] = 'Urban'
    else: # Density is less than 1000 people per square mile
        wa_blocks.loc[index, 'urb_or_rur'] = 'Rural'
        
# Delete this later!! - just for checking that the calculated columns look okay
# .loc does label-based selection
# first part is the row label(s), second part is the column(s)

#print(wa_blocks.loc[10:19, ['POP2018', 'ALANDMI', 'density', 'urb_or_rur']])

# Steps 2 and 3: Calculating the percentage of the population/area that is urbanized

# New variables for the urban totals
totalUrbPop = 0
totalUrbArea = 0

# Variables for total figures
totalPop = wa_blocks.POP2018.sum()
totalAr = wa_blocks.ALANDMI.sum()

for index, row in wa_blocks.iterrows():
    if row['urb_or_rur'] == 'Urban':
        totalUrbPop = totalUrbPop + wa_blocks.loc[index, 'POP2018']
        totalUrbArea = totalUrbArea + wa_blocks.loc[index, 'ALANDMI']

print("Estimation for the percentage of the population in Washington that is urbanized is " + str(round(totalUrbPop / totalPop * 100, 2)) + "% of people.")
print("Estimation for the percentage of urbanized land area in Washington is " + str(round(totalUrbArea / totalAr * 100, 2)) + "%.")

Estimation for the percentage of the population in Washington that is urbanized is 73.06% of people.
Estimation for the percentage of urbanized land area in Washington is 2.28%.
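
The row-by-row loops above can also be expressed with vectorized pandas operations, which tend to be faster on a few thousand block groups. The following is a sketch on a made-up table standing in for `wa_blocks`; the column names match the cell above, the values do not come from the dataset.

```python
import pandas as pd

# Toy stand-in for wa_blocks (values are made up).
blocks = pd.DataFrame({
    "POP2018": [4200.0, 900.0, 60.0, 0.0],
    "ALANDMI": [0.7, 0.9, 30.0, 0.0],
})

# People per square mile; rows with zero land area get a density of 0.
has_land = blocks["ALANDMI"] > 0
blocks["density"] = (blocks["POP2018"] / blocks["ALANDMI"]).where(has_land, 0.0)

# Same rule as the loop: at least 1,000 people per square mile is urban.
is_urban = blocks["density"] >= 1000
blocks["urb_or_rur"] = is_urban.map({True: "Urban", False: "Rural"})

# Urban shares of total population and total land area, in percent.
pct_pop = round(blocks.loc[is_urban, "POP2018"].sum() / blocks["POP2018"].sum() * 100, 2)
pct_area = round(blocks.loc[is_urban, "ALANDMI"].sum() / blocks["ALANDMI"].sum() * 100, 2)
print(blocks)
print(pct_pop, pct_area)
```

Because `density` ends up as a float column rather than an object column, later comparisons such as `>= 1000` also work on the whole column at once.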


In [40]:
# Delete later - just for looking at calculated columns
#wa_blocks.groupby('urb_or_rur').sum().POP2018
#wa_blocks.loc[wa_blocks['urb_or_rur'] == 'Rural', 'POP2018'].sum()
#print(wa_blocks.loc[[1250], ['POP2018', 'ALANDMI', 'density', 'urb_or_rur']])

# Step 4 setup - looking at previous decade
# Need to calculate 2008 density and categories first

# Creating new columns with dummy values
wa_blocks['2008density'] = None
wa_blocks['2008urb_or_rur'] = None
wa_blocks['block_change'] = None

for index, row in wa_blocks.iterrows():
    if row['ALANDMI'] == 0: # Land area field is 0
        wa_blocks.loc[index, '2008density'] = 0 # Avoids dividing by zero problem
        wa_blocks.loc[index, '2008urb_or_rur'] = 'Rural' # Automatically classified as rural
    else:
        wa_blocks.loc[index, '2008density'] = (row['POP2008'] / row['ALANDMI']) # Calculates density from population and land area
        if wa_blocks.loc[index, '2008density'] >= 1000: # Density of at least 1000 per square mile
            wa_blocks.loc[index, '2008urb_or_rur'] = 'Urban'
        else: # Density less than 1000 per square mile
            wa_blocks.loc[index, '2008urb_or_rur'] = 'Rural'
            
# Step 4 getting the categorical change

for index, row in wa_blocks.iterrows():
    c2018 = row['urb_or_rur']
    c2008 = row['2008urb_or_rur']
    
    if c2008 == c2018: # Same category for both years
        wa_blocks.loc[index, 'block_change'] = 'No Change in Category'
    elif c2008 == 'Urban' and c2018 == 'Rural': # Changed from Urban to Rural
        wa_blocks.loc[index, 'block_change'] = 'De-Urbanized'
    elif c2008 == 'Rural' and c2018 == 'Urban': # Changed from Rural to Urban
        wa_blocks.loc[index, 'block_change'] = 'Urbanized'
        
# Step 5

# Variables for totals
urb_grps = 0
deurb_grps = 0
nochange = 0

for index, row in wa_blocks.iterrows():
    if row['block_change'] == 'De-Urbanized':
        deurb_grps = deurb_grps + 1
    elif row['block_change'] == 'Urbanized':
        urb_grps = urb_grps + 1
    else: # no change
        nochange = nochange + 1
        
print (str(urb_grps) + " block groups 'urbanized' between 2008 and 2018, while " + str(deurb_grps) + " groups de-urbanized in the same time period. \n" + str(nochange) + " block groups had no change in category.")

65 block groups 'urbanized' between 2008 and 2018, while 2 groups de-urbanized in the same time period. 
4716 block groups had no change in category.


In [41]:
# Delete later - shows the different categories and their respective counts

wa_blocks['block_change'].value_counts()

No Change in Category    4716
Urbanized                  65
De-Urbanized                2
Name: block_change, dtype: int64
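
As a cross-check on these counts, the change categories can be assigned with boolean masks and the transitions tabulated with `pd.crosstab`. This is a sketch on five made-up rows; the column names and labels follow the cells above.

```python
import pandas as pd

# Toy stand-in for the two classification columns (values are made up).
blocks = pd.DataFrame({
    "2008urb_or_rur": ["Rural", "Rural", "Urban", "Urban", "Rural"],
    "urb_or_rur":     ["Rural", "Urban", "Urban", "Rural", "Urban"],
})

was_urban = blocks["2008urb_or_rur"] == "Urban"
is_urban = blocks["urb_or_rur"] == "Urban"

# Default to "no change", then overwrite the rows that switched category.
blocks["block_change"] = "No Change in Category"
blocks.loc[~was_urban & is_urban, "block_change"] = "Urbanized"
blocks.loc[was_urban & ~is_urban, "block_change"] = "De-Urbanized"

counts = blocks["block_change"].value_counts()

# Rows are the 2008 category, columns are the 2018 category.
transitions = pd.crosstab(blocks["2008urb_or_rur"], blocks["urb_or_rur"])
print(counts)
print(transitions)
```

The off-diagonal cells of the crosstab should match the 'Urbanized' and 'De-Urbanized' counts, which makes a mislabelled row easy to spot.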

## Visualizing Urbanization In Washington State

In [42]:
# Imported maps: see "lab3_working_p3" under Dani's branch for the code that produced the following 4 static maps

Here the values in the legend are people per square mile.

![2018 population density](2018_popdensity.png "2018 population density")
![2008 population density](2008_popdensity.png "2008 population density")

Here, a legend value of 1 represents urban block groups and 0 represents non-urban block groups.

![2018 urban status](2018_urbanstatus.png "2018 urban status")
![2008 urban status](2008_urbanstatus.png "2008 urban status")

In [51]:
# folium interactive map
import folium

# Recoding the nominal change categories from step 4 as numerical values

for index, row in wa_blocks.iterrows():
    c2018 = row['urb_or_rur']
    c2008 = row['2008urb_or_rur']
    
    if c2008 == c2018: # Same category for both years
        wa_blocks.loc[index, 'block_change_1'] = 0
    elif c2008 == 'Urban' and c2018 == 'Rural': # Changed from Urban to Rural
        wa_blocks.loc[index, 'block_change_1'] = -1
    elif c2008 == 'Rural' and c2018 == 'Urban': # Changed from Rural to Urban
        wa_blocks.loc[index, 'block_change_1'] = 1

geo_data = wa_blocks[['geometry', 'GEOID10']]
geo_data.head()

m = folium.Map([47, -120], zoom_start=6, tiles='cartodbpositron')

# m.choropleth(
#     geo_data=test,
#     fill_color='YlGn',
#     data=wa_blocks,
#     key_on='feature.properties.GEOID10'
# )
# folium.GeoJson(wa_blocks).add_to(m)


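# Map.choropleth is deprecated in newer folium releases;
# folium.Choropleth accepts the same arguments used here
folium.Choropleth(
    geo_data=geo_data,
    data=wa_blocks,
    columns=['GEOID10', 'block_change_1'],
    key_on='feature.properties.GEOID10',
    fill_color='RdBu',
    line_weight=0.0
).add_to(m)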


In [None]:
m.save(outfile='DecadeofChange.html')

[This map](DecadeofChange.html) displays a decade of change in urbanization in Washington state by comparing each block group's urban status in 2008 with its status in 2018. Light blue (0) indicates no change from 2008 to 2018, red (-1) indicates de-urbanization, and dark blue (1) indicates a change from non-urban to urban.
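
The numeric codes shown on the map (-1, 0, 1) mirror the text labels in the `block_change` column, so they can also be derived with a dictionary lookup instead of a second loop. A minimal sketch, assuming the labels from Step 4; the `GEOID10` values below are made up for illustration.

```python
import pandas as pd

# Lookup from the Step 4 labels to the codes plotted on the map.
change_codes = {
    "De-Urbanized": -1,
    "No Change in Category": 0,
    "Urbanized": 1,
}

# Toy stand-in for wa_blocks (GEOID10 values are hypothetical).
blocks = pd.DataFrame({
    "GEOID10": ["530330001001", "530330001002", "530330001003"],
    "block_change": ["No Change in Category", "Urbanized", "De-Urbanized"],
})

# Series.map replaces each label with its code; unknown labels become NaN.
blocks["block_change_1"] = blocks["block_change"].map(change_codes)
print(blocks)
```

Keeping the mapping in one dictionary also documents the legend: the same three pairs describe both the column and the colors on the map.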