<a id="section3"></a>
## 1.3 Mapping the ACS Data

To map the ACS data, we first need to make it geospatial. Since the data are aggregated to census tracts, we will join the ACS data with the census tract geographic data for our county.

### About Census Geographic Data:

There are two main types of census geographic data products: 
- TIGER/Line Files 
  - contain detailed geometry, big files
  - not pretty for mapping
  - good for spatial analysis
  - have a `tl` (as in `T`IGER/`L`ine) in the filename when downloaded from Census web or FTP site.
    - e.g., tl_2018_06_tract.zip
    
  
- [Cartographic Boundary files](https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html): 
  - smaller file sizes
  - made specifically for mapping
  - have a `cb` in the file name when downloaded from Census web or FTP site
      - e.g., cb_2018_06_tract_500k.zip
  - have a mapping resolution at the end of the file name
    - e.g., `_500k` files look best around 1:500K map scale
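
The naming conventions above can be sketched in a few lines of Python. This is a minimal illustration that assumes the pattern seen in the two example file names; the regular expression and the helper `describe_census_file` are hypothetical, and other Census products may be named differently.

```python
import re

# Pattern inferred from the two example names above (an assumption,
# not an official specification of Census file names).
CENSUS_FILE_RE = re.compile(
    r"^(?P<product>tl|cb)_(?P<year>\d{4})_(?P<state_fips>\d{2})_(?P<layer>[a-z]+)"
    r"(?:_(?P<resolution>\d+[km]))?\.zip$"
)

PRODUCT_NAMES = {"tl": "TIGER/Line", "cb": "Cartographic Boundary"}


def describe_census_file(filename):
    """Split a Census boundary file name into its labeled parts."""
    match = CENSUS_FILE_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized file name: {filename}")
    parts = match.groupdict()
    parts["product"] = PRODUCT_NAMES[parts["product"]]
    return parts


for name in ["tl_2018_06_tract.zip", "cb_2018_06_tract_500k.zip"]:
    print(describe_census_file(name))
```

Note that the TIGER/Line name has no resolution suffix, so `resolution` comes back as `None` for that file.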
  
### Several ways to obtain Census Geographic data

1. Fetch via API - although not all years may be available.
2. Download from Census website or FTP site
3. Download from another website like [NHGIS.org](https://nhgis.org)

### What files should you download?

Census tract geographic data files are updated frequently to improve the quality of the spatial data, but the most significant updates happen for all census geographies just before the decennial census.

When mapping or spatially analyzing ACS 5-year data, download the geographic files whose year matches the final year of the ACS 5-year period you are analyzing.

For example, we can use the following URLs to download 2013 and 2018 census tracts for California.

- Cartographic Boundary File for CA Census Tracts, 2013: [https://www2.census.gov/geo/tiger/GENZ2013/cb_2013_06_tract_500k.zip](https://www2.census.gov/geo/tiger/GENZ2013/cb_2013_06_tract_500k.zip)
  - suitable for mapping ACS 5-year 2009 - 2013 data


- Cartographic Boundary File for CA Census Tracts, 2018: [https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_06_tract_500k.zip](https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_06_tract_500k.zip)
  - suitable for mapping ACS 5-year 2014 - 2018 data
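
The year-matching rule can be expressed as a small sketch. The helpers `acs5_span` and `tract_boundary_filename` are hypothetical names, and the file-name pattern is only assumed from the two URLs above; it does not guarantee that a file exists for every year.

```python
def acs5_span(end_year):
    """Return the (start_year, end_year) covered by an ACS 5-year release."""
    return end_year - 4, end_year


def tract_boundary_filename(acs5_end_year, state_fips="06"):
    """Build the 1:500k cartographic boundary tract file name for that year.

    The pattern is assumed from the 2013 and 2018 California examples.
    """
    return f"cb_{acs5_end_year}_{state_fips}_tract_500k.zip"


for end_year in (2013, 2018):
    start, end = acs5_span(end_year)
    print(f"ACS 5-year {start}-{end} -> {tract_boundary_filename(end_year)}")
```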


#### ESRI Shapefiles

These census tract files are made available in the [ESRI Shapefile](https://en.wikipedia.org/wiki/Shapefile) format, along with other formats.

An ESRI Shapefile is actually a collection of 3 to 9+ files that together are called a shapefile. Although this is an old file format with numerous limitations, it remains the most commonly used file format for vector spatial data. 
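
To see the "collection of files" idea in practice, the sketch below builds a small in-memory zip with empty placeholder files and checks that the three mandatory shapefile components (`.shp`, `.shx`, `.dbf`) are present. The member names are hypothetical; with a real download you would open the zip file from disk instead.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Build a tiny in-memory zip that mimics a zipped shapefile.
# The members are empty placeholders with hypothetical names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for ext in (".shp", ".shx", ".dbf", ".prj", ".cpg"):
        zf.writestr(f"example_tracts{ext}", b"")

# The three components every shapefile must have
REQUIRED = {".shp", ".shx", ".dbf"}

buffer.seek(0)
with zipfile.ZipFile(buffer) as zf:
    extensions = {PurePosixPath(name).suffix for name in zf.namelist()}

missing = REQUIRED - extensions
print("Components found:", sorted(extensions))
print("Missing required components:", sorted(missing))
```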


### Census tract data

We are ready to read in the census tract data for CA using the Geopandas `read_file` function.

- Specifically, we will read in the `2018 cartographic boundary files` for CA census tracts. 

In [None]:
# Import CA census tracts data
tracts_gdf = gpd.read_file("zip://../notebook_data/census/Tracts/cb_2018_06_tract_500k.zip")

And take a look...

In [None]:
tracts_gdf.head(2)

### The GeoPandas GeoDataFrame

A [GeoPandas GeoDataFrame](https://geopandas.org/data_structures.html#geodataframe), or `gdf` for short, is just like a pandas dataframe (`df`) but with an extra geometry column and methods & attributes that work on that column. I repeat because it's important:

> `A GeoPandas GeoDataFrame is a pandas DataFrame with a geometry column and methods & attributes that work on that column.`

> This means all the methods and attributes of a pandas DataFrame also work on a Geopandas GeoDataFrame!!


How cool is that to see the geometry! Desktop GIS software like `QGIS` and `ArcGIS` hide the geometry from the user. Not so with GeoPandas. 

### Geopandas Geometries
There are three main types of geometries that can be associated with your geodataframe: points, lines and polygons:

<img src ="https://datacarpentry.org/organization-geospatial/fig/dc-spatial-vector/pnt_line_poly.png" width="450"></img>

In the geodataframe these geometries are displayed in a format known as [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry). For example:

> - POINT (30 10)
> - LINESTRING (30 10, 10 30, 40 40)
> - POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
>
> *where coordinates are separated by a space and coordinate pairs by a comma*
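
The separator rule above can be illustrated with a small parser. This is a minimal sketch for the POINT and LINESTRING examples only; the helper `parse_wkt_coords` is hypothetical and does not handle the nested parentheses used by polygons. In practice a library would do this parsing for you.

```python
def parse_wkt_coords(wkt):
    """Parse a simple POINT or LINESTRING WKT string.

    Returns (geometry_type, [(x, y), ...]).
    """
    geom_type, _, rest = wkt.partition("(")
    # Coordinate pairs are separated by commas...
    pairs = rest.rstrip(") ").split(",")
    # ...and the coordinates within a pair by whitespace.
    coords = [tuple(float(value) for value in pair.split()) for pair in pairs]
    return geom_type.strip(), coords


print(parse_wkt_coords("POINT (30 10)"))
print(parse_wkt_coords("LINESTRING (30 10, 10 30, 40 40)"))
```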

Your geodataframe may also include the variants **multipoints, multilines and multipolygons** if the row-level feature of interest is made up of multiple parts. For example, a geodataframe of states, where one row represents one state, would have POLYGON geometry for Utah but MULTIPOLYGON for Rhode Island, which includes many small islands.

> It's ok to mix and match geometries of the same family, e.g., POLYGON and MULTIPOLYGON, in the same geodataframe.

You can check the types of geometries in a geodataframe or a subset of the geodataframe by combining the `type` attribute with the `unique` method.


In [None]:
tracts_gdf['geometry'].type.unique()

### Plotting a Geodataframe
Let's now go ahead and use the GeoPandas gdf `plot` method to map all of our tracts.

In [None]:
# Plot the gdf
tracts_gdf.plot()

> ### Wow! How cool is that?

### Select Census Tracts for Alameda County

We want to subset the tracts to get the data for Alameda county. In order to do this, let's first check what variables we have and what the data looks like.

In [None]:
tracts_gdf.head(3)

In [None]:
tracts_gdf.columns

Here's what each variable means:
- `STATEFP`: State FIPS code 
- `COUNTYFP`: County FIPS code
- `TRACTCE`: Census tract code
- `AFFGEOID`: Summary level code + geovariant code + '00US' + GEOID
- `GEOID`:  Census tract identifier; a concatenation of Current state FIPS code, county FIPS code, and census tract code
- `NAME`:  Census tract name
- `LSAD`:  Legal/statistical description with the census tract name
- `ALAND`: Area that is land, in square meters
- `AWATER`:  Area that is water, in square meters
- `geometry`: Geometry of tract
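
The way these identifiers nest can be sketched with plain string operations. The tract code `400100` is an illustrative value, and the helpers `build_tract_ids` and `split_tract_geoid` are hypothetical. The prefix `1400000US` combines the tract summary level (`140`), the geovariant code (`00`), and `00US`.

```python
def build_tract_ids(statefp, countyfp, tractce):
    """Concatenate FIPS parts into a tract GEOID and AFFGEOID."""
    geoid = f"{statefp}{countyfp}{tractce}"
    # 140 = census tract summary level, 00 = geovariant, then '00US'
    affgeoid = f"1400000US{geoid}"
    return geoid, affgeoid


def split_tract_geoid(geoid):
    """Split an 11-character tract GEOID back into its parts."""
    return {
        "STATEFP": geoid[:2],     # 2-digit state code
        "COUNTYFP": geoid[2:5],   # 3-digit county code
        "TRACTCE": geoid[5:11],   # 6-digit tract code
    }


geoid, affgeoid = build_tract_ids("06", "001", "400100")
print(geoid, affgeoid)
print(split_tract_geoid(geoid))
```

Because the codes have leading zeros, they need to stay as strings; converting them to integers would break the concatenation.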

Let's take a closer look at the county identifiers.

In [None]:
# What are the county codes?
tracts_gdf['COUNTYFP'].unique()

Since the county code for Alameda County is `001`, let's subset our data using that knowledge so we can focus on our area of interest.

In [None]:
tracts_gdf_ac = tracts_gdf[tracts_gdf['COUNTYFP']=='001']
tracts_gdf_ac.plot()
plt.show()

Nice! Looks like we have what we were looking for.

*FYI*: You can also make dynamic plots of one or more counties without saving to a new gdf.

In [None]:
# Dynamic plot of the census tracts for the 10 County Bay Area
# Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Santa Cruz, Solano, Sonoma
tracts_gdf[tracts_gdf['COUNTYFP'].isin(['001','013','041','055','075','081', '085','087','095','097'])].plot()

<img src ="https://i.ytimg.com/vi/C9J1p6kO9VA/maxresdefault.jpg" height="200" width="800">


#### Exercise

Now do this for the SF tracts data:
1. Subset to SF county, assign to `tracts_gdf_sf`
2. Plot the tracts 
3. <img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="20" align=left >  Answer this question: What's weird about our plot?

In [None]:
# Your code here

*Click here for solution*

<!--- 
    # SOLUTION
    # 1. Subset to SF county, assign to `tracts_gdf_sf`
    tracts_gdf_sf = tracts_gdf[tracts_gdf['COUNTYFP']=='075']
    # 2. Plot
    tracts_gdf_sf.plot()
    plt.show()

    # 3. Answer this question: What's weird about our plot?
--->

<img src ="https://s.hdnux.com/photos/61/50/04/13009196/3/920x920.jpg" height="400" width="400">

Our SF tract map seems off because it includes the [Farallon Islands](https://en.wikipedia.org/wiki/Farallon_Islands). These are not inhabited (so population=0)!

In [None]:
# 1. Subset to SF county, assign to `tracts_gdf_sf`
tracts_gdf_sf = tracts_gdf[tracts_gdf['COUNTYFP']=='075']

# 2. Plot
tracts_gdf_sf.plot()


Take a look at the gdf with `head` to see if we have a column to use to filter out the Farallon Islands.

In [None]:
tracts_gdf_sf.head(2)

<a id="section4"></a>
## 1.4 Spatial Subsetting

We could filter the Farallon Islands out if we knew their census tract geographic identifier, or `GEOID`.

Geopandas offers another way. We can use the values in the `geometry` column to `spatially subset` our data.

One way to do this is with the geodataframe [cx](https://geopandas.org/indexing.html) indexer, which spatially selects rows whose geometry intersects a specified bounding box.

In [None]:
# Uncomment to view help docs
#tracts_gdf_sf.cx?

For the `cx` method we need to specify the bounding coordinates as follows:
<pre>
tracts_gdf_sf.cx[xmin:xmax, ymin:ymax]
</pre>
We can define a bounding box around the city of San Francisco to select only those census tracts.
- You can find the coordinates for this bounding box by making a quick plot of the gdf.

In [None]:
tracts_gdf_sf.plot()

The coordinate bounds of the data are shown on the map X and Y axes.
- The ymin (south) and ymax (north) coordinates look good, as does the xmax (east) coordinate. 

- The xmin (west) coordinate needs to be adjusted. 

You can try a few values before you spatially subset the data.

In [None]:
tracts_gdf_sf.cx[-122.45:-122.35, 37.65:37.85].plot()

That's not great. But what does it tell you about how `cx` works?

Try this..

In [None]:
tracts_gdf_sf.cx[-122.8:-122.35, 37.65:37.85].plot()

That looks good. When you are ready to subset, you can overwrite the input dataset.
- If you make a mistake, that's ok. Just rerun the previous code to get the SF census tract data.
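
The two attempts above show that `cx` keeps every feature that touches the bounding box, whole and unclipped, so a box that is too narrow drops tracts rather than trimming them. The sketch below mimics that selection rule using only each feature's rectangular envelope. This is a simplification: `cx` tests the actual geometry against the box. The envelopes and the helper `bbox_intersects` are hypothetical.

```python
def bbox_intersects(bounds, xmin, xmax, ymin, ymax):
    """True if a (minx, miny, maxx, maxy) envelope overlaps the query box."""
    minx, miny, maxx, maxy = bounds
    return minx <= xmax and maxx >= xmin and miny <= ymax and maxy >= ymin


# Hypothetical tract envelopes in longitude/latitude
features = {
    "inside_box": (-122.42, 37.75, -122.40, 37.77),
    "straddles_west_edge": (-122.47, 37.74, -122.44, 37.76),
    "far_west": (-123.02, 37.69, -122.99, 37.71),
}

# Same argument order as cx[xmin:xmax, ymin:ymax]
query = dict(xmin=-122.45, xmax=-122.35, ymin=37.65, ymax=37.85)

selected = [name for name, bounds in features.items() if bbox_intersects(bounds, **query)]
print(selected)
```

The feature that straddles the western edge is selected in full, while the one far to the west is dropped entirely.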

When you are ready to save the clip...

In [None]:
tracts_gdf_sf = tracts_gdf_sf.cx[-122.8:-122.35, 37.65:37.85].copy().reset_index(drop=True)

In [None]:
# Take a look
tracts_gdf_sf.plot()
plt.show()

Once we combine our tract data with the ACS data we can subset the data based on population greater than zero. 

But, with just the census tract columns, what could we use to subset the data to remove those tracts?
