# 1. Introduction to GeoPandas

Welcome back! We're diving into using a popular Python package, `GeoPandas`, so we can start looking at our data spatially! In this notebook we'll be covering the following topics:

- [1.1 Introduction](#section1)
    - GeoPandas and Geospatial Data in Python
- [1.2 Data Preparation](#section2)
    - Reading in and writing out data as csv
    - Preprocessing the ACS 5 year data
- [1.3 Mapping Census Tracts](#section3)
    - Learning about census tracts data
    - Reading in shapefiles with Geopandas
    - Exploring a GeoDataFrame
    - Mapping geospatial data stored in a GeoDataFrame
- [1.4 Spatial Subsetting](#section4)
    - Subsetting by bounding box coordinates
- [1.5 Attribute Joins](#section5)
    - Joining a pandas DataFrame to a GeoPandas GeoDataFrame
- [1.6 Data Driven Mapping](#section6)
    - Types of thematic mapping
    - Reading in and writing out spatial data in different file formats (e.g., shapefile, csv, geojson)
- [1.7 Coordinate Reference Systems (CRS)](#section7)
    - Handling CRS in GeoPandas (i.e., getting, setting, transforming)
    - Spatial measurement calculations (area, length)
- [1.8 Recap](#section8)
- [1.9 Homework](#section9)
- [1.10 References](#section10)
    



**INSTRUCTOR NOTES**:
- Datasets used:
    - "../notebook_data/census/ACS5yr/census_variables_CA.csv"
    - "../notebook_data/census/Tracts/cb_2018_06_tract_500k.zip"


- Expected time to complete:
    - Lecture + Questions: 1.5 hours
    - Homework: 40 minutes
    
---


<a id="section1"></a>
## 1.1 Introduction

The goal of this notebook is to give you a **tip of the iceberg introduction** to working with geospatial data in Python using the **GeoPandas** package.  

> #### Assumptions
> This lesson assumes you have basic working knowledge of Python and of geospatial data. If you need a geospatial refresher, we refer you to these freely available online resources:
> - The Open Textbook Library: [Essentials of Geographic Information Systems by Jonathan E. Campbell and Michael Shin](http://open.umn.edu/opentextbooks/BookDetail.aspx?bookId=67)
> - The Open Textbook Library: [Nature of Geographic Information Systems by David DiBiase](http://open.umn.edu/opentextbooks/BookDetail.aspx?bookId=428) from Esri.
> - Online Gitbook: [Intro to GIS and Spatial Analysis by Manuel Gimond](https://mgimond.github.io/Spatial/index.html)


#### Terminology

Just so we are on the same page..

- `Geographic data` is data about locations on or near the surface of the Earth.

- `Geospatial data` is geographic data that can be explicitly located on the surface of the Earth because it contains coordinates like latitude and longitude.

- `Spatial data` is a more generic term that includes geospatial data as well as other kinds of spatial data.
 
 
### GeoPandas and related Geospatial Packages

[GeoPandas](http://geopandas.org/) is a relatively new package that makes it easier to work with geospatial data in Python. In the last few years it has grown more powerful and stable. This is great because working with geospatial data in Python used to be quite complex. GeoPandas is now the go-to package for working with `vector` geospatial data in Python. 

> **Pro-tip**: If you work with `raster` data you will want to check out the [rasterio](https://rasterio.readthedocs.io/en/latest/) package. We will not cover raster data in this tutorial.

### GeoPandas = pandas + geo
GeoPandas gives you access to all of the functionality of [pandas](https://pandas.pydata.org/), which is the primary data analysis tool for working with tabular data in Python. GeoPandas extends pandas with attributes and methods for working with geospatial data.


### Import Libraries

Let's start by importing the libraries that we will use.

In [None]:
import pandas as pd
import geopandas as gpd

import matplotlib # base python plotting library
import matplotlib.pyplot as plt # submodule of matplotlib

# To display plots, maps, charts etc in the notebook
%matplotlib inline  


<a id="section2"></a>
## 1.2 Data preparation

In this lesson we will use ACS and census tract data to demonstrate how to work with GeoPandas. We will use data for Alameda County as our primary example.


<img src ="https://upload.wikimedia.org/wikipedia/commons/thumb/9/95/CampanileMtTamalpiasSunset-original.jpg/1280px-CampanileMtTamalpiasSunset-original.jpg" height="100" width="400"> 
        

As you are probably aware, Berkeley (and of course the University) is located in Alameda County. 

### American Community Survey 5 Year Data (or ACS5)

To get started, let's read the ACS 5 year data for California tracts into a `dataframe` using the pandas `read_csv` function. 

As we read in the ACS data we will tell pandas to make sure that the data in the column `FIPS_11_digit` is read in as a string to preserve leading zeros in the census tract identifiers.

In [None]:
# Read in the ACS5 data for CA into a pandas DataFrame.
# Note: We force the FIPS_11_digit to be read in as a string to preserve any leading zeroes.
acs5data_df = pd.read_csv("../notebook_data/census/ACS5yr/census_variables_CA.csv", dtype={'FIPS_11_digit':str})
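To see why the `dtype` argument matters, here is a minimal sketch using a tiny in-memory CSV. The two rows are made up for illustration and are not taken from the ACS file. Without the `dtype` hint, pandas infers an integer column and the leading zero is lost.

```python
import io
import pandas as pd

# Illustrative two-row CSV; the values are made up for this sketch
csv_text = "FIPS_11_digit,c_race\n06001400100,100\n06001400200,200\n"

# Default type inference reads the identifier as an integer
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps all 11 characters
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"FIPS_11_digit": str})

print(as_int["FIPS_11_digit"].iloc[0])  # leading zero dropped
print(as_str["FIPS_11_digit"].iloc[0])  # leading zero kept
```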

Pandas provides a number of methods to view information about a dataframe.

The pandas dataframe attribute `shape` tells us the number of rows and columns in the dataframe.

In [None]:
# Take a look at the shape of the dataframe
acs5data_df.shape

Each row in our dataframe is an observation. For the ACS5 data each observation is about a census tract.

Each column in our dataframe is a variable for that observation.

Let's use `head` to take a look at the first 5 rows in the dataframe.

In [None]:
# Take a look at the data
acs5data_df.head()

A `...` in the middle of the top row indicates that there are too many columns to display.
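If you want to see every column, one option is to lift the pandas display limit temporarily. The sketch below uses a made-up 50-column dataframe (`wide_df` is a hypothetical name) and `pd.option_context`, which restores the previous setting when the block ends.

```python
import pandas as pd

# Hypothetical wide dataframe with 50 columns, just for illustration
wide_df = pd.DataFrame([[0] * 50], columns=[f"col_{i}" for i in range(50)])

# Inside the with-block there is no limit on the number of columns shown;
# the previous display setting is restored afterwards
with pd.option_context("display.max_columns", None):
    full_repr = repr(wide_df.head())

print(full_repr)
```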

The pandas dataframe `columns` attribute returns the column names.

In [None]:
acs5data_df.columns

We can see more information about the variables included in our ACS5 year data using the `info` method. This method tells us at a glance what variables (or columns) are included in the data, the data type of each variable, and which variables have values for all rows.

In [None]:
acs5data_df.info()

### Brief review of the ACS data

These variables were combined from different ACS 5 year tables. We have information for the following:

- `c_race` - Total population
- `c_white` - Total white non-Latinx
- `c_black` - Total black and African American non-Latinx
- `c_asian` - Total Asian non-Latinx
- `c_latinx` - Total Latinx
- `state_fips` - State level FIPS code
- `county_fips` - County level FIPS code
- `tract_fips` - Tracts level FIPS code
- `med_rent` - Median rent
- `med_hhinc` - Median household income
- `c_tenants` - Total tenants
- `c_owners` - Total owners
- `c_renters` - Total renters
- `c_movers` - Total number of people who moved
- `c_stay` - Total number of people who stayed
- `c_movelocal` - Number of people who moved locally
- `c_movecounty` - Number of people who moved counties
- `c_movestate` - Number of people who moved states
- `c_moveabroad` - Number of people who moved abroad
- `c_commute` - Total number of commuters
- `c_car` - Number of commuters who use a car
- `c_carpool` - Number of commuters who carpool
- `c_transit` - Number of commuters who use public transit
- `c_bike` - Number of commuters who bike
- `c_walk` - Number of commuters who walk
- `year` - ACS data year
- `FIPS_11_digit` - 11-digit FIPS code

The ACS variables that start with `c_` are counts, those that start with `med_` are medians. Variables that end in `_moe` denote margin of error. There are also a number of derived variables that start with `p_`. These are proportions calculated from the counts divided by the table denominator (the total count for whom that variable was assessed).

We're going to drop all of our `moe` columns by identifying all of those that end with `_moe`. We can do that in two steps, first by using `filter` to identify columns that contain the string `_moe`.

In [None]:
moe_cols = acs5data_df.filter(like='_moe',axis=1).columns
moe_cols

Note how we set the filter `like=` to a value that matches the pattern of the names of the columns we want to drop. You need to make sure you get all but only the columns that you want to drop.
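As a small illustration of the difference between "contains" and "ends with", the sketch below compares `like=` with a `regex=` pattern anchored by `$`. The column names in `toy_df` are partly hypothetical (`c_moe_flag_x` is invented for this example and is not in the ACS file).

```python
import pandas as pd

# Toy dataframe with no rows; 'c_moe_flag_x' is an invented column name
toy_df = pd.DataFrame(columns=["c_race", "c_race_moe", "med_rent_moe", "c_moe_flag_x"])

# like= keeps any column whose name contains the substring
contains_moe = list(toy_df.filter(like="_moe", axis=1).columns)

# regex= with a trailing $ keeps only names that end with the substring
ends_with_moe = list(toy_df.filter(regex="_moe$", axis=1).columns)

print(contains_moe)
print(ends_with_moe)
```

If every margin-of-error column in your data really does end in `_moe`, both approaches select the same columns. The anchored pattern is just a stricter way to say so.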

<div style="display:inline-block;vertical-align:top;">
    <img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 
</div>  
<div style="display:inline-block;">

#### Question
</div>

What do you think happens if you match `_mo` instead of `_moe` in the filter?

Now that we've got our list of moe columns, we can use `.drop()` to remove them from the dataframe. 

In [None]:
# Drop MOE columns
acs5data_df.drop(moe_cols, axis=1, inplace=True)

Check that you no longer have the moe columns in the dataframe.

In [None]:
acs5data_df.columns

### Select data for our county and year of interest

Our ACS5 data contains observations for all CA counties and two ACS 5 year periods.

The counties are identified by a unique Census FIPS code. 
- You can see the list of all CA Counties and their FIPS codes [here](https://en.wikipedia.org/wiki/List_of_counties_in_California).

Let's use `.unique` to check the unique set of county FIPS codes included in our dataframe.

In [None]:
acs5data_df['county_fips'].unique()  #what counties are in our dataframe

Now use `.unique` to see what years are included.

In [None]:
acs5data_df['year'].unique()

We are interested in Alameda County, which has the FIPS code `001`.  Moreover, we are only interested in the 2018 ACS 5 year data.  Let's filter the data to keep only the rows that match these two conditions.


In [None]:
acs5data_df_ac = acs5data_df[(acs5data_df['year']==2018) & (acs5data_df['county_fips']==1)]

<div style="display:inline-block;vertical-align:top;">
    <img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 
</div>  
<div style="display:inline-block;">

#### Question
</div>

Why do we filter on `county_fips==1` instead of `county_fips==001` or `county_fips=='001'`?

In [None]:
# Write your thoughts here

Now, check the contents of our dataframe again.

In [None]:
# now what is the shape of the data when filtered for Alameda County?
print(acs5data_df_ac.shape)

In [None]:
# Take a look at the first 5 rows
acs5data_df_ac.head()

>**Pro-tip:** Checking your row and column counts with `.shape` and your values with `.head` often helps to make sure that they are consistent with your understanding of the data.

### Saving our output

It's a good idea to save your data if you have done any major processing on it. Let's save our Alameda County sub-setted ACS5 data to a CSV file.

In [None]:
# Save processed data to a csv file - give it a name that is meaningful
acs5data_df_ac.to_csv('../outdata/acs5data_2018_AC.csv')
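Two details are worth knowing when you save to CSV. By default `to_csv` also writes the row index as an extra unnamed column, and a CSV file stores no data types. The sketch below uses a made-up one-row dataframe and a temporary directory (both are assumptions for illustration) to show `index=False` and the `dtype` hint needed when reading the file back.

```python
import os
import tempfile
import pandas as pd

# Made-up one-row dataframe for illustration
toy_df = pd.DataFrame({"FIPS_11_digit": ["06001400100"], "c_race": [100]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy.csv")
    # index=False skips writing the row index as an extra unnamed column
    toy_df.to_csv(out_path, index=False)
    # The CSV stores no type information, so the dtype hint is needed again
    reloaded = pd.read_csv(out_path, dtype={"FIPS_11_digit": str})

print(list(reloaded.columns))
print(reloaded["FIPS_11_digit"].iloc[0])
```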

Confirm that the file was saved with a [shell command](https://jakevdp.github.io/PythonDataScienceHandbook/01.05-ipython-and-shell-commands.html#Shell-Commands-in-IPython). Shell commands are prefaced by a `!` and allow you to access the file system and run commands like you would from a terminal window. (This may differ if you are on a Windows computer.)

In [None]:
!ls ../outdata

#### Exercise

Now do this for the SF ACS data:
1. Find the FIPS code for [SF county](https://en.wikipedia.org/wiki/List_of_counties_in_California)
2. Subset the ACS data to keep only rows for SF county in 2018 and assign to `acs5data_df_sf`
3. Save out ACS data as `acs5data_2018_SF.csv`




In [None]:
# Your code here


*Click here for solution*

<!--- 
    # SOLUTION
    # 1 & 2 Subset ACS data for SF
    acs5data_df_sf = acs5data_df[(acs5data_df['county_fips']==75) & (acs5data_df.year==2018)]

    # SOLUTION
    acs5data_df_sf.head()

    # SOLUTION
    # 3. Save out ACS data as 'acs5data_2018_SF.csv'
    acs5data_df_sf.to_csv('../outdata/acs5data_2018_SF.csv')
--->

<a id="section3"></a>
## 1.3 Mapping the ACS Data

In order to map the ACS data it needs to be geospatial data. Since the data are aggregated to census tracts, we will join the ACS data with the census tract geographic data for our county.

### About Census Geographic Data:

There are two main types of census geographic data products: 
- TIGER/Line Files 
  - contain detailed geometry, big files
  - not pretty for mapping
  - good for spatial analysis
  - have a `tl` (as in `T`IGER/`L`ine) in the filename when downloaded from Census web or FTP site.
    - e.g., tl_2018_06_tract.zip
    
  
- [Cartographic Boundary files](https://www.census.gov/programs-surveys/geography/technical-documentation/naming-convention/cartographic-boundary-file.html): 
  - smaller file sizes, 
  - made specifically for mapping,
  - have a `cb` in the file name when downloaded from Census web or FTP site
      - e.g., cb_2018_06_tract_500k.zip
  - have a mapping resolution at the end of the file name, 
    - eg `_500k` files look best around 1:500K map scale
  
### Several ways to obtain Census Geographic data

1. Fetch via API - although not all years may be available.
2. Download from Census website or FTP site
3. Download from another website like [NHGIS.org](https://nhgis.org)

### What files should you download?

Census tract geographic data files are updated frequently to improve the quality of the spatial data, but the most significant updates happen for all census geographies just before the decennial census.

When mapping or spatially analyzing ACS 5 year data, download the geographic files whose year matches the end year of the ACS 5 year data you are analyzing.

For example, we can use the following URLs to download 2013 and 2018 census tracts for California.

- Cartographic Boundary File for CA Census Tracts, 2013: [https://www2.census.gov/geo/tiger/GENZ2013/cb_2013_06_tract_500k.zip](https://www2.census.gov/geo/tiger/GENZ2013/cb_2013_06_tract_500k.zip)
  - suitable for mapping ACS 5 year 2009 - 2013 data: 


- Cartographic Boundary File for CA Census Tracts, 2018: [https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_06_tract_500k.zip](https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_06_tract_500k.zip)
  - suitable for mapping ACS 5 year 2014 - 2018 data: 


#### ESRI Shapefiles

These census tract files are made available in the [ESRI Shapefile](https://en.wikipedia.org/wiki/Shapefile) format, along with other formats.

An ESRI Shapefile is actually a collection of 3 to 9+ files that together are called a shapefile. Although this is an old file format with numerous limitations, it remains the most commonly used file format for vector spatial data. 


### Census tract data

We are ready to read in the census tract data for CA using the GeoPandas `read_file` function.

- Specifically, we will read in the `2018 cartographic boundary files` for CA census tracts. 

In [None]:
# Import CA census tracts data
tracts_gdf = gpd.read_file("zip://../notebook_data/census/Tracts/cb_2018_06_tract_500k.zip")

And take a look...

In [None]:
tracts_gdf.head(2)

### The GeoPandas GeoDataFrame

A [GeoPandas GeoDataFrame](https://geopandas.org/data_structures.html#geodataframe), or `gdf` for short, is just like a pandas dataframe (`df`) but with an extra geometry column and methods & attributes that work on that column. I repeat because it's important:

> `A GeoPandas GeoDataFrame is a pandas DataFrame with a geometry column and methods & attributes that work on that column.`

> This means all the methods and attributes of a pandas DataFrame also work on a Geopandas GeoDataFrame!!


How cool is that to see the geometry! Desktop GIS software like `QGIS` and `ArcGIS` hide the geometry from the user. Not so with GeoPandas. 

### Geopandas Geometries
There are three main types of geometries that can be associated with your geodataframe: points, lines and polygons:

<img src ="https://datacarpentry.org/organization-geospatial/fig/dc-spatial-vector/pnt_line_poly.png" width="450"></img>

In the geodataframe these geometries are displayed in a format known as [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry). For example:

> - POINT (30 10)
> - LINESTRING (30 10, 10 30, 40 40)
> - POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
>
> *where coordinates are separated by a space and coordinate pairs by a comma*
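To make the WKT examples above concrete, here is a minimal sketch that parses them with the `shapely` package, which GeoPandas uses for its geometry objects. The variable names are illustrative.

```python
from shapely import wkt

# Parse the three WKT strings shown above into shapely geometry objects
point = wkt.loads("POINT (30 10)")
line = wkt.loads("LINESTRING (30 10, 10 30, 40 40)")
polygon = wkt.loads("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))")

for geom in (point, line, polygon):
    # geom_type names the geometry; .wkt converts it back to text
    print(geom.geom_type, geom.wkt)

# Planar area in squared coordinate units
print(polygon.area)
```

Note that the polygon repeats its first coordinate pair at the end, which closes the ring.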

Your geodataframe may also include the variants **multipoints, multilines and multipolygons** if the row-level feature of interest is comprised of multiple parts. For example, a geodataframe of states, where one row represents one state, would have POLYGON geometry for Utah but MULTIPOLYGON for Rhode Island, which includes many small islands.

> It's ok to mix and match geometries of the same family, e.g., POLYGON and MULTIPOLYGON, in the same geodataframe.

You can check the types of geometries in a geodataframe or a subset of the geodataframe by combining the `type` attribute with the `unique` method.


In [None]:
tracts_gdf['geometry'].type.unique()
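As a toy illustration of mixed geometry types, the sketch below builds a two-row `GeoSeries` from made-up shapes: one single-part polygon and one multipolygon with an "island". It uses the `geom_type` attribute, which reports the geometry type of each row.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Two made-up features: a single-part shape and a shape with an island
mainland = box(0, 0, 2, 2)
island = box(3, 0, 4, 1)
shapes = gpd.GeoSeries([mainland, MultiPolygon([mainland, island])])

# One geometry type per row, then the distinct types
print(shapes.geom_type.tolist())
print(shapes.geom_type.unique())
```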

### Plotting a Geodataframe
Let's now go ahead and use the GeoPandas gdf `plot` method to map all of our tracts.

In [None]:
# Plot the gdf
tracts_gdf.plot()

> ### Wow! How cool is that?

### Select Census Tracts for Alameda County

We want to subset the tracts to get the data for Alameda county. In order to do this, let's first check what variables we have and what the data looks like.

In [None]:
tracts_gdf.head(3)

In [None]:
tracts_gdf.columns

Here's what each variable means:
- `STATEFP`: State FIPS code 
- `COUNTYFP`: County FIPS code
- `TRACTCE`: Census tract code
- `AFFGEOID`: Summary level code + geovariant code + '00US' + GEOID
- `GEOID`:  Census tract identifier; a concatenation of Current state FIPS code, county FIPS code, and census tract code
- `NAME`:  Census tract name
- `LSAD`:  Legal/statistical description with the census tract name
- `ALAND`: Area that is land, in square meters
- `AWATER`:  Area that is water, in square meters
- `geometry`: Geometry of tract
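The `GEOID` concatenation described above can be sketched in plain Python. The helper name `build_geoid` is hypothetical. The field widths (2 digits for state, 3 for county, 6 for tract) follow from the 11-digit identifier, and the tract code is used only as an example.

```python
def build_geoid(state_fips, county_fips, tract_code):
    """Concatenate zero-padded state (2), county (3), and tract (6) codes."""
    return (
        str(state_fips).zfill(2)
        + str(county_fips).zfill(3)
        + str(tract_code).zfill(6)
    )

# California (6), Alameda County (1), and an example tract code
geoid = build_geoid(6, 1, 400100)
print(geoid)  # 06001400100
```

This is also why these identifiers are stored as strings: the zero padding is part of the code.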

Let's take a closer look at the county identifiers.

In [None]:
# What are the county codes?
tracts_gdf['COUNTYFP'].unique()

Since the county code for Alameda County is `001`, let's subset our data using that knowledge so we can focus on our area of interest.

In [None]:
tracts_gdf_ac = tracts_gdf[tracts_gdf['COUNTYFP']=='001']
tracts_gdf_ac.plot()
plt.show()

Nice! Looks like we have what we were looking for.

*FYI*: You can also make dynamic plots of one or more counties without saving to a new gdf.

In [None]:
# Dynamic plot of the census tracts for the 10 County Bay Area
# Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Santa Cruz, Solano, Sonoma
tracts_gdf[tracts_gdf['COUNTYFP'].isin(['001','013','041','055','075','081', '085','087','095','097'])].plot()

<img src ="https://i.ytimg.com/vi/C9J1p6kO9VA/maxresdefault.jpg" height="200" width="800">


#### Exercise

Now do this for the SF tracts data:
1. Subset to SF county, assign to `tracts_gdf_sf`
2. Plot the tracts 
3. <img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="20" align=left >  Answer this question: What's weird about our plot?

In [None]:
# Your code here

*Click here for solution*

<!--- 
    # SOLUTION
    # 1. Subset to SF county, assign to `tracts_gdf_sf`
    tracts_gdf_sf = tracts_gdf[tracts_gdf['COUNTYFP']=='075']
    # 2. Plot
    tracts_gdf_sf.plot()
    plt.show()

    # 3. Answer this question: What's weird about our plot?
--->

<img src ="https://s.hdnux.com/photos/61/50/04/13009196/3/920x920.jpg" height="400" width="400">

Our SF tract map seems off because it includes the [Farallon Islands](https://en.wikipedia.org/wiki/Farallon_Islands). These are not inhabited (so population=0)!

In [None]:
# 1. Subset to SF county, assign to `tracts_gdf_sf`
tracts_gdf_sf = tracts_gdf[tracts_gdf['COUNTYFP']=='075']

# 2. Plot
tracts_gdf_sf.plot()


Take a look at the gdf with `head` to see if we have a column to use to filter out the Farallon Islands.

In [None]:
tracts_gdf_sf.head(2)

Once we combine our tract data with the ACS data we can subset the data based on population greater than zero. 

But, with just the census tract columns, what could we use to subset the data to remove those tracts?

<a id="section4"></a>
## 1.4 Spatial Subsetting

We could filter the Farallon Islands out if we knew their census tract geographic identifier, or `GEOID`.

Geopandas offers another way. We can use the values in the `geometry` column to `spatially subset` our data.

One way to do this is with the geodataframe [cx](https://geopandas.org/indexing.html) indexer, which spatially selects rows whose geometry intersects a specified bounding box.

In [None]:
# Uncomment to view help docs
#tracts_gdf_sf.cx?

For the `cx` method we need to specify the bounding coordinates as follows:
<pre>
tracts_gdf_sf.cx[xmin:xmax, ymin:ymax]
</pre>
We can define a bounding box around the city of San Francisco to select only those census tracts.
- You can find the coordinates for this bounding box by making a quick plot of the gdf.

In [None]:
tracts_gdf_sf.plot()

The coordinate bounds of the data are shown on the map X and Y axes.
- The ymin (south) and ymax (north) coordinates look good, as does the xmax (east) coordinate. 

- The xmin (west) coordinate needs to be adjusted. 

You can try a few values before you spatially subset the data.

In [None]:
tracts_gdf_sf.cx[-122.45:-122.35, 37.65:37.85].plot()

That's not great. But what does it tell you about how `cx` works?

Try this..

In [None]:
tracts_gdf_sf.cx[-122.8:-122.35, 37.65:37.85].plot()

That looks good. When you are ready to subset, you can overwrite the input dataset.
- If you make a mistake, that's ok. Just rerun the previous code to get the SF census tract data.

When you are ready to save the clip...

In [None]:
tracts_gdf_sf= tracts_gdf_sf.cx[-122.8:-122.35, 37.65:37.85].copy().reset_index(drop=True)

In [None]:
# Take a look
tracts_gdf_sf.plot()
plt.show()

Beautiful! Now our SF county tract and ACS data are ready too.
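To see the behavior of `cx` in isolation, here is a small sketch with three made-up boxes (`toy_gdf` and its coordinates are invented for illustration). A row is selected whenever its geometry intersects the window, even if it is not entirely inside it.

```python
import geopandas as gpd
from shapely.geometry import box

# Three made-up square features; box takes (minx, miny, maxx, maxy)
toy_gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[box(0, 0, 2, 2), box(5, 5, 6, 6), box(1.5, 1.5, 4, 4)],
)

# Only 'a' intersects the window x 0..1, y 0..1
small_window = toy_gdf.cx[0:1, 0:1]

# 'a' and 'c' both overlap the window x 1..3, y 1..3,
# even though neither lies entirely inside it
larger_window = toy_gdf.cx[1:3, 1:3]

print(small_window["name"].tolist())
print(larger_window["name"].tolist())
```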

<a id="section5"></a>
## 1.5 Attribute Joins between Geodataframes and Dataframes

We just mapped the census tracts. But what makes a map powerful is when you map the data associated with the locations.

In order to map the ACS data we need to associate it with the tracts. We have polygon data in the `tracts_gdf_ac` geodataframe but no attributes of interest.

In a separate file we have our ACS 5-year data for 2018 `census_variables_CA.csv` that we just imported and read in as a `pandas` dataframe. We're now going to join the columns from that data to the `tracts_gdf_ac` with a common key. This process is called an `attribute join`, which we covered in an earlier notebook.

We need to choose a join type here, so think about why we would pick one type of join over another. You can read more about merging in `geopandas` [here](http://geopandas.org/mergingdata.html#attribute-joins).

<img src="https://shanelynnwebsite-mid9n9g1q9y8tt.netdna-ssl.com/wp-content/uploads/2017/03/join-types-merge-names.jpg">



Let's talk about the data and the different join operations. What kind of join do we want to do?

In [None]:
# write any notes here

Let's take another look at the two data objects that we have -- do we see any columns that we can join on between the two?

In [None]:
# ACS 5 year data
acs5data_df.columns

Since it's hard to see all of our variables and know what types they are, let's use the `info` method instead.

In [None]:
acs5data_df.info()

Okay, awesome! Now let's go ahead and check out our tracts data.

In [None]:
# Tracts data
tracts_gdf_ac.head(2)

So it seems like `GEOID` in our tracts data and `FIPS_11_digit` are going to be the keys in our join. 

<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="20" align=left >  Let's check those variables-- do you see any differences?

In [None]:
tracts_gdf_ac['GEOID'].head()

In [None]:
acs5data_df['FIPS_11_digit'].head()

A `join` requires data to be of the same type and same values. Are we good to go?

In [None]:
# Write your thoughts here

Use the `geopandas` `merge` command to join the two dataframes by matching the values in the `GEOID` and `FIPS_11_digit` columns. Then take a look at the output since it should contain our ACS data for Alameda County.

In [None]:
# Uncomment to view documentation 
#acs5data_df_ac.merge?

Let's do a `left` join to keep all of the census tracts in Alameda County and only the ACS data for those tracts.

In [None]:
# Left join keeps all tracts and the acs data for those tracts
tracts_acs_gdf_ac = tracts_gdf_ac.merge(acs5data_df_ac, left_on='GEOID',right_on="FIPS_11_digit", how='left')
tracts_acs_gdf_ac.head(2)

Let's see all the variables we have in our dataset now.

In [None]:
list(tracts_acs_gdf_ac.columns)

How many rows and columns should we have? Think about this before you run the next lines of code.

In [None]:
print("Rows and columns in the Alameda County Census tract gdf:", tracts_gdf_ac.shape)
print("Rows and columns in the Alameda County Census tract gdf joined to the ACS data:", tracts_acs_gdf_ac.shape)
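Beyond comparing shapes, one way to audit a join is the `indicator=True` option of `merge`, which adds a `_merge` column recording where each row came from. The sketch below uses two made-up pandas dataframes (`tracts` and `acs` are hypothetical stand-ins, not the notebook's objects).

```python
import pandas as pd

# Made-up stand-ins: three tracts, but ACS rows for only two of them
tracts = pd.DataFrame({"GEOID": ["06001400100", "06001400200", "06001400300"]})
acs = pd.DataFrame({
    "FIPS_11_digit": ["06001400100", "06001400200"],
    "c_race": [100, 200],
})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'
checked = tracts.merge(
    acs, left_on="GEOID", right_on="FIPS_11_digit", how="left", indicator=True
)
print(checked["_merge"].value_counts())

# Tracts that found no matching ACS row
unmatched = checked.loc[checked["_merge"] == "left_only", "GEOID"].tolist()
print(unmatched)
```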

<div style="display:inline-block;vertical-align:top;">
    <img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="30" align=left > 
</div>  
<div style="display:inline-block;">

#### Question
</div>

1. What would happen if we did an inner join instead of a left join? A right join? 
2. What is the data type of the output of the merge?

In [None]:
# Put your thoughts here

In [None]:
# Check the data type of the join output
type(tracts_acs_gdf_ac)

### Join Order Matters!

Above, we left joined the ACS5 dataframe to the tracts geodataframe. The output was a geodataframe of all census tracts and the ACS data for those tracts.

We can do a similar operation by joining the tracts geodataframe to the ACS dataframe. However, if we change the order of inputs we get a different type of output!

Let's check that out.

In [None]:
tracts_acs_df_ac = acs5data_df_ac.merge(tracts_gdf_ac, right_on='GEOID', left_on="FIPS_11_digit", how='right')

In [None]:
type(tracts_acs_df_ac)

In [None]:
print(tracts_acs_gdf_ac.shape)
print(tracts_acs_df_ac.shape)

In [None]:
tracts_acs_df_ac.columns

The number of rows and columns in the output is the same for both joins but the output type is different - even though the pandas dataframe contains a geometry column.

So be careful when joining Geopandas geodataframes and Pandas dataframes. Always check your outputs to make sure they are what you expect.
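If you do end up with a plain pandas dataframe that still carries a geometry column, one option is to wrap it in the `GeoDataFrame` constructor. This is a minimal sketch with made-up point data (`toy_gdf` and `toy_df` are hypothetical names).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Made-up inputs for illustration
toy_gdf = gpd.GeoDataFrame({"GEOID": ["a", "b"]}, geometry=[Point(0, 0), Point(1, 1)])
toy_df = pd.DataFrame({"FIPS_11_digit": ["a", "b"], "c_race": [10, 20]})

# pandas dataframe on the left: the geometry column is carried along
merged = toy_df.merge(toy_gdf, left_on="FIPS_11_digit", right_on="GEOID", how="right")

# Wrap the result to get geodataframe behavior (plot, crs, etc.) back
restored = gpd.GeoDataFrame(merged, geometry="geometry", crs=toy_gdf.crs)

print(type(merged).__name__, type(restored).__name__)
```

Putting the geodataframe on the left side of the merge, as in the first join above, avoids this extra step.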

<a id="section6"></a>
## 1.6 Data Driven Mapping

Data driven mapping refers to the process of using data values to determine the symbology of mapped features. Color, shape, and size are the three most common symbology types used in data driven mapping. 

Data driven maps are often referred to as `thematic maps`.

### Types of Thematic Maps

There are two primary types of maps used to convey data values:

- `Choropleth maps`: set the color of areas (polygons) by data value
- `Point symbol maps`: set the color or size of points by data value

We will discuss both of these types of maps in more detail in the next lesson. But let's take a quick look at choropleth maps. 

### Choropleth Maps

Choropleth maps are the most common type of thematic map.

Let's take a look at how we can use a geodataframe to make a choropleth map.

First a basic map of a geodataframe using the `plot` method, which we did above...

In [None]:
tracts_acs_gdf_ac.plot()

Now, let's create a choropleth map by setting the color of the census tracts based on the values in the population (c_race) column.

In [None]:
tracts_acs_gdf_ac.plot(column='c_race')

That's really the heart of it. To set the color of the features based on the values in a column, set the `column` argument to the column name in the gdf.
> **Pro-tips:** 
- If you want to get rid of the matplotlib text output, add `plt.show()` or a semi-colon after the plot method.
- You can quickly right-click on the plot and save to a file or open in a new browser window.

In [None]:
tracts_acs_gdf_ac.plot(column='c_race')
plt.show()

Let's make this map a bit more informative now-- start by adding a legend.

In [None]:
tracts_acs_gdf_ac.plot(column='c_race', 
                    legend=True)
plt.show()

Aesthetically, we could put the color bar on the bottom. Let's do that and make this more informative by adding a label to our color bar.

In [None]:
# add a legend but put it on the bottom
tracts_acs_gdf_ac.plot(column='c_race', 
                    legend=True,
                    legend_kwds={'label': "Population by Census Tract",
                                 'orientation': "horizontal"}
                    )
plt.show()

Now let's make this chart bigger so we can see our tracts more clearly.

You can use [matplotlib](https://matplotlib.org) commands directly to customize our maps.
- matplotlib is the primary python plotting library

In [None]:
## Change the size by adding in some more matplotlib commands
fig, ax = plt.subplots(figsize = (10,10)) 
tracts_acs_gdf_ac.plot(column='c_race', 
                    legend=True,
                    legend_kwds={'label': "Population by Census Tract",
                                 'orientation': "horizontal"},
                    ax=ax)
plt.show()

### About Choropleth maps

There are several types of quantitative data variables that can be used to create a choropleth map. Let's consider these in terms of our ACS data.

- `Counts`: display the count of observations aggregated by a feature, for example, the population within a census tract.

- `Density`: express the count within a feature divided by the area of the feature, for example, population per square mile within a census tract.

- `Proportions / Percentages`: compare the value of a part to the whole. For example, the proportion of the tract population that is white compared to the total tract population.

- `Rates/ratios`: compare the relationship of one observation to another. For example, the homeowner to renter ratio would be calculated as the number of homeowners divided by the number of renters (`c_owners / c_renters`).
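The variable types listed above can be sketched with a made-up two-tract table. The numbers are invented, `owner_renter_ratio` is a hypothetical column name, and the other column names mirror the ACS and tract columns used in this notebook.

```python
import pandas as pd

SQMETER_PER_SQKM = 1_000_000

# Invented values for two tracts; c_race is the count variable
toy = pd.DataFrame({
    "c_race": [4000, 2000],          # total population
    "c_white": [1000, 1500],
    "c_owners": [600, 300],
    "c_renters": [300, 600],
    "ALAND": [2_000_000, 500_000],   # land area in square meters
})

# Density: count divided by area (people per square km)
toy["pop_dens_km2"] = toy["c_race"] / (toy["ALAND"] / SQMETER_PER_SQKM)

# Proportion: part divided by whole
toy["p_white"] = toy["c_white"] / toy["c_race"]

# Ratio: one count divided by another
toy["owner_renter_ratio"] = toy["c_owners"] / toy["c_renters"]

print(toy[["c_race", "pop_dens_km2", "p_white", "owner_renter_ratio"]])
```

Notice that the tract with the larger count has the lower density, which is exactly the kind of difference a choropleth of raw counts can hide.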


The goal of a choropleth map is to use color to visualize the spatial distribution of a quantitative variable.

- Brighter or richer colors are typically used to signify higher values.

A big problem with choropleth maps is that our eyes are drawn to the color of larger areas, even if the value being mapped is more significant in one or more smaller areas.

This problem is exacerbated when the variable being mapped is a `count` rather than a standardized variable like density or percent. Large areas often have higher counts than smaller areas but not necessarily higher densities, percents, or rates.

For this reason it is considered best practice to create choropleth maps of standardized variables and not raw counts!

### Mapping Population density

With that said, we're now going to create density variables for population per square kilometer (km^2) and square mile (mi^2) and create choropleth maps of these. We can use our total population (`c_race`) and land area (`ALAND`) columns. 

> `Area` is present in all census geographic data 
- in the [ALAND](https://www.census.gov/quickfacts/fact/note/US/LND110210) column as the land area in square meters
- and in the `AWATER` column as the water area in square meters

In [None]:
# Create population density variable
# Land area measurements are originally recorded as whole square meters 
# To convert square meters to square kilometers, divide by 1,000,000; 
# To convert square meters to square miles, divide by 2,589,988.
SQMETER_PER_SQKM = 1000000
SQMETER_PER_SQMILE = 2589988

tracts_acs_gdf_ac['pop_dens_km2'] = tracts_acs_gdf_ac['c_race']/ (tracts_acs_gdf_ac['ALAND']/SQMETER_PER_SQKM)
tracts_acs_gdf_ac['pop_dens_mi2'] = tracts_acs_gdf_ac['c_race']/ (tracts_acs_gdf_ac['ALAND']/SQMETER_PER_SQMILE)
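If you are curious where the square mile constant comes from, this short sketch derives it from the length of the international mile (1609.344 meters) and shows the resulting conversion between the two density units. The variable names are illustrative.

```python
METERS_PER_MILE = 1609.344  # international mile, in meters

# One square mile in square meters
sqmeter_per_sqmile = METERS_PER_MILE ** 2
print(round(sqmeter_per_sqmile))  # 2589988

# Square km per square mile, used to convert a density between units
SQMETER_PER_SQKM = 1_000_000
sqkm_per_sqmile = sqmeter_per_sqmile / SQMETER_PER_SQKM

# 1000 people per sq km expressed as people per sq mile
print(round(1000 * sqkm_per_sqmile))  # 2590
```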

We can check our geodataframe to make sure our new variables have been incorporated.

In [None]:
tracts_acs_gdf_ac.head(3)

#### Always check your calculations!
You can compare the land area of [Alameda County](https://en.wikipedia.org/wiki/Alameda_County,_California) to that listed in Wikipedia to check your math (739 sq mi / 1,910 km2).

In [None]:
print("Land area of Alameda county in square km:", (tracts_acs_gdf_ac['ALAND']/SQMETER_PER_SQKM).sum().round())
print("Land area of Alameda county in square miles:", (tracts_acs_gdf_ac['ALAND']/SQMETER_PER_SQMILE).sum().round())

Now let's plot population density per sq kilometer ('pop_dens_km2').

- Consider how it differs from the map of population count that we made above.

In [None]:
# Plot population density - km^2
fig, ax = plt.subplots(figsize = (10,10)) 
tracts_acs_gdf_ac.plot(column='pop_dens_km2', legend=True,
                    legend_kwds={'label': "Population per Sq KM",
                                 'orientation': "horizontal"},
                    ax=ax)
plt.show()

#### Exercise 

Now you try it! Map population density per square mile.

In [None]:
# Plot population density - miles^2

Our population maps look dark blue for the most part. What does that mean? Write what you think below.

In [None]:
# Put your thoughts here 

When color bunching occurs it's best to see what the distribution of your data is like. In fact it is always a good idea to explore your data values as you prepare your maps.

#### Exercise 
Plot a histogram of your `pop_dens_km2` below and consider how the distribution of values impacts the colors in the choropleth map.

In [None]:
# histogram of pop_dens_km2

*Click here for answers*

<!--- 
# # SOLUTION
# # histogram of pop_dens_km2
# tracts_acs_gdf_ac['pop_dens_km2'].hist()
--->

#### Looking Ahead

In the next lesson we'll take a deeper dive into mapping and learn about `classification schemes` and `color palettes` so we can avoid color bunching.
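
As a small preview of why classification helps, the sketch below bins a made-up, strongly skewed set of density values in two ways. It uses plain pandas binning functions as a stand-in, not the mapping tools covered in the next lesson, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical, strongly skewed densities (people per sq km); invented values
dens = pd.Series([50, 80, 120, 200, 350, 600, 900, 1500, 4000, 30000])

# Equal-width bins: nearly every value lands in the lowest bin (color bunching)
equal_width = pd.cut(dens, bins=5, labels=False)

# Quantile bins: each bin holds roughly the same number of values
quantiles = pd.qcut(dens, q=5, labels=False)

print("Equal-width bin counts:")
print(equal_width.value_counts().sort_index())
print("Quantile bin counts:")
print(quantiles.value_counts().sort_index())
```

With equal-width bins a single extreme value stretches the scale, so most tracts would share one color. Quantile bins spread the same values across all five classes.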

### Saving a geodataframe to a file

Let's not forget to save out our Alameda County geodataframe `tracts_acs_gdf_ac`. By saving it we will not need to repeat the processing steps and attribute join we did above.

We can save to a shapefile.

In [None]:
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_ac.shp")

One of the problems of saving to a shapefile is that our column names get truncated to 10 characters (a shapefile limitation.) 
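
Here is a rough way to preview which column names are affected, using a few of the column names from this notebook. Cutting each name to its first 10 characters is an assumption for illustration; the library that writes the shapefile may rename long or clashing fields differently.

```python
# A few column names used earlier in this notebook
columns = ["GEOID", "ALAND", "c_race", "pop_dens_km2", "pop_dens_mi2"]

MAX_SHP_FIELD_LEN = 10  # shapefile field-name limit

# Names longer than the limit cannot be stored as-is
too_long = [name for name in columns if len(name) > MAX_SHP_FIELD_LEN]

# Assumed behavior: keep only the first 10 characters
preview = {name: name[:MAX_SHP_FIELD_LEN] for name in too_long}
print(preview)
```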

Instead of renaming all columns with obscure names of 10 characters or fewer, we can save our geodataframe to a spatial data file format that does not have this limitation: a [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) or [GPKG](https://en.wikipedia.org/wiki/GeoPackage) (GeoPackage) file.
- These formats have the added benefit of outputting only one file, in contrast to the multi-file shapefile format.

In [None]:
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_gdf_ac.json", driver="GeoJSON")

In [None]:
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_gdf_ac.gpkg", driver="GPKG")
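
To get a feel for what ends up inside a GeoJSON file, the sketch below builds a one-row toy geodataframe and converts it to GeoJSON text in memory with `to_json()`. The point location and the `name` column are made up for illustration and are not part of the tract data.

```python
import json

import geopandas as gpd
from shapely.geometry import Point

# Toy geodataframe with a single, approximate point (illustrative only)
toy_gdf = gpd.GeoDataFrame(
    {"name": ["Berkeley"]},
    geometry=[Point(-122.27, 37.87)],
    crs="EPSG:4269",
)

# GeoJSON is plain text: one FeatureCollection holding one feature per row
geojson_text = toy_gdf.to_json()
parsed = json.loads(geojson_text)

print(parsed["type"])
print(parsed["features"][0]["properties"])
print(parsed["features"][0]["geometry"]["type"])
```

Because full column names are stored as ordinary JSON keys, nothing gets truncated.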

We can also save out our data as a csv, dropping the geometry column.

In [None]:
tracts_acs_gdf_ac.drop('geometry',axis=1).to_csv("../outdata/tracts_acs_gdf_ac.csv") 

We can also save just the tract data we subsetted earlier into its own shapefile.

In [None]:
tracts_gdf_ac.to_file("../outdata/tracts_ac.shp")

#### Exercise
Go ahead and save your SF county tracts geodataframe (`tracts_gdf_sf`) as a shapefile, GeoJSON, and csv file.

In [None]:
# Your code here

*Click here for answers*

<!--- 
    # SOLUTION
    # shapefile
    tracts_gdf_sf.to_file("../outdata/tracts_sf.shp")

    # SOLUTION
    # GeoJSON
    tracts_gdf_sf.to_file("../outdata/tracts_sf.json", driver="GeoJSON")

    # SOLUTION
    # csv
    tracts_gdf_sf.drop('geometry',axis=1).to_csv("../outdata/tracts_sf.csv")
--->

<a id="section7"></a>
## 1.7 Coordinate Reference Systems (CRS) and Map Projections

Before moving on to our next lesson, let's talk about how coordinate reference systems (CRS) and map projections are handled by GeoPandas.

In fact, we have gotten pretty far without talking about these!

<img src="http://www.pngall.com/wp-content/uploads/2016/03/Light-Bulb-Free-PNG-Image.png" width="20" align=left >  Do you have experience with Coordinate Reference Systems?

As a refresher, a CRS describes how the coordinates in a geospatial dataset relate to locations on the surface of the earth. 

A `geographic CRS` consists of: 
- a 3D model of the shape of the earth (a `datum`), approximated as a sphere or spheroid (aka ellipsoid)
- the `units` of the coordinate system (e.g., decimal degrees, meters, feet) and
- the `origin` (0,0 location), specified as the `equator` and the `prime meridian`

A `projected CRS` consists of
- a geographic CRS
- a **map projection** and related parameters used to transform the geographic coordinates to `2D` space.
  - a map projection is a mathematical model used to transform coordinate data
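
To see what a map projection does to a single coordinate pair, here is a minimal sketch using `pyproj`, the library GeoPandas relies on for CRS handling. The longitude/latitude pair is an approximate, illustrative location in Alameda County, and the choice of target CRS (UTM Zone 10N) is ours.

```python
from pyproj import Transformer

# always_xy=True: inputs are (longitude, latitude), outputs are (easting, northing)
to_utm10 = Transformer.from_crs("EPSG:4269", "EPSG:26910", always_xy=True)

lon, lat = -122.27, 37.87  # approximate, illustrative location
easting, northing = to_utm10.transform(lon, lat)

print(f"geographic: ({lon}, {lat}) in decimal degrees")
print(f"projected:  ({easting:.0f}, {northing:.0f}) in meters")
```

The same place on earth is described by small angular values in the geographic CRS and by large planar values in meters in the projected CRS.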

### A Geographic vs Projected CRS
<img src ="https://www.e-education.psu.edu/natureofgeoinfo/sites/www.e-education.psu.edu.natureofgeoinfo/files/image/projection.gif" height="100" width="500">

### There are many, many CRSs

Theoretically the number of CRSs is unlimited!

Why? Primarily because there are many different definitions of the shape of the earth. Our understanding of its shape and our ability to measure it have changed greatly over time.

### Why are CRSs Important?

- You need to know the data about your data (or `metadata`) to use it appropriately.


- All projected CRSs introduce distortion in shape, area, and/or distance. So understanding what CRS best maintains the characteristics you need for your area of interest and your analysis is important.


- Some analysis methods expect geospatial data to be in a projected CRS
  - For example, `geopandas` expects a geodataframe to be in a projected CRS for area or distance based analyses.


- Some Python libraries, but not all, implement dynamic reprojection from the input CRS to the required CRS and assume a specific CRS (WGS84) when a CRS is not explicitly defined.


- Most Python spatial libraries, including Geopandas, require geospatial data to be in the same CRS if they are being analysed together.
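
In practice, the last point usually means checking the CRS of two geodataframes and reprojecting one of them before combining them. Here is a minimal sketch with two toy point layers; the layer names and coordinates are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two toy layers in different geographic CRSs (illustrative coordinates)
pts_nad83 = gpd.GeoDataFrame(geometry=[Point(-122.27, 37.87)], crs="EPSG:4269")
pts_wgs84 = gpd.GeoDataFrame(geometry=[Point(-122.26, 37.88)], crs="EPSG:4326")

# Bring the second layer into the CRS of the first before analysing them together
if pts_nad83.crs != pts_wgs84.crs:
    pts_wgs84 = pts_wgs84.to_crs(pts_nad83.crs)

print(pts_nad83.crs == pts_wgs84.crs)
```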

### What you need to know when working with CRSs

- Which CRSs are used in your study area and their main characteristics
- How to identify, or `get`, the CRS of a geodataframe
- How to `set` the CRS of a geodataframe (i.e. define the projection)
- How to `transform` the CRS of a geodataframe (i.e. reproject the data)

### Codes for CRSs commonly used with CA data

CRSs are typically referenced by an [EPSG code](http://wiki.gis.com/wiki/index.php/European_Petroleum_Survey_Group).  

It's important to know the commonly used CRSs and their EPSG codes for your geographic area of interest.  

For example, below is a list of commonly used CRSs for California geospatial data along with their EPSG codes.

##### Geographic CRSs
- `4326: WGS84` (units decimal degrees) - the most commonly used geographic CRS

- `4269: NAD83` (units decimal degrees) - the geographic CRS customized to best fit the USA. This is used by all Census geographic data.

> `NAD83 (epsg:4269)` is approximately the same as `WGS84 (epsg:4326)`, although locations can differ by up to 1 meter in the continental USA and by up to 3 meters elsewhere. That is not a big issue with census tract data, as these data are only accurate to within +/- 7 meters.

##### Projected CRSs

- `5070: CONUS NAD83` (units meters) - projected CRS for mapping the entire contiguous USA (CONUS)

- `3857: Web Mercator` (units meters) - conformal (shape preserving) CRS used as the default in web mapping

- `3310: CA Albers Equal Area, NAD83` (units meters) - projected CRS for CA statewide mapping and spatial analysis

- `26910: UTM Zone 10N, NAD83` (units meters) - projected CRS for Northern CA mapping & analysis

- `26911: UTM Zone 11N, NAD83` (units meters) - projected CRS for Southern CA mapping & analysis

- `102641 to 102646: CA State Plane zones 1-6, NAD83` (units feet) - projected CRS used for local analysis. Note that these are ESRI codes rather than EPSG codes.

You can find the full CRS details on the website https://www.spatialreference.org
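
You can also look up the basics of a CRS from Python. The sketch below uses `pyproj` to summarize the EPSG codes listed above; the `summary` dictionary is just a helper for this example.

```python
from pyproj import CRS

# EPSG codes from the lists above
codes = [4326, 4269, 5070, 3857, 3310, 26910, 26911]

summary = {}
for code in codes:
    crs = CRS.from_epsg(code)
    kind = "geographic" if crs.is_geographic else "projected"
    unit = crs.axis_info[0].unit_name  # unit of the first coordinate axis
    summary[code] = (crs.name, kind, unit)
    print(code, crs.name, kind, unit, sep=" | ")
```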

### Getting the CRS of a gdf

GeoPandas GeoDataFrames have a `crs` attribute that returns the CRS of the data.

In [None]:
# Check the CRS of our gdf
tracts_acs_gdf_ac.crs

The above CRS definition specifies 
- the name of the CRS (`NAD83`), 
- the axis units (`latitude` and `longitude`)
- the shape (`datum`),
- and the origin (`Prime Meridian`, and the equator)
- and the area for which it is best suited (`North America`)

> Notes:
>    - `geocentric` latitude and longitude assume a spherical (round) model of the shape of the earth
>    - `geodetic` latitude and longitude assume a spheroidal (ellipsoidal) model, which is closer to the true shape.
>    - `geodesy` is the study of the shape of the earth.

Note that the output looks very different if you print it.

In [None]:
print(tracts_acs_gdf_ac.crs)

Printing the crs is useful because it outputs the code you should use if you want to `set` the CRS.
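
In recent GeoPandas versions the `crs` attribute holds a `pyproj` CRS object, so you can also ask it for its code directly. Here is a minimal sketch on a standalone CRS object. We build it from the EPSG code ourselves rather than reading it from the tract data.

```python
from pyproj import CRS

crs = CRS.from_epsg(4269)  # the CRS used by the census tract data

code_string = crs.to_string()  # short "authority:code" form
code_number = crs.to_epsg()    # integer EPSG code, or None if no match is found

print(code_string)
print(code_number)
```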


### Setting the CRS

You can set the CRS of a gdf with the `crs` attribute. You would set the CRS if it is not defined or if you think it is incorrectly defined.

> In desktop GIS terminology setting the CRS is called `defining the projection`

As an example, let's set the CRS of our data to `None`

In [None]:
# first set the CRS to None
tracts_acs_gdf_ac.crs = None

In [None]:
# Check it again
tracts_acs_gdf_ac.crs

...hummm...

If a variable has a null value (None) then displaying it without printing it won't display anything!

In [None]:
# Check it again
print(tracts_acs_gdf_ac.crs)

In [None]:
# Set it to 4326
tracts_acs_gdf_ac.crs = "epsg:4326"

In [None]:
# Show it
tracts_acs_gdf_ac.crs

Oops, that was wrong, the CRS is `4269`.

In [None]:
# Set it to 4269
tracts_acs_gdf_ac.crs = "epsg:4269"

> #### Important note
> - You can `set` the CRS to anything you like - that doesn't make it correct!
> - Setting the CRS does not change the coordinate data. It just tells the software how to interpret it.
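
To make the note above concrete, the sketch below contrasts relabeling a CRS with `set_crs` against transforming it with `to_crs` (covered in the next section). It uses a one-point toy geodataframe with illustrative coordinates, not the tract data.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(geometry=[Point(-122.27, 37.87)], crs="EPSG:4269")

# Relabel: only the CRS metadata changes
# (allow_override=True is needed because a CRS is already defined)
relabeled = gdf.set_crs("EPSG:4326", allow_override=True)

# Transform: the coordinates themselves are recomputed
transformed = gdf.to_crs("EPSG:26910")

print("original:   ", gdf.geometry.iloc[0])
print("relabeled:  ", relabeled.geometry.iloc[0])
print("transformed:", transformed.geometry.iloc[0])
```

The relabeled point keeps exactly the same numbers, while the transformed point has new coordinates in meters.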

### Transforming or Reprojecting the CRS
You can transform the CRS of a geodataframe with the `to_crs` method.


> In desktop GIS terminology transforming the CRS is called `projecting the data`

When you do this you want to save the output to a new geodataframe.

In [None]:
tracts_acs_ac_utm10 = tracts_acs_gdf_ac.to_crs('epsg:26910')

Now take a look at the CRS.

In [None]:
tracts_acs_ac_utm10.crs

You can see the result immediately by plotting the data.

- What two key differences do you see?

In [None]:
# plot geographic gdf
tracts_acs_gdf_ac.plot();

# plot utm gdf
tracts_acs_ac_utm10.plot();

#### Exercise

In the code cell below:
1. transform the CRS of the `tracts_acs_gdf_ac` geodataframe to the `CA Albers Equal Area` CRS and save it to a new geodataframe
2. display the CRS definition of the output geodataframe
3. plot the data to see how the shape and range of coordinate values differ from those of the `tracts_acs_gdf_ac` and `tracts_acs_ac_utm10` geodataframes.





In [None]:
# Your code here

*Double-click here to view the solution*

<!--
tracts_acs_gdf_ac_3310 = tracts_acs_gdf_ac.to_crs('epsg:3310')
tracts_acs_gdf_ac_3310.crs
tracts_acs_gdf_ac_3310.plot()
-->

### Geopandas for Spatial Measurement Calculations

To see the immediate usefulness of this transformation from a geographic to a projected CRS, let's consider our calculation of population density above.

That calculation was based on the ALAND column, or land area in sq meters, that is included in the census tract data.

- What if the data did not contain that column?

If your geodataframe is in a projected CRS that is appropriate for area or distance calculations you can calculate these values for each feature using the `area` or `length` attributes. 

For geodataframes with polygon geometry,
- `geodataframe_name.area` will return the area of each row's geometry

For geodataframes with line or polygon geometry,
- `geodataframe_name.length` will return the length (or perimeter) of each row's geometry


The output units will be the units of the CRS.
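
Here is a minimal sketch on a shape whose answer we already know: a hypothetical 1 km x 1 km square defined directly in UTM Zone 10N coordinates (meters). The square's coordinates are made up for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 1000 m x 1000 m square in UTM Zone 10N coordinates
square = gpd.GeoDataFrame(
    geometry=[box(564000, 4190000, 565000, 4191000)],
    crs="EPSG:26910",
)

SQMETER_PER_SQKM = 1000000

area_km2 = square.area.iloc[0] / SQMETER_PER_SQKM  # CRS units are meters, so area is in sq meters
perimeter_m = square.length.iloc[0]                # perimeter in meters

print(area_km2, perimeter_m)
```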

In [None]:
tracts_acs_ac_utm10.area # returns the area of each feature

In [None]:
tracts_acs_ac_utm10.length # returns the perimeter of each feature in meters

We can also get the total area or length.

In [None]:
tracts_acs_ac_utm10.area.sum()

So if we want to calculate the area of Alameda County, we could do so as follows.

- *Below we use the constants we defined earlier.*

In [None]:
tracts_acs_ac_utm10.area.sum()  / SQMETER_PER_SQKM

How does this value compare to the one we get using the `ALAND` column?

In [None]:
tracts_acs_ac_utm10.ALAND.sum() / SQMETER_PER_SQKM

### Getting Help with CRSs and Map Projections

See the [GeoPandas](https://geopandas.org/projections.html) website for more info on managing projections of geodataframes.

As you work with geospatial data in GeoPandas or in any software, you will want to transform your data to the CRS that is most appropriate for your work.

Most spatial analysis operations will assume a projected CRS. For example, you would not want to compute area using a geographic CRS.
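
The sketch below shows what goes wrong. It computes the area of a small, made-up box twice: once in a geographic CRS (where the result is in "square degrees") and once after transforming to CA Albers Equal Area. The box coordinates are illustrative only.

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# Hypothetical 0.01 x 0.01 degree box in NAD83 (illustrative coordinates)
cell = gpd.GeoDataFrame(
    geometry=[box(-122.28, 37.86, -122.27, 37.87)],
    crs="EPSG:4269",
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # GeoPandas may warn about a geographic CRS here
    area_sq_degrees = cell.area.iloc[0]

# Transform to a projected, equal-area CRS before measuring
area_sq_m = cell.to_crs("EPSG:3310").area.iloc[0]

print("Area in square degrees:", area_sq_degrees)
print("Area in square meters: ", area_sq_m)
```

The first number has no useful physical meaning, since a degree of longitude covers a different distance at every latitude.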

For more introductory materials on CRSs and Map Projections see the references listed at the end of this notebook.


<a id="section8"></a>
## 1.8 Recap
This lesson provided a broad overview to using [GeoPandas](http://geopandas.org/) to work with geospatial data in Python. 

Below is a quick recap of the GeoPandas capabilities and geospatial concepts we covered:

- Reading and writing spatial data to/from GeoPandas (gpd) GeoDataFrames (gdf), with a focus on ESRI Shapefiles and GeoJSON files.
	- `gpd.read_file()`
	- `gdf.to_file()`
- Plotting a geodataframe
	- `gdf.plot()`
- Spatially subsetting a geodataframe
	- `gdf.cx[xmin:xmax, ymin:ymax]`
- Using attribute joins to merge GeoPandas GeoDataFrames with pandas DataFrames (df)
	- `gdf.merge(df)`
- Choropleth mapping
	- `.plot(column='<column_name>')`
- Adding columns to a GeoDataFrame to transform counts to densities
	- `tracts_acs_gdf_ac['pop_dens_km2'] = tracts_acs_gdf_ac['c_race']/ (tracts_acs_gdf_ac['ALAND']/SQMETER_PER_SQKM)`

- Getting, setting (defining), and transforming (projecting) a CRS using `EPSG` codes
	- `.crs`
	- `.to_crs()`
- Spatial measurements: accessing the spatial attributes of GeoDataFrame geometries
	- `.area`
	- `.length`

<a id="section9"></a>
## 1.9 Homework

#### Exercise 1
1. Compare the values in the `GEOID` column of the tracts gdf and `FIPS_11_digit` in the ACS dataframe.
2. Join the two datasets and name the output geodataframe `tracts_acs_gdf_sf`
3. Check your output data - type, columns, shape, data values, etc. 

In [None]:
# Your code here

*Click here for answers*

<!--- 
    # SOLUTION
    # 1.a - look at census tract identifiers in the tract data
    tracts_gdf_sf['GEOID']

    # SOLUTION
    # 1.b - look at census tract identifiers in the ACS data
    acs5data_df_sf['FIPS_11_digit'].head()

    # SOLUTION
    # 2. Join the two datasets and name the output tracts_acs_gdf_sf
    tracts_acs_gdf_sf = tracts_gdf_sf.merge(acs5data_df_sf, left_on='GEOID',right_on="FIPS_11_digit", how='inner')
    tracts_acs_gdf_sf.head(2)

    # SOLUTION
    # 3. Check your output data
    print(tracts_gdf_sf.shape)
    print(tracts_acs_gdf_sf.shape)
--->

#### Exercise 2

Plot population density for SF county. Here are the steps you'll need to take:
1. Create a population density per km2 variable and add it to the data frame
2. Repeat but for population density per mile2
3. Create choropleth maps for both variables

In [None]:
# Your code here

*Click here for answers*

<!--- 
    # SOLUTION
    # 1. Create a population density per km2 variable and add it to the data frame
    tracts_acs_gdf_sf['pop_dens_km2'] = tracts_acs_gdf_sf['c_race']/ (tracts_acs_gdf_sf['ALAND']/SQMETER_PER_SQKM)

    # SOLUTION
    # 2. Repeat but for population density per mile2
    tracts_acs_gdf_sf['pop_dens_mi2'] = tracts_acs_gdf_sf['c_race']/ (tracts_acs_gdf_sf['ALAND']/SQMETER_PER_SQMILE)

    # SOLUTION
    # 3. Plot population density - km^2
    fig, ax = plt.subplots(figsize = (10,10)) 
    tracts_acs_gdf_sf.plot(column='pop_dens_km2', legend=True,
                        legend_kwds={'label': "Population per Sq KM",
                                     'orientation': "horizontal"},
                        ax=ax)
    plt.show()
--->

#### Exercise 3

Do you remember how to read in data from a file to a geodataframe? Test that below by completing the code.

In [None]:
# read in Alameda county Geojson file to a geodataframe
ac_tracts_from_geojson = ...

# Uncomment line below and plot
#ac_tracts_from_geojson.plot(column='pop_dens_mi2')

In [None]:
# read in Alameda county GeoPackage file to a geodataframe
ac_tracts_from_gpkg = ...
# Uncomment line below and plot
#ac_tracts_from_gpkg.plot(column='pop_dens_mi2')

*Click here for answers*

<!--- 
    # SOLUTION
    # read in Alameda county Geojson file to a geodataframe
    ac_tracts_from_geojson = gpd.read_file("../outdata/tracts_acs_gdf_ac.json")
    ac_tracts_from_geojson.plot(column='pop_dens_mi2')

    # SOLUTION
    # read in Alameda county GeoPackage file to a geodataframe
    ac_tracts_from_gpkg = gpd.read_file("../outdata/tracts_acs_gdf_ac.gpkg")
    ac_tracts_from_gpkg.plot(column='pop_dens_mi2')
--->

#### Exercise 4
1. Check the CRS of the geodataframe `tracts_acs_gdf_sf` 
2. Transform the CRS of `tracts_acs_gdf_sf` to UTM Zone 10N, NAD83 and call it `tracts_acs_sf_utm10`
3. Display and compare your two CRS definitions.
4. Use plot to make a map of the data in both CRSs
5. Calculate the area of SF using the `.area` geodataframe attribute and the `ALAND` column

In [None]:
# Your code here

*Click here for answers*

<!--- 
# 1. Check the CRS 
tracts_acs_gdf_sf.crs
# 2. transform the crs of your SF tracts ACS data
tracts_acs_sf_utm10 = tracts_acs_gdf_sf.to_crs('epsg:26910')
# 3. Display the CRS definitions
tracts_acs_gdf_sf.crs
tracts_acs_sf_utm10.crs

# 4. Plot and compare your two CRSs

# plot geographic gdf
tracts_acs_gdf_sf.plot();
# plot utm gdf
tracts_acs_sf_utm10.plot();

# 5. Calculate the area of SF using the 2 above methods
tracts_acs_sf_utm10.area.sum()  / SQMETER_PER_SQKM
tracts_acs_sf_utm10.ALAND.sum()/SQMETER_PER_SQKM
--->

<a id="section10"></a>
## 1.10 References

- [Kaggle Learn: Geospatial Analysis in Python](https://www.kaggle.com/learn/geospatial-analysis), an online interactive tutorial

- [Campbell & Shin, Geographic Information System Basics, v1.0](https://2012books.lardbucket.org/books/geographic-information-system-basics/index.html)

- [Intro to Python GIS: Map Projections and Coordinate Reference Systems](https://automating-gis-processes.github.io/CSC/notebooks/L2/projections.html)

- [ESRI: Coordinate systems, map projections, and geographic (datum) transformations](http://resources.esri.com/help/9.3/arcgisengine/dotnet/89b720a5-7339-44b0-8b58-0f5bf2843393.htm)

#### Installing GeoPandas on Your Computer

To install GeoPandas on your own computer, see the instructions in this file [s0_0_Geopandas_Installation.md](https://github.com/dataforhousing/curriculum_dev/blob/master/code/s0_0_Geopandas_Installation.md) or on the [GeoPandas.org](https://geopandas.org/install.html) website.

The geospatial functionality of GeoPandas is provided by several lower level spatial data packages that GeoPandas builds on and which you may have used previously. These include:
- [shapely](https://pypi.python.org/pypi/Shapely) - for geometry processing
- [fiona](https://pypi.python.org/pypi/Fiona) - for spatial data file IO
- [GDAL/Ogr](https://gdal.org) - for spatial data file IO
- [pyproj](https://github.com/jswhit/pyproj) - for map projections and coordinate systems
- [PROJ.4](https://proj.org) - for map projections and coordinate systems
- [geopy](https://geopy.readthedocs.io/en/stable/) - for geocoding and for geodesic distance calculations
- [pysal](https://pysal.org/) - for spatial analysis functions such as data classification methods and spatial autocorrelation
- [descartes](https://bitbucket.org/sgillies/descartes/src/default/) - for plotting Shapely geometric objects with Matplotlib

These packages may be installed as dependencies when you install Geopandas or you may need to install these directly.  We list the packages above for reference only in case you have questions about what is being installed on your system or need help getting Geopandas to run.


## Congrats, you're done with GeoPandas part 1!
<br>


---
<div style="display:inline-block;vertical-align:middle;">
<a href="https://dataforhousing.org/" target="_blank"><img src ="https://media-exp1.licdn.com/dms/image/C560BAQELkt35AxeIeA/company-logo_200_200/0?e=1597881600&v=beta&t=irZ1tYCA9A2biVzCguvCXzsfzanSYDFuF22IUFNY5Sg" width="75" align="left">
</a>
</div>

<div style="display:inline-block;vertical-align:middle;">
    <div style="font-size:larger">&nbsp;Data Science for Housing Workshop, University of California Berkeley</div>
    <div>&nbsp;Tim Thomas, Patty Frontiera, Emmanuel Lopez, Ethan Ebinger, Hikari Murayama, Karen Chapple, Claudia von Vacano</div>
    <div>&copy; UC Regents, 2019-2020</div>
</div>