<a id="section6"></a>
## 1.6 Data Driven Mapping

Data-driven mapping refers to the process of using data values to determine the symbology of mapped features. Color, shape, and size are the three most common symbology types used in data-driven mapping.

Data-driven maps are often referred to as `thematic maps`.

### Types of Thematic Maps

There are two primary types of maps used to convey data values:

- `Choropleth maps`: set the color of areas (polygons) by data value
- `Point symbol maps`: set the color or size of points by data value
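As a rough illustration of the second type, a point symbol map often scales each symbol so that its *area* is proportional to the data value. The sketch below assumes NumPy and uses made-up populations; `proportional_marker_sizes` is a hypothetical helper, not a geopandas function. Its output could be passed to matplotlib's `scatter(s=...)` or to the `markersize` argument of `GeoDataFrame.plot` for point data.

```python
import numpy as np

def proportional_marker_sizes(values, max_size=400.0):
    """Scale values to matplotlib marker sizes (in points squared).

    Hypothetical helper. matplotlib's scatter `s` is an area, so sizes
    proportional to the values make each symbol's area (not its radius)
    proportional to the value. The largest value gets `max_size`.
    """
    values = np.asarray(values, dtype=float)
    largest = np.nanmax(values)
    if largest <= 0:
        raise ValueError("need at least one positive value to scale against")
    return max_size * values / largest

# Made-up populations for four point features
sizes = proportional_marker_sizes([1000, 2000, 4000, 8000])
print(sizes)  # areas in the ratio 1:2:4:8, with the largest equal to 400
```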

We will discuss both of these map types in more detail in the next lesson, but let's take a quick look at choropleth maps now.

### Choropleth Maps

Choropleth maps are the most common type of thematic map.

Let's take a look at how we can use a geodataframe to make a choropleth map.

First, let's make a basic map of the geodataframe using the `plot` method, as we did above.

```python
tracts_acs_gdf_ac.plot()
```

Now, let's create a choropleth map by setting the color of each census tract based on the values in the total population (`c_race`) column.

```python
tracts_acs_gdf_ac.plot(column='c_race')
```

That's really the heart of it: to set the color of the features based on the values in a column, set the `column` argument to that column's name in the geodataframe.

> **Pro-tips:**
> - To suppress the text output that the cell prints above the map (the representation of the matplotlib Axes object that `plot` returns), add `plt.show()` or a semicolon after the `plot` method.
> - You can right-click on the plot to save it to a file or open it in a new browser window.

```python
import matplotlib.pyplot as plt

tracts_acs_gdf_ac.plot(column='c_race')
plt.show()
```
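Under the hood, a continuous `column` is turned into colors roughly like this: the values are rescaled linearly between their minimum and maximum, and each rescaled value is looked up in a colormap. The sketch below reproduces that idea with matplotlib alone, assuming matplotlib's default colormap (viridis) and a handful of made-up population values; it is an illustration of the idea, not geopandas' actual code.

```python
import numpy as np
import matplotlib
from matplotlib.colors import Normalize

# Made-up tract populations (not real ACS values)
values = np.array([1200, 3500, 4100, 5200, 9800])

# Rescale linearly so the minimum maps to 0.0 and the maximum to 1.0
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = norm(values)

# Look up each rescaled value in matplotlib's default colormap, viridis;
# each row of `rgba` is a (red, green, blue, alpha) color
cmap = matplotlib.colormaps["viridis"]
rgba = cmap(scaled)

for value, s, color in zip(values, scaled, rgba):
    print(f"{int(value):>6}  scaled={float(s):.2f}  rgba={np.round(color, 2)}")
```

Because the rescaling is linear, a few very large values can squeeze everything else into nearly the same color, which is worth keeping in mind for the density maps later in this section.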

Let's make this map a bit more informative, starting with a legend.

```python
tracts_acs_gdf_ac.plot(column='c_race', 
                       legend=True)
plt.show()
```

Aesthetically, we might prefer the color bar below the map. Let's move it there and make the map more informative by adding a label to the color bar.

```python
# add a legend, but place it below the map
tracts_acs_gdf_ac.plot(column='c_race', 
                       legend=True,
                       legend_kwds={'label': "Population by Census Tract",
                                    'orientation': "horizontal"}
                       )
plt.show()
```

Now let's make the map bigger so we can see our tracts more clearly.

We can use [matplotlib](https://matplotlib.org) commands directly to customize our maps.
- matplotlib is the primary Python plotting library, and geopandas uses it to draw maps.

```python
## Change the size by adding in some more matplotlib commands
fig, ax = plt.subplots(figsize=(10, 10)) 
tracts_acs_gdf_ac.plot(column='c_race', 
                       legend=True,
                       legend_kwds={'label': "Population by Census Tract",
                                    'orientation': "horizontal"},
                       ax=ax)
plt.show()
```

### About Choropleth maps

There are several types of quantitative data variables that can be used to create a choropleth map. Let's consider these in terms of our ACS data.

- `Counts`: display the count of observations aggregated by a feature, for example, the population within a census tract.

- `Density`: divide the count within a feature by the area of the feature, for example, population per square mile within a census tract.

- `Proportions / Percentages`: compare the value of a part to the whole. For example, the proportion of the tract population that is white compared to the total tract population.

- `Rates/ratios`: compare one observation to another. For example, the homeowner-to-renter ratio is calculated as the number of homeowners divided by the number of renters (`c_owners / c_renters`).
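To make these categories concrete, here is a minimal pandas sketch that computes a density, a percentage, and a ratio for two made-up tracts. The numbers are invented, and `c_white` is a hypothetical column name standing in for a part-of-the-whole count; `c_owners`, `c_renters`, `c_race`, and `ALAND` follow the names used in this lesson.

```python
import pandas as pd

# Two made-up tracts (not ACS data). c_white is a hypothetical column name.
# ALAND is land area in square meters, as in the census files.
tracts = pd.DataFrame({
    "tract": ["small_urban", "large_rural"],
    "c_race": [4000, 6000],          # total population (a count)
    "c_white": [1000, 4500],
    "c_owners": [600, 1800],
    "c_renters": [900, 300],
    "ALAND": [1_000_000, 50_000_000],
})

SQMETER_PER_SQKM = 1_000_000

# Density: count divided by area
tracts["pop_dens_km2"] = tracts["c_race"] / (tracts["ALAND"] / SQMETER_PER_SQKM)
# Percentage: part divided by whole
tracts["pct_white"] = 100 * tracts["c_white"] / tracts["c_race"]
# Ratio: one observation relative to another
tracts["owner_renter_ratio"] = tracts["c_owners"] / tracts["c_renters"]

print(tracts[["tract", "c_race", "pop_dens_km2", "pct_white", "owner_renter_ratio"]])
```

Notice that the larger tract has the higher count (6,000 vs. 4,000 people) but a far lower density (120 vs. 4,000 people per sq km), which is exactly the trap described below.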


The goal of a choropleth map is to use color to visualize the spatial distribution of a quantitative variable.

- Brighter or richer colors are typically used to signify higher values.

A major drawback of choropleth maps is that our eyes are drawn to the color of larger areas, even if the value being mapped is more significant in one or more smaller areas.

This problem is exacerbated when the variable being mapped is a `count` rather than a standardized variable like density or percent. Large areas often have higher counts than smaller areas, but not necessarily higher densities, percentages, or rates.

For this reason it is considered best practice to create choropleth maps of standardized variables and not raw counts!

### Mapping Population Density

With that said, we're now going to create density variables for population per square kilometer (km^2) and square mile (mi^2) and create choropleth maps of these. We can use our total population (`c_race`) and land area (`ALAND`) columns. 

> `Area` is included in all census geographic data:
> - in the [ALAND](https://www.census.gov/quickfacts/fact/note/US/LND110210) column, as land area in square meters
> - in the `AWATER` column, as water area in square meters

```python
# Create population density variables
# Land area (ALAND) is recorded in whole square meters.
# To convert square meters to square kilometers, divide by 1,000,000;
# to convert square meters to square miles, divide by 2,589,988.
SQMETER_PER_SQKM = 1000000
SQMETER_PER_SQMILE = 2589988

tracts_acs_gdf_ac['pop_dens_km2'] = tracts_acs_gdf_ac['c_race'] / (tracts_acs_gdf_ac['ALAND'] / SQMETER_PER_SQKM)
tracts_acs_gdf_ac['pop_dens_mi2'] = tracts_acs_gdf_ac['c_race'] / (tracts_acs_gdf_ac['ALAND'] / SQMETER_PER_SQMILE)
```
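Where does 2,589,988 come from? The international mile is defined as exactly 1,609.344 m, so a square mile is 1,609.344² ≈ 2,589,988.11 m². The short check below (plain Python, no extra libraries) derives both constants and shows that rounding the square-mile constant to a whole number changes results by only a few parts per hundred million.

```python
# The international mile is defined as exactly 1,609.344 meters
METERS_PER_MILE = 1609.344
METERS_PER_KM = 1000

SQMETER_PER_SQMILE_EXACT = METERS_PER_MILE ** 2  # about 2,589,988.11
SQMETER_PER_SQKM = METERS_PER_KM ** 2            # exactly 1,000,000

print(f"square meters per square mile: {SQMETER_PER_SQMILE_EXACT:,.2f}")
print(f"square meters per square km:   {SQMETER_PER_SQKM:,}")

# Relative error from rounding the square-mile constant to 2,589,988
rel_err = abs(2_589_988 - SQMETER_PER_SQMILE_EXACT) / SQMETER_PER_SQMILE_EXACT
print(f"relative error of the rounded constant: {rel_err:.1e}")
```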

We can check our geodataframe to make sure our new variables have been incorporated.

```python
tracts_acs_gdf_ac.head(3)
```
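`head()` only shows a few rows, so it is also worth checking for tracts with no land area. Some census geographies, such as water-only tracts, can have an `ALAND` of 0; dividing by zero then yields NaN (0/0) or `inf` (a positive count divided by 0), and an `inf` would stretch the color range. The sketch below uses made-up rows and treats zero land area as missing before dividing; whether your own data contains such tracts is something to check with the same kind of test.

```python
import numpy as np
import pandas as pd

SQMETER_PER_SQKM = 1_000_000

# Made-up rows (not real ACS data); the last row mimics a tract with no land area
df = pd.DataFrame({"c_race": [3000, 4500, 0],
                   "ALAND": [2_000_000, 1_500_000, 0]})

# Treat zero land area as missing, so the density becomes NaN instead of inf
land_km2 = df["ALAND"].where(df["ALAND"] > 0) / SQMETER_PER_SQKM
df["pop_dens_km2"] = df["c_race"] / land_km2

print("tracts with zero land area:", int((df["ALAND"] == 0).sum()))
print("tracts with missing density:", int(df["pop_dens_km2"].isna().sum()))
print("any infinite densities?", bool(np.isinf(df["pop_dens_km2"]).any()))
```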

#### Always check your calculations!
You can compare the land area of [Alameda County](https://en.wikipedia.org/wiki/Alameda_County,_California) with the value listed on Wikipedia (739 sq mi / 1,910 sq km) to check your math.

```python
print("Land area of Alameda County in square km:",
      (tracts_acs_gdf_ac['ALAND'] / SQMETER_PER_SQKM).sum().round())
print("Land area of Alameda County in square miles:",
      (tracts_acs_gdf_ac['ALAND'] / SQMETER_PER_SQMILE).sum().round())
```
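Published figures like 739 sq mi are rounded, so an exact match is not expected. Below is a minimal sketch of a tolerance-based check using the standard library's `math.isclose`; `matches_published` is a hypothetical helper and the 1% tolerance is an arbitrary choice. As a bonus, the two published figures can be checked against each other using the square-mile constant from above.

```python
import math

SQKM_PER_SQMILE = 2_589_988 / 1_000_000  # from the constants used above

def matches_published(computed, published, rel_tol=0.01):
    """Hypothetical helper: True if `computed` is within `rel_tol` (default 1%) of `published`."""
    return math.isclose(computed, published, rel_tol=rel_tol)

# Published Alameda County land area (the Wikipedia figures quoted above)
PUBLISHED_SQMI = 739
PUBLISHED_SQKM = 1910

# Consistency check of the two published figures against each other
print(matches_published(PUBLISHED_SQMI * SQKM_PER_SQMILE, PUBLISHED_SQKM))

# To check your own result, pass in the value printed by the cell above, e.g.
# matches_published(computed_sqmi, PUBLISHED_SQMI)
```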

Now let's plot population density per square kilometer (`pop_dens_km2`).

- Consider how it differs from the map of population count that we made above.

```python
# Plot population density - km^2
fig, ax = plt.subplots(figsize=(10, 10)) 
tracts_acs_gdf_ac.plot(column='pop_dens_km2', legend=True,
                       legend_kwds={'label': "Population per Sq KM",
                                    'orientation': "horizontal"},
                       ax=ax)
plt.show()
```

#### Exercise 

Now you try it! Map population density per square mile.

```python
# Plot population density - miles^2
```

Most tracts in our population density maps are drawn in the same dark color. What does that mean? Write what you think below.

```python
# Put your thoughts here 
```

When color bunching like this occurs, it's best to look at the distribution of your data. In fact, it is always a good idea to explore your data values as you prepare your maps.
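A few summary numbers can reveal bunching even before you plot a histogram. The sketch below uses a made-up, right-skewed sample drawn from a lognormal distribution with NumPy's seeded generator (not the Alameda data): it prints `describe()` and then the share of values that land in the bottom quarter of a linear color ramp running from the minimum to the maximum.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed densities (NOT the Alameda data): most values are
# moderate and a few are very large, a shape often seen with density data
rng = np.random.default_rng(0)
dens = pd.Series(rng.lognormal(mean=8, sigma=1, size=500), name="pop_dens_km2")

print(dens.describe())

# A linear color ramp places each value by its position between min and max
scaled = (dens - dens.min()) / (dens.max() - dens.min())
share_low = (scaled < 0.25).mean()
print(f"share of values in the lowest quarter of the color range: {share_low:.0%}")
```

When most values sit in the bottom quarter of the range, most features get nearly the same color. Running `tracts_acs_gdf_ac['pop_dens_km2'].describe()` on the real data is a quick way to see whether the same thing is happening there.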

#### Exercise 
Plot a histogram of your `pop_dens_km2` below and consider how the distribution of values impacts the colors in the choropleth map.

```python
# histogram of pop_dens_km2
```

*Click here for answers*

<!--- 
# # SOLUTION
# # histogram of pop_dens_km2
# tracts_acs_gdf_ac['pop_dens_km2'].hist()
--->

#### Looking Ahead

In the next lesson we'll take a deeper dive into mapping and learn about `classification schemes` and `color palettes` so we can avoid color bunching.

### Saving a geodataframe to a file

Let's not forget to save out our Alameda County geodataframe `tracts_acs_gdf_ac`. By saving it we will not need to repeat the processing steps and attribute join we did above.

We can save to a shapefile.

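```python
import os

# make sure the output folder exists before writing
os.makedirs("../outdata", exist_ok=True)
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_ac.shp")
```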

One problem with saving to a shapefile is that our column names get truncated to 10 characters (a limitation of the shapefile format).
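To see which of our column names would be affected, here is a small standard-library sketch that mimics the 10-character limit by simple truncation. `shapefile_field_names` is a hypothetical helper, and the two `median_income_*` columns are made-up names added to show how truncation can make two columns collide. The real writer (GDAL/OGR, which geopandas typically uses) may handle such collisions by renaming, so treat this as an approximation.

```python
from collections import Counter

def shapefile_field_names(columns, max_len=10):
    """Approximate shapefile field names by truncating to `max_len` characters.

    Hypothetical helper: the real writer may also rename duplicates.
    The geometry column is stored in the .shp file, not as an attribute.
    """
    return {col: col[:max_len] for col in columns if col != "geometry"}

# Column names from this lesson plus two made-up ones (median_income_*)
cols = ["c_race", "ALAND", "pop_dens_km2", "pop_dens_mi2",
        "median_income_2019", "median_income_2020", "geometry"]

short_names = shapefile_field_names(cols)
for original, short in short_names.items():
    note = "  (truncated)" if original != short else ""
    print(f"{original:<20} -> {short}{note}")

# Truncated names that would collide
dupes = [name for name, n in Counter(short_names.values()).items() if n > 1]
print("collisions:", dupes)
```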

Instead of renaming all of our columns to obscure names of 10 characters or fewer, we can save our geodataframe to a spatial data format that does not have this limitation, such as a [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) or [GPKG](https://en.wikipedia.org/wiki/GeoPackage) (GeoPackage) file.
- These formats have the added benefit of producing a single file, in contrast to the multi-file shapefile format.

```python
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_gdf_ac.json", driver="GeoJSON")
```

```python
tracts_acs_gdf_ac.to_file("../outdata/tracts_acs_gdf_ac.gpkg", driver="GPKG")
```

We can also save our data as a CSV file, dropping the geometry column.

```python
tracts_acs_gdf_ac.drop('geometry', axis=1).to_csv("../outdata/tracts_acs_gdf_ac.csv") 
```

We can also save just the tract data we subsetted earlier to its own shapefile.

```python
tracts_gdf_ac.to_file("../outdata/tracts_ac.shp")
```

#### Exercise
Go ahead and save your San Francisco County tracts geodataframe (`tracts_gdf_sf`) as a shapefile, a GeoJSON file, and a CSV file.

```python
# Your code here
```

*Click here for answers*

<!--- 
    # SOLUTION
    # shapefile
    tracts_gdf_sf.to_file("../outdata/tracts_sf.shp")

    # SOLUTION
    # GeoJSON
    tracts_gdf_sf.to_file("../outdata/tracts_sf.json", driver="GeoJSON")

    # SOLUTION
    # csv
    tracts_gdf_sf.drop('geometry', axis=1).to_csv("../outdata/tracts_sf.csv") 
--->