# Data-driven Mapping

In [None]:
import pandas as pd
import geopandas as gpd

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline  

In [None]:
counties = gpd.read_file('../data/california_counties/CaliforniaCounties.shp')

---

### Challenge 1: Plotting Different Columns

Why are we plotting `POP12_SQMI` instead of `POP2012`? What does each of the two columns tell us?

Try plotting `POP2012` instead. What does this look like?

---

Plotting population per square mile (`POP12_SQMI`) conveys density, while `POP2012` gives the total population of each county. When we plot total population, Los Angeles and the surrounding areas tend to dominate the plot.
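To make the difference between the two columns concrete, here is a minimal sketch in plain Python. The two "counties" and their numbers are invented for illustration and are not taken from `CaliforniaCounties.shp`. It shows how the county with the largest total population need not be the densest one.

```python
# Toy values, invented for illustration only (not from the shapefile).
toy_counties = {
    "A": {"population": 1_000_000, "area_sqmi": 4_000},
    "B": {"population": 200_000, "area_sqmi": 50},
}

# Density = people per square mile.
density = {
    name: county["population"] / county["area_sqmi"]
    for name, county in toy_counties.items()
}

# The "largest" county depends on which measure we map.
largest_by_total = max(toy_counties, key=lambda n: toy_counties[n]["population"])
largest_by_density = max(density, key=density.get)

print("Largest by total population:", largest_by_total)
print("Largest by density:", largest_by_density)
```

Large counties with many residents spread over a wide area stand out on a total-population map, while a density map normalizes for area.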

In [None]:
counties.plot(column='POP2012',
              figsize=(10, 10),
              legend=True,
              legend_kwds={'label': "Population",
                           'orientation': "horizontal"})

### Classification Schemes and GeoDataFrames

Classification schemes can be implemented using the GeoDataFrame `plot` method by setting a value for the **scheme** argument. This requires the [pysal](https://pysal.org/) and [mapclassify](https://pysal.org/mapclassify) libraries to be installed in your Python environment.

Here is a list of the classification scheme names that we will use:

- `equalinterval`
- `quantiles`
- `fisherjenks`
- `naturalbreaks`
- `userdefined`

For more information about these classification schemes, see the [pysal mapclassifiers web page](https://pysal.org/mapclassify/api.html) or check out the help docs.
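As a rough illustration of how two of these schemes differ, the sketch below computes bin edges by hand in plain Python. The helper functions and the toy values are hypothetical and simplified; mapclassify may compute its edges differently (for example, by interpolating between values), so treat this as an intuition aid rather than a reimplementation.

```python
def equal_interval_breaks(values, k):
    """Upper bounds of k equal-width bins spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k bins holding roughly equal numbers of values.

    Simplified: picks the value at each cut position in the sorted
    list, with no interpolation.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]


# Toy, heavily skewed values (invented for illustration).
toy_density = [2, 5, 9, 14, 30, 55, 120, 400, 1500, 7000]

print("Equal interval:", equal_interval_breaks(toy_density, 5))
print("Quantiles:     ", quantile_breaks(toy_density, 5))
```

With skewed data like this, equal-width bins put almost every value in the first class, while quantile bins spread the values evenly across classes.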

Let's redo the last map using the `quantiles` classification scheme.

In [None]:
# Plot population density
fig, ax = plt.subplots(figsize=(10, 5)) 
counties.plot(column='POP12_SQMI', 
              scheme="quantiles",
              legend=True,
              ax=ax)
ax.set_title("Population Density per Square Mile")

What is different about the code? About the output map?

---

### Challenge 2: Reflecting on Choropleth Maps

1. What new options and operations have we added to our code?
2. Based on our code, what title would you give this plot to describe what it displays?
3. How many bins do we specify in the `legend_labels_list` object, and how many bins are in the map legend? Why?

---

1. We added an edge color as well as a line width specification.
2. "Californians Identifying as Hispanic Largely Live in Southern California"
3. We specified 5 bins, but only 4 are shown in the legend. This is because there are no counties that are 80% - 100% Hispanic.
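The third answer can be checked by counting how many values fall into each bin. The sketch below does this in plain Python; the bin edges (five equal 20-point bins) and the percentages are assumptions made up for illustration, not the real county values.

```python
from bisect import bisect_left

# Assumed upper bounds of five equal bins for a percentage column.
bin_upper_bounds = [20, 40, 60, 80, 100]

# Toy percentages, invented for illustration (none above 80).
toy_pct = [5.1, 12.4, 18.0, 25.3, 33.9, 47.2, 58.8, 61.5, 79.9]

counts = [0] * len(bin_upper_bounds)
for value in toy_pct:
    # bisect_left finds the first bin whose upper bound is >= value.
    counts[bisect_left(bin_upper_bounds, value)] += 1

# Only bins that contain at least one value would need a legend entry.
non_empty = [ub for ub, c in zip(bin_upper_bounds, counts) if c > 0]

print("Counts per bin:", counts)
print("Bins defined:", len(bin_upper_bounds), "| bins with data:", len(non_empty))
```

A bin with a count of zero has nothing to draw on the map, which is consistent with the legend showing fewer entries than the number of bins specified.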

---

### Challenge 3: Data-Driven Mapping

Points and polygons are not the only geometry types that we can use in data-driven mapping!

Run the next cell to load a dataset containing Berkeley's bicycle boulevards (which we'll be using more in the following notebook).

Then, in the following cell, write your own code to:

1. Plot the bike boulevards;
2. Color them by status (find the correct column in the head of the dataframe, displayed below);
3. Color them using a fitting, good-looking qualitative colormap that you choose from [The Matplotlib Colormap Reference](https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html);
4. Set the line width to 5 (check the plot method's documentation to find the right argument for this!);
5. Add the argument `figsize=(15, 15)`, to make your map nice and big and visible!
    
Using this map, answer the following questions:

1. What does that map indicate about the status of the Berkeley bike boulevards?
2. What does that map indicate about the status of your Berkeley bike-boulevard *dataset*?

---

In [None]:
bike_blvds = gpd.read_file('../data/transportation/BerkeleyBikeBlvds.geojson')
bike_blvds.head()

In [None]:
bike_blvds.plot(column='Status',
                cmap='Dark2',
                linewidth=5,
                legend=True,
                figsize=(15, 15))

All bike boulevards exist, but it seems like there are some typos in the `Status` column of our GeoDataFrame!
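One way to spot and repair this kind of problem is to tally the distinct values in the column and then normalize them. The sketch below uses plain Python with made-up status strings; the misspellings and the `corrections` mapping are hypothetical and are not the actual values in `BerkeleyBikeBlvds.geojson`.

```python
from collections import Counter

# Hypothetical Status values; the typos are invented for illustration.
toy_status = ["Existing", "Existing", "Exisitng", "existing ", "Existing"]

# Tallying distinct values makes stray spellings easy to see.
print(Counter(toy_status))

# Hypothetical mapping from known misspellings to the intended value.
corrections = {"exisitng": "existing"}


def clean(status):
    """Strip whitespace, lowercase, then apply known corrections."""
    normalized = status.strip().lower()
    return corrections.get(normalized, normalized)


cleaned = [clean(s) for s in toy_status]
print(Counter(cleaned))
```

On the real data, `bike_blvds['Status'].value_counts()` gives the same kind of tally directly from the GeoDataFrame.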