# More Data, More Maps!

Now that we know how to pull in data, check and transform Coordinate Reference Systems (CRS), and plot GeoDataFrames together, let's practice doing the same thing with other geometry types. In this notebook we'll be bringing in maps of bike boulevards and schools, which will prime us to think about spatial relationship questions.


<!---
- Expected time to complete
    - Lecture + Questions: 30 minutes
    - Exercises: 20 minutes
-->

In [None]:
import pandas as pd
import geopandas as gpd

import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline  

## Berkeley Bike Boulevards

We're going to bring in data on bike boulevards in Berkeley. Note two things that are different from our previous data:

- We're importing a [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) this time, and not a shapefile.
- We have a **line** geometry GeoDataFrame, while our county and states data had **polygon** geometries.
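
To make the geometry-type distinction concrete, here is a minimal sketch using hypothetical toy geometries (not the workshop data). It assumes `shapely` is importable, which is the case wherever GeoPandas is installed, and uses the `geom_type` attribute to report what kind of geometry each GeoDataFrame holds.

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon

# Hypothetical toy geometries, not the workshop data
lines = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[LineString([(0, 0), (1, 1)]), LineString([(1, 1), (2, 1)])],
    crs="EPSG:4326",
)
polys = gpd.GeoDataFrame(
    {"name": ["square"]},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])],
    crs="EPSG:4326",
)

# geom_type gives one label per row; unique() summarizes the whole layer
line_types = lines.geom_type.unique().tolist()
poly_types = polys.geom_type.unique().tolist()
print(line_types, poly_types)
```

The same check should work on `bike_blvds` once it is loaded, which is a quick way to confirm what kind of layer you are dealing with before plotting.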

In [None]:
bike_blvds = gpd.read_file('../data/transportation/BerkeleyBikeBlvds.geojson')
bike_blvds.plot()

As usual, we'll want to do our typical data exploration...

In [None]:
bike_blvds.head()

In [None]:
bike_blvds.shape

In [None]:
bike_blvds.columns

Our bike boulevard data includes the following information:

- `BB_STRNAM`: Bike boulevard street name
- `BB_STRID`: Bike boulevard street ID
- `BB_FRO`: Bike boulevard origin street
- `BB_TO`: Bike boulevard end street
- `BB_SECID`: Bike boulevard section id
- `DIR_`: Cardinal directions the bike boulevard runs
- `Status`: Status on whether the bike boulevard exists
- `ALT_bikeCA`: Unclear what this column indicates
- `Shape_len`: Length of the boulevard in meters 
- `len_km`: Length of the boulevard in kilometers
- `geometry`: Our standard geometry column for GeoDataFrames

Let's go ahead and check out the CRS that comes with the GeoDataFrame:

In [None]:
bike_blvds.crs

Let's transform our CRS to NAD83 / UTM Zone 10N (EPSG:26910), which we used in the last lesson.

In [None]:
bike_blvds_utm10 = bike_blvds.to_crs("epsg:26910")

In [None]:
bike_blvds_utm10.head()

In [None]:
bike_blvds_utm10.crs
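
If you'd like to see what `to_crs` does to the coordinates themselves, the sketch below reprojects a single hypothetical point (roughly in Berkeley, and not taken from the data) from EPSG:4326 to EPSG:26910. It assumes the starting coordinates are longitude/latitude in degrees.

```python
import geopandas as gpd
from shapely.geometry import Point

# A single hypothetical point near Berkeley, given as (longitude, latitude)
pt_wgs84 = gpd.GeoDataFrame(
    {"name": ["approx. Berkeley"]},
    geometry=[Point(-122.27, 37.87)],
    crs="EPSG:4326",
)

# Reproject from degrees (WGS84) to meters (NAD83 / UTM Zone 10N)
pt_utm10 = pt_wgs84.to_crs("EPSG:26910")

print(pt_wgs84.geometry.iloc[0])  # coordinates in degrees
print(pt_utm10.geometry.iloc[0])  # coordinates in meters
print(pt_utm10.crs.to_epsg())
```

The values change from small numbers in degrees to large numbers in meters, which is why layers need to share a CRS before they can be plotted together.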

---

### Challenge 1

The GeoDataFrame has 211 samples, indicating 211 lines. However, when we plot the GeoDataFrame, we only see 8 bike boulevards. How do we explain the discrepancy?

You may find it helpful to look closely at the GeoDataFrame, or even plot subsets of it. You could even plot the entire GeoDataFrame, and then plot a subset of it on top of the original plot, but with a different color.

---

In [None]:
# YOUR CODE HERE


## Alameda County Schools

Alright! Now that we have our bike boulevard data squared away, we're going to bring in our Alameda County school data.

In [None]:
schools_df = pd.read_csv('../data/alco_schools.csv')
schools_df.head()

In [None]:
schools_df.shape

Wait, does this look right? Always be sure to check the data that you import. Is it what you expect?

This is not a GeoDataFrame! A couple of clues to figure that out are:

1. We're pulling in a Comma Separated Value (CSV) file, which is not a geospatial data format.
2. There is no geometry column (although we do have latitude and longitude values)

Although our school data is not starting off as a GeoDataFrame, we actually have the tools and information to make it one. Using the `gpd.GeoDataFrame` constructor, we can transform our plain DataFrame into a GeoDataFrame (specifying the geometry information and then the CRS).

In [None]:
schools_gdf = gpd.GeoDataFrame(data=schools_df, 
                               geometry=gpd.points_from_xy(schools_df.X, schools_df.Y))

In [None]:
schools_gdf.head()

In [None]:
print(schools_gdf.crs)

In [None]:
# Assign a CRS
schools_gdf.crs = "epsg:4326"
schools_gdf.head()
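
As an alternative sketch, the geometry and the CRS can be supplied in a single call, since the `gpd.GeoDataFrame` constructor accepts a `crs` argument. The table below is hypothetical (the `Site` column and its values are made up) and assumes, as with the schools data, that `X` holds longitude and `Y` holds latitude.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical table with longitude in X and latitude in Y
toy_df = pd.DataFrame({
    "Site": ["School A", "School B"],
    "X": [-122.27, -122.25],
    "Y": [37.87, 37.85],
})

# Build the point geometry and declare the CRS in one constructor call
toy_gdf = gpd.GeoDataFrame(
    toy_df,
    geometry=gpd.points_from_xy(toy_df.X, toy_df.Y),
    crs="EPSG:4326",
)

print(toy_gdf.crs)
print(toy_gdf.shape)
```

Either way, this step only declares which CRS the coordinates are already in; it does not move the points. Reprojecting is a separate step done with `to_crs`.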

You'll notice that the number of rows is the same as in the DataFrame; the only change is the added `geometry` column.

In [None]:
schools_gdf.shape

And with it being a GeoDataFrame, we can plot it as we did for our other data sets.
Notice that we have our first **point** geometry GeoDataFrame.

In [None]:
schools_gdf.plot()

We'll want to transform the CRS, so that we can later plot it with our bike boulevard data:

In [None]:
schools_gdf_utm10 = schools_gdf.to_crs("epsg:26910")
schools_gdf_utm10.plot()

In Lesson 2, we discussed that you can save out GeoDataFrames in multiple file formats. You could opt for a GeoJSON, a shapefile, etc. For point data sets, we have the additional option to save it out as a CSV since the geometry isn't complicated.
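
One possible way to do the CSV export is sketched below, using hypothetical toy points rather than the schools data. It copies the coordinates into plain `X` and `Y` columns, drops the geometry column, and writes the result with pandas. An in-memory buffer stands in for a file path so the example leaves nothing on disk.

```python
import io
import pandas as pd
import geopandas as gpd

# Hypothetical point data standing in for a point GeoDataFrame
pts = gpd.GeoDataFrame(
    {"Site": ["School A", "School B"]},
    geometry=gpd.points_from_xy([-122.27, -122.25], [37.87, 37.85]),
    crs="EPSG:4326",
)

# Keep the attributes, then store the coordinates as plain columns
out_df = pd.DataFrame(pts.drop(columns="geometry"))
out_df["X"] = pts.geometry.x
out_df["Y"] = pts.geometry.y

# Write to an in-memory buffer; a file path would work the same way
buffer = io.StringIO()
out_df.to_csv(buffer, index=False)

# Read it back to confirm the round trip
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```

Keep in mind that a CSV has no place to record the CRS, so you need to note it somewhere else and reassign it when you read the file back in.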

---

### Challenge 2: Even More Data!

Let's play around with another GeoDataFrame with point geometry.

In the code cell provided below, compose code to:

1. Read in the parcel points data (`../data/parcels/parcel_pts_rand30pct.geojson`).
2. Transform the CRS to EPSG:26910.
3. Plot and customize as desired!

---

In [None]:
# YOUR CODE HERE


## Map Overlays with Matplotlib

No matter what geometry type we have for our GeoDataFrame, we can create overlay plots.

Since we've already done the legwork of transforming our CRS, we can go ahead and plot them together.

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
bike_blvds_utm10.plot(ax=ax, color='red')
schools_gdf_utm10.plot(ax=ax)

If we want to answer questions like *"What schools are close to bike boulevards in Berkeley?"*, the above plot isn't super helpful, since the extent covers all of Alameda County.

Luckily, GeoDataFrames have an easy way to extract the minimum and maximum values for both x and y: the `total_bounds` attribute. We can use that information to set the bounds for our plot.

In [None]:
# Use the UTM-projected data so the bounds are in the same units as the plot
x_min, y_min, x_max, y_max = bike_blvds_utm10.total_bounds
print(x_min, y_min, x_max, y_max)
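
To see how `total_bounds` relates to the per-row `bounds`, here is a small sketch with hypothetical line segments in arbitrary planar units (no CRS is set). Bounds are always expressed in the coordinates of the GeoDataFrame they come from, so they should be taken from a layer in the same CRS as the one you intend to plot.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical line segments, in arbitrary planar units
toy_lines = gpd.GeoDataFrame(
    {"name": ["seg 1", "seg 2"]},
    geometry=[LineString([(0, 0), (2, 1)]), LineString([(1, 3), (4, 2)])],
)

# One row of (minx, miny, maxx, maxy) per geometry
print(toy_lines.bounds)

# A single (minx, miny, maxx, maxy) array covering the whole GeoDataFrame
x_lo, y_lo, x_hi, y_hi = toy_lines.total_bounds
print(x_lo, y_lo, x_hi, y_hi)
```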

Using `set_xlim()` and `set_ylim()`, we can zoom in to see if there are schools proximal to the bike boulevards.

In [None]:
# Create figure and axis objects
fig, ax = plt.subplots(figsize=(10, 10))
# Plot geometries
bike_blvds_utm10.plot(ax=ax, color='red')
schools_gdf_utm10.plot(ax=ax)
# Set bounds
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
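
A small variation, sketched here with hypothetical toy layers, is to pad the limits so that features sitting right on the edge of the extent are not cut off. The `pad` value is an arbitrary assumption and should be chosen in the units of your CRS (meters for UTM).

```python
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical layers: two line segments and three points, one far away
toy_lines = gpd.GeoDataFrame(
    geometry=[LineString([(0, 0), (2, 1)]), LineString([(1, 3), (4, 2)])]
)
toy_points = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy([1, 3, 50], [1, 2, 50])
)

# Extent of the line layer, plus a hypothetical margin
x_lo, y_lo, x_hi, y_hi = toy_lines.total_bounds
pad = 0.5

fig, ax = plt.subplots(figsize=(5, 5))
toy_lines.plot(ax=ax, color="red")
toy_points.plot(ax=ax)

# Zoom to the padded extent; the far-away point falls outside the view
ax.set_xlim(x_lo - pad, x_hi + pad)
ax.set_ylim(y_lo - pad, y_hi + pad)

zoom_xlim = ax.get_xlim()
zoom_ylim = ax.get_ylim()
print(zoom_xlim, zoom_ylim)
```

Setting the limits only changes what is displayed. The point outside the extent is still in the GeoDataFrame.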

## Overview

In this lesson, we learned several new skills:
- We transformed an a-spatial DataFrame into a geospatial one.
    - `gpd.GeoDataFrame`
- We worked with point and line GeoDataFrames.
- We overlaid point and line GeoDataFrames.
- We limited the extent of a map.
    - `total_bounds`

---

### Challenge 3: Overlay Mapping

Let's take some time to practice reading in and reconciling new datasets, then mapping them together.

In the code cell provided below, write code to:

1. Import the Berkeley city limits shapefile (`../data/berkeley/BerkeleyCityLimits.shp`), and don't forget to check and transform the CRS!
2. Overlay the parcel points on top of the bike boulevards.
3. Create the same plot but limit it to the extent of Berkeley city limits.

***BONUS***: Add the Berkeley outline to your last plot!

---

In [None]:
# YOUR CODE HERE


## A Teaser for Day 2...

You may be wondering if and how we could make our maps more interesting and informative than what we've seen so far.

To give you a tantalizing taste of Day 2, the answer is: Yes, we can! And here's how!

In [None]:
ax = schools_gdf_utm10.plot(column='Org',
                            cmap='winter', 
                            markersize=35,
                            edgecolor='black',
                            linewidth=0.5,
                            alpha=1,
                            figsize=(9, 9),
                            legend=True)
ax.set_title('Public and Private Schools, Alameda County')