# Lab: Visualizing Geospatial Data with Python and BigQuery
This lab shows you how to create maps from geospatial data using the GeoJSON extension for JupyterLab.

## Creating Maps from GeoJSON Data
[GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) is an open standard format for representing geospatial data, including points, linestrings, and polygons.  A single geospatial object (a geometry plus any descriptive properties) is called a "Feature," and a group of features is stored together in a "FeatureCollection."

If you are familiar with Python dictionary data types, which consist of key/value pairs, you can think of GeoJSON data as a nested dictionary.
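
To make the nested-dictionary idea concrete, here is a minimal sketch that parses a GeoJSON Feature written as a JSON string using Python's standard `json` module, then reads values out of it with ordinary dictionary indexing. The point is the Times Square location used below; the `"properties"` entry is included only for illustration.

```python
import json

# A GeoJSON Feature written as JSON text
feature_text = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [-73.9855, 40.7580]},
  "properties": {"Name": "Times Square"}
}
"""

# json.loads turns the JSON text into nested Python dicts and lists
feature = json.loads(feature_text)

print(feature["type"])                # top-level key: Feature
print(feature["geometry"]["type"])    # nested one level down: Point
print(feature["properties"]["Name"])  # Times Square
```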

Let's start off with a simple GeoJSON object.  We will map the location of Times Square in New York City.

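The coordinates are listed below.  Note that GeoJSON writes each position in `[longitude, latitude]` order, the reverse of the latitude-first order used here: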
 * Latitude: 40.7580
 * Longitude: -73.9855

In [None]:
times_square = {"type":"Feature", "geometry": {"type": "Point", "coordinates": [-73.9855, 40.7580]}}
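
Because GeoJSON (RFC 7946) stores a position as `[longitude, latitude]`, it is easy to swap the two by accident. One quick sanity check, sketched below, is to unpack the coordinates in that order and print them with labels. The `times_square` dictionary is repeated so the snippet runs on its own.

```python
times_square = {"type": "Feature", "geometry": {"type": "Point", "coordinates": [-73.9855, 40.7580]}}

# GeoJSON positions are [longitude, latitude], so unpack them in that order
lon, lat = times_square["geometry"]["coordinates"]
print(f"Latitude: {lat}, Longitude: {lon}")  # prints: Latitude: 40.758, Longitude: -73.9855
```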

Now we can display this point on a map using the GeoJSON extension.

In [None]:
from IPython.display import GeoJSON

GeoJSON(times_square)

We can add properties to our GeoJSON feature to store additional information, such as the landmark's name.

In [None]:
#Add data under the new "properties" key
times_square["properties"] = {"Name": "Times Square", "Latitude": 40.7580, "Longitude": -73.9855}

times_square

Now that we have seen how to map a single point, we can map many points by creating a Feature Collection.

First, let's save a few more New York landmarks as single-point GeoJSON features.

In [None]:
esb = {"type":"Feature", "geometry": {"type": "Point", "coordinates": [-73.9857, 40.7484]}, "properties": {"Name": "Empire State Building", "Latitude": 40.7484, "Longitude": -73.9857}}
columbus_circle = {"type":"Feature", "geometry": {"type": "Point", "coordinates": [-73.9819, 40.7681]}, "properties": {"Name": "Columbus Circle", "Latitude": 40.7681, "Longitude": -73.9819}}
rockefeller_center = {"type":"Feature", "geometry": {"type": "Point", "coordinates": [-73.9787, 40.7587]}, "properties": {"Name": "Rockefeller Center", "Latitude": 40.7587, "Longitude": -73.9787}}
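
Writing each dictionary by hand repeats every coordinate twice and invites typos. A hypothetical helper, `make_landmark` (our own name, not part of any library), could build the same structure from a name, latitude, and longitude. This sketch produces dictionaries equal to the three above.

```python
def make_landmark(name, lat, lon):
    """Build a single-point GeoJSON Feature for a named landmark.

    Takes latitude first (the familiar order) and writes the GeoJSON
    coordinates as [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"Name": name, "Latitude": lat, "Longitude": lon},
    }

esb = make_landmark("Empire State Building", 40.7484, -73.9857)
columbus_circle = make_landmark("Columbus Circle", 40.7681, -73.9819)
rockefeller_center = make_landmark("Rockefeller Center", 40.7587, -73.9787)
```

Keeping the latitude-first argument order in one function means the `[longitude, latitude]` swap only has to be written correctly once.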

Finally, we can incorporate all these data points into a Feature Collection.

In [None]:
nyc_landmarks = {"type":"FeatureCollection", "features": [times_square, esb, columbus_circle, rockefeller_center]}

nyc_landmarks
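
Since a FeatureCollection is just a dictionary whose `"features"` key holds a list of Feature dictionaries, standard list operations work on it. The sketch below uses a two-landmark stand-in for `nyc_landmarks` (so it runs on its own) and collects each feature's name; on the real collection, the same expression would list all four landmarks.

```python
# A small FeatureCollection in the same shape as nyc_landmarks
landmarks = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-73.9855, 40.7580]},
         "properties": {"Name": "Times Square"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-73.9857, 40.7484]},
         "properties": {"Name": "Empire State Building"}},
    ],
}

# "features" is an ordinary Python list, so we can loop over it
names = [f["properties"]["Name"] for f in landmarks["features"]]
print(names)                       # ['Times Square', 'Empire State Building']
print(len(landmarks["features"]))  # 2
```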

Let's see our four points on a map.

In [None]:
GeoJSON(nyc_landmarks)

Let's add another landmark: the Delacorte Theater in Central Park (this is where Shakespeare in the Park is performed every summer).

In [None]:
delacorte = {"type":"Feature", "geometry": {"type": "Point", "coordinates": [-73.9688, 40.7804]}, "properties": {"Name": "Delacorte Theater", "Latitude": 40.7804, "Longitude": -73.9688}}
nyc_landmarks["features"].append(delacorte)

GeoJSON(nyc_landmarks)
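
Because these landmark features store each position twice (in `coordinates` and in the `Latitude`/`Longitude` properties), the two copies can drift apart through a typo. A hypothetical helper such as `coords_match_properties` (our own name, not a library function) can flag mismatches. In this sketch, the `bad` example deliberately has an extra digit typed into its `Longitude` property.

```python
def coords_match_properties(feature, tol=1e-6):
    """Return True if the Latitude/Longitude properties agree with the
    feature's [longitude, latitude] coordinates to within `tol`."""
    lon, lat = feature["geometry"]["coordinates"]
    props = feature["properties"]
    return abs(props["Latitude"] - lat) <= tol and abs(props["Longitude"] - lon) <= tol

good = {"type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-73.9688, 40.7804]},
        "properties": {"Name": "Delacorte Theater", "Latitude": 40.7804, "Longitude": -73.9688}}
# Same point, but with an extra digit typed into the Longitude property
bad = {"type": "Feature",
       "geometry": {"type": "Point", "coordinates": [-73.9688, 40.7804]},
       "properties": {"Name": "Delacorte Theater", "Latitude": 40.7804, "Longitude": -73.96887}}

print(coords_match_properties(good))  # True
print(coords_match_properties(bad))   # False
```

In the notebook, something like `[f["properties"]["Name"] for f in nyc_landmarks["features"] if not coords_match_properties(f)]` would list any landmarks whose two copies disagree.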

**Try It:**  
Now it's your turn to add a new landmark to our map.  Can you add Grand Central Station?  
*Hint:* The latitude is 40.7529 and the longitude is -73.9773.

In [None]:
#Create single point GeoJSON for Grand Central Station
gct = {??}

#Append Grand Central to our existing NYC landmarks GeoJSON
nyc_landmarks["features"].append(??)

#Map it!
GeoJSON(??)

## Mapping `GEOGRAPHY` Data From BigQuery GIS
BigQuery is a serverless data warehouse solution on Google Cloud Platform (GCP) that allows users to interact with their data using standard ANSI SQL.

Using the BigQuery Python client library, you can query BigQuery datasets directly from Python scripts and Jupyter notebooks.

BigQuery GIS stores point, line, and polygon geospatial data in a special `GEOGRAPHY` data type within a BigQuery table.  When you query a `GEOGRAPHY` column, its values are returned as [well-known text](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) (WKT) strings, such as `POINT(-73.9855 40.758)`.
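
To compare the two text formats, the sketch below converts a WKT point string into a GeoJSON geometry dictionary. `wkt_point_to_geojson` is a hypothetical, deliberately minimal helper that only understands 2D `POINT` values; within BigQuery itself, the `ST_ASGEOJSON` function used later in this lab does this conversion for you.

```python
import re

def wkt_point_to_geojson(wkt):
    """Convert a WKT POINT string such as 'POINT(-73.9855 40.758)' into a
    GeoJSON Point geometry dictionary. Handles 2D POINTs only."""
    match = re.fullmatch(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Not a 2D WKT POINT: {wkt!r}")
    # WKT, like GeoJSON, lists longitude (x) before latitude (y)
    lon, lat = float(match.group(1)), float(match.group(2))
    return {"type": "Point", "coordinates": [lon, lat]}

print(wkt_point_to_geojson("POINT(-73.9855 40.758)"))
# {'type': 'Point', 'coordinates': [-73.9855, 40.758]}
```

For lines, polygons, or anything beyond a quick experiment, a full WKT parser such as `shapely.wkt.loads` from the `shapely` library is a better choice.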

To connect to BigQuery from Python, you need to install the BigQuery client library (`google-cloud-bigquery`) and the BigQuery Storage client library (`google-cloud-bigquery-storage`).  You will also need to load the IPython extension that provides the `%%bigquery` "magic" command used throughout this lab.

In addition, you will need the `pandas` and `pyarrow` packages, which are used to return query results as dataframes, if you don't already have them installed.

Note: If you are running this notebook in a GCP environment, these packages should be pre-installed.

In [None]:
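%pip install 'google-cloud-bigquery'
%pip install 'google-cloud-bigquery-storage'
%pip install 'pyarrow'
%pip install 'pandas'
# Needed by google-cloud-bigquery 3.x to convert query results to pandas dataframes
%pip install 'db-dtypes'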

%load_ext google.cloud.bigquery

For this section of the lab, we will leverage the [public datasets](https://cloud.google.com/public-datasets) available from Google within BigQuery.  

Let's start by looking at some US zip code data.  We can directly write ANSI SQL to query BigQuery tables by using the `%%bigquery` [magic command](https://googleapis.dev/python/bigquery/latest/magics.html).

In [None]:
%%bigquery
SELECT zip_code, city, county, state_code, zip_code_geom
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE county = 'Los Angeles County'
LIMIT 10;

The table above shows 10 zip codes in Los Angeles County, California.  Because the query has no `ORDER BY` clause, which 10 rows come back is not guaranteed.  The `zip_code_geom` column uses BigQuery GIS's `GEOGRAPHY` data type and holds each zip code's boundary shape.

Let's pick one specific (rather famous) zip code in Los Angeles County and convert the `zip_code_geom` field into GeoJSON format using the [ST_ASGEOJSON](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_asgeojson) formatter function.

If we put a variable name after the `%%bigquery` magic command (`la_zips` in this case), the query results are automatically saved to a `pandas` dataframe with that name.

In [None]:
%%bigquery la_zips
SELECT zip_code, city, county, state_code, zip_code_geom, ST_ASGEOJSON(zip_code_geom) zip_GeoJSON
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE zip_code = '90210';

Let's look at our `pandas` dataframe.  We have a new field called "zip_GeoJSON" that was created from the `GEOGRAPHY` data.

In [None]:
la_zips.head()

`ST_ASGEOJSON` returns only the "geometry" portion of the full GeoJSON structure (for example, `{"type": "Polygon", "coordinates": [...]}`), and it returns it as a JSON string rather than a Python dictionary.  The GeoJSON extension can map a bare geometry, but it will display a warning.

So instead, let's wrap the geometry in a Feature, the same format we used for our NYC landmarks example above.  The [json](https://docs.python.org/3/library/json.html) package, which encodes and decodes JSON in Python, converts the string into a dictionary with `json.loads`.

In [None]:
import json

la_zip_polygon = {"type":"Feature", "geometry": json.loads(la_zips["zip_GeoJSON"][0])}

GeoJSON(la_zip_polygon)
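
The key step above is that each `zip_GeoJSON` value arrives as a JSON string, which `json.loads` turns into a dictionary that can sit under the Feature's `"geometry"` key. The sketch below repeats that wrapping on a tiny made-up polygon (not the real 90210 boundary) so it runs without BigQuery. It also adds a `"properties"` entry, an optional extra that the lab's own code does not include.

```python
import json

# Made-up stand-in for one value of the zip_GeoJSON column: ST_ASGEOJSON
# returns a geometry object as a JSON *string*, not a Python dict
geometry_text = '{"type": "Polygon", "coordinates": [[[-118.41, 34.10], [-118.40, 34.10], [-118.40, 34.11], [-118.41, 34.10]]]}'

geometry = json.loads(geometry_text)  # str -> dict
feature = {
    "type": "Feature",
    "geometry": geometry,
    "properties": {"zip_code": "90210"},  # optional metadata for the map
}

print(type(geometry_text).__name__, "->", type(geometry).__name__)  # str -> dict
print(feature["geometry"]["type"])                                   # Polygon
```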

Now that we have done this for one zip code, let's build a FeatureCollection of 100 zip codes in Los Angeles County.  Note that this query reuses the name `la_zips`, so it overwrites the single-row dataframe from the previous query.

In [None]:
%%bigquery la_zips --use_bqstorage_api
SELECT zip_code, city, county, state_code, zip_code_geom, ST_ASGEOJSON(zip_code_geom) zip_GeoJSON
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE county = 'Los Angeles County'
ORDER BY zip_code
LIMIT 100;

Let's use a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) to create our "features" list.  A list comprehension is a concise, "Pythonic" way to build a list by looping over every item in a sequence, here each GeoJSON string in the `zip_GeoJSON` column.

In [None]:
la_zip_polygons = [{"type":"Feature", "geometry": json.loads(geom)} for geom in la_zips["zip_GeoJSON"]]
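
For readers more comfortable with loops, the comprehension above is equivalent to appending inside a `for` loop. This sketch uses a short list of made-up geometry strings in place of the `zip_GeoJSON` column so it runs without BigQuery; iterating over a `pandas` Series yields its values in the same way.

```python
import json

# Stand-in for la_zips["zip_GeoJSON"]: a sequence of GeoJSON geometry strings
geometry_strings = [
    '{"type": "Point", "coordinates": [-118.40, 34.09]}',
    '{"type": "Point", "coordinates": [-118.24, 34.05]}',
]

# List comprehension version
features_comprehension = [{"type": "Feature", "geometry": json.loads(g)} for g in geometry_strings]

# Equivalent explicit loop
features_loop = []
for g in geometry_strings:
    features_loop.append({"type": "Feature", "geometry": json.loads(g)})

print(features_comprehension == features_loop)  # True
```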

We can now build our final feature collection of zip code polygons and map it.

In [None]:
la_feature_collection = {"type":"FeatureCollection", "features": la_zip_polygons}

GeoJSON(la_feature_collection)
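
The features above carry geometry only. If you also want each shape to keep its zip code, one option (a variation on the lab's code, not required for the exercise) is to pair two columns with Python's built-in `zip`. The sketch below uses made-up stand-in lists with point geometries; with the real dataframe you would pass `la_zips["zip_code"]` and `la_zips["zip_GeoJSON"]` instead.

```python
import json

# Made-up stand-ins for two dataframe columns, e.g. la_zips["zip_code"]
# and la_zips["zip_GeoJSON"]; the geometries are not real boundaries
zip_codes = ["90001", "90002"]
geometry_strings = [
    '{"type": "Point", "coordinates": [-118.25, 33.97]}',
    '{"type": "Point", "coordinates": [-118.25, 33.95]}',
]

# zip() walks the two columns in step, pairing each zip code with its geometry
features = [
    {"type": "Feature", "geometry": json.loads(geom), "properties": {"zip_code": code}}
    for code, geom in zip(zip_codes, geometry_strings)
]

feature_collection = {"type": "FeatureCollection", "features": features}
print([f["properties"]["zip_code"] for f in feature_collection["features"]])
# ['90001', '90002']
```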

**Try It:**  
Using the same public zip code dataset, can you create a map of the zip codes in King County, Washington (the Seattle area)?

In [None]:
%%bigquery seattle_zips --use_bqstorage_api
#Query to get zip codes for Seattle area (King County, WA)
SELECT zip_code, city, county, state_code, zip_code_geom, ST_ASGEOJSON(??) zip_GeoJSON
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE county = 'King County' and state_code = 'WA'
ORDER BY zip_code;

In [None]:
#Create array of zip code features
#Note: You can use a loop here to append each polygon if you are not comfortable with list comprehensions
seattle_zip_polygons = [??]

#Create feature collection
seattle_feature_collection = {??}

#Map it!
GeoJSON(??)

## Explore Further
There are many other ways to represent and visualize geospatial data within Python.  Some other packages for you to check out if you are interested in learning more about map visualizations are:  
 * [geopandas](https://geopandas.org/): This package extends `pandas` dataframes to include a "geometry" object, very similar to the `GEOGRAPHY` data type seen in BigQuery GIS.
 * [Plotly](https://plotly.com/python/maps/): A popular visualization library with built-in map chart types.
 * [Bokeh](https://docs.bokeh.org/en/latest/docs/user_guide/geo.html): Another popular visualization library that allows you to create interactive maps.