# Lab: Spatial Measurements and Spatial Transformations with BigQuery GIS and Python
This lab will teach you how to perform spatial measurements and spatial transformations in BigQuery GIS using Python and Jupyter notebooks.

BigQuery is a serverless data warehouse solution on Google Cloud Platform (GCP) that allows users to interact with their data using standard ANSI SQL.

With the BigQuery Python client library, you can query BigQuery datasets directly from Python code or Jupyter notebooks.

BigQuery GIS lets you store point, line, or polygon geospatial data in a BigQuery table column of the special `GEOGRAPHY` data type. When you query a `GEOGRAPHY` column, its values are displayed in [well-known text](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) (WKT) format.
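WKT writes a point as `POINT(longitude latitude)`, with longitude (the x coordinate) first. You may need the raw numbers in Python, for example after pulling query results into a DataFrame. The minimal sketch below extracts them from a point string. It uses only the standard-library `re` module and handles only 2D `POINT` values. The helper name `parse_wkt_point` is hypothetical. For full WKT support (polygons, multipoints, and so on), a dedicated library such as Shapely is the usual choice.

```python
import re

# Matches a 2D WKT point such as "POINT(-73.9857 40.7484)".
# This sketch handles only POINT; LINESTRING, POLYGON, MULTIPOINT, etc. are rejected.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(?P<lon>[-+]?\d+(?:\.\d+)?)\s+(?P<lat>[-+]?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> tuple[float, float]:
    """Return (longitude, latitude) from a WKT POINT string.

    WKT lists the x coordinate first, which for geographic data is longitude.
    """
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"Not a 2D WKT POINT: {wkt!r}")
    return float(match.group("lon")), float(match.group("lat"))


lon, lat = parse_wkt_point("POINT(-73.9857 40.7484)")
print(f"longitude={lon}, latitude={lat}")
```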

## Getting Started
To connect to BigQuery from Python, you need to install the BigQuery Python client library (`google-cloud-bigquery`) and the BigQuery Storage API client library (`google-cloud-bigquery-storage`), which speeds up downloading query results. You will also need to load the `%%bigquery` "magic" command, which we use to query BigQuery throughout this lab.

In addition, you will need the `pandas` and `pyarrow` packages if they are not already installed. The client library uses them to return query results as DataFrames.

Note: If you are running this notebook in a GCP environment, these packages should be pre-installed.

In [None]:
%pip install 'google-cloud-bigquery'
%pip install 'google-cloud-bigquery-storage'
%pip install 'pyarrow'
%pip install 'pandas'

%load_ext google.cloud.bigquery

## Calculating Distance Between Two Geospatial Points
Spatial measurement functions compute quantities such as distance and area from BigQuery's special `GEOGRAPHY` data type.

One of these functions is [ST_DISTANCE](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_distance), which returns the shortest distance between two `GEOGRAPHY` values, such as two geospatial points.

For this example, we will calculate the distance between two New York City landmarks: The Empire State Building and the Statue of Liberty.

First, let's use the [ST_GEOGPOINT](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_geogpoint) spatial constructor function to create a `GEOGRAPHY` value for each location from its coordinates. Note that `ST_GEOGPOINT` takes longitude as its first argument and latitude as its second.

We can directly write ANSI SQL to query BigQuery tables by using the `%%bigquery` [magic command](https://googleapis.dev/python/bigquery/latest/magics.html).

In [1]:
%%bigquery
SELECT 'Empire State Building' AS name, ST_GEOGPOINT(-73.9857, 40.7484) AS location_geom;

| | name | location_geom |
|---|---|---|
| 0 | Empire State Building | POINT(-73.9857 40.7484) |


In [2]:
%%bigquery
SELECT 'Statue of Liberty' AS name, ST_GEOGPOINT(-74.0445, 40.6892) AS location_geom;

| | name | location_geom |
|---|---|---|
| 0 | Statue of Liberty | POINT(-74.0445 40.6892) |


Next, we can calculate the distance using the `ST_DISTANCE` function.

BigQuery GIS measurement functions return distances in meters, and the area function (`ST_AREA`) returns square meters.

In [3]:
%%bigquery
SELECT ST_DISTANCE(
  ST_GEOGPOINT(-73.9857, 40.7484),  # Empire State Building GEOGRAPHY
  ST_GEOGPOINT(-74.0445, 40.6892)   # Statue of Liberty GEOGRAPHY
) distance_value;

| | distance_value |
|---|---|
| 0 | 8239.506227 |


The distance between these two landmarks is approximately 8,240 meters, or about 8.24 kilometers.
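The haversine formula computes the great-circle distance between two points on a sphere, so it can serve as a rough cross-check outside BigQuery. This sketch assumes a mean Earth radius of 6,371,008.8 m. BigQuery's internal Earth model may use a different value, so expect a result close to, but not necessarily identical to, the `ST_DISTANCE` value. The function name `haversine_m` is hypothetical.

```python
import math

# Assumed mean Earth radius in meters. BigQuery's own spherical model may use a
# slightly different value, so small differences from ST_DISTANCE are expected.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Same coordinates as the ST_DISTANCE query, in the same (longitude, latitude) order.
empire_state = (-73.9857, 40.7484)
statue_of_liberty = (-74.0445, 40.6892)
d = haversine_m(*empire_state, *statue_of_liberty)
print(f"{d:.1f} m ({d / 1000:.2f} km)")
```

Keeping the arguments in longitude-then-latitude order matches `ST_GEOGPOINT`, which makes it harder to swap coordinates by accident when comparing the two results.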

**Try It:**  
Can you calculate the distance between the Empire State Building and Rockefeller Center?  
*Hint:* The coordinates for Rockefeller Center are: latitude = 40.7587, longitude = -73.9787.

In [None]:
%%bigquery
#Calculate the distance between the Empire State Building and Rockefeller Center
SELECT ST_DISTANCE(
  ??, #Empire State Building GEOGRAPHY
  ?? #Rockefeller Center GEOGRAPHY
) distance_value;

## Calculating Area of a Geospatial Polygon
For this next example, we will leverage the [public datasets](https://cloud.google.com/public-datasets) available from Google within BigQuery.

We can use the "places_georgia" table from the "geo_us_census_places" dataset to calculate the area of Atlanta, Georgia using the [ST_AREA](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_area) spatial measurement function.

In [4]:
%%bigquery
SELECT place_id, place_name, place_geom, ST_AREA(place_geom)/1000000 area_value
FROM `bigquery-public-data.geo_us_census_places.places_georgia`
WHERE place_name = 'Atlanta';

| | place_id | place_name | place_geom | area_value |
|---|---|---|---|---|
| 0 | 1304000 | Atlanta | MULTIPOLYGON(((-84.428549 33.657486, -84.42854... | 353.968153 |


Atlanta is about 354 square kilometers in area.

Note that the query divides the `ST_AREA` result by 1,000,000 to convert it from square meters to square kilometers (1 km² = 1,000,000 m²).
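`ST_AREA` returns square meters. A kilometer is 1,000 meters, so 1 km² equals 1,000,000 m², and converting to square kilometers means dividing by 1,000,000 rather than 1,000. The small sketch below applies this arithmetic to an area of about 353,968,153 m², which is the raw `ST_AREA` value behind Atlanta's query result above. It also converts to square miles using the international mile of 1,609.344 m. The helper names are hypothetical.

```python
# Conversion factors from square meters (the unit ST_AREA returns).
SQ_METERS_PER_SQ_KM = 1_000 * 1_000        # 1 km = 1,000 m, so 1 km^2 = 1,000,000 m^2
SQ_METERS_PER_SQ_MILE = 1_609.344 ** 2      # international mile = 1,609.344 m


def sq_m_to_sq_km(area_m2: float) -> float:
    """Convert square meters to square kilometers."""
    return area_m2 / SQ_METERS_PER_SQ_KM


def sq_m_to_sq_mi(area_m2: float) -> float:
    """Convert square meters to square miles."""
    return area_m2 / SQ_METERS_PER_SQ_MILE


# Atlanta's area in square meters, before any unit conversion.
atlanta_m2 = 353_968_153
print(f"{sq_m_to_sq_km(atlanta_m2):.1f} square kilometers")
print(f"{sq_m_to_sq_mi(atlanta_m2):.1f} square miles")
```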

**Try It:**  
Can you calculate the area of Plymouth, Massachusetts?  
*Hint:* Use the "places_massachusetts" table.

In [None]:
%%bigquery
SELECT place_id, place_name, place_geom, ST_AREA(??)/1000000 area_value
FROM ??
WHERE place_name = 'Plymouth';

## Calculating the Centroid of a Geospatial Polygon
Spatial transformations generate new `GEOGRAPHY` data based on one or more `GEOGRAPHY` inputs.

One useful spatial transformation function is [ST_CENTROID](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_centroid), which returns the geographic centroid of a `GEOGRAPHY` value, such as a polygon, as a point.
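The sketch below illustrates the idea of an area-weighted polygon centroid. It uses the planar "shoelace" formula, which treats longitude/latitude as flat x/y coordinates. It is an approximation and not what `ST_CENTROID` computes on BigQuery's spherical Earth model. For small polygons the difference is usually small, but do not expect identical results. The function name and the sample rectangle are hypothetical.

```python
def planar_polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Centroid of a simple polygon ring using the planar shoelace formula.

    `ring` is a list of (x, y) vertices; a repeated closing vertex is allowed.
    Coordinates are treated as flat x/y values, unlike BigQuery's spherical model.
    """
    if len(ring) >= 2 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the repeated closing vertex, as in WKT rings
    if len(ring) < 3:
        raise ValueError("A polygon ring needs at least three distinct vertices")
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("Degenerate polygon with zero area")
    # Centroid = sum / (6 * A), and area2 = 2 * A, so divide by 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical small rectangle given as (longitude, latitude) pairs.
ring = [(-82.60, 27.40), (-82.50, 27.40), (-82.50, 27.50), (-82.60, 27.50), (-82.60, 27.40)]
print(planar_polygon_centroid(ring))
```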

For this example, we will again use a dataset from Google's public datasets: the `zip_codes` table in the `geo_us_boundaries` dataset.

The `zip_code_geom` field is a `GEOGRAPHY` polygon for each zip code in the United States.

We will generate the centroid points for all zip codes in Manatee County, Florida.

In [5]:
%%bigquery
SELECT zip_code, city, county, state_code, ST_CENTROID(zip_code_geom) zip_code_centroid
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE state_code = 'FL' and county = 'Manatee County';

| | zip_code | city | county | state_code | zip_code_centroid |
|---|---|---|---|---|---|
| 0 | 34217 | Holmes Beach city, Bradenton Beach city | Manatee County | FL | POINT(-82.706100539919 27.493641341139) |
| 1 | 34207 | South Bradenton CDP, Bayshore Gardens CDP | Manatee County | FL | POINT(-82.5801888849448 27.4375556191155) |
| 2 | 34205 | South Bradenton CDP, Bradenton city, West Brad... | Manatee County | FL | POINT(-82.5833772249596 27.484120535388) |
| 3 | 34203 | Bayshore Gardens CDP, West Samoset CDP | Manatee County | FL | POINT(-82.5093368898136 27.4435725319891) |
| 4 | 34221 | Palmetto city, Ellenton CDP, Memphis CDP | Manatee County | FL | POINT(-82.5535753726386 27.5811887778284) |
| 5 | 34215 | Cortez CDP, Bradenton Beach city | Manatee County | FL | POINT(-82.6849967120187 27.4717673438458) |
| 6 | 34209 | Bradenton city, West Bradenton CDP | Manatee County | FL | POINT(-82.6483297532489 27.4983598416737) |
| 7 | 34222 | Palmetto city, Ellenton CDP | Manatee County | FL | POINT(-82.5050428454843 27.5362333385473) |
| 8 | 34216 | Anna Maria city | Manatee County | FL | POINT(-82.7370504186488 27.5376107443427) |
| 9 | 34208 | South Bradenton CDP, Bradenton city, West Samo... | Manatee County | FL | POINT(-82.5131228435458 27.4871432554532) |


**Try It:**  
Can you calculate the zip code centroids for Galveston County, Texas?

In [None]:
%%bigquery
SELECT ??
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE ??;

## Creating Combined `GEOGRAPHY` Data
The next example uses the [ST_UNION_AGG](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_union_agg) aggregate function, which merges all of the `GEOGRAPHY` values in each group into a single `GEOGRAPHY`. Applied to many individual points, it produces a multipoint `GEOGRAPHY` for each group.

For this example, we will use the 2015 New York City tree census data from Google's public datasets to group living trees in Manhattan by neighborhood and ZIP code.

In [6]:
%%bigquery
SELECT zipcode, nta_name neighborhood, boroname borough, ST_UNION_AGG(ST_GEOGPOINT(longitude,latitude)) tree_points
FROM `bigquery-public-data.new_york_trees.tree_census_2015`
WHERE status = 'Alive' and boroname = 'Manhattan'
GROUP BY zipcode, nta_name, boroname;

| | zipcode | neighborhood | borough | tree_points |
|---|---|---|---|---|
| 0 | 10034 | Marble Hill-Inwood | Manhattan | MULTIPOINT(-73.91238006 40.87039025, -73.91211... |
| 1 | 10040 | Marble Hill-Inwood | Manhattan | MULTIPOINT(-73.92426232 40.86062745, -73.92431... |
| 2 | 10039 | Central Harlem North-Polo Grounds | Manhattan | MULTIPOINT(-73.93553748 40.83492454, -73.93554... |
| 3 | 10037 | Central Harlem North-Polo Grounds | Manhattan | MULTIPOINT(-73.9357097 40.81856501, -73.935421... |
| 4 | 10030 | Central Harlem North-Polo Grounds | Manhattan | MULTIPOINT(-73.94806563 40.81549696, -73.94790... |
| ... | ... | ... | ... | ... |
| 114 | 10065 | Upper East Side-Carnegie Hill | Manhattan | MULTIPOINT(-73.96608259 40.76243573, -73.96522... |
| 115 | 10075 | Upper East Side-Carnegie Hill | Manhattan | MULTIPOINT(-73.96434922 40.77599814, -73.96428... |
| 116 | 10022 | Upper East Side-Carnegie Hill | Manhattan | MULTIPOINT(-73.97028783 40.7631652, -73.970218... |
| 117 | 10010 | Stuyvesant Town-Cooper Village | Manhattan | MULTIPOINT(-73.97865039 40.7365079, -73.978637... |
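The query combines `GROUP BY` with a union of each group's points. The sketch below mimics that idea in plain Python on a few made-up tree records. It groups points by (ZIP code, neighborhood) and collapses duplicate coordinates, as a set union would. It then writes each group as a WKT string. It is only an illustration, not BigQuery's implementation. Sorting the points is a choice of this sketch for deterministic output, as is returning a plain `POINT` for a group with one distinct location. Neither is a guarantee about how BigQuery orders or types its output. All names and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical tree records: (zipcode, neighborhood, longitude, latitude).
trees = [
    ("10034", "Marble Hill-Inwood", -73.9124, 40.8704),
    ("10034", "Marble Hill-Inwood", -73.9121, 40.8701),
    ("10034", "Marble Hill-Inwood", -73.9124, 40.8704),  # duplicate location
    ("10040", "Marble Hill-Inwood", -73.9243, 40.8606),
]


def union_points_by_group(records):
    """Group points by (zipcode, neighborhood) and return one WKT string per group.

    Duplicate coordinates collapse into a single point (set union). A group with
    one distinct point is written as POINT; larger groups as MULTIPOINT.
    """
    groups = defaultdict(set)
    for zipcode, neighborhood, lon, lat in records:
        groups[(zipcode, neighborhood)].add((lon, lat))
    result = {}
    for key, points in groups.items():
        coords = [f"{lon} {lat}" for lon, lat in sorted(points)]
        if len(coords) == 1:
            result[key] = f"POINT({coords[0]})"
        else:
            result[key] = "MULTIPOINT(" + ", ".join(coords) + ")"
    return result


for key, wkt in union_points_by_group(trees).items():
    print(key, wkt)
```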


**Try It:**  
Can you create multipoint `GEOGRAPHY` values for each neighborhood/zip code in the Bronx?  
*Hint:* Set the `boroname` filter to `'Bronx'`.

In [None]:
%%bigquery
SELECT ??
FROM `bigquery-public-data.new_york_trees.tree_census_2015`
WHERE ??
GROUP BY zipcode, nta_name, boroname;

## Explore Further
BigQuery GIS offers many more spatial measurement and spatial transformation functions, as well as other categories of geography functions. For details, see the [BigQuery geography functions reference](https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions).