# Geospatial Analysis

Practice some basic exploration of geospatial data. Look for `<YOUR_TURN>` markers (and `# Your Code below this comment` comments) where you should edit the Python code.

In this practice, we will use the NYC taxi-cab ride data.
Specifically, these two tables:
  * `taxi_zones`
  * `trips`

```SQL
nyc-taxi-data=# \d taxi_zones
                                      Table "public.taxi_zones"
   Column   |            Type             |                        Modifiers                         
------------+-----------------------------+----------------------------------------------------------
 gid        | integer                     | not null default nextval('taxi_zones_gid_seq'::regclass)
 objectid   | integer                     | 
 shape_leng | numeric                     | 
 shape_area | numeric                     | 
 zone       | character varying(254)      | 
 locationid | smallint                    | 
 borough    | character varying(254)      | 
 geom       | geometry(MultiPolygon,4326) | 
Indexes:
    "taxi_zones_pkey" PRIMARY KEY, btree (gid)
    "index_taxi_zones_on_geom" gist (geom)
    "index_taxi_zones_on_locationid" btree (locationid)

nyc-taxi-data=# \d trips
                                           Table "public.trips"
        Column         |            Type             |                     Modifiers                      
-----------------------+-----------------------------+----------------------------------------------------
 id                    | integer                     | not null default nextval('trips_id_seq'::regclass)
 cab_type_id           | integer                     | 
 vendor_id             | character varying           | 
 pickup_datetime       | timestamp without time zone | 
 dropoff_datetime      | timestamp without time zone | 
 store_and_fwd_flag    | character(1)                | 
 rate_code_id          | integer                     | 
 pickup_longitude      | numeric                     | 
 pickup_latitude       | numeric                     | 
 dropoff_longitude     | numeric                     | 
 dropoff_latitude      | numeric                     | 
 passenger_count       | integer                     | 
 trip_distance         | numeric                     | 
 fare_amount           | numeric                     | 
 extra                 | numeric                     | 
 mta_tax               | numeric                     | 
 tip_amount            | numeric                     | 
 tolls_amount          | numeric                     | 
 ehail_fee             | numeric                     | 
 improvement_surcharge | numeric                     | 
 total_amount          | numeric                     | 
 payment_type          | character varying           | 
 trip_type             | integer                     | 
 pickup_nyct2010_gid   | integer                     | 
 dropoff_nyct2010_gid  | integer                     | 
 pickup                | geometry(Point,4326)        | 
 dropoff               | geometry(Point,4326)        | 
Indexes:
    "trips_pkey" PRIMARY KEY, btree (id)
```
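In `trips`, the numeric `pickup_longitude`/`pickup_latitude` columns and the `pickup` geometry describe the same location. In SRID 4326 a point's x coordinate is the longitude and its y coordinate is the latitude, an order that is easy to swap by accident. The sketch below uses a hypothetical helper, `to_ewkt`, to build an EWKT literal in that order. PostGIS can parse such a literal with `ST_GeomFromEWKT`, and `ST_SetSRID(ST_MakePoint(lon, lat), 4326)` builds the same point inside SQL. Conversely, `ST_X(pickup)` returns the longitude.

```python
def to_ewkt(lon: float, lat: float, srid: int = 4326) -> str:
    """Return an EWKT point literal (hypothetical helper): x = longitude, y = latitude."""
    # Basic range check only. It will not catch every swapped pair:
    # for NYC, -73.98 is also a valid latitude value.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"not a lon/lat pair: ({lon}, {lat})")
    return f"SRID={srid};POINT({lon} {lat})"


# Example: a point in midtown Manhattan, longitude first.
print(to_ewkt(-73.98, 40.75))  # SRID=4326;POINT(-73.98 40.75)
```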


In [None]:
import matplotlib.pyplot as plt
import geopandas as gpd
import psycopg2

# Connect to the read-only taxi database.
con = psycopg2.connect(database="nyc-taxi-data", user="dsa_ro_user",
                       password="readonly", host="dbase")

# Pull the zone, borough and polygon of each taxi zone
# from the taxi_zones table.
sql = "SELECT zone, borough, geom FROM taxi_zones"

zones = gpd.GeoDataFrame.from_postgis(sql, con, geom_col="geom")

# Edit this line to pull 1000 taxi trips from the database;
# the WHERE clause ensures every trip has a pickup location.
sql = "SELECT * FROM trips WHERE pickup IS NOT NULL LIMIT 1000"

# Use "pickup" as the GeoDataFrame's spatial (geometry) column.
pickups = gpd.GeoDataFrame.from_postgis(sql, con, geom_col="pickup")
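If you want to vary the number of trips or switch to the `dropoff` geometry, one option is to pass the limit as a query parameter instead of editing the SQL string. The sketch below uses a hypothetical helper, `trips_query`. Column names cannot be bound as parameters, so the geometry column is checked against a whitelist. `%s` is psycopg2's placeholder style.

```python
def trips_query(limit: int = 1000, geom_col: str = "pickup"):
    """Build a parameterized query for trips with a non-NULL geometry (hypothetical helper)."""
    # Column names cannot be sent as parameters, so restrict them to known columns.
    if geom_col not in {"pickup", "dropoff"}:
        raise ValueError(f"unexpected geometry column: {geom_col!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # The LIMIT value is passed separately from the SQL text via the %s placeholder.
    sql = f"SELECT * FROM trips WHERE {geom_col} IS NOT NULL LIMIT %s"
    return sql, (limit,)


sql, params = trips_query(1000, "pickup")
print(sql)     # SELECT * FROM trips WHERE pickup IS NOT NULL LIMIT %s
print(params)  # (1000,)
```

With the notebook's connection, the result could be loaded with `gpd.GeoDataFrame.from_postgis(sql, con, geom_col="pickup", params=params)`. Note that without an `ORDER BY`, which 1000 rows come back is not specified.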


In [None]:
zones.describe()

In [None]:
pickups.describe()
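`describe()` summarizes the attribute columns. For the geometry, a quick sanity check is the bounding box of the points. GeoPandas exposes this as `pickups.total_bounds` (`[minx, miny, maxx, maxy]`), and `pickups.geometry.x` / `pickups.geometry.y` give the coordinates. The standard-library sketch below computes the same bounds from `(lon, lat)` pairs and lists points outside a rough NYC envelope. `NYC_BBOX` is an approximate, assumed value and does not come from the data.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]

# Rough lon/lat envelope around New York City (EPSG:4326).
# Approximate, assumed values for illustration only.
NYC_BBOX = (-74.26, 40.49, -73.70, 40.92)


def bounds(points: Iterable[Point]) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy), the same order as GeoPandas' total_bounds."""
    pts = list(points)
    if not pts:
        raise ValueError("no points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def outside(points: Iterable[Point], bbox=NYC_BBOX) -> List[Point]:
    """List the points whose lon/lat fall outside the bounding box."""
    minx, miny, maxx, maxy = bbox
    return [(x, y) for x, y in points if not (minx <= x <= maxx and miny <= y <= maxy)]


# Two plausible pickups and one at (0, 0), which is far outside NYC.
sample = [(-73.98, 40.75), (-73.78, 40.64), (0.0, 0.0)]
print(bounds(sample))   # (-73.98, 0.0, 0.0, 40.75)
print(outside(sample))  # [(0.0, 0.0)]
```

With the loaded data, the same check could be run as `outside(zip(pickups.geometry.x, pickups.geometry.y))`.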

## Plotting 

Produce a geospatial (map) plot of the data as described in each cell.

A map can also overlay several layers of information on the same axes.

Instead of using a scatter plot, use GeoPandas to overlay the taxi pickup points on the taxi-zone base map.

Read more about mapping data with GeoPandas here: http://geopandas.org/mapping.html

In [None]:
%matplotlib inline
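# Build a base map: white zones with black outlines.
# Without an explicit edgecolor, white zones are invisible on a white background.
base = zones.plot(figsize=(15, 15), color="white", edgecolor="black")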

# Your Code below this comment
# -----------------------------

pickups.plot(ax=base, marker='o', color='red', markersize=5)




## Plotting with Aggregation

In the next practice, pull the taxi zones from the database.
Show all the taxi zones in the first cell, then show the map with the zones merged into boroughs in the second cell.

In [None]:
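sql = "SELECT borough, geom FROM taxi_zones"


# Your Code below this comment
# -----------------------------

# Load the zones with the query above, then plot every taxi zone.
zones = gpd.GeoDataFrame.from_postgis(sql, con, geom_col="geom")
zones.plot(figsize=(15, 15))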




In [None]:
# Your Code below this comment
# -----------------------------

sql2 = "select borough, ST_Union(geom) AS geom1 from taxi_zones GROUP BY borough"

zones2 = gpd.GeoDataFrame.from_postgis(sql2,con,geom_col='geom1' )

zones2.plot(figsize=(15,15))
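The same borough merge can also be done client-side. GeoPandas' `GeoDataFrame.dissolve(by=...)` unions the geometries within each group, which mirrors `ST_Union(geom) ... GROUP BY borough` in the cell above. This is a swapped-in alternative rather than the notebook's method, shown on hypothetical toy squares; it assumes `geopandas` and `shapely` are installed.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy zones: two adjacent unit squares in borough "A", one square in "B".
toy_zones = gpd.GeoDataFrame(
    {"zone": ["a1", "a2", "b1"], "borough": ["A", "A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
)

# dissolve() unions the geometries in each group; the group key becomes the index.
boroughs = toy_zones.dissolve(by="borough")

print(list(boroughs.index))                # ['A', 'B']
print(boroughs.loc["A", "geometry"].area)  # 2.0
```

On the real data this would be `zones.dissolve(by="borough").plot(figsize=(15, 15))`. The SQL version does the union in the database and transfers only one polygon per borough, while `dissolve` needs every zone polygon loaded first.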

## BONUS/Challenge Problem for students who have taken DB/SQL and Python.

Collect each borough and the **number of taxi rides that originate within that borough**.

Plot a choropleth that uses the number of taxi rides originating in each borough as the statistic.


In [None]:
# Your Code below this comment
# -----------------------------


%matplotlib inline

# Merge the zones into one polygon per borough, then count how many trips in a
# 10,000-trip sample (with a pickup location) start inside each borough polygon.
sql2 = """
WITH boroughs AS (
    SELECT borough, ST_Union(geom) AS geom1
    FROM taxi_zones
    GROUP BY borough
)
SELECT boroughs.borough, boroughs.geom1, count(trips.id) AS count
FROM boroughs,
     (SELECT id, pickup FROM trips WHERE pickup IS NOT NULL LIMIT 10000) AS trips
WHERE ST_Intersects(boroughs.geom1, trips.pickup)
GROUP BY boroughs.borough, boroughs.geom1
"""

zones3 = gpd.GeoDataFrame.from_postgis(sql2, con, geom_col="geom1")

# Choropleth: color each borough by its pickup count.
# (k only takes effect together with a classification scheme, so it is omitted.)
zones3.plot(column="count", legend=True, figsize=(15, 15))
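To see what `ST_Intersects(...)` combined with `GROUP BY` and `count(...)` computes, here is a pure-Python sketch on hypothetical toy boroughs. Each borough is a single polygon ring, each pickup is an `(x, y)` point, and even-odd ray casting decides whether a point is inside. The simplifications are assumptions: there are no holes or multipolygons, and a point exactly on an edge may be classified differently than by `ST_Intersects`, which treats boundary points as intersecting. The nested loop is for intuition only.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Even-odd ray casting: count edges crossed by a ray from (x, y) toward +x."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # The edge straddles the horizontal line through y.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def count_pickups(boroughs: Dict[str, List[Point]], pickups: List[Point]) -> Counter:
    """Count pickups per borough, like ST_Intersects + GROUP BY + count()."""
    counts: Counter = Counter()
    for name, ring in boroughs.items():
        for x, y in pickups:
            if point_in_polygon(x, y, ring):
                counts[name] += 1
    return counts


# Hypothetical toy data: two square "boroughs" and four pickups (one outside both).
toy_boroughs = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(3, 0), (5, 0), (5, 2), (3, 2)],
}
toy_pickups = [(1, 1), (0.5, 1.5), (4, 1), (10, 10)]

counts = count_pickups(toy_boroughs, toy_pickups)
print(dict(counts))  # {'A': 2, 'B': 1}
```

As in the SQL, a borough with no pickups in the sample does not appear in the result at all.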





# SAVE YOUR NOTEBOOK, then File > "Close and Halt"