# Long-Term Analyses with the Data Science Package

One capability of the Data Science Package is running long-term analyses with only a few queries, which can help you determine the effect of a policy or environmental change. In this example, we check whether stop times at Pearson Airport became shorter after a new policy around unloading. The query also shows how to use BigQuery's GIS functions for spatial analysis.

The easiest way to separate stops in an area of interest from stops elsewhere is to use shapefiles, which serve the same purpose as zones in MyGeotab. BigQuery has functions for spatial analysis, and we use them here. For simplicity's sake, we compare stop times from one month of trips data to another; the query below covers the previous calendar month, and shifting its date intervals selects the month to compare against. The function ST_Contains keeps only the stops that happened inside the region, which is defined as a polygon of longitude/latitude pairs.
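To build intuition for what ST_Contains checks, the containment test can be approximated locally. The sketch below is a plain-Python ray-casting test against the Pearson polygon from the query in this example; the function name `point_in_polygon` and the two sample points are illustrative assumptions, not part of the Data Science Package. It treats coordinates as planar, whereas BigQuery geography functions use geodesic edges on a sphere, so results may differ slightly for points very close to the boundary.

```python
# Ring of (longitude, latitude) pairs, copied from the WKT polygon in the query.
PEARSON_RING = [
    (-79.63160991668701, 43.696176153756106),
    (-79.63703870773315, 43.69242160434331),
    (-79.61158990859985, 43.67539906674),
    (-79.59914445877075, 43.68387214592166),
    (-79.63160991668701, 43.696176153756106),
]


def point_in_polygon(lon, lat, ring):
    """Planar ray-casting test: True if (lon, lat) lies inside the ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical sample points: one near the middle of the polygon, one far east of it.
stop_in_airport = point_in_polygon(-79.62, 43.687, PEARSON_RING)
stop_elsewhere = point_in_polygon(-79.38, 43.65, PEARSON_RING)
print(stop_in_airport, stop_elsewhere)
```

A local check like this is mainly useful for sanity-testing a polygon before it goes into a query; the filtering itself is still best left to BigQuery.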

In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import IFrame
import folium
import json
import csv



In [0]:
projectname='YOUR PROJECT NAME HERE'

In [0]:
queryGIS="""
Select Day, Month, AvgStopDuration, No_Records
  From
  (Select Extract(Date from StopTime) As Day, Extract(Month from StopTime) AS Month, Avg(StopDuration) as AvgStopDuration, Count(Distinct(SerialNo)) As No_Records
    From(
        Select SerialNo, TripId, Starttime, StopTime, StopLatitude, StopLongitude, St_GeogPoint(stopLongitude, Stoplatitude) as G, DrivingDuration, StopDuration 
          From `geotab-dsp-$name.Interpolated.Trips` 
         Where _PartitionTime between Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Month))
                and Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Day))
                and StartTime between Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Month))
                and Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Day))
        )
   Where ST_Contains(ST_GEOGFROMTEXT('Polygon(( -79.63160991668701 43.696176153756106, -79.63703870773315 43.69242160434331, -79.61158990859985 43.67539906674, -79.59914445877075 43.68387214592166, -79.63160991668701 43.696176153756106 ))'), G)
   Group by Extract(Date From StopTime), Extract(Month from StopTime))
 Where No_Records>20
  """

In [0]:
df_stop=pd.read_gbq(queryGIS.replace('$name', projectname), project_id='geotab-dsp-'+projectname, dialect='standard')

In [0]:
df_stop

You can then use this data to analyze stop duration and to check whether stop times differ from one month to the next.
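One way to make that comparison is sketched below, assuming the query has been run for two different months and the results concatenated into a single DataFrame. The name `df_two_months` and all values in it are made up for illustration; only the column names come from the query above.

```python
import pandas as pd

# Made-up daily averages standing in for two monthly runs of the query.
df_two_months = pd.DataFrame({
    "Month": [5, 5, 5, 6, 6, 6],
    "AvgStopDuration": [210.0, 220.0, 230.0, 200.0, 190.0, 210.0],
    "No_Records": [25, 30, 28, 27, 31, 26],
})

# Summarize the daily averages per month.
monthly = df_two_months.groupby("Month")["AvgStopDuration"].agg(["mean", "std", "count"])

# Compare the later month against the earlier one.
# Note: picking min/max of Month assumes both months fall in the same year.
earlier, later = monthly.index.min(), monthly.index.max()
change = monthly.loc[later, "mean"] - monthly.loc[earlier, "mean"]
pct_change = 100 * change / monthly.loc[earlier, "mean"]

print(monthly)
print(f"Change in mean daily stop duration: {change:.1f} ({pct_change:.1f}%)")
```

This is an unweighted mean of daily averages, so days with few records count as much as busy days. Weighting by a per-day stop count, or applying a significance test, may be more appropriate for a real policy evaluation.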

GIS functions also help you create informational maps such as choropleth maps. To understand stop times in different areas of the airport, for example, you can create a table in your project that holds the geometries of your areas of interest, taken from GeoJSON files, and then join that table to your trips in BigQuery to combine the geographical and statistical information. Suppose you want to know the average stop duration of your vehicles in different areas of Pearson Airport. The query below takes all of your trips that stop at Pearson Airport and joins them to the three areas of interest at the airport.
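As a rough sketch of how such an areas table might be prepared, the code below turns GeoJSON polygon features into rows with a `Name` column and a `GEOM` column of WKT text, which is the form that `ST_GEOGFROMTEXT` reads. The helper `polygon_to_wkt`, the sample feature, and the assumption that each feature stores its label under a `name` property are all hypothetical; the upload call is left commented out because it needs credentials, and its table and project names are placeholders.

```python
import pandas as pd


def polygon_to_wkt(geometry):
    """Convert a GeoJSON Polygon geometry (outer ring only) to WKT text."""
    if geometry["type"] != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")
    outer_ring = geometry["coordinates"][0]
    # GeoJSON positions are [longitude, latitude, ...]; WKT uses "lon lat".
    points = ", ".join(f"{pt[0]} {pt[1]}" for pt in outer_ring)
    return f"POLYGON(({points}))"


# Made-up feature collection standing in for a real areas file.
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Example Area"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
            },
        },
    ],
}

areas = pd.DataFrame(
    [
        {"Name": f["properties"]["name"], "GEOM": polygon_to_wkt(f["geometry"])}
        for f in sample_geojson["features"]
    ]
)
print(areas)

# Placeholder destination; requires the pandas-gbq package and credentials.
# areas.to_gbq("OurGeoData.Pearson_Areas", project_id="your-project-id", if_exists="replace")
```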

In [0]:
queryHM= """
With Trips As (
  Select *
    From(
      Select TripId, StopLatitude as Lat, StopLongitude as Long, St_GeogPoint(StopLongitude, Stoplatitude) as G, StopDuration 
            From `geotab-dsp-$name.Interpolated.Trips` 
            Where _PartitionTime between Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 2 Month))
                  and Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Month))
                  and StartTime between Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 2 Month))
                  and Timestamp(Date_Sub(Date_Trunc(Current_Date(), Month), Interval 1 Month))
        )
    Where ST_Contains(ST_GEOGFROMTEXT('Polygon(( -79.63160991668701 43.696176153756106, -79.63703870773315 43.69242160434331, -79.61158990859985 43.67539906674, -79.59914445877075 43.68387214592166, -79.63160991668701 43.696176153756106 ))'), G)
  ) 
, Areas As (
  Select *, ST_GEOGFROMTEXT(GEOM) As DC_Shape 
   From `geotab-dsp-$OurTables.OurGeoData.Pearson_Areas`
  )
 
 
Select Avg(StopDuration) AS AvgStopDuration, Name, Geom 
From  (Select T.*, A.Name, A.Geom From Trips T Inner Join Areas A ON ST_Contains(DC_Shape,G))
Group by Name, Geom

"""

In [0]:
avg_duration = pd.read_gbq(queryHM.replace('$name', projectname), project_id='geotab-dsp-'+projectname, dialect='standard')

In [0]:
from google.colab import files
uploaded = files.upload()

Saving Pearson.geojson to Pearson.geojson


In [0]:
# Path to the uploaded GeoJSON file that describes the airport areas
pearson_areas = 'Pearson.geojson'

In [0]:
avg_duration

Unnamed: 0,Name,AvgStopDuration,Geom
0,Loading Area 2,213.666667,POLYGON ((-79.61894989013672 43.69158378024691...
1,Loading Area 3,219.588235,POLYGON ((-79.62839126586914 43.69673466168258...
2,Loading Area 1,206.729167,POLYGON ((-79.59835052490234 43.68183933676057...
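A choropleth can only color the areas whose key appears in both the data and the GeoJSON, so it can be worth checking the two against each other before mapping. The sketch below is one possible check; the helper `unmatched_names`, the sample data, and the assumption that the GeoJSON stores each label under a `name` property are illustrative only.

```python
import pandas as pd


def unmatched_names(geojson, frame, prop="name", column="Name"):
    """Return (names only in the data, names only in the GeoJSON), each sorted."""
    shape_names = {
        f["properties"][prop]
        for f in geojson["features"]
        if prop in f.get("properties", {})
    }
    data_names = set(frame[column])
    return sorted(data_names - shape_names), sorted(shape_names - data_names)


# Made-up inputs: the GeoJSON lacks one area that is present in the query result.
sample_shapes = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Loading Area 1"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Loading Area 2"}, "geometry": None},
    ],
}
sample_result = pd.DataFrame({
    "Name": ["Loading Area 1", "Loading Area 2", "Loading Area 3"],
    "AvgStopDuration": [206.7, 213.7, 219.6],
})

only_in_data, only_in_shapes = unmatched_names(sample_shapes, sample_result)
print("In data but not in GeoJSON:", only_in_data)
print("In GeoJSON but not in data:", only_in_shapes)
```

With the real file, the GeoJSON would first be loaded with `json.load` and then passed in along with `avg_duration`.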


In [0]:
m = folium.Map(location=[43.680, -79.622], zoom_start=13)

In [0]:
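# Map.choropleth() is deprecated in favor of the folium.Choropleth class
folium.Choropleth(
  geo_data=pearson_areas,
  name='choropleth',
  data=avg_duration,
  columns=['Name','AvgStopDuration'],
  # key_on must start with 'feature'; this assumes each GeoJSON feature
  # stores its area name under properties.name
  key_on='feature.properties.name',
  fill_color='YlOrRd',
  fill_opacity=0.7,
  line_opacity=0.2,
  legend_name='Avg. Stop Duration (mins)'
).add_to(m)
folium.LayerControl().add_to(m)
m.save('test.html')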

In [0]:
IFrame('test.html',500,500)