<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Austin Bike Share</b>
</header>

<p style = 'font-size:16px;font-family:Arial'>Bike shares are becoming a popular alternative means of transportation. Suppose you run a transportation business that serves the public from a network of stations. You need to make sure equipment is available at the stations when people need it, and you know that demand for your service is strongly affected by the weather. This demonstration shows how to integrate historical trip information with weather information, using Vantage geospatial and time-series capabilities, to improve your service and grow your business.
<br>
The City of Austin makes data available on more than 649,000 bike trips taken between 2013 and 2017.</p>
<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Contents:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Initiate a connection to Vantage</li>
    <li>Explore the data </li>
    <li>Create and Explore Temporal, Time index and Geospatial data </li>
    <li>Clean up</li>
</ol>
<hr>

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Accessing the Data</b>
<p style = 'font-size:16px;font-family:Arial'>These demos work either with foreign tables read from cloud storage through NOS (Native Object Store) or with tables imported into your own database. If you import data for multiple demos, you may need to use the "Manage Your Space" routine in the Data Dictionary notebook to clean up tables you no longer need.</p>

<p style = 'font-size:16px;font-family:Arial'>Use the links below to choose between the two data options in the Data Dictionary notebook:</p>

[Click Here to get data for this notebook](../Data_Dictionary/Data_Dictionary.ipynb#TRNG_AustinBikeShare)

[Click Here to Manage Your Space](../Data_Dictionary/Data_Dictionary.ipynb#Manage_Your_Space)

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>1. Initiate a connection to Vantage</b>
<p style = 'font-size:16px;font-family:Arial'>You might be prompted to enter the password.</p>

In [None]:
%connect local

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>2. Explore the data</b>
<p style = 'font-size:16px;font-family:Arial'>As a warm-up, let's look at the tables in our database, TRNG_AustinBikeShare.</p>

In [None]:
SELECT 
    DatabaseName,
    TableName
FROM
    DBC.Tables
WHERE
    DatabaseName = 'TRNG_AustinBikeShare'

<p style = 'font-size:16px;font-family:Arial'>The database contains three tables: trips holds the trips taken on the bikes, stations holds the locations of the bike stations, and weather holds details about the weather.
    <br>
The query below shows the number of rows in each of the tables in the database.</p>

In [None]:
SELECT
(
    SELECT COUNT(*)
    FROM TRNG_AustinBikeShare.trips
) AS trips,
(
    SELECT COUNT(*)
    FROM TRNG_AustinBikeShare.stations
) AS stations,
(
    SELECT COUNT(*)
    FROM TRNG_AustinBikeShare.weather
) AS weather;

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.1 Examine the trips table</b></p>    
<p style = 'font-size:16px;font-family:Arial'>Let's look at a random sample of 10 rows from the trips table.</p>

In [None]:
SELECT
    *
FROM
    TRNG_AustinBikeShare.trips
SAMPLE 10;

<p style = 'font-size:16px;font-family:Arial'>Which subscriber types take the most rides?</p>

In [None]:
SELECT
    COUNT(trip_id) AS ride_count,
    subscriber_type
FROM
    TRNG_AustinBikeShare.trips
GROUP BY subscriber_type
ORDER BY 1 DESC;

<p style = 'font-size:16px;font-family:Arial'>From the result above, <b>Walk Up</b> rides outnumber those of the second most popular subscriber type by about <b>250%</b>.
    <br><br>
    Which stations do the most trips start from?</p>

In [None]:
SELECT
    TOP 20
    start_station_name,
    COUNT(1) AS trips
FROM
    TRNG_AustinBikeShare.trips
GROUP BY 1
ORDER BY 2 DESC

In [None]:
%chart start_station_name, trips, title=Trips by station, height=200, width=700

<p style = 'font-size:16px;font-family:Arial'>We clearly see that <b>Riverside @ S. Lamar</b> has the highest number of trips originating from it.</p>
<p style = 'font-size:16px;font-family:Arial'>Let's compute the average number of trips originating from a station.</p>

In [None]:
SELECT
    AVG(trips) AS avg_trips_per_station
FROM (
    SELECT
        start_station_name,
        COUNT(1) AS trips
    FROM
        TRNG_AustinBikeShare.trips
    GROUP BY 1
) AS t;

<p style = 'font-size:16px;font-family:Arial'>The top station, <b>Riverside @ S. Lamar</b>, has roughly <b>4 times as many trips</b> as the average station.</p>
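
<p style = 'font-size:16px;font-family:Arial'>Both comparisons above are simple ratios. The plain-Python sketch below is not Vantage code, and its station names are hypothetical; it mirrors the two queries by counting trips per start station, averaging those counts and expressing the busiest station relative to the average. It also shows how a "percent more" figure relates to a ratio: 250% more means 3.5 times as many.</p>

```python
from collections import Counter
from statistics import mean

def station_trip_stats(start_stations):
    """Return (busiest station, its trip count, average trips per station, ratio to average)."""
    counts = Counter(start_stations)          # like GROUP BY start_station_name with COUNT(1)
    avg_trips = mean(list(counts.values()))   # like AVG(trips) over the per-station counts
    top_station, top_trips = counts.most_common(1)[0]
    return top_station, top_trips, avg_trips, top_trips / avg_trips

def percent_more(a, b):
    """By how many percent a exceeds b; 250% more means a is 3.5 times b."""
    return (a - b) / b * 100

# Hypothetical start-station names, one entry per trip
trips = ["A", "A", "A", "A", "B", "C"]
print(station_trip_stats(trips))
print(percent_more(350, 100))
```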
<p style = 'font-size:16px;font-family:Arial'>Now let's look at the pattern of bike usage over time. </p>    

In [None]:
SELECT
    trunc(start_time, 'Month') AS start_Month,
    COUNT(1) AS trips
FROM
    TRNG_AustinBikeShare.trips
GROUP BY 1
ORDER BY 1

In [None]:
%chart start_Month, trips, title=Trips by Month, typex=t, width=700

<p style = 'font-size:16px;font-family:Arial'>In the chart above we observe a few things:</p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Data is almost entirely missing for two months.</li>
    <li>The busiest months reach as many as 30k trips.</li>
    <li>March and October are the busiest and second-busiest months across the four years of data.</li>
</ol>

<p style = 'font-size:16px;font-family:Arial'>Can this be related to the weather? Is the weather in March and October favourable for biking? Let's find out in the next section.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.2 Examine the weather table</b></p>    

In [None]:
SELECT TOP 10 * FROM TRNG_AustinBikeShare.weather

<p style = 'font-size:16px;font-family:Arial'>The temperature data appears to be reported hourly (the minutes and seconds are always zero). The temperature columns are in Kelvin, which few people use to judge whether it is good cycling weather, so we create a view over the weather table that converts the temperature to Fahrenheit with F = (K - 273.15) × 9/5 + 32 and averages it for each month. The view also counts the hours in each month whose main weather condition is rain or mist.</p>

In [None]:
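REPLACE VIEW austin_weather AS
SELECT
    TRUNC(dt, 'Month') AS dt,
    -- Convert Kelvin to Fahrenheit, then average over the month
    ROUND(AVG((temp - 273.15) * 9/5 + 32), 0) AS AveTemp,
    -- Count the hours whose main condition is rain or mist
    SUM(CASE WHEN weather_main IN ('Rain', 'Mist') THEN 1 ELSE 0 END) AS Precip_hours
FROM TRNG_AustinBikeShare.weather
GROUP BY 1;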
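
<p style = 'font-size:16px;font-family:Arial'>To make the view's logic concrete, here is a plain-Python sketch of the same aggregation over a few hypothetical hourly readings: it converts Kelvin to Fahrenheit, groups by the first day of the month (like TRUNC(dt, 'Month')), rounds the monthly average and counts the rain-or-mist hours. It is an illustration outside Vantage, and rounding of exact halves may differ from Vantage's ROUND.</p>

```python
from collections import defaultdict
from datetime import date, datetime

def kelvin_to_fahrenheit(kelvin):
    """Same formula as the view: (K - 273.15) * 9/5 + 32."""
    return (kelvin - 273.15) * 9 / 5 + 32

def monthly_weather(readings):
    """Aggregate hourly (timestamp, temp_kelvin, weather_main) readings by calendar month.

    Returns {first day of month: (rounded average temperature in F, rain-or-mist hours)}.
    Note: Python's round() rounds exact halves to even, which may differ from Vantage.
    """
    temps = defaultdict(list)
    precip_hours = defaultdict(int)
    for ts, temp_kelvin, weather_main in readings:
        month = ts.date().replace(day=1)              # like TRUNC(dt, 'Month')
        temps[month].append(kelvin_to_fahrenheit(temp_kelvin))
        if weather_main in ("Rain", "Mist"):          # like the SUM(CASE ...) column
            precip_hours[month] += 1
    return {m: (round(sum(t) / len(t)), precip_hours[m]) for m, t in temps.items()}

# Hypothetical hourly readings
readings = [
    (datetime(2016, 3, 1, 10), 294.15, "Clear"),
    (datetime(2016, 3, 1, 11), 296.15, "Rain"),
    (datetime(2016, 4, 2, 9), 283.15, "Mist"),
]
print(monthly_weather(readings)[date(2016, 3, 1)])
```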

In [None]:
SELECT * FROM austin_weather ORDER BY 1;

<p style = 'font-size:16px;font-family:Arial'>Plotting the data shows that some months are missing, but it gives an idea of the typical temperature ranges. The number of hours with precipitation in each month also shows patterns that could affect the number of trips.</p>

In [None]:
%chart dt, avetemp, width=800, title=Average Temperature by Month

<p style = 'font-size:16px;font-family:Arial'>In almost every March and October, the average temperature is around 70 degrees Fahrenheit. That seems favourable for biking: neither too cold nor too hot.</p>

In [None]:
%chart dt, Precip_hours, width=800, title=Precipitation Hours by Month

<p style = 'font-size:16px;font-family:Arial'>Taken together, the two charts suggest that March and October offer favourable biking conditions, which is consistent with the higher number of rides in those months.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.3 Geospatial data</b></p>    

<p style = 'font-size:16px;font-family:Arial'>A geospatial column value consists of a geometry type and one or more coordinate pairs (longitude and latitude). The Latitude and Longitude columns were kept in the table so you can see how a simple geospatial feature (a POINT) is represented.
    <br>
For more geospatial datatypes supported by Teradata, please click <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Geospatial-Data-Types-17.20/Geospatial-Data/Geometry-Types'>here</a>.</p>

In [None]:
SELECT
    TOP 10 *
FROM
    trng_austinbikeshare.stations

<p style = 'font-size:16px;font-family:Arial'>There are numerous geospatial functions, but we can demonstrate the basics by finding the distance from the main office (station_id = 1001) to other stations.</p>
<p style = 'font-size:16px;font-family:Arial'>
For more geospatial functions supported by Teradata, please click <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Geospatial-Data-Types-17.20'>here</a>.</p>

In [None]:
-- Pair every station with the main office (station 1001) and measure the distance between them
SELECT TOP 10
    station.station_id,
    station.name,
    ROUND(office.location.ST_SphericalDistance(station.location), 0) AS Distance_Meters
FROM
    trng_austinbikeshare.stations AS station,
    trng_austinbikeshare.stations AS office
WHERE
    office.station_id = 1001
ORDER BY 1;
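
<p style = 'font-size:16px;font-family:Arial'>ST_SphericalDistance measures distance over a spherical model of the Earth. One standard way to compute such a great-circle distance is the haversine formula, sketched below in plain Python. Whether Vantage uses exactly this formula, and the 6,371 km mean Earth radius assumed here, are assumptions, so small differences from the query results are expected. Arguments follow the (longitude, latitude) order used in the geometry text.</p>

```python
import math

# Assumed mean Earth radius in meters; Vantage's internal constant may differ.
EARTH_RADIUS_M = 6_371_000

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical coordinates for two points in central Austin (longitude, latitude)
print(round(haversine_m(-97.7444, 30.2672, -97.7557, 30.2636)))
```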

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>3. Create and Explore Temporal, Time index and Geospatial data</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.1 Create a temporal table with weather data</b></p>    
<p style = 'font-size:16px;font-family:Arial'>Temporal tables store and maintain information with respect to time. Using temporal tables, Vantage can process statements and queries that include time-based reasoning. Temporal tables include one or two special columns that store time information:</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>A transaction-time column records and maintains the time period for which Vantage was aware of the information in the row. Vantage automatically enters and maintains the transaction-time column data, and consequently automatically tracks the history of such information.</li>
    <li>A valid-time column models the real world, and stores information such as the time an insurance policy or product warranty is valid, the length of employment of an employee, or other information that is important to track and manipulate in a time-aware fashion. When you add a new row to this type of table, you use the valid-time column to specify the time period for which the row information is valid. This is the period of validity (PV) of the information in the row.</li>
</ul>

In [None]:
CREATE TABLE weather_temporal (
    begin_dt      TIMESTAMP(6) NOT NULL,
    end_dt        TIMESTAMP(6) NOT NULL,
    temp          FLOAT,
    temp_min      FLOAT,
    temp_max      FLOAT,
    pressure      INTEGER,
    humidity      INTEGER,
    wind_speed    INTEGER,
    wind_deg      INTEGER,
    rain_1h       FLOAT,
    rain_3h       FLOAT,
    clouds        INTEGER,
    weather_id    INTEGER,
    weather_main  VARCHAR(50),
    weather_desc  VARCHAR(50),
    weather_icon  VARCHAR(50),
    PERIOD FOR Weather_Duration(begin_dt,end_dt) AS VALIDTIME
)
PRIMARY INDEX (weather_id);

In [None]:
INSERT INTO weather_temporal
SELECT
    dt,
    dt + INTERVAL '1' HOUR,  -- PERIOD end bounds are exclusive, so [dt, dt + 1 hour) covers the whole hour
    round( ((temp - 273.15) * 9/5 + 32 ) ,0),
    round( ((temp_min - 273.15) * 9/5 + 32 ) ,0),
    round( ((temp_max - 273.15) * 9/5 + 32 ) ,0),
    pressure,
    humidity,
    wind_speed,
    wind_deg,
    rain_1h,
    rain_3h,
    clouds,
    weather_id,
    weather_main,
    weather_desc,
    weather_icon
FROM 
    TRNG_AustinBikeShare.weather;
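
<p style = 'font-size:16px;font-family:Arial'>Teradata PERIOD values include their beginning bound and exclude their ending bound. The plain-Python sketch below (hypothetical timestamps, not Vantage code) models that closed-open rule and shows why the width of each hourly period matters when trips are later matched to the reading that contains their start time: with an end bound of dt + 59 minutes 59 seconds, a timestamp in the final second of an hour falls into no period, while an end bound of dt + 1 hour covers every instant exactly once.</p>

```python
from datetime import datetime, timedelta

def period_contains(begin, end, ts):
    """Closed-open period test, as for Teradata PERIOD values: begin <= ts < end."""
    return begin <= ts < end

def covered(ts, hour_starts, width):
    """True if ts lies in at least one hourly period [start, start + width)."""
    return any(period_contains(start, start + width, ts) for start in hour_starts)

hours = [datetime(2016, 3, 1, 10), datetime(2016, 3, 1, 11)]
last_second = datetime(2016, 3, 1, 10, 59, 59)

# End bound of dt + 59:59: the last second of the 10:00 hour belongs to no period
print(covered(last_second, hours, timedelta(minutes=59, seconds=59)))  # False
# End bound of dt + 1 hour: consecutive periods tile the timeline without gaps
print(covered(last_second, hours, timedelta(hours=1)))  # True
```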

In [None]:
SELECT TOP 10 * FROM weather_temporal;

<p style = 'font-size:16px;font-family:Arial'>With a temporal table, time-based questions become simple queries. For example, was the weather favourable for biking in March and October 2016?</p>

In [None]:
select count(weather_main) as weather_hours, weather_main from (validtime period '(2016-03-01, 2016-04-01)'
select * from weather_temporal) as dt group by weather_main;

In [None]:
%chart weather_main, weather_hours, width = 500, title = 'Duration (in hours) of weather by weather type (March 2016)'

In [None]:
select count(weather_main) as weather_hours, weather_main from (validtime period '(2016-10-01, 2016-11-01)'
select * from weather_temporal) as dt group by weather_main;

In [None]:
%chart weather_main, weather_hours, width = 500, title = 'Duration (in hours) of weather by weather type (October 2016)'

<p style = 'font-size:16px;font-family:Arial'>The charts above suggest that in March and October 2016 most hours had weather favourable for biking (clear, cloudy or mist), which helps explain the higher number of bike rides in those months.</p>
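
<p style = 'font-size:16px;font-family:Arial'>Conceptually, the VALIDTIME PERIOD '(begin, end)' qualifier restricts the derived table to rows whose valid time overlaps that window, and the outer query then counts the remaining hourly rows per weather type. The plain-Python sketch below (hypothetical rows, a simplified model rather than Vantage's full sequenced semantics) applies the same closed-open overlap test. Because the window's end bound is excluded, a window written as (2016-03-01, 2016-03-31) leaves out every reading from March 31.</p>

```python
from collections import Counter
from datetime import datetime

def overlaps(b1, e1, b2, e2):
    """Closed-open periods [b1, e1) and [b2, e2) overlap when each begins before the other ends."""
    return b1 < e2 and b2 < e1

def hours_by_weather(rows, window_begin, window_end):
    """Count hourly rows per weather_main whose valid time overlaps [window_begin, window_end).

    rows: iterable of (begin_dt, end_dt, weather_main) tuples.
    """
    return Counter(main for begin, end, main in rows
                   if overlaps(begin, end, window_begin, window_end))

# Hypothetical hourly rows around the end of March 2016
rows = [
    (datetime(2016, 3, 31, 12), datetime(2016, 3, 31, 13), "Clear"),
    (datetime(2016, 3, 31, 13), datetime(2016, 3, 31, 14), "Rain"),
    (datetime(2016, 4, 1, 0), datetime(2016, 4, 1, 1), "Clear"),
]
print(hours_by_weather(rows, datetime(2016, 3, 1), datetime(2016, 4, 1)))   # all of March
print(hours_by_weather(rows, datetime(2016, 3, 1), datetime(2016, 3, 31)))  # misses March 31
```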

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.2 Create a view for all trips with start/end stations data and a GEOSEQUENCE with start/end lat/long/time</b></p>
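<p style = 'font-size:16px;font-family:Arial'>The code below only defines a view; no data is copied. The view enriches the trip data with an end time computed from the trip duration and a GEOSEQUENCE field that holds both endpoints of the trip and the time at each endpoint. When a station's coordinates are missing, the view substitutes a fixed default point (-98.272797, 30.578245).</p>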

In [None]:
REPLACE VIEW trips_geo AS
SELECT
    t.bikeid,
    t.trip_ID,
    t.subscriber_type,
    t.start_station_id,
    COALESCE(t.start_station_name, st.NAME) AS start_station_name,
    t.start_time,
    st.status starting_station_status,
    t.end_station_id,
    COALESCE(t.end_station_name, ed.NAME) AS end_station_name,
    t.start_time 
        + CAST(t.duration_minutes/60 AS INTERVAL HOUR(4)) 
        + CAST(t.duration_minutes MOD 60 AS INTERVAL MINUTE(4)) AS end_time,
    ed.status AS End_station_status,
    t.duration_minutes,
    CAST('GEOSEQUENCE( ('
        || COALESCE(st.Longitude,-98.272797)
        || ' '
        || COALESCE(st.Latitude,30.578245)
        || ','
        || COALESCE(ed.longitude,-98.272797)
        || ' '
        || COALESCE(ed.latitude,30.578245)
        || '), ('
        || CAST(CAST(t.start_time AS FORMAT 'yyyy-mm-ddbhh:mi:ss') AS VARCHAR(50))
        || ','
        || CAST(CAST(end_time AS FORMAT 'yyyy-mm-ddbhh:mi:ss') AS VARCHAR(50))
        || '), ('
        || '1,2), (0) )' AS ST_GEOMETRY) AS GEOM
FROM
    TRNG_AustinBikeShare.trips AS t
    LEFT JOIN TRNG_AustinBikeShare.stations AS st ON t.start_station_id = st.station_id
    LEFT JOIN TRNG_AustinBikeShare.stations AS ed ON t.end_station_id = ed.station_id;
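
<p style = 'font-size:16px;font-family:Arial'>The plain-Python sketch below reproduces two pieces of the view for a single hypothetical trip: the end time (whole hours from duration / 60 plus leftover minutes from duration MOD 60) and the text that the view casts to ST_GEOMETRY, including the fallback point. The string layout is copied from the view; Python's number formatting may differ from how Vantage converts numeric columns during concatenation, so treat the output as illustrative.</p>

```python
from datetime import datetime, timedelta

# Fallback point copied from the COALESCE calls in the trips_geo view
DEFAULT_LON, DEFAULT_LAT = -98.272797, 30.578245
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches FORMAT 'yyyy-mm-ddbhh:mi:ss' ('b' is a blank)

def trip_end_time(start_time, duration_minutes):
    """Start time plus whole hours (duration / 60) plus leftover minutes (duration MOD 60)."""
    hours, minutes = divmod(duration_minutes, 60)
    return start_time + timedelta(hours=hours, minutes=minutes)

def _or_default(value, default):
    """Python stand-in for COALESCE(value, default)."""
    return default if value is None else value

def geosequence_text(start_lon, start_lat, end_lon, end_lat, start_time, end_time):
    """Build the same text the view casts to ST_GEOMETRY; None coordinates use the fallback point."""
    slon = _or_default(start_lon, DEFAULT_LON)
    slat = _or_default(start_lat, DEFAULT_LAT)
    elon = _or_default(end_lon, DEFAULT_LON)
    elat = _or_default(end_lat, DEFAULT_LAT)
    return (f"GEOSEQUENCE( ({slon} {slat},{elon} {elat}), "
            f"({start_time.strftime(TS_FORMAT)},{end_time.strftime(TS_FORMAT)}), (1,2), (0) )")

# Hypothetical trip: 135 minutes long, end station coordinates unknown
start = datetime(2016, 3, 1, 10, 0)
end = trip_end_time(start, 135)
print(geosequence_text(-97.75, 30.26, None, None, start, end))
```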

In [None]:
SELECT TOP 10 * FROM trips_geo;

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.3 Create a Time Index table of the trips to accelerate time related analysis</b></p>
<p style = 'font-size:16px;font-family:Arial'>Vantage supports tables with a Primary Time Index (PTI), which is used to store and quickly look up time-based data. This time-aware index both distributes data across the units of parallelism and allows the optimizer to build plans that go directly to the unit of parallelism where the data is stored, based on the time constraint.<br><br>
In this case, we declare the index with hourly granularity (MINUTES(60)) and a zero date (DATE '2013-12-20') earlier than any date in our data. The database automatically creates the first column, named TD_TIMECODE, from the primary time index declaration. When we insert data, we use the start_time column as its value.</p>

In [None]:
CREATE TABLE trips_geo_pti (
    bikeid                    INTEGER,
    trip_id                   BIGINT,
    subscriber_type           VARCHAR(50),
    start_station_id          INTEGER,
    start_station_name        VARCHAR(100),
    starting_station_status   VARCHAR(50),
    end_station_id            INTEGER,
    end_station_name          VARCHAR(100),
    end_time                  TIMESTAMP(6),
    end_station_status        VARCHAR(50),
    duration_minutes          INTEGER,
    geom                      SYSUDTLIB.ST_GEOMETRY(16776192) INLINE LENGTH 9920
)
PRIMARY TIME INDEX (TIMESTAMP(6), DATE '2013-12-20', MINUTES(60));
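
<p style = 'font-size:16px;font-family:Arial'>Conceptually, the PRIMARY TIME INDEX declaration slices time into 60-minute intervals counted from the zero date 2013-12-20, and rows whose TD_TIMECODE falls in the same interval are grouped together. The plain-Python sketch below only computes that interval index for a timestamp; how Vantage maps intervals to units of parallelism is internal and not modeled here. It also shows why the zero date should precede all of the data.</p>

```python
from datetime import datetime, timedelta

ZERO_DATE = datetime(2013, 12, 20)   # DATE '2013-12-20' in the DDL
GRANULARITY = timedelta(minutes=60)  # MINUTES(60) in the DDL

def time_bucket(ts):
    """0-based index of the 60-minute interval, counted from the zero date, that contains ts."""
    if ts < ZERO_DATE:
        raise ValueError("timestamp is earlier than the zero date")
    return (ts - ZERO_DATE) // GRANULARITY

# Two hypothetical trips starting within the same hour share a bucket
print(time_bucket(datetime(2013, 12, 21, 1, 5)), time_bucket(datetime(2013, 12, 21, 1, 55)))
```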

<p style = 'font-size:16px;font-family:Arial'>We now populate the table. If the source data is read from cloud storage, this could take a minute.</p>

In [None]:
INSERT INTO trips_geo_pti
SELECT
    start_time,
    bikeid,
    trip_id,
    subscriber_type,
    start_station_id,
    start_station_name,
    starting_station_status,
    end_station_id,
    end_station_name,
    end_time,
    End_station_status,
    duration_minutes,
    geom
FROM
    trips_geo;

In [None]:
SELECT TOP 10 * FROM trips_geo_pti

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.4 Augment trips data with weather data and extract geospatial information</b></p> 
<p style = 'font-size:16px;font-family:Arial'>Finally, we join the geosequenced trip information with the weather data, matching each trip to the weather report whose period contains the trip's start time (TD_TIMECODE). To keep the result small, only trips picked up on or after July 1, 2017 are included.</p>
<p style = 'font-size:16px;font-family:Arial'>
For more geospatial functions supported by Teradata, please click <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Geospatial-Data-Types-17.20'>here</a>.</p>

In [None]:
CREATE TABLE trips_and_weather AS (
    SELECT 
        t.start_station_name,
        t.end_station_name,
        t.bikeid,
        t.trip_id,
        -- First and last timestamps stored in the GEOSEQUENCE
        t.geom.GetInitT() AS pickup_time,
        t.geom.GetFinalT() AS dropoff_time,
        -- Distance between the two trip endpoints; dividing by 1000 converts meters to kilometres
        t.geom.ST_POINTN(1).ST_SPHEROIDALDISTANCE(t.geom.ST_POINTN(2))/1000 AS total_distance,
        t.geom.ST_POINTN(1).ST_X() AS pickup_location_lon,
        t.geom.ST_POINTN(1).ST_Y() AS pickup_location_lat,
        t.geom.ST_POINTN(2).ST_X() AS dropoff_location_lon,
        t.geom.ST_POINTN(2).ST_Y() AS dropoff_location_lat,
        t.duration_minutes,
        t.TD_TIMECODE AS Trip_TIMECODE,
        wt.*
    FROM 
        trips_geo_pti AS t
        -- Match each trip to the weather reading whose valid-time period contains the trip start
        INNER JOIN Weather_temporal AS wt ON wt.weather_duration CONTAINS t.TD_TIMECODE
        AND pickup_time >= '2017-07-01 00:00:00'
)
WITH DATA PRIMARY INDEX (trip_id);
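
<p style = 'font-size:16px;font-family:Arial'>The join condition wt.weather_duration CONTAINS t.TD_TIMECODE pairs each trip with the weather reading(s) whose closed-open period includes the trip's start time. The plain-Python nested-loop sketch below (hypothetical data; Vantage chooses its own join plan) shows those semantics, including the pickup-time filter and the fact that trips without a matching reading disappear, as in any INNER JOIN.</p>

```python
from datetime import datetime

def attach_weather(trips, readings, min_start):
    """Inner-join trips to hourly readings whose closed-open period contains the trip start.

    trips: (trip_id, start_time) tuples; readings: (begin, end, weather_main) tuples.
    Trips starting before min_start, or matching no reading, are dropped; a trip whose
    start falls in two overlapping readings appears twice, as it would in the SQL join.
    """
    return [
        (trip_id, weather_main)
        for trip_id, start in trips
        if start >= min_start
        for begin, end, weather_main in readings
        if begin <= start < end
    ]

# Hypothetical hourly readings and trips
readings = [
    (datetime(2017, 7, 1, 8), datetime(2017, 7, 1, 9), "Clear"),
    (datetime(2017, 7, 1, 9), datetime(2017, 7, 1, 10), "Rain"),
]
trips = [
    (1, datetime(2017, 7, 1, 8, 30)),
    (2, datetime(2017, 7, 1, 9, 0)),    # exactly on an hour boundary: belongs to the 9:00 reading
    (3, datetime(2017, 7, 1, 11, 15)),  # no reading covers this hour
    (4, datetime(2017, 6, 30, 8, 30)),  # before the pickup-time cutoff
]
print(attach_weather(trips, readings, datetime(2017, 7, 1)))
```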

In [None]:
SELECT TOP 10 * FROM trips_and_weather where cast(pickup_time as date) between '2017-07-01' and '2017-07-31'

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4. Clean up</b>

<p style = 'font-size:16px;font-family:Arial'>Drop the objects we created in our user database.</p>

In [None]:
DROP TABLE weather_temporal;

In [None]:
DROP TABLE trips_geo_pti;

In [None]:
DROP TABLE trips_and_weather;

In [None]:
DROP VIEW trips_geo;

In [None]:
DROP VIEW austin_weather;

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Information about Geospatial datatype can be found at: <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Geospatial-Data-Types-17.20'>https://docs.teradata.com/r/Teradata-VantageTM-Geospatial-Data-Types-17.20</a></li>
    <li>Information about Temporal datatype can be found at: <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Temporal-Table-Support-17.20'>https://docs.teradata.com/r/Teradata-VantageTM-Temporal-Table-Support-17.20</a></li>
</ul>

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2022 Teradata. All Rights Reserved</footer>