<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4D Analytics using the New York City Taxi dataset --Timeseries Analytics</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction</b></p>


<p style = 'font-size:16px;font-family:Arial'>This is a demonstration of Vantage capabilities for:</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Timeseries and Primary Time Index</li>
</ul>
        

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b> Accessing the Data </b> </p>
<p style = 'font-size:16px;font-family:Arial'>These demos work either with foreign tables accessed from Cloud Storage via NOS, or with tables you import to your machine. If you import data for multiple demos, you may need to use the Data Dictionary "Manage Your Space" routine to clean up tables you no longer need.</p>

<p style = 'font-size:16px;font-family:Arial'>Use the links below to access the two options for loading data from the Data Dictionary notebook:

[Click Here to get data for this notebook](../Data_Dictionary/Data_Dictionary.ipynb#TRNG_NYCTaxi)

[Click Here to Manage Your Space](../Data_Dictionary/Data_Dictionary.ipynb#Manage_Your_Space)

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Connect to Vantage and explore the dataset</b></p>
The command below connects to the Vantage environment.

In [None]:
%connect local

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b> Access data in Vantage  </b> </p>
<p style = 'font-size:16px;font-family:Arial'>For this demo, data is already resident in Object Storage which we are accessing via ReadNOS.  Create a reference to the table, and sample the contents.  Data could just as easily reside in permanent tables, another RDBMS, or another Vantage system.<br>
This demonstration will use two tables: the taxi trip details and the fares for each trip. The queries below will sample each table and then show the range of the time period covered by the data. <br>
<i>*You can skip to Timeseries Analysis if you have already seen the trip and fare tables in another notebook in the series.</i></p>

In [None]:
SELECT top 10 * from TRNG_NYCTaxi.trip;

In [None]:
SELECT top 10 * from TRNG_NYCTaxi.trip_fare;

In [None]:
sel min(pickup_datetime), max(dropoff_datetime) from TRNG_NYCTaxi.trip;

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Timeseries Analysis </b></p>
<p style = 'font-size:16px;font-family:Arial'>Now that we have seen the trip and fare details, let's do some analysis based on time.
The first query below buckets the data with EXTRACT and a conventional GROUP BY. The second uses a GROUP BY TIME clause referencing a unit of time, plus a USING TIMECODE clause that specifies the column the time buckets are applied to.</p>
<p style = 'font-size:16px;font-family:Arial'>
How many passengers are being picked up by hour in November?</p>

In [None]:
select extract(day from pickup_datetime), extract(hour from pickup_datetime), sum(passenger_count) 
from TRNG_NYCTaxi.trip 
where extract(month from pickup_datetime)=11
group by 1,2 order by 1,2


In [None]:
sel top 72
$TD_TIMECODE_RANGE 
,begin($TD_TIMECODE_RANGE) time_bucket_start --(timestamp, format 'YYYY-MM-DDBHH:MI:SS') time_bucket_start
,sum(passenger_count) passenger_count
from TRNG_NYCTaxi.trip 
where extract(month from pickup_datetime)=11
group by time(hours(1))
USING TIMECODE(pickup_datetime)
order by 1;

<p style = 'font-size:16px;font-family:Arial'>It's about time to add a visual element. The chart below plots the result of the previous query.</p>

In [None]:
%chart x=time_bucket_start, y=passenger_count, title=Passenger pickup by hour
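
<p style = 'font-size:16px;font-family:Arial'>The same GROUP BY TIME pattern works with other bucket sizes. As a minimal sketch (not part of the original demo), the query below replaces the hourly buckets with daily ones using <code>days(1)</code> and also returns <code>$TD_GROUP_BY_TIME</code>, the system-generated bucket number. The column aliases are chosen here for illustration only.</p>

```sql
-- Sketch: daily passenger totals for November.
-- Same structure as the hourly query; only the bucket size changes.
sel
$TD_TIMECODE_RANGE
,$TD_GROUP_BY_TIME bucket_number   -- sequential number of the time bucket
,begin($TD_TIMECODE_RANGE) time_bucket_start
,sum(passenger_count) passenger_count
from TRNG_NYCTaxi.trip
where extract(month from pickup_datetime)=11
group by time(days(1))
USING TIMECODE(pickup_datetime)
order by 1;
```

<p style = 'font-size:16px;font-family:Arial'>Check the returned ranges before relying on them: depending on how the buckets are aligned, the daily boundaries may not fall exactly where you expect.</p>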

<p style = 'font-size:16px;font-family:Arial'>How many passengers are being picked up, and what is the average trip duration, by vendor every 15 minutes in November?</p>

In [None]:
sel top 96
$TD_TIMECODE_RANGE 
,vendor_id
,sum(passenger_count)
,avg(trip_time_in_secs)
from TRNG_NYCTaxi.trip
where extract(month from pickup_datetime)=11
group by time(minutes(15) and vendor_id)
USING TIMECODE(pickup_datetime)
order by 1,2;

<p style = 'font-size:16px;font-family:Arial'>Let's save this query as a view so we can reuse it.</p>

In [None]:
replace view NYC_taxi_trip_ts as
sel
$TD_TIMECODE_RANGE time_bucket_per
,vendor_id
,sum(passenger_count) passenger_cnt
,avg(trip_time_in_secs) avg_trip_time_in_secs
from TRNG_NYCTaxi.trip 
where extract(month from pickup_datetime)=11
group by time(minutes(15) and vendor_id)
USING TIMECODE(pickup_datetime);
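
<p style = 'font-size:16px;font-family:Arial'>Before building on the view, it can help to look at a few rows. The sketch below (an illustrative addition, with hypothetical aliases <code>bucket_start</code> and <code>bucket_end</code>) splits each time bucket into its start and end timestamps using the <code>begin()</code> and <code>end()</code> period functions.</p>

```sql
-- Sketch: sample the view and expose the bounds of each 15-minute bucket.
sel top 10
begin(time_bucket_per) bucket_start
,end(time_bucket_per) bucket_end
,vendor_id
,passenger_cnt
,avg_trip_time_in_secs
from NYC_taxi_trip_ts
order by 1, 3;   -- by bucket start, then vendor
```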

<p style = 'font-size:16px;font-family:Arial'><b>Moving Averages</b>
    <br>
 Let's calculate a 2-hour moving average on our 15-minute time series. Two hours is 8 periods of 15 minutes, so the window size is 8.</p>

In [None]:
SELECT * FROM MovingAverage (
  ON NYC_taxi_trip_ts PARTITION BY vendor_id ORDER BY time_bucket_per
  USING
  MAvgType ('S')
  WindowSize (8)
  TargetColumns ('passenger_cnt')
) AS dt 
where begin(time_bucket_per)(date) = '2013-11-10'
ORDER BY vendor_id, time_bucket_per;
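
<p style = 'font-size:16px;font-family:Arial'>For comparison, a similar 8-period average can be sketched with a standard ordered analytic (window) function instead of the MovingAverage function. This is an alternative technique, not the one used in this demo, and the alias <code>passenger_cnt_win_avg</code> is made up for illustration. It assumes every 15-minute bucket is present for each vendor, because the window counts rows rather than elapsed time. Values for the first rows of each partition may differ from the MovingAverage output, since the window simply averages whatever rows are available.</p>

```sql
-- Sketch: 8-row trailing average per vendor using a window function.
-- The average is computed in the derived table first, then filtered to
-- one day, so early-morning buckets still see the previous day's rows.
sel w.*
from
(
    sel
    time_bucket_per
    ,vendor_id
    ,passenger_cnt
    ,avg(passenger_cnt) over (
        partition by vendor_id
        order by begin(time_bucket_per)
        rows between 7 preceding and current row
     ) passenger_cnt_win_avg
    from NYC_taxi_trip_ts
) w
where begin(time_bucket_per)(date) = '2013-11-10'
order by vendor_id, time_bucket_per;
```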

<p style = 'font-size:16px;font-family:Arial'>We can use the moving average to identify demand trends for each time period. A basic way to do this is to check whether the current pickup count is above (rising) or below (falling) the moving average.</p>

In [None]:
SELECT time_bucket_per, vendor_id, passenger_cnt, case when passenger_cnt - passenger_cnt_smavg > 0 then '+' else '-' end trend
FROM MovingAverage (
  ON NYC_taxi_trip_ts PARTITION BY vendor_id ORDER BY time_bucket_per
  USING
  MAvgType ('S')
  WindowSize (8)
  TargetColumns ('passenger_cnt')
) AS dt 
where begin(time_bucket_per)(date) = '2013-11-10'
ORDER BY vendor_id, time_bucket_per;

<p style = 'font-size:16px;font-family:Arial'> We can use this to compare how vendors are doing.

In [None]:
sel dt.*
from 
(
	SELECT time_bucket_per, vendor_id, passenger_cnt, case when passenger_cnt - passenger_cnt_smavg > 0 then '+' else '-' end trend
	FROM MovingAverage (
	  ON NYC_taxi_trip_ts PARTITION BY vendor_id ORDER BY time_bucket_per
	  USING
	  MAvgType ('S')
	  WindowSize (8)
	  TargetColumns ('passenger_cnt')
	) AS dt 
	where begin(time_bucket_per)(date) = '2013-11-10'
) dt
PIVOT(
    MAX(passenger_cnt) as passenger_cnt, MAX(trend) as trend FOR vendor_id IN ('CMT', 'VTS')
) dt
order by 1;
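
<p style = 'font-size:16px;font-family:Arial'>The side-by-side view can also be condensed into a single summary per vendor. As a rough sketch (an illustrative addition, with the hypothetical alias <code>period_cnt</code>), the query below counts how many 15-minute periods on that day were rising or falling for each vendor. Note that any period where the moving average is NULL falls into the '-' group because of the ELSE branch.</p>

```sql
-- Sketch: number of rising (+) and falling (-) periods per vendor.
sel vendor_id, trend, count(*) period_cnt
from
(
    SELECT vendor_id, case when passenger_cnt - passenger_cnt_smavg > 0 then '+' else '-' end trend
    FROM MovingAverage (
      ON NYC_taxi_trip_ts PARTITION BY vendor_id ORDER BY time_bucket_per
      USING
      MAvgType ('S')
      WindowSize (8)
      TargetColumns ('passenger_cnt')
    ) AS dt
    where begin(time_bucket_per)(date) = '2013-11-10'
) t
group by 1, 2
order by 1, 2;
```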


<ul style = 'font-size:16px;font-family:Arial'> 
       
  <li>Introduction to Teradata Time Series Tables and Operations: <a href = 'https://docs.teradata.com/r/Teradata-VantageTM-Time-Series-Tables-and-Operations-17.20'>https://docs.teradata.com/r/Teradata-VantageTM-Time-Series-Tables-and-Operations-17.20</a></li>
  </ul>

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2023 Teradata. All Rights Reserved</footer>