<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4D Analytics using the New York City Taxi dataset --Temporal</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial'>
This is a demonstration of Vantage temporal capabilities:
</p>
<ul>
    <li style = 'font-size:16px;font-family:Arial'>
Temporal analysis using the PERIOD data type with CONTAINS, OVERLAPS, NORMALIZE and EXPAND ON
    </li>
</ul>
    

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b> Accessing the Data </b> </p>
<p style = 'font-size:16px;font-family:Arial'>These demos work either with foreign tables accessed from Cloud Storage via NOS, or with tables you import to your machine. If you import data for multiple demos, you may need to use the Data Dictionary "Manage Your Space" routine to clean up tables you no longer need.</p>
    
<p style = 'font-size:16px;font-family:Arial'>Use the links below to access the 2 options for using data from the data dictionary notebook:</p>

[Click Here to get data for this notebook](../Data_Dictionary/Data_Dictionary.ipynb#TRNG_NYCTaxi)

[Click Here to Manage Your Space](../Data_Dictionary/Data_Dictionary.ipynb#Manage_Your_Space)

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Connect to Vantage and explore the dataset</b></p>
The command below connects to the Vantage environment.


In [None]:
%connect local

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b> Access data in Vantage  </b> </p>
<p style = 'font-size:16px;font-family:Arial'>For this demo, data is already resident in Object Storage which we are accessing via ReadNOS.  Create a reference to the table, and sample the contents.  Data could just as easily reside in permanent tables, another RDBMS, or another Vantage system.<br>
This demonstration will use two tables: the taxi trip details and the fares for each trip. The queries below will sample each table and then show the range of the time period covered by the data. </p>

In [None]:
SELECT top 10 * from TRNG_NYCTaxi.trip;

In [None]:
SELECT top 10 * from TRNG_NYCTaxi.trip_fare;

In [None]:
sel min(pickup_datetime), max(dropoff_datetime) from TRNG_NYCTaxi.trip;

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Warmup: temporal algebra </b></p>
<img  src="contains.png" alt="Contains Example" width="600" align="right" />
<p style = 'font-size:16px;font-family:Arial'> Let's warm up with a bit of temporal algebra and get familiar with period types and operators. 
<br><br>
For a taxi service, the number of dispatchers needed depends on the number of pickups. Even when a pickup starts at a taxi stand or with someone hailing a cab from the side of the street, the driver calls in the trip. 
<br><br>The period data type may be based on dates or timestamps. A period starts at its beginning bound and extends up to, but does not include, its ending bound. The ending bound may be expressed as "UNTIL_CHANGED" or "UNTIL_CLOSED", which effectively means "forever". Most commonly, a period would be a column in a table, but in the example below we will use a literal to count the number of taxi pickups between 10:30 and 10:45 on Nov 10th.</p>
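<p style = 'font-size:16px;font-family:Arial'>The "up to, but does not include" rule is easy to miss, so here is a minimal plain-Python sketch of it. This is only a model of the semantics described above, not how Vantage evaluates the query; the helper name period_contains is hypothetical.</p>

```python
from datetime import datetime

def period_contains(begin, end, ts):
    """Model of a half-open period: begin is included, end is excluded."""
    return begin <= ts < end

begin = datetime(2013, 11, 10, 10, 30)
end = datetime(2013, 11, 10, 10, 45)

# A pickup exactly at the beginning bound is inside the period.
at_begin = period_contains(begin, end, begin)

# A pickup exactly at the ending bound is NOT inside the period...
at_end = period_contains(begin, end, end)

# ...whereas an inclusive BETWEEN-style comparison would accept it.
between_at_end = begin <= end <= end

print(at_begin, at_end, between_at_end)
```

<p style = 'font-size:16px;font-family:Arial'>Under this model, a pickup at exactly 10:45:00 is counted by an inclusive range test but not by the period test, so the two forms can differ by the rows that sit on the ending bound.</p>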


In [None]:
sel count(1)
from TRNG_NYCTaxi.trip  
-- BETWEEN version (includes both endpoints):
--where pickup_datetime between '2013-11-10 10:30:00' and '2013-11-10 10:45:00'
-- Alternative using the PERIOD data type (the ending bound is excluded):
where period
	(
		'2013-11-10 10:30:00' (timestamp), 
		'2013-11-10 10:45:00' (timestamp)
	) 
	CONTAINS pickup_datetime 

<img  src="contains-period.png" alt="Contains Period Example" width="600" align="right" />
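<p style = 'font-size:16px;font-family:Arial'> 
The above example used CONTAINS for a simple comparison to a timestamp, which is easily replaced by BETWEEN.  A more interesting case is a constraint that compares two periods, which would otherwise require more complicated logic.  The following query counts how many rides both started and ended between 10:30 and 10:45 on Nov 10th. The pickup_datetime &lt; dropoff_datetime filter skips rows that cannot form a valid period, since a period's beginning bound must precede its ending bound.</p>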

In [None]:
SELECT count(1)
FROM TRNG_NYCTaxi.trip  
WHERE  pickup_datetime < dropoff_datetime
and period
	(
		'2013-11-10 10:30:00' (timestamp), 
		'2013-11-10 10:45:00' (timestamp)
	) 
	contains period(pickup_datetime, dropoff_datetime);

<img  src="overlap.png" alt="Overlap Example" width="600" align="right" />
<p style = 'font-size:16px;font-family:Arial'> 
The taxi cabs now contain video screens that include advertising content. To estimate how many people may be viewing the video, and so justify compensation for advertising, we need to know how many people are in the cabs in a given period. 
<br><br>
Where the CONTAINS operator above requires the timestamp or period to lie within the bounds of the constraint period, the OVERLAPS operator selects any row whose period begins in, ends in, or spans the constraint period.  In the following example, we count how many passengers were in a taxi at any point between 10:30 and 10:45 on Nov 10th.</p>
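<p style = 'font-size:16px;font-family:Arial'>To make the difference between the two operators concrete, here is a small plain-Python model of period CONTAINS and OVERLAPS on half-open periods. The trips are made-up examples, and the helper names contains, overlaps and t are hypothetical; this sketches the comparison logic only, not the Vantage implementation.</p>

```python
from datetime import datetime

def t(hour, minute):
    """Build a timestamp on 2013-11-10 (hypothetical helper)."""
    return datetime(2013, 11, 10, hour, minute)

def contains(outer, inner):
    """outer CONTAINS inner: inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """a OVERLAPS b: the half-open periods share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]

window = (t(10, 30), t(10, 45))

# Made-up trips as (pickup, dropoff) pairs.
trips = {
    "inside":       (t(10, 32), t(10, 40)),  # starts and ends in the window
    "starts_early": (t(10, 20), t(10, 35)),  # ends in the window
    "spans":        (t(10, 0),  t(11, 0)),   # covers the whole window
    "meets_end":    (t(10, 45), t(10, 50)),  # begins where the window ends
}

results = {
    name: (contains(window, trip), overlaps(window, trip))
    for name, trip in trips.items()
}

for name, (is_contained, is_overlapping) in results.items():
    print(name, is_contained, is_overlapping)
```

<p style = 'font-size:16px;font-family:Arial'>Only the "inside" trip satisfies CONTAINS, while every trip except "meets_end" satisfies OVERLAPS. The last case follows from the excluded ending bound: a trip that begins at 10:45 shares no instant with the window.</p>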

In [None]:
SELECT sum(passenger_count)
FROM TRNG_NYCTaxi.trip  
WHERE pickup_datetime < dropoff_datetime
and period
	(
		'2013-11-10 10:30:00' (timestamp), 
		'2013-11-10 10:45:00' (timestamp)
	) 
	overlaps period(pickup_datetime, dropoff_datetime)
;

<img  src="normalize.png" alt="Normalize Example" width="600" align="right" />
<p style = 'font-size:16px;font-family:Arial'> 
As input to planning the hours of operation, management would like to know for how many hours in the day at least 1 taxi is operating.
<br><br>
Let's assume we run the fleet of 3 taxis whose medallions start with ‘007’. The fleet is considered active if at least 1 taxi is driving. For how long was the fleet “active” on November 10th?
<br><br>
We will need to consolidate (or normalize) the periods where taxis are active and then find the duration of the resulting periods of activity. The final query will have nested derived tables, so we will start by showing a sample of the innermost query, which creates a set of periods on that date for the group of medallions:</p>

In [None]:
SELECT top 10
period(pickup_datetime, dropoff_datetime) as norm_per
from TRNG_NYCTaxi.trip  
where (pickup_datetime (date))='2013-11-10'
and medallion like '007%'
order by 1;

<p style = 'font-size:16px;font-family:Arial'> We will then use the NORMALIZE modifier to consolidate the periods. With NORMALIZE, the result of the select is normalized on the first period column in the select list. Period values that meet or overlap are coalesced, that is, combined to form a period that encompasses the individual period values.  For comparison, we will follow with a query that doesn't normalize, which gives the total hours of active service summed over all of the taxis.</p>
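<p style = 'font-size:16px;font-family:Arial'>The coalescing rule can be sketched in a few lines of plain Python. This is a simplified model under the assumption that periods are (begin, end) pairs with an excluded end; the trip values are made up and the normalize and total helpers are hypothetical, not Vantage functions.</p>

```python
from datetime import datetime, timedelta

def t(hour, minute):
    """Build a timestamp on 2013-11-10 (hypothetical helper)."""
    return datetime(2013, 11, 10, hour, minute)

def normalize(periods):
    """Coalesce periods that meet or overlap into encompassing periods."""
    merged = []
    for begin, end in sorted(periods):
        if merged and begin <= merged[-1][1]:
            # Meets (begin == previous end) or overlaps: extend the last period.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged

def total(periods):
    """Sum the durations of a list of periods."""
    return sum((end - begin for begin, end in periods), timedelta())

# Made-up trips for a small fleet.
trips = [
    (t(9, 0),   t(9, 30)),
    (t(9, 20),  t(10, 0)),   # overlaps the first trip
    (t(10, 0),  t(10, 15)),  # meets the second trip
    (t(11, 0),  t(11, 20)),  # separate period of activity
]

normalized = normalize(trips)
normalized_activity = total(normalized)
total_activity = total(trips)

print(normalized)
print(normalized_activity, total_activity)
```

<p style = 'font-size:16px;font-family:Arial'>In this example the four trips collapse into two periods of fleet activity (9:00 to 10:15 and 11:00 to 11:20), giving 1 hour 35 minutes of normalized activity against 1 hour 45 minutes of total activity, because the 10 minutes when two taxis were driving at once are counted only once.</p>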

In [None]:
SELECT sum(interval(normalized_periods) hour(2) to minute) normalized_activity
from
(
	SELECT normalize
	period(pickup_datetime, dropoff_datetime) as normalized_periods
	from TRNG_NYCTaxi.trip  
	where (pickup_datetime (date))='2013-11-10'
	and medallion like '007%'
) d1
order by 1;

In [None]:
select sum( interval( period(pickup_datetime, dropoff_datetime) ) hour to minute ) total_activity
from TRNG_NYCTaxi.trip  
where (pickup_datetime (date))='2013-11-10'
and medallion like '007%';

<p style = 'font-size:16px;font-family:Arial'> The normalized activity is less than the total activity, because normalization consolidates the periods before summing, so time when more than one taxi is driving is counted only once. 
<br><br>
There are situations where you need to see the state of the business at fixed intervals, for example the total inventory at the end of each week.  The EXPAND ON clause creates those regular periods for reporting. To demonstrate EXPAND ON, we will get a count of active cabs at the top of each hour.</p>
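<p style = 'font-size:16px;font-family:Arial'>As a rough illustration of what an hourly anchor expansion produces, the plain-Python sketch below emits one row for every top-of-hour instant that falls inside a trip's period and then counts rows per hour. It assumes that an anchor point counts when it is on or after the pickup and strictly before the dropoff; the trips are made up and the anchor_hours helper is hypothetical, so treat this as a model of the idea rather than the exact Vantage behavior.</p>

```python
from collections import Counter
from datetime import datetime, timedelta

def t(hour, minute):
    """Build a timestamp on 2013-11-10 (hypothetical helper)."""
    return datetime(2013, 11, 10, hour, minute)

def anchor_hours(begin, end):
    """Yield every top-of-hour instant h with begin <= h < end."""
    hour = begin.replace(minute=0, second=0, microsecond=0)
    if hour < begin:
        hour += timedelta(hours=1)  # first anchor on or after the begin bound
    while hour < end:
        yield hour
        hour += timedelta(hours=1)

# Made-up trips as (pickup, dropoff) pairs.
trips = [
    (t(9, 50),  t(10, 10)),  # active at 10:00
    (t(9, 55),  t(11, 5)),   # active at 10:00 and 11:00
    (t(10, 5),  t(10, 20)),  # crosses no hour boundary: no rows
]

# One "expanded row" per anchor hour, then group and count.
counts = Counter(
    hour for begin, end in trips for hour in anchor_hours(begin, end)
)

for hour in sorted(counts):
    print(hour, counts[hour])
```

<p style = 'font-size:16px;font-family:Arial'>Note that a short trip that starts and ends within the same hour contributes no rows in this model, so the count reflects cabs that are active exactly at the top of the hour rather than at any time during it.</p>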

In [None]:
select begin_hour, count(*)
from
(
	select begin(hour_check) begin_hour
	from TRNG_NYCTaxi.trip  
	where (pickup_datetime (date))='2013-11-10'
	and medallion like '007%'
    EXPAND ON period(pickup_datetime, dropoff_datetime) AS hour_check BY ANCHOR ANCHOR_HOUR
) a
group by 1
order by 1;

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Conclusion</b></p>
<p style = 'font-size:16px;font-family:Arial'>
In this demonstration we have seen that the PERIOD data type and its temporal operators are time-aware, and that queries using them are fairly simple to write.</p>

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">©2023 Teradata. All Rights Reserved</footer>