# Analyzing Taxi Time Series Data with Time Series Chains

This example uses the main takeaways from the research paper [Matrix Profile VII](https://www.cs.ucr.edu/~eamonn/chains_ICDM.pdf).

We will look at NYC taxi passenger data and check whether the time series contains any chains that reveal trends.

## Getting Started

Let's import the packages that we'll need to load, analyze, and plot the data.

In [1]:
import pandas as pd
import stumpy
import numpy as np

## Loading Some Data

First, we'll download historical data that represents the half-hourly average of the number of NYC taxi passengers over 75 days in the Fall of 2014.


We load that data into a pandas DataFrame, making sure the timestamps are stored as *datetime* objects and the values are of type *float64*.

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/stanford-futuredata/ASAP/master/Taxi.csv", sep=',')
df['value'] = df['value'].astype(np.float64)
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.head()

Unnamed: 0,timestamp,value
0,2014-10-01 00:00:00,12751.0
1,2014-10-01 00:30:00,8767.0
2,2014-10-01 01:00:00,7005.0
3,2014-10-01 01:30:00,5257.0
4,2014-10-01 02:00:00,4189.0
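As a quick sanity check on the description above, the sketch below works out how many samples a gapless half-hourly series covering 75 days would contain and where it would end. It assumes a complete grid starting at the first timestamp shown by `df.head()`; the names `SAMPLES_PER_DAY`, `N_DAYS`, `n_expected`, and `last_expected` are illustrative and not part of the original notebook.

```python
from datetime import datetime, timedelta

SAMPLES_PER_DAY = 24 * 2             # one sample every 30 minutes
N_DAYS = 75                          # duration stated in the text
start = datetime(2014, 10, 1, 0, 0)  # first timestamp shown by df.head()

# Assumption: no missing or duplicated half-hour slots.
n_expected = N_DAYS * SAMPLES_PER_DAY
last_expected = start + timedelta(minutes=30 * (n_expected - 1))

print(n_expected, last_expected)
```

Comparing `n_expected` with `len(df)` and `last_expected` with `df['timestamp'].iloc[-1]` is one way to spot gaps before computing a matrix profile.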


In [3]:
m = 48  # subsequence window: 48 half-hour samples = 1 day

In [4]:
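# Compute the z-normalized matrix profile. Each row holds the distance to the
# nearest neighbor, its index, and the left and right nearest-neighbor indices.
stump_results = stumpy.stump(df['value'].values, m=m)
out_df = pd.DataFrame(stump_results, columns=['mp', 'inx', 'left', 'right'])
out_df.head()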

Unnamed: 0,mp,inx,left,right
0,0.462536,1680,-1,1680
1,0.465413,1681,-1,1681
2,0.467104,1682,-1,1682
3,0.473099,1683,-1,1683
4,0.478382,1684,-1,1684
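To make the four columns above more concrete, here is a minimal brute-force sketch on a toy series. It is not how STUMPY computes the profile internally; it only illustrates what each column means. The helper names (`znorm`, `znorm_dist`, `nearest`, `naive_profile`) are hypothetical, and the exclusion zone of `ceil(m / 4)` is an assumption chosen to mirror what is understood to be STUMPY's default.

```python
import math
from statistics import mean, pstdev

def znorm(x):
    # Z-normalize a subsequence (population standard deviation).
    mu, sigma = mean(x), pstdev(x)
    return [(v - mu) / sigma for v in x]

def znorm_dist(a, b):
    # Euclidean distance between two z-normalized subsequences.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))

def nearest(ts, m, i, candidates):
    # Closest subsequence to ts[i:i+m] among the candidate start indices.
    best, best_j = math.inf, -1
    for j in candidates:
        d = znorm_dist(ts[i:i + m], ts[j:j + m])
        if d < best:
            best, best_j = d, j
    return best, best_j

def naive_profile(ts, m):
    n = len(ts) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone to skip trivial matches
    rows = []
    for i in range(n):
        # Left neighbors start before i, right neighbors start after i.
        d_left, j_left = nearest(ts, m, i, range(0, max(i - excl, 0)))
        d_right, j_right = nearest(ts, m, i, range(min(i + excl + 1, n), n))
        d, j = (d_left, j_left) if d_left <= d_right else (d_right, j_right)
        rows.append((d, j, j_left, j_right))  # (mp, inx, left, right)
    return rows

toy = [0, 1, 3, 1, 0, 1, 3, 1, 0, 1, 3, 1]  # made-up periodic series
rows = naive_profile(toy, m=4)
print(rows[0], rows[-1])
```

In this sketch the first subsequence has no left neighbor, so its `left` entry is `-1` and its overall nearest neighbor equals its `right` neighbor, which is the same pattern visible in the first rows of `out_df`.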


In [5]:
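# Build chains from the left and right nearest-neighbor indices.
# S is the all-chain set; C is the longest (unanchored) chain.
S, C = stumpy.allc(out_df['left'].values, out_df['right'].values)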

In [6]:
C

array([ 185,  521,  857, 1193, 1865, 2201, 2537])
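One way to read the chain indices above is to convert the gaps between consecutive elements into days and map each index back to a start time. The sketch below hard-codes the chain printed above and assumes a gapless half-hourly grid beginning at the first timestamp in the data; `chain`, `gaps_in_days`, and `starts` are illustrative names.

```python
from datetime import datetime, timedelta

chain = [185, 521, 857, 1193, 1865, 2201, 2537]  # copied from the output above
SAMPLES_PER_DAY = 48                             # half-hourly data
start = datetime(2014, 10, 1, 0, 0)              # assumed first timestamp

# Spacing between consecutive chain elements, expressed in days.
gaps_in_days = [(b - a) / SAMPLES_PER_DAY for a, b in zip(chain, chain[1:])]

# Assumption: index i corresponds to start + i * 30 minutes (no missing rows).
starts = [start + timedelta(minutes=30 * i) for i in chain]

print(gaps_in_days)
for s in starts:
    print(s.strftime("%A %Y-%m-%d %H:%M"))
```

Every gap is a whole multiple of seven days, so under the stated assumption each chain element begins at the same time of day on the same weekday.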

## Resources

[Matrix Profile VII](https://www.cs.ucr.edu/~eamonn/chains_ICDM.pdf)