# 7. Dynamic Time Warping on NYSE trades

[Dynamic Time Warping (DTW)](https://docs.kx.com/public-preview/kdb-x/Reference/dtw-index-parameters.htm) is an algorithm used to measure the similarity between two temporal sequences that may vary in speed or timing. Unlike simple distance metrics, DTW aligns sequences by stretching or compressing segments so that similar shapes or patterns can be matched even if they occur at different rates or are out of phase. 
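
To make the idea concrete, here is a minimal Python sketch of the textbook dynamic-programming DTW recurrence. It is not the KDB-X implementation; the function names and the toy sequences are made up for illustration, and taking the square root of the accumulated squared differences is an assumption chosen so the result is comparable to a Euclidean distance.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW: minimal accumulated cost over all monotone alignments."""
    n, m = len(a), len(b)
    inf = math.inf
    # cost[i][j] = best accumulated squared difference aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: repeat b[j-1], repeat a[i-1], advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean(a, b):
    """Point-by-point distance with no alignment."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative toy data: the same bump, one step later in the second sequence
pattern = [0, 0, 1, 2, 1, 0, 0, 0]
delayed = [0, 0, 0, 1, 2, 1, 0, 0]

print(euclidean(pattern, delayed))     # 2.0
print(dtw_distance(pattern, delayed))  # 0.0
```

The point-by-point distance penalizes the one-step delay, while DTW stretches the leading zeros and matches the two bumps exactly.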

This technique is widely used in fields such as speech recognition, finance, and bioinformatics, where comparing time series data with varying lengths or temporal distortions is essential. DTW provides a flexible way to identify patterns and trends across datasets that are not perfectly synchronized.

This tutorial walks through the process of creating a database filled with a day's worth of NYSE data, and then applying Dynamic Time Warping search to find similar patterns across the price column.

### 1. Prerequisites

This notebook is compatible with Linux, Mac, and Windows via WSL.

This notebook is designed to be run as a Python notebook, but the same functionality can be applied directly in a CLI with KDB-X. If running q from the terminal, just copy the q code starting in section 3 and paste it into your terminal.

View the [KDB-X Docs](https://docs.kx.com/public-preview/kdb-x/home.htm) for full details on KDB-X and KDB-X Python.


1. KDB-X must be installed. You can sign up [here](https://developer.kx.com/products/kdb-x/install). For full install instructions see: [KDB-X Install](https://docs.kx.com/public-preview/kdb-x/Get_Started/kdb-x-install.htm).
2. Ensure you have the necessary dataset:
   1. Download the NYSE sample dataset from NYSE: [US Trades](https://ftp.nyse.com/Historical%20Data%20Samples/TAQ%20NYSE%20TRADES/EQY_US_TAQ_NYSE_TRADES_20231002.gz) (Open link in a new tab to start the download)
   2. Copy the `.gz` file into the same directory where you are running this notebook
   
3. Install KDB-X Python: (For full install instructions see: [KDB-X Python Install](https://docs.kx.com/public-preview/kdb-x/Get_Started/kdb-x-python-install.htm))

In [1]:
!pip install -qq --upgrade --pre pykx

### 2. Initialize q First Mode

To easily interact with KDB-X and q from a Python notebook, it is best to use [q first mode](https://code.kx.com/pykx/3.1/examples/jupyter-integration.html#7-q-first-mode).

In [2]:
import pykx as kx
kx.util.jupyter_qfirst_enable()


Welcome to KDB-X Community Edition!
For Community support, please visit https://kx.com/slack
Tutorials can be found at https://github.com/KxSystems/tutorials
Ready to go beyond the Community Edition? Email preview@kx.com

PyKX now running in 'jupyter_qfirst' mode. All cells by default will be run as q code.
Include '%%py' at the beginning of each cell to run as python code. 


### 3. Load the AI Libs KDB-X Module & Prepare the Environment
This loads several AI Libs modules, including the DTW functionality.

In [None]:
\l ai-libs/init.q

Set a random seed for reproducible results:

In [4]:
\S 42

### 4. Loading and Preparing the Data

Load the dataset (`EQY_US_TAQ_NYSE_TRADES_20231002.gz`) into a table using a [named pipe (FIFO)](https://code.kx.com/q/kb/named-pipes/) and [.Q.fps](https://code.kx.com/q/ref/dotq/#fps-pipe-streaming):

In [5]:
file:"EQY_US_TAQ_NYSE_TRADES_20231002.gz";
system"rm -f fifo && mkfifo fifo";
trade:([] msgType:();sequenceNo:();time:();sym:();tradeId:();price:();volume:());
system"gunzip -cf ",file," > fifo &";
.Q.fps[{`trade upsert flip ("JJNS JFJ    ";"," )0:x}]`:fifo;
hdel `:fifo;
trade:update `g#sym from select from trade where msgType=220;


In the above:
- A Linux FIFO (named pipe) streams the decompressed data to q, so the gzipped file never has to be unpacked to disk
- `.Q.fps` reads files too large to fit into memory: it loops over the stream in conveniently sized chunks of complete records and applies a function to each chunk
- The schema `"JJNS JFJ    "` specifies the datatype of each column to load; a space skips that column
- The `","` ensures the CSV is parsed using commas as delimiters.
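
For readers more familiar with Python, the same streaming idea can be sketched as follows. This is an illustration only: the sample records are made up (not real NYSE data), the helper names are hypothetical, chunking by line count stands in for `.Q.fps` chunking by size, and the timespan column is kept as text for simplicity.

```python
import gzip
import io

# Converters for the q type characters in the schema; a space skips the column.
CONVERTERS = {"J": int, "F": float, "S": str, "N": str}  # N (timespan) kept as text

def parse_chunk(lines, types, delimiter=","):
    """Parse one chunk of complete records, keeping only non-space columns."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        rows.append([CONVERTERS[t](f) for t, f in zip(types, fields) if t != " "])
    return rows

def stream_chunks(fileobj, chunk_lines):
    """Yield lists of at most `chunk_lines` complete lines from a text stream."""
    chunk = []
    for line in fileobj:
        chunk.append(line)
        if len(chunk) == chunk_lines:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Made-up sample with 12 comma-separated fields per record
sample = (
    "220,1,07:00:00.1,AAA,x,10,4.74,4000,a,b,c,d\n"
    "3,2,07:00:00.2,AAA,x,11,0,0,a,b,c,d\n"
    "220,3,07:00:00.3,BBB,x,12,9.5,100,a,b,c,d\n"
)
compressed = gzip.compress(sample.encode())

trade = []
with gzip.open(io.BytesIO(compressed), "rt") as f:
    for chunk in stream_chunks(f, chunk_lines=2):
        trade.extend(parse_chunk(chunk, "JJNS JFJ    "))

# Keep only trade messages, mirroring `where msgType=220` in the q code
trade = [row for row in trade if row[0] == 220]
print(trade)
```

Only one chunk is held in memory while it is parsed, which is the property that lets the q version handle a full day of trades.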

Now that the dataset has been loaded, we can inspect the table using `first` to see what the first row looks like:

In [6]:
first trade

msgType   | 220
sequenceNo| 58765
time      | 0D07:00:00.105862814
sym       | `TMF
tradeId   | 24476
price     | 4.74
volume    | 4000


This shows a single trade message for the symbol TMF executed at a price of $4.74 for 4,000 shares at 7:00:00 on October 2, 2023.

Now that we understand the structure of the data, we can move on to efficiently querying using TSS and DTW.


### 5. Performing DTW Searches

Now, we create a [random walk](https://en.wikipedia.org/wiki/Random_walk) float vector to use as the pattern we want to match. The pattern length of 25 is arbitrary; feel free to experiment with your own query vectors.

In [7]:
vector:10*abs sums neg[0.5]+25?1f

In the above:
- `25?1f` generates 25 random floats between 0 and 1
- `neg[0.5]` returns -0.5, which shifts each value into the range -0.5 to 0.5
- `sums` computes the cumulative sum, `abs` takes the absolute value of each element, and finally `10*` multiplies by 10

The result is a random, non-negative pattern that wanders up and down, mimicking price fluctuations that we may want to detect in the time-series data.

In [8]:
vector

2.742128 4.791852 5.003978 10.00096 5.120625 8.321094 13.1784 12.8079 16.3266..
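
The same construction can be sketched in Python (run it in a `%%py` cell when q first mode is enabled). Python's random generator differs from q's, so the values will not match the q output above even with the same seed; the name `vector_py` is just an illustrative choice.

```python
import random
from itertools import accumulate

random.seed(42)  # reproducible in Python, but not the same stream as q's \S 42

steps = [random.random() - 0.5 for _ in range(25)]  # neg[0.5]+25?1f
walk = list(accumulate(steps))                      # sums
vector_py = [10 * abs(x) for x in walk]             # 10*abs

print(len(vector_py), min(vector_py) >= 0)  # 25 True
```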


#### Example Query: Simple TSS Full Query on Price

We can run a simple TSS search across our price column using the [`.ai.tss.tss`](https://docs.kx.com/public-preview/kdb-x/Reference/tss-index-parameters.htm#aitsstss) function:

In [9]:
`distances`indexes!.ai.tss.tss[;vector;5;::] trade`price

distances| 1.993204 2.127701 2.139295 2.142165 2.193366
indexes  | 3285704  3186212  2417271  2047819  2314123 


The function returns a two-element list of distances and indexes, which we label here with the keys `distances` and `indexes`:
- The first list shows the Euclidean (L2) distance between the query `vector` and each matched pattern.
- The second list shows the indexes in the time series where the matched pattern begins.
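
Conceptually, this kind of search slides the query across the series and keeps the closest windows. The Python sketch below shows a brute-force version on made-up data. It z-normalizes each window so that shape matters rather than price level; that normalization is an assumption of this sketch, not a statement about how `.ai.tss.tss` works internally.

```python
import math

def znorm(xs):
    """Scale a window to zero mean and unit standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std == 0:
        return [0.0] * len(xs)  # flat window: no shape to compare
    return [(x - mean) / std for x in xs]

def sliding_search(series, query, k):
    """Return the k smallest (distance, start_index) pairs over all windows."""
    q = znorm(query)
    m = len(query)
    results = []
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        results.append((dist, start))
    return sorted(results)[:k]

# Illustrative toy series containing the query shape at two different scales
series = [5, 5, 6, 8, 6, 5, 5, 5, 7, 11, 7, 5]
query = [1, 2, 4, 2, 1]

for dist, start in sliding_search(series, query, k=2):
    print(start, round(dist, 6))
```

Both occurrences of the bump (starting at positions 1 and 7) are found even though their price levels and amplitudes differ from the query.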

For more information on TSS, see our [TSS Tutorial](https://github.com/KxSystems/tutorials/blob/main/KDB-X/Modules/ai-libs/1.TSS_Search_by_NYSETrades.md).

#### DTW Search Query

Here we run our first DTW query using the [`.ai.dtw.search`](https://docs.kx.com/public-preview/kdb-x/Reference/dtw-index-parameters.htm) function.

The code ``` `distances`indexes!.ai.dtw.search[;vector;5;0.0;::] trade`price ``` performs a Dynamic Time Warping (DTW) search on the `price` column of the `trade` table using the query vector `vector`.

The `.ai.dtw.search` function compares the query pattern against the time series data, identifying the top 5 most similar subsequences based on DTW distance. The fourth parameter, `0.0`, is the window: the fraction of the query length by which the alignment is allowed to warp. The larger the window, the more flexible and the more time-consuming the search.

In this case, since the window is `0.0`, no warping is allowed, so the search behaves like a basic TSS search and returns the same indexes as the TSS query above.

In [10]:
`distances`indexes!.ai.dtw.search[;vector;5;0.0;::] trade`price

distances| 1.993204 2.127701 2.139295 2.142165 2.195907
indexes  | 3285704  3186212  2417271  2047819  2314123 
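
The effect of the window can be illustrated with a small Python sketch of DTW restricted to a band around the diagonal (often called a Sakoe-Chiba band). The mapping from the ratio to a band half-width of `int(window * len(query))` is an assumption made for this sketch, as are the function name and the toy data; the library may round or define it differently.

```python
import math

def dtw_banded(a, b, window):
    """DTW between equal-length sequences, limited to a band of |i - j| <= band.

    `window` is a ratio of the sequence length (assumed mapping: int(window * n)).
    """
    n = len(a)
    assert len(b) == n, "this sketch only handles equal-length sequences"
    band = int(window * n)
    inf = math.inf
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Cells outside the band stay at infinity and are never evaluated
        for j in range(max(1, i - band), min(n, i + band) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])

# Illustrative toy data: the same bump, shifted by two steps
query = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
candidate = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]

for w in (0.0, 0.1, 0.8):
    print(w, dtw_banded(query, candidate, w))
```

With a window of `0.0` only the diagonal is used, so the result is the plain Euclidean distance (here the square root of 10). A wider band can only lower the distance, and once it is wide enough to absorb the two-step shift the distance drops to 0.0. The wider the band, the more cells are evaluated, which is why larger windows take longer.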


Let's try it again, but allow for some DTW window flexibility during the search by setting the window to `0.1`:

In [11]:
`distances`indexes!.ai.dtw.search[;vector;5;0.1;::] trade`price

distances| 1.519624 1.766677 1.773423 1.78443 1.794284
indexes  | 3802228  24797    2314123  2746995 4067262 


Finally, we set the window to a higher value of `0.8`, to get even more time warping into our search. You might be able to tell that this search takes longer than the others.

In [12]:
`distances`indexes!.ai.dtw.search[;vector;5;0.8;::] trade`price

distances| 1.265552 1.354349 1.446201 1.468039 1.515346
indexes  | 2303698  632384   632383   2303699  3388567 


#### DTW searchRange Query

The DTW searchRange query gives us some additional flexibility by letting us set a maximum distance instead of a fixed number of results. Here we use a window of `0.2` and set the maximum distance to 1.6, meaning it will only return matches within a distance of 1.6.

In [17]:
`distances`indexes!.ai.dtw.searchRange[;vector;0.2;1.6;::] trade`price

distances| 1.265552 1.468039 1.507729 1.519624 1.520065 1.592965
indexes  | 2303698  2303699  632384   3802228  3388567  2620562 


#### DTW filterSearch Query

The DTW filterSearch query combines a maximum distance with a cap on the number of results. Here we limit the results to 3 and ensure that returned matches are within a distance of 1.6.

In [18]:
`distances`indexes!.ai.dtw.filterSearch[;vector;3;0.2;1.6;::] trade`price

distances| 1.265552 1.468039 1.507729
indexes  | 2303698  2303699  632384  
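
The relationship between the two queries can be sketched in Python using the (distance, index) pairs printed by the searchRange query above. The helper names are hypothetical, and treating the threshold as inclusive is an assumption of this sketch.

```python
# (distance, index) pairs copied from the searchRange output above
matches = [
    (1.265552, 2303698), (1.468039, 2303699), (1.507729, 632384),
    (1.519624, 3802228), (1.520065, 3388567), (1.592965, 2620562),
]

def range_filter(pairs, max_dist):
    """Keep every match whose distance does not exceed max_dist, closest first."""
    return [(d, i) for d, i in sorted(pairs) if d <= max_dist]

def capped_filter(pairs, k, max_dist):
    """Keep at most k of the closest matches within max_dist."""
    return range_filter(pairs, max_dist)[:k]

print(capped_filter(matches, 3, 1.6))
print(range_filter(matches, 1.5))
```

Capping the range results at 3 reproduces the three matches shown in the filterSearch output, while tightening the threshold to 1.5 keeps only the first two.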


### Conclusion

In this tutorial, we explored how Dynamic Time Warping (DTW) can be applied to financial time series data using KDB-X. We learned how to initialize q first mode in the notebook, load and prepare NYSE trade data, and use DTW-based search functions to identify similar patterns in price movements, even when those patterns occur at different speeds or times.