# 7. Dynamic Time Warping on NYSE trades

[Dynamic Time Warping (DTW)](https://docs.kx.com/public-preview/kdb-x/Reference/dtw-index-parameters.htm) is an algorithm used to measure the similarity between two temporal sequences that may vary in speed or timing. Unlike simple distance metrics, DTW aligns sequences by stretching or compressing segments so that similar shapes or patterns can be matched even if they occur at different rates or are out of phase. 

This technique is widely used in fields such as speech recognition, finance, and bioinformatics, where comparing time series data with varying lengths or temporal distortions is essential. DTW provides a flexible way to identify patterns and trends across datasets that are not perfectly synchronized.
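
To make the alignment idea concrete, here is a minimal pure-Python sketch of the classic dynamic-programming form of DTW. It is an illustration only, not the implementation used by KDB-X; the function names and the choice of a squared-difference local cost with a final square root are assumptions made for this example.

```python
import math

def euclidean(a, b):
    """Point-by-point L2 distance; both sequences must have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Unconstrained DTW distance with a squared-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# The same bump, shifted by one step
slow = [0, 0, 1, 2, 1, 0, 0]
fast = [0, 1, 2, 1, 0, 0, 0]

print(euclidean(slow, fast))     # 2.0
print(dtw_distance(slow, fast))  # 0.0
```

The point-by-point distance penalizes the one-step shift, while DTW absorbs it by matching the repeated zeros against each other.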

This tutorial walks through the process of loading a day's worth of NYSE trade data into a table. It then demonstrates how to apply Dynamic Time Warping search to find similar patterns in the price column.

### 1. Prerequisites

This notebook is designed to be run as a Python notebook, but the same functionality can be applied directly from a CLI with KDB-X. If you are running q from the terminal, copy the q code from under the `%%q` commands in this notebook and paste it into your terminal.


1. KDB-X must be installed; you can sign up at https://kdb-x.kx.com/sign-in.
2. Ensure you have the necessary dataset:
   1. Download the NYSE sample dataset from NYSE: [US Trades](https://ftp.nyse.com/Historical%20Data%20Samples/TAQ%20NYSE%20TRADES/EQY_US_TAQ_NYSE_TRADES_20231002.gz)
   2. Copy it into the directory where you are running this notebook
   
3. Install PyKX:

In [1]:
!pip install -qq --upgrade --pre pykx

### 2. Start a Local q Process

To interact with KDB-X from Python, we first need to start a local `q` process. This step uses Python's `subprocess` module to launch `q` on port 5000, allowing subsequent cells to communicate with it. The launch is wrapped in PyKX's `kx.PyKXReimport()` context manager, and a short sleep gives the process time to start.

In [2]:
import subprocess
import time
import pykx as kx

try:
    with kx.PyKXReimport():
        proc = subprocess.Popen(
            ('q', '-p', '5000')
        )
    time.sleep(2)
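except Exception as e:
    # Catch Exception rather than using a bare except, and keep the original cause
    raise kx.QError('Unable to create q process on port 5000') from e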


Welcome to KDB-X Community Edition!
For Community support, please visit https://kx.com/slack
Tutorials can be found at https://github.com/KxSystems/tutorials
Ready to go beyond the Community Edition? Email preview@kx.com
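
The fixed two-second sleep above works in most cases, but a slow machine may need longer. As an optional alternative, the sketch below polls the port until it accepts a TCP connection. The helper name `wait_for_port` is hypothetical and not part of PyKX, and the sketch assumes that a bare connect-and-close is harmless to the q process.

```python
import socket
import time

def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection just to test reachability
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the process is not listening yet
            time.sleep(interval)
    return False
```

In the cell above, `time.sleep(2)` could then be replaced by a check such as `wait_for_port("localhost", 5000)`, raising an error if it returns `False`.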



### 3. Load the AI Libs KDB-X Module & Prepare the Environment
This loads several AI Libs modules, including the DTW functionality.

In [3]:
%%q
system"l ",getenv[`HOME],"/.kx/ai-libs/init.q"

Set a random seed for reproducible results:

In [4]:
%%q
\S 42

Test out our connection to the q process:

In [5]:
%%q
til 10

0 1 2 3 4 5 6 7 8 9


### 4. Loading and Preparing the Data

Load the dataset (`EQY_US_TAQ_NYSE_TRADES_20231002.gz`) into a table:

In [None]:
%%q 

file:"EQY_US_TAQ_NYSE_TRADES_20231002.gz";
system"rm -f fifo && mkfifo fifo";
trade:([] msgType:();sequenceNo:();time:();sym:();tradeId:();price:();volume:());
system"gunzip -cf ",file," > fifo &";
.Q.fps[{`trade upsert flip ("JJNS JFJ    ";"," )0:x}]`:fifo;
hdel `:fifo;
trade:update `g#sym from select from trade where msgType=220;


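In the above:
- A Linux named pipe (FIFO) is used so that the gzipped data can be parsed as a stream while `gunzip` decompresses it in the background
- The function `.Q.fps` is used to read files too large to fit into memory. It loops over a file in conveniently sized chunks of complete records and applies a function to each chunk
- The type string `"JJNS JFJ    "` specifies the datatype of each column: `J` is long, `N` is timespan, `S` is symbol and `F` is float, while a blank skips that column
- The `","` ensures the CSV is parsed using commas as delimiters
- The final line keeps only the rows where `msgType` is 220 and applies the grouped attribute (`` `g# ``) to `sym`

Now that the dataset has been loaded, we can inspect the table using `first` to see what the first row looks like: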
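
For readers more familiar with Python, the chunked-read idea can be sketched as follows. This is only an analogy for what `.Q.fps` does, not a replacement for the q cell. The three sample rows are made up for illustration (the first reuses values from the first trade shown in this tutorial), and the 12-column layout is inferred from the type string.

```python
import csv
import gzip
import io
from itertools import islice

# Made-up 12-column rows shaped like the type string "JJNS JFJ    ":
# columns 0-3 and 5-7 are kept; column 4 and the last four are skipped.
SAMPLE = (
    "220,58765,07:00:00.105862814,TMF,x,24476,4.74,4000,x,x,x,x\n"
    "3,58766,07:00:00.200000000,ABC,x,0,0,0,x,x,x,x\n"
    "220,58767,07:00:01.000000000,ABC,x,24477,10.5,100,x,x,x,x\n"
)
KEEP = (0, 1, 2, 3, 5, 6, 7)

def read_in_chunks(raw_gz, chunk_rows=2):
    """Yield lists of parsed rows, at most chunk_rows complete records at a time."""
    with gzip.open(io.BytesIO(raw_gz), "rt", newline="") as fh:
        reader = csv.reader(fh)
        while True:
            chunk = list(islice(reader, chunk_rows))
            if not chunk:
                return
            # Drop the skipped columns from every record in the chunk
            yield [[row[i] for i in KEEP] for row in chunk]

trades = []
for chunk in read_in_chunks(gzip.compress(SAMPLE.encode())):
    # Keep only rows whose message type is 220, mirroring `where msgType=220`
    trades.extend(r for r in chunk if r[0] == "220")

print(len(trades), trades[0][3], trades[0][5])  # 2 TMF 4.74
```

Only one chunk of parsed rows is held in memory at a time, which is the property that matters for a full day of trades.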

In [32]:
%%q
first trade

msgType   | 220
sequenceNo| 58765
time      | 0D07:00:00.105862814
sym       | `TMF
tradeId   | 24476
price     | 4.74
volume    | 4000


This shows a single trade message for the symbol TMF executed at a price of $4.74 for 4,000 shares at 7:00:00 on October 2, 2023.

Now that we understand the structure of the data, we can move on to efficiently querying using TSS and DTW.


### 5. Performing DTW Searches

Now we create a [random walk](https://en.wikipedia.org/wiki/Random_walk) float vector to use as the pattern we want to find matches for. The pattern length of 64 is an arbitrary choice; feel free to experiment with your own query vectors.

In [33]:
%%q

q:10*abs sums neg[0.5]+64?1f

In the above:
- `64?1f` generates 64 random floats between 0 and 1
- `neg[0.5]` returns -0.5, so adding it centers each step around zero
- `sums` computes the cumulative sum, `abs` takes the absolute value of each element, and finally `10*` scales the result by 10

The result is a random, non-negative pattern: a random walk reflected at zero and scaled by 10, mimicking the kind of price movement that we may want to detect in the time-series data.
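
As a cross-check of the q expression, the same steps can be written with the Python standard library. This is a sketch for comparison only: Python's generator differs from q's, so seeding it with 42 does not reproduce the values shown in this notebook.

```python
import random
from itertools import accumulate

random.seed(42)  # reproducible in Python, but not the same stream as q's \S 42

steps = [random.random() - 0.5 for _ in range(64)]  # neg[0.5]+64?1f
walk = list(accumulate(steps))                      # sums
pattern = [10 * abs(v) for v in walk]               # 10*abs

print(len(pattern), min(pattern) >= 0)  # 64 True
```

Because of `abs`, every element is non-negative even though the underlying walk can cross zero.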

In [34]:
%%q
q

2.742128 4.791852 5.003978 10.00096 5.120625 8.321094 13.1784 12.8079 16.3266..


#### Example Query: Simple TSS Full Query on Price

We can run a simple TSS search across our price column using the `.ai.tss.tss` function, asking for the 5 closest matches:

In [35]:
%%q

select .ai.tss.tss[price;q;5;::] from trade

x                                          
-------------------------------------------
3.569454 3.633002 3.639012 3.7102  3.741339
2477787  2477786  2477785  2477788 3518322 


The result is a two-element list of distances and ids, sorted from the closest match to the farthest:
- The first row holds the Euclidean (L2) distance between the query `q` and each matched pattern.
- The second row holds the position in the time series where each matched pattern begins.
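
The shape of this result can be illustrated with a brute-force sliding-window search in Python. This is a sketch of the general idea only: `sliding_l2_topk` is a hypothetical name, it compares raw values without any normalization, and `.ai.tss.tss` may normalize the data or use a much faster algorithm internally.

```python
import heapq
import math

def sliding_l2_topk(series, query, k):
    """Return (distances, start_indices) of the k closest windows, nearest first."""
    m = len(query)
    scored = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(window, query)))
        scored.append((dist, start))
    best = heapq.nsmallest(k, scored)  # sorted by distance, then by start index
    return [d for d, _ in best], [i for _, i in best]

series = [5, 5, 1, 2, 3, 5, 5, 1, 2, 4, 5]
query = [1, 2, 3]
dists, starts = sliding_l2_topk(series, query, 2)

print(dists)   # [0.0, 1.0]
print(starts)  # [2, 7]
```

As in the q output, the first list holds the distances and the second holds the start positions of the matching windows.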

For more information on TSS, see our [TSS Tutorial](https://github.com/RyanSieglerKX/tutorials/blob/main/KDB-X/Modules/ai-libs/1.TSS_Search_by_NYSETrades.md).

#### DTW Search Query

Here we run our first DTW query using the `.ai.dtw.search` function.

The code `select .ai.dtw.search[price;q;5;0.0;::] from trade` performs a Dynamic Time Warping (DTW) search on the `price` column of the `trade` table using the query vector `q`. 

The `.ai.dtw.search` function compares the query pattern against the time series data, identifying the top 5 most similar subsequences based on DTW distance. The fourth parameter, `0.0`, is the window: the fraction of the query length by which the alignment is allowed to warp. The larger the window, the more flexible and the more time-consuming the search is.

In this case, since the window is `0.0`, it is identical to running a basic TSS search.
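
The effect of the window can be sketched in Python with a band-constrained DTW (a Sakoe-Chiba band). This is an illustration under stated assumptions, not the KDB-X implementation: the function name is hypothetical, the window is converted to a whole number of samples by truncation, and the local cost is the squared difference with a final square root so that a zero window reduces to the L2 distance.

```python
import math

def dtw_windowed(a, b, window):
    """DTW between equal-length sequences, warping limited to a band.

    `window` is a fraction of the sequence length: element i of `a` may only
    be aligned with element j of `b` when |i - j| <= int(window * len(a)).
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("sequences must have the same length")
    r = int(window * n)  # band half-width in samples (truncation is an assumption)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])

a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0, 0]

print(dtw_windowed(a, b, 0.0))   # 2.0, same as the L2 distance
print(dtw_windowed(a, b, 0.25))  # 0.0, the one-step shift is absorbed
```

Widening the band can only lower or preserve the distance, and it increases the number of cells that must be evaluated, which is why larger windows take longer.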

In [36]:
%%q
select .ai.dtw.search[price;q;5;0.0;::] from trade

x                                          
-------------------------------------------
3.569454 3.633002 3.639012 3.7102  3.741339
2477787  2477786  2477785  2477788 3518322 


Let's try it again, but allow for some DTW window flexibility during the search by setting the window to `0.1`:

In [40]:
%%q
select .ai.dtw.search[price;q;5;0.1;::] from trade

x                                          
-------------------------------------------
2.629015 2.6606  2.814019 3.057903 3.123576
2314100  2314101 2314102  2314103  2477787 


Finally, we set the window to a higher value of `0.8`, to get even more time warping into our search. You might be able to tell that this search takes longer than the others.

In [39]:
%%q

select .ai.dtw.search[price;q;5;0.8;::] from trade

x                                           
--------------------------------------------
2.629015 2.660589 2.779003 2.930249 3.101474
2314100  2314101  2314102  2314103  2477787 


#### DTW searchRange Query

The DTW `searchRange` query gives us some additional flexibility by letting us set a maximum distance instead of a fixed number of results. Here we use a window of `0.2` and set the maximum distance to 3.0, meaning it will only return matches within a distance of 3.0.

In [52]:
%%q

select .ai.dtw.searchRange[price;q;0.2;3.0;::] from trade

x                                  
-----------------------------------
2.629015 2.660589 2.779003 2.930249
2314100  2314101  2314102  2314103 


#### DTW filterSearch Query

The DTW `filterSearch` query combines a cap on the number of results with a maximum distance. Here we limit the results to 3 and ensure the returned distances are below 3.0.

In [57]:
%%q

select .ai.dtw.filterSearch[price;q;3;0.2;3.0;::] from trade

x                         
--------------------------
2.629015 2.660589 2.779003
2314100  2314101  2314102 
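
The three query styles differ only in how they select results from the ranked candidates. The Python sketch below applies each selection rule to the five (distance, position) pairs returned by the `0.8` window search above. The helper names are hypothetical, and whether the library treats the distance threshold as inclusive is an assumption.

```python
def top_k(matches, k):
    """Closest k matches, like a plain search."""
    return sorted(matches)[:k]

def within(matches, max_dist):
    """All matches no farther than max_dist, like a range search."""
    # Inclusive comparison is an assumption; no value here equals the threshold
    return sorted(m for m in matches if m[0] <= max_dist)

def filtered(matches, k, max_dist):
    """At most k matches, all no farther than max_dist."""
    return within(matches, max_dist)[:k]

# (distance, start position) pairs copied from the window 0.8 search output
matches = [(2.629015, 2314100), (2.660589, 2314101), (2.779003, 2314102),
           (2.930249, 2314103), (3.101474, 2477787)]

print(len(top_k(matches, 5)))               # 5
print([i for _, i in within(matches, 3.0)])       # [2314100, 2314101, 2314102, 2314103]
print([i for _, i in filtered(matches, 3, 3.0)])  # [2314100, 2314101, 2314102]
```

The positions selected by the last two rules agree with the `searchRange` and `filterSearch` outputs shown above.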


### Conclusion

In this tutorial, we explored how Dynamic Time Warping (DTW) can be applied to financial time series data using KDB-X. We learned how to set up a local q process, load and prepare NYSE trade data, and use DTW-based search functions to identify similar patterns in price movements, even when those patterns occur at different speeds or times.