# Import Essential Libraries

In [None]:
# python visualization libraries
import matplotlib.pyplot as plt
import matplotlib as mpl
# from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns
%matplotlib inline

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as subplots

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# Dynamic Time Warping
1. Data and some figures in this notebook are based on https://pages.databricks.com/rs/094-YMS-629/images/dynamic-time-warping-background.html
2. https://en.wikipedia.org/wiki/Dynamic_time_warping


# Dynamic Time Warping Background


<table width=100% border="0">
  <tr>
    <td width=50% valign="top">
      <span style="font-family:helvetica,arial;line-height: 1.6">
      The objective of time series comparison methods is to produce a <b>distance</b> metric between two input time series.  
      Dynamic time warping is a seminal time series comparison technique that has been used for speech recognition and word recognition since the 1970s with sound waves as the source; an often cited paper is <a href="https://ieeexplore.ieee.org/document/1171695" target="_blank">Dynamic time warping for isolated word recognition based on ordered graph searching techniques</a>. 
      <br/><br/>
      This technique can be used not only for pattern matching but also for anomaly detection (e.g. overlaying time series from two disjoint time periods to see whether the shape has changed significantly, or to examine outliers).  For example, in the graph to the right, traditional time series matching (i.e. Euclidean matching) of the red and blue lines is extremely restrictive because it only compares points at the same time index.  On the other hand, dynamic time warping allows the two curves to match up even though the x-axis (i.e. time) is not necessarily in sync. Another way to think of this is as a robust dissimilarity score where a <b>lower number</b> means the series are more similar.
      <br/><br/>
      Two time series (the base time series and new time series) are considered <b>similar</b> when a function <em>f(x)</em> can map one onto the other according to the following rules, matching their magnitudes along an optimal (warping) path. 
      <br/><br/>
      <center>
      <img src="https://pages.databricks.com/rs/094-YMS-629/images/dtw-rules-formula.png" width="400"/>
      </center>
      </span>
    </td>
    <td width=50% align="center">
        <img src="https://upload.wikimedia.org/wikipedia/commons/6/69/Euclidean_vs_DTW.jpg" width=500 />
        <br/>
        <center>
          <span style="font-family:helvetica,arial">Source: Wiki Commons: <a href="https://commons.wikimedia.org/wiki/File:Euclidean_vs_DTW.jpg">File:Euclidean_vs_DTW.jpg</a></span>
        </center>
    </td>
  </tr>
</table>
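
The matching idea described above can be computed with a small dynamic program. The sketch below is a minimal, unoptimized illustration in plain Python. It is not the algorithm that `fastdtw` runs internally (that library computes an approximation of this result), and the function name `dtw_distance` and the absolute-difference point cost are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Exact DTW distance between two 1-D sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b[j - 1] is reused
                cost[i][j - 1],      # b advances, a[i - 1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


x = [1, 2, 3, 4, 5]
y = [0, 2, 3, 4, 0]
print(dtw_distance(x, y))  # 6.0
```

The first and last points of both series are always matched to each other, which is why the two mismatched endpoints (1 vs. 0 and 5 vs. 0) account for the whole distance here.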






# Pre-requisites


## Install the [`fastdtw`](https://pypi.org/project/fastdtw/) PyPI library

In [None]:
!pip install fastdtw



In [None]:
# test if import works
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

## Test fastdtw library

In [None]:
df_test = pd.DataFrame({
    'x': [1,2,3,4,5], 
    'y': [0,2,3,4,0], 
    'time': [1,2,3,4,5]
  })
display(df_test)

# using plotly's go.Figure
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = df_test['time'], y = df_test['x'] )
)

fig.add_trace(
    go.Scatter(x = df_test['time'], y = df_test['y'] )
)
fig.show()

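# test dtw: fastdtw returns the cumulative DTW distance and the warping path;
# dist=euclidean is the point-to-point metric used along that path
distance, path = fastdtw(df_test['x'], df_test['y'], dist=euclidean)
print('DTW distance: ', distance)

Unnamed: 0,x,y,time
0,1,0,1
1,2,2,2
2,3,3,3
3,4,4,4
4,5,0,5


DTW distance:  6.0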
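
`fastdtw` returns two values: the cumulative distance and the warping path, a list of `(i, j)` index pairs saying which sample of the first series is matched with which sample of the second. To show where such a path comes from, the sketch below runs an exact DTW in plain Python and backtracks through the cost table. `dtw_with_path` is a hypothetical helper written for this example; it assumes non-empty inputs, and because ties can be broken differently, its path is not guaranteed to be identical to the one `fastdtw` reports.

```python
def dtw_with_path(a, b):
    """Exact DTW returning (distance, path); path is a list of (i, j) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Walk back from the last cell, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


distance, path = dtw_with_path([1, 2, 3, 4, 5], [0, 2, 3, 4, 0])
print(distance)  # 6.0
print(path)      # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
```

For this toy pair the best alignment is the plain diagonal, so warping brings no benefit; the audio clips later in the notebook produce paths in which one index repeats many times.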


## Mount Google Drive

In [18]:
#@title
import os

# mount drive
from google.colab import drive
drive.mount('/content/drive')


# edit this path if needed
my_path = '/content/drive/My Drive/Colab Notebooks/'

# change to this path
os.chdir(my_path)

# verify present working directory. It should be identical to 'my_path'
!pwd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/My Drive/Colab Notebooks


## Copy audio clips from GitHub
* Create a folder named "audio" in your Google Colab folder
* Copy audio clips from [here](https://github.com/smbillah/ist526/tree/main/audio) to your audio folder

# Understanding Audio Files

In [19]:
# import audio-related libraries
from scipy.io import wavfile

# more about .wav file
fs, data = wavfile.read("audio/doors-and-corners-kid_thats-where-they-get-you.wav")

## here:
# fs --> sampling rate (samples per second, Hz)
# data --> samples as an array of shape (num_samples, num_channels), e.g., left and right columns for stereo sound

display(data.shape)
display(data)

fig = px.line(y = data[:,0]) # left channel; a log scale (log_y = True) only suits non-negative values such as magnitudes
fig.show()

fig = px.line(y = data[:,1]) # right channel
fig.show()

Output hidden; open in https://colab.research.google.com to view.
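
To make the relationship between the sampling rate, the number of channels, and the clip duration concrete, the sketch below writes a short synthetic stereo tone to an in-memory WAV file with Python's standard `wave` module and reads its header back. The tone, the 8 kHz rate, and the helper name `make_stereo_wav` are assumptions for illustration; the course clips are not used here.

```python
import io
import math
import struct
import wave


def make_stereo_wav(fs=8000, seconds=0.5, freq=440.0):
    """Return the bytes of a 16-bit stereo WAV file holding a sine tone."""
    n_frames = int(fs * seconds)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)   # left and right
        w.setsampwidth(2)   # 2 bytes = 16-bit samples
        w.setframerate(fs)  # samples per second, per channel
        frames = bytearray()
        for k in range(n_frames):
            left = int(10000 * math.sin(2 * math.pi * freq * k / fs))
            right = left // 2  # quieter copy on the right channel
            frames += struct.pack("<hh", left, right)
        w.writeframes(bytes(frames))
    return buf.getvalue()


with wave.open(io.BytesIO(make_stereo_wav()), "rb") as w:
    fs = w.getframerate()
    n_channels = w.getnchannels()
    n_frames = w.getnframes()

duration = n_frames / fs  # seconds = samples per channel / sampling rate
print(fs, n_channels, n_frames, duration)  # 8000 2 4000 0.5
```

The same arithmetic applies to arrays returned by `wavfile.read`: the duration in seconds is `data.shape[0] / fs`.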

# Speech Matching Scenario
One of the most common use cases for dynamic time warping is the speech matching scenario.  In our use case, we will be matching various audio recordings (in WAV files) based on quotes from [The Expanse](https://www.amazon.com/The-Expanse-Season-1/dp/B018BZ3SCM).  There are four audio clips (you can listen to them below but this is not necessary) where three of them (clips 1, 2, and 4) are based on the quote <br/><br/> 

<blockquote>Doors and corners, kid. That's where they get you</blockquote>

and one clip (clip 3) is the quote <br/><br/>

<blockquote>You walk into a room too fast, the room eats you</blockquote>

## Loading/Playing Audio Clips

In [20]:
# Read stored audio files for comparison

# wavfile.read returns (sampling rate in Hz, samples); for stereo clips the samples have shape (num_samples, num_channels)
fs1, data1 = wavfile.read("audio/doors-and-corners-kid_thats-where-they-get-you.wav")
fs2, data2 = wavfile.read("audio/doors-and-corners-kid_thats-where-they-get-you-2.wav")
fs3, data3 = wavfile.read("audio/you-walk-into-a-room-too-fast_the-room-eats-you.wav")
fs4, data4 = wavfile.read("audio/doors-and-corners-kid.wav")

# Collapse stereo to mono by taking the max across channels (axis=1) for each sample
data1 = np.amax(data1, axis=1)
data2 = np.amax(data2, axis=1)
data3 = np.amax(data3, axis=1)
data4 = np.amax(data4, axis=1)

# debug
display(data1.shape, fs1)


(331712,)

48000
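
The cell above reduces each stereo clip to a single series by taking, for every sample, the larger of the two channel values. Because audio samples are signed, that choice keeps the less negative channel on the negative half of the wave; averaging the channels is a common alternative. The sketch below contrasts the two on made-up frames in plain Python. The helper names `to_mono_max` and `to_mono_mean` are hypothetical and not part of the notebook.

```python
def to_mono_max(frames):
    """Per-frame maximum across channels (what np.amax(data, axis=1) computes)."""
    return [max(frame) for frame in frames]


def to_mono_mean(frames):
    """Per-frame average across channels."""
    return [sum(frame) / len(frame) for frame in frames]


# Each tuple is one stereo frame: (left, right), signed amplitudes.
stereo = [(100, 60), (-100, -60), (0, 40), (-20, 20)]

print(to_mono_max(stereo))   # [100, -60, 40, 20]
print(to_mono_mean(stereo))  # [80.0, -80.0, 20.0, 0.0]
```

Either reduction can feed DTW, as long as the same one is applied to every clip being compared.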

In [21]:
# import audio-play library
import IPython
from IPython.display import Audio
# https://www.dev2qa.com/how-to-display-rich-output-media-audio-video-image-etc-in-ipython-jupyter-notebook/

display("Doors and Corners, Kid.  That's where they get you. [v1]", Audio("audio/doors-and-corners-kid_thats-where-they-get-you.wav"))
display("Doors and Corners, Kid. That's where they get you. [v2]", Audio("audio/doors-and-corners-kid_thats-where-they-get-you-2.wav"))
display("Doors and Corners, Kid. That's where they get you [v3]", Audio("audio/doors-and-corners-kid.wav"))
display("You walk into a room too fast, the room eats you. [new]", Audio("audio/you-walk-into-a-room-too-fast_the-room-eats-you.wav"))


# import subplot library
from plotly.subplots import make_subplots
#https://plotly.com/python/subplots/#setting-subplots-on-a-figure-directly

fig = make_subplots(
    rows=2, 
    cols=2, 
    subplot_titles = (
        "Doors and Corners, Kid. That's where they get you.", 
        "Doors and Corners, Kid. That's where they get you. (v2)", 
        "You walk into a room too fast, the room eats you.", 
        "Doors and Corners, Kid. That's where they get you. (v3)"
    )
)

# plot each clip's waveform in its own subplot

fig.add_trace(
    go.Scatter(y = data1),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(y = data2),
    row=1, col=2,    
)


fig.add_trace(
    go.Scatter(y = data3),
    row=2, col=1
)

fig.add_trace(
    go.Scatter(y = data4),
    row=2, col=2
)

fig.show()

Output hidden; open in https://colab.research.google.com to view.


# Visually compare

<table border=0 cellpadding=20 cellspacing=10>
<tr>
  <td colspan=2>
    <center>
    <span style="font-family:helvetica,arial;line-height: 1.6">
      When comparing two audio clips (in this example, clips 1 and 4), <br/>
      notice how even though both clips have the same intonation and words, the times are not in sync.
      <br/><br/>
      <img src="https://pages.databricks.com/rs/094-YMS-629/images/dtw-animated.gif" width="600" align="center"/>
      <br/><br/>
      If we follow the Euclidean matching approach, the magnitudes of the base and new time series are compared at the same time index even though the clips are not in sync. <br/>  
      But by using dynamic time warping, the time axis (x-axis) is shifted to sync the two clips.
    </span>  
    </center>
  </td>
</tr>
<tr>
  <td><img src="https://pages.databricks.com/rs/094-YMS-629/images/euclidean-matching.png" width="500"/>&nbsp;</td>
  <td><img src="https://pages.databricks.com/rs/094-YMS-629/images/dynamic-time-warping.png" width="500"/></td>
</tr>
</table>
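
The difference between the two matching strategies can be reproduced on a toy example. In the sketch below, two series contain the same pulse shifted by one time step; lock-step matching (comparing samples at the same index) reports a non-zero distance, while DTW aligns the pulses and reports zero. The series, the helper names, and the use of absolute differences for both measures are assumptions chosen to keep the comparison simple.

```python
def lockstep_distance(a, b):
    """Sum of absolute differences between samples at the same index."""
    if len(a) != len(b):
        raise ValueError("lock-step matching needs equal-length series")
    return sum(abs(p - q) for p, q in zip(a, b))


def dtw_distance(a, b):
    """Exact DTW distance with an absolute-difference point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


base = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0]  # same pulse, one step earlier

print(lockstep_distance(base, shifted))  # 4
print(dtw_distance(base, shifted))       # 0.0
```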

### Comparing Clip 1 and Clip 2
Comparing the base time series (trace 0) with a new time series [v2] (trace 1) of the same quote, where the intonation and speed differences are exaggerated.

In [22]:
from fastdtw import fastdtw

# Distance between clip 1 and clip 2
dist = fastdtw(data1, data2)

# dist[0]: cumulative DTW distance along the warping path
display('Distance between clip 1 and clip 2: ', dist[0])

# dist: the full (distance, warping path) tuple returned by fastdtw
display('Distance between clip 1 and clip 2: ', dist)


'Distance between clip 1 and clip 2: '

480148446.0

'Distance between clip 1 and clip 2: '

(480148446.0,
 [(0, 0),
  (1, 1),
  (2, 1),
  (3, 1),
  (4, 1),
  (5, 1),
  (6, 1),
  (7, 1),
  (8, 1),
  (9, 1),
  (10, 1),
  (11, 1),
  (12, 1),
  (13, 1),
  (14, 2),
  (15, 2),
  (16, 2),
  (17, 2),
  (18, 2),
  (19, 2),
  (20, 2),
  (21, 2),
  (22, 2),
  (23, 2),
  (24, 2),
  (25, 2),
  (26, 2),
  (27, 2),
  (28, 2),
  (29, 2),
  (30, 3),
  (31, 4),
  (32, 4),
  (33, 4),
  (34, 4),
  (35, 4),
  (36, 4),
  (37, 4),
  (38, 4),
  (39, 4),
  (40, 5),
  (41, 6),
  (42, 7),
  (43, 8),
  (44, 9),
  (44, 10),
  (44, 11),
  (44, 12),
  (45, 13),
  (45, 14),
  (45, 15),
  (45, 16),
  (46, 17),
  (47, 17),
  (48, 17),
  (49, 17),
  (50, 17),
  (51, 17),
  (52, 18),
  (53, 19),
  (54, 20),
  (54, 21),
  (54, 22),
  (54, 23),
  (54, 24),
  (55, 25),
  (55, 26),
  (55, 27),
  (55, 28),
  (55, 29),
  (56, 30),
  (57, 31),
  (58, 32),
  (58, 33),
  (58, 34),
  (58, 35),
  (58, 36),
  (58, 37),
  (58, 38),
  (59, 39),
  (59, 40),
  (59, 41),
  (59, 42),
  (60, 43),
  (60, 44),
  (60, 45),
  (60, 46

In [23]:
## visualize

fig = go.Figure()

fig.add_trace(go.Scatter(y = data1 , opacity=0.5))
fig.add_trace(go.Scatter(y = data2, opacity=0.5))    

fig.show()

Output hidden; open in https://colab.research.google.com to view.

### Comparing Clip 1 and Clip 3
Comparing the base time series (trace 0) with a new time series (trace 1) of a *different quote* spoken with the same intonation and speed.

In [24]:
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

fig = go.Figure()

fig.add_trace(go.Scatter(y = data1 , opacity=0.5))    
fig.add_trace(go.Scatter(y = data3, opacity=0.5))    

fig.show()

# Distance between clip 1 and clip 3
dist = fastdtw(data1, data3)
display('Distance between clip 1 and clip 3: ', dist[0])

Output hidden; open in https://colab.research.google.com to view.

### Comparing Clip 1 and Clip 4
Comparing the base time series (trace 0) with a new time series [v3] (trace 1) of the same quote spoken with the same intonation and speed.




In [25]:
fig = go.Figure()

fig.add_trace(go.Scatter(y = data1 , opacity=0.5))    
fig.add_trace(go.Scatter(y = data4, opacity=0.5))    

fig.show()

# Distance between clip 1 and clip 4
dist = fastdtw(data1, data4)
display('Distance between clip 1 and clip 4: ', dist[0])

Output hidden; open in https://colab.research.google.com to view.

## Discussion

By using `fastdtw`, we can quickly calculate the *distance* between two different time series - in this case audio patterns.  

| Base | Query | Distance |
| ---- | ----- | -------- |
| Clip 1 | Clip 2 | 480148446.0 |
| Clip 1 | Clip 3 | 310038909.0 |
| Clip 1 | Clip 4 | 293547478.0 |

Some quick observations:
* Clips 1 and 4 have the shortest distance because the audio clips have the same words and intonation.
* The distance between Clips 1 and 3 is also quite short (though longer than the distance to Clip 4): even though the words differ, the clips use the same intonation and speed.
* Clips 1 and 2 have the longest distance because of the extremely exaggerated intonation and speed, even though they use the same quote.

As you can see, one can use **dynamic time warping** to ascertain the similarity of two different time series.
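
One caveat when reading such a table: the reported distance is a sum over every matched pair on the warping path, so it tends to grow with clip length and amplitude. A common way to make scores from clips of different lengths easier to compare is to divide the distance by the number of pairs on the path. The sketch below illustrates that idea; it is not part of the original analysis, `normalized_dtw` is a hypothetical helper, and the two example results are made up rather than taken from the audio clips.

```python
def normalized_dtw(distance, path):
    """Average cost per aligned pair: cumulative distance / path length."""
    if not path:
        raise ValueError("path must contain at least one (i, j) pair")
    return distance / len(path)


# Made-up results in the (distance, path) shape that fastdtw returns.
short_pair = (6.0, [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)])
long_pair = (12.0, [(i, i) for i in range(20)])

print(normalized_dtw(*short_pair))  # 1.2
print(normalized_dtw(*long_pair))   # 0.6
```

In this made-up case the longer pair has the larger raw distance but the smaller cost per aligned sample. With the notebook's clips, the same helper could be applied to the tuple returned by `fastdtw(data1, data2)`.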

## Question A:

There are four heartbeat sounds in the data/audio folder (heartbeat1, heartbeat2, heartbeat3, and heartbeat4). Two of them were recorded from the same person under the same conditions; find out which two using time-series data visualization and a dynamic time warping algorithm.

(i) Visualize each sound clip and use visual perception to determine which two clips are likely to be similar (write down your thoughts and observations to receive full credit).

(ii) Use the fastdtw library (which implements the DTW algorithm) to calculate the distance for every pair of clips. In total, there are 6 pairs. Then, based on the distance scores, predict which two clips are similar.
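
Four clips give six unordered pairs. One way to enumerate them without writing six separate cells is `itertools.combinations`, as in the sketch below. The series are made-up stand-ins rather than the heartbeat recordings, and `toy_distance` is a placeholder metric, not DTW; in the notebook, a function such as `lambda a, b: fastdtw(a, b)[0]` could be passed instead.

```python
from itertools import combinations


def pairwise_distances(clips, distance_fn):
    """Return {(name_a, name_b): distance} for every unordered pair of clips."""
    return {
        (name_a, name_b): distance_fn(clips[name_a], clips[name_b])
        for name_a, name_b in combinations(sorted(clips), 2)
    }


def toy_distance(a, b):
    """Placeholder metric (sum of absolute differences); not DTW."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Made-up series standing in for four loaded clips.
clips = {
    "clip1": [0, 1, 2, 1],
    "clip2": [0, 1, 2, 2],
    "clip3": [5, 5, 5, 5],
    "clip4": [0, 3, 0, 3],
}

scores = pairwise_distances(clips, toy_distance)
for pair, score in sorted(scores.items(), key=lambda item: item[1]):
    print(pair, score)

closest = min(scores, key=scores.get)
print(len(scores), closest)  # 6 ('clip1', 'clip2')
```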