# Introduction 

The aim of the project is to run a preliminary test of a clustering procedure on multivariate road traffic time series in order to separate the different patterns observed across the days of the week. To build the multivariate time series, two fundamental variables are considered: the flow $q(t,x)$ and the density $\rho(t,x)$. The clustering technique is k-means, with soft-dynamic time warping (soft-DTW) used to compare the series. To check whether the clustering generalizes well across traffic dynamics, it is also applied to unseen data: the clustering is trained on a specific road segment and then tested on a different segment with similar boundary levels of flow, density and speed. To choose the most suitable number of clusters, the silhouette coefficient is computed and a soft-DTW-based similarity measure between the clusters is calculated.

In section 2 I present the data used, the two detectors considered (S60 and S1816) and the pre-processing strategy implemented to clean and normalize the raw data. In section 3 I explain the soft-DTW similarity measure between time series, the k-means algorithm and the silhouette coefficient used to evaluate the technique. In section 4 the results are shown: both detectors are used in turn to train the centroids and as test set. Finally, section 5 reports conclusions and outlook.

# Traffic Data 

To test the procedure I used data from the Minnesota Department of Transportation. The roads considered are I-35W and I-94; the road segments considered are shown in Figure 1.



\begin{figure}
    \centering
    \includegraphics{maps.png}
    \caption{Satellite image of I-35W and I-94}
\end{figure}
\emph{Figure 1. Satellite image of I-35W and I-94}




The two detectors considered are S60 on I-35W (northbound) and S1816 on I-94 (westbound). Although the detectors are on different roads, both segments have the same number of lanes (5) and similar boundary levels of flow, density and speed in 2013. In addition, neither segment has ramps within 500 meters before or after the detector. The flow, density and speed measurements are downloaded from http://data.dot.state.mn.us/datatools/. For both detectors all lanes are aggregated together and the 30-minute averages of each variable are computed from 01/01/2013 00:00 to 31/12/2013 23:30. The procedure is also tested with 6-minute averages (see Appendix).

S60 detector boundary levels (30-minute averages):

- $\rho(t,x)$: $\min = 0.37$ veh/km, $\max = 91.84$ veh/km

- $q(t,x)$: $\min = 74$ veh/h, $\max = 6074$ veh/h

- $v(t,x)$: $\min = 7.02$ km/h, $\max = 98.65$ km/h

S1816 detector boundary levels (30-minute averages):

- $\rho(t,x)$: $\min = 0.7216$ veh/km, $\max = 90.5739$ veh/km

- $q(t,x)$: $\min = 112$ veh/h, $\max = 7266$ veh/h

- $v(t,x)$: $\min = 8.509$ km/h, $\max = 86.79$ km/h




# Pre-process

Because of the long period considered (01/01/2013 to 31/12/2013), both detectors have missing values. Missing values typically arise when a detector is out of order, so that no flow or density measurement is recorded for a certain period. To reduce their impact on the normalization procedure and on the clustering algorithm, they are replaced through an imputation process. If, for example, the data on 14/03/2013 at 9:30 are missing, the imputed value is the median of the data on 01/03, 07/03, 21/03 and 28/03 at 9:30. Taking the median of the four observations closest to the missing one in terms of temporal alignment reduces the possible impact of public holidays on the imputation.
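
The imputation rule can be illustrated with a short sketch. It is not the code used in the project: the function name `impute_same_slot_median` is hypothetical, the data are assumed to be stored as an array with one row per day and one column per time slot, and the four reference days are assumed to be 7 and 14 days before and after the missing one.

```python
import numpy as np

def impute_same_slot_median(data, offsets=(-14, -7, 7, 14)):
    """Fill NaNs in a (n_days, n_slots) array with the median of the same
    time slot on neighbouring days (assumed offsets: +/- 1 and 2 weeks)."""
    filled = data.copy()
    n_days = data.shape[0]
    for day, slot in zip(*np.where(np.isnan(data))):
        # Collect the same slot on the reference days that fall inside the period.
        refs = [data[day + k, slot] for k in offsets if 0 <= day + k < n_days]
        refs = [v for v in refs if not np.isnan(v)]
        if refs:  # leave the gap untouched if no reference value exists
            filled[day, slot] = np.median(refs)
    return filled

# Toy example: 29 days x 2 slots, one missing value on day 14, slot 1.
flow = np.arange(58, dtype=float).reshape(29, 2)
flow[14, 1] = np.nan
imputed = impute_same_slot_median(flow)
print(imputed[14, 1])
```

The reference values are always read from the original array, so an imputed value is never used to impute another one.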

The flow and density time series are pre-processed by min-max normalization over the whole period (17520 observations for 30-minute averages and 87600 observations for 6-minute averages). This scaler maps each output time series into the range [0,1], giving identical scales to time series whose original units differ (veh/h and veh/km):
$\rho_{std}(t,x)=\frac{\rho(t,x)-\rho_{\min}}{\rho_{\max}-\rho_{\min}}$ and $q_{std}(t,x)=\frac{q(t,x)-q_{\min}}{q_{\max}-q_{\min}}$, where the minimum and maximum of each variable are taken over the whole period.

The MinMax scaler transforms each value of the time series proportionally within the range [0,1], preserving the shape of the series. The scaled density and flow series are not amplitude invariant: they do not have the same standard deviation (which standardization would give instead).
As a consequence, in the clustering procedure the scaled density and flow series do not carry the same weight in explaining the within-cluster variance: a single variable could be responsible for a large part of the variance inside a specific cluster.
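
A minimal sketch of the scaling, assuming each variable is held in a one-dimensional NumPy array covering the whole period. The helper `min_max_scale` is hypothetical, and the toy values only reuse the S60 minimum and maximum listed above with an arbitrary intermediate value.

```python
import numpy as np

def min_max_scale(series):
    """Scale a 1-D series to [0, 1] using its global minimum and maximum."""
    lo, hi = np.nanmin(series), np.nanmax(series)
    return (series - lo) / (hi - lo)  # assumes the series is not constant

# Toy values: S60 extremes plus one arbitrary intermediate observation.
flow = np.array([74.0, 3000.0, 6074.0])   # veh/h
density = np.array([0.37, 40.0, 91.84])   # veh/km

flow_std, density_std = min_max_scale(flow), min_max_scale(density)
# Both series now span [0, 1], but their standard deviations still differ.
print(flow_std.std(), density_std.std())
```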






# Methodology 

After cleaning and pre-processing the data so as to obtain one multivariate time series for every day of the year, I need a strategy to compare different series, both to assign each series to a cluster and to update the centroids in the k-means algorithm. Dynamic Time Warping (DTW) is a technique that measures the similarity between two temporal sequences by considering not only the point-by-point temporal alignment but every admissible binary alignment of the two series. For example, a similar traffic condition in terms of flow and density can be recognized even when it occurs at different hours of the day in two different series.

Given two multivariate series that correspond to two different days:

$x \in \mathbb{R}^{2 \times n}$ and $y \in \mathbb{R}^{2 \times n}$, whose points take values in $\mathbb{R}^2$ (flow and density).

Consider a function that compares individual points of the two series, $x_i \in \mathbb{R}^{2}$ and $y_j \in \mathbb{R}^{2}$: $d : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$, such as the Euclidean distance $d(x_i, y_j)= \lVert x_i - y_j \rVert_2$.

A pairwise cost matrix is computed:

$\Delta(x,y) := [d(x_i,y_j)]_{i,j} \in \mathbb{R}^{n \times n}$

Finally, for each element $x_i$ of series $x$, DTW selects the matching point in $y$ used for the similarity calculation. Denoting by $\mathcal{A}_{n,n} \subset \{0, 1\}^{n \times n}$ the set of all binary alignment matrices, the DTW similarity measure reads as follows:

$DTW(x,y)= \min_{A \in \mathcal{A}_{n,n}} \langle A,\Delta(x,y) \rangle$

This creates a warped “path” between $x$ and $y$  that aligns each point in $x$ to the nearest point in $y$. 
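
The minimum over all alignment matrices does not have to be enumerated explicitly: it can be computed with the classical dynamic-programming recursion. The sketch below is only an illustration under stated assumptions: the function name `dtw` is hypothetical, and each series is stored with shape `(n, 2)` (one row per time step) rather than $2 \times n$.

```python
import numpy as np

def dtw(x, y):
    """DTW between series of shape (n, 2) and (m, 2) with Euclidean point cost."""
    n, m = len(x), len(y)
    # Pairwise cost matrix Delta(x, y), shape (n, m).
    delta = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # r[i, j] = cost of the best alignment of x[:i] with y[:j].
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(r[i - 1, j], r[i, j - 1], r[i - 1, j - 1])
            r[i, j] = delta[i - 1, j - 1] + best_previous
    return r[n, m]

# The same peak shifted by one time step.
x = np.array([[0, 0], [1, 1], [0, 0], [0, 0]], dtype=float)
y = np.array([[0, 0], [0, 0], [1, 1], [0, 0]], dtype=float)
pointwise = np.linalg.norm(x - y, axis=1).sum()  # rigid, step-by-step comparison
print(dtw(x, y), pointwise)
```

In this toy case the warping absorbs the shift, so the DTW value is zero while the step-by-step comparison is not.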

However, dynamic time warping cannot be called a distance because it does not satisfy the triangle inequality; moreover, it is not differentiable everywhere because of the $\min$ operator.

Soft-Dynamic Time Warping is a differentiable variant of DTW. It uses the log-sum-exp formulation \footnote{Cuturi, Blondel. Soft-DTW: a Differentiable Loss Function for Time-Series}:

$DTW^{\gamma}(x,y)= - \gamma \log \sum_{A \in \mathcal{A}_{n,n}} \exp\left(- \frac{\langle A,\Delta(x,y) \rangle}{\gamma}\right)$
  where $\mathcal{A}_{n,n}$ is the set of all binary alignment matrices and $\gamma > 0$.
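
In practice the sum over all alignments is evaluated with the same recursion as DTW, replacing the hard minimum by a soft minimum, as described in the cited paper. The following sketch is an illustration, not the implementation used in the project: `softmin` and `soft_dtw` are hypothetical names, the point cost is the Euclidean distance defined above, and series are stored with shape `(n, 2)`.

```python
import numpy as np

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    vmax = v.max()
    return -gamma * (vmax + np.log(np.exp(v - vmax).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between series of shape (n, 2) and (m, 2), with gamma > 0."""
    n, m = len(x), len(y)
    delta = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            previous = [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]]
            r[i, j] = delta[i - 1, j - 1] + softmin(previous, gamma)
    return r[n, m]

x = np.array([[0, 0], [1, 1], [0, 0], [0, 0]], dtype=float)
y = np.array([[0, 0], [0, 0], [1, 1], [0, 0]], dtype=float)
# A small gamma stays close to plain DTW; a larger gamma smooths more.
sharp = soft_dtw(x, y, gamma=1e-3)
smooth = soft_dtw(x, y, gamma=1.0)
print(sharp, smooth)
```

Because the soft minimum is never larger than the hard minimum, the soft-DTW value never exceeds the DTW value and can be negative.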
  








The "path" created between $x$ and $y$ is smoother than the one created with DTW. Soft-DTW depends on a hyper-parameter $\gamma$ that controls the smoothing, as shown in Figure 2. DTW corresponds to the limit case $\gamma \to 0$.





\begin{figure}
    \centering
    \includegraphics{softdtw.png}
    \caption{}
\end{figure}
\emph{Figure 2. Soft-DTW hyperparameter behaviour }



Soft-DTW is used in the k-means algorithm both to assign the series to the clusters and to update the centroids of the clusters (the centroid of a cluster is the multivariate time series that minimizes the sum of the soft-DTW measures between itself and all the time series inside the cluster). The input is the set of 365 multivariate time series, each composed of the daily observations (48 with 30-minute averages, 240 with 6-minute averages) of both the flow and the density dimensions:


$$
\begin{cases}
\textbf{Algorithm} \ \ \text{k-means clustering} \ (T,k) \\
\quad \text{Input: } T = (t_1, t_2, ..., t_{365}) \\
\quad \text{Input: } k \text{, the number of clusters} \\
\quad \text{Output: } \{c_1, c_2, ..., c_k\} \text{ (set of bi-dimensional cluster centroids)} \\
\quad p=0 \\
\quad \text{Randomly choose } k \text{ objects and use them as initial centroids } \{c_1^{(0)}, c_2^{(0)}, ..., c_k^{(0)}\} \\
\quad \textbf{repeat} \\
\qquad \text{Assign each data point to the cluster with the nearest centroid using soft-DTW} \\
\qquad p=p+1 \\
\qquad // \ \ \textbf{Centroid update} \\
\qquad \textbf{for} \ \ j=1 \ \ \text{to} \ \ k \ \ \textbf{do} \\
\qquad \quad \text{Update the centroid } c_j^{(p)} \text{ of cluster } j \text{ using soft-DTW} \\
\qquad \textbf{end for} \\
\quad \textbf{until} \ \ c_j^{(p)} \approx c_j^{(p-1)}, \ \ j = 1, 2, ..., k \\
\quad \text{Return } c_1, c_2, ..., c_k
\end{cases}
$$
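
The structure of the loop can be sketched in a few lines. The sketch below is a deliberate simplification and not the procedure used in the project: it assumes a precomputed matrix `dist` of pairwise soft-DTW values between the daily series, and it restricts each centroid to be a member of its cluster (a medoid) instead of a free soft-DTW barycenter. The function name `kmedoids_from_distances` is hypothetical, and the toy matrix is built from scalar points only to keep the example short.

```python
import numpy as np

def kmedoids_from_distances(dist, k, max_iter=100, seed=0):
    """Cluster items given a precomputed (n, n) dissimilarity matrix.

    Simplification: each centroid is a cluster member (medoid),
    not a free soft-DTW barycenter.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial centroids
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every series.
        labels = np.argmin(dist[:, medoids], axis=1)
        # Update step: member minimizing the summed dissimilarity to its cluster.
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:  # keep the old centroid if the cluster is empty
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):  # centroids stopped changing
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

# Toy dissimilarity matrix standing in for pairwise soft-DTW values.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
dist = np.abs(points[:, None] - points[None, :])
labels, medoids = kmedoids_from_distances(dist, k=2)
print(labels, medoids)
```

Replacing the medoid search by a minimization over all possible series would give the barycenter update described in the algorithm above.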


Soft-DTW can also be used as the dissimilarity measure to evaluate the cohesion inside each cluster and the separation from the nearest cluster. The silhouette coefficient measures how similar a time series is to its own cluster (cohesion) compared to the other clusters (separation). It can be computed with the soft-DTW measure and takes values in the range [-1, 1]:

Assume that the time series have been clustered via k-means.
For a time series $t_i \in C_k$ (time series $i$ in cluster $C_k$),
$a(t_i)$ is the mean distance between $t_i$ and all other time series in the same cluster:

$a(t_i)= \frac{1}{| C_k | -1} \sum_{t_j \in C_k,\ j \neq i} DTW^{\gamma}(t_i,t_j)$.

$b(t_i)$ is the mean dissimilarity between the time series $t_i \in C_k$ and the nearest cluster $C_z$ (with $C_z \neq C_k$), that is, the smallest mean distance from $t_i$ to all the time series of another cluster:

$b(t_i) = \min_{z \neq k} \frac{1}{|C_z|}\sum_{t_j \in C_z} DTW^{\gamma}(t_i,t_j)$.

Finally, the silhouette coefficient of a time series is computed as follows:

$s(t_i) = \frac{b(t_i)-a(t_i)}{\max\{a(t_i),b(t_i)\}}, \quad \text{if } |C_k| > 1$

The coefficients of all time series are averaged to obtain a global measure.
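
The definitions of $a(t_i)$, $b(t_i)$ and $s(t_i)$ translate directly into code once the pairwise dissimilarities are available. The sketch below is illustrative only: `silhouette_from_distances` is a hypothetical name, `dist` is assumed to be a precomputed matrix of pairwise soft-DTW values, at least two clusters are assumed, and singleton clusters are given $s(t_i) = 0$, which is a common convention but an assumption here.

```python
import numpy as np

def silhouette_from_distances(dist, labels):
    """Mean silhouette coefficient from a precomputed (n, n) dissimilarity matrix."""
    labels = np.asarray(labels)
    index = np.arange(len(labels))
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        if same.sum() == 1:
            continue  # assumed convention: s = 0 for singleton clusters
        # a: mean dissimilarity to the other members of the own cluster.
        a = dist[i, same & (index != i)].mean()
        # b: smallest mean dissimilarity to any other cluster.
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != own)
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Toy dissimilarity matrix standing in for pairwise soft-DTW values.
points = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(points[:, None] - points[None, :])
good = silhouette_from_distances(dist, [0, 0, 1, 1])
bad = silhouette_from_distances(dist, [0, 1, 0, 1])
print(good, bad)
```

A partition that matches the structure of the data gives a mean coefficient close to 1, while a partition that mixes the two groups gives a clearly lower value.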


