# Introduction 

The aim of the project is to run a preliminary test of a clustering procedure on multivariate road traffic time series, in order to separate different patterns across the days of the week and across different months. To create the multivariate time series two fundamental variables are considered: the flow $q(t,x)$ and the density $\rho(t,x)$. The clustering technique used is k-means with soft-dynamic time warping (soft-DTW) to compare the series. To see whether the clustering technique generalizes traffic dynamics well, it is also applied to unseen data (the clustering is trained on a specific segment of a road and then tested on a different segment with similar boundary levels of flow, density and speed). In order to decide the most suitable number of clusters, the silhouette coefficient is computed, as well as a similarity measure between nearest clusters based on soft-DTW.

In Section 2 I present the data used, the two detectors considered (S60 and S1816) and the pre-processing strategy implemented to treat and normalize the raw data. In Section 3 I explain the soft-DTW similarity measure between time series, the k-means algorithm and a set of metrics to evaluate the technique. In Section 4 the results are shown: both detectors are used in turn to train the centroids and as test set, the procedure is repeated for different aggregation times and only significant results are shown. Finally, in Section 5 conclusions and outlook are reported.

# Data and Pre-process  

To test the procedure I used data from the Minnesota Department of Transportation. The roads considered are I-35W and I-94; the road segments considered are shown in Figure 1.



\begin{figure}
    \centering
    \includegraphics{maps.png}
    \caption{Satellite image of I-35W and I-94}
\end{figure}
\emph{Figure 1. Satellite image of I-35W and I-94}




The two detectors considered are S60 on I-35W (northbound) and S1816 on I-94 (eastbound). Although the detectors are on different roads, both segments have the same number of lanes (5) and similar boundary levels of flow, density and speed in 2013. In addition, neither segment has ramps within 500 meters before or after the detector. The flow, density and speed measurements are downloaded from http://data.dot.state.mn.us/datatools/. For both detectors all lanes are aggregated together and the 30-minute averages of the three variables from 01/01/2013 00:00 to 31/12/2013 23:30 are computed (17520 observations). The procedure is then repeated with 6-minute averages (87600 observations).

S60 detector boundary levels (30-minute averages):

- $\rho(t,x)$: $MIN=0.37$ veh/km, $MAX=91.84$ veh/km

- $q(t,x)$: $MIN=74$ veh/h, $MAX=6074$ veh/h

- $v(t,x)$: $MIN=7.02$ km/h, $MAX=98.65$ km/h

S1816 detector boundary levels (30-minute averages):

- $\rho(t,x)$: $MIN=0.7216$ veh/km, $MAX=90.5739$ veh/km

- $q(t,x)$: $MIN=112$ veh/h, $MAX=7266$ veh/h

- $v(t,x)$: $MIN=8.509$ km/h, $MAX=86.79$ km/h




Because of the long period considered (from 01/01/2013 to 31/12/2013), missing values are present for both detectors. Missing values generally arise when a detector is out of order, so that no flow and density measurements are recorded for a certain period. To reduce their impact on the normalization procedure and on the clustering algorithm, an imputation process is performed. If for example the data on 14/03/2013 at 9:30 are missing, the imputation is done by taking the median of the data on 01/03, 07/03, 21/03 and 28/03 at 9:30. Taking the median of the four observations closest to the missing one with respect to the temporal shift reduces the possible impact of public holidays on the imputation procedure.
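The imputation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `impute_from_neighbours` and the density values are hypothetical, and the four reference timestamps are simply the ones listed in the example.

```python
from datetime import datetime
from statistics import median


def impute_from_neighbours(series, neighbours):
    """Median of the values observed at the given neighbour timestamps."""
    values = [series[t] for t in neighbours if series.get(t) is not None]
    if not values:
        raise ValueError("no neighbour observation available")
    return median(values)


# Hypothetical density values (veh/km) at 9:30 on the reference days.
density = {
    datetime(2013, 3, 1, 9, 30): 31.0,
    datetime(2013, 3, 7, 9, 30): 34.5,
    datetime(2013, 3, 21, 9, 30): 33.0,
    datetime(2013, 3, 28, 9, 30): 12.0,   # an unusually quiet day
    datetime(2013, 3, 14, 9, 30): None,   # missing observation
}

missing = datetime(2013, 3, 14, 9, 30)
neighbours = [t for t in density if t != missing]
density[missing] = impute_from_neighbours(density, neighbours)
print(density[missing])  # 32.0
```

With the median, the unusually low value on 28/03 does not pull the imputed value down as a mean would.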

The flow and density time series are pre-processed using min-max normalization over the whole period (17520 observations for 30-minute averages and 87600 observations for 6-minute averages). This scaling maps each series to the range [0,1], giving identical scales to series with originally different units (veh/h and veh/km):
$$\rho_{norm}(t,x)=\frac{\rho(t,x)-MIN}{MAX-MIN}$$
$$q_{norm}(t,x)=\frac{q(t,x)-MIN}{MAX-MIN}$$
where $MIN$ and $MAX$ are the minimum and maximum of the corresponding variable over 2013.


The MinMaxScaler function from the Python machine learning library $\textbf{Scikit-learn}$ \footnote{Pedregosa et al.: Scikit-learn: Machine Learning in Python} transforms each value of the time series proportionally into the range [0,1], preserving the shape of the series. The scaled density and flow series are not amplitude invariant: they do not have the same standard deviation (which would instead be obtained with standardization).
As a consequence, in the clustering procedure the scaled density and flow series do not have the same importance in explaining the variance within a cluster: a single variable could be responsible for a large part of the variance inside a specific cluster.
The inverse_transform method of MinMaxScaler, which undoes the scaling of a data point according to feature_range,
$$\rho(t,x)=\rho_{norm}(t,x)*(MAX-MIN)+ MIN$$
$$q(t,x)=q_{norm}(t,x)*(MAX-MIN)+ MIN$$
is used in the centroid representation, so that the patterns can be read in the original units rather than in the [0,1] range.
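The scaling and its inverse can be checked numerically with the following sketch. It applies the two formulas above column by column with NumPy, which mirrors what MinMaxScaler does for each feature with its default feature_range of (0, 1). The extreme rows of the toy array reuse the S60 boundary levels, while the middle row is hypothetical.

```python
import numpy as np

# Column 0 = flow (veh/h), column 1 = density (veh/km).
raw = np.array([[74.0, 0.37],
                [3074.0, 46.105],   # hypothetical mid-range observation
                [6074.0, 91.84]])

col_min = raw.min(axis=0)
col_max = raw.max(axis=0)

# Forward scaling to [0, 1], one MIN and MAX per variable.
scaled = (raw - col_min) / (col_max - col_min)

# Inverse scaling back to the original units.
restored = scaled * (col_max - col_min) + col_min

print(scaled)
```

Both variables end up on the same [0, 1] scale, and `restored` matches `raw` up to floating-point error.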

Although $MIN$ and $MAX$ for $\rho(t,x)$ and $q(t,x)$ are defined over the whole of 2013, the clustering procedure is applied to the 365 daily multivariate time series, with 48 daily observations for 30-minute averages and 240 for 6-minute averages. The two variables are scaled with respect to the entire year, and not with respect to each of the 365 daily series, in order to preserve the variability between different days of the week. For example, I simply assume that the $MAX$ density reached on a Saturday or Sunday is much lower than the $MAX$ density reached on working days.

To create the 365 daily normalized multivariate time series a simple $\textbf{reshape}$ procedure is applied: 

$[ts*n,d] \implies [ts,n,d]$ where $ts=365$ is the number of time series, $n$ is the number of daily observations (48 or 240 depending on the aggregation time) and $d=2$ is the dimensionality of each time series (flow and density).
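As a small illustration of this reshape, the sketch below uses NumPy on a placeholder array. It assumes that the rows of the yearly table are in chronological order, so that every block of $n$ consecutive rows is one day; the array contents are dummy values, not traffic data.

```python
import numpy as np

ts, n, d = 365, 48, 2   # days, observations per day (30-minute averages), variables

# Placeholder for the scaled yearly table: one row per time stamp,
# columns = (flow, density).
yearly = np.arange(ts * n * d, dtype=float).reshape(ts * n, d)

# One multivariate series of shape (n, d) per day.
daily = yearly.reshape(ts, n, d)

print(daily.shape)  # (365, 48, 2)
```

Row 48 of the yearly table becomes the first observation of the second day, `daily[1, 0]`.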







# Methodology 

## Soft-Dynamic Time Warping 

After treating and pre-processing the data so as to have one multivariate time series for every day of the year, I have to define a strategy to compare different series, both to assign each series to a cluster and to update the centroids in the k-means algorithm. Dynamic Time Warping (DTW) is a technique to measure the similarity between two temporal sequences that considers not only the point-by-point temporal alignment but every admissible alignment of the two series. For example, a similar traffic condition in terms of flow and density could be recognized at different hours of the day in two different series. The calculation of the DTW distance involves a dynamic programming algorithm that finds the optimal warping path between two series under certain constraints.

Given two multivariate series that corresponds to two different days:

$x \in R^{2 \times n}$ and $y \in R^{2 \times n}$, with points valued in $R^2$ (flow and density). Since in this case $x$ and $y$ have the same length $n$, DTW can be computed in $O(n^2)$ time.

Consider a function $d : R^2 \times R^2 \to R$ that compares single points of the two series, $x_i \in R^{2}$ and $y_j \in R^{2}$, such as $d(x_i, y_j)= \sum_{m=1}^{2} |x_{i,m} - y_{j,m}|^p$, where usually $p=2$, so that $d(x_i, y_j)$ is the squared Euclidean distance between the two vectors.

A matrix of similarity is computed: 

$\Delta(x,y) := [d(x_i,y_j)]_{i,j} \in R^{n \times n}$

$\Delta(x,y)$ is also called the local cost matrix; such a matrix must be created for every pair of series compared.
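To make the local cost matrix concrete, here is a minimal NumPy sketch with the squared Euclidean cost ($p=2$). The function name `local_cost_matrix` and the toy series are illustrative assumptions. The series are stored with one row per time step, shape $(n, 2)$, which is the transpose of the $2 \times n$ notation used above.

```python
import numpy as np


def local_cost_matrix(x, y):
    """Squared Euclidean cost between every point of x and every point of y.

    x has shape (n, 2) and y has shape (m, 2); the result has shape (n, m).
    """
    diff = x[:, None, :] - y[None, :, :]   # shape (n, m, 2)
    return (diff ** 2).sum(axis=-1)


# Toy series with three time steps; columns = (flow, density), scaled.
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0]])

delta = local_cost_matrix(x, y)
print(delta)
```

Entry `delta[i, j]` is the cost of aligning time step `i` of `x` with time step `j` of `y`, so the matrix has one entry per pair of time steps regardless of the number of variables.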

The DTW algorithm finds the path that minimizes the alignment cost between $x$ and $y$ by iteratively stepping through $\Delta(x,y)$, starting at $[d(x_i,y_j)]_{1,1}$ and finishing at $[d(x_i,y_j)]_{n,n}$, and aggregating the cost \footnote{Sardà-Espinosa:"Comparing Time-Series Clustering Algorithms in R Using the dtwclust Package"}. At each step, the algorithm finds the direction in which the cost increases the least under the chosen constraints. These constraints typically consist in forcing paths to lie close to the diagonal.

Denoting by $A_{n,n} \subset \{0, 1\}^{n \times n}$ the set of all binary alignment matrices, the DTW similarity measure reads as follows:


\begin{equation}
DTW(x,y)= \min_{A \in A_{n,n} } \langle\,A,\Delta(x,y) \rangle
\end{equation}

This creates a warped “path” between $x$ and $y$  that aligns each point in $x$ to the nearest point in $y$. 

However, dynamic time warping cannot be defined as a distance because it does not satisfy the triangle inequality; moreover, it is not differentiable everywhere because of the $\min$ operator.
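The dynamic programming behind the DTW equation can be sketched as follows. This is a plain, unconstrained version written for illustration (no window forcing the path near the diagonal), with the squared Euclidean local cost; the function name `dtw` and the toy series are assumptions, and this is not the code used for the experiments.

```python
import numpy as np


def dtw(x, y):
    """DTW cost between x (n, d) and y (m, d) with squared Euclidean local cost."""
    n, m = len(x), len(y)
    delta = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # r[i, j] = cheapest cost of aligning the first i points of x
    # with the first j points of y.
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(r[i - 1, j], r[i, j - 1], r[i - 1, j - 1])
            r[i, j] = delta[i - 1, j - 1] + best_previous
    return r[n, m]


# Same bump in flow and density, one time step later in y than in x.
x = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
y = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])

pointwise = ((x - y) ** 2).sum()   # no warping allowed
warped = dtw(x, y)                 # best alignment
print(pointwise, warped)           # 4.0 0.0
```

The two toy days have the same shape shifted by one time step: the point-by-point cost is 4, while DTW aligns the two bumps and returns 0.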

Soft-Dynamic Time Warping is a differentiable variant of DTW. It uses the log-sum-exp formulation \footnote{Cuturi, Blondel: "Soft-DTW: a Differentiable Loss Function for Time-Series"}:

\begin{equation}
DTW^{\gamma}(x,y)= - \gamma \log \sum_{A \in A_{n,n}} \exp\left(- \frac{\langle\,A,\Delta(x,y) \rangle}{\gamma}\right)
  \ \ \text{where} \ \ \gamma  > 0
\end{equation}

  

Despite considering all alignments and not just the optimal one, soft-DTW can be computed in quadratic time $O(n^2)$, like DTW; however, like DTW, soft-DTW does not satisfy the triangle inequality. Soft-DTW is a symmetric similarity measure, it supports multivariate series as DTW does, and it can provide differently smoothed results by means of a user-defined parameter $\gamma$.
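The quadratic cost comes from replacing the hard minimum of the DTW recursion with a smoothed minimum, as described by Cuturi and Blondel. The sketch below is a minimal illustration of that recursion, not the tslearn implementation; the names `softmin` and `soft_dtw` and the toy series are assumptions.

```python
import numpy as np


def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float)
    lowest = v.min()
    return lowest - gamma * np.log(np.exp(-(v - lowest) / gamma).sum())


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between x (n, d) and y (m, d) with squared Euclidean local cost."""
    n, m = len(x), len(y)
    delta = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            previous = [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]]
            r[i, j] = delta[i - 1, j - 1] + softmin(previous, gamma)
    return r[n, m]


# Same bump in flow and density, one time step later in y than in x.
x = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
y = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])

for gamma in (1.0, 0.1, 0.001):
    print(gamma, soft_dtw(x, y, gamma))
```

On this toy pair the DTW cost is 0. The soft-DTW value lies below it for $\gamma=1$, because every alignment contributes to the sum, and it approaches 0 as $\gamma$ becomes small.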






The "path" created between $x$ and $y$ is smoother than the one created with DTW. Soft-DTW depends on a hyper-parameter $\gamma$ that controls the smoothing. As shown in Figure 2.1 and in the equation below, DTW corresponds to the limit case $\gamma = 0$.


\begin{equation}
DTW^{\gamma}(x,y)=\begin{cases}
                     \min_{A \in A_{n,n} } \langle\,A,\Delta(x,y) \rangle ,\ \  \gamma=0 \\
                     - \gamma \log \sum_{A \in A_{n,n}} \exp\left(- \frac{\langle\,A,\Delta(x,y) \rangle}{\gamma}\right), \ \ \gamma > 0 \\
                     \end{cases}
\end{equation}



By default the $\gamma$ hyperparameter is set to 1. 


\begin{figure}
    \centering
    \includegraphics{softdtw.png}
    \caption{}
\end{figure}
\emph{Figure 2.1 Soft-DTW hyperparameter behaviour }



## k-means algorithm

Partitioning methods, such as k-means and k-medoids, create the clusters iteratively by moving data points from one cluster to another, based on a distance measure, starting from an initial partitioning \footnote{k-medoids clustering is very similar to the k-means clustering algorithm. The major difference between them is that while a cluster is represented by its center in the k-means algorithm, it is represented by the most centrally located data point of the cluster in k-medoids clustering}.

Turning to time series clustering, the procedure has been applied to both 30-minute and 6-minute averages; even if the number of data points is substantially large (365 multivariate time series with 240 daily observations for 6-minute averages), k-means remains computationally attractive. Each iteration of the k-means algorithm performed on 365 time series requires $O(k \times 365)$ comparisons between a series and a centroid. This linear complexity is one of the reasons for the popularity of the k-means clustering algorithms \footnote{Soheily-Khah: "Generalized k-means based clustering for temporal data under time warp" Chapter 3}.

Soft-DTW is used in the k-means algorithm both to assign the series to the clusters and to update the centroids of the clusters (the centroid of a cluster is the multivariate time series that minimizes the sum of the soft-DTW similarity measures between itself and all the time series inside the cluster). Given the 365 multivariate time series, each composed of the daily observations (48 with 30-minute averages, 240 with 6-minute averages) for both the flow and density dimensions, the algorithm works as follows:


$$
\begin{cases}
\textbf{Algorithm} \ \ k-meansclustering \ \ (T,k) \\
\ \ \ \ Input: T = (t_1, t_2, ..., t_{365})  \\
 \ \ \ \ Input: k \ \ the \ \ number \ \ of \ \ clusters \\
 \ \ \ \ Output: \{c_1, c_2, ..., c_k\} \ \ (set\ \ of\ \ cluster\ \  bi-dimensional\ \ centroids ) \\
\ \ \ \ p=0 \\
 \ \ \ \ Randomly \ \ choose \ \ k \ \ objects \ \ and \ \ make \ \ them \ \ the \  initial \ \ centroids \ \ (\{c_1^{(0)}, c_2^{(0)}, ..., c_k^{(0)}\}) \\ 
 \ \ \ \  \textbf{repeat} \\
  \ \ \ \ \ \ \ \  Assign \ \ each \ \ data \ \ point \ \ to \ \ the \ \ cluster \ \ with \ \ the \ \ nearest \ \ centroid \ \ using \ \ soft-DTW \\
  \ \ \ \ \ \ \ \ p=p+1 \\
   \ \ \ \ \ \ \ \ // \ \  \textbf{Centroid update} \\
   \ \ \ \ \ \ \ \ \textbf{for} \ \ j=1 \ \ to \ \ k \ \ \textbf{do} \\
    \ \ \ \ \ \ \ \ \ \ Update \ \ the \ \ centroid \ \ c_j^{(p)} \ \ of \ \ each \ \ cluster \ \ using \ \ soft-DTW \\
    \ \ \ \ \ \ \ \ \textbf{end for} \\ 
     \ \ \ \  \textbf{until} \ \  c_j^{(p)} \approx  c_j^{(p-1)} \ \  \ \ j = 1, 2, ..., k \\
     \ \ \ \  Return \ \ c_1 ,c_2 ,... ,c_k \\
\end{cases}
$$


Different initializations of the centroids sometimes lead to very different final clustering results. To overcome this problem the k-means algorithm is run 5 times, each time with centroids randomly placed at different initial positions. The final result is the best output of the 5 consecutive runs in terms of inertia.
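The loop and the restart strategy described above can be sketched as follows. To keep the example short and self-contained, two stand-ins are used and should be read as assumptions: the squared Euclidean distance replaces soft-DTW in the assignment step, and the arithmetic mean replaces the soft-DTW centroid update. With these stand-ins the code is ordinary k-means; all function names and the toy data are hypothetical. Inertia is taken here as the sum of the dissimilarities between each series and its closest centroid.

```python
import numpy as np


def euclidean_sq(a, b):
    """Stand-in dissimilarity; the text uses soft-DTW at this point."""
    return float(((a - b) ** 2).sum())


def kmeans_once(series, k, rng, dist=euclidean_sq, max_iter=50, tol=1e-6):
    """One k-means run on an array of shape (ts, n, d)."""
    start = rng.choice(len(series), size=k, replace=False)
    centroids = series[start].copy()
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every series.
        cost = np.array([[dist(s, c) for c in centroids] for s in series])
        labels = cost.argmin(axis=1)
        # Update step: arithmetic mean as a stand-in for the soft-DTW centroid.
        new_centroids = centroids.copy()
        for j in range(k):
            members = series[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:   # centroids no longer move
            break
    cost = np.array([[dist(s, c) for c in centroids] for s in series])
    labels = cost.argmin(axis=1)
    inertia = cost.min(axis=1).sum()
    return labels, centroids, inertia


def kmeans_best_of(series, k, n_init=5, seed=0):
    """Run k-means n_init times and keep the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    runs = [kmeans_once(series, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Toy data: 10 "quiet" days and 10 "busy" days, 48 observations, 2 variables.
data_rng = np.random.default_rng(1)
quiet = data_rng.uniform(0.0, 0.2, size=(10, 48, 2))
busy = data_rng.uniform(0.8, 1.0, size=(10, 48, 2))
series = np.concatenate([quiet, busy])

labels, centroids, inertia = kmeans_best_of(series, k=2)
print(labels)
```

The `dist` argument is where a soft-DTW function would be plugged in; the update step would then need a matching soft-DTW averaging routine, which is beyond this sketch.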

## Number of Optimal Clusters, k

K-means requires the number of clusters, k, as a parameter. Choosing the optimal number of clusters is very important for the analysis. If, for example, k is too high, each time series starts to represent its own cluster.

There is no unique approach for finding the right number of clusters. I take into account a clustering quality measure, the soft-DTW similarity measure between nearest centroids, and an empirical rule that fixes a minimum number of time series in each cluster.

Soft-DTW can also be used to evaluate the cohesion inside each cluster and the separation with respect to the nearest clusters. The silhouette coefficient is a measure of how similar a time series is to its own cluster (cohesion) compared to other clusters (separation). The silhouette can be computed with the soft-DTW metric and takes values in the range [-1, 1] \footnote{Rousseeuw: "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis" pag 53-65}.

Assume that the time series have been clustered via k-means.
For a time series $t_i \in C_k$ (time series $i$ in the cluster $C_k$),
$a(t_i)$ is the mean distance between $t_i$ and all other time series in the same cluster:

$a(t_i)= \frac{1}{| C_k | -1} \sum_{j \in C_k, j \neq i} DTW^{\gamma}(t_i,t_j)$.

$b(t_i)$ is the mean dissimilarity of the time series $t_i \in C_k$ to the nearest cluster $C_z$ (where $C_z \neq C_k$), that is, the smallest mean distance from $t_i$ to all the time series of another cluster:

$b(t_i) = \min_{z \neq k} \frac{1}{|C_z|}\sum_{j \in C_z} DTW^{\gamma}(t_i,t_j)$.

Finally the silhouette coefficient for a time series is computed as follows:

$s(t_i) = \frac{b(t_i)-a(t_i)}{\max\{a(t_i),b(t_i)\}}$ if $|C_k| >1$, and $s(t_i)=0$ if $|C_k|=1$.

The coefficients of all time series are averaged to obtain a global measure. The Mean Silhouette Coefficient is computed for different numbers of clusters to see how this number affects the analysis.
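The definitions of $a(t_i)$, $b(t_i)$ and $s(t_i)$ translate directly into code once the pairwise dissimilarities are available. The sketch below works on a precomputed matrix and assumes at least two clusters; the function name `silhouette_values` is hypothetical, and the toy matrix uses absolute differences between four points on a line instead of soft-DTW values.

```python
import numpy as np


def silhouette_values(dist, labels):
    """Silhouette coefficient of every item from a pairwise dissimilarity matrix."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        size = same.sum()
        if size == 1:
            continue   # singleton cluster: s stays 0
        # Cohesion: mean dissimilarity to the other members of the own cluster.
        a = (dist[i, same].sum() - dist[i, i]) / (size - 1)
        # Separation: smallest mean dissimilarity to another cluster.
        b = min(dist[i, labels == other].mean()
                for other in clusters if other != own)
        s[i] = (b - a) / max(a, b)
    return s


# Toy example: four points on a line, two obvious clusters.
positions = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(positions[:, None] - positions[None, :])
labels = [0, 0, 1, 1]

s = silhouette_values(dist, labels)
print(s, s.mean())
```

For the first point, $a=1$ and $b=10.5$, so $s = 9.5/10.5$; averaging the four values gives the global measure for this partition.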

I also consider the soft-DTW similarity measure between the nearest clusters, computed on their centroids with the soft-DTW equation above:

- When $k=2$ it is simply $DTW^{\gamma}(C_1,C_2)$, where $C_1$ and $C_2$ are the centroids of the clusters (multivariate time series of flow and density).

- When $k=d$ with $d > 2$, $DTW^{\gamma}(C_j,C_z)$ is computed for all possible pairs of centroids, forming a symmetric matrix of dimension $(d \times d)$ from which the minimum value between two distinct centroids is selected.
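Selecting the nearest pair of centroids can be sketched as below. The function name `nearest_centroid_gap` is hypothetical, the centroids are constant toy series, and the squared Euclidean distance is used as a stand-in for $DTW^{\gamma}$, so the printed value is not a soft-DTW value.

```python
from itertools import combinations

import numpy as np


def nearest_centroid_gap(centroids, dist):
    """Smallest dissimilarity between two distinct centroids, with the pair."""
    pairs = combinations(range(len(centroids)), 2)
    return min((dist(centroids[j], centroids[z]), (j, z)) for j, z in pairs)


def euclidean_sq(a, b):
    """Stand-in dissimilarity; the text uses soft-DTW at this point."""
    return float(((a - b) ** 2).sum())


# Three toy centroids of shape (48, 2) at constant levels 0.1, 0.5 and 0.6.
centroids = [np.full((48, 2), level) for level in (0.1, 0.5, 0.6)]

gap, pair = nearest_centroid_gap(centroids, euclidean_sq)
print(gap, pair)
```

Only the pairs with $j < z$ are evaluated, which is enough because the matrix is symmetric. Here the nearest pair is `(1, 2)`, the two centroids at levels 0.5 and 0.6.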

As the number of clusters $k$ increases, the nearest clusters become closer to each other until an asymptotic value of soft-DTW is reached. In addition, as the number of clusters increases each of them should remain representative of a particular subset, without containing too few time series. Following a rule of thumb \footnote{Zhang et al.: "An empirical study to determine the optimal k in Ek-NNclus method" chapter 1}, I fixed a lower bound for the number of observations in each cluster at approximately $\sqrt{2*N}$, where $N$ is the number of daily time series (365), which gives about 27 days. If the number of time series within a cluster is lower than this bound, the cluster does not generalize a particular traffic pattern well, and the number of clusters should be reduced.
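The lower bound is easy to check on a vector of cluster labels. In the sketch below the function name `undersized_clusters` and the cluster sizes are hypothetical; only the bound $\sqrt{2*N}$ comes from the text.

```python
import math
from collections import Counter


def undersized_clusters(labels):
    """Return the bound sqrt(2 * N) and the clusters whose size falls below it."""
    bound = math.sqrt(2 * len(labels))
    sizes = Counter(labels)
    small = {cluster: size for cluster, size in sizes.items() if size < bound}
    return bound, small


# Hypothetical assignment of the 365 days to four clusters.
labels = [0] * 110 + [1] * 150 + [2] * 97 + [3] * 8

bound, small = undersized_clusters(labels)
print(round(bound, 2), small)  # 27.02 {3: 8}
```

With $N=365$ the bound is about 27 days, so a cluster of 8 days is flagged and the number of clusters would be reduced.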


The code is implemented with the Python machine learning library for time series $\textbf{tslearn}$ \footnote{Tavenard et al.: "Tslearn, A Machine Learning Toolkit for Time Series Data"}.








# Results

In this chapter graphical results are shown. To summarize the traffic dynamics within a cluster, the flow and density centroids \footnote{ $q(t,x)$ in (veh/100)/h and $\rho(t,x)$ in (veh/km) using the inverse_transform method } of the train set are plotted. To identify traffic patterns over the year a graphical tool is used: the $\textbf{calplot}$ library \footnote{https://pypi.org/project/calplot/}. In addition, to verify the patterns in different periods of the year and to validate the representation given by the centroids, weekly series of flow and density are illustrated.

The centroids trained on a particular detector are used both on train data (flow and density of that detector) and on test data (flow and density of the other detector). As the number of clusters increases the generalization to the test data becomes more difficult, thus I only report the graphical results on the test set for $k=2$ and $k=3$.

The procedure is repeated for both 30-minute and 6-minute averages, and a general tendency is found for both detectors: up to a certain number of clusters (3 for S60 and 4 for S1816) the 30-minute and 6-minute averages give the same results (same days in the same clusters), while beyond these numbers the two aggregation times give different results. For 6-minute averages only these latter results are presented.

## S60 detector 30-minute averages 

For the I-35W detector the results for 2 to 5 clusters are shown. Then the plot of the soft-DTW similarity measure between nearest clusters (nearest centroids) as a function of the number of clusters is reported, to confirm that after 5 clusters some of them may be irrelevant for the analysis.

### Two clusters

With two clusters on the S60 detector the algorithm separates working days well from non-working days (weekends and public holidays). From the centroids in Figure 4.1 I can recognize free flow in the first cluster $k=0$, and peaks in the morning (7-9) and in the afternoon (16:30-18:30) in the density centroid of the second cluster $k=1$.

\begin{figure}
    \centering
    \includegraphics{S60paths of the centroids k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.1 Centroids of S60 detector two clusters}


As shown in Figure 4.2 this pattern is clearly recognizable over the year, although some working days are classified as non-working days and some Saturdays as working days (red circles). These days are checked in a dedicated section of the Appendix. In addition, public holidays are marked (green circles) \footnote{ 01-01-2013. 21-01-2013 Martin Luther King's Day. 27-05-2013 Memorial Day. 04-07-2013 Independence Day. 02-09-2013 Labor Day. 28-11-2013 Thanksgiving Day. 25-12-2013.}


\begin{figure}
    \centering
    \includegraphics{S60train k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.2 Clustering on S60 detector two clusters}


The S60 centroids applied to S1816 data show similar behaviour in Figure 4.3: in general working days and non-working days are distinguished not only on train data but also on unseen data (test set). Some working days are classified as non-working days because of the different traffic dynamics during working days on the I-35W segment and on the I-94 segment, as shown later.
\begin{figure}
    \centering
    \includegraphics{S60test k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.3 Clustering on S1816 detector, with respect to S60 centroids two clusters}


 





### Three clusters

In Figure 4.4, with three clusters on the S60 detector, the first cluster $k=0$ again represents a free-flow situation on non-working days (weekends and public holidays). The second cluster $k=1$ represents a low level of traffic during working days, while the third cluster $k=2$ represents a higher level of traffic during working days than the second cluster. Although the second ($k=1$) and third ($k=2$) clusters behave similarly in terms of flow centroids, they differ mostly in the afternoon traffic congestion (16:30-18:30). The peak of the density centroid in cluster $k=2$ is greater than 50 $veh/km$, while the peak of the density centroid in cluster $k=1$ is around 25 $veh/km$. The two clusters $k=1$ and $k=2$ behave similarly during the morning congestion (7-9), where the density centroids reach the same level (35 $veh/km$).
\begin{figure}
    \centering
    \includegraphics{S60paths of the centroids k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.4 Centroids of S60 detector three clusters}


In Figure 4.5 I can see the seasonal behaviour of the traffic on the S60 detector. According to the clustering algorithm the months with the most traffic are June, July, August and December (not considering the last 10 days, the Christmas holidays), where the majority of the days are assigned to cluster $k=2$. In addition, the majority of Mondays are assigned to cluster $k=1$. To further validate the seasonal trends, in Figures 4.6 and 4.7 the weekly time series (marked in green) of different months are plotted together.
\begin{figure}
    \centering
    \includegraphics{S60train k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.5 Clustering on S60 detector three clusters}

In Figures 4.6 and 4.7 the density series from Monday to Friday are compared. The March series (from 11/03 to 15/03), cluster $k=1$, shows afternoon peaks lower than those in the August series (from 26/08 to 30/08) and in the December series (from 09/12 to 13/12), both cluster $k=2$. This behaviour is captured by the algorithm, as the main difference between $k=1$ and $k=2$ lies in the afternoon traffic congestion (16:30-18:30).
 
\begin{figure}
    \centering
    \includegraphics{S60 march vs august.png}
    \caption{}
\end{figure}
\emph{Figure 4.6 Density of S60 detector in a week of March and in a week of August}


\begin{figure}
    \centering
    \includegraphics{S60 march vs december.png}
    \caption{}
\end{figure}
\emph{Figure 4.7 Density of S60 detector in a week of March and in a week of December}


The S60 centroids are also applied to S1816 data (Figure 4.8). Working-day traffic on S1816 is mostly represented by the $k=2$ cluster of the S60 detector: only 30 days are classified in the $k=1$ cluster. Non-working days (weekends and public holidays) continue to be correctly classified. Again, to test the difference between clusters $k=1$ and $k=2$ of the S60 detector on unseen data, in Figures 4.9 and 4.10 the weekly time series of the S1816 detector (marked in green) of different months are plotted together.
\begin{figure}
    \centering
    \includegraphics{S60test k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.8 Clustering on S1816 detector, with respect to S60 centroids three clusters}

In Figures 4.9 and 4.10 the density series from Monday to Friday of S1816 are compared. The March series (from 11/03 to 15/03), cluster $k=1$ except for Wednesday 13/03 which is classified in cluster $k=2$, shows afternoon peaks lower than those in the August series (from 26/08 to 30/08) and in the December series (from 09/12 to 13/12), both cluster $k=2$. The different density levels during the afternoon traffic congestion (16:30-18:30) between $k=1$ and $k=2$ are captured on the test set as well.

\begin{figure}
    \centering
    \includegraphics{S1816 march vs august.png}
    \caption{}
\end{figure}
\emph{Figure 4.9 Density of S1816 detector in a week of March and in a week of August}


\begin{figure}
    \centering
    \includegraphics{S1816 march vs december.png}
    \caption{}
\end{figure}
\emph{Figure 4.10 Density of S1816 detector in a week of March and in a week of December}








### Four clusters 

In Figure 4.11, with four clusters on the S60 detector, the first cluster $k=0$ again represents a free-flow situation on non-working days (weekends and public holidays). The fourth cluster $k=3$ represents a low level of traffic during working days, while the second cluster $k=1$ and the third one $k=2$ (similar to each other) represent a higher level of traffic during working days than the fourth cluster. The second ($k=1$), third ($k=2$) and fourth ($k=3$) clusters behave similarly in terms of flow centroids; in addition, the three clusters behave similarly during the morning congestion (7-9), where the density centroids reach the same level (35 $veh/km$). The fourth cluster $k=3$ differs from the second ($k=1$) and third ($k=2$) ones mostly in the afternoon traffic congestion (16:30-18:30): the peak of its density centroid is around 25 $veh/km$, flatter than the afternoon density peaks of the second and third clusters. Although the second ($k=1$) and third ($k=2$) clusters are the most similar ones, they are distinguished by the level of afternoon traffic congestion (16:30-18:30), where the density centroid of the second cluster $k=1$ reaches 40 $veh/km$ while the density centroid of the third cluster $k=2$ exceeds 50 $veh/km$.



\begin{figure}
    \centering
    \includegraphics{S60paths of the centroids k=4.png}
    \caption{}
\end{figure}
\emph{Figure 4.11 Centroids of S60 detector four clusters}

In Figure 4.12 I can see the seasonal behaviour of the traffic on the S60 detector. June, July, August and December (not considering the last 10 days, the Christmas holidays) are mostly marked by the second ($k=1$) and third ($k=2$) clusters, which represent a significant level of traffic. On the other hand, February, March, April and October are mostly marked by the fourth cluster $k=3$, which represents a lower level of traffic than the second and third clusters. In addition, the majority of Mondays are assigned to the fourth cluster $k=3$, while Fridays are mostly assigned to the second ($k=1$) and third ($k=2$) clusters. In order to better understand how the algorithm separates the second ($k=1$) and third ($k=2$) clusters (the most similar ones), in Figure 4.13 the four-day time series (marked in green) of different months are plotted together.
\begin{figure}
    \centering
    \includegraphics{S60train k=4.png}
    \caption{}
\end{figure}
\emph{Figure 4.12 Clustering on S60 detector four clusters}


In Figure 4.13 the density series from Tuesday to Friday of different months are compared. The August time series (from 27/08 to 30/08), third cluster $k=2$, shows visibly higher peaks in the afternoon than the December time series (from 17/12 to 20/12), second cluster $k=1$. This behaviour is captured by the algorithm as the main difference between $k=2$ and $k=1$: the levels of afternoon traffic congestion (16:30-18:30) differ, with the peaks of the third cluster $k=2$ (magenta) greater than those of the second cluster $k=1$ (blue).

\begin{figure}
    \centering
    \includegraphics{S60 august december .png}
    \caption{}
\end{figure}
\emph{Figure 4.13 Density of S60 detector in a week of August and in a week of December}
 
When the S60 centroids are applied to S1816 data, the fourth cluster contains only 5 days. Following the lower-bound rule fixed in Section 3.3, the four clusters trained on S60 cannot be used to generalize the traffic dynamics on the S1816 detector (test set), although the algorithm continues to separate working days and non-working days well.
 

### Five clusters 

In Figure 4.14, with five clusters on the S60 detector, the third cluster $k=2$ contains only 8 days and thus does not capture a particular traffic pattern over the year.
\begin{figure}
    \centering
    \includegraphics{S60train k=5.png}
    \caption{}
\end{figure}
\emph{Figure 4.14 Clustering on S60 detector five clusters}

In Figure 4.15 the curve of the soft-DTW similarity measure between nearest clusters, as a function of the number of clusters considered, flattens markedly after 5 clusters, showing that beyond this number of clusters the algorithm does not generalize the traffic dynamics well over the period considered.

\begin{figure}
    \centering
    \includegraphics{S60elbow.png}
    \caption{}
\end{figure}
\emph{Figure 4.15 Soft-DTW similarity measure between closest clusters as a function of the number of clusters}


## S1816 detector 30-minute averages

### Two clusters

\begin{figure}
    \centering
    \includegraphics{S1816paths of the centroids k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.16 Centroids of S1816 detector two clusters}

\begin{figure}
    \centering
    \includegraphics{S1816train k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.17 Clustering on S1816 two clusters}


\begin{figure}
    \centering
    \includegraphics{S1816test k=2.png}
    \caption{}
\end{figure}
\emph{Figure 4.18 Clustering on S60, with S1816 centroids two clusters} 

### Three clusters

\begin{figure}
    \centering
    \includegraphics{S1816paths of the centroids k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.19 Centroids of S1816 detector three clusters}

\begin{figure}
    \centering
    \includegraphics{S1816train k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.20 Clustering on S1816 detector three clusters}

\begin{figure}
    \centering
    \includegraphics{S1816test k=3.png}
    \caption{}
\end{figure}
\emph{Figure 4.21 Clustering on S60 detector, with S1816 centroids three clusters}

### Four clusters

\begin{figure}
    \centering
    \includegraphics{S1816paths of the centroids k=4.png}
    \caption{}
\end{figure}
\emph{Figure 4.22 Centroids of S1816 detector four clusters}

\begin{figure}
    \centering
    \includegraphics{S1816train k=4.png}
    \caption{}
\end{figure}
\emph{Figure 4.23 Clustering on S1816 detector four clusters}


When the S1816 centroids are applied to S60 data (test set), one cluster contains only 10 days.

### Five clusters

\begin{figure}
    \centering
    \includegraphics{S1816paths of the centroids k=5.png}
    \caption{}
\end{figure}
\emph{Figure 4.24 Centroids of S1816 detector five clusters}


With five clusters the centroids can be read as follows:

- $k=0$: Sundays

- $k=1$: December

- $k=2$: summer

- $k=3$: Saturdays

- $k=4$: Mondays and the rest of the months

\begin{figure}
    \centering
    \includegraphics{S1816train k=5.png}
    \caption{}
\end{figure}
\emph{Figure 4.25 Clustering on S1816 detector five clusters}




### Number of clusters greater than 5

\begin{figure}
    \centering
    \includegraphics{S1816train k=6.png}
    \caption{}
\end{figure}
\emph{Figure 4.26 Clustering on S1816 detector six clusters}

\begin{figure}
    \centering
    \includegraphics{S1816train k=7.png}
    \caption{}
\end{figure}
\emph{Figure 4.27 Clustering on S1816 detector seven clusters}

\begin{figure}
    \centering
    \includegraphics{S1816elbow .png}
    \caption{}
\end{figure}
\emph{Figure 4.28 Soft-DTW similarity measure between closest clusters as a function of the number of clusters}


## S60 detector 6-minute averages 

For $k=2$ and $k=3$ the results are the same as with 30-minute averages.

Results for $k=4$:
\begin{figure}
    \centering
    \includegraphics{S60paths of the centroids k=4 6.png}
    \caption{}
\end{figure}
\emph{Figure 4.29 Centroids of S60 detector 6-minute averages, four clusters}

The centroids of clusters $k=1$ and $k=2$ are very similar.
\begin{figure}
    \centering
    \includegraphics{S60train k=4 6.png}
    \caption{}
\end{figure}
\emph{Figure 4.30 Clustering on S60 detector 6-minute averages, four clusters}

Cluster $k=3$ contains just 16 days.

## S1816 detector 6-minute averages 

Results for $k=5$:
\begin{figure}
    \centering
    \includegraphics{S1816paths of the centroids k=5 6.png}
    \caption{}
\end{figure}
\emph{Figure 4.31 Centroids of S1816 detector 6-minute averages, five clusters}


\begin{figure}
    \centering
    \includegraphics{S1816train k=5 6.png}
    \caption{}
\end{figure}
\emph{Figure 4.32 Clustering of S1816 detector 6-minute averages, five clusters}


Cluster $k=3$ contains just 11 days.

K=5 30-minute separate 


## Silhouette Coefficients

# Conclusion and Outlook

# Appendix 

## S60 detector misclassification check

05/03/2013 

\begin{figure}
    \centering
    \includegraphics{S60 march.png}
    \caption{}
\end{figure}
\emph{Figure 7.1 Weekly flow time series from 04/03 to 10/03}
 

18/03/2013 

\begin{figure}
    \centering
    \includegraphics{S60 march2.png}
    \caption{}
\end{figure}
\emph{Figure 7.2 Weekly flow time series from 18/03 to 24/03}
 

11/04/2013 

\begin{figure}
    \centering
    \includegraphics{S60 density April .png}
    \caption{}
\end{figure}
\emph{Figure 7.3 Weekly density time series from 08/04 to 14/04}


24/08/2013

\begin{figure}
    \centering
    \includegraphics{S60 august.png}
    \caption{}
\end{figure}
\emph{Figure 7.4 Weekly density time series from 19/08 to 25/08}

 
26/10/2013
\begin{figure}
    \centering
    \includegraphics{S60 october.png}
    \caption{}
\end{figure}
\emph{Figure 7.5 Weekly density time series from 21/10 to 27/10}






## S1816 detector misclassification check

11/02/2013 
\begin{figure}
    \centering
    \includegraphics{S1816 february2.png}
    \caption{}
\end{figure}
\emph{Figure 7.6 Weekly density time series from 11/02 to 17/02}
 


18/02/2013 and 22/02/2013 
\begin{figure}
    \centering
    \includegraphics{S1816 february.png}
    \caption{}
\end{figure}
\emph{Figure 7.7 Weekly density time series from 18/02 to 24/02}
 

04/03/2013 and 05/03/2013 
\begin{figure}
    \centering
    \includegraphics{S1816 march .png}
    \caption{}
\end{figure}
\emph{Figure 7.8 Weekly density time series from 04/03 to 10/03}
 

18/03/2013 

\begin{figure}
    \centering
    \includegraphics{S1816 march2.png}
    \caption{}
\end{figure}
\emph{Figure 7.9 Weekly density time series from 18/03 to 24/03}
 
04/12/2013

\begin{figure}
    \centering
    \includegraphics{S1816 december .png}
    \caption{}
\end{figure}
\emph{Figure 7.10 Weekly flow time series from 02/12 to 08/12}
 




# References 