This repo contains an implementation of Siamese neural networks for clustering tasks. The network is trained with a triplet loss and hard sample mining. Handling of the TLP dataset and some clustering metrics are also implemented. The TLP dataset can be downloaded from here.
Siamese neural networks are a type of neural network designed to process two or more inputs. One reason for this design is to compare instances of the dataset and learn, in depth, the characteristics that distinguish them. To do so, the networks implemented here map each instance to a common two-dimensional Euclidean space in which the classes of the dataset can be told apart. A graphical view of this type of network is shown below [1]:
Although the neural network is the most important part of this project, it is not the only one, since the loss function plays a fundamental role in telling instances apart. For this reason the triplet loss function is implemented: it pushes each negative instance away from its anchor, which is conveniently chosen, and pulls positive instances, which belong to the same class as the anchor, closer to it. Both the loss function equation and an illustrative image of the learning process [2] are shown below:
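As a rough illustration of the idea, the triplet loss for a single (anchor, positive, negative) triplet can be sketched in plain Python. This follows the squared-distance formulation described in [2]. The function names and the margin value of 0.2 are assumptions made for this sketch and are not taken from this repo's code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for one triplet of embeddings.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    `margin=0.2` is an illustrative default, not a value from this repo.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.0, 1.0]
    far_negative = [3.0, 0.0]    # already well separated from the anchor
    close_negative = [1.0, 0.0]  # as close to the anchor as the positive
    print(triplet_loss(anchor, positive, far_negative))
    print(triplet_loss(anchor, positive, close_negative))
```

A triplet whose negative is already far from the anchor contributes no loss, which is why the choice of triplets matters during training.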
In addition, training this kind of network commonly uses hard sample mining, which selects the most difficult examples to learn from.
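One common way to do this is "batch-hard" mining: for every anchor in a batch, take the farthest positive and the closest negative. The sketch below shows that strategy in plain Python. It is an assumption about how hard mining could be done, not necessarily the strategy used in this repo, and `batch_hard_triplets` is a hypothetical name.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets for a batch.

    For each anchor, the hardest positive is the same-class example that
    is farthest away, and the hardest negative is the other-class example
    that is closest. Anchors without a positive or a negative are skipped.
    """
    triplets = []
    for i, label in enumerate(labels):
        positives = [j for j, l in enumerate(labels) if l == label and j != i]
        negatives = [j for j, l in enumerate(labels) if l != label]
        if not positives or not negatives:
            continue
        hardest_pos = max(
            positives, key=lambda j: squared_distance(embeddings[i], embeddings[j])
        )
        hardest_neg = min(
            negatives, key=lambda j: squared_distance(embeddings[i], embeddings[j])
        )
        triplets.append((i, hardest_pos, hardest_neg))
    return triplets


if __name__ == "__main__":
    # Toy 2-D embeddings: three examples of class 0 and two of class 1.
    embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0], [5.0, 0.0]]
    labels = [0, 0, 0, 1, 1]
    for triplet in batch_hard_triplets(embeddings, labels):
        print(triplet)
```

The selected triplets would then be fed to the triplet loss, so that each training step concentrates on the examples the network currently separates worst.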
For clustering tasks, it is convenient to implement metrics that evaluate how well the examples are grouped. For this purpose, the Silhouette coefficient is implemented:
In short, this method evaluates how close an instance is to the others in its own cluster compared with the instances in other clusters.
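To make this concrete, here is a minimal sketch of the standard Silhouette coefficient, s = (b - a) / max(a, b), where a is the mean distance from a point to the other points of its own cluster and b is the smallest mean distance from that point to the points of any other cluster. The function names are hypothetical and the code is independent of this repo's implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points, labels):
    """Per-point Silhouette coefficient; needs at least two distinct labels."""
    by_cluster = {}
    for idx, label in enumerate(labels):
        by_cluster.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in by_cluster[label] if j != i]
        if not own:
            # Usual convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in by_cluster.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average Silhouette coefficient over all points."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(mean_silhouette(points, [0, 0, 1, 1]))  # compact, well-separated clusters
    print(mean_silhouette(points, [0, 1, 0, 1]))  # clusters that mix distant points
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that points sit closer to another cluster than to their own.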
This project uses the TLP dataset [3]. The dataset contains 50 different video scenes, totaling 400 minutes of recording time and 676k frames.
Below are some results obtained when clustering 3 and 10 classes, using an AlexNet convolutional neural network as the image processor. First, the results of the untrained network are shown, followed by the clustering results after 5 epochs of training:
Next, the results with 10 classes are presented:
Finally, the Silhouette coefficient results for 3 and 10 classes are shown below:
[1] Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, vol. 2 (2015)
[2] Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/cvpr.2015.7298682
[3] Abhinav Moudgil and Vineet Gandhi. Long-Term Visual Object Tracking Benchmark. CoRR. abs/1712.01358. 2017. Available at: http://arxiv.org/abs/1712.01358