M Lebbah edited this page Mar 31, 2018 · 60 revisions

Welcome to the coliseum wiki!

Welcome to the wiki of Spartakus (Spark-clustering-notebook)! This wiki introduces some clustering algorithms and describes their current implementation in the software, which has been built on Spark and Spark-notebook since 2012. The notebook has a dual purpose: teaching and research.

The Wiki page is currently under initial construction, so come back soon. If you are interested in improving Spartakus (Spark-clustering-notebook) right now, contact us.

Getting started (https://github.com/Spark-clustering-notebook/coliseum)

Team

  • Mustapha Lebbah (project lead). Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13

  • Hanane Azzag. Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13

  • Tarn Duong. Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13

  • Tugdual Sarazin. Lead Data Engineer

  • Mohammed Ghesmoune. Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13

  • Gael Beck. PhD student, Computer Science Department (LIPN, CNRS UMR 7030), University of Paris 13

  • Doan Nhat Quang. ICT Lab, University of Science and Technology of Hanoi

  • Several students: Quan Cao Anh (USTH, Hanoi, 2016), Omar Masmoudi (Tunis, 2015), Hugo Driviere (IUTV, 2016), Oscar Odic (IUTV, 2013), Camille Gerin-Roze (2013), Victor Duvert (IUTV, 2013), Aissa El Ouafi (IUTV, 2013).

  • Amine Chaibi. PhD, data scientist at Carrefour

    Thanks to Kensu (Andy Petrella and Xavier Tordoir) for helping us package the algorithms for Spark-notebook.

Publications

  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah: A distributed rough set theory based algorithm for an efficient big data pre-processing under the spark framework. BigData 2017: 911-916
  • Tarn Duong, Gael Beck, Hanene Azzag, Mustapha Lebbah. Nearest neighbour estimators of density derivatives, with application to mean shift clustering. Pattern Recognition Letters (2016). http://dx.doi.org/10.1016/j.patrec.2016.06.021
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. State-of-the-art on clustering data streams (invited paper). Big Data Analytics journal, 2016.
  • G. Beck, T. Duong, H. Azzag and M. Lebbah, "Distributed mean shift clustering with approximate nearest neighbours," 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, 2016, pp. 3110-3115. doi: 10.1109/IJCNN.2016.7727595. http://ieeexplore.ieee.org/abstract/document/7727595/
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanane Azzag. A new growing neural gas for clustering data streams. Neural Networks, Special Issue on Neural Network Learning in Big Data, 2016. http://dx.doi.org/10.1016/j.neunet.2016.02.003
  • Mohammed Ghesmoune, Mustapha Lebbah, Hanene Azzag. Micro-Batching Growing Neural Gas for Clustering Data Streams using Spark Streaming. Procedia Computer Science (2015), pp. 158-166. doi: 10.1016/j.procs.2015.07.290. (Paper presented at the INNS Conference on Big Data, 8-10 August 2015, San Francisco, USA.)
  • Mohammed Ghesmoune, Mustapha Lebbah, and Hanene Azzag. Clustering over data streams based on growing neural gas. In The Pacific-Asia Conference on Knowledge Discovery and Data Mining. PAKDD (2) 2015: 134-145.
  • Tugdual Sarazin, Mustapha Lebbah, and Hanane Azzag. Biclustering using Spark-MapReduce. In 2014 IEEE International Conference on Big Data, Big Data 2014, Washington, DC, USA, October 27-30, 2014, pages 58–60, 2014.
  • Tugdual Sarazin, Hanane Azzag, and Mustapha Lebbah. 2014. SOM Clustering Using Spark-MapReduce. In Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW '14). IEEE Computer Society, Washington, DC, USA, 1727-1734. DOI=10.1109/IPDPSW.2014.192

French-speaking conferences

  • Gaël Beck, Hanane Azzag, Mustapha Lebbah, Tarn Duong. Mean-shift : Clustering scalable et distribué [Mean shift: scalable and distributed clustering], pp.415-425. EGC 2018
  • Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah. Nouveau Modèle de Sélection de Caractéristiques basé sur la Théorie des Ensembles Approximatifs pour les Données Massives [A new feature selection model based on rough set theory for big data], pp.377-378. EGC 2018 (Poster)
  • Mohammed Ghesmoune, Mustapha Lebbah and Hanane Azzag. G-Stream : une approche incrémentale pour le clustering de flux de données [G-Stream: an incremental approach to data stream clustering]. In SFC 2015, 9-11 September 2015, Nantes.
  • Mohammed Ghesmoune, Hanane Azzag and Mustapha Lebbah. Une nouvelle méthode topologique pour le clustering de flux de données [A new topological method for data stream clustering]. In COSI 2015, Colloque sur l’optimisation et les systèmes d’information, Oran, 1-3 June 2015.
  • Mohammed Ghesmoune, Mustapha Lebbah, Hanane Azzag. Clustering topologique pour le flux de données [Topological clustering for data streams]. In EGC 2015, vol. RNTI-E-28, pp.137-142.
  • Tugdual Sarazin, Hanane Azzag, Mustapha Lebbah. Modèle de Biclustering dans un paradigme "Mapreduce" [A biclustering model in a "MapReduce" paradigm]. In EGC 2015, vol. RNTI-E-28, pp.467-468.