Credits: 6
Lecture hours: 2h/week
Project hours: 2h/week
Instructor: Prof. Benjamin Quost
Assessment: four projects (60%), one written exam (40%)
This course presents modern techniques for analyzing large data sets and develops basic tools for data mining. It gives students the main theory behind data mining and machine learning. The first part covers exploratory data analysis, in which students use visual tools (plots, charts) and methods such as Principal Component Analysis to summarize the main characteristics of a data set and to visualize similarity and distance between populations. The second part covers unsupervised and supervised learning, including pattern detection methods. Students learn Bayesian decision theory, linear and quadratic regression, and decision trees, and implement the related loss functions and classifiers. They thereby see a range of machine learning models and learn to choose the most robust or efficient one given the distribution and nature of the data. In short, students will be able to describe the information that large volumes of data carry and to justify the use of a particular method in a real application.
All projects are implemented in R.
2017-2018
Descriptive statistics and Principal Component Analysis: basic analysis of data sets in R, computing correlations and assessing the influence of factors; PCA worked by hand, then applied to different data sets with R tools.
Automatic classification (clustering): data visualization via AFTD (Analyse Factorielle d’un Tableau de Distances, i.e. factorial analysis of a distance table), showing that it leads to the same results as a PCA; hierarchical classification; k-means implementation.
Discrimination and Bayesian decision theory: implementation of a Euclidean classifier and of the KNN algorithm, with performance evaluation; work on the Bayes rule, comparing theoretical and empirical results.
Discrimination: implementation of discriminant analysis (linear, quadratic, and naive Bayes classifiers) and of logistic regression (linear and quadratic); use of decision tree libraries, tested on real data.
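
To give a flavor of the third project, here is a minimal sketch of a Euclidean classifier and of KNN with a simple accuracy check. It rests on several assumptions: it is written in Python with only the standard library rather than in R, "Euclidean classifier" is taken to mean assignment to the nearest class mean, the toy data are made up, and the function names (`centroids`, `euclidean_classify`, `knn_classify`, `accuracy`) are hypothetical. It is not the course's reference implementation.

```python
import math
from collections import Counter, defaultdict


def centroids(X, y):
    """Return the mean point of each class as {label: tuple}."""
    groups = defaultdict(list)
    for point, label in zip(X, y):
        groups[label].append(point)
    return {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in groups.items()
    }


def euclidean_classify(x, class_means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return min(class_means, key=lambda label: math.dist(x, class_means[label]))


def knn_classify(x, X, y, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(predict, X_test, y_test):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(predict(x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


# Made-up toy data: two well-separated classes in the plane.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
y_train = ["a", "a", "a", "b", "b", "b"]
X_test = [(0.5, 0.5), (5.5, 5.5)]
y_test = ["a", "b"]

means = centroids(X_train, y_train)
acc_euclid = accuracy(lambda x: euclidean_classify(x, means), X_test, y_test)
acc_knn = accuracy(lambda x: knn_classify(x, X_train, y_train, k=3), X_test, y_test)
print(acc_euclid, acc_knn)
```

On real data the two classifiers typically differ: the nearest-mean rule yields a linear boundary between two classes, while KNN adapts to local structure at the cost of storing the whole training set.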