Data Mining (DM) project for the DM course at the Department of Computer Science of the University of Pisa.
Carvana is a start-up business launched by a well-established American company. Its goal is to completely change the way people buy, finance, and trade their used vehicles by replacing physical infrastructure with technology and top-of-the-line scientific models. This project presents an analysis of the dataset published on kaggle.com for the Data Mining 2019/2020 Project. The aim is to build a model that advises future customers on whether a purchase is likely to be a good or a bad buy.
- Data Understanding: Explore the dataset with the analytical tools studied and write a concise “data understanding” report describing data semantics and assessing data quality, the distribution of the variables, and the pairwise correlations.
- Clustering analysis: Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm and the advantages provided by the different approaches.
- Classification: Explore the dataset using classification trees. Use them to predict the target variable.
- Association Rules: Explore the dataset using frequent pattern mining and association rule extraction. Then use the rules to predict a variable, either to replace missing values or to predict the target variable.
All the details can be found in the report at this link.
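As a rough illustration of the association rules task, the sketch below computes support and confidence for rules whose consequent is a target item. It is not the code used in the report: the toy transactions, the attribute names (`Make`, `Size`, `IsBadBuy`), and the helper functions are all assumptions made for this example, and only single-item antecedents are considered.

```python
# Hypothetical toy transactions: each row is a set of "attribute=value" items.
# The values are illustrative only and are not taken from the Carvana dataset.
transactions = [
    {"Make=FORD", "Size=MEDIUM", "IsBadBuy=0"},
    {"Make=FORD", "Size=MEDIUM", "IsBadBuy=0"},
    {"Make=FORD", "Size=LARGE", "IsBadBuy=1"},
    {"Make=DODGE", "Size=MEDIUM", "IsBadBuy=0"},
    {"Make=DODGE", "Size=LARGE", "IsBadBuy=1"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= row for row in rows) / len(rows)


def confidence(antecedent, consequent, rows):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, rows)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), rows) / base


def rules_for_target(rows, target_prefix, min_support, min_confidence):
    """List rules {item} -> {target item} that pass both thresholds."""
    items = sorted({item for row in rows for item in row})
    targets = [i for i in items if i.startswith(target_prefix)]
    others = [i for i in items if not i.startswith(target_prefix)]
    found = []
    for antecedent in others:
        for target in targets:
            s = support({antecedent, target}, rows)
            c = confidence({antecedent}, {target}, rows)
            if s >= min_support and c >= min_confidence:
                found.append((antecedent, target, s, c))
    return found


for antecedent, target, s, c in rules_for_target(transactions, "IsBadBuy=", 0.4, 0.9):
    print(f"{antecedent} -> {target} (support={s:.2f}, confidence={c:.2f})")
```

A rule that passes the thresholds can then be applied to a new row: if the row contains the antecedent, the consequent is taken as the predicted value. Real frequent pattern mining would also consider multi-item antecedents, which this sketch leaves out for brevity.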
An additional task for the project: compare the classification results of the decision tree with those of KNN and Naive Bayes, also analysing the runtime of the training and test phases.
All the details can be found in the report at this link.
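One possible way to measure training and test runtime separately is sketched below. It is a minimal example under stated assumptions, not the procedure from the report: `TinyKNN` and `timed_fit_predict` are hypothetical helpers written for this illustration, the data points are made up, and any classifier exposing `fit` and `predict` could be passed to the timing function instead.

```python
import time
from collections import Counter


class TinyKNN:
    """Minimal k-nearest-neighbours classifier (illustrative, not optimised)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: training only stores the data.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for query in X:
            # Squared Euclidean distance from the query to every stored point.
            dists = sorted(
                (sum((a - b) ** 2 for a, b in zip(point, query)), label)
                for point, label in zip(self.X, self.y)
            )
            # Majority vote among the k closest points.
            votes = Counter(label for _, label in dists[: self.k])
            preds.append(votes.most_common(1)[0][0])
        return preds


def timed_fit_predict(model, X_train, y_train, X_test):
    """Return predictions plus the seconds spent in fit and in predict."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = model.predict(X_test)
    t2 = time.perf_counter()
    return preds, t1 - t0, t2 - t1


# Made-up two-dimensional points forming two well-separated groups.
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [(0.05, 0.05), (1.05, 1.0)]

preds, fit_s, predict_s = timed_fit_predict(TinyKNN(k=3), X_train, y_train, X_test)
print(preds, f"fit={fit_s:.6f}s", f"predict={predict_s:.6f}s")
```

Timing the two phases separately matters for this comparison because a lazy learner such as KNN does almost no work in `fit` and most of its work in `predict`, whereas a decision tree spends its time building the tree during training.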
- Alessandro Cudazzo - @alessandrocuda - alessandro@cudazzo.com
- Giulia Volpi - giuliavolpi25.93@gmail.com
- Flavia Achena - flavia.achena@gmail.com
- Aleksandra Maslennikova - msasha1996@gmail.com
Copyright 2019 © Alessandro Cudazzo - Giulia Volpi - Flavia Achena - Aleksandra Maslennikova