Vinho Verde Dataset Analysis

Project developed for the Mathematics for Machine Learning master's course at the Polytechnic of Turin.

The goal of the project was to study a dataset and apply classification and regression algorithms to it. A detailed report is available in VinhoVerdeAnalysisReport.pdf.

Introduction

This project analyzes the Vinho Verde wine dataset, which contains two types of wine: red and white. The task is to build a model that predicts wine quality from its physicochemical characteristics.

Data Exploration

The first phase of the analysis explores the data in order to gather as much information as possible before applying any model.

Input features: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, alcohol, sulphates.

Output variables: quality (score between 0 and 10), label (derived from the quality feature: low and high quality labels).
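The two output variables can be derived with a short loader. This is a minimal sketch, not the notebook's code: it assumes the CSV files follow the semicolon-separated layout of the UCI release, and the cut-off between "low" and "high" quality (`HIGH_QUALITY_THRESHOLD = 6`) is a hypothetical value, because the README does not state it. `parse_wine_rows`, `load_wine_csv` and `quality_label` are illustrative names.

```python
import csv

# Hypothetical cut-off: the README does not say where "low" ends and
# "high" begins, so quality >= 6 is only an assumed threshold.
HIGH_QUALITY_THRESHOLD = 6


def parse_wine_rows(lines, wine_type):
    """Parse semicolon-separated lines (header first) into a list of dicts."""
    rows = []
    for record in csv.DictReader(lines, delimiter=";"):
        row = {name: float(value) for name, value in record.items()}
        row["quality"] = int(row["quality"])  # quality is an integer score
        row["type"] = wine_type  # keep track of red vs white
        rows.append(row)
    return rows


def load_wine_csv(path, wine_type):
    """Read one CSV file, assuming the UCI layout (';' separator, quoted header)."""
    with open(path, newline="") as f:
        return parse_wine_rows(f, wine_type)


def quality_label(quality, threshold=HIGH_QUALITY_THRESHOLD):
    """Map an integer quality score (0-10) to the binary low/high label."""
    return "high" if quality >= threshold else "low"
```

For example, `load_wine_csv("winequality-red.csv", "red") + load_wine_csv("winequality-white.csv", "white")` would merge both wine types into one list, after which each row can be labelled with `quality_label(row["quality"])`.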

Other data exploration steps are: checks for null and duplicate values, the quality distribution, univariate analysis and bivariate analysis.
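The exploration checks listed above can be sketched with NumPy. `data_quality_report` is a hypothetical helper operating on a feature matrix and the quality vector; using the Pearson correlation of each feature with quality as the bivariate view is an assumption about what the bivariate analysis covered, not a description of the notebook.

```python
import numpy as np
from collections import Counter


def data_quality_report(X, quality, feature_names):
    """Basic exploration checks on X (n_samples x n_features) and the quality scores."""
    X = np.asarray(X, dtype=float)
    quality = np.asarray(quality)

    # Null values: count NaNs per feature.
    nulls = dict(zip(feature_names, np.isnan(X).sum(axis=0).tolist()))

    # Duplicates: complete rows (features + quality) that appear more than once.
    full = np.column_stack([X, quality])
    n_duplicates = full.shape[0] - np.unique(full, axis=0).shape[0]

    # Quality distribution: number of wines per score, sorted by score.
    distribution = dict(sorted(Counter(quality.tolist()).items()))

    # Bivariate view (assumed): Pearson correlation of each feature with quality.
    correlations = {
        name: float(np.corrcoef(X[:, j], quality)[0, 1])
        for j, name in enumerate(feature_names)
    }
    return nulls, n_duplicates, distribution, correlations
```

A strongly skewed `distribution` is what motivates the class-balancing step described later.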

Preprocessing

In the preprocessing part, data is processed in order to make it suitable for the different machine learning algorithms.

The steps are: standardization, dimensionality reduction (PCA) and splitting the dataset into training and test sets.
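A minimal NumPy sketch of these preprocessing steps follows. It differs from the order listed above in one respect: it splits first, so the scaler and PCA are fitted on training data only and no test-set statistics leak into training. The 80/20 split, the seed and `n_components=5` are hypothetical defaults, and `split_dataset`/`preprocess` are illustrative names; in practice scikit-learn's `StandardScaler`, `PCA` and `train_test_split` cover the same steps.

```python
import numpy as np


def split_dataset(X, y, test_size=0.2, seed=0):
    """Shuffle indices and split into train/test partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def standardize(X_train, X_test):
    """Z-score scaling using statistics from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std


def fit_pca(X, n_components):
    """Return the top principal axes and their explained-variance ratios."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:n_components], explained[:n_components]


def preprocess(X, y, n_components=5, test_size=0.2, seed=0):
    """Split, standardize, then project both splits onto the training PCA axes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_tr, X_te, y_tr, y_te = split_dataset(X, y, test_size, seed)
    X_tr, X_te = standardize(X_tr, X_te)
    components, explained = fit_pca(X_tr, n_components)
    # X_tr is zero-mean after standardization, so projection is a matrix product.
    return X_tr @ components.T, X_te @ components.T, y_tr, y_te, explained
```

Computing PCA through the SVD of the centred training matrix avoids forming the covariance matrix explicitly; the squared singular values are proportional to the variance captured by each component.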

Model Application

The problem is treated both as a regression task and as a classification task, so the chosen algorithms support both settings: SVM, KNN and Random Forest.
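To illustrate how a single algorithm serves both formulations, here is a from-scratch k-nearest-neighbours sketch: classification takes a majority vote over the neighbours' labels, while regression averages their quality scores. `knn_predict` and `k=5` are illustrative choices, not the notebook's implementation, and SVM and Random Forest are not sketched here. scikit-learn exposes the paired estimators `SVC`/`SVR`, `KNeighborsClassifier`/`KNeighborsRegressor` and `RandomForestClassifier`/`RandomForestRegressor`.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_query, k=5, task="classification"):
    """k-nearest-neighbours predictions for classification or regression."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Euclidean distance from the query point to every training sample.
        distances = np.linalg.norm(X_train - x, axis=1)
        neighbours = y_train[np.argsort(distances)[:k]]
        if task == "classification":
            # Majority vote among the k neighbours' labels (e.g. low/high).
            predictions.append(Counter(neighbours.tolist()).most_common(1)[0][0])
        elif task == "regression":
            # Mean of the neighbours' numeric quality scores.
            predictions.append(float(neighbours.mean()))
        else:
            raise ValueError(f"unknown task: {task!r}")
    return np.array(predictions)
```

Only the aggregation step changes between the two tasks; the neighbour search, and therefore the preprocessing, is shared.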

Extra

Since the classes were imbalanced, two oversampling algorithms were used: SMOTE and ADASYN. They produced similar accuracy scores but, more importantly, they allowed the models to predict classes that they previously could not, because those classes had very few samples.
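The core idea of SMOTE can be sketched as follows: each synthetic sample is a random point on the segment between a minority-class sample and one of its k nearest minority neighbours. This is a simplified illustration (`smote`, `k=5` and the seed are hypothetical choices); ADASYN, which additionally generates more samples around minority points surrounded by many majority neighbours, is not shown. The imbalanced-learn package provides both as `SMOTE` and `ADASYN` in `imblearn.over_sampling`, used through `fit_resample`.

```python
import numpy as np


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation."""
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise distances inside the minority class; a point is not its own neighbour.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(distances, np.inf)
    neighbours = np.argsort(distances, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for i in range(n_synthetic):
        base = rng.integers(n)  # random minority sample
        neighbour = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between the sample and its neighbour.
        synthetic[i] = X_minority[base] + gap * (X_minority[neighbour] - X_minority[base])
    return synthetic
```

Oversampling is normally applied to the training split only; otherwise synthetic points derived from test samples would inflate the evaluation scores.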

Repository: MauriVass/VinhoVerdeDatasetAnalysis