DataMining1-Fundamentals

Project for the Data Mining 1 exam. The provided dataset is a modified version of the data available at https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset, in which some values were removed to create missing values. The analysis therefore concerns the IBM HR Analytics Employee Attrition & Performance dataset.

Project Tasks

Guidelines for the task on Data Understanding

Data understanding (30 points)

Data semantics (3 points)
Distribution of the variables and statistics (7 points)
Assessing data quality (missing values, outliers) (7 points)
Variables transformations (6 points)
Pairwise correlations and possible elimination of redundant variables (7 points)
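
As an illustration of the data quality and correlation items above, the following minimal sketch counts missing values per variable and flags highly correlated pairs as candidates for elimination. The column names exist in the Kaggle dataset, but the values are invented toy numbers, and the 0.8 threshold and the helper names `pearson` and `redundant_pairs` are assumptions made for this example, not the project's actual choices.

```python
from itertools import combinations
from math import sqrt

# Toy data (invented values); None marks a missing value.
data = {
    "Age":               [25, 32, 41, None, 50, 38],
    "TotalWorkingYears": [3, 10, 18, 12, 28, None],
    "DistanceFromHome":  [9, 2, 7, 1, 8, 3],
}

def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

def redundant_pairs(table, threshold=0.8):
    """Return (col_a, col_b, r) for every pair with |r| >= threshold."""
    found = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found

# Number of missing values per variable.
missing = {col: sum(v is None for v in vals) for col, vals in data.items()}
print(missing)
print(redundant_pairs(data))
```

On this toy table only the Age / TotalWorkingYears pair exceeds the threshold, so one of the two would be a candidate for removal.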

Guidelines for the task on clustering

Clustering analysis by K-means (13 points)

Choice of attributes and distance function (1 point)
Identification of the best value of k (5 points)
Characterization of the obtained clusters using both an analysis of the k centroids and a comparison of the distribution of variables within the clusters with that in the whole dataset (7 points)
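
One common way to identify a good value of k is to look for the "elbow" in the sum of squared errors (SSE) as k grows. The sketch below is a dependency-free illustration on invented 2-D points. The function name `kmeans_sse` and the deterministic farthest-first initialisation are assumptions made for reproducibility; a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
from math import dist

def kmeans_sse(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-first initialisation; returns the final SSE."""
    # Deterministic initialisation: start from the first point, then repeatedly
    # add the point that is farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated

    # SSE: squared distance of every point to its nearest centroid.
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)

# Toy data (invented): three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

sse = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, value in sse.items():
    print(k, round(value, 2))
```

The SSE drops sharply up to k = 3 and only marginally afterwards, which is the pattern the elbow heuristic looks for.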

Analysis by density-based clustering (9 points)

Choice of attributes and distance function (2 points)
Study of the clustering parameters (2 points)
Characterization and interpretation of the obtained clusters (5 points)
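
For density-based clustering with DBSCAN, the parameters to study are the neighbourhood radius (eps) and the minimum number of points. A widely used heuristic is to sort the distance from each point to its k-th nearest neighbour and pick eps near the knee of that curve. The sketch below shows the computation on invented points; the helper name `k_distances` and the choice k = 3 are assumptions for illustration only.

```python
from math import dist

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Toy data (invented): a dense group plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (2, 2), (10, 10)]

curve = k_distances(points, k=3)
print([round(d, 2) for d in curve])
```

The isolated point produces a jump at the end of the curve; an eps chosen just below that jump would leave it labelled as noise.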

Analysis by hierarchical clustering (5 points)

Choice of attributes and distance function (2 points)
Show and discuss different dendrograms using different algorithms (3 points)
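
Assuming "different algorithms" refers to different linkage criteria, the sketch below shows how single and complete linkage produce different merge sequences, and therefore different dendrograms, on the same data. It is a naive illustration on invented 1-D points; the function name `agglomerate` is hypothetical, and a library routine such as SciPy's `linkage` would normally be used to draw the actual dendrograms.

```python
from itertools import combinations
from math import dist

def agglomerate(points, linkage):
    """Naive agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        def gap(pair):
            # Inter-cluster distance: `linkage` is min (single) or max (complete).
            a, b = pair
            return linkage(dist(points[i], points[j]) for i in a for j in b)
        # Merge the two closest clusters under the chosen linkage.
        a, b = min(combinations(clusters, 2), key=gap)
        merges.append((a, b, round(gap((a, b)), 3)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Toy data (invented): five points on a line.
points = [(0.0,), (1.0,), (2.5,), (4.5,), (8.0,)]

single = agglomerate(points, min)
complete = agglomerate(points, max)
print("single:  ", [height for _, _, height in single])
print("complete:", [height for _, _, height in complete])
```

Single linkage chains the points one at a time at low heights, while complete linkage first pairs up points 2 and 3 and merges at larger heights, so the two dendrograms have visibly different shapes.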

Final evaluation of the best clustering approach and comparison of the clustering obtained (3 points)

Guidelines for the task on Association Rules Mining

Frequent pattern extraction with different values of support and different types (i.e. frequent, closed, maximal) (6 points)
Discussion of the most interesting frequent patterns and analysis of how the number of patterns changes w.r.t. the min_sup parameter (7 points)
Association rules extraction with different values of confidence (6 points)
Discussion of the most interesting rules and analysis of how the number of rules changes w.r.t. the min_conf parameter, with histograms of the rules' confidence and lift (7 points)
Use the most meaningful rules to replace missing values and evaluate the accuracy (2 points)
Use the most meaningful rules to predict the target variable and evaluate the accuracy (2 points)
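
The pattern types and rule measures named above can be made concrete with a small brute-force sketch. It enumerates frequent itemsets, derives the closed and maximal ones, and extracts rules with their confidence and lift. The transactions are invented, the items are written as attribute=value pairs on the assumption that the variables are discretised first, and the function names are hypothetical; a dedicated library would be needed for the full dataset.

```python
from itertools import combinations

# Toy transactions (invented values).
transactions = [
    {"OverTime=Yes", "MaritalStatus=Single", "Attrition=Yes"},
    {"OverTime=Yes", "MaritalStatus=Single", "Attrition=Yes"},
    {"OverTime=Yes", "MaritalStatus=Married", "Attrition=No"},
    {"OverTime=No", "MaritalStatus=Married", "Attrition=No"},
    {"OverTime=No", "MaritalStatus=Single", "Attrition=No"},
]

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(min_sup):
    """Brute-force enumeration: {itemset: support} for support >= min_sup."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo))
            if s >= min_sup:
                found[frozenset(combo)] = s
    return found

def rules(freq, min_conf):
    """Rules lhs -> rhs as (lhs, rhs, confidence, lift) with confidence >= min_conf."""
    out = []
    for itemset, sup in freq.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                rhs = itemset - lhs
                conf = sup / freq[lhs]  # subsets of a frequent itemset are frequent
                if conf >= min_conf:
                    out.append((set(lhs), set(rhs), conf, conf / freq[rhs]))
    return out

freq = frequent_itemsets(0.4)
# Closed: no proper superset has the same support.
closed = [i for i, s in freq.items() if not any(i < j and s == freq[j] for j in freq)]
# Maximal: no proper superset is frequent.
maximal = [i for i in freq if not any(i < j for j in freq)]
print(len(freq), len(closed), len(maximal))

# How the number of patterns and rules changes with the thresholds.
for min_sup in (0.2, 0.4, 0.6):
    print("min_sup", min_sup, "->", len(frequent_itemsets(min_sup)), "itemsets")
for min_conf in (0.6, 0.8, 1.0):
    print("min_conf", min_conf, "->", len(rules(freq, min_conf)), "rules")
```

Every maximal itemset is also closed, and every closed itemset is frequent, so the three counts can only shrink in that order. Rules whose right-hand side is the target (here Attrition) are the ones that could be used for the prediction task.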

Guidelines for the task on Classification

Learning of different decision trees/classification algorithms with different parameters and gain formulas, with the aim of maximizing performance (12 points)
Decision tree interpretation and validation with test and training sets (6 points)
Training of different KNN classifiers with different parameters, with the aim of maximizing performance (6 points)
Discussion of the best prediction model (6 points)
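
Assuming "gain formulas" refers to split criteria such as entropy-based information gain and Gini impurity, the sketch below computes both for a single candidate split. The labels are invented and the OverTime split is a hypothetical example, not a result from the project.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(parent, children, impurity):
    """Impurity decrease obtained by splitting `parent` into `children`."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

# Toy split (invented labels): Attrition grouped by a hypothetical OverTime test.
parent = ["Yes"] * 4 + ["No"] * 4
overtime_yes = ["Yes"] * 3 + ["No"]
overtime_no = ["Yes"] + ["No"] * 3

print("information gain:", round(gain(parent, [overtime_yes, overtime_no], entropy), 4))
print("gini gain:       ", round(gain(parent, [overtime_yes, overtime_no], gini), 4))
```

A decision tree learner evaluates a measure like this for every candidate split and keeps the one with the largest gain; switching the criterion can change which split wins, which is why the task asks to compare them.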

MatteoBiviano/DataMining1-Fundamentals
