This exercise consists of two parts. The first part uses the Dermatology dataset from the UCI Machine Learning Repository, whose attributes are symptoms and whose labels are types of erythemato-squamous diseases (ESDs). Its purpose is to compare several classifiers (a dummy classifier, Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN) and Logistic Regression (LR)) using metrics such as accuracy and F1. To optimize the classifiers manually, we preprocessed the data, experimented with various hyper-parameter values and added pipelines.
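To make the two metrics concrete, the following is a minimal sketch of how accuracy and F1 can be computed for a multi-class problem. The labels are toy values, not taken from the Dermatology dataset, and macro averaging of the per-class F1 scores is an assumption, since the averaging scheme is not fixed above. The helper names `accuracy` and `macro_f1` are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced labels (illustrative only).
y_true = [1, 1, 1, 1, 2, 2, 3, 3]

# A dummy baseline that always predicts the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
dummy_pred = [majority] * len(y_true)

# Hypothetical predictions from a trained classifier.
model_pred = [1, 1, 1, 2, 2, 2, 3, 1]

print("dummy:", accuracy(y_true, dummy_pred), macro_f1(y_true, dummy_pred))
print("model:", accuracy(y_true, model_pred), macro_f1(y_true, model_pred))
```

On imbalanced labels like these, the dummy baseline still reaches an accuracy of 0.5 while its macro F1 is much lower, which is one reason to report both metrics. In practice, scikit-learn's `accuracy_score` and `f1_score` provide the same quantities.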
The second part uses an Online Poker Games dataset from Kaggle, whose attributes describe a single hand of poker from the point of view of one player and whose label is the hand’s outcome. Its purpose is to optimize a Multi-layer Perceptron (MLP) and a Support Vector Machine (SVM) with respect to the F1 metric. To optimize the classifiers, we preprocessed the data and used Optuna, an automated hyper-parameter optimization tool.