diegofiori/ML_Higgs

ML_Higgs

Machine Learning CS-433: Project 1

Group: Diego Fiori, Paolo Colusso, Valerio Volpe

Kaggle team name: LaVolpeilFioreEilColosso

The files are organised based on:
i) the process followed to implement the models,
ii) the Machine Learning algorithms being applied.

(1) Preprocessing

"load_data.py"

"preprocessing.py"

Contains the functions used to clean the data. Specifically:
-handling of missing values;
-creation of dummy variables;
-feature augmentation with interaction terms and polynomials;
-normalisation of the data.
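
The cleaning steps listed above can be illustrated with a minimal sketch. It is not the code in "preprocessing.py": the function names, the median imputation strategy, and the sentinel value -999 for missing entries are all assumptions made for illustration, and interaction terms and dummy variables are left out for brevity.

```python
import numpy as np

MISSING = -999.0  # assumed sentinel for missing values; not stated in this README


def impute_missing(x, missing=MISSING):
    """Replace sentinel entries with the median of the observed values in each column."""
    x = x.astype(float)  # astype returns a copy, so the caller's array is untouched
    for j in range(x.shape[1]):
        mask = x[:, j] == missing
        if mask.any() and not mask.all():
            x[mask, j] = np.median(x[~mask, j])
    return x


def build_poly(x, degree):
    """Stack x, x**2, ..., x**degree column-wise (no interaction terms)."""
    return np.hstack([x ** d for d in range(1, degree + 1)])


def standardize(x):
    """Centre each column and scale it to unit variance; constant columns are only centred."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - mean) / std, mean, std


# Tiny made-up example: one missing entry in the second column.
x = np.array([[1.0, -999.0], [2.0, 4.0], [3.0, 6.0]])
x_clean = impute_missing(x)
x_poly = build_poly(x_clean, degree=2)
x_std, mean, std = standardize(x_poly)
```

Returning the mean and standard deviation makes it possible to apply the same scaling to the test set later, which avoids normalising train and test data with different statistics.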

(2) Generic functions

"regression_tools.py"

Contains the generic functions used throughout the implementation of algorithms. Specifically:
-auxiliary functions for regression implementations:
-single steps of regression algorithms;
-extraction of a sample of the dataset;
-batch creation.
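
As an illustration of sample extraction and batch creation, here is a minimal sketch. The names `sample_rows` and `iterate_batches` are hypothetical and are not the functions in "regression_tools.py"; the sketch only assumes that labels `y` and features `tx` are NumPy arrays with matching first dimensions.

```python
import numpy as np


def sample_rows(y, tx, n_samples, seed=0):
    """Draw n_samples rows without replacement, keeping y and tx aligned."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=n_samples, replace=False)
    return y[idx], tx[idx]


def iterate_batches(y, tx, batch_size, seed=0):
    """Yield shuffled mini-batches that together cover every row exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        yield y[idx], tx[idx]


# Made-up data: 10 rows, an intercept column and one feature equal to y.
y = np.arange(10, dtype=float)
tx = np.column_stack([np.ones(10), y])
batches = list(iterate_batches(y, tx, batch_size=4))
y_sub, tx_sub = sample_rows(y, tx, n_samples=3)
```

The last batch is smaller than `batch_size` when the number of rows is not a multiple of it; a stochastic gradient loop can usually consume it unchanged.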

"AIC.py"

Implements a subset selection method based on the Akaike information criterion (AIC). The method is implemented for both ridge and logistic regression. It builds a sequence of models with an increasing number of variables, greedily adding one new variable at each step, and then selects the best of these models by AIC. Contains the functions:
-compare_aic_gradient_descent(y,tx,gamma,max_iter,threshold)
-compare_aic_ridge(y,tx,lambda_)
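
The greedy procedure described above can be sketched for the ridge case as follows. This is an illustrative sketch, not the code in "AIC.py": the helper names are hypothetical, the AIC is taken in its Gaussian form n * log(RSS / n) + 2k (an assumption, since the README does not give the formula), and the penalty is scaled as 2 * n * lambda_ (one common convention, also assumed).

```python
import numpy as np


def ridge_fit(y, tx, lambda_):
    """Solve (X^T X + 2 n lambda I) w = X^T y."""
    n, d = tx.shape
    a = tx.T @ tx + 2 * n * lambda_ * np.eye(d)
    return np.linalg.solve(a, tx.T @ y)


def gaussian_aic(y, tx, w):
    """AIC up to an additive constant: n * log(RSS / n) + 2 * k."""
    n = len(y)
    rss = np.sum((y - tx @ w) ** 2)
    return n * np.log(rss / n) + 2 * tx.shape[1]


def forward_select_aic(y, tx, lambda_):
    """Greedily grow a feature set, then return the step with the lowest AIC."""
    selected, remaining = [], list(range(tx.shape[1]))
    path = []  # (aic, feature indices) for each model size
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            w = ridge_fit(y, tx[:, cols], lambda_)
            scores.append((gaussian_aic(y, tx[:, cols], w), j))
        best_aic, best_j = min(scores)
        selected.append(best_j)
        remaining.remove(best_j)
        path.append((best_aic, list(selected)))
    return min(path, key=lambda item: item[0])


# Synthetic data: only features 1 and 3 carry signal.
rng = np.random.default_rng(0)
tx = rng.normal(size=(200, 5))
y = 3.0 * tx[:, 1] - 2.0 * tx[:, 3] + 0.1 * rng.normal(size=200)
best_aic, best_features = forward_select_aic(y, tx, lambda_=1e-6)
```

Each step refits one model per remaining variable, so the total cost grows quadratically with the number of features, which is still far cheaper than evaluating every subset.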

(3) Implementations

"implementations.py"

Contains the implementations of the main machine learning algorithms we selected. The functions defined in this file are:
-least_squares_GD
-least_squares_SGD
-least_squares
-ridge_regression
-ridge_regression_SGD
-lasso_regression_GD
-logistic_regression
-reg_logistic_regression
-logistic_regression_newton_method_demo
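
To give an idea of what two of these methods compute, here is a minimal sketch of closed-form ridge regression and of logistic regression trained by gradient descent. The signatures, the returned `(w, loss)` pair, the labels in {0, 1}, and the 2 * n * lambda_ penalty scaling are assumptions made for illustration; the functions in "implementations.py" may differ.

```python
import numpy as np


def sigmoid(t):
    """Logistic function, written with tanh so that large |t| does not overflow."""
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def logistic_loss(y, tx, w):
    """Mean negative log-likelihood for labels y in {0, 1}."""
    z = tx @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)


def logistic_regression_gd(y, tx, initial_w, max_iters, gamma):
    """Plain gradient descent on the mean logistic loss."""
    w = initial_w.copy()
    for _ in range(max_iters):
        grad = tx.T @ (sigmoid(tx @ w) - y) / len(y)
        w = w - gamma * grad
    return w, logistic_loss(y, tx, w)


def ridge_regression_closed_form(y, tx, lambda_):
    """Solve the regularised normal equations and report half the mean squared error."""
    n, d = tx.shape
    w = np.linalg.solve(tx.T @ tx + 2 * n * lambda_ * np.eye(d), tx.T @ y)
    mse = np.mean((y - tx @ w) ** 2) / 2
    return w, mse


# Synthetic data: an intercept column plus one standard normal feature.
rng = np.random.default_rng(0)
feature = rng.normal(size=100)
tx = np.column_stack([np.ones(100), feature])

y_class = (feature > 0).astype(float)
w_log, loss_log = logistic_regression_gd(y_class, tx, np.zeros(2), max_iters=200, gamma=0.5)

y_lin = tx @ np.array([1.0, 2.0])
w_ridge, mse_ridge = ridge_regression_closed_form(y_lin, tx, lambda_=0.0)
```

With `lambda_=0.0` the ridge solution reduces to ordinary least squares, which is a convenient sanity check on noise-free data.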

(4) Cross Validations

Cross-validation is used to choose the hyperparameters and polynomial degrees of the different regression models. The files that implement cross-validation are:
-"cross_validaion_logistic.py"
-"cross_validation_lasso.py"
-"cross_validation_ridge.py"
-"cross_validation_ridge_super.py"
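
A minimal sketch of k-fold cross-validation for the ridge penalty is given below. It is not taken from the files listed above: the helper names, the RMSE criterion, the number of folds, and the 2 * n * lambda_ penalty scaling are assumptions. Selecting a polynomial degree would follow the same pattern with an extra loop over degrees.

```python
import numpy as np


def build_k_indices(n, k_fold, seed=0):
    """Split shuffled row indices into k_fold equally sized folds (remainder rows are dropped)."""
    rng = np.random.default_rng(seed)
    fold_size = n // k_fold
    order = rng.permutation(n)
    return order[: fold_size * k_fold].reshape(k_fold, fold_size)


def ridge_fit(y, tx, lambda_):
    """Solve (X^T X + 2 n lambda I) w = X^T y."""
    n, d = tx.shape
    return np.linalg.solve(tx.T @ tx + 2 * n * lambda_ * np.eye(d), tx.T @ y)


def cross_validate_ridge(y, tx, lambdas, k_fold=4, seed=0):
    """Return the lambda with the lowest mean validation RMSE, plus all mean RMSEs."""
    k_indices = build_k_indices(len(y), k_fold, seed)
    mean_rmse = []
    for lambda_ in lambdas:
        fold_rmse = []
        for k in range(k_fold):
            test_idx = k_indices[k]
            train_idx = np.delete(k_indices, k, axis=0).ravel()
            w = ridge_fit(y[train_idx], tx[train_idx], lambda_)
            residual = y[test_idx] - tx[test_idx] @ w
            fold_rmse.append(np.sqrt(np.mean(residual ** 2)))
        mean_rmse.append(np.mean(fold_rmse))
    best = int(np.argmin(mean_rmse))
    return lambdas[best], mean_rmse


# Synthetic regression problem with a small amount of noise.
rng = np.random.default_rng(1)
tx = rng.normal(size=(120, 3))
y = tx @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=120)
lambdas = np.logspace(-6, 1, 8)
best_lambda, mean_rmse = cross_validate_ridge(y, tx, lambdas)
```

Reusing the same fold split for every candidate value keeps the comparison between penalties fair, since all of them are scored on identical validation rows.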

(5) Estimate models

Each of the following scripts runs one machine learning algorithm end to end, from data loading to creation of the final CSV file.
-"test_lasso.py"
-"test_logistic_penalized.py"
-"test_logistic_penalized-cross.py"
-"test_logistic_newton.py"
-"test_logistic_gd.py"
-"test_AIC_logistic.py"
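
The final step of such a script, turning model scores into a CSV file, might look like the sketch below. The column names "Id" and "Prediction", the labels in {-1, 1}, the example ids, and both function names are assumptions for illustration and are not confirmed by this README.

```python
import csv
import io


def predict_labels(scores, threshold=0.0):
    """Map real-valued scores to labels in {-1, 1} (assumed label encoding)."""
    return [1 if s > threshold else -1 for s in scores]


def write_submission(ids, labels, handle):
    """Write one 'Id,Prediction' row per sample to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])
    for id_, label in zip(ids, labels):
        writer.writerow([id_, label])


# Made-up ids and scores; a real script would use the test set and a fitted model.
buffer = io.StringIO()
write_submission([350000, 350001, 350002], predict_labels([0.7, -1.2, 0.1]), buffer)
csv_text = buffer.getvalue()
```

Writing to a handle rather than a path keeps the function easy to test; passing `open("submission.csv", "w", newline="")` would write to disk instead.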

A few functions, such as batch_iter, were taken from the helper files of the course lab sessions.
