Project 1 of the Machine Learning course at EPFL.
ML Project 1

General info

Repository containing the code for Project 1 of the Machine Learning course at EPFL.

The team (SchroedingerCats) is composed of:

The project was developed and tested with Python 3.6. The packages needed to get it up and running are listed in requirements.txt and can be installed with:

pip3 install --user --requirement requirements.txt

For visualization during the feature selection and engineering phase, we also used matplotlib, seaborn, sklearn, and pandas. They are not needed to run the models or the final training.

Training and prediction on the provided test set can be run with:

python3 run.py

The data must be placed in the data folder (relative to the run.py script) and must be named train.csv and test.csv. The data we used can be downloaded from this page.

The predictions are written to the final-test.csv file, located in the same folder as run.py.
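The file layout described above can be checked before starting a long training run. The following is only a minimal sketch: the helper name `missing_inputs` is hypothetical and is not part of the repository, and it assumes nothing about the contents of the CSV files, only their location.

```python
from pathlib import Path


def missing_inputs(project_root):
    """Return the expected input files that are absent.

    `project_root` is assumed to be the folder that contains run.py,
    so the datasets are looked up in `<project_root>/data`.
    """
    data_dir = Path(project_root) / "data"
    expected = [data_dir / "train.csv", data_dir / "test.csv"]
    # Keep only the paths that do not point to an existing regular file.
    return [path for path in expected if not path.is_file()]
```

Calling `missing_inputs(".")` from the folder that contains run.py returns an empty list when both datasets are in place, and the list of missing paths otherwise.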

Project structure

The project is structured in the following way:

.
├── implementations.py: contains **all the implementations** required by the project
├── notes.md: general notes about the project development
├── README.md: this file :)
├── requirements.txt: contains the packages used to run the project
├── run.py: contains the **final code** to train the model
├── tests.ipynb: a notebook that contains the tests of the required implementations, and that can be used as a usage guide
├── data: contains the datasets (.gitignore'd)
├── notebooks
│   ├── features_log.ipynb: contains our investigations about taking the logarithm of the features
│   ├── features_overview.ipynb: contains the exploratory data analysis phase
│   ├── logistic_regression.ipynb: contains our trials with logistic regression
│   └── ridge_regression.ipynb: contains our trials with ridge regression
└── src
    ├── helpers.py: some helper functions used by different modules
    ├── split.py: contains the function used to split the dataset into training and test sets
    ├── k_fold.py: contains the functions used for cross-validation
    ├── polynomials.py: contains the functions used to build the polynomials
    ├── logistic: contains the functions used to train the logistic regression model
    │   ├── loss.py: contains the function to compute the loss
    │   ├── gradient.py: contains the function to compute the gradient
    │   ├── hessian.py: contains the function to compute the hessian
    │   ├── implementations.py: contains the **logistic regression** implementations required by the project
    │   └── sigmoid.py: contains the function to compute the sigmoid
    └── linear: contains the functions used to train the linear regression model
        ├── gradient.py: contains the function to compute the gradient
        ├── implementations.py: contains the **linear regression** implementations required by the project
        └── loss.py: contains the function to compute the loss
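To give an idea of what the four building blocks listed under src/logistic compute, here is a minimal NumPy sketch of the sigmoid, loss, gradient, and Hessian for logistic regression. It is not the repository's code: the function names and signatures are hypothetical, labels are assumed to be encoded as 0/1, the loss is assumed to be the summed (not averaged) negative log-likelihood, and NumPy is assumed to be available.

```python
import numpy as np


def sigmoid(t):
    """Logistic function, via the tanh identity to avoid overflow in exp."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(t, dtype=float)))


def logistic_loss(y, tx, w):
    """Summed negative log-likelihood, assuming labels y in {0, 1}."""
    z = tx @ w
    # log(1 + exp(z)) computed stably as logaddexp(0, z).
    return np.sum(np.logaddexp(0.0, z) - y * z)


def logistic_gradient(y, tx, w):
    """Gradient of logistic_loss with respect to w."""
    return tx.T @ (sigmoid(tx @ w) - y)


def logistic_hessian(tx, w):
    """Hessian of logistic_loss with respect to w (independent of y)."""
    p = sigmoid(tx @ w)
    s = p * (1.0 - p)  # diagonal of the weight matrix S
    # Equivalent to tx.T @ diag(s) @ tx without building the diagonal matrix.
    return tx.T @ (tx * s[:, None])
```

A Hessian of this form is what second-order methods such as Newton's method need. How the repository actually uses these pieces is documented in tests.ipynb and the notebooks, not in this sketch.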