
bernharl/FYS-STK4155-project1


FYS-STK4155 Project 1

This is our source code for Project 1 in the course FYS-STK4155 Applied Data Analysis and Machine Learning at the University of Oslo.

The project is based on various introductory regression methods and resampling. We will be using the following regression methods,

  1. Ordinary Least Squares
  2. Ridge
  3. Lasso

in combination with our own implementation of k-fold cross-validation, with the end goal of fitting a two-dimensional polynomial model to real terrain data downloaded from USGS EarthExplorer.
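The pipeline above (a 2D polynomial design matrix, three regression methods and k-fold cross-validation) can be sketched in NumPy as follows. This is a minimal sketch, not the code in src/main.py: the helper names `design_matrix`, `ols`, `ridge`, `lasso` and `kfold_mse` are hypothetical, and the Lasso here uses plain cyclic coordinate descent, which may differ from the solver the project actually uses. For simplicity the intercept column is penalised like every other coefficient in Ridge and Lasso. The Franke function is used as synthetic test data.

```python
import numpy as np


def franke(x, y):
    """Franke's test function, a common 2D benchmark surface."""
    t1 = 0.75 * np.exp(-((9 * x - 2) ** 2) / 4 - ((9 * y - 2) ** 2) / 4)
    t2 = 0.75 * np.exp(-((9 * x + 1) ** 2) / 49 - (9 * y + 1) / 10)
    t3 = 0.5 * np.exp(-((9 * x - 7) ** 2) / 4 - ((9 * y - 3) ** 2) / 4)
    t4 = -0.2 * np.exp(-((9 * x - 4) ** 2) - (9 * y - 7) ** 2)
    return t1 + t2 + t3 + t4


def design_matrix(x, y, degree):
    """Hypothetical helper: columns x**(d-j) * y**j for every total degree d <= degree."""
    cols = [x ** (d - j) * y ** j for d in range(degree + 1) for j in range(d + 1)]
    return np.column_stack(cols)


def ols(X, z):
    # Pseudo-inverse stays stable when X^T X is (nearly) singular.
    return np.linalg.pinv(X) @ z


def ridge(X, z, lam):
    # Closed form (X^T X + lam I)^-1 X^T z; the intercept is penalised too.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ z)


def lasso(X, z, lam, n_iter=200):
    # Cyclic coordinate descent on 0.5 * ||z - X beta||^2 + lam * ||beta||_1.
    beta = np.zeros(X.shape[1])
    col_sq = np.sum(X ** 2, axis=0)
    r = z - X @ beta  # running residual
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * beta[j]  # remove feature j from the current fit
            rho = X[:, j] @ r
            # Soft-thresholding update for coordinate j.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]  # add the updated feature j back
    return beta


def kfold_mse(X, z, fit, k=5, seed=0):
    """Mean test MSE over k folds; fit(X_train, z_train) must return beta."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(z)), k)
    scores = []
    for test_idx in folds:
        train = np.ones(len(z), dtype=bool)
        train[test_idx] = False  # every point not in this fold is training data
        beta = fit(X[train], z[train])
        scores.append(np.mean((z[test_idx] - X[test_idx] @ beta) ** 2))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x, y = rng.uniform(0, 1, size=(2, 400))
    z = franke(x, y) + 0.1 * rng.standard_normal(400)
    X = design_matrix(x, y, degree=5)
    models = {
        "OLS": ols,
        "Ridge": lambda X, z: ridge(X, z, lam=1e-3),
        "Lasso": lambda X, z: lasso(X, z, lam=1e-3),
    }
    for name, fit in models.items():
        print(f"{name}: 5-fold MSE = {kfold_mse(X, z, fit):.4f}")
```

Passing the model as a `fit` callable keeps the cross-validation loop independent of the regression method, so the same loop can be reused when scanning polynomial degrees and hyperparameters.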

To run all tests and generate the data and plots used in the report, run main_script.sh.

Source structure

  • src/main.py: Main script containing all classes used in this project.
  • src/test_main.py: Contains test functions for main.py. Use pytest to run tests.
  • src/beta_variance_ols_plot.py: Calculates the variance of the regression parameters for OLS, for both Franke and terrain data. Plots are saved as .pdf to doc/figs/
  • src/bias_variance_error_Franke.py: Estimates the expected prediction error (EPE) using k-fold cross-validation for OLS, Ridge and LASSO on Franke data for different polynomial degrees and hyperparameters. Plots are saved as .pdf to doc/figs/
  • src/bias_variance_error_terrain.py: Estimates the EPE using k-fold cross-validation for OLS, Ridge and LASSO on terrain data for different polynomial degrees and hyperparameters. Plots are saved as .pdf to doc/figs/
  • src/model_plots.py: Creates 3D plots of our best OLS, Ridge and LASSO models for both datasets. Figures are saved as .pdf to doc/figs/
  • src/r2_scores.py: Calculates the R2 scores of our best OLS, Ridge and LASSO models for both datasets. Results are printed to the terminal.
  • doc/report_1.tex: Main report of the project.
  • main_script.sh: Shell script that automatically runs all necessary python scripts and builds the TeX report using the newly generated figures.
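The parameter variance computed by src/beta_variance_ols_plot.py and the R2 scores printed by src/r2_scores.py are standard quantities. The sketch below is a minimal NumPy illustration, not the project's code: the function names `ols_beta_variance` and `r2_score` are hypothetical, and it assumes the usual OLS noise model z = X beta + eps with i.i.d. noise of constant variance sigma^2, so that `Var(beta) = sigma^2 (X^T X)^{-1}`, with sigma^2 estimated from the residuals. The ±1.96 standard-deviation interval is a normal approximation.

```python
import numpy as np


def ols_beta_variance(X, z):
    """OLS estimate of beta and of Var(beta_j) = sigma^2 [(X^T X)^-1]_jj.

    Assumes z = X beta + eps with i.i.d. noise of constant variance sigma^2.
    """
    n, p = X.shape
    beta = np.linalg.pinv(X) @ z
    residuals = z - X @ beta
    sigma2 = residuals @ residuals / (n - p)  # unbiased estimate of sigma^2
    return beta, sigma2 * np.diag(np.linalg.pinv(X.T @ X))


def r2_score(z, z_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((z - z_pred) ** 2)
    ss_tot = np.sum((z - np.mean(z)) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.uniform(0, 1, size=(2, 1000))
    X = np.column_stack([np.ones_like(x), x, y])  # first-order polynomial in x, y
    z = 1 + 2 * x + 3 * y + 0.1 * rng.standard_normal(1000)
    beta, var = ols_beta_variance(X, z)
    half_width = 1.96 * np.sqrt(var)  # approximate 95% confidence interval
    for j, (b, h) in enumerate(zip(beta, half_width)):
        print(f"beta_{j} = {b:.3f} +/- {h:.3f}")
    print(f"R2 = {r2_score(z, X @ beta):.4f}")
```

An R2 of 1 means a perfect fit, while predicting the sample mean everywhere gives exactly 0.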

Additional figures mentioned in the report

Unfortunately, GitHub does not support embedding graphics in PDF format, so we link to them instead. We use PDF because we want vector graphics for our figures.

Section 5.1

Section 5.2
