ML Power Predictor

Table of Contents

Power Predictor
Introduction
Usage
Installation
- Prerequisites
- Setup
Tests
License

Introduction

This is a machine learning program made for the course TDT4173 Machine Learning. The task was to predict how much solar power is produced by photovoltaic (PV) systems, which convert sunlight into electricity. The dataset provided data for evaluating day-ahead solar production forecasting methods, and the data provider is ANEO. Information about all the weather features is available here. All the data was collected in Trondheim, from 2019-01-01 to 2023-07-31, at 15-minute intervals, from three different locations. These locations were not equal: the power output of location A was 6 times larger than that of B and C, and the solar panels at location A were angled differently than those at B and C. This made it much trickier for a model to learn. There was also a lot of noise in the data: outages, night-time periods with zero sunlight where solar production was still reported, and daytime periods with no power production due to external factors.
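Noise like this can be screened with simple rules before training. The sketch below is a hypothetical heuristic, not the cleaning we actually used: it assumes one list of PV measurements at 15-minute resolution and a matching day/night flag (the names `pv`, `is_day` and `min_run` are illustrative). It marks night-time rows that still report production, and daytime rows inside a run of `min_run` identical consecutive values, which can indicate an outage or a stuck sensor.

```python
from typing import List, Sequence


def flag_noisy_rows(
    pv: Sequence[float], is_day: Sequence[bool], min_run: int = 8
) -> List[bool]:
    """Return a mask that marks rows which look like measurement noise.

    Hypothetical heuristic (not the project's actual cleaning code):
    - night-time rows (is_day is False) that still report production > 0
    - daytime rows inside a run of at least `min_run` identical consecutive
      values, e.g. an outage reporting 0 or a sensor stuck at one value
      (8 rows = 2 hours at 15-minute resolution)
    """
    if len(pv) != len(is_day):
        raise ValueError("pv and is_day must have the same length")

    # Rule 1: production reported while the sun is down.
    noisy = [(not day) and value > 0 for value, day in zip(pv, is_day)]

    # Rule 2: long runs of identical values during daylight.
    start = 0
    while start < len(pv):
        end = start
        while end + 1 < len(pv) and pv[end + 1] == pv[start]:
            end += 1
        if end - start + 1 >= min_run:
            for i in range(start, end + 1):
                if is_day[i]:
                    noisy[i] = True
        start = end + 1
    return noisy


pv = [0.0, 1.2, 3.0, 4.0, 4.0, 4.0, 2.0, 0.0]
is_day = [False, False, True, True, True, True, True, False]
print(flag_noisy_rows(pv, is_day, min_run=3))
# [False, True, False, True, True, True, False, False]
```

Whether flagged rows should be dropped or kept with a lower weight is a modelling choice; this sketch only produces the mask.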

Usage

After cloning the project, look at the final_submission folder to see the feature engineering and the model training. The final model is saved in the models folder. To use the model, run the following code in the root directory of the project.

The task was set up as a competition in which we fought machine learning models (bots) made by Professor Ruslan Khalitov.


Our journey

This has been a great task, and we have learned a lot, above all how to use machine learning to solve a real-world problem. We tried many things, worked many late nights, and had a lot of fun along with many frustrations. In the end we managed to beat all the bots, which we are very proud of, and it earned us the best grade possible: an A. The two bots that were hardest to beat were Ryleena and Shao-RyKhan. This may seem strange, since Goslightning was the best bot in the entire tournament. The reason is that for most of the project, the predictions we submitted to Kaggle (where we were graded) were made with a bug in how we loaded the test data. This made the bots seem better than they actually were, and we worked very hard on flawed data. It is very impressive that we climbed so high despite several flaws in our test data. After fixing the bug, we defeated Ryleena.

We learned that simpler models are often better models: some of our models were so complex that they needed more than 24 hours to finish running. We also learned that the data is the most important part of the project; we spent a great deal of time on feature engineering and data cleaning. A good workflow and a good project structure also proved important. Finally, we learned that cloud computing is very powerful and quite easy to set up.

We have beaten the following bots:

Gosborg 2049 VT guessed randomly between 0 and the maximum PV measurement.

Kenshi VT used linear regression, with no feature engineering or other preprocessing.

Quan Gos Chill predicted the average production for each location at the given hour of the day.
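A baseline in the spirit of Quan Gos Chill fits in a few lines. This is a minimal sketch under our own assumptions, not the bot's actual code: training rows are hypothetical `(location, hour, pv)` tuples, and the prediction for a location and hour is the mean of all training measurements with the same location and hour of day, falling back to 0.0 for combinations never seen.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fit_hourly_means(
    rows: Iterable[Tuple[str, int, float]]
) -> Dict[Tuple[str, int], float]:
    """Average PV production per (location, hour of day)."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for location, hour, pv in rows:
        totals[(location, hour)] += pv
        counts[(location, hour)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def predict_hourly_mean(
    means: Dict[Tuple[str, int], float], location: str, hour: int
) -> float:
    """Look up the stored mean; fall back to 0.0 for unseen combinations."""
    return means.get((location, hour), 0.0)


train = [("A", 12, 2000.0), ("A", 12, 3000.0), ("B", 12, 400.0), ("A", 0, 0.0)]
means = fit_hourly_means(train)
print(predict_hourly_mean(means, "A", 12))  # 2500.0
print(predict_hourly_mean(means, "C", 12))  # 0.0
```

Because the means are stored per location, the large difference in output between location A and locations B and C is captured automatically.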

Gospion used a random forest with minimal feature engineering.

Frostling used an AutoML solution built on H2O; this virtual team (VT) had some feature engineering and used a random split.

Frostling used CatBoost with good feature engineering and good hyperparameters.

La La Lizard was the average of two teaching assistants' models.

Keno used a single LightGBM model with a changed target and an extensive hyperparameter search. It used one model for all three locations.

Shao TyKhan was built from the best teaching assistant's models by averaging 10 different CatBoost models with great hyperparameters and good feature engineering. Unlike the other virtual teams, it used one model for each location.
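The difference between one shared model and one model per location can be illustrated with a small routing sketch. Here each "model" is just a Python callable standing in for a trained regressor such as CatBoost (a hypothetical simplification), and each location's prediction is the plain average of its own models' outputs, mirroring the averaging described above.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

# A "model" here is any callable mapping one feature vector to a prediction.
Model = Callable[[Sequence[float]], float]


def predict_per_location(
    models_by_location: Dict[str, List[Model]],
    location: str,
    features: Sequence[float],
) -> float:
    """Route a row to its location's models and average their predictions."""
    models = models_by_location[location]
    return fmean(model(features) for model in models)


# Toy stand-ins for trained models, one list per location (hypothetical).
models_by_location: Dict[str, List[Model]] = {
    "A": [lambda x: 6.0 * x[0], lambda x: 6.0 * x[0] + 2.0],
    "B": [lambda x: x[0], lambda x: x[0] + 1.0],
    "C": [lambda x: x[0]],
}

print(predict_per_location(models_by_location, "A", [10.0]))  # 61.0
print(predict_per_location(models_by_location, "B", [10.0]))  # 10.5
```

With separate models, each location can learn its own scale and panel orientation instead of one model having to encode both through features.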

Goslightning was the best model that the professor made, and it was given extra time to complete. It used the geometric mean of 10 models from the best teaching assistant, 1 model averaged from other teaching assistants' solutions, and 2 LightGBM models fine-tuned by the professor. This was the strongest bot in the competition.
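A geometric-mean ensemble combines the models' predictions row by row as the n-th root of their product. The sketch below is illustrative only and is not the professor's code: clipping negative predictions to zero is our own assumption (a geometric mean is not defined for negative values), and the input layout `predictions[m][i]` is hypothetical.

```python
import math
from typing import List, Sequence


def geometric_mean_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model predictions row by row with a geometric mean.

    `predictions[m][i]` is model m's prediction for row i. Negative values are
    clipped to 0.0 first (an assumption), so any model predicting zero or less
    pulls that row's ensemble prediction to 0.0.
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("all models must predict the same number of rows")

    combined = []
    for row in zip(*predictions):
        clipped = [max(value, 0.0) for value in row]
        combined.append(math.prod(clipped) ** (1.0 / n_models))
    return combined


preds = [
    [100.0, 0.0, 50.0],   # model 1
    [400.0, 10.0, 50.0],  # model 2
]
print(geometric_mean_ensemble(preds))  # ≈ [200.0, 0.0, 50.0]
```

Compared with an arithmetic mean, the geometric mean is pulled strongly toward the smallest prediction, which for solar data means a single model predicting zero (for example at night) dominates that row.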

Installation

To install Power Predictor, make sure all the prerequisites are installed and set up, then follow the setup guide. The following sections will guide you through the process.

Prerequisites

- Ensure Python 3.9 or newer is installed on your machine (Download Python).
- Jupyter Notebook

Clone the repository

```
git clone https://github.com/SverreNystad/power-predictor.git
cd power-predictor
```

Virtual Environment (Recommended)

🚀 A better way to set up repositories

A virtual environment in Python is a self-contained directory that contains a Python installation for a particular version of Python, plus a number of additional packages. Using a virtual environment for your project ensures that the project's dependencies are isolated from the system-wide Python and other Python projects. This is especially useful when working on multiple projects with differing dependencies, as it prevents potential conflicts between packages and allows for easy management of requirements.

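To set up and use a virtual environment for Power Predictor: the commands below use Python's built-in venv module, so no extra package is required. If you prefer the third-party virtualenv tool, which also creates isolated Python environments, you can install it with pip: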
```
pip install virtualenv
```
Create virtual environment

Next, create a new virtual environment in the project directory. This environment is a directory containing a complete Python environment (interpreter and other necessary files).
```
python -m venv venv
```
Activate virtual environment

To activate the environment, run the command for your platform:
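- For Windows (Git Bash):
```
source ./venv/Scripts/activate
```
- For Windows (Command Prompt):
```
venv\Scripts\activate.bat
```
- For Windows (PowerShell):
```
venv\Scripts\Activate.ps1
```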
- For Linux / MacOS:
```
source venv/bin/activate
```
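To confirm that the virtual environment is actually active, you can ask the interpreter itself. When a venv is active, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base Python installation, so comparing the two is a quick check. A minimal sketch (the function name is our own):

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from base_prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If it prints `False` after activation, the `python` on your PATH is probably not the one inside `venv`.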

Install dependencies

With the virtual environment activated, install the project dependencies:

```
pip install -r requirements.txt
```

The requirements.txt file contains a list of packages necessary to run Power Predictor. Installing them in an activated virtual environment ensures they are available to the project without affecting other Python projects or system settings.

Tests

To run all the tests, run the following command in the root directory of the project:

```
pytest
```
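By default, pytest collects files named `test_*.py` (or `*_test.py`) and runs the functions whose names start with `test`. As an illustration only, a test module for a hypothetical helper that clips negative predictions to zero (PV production cannot be negative) might look like this. Neither the helper nor the file name comes from the repository; in a real project the helper would live in `src` and be imported.

```python
# tests/test_postprocessing.py (hypothetical file name)
from typing import List, Sequence


def clip_negative(predictions: Sequence[float]) -> List[float]:
    """Hypothetical helper: solar production cannot be negative."""
    return [max(p, 0.0) for p in predictions]


def test_clip_negative_replaces_negatives_with_zero():
    assert clip_negative([-3.5, 0.0, 12.0]) == [0.0, 0.0, 12.0]


def test_clip_negative_keeps_length():
    assert len(clip_negative([1.0, -1.0, 2.0])) == 3
```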

License

Licensed under the MIT License. Because sharing is caring.

Folder Structure

Data: All data used for the project.

data/raw: Original, immutable data dump.
data/processed: Cleaned and pre-processed data used for modeling.
data/interim: Intermediate data that has been transformed.

Results: Figures and solutions

results/figures: Generated analysis as HTML, PNG, PDF, LaTeX, etc.
results/output: Contains different solutions generated by models.

src: Source code for use in this project.

src/data: Scripts to download or generate data, turning data/raw or data/processed into objects that can be worked with.
src/features: Scripts to turn raw data into features for modeling.
src/models: Scripts to train models and then use trained models to make predictions.
src/visualization: Scripts to create exploratory and results oriented visualizations.

tests: Unit tests for the project source code.

final_submission: Contains Short_notebook_1 and Short_notebook_2, our two allowed attempts at the private leaderboard.

Contributors

Three brave students that applied their knowledge of Machine Learning to beat the bots.

- Gunnar Nystad
- Peter Skoland
- Sverre Nystad

Thanks to

Ruslan Khalitov for the task and the bots. This task has been amazing and we have learned a lot.
Thanks to the group members for the great work and the good collaboration.
Thanks to our amazing Professor Zhirong Yang for great lectures.
