WindTurbineOutputPrediction

This repository contains the Python and R Jupyter notebooks I wrote for H2O's Open Tour NYC Hackathon on July 19 and 20, 2016, and continued to work on afterwards. The results are described in a blog post at http://lucdemortier.github.io/articles/17/WindPower.

1_data_preparation.ipynb: Reads hackathon input csv files (for training and testing), creates data frames, and pickles them for Python notebooks or feathers them for R notebooks.
2_exploratory_visuals.ipynb: Generates various plots to explore the data prior to modeling.
3_random_forest_regressor.ipynb: A random forest regression model that treats all ten turbines as a single turbine with a "zone id" setting.
4_random_forest_regressor.ipynb: A random forest regression model that models each of the ten turbines separately, using wind velocity measurements from all zones.
5_xgboost_regressor.ipynb: An XGBoost regression model.
6_xgboost_classifier_plus_regressor.ipynb: A combination of an XGBoost classifier and an XGBoost regressor. The classifier predicts which turbine outputs are zero; the regressor predicts the values of the non-zero outputs.
7_gamlss_R.ipynb: A generalized linear model. This notebook runs an R kernel and uses the R package GAMLSS.
8_check_solution.ipynb: Uses csv files with predictions created by the other notebooks to compute the RMSE for the hackathon's public and private leaderboards.
summarynoprint.R and wp_withdata.R are routines from the GAMLSS package that I had to modify slightly for the R notebook.
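
The classifier-plus-regressor idea in notebook 6 can be illustrated with a small sketch of the final combination step. This is not the notebook's code: the function name `combine_predictions`, the `threshold` value, and the clipping to the interval [0, 1] are assumptions made for illustration, and the inputs are taken to be the classifier's probability that an output is zero and the regressor's estimate of the non-zero value.

```python
from typing import List, Sequence


def combine_predictions(
    p_zero: Sequence[float],
    reg_pred: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the notebook
) -> List[float]:
    """Merge classifier and regressor outputs into one prediction per record.

    p_zero[i]   : classifier's estimated probability that output i is zero
    reg_pred[i] : regressor's estimate for output i, assuming it is non-zero
    """
    if len(p_zero) != len(reg_pred):
        raise ValueError("p_zero and reg_pred must have the same length")

    combined = []
    for p, y in zip(p_zero, reg_pred):
        if p >= threshold:
            # The classifier says the turbine output is zero.
            combined.append(0.0)
        else:
            # TARGETVAR is a fraction of maximum capacity, so clip to [0, 1]
            # (the clipping is an assumption of this sketch).
            combined.append(min(1.0, max(0.0, y)))
    return combined


# Toy inputs, invented for illustration only.
example = combine_predictions([0.9, 0.2, 0.1], [0.05, 0.40, 1.30])
```

The threshold trades off the two kinds of error: a lower value sets more outputs to zero, while a higher value leaves more records to the regressor.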

Problem Statement

Given wind forecasts that are issued daily and cover 24 hours at a time, predict the nominal output of 10 wind turbines. The provided data are the turbine number, the timestamp of the forecast, and the forecasted zonal and meridional wind velocity components at 10 meters and 100 meters above ground. The wind data were taken in 2012 and 2013. The training data consist of the first 19 months, and the test set consists of the following five months (the last of which has only ten records). The public leaderboard is based on the first two months of the test dataset (Aug-2013 and Sep-2013), while the rest of the test dataset is used for the private leaderboard.

Note:

The public-private split is based on time period.
The evaluation metric is Root Mean Squared Error (RMSE).
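
As a rough sketch of how the metric and the time-based split fit together, the following computes the RMSE separately for the public and private periods. It is not the code from 8_check_solution.ipynb: the names `rmse`, `leaderboard_rmse`, and `PRIVATE_START` are hypothetical, the records are toy values, and placing the boundary exactly at midnight on October 1, 2013 is an assumption derived from the statement that the public leaderboard covers Aug-2013 and Sep-2013.

```python
import math
from datetime import datetime

# Assumed boundary: everything before this timestamp counts as "public".
PRIVATE_START = datetime(2013, 10, 1)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def leaderboard_rmse(records):
    """Split (timestamp, actual, predicted) records by period and score each."""
    public = [(a, p) for t, a, p in records if t < PRIVATE_START]
    private = [(a, p) for t, a, p in records if t >= PRIVATE_START]
    return {
        "public": rmse([a for a, _ in public], [p for _, p in public]),
        "private": rmse([a for a, _ in private], [p for _, p in private]),
    }


# Toy records, invented for illustration only.
records = [
    (datetime(2013, 8, 1, 1), 0.5, 0.4),
    (datetime(2013, 9, 15, 12), 0.0, 0.2),
    (datetime(2013, 10, 3, 6), 0.8, 0.5),
    (datetime(2013, 11, 20, 18), 0.3, 0.3),
]
scores = leaderboard_rmse(records)
```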

Data:

Variable	Definition
ID	Unique ID of observation
ZONEID	Zone (turbine) ID
TIMESTAMP	Date and time of observation
U10	Zonal wind velocity at 10 m above ground
V10	Meridional wind velocity at 10 m above ground
U100	Zonal wind velocity at 100 m above ground
V100	Meridional wind velocity at 100 m above ground
TARGETVAR	Output of wind turbine, as a fraction of maximum capacity

To learn more about the U and V wind velocity components, click here.
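
For readers who prefer wind speed and direction to components, here is a minimal sketch of the conversion. It assumes the usual meteorological convention, which this README does not state explicitly: U is positive toward the east, V is positive toward the north, and direction is reported as the compass bearing the wind blows from, in degrees clockwise from north. The function names are hypothetical.

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Magnitude of the horizontal wind vector, in the same units as u and v."""
    return math.hypot(u, v)


def wind_direction_from(u: float, v: float) -> float:
    """Bearing the wind blows FROM, in degrees clockwise from north.

    Assumes u is the eastward component and v the northward component.
    The result lies in [0, 360); it is not meaningful when u == v == 0.
    """
    # atan2(v, u) is the mathematical angle of the direction the wind blows TO,
    # measured counter-clockwise from east. Subtracting it from 270 degrees
    # converts to a compass bearing and flips "to" into "from".
    return (270.0 - math.degrees(math.atan2(v, u))) % 360.0


# Toy values: a wind blowing toward the north-east quadrant,
# and a pure westerly (blowing from west to east).
speed = wind_speed(3.0, 4.0)
westerly = wind_direction_from(1.0, 0.0)
```

The same functions apply to the (U10, V10) and (U100, V100) pairs alike, giving one speed and one direction per height.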

The full data set (including the target variable values for the test subset used for the public and private leaderboards) is available from Dr. Tao Hong's Energy Forecasting website, under "GEFCom2014".
