dominodatalab/reference-project-wind-turbine

Wind Turbine Output Prediction using SCADA data

License

This template is licensed under Apache 2.0 and contains the following open source components:

Context

In this project we train a predictive model on Supervisory Control and Data Acquisition (SCADA) data captured from a physical wind turbine. SCADA systems are used for controlling, monitoring, and analyzing industrial devices and processes. The SCADA concept was developed as a universal means of remote access to a variety of local control modules, which may come from different manufacturers, through standard automation protocols.

Here we demonstrate how to train a machine learning model using a freely available SCADA dataset from Kaggle.

Dataset

The samples in this dataset are distributed as a .CSV file with the following attributes:

  • Date/Time --- timestamp of the observation (10-minute intervals)
  • LV ActivePower (kW) --- The amount of power generated by the turbine at that timestamp (in kW)
  • Wind Speed (m/s) --- The wind speed as measured at the hub height of the turbine
  • Theoretical_Power_Curve (KWh) --- The theoretical power values that the turbine generates with that wind speed as provided by the turbine manufacturer
  • Wind Direction (degrees) --- The wind direction at the hub height of the turbine (the turbine turns in this direction automatically)
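
The attributes above can be read into typed records with a few lines of Python. The following is a minimal sketch, not the loader used in the notebook: it relies only on the standard library, and the helper name `load_scada`, the output key names, and the timestamp format (`day month year hour:minute`) are assumptions made for illustration. The two sample rows are made up and are not taken from data/T1.csv.

```python
import csv
import io
from datetime import datetime

# Column names as listed in the attribute description above.
COLUMNS = {
    "timestamp": "Date/Time",
    "active_power": "LV ActivePower (kW)",
    "wind_speed": "Wind Speed (m/s)",
    "theoretical_power": "Theoretical_Power_Curve (KWh)",
    "wind_direction": "Wind Direction (degrees)",
}

# ASSUMPTION: timestamps look like "01 01 2018 00:10" (day month year hour:minute).
TIMESTAMP_FORMAT = "%d %m %Y %H:%M"


def load_scada(fileobj, timestamp_format=TIMESTAMP_FORMAT):
    """Parse SCADA rows from an open CSV file into a list of dicts."""
    rows = []
    for raw in csv.DictReader(fileobj):
        # Parse the timestamp first, then convert every other column to float.
        stamp = raw[COLUMNS["timestamp"]].strip()
        row = {"timestamp": datetime.strptime(stamp, timestamp_format)}
        for name, column in COLUMNS.items():
            if name != "timestamp":
                row[name] = float(raw[column])
        rows.append(row)
    return rows


# Tiny made-up sample, for illustration only.
SAMPLE = io.StringIO(
    "Date/Time,LV ActivePower (kW),Wind Speed (m/s),"
    "Theoretical_Power_Curve (KWh),Wind Direction (degrees)\n"
    "01 01 2018 00:00,380.0,5.3,416.3,259.9\n"
    "01 01 2018 00:10,453.7,5.6,519.9,268.6\n"
)
sample_rows = load_scada(SAMPLE)
```

To read the real file, the same function can be called as `load_scada(open("data/T1.csv", newline=""))`; if the timestamps in the file use a different layout, pass a matching `timestamp_format`.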

Assets

This project contains the following assets:

  • WindTurbineScada.ipynb --- a notebook demonstrating data ingestion, exploratory data analysis, model building and evaluation
  • train.py --- a model training script, which can be run as a Domino job to retrain the model (e.g. if new data is available)
  • score.py --- a scoring function, which can be deployed as a Domino Model API
  • model.bin --- a pickled version of a pre-trained ExtraTreesRegressor model
  • data/T1.csv --- the original dataset
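
To illustrate how a pickled model and a scoring function fit together, here is a hedged sketch. It is not the contents of score.py: the function names `load_model` and `predict_power`, the output key, and the choice of wind speed and wind direction as input features are all assumptions. The real feature set and ordering are whatever train.py used when the model was fitted.

```python
import pickle


def load_model(path="model.bin"):
    """Unpickle a pre-trained model from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict_power(model, wind_speed, wind_direction):
    """Return a JSON-friendly prediction for a single observation.

    ASSUMPTION: the model expects [wind_speed, wind_direction] as its
    features, in that order. Adjust to match the training script.
    """
    # Estimators with a scikit-learn style API take a 2D array-like:
    # one row per observation, one column per feature.
    features = [[float(wind_speed), float(wind_direction)]]
    prediction = model.predict(features)[0]
    return {"predicted_power_kw": float(prediction)}
```

A call such as `predict_power(load_model(), 7.2, 180.0)` would then return a small dictionary that serializes cleanly to JSON. Unpickling an ExtraTreesRegressor requires scikit-learn to be installed in the environment, and pickle files should only be loaded from trusted sources.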

Hardware Requirements

This project works with a standard small-sized hardware tier, such as the small-k8s tier available on all Domino deployments.

Environment Requirements

This project can be run with a Domino Standard Compute Environment that has Python 3.9 or above.