End-to-end data pipeline tool to predict fuel prices using IESO data

bhavyaverma1/IESO-Price-Forecasting

IESO Data Pipeline Project for Price Forecasting

This project builds an end-to-end data pipeline that fetches data from various IESO reports, stores it in a Google Cloud Platform (GCP) database, feeds a dashboard, and trains a machine learning model for price forecasting.

Data Extraction

We use Python, with libraries such as pandas and BeautifulSoup, to scrape data from the report links. The scraped data is held in dataframes and then written to Google Cloud Storage buckets. From there it is loaded into BigQuery tables for efficient processing. The extraction process is automated with a cron job or Google Cloud Scheduler.
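The scraping step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's `html.parser` and `csv` modules in place of BeautifulSoup and pandas so that it runs without extra dependencies, and the listing page, base URL, file names, and column names are all hypothetical. The upload to Cloud Storage and the load into BigQuery are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import csv
import io


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_report_links(listing_html, base_url, suffix=".csv"):
    """Return absolute URLs of links in the listing that end with `suffix`."""
    parser = LinkCollector()
    parser.feed(listing_html)
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.lower().endswith(suffix)
    ]


def parse_report_csv(text):
    """Parse CSV text into a list of row dicts keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical listing page and report body, for illustration only.
SAMPLE_LISTING = """
<html><body>
<a href="../">Parent</a>
<a href="report_a.csv">report_a.csv</a>
<a href="report_b.csv">report_b.csv</a>
<a href="notes.txt">notes.txt</a>
</body></html>
"""
SAMPLE_CSV = "hour,value\n1,10.5\n2,11.0\n"

links = find_report_links(SAMPLE_LISTING, "https://example.com/reports/")
rows = parse_report_csv(SAMPLE_CSV)
print(links)
print(rows)
```

In a real run, the listing HTML and the CSV text would come from HTTP requests to the report pages, and the parsed rows would be converted to a dataframe before being written to a bucket.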

Forecasting Using Machine Learning

We build and run various machine learning models in GCP’s BigQuery to predict future fuel/energy prices. We tested univariate and multivariate LSTM and GRU models for the time-series formulation, and an ANN regressor and Random Forest regression for the regression formulation. The ANN regression model gave the best results for our use case.

Data Visualization Report

After modeling, we generate a data visualization report in Google Data Studio for further insights. The report includes a pie chart of generation by fuel type, a stacked column chart of generation by month, and a time series of generation for each quarter of the year.
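The aggregations behind the pie chart and the stacked column chart can be sketched as below. This is only an illustration of the grouping logic: the record layout (`date`, `fuel`, `mwh`) and the toy values are assumptions, and in the actual report this work is presumably done by the dashboard tool over the BigQuery tables.

```python
from collections import defaultdict


def share_by_fuel(records):
    """Fraction of total generation contributed by each fuel type (pie chart)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["fuel"]] += rec["mwh"]
    grand_total = sum(totals.values())
    return {fuel: total / grand_total for fuel, total in totals.items()}


def monthly_totals(records):
    """Generation per month, broken down by fuel type (stacked column chart)."""
    by_month = defaultdict(lambda: defaultdict(float))
    for rec in records:
        month = rec["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        by_month[month][rec["fuel"]] += rec["mwh"]
    return {month: dict(fuels) for month, fuels in by_month.items()}


# Toy records with made-up values, for illustration only.
records = [
    {"date": "2023-01-01", "fuel": "NUCLEAR", "mwh": 20.0},
    {"date": "2023-01-02", "fuel": "HYDRO", "mwh": 10.0},
    {"date": "2023-02-01", "fuel": "NUCLEAR", "mwh": 10.0},
]

shares = share_by_fuel(records)
months = monthly_totals(records)
print(shares)
print(months)
```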

Results and Insights

  • Mean Absolute Error (MAE): The ANN regression model achieved an MAE between 7.51 and 12.
  • Look Back: The LSTM/GRU models used a look back of 3, meaning each training sample covers the previous 3 hours of data.
  • n_steps_in and n_steps_out: The LSTM/GRU models used n_steps_in of 3 and n_steps_out of 1, meaning they took 3 past hours as input and predicted 1 future hour.
  • nb_epochs: The LSTM/GRU models were trained for 10 epochs, that is, 10 full passes over the training dataset.
  • Pie Chart: Nuclear accounted for 60% of generation.
  • Stacked Column Chart: Generation was highest in January and August.
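To make the look-back and MAE settings above concrete, here is a minimal sketch of how an hourly series can be cut into windows of 3 inputs and 1 output, and how MAE is computed. The helper names and the toy price values are hypothetical, and the "repeat the last observed hour" predictor is a naive stand-in for illustration, not one of the models the project trained.

```python
def split_sequence(series, n_steps_in=3, n_steps_out=1):
    """Slide a window over `series`, returning (inputs, targets) lists.

    Each input holds n_steps_in consecutive values and each target holds
    the n_steps_out values that immediately follow it.
    """
    inputs, targets = [], []
    last_start = len(series) - n_steps_in - n_steps_out
    for start in range(last_start + 1):
        end_in = start + n_steps_in
        inputs.append(series[start:end_in])
        targets.append(series[end_in:end_in + n_steps_out])
    return inputs, targets


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy hourly prices, made up for illustration.
prices = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

X, y = split_sequence(prices, n_steps_in=3, n_steps_out=1)

# Naive stand-in predictor: repeat the last hour of each input window.
predictions = [window[-1] for window in X]
actuals = [target[0] for target in y]
mae = mean_absolute_error(actuals, predictions)
print(X, y, mae)
```

With 6 hourly values, a 3-in/1-out window yields 3 training samples. A trained LSTM or GRU would replace the naive predictor, but the windowing and the MAE calculation stay the same.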

Manual Configuration

Instructions for accessing and configuring Google BigQuery, Google Cloud Storage, Google Cloud Functions, and Google Cloud Scheduler are provided in the following sections.

Google Cloud Services Used
