# Regression Project Student Solution

© Explore Data Science Academy

---

### Project Overview: Spain Electricity Shortfall Challenge

The government of Spain is considering an expansion of its renewable energy resource infrastructure investments. As such, it requires information on the trends and patterns of the country's renewable and fossil fuel energy generation. Your company has been awarded the contract to:

- 1. analyse the supplied data;
- 2. identify potential errors in the data and clean the existing data set;
- 3. determine if additional features can be added to enrich the data set;
- 4. build a model that is capable of forecasting the three-hourly demand shortfalls;
- 5. evaluate the accuracy of the best machine learning model;
- 6. determine what features were most important in the model’s prediction decision, and
- 7. explain the inner workings of the model to a non-technical audience.

Formally, the problem statement given to you, the senior data scientist, by your manager via email reads as follows:

> In this project you are tasked to model the shortfall between the energy generated by means of fossil fuels and various renewable sources for the country of Spain. The daily shortfall, which will be referred to as the target variable, will be modelled as a function of various city-specific weather features such as `pressure`, `wind speed`, `humidity`, etc. As with all data science projects, the provided features are rarely adequate predictors of the target variable. As such, you are required to perform feature engineering to ensure that you will be able to accurately model Spain's three-hourly shortfalls.
 
On top of this, she has provided you with a starter notebook containing vague explanations of what the main outcomes are. 

<a id="cont"></a>

## Table of Contents

<a href=#zero>I. Problem Statement</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Loading Data</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Engineering</a>

<a href=#five>5. Modeling</a>

<a href=#six>6. Model Performance</a>

<a href=#seven>7. Model Explanations</a>

 <a id="zero"></a>
## I. Problem Statement

To prevent the drastic effects of climate change and ensure the sustainability of the global ecosystem, the world is gradually adopting renewable energy. Aside from ensuring a steady supply of electricity for a productive livelihood, renewable energy sources have also led to the emergence of new energy markets, enterprises, and job opportunities.

Renewable energy sources accounted for 43% of all electricity generated in Spain in the year 2020. As a result, the government of Spain is considering an expansion of its renewable energy resource infrastructure investments. To do so, they need information on the country's renewable resource and fossil fuel energy generating trends and patterns.

Our team of data scientists has been tasked with creating a model that predicts the three-hourly load shortfall between the energy generated by means of fossil fuels and various renewable sources in Spain. This information will aid the government in determining how much infrastructure spending should be increased.

[Load Shortfall Image](https://dailytimes.com.pk/assets/uploads/2022/04/29/5ef6cf3f8fe3c.jpg)

![Loadshedding](https://github.com/JayHansea/TEAM-NM2/blob/65985167bb4b2ce180e3217d9b1b5356c9047a4d/Electricity%20Shortfall%20Image.jpg?raw=true)

[Image Source](https://dailytimes.com.pk/927865/pakistanis-suffer-worst-loadshedding-as-electricity-shortfall-reaches-9000mw/)



### II. OBJECTIVES

* Explore and visualize the dataset.
* Clean and engineer the dataset.
* Build several models that predict the three-hourly load shortfall.
* Assess the accuracy of the models.
* Choose the best model to make predictions.
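
The last three objectives (build several models, assess them, choose the best) can be sketched as a small comparison loop. This is only an illustrative sketch, not the project's actual pipeline: the data below is synthetic, the names `wind_speed`, `shortfall`, `fit_mean` and `fit_line` are hypothetical, and RMSE on a held-out split is assumed as the accuracy measure. The notebook itself would more likely use a library such as scikit-learn for the models.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Synthetic stand-in data (NOT the Spanish dataset): one weather feature
# and a target that depends on it linearly, plus Gaussian noise.
rng = random.Random(0)
wind_speed = [rng.uniform(0, 10) for _ in range(200)]
shortfall = [1000 * w + 5000 + rng.gauss(0, 500) for w in wind_speed]

# Hold out the last 50 observations for evaluation.
split = 150
x_train, x_test = wind_speed[:split], wind_speed[split:]
y_train, y_test = shortfall[:split], shortfall[split:]

# Fit every candidate on the training part and score it on the held-out part.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(x_train, y_train)
    scores[name] = rmse(y_test, [model(x) for x in x_test])

# The model with the lowest held-out RMSE is selected.
best_name = min(scores, key=scores.get)
```

Keeping a trivial baseline among the candidates gives every other model a reference point: a model that cannot beat the mean prediction is not worth explaining further.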


### III. FEATURES DESCRIPTION
* **Time**: The date and time of the day when each feature value was recorded
* **Wind_speed**: This is a measure of the wind speed recorded in each city
* **Wind_deg**: This is a measure of the direction of the wind in each city
* **Pressure**: It is the atmospheric pressure measured in each city
* **Rain_1h/Rain_3h**: This is the amount of rain in each city, recorded over one-hour or three-hour intervals
* **Snow**: The amount of snowfall in each city
* **Cloud_all**: This is a measure of the percentage of cloud coverage in each city
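
Because each feature is recorded per city and `Time` is a timestamp, two common preparation steps are splitting the timestamp into calendar parts and collecting the per-city columns of one feature. The sketch below assumes a `YYYY-MM-DD HH:MM:SS` timestamp layout and a `<City>_<feature>` column naming pattern; both are assumptions for illustration, as are the helper names `expand_time` and `columns_for_feature`, and the example column names.

```python
from datetime import datetime

# Assumed timestamp layout; adjust if the data uses a different format.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def expand_time(timestamp):
    """Split a timestamp string into calendar parts usable as features."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return {
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
        "hour": parsed.hour,
        "weekday": parsed.weekday(),  # Monday = 0, Sunday = 6
    }


def columns_for_feature(columns, feature):
    """Return the columns whose names end with '_<feature>' (case-insensitive)."""
    suffix = "_" + feature.lower()
    return [c for c in columns if c.lower().endswith(suffix)]


# Hypothetical column names following the assumed '<City>_<feature>' pattern.
example_columns = ["time", "Madrid_wind_speed", "Seville_wind_speed", "Madrid_pressure"]

wind_columns = columns_for_feature(example_columns, "Wind_speed")
time_parts = expand_time("2015-01-01 03:00:00")
```

Here `wind_columns` holds the two wind speed columns, and `time_parts` contains the year, month, day, hour, and weekday of the example timestamp.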


<a href=#cont>Back to Table of Contents</a>

 <a id="one"></a>
## 1. Importing Packages
<a href=#cont>Back to Table of Contents</a>



---
    
| ⚡ Description: Importing Packages ⚡ |
| :--------------------------- |
| First, we import and briefly describe the libraries that will be used throughout our analysis and modelling. |

---