# ***Renewable Energy Forecasting  🌱***

> This notebook develops a predictive model for renewable energy production (solar and wind). The goal is to improve forecasting accuracy and support energy resource management.

* ***Source: 🔗*** https://www.kaggle.com/datasets/henriupton/wind-solar-electricity-production

* ***Description:📝*** The dataset contains 59,806 records with key features like Date, Time, Source, and Production, capturing renewable energy production (solar and wind) in France.

* ***Purpose:*** This dataset enables the exploration and prediction of renewable energy output, focusing on production patterns across different energy sources.

## Import Necessary Libraries


In [33]:
# Third-party libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
import plotly.graph_objs as go
import plotly.subplots as sp

# ***Data Preparation 🧹***

##  **Load and Prepare Data**



In [34]:
file_path = 'intermittent-renewables-production-france.csv'
data = pd.read_csv(file_path)

### Rename the <code>***Date***</code> column to <code>***DateTime***</code>


In [35]:
data.rename(columns={'Date': 'DateTime'}, inplace=True)

### Convert <code>***DateTime***</code> column to *datetime* type


In [36]:
data['DateTime'] = pd.to_datetime(data['DateTime'], errors='coerce')
print(data['DateTime'].dtype)

datetime64[ns]
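With `errors='coerce'`, any value that cannot be parsed becomes `NaT` silently, so it can be worth counting those rows before moving on. A minimal sketch on made-up values (the `raw` series below is hypothetical, not taken from the dataset):

```python
import pandas as pd

# Toy input (hypothetical values): two valid dates and one malformed entry.
raw = pd.Series(['2020-01-01', 'not a date', '2020-01-03'])

# errors='coerce' converts anything unparseable to NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# Count the rows that failed to parse so they can be inspected or dropped.
n_bad = int(parsed.isna().sum())
print(f'Unparseable rows: {n_bad}')
```

Running the same count on `data['DateTime']` would show whether any rows in the real file were coerced.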


### Fill missing values in <code>***Production***</code> with the column mean


In [37]:
data['Production'] = data['Production'].fillna(data['Production'].mean())
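The cell above fills missing `Production` values with the column mean and leaves rows with a missing `DateTime` in place. The sketch below contrasts that with dropping incomplete rows, using a small made-up frame (`toy` and its values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing timestamp and one missing production value.
toy = pd.DataFrame({
    'DateTime': pd.to_datetime(['2020-01-01', None, '2020-01-03', '2020-01-04']),
    'Production': [10.0, 20.0, np.nan, 30.0],
})

# Option A: fill missing Production with the column mean (keeps every row).
filled = toy.assign(Production=toy['Production'].fillna(toy['Production'].mean()))

# Option B: drop any row that is missing either key column.
dropped = toy.dropna(subset=['DateTime', 'Production'])

print(filled)
print(dropped)
```

Mean filling keeps the row count intact but flattens the gaps toward the overall average; dropping keeps only observed values but shortens the series. Which is preferable depends on how many values are missing.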

### Sort data and filter solar and wind <code>***Production***</code> from *2020* to *2022*
> We are excluding data from 2023 because it only extends until June. Using incomplete data for the year could disrupt the detection of seasonal patterns, as the full annual cycle is not represented.

In [38]:
data_sorted = data.sort_values('DateTime').reset_index(drop=True)

# Split the data by source
data_wind = data[data['Source'] == 'Wind'][['DateTime', 'Production']].copy()
data_solar = data[data['Source'] == 'Solar'][['DateTime', 'Production']].copy()

# Set 'DateTime' as the index so the series can be resampled
data_wind.set_index('DateTime', inplace=True)
data_solar.set_index('DateTime', inplace=True)

# Keep only 2020 through 2022 for both sources
data_wind_filtered = data_wind[(data_wind.index >= '2020-01-01') & (data_wind.index < '2023-01-01')]
data_solar_filtered = data_solar[(data_solar.index >= '2020-01-01') & (data_solar.index < '2023-01-01')]
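The boolean mask above selects the half-open interval from 2020-01-01 up to, but not including, 2023-01-01. On a sorted `DatetimeIndex`, pandas partial-string slicing gives the same rows more compactly. A small sketch on synthetic daily data (the `toy` frame is an assumption for illustration):

```python
import pandas as pd

# Synthetic daily index that starts before 2020 and ends after 2022.
idx = pd.date_range('2019-12-30', '2023-01-02', freq='D')
toy = pd.DataFrame({'Production': range(len(idx))}, index=idx)

# Explicit half-open mask, as used in the cell above.
mask_version = toy[(toy.index >= '2020-01-01') & (toy.index < '2023-01-01')]

# Partial-string slicing: both year labels are inclusive.
slice_version = toy.loc['2020':'2022']

print(mask_version.equals(slice_version))
```

The slice form relies on the index being sorted; the mask form works either way.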


In [39]:
# Smooth each series with a 7-row simple moving average (SMA).
# window=7 counts rows, not days: this runs before the daily resample,
# so it averages 7 consecutive records, and the first 6 rows become NaN.
# .assign() returns a new DataFrame, which avoids SettingWithCopyWarning.
data_solar_filtered = data_solar_filtered.assign(
    Production=data_solar_filtered['Production'].rolling(window=7).mean()
)
data_wind_filtered = data_wind_filtered.assign(
    Production=data_wind_filtered['Production'].rolling(window=7).mean()
)
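An integer `window` in `rolling` counts rows, while an offset string such as `'7D'` counts elapsed time. If the series has several records per day at this point (the record count in the dataset description suggests it does), `rolling(window=7)` spans 7 records rather than 7 days. A sketch on synthetic hourly data, where the series and its values are assumptions:

```python
import pandas as pd

# Two weeks of synthetic hourly data with values 0, 1, 2, ...
idx = pd.date_range('2020-01-01', periods=24 * 14, freq=pd.Timedelta(hours=1))
hourly = pd.Series(range(len(idx)), index=idx, dtype='float64')

# Integer window: the last 7 rows, which is 7 hours on this index.
row_window = hourly.rolling(window=7).mean()

# Offset window: every row in the trailing 7 days (requires a datetime index).
time_window = hourly.rolling(window='7D').mean()

print(row_window.iloc[6], time_window.iloc[-1])
```

Another way to get a true 7-day average is to resample to daily values first and then apply `rolling(window=7)` to the daily series.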



In [40]:
# Resample to daily totals after filtering
wind_daily = data_wind_filtered['Production'].resample('D').sum()
solar_daily = data_solar_filtered['Production'].resample('D').sum()
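`resample('D').sum()` adds up every record that falls on a calendar day. One detail worth knowing: by default, a day with no valid records sums to 0 rather than `NaN`, which can look like zero production. A sketch on made-up timestamps (all values below are hypothetical):

```python
import pandas as pd

# Three hypothetical readings; 2020-01-02 has no data at all.
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 12:00', '2020-01-03 06:00'])
s = pd.Series([1.0, 2.0, 5.0], index=idx)

# Default behavior: the empty day is reported as 0.0.
daily = s.resample('D').sum()

# min_count=1 requires at least one valid value, so the empty day becomes NaN.
daily_strict = s.resample('D').sum(min_count=1)

print(daily)
print(daily_strict)
```

The same applies to the `NaN` values that a rolling mean leaves at the start of a series: `sum()` skips them silently.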

### Convert the <code>***wind_daily***</code> and <code>***solar_daily***</code> Series to DataFrames


In [41]:
wind_daily = wind_daily.to_frame(name='Wind_Production')
solar_daily = solar_daily.to_frame(name='Solar_Production')

In [42]:
wind_daily.head()

| DateTime | Wind_Production |
|---|---|
| 2020-01-01 | 73190.857143 |
| 2020-01-02 | 70373.714286 |
| 2020-01-03 | 120888.285714 |
| 2020-01-04 | 90338.0 |
| 2020-01-05 | 71005.0 |


In [43]:
solar_daily.head()

| DateTime | Solar_Production |
|---|---|
| 2020-01-01 | 9697.428571 |
| 2020-01-02 | 15809.714286 |
| 2020-01-03 | 11146.142857 |
| 2020-01-04 | 8268.0 |
| 2020-01-05 | 17050.428571 |


## **Stationarity Check**

### **The Augmented Dickey-Fuller (ADF) test checks whether a time series is stationary.**

* **ADF Statistic:** the test statistic. The more negative it is, the stronger the evidence against the null hypothesis that the series has a unit root (is non-stationary).


* **p-value:** the probability of seeing a statistic this extreme if the series had a unit root. A value below 0.05 is treated here as evidence that the series is stationary.


> **Critical values:** a dictionary of thresholds at the 1%, 5%, and 10% significance levels. If the ADF Statistic is below a threshold, the null hypothesis is rejected at that level.


### *Wind Data*

In [44]:
from statsmodels.tsa.stattools import adfuller

adf_test = adfuller(wind_daily['Wind_Production'])
adf_stat = adf_test[0]
p_value = adf_test[1]
critical_values = adf_test[4]

print(f'ADF Statistic: {adf_stat}')
print(f'p-value: {p_value}')
print('Critical Values:', critical_values)

print("--------------------------------------------------------------")
if p_value < 0.05:
    print("The time series is stationary (p < 0.05).")
else:
    print("The time series is non-stationary (p >= 0.05).")


ADF Statistic: -6.735996445820026
p-value: 3.2065288108494255e-09
Critical Values: {'1%': -3.4363635475753824, '5%': -2.864195245967465, '10%': -2.5681837404258903}
--------------------------------------------------------------
The time series is stationary (p < 0.05).


### *Solar Data*

In [45]:

adf_test = adfuller(solar_daily['Solar_Production'])
adf_stat = adf_test[0]
p_value = adf_test[1]
critical_values = adf_test[4]

print(f'ADF Statistic: {adf_stat}')
print(f'p-value: {p_value}')
print('Critical Values:', critical_values)

print("--------------------------------------------------------------")
if p_value < 0.05:
    print("The time series is stationary (p < 0.05).")
else:
    print("The time series is non-stationary (p >= 0.05).")


ADF Statistic: -1.9936399660139763
p-value: 0.28936869640811863
Critical Values: {'1%': -3.436402509014354, '5%': -2.8642124318084456, '10%': -2.568192893555997}
--------------------------------------------------------------
The time series is non-stationary (p >= 0.05).
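The p-value check can be cross-checked by comparing each ADF statistic directly with its critical values. The sketch below does that with the numbers printed above; `reject_levels` is a hypothetical helper written for this illustration, not part of statsmodels.

```python
def reject_levels(adf_stat, critical_values):
    """Return the significance levels at which the unit-root null is rejected.

    Hypothetical helper: the null is rejected at a level when the ADF
    statistic is more negative than that level's critical value.
    """
    return [level for level, crit in critical_values.items() if adf_stat < crit]


# Values copied from the wind and solar outputs above.
wind_levels = reject_levels(
    -6.735996445820026,
    {'1%': -3.4363635475753824, '5%': -2.864195245967465, '10%': -2.5681837404258903},
)
solar_levels = reject_levels(
    -1.9936399660139763,
    {'1%': -3.436402509014354, '5%': -2.8642124318084456, '10%': -2.568192893555997},
)

print('Wind rejects at:', wind_levels)
print('Solar rejects at:', solar_levels)
```

The wind statistic is below all three thresholds, while the solar statistic is above all of them, which agrees with the p-value conclusions.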


In [46]:
# Apply differencing since the series is non-stationary
print("Applying first differencing to make the series stationary...")
solar_diff = solar_daily['Solar_Production'].diff().dropna()

# Re-run ADF Test on the differenced series
adf_test_diff = adfuller(solar_diff)
adf_stat_diff = adf_test_diff[0]
p_value_diff = adf_test_diff[1]
critical_values_diff = adf_test_diff[4]

# Print ADF test results for differenced data
print(f'\nADF Statistic (Differenced): {adf_stat_diff}')
print(f'p-value (Differenced): {p_value_diff}')
print('Critical Values (Differenced):', critical_values_diff)
print("--------------------------------------------------------------")

# Check if differenced series is now stationary
if p_value_diff < 0.05:
    print("The differenced time series is now stationary (p < 0.05).")
else:
    print("The differenced time series is still non-stationary (p >= 0.05).")

Applying first differencing to make the series stationary...

ADF Statistic (Differenced): -14.14907881148825
p-value (Differenced): 2.177914951767711e-26
Critical Values (Differenced): {'1%': -3.436402509014354, '5%': -2.8642124318084456, '10%': -2.568192893555997}
--------------------------------------------------------------
The differenced time series is now stationary (p < 0.05).
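First differencing replaces each value with its change from the previous day, so a model fitted on the differenced series predicts changes rather than levels. To get back to production levels, the differences are cumulatively summed and added to the first observed value. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical daily production values.
y = pd.Series([10.0, 12.0, 15.0, 14.0, 18.0])

# First difference: y[t] - y[t-1]; the first row has no predecessor and is dropped.
d = y.diff().dropna()

# Invert the differencing: cumulative sum of changes plus the starting level.
recovered = d.cumsum() + y.iloc[0]

print(d.tolist())
print(recovered.tolist())
```

The recovered series matches the original from the second observation onward, since the first observation is the anchor.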


## **Seasonal Decomposition Analysis**


### *Wind Data*

In [47]:

result = seasonal_decompose(wind_daily['Wind_Production'], model='additive', period=365)

fig = sp.make_subplots(rows=2, cols=1, shared_xaxes=True,
                       subplot_titles=['Trend', 'Seasonal'])

fig.add_trace(go.Scatter(x=wind_daily.index, y=result.trend, mode='lines', name='Trend'), row=1, col=1)
fig.add_trace(go.Scatter(x=wind_daily.index, y=result.seasonal, mode='lines', name='Seasonal'), row=2, col=1)

fig.update_layout(template='plotly_dark', height=800, title='Seasonal Decomposition of Wind Production')
fig.show()

### *Solar Data*

In [48]:

result = seasonal_decompose(solar_daily['Solar_Production'], model='additive', period=365)

fig = sp.make_subplots(rows=2, cols=1, shared_xaxes=True,
                       subplot_titles=['Trend', 'Seasonal'])

fig.add_trace(go.Scatter(x=solar_daily.index, y=result.trend, mode='lines', name='Trend'), row=1, col=1)
fig.add_trace(go.Scatter(x=solar_daily.index, y=result.seasonal, mode='lines', name='Seasonal'), row=2, col=1)

fig.update_layout(template='plotly_dark', height=800, title='Seasonal Decomposition of Solar Production')
fig.show()
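An additive decomposition splits the series as observed = trend + seasonal + residual. The sketch below builds the three parts by hand with pandas on synthetic data, to show the idea behind `model='additive'`. It is a simplified illustration, not the statsmodels implementation, and the 7-day period, the `pattern` values, and the linear trend are all assumptions chosen to keep the example small.

```python
import numpy as np
import pandas as pd

period = 7  # short, odd period chosen for readability (the notebook uses 365)
idx = pd.date_range('2020-01-01', periods=period * 8, freq='D')
t = np.arange(len(idx))

# Synthetic series: linear trend plus a repeating weekly pattern that sums to zero.
pattern = np.array([3.0, 1.0, -2.0, 0.0, -1.0, 2.0, -3.0])
y = pd.Series(0.5 * t + pattern[t % period], index=idx)

# 1) Trend: centered moving average over one full period (NaN at both edges).
trend = y.rolling(window=period, center=True).mean()

# 2) Seasonal: mean detrended value at each position in the cycle, centered on zero.
detrended = y - trend
position = pd.Series(t % period, index=idx)
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = position.map(seasonal_means)

# 3) Residual: whatever the trend and seasonal parts do not explain.
resid = y - trend - seasonal

print(seasonal.iloc[:period].round(6).tolist())
```

On this noise-free input the seasonal component recovers `pattern` and the residual is zero wherever the trend is defined.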
