## Wind Farm Power Prediction Notebook - version 1.0

In this notebook, students will learn how to:

* Log in to the [AVEVA Academic Hub](https://academic.osisoft.com)
* Browse [Hub datasets](https://academic.osisoft.com/datasets), specifically the Wind Farms dataset (real industrial data from an Australian operator) 
* Obtain two months of interpolated data for a cluster of 10 turbines  
* Plot a correlation matrix to identify the most relevant variables impacting power generation
* Clean up the data step by step prior to modeling
* Use the well-known scikit-learn ML library ([decision tree regression](https://scikit-learn.org/stable/modules/tree.html#regression)) to build a turbine model
* Call a public web API to get 5-day forecast weather data
* Apply the model against the forecast and get predicted power 


<img src="https://academichub.blob.core.windows.net/images/wind_farms_prediction_intro.png" alt="Power Prediction" width="500">

---

In [1]:
# Install the Academic Hub library and supporting modules if run outside of Binder
import os

if not os.environ.get("BINDER_LAUNCH_HOST"):
    !pip install ocs_academic_hub plotly==5.5.0 sklearn

Collecting ocs_academic_hub
  Using cached ocs_academic_hub-0.99.38-py3-none-any.whl (68 kB)
Collecting plotly==5.5.0
  Using cached plotly-5.5.0-py2.py3-none-any.whl (26.5 MB)
Collecting sklearn
  Downloading sklearn-0.0.tar.gz (1.1 kB)
Collecting backoff
  Downloading backoff-1.11.1-py2.py3-none-any.whl (13 kB)
Collecting ocs-sample-library-hub>=0.1.19
  Downloading ocs_sample_library_hub-0.1.20-py3-none-any.whl (41 kB)
Collecting gql
  Downloading gql-2.0.0-py2.py3-none-any.whl (10 kB)
Collecting typeguard>=2.4.1
  Downloading typeguard-2.13.3-py3-none-any.whl (17 kB)
Collecting ipywidgets
  Downloading ipywidgets-7.6.5-py2.py3-none-any.whl (121 kB)
Collecting promise<3,>=2.3
  Downloading promise-2.3.tar.gz (19 kB)
Collecting graphql-core<3,>=2.3.2
  Downloading graphql_core-2.3.2-py2.py3-none-any.whl (252 kB)

### Import required modules and hub_login

In [2]:
import requests
import json
import pandas as pd
import numpy as np
import datetime
import pickle

import plotly.express as px
import plotly.graph_objects as go

from ocs_academic_hub.datahub import hub_login

### Log in to Academic Hub by running the next cell

**Execute the cell below and follow the indicated steps to log in (an AVEVA banner will appear)**

In [3]:
widget, hub = hub_login()
widget

<IPython.core.display.Javascript object>

VBox(children=(HTML(value='<p><img alt="AVEVA banner" src="https://academichub.blob.core.windows.net/images/av…

### Standard Hub Datasets

Note: the Wind Farms dataset is not part of the Hub standard datasets. For more information on those, see https://academic.osisoft.com/datasets

In [5]:
hub.datasets()

['Brewery',
 'Campus_Energy',
 'Classroom_Data',
 'MIT',
 'Pilot_Plant',
 'USC_Well_Data',
 'Wind_Farms']

### Request dataset information for the lab

In [6]:
hub.refresh_datasets()
print("-- datasets info refreshed --")

-- datasets info refreshed --


### Check that the Wind_Farms dataset is now available

In [7]:
hub.datasets()

['Brewery',
 'Campus_Energy',
 'Classroom_Data',
 'MIT',
 'Pilot_Plant',
 'USC_Well_Data',
 'Wind_Farms']

### Make it the current dataset

In [8]:
hub.set_dataset("Wind_Farms")
hub.current_dataset()

'Wind_Farms'

### OCS namespace where data lives

In [9]:
namespace_id = hub.namespace_of("Wind_Farms")
namespace_id

'academic_hub_01'

### List the assets in dataset

There are 5 clusters of 10 wind turbines each (50 turbines in total).

In [10]:
hub.assets()

Unnamed: 0,Asset_Id,Description
0,cluster1.turb1,Turbine
1,cluster1.turb10,Turbine
2,cluster1.turb2,Turbine
3,cluster1.turb3,Turbine
4,cluster1.turb4,Turbine
5,cluster1.turb5,Turbine
6,cluster1.turb6,Turbine
7,cluster1.turb7,Turbine
8,cluster1.turb8,Turbine
9,cluster1.turb9,Turbine


### Assets metadata

Store the metadata for cluster no. 4 in the dataframe `df_meta`, to be used for the map plot in the next section.

In [11]:
df_metadata = hub.all_assets_metadata()
df_meta = df_metadata[df_metadata.Asset_Id.apply(lambda s: s[:8] == "cluster4")]
df_meta

Unnamed: 0,Cluster,ID,Latitude,Longitude,Manufacturer,Model,Asset_Id
1,4,1,-33.294081,138.731074,,,cluster4.turb1
6,4,10,-33.302959,138.718843,,,cluster4.turb10
11,4,2,-33.297077,138.728607,,,cluster4.turb2
16,4,3,-33.296198,138.706698,,,cluster4.turb3
21,4,4,-33.295929,138.712857,,,cluster4.turb4
26,4,5,-33.296198,138.719079,,,cluster4.turb5
31,4,6,-33.298906,138.723414,,,cluster4.turb6
36,4,7,-33.30043,138.727212,,,cluster4.turb7
41,4,8,-33.301506,138.707042,,,cluster4.turb8
46,4,9,-33.300915,138.712792,,,cluster4.turb9


### Map of Wind Turbines using Plotly 

[Plotly](https://plotly.com/python/) is an easy-to-use open source graphing library

In [12]:
fig = px.scatter_mapbox(
    df_meta,
    lat="Latitude",
    lon="Longitude",
    text="Asset_Id",
    zoom=12.0,
    title="Locations of Cluster 4 wind turbines (green dots)",
)
fig.update_traces(marker=dict(size=12, color="green"))
fig.update_layout(mapbox_style="open-street-map")
fig.show()

<details>
    <summary><b>NOTE: the graph above does not render correctly on GitHub; click here to see a screenshot</b></summary>
<a><img alt="Map of cluster no.4" src="https://academichub.blob.core.windows.net/images/wind_farms_cluster4_map.png"></a>
</details>

### Get the list of all single-asset data views

In [13]:
hub.asset_dataviews()

['wind.farms_cluster1.turb1',
 'wind.farms_cluster1.turb10',
 'wind.farms_cluster1.turb2',
 'wind.farms_cluster1.turb3',
 'wind.farms_cluster1.turb4',
 'wind.farms_cluster1.turb5',
 'wind.farms_cluster1.turb6',
 'wind.farms_cluster1.turb7',
 'wind.farms_cluster1.turb8',
 'wind.farms_cluster1.turb9',
 'wind.farms_cluster2.turb1',
 'wind.farms_cluster2.turb10',
 'wind.farms_cluster2.turb2',
 'wind.farms_cluster2.turb3',
 'wind.farms_cluster2.turb4',
 'wind.farms_cluster2.turb5',
 'wind.farms_cluster2.turb6',
 'wind.farms_cluster2.turb7',
 'wind.farms_cluster2.turb8',
 'wind.farms_cluster2.turb9',
 'wind.farms_cluster3.turb1',
 'wind.farms_cluster3.turb10',
 'wind.farms_cluster3.turb2',
 'wind.farms_cluster3.turb3',
 'wind.farms_cluster3.turb4',
 'wind.farms_cluster3.turb5',
 'wind.farms_cluster3.turb6',
 'wind.farms_cluster3.turb7',
 'wind.farms_cluster3.turb8',
 'wind.farms_cluster3.turb9',
 'wind.farms_cluster4.turb1',
 'wind.farms_cluster4.turb10',
 'wind.farms_cluster4.turb2',
 'wind

### Get the list of all multiple-asset data views 

These data views return the data of multiple turbines; in the case below, all the turbines that belong to a given cluster.

In [14]:
dataview_ids = hub.asset_dataviews("", multiple_asset=True)
dataview_cluster4 = dataview_ids[3]  # keep data view for cluster no. 4
print(f"cluster data views=\n  {dataview_ids}")

cluster data views=
  ['wind.farms_cluster1', 'wind.farms_cluster2', 'wind.farms_cluster3', 'wind.farms_cluster4', 'wind.farms_cluster5']
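
Picking the cluster 4 data view by position (`dataview_ids[3]`) depends on the order in which the Hub returns the list. Here is a minimal sketch of selecting it by name instead, assuming the IDs keep the `wind.farms_clusterN` pattern shown in the output above. The helper `pick_dataview` is hypothetical and not part of `ocs_academic_hub`.

```python
def pick_dataview(dataview_ids, cluster_no):
    """Return the data view ID whose name ends with 'cluster<cluster_no>'."""
    suffix = f"cluster{cluster_no}"
    matches = [dv for dv in dataview_ids if dv.endswith(suffix)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one data view for {suffix}, found {matches}")
    return matches[0]


# IDs copied from the output above
dataview_ids = [
    "wind.farms_cluster1",
    "wind.farms_cluster2",
    "wind.farms_cluster3",
    "wind.farms_cluster4",
    "wind.farms_cluster5",
]
selected = pick_dataview(dataview_ids, 4)
print(selected)
```

Raising an error when zero or several IDs match makes a changed naming scheme visible immediately instead of silently loading the wrong cluster.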


### Verify the structure of the data view

For wind turbine `cluster1.turb1`

In [15]:
hub.dataview_definition(namespace_id, "wind.farms_cluster1.turb1")

Unnamed: 0,Asset_Id,Column_Name,Stream_Type,Stream_UOM,OCS_Stream_Name
4,cluster1.turb1,Ambient Temperature,Float,°C,cluster1.turb1.temp_ambient
5,cluster1.turb1,Drivetrain Gearbox Temp IMSDE,Float,°C,cluster1.turb1.temp_drivetrain_gearbox_IMSDE
6,cluster1.turb1,Drivetrain Gearbox Temp IMSNDE,Float,°C,cluster1.turb1.temp_drivetrain_gearbox_IMSNDE
7,cluster1.turb1,Drivetrain Mainbearing Temp,Float,°C,cluster1.turb1.temp_drivetrain_mainbearing
9,cluster1.turb1,Drivetrain vibration,Float,m/s²,cluster1.turb1.vib_drive_train
8,cluster1.turb1,Nacelle Temp,Float,°C,cluster1.turb1.temp_nacelle
1,cluster1.turb1,Pitch Angle,Float,degrees,cluster1.turb1.pitch_angle
2,cluster1.turb1,Power To Grid,Float,kW,cluster1.turb1.power_to_grid
10,cluster1.turb1,Relative Wind Direction,Float,degrees,cluster1.turb1.wind_direction_relative
3,cluster1.turb1,Rotor Speed,Float,RPM,cluster1.turb1.rotor_rpm


### Request data view result

For 2 months starting on 2019-01-01, interpolated every hour. Method `dataview_interpolated_pd` takes care of gathering multiple pages of data and returning a single Pandas dataframe.  

In [16]:
start_date = "2019-01-01"
end_date = "2019-03-01"
interval = "01:00:00" # format is HH:MM:SS 

df = hub.dataview_interpolated_pd(
    namespace_id, dataview_cluster4, start_date, end_date, interval, count=1000
)
df

++++++++++++++
  ==> Finished 'dataview_interpolated_pd' in       78.6908 secs [ 180 rows/sec ]


Unnamed: 0,Timestamp,Asset_Id,Pitch Angle,Power To Grid,Rotor Speed,Ambient Temperature,Drivetrain Gearbox Temp IMSDE,Drivetrain Gearbox Temp IMSNDE,Drivetrain Mainbearing Temp,Nacelle Temp,Drivetrain vibration,Relative Wind Direction,Wind Speed,Yaw Angle,State
0,2019-01-01 00:00:00,cluster4.turb1,1.012412,185.175430,15.222478,27.000000,67.246573,62.000000,44.000000,36.000000,0.031246,-3.897128,4.575610,85.666200,OK
1,2019-01-01 01:00:00,cluster4.turb1,19.886360,-1.322839,1.323630,30.000000,60.000000,60.000000,43.307395,38.000000,0.017265,19.453730,2.093786,131.083868,OK
2,2019-01-01 02:00:00,cluster4.turb1,19.860564,-1.223571,2.421718,32.000000,57.114333,58.000000,43.000000,38.267785,0.005642,-39.464312,3.072401,134.471288,OK
3,2019-01-01 03:00:00,cluster4.turb1,19.778045,-5.238074,2.983831,32.655864,56.000000,57.000000,42.000000,38.493413,-0.026114,35.715920,3.678366,129.900000,OK
4,2019-01-01 04:00:00,cluster4.turb1,19.966558,-5.412133,2.461961,33.000000,55.000000,56.000000,42.000000,39.445350,-0.025467,4.536506,2.783734,203.775693,OK
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14165,2019-02-28 20:00:00,cluster4.turb9,5.662858,2051.192673,15.752764,27.201029,68.000000,58.000000,44.000000,32.000000,-0.043622,8.435903,12.674007,9.535325,OK
14166,2019-02-28 21:00:00,cluster4.turb9,90.021289,-1.204553,0.074401,27.296805,61.000000,59.000000,43.000000,32.000000,-0.031854,-0.192271,9.751356,9.099093,TurbError
14167,2019-02-28 22:00:00,cluster4.turb9,1.131492,2108.136798,15.863010,28.000000,68.000000,59.000000,43.527698,33.000000,-0.027761,11.366595,12.178910,6.211855,OK
14168,2019-02-28 23:00:00,cluster4.turb9,0.027424,1344.904205,15.485261,29.979011,67.000000,57.447326,44.000000,33.650053,-0.051679,-11.040000,9.793801,6.634166,OK


In [17]:
# Structure of dataframe with df.info()

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14170 entries, 0 to 14169
Data columns (total 15 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   Timestamp                       14170 non-null  datetime64[ns]
 1   Asset_Id                        14170 non-null  object        
 2   Pitch Angle                     14170 non-null  float64       
 3   Power To Grid                   14170 non-null  float64       
 4   Rotor Speed                     14170 non-null  float64       
 5   Ambient Temperature             14170 non-null  float64       
 6   Drivetrain Gearbox Temp IMSDE   14170 non-null  float64       
 7   Drivetrain Gearbox Temp IMSNDE  14170 non-null  float64       
 8   Drivetrain Mainbearing Temp     14170 non-null  float64       
 9   Nacelle Temp                    14170 non-null  float64       
 10  Drivetrain vibration            14170 non-null  float64       
 11  Re
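
As a sanity check on the size of the result, the expected number of rows can be worked out from the request parameters alone. The sketch below assumes that the interpolated grid includes both the start and the end timestamp, which the 14170 entries reported by `df.info()` suggest; it uses only the standard library and does not call the Hub.

```python
from datetime import datetime, timedelta

start = datetime(2019, 1, 1)
end = datetime(2019, 3, 1)
interval = timedelta(hours=1)
n_turbines = 10  # turbines in cluster 4

# Number of hourly timestamps from start to end, both endpoints included
n_timestamps = (end - start) // interval + 1
expected_rows = n_timestamps * n_turbines
print(n_timestamps, expected_rows)
```

A mismatch between this count and the dataframe length would point to missing pages or to a turbine without data in the requested period.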

**A turbine must be in the `OK` state for its data to be valid**

In [18]:
df = df[df["State"] == "OK"]
df

Unnamed: 0,Timestamp,Asset_Id,Pitch Angle,Power To Grid,Rotor Speed,Ambient Temperature,Drivetrain Gearbox Temp IMSDE,Drivetrain Gearbox Temp IMSNDE,Drivetrain Mainbearing Temp,Nacelle Temp,Drivetrain vibration,Relative Wind Direction,Wind Speed,Yaw Angle,State
0,2019-01-01 00:00:00,cluster4.turb1,1.012412,185.175430,15.222478,27.000000,67.246573,62.000000,44.000000,36.000000,0.031246,-3.897128,4.575610,85.666200,OK
1,2019-01-01 01:00:00,cluster4.turb1,19.886360,-1.322839,1.323630,30.000000,60.000000,60.000000,43.307395,38.000000,0.017265,19.453730,2.093786,131.083868,OK
2,2019-01-01 02:00:00,cluster4.turb1,19.860564,-1.223571,2.421718,32.000000,57.114333,58.000000,43.000000,38.267785,0.005642,-39.464312,3.072401,134.471288,OK
3,2019-01-01 03:00:00,cluster4.turb1,19.778045,-5.238074,2.983831,32.655864,56.000000,57.000000,42.000000,38.493413,-0.026114,35.715920,3.678366,129.900000,OK
4,2019-01-01 04:00:00,cluster4.turb1,19.966558,-5.412133,2.461961,33.000000,55.000000,56.000000,42.000000,39.445350,-0.025467,4.536506,2.783734,203.775693,OK
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14164,2019-02-28 19:00:00,cluster4.turb9,4.156665,2072.782247,15.728874,28.000000,68.000000,59.000000,44.000000,32.772618,-0.043262,-7.822293,12.153587,10.610644,OK
14165,2019-02-28 20:00:00,cluster4.turb9,5.662858,2051.192673,15.752764,27.201029,68.000000,58.000000,44.000000,32.000000,-0.043622,8.435903,12.674007,9.535325,OK
14167,2019-02-28 22:00:00,cluster4.turb9,1.131492,2108.136798,15.863010,28.000000,68.000000,59.000000,43.527698,33.000000,-0.027761,11.366595,12.178910,6.211855,OK
14168,2019-02-28 23:00:00,cluster4.turb9,0.027424,1344.904205,15.485261,29.979011,67.000000,57.447326,44.000000,33.650053,-0.051679,-11.040000,9.793801,6.634166,OK


In [19]:
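# Compute the correlation matrix of all numeric variables (including Power To Grid), then plot it.
# select_dtypes drops the text columns (Asset_Id, State), which newer pandas versions refuse to correlate.

px.imshow(df.select_dtypes("number").corr(), color_continuous_scale="RdYlGn", height=600)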

**According to the plot above, Wind Speed is the variable most strongly correlated with Power To Grid**
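
The heat map can also be read numerically by ranking every variable by the absolute value of its correlation with the target. The sketch below uses a small synthetic dataframe (the values are made up for illustration and are not the Hub data); with the real data, the same expression would be applied to `df.select_dtypes("number")`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
wind = rng.uniform(3, 12, n)
temp = rng.uniform(15, 35, n)
demo = pd.DataFrame({
    "Wind Speed": wind,
    "Ambient Temperature": temp,
    # synthetic power: grows with wind speed, plus noise
    "Power To Grid": 200 * wind + rng.normal(0, 100, n),
})

target = "Power To Grid"
ranking = (
    demo.corr()[target]          # correlation of every column with the target
    .drop(target)                # the target trivially correlates 1.0 with itself
    .abs()                       # strong negative correlations matter too
    .sort_values(ascending=False)
)
print(ranking)
```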

In [20]:
# Plotting Active Power versus Wind Speed

def plot_this(df, title="Power To Grid vs Wind Speed"):
    return px.scatter(
        df, x="Wind Speed", y="Power To Grid", 
        # keys of `labels` must be column names, not "x"/"y"
        labels={"Wind Speed": "Wind Speed (m/s)", "Power To Grid": "Power To Grid (kW)"},
        title=title,
        hover_data=["Asset_Id", "Timestamp"],
        color="Asset_Id",
        range_x=[0,20],
        range_y=[-50, 2500]
    )
plot_this(df)

**The plot above shows suspicious data: two straight lines (light blue and light green) cross the middle of the plot. To see them in isolation, double-click "cluster4.turb5" (or "cluster4.turb7") in the legend above (screenshot below). These points are produced by interpolation between the last time the turbine was active (before 2019-01-01) and the next time it sent data (after 2019-03-01). It is better to discard those two turbines before building the model, since their data does not correspond to a normal operating mode.**

<img src="https://academichub.blob.core.windows.net/images/wind_farms_cluster4_turb5_data.png" alt="Turbine 5 data" width="800">
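
Such straight lines can also be detected programmatically: a series that was linearly interpolated on a regular time grid changes by the same amount at every step. The sketch below flags assets whose step size is constant, using synthetic data; the helper `flag_linear_assets` and the tolerance `tol` are illustrative assumptions, not part of the notebook or the Hub library.

```python
import numpy as np
import pandas as pd


def flag_linear_assets(df, value_col="Power To Grid", tol=1e-6):
    """Return Asset_Ids whose values change by a constant step (a straight line in time)."""
    flagged = []
    for asset_id, group in df.sort_values("Timestamp").groupby("Asset_Id"):
        steps = group[value_col].diff().dropna()
        # constant steps have (near) zero spread
        if len(steps) > 1 and steps.std() < tol:
            flagged.append(asset_id)
    return flagged


hours = pd.date_range("2019-01-01", periods=48, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(1)
demo = pd.concat([
    # turbA: fluctuating output, as expected from a running turbine
    pd.DataFrame({"Timestamp": hours, "Asset_Id": "turbA",
                  "Power To Grid": rng.uniform(0, 2000, 48)}),
    # turbB: perfectly linear ramp, as produced by interpolating two distant points
    pd.DataFrame({"Timestamp": hours, "Asset_Id": "turbB",
                  "Power To Grid": np.linspace(100.0, 500.0, 48)}),
])
flagged = flag_linear_assets(demo)
print(flagged)
```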


In [21]:
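# Filter out turbines 5 and 7 (straight lines in previous plot) and rows with missing data (if any)
SHOW_PLOTS = True  # set to False to get the dataframe back instead of a plot

def plot_or_df(dfp):
    """Show the scatter plot, or return the dataframe when SHOW_PLOTS is False."""
    if SHOW_PLOTS:
        plot_this(dfp).show()
        return None
    return dfp

df_Filter = df[~df.Asset_Id.isin(["cluster4.turb5", "cluster4.turb7"])].dropna()
plot_or_df(df_Filter)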

In [22]:
# Filter out negative Power To Grid values
filterNegativeActivePower = df_Filter["Power To Grid"] >= 0
df_Filter = df_Filter[filterNegativeActivePower]
plot_or_df(df_Filter)

In [23]:
# Remove the rows where we have a high wind speed and low active power in order to keep only the normal operating conditions
filterOutLowPowerHighWindSpeedData = ~(
    (df_Filter["Wind Speed"] > 10) & (df_Filter["Power To Grid"] < 600)
)
df_Filter = df_Filter[filterOutLowPowerHighWindSpeedData]
plot_or_df(df_Filter)

In [24]:
# Filter out high wind speeds (>= 13 m/s), where Power To Grid no longer changes with wind speed
filterOutHighWind = df_Filter["Wind Speed"] < 13
df_Filter = df_Filter[filterOutHighWind]

# Plotting Active Power versus Wind Speed - filtered data frame representing Normal Operating Conditions
#
plot_this(df_Filter, title="Power To Grid vs Wind Speed - filtered data, normal operating conditions")
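
The three threshold filters applied in the cells above can be expressed as a single boolean mask, which makes the definition of "normal operating conditions" easy to review and reuse. This is a sketch with the same thresholds as the notebook (0 kW, 10 m/s with 600 kW, 13 m/s); the function name and the demo values are illustrative assumptions.

```python
import pandas as pd


def normal_operation_mask(df):
    """Boolean mask of rows kept by the three threshold filters used above."""
    power = df["Power To Grid"]
    wind = df["Wind Speed"]
    non_negative = power >= 0
    not_low_power_high_wind = ~((wind > 10) & (power < 600))
    below_wind_cutoff = wind < 13
    return non_negative & not_low_power_high_wind & below_wind_cutoff


demo = pd.DataFrame({
    "Wind Speed": [4.0, 11.0, 14.0, 8.0],
    "Power To Grid": [-5.0, 300.0, 2100.0, 900.0],
})
# row 0: negative power, row 1: high wind but low power, row 2: wind above cutoff
kept = demo[normal_operation_mask(demo)]
print(kept)
```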

In [25]:
# Prepare the training & testing/scoring data sets, and split them randomly
from sklearn.model_selection import train_test_split

# define the target variable to be predicted
y = df_Filter["Power To Grid"].values
# split the dataset randomly into test and train sets
X_train, X_test, y_train, y_test = train_test_split(
    df_Filter[["Ambient Temperature", "Wind Speed"]].values,
    y,
    test_size=0.25,
    random_state=42,
)
print("-- training and testing set prepared --")

-- training and testing set prepared --


In [26]:
# Use the Decision Tree Regression Machine Learning model from scikit-learn
from sklearn.tree import DecisionTreeRegressor

regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X_train, y_train)
regr_2.fit(X_train, y_train)

# Predict
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
print("-- regression and prediction completed --")

-- regression and prediction completed --
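
To give an intuition for what each node of a regression tree does, the sketch below fits a single split (a "stump") on one feature: it tries every threshold between neighbouring values and keeps the one that minimises the summed squared error when each side is predicted by its mean. This is a simplified illustration with made-up numbers, not scikit-learn's implementation; a tree with `max_depth=2` repeats this kind of split on each side and so has at most 4 leaves.

```python
import numpy as np


def fit_stump(x, y):
    """Find the threshold on x that minimises the squared error of two constant predictions."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]  # threshold, prediction below it, prediction above it


# Made-up wind speeds (m/s) and power values (kW)
wind = np.array([3.0, 4.0, 5.0, 9.0, 10.0, 11.0])
power = np.array([100.0, 150.0, 200.0, 1500.0, 1600.0, 1700.0])
threshold, low_pred, high_pred = fit_stump(wind, power)
print(threshold, low_pred, high_pred)
```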


In [27]:
# Plot the results

fig = go.Figure(
    layout=dict(
        title="Decision Tree Regression", 
        xaxis=dict(title="Wind Speed (m/s)", range=[0,20]),
        yaxis=dict(title="Power to Grid (kW)")
    )
)
fig.add_trace(go.Scatter(
    x=X_train[:, 1], 
    y=y_train, 
    name='data',
    marker=dict(size=8, line=dict(width=2, color='DarkSlateGrey'), color='darkorange'),
    mode='markers'
))
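# X_test is not sorted by wind speed, so draw predictions as markers rather than connected lines
fig.add_trace(go.Scatter(x=X_test[:, 1], y=y_1, name='max_depth=2', mode='markers', marker=dict(color="cornflowerblue")))
fig.add_trace(go.Scatter(x=X_test[:, 1], y=y_2, name='max_depth=5', mode='markers', marker=dict(color="yellowgreen")))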
fig.show()

In [28]:
# Save the machine learning model to disk (max_depth=5)

model_filename = "WT_ActivePower_model.sav"
with open(model_filename, "wb") as f:  # the context manager closes the file handle
    pickle.dump(regr_2, f)
print("-- model saved --")

-- model saved --


In [29]:
# Test the model with the scoring/testing data set
with open(model_filename, "rb") as f:
    loaded_model = pickle.load(f)
# score() returns the coefficient of determination (R^2) on the test set
result = loaded_model.score(X_test, y_test)
print(result)

0.9468636838360438
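
For a regressor, `score` returns the coefficient of determination R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on a few made-up values to show what the number above measures; the function name is an illustrative assumption.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # error of the model
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # error of always predicting the mean
    return 1 - ss_res / ss_tot


# Made-up measured and predicted power values (kW)
y_true = [100.0, 500.0, 900.0, 1300.0]
y_pred = [150.0, 450.0, 950.0, 1250.0]
r2 = r2_score_manual(y_true, y_pred)
print(round(r2, 4))
```

A value of 1 means perfect predictions, while 0 means the model does no better than predicting the mean of the test targets.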


In [30]:
# Sample prediction
# define input
new_input = [[20, 9.6]]  # Temp= 20C, Wind Speed = 9.6 m/s
# get prediction for new input
new_output = regr_2.predict(new_input)
print(new_output)

[1318.64284934]


In [31]:
# Call the OpenWeather API to retrieve the forecasted air temperature and wind speed
# for Jamestown, Australia for the next 5 days
# City code information: http://bulk.openweathermap.org/sample/
#

url = "https://api.openweathermap.org/data/2.5/forecast?q=Jamestown,AU,2069194&units=metric&APPID=5dac981ce33f41f61d8d1ea06ee89798"
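weather_forecast = requests.get(url, timeout=30)
weather_forecast.raise_for_status()  # fail early on an HTTP error (e.g. an invalid API key)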

In [32]:
# Display first 2 results
weather_forecast.json()["list"][:2]

[{'dt': 1642820400,
  'main': {'temp': 23.23,
   'feels_like': 23.15,
   'temp_min': 23.23,
   'temp_max': 23.89,
   'pressure': 1014,
   'sea_level': 1014,
   'grnd_level': 962,
   'humidity': 59,
   'temp_kf': -0.66},
  'weather': [{'id': 500,
    'main': 'Rain',
    'description': 'light rain',
    'icon': '10d'}],
  'clouds': {'all': 100},
  'wind': {'speed': 8.62, 'deg': 41, 'gust': 13.81},
  'visibility': 10000,
  'pop': 0.61,
  'rain': {'3h': 0.37},
  'sys': {'pod': 'd'},
  'dt_txt': '2022-01-22 03:00:00'},
 {'dt': 1642831200,
  'main': {'temp': 21.75,
   'feels_like': 21.89,
   'temp_min': 21.17,
   'temp_max': 21.75,
   'pressure': 1013,
   'sea_level': 1013,
   'grnd_level': 961,
   'humidity': 73,
   'temp_kf': 0.58},
  'weather': [{'id': 500,
    'main': 'Rain',
    'description': 'light rain',
    'icon': '10d'}],
  'clouds': {'all': 97},
  'wind': {'speed': 7.55, 'deg': 34, 'gust': 13.53},
  'visibility': 10000,
  'pop': 0.9,
  'rain': {'3h': 2.11},
  'sys': {'pod': 'd'},

In [33]:
# Store the forecasted air temperature, wind speed and timestamp from the API json response 
# in a pandas DataFrame

values = weather_forecast.json()["list"]
timestamps = np.array([datetime.datetime.strptime(v["dt_txt"], "%Y-%m-%d %H:%M:%S") for v in values])

df_weather_forecast = pd.DataFrame(
    {
        "Timestamp": timestamps,
        "Temp (C)": np.array([v["main"]["temp"] for v in values]),
        "Wind Speed (m/s)": np.array([v["wind"]["speed"] for v in values]),
    }
)

df_weather_forecast.head()  # first 5 results

Unnamed: 0,Timestamp,Temp (C),Wind Speed (m/s)
0,2022-01-22 03:00:00,23.23,8.62
1,2022-01-22 06:00:00,21.75,7.55
2,2022-01-22 09:00:00,21.86,7.29
3,2022-01-22 12:00:00,20.94,6.77
4,2022-01-22 15:00:00,20.74,5.08
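
Each forecast entry carries its time twice: `dt` as a Unix timestamp and `dt_txt` as text. The sketch below, using the first entry from the response shown earlier (trimmed to the fields needed), checks that both agree when `dt_txt` is read as UTC, and builds the feature pair in the order the model was trained on. Treating `dt_txt` as UTC is an assumption that this check supports for the sample entry.

```python
from datetime import datetime, timezone

# One forecast entry, trimmed to the fields used here (values from the response shown earlier)
entry = {
    "dt": 1642820400,
    "main": {"temp": 23.23},
    "wind": {"speed": 8.62},
    "dt_txt": "2022-01-22 03:00:00",
}

from_epoch = datetime.fromtimestamp(entry["dt"], tz=timezone.utc)
from_text = datetime.strptime(entry["dt_txt"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# Same column order as the training data: ambient temperature, then wind speed
features = [entry["main"]["temp"], entry["wind"]["speed"]]
print(from_epoch.isoformat(), features)
```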


In [34]:
# Use the machine learning model developed previously to predict the cluster power
# and add the values to a copy of the weather forecast dataframe

with open(model_filename, "rb") as f:
    loaded_model = pickle.load(f)

# Predict the per-turbine power (kW) for every forecast row in one vectorized call
features = df_weather_forecast[["Temp (C)", "Wind Speed (m/s)"]].values
turbine_power_kw = loaded_model.predict(features)

df_power_forecast = df_weather_forecast.copy()  # leave the weather dataframe untouched
cluster_power_col = "Cluster Predicted Power (MW)"
# 10 turbines in the cluster; divide by 1000 to convert kW to MW
df_power_forecast[cluster_power_col] = 10 * turbine_power_kw / 1000.0

df_power_forecast.head()  # first 5 results

Unnamed: 0,Timestamp,Temp (C),Wind Speed (m/s),Cluster Predicted Power (MW)
0,2022-01-22 03:00:00,23.23,8.62,10.921792
1,2022-01-22 06:00:00,21.75,7.55,7.783458
2,2022-01-22 09:00:00,21.86,7.29,6.587185
3,2022-01-22 12:00:00,20.94,6.77,5.413313
4,2022-01-22 15:00:00,20.74,5.08,2.17398
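
The cluster figure is a simple scaling of the single-turbine prediction: multiply by the number of turbines and convert kW to MW. The sketch below applies it to the sample prediction obtained earlier for 20 °C and 9.6 m/s. It assumes that all ten turbines see the same weather and produce the same output, which is a simplification; the function and constant names are illustrative.

```python
N_TURBINES = 10     # turbines in the cluster
KW_PER_MW = 1000.0


def cluster_power_mw(turbine_power_kw, n_turbines=N_TURBINES):
    """Scale a single-turbine prediction (kW) to a cluster total (MW)."""
    return n_turbines * turbine_power_kw / KW_PER_MW


# Single-turbine prediction from the sample input earlier (20 °C, 9.6 m/s)
cluster_mw = cluster_power_mw(1318.64284934)
print(round(cluster_mw, 3))
```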


In [35]:
# Plot cluster predicted power over time
px.bar(
    df_power_forecast, 
    x="Timestamp", 
    y=cluster_power_col, 
    text=cluster_power_col,
    text_auto=".2s"
)

In [36]:
# 3D-plot of predicted power to grid according to model and weather predictions

px.scatter_3d(
    df_power_forecast,
    x="Temp (C)",
    y="Wind Speed (m/s)",
    z=cluster_power_col,
    size=cluster_power_col,
    color=cluster_power_col,
    hover_data=["Timestamp"],
    log_x=False,
    size_max=100,
    range_x=[0, 90],
    range_y=[0, 12],
    height=600
)
