Downloading data from public drive link

In [6]:
!gdown 1EA2FaU5Qwo9z7kJB0J4tMhIkM2DU8MXk
!gdown 1i9TX8Y3cTBhl-74aEnbtgJm14BkP9AJI

Downloading...
From: https://drive.google.com/uc?id=1EA2FaU5Qwo9z7kJB0J4tMhIkM2DU8MXk
To: /content/air_properties_data.xlsx
100% 386k/386k [00:00<00:00, 19.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1i9TX8Y3cTBhl-74aEnbtgJm14BkP9AJI
To: /content/power_data.xlsx
100% 366k/366k [00:00<00:00, 20.9MB/s]


In [7]:
import pandas as pd

# Load the data
air_properties_data = pd.read_excel("air_properties_data.xlsx")
power_data = pd.read_excel("power_data.xlsx")

# Merge datasets based on timestamp
data = pd.merge(power_data, air_properties_data, on="Timestamp")
data.rename(columns = {'power':'Power', 'WindSpeed':'Wind Speed', 'Unnamed: 0_y':'Air Density' }, inplace = True)
# Handle missing values
data = data.dropna()

# Normalize features
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data[['Power', 'Air Density', 'Wind Speed']])

# Convert to DataFrame
data_scaled_df = pd.DataFrame(data_scaled, columns=['Power', 'Air Density', 'Wind Speed'])
data_scaled_df['Timestamp'] = data['Timestamp']


print(data_scaled_df.head())

      Power  Air Density  Wind Speed           Timestamp
0  0.378601     0.000000    0.369528 2022-01-01 01:00:00
1  0.433128     0.000071    0.437079 2022-01-01 02:00:00
2  0.624486     0.000141    0.450589 2022-01-01 03:00:00
3  0.662551     0.000212    0.338005 2022-01-01 04:00:00
4  0.339506     0.000282    0.333501 2022-01-01 05:00:00


Feature Engineering:

In [8]:
data_scaled_df['Hour'] = data_scaled_df['Timestamp'].dt.hour
data_scaled_df['DayOfWeek'] = data_scaled_df['Timestamp'].dt.dayofweek
data_scaled_df['Month'] = data_scaled_df['Timestamp'].dt.month
print(data_scaled_df.head())


      Power  Air Density  Wind Speed           Timestamp  Hour  DayOfWeek  \
0  0.378601     0.000000    0.369528 2022-01-01 01:00:00     1          5   
1  0.433128     0.000071    0.437079 2022-01-01 02:00:00     2          5   
2  0.624486     0.000141    0.450589 2022-01-01 03:00:00     3          5   
3  0.662551     0.000212    0.338005 2022-01-01 04:00:00     4          5   
4  0.339506     0.000282    0.333501 2022-01-01 05:00:00     5          5   

   Month  
0      1  
1      1  
2      1  
3      1  
4      1  


Model Selection and Training:

In [9]:
import numpy as np
filtered_df = data_scaled_df[data_scaled_df['Timestamp'].dt.date.isin([pd.Timestamp('2023-01-15').date()])]
filtered_df = filtered_df.drop(columns=['Timestamp'])
special_date = np.array(filtered_df)
special_date_label = np.array(filtered_df.iloc[23, 0])


In [10]:
data_scaled_df = data_scaled_df.drop(columns=['Timestamp'])

In [11]:
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split



train_data, test_data = train_test_split(data_scaled_df, test_size=0.2, shuffle=False)

# Prepare input sequences and target values
sequence_length = 24  # Number of previous hours to consider
X_train, y_train = [], []
X_test, y_test = [], []

for i in range(sequence_length, len(train_data)):
    X_train.append(train_data.iloc[i - sequence_length:i])
    y_train.append(train_data.iloc[i, 0])  # Power column

for i in range(sequence_length, len(test_data)):
    X_test.append(test_data.iloc[i - sequence_length:i])
    y_test.append(test_data.iloc[i, 0])  # Power column

# Convert to arrays
X_train = np.array(X_train, np.float32)
y_train = np.array(y_train, np.float32)
X_test = np.array(X_test, np.float32)
y_test = np.array(y_test, np.float32)

print(X_train.shape)


(11296, 24, 6)


In [20]:

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(X_train.shape[1], X_train.shape[2])),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), loss='mean_squared_error')

# Callbacks
cb_e = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=4)
cb_r = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.1)

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test), callbacks=[cb_e, cb_r])

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50


<keras.callbacks.History at 0x7e5615cc1810>

In [21]:
print(np.expand_dims(special_date, 0).shape)
y_pred = model.predict(np.expand_dims(special_date, 0))
print('Power consumption of 2023-01-15  is {:.2f}, predicted value is {:.2f}'.format(special_date_label, y_pred[0][0]))

(1, 24, 6)
Power consumption of 2023-01-15  is 0.96, predicted value is 0.86


**Communication and Interpretation: Imagine you have a manager who doesn’t know much about
data. How would you describe and summarize your Project birefly without missing any steps?**

Hello! I'd like to share a summary of the Wind Energy Power Production Prediction project I've been working on. Our goal was to build a model that can forecast the amount of power a wind plant will produce in the future. This is crucial for effective planning and decision-making in the wind energy field.

First, I collected data on important factors affecting power production, such as air density and wind speed. I combined this data with historical power production records to create a comprehensive dataset. After cleaning and organizing the data, I identified key patterns and trends.

To make accurate predictions, I developed a special 'brain' for our model called a Recurrent Neural Network (RNN). This type of model is great for handling sequences of data, which is exactly what we have with hourly wind and air property measurements. RNNs excel at capturing dependencies over time, allowing us to predict power production accurately.

To ensure our model works well, I divided the data into training and testing sets. I trained the RNN on the training data to help it learn the relationships between air properties, wind speed, and power production. Then, I tested the model on the testing data to measure how well it predicts power production values it hasn't seen before.

For evaluation, I used a metric called Mean Squared Error (MSE) to quantify how close our predictions were to the actual power production values. Lower MSE values indicate better predictions.

In the end, our model demonstrated promising results, especially for short-term forecasts. However, we encountered some challenges when making predictions far into the future due to limited historical data. If given more time, I would explore ways to gather more historical data to improve long-term predictions.

Overall, this project highlights the potential of data-driven approaches in optimizing wind energy production planning. By accurately forecasting power production, we contribute to maximizing the efficiency of renewable energy resources and supporting sustainable energy practices.