# Wind Power Prediction using Genetic Algorithm (GA)

This notebook trains a neural network for wind power prediction by optimizing its weights with a Genetic Algorithm (GA), then evaluates the resulting model on a held-out test set.

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Add the parent directory to the Python path
sys.path.append('..')

from src.data import load_data
from src.models import create_ga_model
from src.genetic_algorithm import genetic_algorithm
from src.utils import set_seeds, create_results_directory
from src.visualization import plot_predictions, plot_fitness_history


## 1. Set up the environment

In [2]:
# Set random seeds for reproducibility
set_seeds(42)

# Create a results directory
results_dir = create_results_directory()
print(f"Results will be saved in: {results_dir}")

2024-10-18 02:31:34,648 - INFO - Random seeds set to 42
2024-10-18 02:31:34,649 - INFO - Results directory created: /home/dkat/computational-intelligence/src/../results/20241018_023134


Results will be saved in: /home/dkat/computational-intelligence/src/../results/20241018_023134
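`src/utils.py` is not shown in this notebook. Judging only from the log lines above, `set_seeds` seeds the random number generators and `create_results_directory` creates a timestamped folder under `results/`. The following is a minimal sketch of both under those assumptions. The `_sketch` functions and `results_dir_name` are hypothetical names, not the project's API, and TensorFlow seeding is only noted in a comment so the sketch runs without TensorFlow installed.

```python
import logging
import os
import random
from datetime import datetime

import numpy as np


def set_seeds_sketch(seed=42):
    """Hypothetical stand-in for src.utils.set_seeds: seed Python's and NumPy's global RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    # With TensorFlow installed, tf.random.set_seed(seed) would also be needed so that
    # Keras weight initialisation is reproducible; omitted here to avoid the dependency.
    logging.info("Random seeds set to %d", seed)


def results_dir_name(now):
    """Folder name in the timestamp format seen in the log, e.g. 20241018_023134."""
    return now.strftime("%Y%m%d_%H%M%S")


def create_results_directory_sketch(base_dir="../results"):
    """Hypothetical stand-in for src.utils.create_results_directory."""
    path = os.path.join(base_dir, results_dir_name(datetime.now()))
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path
```

Re-seeding with the same value replays the same random draws, which is what makes the GA run repeatable as long as every source of randomness is seeded.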


## 2. Load and preprocess data

In [3]:
# Load the data
(train_X, train_y), (val_X, val_y), (test_X, test_y), scaler_X, scaler_y = load_data()

print("Data shapes:")
print(f"Train: {train_X.shape}, {train_y.shape}")
print(f"Validation: {val_X.shape}, {val_y.shape}")
print(f"Test: {test_X.shape}, {test_y.shape}")

2024-10-18 02:31:34,654 - INFO - Loading data from ../data/raw/Train.csv
2024-10-18 02:31:34,743 - INFO - Train data shape: (140160, 12)
2024-10-18 02:31:34,746 - INFO - Input columns: ['Temp_2m', 'RelHum_2m', 'DP_2m', 'WS_10m', 'WS_100m', 'WD_10m', 'WD_100m', 'WG_10m']
2024-10-18 02:31:34,746 - INFO - Output column: Power
2024-10-18 02:31:34,759 - INFO - Training set shape: (98112, 9)
2024-10-18 02:31:34,760 - INFO - Validation set shape: (21024, 9)
2024-10-18 02:31:34,760 - INFO - Test set shape: (21024, 9)
2024-10-18 02:31:34,769 - INFO - Data scaling completed


Data shapes:
Train: (98112, 8), (98112, 1)
Validation: (21024, 8), (21024, 1)
Test: (21024, 8), (21024, 1)
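The logged shapes are consistent with a 70/15/15 split of the 140,160 rows (98,112 / 21,024 / 21,024), with the 8 input columns and the `Power` column separated afterwards. Whether `load_data` splits chronologically or shuffles first, and which scaler it uses, is not visible here. The sketch below therefore assumes a chronological split with min-max scaling fitted on the training rows only; `split_sizes` and `split_and_scale` are hypothetical helpers.

```python
import numpy as np


def split_sizes(n, train_frac=0.70, val_frac=0.15):
    """Row counts for train/validation/test; the test set takes the remainder."""
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return n_train, n_val, n - n_train - n_val


def split_and_scale(X, y, train_frac=0.70, val_frac=0.15):
    """Chronological split, then min-max scaling fitted on the training rows only (assumed)."""
    n_train, n_val, _ = split_sizes(len(X), train_frac, val_frac)
    bounds = {
        "train": (0, n_train),
        "val": (n_train, n_train + n_val),
        "test": (n_train + n_val, len(X)),
    }
    # Statistics come from the training block only, so nothing leaks from val/test
    x_min, x_max = X[:n_train].min(axis=0), X[:n_train].max(axis=0)
    y_min, y_max = y[:n_train].min(axis=0), y[:n_train].max(axis=0)
    # Guard against division by zero for constant columns
    x_span = np.where(x_max > x_min, x_max - x_min, 1.0)
    y_span = np.where(y_max > y_min, y_max - y_min, 1.0)
    scaled = {
        name: ((X[a:b] - x_min) / x_span, (y[a:b] - y_min) / y_span)
        for name, (a, b) in bounds.items()
    }
    return scaled, (x_min, x_span), (y_min, y_span)
```

Scaled predictions map back to power units with `y_scaled * y_span + y_min`, which is the role `scaler_y` plays when it is passed to the GA.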


## 3. Create the model

In [4]:
# Define model architecture
input_shape = (train_X.shape[1],)
layer_sizes = [256, 128, 64]

# Create the model
model = create_ga_model(input_shape, layer_sizes)
model.summary()

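A GA does not use gradients: each individual in the population is one complete set of network weights, typically handled as a single flat vector (the genome). Assuming `create_ga_model` builds `Dense` layers with biases (8 inputs, hidden layers of 256, 128 and 64 units, and 1 output to match `train_y`), the sketch below shows how such a genome could be laid out. `layer_shapes`, `flatten_weights` and `unflatten_weights` are hypothetical helpers that follow the `(kernel, bias)` order returned by Keras `model.get_weights()`.

```python
import numpy as np


def layer_shapes(input_dim, layer_sizes, output_dim=1):
    """Weight shapes in Keras get_weights() order: (kernel, bias) for each Dense layer."""
    shapes = []
    fan_in = input_dim
    for units in list(layer_sizes) + [output_dim]:
        shapes.append((fan_in, units))  # kernel
        shapes.append((units,))         # bias
        fan_in = units
    return shapes


def flatten_weights(weights):
    """Concatenate all weight arrays into one 1-D genome."""
    return np.concatenate([w.ravel() for w in weights])


def unflatten_weights(genome, shapes):
    """Split a 1-D genome back into arrays with the given shapes."""
    weights, offset = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        weights.append(genome[offset:offset + size].reshape(shape))
        offset += size
    if offset != genome.size:
        raise ValueError(f"genome has {genome.size} genes, shapes need {offset}")
    return weights
```

Under these assumptions every genome has 2,304 + 32,896 + 8,256 + 65 = 43,521 genes. A genome can be loaded into a model with `model.set_weights(unflatten_weights(genome, [w.shape for w in model.get_weights()]))`.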

## 4. Run the Genetic Algorithm

In [None]:
# Set GA parameters
ga_params = {
    "population_size": 50,
    "generations": 100,
    "tournament_size": 3,
    "mutation_rate": 0.01,
    "mutation_scale": 0.1
}

# Run Genetic Algorithm
best_model, fitness_history, eval_results = genetic_algorithm(
    model,
    (train_X, train_y),
    (val_X, val_y),
    (test_X, test_y),
    scaler_y,
    results_dir=results_dir,
    **ga_params
)

print("\nEvaluation Results:")
print(f"Test MSE: {eval_results['mse']:.4f}")
print(f"Test RMSE: {eval_results['rmse']:.4f}")
print(f"Test R2 Score: {eval_results['r2']:.4f}")

Generation 0
Best fitness: 1.088452
Generation 1
Best fitness: 1.125772
Generation 2
Best fitness: 1.239722
Generation 3
Best fitness: 1.153017
Generation 4
Best fitness: 1.159202
Generation 5
Best fitness: 1.153718
Generation 6
Best fitness: 1.138746
Generation 7
Best fitness: 1.246677
Generation 8
Best fitness: 1.238113
Generation 9
Best fitness: 1.205135
Generation 10
Best fitness: 1.209342
Generation 11
Best fitness: 1.247210
Generation 12
Best fitness: 1.228393
Generation 13
Best fitness: 1.268898
Generation 14
Best fitness: 1.294535
Generation 15
Best fitness: 1.297778
Generation 16
Best fitness: 1.278307
Generation 17
Best fitness: 1.424801
Generation 18
Best fitness: 1.373575
Generation 19
Best fitness: 1.406388
Generation 20
Best fitness: 1.431060
Generation 21
Best fitness: 1.521067
Generation 22
Best fitness: 1.575243
Generation 23
Best fitness: 1.571181
Generation 24
Best fitness: 1.632362
Generation 25
Best fitness: 1.606331
Generation 26
Best fitness: 1.577475
Generation 27
Best fitne
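`src/genetic_algorithm.py` is not shown. Going only by the names in `ga_params`, one generation plausibly consists of tournament selection among `tournament_size` randomly drawn individuals, followed by mutation that adds Gaussian noise with standard deviation `mutation_scale` to roughly a `mutation_rate` fraction of the genes. No crossover parameter is passed, so the uniform crossover below is an assumption, as is reusing the fitness 1 / (MSE + 1e-8) from the random-search baseline in Section 7. This sketch keeps no elite individuals, so the best fitness can drop between generations, as it does in the log above (1.239722 at generation 2, 1.153017 at generation 3).

```python
import numpy as np


def mse_fitness(y_true, y_pred, eps=1e-8):
    """Inverse MSE, as in the random-search baseline: higher is better."""
    return 1.0 / (np.mean((y_true - y_pred) ** 2) + eps)


def tournament_select(population, fitness, tournament_size, rng):
    """Return a copy of the fittest of `tournament_size` distinct random individuals."""
    contenders = rng.choice(len(population), size=tournament_size, replace=False)
    winner = contenders[np.argmax(fitness[contenders])]
    return population[winner].copy()


def uniform_crossover(parent_a, parent_b, rng):
    """Assumed operator: each gene comes from either parent with probability 0.5."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)


def gaussian_mutation(genome, mutation_rate, mutation_scale, rng):
    """Add N(0, mutation_scale**2) noise to about `mutation_rate` of the genes."""
    mask = rng.random(genome.shape) < mutation_rate
    noise = rng.normal(0.0, mutation_scale, size=genome.shape)
    return genome + mask * noise


def next_generation(population, fitness, tournament_size=3,
                    mutation_rate=0.01, mutation_scale=0.1, rng=None):
    """Build a new population (rows = genomes) of the same size, without elitism."""
    rng = np.random.default_rng() if rng is None else rng
    children = []
    for _ in range(len(population)):
        parent_a = tournament_select(population, fitness, tournament_size, rng)
        parent_b = tournament_select(population, fitness, tournament_size, rng)
        child = uniform_crossover(parent_a, parent_b, rng)
        children.append(gaussian_mutation(child, mutation_rate, mutation_scale, rng))
    return np.stack(children)
```

With `mutation_rate=0.01` and genomes of tens of thousands of weights, each child still receives hundreds of small perturbations per generation, which is why a modest `mutation_scale` such as 0.1 matters.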

## 5. Visualize Results

In [None]:
# Plot predictions
plot_predictions(eval_results['actual'], eval_results['predictions'], "GA Model Predictions")

# Plot fitness history
plot_fitness_history(fitness_history)
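`eval_results` carries the actual and predicted test values alongside the metrics printed in Section 4. Assuming both are array-like and in the same units (whether that is original power units or scaled values depends on the GA module, which is not shown), the metrics can be recomputed directly from them. `regression_metrics` is a hypothetical helper; scikit-learn's `mean_squared_error` and `r2_score` give the same values.

```python
import numpy as np


def regression_metrics(actual, predictions):
    """MSE, RMSE and coefficient of determination R^2 for 1-D targets."""
    actual = np.asarray(actual, dtype=float).ravel()
    predictions = np.asarray(predictions, dtype=float).ravel()
    residuals = actual - predictions
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation around the mean
    return {"mse": mse, "rmse": np.sqrt(mse), "r2": 1.0 - ss_res / ss_tot}
```

For example, `regression_metrics(eval_results['actual'], eval_results['predictions'])` should reproduce the printed test metrics if the assumptions above hold.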

## 6. Analyze GA Performance

In [None]:
# Plot fitness over generations
plt.figure(figsize=(12, 6))
plt.plot(fitness_history)
plt.title('Fitness History over Generations')
plt.xlabel('Generation')
plt.ylabel('Best Fitness')
plt.show()

# Relative change in best fitness between the first and the last generation
initial_fitness = fitness_history[0]
final_fitness = fitness_history[-1]
relative_improvement = (final_fitness - initial_fitness) / initial_fitness * 100

print(f"Initial Fitness: {initial_fitness:.6f}")
print(f"Final Fitness: {final_fitness:.6f}")
print(f"Relative Improvement: {relative_improvement:.2f}%")
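Section 7 scores weights with the fitness $f = 1/(\mathrm{MSE} + \varepsilon)$, $\varepsilon = 10^{-8}$. If the GA uses the same fitness (an assumption, since its source is not shown), a percentage gain in fitness is not the same as a percentage drop in MSE. With $f_0$ and $f_T$ the first and last entries of `fitness_history`:

$$
\frac{\mathrm{MSE}_T}{\mathrm{MSE}_0} = \frac{1/f_T - \varepsilon}{1/f_0 - \varepsilon} \approx \frac{f_0}{f_T},
\qquad
\text{MSE reduction} \approx \left(1 - \frac{f_0}{f_T}\right) \times 100\%
$$

For example, a 50% fitness improvement ($f_T = 1.5\,f_0$) corresponds to an MSE reduction of about $1 - 1/1.5 \approx 33.3\%$.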

## 7. Compare with Random Search (Optional)

In [None]:
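from sklearn.metrics import mean_squared_error

def random_search(model, val_data, num_iterations):
    """Baseline: draw all weights from N(0, 1) and keep the set with the best validation fitness."""
    X_val, y_val = val_data
    best_fitness = -np.inf
    best_weights = None
    fitness_history = []  # best-so-far fitness after each evaluation

    for _ in range(num_iterations):
        # Generate random weights with the same shapes as the model's current weights
        weights = [np.random.randn(*w.shape) for w in model.get_weights()]
        model.set_weights(weights)

        # Evaluate fitness as the inverse validation MSE (higher is better)
        y_pred = model.predict(X_val, verbose=0)
        mse = mean_squared_error(y_val, y_pred)
        fitness = 1 / (mse + 1e-8)

        if fitness > best_fitness:
            best_fitness = fitness
            best_weights = weights

        fitness_history.append(best_fitness)

    model.set_weights(best_weights)
    return model, fitness_history

# Give random search the same number of fitness evaluations as the GA
# (assuming the GA evaluates every individual once per generation)
pop_size = ga_params['population_size']
num_evaluations = ga_params['generations'] * pop_size
random_model = create_ga_model(input_shape, layer_sizes)
best_random_model, random_fitness_history = random_search(random_model, (val_X, val_y), num_evaluations)

# Plot both curves against fitness evaluations so the x-axes are comparable
ga_evaluations = np.arange(1, len(fitness_history) + 1) * pop_size
random_evaluations = np.arange(1, len(random_fitness_history) + 1)

plt.figure(figsize=(12, 6))
plt.plot(ga_evaluations, fitness_history, label='Genetic Algorithm')
plt.plot(random_evaluations, random_fitness_history, label='Random Search')
plt.title('GA vs Random Search: Fitness History')
plt.xlabel('Fitness evaluations')
plt.ylabel('Best Fitness')
plt.legend()
plt.show()

print(f"GA Final Fitness: {fitness_history[-1]:.6f}")
print(f"Random Search Final Fitness: {random_fitness_history[-1]:.6f}")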

## 8. Conclusion

In this notebook, we have:
1. Loaded, split and scaled the wind power prediction data.
2. Built a feed-forward neural network whose weights are optimized by a Genetic Algorithm.
3. Ran the Genetic Algorithm to search for well-performing weights.
4. Evaluated the resulting model on the test set.
5. Visualized the predictions and analyzed the fitness history.
6. Compared the GA with a random-search baseline given the same number of fitness evaluations (optional).

The test RMSE and R2 score of the GA-optimized model are printed at the end of the Section 4 cell.

Possible directions for further improvement:
- Experimenting with different GA parameters (population size, mutation rate, etc.)
- Trying different model architectures
- Implementing more advanced GA techniques (e.g., adaptive mutation rates, different selection methods)
- Combining GA with local search techniques for fine-tuning
- Incorporating domain-specific knowledge into the fitness function or model structure
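As one illustration of an adaptive mutation scheme, the sketch below halves the mutation step when the best-so-far fitness has not improved for `patience` generations, so the search switches to finer steps once progress stalls. The rule, its name `adapt_mutation_scale`, and its defaults are hypothetical and not taken from the project code; the same function could equally be applied to `mutation_rate`.

```python
def adapt_mutation_scale(scale, fitness_history, patience=5, factor=0.5, min_scale=1e-3):
    """Shrink `scale` by `factor` if the last `patience` generations set no new best fitness."""
    if len(fitness_history) <= patience:
        return scale  # not enough history to judge stagnation yet
    best_before = max(fitness_history[:-patience])
    best_recent = max(fitness_history[-patience:])
    if best_recent > best_before:
        return scale  # still improving: keep the current step size
    return max(scale * factor, min_scale)  # stalled: take smaller steps, but not below min_scale
```

It would be called once per generation with the best fitness values recorded so far, and its result used as `mutation_scale` for the next generation.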