# Hybrid Renewable Energy Prediction and Optimization System

This notebook generates a synthetic hybrid (solar + wind) dataset, trains two regression models (Random Forest and Gradient Boosting), evaluates them, and saves results (CSV, plots, and model files). It is written to be clear and presentation-ready.

## 1. Import libraries

Run the cell below to import the required libraries.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import joblib
import os
print('libraries imported')

## 2. Dataset generation

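We generate a synthetic hourly dataset of 1,000 rows. Each weather feature is drawn independently from a uniform distribution, and the solar, wind, and total outputs are computed from simplified physical formulas with multiplicative noise. Because the features are independent of each other and of the timestamp (irradiance does not follow a day/night cycle, for example), the data are illustrative rather than realistic. The dataset is saved as `data/hybrid_energy_dataset.csv`.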
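For reference, the generation cell below computes the outputs with the formulas that follow, where $G$ is irradiance (W/m²), $T$ is temperature (°C), $c$ is cloud cover (0 to 1), $v$ is wind speed (m/s), and $\varepsilon_s \sim U(0.9, 1.05)$ and $\varepsilon_w \sim U(0.85, 1.1)$ are multiplicative noise terms. The wind term has the shape of the standard power-in-the-wind relation $P = \tfrac{1}{2}\rho A v^3 C_p$; reading the constant $10^{-4}$ as a lumped stand-in for swept area and power coefficient is our interpretation, not something the notebook states.

```latex
\begin{aligned}
\eta_s &= 0.18\,\bigl(1 - 0.005\,(T - 25)\bigr)\,\bigl(1 - 0.4\,c\bigr) \\
P_{\text{solar}} &= \frac{G\,\eta_s\,\varepsilon_s}{1000} \\
\rho &= 1.225\,\bigl(1 - 0.0036\,(T - 15)\bigr) \\
P_{\text{wind}} &= 0.5\,\rho\,v^3 \cdot 10^{-4}\,\varepsilon_w \\
P_{\text{total}} &= \max\bigl(P_{\text{solar}} + P_{\text{wind}},\ 0\bigr)
\end{aligned}
```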

In [None]:
np.random.seed(0)
n = 1000
dates = pd.date_range('2025-01-01', periods=n, freq='h')  # 'H' is deprecated in recent pandas
temperature = np.random.uniform(0, 45, n)
humidity = np.random.uniform(10, 100, n)
solar_irradiance = np.random.uniform(0, 1100, n)
wind_speed = np.random.uniform(0, 20, n)
cloud_cover = np.random.uniform(0, 1, n)
pressure = np.random.uniform(980, 1035, n)

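# PV efficiency: 18% at 25 °C, derated 0.5% per °C and by up to 40% at full cloud cover
solar_efficiency = 0.18 * (1 - 0.005*(temperature - 25)) * (1 - 0.4*cloud_cover)
# W/m^2 * efficiency / 1000 gives kW per m^2 of panel (an implied 1 m^2 panel), with multiplicative noise
solar_output = solar_irradiance * solar_efficiency * np.random.uniform(0.9, 1.05, n) / 1000

# Air density (kg/m^3) falls about 0.36% per °C above 15 °C
air_density = 1.225 * (1 - 0.0036*(temperature - 15))
# Cubic power-in-the-wind law; 0.0001 is a fixed scaling constant chosen for this synthetic data
wind_output = 0.5 * air_density * (wind_speed**3) * 0.0001 * np.random.uniform(0.85, 1.1, n)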

total_output = solar_output + wind_output
total_output = np.maximum(total_output, 0)

data = pd.DataFrame({
    'Datetime': dates,
    'Temperature_C': np.round(temperature,2),
    'Humidity_%': np.round(humidity,2),
    'Solar_Irradiance_W_m2': np.round(solar_irradiance,2),
    'Wind_Speed_m_s': np.round(wind_speed,2),
    'Cloud_Cover': np.round(cloud_cover,3),
    'Pressure_hPa': np.round(pressure,2),
    'Solar_Output_kW': np.round(solar_output,4),
    'Wind_Output_kW': np.round(wind_output,4),
    'Total_Output_kW': np.round(total_output,4)
})

os.makedirs('data', exist_ok=True)
data.to_csv('data/hybrid_energy_dataset.csv', index=False)
print('dataset saved to data/hybrid_energy_dataset.csv')
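A quick way to sanity-check the generation formulas is to evaluate noise-free versions of them at a few chosen points. The sketch below copies the notebook's solar and wind expressions into two hypothetical helper functions (the noise factors are left out) and confirms two properties that follow from the formulas: doubling wind speed multiplies wind output by 2³ = 8, and raising temperature from 25 °C to 45 °C cuts solar output by 10%.

```python
import numpy as np

def solar_output_kw(irradiance, temperature, cloud_cover):
    """Noise-free copy of the notebook's solar formula (kW per m^2 of panel)."""
    efficiency = 0.18 * (1 - 0.005 * (temperature - 25)) * (1 - 0.4 * cloud_cover)
    return irradiance * efficiency / 1000

def wind_output_kw(wind_speed, temperature):
    """Noise-free copy of the notebook's wind formula; 0.0001 is the notebook's scaling constant."""
    air_density = 1.225 * (1 - 0.0036 * (temperature - 15))
    return 0.5 * air_density * np.power(wind_speed, 3) * 0.0001

# Cubic law: doubling wind speed at fixed temperature multiplies output by 8.
ratio_wind = wind_output_kw(10.0, 15.0) / wind_output_kw(5.0, 15.0)
# Temperature derating: 20 °C above 25 °C removes 0.5% * 20 = 10% of solar output.
ratio_solar = solar_output_kw(1000.0, 45.0, 0.0) / solar_output_kw(1000.0, 25.0, 0.0)
print(round(ratio_wind, 6), round(ratio_solar, 6))  # 8.0 0.9
```

The cubic dependence is why wind speed tends to dominate `Total_Output_kW` in this dataset: at 20 m/s the wind term reaches roughly 0.5 kW, while the solar term stays below about 0.25 kW.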

## 3. Quick exploratory overview

Display the first rows and basic statistics.

In [None]:
data.head()

In [None]:
data.describe()

## 4. Train/Test split and feature selection

We use the six weather features to predict `Total_Output_kW` and hold out a randomly selected 20% of the rows as the test set.

In [None]:
features = ['Temperature_C','Humidity_%','Solar_Irradiance_W_m2','Wind_Speed_m_s','Cloud_Cover','Pressure_hPa']
X = data[features]
y = data['Total_Output_kW']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('split done: train size', X_train.shape[0], 'test size', X_test.shape[0])
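A random split is reasonable here because the synthetic rows do not depend on their timestamps. With real hourly data, neighbouring hours are correlated, so a random split can let the model see hours adjacent to every test hour. A common alternative is to hold out the most recent period instead. The helper `chronological_split` below is a hypothetical sketch of that idea, demonstrated on a small stand-in frame that reuses the notebook's column names.

```python
import numpy as np
import pandas as pd

def chronological_split(frame, time_col, test_fraction=0.2):
    """Use the most recent rows as the test set instead of sampling rows at random."""
    frame = frame.sort_values(time_col)
    n_test = int(round(len(frame) * test_fraction))
    cutoff = len(frame) - n_test
    return frame.iloc[:cutoff], frame.iloc[cutoff:]

# Stand-in hourly frame with the notebook's column names (hypothetical values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Datetime': pd.date_range('2025-01-01', periods=1000, freq='h'),
    'Wind_Speed_m_s': rng.uniform(0, 20, 1000),
    'Total_Output_kW': rng.uniform(0, 1, 1000),
})
train_df, test_df = chronological_split(demo, 'Datetime')
print(len(train_df), len(test_df))  # 800 200
```

In the notebook this would be `train_df, test_df = chronological_split(data, 'Datetime')`, followed by `X_train = train_df[features]`, `y_train = train_df['Total_Output_kW']`, and likewise for the test set.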

## 5. Model training

Train Random Forest and Gradient Boosting regressors.

In [None]:
rf = RandomForestRegressor(n_estimators=150, random_state=42)
gb = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=42)
rf.fit(X_train, y_train)
gb.fit(X_train, y_train)
print('models trained')
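One way to check that the trained models pick up the structure of the generated data is to inspect the forest's impurity-based feature importances. In the notebook this is just `pd.Series(rf.feature_importances_, index=features)`. The self-contained sketch below rebuilds noise-free data with the notebook's formulas inside a hypothetical helper `importance_check`. Because wind output grows with the cube of wind speed and humidity and pressure do not enter the formulas at all, one would expect wind speed first, irradiance second, and humidity and pressure near zero.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def importance_check(n=1000, seed=0):
    """Fit a random forest on noise-free data built with the notebook's formulas; return sorted importances."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        'Temperature_C': rng.uniform(0, 45, n),
        'Humidity_%': rng.uniform(10, 100, n),
        'Solar_Irradiance_W_m2': rng.uniform(0, 1100, n),
        'Wind_Speed_m_s': rng.uniform(0, 20, n),
        'Cloud_Cover': rng.uniform(0, 1, n),
        'Pressure_hPa': rng.uniform(980, 1035, n),
    })
    T, c = X['Temperature_C'], X['Cloud_Cover']
    solar = X['Solar_Irradiance_W_m2'] * 0.18 * (1 - 0.005 * (T - 25)) * (1 - 0.4 * c) / 1000
    wind = 0.5 * 1.225 * (1 - 0.0036 * (T - 15)) * X['Wind_Speed_m_s'] ** 3 * 0.0001
    model = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, solar + wind)
    # Impurity-based importances are normalized so that they sum to 1.
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

importances = importance_check()
print(importances.round(3))
```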

## 6. Evaluation

Compute MAE, RMSE, and R² for both models and save the performance table.

In [None]:
def metrics(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)
    # np.sqrt works on all scikit-learn versions; squared=False was deprecated in 1.4 and later removed
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    return mae, rmse, r2

y_pred_rf = rf.predict(X_test)
y_pred_gb = gb.predict(X_test)

rf_mae, rf_rmse, rf_r2 = metrics(y_test, y_pred_rf)
gb_mae, gb_rmse, gb_r2 = metrics(y_test, y_pred_gb)

perf = pd.DataFrame({
    'Model': ['RandomForest', 'GradientBoosting'],
    'MAE_kW': [round(rf_mae,5), round(gb_mae,5)],
    'RMSE_kW': [round(rf_rmse,5), round(gb_rmse,5)],
    'R2': [round(rf_r2,5), round(gb_r2,5)]
})

os.makedirs('results', exist_ok=True)
perf.to_csv('results/model_performance.csv', index=False)
perf
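The three metrics are simple enough to compute by hand. With error $e_i = \hat{y}_i - y_i$, MAE is the mean of $|e_i|$, RMSE is the square root of the mean of $e_i^2$ (both in kW, the unit of the target, hence the `_kW` column names), and R² is $1 - \sum e_i^2 / \sum (y_i - \bar{y})^2$. The sketch below uses a hypothetical `manual_metrics` helper on four made-up values and checks it against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def manual_metrics(y_true, y_pred):
    """MAE, RMSE and R^2 written out directly from their definitions."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

# Made-up actual and predicted outputs in kW.
y_true = np.array([0.10, 0.25, 0.40, 0.55])
y_pred = np.array([0.12, 0.20, 0.43, 0.50])

mae, rmse, r2 = manual_metrics(y_true, y_pred)
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(rmse, np.sqrt(mean_squared_error(y_true, y_pred)))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(f"MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")  # MAE=0.0375 RMSE=0.0397 R2=0.9440
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE and grows faster when a few predictions are far off.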

## 7. Prediction vs Actual plots

Scatter plots of predicted vs actual for both models are saved to `results/`.

In [None]:
plt.figure(figsize=(8,5))
plt.scatter(y_test, y_pred_rf, alpha=0.5)
minv = min(y_test.min(), y_pred_rf.min())
maxv = max(y_test.max(), y_pred_rf.max())
plt.plot([minv, maxv], [minv, maxv], linestyle='--')
plt.xlabel('Actual Total Output (kW)')
plt.ylabel('Predicted Total Output (kW) - RandomForest')
plt.title('Predicted vs Actual (RandomForest)')
plt.grid(True)
plt.tight_layout()
plt.savefig('results/pred_vs_actual_rf.png')
plt.show()

In [None]:
plt.figure(figsize=(8,5))
plt.scatter(y_test, y_pred_gb, alpha=0.5)
minv = min(y_test.min(), y_pred_gb.min())
maxv = max(y_test.max(), y_pred_gb.max())
plt.plot([minv, maxv], [minv, maxv], linestyle='--')
plt.xlabel('Actual Total Output (kW)')
plt.ylabel('Predicted Total Output (kW) - GradientBoosting')
plt.title('Predicted vs Actual (GradientBoosting)')
plt.grid(True)
plt.tight_layout()
plt.savefig('results/pred_vs_actual_gb.png')
plt.show()
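The two plotting cells differ only in the predictions and the model label. A minimal sketch of a shared helper follows; the name `plot_pred_vs_actual` and its `file_tag` parameter are hypothetical, chosen so that the saved file names match the ones used above.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

def plot_pred_vs_actual(y_true, y_pred, model_name, file_tag, out_dir="results"):
    """Scatter predicted vs actual with a y = x reference line; save as PNG and return the path."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(y_true, y_pred, alpha=0.5)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--")  # points on this line are perfect predictions
    ax.set_xlabel("Actual Total Output (kW)")
    ax.set_ylabel(f"Predicted Total Output (kW) - {model_name}")
    ax.set_title(f"Predicted vs Actual ({model_name})")
    ax.grid(True)
    fig.tight_layout()
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"pred_vs_actual_{file_tag}.png")
    fig.savefig(path)
    return path
```

Usage in the notebook: `plot_pred_vs_actual(y_test, y_pred_rf, "RandomForest", "rf")` and `plot_pred_vs_actual(y_test, y_pred_gb, "GradientBoosting", "gb")`.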

## 8. Save models and results

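Both models are saved as joblib files in `results/`, and a CSV comparing actual and predicted output for every test row is written to `results/results_comparison.csv`.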

In [None]:
joblib.dump(rf, 'results/rf_model.joblib')
joblib.dump(gb, 'results/gb_model.joblib')

comparison = pd.DataFrame({
    'Datetime': data.loc[y_test.index, 'Datetime'].values,
    'Actual_kW': y_test.values,
    'Pred_RF_kW': y_pred_rf,
    'Pred_GB_kW': y_pred_gb
}).reset_index(drop=True)

comparison.to_csv('results/results_comparison.csv', index=False)
print('saved models and comparison to results/')
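To reuse a saved model later, `joblib.load` restores the fitted estimator. The sketch below shows the round trip on a small forest trained on made-up data and written to a temporary directory, and checks that the restored model makes identical predictions. Loading a model with a different scikit-learn version from the one that saved it is not guaranteed to work, so it is worth recording the version alongside the files.

```python
import os
import tempfile
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up training data with six features, like the notebook's feature set.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 6))
y = X[:, 2] + X[:, 3] ** 3

model = RandomForestRegressor(n_estimators=20, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_model.joblib")
    joblib.dump(model, path)      # same call as in the cell above
    restored = joblib.load(path)  # fitted estimator, ready to predict

same = np.allclose(model.predict(X), restored.predict(X))
print(same)  # True
```

In the notebook the equivalent is `rf = joblib.load('results/rf_model.joblib')`.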

## 9. Quick sample table and conclusion

The cell below shows the first 20 rows of the comparison table. The conclusion describes next steps for working with real data.

In [None]:
comparison.head(20)

### Conclusion

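This notebook provides an end-to-end pipeline for hybrid renewable energy prediction: data generation, training, evaluation, and export of models and results. To use real sensor or historical data, load it into `data` (for example with `pd.read_csv`) instead of running section 2, because that cell regenerates synthetic data and overwrites `data/hybrid_energy_dataset.csv`. The real data must use the column names listed in section 4; then re-run the remaining cells. For real, time-ordered data, consider a chronological train/test split instead of a random one, so that the test set measures performance on periods the model has not seen. A dashboard (for example with Streamlit) could be added for interactive prediction.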
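A minimal sketch of loading real data in place of the synthetic generation step follows, assuming the real CSV uses the same column names as the generated dataset. The helper `load_energy_csv`, and its choices to drop incomplete rows and sort by time, are hypothetical; the demo writes a two-row stand-in file to a temporary directory so the cell runs on its own.

```python
import os
import tempfile
import pandas as pd

FEATURES = ['Temperature_C', 'Humidity_%', 'Solar_Irradiance_W_m2',
            'Wind_Speed_m_s', 'Cloud_Cover', 'Pressure_hPa']
TARGET = 'Total_Output_kW'

def load_energy_csv(path):
    """Load a dataset with the notebook's column names; fail early if any are missing."""
    df = pd.read_csv(path, parse_dates=['Datetime'])
    missing = [c for c in FEATURES + [TARGET, 'Datetime'] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Drop rows with gaps in the inputs or target, and keep rows in time order.
    df = df.dropna(subset=FEATURES + [TARGET])
    return df.sort_values('Datetime').reset_index(drop=True)

# Two-row stand-in file (hypothetical values), deliberately out of time order.
demo = pd.DataFrame({
    'Datetime': ['2025-01-01 01:00', '2025-01-01 00:00'],
    'Temperature_C': [20.0, 21.0], 'Humidity_%': [50.0, 55.0],
    'Solar_Irradiance_W_m2': [0.0, 10.0], 'Wind_Speed_m_s': [5.0, 6.0],
    'Cloud_Cover': [0.2, 0.3], 'Pressure_hPa': [1010.0, 1011.0],
    'Total_Output_kW': [0.01, 0.02],
})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'real.csv')
    demo.to_csv(path, index=False)
    loaded = load_energy_csv(path)
print(loaded['Datetime'].iloc[0])  # 2025-01-01 00:00:00
```

With a real file, `data = load_energy_csv('path/to/your.csv')` would take the place of the section 2 cell.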