# Introduction

This project aims to predict solar energy production using machine learning models. Historical weather and energy data are imported from CSV files and stored in a MongoDB database to facilitate data access and manipulation. The project is structured into several key steps:

1. **Data Import**: Data is imported from CSV files and stored in MongoDB.
2. **Data Preprocessing**: Data is cleaned and prepared for model training.
3. **Model Training**: Several machine learning models, including Linear Regression, Random Forest, and Support Vector Machine, are trained on the historical data.
4. **Model Prediction**: The models are used to make predictions on test data, and their performance is evaluated using metrics such as RMSE, MSE, and MAPE.
5. **Interactive Dashboard**: An interactive dashboard is created using Dash to visualize the predictions and performance metrics of the models.

The goal is to identify the factors that influence solar energy production and to compare how well different machine learning models forecast it.


In [4]:
import pandas as pd
from pymongo import MongoClient
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np
import math
import dash
from dash import dcc, html, Input, Output
import plotly.graph_objs as go

# Data Import

In this section, we set up the MongoDB connection and import the CSV data into MongoDB. The required libraries were already imported in the cell above, and preprocessing follows in the next section.


In [5]:
# MongoDB setup
client = MongoClient('mongodb://localhost:27017/')
db = client['EnergyData1']
collection_train = db['WeatherAndEnergyDataTrain']
collection_test = db['WeatherAndEnergyDataTest']

# Paths to CSV files
csv_files = {
    "train": "Datasets/Energy_Data_20200920_20231027.csv",
    "test": "Datasets/Energy_Data_20200920_20240118.csv"
}

# Columns to import for each file
columns = ['dtm', 'MIP', 'Solar_MW', 'Solar_capacity_mwp', 'Solar_installedcapacity_mwp', 'SS_Price', 'boa_MWh', 'DA_Price']

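def import_csv_data(file_path, required_columns, collection):
    """Load the required columns from a CSV file and insert them into a MongoDB collection."""
    # Read only the needed columns; pandas raises ValueError if any of them is missing
    df = pd.read_csv(file_path, usecols=required_columns)
    # One dict per row, with the columns in the order given by required_columns
    records = df[required_columns].to_dict('records')
    # Note: insert_many appends, so re-running this cell stores the same rows again
    collection.insert_many(records)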

# Import CSV data for training and testing
import_csv_data(csv_files["train"], columns, collection_train)
import_csv_data(csv_files["test"], columns, collection_test)

print("Data has been successfully imported into MongoDB.")

Data has been successfully imported into MongoDB.
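
Because `insert_many` appends to the collection, running the import cell more than once stores every row multiple times. The following is a minimal sketch of a re-runnable variant, assuming it is acceptable to clear the target collection before each import. `load_records` and `replace_collection` are hypothetical helper names that do not appear in the notebook.

```python
import pandas as pd


def load_records(source, required_columns):
    """Read only the required columns and return one dict per row.

    `source` can be a file path or a file-like object.
    """
    df = pd.read_csv(source, usecols=required_columns)
    return df[required_columns].to_dict('records')


def replace_collection(collection, records):
    """Clear the collection, then insert the records.

    Assumption: a full reload is acceptable, so existing documents are deleted.
    """
    collection.delete_many({})
    if records:  # insert_many rejects an empty list
        collection.insert_many(records)
    return len(records)
```

With the notebook's names, this could be called as `replace_collection(collection_train, load_records(csv_files["train"], columns))`.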


# Data Preprocessing

In this section, we will retrieve the data from MongoDB and preprocess it. This includes handling missing data and displaying basic statistics.


In [6]:

# Retrieve data from MongoDB for EDA
data_train = pd.DataFrame(list(collection_train.find()))
data_test = pd.DataFrame(list(collection_test.find()))

# Drop the MongoDB ID column
data_train.drop('_id', axis=1, inplace=True)
data_test.drop('_id', axis=1, inplace=True)

# Handle missing data by carrying the last valid value forward
# (DataFrame.ffill replaces the deprecated fillna(method='ffill'))
data_train.ffill(inplace=True)
data_test.ffill(inplace=True)

# Display basic statistics
print(data_train.describe())
print(data_test.describe())

# Close MongoDB connection
client.close()


                MIP      Solar_MW  Solar_capacity_mwp  \
count  54384.000000  54384.000000        54384.000000   
mean     129.647321    237.101422         2180.185332   
std       96.188113    382.635087           82.631288   
min      -77.290000      0.000000         2108.431714   
25%       67.500000      0.000000         2118.198318   
50%      104.800000      0.368705         2139.253276   
75%      168.535000    353.564719         2267.405793   
max     1983.660000   1792.289600         2337.607243   

       Solar_installedcapacity_mwp      SS_Price       boa_MWh      DA_Price  
count                 54384.000000  54384.000000  54384.000000  54384.000000  
mean                   2305.999500    130.830663     -1.675239    133.842703  
std                      97.989387    132.215372     22.906842    102.125494  
min                    2206.064655   -185.330000   -599.500000    -51.520000  
25%                    2229.567230     60.000000      0.000000     69.775000  
50%         

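
Forward filling copies the last valid observation into each gap, so a value missing at the very start of a column has nothing to copy from and stays `NaN`. The sketch below illustrates this on a small made-up frame. The values are invented for illustration and are not taken from the project datasets.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration), reusing two of the notebook's column names
toy = pd.DataFrame({
    'Solar_MW': [np.nan, 10.0, np.nan, np.nan, 25.0],
    'DA_Price': [50.0, np.nan, 55.0, np.nan, np.nan],
})

# Carry the last valid value forward within each column
filled = toy.ffill()

# Count the gaps that forward filling could not close (the leading NaN in Solar_MW)
remaining_gaps = filled.isna().sum()

print(filled)
print(remaining_gaps)
```

Checking `isna().sum()` after filling is a cheap way to confirm that no missing values reach the model training step.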


# Model Training

In this section, we will split the data into training and validation sets, scale the features, and train multiple models (Linear Regression, Random Forest, and Support Vector Machine).


In [7]:
features = ['MIP', 'Solar_capacity_mwp', 'Solar_installedcapacity_mwp', 'SS_Price', 'boa_MWh', 'DA_Price']
target = 'Solar_MW'

X_train = data_train[features]
y_train = data_train[target]

# Split the data
X_train_split, X_valid_split, y_train_split, y_valid_split = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_split)
X_valid_scaled = scaler.transform(X_valid_split)

# Train models
models = {
    'lr': LinearRegression(),
    'rf': RandomForestRegressor(),
    'svm': SVR()
}

for model_name, model in models.items():
    model.fit(X_train_scaled, y_train_split)

print("Models have been trained.")


Models have been trained.
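
The scaler is fitted on the training split only, and the same statistics are then reused for the validation and test data. As a rough illustration, the NumPy sketch below reproduces what `StandardScaler` computes with its default settings: a per-feature mean and population standard deviation. The arrays and the `_demo` names are made up for this example.

```python
import numpy as np

# Made-up feature matrices: rows are samples, columns are features
X_train_demo = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_valid_demo = np.array([[2.0, 400.0]])

# "fit": learn the per-feature mean and standard deviation from the training split only
mean_ = X_train_demo.mean(axis=0)
scale_ = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to any split
X_train_demo_scaled = (X_train_demo - mean_) / scale_
X_valid_demo_scaled = (X_valid_demo - mean_) / scale_
```

Reusing the training statistics keeps information from the validation and test data out of the fitted preprocessing step.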


# Model Prediction

In this section, we will prepare the test data, make predictions using the trained models, and calculate evaluation metrics such as RMSE, MSE, and MAPE.


In [8]:
# Prepare test data
X_test = data_test[features]
y_test = data_test[target]

# Scale the test data
X_test_scaled = scaler.transform(X_test)

# Make predictions
predictions = {model_name: model.predict(X_test_scaled) for model_name, model in models.items()}

# Calculate evaluation metrics
results = {}
for model_name, preds in predictions.items():
    mse = mean_squared_error(y_test, preds)
    rmse = math.sqrt(mse)
    # MAPE divides by the actual values, so it evaluates to inf when y_test contains zeros,
    # as it does here
    mape = np.mean(np.abs((y_test - preds) / y_test)) * 100
    results[model_name] = {'RMSE': rmse, 'MSE': mse, 'MAPE': mape}

print("Evaluation metrics calculated:", results)


Evaluation metrics calculated: {'lr': {'RMSE': 379.4843489745361, 'MSE': 144008.3711166275, 'MAPE': inf}, 'rf': {'RMSE': 403.06649233131674, 'MSE': 162462.5972402714, 'MAPE': inf}, 'svm': {'RMSE': 432.491909966763, 'MSE': 187049.25218669864, 'MAPE': inf}}
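
The MAPE values are `inf` because the formula divides by the actual values and `Solar_MW` contains zeros. One possible workaround, sketched below, is to compute MAPE only over samples with a non-zero actual value. `masked_mape` is a hypothetical helper and not part of the notebook, and excluding zero-output samples is an assumption that changes what the metric measures.

```python
import numpy as np


def masked_mape(y_true, y_pred):
    """MAPE in percent over samples whose actual value is non-zero.

    Returns NaN if every actual value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # keep only samples where the division is defined
    if not mask.any():
        return float('nan')
    errors = np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])
    return float(np.mean(errors) * 100)
```

In the notebook this could be called as `masked_mape(y_test, preds)`. Very small non-zero actual values can still inflate the result, so RMSE remains the more robust figure for comparing the models.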


# Dashboard

In this section, we create a Dash dashboard to visualize the predictions and evaluation metrics. A dropdown selects the model, and once the Start button has been clicked the dashboard plots the precomputed predictions against the actual values and shows the metrics as text below the graph. The End button is part of the layout but is not connected to any callback in the code below.


In [9]:
# Define the Dash app
app = dash.Dash(__name__)

# Define the layout of the dashboard
app.layout = html.Div([
    html.Div([
        html.Button('Start', id='start-button', n_clicks=0, style={'marginRight': '10px'}),  # Dash style keys are camelCased
        html.Button('End', id='end-button', n_clicks=0),
    ], style={'textAlign': 'left', 'margin': '20px'}),
    
    html.Div([
        dcc.Dropdown(
            id='model-selector',
            options=[
                {'label': 'Linear Regression', 'value': 'lr'},
                {'label': 'Random Forest', 'value': 'rf'},
                {'label': 'Support Vector Machine', 'value': 'svm'}
            ],
            value='lr',
            style={'width': '50%', 'display': 'inline-block'}
        ),
        dcc.Graph(id='prediction-graph', style={'width': '100%', 'display': 'inline-block'})
    ], style={'textAlign': 'right', 'margin': '20px'}),
    
    html.Div([
        html.Div(id='rmse-display', style={'margin': '10px'}),
        html.Div(id='mse-display', style={'margin': '10px'}),
        html.Div(id='mape-display', style={'margin': '10px'})
    ])
], style={'fontFamily': 'Arial', 'padding': '10px'})

@app.callback(
    [Output('prediction-graph', 'figure'),
     Output('rmse-display', 'children'),
     Output('mse-display', 'children'),
     Output('mape-display', 'children')],
    [Input('start-button', 'n_clicks'),
     Input('model-selector', 'value')]
)
def update_output(n_clicks, model_type):
    if n_clicks > 0:
        # Make predictions using the selected model
        preds = predictions[model_type]
        
        # Plotting
        trace = go.Scatter(x=y_test.index, y=preds, mode='lines+markers', name='Predicted')
        trace_actual = go.Scatter(x=y_test.index, y=y_test, mode='lines', name='Actual')
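        # The x-axis uses the row index of the test data, which stands in for time order
        layout = go.Layout(title='Solar Energy Production Forecast', xaxis_title='Time step (row index)', yaxis_title='Solar generation (MW)')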
        fig = go.Figure(data=[trace, trace_actual], layout=layout)
        
        # Retrieve evaluation metrics
        metrics = results[model_type]
        
        return fig, f"RMSE: {metrics['RMSE']}", f"MSE: {metrics['MSE']}", f"MAPE: {metrics['MAPE']}"
    return go.Figure(), '', '', ''

if __name__ == '__main__':
    print("Dashboard is running at http://127.0.0.1:8051")
    # app.run replaces the deprecated app.run_server in newer Dash releases
    app.run(debug=True, port=8051)


Dashboard is running at http://127.0.0.1:8051
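
The callback prints the metrics at full floating-point precision and would display `MAPE: inf` for every model. A small formatting helper could make the text panels easier to read. The sketch below is one way to do this. `build_metric_labels` is a hypothetical function that does not exist in the notebook, and showing non-finite values as `n/a` is an assumption about the desired display.

```python
import math


def build_metric_labels(metrics, decimals=2):
    """Return display strings such as 'RMSE: 379.48'.

    Non-finite values (inf or NaN) are shown as 'n/a'.
    """
    labels = []
    for name in ('RMSE', 'MSE', 'MAPE'):
        value = metrics[name]
        if math.isfinite(value):
            labels.append(f"{name}: {value:.{decimals}f}")
        else:
            labels.append(f"{name}: n/a")
    return labels
```

Inside `update_output`, the three strings could be unpacked into the return value, for example `rmse_text, mse_text, mape_text = build_metric_labels(results[model_type])`.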
