In [None]:
from google.colab import drive
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# connect to google drive
drive.mount('/content/drive')

# Load your dataset
df = pd.read_csv('/content/drive/MyDrive/Final_Models/PV_Prediction_Data.csv')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Parse 'Time'; the raw strings hold day, month, hour and minute but no year
df['Time'] = pd.to_datetime(df['Time'], format='%d.%m. %H:%M')
# Without a year the parser defaults to 1900, so pin every timestamp to 2022
df['Time'] = df['Time'].apply(lambda x: x.replace(year=2022))
# Sort chronologically, then restore 'Time' as the first column with a fresh integer index
df = df.set_index('Time').sort_index().reset_index()

In [None]:
df.head()

Unnamed: 0,Time,Irradiance onto horizontal plane,Outside Temperature,Grid Feed-in,Energy from Grid,Consumption,Own Consumption,Global radiation - horizontal,Deviation from standard spectrum,Ground Reflection (Albedo),Orientation and inclination of the module surface,Shading,Reflection on the Module Interface,Irradiance on the rear side of the module,Global Radiation at the Module,Global PV Radiation,STC Conversion (Rated Efficiency of Module),PV energy (DC)
0,2022-01-01 00:00:00,0.0,-1.3,0.0,5.12,5,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0
1,2022-01-01 01:00:00,0.0,0.0,0.0,0.12,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0
2,2022-01-01 02:00:00,0.0,1.1,0.0,0.12,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0
3,2022-01-01 03:00:00,0.0,1.0,0.0,0.12,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0
4,2022-01-01 04:00:00,0.0,1.0,0.0,0.12,0,0.0,0.0,0.0,0.0,0.0,0,0.0,0,0.0,0.0,0.0,0.0


In [None]:
df.columns

Index(['Time', 'Irradiance onto horizontal plane ', 'Outside Temperature ',
       'Grid Feed-in ', 'Energy from Grid ', 'Consumption ',
       'Own Consumption ', 'Global radiation - horizontal ',
       'Deviation from standard spectrum ', 'Ground Reflection (Albedo) ',
       'Orientation and inclination of the module surface ', 'Shading ',
       'Reflection on the Module Interface ',
       'Irradiance on the rear side of the module ',
       'Global Radiation at the Module ', 'Global PV Radiation ',
       'STC Conversion (Rated Efficiency of Module) ', 'PV energy (DC) '],
      dtype='object')

In [None]:
# General Information
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 18 columns):
 #   Column                                              Non-Null Count  Dtype         
---  ------                                              --------------  -----         
 0   Time                                                8760 non-null   datetime64[ns]
 1   Irradiance onto horizontal plane                    8760 non-null   float64       
 2   Outside Temperature                                 8760 non-null   float64       
 3   Grid Feed-in                                        8760 non-null   float64       
 4   Energy from Grid                                    8760 non-null   float64       
 5   Consumption                                         8760 non-null   int64         
 6   Own Consumption                                     8760 non-null   float64       
 7   Global radiation - horizontal                       8760 non-null   float64       
 8   Deviatio

In [None]:
# Check for missing values
print(df.isnull().sum())

Time                                                  0
Irradiance onto horizontal plane                      0
Outside Temperature                                   0
Grid Feed-in                                          0
Energy from Grid                                      0
Consumption                                           0
Own Consumption                                       0
Global radiation - horizontal                         0
Deviation from standard spectrum                      0
Ground Reflection (Albedo)                            0
Orientation and inclination of the module surface     0
Shading                                               0
Reflection on the Module Interface                    0
Irradiance on the rear side of the module             0
Global Radiation at the Module                        0
Global PV Radiation                                   0
STC Conversion (Rated Efficiency of Module)           0
PV energy (DC)                                  

## **Selected Features and Importance**
1. **Irradiance onto horizontal plane:**
This is the power of sunlight received per unit area on a flat surface. It directly affects the potential energy the PV module can generate. A higher irradiance typically leads to higher energy output.

2. **Outside Temperature:**
Solar panels tend to have a decrease in performance as temperature increases. As the temperature rises, the efficiency of a solar cell drops, thereby affecting the energy output.

3. **Global radiation - horizontal:**
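This is the total solar radiation received on a horizontal surface: the direct beam from the sun plus diffuse radiation scattered by the sky. It measures the total energy potential available for conversion. In this dataset its summary statistics match those of Irradiance onto horizontal plane, so the two columns appear to carry the same information.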

4. **Deviation from standard spectrum:**
The solar spectrum changes based on time of day, weather, and location. The "standard" spectrum is used to rate solar panels under lab conditions. Deviation from this standard can affect how efficiently a panel converts sunlight to electricity.

5. **Ground Reflection (Albedo):**
Albedo measures the reflectivity of the ground. Sunlight reflected by the ground can reach tilted modules, and the rear side of elevated bifacial modules, which can contribute additional energy.

6. **Orientation and inclination of the module surface:**
The angle and direction at which the panel is oriented play crucial roles. Panels facing directly towards the sun and at the correct tilt can capture more sunlight, leading to higher energy outputs.

7. **Shading:**
Shading can significantly reduce the energy output of PV modules. Even a small amount of shade on a part of a panel can reduce its output because of how cells are interconnected.

8. **Reflection on the Module Interface:**
Some sunlight can be reflected off the surface of the solar panels without being absorbed. This reflected sunlight won't contribute to energy generation.

9. **Irradiance on the rear side of the module:**
For bifacial solar panels, which can capture sunlight on both sides, the irradiance on the rear side can contribute to additional energy output.

10. **Global Radiation at the Module:**
This provides a measure of the total solar radiation directly received by the PV module, which is a primary factor in its energy output.

11. **Global PV Radiation:**
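This appears to be the radiation reaching the whole array rather than a per-unit-area value: in the summary statistics its mean and maximum are roughly 188.5 times those of Global Radiation at the Module, which is consistent with scaling by the module area. This reading is inferred from the data, not taken from a documented definition.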

12. **STC Conversion (Rated Efficiency of Module):**
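STC (Standard Test Conditions) efficiency is a standardized measure of how well a module converts sunlight to electricity; a higher rated efficiency means more output under the reference conditions. In this dataset the column is never positive (it ranges from -136.9 to 0), which suggests it records the energy removed at the conversion step rather than the efficiency itself.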
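
Before modelling, it may be worth checking which of these columns actually vary and which repeat each other. The sketch below is only an illustration: the toy frame, its values, and the shortened column names are hypothetical stand-ins for the real CSV, and the two helper functions are not part of the original notebook.

```python
import pandas as pd

def constant_columns(frame: pd.DataFrame) -> list:
    """Return the columns that hold a single distinct value."""
    return [c for c in frame.columns if frame[c].nunique(dropna=False) <= 1]

def duplicate_columns(frame: pd.DataFrame) -> list:
    """Return (first, second) pairs of columns whose values are identical."""
    pairs = []
    cols = list(frame.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if frame[a].equals(frame[b]):
                pairs.append((a, b))
    return pairs

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "irradiance": [0.0, 0.2, 0.5, 0.1],
    "global_horizontal": [0.0, 0.2, 0.5, 0.1],  # same values as 'irradiance'
    "shading": [0.0, 0.0, 0.0, 0.0],            # never changes
    "temperature": [-1.3, 4.0, 9.5, 2.1],
})

print(constant_columns(toy))
print(duplicate_columns(toy))
```

Applied to the selected columns of `df`, the same helpers would list any feature that cannot help a model (constant) or that duplicates another one.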

In [None]:
import plotly.express as px

# Create the line chart using plotly express
fig = px.line(df, x='Time', y='PV energy (DC) ', title='Time Series Plot of PV energy (DC)')

# Display the plot
fig.show()

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# List of selected features
features = [
    "Irradiance onto horizontal plane ", "Outside Temperature ",
    "Global radiation - horizontal ", "Deviation from standard spectrum ",
    "Ground Reflection (Albedo) ", "Orientation and inclination of the module surface ",
    "Shading ", "Reflection on the Module Interface ",
    "Irradiance on the rear side of the module ", "Global Radiation at the Module ",
    "Global PV Radiation ", "STC Conversion (Rated Efficiency of Module) "
]


# Lay the histograms out on a two-column grid
cols = 2
rows = -(-len(features) // cols)  # ceiling division
fig = make_subplots(rows=rows, cols=cols, subplot_titles=features)

for index, feature in enumerate(features):
    # Convert the flat index into 0-based grid coordinates; plotly expects 1-based
    row, col = divmod(index, cols)
    fig.add_trace(go.Histogram(x=df[feature], name=feature), row=row + 1, col=col + 1)

# Adjusting size and layout
fig.update_layout(
    showlegend=False,
    title_text="Histograms of Selected Features",
    bargap=0.05,
    width=1200,  # Adjust width as per preference
    height=400 * rows  # Height adjusted based on the number of rows
)

fig.show()


In [None]:
df.describe()

Unnamed: 0,Irradiance onto horizontal plane,Outside Temperature,Grid Feed-in,Energy from Grid,Consumption,Own Consumption,Global radiation - horizontal,Deviation from standard spectrum,Ground Reflection (Albedo),Orientation and inclination of the module surface,Shading,Reflection on the Module Interface,Irradiance on the rear side of the module,Global Radiation at the Module,Global PV Radiation,STC Conversion (Rated Efficiency of Module),PV energy (DC)
count,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0
mean,0.118953,10.922797,2.811116,2.015394,3.163242,1.199357,0.118953,-0.00119,0.000179,-0.001324,0.0,-0.000161,0.0,0.116457,21.954824,-17.652038,4.109165
std,0.195369,8.022307,5.544991,2.349716,2.410555,2.001757,0.195369,0.001954,0.000294,0.002762,0.0,0.00026,0.0,0.190929,35.994465,28.94015,6.754618
min,0.0,-8.8,0.0,0.0,0.0,0.0,0.0,-0.00925,0.0,-0.013695,0.0,-0.001319,0.0,0.0,0.0,-136.9,0.0
25%,0.0,4.3,0.0,0.0,0.0,0.0,0.0,-0.0016,0.0,-0.001498,0.0,-0.00025,0.0,0.0,0.0,-23.78775,0.0
50%,0.004,10.7,0.0,0.12,5.0,0.0,0.004,-4e-05,6e-06,0.0,0.0,-4e-06,0.0,0.003932,0.74127,-0.59599,0.0
75%,0.16,17.3,2.69735,5.04,5.0,1.9579,0.16,0.0,0.000241,0.0,0.0,0.0,0.0,0.15694,29.58625,0.0,5.654925
max,0.925,35.9,29.221,5.12,5.0,5.0,0.925,0.0,0.001391,0.018889,0.0,0.0,0.0,0.90316,170.27,0.0,30.64


## **MODEL SELECTION BASED ON THE ANALYSIS**

**1. Linear Regression:**
The data has multiple features that might have linear relationships with the target variable "PV energy (DC)". Linear regression can model the impact of these features on the output. Given the straightforwardness and interpretability of linear regression, it's commonly chosen as a baseline model in many predictive tasks.
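
A minimal sketch of such a baseline, assuming synthetic data in place of the real features and an 80/20 chronological split (both are assumptions made here, not choices taken from the notebook). It fits ordinary least squares with NumPy and reports the error on the held-out tail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: two features and a noisy linear target
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=n)

# Chronological split: fit on the first 80%, evaluate on the last 20% (no shuffling)
split = int(0.8 * n)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

def add_intercept(A):
    """Prepend a column of ones so the first coefficient acts as the intercept."""
    return np.column_stack([np.ones(len(A)), A])

# Ordinary least squares via NumPy's least-squares solver
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

pred = add_intercept(X_test) @ coef
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
print("intercept and slopes:", np.round(coef, 2))
print("test RMSE:", round(rmse, 3))
```

Keeping the split in time order avoids evaluating on hours that precede the training data, which matters for a time series like this one.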

**2. Lasso Regression (L1 Regularization):**
The data has several features, some of which might not be significant predictors. Lasso regression adds L1 regularization, which can shrink the coefficients of less important features exactly to zero. This effectively performs feature selection and can lead to a simpler, more interpretable model. When features are strongly correlated, Lasso tends to keep one of the group and drop the rest, so the choice among correlated features can be somewhat arbitrary.
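
To illustrate how the L1 penalty zeroes out coefficients, here is a small coordinate-descent sketch in NumPy. It is not a production solver: the synthetic data, the penalty strength `alpha=0.1`, and the function names are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink a value toward zero and clip it to zero inside the threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 one coefficient at a time."""
    n, p = X.shape
    w = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_scale[j]
    return w

# Synthetic data: only the first two of four features influence the target
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Centre the data so no intercept term is needed
X = X - X.mean(axis=0)
y = y - y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.1)
print(np.round(w, 3))
```

The two irrelevant features end up with coefficients of exactly zero, while the two informative ones are kept but shrunk slightly toward zero.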

**3. Ridge Regression (L2 Regularization):**
Like Lasso, Ridge regression introduces regularization, but it tends to distribute the importance more uniformly across features rather than zeroing them out. It's especially useful when there's multicollinearity in the data. Given the number of features and potential inter-correlations, Ridge can produce a more stable model than plain Linear Regression.
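
The stabilising effect under multicollinearity can be sketched with the closed-form ridge solution. The two nearly identical features below are synthetic, and the penalty `lam=1.0` is an arbitrary value chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two nearly identical synthetic features, mimicking strongly correlated inputs
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^-1 X^T y; lam=0 gives OLS.

    No intercept is fitted, which is adequate here because the data are
    centred around zero by construction.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

w_ols = ridge_fit(X, y, lam=0.0)
w_ridge = ridge_fit(X, y, lam=1.0)

print("OLS coefficients:  ", np.round(w_ols, 2))
print("ridge coefficients:", np.round(w_ridge, 2))
```

With correlated inputs the unpenalised coefficients can swing far from the true values of 1 and 1, while the ridge coefficients stay close to each other and sum to roughly 2.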

**4. Decision Trees:**
Decision trees are non-linear models that can capture complex patterns and interactions between features. Given features like "Outside Temperature", which might have non-linear relationships with "PV energy (DC)", decision trees can be effective. They also provide intuitive visualizations and rules for interpretation.
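
The core step of a regression tree is choosing a split threshold. The sketch below searches a single feature for the threshold that minimises the summed squared error; the step-shaped toy data and the function name `best_split` are hypothetical.

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on one feature that minimises the summed squared error."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid threshold between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_threshold = (x_sorted[i] + x_sorted[i - 1]) / 2
    return best_threshold, best_sse

# Hypothetical step-shaped relationship that a straight line cannot follow
x = np.array([1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0])
y = np.array([0.0, 0.1, 0.0, 0.1, 5.0, 5.1, 5.0, 5.1])

threshold, sse = best_split(x, y)
print(threshold, round(sse, 3))
```

A full tree repeats this search recursively on each side of the split and across all features, which is how it captures non-linear patterns and interactions.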

**5. Random Forest:**
Random Forest is an ensemble of decision trees. It combines predictions from multiple trees to produce a more robust and accurate model. Given the potential complexities in the dataset and the power of ensemble methods, Random Forest can handle both linearity and non-linearity in the data, provide feature importance rankings, and generally offer higher accuracy than a single decision tree.
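
The averaging idea can be sketched with bootstrap aggregation of one-split trees. This shows only the bagging half of a random forest; the per-split feature subsampling and deeper trees of a real forest are omitted, and the data, the ensemble size of 25, and the helper names are assumptions for illustration.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    best = (np.inf, x.min(), y.mean(), y.mean())  # (sse, threshold, left, right)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    """Predict the left or right mean depending on the side of the threshold."""
    threshold, left_mean, right_mean = stump
    return np.where(x <= threshold, left_mean, right_mean)

# Synthetic data with a jump at x = 5
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = np.where(x > 5, 4.0, 1.0) + rng.normal(scale=0.5, size=200)

# Bagging: fit each stump on a bootstrap sample (drawn with replacement)
stumps = []
for _ in range(25):
    idx = rng.integers(0, len(x), size=len(x))
    stumps.append(fit_stump(x[idx], y[idx]))

# The ensemble prediction is the average over all stumps
x_new = np.array([2.0, 8.0])
ensemble_pred = np.mean([predict_stump(s, x_new) for s in stumps], axis=0)
print(np.round(ensemble_pred, 2))
```

Each stump sees a slightly different resample of the data, so averaging their predictions smooths out the quirks of any single fit.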

**6. Neural Network:**
Neural networks, especially deep ones, are highly flexible and can model intricate patterns in the data. If the relationships between features and the target variable are complex and can't be easily captured by the other models, a neural network can be employed. The multiple features and potential interactions in this dataset might be well-suited for a neural network, provided there's enough data to train it without overfitting.
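
As a rough illustration of that flexibility, the sketch below trains a one-hidden-layer network with plain gradient descent in NumPy. The sine-shaped target, the 16 hidden units, the learning rate, and the step count are all assumptions chosen for the example, not settings from the notebook.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical non-linear target: y = sin(2x) on a small grid
X = np.linspace(-2, 2, 100).reshape(-1, 1)
y = np.sin(2 * X)

hidden = 16
W1 = rng.normal(scale=0.5, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, 1))
b2 = np.zeros(1)

learning_rate = 0.02
losses = []
for step in range(3000):
    # Forward pass: tanh hidden layer followed by a linear output
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    losses.append(float(np.mean(err ** 2)))

    # Backward pass: gradients of the mean squared error
    grad_pred = 2 * err / len(X)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_z = (grad_pred @ W2.T) * (1 - h ** 2)
    grad_W1 = X.T @ grad_z
    grad_b1 = grad_z.sum(axis=0)

    # Gradient descent update
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print("initial loss:", round(losses[0], 4))
print("final loss:  ", round(losses[-1], 4))
```

On real data the loss should be tracked on a held-out set as well, since a falling training loss alone does not rule out overfitting.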