
# Photovoltaic Energy Output Predictor for the Shagaya Renewable Energy Park

## The Challenge
As the world transitions toward renewable energy, Kuwait, an oil-rich nation, remains heavily reliant on non-renewable sources for electricity: a staggering 99.6% of its electricity comes from them. Only 0.4% is derived from renewable energy, specifically from the Shagaya Renewable Energy Park, which integrates three renewable technologies: Concentrated Solar Power (CSP), wind, and photovoltaic (PV) systems.

This project focuses solely on PV because Kuwait's high solar irradiance gives it more direct sunlight hours than the global average. The park's PV plant uses two panel technologies: polycrystalline panels rated at 315 W each (18,820 JKM315PP-72 panels from Jinko Solar) and thin-film panels rated at 160 W each (34,560 SF160-S panels from Solar Frontier).
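The panel counts and per-panel ratings above imply the combined nameplate rating of each PV array. The quick check below assumes nameplate capacity is simply panel count × rated power per panel (it ignores inverter limits, temperature derating, and other losses). One hour at full rated power would correspond to roughly 5,928 kWh (polycrystalline) and 5,530 kWh (thin film), which can serve as a rough sanity bound when interpreting hourly kWh values.

```python
# Panel counts and rated power per panel, as stated in the text above
panels = {
    "polycrystalline (Jinko Solar JKM315PP-72)": (18_820, 315),  # (count, W per panel)
    "thin film (Solar Frontier SF160-S)": (34_560, 160),
}

def nameplate_capacity_mw(count: int, watts_per_panel: float) -> float:
    """Total nameplate rating in MW: panel count x rated watts, converted from W to MW."""
    return count * watts_per_panel / 1e6

capacities = {name: nameplate_capacity_mw(n, w) for name, (n, w) in panels.items()}
for name, mw in capacities.items():
    print(f"{name}: {mw:.2f} MW")
print(f"Total: {sum(capacities.values()):.2f} MW")
```

This prints about 5.93 MW for the polycrystalline array, 5.53 MW for the thin-film array, and 11.46 MW in total.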

The primary goal of this project is to develop a predictive model for PV energy output based on forecasted weather data.

## The Predictive Analytical Process
1. **Problem Understanding and Definition:** The project aims to create a machine learning model that forecasts PV energy output in kWh on an hourly basis using weather data.
2. **Data Collection and Preparation:**


# Problem Understanding and Definition
The aim of this project is to develop a machine learning model that predicts photovoltaic (PV) energy output from forecasted weather data. It is framed as a regression problem: the model outputs the energy produced, in kilowatt-hours (kWh), for each hour.
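As a minimal sketch of this regression framing, each hourly row supplies the numeric weather and solar-geometry columns as features and the two PV output columns (`thin_film`, `polycrystalline`, as named later in this notebook) as targets. The helper names and the ordinary-least-squares baseline are assumptions chosen for illustration, not necessarily the model this project ends up using, and the toy rows are synthetic values, not data from the Shagaya dataset.

```python
import numpy as np
import pandas as pd

# Target columns: hourly energy output for each PV technology (names used in this notebook)
TARGETS = ["thin_film", "polycrystalline"]

def split_features_targets(df: pd.DataFrame):
    """Hypothetical helper: numeric weather columns -> X, PV output columns -> y."""
    # Drop the targets and the day/time identifiers, then keep only numeric features
    X = df.drop(columns=TARGETS + ["day", "time"], errors="ignore").select_dtypes("number")
    y = df[TARGETS]
    return X, y

def _design_matrix(X: pd.DataFrame) -> np.ndarray:
    """Prepend a column of ones so the linear model has an intercept."""
    return np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])

def fit_linear_baseline(X: pd.DataFrame, y: pd.DataFrame) -> np.ndarray:
    """Ordinary least squares; returns one coefficient column per target."""
    coef, *_ = np.linalg.lstsq(_design_matrix(X), y.to_numpy(dtype=float), rcond=None)
    return coef

def predict(coef: np.ndarray, X: pd.DataFrame) -> np.ndarray:
    """Apply the fitted coefficients to new feature rows."""
    return _design_matrix(X) @ coef

# Toy usage with synthetic values: outputs here are exactly linear in irradiation
toy = pd.DataFrame({
    "day": [1, 1, 1, 1],
    "time": ["09:30:00", "10:30:00", "11:30:00", "12:30:00"],
    "global_horizon_irradiation": [200, 400, 600, 800],
    "air_temp": [15.0, 18.0, 20.0, 21.0],
    "thin_film": [100.0, 200.0, 300.0, 400.0],
    "polycrystalline": [110.0, 220.0, 330.0, 440.0],
})
X, y = split_features_targets(toy)
coef = fit_linear_baseline(X, y)
print(predict(coef, X).round(1))
```

Because the synthetic targets are exact linear functions of irradiation, the baseline reproduces them; on real hourly data a linear model is only a reference point for more flexible regressors.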

# Data Collection and Preparation

In [None]:
# For data manipulation and analysis
import pandas as pd

# For numerical operations
import numpy as np

# For creating visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# For regular expressions
import re

# Suppress all warnings
import warnings
warnings.filterwarnings("ignore")

## 2.1 Load the Data

In [None]:
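# Mount Google Drive (Colab) and load the hourly PV dataset
from google.colab import drive
drive.mount("/content/drive")
energy = pd.read_csv("/content/drive/MyDrive/Data Science Project/PV Energy Predictor/pv_energy_output.csv")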

## 2.2 Inspect the Data

In [None]:
# Column information
energy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 15 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Day                                  8760 non-null   int64  
 1   Time                                 8760 non-null   object 
 2   Global Horizon Irradiation           8760 non-null   int64  
 3   Direct Normal Irradiation            8760 non-null   int64  
 4   Diffuse Horizonal Irradiation        8760 non-null   int64  
 5   Sun Azimuth Angle                    8760 non-null   float64
 6   Sun Elevation Angle                  8760 non-null   float64
 7   Air Temp                             8760 non-null   float64
 8   Relative Humidty                     8760 non-null   float64
 9   Wind Speed                           8760 non-null   float64
 10  Wind Direction                       8760 non-null   int64  
 11  Atmospheric Pressure          

In [None]:
# Rename columns to snake_case (the list must follow the CSV's column order)
energy.columns = ["day","time","global_horizon_irradiation",
                  "direct_normal_irradiation","diffuse_horizonal_irradiation",
                  "sun_azimuth_angle","sun_elevation_angle","air_temp","relative_humidity",
                  "wind_speed","wind_direction","atmospheric_pressure","wet_bulb_temperature",
                  "thin_film","polycrystalline"]

In [None]:
# Count missing values per column
energy.isnull().sum()

day                               0
time                              0
global_horizon_irradiation        0
direct_normal_irradiation         0
diffuse_horizonal_irradiation     0
sun_azimuth_angle                 0
sun_elevation_angle               0
air_temp                          0
relative_humidity                 0
wind_speed                        0
wind_direction                    0
atmospheric_pressure              0
wet_bulb_temperature              0
thin_film                        47
polycrystalline                  47
dtype: int64

In [None]:
# Statistical description
energy.describe()

| | day | global_horizon_irradiation | direct_normal_irradiation | diffuse_horizonal_irradiation | sun_azimuth_angle | sun_elevation_angle | air_temp | relative_humidity | wind_speed | wind_direction | atmospheric_pressure | wet_bulb_temperature | thin_film | polycrystalline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8760.0 | 8713.0 | 8713.0 |
| mean | 183.0 | 238.053311 | 212.001712 | 97.747489 | 0.419943 | 0.311655 | 25.840765 | 21.489098 | 3.956336 | 168.105365 | 984.440217 | 14.746062 | 1057.931367 | 1111.775967 |
| std | 105.372043 | 319.370489 | 275.231914 | 121.229947 | 95.112868 | 41.126024 | 10.980324 | 16.154036 | 1.836954 | 86.982807 | 7.378646 | 6.654681 | 1784.897597 | 1933.863497 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | -167.9 | -81.9 | 0.0 | 2.4 | 0.1 | 0.0 | 970.6 | -4.5 | 0.0 | 0.0 |
| 25% | 92.0 | 0.0 | 0.0 | 0.0 | -89.525 | -34.1 | 17.0 | 9.2 | 2.6 | 113.0 | 977.9 | 9.9 | 0.0 | 0.0 |
| 50% | 183.0 | 10.0 | 5.0 | 9.0 | -1.8 | 0.95 | 26.4 | 16.2 | 3.7 | 136.0 | 985.3 | 14.9 | 0.0 | 4.0 |
| 75% | 274.0 | 479.0 | 461.0 | 203.0 | 88.825 | 34.825 | 33.7 | 29.4 | 5.1 | 238.0 | 990.5 | 19.8 | 2092.0 | 2176.0 |
| max | 365.0 | 1037.0 | 953.0 | 474.0 | 163.8 | 81.5 | 50.1 | 91.1 | 11.9 | 359.0 | 1000.6 | 30.3 | 29984.0 | 33104.0 |


In [None]:
# Top 5 rows
energy.head()

| | day | time | global_horizon_irradiation | direct_normal_irradiation | diffuse_horizonal_irradiation | sun_azimuth_angle | sun_elevation_angle | air_temp | relative_humidity | wind_speed | wind_direction | atmospheric_pressure | wet_bulb_temperature | thin_film | polycrystalline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 00:30:00 | 0 | 0 | 0 | -97.1 | -79.5 | 9.1 | 86.4 | 3.3 | 141 | 992.2 | 8.4 | 0.0 | 0.0 |
| 1 | 1 | 01:30:00 | 0 | 0 | 0 | -101.1 | -67.8 | 8.8 | 87.0 | 3.0 | 137 | 992.1 | 8.2 | 0.0 | 0.0 |
| 2 | 1 | 02:30:00 | 0 | 0 | 0 | -90.9 | -54.8 | 8.7 | 87.6 | 2.4 | 124 | 992.0 | 8.0 | 0.0 | 0.0 |
| 3 | 1 | 03:30:00 | 0 | 0 | 0 | -84.3 | -41.8 | 7.9 | 20.9 | 2.6 | 135 | 995.5 | 0.9 | 0.0 | 0.0 |
| 4 | 1 | 04:30:00 | 0 | 0 | 0 | -78.3 | -28.9 | 7.4 | 20.2 | 2.7 | 144 | 995.6 | 0.3 | 0.0 | 0.0 |


In [None]:
# Bottom 5 rows
energy.tail()

| | day | time | global_horizon_irradiation | direct_normal_irradiation | diffuse_horizonal_irradiation | sun_azimuth_angle | sun_elevation_angle | air_temp | relative_humidity | wind_speed | wind_direction | atmospheric_pressure | wet_bulb_temperature | thin_film | polycrystalline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8755 | 365 | 19:30:00 | 0 | 0 | 0 | 79.1 | -31.1 | 11.1 | 74.8 | 4.8 | 132 | 991.7 | 9.7 | 0.0 | 0.0 |
| 8756 | 365 | 20:30:00 | 0 | 0 | 0 | 85.2 | -44.1 | 10.3 | 80.2 | 3.9 | 139 | 992.1 | 9.2 | 0.0 | 0.0 |
| 8757 | 365 | 21:30:00 | 0 | 0 | 0 | 92.3 | -57.1 | 9.8 | 84.0 | 3.9 | 139 | 992.3 | 8.9 | 0.0 | 0.0 |
| 8758 | 365 | 22:30:00 | 0 | 0 | 0 | 103.9 | -70.1 | 9.5 | 84.9 | 3.7 | 139 | 992.3 | 8.7 | 0.0 | 0.0 |
| 8759 | 365 | 23:30:00 | 0 | 0 | 0 | 133.3 | -81.0 | 9.3 | 85.7 | 3.3 | 141 | 992.2 | 8.5 | 0.0 | 0.0 |


## 2.3 Dealing with Missing Values

In [None]:
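# PV panels produce no energy when the sun is below the horizon (elevation < 0),
# so set both output columns to 0 for those hours; this also fills missing nighttime values
energy.loc[energy["sun_elevation_angle"] < 0, ["thin_film", "polycrystalline"]] = 0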
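The nighttime rule relies on the fact that panels produce nothing while the sun is below the horizon, so it can only resolve missing targets at night. The self-contained sketch below, assuming the column names used in this notebook, applies the same rule to a copy of the data and then counts how many missing targets remain; those fall in daylight hours and would need a separate strategy (for example interpolation or dropping the rows). The helper names are hypothetical, the first two elevation values are taken from the `head()` output, and the NaNs and daytime rows are made up for illustration.

```python
import numpy as np
import pandas as pd

TARGETS = ["thin_film", "polycrystalline"]

def zero_night_output(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with PV output set to 0 whenever the sun is below the horizon."""
    out = df.copy()
    night = out["sun_elevation_angle"] < 0
    out.loc[night, TARGETS] = 0
    return out

def remaining_missing(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count NaNs left in each target column (these are daylight gaps)."""
    return df[TARGETS].isna().sum()

# Toy rows: two night hours (sun below the horizon) and two daylight hours
toy = pd.DataFrame({
    "sun_elevation_angle": [-79.5, -67.8, 10.2, 25.0],
    "thin_film": [np.nan, 0.0, np.nan, 850.0],
    "polycrystalline": [0.0, np.nan, 900.0, np.nan],
})
filled = zero_night_output(toy)
print(filled)
print(remaining_missing(filled))
```

Working on a copy keeps the original frame unchanged, which makes it easy to compare missing-value counts before and after the rule.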