# Solar Power Generation Data: Study
## Solar power generation and sensor data for two power plants.

Description
This data was gathered at two solar power plants in India over a 34-day period. It consists of two pairs of files: each pair has one power generation dataset and one sensor readings dataset. The power generation datasets are gathered at the inverter level, where each inverter has multiple lines of solar panels attached to it. The sensor data is gathered at the plant level, from a single array of sensors optimally placed at the plant.

There are a few areas of concern at the solar power plant:

- Can we predict the power generation for the next couple of days? This allows for better grid management.
- Can we identify generation profiles?
- Can we identify the need for panel cleaning/maintenance?
- Can we identify faulty or suboptimally performing equipment?

[Link to source](https://www.kaggle.com/anikannal/solar-power-generation-data)

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
dfGen01 = pd.read_csv("/home/zau/Desktop/UFRN_S/TAP3_2021.1/data/source/Plant_1_Generation_Data.csv", index_col="SOURCE_KEY")
dfGen02 = pd.read_csv("/home/zau/Desktop/UFRN_S/TAP3_2021.1/data/source/Plant_2_Generation_Data.csv", index_col="SOURCE_KEY")

## Removing the PLANT_ID column (contains the same value throughout each dataset)

In [None]:
del dfGen01['PLANT_ID']
del dfGen02['PLANT_ID']
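
The claim that a column holds a single value can be checked before dropping it. The following is a minimal sketch on a toy frame, not the notebook's actual data: the `DC_POWER` column, the `inv_a`/`inv_b` keys, and all values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for a generation dataset (all values are hypothetical).
df = pd.DataFrame(
    {
        "PLANT_ID": [1, 1, 1],
        "DC_POWER": [0.0, 120.5, 340.2],
    },
    index=pd.Index(["inv_a", "inv_b", "inv_a"], name="SOURCE_KEY"),
)

# A column with exactly one distinct value carries no information.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) == 1]

# Drop only the columns that were verified to be constant.
df = df.drop(columns=constant_cols)
```

Compared with `del`, this approach only removes a column after confirming it is constant, so it would not silently discard a column that varies.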

## Column conversion from DateTime string to DateTime type

In [None]:
dfGen01['DATE_TIME'] = pd.to_datetime(dfGen01['DATE_TIME'], format='%d-%m-%Y %H:%M')
dfGen02['DATE_TIME'] = pd.to_datetime(dfGen02['DATE_TIME'], format='%Y-%m-%d %H:%M:%S')
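
The two files store timestamps in different string layouts, which is why each call passes its own `format`. As a small illustration (the sample strings below are invented, not taken from the files), both layouts parse to the same timestamps once the matching format is given:

```python
import pandas as pd

# Hypothetical sample strings in the two layouts used by the format arguments above.
plant1_raw = pd.Series(["15-05-2020 00:15", "15-05-2020 00:30"])
plant2_raw = pd.Series(["2020-05-15 00:15:00", "2020-05-15 00:30:00"])

# Day-first layout without seconds.
plant1_ts = pd.to_datetime(plant1_raw, format="%d-%m-%Y %H:%M")
# ISO-like layout with seconds.
plant2_ts = pd.to_datetime(plant2_raw, format="%Y-%m-%d %H:%M:%S")

# After parsing, both series hold identical datetime values.
same = plant1_ts.equals(plant2_ts)
```

Passing an explicit `format` avoids ambiguity between day-first and month-first dates and makes a mismatched string raise an error instead of being guessed.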

## Separation of the date and time in different columns


In [None]:
dfGen01['DATE'] = dfGen01['DATE_TIME'].dt.date
dfGen01['TIME'] = dfGen01['DATE_TIME'].dt.time

dfGen02['DATE'] = dfGen02['DATE_TIME'].dt.date
dfGen02['TIME'] = dfGen02['DATE_TIME'].dt.time
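
Note that `.dt.date` and `.dt.time` return plain Python `date` and `time` objects rather than pandas datetime values. The sketch below, on invented timestamps, shows this and one possible alternative, `.dt.normalize()`, which keeps the datetime type while zeroing the time part. Whether that alternative suits the later analysis is an assumption.

```python
import datetime

import pandas as pd

# Hypothetical timestamps for illustration only.
ts = pd.Series(pd.to_datetime(["2020-05-15 00:15:00", "2020-05-15 13:45:00"]))

# Python-level date and time objects, as produced in the cell above.
dates = ts.dt.date
times = ts.dt.time

# Alternative: midnight timestamps that remain a datetime column.
normalized = ts.dt.normalize()

first_date_is_python_date = isinstance(dates.iloc[0], datetime.date)
```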

# Initial verification of data structure after conversion


In [None]:
dfGen01.info(), dfGen01.columns

In [None]:
dfGen02.info(), dfGen02.columns

# Absolute frequency check of records per inverter

The number of records is not the same for every `SOURCE_KEY`, which indicates that the data is inconsistent: some inverters appear to have missing readings.


In [None]:
dfGen01.index.value_counts()

In [None]:
dfGen02.index.value_counts()
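
One way to quantify the gaps is to compare each key's record count with the largest count observed. This is only a sketch on toy data: the `inv_a`/`inv_b` keys and the `DC_POWER` column are hypothetical, and using the maximum count as the expected count is an assumption (the best-covered key may itself have gaps).

```python
import pandas as pd

# Toy frame indexed by SOURCE_KEY, mimicking the layout used in this notebook.
df = pd.DataFrame(
    {"DC_POWER": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index(["inv_a", "inv_a", "inv_a", "inv_b", "inv_b"], name="SOURCE_KEY"),
)

# Records per key, as in the cells above.
counts = df.index.value_counts()

# Assumption: the best-covered key has the full number of records.
expected = counts.max()

# Number of records each key is short of the expected count.
missing = expected - counts
incomplete = missing[missing > 0]
```

Here `incomplete` lists only the keys that have fewer records than the best-covered one, together with how many records they lack.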

# Initial DataSet View

In [None]:
dfGen01

In [None]:
dfGen02

## Add Weekday column

In [None]:
dfGen01["WEEKDAY"] = dfGen01['DATE_TIME'].dt.dayofweek
dfGen02["WEEKDAY"] = dfGen02['DATE_TIME'].dt.dayofweek
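
In pandas, `dt.dayofweek` encodes Monday as 0 and Sunday as 6. The short sketch below uses invented dates to show the encoding next to `dt.day_name()`, plus a weekend flag that is added here purely as an illustration and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical dates: a Friday, a Sunday, and a Monday.
ts = pd.Series(pd.to_datetime(["2020-05-15", "2020-05-17", "2020-05-18"]))

# Integer encoding: Monday=0 ... Sunday=6.
weekday_num = ts.dt.dayofweek

# Human-readable names for the same values.
weekday_name = ts.dt.day_name()

# Illustrative extra: Saturday (5) and Sunday (6) count as weekend.
is_weekend = weekday_num >= 5
```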
