# PV Diagnosis

For large-scale renewable energy plants, production output is reviewed regularly to ensure return on investment (ROI). Small plants usually receive no such regular review. By comparing the production predicted from past meteorological data with the actual production of a PV power plant, we estimate the plant's condition and predict when maintenance will next be required. The analysis detects performance losses and thereby improves economic returns. This challenge was part of the Energy Hackdays 2019 (https://hack.opendata.ch/project/284).


## Model

$q_{t,p} = \alpha_0 + \alpha_1 \times \hat{q}_{t,p}(\omega_{t-1}) + \epsilon_{t,p}$,
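where $q_{t,p}$ is the actual production and $\hat{q}_{t,p}$ is the estimated potential production based on past weather parameters $\omega_{t-1}$. $\alpha_0$ and $\alpha_1$ are the regression coefficients, and $\epsilon_{t,p}$ is the error term.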
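
As a minimal sketch of how the two coefficients could be estimated, the following fits the line by ordinary least squares using only the standard library. The helper `fit_line` and the numbers are illustrative assumptions, not part of the project code or its data. One possible reading (also an assumption) is that $\alpha_1$ close to 1 with $\alpha_0$ close to 0 indicates a plant that follows its potential, while $\alpha_1$ clearly below 1 hints at a proportional loss.

```python
from statistics import fmean


def fit_line(q_hat, q):
    """Ordinary least squares for q = alpha_0 + alpha_1 * q_hat (hypothetical helper)."""
    mean_x, mean_y = fmean(q_hat), fmean(q)
    sxx = sum((x - mean_x) ** 2 for x in q_hat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(q_hat, q))
    alpha_1 = sxy / sxx
    alpha_0 = mean_y - alpha_1 * mean_x
    return alpha_0, alpha_1


# Synthetic illustration (not measured data): the plant delivers 90 % of its potential.
q_hat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
q = [0.9 * x for x in q_hat]

alpha_0, alpha_1 = fit_line(q_hat, q)
print(f"alpha_0 = {alpha_0:.3f}, alpha_1 = {alpha_1:.3f}")
```

In the notebook itself, the same fit would more naturally be done with `sm.OLS`, which also reports standard errors and p-values.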

## Approach
1. Estimate potential production $\hat{q}_{t,p}(\omega_{t-1})$ based on past weather parameters
2. Compare this estimation with actual production $q_{t,p}$
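
The comparison in step 2 could be sketched as a simple ratio check, shown below. The function name `flag_underperformance`, the 0.8 threshold, and the sample values are assumptions made for illustration; the project may compare the two series differently.

```python
def flag_underperformance(q_hat, q, threshold=0.8):
    """Return the indices where actual / potential production falls below `threshold`.

    Periods with zero potential (e.g. at night) are skipped because the ratio is undefined.
    """
    flagged = []
    for i, (potential, actual) in enumerate(zip(q_hat, q)):
        if potential <= 0:
            continue
        if actual / potential < threshold:
            flagged.append(i)
    return flagged


# Synthetic illustration (not measured data).
q_hat = [0.0, 2.0, 4.0, 5.0, 3.0]   # estimated potential production
q = [0.0, 1.9, 3.8, 3.0, 2.9]       # actual production

print(flag_underperformance(q_hat, q))  # [3]
```

A fixed threshold is the simplest choice; a rolling average of the ratio would be less sensitive to single noisy readings.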


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns
import ppscore as pps
import os
import sys

In [2]:
os.chdir("..")
sys.path.append(os.getcwd())
from preprocessor.paths import (PATH_TO_PLANT_A,
                                 PATH_TO_PLANT_B,
                                 PATH_TO_PLANT_C,
                                 PATH_TO_WEATHER)
from preprocessor.preprocessor import Preprocessor

In [3]:
data_plant_a = Preprocessor(PATH_TO_PLANT_A, "timestamp").df_indexed_utc
data_plant_b = Preprocessor(PATH_TO_PLANT_B, "timestamp").df_indexed_utc
data_plant_c = Preprocessor(PATH_TO_PLANT_C, "timestamp").df_indexed_utc
data_weather = Preprocessor(PATH_TO_WEATHER, "local_time").df_indexed_utc

TypeError: _set_datetime_index() missing 1 required positional argument: 'local_time'

## Weather (Radiation) Prediction

### Which factors contribute most to PV production?

In [None]:
data_weather = WeatherPreprocessor(PATH_TO_WEATHER).df_indexed_utc

In [None]:
data = pd.merge(left=data_plant_a, right=data_weather,
                left_index=True, right_index=True)
data.columns

In [None]:
def heatmap(df):
    """Plot a Predictive Power Score (PPS) matrix as an annotated heatmap."""
    fig, ax = plt.subplots()
    sns.heatmap(df, vmin=0, vmax=1, cmap="Blues",
                linewidths=0.5, annot=True, ax=ax)
    ax.set_title('PPS matrix')
    ax.set_xlabel('feature')
    ax.set_ylabel('target')
    return ax

In [None]:
def corr_heatmap(df):
    ax = sns.heatmap(df, vmin=-1, vmax=1, cmap="BrBG", linewidths=0.5, annot=True)
    ax.set_title('Correlation matrix')
    return ax

In [None]:
sns.set()
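# Assumes ppscore >= 1.0, where pps.matrix returns a long-format frame that must be pivoted
heatmap(pps.matrix(data)[["x", "y", "ppscore"]].pivot(columns="x", index="y", values="ppscore"))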

In [None]:
corr_heatmap(data.corr())

In [None]:
sns.scatterplot(data=data, x="generation_kw", y="radiation_surface", alpha=0.1)

In [None]:
sns.scatterplot(data=data, x="generation_kw", y="radiation_toa", alpha=0.1)

### Predict top-of-atmosphere radiation based on past weather factors

#### Linear Regression

In [None]:
data["radiation_toa"].plot()

In [None]:
for col in data.columns:
    print(col)
    sns.histplot(data[col], kde=True)  # distplot is deprecated since seaborn 0.11
    plt.show()

In [None]:
data.columns

In [None]:
# Separate the predictor variables (X) from the outcome variable (y)
predictors = ['temperature', 'precipitation', 'snowfall', 'snow_mass', 'air_density', 
              'radiation_surface', 'cloud_cover']
X = data[predictors]
y = data['radiation_toa']

# Add a constant column to the predictors to represent the intercept (beta_0)
X = sm.add_constant(X)
X

In [None]:
# Show rows with at least one missing predictor value
X[X.isna().any(axis=1)]

In [None]:
# (1) select a significance value
alpha = 0.05

# (2) Fit the model
model = sm.OLS(y, X).fit()

# (3) evaluate the coefficients' p-values
model.summary()
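
The significance level `alpha` is defined above but never applied. A minimal sketch of how it could be used to pick out the significant predictors follows. The helper `significant_predictors` and the p-values are placeholders for illustration only, not results from this notebook.

```python
def significant_predictors(p_values, alpha=0.05, skip=("const",)):
    """Return predictor names with p-value below `alpha`, most significant first.

    `skip` excludes the intercept column that sm.add_constant names "const".
    """
    selected = [name for name, p in p_values.items()
                if name not in skip and p < alpha]
    return sorted(selected, key=p_values.get)


# Placeholder p-values for illustration only; not results from this notebook.
example_p_values = {
    "const": 0.001,
    "temperature": 0.003,
    "snowfall": 0.41,
    "cloud_cover": 0.02,
}

print(significant_predictors(example_p_values))  # ['temperature', 'cloud_cover']
```

With the fitted model, the same helper could be called as `significant_predictors(model.pvalues.to_dict(), alpha)`.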