Python API for the SolarDB photovoltaic dataset.
This repository contains auxiliary code for accessing the SolarDB dataset. The SolarDB is a photovoltaic dataset consisting of 1-year of data for 16 solar power plants, including power production, weather, weather forecasts, and additional exogenous features. In total, it contains over 40M records.
The SolarDB dataset is a part of a currently in-review paper. For more information, see cphoto.fit.vutbr.cz/solar:
The solardb
package can be simply installed by following the subsequent
instructions. First, solardb
has the following dependencies:
- Python >= 3.7.1
- numpy
- pandas
- scipy
- scikit-learn
- sqlalchemy
Next, to install the solardb
package and any dependencies, run the following:
conda create --name solardb -y python">=3.7.1" numpy pandas \
scipy scikit-learn sqlalchemy
conda activate solardb
git clone https://github.com/PolasekT/SolarDB ./solardb
cd ./solardb && python ./setup.py install
The solardb
package only contains the code necessary for access to the
various data sources. First, download the db files from the projects
website. Next, to test if the installation
is valid, run the following Python commands:
from solardb.db import SolarDBData
db = SolarDBData("./path/to/solardb.db")
print(f"Total power plant / inverters: {len(db.get_pp_info_df())}")
% Output: Total power plant / inverters: 274
For a quick-start, the SolarDBAssembler
and SolarDBEvaluator
may be used.
First, we calculate history, weather, and prediction DataFrames for the prediction
task of <10.5.2019, 11.5.2019)
for power plant #8:
import datetime
from solardb.db import SolarDBAssembler
assembler = SolarDBAssembler(db)
history_df, weather_df, prediction_df = assembler.prepare_prediction(
pp_id=8,
dt_start=datetime.datetime(2019, 5, 10),
dt_end=datetime.datetime(2019, 5, 11),
weather_scheme="realistic", history_cnt=288,
)
print(f"DataFrame lengths: {len(history_df)}, {len(weather_df)}, {len(prediction_df)}")
# Output: DataFrame lengths: 288, 288, 288
Next, we calculate the predictions, filling the prediction_df
DataFrame. In
this example, our model just it with a constant value:
prediction_df.fillna(value=42, inplace=True)
Finally, we can evaluate the efficacy of the proposed prediction model:
from solardb.db import SolarDBEvaluator
evaluator = SolarDBEvaluator(db)
evaluation = evaluator.evaluate_prediction(
prediction_df=prediction_df,
history_df=history_df,
weather_df=weather_df
)
print(f"Prediction error = {evaluation['power']['error'] * 100.0:.2f}%")
# Output: Prediction error = 99.14%
For additional uses, see code documentation in SolarDBData
and the helper script
solardb/run/solardb_main.py
.
The code in this repository is licensed under MIT. For information about the SolarDB licensing, see the project website.