# PVOutput.org

This notebook shows how to download historical solar photovoltaic (PV) data from [PVOutput.org](https://pvoutput.org). To access historical data for any system via the API, you need a user account and must have made a donation. For more details, see the _README.md_.

- The donation also raises the API rate limit to 300 requests per hour.
- Each request returns 24 hours of data for one system on one day.
- The [pvoutput Python library](https://github.com/openclimatefix/pvoutput) is used to interact with the API.
- To search for registered PV systems in a certain area, specify the latitude/longitude of a location and a search radius around it.
- The function `get_timeseries()` returns the timeseries of one system for one day.
- The time resolution differs between systems (in the samples inspected, some report at 5-minute and others at 15-minute intervals).
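Since each request covers one system and one day, the number of API calls grows with both. A rough planning sketch, assuming the 300 requests/hour donor limit stated above and exactly one request per system-day (`plan_requests` is a hypothetical helper, not part of the notebook or the library):

```python
import math


def plan_requests(n_systems: int, n_days: int, requests_per_hour: int = 300) -> tuple:
    """Return (number of requests, number of hourly quota windows needed).

    Assumes exactly one request per system and day; the search request and
    any retries are not counted.
    """
    n_requests = n_systems * n_days
    # Each hourly window allows at most `requests_per_hour` requests.
    hour_windows = math.ceil(n_requests / requests_per_hour)
    return n_requests, hour_windows


# e.g. 120 systems over one week
print(plan_requests(n_systems=120, n_days=7))  # (840, 3)
```

With 120 systems and 7 days, the download needs 840 requests and therefore spans at least three hourly windows; a smaller search radius or a stricter filter reduces the count proportionally.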
 
The resampled timeseries of the instantaneous power output is converted from W to kW and then saved as a _csv_ file. The filename is made up of the location name, the start/end date (ISO format) and the time resolution of the resampled data: `PVOutput--<location>--start-<YYYY-mm-dd>--end-<YYYY-mm-dd>--<time_resolution>.csv`, where `<time_resolution>` is the pandas offset alias, e.g. `30min`. Each column of the file holds the timeseries of one PV system on one day; the column name `<system_id>_<YYYY-mm-dd>` combines the unique system ID and the date.
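A minimal sketch of the naming conventions as implemented in `save_df()` and `resample_timeseries()`, e.g. for locating a file or splitting its columns when reading the data back. `output_filename` and `parse_column` are hypothetical helpers, not part of the notebook:

```python
from __future__ import annotations

from datetime import date


def output_filename(location: str, start: date, end: date, time_resolution: str) -> str:
    """Build the csv filename in the same way as the f-string in save_df()."""
    # str(date) gives ISO format YYYY-mm-dd; time_resolution is a pandas alias like "30min"
    return f"PVOutput--{location}--start-{start}--end-{end}--{time_resolution}.csv"


def parse_column(name: str) -> tuple[int, date]:
    """Split a column name '<system_id>_<YYYY-mm-dd>' into (system_id, date)."""
    system_id, day = name.split("_", 1)
    return int(system_id), date.fromisoformat(day)


print(output_filename("Dartford", date(2021, 2, 7), date(2021, 2, 8), "30min"))
# PVOutput--Dartford--start-2021-02-07--end-2021-02-08--30min.csv
print(parse_column("12345_2021-02-07"))
```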

The function `resample_timeseries()` resamples the power timeseries to the interval given by the parameter `time_resolution`. It is based on pandas `resample()`, and in this notebook `mean()` is used to aggregate the data over each resampling interval.
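The resampling step can be illustrated on made-up data: twelve hypothetical 5-minute readings (in W) are averaged into 30-minute bins, converted to kW, and the datetime index is replaced by the time of day, mirroring what `resample_timeseries()` does with a single system-day:

```python
import pandas as pd

# Hypothetical 5-minute readings (W) for one system, in UK local time.
index = pd.date_range("2021-02-07 10:00", periods=12, freq="5min", tz="Europe/London")
power_w = pd.Series(
    [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100],
    index=index,
    dtype=float,
    name="instantaneous_power_gen_W",
)

# Average over 30-minute bins, then convert W -> kW.
power_kw = power_w.resample("30min").mean() / 1000.0
# Keep only the time of day, so that profiles of different days can share one index.
power_kw.index = power_kw.index.time
print(power_kw)
```

Because `get_timeseries()` replaces missing readings with 0 before this step, gaps inside a bin lower its mean rather than being ignored.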

The metadata file contains information about the requested data:
- start and end date of the request
- date of the first and last record
- location (longitude/latitude) and search radius
- number of systems in the search area that pass the filter, and number of systems with records for the requested dates
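The metadata file is plain text with one `key: value` pair per line, so it can be read back, e.g. to check how many systems contributed data. A minimal sketch, assuming exactly that layout (`parse_metadata` is a hypothetical helper and the sample text is illustrative):

```python
def parse_metadata(text: str) -> dict:
    """Parse 'key: value' lines of a metadata file into a dict of strings.

    Blank lines and indentation are ignored; each line is split on its first
    colon, so later colons stay part of the value.
    """
    metadata = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skips blank lines and lines without a key
        key, value = line.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


sample = """
    location: Dartford
    radius: 20km
    start date request: 2021-02-07
"""
print(parse_metadata(sample))
# {'location': 'Dartford', 'radius': '20km', 'start date request': '2021-02-07'}
```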


In [None]:
from pvoutput.pvoutput import PVOutput
from pathlib import Path
import pandas as pd
from datetime import datetime
import logging
import sys

# Set up logging
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

In [None]:
# parameters
time_resolution = "30min"  # pandas offset alias for resampling, e.g. "15min" or "60min"
start_date = "2021-02-07"
end_date = "2021-02-08"
dates = pd.date_range(start=start_date, end=end_date).to_pydatetime()

# Specify search parameters
radius = "20km"
latitude = 51.44654
longitude = 0.21539

# Location name for output file
location = "Dartford"
save_output_file = True

## Set-up

First, we need to set the API key and system ID. A detailed description of how to obtain a system ID is given in the `README.md`.
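Credentials read from text files should be stripped before use: `file.readline()` keeps the trailing newline, which would otherwise become part of the API key. A minimal sketch of a helper that does this and tolerates a missing file (`read_secret` is hypothetical, not part of the pvoutput library):

```python
from pathlib import Path
from typing import Optional


def read_secret(path: Path) -> Optional[str]:
    """Return the first line of a secrets file without surrounding whitespace,
    or None if the file does not exist or is empty."""
    if not path.is_file():
        return None
    lines = path.read_text().splitlines()
    return lines[0].strip() if lines else None
```

With this helper, a missing file yields `None` instead of an exception. Whether `PVOutput` then falls back to `~/.pvoutput.yml` should be checked in the library's README; the comment in the set-up cell only suggests it.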

In [None]:
# Set the API key and system ID here (read from files in ../secrets),
# or alternatively in ~/.pvoutput.yml
secrets_dir = Path.cwd().parent / "secrets"

# strip() removes the trailing newline that readline() keeps
with open(secrets_dir / "pvoutput_api_key.txt") as file:
    API_KEY = file.readline().strip()

with open(secrets_dir / "pvoutput_system_id.txt") as file:
    SYSTEM_ID = file.readline().strip()

pv = PVOutput(API_KEY, SYSTEM_ID)

## Functions

Functions to:
- get the timeseries for a given date
- resample the timeseries
- save the dataframe to _csv_

In [None]:
def get_timeseries(system_id, date):
    """Return the timeseries of a PV system for a specified date.
    
    Parameters
    -----------
    system_id: int
        System ID of the PV system.
    date: datetime.date
        Date of the timeseries.

    Returns
    -------
    timeseries: pd.DataFrame
        Status records indexed by tz-aware timestamps (Europe/London),
        with NaN in instantaneous_power_gen_W replaced by 0.
    """
    timeseries = pv.get_status(system_id, date=date)
    # The timestamps are in the local time of the PV system,
    # and the systems in the search area around Dartford are in the United Kingdom.
    timeseries = timeseries.tz_localize('Europe/London', ambiguous='raise')

    # replace NaN in column instantaneous_power_gen_W with zeros
    timeseries['instantaneous_power_gen_W'] = timeseries['instantaneous_power_gen_W'].fillna(0)
    return timeseries
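`get_timeseries()` localises the timestamps with `ambiguous='raise'`. On the day the UK clocks go back, the hour from 01:00 to 02:00 occurs twice, so a record in that hour cannot be localised unambiguously and pandas raises an error, which the download loop would log as a failed request. The sketch below illustrates this with a single timestamp on 31 October 2021 and shows `ambiguous='NaT'` as one alternative; whether PVOutput returns night-time records in that hour depends on the system, so this is only a potential edge case:

```python
import pandas as pd

# 01:30 on 31 October 2021 occurs twice in Europe/London (clocks go back at 02:00 BST).
ambiguous = pd.DatetimeIndex(["2021-10-31 01:30"])

try:
    ambiguous.tz_localize("Europe/London", ambiguous="raise")
    raised = False
except Exception:  # the exact exception class depends on the pandas version
    raised = True

# ambiguous="NaT" marks the ambiguous timestamp as missing instead of raising.
as_nat = ambiguous.tz_localize("Europe/London", ambiguous="NaT")
print(raised, as_nat.isna().tolist())  # True [True]
```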

In [None]:
def resample_timeseries(df, system_id, time_resolution):
    """Resample the power output of one PV system for one day.
    
    The instantaneous power is averaged over the specified time resolution
    and converted from W to kW.
    
    Parameters
    ----------
    df: pd.DataFrame
        The dataframe containing the timeseries of the PV output of one system and day.
    system_id: int
        System ID of the PV system.
    time_resolution: str
        Pandas offset alias of the resampling interval, e.g. "30min".
        
    Returns
    -------
    timeseries: pd.DataFrame
        Dataframe indexed by time of day, with one column "<system_id>_<date>"
        containing the resampled power in kW.
    """
    date = df.index[0].date()
    timeseries = df["instantaneous_power_gen_W"].resample(time_resolution).mean().to_frame()
    # replace the datetime index by the time of day so that different days can be joined
    timeseries.index = pd.Index(timeseries.index.time, name="timestamps")
    # convert power from W to kW
    timeseries["instantaneous_power_gen_W"] = timeseries["instantaneous_power_gen_W"] / 1000.0
    timeseries = timeseries.rename(columns={"instantaneous_power_gen_W": f"{system_id}_{date}"})
    return timeseries



In [None]:
def save_df(df_to_save, dates, time_resolution, path=None):
    """Save the dataframe containing all PV profiles to a .csv file.
    
    The global variable `location` is used in the filename.
    
    Parameters
    -----------
    df_to_save: pd.DataFrame
        The dataframe to be saved.
    dates: array of datetime objects
        The requested dates; the first and last are used in the filename.
    time_resolution: str
        Time resolution of the resampled data, used in the filename.
    path: str or pathlib.Path, optional
        Directory where the file is saved (default: current working directory).
    """
    filename_out = f"PVOutput--{location}--start-{dates[0].date()}--end-{dates[-1].date()}--{time_resolution}"
    path = Path.cwd() if path is None else Path(path)
    path_out = path / f"{filename_out}.csv"
    df_to_save.to_csv(path_out)

## Find PV systems in an area

Search for systems within a specified radius around a point given by its latitude and longitude. The radius is passed to `pv.search()` as the query string, e.g. `"20km"`.
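The radius filter is applied by PVOutput on the server. If the search results include each system's coordinates (the column names `latitude` and `longitude` below are an assumption, as is the data), the great-circle distance to the search centre can be recomputed locally, e.g. to try a smaller radius without spending further API requests. A minimal sketch using the haversine formula on a spherical Earth:

```python
import math

import pandas as pd


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius (spherical approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical search results; the column names are an assumption.
systems = pd.DataFrame(
    {"latitude": [51.45, 51.80], "longitude": [0.22, 0.21]},
    index=[1, 2],
)
centre_lat, centre_lon = 51.44654, 0.21539
distances = systems.apply(
    lambda row: haversine_km(centre_lat, centre_lon, row["latitude"], row["longitude"]),
    axis=1,
)
# Keep only systems within a tighter 10 km radius.
nearby = distances[distances <= 10].index.tolist()
print(nearby)  # [1]
```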

In [None]:
pv_systems = pv.search(query=radius, lat=latitude, lon=longitude)
logger.info(f"Number of PV systems found: {pv_systems.shape[0]}")
if pv_systems.empty:
    logger.error("No PV systems found.")
pv_systems.head()

## Filter systems

Filter the systems to retain only those that fulfil certain criteria, e.g. a minimum number of recorded outputs (as below) or a recent last output.
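The filter cell below only uses `num_outputs`. Filtering by recency as well would need a date column for each system's last output; `last_output_date` in this sketch is a hypothetical, already-parsed datetime column, the data are made up, and `filter_systems` is likewise a hypothetical helper:

```python
import pandas as pd


def filter_systems(systems: pd.DataFrame, min_outputs: int, last_output_after=None) -> pd.DataFrame:
    """Keep systems with at least `min_outputs` outputs and, optionally, a recent last output."""
    mask = systems["num_outputs"] >= min_outputs
    if last_output_after is not None:
        # hypothetical column holding the parsed date of each system's last output
        mask &= systems["last_output_date"] >= pd.Timestamp(last_output_after)
    return systems[mask]


systems = pd.DataFrame(
    {
        "num_outputs": [10, 60, 200],
        "last_output_date": pd.to_datetime(["2021-02-08", "2020-01-01", "2021-02-07"]),
    },
    index=[101, 102, 103],
)
print(filter_systems(systems, min_outputs=50).index.tolist())  # [102, 103]
print(filter_systems(systems, min_outputs=50, last_output_after="2021-01-01").index.tolist())  # [103]
```

The boolean mask gives the same result as `query(f"num_outputs >= {number_outputs}")` for the first criterion, but makes it easier to combine optional conditions.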

In [None]:
# Specify filter parameters
number_outputs = 50  # minimum value of num_outputs a system must have in the search results

In [None]:
pv_systems = pv_systems.query(f"num_outputs >= {number_outputs}")
pv_systems.head()

## Initialise dataframe to collect all PV profiles

In [None]:
# initialise dataframe to combine all PV profiles:
# one row per time slot of the day, the slot length is set by time_resolution
timestamps = pd.date_range("00:00", "23:59", freq=time_resolution)

pv_profiles = pd.DataFrame(index=pd.Index(timestamps.time, name="Timestamps"))

## Get timeseries 

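Each request returns the timeseries of one system for one day, so the loop below sends one request per system and requested date, i.e. `len(pv_systems) * len(dates)` requests in total, all of which count against the hourly API limit.

For each successful request, the timeseries is resampled and added as a new column to `pv_profiles`. The dates of the first and last record and the IDs of the systems with data are kept for the metadata file.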
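A resampled system-day only covers the hours in which the system reported, so each profile is left-joined onto the fixed 24-hour grid and the remaining gaps are filled with 0. A minimal sketch of that pattern with one made-up profile (the column name and values are hypothetical):

```python
from datetime import time

import pandas as pd

# Full-day grid of 30-minute slots, indexed by time of day.
grid = pd.date_range("2021-02-07 00:00", "2021-02-07 23:59", freq="30min")
profiles = pd.DataFrame(index=pd.Index(grid.time, name="Timestamps"))

# Hypothetical resampled profile (kW) that only covers two midday slots.
day = pd.DataFrame({"12345_2021-02-07": [0.5, 1.2]}, index=[time(12, 0), time(12, 30)])

# Left join keeps all 48 slots; slots without data become NaN and are then set to 0.
profiles = profiles.join(day).fillna(0.0)
print(profiles.shape)  # (48, 1)
```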

In [None]:
# initialise metadata info
date_first_record = None
date_last_record = None
dict_systems = {}  # IDs of systems with at least one record

# get PV timeseries if it exists and resample
for date_time in dates:
    day = date_time.date()
    # the index of the search results holds the system IDs
    for pv_system_id in pv_systems.index:
        try:
            pv_data = get_timeseries(pv_system_id, day)
            if not pv_data.empty:
                timeseries = resample_timeseries(pv_data, pv_system_id, time_resolution)
                pv_profiles = pv_profiles.join(timeseries)
                # dates are processed in ascending order, so the first hit is the first record
                if date_first_record is None:
                    date_first_record = day
                date_last_record = day
                dict_systems[pv_system_id] = 1
        except Exception as e:  # log the error and continue with the next system
            logger.error(f"Error for system {pv_system_id} on {day}: {e}")
pv_profiles.fillna(0.0, inplace=True)




In [None]:
# save data to csv
if save_output_file:
    save_df(pv_profiles, dates, time_resolution)

    # save metadata to txt file
    start = dates[0].date()
    end = dates[-1].date()
    filename_metadata = Path.cwd() / f"Metadata-PVOutput--{location}--start-{start.strftime('%Y%m%d')}--end-{end.strftime('%Y%m%d')}.txt"
    metadata_lines = [
        f"location: {location}",
        f"longitude: {longitude}",
        f"latitude: {latitude}",
        f"radius: {radius}",
        f"start date request: {start}",
        f"end date request: {end}",
        f"date first record: {date_first_record}",
        f"date last record: {date_last_record}",
        f"number of systems in search area after filtering: {pv_systems.shape[0]}",
        f"number of systems with records: {len(dict_systems)}",
    ]
    with open(filename_metadata, "w") as file_metadata:
        file_metadata.write("\n".join(metadata_lines) + "\n")



In [None]:
pv_profiles