# Download and Format SMARTEOLE SCADA Data Set

Download the SMARTEOLE wake steering experiment SCADA data set from Zenodo, convert the variable names to the FLASC convention, and compute the reference wind direction, wind speed, and power variables used in the SMARTEOLE experiment.

In [1]:
import os
import requests
import numpy as np
import pandas as pd
from zipfile import ZipFile

from flasc import circular_statistics as circ
from flasc.dataframe_operations import dataframe_manipulations as dfm

# Initial data download
First, we will download the SMARTEOLE wind farm control open data set for the Sole du Moulin Vieux wind plant from Zenodo, extract the files from the zip archive, and save them in the local "data" folder. The download and extraction steps are skipped if the data are found locally.

This dataset has been obtained by ENGIE Green in the scope of French national project SMARTEOLE (grant no. ANR-14-CE05-0034).

The publication about the test is available online here: https://wes.copernicus.org/articles/6/1427/2021/wes-6-1427-2021.html.

In [2]:
def download_smarteole_data():
    """Function that downloads 1-minute SCADA data from the SMARTEOLE wake
    steering experiment at the Sole du Moulin Vieux wind plant along with
    static wind plant and turbine data.
    """

    # Query the Zenodo record to get the file name, size, and download link
    r = requests.get("https://zenodo.org/api/records/7342466")
    r.raise_for_status()
    file_info = r.json()["files"][0]

    filesize = file_info["size"] / (1024 * 1024)  # file size in MB
    filename = os.path.join("data", file_info["key"])

    os.makedirs("data", exist_ok=True)

    if not os.path.exists(filename):
        print("SMARTEOLE data not found locally. Beginning file download from Zenodo...")

        # Only open the streaming request when the file actually needs downloading
        result = requests.get(file_info["links"]["self"], stream=True)
        result.raise_for_status()

        chunk_number = 0

        with open(filename, "wb") as f:
            for chunk in result.iter_content(chunk_size=1024*1024):
        
                chunk_number = chunk_number + 1
        
                print(f"{chunk_number} out of {int(np.ceil(filesize))} MB downloaded", end="\r")
        
                f.write(chunk)
    else:
        print("SMARTEOLE data found locally.")

    if not os.path.exists(filename[:-4]):
        print("Extracting SMARTEOLE zip file")
        with ZipFile(filename) as zipfile:
            zipfile.extractall("data")
    else:
        print("SMARTEOLE data already extracted locally.")
        
    print("\nList of SMARTEOLE files:")
    for f in os.listdir(os.path.join(filename[:-4])):
        print(f)

# download data from Zenodo
download_smarteole_data()

SMARTEOLE data not found locally. Beginning file download from Zenodo...
Extracting SMARTEOLE zip file

List of SMARTEOLE files:
SMARTEOLE_WakeSteering_NTF_SMV6_staticData.csv
SMARTEOLE_WakeSteering_SCADA_1minData.csv
SMARTEOLE_WakeSteering_Map.pdf
SMARTEOLE_WakeSteering_ControlLog_1minData.csv
SMARTEOLE_WakeSteering_WindCube_1minData.csv
SMARTEOLE_WakeSteering_correction_factors_SMV1237_staticData.csv
SMARTEOLE_WakeSteering_Coordinates_staticData.csv
SMARTEOLE_WakeSteering_ReadMe.xlsx
SMARTEOLE_WakeSteering_GuaranteedPowerCurve_staticData.csv
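
Before moving on, it can be useful to confirm that the files needed in the later cells were actually extracted. The following is a minimal sketch; the helper `find_missing_files` and the `EXPECTED_FILES` list are hypothetical names introduced here for illustration, with the file names taken from the listing above.

```python
import os

# File names copied from the listing above; the helper itself is hypothetical.
EXPECTED_FILES = [
    "SMARTEOLE_WakeSteering_SCADA_1minData.csv",
    "SMARTEOLE_WakeSteering_ControlLog_1minData.csv",
    "SMARTEOLE_WakeSteering_correction_factors_SMV1237_staticData.csv",
    "SMARTEOLE_WakeSteering_NTF_SMV6_staticData.csv",
]


def find_missing_files(data_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    if not os.path.isdir(data_dir):
        return list(expected)
    present = set(os.listdir(data_dir))
    return [name for name in expected if name not in present]


missing = find_missing_files(os.path.join("data", "SMARTEOLE-WFC-open-dataset"))
if missing:
    print("Missing files:", *missing, sep="\n")
```

An empty list means all four files used in the remaining cells are available.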


# Load SCADA data and format to common FLASC format
Next, we'll load the 1-minute resolution turbine-level SCADA and wake steering control log data and combine them into a single data frame, df_scada. Then we'll rename the columns of df_scada to follow the common FLASC format. For example, wind speed columns are named ws_{ti}, where {ti} is the zero-based turbine index padded with leading zeros to three digits. Hence, the wind speed of the third turbine is ws_002, and the power production of the thirteenth turbine is pow_012.
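
As a small illustration of the zero-padded naming convention described above, the sketch below builds column names from a prefix and a zero-based turbine index. The helper `flasc_column` is a hypothetical name used only for this example and is not part of FLASC.

```python
def flasc_column(prefix, turbine_index):
    """Build a column name such as 'ws_002' from a zero-based turbine index."""
    return f"{prefix}_{turbine_index:03d}"


# Third turbine has index 2; thirteenth turbine has index 12
print(flasc_column("ws", 2))
print(flasc_column("pow", 12))
```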

For the SMARTEOLE data set, the SCADA data contain more variables than we'll need to demonstrate FLASC. Further, the data set contains the average, min., max., std. dev., and count for each variable, whereas we only need the averages. Therefore, we'll only keep the average power, wind speed, wind direction, and normal operation variables as well as the wind vane angle and target yaw offset for the turbine for which wake steering is implemented (SMV6 or turbine 005 in the FLASC convention).
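
To make the "keep only the averages" idea concrete, here is a minimal sketch on a toy data frame. The values and the timestamp are made up; only the column naming pattern follows the raw SCADA file.

```python
import pandas as pd

# Toy frame mimicking the raw column naming pattern (all values are made up)
df_toy = pd.DataFrame(
    {
        "time": pd.date_range("2020-02-17", periods=3, freq="1min"),
        "active_power_1_avg": [500.0, 520.0, 510.0],
        "active_power_1_min": [450.0, 470.0, 460.0],
        "active_power_1_std": [20.0, 22.0, 21.0],
        "wind_speed_1_avg": [7.1, 7.3, 7.2],
        "wind_speed_1_count": [60, 60, 60],
    }
)

# Keep the time stamp plus every column holding a 1-minute average
avg_cols = [c for c in df_toy.columns if c.endswith("_avg")]
df_avg = df_toy[["time"] + avg_cols]
print(list(df_avg.columns))
```

The notebook cell below is more selective than this, since it keeps an explicit list of renamed columns rather than every average.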

In [3]:
def format_dataframes(df_scada):
    # Format columns and data. The operations required differ per dataset.

    # In FLORIS, turbines are numbered from 0 to nturbs - 1. In SCADA data,
    # turbines often have a different name. We save the mapping between
    # the turbine indices in FLORIS and the turbine names in the original SMARTEOLE data set 
    # to a separate .csv file.
    root_path = os.getcwd()
    out_path = os.path.join(root_path, "postprocessed")
    os.makedirs(out_path, exist_ok=True)
    turbine_names = ["SMV1", "SMV2", "SMV3", "SMV4", "SMV5", "SMV6", "SMV7"]
    pd.DataFrame({"turbine_names": turbine_names}).to_csv(
        os.path.join(out_path, "turbine_names.csv")
    )

    # Now map columns to conventional format
    scada_dict = {}
    for ii in range(len(turbine_names)):
        scada_dict.update(
            {
                "active_power_{:1d}_avg".format(ii+1): "pow_{:03d}".format(ii),  # We want to use the 'active' power production for our analysis in FLASC
                "wind_speed_{:1d}_avg".format(ii+1): "ws_{:03d}".format(ii),  # Turbine-felt wind speed. Ideally, this should be the freestream-equivalent wind speed at this turbine.
                "wind_direction_{:1d}_avg".format(ii+1): "wd_{:03d}".format(ii),  # Wind direction from the data. If this is not available, can approximate this with the nacelle heading.
                "derate_{:1d}".format(ii+1): "is_operation_normal_{:03d}".format(ii),
            }
        )

    print("formatting dataframe...")
    df_scada = df_scada.rename(columns=scada_dict)

    # Convert is_operation_normal columns from integers in original format (0: normal, 1: not normal)
    # to boolean FLASC convention (True: normal, False: not normal)
    for ii in range(len(turbine_names)):
        df_scada["is_operation_normal_{:03d}".format(ii)] = ~df_scada["is_operation_normal_{:03d}".format(ii)].astype(bool)
    
    # We'll also save the wind vane angle of the turbine for which wake steering is implemented, the 
    # target yaw offset from the wake steering controller, and the control mode (baseline or wake steering)
    scada_dict = {
        "wind_vane_6_avg": "wind_vane_005", 
        "control_log_offset_avg": "target_yaw_offset_005",
        "control_log_offset_active_avg": "control_mode",
    }
    df_scada = df_scada.rename(columns=scada_dict) # Simplify names and use FLASC zero indexing convention
    
    # The control mode is indicated as 0 for baseline and 1 for wake steering. Let's change this to a column of 
    # strings with values of "baseline" or "controlled". 
    # Using map avoids writing strings into a float column; rounded values other than 0 or 1 become NaN.
    df_scada["control_mode"] = df_scada["control_mode"].round().map({0.0: "baseline", 1.0: "controlled"})
    
    # Only keep columns we need
    cols_save = ["time"]
    cols_save += ["pow_{:03d}".format(ii) for ii in range(len(turbine_names))]
    cols_save += ["ws_{:03d}".format(ii) for ii in range(len(turbine_names))]
    cols_save += ["wd_{:03d}".format(ii) for ii in range(len(turbine_names))]
    cols_save += ["is_operation_normal_{:03d}".format(ii) for ii in range(len(turbine_names))]
    cols_save += ["wind_vane_005", "target_yaw_offset_005", "control_mode"]
    
    df_scada = df_scada[cols_save]
    
    # Reduce precision in dataframe to use half of the memory
    df_scada = dfm.df_reduce_precision(df_scada, verbose=True)

    # Sort dataframe by time and reset the index
    df_scada = df_scada.sort_values(axis=0, by="time")
    df_scada = df_scada.reset_index(drop=True)

    return df_scada

# Open and combine 1-minute SCADA and control log files into single data frame
data_dir = os.path.join("data", "SMARTEOLE-WFC-open-dataset")

df_scada = pd.read_csv(os.path.join(data_dir, "SMARTEOLE_WakeSteering_SCADA_1minData.csv"))
df_ctrl = pd.read_csv(os.path.join(data_dir, "SMARTEOLE_WakeSteering_ControlLog_1minData.csv"))
df_scada = df_scada.merge(df_ctrl, how="inner", on="time")

# Convert strings to timestamps
df_scada["time"] = pd.to_datetime(df_scada["time"])

# Sort dataframe by time and fix any duplicates
df_scada = dfm.df_sort_and_fix_duplicates(df_scada)

print("Columns available in original df_scada:")
print(*list(df_scada.columns), sep="\n")

# format column names
df_scada_formatted = format_dataframes(df_scada)

print("\nColumns available in df_scada_formatted:")
print(*list(df_scada_formatted.columns), sep="\n")

Columns available in original df_scada:
time
active_power_1_avg
active_power_1_min
active_power_1_max
active_power_1_std
active_power_1_count
active_power_2_avg
active_power_2_min
active_power_2_max
active_power_2_std
active_power_2_count
active_power_3_avg
active_power_3_min
active_power_3_max
active_power_3_std
active_power_3_count
active_power_4_avg
active_power_4_min
active_power_4_max
active_power_4_std
active_power_4_count
active_power_5_avg
active_power_5_min
active_power_5_max
active_power_5_std
active_power_5_count
active_power_6_avg
active_power_6_min
active_power_6_max
active_power_6_std
active_power_6_count
active_power_7_avg
active_power_7_min
active_power_7_max
active_power_7_std
active_power_7_count
blade_1_pitch_angle_1_avg
blade_1_pitch_angle_1_min
blade_1_pitch_angle_1_max
blade_1_pitch_angle_1_std
blade_1_pitch_angle_1_count
blade_1_pitch_angle_2_avg
blade_1_pitch_angle_2_min
blade_1_pitch_angle_2_max
blade_1_pitch_angle_2_std
blade_1_pitch_angle_2_count
blade_1_pitc
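
The two value conversions performed in `format_dataframes` can be checked on a toy example. This is a sketch with made-up values; it assumes the derate flag is stored as 0/1 integers and that the control flag is a 1-minute average between 0 and 1, as the code comments above describe.

```python
import numpy as np
import pandas as pd

# Made-up values, only to illustrate the two conversions
df_toy = pd.DataFrame(
    {
        "derate_6": [0, 1, 0],
        "control_log_offset_active_avg": [0.0, 0.98, np.nan],
    }
)

# 0 (normal) becomes True and 1 (not normal) becomes False
is_normal = ~df_toy["derate_6"].astype(bool)

# Round the averaged flag to 0 or 1, then map it to a label; NaN stays NaN
control_mode = df_toy["control_log_offset_active_avg"].round().map(
    {0.0: "baseline", 1.0: "controlled"}
)

print(is_normal.tolist())
print(control_mode.tolist())
```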

# Compute the reference wind direction, wind speed, and power variables used in the SMARTEOLE experiment
Next, we'll compute the reference wind direction, wind speed, and power signals that were used to quantify the energy uplift from wake steering during the SMARTEOLE experiment, as described in:

Simley, E., Fleming, P., Girard, N., Alloin, L., Godefroy, E., and Duc, T., "Results from a wake-steering experiment at a commercial wind plant: investigating the wind speed dependence of wake-steering performance," *Wind Energy Science*, 6(6) 2021, https://doi.org/10.5194/wes-6-1427-2021.

Specifically, the reference variables are calculated using the average wind direction, wind speed, and power values from turbines SMV1, SMV2, SMV3, and SMV7 (turbines 000, 001, 002, and 006 in the FLASC convention), which generally experience freestream inflow for the wind directions of interest (195-241 degrees). Correction factors that depend on wind direction and wind speed are then applied to the reference wind speed and power values to remove biases relative to the wind speed and power of the controlled turbine SMV6 during baseline operation. Lastly, a correction is applied to the reference wind speed signal to better represent the freestream wind inflow using a nacelle transfer function derived from nacelle lidar measurements.

Note that FLASC uses the column names "wd", "ws", and "pow_ref" for the reference wind direction, wind speed, and power. To avoid interfering with these names, we'll call the reference variables used in the SMARTEOLE experiment "wd_smarteole", "ws_smarteole", and "pow_ref_smarteole".
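
Wind directions must be averaged as angles rather than as plain numbers, because values on either side of north (for example 350 and 10 degrees) would otherwise average to south. The sketch below shows one common way to compute a circular mean from unit vectors. The function `circular_mean_deg` is a hypothetical helper for illustration and is not necessarily how `circ.calc_wd_mean_radial` is implemented internally.

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Mean direction in degrees on [0, 360), computed from unit vectors."""
    angles_rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_sin = np.mean(np.sin(angles_rad))
    mean_cos = np.mean(np.cos(angles_rad))
    return float(np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0)


wd = [350.0, 10.0]
print(np.mean(wd))            # arithmetic mean: points south, which is wrong here
print(circular_mean_deg(wd))  # circular mean: north (near 0 or 360, up to rounding)
```

For the 195-241 degree sector analyzed here the two means would typically agree, but the circular mean stays correct for records near the 0/360 degree wrap.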

In [4]:
def compute_reference_variables(df):
    """Computes reference wind direction, wind speed, and power from four
    reference turbines and applies precomputed correction factors. 
    
    Args:
        df (pd.DataFrame): SMARTEOLE SCADA data frame with 1-minute data for
            all turbines.

    Returns:
        df (pd.DataFrame): Dataframe with added wd_smarteole, ws_smarteole, and pow_ref_smarteole columns.
    """

    data_dir = os.path.join("data", "SMARTEOLE-WFC-open-dataset")
    
    # Load correction factors to apply to reference wind speed and power as a
    # function of wind speed and direction
    df_crct = pd.read_csv(os.path.join(data_dir, "SMARTEOLE_WakeSteering_correction_factors_SMV1237_staticData.csv"))

    # Load nacelle transfer function to correct reference wind speed to freestream
    df_ntf = pd.read_csv(os.path.join(data_dir, "SMARTEOLE_WakeSteering_NTF_SMV6_staticData.csv"))

    # Calculate reference wind direction, wind speed, and power as average of
    # turbines SMV 1, 2, 3, and 7
    df["wd_smarteole"] = circ.calc_wd_mean_radial(df[["wd_{:03d}".format(ii) for ii in [0, 1, 2, 6]]],axis=1)
    df["ws_smarteole"] = df[["ws_{:03d}".format(ii) for ii in [0, 1, 2, 6]]].mean(axis=1)
    df["pow_ref_smarteole"] = df[["pow_{:03d}".format(ii) for ii in [0, 1, 2, 6]]].mean(axis=1)

    # Apply transfer functions to correct reference wind speed and power to 
    # match test turbine SMV6 in baseline operation. Note that corrections are
    # only provided for wind directions between 195 and 241 degrees, where wake
    # steering is analyzed.
    df["ws_round"] = df["ws_smarteole"].round()
    df["wd_round"] = df["wd_smarteole"].round()

    for i in range(len(df_crct)):
        row = df_crct.iloc[i]
        # Select the samples that fall in this wind direction and wind speed bin
        in_bin = (df["wd_round"] == row["wind_direction_1237"]) & (df["ws_round"] == row["wind_speed_1237"])
        df.loc[in_bin, "ws_smarteole"] *= row["wind_speed_correction_factor_1237"]
        df.loc[in_bin, "pow_ref_smarteole"] *= row["power_correction_factor_1237"]

    # Apply nacelle transfer function to correct reference wind speed to freestream
    df["ws_smarteole"] = np.interp(df["ws_smarteole"],df_ntf["wind_speed_6"],df_ntf["wind_speed_freestream"])

    # Drop temp columns
    df = df.drop(columns=["ws_round", "wd_round"])

    return df

# Compute reference wind direction, wind speed, and power
df_scada_formatted = compute_reference_variables(df_scada_formatted)

print("Final list of columns in df_scada_formatted:")
print(*list(df_scada_formatted.columns), sep="\n")


Final list of columns in df_scada_formatted:
time
pow_000
pow_001
pow_002
pow_003
pow_004
pow_005
pow_006
ws_000
ws_001
ws_002
ws_003
ws_004
ws_005
ws_006
wd_000
wd_001
wd_002
wd_003
wd_004
wd_005
wd_006
is_operation_normal_000
is_operation_normal_001
is_operation_normal_002
is_operation_normal_003
is_operation_normal_004
is_operation_normal_005
is_operation_normal_006
wind_vane_005
target_yaw_offset_005
control_mode
wd_smarteole
ws_smarteole
pow_ref_smarteole
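
The bin-wise correction and the nacelle transfer function used in `compute_reference_variables` can be illustrated on toy data. The sketch below replaces the loop with a left merge on the rounded wind direction and wind speed, then applies a piecewise-linear lookup with `np.interp`. All numbers are made up, and the merge approach assumes the correction table has at most one row per bin; it is an alternative formulation, not the code used in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up samples and correction table (column names follow the cell above)
df_toy = pd.DataFrame(
    {
        "wd_smarteole": [200.2, 210.7, 180.0],
        "ws_smarteole": [7.9, 6.1, 8.0],
    }
)
df_crct_toy = pd.DataFrame(
    {
        "wind_direction_1237": [200.0, 211.0],
        "wind_speed_1237": [8.0, 6.0],
        "wind_speed_correction_factor_1237": [1.02, 0.97],
    }
)

# Look up each sample's bin with a left merge; bins without a factor keep a factor of 1
keys = pd.DataFrame(
    {
        "wind_direction_1237": df_toy["wd_smarteole"].round(),
        "wind_speed_1237": df_toy["ws_smarteole"].round(),
    }
)
factors = keys.merge(
    df_crct_toy, how="left", on=["wind_direction_1237", "wind_speed_1237"]
)
factor = factors["wind_speed_correction_factor_1237"].fillna(1.0).to_numpy()
ws_corrected = df_toy["ws_smarteole"].to_numpy() * factor

# Nacelle transfer function as a piecewise-linear lookup (made-up table)
ntf_ws = np.array([0.0, 10.0, 20.0])
ntf_freestream = np.array([0.0, 10.5, 21.0])
ws_freestream = np.interp(ws_corrected, ntf_ws, ntf_freestream)

print(ws_corrected)
print(ws_freestream)
```

The third sample lies outside the bins in the toy table, so it passes through the correction step unchanged, which mirrors how the loop leaves samples outside the 195-241 degree sector untouched.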


# Save postprocessed data to a local file
Lastly, we'll save the formatted SCADA data frame locally. This data frame will be used as the starting point in the next example notebook.

In [5]:
root_path = os.getcwd()
fout = os.path.join(root_path, "postprocessed", "df_scada_60s_formatted.ftr")
df_scada_formatted.to_feather(fout)
print("Saved formatted SMARTEOLE SCADA dataset to: '{:s}'.".format(os.path.relpath(fout)))

Saved formatted SMARTEOLE SCADA dataset to: 'postprocessed/df_scada_60s_formatted.ftr'.
