# FlascDataFrame

FLASC has historically used a `pandas.DataFrame` to store the data to be processed. Version 2.1 introduced the `FlascDataFrame` class, which adds methods and functionality for working with that data. `FlascDataFrame` is a subclass of `pandas.DataFrame` and can be used in place of a `pandas.DataFrame`. This notebook provides an overview of the `FlascDataFrame` class and its methods, including support for converting between "flasc"-style data formatting and "user" formats.

## FlascFormat

FlascFormat describes a dataframe whose columns follow a naming convention:

- `time` represents the time, preferably in UTC
- Turbines are numbered sequentially starting from 0, and turbine numbers are always written with 3 digits
- `pow_000` represents the power output of turbine 0
- `ws_000` represents the wind speed at turbine 0
- `wd_000` represents the wind direction at turbine 0
- `wd` represents a wind direction chosen, for example, to represent the overall inflow direction
- `ws` represents a wind speed chosen, for example, to represent the overall inflow speed
- `pow_ref` represents the power output of the reference turbine (or the average of the reference turbines)
- `pow_test` represents the power output of the test turbine (or the average of the test turbines)
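
The per-turbine part of the convention above can be sketched in a few lines of plain Python. The helper `flasc_column` is a hypothetical name used only for illustration and is not part of FLASC; it simply zero-pads the turbine index to three digits.

```python
def flasc_column(prefix: str, turbine_index: int) -> str:
    """Return a FlascFormat-style column name such as 'pow_000'."""
    if turbine_index < 0 or turbine_index > 999:
        raise ValueError("turbine index must fit in three digits")
    return f"{prefix}_{turbine_index:03d}"


# Column names for a hypothetical two-turbine farm
columns = ["time"] + [
    flasc_column(prefix, i) for prefix in ("pow", "ws", "wd") for i in range(2)
]
print(columns)
```

The resulting names match the columns of the small example dataframe built in the next cell.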

In [1]:
# A dataframe to be used in FLASC, initialized as a normal pandas dataframe
import pandas as pd

# This dataframe could be used for flasc functions
df = pd.DataFrame(
    {
        "time": [0, 1, 2, 3, 4, 5],
        "pow_000": [100, 100, 100, 100, 100, 100],
        "pow_001": [100, 100, 100, 100, 100, 100],
        "ws_000": [10, 10, 10, 10, 10, 10],
        "ws_001": [10, 10, 10, 10, 10, 10],
        "wd_000": [270, 270, 270, 270, 270, 270],
        "wd_001": [270, 270, 270, 270, 270, 270],
    }
)

## Using FlascDataFrame

In [2]:
# The above dataframe could be converted to a FlascDataFrame directly
from flasc import FlascDataFrame

fdf = FlascDataFrame(df)
print(fdf.head())

FlascDataFrame in FLASC format
   time  pow_000  pow_001  ws_000  ws_001  wd_000  wd_001
0     0      100      100      10      10     270     270
1     1      100      100      10      10     270     270
2     2      100      100      10      10     270     270
3     3      100      100      10      10     270     270
4     4      100      100      10      10     270     270


In [3]:
# The FlascDataFrame adds a few helpers to the base pandas dataframe, such as n_turbines
fdf.n_turbines

2
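
One way such a turbine count could be derived from FlascFormat column names is sketched below. This is an assumption for illustration only: `count_turbines` is a hypothetical helper, and FLASC's own implementation of `n_turbines` may work differently.

```python
import re

# Matches FlascFormat per-turbine columns such as 'pow_000' or 'wd_012'
TURBINE_COLUMN = re.compile(r"^\w+_(\d{3})$")


def count_turbines(columns) -> int:
    """Count distinct three-digit turbine suffixes among the column names."""
    indices = set()
    for name in columns:
        match = TURBINE_COLUMN.match(name)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)


columns = ["time", "pow_000", "pow_001", "ws_000", "ws_001", "wd_000", "wd_001"]
print(count_turbines(columns))
```

Columns without a three-digit suffix, such as `time` or `pow_ref`, are ignored by this sketch.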

## Creating a FlascDataFrame from User Data

A `FlascDataFrame` is most useful for converting back and forth between user-formatted data and FlascFormat data.

In [4]:
import numpy as np

# Suppose we have a 3-turbine farm with turbines named 'TB01', 'TB02', 'TB03'.
# For each turbine we have power, wind speed, and wind direction data.
# Assume that, in the native data collection system,
# the signal names for each channel are those given below.

N = 20  # Number of data points

# Wind speeds
wind_speed_TB01 = np.random.rand(N) + 8.0
wind_speed_TB02 = np.random.rand(N) + 7.5
wind_speed_TB03 = np.random.rand(N) + 8.5

# Wind directions
wind_dir_TB01 = 10 * np.random.rand(N) + 270.0
wind_dir_TB02 = 10 * np.random.rand(N) + 270.0
wind_dir_TB03 = 10 * np.random.rand(N) + 270.0

# Power
power_TB01 = wind_speed_TB01**3
power_TB02 = wind_speed_TB02**3
power_TB03 = wind_speed_TB03**3

# Time
time = np.arange(N)

In [5]:
# Add this data to a pandas dataframe
df = pd.DataFrame(
    {
        "time": time,
        "wind_speed_TB01": wind_speed_TB01,
        "wind_speed_TB02": wind_speed_TB02,
        "wind_speed_TB03": wind_speed_TB03,
        "wind_dir_TB01": wind_dir_TB01,
        "wind_dir_TB02": wind_dir_TB02,
        "wind_dir_TB03": wind_dir_TB03,
        "power_TB01": power_TB01,
        "power_TB02": power_TB02,
        "power_TB03": power_TB03,
    }
)

The data is currently stored using the user's channel and turbine names. By supplying additional metadata to the `FlascDataFrame`, the data can be converted to and from FlascFormat.

In [6]:
# Declare a channel_name_map dictionary that maps the user's signal names to
# FlascFormat column names. The 0-indexed turbine numbers in FlascFormat should
# align with the turbine numbering in the FLORIS model of the same farm.
channel_name_map = {
    "time": "time",
    "wind_speed_TB01": "ws_000",
    "wind_speed_TB02": "ws_001",
    "wind_speed_TB03": "ws_002",
    "wind_dir_TB01": "wd_000",
    "wind_dir_TB02": "wd_001",
    "wind_dir_TB03": "wd_002",
    "power_TB01": "pow_000",
    "power_TB02": "pow_001",
    "power_TB03": "pow_002",
}
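
For farms with many turbines, writing the map by hand becomes tedious. The sketch below builds an equivalent dictionary programmatically; the names `turbine_names`, `signal_prefixes`, and `generated_map` are illustrative assumptions, and the `<signal>_<turbine name>` pattern is specific to this example's user data.

```python
# Hypothetical user-side naming: '<signal>_<turbine name>'
turbine_names = ["TB01", "TB02", "TB03"]  # order must match the FLORIS model
signal_prefixes = {"wind_speed": "ws", "wind_dir": "wd", "power": "pow"}

generated_map = {"time": "time"}
for user_prefix, flasc_prefix in signal_prefixes.items():
    for index, turbine in enumerate(turbine_names):
        generated_map[f"{user_prefix}_{turbine}"] = f"{flasc_prefix}_{index:03d}"

print(generated_map["power_TB03"])
```

The position of each name in `turbine_names` determines its FLASC turbine number, so the list order is the one thing to check against the FLORIS model.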

In [7]:
# Declare an instance of FlascDataFrame
fdf = FlascDataFrame(df, channel_name_map=channel_name_map)
print(fdf.head())

FlascDataFrame in user (wide) format
   time  wind_speed_TB01  wind_speed_TB02  wind_speed_TB03  wind_dir_TB01  \
0     0         8.463413         7.863936         8.776576     278.626071   
1     1         8.711924         8.330972         8.974587     273.507583   
2     2         8.748676         7.875490         9.253844     276.095074   
3     3         8.849804         8.470775         9.172530     275.670068   
4     4         8.931129         8.295958         8.535815     271.398405   

   wind_dir_TB02  wind_dir_TB03  power_TB01  power_TB02  power_TB03  
0     272.173679     270.231054  606.228762  486.317571  676.044609  
1     278.969616     273.886149  661.214183  578.211855  722.842070  
2     275.958102     279.953554  669.617729  488.464150  792.440237  
3     272.821422     278.974526  693.108030  607.812130  771.733565  
4     277.497031     279.340766  712.392037  570.952022  621.920747  


In [8]:
# Now convert into FLASC format
fdf_flasc = fdf.convert_to_flasc_format()
print(fdf_flasc.head())

FlascDataFrame in FLASC format
   time    ws_000    ws_001    ws_002      wd_000      wd_001      wd_002  \
0     0  8.463413  7.863936  8.776576  278.626071  272.173679  270.231054   
1     1  8.711924  8.330972  8.974587  273.507583  278.969616  273.886149   
2     2  8.748676  7.875490  9.253844  276.095074  275.958102  279.953554   
3     3  8.849804  8.470775  9.172530  275.670068  272.821422  278.974526   
4     4  8.931129  8.295958  8.535815  271.398405  277.497031  279.340766   

      pow_000     pow_001     pow_002  
0  606.228762  486.317571  676.044609  
1  661.214183  578.211855  722.842070  
2  669.617729  488.464150  792.440237  
3  693.108030  607.812130  771.733565  
4  712.392037  570.952022  621.920747  
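
Converting in the opposite direction, from FlascFormat names back to user names, conceptually needs the inverse of the channel name map. The following is a minimal sketch of that inversion in plain Python; `invert_name_map` is a hypothetical helper and not a FLASC function.

```python
def invert_name_map(name_map: dict) -> dict:
    """Swap keys and values, refusing maps that are not one-to-one."""
    inverse = {}
    for user_name, flasc_name in name_map.items():
        if flasc_name in inverse:
            raise ValueError(f"{flasc_name!r} is mapped from more than one channel")
        inverse[flasc_name] = user_name
    return inverse


example_map = {"time": "time", "wind_speed_TB01": "ws_000", "power_TB01": "pow_000"}
flasc_to_user = invert_name_map(example_map)
print(flasc_to_user["ws_000"])
```

The one-to-one check matters because two user channels mapped to the same FlascFormat column could not be told apart on the way back.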


## Converting Wide and Long

`FlascDataFrame` also provides methods to convert between wide and long formats. FlascFormat is always "wide": each channel has its own column. A `FlascDataFrame` can also work with a user format that is "long", where each row holds a single value for one channel at one time.

In [12]:
# Convert the user data into a long format
df = pd.DataFrame(
    {
        "time": time,
        "wind_speed_TB01": wind_speed_TB01,
        "wind_speed_TB02": wind_speed_TB02,
        "wind_speed_TB03": wind_speed_TB03,
        "wind_dir_TB01": wind_dir_TB01,
        "wind_dir_TB02": wind_dir_TB02,
        "wind_dir_TB03": wind_dir_TB03,
        "power_TB01": power_TB01,
        "power_TB02": power_TB02,
        "power_TB03": power_TB03,
    }
)

df = pd.melt(df, id_vars=["time"], var_name="channel", value_name="value")
print(df.head())
print(df.tail())

   time          channel     value
0     0  wind_speed_TB01  8.463413
1     1  wind_speed_TB01  8.711924
2     2  wind_speed_TB01  8.748676
3     3  wind_speed_TB01  8.849804
4     4  wind_speed_TB01  8.931129
     time     channel       value
175    15  power_TB03  747.868257
176    16  power_TB03  804.574705
177    17  power_TB03  645.420723
178    18  power_TB03  726.957312
179    19  power_TB03  805.571713


In [14]:
# This time, also pass the names of the long-format columns
# when constructing the FlascDataFrame
fdf = FlascDataFrame(
    df,
    channel_name_map=channel_name_map,
    long_data_columns={"variable_column": "channel", "value_column": "value"},
)

In [15]:
print(fdf.head())

FlascDataFrame in user (long) format
   time          channel     value
0     0  wind_speed_TB01  8.463413
1     1  wind_speed_TB01  8.711924
2     2  wind_speed_TB01  8.748676
3     3  wind_speed_TB01  8.849804
4     4  wind_speed_TB01  8.931129


In [16]:
print(fdf.convert_to_flasc_format().head())

FlascDataFrame in FLASC format
   time     pow_000     pow_001     pow_002      wd_000      wd_001  \
0     0  606.228762  486.317571  676.044609  278.626071  272.173679   
1     1  661.214183  578.211855  722.842070  273.507583  278.969616   
2     2  669.617729  488.464150  792.440237  276.095074  275.958102   
3     3  693.108030  607.812130  771.733565  275.670068  272.821422   
4     4  712.392037  570.952022  621.920747  271.398405  277.497031   

       wd_002    ws_000    ws_001    ws_002  
0  270.231054  8.463413  7.863936  8.776576  
1  273.886149  8.711924  8.330972  8.974587  
2  279.953554  8.748676  7.875490  9.253844  
3  278.974526  8.849804  8.470775  9.172530  
4  279.340766  8.931129  8.295958  8.535815  
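
Conceptually, going from long user data to wide FlascFormat amounts to renaming the channels and pivoting them into columns. The sketch below shows this with plain pandas on a tiny, made-up table; it is not FLASC's implementation, and `long_df`, `name_map`, and `wide_df` are illustrative names.

```python
import pandas as pd

# Small long-format table using the same column names as the example above
long_df = pd.DataFrame(
    {
        "time": [0, 1, 0, 1],
        "channel": ["wind_speed_TB01", "wind_speed_TB01", "power_TB01", "power_TB01"],
        "value": [8.0, 9.0, 512.0, 729.0],
    }
)
name_map = {"time": "time", "wind_speed_TB01": "ws_000", "power_TB01": "pow_000"}

# Rename the channels, then give each channel its own column
wide_df = (
    long_df.assign(channel=long_df["channel"].map(name_map))
    .pivot(index="time", columns="channel", values="value")
    .reset_index()
)
wide_df.columns.name = None  # drop the leftover 'channel' axis label
print(wide_df)
```

Note that `pivot` returns the channel columns in sorted order, which is consistent with the `pow`, `wd`, `ws` ordering seen in the converted output above.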


## Converting to wind-up format

A final use case for `FlascDataFrame` is converting the data into the "wind-up" format. [Wind-up](https://github.com/resgroup/wind-up) is an open-source tool from RES for assessing uplift. The conversion makes it convenient to run an uplift assessment on the data with the wind-up tool, which is imported by FLASC. A full demonstration of the wind-up tool in FLASC is provided within the [Smarteole](https://github.com/NREL/flasc/tree/main/examples_smarteole) example set.

In [18]:
fdf = fdf.convert_to_flasc_format()
print(fdf.convert_to_windup_format().head())

                       raw_ActivePowerMean  raw_YawAngleMean  \
TimeStamp_StartFormat                                          
0                               606.228762        278.626071   
1                               661.214183        273.507583   
2                               669.617729        276.095074   
3                               693.108030        275.670068   
4                               712.392037        271.398405   

                       raw_WindSpeedMean TurbineName  PitchAngleMean  \
TimeStamp_StartFormat                                                  
0                               8.463413         000               0   
1                               8.711924         000               0   
2                               8.748676         000               0   
3                               8.849804         000               0   
4                               8.931129         000               0   

                       GenRpmMean  raw_Shutdow
