# FLASC data format

Data used by FLASC adheres to the following conventions:

- `time` represents the time, preferably in UTC
-  turbines are sequentially numbered, starting from 0, and numbers are always 3 digits long (e.g. the "8th" turbine is represented as `007`)
-  `pow_000` represents the power output of turbine 0
-  `ws_000` represents the wind speed at turbine 0
-  `wd_000` represents the wind direction at turbine 0
-  `wd` represents a wind direction chosen, for example, to represent the overall inflow direction
-  `ws` represents a wind speed chosen, for example, to represent the overall inflow speed
-  `pow_ref` represents the power output of the reference turbine (or average of reference turbines)
-  `pow_test` represents the power output of the test turbine (or average of test turbines)
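
The naming convention above can be sketched in a few lines of plain Python. The helper `flasc_column` and the pattern `TURBINE_COLUMN` are hypothetical names introduced only for this illustration and are not part of FLASC; the pattern assumes that only the `pow`, `ws` and `wd` prefixes listed above are in use.

```python
import re


def flasc_column(prefix: str, turbine_index: int) -> str:
    """Return a FLASC-style column name, zero-padded to three digits."""
    return f"{prefix}_{turbine_index:03d}"


# Per-turbine channels: a known prefix, an underscore, then exactly three digits.
# Assumption: only the prefixes listed in the conventions above are considered.
TURBINE_COLUMN = re.compile(r"^(pow|ws|wd)_(\d{3})$")

# Column names for a two-turbine farm
columns = [flasc_column(p, i) for p in ("pow", "ws", "wd") for i in range(2)]
print(columns)

# The "8th" turbine has index 7
print(flasc_column("pow", 7))  # pow_007

# A name without zero padding does not follow the convention
print(TURBINE_COLUMN.match("pow_7") is None)  # True
```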

In [5]:
import pandas as pd

# This dataframe adheres to FLASC's data formatting requirements and could be used for
# FLASC analysis
df = pd.DataFrame(
    {
        "time": [0, 1, 2, 3, 4, 5],
        "pow_000": [100, 100, 100, 100, 100, 100],
        "pow_001": [100, 100, 100, 100, 100, 100],
        "ws_000": [10, 10, 10, 10, 10, 10],
        "ws_001": [10, 10, 10, 10, 10, 10],
        "wd_000": [270, 270, 270, 270, 270, 270],
        "wd_001": [270, 270, 270, 270, 270, 270],
    }
)
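
The dataframe above contains only per-turbine channels. As a rough sketch of how the farm-level `ws` and `wd` columns from the conventions list might be derived, the example below averages the per-turbine signals using only pandas and NumPy. Averaging over all turbines is an assumption made for illustration (the conventions leave the choice of reference open), and `df_example` is a hypothetical dataframe. Wind direction is averaged with a circular mean so that directions on either side of north do not average to south.

```python
import numpy as np
import pandas as pd

# Hypothetical two-turbine dataframe in FLASC format
df_example = pd.DataFrame(
    {
        "time": [0, 1, 2],
        "ws_000": [10.0, 9.0, 8.0],
        "ws_001": [10.0, 11.0, 8.0],
        "wd_000": [270.0, 350.0, 10.0],
        "wd_001": [270.0, 10.0, 30.0],
    }
)

ws_cols = [c for c in df_example.columns if c.startswith("ws_")]
wd_cols = [c for c in df_example.columns if c.startswith("wd_")]

# Wind speed: plain arithmetic mean across turbines
df_example["ws"] = df_example[ws_cols].mean(axis=1)

# Wind direction: circular mean across turbines, wrapped to [0, 360)
wd_rad = np.deg2rad(df_example[wd_cols].to_numpy())
mean_sin = np.sin(wd_rad).mean(axis=1)
mean_cos = np.cos(wd_rad).mean(axis=1)
df_example["wd"] = np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0

print(df_example[["time", "ws", "wd"]])
```

In the second row, 350 and 10 degrees average to a direction at north (up to floating-point rounding, which may print as a value just below 360) rather than to 180.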

# `FlascDataFrame`

FLASC has historically used a `pandas.DataFrame` to store the data to be processed, as demonstrated above. Beginning in version 2.1, the `FlascDataFrame` class was introduced to provide additional methods and functionality on top of the data. `FlascDataFrame` is a subclass of `pandas.DataFrame` and can be used in place of a `pandas.DataFrame`. The following code cells provide an overview of the `FlascDataFrame` class and its methods. The class also supports converting between "FLASC" style data formatting and "user" formats, which makes it more straightforward to adhere to FLASC's data formatting conventions.

## Using FlascDataFrame

In [6]:
# The above pandas.DataFrame can be converted to a FlascDataFrame directly
from flasc import FlascDataFrame

fdf = FlascDataFrame(df)
print(fdf.head())

FlascDataFrame in FLASC format
   time  pow_000  pow_001  ws_000  ws_001  wd_000  wd_001
0     0      100      100      10      10     270     270
1     1      100      100      10      10     270     270
2     2      100      100      10      10     270     270
3     3      100      100      10      10     270     270
4     4      100      100      10      10     270     270


In [7]:
# The FlascDataFrame includes a few helper functions added to the base pandas dataframe.
# The following returns the number of turbines found in the dataframe.
print(fdf.n_turbines)

2
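
For intuition, a turbine count like this could be recovered from the column names alone. The sketch below is not FLASC's implementation; `count_turbines` is a hypothetical helper, and counting the `pow_XXX` columns (rather than `ws_XXX` or `wd_XXX`) is an assumption made for this example.

```python
import re


def count_turbines(columns) -> int:
    """Count distinct three-digit turbine suffixes among pow_XXX columns."""
    indices = set()
    for column in columns:
        match = re.fullmatch(r"pow_(\d{3})", str(column))
        if match:
            indices.add(match.group(1))
    return len(indices)


example_columns = ["time", "pow_000", "pow_001", "ws_000", "ws_001", "wd_000", "wd_001"]
print(count_turbines(example_columns))  # 2
```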


## Creating a FlascDataFrame from User Data

A `FlascDataFrame` is most valuable when it is used to convert back and forth between user-formatted data and FLASC-formatted data.

In [8]:
import numpy as np

# Suppose we have a 3-turbine farm with turbines named 'TB01', 'TB02', 'TB03'.
# For each turbine we have power, wind speed and wind direction data.
# Assume that, in the native data collection system,
# the signal names for each channel are as given below.

N = 20  # Number of data points

# Wind speeds
wind_speed_TB01 = np.random.rand(N) + 8.0
wind_speed_TB02 = np.random.rand(N) + 7.5
wind_speed_TB03 = np.random.rand(N) + 8.5

# Wind directions
wind_dir_TB01 = 10 * np.random.rand(N) + 270.0
wind_dir_TB02 = 10 * np.random.rand(N) + 270.0
wind_dir_TB03 = 10 * np.random.rand(N) + 270.0

# Power
power_TB01 = wind_speed_TB01**3
power_TB02 = wind_speed_TB02**3
power_TB03 = wind_speed_TB03**3

# Time
time = np.arange(N)

In [9]:
# Create a dictionary storing this data, which could be used to instantiate a pandas.DataFrame
# or a FlascDataFrame
data_dict = {
    "time": time,
    "wind_speed_TB01": wind_speed_TB01,
    "wind_speed_TB02": wind_speed_TB02,
    "wind_speed_TB03": wind_speed_TB03,
    "wind_dir_TB01": wind_dir_TB01,
    "wind_dir_TB02": wind_dir_TB02,
    "wind_dir_TB03": wind_dir_TB03,
    "power_TB01": power_TB01,
    "power_TB02": power_TB02,
    "power_TB03": power_TB03,
}

The data is currently stored using the user's channel and turbine names. By supplying additional metadata to the `FlascDataFrame`, the data can be converted to and from the FLASC format.

In [10]:
# Declare a channel_name_map dictionary to map the user's signal names to FLASC channel names.
# The turbine numbers, 0-indexed in FLASC format, should
# align with their numbering in the FLORIS model of the same farm.
channel_name_map = {
    "time": "time",
    "wind_speed_TB01": "ws_000",
    "wind_speed_TB02": "ws_001",
    "wind_speed_TB03": "ws_002",
    "wind_dir_TB01": "wd_000",
    "wind_dir_TB02": "wd_001",
    "wind_dir_TB03": "wd_002",
    "power_TB01": "pow_000",
    "power_TB02": "pow_001",
    "power_TB03": "pow_002",
}
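
For larger farms, writing such a map by hand becomes tedious. As a minimal sketch, assuming the user's signals follow a regular `<prefix>_<turbine name>` pattern as in this example, the same dictionary can be generated in a loop. The names `turbine_names`, `signal_prefixes` and `channel_name_map_generated` are introduced here for illustration only.

```python
# Turbine names in the order of the FLORIS model (index 0 first)
turbine_names = ["TB01", "TB02", "TB03"]

# User signal prefix -> FLASC channel prefix
signal_prefixes = {"wind_speed": "ws", "wind_dir": "wd", "power": "pow"}

channel_name_map_generated = {"time": "time"}
for user_prefix, flasc_prefix in signal_prefixes.items():
    for index, name in enumerate(turbine_names):
        # Zero-pad the turbine index to three digits, e.g. ws_000
        channel_name_map_generated[f"{user_prefix}_{name}"] = f"{flasc_prefix}_{index:03d}"

print(channel_name_map_generated)
```

The order of `turbine_names` determines the FLASC turbine index, so it should match the turbine ordering in the FLORIS model.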

We are now in a position to instantiate a `FlascDataFrame`:

In [12]:
fdf = FlascDataFrame(data_dict, channel_name_map=channel_name_map)
print(fdf.head())

FlascDataFrame in user (wide) format
   time  wind_speed_TB01  wind_speed_TB02  wind_speed_TB03  wind_dir_TB01  \
0     0         8.863084         7.635063         8.560377     278.647284   
1     1         8.937316         7.986625         8.865439     275.910954   
2     2         8.523315         7.718569         8.507713     276.397628   
3     3         8.807743         7.940359         9.372092     278.976430   
4     4         8.099530         7.620465         9.386467     274.257194   

   wind_dir_TB02  wind_dir_TB03  power_TB01  power_TB02  power_TB03  
0     279.551985     278.088943  696.233092  445.079843  627.304990  
1     274.648224     278.969281  713.873585  509.436367  696.788230  
2     272.939598     277.226717  619.192306  459.843842  615.798372  
3     278.207210     275.532743  683.272356  500.634015  823.208129  
4     279.698878     277.059578  531.348418  442.531676  827.001823  




Converting this to the FLASC format (and back) now simply requires calling the appropriate method. This makes it convenient to work with FLASC functions (that require the data to be in FLASC format) and user-provided functions (that may require the user's formatting) within the same workflow.
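
Conceptually, a wide-format conversion amounts to renaming columns with the channel name map, and converting back uses the inverted map. The sketch below shows that idea with plain pandas only; it is an illustration of the concept, not a description of how `FlascDataFrame` implements it. The names `user_df`, `name_map`, `flasc_like` and `round_trip` are hypothetical.

```python
import pandas as pd

# Small user-formatted dataframe with one turbine
user_df = pd.DataFrame(
    {
        "time": [0, 1],
        "wind_speed_TB01": [8.1, 8.3],
        "power_TB01": [531.0, 572.0],
    }
)
name_map = {"time": "time", "wind_speed_TB01": "ws_000", "power_TB01": "pow_000"}

# User format -> FLASC-style column names
flasc_like = user_df.rename(columns=name_map)
print(list(flasc_like.columns))  # ['time', 'ws_000', 'pow_000']

# FLASC-style column names -> user format, using the inverted map
inverse_map = {flasc: user for user, flasc in name_map.items()}
round_trip = flasc_like.rename(columns=inverse_map)
print(round_trip.equals(user_df))  # True
```

Inverting the map this way only works when the mapping is one-to-one, that is, when no two user channels map to the same FLASC name.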

In [15]:
# Convert now into FLASC format (as a copy)
fdf_flasc = fdf.convert_to_flasc_format()
print(fdf_flasc.head(2))

print("\n\n")
# Convert back to user format (as a copy)
fdf_user = fdf_flasc.convert_to_user_format()
print(fdf_user.head(2))

print("\n\n")
# Conversions can also happen in place, if the inplace argument is set to True
fdf.convert_to_flasc_format(inplace=True)
print(fdf.head(2))
print("\n")
fdf.convert_to_user_format(inplace=True)
print(fdf.head(2))

FlascDataFrame in FLASC format
   time    ws_000    ws_001    ws_002      wd_000      wd_001      wd_002  \
0     0  8.863084  7.635063  8.560377  278.647284  279.551985  278.088943   
1     1  8.937316  7.986625  8.865439  275.910954  274.648224  278.969281   

      pow_000     pow_001    pow_002  
0  696.233092  445.079843  627.30499  
1  713.873585  509.436367  696.78823  



FlascDataFrame in user (wide) format
   time  wind_speed_TB01  wind_speed_TB02  wind_speed_TB03  wind_dir_TB01  \
0     0         8.863084         7.635063         8.560377     278.647284   
1     1         8.937316         7.986625         8.865439     275.910954   

   wind_dir_TB02  wind_dir_TB03  power_TB01  power_TB02  power_TB03  
0     279.551985     278.088943  696.233092  445.079843   627.30499  
1     274.648224     278.969281  713.873585  509.436367   696.78823  



FlascDataFrame in FLASC format
   time    ws_000    ws_001    ws_002      wd_000      wd_001      wd_002  \
0     0  8.863084  7.635063

## Converting between Wide and Long Formats

`FlascDataFrame` also provides methods to convert between wide and long formats. FLASC's native format is always "wide", that is, each channel has its own column. However, `FlascDataFrame` can also convert to and from a user format that is "long", where each row holds the value of a single channel at a single time.

In [21]:
df = pd.DataFrame(
    {
        "time": time,
        "wind_speed_TB01": wind_speed_TB01,
        "wind_speed_TB02": wind_speed_TB02,
        "wind_speed_TB03": wind_speed_TB03,
        "wind_dir_TB01": wind_dir_TB01,
        "wind_dir_TB02": wind_dir_TB02,
        "wind_dir_TB03": wind_dir_TB03,
        "power_TB01": power_TB01,
        "power_TB02": power_TB02,
        "power_TB03": power_TB03,
    }
)

# Convert to "long" format; this is taken to be the user's desired format in this example.
df = pd.melt(df, id_vars=["time"], var_name="channel", value_name="value")
print(df)

     time          channel       value
0       0  wind_speed_TB01    8.863084
1       1  wind_speed_TB01    8.937316
2       2  wind_speed_TB01    8.523315
3       3  wind_speed_TB01    8.807743
4       4  wind_speed_TB01    8.099530
..    ...              ...         ...
175    15       power_TB03  649.226050
176    16       power_TB03  767.097096
177    17       power_TB03  796.813641
178    18       power_TB03  837.215661
179    19       power_TB03  631.403542

[180 rows x 3 columns]
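
The reverse operation, from long back to wide, can be sketched with `DataFrame.pivot`. This uses only pandas and small hypothetical dataframes (`wide`, `long`, `wide_again`); it is meant to illustrate the two layouts rather than FLASC's own conversion code.

```python
import pandas as pd

# Small wide-format dataframe: one column per channel
wide = pd.DataFrame(
    {
        "time": [0, 1],
        "wind_speed_TB01": [8.0, 8.5],
        "power_TB01": [512.0, 614.125],
    }
)

# Wide -> long: one row per (time, channel) pair
long = pd.melt(wide, id_vars=["time"], var_name="channel", value_name="value")
print(long)

# Long -> wide: one column per channel again
wide_again = long.pivot(index="time", columns="channel", values="value").reset_index()
wide_again.columns.name = None  # drop the leftover "channel" label on the column index
print(wide_again)
```

Note that `pivot` returns the channel columns in sorted order, so the column order of `wide_again` differs from that of `wide` even though the values are the same.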


In [17]:
# This time include in the specification of the FlascDataFrame the name of the
# columns of the long data
fdf = FlascDataFrame(
    df,
    channel_name_map=channel_name_map,
    long_data_columns={"variable_column": "channel", "value_column": "value"},
)
print(fdf.head())



The data can still be converted to FLASC format (and back):

In [20]:
fdf_flasc = fdf.convert_to_flasc_format()
print(fdf_flasc.head(2))
print("\n\n")
fdf_user = fdf_flasc.convert_to_user_format()
print(fdf_user.head(2))

# As before, conversions can also happen in place, if the inplace argument is set to True

FlascDataFrame in FLASC format
   time     pow_000     pow_001    pow_002      wd_000      wd_001  \
0     0  696.233092  445.079843  627.30499  278.647284  279.551985   
1     1  713.873585  509.436367  696.78823  275.910954  274.648224   

       wd_002    ws_000    ws_001    ws_002  
0  278.088943  8.863084  7.635063  8.560377  
1  278.969281  8.937316  7.986625  8.865439  



FlascDataFrame in user (long) format
   time     channel       value
0     0  power_TB01  696.233092
1     0  power_TB02  445.079843




## Converting to wind-up format

Another use case for `FlascDataFrame` is converting the data into the "wind-up" format. [Wind-up](https://github.com/resgroup/wind-up) is an open-source tool, provided by RES, for assessing uplift. The conversion offers a convenient way to run an uplift assessment on the data with the wind-up tool, which FLASC imports. A full demonstration of using the wind-up tool in FLASC is provided in the [Smarteole](https://github.com/NREL/flasc/tree/main/examples_smarteole) example set.

In [13]:
fdf = fdf.convert_to_flasc_format()
print(fdf.convert_to_windup_format().head())

                       raw_ActivePowerMean  raw_YawAngleMean  \
TimeStamp_StartFormat                                          
0                               588.357466        273.281758   
1                               588.362729        272.688047   
2                               607.726306        270.352028   
3                               710.802621        276.258398   
4                               596.086376        279.982122   

                       raw_WindSpeedMean TurbineName  PitchAngleMean  \
TimeStamp_StartFormat                                                  
0                               8.379416         000               0   
1                               8.379441         000               0   
2                               8.470376         000               0   
3                               8.924482         000               0   
4                               8.415948         000               0   

                       GenRpmMean  raw_Shutdow

