In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('Turbine_Data.csv')

In [3]:
df.columns

Index(['Unnamed: 0', 'ActivePower', 'AmbientTemperatue',
       'BearingShaftTemperature', 'Blade1PitchAngle', 'Blade2PitchAngle',
       'Blade3PitchAngle', 'ControlBoxTemperature',
       'GearboxBearingTemperature', 'GearboxOilTemperature', 'GeneratorRPM',
       'GeneratorWinding1Temperature', 'GeneratorWinding2Temperature',
       'HubTemperature', 'MainBoxTemperature', 'NacellePosition',
       'ReactivePower', 'RotorRPM', 'TurbineStatus', 'WTG', 'WindDirection',
       'WindSpeed'],
      dtype='object')

In [4]:
target_col = 'ActivePower'

In [5]:
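# Fix the misspelled 'AmbientTemperatue' column and give the unnamed first column a descriptive name.
df = df.rename(columns={'AmbientTemperatue': 'AmbientTemperature', 'Unnamed: 0': 'timestamp'})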

In [6]:
# Candidate predictors: everything except the timestamp, the turbine identifier, and the target.
input_df = df.drop(columns=['timestamp', 'WTG', target_col])

In [7]:
input_df.columns

Index(['AmbientTemperature', 'BearingShaftTemperature', 'Blade1PitchAngle',
       'Blade2PitchAngle', 'Blade3PitchAngle', 'ControlBoxTemperature',
       'GearboxBearingTemperature', 'GearboxOilTemperature', 'GeneratorRPM',
       'GeneratorWinding1Temperature', 'GeneratorWinding2Temperature',
       'HubTemperature', 'MainBoxTemperature', 'NacellePosition',
       'ReactivePower', 'RotorRPM', 'TurbineStatus', 'WindDirection',
       'WindSpeed'],
      dtype='object')

In [8]:
prompt_template = (
    'Select the variables from the list that are most relevant for predicting <target_variable>. '
    'Provide the variables sorted starting with the one with the highest priority. '
    'All variables: <all_variables>\n'
    '```json\n{"reasoning": "<your reasoning>", "selected_variables": ["variable 1", "variable 2", ..., "variable n"]}\n```'
)

In [9]:
prompt = prompt_template.replace('<all_variables>', ', '.join(input_df.columns))
prompt = prompt.replace('<target_variable>', 'the wind power that could be generated from the windmill')
print(prompt)

Select the variables from the list that are most relevant for predicting the wind power that could be generated from the windmill. Provide the variables sorted starting with the one with the highest priority. All variables: AmbientTemperature, BearingShaftTemperature, Blade1PitchAngle, Blade2PitchAngle, Blade3PitchAngle, ControlBoxTemperature, GearboxBearingTemperature, GearboxOilTemperature, GeneratorRPM, GeneratorWinding1Temperature, GeneratorWinding2Temperature, HubTemperature, MainBoxTemperature, NacellePosition, ReactivePower, RotorRPM, TurbineStatus, WindDirection, WindSpeed
```json
{"reasoning": "<your reasoning>", "selected_variables": ["variable 1", "variable 2", ..., "variable n"]}
```
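
The prompt asks the model to answer with a JSON object inside a json code fence. A minimal sketch of pulling that object out of a raw reply string, assuming the reply follows the requested format. The helper `extract_json_reply` and the sample `raw_reply` are hypothetical and are not part of the notebook above.

```python
import json
import re


def extract_json_reply(raw_reply: str) -> dict:
    """Return the JSON object from a reply, with or without a json code fence."""
    # Prefer the content of a fenced block; otherwise try the whole reply.
    match = re.search(r"`{3}(?:json)?\s*(\{.*\})\s*`{3}", raw_reply, flags=re.DOTALL)
    payload = match.group(1) if match else raw_reply
    parsed = json.loads(payload)
    # The prompt requests exactly these two keys.
    missing = {"reasoning", "selected_variables"} - set(parsed)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return parsed


# Hypothetical reply, built without typing a literal fence inside this block.
fence = "`" * 3
raw_reply = (
    "Here is my answer:\n"
    f"{fence}json\n"
    '{"reasoning": "Wind speed drives power.", "selected_variables": ["WindSpeed", "RotorRPM"]}\n'
    f"{fence}"
)
reply = extract_json_reply(raw_reply)
print(reply["selected_variables"])
```

Raising on missing keys keeps a malformed reply from silently producing an empty selection further down.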


In [10]:
preds_chatgpt = {
  "reasoning": "To predict the wind power generated from a windmill, the most critical factor is the kinetic energy available in the wind, which is primarily determined by wind speed. Rotor RPM and Generator RPM are key mechanical outputs directly driven by wind energy and influence power generation. Blade pitch angles (Blade1PitchAngle, Blade2PitchAngle, Blade3PitchAngle) affect how effectively the wind's energy is captured. Wind direction is important for optimizing turbine alignment. ReactivePower can reflect electrical behavior but is secondary. Other temperatures (e.g., GeneratorWinding, GearboxOil) indicate operational conditions and potential losses but are less directly tied to power output. TurbineStatus and WTG are identifiers or states that might influence filtering but are not continuous predictors. ControlBoxTemperature, BearingShaftTemperature, and similar are more related to system health monitoring.",
  "selected_variables": [
    "WindSpeed",
    "RotorRPM",
    "GeneratorRPM",
    "Blade1PitchAngle",
    "Blade2PitchAngle",
    "Blade3PitchAngle",
    "WindDirection",
    "ReactivePower",
    "GeneratorWinding1Temperature",
    "GeneratorWinding2Temperature",
    "GearboxOilTemperature",
    "GearboxBearingTemperature",
    "HubTemperature",
    "MainBoxTemperature",
    "ControlBoxTemperature",
    "BearingShaftTemperature"
  ]
}
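
Before using the model's list, it is worth checking it against the real column names, since a model can invent names or leave some out. The sketch below hardcodes copies of `input_df.columns` and `preds_chatgpt['selected_variables']` from the cells above so that it runs on its own; in the notebook the two variables could be used directly.

```python
# Copied from the input_df.columns output above.
input_columns = [
    "AmbientTemperature", "BearingShaftTemperature", "Blade1PitchAngle",
    "Blade2PitchAngle", "Blade3PitchAngle", "ControlBoxTemperature",
    "GearboxBearingTemperature", "GearboxOilTemperature", "GeneratorRPM",
    "GeneratorWinding1Temperature", "GeneratorWinding2Temperature",
    "HubTemperature", "MainBoxTemperature", "NacellePosition",
    "ReactivePower", "RotorRPM", "TurbineStatus", "WindDirection",
    "WindSpeed",
]

# Copied from preds_chatgpt["selected_variables"] above.
selected = [
    "WindSpeed", "RotorRPM", "GeneratorRPM", "Blade1PitchAngle",
    "Blade2PitchAngle", "Blade3PitchAngle", "WindDirection", "ReactivePower",
    "GeneratorWinding1Temperature", "GeneratorWinding2Temperature",
    "GearboxOilTemperature", "GearboxBearingTemperature", "HubTemperature",
    "MainBoxTemperature", "ControlBoxTemperature", "BearingShaftTemperature",
]

# Names the model returned that do not exist in the data.
unknown = [name for name in selected if name not in input_columns]
# Columns the model left out of its ranking.
omitted = [name for name in input_columns if name not in selected]
# Names the model listed more than once.
repeated = sorted({name for name in selected if selected.count(name) > 1})

print("unknown:", unknown)
print("omitted:", omitted)
print("repeated:", repeated)
```

For this reply every returned name is a real column, and three columns (AmbientTemperature, NacellePosition, TurbineStatus) are absent from the ranking.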

In [11]:
df_all = input_df.copy()

In [12]:
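# shift(1) moves the target down one row, so each row's features are paired with the previous row's ActivePower.
df_all['ActivePower'] = df[target_col].shift(1)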

In [13]:
# Absolute pairwise Pearson correlations (the pandas default method).
all_corrs = df_all.corr().abs()

In [16]:
all_corrs['ActivePower'].sort_values(ascending=False)

ActivePower                     1.000000
GeneratorWinding2Temperature    0.944020
GeneratorWinding1Temperature    0.942895
WindSpeed                       0.920194
GeneratorRPM                    0.828287
GearboxOilTemperature           0.828254
RotorRPM                        0.827884
GearboxBearingTemperature       0.807685
ReactivePower                   0.699559
BearingShaftTemperature         0.647259
Blade1PitchAngle                0.359975
Blade3PitchAngle                0.357485
Blade2PitchAngle                0.357485
HubTemperature                  0.335836
MainBoxTemperature              0.089761
AmbientTemperature              0.074715
NacellePosition                 0.030422
WindDirection                   0.030422
TurbineStatus                   0.000612
ControlBoxTemperature                NaN
Name: ActivePower, dtype: float64
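
ControlBoxTemperature shows NaN rather than a number. Pearson correlation is undefined for a column with zero variance, and pandas also returns NaN when a column has too few non-missing values; the output above does not say which case applies here. A small sketch on toy data (the frame `toy` and its column names are made up for illustration) shows the constant-column case and two quick diagnostics that could be run on `df_all`.

```python
import pandas as pd

# Toy data: one informative column and one constant column.
toy = pd.DataFrame({
    "target": [1.0, 2.0, 3.0, 4.0],
    "varying": [2.0, 4.1, 5.9, 8.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
})

# The constant column has zero variance, so its correlation is NaN.
corr_with_target = toy.corr()["target"]
print(corr_with_target)

# Diagnostics: a single distinct value marks a constant column,
# and a missing fraction of 1.0 marks an empty one.
print(toy.nunique())
print(toy.isna().mean())
```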

In [43]:
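# Absolute correlation with ActivePower, in percent, for the model's five highest-priority variables.
[round(all_corrs.loc[pred, 'ActivePower'] * 100, 2) for pred in preds_chatgpt['selected_variables'][:5]]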

[92.02, 82.79, 82.83, 36.0, 35.75]
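
One way to put the model's ranking next to the correlation ranking is to compare their top five entries. The sketch below hardcodes the absolute correlations printed above (ControlBoxTemperature is left out because its value is NaN) and the first five names from the model's list; `k` and the variable names are choices made for this illustration.

```python
# Absolute correlations with ActivePower, copied from the output above.
abs_corr = {
    "GeneratorWinding2Temperature": 0.944020,
    "GeneratorWinding1Temperature": 0.942895,
    "WindSpeed": 0.920194,
    "GeneratorRPM": 0.828287,
    "GearboxOilTemperature": 0.828254,
    "RotorRPM": 0.827884,
    "GearboxBearingTemperature": 0.807685,
    "ReactivePower": 0.699559,
    "BearingShaftTemperature": 0.647259,
    "Blade1PitchAngle": 0.359975,
    "Blade3PitchAngle": 0.357485,
    "Blade2PitchAngle": 0.357485,
    "HubTemperature": 0.335836,
    "MainBoxTemperature": 0.089761,
    "AmbientTemperature": 0.074715,
    "NacellePosition": 0.030422,
    "WindDirection": 0.030422,
    "TurbineStatus": 0.000612,
}

# First five entries of the model's selected_variables list.
llm_top = ["WindSpeed", "RotorRPM", "GeneratorRPM", "Blade1PitchAngle", "Blade2PitchAngle"]

k = 5
# Rank columns by absolute correlation, highest first.
corr_ranking = sorted(abs_corr, key=abs_corr.get, reverse=True)
corr_top = corr_ranking[:k]

# Variables that both rankings place in their top k.
shared = [name for name in llm_top if name in corr_top]
print("correlation top", k, ":", corr_top)
print("shared:", shared, f"({len(shared)}/{k})")

# Position of each model pick in the correlation ranking (1 = strongest).
for name in llm_top:
    print(name, corr_ranking.index(name) + 1)
```

Top-k overlap ignores order within the top k; a rank correlation such as Spearman's over the full lists would be a stricter comparison.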