### pcoe17

# **Turbofan Engine Dataset Variable Descriptions**

### **Variables in `A_var` (Auxiliary Data)**
| Variable  | Description                                                                                     | Usage                                       |
|-----------|-------------------------------------------------------------------------------------------------|---------------------------------------------|
| `unit`    | Identifier for a specific engine or unit. Each unit represents an individual engine in the dataset. | Tracks the operational history of each engine. |
| `cycle`   | Operational cycle count for the engine. Effectively the "time index" for each engine.           | Indicates how far along the engine is in its lifecycle. |
| `Fc`      | Flight class. A discrete category describing the flight length (for example short, medium, or long flights), not an altitude band or mission phase. | Differentiates between flight-length regimes. |
| `hs`      | Health state. A binary flag indicating whether the engine is in a nominal (healthy) or degraded state. | Useful for classifying engine behavior.     |

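To illustrate how `unit` and `cycle` index the data, the sketch below counts the cycles recorded per engine and the samples recorded per cycle. The small frame is synthetic stand-in data with the same column names as `A_var`, not values taken from the dataset.

```python
import pandas as pd

# Synthetic stand-in for A_dev: several rows (samples) can share one (unit, cycle) pair.
df_a = pd.DataFrame({
    "unit":  [1, 1, 1, 1, 2, 2, 2],
    "cycle": [1, 1, 2, 3, 1, 2, 2],
    "Fc":    [1, 1, 1, 1, 3, 3, 3],
    "hs":    [1, 1, 1, 0, 1, 1, 0],
})

# Number of distinct cycles recorded for each engine.
cycles_per_unit = df_a.groupby("unit")["cycle"].nunique()

# Number of samples recorded within each (unit, cycle) pair.
samples_per_cycle = df_a.groupby(["unit", "cycle"]).size()

print(cycles_per_unit.to_dict())
print(samples_per_cycle.to_dict())
```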
---

### **Variables in `T_var` (Engine Degradation Factors)**
| Variable        | Description                                                                                      | Usage                                                        |
|------------------|--------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| `fan_eff_mod`    | Efficiency modifier for the fan.                                                                | Affects the fan's performance, influencing airflow and thrust. |
| `fan_flow_mod`   | Flow modifier for the fan.                                                                      | Impacts the amount of air the fan moves through the engine.   |
| `LPC_eff_mod`    | Efficiency modifier for the Low-Pressure Compressor (LPC).                                      | Affects how efficiently the LPC compresses incoming air.      |
| `LPC_flow_mod`   | Flow modifier for the LPC.                                                                      | Adjusts the airflow through the LPC.                         |
| `HPC_eff_mod`    | Efficiency modifier for the High-Pressure Compressor (HPC).                                     | Impacts the high-pressure stage of air compression.           |
| `HPC_flow_mod`   | Flow modifier for the HPC.                                                                      | Controls the volume of air processed in the HPC.              |
| `HPT_eff_mod`    | Efficiency modifier for the High-Pressure Turbine (HPT).                                        | Affects the turbine's ability to extract energy from high-pressure exhaust gases. |
| `HPT_flow_mod`   | Flow modifier for the HPT.                                                                      | Adjusts the amount of exhaust gas processed by the HPT.       |
| `LPT_eff_mod`    | Efficiency modifier for the Low-Pressure Turbine (LPT).                                         | Determines the energy extraction efficiency at the low-pressure stage. |
| `LPT_flow_mod`   | Flow modifier for the LPT.                                                                      | Regulates the exhaust gas flow through the LPT.               |

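One way to inspect how a degradation factor evolves is to reduce it to a single value per cycle and look at the cycle-to-cycle change. The sketch below does this for `HPT_eff_mod` on synthetic numbers; it assumes the modifier is expressed as a deviation from nominal, so a more negative value means lower efficiency. That interpretation and the values themselves are assumptions for illustration.

```python
import pandas as pd

# Synthetic values for illustration only; real modifiers come from T_dev.
df_t = pd.DataFrame({
    "unit":        [1, 1, 1, 1, 1, 1],
    "cycle":       [1, 1, 2, 2, 3, 3],
    "HPT_eff_mod": [0.0, 0.0, -0.5, -0.5, -1.5, -1.5],
})

# One value per (unit, cycle): the mean modifier during that cycle.
per_cycle = df_t.groupby(["unit", "cycle"])["HPT_eff_mod"].mean()

# Change from the previous cycle of the same unit (NaN for the first cycle).
delta = per_cycle.groupby(level="unit").diff()

print(per_cycle.tolist())
print(delta.tolist())
```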
---

### **Variables in `X_s_var` (Sensor Measurements)**
| Variable | Description                                      | Units                        | Usage                                                    |
|----------|--------------------------------------------------|------------------------------|----------------------------------------------------------|
| `T24`    | Total temperature at the LPC outlet.             | Degrees Rankine (°R)         | Reflects the air temperature after the LPC stage.        |
| `T30`    | Total temperature at the HPC outlet.             | °R                           | Reflects the air temperature entering the combustor.     |
| `T48`    | Total temperature at the HPT outlet.             | °R                           | Indicates the gas temperature after the high-pressure turbine. |
| `T50`    | Total temperature at the LPT outlet.             | °R                           | Reflects the final exhaust gas temperature.              |
| `P15`    | Total pressure in the bypass duct.               | Pounds per square inch (psi) | Measures pressure in the fan bypass flow.                |
| `P2`     | Total pressure at the fan inlet.                 | psi                          | Indicates the intake pressure.                           |
| `P21`    | Total pressure at the fan outlet.                | psi                          | Reflects the pressure after the fan.                     |
| `P24`    | Total pressure at the LPC outlet.                | psi                          | Measures pressure after the LPC stage.                   |
| `Ps30`   | Static pressure at the HPC outlet.               | psi                          | Indicates pressure of the compressed air ready for combustion. |
| `P40`    | Total pressure at the combustor outlet.          | psi                          | Measures the pressure of combustion gases.               |
| `P50`    | Total pressure at the LPT outlet.                | psi                          | Reflects exhaust pressure.                               |
| `Nf`     | Physical speed of the fan.                       | Revolutions per minute (rpm) | Tracks fan rotational speed.                             |
| `Nc`     | Physical speed of the core.                      | rpm                          | Tracks core rotational speed.                            |
| `Wf`     | Fuel flow rate.                                  | Pounds per second (lb/s)     | Tracks how much fuel is being burned.                    |

### **Variables in `X_v_var` (Virtual Sensors)**
| Variable | Description                                      | Units                        | Usage                                                    |
|----------|--------------------------------------------------|------------------------------|----------------------------------------------------------|
| `T40`    | Total temperature at the combustor outlet.       | °R                           | Reflects the gas temperature leaving the combustor.      |
| `P30`    | Total pressure at the HPC outlet.                | psi                          | Reflects the pressure of air delivered to the combustor. |
| `P45`    | Total pressure at the HPT outlet.                | psi                          | Reflects pressure between the HPT and the LPT.           |
| `W21`    | Fan flow.                                        | lb/s                         | Measures the air moved by the fan.                       |
| `W22`    | Flow out of the LPC.                             | lb/s                         | Measures the air leaving the LPC.                        |
| `W25`    | Flow into the HPC.                               | lb/s                         | Measures the air entering the HPC.                       |
| `W31`    | HPT coolant bleed.                               | lb/s                         | Measures cooling air bled to the turbine.                |
| `W32`    | Turbine coolant bleed.                           | lb/s                         | Measures cooling air bled to the turbine.                |
| `W48`    | Flow out of the HPT.                             | lb/s                         | Measures the gas leaving the HPT.                        |
| `W50`    | Flow out of the LPT.                             | lb/s                         | Measures the gas leaving the LPT.                        |
| `SmFan`, `SmLPC`, `SmHPC` | Stall margins of the fan, LPC, and HPC. |                    | Indicates how close each compression component operates to stall. |
| `phi`    | Ratio of fuel flow to `Ps30`.                    |                              | Relates fuel flow to compressor discharge pressure.      |

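Because the measurements use imperial units, converting them to SI is a common first step. The helpers below are a minimal sketch using standard conversion factors; the function names are illustrative, and the two sample values are the first-row `T24` and `P2` readings from the sample output later in this document.

```python
def rankine_to_kelvin(t_r: float) -> float:
    """Convert degrees Rankine to kelvin (K = °R * 5/9)."""
    return t_r * 5.0 / 9.0


def psi_to_kpa(p_psi: float) -> float:
    """Convert pounds per square inch to kilopascals (1 psi = 6.894757 kPa)."""
    return p_psi * 6.894757


def lbs_to_kgs(w_lbs: float) -> float:
    """Convert pounds per second to kilograms per second (1 lb = 0.45359237 kg)."""
    return w_lbs * 0.45359237


# First-row T24 and P2 values from the sample output in this document.
print(round(rankine_to_kelvin(618.288596), 2), "K")
print(round(psi_to_kpa(14.484611), 2), "kPa")
```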
---

### **Variable in `Y_dev`**
| Variable | Description                                   | Units     | Usage                            |
|----------|-----------------------------------------------|-----------|----------------------------------|
| `Y`      | Remaining Useful Life (RUL).                 | Cycles    | Target variable for prediction. |

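The file already provides `Y`, so there is normally no need to compute RUL yourself. As an illustration of what a cycle-based RUL means, the sketch below derives one from synthetic `unit` and `cycle` columns under an assumed convention: RUL is the unit's last observed cycle minus the current cycle. Whether the dataset's `Y` follows exactly this convention should be checked against the data.

```python
import pandas as pd

# Synthetic (unit, cycle) pairs; real values come from A_dev.
df_rul = pd.DataFrame({
    "unit":  [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
})

# Assumed definition: RUL = last observed cycle of the unit - current cycle.
last_cycle = df_rul.groupby("unit")["cycle"].transform("max")
df_rul["rul"] = last_cycle - df_rul["cycle"]

print(df_rul["rul"].tolist())
```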
--- 


### Example data loading

```python
import h5py

file_path = 'datasets/pcoe/17. Turbofan Engine Degradation Simulation Data Set 2/N-CMAPSS_DS01-005.h5'

# List every object in the file together with its shape and dtype.
with h5py.File(file_path, 'r') as f:
    def print_structure(name, obj):
        print(name, ":", obj)
    f.visititems(print_structure)
```

```
A_dev : <HDF5 dataset "A_dev": shape (4906636, 4), type "<f8">
A_test : <HDF5 dataset "A_test": shape (2735232, 4), type "<f8">
A_var : <HDF5 dataset "A_var": shape (4,), type "|S5">
T_dev : <HDF5 dataset "T_dev": shape (4906636, 10), type "<f8">
T_test : <HDF5 dataset "T_test": shape (2735232, 10), type "<f8">
T_var : <HDF5 dataset "T_var": shape (10,), type "|S12">
W_dev : <HDF5 dataset "W_dev": shape (4906636, 4), type "<f8">
W_test : <HDF5 dataset "W_test": shape (2735232, 4), type "<f8">
W_var : <HDF5 dataset "W_var": shape (4,), type "|S4">
X_s_dev : <HDF5 dataset "X_s_dev": shape (4906636, 14), type "<f8">
X_s_test : <HDF5 dataset "X_s_test": shape (2735232, 14), type "<f8">
X_s_var : <HDF5 dataset "X_s_var": shape (14,), type "|S4">
X_v_dev : <HDF5 dataset "X_v_dev": shape (4906636, 14), type "<f8">
X_v_test : <HDF5 dataset "X_v_test": shape (2735232, 14), type "<f8">
X_v_var : <HDF5 dataset "X_v_var": shape (14,), type "|S5">
Y_dev : <HDF5 dataset "Y_dev": shape (4906636, 1), 
```

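With roughly 4.9 million rows per array, it can help to inspect a dataset before loading all of it. The sketch below relies on h5py reading only the rows requested by a slice; it builds a tiny stand-in file in a temporary directory, so the path and values are illustrative and only the dataset names mirror the real layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a small stand-in file; only the dataset names mimic the real layout.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("X_s_dev", data=np.arange(40, dtype="f8").reshape(10, 4))
    f.create_dataset("X_s_var", data=np.array([b"T24", b"T30", b"T48", b"T50"]))

with h5py.File(path, "r") as f:
    dset = f["X_s_dev"]
    # Shape and dtype come from the dataset metadata; no rows are read yet.
    shape, dtype = dset.shape, dset.dtype
    # Slicing reads only the requested rows from disk.
    first_rows = dset[:3]
    names = [name.decode() for name in f["X_s_var"][:]]

print(shape, dtype, first_rows.shape, names)
```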
```python
import pandas as pd

# Load the auxiliary data without column names (columns are numbered 0 to 3).
with h5py.File(file_path, 'r') as f:
    dataset = f['A_dev'][:]
    df = pd.DataFrame(dataset)
    print(df.head())
```

```python
import h5py

file_path = 'datasets/pcoe/17. Turbofan Engine Degradation Simulation Data Set 2/N-CMAPSS_DS01-005.h5'

# Variable names are stored as fixed-width byte strings, so decode them to str.
with h5py.File(file_path, 'r') as f:
    a_var = [x.decode() for x in f['A_var'][:]]
    t_var = [x.decode() for x in f['T_var'][:]]
    print("A_var:", a_var)
    print("T_var:", t_var)
```

```
A_var: ['unit', 'cycle', 'Fc', 'hs']
T_var: ['fan_eff_mod', 'fan_flow_mod', 'LPC_eff_mod', 'LPC_flow_mod', 'HPC_eff_mod', 'HPC_flow_mod', 'HPT_eff_mod', 'HPT_flow_mod', 'LPT_eff_mod', 'LPT_flow_mod']
```

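The name arrays have a bytes dtype (for example `|S5`), which is why each entry needs decoding. The sketch below compares the element-wise decode with a whole-array conversion in NumPy; the `raw` array is a stand-in for what `f['A_var'][:]` returns.

```python
import numpy as np

# Stand-in for f['A_var'][:], a fixed-width bytes array.
raw = np.array([b"unit", b"cycle", b"Fc", b"hs"], dtype="S5")

# Option 1: decode element by element.
names_loop = [x.decode() for x in raw]

# Option 2: let NumPy convert the whole array to str in one call.
names_vec = raw.astype(str).tolist()

print(names_loop == names_vec, names_vec)
```

Both options yield plain Python strings that can be passed directly as `columns=` to `pd.DataFrame`.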

```python
import pandas as pd
import h5py

file_path = 'datasets/pcoe/17. Turbofan Engine Degradation Simulation Data Set 2/N-CMAPSS_DS01-005.h5'

with h5py.File(file_path, 'r') as f:
    a_dev_data = f['A_dev'][:]
    a_var = [x.decode() for x in f['A_var'][:]]  # Get variable names

    df_a_dev = pd.DataFrame(a_dev_data, columns=a_var)
    print(df_a_dev.head())
```

```
   unit  cycle   Fc   hs
0   1.0    1.0  1.0  1.0
1   1.0    1.0  1.0  1.0
2   1.0    1.0  1.0  1.0
3   1.0    1.0  1.0  1.0
4   1.0    1.0  1.0  1.0
```

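The identifiers print as `1.0` because `A_dev` is stored as `<f8`. If the columns hold only whole numbers, which is an assumption worth checking first, they can be cast to an integer dtype. A minimal sketch on a stand-in frame:

```python
import pandas as pd

# Stand-in for df_a_dev: identifiers arrive as float64.
df_a_dev = pd.DataFrame({
    "unit":  [1.0, 1.0, 2.0],
    "cycle": [1.0, 2.0, 1.0],
    "Fc":    [1.0, 1.0, 3.0],
    "hs":    [1.0, 0.0, 1.0],
})

# Guard: only cast when every value is a whole number.
all_whole = bool((df_a_dev % 1 == 0).all().all())

df_a_int = df_a_dev.astype("int32") if all_whole else df_a_dev

print(all_whole)
print(df_a_int.dtypes.astype(str).tolist())
```

Integer identifiers also make `groupby` keys and printed tables easier to read.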

```python
with h5py.File(file_path, 'r') as f:
    x_s_dev = f['X_s_dev'][:]
    x_s_var = [x.decode() for x in f['X_s_var'][:]]

    x_v_dev = f['X_v_dev'][:]
    x_v_var = [x.decode() for x in f['X_v_var'][:]]

    y_dev = f['Y_dev'][:]

    df_x_s = pd.DataFrame(x_s_dev, columns=x_s_var)
    df_x_v = pd.DataFrame(x_v_dev, columns=x_v_var)
    df_y = pd.DataFrame(y_dev, columns=['Y'])

    # Place sensors, virtual sensors, and the target side by side.
    df_combined = pd.concat([df_x_s, df_x_v, df_y], axis=1)
    print(df_combined.head())
```

```
          T24          T30          T48          T50        P15         P2  \
0  618.288596  1470.469798  1849.620676  1269.275585  19.432070  14.484611   
1  618.296355  1470.415593  1849.519871  1269.177159  19.431385  14.484683   
2  618.336514  1470.453853  1849.566139  1269.167353  19.435163  14.488224   
3  618.302173  1470.650929  1850.195069  1269.518670  19.426003  14.477632   
4  618.345228  1470.640421  1849.950988  1269.253972  19.427484  14.478114   

         P21        P24        Ps30         P40  ...         W25        W31  \
0  19.727990  24.410990  394.701872  401.205188  ...  269.293218  31.319011   
1  19.727295  24.410483  394.629899  401.132851  ...  269.254244  31.314408   
2  19.731130  24.415476  394.667850  401.171401  ...  269.276125  31.316992   
3  19.721830  24.406544  394.773533  401.272707  ...  269.296081  31.319350   
4  19.723334  24.410159  394.732158  401.234620  ...  269.290780  31.318723   

         W32         W48         W50      SmFan     SmLP
```

```python
# Write the combined frame to CSV, read it back, and preview it.
df_combined.to_csv('processed_data.csv', index=False)

dff = pd.read_csv('processed_data.csv')

dff.head()
```

|   | T24 | T30 | T48 | T50 | P15 | P2 | P21 | P24 | Ps30 | P40 | ... | W25 | W31 | W32 | W48 | W50 | SmFan | SmLPC | SmHPC | phi | Y |
|---|-----|-----|-----|-----|-----|----|-----|-----|------|-----|-----|-----|-----|-----|-----|-----|-------|-------|-------|-----|---|
| 0 | 618.288596 | 1470.469798 | 1849.620676 | 1269.275585 | 19.43207 | 14.484611 | 19.72799 | 24.41099 | 394.701872 | 401.205188 | ... | 269.293218 | 31.319011 | 18.791407 | 255.174279 | 269.985138 | 15.879671 | 9.925145 | 25.648148 | 42.152925 | 99 |
| 1 | 618.296355 | 1470.415593 | 1849.519871 | 1269.177159 | 19.431385 | 14.484683 | 19.727295 | 24.410483 | 394.629899 | 401.132851 | ... | 269.254244 | 31.314408 | 18.788645 | 255.135216 | 269.944204 | 15.888485 | 9.916771 | 25.651719 | 42.150936 | 99 |
| 2 | 618.336514 | 1470.453853 | 1849.566139 | 1269.167353 | 19.435163 | 14.488224 | 19.73113 | 24.415476 | 394.66785 | 401.171401 | ... | 269.276125 | 31.316992 | 18.790195 | 255.156706 | 269.966648 | 15.902505 | 9.911186 | 25.652633 | 42.151475 | 99 |
| 3 | 618.302173 | 1470.650929 | 1850.195069 | 1269.51867 | 19.426003 | 14.477632 | 19.72183 | 24.406544 | 394.773533 | 401.272707 | ... | 269.296081 | 31.31935 | 18.79161 | 255.179231 | 269.989313 | 15.889568 | 9.927131 | 25.632641 | 42.169738 | 99 |
| 4 | 618.345228 | 1470.640421 | 1849.950988 | 1269.253972 | 19.427484 | 14.478114 | 19.723334 | 24.410159 | 394.732158 | 401.23462 | ... | 269.29078 | 31.318723 | 18.791234 | 255.172343 | 269.983253 | 15.895957 | 9.91687 | 25.644562 | 42.160144 | 99 |


```python
dff.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4906636 entries, 0 to 4906635
Data columns (total 29 columns):
 #   Column  Dtype  
---  ------  -----  
 0   T24     float64
 1   T30     float64
 2   T48     float64
 3   T50     float64
 4   P15     float64
 5   P2      float64
 6   P21     float64
 7   P24     float64
 8   Ps30    float64
 9   P40     float64
 10  P50     float64
 11  Nf      float64
 12  Nc      float64
 13  Wf      float64
 14  T40     float64
 15  P30     float64
 16  P45     float64
 17  W21     float64
 18  W22     float64
 19  W25     float64
 20  W31     float64
 21  W32     float64
 22  W48     float64
 23  W50     float64
 24  SmFan   float64
 25  SmLPC   float64
 26  SmHPC   float64
 27  phi     float64
 28  Y       int64  
dtypes: float64(28), int64(1)
memory usage: 1.1 GB
```

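The reported memory usage follows directly from the row count and the column dtypes. The arithmetic below reproduces it and, as a hypothetical alternative not used in this notebook, estimates the footprint if the 28 float columns were stored as `float32`.

```python
rows = 4_906_636           # entries reported by dff.info()
float_cols, int_cols = 28, 1
bytes_per_value = 8        # float64 and int64 both use 8 bytes

total_bytes = rows * (float_cols + int_cols) * bytes_per_value
total_gib = total_bytes / 1024**3

# Hypothetical alternative: float columns as float32 (4 bytes), Y kept as int64.
float32_bytes = rows * (float_cols * 4 + int_cols * 8)
float32_gib = float32_bytes / 1024**3

print(total_bytes, round(total_gib, 2))
print(float32_bytes, round(float32_gib, 2))
```

The first figure is about 1.06 when divided by 1024³, which rounds to the 1.1 GB shown by `dff.info()`. Whether `float32` precision is sufficient depends on the downstream model.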

```python
with h5py.File(file_path, 'r') as f:
    x_s_dev = f['X_s_dev'][:]
    x_s_var = [x.decode() for x in f['X_s_var'][:]]

    x_v_dev = f['X_v_dev'][:]
    x_v_var = [x.decode() for x in f['X_v_var'][:]]

    y_dev = f['Y_dev'][:]

    a_dev = f['A_dev'][:]
    a_var = [x.decode() for x in f['A_var'][:]]

    df_x_s = pd.DataFrame(x_s_dev, columns=x_s_var)
    df_x_v = pd.DataFrame(x_v_dev, columns=x_v_var)
    df_y = pd.DataFrame(y_dev, columns=['Y'])
    df_a = pd.DataFrame(a_dev, columns=a_var)  # Include A_dev data

    df_combined = pd.concat([df_a, df_x_s, df_x_v, df_y], axis=1)

print(df_combined.head())
```

```
   unit  cycle   Fc   hs         T24          T30          T48          T50  \
0   1.0    1.0  1.0  1.0  618.288596  1470.469798  1849.620676  1269.275585   
1   1.0    1.0  1.0  1.0  618.296355  1470.415593  1849.519871  1269.177159   
2   1.0    1.0  1.0  1.0  618.336514  1470.453853  1849.566139  1269.167353   
3   1.0    1.0  1.0  1.0  618.302173  1470.650929  1850.195069  1269.518670   
4   1.0    1.0  1.0  1.0  618.345228  1470.640421  1849.950988  1269.253972   

         P15         P2  ...         W25        W31        W32         W48  \
0  19.432070  14.484611  ...  269.293218  31.319011  18.791407  255.174279   
1  19.431385  14.484683  ...  269.254244  31.314408  18.788645  255.135216   
2  19.435163  14.488224  ...  269.276125  31.316992  18.790195  255.156706   
3  19.426003  14.477632  ...  269.296081  31.319350  18.791610  255.179231   
4  19.427484  14.478114  ...  269.290780  31.318723  18.791234  255.172343   

          W50      SmFan     SmLPC      SmHPC        phi
```

### Column List:
The `df_combined` DataFrame has the following structure:

- Metadata (from `A_var`): `unit`, `cycle`, `Fc`, `hs`
- Sensor variables (from `X_s_var` and `X_v_var`): `T24`, `T30`, `T48`, ..., `phi`
- Target variable: `Y`
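### Splitting features and target

Given the column layout above, a typical next step is to separate the model inputs from the target. The sketch below is one possible approach on a synthetic frame that carries only a few of the listed columns; treating the metadata columns as non-features is an assumption, since some workflows keep `Fc` or `cycle` as inputs.

```python
import pandas as pd

# Synthetic stand-in with a few of the columns listed above.
df_combined = pd.DataFrame({
    "unit": [1, 1, 2], "cycle": [1, 2, 1], "Fc": [1, 1, 3], "hs": [1, 1, 1],
    "T24": [618.3, 618.4, 610.1], "phi": [42.15, 42.16, 40.9],
    "Y": [99, 98, 74],
})

meta_cols = ["unit", "cycle", "Fc", "hs"]   # from A_var
target_col = "Y"

# Everything that is neither metadata nor target is treated as a feature.
excluded = set(meta_cols) | {target_col}
feature_cols = [c for c in df_combined.columns if c not in excluded]

X = df_combined[feature_cols]
y = df_combined[target_col]

print(feature_cols, X.shape, y.tolist())
```

Keeping `unit` alongside `X` and `y` is useful when splitting data for validation, so that samples from one engine do not end up on both sides of the split.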