# Bayesian Neural Networks to Predict Hard Landing with DASHlink Data
Authors: Dr. Yingxiao Kong, Vanderbilt University

Email: yingxiao.kong@vanderbilt.edu

## Overview of Research
In this work, we use an open-source dataset, [NASA's DASHlink data](https://c3.ndc.nasa.gov/dashlink/), to isolate landing data that covers both hard landings and normal landings. The objective is to use [this sample data](https://c3.ndc.nasa.gov/dashlink/projects/85/resources/?type=ds) to train a Bayesian Neural Network model that predicts the touchdown vertical speed of a landing aircraft, with the intent of using the prediction as a screening tool to identify hard landing events before they occur.

This series of Jupyter notebook demonstrations is divided into 3 modules. The module presented here is in **bold**:
- Module 1 - Download DASHlink Data
- **Module 2 - DASHlink Data Pre-Processing and Feature Selection with Maximum Relevance and Minimum Redundancy (mRMR)**
- Module 3 - Bayesian Neural Network Model Training

## Module 2: DASHlink Data Pre-Processing and Parameter Selection
This is a demonstration of how to filter, standardize, and clean the DASHlink data downloaded and viewed in Module 1. In addition, we select the features most relevant to hard landing events.


## Installing the required Python packages

The required Python packages for this module are:
- ***```pandas```***
- ***```numpy```***
- ***```scipy```***
- ***```matplotlib```***

In [3]:
import numpy as np
import pandas as pd

## Step 1: Isolate Data Landing at MSP

### Step 1a: Get all downloaded ```.mat``` files

In [4]:
import glob
OUTPUT_DIRECTORY = r'../../../dashlink-data'
downloaded_mat_files = glob.glob(OUTPUT_DIRECTORY+'/**/*.mat',recursive=True)
downloaded_mat_files

['../../../dashlink-data/Tail_687_8/687200312261442.mat',
 '../../../dashlink-data/Tail_687_8/687200403210418.mat',
 '../../../dashlink-data/Tail_687_8/687200312300437.mat',
 '../../../dashlink-data/Tail_687_8/687200309090752.mat',
 '../../../dashlink-data/Tail_687_8/687200403162255.mat',
 '../../../dashlink-data/Tail_687_8/687200403191335.mat',
 '../../../dashlink-data/Tail_687_8/687200311301726.mat',
 '../../../dashlink-data/Tail_687_8/687200402021225.mat',
 '../../../dashlink-data/Tail_687_8/687200312241558.mat',
 '../../../dashlink-data/Tail_687_8/687200401281006.mat',
 '../../../dashlink-data/Tail_687_8/687200312180425.mat',
 '../../../dashlink-data/Tail_687_8/687200402081856.mat',
 '../../../dashlink-data/Tail_687_8/687200403211154.mat',
 '../../../dashlink-data/Tail_687_8/687200403130700.mat',
 '../../../dashlink-data/Tail_687_8/687200310291853.mat',
 '../../../dashlink-data/Tail_687_8/687200402010708.mat',
 '../../../dashlink-data/Tail_687_8/687200309111920.mat',
 '../../../das

### Step 1b: Isolate landing data at MSP at specified heights
According to the DASHlink website, the ```PH``` enumerated codes are: 
- 0=Unknown
- 1=Preflight
- 2=Taxi
- 3=Takeoff
- 4=Climb
- 5=Cruise
- 6=Approach
- 7=Rollout

Here we are interested in ```PH = 7``` at heights of 200,100,50,40,30,20,10,8,6,4,2, and 0 feet above landing altitude. 
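
As a rough illustration of sampling a flight at fixed heights, the sketch below picks, for each target height, the sample whose height above the touchdown altitude is closest to it. It assumes a 1-D altitude trace in feet whose last sample is the touchdown point; `rows_at_heights` and the synthetic descent are hypothetical and are not the `dataUtils` implementation.

```python
import numpy as np

def rows_at_heights(altitude_ft, heights_ft):
    """Return, for each target height, the index of the sample whose height
    above the touchdown altitude is closest to that target.

    Assumption: the last sample of `altitude_ft` is the touchdown point.
    """
    altitude_ft = np.asarray(altitude_ft, dtype=float)
    targets = np.asarray(heights_ft, dtype=float)
    height_above_td = altitude_ft - altitude_ft[-1]
    # |height - target| for every (target, sample) pair, then the closest sample
    diffs = np.abs(height_above_td[None, :] - targets[:, None])
    return diffs.argmin(axis=1)

# Synthetic descent from 1000 ft to 800 ft (touchdown) in 1 ft steps
altitude = np.arange(1000.0, 799.0, -1.0)
heights = np.array([200, 100, 50, 40, 30, 20, 10, 8, 6, 4, 2, 0])
idx = rows_at_heights(altitude, heights)
print(altitude[idx] - altitude[-1])
```

With real data the closest sample will rarely sit exactly at the target height, so interpolating between neighbouring samples may be preferable.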

In [5]:
PHASE_NO = 7
MSP_AIRPORT_LAT_LON = [44.88526995556498, -93.2015923365669]
HEIGHT_LIST = np.array([200,100,50,40,30,20,10,8,6,4,2,0])
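
One plausible way to decide whether a flight lands at the airport is to compare the touchdown coordinates with the airport coordinates using a great-circle distance. The sketch below is an assumption about how such a check could work; `haversine_km`, `near_airport`, and the 5 km threshold are hypothetical and are not taken from `dataUtils`.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def near_airport(td_lat, td_lon, airport_lat_lon, max_km=5.0):
    """True if the touchdown point lies within max_km of the airport (assumed threshold)."""
    return haversine_km(td_lat, td_lon, *airport_lat_lon) <= max_km

MSP = [44.88526995556498, -93.2015923365669]
# Illustrative touchdown coordinate close to the MSP reference point
print(near_airport(44.886191, -93.228955, MSP))
```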

In [6]:
from dataUtils import DASHlinkData

# 25 recorded parameters to extract from each flight
key_list_25=['LATP','LONP','MSQT_1','BAL1','TAS','GS','TH','FLAP','GLS','LOC','N1_1','PTCH','ROLL','TRK','AIL_1','RUDD','ELEV_1',
         'BLAC','CTAC','FPAC','CCPC','CWPC','WS','WD','ALTR']

SUBSET_FOR_DEMO = 5376  # number of files to process; lower this for a quicker demo
dfs = []

for i,mat_file in enumerate(downloaded_mat_files[:SUBSET_FOR_DEMO]):
    print("Processing {} of {} .mat files.".format(i+1,len(downloaded_mat_files)))
    dl_data = DASHlinkData(mat_file)
    # Keep only flights that contain the phase of interest and land at MSP
    if dl_data.contains_phase_no(phase=PHASE_NO):
        if dl_data.lands_at_airport(airport_lat_lon=MSP_AIRPORT_LAT_LON):
            # Resample each selected parameter with the dataUtils helper
            for key in key_list_25:
                dl_data.temporal_resample_to_4_seconds(key)
            # One row per height in HEIGHT_LIST for this flight
            df_new = dl_data.get_data_at_heights_in_ft(HEIGHT_LIST)
            dfs.append(df_new)

# Stack the per-flight tables into a single DataFrame
df_landing = pd.concat(dfs)

Processing 1 of 5376 .mat files.
Processing 2 of 5376 .mat files.
Processing 3 of 5376 .mat files.
Processing 4 of 5376 .mat files.
Processing 5 of 5376 .mat files.
...
Processing 1438 of 5376 .mat files.
Processing 1439 of 5376 .mat

KeyError: nan
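
The uncaught `KeyError: nan` appears to interrupt the loop, so any files after the failing one are not processed. One way to keep going is to catch the error per file and record which files were skipped. The sketch below shows that pattern in isolation; `process_all` and `fake_process` are hypothetical helpers, with `fake_process` standing in for the per-file work done with `DASHlinkData`.

```python
def process_all(paths, process_one, skip_errors=(KeyError, ValueError)):
    """Run process_one on each path; record failures instead of stopping."""
    results, failures = [], []
    for path in paths:
        try:
            result = process_one(path)
        except skip_errors as err:
            # Remember the file and the error, then move on to the next file
            failures.append((path, repr(err)))
            continue
        if result is not None:
            results.append(result)
    return results, failures

# Hypothetical stand-in for the per-file processing
def fake_process(path):
    if path == 'bad.mat':
        raise KeyError(float('nan'))
    return path.upper()

results, failures = process_all(['a.mat', 'bad.mat', 'b.mat'], fake_process)
print(len(results), "processed;", len(failures), "skipped")
```

Reviewing `failures` afterwards shows which flights need a closer look, rather than silently dropping them.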

In [7]:
print("Number of Files with Landing Aircraft: {} out of {}.".format(len(dfs),len(downloaded_mat_files)))

Number of Files with Landing Aircraft: 323 out of 5376.


In [9]:
df_landing = pd.concat(dfs)
df_landing

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
0,44.891682,-93.241822,1.0,1006.643045,117.933373,115.007588,125.581422,3645.0,0.014040,-0.003136,...,1782.000000,10.948235,-159.834434,-536.630359,-104.780302,44.886191,-93.228955,806.643045,200,1185.595409
1,44.888764,-93.234960,1.0,906.643045,115.530863,111.638458,125.322379,3645.0,0.183690,0.003136,...,1795.861409,9.932823,-165.738392,-709.834437,-104.780302,44.886191,-93.228955,806.643045,100,553.902372
2,44.887734,-93.232386,1.0,856.643045,113.021779,110.749442,123.876531,3645.0,0.133380,0.003332,...,1672.000000,6.212059,-163.252476,-823.996865,-104.780302,44.886191,-93.228955,806.643045,50,320.767080
3,44.887502,-93.231929,1.0,846.643045,112.537305,110.196048,124.172654,3645.0,0.136028,0.002001,...,1718.580098,6.975467,-166.607074,-688.934215,-104.780302,44.886191,-93.228955,806.643045,40,276.478561
4,44.887351,-93.231586,1.0,836.643045,111.188473,109.765476,124.137817,3645.0,0.133199,0.002040,...,1748.036175,6.994513,-160.733711,-773.697501,-104.780302,44.886191,-93.228955,806.643045,30,244.578379
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7,44.882928,-93.195317,1.0,786.438823,113.839470,122.158225,-59.134194,3645.0,0.069030,0.001176,...,1467.000000,6.996122,128.635156,-400.249286,-271.354242,44.883445,-93.196665,778.438823,8,121.006367
8,44.882928,-93.195317,1.0,784.438823,113.909336,121.361765,-58.945564,3645.0,0.069030,0.001176,...,1581.000000,6.408334,123.705862,-285.024058,-271.354242,44.883445,-93.196665,778.438823,6,121.006367
9,44.883100,-93.195807,1.0,782.438823,113.571076,120.748203,-58.783896,3645.0,0.307710,0.001568,...,1665.000000,5.906483,119.784757,-376.884980,-271.354242,44.883445,-93.196665,778.438823,4,77.845605
10,44.883100,-93.195807,1.0,780.438823,111.655264,119.874082,-58.497770,3645.0,0.307710,0.001568,...,1296.000000,5.595276,111.880163,-259.560171,-271.354242,44.883445,-93.196665,778.438823,2,77.845605


In [10]:
df_landing.to_csv('processed_data_landing_at_msp.csv',index=False)

## Step 2: Feature Selection with Maximum Relevance Minimum Redundancy (MRMR)
The original 186 parameters are cut down to 26 based on a literature review. The 26 parameters are then ordered using Maximum Relevance Minimum Redundancy (MRMR). 
The data is first smoothed and then sliced at the selected heights, and the average of each parameter is calculated at each height.
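
The smoothing method is not specified in this module. As one possible choice, the sketch below applies a centered moving average with `pandas`; the function name `smooth_series` and the window length are assumptions made for illustration only.

```python
import pandas as pd

def smooth_series(values, window=5):
    """Centered moving average; edge samples use the points that are available."""
    return pd.Series(values).rolling(window=window, center=True, min_periods=1).mean()

# Synthetic noisy signal alternating between 0 and 10
raw = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0, 0.0]
smoothed = smooth_series(raw, window=3)
print(smoothed.tolist())
```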

### Step 2a: Group data by height with ```pandas.groupby```

In [11]:
grpby = df_landing.groupby(by='heights')
grpby.get_group(0)

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
11,44.886191,-93.228955,1.0,806.643045,103.976395,103.616195,123.378983,3645.0,0.27144,0.002744,...,1667.0,4.278736,-155.074287,-104.780302,-104.780302,44.886191,-93.228955,806.643045,0,0.0
11,44.885846,-93.226045,1.0,799.291242,117.358540,119.301601,123.410491,3645.0,-0.10959,0.005880,...,2236.0,6.403527,-64.065263,-177.267211,-177.267211,44.885846,-93.226045,799.291242,0,0.0
11,44.883786,-93.199606,1.0,822.106482,105.834217,95.831617,-57.981012,3645.0,-0.02184,0.011368,...,1734.0,8.281502,-54.447144,125.745074,125.745074,44.883786,-93.199606,822.106482,0,0.0
11,44.885333,-93.226382,1.0,793.020025,112.328895,108.958568,121.460417,3645.0,-0.24882,0.004704,...,2080.0,3.900689,147.137838,-270.814762,-270.814762,44.885333,-93.226382,793.020025,0,0.0
11,44.882756,-93.198228,1.0,778.709209,110.752766,104.562259,-57.987705,3645.0,0.54210,-0.004900,...,2210.0,6.968710,-33.283197,-32.735479,-32.735479,44.882756,-93.198228,778.709209,0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11,44.892371,-93.216762,1.0,826.453818,111.458923,98.982078,120.587934,3645.0,-0.01872,0.003724,...,1488.0,14.964404,114.931008,-68.149041,-68.149041,44.892371,-93.216762,826.453818,0,0.0
11,44.883617,-93.194613,1.0,805.311687,106.816379,107.819248,-59.479534,3645.0,0.21801,0.001372,...,1968.0,1.098360,-166.960137,-166.170628,-166.170628,44.883617,-93.194613,805.311687,0,0.0
11,44.881725,-93.198044,1.0,815.577007,106.032051,107.148559,-54.594218,3645.0,0.38532,-0.001372,...,2438.0,7.374969,30.142230,-26.485288,-26.485288,44.881725,-93.198044,815.577007,0,0.0
11,44.877092,-93.204416,1.0,803.589946,111.501855,100.419028,-55.257613,3645.0,0.13455,0.280084,...,1503.0,12.842577,-31.604227,138.856011,138.856011,44.877092,-93.204416,803.589946,0,0.0


In [12]:
grpby.get_group(200).heights

0    200
0    200
0    200
0    200
0    200
    ... 
0    200
0    200
0    200
0    200
0    200
Name: heights, Length: 323, dtype: int64

### Step 2b: Compute parameter averages at each height

In [13]:
# One row per height, holding the mean of every column across flights
ave_by_height = pd.DataFrame(columns=df_landing.columns)
for i,(height,df) in enumerate(grpby):
    ave_by_height.loc[i,:] = df.mean()
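
The same per-height averages can usually be obtained in a single call with `groupby(...).mean()`. The sketch below shows this on a small synthetic table (the values are made up for illustration); `as_index=False` keeps `heights` as a regular column and gives numeric dtypes directly.

```python
import pandas as pd

# Synthetic stand-in for df_landing with two flights sampled at two heights
demo = pd.DataFrame({
    'heights': [0, 0, 2, 2],
    'TAS': [100.0, 104.0, 108.0, 110.0],
    'ALTR': [-100.0, -140.0, -300.0, -280.0],
})

# Mean of every column within each height group
ave_demo = demo.groupby('heights', as_index=False).mean()
print(ave_demo)
```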

In [14]:
ave_by_height

Unnamed: 0,LATP,LONP,MSQT_1,BAL1,TAS,GS,TH,FLAP,GLS,LOC,...,CWPC,WS,WD,ALTR,TD_ALTR,TD_LAT,TD_LON,TD_ALT,heights,DIST
0,44.884192,-93.211779,0.996904,804.385162,98.874716,104.063538,33.37884,3663.919505,0.006715,0.001765,...,1832.835913,5.987552,9.670039,-142.563363,-142.563363,44.884192,-93.211779,804.385162,0.0,0.0
1,44.884127,-93.211732,0.99705,806.385162,107.530059,106.639878,33.371948,3663.88467,0.023907,0.001431,...,1848.548725,6.138852,4.442049,-292.490647,-142.563363,44.884192,-93.211779,804.385162,2.0,75.611048
2,44.884118,-93.211736,1.0,808.385162,109.026773,107.63386,33.357657,3663.896976,0.006223,0.001095,...,1863.329123,6.245552,1.264728,-343.020011,-142.563363,44.884192,-93.211779,804.385162,4.0,108.689308
3,44.884106,-93.211737,1.0,810.385162,110.597024,108.387776,33.357167,3663.894485,-0.00431,0.001895,...,1850.152334,6.39752,0.713573,-405.50305,-142.563363,44.884192,-93.211779,804.385162,6.0,136.416163
4,44.884119,-93.211786,1.0,812.385162,111.971512,109.082017,33.363979,3663.889485,-0.021477,0.00161,...,1858.22471,6.702527,1.933554,-455.366942,-142.563363,44.884192,-93.211779,804.385162,8.0,160.098133
5,44.884114,-93.211799,1.0,814.385162,112.803284,109.617697,33.365325,3663.891783,-0.028608,0.001991,...,1861.823818,6.930953,1.629935,-490.374351,-142.563363,44.884192,-93.211779,804.385162,10.0,180.00225
6,44.884126,-93.211901,1.0,824.385162,115.821616,111.357957,33.376308,3663.885606,-0.057211,0.000298,...,1852.309428,7.576133,1.348635,-608.236475,-142.563363,44.884192,-93.211779,804.385162,20.0,257.156977
7,44.884115,-93.211935,1.0,834.385162,117.522335,112.401629,33.424848,3663.89505,-0.068371,0.001356,...,1816.188369,8.035595,5.862442,-670.548148,-142.563363,44.884192,-93.211779,804.385162,30.0,317.062297
8,44.884104,-93.211944,1.0,844.385162,118.585095,113.139709,33.45475,3663.876769,-0.050584,0.001114,...,1828.338907,8.267578,5.108144,-671.830338,-142.563363,44.884192,-93.211779,804.385162,40.0,369.753088
9,44.884087,-93.211972,1.0,854.385162,119.435397,113.762441,33.438747,3663.870557,-0.023198,0.0013,...,1838.654069,8.535319,2.737421,-691.349528,-142.563363,44.884192,-93.211779,804.385162,50.0,423.456526


### Step 2c: Compute MRMR with Spearman Correlation Relative to Vertical Velocity ```ALTR```


In [15]:
sele_key_list=['CCPC','CTAC','PTCH','ELEV_1','BLAC','N1_1','GS','TAS','GLS','WS','ROLL',\
               'FPAC','WD','LONP','TH','LATP','DIST','AIL_1','LOC','TRK','BAL1','RUDD','FLAP','ALTR']
ave_by_height = ave_by_height[sele_key_list]

In [16]:
ave_by_height

Unnamed: 0,CCPC,CTAC,PTCH,ELEV_1,BLAC,N1_1,GS,TAS,GLS,WS,...,TH,LATP,DIST,AIL_1,LOC,TRK,BAL1,RUDD,FLAP,ALTR
0,1926.73065,0.001563,2.437592,0.110089,-0.039902,34.600852,104.063538,98.874716,0.006715,5.987552,...,33.37884,44.884192,0.0,85.077908,0.001765,33.471943,804.385162,-1.797128,3663.919505,-142.563363
1,2008.324588,-0.002311,1.974665,-0.889584,-0.050227,37.116829,106.639878,107.530059,0.023907,6.138852,...,33.371948,44.884127,75.611048,84.88423,0.001431,33.471821,806.385162,-1.966797,3663.88467,-292.490647
2,2039.832899,-0.001895,1.759162,-1.459126,-0.05282,38.587362,107.63386,109.026773,0.006223,6.245552,...,33.357657,44.884118,108.689308,84.991761,0.001095,33.47203,808.385162,-2.306965,3663.896976,-343.020011
3,2065.154614,-0.003231,1.525746,-2.014512,-0.054705,40.121825,108.387776,110.597024,-0.00431,6.39752,...,33.357167,44.884106,136.416163,85.062112,0.001895,33.481436,810.385162,-2.098378,3663.894485,-405.50305
4,2108.657252,-0.003233,1.271863,-2.488021,-0.056525,41.516511,109.082017,111.971512,-0.021477,6.702527,...,33.363979,44.884119,160.098133,85.084116,0.00161,33.494974,812.385162,-1.667993,3663.889485,-455.366942
5,2127.812796,-0.003563,1.057435,-2.690659,-0.057964,42.748676,109.617697,112.803284,-0.028608,6.930953,...,33.365325,44.884114,180.00225,84.750436,0.001991,33.496004,814.385162,-1.917367,3663.891783,-490.374351
6,2202.010385,-0.001612,0.137774,-3.95079,-0.061005,47.460409,111.357957,115.821616,-0.057211,7.576133,...,33.376308,44.884126,257.156977,84.808573,0.000298,33.50542,824.385162,-1.776841,3663.885606,-608.236475
7,2263.670912,0.000628,-0.687212,-4.591668,-0.062207,50.603226,112.401629,117.522335,-0.068371,8.035595,...,33.424848,44.884115,317.062297,85.144737,0.001356,33.504223,834.385162,-1.929116,3663.89505,-670.548148
8,2286.979909,0.002338,-1.288202,-5.266081,-0.062829,52.461056,113.139709,118.585095,-0.050584,8.267578,...,33.45475,44.884104,369.753088,85.324964,0.001114,33.483321,844.385162,-2.046254,3663.876769,-671.830338
9,2323.58186,0.000205,-1.735114,-5.509513,-0.063787,53.606588,113.762441,119.435397,-0.023198,8.535319,...,33.438747,44.884087,423.456526,84.684193,0.0013,33.471277,854.385162,-1.846026,3663.870557,-691.349528


In [17]:
from scipy import stats
def order_features_mrmr(df,output='ALTR',redundancy_weight=0.5):
    """Order the input columns of df by relevance to `output` minus weighted redundancy."""
    selected_order = []
    
    # Absolute Spearman rank correlation between every pair of columns
    R = np.abs(stats.spearmanr(df.values,axis=0)[0])
    output_idx = df.columns.get_loc(output)
    input_idx = [i for i in range(df.shape[1]) if i != output_idx]
    
    # Start with the input most correlated with the output.
    # R is indexed by column number, so this works wherever `output` sits in df.
    idx_best = input_idx[np.argmax(R[input_idx,output_idx])]
    selected_order.append(idx_best)
    input_idx.remove(idx_best)
    while len(input_idx)!=0:
        # Relevance: correlation of each remaining input with the output
        with_output = R[input_idx,output_idx]
        # Redundancy: mean correlation of each remaining input with the remaining inputs
        with_others = np.array([np.mean(R[idx,input_idx]) for idx in input_idx])
            
        mrmr_values = with_output-with_others*redundancy_weight
        best_index = input_idx[np.argmax(mrmr_values)]
        
        input_idx.remove(best_index)
        selected_order.append(best_index)
    ordered_features = df.columns[selected_order]
    return ordered_features
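
Read from the code above, the score used at each step of the loop can be written as follows, where $y$ is the output (```ALTR```), $\rho_s$ is the Spearman rank correlation, $w$ is `redundancy_weight`, and $C$ is the set of inputs not yet selected (the sum includes $k = j$, which contributes 1). The first feature is chosen by relevance alone. The symbols are introduced here for illustration.

$$
j^{*} = \arg\max_{j \in C} \left[ \, \lvert \rho_s(x_j, y) \rvert \; - \; w \cdot \frac{1}{\lvert C \rvert} \sum_{k \in C} \lvert \rho_s(x_j, x_k) \rvert \, \right]
$$

With $w = 0$ the ordering reduces to ranking the inputs by their absolute correlation with the output.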

In [18]:
min_redundancy_weight = [0,.25,.5,.75,1] #contribution of minimum redundancy
# One column of ordered feature names per redundancy weight
df_features = pd.DataFrame({weight:list(order_features_mrmr(ave_by_height,redundancy_weight=weight))
                            for weight in min_redundancy_weight})

In [19]:
df_insert = pd.DataFrame({key:'ALTR' for key in df_features.columns},index=[0])
df_features = pd.concat([df_insert,df_features]).reset_index(drop = True)
df_features

Unnamed: 0,0.00,0.25,0.50,0.75,1.00
0,ALTR,ALTR,ALTR,ALTR,ALTR
1,CCPC,CCPC,CCPC,CCPC,CCPC
2,PTCH,PTCH,BLAC,BLAC,BLAC
3,ELEV_1,ELEV_1,PTCH,PTCH,GLS
4,N1_1,N1_1,ELEV_1,ELEV_1,PTCH
5,GS,GS,N1_1,N1_1,ELEV_1
6,TAS,TAS,GS,GS,N1_1
7,WS,WS,TAS,TAS,GS
8,DIST,DIST,WS,WS,TAS
9,BAL1,BAL1,DIST,DIST,WS
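
For comparison, a common mRMR formulation measures redundancy against the features already selected rather than against the remaining candidates. The sketch below illustrates that variant on synthetic data; `greedy_mrmr` is a hypothetical function, not the one used in this module, and the synthetic columns are assumptions chosen to include a near-duplicate feature.

```python
import numpy as np
import pandas as pd
from scipy import stats

def greedy_mrmr(df, output, redundancy_weight=0.5):
    """Greedy ordering: relevance to `output` minus the mean absolute
    correlation with the features already selected."""
    R = np.abs(stats.spearmanr(df.values, axis=0)[0])
    out = df.columns.get_loc(output)
    remaining = [i for i in range(df.shape[1]) if i != out]
    selected = []
    while remaining:
        scores = []
        for i in remaining:
            # No redundancy penalty until at least one feature is selected
            redundancy = np.mean(R[i, selected]) if selected else 0.0
            scores.append(R[i, out] - redundancy_weight * redundancy)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return list(df.columns[selected])

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
demo = pd.DataFrame({
    'x1': x1,
    'x1_copy': x1 + 0.01 * rng.normal(size=200),  # near-duplicate of x1
    'x2': rng.normal(size=200),
})
demo['y'] = demo['x1'] + 0.5 * demo['x2']

order_relevance_only = greedy_mrmr(demo, 'y', redundancy_weight=0.0)
order_with_penalty = greedy_mrmr(demo, 'y', redundancy_weight=1.0)
print(order_relevance_only)
print(order_with_penalty)
```

With the penalty switched on, the near-duplicate column is pushed behind the independent feature `x2`, which is the behaviour the redundancy term is meant to produce.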


In [20]:
df_features.to_csv('ordered_features.csv',index=False)
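
A later step would typically pick the top few features for one redundancy weight. The sketch below shows one way to do that, using a small stand-in for the saved table built from the first rows shown above; `top_k_features` is a hypothetical helper, and the string column labels assume the table has been read back from the CSV file.

```python
import pandas as pd

# Stand-in for the table written to 'ordered_features.csv' (first rows only)
ordered = pd.DataFrame({
    '0.0': ['ALTR', 'CCPC', 'PTCH', 'ELEV_1'],
    '0.5': ['ALTR', 'CCPC', 'BLAC', 'PTCH'],
})

def top_k_features(ordered_df, weight, k):
    """Return the first k input features for a given redundancy weight,
    skipping the output variable stored in row 0."""
    return ordered_df[weight].iloc[1:k + 1].tolist()

print(top_k_features(ordered, '0.5', 2))
```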