##### The issue with weighted ensembles in Kaggle competitions is that the weights are tuned against the public leaderboard score, which is computed on part of the test data. As a result, one may climb high on the public leaderboard but lose that position on the private one because of overfitting. It is much safer not to use the test set for tuning at all and to rely on simple statistical techniques instead.

*Upvote if useful!*
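The overfitting risk can be illustrated with a small simulation. The sketch below is entirely synthetic: a hypothetical ground truth, two equally good "models", and an assumed 10%/90% public/private split. It grid-searches a blend weight on the "public" rows and then scores it on the "private" rows. Because both models have the same noise level, the best weight in expectation is 0.5, so any drift of the public-tuned weight away from 0.5 is fitted to the noise of the public subset.

```python
import numpy as np

# Synthetic setup (hypothetical): ground truth plus two equally noisy "models".
rng = np.random.default_rng(0)
n = 10_000
y = rng.normal(size=n)
model_a = y + rng.normal(scale=0.5, size=n)
model_b = y + rng.normal(scale=0.5, size=n)

# Assumed split: 10% of rows play the role of the public leaderboard.
idx = rng.permutation(n)
public_idx, private_idx = idx[: n // 10], idx[n // 10:]


def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))


def blend(w):
    # weighted ensemble of the two models
    return w * model_a + (1.0 - w) * model_b


# Grid-search the weight on the "public" rows only, as leaderboard tuning would.
weights = np.linspace(0.0, 1.0, 101)
public_scores = [mae(blend(w)[public_idx], y[public_idx]) for w in weights]
best_w = float(weights[int(np.argmin(public_scores))])

# Score both the tuned weight and plain equal weights on the unseen "private" rows.
private_tuned = mae(blend(best_w)[private_idx], y[private_idx])
private_equal = mae(blend(0.5)[private_idx], y[private_idx])
print(f"weight tuned on public rows: {best_w:.2f}")
print(f"private MAE with tuned weight: {private_tuned:.4f}")
print(f"private MAE with equal weights: {private_equal:.4f}")
```

Re-running with different seeds shows how much the tuned weight moves around, while the equal-weight blend does not depend on the public rows at all.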

```python
import pandas as pd
import numpy as np
```

```python
sub = pd.read_csv('../input/ventilator-pressure-prediction/sample_submission.csv')

# submissions produced by public notebooks
sub_0 = pd.read_csv('../input/gb-vpp-whoppity-dub-dub/median_submission.csv')
sub_1 = pd.read_csv('../input/gb-vpp-to-infinity-and-beyond/submission.csv')
sub_2 = pd.read_csv('../input/pred-ventilator-lstm-model-0-149/submission.csv')
```

```python
# stack the three prediction columns into one array of shape (3, n_rows)
pred = np.stack([sub_0['pressure'].values, sub_1['pressure'].values, sub_2['pressure'].values])
pred
```

```
array([[6.259344  , 5.978134  , 7.102974  , ..., 6.470251  , 6.189041  ,
        6.329647  ],
       [6.317731  , 5.987313  , 7.154845  , ..., 6.398248  , 6.1584907 ,
        6.33103   ],
       [6.32960672, 5.978096  , 7.10293032, ..., 6.32960672, 6.18900243,
        6.32960672]])
```
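Stacking the `pressure` columns by position assumes that all three files list the same `id`s in the same order. A defensive sketch that checks this before stacking; `stack_by_id` is a hypothetical helper, not part of pandas, and the toy frames stand in for the real submissions:

```python
import numpy as np
import pandas as pd


def stack_by_id(subs, col="pressure"):
    """Stack `col` from several submissions into an (n_models, n_rows) array,
    after checking that every file lists the same ids in the same order."""
    ref_ids = subs[0]["id"].to_numpy()
    for i, s in enumerate(subs[1:], start=1):
        if not np.array_equal(s["id"].to_numpy(), ref_ids):
            raise ValueError(f"submission {i} has different ids or row order")
    return np.stack([s[col].to_numpy() for s in subs])


# Toy usage with two small, made-up submissions.
a = pd.DataFrame({"id": [1, 2, 3], "pressure": [6.0, 5.9, 7.1]})
b = pd.DataFrame({"id": [1, 2, 3], "pressure": [6.3, 6.0, 7.2]})
toy_pred = stack_by_id([a, b])
print(toy_pred.shape)  # (2, 3)
```

If the ids match but the order differs, sorting each frame with `sort_values('id')` before stacking is a simple fix.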

Here we try three things:
1. Take the mean of the three notebooks' predictions. If the predictions are widely scattered, this will not improve the score much.
1. Take the median, which works well with scattered data because a single outlying prediction cannot pull it far.
1. A more experimental approach: compute the mean and standard deviation of the predictions for each row, clip every prediction to the range mean ± std, and average the clipped values. This is still a mean, but predictions far from the average are pulled back to the edge of the range, which should reduce the effect of outliers.
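The three rules can be compared on a tiny made-up array (3 models × 4 test rows) in which one model is badly off on the last row only. With only three models, the outlier also inflates the standard deviation, so clipping pulls that row only from 10 down to about 9.04, while the median returns 6.

```python
import numpy as np

# Toy predictions: 3 models (rows) x 4 test rows (columns).
# Model 3 is an outlier on the last column only.
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [2.0, 3.0, 4.0, 20.0],
])

toy_mean = toy.mean(axis=0)          # 1. plain mean per test row
toy_median = np.median(toy, axis=0)  # 2. median per test row
toy_std = toy.std(axis=0)            # population std (ddof=0), the np.std default
# 3. clip each model's value to [mean - std, mean + std], then average
toy_clipped_mean = np.clip(toy, toy_mean - toy_std, toy_mean + toy_std).mean(axis=0)

print(toy_mean)    # [ 2.  3.  4. 10.]
print(toy_median)  # [2. 3. 4. 6.]
print(toy_clipped_mean.round(2))  # roughly 2, 3, 4 and 9.04
```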

```python
# per-row statistics across the three submissions (axis=0 runs over the models)
mean = np.mean(pred, axis=0)
med = np.median(pred, axis=0)
std = np.std(pred, axis=0)
```

```python
# mean of the predictions after clipping each one to [mean - std, mean + std]
clipped_pres = np.clip(pred, mean - std, mean + std)
clipped_mean = np.mean(clipped_pres, axis=0)
```
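The cell above clips values to the band rather than dropping them. If the goal is to eliminate far-away predictions entirely, one alternative (a different technique from this notebook's, sketched as a hypothetical `band_mean` helper) is to average only the predictions that fall inside mean ± k * std:

```python
import numpy as np


def band_mean(pred, k=1.0):
    """Hypothetical helper: per column, average only the predictions that lie
    within mean +/- k * std; values outside the band are dropped, not clipped."""
    mean = pred.mean(axis=0)
    std = pred.std(axis=0)
    inside = np.abs(pred - mean) <= k * std  # boolean mask, same shape as pred
    counts = inside.sum(axis=0)
    sums = np.where(inside, pred, 0.0).sum(axis=0)
    # Fall back to the median if round-off leaves a column with no values in the band.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.median(pred, axis=0))


toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0, 6.0],
    [2.0, 3.0, 4.0, 20.0],
])
print(band_mean(toy))  # [2. 3. 4. 5.]
```

With only three models and k = 1, the band keeps one or two of the values (all three only when they are identical), so this variant tends to land near the median; the two differ more when many models are blended.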

```python
sub['pressure'] = mean
sub.to_csv('submission_mean.csv', index=False)
sub.head(5)
```

|   | id | pressure |
|---|----|----------|
| 0 | 1 | 6.302227 |
| 1 | 2 | 5.981181 |
| 2 | 3 | 7.12025 |
| 3 | 4 | 7.609636 |
| 4 | 5 | 9.147787 |


```python
sub['pressure'] = med
sub.to_csv('submission_median.csv', index=False)
sub.head(5)
```

|   | id | pressure |
|---|----|----------|
| 0 | 1 | 6.317731 |
| 1 | 2 | 5.978134 |
| 2 | 3 | 7.102974 |
| 3 | 4 | 7.595091 |
| 4 | 5 | 9.141746 |


```python
sub['pressure'] = clipped_mean
sub.to_csv('submission_clipped_mean.csv', index=False)
sub.head(5)
```

|   | id | pressure |
|---|----|----------|
| 0 | 1 | 6.306286 |
| 1 | 2 | 5.980582 |
| 2 | 3 | 7.116872 |
| 3 | 4 | 7.606791 |
| 4 | 5 | 9.146602 |
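The three cells above repeat the same write pattern and overwrite `sub['pressure']` each time. A compact alternative is a loop that writes one file per blend from a copy of the template. `write_submissions` is a hypothetical helper; in the notebook it would be called as `write_submissions(sub, {'mean': mean, 'median': med, 'clipped_mean': clipped_mean})`, while the toy usage below writes into a temporary directory instead.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_submissions(template, blends, prefix="submission"):
    """Write one CSV per blend, reusing the id column of `template`.
    The template itself is never modified."""
    paths = []
    for name, values in blends.items():
        out = template.copy()
        out["pressure"] = values
        path = f"{prefix}_{name}.csv"
        out.to_csv(path, index=False)
        paths.append(path)
    return paths


# Toy usage: made-up template and blends, written to a temporary directory.
toy_template = pd.DataFrame({"id": [1, 2, 3], "pressure": 0.0})
with tempfile.TemporaryDirectory() as tmp:
    written = write_submissions(
        toy_template,
        {"mean": np.array([6.30, 5.98, 7.12]), "median": np.array([6.32, 5.98, 7.10])},
        prefix=os.path.join(tmp, "submission"),
    )
    reloaded = pd.read_csv(written[1])  # read back the median file
    print([os.path.basename(p) for p in written])  # ['submission_mean.csv', 'submission_median.csv']
```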


Public leaderboard scores (MAE, lower is better):
1. Mean: 0.144
1. Median: 0.141
1. Clipped mean: 0.143
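As far as we know, the competition computes the mean absolute error only on the inspiratory phase of each breath, that is, rows where the `u_out` column of the competition data equals 0. Following the advice at the top, blends can be compared with the same metric on a local validation split instead of on the public leaderboard. A minimal sketch, assuming plain arrays of true pressures, predictions, and `u_out` values (`inspiratory_mae` is a hypothetical name, and the numbers are made up):

```python
import numpy as np


def inspiratory_mae(y_true, y_pred, u_out):
    """Mean absolute error restricted to rows where u_out == 0 (assumed inspiratory phase)."""
    mask = np.asarray(u_out) == 0
    errors = np.abs(np.asarray(y_true)[mask] - np.asarray(y_pred)[mask])
    return float(np.mean(errors))


y_true = np.array([6.0, 7.0, 8.0, 9.0])
y_pred = np.array([6.5, 7.0, 7.0, 20.0])
u_out = np.array([0, 0, 0, 1])  # last row is expiratory, so its large error is ignored
print(inspiratory_mae(y_true, y_pred, u_out))  # 0.5
```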

On the public leaderboard, the median is the best of the three statistical approaches and the plain mean is the worst. However, I expect that with a larger number of predictions the clipped mean would match or beat the median.
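The expectation about a larger number of predictions can be probed with a toy simulation. Everything in it is assumed: each hypothetical model is the truth plus Gaussian noise, and about 5% of its predictions receive an extra large error. The printed MAEs depend entirely on this noise model, so they illustrate the trade-off between the three rules rather than predict leaderboard scores.

```python
import numpy as np


def compare_aggregators(n_models, n_rows=20_000, outlier_rate=0.05, seed=0):
    """Hypothetical simulation: returns the MAE of mean, median and clipped mean
    when blending `n_models` noisy predictions of a synthetic target."""
    rng = np.random.default_rng(seed)
    truth = rng.normal(size=n_rows)
    preds = truth + rng.normal(scale=0.1, size=(n_models, n_rows))
    # A small random fraction of predictions gets an extra large error.
    outliers = rng.random((n_models, n_rows)) < outlier_rate
    preds[outliers] += rng.normal(scale=2.0, size=outliers.sum())

    mean = preds.mean(axis=0)
    std = preds.std(axis=0)
    blends = {
        "mean": mean,
        "median": np.median(preds, axis=0),
        "clipped_mean": np.clip(preds, mean - std, mean + std).mean(axis=0),
    }
    return {name: float(np.mean(np.abs(b - truth))) for name, b in blends.items()}


for n_models in (3, 10, 30):
    print(n_models, compare_aggregators(n_models))
```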