
## Power Averaging - TPS September 2021

Power averaging is an ensembling technique aimed at the AUC metric, which is the evaluation metric used in the Tabular Playground Series (TPS) September 2021 competition.

It works best on highly correlated models, which are in abundance in this competition.

The formula, for the three submissions blended in this notebook, is **Final Submission = (Submission1^Power + Submission2^Power + Submission3^Power) / 3**
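
As a minimal sketch of the formula above, the function below raises each model's predictions to a power and then averages them. The name `power_average` and the toy arrays are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def power_average(predictions, power):
    """Average several prediction vectors after raising each to `power`.

    `predictions` is a sequence of equal-length arrays of scores in [0, 1].
    """
    stacked = np.asarray(predictions, dtype=float)  # shape: (n_models, n_rows)
    return np.mean(stacked ** power, axis=0)

# Toy predictions from three hypothetical models for four rows.
model_a = np.array([0.10, 0.40, 0.80, 0.95])
model_b = np.array([0.20, 0.35, 0.70, 0.90])
model_c = np.array([0.15, 0.50, 0.75, 0.99])

blend = power_average([model_a, model_b, model_c], power=4)
print(blend)
```

With `power=1` this reduces to a plain arithmetic mean. The blended values are no longer calibrated probabilities, which is acceptable here because AUC depends only on how the rows are ranked.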

Here's a detailed post on it - https://medium.com/data-design/reaching-the-depths-of-power-geometric-ensembling-when-targeting-the-auc-metric-2f356ea3250e

**Update**: I've written a more in-depth guide to pushing the limits of power averaging [here](https://www.kaggle.com/edrickkesuma/in-depth-power-averaging-0-81848).

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

## Read in submission files

In [2]:
# 0.81826 - Stacking from https://www.kaggle.com/vishwas21/tps-sep-21-3-level-custom-stacking
stacking_sub = pd.read_csv('../input/tps-sep-21-3-level-custom-stacking/submission.csv')
# 0.81789 - Catboost from https://www.kaggle.com/jonigooner/catboost-classifier
cb_sub = pd.read_csv('../input/catboost-classifier/catboost_classifier.csv')
# 0.81814 - LGBM from https://www.kaggle.com/realtimshady/single-simple-lightgbm
lgbm_sub = pd.read_csv('../input/single-simple-lightgbm/submission.csv')
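
Blending column against column only makes sense if every file lists the same rows in the same order. A minimal sketch of such a check follows; the helper name `same_row_order` is hypothetical, and the key column name `id` is an assumption about the submission format.

```python
import pandas as pd

def same_row_order(*frames, key="id"):
    """Return True if every frame has the same `key` values in the same order."""
    first = frames[0][key].to_numpy()
    return all(
        len(f) == len(first) and (f[key].to_numpy() == first).all()
        for f in frames[1:]
    )

# Toy frames standing in for the real submission files.
a = pd.DataFrame({"id": [1, 2, 3], "claim": [0.2, 0.7, 0.9]})
b = pd.DataFrame({"id": [1, 2, 3], "claim": [0.3, 0.6, 0.8]})
c = pd.DataFrame({"id": [3, 2, 1], "claim": [0.8, 0.6, 0.3]})

print(same_row_order(a, b))  # True
print(same_row_order(a, c))  # False: same ids, different order
```

If the check fails, sorting each frame by the key column (or merging on it) before blending would be one way to line the rows up.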

## Check for correlations

In [3]:
import matplotlib.pyplot as plt
import plotly.figure_factory as ff
import plotly.express as px

hist_data = [stacking_sub.claim, cb_sub.claim, lgbm_sub.claim]
group_labels = ['stacking', 'catboost', 'lgbm']
fig = ff.create_distplot(hist_data, group_labels, bin_size=0.3, show_hist=False, show_rug=False)
fig.show()

In [4]:
# High correlation between all models ~0.998+
data = np.corrcoef([stacking_sub.claim, cb_sub.claim, lgbm_sub.claim])
fig = px.imshow(data, x=group_labels, y=group_labels)

fig.show()
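
Because AUC depends only on how predictions are ranked, a rank (Spearman) correlation can be a useful complement to the Pearson correlation computed above. The sketch below uses made-up toy predictions, not the real submission files, and computes Spearman correlation as the Pearson correlation of the ranks.

```python
import pandas as pd

# Toy predictions standing in for the three submissions' `claim` columns.
preds = pd.DataFrame({
    "stacking": [0.10, 0.40, 0.80, 0.95, 0.30],
    "catboost": [0.20, 0.35, 0.70, 0.90, 0.40],
    "lgbm":     [0.15, 0.50, 0.75, 0.99, 0.25],
})

# Spearman correlation = Pearson correlation of the per-column ranks.
rank_corr = preds.rank().corr()
print(rank_corr.round(3))
```

In this toy data `stacking` and `lgbm` order the rows identically, so their rank correlation is 1 even though their raw values differ.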

## Submission file

In [5]:
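# The power of 4 is arbitrary; see the blog post linked above for guidance on choosing a better one
ensemble = stacking_sub.copy()
# Blend only the prediction column so the other columns are left untouched
ensemble['claim'] = (stacking_sub.claim**4 + cb_sub.claim**4 + lgbm_sub.claim**4) / 3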
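
One possible way to pick the power less arbitrarily is to score several candidates on held-out predictions with known labels. The sketch below assumes such validation predictions exist, which this notebook does not have since it only blends public submissions. The data is synthetic, and `auc`, `power_average`, and the candidate powers are illustrative choices rather than the method from the blog post.

```python
import numpy as np
import pandas as pd

def auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; ties get average ranks."""
    y_true = np.asarray(y_true)
    ranks = pd.Series(y_score).rank().to_numpy()
    pos = y_true == 1
    n_pos = pos.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def power_average(predictions, power):
    """Mean of the prediction vectors after raising each to `power`."""
    return np.mean(np.asarray(predictions, dtype=float) ** power, axis=0)

# Synthetic validation set: three correlated models that share one noisy signal.
rng = np.random.default_rng(0)
n_rows = 2000
y_val = rng.integers(0, 2, size=n_rows)
signal = y_val + rng.normal(0.0, 1.0, size=n_rows)
val_preds = [
    1.0 / (1.0 + np.exp(-(signal + rng.normal(0.0, 0.3, size=n_rows))))
    for _ in range(3)
]

# Score each candidate power and keep the one with the highest validation AUC.
powers = [1, 2, 4, 8]
scores = {p: auc(y_val, power_average(val_preds, p)) for p in powers}
best_power = max(scores, key=scores.get)

for p in powers:
    print(f"power={p}: AUC={scores[p]:.5f}")
print("best power:", best_power)
```

The differences between powers are typically small, so a power chosen this way is only as reliable as the validation set it was scored on.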

In [6]:
ensemble.to_csv('submission.csv', index=False)