In [None]:
import logging
import numpy as np
import os
import pandas as pd
import time
import plotly.express as px

from datetime import datetime
from collections import Counter
from itertools import combinations, combinations_with_replacement, permutations

This notebook is a code version of the discussion at https://www.kaggle.com/c/commonlitreadabilityprize/discussion/240886


## Introduction

The target in this competition seems to love playing with our feelings. In this notebook, I build on previous discussions to simplify what “readability” and the “target” mean in this competition.

> **The target value is the result of a Bradley-Terry analysis of more than 111,000 pairwise comparisons between excerpts. Teachers spanning grades 3-12 (a majority teaching between grades 6-10) served as the raters for these comparisons.**

### What are pairwise comparisons?

One-liner: Comparing entities two at a time to judge which one of each pair is preferred
<img src="https://i.postimg.cc/GhmxX3kg/image.png" width=500px>

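With N excerpts, the comparisons fill an (N x N) matrix: the entry in row i, column j counts how often excerpt i was preferred over excerpt j. The 111,000 comparisons are spread over those entries. How strongly an excerpt such as #2 is preferred over the other excerpts is what is termed “readability” here.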
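
To make the matrix picture concrete, here is a minimal sketch that tallies individual judgments into a win-count matrix with one row and one column per excerpt. The three excerpt names and the `comparisons` list are made-up toy values, not competition data.

```python
import numpy as np

# Hypothetical toy data: each tuple is (winner, loser) for one comparison.
excerpts = ["Excerpt 1", "Excerpt 2", "Excerpt 3"]
comparisons = [
    ("Excerpt 2", "Excerpt 1"),
    ("Excerpt 2", "Excerpt 3"),
    ("Excerpt 3", "Excerpt 1"),
    ("Excerpt 2", "Excerpt 1"),
]

# Map each excerpt name to its row/column position.
index = {name: i for i, name in enumerate(excerpts)}

# wins[i, j] = number of times excerpt i was preferred over excerpt j.
wins = np.zeros((len(excerpts), len(excerpts)), dtype=int)
for winner, loser in comparisons:
    wins[index[winner], index[loser]] += 1

print(wins)
```

The matrix size depends on the number of excerpts, while the number of comparisons only determines how large the entries get.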

### Ok, What's Bradley-Terry (BT) Analysis?

One-liner: Bradley-Terry analysis converts these matrices of preferences into values on a ratio scale

Textbook Definition: Bradley-Terry (BT) is a statistical technique used to convert importance or preference rankings of a list of attributes into importance or preference values for the same attributes on a powerful ratio scale.
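
In formula terms, the BT model gives every item a positive strength and says that item i is preferred over item j with probability p_i / (p_i + p_j). A minimal sketch of that rule follows; the function name and the strengths are illustrative only.

```python
def bt_win_probability(p_i, p_j):
    """Probability that item i is preferred over item j under Bradley-Terry."""
    return p_i / (p_i + p_j)


# Hypothetical strengths: item A is three times as strong as item B.
prob_a_over_b = bt_win_probability(3.0, 1.0)
prob_b_over_a = bt_win_probability(1.0, 3.0)

print(prob_a_over_b, prob_b_over_a)
```

Only the ratio of the two strengths matters, which is why the resulting scores live on a ratio scale.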

<img src="https://www.rsw-software.com/wp-content/uploads/2017/05/Drivers-%E2%80%93-BT-example-1024x620.jpg" width=500px>

Now, a BT model is fit on this pairwise data, and the rating is the model output. In the example image, Attribute 2 (in our case Excerpt 2) is preferred 22 times over Attribute 1 (Excerpt 1), meaning better **readability**. On the final ratio scale, Attribute 2, being preferred most of the time over the rest, ends up ranked as the most preferable of all.

This is something we are chasing here 🏃‍♂️

Example Resources:
1. [GitHub Code For BT Model](https://github.com/bjlkeng/Bradley-Terry-Model)
2. [Pairwise comparisons example](http://resources.qiagenbioinformatics.com/manuals/phylogenymodule/current/_pairwise_comparison_table.html)
3. [Modelling Pairwise distances with BT paper](https://www.acer.org/files/Tutorial-13--Pairwise-comparisons.pdf)
4. [BT Analysis](https://www.rsw-software.com/r-sw-drivers/bradley-terry-analysis/)

In [None]:
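atrrib_list = ['Excerpt 1', 'Excerpt 2', 'Excerpt 3', 'Excerpt 4', 'Excerpt 5']
# Ordered pairs: (A, B) and (B, A) are both listed, so 5 * 4 = 20 pairs.
# A later cell replaces this with unordered combinations (10 pairs).
pairs = list(permutations(atrrib_list, 2))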

Let us generate all possible pairs. We use 5 example excerpts here; you can think of them as excerpts in the competition.

## Generating Pairs

In [None]:
pairs = list(combinations(atrrib_list, 2))
print("The following pairs are compared w.r.t. readability:")
for excerpt_a, excerpt_b in pairs:
    print(excerpt_a, 'vs', excerpt_b)
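
As a quick check on the pair counts, the sketch below contrasts unordered combinations with ordered permutations for five items. The list is re-declared under a different name (`items`) so the snippet runs on its own.

```python
from itertools import combinations, permutations
from math import comb, perm

items = ['Excerpt 1', 'Excerpt 2', 'Excerpt 3', 'Excerpt 4', 'Excerpt 5']

# Unordered pairs: ('Excerpt 1', 'Excerpt 2') appears once.
unordered = list(combinations(items, 2))

# Ordered pairs: ('Excerpt 1', 'Excerpt 2') and ('Excerpt 2', 'Excerpt 1') both appear.
ordered = list(permutations(items, 2))

print(len(unordered), comb(len(items), 2))
print(len(ordered), perm(len(items), 2))
```

Since "A vs B" and "B vs A" are the same comparison, the unordered version is the one that matches a table with one row per pair.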

## Example data

<img src="https://www.rsw-software.com/wp-content/uploads/2017/05/Drivers-%E2%80%93-BT-example-1024x620.jpg" width=600px>

Let us fill the same data into a pandas DataFrame manually. Why not just rank the excerpts? If attribute (2) is preferred over attribute (3) and attribute (3) is preferred over attribute (4), traditional rank analysis tells us that attribute (2) is preferred over attributes (3) and (4), but it cannot quantify the difference in preference (only rank information is provided). BT analysis supplies that missing quantity.

In [None]:
df = pd.DataFrame(pairs, columns=['Excerpt A','Excerpt B'])
wins_a = [3,11,11,5,19,20,15,16,9,7]
wins_b = [22,14,14,20,6,4,10,9,16,18]
df['Wins A'] = wins_a
df['Wins B'] = wins_b
print("Columns: Excerpt A vs. Excerpt B")
print("Wins A --> Number of times A preferred over B")
print("Wins B --> Number of times B preferred over A")
df
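
One simple way to read this table is to turn the counts into the share of comparisons won by Excerpt A in each pair. The sketch rebuilds the same table under a separate name (`table`) so it runs standalone; the `Total` and `Share A` columns are additions for illustration.

```python
from itertools import combinations

import pandas as pd

excerpts = ['Excerpt 1', 'Excerpt 2', 'Excerpt 3', 'Excerpt 4', 'Excerpt 5']
table = pd.DataFrame(list(combinations(excerpts, 2)),
                     columns=['Excerpt A', 'Excerpt B'])
table['Wins A'] = [3, 11, 11, 5, 19, 20, 15, 16, 9, 7]
table['Wins B'] = [22, 14, 14, 20, 6, 4, 10, 9, 16, 18]

# Number of comparisons per pair and the fraction won by Excerpt A.
table['Total'] = table['Wins A'] + table['Wins B']
table['Share A'] = (table['Wins A'] / table['Total']).round(2)

print(table)
```

A share far from 0.5 signals a lopsided pair; BT analysis combines all of these pairwise shares into one score per excerpt.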

In [None]:
def bradley_terry_analysis(text_data, max_iters=1000, error_tol=1e-3):
    ''' Computes Bradley-Terry scores using an iterative algorithm.
        Expects columns 'Excerpt A', 'Excerpt B', 'Wins A' and 'Wins B'.
        See: https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model
    '''
    # Do some aggregations for convenience
    # Total wins per excerpt, counted on the A side and on the B side
    winsA = text_data.groupby('Excerpt A')['Wins A'].sum()
    winsB = text_data.groupby('Excerpt B')['Wins B'].sum()
    # Align on excerpt name; an excerpt missing from one side counts as 0 there
    wins = winsA.add(winsB, fill_value=0)

    # Total games played between pairs
    num_games = Counter()
    for _, row in text_data.iterrows():
        key = tuple(sorted([row['Excerpt A'], row['Excerpt B']]))
        num_games[key] += row['Wins A'] + row['Wins B']

    # Iteratively update 'ranks' scores
    excerpts = sorted(list(set(text_data['Excerpt A']) | set(text_data['Excerpt B'])))
    ranks = pd.Series(np.ones(len(excerpts)) / len(excerpts), index=excerpts)
    for iters in range(max_iters):
        oldranks = ranks.copy()
        for excerpt in ranks.index:
            # Built-in sum: np.sum on a generator is deprecated in NumPy
            denom = sum(num_games[tuple(sorted([excerpt, p]))]
                        / (ranks[p] + ranks[excerpt])
                        for p in ranks.index if p != excerpt)
            ranks[excerpt] = wins[excerpt] / denom

        ranks /= sum(ranks)

        # Stop once the scores change by less than the tolerance
        if np.sum((ranks - oldranks).abs()) < error_tol:
            logging.info(" * Converged after %d iterations.", iters + 1)
            break
    else:
        # The loop ended without a break, so the tolerance was never met
        logging.info(" * Max iterations reached (%d iters).", max_iters)


    # Note: we can control the scaling here. The competition target has both negative and positive values on its scale.
    # To reproduce the results from the example, I multiply the scores by 100.
    ranks = (ranks.sort_values(ascending=False) * 100).round(2)

    return ranks
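
Written as a formula, the loop in `bradley_terry_analysis` appears to follow the iterative scheme described on the Wikipedia page linked in its docstring. Here $W_i$ stands for the total wins of excerpt $i$ (`wins`), $n_{ij}$ for the number of comparisons between excerpts $i$ and $j$ (`num_games`), and $p_i$ for the current score (`ranks`); these symbols are introduced only for this note.

```latex
p_i \leftarrow \frac{W_i}{\sum_{j \neq i} \dfrac{n_{ij}}{p_i + p_j}},
\qquad \text{then} \qquad
p_i \leftarrow \frac{p_i}{\sum_k p_k}
```

The first step is applied to each excerpt in turn, the second once per pass, and the passes repeat until the total absolute change in the scores drops below `error_tol`.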

In [None]:
final_scores = (bradley_terry_analysis(df)
                .rename_axis('Example Excerpt')
                .reset_index(name='readability score')
                .sort_values(by='Example Excerpt'))
print(">> Target scores")
final_scores

In [None]:
fig = px.bar(final_scores, x='readability score', y='Example Excerpt',
             title='BT scores for the example excerpts (the competition target is similar, on its own scale)',
             text='readability score', width=600)
fig.show()
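
The scores plotted here are all positive, while the competition target takes both negative and positive values. One common way to get such a scale from BT strengths is to take logarithms and center them at zero. This is an assumption for illustration, not something the competition description confirms, and the strengths below are hypothetical rather than the fitted values.

```python
import numpy as np

# Hypothetical BT strengths (positive, summing to 1); not the fitted scores.
strengths = np.array([0.10, 0.40, 0.20, 0.25, 0.05])

# Log-transform and center so the scores average to zero.
log_scores = np.log(strengths)
centered = log_scores - log_scores.mean()
print(centered.round(2))

# On the log scale, the BT win probability is a logistic function
# of the score difference.
i, j = 1, 0
p_direct = strengths[i] / (strengths[i] + strengths[j])
p_logistic = 1.0 / (1.0 + np.exp(-(centered[i] - centered[j])))
print(p_direct, p_logistic)
```

Centering does not change any pairwise probability, because only score differences enter the logistic function.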

<img src="https://www.rsw-software.com/wp-content/uploads/2017/05/Drivers-%E2%80%93-BT-example-1024x620.jpg" width=600px>

## Hope this helps 😇