## GL Points and DOTS Refitting & Analysis

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.optimize import curve_fit

# Figure resolution on screen and when saving to disk
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 300

In [2]:
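# Old GL coefficients (A, B, C); also used as the starting guess for the refit
old_coef = [1199.72839, 1025.18162, 0.00921]
# Bodyweight grid [kg] used for plotting the curves
bwts = np.linspace(50, 200, 1000)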

def ipf_formula(bwt, A, B, C):
    return (A - B * np.exp(-C * bwt))
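
As a quick sanity check of the formula, the sketch below scores a single lifter with the old coefficients. The helper name `gl_points` and the example lifter (93 kg bodyweight, 800 kg total) are made up for illustration; the 100 / expected-total scaling is the one used in the plots later in this notebook.

```python
import numpy as np

old_coef = [1199.72839, 1025.18162, 0.00921]

def ipf_formula(bwt, A, B, C):
    # Expected total [kg] at a given bodyweight [kg]
    return A - B * np.exp(-C * bwt)

def gl_points(total, bwt, coef):
    # Hypothetical helper: 100 points means lifting exactly the expected total
    return total * 100 / ipf_formula(bwt, *coef)

# Illustrative lifter, not taken from the dataset
expected = ipf_formula(93.0, *old_coef)
score = gl_points(800.0, 93.0, old_coef)
print(f"expected total: {expected:.1f} kg, GL points: {score:.1f}")
```

A lifter whose total equals the expected total for their bodyweight scores exactly 100 under this scaling.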

### Refitting GL Points on new data

In [3]:
data = pd.read_csv("gold_standard_data_ipf_epf.csv")
popt, pcov = curve_fit(ipf_formula, data['BodyweightKg'].values, data['TotalKg'], p0=old_coef)
new_coef = popt

FileNotFoundError: [Errno 2] No such file or directory: 'gold_standard_data_ipf_epf.csv'
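
The cell above needs the CSV file to run. To illustrate what the refit does without it, here is a minimal sketch that runs the same `curve_fit` call on synthetic data. The data are simulated from the old coefficients plus Gaussian noise; the sample size, noise level, bodyweight range and seed are all assumptions chosen for illustration, so the recovered coefficients say nothing about the real dataset.

```python
import numpy as np
from scipy.optimize import curve_fit

old_coef = [1199.72839, 1025.18162, 0.00921]

def ipf_formula(bwt, A, B, C):
    return A - B * np.exp(-C * bwt)

# Synthetic stand-in for the real data (assumed: uniform bodyweights, Gaussian noise)
rng = np.random.default_rng(0)
bw_synth = rng.uniform(50, 150, size=2000)
total_synth = ipf_formula(bw_synth, *old_coef) + rng.normal(0, 20, size=2000)

# Same call as in the notebook, started from the old coefficients
popt_synth, pcov_synth = curve_fit(ipf_formula, bw_synth, total_synth, p0=old_coef)

# Compare the fitted curve with the generating curve over the sampled range
grid = np.linspace(60, 140, 200)
rel_err = np.abs(ipf_formula(grid, *popt_synth) / ipf_formula(grid, *old_coef) - 1)
print("fitted coefficients:", popt_synth)
print(f"max relative curve error: {100 * rel_err.max():.2f} %")
```

Comparing the curves rather than the individual coefficients is the more informative check here, since different (A, B, C) triples can give similar curves over a limited bodyweight range.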

In [None]:
plt.figure(figsize=(8, 5))
plt.plot(bwts, ipf_formula(bwts, *old_coef), c = 'r', label='Old Model')
plt.plot(bwts, ipf_formula(bwts, *new_coef),  c = 'g', label='Refitted Model')
plt.scatter(data['BodyweightKg'].values, data['TotalKg'], alpha=0.2, s = 1.5, color="blue")
plt.ylabel("Expected total [kg]")
plt.xlabel('Bodyweight [kg]')
plt.legend()

Next, we convert the expected total into GL coefficients (100 divided by the expected total) and plot them.

In [None]:
plt.figure(figsize=(8, 5))
plt.plot(bwts, 100 / ipf_formula(bwts, *old_coef), c = 'r', label='Old Model')
plt.plot(bwts, 100 / ipf_formula(bwts, *new_coef),  c = 'g', label='Refitted Model')
plt.ylabel("GL Coefficients")
plt.xlabel('Bodyweight [kg]')
plt.legend()

Now we can look at the expected change in GL points due to refitting for a given bodyweight. Note that this relative change is the same regardless of the total obtained, as the final score is just the coefficient multiplied by the total.
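
The total cancels in the ratio: new points / old points = (total * 100 / E_new) / (total * 100 / E_old) = E_old / E_new, where E is the expected total at that bodyweight. The sketch below checks this numerically. The "refitted" coefficients in it are invented placeholders, not the result of the fit, and `pct_change` is a hypothetical helper.

```python
import numpy as np

def ipf_formula(bwt, A, B, C):
    return A - B * np.exp(-C * bwt)

old_coef = [1199.72839, 1025.18162, 0.00921]
fake_new_coef = [1210.0, 1030.0, 0.0090]  # placeholder values, NOT fitted

def pct_change(total, bwt):
    # Percentage change in GL points when switching coefficient sets
    gl_old = total * 100 / ipf_formula(bwt, *old_coef)
    gl_new = total * 100 / ipf_formula(bwt, *fake_new_coef)
    return 100 * (gl_new / gl_old - 1)

bwt = 83.0
changes = [pct_change(total, bwt) for total in (400.0, 600.0, 800.0)]
print(changes)  # the same value three times, up to floating-point rounding
```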

In [None]:
relative_diff = (100 / ipf_formula(bwts, *new_coef)) /  \
    (100 / ipf_formula(bwts, *old_coef))
plt.figure(figsize=(8, 5))
plt.plot(bwts, 100 * (relative_diff-1))
plt.xlabel("Bodyweight [kg]")
plt.ylabel("% change in GL Points due to refitting")

### Weight Class Analysis
Here we look at the distribution of GL points within each weight class and how it changes due to refitting. We use the entire dataset so that we have a (relatively) unbiased sample.

In [None]:
data_full = pd.read_csv('filtered_data_ipf_epf.csv')

In [None]:
# Score every lifter in the full dataset with both coefficient sets
data_full['GLold'] = data_full['TotalKg'] * (100 / ipf_formula(data_full['BodyweightKg'], *old_coef))
data_full['GLnew'] = data_full['TotalKg'] * (100 / ipf_formula(data_full['BodyweightKg'], *new_coef))

In [None]:
import matplotlib.pyplot as plt

# Define the grid dimensions (here 9 weightclasses, so 3x3)
rows, cols = 3, 3
fig, axes = plt.subplots(nrows=rows, ncols=cols, figsize=(16, 8), constrained_layout=True)

axes = axes.flatten()

for ax, (weight_class, group) in zip(axes, data_full.groupby('WeightClassKg')):
    group['GLold'].plot(kind='kde', ax=ax, label = 'Old')
    group['GLnew'].plot(kind='kde', ax=ax, label = 'Refitted')
    ax.axvline(100, ymin = 0, color = 'red', linestyle = 'dashed', alpha = 0.2)
    ax.set_title(f'Weight Class {weight_class}')
    ax.set_xlabel('GL Points')
    ax.set_ylabel('Density')
    ax.legend()

# Show the plots
plt.show()


The distributions become fairer with the refitted coefficients: ideally, the scores in each weight class would be symmetric around 100 (given an ideal data split). They are still not very symmetric, but the probability mass on either side of 100 tends to be more equal with the refitted variant. The plots also show which weight classes the formula favours on average (e.g. 93).
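
One way to put a number on the probability mass on either side of 100 is the share of lifters scoring above 100 in each weight class; under the symmetry argument above, values near 0.5 would be the target. The sketch below computes this on a small made-up table. The column names match the ones used in this notebook, but every row is invented.

```python
import pandas as pd

# Tiny invented example; in the notebook this would be data_full
demo = pd.DataFrame({
    'WeightClassKg': ['83', '83', '83', '83', '93', '93', '93', '93'],
    'GLold': [96.0, 98.5, 101.0, 99.0, 101.0, 103.0, 99.5, 104.0],
    'GLnew': [98.0, 100.5, 103.0, 99.5, 99.0, 101.0, 97.5, 102.0],
})

# Fraction of lifters above 100 points, per weight class and formula variant
share_above = (demo[['GLold', 'GLnew']] > 100).groupby(demo['WeightClassKg']).mean()
print(share_above)
```

Applied to the real `data_full`, the same two lines would give one row per weight class, which is easier to compare across classes than nine density plots.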

## Refitting DOTS
Repeating the same procedure for DOTS.

In [None]:
def dots(bwt, a, b, c, d, e):
    return (a + b * bwt + c * bwt**2 + d * bwt**3 + e * bwt**4)

# Old DOTS coefficients for men, used as the starting guess for the refit.
# For the women's coefficients, see https://www.inchcalculator.com/lifting-strength-calculator/
dots_p0 = [-307.75076, 24.0900756, -0.1918759221, 0.0007391293, -0.000001093]
popt, pcov = curve_fit(dots, data['BodyweightKg'].values, data['TotalKg'], p0=dots_p0)
dots_coef = popt

In [None]:
plt.figure(figsize=(8, 5))
plt.plot(bwts, dots(bwts, *dots_p0), c = 'r', label='DOTS Old')
plt.plot(bwts, dots(bwts, *dots_coef), c = 'g', label='DOTS Refitted')
plt.scatter(data['BodyweightKg'].values, data['TotalKg'], alpha=0.2, s = 1.5, color="blue")
plt.ylabel("Expected Total [kg]")
plt.xlabel("Bodyweight [kg]")
plt.legend()

It is quite clear that a higher-order polynomial is not an appropriate model for this data, as it heavily overfits. This will also be obvious in the relative change graph.
(Wilks is just DOTS with a 5th-order term, so it will be even worse. Just add f * x^5 to the formula.)
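
To make the remark about Wilks concrete, here is a sketch of the fifth-order variant alongside `dots`. The name `wilks_poly` is hypothetical and no Wilks coefficients are supplied; the only point is that setting the extra coefficient `f` to zero recovers the DOTS polynomial, so the DOTS model is a special case of it.

```python
import numpy as np

def dots(bwt, a, b, c, d, e):
    return a + b * bwt + c * bwt**2 + d * bwt**3 + e * bwt**4

def wilks_poly(bwt, a, b, c, d, e, f):
    # Hypothetical name: the DOTS polynomial plus a fifth-order term
    return dots(bwt, a, b, c, d, e) + f * bwt**5

dots_p0 = [-307.75076, 24.0900756, -0.1918759221, 0.0007391293, -0.000001093]
bwts = np.linspace(50, 200, 5)

# With f = 0 the two models coincide
same = np.allclose(wilks_poly(bwts, *dots_p0, 0.0), dots(bwts, *dots_p0))
print(same)
```

It could be passed to `curve_fit` in the same way as `dots`, with a six-element `p0`.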

In [None]:
relative_diff = (500 / dots(bwts, *dots_coef)) /  \
    (500 / dots(bwts, *dots_p0))

plt.figure(figsize=(8, 5))
plt.plot(bwts, 100 * (relative_diff-1))
plt.xlabel("Bodyweight [kg]")
plt.ylabel("% change in DOTS due to refitting")

### Weight Class Analysis

In [None]:
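# Score every lifter in the full dataset; scaled by 100 so the expected total scores 100, as in the GL plots
data_full['DotsOld'] = data_full['TotalKg'] * (100 / dots(data_full['BodyweightKg'], *dots_p0))
data_full['DotsNew'] = data_full['TotalKg'] * (100 / dots(data_full['BodyweightKg'], *dots_coef))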

In [None]:
import matplotlib.pyplot as plt

# Define the grid dimensions (here 9 weightclasses, so 3x3)
rows, cols = 3, 3
fig, axes = plt.subplots(nrows=rows, ncols=cols, figsize=(16, 8), constrained_layout=True)

axes = axes.flatten()

for ax, (weight_class, group) in zip(axes, data_full.groupby('WeightClassKg')):
    group['DotsOld'].plot(kind='kde', ax=ax, label = 'Old')
    group['DotsNew'].plot(kind='kde', ax=ax, label = 'Refitted')
    ax.axvline(100, ymin = 0, color = 'red', linestyle = 'dashed', alpha = 0.2)
    ax.set_title(f'Weight Class {weight_class}')
    ax.set_xlabel('DOTS')
    ax.set_ylabel('Density')
    ax.legend()

plt.show()