# Weighted Mean Baseline

**Goal:** Create a simple baseline that makes the same constant prediction for every test sample. The prediction is the mean value of each target variable, scaled by the proposed sample weights:

- 1 for all healthy labels.
- 2 for low grade solid organ injuries (liver, spleen, kidney).
- 4 for high grade solid organ injuries.
- 2 for bowel injuries.
- 6 for extravasation.
- 6 for the auto-generated any_injury label.

In addition, we provide a method to evaluate the score on the training data and to investigate alternative scale factors. This is used to highlight the challenges of unbalanced data and weighted scoring metrics.
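
As a minimal sketch of the idea, the snippet below scales a mean prediction for one three-level organ by the weights listed above and then renormalizes it. The class frequencies are made-up illustrative numbers, not values from `train.csv`, and the names `WEIGHTS` and `mean_probs` are hypothetical.

```python
import numpy as np

# Sample weights from the list above, keyed by label suffix.
WEIGHTS = {"healthy": 1.0, "low": 2.0, "high": 4.0}

# Hypothetical class frequencies for one organ (illustrative only).
mean_probs = np.array([0.90, 0.07, 0.03])  # healthy, low, high
weights = np.array([WEIGHTS["healthy"], WEIGHTS["low"], WEIGHTS["high"]])

scaled = mean_probs * weights        # up-weight the rarer, more heavily weighted classes
normalized = scaled / scaled.sum()   # the metric renormalizes each group to sum to 1

print(normalized)
```

After normalization the healthy probability drops below its raw frequency and the injury probabilities rise, which is the effect the scale factors are meant to have.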

# Imports

In [None]:
import numpy as np
import pandas as pd
import pandas.api.types
import sklearn.metrics

# Load Data

In [None]:
# Only requires the training target data. 
y_train = pd.read_csv('../input/train.csv')

y_train.head()

In [None]:
# List of Targets
Injuries = ['bowel_healthy', 'bowel_injury', 
            'extravasation_healthy', 'extravasation_injury', 
            'kidney_healthy', 'kidney_low', 'kidney_high', 
            'liver_healthy', 'liver_low', 'liver_high', 
            'spleen_healthy', 'spleen_low', 'spleen_high', 
            'any_injury']

# Target EDA

In [None]:
y_train[Injuries].describe()

# [Score](https://www.kaggle.com/code/metric/rsna-trauma-metric/notebook)

In [None]:
# Defined locally because the functions below raise it on invalid input.
class ParticipantVisibleError(Exception):
    pass

def normalize_probabilities_to_one(df: pd.DataFrame, group_columns: list) -> pd.DataFrame:
    # Normalize the sum of each row's probabilities to 100%.
    # 0.75, 0.75 => 0.5, 0.5
    # 0.1, 0.1 => 0.5, 0.5
    row_totals = df[group_columns].sum(axis=1)
    if row_totals.min() == 0:
        raise ParticipantVisibleError('All rows must contain at least one non-zero prediction')
    for col in group_columns:
        df[col] /= row_totals
    return df


def score(solution: pd.DataFrame, submission: pd.DataFrame, row_id_column_name: str) -> float:
    '''
    Pseudocode:
    1. For every label group (liver, bowel, etc):
        - Normalize the sum of each row's probabilities to 100%.
        - Calculate the sample weighted log loss.
    2. Derive a new any_injury label by taking the max of 1 - p(healthy) for each label group
    3. Calculate the sample weighted log loss for the new label group
    4. Return the average of all of the label group log losses as the final score.
    '''
    del solution[row_id_column_name]
    del submission[row_id_column_name]

    # Run basic QC checks on the inputs
    if not pandas.api.types.is_numeric_dtype(submission.values):
        raise ParticipantVisibleError('All submission values must be numeric')

    if not np.isfinite(submission.values).all():
        raise ParticipantVisibleError('All submission values must be finite')

    if solution.min().min() < 0:
        raise ParticipantVisibleError('All labels must be at least zero')
    if submission.min().min() < 0:
        raise ParticipantVisibleError('All predictions must be at least zero')

    # Calculate the label group log losses
    binary_targets = ['bowel', 'extravasation']
    triple_level_targets = ['kidney', 'liver', 'spleen']
    all_target_categories = binary_targets + triple_level_targets

    label_group_losses = []
    for category in all_target_categories:
        if category in binary_targets:
            col_group = [f'{category}_healthy', f'{category}_injury']
        else:
            col_group = [f'{category}_healthy', f'{category}_low', f'{category}_high']

        solution = normalize_probabilities_to_one(solution, col_group)

        for col in col_group:
            if col not in submission.columns:
                raise ParticipantVisibleError(f'Missing submission column {col}')
        submission = normalize_probabilities_to_one(submission, col_group)
        label_group_losses.append(
            sklearn.metrics.log_loss(
                y_true=solution[col_group].values,
                y_pred=submission[col_group].values,
                sample_weight=solution[f'{category}_weight'].values
            )
        )

    # Derive a new any_injury label by taking the max of 1 - p(healthy) for each label group
    healthy_cols = [x + '_healthy' for x in all_target_categories]
    any_injury_labels = (1 - solution[healthy_cols]).max(axis=1)
    any_injury_predictions = (1 - submission[healthy_cols]).max(axis=1)
    any_injury_loss = sklearn.metrics.log_loss(
        y_true=any_injury_labels.values,
        y_pred=any_injury_predictions.values,
        sample_weight=solution['any_injury_weight'].values
    )

    label_group_losses.append(any_injury_loss)
    # Return the per-group losses so each group can be inspected;
    # np.mean(label_group_losses) gives the final score.
    return label_group_losses
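
To make the two steps in the pseudocode concrete, here is a small hand computation of the row normalization and the sample-weighted log loss for one binary group. It is a sketch on toy numbers using only NumPy; the weighted average it computes is the same quantity `sklearn.metrics.log_loss` returns when `sample_weight` is passed.

```python
import numpy as np

# Toy binary group: columns are [healthy, injury]; values are illustrative only.
y_true = np.array([[1, 0], [1, 0], [0, 1]])
y_pred = np.array([[0.75, 0.75], [0.9, 0.1], [0.6, 0.4]])
sample_weight = np.array([1.0, 1.0, 2.0])  # the injured sample counts double

# Step 1: normalize each row to sum to 1 (0.75, 0.75 => 0.5, 0.5).
y_pred = y_pred / y_pred.sum(axis=1, keepdims=True)

# Step 2: weighted log loss = sum(w_i * -log p_i[true class]) / sum(w_i).
p_true = (y_true * y_pred).sum(axis=1)
weighted_loss = np.sum(sample_weight * -np.log(p_true)) / np.sum(sample_weight)

print(weighted_loss)
```

Because the injured sample carries twice the weight, its poor prediction (0.4 on the true class) dominates the result.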

In order to evaluate the score, the appropriate weights need to be assigned. The score function defined above expects sample weights for each category of injury (bowel, extravasation, kidney, liver, spleen, any). The sample weight is assigned based on the true target for a given category. For example, if a sample has a low grade kidney injury, then 

    [kidney_healthy, kidney_low, kidney_high] = [0,1,0]

and we would set kidney_weight = 2 for that sample.
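
The mapping from one-hot label to weight can be sketched on a toy array as follows. This uses `np.select` on plain NumPy arrays purely for illustration; the three rows are invented examples, one per class.

```python
import numpy as np

# Toy one-hot kidney labels; columns are [kidney_healthy, kidney_low, kidney_high].
kidney = np.array([
    [1, 0, 0],  # healthy
    [0, 1, 0],  # low grade
    [0, 0, 1],  # high grade
])

# Pick the weight from the true class: healthy=1, low=2, high=4.
kidney_weight = np.select(
    [kidney[:, 1] == 1, kidney[:, 2] == 1],
    [2, 4],
    default=1,
)

print(kidney_weight)
```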

In [None]:
# Assign the appropriate weights to each category
def create_training_solution(y_train):
    sol_train = y_train.copy()
    
    # bowel healthy|injury sample weight = 1|2
    sol_train['bowel_weight'] = np.where(sol_train['bowel_injury'] == 1, 2, 1)
    
    # extravasation healthy|injury sample weight = 1|6
    sol_train['extravasation_weight'] = np.where(sol_train['extravasation_injury'] == 1, 6, 1)
    
    # kidney healthy|low|high sample weight = 1|2|4
    sol_train['kidney_weight'] = np.where(sol_train['kidney_low'] == 1, 2, np.where(sol_train['kidney_high'] == 1, 4, 1))
    
    # liver healthy|low|high sample weight = 1|2|4
    sol_train['liver_weight'] = np.where(sol_train['liver_low'] == 1, 2, np.where(sol_train['liver_high'] == 1, 4, 1))
    
    # spleen healthy|low|high sample weight = 1|2|4
    sol_train['spleen_weight'] = np.where(sol_train['spleen_low'] == 1, 2, np.where(sol_train['spleen_high'] == 1, 4, 1))
    
    # any healthy|injury sample weight = 1|6
    sol_train['any_injury_weight'] = np.where(sol_train['any_injury'] == 1, 6, 1)
    return sol_train

In [None]:
solution_train = create_training_solution(y_train)

# predict a constant using the mean of the training data
y_pred = y_train.copy()
y_pred[Injuries] = y_train[Injuries].mean().tolist()

no_scale_score = score(solution_train,y_pred,'patient_id')
print(f'Training score without scaling: {no_scale_score}')

In [None]:
y_pred

In [None]:
preds = [
    y_pred['bowel_injury'] / (y_pred['bowel_healthy'] +  y_pred['bowel_injury']),
    y_pred['extravasation_injury'] / (y_pred['extravasation_healthy'] +  y_pred['extravasation_injury']),
    y_pred['kidney_healthy'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['kidney_low'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['kidney_high'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['liver_healthy'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['liver_low'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['liver_high'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['spleen_healthy'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
    y_pred['spleen_low'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
    y_pred['spleen_high'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
]

preds = [p.values for p in preds]

preds = np.array(preds).T

# rsna_loss and df_patient are only defined in the cells further down,
# so the comparison against rsna_loss is run there.

In [None]:
preds.shape

### Comparison with `rsna_loss`

In [None]:
cd ../src

In [None]:
import os
import sys
import glob
import torch
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from tqdm import tqdm
from sklearn.metrics import *

pd.set_option('display.width', 500)
pd.set_option('max_colwidth', 100)

In [None]:
from training.main import k_fold

from util.metrics import rsna_loss

from params import *
from data.dataset import *
from data.preparation import *
from data.transforms import get_transfos

from model_zoo.models import define_model
from training.losses import *

In [None]:
df_patient, df_img = prepare_data(DATA_PATH)

In [None]:
losses, avg_loss = rsna_loss(preds, df_patient)

for k, v in losses.items():
    print(f"- {k.split('_')[0][:8]} loss\t: {v:.3f}")

In [None]:
no_scale_score

### Scaling by Sample Weights

In [None]:
# Group by different sample weights
scale_by_2 = ['bowel_injury','kidney_low','liver_low','spleen_low']
scale_by_4 = ['kidney_high','liver_high','spleen_high']
scale_by_6 = ['extravasation_injury','any_injury']

# Scale factors based on described metric 
sf_2 = 2
sf_4 = 4
sf_6 = 6

# The score function deletes the ID column so we remake it
solution_train = create_training_solution(y_train)

# Reset the prediction
y_pred = y_train.copy()
y_pred[Injuries] = y_train[Injuries].mean().tolist()

# Scale each target 
y_pred[scale_by_2] *= sf_2
y_pred[scale_by_4] *= sf_4
y_pred[scale_by_6] *= sf_6

weight_scale_score = score(solution_train, y_pred, 'patient_id')
print(f'Training score with weight scaling: {weight_scale_score}')

We can do even better by increasing the scale factors. We will investigate one alternative option. The goal is to highlight that the most accurate prediction doesn't necessarily give the best (lowest) score!
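
To see why a calibrated prediction is not the best one under a weighted log loss, consider a single binary group in isolation. If the injury prevalence is $p$ and injured samples carry weight $w$ (healthy samples weight 1), the weighted log loss of a constant injury probability $q$ is minimized at $q^* = wp / (wp + 1 - p)$, not at $q = p$. The sketch below checks this numerically with a hypothetical prevalence; `weighted_log_loss` is an illustrative helper, not part of the competition metric, and it ignores the derived any_injury term.

```python
import numpy as np

def weighted_log_loss(q, p, w):
    """Weighted log loss of a constant injury probability q.

    p: injury prevalence, w: weight on injured samples (healthy weight is 1).
    """
    return -(w * p * np.log(q) + (1 - p) * np.log(1 - q)) / (w * p + (1 - p))

p, w = 0.05, 6.0  # hypothetical prevalence and weight

# Brute-force search over constant predictions.
q_grid = np.linspace(0.001, 0.999, 999)
q_best = q_grid[np.argmin(weighted_log_loss(q_grid, p, w))]

# Closed form from setting the derivative with respect to q to zero.
q_closed_form = w * p / (w * p + (1 - p))

print(q_best, q_closed_form)
print(weighted_log_loss(p, p, w), weighted_log_loss(q_closed_form, p, w))
```

The closed form is what you get by multiplying the injury mean by $w$ and renormalizing, which is why scaling by the sample weights improves each group's loss over the unscaled mean.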

In [None]:
# Update scale factors to improve score
# sf_2 = 2
# sf_4 = 4
sf_6 = 14

# The score function deletes the ID column so we remake it
solution_train = create_training_solution(y_train)

# Reset the prediction, again
y_pred = y_train.copy()
y_pred[Injuries] = y_train[Injuries].mean().tolist()

# Scale each target 
y_pred[scale_by_2] *= sf_2
y_pred[scale_by_4] *= sf_4
y_pred[scale_by_6] *= sf_6

improved_scale_score = score(solution_train,y_pred,'patient_id')
print(f'Training score with better scaling: {improved_scale_score}')


In [None]:
preds = [
    y_pred['bowel_injury'] / (y_pred['bowel_healthy'] +  y_pred['bowel_injury']),
    y_pred['extravasation_injury'] / (y_pred['extravasation_healthy'] +  y_pred['extravasation_injury']),
    y_pred['kidney_healthy'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['kidney_low'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['kidney_high'] / (y_pred['kidney_healthy'] +  y_pred['kidney_low']  +  y_pred['kidney_high']),
    y_pred['liver_healthy'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['liver_low'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['liver_high'] / (y_pred['liver_healthy'] +  y_pred['liver_low']  +  y_pred['liver_high']),
    y_pred['spleen_healthy'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
    y_pred['spleen_low'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
    y_pred['spleen_high'] / (y_pred['spleen_healthy'] +  y_pred['spleen_low']  +  y_pred['spleen_high']),
]

preds = [p.values for p in preds]

preds = np.array(preds).T

losses, avg_loss = rsna_loss(preds, df_patient)

for k, v in losses.items():
    print(f"- {k.split('_')[0][:8]} loss\t: {v:.3f}")

In [None]:
print(np.mean(improved_scale_score), avg_loss)

We will stick with this choice of scale factors, as it appears to be better. Note that although this appears to be near the optimal choice of scale factors for the mean, it is not necessarily the optimal constant solution, even for the training data. The optimal choice may give equal scaling to each injury that shares the same weight!
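
One reason the per-group optimum and the overall optimum can differ is visible in the score function above: the any_injury prediction is derived as the maximum of 1 - p(healthy) over the label groups, so raising one group's injury probability also moves the any_injury prediction. The sketch below illustrates this coupling with made-up prevalences; `injury_prob_after_scaling` is a hypothetical helper and none of the numbers come from `train.csv`.

```python
import numpy as np

def injury_prob_after_scaling(p_injury, scale):
    """Normalized injury probability after multiplying the injury mean by `scale`."""
    return scale * p_injury / (scale * p_injury + (1 - p_injury))

# Hypothetical normalized 1 - p(healthy) for the other four groups.
other_groups = np.array([0.02, 0.05, 0.08, 0.10])
p_extravasation = 0.06  # hypothetical extravasation prevalence

results = {}
for scale in (6, 14):
    extravasation = injury_prob_after_scaling(p_extravasation, scale)
    # Derived any_injury prediction: max of 1 - p(healthy) across all groups.
    results[scale] = max(other_groups.max(), extravasation)

print(results)
```

In this toy setup the extravasation group sets the any_injury prediction, so a larger scale factor trades a worse extravasation loss for a different any_injury loss.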

When producing proper predictions, a constant scale factor may not be the optimal choice!

# Submission

In [None]:
# Load submission template 
submission = pd.read_csv('/kaggle/input/rsna-2023-abdominal-trauma-detection/sample_submission.csv')

# Set output to mean of training data
submission[Injuries] = y_train[Injuries].mean().tolist()

# Scale each category by desired scale factor
submission[scale_by_2] *= sf_2
submission[scale_by_4] *= sf_4
submission[scale_by_6] *= sf_6

# Save Submission!
submission.to_csv('submission.csv', index=False)