## Fill cache

This notebook extracts the results of the transfer experiments (which must be stored according to the `MML` configuration) and computes the task distances from the features that were extracted and placed inside the `data` folder. The extracted results and computed distances are written to the `cache` folder, which is shared with this repository. All subsequent transferability evaluations rely solely on the `cache`. The extraction process also aggregates the GPU time.

In [1]:
import mml.interactive
from pathlib import Path
mml.interactive.init(Path('~/.config/mml.env').expanduser())

from mml_tf.experiments import load_arch_experiment, load_augmentation_experiment, load_baseline_experiment, \
    load_multi_task_experiment, load_pretrain_experiment, GPU_TIME, METRICS, EXPERIMENTS
from mml_tf.aggregate import AggregateStrategy
import copy
from rich.progress import track
import numpy as np

 _____ ______   _____ ______   ___
|\   _ \  _   \|\   _ \  _   \|\  \
\ \  \\\__\ \  \ \  \\\__\ \  \ \  \
 \ \  \\|__| \  \ \  \\|__| \  \ \  \
  \ \  \    \ \  \ \  \    \ \  \ \  \____
   \ \__\    \ \__\ \__\    \ \__\ \_______\
    \|__|     \|__|\|__|     \|__|\|_______|
         ____  _  _    __  _  _  ____  _  _
        (  _ \( \/ )  (  )( \/ )/ ___)( \/ )
         ) _ ( )  /    )( / \/ \\___ \ )  /
        (____/(__/    (__)\_)(_/(____/(__/
Interactive MML API initialized.


## Fill experiments cache

In [None]:
# baselines 
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_baseline_experiment(metric=metric, shrunk=shrunk, validation=validation)  

In [None]:
# exp 1
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_arch_experiment(metric=metric, shrunk=shrunk, validation=validation)  

The arch reports document our claim that each architecture is preferable in at least some cases. The first report corresponds to the `BA` validation on shrunk targets as in our paper:

### BA
{'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7, 'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24, 'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11, 'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8, 'ese_vovnet39b': 5, 'resnest50d': 12, 'hrnet_w18': 12, 'skresnet34': 3, 'mobilenetv3_large_100': 3, 'res2net50_26w_4s': 11}

The same holds for the first `AUROC` report:

### AUROC
{'tf_efficientnet_b2': 9, 'tf_efficientnet_b2_ap': 7, 'tf_efficientnet_b2_ns': 12, 'tf_efficientnet_cc_b0_4e': 4, 'swsl_resnet50': 9, 'ssl_resnext50_32x4d': 23, 'regnetx_032': 19, 'regnety_032': 25, 'rexnet_100': 7, 'ecaresnet50d': 10, 'cspdarknet53': 14, 'mixnet_l': 12, 'cspresnext50': 9, 'cspresnet50': 2, 'ese_vovnet39b': 5, 'resnest50d': 20, 'hrnet_w18': 6, 'skresnet34': 2, 'mobilenetv3_large_100': 10, 'res2net50_26w_4s': 8}
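
The claim can be checked directly on a report. The following is a minimal sketch, assuming each value counts the cases in which the respective architecture came out ahead. The dictionary is copied from the `BA` report; the variable names are illustrative and not part of `mml_tf`.

```python
# BA report copied verbatim from the notebook output
ba_report = {
    'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7,
    'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24,
    'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11,
    'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8,
    'ese_vovnet39b': 5, 'resnest50d': 12, 'hrnet_w18': 12, 'skresnet34': 3,
    'mobilenetv3_large_100': 3, 'res2net50_26w_4s': 11,
}

# every architecture appears with a non-zero count
every_arch_counts = all(count > 0 for count in ba_report.values())

# architecture with the highest count
top_arch = max(ba_report, key=ba_report.get)
print(every_arch_counts, top_arch, ba_report[top_arch])
```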

In [None]:
# exp 2
for metric in METRICS:
    for validation in [True, False]:
        load_pretrain_experiment(metric=metric, validation=validation)

In [None]:
# exp 3
for metric in METRICS:
    for validation in [True, False]:
        load_augmentation_experiment(metric=metric, validation=validation)

In [None]:
# exp 4
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_multi_task_experiment(metric=metric, shrunk=shrunk, validation=validation)

In [None]:
# 10673 total GPU hours for the development and validation phase
print(sum(GPU_TIME.values()) / 3600)

## Fill distances cache

In [2]:
from mml_tf.representations import FullFeatureRepresentations, AveragedFeatureRepresentations, \
    MeanAndCovarianceRepresentations, TagBasedRepresentations, \
    FisherEmbeddingRepresentations, BinnedFeatureRepresentations

In [3]:
# full and averaged representations
full_rep = FullFeatureRepresentations()
full_rep.load_representations()
avg_rep = AveragedFeatureRepresentations(full_features=full_rep)
avg_rep.load_representations()

In [4]:
# further standard representations
mean_cov_rep = MeanAndCovarianceRepresentations(full_features=full_rep)
mean_cov_rep.load_representations()
tag_rep = TagBasedRepresentations()
tag_rep.load_representations()
few_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=100)
few_bins_rep.load_representations()
lot_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=1000)
lot_bins_rep.load_representations()
tiny_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=5, min_q=0., max_q=0.9)
tiny_bins_rep.load_representations()
fisher_rep = FisherEmbeddingRepresentations()
fisher_rep.load_representations()

Output()

In [5]:
from mml_tf.distances import SemanticDistances, EMDDistances, KLDDistances, JSDistances, COSDistances, LNormDistances, \
    FIDDistances, LogDistances, ExpDistances, MMDDistances, GenericFEDDistances, OptimalDistances

In [6]:
# this list collects the names of all task distances that are optimized on the development split of tasks
# (`tf/mml_tf/variants.py` stores the same list for later reuse)
all_variants = []

In [8]:
# calc manual baseline
all_variants.append(SemanticDistances(representations=tag_rep).name)

Output()

In [9]:
# calc various variants for Kullback-Leibler divergences
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo']:
            for s_pp in ['norm', 'soft']:
                for t_pp in ['norm', 'soft']:
                    for inverted in [True, False]:
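                        # clip only for binned representations, when 'norm' pre-processing applies to the
                        # source (not inverted) or to the target (inverted)
                        clip = isinstance(rep, BinnedFeatureRepresentations) and (
                            (s_pp == 'norm' and not inverted) or (inverted and t_pp == 'norm'))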
                        _ = KLDDistances(representations=rep, source_pp=s_pp, target_pp=t_pp, invert=inverted, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp, clip=clip)
                        print(f'done {_.name}')
                        all_variants.append(_.name)

done KLD-I-PP:NN-W:SN
done KLD-PP:NN-W:SN
done KLD-I-PP:SN-W:SN
done KLD-PP:SN-W:SN
done KLD-I-PP:NS-W:SN
done KLD-PP:NS-W:SN
done KLD-I-PP:SS-W:SN
done KLD-PP:SS-W:SN
done KLD-I-PP:NN-W:SS
done KLD-PP:NN-W:SS
done KLD-I-PP:SN-W:SS
done KLD-PP:SN-W:SS
done KLD-I-PP:NS-W:SS
done KLD-PP:NS-W:SS
done KLD-I-PP:SS-W:SS
done KLD-PP:SS-W:SS
done KLD-I-PP:NN-W:SW
done KLD-PP:NN-W:SW
done KLD-I-PP:SN-W:SW
done KLD-PP:SN-W:SW
done KLD-I-PP:NS-W:SW
done KLD-PP:NS-W:SW
done KLD-I-PP:SS-W:SW
done KLD-PP:SS-W:SW
done KLD-C-I-PP:NN-W:SN-100-BINS
done KLD-C-PP:NN-W:SN-100-BINS
done KLD-I-PP:SN-W:SN-100-BINS
done KLD-C-PP:SN-W:SN-100-BINS
done KLD-C-I-PP:NS-W:SN-100-BINS
done KLD-PP:NS-W:SN-100-BINS
done KLD-I-PP:SS-W:SN-100-BINS
done KLD-PP:SS-W:SN-100-BINS
done KLD-C-I-PP:NN-W:SS-100-BINS
done KLD-C-PP:NN-W:SS-100-BINS
done KLD-I-PP:SN-W:SS-100-BINS
done KLD-C-PP:SN-W:SS-100-BINS
done KLD-C-I-PP:NS-W:SS-100-BINS
done KLD-PP:NS-W:SS-100-BINS
done KLD-I-PP:SS-W:SS-100-BINS
done KLD-PP:SS-W:SS-100-BINS
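
A plausible reason for the `clip` flag (an assumption, since the implementation of `KLDDistances` is not shown in this notebook) is that histograms normalised by plain division can contain empty bins, which make the KL divergence infinite. The sketch below illustrates this with a generic discrete KL divergence. The helper names and the `clip_eps` parameter are hypothetical, and the clipped denominator is not renormalised.

```python
import math

def normalize(values):
    """Scale non-negative values so they sum to one ('norm'-style pre-processing)."""
    total = sum(values)
    return [v / total for v in values]

def softmax(values):
    """Softmax pre-processing: strictly positive, so no bin is ever empty."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, clip_eps=None):
    """Discrete KL(p || q); optionally clip q from below to avoid division by zero."""
    if clip_eps is not None:
        q = [max(x, clip_eps) for x in q]
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # 0 * log(0 / q) is treated as 0
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total

p = normalize([4, 0, 6])
q = normalize([5, 5, 0])  # last bin is empty
print(kl_divergence(p, q))                 # infinite without clipping
print(kl_divergence(p, q, clip_eps=1e-6))  # finite with clipping
```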


In [10]:
# plus some additional ones that use symmetric uniform smoothing 
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo', 'uniform']:
            for alpha in [0.1, 0.01, 0.001]:
                _ = KLDDistances(representations=rep, source_pp='uniform', target_pp='uniform', weighing_by=w_by, alpha=alpha, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)

done KLD-PP:UU-A:0.1-W:SN
done KLD-PP:UU-A:0.01-W:SN
done KLD-PP:UU-A:0.001-W:SN
done KLD-PP:UU-A:0.1-W:SS
done KLD-PP:UU-A:0.01-W:SS
done KLD-PP:UU-A:0.001-W:SS
done KLD-PP:UU-A:0.1-W:SW
done KLD-PP:UU-A:0.01-W:SW
done KLD-PP:UU-A:0.001-W:SW
done KLD-PP:UU-A:0.1-W:SU
done KLD-PP:UU-A:0.01-W:SU
done KLD-PP:UU-A:0.001-W:SU
done KLD-PP:UU-A:0.1-W:SN-100-BINS
done KLD-PP:UU-A:0.01-W:SN-100-BINS
done KLD-PP:UU-A:0.001-W:SN-100-BINS
done KLD-PP:UU-A:0.1-W:SS-100-BINS
done KLD-PP:UU-A:0.01-W:SS-100-BINS
done KLD-PP:UU-A:0.001-W:SS-100-BINS
done KLD-PP:UU-A:0.1-W:SW-100-BINS
done KLD-PP:UU-A:0.01-W:SW-100-BINS
done KLD-PP:UU-A:0.001-W:SW-100-BINS
done KLD-PP:UU-A:0.1-W:SU-100-BINS
done KLD-PP:UU-A:0.01-W:SU-100-BINS
done KLD-PP:UU-A:0.001-W:SU-100-BINS
done KLD-PP:UU-A:0.1-W:SN-1000-BINS
done KLD-PP:UU-A:0.01-W:SN-1000-BINS
done KLD-PP:UU-A:0.001-W:SN-1000-BINS
done KLD-PP:UU-A:0.1-W:SS-1000-BINS
done KLD-PP:UU-A:0.01-W:SS-1000-BINS
done KLD-PP:UU-A:0.001-W:SS-1000-BINS
done KLD-PP:UU-A:0.1-W
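
Uniform smoothing is commonly defined as mixing a distribution with the uniform distribution. The sketch below assumes that form, with `alpha` as the mixing weight; whether `KLDDistances` uses exactly this formula is not visible here, and `uniform_smooth` is a hypothetical helper.

```python
def uniform_smooth(dist, alpha):
    """Mix a discrete distribution with the uniform one: (1 - alpha) * p + alpha / n."""
    n = len(dist)
    return [(1 - alpha) * p + alpha / n for p in dist]

sparse = [1.0, 0.0, 0.0]
for alpha in (0.1, 0.01, 0.001):
    # every bin becomes strictly positive while the total mass stays one
    print(alpha, uniform_smooth(sparse, alpha))
```

Smaller values of `alpha` stay closer to the original histogram but leave less mass in formerly empty bins.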

In [11]:
# calc various Jensen–Shannon divergence variants
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo', 'uniform']:
            for alpha in [0.1, 0.01, 0.001]:
                _ = JSDistances(representations=rep, weighing_by=w_by, alpha=alpha, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)

done JS-PP:NN-W:SN
done JS-PP:NN-W:SN
done JS-PP:NN-W:SN
done JS-PP:NN-W:SS
done JS-PP:NN-W:SS
done JS-PP:NN-W:SS
done JS-PP:NN-W:SW
done JS-PP:NN-W:SW
done JS-PP:NN-W:SW
done JS-PP:NN-W:SU
done JS-PP:NN-W:SU
done JS-PP:NN-W:SU
done JS-PP:NN-W:SN-100-BINS
done JS-PP:NN-W:SN-100-BINS
done JS-PP:NN-W:SN-100-BINS
done JS-PP:NN-W:SS-100-BINS
done JS-PP:NN-W:SS-100-BINS
done JS-PP:NN-W:SS-100-BINS
done JS-PP:NN-W:SW-100-BINS
done JS-PP:NN-W:SW-100-BINS
done JS-PP:NN-W:SW-100-BINS
done JS-PP:NN-W:SU-100-BINS
done JS-PP:NN-W:SU-100-BINS
done JS-PP:NN-W:SU-100-BINS
done JS-PP:NN-W:SN-1000-BINS
done JS-PP:NN-W:SN-1000-BINS
done JS-PP:NN-W:SN-1000-BINS
done JS-PP:NN-W:SS-1000-BINS
done JS-PP:NN-W:SS-1000-BINS
done JS-PP:NN-W:SS-1000-BINS
done JS-PP:NN-W:SW-1000-BINS
done JS-PP:NN-W:SW-1000-BINS
done JS-PP:NN-W:SW-1000-BINS
done JS-PP:NN-W:SU-1000-BINS
done JS-PP:NN-W:SU-1000-BINS
done JS-PP:NN-W:SU-1000-BINS
done JS-PP:NN-W:SN-5-BINS
done JS-PP:NN-W:SN-5-BINS
done JS-PP:NN-W:SN-5-BINS
done JS-PP
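
For reference, the Jensen–Shannon divergence averages the KL divergences of both distributions to their midpoint. This is a generic sketch of the textbook definition, not the `JSDistances` implementation; the function names are illustrative.

```python
import math

def kl(p, q):
    """Discrete KL(p || q), skipping bins where p is zero."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def js_divergence(p, q):
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m) with m = (p + q) / 2."""
    m = [(p_i + q_i) / 2 for p_i, q_i in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

a = [0.7, 0.2, 0.1]
b = [0.1, 0.3, 0.6]
print(js_divergence(a, b), js_divergence(b, a))  # symmetric
print(js_divergence([1.0, 0.0], [0.0, 1.0]))     # upper bound ln(2) for disjoint support
```

Unlike the KL divergence, this quantity is symmetric and stays finite when bins are empty.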

In [12]:
# calc Earth-Mover's distances
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft']:
            for do_soft in [True, False]:
                _ = EMDDistances(representations=rep, soft_features=do_soft, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)

done EMD-AVG-PP:SS-W:SN
done EMD-AVG-PP:NN-W:SN
done EMD-AVG-PP:SS-W:SS
done EMD-AVG-PP:NN-W:SS
done VDNA-PP:SS-W:SN-100-BINS
done VDNA-PP:NN-W:SN-100-BINS
done VDNA-PP:SS-W:SS-100-BINS
done VDNA-PP:NN-W:SS-100-BINS
done VDNA-PP:SS-W:SN-1000-BINS
done VDNA-PP:NN-W:SN-1000-BINS
done VDNA-PP:SS-W:SS-1000-BINS
done VDNA-PP:NN-W:SS-1000-BINS
done VDNA-PP:SS-W:SN-5-BINS
done VDNA-PP:NN-W:SN-5-BINS
done VDNA-PP:SS-W:SS-5-BINS
done VDNA-PP:NN-W:SS-5-BINS
done EMD-AVG-PP:SS-W:TN
done EMD-AVG-PP:NN-W:TN
done EMD-AVG-PP:SS-W:TS
done EMD-AVG-PP:NN-W:TS
done VDNA-PP:SS-W:TN-100-BINS
done VDNA-PP:NN-W:TN-100-BINS
done VDNA-PP:SS-W:TS-100-BINS
done VDNA-PP:NN-W:TS-100-BINS
done VDNA-PP:SS-W:TN-1000-BINS
done VDNA-PP:NN-W:TN-1000-BINS
done VDNA-PP:SS-W:TS-1000-BINS
done VDNA-PP:NN-W:TS-1000-BINS
done VDNA-PP:SS-W:TN-5-BINS
done VDNA-PP:NN-W:TN-5-BINS
done VDNA-PP:SS-W:TS-5-BINS
done VDNA-PP:NN-W:TS-5-BINS
done EMD-AVG-PP:SS-W:BN
done EMD-AVG-PP:NN-W:BN
done EMD-AVG-PP:SS-W:BS
done EMD-AVG-PP:NN-W:BS
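
For two histograms over the same ordered bins, the earth mover's distance reduces to the summed absolute difference of the cumulative distributions. The sketch below assumes equal total mass and unit spacing between neighbouring bins; it is a generic illustration and makes no claim about how `EMDDistances` computes its values.

```python
def emd_1d(p, q):
    """Earth mover's distance between two histograms on the same ordered bins."""
    cum_p = cum_q = 0.0
    total = 0.0
    for p_i, q_i in zip(p, q):
        cum_p += p_i
        cum_q += q_i
        # mass that still has to be moved past this bin boundary
        total += abs(cum_p - cum_q)
    return total

print(emd_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # all mass moves two bins
print(emd_1d([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))  # all mass moves one bin
```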


In [13]:
# calc Cosine Similarity Distances
for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
    for do_soft in [True, False]:
        _ = COSDistances(representations=rep, soft_features=do_soft)
        print(f'done {_.name}')
        all_variants.append(_.name)

done COS-PP:SS
done COS-PP:NN
done COS-PP:SS-100-BINS
done COS-PP:NN-100-BINS
done COS-PP:SS-1000-BINS
done COS-PP:NN-1000-BINS
done COS-PP:SS-5-BINS
done COS-PP:NN-5-BINS
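
A cosine-based distance is usually taken as one minus the cosine similarity. The sketch below assumes that convention (the exact definition inside `COSDistances` is not shown in this notebook) and uses a hypothetical helper name.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))  # parallel vectors, scale is ignored
```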


In [14]:
# calc distances based on L-Norm
for w_by in ['source', 'target', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft']:
            for do_soft in [True, False]:
                for p in range(1, 4):
                    _ = LNormDistances(representations=rep, p=p, soft_features=do_soft, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp)
                    print(f'done {_.name}')
                    all_variants.append(_.name)

done L-1-NORM-W:SN
done L-2-NORM-W:SN
done L-3-NORM-W:SN
done L-1-NORM-W:SN
done L-2-NORM-W:SN
done L-3-NORM-W:SN
done L-1-NORM-W:SS
done L-2-NORM-W:SS
done L-3-NORM-W:SS
done L-1-NORM-W:SS
done L-2-NORM-W:SS
done L-3-NORM-W:SS
done L-1-NORM-PP:SS-W:SN-100-BINS
done L-2-NORM-PP:SS-W:SN-100-BINS
done L-3-NORM-PP:SS-W:SN-100-BINS
done L-1-NORM-PP:NN-W:SN-100-BINS
done L-2-NORM-PP:NN-W:SN-100-BINS
done L-3-NORM-PP:NN-W:SN-100-BINS
done L-1-NORM-PP:SS-W:SS-100-BINS
done L-2-NORM-PP:SS-W:SS-100-BINS
done L-3-NORM-PP:SS-W:SS-100-BINS
done L-1-NORM-PP:NN-W:SS-100-BINS
done L-2-NORM-PP:NN-W:SS-100-BINS
done L-3-NORM-PP:NN-W:SS-100-BINS
done L-1-NORM-PP:SS-W:SN-1000-BINS
done L-2-NORM-PP:SS-W:SN-1000-BINS
done L-3-NORM-PP:SS-W:SN-1000-BINS
done L-1-NORM-PP:NN-W:SN-1000-BINS
done L-2-NORM-PP:NN-W:SN-1000-BINS
done L-3-NORM-PP:NN-W:SN-1000-BINS
done L-1-NORM-PP:SS-W:SS-1000-BINS
done L-2-NORM-PP:SS-W:SS-1000-BINS
done L-3-NORM-PP:SS-W:SS-1000-BINS
done L-1-NORM-PP:NN-W:SS-1000-BINS
done L-2-NORM-
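
The loop above varies `p` from 1 to 3. As a reminder of what the parameter controls, here is a generic L-p distance between two feature vectors; this is the textbook definition under a hypothetical name, not the `LNormDistances` code.

```python
def lp_distance(a, b, p):
    """L-p distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

origin, point = [0.0, 0.0], [3.0, 4.0]
for p in range(1, 4):
    # larger p puts more emphasis on the largest coordinate difference
    print(p, lp_distance(origin, point, p))
```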

In [15]:
# calc distances in exp and log space
for s_w in range(5):
    for t_w in range(5):
        x = LogDistances(representations=avg_rep, w_t=t_w, w_s=s_w)
        print(f'done {x.name}')
        y = ExpDistances(representations=avg_rep, w_t=t_w, w_s=s_w)
        print(f'done {y.name}')
        all_variants.extend([x.name, y.name])

done LOG-S:0-T:0
done EXP-S:0-T:0
done LOG-S:0-T:1
done EXP-S:0-T:1
done LOG-S:0-T:2
done EXP-S:0-T:2
done LOG-S:0-T:3
done EXP-S:0-T:3
done LOG-S:0-T:4
done EXP-S:0-T:4
done LOG-S:1-T:0
done EXP-S:1-T:0
done LOG-S:1-T:1
done EXP-S:1-T:1
done LOG-S:1-T:2
done EXP-S:1-T:2
done LOG-S:1-T:3
done EXP-S:1-T:3
done LOG-S:1-T:4
done EXP-S:1-T:4
done LOG-S:2-T:0
done EXP-S:2-T:0
done LOG-S:2-T:1
done EXP-S:2-T:1
done LOG-S:2-T:2
done EXP-S:2-T:2
done LOG-S:2-T:3
done EXP-S:2-T:3
done LOG-S:2-T:4
done EXP-S:2-T:4
done LOG-S:3-T:0
done EXP-S:3-T:0
done LOG-S:3-T:1
done EXP-S:3-T:1
done LOG-S:3-T:2
done EXP-S:3-T:2
done LOG-S:3-T:3
done EXP-S:3-T:3
done LOG-S:3-T:4
done EXP-S:3-T:4
done LOG-S:4-T:0
done EXP-S:4-T:0
done LOG-S:4-T:1
done EXP-S:4-T:1
done LOG-S:4-T:2
done EXP-S:4-T:2
done LOG-S:4-T:3
done EXP-S:4-T:3
done LOG-S:4-T:4
done EXP-S:4-T:4


In [7]:
_ = MMDDistances(representations=full_rep, kernel='geo-sinkhorn', blur=0.01) # 2 hours 16 minutes

Output()

In [17]:
# maximum mean discrepancy
for kernel in [
    # 'cauchy', 'gaussian',
    'coral',
    # 'geo-energy', 'geo-gaussian', 'geo-laplacian',
    'geo-sinkhorn', ]:
    for blur in [0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3]:
        _ = MMDDistances(representations=full_rep, kernel=kernel, blur=blur)
        all_variants.append(_.name)
        print(_.name)

Output()

MMD-coral-0.01


Output()

MMD-coral-0.05


Output()

MMD-coral-0.1


Output()

MMD-coral-0.15


Output()

MMD-coral-0.2


Output()

MMD-coral-0.25


Output()

MMD-coral-0.3


Output()

MMD-geo-sinkhorn-0.01


Output()

MMD-geo-sinkhorn-0.05


Output()

MMD-geo-sinkhorn-0.1


Output()

MMD-geo-sinkhorn-0.15


Output()

KeyboardInterrupt: 

In [16]:
# calc fid distance
all_variants.append(FIDDistances(representations=mean_cov_rep).name)

In [None]:
from mml_tf.variants import variants as stored_variants
assert sorted(list(set(all_variants))) == stored_variants

In [None]:
print(all_variants)

## Additional baselines

In [None]:
all_variants

In [None]:
# Step 1: Run some feature extractions -> can be large models but feature size must be reasonable

# Step 2: gather some generalist foundation models -> in addition to the previous models

# Step 3: extract features from MSD -> evaluate on actual outcomes

# Step 4: 


# extract / run with eva_giant_patch14_224.clip_ft_in1k (LoRA? since very LARGE) -> right size
# see https://github.com/huggingface/pytorch-image-models/blob/main/results/results-imagenet.csv
# Some models below 100M params
# eva02_base_patch14_448.mim_in22k_ft_in22k_in1k (only 87M params!!! while ranked 13!)
# convnextv2_base.fcmae_ft_in22k_in1k_384
# caformer_b36.sail_in22k_ft_in1k_384
# vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k
# coatnet_rmlp_2_rw_384.sw_in12k_ft_in1k
# swinv2_base_window12to24_192to384.ms_in22k_ft_in1k
# convformer_b36.sail_in22k_ft_in1k

# vit_base_patch14_dinov2.lvd142m 
# timm/vit_base_patch16_clip_224.openai
# timm/vit_base_patch16_224.mae
# timm/samvit_huge_patch16.sa1b (huge) -> not yet available

In [None]:
# fisher embedding distance
layers = list(fisher_rep.mapping['lapgyn4_surgical_actions'].keys())
layers = layers[int(0.4*len(layers)):]
_ = GenericFEDDistances(representations=fisher_rep, layers=layers, name='FED')

In [None]:
# "Optimal" distances are a tool to mimic the actually measured performances, these distances may be used for analysis purposes
optimal_variants = []
for exp in EXPERIMENTS:
    for agg in [AggregateStrategy.FIRST, AggregateStrategy.SECOND, AggregateStrategy.THIRD]:
        for metric in METRICS:
            x = OptimalDistances(metric=metric, agg=agg, exp=exp)
            optimal_variants.append(x.name)

In [None]:
from mml_tf.variants import optimal_variants as stored_optimal_variants
assert optimal_variants == stored_optimal_variants

## Vary bins experiments

In [16]:
# additional variants of the three chosen bKLD variants, varying the number of bins
bin_range = [10, 25, 50, 75, 100, 250, 500, 750, 1000]
binned_reps = []
for n_bins in bin_range:
    tmp = BinnedFeatureRepresentations(n_bins=n_bins, full_features=full_rep)
    tmp.load_representations()
    binned_reps.append(tmp)

In [17]:
for rep in binned_reps:
    _ = KLDDistances(representations=rep, weighing_by='target', weights_pp='soft', weights_rep=avg_rep)
    _ = KLDDistances(representations=rep, weighing_by=None)
    _ = KLDDistances(representations=rep, weighing_by='source', weights_pp='norm', weights_rep=avg_rep)
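
How `BinnedFeatureRepresentations` turns features into fingerprints is not shown in this notebook. The sketch below shows one way to bin a single feature dimension, assuming values are clipped to fixed lower and upper bounds and counted in equal-width bins; the function and its parameters are hypothetical.

```python
def bin_feature(values, lower, upper, n_bins):
    """Relative frequencies of values in n_bins equal-width bins on [lower, upper]."""
    counts = [0] * n_bins
    width = (upper - lower) / n_bins
    for v in values:
        clipped = min(max(v, lower), upper)  # out-of-range values land in the edge bins
        idx = min(int((clipped - lower) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

activations = [0.0, 0.1, 0.5, 0.9, 2.0]
fingerprint = bin_feature(activations, lower=0.0, upper=1.0, n_bins=2)
print(fingerprint)
```

With few samples and many bins most entries of such a histogram are zero, which is one reason the number of bins is worth varying.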

## Vary sample size experiments

In [21]:
print(full_rep.mapping['lapgyn4_anatomical_actions'].dtype)

float32


In [18]:
# additional experiments varying the number of samples used to determine bKLD fingerprints
n_reps = 10
for n_samples in track([10, 100, 1000]):
    for rep_idx in range(n_reps):
        tmp_rep = copy.deepcopy(full_rep)
        idxs = np.random.randint(tmp_rep.n_samples, size=n_samples)
        # adapt samples
        tmp_rep.mapping = {k: v[idxs, :] for k, v in tmp_rep.mapping.items()}
        # avg for weighing
        tmp_avg_rep = AveragedFeatureRepresentations(full_features=tmp_rep)
        tmp_avg_rep.load_representations()
        # bin to fingerprint (small and large)
        rep_small = BinnedFeatureRepresentations(full_features=tmp_rep, n_bins=100)
        rep_small.load_representations()
        rep_large = BinnedFeatureRepresentations(full_features=tmp_rep, n_bins=1000)
        rep_large.load_representations()
        _ = KLDDistances(representations=rep_small, weights_rep=tmp_avg_rep, weights_pp='soft', weighing_by='target', seed=rep_idx)
        _ = KLDDistances(representations=rep_large, weights_rep=tmp_avg_rep, weights_pp='norm', weighing_by='source', seed=rep_idx)
        _ = KLDDistances(representations=rep_large, weighing_by=None, seed=rep_idx)

Output()
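
Note that `np.random.randint` draws indices with replacement, and the cell above does not seed the draw itself. Should reproducible or duplicate-free subsets be wanted, a seeded draw could look like the following sketch; it uses the standard library and a hypothetical helper, and is not part of the original experiment.

```python
import random

def draw_indices(n_total, n_samples, seed, replace=False):
    """Draw sample indices reproducibly, with or without replacement."""
    rng = random.Random(seed)  # local generator, independent of global state
    if replace:
        return [rng.randrange(n_total) for _ in range(n_samples)]
    return rng.sample(range(n_total), n_samples)

first = draw_indices(n_total=1000, n_samples=10, seed=0)
second = draw_indices(n_total=1000, n_samples=10, seed=0)
print(first == second)          # same seed, same subset
print(len(set(first)) == 10)    # no duplicates without replacement
```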

In [19]:
few_bins_rep.scaling_subset

('lapgyn4_anatomical_structures',
 'lapgyn4_surgical_actions',
 'lapgyn4_instrument_count',
 'lapgyn4_anatomical_actions',
 'sklin2_skin_lesions',
 'identify_nbi_infframes',
 'laryngeal_tissues',
 'nerthus_bowel_cleansing_quality',
 'stanford_dogs_image_categorization',
 'svhn',
 'caltech101_object_classification',
 'caltech256_object_classification',
 'cifar10_object_classification',
 'cifar100_object_classification',
 'mnist_digit_classification',
 'emnist_digit_classification',
 'hyperkvasir_anatomical-landmarks',
 'hyperkvasir_pathological-findings',
 'hyperkvasir_quality-of-mucosal-views',
 'hyperkvasir_therapeutic-interventions',
 'cholec80_grasper_presence',
 'cholec80_bipolar_presence',
 'cholec80_hook_presence',
 'cholec80_scissors_presence',
 'cholec80_clipper_presence',
 'cholec80_irrigator_presence',
 'cholec80_specimenbag_presence',
 'derm7pt_skin_lesions')

In [20]:
stacked = np.concatenate(tuple([few_bins_rep.full_features.mapping[t] for t in few_bins_rep.scaling_subset]), axis=0).astype(np.float32)
feature_mins = np.quantile(stacked, few_bins_rep.min_q, axis=0)
feature_maxs = np.quantile(stacked, few_bins_rep.max_q, axis=0)

In [21]:
stacked = np.concatenate(tuple([few_bins_rep.full_features.mapping[t] for t in few_bins_rep.full_features.task_list]), axis=0).astype(np.float32)
new_feature_mins = np.quantile(stacked, few_bins_rep.min_q, axis=0)
new_feature_maxs = np.quantile(stacked, few_bins_rep.max_q, axis=0)

In [22]:
feature_mins

array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       5.16104817e-01, 0.00000000e+00, 6.70333160e-04, 0.00000000e+00,
       2.84840027e-03, 0.00000000e+00, 0.00000000e+00, 8.03377200e-03,
       4.64935973e-03, 0.00000000e+00, 9.44069587e-04, 0.00000000e+00,
       0.00000000e+00, 1.45659782e-02, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.04821183e-03,
       0.00000000e+00, 5.37513639e-04, 0.00000000e+00, 6.19232771e-04,
       8.67359398e-04, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       1.86222550e-02, 1.32665376e-03, 0.00000000e+00, 0.00000000e+00,
       1.14343995e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 4.00600769e-02, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 1.20156627e-04, 0.00000000e+00, 0.00000000e+00,
      

In [28]:
np.abs(feature_mins - new_feature_mins).max()

np.float32(0.3554415)

In [29]:
np.abs(feature_maxs - new_feature_maxs).max()


np.float32(1.0566828)
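
The comparison in the cells above can be reproduced on toy data. The sketch below computes per-feature quantiles once on a subset of rows and once on all rows, then reports the largest absolute deviation. It re-implements linear-interpolation quantiles with the standard library; the data and helper names are made up for illustration.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] * (1 - frac) + ordered[high] * frac

def column_quantiles(rows, q):
    """Apply the quantile to every feature column of a list of rows."""
    return [quantile(column, q) for column in zip(*rows)]

subset_rows = [[0.0, 1.0], [1.0, 3.0]]    # stands in for the scaling subset
all_rows = subset_rows + [[2.0, 5.0]]     # stands in for all tasks

subset_q = column_quantiles(subset_rows, 0.5)
all_q = column_quantiles(all_rows, 0.5)
max_shift = max(abs(a - b) for a, b in zip(subset_q, all_q))
print(subset_q, all_q, max_shift)
```

A large shift indicates that bin boundaries derived from the subset would differ noticeably from boundaries derived from all tasks.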

In [24]:
feature_maxs

array([0.34928352, 0.31342295, 0.4442475 , 0.34801778, 1.9399432 ,
       0.53592646, 0.6404944 , 0.5040042 , 0.7698236 , 0.4738799 ,
       0.42575854, 0.5550468 , 0.5506701 , 0.35209334, 0.5044099 ,
       0.3286378 , 0.35257107, 0.9819334 , 0.3094721 , 0.5031402 ,
       0.60661715, 0.44955945, 0.2222653 , 0.5652985 , 0.49303567,
       0.36309928, 0.29518852, 0.3377321 , 0.4401544 , 0.68422866,
       0.36868328, 0.71486306, 0.25352913, 0.376788  , 0.57925856,
       0.75328225, 0.57032746, 0.22904955, 0.34015948, 0.47355694,
       0.7890268 , 0.5814415 , 0.16312598, 1.3236686 , 1.0221547 ,
       0.28524545, 0.49005127, 0.47525266, 0.40907976, 0.8195322 ,
       0.5765664 , 0.26549217, 0.5249207 , 0.36577752, 0.27592108,
       0.26816908, 0.20098042, 0.25538793, 0.17523813, 0.42489633,
       0.5121824 , 0.7939044 , 0.27091637, 0.7855611 , 0.2727895 ,
       0.27197534, 0.31332272, 0.600336  , 0.27080312, 1.4137048 ,
       0.30912125, 0.5752633 , 0.25592342, 0.35527778, 0.24733

In [25]:
new_feature_maxs

array([0.45085442, 0.29080346, 0.369824  , 0.36805952, 1.7226319 ,
       0.65952605, 0.52518636, 0.55051416, 0.67477036, 0.38440245,
       0.50661975, 0.7536803 , 0.41824397, 0.32377678, 0.42799878,
       0.30669594, 0.46407917, 0.8983557 , 0.27533242, 0.47823116,
       0.6139464 , 0.43535823, 0.16942774, 0.31355628, 0.5273563 ,
       0.27236366, 0.30069914, 0.43845907, 0.45056626, 0.67013115,
       0.45735955, 0.6885409 , 0.25671822, 0.3208137 , 0.66106004,
       1.0911969 , 0.72775495, 0.174128  , 0.3650191 , 0.74411094,
       0.74029386, 0.77899814, 0.16573447, 1.230354  , 0.9337574 ,
       0.18442395, 0.6489288 , 0.61535203, 0.33557054, 0.71315634,
       0.6219101 , 0.26290676, 0.47859463, 0.32894546, 0.23321287,
       0.26419967, 0.20025693, 0.39638934, 0.3139131 , 0.44252098,
       0.44791394, 0.78499335, 0.39528522, 0.78710717, 0.28895462,
       0.33557945, 0.23340063, 0.5875666 , 0.21299867, 1.4244397 ,
       0.35350212, 0.46876198, 0.30334064, 0.43133444, 0.20769