## Fill cache

This notebook extracts the results of the transfer experiments (which must be stored according to the `MML` configuration) and computes the task distances from the extracted features placed inside the `data` folder. The extracted results and computed distances are written to the `cache` folder, which is shared with this repository. All subsequent transferability evaluations rely solely on the `cache`. The extraction process also aggregates the GPU time.

In [1]:
import mml.interactive
mml.interactive.init()

from mml_tf.experiments import load_arch_experiment, load_augmentation_experiment, load_baseline_experiment, \
    load_multi_task_experiment, load_pretrain_experiment, GPU_TIME, METRICS, EXPERIMENTS
from mml_tf.aggregate import AggregateStrategy
import copy
from rich.progress import track
import numpy as np



 _____ ______   _____ ______   ___
|\   _ \  _   \|\   _ \  _   \|\  \
\ \  \\\__\ \  \ \  \\\__\ \  \ \  \
 \ \  \\|__| \  \ \  \\|__| \  \ \  \
  \ \  \    \ \  \ \  \    \ \  \ \  \____
   \ \__\    \ \__\ \__\    \ \__\ \_______\
    \|__|     \|__|\|__|     \|__|\|_______|
         ____  _  _    __  _  _  ____  _  _
        (  _ \( \/ )  (  )( \/ )/ ___)( \/ )
         ) _ ( )  /    )( / \/ \\___ \ )  /
        (____/(__/    (__)\_)(_/(____/(__/
Interactive MML API initialized.


## Fill experiments cache

In [2]:
# baselines 
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_baseline_experiment(metric=metric, shrunk=shrunk, validation=validation)  

Loading ...
Extracting ...
Total GPU time for baseline_shrunk was 84076.05636731617s.
Loading ...
Extracting ...
Total GPU time for baseline_full was 195163.2805851912s.
Loading ...
Extracting ...
Loading ...
Extracting ...
Loading ...
Extracting ...
Loading ...
Extracting ...
Loading ...
Extracting ...
Loading ...
Extracting ...


In [3]:
# exp 1
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_arch_experiment(metric=metric, shrunk=shrunk, validation=validation)  

Loading ...
Loading shrunk ...
Extracting ...
Total GPU time for full_arch_val was 5712678.932425539s.
Extracting ...
Total GPU time for shrunk_arch_val was 1511126.9422990251s.
metric='BA' arch_report={'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7, 'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24, 'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11, 'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8, 'ese_vovnet39b': 5, 'resnest50d': 12, 'hrnet_w18': 12, 'skresnet34': 3, 'mobilenetv3_large_100': 3, 'res2net50_26w_4s': 11}
Loading ...
Extracting ...
metric='BA' arch_report={'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7, 'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24, 'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11, 'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8, 'ese_

The arch reports document our claim that each architecture is preferable in at least some cases. The first report corresponds to the `BA` validation on shrunk targets as in our paper:

### BA
{'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7, 'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24, 'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11, 'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8, 'ese_vovnet39b': 5, 'resnest50d': 12, 'hrnet_w18': 12, 'skresnet34': 3, 'mobilenetv3_large_100': 3, 'res2net50_26w_4s': 11}

The same holds for the first AUROC report:

### AUROC
{'tf_efficientnet_b2': 9, 'tf_efficientnet_b2_ap': 7, 'tf_efficientnet_b2_ns': 12, 'tf_efficientnet_cc_b0_4e': 4, 'swsl_resnet50': 9, 'ssl_resnext50_32x4d': 23, 'regnetx_032': 19, 'regnety_032': 25, 'rexnet_100': 7, 'ecaresnet50d': 10, 'cspdarknet53': 14, 'mixnet_l': 12, 'cspresnext50': 9, 'cspresnet50': 2, 'ese_vovnet39b': 5, 'resnest50d': 20, 'hrnet_w18': 6, 'skresnet34': 2, 'mobilenetv3_large_100': 10, 'res2net50_26w_4s': 8}
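
The claim can be checked directly against the two reports. The following is a minimal sketch, assuming each count is the number of cases in which that architecture was preferred (the reports themselves do not state this). `check_report` is a hypothetical helper and not part of `mml_tf`.

```python
# Counts copied from the two reports above.
ba_report = {'tf_efficientnet_b2': 4, 'tf_efficientnet_b2_ap': 13, 'tf_efficientnet_b2_ns': 7,
             'tf_efficientnet_cc_b0_4e': 2, 'swsl_resnet50': 7, 'ssl_resnext50_32x4d': 24,
             'regnetx_032': 19, 'regnety_032': 23, 'rexnet_100': 8, 'ecaresnet50d': 11,
             'cspdarknet53': 18, 'mixnet_l': 9, 'cspresnext50': 14, 'cspresnet50': 8,
             'ese_vovnet39b': 5, 'resnest50d': 12, 'hrnet_w18': 12, 'skresnet34': 3,
             'mobilenetv3_large_100': 3, 'res2net50_26w_4s': 11}
auroc_report = {'tf_efficientnet_b2': 9, 'tf_efficientnet_b2_ap': 7, 'tf_efficientnet_b2_ns': 12,
                'tf_efficientnet_cc_b0_4e': 4, 'swsl_resnet50': 9, 'ssl_resnext50_32x4d': 23,
                'regnetx_032': 19, 'regnety_032': 25, 'rexnet_100': 7, 'ecaresnet50d': 10,
                'cspdarknet53': 14, 'mixnet_l': 12, 'cspresnext50': 9, 'cspresnet50': 2,
                'ese_vovnet39b': 5, 'resnest50d': 20, 'hrnet_w18': 6, 'skresnet34': 2,
                'mobilenetv3_large_100': 10, 'res2net50_26w_4s': 8}


def check_report(report):
    """Return (architectures with a zero count, most frequent architecture, sum of counts)."""
    never = [name for name, count in report.items() if count == 0]
    best = max(report, key=report.get)
    return never, best, sum(report.values())


for label, report in [('BA', ba_report), ('AUROC', auroc_report)]:
    never, best, total = check_report(report)
    print(f'{label}: zero-count architectures={never}, most frequent={best}, total={total}')
```

An empty list of zero-count architectures for both metrics is what the claim requires.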

In [4]:
# exp 2
for metric in METRICS:
    for validation in [True, False]:
        load_pretrain_experiment(metric=metric, validation=validation)

Loading ...
Extracting ...
Total GPU time for pretraining_val was 5044369.5314246295s.
Loading ...
Extracting ...
Total GPU time for pretraining_dev was 1445569.1672105892s.
Loading ...
Extracting ...
Loading ...
Extracting ...


In [5]:
# exp 3
for metric in METRICS:
    for validation in [True, False]:
        load_augmentation_experiment(metric=metric, validation=validation)

Loading ...
Extracting ...
Total GPU time for augmentations_val was 1481139.1226490608s.
Loading ...
Extracting ...
Total GPU time for augmentations_dev was 1286579.7905641198s.
Loading ...
Extracting ...
Loading ...
Extracting ...


In [6]:
# exp 4
for metric in METRICS:
    for validation in [True, False]:
        for shrunk in [True, False]:
            load_multi_task_experiment(metric=metric, shrunk=shrunk, validation=validation)

Loading ...
Loading shrunk ...
Extracting ...
Total GPU time for full_multi_val was 9048212.029108422s.
Extracting ...
Total GPU time for shrunk_multi_val was 1609227.1599052493s.
Loading ...
Extracting ...
Loading ...
Loading shrunk ...
Extracting ...
Total GPU time for full_multi_dev was 5072083.557222832s.
Extracting ...
Total GPU time for shrunk_multi_dev was 1471259.682063573s.
Loading ...
Extracting ...
Loading ...
Loading shrunk ...
Extracting ...
Extracting ...
Loading ...
Extracting ...


In [7]:
# 10673 total GPU hours for the development and validation phase
print(sum(GPU_TIME.values()) / 3600)

10673.089001214705
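
To see where the GPU time goes, the per-run totals can be converted to hours and ranked. This is a small sketch using only the values printed in the cell outputs above. `logged_seconds` is a hypothetical stand-in for `GPU_TIME`, whose actual keys are an assumption here, and it omits any entries hidden by the truncated output of the architecture cell, so it does not add up to the full total.

```python
# Seconds copied from the "Total GPU time for ..." lines in the outputs above.
logged_seconds = {
    'baseline_shrunk': 84076.05636731617,
    'baseline_full': 195163.2805851912,
    'full_arch_val': 5712678.932425539,
    'shrunk_arch_val': 1511126.9422990251,
    'pretraining_val': 5044369.5314246295,
    'pretraining_dev': 1445569.1672105892,
    'augmentations_val': 1481139.1226490608,
    'augmentations_dev': 1286579.7905641198,
    'full_multi_val': 9048212.029108422,
    'shrunk_multi_val': 1609227.1599052493,
    'full_multi_dev': 5072083.557222832,
    'shrunk_multi_dev': 1471259.682063573,
}

# Convert to hours and list the runs from most to least expensive.
hours = {name: seconds / 3600 for name, seconds in logged_seconds.items()}
for name, value in sorted(hours.items(), key=lambda item: item[1], reverse=True):
    print(f'{name:>20}: {value:10.1f} h')
print(f'{"visible subtotal":>20}: {sum(hours.values()):10.1f} h')
```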


## Fill distances cache

In [8]:
from mml_tf.representations import FullFeatureRepresentations, AveragedFeatureRepresentations, \
    MeanAndCovarianceRepresentations, TagBasedRepresentations, \
    FisherEmbeddingRepresentations, BinnedFeatureRepresentations

In [9]:
# full and averaged representations
full_rep = FullFeatureRepresentations()
full_rep.load_representations()
avg_rep = AveragedFeatureRepresentations(full_features=full_rep)
avg_rep.load_representations()

In [None]:
# further standard representations
mean_cov_rep = MeanAndCovarianceRepresentations(full_features=full_rep)
mean_cov_rep.load_representations()
tag_rep = TagBasedRepresentations()
tag_rep.load_representations()
few_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=100)
few_bins_rep.load_representations()
lot_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=1000)
lot_bins_rep.load_representations()
fisher_rep = FisherEmbeddingRepresentations()
fisher_rep.load_representations()
tiny_bins_rep = BinnedFeatureRepresentations(full_features=full_rep, n_bins=5, min_q=0., max_q=0.9)
tiny_bins_rep.load_representations()

In [11]:
from mml_tf.distances import SemanticDistances, EMDDistances, KLDDistances, JSDistances, COSDistances, LNormDistances, \
    FIDDistances, LoadMMLComputedDistances, LogDistances, ExpDistances, MMDDistances, GenericFEDDistances, OptimalDistances

In [12]:
# this list will hold the names of all task distances for optimisation on the develop split of tasks; `tf/mml_tf/variants.py` holds the list for later reuse
all_variants = []

In [13]:
# calc manual baseline
all_variants.append(SemanticDistances(representations=tag_rep).name)

In [None]:
# calc various variants for Kullback-Leibler divergences
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo']:
            for s_pp in ['norm', 'soft']:
                for t_pp in ['norm', 'soft']:
                    for inverted in [True, False]:
                        if isinstance(rep, BinnedFeatureRepresentations) and ((s_pp == 'norm' and not inverted) or (inverted and t_pp == 'norm')):
                            clip = True
                        else:
                            clip = False
                        _ = KLDDistances(representations=rep, source_pp=s_pp, target_pp=t_pp, invert=inverted, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp, clip=clip)
                        print(f'done {_.name}')
                        all_variants.append(_.name)
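
One plausible reason for the `clip` flag in the loop above is that a sum-normalised histogram can contain empty bins, which makes the Kullback-Leibler divergence infinite when such a bin sits in the denominator. The sketch below illustrates this with plain NumPy. It assumes that `'norm'` means dividing by the sum and `'soft'` means a softmax. The helper names and the `eps` value are hypothetical, and this is not the `KLDDistances` implementation.

```python
import numpy as np


def normalise(hist):
    """Scale non-negative bin counts so that they sum to one (assumed meaning of 'norm')."""
    hist = np.asarray(hist, dtype=float)
    return hist / hist.sum()


def softmax(hist):
    """Softmax over bin counts (assumed meaning of 'soft'); every entry is strictly positive."""
    hist = np.asarray(hist, dtype=float)
    shifted = np.exp(hist - hist.max())
    return shifted / shifted.sum()


def kld(p, q, clip=False, eps=1e-10):
    """D_KL(p || q) over the support of p; optionally clip q away from zero first."""
    if clip:
        # Clipping leaves q very slightly unnormalised, which is acceptable for a sketch.
        q = np.clip(q, eps, None)
    mask = p > 0
    with np.errstate(divide='ignore'):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


p_norm, q_norm = normalise([5, 3, 2, 0]), normalise([4, 0, 3, 3])
print(kld(p_norm, q_norm))             # infinite: q has an empty bin where p has mass
print(kld(p_norm, q_norm, clip=True))  # finite after clipping
print(kld(softmax([5, 3, 2, 0]), softmax([4, 0, 3, 3])))  # finite without clipping
```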

In [None]:
# plus some additional ones that use symmetric uniform smoothing 
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo', 'uniform']:
            for alpha in [0.1, 0.01, 0.001]:
                _ = KLDDistances(representations=rep, source_pp='uniform', target_pp='uniform', weighing_by=w_by, alpha=alpha, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)
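
Uniform smoothing is commonly implemented as a convex mix of the normalised histogram with a uniform distribution, controlled by `alpha`. The following sketch shows that reading. Whether `KLDDistances` applies exactly this formula for `source_pp='uniform'` is an assumption, and `uniform_smooth` is a hypothetical helper.

```python
import numpy as np


def uniform_smooth(hist, alpha):
    """Mix the normalised histogram with a uniform distribution: (1 - alpha) * p + alpha / n."""
    hist = np.asarray(hist, dtype=float)
    p = hist / hist.sum()
    return (1.0 - alpha) * p + alpha / p.size


counts = [5, 3, 2, 0]
for alpha in [0.1, 0.01, 0.001]:
    smoothed = uniform_smooth(counts, alpha)
    # The formerly empty bin now holds alpha / n mass, so log-ratios stay finite.
    print(alpha, smoothed)
```

Applying the same smoothing to source and target keeps both distributions strictly positive, so no clipping is needed.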

In [None]:
# calc various Jensen–Shannon divergence variants
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft', 'wo', 'uniform']:
            for alpha in [0.1, 0.01, 0.001]:
                _ = JSDistances(representations=rep, weighing_by=w_by, alpha=alpha, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)
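
For reference, the Jensen-Shannon divergence is the average of the two Kullback-Leibler divergences to the midpoint distribution. A minimal NumPy sketch of the textbook definition follows. It does not reproduce the weighting or smoothing options of `JSDistances`, and the helper names are hypothetical.

```python
import numpy as np


def kld(p, q):
    """D_KL(p || q), summed over the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def jsd(p, q):
    """Jensen-Shannon divergence (natural log): symmetric and bounded by log(2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # m is positive wherever p or q is, so both KL terms are finite
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.0, 0.3, 0.3])
print(jsd(p, q), jsd(q, p))
```

Unlike the plain Kullback-Leibler divergence, this stays finite even when the two histograms have disjoint support.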

In [None]:
# calc Earth-Mover's distances
for w_by in ['source', 'target', 'both', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft']:
            for do_soft in [True, False]:
                _ = EMDDistances(representations=rep, soft_features=do_soft, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp)
                print(f'done {_.name}')
                all_variants.append(_.name)
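
For one-dimensional histograms over the same bins, the Earth Mover's distance reduces to the summed absolute difference of the cumulative distributions. The sketch below shows this special case, assuming unit distance between neighbouring bins. `emd_1d` is a hypothetical helper, and `EMDDistances` may define the ground distance differently.

```python
import numpy as np


def emd_1d(hist_a, hist_b):
    """Earth Mover's distance between two histograms on the same equally spaced bins."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    # Mass that has to cross each bin boundary equals the difference of the CDFs there.
    return float(np.sum(np.abs(np.cumsum(a) - np.cumsum(b))))


print(emd_1d([1, 0, 0], [0, 0, 1]))  # all mass moves two bins
print(emd_1d([1, 0, 0], [0, 1, 0]))  # all mass moves one bin
```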

In [None]:
# calc Cosine Similarity Distances
for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
    for do_soft in [True, False]:
        _ = COSDistances(representations=rep, soft_features=do_soft)
        print(f'done {_.name}')
        all_variants.append(_.name)

In [None]:
# calc distances based on L-Norm
for w_by in ['source', 'target', None]:
    for rep in [avg_rep, few_bins_rep, lot_bins_rep, tiny_bins_rep]:
        for w_pp in ['norm', 'soft']:
            for do_soft in [True, False]:
                for p in range(1, 4):
                    _ = LNormDistances(representations=rep, p=p, soft_features=do_soft, weighing_by=w_by, weights_rep=avg_rep, weights_pp=w_pp)
                    print(f'done {_.name}')
                    all_variants.append(_.name)
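
The loop above covers `p = 1, 2, 3`. As a reminder of what the parameter changes, here is a short sketch of the unweighted L-p distance between two feature vectors. `lnorm_distance` is a hypothetical helper, and the weighting options of `LNormDistances` are not modelled.

```python
import numpy as np


def lnorm_distance(a, b, p):
    """L-p norm of the difference: (sum_i |a_i - b_i|^p)^(1/p)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.linalg.norm(diff, ord=p))


a, b = [1.0, 2.0, 3.0], [1.0, 0.0, 0.0]
for p in range(1, 4):
    # Larger p puts more emphasis on the largest coordinate difference.
    print(p, lnorm_distance(a, b, p))
```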

In [None]:
# calc log and exp distance variants over a 5x5 grid of source and target weights
for s_w in range(5):
    for t_w in range(5):
        x = LogDistances(representations=avg_rep, w_t=t_w, w_s=s_w)
        print(f'done {x.name}')
        y = ExpDistances(representations=avg_rep, w_t=t_w, w_s=s_w)
        print(f'done {y.name}')
        all_variants.extend([x.name, y.name])

In [35]:
# calc fid distance
all_variants.append(FIDDistances(representations=mean_cov_rep).name)
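
The Fréchet distance behind FID compares two Gaussians through their means and covariances, which is why this variant uses `mean_cov_rep`. A minimal NumPy sketch of the standard formula follows. `frechet_distance` is a hypothetical helper and not the `FIDDistances` implementation, and the random features are illustrative only.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 * (cov1 @ cov2)^(1/2))."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    # The trace of the matrix square root equals the sum of square roots of the eigenvalues,
    # which are real and non-negative for a product of two covariance matrices.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
feats_a = rng.normal(size=(500, 4))          # illustrative features, not real task data
feats_b = rng.normal(loc=0.5, size=(500, 4))
mu_a, cov_a = feats_a.mean(axis=0), np.cov(feats_a, rowvar=False)
mu_b, cov_b = feats_b.mean(axis=0), np.cov(feats_b, rowvar=False)
print(frechet_distance(mu_a, cov_a, mu_b, cov_b))
```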

In [37]:
from mml_tf.variants import variants as stored_variants
assert sorted(set(all_variants)) == stored_variants

## Additional baselines

In [22]:
# maximum mean discrepancy
_ = MMDDistances(representations=full_rep)

In [23]:
# fisher embedding distance
layers = list(fisher_rep.mapping['lapgyn4_surgical_actions'].keys())
layers = layers[int(0.4*len(layers)):]
_ = GenericFEDDistances(representations=fisher_rep, layers=layers, name='FED')

In [None]:
# "Optimal" distances mimic the actually measured performances; they may be used for analysis purposes
optimal_variants = []
for exp in EXPERIMENTS:
    for agg in [AggregateStrategy.FIRST, AggregateStrategy.SECOND, AggregateStrategy.THIRD]:
        for metric in METRICS:
            x = OptimalDistances(metric=metric, agg=agg, exp=exp)
            optimal_variants.append(x.name)

In [38]:
from mml_tf.variants import optimal_variants as stored_optimal_variants
assert optimal_variants == stored_optimal_variants

## Vary bins experiments

In [26]:
# additional versions of the three chosen bKLD variants, varying the number of bins
bin_range = [10, 25, 50, 75, 100, 250, 500, 750, 1000]
binned_reps = []
for n_bins in bin_range:
    tmp = BinnedFeatureRepresentations(n_bins=n_bins, full_features=full_rep)
    tmp.load_representations()
    binned_reps.append(tmp)
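
To give an intuition for what `n_bins` controls, the sketch below builds a histogram fingerprint from a feature matrix for each value in `bin_range`. It assumes the fingerprint is a histogram over all feature values within a quantile range, with `min_q` and `max_q` mirroring the argument names used earlier. How `BinnedFeatureRepresentations` actually bins is not shown in this notebook, so `binned_fingerprint` is purely hypothetical.

```python
import numpy as np


def binned_fingerprint(features, n_bins, min_q=0.0, max_q=1.0):
    """Histogram of all feature values between the min_q and max_q quantiles."""
    values = np.asarray(features, dtype=float).ravel()
    low, high = np.quantile(values, [min_q, max_q])
    # Values outside [low, high] are dropped by np.histogram.
    counts, _ = np.histogram(values, bins=n_bins, range=(low, high))
    return counts


rng = np.random.default_rng(0)
features = rng.random((200, 16))  # illustrative stand-in for extracted features
bin_range = [10, 25, 50, 75, 100, 250, 500, 750, 1000]
for n_bins in bin_range:
    fingerprint = binned_fingerprint(features, n_bins)
    # More bins give a finer but sparser fingerprint for the same number of samples.
    print(n_bins, fingerprint.shape, int((fingerprint == 0).sum()))
```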

In [27]:
for rep in binned_reps:
    _ = KLDDistances(representations=rep, weighing_by='target', weights_pp='soft', weights_rep=avg_rep)
    _ = KLDDistances(representations=rep, weighing_by=None)
    _ = KLDDistances(representations=rep, weighing_by='source', weights_pp='norm', weights_rep=avg_rep)

## Vary sample size experiments

In [None]:
# additional experiments varying the number of samples used to determine bKLD fingerprints
n_reps = 10
for n_samples in track([10, 100, 1000]):
    for rep_idx in range(n_reps):
        tmp_rep = copy.deepcopy(full_rep)
        idxs = np.random.randint(tmp_rep.n_samples, size=n_samples)
        # adapt samples
        tmp_rep.mapping = {k: v[idxs, :] for k, v in tmp_rep.mapping.items()}
        # avg for weighing
        tmp_avg_rep = AveragedFeatureRepresentations(full_features=tmp_rep)
        tmp_avg_rep.load_representations()
        # bin to fingerprint (small and large)
        rep_small = BinnedFeatureRepresentations(full_features=tmp_rep, n_bins=100)
        rep_small.load_representations()
        rep_large = BinnedFeatureRepresentations(full_features=tmp_rep, n_bins=1000)
        rep_large.load_representations()
        _ = KLDDistances(representations=rep_small, weights_rep=tmp_avg_rep, weights_pp='soft', weighing_by='target', seed=rep_idx)
        _ = KLDDistances(representations=rep_large, weights_rep=tmp_avg_rep, weights_pp='norm', weighing_by='source', seed=rep_idx)
        _ = KLDDistances(representations=rep_large, weighing_by=None, seed=rep_idx)
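
Note that `np.random.randint` draws indices with replacement from NumPy's global random state, so the drawn subsets are not tied to `rep_idx`. If reproducible draws are wanted, a seeded generator is one option. The sketch below is an alternative under that assumption and not what the cell above does. `sample_indices` and the toy `mapping` are hypothetical.

```python
import numpy as np


def sample_indices(n_total, n_samples, seed, replace=True):
    """Draw sample indices reproducibly; replace=False yields distinct indices."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_total, size=n_samples, replace=replace)


# Toy stand-in for a task -> (n_samples, n_features) feature mapping.
mapping = {'task_a': np.arange(40).reshape(20, 2), 'task_b': np.arange(40, 80).reshape(20, 2)}

for rep_idx in range(3):
    idxs = sample_indices(n_total=20, n_samples=10, seed=rep_idx)
    # The same rows are selected for every task, as in the cell above.
    subset = {task: feats[idxs, :] for task, feats in mapping.items()}
    print(rep_idx, idxs, subset['task_a'].shape)
```

Passing `replace=False` requires `n_samples <= n_total`.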