## Information on neural architectures

This notebook produces Tab. 1 of the Extended Data. It aggregates information on the neural architectures used (specifically in the first transfer scenario).

In [1]:
import pandas as pd
import humanize
from mml.core.models.timm import TimmGenericModel
from mml_tf.paths import DATA_PATH, FIG_PATH

In [5]:
archs = ['tf_efficientnet_b2', 'tf_efficientnet_b2_ap', 'tf_efficientnet_b2_ns', 'tf_efficientnet_cc_b0_4e', 'swsl_resnet50', 'ssl_resnext50_32x4d', 'regnetx_032', 'regnety_032', 'rexnet_100', 'ecaresnet50d', 'cspdarknet53', 'mixnet_l', 'cspresnext50', 'cspresnet50', 'ese_vovnet39b', 'resnest50d', 'hrnet_w18', 'skresnet34', 'mobilenetv3_large_100', 'res2net50_26w_4s', 'resnet34']

Note that several updates of the timm library changed weights and other minor details. We therefore take the ImageNet performance from the older results report where an architecture is listed there, and fall back to the newer report otherwise.

In [6]:
results2020 = pd.read_csv(DATA_PATH / 'results-imagenet-b496b7bde9861b8736c6ef74c735313e20058252.csv', index_col=0)
results2022 = pd.read_csv(DATA_PATH / 'results-imagenet-960f5f92e645a8d02757bf32fa680499127d2c98.csv', index_col=0)

In [4]:
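arch_infos = {}
for arch_name in archs:
    # prefer the older (2020) report, fall back to the 2022 report if the model is missing there
    res_df = results2020 if arch_name in results2020.index else results2022
    _info = {'Accuracy': res_df.at[arch_name, 'top1'], 'params (reported)': res_df.at[arch_name, 'param_count']}
    # instantiate the model to count the trainable backbone parameters (classification head excluded)
    m = TimmGenericModel(name=arch_name, pretrained=True, drop_rate=0.)
    _info['measured params'] = humanize.intword(m.train().count_parameters(only_trainable=True)['backbone'])
    # identifier of the pretrained weights on the Hugging Face hub
    _info['hub_id'] = m.backbone.default_cfg['hf_hub_id']
    arch_infos[arch_name] = _info
# one column per architecture; transposed for display below
info_df = pd.DataFrame(arch_infos)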
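
The lookup precedence used in the loop above (older report first, newer report as fallback) can be illustrated in isolation. This is only a minimal sketch with plain dictionaries instead of the CSV-backed DataFrames; the names `older`, `newer`, `lookup`, `model_a` and `model_b` and all numbers are hypothetical placeholders, not values from the actual reports.

```python
# Toy stand-ins for the two result tables; names and numbers are illustrative only.
older = {'model_a': {'top1': 70.0, 'param_count': 10.0}}
newer = {
    'model_a': {'top1': 71.0, 'param_count': 10.0},
    'model_b': {'top1': 75.0, 'param_count': 20.0},
}


def lookup(arch_name):
    """Prefer the older table; fall back to the newer one if the name is missing."""
    table = older if arch_name in older else newer
    row = table[arch_name]
    return {'Accuracy': row['top1'], 'params (reported)': row['param_count']}


# 'model_a' is resolved from the older table, 'model_b' only exists in the newer one.
toy_infos = {name: lookup(name) for name in ['model_a', 'model_b']}
print(toy_infos)
```

A model that appears in both tables keeps its older entry, which mirrors the intent of reporting performance for the weights as they were originally published.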


In [7]:
info_df.T

| architecture | Accuracy | params (reported) | measured params | hub_id |
|---|---|---|---|---|
| tf_efficientnet_b2 | 80.086 | 9.11 | 7.7 million | timm/tf_efficientnet_b2.ns_jft_in1k |
| tf_efficientnet_b2_ap | 80.3 | 9.11 | 7.7 million | timm/tf_efficientnet_b2.ap_in1k |
| tf_efficientnet_b2_ns | 82.38 | 9.11 | 7.7 million | timm/tf_efficientnet_b2.ns_jft_in1k |
| tf_efficientnet_cc_b0_4e | 77.306 | 13.31 | 12.0 million | timm/tf_efficientnet_cc_b0_4e.in1k |
| swsl_resnet50 | 81.166 | 25.56 | 23.5 million | timm/resnet50.fb_swsl_ig1b_ft_in1k |
| ssl_resnext50_32x4d | 80.318 | 25.03 | 23.0 million | timm/resnext50_32x4d.fb_ssl_yfcc100m_ft_in1k |
| regnetx_032 | 78.172 | 15.3 | 14.3 million | timm/regnetx_032.tv2_in1k |
| regnety_032 | 78.886 | 19.44 | 17.9 million | timm/regnety_032.ra_in1k |
| rexnet_100 | 77.858 | 4.8 | 3.5 million | timm/rexnet_100.nav_in1k |
| ecaresnet50d | 80.592 | 25.58 | 23.5 million | timm/ecaresnet50d.miil_in1k |


The difference between reported and measured parameters should be the final classification layer, which contributes roughly n_features x n_classes parameters, i.e. on the order of 1,000 x 1,000 = 1 million (plus biases; note that n_features varies between architectures). We report the measured parameters without the classification head.
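
As a rough sanity check of this explanation, the size of a single linear classification layer can be computed by hand. The sketch below is not part of the original analysis: the helper `head_params` is hypothetical, and the final feature widths (2048 for a ResNet-50 backbone, 1408 for EfficientNet-B2) are assumptions based on the standard architecture definitions rather than values read from the models in this notebook. The reported parameter counts are taken from the table above and are assumed to be in millions.

```python
def head_params(n_features: int, n_classes: int = 1000, bias: bool = True) -> int:
    """Parameter count of one linear classification layer (weights plus optional biases)."""
    return n_features * n_classes + (n_classes if bias else 0)


# Assumed final feature widths (standard values, not extracted from the notebook).
assumed_features = {'swsl_resnet50': 2048, 'tf_efficientnet_b2': 1408}
# Reported parameter counts in millions, copied from the table above.
reported_millions = {'swsl_resnet50': 25.56, 'tf_efficientnet_b2': 9.11}

for name, n_feat in assumed_features.items():
    head = head_params(n_feat)
    # subtract the head to estimate the backbone size
    backbone = reported_millions[name] - head / 1e6
    print(f"{name}: head = {head:,} params, estimated backbone = {backbone:.2f} million")
```

Under these assumptions the estimated backbone sizes come out at about 23.51 and 7.70 million, which is consistent with the measured 23.5 million and 7.7 million in the table.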

In [8]:
info_df.to_csv(FIG_PATH / 'model_infos.csv')