
Terminology


Challenge of terminology

A2D aims to support a wide range of use cases where the (potential) adaptation of a model to a different task, domain, or dataset is in focus. This includes "small" (not necessarily simple) adaptations, such as personalized model calibration, but also adaptations of the model to a completely different domain, e.g. using predictive models trained on financial time-series data for medical time-series tasks.

It is important to understand that, in different data science communities, the same terms can be used to describe different phenomena. On this page, we review several terms related to the domain adaptation problem. The definitions we collect are by no means exhaustive, so please comment if you would like to add another definition or source to this Wiki page.

Transfer learning

Transfer learning remains one of the most general terms for adapting a model to some change in task, domain, or data.

"Transfer learning, ..., indicates the process of effectively and efficiently achieving our goals by transferring knowledge from existing fields." Wang&Cheng 2023

Domain adaptation

Domain Adaptation - Citations

"Simply, domain adaptation methods can access data from unseen domains during training."Zech&Badgeley 2018

"Domain adaptation is a special case of Transfer Learning, which supports and solves real-world [...] challenges by effectively applying the model trained on one dataset (source) for testing on another domain (target) with different distribution." Chekroud&Hawrilenko 2024

Out-of-distribution performance

Generalizability

Generalizability - Citations

“Generalizability of an AI system is a broad concept describing the continuity of its performance when the data is coming from varying (1) geographic (e.g., institutions), (2) historical (e.g., timeframes), and (3) methodologic (e.g., acquisition parameters) settings [15].” Dikici&Nguyen 2023

“... whereas generalizability refers to the extent to which a sample statistic applies to the whole population and its many situations.” Leeuw&Motz 2022

“Generalizability focuses on the setting where the study population is a subset of the target population of interest” Degtiar&Rose 2023

Generalization

Generalization - Citations

“Generalization – the ability of AI systems to apply and/or extrapolate their knowledge to new data which might differ from the original training data – is a major challenge for the effective and responsible implementation of human-centric AI applications.” Goetz&Seedat 2024

“Generalization pertains to a model’s proficiency in performing well on unseen or new data, focusing on its ability to comprehend and capture underlying data patterns rather than memorizing specific details confined to the training dataset. A well-generalized model showcases excellent performance not solely on the training data but also on novel, previously unseen data.” Huang&Yu 2024

Opinions on how different the "novel", "new", or "unseen" data is expected to be vary from source to source, but there seems to be agreement on a certain level of similarity to the original data (e.g. a similar diagnostic task, a similar population).

Other terms

Reproducibility.

Reproducibility assumes that the data comes from the same setting, with the same characteristics, as the original training data. It measures whether an experiment, e.g. evaluating the model on such novel data, yields the same results as on the original training data, i.e. whether the model's performance is retained. Reproducibility therefore assesses within-distribution performance (a minimal sketch follows the citation below).

Reproducibility - Citations

“Reproducibility requires the system to replicate its accuracy in patients who were not included in development of the system but who are from the same underlying population.” Justice 1999
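A minimal sketch of assessing reproducibility in this sense, assuming a scikit-learn setup with synthetic data (the "population" and all variable names are illustrative, not from this wiki): train on one sample and check whether accuracy is replicated on a second, disjoint sample from the same underlying population.

```python
# Illustrative only: a synthetic "population" stands in for data from the same
# setting; all names are this sketch's own, not from any A2D code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# One underlying population, split into a development sample and a new sample.
X, y = make_classification(n_samples=2000, random_state=0)
X_dev, X_new, y_dev, y_new = train_test_split(X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

# Reproducibility in the sense of Justice 1999: accuracy should be replicated
# on individuals not used for development but drawn from the same population.
acc_dev = accuracy_score(y_dev, model.predict(X_dev))
acc_new = accuracy_score(y_new, model.predict(X_new))
print(f"development sample: {acc_dev:.3f}, new sample (same population): {acc_new:.3f}")
```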

Replicability.

TODO

Transferability.

Transferability measures whether a model can be applied to a dataset that exhibits some distribution shift, due to a shifted context, and still performs well, i.e. its out-of-distribution performance. To some extent, it assesses overfitting to the original training data.

Transferability - Citations

“While synonymous with “replicability,” transferability does not presume that an invariant global effect should exist in the first place (Lincoln & Guba, 1986), to be replicated or not. Rather, transferability presumes that an effect is conditionally dependent on context, analogous to state dependencies in a complex systems framework (Hawe, Shiell, & Riley, 2009).” Leeuw&Motz 2022

“This concept of “transferability,” more commonly associated with qualitative research, refers to the extent to which an intervention’s effectiveness could be achieved in another sample and setting” Leeuw&Motz 2022

Transportability.

Transportability measures whether a model performs well on novel data with the same characteristics as the original training data, i.e. within-distribution performance. To some extent, it assesses underfitting to the original training data. Transportability is a synonym of reproducibility.

Transportability - Citations

“Transportability addresses the setting where the study population is (at least partly) external to the target population.” Degtiar&Rose 2023

It is worth being precise about which specific scenario you are describing.

As we can see from these different definitions of generalizability, its proposed alternatives, and the typically more specific terms, it is important to be precise about what you mean when you state that a model is “generalizable”. We follow the lead of:

“[W]ithin-distribution generalization corresponds to the traditional evaluation form where there is no shift in data distributions [...]. This type of evaluation is the simplest form of generalization. The more challenging setup, the non-IID setup, corresponds to the other cases where shifts occur between train and test data distributions. These cases are commonly referred to as out-of-distribution (OOD) shifts.”

and recommend specifying whether you have evaluated the within-distribution or the out-of-distribution performance of a model. Since there are, of course, degrees of out-of-distribution shift, we recommend being as precise as possible here as well (see the sketch after this list), by:

- stating the type of distribution shift (label vs. feature),
- specifying how the shift manifests and the underlying reason for it,
- measuring the shift (TODO article on measuring shift).
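A minimal sketch of the last two points, assuming a SciPy/scikit-learn setup with synthetic data (the simulated covariate shift and all names are this sketch's assumptions, not from this wiki): quantify the per-feature shift with a two-sample Kolmogorov-Smirnov test, then report within-distribution and out-of-distribution performance separately.

```python
# Illustrative only: synthetic source/target data; X_src, X_tgt etc. are
# this sketch's own names.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Source domain, and a target domain with a feature (covariate) shift:
# the labeling rule is unchanged, only the feature distribution moves.
X_src = rng.normal(0.0, 1.0, size=(1000, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(1.0, 1.5, size=(1000, 2))
y_tgt = (X_tgt[:, 0] + X_tgt[:, 1] > 0).astype(int)

# Measure the shift feature by feature with a two-sample KS test.
for j in range(X_src.shape[1]):
    stat, p = ks_2samp(X_src[:, j], X_tgt[:, j])
    print(f"feature {j}: KS statistic = {stat:.3f}, p-value = {p:.3g}")

# Report within-distribution and out-of-distribution performance separately.
model = LogisticRegression().fit(X_src[:500], y_src[:500])
print("within-distribution accuracy:",
      accuracy_score(y_src[500:], model.predict(X_src[500:])))
print("out-of-distribution accuracy:",
      accuracy_score(y_tgt, model.predict(X_tgt)))
```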

🔹 Comparison: Transfer Learning vs. Domain Adaptation vs. Generalizability vs. OOD Performance

| Aspect | Transfer Learning | Domain Adaptation | Generalizability | Out-of-Distribution (OOD) Performance |
| --- | --- | --- | --- | --- |
| Goal | Adapt a model trained on one task to another (related) task. | Adapt a model trained on one domain to another (same task, different data distribution). | Ensure a model performs well on unseen but similar data. | Make a model robust to data that is significantly different from training data. |
| Training Data | Pretrained on a large dataset, then fine-tuned on a smaller dataset. | Labeled source domain, unlabeled or limited labeled target domain. | Diverse training data to capture broad variations. | Data from one domain, tested on unknown shifts. |
| Key Challenge | Negative transfer (if tasks are unrelated). | Domain shift (distribution mismatch between source and target). | Overfitting to training distribution, poor generalization. | Model failure on unseen distributions (e.g., dataset bias, adversarial shifts). |
| Typical Methods | Fine-tuning, feature extraction, meta-learning. | Feature alignment, adversarial adaptation, self-supervised learning. | Data augmentation, regularization, large diverse datasets. | Domain generalization, uncertainty estimation, adversarial training. |
| Example Use Case | Train on ImageNet → fine-tune on medical X-rays. | Train on synthetic medical scans → adapt to real patient scans. | Train a speech recognition model that works on different accents. | A self-driving car model trained in sunny weather needs to work in snowy conditions. |
| Works Best When? | The source and target tasks share useful features. | The task is the same, but the distributions differ. | There is variability in the training data, allowing for broader learning. | The model is trained with robustness techniques to handle unseen shifts. |
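To make the "Transfer Learning" column concrete, here is a minimal sketch assuming a PyTorch/torchvision setup (the 3-class head, the frozen backbone, and the dummy batch are illustrative choices of this sketch, not a prescribed recipe): reuse an ImageNet-pretrained backbone and train only a new task head.

```python
# Illustrative only: the 3-class head, frozen backbone, and dummy batch are
# assumptions of this sketch, not from this wiki.
import torch
import torch.nn as nn
from torchvision import models

# Reuse an ImageNet-pretrained backbone (downloads weights on first use).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor.
for p in model.parameters():
    p.requires_grad = False

# Replace the ImageNet classifier with a head for the new, related task,
# e.g. a hypothetical 3-class chest X-ray problem.
model.fc = nn.Linear(model.fc.in_features, 3)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 3, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```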