IODS Framework

Intra-Organismal Data Symbiosis: Multimodal Biological Translation

Reference implementation for:

Vardarlı, A. (2026). "From Theory to Test: Formalizing Life-Specific Data Variables and Bidirectional Translation Protocols for Intra-Organismal Data Symbiosis." Submitted to Theory in Biosciences.

Vardarlı, A. (2026). "Life-Specific Data and Intra-Organismal Data Symbiosis: A Biosemiotic Framework for Post-Genomic Biology." Submitted to Biosemiotics.

Overview

IODS formalises the hypothesis that biological modalities — DNA, morphology, vocalisations, behaviour, environment — mutually constrain one another, with each modality's contribution varying by context rather than being permanently privileged. The framework operationalises this hypothesis as a multimodal learning architecture and tests it through the inverse translation task: predicting DNA features from phenotypic ensembles, evaluated against phylogenetic null models.

This repository implements:

Magnitude function M_m(t, e, s) — sigmoid-gated context-dependent modality weighting (formula F1, Sect. 2.6).
Modality-specific encoders — DNA (Transformer + attentive pooling), image (CNN), audio (mel-spectrogram CNN), time-series (TCN), and MLPs for phenotype, metadata, and environment.
Cross-attention fusion with magnitude weighting and modality-presence indicators (formulas F2, F3, Sect. 2.8).
Forward and inverse prediction heads — heteroskedastic continuous output and three DNA feature targets (k-mer spectra, marker gene embeddings, Pfam domain profiles).
Phylogenetic null models — taxonomic mean, nearest-neighbour-in-tree, and Brownian motion baselines for the C2 phylogenetic-surplus condition.
C1 / C2 / C3 evaluators — symmetric cross-modal information, phylogenetic surplus, and context sensitivity.
Synthetic dataset and quickstart demo — runs end-to-end on CPU in under a minute.
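
The sigmoid-gated weighting in the first item can be pictured with a minimal pure-Python sketch. It assumes the gate is a sigmoid of a linear score over the context variables and treats e and s as plain numbers for simplicity; the actual parameterisation of F1 is defined in src/iods/magnitude/magnitude.py and may differ. The names `magnitude` and `scale_embedding` and all weights are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function, mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def magnitude(t: float, e: float, s: float,
              w: tuple[float, float, float], b: float) -> float:
    """Hypothetical gate: sigmoid of a linear score over context (t, e, s)."""
    w_t, w_e, w_s = w
    return sigmoid(w_t * t + w_e * e + w_s * s + b)

def scale_embedding(phi: list[float], m: float) -> list[float]:
    """Weight a modality embedding by its magnitude, in the spirit of F2."""
    return [m * x for x in phi]

# Made-up parameters for two modalities evaluated in the same context.
context = dict(t=1.0, e=0.0, s=0.0)
m_dna = magnitude(**context, w=(0.5, 0.0, 0.0), b=0.0)
m_audio = magnitude(**context, w=(-2.0, 0.0, 0.0), b=0.0)
phi_audio = scale_embedding([1.0, -1.0], m_audio)
```

Because each gate is bounded in (0, 1), no modality is switched off completely or allowed to dominate without limit; the same modality can receive a different weight whenever the context changes.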

Status

Phase A1 (minimum viable proof) — complete.

This release implements the architectural core of the framework with a runnable end-to-end pipeline on synthetic data. Tier 1 of the empirical roadmap (real GenBank + iNaturalist data, round-robin alignment, GAT for symbiotic context) is the next milestone — see docs/development_roadmap.md.

44 unit and integration tests pass; the quickstart demo trains a 1.25M-parameter model and reports a C2 phylogenetic-surplus evaluation in roughly 30 seconds on CPU.

Installation

git clone https://github.com/alphanv/iods-framework
cd iods-framework
pip install -e .

Requires Python ≥ 3.10, PyTorch ≥ 2.0, NumPy ≥ 1.24.

Quickstart

python -m iods.demo --epochs 5

Expected output: synthetic dataset construction, model build, a short training loop, and a C2 evaluation against a taxonomic-mean phylogenetic null. The demo only confirms that the pipeline runs end-to-end; results on synthetic data carry no biological meaning. See docs/paper_to_code_map.md for what each component tests.
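
To make the C2 comparison concrete, the sketch below shows one way a taxonomic-mean null and a surplus score could be computed. It rests on two assumptions that the repository may not share: the null predicts a species' feature vector as the mean over training species in the same taxon (falling back to the global mean), and the surplus is the null's error minus the model's error. All function names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def taxonomic_mean_null(train, query_taxon):
    """Predict a feature vector as the mean over training species in the
    same taxon, falling back to the global mean for unseen taxa."""
    by_taxon = defaultdict(list)
    for taxon, features in train:
        by_taxon[taxon].append(features)
    pool = by_taxon.get(query_taxon) or [f for _, f in train]
    return [fmean(col) for col in zip(*pool)]

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return fmean((p - y) ** 2 for p, y in zip(pred, target))

def phylogenetic_surplus(model_pred, null_pred, target):
    """Positive when the model beats the null (assumed definition)."""
    return mse(null_pred, target) - mse(model_pred, target)

# Toy data: (taxon, feature vector) pairs for three training species.
train = [("Parus", [1.0, 0.0]), ("Parus", [3.0, 2.0]), ("Corvus", [10.0, 10.0])]
target = [2.5, 1.0]
null_pred = taxonomic_mean_null(train, "Parus")
surplus = phylogenetic_surplus([2.4, 1.0], null_pred, target)
```

A surplus near zero would mean the model adds nothing beyond what shared ancestry already predicts, which is why the null has to be computed on the same held-out species as the model.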

Programmatic use

import torch
from iods import IODSModel
from iods.data import make_synthetic_dataset, SyntheticIODSDataset

# Build a model
model = IODSModel(d_z=64)

# Build a tiny dataset
species = make_synthetic_dataset(seed=0)
dataset = SyntheticIODSDataset(species)

# Forward pass on one sample
sample = dataset[0]
batch = {
    "dna_strings": [sample["dna_string"]],
    "image": sample["image"].unsqueeze(0),
    "phenotype": sample["phenotype"].unsqueeze(0),
}
context = {
    "t": torch.tensor([1.0]),
    "e": torch.tensor([0]),
    "s": torch.tensor([0]),
}
out = model(batch, context)
print("integrated latent Z:", out["z"].shape)
print("magnitudes per modality:", out["magnitudes"])
print("inverse k-mer prediction:", out["inverse_pred"]["kmer_logits"].shape)
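
The inverse head above outputs k-mer logits, so the matching target is a k-mer spectrum of the DNA string. A minimal sketch of such a target follows; it assumes normalised frequencies over the ACGT alphabet with ambiguous windows skipped. The value of k, the normalisation, and the function name `kmer_spectrum` are illustrative assumptions, not the repository's confirmed choices.

```python
from collections import Counter
from itertools import product

def kmer_spectrum(seq: str, k: int = 3) -> list[float]:
    """Normalised k-mer frequency vector in lexicographic ACGT order.

    Windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        # Sequence shorter than k, or no unambiguous window at all.
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]

spectrum = kmer_spectrum("ACGTACGTNACG", k=2)
```

The vector has 4**k entries, so its length is fixed by k regardless of sequence length, which makes it a convenient fixed-size prediction target.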

Repository structure

iods-framework/
├── src/iods/
│   ├── magnitude/        # M_m(t,e,s)  [formula F1, Sect. 2.6]
│   ├── encoders/         # DNA, image, audio, TS, phenotype, metadata, env
│   ├── fusion/           # cross-attention + presence indicator [F2, F3]
│   ├── prediction/       # forward + inverse heads, losses
│   ├── validation/       # phylogenetic nulls + C1/C2/C3 evaluators
│   ├── data/             # synthetic dataset for smoke tests
│   ├── model.py          # IODSModel: full pipeline integration
│   └── demo.py           # quickstart demo (run via `python -m iods.demo`)
├── tests/                # 44 unit + integration tests
├── docs/
│   ├── paper_to_code_map.md       # paper section -> source file mapping
│   ├── development_roadmap.md     # Phase A1 / A2 / A3 plan
│   └── architecture.md            # high-level architecture overview
├── configs/                       # placeholder for Tier 1/2/3 configs
├── data/examples/                 # data-download instructions (no data)
├── notebooks/                     # placeholder for analysis notebooks
└── figures/                       # placeholder for paper figure regeneration

Testing

pip install pytest
pytest tests/ -v

All 44 tests should pass in under 10 seconds on CPU.

Paper-to-code mapping

Every component traces to a section of the formalisation paper. See docs/paper_to_code_map.md for the complete mapping. A brief summary:

| Paper section / formula | Source file |
| --- | --- |
| F1: M_m(t,e,s) sigmoid gate | `src/iods/magnitude/magnitude.py` |
| F2: φ̃_m = M_m · φ_m | `src/iods/magnitude/magnitude.py::apply_magnitude` |
| F3: cross-attention fusion | `src/iods/fusion/cross_attention.py` |
| Sect. 2.8 step 4: presence vector h | `src/iods/fusion/cross_attention.py` |
| Sect. 3.1: DNA encoder | `src/iods/encoders/dna_encoder.py` |
| Sect. 3.5: prediction heads | `src/iods/prediction/heads.py` |
| Sect. 2.10: distance functions | `src/iods/prediction/losses.py` |
| Sect. 4.4: phylogenetic null models | `src/iods/validation/phylo_null.py` |
| Sect. 2.7: C1, C2, C3 conditions | `src/iods/validation/conditions.py` |
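
As a rough picture of how F2, F3, and the presence vector h could interact, the sketch below runs single-query attention over magnitude-weighted modality embeddings and masks out absent modalities. It is a simplified pure-Python stand-in, not the repository's PyTorch cross-attention; in particular, using the presence indicator as a hard mask is an assumption, and the name `fuse` and all values are hypothetical.

```python
import math

def fuse(query, embeddings, magnitudes, present):
    """Single-query attention over modality embeddings.

    embeddings: dict name -> vector; magnitudes: dict name -> float;
    present: dict name -> bool. Absent modalities receive no attention.
    """
    names = [m for m in embeddings if present[m]]
    if not names:
        raise ValueError("at least one modality must be present")
    d = len(query)
    # Magnitude-weighted embeddings, then scaled dot-product scores.
    scaled = {m: [magnitudes[m] * x for x in embeddings[m]] for m in names}
    scores = [sum(q * x for q, x in zip(query, scaled[m])) / math.sqrt(d)
              for m in names]
    # Numerically stable softmax over the present modalities only.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    z = [sum(w * scaled[m][i] for w, m in zip(weights, names)) for i in range(d)]
    return z, dict(zip(names, weights))

embeddings = {"dna": [1.0, 0.0], "image": [0.0, 1.0], "audio": [5.0, 5.0]}
magnitudes = {"dna": 0.9, "image": 0.5, "audio": 0.8}
present = {"dna": True, "image": True, "audio": False}
z, attn = fuse([1.0, 0.0], embeddings, magnitudes, present)
```

Masking before the softmax means a missing modality cannot leak into the fused latent, however large its placeholder embedding happens to be.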

Data sources (for Tier 1 — not yet ingested)

All real data will be drawn from public databases; no new data collection is planned.

| Database | Modality | URL |
| --- | --- | --- |
| GenBank / NCBI | Genomes | https://www.ncbi.nlm.nih.gov/genbank/ |
| iNaturalist / GBIF | Images | https://www.inaturalist.org / https://www.gbif.org |
| xeno-canto | Vocalisations | https://xeno-canto.org |
| Macaulay Library | Sounds | https://www.macaulaylibrary.org |
| Movebank | Movement | https://www.movebank.org |
| Web of Life / GLOBI | Interactions | https://www.web-of-life.es / https://www.globalbioticinteractions.org |
| Open Tree of Life | Phylogenies | https://opentreeoflife.github.io |

Development roadmap

Phase A1 — minimum viable proof. ✅ Complete. Architecture, magnitude function, encoders, fusion, prediction heads, phylogenetic nulls, C1/C2/C3 evaluators, synthetic-data smoke test.
Phase A2 — Tier 1 real data. Bird-species pilot (100 species, GenBank + iNaturalist + xeno-canto + Open Tree of Life), full round-robin alignment, GAT for symbiotic context, planetary context adapter, pre-registration of primary outcome.
Phase A3 — revision-readiness. Stronger phylogenetic null (nearest-neighbour-in-tree as default), reframed Proposition 1, tightened C1 thresholds, regenerable figures, Zenodo archive.

See docs/development_roadmap.md for the full plan.
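
The nearest-neighbour-in-tree null planned as the Phase A3 default can be sketched in a few lines. The version below assumes pairwise tree (patristic) distances are already available and that the null simply copies the features of the closest training species; the function name, the distance table layout, and the toy values are all hypothetical.

```python
def nearest_neighbour_null(query, train_features, distance):
    """Copy the features of the training species closest to `query` in the tree.

    distance: dict mapping frozenset({a, b}) -> tree distance (assumed precomputed).
    """
    nearest = min(train_features,
                  key=lambda sp: distance[frozenset((query, sp))])
    return nearest, train_features[nearest]

# Toy example with made-up distances and feature vectors.
train_features = {"great_tit": [0.2, 0.8], "raven": [0.9, 0.1]}
distance = {
    frozenset(("blue_tit", "great_tit")): 0.1,
    frozenset(("blue_tit", "raven")): 0.7,
}
nearest, pred = nearest_neighbour_null("blue_tit", train_features, distance)
```

This baseline is typically harder to beat than a taxonomic mean, because a close relative often carries more specific information than an average over a whole taxon.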

Citation

@article{vardarli2026iods,
  title={From Theory to Test: Formalizing Life-Specific Data Variables and
         Bidirectional Translation Protocols for Intra-Organismal Data Symbiosis},
  author={Vardarl{\i}, Alphan},
  journal={Theory in Biosciences},
  year={2026},
  note={Submitted}
}

@article{vardarli2026biosemiotics,
  title={Life-Specific Data and Intra-Organismal Data Symbiosis:
         A Biosemiotic Framework for Post-Genomic Biology},
  author={Vardarl{\i}, Alphan},
  journal={Biosemiotics},
  year={2026},
  note={Submitted}
}

License

MIT License. See LICENSE.

Author

Alphan Vardarlı — Independent Researcher, Çanakkale, Türkiye. ORCID: 0009-0007-1581-2190
