AssayExtract

AssayExtract is a Python package for extracting and standardizing biomedical outcome measures (assays) from research text.

It identifies assays mentioned in an input text and maps them to a curated vocabulary of canonical names, outcome domains, and synonyms.


Motivation

Reliable extraction of outcome measures from biomedical literature is challenging due to inconsistent reporting and high variability in terminology. Prior work (PreClinIE, ACL BioNLP 2025) found that manual annotations of outcome measures showed low inter-annotator agreement, making them unsuitable for training robust machine learning models.

To address this, AssayExtract adopts a rule-based approach grounded in a curated assay vocabulary.


Approach

AssayExtract is built on a harmonized vocabulary of outcome assessment techniques, developed through manual curation of the biomedical literature.

  • A core set of commonly used assays was identified from representative studies
  • Each assay was assigned a canonical name and mapped to one of five outcome domains
  • Synonyms and lexical variants were expanded using a large language model and manually reviewed
  • A domain-specific synonym dictionary enables robust matching via pattern-based extraction

Extracted mentions are normalized to canonical names and linked to structured metadata, including domain and subdomain.
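
The matching and normalization steps described above can be illustrated with a minimal sketch. This is not AssayExtract's actual implementation: the tiny `VOCAB` dictionary, its synonyms, and the `build_pattern` and `find_assays` helpers are hypothetical, chosen only to show how a synonym dictionary and a compiled regular expression can map mentions to canonical names and domains.

```python
import re

# Hypothetical mini-vocabulary: canonical name -> (domain, synonyms).
# The entries and synonyms are illustrative, not taken from the package data.
VOCAB = {
    "morris water maze": ("Behavioral", ["morris water maze", "mwm"]),
    "elevated plus maze": ("Behavioral", ["elevated plus maze", "epm"]),
    "sirius red": ("Histology", ["sirius red", "picrosirius red"]),
}


def build_pattern(vocab):
    """Compile one case-insensitive regex plus a synonym -> canonical lookup."""
    lookup = {}
    for canonical, (_domain, synonyms) in vocab.items():
        for synonym in synonyms:
            lookup[synonym.lower()] = canonical
    # Try longer synonyms first so that longer phrases win over shorter ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_assays(text, vocab=VOCAB):
    """Return (canonical name, domain) pairs in order of first appearance."""
    pattern, lookup = build_pattern(vocab)
    found = []
    for match in pattern.finditer(text):
        canonical = lookup[match.group(0).lower()]
        if canonical not in found:  # report each assay once
            found.append(canonical)
    return [(name, vocab[name][0]) for name in found]


print(find_assays("Collagen was stained with Picrosirius Red; memory was tested in the MWM."))
```

Sorting the alternatives by length and anchoring them with word boundaries is one simple way to avoid partial matches; the package may handle overlaps differently.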

The raw source file can be accessed here: ./data/assay_final_harmonized_with_enriched_synonyms_unique.csv.
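
To get a first look at the vocabulary file, a schema-agnostic sketch such as the following may help. The README does not document the column names, so the hypothetical `summarize_csv` helper makes no assumption about them and only reports the header and the number of rows. The relative path assumes the script runs from the repository root.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file without assuming its schema."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # Path taken from the README; adjust it if you run from another directory.
    csv_path = Path("data/assay_final_harmonized_with_enriched_synonyms_unique.csv")
    if csv_path.exists():
        columns, row_count = summarize_csv(csv_path)
        print(f"{row_count} rows, columns: {columns}")
```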


Outcome Domains

| Outcome Domain | Unique Assays | Canonical Name Examples | References |
| --- | --- | --- | --- |
| Molecular & Cellular | 204 | acetylomics; apoptosis -- caspase-3; molsoft icm-pro | Adil et al., 2021; Chen et al., 2023; Dufva, 2009; Guevara et al., 2022; Jin & Kennedy, 2015; Just, 2021; Musumeci, 2014; Osier et al., 2015; Pai & Satpathy, 2021; Verma et al., 2025 |
| Behavioral | 192 | delayed matching-to-place water maze; geller-seifter conflict test; radial arm water maze | Acikgoz et al., 2022; Burrows et al., 2019; Choi & Kumar, 2024; Gold et al., 2013; Gregory et al., 2013; Guevara et al., 2022; Harrison et al., 2020; Jones et al., 2025; Meredith & Kang, 2006; Osier et al., 2015; Pinkernell et al., 2016; Sadler et al., 2022; Shepherd et al., 2016; Wahl et al., 2017; Webster et al., 2014; Xiong et al., 2013; Zarruk et al., 2011 |
| Imaging | 97 | head-mounted three-photon miniscope; optical intrinsic signal; zte-cbv fmri | Jones et al., 2025; Just, 2021; Markicevic et al., 2021; Osier et al., 2015; Tremoleda & Sosabowski, 2015; Waerzeggers et al., 2010 |
| Histology | 74 | frozen section; polarizing microscopy; sirius red | Alturkistani et al., 2016; Gurina & Simms, 2025; Javaeed et al., 2021; Jones et al., 2025; Mark et al., 2007; Markicevic et al., 2021; Musumeci, 2014; Osier et al., 2015 |
| Physiology | 62 | adrenal weight; single-unit extracellular recording; thymus weight | Alemán et al., 2000; Burrows et al., 2019; Guevara et al., 2022; Markicevic et al., 2021; Osier et al., 2015; Pinkernell et al., 2016; Wickenden, 2000 |

Use Cases

  • Literature mining and systematic reviews
  • Analysis of outcome measures across studies
  • Construction of structured datasets for biomedical NLP and LLMs

Installation

pip install assay-extract

Or from source:

git clone https://github.com/Ineichen-Group/AssayExtract.git
cd AssayExtract
pip install -e .

Quick Start

from assay_extract import AssayClassifier

classifier = AssayClassifier()

methods = """
We assessed anxiety using the elevated plus maze and social behavior 
with the three-chamber test. Learning was measured on the morris water maze.
Motor coordination was tested on the accelerating rotarod.
"""

results = classifier.extract_measures(methods)

for result in results:
    print(f"{result.canonical_name}")
    print(f"  Domain: {result.outcome_domain}")
    print(f"  Subdomain: {result.subdomain}")
    print()

Output:

elevated plus maze
  Domain: Behavioral
  Subdomain: Anxiety

three-chamber social approach test
  Domain: Behavioral
  Subdomain: Sociability

morris water maze
  Domain: Behavioral
  Subdomain: Cognition & learning

accelerating rotarod
  Domain: Behavioral
  Subdomain: Motor coordination
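
For analysis across studies, results like these can be grouped by domain and subdomain. The sketch below is illustrative only: `Measure` is a hypothetical stand-in for the objects returned by `extract_measures`, carrying just the three attributes used in the Quick Start, and `group_by_subdomain` is not part of the package.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Measure:
    """Stand-in result object; only the attributes shown in the Quick Start are assumed."""
    canonical_name: str
    outcome_domain: str
    subdomain: str


def group_by_subdomain(results):
    """Map (domain, subdomain) -> list of canonical assay names."""
    grouped = defaultdict(list)
    for result in results:
        key = (result.outcome_domain, result.subdomain)
        grouped[key].append(result.canonical_name)
    return dict(grouped)


# Example values copied from the Quick Start output.
example = [
    Measure("elevated plus maze", "Behavioral", "Anxiety"),
    Measure("morris water maze", "Behavioral", "Cognition & learning"),
    Measure("accelerating rotarod", "Behavioral", "Motor coordination"),
]

for (domain, subdomain), names in group_by_subdomain(example).items():
    print(f"{domain} / {subdomain}: {', '.join(names)}")
```

Because the helper relies only on attribute access, it should also work on the real result objects as long as they expose the same three attributes.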

Testing

python -m pytest assay_extract/tests/ -v

All 9 tests pass.

License

MIT License

Citation

@software{assayextract2025,
  title={AssayExtract: Extraction of Biomedical Outcome Measures from Text},
  author={Simona Emilova Doneva},
  year={2025},
  url={https://github.com/Ineichen-Group/AssayExtract}
}

References

Acikgoz et al. 2022

Acikgoz, B., Dalkiran, B., & Dayi, A. (2022). An overview of the currency and usefulness of behavioral tests used from past to present to assess anxiety, social behavior and depression in rats and mice. Behavioural Processes, 200, 104670. https://doi.org/10.1016/j.beproc.2022.104670

Adil et al. 2021

Adil, A., Kumar, V., Jan, A. T., & Asger, M. (2021). Single-Cell Transcriptomics: Current Methods and Challenges in Data Acquisition and Analysis. Frontiers in Neuroscience, 15. https://doi.org/10.3389/fnins.2021.591122

Alemán et al. 2000

Alemán, C. L., Noa, M., Más, R., Rodeiro, I., Mesa, R., Menéndez, R., Gámez, R., & Hernández, C. (2000). Reference data for the principal physiological indicators in three species of laboratory animals. Laboratory Animals, 34(4), 379–385. https://doi.org/10.1258/002367700780387741

Alturkistani et al. 2016

Alturkistani, H. A., Tashkandi, F. M., & Mohammedsaleh, Z. M. (2016). Histological Stains: A Literature Review and Case Study. Global Journal of Health Science, 8(3), 72–79. https://doi.org/10.5539/gjhs.v8n3p72

Burrows et al. 2019

Burrows, D. J., McGown, A., Jain, S. A., De Felice, M., Ramesh, T. M., Sharrack, B., & Majid, A. (2019). Animal Models of Multiple Sclerosis: From Rodents to Zebrafish. Multiple Sclerosis Journal, 25(3), 306–324. https://doi.org/10.1177/1352458518805246

Chen et al. 2023

Chen, C., Wang, J., Pan, D., Wang, X., Xu, Y., Yan, J., Wang, L., Yang, X., Yang, M., & Liu, G.-P. (2023). Applications of multi-omics analysis in human diseases. MedComm, 4(4), e315. https://doi.org/10.1002/mco2.315

Choi & Kumar 2024

Choi, J. D., & Kumar, V. (2024). A new era in quantification of animal social behaviors. Neuroscience & Biobehavioral Reviews, 157, 105528. https://doi.org/10.1016/j.neubiorev.2023.105528

Dufva 2009

Dufva, M. (2009). Introduction to Microarray Technology. In M. Dufva (Ed.), DNA Microarrays for Biomedical Research: Methods and Protocols (pp. 1–22). Humana Press. https://doi.org/10.1007/978-1-59745-538-1_1

Gold et al. 2013

Gold, E. M., Su, D., López-Velázquez, L., Haus, D. L., Perez, H., Lacuesta, G. A., Anderson, A. J., & Cummings, B. J. (2013). Functional Assessment of Long-Term Deficits in Rodent Models of Traumatic Brain Injury. Regenerative Medicine, 8(4), 483–516. https://doi.org/10.2217/rme.13.41

Gregory et al. 2013

Gregory, N. S., Harris, A. L., Robinson, C. R., Dougherty, P. M., Fuchs, P. N., & Sluka, K. A. (2013). An Overview of Animal Models of Pain: Disease Models and Outcome Measures. The Journal of Pain, 14(11), 1255–1269. https://doi.org/10.1016/j.jpain.2013.06.008

Guevara et al. 2022

Guevara, R. D., Pastor, J. J., Manteca, X., Tedo, G., & Llonch, P. (2022). Systematic review of animal-based indicators to measure thermal, social, and immune-related stress in pigs. PLOS ONE, 17(5), e0266524. https://doi.org/10.1371/journal.pone.0266524

Gurina & Simms 2025

Gurina, T. S., & Simms, L. (2025). Histology, Staining. In StatPearls. StatPearls Publishing.

Harrison et al. 2020

Harrison, D. J., Creeth, H. D. J., Tyson, H. R., Boque-Sastre, R., Isles, A. R., Palme, R., Touma, C., & John, R. M. (2020). Unified Behavioral Scoring for Preclinical Models. Frontiers in Neuroscience, 14. https://doi.org/10.3389/fnins.2020.00313

Javaeed et al. 2021

Javaeed, A., Qamar, S., Ali, S., Mustafa, M. A. T., Nusrat, A., & Ghauri, S. K. (2021). Histological Stains in the Past, Present, and Future. Cureus. https://doi.org/10.7759/cureus.18486

Jin & Kennedy 2015

Jin, S., & Kennedy, R. T. (2015). New developments in Western blot technology. Chinese Chemical Letters, 26(4), 416–418. https://doi.org/10.1016/j.cclet.2015.01.021

Jones et al. 2025

Jones, L. A. T., Field-Fote, E. C., Magnuson, D., Tom, V., Basso, D. M., Fouad, K., & Mulcahey, M. J. (2025). Outcome measures in rodent models for spinal cord injury and their human correlates. Experimental Neurology, 386, 115169. https://doi.org/10.1016/j.expneurol.2025.115169

Just 2021

Just, N. (2021). Proton functional magnetic resonance spectroscopy in rodents. NMR in Biomedicine, 34(5), e4254. https://doi.org/10.1002/nbm.4254

Mark et al. 2007

Mark, M., Teletin, M., Antal, C., Wendling, O., Auwerx, J., Heikkinen, S., Khetchoumian, K., Argmann, C. A., & Dgheem, M. (2007). Histopathology in Mouse Metabolic Investigations. Current Protocols in Molecular Biology, 78(1), 29B.4.1–29B.4.32. https://doi.org/10.1002/0471142727.mb29b04s78

Markicevic et al. 2021

Markicevic, M., Savvateev, I., Grimm, C., & Zerbi, V. (2021). Emerging imaging methods to study whole-brain function in rodent models. Translational Psychiatry, 11(1), 457. https://doi.org/10.1038/s41398-021-01575-5

Meredith & Kang 2006

Meredith, G. E., & Kang, U. J. (2006). Behavioral models of Parkinson’s disease in rodents: A new look at an old problem. Movement Disorders, 21(10), 1595–1606. https://doi.org/10.1002/mds.21010

Musumeci 2014

Musumeci, G. (2014). Past, present and future: Overview on histology and histopathology. Journal of Histology and Histopathology, 1(1), 5. https://doi.org/10.7243/2055-091X-1-5

Osier et al. 2015

Osier, N. D., Carlson, S. W., DeSana, A., & Dixon, C. E. (2015). Chronic Histopathological and Behavioral Outcomes of Experimental Traumatic Brain Injury in Adult Male Animals. Journal of Neurotrauma, 32(23), 1861–1882. https://doi.org/10.1089/neu.2014.3680

Pai & Satpathy 2021

Pai, J. A., & Satpathy, A. T. (2021). High-throughput and single-cell T cell receptor sequencing technologies. Nature Methods, 18(8), 881–892. https://doi.org/10.1038/s41592-021-01201-8

Pinkernell et al. 2016

Pinkernell, S., Becker, K., & Lindauer, U. (2016). Severity assessment and scoring for neurosurgical models in rodents. Laboratory Animals, 50(6), 442–452. https://doi.org/10.1177/0023677216675010

Sadler et al. 2022

Sadler, K. E., Mogil, J. S., & Stucky, C. L. (2022). Innovations and advances in modelling and measuring pain in animals. Nature Reviews Neuroscience, 23(2), 70–85. https://doi.org/10.1038/s41583-021-00536-7

Shepherd et al. 2016

Shepherd, A., Tyebji, S., Hannan, A. J., & Burrows, E. L. (2016). Translational Assays for Assessment of Cognition in Rodent Models of Alzheimer’s Disease and Dementia. Journal of Molecular Neuroscience, 60(3), 371–382. https://doi.org/10.1007/s12031-016-0837-1

Tremoleda & Sosabowski 2015

Tremoleda, J. L., & Sosabowski, J. (2015). Imaging Technologies and Basic Considerations for Welfare of Laboratory Rodents. Lab Animal, 44(3), 97–105. https://doi.org/10.1038/laban.665

Verma et al. 2025

Verma, V. V., Vimal, S., Mishra, M. K., & Sharma, V. K. (2025). A Comprehensive Review on Structural Insights through Molecular Visualization: Tools, Applications, and Limitations. Journal of Molecular Modeling, 31(6), 173. https://doi.org/10.1007/s00894-025-06402-y

Waerzeggers et al. 2010

Waerzeggers, Y., Monfared, P., Viel, T., Winkeler, A., & Jacobs, A. H. (2010). Mouse models in neurological disorders: Applications of non-invasive imaging. Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease, 1802(10), 819–839. https://doi.org/10.1016/j.bbadis.2010.04.009

Wahl et al. 2017

Wahl, D., Coogan, S. C. P., Solon-Biet, S. M., de Cabo, R., Haran, J. B., Raubenheimer, D., Cogger, V. C., Mattson, M. P., Simpson, S. J., & Le Couteur, D. G. (2017). Cognitive and behavioral evaluation of nutritional interventions in rodent models of brain aging and dementia. Clinical Interventions in Aging, 12, 1419–1428. https://doi.org/10.2147/CIA.S145247

Webster et al. 2014

Webster, S. J., Bachstetter, A. D., Nelson, P. T., Schmitt, F. A., & Van Eldik, L. J. (2014). Using Mice to Model Alzheimer’s Dementia: An Overview of the Clinical Disease and the Preclinical Behavioral Changes in 10 Mouse Models. Frontiers in Genetics, 5. https://doi.org/10.3389/fgene.2014.00088

Wickenden 2000

Wickenden, A. D. (2000). Overview of Electrophysiological Techniques. Current Protocols in Pharmacology, 11(1), 11.1.1–11.1.17. https://doi.org/10.1002/0471141755.ph1101s64

Xiong et al. 2013

Xiong, Y., Mahmood, A., & Chopp, M. (2013). Animal Models of Traumatic Brain Injury. Nature Reviews Neuroscience, 14(2), 128–142. https://doi.org/10.1038/nrn3407

Zarruk et al. 2011

Zarruk, J. G., García-Yébenes, I., Romera, V. G., Ballesteros, I., Moraga, A., Cuartero, M. I., Hurtado, O., Sobrado, M., Pradillo, J. M., Fernández-López, D., Serena, J., Castillo-Meléndez, M., Moro, M. A., & Lizasoain, I. (2011). Neurological tests for functional outcome assessment in rodent models of ischaemic stroke. Revista de Neurología, 53.
