The code for the paper "MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement"

RyanWangZf/MediTab

MediTab

Our paper was published at IJCAI 2024!

Publicly available data can be found in the GitHub releases. Download it and extract it into the data folder.
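
A minimal sketch of that extraction step, assuming the release asset is a standard zip or tar archive. The file name `meditab_data.zip` and the helper `extract_release` are placeholders for illustration, not names from this repository; check the release page for the actual asset.

```python
import shutil
from pathlib import Path


def extract_release(archive_path, data_dir="data"):
    """Unpack a downloaded release archive into the data folder.

    The archive name and its internal layout are assumptions.
    """
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    # unpack_archive infers the format (zip, tar, gztar, ...) from the extension.
    shutil.unpack_archive(str(archive_path), extract_dir=str(target))
    # Return the top-level entries so the caller can see what was extracted.
    return sorted(p.name for p in target.iterdir())


if __name__ == "__main__":
    archive = Path("meditab_data.zip")  # hypothetical file name
    if archive.exists():
        print(extract_release(archive))
```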

TODO:

  • load BioBERT and fine-tune it on the raw sentence dataset
  • use the GPT-3 API to generate diverse paraphrases of the raw sentences as augmentations
  • enhance numerical values by adapting the tokenizer and embedding layer of BioBERT (dmis-lab/biobert-base-cased-v1.2)
  • run masked language modeling (MLM) with BioBERT on the augmented data
  • build a fact-checker dataset with the GPT-3 API
  • fine-tune BioBERT on the augmented data with fact checker filtering
  • explore extending the raw sentences with background knowledge texts, e.g., given an input drug, append a description of it
  • extend to trial outcome prediction on three datasets: phases I, II, and III
  • consider transfer learning across databases:
    • EHR (40K+ patients) -> clinical trial patient data (~1K per dataset)
    • clinicaltrials.gov (400K+ trials) -> trial outcome prediction (~5K per dataset)
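
To make the "fact checker filtering" item above concrete, here is a hedged sketch of one very simple filter: keep a paraphrase only if it mentions exactly the same numeric values as the raw sentence. This is a rule-based stand-in for illustration, not the GPT-3-based fact checker the list describes. The function names and the example sentences are invented here.

```python
import re

# Matches integers and decimals such as "65", "2.5", or "-0.3".
# The lookbehind skips digits glued to a word (e.g. the "2" in "phase2").
NUMBER_RE = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def extract_numbers(sentence):
    """Return the numeric values mentioned in a sentence, sorted."""
    return sorted(float(m) for m in NUMBER_RE.findall(sentence))


def keep_paraphrase(original, paraphrase):
    """True if the paraphrase mentions exactly the same numbers as the original."""
    return extract_numbers(original) == extract_numbers(paraphrase)


def filter_paraphrases(original, candidates):
    """Drop candidate paraphrases whose numeric content differs from the original."""
    return [c for c in candidates if keep_paraphrase(original, c)]


if __name__ == "__main__":
    raw = "The patient is 65 years old and received 2.5 mg of the drug."
    paraphrases = [
        "A 65-year-old patient was given 2.5 mg of the drug.",  # numbers preserved
        "A 56-year-old patient was given 2.5 mg of the drug.",  # age altered
        "The patient received the drug.",                       # numbers dropped
    ]
    for kept in filter_paraphrases(raw, paraphrases):
        print(kept)
```

A check like this only catches numeric drift. It does not notice numbers written as words or changes to non-numeric facts such as drug names, which is why a learned fact checker is still listed as a separate step.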
