This repository contains code for training the benchmark entity linking model for the SNOMED CT Entity Linking Challenge.
If you'd like to be able to reproduce this notebook or expand upon it for your own submissions, you'll need a few things:
- A GPU machine with at least 24GB of VRAM
  - Note: It's possible to use this notebook on machines with less VRAM, but you may need to use a different base model for the CER (such as `deberta-v3-base`), use LoRA or an equivalent low-rank LLM adaptation, train with mixed precision by setting `fp16=True` in the `TrainingArguments`, and/or decrease the batch size.
- A conda environment that matches the environment provided in `environment-gpu.yml` or `conda-lock-gpu.yml` from the challenge runtime repository
- A clone of this repository, to install the additional requirements (specified in `requirements.txt`) and to use the SNOMED CT utilities (in `snomed_graph.py`)
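For the low-VRAM note above, mixed precision and a smaller batch size both come down to `TrainingArguments` settings. The sketch below is a hypothetical helper, not part of the benchmark notebook: it builds the keyword arguments, and it adds gradient accumulation (an assumption on our part, not something the notebook prescribes) so that shrinking the per-device batch does not also shrink the number of examples per optimizer step. LoRA is not covered here.

```python
# Hypothetical helper, not part of the benchmark notebook: it only builds the
# keyword arguments; pass them to transformers.TrainingArguments yourself.


def reduced_memory_training_kwargs(
    effective_batch_size: int,
    per_device_batch_size: int,
    fp16: bool = True,
) -> dict:
    """Return TrainingArguments kwargs that lower peak GPU memory.

    The per-device batch is shrunk, and gradient accumulation keeps the
    number of examples per optimizer step equal to effective_batch_size
    (assuming training on a single GPU).
    """
    if effective_batch_size <= 0 or per_device_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    if effective_batch_size % per_device_batch_size != 0:
        raise ValueError(
            "effective_batch_size must be a multiple of per_device_batch_size"
        )
    return {
        # Fewer examples on the GPU at once means less activation memory.
        "per_device_train_batch_size": per_device_batch_size,
        # Gradients from this many micro-batches are accumulated before each
        # optimizer step, so the effective batch size stays the same.
        "gradient_accumulation_steps": effective_batch_size // per_device_batch_size,
        # Mixed-precision (float16) training; intended for CUDA GPUs.
        "fp16": fp16,
    }
```

You would then pass the result along with your other settings, e.g. `TrainingArguments(output_dir="cer_model", **reduced_memory_training_kwargs(16, 4))`. The values 16 and 4 and the `output_dir` are illustrative only, not the notebook's actual configuration.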
To create a valid submission:
- Re-run the notebook to generate model assets.
- Clone the runtime repository.
- Copy `main.py` as well as the model assets (the `cer_model/` folder and the `linker.pickle` file) generated from re-running the notebook into the `submission_src` folder in the cloned runtime repo. Your `submission_src` folder should look like this:

      submission_src
      ├── cer_model
      │   ├── README.md
      │   ├── added_tokens.json
      │   ├── config.json
      │   ├── model.safetensors
      │   ├── special_tokens_map.json
      │   ├── spm.model
      │   ├── tokenizer.json
      │   ├── tokenizer_config.json
      │   └── training_args.bin
      ├── linker.pickle
      └── main.py

- Run `make pack-submission` to generate a submission zip file. You could also follow the runtime repo instructions to generate smoke test data (`make smoke-test-data`) so you can test how your submission performs locally (`make test-submission`) before submitting to the platform.
- Submit to the platform!
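To catch a missing asset before running `make pack-submission`, you could check `submission_src` against the layout given in the copy step. The sketch below is a hypothetical helper, not part of the runtime repository; its file list is transcribed from that tree, so adjust it if you change the base model, since a different tokenizer may save a different set of files.

```python
from pathlib import Path

# Hypothetical check, not part of the runtime repo. Paths are relative to
# submission_src and mirror the folder tree shown in the copy step.
EXPECTED_FILES = [
    "main.py",
    "linker.pickle",
    "cer_model/README.md",
    "cer_model/added_tokens.json",
    "cer_model/config.json",
    "cer_model/model.safetensors",
    "cer_model/special_tokens_map.json",
    "cer_model/spm.model",
    "cer_model/tokenizer.json",
    "cer_model/tokenizer_config.json",
    "cer_model/training_args.bin",
]


def missing_submission_files(submission_dir: str) -> list:
    """Return the expected relative paths that are not regular files under submission_dir."""
    root = Path(submission_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_submission_files("submission_src")` from the root of the cloned runtime repo returns an empty list when every expected file is present; any entries it returns are the assets still to be copied.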
Submitting the benchmark will get you in the door, but there's so much more to explore! We hope this helps you get started in the SNOMED CT Entity Linking Challenge.