NormPULSE: A Generative Approach for Clinical Term Normalization

This repository is a sub-repository of PULSE.


Key Features

This repository provides the official implementation of NormPULSE.

  • A knowledge-transfer approach that distills knowledge from LLMs through prompt engineering, converting short clinical terms into knowledge cards enriched with additional information and clinical knowledge.
  • An algorithm that exploits the hierarchical structure of the standard terms by building a tree from their ICD codes.
  • A generative framework that finds candidate terms through knowledge-enhanced retrieval and generates the final standard term through hierarchical reasoning.

Details

We outline the overall framework of NormPULSE, our solution to clinical term normalization. It is based on PULSE and comprises three steps:

  1. Training. This step covers three tasks: knowledge card generation, which enriches each term with knowledge distilled from an LLM; hierarchical tree construction based on the ICD codes; and term normalization, which teaches the model to select the standard term from a given candidate list.
  2. Knowledge-enhanced retrieval. The model retrieves candidates for the given mention using the generated knowledge cards and locates each candidate's path in the constructed hierarchical tree to build a subtree.
  3. Hierarchical reasoning. The model reasons out the final result layer by layer through the subtree.
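Steps 2 and 3 can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the example paths, and the word-overlap `overlap_chooser` (a toy stand-in for the model's per-layer decision) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Tree = Dict[str, "Tree"]

def build_subtree(paths: List[List[str]]) -> Tree:
    """Merge root-to-leaf candidate paths into one nested dict."""
    root: Tree = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root

def reason_layer_by_layer(mention: str, subtree: Tree,
                          choose: Callable[[str, List[str]], str]) -> List[str]:
    """Walk from the root to a leaf, picking one child per layer."""
    node, picked = subtree, []
    while node:
        options = sorted(node)
        # A layer with a single option needs no decision.
        choice = options[0] if len(options) == 1 else choose(mention, options)
        picked.append(choice)
        node = node[choice]
    return picked

def overlap_chooser(mention: str, options: List[str]) -> str:
    """Toy stand-in for the model: pick the option sharing the most words."""
    words = set(mention.lower().split())
    return max(options, key=lambda o: len(words & set(o.lower().split())))

# Hypothetical candidate paths, as if returned by the retrieval step.
candidate_paths = [
    ["Diseases of the circulatory system", "Hypertensive diseases", "Essential hypertension"],
    ["Diseases of the circulatory system", "Hypertensive diseases", "Hypertensive heart disease"],
    ["Diseases of the circulatory system", "Cerebrovascular diseases", "Cerebral infarction"],
]
subtree = build_subtree(candidate_paths)
result = reason_layer_by_layer("hypertensive heart disease", subtree, overlap_chooser)
print(" > ".join(result))
```

Restricting each decision to the children of the current node keeps every choice small, which is the point of reasoning over a subtree rather than over the flat candidate list.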

Dataset

The clinical term normalization data is based on the following two open-source datasets.

The standard terminology databases are ICD-10 Medical Insurance Edition 2.0 (ICD-10医保2.0版) and ICD-9-CM3 Medical Insurance Edition 2.0 (ICD-9-CM3医保2.0版). We construct the two corresponding code trees by parsing the term codes; they are available at ICD-10_医保v2_tree.json and ICD-9-CM3_医保v2_tree.json.
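As a rough illustration of building a code tree by parsing term codes, the sketch below nests each code under its shorter prefixes. The splitting rule in `code_levels`, the `name`/`children` node layout, and the example entries are assumptions; the actual layout of the provided JSON files may differ.

```python
import json
from typing import Dict, List, Tuple

def code_levels(code: str) -> List[str]:
    """Split a code into increasingly specific prefixes (assumed rule).

    "I11.900" becomes ["I11", "I11.9", "I11.900"].
    """
    head, _, tail = code.partition(".")
    levels = [head]
    if tail:
        levels.append(f"{head}.{tail[0]}")
        if len(tail) > 1:
            levels.append(code)
    return levels

def build_code_tree(terms: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Nest every (code, name) pair under the prefixes of its code."""
    tree: Dict[str, dict] = {}
    for code, name in terms:
        children = tree
        for prefix in code_levels(code):
            node = children.setdefault(prefix, {"name": None, "children": {}})
            children = node["children"]
        node["name"] = name  # the deepest prefix is the code itself
    return tree

# Illustrative entries only, not taken from the repository's data files.
terms = [
    ("I10", "Essential (primary) hypertension"),
    ("I11", "Hypertensive heart disease"),
    ("I11.0", "Hypertensive heart disease with (congestive) heart failure"),
    ("I11.9", "Hypertensive heart disease without (congestive) heart failure"),
]
tree = build_code_tree(terms)
# ensure_ascii=False keeps non-ASCII term names readable in the output.
print(json.dumps(tree, ensure_ascii=False, indent=2))
```

A prefix that never appears as its own entry keeps `name` set to `None`, so gaps in the terminology stay visible instead of being filled with guessed labels.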

We also provide examples of the training data in the data directory.

Get Started

Model Setup

Main Requirements

CUDA 12.x or lower (11.4 preferred)
python=3.9.16
transformers>=4.29.2
faiss-gpu==1.7.2
torch==2.0.1
sentence-transformers==2.2.2
fastapi
uvicorn
NodeJS>=18.x
At least 16 GB of GPU memory
Make sure frontend port 3000 and backend port 2233 are available, or change them in main.ts and run.py.
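One way to check the two ports before starting the demo is to try binding them locally. This is a small standalone sketch using only the Python standard library; the helper name `port_is_free` is hypothetical and not part of the repository.

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True

# Default ports named in the requirements above.
for name, port in {"frontend": 3000, "backend": 2233}.items():
    status = "free" if port_is_free(port) else "in use"
    print(f"{name} port {port}: {status}")
```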

Installation

git clone https://github.com/JOHNNY-fans/NormPULSE.git
cd NormPULSE
conda create -n normllm python=3.9.16
conda activate normllm
pip install -r requirements.txt

Download Model
You can find the NormPULSE weights in the following Hugging Face repository.

In the retrieval step, we use the open-source M3E model as the text embedding model.
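The retrieval idea can be sketched as ranking standard terms by the similarity between knowledge-card embeddings. To stay runnable without downloading a model, the sketch swaps in a toy character-bigram embedding and brute-force cosine similarity where the project uses M3E and faiss. The function names and the example cards are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

def bigram_embed(text: str) -> Vector:
    """Toy embedding (assumption): L2-normalised character-bigram counts."""
    grams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    norm = math.sqrt(sum(c * c for c in grams.values())) or 1.0
    return {g: c / norm for g, c in grams.items()}

def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())

def retrieve(mention_card: str, term_cards: Dict[str, str], k: int = 2,
             embed: Callable[[str], Vector] = bigram_embed) -> List[Tuple[str, float]]:
    """Return the k standard terms whose cards are most similar to the mention's card."""
    query = embed(mention_card)
    scored = [(term, cosine(query, embed(card))) for term, card in term_cards.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical knowledge cards, written by hand for this example.
term_cards = {
    "Essential hypertension":
        "essential hypertension: persistently raised blood pressure with no identifiable cause",
    "Hypertensive heart disease":
        "hypertensive heart disease: heart damage caused by long-standing high blood pressure",
    "Cerebral infarction":
        "cerebral infarction: brain tissue death from blocked blood supply",
}
mention_card = "high blood pressure heart disease: heart damage from long-standing high blood pressure"
for term, score in retrieve(mention_card, term_cards):
    print(f"{score:.3f}  {term}")
```

Because `embed` is a parameter, a real sentence-embedding model can be passed in later without changing the ranking logic.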

Usage
A sample usage is provided in the Jupyter notebook usage_example.ipynb.

Demo Setup

Here is our simple demo.

Run Frontend

cd demo-frontend
npm i  
npm run dev

Run Backend

cd demo-backend
python run.py

🛡️ License

The code of this project is licensed under Apache 2.0, and the model weights are licensed under GNU AGPL 3.0. If the models contained in this project, or any modified versions thereof, are used in a service that results in misleading or harmful statements causing adverse effects, the responsibility lies with the service provider and is not associated with or attributable to this project.

🙏 Acknowledgement

  • Shanghai AI Laboratory.
  • East China University of Science and Technology.
