NormPULSE: A Generative Approach for Clinical Term Normalization

This repository is a sub-repository of PULSE.

Key Features

This repository provides the official implementation of NormPULSE.


A knowledge transfer approach that distills knowledge from LLMs through prompt engineering, converting short clinical terms into knowledge cards enriched with additional information and clinical knowledge.
An algorithm that exploits the hierarchical structure of the standard terms to build a tree from ICD codes.
A generative framework that finds candidate terms via knowledge-enhanced retrieval and generates the final standard term through hierarchical reasoning.

Details

We outline the comprehensive framework of our solution to clinical term normalization, NormPULSE, which is based on PULSE and comprises three steps:

Training. This step covers three tasks: knowledge card generation, which enriches each term by distilling knowledge from an LLM; hierarchical tree construction based on the ICD codes; and term normalization, which trains the model to select the standard term from a given candidate list.
Knowledge-enhanced retrieval. The model retrieves candidates for the given mention using the generated knowledge cards and locates each candidate's path in the constructed hierarchical tree to build a subtree.
Hierarchical reasoning. The model reasons out the final result layer by layer through the subtree.
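
The last two steps can be illustrated with a minimal sketch. It assumes the candidates have already been retrieved and that each one comes with its path of codes from the top of the ICD tree. The names `build_subtree`, `reason`, and `pick_last`, the node layout, and the sample candidates are all hypothetical and are not taken from the NormPULSE code; `pick_last` is only a placeholder for the decision the model would make at each layer.

```python
from typing import Callable, Dict, List

def new_node() -> dict:
    # Hypothetical node layout: an optional standard term plus child nodes keyed by code.
    return {"term": None, "children": {}}

def build_subtree(candidate_paths: Dict[str, List[str]]) -> dict:
    """Merge the tree paths of all retrieved candidates into one subtree."""
    root = new_node()
    for term, path in candidate_paths.items():
        node = root
        for code in path:
            node = node["children"].setdefault(code, new_node())
        node["term"] = term
    return root

def reason(mention: str, subtree: dict,
           choose: Callable[[str, List[str]], str]) -> str:
    """Walk the subtree one layer at a time until a leaf is reached.

    `choose` stands in for the model: it receives the mention and the
    codes available at the current layer and returns one of them.
    This sketch assumes every candidate sits at a leaf.
    """
    node = subtree
    while node["children"]:
        options = sorted(node["children"])
        picked = options[0] if len(options) == 1 else choose(mention, options)
        node = node["children"][picked]
    return node["term"]

# Illustrative candidates (assumed output of the retrieval step).
candidate_paths = {
    "Cholera, unspecified": ["A00", "A00.9", "A00.900"],
    "Typhoid fever": ["A01", "A01.0", "A01.000"],
    "Paratyphoid fever A": ["A01", "A01.1", "A01.100"],
}

def pick_last(mention: str, options: List[str]) -> str:
    # Placeholder policy; a real system would ask the model to pick.
    return options[-1]

subtree = build_subtree(candidate_paths)
result = reason("paratyphoid A", subtree, pick_last)
print(result)
```

Because the subtree contains only the retrieved candidates, each layer offers a handful of options rather than the full terminology, which is what keeps the layer-by-layer choice tractable.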

Dataset

The clinical term normalization data is based on the following two open-source datasets.

The standard terminology databases are ICD-10 Medical Insurance Version 2.0 (ICD-10医保2.0版) and ICD-9-CM3 Medical Insurance Version 2.0 (ICD-9-CM3医保2.0版). We construct the two corresponding code trees by parsing the term codes; the trees are available in ICD-10_医保v2_tree.json and ICD-9-CM3_医保v2_tree.json.
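
As a rough illustration of how a code tree can be derived from the term codes alone, the sketch below splits each ICD-10-style code into progressively longer prefixes and nests them. The layering (3-character category, then 4-character subcategory, then full code), the function names, and the sample terms are assumptions made for this example; the actual layout of the JSON files in this repository may differ.

```python
from typing import Dict, List

def code_path(code: str) -> List[str]:
    """Split a code into prefixes, e.g. 'A00.900' -> ['A00', 'A00.9', 'A00.900'].

    Assumed layering (not taken from the repository):
    category -> subcategory -> full code.
    """
    if "." not in code:
        return [code]
    head, tail = code.split(".", 1)
    path = [head]
    if tail:
        subcategory = f"{head}.{tail[0]}"
        path.append(subcategory)
        if subcategory != code:
            path.append(code)
    return path

def build_tree(terms: Dict[str, str]) -> dict:
    """Nest every code under its prefixes; intermediate nodes may have no name."""
    root = {"name": None, "children": {}}
    for code, name in terms.items():
        node = root
        for prefix in code_path(code):
            node = node["children"].setdefault(
                prefix, {"name": None, "children": {}}
            )
        node["name"] = name
    return root

# Illustrative terms only.
terms = {
    "A00": "Cholera",
    "A00.900": "Cholera, unspecified",
    "A01.000": "Typhoid fever",
}
tree = build_tree(terms)
print(sorted(tree["children"]))
```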

We also provide examples of the training data in the data directory.

Get Started

Model Setup

Main Requirements

CUDA 12.x or earlier, preferably 11.4
python=3.9.16
transformers>=4.29.2
faiss-gpu==1.7.2
torch==2.0.1
sentence-transformers==2.2.2
fastapi
uvicorn
NodeJS>=18.x
At least 16 GB of GPU memory
Make sure frontend port 3000 and backend port 2233 are available, or change them in main.ts and run.py
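
One quick way to check the two ports before starting the demo is a small standard-library script like the one below. It is only a convenience sketch and not part of the repository; the helper name `port_is_free` is made up here, and the check assumes the services listen on the local machine.

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds,
        # which means another process is already listening.
        return sock.connect_ex((host, port)) != 0

for name, port in {"frontend": 3000, "backend": 2233}.items():
    state = "free" if port_is_free(port) else "in use"
    print(f"{name} port {port}: {state}")
```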

Installation

git clone https://github.com/JOHNNY-fans/NormPULSE.git
cd NormPULSE
conda create -n normllm python=3.9.16
conda activate normllm
pip install -r requirements.txt

Download Model
You can find the NormPULSE weights in the following Hugging Face repository.

NormPULSE

In the retrieval step, we select the open-source M3E model as the text embedding model.

Usage
We provide a usage example in the Jupyter notebook usage_example.ipynb.

Demo Setup

Here is our simple demo.

Run Frontend

cd demo-frontend
npm i  
npm run dev

Run Backend

cd demo-backend
python run.py

🛡️ License

The code of this project is licensed under Apache 2.0, and the model weights are licensed under GNU AGPL 3.0. If the models contained in this project, or any modified versions thereof, are used in a service that results in misleading or harmful statements causing adverse effects, the responsibility lies with the service provider and is not associated with or attributable to this project.

🙏 Acknowledgement

Shanghai AI Laboratory.
East China University of Science and Technology.
