GitHub - Tony-Hao/Valx: a Python tool to extract and structure numeric lab test comparison statements from text

Valx

Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text, and to evaluate the method using clinical trial eligibility criteria text.

Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes 7 steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based comparison statement verification. Our reference standard was the consensus-based annotation among three raters for all comparison statements for two variables, i.e., HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.

Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively.

Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. The open-source Valx enables further evaluation and continued improvement by the collaborative scientific community.
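The F-measures reported above are consistent with the balanced F-measure, i.e., the harmonic mean of precision and recall. The short check below assumes that definition (it is not stated explicitly here) and recomputes the two HbA1c figures from the rounded precision and recall values; because the inputs are already rounded, other recomputed figures may differ in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# HbA1c, Type 1 diabetes trials
print(round(f_measure(99.6, 98.1), 1))  # 98.8
# HbA1c, Type 2 diabetes trials
print(round(f_measure(98.8, 96.9), 1))  # 97.8
```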

Usage


import Valx_core

Clean text by preprocessing


Valx_core.preprocessing(text)
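What the preprocessing step does internally is not described here. As a rough, standalone illustration (not Valx's own code), cleaning could mean replacing Unicode comparison symbols with ASCII equivalents and collapsing whitespace. The function name `clean_text` and the symbol map are assumptions made for this sketch.

```python
import re

# Hypothetical symbol map: Unicode comparison signs -> ASCII equivalents.
SYMBOLS = {"\u2265": ">=", "\u2264": "<=", "\u2260": "!="}

def clean_text(text):
    """Replace Unicode comparison symbols and collapse runs of whitespace."""
    for symbol, ascii_form in SYMBOLS.items():
        text = text.replace(symbol, ascii_form)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("HbA1c  \u2265 7.0%\n and \u2264 10%"))  # HbA1c >= 7.0% and <= 10%
```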

Split eligibility criteria text into inclusion and exclusion sections

Skip this step if the text is not clinical trial eligibility criteria text.


Valx_core.split_text_inclusion_exclusion(text)
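To make the idea concrete, here is a minimal standalone sketch of such a split. It assumes the text marks the sections with "Inclusion Criteria" and "Exclusion Criteria" headings; the function name `split_inclusion_exclusion` is hypothetical, and Valx's own splitting logic may differ.

```python
import re

def split_inclusion_exclusion(text):
    """Split criteria text at the first 'Exclusion Criteria' heading.

    Returns (inclusion, exclusion); exclusion is '' if no heading is found.
    """
    match = re.search(r"exclusion criteria\s*:?", text, flags=re.IGNORECASE)
    if match is None:
        return text.strip(), ""
    inclusion = text[:match.start()]
    exclusion = text[match.end():]
    # Drop a leading 'Inclusion Criteria:' heading if present.
    inclusion = re.sub(r"^\s*inclusion criteria\s*:?", "", inclusion,
                       flags=re.IGNORECASE)
    return inclusion.strip(), exclusion.strip()

sample = "Inclusion Criteria: HbA1c >= 7.0%. Exclusion Criteria: Fasting glucose > 270 mg/dL."
print(split_inclusion_exclusion(sample))
```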

Extract candidates containing numeric features


Valx_core.extract_candidates_numeric(text)
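As an illustration of the idea only, the sketch below splits text into sentences and keeps those containing a digit. The function name `numeric_candidates` and the sentence-splitting rule are assumptions for this example, not Valx's actual candidate extraction.

```python
import re

def numeric_candidates(text):
    """Return the sentences that contain at least one digit."""
    # Split after sentence-ending punctuation followed by whitespace, so
    # decimals such as 7.0 stay intact.
    sentences = re.split(r"(?<=[.;!?])\s+", text)
    return [s.strip() for s in sentences if re.search(r"\d", s)]

text = "HbA1c between 7.0% and 10%. Able to give consent. Fasting glucose < 270 mg/dL."
print(numeric_candidates(text))
```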

Identify numerical expressions

Identify expressions and formalize them into labels, e.g., "<VML(tag) L(logic, e.g., greater_equal)=X U(unit)=X>value</VML>"


Valx_core.formalize_expressions(candidates)
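The label format above can be illustrated with a small regex-based tagger. This is a standalone sketch under stated assumptions: only the operator name `greater_equal` comes from the format description, the other operator names and the unit list are hypothetical, and the concrete rendering `<VML L=... U=...>value</VML>` is one reading of that format rather than Valx's verified output.

```python
import re

# Only "greater_equal" appears in the format description; the rest are hypothetical.
LOGIC = {">=": "greater_equal", "<=": "lower_equal",
         ">": "greater", "<": "lower", "=": "equal"}

# Operator, number, then an optional unit from a small hypothetical unit list.
PATTERN = re.compile(
    r"(>=|<=|>|<|=)\s*(\d+(?:\.\d+)?)(?:\s*(%|mg/dl|mmol/l|mmol/mol))?",
    re.IGNORECASE,
)

def tag_expressions(text):
    """Wrap each 'operator number [unit]' expression in a VML-style label."""
    def to_tag(match):
        operator, value = match.group(1), match.group(2)
        unit = match.group(3) or ""
        return "<VML L=%s U=%s>%s</VML>" % (LOGIC[operator], unit, value)
    return PATTERN.sub(to_tag, text)

print(tag_expressions("HbA1c >= 7.0% and glucose < 270 mg/dL"))
```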

Identify variable mentions and map them to names


Valx_core.identify_variable(expression_text, feature_dict_dk, fea_dict_umls)
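The two dictionary arguments suggest a lookup from surface mentions to canonical variable names, drawing on domain knowledge and UMLS. The sketch below shows that general idea with a tiny hand-written synonym table; the table contents, the function name `find_variables`, and the longest-match-first rule are all assumptions for illustration, not the contents or behavior of Valx's dictionaries.

```python
import re

# Hypothetical synonym dictionary: surface form -> canonical variable name.
SYNONYMS = {
    "hba1c": "HbA1c",
    "glycosylated hemoglobin": "HbA1c",
    "hemoglobin a1c": "HbA1c",
    "fasting plasma glucose": "glucose",
    "glucose": "glucose",
}

def find_variables(text):
    """Return (canonical name, start offset) for each mention in the text."""
    # Sort longest-first so "fasting plasma glucose" wins over "glucose".
    forms = sorted(SYNONYMS, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(f) for f in forms) + r")\b",
        re.IGNORECASE,
    )
    return [(SYNONYMS[m.group(1).lower()], m.start())
            for m in pattern.finditer(text)]

print(find_variables("Glycosylated hemoglobin >= 7% and fasting plasma glucose < 270 mg/dL"))
```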

Associate variable and its related numerical values


Valx_core.associate_variable_values(expression_text)
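One simple way to think about association is "each value belongs to the closest variable mentioned before it". The sketch below implements only that heuristic, as an assumption for illustration; the function name `associate` and its input shapes are hypothetical, and Valx's actual association rules are not described here.

```python
def associate(variables, values):
    """Assign each value to the closest variable mention that precedes it.

    variables: list of (name, offset); values: list of (value_text, offset).
    Returns a dict mapping variable name -> list of value texts.
    """
    result = {}
    for value, value_offset in values:
        preceding = [(name, off) for name, off in variables if off < value_offset]
        if not preceding:
            continue  # no variable before this value; leave it unassigned
        name = max(preceding, key=lambda item: item[1])[0]
        result.setdefault(name, []).append(value)
    return result

variables = [("HbA1c", 0), ("glucose", 34)]
values = [(">= 7%", 24), ("< 270 mg/dL", 57)]
print(associate(variables, values))
```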

Context-based validation


Valx_core.context_validation(expressions)

Unit conversion and value normalization

Normalize units and their corresponding values


Valx_core.normalization(feature_list, expressions)
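As a worked example of unit normalization, the sketch below converts glucose values to mmol/L. The conversion factor follows from the molar mass of glucose (about 180.16 g/mol, so 1 mmol/L corresponds to 18.016 mg/dL). The choice of mmol/L as the target unit and the function name `normalize_glucose` are assumptions; which unit Valx normalizes to is not stated here.

```python
# Glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L = 18.016 mg/dL.
GLUCOSE_MG_DL_PER_MMOL_L = 18.016

def normalize_glucose(value, unit):
    """Return the glucose value in mmol/L, rounded to one decimal place."""
    unit = unit.lower()
    if unit == "mmol/l":
        return round(value, 1)
    if unit == "mg/dl":
        return round(value / GLUCOSE_MG_DL_PER_MMOL_L, 1)
    raise ValueError("unsupported glucose unit: %s" % unit)

print(normalize_glucose(126, "mg/dL"))  # 7.0
```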

Heuristic rule-based validation


Valx_core.context_validation(expressions)
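A common form of heuristic validation is a plausibility check on the normalized value, for example discarding a year that was mistaken for a lab value. The sketch below shows that idea only; the ranges, the triple layout, and the function names are hypothetical and do not reproduce Valx's rules.

```python
# Hypothetical plausibility ranges per variable: (min, max) in the normalized unit.
PLAUSIBLE_RANGE = {"HbA1c": (2.0, 20.0), "glucose": (1.0, 50.0)}

def within_range(variable, value):
    """True if the value lies inside the variable's plausible range."""
    low, high = PLAUSIBLE_RANGE[variable]
    return low <= value <= high

def filter_expressions(expressions):
    """Keep (variable, operator, value) triples whose value is plausible."""
    return [e for e in expressions if within_range(e[0], e[2])]

expressions = [("HbA1c", "greater_equal", 7.0), ("HbA1c", "lower", 2018.0)]
print(filter_expressions(expressions))
```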

Usage examples

Valx_CTgov.py demonstrates how to use Valx to extract and structure certain types of numeric lab test comparison statements from clinical trial eligibility criteria text using a single CPU core.

Valx_CTgov_multiCPUcores.py demonstrates how to use Valx to extract and structure certain types of numeric lab test comparison statements from clinical trial eligibility criteria text using multiple CPU cores.

Online Demo

http://columbiaelixr.appspot.com/valx

Versions

V0.9 The stable version with full functionality

V1.0 Add multi-CPU core support; the number of cores can be set easily

V1.1 Separate rules from code into a CSV file named "rules.csv"

V1.2 Separate the numeric feature list from code into a CSV file named "numeric_features.csv"

Citation

Tianyong Hao, Hongfang Liu, Chunhua Weng. Valx: A system for extracting and structuring numeric lab test comparison statements from text. Methods of Information in Medicine, Vol. 55, Issue 3, pp. 266-275, 2016 (available on PubMed).

Contributors

Tianyong Hao

Chengtao Li (new Web user interface with online pattern editing function)
