Information Extraction from Randomised Clinical Trials
IERCT extracts key trial information (patient group, treatments, outcome measure, results) from abstracts of Randomised Clinical Trial reports.
take_abstract.take(pmid, path) retrieves the abstract of an RCT report (identified by "pmid") from the PubMed website
and saves it in a disk location ("path") in a format readable by the preprocessing module.
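
As a rough illustration of the retrieve-and-save step (not the actual implementation of take_abstract.take), the sketch below builds a request URL for the NCBI E-utilities efetch service and writes the returned text to disk. The helper names build_efetch_url, fetch_abstract and save_abstract are hypothetical, the use of E-utilities rather than the PubMed web pages is an assumption, the plain-text output format is an assumption, and the code targets Python 3.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint (assumed here; the package may fetch differently).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmid):
    """Return an efetch URL asking for the plain-text abstract of one PMID."""
    params = [("db", "pubmed"), ("id", str(pmid)),
              ("rettype", "abstract"), ("retmode", "text")]
    return EFETCH + "?" + urlencode(params)


def fetch_abstract(pmid):
    """Download the abstract text (requires network access)."""
    with urlopen(build_efetch_url(pmid)) as response:
        return response.read().decode("utf-8")


def save_abstract(text, path):
    """Write the abstract to 'path' as UTF-8 text (hypothetical format)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

With network access, save_abstract(fetch_abstract(pmid), path) would mirror the call shape of take(pmid, path).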
preprocessing_functions.preprocess_file(filename) preprocesses the abstract given in "filename", performing part-of-speech
tagging, text normalization, chunking and semantic categorization.
preprocessing_functions.preprocess_data(preprocess_from, read_dat) batch preprocesses the abstracts in "preprocess_from"
classifier_functions.apply_features(data, feature_extractor) returns a list of feature sets for each preprocessed abstract
in "data", where the features are extracted by the function "feature_extractor".
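
The idea of applying a feature extractor to every token of every abstract can be sketched as follows. This is only an illustration: apply_features_sketch and toy_extractor are hypothetical names, the extractor signature (tokens, index) and the representation of an abstract as a plain list of tokens are assumptions, and the real classifier_functions.apply_features may differ.

```python
def apply_features_sketch(data, feature_extractor):
    """Return one list of feature dicts per abstract.

    'data' is assumed to be a list of abstracts, each a list of tokens.
    """
    return [
        [feature_extractor(tokens, index) for index in range(len(tokens))]
        for tokens in data
    ]


def toy_extractor(tokens, index):
    """Hypothetical extractor: a few surface features of one token."""
    word = tokens[index]
    return {
        "lower": word.lower(),
        "is_digit": word.isdigit(),
        "prev": tokens[index - 1].lower() if index > 0 else "<START>",
    }


abstracts = [["120", "patients", "received", "aspirin"]]
feature_sets = apply_features_sketch(abstracts, toy_extractor)
```

Passing the extractor as an argument keeps the feature definition separate from the loop over abstracts, so different feature sets can be compared without touching the rest of the pipeline.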
classifier_functions.Classifier(feature_extractor) contains the main methods for training the model and identifying
the information items.
classifier_functions.Classifier.train(train_data) trains the model on the abstracts in "train_data" as returned by
classifier_functions.Classifier.batch_tagger(test_data, excluded) identifies the information items by tagging the abstracts given
in "test_data" ("excluded" is a list of indexes of tokens that the model should not consider; it is returned by
classifier_functions.apply_features along with the feature set).
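
One way to picture the role of "excluded" is sketched below: tokens whose index is listed are skipped and left with a default label. Everything here is hypothetical (tag_with_exclusions, toy_predict, the label names and the default label "O"); the real batch_tagger may treat excluded tokens differently.

```python
def tag_with_exclusions(feature_sets, excluded, predict, skip_label="O"):
    """Label each token, leaving excluded positions with 'skip_label'."""
    excluded = set(excluded)  # fast membership test
    return [
        skip_label if index in excluded else predict(features)
        for index, features in enumerate(feature_sets)
    ]


def toy_predict(features):
    """Stand-in for a trained model: digits are tagged as a group size."""
    return "GROUP_SIZE" if features["is_digit"] else "O"


features = [{"is_digit": True}, {"is_digit": False}, {"is_digit": True}]
tags = tag_with_exclusions(features, excluded=[2], predict=toy_predict)
```

In this toy run the third token is a digit but is listed in "excluded", so it is never shown to the model.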
For a demo output, run:
    from ierct.src import classifier_functions
    classifier_functions.demo()
perform evaluation routines for the classifier instance "classifier" (
classifier_functions.Classifier) and the test data in
classifier_evaluation.HoldOut) or for a number of Cross Validation folds ("folds") in "data" (
print the results
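
The difference between a hold-out evaluation and cross validation over a number of folds can be sketched in plain Python. This is a generic illustration, not the code in classifier_evaluation: the function names, the 20% test fraction and the interleaved fold assignment are all assumptions.

```python
def hold_out_split(data, test_fraction=0.2):
    """Split 'data' into (train, test), keeping the last part for testing."""
    cut = int(round(len(data) * (1 - test_fraction)))
    return data[:cut], data[cut:]


def cross_validation_folds(data, folds):
    """Yield (train, test) pairs, one per fold; every item is tested once."""
    for k in range(folds):
        test = data[k::folds]  # every 'folds'-th item, starting at k
        train = [item for i, item in enumerate(data) if i % folds != k]
        yield train, test


items = list(range(10))
train, test = hold_out_split(items)
splits = list(cross_validation_folds(items, folds=5))
```

Hold-out evaluates once on a fixed test portion, while cross validation trains and evaluates once per fold so that every abstract contributes to the test results exactly once.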
Run testing.py to replicate the results with the datasets in "./data".
Run demo.py to see a demo output.
- Python 2.7 or higher
- nltk (*)
- GeniaTagger 3.0.1 (**)
(*) The stopword corpus is needed. Instructions here.
(**) Install the GeniaTagger files in "%ProgramFiles%\geniatagger-3.0.1". A Windows port can be found here (many thanks to Syeed Ibn Faiz for this).