emr-nlp-server provides the backend service for the emr-vis-web project.
To get started, install the prerequisites, get the emr-nlp-server application, and then launch the service as described below:
You must have the Java Development Kit (JDK) 1.7 to build, or the Java Runtime Environment (JRE) 1.7 to run, this project. To confirm that you have the right version of the JRE installed, run
`$ java -version` and verify that the output is similar to:
```
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
```
If you don't have the JDK installed or have an older one, you may get the latest version from the Oracle Technology Network.
## Building the project
Clone the emr-nlp-server repository using git:
```
git clone https://github.com/NLPReViz/emr-nlp-server.git
cd emr-nlp-server
```
Our project has the following external dependencies, which can be downloaded using Apache Ant:
- Java Jersey, which is dual-licensed under the Common Development and Distribution License (CDDL) and GPL 2.
- Weka licensed under GPL 3.
- Libsvm with a license compatible with GPL.
- Stanford CoreNLP, licensed under the GNU General Public License (the Stanford NLP code itself is GPL v2 or later, but the composite with external libraries is v3 or later).
Use Apache Ant to download and resolve these dependencies from their respective repositories. Then set the `CATALINA_HOME` environment variable to your Tomcat installation directory (the parent of its `webapps` directory) and run `ant deploy` to build and deploy the backend app.
For example, if your Tomcat's webapps directory is accessible as /usr/local/Cellar/tomcat/8.0.9/libexec/webapps/, then you may use:

```
env CATALINA_HOME=/usr/local/Cellar/tomcat/8.0.9/libexec/ ant deploy
```
## Running the server
We have included some "dummy" data with our release so that you can run the tool and play with the interface. These are not actual medical records, and models trained on them will not be useful. Contact the devs if you need more information about real datasets.
Download and copy the data directory into $CATALINA_BASE. You should be able to figure out this path from the messages printed after launching the server. Example path: /usr/local/Cellar/tomcat/8.0.9/libexec/data.
You need to build libsvm before running the server for the first time. To do that, run `make` inside the `data/libsvm` directory, or follow the instructions in the README file there.
Start the Tomcat server (e.g. using `$ catalina run`, `# service tomcat start`, etc.).
Now follow the steps on emr-vis-web to set up the front-end application.
## Using your own dataset and defining custom variables
The tool is currently configured to make predictions for 14 colonoscopy quality variables. It also does format parsing specific to the colonoscopy and pathology reports in the data provided with the release. We have a more generic version of the tool in the `general` branch of this repository; check out this experimental branch with `git checkout general`.
You will still need to download the sample data directory and organize your documents in the same structure, as follows:
```
|____documentList
| |____initialIDList.xml
| |____testIDList.xml
| |____fullIDList.xml
| |____feedbackIDList.xml
|____docs
| |____0719
| | |____report.txt
| |____0973
| | |____report.txt
| |____0184
| | |____report.txt
| | |____pathology.txt
| |____0726
| | |____report.txt
| | |____pathology.txt
| :
|____labels
| |____class-appendiceal-orifice.csv
| |____class-ileo-cecal-valve.csv
| |____class-informed-consent.csv
| |____class-proc-aborted.csv
| |____class-asa.csv
| |____class-prep-adequateYes.csv
| |____class-any-adenoma.csv
| |____class-cecum.csv
| |____class-withdraw-time.csv
| |____class-indication-type.csv
| |____class-prep-adequateNot.csv
| |____class-biopsy.csv
| |____class-prep-adequateNo.csv
| |____class-nursing-report.csv
:
```
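This layout can be scaffolded with ordinary shell commands. A minimal sketch, using two hypothetical case IDs (0001 and 0002) and only a subset of the files:

```shell
# Scaffold the expected data layout (the case IDs here are hypothetical examples).
mkdir -p data/documentList data/labels
mkdir -p data/docs/0001 data/docs/0002
touch data/docs/0001/report.txt                                # report for case 0001
touch data/docs/0002/report.txt data/docs/0002/pathology.txt   # pathology.txt is optional
```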
docs contains the documents. Each patient or case is represented by a sub-directory named with a four-digit ID; the ID length is hard-coded in `ColonoscopyDS_SVMLightFormat.java`. Each sub-directory may contain at most two files: `report.txt` and, optionally, `pathology.txt`. If you have more than two files, you may concatenate them into one report, or extend our code to support them. :)
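The concatenation option can be sketched as follows, with hypothetical file names:

```shell
# Two hypothetical extra notes for one case, merged into a single report.txt.
printf 'endoscopy note\n' > note1.txt
printf 'follow-up note\n' > note2.txt
cat note1.txt note2.txt > report.txt   # report.txt now holds both notes
```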
documentList directory has the following files, with references to the documents described above:

- `initialIDList.xml` - Used to train the initial model. This is how we bootstrap the system.
- `feedbackIDList.xml` - The list of documents you should be working on to give feedback and improve the models. Used to create the global feature vector.
- `fullIDList.xml` - List of all the IDs.
- `testIDList.xml` - Held-out test set. There is code to generate evaluation metrics, but it is not exposed to the front-end at this point. Feel free to contribute ;)
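These lists are essentially enumerations of the four-digit case IDs. A hypothetical sketch of writing one (the element names here are assumptions, not the project's documented schema; consult the files in the sample data directory for the exact format):

```shell
# Write a hypothetical ID list file; the element names are assumptions --
# check the bundled sample data for the real schema.
cat > initialIDList.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<idList>
  <id>0719</id>
  <id>0973</id>
</idList>
EOF
grep -c '<id>' initialIDList.xml   # counts the listed IDs
```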
labels directory contains the gold-standard data, used to train the initial models and to compute evaluation metrics.
The rest of the files can be reset by pointing your browser to /rest/server/resetDB, for example: http://localhost:8080/emr-nlp-server/rest/server/resetDB. Remember to update emr-vis-web to its general branch.
The easiest way to configure the tool to use your own variables is to map them to the names of your choice in the front end. Remember to update emr-vis-web as described in its README as well.
This project will be updated to make this configuration easier in the near future.
The REST calls to the server are protected with HTTP basic access authentication. The default login credentials are "username" and "password". You are encouraged to change them in UserAuthentication.java when running the app on a publicly accessible server.
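Basic access authentication simply sends the base64 encoding of user:password in the Authorization header, so the defaults offer little protection on their own. A quick illustration with the default credentials:

```shell
# Build the HTTP Basic Authorization header for the default credentials.
AUTH=$(printf 'username:password' | base64)
echo "Authorization: Basic $AUTH"
# With curl, the -u flag constructs this header for you, e.g.:
#   curl -u username:password http://localhost:8080/emr-nlp-server/rest/server/resetDB
```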
This project is released under the GPL 3 license. Take a look at the LICENSE file in the source for more information.