bigNN: an open-source big data toolkit focused on biomedical sentence classification


Every day, a large amount of text data is generated by medical data sources such as scientific literature, medical web pages, health-related social media posts, clinical notes, and drug reviews. Processing this data efficiently is a daunting task without the help of clever computational strategies, which makes text classification an essential operation in big data text analytics. In this contribution, we developed bigNN, an open-source software package for big data text classification. It implements a word2vec neural network model on top of Apache Spark to classify sentences in large data sets in a timely fashion. The software offers a graphical user interface, and it facilitates reproducible research in sentence analysis by allowing users to configure different sets of Apache Spark and word2vec neural network parameters. Furthermore, we introduce an application of bigNN in the medical informatics domain. bigNN is fully documented and is publicly and freely available at
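
To make the idea of classifying sentences from word2vec output more concrete, here is a minimal sketch of one common way to turn word vectors into a fixed-length sentence feature: averaging the vectors of the words in the sentence. This is an assumption for illustration only; the text above does not say how bigNN builds sentence features. The class name `SentenceVectorSketch`, the method `average`, and the toy two-dimensional vectors are all hypothetical and are not part of the bigNN code base. The sketch uses plain Java and does not call Apache Spark.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: a hypothetical helper, not part of the bigNN code base. */
final class SentenceVectorSketch {

    private SentenceVectorSketch() {}

    /**
     * Averages the vectors of all tokens found in wordVectors.
     * Tokens without a vector are skipped; if no token is known,
     * the result is the zero vector of length dim.
     */
    static double[] average(String sentence, Map<String, double[]> wordVectors, int dim) {
        double[] sum = new double[dim];
        int known = 0;
        // Naive whitespace tokenization, assumed here for brevity.
        for (String token : sentence.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            double[] vector = wordVectors.get(token);
            if (vector == null) {
                continue;
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += vector[i];
            }
            known++;
        }
        if (known > 0) {
            for (int i = 0; i < dim; i++) {
                sum[i] /= known;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy embeddings invented for this example.
        Map<String, double[]> toy = new HashMap<>();
        toy.put("aspirin", new double[] {1.0, 0.0});
        toy.put("bleeding", new double[] {0.0, 1.0});

        double[] features = average("Aspirin caused bleeding", toy, 2);
        System.out.println(features[0] + ", " + features[1]);
    }
}
```

The resulting vector could then be passed to any classifier. In a real deployment the word vectors would come from a trained word2vec model instead of a hand-written map.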

bigNN includes the following packages:

Package Name                        Description
edu.mfldclin.mcrf.bignn.gui         Implementation of the graphical user interface
edu.mfldclin.mcrf.bignn.setting     Implementation of the pre-defined and user-defined settings required by the system
edu.mfldclin.mcrf.bignn.learning    Implementation of text pre-processing and the neural network learning model
edu.mfldclin.mcrf.bignn.evaluation  Evaluation of the neural network predictive model
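
As an illustration of what evaluating a predictive model can involve, the sketch below computes precision, recall, and F1 for a binary sentence classifier from actual and predicted labels. The package list does not state which metrics the evaluation package reports, so the choice of metrics is an assumption, and the class `BinaryMetricsSketch` is hypothetical and not taken from the bigNN code base.

```java
/** Illustrative only: a hypothetical helper, not part of the bigNN code base. */
final class BinaryMetricsSketch {

    final int tp; // actual positive, predicted positive
    final int fp; // actual negative, predicted positive
    final int fn; // actual positive, predicted negative
    final int tn; // actual negative, predicted negative

    BinaryMetricsSketch(boolean[] actual, boolean[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must have the same length");
        }
        int truePos = 0, falsePos = 0, falseNeg = 0, trueNeg = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) {
                truePos++;
            } else if (!actual[i] && predicted[i]) {
                falsePos++;
            } else if (actual[i]) {
                falseNeg++;
            } else {
                trueNeg++;
            }
        }
        tp = truePos;
        fp = falsePos;
        fn = falseNeg;
        tn = trueNeg;
    }

    /** Fraction of predicted positives that are correct; 0 if nothing was predicted positive. */
    double precision() {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    /** Fraction of actual positives that were found; 0 if there are no actual positives. */
    double recall() {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall; 0 if both are 0. */
    double f1() {
        double p = precision();
        double r = recall();
        return p + r == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Toy labels invented for this example.
        boolean[] actual = {true, true, false, false, true};
        boolean[] predicted = {true, false, false, true, true};
        BinaryMetricsSketch m = new BinaryMetricsSketch(actual, predicted);
        System.out.println("precision=" + m.precision()
                + " recall=" + m.recall() + " f1=" + m.f1());
    }
}
```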


Requirements:

  • Apache Spark 2.10
  • Java SE 8

bigNN software architectural model:

The bigNN software architectural model is shown in the following figure.

[Figure: bigNN software architecture (bigNN architecture.png)]


Authors:

  1. Ahmad P. Tafti (Marshfield Clinic Research Institute)
  2. Ehsun Behravesh (IEEE Member)
  3. Mehdi Assefi (University of Georgia)
  4. Eric LaRose (Marshfield Clinic Research Institute)
  5. Jonathan Badger (Marshfield Clinic Research Institute)
  6. John Mayer (Marshfield Clinic Research Institute)
  7. AnHai Doan (University of Wisconsin-Madison)
  8. David Page (University of Wisconsin-Madison)
  9. Peggy Peissig (Marshfield Clinic Research Institute)


Acknowledgment:

The project described was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.


Citation:

The workflow and architectural model of bigNN are fully explained in [1]. Publications that use bigNN are encouraged to cite the following two papers. Thanks!

[1] Tafti, A.P., Behravesh, E., Assefi, M., LaRose, E., Badger, J., Mayer, J., Doan, A., Page, D. and Peissig, P., 2017. bigNN: an open-source big data toolkit focused on biomedical sentence classification. IEEE Big Data 2017.

[2] Tafti, A.P., Badger, J., LaRose, E., Shirzadi, E., Mahnke, A., Mayer, J., Ye, Z., Page, D. and Peissig, P., 2017. Adverse Drug Event Discovery Using Biomedical Literature: A Big Data Neural Network Adventure. JMIR Medical Informatics, 5(4), p.e51.