Biomedical Search Engine

This search engine employs several big data concepts to make the Unified Medical Language System (UMLS) Knowledge Sources accessible to any user. It has three main components: query handling, classification, and visualization. When a user searches for a medical term, the system returns a classification of the term determined by Mahout Naive Bayes, relevant information (definition, symptoms, etc.), and a visualization of neighboring and related medical concepts built from IBM System G's graph storage and a Plotly graph. The system is written in PHP, HTML, Java, and Python. This repository contains all the required files to deploy the search engine in your own environment.
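To give a feel for what a Naive Bayes classifier does, here is a minimal sketch of a multinomial Naive Bayes model with add-one smoothing in plain Python. It is not the Mahout implementation used by this project; the function names, the token features, and the labels "Disease" and "Drug" are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and tokens. examples is a list of (tokens, label) pairs."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab


def classify(model, tokens):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Log prior plus the sum of smoothed log likelihoods of each token.
        score = math.log(count / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, not taken from UMLS.
EXAMPLES = [
    (["chronic", "infection", "fever"], "Disease"),
    (["tablet", "dose", "oral"], "Drug"),
    (["fever", "cough"], "Disease"),
    (["dose", "injection"], "Drug"),
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(classify(model, ["fever", "infection"]))
```

The sketch scores each label by its prior probability and the smoothed frequency of each query token under that label, then picks the highest-scoring label.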

Dependencies

Relational Database Management System (RDBMS); we recommend MySQL Server.
Web server; we recommend Apache.
Java JDK 8.
PHP.
IBM System G.
Python.
Python packages: python-igraph, json (included in the Python standard library), and plotly.
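Before deploying, it may help to check which dependencies are already present. The following is a minimal sketch using only the Python standard library. The import name "igraph" for python-igraph and the executable names "java", "php", and "mysql" are assumptions about a typical installation; adjust them for your platform. IBM System G is not checked here because its install location varies.

```python
import importlib.util
import shutil

# Assumed import names: python-igraph is imported as "igraph";
# json ships with Python itself.
PYTHON_MODULES = ["igraph", "json", "plotly"]
# Assumed executable names; they may differ on your system.
EXECUTABLES = ["java", "php", "mysql"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def missing_executables(names):
    """Return the executables that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing executables:", missing_executables(EXECUTABLES) or "none")
```

Anything reported as missing can then be installed with your usual package manager before continuing.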

Steps to deploy the system.

1. Sign up for a license at the UMLS Terminology Services.
2. Create a database named umls in your RDBMS to host the UMLS schemas. For instance, in MySQL: CREATE DATABASE IF NOT EXISTS umls CHARACTER SET utf8 COLLATE utf8_unicode_ci;
3. Read the UMLS Tutorial and the UMLS Reference Manual to become familiar with the system requirements, then load the Metathesaurus and Semantic Network Knowledge Sources into the umls database created in step 2.
4. Run the file named Normalize_UMLS.sql in the MySQL directory. This creates a database named sandbox that holds a normalized subset of the umls database, which improves performance.
5. Read the IBM System G gShell overview.
6. Replace the line [file_location] in the file contained in the SYSTEMG directory with the location of the concept.txt and relationship.txt files created with the MySQL queries. Pass the modified file to gShell (gShell interactive < filename) to load the concepts, semantics, and their relationships into System G.
7. In the PHP directory, edit the following files to configure your database credentials: mysqlconnect_umls.php and mysqlconnect_sandbox.php.
8. Create an account at plot.ly and edit the file in the PYTHON directory to enter your account's username and key on the following line: py.sign_in('user', 'key').
9. Copy the contents of the PHP, JAVA, and PYTHON directories to the subdirectory of your web server's root directory from which you want the system to be accessed.
10. Open this subdirectory in your browser, append "/lookup.php" to the URL, and you should be able to start using the system.
11. The behavior of the classifier can be changed by modifying the file in the JAVA/src directory; to do this you will need to clone the Hadoop and Mahout repositories.
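The [file_location] substitution described above can also be scripted. Below is a minimal sketch, assuming the placeholder appears literally as the text [file_location] in the gShell script. The function name and the file names in the usage section are hypothetical, since the actual name of the file in the SYSTEMG directory is not stated here.

```python
from pathlib import Path

PLACEHOLDER = "[file_location]"


def fill_file_location(template_path, output_path, data_dir):
    """Write a copy of the template with the placeholder replaced by data_dir."""
    text = Path(template_path).read_text(encoding="utf-8")
    if PLACEHOLDER not in text:
        raise ValueError(f"placeholder {PLACEHOLDER} not found in {template_path}")
    Path(output_path).write_text(
        text.replace(PLACEHOLDER, str(data_dir)), encoding="utf-8"
    )
    return output_path


if __name__ == "__main__":
    # Hypothetical paths: point these at your own script and export directory.
    template = Path("SYSTEMG/your_gshell_script.txt")
    if template.exists():
        out = fill_file_location(template, "gshell_load.txt", "/path/to/exports")
        print(f"Now run: gShell interactive < {out}")
    else:
        print(f"Template not found: {template}")
```

Writing the result to a separate file keeps the original script untouched, so the substitution can be repeated if the export location changes.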

Note: Please make sure that Apache has read and write privileges on the location where you installed System G. If you encounter any other problems and can't find a solution, please feel free to contact jaa2220@cumc.columbia.edu for help with MySQL, System G, and the classifier, jj2807@columbia.edu for visualization, or mz2517@columbia.edu for PHP-related issues.

Project by Jose Alvarado-Guzman (jaa2220), Josh Jacobson (jj2807), and Mohammad Zaryab (mz2517).

Repository: Sapphirine/biomedical_search_engine
