An NLP project developed for the Grand Finale (National Level, Software Edition) of Smart India Hackathon 2019 for ezDI.

Patient Case Similarity

patient-case-similarity is a Natural Language Processing (NLP) project that calculates the similarity between two patient cases using WMD (Word Mover's Distance).
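Word Mover's Distance treats each document as a normalized bag of words (a word's weight is its count divided by the document's token count) and finds the cheapest way to move one document's word weights onto the other's, where moving weight between two words costs the Euclidean distance between their embeddings. A smaller distance means more similar cases. The sketch below solves that transport problem exactly as a min-cost flow using only the standard library; it is an illustration, not the repository's wmd.py. The `embeddings` dict mapping each word to a vector is an assumption (in practice it would be loaded from pretrained word vectors), and all names here are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def word_movers_distance(doc1, doc2, embeddings):
    """Exact WMD between two token lists, solved as a min-cost flow (sketch)."""
    # Keep only tokens that have an embedding, counted as bags of words.
    c1 = Counter(w for w in doc1 if w in embeddings)
    c2 = Counter(w for w in doc2 if w in embeddings)
    if not c1 or not c2:
        return math.inf  # nothing to compare
    t1, t2 = sum(c1.values()), sum(c2.values())
    words1, words2 = list(c1), list(c2)
    n1, n2 = len(words1), len(words2)
    source, sink = n1 + n2, n1 + n2 + 1
    graph = [[] for _ in range(n1 + n2 + 2)]  # node -> list of edge indices
    edges = []  # each edge: [head, residual capacity, cost]

    def add_edge(u, v, cap, cost):
        # Forward edge at an even index, its reverse at the following odd index.
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost])

    # nBOW weights count/total, scaled by t1 * t2 so every capacity is an integer.
    for i, w in enumerate(words1):
        add_edge(source, i, c1[w] * t2, 0.0)
    for j, w in enumerate(words2):
        add_edge(n1 + j, sink, c2[w] * t1, 0.0)
    for i, wi in enumerate(words1):
        for j, wj in enumerate(words2):
            add_edge(i, n1 + j, t1 * t2, euclidean(embeddings[wi], embeddings[wj]))

    total_cost, remaining = 0.0, t1 * t2
    while remaining > 0:
        # Bellman-Ford shortest path in the residual graph (reverse costs are negative).
        dist = [math.inf] * len(graph)
        prev_edge = [-1] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev_edge[v] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            raise RuntimeError("no augmenting path; supplies and demands differ")
        # Push as much flow as the narrowest residual edge on the path allows.
        push, v = remaining, sink
        while v != source:
            e = prev_edge[v]
            push = min(push, edges[e][1])
            v = edges[e ^ 1][0]
        v = sink
        while v != source:
            e = prev_edge[v]
            edges[e][1] -= push
            edges[e ^ 1][1] += push
            total_cost += push * edges[e][2]
            v = edges[e ^ 1][0]
        remaining -= push
    # Undo the integer scaling to get the cost per unit of normalized weight.
    return total_cost / (t1 * t2)


emb = {"fever": (0.0, 0.0), "pyrexia": (0.1, 0.0), "fracture": (3.0, 4.0)}
print(word_movers_distance(["fever"], ["pyrexia"], emb))
print(word_movers_distance(["fever"], ["fracture"], emb))
```

Scaling the weights by the product of the two token counts keeps every capacity an integer, so the flow is exact. This solver is fine for short texts but slow for full clinical notes; libraries such as gensim (`KeyedVectors.wmdistance`) provide optimized implementations.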

The project was built as part of Smart India Hackathon 2019 at the National Institute of Technology Warangal, India, and was presented at the national level of SIH 2019.

How to use?

First, make the script executable from a Linux terminal with chmod +x script.sh

Once the permission is granted, run the script with ./script.sh

If you have trouble running the script, you can type the commands from script.sh into a Linux terminal manually, one at a time.

Workflow

The sample data provided to us is in ./'Sample Dataset'/NER_XML/. The file used by the shell script is copied from there to the root directory.

XSLT is used to extract the annotated data from the .xml files and transform it into .csv.
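The project performs this step with XSLT stylesheets. As a rough Python equivalent, the sketch below uses the standard library's `xml.etree.ElementTree` and `csv` modules to write one CSV row per annotation. The schema is a hypothetical one invented for illustration (`<annotation type="...">text</annotation>` elements anywhere in the document); the real NER XML files may use different tags and attributes.

```python
import csv
import io
import xml.etree.ElementTree as ET


def annotations_to_csv(xml_text, out):
    """Write one CSV row per <annotation> element (hypothetical schema); return the row count."""
    root = ET.fromstring(xml_text)
    writer = csv.writer(out)
    writer.writerow(["type", "text"])  # header row
    count = 0
    # iter() walks the whole tree in document order, so nesting depth does not matter.
    for ann in root.iter("annotation"):
        writer.writerow([ann.get("type", ""), (ann.text or "").strip()])
        count += 1
    return count


sample = (
    "<doc><sent>Patient reports "
    '<annotation type="PROBLEM">chest pain</annotation>.</sent>'
    '<sent>Started on <annotation type="DRUG">aspirin</annotation>.</sent></doc>'
)
buf = io.StringIO()
annotations_to_csv(sample, buf)
print(buf.getvalue())
```

To write to a file under ./data/ instead of a string buffer, open it with `open(path, "w", newline="")`, as the `csv` module documentation recommends.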

The data extracted from the .xml files is stored in the ./data/ directory. It is then preprocessed with pre_process.py, and finally the WMD score is calculated with wmd.py.
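The exact steps in pre_process.py are not described here. The sketch below only shows a typical preprocessing pass before WMD, under assumed choices: lowercase the text, keep alphanumeric tokens, and drop words from a small, hypothetical stop-word list. The resulting token lists are what a WMD computation would compare.

```python
import re

# Hypothetical, deliberately small stop-word list; the real pre_process.py may differ.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "with", "in", "on", "to", "is", "was", "for"}


def preprocess(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("Patient was admitted with chest pain and fever."))
```

Stop words are removed because WMD would otherwise spend part of its transport cost matching frequent function words that say little about either case.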
