This platform uses natural language processing to assess the risk that COVID-19 poses for a user's comorbidities. The analysis is based on version 81 of the CORD-19 dataset.
- The user enters their comorbidities.
- The platform prints excerpts of paragraphs closely related to the user's comorbidities.
- It provides information about each paper, such as authors, design method, and number of samples.
- Particularly relevant parts are highlighted in green.
BM25 and a BioBERT model fine-tuned on SQuAD 2.0 were used for paper selection and paragraph excerpt extraction.
Bootstrap, Node.js with Express, EJS, the python-shell npm package, and Apache were used to build the web application.
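BM25 is a lexical ranking function, so the paragraph-selection idea can be illustrated without any model weights. The following is a minimal, self-contained sketch of Okapi BM25 scoring, not the project's actual code: the `BM25` class, the `tokenize` helper, the parameter values `k1=1.5` and `b=0.75`, and the sample paragraphs are all assumptions made for illustration. The BioBERT question-answering step is left out because it needs a fine-tuned model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 scorer over a small in-memory corpus (illustrative only)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed value)
        self.b = b    # length-normalization strength (assumed value)
        self.docs = [tokenize(d) for d in documents]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.term_freqs = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF; the "+ 1" inside the log keeps every weight positive.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0) for t, f in df.items()}

    def score(self, query, index):
        """Return the BM25 score of document `index` for `query`."""
        tf = self.term_freqs[index]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[index] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query, k=3):
        """Return indices of the k best-scoring documents, dropping zero scores."""
        scores = [(self.score(query, i), i) for i in range(len(self.docs))]
        scores.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for s, i in scores[:k] if s > 0]


# Made-up paragraphs standing in for CORD-19 text.
paragraphs = [
    "Diabetes was associated with higher mortality in hospitalized patients.",
    "The virus spreads mainly through respiratory droplets.",
    "Hypertension and diabetes were the most common comorbidities.",
]
bm25 = BM25(paragraphs)
ranked = bm25.top_k("diabetes mortality", k=2)
print(ranked)
```

In a pipeline like the one described, the top-ranked paragraphs would then be passed to the question-answering model, which picks the span to excerpt.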
Data preprocessing, model architecture, and other components draw on other great Kaggle notebooks. In particular, for additional metadata such as design methods and number of samples, we refer to this Kaggle notebook. The referenced notebooks are summarized below.
Team Members:
Seokhan Noh, Seungun Jang
We assume you have installed PyTorch, the necessary CUDA packages, and Node.js.

    # Set up a virtual environment using conda
    conda create -n ccnw
    conda activate ccnw

    # Install git-lfs
    curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
    sudo apt install git-lfs

    # Clone the repository and install Python dependencies
    git clone https://github.com/AI-STACK-dev/Covid19-Comorbidities-NLP-WEB.git
    cd Covid19-Comorbidities-NLP-WEB
    pip install -r requirements.txt

If everything is fine, run:

    node main.js

This project would not have been possible without the following great resources.
- Data
- Data preprocessing
  - Nakatani Shuyo's language-detection
  - davidmezzetti's paperai
  - davidmezzetti's paperetl
  - dskswu's topic-modeling-bert-lda
  - danielwolffram's cord-19-create-dataframe
  - fmitchell259's create-corona-csv-file
  - davidmezzetti's cord-19-etl
  - davidmezzetti's cord19-study-design
  - davidmezzetti's cord19-fasttext-vectors
- NLP model & pipeline