GitHub - baishalidutta/EPAR-Sentiment-Analysis: Sentiment Analysis Tool for EPARs

Motivation

Pharmaceutical companies need to submit scientific evidence and clinical trial data to support an application for a new drug or for an existing drug. The European Medicines Agency (EMA) is the central health authority in Europe and is responsible for the application process.

Following an application by a pharmaceutical company and an extensive scientific evaluation, the EMA publishes a European Public Assessment Report (EPAR). EPARs are freely available and, importantly, contain the scientific assessment of an application and the reasons leading to its refusal or approval.

Pharmaceutical companies extract from their EPARs the passages on clinical efficacy that evaluate the setup and parameters of clinical trials. Regulatory colleagues at these companies then label the sentiment of each extracted passage (feedback) as positive, negative, or neutral.

This repository therefore provides a tool to analyse the sentiment of the feedback described above. It also provides a detailed comparison of several sentiment analysis methods.

Requirements

Python 3.7.0+
tensorflow 2.4.1+
Keras 2.4.3+
matplotlib 3.3.3+
numpy 1.19.5+
pandas 1.2.1+
scikit-learn 0.24.1+
nltk 3.5+
spacy 3.0.3+
textblob 0.15.3+
gradio 1.5.3+
click 7.1.2+
xgboost 1.3.3+
xlrd 2.0.1+
seaborn 0.11.1+

Dataset

The dataset cannot be provided because I do not hold the copyright for it.

However, you can supply any XLSX dataset containing the following columns:

ID
Sentence
Positive
Negative
Neutral

The ID is the index column and the Positive, Negative and Neutral columns are one-hot encoded, that is, for every row, one of these columns contains 1 whereas the others contain 0.
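
To illustrate the one-hot layout described above, here is a minimal, dependency-free sketch that turns a row into a single class label and rejects rows that are not one-hot encoded. It is not code from this repository: the function name `row_to_label` and the example rows are hypothetical, and the rows are plain dictionaries standing in for records read from the XLSX sheet.

```python
LABELS = ("Positive", "Negative", "Neutral")


def row_to_label(row):
    """Return the label whose column holds 1; raise if the row is not one-hot."""
    flags = [int(row[name]) for name in LABELS]
    # A valid row has exactly one 1 and two 0s across the three columns.
    if sorted(flags) != [0, 0, 1]:
        raise ValueError(f"row {row.get('ID')} is not one-hot encoded: {flags}")
    return LABELS[flags.index(1)]


# Hypothetical rows, shaped like records read from the XLSX sheet.
rows = [
    {"ID": 1, "Sentence": "The primary endpoint was met.",
     "Positive": 1, "Negative": 0, "Neutral": 0},
    {"ID": 2, "Sentence": "The study design was not acceptable.",
     "Positive": 0, "Negative": 1, "Neutral": 0},
]
labels = [row_to_label(r) for r in rows]
```

Running a check like this before training catches rows where two columns are set or none is.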

The comments fall into the following classes (types):

Positive
Negative
Neutral

Installation

Clone the repository

git clone https://github.com/baishalidutta/EPAR-Sentiment-Analysis.git

Install the required libraries

pip3 install -r requirements.txt

Model Classifiers

Several classifiers can be trained on the dataset, including classical machine learning and deep learning models.

Currently, the following classical machine learning classifiers are implemented:

Naive Bayes
Decision Tree
Random Forest
XGBoost
Logistic Regression
Linear Support Vector Machine
Kernel Support Vector Machine

The following deep learning classifier is also implemented:

Recurrent Neural Network (RNN) with a Bidirectional LSTM layer

Data Cleaning

Lowercase all text
Correct misspelled words
Remove punctuation
Remove stop words
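
The cleaning steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's implementation: `clean_sentence` and the tiny `STOP_WORDS` set are hypothetical, and the spelling-correction step is left out because it needs an external library (TextBlob, listed in the requirements, offers a `correct()` method that could fill that role).

```python
import string

# Tiny illustrative stop-word list; an assumption, not the list used by the repository.
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "and", "in", "to"}


def clean_sentence(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [token for token in no_punct.split() if token not in stop_words]
    return " ".join(tokens)


cleaned = clean_sentence("The primary endpoint was met, and the design is robust.")
# cleaned == "primary endpoint met design robust"
```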

Data Preprocessing

TF-IDF Vectorizer for classical machine learning models
Tokenization and GloVe 6B (glove.6B) embedding vectors for the deep learning model
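
For readers unfamiliar with TF-IDF, the sketch below computes the weights by hand on three hypothetical, already cleaned sentences. It assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalisation of each row, which matches the defaults documented for scikit-learn's `TfidfVectorizer`. Whether the repository keeps those defaults is an assumption. The function `tfidf` is illustrative and is not taken from the source code.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, rows); rows[i][j] is the weight of term j in document i."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        weights = [counts[t] * idf[t] for t in vocabulary]
        # L2-normalise each row; fall back to 1.0 for an empty document.
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocabulary, rows


# Hypothetical cleaned sentences.
vocabulary, matrix = tfidf(["primary endpoint met", "endpoint not met", "design robust"])
```

Terms that occur in fewer sentences, such as "primary", receive a higher weight than terms shared across sentences, such as "endpoint".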

Usage

Navigate to the source directory before running the commands below.

To train or evaluate on your own, you can list the available options in the client application:

python3 app.py --help

To evaluate the analysis using default data (../data/sentences_with_sentiment.xlsx), you can simply execute the following:

python3 app.py --classifier ID

This will execute the following in order (except for the deep learning model):

Train/test validation with a default split of 0.2
Cross-validation
- K-fold cross-validation with 36 splits by default
- Leave-one-out cross-validation
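
To show how the two cross-validation schemes above partition the data, here is a dependency-free sketch that generates the train and test indices. scikit-learn, listed in the requirements, ships `KFold` and `LeaveOneOut` for this purpose. The functions below are hypothetical stand-ins and not the repository's code, and the sample count of 100 is an arbitrary assumption.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is K-fold with one fold per sample."""
    return k_fold_indices(n_samples, n_samples)


# Hypothetical dataset size; 36 is the default number of splits mentioned above.
folds = list(k_fold_indices(n_samples=100, n_splits=36))
loo = list(leave_one_out_indices(5))
```

Every sample lands in exactly one test fold, so each sentence is used for evaluation once and for training in all other folds.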

You can also perform the grid search on classical machine learning classifiers:

python3 app.py --grid

This will execute the following in order:

Grid search with K-fold cross-validation with 36 splits by default
Grid search with leave-one-out cross-validation
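
Grid search simply evaluates every combination of candidate hyperparameters and keeps the best-scoring one. The sketch below shows that loop in isolation. Everything in it is hypothetical: the `grid_search` helper, the parameter grid, and `toy_score`, which stands in for the mean cross-validated score of a classifier trained with the given parameters.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the names mimic common SVM hyperparameters.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}


def toy_score(params):
    """Stand-in scorer; a real one would return a mean cross-validated score."""
    return -abs(params["C"] - 1.0) + (0.1 if params["kernel"] == "linear" else 0.0)


best_params, best_score = grid_search(param_grid, toy_score)
```

With leave-one-out cross-validation the scorer trains one model per sample for every combination, so the run time grows quickly with the grid size.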

To execute the deep learning model, specify its ID. Note that this will not run the K-fold or leave-one-out cross-validations.

Alternatively, you can find the complete analysis in the notebook inside the notebook directory. To open it, use Jupyter Notebook, Google Colab, or any IDE that supports notebooks, such as PyCharm Professional.

Web Application

To run the web application locally, execute:

python3 app_web.py

This will start a local server that you can access in your browser. By default, the server is started at http://127.0.0.1:7860/. You can type in any sentence, choose a classifier, and see which target class it predicts.

Developer

Baishali Dutta (me@itsbaishali.com)

License

This project is licensed under the Apache License, Version 2.0.
