#Q1


This article describes MADEx, a system that detects medications, adverse drug events (ADEs), and the relations between them in clinical notes. It discusses the challenges of mining clinical text for this purpose and the importance of accurate detection for pharmacovigilance. MADEx is a machine learning-based system built on recurrent neural networks (RNNs). It was evaluated in the 2018 MADE1.0 challenge, where it achieved high performance.

#Literature Review on ADE Detection:

The paper titled "MADEx: A Machine Learning-based NLP System for Detecting Medications, Adverse Drug Events, and Their Relations from Clinical Notes" discusses the use of deep learning models for detecting Adverse Drug Events (ADEs) in Electronic Health Records (EHRs). Its key findings, methodologies, datasets, and performance metrics are summarized as follows:

#1. Key Findings:

Early detection of ADEs from EHRs is crucial for pharmacovigilance and drug safety surveillance.
Clinical NLP plays a vital role in extracting information from unstructured clinical text for ADE detection.
The study introduces MADEx, a machine learning-based clinical NLP system, for detecting medications, ADEs, and their relations from clinical notes.
MADEx utilizes a Recurrent Neural Network (RNN) model with Long Short-Term Memory (LSTM) for clinical Named Entity Recognition (NER) and a Support Vector Machines (SVMs) model for relation extraction.
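
A sequence labeler such as the LSTM-based NER model mentioned above typically emits one tag per token, and those tags then have to be grouped into entity spans. The sketch below assumes a BIO tagging scheme and uses illustrative tag names (`Drug`, `ADE`); neither detail is stated in this summary, so treat it as one plausible decoding step rather than MADEx's actual implementation.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, start, end, text) spans.

    `end` is exclusive. The BIO scheme and the tag names used in the
    example are assumptions made for illustration.
    """
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside and current is not None:
            # The open span ends just before position i
            spans.append((current, start, i, " ".join(tokens[start:i])))
            start, current = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            # A stray "I-" without a preceding "B-" is treated as a span start
            start, current = i, tag[2:]
    return spans


tokens = ["Patient", "developed", "severe", "rash", "after", "amoxicillin", "."]
tags = ["O", "O", "B-ADE", "I-ADE", "O", "B-Drug", "O"]
print(bio_to_spans(tokens, tags))
```

The resulting spans are what a downstream relation extraction step would receive as its input entities.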

#2. Methodologies:

For clinical NER, the study compares the LSTM-CRFs model with a baseline Conditional Random Fields (CRFs) model. The LSTM-CRFs model incorporates character-level embedding, bidirectional LSTM, and dropout.
Relation extraction involves the comparison of SVMs and Random Forests (RFs) for both single-sentence and cross-sentence relations.
The integrated pipeline combines the NER and relation extraction modules into a unified system for extracting entities and relations together.
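
To make the single-sentence versus cross-sentence distinction concrete, the sketch below enumerates candidate entity pairs that a relation classifier (an SVM or random forest in the study) would then accept or reject. The `Entity` fields, the `max_sentence_gap` cutoff, and the rule that exactly one side of each pair must be a `Drug` are assumptions for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    text: str
    etype: str         # e.g. "Drug", "ADE", "Dose" (illustrative type names)
    sentence_idx: int  # index of the sentence that contains the entity


def candidate_pairs(entities, max_sentence_gap=1):
    """Yield (drug, other, scope) candidates for a relation classifier.

    scope is "single" when both entities share a sentence and "cross"
    when they are at most `max_sentence_gap` sentences apart.
    """
    for a, b in combinations(entities, 2):
        # Keep only pairs with exactly one Drug entity (assumed rule)
        if (a.etype == "Drug") == (b.etype == "Drug"):
            continue
        drug, other = (a, b) if a.etype == "Drug" else (b, a)
        gap = abs(a.sentence_idx - b.sentence_idx)
        if gap > max_sentence_gap:
            continue
        yield drug, other, "single" if gap == 0 else "cross"


entities = [
    Entity("amoxicillin", "Drug", 0),
    Entity("500 mg", "Dose", 0),
    Entity("rash", "ADE", 1),
    Entity("nausea", "ADE", 3),
]
pairs = list(candidate_pairs(entities))
for drug, other, scope in pairs:
    print(f"{drug.text} -> {other.text} ({other.etype}, {scope})")
```

Limiting the sentence gap keeps the number of candidates manageable, at the cost of missing relations that span a longer distance.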

#3. Datasets:

The study uses the Medication and Adverse Drug Events (MADE1.0) challenge dataset, consisting of 1,089 de-identified clinical notes with annotations for medications, ADEs, indications, and other signs and symptoms.
The dataset is divided into a training set of 876 notes and a test set of 213 notes, with annotations for 79,114 entities and 27,175 relations.

#4. Performance Metrics:

MADEx ranked among the top three systems for clinical NER in the 2018 MADE1.0 challenge, with an F1-score of 0.8233.
The relation extraction module and integrated pipeline of MADEx are comparable to the best systems developed in the challenge according to post-challenge evaluation.
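
For reference, the F1-score is the harmonic mean of precision and recall. A minimal sketch of entity-level scoring follows; it assumes strict matching, where a predicted entity counts only if its type and both offsets equal a gold annotation. The challenge's official scorer may use different matching rules, and the example spans are invented.

```python
def entity_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over sets of entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Entities as (type, start_offset, end_offset); the values are made up
gold = {("Drug", 10, 21), ("ADE", 30, 34), ("Dose", 22, 28)}
pred = {("Drug", 10, 21), ("ADE", 30, 36)}  # second span has a wrong end offset

p, r, f1 = entity_prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```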

#5. Conclusion:

The study demonstrates the efficiency of deep learning methods, specifically LSTM-CRFs, for automatic extraction of medications, ADEs, and their relations from clinical text.
Combining recurrent neural networks and support vector machines in a hybrid system achieves good performance in detecting medications, adverse drug events, and their relations.
No separate validation set was available, which underscores the importance of having as many training samples as possible and shows how much dataset size matters in that setting.

In summary, the paper highlights the advancements in ADE detection using deep learning models and emphasizes the significance of clinical NLP in extracting valuable information from unstructured clinical text for pharmacovigilance and drug safety surveillance. MADEx emerges as an effective system in this context, showcasing the potential of hybrid approaches combining deep learning and traditional machine learning methods.

#Q2

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from tqdm import tqdm

# Load the ADE dataset into two parallel lists: sentences and labels.
# The parsing below assumes a pipe-delimited file with the sentence in the second
# field; adjust the path, field index, and label rule to match your copy of the data.
with open('DRUG-AE.rel', 'r', encoding='utf-8') as file:
    lines = [line for line in file if line.strip()]
sentences = [line.split('|')[1].strip() for line in lines]
labels = [1 if 'Adverse-Effect' in line else 0 for line in lines]

# Split the dataset into training and testing sets
sentences_train, sentences_test, labels_train, labels_test = train_test_split(sentences, labels, test_size=0.2, random_state=42)

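# Example tokenizer function
def tokenize(sentence):
    # Simple lowercase whitespace tokenizer; swap in spaCy or nltk if needed
    return sentence.lower().split()

# Build the vocabulary from the training sentences only (0 = padding, 1 = unknown)
vocab = {'<pad>': 0, '<unk>': 1}
for sentence in sentences_train:
    for token in tokenize(sentence):
        vocab.setdefault(token, len(vocab))

max_len = 64  # Sentences are truncated or left-padded to this many tokens

# Define a simple dataset class that returns fixed-length tensors of token ids
class ADEDataset(Dataset):
    def __init__(self, sentences, labels, tokenizer):
        self.sentences = sentences
        self.labels = labels
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        tokens = self.tokenizer(self.sentences[idx])[:max_len]
        ids = [vocab.get(token, vocab['<unk>']) for token in tokens]
        # Left-pad so that the final time step of every sequence is a real token
        ids = [vocab['<pad>']] * (max_len - len(ids)) + ids
        return {'tokens': torch.tensor(ids, dtype=torch.long), 'label': self.labels[idx]}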

# Hyperparameters
embedding_dim = 100
hidden_dim = 128
output_dim = 1
batch_size = 32
learning_rate = 0.001
num_epochs = 5

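# Model architecture
class ADEModel(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, rnn_type='lstm'):
        super().__init__()
        # `vocab` must map tokens to integer ids and be defined before the model is built
        self.embedding = nn.Embedding(len(vocab), embedding_dim)
        if rnn_type == 'lstm':
            self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        elif rnn_type == 'bilstm':
            self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
        elif rnn_type == 'gru':
            self.rnn = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        else:
            raise ValueError(f'Unsupported rnn_type: {rnn_type}')
        # A bidirectional RNN yields forward and backward states, doubling the feature size
        num_directions = 2 if self.rnn.bidirectional else 1
        self.fc = nn.Linear(hidden_dim * num_directions, output_dim)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        embedded = self.embedding(x)
        _, hidden = self.rnn(embedded)
        # nn.LSTM returns (h_n, c_n) as its second output; nn.GRU returns h_n only
        h_n = hidden[0] if isinstance(hidden, tuple) else hidden
        if self.rnn.bidirectional:
            # Concatenate the final forward and backward hidden states
            final = torch.cat([h_n[-2], h_n[-1]], dim=1)
        else:
            final = h_n[-1]
        return self.sigmoid(self.fc(final))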

# Training loop
def train_model(model, train_loader, criterion, optimizer, num_epochs):
    model.train()
    for epoch in range(num_epochs):
        total_loss = 0.0
        for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{num_epochs}'):
            tokens = batch['tokens']
            labels = batch['label'].float()

            optimizer.zero_grad()
            output = model(tokens)
            loss = criterion(output, labels.unsqueeze(1))
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(train_loader)
        print(f'Training Loss: {avg_loss:.4f}')

# Evaluation function
def evaluate_model(model, test_loader, criterion):
    model.eval()
    all_labels = []
    all_preds = []
    total_loss = 0.0
    with torch.no_grad():
        for batch in tqdm(test_loader, desc='Evaluating'):
            tokens = batch['tokens']
            labels = batch['label'].float()

            output = model(tokens)
            loss = criterion(output, labels.unsqueeze(1))
            total_loss += loss.item()

            # Flatten (batch, 1) probabilities to (batch,) so the metrics receive 1-D scores
            preds = output.squeeze(1).cpu().numpy()
            all_labels.extend(labels.cpu().numpy())
            all_preds.extend(preds)

    avg_loss = total_loss / len(test_loader)
    y_true = [1 if label > 0.5 else 0 for label in all_labels]
    y_pred = [1 if pred > 0.5 else 0 for pred in all_preds]

    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    roc_auc = roc_auc_score(y_true, all_preds)

    print(f'Evaluation Loss: {avg_loss:.4f}')
    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')
    print(f'AUC-ROC: {roc_auc:.4f}')

# Instantiate the model, criterion, and optimizer
model_lstm = ADEModel(embedding_dim, hidden_dim, output_dim, rnn_type='lstm')
criterion = nn.BCELoss()
optimizer = optim.Adam(model_lstm.parameters(), lr=learning_rate)

# Create datasets and loaders
train_dataset = ADEDataset(sentences_train, labels_train, tokenize)
test_dataset = ADEDataset(sentences_test, labels_test, tokenize)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Train and evaluate the LSTM model
train_model(model_lstm, train_loader, criterion, optimizer, num_epochs)
evaluate_model(model_lstm, test_loader, criterion)
