## Fine-tuning Multiclassification 13 Cancer Symtoms

In [65]:
import os # Standard library for file and directory operations
import numpy as np # Scientific computing library for array objects
import pandas as pd # Data manipulation and analysis library
import torch  # PyTorch is a machine learning library for tensor operations and automatic differentiation
import torch.nn as nn  # Neural network module
import matplotlib.pyplot as plt  # Plotting library for creating static, interactive, and animated visualizations
from sklearn.model_selection import train_test_split # Scikit-learn model selection for splitting data
from torch.utils.data import DataLoader, TensorDataset  # DataLoader and TensorDataset for handling datasets in PyTorch

# Transformers library for NLP models and utilities
from transformers import (
    AutoTokenizer,                 # Tokenizer for converting text to tokens
    AutoModelForSequenceClassification, # Model for sequence classification tasks
    AdamW,                         # Adam optimizer with weight decay fix
    get_linear_schedule_with_warmup # Scheduler for learning rate adjustment
)

# Scikit-learn metrics module for model evaluation
from sklearn.metrics import (
    classification_report,  # Detailed classification metrics
    confusion_matrix,       # Compute the confusion matrix to evaluate accuracy
    precision_recall_curve, # Compute precision-recall pairs for different probability thresholds
    roc_auc_score,          # Compute the Area Under the Receiver Operating Characteristic Curve
    roc_curve,              # Compute ROC curve points
    auc                     # Compute the area under a curve
)


## Initialization Setup

This section set up for saving output files.

### Setup for Output Files

- **Folder Name Specification:**
  - A folder named `"results_BERT_models"` is designated to store all CSV output files from model evaluations or other outputs.
  
- **Folder Creation:**
  - The specified folder is created if it does not already exist. This ensures that all output files have a dedicated storage location, preventing any loss of data and maintaining organization.


In [66]:
# Define the folder name where you want to save the CSV files
folder_name = "results_BERT_models"

# Create the folder if it doesn't already exist
if not os.path.exists(folder_name):
    os.makedirs(folder_name)

## Data Loading and Preprocessing

This section outlines the steps taken to load and preprocess the datasets used in the analysis.

### Loading the Datasets

1. **Gold Standard Corpus:**
   - The dataset `df_Gold_Standard_Corpous.csv` contains the gold standard annotations and is loaded into a DataFrame named `df`.
2.  **External Data:**
  - The dataset `df_gpt_external.csv` contains external data for validation or testing, is loaded into the DataFrame `df_external`.
  
3. **Cohort Dataset:**
   - The `df_clean_merged_table.csv` file, which contains cohort data for labeling and identifying labels for each text document, is loaded into a DataFrame named `df_cohort_usecase`.


In [67]:
df=pd.read_csv('df_Gold_Standard_Corpous.csv')
df_external = pd.read_csv('gpt_external.csv')
df_cohort_usecase = pd.read_csv("df_clean_merged_table.csv")

### Data Preparation for Cohort Use Case

This section explains how the code processes a DataFrame specifically for a cohort use case, focusing on symptom analysis.

#### Libraries:
- **Pandas**: Used for handling and manipulating the data in DataFrame format.

#### Steps Involved:

1. **Extracting Cohort Symptoms**:
   - `cohort_symptoms` is derived from the columns of `df_cohort_usecase` starting from the fourth column onward. This is done using `df_cohort_usecase.columns[3:].tolist()`, which captures all columns after the first three as a list of symptoms.

2. **Renaming Columns**:
   - The `SentText` column in `df_cohort_usecase` is renamed to `Note` to standardize the column names or to reflect the content more accurately.

3. **Adding New Columns**:
   - A list `new_columns` is defined, containing the names of new symptoms to be added to the DataFrame. These symptoms include 'Fatigue', 'Depressed Mood', and several others, representing different health conditions relevant to the cohort analysis.

4. **Initializing New Columns**:
   - Each symptom in `new_columns` is added to `df_cohort_usecase` with an initial value of 0. This could represent the absence of these symptoms or a baseline value before actual data is populated. You can replace `0` with `None` or another appropriate default value depending on the context of the data and the analysis requirements.

#### Code Usage:
This script is tailored for situations where new data dimensions (symptoms) need to be integrated into an existing dataset, often in preparation for more extensive data analysis or machine learning modeling.



In [68]:
cohort_symptoms = df_cohort_usecase.columns[3:].tolist()
df_cohort_usecase = df_cohort_usecase.rename(columns={'SentText': 'Note'})

# List of new columns to add
new_columns = ['Fatigue', 'Depressed_Mood', 'Constipation', 'Anxiety', 'Swelling', 'Nausea',
               'Appetite_Loss', 'Pain', 'Numbness', 'Impaired_Memory', 'Pruritus',
               'Shortness_of_Breath', 'Disturbed_Sleep']

# Initialize each new column with a default value, e.g., None or 0
for column in new_columns:
    df_cohort_usecase[column] = 0  # Replace None with your desired default value



## Model Configuration and Data Preparation

This section provides an overview of the initial setup for machine learning models, including defining model abbreviations, setting a seed for reproducibility, and splitting the data for training and testing purposes.

### Model Names and Abbreviations

A dictionary named `model_names` maps various model identifiers to their abbreviations, enhancing the readability and ease of reference throughout the project:

- **BERT**: `bert-base-uncased`
- **SpanBERT**: `SpanBERT/spanbert-large-cased`
- **BioBERT**: `dmis-lab/biobert-v1.1`
- **ClinicalBERT**: `emilyalsentzer/Bio_ClinicalBERT`
- **SciBERT**: `allenai/scibert_scivocab_uncased`
- **PubMedBERT**: `microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract`
- **DistilBERT**: `distilbert-base-uncased`
- **Symptom BERT**: `New_Bio-Clinical_BERT_finetuned`


- **Seed Setting**: The seed is set to `42` to guarantee reproducibility.

- **Split Dataset**: The dataset is organized into distinct training and testing sets to facilitate effective model training and performance evaluation:

- **Symptom Label Extraction**:Symptom labels are extracted from the datasets to aid in model training and evaluation, ensuring that labels are consistently handled across different datasets:

- Main Dataset Symptoms: Extracted from columns starting after the third column in df.
- External Dataset Symptoms: Starting from the second column in df_external.
- Cohort Dataset Symptoms: Starting from the fourth column in df_cohort_usecase.


In [69]:
# Model names with abbreviations
model_names = {
    #"bert-base-uncased": "BERT",
    #"SpanBERT/spanbert-large-cased": "SpanBERT",
    #"dmis-lab/biobert-v1.1": "BioBERT",
    #"emilyalsentzer/Bio_ClinicalBERT": "Bio-ClinicalBERT",
    #"allenai/scibert_scivocab_uncased": "SciBERT",
   # "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract": "PubMedBERT",
   # "distilbert-base-uncased": "DistilBERT",
    "New_Bio-Clinical_BERT_finetuned": "Symptom_BERT"
}

# Set the seed for reproducibility
torch.manual_seed(42)

# Split the dataset into train and test sets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

# Assuming all columns after 'Note' in the dataframe are symptom labels
symptoms = df.columns[3:].tolist()
ex_symptoms = df_external.columns[1:].tolist()
cohort_symptoms = df_cohort_usecase.columns[3:].tolist()
num_symptoms = len(symptoms)
num_symptoms

13

## Data Processing Function

The `process_data` function is designed to prepare textual data for machine learning models, specifically tailored for use with NLP models that require tokenized input. Below is a summary of its functionality:

### Parameters
- `df`: The DataFrame containing the text data.
- `labels_list`: A list of column names in `df` that contain the labels for the text data.
- `tokenizer`: A tokenizer object, typically from the `transformers` library, used to convert text into a format suitable for model input.
- `max_length`: The maximum length of the tokenized sequences. If texts are longer than this, they will be truncated to this length.

### Process
1. **Extract Texts and Labels**:
   - Texts are extracted from the column `'Note'` of the DataFrame.
   - Labels are obtained based on the `labels_list`, converted to numeric format, handling errors by coercion and filling missing values with zero.

2. **Tokenization**:
   - The extracted texts are tokenized using the provided `tokenizer`. The tokenizer pads or truncates the texts to ensure uniform length, specified by `max_length`.
   - This process generates `input_ids` and `attention_mask` which are essential for models to understand which parts of the sequence are meaningful and which are padding.

3. **Tensor Conversion**:
   - The labels are converted to a tensor of type `float32`.
   - `input_ids` and `attention_mask` are bundled with labels into a `TensorDataset`, which is a convenient format for loading data during model training or evaluation.

### Output
- The function returns a `TensorDataset` containing `input_ids`, `attention_mask`, and `labels_tensor`, ready to be used as input for training or inference in a PyTorch model.


In [70]:
# Data processing function
def process_data(df,labels_list, tokenizer, max_length=512):
    texts = df['Note'].tolist()
    labels = df[labels_list].apply(pd.to_numeric, errors='coerce').fillna(0).values
    encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt", max_length=max_length)
    input_ids = encodings['input_ids']
    attention_mask = encodings['attention_mask']
    labels_tensor = torch.tensor(labels, dtype=torch.float32)
    dataset = TensorDataset(input_ids, attention_mask, labels_tensor)
    return dataset


## Evaluation Function for Loss

The `evaluate` function calculates the average loss of a model on a given dataset, useful for assessing the model's performance during or after training.

### Parameters
- `model`: The machine learning model to be evaluated.
- `data_loader`: A DataLoader object that provides batches of data in the form `(input_ids, attention_mask, labels)` format.
- `device`: The computational device (CPU or GPU) where the model computations are performed.
- `loss_fct`: The loss function used to compute the discrepancy between predicted outputs and actual labels.

### Process
1. **Model Preparation**:
   - The model is set to evaluation mode (`model.eval()`) which turns off specific layers and behaviors suited to training, like dropout layers, to ensure consistent predictions.

2. **Batch Processing**:
   - The function iterates over each batch provided by the `data_loader`.
   - Each batch's components (`input_ids`, `attention_mask`, and `labels`) are moved to the specified `device`, ensuring that both model and data are on the same device to avoid errors.

3. **Loss Computation**:
   - The model processes the `input_ids` and `attention_mask` to generate logits (model outputs before activation).
   - The specified `loss_fct` computes the loss between these logits and the actual labels.
   - The loss for each batch is accumulated to calculate the total loss over all batches.

4. **Average Loss Calculation**:
   - After processing all batches, the average loss is computed by dividing the total loss by the number of batches in the `data_loader`.

### Output
- Returns the average loss over all batches in the provided `data_loader`, giving a single scalar value that represents the model's performance in terms of the specified loss function on the given dataset.


In [71]:
# Evaluation function for loss

def evaluate(model, data_loader, device, loss_fct):
    model.eval()
    total_loss = 0.0
    for batch in data_loader:
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)
        with torch.no_grad():
            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs.logits
            loss = loss_fct(logits, labels)
            total_loss += loss.item()
    avg_loss = total_loss / len(data_loader)
    return avg_loss

### Function: `generate_predictions`

This function is designed to generate predictions and their corresponding true labels from a provided data loader. It is intended for use with a pre-trained PyTorch model and assumes a multi-label classification context.

#### Parameters:
- **data_loader** (`DataLoader`): A PyTorch DataLoader that iterates over the dataset, providing batches of input data.
- **symptoms** (`list`): A list of symptoms or conditions that are being predicted. These are used to key the output dictionaries.
- **model** (`torch.nn.Module`): The trained model that will generate predictions.
- **device** (`torch.device`): The device (CPU or GPU) where the model computations will be performed.

#### Returns:
- **tuple** (`dict`, `dict`): A tuple of two dictionaries. The first dictionary contains lists of predictions for each symptom, while the second contains the corresponding true labels.

#### Code Explanation:

- **Model Preparation**:
  - Sets the model to evaluation mode using `model.eval()`, which disables dropout and batch normalization during inference, ensuring consistent predictions.

- **Initialization**:
  - Initializes two dictionaries, `predictions` and `true_labels`, which will store the predicted values and actual labels for each symptom respectively. These dictionaries use symptoms as keys.

- **Prediction Generation**:
  - Iterates over each batch from the `data_loader`:
    1. **Data Transfer**: Transfers input IDs and attention masks to the specified device to ensure computations are performed in the correct hardware context.
    2. **Inference**: Feeds the inputs through the model to obtain logits, the raw model outputs.
    3. **Data Conversion**: Converts the logits and labels from PyTorch tensors to numpy arrays for easier manipulation.
    4. **Data Aggregation**: Updates the `predictions` and `true_labels` dictionaries with the predicted and true values for each symptom, converting logits to lists of predicted probabilities.

- **Output**:
  - Once all batches are processed, the function returns the `predictions` and `true_labels` dictionaries, which now contain the complete set of predictions and actual labels for each symptom across all data.

#### Example Usage:

```python
# Assuming model, data_loader, device, and symptoms list are predefined
predictions, true_labels = generate_predictions(data_loader, symptoms, model, device)


In [72]:
def generate_predictions(data_loader, symptoms, model, device):
    model.eval()

    # Initialize dictionaries to store predictions and true labels for each symptom
    predictions = {symptom: [] for symptom in symptoms}
    true_labels = {symptom: [] for symptom in symptoms}

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch[0].to(device)
            attention_mask = batch[1].to(device)
            labels = batch[2].cpu().numpy()  # Convert labels tensor to numpy array

            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs.logits.cpu().numpy()  # Convert logits tensor to numpy array

            # Update predictions and true labels for each symptom
            for idx, symptom in enumerate(symptoms):
                predictions[symptom].extend(logits[:, idx].tolist())
                true_labels[symptom].extend(labels[:, idx].tolist())
                
    return predictions, true_labels

### Function: `calculate_metrics`

This function computes various classification metrics for each symptom in a given dataset. It handles binary classification metrics, including precision, recall (sensitivity), F1-score, specificity, accuracy, AUPRC (Area Under the Precision-Recall Curve), and AUC (Area Under the Receiver Operating Characteristic Curve).

#### Parameters:
- **predictions** (`dict`): A dictionary where each key is a symptom and the associated value is a list of predicted probabilities for that symptom.
- **true_labels** (`dict`): A dictionary matching the `predictions` structure but containing the actual binary labels for each symptom.
- **symptoms** (`list`): A list of strings representing the symptoms for which metrics are calculated.

#### Returns:
- **dict**: A dictionary of dictionaries, where each top-level key is a metric name and each second-level key is a symptom, with the corresponding metric value.

#### Code Explanation:

- **Initialization**:
  - A dictionary called `metrics` is initialized to store the calculated metrics. Each metric type has its own nested dictionary keyed by symptoms.

- **Metric Calculation**:
  - Iterates over each symptom provided in the `symptoms` list:
    1. **Binarization of Predictions**: Converts predicted probabilities into binary outcomes based on a threshold of 0.5, where predictions equal to or above this threshold are treated as positive (1) and others as negative (0).
    2. **Confusion Matrix**: Utilizes `confusion_matrix` to derive true negatives (tn), false positives (fp), false negatives (fn), and true positives (tp) from the binarized predictions and true labels.
    3. **Calculation of Specific Metrics**:
       - **Specificity**: Computed as tn / (tn + fp).
       - **Accuracy**: Computed as (tp + tn) / (tp + tn + fp + fn).
       - **Classification Report**: Generates a detailed report including precision, recall, and F1-score using `classification_report` with `output_dict=True` for easy extraction of values.
    4. **AUPRC and AUC**:
       - **AUPRC**: Calculated using `precision_recall_curve` and `auc` to determine the area under the curve based on the precision and recall values.
       - **AUC**: Calculated using `roc_auc_score` which provides a statistical measure of how well predictions are capable of distinguishing between classes.

- **Return**:
  - The fully populated `metrics` dictionary is returned, containing all metrics for each symptom, allowing for a comprehensive assessment of model performance across multiple diagnostic categories.

#### Example Usage:

```python
# Assuming predictions, true_labels, and symptoms are predefined
metrics = calculate_metrics(predictions, true_labels, symptoms)
for symptom, metric_values in metrics.items():
    print(f"Metrics for {symptom}:")
    for metric, value in metric_values.items():
        print(f"{metric}: {value:.4f}")


In [73]:
def calculate_metrics(predictions, true_labels, symptoms):
    # Initialize dictionaries to store metrics for each symptom
    metrics = {
        'precision': {},
        'recall': {},
        'f1_score': {},
        'specificity': {},
        'accuracy': {},
        'auprc': {},
        'auc': {}
    }

    for symptom in symptoms:
        # Binarize predictions (considering 0.5 as threshold for demonstration)
        binarized_predictions = [1 if pred >= 0.5 else 0 for pred in predictions[symptom]]
        tn, fp, fn, tp = confusion_matrix(true_labels[symptom], binarized_predictions).ravel()
        specificity = tn / (tn + fp)
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        report = classification_report(true_labels[symptom], binarized_predictions, output_dict=True, zero_division=0)
        metrics['precision'][symptom] = report['weighted avg']['precision']
        metrics['recall'][symptom] = report['weighted avg']['recall']  # Same as sensitivity
        metrics['f1_score'][symptom] = report['weighted avg']['f1-score']
        metrics['specificity'][symptom] = specificity
        metrics['accuracy'][symptom] = accuracy
        precision, recall, _ = precision_recall_curve(true_labels[symptom], predictions[symptom])
        metrics['auprc'][symptom] = auc(recall, precision)
        metrics['auc'][symptom] = roc_auc_score(true_labels[symptom], predictions[symptom])

    return metrics


### Function: `fine_tune_model`

This function fine-tunes a pre-trained model for a specific task using training and validation data loaders. It is structured to work with PyTorch models and leverages optimizers and schedulers from the Hugging Face Transformers library.

#### Parameters:
- **model** (`torch.nn.Module`): The model to be fine-tuned.
- **train_loader** (`DataLoader`): DataLoader for the training set, supplying batches of data.
- **val_loader** (`DataLoader`): DataLoader for the validation set, used for evaluating model performance after each epoch.
- **device** (`torch.device`): The device (CPU or GPU) on which the model computations are performed.
- **epochs** (`int`): Number of total epochs to run the training.
- **learning_rate** (`float`): Initial learning rate for the optimizer.
- **weight_decay** (`float`): Weight decay coefficient for L2 penalty (regularization).

#### Workflow:
1. **Optimizer Initialization**:
   - An `AdamW` optimizer is created with parameters from the model, along with specified learning rate and weight decay. This optimizer is known for combining the benefits of both Adam optimization and L2 regularization.

2. **Scheduler Setup**:
   - A linear schedule with warmup is configured using `get_linear_schedule_with_warmup`. It gradually increases the learning rate from 0 to the initial rate over a number of warmup steps (here set to 0), then linearly decreases it over the total training steps.

3. **Training Loop**:
   - For each epoch:
     - Sets the model to training mode.
     - Initializes `total_loss` to zero.
     - Iterates over each batch in the `train_loader`:
       - Transfers input IDs, attention masks, and labels to the specified device.
       - Performs a forward pass to compute logits.
       - Calculates loss using a predefined loss function.
       - Performs backpropagation and updates the model parameters.
       - Clips gradients to a maximum norm of 1.0 to prevent exploding gradients.
       - Steps the optimizer and the scheduler to update learning rate.
     - After processing all batches, computes the average loss for the epoch.

4. **Validation**:
   - At the end of each epoch, evaluates the model on the validation set using a separate function `evaluate`. This step helps monitor overfitting and adjust training dynamically.

5. **Logging**:
   - Prints the training and validation loss for each epoch to track progress and performance improvements.

#### Example Usage:

```python
# Assuming model, train_loader, val_loader, and device are predefined
epochs = 5
learning_rate = 1e-5
weight_decay = 0.01
fine_tune_model(model, train_loader, val_loader, device, epochs, learning_rate, weight_decay)


In [74]:
# Fine-tuning function
def fine_tune_model(model, train_loader, val_loader, device, epochs, learning_rate, weight_decay):
    optimizer = AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    total_steps = len(train_loader) * epochs
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)
    
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for batch in train_loader:
            input_ids = batch[0].to(device)
            attention_mask = batch[1].to(device)
            labels = batch[2].to(device)

            optimizer.zero_grad()
            outputs = model(input_ids, attention_mask=attention_mask)
            logits = outputs.logits

            loss = loss_fct(logits, labels)
            total_loss += loss.item()

            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            scheduler.step()

        # Evaluation on the validation set
        val_loss = evaluate(model, val_loader, device, loss_fct)
        print(f"Epoch {epoch + 1} Loss: {total_loss / len(train_loader):.4f}, Val Loss: {val_loss:.4f}")

### Model Fine-Tuning and Evaluation Workflow

The provided script outlines a comprehensive workflow for fine-tuning, evaluating, and generating predictions with machine learning models, particularly for sequence classification tasks using the Hugging Face Transformers library.

#### Workflow Overview:

1. **Model and Tokenizer Initialization**:
   - For each model specified in the `model_names` dictionary:
     - A tokenizer is initialized using `AutoTokenizer.from_pretrained`, specifying the model's name.
     - A model is initialized with `AutoModelForSequenceClassification.from_pretrained`, configured to classify a number of labels equal to the length of `symptoms`.

2. **Data Processing**:
   - Data is prepared by processing datasets (`train_df`, `test_df`, `df_external`, and `df_cohort_usecase`) into formats suitable for training and evaluation, using a custom function `process_data`.

3. **Data Loader Setup**:
   - DataLoaders for training, testing, external evaluation, and cohort analysis are created with defined batch sizes and shuffle settings.

4. **Device Configuration**:
   - Determines if CUDA is available and sets the device accordingly, ensuring that models and computations are moved to GPU if available.

5. **Optimization and Loss Configuration**:
   - Configures the AdamW optimizer with specific learning rates and weight decay for regularization.
   - Sets a binary cross-entropy loss function for the sequence classification tasks.

6. **Model Fine-Tuning**:
   - Calls `fine_tune_model` for actual training and validation using defined parameters, optimizers, and data loaders.

7. **Model Evaluation**:
   - Evaluates the fine-tuned model on test,and external to calculate losses.

8. **Prediction and Metrics Calculation**:
   - Generates predictions for each dataset using `generate_predictions`.
   - Calculates classification metrics using `calculate_metrics`, assessing model performance across various metrics such as precision, recall, F1 score, and AUC.

9. **Metrics Storage and Output**:
   - Converts metrics into pandas DataFrames for easy viewing and analysis.
   - Prints and saves the metrics to CSV files, facilitating further analysis and reporting.

#### Example Console Outputs:
- The script logs the progress of model fine-tuning and displays losses and metrics, ensuring that the user is informed of the model performance at every step.
- After processing, it outputs DataFrames summarizing the precision, recall, F1 score, and AUC for each dataset, providing a clear and structured presentation of results.

#### Usage of External Functions:
- `fine_tune_model`, `evaluate`, `generate_predictions`, and `calculate_metrics` are assumed to be defined externally, performing specific tasks as described in the workflow.

#### Example of Metrics Output:
```python
print("Test evaluation")
print(test_df_metrics)


In [75]:
# For each model in the dictionary
for model_name, abbreviation in model_names.items():
    print(f"fine-tuning model: {abbreviation}")

    # Initialize Model and Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_symptoms)
    # Assuming you are using a model from the transformers library and need to set the number of output labels
   
    # Data Processing
    train_dataset = process_data(train_df, symptoms, tokenizer)
    test_dataset = process_data(test_df, symptoms, tokenizer)
    external_dataset = process_data(df_external, ex_symptoms, tokenizer)
    cohort_dataset = process_data(df_cohort_usecase, cohort_symptoms, tokenizer)
    
    # Define the data loaders
    batch_size = 4
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
    external_loader = DataLoader(external_dataset, batch_size=batch_size, shuffle=False)
    cohort_loader=DataLoader(cohort_dataset, batch_size=batch_size, shuffle=False)

    # Set the device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    # Define the optimizer and loss function
    optimizer = AdamW(model.parameters(), lr=2e-5)
    loss_fct = nn.BCEWithLogitsLoss()

    # Fine-tuning and evaluation call
    fine_tune_model(model, train_loader, external_loader, device, epochs=5, learning_rate=3e-5, weight_decay=0.01)

    # After fine-tuning, evaluate on test, external datasets, and cohort dataset
    test_loss = evaluate(model, test_loader, device, loss_fct)
    external_loss = evaluate(model, external_loader, device, loss_fct)
    cohort_loss = evaluate(model, cohort_loader, device, loss_fct)

    print(f"Test Loss: {test_loss:.4f}")
    print(f"External Test Loss: {external_loss:.4f}")
    print(f"*********************************************************************************************")
    
    # Generate predictions and metrics for the test set
    test_predictions, test_true_labels = generate_predictions(test_loader, symptoms, model, device)
    test_metrics = calculate_metrics(test_predictions, test_true_labels, symptoms)
    # Generate predictions and metrics for the external set
    external_predictions, external_true_labels = generate_predictions(external_loader, ex_symptoms, model, device)
    external_metrics = calculate_metrics(external_predictions, external_true_labels, ex_symptoms)
    # Generate predictions and metrics for the cohort datset
    cohort_predictions, cohort_true_labels = generate_predictions(cohort_loader, cohort_symptoms, model, device)
    
    
    # Convert dictionaries to DataFrames for a tabular view
    test_df_metrics = pd.DataFrame({
    'symptom': symptoms,
    'Test Precision': [test_metrics['precision'][s] for s in symptoms],
    'Test Recall': [test_metrics['recall'][s] for s in symptoms],
    'Test F1 Score': [test_metrics['f1_score'][s] for s in symptoms],
    'Test AUC': [test_metrics['auc'][s] for s in symptoms]
     })
    print("Test evaluation")
    print(test_df_metrics)
    metrics_filename = os.path.join(folder_name, f"{abbreviation}_test_metrics.csv")
    test_df_metrics.to_csv(metrics_filename, index=False)
    print(f"Saved test metrics to {metrics_filename}")
    print(f"*********************************************************************************************")
    
    external_df_metrics = pd.DataFrame({
    'symptom': symptoms,  # Assuming external_loader uses the same  structure
    'Ext_Precision': [external_metrics['precision'][s] for s in symptoms],
    'Ex_Recall': [external_metrics['recall'][s] for s in symptoms],
    'Ex_F1 Score': [external_metrics['f1_score'][s] for s in symptoms],
    'Ex_AUC': [external_metrics['auc'][s] for s in symptoms]
    })
    print("Extenal_Validation")
    print (external_df_metrics)
    metrics_filename = os.path.join(folder_name, f"{abbreviation}_external_metrics.csv")
    external_df_metrics.to_csv(metrics_filename, index=False)
    print(f"Saved test metrics to {metrics_filename}")
    print(f"*********************************************************************************************")


fine-tuning model: Symptom_BERT


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at New_Bio-Clinical_BERT_finetuned and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1 Loss: 0.3245, Val Loss: 0.5173
Epoch 2 Loss: 0.2223, Val Loss: 0.4702
Epoch 3 Loss: 0.1555, Val Loss: 0.5240
Epoch 4 Loss: 0.1227, Val Loss: 0.5355
Epoch 5 Loss: 0.1063, Val Loss: 0.5078
Test Loss: 0.1358
External Test Loss: 0.5078
*********************************************************************************************
Test evaluation
                symptom  Test Precision  Test Recall  Test F1 Score  Test AUC
0               Fatigue        0.956578     0.958140       0.956907  0.981894
1        Depressed_Mood        0.911246     0.920930       0.913724  0.882180
2          Constipation        0.986869     0.986047       0.986362  0.997073
3               Anxiety        0.893738     0.893023       0.868514  0.898843
4              Swelling        0.919302     0.925581       0.917233  0.900842
5                Nausea        0.926233     0.930233       0.921375  0.833474
6         Appetite_Loss        0.986251     0.986047       0.985188  0.969535
7                  Pain   

In [None]:
from sklearn.metrics import classification_report, roc_auc_score, roc_curve, auc
import matplotlib.pyplot as plt

# Evaluation
model.eval()

# Initialize dictionaries to store predictions and true labels for each symptom
predictions = {symptom: [] for symptom in symptoms}
true_labels = {symptom: [] for symptom in symptoms}

with torch.no_grad():
    for batch in test_loader:
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].cpu().numpy()  # Convert labels tensor to numpy array

        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits.cpu().numpy()  # Convert logits tensor to numpy array

        # Update predictions and true labels for each symptom
        for idx, symptom in enumerate(symptoms):
            predictions[symptom].extend(logits[:, idx].tolist())  # Get predicted probabilities for symptom
            true_labels[symptom].extend(labels[:, idx].tolist())

plt.figure(figsize=(12, 10))

for symptom in symptoms:
    # Calculate ROC curve and AUC for each symptom
    fpr, tpr, _ = roc_curve(true_labels[symptom], predictions[symptom])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=f'{symptom} (AUC = {roc_auc:.3f})')
    
    # Binarize predictions for precision, recall, F1-score calculations
    binarized_predictions = [1 if pred >= 0.5 else 0 for pred in predictions[symptom]]

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic for Each Symptom with RoBERTa')
plt.legend(loc='lower right')
plt.tight_layout()


# Display the plot
plt.show()

## Labeling cohort study based on the Symtom-BERT

In [None]:
import numpy as np
import pandas as pd

# Create a new DataFrame by selecting relevant columns directly
df_cohort_results = df_cohort[['Note_ID', 'SentID']].copy()
df_cohort_results['NOTE_TXT'] = df_cohort['Note']

def sigmoid(x):
    # Sigmoid function to convert logits to probabilities, assuming x is already a numpy array
    return 1 / (1 + np.exp(-x))

# Assuming cohort_predictions are arrays, apply sigmoid function directly
probabilities = {key: sigmoid(np.array(value)) for key, value in cohort_predictions.items()}

def to_binary_labels(probabilities, threshold):
    # Convert probabilities to binary labels using a threshold
    return (probabilities >= threshold).astype(int)

# Convert probabilities to binary labels with a threshold of 0.5
binary_labels_50 = {key: to_binary_labels(value, 0.5) for key, value in probabilities.items()}

# Append predictions directly to df_cohort_results without looping over each symptom
for symptom in symptoms:
    df_cohort_results[f'{symptom}_pred_50'] = binary_labels_50[symptom]

# Save the DataFrame to a CSV file
df_cohort_results.to_csv('Case_Sentences_BERTLabels.csv', index=False)
print("DataFrame saved")
