In [7]:
import pandas as pd
import config

How to decrease the learning rate of a PyTorch optimizer?

In [3]:
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Assuming you already have your model, loss function, and optimizer defined
model = ...
loss_fn = ...
optimizer = ...

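# Create the ReduceLROnPlateau scheduler; mode='max' because the monitored metric (accuracy) should increase
# (the `verbose` argument is deprecated in recent PyTorch releases, so it is omitted here)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=3)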

# Training loop
epochs = 20
best_accuracy = 0.0

for epoch in range(epochs):
    # Train the model
    # ...

    # Calculate the model performance (accuracy, loss, etc.)
    # For example:
    current_accuracy = 0.85  # Replace this with your actual model evaluation code.

    # Update the learning rate scheduler based on the current model performance
    scheduler.step(current_accuracy)

    # Check if the model performance improved and save the best model
    if current_accuracy > best_accuracy:
        best_accuracy = current_accuracy
        torch.save(model.state_dict(), 'best_model.pth')

    print(f"Epoch [{epoch+1}/{epochs}] - Accuracy: {current_accuracy:.4f}, Best Accuracy: {best_accuracy:.4f}")

# After training, load the best model
model.load_state_dict(torch.load('best_model.pth'))


"""
In this example, the learning rate is multiplied by 0.5 once the model's accuracy has failed to improve for more than 3 consecutive epochs (patience=3), that is, after the fourth epoch in a row without improvement. Set the mode argument to 'min' if you monitor a quantity that should decrease, such as the validation loss, instead of accuracy.

The "ReduceLROnPlateau" scheduler adjusts the learning rate dynamically based on the monitored metric, which can help training keep converging when progress stalls at the current learning rate.

"""

How to read an xlsx file with pandas?

In [13]:
! pip install pandas openpyxl

import pandas as pd

# Replace 'your_file.xlsx' with the path to your XLSX file
file_path = 'your_file.xlsx'

# Read the XLSX file into a DataFrame
df = pd.read_excel(file_path)

# Now, you can work with the DataFrame 'df' as needed
print(df.head())  # Display the first few rows of the DataFrame

# Read a specific sheet named 'Sheet2' into a DataFrame
df = pd.read_excel(file_path, sheet_name='Sheet2')


## Handling an imbalanced dataset

Data Augmentation: Augmenting the minority class (TB) by creating additional synthetic samples can help balance the dataset. Techniques like rotation, flipping, translation, and adding noise can be applied to the TB samples to generate more diverse examples.
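A rough sketch of such an augmentation step using only `torch` is shown below. The helper name `augment`, the noise level, and the restriction to 90-degree rotations of square images are assumptions made for illustration; whether each transform preserves the label for your images (for example, flipping a chest X-ray) needs to be checked for your data.

```python
import torch

def augment(image: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
    """Return a randomly flipped, rotated, and noised copy of a (C, H, W) image in [0, 1]."""
    out = image.clone()
    # Horizontal flip with probability 0.5
    if torch.rand(1).item() < 0.5:
        out = torch.flip(out, dims=[-1])
    # Rotate by a random multiple of 90 degrees (keeps the shape for square images)
    k = int(torch.randint(0, 4, (1,)).item())
    out = torch.rot90(out, k, dims=(-2, -1))
    # Add Gaussian noise and clip back to the valid intensity range
    out = out + noise_std * torch.randn_like(out)
    return out.clamp(0.0, 1.0)

torch.manual_seed(0)
image = torch.rand(1, 32, 32)   # made-up grayscale image
augmented = augment(image)
print(augmented.shape)
```

In practice a function like this would be applied on the fly to minority-class samples inside the dataset's `__getitem__`, so each epoch sees a different variant.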

Data Undersampling: Randomly removing samples from the majority class (normal) to reduce its dominance in the dataset. Undersampling can be effective when you have a large number of samples in the majority class and you're concerned about the computational overhead of working with an imbalanced dataset.

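Data Oversampling: Duplicating or generating new samples for the minority class (TB) to increase its representation. You can use techniques like SMOTE (Synthetic Minority Over-sampling Technique) to create synthetic samples based on existing ones. SMOTE interpolates between neighboring minority samples in feature space, so it is usually applied to tabular features or embeddings rather than to raw images.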
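The two resampling strategies above can be sketched with the standard library alone. This is plain random resampling of sample indices, not SMOTE; the helper name `resample_indices` is hypothetical, and the label list simply reuses the 3500/700 split from these notes as example data.

```python
import random
from collections import Counter

def resample_indices(labels, strategy="over", seed=0):
    """Return shuffled sample indices in which every class has the same count."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    sizes = [len(indices) for indices in by_class.values()]
    # Oversampling grows every class to the largest size,
    # undersampling shrinks every class to the smallest size.
    target = max(sizes) if strategy == "over" else min(sizes)

    picked = []
    for indices in by_class.values():
        if len(indices) >= target:
            # Keep `target` distinct samples (drops the surplus when undersampling)
            picked.extend(rng.sample(indices, target))
        else:
            # Keep all originals, then draw duplicates with replacement
            picked.extend(indices)
            picked.extend(rng.choices(indices, k=target - len(indices)))
    rng.shuffle(picked)
    return picked

labels = [0] * 3500 + [1] * 700          # 0 = normal, 1 = TB (example counts)
over = resample_indices(labels, "over")
under = resample_indices(labels, "under")
over_counts = Counter(labels[i] for i in over)
under_counts = Counter(labels[i] for i in under)
print(over_counts, under_counts)
```

The returned indices can be fed to something like `torch.utils.data.Subset` to build the balanced training set; resampling should only be applied to the training split, never to validation or test data.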

Class Weighting: Most machine learning algorithms and libraries allow you to assign different weights to each class during training. By giving higher weight to the minority class, the model focuses more on learning from those samples.

Generating Prototypes: For the minority class, you can generate prototypes using clustering algorithms or other techniques, which represent characteristic patterns of the class and then use them in the training process.

Using Different Evaluation Metrics: Accuracy is not always the best metric for imbalanced datasets. Instead, consider using evaluation metrics like precision, recall, F1-score, or area under the receiver operating characteristic curve (AUC-ROC), which give better insights into model performance with imbalanced data.
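As a small illustration of why accuracy can mislead, the sketch below scores a made-up classifier that predicts "normal" for every sample, using the 3500/700 split from these notes. The helper `binary_metrics` is hypothetical and written out by hand to show the formulas; a library such as scikit-learn would normally be used instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [0] * 3500 + [1] * 700   # 0 = normal, 1 = TB (example counts)
y_pred = [0] * 4200               # degenerate model: always predicts "normal"
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The accuracy comes out at about 0.83 even though the model never detects a single TB case, while recall and F1 for the TB class are 0.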

Ensemble Methods: Ensemble methods such as Random Forest, Gradient Boosting, or XGBoost are often more robust to class imbalance than a single model, particularly when combined with class weights or resampled training sets (for example, XGBoost's scale_pos_weight parameter).

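Model Selection: Experiment with different algorithms and architectures to see which ones perform better on imbalanced data. Models such as SVMs and decision trees can cope well when they are trained with class weights (for example, class_weight='balanced' in scikit-learn); without them they tend to favor the majority class.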

Combine Techniques: Often, the best results are achieved by combining multiple strategies. For example, you can apply data augmentation, oversampling, and class weighting together to improve performance.

Transfer Learning: Consider using transfer learning, where you leverage pre-trained models on larger datasets to fine-tune them on your imbalanced dataset. Pre-trained models have learned general features from vast amounts of data and can potentially perform better even with limited data.

## Class Weighting method

In [2]:
import torch
import torch.nn as nn

# Assuming you have defined your model, optimizer, dataloaders, and other necessary components
device = ...
num_epochs = ...
train_dataloader = ...
model = ...
optimizer = ...

# Calculate class weights based on the frequency of samples in each class
# Class 0 ('normal') gets weight 1.0; class 1 ('TB') gets 3500/700 = 5.0 (majority count / minority count)
class_weights = torch.tensor([1.0, 3500/700])

# Convert the class weights to device (CPU or GPU)
class_weights = class_weights.to(device)

# Define the loss function with class weighting
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Rest of the training loop remains the same
for epoch in range(num_epochs):
    # Training process
    for inputs, labels in train_dataloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
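Instead of hard-coding the ratio, the weights can be derived from the training labels. The sketch below uses the common "inverse frequency" formula `total / (num_classes * count)`; the helper name `inverse_frequency_weights` and the label list are assumptions for illustration, and the helper assumes every class appears at least once.

```python
from collections import Counter

import torch
import torch.nn as nn

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by total / (num_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    weights = [total / (num_classes * counts[c]) for c in range(num_classes)]
    return torch.tensor(weights, dtype=torch.float32)

labels = [0] * 3500 + [1] * 700              # 0 = normal, 1 = TB (example counts)
class_weights = inverse_frequency_weights(labels, num_classes=2)
print(class_weights)                          # TB weight is 5x the normal weight

criterion = nn.CrossEntropyLoss(weight=class_weights)

# Quick check on made-up logits: the loss is a finite scalar
logits = torch.zeros(4, 2)
targets = torch.tensor([0, 0, 1, 1])
loss = criterion(logits, targets)
print(loss.item())
```

These weights have the same 1:5 ratio as `[1.0, 3500/700]`. With the default `reduction='mean'`, `CrossEntropyLoss` divides by the sum of the weights of the targets in the batch, so only the ratio between class weights matters, not their absolute scale.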


# Where the dropout layer should go

In general, the dropout layer is placed after the activation function, although for ReLU specifically the order makes no difference to the result. A typical order of operations in a neural network layer is as follows:

1. Linear transformation (e.g., a fully connected layer or a convolutional layer)
2. Activation function (e.g., ReLU)
3. Dropout layer

The dropout layer is a regularization technique that helps prevent overfitting by randomly setting a fraction of its input units to 0 during training and rescaling the remaining ones. Because ReLU maps 0 to 0 and commutes with multiplication by a positive constant, applying dropout before or after ReLU yields the same output for the same dropout mask. For activations that do not map 0 to 0, such as sigmoid, the two orders differ, and dropout is conventionally applied to the activation's output.
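To see how the placement interacts with ReLU, the following sketch applies dropout before and after the activation with the same random mask and compares the results. Resetting the seed to reproduce the mask is an assumption that holds for CPU tensors of the same shape; the input tensor and layer sizes are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)   # made-up pre-activation values
p = 0.5

torch.manual_seed(0)
dropout_then_relu = F.relu(F.dropout(x, p=p, training=True))

torch.manual_seed(0)    # same seed, same shape -> same dropout mask
relu_then_dropout = F.dropout(F.relu(x), p=p, training=True)

same = torch.allclose(dropout_then_relu, relu_then_dropout)
print(same)

# A block written in the Linear -> ReLU -> Dropout order
block = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p))

# Dropout is only active in training mode; in eval mode the block is deterministic
block.eval()
with torch.no_grad():
    out1 = block(x)
    out2 = block(x)
deterministic_in_eval = torch.equal(out1, out2)
print(deterministic_in_eval)
```

Remember to call `model.eval()` before validation or inference; otherwise dropout keeps zeroing units and the reported metrics will be noisy.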

# Design patterns to use

### 1. Builder pattern
### 2. 