This notebook builds and evaluates a binary image classifier that detects pneumonia in chest X-ray images by fine-tuning a pretrained Vision Transformer (ViT).

#**Downloading Dataset From Kaggle**

In [None]:
from google.colab import files
files.upload()

In [None]:
import os
import zipfile

# Make Kaggle folder and move file
os.makedirs('/root/.kaggle', exist_ok=True)
!mv kaggle.json /root/.kaggle/kaggle.json

# Set permissions
!chmod 600 /root/.kaggle/kaggle.json


In [None]:
!kaggle datasets download -d paultimothymooney/chest-xray-pneumonia

In [None]:
!unzip chest-xray-pneumonia.zip

#**Installing Required Libraries**

In [None]:
!pip install torch torchvision transformers datasets
!pip install mlflow
!pip install pyngrok


#**Hugging Face Authentication**

To access models and datasets hosted on the Hugging Face Hub, we need to authenticate using a personal access token (PAT). The following code logs you into the Hugging Face Hub using your token:

In [None]:
from huggingface_hub import login
# Do not hardcode the token in the notebook; with no argument, login() prompts for it.
login()


#**Vision Transformer (ViT) Setup for Image Classification**

This code loads a pre-trained Vision Transformer (ViT) model from Hugging Face for binary image classification.

- `ViTImageProcessor`: Prepares input images for the model.
- `ViTForImageClassification`: Loads the ViT model.
- `num_labels=2`: Sets it up for two-class classification.
- `ignore_mismatched_sizes=True`: Lets the checkpoint's 1,000-class ImageNet output layer be replaced with a freshly initialized 2-class layer instead of raising a size-mismatch error.


In [None]:
from transformers import ViTForImageClassification, ViTImageProcessor
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torch import nn, optim
import mlflow
import mlflow.pytorch
import torch
model_name = 'google/vit-base-patch16-224'
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTForImageClassification.from_pretrained(
    model_name,
    num_labels=2,
    ignore_mismatched_sizes=True
)
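The effect of `num_labels=2` on the output layer can be inspected without downloading any weights. The sketch below builds a tiny, randomly initialized ViT from a config; the sizes in `tiny_config` are arbitrary illustration values, not those of `google/vit-base-patch16-224`.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

# Hypothetical miniature config, chosen only so the example runs quickly.
tiny_config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=16,
    num_labels=2,
)
tiny_model = ViTForImageClassification(tiny_config)

# The classification head is a linear layer with one output per label.
head = tiny_model.classifier

# A single all-zero RGB image is enough to check the logits shape.
dummy_batch = torch.zeros(1, 3, 32, 32)
with torch.no_grad():
    logits = tiny_model(pixel_values=dummy_batch).logits

print(head.out_features, tuple(logits.shape))
```

The same check (`model.classifier.out_features`) should apply to the pretrained model loaded above.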


#**Training Setup for ViT Model**

- **Device Selection**: Uses GPU (`cuda`) if available, otherwise CPU.
- **Image Transformations**: Resizes images to 224×224, normalizes using ViT's expected mean/std.
- **Dataset Loading**: Loads training, validation, and test images from folders.
- **DataLoaders**: Batches the data and shuffles the training set.
- **Loss Function**: Uses `CrossEntropyLoss` for classification.
- **Optimizer**: Uses `AdamW` optimizer with a learning rate of `5e-5`.

This setup prepares the data and model for training on chest X-ray images.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=processor.image_mean, std=processor.image_std)
])
train_data = datasets.ImageFolder(root="/content/chest_xray/chest_xray/train", transform=transform)
val_data   = datasets.ImageFolder(root="/content/chest_xray/chest_xray/val", transform=transform)
test_data  = datasets.ImageFolder(root="/content/chest_xray/chest_xray/test", transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
val_loader   = DataLoader(val_data, batch_size=32, shuffle=False)
test_loader  = DataLoader(test_data, batch_size=32, shuffle=False)
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=5e-5)
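As a rough illustration of the normalization step, `transforms.Normalize` applies `(value - mean) / std` to each channel after `ToTensor` has scaled pixels to the range 0 to 1. The mean and std of 0.5 below are assumed for illustration only; the notebook reads the real values from `processor.image_mean` and `processor.image_std`.

```python
def normalize_pixel(value, mean, std):
    """Apply the per-channel formula used by transforms.Normalize."""
    return (value - mean) / std


# Assumed statistics for illustration; check processor.image_mean / image_std.
assumed_mean, assumed_std = 0.5, 0.5

# Black, mid-gray, and white pixels after ToTensor scaling.
scaled_pixels = [0.0, 0.5, 1.0]
normalized = [normalize_pixel(v, assumed_mean, assumed_std) for v in scaled_pixels]
print(normalized)  # [-1.0, 0.0, 1.0]
```

With these assumed statistics, the input range 0 to 1 maps onto -1 to 1.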


#**Training the Vision Transformer (ViT) Model on Chest X-Ray Images**

This code fine-tunes the ViT model for 5 epochs on the chest X-ray training set. The DataLoader feeds batches of images and labels to the model. For each batch, the model produces logits, the cross-entropy loss is computed, gradients are backpropagated, and the AdamW optimizer updates the weights. After each epoch, the code prints the average training loss and the training accuracy. The model runs on the GPU if one is available.

In [None]:
num_epochs = 5
model.train()
for epoch in range(num_epochs):
    running_loss = 0.0
    correct = 0
    total = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(pixel_values=images).logits
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    avg_loss = running_loss / len(train_loader)  # mean loss per batch
    train_accuracy = 100. * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}, Accuracy: {train_accuracy:.2f}%")
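Training accuracy alone says little about how the model does on unseen images. A held-out pass over `val_loader` or `test_loader` could look like the sketch below. The `evaluate` helper and the `TinyStandIn` model are hypothetical names introduced here so the example runs on its own; the stand-in only mimics the `.logits` interface of the ViT model.

```python
from types import SimpleNamespace

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return (mean loss per batch, accuracy in percent) without updating weights."""
    criterion = nn.CrossEntropyLoss()
    model.eval()  # switches off dropout and similar training-only behavior
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            logits = model(pixel_values=images).logits
            total_loss += criterion(logits, labels).item()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / len(loader), 100.0 * correct / total


class TinyStandIn(nn.Module):
    """Hypothetical stand-in exposing the same `.logits` output as the ViT model."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3 * 8 * 8, 2)

    def forward(self, pixel_values):
        return SimpleNamespace(logits=self.linear(pixel_values.flatten(1)))


# Random demo data: ten 8x8 RGB "images" with binary labels.
demo_data = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 2, (10,)))
demo_loader = DataLoader(demo_data, batch_size=4)
demo_loss, demo_acc = evaluate(TinyStandIn(), demo_loader, torch.device("cpu"))
print(f"Loss: {demo_loss:.4f}, Accuracy: {demo_acc:.2f}%")
```

In this notebook the call would be `evaluate(model, val_loader, device)`. Because the helper switches the model to evaluation mode, `model.train()` needs to be called again before any further training.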


#**Saving Model**

In [None]:
torch.save(model.state_dict(), "vit_chest_model.pkl")
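Saving only the `state_dict` stores the weights but not the architecture, so the model has to be rebuilt before the weights are loaded back. The round trip is sketched below with a small `nn.Linear` layer standing in for the ViT model; the file name `demo_weights.pkl` is a placeholder.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in for the fine-tuned model.
original = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_weights.pkl")
    torch.save(original.state_dict(), path)

    # Rebuild the same architecture, then load the saved weights into it.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

weights_match = torch.equal(original.weight, restored.weight)
print(weights_match)
```

For the ViT model, this would mean repeating the `ViTForImageClassification.from_pretrained(...)` call with `num_labels=2` and then calling `load_state_dict` on the result.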


#**Predictions**

In [None]:
from PIL import Image

def predict_image(image_path, model, processor, device):
    model.eval()

    # Load and preprocess the image
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        predicted_class = logits.argmax(-1).item()

    return predicted_class


In [None]:
image_path = "/content/NormalXray.png"
predicted_class = predict_image(image_path, model, processor, device)
class_names = train_data.classes
print(f"Predicted class: {class_names[predicted_class]}")
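`predict_image` returns only the index of the largest logit. If a confidence score is also wanted, the logits can be passed through a softmax. The sketch below does this in plain Python; the logit values are made up for illustration and are not output from the model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a two-class model (index 0 and index 1).
example_logits = [2.0, 0.0]
probs = softmax(example_logits)
predicted = probs.index(max(probs))
print(predicted, [round(p, 4) for p in probs])
```

Inside `predict_image`, the PyTorch equivalent would be `torch.softmax(logits, dim=-1)`.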
