
posts/pytorch-train-mask-rcnn-tutorial/ #44

Open
utterances-bot opened this issue Dec 7, 2023 · 57 comments

Comments

@utterances-bot

Christian Mills - Training Mask R-CNN Models with PyTorch

Learn how to train Mask R-CNN models on custom datasets with PyTorch.

https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/


Hi, thanks for the tutorial.
I am wondering how I can employ a ResNet-101 backbone for this, since the library only provides ResNet-50. How can I customize it?

Owner

cj-mills commented Dec 8, 2023

Hi @nattafahhm,

You can create a resnet101 backbone using the resnet_fpn_backbone function from torchvision:

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model.backbone = backbone

You will probably need to train for longer and at a lower learning rate.
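If you prefer to construct the model around the new backbone from scratch, an untested sketch like the following should also work (num_classes is a placeholder that includes the background class, and depending on your torchvision version the weights argument may be named differently):

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.mask_rcnn import MaskRCNN

# Untested sketch: build a Mask R-CNN model directly around a ResNet-101 FPN backbone
backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model = MaskRCNN(backbone, num_classes=2)  # e.g., 1 object class + background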


fah-iiv commented Dec 8, 2023

Hi, if I create a ResNet-101 backbone using resnet_fpn_backbone, then I can only use ImageNet-1K pretrained weights, right?
How can I use ImageNet-21K?


I am a new member. Thanks for the great tutorial. I want to ask: what software is used to label the images? Thanks

Owner

Hi @fah-iiv,

I don't believe torchvision has pretrained weights for the resnet101 model with ImageNet-21K. Are you referring to something from the timm library?

Owner

Hi @tuan-nmt,

There are free annotation tools like CVAT and automated annotation methods, as shown in the following videos:

I've received several questions about this, so I'll try to make time for a tutorial.


Hi,
Thank you for the tutorial. Could you please let me know what the JSON file should look like when I want to implement this code for two classes? For example, I have a dataset containing cats, and the cats have a number of spots on their bodies, so I think the classes are the cats and their spots. But I don't know what the JSON file should look like or which parts of the code should be modified.

Owner

Hi @jetsonwork,

The answer to your question depends on what format you use for your dataset. The toy dataset used in the tutorial follows the annotation format for the LabelMe annotation tool. The tool's GitHub repository contains example annotations for instance-segmentation with multiple classes:


Thanks for your response.

I’m preparing a dataset for a Mask R-CNN model, involving images of cats and smaller, distinct spots on these cats. While the dataset has more instances of “spots” than “cats,” the latter covers a much larger area in the images. I’m concerned this might bias the model toward the “cat” class due to its larger pixel coverage.

My question is:

Could this difference in area coverage introduce significant training bias towards the “cat” class?

Owner

Hi @jetsonwork,

It could potentially introduce a training bias. However, I recommend getting to a point where you can iterate and experiment before worrying too much about that.


Thank you for the amazing tutorials Chris!


Hi, thanks a ton for the tutorial.
However, I am facing an issue while training the model. I am using my own dataset, which contains multiple instances in each image, and training raises the following error:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.

Training stopped at 15% of the first epoch. What could be the problem, and what do I have to change?

Owner

cj-mills commented Feb 1, 2024

Hi @enamulrafti,
The training code should work with images that contain multiple object instances. The toy dataset used in the tutorial contains images with more than one object, and I've used the code for other such datasets.

Would you mind providing more details from the RuntimeError and the OS and hardware (e.g., CPU or GPU) running the training code?


fah-iiv commented Feb 8, 2024

Hi, thank you once again for the tutorial. I've successfully implemented the Mask R-CNN model following your guide. I have a question regarding the pretraining of Mask R-CNN: Is it possible to train the model with a certain set of classes and then fine-tune it on a different set of classes? For example, could I initially train the model on categories of cat species from a public dataset and later fine-tune it to recognize different species in my own dataset? In this scenario, when I try to continue training the model with the new set of classes, I find that I cannot proceed without adjusting some of the output layers to account for the change in class types.

Additionally, should I consider freezing some parameters during this process, such as setting param.requires_grad to True or False? Your advice on how to approach this would be greatly appreciated.

Owner

cj-mills commented Feb 8, 2024

Hi @fah-iiv,
Are you looking to retain the trained classes from the public dataset when fine-tuning the model on your dataset? For example, if the public dataset contained 20 cat species, would you want to add new species but have the model still recognize the original 20?


joekeo commented Feb 12, 2024

I am trying to run this on my laptop, as I don't have access to the computer with a GPU until next week. I tried specifying the device as CPU, but I keep getting this error:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU

The issue happens when using enumerate(train_dataloader). I tried to modify the source file to force it to use the CPU, but then the same error happens somewhere else.

I also tried using the Linux (CPU) install command (I'm using Linux), to no avail.

What can I do to run the tutorial on my laptop?


joekeo commented Feb 12, 2024

Found the solution:
pin_memory needs to be set to False when creating the dataloaders:

data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'collate_fn': lambda batch: tuple(zip(*batch)),
    'pin_memory': False,
    'pin_memory_device': device
}

Owner

cj-mills commented Feb 12, 2024

Hi @joekeo,

Sorry about that. You are correct that you must turn off the pin_memory settings for the DataLoaders. I updated the code for some of my other tutorials to handle this automatically (link), but it appears I forgot to push the update for this one.

I'll update this tutorial when I have a chance, but for now, here is the new DataLoader initialization code so you don't need to change it manually when you get your GPU:

# Set the training batch size
bs = 4

# Set the number of worker processes for loading data. This should be the number of CPUs available.
num_workers = multiprocessing.cpu_count()

# Define parameters for DataLoader
data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'persistent_workers': True,  # If True, the data loader will not shut down the worker processes after the dataset has been consumed once, keeping the worker dataset instances alive.
    'pin_memory': 'cuda' in device,  # If True, the data loader will copy Tensors into CUDA pinned memory before returning them. Useful when using GPU.
    'pin_memory_device': device if 'cuda' in device else '',  # Specifies the device where the data should be loaded. Commonly set to use the GPU.
    'collate_fn': lambda batch: tuple(zip(*batch)),
}

# Create DataLoader for training data. Data is shuffled for every epoch.
train_dataloader = DataLoader(train_dataset, **data_loader_params, shuffle=True)

# Create DataLoader for validation data. Shuffling is not necessary for validation data.
valid_dataloader = DataLoader(valid_dataset, **data_loader_params)

# Print the number of batches in the training and validation DataLoaders
print(f'Number of batches in train DataLoader: {len(train_dataloader)}')
print(f'Number of batches in validation DataLoader: {len(valid_dataloader)}')


joekeo commented Feb 13, 2024

Thanks for the help. It was a futile attempt, as after fixing it the estimated training time on my CPU is ~100 days, so I will have to run it next week on the GPU.

Owner

@joekeo Don't forget you can run it on a GPU with the free tier of Google Colab:


How can I handle class imbalance with Mask R-CNN?

Owner

cj-mills commented Feb 22, 2024

Hi @nattafahhm,

I'd need more information about your current dataset, the feasibility of gathering more data samples, and your comfort level with doing outside research to modify the existing training code before giving specific recommendations.

However, the most straightforward approaches would be to oversample the underrepresented classes, undersample the overrepresented ones, or add new samples.

Oversampling introduces the risk of overfitting on those samples, which you can partially mitigate with data augmentation like those currently in the tutorial. A simple implementation would be duplicating the image and annotation files for the underrepresented classes.
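As a rough illustration (not code from the tutorial, and assuming LabelMe-style annotations with one JSON file per image), the duplication could look something like this:

import json
import shutil
from pathlib import Path

# Rough sketch: duplicate the image/annotation pairs whose labels include an
# underrepresented class. The folder name, label names, and image extension
# are placeholders; adjust them for your dataset.
dataset_dir = Path('my-dataset')
underrepresented = {'spot'}
extra_copies = 2

for ann_path in dataset_dir.glob('*.json'):
    labels = {shape['label'] for shape in json.loads(ann_path.read_text())['shapes']}
    if labels & underrepresented:
        img_path = ann_path.with_suffix('.jpg')
        for i in range(extra_copies):
            # Note: the copied JSON still references the original image filename
            # (imagePath), so update that field if your loading code relies on it.
            shutil.copy(ann_path, ann_path.with_name(f"{ann_path.stem}_copy{i}.json"))
            shutil.copy(img_path, img_path.with_name(f"{img_path.stem}_copy{i}.jpg"))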

Undersampling would mean not using all available samples in your dataset, which might prevent the model from seeing some required scenarios. A simple implementation would be to remove some images and associated annotation files from the overrepresented classes.

Adding more data would address the potential drawbacks of over and undersampling. However, this might be infeasible depending on the type and quantity of data required. That said, the next series of tutorials I have planned will demonstrate methods to streamline this process using automated annotation and synthetic data generation for object detection and instance segmentation tasks.

You could also try combining the three methods (e.g., do a small amount of oversampling, a small amount of undersampling, and add a small amount of new data) to try and balance the drawbacks of each.

Before going through the hassle of any of these methods, I'd try training the model with your existing dataset to see whether the current imbalance is a significant issue.


Thanks CJ-Mills for this tutorial. I am looking for a Google Colab tutorial for Mask R-CNN (object detection) but couldn't find one. Can you please do one?

Owner

cj-mills commented Feb 25, 2024

Hi @AliAdibArnab9,

It's in the Tutorial Code dropdown in the Getting Started with the Code section:
Open In Colab: https://colab.research.google.com/github/cj-mills/pytorch-mask-rcnn-tutorial-code/blob/main/notebooks/pytorch-mask-r-cnn-training-colab.ipynb

Tutorial Section: https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/#getting-started-with-the-code

If you were looking for the tutorial for getting started with Colab, here is the link for that:
https://christianjmills.com/posts/google-colab-getting-started-tutorial/


Thank you so much @cj-mills. It's a very detailed tutorial. I am just a bit confused about how I can use my dataset with this code. I have a dataset and an annotation (.json) file. Should I change the whole code, or is there a way I can modify this code and get results?

Owner

@AliAdibArnab9 Do you happen to know what annotation format your dataset uses? If it's a single JSON file for the whole dataset, my first guess would be it's in COCO format.

If so, I have a tutorial showing how to work with COCO segmentation annotations (the type you would use with Mask R-CNN) in PyTorch:

You could use that tutorial as a guide for how to modify the code in the Mask R-CNN tutorial, which uses the LabelMe annotation format.


Hi Christian,
I just have a question about the annotations. Is each .json file a 'single file in VGG JSON format' or a 'single file in COCO JSON format'?

Owner

@AliAdibArnab9 Both VGG and COCO tend to use a single JSON file. Check out the examples at the links below to see which format you have:


Sorry again, Christian. It seems like your annotations and my annotations give different results. I used makesense.ai and downloaded a single VGG .json file after annotating. When I open my annotation in a notebook, I can see it is different from yours. Can you please tell me which tool you used for the annotation to get a .json file for each image?

(For example, your annotation files have an image_path field; I don't have any such thing, and that's why it's showing me an error.)

Owner

cj-mills commented Mar 1, 2024

@AliAdibArnab9 I don't have a tutorial for working with VGG annotations, so that would explain the difference in results. This Mask R-CNN tutorial uses the LabelMe annotation format. I currently also have tutorials covering how to work with segmentation annotations in COCO and CVAT format, but not VGG.

Makesense.ai lets you export polygon annotations in COCO format in addition to VGG, so you can simply select the Single file in COCO JSON format option this time.


Hi CJ,
Really great and comprehensive article! It is really rare to see a tutorial that includes the environment setup :)

In your class StudentIDDataset(Dataset), why do you have to convert the image to RGB using image = Image.open(filepath).convert('RGB')?

I'm working on satellite imagery, and some of the images are not necessarily RGB.

Owner

Hi @amrirasyidi,
The dataset class converts the images to RGB because that is what the Mask R-CNN model expects. What format are your images?

@amrirasyidi

I see. Mine is RGBA.

In case the image is already in RGB, it should be okay to just use Image.open(filepath), right?

Anyway, I have another question.
In the JSON file's shapes section, if I have 3 student IDs in the same image, then when I Ctrl+F for "student_id" in the JSON file, I should see 3 results, right? I.e., the shapes section should contain all the polygons of the masks.


Hi Christian, thank you for your tutorial! I have seen questions here regarding converting a whole-dataset JSON file into per-image JSON annotation files. I wrote some brief code that works pretty well for the conversion; you just need to copy the annotation path from the whole-dataset JSON file, set up the destination folder, and let it run. Here is the GitHub repository:

https://github.com/Bombardelli/Convert-Roboflow-annotation-to-Labelme/blob/main/Convert-Roboflow-annotation-to-Labelme

Owner

@amrirasyidi
You are correct that it is alright to use Image.open(filepath) instead of Image.open(filepath).convert('RGB') when you know all the images are RGB.
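For an RGBA image like yours, the conversion simply discards the alpha channel, e.g.:

from PIL import Image

# 'tile_0001.png' is just a placeholder filename for an RGBA image
image = Image.open('tile_0001.png')
print(image.mode)             # 'RGBA'
image = image.convert('RGB')  # discards the alpha channel; the model expects 3-channel input
print(image.mode)             # 'RGB'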

Regarding your second question, your understanding is also correct. If you have an image with three annotated objects, the "shapes" section in the corresponding JSON file will store the polygon information for all three objects.

Example:

"shapes": [
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        },
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        },
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        }
    ],

You can see a direct comparison between a LabelMe JSON segmentation file and the resulting pandas DataFrame and annotated image in the tutorial linked below:

Owner

@Bombardelli Thanks for sharing!


Bombardelli commented Mar 21, 2024

Do you have a tutorial on how to implement inference with a webcam using the model from this tutorial, in a similar way to how YOLO does it?

I tried several ways but it's simply not working:

import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn_v2
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.transforms import functional as F

def get_model(num_classes):
    # Load a pre-trained Mask R-CNN model
    model = maskrcnn_resnet50_fpn_v2(weights='DEFAULT')
    
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one (adjust number of classes)
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    
    # Do the same for the mask predictor if your task involves instance segmentation
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256  # Typically the size of the hidden layer used in Mask R-CNN
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    
    return model

# Example: Adjust for your specific number of classes (e.g., 16 classes + background)
num_classes = 17  # Including the background class
model = get_model(num_classes)

# Load the model state dictionary
model.load_state_dict(torch.load('path_to_saved_model_state.pth'))

model.eval()  # Set the model to inference mode

# Webcam feed
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to tensor
    image = F.to_tensor(frame).unsqueeze(0)
    
    with torch.no_grad():
        predictions = model(image)
        
    # Post-process predictions and visualize results
    # This is left as an exercise depending on how you want to display the results.
    # For simplicity, we're just displaying the original webcam feed here.
    cv2.imshow('Webcam Live Inference', frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Owner

cj-mills commented Mar 21, 2024

@Bombardelli,
I don't have a tutorial for that specifically, but you can probably make what you need using the following tutorials as a reference:

You should be able to swap the inference steps from the Mask R-CNN ONNX export tutorial into the while loop from the object tracking tutorial.
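As a very rough, untested sketch of that combination (running the PyTorch model directly rather than the ONNX export, with a hypothetical 0.5 confidence threshold, and only drawing boxes; it assumes model is built and loaded as in your snippet, and device is set to 'cuda' or 'cpu'):

import cv2
import torch
from torchvision.transforms import functional as F

model.eval().to(device)
cap = cv2.VideoCapture(0)

with torch.no_grad():
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV returns BGR frames; the model expects RGB tensors scaled to [0, 1]
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = F.to_tensor(rgb).to(device)
        prediction = model([image])[0]
        # Draw boxes for detections above the confidence threshold
        for box, score in zip(prediction['boxes'], prediction['scores']):
            if score < 0.5:
                continue
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow('Mask R-CNN Webcam Inference', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()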

That said, I'm not sure the inference speed for the Mask R-CNN model will be fast enough for real-time inference from a webcam without further optimization or a sufficiently powerful GPU.


I'm getting this error:
Can't pickle <function <lambda> at 0x00000257C8F2CCA0>: attribute lookup <lambda> on __main__ failed

Cell In[35], line 24, in run_epoch(model, dataloader, optimizer, lr_scheduler, device, scaler, epoch_id, is_training)
     21 progress_bar = tqdm(total=len(dataloader), desc="Train" if is_training else "Eval")  # Initialize a progress bar
     23 # Loop over the data
---> 24 for batch_id, (inputs, targets) in enumerate(dataloader):
     25     # Move inputs and targets to the specified device
     26     inputs = torch.stack(inputs).to(device)

Owner

Hi @waqarorakzai,

Are you running the tutorial code on Windows? If so, download the Windows notebook and the associated utility file using the following links:

Python multiprocessing works differently in Windows versus Linux, so the code requires a few tweaks.
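One other thing worth checking, since the error says it couldn't pickle a function: on Windows, DataLoader worker processes are spawned rather than forked, so a lambda collate_fn cannot be pickled. A quick workaround (just a sketch, not necessarily how the Windows notebook handles it) is to use a named, module-level function instead:

# Define the collate function at module level so Windows worker processes can pickle it
def tuple_collate(batch):
    return tuple(zip(*batch))

data_loader_params['collate_fn'] = tuple_collate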


Hi @christian, thanks again for the tutorial. Is there any way to calculate AP, accuracy, and a confusion matrix from the code? For calculating those, the traditional libraries mostly rely on COCO-style .json annotations.


Do you have a specific reason for choosing PIL over cv2?

Owner

Hi @amrirasyidi,

PIL just ended up being my default for projects. Torchvision also has convenience functions for converting between PIL Images and PyTorch tensors.

Owner

Hi @AliAdibArnab9,

Sorry, I missed your question. You could probably use the same approach from this official PyTorch tutorial for calculating average precision:

It's not super helpful for the toy dataset used in my tutorial, but here is a quick example of the training notebook using the same evaluation code:
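Alternatively (this is just a sketch of mine, not the approach from the linked tutorial), the torchmetrics package provides a mean average precision metric that works with the prediction and target dictionaries used by torchvision's Mask R-CNN. It assumes torchmetrics and pycocotools are installed and that the targets contain boolean masks, boxes, and labels, as in this tutorial:

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Sketch: compute mask mAP over the validation set, assuming `model`, `device`,
# and `valid_dataloader` are set up as in the tutorial.
metric = MeanAveragePrecision(iou_type='segm')
model.eval()

with torch.no_grad():
    for inputs, targets in valid_dataloader:
        inputs = [img.to(device) for img in inputs]
        preds = model(inputs)
        preds = [{k: v.cpu() for k, v in p.items()} for p in preds]
        # The metric expects binary masks, so threshold the predicted soft masks
        for p in preds:
            p['masks'] = (p['masks'] > 0.5).squeeze(1)
        metric.update(preds, list(targets))

print(metric.compute())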


MontassarTn commented May 5, 2024

I am training Mask R-CNN on custom data, but the training doesn't stop and produces no output or errors.
Here's a brief overview of my process:

1. I generated a dataset using PyTorch by applying the SAM mask from bounding boxes to my images.
2. After creating the dataset, I split it into training and testing sets.
3. I loaded both sets using torch.utils.data.DataLoader.
4. I'm using a pre-trained model with 11 classes.

This is the output of my dataset: [screenshot attachment]
Any help or insights would be greatly appreciated.

Owner

cj-mills commented May 9, 2024

Hi @MontassarTn,
Are you using this tutorial's training code? It appears from the screenshot you might be using something else. If so, I can't provide much insight without seeing your code.


Loss is NaN

I am running this code on my personal computer with Windows. I didn't change anything in the code. I get the following error at the end of the first epoch of training:

Loss is NaN or infinite at epoch 0, batch 0. Stopping training.

I checked the loss and the loss is NaN. The code until the training works well and gives the correct outputs. Is there anyone who knows how to fix this?

@MontassarTn

Hi @MontassarTn, Are you using this tutorial's training code? It appears from the screenshot you might be using something else. If so, I can't provide much insight without seeing your code.

@cj-mills No, I didn't. Could I send you my code?

Owner

Hi @EnesAgirman,

Are you using the Windows version of the training notebook with its associated utility file?

I set up a fresh conda environment (with CUDA 11.8) using the steps in the tutorial this morning and verified the Windows notebook successfully finished training.

Owner

@MontassarTn To be honest, I have very little spare time at the moment and would likely not even have a chance to go through it in the near term.

Also, these comment sections are for questions related to their associated tutorials, and I do not want to set a precedent of expanding that scope too much. It would simply be infeasible for me to address such a range of requests.

If you want to try using your dataset with this training code, I have tutorials on working with segmentation annotations in a few different formats in PyTorch.


Hey cj-mills, thanks a lot for your tutorial, it really helped me. I even trained the model on my own custom dataset and now it's working. I've decided to work with videos now. Could you please guide me on how to write the code for videos?

Owner

Hi @averagemol,

If you want to iterate through the frames in a video and run the model on each frame, you can use the opencv-python package to load, iterate through, and write videos.

I cover how to do this in my object-tracking tutorial linked below:

You should be able to adapt the code from the linked section for use with a Mask R-CNN model.
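Here is a rough scaffold of that pattern (not from the tutorial); the per-frame inference and drawing code would go inside the loop:

import cv2

cap = cv2.VideoCapture('input.mp4')  # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Run the Mask R-CNN model on `frame` here and draw the results onto it
    writer.write(frame)

cap.release()
writer.release()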


Hi, thanks for the great tutorial.
Do you think this fine-tuning approach would work for my custom dataset, which also comprises map images? I am concerned that weights pre-trained on COCO might not work well for a different domain.

Owner

Hi @subaru3577,
I'd say it's probably still better to start with the pretrained weights. If you want to test both options, you can exclude the weights parameter when initializing the Mask R-CNN model.

# Initialize a Mask R-CNN model without pretrained weights
model = maskrcnn_resnet50_fpn_v2()


Thank you so much for your kind advice!


This is definitely a great tutorial; thanks once again.
But I am facing a new problem while training it on my own system (Windows 10, GPU available).

The error is:
TypeError: Compose.__call__() takes 2 positional arguments but 3 were given

and the code where the error is raised:

# Apply the transformations, if any
if self._transforms:
    image, target = self._transforms(image, target)

Here I have checked image and target by printing them:

Image: <PIL.Image.Image image mode=RGB size=2066x2036 at 0x2BED6BEAC10>
Target: {'masks': tensor([[[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]]]), 'boxes': BoundingBoxes([[ 511.,  238., 1413., 1834.],
           [1295.,  763., 1787., 1532.],
           [ 101.,  497.,  854., 1371.],
           [ 586., 1169.,  717., 1363.],
           [1176.,  302., 1742.,  865.]], format=BoundingBoxFormat.XYXY, canvas_size=(2036, 2066)), 'labels': tensor([ 1,  5,  8,  9, 19])}

Owner

cj-mills commented Jun 28, 2024

Hi @enamulrafti,
I just tested the Windows version of the tutorial code in a fresh mamba environment (Windows 10 with Nvidia GPU), and it ran without issue.

Based on the error message, my best guess is you are using v1 of torchvision transforms (torchvision.transforms) rather than v2 (torchvision.transforms.v2), which is required.

That or there is just something mistyped.
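If it is the transforms version, here is roughly what the v2 setup looks like (a quick sketch assuming torchvision 0.16 or newer, not code copied from the tutorial):

import torch
from torchvision.transforms import v2 as transforms

# The v2 Compose accepts (image, target) pairs; the v1 version only takes a single
# input, which produces the "takes 2 positional arguments but 3 were given" error.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToImage(),
    transforms.ToDtype(torch.float32, scale=True),
])

# Inside the dataset's __getitem__:
# image, target = train_transforms(image, target)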
