
posts/pytorch-train-mask-rcnn-tutorial/ #44

Open
utterances-bot opened this issue Dec 7, 2023 · 57 comments

Comments

@utterances-bot

Christian Mills - Training Mask R-CNN Models with PyTorch

Learn how to train Mask R-CNN models on custom datasets with PyTorch.

https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/


Hi, thanks for the tutorial.
I am wondering how I can employ a ResNet-101 backbone for this, since the library only provides ResNet-50. How can I customize it?

Owner

cj-mills commented Dec 8, 2023

Hi @nattafahhm,

You can create a resnet101 backbone using the resnet_fpn_backbone function from torchvision:

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model.backbone = backbone

You will probably need to train for longer and at a lower learning rate.
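If you prefer to construct the model around the new backbone from scratch, an untested sketch like the following should also work (num_classes is a placeholder that includes the background class, and depending on your torchvision version the weights argument may be named differently):

from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.mask_rcnn import MaskRCNN

# Untested sketch: build a Mask R-CNN model directly around a ResNet-101 FPN backbone
backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model = MaskRCNN(backbone, num_classes=2)  # e.g., 1 object class + background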


fah-iiv commented Dec 8, 2023

Hi, if I create a ResNet-101 backbone using resnet_fpn_backbone, then I can only use ImageNet-1K pretrained weights, right?
How can I use ImageNet-21K?


I am a new member. Thanks for the great tutorial. I want to ask: what software is used to label the images? Thanks

Owner

Hi @fah-iiv,

I don't believe torchvision has pretrained weights for the resnet101 model with ImageNet-21K. Are you referring to something from the timm library?

Owner

Hi @tuan-nmt,

There are free annotation tools like CVAT and automated annotation methods, as shown in the following videos:

I've received several questions about this, so I'll try to make time for a tutorial.


Hi,
Thank you for the tutorial. Could you please let me know what the JSON file should look like when I want to implement this code for two classes? For example, I have a dataset containing cats, and the cats have a number of spots on their bodies, so I think the classes are the cats and their spots. But I don't know what the JSON file should look like or which parts of the code should be modified.

Owner

Hi @jetsonwork,

The answer to your question depends on what format you use for your dataset. The toy dataset used in the tutorial follows the annotation format for the LabelMe annotation tool. The tool's GitHub repository contains example annotations for instance-segmentation with multiple classes:


Thanks for your response.

I’m preparing a dataset for a Mask R-CNN model, involving images of cats and smaller, distinct spots on these cats. While the dataset has more instances of “spots” than “cats,” the latter covers a much larger area in the images. I’m concerned this might bias the model toward the “cat” class due to its larger pixel coverage.

My question is:

Could this difference in area coverage introduce significant training bias towards the “cat” class?

Owner

Hi @jetsonwork,

It could potentially introduce a training bias. However, I recommend getting to a point where you can iterate and experiment before worrying too much about that.


Thank you for the amazing tutorials Chris!


Hi, thanks a ton for the tutorial.
However, I am facing an issue while training the model. I am using my own dataset, which contains multiple instances in each image, and training raises the following error:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.

Training stopped at 15% of the first epoch. What could be the problem, and what do I have to change?

Owner

cj-mills commented Feb 1, 2024

Hi @enamulrafti,
The training code should work with images that contain multiple object instances. The toy dataset used in the tutorial contains images with more than one object, and I've used the code for other such datasets.

Would you mind providing more details from the RuntimeError and the OS and hardware (e.g., CPU or GPU) running the training code?


fah-iiv commented Feb 8, 2024

Hi, thank you once again for the tutorial. I've successfully implemented the Mask R-CNN model following your guide. I have a question regarding the pretraining of Mask R-CNN: Is it possible to train the model with a certain set of classes and then fine-tune it on a different set of classes? For example, could I initially train the model on categories of cat species from a public dataset and later fine-tune it to recognize different species in my own dataset? In this scenario, when I try to continue training the model with the new set of classes, I find that I cannot proceed without adjusting some of the output layers to account for the change in class types.

Additionally, should I consider freezing some parameters during this process, such as setting param.requires_grad to True or False? Your advice on how to approach this would be greatly appreciated.

Owner

cj-mills commented Feb 8, 2024

Hi @fah-iiv,
Are you looking to retain the trained classes from the public dataset when fine-tuning the model on your dataset? For example, if the public dataset contained 20 cat species, would you want to add new species but have the model still recognize the original 20?


joekeo commented Feb 12, 2024

I am trying to run this on my laptop, as I don't have access to the computer with a GPU until next week. I tried specifying the device as CPU, but I keep getting this error:
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU

The issue happens when using enumerate(train_dataloader). I tried to modify the source file to force it to use the CPU, but then the same error happens somewhere else.

I also tried using the Linux (CPU) install command (I'm using Linux), to no avail.

What can I do to run the tutorial on my laptop?


joekeo commented Feb 12, 2024

Found the solution:
pin_memory needs to be set to False when creating the dataloaders:

data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'collate_fn': lambda batch: tuple(zip(*batch)),
    'pin_memory': False,
    'pin_memory_device': device
}

Owner

cj-mills commented Feb 12, 2024

Hi @joekeo,

Sorry about that. You are correct that you must turn off the pin_memory settings for the DataLoaders. I updated the code for some of my other tutorials to handle this automatically (link), but it appears I forgot to push the update for this one.

I'll update this tutorial when I have a chance, but for now, here is the new DataLoader initialization code so you don't need to change it manually when you get your GPU:

# Set the training batch size
bs = 4

# Set the number of worker processes for loading data. This should be the number of CPUs available.
num_workers = multiprocessing.cpu_count()

# Define parameters for DataLoader
data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'persistent_workers': True,  # If True, the data loader will not shut down the worker processes after the dataset has been consumed once, keeping the worker dataset instances alive.
    'pin_memory': 'cuda' in device,  # If True, the data loader will copy Tensors into CUDA pinned memory before returning them. Useful when using GPU.
    'pin_memory_device': device if 'cuda' in device else '',  # Specifies the device where the data should be loaded. Commonly set to use the GPU.
    'collate_fn': lambda batch: tuple(zip(*batch)),
}

# Create DataLoader for training data. Data is shuffled for every epoch.
train_dataloader = DataLoader(train_dataset, **data_loader_params, shuffle=True)

# Create DataLoader for validation data. Shuffling is not necessary for validation data.
valid_dataloader = DataLoader(valid_dataset, **data_loader_params)

# Print the number of batches in the training and validation DataLoaders
print(f'Number of batches in train DataLoader: {len(train_dataloader)}')
print(f'Number of batches in validation DataLoader: {len(valid_dataloader)}')


joekeo commented Feb 13, 2024

Thanks for the help. It was a futile attempt, as after fixing it the estimated training time on my CPU is ~100 days, so I will have to run it next week on the GPU.

Owner

@joekeo Don't forget you can run it on a GPU with the free tier of Google Colab:


How can I handle class imbalance with Mask R-CNN?

Owner

cj-mills commented Feb 22, 2024

Hi @nattafahhm,

I'd need more information about your current dataset, the feasibility of gathering more data samples, and your comfort level with doing outside research to modify the existing training code before giving specific recommendations.

However, the most straightforward approaches would be to oversample the underrepresented classes, undersample the overrepresented ones, or add new samples.

Oversampling introduces the risk of overfitting on those samples, which you can partially mitigate with data augmentation like those currently in the tutorial. A simple implementation would be duplicating the image and annotation files for the underrepresented classes.
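As a rough illustration (not code from the tutorial, and assuming LabelMe-style annotations with one JSON file per image), the duplication could look something like this:

import json
import shutil
from pathlib import Path

# Rough sketch: duplicate the image/annotation pairs whose labels include an
# underrepresented class. The folder name, label names, and image extension
# are placeholders; adjust them for your dataset.
dataset_dir = Path('my-dataset')
underrepresented = {'spot'}
extra_copies = 2

for ann_path in dataset_dir.glob('*.json'):
    labels = {shape['label'] for shape in json.loads(ann_path.read_text())['shapes']}
    if labels & underrepresented:
        img_path = ann_path.with_suffix('.jpg')
        for i in range(extra_copies):
            # Note: the copied JSON still references the original image filename
            # (imagePath), so update that field if your loading code relies on it.
            shutil.copy(ann_path, ann_path.with_name(f"{ann_path.stem}_copy{i}.json"))
            shutil.copy(img_path, img_path.with_name(f"{img_path.stem}_copy{i}.jpg"))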

Undersampling would mean not using all available samples in your dataset, which might prevent the model from seeing some required scenarios. A simple implementation would be to remove some images and associated annotation files from the overrepresented classes.

Adding more data would address the potential drawbacks of over and undersampling. However, this might be infeasible depending on the type and quantity of data required. That said, the next series of tutorials I have planned will demonstrate methods to streamline this process using automated annotation and synthetic data generation for object detection and instance segmentation tasks.

You could also try combining the three methods (e.g., do a small amount of oversampling, a small amount of undersampling, and add a small amount of new data) to try and balance the drawbacks of each.

Before going through the hassle of any of these methods, I'd try training the model with your existing dataset to see whether the current imbalance is a significant issue.


Thanks CJ-Mills for this tutorial. I am looking for a Google Colab tutorial for Mask R-CNN (object detection) but couldn't find one. Can you please do one?

Owner

cj-mills commented Feb 25, 2024

Hi @AliAdibArnab9,

It's in the Tutorial Code dropdown in the Getting Started with the Code section:
Open In Colab: https://colab.research.google.com/github/cj-mills/pytorch-mask-rcnn-tutorial-code/blob/main/notebooks/pytorch-mask-r-cnn-training-colab.ipynb

Tutorial Section: https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/#getting-started-with-the-code

If you were looking for the tutorial for getting started with Colab, here is the link for that:
https://christianjmills.com/posts/google-colab-getting-started-tutorial/


Thank you so much @cj-mills. It's a very detailed tutorial. I am just a bit confused about how I can use my dataset with this code. I have a dataset and an annotation (.json) file. Should I change the whole code, or is there a way I can modify this code and get results?

Owner

@AliAdibArnab9 Do you happen to know what annotation format your dataset uses? If it's a single JSON file for the whole dataset, my first guess would be it's in COCO format.

If so, I have a tutorial showing how to work with COCO segmentation annotations (the type you would use with Mask R-CNN) in PyTorch:

You could use that tutorial as a guide for how to modify the code in the Mask R-CNN tutorial, which uses the LabelMe annotation format.


Hi Christian,
I just have a question about the annotations. Is each .json file a 'single file in VGG JSON format' or a 'single file in COCO JSON format'?

Owner

@AliAdibArnab9 Both VGG and COCO tend to use a single JSON file. Check out the examples at the links below to see which format you have:


Sorry again, Christian. It seems like your annotations and my annotations give different results. I used makesense.ai and downloaded a single VGG .json file after annotating. When I open my annotation in a notebook, I can see it is different from yours. Can you please tell me which tool you used for the annotation to get a .json file for each image?

(For example, your annotation files have an image_path field; I don't have any such thing, and that's why it's showing me an error.)

Owner

cj-mills commented Mar 1, 2024

@AliAdibArnab9 I don't have a tutorial for working with VGG annotations, so that would explain the difference in results. This Mask R-CNN tutorial uses the LabelMe annotation format. I currently also have tutorials covering how to work with segmentation annotations in COCO and CVAT format, but not VGG.

Makesense.ai lets you export polygon annotations in COCO format in addition to VGG, so you can simply select the Single file in COCO JSON format option this time.


Hi CJ,
Really great and comprehensive article! It is really rare to see a tutorial that includes the environment setup :)

In your class StudentIDDataset(Dataset), why do you have to convert the image to RGB using image = Image.open(filepath).convert('RGB')?

I'm working on satellite imagery, and some of the images are not necessarily RGB.

Owner

Hi @amrirasyidi,
The dataset class converts the images to RGB because that is what the Mask R-CNN model expects. What format are your images?

@amrirasyidi

I see. Mine is RGBA.

In case the image is already in RGB, it should be okay to just use Image.open(filepath), right?

Anyway, I have another question.
In the JSON file's shapes section, if I have 3 student IDs in the same image, then when I Ctrl+F for "student_id" in the JSON file, I should see 3 results, right? I.e., the shapes section should contain all the polygons of the masks.


Hi Christian, thank you for your tutorial! I have seen questions here regarding converting a whole-dataset JSON file into per-image JSON annotation files. I wrote some brief code that works pretty well for the conversion; you just need to copy the annotation path from the whole-dataset JSON file, set up the destination folder, and let it run. Here is the GitHub repository:

https://github.com/Bombardelli/Convert-Roboflow-annotation-to-Labelme/blob/main/Convert-Roboflow-annotation-to-Labelme

Owner

@amrirasyidi
You are correct that it is alright to use Image.open(filepath) instead of Image.open(filepath).convert('RGB') when you know all the images are RGB.
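For an RGBA image like yours, the conversion simply discards the alpha channel, e.g.:

from PIL import Image

# 'tile_0001.png' is just a placeholder filename for an RGBA image
image = Image.open('tile_0001.png')
print(image.mode)             # 'RGBA'
image = image.convert('RGB')  # discards the alpha channel; the model expects 3-channel input
print(image.mode)             # 'RGB'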

Regarding your second question, your understanding is also correct. If you have an image with three annotated objects, the "shapes" section in the corresponding JSON file will store the polygon information for all three objects.

Example:

"shapes": [
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        },
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        },
        {
            "label": "student_id",
            "line_color": null,
            "fill_color": null,
            "points": [...
            ],
            "shape_type": "polygon",
            "flags": {}
        }
    ],

You can see a direct comparison between a LabelMe JSON segmentation file and the resulting pandas DataFrame and annotated image in the tutorial linked below:

Owner

@Bombardelli Thanks for sharing!


Bombardelli commented Mar 21, 2024

Do you have a tutorial on how to implement inference with a webcam using the model from this tutorial, in a similar way to how YOLO does it?

I tried several ways but it's simply not working:

import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn_v2
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.transforms import functional as F

def get_model(num_classes):
    # Load a pre-trained Mask R-CNN model
    model = maskrcnn_resnet50_fpn_v2(weights='DEFAULT')
    
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one (adjust number of classes)
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    
    # Do the same for the mask predictor if your task involves instance segmentation
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256  # Typically the size of the hidden layer used in Mask R-CNN
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    
    return model

# Example: Adjust for your specific number of classes (e.g., 16 classes + background)
num_classes = 17  # Including the background class
model = get_model(num_classes)

# Load the model state dictionary
model.load_state_dict(torch.load('path_to_saved_model_state.pth'))

model.eval()  # Set the model to inference mode

# Webcam feed
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to tensor
    image = F.to_tensor(frame).unsqueeze(0)
    
    with torch.no_grad():
        predictions = model(image)
        
    # Post-process predictions and visualize results
    # This is left as an exercise depending on how you want to display the results.
    # For simplicity, we're just displaying the original webcam feed here.
    cv2.imshow('Webcam Live Inference', frame)
    
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Owner

cj-mills commented Mar 21, 2024

@Bombardelli,
I don't have a tutorial for that specifically, but you can probably make what you need using the following tutorials as a reference:

You should be able to swap the inference steps from the Mask R-CNN ONNX export tutorial into the while loop from the object tracking tutorial.
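As a very rough, untested sketch of that combination (running the PyTorch model directly rather than the ONNX export, with a hypothetical 0.5 confidence threshold, and only drawing boxes; it assumes model is built and loaded as in your snippet, and device is set to 'cuda' or 'cpu'):

import cv2
import torch
from torchvision.transforms import functional as F

model.eval().to(device)
cap = cv2.VideoCapture(0)

with torch.no_grad():
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # OpenCV returns BGR frames; the model expects RGB tensors scaled to [0, 1]
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        image = F.to_tensor(rgb).to(device)
        prediction = model([image])[0]
        # Draw boxes for detections above the confidence threshold
        for box, score in zip(prediction['boxes'], prediction['scores']):
            if score < 0.5:
                continue
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow('Mask R-CNN Webcam Inference', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()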

That said, I'm not sure the inference speed for the Mask R-CNN model will be fast enough for real-time inference from a webcam without further optimization or a sufficiently powerful GPU.


I'm getting this error:
Can't pickle <function <lambda> at 0x00000257C8F2CCA0>: attribute lookup <lambda> on __main__ failed

Cell In[35], line 24, in run_epoch(model, dataloader, optimizer, lr_scheduler, device, scaler, epoch_id, is_training)
     21 progress_bar = tqdm(total=len(dataloader), desc="Train" if is_training else "Eval")  # Initialize a progress bar
     23 # Loop over the data
---> 24 for batch_id, (inputs, targets) in enumerate(dataloader):
     25     # Move inputs and targets to the specified device
     26     inputs = torch.stack(inputs).to(device)

Owner

Hi @waqarorakzai,

Are you running the tutorial code on Windows? If so, download the Windows notebook and the associated utility file using the following links:

Python multiprocessing works differently in Windows versus Linux, so the code requires a few tweaks.
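One other thing worth checking, since the error says it couldn't pickle a function: on Windows, DataLoader worker processes are spawned rather than forked, so a lambda collate_fn cannot be pickled. A quick workaround (just a sketch, not necessarily how the Windows notebook handles it) is to use a named, module-level function instead:

# Define the collate function at module level so Windows worker processes can pickle it
def tuple_collate(batch):
    return tuple(zip(*batch))

data_loader_params['collate_fn'] = tuple_collate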


Hi @christian, thanks again for the tutorial. Is there any way to calculate AP, accuracy, and a confusion matrix from the code? For calculating those, the traditional libraries mostly rely on COCO-style .json annotations.


Do you have a specific reason for choosing PIL over cv2?

Owner

Hi @amrirasyidi,

PIL just ended up being my default for projects. Torchvision also has convenience functions for converting between PIL Images and PyTorch tensors.

Owner

Hi @AliAdibArnab9,

Sorry, I missed your question. You could probably use the same approach from this official PyTorch tutorial for calculating average precision:

It's not super helpful for the toy dataset used in my tutorial, but here is a quick example of the training notebook using the same evaluation code:
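Alternatively (this is just a sketch of mine, not the approach from the linked tutorial), the torchmetrics package provides a mean average precision metric that works with the prediction and target dictionaries used by torchvision's Mask R-CNN. It assumes torchmetrics and pycocotools are installed and that the targets contain boolean masks, boxes, and labels, as in this tutorial:

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Sketch: compute mask mAP over the validation set, assuming `model`, `device`,
# and `valid_dataloader` are set up as in the tutorial.
metric = MeanAveragePrecision(iou_type='segm')
model.eval()

with torch.no_grad():
    for inputs, targets in valid_dataloader:
        inputs = [img.to(device) for img in inputs]
        preds = model(inputs)
        preds = [{k: v.cpu() for k, v in p.items()} for p in preds]
        # The metric expects binary masks, so threshold the predicted soft masks
        for p in preds:
            p['masks'] = (p['masks'] > 0.5).squeeze(1)
        metric.update(preds, list(targets))

print(metric.compute())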


MontassarTn commented May 5, 2024

I am training Mask R-CNN on custom data, but the training doesn't stop and produces no output or errors.
Here's a brief overview of my process:

1. I generated a dataset using PyTorch by applying the SAM mask from bounding boxes to my images.
2. After creating the dataset, I split it into training and testing sets.
3. I loaded both sets using torch.utils.data.DataLoader.
4. I'm using a pre-trained model with 11 classes.

This is the output of my dataset: [screenshot attachment]
Any help or insights would be greatly appreciated.

Owner

cj-mills commented May 9, 2024

Hi @MontassarTn,
Are you using this tutorial's training code? It appears from the screenshot you might be using something else. If so, I can't provide much insight without seeing your code.


Loss is NaN

I am running this code on my personal computer with Windows. I didn't change anything in the code. I get the following error at the end of the first epoch of training:

Loss is NaN or infinite at epoch 0, batch 0. Stopping training.

I checked the loss and the loss is NaN. The code until the training works well and gives the correct outputs. Is there anyone who knows how to fix this?

@MontassarTn

Hi @MontassarTn, Are you using this tutorial's training code? It appears from the screenshot you might be using something else. If so, I can't provide much insight without seeing your code.

@cj-mills No, I didn't. Could I send you my code?

Owner

Hi @EnesAgirman,

Are you using the Windows version of the training notebook with its associated utility file?

I set up a fresh conda environment (with CUDA 11.8) using the steps in the tutorial this morning and verified the Windows notebook successfully finished training.

Owner

@MontassarTn To be honest, I have very little spare time at the moment and would likely not even have a chance to go through it in the near term.

Also, these comment sections are for questions related to their associated tutorials, and I do not want to set a precedent of expanding that scope too much. It would simply be infeasible for me to address such a range of requests.

If you want to try using your dataset with this training code, I have tutorials on working with segmentation annotations in a few different formats in PyTorch.


Hey cj-mills, thanks a lot for your tutorial, it really helped me. I even trained the model on my own custom dataset and now it's working. I've decided to work with videos now. Could you please guide me on how to write the code for videos?

Owner

Hi @averagemol,

If you want to iterate through the frames in a video and run the model on each frame, you can use the opencv-python package to load, iterate through, and write videos.

I cover how to do this in my object-tracking tutorial linked below:

You should be able to adapt the code from the linked section for use with a Mask R-CNN model.
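Here is a rough scaffold of that pattern (not from the tutorial); the per-frame inference and drawing code would go inside the loop:

import cv2

cap = cv2.VideoCapture('input.mp4')  # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter('output.mp4', cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Run the Mask R-CNN model on `frame` here and draw the results onto it
    writer.write(frame)

cap.release()
writer.release()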


Hi, thanks for the great tutorial.
Do you think this fine-tuning approach would work for my custom dataset, which also comprises map images? I am concerned that weights pre-trained on COCO might not work well for a different domain.

Owner

Hi @subaru3577,
I'd say it's probably still better to start with the pretrained weights. If you want to test both options, you can exclude the weights parameter when initializing the Mask R-CNN model.

# Initialize a Mask R-CNN model without pretrained weights
model = maskrcnn_resnet50_fpn_v2()


Thank you so much for your kind advice!


This is definitely a great tutorial; thanks once again.
But I am facing a new problem while training it on my own system (Windows 10, GPU available).

The error is:
TypeError: Compose.__call__() takes 2 positional arguments but 3 were given

and the code where the error is raised:

# Apply the transformations, if any
if self._transforms:
    image, target = self._transforms(image, target)

Here I have checked image and target by printing them:

Image: <PIL.Image.Image image mode=RGB size=2066x2036 at 0x2BED6BEAC10>
Target: {'masks': tensor([[[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]],

    [[False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     ...,
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False],
     [False, False, False,  ..., False, False, False]]]), 'boxes': BoundingBoxes([[ 511.,  238., 1413., 1834.],
           [1295.,  763., 1787., 1532.],
           [ 101.,  497.,  854., 1371.],
           [ 586., 1169.,  717., 1363.],
           [1176.,  302., 1742.,  865.]], format=BoundingBoxFormat.XYXY, canvas_size=(2036, 2066)), 'labels': tensor([ 1,  5,  8,  9, 19])}

Owner

cj-mills commented Jun 28, 2024

Hi @enamulrafti,
I just tested the Windows version of the tutorial code in a fresh mamba environment (Windows 10 with Nvidia GPU), and it ran without issue.

Based on the error message, my best guess is you are using v1 of torchvision transforms (torchvision.transforms) rather than v2 (torchvision.transforms.v2), which is required.

That or there is just something mistyped.
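If it is the transforms version, here is roughly what the v2 setup looks like (a quick sketch assuming torchvision 0.16 or newer, not code copied from the tutorial):

import torch
from torchvision.transforms import v2 as transforms

# The v2 Compose accepts (image, target) pairs; the v1 version only takes a single
# input, which produces the "takes 2 positional arguments but 3 were given" error.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToImage(),
    transforms.ToDtype(torch.float32, scale=True),
])

# Inside the dataset's __getitem__:
# image, target = train_transforms(image, target)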
