posts/pytorch-train-mask-rcnn-tutorial/ #44
Comments
Hi, thanks for the tutorial! |
Hi @nattafahhm, You can create a resnet101 backbone using `resnet_fpn_backbone` from `torchvision.models.detection.backbone_utils`:

```python
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

backbone = resnet_fpn_backbone('resnet101', pretrained=True)
model.backbone = backbone
```

You will probably need to train for longer and at a lower learning rate. |
Hi, if I create a resnet101 backbone using resnet_fpn_backbone, then I can only use ImageNet-1K, right? |
I am a new member. Thanks for the great tutorial. I want to ask what software is used to label images? Thanks |
Hi @fah-iiv, I don't believe torchvision has pretrained weights for the resnet101 model with ImageNet-21K. Are you referring to something from the timm library? |
Hi @tuan-nmt, There are free annotation tools like CVAT and automated annotation methods, as shown in the following videos:
I've received several questions about this, so I'll try to make time for a tutorial. |
Hi |
Hi @jetsonwork, The answer to your question depends on what format you use for your dataset. The toy dataset used in the tutorial follows the annotation format for the LabelMe annotation tool. The tool's GitHub repository contains example annotations for instance-segmentation with multiple classes: |
Thanks for your response. I’m preparing a dataset for a Mask R-CNN model, involving images of cats and smaller, distinct spots on these cats. While the dataset has more instances of “spots” than “cats,” the latter covers a much larger area in the images. I’m concerned this might bias the model toward the “cat” class due to its larger pixel coverage. My question is: Could this difference in area coverage introduce significant training bias towards the “cat” class? |
Hi @jetsonwork, It could potentially introduce a training bias. However, I recommend getting to a point where you can iterate and experiment before worrying too much about that. |
Thank you for the amazing tutorials Chris! |
Hi, thanks a ton for the tutorial. RuntimeError: Caught RuntimeError in DataLoader worker process 0. Training stopped at 15% of the first epoch. What could be the problem in this case, and what do I have to change? |
Hi @enamulrafti, Would you mind providing more details from the error message? |
Hi, thank you once again for the tutorial. I've successfully implemented the Mask R-CNN model following your guide. I have a question regarding the pretraining of Mask R-CNN: Is it possible to train the model with a certain set of classes and then fine-tune it on a different set of classes? For example, could I initially train the model on categories of cat species on a public dataset and later fine-tune it to recognize different species on my own dataset? In this scenario, when I try to continue training the model with the new set of classes, I find that I cannot proceed without adjusting some of the output layers to account for the change in class types. Additionally, should I consider freezing some parameters during this process, such as setting 'param.requires_grad' to True or False? Your advice on how to approach this would be greatly appreciated. |
Hi @fah-iiv, |
I am trying to run this on my laptop, as I don't have access to the computer I have with a GPU until next week. I tried specifying the device as CPU, but I keep getting this error: The issue happens when using enumerate(train_dataloader). I tried to modify the source file to force it to use the CPU, but then the same error happens somewhere else. I also tried using the install command for Linux (CPU) (I'm using Linux), to no avail. What can I do to try to run the tutorial on my laptop? |
Found the solution: |
Hi @joekeo, Sorry about that. You are correct that you must turn off the `pin_memory` setting when running on the CPU. I'll update this tutorial when I have a chance, but for now, here is the new DataLoader initialization code so you don't need to change it manually when you get your GPU:

```python
# Set the training batch size
bs = 4

# Set the number of worker processes for loading data. This should be the number of CPUs available.
num_workers = multiprocessing.cpu_count()

# Define parameters for DataLoader
data_loader_params = {
    'batch_size': bs,  # Batch size for data loading
    'num_workers': num_workers,  # Number of subprocesses to use for data loading
    'persistent_workers': True,  # If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the worker dataset instances alive.
    'pin_memory': 'cuda' in device,  # If True, the data loader will copy tensors into CUDA pinned memory before returning them. Useful when using a GPU.
    'pin_memory_device': device if 'cuda' in device else '',  # Specifies the device where the data should be loaded. Commonly set to use the GPU.
    'collate_fn': lambda batch: tuple(zip(*batch)),
}

# Create DataLoader for training data. Data is shuffled for every epoch.
train_dataloader = DataLoader(train_dataset, **data_loader_params, shuffle=True)

# Create DataLoader for validation data. Shuffling is not necessary for validation data.
valid_dataloader = DataLoader(valid_dataset, **data_loader_params)

# Print the number of batches in the training and validation DataLoaders
print(f'Number of batches in train DataLoader: {len(train_dataloader)}')
print(f'Number of batches in validation DataLoader: {len(valid_dataloader)}')
``` |
Thanks for the help. It was a futile attempt, as after fixing it the estimated training time on my CPU is ~100 days, so I will have to run it on the GPU next week. |
@joekeo Don't forget you can run it on a GPU with the free tier of Google Colab: |
How can I handle class imbalance in Mask R-CNN? |
Hi @nattafahhm, I'd need more information about your current dataset, the feasibility of gathering more data samples, and your comfort level with doing outside research to modify the existing training code before giving specific recommendations. However, the most straightforward approaches would be to oversample the underrepresented classes, undersample the overrepresented ones, or add new samples.

Oversampling introduces the risk of overfitting on those samples, which you can partially mitigate with data augmentation like that currently in the tutorial. A simple implementation would be duplicating the image and annotation files for the underrepresented classes.

Undersampling would mean not using all available samples in your dataset, which might prevent the model from seeing some required scenarios. A simple implementation would be to remove some images and associated annotation files from the overrepresented classes.

Adding more data would address the potential drawbacks of over- and undersampling. However, this might be infeasible depending on the type and quantity of data required. That said, the next series of tutorials I have planned will demonstrate methods to streamline this process using automated annotation and synthetic data generation for object detection and instance segmentation tasks.

You could also try combining the three methods (e.g., a small amount of oversampling, a small amount of undersampling, and a small amount of new data) to balance the drawbacks of each. Before going through the hassle of any of these methods, I'd try training the model with your existing dataset to see whether the current imbalance is a significant issue. |
Thanks CJ-Mills for this tutorial. I am looking for a Google Colab tutorial for Mask R-CNN (object detection) but couldn't find one. Can you please do one? |
Hi @AliAdibArnab9, It's in the Tutorial Section: https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/#getting-started-with-the-code If you were looking for the tutorial for getting started with Colab, here is the link for that: |
Thank you so much @cj-mills. It's a very detailed tutorial. I am just a bit confused about how I can use my dataset with this code. I have a dataset and annotations (.json file). Should I change the whole code, or is there a way I can modify this code and get results? |
@AliAdibArnab9 Do you happen to know what annotation format your dataset uses? If it's a single JSON file for the whole dataset, my first guess would be it's in COCO format. If so, I have a tutorial showing how to work with COCO segmentation annotations (the type you would use with Mask R-CNN) in PyTorch: You could use that tutorial as a guide for how to modify the code in the Mask R-CNN tutorial, which uses the LabelMe annotation format. |
Hi Christian, |
@AliAdibArnab9 Both VGG and COCO tend to use a single JSON file. Check out the examples at the links below to see which format you have: |
Sorry again, Christian. It seems your annotations and my annotations give different results. I used makesense.ai and downloaded a single VGG .json file after annotating. When I open my annotation in a notebook, I can see it is different from yours. Can you please tell me which tool you used for annotation to get the .json file for each image? (For example, your annotation file has image_path; I don't have any such field, and that's why it's showing me an error.) |
@AliAdibArnab9 I don't have a tutorial for working with VGG annotations, so that would explain the difference in results. This Mask R-CNN tutorial uses the LabelMe annotation format. I currently also have tutorials covering how to work with segmentation annotations in COCO and CVAT format, but not VGG. Makesense.ai lets you export polygon annotations in COCO format in addition to VGG, so you can simply select the COCO option when exporting. |
Hi CJ, regarding your tutorial: I'm working on satellite imagery, and some of the images are not necessarily RGB. |
Hi @amrirasyidi, |
I see. In case the image is already in RGB, it should be okay to just use
Anyway, I have another question. |
Hi Christian, thank you for your tutorial! I have seen questions here regarding the conversion of a JSON file for the whole dataset to single-image JSON annotation files. I wrote some brief code that works pretty well for the conversion; you just need to copy the annotation path from the whole-dataset JSON file, set up the destination folder, and let it run. Here is the GitHub repository: |
@amrirasyidi Regarding your second question, your understanding is also correct. If you have an image with three annotated objects, the "shapes" section in the corresponding JSON file will store the polygon information for all three objects. Example:

```json
"shapes": [
  {
    "label": "student_id",
    "line_color": null,
    "fill_color": null,
    "points": [...],
    "shape_type": "polygon",
    "flags": {}
  },
  {
    "label": "student_id",
    "line_color": null,
    "fill_color": null,
    "points": [...],
    "shape_type": "polygon",
    "flags": {}
  },
  {
    "label": "student_id",
    "line_color": null,
    "fill_color": null,
    "points": [...],
    "shape_type": "polygon",
    "flags": {}
  }
],
```

You can see a direct comparison between a LabelMe JSON segmentation file and the resulting pandas DataFrame and annotated image in the tutorial linked below: |
@Bombardelli Thanks for sharing! |
Do you have a tutorial on how to implement inference with a webcam with the model from this tutorial, similar to how YOLO does it? I tried several ways, but it's simply not working:

```python
import cv2
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn_v2
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
from torchvision.transforms import functional as F

def get_model(num_classes):
    # Load a pre-trained Mask R-CNN model
    model = maskrcnn_resnet50_fpn_v2(weights='DEFAULT')
    # Get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # Replace the pre-trained head with a new one (adjust number of classes)
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Do the same for the mask predictor if your task involves instance segmentation
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256  # Typically the size of the hidden layer used in Mask R-CNN
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, hidden_layer, num_classes)
    return model

# Example: Adjust for your specific number of classes (e.g., 16 classes + background)
num_classes = 17  # Including the background class
model = get_model(num_classes)

# Load the model state dictionary
model.load_state_dict(torch.load('path_to_saved_model_state.pth'))
model.eval()  # Set the model to inference mode

# Webcam feed
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Convert to tensor
    image = F.to_tensor(frame).unsqueeze(0)
    with torch.no_grad():
        predictions = model(image)
    # Post-process predictions and visualize results
    # This is left as an exercise depending on how you want to display the results.
    # For simplicity, we're just displaying the original webcam feed here.
    cv2.imshow('Webcam Live Inference', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
``` |
@Bombardelli, You should be able to swap the inference steps from the Mask R-CNN ONNX export tutorial into the webcam loop. That said, I'm not sure the inference speed for the Mask R-CNN model will be fast enough for real-time inference from a webcam without further optimization or a sufficiently powerful GPU. |
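The post-processing step left as an exercise in the snippet above mostly comes down to filtering the raw predictions by confidence score. A framework-agnostic sketch (the dict keys mirror torchvision's Mask R-CNN output converted to plain lists; the 0.5 threshold is an assumption):

```python
def filter_predictions(predictions, score_threshold=0.5):
    """Keep only detections whose confidence meets the threshold.

    `predictions` mirrors a single torchvision detection output converted to
    plain lists: {'boxes': [...], 'labels': [...], 'scores': [...], 'masks': [...]}.
    """
    # Indices of detections that clear the confidence threshold
    keep = [i for i, score in enumerate(predictions['scores']) if score >= score_threshold]
    # Apply the same index selection to every field in the prediction dict
    return {key: [values[i] for i in keep] for key, values in predictions.items()}
```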
I'm getting this error: Cell In[35], line 24, in run_epoch(model, dataloader, optimizer, lr_scheduler, device, scaler, epoch_id, is_training) |
Hi @waqarorakzai, Are you running the tutorial code on Windows? If so, download the Windows notebook and the associated utility file using the following links: Python multiprocessing works differently in Windows versus Linux, so the code requires a few tweaks. |
Hi @christian, thanks again for the tutorial. Is there any way to calculate AP, accuracy, and a confusion matrix from the code? For calculating those, the traditional libraries mostly use the COCO-style .json format. |
Do you have a specific reason for choosing PIL? |
Hi @amrirasyidi, PIL just ended up being my default for projects. Torchvision also has convenience functions for converting between PIL Images and PyTorch tensors. |
Hi @AliAdibArnab9, Sorry, I missed your question. You could probably use the same approach from this official PyTorch tutorial for calculating average precision: It's not super helpful for the toy dataset used in my tutorial, but here is a quick example of the training notebook using the same evaluation code: |
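The building block behind both AP and a detection confusion matrix is matching predicted boxes to ground-truth boxes by IoU. A pure-Python sketch of that matching step (greedy, single-class; full COCO-style AP additionally sweeps score thresholds and averages over recall levels, which libraries such as pycocotools handle):

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedily match predictions to ground truth; return (tp, fp, fn).

    pred_boxes is assumed sorted by descending confidence, so the most
    confident prediction claims each ground-truth box first.
    """
    unmatched_gt = list(range(len(gt_boxes)))
    tp = 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for gt_idx in unmatched_gt:
            iou = box_iou(pred, gt_boxes[gt_idx])
            if iou > best_iou:
                best_iou, best_idx = iou, gt_idx
        if best_idx is not None and best_iou >= iou_threshold:
            tp += 1
            unmatched_gt.remove(best_idx)
    fp = len(pred_boxes) - tp
    fn = len(unmatched_gt)
    return tp, fp, fn
```

From the (tp, fp, fn) counts you can compute precision, recall, and per-class confusion entries.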
Hi @MontassarTn, |
Loss is NaN
I am running this code on my personal computer with Windows. I didn't change anything in the code. I get the following error at the end of the first epoch of training: I checked the loss, and the loss is NaN. The code up until the training works well and gives the correct outputs. Does anyone know how to fix this?
@cj-mills no I didn't. Could I send you my code? |
Hi @EnesAgirman, Are you using the Windows version of the training notebook with its associated utility file? I set up a fresh conda environment (with CUDA 11.8) using the steps in the tutorial this morning and verified the Windows notebook successfully finished training. |
@MontassarTn To be honest, I have very little spare time at the moment and would likely not even have a chance to go through it in the near term. Also, these comment sections are for questions related to their associated tutorials, and I do not want to set a precedent of expanding that scope too much. It would simply be infeasible for me to address such a range of requests. If you want to try using your dataset with this training code, I have tutorials on working with segmentation annotations in a few different formats in PyTorch. |
Hey cj-mills, thanks a lot for your tutorial, it really helped me. I even trained the model on my own custom dataset and now it's working. I've decided to work with videos now. Could you please guide me on how to write the code for videos? |
Hi @averagemol, If you want to iterate through the frames in a video and run the model on each frame, you can use the approach I cover in my object-tracking tutorial linked below: You should be able to adapt the code from the linked section for use with a Mask R-CNN model. |
Hi, thanks for the great tutorial. |
Hi @subaru3577,

```python
# Initialize a Mask R-CNN model without pretrained weights
model = maskrcnn_resnet50_fpn_v2()
``` |
Thank you so much for your kind advice! |
This is definitely a great tutorial, thanks once again. The error is: And here is the code where the error is raised: Here I have checked the image and target by printing them: Image: <PIL.Image.Image image mode=RGB size=2066x2036 at 0x2BED6BEAC10>
|
Hi @enamulrafti, Based on the error message, my best guess is you are using v1 of the torchvision transforms. That, or there is just something mistyped. |
Christian Mills - Training Mask R-CNN Models with PyTorch
Learn how to train Mask R-CNN models on custom datasets with PyTorch.
https://christianjmills.com/posts/pytorch-train-mask-rcnn-tutorial/