This project is still a work in progress. I plan to add a blank template that you can adapt for your own project.
This project demonstrates how to build, train, and evaluate an object detection model in PyTorch using torchvision's fasterrcnn_resnet50_fpn (Faster R-CNN with a ResNet-50 FPN backbone). The model is fine-tuned on a custom COCO-style dataset to detect:
- Empty Trailers
- Material
- Not Empty Trailers
The trained model can be used for inference and visualization of bounding box predictions.
project_root/
├── dataset/
│   ├── train/
│   │   ├── image1.jpg
│   │   └── _annotations.coco.json
│   ├── valid/
│   │   ├── image2.jpg
│   │   └── _annotations.coco.json
│   └── test/
│       ├── image3.jpg
│       └── _annotations.coco.json
├── visualized_results/
├── faster_rcnn_model.pth
├── main.py
└── README.md
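Before training, it can help to confirm that the dataset folder matches the layout above. The following is a minimal sketch, not part of main.py: the function name `check_dataset_layout` is hypothetical, and it assumes images use the `.jpg` extension as in the tree.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")
ANNOTATION_FILE = "_annotations.coco.json"


def check_dataset_layout(root):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    problems = []
    for split in SPLITS:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        # Each split keeps its annotations next to its images.
        if not (split_dir / ANNOTATION_FILE).is_file():
            problems.append(f"missing annotations: {split_dir / ANNOTATION_FILE}")
        # Assumption: images are .jpg files, as in the tree above.
        if not any(split_dir.glob("*.jpg")):
            problems.append(f"no .jpg images in: {split_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_layout("dataset"):
        print(problem)
```

Run from project_root/, the script prints one line per problem and nothing when all three splits are in place.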
- Python 3.9
- CUDA 11.6
- PyTorch 1.12.1 with CUDA support
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 \
--extra-index-url https://download.pytorch.org/whl/cu116
pip install matplotlib tqdm
python main.py
The script trains the Faster R-CNN model and saves the weights as faster_rcnn_model.pth.
After training, the model performs inference on test images. Predicted bounding boxes, class labels, and scores will be printed.
Detections are saved as image files in the visualized_results/ folder.
This project expects COCO-style annotations:
{
  "images": [
    {"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "bbox": [x, y, width, height], "category_id": 1}
  ],
  "categories": [
    {"id": 1, "name": "Empty Trailer"},
    {"id": 2, "name": "Material"},
    {"id": 3, "name": "Not Empty Trailer"}
  ]
}
- Base Model: fasterrcnn_resnet50_fpn
- Number of Classes: 4 (includes background)
- Optimizer: SGD (lr=0.005, momentum=0.9, weight_decay=0.0005)
- Epochs: 10
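The class count of 4 follows from the annotation format shown earlier: three categories with ids 1 to 3, plus the background class that torchvision detection models reserve at index 0. Those models also expect boxes as [x1, y1, x2, y2], while COCO stores [x, y, width, height]. The sketch below shows one way to do both steps in plain Python. The function names `coco_to_targets` and `count_classes` and the box values are illustrative assumptions, not taken from main.py.

```python
from collections import defaultdict


def coco_to_targets(coco):
    """Group COCO annotations by image file and convert boxes to corner format."""
    # Map image id -> file name so targets can be matched to files on disk.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    targets = defaultdict(lambda: {"boxes": [], "labels": []})
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        target = targets[file_names[ann["image_id"]]]
        target["boxes"].append([x, y, x + w, y + h])  # [x1, y1, x2, y2]
        target["labels"].append(ann["category_id"])
    return dict(targets)


def count_classes(coco):
    """One class per category, plus the background class at index 0."""
    return len(coco["categories"]) + 1


# Hypothetical in-memory example mirroring the annotation layout above.
example = {
    "images": [{"id": 1, "file_name": "image1.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 1, "image_id": 1, "bbox": [100, 50, 200, 150], "category_id": 1}
    ],
    "categories": [
        {"id": 1, "name": "Empty Trailer"},
        {"id": 2, "name": "Material"},
        {"id": 3, "name": "Not Empty Trailer"},
    ],
}

targets = coco_to_targets(example)
print(targets["image1.jpg"]["boxes"])  # [[100, 50, 300, 200]]
print(count_classes(example))  # 4
```

To use a real file, load it with `json.load` and pass the resulting dict in place of `example`. Images without annotations do not appear in the result, so a training loop would need to handle them separately.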
- train_one_epoch: Handles the training loop and backpropagation
- validate: Validates the model after training
- process_detections: Filters and displays predictions
- visualize_detections: Saves annotated images
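As a rough illustration of the filtering step, the sketch below keeps only predictions above a confidence threshold and maps label ids to class names. It is not the project's process_detections: the name `filter_detections` and the 0.5 threshold are assumptions. In evaluation mode, torchvision detection models return one dict per image with `boxes`, `labels`, and `scores` tensors. Plain lists stand in for those tensors here (for example, after calling `.tolist()`).

```python
# Category ids from the annotation file; index 0 is reserved for background.
CLASS_NAMES = {1: "Empty Trailer", 2: "Material", 3: "Not Empty Trailer"}


def filter_detections(prediction, class_names, score_threshold=0.5):
    """Keep detections whose score meets the threshold (0.5 is an assumed default)."""
    kept = []
    for box, label, score in zip(
        prediction["boxes"], prediction["labels"], prediction["scores"]
    ):
        if score >= score_threshold:
            kept.append(
                {
                    "box": box,
                    # Fall back to a generic name for ids missing from the mapping.
                    "label": class_names.get(label, f"class_{label}"),
                    "score": score,
                }
            )
    return kept


# Hypothetical prediction for a single image.
prediction = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [30.0, 40.0, 60.0, 80.0]],
    "labels": [1, 3],
    "scores": [0.92, 0.31],
}

for det in filter_detections(prediction, CLASS_NAMES):
    print(f"{det['label']}: {det['score']:.2f} at {det['box']}")
# Empty Trailer: 0.92 at [10.0, 20.0, 110.0, 220.0]
```

Raising the threshold trades missed detections for fewer false positives, so the right value depends on the dataset.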
- PyTorch
- torchvision
- torchaudio
- matplotlib
- tqdm
- PIL (from Pillow)
MIT License
- PyTorch Detection Models
- COCO Dataset Format
- torchvision's fasterrcnn_resnet50_fpn
Ryan — Software Engineering Student at WGU
For any questions or feedback, please open an issue or contact the project maintainer.