- Overview
- Features
- Project Structure
- Requirements
- Installation
- Quick Start
- Usage
- Models
- Performance & Results
- Dataset
- Troubleshooting
- Contributing
- License
- Citation
- References
- Support & Contact
ObjectDetect is a comprehensive implementation of two state-of-the-art object detection architectures:
- Faster R-CNN (TensorFlow 2.x) - Region-based CNN detector optimized for accuracy
- YOLOv5 (PyTorch) - Single-stage detector optimized for real-time performance
This project provides complete implementations including data loading, model training, evaluation, and inference pipelines for both architectures. It's designed for research and production use cases in computer vision and object detection tasks.
π Published Research: This framework is based on and extends the work published in:
Murthy, J. S., et al. (2022). "ObjectDetect: A Real-Time Object Detection Framework for Advanced Driver Assistant Systems Using YOLOv5." Wireless Communications and Mobile Computing, 2022(1), 9444360. Wiley Online Library.
Figure 1: Proposed ObjectDetect framework showing the dual-path detection pipeline combining real-time YOLO detection with accurate Faster R-CNN inference, integrated object tracking, and visualization.
- Multi-model support: Train and evaluate multiple detection architectures
- Flexible data handling: Support for custom datasets and formats
- Real-time inference: Optimized inference pipelines for video and image inputs
- Production-ready code: Comprehensive error handling, logging, and configuration management
- Comparative analysis: Direct comparison between Faster R-CNN and YOLO architectures
- β Pre-trained model support (COCO, Custom models)
- β Video and image inference
- β Batch processing capabilities
- β Configurable confidence thresholds
- β Class-specific detection filtering
- β Real-time FPS display
- β Visualization with bounding boxes and labels
- β Custom model training from scratch
- β Support for BDD100K and custom datasets
- β Image and video inference pipelines
- β Batch processing with data augmentation
- β Training checkpoints and model resumption
- β Validation during training
- β Configurable detection settings and anchor parameters
ObjectDetect/
βββ README.md # This file
βββ LICENSE # MIT License
βββ Paper0.pdf # Research paper and methodology
β
βββ Faster R-CNN/ # TensorFlow 2 Implementation
β βββ detector.py # Main detector class (DetectorTF2)
β βββ detect_objects.py # Inference script for images/videos
β βββ Faster_RCNN_Final.ipynb # Training notebook (Jupyter)
β βββ models/ # Pre-trained models and configs
β β βββ label_map.pbtxt # Class label definitions
β β βββ inference_graph/ # Frozen graph for inference
β β βββ saved_model/ # TensorFlow SavedModel format
β βββ train_tf2/ # Training utilities
β βββ model_main_tf2.py # Training entry point
β βββ exporter_main_v2.py # Model export script
β βββ start_train.sh # Training startup script
β βββ start_eval.sh # Evaluation startup script
β
βββ YOLO/ # PyTorch YOLO Implementation
β βββ Training YOLO/ # Training components
β β βββ train.py # Training entry point
β β βββ model.py # YOLO model architecture
β β βββ loss.py # Custom YOLO loss function
β β βββ dataset.py # Data loading and preprocessing
β β βββ utils.py # Utility functions (IoU, coordinate transforms)
β β βββ validation.py # Validation utilities
β β
β βββ Inference YOLO/ # Inference only (lightweight)
β βββ model.py # YOLO model architecture
β βββ YOLO_to_image.py # Single image inference
β βββ YOLO_to_video.py # Video stream inference
β
βββ Result images and videos/ # Pre-generated detection results
βββ Faster R-CNN/ # Results from Faster R-CNN
βββ YOLO/ # Results from YOLO
βββ Video Thumbnails/ # Thumbnails of video results
- Python: 3.8 - 3.11
- CUDA: 11.x or 12.x (for GPU acceleration, optional)
- cuDNN: 8.x (if using CUDA)
- RAM: Minimum 8GB (16GB recommended)
- GPU: NVIDIA GPU with compute capability 3.5+ (optional but recommended)
tensorflow>=2.10.0
tensorflow-object-detection-api>=2.10.0
opencv-python>=4.6.0
numpy>=1.21.0
torch>=1.12.0
torchvision>=0.13.0
opencv-python>=4.6.0
Pillow>=9.0.0
jupyter>=1.0.0
ipython>=8.0.0
matplotlib>=3.5.0
git clone https://github.com/yourusername/ObjectDetect.git
cd ObjectDetect# Create environment with GPU support (CUDA 11.8)
conda create -n objectdetect python=3.10 cuda-toolkit::cuda-toolkit=11.8 -y
conda activate objectdetect
# For CPU-only (skip GPU packages)
conda create -n objectdetect python=3.10 -y
conda activate objectdetectpython3 -m venv objectdetect_env
source objectdetect_env/bin/activate # On Windows: objectdetect_env\Scripts\activate# Install TensorFlow and Faster R-CNN dependencies
pip install tensorflow==2.13.0 tensorflow-object-detection-api opencv-python numpy
# Install PyTorch for YOLO (CPU version shown; for GPU see PyTorch.org)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# For development/Jupyter notebooks
pip install jupyter ipython matplotlibOR use the requirements file (when created):
pip install -r requirements.txt# Test TensorFlow
python -c "import tensorflow as tf; print(f'TensorFlow version: {tf.__version__}')"
# Test PyTorch
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
# Test OpenCV
python -c "import cv2; print(f'OpenCV version: {cv2.__version__}')"cd "Faster R-CNN"
python detect_objects.py \
--model_path models/efficientdet_d0_coco17_tpu-32/saved_model \
--path_to_labelmap models/mscoco_label_map.pbtxt \
--images_dir data/samples/images/ \
--threshold 0.4 \
--save_outputcd "Faster R-CNN"
python detect_objects.py \
--model_path models/efficientdet_d0_coco17_tpu-32/saved_model \
--path_to_labelmap models/mscoco_label_map.pbtxt \
--video_path data/samples/pedestrian_test.mp4 \
--video_input \
--threshold 0.4 \
--save_output \
--output_directory data/samples/outputcd "YOLO/Inference YOLO"
python YOLO_to_image.py \
--input path/to/image.jpg \
--weights_path path/to/model_weights.pt \
--threshold 0.45cd "YOLO/Inference YOLO"
python YOLO_to_video.py \
--input path/to/video.mp4 \
--weights_path path/to/model_weights.pt \
--output output_video.mp4 \
--threshold 0.45from detector import DetectorTF2
import cv2
# Initialize detector
detector = DetectorTF2(
path_to_checkpoint='models/efficientdet_d0_coco17_tpu-32/saved_model',
path_to_labelmap='models/mscoco_label_map.pbtxt',
class_id=None, # None = all classes, or provide list like [1, 2] for specific classes
threshold=0.4
)
# Detect from image
image = cv2.imread('test_image.jpg')
detections = detector.DetectFromImage(image) # Returns: [[x_min, y_min, x_max, y_max, class_label, confidence], ...]
# Visualize
output_image = detector.DisplayDetections(image, detections, det_time=50)
cv2.imwrite('output.jpg', output_image)python detect_objects.py \
--model_path <path_to_model> \
--path_to_labelmap <path_to_labels> \
--images_dir <directory_with_images> \
--threshold 0.4 \
--save_outputArguments:
--model_path: Path to TensorFlow SavedModel directory--path_to_labelmap: Path to labelmap (.pbtxt) file--class_ids: Comma-separated class IDs to detect (e.g., "1,3" for person,car)--threshold: Detection confidence threshold [0.0-1.0]--images_dir: Directory containing input images--video_path: Path to input video file--output_directory: Directory for detection results--video_input: Flag to enable video input mode--save_output: Flag to save results
from model import YOLOv1
from PIL import Image
import torch
from torchvision import transforms
# Legacy class name retained in the repository; documented here as the YOLOv5 path.
model = YOLOv1(split_size=14, num_boxes=2, num_classes=13)
model.load_state_dict(torch.load('weights.pt'))
model.eval()
# Preprocess image
image = Image.open('test_image.jpg')
transform = transforms.Compose([
transforms.Resize((448, 448)),
transforms.ToTensor()
])
image_tensor = transform(image).unsqueeze(0)
# Perform detection
with torch.no_grad():
predictions = model(image_tensor)# Single image
cd YOLO/Inference\ YOLO/
python YOLO_to_image.py \
--input image.jpg \
--weights_path model_weights.pt \
--threshold 0.45 \
--output output.jpg
# Video
python YOLO_to_video.py \
--input video.mp4 \
--weights_path model_weights.pt \
--output output.mp4 \
--threshold 0.45cd YOLO/Training\ YOLO/
python train.py \
--train_img_files_path bdd100k/images/100k/train/ \
--train_target_files_path bdd100k_labels_release/bdd100k/labels/det_v2_train_release.json \
--learning_rate 1e-5 \
--batch_size 10 \
--number_epochs 100 \
--number_boxes 2 \
--lambda_coord 5 \
--lambda_noobj 0.5 \
--load_model 0Training Arguments:
--train_img_files_path: Path to training images--train_target_files_path: Path to JSON labels (BDD100K format)--learning_rate: Learning rate for optimizer--batch_size: Mini-batch size--number_epochs: Number of training epochs--load_model: Load previous checkpoint (1=yes, 0=no)--load_model_file: Checkpoint filename to load
cd "Faster R-CNN/train_tf2"
bash start_train.shEdit start_train.sh to configure:
- Output directory
- Model config path
- Training data paths
Pre-trained Models Available:
- EfficientDet-D0 (COCO trained) - Fast, lightweight
- EfficientDet-D1 through D7 - Increasing accuracy/speed tradeoff
- SSD MobileNet v2 (COCO trained) - Mobile-friendly
Model Format: TensorFlow SavedModel
Input: RGB images (variable size, resized to model input)
Output:
- Detection boxes: [batch_size, max_detections, 4] (normalized coordinates)
- Detection classes: [batch_size, max_detections]
- Detection scores: [batch_size, max_detections]
Architecture:
- Input: configurable image size for real-time detection workloads
- Output: multi-scale detection predictions
- Anchor-based object localization
- 13 object classes (BDD100K)
Model Format: PyTorch .pt files
Output: Detection predictions with:
- Bounding boxes
- Object confidence scores
- Per-class probabilities
All benchmarks were conducted on NVIDIA V100 SXM2 32GB GPU with batch size=1 for fair FPS comparison.
| Architecture | mAP (%) | FPS | Model Size | Inference Time |
|---|---|---|---|---|
| YOLOv5 | 18.6 | 212.4 | ~250 MB | 4.7 ms |
| Faster R-CNN | 41.8 | 17.1 | ~350 MB | 58.5 ms |
Key Observations:
- YOLOv5: Prioritizes speed (212.4 FPS) - ideal for real-time applications with moderate accuracy requirements
- Faster R-CNN: Prioritizes accuracy (41.8% mAP) - recommended for safety-critical autonomous driving systems
- Speed-Accuracy Tradeoff: YOLO is ~12.4x faster; Faster R-CNN is 2.25x more accurate
- COCO Dataset: Included in pre-trained models
- Custom Datasets: Support for Pascal VOC and COCO formats
- Label Format:
.pbtxtlabel maps
- BDD100K: Primary training dataset (requires download)
- Custom Datasets: JSON-based annotation format
- Label Format: JSON with
[x, y, w, h, class_id]in normalized coordinates
# Download from: https://bdd-data.berkeley.edu/
# After download, structure as:
bdd100k/
βββ images/
β βββ 100k/
β βββ train/
β βββ val/
β βββ test/
βββ labels/
βββ det_v2_*_release.json{
"name": "image_name.jpg",
"width": 1920,
"height": 1080,
"labels": [
{
"category": "car",
"box2d": {
"x1": 100,
"y1": 200,
"x2": 300,
"y2": 400
}
}
]
}Problem: CUDA not found when importing TensorFlow
# Solution: Install CPU-only version or check NVIDIA drivers
nvidia-smi # Verify GPU is recognized
pip install tensorflow-cpu # CPU versionProblem: ImportError for object_detection
# Solution: Install TensorFlow Object Detection API
pip install tf-models-officialProblem: memory.out_of_memory error during inference
# Solution: Reduce batch size or image resolution
python detect_objects.py --threshold 0.5 # Lower threshold reduces processingProblem: Video codec not found
# Solution: Install ffmpeg and opencv-python
brew install ffmpeg # macOS
# Linux: sudo apt-get install ffmpeg python3-opencvProblem: Model weights not found
# Solution: Check paths and ensure model is downloaded
find . -name "*.pt" -o -name "saved_model.pb"Problem: Slow inference speed
- Check GPU utilization:
nvidia-smi - Use lighter model variant
- Batch process when possible
- Reduce input image resolution
Problem: Out of memory errors
- Reduce batch size
- Use gradient checkpointing (training)
- Process images in streams for video
export CUDA_VISIBLE_DEVICES="0" # GPU device ID
export TF_CPP_MIN_LOG_LEVEL="2" # Reduce TensorFlow logging
export PYTHONUNBUFFERED="1" # Real-time outputEdit respective config files:
- Faster R-CNN:
models/pipeline.config - YOLO: See
train.pyarguments
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch:
git checkout -b feature/your-feature - Commit changes with clear messages
- Push to branch and create a Pull Request
- Follow PEP 8 style guidelines
- Add docstrings to new functions/classes
- Include unit tests for new functionality
This project is licensed under the MIT License - see LICENSE file for details.
Copyright: Β© 2026 Jamuna S Murthy
If this project or the published paper is useful for your research, please cite:
@article{murthy2022objectdetect,
title={ObjectDetect: A Real-Time Object Detection Framework for Advanced Driver Assistant Systems Using YOLOv5},
author={Murthy, Jamuna S and Siddesh, GM and Lai, Wen-Cheng and Parameshachari, BD and Patil, Sujata N and Hemalatha, KL},
journal={Wireless Communications and Mobile Computing},
volume={2022},
number={1},
pages={9444360},
year={2022},
publisher={Wiley Online Library},
doi={10.1155/2022/9444360},
url={https://onlinelibrary.wiley.com/doi/10.1155/2022/9444360}
}Murthy, J. S., Siddesh, G. M., Lai, W.-C., Parameshachari, B. D., Patil, S. N., & Hemalatha, K. L. (2022). ObjectDetect: A real-time object detection framework for advanced driver assistant systems using YOLOv5. Wireless Communications and Mobile Computing, 2022(1), 9444360. https://doi.org/10.1155/2022/9444360
-
Faster R-CNN: Ren et al. (2015) - "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks"
-
YOLOv5: Ultralytics YOLOv5 repository and documentation
-
ObjectDetect Framework: Murthy et al. (2022) - See Citation section above
- Detectron2 - FB Research detection framework
- MMDetection - OpenMMLab detection toolbox
- Ultralytics YOLOv5 - YOLOv5 production implementation
Jamuna Srinivasa Murthy
- Email: jamunamurthy.s@gmail.com
- GitHub: github.com/jamuna-murthy
- ResearchGate: Research Profile
For issues, questions, or feature requests:
- GitHub Issues: Check existing issues first
- Documentation: Review module READMEs in each folder:
- Examples: See example scripts in each module
- Contact: Email maintainer at: jamunamurthy.s@gmail.com
Common Issues and Solutions:
| Issue | Solution |
|---|---|
| CUDA not found | Install NVIDIA drivers or use CPU version |
| Out of memory | Reduce batch size or image resolution |
| Model weights not loading | Verify path and file integrity |
| Video codec error | Install ffmpeg: brew install ffmpeg |
| Import errors | Reinstall dependencies: pip install -r requirements.txt |
For detailed troubleshooting, see Troubleshooting Section above.
When reporting issues, please include:
- Python version (
python --version) - OS and hardware (CPU/GPU model)
- Error message and full traceback
- Minimal reproducible example
- Steps to reproduce
Example Issue Template:
**Environment:**
- Python 3.10
- NVIDIA RTX 3090
- PyTorch 2.0
**Error:**
[Full error message here]
**Reproduction:**
[Steps to reproduce]
We welcome contributions! Please:
- Follow PEP 8 style guide
- Add docstrings to all functions
- Include unit tests for new features
- Update documentation
- Submit detailed pull requests
See Contributing section for full guidelines.
- License: MIT (see LICENSE file)
- Copyright: Β© 2026 Jamuna S Murthy
- Citation: See Citation section for academic use
ObjectDetect provides production-ready implementations of two complementary detection approaches:
- Faster R-CNN: Maximum accuracy (41.8% mAP) for safety-critical autonomous driving systems
- YOLOv5: Maximum speed (212.4 FPS) for real-time surveillance and embedded systems
Both implementations feature:
β
Comprehensive error handling and logging
β
Multi-device support (GPU/CPU with automatic fallback)
β
Input validation and parameter checking
β
Professional documentation and examples
β
Production-tested code patterns
Whether you're building research prototypes or deploying to production, this framework provides the tools and examples you need.
Happy detecting! π―
For the latest updates and news, star this repository on GitHub and follow for releases.












