VisionDetect is a computer vision framework for object detection with deep learning. Its modular, extensible architecture supports two backends (PyTorch and TensorFlow) and is designed to accommodate multiple model architectures.
- Model Architectures: Support for Faster R-CNN, with a modular design that allows other architectures to be added
- Multiple Backends: Implementations in both PyTorch and TensorFlow
- Transfer Learning: Utilize pre-trained models for faster training and better performance
- Data Augmentation: Comprehensive data augmentation pipeline for improved model generalization
- Evaluation Metrics: Detailed performance metrics including mAP, precision, and recall
- Visualization Tools: Utilities for visualizing predictions and model performance
- Model Serving: REST API for serving models in production environments
- Command-Line Interface: Easy-to-use CLI for training, evaluation, and inference
- Comprehensive Documentation: Detailed documentation and examples
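To make the evaluation metrics listed above concrete, here is a minimal sketch of how precision and recall are commonly computed for object detection by matching predicted boxes to ground-truth boxes on intersection over union (IoU). It is not VisionDetect's own implementation: the function names `iou` and `precision_recall`, the `(x_min, y_min, x_max, y_max)` box format, and the 0.5 threshold are assumptions made for illustration. mAP builds on the same matching step by averaging precision over recall levels and classes, which this sketch does not do.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: Sequence[Box],
    truths: Sequence[Box],
    iou_threshold: float = 0.5,
) -> Tuple[float, float]:
    """Greedy one-to-one matching of predictions to ground truth.

    Predictions are assumed to be sorted by descending confidence.
    """
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        # Pick the remaining ground-truth box that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box matches at most once
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall
```

With two predictions of which one overlaps a ground-truth box exactly, and two ground-truth boxes in total, both precision and recall come out at 0.5.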
- Python 3.8+
- CUDA-compatible GPU (recommended for training)
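The two requirements above can be checked before installing. The following is a small sketch, not part of VisionDetect: `check_environment` is a hypothetical helper name, and it assumes the PyTorch backend, using `torch.cuda.is_available()` only if PyTorch is already installed.

```python
import sys


def check_environment(min_python=(3, 8)):
    """Report whether the interpreter is new enough and whether CUDA is usable."""
    python_ok = sys.version_info >= min_python
    try:
        import torch  # optional: only present if the PyTorch backend is installed

        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = None  # unknown, PyTorch is not installed yet
    return {"python_ok": python_ok, "cuda_available": cuda_available}


print(check_environment())
```

A `cuda_available` value of `None` means PyTorch could not be imported, so GPU support could not be determined.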
# Clone the repository
git clone https://github.com/yourusername/visiondetect.git
cd visiondetect
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install the package
pip install -e .

from src import VisionDetect
# Create VisionDetect instance
vd = VisionDetect()
# Train model
trainer, metrics = vd.train(
    data_dir="path/to/data",
    model_type="faster_rcnn",
    backbone="resnet50",
    num_classes=91,
    epochs=50
)
# Make prediction
result = vd.predict(
    image_path="path/to/image.jpg",
    model_path="checkpoints/best_model.pth"
)

visiondetect/
├── config/ # Configuration files
├── data/ # Data storage (gitignored)
├── docs/ # Documentation
├── notebooks/ # Jupyter notebooks for exploration and demos
├── src/ # Source code
│ ├── data/ # Data processing modules
│ ├── models/ # Model implementations
│ ├── utils/ # Utility functions
│ └── api/ # API for model serving
├── tests/ # Unit and integration tests
├── train.py # Training script
├── evaluate.py # Evaluation script
├── infer.py # Inference script
├── .gitignore # Git ignore file
├── LICENSE # License file
├── README.md # Project documentation
└── requirements.txt # Python dependencies
python train.py --data-dir data --model-type faster_rcnn --backbone resnet50 --epochs 50

python evaluate.py --model-path checkpoints/best_model.pth --data-dir data --visualize

python infer.py --model-path checkpoints/best_model.pth --input path/to/image.jpg

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

This project is licensed under the MIT License - see the LICENSE file for details.
- The project structure and design patterns are inspired by best practices in the deep learning community
- Pre-trained models are based on the work of various research teams
- Special thanks to the PyTorch and TensorFlow teams for their excellent frameworks