Generalized Crack Detection (GenFaD) is a comprehensive framework for automated infrastructure crack detection that combines classical computer vision techniques with deep learning approaches. This project investigates cross-domain generalization challenges and proposes hybrid solutions for robust crack detection across diverse infrastructure types.
- Dual Approach Implementation: Both traditional CV and deep learning pipelines
- Cross-Domain Evaluation: Rigorous testing across multiple datasets to assess generalization
- Hybrid Architecture: Combines SIFT-based proposal generation with ResNet classification
- Geometric Quantification: Automated crack measurement including width, length, and area
- Multi-Dataset Support: Compatible with 18+ crack detection datasets
- Production-Ready: Modular codebase with clear separation of concerns
Our research demonstrates:
- ✅ 99.73% accuracy on in-domain test data (ResNet18)
- ⚠️ 50% accuracy drop when models face out-of-distribution data
- 🔬 Systematic cross-domain analysis revealing generalization challenges
- 🛠️ Hybrid pipeline combining strengths of classical and modern approaches
- Installation
- Quick Start
- Project Structure
- Methodology
- Datasets
- Usage
- Results
- Contributing
- Citation
- Team
- License
- Python 3.8 or higher
- CUDA-capable GPU (recommended for training)
- 8GB+ RAM
- Clone the repository

```bash
git clone https://github.com/vivekjyotibanerjee/GenFaD.git
cd GenFaD
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt

# PyTorch (adjust CUDA version as needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# OpenCV for computer vision
pip install opencv-python opencv-contrib-python

# Additional ML libraries
pip install scikit-learn matplotlib pillow tqdm tensorboard
```

Detect and measure cracks using traditional computer vision:
```bash
python finalPipeline.py --image crack.jpg --output results/
```

For crack area measurement with physical dimensions:

```bash
python final_with_area_in_cm.py --image crack.jpg --reference-width 10.0
```

Train ResNet model on crack detection:
```bash
python training_script.py \
    --dataset-paths /path/to/CCIC /path/to/SAHighway \
    --model resnet18 \
    --epochs 50 \
    --batch-size 32 \
    --output-dir models/
```

```bash
# Download classification datasets
python utils/download_cls_datasets.py --output data/

# Download GAPs datasets (requires authentication)
python utils/download_gaps.py --output data/gaps/
```

```
GenFaD/
├── data/                         # Dataset links and configurations
│   └── cls_dataset_links.csv     # 18 curated dataset sources
├── utils/                        # Utility functions
│   ├── download_cls_datasets.py  # Dataset downloader
│   ├── download_gaps.py          # GAPs dataset handler
│   └── misc_utils.py             # Helper functions
├── training_script.py            # Deep learning training pipeline
├── finalPipeline.py              # Classical CV crack detection
├── finalPipeline_Area.py         # Area-enhanced detection
├── final_with_area_in_cm.py      # Physical measurement pipeline
├── requirements.txt              # Python dependencies
├── setup.py                      # Package installation
├── crack.jpg                     # Example crack image
├── noCrack.png                   # Example non-crack image
└── README.md                     # This file
```
Our traditional CV approach uses a three-stage process:
- SIFT Feature Extraction: Detect scale-invariant keypoints
- K-Means Clustering: Cluster keypoints into N regions (default: 20)
- Proposal Generation: Extract 150×150px windows around cluster centers
- HSV Color Masking: Threshold for dark crack regions
- Canny Edge Detection: Identify crack boundaries
- Contour Extraction: Isolate crack contours
- Skeletonization: Extract crack centerline
- Distance Transform: Compute perpendicular widths
- Metric Extraction: Calculate width, length, and area
ResNet-Based Transfer Learning:
- Base architectures: ResNet18 (11M params) and ResNet50 (25M params)
- ImageNet pre-trained initialization
- Two-phase training: frozen backbone → full fine-tuning
- Binary classification: Crack vs. No-Crack
Training Strategy:
```
Epoch 1-5: Freeze backbone, train classifier only (lr=0.001)
Epoch 6+:  Fine-tune entire network (lr=0.0001)
```
We employ leave-one-dataset-out evaluation:
```
Train on Dataset A → Test on Dataset B
Domain Gap = Accuracy(A) - Accuracy(B)
```
This protocol reveals true generalization capability beyond in-domain performance.
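The protocol amounts to evaluating every ordered (train, test) dataset pair. A minimal sketch, where `train_eval_fn` is a hypothetical stand-in for training on one dataset and evaluating on another:

```python
from itertools import permutations

def leave_one_dataset_out(datasets, train_eval_fn):
    """Compute in-domain accuracy, cross-domain accuracy, and the gap per pair.

    train_eval_fn(train, test) must return (in_domain_acc, cross_domain_acc).
    """
    report = {}
    for train, test in permutations(datasets, 2):
        acc_in, acc_out = train_eval_fn(train, test)
        report[(train, test)] = {
            "in_domain": acc_in,
            "cross_domain": acc_out,
            "domain_gap": acc_in - acc_out,
        }
    return report
```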
We support 18+ crack detection datasets spanning multiple infrastructure types:
| Dataset | Type | Images | Domains |
|---|---|---|---|
| CCIC | Classification | 40,000 | Concrete structures |
| South African Highway | Classification | 14,000 | Road pavements |
| GAPs v1 | Classification | 9.0M | Multi-domain |
| GAPs v2 | Classification | 4.3M | Multi-domain |
| Railway Track Fault | Detection | Various | Railway infrastructure |
| SDNET-2018 | Classification | 56,000 | Bridge, pavement, walls |
All dataset links are maintained in data/cls_dataset_links.csv. Use our download utilities:
```bash
# List available datasets
python utils/download_cls_datasets.py --list

# Download specific dataset
python utils/download_cls_datasets.py --dataset CCIC --output data/
```

Expected directory structure:

```
dataset/
├── crack/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── no_crack/
    ├── image1.jpg
    ├── image2.jpg
    └── ...
```
Basic crack detection:
```python
import cv2
from finalPipeline import get_proposals, detect_cracks

# Load image
image = cv2.imread('crack.jpg')

# Generate proposals
centers, kp, kp_loc, labels = get_proposals(image, n_clusters=20)

# Detect cracks in proposals
results = detect_cracks(image, centers)
```

With area measurement:

```python
from final_with_area_in_cm import measure_crack_area

# Measure crack with known reference
area_cm2, width_avg = measure_crack_area(
    image_path='crack.jpg',
    reference_width_cm=10.0,  # Known dimension in image
    reference_pixels=500      # Corresponding pixels
)
```

Single dataset training:
```bash
python training_script.py \
    --dataset-paths /data/CCIC \
    --model resnet18 \
    --epochs 50 \
    --batch-size 32 \
    --lr 0.001 \
    --warmup-epochs 5 \
    --output-dir models/ccic_resnet18
```

Multi-dataset training:

```bash
python training_script.py \
    --dataset-paths /data/CCIC /data/SAHighway \
    --model resnet50 \
    --epochs 100 \
    --batch-size 64 \
    --output-dir models/multidomain_resnet50
```

Cross-domain evaluation:

```bash
python training_script.py \
    --dataset-paths /data/CCIC \
    --test-dataset-path /data/SAHighway \
    --load-checkpoint models/ccic_resnet18/best_model.pth \
    --eval-only
```

Key training parameters in `training_script.py`:
| Parameter | Default | Description |
|---|---|---|
| `--model` | resnet18 | Architecture (resnet18/resnet50) |
| `--epochs` | 50 | Total training epochs |
| `--batch-size` | 32 | Training batch size |
| `--lr` | 0.001 | Initial learning rate |
| `--warmup-epochs` | 5 | Epochs with frozen backbone |
| `--weight-decay` | 1e-4 | L2 regularization |
| `--num-workers` | 4 | DataLoader workers |
| Model | Dataset | Val Acc | Test Acc | F1 Score |
|---|---|---|---|---|
| ResNet18 | CCIC + SAHighway | 99.70% | 99.73% | 0.9987 |
| ResNet50 | CCIC + SAHighway | 95.80% | 97.00% | 0.9848 |
| Train Dataset | Test Dataset | Accuracy | Domain Gap |
|---|---|---|---|
| CCIC | SA Highway | 49.86% | 50.11% |
| SA Highway | CCIC | ~50% | ~50% |
Key Finding: Models achieve near-perfect accuracy on in-domain data but fail catastrophically on out-of-distribution datasets, highlighting severe overfitting to dataset-specific features.
| Dataset | Total Images | Crack Detection Rate | False Positive Rate | Overall Accuracy |
|---|---|---|---|---|
| CCIC | 8,000 | 99.33% | 7.45% | 95.94% |
| SA Highway | 1,400 | 99.14% | 82.43% | 58.36% |
Key Finding: Classical methods provide consistent performance but struggle with high-texture backgrounds, generating excessive false positives on asphalt surfaces.
Crack Measurement:
- Minimum width: 2.00 pixels
- Maximum width: 32.56 pixels
- Average width: 16.38 pixels
- Total area: 8,856 pixels
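Pixel measurements like these convert to physical units once a reference of known size is visible in the frame: linear quantities scale by centimeters-per-pixel, areas by its square. A minimal sketch (`pixels_to_cm` is an illustrative helper, using the 10 cm / 500 px reference from the Quick Start example):

```python
def pixels_to_cm(width_px, area_px, reference_width_cm, reference_pixels):
    """Convert pixel measurements to physical units via a known reference object."""
    cm_per_px = reference_width_cm / reference_pixels  # linear scale factor
    return width_px * cm_per_px, area_px * cm_per_px ** 2

# 10 cm reference spanning 500 px gives 0.02 cm/px; areas scale by 0.02^2
w_cm, a_cm2 = pixels_to_cm(16.38, 8856, reference_width_cm=10.0, reference_pixels=500)
```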
Based on our findings, we recommend a three-stage hybrid approach:
SIFT + K-Means → Region Proposals
- Fast, interpretable
- No training required
- Effective crack localization
ResNet/U-Net → Crack Segmentation
- Multi-domain trained
- Domain-adversarial losses
- Heavy augmentation
CRF + Morphology → Final Masks
- Reconnect thin segments
- Remove false positives
- Geometric validation
This hybrid approach leverages the complementary strengths of both methodologies.
- ❌ Severe domain shift: 50% accuracy drop on unseen datasets
- ❌ Dataset bias: Models overfit to surface textures and lighting
- ❌ Thin structure loss: CNN pooling disconnects fine cracks
- ❌ Limited context: Local patches lack global scene understanding
- ❌ High false positives: 82% FP rate on textured surfaces
- ❌ Manual tuning: HSV thresholds require per-dataset adjustment
- ❌ Lighting sensitivity: Performance degrades under variable illumination
- ❌ No semantic understanding: Cannot distinguish crack-like patterns
- ⚠️ Limited GPU resources prevented large-scale GAPs training
- ⚠️ Domain adaptation techniques remain unexplored
- ⚠️ Segmentation models (U-Net, DeepLab) not yet implemented
- Implement domain adaptation (DANN, ADDA)
- Large-scale training on GAPs (13M images)
- Transition to segmentation (U-Net, DeepLabv3+)
- Enhanced evaluation (confusion matrices, ROC curves)
- Implement uncertainty quantification
- Multi-modal fusion (RGB + thermal/depth)
- Temporal modeling for video-based inspection
- Active learning for efficient labeling
- Real-time edge deployment optimization
- 3D crack reconstruction from multiple views
This project was developed as part of EN.601.661 Computer Vision (Fall 2025) at Johns Hopkins University.
Team Members:
- Vivekjyoti Banerjee (vbanerj3)
- Vivek Reddy Nalla Chandrasekharreddy (vreddyn1)
- Venkata Harshavardhan Bontalakoti (vbontal1)
- D.G. Lowe, "Object recognition from local scale-invariant features," ICCV, 1999.
- Xinan Zhang et al., "Deep Learning for Crack Detection: A Review," arXiv:2508.10256, 2025.
- Zhengyun Xu et al., "Application of Deep Convolution Neural Network in Crack Identification," Applied AI, 2022.
- Drew Linsley et al., "Recurrent neural circuits for contour detection," arXiv:2010.15314, 2020.
- Hermann Tapamo et al., "CNNs for Crack Detection on Flexible Road Pavements," SoCPaR, 2023.
- Ç.F. Özgenel, "Concrete Crack Images for Classification," 2019.
- Johns Hopkins University Computer Vision course (EN.601.661.01.FA25)
- Dataset creators and maintainers
- Open-source community for tools and libraries
Last Updated: December 2025

