This repository contains the code and pre-trained models used in our DASC 2025 paper on CycleGAN-assisted domain adaptation for UAV payload detection, where a classifier trained only on synthetic data is evaluated on real UAV flight imagery after a CycleGAN-based translation step.
The project includes:
- PyTorch implementations of ResNet34 and EfficientNet-B2 classifiers.
- Pre-trained weights trained on a simulated UAV payload dataset.
- Scripts to reproduce the classification results (Table 1) and feature-space t-SNE visualization (Figure 3).
Contents:
- Project overview
- How to use this repository
- Dataset details
- Dataset structure
- Dataset stats
- Models and training
- Feature-space visualization (Figure 3)
- Evaluation metrics and main results
- Citation
The goal of this work is binary classification of a UAV as either:
- loaded – UAV carrying a payload.
- unloaded – UAV without a payload.
Because collecting and annotating large real UAV datasets is difficult, we:
1. Generate a synthetic dataset of UAV images in Microsoft AirSim under varied conditions (flight trajectories, lighting, backgrounds, sensor noise, camera settings, etc.).
2. Train deep classifiers (ResNet34 and EfficientNet-B2) only on synthetic images.
3. Translate real test images into the synthetic domain using a pre-trained CycleGAN.
4. Evaluate the classifiers on:
   - real images directly (no adaptation),
   - CycleGAN-translated real images (with adaptation).
This repository provides the code and pre-trained weights for steps (2) and (4), plus a feature-space analysis script that compares simulated, real, and CycleGAN-translated feature distributions.
Create a Python environment with (example versions):
- Python >= 3.9
- PyTorch and torchvision (CUDA support recommended)
- NumPy
- Matplotlib
- scikit-learn
Example (conda):

```
conda create -n dasc2025 python=3.9
conda activate dasc2025
pip install torch torchvision numpy matplotlib scikit-learn
```

Our experiments use two main types of data:
1. Synthetic (simulated) data
- Total of 4,538 images, balanced between:
- loaded class: 2,269 images
- unloaded class: 2,269 images
- Generated in Microsoft AirSim using four different quadrotor models and varied:
- flight trajectories,
- backgrounds and terrain,
- lighting and weather conditions,
- camera settings and sensor noise.
- This dataset is used only for training the classifiers (no real data is used during training).
2. Real flight data
- Real flight experiments with a target UAV (loaded/unloaded) observed by an RGB camera mounted on another UAV.
- Real images are used only for testing:
- once directly (no adaptation),
- once after translation by CycleGAN into the synthetic style.
CycleGAN-translated real data (derived from the real test images)
- Real test images are passed through a pre-trained CycleGAN generator that translates them into the synthetic domain before classification.
- Used to evaluate the effect of domain adaptation on classification performance and feature alignment.
Due to size and sharing constraints, the actual image datasets (DASC2025_datasetK) are not distributed with this repository. To reproduce our results, you should organize your data to match the expected folder structure.
Each dataset is stored in a folder named:
DASC2025_datasetK/ where K is an integer (e.g., 1, 2, 3, ...). Inside each dataset folder, the expected structure is:
```
DASC2025_datasetK
├── Training
│   ├── loaded
│   └── unloaded
└── Testing
    ├── loaded
    └── unloaded
```

- Training/loaded and Training/unloaded contain the synthetic (or synthetic-style) training images for each class.
- Testing/loaded and Testing/unloaded contain the test images for each class, which may be:
- real images (direct evaluation),
- CycleGAN-translated real images,
- or other variants depending on the experiment.
For the experiments reported in the DASC 2025 paper:
- DASC2025_dataset1/ contains the synthetic training data and the test images for direct real-data evaluation for both classifiers.
- DASC2025_dataset2/ contains the CycleGAN-translated real test images used to evaluate the adaptation pipeline.
- DASC2025_dataset3/ is used together with DASC2025_dataset1/ by feature_compare_v02.py to build the three domains (Simulated, Real, CycleGAN-adapted real) for the t-SNE visualization and domain alignment metrics.
You are free to define additional DASC2025_datasetK folders, as long as they follow the same Training/Testing/loaded/unloaded structure. The classifier scripts and visualization script only require this directory layout and the correct data_dir/folders paths.
At minimum, the following statistics should hold for the synthetic training dataset:
| Split | Domain | Class | # Images |
|---|---|---|---|
| Training | Synthetic | loaded | 2,269 |
| Training | Synthetic | unloaded | 2,269 |
In our experiments, we randomly divide the Training subset into:
- Train: ~85% of images
- Validation: ~15% of images
using a fixed random seed for reproducibility.
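As a minimal sketch of such a split (the dataset path, transform, and the seed value 42 are assumptions for illustration, not values taken from the released scripts):

```python
import torch
from torchvision import datasets, transforms

# Transforms matching those listed under "Models and training".
tfms = transforms.Compose([transforms.Resize((255, 255)),
                           transforms.ToTensor()])
train_ds = datasets.ImageFolder("DASC2025_dataset1/Training", transform=tfms)

# Reproducible ~85% / 15% train/validation split with a fixed seed.
n_val = int(0.15 * len(train_ds))
gen = torch.Generator().manual_seed(42)  # seed value assumed
train_split, val_split = torch.utils.data.random_split(
    train_ds, [len(train_ds) - n_val, n_val], generator=gen)
```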
The real and CycleGAN-translated test sets follow the same loaded/unloaded class structure in their respective Testing folders; exact counts depend on your recorded data.
We use two ImageNet-pretrained models from torchvision.models:
1. ResNet34
- Final fully-connected layer replaced by a 2-unit linear layer for binary classification.
2. EfficientNet-B2
- The classifier head is replaced with a dropout + 2-unit linear layer:

```python
self.network.classifier = nn.Sequential(
    nn.Dropout(p=0.3, inplace=True),
    nn.Linear(1408, 2),
)
```

Both networks are wrapped in a common ImageClassificationBase class with:
- cross-entropy loss,
- accuracy computation,
- convenience methods for training, validation, and logging.
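As a rough illustration, a minimal version of such a base class might look like the following (the method names and details follow common PyTorch training-loop patterns and are assumptions, not the exact code in this repository):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of correct top-1 predictions in a batch.
    preds = torch.argmax(outputs, dim=1)
    return (preds == labels).float().mean()

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # forward pass
        return F.cross_entropy(out, labels)  # cross-entropy loss

    @torch.no_grad()
    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        return {"val_loss": F.cross_entropy(out, labels),
                "val_acc": accuracy(out, labels)}
```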
For all experiments, images are resized and converted to tensors:

```python
transforms.Resize((255, 255)),
transforms.ToTensor()
```

applied separately to the training and testing datasets.
Common settings:
- Optimizer: torch.optim.SGD
- Loss: cross-entropy
- Batch size: 16
- Device: GPU if available, else CPU
- Train/validation split: 85% / 15% of the Training subset
ResNet34:
- Epochs: 10
- Max learning rate: 0.03
- Weight decay: 1e-4
- Gradient clipping: 0.1
- Learning-rate schedule: one-cycle (OneCycleLR)
EfficientNet-B2:
- Epochs: 15
- Max learning rate: 0.001
- Weight decay: 1e-4
- Gradient clipping: 0.1
- Learning-rate schedule: one-cycle (OneCycleLR)
By default, the backbone is frozen for initial training, and then unfrozen for fine-tuning. The scripts then evaluate the final model on the Testing subset and (optionally) save the trained weights.
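For orientation, here is a hedged sketch of a training loop consistent with these settings (the function name, defaults, and value-based gradient clipping are assumptions, not taken from the released scripts):

```python
import torch

def fit_one_cycle(model, train_loader, epochs=10, max_lr=0.03,
                  weight_decay=1e-4, grad_clip=0.1, device="cuda"):
    # SGD optimizer with weight decay, as in the settings above.
    optimizer = torch.optim.SGD(model.parameters(), lr=max_lr,
                                weight_decay=weight_decay)
    # One-cycle learning-rate schedule stepped once per batch.
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr, epochs=epochs, steps_per_epoch=len(train_loader))
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in train_loader:
            loss = model.training_step((images.to(device), labels.to(device)))
            loss.backward()
            # Clip gradients (assumed to be value clipping at 0.1).
            torch.nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            optimizer.step()
            optimizer.zero_grad()
            sched.step()
```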
To reproduce the t-SNE visualization and domain alignment metrics (cosine similarity & MMD) shown in Figure 3 of the paper, use:
feature_compare_v02.py
This script:
1. Loads a pre-trained ResNet34 from dasc_SimBased.pth.
2. Strips any network. prefix from the state dict and inserts the weights into a vanilla torchvision.models.resnet34.
3. Builds a feature extractor by removing the final classification layer and flattening the output.
4. Iterates over three folders corresponding to:
   - Simulated data (Training from DASC2025_dataset1),
   - Real data (Testing from DASC2025_dataset1),
   - CycleGAN data (Testing from DASC2025_dataset3).
5. Extracts deep features and computes (see the sketch after this list):
   - cosine similarity between domain means,
   - Maximum Mean Discrepancy (MMD) between domain feature distributions,
   - a 2-D t-SNE embedding of all features.
6. Plots a scatter plot with a different color for each domain.
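A condensed sketch of steps 2, 3, and 5 follows; the checkpoint key handling and the RBF-kernel MMD estimator are assumptions, and the released script may differ in details:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics.pairwise import rbf_kernel
from torchvision import models

# Step 2: load the checkpoint and strip any "network." prefix.
state = torch.load("dasc_SimBased.pth", map_location="cpu")
state = {k.replace("network.", "", 1): v for k, v in state.items()}
resnet = models.resnet34()
resnet.fc = nn.Linear(resnet.fc.in_features, 2)  # 2-class head, as in training
resnet.load_state_dict(state)

# Step 3: drop the classification layer and flatten -> 512-D features.
extractor = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten()).eval()

# Step 5 metrics, over feature matrices X, Y of shape (N, 512).
def cosine_similarity(a, b):
    # Cosine similarity between two domain-mean feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmd_rbf(X, Y, gamma=1.0):
    # Biased RBF-kernel estimate of Maximum Mean Discrepancy.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())
```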
Before running, make sure the folders dictionary at the top of feature_compare_v02.py points to your actual dataset locations:
```python
folders = {
    "Simulated": "/path/to/DASC2025_dataset1/Training",
    "Real": "/path/to/DASC2025_dataset1/Testing",
    "CycleGAN": "/path/to/DASC2025_dataset3/Testing"
}
```

Run:

```
python feature_compare_v02.py
```
The script will print cosine similarity and MMD values and display (or save, if you uncomment plt.savefig) the t-SNE figure.
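As a self-contained illustration of the t-SNE step, the following uses random arrays as placeholders for the extracted features (in practice these come from the feature extractor sketched above):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder 512-D features; substitute real per-domain extractor outputs.
domains = {"Simulated": np.random.randn(100, 512),
           "Real": np.random.randn(80, 512) + 0.5,
           "CycleGAN": np.random.randn(80, 512) + 0.1}

# Embed all features jointly, then color the points by domain.
emb = TSNE(n_components=2, random_state=0).fit_transform(
    np.vstack(list(domains.values())))

start = 0
for name, feats in domains.items():
    plt.scatter(emb[start:start + len(feats), 0],
                emb[start:start + len(feats), 1], s=6, label=name)
    start += len(feats)
plt.legend()
plt.show()
```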
We use:
- Classification accuracy on the real test set:
  - directly (real images only),
  - with CycleGAN translation (CycleGAN → Classifier).
- Feature-space alignment metrics:
  - cosine similarity between synthetic and real feature means,
  - Maximum Mean Discrepancy (MMD) between synthetic and real feature distributions.
These metrics are computed using the feature vectors extracted by ResNet34.
On real experimental images, we obtain:
| Classifier | Input Type | Accuracy on real data |
|---|---|---|
| ResNet34 | Real (Direct) | 67% |
| ResNet34 | CycleGAN → Classifier | 82% |
| EfficientNet-B2 | Real (Direct) | 62% |
| EfficientNet-B2 | CycleGAN → Classifier | 80% |
CycleGAN-based domain translation yields a substantial accuracy gain for both networks (+15 points for ResNet34, +18 points for EfficientNet-B2), showing that aligning real images with the synthetic training domain markedly improves generalization.
Using ResNet34 features for three sets of images (Simulated, Real, CycleGAN-translated), we observed:
- Cosine similarity (Simulated ↔ Real) ≈ 0.9031
- Cosine similarity (Simulated ↔ CycleGAN-translated Real) ≈ 0.9898
- MMD (Simulated ↔ Real) ≈ 0.0060
- MMD (Simulated ↔ CycleGAN-translated Real) ≈ 0.0021
The t-SNE plot shows that CycleGAN-translated real samples cluster much closer to the simulated data than raw real images, explaining the improved classification accuracy.
If you use this code or pre-trained models in your research, please cite our DASC 2025 paper: