FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry
This is the official repository for the paper: FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry.
Intermediate feature representations form the backbone of the expressivity and adaptability of deep neural networks, yet their geometric structure remains poorly understood. In this work, we provide indirect insights into this matter: we apply a broad selection of manipulations in input space, ranging from geometric and photometric transformations to local masking and semantic manipulations using generative image editing models, and assess the feasibility of learning a mapping in feature space from the original to the manipulated feature map. To this end, we devise different types of mappings, from linear to non-linear and from local to global, and assess both the reconstruction quality of the mapping and the semantic content of the mapped representations. We demonstrate the feasibility of learning such mappings for all considered transformations. While global (transformer) models that operate on the full feature map often achieve the best results, we show that the same can be achieved with a shared linear model operating on a single feature vector, typically with very little degradation in reconstruction quality, even for highly non-trivial semantic manipulations. We analyze the corresponding mappings across different feature layers and characterize them according to the dominance of weight vs. bias and the effective rank of the linear transformations. These results lend support to the hypothesis that the feature space is, to a first approximation, organized in linear structures. From a broader perspective, the study demonstrates that generative image editing models might open the door to a deeper understanding of the feature space through input manipulation.
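To make the central object concrete, here is a minimal sketch of such a shared linear mapping in PyTorch (illustrative only; the actual mapping models are defined in src/model_implementations.py):

```python
import torch
import torch.nn as nn

class SharedLinearMap(nn.Module):
    """Applies one shared linear map W x + b to every spatial feature vector."""

    def __init__(self, channels: int):
        super().__init__()
        # A single (C x C) weight matrix and bias, shared across all positions.
        self.linear = nn.Linear(channels, channels)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); nn.Linear acts on the last dim, i.e. per position.
        x = feat.permute(0, 2, 3, 1)       # (B, H, W, C)
        x = self.linear(x)                 # same W and b at every location
        return x.permute(0, 3, 1, 2)       # back to (B, C, H, W)

# e.g. map original ConvNeXt stage-3 features (1024 x 9 x 9) to manipulated ones
mapper = SharedLinearMap(channels=1024)
out = mapper(torch.randn(2, 1024, 9, 9))   # (2, 1024, 9, 9)
```

Such a shared per-position linear map is equivalent to a 1 × 1 convolution over the feature map.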
If you find our work helpful, please cite our paper:
```
@misc{krey2026featmapunderstandingimagemanipulation,
  title={FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry},
  author={Elias B. Krey and Nils Neukirch and Nils Strodthoff},
  year={2026},
  eprint={2605.11203},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2605.11203},
}
```
All experiments were conducted on a Linux system with an NVIDIA L40 GPU. Our pipeline consists of the following main steps:

1. Set up the environment
2. Prepare the datasets
3. Apply the image manipulations
4. Extract the features
5. Train the mapping models
6. Test and evaluate the mappings
For ease of use we use a Python 3.10 conda environment. Install all necessary dependencies listed in featmap.yaml and activate the conda environment. All experiments were run with these exact package versions.
```
conda env create -f featmap.yaml
conda activate featmap
```
We cannot directly provide the datasets we used, but they can be reproduced as follows:
- Download the Stanford Cars dataset (16,185 images, 196 classes) and keep the original split of 8,144 training and 8,041 test images:
```
@inproceedings{KrauseStarkDengFei-Fei-3D2013,
  author = {Jonathan Krause and Michael Stark and Jia Deng and Li Fei-Fei},
  title = {3D Object Representations for Fine-grained Categorization},
  booktitle = {4th International IEEE Workshop on 3D Representation and Recognition (3dRR-13)},
  year = {2013}
}
```
- Download the CUB-200-2011 dataset (11,788 images):
```
@techreport{WahCUB_200_2011,
  title = {The Caltech-UCSD Birds-200-2011 Dataset},
  author = {Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},
  year = {2011},
  institution = {California Institute of Technology},
  number = {CNS-TR-2011-001}
}
```
- Sort the datasets into the following structure; all images need unique numeric IDs (see the sketch below the directory tree):
```
datasets/
├── STANFORD_CARS/
│   └── images/
│       ├── train/
│       │   ├── class_1/
│       │   ├── class_2/
│       │   └── ...
│       └── test/
│           ├── class_1/
│           ├── class_2/
│           └── ...
│
└── CUB_200_2011/
    └── images/
        ├── train/
        │   ├── class_1/
        │   ├── class_2/
        │   └── ...
        └── test/
            ├── class_1/
            ├── class_2/
            └── ...
```
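Renaming the images to unique numeric IDs can be scripted along the following lines (a sketch with assumed source paths and file extensions; adapt it to your downloads):

```python
from pathlib import Path
import shutil

def sort_with_numeric_ids(src_root: Path, dst_root: Path) -> None:
    """Copy class folders, renaming every image to a globally unique numeric ID."""
    next_id = 0
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        out_dir = dst_root / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for img in sorted(class_dir.glob("*.jpg")):
            shutil.copy(img, out_dir / f"{next_id}.jpg")
            next_id += 1

# hypothetical source layout; repeat for test/ and for CUB_200_2011
sort_with_numeric_ids(Path("raw/stanford_cars/train"),
                      Path("datasets/STANFORD_CARS/images/train"))
```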
Configure the manipulations you want to apply in config/apply_manipulations.yaml and adjust dataset_path to your stored dataset. Running the script below then creates the folders augmented_train and augmented_test with the same class folder structure, with the augmented images named ID_manipulation.png.
NOTE: If you want to use the Qwen image editing model, ~57 GB of VRAM is required; we used two L40 GPUs for this. Inference took ~30 s per image, so configure imgs_per_class to match your compute budget.
- Create a token at https://huggingface.co/settings/tokens.
- Set the environment variable `export HF_TOKEN='hf_xxxxxxxx'` (you can verify the token as shown below).
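Optionally, you can verify that the token is picked up before launching the script (uses huggingface_hub, which the gated editing models require anyway):

```python
import os
from huggingface_hub import whoami

# Sanity check: HF_TOKEN must be set and accepted by the Hub.
assert os.environ.get("HF_TOKEN"), "HF_TOKEN is not set"
print(whoami()["name"])  # prints your Hugging Face user name on success
```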
```
python src/prepare_datasets/apply_manipulations.py
```
Next, extract the features: configure config/extract_features.yaml and adjust dataset_path to your stored dataset. Select which backbone (convnext, swin), which manipulation_model_dirs (direct, qwen), and whether the train split, the test split, or both should be extracted. This creates separate per-backbone feature folders inside the dataset folder.
NOTE: Since ConvNeXt and SwinV2 are trained for different image sizes, the images are resized accordingly before extraction.
```
python src/prepare_datasets/extract_features.py
```
NOTE: Extracting features for all manipulations can require a large amount of disk space (up to ~1.6 TB), due to the high dimensionality of the stored feature vectors and the number of augmented samples.
| Backbone | Input Images | Layer depth | Feature dimensions |
|---|---|---|---|
| ConvNeXt | 288 × 288 × 3 | 0 | 128 × 72 × 72 |
| | | 1 | 256 × 36 × 36 |
| | | 2 | 512 × 18 × 18 |
| | | 3 | 1024 × 9 × 9 |
| SwinV2 | 384 × 384 × 3 | 3 | 1024 × 12 × 12 |
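The shapes in the table can be reproduced with a timm-style feature extractor, e.g. for ConvNeXt (a sketch assuming timm backbones; the actual extraction code is src/prepare_datasets/extract_features.py):

```python
import timm
import torch

# features_only=True returns the intermediate feature maps of all stages.
model = timm.create_model("convnext_base", pretrained=True, features_only=True)
model.eval()

with torch.no_grad():
    feats = model(torch.randn(1, 3, 288, 288))  # images resized to 288 x 288

for depth, f in enumerate(feats):
    print(depth, tuple(f.shape[1:]))  # (128, 72, 72), ..., (1024, 9, 9)
```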
Configure config/train_mapping.yaml. Here you can set which mapping model should be trained on which manipulated features. Feature dimensions and manipulation names need to match the ones created during feature extraction! Adjust all paths to where you saved your datasets and where the models and training logs should be saved.
```
python src/train_mapping.py
```
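Conceptually, training reduces to regressing the manipulated features from the original ones. Here is a minimal sketch of one optimization step, assuming an MSE-style feature loss (the actual losses are configured via src/eval_helpers):

```python
import torch
import torch.nn.functional as F

def train_step(mapper, optimizer, feat_orig, feat_manip):
    """One step of fitting mapper(feat_orig) ~ feat_manip in feature space."""
    optimizer.zero_grad()
    loss = F.mse_loss(mapper(feat_orig), feat_manip)
    loss.backward()
    optimizer.step()
    return loss.item()

# tiny demo on flattened feature vectors; any mapping model works here
mapper = torch.nn.Linear(8, 8)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
x, y = torch.randn(4, 8), torch.randn(4, 8)
print(train_step(mapper, opt, x, y))
```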
Configure config/test_mapping.yaml. This will apply the trained mapping models to new test features, reconstruct images with FeatInv, and calculate the evaluation metrics.

Required for testing:
- Clone FeatInv into src/FeatInv; all details on its use are explained in the FeatInv GitHub repository.
- Download the pre-trained FeatInv model checkpoint for the selected backbone and place it in the models folder.

NOTE: At the time of writing, only the pre-trained weights of the FeatInv model trained on the final ConvNeXt feature maps (feat3 in our naming) are publicly available HERE.
```
python src/test_mapping.py
```
Optionally, you can evaluate the classification performance of the mapped features using a finetuned ConvNeXt or SwinV2 model. Both backbones were originally trained on ImageNet-1K and need to be finetuned on the 196 Stanford Cars classes.

All finetuning is done in ft_classifier_cars.py. You have the option to train only with the original Stanford Cars images or additionally with all of the augmented images created by apply_manipulations.py. Finetuning with all augmentations is computationally expensive but proved to improve classification performance.
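The head replacement itself is a one-liner with timm-style models (a sketch; the model name is an assumption, the actual finetuning lives in ft_classifier_cars.py):

```python
import timm

# Load an ImageNet-1K checkpoint but replace the 1000-class head with a
# freshly initialized 196-class head for Stanford Cars (same idea for SwinV2).
model = timm.create_model("convnext_base", pretrained=True, num_classes=196)
```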
Once finetuned, you can run classification tests on images with test_img_classifier.py and on features with test_classifier.py. All settings for the feature classification can be adjusted in config/test_classifier.yaml; make sure to select the correct name of the finetuned model there.

test_classifier.py initially only saves the class probabilities of all experiments; eval_class_probs.py then evaluates these probabilities and calculates all classification metrics used in our paper.
```
python src/test_classifier.py
```
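For orientation, evaluating the saved probabilities boils down to something like the following (paths and file format are hypothetical; eval_class_probs.py computes the full metric set):

```python
import numpy as np

probs = np.load("evals/class_probs.npy")  # (N, 196) class probabilities, hypothetical path
labels = np.load("evals/labels.npy")      # (N,) ground-truth class ids
top1 = (probs.argmax(axis=1) == labels).mean()
print(f"top-1 accuracy: {top1:.3f}")
```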
Once the linear mapping models are trained, you can evaluate their weight and bias properties using eval_lin_models_wxb.py. Again, adjust config/eval_lin_models.yaml to the models and paths you used.
```
python src/eval_vis/eval_lin_models_wxb.py
```
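One simple way to quantify the weight-vs-bias dominance of a trained linear mapping (checkpoint path and keys are a hypothetical example; the paper's exact metric is computed by eval_lin_models_wxb.py):

```python
import torch

# Hypothetical checkpoint layout; adapt the path and keys to your saved models.
state = torch.load("models/linear_map.pt", map_location="cpu")
W, b = state["linear.weight"], state["linear.bias"]
print(f"||W||_F = {W.norm():.3f}, ||b||_2 = {b.norm():.3f}, "
      f"ratio = {(W.norm() / b.norm()):.3f}")
```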
For the analysis of the weight matrix properties using singular value decomposition (SVD), first run extract_svd_linear, then calculate the metrics with calc_svd_metrics_linear.

```
python src/eval_vis/svd/extract_svd_linear.py
python src/eval_vis/svd/calc_svd_metrics_linear.py
```
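As a reference, one common definition of effective rank (Roy & Vetterli, 2007) can be computed directly from the singular values (checkpoint layout again hypothetical; the repo's metrics may differ in detail):

```python
import torch

W = torch.load("models/linear_map.pt", map_location="cpu")["linear.weight"]
s = torch.linalg.svdvals(W)               # singular values, descending
p = (s / s.sum()).clamp_min(1e-12)        # normalize to a distribution
eff_rank = torch.exp(-(p * p.log()).sum()).item()  # exp of spectral entropy
print(f"effective rank: {eff_rank:.1f} of at most {min(W.shape)}")
```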
Core directories and their purpose:

| Directory | Purpose |
|---|---|
| `models` | Saved mapping models, FeatInv checkpoints, and finetuned backbones |
| `evals` | Evaluation outputs and reconstructed images |
| `logs` | TensorBoard training logs |
| `config` | Configuration files for all scripts (named after the corresponding Python files) |
| `src` | All source code |
Feature-to-image reconstructor from https://github.com/AI4HealthUOL/FeatInv. Clone it into the src folder.
| File | Purpose |
|---|---|
| `manipulations.py` | Image manipulations |
| `apply_manipulations.py` | Applies selected transformations |
| `feature_dataset_impl.py` | Feature dataset handling with metadata |
| `extract_features.py` | Feature extraction |
| File | Purpose |
|---|---|
| `train_mapping.py` | Trains mappings from original → manipulated features |
| `test_mapping.py` | Applies mappings and evaluates reconstructions |
| `model_implementations.py` | Mapping model definitions |
| `eval_helpers` | Loss functions etc. |
| `eval_functions.py` | All code for evaluating the mapped, reconstructed images against the target original and augmented images |
| `utils.py` | Utility functions |
| File | Purpose |
|---|---|
| `ft_classifier_cars.py` | Finetunes both backbones on the 196 Stanford Cars classes |
| `test_img_classifier.py` | Classification on images |
| `test_img_classifier_recon.py` | Classification on reconstructed images |
| `test_classifier.py` | Classification directly on features |

