DaedalusData is an open-source platform for the exploration, visualization, and labeling of scientific image collections. It provides researchers and data scientists with a flexible and accessible environment for analyzing, exploring, and extracting knowledge from image datasets.
Designed for researchers by researchers: DaedalusData prioritizes ease of use and minimal setup requirements. You can be up and running in minutes with just Docker installed, no machine learning or web development expertise required. You can run the full application locally with all data fully in your control.
- Image Exploration: Browse and interact with image collections in a web-based interface
- Dimensionality Reduction: Visualize high-dimensional data in intuitive 2D projections using Three.js
- Metadata Exploration: Analyze and filter images based on metadata attributes using bar charts and violin plots
- Labeling System: Create and maintain multiple label alphabets for image classification
- Jupyter Integration: Perform feature extraction and analysis with template notebooks
- Fully Dockerized: Simple setup and deployment with Docker Compose
- Extensible Architecture: Mount directories for easy data interchange with other tools
DaedalusData combines a Nuxt.js frontend with Jupyter notebooks for analysis and feature extraction. The entire application is containerized with Docker, providing a consistent environment across different systems.
```
DaedalusData/
├── frontend/        # Nuxt.js frontend application
├── notebooks/       # Jupyter notebooks for analysis
└── data/            # Data files (mounted volume)
    ├── images/      # PNG image files
    ├── metadata/    # JSON metadata
    ├── features/    # Extracted features (CSV/NPZ)
    ├── projections/ # Dimensionality reduction results
    └── labels/      # Label alphabets and assignments
```
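If you mount a data directory of your own, the expected skeleton can be created up front. A minimal sketch in Python, assuming the layout above:

```python
from pathlib import Path

# Create the data/ skeleton that DaedalusData mounts as a volume
data_root = Path("data")
for sub in ("images", "metadata", "features", "projections", "labels"):
    (data_root / sub).mkdir(parents=True, exist_ok=True)
```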
DaedalusData is designed to be extremely easy to set up and use. The only prerequisites are:
- Docker - Installation Guide
- Docker Compose - Installation Guide
That's it! Everything else runs inside Docker, so you don't need to worry about dependencies, Python versions, or library conflicts.
```sh
git clone git@github.com:alexv710/daedalusData.git
cd daedalusData
docker compose up -d
# or
podman-compose up -d
```

This starts two services:
- Frontend → http://localhost:3000
- Jupyter Lab → http://localhost:8888
Open Jupyter at http://localhost:8888 and run the following notebooks in order:
| Notebook | Purpose | Output |
|---|---|---|
| load_demo_dataset.ipynb | Download sample dataset | data/images/ & data/metadata/ |
| feature_extraction.ipynb | Extract image features | data/features/ |
| Dimensionality_Reduction.ipynb | Compute 2D projections | data/projections/ |
Tip: Run all cells in each notebook sequentially before moving to the next.
1. Open the frontend at http://localhost:3000
2. Verify that images, metadata, features, and projections show as loaded
3. Click GENERATE ATLAS to create an optimized image atlas for fast loading
4. Once complete, click EXPLORE DATASET to start exploring and labeling
The application is configured through the compose.yaml file. Key configuration options:
- Volume mounts for data directories
DaedalusData operates on the following data structures:
Place your PNG image files in the data/images/ directory. The image filenames (without extension) must match the keys in your metadata JSON file.
Example structure:
```
data/images/
├── image1.png
├── image2.png
└── image3.png
```
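Since filenames (without extension) must match the metadata keys, it can be worth checking the correspondence before launching the app. A small sketch, with the function name and paths chosen here for illustration:

```python
import json
from pathlib import Path

def check_image_metadata_match(image_dir, metadata_file):
    """Return (image files with no metadata entry, metadata entries with no image)."""
    stems = {p.stem for p in Path(image_dir).glob("*.png")}
    keys = set(json.loads(Path(metadata_file).read_text()).keys())
    return sorted(stems - keys), sorted(keys - stems)
```

Both returned lists should be empty for a consistent dataset.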
Create a JSON file in the data/metadata/ directory with image metadata. The structure should be a flat JSON object where keys are image names (without file extension) and values are objects containing atomic attributes.
Example images.json:
```json
{
  "image1": {
    "type": "sample",
    "size": 10.5,
    "category": "A"
  },
  "image2": {
    "type": "control",
    "size": 8.3,
    "category": "B"
  }
}
```

Features can be stored in two formats (you can store them however you like, but the sample dimensionality reduction notebook expects these formats):
- CSV format:

  ```
  ,0,1,2,3
  image1,0.002,0.016,0.885,
  image2,0.055,0.844,,
  ```

- NPZ format:

  ```python
  np.savez_compressed(
      "features.npz",
      image_names=image_names,
      features=img_features_array,
  )
  ```
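As a concrete sketch of the NPZ variant, a feature matrix can be written and read back with NumPy; the array names image_names and features follow the call shown above, while the toy values are illustrative:

```python
import numpy as np

# Two images, three feature dimensions each (toy values)
image_names = np.array(["image1", "image2"])
img_features_array = np.array([[0.002, 0.016, 0.885],
                               [0.055, 0.844, 0.120]])

np.savez_compressed("features.npz",
                    image_names=image_names,
                    features=img_features_array)

# Reading the file back yields the same arrays
data = np.load("features.npz")
assert list(data["image_names"]) == ["image1", "image2"]
assert data["features"].shape == (2, 3)
```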
The Jupyter notebooks provide templates for feature extraction.
Projections are stored as JSON files in the data/projections/ directory. A manifest file (projection_manifest.json) lists all available projections.
Example projection_manifest.json:
```json
[
  "umap_image_projection.json",
  "umap_combined_projection.json"
]
```

Example projection file:
```json
[
  {
    "image": "image1",
    "UMAP1": 1.3517,
    "UMAP2": -1.3488
  },
  {
    "image": "image2",
    "UMAP1": -1.4149,
    "UMAP2": 1.5770
  }
]
```

Labels are organized into "alphabets" - sets of label categories. Each alphabet is stored as a JSON file with a unique ID.
Example label alphabet:
```json
{
  "id": "54f54ea2-a39b-4405-9e4c-9c5bd1456ea5",
  "name": "Category",
  "description": "Main image categories",
  "labels": [
    {
      "id": "843c0f3f-4403-4f30-babf-9f5863da86aa",
      "value": "Type A",
      "description": "Category A images",
      "color": "#17c200",
      "images": ["image1", "image3"]
    },
    {
      "id": "d43e75c8-91e7-4c2b-9302-feea559dc38b",
      "value": "Type B",
      "description": "Category B images",
      "color": "#de1753",
      "images": ["image2"]
    }
  ]
}
```

A manifest file (label_manifest.json) keeps track of all label alphabets:
```json
{
  "alphabets": [
    "alphabet_a52854a0-0a66-41b2-98e8-74d7da996caa.json",
    "alphabet_54f54ea2-a39b-4405-9e4c-9c5bd1456ea5.json"
  ]
}
```
1. Data Preparation:
   - Place your images in data/images/
   - Create metadata in data/metadata/images.json
2. Feature Extraction:
   - Use the Jupyter notebooks to extract features from images
   - Save features to data/features/
3. Dimensionality Reduction:
   - Generate projections using the notebook templates
   - Save projections to data/projections/
4. Exploration & Labeling:
   - Use the frontend to explore your data
   - Create label alphabets and assign labels to images
   - Filter and analyze based on metadata attributes
   - Labeling in the frontend updates the label JSONs in data/labels/
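The label files written during exploration can be consumed by downstream tools. A sketch (the function name is illustrative, not part of the app) that inverts an alphabet, structured as in the example above, into an image-to-label mapping:

```python
def labels_per_image(alphabet):
    """Map each image name to the label values assigned to it in one alphabet."""
    mapping = {}
    for label in alphabet["labels"]:
        for image in label["images"]:
            mapping.setdefault(image, []).append(label["value"])
    return mapping

# Trimmed-down alphabet in the format shown earlier
alphabet = {
    "name": "Category",
    "labels": [
        {"value": "Type A", "images": ["image1", "image3"]},
        {"value": "Type B", "images": ["image2"]},
    ],
}
print(labels_per_image(alphabet))
# {'image1': ['Type A'], 'image3': ['Type A'], 'image2': ['Type B']}
```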
The application provides template notebooks for:
- Data Loading: Load and preprocess image data
- Feature Extraction: Extract features from images using pre-trained models
- Dimensionality Reduction: Project features to 2D space for visualization
The included notebooks implement standard, widely-used methods that work well out of the box:
The default feature extraction uses ResNet50, a pre-trained deep learning model:
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Pre-trained model for feature extraction
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Extract features from images
features = model.predict(preprocessed_images)
```

This approach provides robust image embeddings without requiring any machine learning expertise.
The standard dimensionality reduction uses UMAP to create 2D projections:
```python
import numpy as np
import umap

def compute_normalized_umap(features, n_neighbors=15, min_dist=0.1,
                            n_components=2, random_state=42):
    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                        n_components=n_components, random_state=random_state)
    embedding = reducer.fit_transform(features)
    # Normalize coordinates such that the mean is 0 (centered)
    embedding_centered = embedding - np.mean(embedding, axis=0)
    return embedding_centered
```

The notebooks generate two standard projections:
- Image-only projection based on visual features
- Combined projection integrating image features and metadata
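A centered embedding like the one returned above can be serialized in the projection format described earlier. A sketch, assuming a list of image names aligned with the embedding rows (the helper name is illustrative):

```python
import json
import numpy as np

def write_projection(embedding, image_names, path):
    """Serialize an (n, 2) embedding to the DaedalusData projection format."""
    records = [
        {"image": name, "UMAP1": round(float(x), 4), "UMAP2": round(float(y), 4)}
        for name, (x, y) in zip(image_names, embedding)
    ]
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

embedding = np.array([[1.3517, -1.3488], [-1.4149, 1.5770]])
write_projection(embedding, ["image1", "image2"], "umap_image_projection.json")
```

Remember to also list the new file in projection_manifest.json so the frontend can find it.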
Access Jupyter at http://localhost:8888.
The web interface provides tools for:
- Visualizing image collections in 2D/3D projections
- Filtering and selecting images based on metadata
- Creating and managing label alphabets
- Visualizing metadata distributions with bar charts and violin plots
- Exporting labeled datasets
Access the frontend at http://localhost:3000.
DaedalusData includes test suites for both the Python data processing and the frontend application.
Validate data formats and structures:
```sh
# Run from project root
pytest

# Or inside Docker/Podman
docker compose exec app python -m pytest /app/tests
podman-compose exec app python -m pytest /app/tests
```

Test the Nuxt.js API routes and components:

```sh
cd frontend
pnpm test     # Run tests
pnpm test:ci  # Run with coverage
```

See tests/README.md for more details.
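As an illustration of the kind of data-format check such a suite performs (the helper below is hypothetical, not taken from the actual test code), a metadata file can be validated as a flat object of atomic attributes:

```python
import json

def validate_metadata(raw):
    """Check that metadata is a flat mapping: image name -> atomic attributes."""
    data = json.loads(raw)
    assert isinstance(data, dict)
    for name, attrs in data.items():
        assert isinstance(attrs, dict), f"{name}: value must be an object"
        for key, value in attrs.items():
            # Atomic attributes only: no nested objects or arrays
            assert isinstance(value, (str, int, float, bool)) or value is None, \
                f"{name}.{key}: value is not atomic"
    return True

validate_metadata('{"image1": {"type": "sample", "size": 10.5}}')
```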
To add custom feature extraction:

- Create a new notebook in the notebooks/ directory
- Use the template notebooks as a guide
- Save extracted features to data/features/

To add custom dimensionality reduction:

- Use the dimensionality reduction notebook as a template
- Experiment with different algorithms (UMAP, t-SNE, PCA, etc.)
- Save projections in the format described above
- Update the projection manifest
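As one example of swapping in another algorithm, a 2D PCA projection can be computed with plain NumPy (SVD on centered features) and then saved in the projection format. A sketch under those assumptions, not the notebook's actual code:

```python
import numpy as np

def compute_pca_2d(features):
    """Project features to 2D via SVD-based PCA, centered at the origin."""
    X = np.asarray(features, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Right singular vectors are the principal axes of the centered data
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:2].T

features = np.array([[1.0, 2.0, 3.0],
                     [2.0, 4.0, 6.5],
                     [3.0, 6.0, 9.0],
                     [1.5, 3.0, 4.4]])
projection = compute_pca_2d(features)
print(projection.shape)  # (4, 2)
```

Because the input is centered before projecting, the output is already centered at the origin, matching the convention used by the UMAP template.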
Contributions are welcome! Please see our Contributing Guidelines for details on:
- Reporting bugs
- Suggesting features
- Submitting pull requests
- Getting help and support
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
If you use DaedalusData in your research, please cite:
```bibtex
@article{wyss2023daedalusdata,
  title={DaedalusData: A Dockerized Open-Source Platform for Exploration, Visualization, and Interactive Labeling of Image Collections},
  author={Wyss, Alexander},
  journal={tbd},
  year={tbd}
}
```
This open-source implementation builds upon research originally published in IEEE Transactions on Visualization and Computer Graphics:
A. Wyss, G. Morgenshtern, A. Hirsch-Hüsler and J. Bernard, "DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing — A Design Study" in IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 54-64, Jan. 2025, doi: 10.1109/TVCG.2024.3456329.
Research Keywords: Visual Analytics, Image Data, Knowledge Externalization, Data Labeling, Anomaly Detection, Medical Manufacturing
The original design study addressed challenges in medical diagnostics, specifically focusing on particle-based contamination in in-vitro diagnostics consumables. This dockerized implementation makes the DaedalusData approach accessible to researchers in various domains beyond medical manufacturing.
For more information about the design study methodology, evaluation results, and theoretical framework for knowledge externalization, please refer to the original publication.
```sh
# adjust the docker tag
singularity pull docker://ghcr.io/alexv710/daedalusdata/frontend:sha-a0a47b7
singularity run -B ./data:/app/data frontend_sha-a0a47b7.sif
```