
DaedalusData

DaedalusData is an open-source platform for the exploration, visualization, and labeling of scientific image collections. It provides researchers and data scientists with a flexible and accessible environment for analyzing, exploring, and extracting knowledge from image datasets.

Designed for researchers, by researchers: DaedalusData prioritizes ease of use and minimal setup. With just Docker installed you can be up and running in minutes, no machine learning or web development expertise required, and the full application runs locally with your data fully under your control.

License: GPL v3

DaedalusData Screenshot


Data Preparation

Features

  • Image Exploration: Browse and interact with image collections in a web-based interface
  • Dimensionality Reduction: Visualize high-dimensional data in intuitive 2D projections using Three.js
  • Metadata Exploration: Analyze and filter images based on metadata attributes using bar charts and violin plots
  • Labeling System: Create and maintain multiple label alphabets for image classification
  • Jupyter Integration: Perform feature extraction and analysis with template notebooks
  • Fully Dockerized: Simple setup and deployment with Docker Compose
  • Extensible Architecture: Mount directories for easy data interchange with other tools

Architecture

DaedalusData combines a Nuxt.js frontend with Jupyter notebooks for analysis and feature extraction. The entire application is containerized with Docker, providing a consistent environment across different systems.

DaedalusData/
├── frontend/         # Nuxt.js frontend application
├── notebooks/        # Jupyter notebooks for analysis
├── data/             # Data files (mounted volume)
│   ├── images/       # PNG image files
│   ├── metadata/     # JSON metadata
│   ├── features/     # Extracted features (CSV/NPZ)
│   ├── projections/  # Dimensionality reduction results
│   └── labels/       # Label alphabets and assignments

Getting Started

DaedalusData is designed to be extremely easy to set up and use. The only prerequisites are:

  1. Docker - Installation Guide
  2. Docker Compose - Installation Guide

That's it! Everything else runs inside Docker, so you don't need to worry about dependencies, Python versions, or library conflicts.

Quick Start

1. Clone & Launch

git clone git@github.com:alexv710/daedalusData.git
cd daedalusData
docker compose up -d
# or
podman-compose up -d

This starts two services: the web frontend at http://localhost:3000 and a Jupyter server at http://localhost:8888.

2. Prepare Sample Data

Open Jupyter at http://localhost:8888 and run the following notebooks in order:

| Notebook | Purpose | Output |
| --- | --- | --- |
| `load_demo_dataset.ipynb` | Download sample dataset | `data/images/` & `data/metadata/` |
| `feature_extraction.ipynb` | Extract image features | `data/features/` |
| `Dimensionality_Reduction.ipynb` | Compute 2D projections | `data/projections/` |

Tip: Run all cells in each notebook sequentially before moving to the next.

3. Generate Atlas & Explore

  1. Open the frontend at http://localhost:3000

  2. Verify that images, metadata, features, and projections show as loaded:

    DaedalusData Status

  3. Click GENERATE ATLAS to create an optimized image atlas for fast loading:

    DaedalusData Atlas Generation

  4. Once complete, click EXPLORE DATASET to start exploring and labeling:

    DaedalusData Atlas Generated

Configuration

The application is configured through the compose.yaml file. Key configuration options:

  • Volume mounts for data directories

Data Organization

DaedalusData operates on the following data structures:

Images

Place your PNG image files in the data/images/ directory. The image filenames (without extension) must match the keys in your metadata JSON file.

Example structure:

data/images/
├── image1.png
├── image2.png
└── image3.png

Metadata

Create a JSON file in the data/metadata/ directory with image metadata. The structure should be a flat JSON object where keys are image names (without file extension) and values are objects containing atomic attributes.

Example images.json:

{
  "image1": {
    "type": "sample",
    "size": 10.5,
    "category": "A"
  },
  "image2": {
    "type": "control",
    "size": 8.3,
    "category": "B"
  }
}
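
Because this filename-to-key correspondence is required, it is worth validating before launching the app. A minimal sketch (the function name is illustrative, not part of the shipped notebooks):

```python
import json
from pathlib import Path

def check_metadata_keys(image_dir, metadata_path):
    """Compare PNG filenames (without extension) against metadata keys.

    Returns (missing, extra): images lacking a metadata entry, and
    metadata entries lacking an image file.
    """
    with open(metadata_path) as f:
        metadata = json.load(f)
    image_names = {p.stem for p in Path(image_dir).glob("*.png")}
    missing = image_names - metadata.keys()
    extra = metadata.keys() - image_names
    return missing, extra
```

For the layout above, `check_metadata_keys("data/images", "data/metadata/images.json")` should return two empty sets.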

Features

Features can be stored in two formats (you may store them however you like, but the sample dimensionality-reduction notebook expects one of the following):

  1. CSV format:

    ,0,1,2,3
    image1,0.002,0.016,0.885,
    image2,0.055,0.844,,
    
  2. NPZ format:

    np.savez_compressed(
        "features.npz", 
        image_names=image_names,
        features=img_features_array
    )

The Jupyter notebooks provide templates for feature extraction.
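
If you exchange features with external tools, a small loader that handles both formats might look like this (an illustrative sketch, not part of the shipped notebooks):

```python
import numpy as np
import pandas as pd

def load_features(path):
    """Return (image_names, feature_matrix) from a CSV or NPZ file."""
    path = str(path)
    if path.endswith(".npz"):
        data = np.load(path)
        return list(data["image_names"]), np.asarray(data["features"], dtype=float)
    # CSV: the first column holds image names, the rest are feature values
    df = pd.read_csv(path, index_col=0)
    return list(df.index), df.to_numpy(dtype=float)
```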

Projections

Projections are stored as JSON files in the data/projections/ directory. A manifest file (projection_manifest.json) lists all available projections.

Example projection_manifest.json:

[
  "umap_image_projection.json",
  "umap_combined_projection.json"
]

Example projection file:

[
  {
    "image": "image1",
    "UMAP1": 1.3517,
    "UMAP2": -1.3488
  },
  {
    "image": "image2",
    "UMAP1": -1.4149,
    "UMAP2": 1.5770
  }
]
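
Writing a projection in this format from Python is straightforward; a sketch assuming the `UMAP1`/`UMAP2` column names shown above (the helper name is hypothetical):

```python
import json
from pathlib import Path

def save_projection(image_names, embedding, out_path, manifest_path=None):
    """Write 2D coordinates as a projection JSON file and optionally
    register the file in projection_manifest.json."""
    records = [
        {"image": name, "UMAP1": float(x), "UMAP2": float(y)}
        for name, (x, y) in zip(image_names, embedding)
    ]
    out_path = Path(out_path)
    out_path.write_text(json.dumps(records, indent=2))
    if manifest_path is not None:
        manifest_path = Path(manifest_path)
        entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
        if out_path.name not in entries:
            entries.append(out_path.name)
            manifest_path.write_text(json.dumps(entries, indent=2))
```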

Labels

Labels are organized into "alphabets": sets of label categories. Each alphabet is stored as a JSON file with a unique ID.

Example label alphabet:

{
  "id": "54f54ea2-a39b-4405-9e4c-9c5bd1456ea5",
  "name": "Category",
  "description": "Main image categories",
  "labels": [
    {
      "id": "843c0f3f-4403-4f30-babf-9f5863da86aa",
      "value": "Type A",
      "description": "Category A images",
      "color": "#17c200",
      "images": ["image1", "image3"]
    },
    {
      "id": "d43e75c8-91e7-4c2b-9302-feea559dc38b",
      "value": "Type B",
      "description": "Category B images",
      "color": "#de1753",
      "images": ["image2"]
    }
  ]
}

A manifest file (label_manifest.json) keeps track of all label alphabets:

{
  "alphabets": [
    "alphabet_a52854a0-0a66-41b2-98e8-74d7da996caa.json",
    "alphabet_54f54ea2-a39b-4405-9e4c-9c5bd1456ea5.json"
  ]
}
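
When analyzing labels outside the frontend (e.g. in a notebook), an alphabet file can be inverted into a simple image-to-label mapping; a sketch assuming the schema above (the function name is illustrative):

```python
import json

def labels_to_mapping(alphabet_path):
    """Invert a label alphabet file into an {image: label_value} dict."""
    with open(alphabet_path) as f:
        alphabet = json.load(f)
    return {
        image: label["value"]
        for label in alphabet["labels"]
        for image in label["images"]
    }
```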

Usage Guide

Workflow Overview

  1. Data Preparation:

    • Place your images in data/images/
    • Create metadata in data/metadata/images.json
  2. Feature Extraction:

    • Use the Jupyter notebooks to extract features from images
    • Save features to data/features/
  3. Dimensionality Reduction:

    • Generate projections using the notebook templates
    • Save projections to data/projections/
  4. Exploration & Labeling:

    • Use the frontend to explore your data
    • Create label alphabets and assign labels to images
    • Filter and analyze based on metadata attributes
    • Labeling in the frontend updates the label JSON files in data/labels/

Jupyter Notebooks

The application provides template notebooks for:

  1. Data Loading: Load and preprocess image data
  2. Feature Extraction: Extract features from images using pre-trained models
  3. Dimensionality Reduction: Project features to 2D space for visualization

The included notebooks implement standard, widely-used methods that work well out of the box:

Feature Extraction

The default feature extraction uses ResNet50, a pre-trained deep learning model:

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Pre-trained model for feature extraction
model = ResNet50(weights='imagenet', include_top=False, pooling='avg')

# Extract features from images
features = model.predict(preprocessed_images)

This approach provides robust image embeddings without requiring any machine learning expertise.

Dimensionality Reduction

The standard dimensionality reduction uses UMAP to create 2D projections:

import numpy as np
import umap

def compute_normalized_umap(features, n_neighbors=15, min_dist=0.1,
                            n_components=2, random_state=42):
    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                        n_components=n_components, random_state=random_state)
    embedding = reducer.fit_transform(features)
    # Normalize coordinates such that the mean is 0 (centered)
    embedding_centered = embedding - np.mean(embedding, axis=0)
    return embedding_centered

The notebooks generate two standard projections:

  • Image-only projection based on visual features
  • Combined projection integrating image features and metadata
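
The exact combination scheme lives in the notebook; one common approach (an illustrative sketch, not necessarily what the notebook does) is to z-score each block and concatenate numeric or one-hot-encoded metadata columns next to the image features:

```python
import numpy as np

def combine_features(img_features, meta_features):
    """Z-score each block, then concatenate image and metadata features."""
    def zscore(m):
        m = np.asarray(m, dtype=float)
        std = m.std(axis=0)
        std[std == 0] = 1.0  # constant columns: avoid division by zero
        return (m - m.mean(axis=0)) / std
    return np.hstack([zscore(img_features), zscore(meta_features)])
```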

Access Jupyter at http://localhost:8888.

Frontend Interface

The web interface provides tools for:

  • Visualizing image collections in 2D/3D projections
  • Filtering and selecting images based on metadata
  • Creating and managing label alphabets
  • Visualizing metadata distributions with bar charts and violin plots
  • Exporting labeled datasets

Access the frontend at http://localhost:3000.

Testing

DaedalusData includes test suites for both the Python data processing and the frontend application.

Python Tests

Validate data formats and structures:

# Run from project root
pytest

# Or inside Docker/Podman
docker compose exec app python -m pytest /app/tests
podman-compose exec app python -m pytest /app/tests

Frontend Tests

Test the Nuxt.js API routes and components:

cd frontend
pnpm test        # Run tests
pnpm test:ci     # Run with coverage

See tests/README.md for more details.

Extending DaedalusData

Adding Custom Feature Extractors

  1. Create a new notebook in the notebooks/ directory
  2. Use the template notebooks as a guide
  3. Save extracted features to data/features/

Creating Custom Projections

  1. Use the dimensionality reduction notebook as a template
  2. Experiment with different algorithms (UMAP, t-SNE, PCA, etc.)
  3. Save projections in the format described above
  4. Update the projection manifest

Contributing

Contributions are welcome! Please see our Contributing Guidelines for details on:

  • Reporting bugs
  • Suggesting features
  • Submitting pull requests
  • Getting help and support

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Citation

If you use DaedalusData in your research, please cite:

@article{wyss2023daedalusdata,
  title={DaedalusData: A Dockerized Open-Source Platform for Exploration, Visualization, and Interactive Labeling of Image Collections},
  author={Wyss, Alexander},
  journal={tbd},
  year={tbd}
}

Acknowledgments

This open-source implementation builds upon research originally published in IEEE Transactions on Visualization and Computer Graphics:

A. Wyss, G. Morgenshtern, A. Hirsch-Hüsler and J. Bernard, "DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing — A Design Study" in IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 54-64, Jan. 2025, doi: 10.1109/TVCG.2024.3456329.

Research Keywords: Visual Analytics, Image Data, Knowledge Externalization, Data Labeling, Anomaly Detection, Medical Manufacturing

The original design study addressed challenges in medical diagnostics, specifically focusing on particle-based contamination in in-vitro diagnostics consumables. This dockerized implementation makes the DaedalusData approach accessible to researchers in various domains beyond medical manufacturing.

For more information about the design study methodology, evaluation results, and theoretical framework for knowledge externalization, please refer to the original publication.

Appendix

HPC deployment

# adjust the docker tag
singularity pull docker://ghcr.io/alexv710/daedalusdata/frontend:sha-a0a47b7
singularity run -B ./data:/app/data frontend_sha-a0a47b7.sif
