FAIR Data Machine

Welcome to the High-Value Data FAIRification machine.

This project provides a practical “toolkit in a box” for turning raw data into reusable digital products. It generates Docker workstation images tailored to your specific FAIRification needs, driven by a central component registry.

🚀 Quick Start

The FAIR Data Machine is now fully customizable. You can use the built-in Builder UI to select exactly the tools and packages you need.

1. Launch the Builder

python3 scripts/ui/app.py

2. Generate and Build

Select your components (Python, Postgres, DuckDB, R, AI tools, etc.).
Pick specific Python/R packages from the dynamic dropdowns.
Click Generate Build Package.
Follow the instructions in the newly created custom-build/ directory to build your image.

> [!TIP]
> The custom-build/ directory is overwritten every time you generate a new build package. If you want to keep a specific version, simply rename the folder (e.g., mv custom-build my-special-build) before clicking generate again.
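The rename step from the tip can also be scripted. The following is a minimal sketch, assuming the generated directory is named custom-build/ as described above. The helper name `preserve_build` and the timestamp-suffix naming are illustrative assumptions and are not part of the project.

```python
import shutil
import time
from pathlib import Path


def preserve_build(build_dir="custom-build", label=None):
    """Rename build_dir so the next generated package does not overwrite it.

    Hypothetical helper, not part of the project. Returns the new path,
    or None if build_dir does not exist.
    """
    src = Path(build_dir)
    if not src.is_dir():
        return None
    # Without an explicit label, fall back to a timestamp suffix.
    suffix = label or time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}-{suffix}")
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; choose another label")
    shutil.move(str(src), str(dest))
    return dest
```

Calling `preserve_build(label="my-special")` from the repository root before generating again would move custom-build/ to custom-build-my-special/.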

🛠 Features & Components

The machine is powered by a central Component Registry (config/components.json). Available tools include:

Languages & Runtimes

Python 3.12: Fast dependency management via uv.
Node.js (LTS): JavaScript runtime with pnpm.
R: Statistical computing layer with custom package support.

Databases & Query Engines

PostgreSQL + pgvector: Relational storage with vector similarity search.
DuckDB: Embedded analytical SQL engine.
QLever: RDF/SPARQL-oriented query engine.
Oxigraph: Lightweight RDF graph database.

Analytical Tools

VisiData: Terminal-first interactive data exploration.
QSV: High-performance CSV toolkit.
ReadStat: Interoperability for legacy statistical formats (SPSS/Stata/SAS).

Workflow & Version Control

Git: Distributed version control system.
GitHub CLI: Official command-line interface for GitHub.

AI Assistants (Optional)

Claude Code CLI: Anthropic's terminal-based AI coding assistant.
Gemini CLI: Google's AI assistant for automation and scripting.
Ollama + ollama-code: Local LLM runtime for private, offline AI assistance.

Infrastructure (Optional)

Nginx Proxy Manager: Reverse proxy and TLS management for controlled service access.

📂 Project Structure

config/components.json: Central registry of all tool installation logic and metadata.
scripts/runtime/generator.py: Optimized build generator that produces clean, single-stage Dockerfiles.
scripts/ui/app.py: Interactive Gradio interface for image customization.
custom-build/: The output directory for your generated workstation files.
docs/OFFLINE.md: Step-by-step guide for air-gapped deployment.
docs/MAINTENANCE_GUIDE.md: Instructions for expanding the registry and UI.
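To illustrate how a registry-driven generator of this kind can work, here is a minimal sketch. It is not the project's generator.py: the registry keys (`base_image`, `components`, `install`, `requires`), the placeholder install commands, and the functions `resolve` and `render_dockerfile` are all assumptions made for illustration, and the actual layout of config/components.json may differ.

```python
import json

# Hypothetical registry; the real config/components.json schema may differ.
# Install steps are placeholders, not real installation commands.
REGISTRY_JSON = """
{
  "base_image": "ubuntu:24.04",
  "components": {
    "python": {"install": ["echo installing python"], "requires": []},
    "duckdb": {"install": ["echo installing duckdb"], "requires": ["python"]},
    "git":    {"install": ["echo installing git"], "requires": []}
  }
}
"""


def resolve(selected, components):
    """Return the selected components plus dependencies, dependencies first."""
    ordered, visiting = [], set()

    def visit(name):
        if name in ordered:
            return
        if name not in components:
            raise KeyError(f"unknown component: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle at: {name}")
        visiting.add(name)
        for dep in components[name].get("requires", []):
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for name in selected:
        visit(name)
    return ordered


def render_dockerfile(selected, registry):
    """Render a single-stage Dockerfile: one FROM, one RUN per component."""
    lines = [f"FROM {registry['base_image']}"]
    for name in resolve(selected, registry["components"]):
        steps = " && ".join(registry["components"][name]["install"])
        lines.append(f"# {name}")
        lines.append(f"RUN {steps}")
    return "\n".join(lines) + "\n"


registry = json.loads(REGISTRY_JSON)
print(render_dockerfile(["duckdb", "git"], registry))
```

Selecting only `duckdb` and `git` still pulls in `python`, because the hypothetical `requires` field is resolved before rendering. Keeping a single `FROM` line is what makes the output single-stage.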

👩‍💻 For Maintainers

The architecture is designed to be extensible. Adding a new tool or updating a package version only requires editing a single JSON file, the Component Registry (config/components.json).

Refer to docs/MAINTENANCE_GUIDE.md for details on adding components, resolving dependencies, and updating the UI.
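As a rough illustration of what "editing a single JSON file" can look like in practice, the sketch below adds one entry to an in-memory registry and checks it before serializing. The keys (`components`, `description`, `install`, `requires`) and the helper `add_component` are hypothetical; consult the maintenance guide for the actual schema.

```python
import json

# Hypothetical registry content; the real config/components.json may use
# different keys. Install steps are placeholders.
registry = {
    "components": {
        "python": {
            "description": "Python runtime",
            "install": ["echo installing python"],
            "requires": [],
        },
    }
}


def add_component(registry, name, description, install, requires=()):
    """Add one component entry after checking that it is well-formed."""
    components = registry["components"]
    if name in components:
        raise ValueError(f"component already registered: {name}")
    missing = [dep for dep in requires if dep not in components]
    if missing:
        raise ValueError(f"unknown dependencies for {name}: {missing}")
    if not install:
        raise ValueError(f"{name} needs at least one install step")
    components[name] = {
        "description": description,
        "install": list(install),
        "requires": list(requires),
    }
    return registry


add_component(
    registry,
    name="duckdb",
    description="Embedded analytical SQL engine",
    install=["echo installing duckdb"],
    requires=["python"],
)

# Serialize the registry as it might be written back to the JSON file.
print(json.dumps(registry, indent=2))
```

Validating dependencies at the moment an entry is added catches a misspelled `requires` name early, rather than when an image is being built.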

> [!IMPORTANT]
> Early Prototype: This project is in an early stage. Expect breaking changes and limited documentation as we refine the FAIRification workflows.

