# Computation as Workflow

DataJoint's central innovation is to recast relational databases as executable workflow specifications comprising a mixture of manual and automated steps. This chapter connects the theoretical foundations of the [Relational Workflow Model](../20-concepts/04-workflows.md) to practical computation patterns.

## The Relational Workflow Model in Action

Recall that the **Relational Workflow Model** is built on four fundamental concepts:

1. **Workflow Entity** — Each table represents an entity type created at a specific workflow step
2. **Workflow Dependencies** — Foreign keys prescribe the order of operations
3. **Workflow Steps** — Distinct phases where entity types are created (manual or automated)
4. **Directed Acyclic Graph (DAG)** — The schema forms a graph structure ensuring valid execution sequences

The Relational Workflow Model defines a new class of databases: **Computational Databases**, where computational transformations are first-class citizens of the data model. In a computational database, the schema is not merely a passive data structure—it is an executable specification of the workflow itself.
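
To make the DAG idea concrete, here is a minimal sketch that derives a valid execution order from foreign-key dependencies. It uses Python's standard `graphlib` module rather than DataJoint itself. The table names are borrowed from the case study later in this chapter, and the `dependencies` mapping is an illustrative assumption, not something read from a real schema.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed mini-schema: each table maps to the set of tables it references
# through foreign keys (its upstream dependencies).
dependencies = {
    "Image": set(),
    "BlobParamSet": set(),
    "Detection": {"Image", "BlobParamSet"},
    "Detection.Blob": {"Detection"},
}

# static_order() yields each table only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# A circular dependency has no valid execution order and is rejected.
cyclic = {"A": {"B"}, "B": {"A"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print("cycle detected:", cycle_detected)
```

Any order produced this way places `Image` and `BlobParamSet` before `Detection`, which is the property a workflow needs: no step runs before its inputs exist.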

## From Declarative Schema to Executable Pipeline

A DataJoint schema uses **table tiers** to distinguish different workflow roles:

| Tier | Color | Role in Workflow |
|------|-------|------------------|
| **Lookup** | Gray | Static reference data and configuration parameters |
| **Manual** | Green | Human-entered data or data from external systems |
| **Imported** | Blue | Data acquired automatically from instruments or files |
| **Computed** | Red | Derived data produced by computational transformations |

Because dependencies are explicit through foreign keys, each table's `populate()` method can work out what remains to be done: for every combination of upstream keys that has not yet been processed, it calls the table's `make()` method inside an atomic transaction. If anything fails, the transaction is rolled back, preserving **computational validity**: the guarantee that all derived data remains consistent with its upstream dependencies.

This is the essence of **workflow automation**: each table advertises what it depends on, and `populate()` runs only the computations that are still missing. The [Blob-detection Pipeline](../80-examples/075-blob-detection.ipynb) from the examples chapter demonstrates how this plays out in practice.
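
The behavior described above can be imitated with a toy model in plain Python. This is a sketch of the idea only, not DataJoint's implementation: `toy_populate`, `make_square`, and the dictionary standing in for a table are all hypothetical, and a scratch dictionary plays the role of the transaction. Unlike this toy, which records failures and moves on, a real pipeline may surface the error to the caller.

```python
def toy_populate(key_source, table, make):
    """Call make() for every key in key_source that is missing from table."""
    done, failed = [], []
    for key in key_source:
        if key in table:
            continue              # already computed: nothing to do
        scratch = {}              # stands in for an open transaction
        try:
            make(key, scratch)
        except Exception:
            failed.append(key)    # "rollback": the scratch rows are discarded
            continue
        table.update(scratch)     # "commit": all rows become visible together
        done.append(key)
    return done, failed


def make_square(key, out):
    out[key] = key ** 2           # write a row ...
    if key == 2:
        # ... then fail, so the row above must not survive
        raise ValueError("simulated failure after a partial write")


squares = {0: 0}                  # key 0 was computed in an earlier run
done, failed = toy_populate([0, 1, 2, 3], squares, make_square)
print(done, failed, squares)      # [1, 3] [2] {0: 0, 1: 1, 3: 9}

# A second run finds nothing new to compute except the key that keeps failing.
second_done, second_failed = toy_populate([0, 1, 2, 3], squares, make_square)
```

The partial write for key `2` never reaches `squares`, and the second call skips every key that already has a result.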

## Case Study: Blob Detection

The notebook `075-blob-detection.ipynb` assembles a compact image-analysis workflow:

1. **Store source imagery** – `Image` is a manual table with a `longblob` field. NumPy arrays fetched from `skimage` are serialized automatically, illustrating that binary payloads need a serializer when stored in a relational database.
2. **Scan parameter space** – `BlobParamSet` is a lookup table of min/max sigma and threshold values for `skimage.feature.blob_doh`. Each combination represents an alternative experiment configuration.
3. **Compute detections** – `Detection` depends on both upstream tables. Its part table `Detection.Blob` holds every circle (x, y, radius) produced by the detector so that master and detail rows stay in sync.

```python
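import datajoint as dj
from skimage.feature import blob_doh

# `schema`, `Image`, and `BlobParamSet` are defined earlier in the notebook
@schema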
class Detection(dj.Computed):
    definition = """
    -> Image
    -> BlobParamSet
    ---
    nblobs : int
    """

    class Blob(dj.Part):
        definition = """
        -> master
        blob_id : int
        ---
        x : float
        y : float
        r : float
        """

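    def make(self, key):
        # Fetch the image and the detector parameters for this key
        img = (Image & key).fetch1("image")
        params = (BlobParamSet & key).fetch1()
        # blob_doh returns one row per blob, ordered (row, column, sigma)
        blobs = blob_doh(img,
                         min_sigma=params['min_sigma'],
                         max_sigma=params['max_sigma'],
                         threshold=params['threshold'])
        # Insert the master row first, then its part rows
        self.insert1(dict(key, nblobs=len(blobs)))
        # Row index is y, column index is x; for blob_doh, radius is about sigma
        self.Blob.insert(dict(key, blob_id=i, x=x, y=y, r=r)
                         for i, (y, x, r) in enumerate(blobs))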
```

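Running `Detection.populate(display_progress=True)` fans out over every `(image, paramset)` pair, creating six jobs in the demo notebook. Because each job runs in its own atomic transaction, half-written results never leak: a failed `make()` call leaves neither a `Detection` row nor any `Detection.Blob` rows behind, and other users never see a partially populated entry. These **atomicity** and **isolation** guarantees maintain workflow integrity.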
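
The fan-out can be pictured as a cross product of upstream keys minus the keys already present. The sketch below uses plain Python; the attribute names `image_id` and `paramset_id` and the split into two images and three parameter sets are assumptions chosen only so that the total comes to six.

```python
from itertools import product

# Hypothetical primary keys of the two upstream tables
image_ids = [1, 2]
paramset_ids = [1, 2, 3]

# Every (image, paramset) combination is a candidate job
key_source = [
    {"image_id": i, "paramset_id": p}
    for i, p in product(image_ids, paramset_ids)
]

# Keys that already have a result are skipped on the next run
existing = [{"image_id": 1, "paramset_id": 1}]
pending = [key for key in key_source if key not in existing]

print(len(key_source), len(pending))
```

Adding one more parameter set would add one new pending key per image, without touching the results that already exist.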

## Curate the Preferred Result

After inspecting the plots, a small manual table `SelectDetection` records the "best" parameter set for each image. That drives a final visualization that renders only the chosen detections. This illustrates a common pattern: let automation explore the combinatorics, then capture human judgment in a concise manual table.
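
As a rough illustration of this pattern, the sketch below filters automated results by a small set of human choices. It uses plain dictionaries with made-up values. In DataJoint the same filtering would presumably be expressed as a query that restricts the detections by the selection table, which is not shown here.

```python
# Hypothetical detections: (image_id, paramset_id) -> list of (x, y, r) blobs
detections = {
    (1, 1): [(10.0, 12.0, 3.0)],
    (1, 2): [(10.0, 12.0, 3.0), (40.0, 8.0, 2.5)],
    (2, 1): [(5.0, 5.0, 4.0)],
    (2, 2): [],
}

# Human judgment, one entry per image: image_id -> preferred paramset_id
selected = {1: 2, 2: 1}

# Keep only the detections made with the preferred parameter set
chosen = {
    image_id: detections[(image_id, paramset_id)]
    for image_id, paramset_id in selected.items()
}

for image_id, blobs in chosen.items():
    print(image_id, len(blobs))
```

The automated results stay untouched; the selection is a separate, small record that can be revised without recomputing anything.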

## Why Computational Databases Matter

The Relational Workflow Model provides several key benefits:

- **Reproducibility** — Every derived table is defined by its `make()` method, so deleting derived results and rerunning `populate()` regenerates them from raw inputs, providing a clear path from primary data to results
- **Dependency-aware scheduling** — You do not need to script job order; DataJoint infers it from foreign keys (the DAG structure)
- **Computational validity** — Foreign key constraints combined with immutable workflow artifacts ensure downstream results remain consistent with upstream inputs
- **Provenance tracking** — The schema itself documents what was computed from what

## Practical Tips

- Develop `make()` logic with restrictions (e.g., `Detection.populate(key)`) before unlocking the entire pipeline
- Use `display_progress=True` when you need visibility; use `reserve_jobs=True` when distributing work across multiple machines
- If your computed table writes both summary and detail rows, keep the summary in the master table and the detail rows in a part table, and insert both inside `make()`, so the transaction boundary protects them together (see [Master-Part Relationships](../30-database-design/053-master-part.ipynb))
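
The idea behind reserving jobs can be sketched with a toy model in which several workers share one list of keys and claim each key before computing it. This is only an analogy: DataJoint coordinates workers through the database rather than an in-process lock, and the names `worker`, `reservations`, and `results` are hypothetical.

```python
import threading


def worker(name, keys, reservations, lock, results):
    """Claim each unreserved key, then compute it outside the lock."""
    for key in keys:
        with lock:                       # atomic check-and-reserve
            if key in reservations:
                continue                 # another worker owns this key
            reservations[key] = name
        results[key] = key * 10          # stand-in for the real make() work


keys = list(range(8))
reservations, results = {}, {}
lock = threading.Lock()

threads = [
    threading.Thread(target=worker,
                     args=(f"worker-{i}", keys, reservations, lock, results))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), "keys computed, each by exactly one worker")
```

Because the check and the reservation happen as one step, no key is computed twice even though every worker iterates over the full key list.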

The blob-detection notebook is a self-contained template: swap in your own raw data source, adjust the parameter search, and you have the skeleton for an end-to-end computational database ready for scientific workflows.
