# Populate

The `populate` operation is the engine of workflow automation in DataJoint.
While [insert](010-insert.ipynb), [delete](020-delete.ipynb), and [update](030-updates.ipynb) are operations for Manual tables, `populate` automates data entry for **Imported** and **Computed** tables based on the dependencies defined in the schema.

As introduced in [Workflow Operations](000-workflow-operations.md), the distinction between external and automatic data entry maps directly to table tiers:

| Table Tier | Data Entry Method |
|------------|-------------------|
| Lookup | `contents` property (part of schema) |
| Manual | `insert` from external sources |
| **Imported** | **Automatic `populate`** |
| **Computed** | **Automatic `populate`** |

This chapter shows how `populate` transforms the schema's dependency graph into executable computations.

## The Relational Workflow Model in Action

Recall that the **Relational Workflow Model** is built on four fundamental concepts:

1. **Workflow Entity** — Each table represents an entity type created at a specific workflow step
2. **Workflow Dependencies** — Foreign keys prescribe the order of operations
3. **Workflow Steps** — Distinct phases where entity types are created (manual or automated)
4. **Directed Acyclic Graph (DAG)** — The schema forms a graph structure ensuring valid execution sequences

The Relational Workflow Model defines a new class of databases: **Computational Databases**, where computational transformations are first-class citizens of the data model. In a computational database, the schema is not merely a passive data structure—it is an executable specification of the workflow itself.

## From Declarative Schema to Executable Pipeline

A DataJoint schema uses **table tiers** to distinguish different workflow roles:

| Tier | Color | Role in Workflow |
|------|-------|------------------|
| **Lookup** | Gray | Static reference data and configuration parameters |
| **Manual** | Green | Data from external systems or human entry |
| **Imported** | Blue | Data acquired automatically from instruments or files |
| **Computed** | Red | Derived data produced by computational transformations |

Because dependencies are explicit through foreign keys, DataJoint's `populate()` method knows exactly what work each table still needs. For every key in the table's key source (derived from its upstream dependencies) that is not yet present in the table, it executes the table's `make()` method inside an atomic transaction. If anything fails, that transaction is rolled back. This preserves **computational validity**: the guarantee that all derived data remains consistent with its upstream dependencies.

This is the essence of **workflow automation**: each table advertises what it depends on, and `populate()` runs only the computations that are still missing.
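Because the schema is a DAG, a valid execution order for its tables always exists and can be computed from the foreign-key graph alone. The following minimal sketch uses Python's standard-library `graphlib`. The `parents` map is an assumption written by hand, with table names borrowed from the Blob Detection example. DataJoint derives the real graph from the schema declarations.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: table -> tables it references through foreign keys.
# Written by hand here; DataJoint builds the real graph from the schema itself.
parents = {
    "Image": set(),                          # Manual: filled by insert
    "BlobParamSet": set(),                   # Lookup: filled from contents
    "Detection": {"Image", "BlobParamSet"},  # Computed: filled by populate
    "SelectDetection": {"Detection"},        # Manual: curated after review
}


def execution_order(deps):
    """Order tables so that every table comes after all of its parents."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_workflow(deps):
    """A dependency graph with a cycle has no valid execution order."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


print(execution_order(parents))
print(is_valid_workflow({"A": {"B"}, "B": {"A"}}))  # → False
```

Ties, such as `Image` versus `BlobParamSet`, may come out in either order. Only the parent-before-child constraint is guaranteed.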

## The `populate` Method

When called on a Computed or Imported table, `populate()`:

1. **Identifies missing work** — Queries the key source (by default, the join of the parent tables referenced in its primary key) and subtracts keys already present in the table
2. **Iterates over pending keys** — For each missing key, calls the table's `make()` method
3. **Wraps each `make()` in a transaction** — Ensures atomicity: either all inserts succeed or none do
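4. **Handles errors per key** — A failed `make()` rolls back only its own transaction. By default the exception then stops `populate()`; with `suppress_errors=True`, the failure is recorded and the remaining keys are still processed. With `reserve_jobs=True`, errors are also logged in the schema's jobs table.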

```python
# Assumes Image and Detection are declared as in the Blob Detection example
# Process all pending work
Detection.populate(display_progress=True)

# Process a specific subset
Detection.populate(Image & "image_id < 10")

# Distribute across workers
Detection.populate(reserve_jobs=True)
```
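The following sketch shows the four steps listed above in plain Python, with the standard-library `sqlite3` module standing in for the database. This is not DataJoint's implementation. The table layout, the `make` function, and the `restriction` callable are hypothetical. The `suppress_errors` flag is modeled loosely on the real `populate()` argument of the same name. The code shows key-source subtraction, one transaction per key, and rollback on failure.

```python
import sqlite3

# In-memory SQLite stands in for the DataJoint database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (image_id INTEGER PRIMARY KEY, pixels TEXT);
    CREATE TABLE summary (image_id INTEGER PRIMARY KEY, total INTEGER);
""")
conn.executemany("INSERT INTO image VALUES (?, ?)",
                 [(1, "1,2,3"), (2, "4,5"), (3, "oops")])  # image 3 has bad data
conn.commit()


def make(conn, key):
    """Hypothetical make(): fetch inputs, compute a result, insert one row."""
    (pixels,) = conn.execute(
        "SELECT pixels FROM image WHERE image_id = ?", (key,)).fetchone()
    total = sum(int(p) for p in pixels.split(","))  # ValueError for "oops"
    conn.execute("INSERT INTO summary VALUES (?, ?)", (key, total))


def populate(conn, restriction=None, suppress_errors=False):
    # 1. Missing work: keys in the key source that are not yet in the target table.
    pending = [k for (k,) in conn.execute(
        "SELECT image_id FROM image EXCEPT SELECT image_id FROM summary ORDER BY 1")]
    if restriction is not None:  # optional subset, like populate(restriction)
        pending = [k for k in pending if restriction(k)]
    errors = {}
    for key in pending:  # 2. One make() call per pending key.
        try:
            with conn:  # 3. Commit on success, roll back if make() raises.
                make(conn, key)
        except Exception as exc:  # 4. Re-raise by default, or record and continue.
            if not suppress_errors:
                raise
            errors[key] = repr(exc)
    return errors


errors = populate(conn, suppress_errors=True)
print(sorted(errors))  # → [3]
print(conn.execute("SELECT * FROM summary ORDER BY image_id").fetchall())
# → [(1, 6), (2, 9)]
```

Calling `populate` again processes only image 3, because images 1 and 2 already have results.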

The `reserve_jobs=True` option enables parallel execution across multiple processes or machines by using the database itself for job coordination.
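The coordination idea can be sketched with a jobs table whose primary key includes the job's key. A worker claims a key by inserting a row; a duplicate-key error means another worker got there first. DataJoint keeps a jobs table in each schema that follows a similar insert-to-claim approach. The table layout, the `reserve` and `run_worker` helpers, and the in-memory SQLite database below are assumptions for illustration. The workers run one after the other here rather than truly in parallel.

```python
import sqlite3

# Shared database standing in for the one every worker connects to (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        table_name TEXT, job_key TEXT, status TEXT, worker TEXT,
        PRIMARY KEY (table_name, job_key))""")


def reserve(conn, table_name, key, worker):
    """Try to claim a job; return True if this worker now owns it."""
    try:
        with conn:
            conn.execute("INSERT INTO jobs VALUES (?, ?, 'reserved', ?)",
                         (table_name, key, worker))
        return True
    except sqlite3.IntegrityError:
        return False  # another worker already holds (or finished) this key


def run_worker(conn, worker, keys, table_name="Detection"):
    done = []
    for key in keys:
        if reserve(conn, table_name, key, worker):
            done.append(key)  # ...call make(key) here, then update or clear the job
    return done


keys = ["img1", "img2", "img3"]
a = run_worker(conn, "worker-a", keys[:2])
b = run_worker(conn, "worker-b", keys)
print(a, b)  # → ['img1', 'img2'] ['img3']
```

Because the database enforces uniqueness, no extra locking protocol is needed between workers. Every worker just attempts the insert and skips the keys it fails to claim.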

## The `make` Method

The `make()` method defines the computational logic for each entry.
It receives a **key** dictionary identifying which entity to compute and must **fetch** inputs, **compute** results, and **insert** them into the table.
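The fetch, compute, and insert pattern can be illustrated without a database. In this sketch, plain dictionaries stand in for tables; `image_table`, `summary_table`, and `make` are hypothetical names. In DataJoint, the fetch and insert steps would be queries such as `(Image & key).fetch1(...)` and `self.insert1(...)`.

```python
# Hypothetical in-memory "tables": primary key -> row
image_table = {
    1: {"image_id": 1, "pixels": [[0, 5], [7, 2]]},
    2: {"image_id": 2, "pixels": [[1, 1], [1, 1]]},
}
summary_table = {}


def make(key):
    # 1. Fetch: read exactly the upstream inputs identified by the key.
    pixels = image_table[key["image_id"]]["pixels"]
    # 2. Compute: derive the result from those inputs only.
    flat = [v for row in pixels for v in row]
    result = {"max_intensity": max(flat), "mean_intensity": sum(flat) / len(flat)}
    # 3. Insert: store the result under the same primary key.
    summary_table[key["image_id"]] = {**key, **result}


for image_id in image_table:
    make({"image_id": image_id})

print(summary_table[1])
# → {'image_id': 1, 'max_intensity': 7, 'mean_intensity': 3.5}
```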

See the dedicated [make Method](055-make.ipynb) chapter for:
- The three-part anatomy (fetch, compute, insert)
- Restrictions on auto-populated tables
- The three-part pattern for long-running computations
- Transaction handling strategies

## Transactional Integrity

Each `make()` call executes inside an **ACID transaction**. This provides critical guarantees for computational workflows:

- **Atomicity** — The entire computation either commits or rolls back as a unit
- **Isolation** — Partial results are never visible to other processes
- **Consistency** — The database moves from one valid state to another

When a computed table has [part tables](../30-design/053-master-part.ipynb), the transaction boundary encompasses both the master and all its parts. The master's `make()` method is responsible for inserting everything within a single transactional scope. See the [Master-Part](../30-design/053-master-part.ipynb) chapter for detailed coverage of ACID semantics and the master's responsibility pattern.
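To see why the master and its parts must share one transaction, consider the following sketch, which uses `sqlite3` as a stand-in database. The `detection` and `detection_blob` tables and the `make` function are hypothetical. If a part insert fails, the master row is rolled back too, so a blob count is never stored without its blobs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE detection (image_id INTEGER PRIMARY KEY, nblobs INTEGER);
    CREATE TABLE detection_blob (
        image_id INTEGER, blob_id INTEGER,
        x REAL NOT NULL, y REAL NOT NULL, PRIMARY KEY (image_id, blob_id));
""")


def make(conn, image_id, blobs):
    """Insert the master row and all of its parts inside one transaction."""
    with conn:  # commits if the block succeeds, rolls back if anything raises
        conn.execute("INSERT INTO detection VALUES (?, ?)", (image_id, len(blobs)))
        conn.executemany(
            "INSERT INTO detection_blob VALUES (?, ?, ?, ?)",
            [(image_id, i, x, y) for i, (x, y) in enumerate(blobs)])


make(conn, 1, [(10.0, 12.5), (30.0, 4.0)])    # succeeds: master + 2 parts
try:
    make(conn, 2, [(5.0, 5.0), (None, 8.0)])  # second part violates NOT NULL
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT * FROM detection").fetchall())  # → [(1, 2)]
```

Image 2 leaves no trace: neither its master row nor its first, valid part row survives the rollback.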

## Case Study: Blob Detection

The [Blob Detection](../80-examples/075-blob-detection.ipynb) example demonstrates these concepts in a compact image-analysis workflow:

1. **Source data** — `Image` (manual) stores NumPy arrays as `longblob` fields
2. **Parameter space** — `BlobParamSet` (lookup) defines detection configurations via `contents`
3. **Computation** — `Detection` (computed) depends on both upstream tables

The `Detection` table uses a master-part structure: the master row stores an aggregate (blob count), while `Detection.Blob` parts store per-feature coordinates. When `populate()` runs:

- Each `(image_id, blob_paramset)` combination triggers one `make()` call
- The `make()` method fetches inputs, runs detection, and inserts both master and parts
- The transaction ensures all blob coordinates appear atomically with their count

```python
Detection.populate(display_progress=True)
# Detection: 100%|██████████| 6/6 [00:01<00:00, 4.04it/s]
```
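The one-`make()`-per-combination behavior follows from the key source being the product of the two parent tables' keys. The sketch below makes three assumptions: three hypothetical images, two hypothetical parameter sets with invented threshold values, and a toy threshold "detector" in place of real blob detection. None of this is the data from the example. The code shows that each `(image_id, blob_paramset)` pair produces one master row whose count matches its part rows.

```python
from itertools import product

# Hypothetical upstream contents (not the data from the Blob Detection example).
images = {1: [[0, 9], [3, 8]], 2: [[5, 5], [5, 5]], 3: [[1, 2], [7, 0]]}
paramsets = {1: {"threshold": 4}, 2: {"threshold": 6}}

# Key source: every combination of upstream primary keys.
key_source = list(product(images, paramsets))

detection, detection_blob = {}, []


def make(image_id, blob_paramset):
    # Toy "detector": every pixel above the threshold counts as one blob.
    thr = paramsets[blob_paramset]["threshold"]
    blobs = [(r, c) for r, row in enumerate(images[image_id])
             for c, v in enumerate(row) if v > thr]
    detection[(image_id, blob_paramset)] = len(blobs)  # master: aggregate count
    detection_blob.extend((image_id, blob_paramset, r, c) for r, c in blobs)  # parts


for key in key_source:
    make(*key)

print(len(key_source))  # → 6
print(detection)
```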

This pattern—automation exploring combinatorics, then human curation—is common in scientific workflows. After reviewing results, the `SelectDetection` manual table records the preferred parameter set for each image. Because `SelectDetection` depends on `Detection`, it implicitly has access to all `Detection.Blob` parts for the selected detection.
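Because a selection row identifies a specific `Detection` entry, restricting the parts by the selection retrieves exactly the blobs of the chosen parameter set. The sketch below mimics that restriction in plain Python; the rows and the `selected_blobs` helper are hypothetical. In DataJoint, the equivalent query would presumably be a restriction such as `Detection.Blob & SelectDetection`.

```python
# Hypothetical part rows: (image_id, blob_paramset, x, y)
detection_blob = [
    (1, 1, 10.0, 12.0), (1, 1, 30.0, 4.0),
    (1, 2, 10.0, 12.0),
    (2, 1, 5.0, 5.0), (2, 2, 6.0, 7.0), (2, 2, 8.0, 1.0),
]
# Hypothetical curation result: preferred parameter set per image.
select_detection = {1: 2, 2: 2}


def selected_blobs(parts, selection):
    """Keep only the parts whose (image_id, blob_paramset) matches the selection."""
    return [row for row in parts if selection.get(row[0]) == row[1]]


print(selected_blobs(detection_blob, select_detection))
# → [(1, 2, 10.0, 12.0), (2, 2, 6.0, 7.0), (2, 2, 8.0, 1.0)]
```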

:::{seealso}
- [The `make` Method](055-make.ipynb) — Anatomy, constraints, and patterns
- [Blob Detection](../80-examples/075-blob-detection.ipynb) — Complete working example
- [Master-Part](../30-design/053-master-part.ipynb) — Transaction semantics and dependency implications
:::

## Why Computational Databases Matter

The Relational Workflow Model provides several key benefits:

| Benefit | Description |
|---------|-------------|
| **Reproducibility** | Deleting derived entries and rerunning `populate()` regenerates them from their upstream inputs |
| **Dependency-aware scheduling** | DataJoint infers job order from foreign keys (the DAG structure) |
| **Computational validity** | Transactions ensure downstream results stay consistent with upstream inputs |
| **Provenance tracking** | The schema documents what was computed from what |

## Practical Tips

- **Develop incrementally** — Test `make()` logic with restrictions (e.g., `Table.populate(restriction)`) before processing all data
- **Monitor progress** — Use `display_progress=True` for visibility during development
- **Distribute work** — Use `reserve_jobs=True` when running multiple workers
- **Use master-part for multi-row results** — When a computation produces both summary and detail rows, structure them as master and parts to keep them in the same transaction