# Insert

The `insert` operation adds new entities to the database.
In the context of the [Relational Workflow Model](../20-concepts/05-workflows.md), inserting data is how information enters the pipeline—either as reference data in Lookup tables or as new workflow items in Manual tables.

## Insert in the Workflow

Different table tiers use insert differently:

| Table Tier | When to Insert | Typical Pattern |
|------------|----------------|-----------------|
| **Lookup** | Schema setup | Once, with `skip_duplicates=True` |
| **Manual** | Ongoing workflow | As new subjects, sessions, trials occur |
| **Imported/Computed** | Never manually | Only inside `make()`, invoked by `populate()` |

For **Lookup tables**, insertion happens during schema initialization—these tables define the reference data and parameter sets that configure your pipeline.

For **Manual tables**, each insert represents new information entering the workflow.
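These inserts are what drive downstream computations: when you insert a new session, all Imported and Computed tables that depend on it become candidates for population. Nothing is computed at insert time; the new entries are picked up the next time `populate()` is called.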
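
As a rough mental model, the candidates for population are the upstream keys that have no matching downstream entry yet. The sketch below uses plain Python sets in place of tables; it is not how DataJoint implements this, and the names `session_keys`, `processed_keys`, and `pending_keys` are hypothetical.

```python
# Conceptual model only: plain Python sets stand in for tables.
# Each key is a (subject_id, session_date) tuple.

def pending_keys(upstream, downstream):
    """Return upstream keys that have no downstream entry yet, sorted."""
    return sorted(set(upstream) - set(downstream))


session_keys = {("M001", "2024-01-15"), ("M001", "2024-01-16")}
processed_keys = {("M001", "2024-01-15")}  # already computed

# Inserting a new session adds one more candidate for population.
session_keys.add(("M001", "2024-01-17"))

todo = pending_keys(session_keys, processed_keys)
print(todo)
```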

## The `insert1` Method

Use `insert1` to add a single row:

```python
<Table>.insert1(row, ignore_extra_fields=False)
```

**Parameters:**
- **`row`**: A dictionary with keys matching table attributes
- **`ignore_extra_fields`**: If `True`, extra dictionary keys are silently ignored; if `False` (default), extra keys raise an error

**Example:**
```python
# Insert a single subject into a Manual table
Subject.insert1({
    'subject_id': 'M001',
    'species': 'mouse',
    'sex': 'M',
    'date_of_birth': '2023-06-15'
})
```

Use `insert1` when:
- Adding individual records interactively
- Processing items one at a time in a loop where you need error handling per item
- Debugging, where single-row operations provide clearer error messages
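
The loop case in the list above might look like the following sketch. `insert_one_by_one` and `InMemoryTable` are hypothetical names introduced here. The stand-in class exists only so the example runs without a database connection, and the sketch assumes nothing about DataJoint beyond the `insert1(row)` call shown earlier.

```python
def insert_one_by_one(table, rows):
    """Call table.insert1 for each row; return (inserted, failed) lists.

    `table` is any object exposing insert1(row), such as a DataJoint table.
    """
    inserted, failed = [], []
    for row in rows:
        try:
            table.insert1(row)
        except Exception as err:  # narrow this to the errors you expect
            failed.append((row, str(err)))
        else:
            inserted.append(row)
    return inserted, failed


class InMemoryTable:
    """Hypothetical stand-in so the sketch runs without a database."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.rows = {}

    def insert1(self, row):
        key = tuple(row[name] for name in self.primary_key)
        if key in self.rows:
            raise ValueError(f"duplicate entry {key}")
        self.rows[key] = dict(row)


subjects = InMemoryTable(primary_key=["subject_id"])
inserted, failed = insert_one_by_one(subjects, [
    {"subject_id": "M001", "species": "mouse"},
    {"subject_id": "M001", "species": "mouse"},  # duplicate key, fails
    {"subject_id": "M002", "species": "mouse"},
])
print(len(inserted), "inserted;", len(failed), "failed")
```

Catching a bare `Exception` keeps the sketch generic. In a real pipeline you would likely narrow it to the specific error classes you expect, so that unrelated bugs still surface.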

## The `insert` Method

Use `insert` for batch insertion of multiple rows:

```python
<Table>.insert(rows, ignore_extra_fields=False, skip_duplicates=False)
```

**Parameters:**
- **`rows`**: A list of dictionaries (or any iterable of dict-like objects)
- **`ignore_extra_fields`**: If `True`, extra dictionary keys are silently ignored; if `False` (default), extra keys raise an error
- **`skip_duplicates`**: If `True`, rows with existing primary keys are silently skipped; if `False` (default), duplicates raise an error

**Example:**
```python
# Batch insert multiple sessions
Session.insert([
    {'subject_id': 'M001', 'session_date': '2024-01-15', 'session_notes': 'baseline'},
    {'subject_id': 'M001', 'session_date': '2024-01-16', 'session_notes': 'treatment'},
    {'subject_id': 'M001', 'session_date': '2024-01-17', 'session_notes': 'follow-up'},
])
```

Use `insert` when:
- Loading data from files or external sources
- Populating Lookup tables at schema setup
- Migrating or synchronizing data between systems
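
For the file-loading case, one possible approach is to read the file into a list of dictionaries first and pass that list to `insert` in a single call. The sketch below uses only the standard library. The CSV text is made-up sample data, its column names are assumed to match the `Session` attributes from the example above, and `rows_from_csv` is a hypothetical helper.

```python
import csv
import io

# Stand-in for an open CSV file; the columns are assumed to match
# the Session attributes used in the example above.
csv_text = """subject_id,session_date,session_notes
M001,2024-01-18,washout
M001,2024-01-19,retest
"""


def rows_from_csv(handle):
    """Read a CSV with a header line into a list of plain dicts."""
    return [dict(record) for record in csv.DictReader(handle)]


rows = rows_from_csv(io.StringIO(csv_text))
print(rows[0])

# With a live connection, the whole batch would go in with one call:
# Session.insert(rows)
```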

## Populating Lookup Tables

Lookup tables should be populated as part of schema initialization, not as ongoing workflow operations.
Use `skip_duplicates=True` to make the insertion idempotent—safe to run multiple times:

```python
# Idempotent lookup table population
# Can be run every time the pipeline starts
Species.insert([
    {'species': 'mouse', 'species_name': 'Mus musculus'},
    {'species': 'rat', 'species_name': 'Rattus norvegicus'},
    {'species': 'human', 'species_name': 'Homo sapiens'},
], skip_duplicates=True)

# Parameter sets for analysis
ProcessingParams.insert([
    {'param_id': 1, 'filter_cutoff': 300, 'threshold': 0.5},
    {'param_id': 2, 'filter_cutoff': 500, 'threshold': 0.3},
], skip_duplicates=True)
```
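
To see what idempotent insertion means at the database level, the sketch below uses Python's built-in `sqlite3` module rather than DataJoint. SQLite's `INSERT OR IGNORE` is used here as a rough analog of `skip_duplicates=True`; the table layout and the `setup_species` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (species TEXT PRIMARY KEY, species_name TEXT)")


def setup_species(conn):
    """Insert reference rows, skipping any whose primary key already exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO species VALUES (?, ?)",
        [("mouse", "Mus musculus"), ("rat", "Rattus norvegicus")],
    )


setup_species(conn)
setup_species(conn)  # second run adds nothing and raises no error

count = conn.execute("SELECT COUNT(*) FROM species").fetchone()[0]
print(count)

# A skipped row is left as it was: this differing name is not written.
conn.execute("INSERT OR IGNORE INTO species VALUES ('mouse', 'house mouse')")
name = conn.execute(
    "SELECT species_name FROM species WHERE species = 'mouse'"
).fetchone()[0]
print(name)
```

Note that skipping is not updating. In this analog, editing a value in the setup script does not change a row that already exists, which is worth keeping in mind when revising reference data.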

## Referential Integrity

DataJoint enforces referential integrity on insert.
If a table has foreign key dependencies, the referenced entities must already exist:

```python
# This will fail if subject 'M001' doesn't exist in the Subject table
Session.insert1({
    'subject_id': 'M001',  # Must exist in Subject
    'session_date': '2024-01-15'
})
```

This constraint ensures the dependency graph remains valid—you cannot create downstream entities without their upstream dependencies.
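
The same rule can be observed in any relational backend that enforces foreign keys. The sketch below uses Python's built-in `sqlite3` module as a stand-in, not DataJoint itself; the two-table schema and the `try_insert_session` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE subject (subject_id TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE session (
           subject_id TEXT REFERENCES subject (subject_id),
           session_date TEXT,
           PRIMARY KEY (subject_id, session_date)
       )"""
)


def try_insert_session(subject_id, session_date):
    """Return True if the session row was accepted, False on a constraint error."""
    try:
        conn.execute(
            "INSERT INTO session VALUES (?, ?)", (subject_id, session_date)
        )
        return True
    except sqlite3.IntegrityError:
        return False


before = try_insert_session("M001", "2024-01-15")  # parent row missing
conn.execute("INSERT INTO subject VALUES ('M001')")
after = try_insert_session("M001", "2024-01-15")   # parent row now present
print(before, after)
```

The first attempt is rejected and the second succeeds, which is why upstream tables need to be filled before the tables that reference them.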

## Best Practices

1. **Match insert method to use case**: Use `insert1` for single records, `insert` for batches
2. **Use `skip_duplicates=True` for Lookup tables**: Makes initialization scripts idempotent
3. **Keep `ignore_extra_fields=False`** (default): Helps catch data mapping errors early
4. **Insert upstream before downstream**: Respect the dependency order defined by foreign keys
5. **Let `populate()` handle auto-populated tables**: Never manually insert into Imported or Computed tables
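
To illustrate practice 3, the sketch below shows the kind of mapping error that the default setting surfaces. It is plain Python and does not call DataJoint; `unexpected_fields` is a hypothetical helper, and the attribute list is assumed from the `Session` examples on this page.

```python
def unexpected_fields(row, attributes):
    """Return the keys of `row` that are not table attributes, sorted."""
    return sorted(set(row) - set(attributes))


# Hypothetical attribute list for the Session table used on this page.
session_attributes = ["subject_id", "session_date", "session_notes"]

row = {
    "subject_id": "M001",
    "session_date": "2024-01-15",
    "sesion_notes": "baseline",  # misspelled key
}

extras = unexpected_fields(row, session_attributes)
print(extras)
```

With `ignore_extra_fields=True`, a misspelled key like this one would be dropped without warning, and the note could be lost if `session_notes` is optional. With the default `False`, the insert raises an error instead, so the typo is caught at the point of entry.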