# BiasAnalyzer Cohort Builder: Simple Walkthrough Tutorial

This tutorial shows, **from scratch**, how to install the Python DSL, use it to define cohorts, and export them as YAML for the BiasAnalyzer pipeline.


## 0) Installation / Environment

From the project root (where `setup.py` lives), install the package locally in editable mode:

```bash
pip install -e .
```

If you are running the notebook outside the repo root and need to add the folder to your path:

```python
import sys
sys.path.append('/path/to/BiasAnalyzerYAMLBuilder')  # adjust to your local path
```
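Before importing, it can help to confirm that the module is discoverable at all. The following is a minimal sketch using only the standard library; `is_importable` is a hypothetical helper (not part of the package), and the module name `CohortDefinition` is taken from the import cell in this tutorial.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical pre-flight check; the module name comes from this tutorial's imports.
if not is_importable("CohortDefinition"):
    print("CohortDefinition not found: run `pip install -e .` or adjust sys.path.")
```

If the message prints, revisit the install step or the `sys.path` tweak above before continuing.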

Then import the core classes and operators:


```python
# (Optional) sys.path tweak if needed:
# import sys; sys.path.append('/path/to/BiasAnalyzerYAMLBuilder')

from CohortDefinition import (
    ConditionOccurrence,
    VisitOccurrence,
    DateEvent,
    Demographics,
    CohortCriteria,
    AND,
    OR,
    BEFORE,
    NOT,
)
print("Imports OK — package is reachable.")
```

```text
Imports OK — package is reachable.
```


## 1) Concepts at a glance

- **Events (leaves)**: atomic facts with an `event_type` and fields:  
  `ConditionOccurrence`, `VisitOccurrence`, `DateEvent`, …  
- **Logic operators (strict arity)**: combine events into temporal/set logic:  
  - `AND(A, B)`, `OR(A, B)`, `BEFORE(A, B)` (binary)
  - `NOT(X)` (unary)
- **`CohortCriteria`** (its repr prints as `CohortYAML`): container that holds:
  - `inclusion_criteria`: `demographics`, `temporal_events`
  - `exclusion_criteria`: `demographics`, `temporal_events`
  - `save('path.yaml')` to persist YAML, as used in the examples below
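To make "strict arity" concrete, here is a self-contained sketch of how such operators could build nested group nodes. It is not the package's implementation: `and_`, `or_`, `before`, `not_`, and `condition` are hypothetical stand-ins, and the dictionary shape (`operator` plus `events`) is only assumed from the YAML output shown later in this tutorial. How the real package serializes `NOT` is not shown in this tutorial, so that part is a guess.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def _group(operator: str, arity: int, events: tuple) -> Node:
    """Build an operator node, rejecting the wrong number of operands."""
    if len(events) != arity:
        raise TypeError(f"{operator} takes exactly {arity} operand(s), got {len(events)}")
    return {"operator": operator, "events": list(events)}


# Hypothetical stand-ins for the package's AND / OR / BEFORE / NOT.
def and_(*events: Node) -> Node:
    return _group("AND", 2, events)


def or_(*events: Node) -> Node:
    return _group("OR", 2, events)


def before(*events: Node) -> Node:
    return _group("BEFORE", 2, events)


def not_(*events: Node) -> Node:
    return _group("NOT", 1, events)


def condition(concept_id: int) -> Node:
    """Leaf event shaped like the condition_occurrence entries in the YAML output."""
    return {"event_type": "condition_occurrence", "event_concept_id": concept_id}


rule = or_(condition(320128), condition(201826))
print(rule["operator"], len(rule["events"]))  # OR 2
```

Calling `or_(condition(320128))` with a single operand raises `TypeError` in this sketch, which is the behavior "strict arity" suggests.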


## 2) Example A — **Demographics-only cohort**

**Goal:** *Males born before 2000.*

**Human-readable logic**
- **Inclusion — Demographics**: `gender = male` AND `max_birth_year = 1999`
- **Inclusion — Temporal**: *(none)*  
- **Exclusion**: *(none)*


```python
cohort_a = CohortCriteria(
    demographics=Demographics(gender="male", max_birth_year=1999)
)
print(cohort_a.to_yaml())
```

```text
inclusion_criteria:
  demographics:
    gender: 'male'
    max_birth_year: 1999
```
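The mapping from "born before 2000" to `max_birth_year = 1999` relies on the bound being inclusive. The sketch below spells out that reading on plain values. `matches_demographics` is a hypothetical helper, not a BiasAnalyzer function, and the inclusive interpretation is an assumption based on this tutorial's human-readable logic.

```python
from typing import Optional


def matches_demographics(
    gender: str,
    birth_year: int,
    want_gender: Optional[str] = None,
    min_birth_year: Optional[int] = None,
    max_birth_year: Optional[int] = None,
) -> bool:
    """Check one person against optional gender and inclusive birth-year bounds."""
    if want_gender is not None and gender != want_gender:
        return False
    if min_birth_year is not None and birth_year < min_birth_year:
        return False
    if max_birth_year is not None and birth_year > max_birth_year:
        return False
    return True


print(matches_demographics("male", 1999, want_gender="male", max_birth_year=1999))  # True
print(matches_demographics("male", 2000, want_gender="male", max_birth_year=1999))  # False
```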



## 3) Example B — **Single condition + demographics**

**Goal:** *Male patients born before 2000 with Type 2 Diabetes.*

**Human-readable logic**
- **Inclusion — Demographics**: `gender = male` AND `max_birth_year = 1999`
- **Inclusion — Temporal**: one **condition_occurrence** = Type 2 Diabetes (concept `201826`)
- **Exclusion**: *(none)*


```python
t2dm = ConditionOccurrence(event_concept_id=201826)  # Type 2 Diabetes

cohort_b = CohortCriteria(
    demographics=Demographics(gender="male", max_birth_year=1999),
    temporal_blocks=[t2dm],
)
cohort_b.save("cohort_example_B.yaml")
print(cohort_b)
print(cohort_b.to_yaml())
# print("\nSaved -> cohort_example_B.yaml")
```

```text
CohortYAML(temporal_blocks=[ConditionOccurrence(event_concept_id=201826, code_type=None, code=None, event_instance=None, offset=None)], demographics=Demographics(gender='male', min_birth_year=None, max_birth_year=1999), exclusion_blocks=None, exclusion_demographics=None)
inclusion_criteria:
  demographics:
    gender: 'male'
    max_birth_year: 1999
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
```



## 4) Example C — **Set logic OR**

**Goal:** *Patients with Hypertension OR Type 2 Diabetes.*

**Human-readable logic**
- **Inclusion — Temporal**: `OR( condition: 320128 , condition: 201826 )`
- **Demographics/Exclusion**: *(none)*


```python
htn = ConditionOccurrence(event_concept_id=320128)
t2dm = ConditionOccurrence(event_concept_id=201826)

rule_or = OR(htn, t2dm)  # strict binary OR

cohort_c = CohortCriteria(temporal_blocks=[rule_or])
cohort_c.save("cohort_example_C.yaml")
print(cohort_c.to_yaml())
# print("\nSaved -> cohort_example_C.yaml")
```

```text
inclusion_criteria:
  temporal_events:
  - operator: 'OR'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 320128
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
```
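To see what the `OR` group means for an individual patient, the sketch below evaluates an untimed `AND`/`OR`/`NOT` tree against the set of concept IDs observed for one person. It is an illustration only: `has_match` is a hypothetical function, the pipeline's actual evaluation is not described in this tutorial, and temporal operators such as `BEFORE` are deliberately unsupported here.

```python
from typing import Any, Dict, Set


def has_match(node: Dict[str, Any], observed: Set[int]) -> bool:
    """Evaluate an untimed AND/OR/NOT tree against one patient's concept IDs."""
    if "operator" not in node:  # leaf event
        return node["event_concept_id"] in observed
    results = [has_match(child, observed) for child in node["events"]]
    op = node["operator"]
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    if op == "NOT":
        return not results[0]
    raise ValueError(f"unsupported operator for set logic: {op}")


# Same shape as the YAML printed for Example C.
rule_or = {
    "operator": "OR",
    "events": [
        {"event_type": "condition_occurrence", "event_concept_id": 320128},
        {"event_type": "condition_occurrence", "event_concept_id": 201826},
    ],
}

print(has_match(rule_or, {201826}))  # True: has Type 2 Diabetes
print(has_match(rule_or, {316139}))  # False: neither condition present
```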



## 5) Example D — **Temporal logic with window**

**Goal:** *Females born 2000–2020 with “difficulty breathing” BEFORE a COVID diagnosis by 2–5 days, and the COVID diagnosis occurs between 2020-03-15 and 2020-12-11.*

**Human-readable logic**
- **Inclusion — Demographics**: `gender = female` AND `2000 ≤ birth_year ≤ 2020`
- **Inclusion — Temporal**: two **parallel** groups  
  - **Group [0]** *(visits OR)* — *(intentionally omitted here; see Example F for a full complex cohort)*  
  - **Group [1]** — `BEFORE( condition:4041664 , AND( BEFORE(date:2020-03-15, condition:37311061), BEFORE(condition:37311061, date:2020-12-11) ) )` with `interval: [2, 5]`
- **Exclusion**: *(none)*


```python
dyspnea = ConditionOccurrence(event_concept_id=4041664)
covid   = ConditionOccurrence(event_concept_id=37311061)
start   = DateEvent(timestamp="2020-03-15")
end     = DateEvent(timestamp="2020-12-11")

left_bound  = BEFORE(start, covid)  # date_start BEFORE covid
right_bound = BEFORE(covid, end)    # covid BEFORE date_end
date_gate   = AND(left_bound, right_bound)

outer = BEFORE(dyspnea, date_gate)
outer["interval"] = [2, 5]

cohort_d = CohortCriteria(
    demographics=Demographics(gender="female", min_birth_year=2000, max_birth_year=2020),
    temporal_blocks=[outer],
)
cohort_d.save("cohort_example_D.yaml")
print(cohort_d.to_yaml())
print("\nSaved -> cohort_example_D.yaml")
```

```text
inclusion_criteria:
  demographics:
    gender: 'female'
    min_birth_year: 2000
    max_birth_year: 2020
  temporal_events:
  - operator: 'BEFORE'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 4041664
    - operator: 'AND'
      events:
      - operator: 'BEFORE'
        events:
        - event_type: date
          timestamp: '2020-03-15'
        - event_type: 'condition_occurrence'
          event_concept_id: 37311061
      - operator: 'BEFORE'
        events:
        - event_type: 'condition_occurrence'
          event_concept_id: 37311061
        - event_type: date
          timestamp: '2020-12-11'
    interval:
    - 2
    - 5


Saved -> cohort_example_D.yaml
```
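The temporal rule in Example D can be checked by hand for a single pair of dates. The sketch below does this with the standard `datetime` module. It rests on two assumptions that this tutorial does not confirm: that `BEFORE` is a strict comparison, and that `interval: [2, 5]` means an inclusive gap measured in days. `satisfies_example_d` is a hypothetical helper, not part of the package.

```python
from datetime import date


def satisfies_example_d(
    dyspnea_date: date,
    covid_date: date,
    window_start: date = date(2020, 3, 15),
    window_end: date = date(2020, 12, 11),
    min_gap_days: int = 2,
    max_gap_days: int = 5,
) -> bool:
    """True if dyspnea precedes COVID by 2-5 days and COVID falls inside the window."""
    gap = (covid_date - dyspnea_date).days
    # Assumption: BEFORE is strict, so the window endpoints themselves do not qualify.
    in_window = window_start < covid_date < window_end
    # Assumption: the interval bounds are inclusive and counted in days.
    return in_window and min_gap_days <= gap <= max_gap_days


print(satisfies_example_d(date(2020, 4, 1), date(2020, 4, 4)))   # True: gap of 3 days
print(satisfies_example_d(date(2020, 4, 1), date(2020, 4, 10)))  # False: gap of 9 days
print(satisfies_example_d(date(2020, 3, 7), date(2020, 3, 10)))  # False: COVID before window
```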


## 6) Example E — **Exclusion criteria**

**Goal:** *Patients with Diabetes, but **exclude** those with Heart Failure or born ≥ 2010.*

**Human-readable logic**
- **Inclusion — Temporal**: condition_occurrence = Diabetes (`201826`)
- **Exclusion — Demographics**: `min_birth_year = 2010`
- **Exclusion — Temporal**: condition_occurrence = Heart Failure (`316139`)


```python
diabetes = ConditionOccurrence(event_concept_id=201826)
hf       = ConditionOccurrence(event_concept_id=316139)

cohort_e = CohortCriteria(
    temporal_blocks=[diabetes],
    exclusion_demographics=Demographics(min_birth_year=2010),
    exclusion_blocks=[hf],
)
cohort_e.save("cohort_example_E.yaml")
print(cohort_e.to_yaml(sort_keys=False))
print("\nSaved -> cohort_example_E.yaml")
```

```text
inclusion_criteria:
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
exclusion_criteria:
  demographics:
    min_birth_year: 2010
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 316139
```

```text
Saved -> cohort_example_E.yaml
```
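As a sanity check on the intended result, the sketch below applies Example E's goal to a few made-up patient records: keep anyone with Diabetes, then drop anyone with Heart Failure or a birth year of 2010 or later. The records, the field names, and the two helper functions are all hypothetical. The "or" between the two exclusion criteria follows the stated goal, not any documented pipeline behavior.

```python
# Toy records invented for illustration; not real data.
patients = [
    {"person_id": 1, "birth_year": 1985, "conditions": {201826}},
    {"person_id": 2, "birth_year": 1990, "conditions": {201826, 316139}},
    {"person_id": 3, "birth_year": 2012, "conditions": {201826}},
    {"person_id": 4, "birth_year": 1970, "conditions": {316139}},
]


def in_inclusion(patient: dict) -> bool:
    """Inclusion: has Diabetes (concept 201826)."""
    return 201826 in patient["conditions"]


def in_exclusion(patient: dict) -> bool:
    """Exclusion: has Heart Failure (316139) or was born in 2010 or later."""
    return 316139 in patient["conditions"] or patient["birth_year"] >= 2010


cohort_ids = [
    p["person_id"] for p in patients if in_inclusion(p) and not in_exclusion(p)
]
print(cohort_ids)  # [1]
```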


## 7) Example F — **Complex cohort: multiple temporal groups (teacher-style)**

**Goal:** *Females born 2000–2020 who satisfy EITHER (Inpatient 2nd instance OR ER visit), AND ALSO have “difficulty breathing” BEFORE a COVID diagnosis by 2–5 days with COVID constrained between two calendar dates. Include an example exclusion block.*

**Human-readable logic**
- **Inclusion — Demographics**: `gender = female` AND `2000 ≤ birth_year ≤ 2020`
- **Inclusion — Temporal**: **two parallel groups**  
  - **Group [0] (OR of visits)**:
    - `visit_occurrence(9201, event_instance=2)` **OR**
    - `visit_occurrence(9203)`
  - **Group [1] (BEFORE with interval + AND of two BEFOREs)**:
    - `BEFORE( condition:4041664 , AND( BEFORE(date:2020-03-15, condition:37311061), BEFORE(condition:37311061, date:2020-12-11) ) )` with `interval: [2, 5]`
- **Exclusion — Demographics**: `min_birth_year = 2010`
- **Exclusion — Temporal**: condition_occurrence = Heart Failure (`316139`)
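The `event_instance=2` field in Group [0] reads naturally as "the second inpatient visit". The sketch below shows that interpretation on a plain list of dates. Treat it as an assumption: this tutorial does not define `event_instance`, so the 1-based, chronological reading and the helper `nth_occurrence` are both hypothetical.

```python
from datetime import date
from typing import List, Optional


def nth_occurrence(dates: List[date], instance: int) -> Optional[date]:
    """Return the instance-th earliest date (1-based), or None if there are too few."""
    if instance < 1:
        raise ValueError("instance must be 1 or greater")
    ordered = sorted(dates)
    return ordered[instance - 1] if len(ordered) >= instance else None


visits = [date(2020, 6, 1), date(2020, 2, 10), date(2020, 9, 30)]
print(nth_occurrence(visits, 2))             # 2020-06-01
print(nth_occurrence([date(2020, 1, 1)], 2))  # None: only one visit recorded
```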


```python
# Group [0]: OR(visits)
ip_2nd = VisitOccurrence(event_concept_id=9201, event_instance=2)
er     = VisitOccurrence(event_concept_id=9203)
group0 = OR(ip_2nd, er)

# Group [1]: BEFORE + interval; right operand is AND(BEFORE(start, covid), BEFORE(covid, end))
dyspnea = ConditionOccurrence(event_concept_id=4041664)
covid   = ConditionOccurrence(event_concept_id=37311061)
start   = DateEvent(timestamp="2020-03-15")
end     = DateEvent(timestamp="2020-12-11")

b_left  = BEFORE(start, covid)
b_right = BEFORE(covid, end)
date_and = AND(b_left, b_right)

group1 = BEFORE(dyspnea, date_and)
group1["interval"] = [2, 5]

hf_excl = ConditionOccurrence(event_concept_id=316139)

cohort_f = CohortCriteria(
    demographics=Demographics(gender="female", min_birth_year=2000, max_birth_year=2020),
    temporal_blocks=[group0, group1],
    exclusion_demographics=Demographics(min_birth_year=2010),
    exclusion_blocks=[hf_excl],
)
cohort_f.save("cohort_example_F.yaml")
print(cohort_f.to_yaml(sort_keys=False))
print("\nSaved -> cohort_example_F.yaml")
```

```text
inclusion_criteria:
  demographics:
    gender: 'female'
    min_birth_year: 2000
    max_birth_year: 2020
  temporal_events:
  - operator: 'OR'
    events:
    - event_type: 'visit_occurrence'
      event_concept_id: 9201
      event_instance: 2
    - event_type: 'visit_occurrence'
      event_concept_id: 9203
  - operator: 'BEFORE'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 4041664
    - operator: 'AND'
      events:
      - operator: 'BEFORE'
        events:
        - event_type: date
          timestamp: '2020-03-15'
        - event_type: 'condition_occurrence'
          event_concept_id: 37311061
      - operator: 'BEFORE'
        events:
        - event_type: 'condition_occurrence'
          event_concept_id: 37311061
        - event_type: date
          timestamp: '2020-12-11'
    interval:
    - 2
    - 5
exclusion_criteria:
  demographics:
    min_birth_year: 2010
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_
```

## 9) Wrap-up

You now know how to:
- Define **demographics**, **events**, and compose **logic** with strict arity
- Build **inclusion** and **exclusion** blocks
- Export YAML via `save(...)`, the method used in the examples above
- Create complex cohorts with **multiple parallel temporal groups**

**Tip:** Keep operators at the **group** level, and make sure every group’s `events` are **leaf events** (or well-formed subgroups), never mixing levels.
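One way to act on this tip is to walk the nested structure and flag nodes that mix levels. The following is a sketch under assumptions: it treats the exported criteria as nested dictionaries shaped like the YAML in this tutorial, and `find_structure_problems` is a hypothetical helper, not a BiasAnalyzer API. It does not enforce operand counts, because the serialized output shown above wraps a single event in an `AND` group.

```python
from typing import Any, List


def find_structure_problems(node: Any, path: str = "root") -> List[str]:
    """Return human-readable problems; an empty list means the tree is well-formed."""
    if not isinstance(node, dict):
        return [f"{path}: expected a mapping"]
    problems: List[str] = []
    is_group = "operator" in node
    is_leaf = "event_type" in node
    if is_group and is_leaf:
        problems.append(f"{path}: mixes operator and event_type")
    elif is_group:
        events = node.get("events")
        if not isinstance(events, list) or not events:
            problems.append(f"{path}: group needs a non-empty events list")
        else:
            for i, child in enumerate(events):
                problems.extend(find_structure_problems(child, f"{path}.events[{i}]"))
    elif not is_leaf:
        problems.append(f"{path}: neither a group nor a leaf event")
    return problems


good = {"operator": "OR", "events": [
    {"event_type": "visit_occurrence", "event_concept_id": 9201, "event_instance": 2},
    {"event_type": "visit_occurrence", "event_concept_id": 9203},
]}
bad = {"operator": "AND", "event_type": "condition_occurrence", "events": []}

print(find_structure_problems(good))  # []
print(find_structure_problems(bad))   # ['root: mixes operator and event_type']
```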
