# Using BiasAnalyzer for Asynchronous Cohort Creation and Exploration

This tutorial demonstrates how to use the `BiasAnalyzer` package to create multiple cohorts asynchronously for exploration, which can improve performance and responsiveness when working with large datasets or complex cohort definitions. It complements the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), following a similar workflow but optimized for performance by introducing asynchronous processing.

---

### Overview

**Objective**:  
Show how to define and create multiple cohorts using asynchronous execution to improve responsiveness and performance when working with large or complex datasets.

**Before You Begin**:  
The `BiasAnalyzer` package is currently in active development and has not yet been officially released on PyPI.
You can install it in one of two ways:

- **Install from GitHub (recommended during development)**:
```bash
pip install git+https://github.com/vaclab/BiasAnalyzerCore.git
```
- **Install from PyPI (once the package is officially released)**:
```bash
pip install biasanalyzer
```

For full setup and usage instructions, refer to the [README](https://github.com/VACLab/BiasAnalyzerCore/blob/main/README.md).
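
Before running the rest of the notebook, it can help to confirm that the package is importable in the current environment. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `biasanalyzer` is the import name used later in this tutorial.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the package used in this tutorial is available.
status = "available" if is_installed("biasanalyzer") else "not installed"
print(f"biasanalyzer is {status}")
```

If the package is reported as not installed, run one of the `pip install` commands above in the same environment as the notebook kernel.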

---


### Preparation for asynchronous cohort creation
**Preparation step 1**: Import the `BIAS` class from the `api` module of the `BiasAnalyzer` package and create a `bias` object. Then call `set_config()` with the path to your OMOP CDM database configuration file, and call `set_root_omop()` to connect to the database. Refer to the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb) for more details.

In [1]:
from biasanalyzer.api import BIAS

bias = BIAS()

bias.set_config('/Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/BiasAnalyzer_Cohort_Builder_Comparison/config_duckdb.yaml')

bias.set_root_omop()

configuration specified in /Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/BiasAnalyzer_Cohort_Builder_Comparison/config_duckdb.yaml loaded successfully
Connected to the DuckDB database: /Users/sherrylu/Documents/UNC/26Fall/VAC Lab/data/synpuf_100k_omop_54.duckdb.
Cohort Definition table created.
Cohort table created.


**Preparation step 2**: Import the `BackgroundResult` class and the `run_in_background` function from the `background.threading_utils` module of the `BiasAnalyzer` package to support asynchronous cohort creation.

In [2]:
from biasanalyzer.background.threading_utils import BackgroundResult, run_in_background

**Now that you have connected to your OMOP CDM database and imported the necessary utilities for asynchronous processing, you are ready to create cohorts asynchronously using the `BiasAnalyzer` APIs.** The rest of this notebook shows how to create a baseline cohort and a study cohort asynchronously, then explore and compare them once they are ready. With asynchronous execution, you don't need to wait for cohort creation to finish: you can keep running subsequent cells and explore each cohort as it becomes available.
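
To make the pattern concrete, here is a minimal sketch of how a background runner and a result holder can work together using the standard `threading` module. It is not the package's implementation: `SketchResult` and `sketch_run_in_background` are hypothetical names, and the only details taken from this tutorial are the `ready`, `error`, and `value` attributes and the `result_holder` and `delay` parameters.

```python
import threading
import time


class SketchResult:
    """Hypothetical stand-in for a result holder with value/error/ready fields."""

    def __init__(self):
        self.value = None
        self.error = None
        self.ready = False


def sketch_run_in_background(func, *args, result_holder=None, delay=0, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread and record the outcome."""

    def worker():
        try:
            if delay:
                time.sleep(delay)  # artificial delay, for demonstration only
            outcome = func(*args, **kwargs)
            if result_holder is not None:
                result_holder.value = outcome
        except Exception as exc:
            if result_holder is not None:
                result_holder.error = exc
        finally:
            if result_holder is not None:
                result_holder.ready = True  # set last, after value or error

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


holder = SketchResult()
thread = sketch_run_in_background(
    lambda a, b: a + b, 2, 3, result_holder=holder, delay=0.1
)
print("main thread keeps running while the task sleeps")
thread.join()
print(holder.ready, holder.error, holder.value)  # True None 5
```

Setting `ready` only after `value` or `error` has been stored means a caller that sees `ready` as true can read the other two fields safely.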

---

### Asynchronous cohort creation
**Baseline cohort creation**: To create a baseline cohort of patients with Type 2 diabetes mellitus asynchronously, use the `run_in_background()` function to run the `create_cohort(cohort_name, cohort_description, query_or_yaml_file, created_by)` method in a background thread. Pass the target function as the first argument, its four input arguments next, a `BackgroundResult` object via the optional `result_holder` parameter to store the created baseline cohort, and a `delay` value (e.g., 120 seconds) to simulate a long-running process for testing purposes. The first cell below defines the cohort from a YAML file; the cells after it build an equivalent definition in Python with `CohortCriteria` and pass that object as the third argument instead. The workflow mirrors the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), except that cohort creation now runs asynchronously in a background thread.

In [3]:
# Create baseline cohort result holder
baseline_result = BackgroundResult()

# Start background task to run create_cohort() method for a baseline cohort in a background thread
baseline_thread = run_in_background(
    bias.create_cohort,
    "Diabetes baseline",
    "A cohort of Type 2 diabetes mellitus",
    "/Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/assets/cohort_creation/extras/diabetes_example2/cohort_creation_config_baseline_example2.yaml",
    "system",
    result_holder=baseline_result,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Baseline cohort creation running in background...")

[*] Background task started...Baseline cohort creation running in background...

template_path: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/biasanalyzer/sql_templates


Cohort creation:   0%|          | 0/3 [00:00<?, ?stage/s]

configuration specified in /Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/assets/cohort_creation/extras/diabetes_example2/cohort_creation_config_baseline_example2.yaml loaded successfully
Cohort definition inserted successfully.
Cohort Diabetes baseline successfully created.
template_path: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/biasanalyzer/sql_templates
[DEBUG] Simulating long-running task with 120 seconds delay...
cohort created successfully
[✓] Background task completed.
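
If you do want to block until a background task finishes, for example at the end of a script, one option is to wait on the thread itself. The sketch below assumes the object returned by `run_in_background()` behaves like a standard `threading.Thread`, which this tutorial does not confirm; it uses a plain thread with a short sleep as a stand-in for cohort creation.

```python
import threading
import time


def slow_task():
    time.sleep(0.3)  # stands in for a long-running cohort query


worker = threading.Thread(target=slow_task, daemon=True)
worker.start()

worker.join(timeout=0.05)          # wait briefly, then return control
still_running = worker.is_alive()  # the task outlasts the short timeout
print(f"after short wait, still running: {still_running}")

worker.join()                      # block until the task finishes
print(f"after full wait, still running: {worker.is_alive()}")
```

A `join()` with a timeout lets a notebook cell wait for a bounded time without freezing the session indefinitely.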


In [None]:
"""
Baseline cohort (Python-built) equivalent to `cohort_creation_config_baseline_example2.yaml`.

Notes:
- The YAML has a single temporal block: operator AND with a single condition_occurrence event (concept_id 201826).
- We construct that shape explicitly to ensure structural equality.
"""

from CohortDefinition import (
    ConditionOccurrence,
    CohortCriteria,
)

# 1) Define the single clinical event (Type 2 diabetes mellitus).
db = ConditionOccurrence(event_concept_id=201826)

# 2) Assemble the cohort criteria (no demographics for this baseline).
baseline_py = CohortCriteria(
    temporal_blocks=[db]
)

# 3) Print YAML directly (str(cohort) emits YAML).
print("===== Baseline cohort (Python-built) =====")
print(baseline_py)
baseline_py


===== Baseline cohort (Python-built) =====


<CohortCriteria 1 blocks>

In [7]:
# Create baseline cohort result holder
baseline_result_py = BackgroundResult()

# Start background task to run create_cohort() method for a baseline cohort in a background thread
baseline_thread_py = run_in_background(
    bias.create_cohort,
    "Diabetes baseline",
    "A cohort of Type 2 diabetes mellitus",
    baseline_py,
    "system",
    result_holder=baseline_result_py,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Baseline cohort creation running in background...")

[*] Background task started...Baseline cohort creation running in background...



Cohort creation:   0%|          | 0/3 [00:00<?, ?stage/s]

configuration specified in inclusion_criteria:
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
 loaded successfully
Cohort definition inserted successfully.
Cohort Diabetes baseline successfully created.
template_path: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/biasanalyzer/sql_templates
[DEBUG] Simulating long-running task with 120 seconds delay...
cohort created successfully
[✓] Background task completed.


---

**Study cohort creation**: To create a study cohort of male patients born in or before 1990 (named and described below as diagnosed with heart failure) asynchronously, use the `run_in_background()` function to run the `create_cohort(cohort_name, cohort_description, query_or_yaml_file, created_by)` method in a background thread. Pass the target function as the first argument, its four input arguments next, a `BackgroundResult` object via the optional `result_holder` parameter to store the created study cohort, and a `delay` value (e.g., 120 seconds) to simulate a long-running process for testing purposes. As with the baseline, the first cell defines the cohort from a YAML file and the following cells build an equivalent definition in Python. The workflow mirrors the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), except that cohort creation now runs asynchronously in a background thread.

In [None]:
# Create study cohort result holder
study_result = BackgroundResult()

# Start background task to run create_cohort() function for a study cohort in a background thread
study_thread = run_in_background(
    bias.create_cohort,
    "Mid or Elder Male patients with heart failure",
    "Male patients born in or before 1990 diagnosed with heart failure",
    '/Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/assets/cohort_creation/extras/diabetes_example2/cohort_creation_config_study1_example2.yaml',
    "system",
    result_holder=study_result,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Study cohort creation running in background...")

[*] Background task started...Study cohort creation running in background...



Cohort creation:   0%|          | 0/3 [00:00<?, ?stage/s]

configuration specified in /Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/assets/cohort_creation/extras/diabetes_example2/cohort_creation_config_study1_example2.yaml loaded successfully
Cohort definition inserted successfully.
Cohort Mid or Elder Male patients with heart failure successfully created.
template_path: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/biasanalyzer/sql_templates
[DEBUG] Simulating long-running task with 120 seconds delay...
cohort created successfully
[✓] Background task completed.


In [20]:
from CohortDefinition import ConditionOccurrence, Demographics, CohortCriteria

# 1) Define the demographics (male, born in or before 1990)
demo = Demographics(gender="male", max_birth_year=1990)

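# 2) Define the event (condition_occurrence).
#    Note: 201826 is the concept ID that the baseline cell above labels as
#    Type 2 diabetes mellitus, although this cohort is named for heart failure.
#    Confirm the intended concept ID against your vocabulary before reuse.
event = ConditionOccurrence(
    event_concept_id=201826
)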

# 3) Build the cohort criteria (no YAML handling exposed to the user)
study1_py = CohortCriteria(
    demographics=demo,
    temporal_blocks=[event],
)

print(study1_py)

inclusion_criteria:
  demographics:
    gender: 'male'
    max_birth_year: 1990
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
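
The printed YAML for the baseline and study definitions follows one nested layout: an `inclusion_criteria` mapping with optional `demographics` and a list of `temporal_events` blocks. The sketch below reproduces that layout with plain dictionaries so the two shapes can be compared directly. It does not use `CohortCriteria`; `build_inclusion_criteria` is a hypothetical helper written only for this illustration.

```python
def build_inclusion_criteria(events, demographics=None, operator="AND"):
    """Assemble a nested dict mirroring the printed YAML layout."""
    criteria = {}
    if demographics:
        criteria["demographics"] = dict(demographics)
    criteria["temporal_events"] = [
        {"operator": operator, "events": [dict(e) for e in events]}
    ]
    return {"inclusion_criteria": criteria}


# Event values copied from the YAML printed in this notebook.
event = {"event_type": "condition_occurrence", "event_concept_id": 201826}

baseline_shape = build_inclusion_criteria([event])
study_shape = build_inclusion_criteria(
    [event], demographics={"gender": "male", "max_birth_year": 1990}
)

# The two definitions share the same temporal block and differ only in demographics.
same_events = (
    baseline_shape["inclusion_criteria"]["temporal_events"]
    == study_shape["inclusion_criteria"]["temporal_events"]
)
print(same_events)                                  # True
print(sorted(study_shape["inclusion_criteria"]))    # ['demographics', 'temporal_events']
```

Comparing the shapes this way makes it easy to see that, as written, both cohorts select on the same condition concept.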



In [None]:
# Create study cohort result holder
study_result_py = BackgroundResult()

# Start background task to run create_cohort() function for a study cohort in a background thread
study_thread_py = run_in_background(
    bias.create_cohort,
    "Mid or Elder Male patients with heart failure",
    "Male patients born in or before 1990 diagnosed with heart failure",
    study1_py,
    "system",
    result_holder=study_result_py,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Study cohort creation running in background...")

[*] Background task started...Study cohort creation running in background...



Cohort creation:   0%|          | 0/3 [00:00<?, ?stage/s]

configuration specified in inclusion_criteria:
  demographics:
    gender: 'male'
    max_birth_year: 1990
  temporal_events:
  - operator: 'AND'
    events:
    - event_type: 'condition_occurrence'
      event_concept_id: 201826
 loaded successfully
Cohort definition inserted successfully.
Cohort Mid or Elder Male patients with heart failure successfully created.
template_path: /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/biasanalyzer/sql_templates
[DEBUG] Simulating long-running task with 120 seconds delay...
cohort created successfully
[✓] Background task completed.


---

### Cohort exploration when available
**Exploring the baseline cohort**: To explore the baseline cohort once it's available, check the `ready` property of `baseline_result`, the `BackgroundResult` object provided as the `result_holder` during asynchronous cohort creation. If the result is ready, verify that the background process completed successfully by checking the `error` property of `baseline_result`. If no error occurred, retrieve the created baseline cohort object from the `value` property and explore it, just as demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb). The second cell repeats the same check for `baseline_result_py`, the holder for the Python-built definition.

In [16]:
if baseline_result.ready:
    if baseline_result.error:
        print(f"Baseline cohort creation failed: {baseline_result.error}")
    else:
        baseline_cohort = baseline_result.value
        baseline_cohort_def = baseline_cohort.metadata
        print(f"Baseline cohort created with metadata: {baseline_cohort_def}")
        baseline_cohort_data = baseline_cohort.data
        baseline_cohort_stats = baseline_cohort.get_stats()
        print(f"Baseline cohort created with stats: {baseline_cohort_stats}")
else:
    print("Still creating baseline cohort...")

Baseline cohort created with metadata: {'id': 1, 'name': 'Diabetes baseline', 'description': 'A cohort of Type 2 diabetes mellitus', 'created_date': datetime.date(2025, 10, 29), 'creation_info': 'WITH ranked_asc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date ASC ) AS event_instance FROM condition_occurrence ), ranked_desc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date DESC ) AS event_instance FROM condition_occurrence ), domain_qualifying_events AS ( ( SELECT person_id, event_start_date, event_end_date, event_start_date AS adjusted_start, event_end_date AS adjusted_end FROM ran

In [26]:
if baseline_result_py.ready:
    if baseline_result_py.error:
        print(f"Baseline cohort creation failed: {baseline_result_py.error}")
    else:
        baseline_cohort_py = baseline_result_py.value
        baseline_cohort_def_py = baseline_cohort_py.metadata
        print(f"Baseline cohort created with metadata: {baseline_cohort_def_py}")
        baseline_cohort_data_py = baseline_cohort_py.data
        baseline_cohort_stats_py = baseline_cohort_py.get_stats()
        print(f"Baseline cohort created with stats: {baseline_cohort_stats_py}")
else:
    print("Still creating baseline cohort...")

Baseline cohort created with metadata: {'id': 2, 'name': 'Diabetes baseline', 'description': 'A cohort of Type 2 diabetes mellitus', 'created_date': datetime.date(2025, 10, 29), 'creation_info': 'WITH ranked_asc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date ASC ) AS event_instance FROM condition_occurrence ), ranked_desc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date DESC ) AS event_instance FROM condition_occurrence ), domain_qualifying_events AS ( ( SELECT person_id, event_start_date, event_end_date, event_start_date AS adjusted_start, event_end_date AS adjusted_end FROM ran
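
The ready/error/value check appears in every exploration cell, so it can be convenient to factor it into a small helper. The sketch below is one hedged way to do that. `result_status` is a hypothetical function, and `SimpleNamespace` objects stand in for real `BackgroundResult` holders; only the three attribute names come from this tutorial.

```python
from types import SimpleNamespace


def result_status(holder):
    """Classify a result holder as 'pending', 'failed', or 'done'."""
    if not holder.ready:
        return "pending"
    if holder.error is not None:
        return "failed"
    return "done"


# Stand-in holders; a real holder would be filled in by the background task.
pending = SimpleNamespace(ready=False, error=None, value=None)
failed = SimpleNamespace(ready=True, error=RuntimeError("query failed"), value=None)
done = SimpleNamespace(ready=True, error=None, value={"id": 1})

for name, holder in [("pending", pending), ("failed", failed), ("done", done)]:
    print(name, "->", result_status(holder))
```

Checking `ready` first matters: before the task finishes, `error` and `value` are both unset, so they cannot distinguish a pending task from a finished one.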

---

**Exploring the study cohort**: To explore the study cohort once it's available, check the `ready` property of `study_result`, the `BackgroundResult` object provided as the `result_holder` during asynchronous cohort creation. If the result is ready, verify that the background process completed successfully by checking the `error` property of `study_result`. If no error occurred, retrieve the created study cohort object from the `value` property and explore it, just as demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb). The second cell repeats the same check for `study_result_py`.

In [18]:
if study_result.ready:
    if study_result.error:
        print(f"Study cohort creation failed: {study_result.error}")
    else:
        study_cohort = study_result.value
        study_cohort_def = study_cohort.metadata
        print(f"Study cohort created with metadata: {study_cohort_def}")
        study_cohort_data = study_cohort.data
        study_cohort_stats = study_cohort.get_stats()
        print(f"Study cohort created with stats: {study_cohort_stats}")
else:
    print("Still creating study cohort...")

Study cohort created with metadata: {'id': 4, 'name': 'Mid or Elder Male patients with heart failure', 'description': 'Male patients born in or before 1990 diagnosed with heart failure', 'created_date': datetime.date(2025, 10, 29), 'creation_info': 'WITH ranked_asc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date ASC ) AS event_instance FROM condition_occurrence ), ranked_desc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date DESC ) AS event_instance FROM condition_occurrence ), domain_qualifying_events AS ( ( SELECT person_id, event_start_date, event_end_date, event_start_date AS a

In [29]:
if study_result_py.ready:
    if study_result_py.error:
        print(f"Study cohort creation failed: {study_result_py.error}")
    else:
        study_cohort_py = study_result_py.value
        study_cohort_def_py = study_cohort_py.metadata
        print(f"Study cohort created with metadata: {study_cohort_def_py}")
        study_cohort_data_py = study_cohort_py.data
        study_cohort_stats_py = study_cohort_py.get_stats()
        print(f"Study cohort created with stats: {study_cohort_stats_py}")
else:
    print("Still creating study cohort...")

Study cohort created with metadata: {'id': 5, 'name': 'Mid or Elder Male patients with heart failure', 'description': 'Male patients born in or before 1990 diagnosed with heart failure', 'created_date': datetime.date(2025, 10, 29), 'creation_info': 'WITH ranked_asc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date ASC ) AS event_instance FROM condition_occurrence ), ranked_desc_condition_occurrence AS ( SELECT person_id, condition_concept_id AS concept_id, condition_start_date AS event_start_date, condition_end_date AS event_end_date, ROW_NUMBER() OVER ( PARTITION BY person_id, condition_concept_id ORDER BY condition_start_date DESC ) AS event_instance FROM condition_occurrence ), domain_qualifying_events AS ( ( SELECT person_id, event_start_date, event_end_date, event_start_date AS a
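
The `creation_info` field holds the full generated SQL, which dominates the printed metadata. A small helper can shorten long string fields before printing. The sketch below is illustrative only: `summarize_metadata` is a hypothetical function, and the sample dictionary copies the keys shown in the output above with a shortened placeholder in place of the full query.

```python
import datetime


def summarize_metadata(metadata, max_chars=80):
    """Return a copy of the metadata with long string values shortened."""
    summary = {}
    for key, value in metadata.items():
        if isinstance(value, str) and len(value) > max_chars:
            summary[key] = value[:max_chars] + "..."
        else:
            summary[key] = value
    return summary


# Sample shaped like the printed metadata; the SQL here is a shortened placeholder.
sample = {
    "id": 5,
    "name": "Mid or Elder Male patients with heart failure",
    "description": "Male patients born in or before 1990 diagnosed with heart failure",
    "created_date": datetime.date(2025, 10, 29),
    "creation_info": "WITH ranked_asc_condition_occurrence AS ( SELECT person_id, "
                     "condition_concept_id AS concept_id FROM condition_occurrence )",
}

for key, value in summarize_metadata(sample).items():
    print(f"{key}: {value}")
```

The original dictionary is left unchanged, so the full SQL remains available when you need to inspect it.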

---

### Cohort comparison when available
To compare the baseline and study cohorts once they are available, check the `ready` property of both `baseline_result` and `study_result`, the `BackgroundResult` objects passed as `result_holder` during asynchronous cohort creation. If both results are ready, you can retrieve and compare the cohorts using the same approach demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb). Note that the comparison cells reuse variables assigned in the exploration cells above, so run those cells after the cohorts are ready and before comparing.

In [28]:
# compare the baseline and user study cohorts
if baseline_result.ready and study_result.ready:
    print(f"first 5 patient in baseline cohort data: {baseline_cohort_data[:5]}")
    print(f"first 5 patient in study cohort data: {study_cohort_data[:5]}")
    baseline_cohort_age_stats = baseline_cohort.get_stats("age")
    print(f'the baseline cohort age stats: {baseline_cohort_age_stats}')
    baseline_cohort_gender_stats = baseline_cohort.get_stats("gender")
    print(f'the baseline cohort gender stats: {baseline_cohort_gender_stats}')
    study_cohort_age_stats = study_cohort.get_stats("age")
    print(f'the study cohort age stats: {study_cohort_age_stats}')
    study_cohort_gender_stats = study_cohort.get_stats("gender")
    print(f'the study cohort gender stats: {study_cohort_gender_stats}')
    result = bias.compare_cohorts(baseline_cohort_def['id'], study_cohort_def['id'])
    print(result)

first 5 patient in baseline cohort data: [{'subject_id': 237, 'cohort_definition_id': 1, 'cohort_start_date': datetime.date(2009, 9, 28), 'cohort_end_date': datetime.date(2010, 8, 1)}, {'subject_id': 517, 'cohort_definition_id': 1, 'cohort_start_date': datetime.date(2008, 12, 15), 'cohort_end_date': datetime.date(2010, 8, 10)}, {'subject_id': 561, 'cohort_definition_id': 1, 'cohort_start_date': datetime.date(2008, 1, 30), 'cohort_end_date': datetime.date(2008, 3, 10)}, {'subject_id': 747, 'cohort_definition_id': 1, 'cohort_start_date': datetime.date(2009, 1, 16), 'cohort_end_date': datetime.date(2009, 7, 3)}, {'subject_id': 868, 'cohort_definition_id': 1, 'cohort_start_date': datetime.date(2009, 6, 14), 'cohort_end_date': datetime.date(2009, 12, 3)}]
first 5 patient in study cohort data: [{'subject_id': 307, 'cohort_definition_id': 4, 'cohort_start_date': datetime.date(2008, 7, 6), 'cohort_end_date': datetime.date(2010, 6, 30)}, {'subject_id': 350, 'cohort_definition_id': 4, 'cohort_st

In [30]:
# compare the baseline and user study cohorts
if baseline_result_py.ready and study_result_py.ready:
    print(f"first 5 patient in baseline cohort data: {baseline_cohort_data_py[:5]}")
    print(f"first 5 patient in study cohort data: {study_cohort_data_py[:5]}")
    baseline_cohort_age_stats_py = baseline_cohort_py.get_stats("age")
    print(f'the baseline cohort age stats: {baseline_cohort_age_stats_py}')
    baseline_cohort_gender_stats_py = baseline_cohort_py.get_stats("gender")
    print(f'the baseline cohort gender stats: {baseline_cohort_gender_stats_py}')
    study_cohort_age_stats_py = study_cohort_py.get_stats("age")
    print(f'the study cohort age stats: {study_cohort_age_stats_py}')
    study_cohort_gender_stats_py = study_cohort_py.get_stats("gender")
    print(f'the study cohort gender stats: {study_cohort_gender_stats_py}')
    result = bias.compare_cohorts(baseline_cohort_def_py['id'], study_cohort_def_py['id'])
    print(result)

first 5 patient in baseline cohort data: [{'subject_id': 83, 'cohort_definition_id': 2, 'cohort_start_date': datetime.date(2008, 5, 16), 'cohort_end_date': datetime.date(2010, 10, 26)}, {'subject_id': 307, 'cohort_definition_id': 2, 'cohort_start_date': datetime.date(2008, 7, 6), 'cohort_end_date': datetime.date(2010, 6, 30)}, {'subject_id': 350, 'cohort_definition_id': 2, 'cohort_start_date': datetime.date(2008, 11, 24), 'cohort_end_date': datetime.date(2010, 10, 20)}, {'subject_id': 387, 'cohort_definition_id': 2, 'cohort_start_date': datetime.date(2009, 4, 1), 'cohort_end_date': datetime.date(2010, 1, 5)}, {'subject_id': 1192, 'cohort_definition_id': 2, 'cohort_start_date': datetime.date(2008, 6, 10), 'cohort_end_date': datetime.date(2010, 11, 17)}]
first 5 patient in study cohort data: [{'subject_id': 307, 'cohort_definition_id': 5, 'cohort_start_date': datetime.date(2008, 7, 6), 'cohort_end_date': datetime.date(2010, 6, 30)}, {'subject_id': 350, 'cohort_definition_id': 5, 'cohort_
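
The comparison cells above do nothing if either result is not ready yet. When you would rather wait for both, a bounded polling loop is one option. The sketch below is a hedged illustration: `wait_until_ready` is a hypothetical helper, and `SimpleNamespace` objects with a `ready` attribute stand in for the real result holders.

```python
import threading
import time
from types import SimpleNamespace


def wait_until_ready(holders, timeout=5.0, poll_interval=0.05):
    """Poll until every holder reports ready, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if all(h.ready for h in holders):
            return True
        time.sleep(poll_interval)
    return all(h.ready for h in holders)


def finish(holder, after):
    """Mark a stand-in holder as ready after a short delay."""
    time.sleep(after)
    holder.ready = True


baseline = SimpleNamespace(ready=False)
study = SimpleNamespace(ready=False)

threads = [
    threading.Thread(target=finish, args=(baseline, 0.1)),
    threading.Thread(target=finish, args=(study, 0.2)),
]
for t in threads:
    t.start()

both_ready = wait_until_ready([baseline, study], timeout=2.0)
print(both_ready)  # True
```

Returning a boolean instead of raising lets the caller decide whether to proceed with the comparison or report that a cohort is still being created.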

---

### Final cleanup to ensure database connections are closed

In [31]:
bias.cleanup()

Connection to BiasDatabase closed.
Connection to the OMOP CDM database closed.
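
Closing the connections while a background task is still running might interrupt that task, so it can be safer to wait for outstanding threads first and to guarantee cleanup even if an earlier cell fails. The sketch below shows one way to arrange this with a context manager. `cleanup_when_done` and `FakeBias` are hypothetical names; the only assumption about the real object is that it exposes a `cleanup()` method, as used above.

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def cleanup_when_done(resource, threads):
    """Yield the resource, then join all threads and call resource.cleanup()."""
    try:
        yield resource
    finally:
        for t in threads:
            t.join()        # let background work finish before closing
        resource.cleanup()  # runs even if the body raised


class FakeBias:
    """Stand-in with only a cleanup() method, for demonstration."""

    def __init__(self):
        self.closed = False

    def cleanup(self):
        self.closed = True


events = []
worker = threading.Thread(
    target=lambda: (time.sleep(0.1), events.append("task done"))
)
worker.start()

fake = FakeBias()
with cleanup_when_done(fake, [worker]):
    events.append("exploring")
events.append(f"closed={fake.closed}")
print(events)
```

With the real objects, the thread list would hold the values returned by `run_in_background()`, assuming they support `join()`.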


### ✅ Summary

In this tutorial, you learned how to use the BiasAnalyzer package to create a baseline and a study cohort asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions. For testing purposes, the optional `delay` parameter of `run_in_background()` simulates a long-running process. This tutorial complements the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), following a similar workflow but optimized for performance by introducing asynchronous processing.
  
For more information, refer to the [BiasAnalyzer GitHub repo](https://github.com/VACLab/BiasAnalyzerCore) and the [README file](https://github.com/VACLab/BiasAnalyzerCore/blob/main/README.md).
