# Using BiasAnalyzer for Asynchronous Cohort Creation and Exploration

This tutorial demonstrates how to use the `BiasAnalyzer` package to create multiple cohorts asynchronously for exploration, which can improve performance and responsiveness when working with large datasets or complex cohort definitions. It complements the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), following a similar workflow but optimized for performance by introducing asynchronous processing.

---

### Overview

**Objective**:  
Show how to define and create multiple cohorts using asynchronous execution to improve responsiveness and performance when working with large or complex datasets.

**Before You Begin**:  
The `BiasAnalyzer` package is currently in active development and has not yet been officially released on PyPI.
You can install it in one of the two ways:

- **Install from GitHub (recommended during development)**:
```bash
pip install git+https://github.com/vaclab/BiasAnalyzer.git
```
- **Install from PyPI (once the pacakge is officially released)**:
```bash
pip install biasanalyzer
```

For full setup and usage instructions, refer to the [README](https://github.com/VACLab/BiasAnalyzer/blob/main/README.md).

---


### Preparation for asynchronous cohort creation
**Preparation step 1**: Import the `BIAS` class from the `api` module of the `BiasAnalyzer` package, create an object `bias` of the `BIAS` class, specify OMOP CDM database configurations on the `bias` object, and set OMOP CDM database to enable connection to the database. Refer to the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb) for more details.

In [1]:
from biasanalyzer.api import BIAS
import os

# Get the directory of the config file path and load the file
base_dir = os.getcwd()
config_file = os.path.join(base_dir, "test_config.yaml")

bias = BIAS()

bias.set_config(config_file)

bias.set_root_omop()

configuration specified in /Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/BiasAnalyzer_Cohort_Builder_Comparison/test_config.yaml loaded successfully
Connected to the OMOP CDM database (read-only).
Cohort Definition table created.
Cohort table created.


**Preparation step 2**: Import `BackgroundResult` class and the `run_in_background` function from the `background.threading_utils` module of the `BiasAnalyzer` package to support asynchronous cohort creation.

In [2]:
from biasanalyzer.background.threading_utils import BackgroundResult, run_in_background

**Now that you have connected to your OMOP CDM database and imported the necessary utilities for asynchronous processing, you are ready to create cohorts asynchronously using the `BiasAnalyzer` APIs.** The rest of this notebook illustrates how to create both a baseline and a study cohort asynchronously, and explore and compare them once they are ready. With asynchronous execution, you don't need to wait for cohort creation to finish --- you can continue running the subsequent cells and explore the data as it becomes available.

---

### Asynchronous cohort creation
**Baseline cohort creation**: To create a baseline cohort of young female patients asynchronously, use the `run_in_background()` function to run `create_cohort(cohort_name, cohort_description, query_or_yaml_file, created_by)` method in a background thread. You'll pass the target function as the first argument, the cohort creation target function input arguments as the next four arguments, a `BackgroundResult` object via the `result_holder` optional parameter to store the created baseline cohort result, and a `delay` value (e.g., 120 seconds) to simulate asynchronous execution of long-running process for testing purposes. The created baseline cohort will be identical to the one created in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), except that the cohort creation now runs asychronously in a background thread.

In [None]:
# Create baseline cohort result holder
baseline_result = BackgroundResult()
#print(base_dir)

# Start background task to run create_cohort() method for a baseline cohort in a background thread
baseline_thread = run_in_background(
    bias.create_cohort,
    "Young female patients",
    "A cohort of female patients born between 2000 and 2020",
    os.path.join(base_dir, "/test_yaml/test_cohort_creation_condition_occurrence_config_baseline.yaml"),
    "system",
    result_holder=baseline_result,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Baseline cohort creation running in background...")

/Users/sherrylu/Documents/UNC/26Fall/VAC Lab/BiasAnalyzerYAMLBuilder/JypterNotebook/BiasAnalyzer_Cohort_Builder_Comparison
[*] Background task started...
Baseline cohort creation running in background...


Cohort creation:   0%|          | 0/3 [00:00<?, ?stage/s]

specified cohort creation configuration file does not exist. Make sure the configuration file name with path is specified correctly.
[✓] Background task completed.


———————————————

**Study cohort creation**: To create a study cohort of young female COVID patients asynchronously, use the `run_in_background()` function to run `create_cohort(cohort_name, cohort_description, query_or_yaml_file, created_by)` function in a background thread. You'll pass the target function as the first argument, the cohort creation target function input arguments as the next four arguments, a `BackgroundResult` object via the `result_holder` optional parameter to store the created baseline cohort result, and a `delay` value (e.g., 120 seconds) to simulate asynchronous execution of long-running process for testing purposes. The created study cohort will be identical to the one created in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), except that the cohort creation now runs asychronously in a background thread.

In [None]:
# Create study cohort result holder
study_result = BackgroundResult()

# Start background task to run create_cohort() function for a study cohort in a background thread
study_thread = run_in_background(
    bias.create_cohort,
    "Young COVID female patients",
    "Young COVID female patients",
    '/home/hongyi/BiasAnalyzer/tests/assets/cohort_creation/test_cohort_creation_condition_occurrence_config_study.yaml',
    "system",
    result_holder=study_result,
    delay=120  # simulate 2 minutes delay for async testing
)

print("Study cohort creation running in background...")

---

### Cohort exploration when available
**Exploring the baseline cohort**: To explore the baseline cohort once it's available, check the `ready` property of the `baseline_result` --- the `BackgroundResult` object provided as the `result_holder` during asynchronous cohort creation. If the result is ready, verify whether the background process completed successfully by checking the `error` property of the `baseline_result`. If no error occurred, you can retrieve the created baseline cohort object and explore it, just as demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb).

In [None]:
if baseline_result.ready:
    if baseline_result.error:
        print(f"Baseline cohort creation failed: {baseline_result.error}")
    else:
        baseline_cohort = baseline_result.value
        baseline_cohort_def = baseline_cohort.metadata
        print(f"Baseline cohort created with metadata: {baseline_cohort_def}")
        baseline_cohort_data = baseline_cohort.data
        baseline_cohort_stats = baseline_cohort.get_stats()
        print(f"Baseline cohort created with stats: {baseline_cohort_stats}")
else:
    print("Still creating baseline cohort...")

———————————————

**Exploring the study cohort**: To explore the study cohort once it's available, check the `ready` property of the `study_result` --- the `BackgroundResult` object provided as the `result_holder` during asynchronous cohort creation. If the result is ready, verify whether the background process completed successfully by checking the `error` property of the `study_result`. If no error occurred, you can retrieve the created study cohort object and explore it, just as demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb).

In [None]:
if study_result.ready:
    if study_result.error:
        print(f"Study cohort creation failed: {study_result.error}")
    else:
        study_cohort = study_result.value
        study_cohort_def = study_cohort.metadata
        print(f"Study cohort created with metadata: {study_cohort_def}")
        study_cohort_data = study_cohort.data
        study_cohort_stats = study_cohort.get_stats()
        print(f"Study cohort created with stats: {study_cohort_stats}")
else:
    print("Still creating study cohort...")

---

### Cohort comparison when available
To compare the baseline and study cohorts once they are available, check the `ready` property of both `baseline_result` and `study_result` --- the `BackgroundResult` objects passed as `result_holder` during asynchronous cohort creation. If both results are ready, you can retrieve and compare the cohorts using the same approach demonstrated in the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb).

In [None]:
# compare the baseline and user study cohorts
if baseline_result.ready and study_result.ready:
    print(f"first 5 patient in baseline cohort data: {baseline_cohort_data[:5]}")
    print(f"first 5 patient in study cohort data: {study_cohort_data[:5]}")
    baseline_cohort_age_stats = baseline_cohort.get_stats("age")
    print(f'the baseline cohort age stats: {baseline_cohort_age_stats}')
    baseline_cohort_gender_stats = baseline_cohort.get_stats("gender")
    print(f'the baseline cohort gender stats: {baseline_cohort_gender_stats}')
    study_cohort_age_stats = study_cohort.get_stats("age")
    print(f'the study cohort age stats: {study_cohort_age_stats}')
    study_cohort_gender_stats = study_cohort.get_stats("gender")
    print(f'the study cohort gender stats: {study_cohort_gender_stats}')
    result = bias.compare_cohorts(baseline_cohort_def['id'], study_cohort_def['id'])
    print(result)

---

### Final cleanup to ensure database connections are closed

In [None]:
bias.cleanup()

### ✅ Summary

In this tutorial, you learned how to use the BiasAnalyzer package to create a baseline and a study cohort asynchronously for improved performance and responsiveness when working with large datasets or complex cohort definitions. For testing purposes, a `delay` optional parameter is introduced in the `run_in_background()` function to simulate asynchronous execution of long-running process. This tutorial complements the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb), following a similar workflow but optimized for performance by introducing asynchronous processing.
  
For more information, refer to the [BiasAnalyzer GitHub repo](https://github.com/VACLab/BiasAnalyzer) and the [README file](https://github.com/VACLab/BiasAnalyzer/blob/main/README.md).
