# Using BiasAnalyzer for OMOP Concept Exploration

This tutorial demonstrates how to use the `BiasAnalyzer` package to browse and explore OMOP concepts. In the OMOP (Observational Medical Outcomes Partnership) CDM (Common Data Model), a **concept** is a coded term from a standardized medical vocabulary, uniquely identified by a **concept ID**. Clinical events in OMOP, such as conditions, drug exposures, procedures, and measurements, are represented as concepts.

---

### Overview

**Objective**:  
Learn how to browse and explore OMOP concepts using `BiasAnalyzer`.

**Before You Begin**:  
The `BiasAnalyzer` package is currently in active development and has not yet been officially released on PyPI.
You can install it in one of two ways:

- **Install from GitHub (recommended during development)**:
```bash
pip install git+https://github.com/vaclab/BiasAnalyzerCore.git
```
- **Install from PyPI (once the package is officially released)**:
```bash
pip install biasanalyzer
```

For full setup and usage instructions, refer to the [README](https://github.com/VACLab/BiasAnalyzerCore/blob/main/README.md).
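
Before running the notebook cells, it can help to confirm that the package is importable in the active environment. The following is a minimal sketch using only the standard library. The import name `biasanalyzer` is taken from the import statement used later in this tutorial, and the helper name `is_installed` is hypothetical:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("biasanalyzer"):
    print("biasanalyzer is available")
else:
    print("biasanalyzer is not installed; run one of the pip commands above")
```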

---


### Preparation for OMOP concept exploration
Import the `BIAS` class from the `api` module of the `BiasAnalyzer` package and create a `bias` object. Then load the OMOP CDM database configuration with `set_config()` and call `set_root_omop()` to connect to the database. Refer to the [Cohort Exploration Tutorial](./BiasAnalyzerCohortsTutorial.ipynb) for more details.

In [1]:
from biasanalyzer.api import BIAS

bias = BIAS()

bias.set_config('../config.yaml')

bias.set_root_omop()

configuration specified in ../config.yaml loaded successfully
Connected to the OMOP CDM database (read-only).
Cohort Definition table created.
Cohort table created.


**Now that you have connected to your OMOP CDM database, you are ready to browse and explore OMOP concepts.** 

---

### Explore OMOP domains and vocabularies
Since each OMOP concept is linked to a domain and a vocabulary, it is helpful to first understand which domains and vocabularies are available before exploring concepts. You can retrieve the available OMOP domains and their associated vocabularies using the `get_domains_and_vocabularies()` method on the `bias` object. The method returns a list of dictionaries, where each dictionary contains a `domain_id` and a `vocabulary_id`. The list is sorted alphabetically by `domain_id` and then by `vocabulary_id`.

In [2]:
import pandas as pd
pd.set_option('display.max_rows', None)

domains_and_vocabs = bias.get_domains_and_vocabularies()
print(pd.DataFrame(domains_and_vocabs))

               domain_id         vocabulary_id
0              Condition                 HCPCS
1              Condition                 ICD10
2              Condition               ICD10CM
3              Condition                ICD9CM
4              Condition                 ICDO3
5              Condition        OMOP Extension
6              Condition                SNOMED
7       Condition/Device               ICD10CM
8         Condition/Meas               ICD10CM
9       Condition Status      Condition Status
10                  Cost                  Cost
11              Currency              Currency
12                Device                 HCPCS
13                Device              ICD10PCS
14                Device                   NDC
15                Device                SNOMED
16                Device                   SPL
17                  Drug                   ATC
18                  Drug                 HCPCS
19                  Drug              ICD10PCS
20           
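
Because the result is a flat list of `domain_id`/`vocabulary_id` pairs, grouping it by domain gives a quick view of which vocabularies can be passed alongside each domain. The sketch below assumes only the list-of-dictionaries shape described above. The sample rows are copied from the output shown, and the helper name `group_vocabularies_by_domain` is hypothetical:

```python
from collections import defaultdict


def group_vocabularies_by_domain(pairs):
    """Map each domain_id to the list of vocabulary_ids seen with it, in input order."""
    grouped = defaultdict(list)
    for row in pairs:
        grouped[row["domain_id"]].append(row["vocabulary_id"])
    return dict(grouped)


# A few rows in the same shape as the method's return value.
sample_pairs = [
    {"domain_id": "Condition", "vocabulary_id": "ICD10CM"},
    {"domain_id": "Condition", "vocabulary_id": "SNOMED"},
    {"domain_id": "Device", "vocabulary_id": "HCPCS"},
    {"domain_id": "Drug", "vocabulary_id": "ATC"},
]

vocabularies_by_domain = group_vocabularies_by_domain(sample_pairs)
print(vocabularies_by_domain["Condition"])  # ['ICD10CM', 'SNOMED']
```

With a live connection, the same helper could be applied to `domains_and_vocabs` from the cell above.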

---

### Exploring OMOP concepts

You can explore OMOP concepts using the `get_concepts(search_term, domain=None, vocabulary=None)` method on the `bias` object. To narrow down your search, you should provide a search term along with a domain, a vocabulary, or both. Since the OMOP vocabulary contains a vast number of concepts, filtering by domain and/or vocabulary helps constrain the search space and keeps the number of results manageable. 

In [3]:
concepts = bias.get_concepts("COVID-19", "Condition", "SNOMED")
print(pd.DataFrame(concepts))

   concept_id                                       concept_name  \
0      703440  COVID-19 confirmed using clinical diagnostic c...   
1      703441              COVID-19 confirmed by laboratory test   
2      703445  Low risk category for developing complication ...   
3      703446  Moderate risk category for developing complica...   
4      703447  High risk category for developing complication...   
5    37310269                                           COVID-19   
6    37311061                                           COVID-19   

  valid_start_date valid_end_date  domain_id vocabulary_id  
0       2020-04-01     2099-12-31  Condition        SNOMED  
1       2020-04-01     2099-12-31  Condition        SNOMED  
2       2020-04-01     2099-12-31  Condition        SNOMED  
3       2020-04-01     2099-12-31  Condition        SNOMED  
4       2020-04-01     2099-12-31  Condition        SNOMED  
5       2020-02-04     2020-10-28  Condition        SNOMED  
6       2020-01-31     2099-
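
Note that the two rows named `COVID-19` differ in their validity window: concept 37310269 has a `valid_end_date` of 2020-10-28, while most other rows end on 2099-12-31. A far-future end date is how OMOP vocabularies conventionally mark a concept as still valid. The sketch below filters a result list down to concepts valid on a given date. It assumes the date fields arrive either as `date` objects or as ISO `YYYY-MM-DD` strings, which this tutorial does not confirm, and the helper names are hypothetical:

```python
from datetime import date, datetime


def to_date(value):
    """Normalize a datetime, a date, or an ISO 'YYYY-MM-DD' string to a date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value))


def valid_on(concepts, as_of):
    """Keep only concepts whose validity window contains the given date."""
    return [
        c for c in concepts
        if to_date(c["valid_start_date"]) <= as_of <= to_date(c["valid_end_date"])
    ]


# Two rows copied from the output above.
sample_concepts = [
    {"concept_id": 703441, "concept_name": "COVID-19 confirmed by laboratory test",
     "valid_start_date": "2020-04-01", "valid_end_date": "2099-12-31"},
    {"concept_id": 37310269, "concept_name": "COVID-19",
     "valid_start_date": "2020-02-04", "valid_end_date": "2020-10-28"},
]

current = valid_on(sample_concepts, date(2024, 1, 1))
print([c["concept_id"] for c in current])  # [703441]
```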

---

### Exploring concept hierarchy

**Retrieve concept hierarchy**: You can retrieve the concept hierarchy for a specific OMOP concept using the `get_concept_hierarchy(concept_id)` method on the `bias` object. The method returns two dictionaries: the **ancestor hierarchy**, representing the concept's lineage upward, and the **descendant hierarchy**, representing the concept's children and their branches. Each dictionary has a nested structure with two main keys:
- `details`: a dictionary containing metadata about the current concept node, including `concept_id`, `concept_name`, `vocabulary_id`, and `concept_code`
- `parents` (for the ancestor hierarchy) or `children` (for the descendant hierarchy): a list of parent or child concept nodes, respectively

A progress bar is displayed during execution to indicate the progress of computing the concept's hierarchical relationships.

In [4]:
# get the parent and children concept hierarchy trees for COVID-19 (OMOP concept ID: 37311061)
parent_concept_tree, children_concept_tree = bias.get_concept_hierarchy(37311061)

Concept Hierarchy:   0%|          | 0/3 [00:00<?, ?stage/s]
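
Because each node nests further nodes under `children` (or `parents`), a short recursive walk is enough to flatten a hierarchy, for example to gather every descendant concept ID. The sketch below assumes only the `details` plus `children`/`parents` layout described above. The sample tree is a trimmed, hand-written stand-in with some `details` fields omitted, and the helper name `collect_concept_ids` is hypothetical:

```python
def collect_concept_ids(node, branch_key="children"):
    """Return the concept IDs of a node and everything beneath it, depth-first."""
    ids = [node["details"]["concept_id"]]
    # Use .get() because it is not confirmed here whether leaf nodes carry an empty list.
    for child in node.get(branch_key, []):
        ids.extend(collect_concept_ids(child, branch_key))
    return ids


# Hand-written stand-in for a small part of a descendant hierarchy.
sample_tree = {
    "details": {"concept_id": 37311061, "concept_name": "COVID-19"},
    "children": [
        {"details": {"concept_id": 3661631,
                     "concept_name": "Lymphocytopenia due to Severe acute respiratory syndrome coronavirus 2"},
         "children": []},
        {"details": {"concept_id": 756039,
                     "concept_name": "Respiratory infection caused by COVID-19"},
         "children": [
             {"details": {"concept_id": 3661405,
                          "concept_name": "Acute bronchitis caused by SARS-CoV-2"}},
             {"details": {"concept_id": 3661408,
                          "concept_name": "Pneumonia caused by SARS-CoV-2"}},
         ]},
    ],
}

print(collect_concept_ids(sample_tree))
# [37311061, 3661631, 756039, 3661405, 3661408]
```

Passing `branch_key="parents"` would walk an ancestor hierarchy in the same way.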

**Visualize concept hierarchy**: You can visualize a concept hierarchy using the `display_concept_tree(concept_tree, level=0, show_in_text_format=True)` method on the `bias` object. This method supports two display modes:
- Text-based visualization (`show_in_text_format=True`): Displays an indented tree with upward and downward arrows to indicate parent-child relationships. This is the default and more robust display mode.
- Interactive widget visualization (`show_in_text_format=False`): Uses an `ipytree`-based widget to render the concept hierarchy as an expandable/collapsible tree, ideal for interactive exploration in supported Jupyter environments.
  - **Note**: The `ipytree`-based interactive widget may display frontend warnings or partial rendering issues in **JupyterLab 4.x** or above due to compatibility limitations of the `ipytree` widget. Despite these warnings, the tree should remain functional. This feature is optional and is best used in environments where full `ipytree` support is available.

For maximum compatibility, the text-based display mode is used by default.

In [5]:
print('parent concept hierarchy for COVID-19 in text format:')
print(bias.display_concept_tree(parent_concept_tree))
print('children concept hierarchy for COVID-19 in text format:')
print(bias.display_concept_tree(children_concept_tree))

parent concept hierarchy for COVID-19 in text format:
🔼 COVID-19 (ID: 37311061, Code: 840539006)
  🔼 Clinical finding (ID: 441840, Code: 404684003)
  🔼 Viral disease (ID: 440029, Code: 34014006)
  🔼 Disease (ID: 4274025, Code: 64572001)
  🔼 Coronavirus infection (ID: 439676, Code: 186747009)
  🔼 Disease due to Coronaviridae (ID: 4100065, Code: 27619001)
  🔼 Disorder due to infection (ID: 432250, Code: 40733004)

children concept hierarchy for COVID-19 in text format:
🔽 COVID-19 (ID: 37311061, Code: 840539006)
  🔽 Lymphocytopenia due to Severe acute respiratory syndrome coronavirus 2 (ID: 3661631, Code: 866151004)
  🔽 Otitis media due to disease caused by Severe acute respiratory syndrome coronavirus 2 (ID: 37310254, Code: 1240521000000100)
  🔽 Respiratory infection caused by COVID-19 (ID: 756039, Code: OMOP4873907)
    🔽 Acute bronchitis caused by SARS-CoV-2 (ID: 3661405, Code: 138389411000119105)
    🔽 Pneumonia caused by SARS-CoV-2 (ID: 3661408, Code: 882784691000119100)
    🔽 Lower 
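
To make the role of the `level` parameter concrete, the sketch below renders a nested hierarchy as indented text, with two spaces of indentation per level. It is an illustration of the general technique under the dictionary layout described earlier, not the library's implementation. The function name `render_tree` and the plain `-` marker are hypothetical:

```python
def render_tree(node, branch_key="children", level=0, marker="-"):
    """Return one text line per node, indented two spaces per hierarchy level."""
    details = node["details"]
    lines = [f"{'  ' * level}{marker} {details['concept_name']} (ID: {details['concept_id']})"]
    for child in node.get(branch_key, []):
        # Recurse one level deeper for each nested node.
        lines.extend(render_tree(child, branch_key, level + 1, marker))
    return lines


# Hand-written stand-in for one branch of a descendant hierarchy.
sample_branch = {
    "details": {"concept_id": 37311061, "concept_name": "COVID-19"},
    "children": [
        {"details": {"concept_id": 756039,
                     "concept_name": "Respiratory infection caused by COVID-19"},
         "children": [
             {"details": {"concept_id": 3661408,
                          "concept_name": "Pneumonia caused by SARS-CoV-2"}},
         ]},
    ],
}

rendered = render_tree(sample_branch)
print("\n".join(rendered))
```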

In [6]:
print('parent concept hierarchy for COVID-19 in widget tree format:')
bias.display_concept_tree(parent_concept_tree, show_in_text_format=False)
print('children concept hierarchy for COVID-19 in widget tree format:')
bias.display_concept_tree(children_concept_tree, show_in_text_format=False)

parent concept hierarchy for COVID-19 in widget tree format:


VBox(children=(Label(value='Concept Hierarchy'), Tree(nodes=(Node(name='🔼 COVID-19 (ID: 37311061, Code: 840539…

children concept hierarchy for COVID-19 in widget tree format:


VBox(children=(Label(value='Concept Hierarchy'), Tree(nodes=(Node(name='🔽 COVID-19 (ID: 37311061, Code: 840539…

Node(name='🔽 COVID-19 (ID: 37311061, Code: 840539006)', nodes=(Node(name='🔽 Lymphocytopenia due to Severe acut…

---

### Final cleanup to ensure database connections are closed

In [7]:
bias.cleanup()

Connection to BiasDatabase closed.
Connection to the OMOP CDM database closed.
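
If an earlier cell raises an exception, a trailing `cleanup()` call may never run. One way to guard against that is to wrap the session in a small context manager that always calls `cleanup()` on exit. The sketch below is one possible pattern, not something the package provides. The `cleaned_up` helper and the `StandInSession` class are hypothetical, and the stand-in exists only so the example runs without a database:

```python
from contextlib import contextmanager


@contextmanager
def cleaned_up(session):
    """Yield the session and call its cleanup() method on exit, even after an error."""
    try:
        yield session
    finally:
        session.cleanup()


class StandInSession:
    """Hypothetical stand-in for a BIAS object; it only records whether cleanup ran."""

    def __init__(self):
        self.closed = False

    def cleanup(self):
        self.closed = True


session = StandInSession()
try:
    with cleaned_up(session):
        raise RuntimeError("simulated failure during exploration")
except RuntimeError:
    pass

print(session.closed)  # True
```

With a real `bias` object, the same wrapper could be used as `with cleaned_up(bias): ...`, assuming `cleanup()` behaves as shown in the cell above.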


### ✅ Summary

In this tutorial, you learned how to use the BiasAnalyzer package to explore OMOP clinical concepts in the context of their associated domains and vocabularies. You also explored how to use BiasAnalyzer APIs to retrieve and visualize concept hierarchies, including ancestor and descendant relationships, in a tree structure.
  
For more information, refer to the [BiasAnalyzer GitHub repo](https://github.com/VACLab/BiasAnalyzerCore) and the [README file](https://github.com/VACLab/BiasAnalyzerCore/blob/main/README.md).
