# BMI Comprehensive Harmonization for All of Us

**Purpose**: Extract, clean, and harmonize BMI data with advanced quality control  
**Author**: Bennett Waxse  
**Created**: June 2025  
**CDR Version**: v8  

## Features
- Multiple validated concept IDs for weight, height, BMI
- Unit conversion and validation
- 4-sigma outlier removal by unit type
- Quality control metrics and validation plots
- Temporal matching for cohort studies
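
As an illustration of the "4-sigma outlier removal by unit type" feature, here is a minimal pandas sketch. It is not the notebook's actual implementation: the function name `remove_outliers_by_unit`, the column names `value` and `unit`, and the toy data are all assumptions made for this example. Grouping by unit before computing the mean and standard deviation keeps, say, pounds from distorting the threshold applied to kilograms.

```python
import pandas as pd


def remove_outliers_by_unit(df, value_col="value", unit_col="unit", n_sigma=4.0):
    """Drop rows lying more than n_sigma SDs from the mean of their unit group.

    Hypothetical helper for illustration; column names are assumptions.
    """
    grouped = df.groupby(unit_col)[value_col]
    mean = grouped.transform("mean")
    std = grouped.transform("std")  # sample standard deviation (ddof=1)
    deviation = (df[value_col] - mean).abs()
    # Keep rows inside the threshold; also keep single-row groups,
    # where the SD is undefined (NaN) and no judgment can be made.
    keep = deviation.le(n_sigma * std) | std.isna()
    return df[keep].reset_index(drop=True)


# Toy data: 30 plausible weights per unit plus one implausible value each.
kg = [70 + (i % 5) for i in range(30)] + [700]
lb = [150 + (i % 5) for i in range(30)] + [1500]
df = pd.DataFrame({
    "value": kg + lb,
    "unit": ["kg"] * len(kg) + ["lb"] * len(lb),
})

cleaned = remove_outliers_by_unit(df)
print(len(df), "rows in,", len(cleaned), "rows out")
```

Note that a single extreme value can only exceed 4 sigma when the group is reasonably large (roughly 18 or more rows with a sample SD), so very small unit groups may need a different rule.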

## Dependencies
```
pandas, polars, numpy, pyarrow, seaborn, matplotlib, google-cloud-bigquery
```

In [None]:
# Standard imports
import pandas as pd
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from google.cloud import bigquery

# Configuration
pd.set_option("display.max_columns", None)
pd.set_option('display.max_colwidth', 100)
pl.Config.set_fmt_str_lengths(100)

# Plotting style
plt.style.use('default')
sns.set_palette("husl")

In [None]:
# All of Us Workbench setup: both values come from workspace environment variables
version = os.getenv('WORKSPACE_CDR')
print(f"CDR version: {version}")

my_bucket = os.getenv('WORKSPACE_BUCKET')
print(f"Workspace bucket: {my_bucket}")

In [None]:
def polars_gbq(query: str) -> pl.DataFrame:
    """
    Execute a BigQuery SQL query and return the result as a Polars DataFrame
    
    Args:
        query: BigQuery SQL query string
    
    Returns:
        pl.DataFrame: Query results
    """
    client = bigquery.Client()
    query_job = client.query(query)
    rows = query_job.result()
    df = pl.from_arrow(rows.to_arrow())
    return df
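
A possible way to call `polars_gbq` is sketched below. The helper `build_measurement_query` is hypothetical and not part of the notebook; the sketch assumes the standard OMOP `measurement` table and its columns, and it uses concept ID 3038553 (body mass index) purely as an illustration, not as the validated list this notebook relies on. The fallback dataset name is a placeholder for running outside the Workbench.

```python
import os


def build_measurement_query(cdr, concept_ids):
    """Build a SQL string selecting measurements for the given concept IDs.

    Hypothetical helper; assumes the OMOP `measurement` table layout.
    """
    # Coerce to int so only numeric IDs reach the SQL text.
    ids = ", ".join(str(int(c)) for c in concept_ids)
    return f"""
    SELECT
        person_id,
        measurement_concept_id,
        value_as_number,
        unit_concept_id,
        measurement_datetime
    FROM `{cdr}.measurement`
    WHERE measurement_concept_id IN ({ids})
    """


# Placeholder fallback so the sketch also runs outside the Workbench.
cdr = os.getenv("WORKSPACE_CDR", "my-project.my_dataset")
query = build_measurement_query(cdr, [3038553])  # illustrative concept ID
print(query)

# In the Workbench, the query would then be executed with:
# df = polars_gbq(query)
```

Keeping query construction separate from execution makes the SQL easy to inspect before it is sent to BigQuery.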

## 1. Validated Concept IDs

These concept IDs have been validated against All of Us data to ensure they capture the relevant measurements.