# Investigating the Impact of Dietary Patterns (HEI-2020) on Cardiovascular Disease (CVD) Risk Factors in Racial/Ethnic Minorities and Socioeconomically Disadvantaged Populations Using NHANES 2021–2023

## Background and Rationale

**Context**: Cardiovascular disease (CVD) remains a leading cause of mortality globally and disproportionately
affects racial/ethnic minorities and socioeconomically disadvantaged populations. Diet quality, as
measured by the Healthy Eating Index 2020 (HEI-2020), is a key modifiable determinant of CVD risk.

**Objective**: To evaluate the relationship between HEI-2020 scores and CVD risk factors among
racial/ethnic minorities and low-income groups using NHANES 2021–2023 data.


## Population Selection:
* Include participants aged ≥18 years.
* Define racial/ethnic minorities as non-Hispanic Black, Hispanic, and Asian participants.
* Define socioeconomic disadvantage as a poverty income ratio (PIR) < 1.3.
* Exclude individuals with incomplete data on dietary intake or CVD risk factors.
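The selection rules above can be sketched in pandas. This is a minimal sketch assuming the standard NHANES DEMO variable names `RIDAGEYR` (age in years), `RIDRETH3` (race/Hispanic origin, assumed codes 1 = Mexican American, 2 = Other Hispanic, 4 = Non-Hispanic Black, 6 = Non-Hispanic Asian) and `INDFMPIR` (family PIR); confirm them against the DEMO_L codebook. The helper `select_population`, the `HEI2020` column, and the toy rows are hypothetical. Minority and low-income status are kept as flags rather than filters so that a reference group remains available for the interaction terms planned below.

```python
import pandas as pd

# Assumed RIDRETH3 codes for the minority groups defined above; verify in the codebook
MINORITY_CODES = {1, 2, 4, 6}


def select_population(demo: pd.DataFrame, required: list[str]) -> pd.DataFrame:
    """Hypothetical helper: adults with complete data, flagged by minority/low-income status."""
    adults = demo[demo["RIDAGEYR"] >= 18].copy()
    # Exclude anyone missing dietary/CVD data, or PIR (needed for the SES definition)
    adults = adults.dropna(subset=required + ["INDFMPIR"])
    adults["minority"] = adults["RIDRETH3"].isin(MINORITY_CODES)
    adults["low_income"] = adults["INDFMPIR"] < 1.3
    return adults


# Toy stand-in for DEMO_L merged with an HEI-2020 score column
demo = pd.DataFrame({
    "SEQN": [1, 2, 3, 4, 5],
    "RIDAGEYR": [45, 17, 60, 30, 52],
    "RIDRETH3": [4, 1, 3, 6, 2],
    "INDFMPIR": [0.9, 1.1, 2.5, None, 1.2],
    "HEI2020": [55.0, 60.0, 48.0, 70.0, None],
})
cohort = select_population(demo, required=["HEI2020"])
print(cohort[["SEQN", "minority", "low_income"]])
```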


## Dietary Assessment:
* Calculate HEI-2020 scores using 24-hour dietary recall data.
* Categorize HEI-2020 scores into quartiles for comparison.
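HEI-2020 components are scored on intake density (per 1,000 kcal) against a standard, and the 13 component scores sum to a total out of 100. The sketch below shows only one adequacy component, Total Fruits, assuming its HEI-2020 standard is 5 points at ≥ 0.8 cup-equivalents per 1,000 kcal (check the official scoring standards). Moderation components such as sodium or refined grains score in the opposite direction and are not shown. It then assigns quartiles with `pd.qcut`; the function name `adequacy_score` and all values are hypothetical.

```python
import numpy as np
import pandas as pd


def adequacy_score(amount, energy_kcal, standard_per_1000kcal, max_points):
    """HEI-style adequacy score: 0 at no intake, rising linearly to max_points
    once intake density (per 1,000 kcal) reaches the standard."""
    density = np.asarray(amount, dtype=float) / (np.asarray(energy_kcal, dtype=float) / 1000.0)
    return max_points * np.clip(density / standard_per_1000kcal, 0.0, 1.0)


# Total Fruits (assumed standard: 5 points at >= 0.8 cup-eq per 1,000 kcal)
fruit_cup_eq = np.array([0.0, 0.8, 1.6, 2.4])
energy_kcal = np.array([2000.0, 2000.0, 2000.0, 2000.0])
fruit_scores = adequacy_score(fruit_cup_eq, energy_kcal, standard_per_1000kcal=0.8, max_points=5)
print(fruit_scores)

# Quartiles of the total HEI-2020 score (toy values): Q1 = lowest, Q4 = highest diet quality
hei_total = pd.Series([42.1, 55.3, 61.0, 48.7, 70.2, 38.9, 66.4, 52.0], name="HEI2020")
hei_quartile = pd.qcut(hei_total, q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print(pd.concat([hei_total, hei_quartile.rename("quartile")], axis=1))
```

The published packages linked under Feature Engineering implement the full component set; this sketch is only meant to make the scoring rule concrete.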


## CVD Risk Factors:
* Assess systolic/diastolic blood pressure, total cholesterol, HDL, LDL, triglycerides, and fasting
glucose.
* Include obesity metrics (BMI, waist circumference).
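The obesity metrics can be derived as follows. This sketch assumes the NHANES body-measures names `BMXWT` (kg), `BMXHT` (cm) and `BMXWAIST` (cm) and the DEMO sex variable `RIAGENDR` (1 = male, 2 = female); NHANES also ships a precomputed `BMXBMI`. The cut points (BMI ≥ 30; ATP III waist > 102 cm for men, > 88 cm for women) are assumptions, since the plan does not fix them, and the rows are toy data.

```python
import pandas as pd

bmx = pd.DataFrame({
    "SEQN": [1, 2, 3],
    "RIAGENDR": [1, 2, 2],            # 1 = male, 2 = female (assumed coding)
    "BMXWT": [95.0, 60.0, 80.0],      # weight, kg
    "BMXHT": [175.0, 160.0, 158.0],   # standing height, cm
    "BMXWAIST": [105.0, 80.0, 95.0],  # waist circumference, cm
})

# BMI = weight (kg) / height (m)^2
bmx["bmi"] = bmx["BMXWT"] / (bmx["BMXHT"] / 100.0) ** 2
bmx["obese"] = bmx["bmi"] >= 30

# Sex-specific waist cut points (assumed ATP III thresholds)
waist_cut = bmx["RIAGENDR"].map({1: 102.0, 2: 88.0})
bmx["abdominal_obesity"] = bmx["BMXWAIST"] > waist_cut
print(bmx[["SEQN", "bmi", "obese", "abdominal_obesity"]])
```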

## Covariates:
* Adjust for age, gender, acculturation, smoking status, physical activity, and comorbidities (e.g.,
diabetes).

## Data Analysis Plan
## Descriptive Statistics:
* Summarize participant characteristics (age, gender, race/ethnicity, PIR) by HEI-2020 quartiles.
* Report means (±SD) for continuous variables and percentages for categorical variables.
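A "Table 1" by HEI-2020 quartile can be sketched with `groupby` for means (±SD) and `pd.crosstab(..., normalize="index")` for row percentages. The column names and values are hypothetical. NHANES estimates are normally survey-weighted, and this sketch is unweighted for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "hei_quartile": ["Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"],
    "age": [34, 50, 41, 63, 45, 58, 52, 67],
    "sex": ["M", "F", "F", "M", "F", "F", "M", "F"],
})

# Continuous variable: mean and sample SD (ddof=1) within each quartile
age_stats = df.groupby("hei_quartile")["age"].agg(["mean", "std"])
age_fmt = age_stats.apply(lambda r: f"{r['mean']:.1f} (±{r['std']:.1f})", axis=1)

# Categorical variable: row percentages within each quartile
sex_pct = pd.crosstab(df["hei_quartile"], df["sex"], normalize="index").mul(100).round(1)

table1 = age_fmt.to_frame("Age, mean (±SD)").join(sex_pct.add_prefix("Sex % "))
print(table1)
```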

## Bivariate Analysis:
* Compare CVD risk factors across HEI-2020 quartiles using ANOVA (continuous variables) or chi-square tests (categorical variables).
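Both bivariate tests are available in SciPy: `scipy.stats.f_oneway` for a one-way ANOVA and `scipy.stats.chi2_contingency` for a quartile × outcome contingency table. The toy systolic blood pressure (`sbp`) and `hypertension` columns are hypothetical. These are unweighted tests; survey-design-adjusted versions would be needed for population inference.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "hei_quartile": np.repeat(["Q1", "Q2", "Q3", "Q4"], 5),
    "sbp": [138, 142, 135, 140, 145, 132, 136, 130, 134, 138,
            128, 131, 126, 133, 129, 122, 125, 127, 120, 124],
    "hypertension": [1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
                     0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
})

# Continuous outcome: one-way ANOVA of systolic BP across the four quartiles
groups = [g["sbp"].to_numpy() for _, g in df.groupby("hei_quartile")]
f_stat, p_anova = stats.f_oneway(*groups)

# Binary outcome: chi-square test on the quartile x hypertension table
table = pd.crosstab(df["hei_quartile"], df["hypertension"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
print(f"ANOVA F={f_stat:.2f}, p={p_anova:.4g}; chi-square={chi2:.2f}, dof={dof}, p={p_chi2:.3f}")
```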

## Multivariate Analysis:
* Primary Model: Linear regression to assess the association between HEI-2020 scores and
continuous CVD risk factors (e.g., blood pressure, cholesterol).
* Secondary Model: Logistic regression for binary outcomes (e.g., hypertension, dyslipidemia).
* Include interaction terms to test for effect modification by race/ethnicity and socioeconomic
status.
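Effect modification enters the linear model as a product term: the coefficient on `hei × group` estimates how much the HEI-2020 slope differs between groups. The sketch below fits such a model by ordinary least squares with NumPy on simulated data (all variables hypothetical, with a true interaction of −0.10 mmHg per HEI point). With statsmodels, the formula `ols("sbp ~ hei * C(race) + age", data=df)` expands to the same main effects plus interaction, and `logit` plays the same role for the binary outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
hei = rng.uniform(30, 90, n)          # HEI-2020 total score
minority = rng.integers(0, 2, n)      # hypothetical 0/1 group indicator
age = rng.uniform(18, 80, n)          # example covariate

# Simulated SBP: the HEI slope is 0.10 steeper in the minority group
sbp = 150 - 0.20 * hei - 0.10 * hei * minority + 5 * minority + 0.3 * age + rng.normal(0, 5, n)

# Design matrix: intercept, HEI, group, HEI x group interaction, covariate
X = np.column_stack([np.ones(n), hei, minority, hei * minority, age])
beta, *_ = np.linalg.lstsq(X, sbp, rcond=None)

for name, b in zip(["intercept", "hei", "minority", "hei:minority", "age"], beta):
    print(f"{name:>13}: {b: .3f}")
```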

## Sensitivity Analysis:
* Stratify analyses by racial/ethnic group and socioeconomic status.
* Exclude participants with self-reported heart disease to test robustness.
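The two sensitivity checks amount to re-estimating the HEI-2020 association within each stratum and again after dropping self-reported heart disease. For brevity, this sketch uses an unadjusted slope from `np.polyfit` in place of the full adjusted model. The `heart_disease` flag, the helper `hei_slope`, and the simulated data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "hei": rng.uniform(30, 90, n),
    "race_group": rng.choice(["NH Black", "Hispanic", "NH Asian", "NH White"], n),
    "low_income": rng.integers(0, 2, n).astype(bool),
    "heart_disease": rng.random(n) < 0.1,   # hypothetical self-report flag
})
df["sbp"] = 150 - 0.2 * df["hei"] + rng.normal(0, 5, n)


def hei_slope(d: pd.DataFrame) -> float:
    """Unadjusted OLS slope of SBP on HEI-2020 (mmHg per HEI point)."""
    slope, _intercept = np.polyfit(d["hei"], d["sbp"], deg=1)
    return slope


# 1) Stratify by race/ethnicity and income
by_stratum = pd.Series({key: hei_slope(g) for key, g in df.groupby(["race_group", "low_income"])})
# 2) Re-estimate after excluding self-reported heart disease
slope_all = hei_slope(df)
slope_no_hd = hei_slope(df[~df["heart_disease"]])
print(by_stratum.round(3))
print(f"all: {slope_all:.3f}; excluding heart disease: {slope_no_hd:.3f}")
```

Broadly similar slopes across strata and after the exclusion would suggest the association is robust; the real analysis would rerun the adjusted model rather than this simple slope.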

## Data Preprocessing

We used data from the National Health and Nutrition Examination Survey (NHANES) 2021–2023 cycle (data file suffix `_L`) [1].

_[1] National Center for Health Statistics. About the National Health and Nutrition Examination Survey. http://www.cdc.gov/nchs/nhanes/about_nhanes.htm. Accessed January 2025._

Data from the US Department of Agriculture Food Patterns Equivalents Database (FPED) were obtained to translate NHANES 24-hour dietary recall data into cup- and ounce-equivalents of the major food groups, the units in which most Healthy Eating Index 2020 (HEI-2020) food-group components are scored [2](https://github.com/jamesjiadazhan/dietaryindex).

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

```python
# List the working directory (IPython shell escape); output: data  posters.ipynb
!ls
```


```python
# Extract the NHANES .xpt files from the zip archive into data/archive
from pathlib import Path
import zipfile

archive_data = Path("data/archive")
zip_file_path = Path("data/nhanes_data.zip")

# Fail fast: every later cell reads from data/archive, so a missing
# archive should stop the notebook rather than print and continue
if not zip_file_path.exists():
    raise FileNotFoundError(f"Zip file not found: {zip_file_path}")

archive_data.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
    zip_ref.extractall(archive_data)
```

```python
# Demographic variables (DEMO_L)
demo_df = pd.read_sas("data/archive/DEMO_L.xpt", format="xport")
```

```python
# Plasma fasting glucose (GLU_L)
glu_df = pd.read_sas("data/archive/GLU_L.xpt", format="xport")
```

```python
# High-density lipoprotein (HDL) cholesterol (HDL_L); previously read GLU_L by mistake
hdl_df = pd.read_sas("data/archive/HDL_L.xpt", format="xport")
```

```python
# Insulin (INS_L)
ins_df = pd.read_sas("data/archive/INS_L.xpt", format="xport")
```

```python
# Dietary interview: individual foods, first day (DR1IFF_L)
dr1iff_df = pd.read_sas("data/archive/DR1IFF_L.xpt", format="xport")
```
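The loaded files must be joined into one row per participant before analysis. The following is a minimal sketch, assuming the respondent sequence number `SEQN` is the shared key and that the individual-foods file has several rows per person (so it is summed first). `LBXGLU` (fasting glucose), `LBDHDD` (direct HDL) and `DR1IKCAL` (energy, kcal) are assumed NHANES variable names, and the function `build_analytic` and toy frames are hypothetical; they deliberately use new names so running the cell does not overwrite the data frames loaded above.

```python
import pandas as pd


def build_analytic(demo, labs, foods):
    """Left-join person-level tables on SEQN after collapsing food-level recall rows."""
    # One row per participant: total day-1 energy from all reported foods
    energy = foods.groupby("SEQN", as_index=False)["DR1IKCAL"].sum()
    out = demo
    for lab in labs:
        # validate guards against accidental duplicate SEQN rows
        out = out.merge(lab, on="SEQN", how="left", validate="one_to_one")
    return out.merge(energy, on="SEQN", how="left", validate="one_to_one")


demo_toy = pd.DataFrame({"SEQN": [1.0, 2.0, 3.0], "RIDAGEYR": [45.0, 30.0, 62.0]})
glu_toy = pd.DataFrame({"SEQN": [1.0, 3.0], "LBXGLU": [98.0, 126.0]})
hdl_toy = pd.DataFrame({"SEQN": [1.0, 2.0, 3.0], "LBDHDD": [52.0, 41.0, 60.0]})
foods_toy = pd.DataFrame({
    "SEQN": [1.0, 1.0, 2.0, 3.0, 3.0, 3.0],
    "DR1IKCAL": [500.0, 700.0, 1800.0, 400.0, 600.0, 900.0],
})

analytic = build_analytic(demo_toy, [hdl_toy, glu_toy], foods_toy)
print(analytic)
```

A left join keeps every DEMO participant, so missing lab or dietary values surface as `NaN` and the exclusions listed under Population Selection can be applied explicitly afterwards.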


## Exploratory Data Analysis

## Feature Engineering
HEI scoring references:
* https://github.com/jamesjiadazhan/dietaryindex
* https://github.com/abhrastat/heiscore

## Modeling and Inference

## Communicate Results