This repository contains the code used to analyze comorbidity networks in companion dogs using data from the Dog Aging Project (DAP). The analysis creates both undirected and time-directed comorbidity networks, adjusting for demographic covariates using a novel Poisson binomial approach.
π Explore the Interactive Comorbidity Network
For those interested in investigating comorbidities in the undirected, unstratified network in greater detail, we have developed an interactive Shiny application that allows you to:
- Explore individual condition relationships
- Filter networks by disease categories or specific conditions
- Toggle easily between unstratified, age-stratified, and directed networks
- Search for specific diseases of interest
The analysis pipeline processes owner-reported health data from the Dog Aging Project to identify statistically significant associations between health conditions. The methods use a Poisson binomial approach to account for individual-level covariates (age, sex, sterilization status, breed background, and weight) without requiring stratification, enabling more robust comorbidity detection than traditional methods.
The analysis requires the following input data from the Dog Aging Project:
DAP_2021_HLES_health_conditions_v1.0.RData: Health condition dataDAP_2021_HLES_cancer_conditions_v1.0.RData: Cancer condition dataDAP_2021_HLES_dog_owner_v1.0.RData: Dog and owner demographic datadiseaseCodes.csv: Mapping file connecting disease codes to names and categories
Note: Access to the DAP data requires approval through dogagingproject.org/data-access
The code automatically creates the following organized directory structure:
βββ data/
β βββ clean/ # Cleaned datasets (stratified and unstratified)
β βββ frequencies/ # Disease frequency tables
βββ results/
β βββ models/ # Logistic regression coefficients and probabilities
β βββ networks/ # Network edge lists and attributes
β βββ statistics/ # Summary statistics and analysis results
βββ outputs/
βββ figures/ # Publication-ready figures and plots
βββ tables/ # Formatted tables for manuscripts
The code consists of six main R Markdown scripts that should be executed in the following order:
Purpose: Data preprocessing and quality control
- Processes raw DAP data files
- Filters health conditions by prevalence (β₯60 dogs)
- Consolidates duplicate conditions (e.g., IVDD, laryngeal paralysis)
- Stratifies data by dog life stage (puppy, young adult, mature adult, senior)
- Creates disease frequency tables and crosswalk mappings
- Generates supplementary tables (S1a, S1b)
Key Outputs:
cleaned_unstrat.csv: Main analysis datasetcleaned_{lifestage}.csv: Age-stratified datasetsdisease_frequencies_{dataset}.csv: Disease prevalence tables
Purpose: Core comorbidity network analysis using Poisson binomial approach
- Fits logistic regression models for each health condition
- Calculates individualized disease probabilities accounting for demographics
- Identifies significant comorbidity pairs using statistical testing
- Applies multiple testing correction (Bonferroni)
- Creates network edge lists for visualization
Key Features:
- Must be run separately for each dataset (
analysis_typeparameter) - Supports both unstratified and age-stratified analyses
- Uses parallel processing for computational efficiency
Key Outputs:
significant_pairs_{dataset}.csv: Network edge listsprobability_{dataset}.csv: Individual disease probabilitiescoef_and_pvalues_{dataset}.csv: Regression model results
Purpose: Temporal relationship analysis
- Analyzes time-directed comorbidity relationships
- Implements temporal extension of Poisson binomial approach
- Uses sliding window analysis (configurable: 6, 12, 24 months)
- Processes medical history completeness and record length
- Determines which conditions tend to precede others
Key Outputs:
significant_pairs_directed_{window}m.csv: Directed network edgesdiagnosis_date_data.csv: Medical history analysis- Temporal relationship statistics
Purpose: Advanced network analysis and comparative studies
- Compares breed-specific disease prevalence (purebred vs. mixed)
- Analyzes network overlap between age groups
- Fits and compares degree distributions (power law vs. exponential)
- Calculates comprehensive network centrality measures
- Identifies age-specific and shared comorbidity patterns
Key Outputs:
network_overlap_matrix.csv: Cross-age network comparisonsbreed_disease_associations_full.csv: Breed risk analysisnetwork_topology_metrics.csv: Network structural properties- Degree distribution analysis and statistical tests
Purpose: Descriptive statistics and demographic analysis
- Generates comprehensive demographic summary tables (Table 1)
- Creates age and weight distribution histograms
- Analyzes disease category distributions
- Produces top 20 most common diseases visualization
- Examines distribution of disease counts per dog
- Creates Figure 1 (study overview) including medical history boxplot
Dependencies: Requires diagnosis_date_data.csv from directed_network_analysis.Rmd
Key Outputs:
Figure_1_study_overview.tiff: Main demographic figure- Demographic summary tables and histograms
Purpose: Publication-ready network visualizations
- Creates main undirected network with strategic node collapsing
- Generates age-stratified network visualizations
- Produces directed network plots with temporal arrows
- Creates network overlap heatmaps
- Combines multiple plots into publication figures
Dependencies: Requires outputs from all previous scripts
Key Features:
- Consistent color schemes and styling across all plots
- Automated node collapsing for visual clarity
- Highlighted edges for manuscript discussion
- Multiple output formats (PNG, TIFF)
Key Outputs:
final_network_layout.tiff: Main undirected network figurecombined_2x2_with_heatmap_cowplot_legend.tiff: Age comparison figurecombined_directed.tiff: Temporal relationship figure
In addition to the static publication figures, this repository includes an interactive Shiny application that allows users to explore the comorbidity networks in detail.
The interactive network explorer provides:
- Zooming and Panning: Navigate through the full network to examine specific regions and connections in detail
- Node Selection: Pan over individual diseases to highlight their connections and view detailed information
- Interactive Filtering: Filter networks by disease category, frequency thresholds, or statistical significance levels
- Search Functionality: Quickly locate specific diseases or conditions of interest
The Shiny application can be launched directly from this repository without running the full analysis pipeline:
# Run the interactive network explorer
shiny::runApp("./shiny_app/")Note: The required data files (significant_pairs_unstrat.csv and disease_frequencies_unstrat.csv) are included in this repository, so the interactive app can be run immediately without completing the full analysis pipeline.
To immediately explore the networks without running any analysis:
- Clone this repository
- Open R/RStudio in the project directory
- Install required packages:
shiny,DT,visNetwork,dplyr - Run:
shiny::runApp("./shiny_app/")
The app will launch in your default web browser, providing instant access to the interactive comorbidity network explorer.
# Core data manipulation and analysis
library(tidyverse) # Data manipulation and visualization
library(lubridate) # Date handling
library(haven) # Reading SPSS/SAS data
library(fastDummies) # Creating dummy variables
library(miceadds) # Loading RData files
# Statistical modeling and parallel processing
library(caret) # Statistical modeling
library(furrr) # Parallel processing
library(poweRlaw) # Distribution fitting
library(fitdistrplus) # Distribution analysis
# Network analysis and visualization
library(igraph) # Network analysis
library(ggraph) # Grammar of graphics for networks
library(tidygraph) # Tidy graph data structures
# Table creation and file handling
library(gt) # Table formatting
library(gtsummary) # Summary tables
library(readxl) # Excel file reading
library(writexl) # Excel file writing
# Plotting and visualization
library(RColorBrewer) # Color palettes
library(cowplot) # Plot composition
library(patchwork) # Additional plot layout
library(ggforce) # Extended ggplot functionalityIf you use this code in your research, please cite the associated publication and acknowledge the Dog Aging Project for data access.
For questions about the analysis methods or code implementation, please refer to the detailed comments within each script or contact the corresponding author of the associated publication.