This code is still in development and is therefore subject to major change over the coming weeks.
This repository provides a step-by-step guide to standardised modelling of estimated glomerular filtration rate (eGFR) slope in trials included in the SGLT2 inhibitor Meta-Analysis Cardio-Renal Trialists Consortium (SMART-C). An accompanying reproducible example (reprex) using synthetically-generated data is also provided in the code
folder.
A quick note on the reprex: the main objective is to demonstrate how to set-up the data, specify the model, and extract the relevant results for acute, chronic, and total slope. The synthetic data were crudely generated by creating variables that have the same marginal distribution as those in the CREDENCE trial, using the simstudy
R library. Creating a synthetic dataset which precisely emulates all variable relationships within longitudinal data is complex, and the main objective of this reprex is to understand the underlying methodology, so the results of the model here won't look congruent with what will be seen with clinical trial data.
If you'd like more information on the statistical methodology for modelling eGFR slope, please see the original publication here: Vonesh et al., 2019.
To get this code to work, please install all dependencies. To install these dependencies (if you haven't already), run the following script:
libs <- c("glue", "lme4", "multcomp", "simstudy", "tidyverse")
install.packages(setdiff("tidyverse", rownames(installed.packages())))
If you have any issues, check your package versions. The following package versions were used for the generation of this code:
dplyr - v1.1.2
glue - v1.6.2
lme4 - v1.1-34
multcomp - v1.4-25
simstudy - v0.7.0
stringr - v1.5.0
tibble - v3.2.1
tidyr - v1.3.0
Below is an overview of the folders in this repository that are actively synchronised with GitHub.
code
contains the reprex code: modelling_egfr_slope_reprex.R
- this reprex will show you how to calculate eGFR slope for the trial overall (active, control, and comparison) and for two groups in a binary subgroup variable (in this example, baseline use of glucagon-like peptide-1 receptor agonist (GLP-1RA) yes/no).
data
contains synthetic baseline synthetic_trial_baseline.csv
and follow-up synthetic_trial_follow_up_egfr.csv
data used in the reprex.
figs
contains the logo for the SGLT2 inhibitor Meta-Analysis Cardio-Renal Trialists Consortium (SMART-C) and the CREDENCE eGFR figure used for this README.
src
contains any files that are called via source()
in the scripts in code
. In this instance, the functions used to generate the synthetic data and to extract results from the mixed model.
The aim of this repository is to provide a systematic guide to modelling eGFR slope using R. This guide, enclosed within the repository README
, will provide a visual guide to:
- Setting-up the data into the required format
- Specifying the model
- Extracting the relevant model results (including acute slope, chronic slope, and total slope)
This guide will demonstrate how to model eGFR slope for the total trial population. If you'd like to know how to apply this code in subgroups, please see the reprex in the code
folder.
The simulated data is for a hypothetical trial of 5000 participants randomised 1:1 to either an SGLT2 inhibitor or placebo (2500 per arm).
Since your trial data will likely be standardised to the Analysis Data Model (ADaM), the simulated data has been set-up to imitate a truncated version of the "subject-level analysis dataset" ADSL
, here termed baseline
or bl
and repeat eGFR measurements from the "laboratory - within 2 days of the last meds" dataset ADLBM
, here termed follow-up
or fu
. If you'd like to know what each of the columns in the datasets are, please see the README
for the data folder of this repository.
Here is a visual snapshot of the baseline data:
# Source function to generate synthetic data
source(glue::glue("{path}/src/generate_synthetic_data.R"))
# Generate synthetic baseline data (derived from ADSL)
bl <- generate_synthetic_data(.table = "baseline")
# Print baseline data
print(bl, n = 10)
#> # A tibble: 5,000 × 8
#> usubjid trt01pn randfl ittfl trtfl blgfr blglp1 strata
#> <chr> <int> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 id0001 0 Y Y Y 54 N Screening eGFR 45 to <60 mL/min/1.73m2
#> 2 id0002 1 Y Y Y 27 N Screening eGFR 45 to <60 mL/min/1.73m2
#> 3 id0003 1 Y Y Y 50 N Screening eGFR 45 to <60 mL/min/1.73m2
#> 4 id0004 0 Y Y Y 47 N Screening eGFR 45 to <60 mL/min/1.73m2
#> 5 id0005 0 Y Y Y 75 N Screening eGFR 60 to <90 mL/min/1.73m2
#> 6 id0006 0 Y Y Y 30 N Screening eGFR 30 to <45 mL/min/1.73m2
#> 7 id0007 0 Y Y Y 56 N Screening eGFR 45 to <60 mL/min/1.73m2
#> 8 id0008 1 Y Y Y 59 N Screening eGFR 60 to <90 mL/min/1.73m2
#> 9 id0009 1 Y Y Y 65 N Screening eGFR 60 to <90 mL/min/1.73m2
#> 10 id0010 1 Y Y Y 70 N Screening eGFR 60 to <90 mL/min/1.73m2
#> # … with 4,990 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Here is a visual snapshot of the follow-up data:
# Generate synthetic follow-up data (derived from ADLBM)
fu <- generate_synthetic_data(.table = "follow-up")
# Print follow-up data
print(fu, n = 10)
#> # A tibble: 41,800 × 12
#> usubjid randfl ittfl trtfl anl01fl aval base paramcd param ady avisitn avisit
#> <chr> <chr> <chr> <chr> <chr> <dbl> <int> <chr> <chr> <dbl> <dbl> <chr>
#> 1 id0001 Y Y Y Y 54 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 1 0 BASELINE
#> 2 id0001 Y Y Y Y 51 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 22 3 WEEK 3
#> 3 id0001 Y Y Y Y 61 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 91 13 WEEK 13
#> 4 id0001 Y Y Y Y 63 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 183 26 WEEK 26
#> 5 id0001 Y Y Y Y 53 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 362 52 WEEK 52
#> 6 id0001 Y Y Y Y 47 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 544 78 WEEK 78
#> 7 id0001 Y Y Y Y 37 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 726 104 WEEK 104
#> 8 id0001 Y Y Y Y 58 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 911 130 WEEK 130
#> 9 id0001 Y Y Y Y 59 54 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 1092 156 WEEK 156
#> 10 id0002 Y Y Y Y 27 27 GFRBSCRT GFR from Creatinine Adjusted for BSA (mL/min/1.73m2) 1 0 BASELINE
#> # … with 41,790 more rows
#> # ℹ Use `print(n = ...)` to see more rows
The first objective is to define the time point at which to place the knot in the spline fitted to the model. The reason for doing this is because of the acute drop in eGFR that is observed on initiation of an SGLT2 inhibitor. This should correspond to the first follow-up visit in your trial. For instance, in the DAPA-CKD trial (Heerspink et al., 2020), this is at 2 weeks (~14 days) post-randomisation, and in the CREDENCE trial (Perkovic et al., 2019), this is at 3 weeks (~21 days) (see figure below).
The change from the screening level in the estimated GFR in the on-treatment population (Perkovic et al., 2019).
# Set knot point for spline. This is 3 weeks/21 days in the CREDENCE trial
k <- 21
Next, the data need to be formatted correctly. You will need to specify in the "on-treatment" population in both datasets, and filter for the analysis values in the follow-up eGFR data. This is what was done in the SGLT2 inhibitor trials but you may not need to do the same in your study. You'll also notice that the code generates a "spline term" column. This needs to be set at 0
for times before the knot, and at time - k / 365.25
for all times after the knot.
The data object gfr_c
is how you want the data to appear before you commence modelling.
# Treatment arms (filtering for on-treatment individuals only)
arm <- bl |>
dplyr::filter(trtfl == "Y") |>
dplyr::select(usubjid, trt01pn, blglp1, strata) |>
dplyr::mutate(blglp1 = dplyr::if_else(blglp1 == "Y", 1, 0))
# Repeat eGFR (filtering for on-treatment values and those flagged for analysis)
gfr <- fu |>
dplyr::filter(dplyr::if_all(c(trtfl, anl01fl), \(x) x == "Y")) |>
dplyr::distinct(usubjid, aval, ady, base, avisitn)
# Join
gfr_c <- arm |>
dplyr::left_join(gfr, by = dplyr::join_by(usubjid), multiple = "all") |>
dplyr::mutate(
# Convert days from baseline to years
time = ady / 365.25,
# Generate spline term
spline = dplyr::if_else(time >= k / 365.25, time - k / 365.25, 0)
)
# Print combined data
print(gfr_c, n = 10)
#> # A tibble: 40,957 × 10
#> usubjid trt01pn blglp1 strata aval ady base avisitn time spline
#> <chr> <int> <dbl> <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 54 1 54 0 0.00274 0
#> 2 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 51 22 54 3 0.0602 0.00274
#> 3 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 61 91 54 13 0.249 0.192
#> 4 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 63 183 54 26 0.501 0.444
#> 5 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 53 362 54 52 0.991 0.934
#> 6 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 47 544 54 78 1.49 1.43
#> 7 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 37 726 54 104 1.99 1.93
#> 8 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 58 911 54 130 2.49 2.44
#> 9 id0001 0 0 Screening eGFR 45 to <60 mL/min/1.73m2 59 1092 54 156 2.99 2.93
#> 10 id0002 1 0 Screening eGFR 45 to <60 mL/min/1.73m2 27 1 27 0 0.00274 0
#> # … with 40,947 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Here is what each of the columns correspond to:
Variable | Type | Description |
---|---|---|
usubjid |
Character | Unique subject identifier |
trt01pn |
Integer | Numeric indicator for treatment assignment; 0 == "Placebo", 1 == "SGLT2i" |
blglp1 |
Double | Numeric indicator for baseline GLP1-RA use; 0 == "No", 1 == "Yes" |
strata |
Character | Study strata (not required) |
aval |
Double | Estimated glomerular filtration rate (eGFR) value at study visit |
ady |
Double | Days from baseline |
base |
Integer | Baseline eGFR value |
avisitn |
Double | Study visit (weeks from baseline; numeric) |
time |
Double | Years from baseline (ady / 365.25) |
spline |
Double | Spline term |
The model used to calculate eGFR slope is a mixed effects model with an unstructured residual variance-covariance matrix. Typically we would adjust for the stratification term in the trial (categories of screening eGFR in the CREDENCE trial or diabetes status/UACR categories in DAPA-CKD trial), but whether you choose to do this in your study will depend on your data and analysis plan. The function required to run the model is lme4::lmer()
, and the unstructured residual variance-covariance matrix is specified with (time | usubjid)
. Adding -1
ensures that no intercept is fitted.
# Fit mixed effects model with unstructured residual variance-covariance matrix
fit <- lme4::lmer(
aval ~ base + strata + time * trt01pn + spline * trt01pn - 1 + (time | usubjid),
data = gfr_c
)
Once you have fit your model, you then need to use linear combinations to calculate each desired result. We have written a simple function compute_slope()
which at its core is a wrapper for multcomp::glht()
but provides output as a tibble
for easier manipulation and export. All you need to do is to provide the names of key variables in your model and whether you'd like to compute acute, chronic, total, or all slope types. Below is an example of computing all slope types. The function will print the coefficients that are indexed for each calculation, as well as the contrast vector that is being used compute the slope estimate.
# Define proportion of total slope accounted-for by chronic slope (1095.75 is
# the equivalent of 3 years in days, so we if we subtract the number of days in
# the acute slope, approximately 21 days or 3 weeks, we get this proportion
prop <- (1095.75 - 21) / 1095.75
# eGFR slope for the whole cohort
all <- compute_slope(
.model_obj = fit, .time_var = "time", .intervention_var = "trt01pn", .spline_var = "spline",
.prop = prop, .output = "all"
)
#> Slopes computed: Total
#> Output collated in dataframe extension of type `tibble`.
#> --------------------------------------------------------
#> Acute slope
#> Control
#> Linear combination: `time`
#> Contrast vector: c(0, 0, 0, 0, 1, 0, 0, 0, 0)
#> Active
#> Linear combination: `time` + `time:trt01pn`
#> Contrast vector: c(0, 0, 0, 0, 1, 0, 0, 1, 0)
#> Active - Control
#> Linear combination: `time:trt01pn`
#> Contrast vector: c(0, 0, 0, 0, 0, 0, 0, 1, 0)
# Print results
print(all)
#> # A tibble: 9 × 9
#> slope group estimate lci uci result se t_value p_value
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 Acute Control -50.5 -57.9 -43.2 -50.55 (-57.88 to -43.21) 3.74 -13.5 0
#> 2 Acute Active -66.0 -73.3 -58.6 -65.95 (-73.29 to -58.62) 3.74 -17.6 0
#> 3 Acute Active - Control -15.4 -25.8 -5.03 -15.41 (-25.78 to -5.03) 5.29 -2.91 0.0036
#> 4 Chronic Control -2.46 -2.60 -2.32 -2.46 (-2.60 to -2.32) 0.0714 -34.4 0
#> 5 Chronic Active -1.54 -1.68 -1.40 -1.54 (-1.68 to -1.40) 0.0711 -21.6 0
#> 6 Chronic Active - Control 0.922 0.724 1.12 0.92 (0.72 to 1.12) 0.101 9.15 0
#> 7 Total Control -3.38 -3.54 -3.23 -3.38 (-3.54 to -3.23) 0.0794 -42.6 0
#> 8 Total Active -2.77 -2.93 -2.62 -2.77 (-2.93 to -2.62) 0.0792 -35.0 0
#> 9 Total Active - Control 0.608 0.389 0.828 0.61 (0.39 to 0.83) 0.112 5.42 0
The results above are implausible because of the inaccurate nature of the synthetic data, so do not expect your results to look the same. If you would like to know how to run the same analysis for subgroups, an appropriate model object and a .by
argument can be specified:
# Recode baseline GLP-1RA use as factor and set `blglp1` == "Yes" as reference
gfr_c <- gfr_c |>
mutate(blglp1 = factor(blglp1, levels = c(1, 0), labels = c("Yes", "No")))
# Fit mixed effects model with unstructured residual variance-covariance matrix
# this time with adjustment and interactions with `blglp1`
fit_subgrp <- lme4::lmer(
aval ~ base + time * trt01pn + spline * trt01pn + time * blglp1 +
spline * blglp1 + trt01pn * blglp1 + time * trt01pn * blglp1 +
spline * trt01pn * blglp1 - 1 + (time | usubjid),
data = gfr_c
)
# By GLP1 use (acute only)
sg <- compute_slope(
.model_obj = fit_subgrp, .time_var = "time", .intervention_var = "trt01pn",
.spline_var = "spline", .prop = prop, .by = "blglp1", .output = "acute"
)
#> Slopes computed: Acute
#> Output collated in dataframe extension of type `tibble`.
#> --------------------------------------------------------
#> Acute slope
#> Control, blglp1 = Yes
#> Linear combination: `time`
#> Contrast vector: c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
#> Active, blglp1 = Yes
#> Linear combination: `time` + `time:trt01pn`
#> Contrast vector: c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
#> Active - Control, blglp1 = Yes
#> Linear combination: `time:trt01pn`
#> Contrast vector: c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
#> Control, blglp1 = No
#> Linear combination: `time` + `time:blglp1No`
#> Contrast vector: c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
#> Active, blglp1 = No
#> Linear combination: `time` + `time:blglp1No` + `trt01pn:blglp1No` + `time:trt01pn:blglp1No`
#> Contrast vector: c(0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0)
#> Active - Control, blglp1 = No
#> Linear combination: `time:trt01pn` + `time:trt01pn:blglp1No`
#> Contrast vector: c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0)
# Print results
print(sg)
#> # A tibble: 6 × 11
#> slope group subgroup subgroup_cat estimate lci uci result se t_value p_value
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 Acute Control blglp1 Yes -47.4 -86.1 -8.69 -47.41 (-86.12 to -8.69) 19.8 -2.40 0.0164
#> 2 Acute Active blglp1 Yes -28.2 -62.9 6.56 -28.19 (-62.94 to 6.56) 17.7 -1.59 0.112
#> 3 Acute Active - Control blglp1 Yes 19.2 -32.8 71.2 19.22 (-32.80 to 71.24) 26.5 0.724 0.469
#> 4 Acute Control blglp1 No -50.7 -58.1 -43.2 -50.66 (-58.14 to -43.19) 3.81 -13.3 0
#> 5 Acute Active blglp1 No -85.0 -135. -34.7 -85.03 (-135.35 to -34.71) 25.7 -3.31 0.0009
#> 6 Acute Active - Control blglp1 No -17.1 -27.6 -6.47 -17.06 (-27.65 to -6.47) 5.40 -3.16 0.0016
Robert Fletcher, MSc (Oxon) - rfletcher@georgeinstitute.org.au | The George Institute for Global Health, Sydney, Australia | British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge UK
Niels Jongs, PhD - n.jongs@umcg.nl | University Medical Center Groningen, Groningen, Netherlands