Install packages.

In [1]:
#######################
## Install packages. ##
#######################
library( bigrquery )
library( ggplot2 )
library( tidyverse )
library( readr )

── [1mAttaching core tidyverse packages[22m ──────────────────────── tidyverse 2.0.0 ──
[32m✔[39m [34mdplyr    [39m 1.1.2     [32m✔[39m [34mreadr    [39m 2.1.4
[32m✔[39m [34mforcats  [39m 1.0.0     [32m✔[39m [34mstringr  [39m 1.5.0
[32m✔[39m [34mlubridate[39m 1.9.2     [32m✔[39m [34mtibble   [39m 3.2.1
[32m✔[39m [34mpurrr    [39m 1.0.2     [32m✔[39m [34mtidyr    [39m 1.3.0
── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
[36mℹ[39m Use the conflicted package ([3m[34m<http://conflicted.r-lib.org/>[39m[23m) to force all conflicts to become errors


# 2024 01 11

Set and load requisites.

In [75]:
##############################
## Set and load requisites. ##
##############################

# Setup connection to GCP.
project_id = "yhcr-prd-phm-bia-core"
con <- DBI::dbConnect( drv = bigquery(), project = project_id )

# Define R tibbles from GCP tables.
r_tbl_srcode <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srcode" )
r_tbl_BNF_DMD_SNOMED_lkp <- dplyr::tbl( con, "CB_LOOKUPS.tbl_BNF_DMD_SNOMED_lkp" )
r_tbl_srprimarycaremedication <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srprimarycaremedication" )
r_tbl_srappointment <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srappointment" )

# Clinical code lists (BNF, SNOMED-CT, etc).
codes_SNOMED_diagnoses_of_interest <-
    readr::read_csv(file = 'nhsd-primary-care-domain-refsets-dmtype2_cod-20200812.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_SNOMED_test_of_interest <-
    readr::read_csv(file = 'opensafely-glycated-haemoglobin-hba1c-tests-3e5b1269.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_BNF_meds_of_interest <-
    readr::read_csv(file = 'ciaranmci-bnf-section-61-drugs-for-diabetes-207573b7.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
names_meds_of_interest <-
    r_tbl_BNF_DMD_SNOMED_lkp %>%
    dplyr::filter( BNF_Code %in% codes_BNF_meds_of_interest ) %>%
    dplyr::select( DMplusD_ProductDescription )
codes_BNF_metformin <-
    readr::read_csv(file = 'ciaranmci-metformin-bnf-0601022b0-and-child-bnf-codes-only-43e7d87e.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
names_metformin_meds <-
    r_tbl_BNF_DMD_SNOMED_lkp %>%
    dplyr::filter( BNF_Code %in% codes_BNF_metformin ) %>%
    dplyr::select( DMplusD_ProductDescription )

# Study start date
date_study_start <- lubridate::ymd('2018-06-01')
date_study_end <- lubridate::ymd('2020-12-31')

# Duration of months after the first prescription that we ignore before studying patterns.
pattern_delay_months <- 0

# Threshold value for the test, which in this case is HbA1c.
val_test_threshold <- 48

# Set values for meaningful changes in the values of the test.
val_meaningful_test_improvement <- -10
val_meaningful_test_disimprovement <- 10

# Set window within which to search for repeated (but bot repeat) prescriptions.
window_repeated_prescription_months <- 3

# Set number of tests, treatments, or iterations after diagnosis that should be tracked.
n_iterations <- 6

### Define cohort
The cohort is defined as records that include a diagnosis for Type 2 Diabetes Mellitus (T2DM) and a prescription for metformin as the only diabetic medication for the first prescription. The following queries create this list of records that are within the Connected Bradford primary care table.

Select records from primary care that have a clinical code for a T2DM diagnosis. Include a column showing the earliest date of diagnosis. Filter for within the study period.

In [3]:
###########################################################################
## Select records from primary care that have a clinical code for a T2DM ##
## diagnosis. Include a column showing the earliest date of diagnosis.   ##
## Filter for within the study period.                                   ##
###########################################################################
ppl_with_T2DM_diagnoses <-
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_diagnoses_of_interest ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_diagnosis = min( dateevent, na.rm = TRUE ) ) %>%
    dplyr::select( person_id, date_diagnosis ) %>%
    dplyr::filter( dplyr::between(date_diagnosis, date_study_start, date_study_end) )

### Prescriptions table
Select records from the primary-care prescriptions table that contain a prescription for the diabetes medications of interest. Group by `person_id` to select the first prescription, then use an inner join to filter for patients whose first prescription was for a metformin medication. This logic assumes that prescriptions for diabetic medications will only exist after the diagnosis date. Once completed, find these people's next prescription for a diabetes drug that is `pattern_delay_months` after the first prescription.

This provides a table with columns for `person_id`, `date_first_prescription`, `name_first_prescription`, `repeat_first_prescription`, `date_next_prescription`, and `name_next_prescription`.

In [4]:
ppl_first_and_next_prescription <-
########################################################################################################
## Select list of person IDs that have a metformin prescription as their first diabetic prescription. ## 
########################################################################################################
    # Select from the medication table the records for those patients who have a diagnosis.
    ppl_with_T2DM_diagnoses %>%
    dplyr::left_join(r_tbl_srprimarycaremedication, by = join_by( person_id ) ) %>%
    # Select every record that has a prescription for any diabetes medication.
    dplyr::inner_join( names_meds_of_interest, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    # Extract the first prescription.
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_first_prescription = min( dateevent, na.rm = TRUE ) ) %>%
    # Join to the original table to get the name of the medication.
    dplyr::left_join( r_tbl_srprimarycaremedication, by = join_by( person_id, date_first_prescription == dateevent ) ) %>%
    dplyr::select( person_id, date_first_prescription, nameofmedication, isrepeatmedication, medicationquantity, medicationdosage ) %>%
    dplyr::distinct() %>%
    # Use an inner join to filter these first prescriptions for the metformin medications, only.
    dplyr::inner_join( names_metformin_meds, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    dplyr::distinct() %>%
    # Rename the name of the medication (a very hacky way of doing it).
    dplyr::mutate( name_first_prescription = nameofmedication
                  ,repeat_first_prescription = ifelse(isrepeatmedication == 'true', TRUE, FALSE)
                  ,quantity_first_prescription = medicationquantity
                  ,dose_first_prescription = medicationdosage
   ) %>%
    dplyr::select( -c( nameofmedication, isrepeatmedication, medicationquantity, medicationdosage ) ) %>% 

#######################################################################
## Find these people's next prescription for the same diabetes drug. ##
## the first prescription.                                           ##
#######################################################################
    # Select all prescriptions for these person IDs.
    dplyr::left_join( r_tbl_srprimarycaremedication, by = join_by( person_id ) ) %>%
    # Select only prescriptions for the same diabetes medication as the first prescription.
    dplyr::select( person_id, date_first_prescription, name_first_prescription, repeat_first_prescription
                  ,quantity_first_prescription, dose_first_prescription, dateevent, nameofmedication
                  ,isrepeatmedication, medicationquantity, medicationdosage ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::filter( name_first_prescription == nameofmedication ) %>%
    dplyr::distinct() %>%
    dplyr::ungroup() %>%
    # Rename columns (the hacky way).
    dplyr::mutate(
        date_next_prescription = dateevent#ifelse( date_first_prescription == dateevent, NA_character_, dateevent )
        ,repeat_next_prescription = ifelse( isrepeatmedication == 'true', TRUE, FALSE )#ifelse( date_first_prescription == dateevent, NA_character_, ifelse( isrepeatmedication == 'true', TRUE, FALSE ) )
        ,quantity_next_prescription = medicationquantity#ifelse( date_first_prescription == dateevent, NA_character_, medicationquantity )
        ,dose_next_prescription = medicationdosage#ifelse( date_first_prescription == dateevent, NA_character_, medicationdosage )
    ) %>% 
    dplyr::select( -c( dateevent, nameofmedication, medicationquantity, medicationdosage, isrepeatmedication ) ) %>% 
    # Retrieve either a) the first of the next prescriptions, if there is only one, or b) the second
    # of the next prescripions, if there is more than one and the first is NA.
    dplyr::group_by( person_id ) %>%
    dbplyr::window_order( date_next_prescription ) %>%
    dplyr::mutate(
        keep_this = dplyr::case_when(
            ( date_next_prescription == date_first_prescription ) & is.na( dplyr::lead( date_next_prescription ) ) ~ TRUE
            ,( date_next_prescription == date_first_prescription ) & !is.na( dplyr::lead( date_next_prescription ) ) ~ FALSE
            ,( date_next_prescription != date_first_prescription ) & ( dplyr::lag( date_next_prescription ) == date_first_prescription ) ~ TRUE
            ,( date_next_prescription != date_first_prescription ) & ( dplyr::lag( date_next_prescription ) != date_first_prescription ) ~ FALSE
            ,TRUE ~ NA
        )
    ) %>%
    dplyr::filter( keep_this == TRUE ) %>% 
    dplyr::select( - keep_this ) %>%
    # Set NA if there was no subsequent prescription for the same medication.
    dplyr::mutate(
        date_next_prescription = ifelse( date_next_prescription == date_first_prescription, NA_character_, date_next_prescription )
        ,repeat_next_prescription = ifelse( date_next_prescription == date_first_prescription, NA_character_, repeat_next_prescription )
        ,quantity_next_prescription = ifelse( date_next_prescription == date_first_prescription, NA_character_, quantity_next_prescription )
        ,dose_next_prescription = ifelse( date_next_prescription == date_first_prescription, NA_character_, dose_next_prescription )
     ) %>%

#######################################################################
## What was the treatment (i.e. prescription): {repeat prescription, ## # Needs checking.
## repeated prescription, one-off prescription}?                     ##
#######################################################################
    dbplyr::window_order( person_id ) %>%
    dplyr::mutate(
        prescription_repeat = dplyr::case_when(
            ( repeat_first_prescription == TRUE ) & is.na( date_next_prescription ) ~ 'Repeat prescription'
            ,( repeat_first_prescription == TRUE ) & is.na( date_next_prescription ) ~ 'Error: Repeat prescription but no same subsequent'
            ,( repeat_first_prescription == FALSE ) & !is.na( date_next_prescription ) ~ 'Prescription repeated'
            ,( repeat_first_prescription == FALSE ) & is.na( date_next_prescription )  ~ 'One-off prescription'
            ,TRUE ~ 'Unknown'
        )
    )

In [5]:
t <-
ppl_first_and_next_prescription %>%
dplyr::arrange( person_id ) %>%
dplyr::collect()

In [6]:
t %>%
head( )

person_id,date_first_prescription,name_first_prescription,repeat_first_prescription,quantity_first_prescription,dose_first_prescription,date_next_prescription,repeat_next_prescription,quantity_next_prescription,dose_next_prescription,prescription_repeat
<int>,<dttm>,<chr>,<lgl>,<chr>,<chr>,<dttm>,<lgl>,<chr>,<chr>,<chr>
230,2020-07-13 09:48:21,Metformin 1g modified-release tablets,True,56 tablet,take one daily,,True,56 tablet,take one daily,Repeat prescription
6066,2020-01-31 09:32:54,Metformin 500mg tablets,True,112 tablet,take 2 tablets after breakfast and evening meal,2021-07-16 09:12:14,True,112 tablet,take 2 tablets after breakfast and evening meal,Unknown
15106,2018-08-31 12:10:46,Metformin 500mg tablets,True,84 tablet,One tablet daily for 2 weeks then increase to on tablet twice daily,2018-09-03 09:47:05,True,84 tablet,One tablet daily for 2 weeks then increase to on tablet twice daily,Unknown
17525,2020-01-08 09:32:19,Metformin 500mg tablets,False,28 tablet,Take 1 tablet after breakfast and evening meal,2020-02-05 11:47:10,True,28 tablet,Take 1 tablet after breakfast and evening meal,Prescription repeated
19553,2021-04-22 14:22:42,Metformin 500mg tablets,False,7 tablet,Week 2: Take ONE daily after breakfast,2021-05-13 11:31:40,True,28 tablet,Week 7 onwards: Take TWO tablet twice a day,Prescription repeated
20102,2019-10-03 14:25:01,Metformin 500mg tablets,False,56 tablet,1 with main meal,2019-11-25 15:31:03,True,56 tablet,1 with main meal,Prescription repeated


### Tests table.
Find these people's next prescription for a diabetes drug that is `pattern_delay_months` after the first prescription.

This provides a table with columns for `person_id`, `date_next_test`, and `val_next_test`.

In [76]:
########################################################################################################
## Find these people's next prescription for a diabetes drug that is `pattern_delay_months` after     ##
## the first prescription.                                                                            ##
########################################################################################################
ppl_next_test <- 
    # Filter records for only those that refer to the test of interest.
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_test_of_interest ) %>%
    # Filter for only those patient that we already identified in our prescription table.
    dplyr::right_join( ppl_first_and_next_prescription %>% dplyr::select( person_id, date_first_prescription )
                      ,by = join_by( person_id )
    ) %>%
    # Extract the required fields.
    dplyr::select( person_id, dateevent, numericvalue, date_first_prescription ) %>%
    # Filter for tests that were done after the pattern start date.
    dplyr::group_by( person_id ) %>%
    dplyr::filter(
        dateevent > sql(paste0( 'DATETIME_ADD( date_first_prescription, INTERVAL ', pattern_delay_months, ' MONTH )') )
    ) %>%
    # Remove `date_first_prescription` because it is no longer needed.
    dplyr::select( - date_first_prescription ) %>%
    # Filter for however many subsequent test you want, then number them.
    dbplyr::window_order( person_id, dateevent ) %>%
    dplyr::filter( row_number() <= n_iterations ) %>%
    dplyr::mutate( temp_col_names = row_number() ) %>% 
    # Rename columns (the hacky way).
    dplyr::mutate(
        date_next_test = dateevent
        ,val_next_test = as.integer( numericvalue )
    ) %>%
    dplyr::select( -c( dateevent, numericvalue ) ) %>% 
    # Append the date of diagnosis and find the test result immediately preceding it. ###########################
    dplyr::left_join( ppl_with_T2DM_diagnoses, by = join_by( person_id ) ) %>%
    # Sort and retrieve the query results.
    dplyr::arrange( person_id ) %>% dplyr::collect() %>%
    # Define new variables quantifying the intervals of interest.
    dplyr::mutate(
        interval_test_to_test = lubridate::as.period( date_next_test - dplyr::lag( date_next_test ) ) %>% lubridate::day()
    ) %>%
    # Pivot the subsequent test dates and values into their own columns.
    tidyr::pivot_wider( names_from = temp_col_names, values_from = c( date_next_test, val_next_test, interval_test_to_test ), names_sort = TRUE ) %>%
    # Define the first interval, i.e. between diagnosis and first test.
    tibble::add_column(
        interval_diagnosis_to_test_1 = lubridate::as.period( .$date_next_test_1 - .$date_diagnosis ) %>% lubridate::day()
        ,.before = "interval_test_to_test_1"
    ) %>%
    dplyr::select( - interval_test_to_test_1 ) %>%
    dplyr::rename_with(
        .fn = ~ paste0("interval_test_", c( 2:n_iterations - 1 ), "_to_test_", c( 2:( n_iterations ) ), recycle0 = TRUE )
        ,.cols = starts_with("interval_test_to_test")
    )
####### The first intervals seems very long but note that it is constrained to be AFTER the first prescription.

person_id,date_diagnosis,date_next_test_1,date_next_test_2,date_next_test_3,date_next_test_4,date_next_test_5,date_next_test_6,val_next_test_1,val_next_test_2,val_next_test_3,val_next_test_4,val_next_test_5,val_next_test_6,interval_diagnosis_to_test_1,interval_test_1_to_test_2,interval_test_2_to_test_3,interval_test_3_to_test_4,interval_test_4_to_test_5,interval_test_5_to_test_6
<int>,<dttm>,<dttm>,<dttm>,<dttm>,<dttm>,<dttm>,<dttm>,<int>,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
230,2018-12-12 00:00:00,2020-08-10 18:51:00,2020-11-05 09:51:41,2021-03-18 09:59:36,2022-01-20 10:18:55,2022-05-26 10:43:51,2022-09-29 10:25:35,67,80,60,82,65,62,607,86,133,308,126,125
15106,2018-08-31 12:10:46,2018-11-27 13:05:00,2019-05-28 13:39:00,2019-10-10 13:32:00,2021-08-05 13:42:00,2021-08-19 10:22:40,2021-11-18 17:46:00,47,44,48,56,56,49,88,182,134,665,13,91
17525,2019-05-29 00:00:00,2020-12-02 14:19:00,2022-01-20 19:21:00,,,,,51,53,,,,,553,414,,,,
19553,2019-06-11 19:35:49,2021-12-29 19:29:00,,,,,,76,,,,,,931,,,,,
20102,2019-09-18 15:31:54,2021-04-28 18:47:00,2021-08-19 14:04:00,2022-03-08 10:12:51,2022-11-03 13:19:00,,,52,50,55,51,,,588,112,200,240,,
30766,2019-07-01 12:16:06,2019-10-04 12:57:00,2020-01-17 14:29:00,2021-05-04 17:08:00,2021-08-23 14:50:00,2021-12-06 16:48:00,2022-07-07 17:39:00,41,44,48,73,61,64,95,105,473,110,105,213
36414,2018-08-24 15:16:54,2018-09-24 19:31:00,2018-09-24 19:31:00,2019-03-06 13:39:00,2019-03-06 13:39:00,2019-08-30 18:30:00,2019-08-30 18:30:00,61,61,48,48,60,60,31,0,162,0,177,0
38736,2020-11-20 15:04:52,2021-02-11 18:22:00,2021-06-01 11:16:22,2021-09-21 14:22:37,2022-01-11 12:40:04,2023-02-01 08:41:09,,50,47,53,66,63,,83,109,112,111,385,
43158,2020-12-21 00:00:00,2021-03-03 00:00:00,2022-05-10 16:29:19,,,,,109,108,,,,,72,433,,,,
52017,2018-06-13 09:42:12,2018-09-19 11:12:21,2018-12-05 12:59:00,2019-06-07 11:29:00,2020-07-22 12:26:33,2020-07-22 12:26:33,2022-07-27 11:25:29,47,54,53,48,48,51,98,77,183,411,0,734


### Contact / Appointments table.
Describe the pattern of contacts between the first subsequent test and the second subsequent test.

In [12]:
ppl_count_contacts_between_tests <-
    ppl_next_test %>%
    left_join( r_tbl_srappointment %>% dplyr::select( person_id, datestart ), by = join_by( person_id ) ) %>%
    dplyr::filter( datestart > date_next_test_1 & datestart < date_next_test_2 ) %>%
    dplyr::distinct( datestart ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( count_contacts_between_tests = n() )

In [None]:
temp <-
    ppl_next_test %>%
    dplyr::left_join( ppl_count_contacts_between_tests, by = join_by( person_id ) ) %>%
    arrange( person_id ) %>%
    dplyr::collect() %>%
    head( n = 10 )

In [18]:
temp %>% colnames()

In [28]:
temp %>% 
dplyr::select( person_id, date_diagnosis, date_next_test_1, date_next_test_2 ) %>%
head()

person_id,date_diagnosis,date_next_test_1,date_next_test_2
<int>,<dttm>,<dttm>,<dttm>
230,2018-12-12 00:00:00,2020-08-10 18:51:00,2020-11-05 09:51:41
15106,2018-08-31 12:10:46,2018-11-27 13:05:00,2019-05-28 13:39:00
17525,2019-05-29 00:00:00,2020-12-02 14:19:00,2022-01-20 19:21:00
19553,2019-06-11 19:35:49,2021-12-29 19:29:00,
20102,2019-09-18 15:31:54,2021-04-28 18:47:00,2021-08-19 14:04:00
30766,2019-07-01 12:16:06,2019-10-04 12:57:00,2020-01-17 14:29:00


# 2024 01 04

Set and load requisites.

In [71]:
##############################
## Set and load requisites. ##
##############################

# Setup connection to GCP.
project_id = "yhcr-prd-phm-bia-core"
con <- DBI::dbConnect( drv = bigquery(), project = project_id )

# Define R tibbles from GCP tables.
r_tbl_srcode <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srcode" )
r_tbl_BNF_DMD_SNOMED_lkp <- dplyr::tbl( con, "CB_LOOKUPS.tbl_BNF_DMD_SNOMED_lkp" )
r_tbl_srprimarycaremedication <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srprimarycaremedication" )
r_tbl_srappointment <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srappointment" )

# Clinical code lists (BNF, SNOMED-CT, etc).
codes_SNOMED_diagnoses_of_interest <-
    readr::read_csv(file = 'nhsd-primary-care-domain-refsets-dmtype2_cod-20200812.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_SNOMED_test_of_interest <-
    readr::read_csv(file = 'opensafely-glycated-haemoglobin-hba1c-tests-3e5b1269.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_BNF_meds_of_interest <-
    readr::read_csv(file = 'ciaranmci-bnf-section-61-drugs-for-diabetes-207573b7.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
names_meds_of_interest <-
    r_tbl_BNF_DMD_SNOMED_lkp %>%
    dplyr::filter( BNF_Code %in% codes_BNF_meds_of_interest ) %>%
    dplyr::select( DMplusD_ProductDescription )
codes_BNF_metformin <-
    readr::read_csv(file = 'ciaranmci-metformin-bnf-0601022b0-and-child-bnf-codes-only-43e7d87e.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
names_metformin_meds <-
    r_tbl_BNF_DMD_SNOMED_lkp %>%
    dplyr::filter( BNF_Code %in% codes_BNF_metformin ) %>%
    dplyr::select( DMplusD_ProductDescription )

# Study start date
date_study_start <- lubridate::ymd('2018-06-01')
date_study_end <- lubridate::ymd('2020-12-31')

# Duration of months after the first prescription that we ignore before studying patterns.
pattern_delay_months <- 0

# Threshold value for the test, which in this case is HbA1c.
val_test_threshold <- 48

# Set values for meaningful changes in the values of the test.
val_meaningful_test_improvement <- -10
val_meaningful_test_disimprovement <- 10

# Set window within which to search for repeated (but bot repeat) prescriptions.
window_repeated_prescription_months <- 3

### Define cohort
The cohort is defined as records that include a diagnosis for Type 2 Diabetes Mellitus (T2DM) and a prescription for metformin as the only diabetic medication for the first prescription. The following queries create this list of records that are within the Connected Bradford primary care table.

Select records from primary care that have a clinical code for a T2DM diagnosis. Include a column showing the earliest date of diagnosis. Filter for within the study period.

In [72]:
###########################################################################
## Select records from primary care that have a clinical code for a T2DM ##
## diagnosis. Include a column showing the earliest date of diagnosis.   ##
## Filter for within the study period.                                   ##
###########################################################################
ppl_with_T2DM_diagnoses <-
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_diagnoses_of_interest ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_diagnosis = min( dateevent, na.rm = TRUE ) ) %>%
    dplyr::select( person_id, date_diagnosis ) %>%
    dplyr::filter( dplyr::between(date_diagnosis, date_study_start, date_study_end) )

### Prescriptions table
Select records from the primary-care prescriptions table that contain a prescription for the diabetes medications of interest. Group by `person_id` to select the first prescription, then use an inner join to filter for patients whose first prescription was for a metformin medication. This logic assumes that prescriptions for diabetic medications will only exist after the diagnosis date. Once completed, find these people's next prescription for a diabetes drug that is `pattern_delay_months` after the first prescription.

This provides a table with columns for `person_id`, `date_first_prescription`, `name_first_prescription`, `repeat_first_prescription`, `date_next_prescription`, and `name_next_prescription`.

In [73]:
ppl_first_and_next_prescription <-
########################################################################################################
## Select list of person IDs that have a metformin prescription as their first diabetic prescription. ## 
########################################################################################################
    # Select from the medication table the records for those patients who have a diagnosis.
    ppl_with_T2DM_diagnoses %>%
    dplyr::left_join(r_tbl_srprimarycaremedication, by = join_by( person_id ) ) %>%
    # Select every record that has a prescription for any diabetes medication.
    dplyr::inner_join( names_meds_of_interest, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    # Extract the first prescription.
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_first_prescription = min( dateevent, na.rm = TRUE ) ) %>%
    # Join to the original table to get the name of the medication.
    dplyr::left_join( r_tbl_srprimarycaremedication, by = join_by( person_id, date_first_prescription == dateevent ) ) %>%
    dplyr::select( person_id, date_first_prescription, nameofmedication, isrepeatmedication, medicationquantity, medicationdosage ) %>%
    dplyr::distinct() %>%
    # Use an inner join to filter these first prescriptions for the metformin medications, only.
    dplyr::inner_join( names_metformin_meds, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    dplyr::distinct() %>%
    # Rename the name of the medication (a very hacky way of doing it).
    dplyr::mutate( name_first_prescription = nameofmedication
                  ,repeat_first_prescription = ifelse(isrepeatmedication == 'true', TRUE, FALSE)
                  ,quantity_first_prescription = medicationquantity
                  ,dose_first_prescription = medicationdosage
   ) %>%
    dplyr::select( -c( nameofmedication, isrepeatmedication, medicationquantity, medicationdosage ) ) %>%

########################################################################################################
## Find these people's next prescription for a diabetes drug that is `pattern_delay_months` after     ##
## the first prescription.                                                                            ##
########################################################################################################
    # Select all prescriptions for these person IDs.
    dplyr::left_join( r_tbl_srprimarycaremedication, by = join_by( person_id ) ) %>%
    # Select only prescriptions for diabetes medications.
    dplyr::select( person_id, date_first_prescription, name_first_prescription, repeat_first_prescription
                  ,quantity_first_prescription, dose_first_prescription, dateevent, nameofmedication ) %>%
    dplyr::inner_join( names_meds_of_interest, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    dplyr::distinct() %>%
    # Filter for tests that were done after the pattern start date.
    dplyr::group_by( person_id ) %>%
    dplyr::filter(
        dateevent > sql( paste0( 'DATETIME_ADD( date_first_prescription, INTERVAL ', pattern_delay_months, ' MONTH )' ) )
    ) %>%
    # If there are multiple medications prescribed on the same day, collapse the combination into one comma-separated string.
    dplyr::ungroup() %>% 
    dplyr::mutate(
        combind_NoM = sql( 'STRING_AGG(nameofmedication, \', \') OVER( PARTITION BY person_id, dateevent)' )
    ) %>%
    # Rename columns (the hacky way).
    dplyr::mutate(
        date_next_prescription = dateevent
        ,name_next_prescription = as.character( combind_NoM )
    ) %>% 
    dplyr::select( -c( dateevent, nameofmedication, combind_NoM ) ) %>%
    dplyr::distinct() %>% 
    # Retrieve only the first of the next prescriptions.
    dplyr::group_by( person_id ) %>%
    dbplyr::window_order( date_next_prescription ) %>%
    dplyr::filter( row_number() == 1 ) %>%
    dplyr::ungroup() %>%

#######################################################################
## What was the treatment (i.e. prescription): {repeat prescription, ##
## repeated prescription, one-off prescription}?                     ##
#######################################################################
    dbplyr::window_order( person_id ) %>%
    dplyr::mutate(
        prescription_repeat = dplyr::case_when(
            ( repeat_first_prescription == TRUE ) & ( name_next_prescription %LIKE% paste0( '%', name_first_prescription, '%' ) ) ~ 'Repeat prescription'
            ,( repeat_first_prescription == TRUE ) & ( name_next_prescription %NOT LIKE% paste0( '%', name_first_prescription, '%' ) ) ~ 'Error: Repeat prescription but subsequent does not match'
            ,( repeat_first_prescription == FALSE ) & ( name_next_prescription %LIKE% paste0( '%', name_first_prescription, '%' ) ) ~ 'Prescription repeated'
            ,( repeat_first_prescription == FALSE ) & ( name_next_prescription %NOT LIKE% paste0( '%', name_first_prescription, '%' ) ) ~ 'One-off prescription'
            ,TRUE ~ 'Unknown'
            )
    )

### Tests table.
Find these people's next prescription for a diabetes drug that is `pattern_delay_months` after the first prescription.

This provides a table with columns for `person_id`, `date_next_test`, and `val_next_test`.

In [74]:
########################################################################################################
## Find these people's next prescription for a diabetes drug that is `pattern_delay_months` after     ##
## the first prescription.                                                                            ##
########################################################################################################
ppl_next_test <- 
    # Filter records for only those that refer to the test of interest.
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_test_of_interest,
                     as.numeric( numericvalue ) > val_test_threshold ) %>%
    # Filter for only those patient that we already identified in our prescription table.
    dplyr::right_join( ppl_first_and_next_prescription %>% dplyr::select( person_id, date_first_prescription )
                      ,by = join_by( person_id )
    ) %>%
    # Extract only the perion ID and the date and result of the test.
    dplyr::select( person_id, dateevent, numericvalue, date_first_prescription ) %>%
    # Filter for tests that were done after the pattern start date.
    dplyr::group_by( person_id ) %>%
    dplyr::filter(
        dateevent > sql(paste0( 'DATETIME_ADD( date_first_prescription, INTERVAL ', pattern_delay_months, ' MONTH )') )
    ) %>%
    dplyr::select( -date_first_prescription ) %>%
    # Filter for the first of these subsequent prescriptions.
    dbplyr::window_order( person_id, dateevent ) %>%
    dplyr::filter( row_number() == 1 ) %>% # Just add `| row_number() == 2` if you want the second-next, too.
    # Rename columns (the hacky way).
    dplyr::mutate(
        date_next_test = dateevent
        ,val_next_test = as.integer( numericvalue )
    ) %>%
    dplyr::select( -c( dateevent, numericvalue ) )

### Join tables and calculate variables of interest.
Our prescriptions and tests tables are joined by matching the person ID. New columns are added to store variables that represent the answers to the following questions:

1. What was the treatment (i.e. prescription): {repeat prescription, repeated prescription, one-off prescription}?

2. What was the duration until the next end event?

3. Did the test result change?

4. Did the count and frequency of contacts change? * * *_I need more information. What are the intervals? What are 'contacts'?_* * *


In [23]:
# Find the value of the test initiated the diagnosis. Call it `val_diagnosis_test`.
ppl_diagnosis_test <-
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_test_of_interest,
                  as.numeric( numericvalue ) > 48 ) %>%
    dplyr::select( person_id, dateevent, numericvalue ) %>%
    dplyr::inner_join( ppl_with_T2DM_diagnoses, by = join_by( person_id ) ) %>%
    dplyr::filter( dateevent < date_diagnosis ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_test_just_before_diagnosis = max( dateevent ) ) %>%
    dplyr::left_join( r_tbl_srcode,
                     by = join_by( person_id, date_test_just_before_diagnosis == dateevent ) ) %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_test_of_interest ) %>%
    dplyr::mutate( val_diagnosis_test = as.integer( numericvalue ) ) %>%
    dplyr::select( - numericvalue ) %>%
    dplyr::select( person_id, val_diagnosis_test )
 


# Join the tables.
dplyr::left_join(
    ppl_first_and_next_prescription
    ,ppl_next_test
    ,by = join_by( person_id )
) %>%
dplyr::left_join(
    ppl_diagnosis_test
    ,by = join_by( person_id )
) %>%
dplyr::mutate(
    val_test_threshold = val_test_threshold 
   ,val_meaningful_test_improvement = val_meaningful_test_improvement
   ,val_meaningful_test_disimprovement = val_meaningful_test_disimprovement
) %>%
#####################################################    
## Did the count and frequency of contacts change? ## # Don't know how I will tackle this one, yet. I can use tbl_srappointment but I will still need to define intervals.
#####################################################
#... 

#####################################################
## What was the duration until the next end event? ##
#####################################################
dplyr::mutate(
    days_until_next_prescription = sql('DATETIME_DIFF(date_next_prescription, date_first_prescription, DAY)')
    ,days_until_next_test = sql('DATETIME_DIFF(date_next_test, date_first_prescription, DAY)')
) %>% 
#################################
## Did the test result change? ##
#################################
dplyr::mutate(
    status_next_test = dplyr::case_when(
        is.na( val_next_test ) ~ 'No next test within window.'
        ,val_next_test >= val_test_threshold ~ 'Of concern'
        ,val_next_test < val_test_threshold ~ 'No concern'
    )
    ,change_next_test = dplyr::case_when(
        is.na( val_diagnosis_test ) ~ 'No diagnosis test. Check data.'
        ,is.na( val_next_test ) ~ 'No next test within window.'
        ,( val_next_test - val_diagnosis_test ) < val_meaningful_test_improvement ~ 'Improvement'
        ,( val_next_test - val_diagnosis_test ) > val_meaningful_test_disimprovement ~ 'Disimprovement'
        ,TRUE ~ 'No change'
    )
) %>%
dplyr::select( - c( val_meaningful_test_improvement, val_meaningful_test_disimprovement ) ) %>% arrange(person_id) %>% collect() %>% head()

ERROR: [1m[33mError[39m in `dplyr::mutate()`:[22m
[1m[22m[36mℹ[39m In argument: `status_next_test = dplyr::case_when(...)`
[1mCaused by error:[22m
[1m[22m[33m![39m Object `val_next_test` not found.


# 2023 12 07

Set and load requisites.

In [73]:
# Set and load requisites.

# Setup connection to GCP.
project_id = "yhcr-prd-phm-bia-core"
con <- DBI::dbConnect( drv = bigquery(), project = project_id )

# Define R tibbles from GCP tables.
r_tbl_srcode <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srcode" )
r_tbl_BNF_DMD_SNOMED_lkp <- dplyr::tbl( con, "CB_LOOKUPS.tbl_BNF_DMD_SNOMED_lkp" )
r_tbl_srprimarycaremedication <- dplyr::tbl( con, "CB_FDM_PrimaryCare_V7.tbl_srprimarycaremedication" )

# Clinical code lists (BNF, SNOMED-CT, etc).
codes_SNOMED_diagnoses_of_interest <-
    readr::read_csv(file = 'nhsd-primary-care-domain-refsets-dmtype2_cod-20200812.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_SNOMED_test_of_interest <-
    readr::read_csv(file = 'opensafely-glycated-haemoglobin-hba1c-tests-3e5b1269.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
codes_BNF_meds_of_interest <-
    readr::read_csv(file = 'ciaranmci-bnf-section-61-drugs-for-diabetes-207573b7.csv',
                    col_types = cols( code = col_character(), term = col_character() ) )$code
names_meds_of_interest <-
    r_tbl_BNF_DMD_SNOMED_lkp %>%
    dplyr::filter( BNF_Code %in% codes_BNF_meds_of_interest ) %>%
    dplyr::select( DMplusD_ProductDescription )

# Define study duration in months.
study_duration <- 6
# Define the duration of interest between a diagnosis and the first prescription. We
# assume that this is the duration in which lifestyle interventions are trialled before
# resorting to medications.
duration_between_diagnosis_and_first_prescription_in_months <- 4

Select records from the primary care table that have a clinical code for a T2DM diagnosis. Include a column showing the earliest date of diagnosis.

In [3]:
# Select records from primary care that have a clinical code for a T2DM
# diagnosis. Include a column showing the earliest date of diagnosis.
ppl_with_T2DM_diagnoses <-
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_diagnoses_of_interest ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_diagnosis = min( dateevent, na.rm = TRUE ) ) %>%
    dplyr::select( person_id, date_diagnosis )

Select records from the primary care table that have a reading for HbA1c greater than 48 mmol/l. Then, select only those records that were also returned for having a clinical code for a T2DM diagnosis. Finally, only select the records where the HbA1c test > 48 mmol/l happened before the diagnosis for T2DM was clinically coded.

In [11]:
# Select records from primary care that have a reading for HbA1c greater
# than 48 mmol/l. Then, select only those records that were also returned
# for having a clinical code for a T2DM diagnosis. Finally, only select 
# the records where the HbA1c test > 48 mmol/l happened before the diagnosis
# for T2DM was clinically coded.
ppl_with_high_HbA1c_just_before_diagnosis <-
    r_tbl_srcode %>%
    dplyr::filter( snomedcode %in% codes_SNOMED_test_of_interest) %>%
    dplyr::select( person_id, dateevent, numericvalue ) %>%
    dplyr::inner_join( ppl_with_T2DM_diagnoses, by = join_by( person_id ) ) %>%
    dplyr::filter( dateevent < date_diagnosis) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::filter( dateevent == max( dateevent ) ) %>%
    dplyr::select( - date_diagnosis ) %>%
    dplyr::arrange( person_id )

Select records from the primary-care prescriptions table that contain a prescription for the diabetes medications of interest. Include a column showing the earliest date of prescription for each person (as opposed to the earliest prescription of a particular drug for each person).

In [None]:
# Select records from the primary-care prescriptions table that contain
# a prescription for the diabetes medications of interest. Include a column
# showing the earliest date of prescription for each person (as opposed to
# the earliest prescription of a particular drug for each person).
ppl_with_diabetes_meds_first_prescription <-
    r_tbl_srprimarycaremedication %>%
    dplyr::inner_join( names_meds_of_interest, by = join_by( nameofmedication == DMplusD_ProductDescription ) ) %>%
    dplyr::group_by( person_id ) %>%
    dplyr::summarise( date_first_prescription = min( dateevent, na.rm = TRUE ) ) %>%
    dplyr::left_join( r_tbl_srprimarycaremedication, by = join_by( person_id, date_first_prescription == dateevent ) ) %>%
    dplyr::select( person_id, date_first_prescription, nameofmedication ) %>%
    dplyr::filter( nameofmedication %in% ????? # I need a way to filter out any non-diabetes prescriptions that also happened on the date of the first diabetic prescription.
                 ) %>%
    dplyr::arrange( person_id )

ppl_with_diabetes_meds_first_prescription %>% dplyr::collect()

Mark records where there is a prescription for diabetes medications within the duration between a diagnosis and the first prescription. We assume this duration would be sufficient for lifestyle interventions. NOTE: This list of records is limited to those of people who were eventually prescribed diabetes medication. If the lifestyle intervention worked, then they are not included in this list.

In [15]:
# Mark records where there is a prescription for diabetes medications within 
# the duration between a diagnosis and the first prescription. We assume this
# duration would be sufficient for lifestyle interventions.
# NOTE: This list of records is limited to those of people who were eventually 
# prescribed diabetes medication. If the lifestyle intervention worked, then
# they are not included in this list.
ppl_with_delay_before_meds_treatment <-
    ppl_with_T2DM_diagnoses %>%
    dplyr::inner_join( ppl_with_diabetes_meds_first_prescription, by = join_by(person_id) ) %>% dplyr::collect() %>%
    dplyr::mutate(
        ppl_with_delay_before_meds_treatment = 
            base::ifelse(
                base::difftime( date_first_prescription, date_diagnosis, units = "weeks") >
                                (duration_between_diagnosis_and_first_prescription_in_months*4) # Assumes four weeks in a month.
                ,"Prescription delayed", "Prescription NOT delayed"
            )
    ) %>%
    dplyr::select( person_id, ppl_with_delay_before_meds_treatment )

Select patient IDs from records that indicated a high HbA1c just before diagnosis and mark them as either having or not having a prescription for diabetes drugs within a duration in which lifestyle interventions would be expected as a first choice. The `ppl_with_delay_before_meds_treatment` variable acts as our "exposure" or "grouping" variable.

In [16]:
# Select patient IDs from records that indicated a high HbA1c just before diagnosis
# and mark them as either having or not having a prescription for diabetes drugs 
# within a duration in which lifestyle interventions would be expected as a first
# choice.
cohort_of_interest <-
    ppl_with_high_HbA1c_just_before_diagnosis %>% dplyr::collect() %>%
    dplyr::inner_join( ppl_with_delay_before_meds_treatment, by = join_by( person_id ) ) %>%
    dplyr::select( person_id, ppl_with_delay_before_meds_treatment ) %>%
    dplyr::arrange( person_id )

In [20]:
i_pID = 2173

was_a_prescription_given

if_prescription_then_what_drug_name

if_prescription_then_what_drug_duration

if_prescription_then_what_drug_single_or_continued

final_HbA1c_value


person_id,ppl_with_delay_before_meds_treatment
<int>,<chr>
147,Prescription delayed
230,Prescription delayed
344,Prescription delayed
344,Prescription delayed
2173,Prescription NOT delayed
2841,Prescription delayed
