# =============================================
# = Glazsiou study | 0 years since diagnosis | Insulin sensitivity check  =
# =============================================

The purpose of this notebook is to check how the results change under three ways of handling people who are treated with insulin:
1. Excluding a person's entire record if they were ever prescribed insulin.
2. Excluding the portion of a person's record after their first prescription of insulin.
3. Including a clause in the definition of the HMA state that ignores insulin prescriptions.
<br></br>

The motivation behind the sensitivity analyses is that some inter-test intervals were labelled 'Adjust' only because the interval between successive insulin prescriptions was larger than the threshold. The interval between prescriptions is not as relevant for insulin, so these inter-test intervals were mislabelled as 'Adjust'.
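To make the mislabelling concrete, here is a minimal sketch with invented dates. It assumes the 'Adjust' rule flags a prescription as new when the same drug does not appear in the preceding look-back window (16 weeks in this notebook). The helper `is_new_prescription()` and the toy data are hypothetical and are not part of the RESHAPE scripts, which use a more involved sliding-window approach.

```r
# Toy prescription log for one person (dates are invented for illustration).
rx <- data.frame(
  start_dttm  = as.Date( c( "2001-01-10", "2001-03-01", "2001-09-01" ) ),
  event_value = c( "Insulin", "Insulin", "Insulin" )
)
lookback_days <- 16 * 7  # 16-week look-back window, expressed in days.

# Hypothetical helper: a prescription counts as 'new' if the same drug was not
# prescribed within the look-back window before it.
is_new_prescription <- function( dates, drugs, window_days ) {
  vapply( seq_along( dates ), function( i ) {
    gap <- as.numeric( dates[ i ] - dates )        # Days since each other prescription.
    in_window <- gap > 0 & gap <= window_days      # Strictly earlier and inside the window.
    !( drugs[ i ] %in% drugs[ in_window ] )
  }, logical( 1 ) )
}

rx$flag_new <- is_new_prescription( rx$start_dttm, rx$event_value, lookback_days )
print( rx )
```

In this toy log the third prescription is flagged as new only because more than 16 weeks passed since the previous insulin prescription, even though the drug did not change. That is the kind of interval that would be labelled 'Adjust' in error.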

## Conclusions
1. Excluding insulin users barely changes the relative proportions of HMA states (Hold 0.48 vs 0.50, Monitor 0.37 vs 0.37, Adjust 0.14 vs 0.13).
2. Given the first conclusion, there is no expectation that the other methods will have any effect.
3. Redefining 'Adjust' to ignore insulin does not change the proportions, to two decimal places.

I didn't implement the 2nd method because it would have been technically more difficult to code.

# Get requisite packages.

In [40]:
# Get requisite packages.
if( !"pacman" %in% installed.packages() )
{
  install.packages( "pacman" )
  library( pacman )
}
pacman::p_load(
    bigrquery # Version ‘1.5.1’
    ,data.table # Version ‘1.16.0’
    ,GGally # Version ‘2.2.1’
    ,gtable # Version ‘0.3.6’
    ,grid # Version ‘4.4.1’
    ,gridExtra # Version ‘2.3’
    ,IRdisplay
    ,kableExtra
    ,paletteer # Version ‘1.6.0’
    ,readr # Version ‘2.1.5’
    ,tidytext # Version ‘0.4.2’
    ,tidyverse # Version ‘2.0.0’
    ,TraMineR # Version ‘2.2.10’
    ,TraMineRextras # Version ‘0.6.8’
)
#devtools::install_github("davidsjoberg/ggsankey")
#remove.packages("ggsankey")
devtools::install_github("ciaranmci/ggsankey", force = TRUE )

Downloading GitHub repo ciaranmci/ggsankey@HEAD

── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file ‘/var/tmp/Rtmp1wtJnn/remotes1deb784940fd/ciaranmci-ggsankey-821b0e3/DESCRIPTION’ ... OK
* preparing ‘ggsankey’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘ggsankey_0.0.99999.tar.gz’



Installing package into ‘/home/jupyter/.R/library’
(as ‘lib’ is unspecified)



# Set cohort parameters

In [41]:
# Study dates
# ## The date before which a patient must have had their diagnosis.
date_diagnosis_threshold <- lubridate::ymd('2000-01-01')
# ## The date after which test and intervention records will be studied.
followup_delay_in_years <- 0
date_followup_start <- date_diagnosis_threshold + lubridate::years( followup_delay_in_years )
# ## The date before which test and intervention records will be studied.
followup_duration_in_years <- 10
date_followup_end <- date_followup_start + lubridate::years( followup_duration_in_years )

# Set the duration of the window back in time to review prescriptions when identifying
# the HMA status.
HMA_adjust_lookBack_window <- lubridate::weeks( 16 )

# Set upper and lower thresholds for acceptable values of the test.
test_value_cutoff_lower <- 20
test_value_cutoff_upper <- 200

# Threshold for the expected interval between subsequent tests, in months
val_testing_interval_LB <- 2
val_testing_interval_UB <- 5

# Set values for meaningful changes in the values of the test.
val_meaningful_test_improvement <- -10
val_meaningful_test_disimprovement <- 10

# Set window within which to search for repeated (but not repeat) prescriptions.
window_repeated_prescription_months <- 3

# Set number of tests, treatments, or iterations after diagnosis that should be tracked.
n_iterations <- followup_duration_in_years*2

# Set the window within which multimorbidity diagnoses and the index diagnosis must fit, in months.
multimorb_inclusion_window_months <- 60

# Set the minimum gap, in months, by which at least two multimorbidity diagnoses must be separated from each other.
multimorb_gap_window_months <- 1

# Generate the cohort.

In [42]:
source('RESHAPE_cohort_generator.r')

Auto-refreshing stale OAuth token.



# Format the data.

In [43]:
source('RESHAPE_format_the_data.r')

`summarise()` has grouped output by 'person_id'. You can override using the
`.groups` argument.


# 1. Excluding a person's entire record if they were ever prescribed insulin.

In [75]:
# Table of HMA states before removing insulin users.
tbl_intervals_per_HMA__all_people <-
    df_log_PandT_longFormat_simplified_StrataLabels %>%
    dplyr::distinct( person_id, idx_test_interval, HMA ) %>%
    dplyr::filter( HMA != "Unobserved", !is.na( HMA ) ) %>%
    dplyr::arrange( person_id, idx_test_interval ) %>%
    dplyr::group_by( HMA ) %>%
    dplyr::reframe( n = n() )
n_people__all_people <-
    df_log_PandT_longFormat_simplified_StrataLabels %>%
    dplyr::distinct( person_id ) %>%
    nrow()
n_intervals__all_people <- 
    tbl_intervals_per_HMA__all_people %>%
    dplyr::pull() %>%
    sum()


# Tables of HMA states for insulin users and for non-insulin users.
tbl_indicating_insulin_or_not <- 
    df_log_PandT_longFormat_simplified_StrataLabels %>%
    dplyr::group_by( person_id ) %>%
    dplyr::mutate( used_insulin = as.logical( any( event_value == "Insulin" ) ) ) %>%
    dplyr::ungroup() %>%
    dplyr::select( person_id, idx_test_interval, HMA, used_insulin )

tbl_intervals_per_HMA__insulin_users_only <-
    tbl_indicating_insulin_or_not %>%
    dplyr::filter( used_insulin == TRUE ) %>%
    dplyr::distinct( person_id, idx_test_interval, HMA ) %>%
    dplyr::filter( HMA != "Unobserved", !is.na( HMA ) ) %>%
    dplyr::arrange( person_id, idx_test_interval ) %>%
    dplyr::group_by( HMA ) %>%
    dplyr::reframe( n = n() )
n_people__insulin_users_only <-
    tbl_indicating_insulin_or_not %>%
    dplyr::filter( used_insulin == TRUE ) %>%
    dplyr::distinct( person_id ) %>%
    nrow()
n_intervals__insulin_users_only <- 
    tbl_intervals_per_HMA__insulin_users_only %>%
    dplyr::pull() %>%
    sum()

tbl_intervals_per_HMA__non_insulin_users_only <-
    tbl_indicating_insulin_or_not %>%
    dplyr::filter( used_insulin == FALSE ) %>%
    dplyr::distinct( person_id, idx_test_interval, HMA ) %>%
    dplyr::filter( HMA != "Unobserved", !is.na( HMA ) ) %>%
    dplyr::arrange( person_id, idx_test_interval ) %>%
    dplyr::group_by( HMA ) %>%
    dplyr::reframe( n = n() )
n_people__non_insulin_users_only <-
    tbl_indicating_insulin_or_not %>%
    dplyr::filter( used_insulin == FALSE ) %>%
    dplyr::distinct( person_id ) %>%
    nrow()
n_intervals__non_insulin_users_only <- 
    tbl_intervals_per_HMA__non_insulin_users_only %>%
    dplyr::pull() %>%
    sum()








# Presentation.
# ## Totals.
print( paste0("Total number of people = ", n_people__all_people ) )
print( paste0("Total number of insulin users = ", n_people__insulin_users_only ) )
print( paste0("Total number of non-insulin users = ", n_people__non_insulin_users_only ) )

print( paste0("Total number of inter-test intervals for all people = ", n_intervals__all_people) )
print( paste0("Total number of inter-test intervals for insulin users people = ", n_intervals__insulin_users_only) )
print( paste0("Total number of inter-test intervals for non-insulin users people = ", n_intervals__non_insulin_users_only) )

# ## Breakdown by HMA state.
print( "The breakdown of inter-test intervals by HMA state for each cohort:" )
dplyr::left_join(
    tbl_intervals_per_HMA__all_people
    ,tbl_intervals_per_HMA__insulin_users_only
    ,by = join_by( HMA )
    ) %>%
dplyr::left_join(
    tbl_intervals_per_HMA__non_insulin_users_only
    ,by = join_by( HMA )
    ) %>%
`colnames<-`( c( "HMA", "All people", "Insulin users" , "Non-insulin users" ) ) %>%
dplyr::mutate(
    `Relative proportion` = round( `All people` / sum( `All people` ), 2 ) 
    ,`Proportion insulin users` = round( `Insulin users` / `All people`, 2 )
    ,`Relative proportion insulin users` = round( `Insulin users` / sum( `Insulin users` ), 2 ) 
    ,`Proportion non-insulin users` = round( `Non-insulin users` / `All people`, 2)
    ,`Relative proportion non-insulin users` = round( `Non-insulin users` / sum( `Non-insulin users` ), 2 ) 
) %>%
dplyr::select(
    `HMA`, `All people`
    ,`Non-insulin users`
    ,`Insulin users`
    , `Relative proportion`
    , `Relative proportion non-insulin users`
    , `Relative proportion insulin users`
)

print( "The relative proportion of HMA states in the cohort with and without the insulin users is essentially the same." )
print( "This is why there is no discernable difference between the plots with and without the insulin users." )

### There appears to be no difference when I remove the 1033 people who used insulin.

[1] "Total number of people = 7807"
[1] "Total number of insulin users = 1033"
[1] "Total number of non-insulin users = 6774"
[1] "Total number of inter-test intervals for all people = 37719"
[1] "Total number of inter-test intervals for insulin users people = 7122"
[1] "Total number of inter-test intervals for non-insulin users people = 30597"
[1] "The breakdown of inter-test intervals by HMA state for each cohort:"


| HMA | All people | Non-insulin users | Insulin users | Relative proportion | Relative proportion non-insulin users | Relative proportion insulin users |
| --- | --- | --- | --- | --- | --- | --- |
| `<fct>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Hold | 18262 | 15416 | 2846 | 0.48 | 0.5 | 0.4 |
| Monitor | 14105 | 11335 | 2770 | 0.37 | 0.37 | 0.39 |
| Adjust | 5352 | 3846 | 1506 | 0.14 | 0.13 | 0.21 |


[1] "The relative proportion of HMA states in the cohort with and without the insulin users is essentially the same."
[1] "This is why there is no discernable difference between the plots with and without the insulin users."
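As a quick arithmetic check, the relative proportions can be recomputed from the counts reported in the output above. This is only a sketch: the counts are typed in by hand from that table, and the object names `counts` and `props` are introduced here for illustration.

```r
# Counts of inter-test intervals copied from the table above.
counts <- data.frame(
  HMA         = c( "Hold", "Monitor", "Adjust" ),
  all_people  = c( 18262, 14105, 5352 ),
  non_insulin = c( 15416, 11335, 3846 ),
  insulin     = c( 2846, 2770, 1506 )
)

# Relative proportion of each state within each cohort.
props <- data.frame(
  HMA         = counts$HMA,
  all_people  = counts$all_people / sum( counts$all_people ),
  non_insulin = counts$non_insulin / sum( counts$non_insulin ),
  insulin     = counts$insulin / sum( counts$insulin )
)

# Change in each proportion caused by dropping the insulin users.
props$shift_when_insulin_excluded <- props$non_insulin - props$all_people
print( cbind( props[ "HMA" ], round( props[ -1 ], 3 ) ) )
```

Insulin users do have a different profile (a larger share of 'Adjust' intervals), but they contribute only 7122 of the 37719 intervals, so removing them moves each overall proportion by about 0.02 at most.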


# 2. Excluding the portion of a person's record after their first prescription of insulin.

In [None]:
NOT DONE
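This method was not implemented. As a starting point only, the truncation step could look something like the sketch below. It runs on an invented toy log whose column names mirror those used in this notebook; `toy_log` and `first_insulin_date` are hypothetical names, and the sketch does not re-derive the inter-test intervals or HMA labels afterwards, which is where the real difficulty lies.

```r
library( dplyr )

# Toy long-format log (rows invented for illustration).
toy_log <- data.frame(
  person_id   = c( 1, 1, 1, 1, 2, 2 ),
  start_dttm  = as.Date( c( "2001-01-01", "2001-04-01", "2001-06-01", "2001-09-01",
                            "2001-02-01", "2001-05-01" ) ),
  event_value = c( "Test", "Metformin", "Insulin", "Test", "Test", "Metformin" )
)

toy_log_truncated <-
  toy_log %>%
  dplyr::group_by( person_id ) %>%
  # Date of the first insulin prescription; NA for people who never used insulin.
  dplyr::mutate(
    first_insulin_date =
      if ( any( event_value == "Insulin" ) ) {
        min( start_dttm[ event_value == "Insulin" ] )
      } else {
        as.Date( NA )
      }
  ) %>%
  dplyr::ungroup() %>%
  # Keep everything for never-users, and only the pre-insulin rows for users.
  dplyr::filter( is.na( first_insulin_date ) | start_dttm < first_insulin_date ) %>%
  dplyr::select( -first_insulin_date )

print( toy_log_truncated )
```

The sketch cuts each record at the date of the first insulin prescription. Whether the whole inter-test interval containing that prescription should be dropped instead is a design decision left open here.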

# 3. Including a clause in the definition of the HMA state that ignores insulin prescriptions.

This is the most problematic adjustment to make because I must answer _What label do I use if the prescription __is__ for insulin?_

The original problem was that an insulin prescription issued many weeks after the previous insulin prescription was being treated as an adjustment to treatment. I suppose the appropriate label is 'Hold' because the inter-test interval is definitely not shorter than expected and the change in prescription is not being considered.

But what happens if something else is prescribed as well as insulin? I can't simply ignore the inter-test interval because insulin is present.

I need to get the list of people who have used insulin as well as another drug.

In [45]:
df_people_using_more_than_just_insulin <-
    df_log_PandT_longFormat_simplified_StrataLabels %>%
    dplyr::distinct( person_id, idx_test_interval, event_value ) %>%
    # Ignore the rows related to tests and that are 'Unobserved'.
    dplyr::filter(
        !stringr::str_detect( event_value, pattern = "Test" )
        ,!stringr::str_detect( event_value, pattern = "Unobserved" )
    ) %>%
    # Filter for the intervals that contain an insulin prescription.
    dplyr::group_by( person_id, idx_test_interval ) %>%
    dplyr::filter( any( event_value == "Insulin" ) ) %>%
    dplyr::ungroup() %>%
    # Now remove the rows that refer to insulin, which should leave me with the intervals that contain more than insulin.
    dplyr::filter( event_value != "Insulin" )

I now need to compare these prescriptions with the prior window to see if these prescriptions indicated an 'Adjust' state or not.

The first thing to do is to extract these people from 'df_log_PandT_longFormat_simplified_StrataLabels' so that I am only dealing with them.

In [46]:
# I now need to compare these prescriptions with the prior window to see if these prescriptions indicated an 'Adjust' state or not.

# The first thing to do is to extract these people from 'df_log_PandT_longFormat_simplified_StrataLabels' so that I am only 
# dealing with them.
df_main_table_but_only_insulin_users <-
    df_people_using_more_than_just_insulin %>%
    dplyr::distinct( person_id ) %>%
    dplyr::inner_join(
        df_log_PandT_longFormat_simplified_StrataLabels
        ,by = join_by( person_id )
    ) %>%
    dplyr::filter( event_value != "Insulin" ) %>%
    dplyr::select( -HMA )

I now need to apply the window method to define the 'Adjust' state, but ignoring insulin.

In [47]:
# I now need to apply the window method to define the 'Adjust' state, but ignoring insulin.
df_main_table_but_only_insulin_users <-
    df_main_table_but_only_insulin_users %>%
    # Create new columns that contain the test events and the prescription events.
    dplyr::select( person_id, start_dttm, event_value, idx_test_interval ) %>%
    dplyr::filter( !stringr::str_detect( event_value, pattern = "Test") ) %>%
    dplyr::filter( !stringr::str_detect( event_value, pattern = "Unobserved") ) %>%
    dplyr::mutate( meds_only = as.character( event_value ) ) %>%
    dplyr::select( -event_value ) %>%
    # Remove duplicated prescriptions occurring on the same day.
    dplyr::distinct( ) %>%
    # Create a list-column of the prescriptions given in the preceding months of a given prescription, `HMA_adjust_lookBack_window`.
    tidyr::nest( .by = person_id ) %>%
    dplyr::mutate(
        meds_set_window = 
            lapply(
                data
                ,function(x) slider::slide_index(.x = x$meds_only, .i = x$start_dttm, .f = ~.x, .before = HMA_adjust_lookBack_window ) %>% array()
            )
        ) %>%
    tidyr::unnest( cols = c( data, meds_set_window ) ) %>%
    # Remove the current prescription from `meds_set_window`.
    dplyr::mutate(
        meds_set_window = purrr::map2(
            meds_only
            ,meds_set_window
            ,function( med, med_set) {
                idx <- which( med_set == med )
                if ( length( idx ) > 0) {
                    med_set[ -idx[ 1 ] ]
                } else {
                    med_set
                }
            }
        )
    ) %>%
    # Join this data.frame to one that contains a list-column of the prescriptions given on a particular day.
    dplyr::left_join(
        x = dplyr::select( ., - meds_only )
        # Create a list-column of the prescriptions given on a particular day.
        ,y = stats::aggregate(
                    meds_only ~ person_id + start_dttm
                    ,data = .
                    ,FUN = list
                    ,na.action = NULL
                ) %>% dplyr::arrange( person_id, start_dttm )
        ,by = join_by( person_id, start_dttm )
        ) %>%
    # Find the difference between the history of meds and the meds on the day.
    dplyr::mutate( new_meds = purrr::map2( meds_only, meds_set_window,  ~!all( .x %in% .y ) ) ) %>%
    as.data.frame() %>%
    dplyr::select( -c( meds_only, meds_set_window ) ) %>%
    # Label any additional meds as 'Adjust'.
    dplyr::rowwise() %>%
    dplyr::mutate( HMA = dplyr::if_else( new_meds, 'Adjust', 'Not adjust' ) ) %>%
    dplyr::ungroup() %>%
    as.data.frame() %>%
    dplyr::select( -new_meds ) %>%
    # Set the entire inter-test interval to an 'Adjust' or 'Not adjust' value based
    # on whether an adjust event occurred during that interval.
    dplyr::group_by( person_id, idx_test_interval ) %>%
    dplyr::mutate( HMA = dplyr::if_else( 'Adjust' %in% HMA, 'Adjust', 'Not adjust') ) %>%
    dplyr::ungroup() %>%
    as.data.frame() %>%
    # Rejoin to the original dataframe.
    dplyr::select( - start_dttm ) %>%
    dplyr::distinct( ) %>%
    dplyr::right_join(
            df_main_table_but_only_insulin_users 
            ,by = join_by( person_id, idx_test_interval )
        ,relationship = "one-to-many"
        ) %>%
    # Impute the value for `idx_test_interval` for the event_value == `Unobserved` event.
    dplyr::group_by( person_id ) %>%
    dplyr::mutate(
        idx_test_interval = dplyr::if_else( ( event_value == 'Unobserved' ) & ( !all(is.na(idx_test_interval)) ), max( idx_test_interval, na.rm = TRUE ) + 1, idx_test_interval )
    ) %>%
    dplyr::ungroup() %>%
    # Remove any record with `idx_test_interval` == 0. This excludes all prescriptions before
    # the first non-diagnostic test recorded. The syntax below will also exclude the diagnosis event.
    dplyr::filter( idx_test_interval != 0 ) %>%
    suppressWarnings()

The final step is to combine the two components of the stratification's definition into a single variable called HMA. Take note that the 'Adjust' label overrules a 'Monitor' label.

In [48]:
# The final step is to combine the two components of the stratification's definition into a single variable called HMA.
# Take note that the 'Adjust' label overrules a 'Monitor' label.
df_main_table_but_only_insulin_users <-
    df_main_table_but_only_insulin_users %>%
    dplyr::arrange( person_id, idx_test_interval, start_dttm ) %>%
    dplyr::mutate(
        HMA =
            dplyr::case_when(
                event_value == "Unobserved" ~ "Unobserved"
                
                ,HMA == "Adjust" ~ "Adjust"
                ,inter_test_duration_discr == "As expected" ~ "Hold"
                ,inter_test_duration_discr == "Shorter than expected" ~ "Monitor"

                ,TRUE ~ NA_character_
            )
    ) %>%
    # Set as a factor datatype and order the factors.
    dplyr::mutate( HMA = factor( HMA, levels = df_HMA_factor %>% dplyr::select( HMA_fct_order ) %>% dplyr::pull() ) ) %>%
    dplyr::arrange( person_id, start_dttm )

I now join the updated definition of the HMA state to the original data frame, for comparison.

In [60]:
df_main_table_but_only_insulin_users <-
    df_main_table_but_only_insulin_users %>%
    dplyr::distinct( person_id, idx_test_interval, HMA ) %>%
    dplyr::rename( updated_HMA = HMA ) %>%
    dplyr::right_join(
        df_log_PandT_longFormat_simplified_StrataLabels
        ,by = join_by( person_id, idx_test_interval )
        ,relationship = "many-to-many"
        ) %>%
    # The dataframe using the updated definition of 'Adjust' did not contain all the people, so the new, combined
    # dataframe has missing values of `updated_HMA` for them. I impute these by accepting the original HMA value
    # if the updated HMA is NA.
    dplyr::mutate(
        updated_HMA = dplyr::if_else( is.na( updated_HMA ), HMA, updated_HMA )
        )

We can check the breakdown of states in this updated dataframe.

In [73]:
original_state_breakdown <-
    df_main_table_but_only_insulin_users %>%
    dplyr::group_by( HMA ) %>%
    dplyr::reframe( n = n() ) %>%
    dplyr::ungroup() %>%
    dplyr::mutate( `Relative proportion` = round( n / sum( n ), 2) ) %>%
    dplyr::filter( HMA != "Unobserved", !is.na( HMA ) )

updated_state_breakdown <-
    df_main_table_but_only_insulin_users %>%
    dplyr::group_by( updated_HMA ) %>%
    dplyr::reframe( n = n() ) %>%
    dplyr::ungroup() %>%
    dplyr::mutate( `Relative proportion` = round( n / sum( n ), 2) ) %>%
    dplyr::filter( updated_HMA != "Unobserved", !is.na( updated_HMA ) ) %>%
    dplyr::rename( HMA = updated_HMA )

dplyr::left_join(
    original_state_breakdown
    ,updated_state_breakdown
    ,by = join_by( HMA )
    ) %>%
dplyr::select( HMA, n.x, n.y, `Relative proportion.x`, `Relative proportion.y` ) %>%
`colnames<-`( c( "State", "Original count", "Updated count", "Original proportion", "Updated proportion" ) )

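| State | Original count | Updated count | Original proportion | Updated proportion |
| --- | --- | --- | --- | --- |
| `<fct>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` |
| Hold | 45681 | 45897 | 0.33 | 0.33 |
| Monitor | 26316 | 26479 | 0.19 | 0.19 |
| Adjust | 43314 | 42996 | 0.31 | 0.31 |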


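The count of 'Adjust' rows has gone down by 318 (from 43314 to 42996) but the relative proportion remains unchanged at a resolution of two decimal places. Note that these counts are rows of the long-format table rather than distinct inter-test intervals, and that the proportions are calculated before the 'Unobserved' and missing rows are filtered out, which is why they do not sum to 1.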
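To see how large the change is below the two-decimal resolution, the proportions can be recomputed from the counts in the table above. This is a rough sketch with an important assumption: the counts for 'Unobserved' and missing rows are not shown in the output, so the proportions here are renormalised over the three observed states only and will not match the 0.33, 0.19 and 0.31 in the table.

```r
# Row counts copied from the table above.
state    <- c( "Hold", "Monitor", "Adjust" )
original <- c( 45681, 26316, 43314 )
updated  <- c( 45897, 26479, 42996 )

# Proportions renormalised over the three observed states only (an assumption
# made here because the 'Unobserved' and NA counts are not reported).
prop_original <- original / sum( original )
prop_updated  <- updated / sum( updated )

print(
  data.frame(
    state,
    prop_original = round( prop_original, 4 ),
    prop_updated  = round( prop_updated, 4 ),
    difference    = round( prop_updated - prop_original, 4 )
  )
)
```

Under this renormalisation every state shifts by less than half a percentage point, which is consistent with the proportions being unchanged at two decimal places.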