jreps edited this page Sep 15, 2022 · 4 revisions

Characterization Package Specification

Introduction

Purpose

To create an R package that provides users who have a basic understanding of R and the OMOP CDM with a user-friendly R interface for running large-scale characterization studies.

Intended Audience

Users with basic R knowledge and basic knowledge of the OMOP CDM.

Intended Use

Perform characterization studies at scale for a set of cohort(s) and characterization specifications.

Scope

To perform aggregate feature extraction, time-to-event and dechallenge/rechallenge characterizations with an R interface.
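
To make the time-to-event part of this scope concrete, here is a minimal base R sketch of the idea. It is not the package's implementation, which is specified to run against a database. The toy data and the names target, outcome, targetOutcome and timeToEvent are hypothetical; only the columns subject_id and cohort_start_date follow the OMOP cohort table convention.

```r
# Hypothetical toy cohorts (illustration only, not package output).
target <- data.frame(
  subject_id = c(1, 2, 3),
  cohort_start_date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
)
outcome <- data.frame(
  subject_id = c(1, 2, 2, 4),
  cohort_start_date = as.Date(c("2020-01-11", "2020-01-22", "2020-02-06", "2020-05-01"))
)

# Pair each target entry with every outcome entry of the same subject.
targetOutcome <- merge(
  target, outcome,
  by = "subject_id",
  suffixes = c("_target", "_outcome")
)

# Days from target start to outcome start (negative means the outcome came first).
targetOutcome$time_to_event <- as.integer(
  targetOutcome$cohort_start_date_outcome - targetOutcome$cohort_start_date_target
)

# Aggregate: number of outcome events observed at each day offset.
timeToEvent <- aggregate(
  list(num_events = targetOutcome$subject_id),
  by = list(time_to_event = targetOutcome$time_to_event),
  FUN = length
)
print(timeToEvent)
```

Subjects without both a target and an outcome entry drop out of the join, so only aggregate counts per offset remain.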

The package must be able to generate, save and load json specifications (corresponding to the R analysis specification) and to save results both as csv files and into a database result schema.

It also requires shiny modules for interactively exploring results in a database (these will be put in OhdsiShinyModules) and rmarkdown templates for generating human-readable protocols for the specification.

Definitions and Acronyms

Overall Description

User Needs

Researchers will use the R package by specifying the types of characterization studies they wish to run and the analysis settings, and then providing the R connection to a database with OMOP CDM data together with the database settings.

Version 1.0.0 of the package will enable users to perform aggregate descriptions via FeatureExtraction, time-to-event and dechallenge/rechallenge studies for a set of user supplied cohorts and user specified analysis settings.
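
The dechallenge/rechallenge logic can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's algorithm. It assumes that timePostEvent is measured in days and that each person has already been reduced to one row with a first exposure era, a first outcome era and, if present, the next exposure start and outcome start. All column names and data are hypothetical.

```r
timePostEvent <- 30  # assumed to be days; 30 is the default named in the specification

# Hypothetical toy data, one row per person.
d <- data.frame(
  subject_id      = 1:4,
  exposure1_start = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01")),
  exposure1_end   = as.Date(c("2020-01-31", "2020-01-31", "2020-01-31", "2020-01-31")),
  outcome1_start  = as.Date(c("2020-01-10", "2020-01-10", "2020-01-10", NA)),
  outcome1_end    = as.Date(c("2020-02-05", "2020-02-05", "2020-06-01", NA)),
  exposure2_start = as.Date(c("2020-04-01", "2020-04-01", NA, NA)),
  outcome2_start  = as.Date(c("2020-04-08", NA, NA, NA))
)

# Treat comparisons involving a missing date as "did not happen".
isTrue <- function(x) !is.na(x) & x

# Outcome starts while the person is exposed.
d$outcome_during_exposure <- isTrue(
  d$outcome1_start >= d$exposure1_start & d$outcome1_start <= d$exposure1_end
)

# Dechallenge success: the outcome ends within the window after exposure stops.
d$dechallenge_success <- d$outcome_during_exposure &
  isTrue(d$outcome1_end <= d$exposure1_end + timePostEvent)

# Rechallenge attempt: the drug is restarted after a successful dechallenge.
d$rechallenge_attempt <- d$dechallenge_success &
  isTrue(d$exposure2_start > d$exposure1_end)

# Rechallenge fail: the outcome starts again within the window after restart.
d$rechallenge_fail <- d$rechallenge_attempt & isTrue(
  d$outcome2_start >= d$exposure2_start &
    d$outcome2_start <= d$exposure2_start + timePostEvent
)

# Aggregate counts per step.
counts <- colSums(d[, c("outcome_during_exposure", "dechallenge_success",
                        "rechallenge_attempt", "rechallenge_fail")])
print(counts)
```

Each flag builds on the previous one, so the counts can only shrink from step to step. A case series would list the rows with rechallenge_fail set to TRUE and report date offsets rather than dates.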

Assumptions and Dependencies

Assumptions:

  • Access to data in the OMOP CDM format
  • Database in a management system supported by OHDSI

Dependencies:

  • R (minimum version?) installed
  • Java runtime installed (until DBI is used in DatabaseConnector)
  • R packages installed:
    • FeatureExtraction
    • DatabaseConnector
    • SqlRender
    • OhdsiShinyModules
    • Andromeda
    • ParallelLogger

System Features and Requirements

Functional Requirements

The package needs functions to create study settings, create database results tables, execute each study type, generate plots or tables of results, and save results as csv files or to a database.

| Name | Description | Input | Output |
| --- | --- | --- | --- |
| createTimeToEventSettings | Create a settings object for the time-to-event analysis | targetIds, outcomeIds | List of class ‘timeToEventSettings’ with a pair grid of all target and outcome combinations |
| computeTimeToEventAnalyses | Executes the time-to-event analysis on a database based on the settings. The analysis calculates how often and when outcomes occur relative to a target cohort. | connectionDetails, target/outcome cohort database schema and table name, tempEmulationSchema, timeToEventSettings | Andromeda object with a table called ‘timeToEvent’ |
| saveTimeToEventAnalyses | Save the R time-to-event analyses object | Output of computeTimeToEventAnalyses | A compressed file with an sqlite database containing the time-to-event results |
| loadTimeToEventAnalyses | Creates an R time-to-event analyses object from a saved time-to-event file | Location with the folder ‘TimeToEvent’ | Andromeda object with table ‘timeToEvent’ |
| createDechallengeRechallengeSettings | Create a settings object for the dechallenge rechallenge analysis | targetIds, outcomeIds, timePostEvent (time parameter; defaults to 30) | List of class ‘dechallengeRechallengeSettings’ containing targetCohortDefinitionIds, outcomeCohortDefinitionIds and timePostEvent |
| computeDechallengeRechallengeAnalyses | Executes the dechallenge rechallenge analysis on a database based on the settings. This calculates how often users of a drug have an outcome during drug exposure and how often the outcome stops when the drug exposure stops (dechallenge). Then, for those whose outcome during drug exposure stopped when the drug stopped, it counts how often the drug is restarted (rechallenge) and how often the outcome starts again after restarting the drug. | connectionDetails, target/outcome cohort database schema and table name, tempEmulationSchema, dechallengeRechallengeSettings | Andromeda object with table ‘dechallengeRechallenge’ |
| computeRechallengeFailCaseSeriesAnalyses | Executes the rechallenge fail case series on a database based on the settings. This finds all the patients (the ids can be anonymized) who had a dechallenge and rechallenge such that the outcome stopped when the drug stopped and restarted when the drug restarted (indicating causality). Timings between drug and outcome dates are given as offsets (no dates are returned). | connectionDetails, target/outcome cohort database schema and table name, tempEmulationSchema, dechallengeRechallengeSettings, sensitive = T | Andromeda object with table ‘rechallengeFailCaseSeries’ |
| saveDechallengeRechallengeAnalyses | Save the R dechallenge rechallenge object | Output of computeDechallengeRechallengeAnalyses | A zipped file with an sqlite database containing the DECHALLENGE_RECHALLENGE table |
| saveRechallengeFailAnalyses | Save the rechallenge fail case series object | Output of computeRechallengeFailCaseSeriesAnalyses | A zipped file with an sqlite database containing the RECHALLENGE_FAIL_CASE_SERIES table |
| loadDechallengeRechallengeAnalyses | Creates an R dechallenge rechallenge object from a saved dechallenge rechallenge file | Location with the csv file | Andromeda object with table ‘dechallengeRechallenge’ |
| loadRechallengeFailAnalyses | Creates an R rechallenge fail case series object from a saved rechallenge fail case series file | Location with the csv file | Andromeda object with table ‘rechallengeFailCaseSeries’ |
| createAggregateCovariateSettings | Create a settings object for the aggregate covariate analysis | targetIds, outcomeIds, covariateSettings | List of class ‘aggregateCovariateSettings’ with targetIds, outcomeIds and covariateSettings |
| computeAggregateCovariateAnalyses | Executes the aggregate covariate analysis on a database based on the settings. This creates the following cohort subsets of the target and outcome cohorts: target with outcome during time-at-risk, target without outcome during time-at-risk, outcome with target in prior time-at-risk, target and outcome. For all cohorts, FeatureExtraction is run based on user-specified settings to generate aggregate features. | connectionDetails, cohort database schema and table name, tempEmulationSchema, aggregateCovariateSettings | Andromeda object containing continuous and binary feature summary for all combinations of TnO, TnOc, T, O |
| saveAggregateCovariateAnalyses | Save the aggregate covariate object | Output of computeAggregateCovariateAnalyses | Two csv files: covariates_continuous.csv and covariates.csv |
| loadAggregateCovariateAnalyses | Creates an R aggregate covariate object from a saved aggregate covariate file | Location with the folder ‘AggregateCovariate’ | Andromeda object containing continuous and binary feature summary |
| createCharacterizationSettings | Combine multiple time-to-event, dechallenge rechallenge and aggregate covariate settings into a characterization settings object | List of timeToEventSettings, list of dechallengeRechallengeSettings, list of aggregateCovariateSettings | A list of class ‘characterizationSettings’ |
| runCharacterizationAnalyses | Executes all the analyses specified in the characterization settings on a database. Results will be in Andromeda tables but can also be saved to csv files. | connectionDetails, target/outcome cohort database schema and table name, tempEmulationSchema, characterizationSettings | Andromeda object with tables for each type of analysis run |
| saveCharacterizationSettings | Save all the time-to-event, dechallenge rechallenge and aggregate covariate settings to a json file | characterizationSettings | Saves as a json file |
| loadCharacterizationSettings | Loads all the time-to-event, dechallenge rechallenge and aggregate covariate settings into an R list from a json file | Location of json file | R list of class ‘characterizationSettings’ |
| plotTimeToEvent | Plots the time-to-event results | Result object | ggplot |
| plotDechallengeRechallenge | Plots the dechallenge rechallenge results | Result object | ggplot |
| viewAggregateCovariate | Displays a table with the aggregate covariate results | Result object, Type = continuous or binary, two or more cohorts (join) | View table + standardized diff |
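
The table specifies that createTimeToEventSettings returns a list of class ‘timeToEventSettings’ holding a pair grid of all target and outcome combinations. A minimal base R sketch of that behaviour could look like the following. The function createTimeToEventSettingsSketch is a hypothetical stand-in, not the package function; the element names and the validation are assumptions.

```r
# Hypothetical stand-in for the specified createTimeToEventSettings();
# the real function may differ in argument checks and return structure.
createTimeToEventSettingsSketch <- function(targetIds, outcomeIds) {
  stopifnot(is.numeric(targetIds), is.numeric(outcomeIds))

  # expand.grid() yields one row per target/outcome combination.
  grid <- expand.grid(targetId = targetIds, outcomeId = outcomeIds)

  settings <- list(
    targetIds = grid$targetId,
    outcomeIds = grid$outcomeId
  )
  class(settings) <- "timeToEventSettings"
  settings
}

settings <- createTimeToEventSettingsSketch(
  targetIds = c(1, 2),
  outcomeIds = c(10, 11, 12)
)

# Two targets and three outcomes give six target/outcome pairs.
length(settings$targetIds)
```

Storing the expanded pairs rather than the two input vectors keeps each analysis unit explicit, which may help when settings are later written to a json specification.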

OhdsiShinyModules will contain the shiny modules for viewing the results and the rmarkdown templates for generating the protocol.
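
The results view for aggregate covariates is specified to show a standardized diff between cohorts. The sketch below shows one common definition, the difference in means divided by the pooled standard deviation sqrt((sd1^2 + sd2^2) / 2). Whether the package uses exactly this formula is an assumption, and the function names, column names and data are hypothetical.

```r
# Standardized difference of means using the pooled standard deviation.
standardizedDifference <- function(mean1, sd1, mean2, sd2) {
  pooledSd <- sqrt((sd1^2 + sd2^2) / 2)
  # Return 0 when the covariate has no variation in either cohort.
  ifelse(pooledSd == 0, 0, (mean1 - mean2) / pooledSd)
}

# For a binary covariate the mean is a proportion p and the sd is sqrt(p * (1 - p)).
binarySd <- function(p) sqrt(p * (1 - p))

# Hypothetical aggregate results for two cohorts.
cohortA <- data.frame(covariateId = c(1, 2), averageValue = c(0.50, 0.10))
cohortB <- data.frame(covariateId = c(1, 2), averageValue = c(0.50, 0.30))

# Join the cohorts on covariateId, as in the "two or more cohorts (join)" input.
joined <- merge(cohortA, cohortB, by = "covariateId", suffixes = c("A", "B"))

joined$standardizedDiff <- standardizedDifference(
  joined$averageValueA, binarySd(joined$averageValueA),
  joined$averageValueB, binarySd(joined$averageValueB)
)
print(joined)
```

Covariate 1 has the same prevalence in both cohorts, so its standardized difference is 0; covariate 2 differs by 0.2 in prevalence and gets a negative value because it is rarer in cohort A.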

External Interface Requirements

An interface such as ATLAS may be used to generate the json specification for the characterization execute(). CohortGenerator (or manual SQL execution) will be required to create the cohorts used in this package. Cohort creation is out of scope in this package.

A module wrapper around the Characterization package will be created to run characterization inside Strategus. It will contain a script for executing the characterization given a valid json spec, code for exporting results to csv, and the renv containing the package and its dependencies.

System Features

Mac/Windows/Linux with R installed

Nonfunctional Requirements

  1. Performance: Code needs to run efficiently at scale (thousands of cohort and study setting combinations).
  2. Quality: Unit tests must pass and cover more than 80% of the code.