---
title: "Using CohortIncidence"
author: "Christopher Knoll"
date: "`r Sys.Date()`"
output:
  pdf_document:
    number_sections: yes
    toc: yes
  html_document:
    number_sections: yes
    toc: yes
vignette: >
  %\VignetteIndexEntry{Using CohortIncidence}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  error = FALSE,
  comment = "#>"
)
```
```{r, setup, echo=FALSE}
library(CohortIncidence)
```
# Introduction
This vignette describes how to use the `CohortIncidence` package to perform a single incidence rate analysis for a given target and outcome cohort, with a few settings for Time At Risk and Clean Window.
# Installation instructions
Before installing the `CohortIncidence` package make sure you have Java available. Java can be downloaded from [www.java.com](http://www.java.com). For Windows users, RTools is also necessary. RTools can be downloaded from [CRAN](http://cran.r-project.org/bin/windows/Rtools/).
The `CohortIncidence` package is currently maintained in a [Github repository](https://github.com/OHDSI/CohortIncidence).
```{r eval=FALSE}
install.packages("remotes")
remotes::install_github("ohdsi/CohortIncidence")
```
Once installed, you can type `library(CohortIncidence)` to load the package.
# Database Preparation
The analysis SQL assumes a final results table: `@results_database_schema.incidence_summary`. The DDL for this table can be fetched from the package as follows:
```{r, comment = ""}
# Fetch DDL from package
ddl <- CohortIncidence::getResultsDdl()
cat(ddl)
```
Using `SqlRender` and `DatabaseConnector`, you can execute the above on your target database platform in order to deploy the table. Remember to replace `@schemaName` with the appropriate schema. You can also 'hack' the `@schemaName` parameter to apply a prefix by rendering the SqlRender parameter `schemaName.incidence_summary` to a target table, e.g. `mySchema.prefix_incidence_summary`. Using the same parameter name/value in `buildQuery()` allows you to prefix the result table name instead of having to declare a separate schema.
```{r, eval = FALSE}
connectionDetails <- DatabaseConnector::createConnectionDetails(dbms = "postgresql",
                                                                server = Sys.getenv("server"),
                                                                port = Sys.getenv("port"))
# to specify the target schema (the typical use case):
ddl <- SqlRender::render(CohortIncidence::getResultsDdl(), schemaName = "mySchema")
# a work-around to provide a prefix to the result table, in case creating new schema is restricted
ddlPrefix <- SqlRender::render(CohortIncidence::getResultsDdl(), "schemaName.incidence_summary" = "mySchema.prefix_incidence_summary")
con <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::executeSql(con, ddl)
DatabaseConnector::disconnect(con)
```
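The same prefix work-around carries over to the analysis SQL: supplying the identical parameter name/value when rendering the output of `buildQuery()` routes the results into the prefixed table. A minimal sketch, assuming the `irDesign` and `buildOptions` objects constructed later in this vignette:

```{r, eval = FALSE}
# Sketch: route the analysis output to the prefixed results table
analysisSql <- CohortIncidence::buildQuery(incidenceDesign = as.character(irDesign$asJSON()),
                                           buildOptions = buildOptions)
analysisSql <- SqlRender::render(analysisSql,
                                 "schemaName.incidence_summary" = "mySchema.prefix_incidence_summary")
```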
# A simple example
This example will create a CohortIncidence design containing a single target, outcome, and time at risk.
## Build the design
The following script builds a single T, O, and Time at Risk definition, and assembles those elements into a design. Finally, the resulting JSON is printed.
```{r}
t1 <- CohortIncidence::createCohortRef(id = 1, name = "Target cohort 1")
o1 <- CohortIncidence::createOutcomeDef(id = 1,
                                        name = "Outcome 1, 30d Clean",
                                        cohortId = 2,
                                        cleanWindow = 30)
tar1 <- CohortIncidence::createTimeAtRiskDef(id = 1,
                                             startWith = "start",
                                             endWith = "end",
                                             endOffset = 30)
# Note: c() is used when dealing with an array of numbers,
# later we use list() when dealing with an array of objects
analysis1 <- CohortIncidence::createIncidenceAnalysis(targets = c(t1$id),
                                                      outcomes = c(o1$id),
                                                      tars = c(tar1$id))
subgroup1 <- CohortIncidence::createCohortSubgroup(id=1, name="Subgroup 1", cohortRef = createCohortRef(id=300))
# Create Design (note use of list() here):
irDesign <- CohortIncidence::createIncidenceDesign(targetDefs = list(t1),
                                                   outcomeDefs = list(o1),
                                                   tars = list(tar1),
                                                   analysisList = list(analysis1),
                                                   subgroups = list(subgroup1))
# Render the design as JSON
irDesign$asJSON(pretty = T)
```
## Using age, gender and start year strata
The IR design can also include settings specifying whether the analysis should be stratified by age, gender, or start year (or any combination of those choices).
To use this feature, create the strata settings with the `CohortIncidence::createStrataSettings()` function:
```{r}
irDesignWithStrata <- CohortIncidence::createIncidenceDesign(
  targetDefs = list(t1),
  outcomeDefs = list(o1),
  tars = list(tar1),
  analysisList = list(analysis1),
  subgroups = list(subgroup1),
  # add by-age and by-gender strata, but don't stratify by start year
  strataSettings = CohortIncidence::createStrataSettings(byGender = TRUE,
                                                         byAge = TRUE,
                                                         ageBreaks = c(17, 34, 65)))
```
## Using executeAnalysis()
If there is no need to see the analysis SQL or to direct the output to a permanent table, the `executeAnalysis()` function
can be used to perform the cohort incidence analysis through a simple API:
```{r, eval=FALSE}
buildOptions <- CohortIncidence::buildOptions(cohortTable = "demoCohortSchema.cohort",
                                              cdmDatabaseSchema = "mycdm",
                                              sourceName = "mysource",
                                              refId = 1)
executeResults <- CohortIncidence::executeAnalysis(connectionDetails = connectionDetails,
                                                   incidenceDesign = irDesign,
                                                   buildOptions = buildOptions)
```
The results will contain the same table structure as the results schema `incidence_summary`:
|Name |Description|
|-----|--------|
|REF_ID|The reference id specified in buildOptions() to track results to the analysis execution.|
|SOURCE_NAME|The name of the source for these results|
|TARGET_COHORT_DEFINITION_ID|The cohort ID of the target population|
|TARGET_NAME|The name of the target cohort|
|TAR_ID|The TAR identifier|
|TAR_START_WITH|Indicates if the TAR starts with the 'start' or 'end' of the target cohort episode|
|TAR_START_OFFSET|The days added to the date field specified in TAR_START_WITH|
|TAR_END_WITH|Indicates if the TAR ends with the 'start' or 'end' of the target cohort episode|
|TAR_END_OFFSET|The days added to the date field specified in TAR_END_WITH|
|SUBGROUP_ID|The subgroup identifier|
|SUBGROUP_NAME|The name of the subgroup|
|OUTCOME_ID|The outcome identifier|
|OUTCOME_COHORT_DEFINITION_ID|The cohort ID of the outcome population|
|OUTCOME_NAME|The outcome name|
|CLEAN_WINDOW|The clean window for this outcome definition|
|AGE_ID|The age ID for this strata representing the age band specified in the strata settings|
|AGE_GROUP_NAME|The name for this age group|
|GENDER_ID| The gender concept ID for this gender strata|
|GENDER_NAME| The name of the gender|
|START_YEAR|The year strata, defined by using the year the TAR started|
|PERSONS_AT_RISK_PE|Distinct persons at risk before removing excluded time from TAR|
|PERSONS_AT_RISK|Distinct persons at risk after removing excluded time from TAR. A person must have at least 1 day TAR to be included.|
|PERSON_DAYS_PE|Total TAR (in days) before excluded time was removed from TAR.|
|PERSON_DAYS|Total TAR (in days) after excluded time was removed from TAR.|
|PERSON_OUTCOMES_PE|Distinct persons with outcome before removing excluded time from TAR|
|PERSON_OUTCOMES|Distinct persons with outcome after removing excluded time from TAR. A person must have at least 1 day TAR to be included.|
|OUTCOMES_PE|Number of cases before excluding TAR.|
|OUTCOMES|Number of cases after excluding TAR.|
|INCIDENCE_PROPORTION_P100P|The incidence proportion (per 100 persons), calculated as person_outcomes / persons_at_risk * 100|
|INCIDENCE_RATE_P100PY|The incidence rate (per 100 person-years), calculated as outcomes / (person_days / 365.25) * 100|
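As a sanity check, the two summary statistics can be recomputed directly from the count columns. The figures below are made-up illustrative values, not output from a real analysis:

```{r}
# Hypothetical counts, as they might appear in one row of incidence_summary
personsAtRisk  <- 1200
personOutcomes <- 20
personDays     <- 150000
outcomes       <- 25

# Incidence proportion per 100 persons
incidenceProportion <- personOutcomes / personsAtRisk * 100

# Incidence rate per 100 person-years: convert person-days to person-years first
incidenceRate <- outcomes / (personDays / 365.25) * 100

round(incidenceProportion, 2)  # 1.67
round(incidenceRate, 2)        # 6.09
```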
# Advanced Usage: Building and Executing SQL Manually
There may be reasons (debugging or special processing steps) why you would want to access the analysis SQL before execution.
The following sections describe how to fetch the SQL, then translate and execute the statements.
## Build analysis SQL from design
From the previous design, the `CohortIncidence::buildQuery()` method is used to generate the analysis SQL:
```{r}
buildOptions <- CohortIncidence::buildOptions(cohortTable = "demoCohortSchema.cohort",
                                              cdmDatabaseSchema = "mycdm",
                                              resultsDatabaseSchema = "myresults",
                                              sourceName = "mysource",
                                              refId = 1)
analysisSql <- CohortIncidence::buildQuery(incidenceDesign = as.character(irDesign$asJSON()),
                                           buildOptions = buildOptions)
cat(analysisSql)
```
## Render SQL with parameters and execute
With the previous analysis design and options used to generate `analysisSql`, the next step is to render the SQL to supply any remaining parameters, translate it, and execute it on the database:
```{r, eval=FALSE}
# if you didn't pass sourceName to buildOptions(), you can render it here
analysisSql <- SqlRender::render(analysisSql, "sourceName" = "OptumDOD")
analysisSql <- SqlRender::translate(analysisSql, "postgresql")
cat(analysisSql)
conn <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::executeSql(conn, paste0("DELETE FROM myresults.incidence_summary WHERE ref_id = ", buildOptions$refId$intValue()))
DatabaseConnector::executeSql(conn, analysisSql)
DatabaseConnector::disconnect(conn)
```
## Using Temp Tables
Sometimes it is not convenient or possible to create dedicated tables to store the results. Instead, the `useTempTables` option can be used to
place the incidence results into a temp table, `#incidence_summary`, where they can be ETL'd to another table or exported to a CSV.
The following example demonstrates the additional steps that are necessary when using temp tables:
```{r, eval=FALSE}
# given the prior irDesign constructed from the previous example
buildOptions <- CohortIncidence::buildOptions(cohortTable = "demoCohortSchema.cohort",
                                              cdmDatabaseSchema = "mycdm",
                                              sourceName = "mysource",
                                              useTempTables = TRUE,
                                              refId = 2)
analysisSql <- CohortIncidence::buildQuery(incidenceDesign = as.character(irDesign$asJSON()),
                                           buildOptions = buildOptions)
analysisSql <- SqlRender::translate(analysisSql, "postgresql")
# if we are using temp tables, the steps to execute the analysis are
# 1) create result temp tables
# 2) execute the analysis query, placing the results into the temp table incidence_summary
# 3) Extract/copy the results from the temp tables
# 4) clean up temp tables
conn <- DatabaseConnector::connect(connectionDetails)
tempDDL <- SqlRender::translate(CohortIncidence::getResultsDdl(useTempTables=T), "postgresql")
DatabaseConnector::executeSql(conn, tempDDL)
DatabaseConnector::executeSql(conn, analysisSql)
# In this example, copy to a permanent table from the temp table, but the results could be downloaded to CSV
exportSql <- SqlRender::translate("insert into mySchema.prefix_incidence_summary select * from #incidence_summary", "postgresql")
DatabaseConnector::executeSql(conn, exportSql)
# or download the results to a dataframe
results <- DatabaseConnector::querySql(conn, SqlRender::translate("select * from #incidence_summary", "postgresql"))
# use the getCleanupSql to fetch the DROP TABLE expressions for the tables that were created in tempDDL.
cleanupSql <- SqlRender::translate(CohortIncidence::getCleanupSql(useTempTables=T), "postgresql")
DatabaseConnector::executeSql(conn, cleanupSql)
DatabaseConnector::disconnect(conn)
```