-
Notifications
You must be signed in to change notification settings - Fork 0
/
harmonization.Rmd
435 lines (330 loc) · 28.1 KB
/
harmonization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
---
title: "Harmonization Information"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Harmonization Information}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r load-data, echo = FALSE}
#library(public.ctn0094data)
```
One of the objectives for CTN-0094 was to harmonize data from three clinical trials, ctn_0027, ctn_0030, and ctn_0051. This vignette describes harmonization details and the identification/fixing of problematic values in the trial data. Every dataset in this package has its own documentation. To help protect the anonymity of the study participants two steps were taken. First, the study site information was modified (see the documentation for [site_masked](../html/site_masked.html) for details). Second, all dates have been replace by the number of days relative to study consent. Therefore, some information, like the day of drug use in the month before enrollment is stored as negative numbers. Below, you will find additional details on the harmonization process. Section headings correspond to data sets.
# The `all_drugs` dataset {#all_drugs}
The `all_drugs` dataset is an agglomeration of all self-reported drugs, drugs found in urine drug screening and alcohol screening in ctn_0027, ctn_0030, and ctn_0051. This data is the result of extensive preprocessing of free text to harmonize drug names, but drugs were not collapsed into groups. For example, the many descriptions, abbreviations, and spellings of variants of suboxone (e.g., “street suboxone”, “bup/nx”, “buxnx”, “bupnx”, “pbupnx”, “bupxx”) were harmonized into a single “suboxone” group but suboxone was not collapsed with other buprenorphine formulations.
While there were **many** spellings and text variants (including mg and location where drug was administered), the list in [Table 1](#tab:final) summarizes the additional text and the changes were made to the free text entries. Many free text entries included the combination of two or more drugs. In these cases, a record was created for each drug. For example, the free text entry of 'Amitiptyline & Trazadone' (literal incorrect spelling from record used here for example) was converted two records: 'Tryclic-antidepressant' and 'Antidepressant'.
<p id="tab:final", style="color:gray;">
Table 1: Recoded Free Text Descriptions of Drugs.
</p>
| Original Text | Final Text |
|:---------------------------|:------------------------------------------------|
| 'Acid' | 'Hallucinogen' |
| 'Adderall' | 'Amphetamine' |
| 'Ambien' | 'Sedative-Hypnotic' |
| 'Amitiptyline & Trazadone' | 'Tryclic-antidepressant' & 'Antidepressant' |
| 'Angel Dust' | 'PCP' |
| 'Ativan' | 'Benzodiazepine' |
| 'Baclofen' | 'Muscle Relaxant' |
| 'Bath Salts' | 'Cathinones' |
| 'Bup/Nx & Tramadol' | 'Suboxone' & 'Tramadol' |
| 'Cannabinoids' | 'THC' |
| 'Carisoprodol' | 'Muscle Relaxant' |
| 'Darvocet' | 'Propoxyphene' & 'Acetaminophen' |
| 'DXM' | 'Dextromethorphan' |
| 'Dust' | 'PCP' |
| 'Ecstasy' | 'MDMA' |
| 'Fioricet' | 'Barbiturate' & 'Caffeine' |
| 'Flexeral' | 'Muscle Relaxant' |
| 'Hallucinogens inc mdma' | 'Hallucinogen' & 'MDMA' |
| 'Heroin/Opium' | 'Heroin' & 'Opium' |
| 'Hydroxyzin' | 'Antihistamine' |
| 'Keflex' | 'Antibiotic' |
| 'Klonopin' | 'Benzodiazepine' |
| 'Librium Detox' | 'Benzodiazepine' |
| 'LSD' | 'Hallucinogen' |
| 'Lunesta' | 'Sedative-Hypnotic' |
| 'Marijuana' | 'THC' |
| 'Meth' | 'Methamphetamine' |
| 'Ms Contin' | 'Morphine' |
| 'Mushroom' | 'Hallucinogen' |
| 'Neurotin' | 'Gabapentin' |
| 'Norco' | 'Hydrocodone' & 'Acetaminophen' |
| 'Participant Was Unsure Whether She Took A Percocet Or A Vicodin' | 'Opioid' |
| 'Penicilin (Not Ppt Rx)' | 'Antibiotic' |
| 'Penicillin - Mushrooms (Psilocybin)' | 'Hallucinogen' |
| 'Soma Codeine' | 'Muscle Relaxant' & 'Codeine' |
| 'Somas' | 'Muscle Relaxant' |
| 'Percocets And Vicodin' | 'Oxycodone' & 'Hydrocodone' |
| 'Percoset' | 'Hydrocodone' |
| 'Phenagrin' | 'Antiemetic' |
| 'Phenergran With Codeine' | 'Antiemetic' & 'Codeine' |
| 'Phenobarbi' | 'Barbiturate' |
| 'Promethazine, Clonidine' | 'Antiemetic' & 'Clonidine' |
| 'Quetiapine' | 'Antipsychotic' |
| 'Remeron' | 'Antidepressant' |
| 'Ritalin' | 'Methylphenidate' |
| 'Seroquel' | 'Antipsychotic' |
| 'Sleeping Pill' | 'Sedative-Hypnotic' |
| 'Snow' | 'Cathinones' |
| 'Speed' | 'Amphetamine' |
| 'Speed Ball' | 'Heroin' & 'Cocaine' |
| 'Spice | 'K2' |
| 'Subutex' | 'Buprenorphine' |
| 'Sudafed' | 'Pseuedoephidrine' |
| 'Tranquilizers' | 'Sedative-Hypnotic' |
| 'Trazodone(Desyrel)' | 'Trazodone' |
| 'Tylonol 3' | 'Codeine' & 'Acetaminophen' |
| 'Tylenol PM' | 'Acetaminophen' & 'Benadryl' |
| 'Ultracet' | 'Tramadol' & 'Acetaminophen' |
| 'Valium' | 'Benzodiazepine' |
| 'Vicodin' | 'Hydrocodone' |
| 'Vistaril' | 'Antihistamine' |
| 'Wet (Pcp)' | 'PCP' |
| 'Zolpidem' | 'Sedative-Hypnotic' |
The timeline-followback (TLFB) data had many dozens of typos in dates., nearly all of which could be fixed by looking at form completion dates and the dates before and after the problematic records. In ctn_0030, out of nearly 40 problematic dates, only one could not be fixed and the record was dropped. ctn_0051 stored TLFB data in two files. One file contained start and stop dates for the baseline TLFB assessment. The second contained the results for each day. These files were occasionally inconsistent. Five start/stop records were modified based on the randomization data and the dated data.
Urine drug screening (UDS) records also had many date problems. In ctn_0027, approximately 100 records had UDS screening dates that were problematic. There were a dozen problematic dates in ctn_0030. All could be unambiguously fixed using other dated records and form date stamps.
Self-reported drug information in the TLFB in ctn_0027 allowed for free-text entry of any substance. The TLFB for ctn_0030 and ctn_0051 used structured questions to assess the use of alcohol and drugs. Specifically, the TLFB for ctn_0030 only checked for these drugs listed in [Table 2](#ctn3051tlfb). It allowed for ***free-text entry of only other opiates***, all other abused substances are unknown. Frequently appearing "other opiates" included "Suboxone", "Buprenorphine", "Darvocet" and "Fentanyl." CTN-51 used the more comprehensive set of drugs listed in [Table 2](#ctn3051tlfb) and it allowed for up to two additional drugs per day. Frequently occurring free text drugs from ctn_0051 included: Fioricet, Adderall, Baclofen, K2/Spice, Codeine, Fentanyl, Kratom, Bath Salts, Gabapentin, PCP, and Ambien.
<p id="tab:ctn3051tlfb", style="color:gray;">
Table 2: Drugs Assessed by ctn_0030 and ctn_0051 Timeline Followback Questionnaires.
</p>
| substance | ctn_0030 | ctn_0051 |
|:----------------------|:-------------|:----------------------|
| Alcohol | Yes | Yes (Standard Drinks) |
| Amphetamine | Yes | Yes |
| Buprenorphine | No | Yes |
| Ecstasy | No | Yes |
| Sedative Barbiturates | No | Yes |
| Sedatives other than Benzodiazepines | Yes | No |
| Benzodiazepines | Yes | Yes |
| Cannabinoids (THC) | Yes | Yes |
| Cocaine | Yes | Yes |
| Crack | No | Yes |
| Inhalants | No | Yes |
| Methadone | Yes | Yes |
| Methamphetamine | Yes | No |
| Opioid Analgesics | No | Yes |
| Heroin | Heroin/Opium | Yes |
| Morphine | Yes | No |
| Hydromorphone | Yes | No |
| Codeine | Yes | No |
| Oxycodone | Yes | No |
| Hydrocodone | Yes | No |
| Propoxyphene | Yes | No |
| Other Opiates | Yes | No |
| Other Drug 1 | No | Yes |
| Other Drug 2 | No | Yes |
A few participants (who = 116,166, 250, 934, 1331, 3325) had some TLFB data after the last date with treatment drug which included their treatment medication (buprenorphine).
See [all_drugs](../html/all_drugs.html) for additional details/information.
### Buprenorphine Details
ctn_0051 consistently gathered self-reported drug use and urine drug screening (UDS) data on buprenorphine even after it was prescribed, but ctn_0027 and ctn_0030 did not. In the rare cases where a subject in ctn_0027 and ctn_0030 self-reported taking the drugs prescribed as part of the trial, those records were _left in the dataset_. Analysts should <mark>proceed with caution</mark> because it is unclear if these are data entry issues or if people were supplementing their prescribed drugs. In ctn_0027 there were no self-reports of buprenorphine but 16 people self-reported their methadone at least once after it was prescribed for them. This accounted for only 112 out of more than 100,000 self-reported drug events in ctn_0027. In ctn_0030, suboxone use was self-reported, after prescription, by six people (for eight problematic records out of tens of thousands of drug use events).
ctn_0027 did not include buprenorphine in its UDS, ctn_0030 had scheduled screenings for it (in phase 1 at the week 10 and 12/final visits and in phase 2 at the week 22 and 24/final visits) and ctn_0051 consistently checked for it. There are many buprenorphine UDS screenings in ctn_0030 that were not in week 10, 12, 22 or 24 (N = `r 781-171-169-40-16-54-291`). Nearly always, these seem to be duplicates of the "final visit" records. Analysts should <mark>proceed with caution</mark> when looking at UDS records for buprenorphine in ctn_0030.
### Alcohol
The timeline followback data for ctn_0027 included free text descriptions of the number of alcoholic beverages consumed on a day. Free text included entries that ran the gamut from casual drinking (e.g., "1/2 beer for birthday"), through heavy drinking (e.g., "3 16oz beer, 1 shot hard alcohol, 1 mixed drink"), out to dangerous quantities (e.g., "6pk beer & 1/2 gal. rum"). All entries were converted to standard drinks using information at the [National Institute on Alcohol Abuse and Alcoholism](https://www.niaaa.nih.gov/alcohols-effects-health/overview-alcohol-consumption/what-standard-drink), Wikipedia. Bartending references were used to [estimate the number of shots contained in larger containers](https://www.thespruceeats.com/how-many-shots-in-a-bottle-761232). Ambiguous entries like "many glasses of wine" were coded as five standard drinks. For women, between half a standard drink to less than four standard drinks was considered light drinking for a day. For men, half a standard drink to less than five was considered light drinking for a day. In ctn_0027 less than eight people were marked as drinking something on a particular day but no details were provided. The drinking records for each person were reviewed. Four people had _no other of alcohol use_, so these records were dropped. Three others had a history of light alcohol use so problematic days were marked as light drinking and one person had an unambiguous history of very heavy drinking, therefore the unknown day was marked as heavy drinking.
See [all_drugs](../html/all_drugs.html) for additional details/information.
# The `asi` dataset
See [asi](../html/asi.html) for details/information.
# The `days` dataset
See [days](../html/days.html) for details/information.
# The `demographics` dataset
See [demographics](../html/demographics.html) for details/information.
# The `detox` dataset
See [detox](../html/detox.html) for details/information.
# The `everybody` dataset
See [everybody](../html/everybody.html) for details/information.
# The `fagerstrom` dataset
Tobacco use was not assessed in ctn_0027. ctn_0030 subjects are scored as being smokers or nonsmokers (in the past 30 days). ctn_0051 assessed current smoking and the Fagerstrom Test For Nicotine Dependence Score.
See [fagerstrom](../html/fagerstrom.html) for additional details/information.
# The `first_survey` dataset
See [first_survey](../html/first_survey.html) for details/information.
# The `pain` dataset {#pain}
Baseline pain was assessed using the SF-36 in ctn_0027 and ctn_0030 and using the EuroQoL in CTN-51. SF-36 responses to the question "How much bodily pain have you had during the past 4 weeks?" were aggregated into three categories "No Pain", "Very mild to Moderate Pain", "Severe Pain". The EuroQoL as ask respondents to rank "Pain/discomfort" in one of three categories. These levels were labeled using the same categories described above.
| SF-36 Original Response | Grouped Response |
|:------------------------|:---------------------------|
| None | None |
| Very mild | Very mild to Moderate Pain |
| Mild | Very mild to Moderate Pain |
| Moderate | Very mild to Moderate Pain |
| Severe | Severe Pain |
| Very Severe | Severe Pain |
| EuroQoL Original Response | Grouped Response |
|:-----------------------------------|:---------------------------|
| I have no pain or discomfort | None |
| I have moderate pain or discomfort | Very mild to Moderate Pain |
| I have extreme pain or discomfort | Severe Pain |
# The `psychiatric` dataset {#psychiatric}
The medical history assessment for all three trails included: schizophrenia, depression, bipolar disorder, anxiety (anxiety was grouped with panic disorder in ctn_0027 and ctn_0030), brain damage, and epilepsy.
While ctn_0027 and ctn_0030 gathered psychiatric symptoms using DSM-4 criteria, ctn_0051 used DSM-5 criteria. CTN-027 checked for diagnosis of dependency on opiates, alcohol, amphetamines, cannabis, cocaine, sedatives, benzodiazepines, and dependence on other depressants, or dependence on other stimulants. ctn_0030 only scored people as having a diagnosis of dependency on opiates.
# The `randomization` dataset
See [randomization](../html/randomization.html) for details/information.
# The `rbs` dataset {#rbs}
ctn_0027 and ctn_0030 assessed drug use as the count of days out of the last 30 that drugs (cocaine, heroin alone, speedball, opiate, amphetamine) were used ctn_0051 assessed the number of days of drug use on an ordinal scale which was converted to number of days. The conversion is show in [Table 3](#tab:ivdays).
<p id="tab:ivdays", style="color:gray;">
Table 3: Estimated Days of Drug Use Based on ctn_0051 Categories
</p>
| Reported amount | Days of Use Per Month |
|:----------------------|:----------------------|
| Not at all | 0 |
| A few times | 4 |
| A few times each week | 14 |
| Every day | 30 |
# The `rbs_iv` dataset
See [rbs_iv](../html/rbs_iv.html) for details/information.
# The `screening_date` dataset
See [screening_date](../html/screening_date.html) for details/information.
# The `sex` dataset
See [sex](../html/sex.html) for details/information.
# The `tlfb` dataset {#tlfb}
These are drugs that were self-reported. Some of the drugs listed in the [all_drugs](#all_drugs) file have been grouped. Note the "medical use" opioids are grouped as "Opioid" but "Heroin" and "Opium" are grouped together as "Heroin".
<p id="tab:tlfb", style="color:gray;">
Table 4: Drug Groupings Used in the tflb File.
</p>
| _all_drugs_ Description | _tlfb_ After Grouping |
|:------------------------|:----------------------|
| Acetaminophen | Analgesic |
| Amphetamine | Amphetamine |
| Barbiturate | Sedatives |
| Codeine | Opioid |
| Crack | Cocaine |
| Fentanyl | Opioid |
| Heroin | Opioid |
| Gabapentin | Analgesic |
| Hydrocodone | Opioid |
| Hydromorphone | Opioid |
| Mdma | MDMA/hallucinogen |
| Merperidine | Opioid |
| Methamphetamine | Amphetamine |
| Morphine | Opioid |
| Muscle Relaxant | Analgesic |
| Nalbuphine | Analgesic |
| Opium | Heroin |
| Oxycodone | Opioid |
| Oxymorphone | Opioid |
| Propoxyphene | Opioid |
| Suboxone | Buprenorphine |
| Sedative-Hypnotic | Sedatives |
| Thc | Thc |
| Tramadol | Opioid |
| Trazodone | Antidepressant |
| Tryclic-Antidepressant | Antidepressant |
A few participants (who = 116, 166, 250, 934, 1331, 3325) had some TLFB data after the last date on the dispensed study drug which included their treatment medication (buprenorphine). All treatment drug records have been removed from the `tlfb` file but they remain in `all_drugs`.
See the section [all_drugs](#all_drugs) for additional details.
# The `treatment` dataset
The date information for administration of treatment drugs required _extensive_ processing to find and fix problematic dates. Many algorithms were used to identify problems. These included:
1) treatment dates far before randomization
2) identifying dates past the end of the study period
3) duplicate dates
When such problematic dates were identified, the data were manually reviewed to try to find gaps in the medication history. Nearly always the values were found to be typos in the year and month. In these cases, the dates were fixed. ctn_0027 had more than 250 such typos and all but 10 could be fixed unambiguously. ctn_0030 had approximately two dozen such typos, two of which could not be fixed. The records which could not be fixed were dropped. One person in ctn_0030 had multiple drug records for the same date. The lower mg records were deleted.
Further ctn_0027 had approximately 100 data entry problems mislabeling the drug administered on random days. That is, in the source data, subjects were listed as receiving a single dose of methadone, with the mg appropriate for buprenorphine, in the middle of dozens or hundreds of doses of buprenorphine. Similar mistakes happened for people receiving methadone. These mislabeling mistakes were fixed.
# The `uds` dataset {#uds}
The Urine Drug Screening (UDS) protocols were not identical across the three trials. In [Table 5](#tab:uds) shows the details of what was tested. Note that this table has many opiates grouped into an "Opioid" category and both Amphetamine and Methamphetamine are grouped as "Amphetamine". The not-grouped data can be found in the `all_drugs` table.
<p id="tab:uds", style="color:gray;">
Table 5: Drugs Assessed in UDS for Trials with Grouping Categories.
</p>
```{r, echo = FALSE}
library(kableExtra)
x <- public.ctn0094data::meta_substance_groups_uds
x$`CTN-0027` <- cell_spec(
x$`CTN-0027`,
color = ifelse(x$`CTN-0027` == "NO", "red", "black")
)
x$`CTN-0030` <- cell_spec(
x$`CTN-0030`,
color = ifelse(x$`CTN-0030` == "NO", "red", "black"),
)
x$`CTN-0051` <- cell_spec(
x$`CTN-0051`,
color = ifelse(x$`CTN-0051` == "NO", "red", "black")
)
kbl(x, escape = FALSE) %>%
kable_styling("striped", full_width = FALSE)
```
See the section [all_drugs](#all_drugs) for additional details.
# The `visit` dataset
Missed visits in ctn_0027 as missing the appointment dates but they have the visit week.
ctn_0027 and ctn_0030 logged reasons for missed appointments as free text. ctn_0051 categorized reasons for missing appointments into 10 groups plus an "Other" categorized. All reasons were harmonized into a set 14 excuse categories. The free text was scanned for key words/phrases (including frequently occurring misspellings) and these were converted into the indicator variables shown in [Table 6](#tab:excuse).
<p id="tab:excuse", style="color:gray;">
Table 6: Reasons for Not Attending Appointments.
</p>
| Key Words | Category |
|:-------------------------|:------------------------------------|
| 'deceased' | Dead |
| 'no show' | No Show |
| 'no-show' | No Show |
| 'not show' | No Show |
| 'no visit' | No Show |
| 'Missed visit' | No Show |
| 'MIA' | No Show |
| 'did not attend' | No Show |
| 'never showed' | No Show |
| 'did not contact' | No Show |
| 'abscent' | No Show |
| 'Absent' | No Show |
| 'unable to contact' | No Show |
| 'no funding' | No Funding |
| 'left study' | Left Study |
| 'terminated' | Left Study |
| 'withdraw' | Left Study |
| 'withdrew' | Left Study |
| 'withdrawn' | Left Study |
| 'did not schedule' | Left Study |
| 'drop out' | Left Study |
| 'early term' | Left Study |
| 'out of the study' | Left Study |
| 'Pt dropped out' | Left Study |
| 'prison' | In Jail |
| 'jail' | In Jail |
| 'incarcerated' | In Jail |
| 'forgot' | Forgot |
| 'hospital' | In Hospital |
| 'illness' | Illness |
| 'moved' | Moved |
| '14' | Missing 14 Consecutive Appointments |
| 'window' | Study Window |
| 'unable to attend visit' | Unable |
| 'vacation' | On Vacation |
# The `withdrawal` dataset
Note that ctn_0027 and ctn_0030 used the Clinical Opiate Withdrawal Scale (COWS) and ctn_0051 used the Subjective Opiate Withdrawal Scale (SOWS) to assess withdrawal symptoms. While COWS makes a distinction between moderately severe and severe withdrawal, SOWS does not. Therefore, we combine the severe and moderately severe categories and label them as "severe".
### CTN 27
The Clinical Opiate Withdrawal Scale (COWS)
```
select;
when(score >= 37) withdrawl = 3; * severe;
when(score >= 25) withdrawl = 3; * moderately severe same as severe;
when(score >= 13) withdrawl = 2; * moderate;
when(score >= 5) withdrawl = 1; * mild;
when(score >= 0) withdrawl = 0; * none;
end;
```
### CTN 30
The Clinical Opiate Withdrawal Scale (COWS)
```
select;
when(score >= 37) withdrawl = 3; * severe;
when(score >= 25) withdrawl = 3; * moderately severe same as severe;
when(score >= 13) withdrawl = 2; * moderate;
when(score >= 5) withdrawl = 1; * mild;
when(score >= 0) withdrawl = 0; * none;
end;
```
### CTN 51
Subjective Opiate Withdrawal Scale (SOWS)
```
select;
when (score >= 21) withdrawl = 3; * severe;
when (score >= 11) withdrawl = 2; * moderate;
when (score >= 1) withdrawl = 1; * mild;
when (score = 0) withdrawl = 0; * none;
when (score = .) withdrawl = .;
end;
```
# The `withdrawal_pre_post` dataset
See [withdrawal_pre_post](../html/withdrawal_pre_post.html) for details/information.
```{r}
sessionInfo()
```