-
Notifications
You must be signed in to change notification settings - Fork 8
/
process_accel.R
1194 lines (1023 loc) · 60.1 KB
/
process_accel.R
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
#' Process NHANES 2003-2004 and 2005-2006 accelerometry data
#'
#' @description
#' This function processes the raw NHANES 2003-2004 and 2005-2006 accelerometry data
#' as provided by the NHANES study on the CDC website.
#' Note that due to the large file size of the unzipped .xpt files, this function uses a non-trivial
#' amount of RAM (~12 GB peak). To avoid crashing your computer when running, please ensure you have
#' enough RAM available.
#'
#' @param names_accel_xpt A character vector of names for the zipped raw files. These should be of the form PAXRAW_\* where \* corresponds to the
#' letter of the alphabet indicating which "wave" the data is from. For example, PAXRAW_C and PAXRAW_D correspond to the 2003-2004 and 2005-2006 waves
#' accelerometery data, respectively. A vector containing PAXRAW_C and PAXRAW_D is the default and will process both 2003-2004 and 2005-2006 waves data.
#'
#' @param local Logical value indicating whether the zippped raw .xpt accelerometry files are stored locally.
#' If FALSE, will download the data into a temporary file from the CDC website and process the data. If TRUE, localpath must be specified and
#' the zipped data will be sourced locally. Defaults to FALSE.
#'
#' @param localpath Character string indicating where the locally zipped raw .xpt files are. If local=TRUE, then localpath must be a valid local directory.
#'
#' @param urls Character vector provides the website URLs where the NHANES accelerometry data can be downloaded. The default contains the URLs which will directly download the data frome
#' the CDC's website. This argument, if specified, must be the same length as the names_accel_xpt argument. Downloading the data through R is often slower than downloading the data outside of R.
#' See the examples section below for how to download and process the data directly from the CDC.
#'
#' @param zipped Logical scalar indicating whether the physical activity files are in the zipped format downloaded directly from the CDC website (.ZIP).
#' If local=FALSE and the data are downloaded from the CDC's website, this argument is ignored. Note that if the data are saved locally, processing speed is
#' substantially increased by unzipping before calling the process_accel function.
#'
#' @param check_data logical value indicating whether to perform some checks of the data. If TRUE, the function will incur additional processing time.
#' The NHANES 2003-2006 data have been tested and already passed these checks. Defaults to FALSE.
#'
#' @details
#'
#' This function takes the long format of the NHANES 2003-2006 accelerometry data and transforms it into the 1440+ format, with one row per participant-day, and
#' 7 rows per participant. Although process_accel will try to process any ".xpt" or ".ZIP" file which follows the NHANES accelerometry naming convention, it has only been tested
#' on the NHANES 2003-2006 waves' accelerometry data. As future NHANES accelerometry data are released, we intend to verify that process_accel
#' will correctly transform the newly released data into our 1440+ format.
#' The function documentation, and, if necessary the function itself, will be updated as needed going forward.
#'
#' If the data are directly downloaded from the CDC website, the raw data will be downloaded to a temporary folder and then deleted once it's been read into R.
#'
#' @return
#'
#' This function will return a list with number of elements less than or equal to the number of waves of data specified by the names_accel_xpt
#' argument. The exact number of elements returned will depend on whether all files specified by the user are found in either: 1) the local directory
#' indicated by the localpath argument; or 2) downloadable from the website(s) indicated by the "urls" argument. Each element of the list returned is a data frame
#' with columns:
#'
#' \itemize{
#' \item{SEQN:}{ Unique subject identifier}
#' \item{PAXCAL:}{ Device calibration.
#' Was the device calibrated when it was returned by the participant? 1 = Yes, 2 = No, 9 = Don't Know.
#' Any individuals with either 2 or 9 in this variable should be examined carefully before being included in any analysis.
#' }
#' \item{PAXSTAT:}{ Data reliability status flag. 1 = Data deemed reliable, 2 = Data reliability is questioable.
#' Any individuals with 2 in this variable should be examined carefully before being included in any analysis.
#' }
#' \item{SDDSRVYR:}{ Variable indicating which wave of the NHANES study this data is associated with. For example,
#' SDDSRVYR = 3 corresponds to the 2003-2004 wave and SDDSRVYR = 4 corresponds to the 2005-2006 wave.}
#' \item{WEEKDAY:}{ Day of the week: 1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday.}
#' \item{MIN1-MIN1440:}{ Activity count corresponding to each minute of the day. For example, MIN1 is the activity count for 00:00-00:01. }
#' }
#'
#'
#' @examples
#' \dontrun{
#' library("rnhanesdata")
#' ## download and process the data directly from the cdc
#' ## the first element of accel_ls corresponds to PAXINTEN_C and
#' ## the second element of accel_ls corresponds to PAXINTEN_D
#' accel_ls <- process_accel(
#' names_accel_xpt = c("PAXRAW_C","PAXRAW_D"),
#' local=FALSE,
#' urls= c("https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/PAXRAW_C.ZIP",
#' "https://wwwn.cdc.gov/Nchs/Nhanes/2005-2006/PAXRAW_D.ZIP")
#' )
#'
#' ## check to see that the data processed using the process_accel function
#' ## are identical to the processed data included in the package
#' identical(accel_ls[[1]], PAXINTEN_C)
#' identical(accel_ls[[2]], PAXINTEN_D)
#' }
#'
#' @importFrom haven read_xpt
#'
#' @importFrom utils write.csv unzip download.file
#'
#' @export
process_accel <- function(names_accel_xpt = c("PAXRAW_C","PAXRAW_D"),
local=FALSE,localpath=NULL, urls=NULL,
zipped=TRUE, check_data=FALSE){
if(local & is.null(localpath)){
stop("If specifying local = TRUE, must specify localpath as a local directory where the (un)zipped data files are saved.")
}
if(local){
if(!dir.exists(localpath)){
stop("localpath must be a valid local directory path.")
}
if(any(!paste0(names_accel_xpt,".xpt") %in% list.files(localpath)) & !zipped){
stop(paste0("One or more of ", paste0(names_accel_xpt,".xpt"), " not found in localpath directory."))
}
if(any(!paste0(names_accel_xpt,".ZIP") %in% list.files(localpath)) & zipped){
stop(paste0("One or more of ", paste0(names_accel_xpt,".ZIP"), " not found in localpath directory."))
}
}
if(any(!grepl("^PAXRAW_[A-Z]$", names_accel_xpt))){
stop("Raw accelerometer data must follow the NHANES naming convention of PAXRAW_* where * denotes the letter corresponding the NHANES wave")
}
if(!local){
if(any(is.null(urls), length(urls) != length(names_accel_xpt))){
stop("If the data is not locally stored urls must be a vector of the same length as names_accel_xpt.")
}
}
out.name <- gsub("PAXRAW", "PAXINTEN", names_accel_xpt)
ret <- c()
pb <- txtProgressBar(min=0, max=length(names_accel_xpt), style=3)
for(i in seq_along(names_accel_xpt)){
datapath <- c()
xpt_nm <- tolower(paste0(names_accel_xpt[i],".xpt"))
# if local=FALSE, download the data directly from the link provided in URLS
if(!local){
datapath <- system.file("extdat",package="rnhanesdata")
temp <- tempfile()
temp_xpt <- tempfile()
download.file(urls[i], temp)
unzip(temp, xpt_nm, exdir=temp_xpt)
sim.data <- read_xpt(file.path(temp_xpt, xpt_nm))
unlink(c(temp,temp_xpt), recursive=TRUE)
}
if(local){
datapath <- localpath
if(zipped){
sim.data <- read_xpt(unzip(file.path(datapath, paste0(names_accel_xpt[i],".ZIP")), xpt_nm, exdir=datapath))
} else {
sim.data <- read_xpt(file.path(datapath, xpt_nm))
}
}
uid <- as.character(unique(sim.data$SEQN))
if(check_data){
## read_xpt reads all variables in as numeric, check for truncation on reading in
stopifnot(all(nchar(uid)==5))
## get all indices for all subjects
sp_n_inx <- split(1:nrow(sim.data), sim.data$SEQN)
## check that all subjects have 7 or fewer days
sp_n_ln <- vapply(sp_n_inx, length, numeric(1))
if(!all(sp_n_ln <= 7*1440)) stop("Some participants have more than 7 days of data")
## check that within a subject all data are sequential (i.e. no intermittent missing minutes)
## AND that all these numbers are equal to the column PAXN
sp_n <- lapply(sp_n_inx, function(x){
tmp <- sim.data[x,]
tmp$n <- (tmp$PAXMINUT + 60*tmp$PAXHOUR) + (rep(0:6, each=1440)*1440)[1:length(x)]
any(diff(tmp$n) != 1) | any(tmp$n+1 != tmp$PAXN)
})
if(!all(!unlist(sp_n))) stop("There is some intermittent missingness within participants. Please check the data source.")
}
## create empty data frame with full 7 days of data for each subject (10,080 rows/subject)
n <- length(uid)
seqn <- rep(uid,each=10080)
paxn <- rep(c(1:10080),n)
full.list <- data.frame(SEQN=seqn,PAXN=paxn)
rm(list=c("n","seqn","paxn"))
inx <- match(paste0(full.list$SEQN, "_", full.list$PAXN),
paste0(sim.data$SEQN, "_", sim.data$PAXN))
full.na <- cbind(full.list, sim.data[inx,-c(1,5)])
rm(list=c("full.list","inx"))
## create id and day of the week variables to fill in
## note: this assumes PAXCAL/PAXSTAT do not change from with subjects
## this was verified outside of this function for the 2003-2004 and 2005-2006 waves
u_inx <- which(!duplicated(sim.data$SEQN))
u_data <- sim.data[u_inx,c('SEQN','PAXCAL','PAXSTAT','PAXDAY'), drop=FALSE]
cal <- as.integer(rep(u_data$PAXCAL,each=7))
stat <- as.integer(rep(u_data$PAXSTAT,each=7))
rm(list=c("u_inx"))
weekday <- rep(NA, length(uid)*7)
inx_cur <- 1
for(k in seq_along(uid)){
d <- u_data$PAXDAY[k]
if (d==1) {x<-c(1:7)}
else if (d==2) {x<-c(2:7,1)}
else if (d==3) {x<-c(3:7,1:2)}
else if (d==4) {x<-c(4:7,1:3)}
else if (d==5) {x<-c(5:7,1:4)}
else if (d==6) {x<-c(6:7,1:5)}
else if (d==7) {x<-c(7,1:6)}
weekday[inx_cur:(inx_cur+6)] <- x
inx_cur <- inx_cur + 7
}
rm(list=c("k","x","inx_cur","sim.data","u_data"))
id2 <- as.integer(rep(uid,each=7))
idweekday <- data.frame(SEQN=id2,PAXCAL=cal,PAXSTAT=stat,WEEKDAY=as.integer(weekday),
SDDSRVYR=as.integer(c(1:26)[LETTERS == substr(names_accel_xpt[i], nchar(names_accel_xpt[i]), nchar(names_accel_xpt[i]))]),
stringsAsFactors = FALSE)
rm(list=c("weekday","id2","uid"))
col.name <- paste0("MIN",1:1440)
pax <- full.na$PAXINTEN
pax.wide <- data.frame(matrix(as.integer(pax),ncol=1440,byrow=T))
colnames(pax.wide)<-col.name
ret[[out.name[i]]] <- data.frame(idweekday,pax.wide,stringsAsFactors = FALSE)
rm(list=c("pax","pax.wide","col.name","idweekday"))
setTxtProgressBar(pb, i)
}
ret
}
#' Process wear/non-wear flags for NHANES 2003-2004 and 2005-2006 accelerometry data
#'
#' @description
#'
#' This function creates wear/non-wear flag matrices for processed NHANES 2003-2004 and 2005-2006 accelerometry data.
#' The underlying algorithm for estimating wear/non-wear flags is implemented in the \code{\link{weartime}} function from the accelerometry package.
#'
#'
#'
#' @param x A list with each element corresponding to a data matrix of activity counts in the 1440+ format with 7 rows per individual. Each element of x should be named, and each
#' name should correspond to the naming convention used by the output of \code{\link{process_accel}}.
#' For example, the processed accelerometry data corresponding to the 2003-2004 wave should be named "PAXINTEN_C".
#' The data should be sorted by participant (SEQN) and then in descending order chronologically. If days are not ordered chronologically and distinct_days=FALSE,
#' then the resulting wear/non-wear flag output may not be accurate.
#' The output of \code{\link{process_accel}} can be fed directly to this argument. See examples.
#'
#' @param days_distinct Logical value indicating whether days should be treated as distinct time series within participants. If TRUE, then
#' subjects' wear status at 11:59PM does not affect their wear status at 00:01AM the next morning. Defaults to FALSE, this is generally recommended.
#'
#' @param window Numeric value indicated the size of the moving window used to assess non-wear in minutes. Defaults to 90 minutes.
#' See \code{\link{weartime}} for more details.
#'
#' @param tol maximum number of minutes with counts greater than 0 within the a non-wear interval.
#' See \code{\link{weartime}} for more details.
#'
#' @param tol_upper maximium activity count for any minute within the window
#' Defaults to 99. That is, for a given minute, if the window contains any minutes with
#' activity counts greater than tol.upper, this minute is considered "wear".
#' See \code{\link{weartime}} for more details.
#'
#'
#' @param ... aditional arguments to be passed to \code{\link{weartime}}.
#'
#'
#' @details
#'
#' There are many way to estimate non-wear periods in accelerometry data. Fundamentally, they all involve finding extended periods of implausibly low activity.
#' However, there is no one perfect algorithm, and what qualifies as "implausible" is device-, placement-, and population-dependent. Here, we use the algorithm
#' implemented by default in the \code{\link{accelerometry}} package via the \code{\link{weartime}} function. This algorithm is
#' similar to the algorithm used in Troiano et. al (2008).
#'
#' There are a number of parameters the algirothm implemented in \code{\link{weartime}} uses to control how aggressive non-wear time identification is.
#' By making the algorithm more agressive (decreasing window size, increasing tolerance for non-zero activity counts), one increases the likelihood of false positives.
#' Conversely, making the algorithm less agressive increases the likelihood of false negatives. By default we use a fairly conservative window size of 90 minutes.
#'
#'
#'
#' @return
#'
#' The function process_flags returns a list with number of elemnts equal to the number of elements in the object supplied to the "x" argument.
#' Each element of the list returned is a dataframe that mirrors the format of dataframes returned from the \code{\link{process_accel}} function, but instead
#' with the columns conveying activity count data replaced with 0/1 indicators for estimated periods of non-wear.
#' More specifically, each element is a data frame with the following columns
#'
#' \itemize{
#' \item{SEQN:} {Unique subject identifier}
#' \item{PAXCAL:}{ Device calibration.
#' Was the device calibrated when it was returned by the participant? 1 = Yes, 2 = No, 9 = Don't Know.
#' Any individuals with either 2 or 9 in this variable should be examined carefully before being included in any analysis.
#' }
#' \item{PAXSTAT:}{ Data reliability status flag. 1 = Data deemed reliable, 2 = Data reliability is questioable.
#' Any individuals with 2 in this variable should be examined carefully before being included in any analysis.
#' }
#' \item{SDDSRVYR:}{ Variable indicating which wave of the NHANES study this data is associated with. For example,
#' SDDSRVYR = 3 corresponds to the 2003-2004 wave and SDDSRVYR = 4 corresponds to the 2005-2006 wave.}
#' \item{WEEKDAY:}{ Day of the week: 1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday.}
#' \item{MIN1-MIN1440:}{ Wear/Non-wear flag corresponding to each minute of the day. These columns can take on the following 3 values
#' \itemize{
#' \item{0:}{ A value of 0 indicates that a particular minute is determined to be "non-wear"}
#' \item{1:}{ A value of 1 indicated that a particular minute is determined to be "wear"}
#' \item{NA:}{ A value of NA indicates that a particular minute was missing data in the activity count data matrix used to create this set
#' of wear/non-wear flags}
#' }
#' For example, a value of 0 in the column MIN1 indicates that during the time period 00:00-00:01, it was estimated that the device was not worn.}
#' }
#'
#'
#' @examples
#' \dontrun{
#' library("rnhanesdata")
#' ## In the interest of reducing computation time for this example,
#' ## we use the already processed accelerometry data
#' accel_ls <- list("PAXINTEN_C" = PAXINTEN_C, "PAXINTEN_D" = PAXINTEN_D)
#' flags_ls <- process_flags(x=accel_ls)
#'
#' ## Check to see that these processed flags are identical to
#' ## those provided in the package
#' identical(flags_ls$Flags_C, Flags_C)
#' identical(flags_ls$Flags_D, Flags_D)
#' }
#'
#' @references
#'
#' \itemize{
#' \item{Choi, Leena et al. “Validation of Accelerometer Wear and Nonwear Time Classification Algorithm.”
#' Medicine and science in sports and exercise 43.2 (2011): 357–364. PMC. Web. 10 Oct. 2018.}
#' \item{National Cancer Institute. Risk factor monitoring and methods: SAS programs for analyzing NHANES 2003-2004 accelerometer data.
#' Available at: http://riskfactor.cancer.gov/tools/nhanes_pam. Accessed Oct. 10, 2018.}
#' \item{Troiano RP, Berrigan D, Dodd KW, Masse LC, Tilert T, Mcdowell M: Physical activity in the United States measured by accelerometer.
#' Med Sci Sports Exerc 2008; 40: 181-188.}
#' }
#'
#' @importFrom accelerometry weartime
#'
#' @importFrom utils setTxtProgressBar txtProgressBar
#'
#' @export
process_flags <- function(x, days_distinct=FALSE, window=90L, tol=2L, tol_upper=99L, ...){
names_accel_xpt <- names(x)
if(any(!grepl("^PAXINTEN_[A-Z]$", names_accel_xpt)) | is.null(names_accel_xpt)){
stop("x must be a list with each element corresponding to PAXINTEN_* where * denotes the letter corresponding the NHANES wave")
}
out.name <- paste0("Flags_", substr(names_accel_xpt, nchar(names_accel_xpt), nchar(names_accel_xpt)))
ret <- c()
pb <- txtProgressBar(min=0, max=length(names_accel_xpt), style=3)
for(i in seq_along(names_accel_xpt)){
full_data <- x[[i]]
if(any(table(full_data$SEQN) != 7)){
stop(paste0("Activity data ", names_accel_xpt[i], " does not have 7 rows of data per subject. Please double check Activitiy count data."))
}
if(any(!paste0("MIN",1:1440) %in% colnames(full_data))){
stop(paste0("Activity data ", names_accel_xpt[i], " does not have 1440 minutes of data. Please double check Activitiy count data."))
}
activity_data <- as.matrix(full_data[,paste0("MIN",1:1440)])
activity_data[is.na(activity_data)] = 0
WMX <- matrix(NA,nrow = nrow(activity_data), ncol = ncol(activity_data))
uid <- unique(full_data$SEQN)
for(j in 1:length(uid)){
inx_i <- which(full_data$SEQN == uid[j])
activity_data_i <- as.vector(t(activity_data[inx_i,]))
wt <- weartime(activity_data_i, window = window, tol = tol,
tol_upper = tol_upper, days_distinct=days_distinct, ...)
WMX[inx_i,] <- matrix(wt, ncol=1440, nrow=length(inx_i), byrow=TRUE)
rm(list=c("wt","inx_i","activity_data_i"))
}
## recombine data
out = data.frame(full_data[,-which(colnames(full_data)%in%paste0("MIN",1:1440)),drop=FALSE], WMX,
stringsAsFactors = FALSE)
names(out) = names(full_data)
## put NAs back where they belong
out[is.na(full_data)] = NA
ret[[out.name[i]]] <- out
rm(list=c("out","full_data","WMX","uid","activity_data"))
setTxtProgressBar(pb, i)
}
ret
}
#' Process mortality data for NHANES 2003-2004 and 2005-2006 waves
#'
#' @description
#' This function creates a clean mortality dataset which can be combined with data from the
#' NHANES 2003-2004/2005-2006 waves.
#'
#' @param waves Character vector indicating the waves . Defaults to a vector with "C" and "D", corresponding to the 2003-2004 and 2005-2006 waves.
#'
#' @param mort_release_yr Nuemric value indicating the year associated with the raw mortality data to be processed. The default, 2011, corresponds to the
#' most recent raw mortality data included in the data package.
#'
#' @param localpath Character scalar describing the location where the raw data are stored.
#' If NULL, the funciton will look in pacakge data directory for the requested raw mortality data.
#' Defaults to NULL.
#'
#'
#' @details
#'
#' As of writing, this function has only been tested on the 2011 release for the 2003-2004 and 2005-2006 NHANES mortality data.devtools::check(args = "--as-cran")
#' The raw data comes in the form of a vector of strings, with each string associated with on participant.
#' Assuming mortality releases for other waves use the same format, this function.
#' As future mortality data are released, we will update the package with both the processed and raw mortality data for the NHANES 2003-2006 waves.
#' If necessary, we will modify the code to be able to process all releases of the mortality data for 2011 and beyond.
#' The documentation here will be updated as we confirm future mortality data releases are processed correctly using this function.
#'
#' @return
#'
#'
#' This function will return a list with number of elements less than or equal to the number of waves of data specified by the "waves"
#' argument. The exact number of elements returned will depend on whether all files specified by the user are found in either: 1) the local directory
#' indicated by the localpath argument; or available in the data package. Because the mortality data provided changes from year-to-year, the columns of each
#' element will depend on the release year.
#'
#' For the 2011 release year data, each element of the list returned is a data frame with columns:
#'
#' \itemize{
#' \item{SEQN:}{ Unique subject identifier}
#' \item{eligstat:}{ Eligibility status for mortality follow-up
#' \itemize{
#' \item{1:}{ Eligible}
#' \item{2:}{ Under age 18, not available for public release}
#' \item{3:}{ Ineligible}
#' }
#' }
#' \item{mortat:}{ Indicator for whether participant was found to be alive or deceased at follow-up time given by
#' permth_exm and permth_int
#' \itemize{
#' \item{0:}{ Assumed alive}
#' \item{1:}{ Assumed deceased}
#' \item{NA:}{ Under age 18, not available for public release or ineligible for mortality follow-up}
#' }
#' }
#' \item{permth_exm:}{ Time in months from the mobile examination center (MEC) assessment where mortality was assessed.}
#' \item{permth_int:}{ Time in months from the household interview where mortality was assessed.}
#' \item{ucod_leading:}{ Underlying cause of death recode from UCOD_113 leading causes where available. Specific causes:
#' \itemize{
#' \item{001:}{ Diseases of the heart (I00-I09, I11, I13, I20-I51)}
#' \item{002:}{ Malignant neoplasms (C00-C97)}
#' \item{003:}{ Chronic lower respiratory diseases (J40-J47)}
#' \item{004:}{ Accidents (unintentional injuries) (V01-X59, Y85-Y86)}
#' \item{005:}{ Cerebrovascular diseases (I60-I69)}
#' \item{006:}{ Alzheimer's disease (G30)}
#' \item{007:}{ Diabetes mellitus (E10-E14)}
#' \item{008:}{ Influenza and pneumonia (J09-J18)}
#' \item{009:}{ Nephritis, nephrotic syndrome and nephrosis (N00-N07, N17-N19, N25-N27)}
#' \item{010:}{ All other causes (residual)}
#' \item{NA:}{ Ineligible, under age 18, assumed alive or no cause data}
#' }
#' }
#' \item{diabetes_mcod:}{ diabetes flag from multiple cause of death (mcod)}
#' \item{hyperten_mcod:}{ hyperten flag from multiple cause of death (mcod)}
#' \item{mortscrce_ndi:}{ mortality source: NDI match}
#' \item{mortscrce_ssa:}{ mortality source: SSA information}
#' \item{mortscrce_cms:}{ mortality source: CMS information}
#' \item{mortscrce_dc:}{ mortality source: death certificate match}
#' \item{mortscrce_dcl:}{ mortality source: data collection}
#'
#' }
#'
#' For the 2015 release year data, only the first 8 columns described above are provided.
#'
#' @examples
#' library("rnhanesdata")
#'
#' ## process NHANES mortality data using the raw mortality data release from 2011 that comes
#' ## with the package
#' mort_ls <- process_mort()
#'
#' ## verify that this yields identical results to the processed data included in the package
#' identical(mort_ls$Mortality_2011_C, Mortality_2011_C)
#' identical(mort_ls$Mortality_2011_D, Mortality_2011_D)
#'
#'
#' @references
#'
#' National Center for Health Statistics. Office of Analysis and Epidemiology, Public-use Linked Mortality File, 2015.
#' Hyattsville, Maryland. (Available at the following address: http://www.cdc.gov/nchs/data_access/data_linkage/mortality.htm
#'
#' @importFrom utils setTxtProgressBar txtProgressBar
#'
#' @importFrom readr read_fwf fwf_cols
#'
#'
#' @export
process_mort <- function(waves=c("C","D"),
mort_release_yr=2011,
localpath=NULL){
if(!all(waves %in% LETTERS)) stop("One or more waves invalid. waves must be identified as a captial letter A-Z.")
if(!is.numeric(mort_release_yr)) stop("mort_release_yr must be numeric")
waves <- sort(waves)
waves_num <- vapply(waves, function(x) seq(1999, 2049, by=2)[which(LETTERS == x)], numeric(1))
waves_mort <- paste0("NHANES_",waves_num,"_",waves_num+1,"_MORT_",mort_release_yr,"_PUBLIC.dat")
if(is.null(localpath)) {
filepath <- file.path(system.file("extdat","mort",package="rnhanesdata"), waves_mort)
} else {
filepath <- file.path(localpath, waves_mort)
}
dne <- !file.exists(filepath)
if(any(dne)){
stop(cat("Specified files: \n ",
paste0(filepath[which(dne)],collapse=", \n "),
"\n not found. Please double check data directory and/or localpath specification."))
}
ret <- c()
pb <- txtProgressBar(min=0, max=length(waves_mort), style=3)
for(i in seq_along(waves_mort)){
out.name <- paste0("Mortality_", mort_release_yr, "_", waves[i])
if(mort_release_yr == 2011){
out <- read_fwf(file=filepath[i],
col_types = "iiiiddciiiiiii",
fwf_cols(SEQN = c(1,14),
eligstat = c(15,15),
mortstat = c(16,16),
causeavl = c(17,17),
permth_exm = c(47,49),
permth_int = c(44,46),
ucod_leading = c(18,20),
diabetes_mcod = c(21,21),
hyperten_mcod = c(22,22),
mortsrce_ndi = c(50,50),
mortsrce_cms = c(51,51),
mortsrce_ssa = c(52,52),
mortsrce_dc = c(53,53),
mortsrce_dcl = c(54,54)
),
na = c(" ","")
)
}
if(mort_release_yr == 2015){
out <- read_fwf(file=filepath[i],
col_types = "iiiiicii",
fwf_cols(SEQN = c(1,14),
eligstat = c(15,15),
mortstat = c(16,16),
permth_exm = c(46,48),
permth_int = c(43,45),
ucod_leading = c(17,19),
diabetes_mcod = c(20,20),
hyperten_mcod = c(21,21)
),
na = c(".","")
)
}
ret[[out.name]] <- data.frame(out)
setTxtProgressBar(pb, i)
}
ret
}
#' Merge non-accelerometry data for NHANES waves
#'
#' @description
#' This function retrieves and merges covariate data
#' from one or more NHANES data files across one or more waves of the study. Variables are merged using the NHANES unique subject identifier (SEQN).
#'
#'
#' @param waves character vector with entries of (capitalized) letter of the alphabet corresponding to the
#' NHANES wave of interest. Defaults to a vector containing "C" and "D" corresponding to the NHANES 2003-2004 and 2005-2006 waves.
#'
#' @param varnames character vector indicating which column names are to be searched for.
#' Will check all .XPT files in located in the directory specified by dataPath. If extractAll = TRUE, then this argument is effectively ignored. Defaults
#' to variables which are required to create the processed data matrices \code{\link{Covariate_C}} and \code{\link{Covariate_D}}. If "SEQN" is not included in
#' varnames, it will be autmatically added.
#'
#' @param localpath file path where covariate data are saved. Covariate data must be in .XPT format,
#' and should be in their own folder. For example, PAXRAW_C.XPT should not be located in the folder with
#' your covariate files. This will not cause an error, but the code will take much longer to run.
#'
#'
#' @param extractAll logical argument indicating whether all columns of all .XPT files in the search path should be returned.
#' If extractALL = TRUE, all variables from all .XPT files with
#' Defaults to FALSE.
#'
#'
#' @details
#'
#' This function will search all .XPT files which match the NHANES naming convention associated with the
#' character vector supplied to the "waves" argument in the specified data directory
#' (either the "localpath" argument, or the raw NHANES data included in the \code{\link{rnhanesdata}} package).
#' Any file which matches the relevant naming convention AND has "SEQN" as their first column name will be searched for the variables requested in the
#' "varnames" argument.
#'
#' It is recommended that if using the process_covar function to merge variables locally, that the local directory include the demographic dataset for each wave
#' (DEMO_C.XPT and DEMO_D.XPT for the 2003-2004 and 2005-2006 waves, respectively).
#' The reason for this is that without the demographic dataset, there is no guarantee that all
#' participants in a wave will be included in the returned results.
#' If the demographic datasets are not in the directory specified by localpath a warnining will be produced.
#' In addition, it is recommended that the local directory contain only .XPT files associated with NHANES.
#'
#' @return
#'
#'
#' This function will return a list with number of elements equal to the number of waves of data specified by the "waves" argument.
#' The name of each element is Covariate_\* where \* corresponds to each element of the "waves" argument.
#' If none of the variables listed in the "varnames" arguemnt (and/or SEQN if SEQN was not supplied to the "varnames" argument)
#' for a particular wave are found, then the element of the returned object will be NULL.
#' If none of the user specified variables are found, but subject identifiers (SEQN) are found, the corresponding elements will still be NULL.
#' See the examples below for illustrations of these scenarios.
#'
#' Most variables in NHANES are measured once per individual. In the event that a user requests a variable which has multiple records for a subject,
#' this function will return the variable in matrix format, with one row per participant and number of columns equal to the number of observations per participant.
#' This matrix is returned within each dataframe using an object with class "AsIs" (See \code{\link{I}} for details).
#' For a concrete example, see the examples below.
#'
#'
#'
#' @examples
#' library("rnhanesdata")
#'
#' ## retrieve default variables
#' covar_ls <- process_covar()
#'
#' ## re-code gender for the both the 2003-2004 and 2005-2006 waves
#' covar_ls$Covariate_C$Gender <- factor(covar_ls$Covariate_C$RIAGENDR, levels=1:2,
#' labels=c("Male","Female"), ordered=FALSE)
#' covar_ls$Covariate_D$Gender <- factor(covar_ls$Covariate_D$RIAGENDR, levels=1:2,
#' labels=c("Male","Female"), ordered=FALSE)
#'
#' ## check that this matches the gender information in the processed data
#' identical(covar_ls$Covariate_C[,c("SEQN","Gender")], Covariate_C[,c("SEQN","Gender")])
#' identical(covar_ls$Covariate_D[,c("SEQN","Gender")], Covariate_D[,c("SEQN","Gender")])
#'
#' ## See the data processing package vignette
#' ## for code to fully reproduce the processed data
#' ## included in the package
#'
#'
#' ## Example where only the participant identifer (SEQN) is found for
#' ## the 2003-2004 and 2005-2006 waves, and no data is found for the 2007-2008 wave.
#' covar_ls2 <- process_covar(waves=c("C","D","E"), varnames=c("ThisIsNotValid"))
#' str(covar_ls2)
#'
#'
#' ## Example of variables with possibly multiple responses per participant.
#' ## These variables correspond to self reported physical activity types:
#' ## PADACTIV: physical activity type (i.e. basketball, swimming, etc.)
#' ## PADLEVEL: intensity of activity identified by PADACTIV (moderate or vigorous)
#' ## PADTIMES: # of times activity identified by PADACTIV was done in the past 30 days
#' ## See the codebook at https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/PAQIAF_C.htm#PADTIMES
#' ## for additional descriptions of these variables for the 2003-2004 wave
#' covar_ls3 <- process_covar(waves=c("C","D"), varnames=c("PADACTIV","PADLEVEL","PADTIMES"))
#' str(covar_ls3)
#'
#'
#'
#'
#'
#' @importFrom utils setTxtProgressBar txtProgressBar
#'
#' @export
process_covar <- function(waves=c("C","D"),
varnames = c("SDDSRVYR","WTMEC2YR", "WTINT2YR",
"SDMVPSU", "SDMVSTRA",
"RIDAGEMN", "RIDAGEEX", "RIDRETH1", "RIAGENDR",
"BMXWT", "BMXHT", "BMXBMI", "DMDEDUC2",
"ALQ101", "ALQ110", "ALQ120Q","ALQ120U", "ALQ130", "SMQ020", "SMD030", "SMQ040",
"MCQ220","MCQ160F", "MCQ160B", "MCQ160C",
"PFQ049","PFQ054","PFQ057","PFQ059", "PFQ061B", "PFQ061C", "DIQ010"),
localpath=NULL,
extractAll=FALSE){
if(!all(waves %in% LETTERS)) stop("One or more waves invalid. waves must be identified as a captial letter A-Z.")
stopifnot(length(waves) >= 1)
stopifnot(is.vector(waves))
waves <- sort(waves)
## add a counter if SEQN is supplied to the varnames argument -- this only affects a message printed to the user on # of matched variables
cnt <- ifelse("SEQN" %in% varnames, 1, 0)
## ensure ID variable (SEQN) is in search regardless of whether it was included in the varnames argument
varnames <- unique(c('SEQN',varnames))
## find all files of .xpt structure that correspond to the year specified in the data
if(is.null(localpath)){
localpath <- system.file(file.path("extdat","covar"),package="rnhanesdata")
}
files_full <- list.files(localpath)
if(any(!paste0("DEMO_",waves,".XPT") %in% files_full)){
warning(cat(paste0("One or more demographic files were not found in the data directory (",paste0("DEMO_",waves,".XPT", collapse=", ") , ").
There is no guarantee all participants for a particular wave will be included in the returned object.")))
}
ret <- rep(list(NULL), length(waves))
names(ret) <- paste0("Covariate_", waves)
pb <- txtProgressBar(min=0, max=length(waves), style=3)
for(i in seq_along(waves)){
cohort <- waves[i]
pathExt <- paste0('_', cohort, '.XPT', sep='')
## find only those files associated with a particular NHANES wave
files <- files_full[substr(files_full, (nchar(files_full) - 5), nchar(files_full)) == pathExt]
if(length(files) == 0){
message(cat(paste0("\n No data associated with wave ", waves[i], " was found. \n")))
setTxtProgressBar(pb, i)
next
}
## find which files contain the variables requested by the user
covarMats <- lapply(files, function(x){
mat <- try(read_xpt(file.path(localpath,x)))
if(inherits(mat, "try-error")) return(NULL)
## expecting SEQN to be first column in all NHANES datasets
if(!"SEQN" %in% colnames(mat)[1]) return(NULL)
## if extractAll = FALSE, return only those columns specified in the varnames argument
if(!extractAll){
if(length(setdiff(intersect(colnames(mat),varnames), "SEQN"))==0) return(NULL)
mat <- mat[,colnames(mat)%in%varnames,drop=FALSE]
}
## if extractAll = TRUE, return all columns
if(!any(is.null(dim(mat)),all(colnames(mat) %in% "SEQN"))) {
return(mat)
} else {
return(NULL)
}
})
## remove any matrices with no matches
covarMats <- covarMats[!vapply(covarMats, is.null,logical(1))]
matchedNames <- lapply(covarMats, colnames)
numMatched <- length(unlist(matchedNames))
if(numMatched == 0) {
message(cat(paste0("\n No variables specified by the varnames argument was found for wave ", waves[i], "\n")))
setTxtProgressBar(pb, i)
next
}
if(numMatched > 0 & !extractAll) message(
cat(paste("\n For", cohort, "cohort,",
(numMatched - length(matchedNames) + cnt),
'Covariates Found of', (length(varnames)-1 + cnt),'specified.',
'Missing covariates:',
paste(setdiff(varnames, unlist(matchedNames)),collapse=", ") )))
## Merge the covariate data
ids <- lapply(covarMats, function(x) x[['SEQN']])
uids <- as.integer(sort(unique(unlist(ids))))
## check for duplicated SEQN. If there is more than one row per subject, we use a different merging mechanism
rep_SEQN <- vapply(ids, function(x) any(duplicated(x)), logical(1))
rep_inx <- which(rep_SEQN)
notrep_inx <- which(!rep_SEQN)
totalCols <- sum(vapply(covarMats, ncol, numeric(1))) - length(covarMats) + 1
CovarMat <- matrix(NA,ncol=totalCols,nrow=length(uids))
colnames(CovarMat) <- c('SEQN', unlist(sapply(covarMats,function(x) colnames(x)[-1])))
CovarMat[,'SEQN'] <- uids
CovarMat <- data.frame(CovarMat)
invisible(lapply(covarMats[notrep_inx], function(x) CovarMat[,colnames(x)[-1]] <<- x[match(CovarMat$SEQN,x$SEQN),-1,drop=FALSE] ) )
## if any of the variabels have multiple observations per id,
## convert them to a matrix wide format, then merge
if(length(rep_inx) != 0) {
invisible(lapply(covarMats[rep_inx], function(x){
n_vars <- length(colnames(x))
for(j in 2:n_vars){
max_reps <- max(table(x$SEQN),na.rm=TRUE)
mat_tmp <- matrix(NA, ncol=max_reps, nrow=nrow(CovarMat))
for(i in 1:nrow(CovarMat)){
if(!CovarMat$SEQN[i] %in% x$SEQN) next
ji_inx <- which(x$SEQN == CovarMat$SEQN[i])
mat_tmp[i,1:length(ji_inx)] <- x[ji_inx,j,drop=TRUE]
rm(list=c("ji_inx"))
}
CovarMat[colnames(x)[j]] <<- I(mat_tmp)
rm(list=c("max_reps","mat_tmp"))
}
})
)
message(
cat(paste("\n Variables with repeated observations per subject found for the following variables:",
paste(sapply(covarMats[rep_inx], function(x) colnames(x)[-1]) , collapse=","),
"Note that these variables will be stored with class AsIs() objects in resulting data frames. ",
"See ?I for details on AsIs class. \n"))
)
}
out.name <- paste0("Covariate_", cohort)
ret[[out.name]] <- CovarMat
rm(list=c("CovarMat","out.name"))
setTxtProgressBar(pb, i)
}
ret
}
#' Remove days with too few/too much weartime and NHANES data quality flags.
#'
#' @description
#' This function subsets accelerometry data by total wear/non-wear time criteria, as well as
#' NHANES data quality flags.
#'
#' @param act Activity data matrix. Should contain, at a minimum, the columns: SEQN, PAXCAL, PAXSTAT, WEEKDAY, SDDSRVYR. In addition,
#' the activity data matrix should have 1440 columns with MIN1, MIN2, ..., MIN1440.
#'
#' @param flags Wear/non-wear flag matrix. Should contain the same columns as the activity data matrix, in the same order. However, instead of
#' activity counts, the columns MIN1, MIN2, ..., MIN1440 should be binary (0/1) wear/non-wear flags where 1 indicates wear and 0 indicates non-wear.
#'
#' @param threshold_lower Lower limit on the amount of wear-time for a day to be considered good. Defaults to 600 minutes which implies a "good" day has
#' at least 10 hours of weartime.
#'
#' @param rm_PAXSTAT Logical value indicating whether to remove based on the data reliability indicator variable PAXSTAT. Defaults to TRUE.
#'
#' @param rm_PAXCAL Logical value indicating whether to remove based on the accelerometer calibration indicator variable PAXCAL. Defaults to TRUE.
#'
#' @details
#'
#'
#' Fundamentally, all this function does is check to see if a day has a user specified amount of estimated total wear time (ignoring timing of wear)
#' and checking for whether any data quality concerns were indicated by NHANES via the PAXSTAT and PAXCAL variables.
#' Because this function doesn't actually use the activity count data, it's not technically necessary to include the activity data matrix. However, forcing
#' the activity data matrix to be included and checking that subject-days are identical between the activity and wear/non-wear flag matrices adds a layer of
#' protection against subsetting or sorting one data matrix but not the other.
#'
#' This function ignores missing data. Missing values to not count toward (or against) wear time.
#'
#'
#' @return
#'
#' This function returns a numeric vector containing the indices of days which were identified as "good". These indices can be used to subset
#' the accelerometry data as desired. An illustration is provided in the examples.
#'
#'
#' @examples
#'
#' library("rnhanesdata")
#'
#' ## remove all days with fewer than 10 hours of wear time in the 2003-2004 accelerometry data
#' ## and exclude all days with data quality/calibration flags
#' data("PAXINTEN_C")
#' data("Flags_C")
#'
#' ## obtain indices for "good" days using the default threshold of at least 10 hours of weartime
#' keep_inx <- exclude_accel(act = PAXINTEN_C, flags = Flags_C)
#' ## subset the accelerometry and flags data using these indices
#' accel_good_C <- PAXINTEN_C[keep_inx,]
#' flags_good_C <- Flags_C[keep_inx,]
#'
#' ## check that all remaining days have at least 10 hours of wear
#' ## and there are no data quality issues as flagged by the PAXSTAT and PAXCAL variables
#' summary(rowSums(flags_good_C[,paste0("MIN",1:1440)], na.rm=TRUE))
#' table(flags_good_C$PAXSTAT)
#' table(flags_good_C$PAXCAL)
#'
#' @references
#'
#'
#' Hart, Teresa L et al. “How Many Days of Monitoring Predict Physical Activity and Sedentary Behaviour in Older Adults?”
#' The International Journal of Behavioral Nutrition and Physical Activity 8 (2011): 62. PMC. Web. 10 Oct. 2018.
#'
#'
#'
#' @export
exclude_accel <- function(act, flags, threshold_lower = 600, rm_PAXSTAT = TRUE, rm_PAXCAL = TRUE){
stopifnot(all(is.data.frame(act), is.data.frame(flags)))
stopifnot(all(colnames(act) == colnames(flags)))
stopifnot(all(c("SEQN","WEEKDAY","PAXSTAT", "PAXCAL", paste0("MIN",1:1440)) %in% colnames(act)))
act_cols <- which(colnames(act) %in% paste0("MIN",1:1440))
if(!identical(act[,-act_cols], flags[,-act_cols])){
stop("One or more columns of the act and flags do not match. Please double check that these two dataframes are identical except for the activity count and wear/non-wear columns")
}
act = act[, c("PAXSTAT", "PAXCAL")]
flags = flags[,act_cols]
if(!all(as.vector(as.matrix(flags))) %in% c(0,1,NA)){
stop("Wear/non-wear flags need to be either 0 (non-wear), 1 (wear), or NA (missing)")
}
flag_nonwear <- rowSums(flags, na.rm = TRUE) < threshold_lower
rm(flags)
stopifnot(all(is.finite(act$PAXSTAT),is.finite(act$PAXCAL)))
cond <- c("flag_nonwear", "!(act$PAXSTAT %in% 1)","!(act$PAXCAL %in% 1)")[c(TRUE, rm_PAXSTAT, rm_PAXCAL)]
cond <- paste(cond, collapse="|")
eval(parse(text=paste("return(which(!(",cond,")))")))
}
#' Reweight NHANES accelerometry data
#'
#' @description
#' This function re-weights accelerometry data for NHANES 2003-2004,2005-2006 waves.
#'
#' @param data Data frame to with survey weights to be re-weighted. Should not contain any duplicated participants. That is, each row
#' of this dataframe should correspond to a unique value of SEQN.
#' The data frame supplied to data must have the columns: SEQN", SDDSRVYR,WTMEC2YR, and WTINT2YR.
#'
#' @param return_unadjusted_wts Logical value indicating whether to return the unadjusted 2-year and, if applicable, 4-year survey weights for all participants.
#'
#' @param age_bks Vector of ages which define the intervals used for re-weighting. This argument is passed to the \code{\link{cut}} function to create age categories which are in turn
#' used to re-weight participants. The argument "right" determines whether these intervals will be closed on the right or the left.
#'
#' @param right Logical value indicating whether the age intervals defined by the "age_bks" arguement should be closed on the left (right=FALSE) or the right (right=TRUE).
#' See \code{\link{cut}} for additional details and examples. Defaults to TRUE.
#'
#' @details
#'
#' The reweight_accel function is designed to re-weight only the 2003-2004 and 2005-2006 waves in the context of missing data.
#' This function calculates 2- and 4- year adjusted and unadjusted survey weights.
#' The re-weighting is performed using age, sex, and ehtnicity strata applied to each wave separately.
#' More specifically, individuals in the data frame supplied to the function via the
#' "data" argument are upweighted by a factor such that the sum of their weights is equal to the total survey weight in the
#' population strata. If data are missing completely at random within each of these strata, then these re-weighted strata are
#' representative of the corresponding strata in the larger study.
#'
#' Users should ensure that if they intend to use the adjusted weights calculated by this function, that the data they reweight aligns with
#' the re-weighted strategy, particularly with regard to age. That is, it does make sense to reweight all individuals 58-60 to be representative of
#' all individuals ages 50-60. The age categories used in re-weighting are controlled by the "age_bks" argument. In illustrate the problems of misalignment of ages
#' in the examples below. Moreover, the re-weighting is done separately for the interview and examination weights. Because there is a time lag between the interview and the exam, individuals
#' may belong to different age strata for the purposes of re-weighting the interview and examination survey weights. Therefore, users need to make sure the
#' ages in their data align with the survey weight they intend to use.
#'
#' It is possible that if there are one or more strata that are sparse, the survey weights. Users should always inspect the adjusted survey weights
#' for outliers.
#'
#'
#'
#'
#'
#' @return
#'
#' The function reweight_accel will return a dataframe with the same columns as the data frame supplied to the "data" argument with either 8 or 16 additional columns.