/
data.Rmd
85 lines (67 loc) · 2.33 KB
/
data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
---
title: "Input Data"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Input Data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Overview
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```
The MASTA algorithm is a semi-supervised learning method and it requires three input data files.
- A long form longitudinal data for predicting the time-to-event outcomes [`longitudinal`]
- A follow up time data to inform the length of follow-up time for each patient [`follow_up_time`]
- A labeled data with time-to-event outcomes and baseline predictors [`survival`]
In Step I of the MASTA algorithm, `longitudinal` and `follow_up_time` will be used to extract features from estimated subject-specific intensity functions of individual encounters. In Step II of the MASTA algorithm, `survival` and `follow_up_time` will be used to train and evaluate risk prediction models with survival outcomes.
The `MASTA` package contains these three data files as a sample.
```{r, echo=FALSE, fig.width=5, fig.height=3}
DiagrammeR::grViz("
digraph flowchart {
node [fontname = Helvetica, shape = rectangle]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
tab4 [label = '@@4']
tab5 [label = '@@5']
tab1 -> tab2;
tab1 -> tab3;
tab2 -> tab4;
tab2 -> tab5;
}
[1]: 'Input Data'
[2]: 'Labeled (n = 1,100)'
[3]: 'Unlabeled (N = 20,000)'
[4]: 'Training (600)'
[5]: 'Validation (500)'
")
```
```{r, eval=TRUE}
library(MASTA)
```
## Longitudinal Encounter Data
```{r, eval=FALSE}
?longitudinal
```
```{r}
head(longitudinal)
table(longitudinal$code)
```
## Follow-up Time Data with the training/validation indicator
One subject has one record in this data. The variable `train_valid` indicates which cohort each subject belong to, training (1) or validation (2).
```{r, eval=FALSE}
?follow_up_time
```
```{r}
head(follow_up_time)
```
## Time-to-Event Data
One subject has one record in this data. The variable `event_ind` indicates whether the subject has an event (1) or not (0). For those who do not have event (i.e., `event_ind=0`), `event_time` in this data set should be the same as the `fu_time` in follow_up_time.
The current version of the MASTA package requires that at least one baseline predictor is included in this data.
```{r, eval=FALSE}
?survival
```
```{r}
head(survival)
```