-
Notifications
You must be signed in to change notification settings - Fork 0
/
Quick-start.Rmd
102 lines (86 loc) · 4.25 KB
/
Quick-start.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
---
title: "Quick Start"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Quick-start}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(tidyverse)
library(knitr)
library(imputeTS)
library(dlm)
library(crayon)
library(MASS)
```
# Introduction
We start with a simple case, where we use `data_stationary` included in our package, which is generated with the following data generation process:
$$
Y_t = 40+0.5Y_{t-1}-1.5X_t-0.5X_{t-1}-C_t+v_t, v_t \sim \text{i.i.d }\mathcal{N}(0,1)
$$
```{r include=FALSE}
library(SSMimpute)
data_randomwalk
```
# Generate Missing Index
Then using the function `generate.missing_index()` provided in our package, we create a MNAR missing index, and create the columns for corresponding lagged outcomes.
```{r fig.height = 7, fig.width = 16}
data = data_randomwalk#using data_stationary
index_MNAR = generate.missing_index(type = "MNAR",
n=length(data$y),
param = list(data = data.frame(y=data$y,x=data$x),
MNAR.type = "increasing",
coeff = c(0.1,-0.3,0),
MNAR.drawplot = c(TRUE, "y"))
)$missing_index
#data$y[index_MNAR] = NA
data$y_1 = c(NA,data$y[1:999])
data$x_1 = c(NA,data$x[1:999])
data$c_1 = c(NA,data$c[1:999])
data = data[c(2:1000),]
write_csv(data, "sample_data_complete.csv")
data_space_SSMimpute = data
kable(head(data_space_SSMimpute))
imputeTS::ggplot_na_distribution(data_space_SSMimpute$y, color_missing = "pink",color_missing_border = "pink", alpha_missing = 0.1)
```
# Start Imputation
Based on our prior knowledge(usually we need an additional step for data exploration, see: exploration of state-space model for a detailed introduction), we set up the formula and all the learning parameter in `ss_param` and `cpt_learning_param`
```{r message=FALSE, warning=FALSE, results='hide'}
formula="y~y_1+x+x_1+c"
formula_var=unlist(strsplit(unlist(strsplit(formula,"~"))[2],"+",fixed=T))
ss_param=list(inits=c(log(0.25),log(1)),m0=c(40,0.1,-0.1,-0.1,-1),C0=diag(rep(10^3),5), AR1_coeffi=NULL,rw_coeffi="intercept", v_cp_param=NULL, w_cp_param=NULL,max_iteration=100)
result_SSMimpute1=SSMimpute_unanimous_cpts(data_ss_ori=data_space_SSMimpute,formula_var,ss_param_temp=ss_param,
initial_imputation_option="StructTS",
estimate_convergence_cri=0.01,
lik_convergence_cri=0.01,
stepsize_for_newpart=1/3,
max_iteration=100,
cpt_learning_param=list(cpt_method="mean",burnin=1/10,mergeband=20,convergence_cri=10),
cpt_initial_guess_option="ignore",
dlm_option="smooth",m=5,seed=1,printFlag=F)
```
# Analysis of Result
Now we take a close look at the result generated from `SSMimpute_unanimous_cpts()`
Estimated coefficient:
```{r}
#kable(result_statespace_SSMimpute1$result_convergence)
#kable(result_statespace_SSMimpute1$result_convergence_mp)
kable(result_SSMimpute1$result_convergence_mp_addV)
```
We can see `SSMimpute_unanimous_cpts()` successfully uncover the data generating process.
We also plot the imputed missing value versus the ground truth:
```{r fig.height = 7, fig.width = 16}
#result_statespace_SSMimpute1$estimated_cpts
data_na = result_SSMimpute1$data_temp
length(data_na$y_1)
data_temp = result_SSMimpute1$data_temp
missing_part=which(is.na(data_temp$y))[which(is.na(data_temp$y))<nrow(data_temp)]
data_temp$y_1[missing_part+1]=result_SSMimpute1$y_final
#imputeTS::ggplot_na_distribution(data_stationary$y[1:999], color_missing = "pink",color_missing_border = "pink", alpha_missing = 0.9)
imputeTS::ggplot_na_imputations(x_with_na = data_space_SSMimpute$y_1, x_with_imputations = data_na$y_1,x_with_truth = data_stationary$y[1:999])
```