/
drake_continuous.Rmd
184 lines (151 loc) · 4.27 KB
/
drake_continuous.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
---
title: "SamplingStrata workflow with drake"
subtitle: "Continuous method"
author: "Giulio Barcaroli"
date: "`r Sys.Date()`"
# output: rmarkdown::html_vignette
output:
bookdown::html_document2:
df_print: kable
highlight: tango
number_sections: yes
theme: flatly
toc: yes
toc_depth: 2
toc_float:
collapsed: no
smooth_scroll: yes
toc_title: spsur vs spse
# code_folding: hide
vignette: >
%\VignetteIndexEntry{SamplingStrata workflow with drake}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
\usepackage[utf8]{inputenc}
---
```{r setup, include = FALSE}
options(width = 999)
options(warn=-1)
knitr::opts_chunk$set(fig.width=6, fig.height=4,
collapse = TRUE,
comment = "#>"
)
```
# Workflow definition
First we define packages:
```{r packages, results='hide', message=FALSE, warning=FALSE}
library(drake)
library(SamplingStrata)
```
then 'swissmunicipalities' example data (with only 2 domains):
```{r load_datasets}
data(swissmunicipalities)
swissmun <- swissmunicipalities[swissmunicipalities$REG < 3,
c("REG","COM","Nom","HApoly",
"Surfacesbois","Surfacescult",
"Airbat","POPTOT")]
head(swissmun)
```
Finally, the plan:
```{r plan}
plan <- drake_plan(
# Definition of the sampling frame
frame = buildFrameDF(
df = swissmun,
id = "COM",
domainvalue= "REG",
X = c("HApoly","POPTOT"),
Y =c("Surfacesbois", "Airbat")),
# Definition of precision constraints
cv = as.data.frame(list(
DOM = rep("DOM1",length(unique(swissmun$REG))),
CV1 = rep(0.10,length(unique(swissmun$REG))),
CV2 = rep(0.10,length(unique(swissmun$REG))),
domainvalue= c(1:length(unique(swissmun$REG))))),
# Solution with kmeans
kmean = KmeansSolution2(frame=frame,
errors=cv,
maxclusters=5),
# Determination of number of strata for each domain
nstrat = tapply(kmean$suggestions,
kmean$domainvalue,
FUN=function(x) length(unique(x))),
# Initial solutions
sugg = prepareSuggestion(kmean=kmean,
frame=frame,
nstrat=nstrat),
# Optimisation
solution = optimStrata(method="continuous",
framesamp=frame,
errors= cv,
suggestions = sugg,
nStrat=nstrat,
iter = 25,
pops= 10),
# Optimal strata
strata_structure = summaryStrata(solution$framenew,
solution$aggr_strata,
progress=FALSE),
# Expected CVs with optimal strata
expected_CV1 = expected_CV(solution$aggr_strata),
# Adjust allocation with affordable sample size
sample_size = 150,
adjustedStrata = adjustSize(size=sample_size,strata=solution$aggr_strata,cens=NULL),
# New expected CVs
expected_CV2 = expected_CV(adjustedStrata),
# Evaluation by simulation
eval = evalSolution(frame = solution$framenew,
outstrata = adjustedStrata,
nsampl = 500,
progress = FALSE),
# Final selection of the sample
sample = selectSample(solution$framenew,adjustedStrata)
)
```
# Execution of the workflow
We have just defined this plan:
```{r list_plan}
print(plan)
```
visualised in this way:
```{r graph}
vis_drake_graph(plan, ncol_legend = NULL,
main="Optimization with 'continuous' method")
```
and now we execute:
```{r execution}
make(plan)
```
# Output inspection
First of all, the total size of the optimized sample:
```{r }
options(width = 999)
loadd(solution)
sum(solution$aggr_strata$SOLUZ)
```
Then, the structure of the optimized strata:
```{r }
loadd(strata_structure)
strata_structure
```
The expected CVs given the optimized strata:
```{r }
loadd(expected_CV1)
expected_CV1
```
and after the adjustment of the allocation in the strata based on the affordable sample size:
```{r }
loadd(expected_CV2)
expected_CV2
```
The results of simulation with the adjusted allocation:
```{r }
loadd(eval)
eval$coeff_var
eval$rel_bias
```
Finally, the selection of the sample from the adjusted strata:
```{r }
loadd(sample)
head(sample)
```