Add vignette and update readme.

JieYinStat · Feb 12, 2024 · 8d609ac
1 parent f9b6cc4
Showing 6 changed files with 92 additions and 7 deletions.
diff --git a/.gitignore b/.gitignore
@@ -5,3 +5,4 @@
 .DS_Store
 .quarto
 docs
+inst/doc
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -13,11 +13,14 @@ RoxygenNote: 7.2.3
 URL: https://github.com/JieYinStat/dbsubsampling, https://jieyinstat.github.io/dbsubsampling/
 BugReports: https://github.com/JieYinStat/dbsubsampling/issues
 Suggests: 
+    knitr,
     mvtnorm,
+    rmarkdown,
     testthat (>= 3.0.0)
 Config/testthat/edition: 3
 Imports: 
     withr
 Depends: 
     R (>= 2.10)
 LazyData: true
+VignetteBuilder: knitr
diff --git a/README.Rmd b/README.Rmd
@@ -33,11 +33,20 @@ devtools::install_github("JieYinStat/dbsubsampling")
 
 ## Example
 
-This is a basic example which shows you how to solve a common problem:
+This is a basic example that shows how to get subsample indices using methods such as uniform sampling and OSMAC:
 
 ```{r example}
 library(dbsubsampling)
-## basic example code
+
+data <- data_binary_class
+# Uniform sampling
+subsampling(y_name = "y", data = data, n = 30, method = "Unif", seed_1 = 123)
+# OSMAC-A
+subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
+  method = "OSMAC_A", seed_1 = 123, seed_2 = 456)
+# OSMAC-L
+subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
+  method = "OSMAC_L", seed_1 = 123, seed_2 = 456)
 ```
 
-What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
+You can find more detailed examples in the articles on the [website](https://jieyinstat.github.io/dbsubsampling/).
diff --git a/README.md b/README.md
@@ -23,12 +23,28 @@ devtools::install_github("JieYinStat/dbsubsampling")
 
 ## Example
 
-This is a basic example which shows you how to solve a common problem:
+This is a basic example that shows how to get subsample indices using
+methods such as uniform sampling and OSMAC:
 
 ``` r
 library(dbsubsampling)
-## basic example code
+
+data <- data_binary_class
+# Uniform sampling
+subsampling(y_name = "y", data = data, n = 30, method = "Unif", seed_1 = 123)
+#>  [1] 2463 2511 8718 2986 1842 9334 3371 4761 6746 9819 2757 5107 9145 9209 2888
+#> [16] 6170 2567 9642 9982 2980 1614  555 4469 9359 7789 9991 9097 1047 7067 3004
+# OSMAC-A
+subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
+  method = "OSMAC_A", seed_1 = 123, seed_2 = 456)
+#>  [1] 5684 1620 5372 8297 8863 9783 6483 6103 2702 5735 9382   40 9919 8623 2816
+#> [16] 5035 6088 2006 4702 1993 4279 9827 8738 8892 7632 6836 6393 6405   99 3952
+# OSMAC-L
+subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
+  method = "OSMAC_L", seed_1 = 123, seed_2 = 456)
+#>  [1] 5813 1681 5372 8313 8863 9780 1630 6103 2702 5888 9382 9843 9913 8635 2816
+#> [16] 5035 6211 2090 4702 2083 4385 9813 8776 8904 4425 6899 1615 6513   99 4076
 ```
 
-What is special about using `README.Rmd` instead of just `README.md`?
-You can include R chunks like so:
+You can find more detailed examples in the articles on the
+[website](https://jieyinstat.github.io/dbsubsampling/).
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
@@ -0,0 +1,2 @@
+*.html
+*.R
diff --git a/vignettes/Subsampling.Rmd b/vignettes/Subsampling.Rmd
@@ -0,0 +1,54 @@
+---
+title: "Subsampling"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Subsampling}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+You can draw a subsample using various design-based methods. Here are some examples.
+```{r setup}
+library(dbsubsampling)
+```
+
+# Uniform Sampling
+Draw a random subsample in which every observation has equal probability.
+```{r unif}
+N <- 1000
+n <- 10
+Unif(N = N, n = n)
+```
+You can set a random seed. The seed applies only to this sampling call and does not affect the external environment.
+
+```{r unif with seed}
+Unif(N = 1000, n = 10, seed = 123, replace = TRUE)
+```
+
+# OSMAC
+A subsampling method for logistic regression based on the A- and L-optimality criteria, proposed by [Wang et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1292914)^[HaiYing Wang, Rong Zhu and Ping Ma (2018). Optimal Subsampling for Large Sample Logistic Regression. *Journal of the American Statistical Association*, 113(522), 829-844.].
+
+## A-optimal
+A-optimality minimises the trace of the covariance matrix of the parameter estimates.
+```{r OSMAC-A}
+data <- data_binary_class
+y <- data[["y"]]
+x <- data[names(data) != "y"]
+
+OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mmse", seed_1 = 123, seed_2 = 456)
+```
+
+## L-optimal
+L-optimality minimises the trace of the covariance matrix of a linear combination of the parameter estimates.
+```{r OSMAC-L}
+OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mvc", seed_1 = 123, seed_2 = 456)
+```
+
+**We're working on more features, such as subsampling based on OSS, Lowcon, support points, etc.**
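
A note on the OSMAC section of the vignette: the A-optimal ("mmse") and L-optimal ("mvc") criteria can be illustrated with a minimal base R sketch of the two-step procedure described by Wang et al. (2018). This is not the package's implementation. The helper `osmac_probs`, the simulated data, and the choice to evaluate the weighted information matrix on the full data are all assumptions made for illustration.

```r
# Sketch only: `osmac_probs` is a hypothetical helper, not part of dbsubsampling.
# It returns subsampling probabilities for logistic regression given a pilot
# estimate `beta`, following the formulas in Wang et al. (2018):
#   mmse (A-optimal): pi_i proportional to |y_i - p_i| * ||M^{-1} x_i||
#   mvc  (L-optimal): pi_i proportional to |y_i - p_i| * ||x_i||
osmac_probs <- function(X, y, beta, criterion = c("mmse", "mvc")) {
  criterion <- match.arg(criterion)
  X <- as.matrix(X)
  p <- as.vector(plogis(X %*% beta))        # fitted success probabilities
  if (criterion == "mmse") {
    # M = (1/N) * sum_i p_i (1 - p_i) x_i x_i^T (assumption: full data used)
    M <- crossprod(X * (p * (1 - p)), X) / nrow(X)
    norms <- sqrt(rowSums((X %*% solve(M))^2))
  } else {
    norms <- sqrt(rowSums(X^2))
  }
  w <- abs(y - p) * norms
  w / sum(w)
}

# Simulated data (assumption: stands in for `data_binary_class`)
set.seed(1)
N <- 500
X <- cbind(1, rnorm(N), rnorm(N))
y <- rbinom(N, 1, plogis(X %*% c(0, 0.5, -0.5)))

# Step 1: uniform pilot subsample of size r1 and a pilot fit
r1 <- 100
pilot_idx <- sample(N, r1)
beta_pilot <- glm.fit(X[pilot_idx, ], y[pilot_idx], family = binomial())$coefficients

# Step 2: draw r2 indices with replacement using the optimal probabilities
r2 <- 5
pi_A <- osmac_probs(X, y, beta_pilot, "mmse")
pi_L <- osmac_probs(X, y, beta_pilot, "mvc")
idx_A <- sample(N, r2, replace = TRUE, prob = pi_A)
idx_L <- sample(N, r2, replace = TRUE, prob = pi_L)
```

The L-optimal probabilities skip the matrix inverse, so they are cheaper to compute than the A-optimal ones. Both give more weight to observations that the pilot model predicts poorly.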