Add vignette and update readme.
JieYinStat committed Feb 12, 2024
1 parent f9b6cc4 commit 8d609ac
Showing 6 changed files with 92 additions and 7 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@
.DS_Store
.quarto
docs
inst/doc
3 changes: 3 additions & 0 deletions DESCRIPTION
@@ -13,11 +13,14 @@ RoxygenNote: 7.2.3
URL: https://github.com/JieYinStat/dbsubsampling, https://jieyinstat.github.io/dbsubsampling/
BugReports: https://github.com/JieYinStat/dbsubsampling/issues
Suggests:
knitr,
mvtnorm,
rmarkdown,
testthat (>= 3.0.0)
Config/testthat/edition: 3
Imports:
withr
Depends:
R (>= 2.10)
LazyData: true
VignetteBuilder: knitr
15 changes: 12 additions & 3 deletions README.Rmd
@@ -33,11 +33,20 @@ devtools::install_github("JieYinStat/dbsubsampling")

## Example

This is a basic example which shows you how to solve a common problem:
This basic example shows how to get subsample indices using methods such as uniform sampling and OSMAC:

```{r example}
library(dbsubsampling)
## basic example code
data <- data_binary_class
# Uniform sampling
subsampling(y_name = "y", data = data, n = 30, method = "Unif", seed_1 = 123)
# OSMAC-A
subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
method = "OSMAC_A", seed_1 = 123, seed_2 = 456)
# OSMAC-L
subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
method = "OSMAC_L", seed_1 = 123, seed_2 = 456)
```

What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
More detailed examples are available in the Articles section of the [website](https://jieyinstat.github.io/dbsubsampling/).
24 changes: 20 additions & 4 deletions README.md
@@ -23,12 +23,28 @@ devtools::install_github("JieYinStat/dbsubsampling")

## Example

This is a basic example which shows you how to solve a common problem:
This basic example shows how to get subsample indices using methods such
as uniform sampling and OSMAC:

``` r
library(dbsubsampling)
## basic example code

data <- data_binary_class
# Uniform sampling
subsampling(y_name = "y", data = data, n = 30, method = "Unif", seed_1 = 123)
#> [1] 2463 2511 8718 2986 1842 9334 3371 4761 6746 9819 2757 5107 9145 9209 2888
#> [16] 6170 2567 9642 9982 2980 1614 555 4469 9359 7789 9991 9097 1047 7067 3004
# OSMAC-A
subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
method = "OSMAC_A", seed_1 = 123, seed_2 = 456)
#> [1] 5684 1620 5372 8297 8863 9783 6483 6103 2702 5735 9382 40 9919 8623 2816
#> [16] 5035 6088 2006 4702 1993 4279 9827 8738 8892 7632 6836 6393 6405 99 3952
# OSMAC-L
subsampling(y_name = "y", data = data, n = 30, pilot_n = 100,
method = "OSMAC_L", seed_1 = 123, seed_2 = 456)
#> [1] 5813 1681 5372 8313 8863 9780 1630 6103 2702 5888 9382 9843 9913 8635 2816
#> [16] 5035 6211 2090 4702 2083 4385 9813 8776 8904 4425 6899 1615 6513 99 4076
```

What is special about using `README.Rmd` instead of just `README.md`?
You can include R chunks like so:
More detailed examples are available in the Articles section of the
[website](https://jieyinstat.github.io/dbsubsampling/).
2 changes: 2 additions & 0 deletions vignettes/.gitignore
@@ -0,0 +1,2 @@
*.html
*.R
54 changes: 54 additions & 0 deletions vignettes/Subsampling.Rmd
@@ -0,0 +1,54 @@
---
title: "Subsampling"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Subsampling}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

You can draw subsamples with various design-based methods. Here are some examples.
```{r setup}
library(dbsubsampling)
```

# Uniform Sampling
Draw a random subsample in which every observation is selected with equal probability.
```{r unif}
N <- 1000
n <- 10
Unif(N = N, n = n)
```
You can also set a random seed. The seed applies only to this call and does not affect the random number generator state of the calling environment.

```{r unif with seed}
Unif(N = 1000, n = 10, seed = 123, replace = TRUE)
```
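
The claim above, that a seed passed to the call leaves the session's random stream untouched, can be illustrated with a minimal base-R sketch. `unif_sketch()` is a hypothetical helper written only for this illustration, not the package's `Unif()`. Since `withr` is listed under Imports, the package may instead rely on something like `withr::with_seed()`, but that is an assumption.

```r
# Hypothetical helper (not the package's Unif()): draw n of N row indices with
# equal probability; when a seed is given, restore the caller's RNG state on exit.
unif_sketch <- function(N, n, seed = NULL, replace = FALSE) {
  if (is.null(seed)) {
    return(sample.int(N, n, replace = replace))
  }
  env <- globalenv()
  had_state <- exists(".Random.seed", envir = env, inherits = FALSE)
  if (had_state) old_state <- get(".Random.seed", envir = env, inherits = FALSE)
  # Put back (or remove) the global RNG state when the function exits
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = env)
    } else if (exists(".Random.seed", envir = env, inherits = FALSE)) {
      rm(".Random.seed", envir = env)
    }
  })
  set.seed(seed)
  sample.int(N, n, replace = replace)
}

# Usage: the same seed reproduces the subsample, and the outer stream is unchanged
set.seed(2024)
before <- .Random.seed
idx1 <- unif_sketch(N = 1000, n = 10, seed = 123)
idx2 <- unif_sketch(N = 1000, n = 10, seed = 123)
identical(idx1, idx2)            # TRUE
identical(.Random.seed, before)  # TRUE
```

Saving and restoring `.Random.seed` is what makes the seed "local": code that runs after the call continues from the same random state it would have had without it.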

# OSMAC
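OSMAC is a subsampling method for logistic regression based on the A- and L-optimality criteria, proposed by [Wang et al. (2018)](https://www.tandfonline.com/doi/full/10.1080/01621459.2017.1292914)^[HaiYing Wang, Rong Zhu and Ping Ma (2018). Optimal Subsampling for Large Sample Logistic Regression. *Journal of the American Statistical Association*, 113(522), 829-844.]. It works in two steps: a pilot subsample gives a rough estimate of the regression coefficients, which is then used to compute the optimal subsampling probabilities for drawing the final subsample.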
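
The two-step procedure described in the OSMAC paper can be sketched in base R as follows. This is a hypothetical illustration, not the package's `OSMAC()`: the function name `osmac_sketch()`, the uniform pilot sample, and the intercept column added to `X` are assumptions, and the argument names only mirror those used in this vignette (`r1` for the pilot size, `r2` for the second-step size, and `"mmse"`/`"mvc"` for the A- and L-optimality criteria).

```r
# Hypothetical two-step OSMAC sketch (after Wang, Zhu and Ma, 2018); not the
# package's OSMAC(). Assumptions: uniform pilot sample, intercept prepended to X,
# and the caller controls randomness with set.seed().
osmac_sketch <- function(X, Y, r1, r2, method = c("mmse", "mvc")) {
  method <- match.arg(method)
  X <- cbind(1, as.matrix(X))  # add an intercept column
  N <- nrow(X)

  # Step 1: uniform pilot subsample and pilot logistic-regression fit
  pilot <- sample.int(N, r1)
  Xp <- X[pilot, , drop = FALSE]
  beta_pilot <- glm.fit(Xp, Y[pilot], family = binomial())$coefficients

  # Fitted probabilities p_i for every row of the full data
  p <- as.vector(plogis(X %*% beta_pilot))

  # Step 2: optimal subsampling probabilities
  if (method == "mmse") {
    # A-optimality: |y_i - p_i| * ||M_X^{-1} x_i||, with M_X estimated on the pilot
    wp <- p[pilot] * (1 - p[pilot])
    M <- crossprod(Xp * wp, Xp) / r1
    size <- sqrt(rowSums((X %*% solve(M))^2))
  } else {
    # L-optimality: |y_i - p_i| * ||x_i||
    size <- sqrt(rowSums(X^2))
  }
  score <- abs(Y - p) * size
  prob <- score / sum(score)

  # Step 3: draw r2 row indices with replacement using these probabilities
  index <- sample.int(N, r2, replace = TRUE, prob = prob)
  list(index = index, prob = prob)
}

# Usage on simulated data (illustrative only)
set.seed(1)
N <- 10000
X <- matrix(rnorm(N * 3), ncol = 3)
Y <- rbinom(N, 1, plogis(as.vector(X %*% c(1, -1, 0.5))))
res <- osmac_sketch(X, Y, r1 = 200, r2 = 100, method = "mmse")
```

The sketch stops at the subsample indices. In the paper, the coefficients are then estimated by a weighted logistic regression on the selected rows, with weights inversely proportional to their subsampling probabilities.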

## A-optimal
A-optimality minimises the trace of the asymptotic covariance matrix of the coefficient estimator. In `OSMAC()` it is selected with `method = "mmse"`.
```{r OSMAC-A}
data <- data_binary_class
y <- data[["y"]]
x <- data[names(data) != "y"]
OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mmse", seed_1 = 123, seed_2 = 456)
```
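
For reference, Wang et al. (2018) give the A-optimal ("mMSE") subsampling probabilities in closed form. In the display below, $\tilde\beta$ denotes the pilot estimate and $p_i(\beta) = e^{x_i^\top\beta} / (1 + e^{x_i^\top\beta})$. This restates the paper's formula for orientation; it is not taken from the package source, which may differ in details such as how $M_X$ is estimated.

$$
\pi_i^{\mathrm{mMSE}} = \frac{|y_i - p_i(\tilde\beta)|\,\lVert M_X^{-1} x_i \rVert}{\sum_{j=1}^{N} |y_j - p_j(\tilde\beta)|\,\lVert M_X^{-1} x_j \rVert},
\qquad
M_X = \frac{1}{N}\sum_{i=1}^{N} p_i(\tilde\beta)\{1 - p_i(\tilde\beta)\}\,x_i x_i^{\top}.
$$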

## L-optimal
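L-optimality minimises the trace of the asymptotic covariance matrix of a linear transformation of the coefficient estimator. In `OSMAC()` it is selected with `method = "mvc"`. The resulting subsampling probabilities are cheaper to compute because they do not involve the inverse of the information matrix.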
```{r OSMAC-L}
OSMAC(X = x, Y = y, r1 = 100, r2 = 5, method = "mvc", seed_1 = 123, seed_2 = 456)
```
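
In Wang et al. (2018), the L-optimal ("mVc") subsampling probabilities replace $\lVert M_X^{-1} x_i \rVert$ with $\lVert x_i \rVert$. Here $\tilde\beta$ is the pilot estimate and $p_i(\beta) = e^{x_i^\top\beta} / (1 + e^{x_i^\top\beta})$. As above, this restates the paper rather than the package source.

$$
\pi_i^{\mathrm{mVc}} = \frac{|y_i - p_i(\tilde\beta)|\,\lVert x_i \rVert}{\sum_{j=1}^{N} |y_j - p_j(\tilde\beta)|\,\lVert x_j \rVert}.
$$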

**We are working on more features, such as subsampling based on OSS, LowCon and support points.**
