Merge pull request #57 from maelle/installation
krlmlr committed Oct 15, 2023
2 parents f207d34 + 92806c7 commit 3003aeb
Showing 2 changed files with 158 additions and 94 deletions.
113 changes: 75 additions & 38 deletions README.Rmd
@@ -24,31 +24,56 @@ set.seed(20230702)
[![R-CMD-check](https://github.com/duckdblabs/duckplyr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/duckdblabs/duckplyr/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The goal of duckplyr is to provide a drop-in replacement for dplyr that uses DuckDB as a backend for fast operation.
It also defines a set of generics that provide a low-level implementer's interface for dplyr's high-level user interface.
The goal of duckplyr is to provide a drop-in replacement for dplyr that uses [DuckDB](https://duckdb.org/) as a backend for fast operation.
DuckDB is an in-process SQL OLAP database management system.

## Example
duckplyr also defines a set of generics that provide a low-level implementer's interface for dplyr's high-level user interface.

```{r load}
## Installation

Install duckplyr from CRAN with:

``` r
install.packages("duckplyr")
```

You can also install the development version of duckplyr from R-universe:

``` r
install.packages('duckplyr', repos = c('https://duckdblabs.r-universe.dev', 'https://cloud.r-project.org'))
```

Or from [GitHub](https://github.com/) with:

``` r
# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
pak::pak("duckdblabs/duckplyr")
```


## Examples

```{r attach}
library(conflicted)
library(duckplyr)
conflict_prefer("filter", "duckplyr")
```

There are two ways to use duckplyr.

1. To enable for individual data frames, use `as_duckplyr_df()` as the first step in your pipe.
1. To enable for the entire session, use `methods_overwrite()`.
1. To enable duckplyr for individual data frames, use `as_duckplyr_df()` as the first step in your pipe.
1. To enable duckplyr for the entire session, use `methods_overwrite()`.

The examples below illustrate both methods.
See also the companion [demo repository](https://github.com/Tmonster/duckplyr_demo) for a use case with a large dataset.

### Individual
### Usage for individual data frames

This example illustrates usage of duckplyr for individual data frames.

```{r individual}
# Use `as_duckplyr_df()` to enable processing with duckdb:
Use `as_duckplyr_df()` to enable processing with duckdb:

```{r}
out <-
palmerpenguins::penguins %>%
# CAVEAT: factor columns are not supported yet
@@ -57,53 +82,79 @@ out <-
mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
filter(species != "Gentoo")
```

The result is a data frame or tibble, with its own class.

# The result is a data frame or tibble, with its own class.
```{r}
class(out)
names(out)
```

duckdb is responsible for eventually carrying out the operations.
Despite the late filter, the summary is not computed for the Gentoo species.

# duckdb is responsible for eventually carrying out the operations.
# Despite the late filter, the summary is not computed for the Gentoo species.
```{r}
out %>%
explain()
```

# All data frame operations are supported.
# Computation happens upon the first request.
All data frame operations are supported.
Computation happens upon the first request.

```{r}
out$mean_bill_area
```

# After the computation has been carried out, the results are available
# immediately:
After the computation has been carried out, the results are available immediately:

```{r}
out
```


### Session-wide
### Session-wide usage

This example illustrates usage of duckplyr for all data frames in the R session.

```{r session}
# Use `methods_overwrite()` to enable processing with duckdb for all data frames:
Use `methods_overwrite()` to enable processing with duckdb for all data frames:

```{r}
methods_overwrite()
```
This is the same query as above, without `as_duckplyr_df()`:

# This is the same query as above, without `as_duckplyr_df()`:
```{r}
out <-
palmerpenguins::penguins %>%
# CAVEAT: factor columns are not supported yet
mutate(across(where(is.factor), as.character)) %>%
mutate(bill_area = bill_length_mm * bill_depth_mm) %>%
summarize(.by = c(species, sex), mean_bill_area = mean(bill_area)) %>%
filter(species != "Gentoo")
```

The result is a plain tibble now:

# The result is a plain tibble now:
```{r}
class(out)
```

Querying the number of rows also starts the computation:

# Querying the number of rows also starts the computation:
```{r}
nrow(out)
```

# Restart R, or call `methods_restore()` to revert to the default dplyr implementation.
Restart R, or call `methods_restore()` to revert to the default dplyr implementation.

```{r}
methods_restore()
```

dplyr is active again:

# dplyr is active again:
```{r}
palmerpenguins::penguins %>%
# CAVEAT: factor columns are not supported yet
mutate(across(where(is.factor), as.character)) %>%
@@ -211,17 +262,3 @@ rel_names.dfrel <- function(rel, ...) {
rel_names(mtcars_rel)
```

## Installation

Install duckplyr from CRAN with:

``` r
install.packages("duckplyr")
```

You can also install the development version of duckplyr from [GitHub](https://github.com/) with:

``` r
# install.packages("pak", repos = sprintf("https://r-lib.github.io/p/pak/stable/%s/%s/%s", .Platform$pkgType, R.Version()$os, R.Version()$arch))
pak::pak("duckdblabs/duckplyr")
```