Commit 5c38ff6: links
jwijffels committed Apr 19, 2021
1 parent 2089c24 commit 5c38ff6
Showing 5 changed files with 11 additions and 10 deletions.
4 changes: 2 additions & 2 deletions R/read_sas.R
@@ -6,10 +6,10 @@
#' Mark that files on the local file system need to be specified using the full path.
#' @param table character string with the name of the Spark table where the SAS dataset will be put into
#' @return an object of class \code{tbl_spark}, which is a reference to a Spark DataFrame based on which
-#' dplyr functions can be executed. See \url{https://github.com/rstudio/sparklyr}
+#' dplyr functions can be executed. See \url{https://github.com/sparklyr/sparklyr}
#' @export
#' @seealso \code{\link[sparklyr]{spark_connect}}, \code{\link[sparklyr]{sdf_register}}
-#' @references \url{https://spark-packages.org/package/saurfang/spark-sas7bdat}, \url{https://github.com/saurfang/spark-sas7bdat}, \url{https://github.com/rstudio/sparklyr}
+#' @references \url{https://spark-packages.org/package/saurfang/spark-sas7bdat}, \url{https://github.com/saurfang/spark-sas7bdat}, \url{https://github.com/sparklyr/sparklyr}
#' @examples
#' \dontrun{
#' ## If you haven't got a Spark cluster, you can install Spark locally like this
4 changes: 2 additions & 2 deletions README.md
@@ -1,13 +1,13 @@
# spark.sas7bdat

-The **spark.sas7bdat** package allows R users working with [Apache Spark](https://spark.apache.org) to read in [SAS](http://www.sas.com) datasets in .sas7bdat format into Spark by using the [spark-sas7bdat Spark package](https://spark-packages.org/package/saurfang/spark-sas7bdat). This allows R users to
+The **spark.sas7bdat** package allows R users working with [Apache Spark](https://spark.apache.org) to read in [SAS](https://www.sas.com) datasets in .sas7bdat format into Spark by using the [spark-sas7bdat Spark package](https://spark-packages.org/package/saurfang/spark-sas7bdat). This allows R users to

- load a SAS dataset in parallel into a Spark table for further processing with the [sparklyr](https://cran.r-project.org/package=sparklyr) package
- process in parallel the full SAS dataset with dplyr statements, instead of having to import the full SAS dataset in RAM (using the foreign/haven packages) and hence avoiding RAM problems of large imports


## Example
-The following example reads in a file called iris.sas7bdat in a table called sas_example in Spark. Do try this with bigger data on your cluster and look at the help of the [sparklyr](https://github.com/rstudio/sparklyr) package to connect to your Spark cluster.
+The following example reads in a file called iris.sas7bdat in a table called sas_example in Spark. Do try this with bigger data on your cluster and look at the help of the [sparklyr](https://github.com/sparklyr/sparklyr) package to connect to your Spark cluster.

```r
library(sparklyr)
```
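The README's code block is truncated in this view. Based on the function and parameters documented in R/read_sas.R above (`spark_read_sas()` taking a full `path` to the file and a `table` name for the resulting Spark table), a sketch of the full example might look like the following; the local-master connection and the file location are illustrative assumptions, not part of the diff:

```r
library(sparklyr)
library(spark.sas7bdat)

## Connect to Spark; replace master = "local" with your cluster's master URL
sc <- spark_connect(master = "local")

## Read the SAS file into a Spark table called "sas_example";
## files on the local file system must be given with their full path
x <- spark_read_sas(sc, path = "/full/path/to/iris.sas7bdat", table = "sas_example")

## x is a tbl_spark, so dplyr verbs run on the Spark cluster
library(dplyr)
x %>% summarise(n = n())

spark_disconnect(sc)
```
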
1 change: 1 addition & 0 deletions inst/NEWS
@@ -1,5 +1,6 @@
CHANGES IN spark.sas7bdat VERSION 1.4

+o Fix URL's
o Add rmarkdown to Suggests in DESCRIPTION

CHANGES IN spark.sas7bdat VERSION 1.3
4 changes: 2 additions & 2 deletions man/spark_read_sas.Rd

Some generated files are not rendered by default.

8 changes: 4 additions & 4 deletions vignettes/spark_sas7bdat_examples.Rmd
@@ -9,18 +9,18 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

-This R package allows R users to easily import large [SAS](http://www.sas.com) datasets into [Spark](https://spark.apache.org) tables in parallel.
+This R package allows R users to easily import large [SAS](https://www.sas.com) datasets into [Spark](https://spark.apache.org) tables in parallel.


-The package uses the [spark-sas7bdat Spark package](https://spark-packages.org/package/saurfang/spark-sas7bdat) in order to read a SAS dataset in Spark. That Spark package imports the data in parallel on the Spark cluster using the Parso library and this process is launched from R using the [sparklyr](https://github.com/rstudio/sparklyr) functionality.
+The package uses the [spark-sas7bdat Spark package](https://spark-packages.org/package/saurfang/spark-sas7bdat) in order to read a SAS dataset in Spark. That Spark package imports the data in parallel on the Spark cluster using the Parso library and this process is launched from R using the [sparklyr](https://github.com/sparklyr/sparklyr) functionality.

More information about the spark-sas7bdat Spark package and sparklyr can be found at:

- https://spark-packages.org/package/saurfang/spark-sas7bdat and https://github.com/saurfang/spark-sas7bdat
-- https://github.com/rstudio/sparklyr
+- https://github.com/sparklyr/sparklyr

## Example
-The following example reads in a file called iris.sas7bdat in parallel in a table called sas_example in Spark. Do try this with bigger data on your cluster and look at the help of the [sparklyr](https://github.com/rstudio/sparklyr) package to connect to your Spark cluster.
+The following example reads in a file called iris.sas7bdat in parallel in a table called sas_example in Spark. Do try this with bigger data on your cluster and look at the help of the [sparklyr](https://github.com/sparklyr/sparklyr) package to connect to your Spark cluster.

```{r, eval=FALSE}
library(sparklyr)
```
