s3fs provides a file-system like interface into Amazon Web Services for R. It utilizes the paws SDK and R6 for its core design. This repo has been inspired by Python's s3fs; however, its API and implementation have been developed to follow R's fs.
You can install the released version of s3fs from CRAN with:
install.packages('s3fs')
r-universe installation:
# Enable repository from dyfanjones
options(repos = c(
dyfanjones = 'https://dyfanjones.r-universe.dev',
CRAN = 'https://cloud.r-project.org')
)
# Download and install s3fs in R
install.packages('s3fs')
GitHub installation:
remotes::install_github("dyfanjones/s3fs")
s3fs is built on the following packages:

- paws: connection with AWS S3
- R6: set up core class
- data.table: wrangle lists into data.frames
- fs: file system on local files
- lgr: set up logging
- future: set up async functionality
- future.apply: set up parallel looping
s3fs attempts to give the same interface as fs when handling files on AWS S3 from R.
- Vectorization. All s3fs functions are vectorized, accepting multiple path inputs, similar to fs.
- Predictable.
  - Non-async functions return values that convey a path.
  - Async functions return a future object of their non-async counterpart.
  - The only exception is s3_stream_in, which returns a list of raw objects.
- Naming conventions. s3fs functions follow fs naming conventions with dir_*, file_* and path_*, but with the prefix s3_ in front, i.e. s3_dir_*, s3_file_* and s3_path_*, etc.
- Explicit failure. Similar to fs, if a failure happens it will be raised and not masked with a warning.
- Scalable. All s3fs functions are designed to have the option to run in parallel through the use of future and future.apply.
For example: copying a large file from one location to another.
library(s3fs)
library(future)
plan("multisession")
s3_file_copy("s3://mybucket/multipart/large_file.csv", "s3://mybucket/new_location/large_file.csv")
For s3fs to copy a large file (> 5GB) it splits the copy into multiparts; future allows each multipart to run in parallel, speeding up the process.
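The amount of parallelism is controlled through future's plan(). The snippet below is only a sketch of a typical setup and uses nothing beyond the future package's own API:

library(future)

# run work in up to 4 background R sessions
plan(multisession, workers = 4)

# ... run s3fs copy/upload calls here ...

# switch back to running everything in the current session
plan(sequential)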
- Async. s3fs uses future to create a few key async functions. This is more focused on functions that might be moving large files to and from R and AWS S3.
For example: copying a large file from AWS S3 to R.
library(s3fs)
library(future)
plan("multisession")
s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")
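Because the async functions return a future object, the result can be checked or collected with the future package itself. A minimal sketch, reusing the copy above (the paths are made up):

library(s3fs)
library(future)
plan("multisession")

# start the copy without blocking the R session
fut <- s3_file_copy_async("s3://mybucket/multipart/large_file.csv", "large_file.csv")

# do other work, then check on the transfer
resolved(fut)  # TRUE once the copy has finished
value(fut)     # blocks until done and returns the non-async result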
fs has a straightforward API with 4 core themes:

- path_ for manipulating and constructing paths
- file_ for files
- dir_ for directories
- link_ for links

s3fs follows these themes with the following:

- s3_path_ for manipulating and constructing S3 URI paths
- s3_file_ for S3 files
- s3_dir_ for S3 directories

NOTE: link_ is currently not supported.
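As an illustration of the mapping, a local fs call and its s3fs counterpart line up directly (the bucket and paths below are made up):

library(fs)
library(s3fs)

# local file system
dir_ls("data")
file_create("data/log.txt")

# the same operations on AWS S3, using the s3_ prefix and S3 URIs
s3_dir_ls("s3://mybucket/data")
s3_file_create("s3://mybucket/data/log.txt")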
library(s3fs)
# Construct a path to a file with `s3_path()`
s3_path("foo", "bar", letters[1:3], ext = "txt")
#> [1] "s3://foo/bar/a.txt" "s3://foo/bar/b.txt" "s3://foo/bar/c.txt"
# list buckets
s3_dir_ls()
#> [1] "s3://MyBucket1"
#> [2] "s3://MyBucket2"
#> [3] "s3://MyBucket3"
#> [4] "s3://MyBucket4"
#> [5] "s3://MyBucket5"
# list files in bucket
s3_dir_ls("s3://MyBucket5")
#> [1] "s3://MyBucket5/iris.json" "s3://MyBucket5/athena-query/"
#> [3] "s3://MyBucket5/data/" "s3://MyBucket5/default/"
#> [5] "s3://MyBucket5/iris/" "s3://MyBucket5/made-up/"
#> [7] "s3://MyBucket5/test_df/"
# create a new directory
tmp <- s3_dir_create(s3_file_temp(tmp_dir = "MyBucket5"))
tmp
#> [1] "s3://MyBucket5/filezwkcxx9q5562"
# create new files in that directory
s3_file_create(s3_path(tmp, "my-file.txt"))
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
s3_dir_ls(tmp)
#> [1] "s3://MyBucket5/filezwkcxx9q5562/my-file.txt"
# remove files from the directory
s3_file_delete(s3_path(tmp, "my-file.txt"))
s3_dir_ls(tmp)
#> character(0)
# remove the directory
s3_dir_delete(tmp)
Created on 2022-06-21 by the reprex package (v2.0.1)
Similar to fs, s3fs is designed to work well with the pipe.
library(s3fs)
paths <- s3_file_temp(tmp_dir = "MyBucket") |>
s3_dir_create() |>
s3_path(letters[1:5]) |>
s3_file_create()
paths
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
paths |> s3_file_delete()
#> [1] "s3://MyBucket/fileazqpwujaydqg/a"
#> [2] "s3://MyBucket/fileazqpwujaydqg/b"
#> [3] "s3://MyBucket/fileazqpwujaydqg/c"
#> [4] "s3://MyBucket/fileazqpwujaydqg/d"
#> [5] "s3://MyBucket/fileazqpwujaydqg/e"
Created on 2022-06-22 by the reprex package (v2.0.1)
NOTE: all examples have been developed from fs.
s3fs allows you to connect to file systems that provide an S3-compatible interface. For example, MinIO offers high-performance, S3-compatible object storage. You can connect to your MinIO server using s3fs::s3_file_system:
library(s3fs)
s3_file_system(
aws_access_key_id = "minioadmin",
aws_secret_access_key = "minioadmin",
endpoint = "http://localhost:9000"
)
s3_dir_ls()
#> [1] ""
s3_bucket_create("s3://testbucket")
#> [1] "s3://testbucket"
# refresh cache
s3_dir_ls(refresh = T)
#> [1] "s3://testbucket"
s3_bucket_delete("s3://testbucket")
#> [1] "s3://testbucket"
# refresh cache
s3_dir_ls(refresh = T)
#> [1] ""
Created on 2022-12-14 with reprex v2.0.2
NOTE: if you want to change from AWS S3 to MinIO in the same R session, you will need to set the parameter refresh = TRUE when calling s3_file_system again. You can use multiple sessions by using the R6 class S3FileSystem directly.
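For example, a rough sketch of holding two connections at once via the R6 class. This assumes S3FileSystem$new() accepts the same credential arguments as s3_file_system() and that method names simply drop the s3_ prefix (e.g. $dir_ls()); check the package documentation for the exact interface:

library(s3fs)

# assumed: constructor arguments mirror s3_file_system()
minio <- S3FileSystem$new(
  aws_access_key_id = "minioadmin",
  aws_secret_access_key = "minioadmin",
  endpoint = "http://localhost:9000"
)
aws <- S3FileSystem$new()

# each object keeps its own connection, so both back ends can be used
# in the same R session without refreshing
minio$dir_ls()
aws$dir_ls()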
Please open a GitHub ticket to raise any issues or feature requests.