# How to save and load R objects from the workspace bucket

Save intermediate R objects in R's native RDA format so that you can reload them quickly later.

<div class="alert alert-block alert-info">
<b>Tip:</b> By storing your RDA files in the workspace bucket, they are available to your workspace collaborators to load into their own notebooks!
</div>


See also [Notebooks 101 - How not to lose data output files or collaborator edits](https://broadinstitute.zendesk.com/hc/en-us/articles/360027300571-Notebooks-101-How-not-to-lose-data-output-files-or-collaborator-edits).

## Setup

In [None]:
library(lubridate)
library(tidyverse)

Get the Cloud Storage bucket associated with this workspace.

In [None]:
(WORKSPACE_BUCKET <- Sys.getenv('WORKSPACE_BUCKET'))

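Create a timestamp for the destination path. Because the format contains a `/`, results land in a folder named for today's date, and each run gets its own subfolder named for the current time.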

In [None]:
(TIMESTAMP <- strftime(now(), '%Y%m%d/%H%M%S'))
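
To see what the format string produces, the sketch below applies it to a fixed, made-up time. It passes `tz = 'UTC'` explicitly so the result is predictable; `now()` in the cell above uses the notebook's system time zone instead.

```r
# A fixed example time (hypothetical), interpreted as UTC.
example_time <- as.POSIXct('2018-06-01 14:05:09', tz = 'UTC')

# Same format string as TIMESTAMP: a date folder, then a time-of-day subfolder.
example_timestamp <- strftime(example_time, '%Y%m%d/%H%M%S', tz = 'UTC')
print(example_timestamp)  # [1] "20180601/140509"
```

Because every field is zero-padded, sorting these folder names alphabetically also sorts them chronologically (within a single time zone).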

Get your account email so that collaborators can see who created the RDA file.

In [None]:
(OWNER_EMAIL <- Sys.getenv('OWNER_EMAIL'))
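
Both `WORKSPACE_BUCKET` and `OWNER_EMAIL` come from environment variables that the workspace is assumed to set. If either is missing (for example, when running this notebook outside Terra), `Sys.getenv()` returns an empty string and the destination path built later is silently malformed. A minimal sketch of a fail-fast check, using a hypothetical helper named `require_env()`:

```r
# Hypothetical helper: read an environment variable and stop if it is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name)  # returns "" when the variable is not set
  if (!nzchar(value)) {
    stop(sprintf('Environment variable %s is not set; is this notebook running in Terra?', name))
  }
  value
}

# Usage in this notebook (assumes both variables are set by the workspace):
# WORKSPACE_BUCKET <- require_env('WORKSPACE_BUCKET')
# OWNER_EMAIL <- require_env('OWNER_EMAIL')
```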

Choose a name for the RDA file.

In [None]:
(RDA_FILENAME <- 'thousand_genomes.rda')

Assemble the destination path within the workspace bucket.

In [None]:
(DESTINATION <- str_glue('{WORKSPACE_BUCKET}/data/r-objects/{OWNER_EMAIL}/{TIMESTAMP}/{RDA_FILENAME}'))
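
For illustration, here is the same template filled in with made-up values. They are passed as named arguments to `str_glue()` so that the notebook's real `WORKSPACE_BUCKET`, `OWNER_EMAIL`, and other variables are not overwritten. The bucket value assumes `WORKSPACE_BUCKET` includes the `gs://` prefix, which `gsutil` needs.

```r
library(stringr)

# Hypothetical example values, supplied as named arguments to the template.
example_destination <- str_glue(
  '{bucket}/data/r-objects/{owner}/{timestamp}/{filename}',
  bucket = 'gs://fc-example-bucket',
  owner = 'someone@example.com',
  timestamp = '20180601/140509',
  filename = 'thousand_genomes.rda')
print(example_destination)
# gs://fc-example-bucket/data/r-objects/someone@example.com/20180601/140509/thousand_genomes.rda
```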

## Read some data from Cloud Storage
Let’s retrieve the sample information for [1000 Genomes](http://www.internationalgenome.org/data "1000 Genomes").

This approach uses `gsutil cat` to stream the entire CSV file into R, since we want to load all of it.

If you want to load only a subset of columns or rows, retrieve the data from the BigQuery table [bigquery-public-data.human_genome_variants.1000_genomes_sample_info](https://bigquery.cloud.google.com/table/bigquery-public-data:human_genome_variants.1000_genomes_sample_info) instead.

In [None]:
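# Stream the CSV from Cloud Storage through `gsutil cat`. guess_max = 5000 makes
# read_csv() inspect up to 5000 rows when guessing each column's type.
sample_info <- read_csv(pipe('gsutil cat gs://genomics-public-data/1000-genomes/other/sample_info/sample_info.csv'),
                        guess_max = 5000)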
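
For the BigQuery alternative mentioned above, one option is the `bigrquery` package. The sketch below assumes `bigrquery` is installed and that the `GOOGLE_PROJECT` environment variable holds a project to bill the query to (an assumption about the workspace). Both helper names are hypothetical, and the column names in the usage line are placeholders: check the table schema for the real ones.

```r
# Hypothetical helper: build a standard-SQL query selecting some columns
# (and optionally limiting the number of rows) from the sample info table.
build_sample_info_query <- function(columns, limit = NULL) {
  table <- '`bigquery-public-data.human_genome_variants.1000_genomes_sample_info`'
  sql <- paste('SELECT', paste(columns, collapse = ', '), 'FROM', table)
  if (!is.null(limit)) {
    sql <- paste(sql, 'LIMIT', as.integer(limit))
  }
  sql
}

# Hypothetical helper: run the query, billed to billing_project, and
# download the result as a tibble.
fetch_sample_info <- function(columns, limit = NULL,
                              billing_project = Sys.getenv('GOOGLE_PROJECT')) {
  result_table <- bigrquery::bq_project_query(billing_project,
                                              build_sample_info_query(columns, limit))
  bigrquery::bq_table_download(result_table)
}

# Usage (placeholder column names):
# subset_info <- fetch_sample_info(c('Sample', 'Population'), limit = 100)
```

Only the selected columns and rows leave BigQuery, which is the point of this route when you do not need the whole file.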

## Save the object(s) to a local file

In [None]:
save(sample_info, file = RDA_FILENAME)
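
`save()` accepts any number of objects, and `load()` restores each one under its original name while invisibly returning those names. The round trip below uses hypothetical objects and a temporary local file, so it does not touch `sample_info` or the bucket.

```r
# Hypothetical objects to bundle into a single RDA file.
demo_counts <- c(a = 1L, b = 2L)
demo_label <- 'example'

# save() takes any number of objects; here it writes to a temporary file.
demo_file <- tempfile(fileext = '.rda')
save(demo_counts, demo_label, file = demo_file)

# Remove the objects, then restore both. load() invisibly returns the
# names of the objects it restored.
rm(demo_counts, demo_label)
restored_names <- load(demo_file)
print(restored_names)
```

If you only need a single object and want to choose its variable name when reading it back, `saveRDS()` and `readRDS()` are the single-object counterparts.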

## Transfer the file to the workspace bucket

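Use `gsutil` to copy the file from the local disk of your Jupyter environment to the workspace bucket. The `2>&1` redirect captures the progress messages that `gsutil` writes to stderr, so they appear in the notebook output.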

In [None]:
system(str_glue('gsutil cp {RDA_FILENAME} {DESTINATION} 2>&1'), intern = TRUE)
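
With `intern = TRUE`, a failed `gsutil cp` does not stop the notebook: `system()` emits a warning and returns the captured output with a `status` attribute. A minimal sketch of a wrapper that turns a non-zero exit status into an error, using a hypothetical helper named `run_checked()`:

```r
# Hypothetical helper: run a shell command, capture stdout and stderr, and
# stop with the captured output if the command exits with a non-zero status.
run_checked <- function(cmd) {
  out <- suppressWarnings(system(paste(cmd, '2>&1'), intern = TRUE))
  status <- attr(out, 'status')
  if (!is.null(status) && status != 0) {
    stop(sprintf('Command failed with exit status %s: %s\n%s',
                 status, cmd, paste(out, collapse = '\n')))
  }
  out
}

# Usage in this notebook:
# run_checked(str_glue('gsutil cp {RDA_FILENAME} {DESTINATION}'))
```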

## Now, load that object back from the RDA file in Cloud Storage

In [None]:
# The object exists in memory.
head(sample_info)

In [None]:
# Go ahead and delete it.
rm(sample_info)

In [None]:
# Okay, it's gone: this cell fails with "object 'sample_info' not found".
head(sample_info)

In [None]:
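# Stream the RDA file back from the bucket; load() restores each saved object under its original name.
load(pipe(str_glue('gsutil cat {DESTINATION}')))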

In [None]:
# The object exists in memory again!
head(sample_info)
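
`load()` restores objects into the calling environment, silently replacing any existing object with the same name. To inspect a collaborator's RDA file without clobbering your own variables, you could load it into a fresh environment instead. The helper name below is hypothetical. It wraps the pipe in `gzcon()` to decompress the gzip output that `save()` produces by default, and closes the connection explicitly so it is not left open.

```r
# Hypothetical helper: run a shell command that streams an RDA file to stdout
# (for example 'gsutil cat gs://...') and load its objects into a new environment.
load_from_command <- function(cmd) {
  env <- new.env()
  con <- gzcon(pipe(as.character(cmd)))  # decompresses save()'s default gzip output
  on.exit(close(con))
  load(con, envir = env)
  env
}

# Usage in this notebook:
# shared <- load_from_command(str_glue('gsutil cat {DESTINATION}'))
# head(shared$sample_info)
```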

## Provenance

In [None]:
devtools::session_info()

Copyright 2018 The Broad Institute, Inc., Verily Life Sciences, LLC. All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE file for details.