# R analysis notebook template

Created by: `<First and last name>` (`<email address>`)

# Overview

Provide a general description here of the purpose of the notebook. If this notebook is one of several iterations of a given analysis type, provide details here on how this iteration differs from the others (e.g., data source, data transformations, model fitting method).

**Note on notebook naming:** Prepend all notebook filenames with the date stamp `yyyymmdd_`. The notebook title should include the numeric ID for analysis type and iteration. For example, a specific iteration `0021` of a differential gene expression analysis `0005` may have a notebook file `20230131_dge_analysis.ipynb` with the title _DGE analysis `0005` iteration `0021`_. If planning to create several notebooks within the same subdirectory for sequential processing/analysis, prepend a numeric ordering prefix. For example, `20230131_dge_analysis.ipynb` would become `1_20230131_dge_analysis.ipynb` if it were the first notebook of several within the same subdirectory.

**Note on code styling:** Follow the [tidyverse style guide](https://style.tidyverse.org/) for code styling.

**Note on results file naming:** For any tables, plots, or serialized objects output from this notebook, use a date stamp suffix on filenames of the format `_yyyymmdd.<file extension>`.
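
The date stamp suffix can be generated rather than typed by hand. The following is a minimal sketch; `stamp_filename()` is a hypothetical helper that is not part of this template, and the file stem and extension are placeholders.

```r
# Hypothetical helper (not part of the template): append a yyyymmdd date stamp
# to a file stem, following the results file naming note above.
stamp_filename <- function(stem, extension, date = Sys.Date()) {
  paste0(stem, "_", format(date, "%Y%m%d"), ".", extension)
}

# A fixed date is passed here so the result is reproducible
stamp_filename("dge_results", "tsv", date = as.Date("2023-01-31"))
# "dge_results_20230131.tsv"
```

Leaving `date` at its default stamps the file with the current date.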

----

# Environment setup

Docker image: `<image name and tag>` built from `src/docker/<image name>/Dockerfile`.

In [None]:
initial_wd <- getwd()
initial_wd

In [None]:
setwd("/<repo top level dir name>") #set working dir to the repo top level
getwd()

In [None]:
# Include here all the packages that are needed for the analysis
library_list <- c(
  ""
)
for (package in library_list) {
  library(package, character.only = TRUE, quietly = TRUE)
}
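
One possible variation on the loading loop above checks that each package is installed before attaching it and silences startup messages. This is a sketch only; `example_library_list` is a stand-in that uses base packages so the code runs in any R session.

```r
# Stand-in package list for illustration; replace with the analysis packages
example_library_list <- c("stats", "utils")
for (package in example_library_list) {
  # Fail early with a clear message if a package is missing from the image
  if (!requireNamespace(package, quietly = TRUE)) {
    stop("Package not installed: ", package)
  }
  suppressPackageStartupMessages(
    library(package, character.only = TRUE, quietly = TRUE)
  )
}
```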

In [None]:
options(stringsAsFactors = FALSE)
options(repr.plot.res = 120) # plot resolution in PPI (repr default is 120); higher values increase notebook size

In [None]:
# Set dir paths for results from this analysis. Makes it easier to specify output paths
# throughout the notebook
results_dir <- ""
figure_dir <- file.path(results_dir, "figures")
print(figure_dir)
table_dir <- file.path(results_dir, "tables")
print(table_dir)
serialized_dir <- file.path(results_dir, "serialized")
print(serialized_dir)
s3_dir <- "s3://bucket/path" # URI to repo top level dir on S3; no trailing slash since file.path() adds one
print(s3_dir)
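
The output directories defined above may not exist yet when the notebook is first run. A minimal sketch of creating them follows; `example_results_dir` is a temporary stand-in for `results_dir`, used here so the code does not touch real results.

```r
# Assumption: a temporary directory stands in for the notebook's results_dir
example_results_dir <- file.path(tempdir(), "results_example")
output_dirs <- file.path(
  example_results_dir,
  c("figures", "tables", "serialized")
)
for (output_dir in output_dirs) {
  # recursive = TRUE also creates the parent directory;
  # showWarnings = FALSE keeps reruns quiet when the directory already exists
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
}
dir.exists(output_dirs)
```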

In [None]:
# Include here any reusable functions that may not (yet) warrant being included as a generalized
# function in src/r/. Follow roxygen2 syntax for code documentation for functions:
# https://roxygen2.r-lib.org/articles/rd.html
# A template function is included here for your convenience.

#' @title
#' Function title
#'
#' @description
#' Brief function description.
#'
#' @details
#' Function details.
#'
#' @param bar A <variable type>. Description of the function argument.
#'
#' @returns Description of what the function returns.
#'
#' @examples
#' Put executable R code here that demonstrates how the function works.
#' Code must run without error.
foo <- function(bar) {
  return(NULL)
}

In [None]:
# Include here any global variables for easy reference throughout the notebook. Examples include
# color palettes, plot specifications, variable thresholds/cutoffs, etc.

----

# Data loading

## AWS sync data files

Calls to `aws s3 sync` ensure that local copies of processed data files (i.e., not in `data/raw`) match the copies in the AWS S3 repository storage directory.

In [None]:
# Template for building calls to aws s3 sync
cmd_prefix <- "aws s3 sync"
cmd_suffix_list <- list(
  c(
    file.path(s3_dir, '<S3 subdirectory path>'),
    '<local path to store file>', #start from repo top level
    '--exclude="*"', #use if syncing a specific file
    '--include="<wildcard expression>"' #use if syncing a specific file
  )
)
cat("\nExecute the following from the repo top level in a local terminal with AWS credentials configured:\n\n")
dev_null <- lapply(
  cmd_suffix_list,
  function(cmd_suffix) {
    cmd <- paste0(c(cmd_prefix, cmd_suffix), collapse = " ")
    cat(cmd, "\n")
    return(NULL)
  }
)

Paste printed output from above cell into the Markdown code block below. Example that syncs a GENCODE SQLite file:

```
Execute the following from the repo top level in a local terminal with AWS credentials configured:

aws s3 sync s3://rti-hiv/scratch/git_repo/gnetii_supplement/data/processed/annotation/gencode/ data/processed/annotation/gencode/ --exclude="*" --include="gencode_v34*sqlite" 
```
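
If many sync commands are needed, the command assembly can be wrapped in a small function. The sketch below is one way to do this; `build_sync_cmd()` is a hypothetical helper, and the bucket and paths in the usage example are placeholders. It relies on the `aws s3 sync` behavior that later filters take precedence, so `--exclude="*"` comes before `--include`.

```r
# Hypothetical helper: assemble one `aws s3 sync` command string
build_sync_cmd <- function(source, destination, include = NULL) {
  cmd <- c("aws s3 sync", source, destination)
  if (!is.null(include)) {
    # Exclude everything first, then re-include only the matching files
    cmd <- c(cmd, '--exclude="*"', sprintf('--include="%s"', include))
  }
  paste(cmd, collapse = " ")
}

# Placeholder bucket and paths, for illustration only
cat(
  build_sync_cmd(
    source = "s3://bucket/path/data/processed/annotation/",
    destination = "data/processed/annotation/",
    include = "*.sqlite"
  ),
  "\n"
)
```

Swapping `source` and `destination` gives the upload direction (local to S3).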

## Title for data type 1

In [None]:
# Data loading code for data files

## Title for data type 2

In [None]:
# Data loading code for data files

---

# Title for analysis section 1

## Title for analysis section 1 subsection 1

In [None]:
# code block

In [None]:
# Template for outputting results
outfile <- file.path(results_dir, "filename_yyyymmdd.extension")
print(outfile)

write.table(file = outfile, ...) #for plain text tables

saveRDS(file = outfile, ...) #for serialized objects

# For a vector graphic saved as PDF
pdf(file = outfile, width = 5, height = 5)
# plot calling function here
dev.off()
IRdisplay::display_pdf(file = outfile, width = 500) # view graphic as cell output

# For graphic as PNG
png(filename = outfile, width = 5, height = 5, units = "in", res = 150)
# plot calling function here
dev.off()
IRdisplay::display_png(file = outfile, width = 500) # view graphic as cell output
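
As a filled-in illustration of the output template above, the sketch below writes a toy table, a serialized object, and a PDF plot. The data frame, file names, and temporary directory are all assumptions made for the example and should be replaced with real results and paths.

```r
# Assumptions: toy data and a temporary directory stand in for real results
example_dir <- tempdir()
example_table <- data.frame(
  gene_id = c("gene_a", "gene_b"),
  log2_fc = c(1.5, -0.7)
)

# Plain text table (tab-delimited, no row names or quoting)
table_file <- file.path(example_dir, "example_table_20230131.tsv")
write.table(
  example_table,
  file = table_file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Serialized object
rds_file <- file.path(example_dir, "example_table_20230131.rds")
saveRDS(example_table, file = rds_file)

# Vector graphic as PDF
figure_file <- file.path(example_dir, "example_plot_20230131.pdf")
pdf(file = figure_file, width = 5, height = 5)
plot(example_table$log2_fc, main = "Example plot")
dev.off()

# Read the serialized object back to confirm the round trip
identical(readRDS(rds_file), example_table)
```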

---

# Title for analysis section 2

## Title for analysis section 2 subsection 1

In [None]:
# code block

---

# Sync new files to AWS S3

Any new data files, serialized results, or tables generated by this analysis that may be needed for other analyses or reporting should be synced to the repository's storage directory on AWS S3.

In [None]:
cmd_prefix <- "aws s3 sync"
cmd_suffix_list <- list(
  c(
    "<path to local directory for file>", #path should be relative to repo top level
    file.path(s3_dir, '<S3 subdirectory path>'),
    '--exclude="*"',
    '--include="<filename>"'
  )
)
cat("\nExecute the following from the repo top level in a local terminal with AWS credentials configured:\n\n")
dev_null <- lapply(
  cmd_suffix_list,
  function(cmd_suffix) {
    cmd <- paste0(c(cmd_prefix, cmd_suffix), collapse = " ")
    cat(cmd, "\n")
    return(NULL)
  }
)

Paste printed output from above cell into the Markdown code block below. Example that syncs a local file to S3:

```
Execute the following from the repo top level in a local terminal with AWS credentials configured:

aws s3 sync data/processed/dataset_0001/0001/rna_expression/sequencing/cd4_t_cells/ s3://rti-hiv/scratch/git_repo/gnetii_supplement/data/processed/dataset_0001/0001/rna_expression/sequencing/cd4_t_cells/ --exclude="*" --include="txi_dtuscaledtpm_data_20230810.rds" 
```

---

# Session info

In [None]:
sessionInfo()

```
Paste sessionInfo() output here since code cell output gets cleared on git commit
```
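
Because cell output is cleared on commit, it may help to capture the session info as text so it can be pasted above or saved alongside the results. A minimal sketch, assuming a temporary output path and a placeholder file name:

```r
# Capture the printed sessionInfo() output as a character vector
session_lines <- capture.output(sessionInfo())

# Assumption: a temporary file stands in for a path under the results directory
session_file <- file.path(tempdir(), "session_info_20230131.txt")
writeLines(session_lines, session_file)

# Print the captured text for pasting into the Markdown block
cat(session_lines, sep = "\n")
```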