# Seurat2SCP Notebook

## MANUAL ITERATION  (RECOMMENDED)
This approach is easier for the user to tweak and understand.
You can also scroll down to run [AUTOMATIC ITERATION](#AUTOMATIC-ITERATION), see if and at which step it fails, and return here afterward to run individual chunks.

### SETUP
#### Load required libraries and the scp_save_seurat.R file

In [None]:
library(Seurat)
library(crunch)
library(data.table)
library(R.utils)
source("scp_save_seurat.R")

#### Edit paths to Seurat objects and the preferred output path

In [None]:
# Paths to .RData/.Rdata Seurat objects (the load() call used below does not read .Rds files).
# Enter each path in quotation marks on its own line and separate the lines with ','.
seurat.paths <- c(
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/allergen.RData",
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Split_Up/B_comb.RData"
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Split_Up/CD4_comb.RData"
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Objects/Week13.All.Seurat.Rdata",
                  "/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Objects/Week25.All.Seurat.Rdata"
                )
if (length(seurat.paths) == 0) stop("No paths were entered.")

# The output path to which the output SCP files will be sent.
# Entering '' refers to the directory in which the notebook is being run.
output.dir <- ''

# Index of the current path in seurat.paths; it cannot be greater than length(seurat.paths).
# If a step has failed you can return here and set 'i' to the index number from which you wish to proceed!
i <- 1
if (i > length(seurat.paths)) stop(paste("i:", i, "cannot be greater than the length of seurat.paths:", length(seurat.paths), sep=' '))
if (i < 1) stop("i cannot be less than 1.")

# Initialize an empty list to collect the metadata dataframe from each object.
metadata.dfs = list()

print("Successfully initialized!")

### START OF LOOP
Conceptual equivalent:
```
for (i in seq_along(seurat.paths)) {
    ...
}
```

#### Initialization

In [None]:
# Initialize the ith path
cur.path <- seurat.paths[i]
print(cur.path)

In [None]:
# Load the object
print("Loading object, this will probably take a long time...")
object.name <- load(cur.path)
print(paste("Object: ", object.name, ", Version ", get(object.name)@version, sep=''))
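
`load()` only restores `.RData`/`.Rdata` files; an `.Rds` file has to be read with `readRDS()`. The following is a minimal sketch of a loader that handles both, assuming a hypothetical helper name `load_seurat_file` (it is not part of `scp_save_seurat.R`) and assuming each `.RData` file holds exactly one object. A toy list stands in for a Seurat object.

```r
# Hypothetical helper: returns the object's name and the object itself.
load_seurat_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    # .Rds files store a single unnamed object; name it after the file.
    name <- tools::file_path_sans_ext(basename(path))
    return(list(name = name, object = readRDS(path)))
  }
  # .RData files restore named objects; load them into a private environment.
  env <- new.env()
  names.loaded <- load(path, envir = env)
  if (length(names.loaded) != 1) {
    stop("Expected exactly one object in ", path, ", found ", length(names.loaded))
  }
  list(name = names.loaded, object = get(names.loaded, envir = env))
}

# Toy usage: a small list stands in for a Seurat object.
toy <- list(cells = c("c1", "c2"))
rdata.path <- file.path(tempdir(), "toy.RData")
save(toy, file = rdata.path)
loaded <- load_seurat_file(rdata.path)
print(loaded$name)
```

Loading into a private environment keeps the global workspace clean, so there is no old object to remove afterward.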

In [None]:
# Update the Seurat object to the format of the newest version.
print("Updating object, this will also take some time...")
object = UpdateSeuratObject(get(object.name))
print(paste("Object", object.name, "updated to", object@version, sep=' '))

In [None]:
# Deallocate the old Seurat object to save space
rm(list=c(object.name))

In [None]:
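# Prefix cell names with the object name
# Note: RenameCells() returns a modified copy. The result is not assigned here, so 'object' keeps
# its original cell names; the metadata step below prepends object.name itself.
print(paste("Prefixing cell names with object name ", object.name, "...", sep=''))
RenameCells(object, add.cell.id = object.name)
print("Prefixing cells completed!")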
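
R functions return modified copies rather than changing their arguments in place, so a renaming call only takes effect when its result is assigned. The toy sketch below illustrates this with a plain matrix standing in for the Seurat object; `prefix_cells` is a hypothetical stand-in, not `RenameCells()` itself.

```r
# Hypothetical stand-in for a renaming function: returns a renamed copy.
prefix_cells <- function(m, id) {
  colnames(m) <- paste(id, colnames(m), sep = "_")
  m
}

expr <- matrix(1:4, nrow = 2,
               dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

# Result discarded: expr itself is unchanged.
invisible(prefix_cells(expr, "Week25"))
unchanged <- colnames(expr)

# Result assigned: the column names now carry the prefix.
expr <- prefix_cells(expr, "Week25")
renamed <- colnames(expr)

print(unchanged)
print(renamed)
```

If the result of the renaming call is assigned in the notebook, keep in mind that the metadata step prepends `object.name` to the row names as well, which would then prefix them a second time.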

#### Save the expression matrix

In [None]:
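# Initialization
# Note: output.dir is pasted directly in front of "SCP", so a non-empty output.dir must end with '/'.
# Note: the filename does not include object.name, so every iteration writes to the same expression file.
output.prefix <- paste(output.dir, "SCP", sep='')
expr.filename <- paste(output.prefix, "_norm_expression.txt", sep='')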

In [None]:
# Add the gene column to the expression data and save as a dataframe
print("Adding gene column and saving as dataframe, this will take a long time...")
source("scp_save_seurat.R")
expr.df <- add_gene_column(object@assays$RNA@data, object.name) 
print("Finished saving as a dataframe!")

**PLEASE WAIT FOR THE ABOVE CHUNK TO COMPLETE BEFORE PROCEEDING.**

*Next chunk: write the expression matrix to a .txt.gz file in output.dir.*
*Two methods are provided: either write.csv.gz(), or fwrite() followed by gzip().*
*The latter seems faster. Comment out whichever method you do not want to use.*

In [None]:
#print("Writing and compressing expression matrix to .txt.gz file...")
#write.csv.gz(x=expr.df, file=expr.filename, quote=FALSE, sep='\t', col.names=TRUE) 
#print("Finished writing and compressing expression matrix to .txt.gz file!")

print("Writing expression matrix, this will take some time...")
fwrite(x=expr.df, file=expr.filename, quote=FALSE, sep='\t', col.names=TRUE)
print("Compressing expression matrix to .txt.gz file...")
gzip(expr.filename, destname=paste(expr.filename, ".gz", sep=''))
print("Finished compression of expression matrix to .txt.gz file!")
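
For reference, base R can also write the compressed file in a single step through a `gzfile()` connection. This is a sketch on toy data, not one of the notebook's two methods. The `GENE` column name and the toy matrix are assumptions, since the real layout is produced by `add_gene_column()`.

```r
# Toy expression matrix: genes in rows, cells in columns.
toy.expr <- matrix(c(0, 1.5, 2, 0), nrow = 2,
                   dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))

# Assumed layout: a leading GENE column followed by one column per cell.
toy.df <- data.frame(GENE = rownames(toy.expr), toy.expr, check.names = FALSE)

# Write tab-delimited text straight into a gzip-compressed file.
toy.file <- file.path(tempdir(), "toy_norm_expression.txt.gz")
con <- gzfile(toy.file, open = "w")
write.table(toy.df, file = con, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = TRUE)
close(con)

# Read the compressed file back to confirm the round trip.
round.trip <- read.delim(toy.file, check.names = FALSE)
print(round.trip)
```

For large matrices `fwrite()` is likely to remain the faster choice; the sketch only shows that no intermediate uncompressed file is strictly required.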

#### Save the cluster files
*Change the below parameters if the column names of your dimensionality reduction are not the ones used below.*

In [None]:
# Print the names of column headers and look for your X and Y dimensionality reduction parameter names!
colnames(object@meta.data)

In [None]:
# Edit your dimensionality reduction parameters if the column names in your object are not the ones used below!
dim.red.type = "umap" # Can be any string. Preferably name of dimensionality reduction type.
X.name <- "X_umap1" # Needs to be the object parameter found in object@meta.data
Y.name <- "X_umap2" # Needs to be the object parameter found in object@meta.data
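
Because `X.name` and `Y.name` must match column names in `object@meta.data`, a quick check before saving can catch typos early. The sketch below assumes a hypothetical helper `check_dim_columns` and uses a toy data frame in place of the real metadata.

```r
# Hypothetical helper: stop with a clear message if a coordinate column is missing.
check_dim_columns <- function(meta, x.name, y.name) {
  missing.cols <- setdiff(c(x.name, y.name), colnames(meta))
  if (length(missing.cols) > 0) {
    stop("Column(s) not found in metadata: ", paste(missing.cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy metadata standing in for object@meta.data.
toy.meta <- data.frame(X_umap1 = c(0.1, 0.2), X_umap2 = c(1.1, 1.2),
                       row.names = c("cell1", "cell2"))

# Passes silently because both columns exist.
check_dim_columns(toy.meta, "X_umap1", "X_umap2")

# Fails with a message naming the missing columns.
result <- tryCatch(check_dim_columns(toy.meta, "X_tsne1", "X_tsne2"),
                   error = function(e) conditionMessage(e))
print(result)
```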

In [None]:
# Save the cluster file using the parameters entered above
output.prefix <- paste(output.dir, "SCP", sep='')
cluster.file.prefix <- paste(output.prefix, object.name, dim.red.type, sep='_')
print("Saving cluster file...")
save_cluster_file(object@meta.data, X.name, Y.name, object.name, cluster.file.prefix)
print("Cluster file saved!")

#### Merge the metadata into the dataframe

In [None]:
# Merge metadata
metadata.df = data.frame(CELLS=paste(object.name, rownames(object@meta.data), sep='_'), object@meta.data)
metadata.dfs[[i]] <- metadata.df
rm(metadata.df)

### END OF LOOP, DEALLOCATE OBJECT AND INCREMENT `i`

In [None]:
# Deallocate the updated Seurat object to save space!
rm(expr.df)
rm(object)

In [None]:
# Increment i or terminate
if (i < length(seurat.paths)){
    i <- i + 1
    cur.path <- seurat.paths[i]
    print(cur.path)
    print("Return to the START chunk using the link below!")
} else {
    print("Loop completed! No paths remain! Proceed to save merged metadata in the specific section below!")
}

### [CLICK HERE TO RETURN TO THE START OF THE LOOP!](#START-OF-LOOP)


#### AFTER LOOPING: Save the merged metadata dataframe as a rough .txt file
This .txt file does not adhere to SCP format. Another notebook must be used to convert it to that format.

In [None]:
merged.metadata.df = Reduce(function(x, y) merge(x, y, all=TRUE), metadata.dfs)
write.table(x=merged.metadata.df, file="merged_metadata.txt", quote=FALSE, sep='\t', row.names=FALSE)
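
`merge(x, y, all=TRUE)` performs a full outer join on every column the two data frames share, so rows from all objects are kept and columns present in only one object are filled with `NA`. A toy sketch with made-up metadata columns:

```r
# Two toy metadata tables with partly different columns.
meta.a <- data.frame(CELLS = c("A_cell1", "A_cell2"),
                     nGene = c(100, 200),
                     cluster = c("1", "2"))
meta.b <- data.frame(CELLS = c("B_cell1"),
                     nGene = c(150),
                     tissue = c("lung"))

# Fold the list of data frames into one with successive outer joins.
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(meta.a, meta.b))
print(merged)
```

The merged table has one row per cell and the union of all columns, with `NA` wherever an object lacked a column.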

## AUTOMATIC ITERATION
Very prone to failure; 0/10, would not recommend.

### Setup

In [None]:
# Remove all stored data
rm(list=ls())

In [None]:
# Load libraries and the scp_save_seurat.R file
library(Seurat)
library(crunch)
library(data.table)
library(R.utils)
source("scp_save_seurat.R")

In [None]:
# Paths to .RData/.Rdata Seurat objects (the load() call used below does not read .Rds files).
# Enter each path in quotation marks on its own line and separate the lines with ','.
seurat.paths <- c(
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/allergen.RData",
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Split_Up/B_comb.RData"
                  #"/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Split_Up/CD4_comb.RData"
                  "/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Objects/Week13.All.Seurat.Rdata",
                  "/Users/jggatter/Desktop/Alexandria/alexandria_repository/uploadHelpers/Objects/Week25.All.Seurat.Rdata"
                )
if (length(seurat.paths) == 0) stop("No paths were entered.")

# Enter the types of dimensionality reduction found in each object.
dim.red.types <-   c(
                    "umap"
                    )
if (length(dim.red.types) == 0) stop("No dimensionality reduction types were entered.")

# For each dim red type, enter their respective X parameter found in object@meta.data
X.dim.red.names <- c(
                    "X_umap1"
                    )
if (length(X.dim.red.names) != length(dim.red.types)) stop("Insufficient X names were entered.")

# For each dim red type, enter their respective Y parameter found in object@meta.data
Y.dim.red.names <- c(
                    "X_umap2"
                    )
if (length(Y.dim.red.names) != length(dim.red.types)) stop("Insufficient Y names were entered.")

# The output path to which the output SCP files will be sent.
# Entering '' refers to the directory in which the notebook is being run.
output.dir <- ''

# Index of the current path in seurat.paths; it cannot be greater than length(seurat.paths).
# Note: the automatic loop below always starts at 1, so to resume from a later index use the manual section.
i <- 1
if (i > length(seurat.paths)) stop(paste("i:", i, "cannot be greater than the length of seurat.paths:", length(seurat.paths), sep=' '))
if (i < 1) stop("i cannot be less than 1.")

print("Successfully initialized!")

### Run the process automatically

In [None]:
metadata.dfs = list()
output.prefix <- paste(output.dir, "SCP", sep='')
for (i in seq(from=1, to=length(seurat.paths), by=1)){
    # Initialize the ith path
    cur.path <- seurat.paths[i]
    print(cur.path)

    # Load the object
    print("Loading object, this will probably take a long time...")
    object.name <- load(cur.path)
    print(paste("Object: ", object.name, ", Version ", get(object.name)@version, sep=''))

    # Update the Seurat object to the format of the newest version.
    print("Updating object, this will also take some time...")
    object = UpdateSeuratObject(get(object.name))
    print(paste("Object", object.name, "updated to", object@version, sep=' '))

    # Deallocate the old Seurat object to save space
    rm(list=c(object.name))

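    # Prefix cell names with the object name
    # Note: RenameCells() returns a modified copy. The result is not assigned here, so 'object' keeps
    # its original cell names; the metadata step below prepends object.name itself.
    print(paste("Prefixing cell names with object name ", object.name, "...", sep=''))
    RenameCells(object, add.cell.id = object.name)
    print("Prefixing cells completed!")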

    # Add the gene column to the expression data and save as a dataframe
    print("Adding gene column and saving as dataframe, this will take a long time...")
    source("scp_save_seurat.R")
    expr.df <- add_gene_column(object@assays$RNA@data, object.name) 
    expr.filename <- paste(output.prefix, "_norm_expression.txt", sep='')
    print("Finished saving as a dataframe!")

    # METHOD 1: Write and compress the expression matrix as a .txt.gz file 
    #print("Writing and compressing expression matrix to .txt.gz file...")
    #write.csv.gz(x=expr.df, file=expr.filename, quote=FALSE, sep='\t', col.names=TRUE) 
    #print("Finished writing and compressing expression matrix to .txt.gz file!")

    # METHOD 2: Write and compress the expression matrix as a .txt.gz file 
    print("Writing expression matrix, this will take some time...")
    fwrite(x=expr.df, file=expr.filename, quote=FALSE, sep='\t', col.names=TRUE)
    print("Compressing expression matrix to .txt.gz file...")
    gzip(expr.filename, destname=paste(expr.filename, ".gz", sep=''))
    print("Finished compression of expression matrix to .txt.gz file!")

    # Save the cluster file using the parameters entered in the setup chunk
    for (j in seq(from=1, to=length(dim.red.types), by=1)){
        dim.red.type = dim.red.types[j] # Can be any string. Preferably name of dimensionality reduction type.
        X.name <- X.dim.red.names[j] # Needs to be the object parameter found in object@meta.data
        Y.name <- Y.dim.red.names[j] # Needs to be the object parameter found in object@meta.data
        # Save the cluster file using the parameters entered above
        cluster.file.prefix <- paste(output.prefix, object.name, dim.red.type, sep='_')
        print("Saving cluster file...")
        save_cluster_file(object@meta.data, X.name, Y.name, object.name, cluster.file.prefix)
        print("Cluster file saved!")
    }
    
    # Merge metadata
    metadata.df = data.frame(CELLS=paste(object.name, rownames(object@meta.data), sep='_'), object@meta.data)
    metadata.dfs[[i]] <- metadata.df

    # Deallocate the updated Seurat object to save space!
    rm(metadata.df)
    rm(expr.df)
    rm(object)
}

# Merge metadata dataframes and write them to a tab-delimited text file
merged.metadata.df = Reduce(function(x, y) merge(x, y, all=TRUE), metadata.dfs)
write.table(x=merged.metadata.df, file="merged_metadata.txt", quote=FALSE, sep='\t', row.names=FALSE)

*This will take a very long time to run, so please be patient and keep the notebook running while you await your results!*
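
Since a long unattended run can fail partway through, one option is to wrap each iteration in `tryCatch()` and record the index that failed, so the manual section can be resumed from that `i`. The following is a minimal sketch, assuming hypothetical names `run_all` and `process_one`, where `process_one` stands in for the body of the loop above.

```r
# Hypothetical wrapper: runs process_one on each path and reports the first failure.
run_all <- function(paths, process_one) {
  for (i in seq_along(paths)) {
    ok <- tryCatch({
      process_one(paths[i])
      TRUE
    }, error = function(e) {
      message("Failed at i = ", i, " (", paths[i], "): ", conditionMessage(e))
      FALSE
    })
    if (!ok) return(i)  # index to resume from in the manual section
  }
  invisible(0L)         # 0 means every path was processed
}

# Toy usage: the second path triggers an error.
failed.at <- run_all(c("a.RData", "bad.RData", "c.RData"),
                     function(p) if (p == "bad.RData") stop("cannot load"))
print(failed.at)
```

Stopping at the first failure mirrors the manual workflow, where `i` is set to the failing index and the chunks are rerun from there.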