# **Prepare the input for VID**
#### **Two options for VID input:**
##### A. h5ad file (.h5ad)
##### B. gene expression table (.csv) + metadata table (.csv)

#### **Install packages: Seurat, SeuratDisk**

In [None]:
# if (!requireNamespace("remotes", quietly = TRUE)) {
#   install.packages("remotes")
# }
# remotes::install_github("satijalab/seurat", ref = "v4.4.0")
# remotes::install_github("mojaveazure/seurat-disk")
# install.packages("data.table") # provides fwrite(), used for the chunked export in option B

In [23]:
library(Seurat)
library(SeuratDisk)

### **Load the Seurat object**

In [2]:
seurat_obj_dir <- '' # path to the saved Seurat object (.rds file)

In [3]:
seurat_obj <- readRDS(seurat_obj_dir) # load the Seurat object
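Which route applies depends on the Seurat version the object was built with. Below is a minimal sketch that assumes the object's `@version` slot is an adequate signal. `choose_vid_route()` is a hypothetical helper. A V5 object can still contain V3-style assays, so treat its answer as a heuristic rather than a guarantee.

```r
# Hypothetical helper: pick the VID input route from a Seurat object's
# version string, e.g. as.character(seurat_obj@version).
# Heuristic only: a V5 object may still hold V3-style assays.
choose_vid_route <- function(object_version) {
    v <- package_version(object_version)
    if (v < "5.0.0") {
        "A"  # V4 or earlier: convert to .h5ad with SeuratDisk
    } else {
        "B"  # V5: export expression and metadata as .csv tables
    }
}

# Examples with plain version strings (no Seurat object needed)
choose_vid_route("4.4.0")  # "A"
choose_vid_route("5.0.1")  # "B"
```

With the object loaded above, the call would be `choose_vid_route(as.character(seurat_obj@version))`.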

### **A. If your Seurat object is V4 or earlier (lower than V5), convert it to an h5ad file with the code below**

#### *seurat object (.rds) -> h5seurat (.h5seurat) -> h5ad (.h5ad)* 

In [14]:
h5seurat_dir <- '' # output path for the intermediate h5Seurat file (should end in .h5seurat)
h5ad_dir <- '' # output path for the h5ad file, i.e. the VID input (should end in .h5ad)

In [13]:
SaveH5Seurat(seurat_obj, filename = h5seurat_dir) # save the Seurat object as an h5Seurat file
Convert(h5seurat_dir, dest = h5ad_dir) # convert the h5Seurat file to an h5ad file

Creating h5Seurat file for version 3.1.5.9900
Adding counts for RNA
Adding data for RNA
Adding scale.data for RNA
Adding variable features for RNA
Adding feature-level metadata for RNA
Adding cell embeddings for pca
Adding loadings for pca
No projected loadings for pca
Adding standard deviations for pca
No JackStraw data for pca
Adding cell embeddings for umap
No loadings for umap
No projected loadings for umap
No standard deviations for umap
No JackStraw data for umap
Adding cell embeddings for harmony
Adding loadings for harmony
Adding projected loadings for harmony
Adding standard deviations for harmony
No JackStraw data for harmony
Adding cell embeddings for tsne
No loadings for tsne
No projected loadings for tsne
No standard deviations for tsne
No JackStraw data for tsne
Validating h5Seurat file
Adding scale.data from RNA as X
Transfering meta.features to var
Adding data from RNA as raw
Transfering meta.features to raw/var
Transfering meta.data to ob

### **B. If your Seurat object is V5, extract the gene expression matrix and metadata from it as two CSV tables**
#### *seurat object (.rds) -> gene expression matrix (.csv) + metadata (.csv)*

#### Extract the metadata

In [4]:
meta_data <- seurat_obj@meta.data # extract the metadata (one row per cell) from the Seurat object
meta_path <- '' # output path for the metadata table (.csv)
write.csv(meta_data, file = meta_path, row.names = TRUE) # save, keeping cell barcodes as row names

#### Extract the log-normalized gene expression matrix

In [15]:
output_file <- '' # output path for the gene expression table (.csv)

In [22]:
log_normalized_data <- GetAssayData(seurat_obj, assay = "RNA", layer = "data") # Access the log-normalized data
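`GetAssayData()` typically returns a sparse matrix with genes as rows and cells as columns. Writing it to CSV requires densifying it, and a dense double matrix costs 8 bytes per entry no matter how many entries are zero. A back-of-the-envelope estimate shows why large datasets may need the chunked route. `dense_size_gib()` is a hypothetical helper, and the 20,000 genes × 50,000 cells figure is only an illustration.

```r
# Hypothetical helper: memory (GiB) needed to hold a dense double matrix
# with n_genes rows and n_cells columns (8 bytes per entry).
dense_size_gib <- function(n_genes, n_cells) {
    n_genes * n_cells * 8 / 1024^3
}

dense_size_gib(20000, 50000)  # about 7.45 GiB
```

For the real data, the call would be `dense_size_gib(nrow(log_normalized_data), ncol(log_normalized_data))`. Treat the result as a lower bound on what the direct `write.csv()` route needs, since the dense copy must exist before any text is written.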

In [20]:
library(data.table) # provides fwrite() for the chunked export

# Save the gene expression table directly if the dense matrix fits in memory;
# otherwise, fall back to writing it in chunks of genes (rows)
write_in_chunks <- function(mat, file, chunk_size = 1000) {
    n_genes <- nrow(mat)
    for (start in seq(1, n_genes, by = chunk_size)) {
        end <- min(start + chunk_size - 1, n_genes)
        # Densify only this block of genes; every block keeps all cells as columns,
        # so the appended rows line up with the header written by the first block
        chunk_df <- as.data.frame(as.matrix(mat[start:end, , drop = FALSE]))
        fwrite(chunk_df, file = file, row.names = TRUE,
               col.names = (start == 1), append = (start > 1))
    }
}

result <- tryCatch({
    # Try to write the whole matrix at once (densified first)
    write.csv(as.matrix(log_normalized_data), file = output_file, row.names = TRUE)
    list(success = TRUE)
}, warning = function(w) {
    # Handle warnings
    cat("A warning occurred: ", w$message, "\n")
    list(success = FALSE)
}, error = function(e) {
    # Handle errors (e.g. the dense matrix is too large)
    cat("An error occurred: ", e$message, "\n")
    list(success = FALSE)
})

# Write in chunks only if the direct export did not succeed
if (!result$success) {
    write_in_chunks(log_normalized_data, output_file, chunk_size = 1000)
}
cat("Execution completed\n")

Execution completed
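Chunking by rows (genes) keeps every chunk's columns identical to the header, which is what makes appending safe. Chunking by columns and appending would instead stack blocks whose values sit under the wrong cell barcodes. The sketch below checks the row-wise approach on a small toy sparse matrix. It uses base R's `write.table()` instead of `data.table::fwrite()` so it runs with only the `Matrix` package. `write_rows_in_chunks()` and the toy data are illustrative, not part of the original workflow.

```r
library(Matrix)

# Illustrative helper: write a (sparse) genes x cells matrix to CSV in blocks
# of rows, so only chunk_size rows are densified at a time.
write_rows_in_chunks <- function(mat, file, chunk_size = 2) {
    n <- nrow(mat)
    for (start in seq(1, n, by = chunk_size)) {
        end <- min(start + chunk_size - 1, n)
        block <- as.matrix(mat[start:end, , drop = FALSE])
        if (start == 1) {
            # First block: header row with cell names (blank first cell, like write.csv)
            write.table(block, file = file, sep = ",", col.names = NA)
        } else {
            # Later blocks: append rows only, no header
            write.table(block, file = file, sep = ",", col.names = FALSE, append = TRUE)
        }
    }
}

# Toy data: 5 genes x 3 cells, mostly zeros, stored as a sparse matrix
dense <- matrix(c(0, 1.2, 0, 0, 3.4,
                  0, 0, 2.1, 0, 0,
                  0.5, 0, 0, 0, 1.1),
                nrow = 5, ncol = 3,
                dimnames = list(paste0("gene", 1:5), paste0("cell", 1:3)))
toy <- Matrix(dense, sparse = TRUE)

csv_file <- tempfile(fileext = ".csv")
write_rows_in_chunks(toy, csv_file, chunk_size = 2)

# Read back and compare with the dense original
round_trip <- as.matrix(read.csv(csv_file, row.names = 1, check.names = FALSE))
all.equal(round_trip, as.matrix(toy))  # TRUE
```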


In [21]:
# Report which export route was used; isTRUE() treats a missing flag as FALSE
if (isTRUE(result$success)) {
  print("Gene expression table saved.")
} else {
  print("Sample size is too big to save directly, gene expression table has been saved with chunks.")
}

[1] "Sample size is too big to save directly, gene expression table has been saved with chunks."
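Because the two tables are written separately, it may be worth confirming that they describe the same cells in the same order before passing them to VID. How VID matches the two tables is not described here, so this check is an assumption about what matters. Below is a minimal base-R sketch. `cells_match()` is a hypothetical helper that compares the metadata row names with the expression table's header, reading only the first line of the (possibly large) expression CSV.

```r
# Hypothetical helper: check that the metadata rows (cells) and the expression
# table's columns (cells) list the same barcodes in the same order.
cells_match <- function(meta_file, expr_file) {
    meta_cells <- rownames(read.csv(meta_file, row.names = 1, check.names = FALSE))
    # Read only the header line of the expression table; drop the empty first field
    header <- scan(expr_file, what = character(), sep = ",", nlines = 1, quiet = TRUE)
    identical(meta_cells, header[-1])
}

# Toy example with two small files
meta_file <- tempfile(fileext = ".csv")
expr_file <- tempfile(fileext = ".csv")
write.csv(data.frame(cluster = c("T", "B"), row.names = c("AAAC-1", "AAAG-1")), meta_file)
write.csv(matrix(c(0, 1.5, 2, 0), nrow = 2,
                 dimnames = list(c("CD3E", "MS4A1"), c("AAAC-1", "AAAG-1"))),
          expr_file)
cells_match(meta_file, expr_file)  # TRUE
```

For the files written above, the call would be `cells_match(meta_path, output_file)`.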


###### ***If both methods fail, consider downgrading Seurat to V4.***