# Updating fragment file paths in a Seurat object
Whenever you transfer, download, or move fragment files to a location different from the one used in the original analysis, you need to update the fragment file paths stored in the Seurat object.

In this demonstration, we have a Seurat object that contains an ATAC assay whose fragment list points to 27 fragment files. The object was created and analyzed on TSCC, but I need to work with it on NRNB, so the fragment files have to be transferred over and their paths updated. I've already transferred the files, so let's update the paths!

# Load in the Seurat object

In [None]:
# Read in Seurat object
adata <- readRDS(rds_path)
adata

# Choose the ATAC assay that holds the fragment files

In [None]:
# Make sure the assay is set to the ATAC assay you want to update
DefaultAssay(adata) <- "mpeak"

# Get the fragment objects out of the Seurat object

In [None]:
frags <- Fragments(adata)  # get list of fragment objects

# Find an identifier for each current fragment file
Next, you need an identifier for each fragment file in the object, in the order the files are stored. How hard this is depends on how your files are set up. If the fragment files in the Seurat object already have a natural ordering, you can effectively skip this step. Either way, the new locations will need to be matched to this ordering.

Often, an identifier is contained within the fragment file path. That is the case in this demonstration, where the file paths look like this:

"/nfs/lab/projects/igvf/data/multiome/DM041_multi/DM45A_72h_control/atac_fragments.tsv.gz"

We will extract "DM45A" from this path. Doing this for each path gives us an ordering of fragment files that the new locations can be lined up against.

In [None]:
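# Need to get the proper ordering of fragments from the actual object
obj_sample_names <- c()  # initialize before appending inside the loop
for (i in seq_along(frags)) {
    # Element 9 of the split path is the sample directory, e.g. "DM45A_72h_control"
    sample <- toupper(strsplit(frags[[i]]@path, "/")[[1]][9])
    if (substr(sample, 1, 1) == "I") {
        # Directory names starting with "I" carry the sample ID in the second field
        sample <- strsplit(sample, "_")[[1]][2]
    } else {
        sample <- strsplit(sample, "_")[[1]][1]
    }
    obj_sample_names <- c(obj_sample_names, sample)
}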
obj_sample_names
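The loop above hard-codes element 9 of the split path, which only works for paths of exactly this depth. A minimal base-R sketch of the same extraction, assuming the sample ID lives in the name of the directory that directly contains each fragment file; `extract_sample_id()` is a hypothetical helper, not part of Seurat or Signac, and the second example path is made up.

```r
# Hypothetical helper (not from Seurat/Signac): vectorized version of the loop.
# Assumes the sample ID is in the parent directory name, e.g. "DM45A_72h_control".
extract_sample_id <- function(paths) {
  # Parent directory name, independent of how deep the path is
  parent <- toupper(basename(dirname(paths)))
  fields <- strsplit(parent, "_", fixed = TRUE)
  ids <- vapply(fields, function(f) {
    # Same rule as the loop: names starting with "I" keep the ID in field 2
    if (startsWith(f[1], "I")) f[2] else f[1]
  }, character(1))
  unname(ids)
}

paths <- c(
  "/nfs/lab/projects/igvf/data/multiome/DM041_multi/DM45A_72h_control/atac_fragments.tsv.gz",
  "/some/other/root/IGVF_DM46B_deep/atac_fragments.tsv.gz"  # hypothetical path
)
extract_sample_id(paths)  # "DM45A" "DM46B"
```

With Signac loaded, the loop could then be written as `obj_sample_names <- extract_sample_id(vapply(frags, function(f) f@path, character(1)))`, assuming all paths in the object follow the same directory convention.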

# Order the new fragment file paths to match
Now you need to get the paths to all your fragment files in their updated locations and order them to match the identifiers extracted above.

For this example, we are working with fragment files in CellRanger output directory structures like this:

"cellar/users/aklie/data/igvf/beta_cell_networks/cellranger/igvf_sc-islet_10X-Multiome/igvf_dm45a_deep/outs/atac_fragments.tsv.gz"

Here, we get a list of fragment files and extract the sample identifier again. We then name each path by its sample ID so that the list can be easily reordered based on the ordering we got above. Again, if you already have a natural ordering (e.g. lexicographic), this is more straightforward.

In [None]:
# Get the sample-specific directories for the fragment files (dataset_name must be set beforehand)
frag_dir <- file.path("/cellar/users/aklie/data/igvf/beta_cell_networks/cellranger", dataset_name)
sample_dirs <- list.files(path = frag_dir)
sample_dirs

In [None]:
# Grab a mapping of sample IDs to fragment files locally
frag_files <- list()
for (sample_dir in sample_dirs) {
    sample <- toupper(strsplit(sample_dir, "_")[[1]][2])
    frag_file <- list.files(path = file.path(frag_dir, sample_dir, "outs"), pattern = "fragments\\.tsv\\.gz$", full.names = TRUE)  # regex, anchored so the .tbi index is excluded
    frag_files[[sample]] <- frag_file
}
head(frag_files)
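`list.files()` can return zero or several matches per directory, and the loop above stores whatever comes back without complaint. A minimal sketch that fails loudly instead, assuming the same CellRanger layout and that the sample ID is the second `_`-separated field of each run directory; `map_fragment_files()` is hypothetical, and the toy directory tree (including `igvf_dm46b_deep`) is made up for the demo.

```r
# Hypothetical helper (not from Seurat/Signac): build a sample -> fragment file
# mapping from "<frag_dir>/<run>/outs/", e.g. "igvf_dm45a_deep" -> "DM45A".
map_fragment_files <- function(frag_dir) {
  files <- character(0)
  for (sample_dir in list.files(path = frag_dir)) {
    sample <- toupper(strsplit(sample_dir, "_", fixed = TRUE)[[1]][2])
    hits <- list.files(
      path = file.path(frag_dir, sample_dir, "outs"),
      pattern = "fragments\\.tsv\\.gz$",  # anchored, so the .tbi index is skipped
      full.names = TRUE
    )
    # Fail loudly rather than silently storing zero or several paths
    if (length(hits) != 1) {
      stop(sprintf("Expected 1 fragment file for %s, found %d", sample, length(hits)))
    }
    files[sample] <- hits
  }
  files
}

# Toy demo: a throwaway directory tree standing in for the CellRanger output
demo_dir <- file.path(tempdir(), "cellranger_demo")
for (run in c("igvf_dm45a_deep", "igvf_dm46b_deep")) {
  outs <- file.path(demo_dir, run, "outs")
  dir.create(outs, recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(outs, c("atac_fragments.tsv.gz", "atac_fragments.tsv.gz.tbi")))
}
demo_files <- map_fragment_files(demo_dir)
names(demo_files)  # "DM45A" "DM46B"
```

Returning a named character vector rather than a list keeps one path per sample by construction.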

In [None]:
# Order the local frag_files list by the order of the object
new_frag_files <- frag_files[obj_sample_names]
head(new_frag_files)
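Indexing a named list with a name it lacks does not raise an error in R: it returns a `NULL` element named `NA`, which would only surface later inside `UpdatePath()`. A minimal sketch of the reordering step with an explicit check; `order_paths()` and the example paths are hypothetical.

```r
# Hypothetical helper: reorder new paths to follow the sample order extracted
# from the object, stopping if any sample has no new fragment file.
order_paths <- function(new_paths, sample_order) {
  missing <- setdiff(sample_order, names(new_paths))
  if (length(missing) > 0) {
    stop("No new fragment file for: ", paste(missing, collapse = ", "))
  }
  # Subsetting by name returns elements in the order given by sample_order
  new_paths[sample_order]
}

new_paths <- list(
  DM46B = "/new/location/DM46B/atac_fragments.tsv.gz",  # made-up paths
  DM45A = "/new/location/DM45A/atac_fragments.tsv.gz"
)
ordered <- order_paths(new_paths, c("DM45A", "DM46B"))
names(ordered)  # "DM45A" "DM46B"
```

In the notebook this would be `new_frag_files <- order_paths(frag_files, obj_sample_names)`.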

# Update the Fragment object paths
Now that the new fragment paths are ordered properly, we need to point each Fragment object at its new path.

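Signac helps here: `UpdatePath()` computes the MD5 hash of the file at the new path and compares it with the hash stored in the Fragment object, so it stops with an error if the file at the new location is not the same file. It also expects the tabix index (`.tbi`) to sit next to each fragment file.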

In [None]:
# Update each fragment path in the object with the new local information
Fragments(adata) <- NULL  # remove fragment information from assay
for (i in seq_along(frags)) {
  frags[[i]] <- UpdatePath(frags[[i]], new.path = new_frag_files[[i]]) # update path
}
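The idea behind the hash comparison can be illustrated in base R with `tools::md5sum()`. This is a sketch of the concept only, not Signac's implementation; `hash_matches()`, `stored_hash`, and the temporary files are hypothetical stand-ins for a fragment file before and after a transfer.

```r
# Illustration only: compare a file's MD5 hash with a previously stored hash,
# the same idea UpdatePath() relies on. Not Signac's actual code.
hash_matches <- function(path, stored_hash) {
  identical(unname(tools::md5sum(path)), stored_hash)
}

# Toy demo with temporary files standing in for a fragment file
frag <- tempfile(fileext = ".tsv")
writeLines("chr1\t100\t200\tAAACGAAC-1\t1", frag)
stored_hash <- unname(tools::md5sum(frag))  # hash recorded "before the transfer"
copy <- tempfile(fileext = ".tsv")
invisible(file.copy(frag, copy))            # "the transfer"
hash_matches(copy, stored_hash)             # identical bytes -> TRUE
writeLines("chr1\t100\t201\tAAACGAAC-1\t1", copy)
hash_matches(copy, stored_hash)             # modified bytes -> FALSE
```

Hashing the whole file is slower than checking file sizes, but it catches truncated or corrupted transfers that a size check could miss.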

In [None]:
# Double check that the paths are updated
for (i in seq_along(frags)) {
  print(frags[[i]]@path)
}
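Printing the paths is a quick eyeball check. A programmatic alternative could report whether each file and its index exist on this machine; this sketch assumes the index is named `<path>.tbi`, and `check_fragment_paths()` plus the temporary files are hypothetical.

```r
# Hypothetical helper: for each fragment path, report whether the file and an
# assumed tabix index "<path>.tbi" exist on this machine.
check_fragment_paths <- function(paths) {
  paths <- unlist(paths, use.names = FALSE)
  data.frame(
    path = paths,
    exists = file.exists(paths),
    indexed = file.exists(paste0(paths, ".tbi"))
  )
}

# Toy demo: one indexed fragment file and one without an index
tmp <- tempfile("frags_")
dir.create(tmp)
good <- file.path(tmp, "a_fragments.tsv.gz")
bad <- file.path(tmp, "b_fragments.tsv.gz")
invisible(file.create(c(good, paste0(good, ".tbi"), bad)))
check_fragment_paths(list(good, bad))
```

With Signac loaded, `check_fragment_paths(lapply(frags, function(f) f@path))` would run the same check on the updated objects.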

# Perform some sanity checks
It's always a good idea to make sure the update went as expected. 

What to verify will likely depend on the dataset. Here, we compare the number of cells per sample in the object's metadata with the number of cells each Fragment object tracks for its file.

In [None]:
# Take a look at how many cells are from each fragment file in the Fragment object
samples <- c()
for (i in seq_along(frags)) {
    sample <- strsplit(Cells(frags[[i]])[1], "_")[[1]][1]
    # Add an "A" to the end if sample starts with "D"
    if (substr(sample, 1, 1) == "D") {
        sample <- paste(sample, "A", sep = "")
    }
    print(paste(sample, length(Cells(frags[[i]])), sep = ": "))
    samples <- c(samples, sample)
}

In [None]:
# Compare this to the metadata counts, they should match up
table(adata$sample)[samples]
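Comparing the two printouts by eye gets tedious with 27 files. A minimal sketch that lines the counts up and flags mismatches; `compare_counts()` is hypothetical and the counts in the demo are made up.

```r
# Hypothetical helper: line up per-sample cell counts from the Fragment objects
# and from the metadata, flagging samples where they disagree or are missing.
compare_counts <- function(frag_counts, meta_counts) {
  # Normalize named vectors or 1-D tables to named integer vectors
  frag_counts <- setNames(as.integer(frag_counts), names(frag_counts))
  meta_counts <- setNames(as.integer(meta_counts), names(meta_counts))
  samples <- union(names(frag_counts), names(meta_counts))
  out <- data.frame(
    sample = samples,
    fragments = unname(frag_counts[samples]),  # NA if the sample is absent
    metadata = unname(meta_counts[samples])
  )
  out$match <- !is.na(out$fragments) & !is.na(out$metadata) &
    out$fragments == out$metadata
  out
}

# Toy demo with made-up counts: DM46B is off by one cell
frag_counts <- c(DM45A = 812L, DM46B = 640L)
meta_counts <- table(rep(c("DM45A", "DM46B"), times = c(812, 639)))
res <- compare_counts(frag_counts, meta_counts)
res[!res$match, ]
```

In the notebook, `frag_counts` could be built as `setNames(vapply(frags, function(f) length(Cells(f)), integer(1)), samples)` and `meta_counts` as `table(adata$sample)`.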

# Update and save the Seurat object

In [None]:
# assign updated list back to the object
Fragments(adata) <- frags 
Fragments(adata)

In [None]:
# Save a copy of this bad boy
saveRDS(adata, file.path(seurat_dir, dataset_name, "25Aug23", "25Aug23_all.cells.rds"))

# Do some other analyses!