memory leak for single-cell resolution spatial edgelist construction #21

Open
paularstrpo opened this issue Oct 10, 2023 · 0 comments

Comments

@paularstrpo

Hi, thank you for writing such a wonderful and useful tool!

I am currently using this on my MERFISH data, and I noticed that even with a single sample of 30,000 cells, the function that constructs the edgelist causes what looks like a memory leak. It seems to first calculate the distances between all pairs of cells for the spatial edgelist, and only then filter for the k closest neighbors. For single-cell resolution spatial data this quickly gets out of hand: 30,000 cells already means 900 million pairwise distances, and in my case 64 GB of RAM was no match.
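To give a rough sense of scale, here is a back-of-the-envelope estimate in base R. It assumes distances are stored as 8-byte doubles and says nothing about how the package actually stores them internally; a long-format table with `from`, `to` and `distance` columns would need several times more than the dense matrix figure below.

```r
# Rough memory estimate for all pairwise distances between n cells.
# Assumption: each distance is stored as an 8-byte double.
n_cells <- 30000
bytes_per_double <- 8

n_pairs <- n_cells^2                                  # all ordered pairs, self-pairs included
dense_gib <- n_pairs * bytes_per_double / 1024^3      # full n x n distance matrix
per_cell_mib <- n_cells * bytes_per_double / 1024^2   # one row of distances at a time

cat(sprintf("pairs: %.0f\n", n_pairs))
cat(sprintf("dense matrix: %.2f GiB\n", dense_gib))
cat(sprintf("one row at a time: %.2f MiB\n", per_cell_mib))
```

Holding only one row of distances at a time is what keeps the iterative approach small, at the cost of extra run time.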

I have made a slower but more RAM-friendly workaround by hacking together a version of the edgelist construction function that finds the neighboring cells iteratively, one cell at a time, and only retains the k closest neighbor cells as it goes.

I'm including a snippet of it here (with an added option for parallelization with future), since I hope it may be a useful starting point for solving this memory problem for others using your software.

# example for 10 neighbors
# meta is a data.frame with your cell metadata information
# spatial_1 and spatial_2 are column names inside the meta data frame for the x and y coordinates
get_niche_neighbors <- function(meta, spatial_1 = 'spatial_1', spatial_2 = 'spatial_2', k = 10) {
  library(pbapply)  # provides pblapply(); cl = "future" also needs the future.apply package installed
  library(future)   # set a plan beforehand, e.g. plan(multisession), to actually run in parallel

  df <- data.frame(x = meta[, spatial_1], y = meta[, spatial_2])
  rownames(df) <- rownames(meta)

  edgelist <- pblapply(X = rownames(df),
                       cl = "future",
                       FUN = function(cell) {
    # distances from this cell to every cell: one vector of length nrow(df), never the full matrix
    d <- setNames(sqrt((df[cell, "x"] - df$x)^2 + (df[cell, "y"] - df$y)^2), rownames(df))
    # only keep the k closest cells for each iteration; don't let it save the rest
    # (k + 1 entries because the cell itself is included at distance 0)
    data.frame(
      from = cell,
      to = names(sort(d))[seq_len(min(k + 1, nrow(df)))]
    )
  })
  edgelist <- do.call(rbind, edgelist)

  return(edgelist)
}
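
For anyone who wants to check the idea without installing pbapply or future, the following is a minimal serial sketch in base R. The function name `knn_edgelist_serial` and the toy coordinates are made up for illustration and are not part of the package; like the snippet above, it keeps k + 1 rows per cell because each cell is its own nearest neighbor.

```r
# Hypothetical serial variant of the per-cell approach, base R only.
# coords: numeric matrix with two columns (x, y) and cell IDs as row names.
knn_edgelist_serial <- function(coords, k = 10) {
  stopifnot(is.matrix(coords), ncol(coords) == 2, !is.null(rownames(coords)))
  n <- nrow(coords)
  keep <- min(k + 1, n)  # + 1 because the cell itself sits at distance 0
  edges <- lapply(seq_len(n), function(i) {
    # squared distances from cell i to all cells: a single length-n vector
    d2 <- (coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2
    nearest <- order(d2)[seq_len(keep)]  # ties keep their original order
    data.frame(from = rownames(coords)[i],
               to = rownames(coords)[nearest],
               distance = unname(sqrt(d2[nearest])),
               row.names = NULL)
  })
  do.call(rbind, edges)
}

# Toy data: five cells on a line at x = 0, 1, 3, 6, 10
coords <- cbind(x = c(0, 1, 3, 6, 10), y = 0)
rownames(coords) <- paste0("cell", 1:5)

el <- knn_edgelist_serial(coords, k = 2)
print(el[el$from == "cell1", ])
```

On a dataset this small the result can be compared against `as.matrix(dist(coords))` as a sanity check. For the parallel version above, a future plan such as `plan(multisession)` would need to be set before calling `get_niche_neighbors()`; otherwise it should simply run sequentially.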

Sorry if this would be better placed as a pull request; I am honestly not much of a developer!

@jcyang34 jcyang34 added the enhancement New feature or request label Nov 9, 2023
@jcyang34 jcyang34 self-assigned this Nov 9, 2023