Describe the bug
The current calc_dist_mat constructs a distance matrix between all pairs of cells. The 1st cell is placed in the 0th row, and so on, up to the nth cell, producing an n x n dist_mat.
The functions that index into the dist_mat to get distances between cells, for example compute_closenum, use the value of the label to index into the appropriate row. However, this assumes that the cells are sequentially labeled with no missing values. This is not always the case.
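To make the failure concrete, here is a minimal sketch with made-up data, not the actual calc_dist_mat or compute_closenum code. It assumes 1-based labels that are mapped to row `label - 1`; the labels, centroids, and both helper functions are hypothetical.

```python
import numpy as np

# Hypothetical data: four cells whose labels skip 3 and 4.
labels = np.array([1, 2, 5, 6])
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [9.0, 12.0]])

# Pairwise Euclidean distances. Row i belongs to labels[i], not to label i.
diff = centroids[:, None, :] - centroids[None, :, :]
dist_mat = np.sqrt((diff ** 2).sum(axis=-1))


def row_for_label_positional(dist_mat, label):
    # Assumed behaviour: use the label value itself as the row position.
    return dist_mat[label - 1]


def row_for_label_lookup(dist_mat, labels, label):
    # Find where the label actually sits before indexing.
    pos = np.flatnonzero(labels == label)[0]
    return dist_mat[pos]
```

With these labels, `row_for_label_positional(dist_mat, 5)` asks for row 4 of a 4 x 4 matrix and raises an IndexError, while `row_for_label_lookup(dist_mat, labels, 5)` returns the row for the third cell. In a larger matrix the positional version would not fail at all; it would silently return the distances of a different cell.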
Expected behavior
The benefit of doing it this way is that the distance matrix only has to be constructed once. Then, if someone is doing analyses on a subset of the cells, for example only immune cells, the same total distance matrix can be used, since the indexing still works.
One option would be to sequentially relabel all cells prior to analysis to avoid this error. The other option would be to require people to regenerate a new distance matrix if they want to analyze a subset of their cells.
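For reference, the first option could look roughly like the sketch below. `relabel_cells_sequentially` is a hypothetical helper, not an existing function in this codebase; it uses `np.unique` to map arbitrary integer labels onto 1..n and keeps the old-to-new mapping so the original labels can be recovered.

```python
import numpy as np


def relabel_cells_sequentially(labels):
    """Map arbitrary integer labels to 1..n (hypothetical helper).

    Returns the relabeled array and a dict from old label to new label.
    """
    labels = np.asarray(labels)
    unique_labels, inverse = np.unique(labels, return_inverse=True)
    # inverse holds 0-based positions into unique_labels; shift to 1-based.
    new_labels = inverse.reshape(labels.shape) + 1
    mapping = {int(old): new for new, old in enumerate(unique_labels, start=1)}
    return new_labels, mapping


new_labels, mapping = relabel_cells_sequentially([5, 1, 6, 2, 5])
```

Returning the mapping alongside the new labels means results can be translated back to the original labels afterwards.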
I know @alex-l-kong ran into some annoyances with constructing a fake dist_matrix because of how it expected cells to be labeled. Is this related? Or would switching this not have impacted anything? Tagging @vacuousplanet just to stand out above the sea of other issues I just added.
Yeah this issue is similar to the one I had before. What ends up happening is that the call to regionprops in generate_dist_matrix ends up ordering the centroids in increasing order by x-coords (in the case of ties, ascending y-coord). Unfortunately, this means we end up losing the desired order of cell labels.
As you mention, relabeling the cells is an option, and IMO the easiest, although we could run into problems if the user insists on cells having specific labels. I think regenerating a new distance matrix every time would be a bit cumbersome.
Okay, we're going to modify this so that the coordinate labels of the xarray are the cell labels. Then we can index by label using .loc on the xarray. This means that even as the distance matrix gets subset for different functions, the labels will remain attached to the correct row.
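A rough sketch of what that could look like, assuming the matrix is wrapped in an `xarray.DataArray` whose coordinates are the cell labels. The data and the dimension names `dim_0` and `dim_1` are made up for illustration, and this is not the final implementation.

```python
import numpy as np
import xarray as xr

# Hypothetical data: four cells with non-sequential labels.
labels = np.array([1, 2, 5, 6])
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [9.0, 12.0]])

diff = centroids[:, None, :] - centroids[None, :, :]
dist_mat = np.sqrt((diff ** 2).sum(axis=-1))

# Attach the cell labels as coordinates on both axes.
dist_xr = xr.DataArray(
    dist_mat, coords=[labels, labels], dims=["dim_0", "dim_1"]
)

# Label-based lookup: distance between cell 5 and cell 1.
d_5_1 = float(dist_xr.loc[5, 1])

# Subsetting keeps the labels attached to the right rows and columns.
subset = dist_xr.loc[[2, 5], [2, 5]]
d_5_2 = float(subset.loc[5, 2])
```

Because `.loc` selects by coordinate value rather than position, the same lookup works on the full matrix and on any subset of it, whether or not the labels are sequential.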