
Calling MapDataToCodes produces long vectors error #54

Open
jonhsussman opened this issue Sep 21, 2022 · 8 comments

Comments

@jonhsussman

Hello,

I am running FlowSOM:::MapDataToCodes(someWeights, as.matrix(fovPixelData)), which produces the following error:

```
Error in FlowSOM:::MapDataToCodes(someWeights, as.matrix(fovPixelData)) : 
  long vectors (argument 1) are not supported in .C
```

someWeights is a 100 x 10 numeric matrix, and fovPixelData is a large data set (297923080 obs. of 10 variables). fovPixelData is produced by reading a .feather file with data.table(arrow::read_feather(file)).

Do you have any thoughts on what is causing this error?

Thanks!

@SamGG
Contributor

SamGG commented Sep 21, 2022

Did you try as.matrix(someWeights)?
Did you check that both someWeights and fovPixelData have colnames() set? As a reminder, MapDataToCodes() uses the colnames of codes to select the columns of the newdata matrix before passing them to the C function that performs the mapping.
@SofieVG I think we already discussed this at some point, but no colnames check is currently implemented.

FlowSOM/R/2_buildSOM.R

Lines 246 to 259 in 31bf74c

```r
MapDataToCodes <- function (codes, newdata, distf = 2) {
  nnCodes <- .C("C_mapDataToCodes",
                as.double(newdata[, colnames(codes)]),
                as.double(codes),
                as.integer(nrow(codes)),
                as.integer(nrow(newdata)),
                as.integer(ncol(codes)),
                nnCodes = integer(nrow(newdata)),
                nnDists = double(nrow(newdata)),
                distf = as.integer(distf))
  return(cbind(nnCodes$nnCodes, nnCodes$nnDists))
}
```
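Because the column selection relies entirely on colnames(codes), the missing check mentioned above could be done in R before the call. The sketch below uses a hypothetical helper, check_code_columns(), which is not part of FlowSOM and only verifies names (not types or sizes). Without such a check, a codes matrix with no colnames turns newdata[, colnames(codes)] into newdata[, NULL], which selects zero columns instead of stopping with a clear R error.

```r
# Hypothetical helper (not part of FlowSOM): verify that every column of
# `codes` can be found by name in `newdata` before calling MapDataToCodes().
check_code_columns <- function(codes, newdata) {
  if (is.null(colnames(codes))) {
    stop("codes has no colnames; MapDataToCodes() selects newdata columns by these names")
  }
  missing_cols <- setdiff(colnames(codes), colnames(newdata))
  if (length(missing_cols) > 0) {
    stop("newdata lacks columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

codes <- as.matrix(iris[c(25, 75, 125), -5])
nwdata <- as.matrix(iris[, -5])
check_code_columns(codes, nwdata)               # passes silently
try(check_code_columns(unname(codes), nwdata))  # stops: codes has no colnames
```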

@jonhsussman
Author

Thanks for your reply.

I just tried as.matrix(someWeights) and also checked the colnames() of someWeights and fovPixelData: both are set to character values and match each other. Unfortunately, I am still encountering the same error.

@SamGG
Contributor

SamGG commented Sep 21, 2022

What do typeof(someWeights) and class(someWeights) return?
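For reference, this is what those calls return for a plain numeric matrix versus a data frame holding the same values (class() output as in R >= 4.0.0, where matrices also carry the "array" class). The distinction matters because MapDataToCodes() flattens its inputs with as.double(), which works on a numeric matrix but fails on a data frame, since a data frame is stored as a list of columns.

```r
# A numeric matrix, as MapDataToCodes() expects for codes and newdata
m <- as.matrix(iris[c(25, 75, 125), -5])
typeof(m)   # "double"
class(m)    # "matrix" "array"  (R >= 4.0.0)

# The same values as a data.frame are stored as a list of columns
df <- iris[c(25, 75, 125), -5]
typeof(df)  # "list"
class(df)   # "data.frame"

# as.double() flattens a matrix column by column, but refuses a data.frame
length(as.double(m))  # 12
try(as.double(df))    # error: a list cannot be coerced to type 'double'
```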

@jonhsussman
Author

See below:

[Screenshot of the typeof(someWeights) and class(someWeights) output; image not preserved]

@jonhsussman
Author

jonhsussman commented Sep 21, 2022

Of note, when I reduce fovPixelData to a much smaller subset as a trial (fovPixelData_less <- fovPixelData[1:1000000, ]), I no longer encounter this error. The code also runs through on an example data set of much smaller images. So I suspect that larger inputs simply cause a problem in the C code at this step, but I am not certain.
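The size hypothesis can be checked with simple arithmetic on the dimensions reported above. MapDataToCodes() passes as.double(newdata[, colnames(codes)]) to .C(), i.e. one vector of nrow × ncol elements, and R cannot index a regular vector longer than .Machine$integer.max (2^31 - 1). This sketch only compares those numbers; it does not call FlowSOM.

```r
# Dimensions reported in this issue: rows of fovPixelData, columns of codes
n_rows <- 297923080
n_cols <- 10

# Length of the vector that as.double(newdata[, colnames(codes)]) would create
n_elements <- n_rows * n_cols
n_elements                          # 2979230800
.Machine$integer.max                # 2147483647
n_elements > .Machine$integer.max   # TRUE: a long vector, which .C() rejects

# The 1e6-row trial subset stays far below the limit
1e6 * n_cols <= .Machine$integer.max  # TRUE

# Largest number of rows per call that keeps the flattened block within the limit
floor(.Machine$integer.max / n_cols)  # 214748364
```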

@SamGG
Contributor

SamGG commented Sep 21, 2022

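The type looks OK. In fact, I missed that the first argument of the C call is newdata, not codes. So, as you clearly showed, this is a size-dependent problem. as.double(newdata[, colnames(codes)]) flattens the selected columns into a single vector of nrow(newdata) * ncol(codes) elements; once that exceeds 2^31 - 1, the largest length a regular integer index can address, R stores it as a long vector, and .C() does not support long vectors. The easiest workaround is to split newdata into blocks and call MapDataToCodes on each block. Here is the code I typically use.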

```r
# result from FlowSOM, 1 representative point per group
codes = as.matrix(iris[c(25,75,125),-5])
# data to map
nwdata = as.matrix(iris[,-5])

# map by block
block_size = 20  # to be defined, test 20 and 50
result = matrix(0.0, nrow(nwdata), 2)  # MapDataToCodes returns 2 columns
block_end = nrow(nwdata)
for (block_i in 0:((block_end-1) %/% block_size)) {
  i_start = 1 + block_i*block_size
  i_end = min((block_i+1)*block_size, block_end)
  cat(i_start, i_end, "\n")
  # drop = FALSE keeps a one-row last block as a matrix
  result[i_start:i_end,] = FlowSOM:::MapDataToCodes(codes, nwdata[i_start:i_end, , drop = FALSE])
}
#> 1 20 
#> 21 40 
#> 41 60 
#> 61 80 
#> 81 100 
#> 101 120 
#> 121 140 
#> 141 150

# for fun
table(result[,1], iris[,5])
#>    
#>     setosa versicolor virginica
#>   1     50          1         0
#>   2      0         48        13
#>   3      0          1        37
```

Created on 2022-09-21 by the reprex package (v2.0.1)
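The loop above can be wrapped into a reusable function. The sketch below is hypothetical: map_in_blocks() is not part of FlowSOM, and its default block_size of 1e6 rows is an assumption borrowed from the subset size reported to work earlier in the thread. The only hard constraint it enforces is that block_size * ncol(codes) stays within .Machine$integer.max. The map_fun argument defaults to FlowSOM's internal MapDataToCodes() but can be replaced by any function returning the same two-column matrix.

```r
# Hypothetical helper (not part of FlowSOM): map newdata to codes block by
# block so that no single .C() call receives a long vector.
map_in_blocks <- function(codes, newdata, block_size = 1e6,
                          map_fun = function(codes, x) FlowSOM:::MapDataToCodes(codes, x)) {
  # Each block is flattened into block_size * ncol(codes) doubles for .C()
  stopifnot(block_size >= 1, block_size * ncol(codes) <= .Machine$integer.max)
  n <- nrow(newdata)
  result <- matrix(0.0, nrow = n, ncol = 2)  # columns: nearest code, distance
  starts <- if (n > 0) seq(1, n, by = block_size) else numeric(0)
  for (i_start in starts) {
    i_end <- min(i_start + block_size - 1, n)
    # drop = FALSE keeps a one-row block as a matrix
    result[i_start:i_end, ] <- map_fun(codes, newdata[i_start:i_end, , drop = FALSE])
  }
  result
}

# Usage on the iris example from this thread (runs only if FlowSOM is installed)
if (requireNamespace("FlowSOM", quietly = TRUE)) {
  codes <- as.matrix(iris[c(25, 75, 125), -5])
  nwdata <- as.matrix(iris[, -5])
  res <- map_in_blocks(codes, nwdata, block_size = 20)
  print(table(res[, 1], iris[, 5]))
}
```

Larger blocks mean fewer .C() calls but larger temporary copies, since each call still materializes its block as a double vector.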

@jonhsussman
Author

Thanks for this excellent solution! It solves the problem, runs efficiently, and has enabled us to use the package.

@jonhsussman
Author

jonhsussman commented Oct 11, 2022 via email
