problems with fcs 3.1 files exported from Macsquantify #75
@AndyinMission |
Sorry, but unfortunately I've not been able to learn more about this. The vendor hasn't been responsive to technical questions about how they encode and export FCS files, and it's not clear to me what causes the 'missing' data for some of the -H parameters when the 3.1 files are opened in flowCore. |
@gfinak - Searching for other solutions, I find that you are poised with tests/testthat/IO-testSuite.R - presumably once you figure out the root cause - is that right? |
@AndyinMission Hi, could you explicitly name two "-H" parameters that are all zero and attach the corresponding plot? |
@SamGG I think all of the BTW we have an FCS file validator that runs in your browser here: https://primitybio.github.io/fcs-validator/ (source: https://github.com/primitybio/fcs-validator). |
Somewhat lurking here. I've read in FCS spec that the choice of delimiter character is not fixed. And I've witnessed that it varies from vendor to vendor. And I've further seen software that does not respect this and assumes that the delimiter is '/'. Or assumes that it is '*'. Could this be part of underlying issue here? |
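To make the delimiter point concrete, here is a minimal base-R sketch of reading the delimiter from the file itself instead of assuming one. It relies on two things from the FCS standard: the HEADER stores the TEXT start and end as 8-character ASCII fields at byte offsets 10-17 and 18-25, and the first byte of the TEXT segment is the delimiter. The helpers `make_fcs_bytes()` and `parse_text()` and the sample keyword strings are hypothetical, made up for illustration; this is not how flowCore parses files, and the naive split deliberately ignores doubled (escaped) delimiters inside values.

```r
# Build a tiny synthetic FCS-like byte vector (hypothetical content, not a real export).
make_fcs_bytes <- function(text) {
  text_start <- 58L                              # HEADER occupies bytes 0..57
  text_end <- text_start + nchar(text) - 1L      # offset of the LAST byte of TEXT
  header <- paste0(
    "FCS3.1", strrep(" ", 4),
    formatC(text_start, width = 8), formatC(text_end, width = 8),
    strrep(formatC(0L, width = 8), 4)            # DATA/ANALYSIS offsets unused here
  )
  charToRaw(paste0(header, text))
}

# Read one 8-character offset field; 'first' and 'last' are 1-based R indices.
read_offset <- function(bytes, first, last) {
  as.integer(trimws(rawToChar(bytes[first:last])))
}

parse_text <- function(bytes) {
  ts <- read_offset(bytes, 11, 18)               # HEADER bytes 10-17 (0-based)
  te <- read_offset(bytes, 19, 26)               # HEADER bytes 18-25 (0-based)
  delimiter <- rawToChar(bytes[ts + 1])          # first byte of TEXT
  body <- rawToChar(bytes[(ts + 2):(te + 1)])
  # Naive split: does NOT handle doubled delimiters inside keyword values.
  tokens <- strsplit(body, delimiter, fixed = TRUE)[[1]]
  list(
    delimiter = delimiter,
    keywords = setNames(tokens[c(FALSE, TRUE)], tokens[c(TRUE, FALSE)])
  )
}

pipe_file  <- parse_text(make_fcs_bytes("|$PAR|2|$TOT|3|$P1N|FSC-A|$P2N|SSC-A|"))
slash_file <- parse_text(make_fcs_bytes("/$PAR/2/$TOT/3/$P1N/FSC-A/$P2N/SSC-A/"))
```

Both synthetic files yield the same keywords even though their delimiters differ, which is the behavior a reader that hard-codes '/' or '*' would not have.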
Sure. I see now that it's not limited to the -H parameters. I've attached the results of running autoplot() on all three versions of the exported macsquant FCS files in R. On the right-hand side (FCS 3.1 version), these parameters don't match the 3.0 version, and some in the 'compatible' version also seem wrong: FL1-W. Edit: added a more illustrative image where rows/cols match. |
@AndyinMission Thanks for providing more information; it is always valuable for reproducing and tracking down a problem. The fact that the 3.0 FCS is OK does not mean that the 3.1 is. Maybe you should stick with the 3.0 FCS.
library(flowCore)
(mydir = "c:/Users/sampgg/Downloads/MiltenyiFCS/")
#> [1] "c:/Users/sampgg/Downloads/MiltenyiFCS/"
# original file
ff31 = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1.fcs"), emptyValue = FALSE)
#> uneven number of tokens: 679
#> The last keyword is dropped.
#> uneven number of tokens: 679
#> The last keyword is dropped.
keyword(ff31)[["$SPILLOVER"]]
#> FL1-A FL1-H FL2-A FL2-H FL3-A FL3-H FL4-A FL4-H FL5-A FL5-H
#> [1,] 1 0 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0 0 0
#> [4,] 0 1 0 0 0 0 0 0 0 0
#> [5,] 0 0 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 1 0 0
#> [7,] 0 0 0 0 0 0 0 0 1 0
#> [8,] 0 0 0 1 0 0 0 0 0 0
#> [9,] 0 0 0 0 1 0 0 0 0 0
#> [10,] 0 0 0 0 0 0 0 0 0 1
rowSums(keyword(ff31)[["$SPILLOVER"]])
#> [1] 1 0 0 1 1 1 1 1 1 1
colSums(keyword(ff31)[["$SPILLOVER"]])
#> FL1-A FL1-H FL2-A FL2-H FL3-A FL3-H FL4-A FL4-H FL5-A FL5-H
#> 1 1 1 1 1 0 0 1 1 1
# modified version
# // replaced by __
# ENDXXX corrected -1
# SPILLOVER is diagonal
ff31mod = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1_mod.fcs"), transformation = FALSE, truncate_max_range = FALSE, min.limit = NULL)
apply(exprs(ff31mod)[,c("FL1-W", "FL3-A", "FL4-W", "FL5-H")], 2,
quantile, c(0, 0.1, 0.5, 0.9, 1))
#> FL1-W FL3-A FL4-W FL5-H
#> 0% -20072.3301 0 -197586.6562 0
#> 10% 229.5142 0 259.2396 0
#> 50% 661.7385 0 520.5358 0
#> 90% 983.1834 0 705.4641 0
#> 100% 1805.7382 0 76805.0312 0
colnames(exprs(ff31mod))
#> $P1N $P2N $P3N $P4N $P5N $P6N $P7N $P8N
#> "Time" "HDR-CE" "HDR-SE" "HDR-V" "FSC-A" "FSC-H" "FSC-W" "SSC-A"
#> $P9N $P10N $P11N $P12N $P13N $P14N $P15N $P16N
#> "SSC-H" "SSC-W" "FL1-A" "FL1-H" "FL1-W" "FL2-A" "FL2-H" "FL2-W"
#> $P17N $P18N $P19N $P20N $P21N $P22N $P23N $P24N
#> "FL3-A" "FL3-H" "FL3-W" "FL4-A" "FL4-H" "FL4-W" "FL5-A" "FL5-H"
#> $P25N
#> "FL5-W"
# offset
592317 # beginning of data
#> [1] 592317
592317+16*4 # $P17
#> [1] 592381
592317+23*4 # $P24
#> [1] 592409
592317+16*4+25*4 # $P17, 2nd event
#> [1] 592481
592317+23*4+25*4 # $P24, 2nd event
#> [1] 592509
# modified version with
# __ replaced back to //
ff31slash = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1_mod_slash.fcs"), transformation = FALSE, truncate_max_range = FALSE, min.limit = NULL)
#> Error in fcsTextParse(txt, emptyValue = emptyValue): Empty keyword name detected!If it is due to the double delimiters in keyword value, please set emptyValue to FALSE and try again!
Created on 2018-08-29 by the reprex package (v0.2.0).

So the 3.1 FCS is wrong in where it points to the end of TEXT and DATA. Concerning the end of TEXT, pointing to the character after the last delimiter makes flowCore think that there is an extra keyword. Once corrected (with // replaced by __), read.FCS no longer reports an error. I tried flowIO at http://bioinformin.cesnet.cz/flowIO/ for the first time and found it interesting. Thanks to the authors. The report is at the end. Nevertheless, I corrected the END values on my own.

Concerning the original question about the data: I am a binary guy, so you will find a screenshot of the hexadecimal representation of the content of the modified 3.1 FCS. Having computed the offsets that address the FL3-A and FL5-H parameters, I am convinced that those parameters are real zeros and that the zeros don't result from read.FCS(). Before changing the compensation matrix, I got a strange representation of those two compensated parameters in FlowJo. After making the spillover diagonal, there was no more compensation in FlowJo, and real zeros. So there are definitely zeros in the file for those parameters. BTW the spillover has a strange shape and can't be used for compensation in flowCore, which makes sense IMHO.

OK, so far so long; I don't think I am going to investigate this deeply next time. Hope to get a beer from Andy or Miltenyi ;-) Finally, the flowIO report. |
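The effect of a TEXT end offset that points one byte past the final delimiter can be shown with a small base-R sketch. The function `count_tokens()` and the sample bytes are hypothetical and only mimic the idea of splitting TEXT on its delimiter; this is not flowCore's actual parser. With the correct end offset the token count is even (keyword/value pairs); with the offset one byte too far, the stray byte becomes an extra, unpaired token.

```r
# Hypothetical helper: split TEXT on its own delimiter and count the tokens.
# text_start and text_end are 0-based, inclusive byte offsets as in the FCS HEADER.
count_tokens <- function(bytes, text_start, text_end) {
  delimiter <- rawToChar(bytes[text_start + 1])
  body <- rawToChar(bytes[(text_start + 2):(text_end + 1)])
  length(strsplit(body, delimiter, fixed = TRUE)[[1]])
}

# Made-up TEXT segment followed by one stray byte ("#") that belongs to
# whatever comes after TEXT.
bytes <- charToRaw("|$PAR|2|$TOT|3|#")
last_delimiter <- 14L                    # 0-based offset of the final "|"

n_correct <- count_tokens(bytes, 0L, last_delimiter)        # end on the delimiter
n_too_far <- count_tokens(bytes, 0L, last_delimiter + 1L)   # end one byte past it

c(correct = n_correct, too_far = n_too_far)
```

An odd count is consistent with the "uneven number of tokens" message in the reprex above, though the exact bytes following TEXT in the Miltenyi file are not reproduced here.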
@SamGG, I am not sure what you posted has anything to do with the zero-value issue. The problem is that the data section wasn't parsed properly (keywords are not the main issue here).
datastart <- 592317
dataend <- 3875217
con <- file(file.path(dataPath, "Miltenyi/1696_12017-04-30.0001_3p1.fcs"), open="rb")
seek(con, datastart)
nBytes <- dataend - datastart + 1
dat <- readBin(con = con, what = "double", n = nBytes/4, size = 4, endian="little")
dat <- matrix(dat, ncol = 25, byrow=TRUE)
> dim(dat)
[1] 32829 25
close(con)
range(dat[, 17])
apply(dat, 2, range)
> range(dat[, 17])
[1] 0 0
> apply(dat, 2, range)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1.508742e-05 0.00001 0.00001 0.020 8.676194 6.948626 577.9117
[2,] 5.660420e+00 0.32829 0.32829 1.385 1330.887085 542.654663 7312.2695
[,8] [,9] [,10] [,11] [,12] [,13]
[1,] 10.91154 9.162822 481.3907 -2.310761 0.05768027 -20072.330
[2,] 3077.03467 1977.768311 1629.7454 162.298904 210.58576965 1805.738
[,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] -0.2577191 0.2439771 -874.7939 0 0.1146897 -802.4409 -0.3812822
[2,] 65.6860657 5.9242105 2899.9937 0 49.3567200 1891.2042 55.0991020
[,21] [,22] [,23] [,24] [,25]
[1,] -0.05220075 -197586.66 -0.2845602 0 -471.7487
[2,] 41.53239059 76805.03 7.0218620 0 965.0278
So the dimensions are right and the data is somehow wrong. If other software is able to read the correct values from the same file, then I will have to seek help from @jspidlen to figure out how exactly the data is parsed (e.g. by FlowJo). |
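The offset arithmetic and the raw read used in this thread can be combined into one small check. The sketch below assumes what the thread itself states: 25 parameters per event, 4-byte little-endian floats, events stored one after another. The helpers `value_offset()` and `read_value()` are hypothetical, and the file is a synthetic temporary file, not the Miltenyi export.

```r
# Byte offset of one value: data_start is the 0-based offset of the DATA segment,
# event and param are 1-based.
value_offset <- function(data_start, event, param, n_par, bytes_per_value = 4L) {
  data_start + ((event - 1L) * n_par + (param - 1L)) * bytes_per_value
}

# Decode a single float32 from a raw vector at a 0-based byte offset.
read_value <- function(all_bytes, offset) {
  readBin(all_bytes[(offset + 1):(offset + 4)],
          what = "double", n = 1L, size = 4L, endian = "little")
}

# Synthetic file: 100 placeholder bytes standing in for HEADER + TEXT,
# then 3 events x 25 parameters, with parameter 17 set to zero throughout.
n_par <- 25L
n_events <- 3L
data_start <- 100L
events <- matrix(as.numeric(seq_len(n_par * n_events)), ncol = n_par, byrow = TRUE)
events[, 17] <- 0

tmp <- tempfile(fileext = ".bin")
con <- file(tmp, open = "wb")
writeBin(raw(data_start), con)
writeBin(as.vector(t(events)), con, size = 4L, endian = "little")  # event by event
close(con)

all_bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
p17_first  <- read_value(all_bytes, value_offset(data_start, 1L, 17L, n_par))
p24_second <- read_value(all_bytes, value_offset(data_start, 2L, 24L, n_par))
unlink(tmp)
```

With the real DATA start from the thread, `value_offset(592317, 1, 17, 25)` gives 592381 and `value_offset(592317, 2, 24, 25)` gives 592509, matching the offsets computed by hand earlier, so the same two helpers could be pointed at the actual file to inspect individual values.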
@mikejiang Hi. Sorry if I was misleading by addressing several errors in that file at once. |
The modified FCS. |
Wow, excellent detailed analysis. Thanks @mikejiang @SamGG. I'm sorry to say I don't quite follow all the details, though. Is there a summary I can provide to Miltenyi regarding any issues with the encoding of their 3.1 files from the macsquant? It seems there are two issues: wrong offsets or a delimiter problem in the header causing wrong parsing of TEXT and DATA, and a comp matrix in an unusual/incorrect format, which isn't related to the zero values but prevents compensation in flowCore. Is this a correct interpretation? Thank you all for investigating. As for why we weren't just using the 3.0 version: the 3.1 version contains the well ID keyword and the comp matrix, which are useful for analysis, but we could devise a workaround if absolutely needed. |
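For the comp matrix point, a short base-R sketch shows one way to sanity-check a $SPILLOVER value before trying to compensate with it. It assumes the FCS 3.1 layout of that keyword: the number of channels n, then n channel names, then n x n coefficients in row-major order, all comma-separated. The helpers `parse_spillover()` and `is_usable()` and both sample strings are hypothetical; `is_usable()` is only an illustrative criterion (unit diagonal and full rank), not the check flowCore performs.

```r
# Parse "n,name1,...,namen,v11,v12,...,vnn" into a named n x n matrix.
parse_spillover <- function(value) {
  fields <- strsplit(value, ",", fixed = TRUE)[[1]]
  n <- as.integer(fields[1])
  stopifnot(length(fields) == 1 + n + n * n)
  channels <- fields[2:(n + 1)]
  coefficients <- as.numeric(fields[(n + 2):length(fields)])
  matrix(coefficients, nrow = n, byrow = TRUE,
         dimnames = list(channels, channels))
}

# Illustrative criterion only: each channel spills fully into itself and the
# matrix can be inverted.
is_usable <- function(m) {
  all(diag(m) == 1) && qr(m)$rank == nrow(m)
}

# Made-up values: a plausible matrix, and one with an all-zero row and
# off-diagonal ones, similar in spirit to the matrix printed earlier.
plausible <- parse_spillover("3,FL1-A,FL2-A,FL3-A,1,0.1,0,0.05,1,0,0,0,1")
odd_shape <- parse_spillover("3,FL1-A,FL2-A,FL3-A,1,0,0,0,0,0,0,1,0")

c(plausible = is_usable(plausible), odd_shape = is_usable(odd_shape))
```

A matrix with an all-zero row, like rows 2 and 3 of the 10 x 10 matrix shown in the reprex, cannot be inverted, which is one reason such a matrix would be unusable for compensation.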
I don't think the delimiter caused the DATA parsing issue. It may be simply that |
Good to hear from you. My summary is the following, but I prefer you wait for Mike's opinion.
|
Thanks @SamGG |
Hi, I find that reading FCS files exported as version 3.1 from Miltenyi's Macsquantify gives some unexpected results. The odd result is that a couple of my parameters are all 0s when read into flowCore, despite these parameters having data when viewed in FlowJo or other applications. I've attached 3 files exported from Macsquantify to FCS. Each is a different FCS version, all from the same mqd source file (the exported versions are FCS 3.0, FCS 3.1 and what Miltenyi calls FCS 'compatible').
MiltenyiFCS.zip