problems with fcs 3.1 files exported from Macsquantify #75

Closed
Andrew-InTheBox opened this issue May 25, 2017 · 16 comments

Comments

@Andrew-InTheBox

Hi, I find that reading in fcs files exported as version 3.1 from Miltenyi's Macsquantify gives some unexpected results. The odd result is that a couple of my parameters are all 0's when read into flowCore, despite these parameters having data when viewed in FlowJo or other applications. I've attached 3 files exported from Macsquantify to fcs. Each is a different fcs export version, and all are from the same mqd source file (the exported versions are fcs 3.0, fcs 3.1 and what Miltenyi calls fcs 'compatible').

MiltenyiFCS.zip

@gfinak
Member

gfinak commented Aug 17, 2018

@AndyinMission
Do you have any updates on the root cause of your issue? Is it an export problem or a flowCore problem?

@Andrew-InTheBox
Author

Sorry but unfortunately I've not been able to learn more about this. The vendor hasn't been responsive to technical questions about how they're encoding and exporting fcs files and it's not clear to me what the issue may be as it relates to the 'missing' data for some of the -H parameters in the 3.1 files when opened in flowCore.

@malcook

malcook commented Aug 28, 2018

@gfinak - Searching for other solutions, I find that you are poised with tests/testthat/IO-testSuite.R - presumably once you figure out the root cause - is that right?

@SamGG
Contributor

SamGG commented Aug 28, 2018

@AndyinMission Hi, could you explicitly name 2 "-H" parameters that are zero and attach the corresponding plot?
IMHO, the 1696_12017-04-30.0001_3p1.fcs file is badly encoded, as it contains many // in keyword values. I am not sure that emptyValue=FALSE can solve this problem.

@zbjornson

zbjornson commented Aug 28, 2018

@SamGG I think all of the // that you see are escaped delimiters (for example /$P13F/450//50 nm/$P13L/... means $P13F = 450/50 nm).
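
To make the escaping rule concrete, here is a minimal base-R sketch of a tokenizer that treats a doubled delimiter as a literal character. `parse_fcs_text` is a hypothetical helper written for this illustration, not flowCore's parser, and the input is a toy TEXT fragment rather than the real file. It also ignores the ambiguous cases (empty values, values that start or end with the delimiter).

```r
# Hypothetical helper for illustration only; not flowCore's parser.
# Splits an FCS TEXT segment into keyword/value pairs, treating a doubled
# delimiter as one literal delimiter character inside a token.
parse_fcs_text <- function(txt) {
  delim <- substr(txt, 1, 1)                 # first character is the delimiter
  chars <- strsplit(substring(txt, 2), "")[[1]]
  tokens <- character(0)
  current <- ""
  i <- 1
  while (i <= length(chars)) {
    if (chars[i] == delim) {
      if (i < length(chars) && chars[i + 1] == delim) {
        current <- paste0(current, delim)    # "//" becomes a literal "/"
        i <- i + 2
        next
      }
      tokens <- c(tokens, current)           # single delimiter closes the token
      current <- ""
    } else {
      current <- paste0(current, chars[i])
    }
    i <- i + 1
  }
  keys <- tokens[seq(1, length(tokens), by = 2)]
  vals <- tokens[seq(2, length(tokens), by = 2)]
  setNames(vals, keys)
}

# Toy TEXT fragment (assumed for the example, not copied from the file).
kw <- parse_fcs_text("/$P13F/450//50 nm/$P13N/FL1-W/")
print(kw[["$P13F"]])
```

With this input, `kw[["$P13F"]]` comes back as `450/50 nm`, i.e. the `//` is read as part of the value and not as an empty token.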

BTW we have an FCS file validator that runs in your browser here: https://primitybio.github.io/fcs-validator/ (source: https://github.com/primitybio/fcs-validator).

@malcook

malcook commented Aug 28, 2018

Somewhat lurking here. I've read in FCS spec that the choice of delimiter character is not fixed. And I've witnessed that it varies from vendor to vendor. And I've further seen software that does not respect this and assumes that the delimiter is '/'. Or assumes that it is '*'. Could this be part of underlying issue here?
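
For reference, the delimiter can be read from the file itself: the FCS header stores the TEXT start offset as right-justified ASCII in bytes 10-17, and the first byte of TEXT is the delimiter. The sketch below is a base-R illustration under those assumptions. `fcs_delimiter` is a hypothetical helper, and the file it reads is a tiny synthetic header plus TEXT written to a temp file, not one of the attached files.

```r
# Build a tiny synthetic FCS-like file (header + TEXT only) for illustration.
text_seg   <- "|$BEGINDATA|0|$ENDDATA|0|"
text_start <- 58L                                  # header is 58 bytes long
text_end   <- text_start + nchar(text_seg) - 1L
header <- paste0("FCS3.1", "    ",
                 sprintf("%8d", text_start), sprintf("%8d", text_end),
                 sprintf("%8d", 0L), sprintf("%8d", 0L),   # DATA start/end
                 sprintf("%8d", 0L), sprintf("%8d", 0L))   # ANALYSIS start/end
path <- tempfile(fileext = ".fcs")
writeBin(charToRaw(paste0(header, text_seg)), path)

# Hypothetical helper: read the delimiter as the first byte of TEXT.
fcs_delimiter <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- rawToChar(readBin(con, what = "raw", n = 58))
  # Characters 11-18 (one-based) are header bytes 10-17 (zero-based).
  text_start <- as.integer(trimws(substr(header, 11, 18)))
  seek(con, where = text_start)
  rawToChar(readBin(con, what = "raw", n = 1))
}

print(fcs_delimiter(path))
```

A reader that hard-codes '/' or '*' would mis-split this synthetic file, whose delimiter is '|'.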

@Andrew-InTheBox
Author

Andrew-InTheBox commented Aug 28, 2018

Sure, I see now that it's not limited to the -H parameters. I've attached the results of running autoplot() on all three versions of the exported macsquant fcs files in R. On the right-hand side (fcs 3.1 version) we can see that these parameters don't seem to match the 3.0 version, and some in the compatible version also seem wrong:

FL1-W
FL3-A
FL4-W
FL5-H

[image: macsquant_fcs_r]

Edit - added more illustrative image where rows/cols match.

@SamGG
Contributor

SamGG commented Aug 28, 2018

@AndyinMission Thanks for providing more information, as it is always valuable for reproducing and tracking a problem. The 3.0 FCS being OK does not mean that the 3.1 is. Maybe you should stick with the 3.0 FCS.
@zbjornson Damn! I missed that point again. Thanks, but it is also interesting to replace // by __, see below.
@gfinak Hi, I think there is still a problem with unescaping the delimiter.

library(flowCore)

(mydir = "c:/Users/sampgg/Downloads/MiltenyiFCS/")
#> [1] "c:/Users/sampgg/Downloads/MiltenyiFCS/"

# original file

ff31 = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1.fcs"), emptyValue = FALSE)
#> uneven number of tokens: 679
#> The last keyword is dropped.
#> uneven number of tokens: 679
#> The last keyword is dropped.

keyword(ff31)[["$SPILLOVER"]]
#>       FL1-A FL1-H FL2-A FL2-H FL3-A FL3-H FL4-A FL4-H FL5-A FL5-H
#>  [1,]     1     0     0     0     0     0     0     0     0     0
#>  [2,]     0     0     0     0     0     0     0     0     0     0
#>  [3,]     0     0     0     0     0     0     0     0     0     0
#>  [4,]     0     1     0     0     0     0     0     0     0     0
#>  [5,]     0     0     1     0     0     0     0     0     0     0
#>  [6,]     0     0     0     0     0     0     0     1     0     0
#>  [7,]     0     0     0     0     0     0     0     0     1     0
#>  [8,]     0     0     0     1     0     0     0     0     0     0
#>  [9,]     0     0     0     0     1     0     0     0     0     0
#> [10,]     0     0     0     0     0     0     0     0     0     1
rowSums(keyword(ff31)[["$SPILLOVER"]])
#>  [1] 1 0 0 1 1 1 1 1 1 1
colSums(keyword(ff31)[["$SPILLOVER"]])
#> FL1-A FL1-H FL2-A FL2-H FL3-A FL3-H FL4-A FL4-H FL5-A FL5-H 
#>     1     1     1     1     1     0     0     1     1     1

# modified version

# // replaced by __
# ENDXXX corrected -1
# SPILLOVER is diagonal

ff31mod = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1_mod.fcs"), transformation = FALSE, truncate_max_range = FALSE, min.limit = NULL)

apply(exprs(ff31mod)[,c("FL1-W", "FL3-A", "FL4-W", "FL5-H")], 2, 
      quantile, c(0, 0.1, 0.5, 0.9, 1))
#>            FL1-W FL3-A        FL4-W FL5-H
#> 0%   -20072.3301     0 -197586.6562     0
#> 10%     229.5142     0     259.2396     0
#> 50%     661.7385     0     520.5358     0
#> 90%     983.1834     0     705.4641     0
#> 100%   1805.7382     0   76805.0312     0

colnames(exprs(ff31mod))
#>     $P1N     $P2N     $P3N     $P4N     $P5N     $P6N     $P7N     $P8N 
#>   "Time" "HDR-CE" "HDR-SE"  "HDR-V"  "FSC-A"  "FSC-H"  "FSC-W"  "SSC-A" 
#>     $P9N    $P10N    $P11N    $P12N    $P13N    $P14N    $P15N    $P16N 
#>  "SSC-H"  "SSC-W"  "FL1-A"  "FL1-H"  "FL1-W"  "FL2-A"  "FL2-H"  "FL2-W" 
#>    $P17N    $P18N    $P19N    $P20N    $P21N    $P22N    $P23N    $P24N 
#>  "FL3-A"  "FL3-H"  "FL3-W"  "FL4-A"  "FL4-H"  "FL4-W"  "FL5-A"  "FL5-H" 
#>    $P25N 
#>  "FL5-W"

# offset
592317  # beginning of data
#> [1] 592317
592317+16*4  # $P17
#> [1] 592381
592317+23*4  # $P24
#> [1] 592409
592317+16*4+25*4  # $P17, 2nd event
#> [1] 592481
592317+23*4+25*4  # $P24, 2nd event
#> [1] 592509

# modified version with

# __ replaced back to //

ff31slash = read.FCS(file.path(mydir, "1696_12017-04-30.0001_3p1_mod_slash.fcs"), transformation = FALSE, truncate_max_range = FALSE, min.limit = NULL)
#> Error in fcsTextParse(txt, emptyValue = emptyValue): Empty keyword name detected!If it is due to the double delimiters in keyword value, please set emptyValue to FALSE and try again!

Created on 2018-08-29 by the reprex package (v0.2.0).

So the 3.1 FCS is wrong in where it points for the end of TEXT and DATA. Concerning the end of TEXT, pointing at the character after the last delimiter makes flowCore think that there is an extra keyword. Once corrected (with // replaced by __), there are no more errors during read.FCS. I tried flowIO at http://bioinformin.cesnet.cz/flowIO/ for the first time and found it interesting. Thanks to the authors. The report is at the end. Nevertheless, I corrected the END values on my own.
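
As an illustration of that off-by-one, the base-R sketch below checks whether a TEXT end offset lands on the closing delimiter or one byte past it. `check_text_end` is a hypothetical helper and the bytes are a toy segment with padding, so this only shows the idea, not what flowCore does internally.

```r
# Hypothetical check: does the TEXT end offset land on the closing delimiter?
# Offsets are zero-based as in the FCS header; R vectors are one-based.
check_text_end <- function(bytes, text_start, text_end) {
  delim <- bytes[text_start + 1]
  if (bytes[text_end + 1] == delim) return("ok")
  if (bytes[text_end] == delim) return("one byte too far")
  "unknown"
}

# Toy segment followed by padding (assumed, not taken from the real file).
bytes <- charToRaw("/$TOT/32829/    ")
print(check_text_end(bytes, 0, 11))   # offset of the closing "/"
print(check_text_end(bytes, 0, 12))   # one byte past it
```

When the end offset is one byte too far, a reader that takes the range literally sees trailing characters after the last delimiter, which is consistent with an odd token count.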

[image: 2018-08-28_234003]

Concerning the original question about data, I am a binary guy, so you will find a screenshot of the hexadecimal representation of the content of the modified 3.1 FCS. Having computed the offsets to address the FL3-A and FL5-H parameters, I am convinced that those parameters are real zeros and that the zeros don't result from read.FCS().
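
The offset arithmetic in the reprex above generalizes to any event and parameter. The sketch below assumes, as in this file, 25 parameters stored as 4-byte floats in event-major order. `value_offset` is a hypothetical helper name.

```r
# Byte offset of one value in a DATA segment stored event by event.
# Assumes fixed-width values (4-byte floats here) and one-based event/param.
value_offset <- function(data_start, event, param,
                         n_par = 25, bytes_per_value = 4) {
  data_start + ((event - 1) * n_par + (param - 1)) * bytes_per_value
}

# Reproduce the four offsets computed by hand in the reprex.
print(value_offset(592317, event = 1, param = 17))  # $P17 (FL3-A), 1st event
print(value_offset(592317, event = 1, param = 24))  # $P24 (FL5-H), 1st event
print(value_offset(592317, event = 2, param = 17))  # $P17, 2nd event
print(value_offset(592317, event = 2, param = 24))  # $P24, 2nd event
```

These give 592381, 592409, 592481 and 592509, matching the values above.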

[image: 2018-08-28_234117]

Before changing the compensation matrix, I got a strange representation of those two compensated parameters in FlowJo. After making the spillover diagonal, there is no more compensation in FlowJo, and real zeros. So there are definitely zeros in the file for those parameters.

[image: 2018-08-28_233847]

BTW the spillover has a strange shape and can't be used for compensation in flowCore, which makes sense IMHO.
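
Compensation needs the inverse of the spillover matrix, so a quick rank check shows why this one cannot be used. The sketch below re-enters the matrix printed in the reprex by hand (an assumption: it is transcribed from that output, not read from the file) and uses only base R.

```r
# Spillover matrix transcribed from the reprex output above (assumption:
# hand-copied positions of the ones, everything else is zero).
channels <- c("FL1-A", "FL1-H", "FL2-A", "FL2-H", "FL3-A",
              "FL3-H", "FL4-A", "FL4-H", "FL5-A", "FL5-H")
spill <- matrix(0, nrow = 10, ncol = 10, dimnames = list(NULL, channels))
ones <- cbind(c(1, 4, 5, 6, 7, 8, 9, 10),   # row index of each 1
              c(1, 2, 3, 8, 9, 4, 5, 10))   # column index of each 1
spill[ones] <- 1

# A spillover matrix must be square and full rank to be inverted.
spill_rank <- qr(spill)$rank
print(spill_rank)
print(spill_rank == ncol(spill))

# Rows and columns that are entirely zero make the matrix singular.
print(which(rowSums(spill) == 0))
print(names(which(colSums(spill) == 0)))
```

The rank is 8 out of 10: rows 2 and 3 and columns FL3-H and FL4-A are all zero, so `solve(spill)` would fail.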

OK, so far so long, I don't think I am going to investigate so deeply next time. Hope to get a beer from Andy or Miltenyi ;-)

Finally the flowIO report.

[image: screenshot_2018-08-28 flowio]

@mikejiang
Member

@SamGG, I am not sure that what you posted has anything to do with the zero-value issue. The problem is that the data section wasn't parsed properly (keywords are not the main issue here).
Here is the FCS binary view:
[image]
As shown, the FCS header section gives the start and end of the data section. Now let's simply load the data using readBin:

datastart <- 592317
dataend <- 3875217

con <- file(file.path(dataPath, "Miltenyi/1696_12017-04-30.0001_3p1.fcs"), open="rb")
seek(con, datastart)
nBytes <- dataend - datastart + 1
dat <- readBin(con = con, what = "double", n = nBytes/4, size = 4, endian="little")
dat <- matrix(dat, ncol = 25, byrow=TRUE)
> dim(dat)
[1] 32829    25
close(con)
range(dat[, 17])
apply(dat, 2, range)
> range(dat[, 17])
[1] 0 0
> apply(dat, 2, range)
             [,1]    [,2]    [,3]  [,4]        [,5]       [,6]      [,7]
[1,] 1.508742e-05 0.00001 0.00001 0.020    8.676194   6.948626  577.9117
[2,] 5.660420e+00 0.32829 0.32829 1.385 1330.887085 542.654663 7312.2695
           [,8]        [,9]     [,10]      [,11]        [,12]      [,13]
[1,]   10.91154    9.162822  481.3907  -2.310761   0.05768027 -20072.330
[2,] 3077.03467 1977.768311 1629.7454 162.298904 210.58576965   1805.738
          [,14]     [,15]     [,16] [,17]      [,18]     [,19]      [,20]
[1,] -0.2577191 0.2439771 -874.7939     0  0.1146897 -802.4409 -0.3812822
[2,] 65.6860657 5.9242105 2899.9937     0 49.3567200 1891.2042 55.0991020
           [,21]      [,22]      [,23] [,24]     [,25]
[1,] -0.05220075 -197586.66 -0.2845602     0 -471.7487
[2,] 41.53239059   76805.03  7.0218620     0  965.0278

So the dimensions are right and the data is somehow wrong. If other software is able to read the correct values from the same file, then I will have to seek help from @jspidlen to figure out how exactly the data is parsed (e.g. by FlowJo).
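
The same readBin approach can flag all-zero parameters automatically. The sketch below is a self-contained round trip on synthetic data (5 events, 4 parameters, with parameter 3 set to zero on purpose). It assumes little-endian 4-byte floats, as in the call above, and does not touch the real file.

```r
# Synthetic event matrix: 5 events x 4 parameters, parameter 3 all zero.
set.seed(1)
n_events <- 5
n_par <- 4
dat_in <- matrix(runif(n_events * n_par), ncol = n_par)
dat_in[, 3] <- 0

# Write it event by event as little-endian 4-byte floats.
path <- tempfile()
con <- file(path, open = "wb")
writeBin(as.vector(t(dat_in)), con, size = 4, endian = "little")
close(con)

# Read it back the same way the DATA segment is read above.
con <- file(path, open = "rb")
vals <- readBin(con, what = "double", n = n_events * n_par,
                size = 4, endian = "little")
close(con)
dat <- matrix(vals, ncol = n_par, byrow = TRUE)

# Columns in which every stored value is exactly zero.
zero_cols <- which(colSums(dat != 0) == 0)
print(zero_cols)
```

Applied to the `dat` matrix read from the 3.1 file, the last two lines would be expected to report columns 17 and 24, in line with the ranges shown above.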

@SamGG
Contributor

SamGG commented Aug 28, 2018

@mikejiang Hi. Sorry if I was misleading by addressing several errors in that file at once.
Concerning the data part, I did it the hard way while you showed the smart way. I think my screenshot and your output say the same thing: PAR17 and PAR24 are zeros. The data is NOT wrong, as you read it directly from the disk. FlowJo does its own processing, but flowCore is correct unless somebody demonstrates the contrary. I think there is something weird in computing the compensation. Look at the SPILLOVER matrix and let me know how to compensate data with it, if compensation really makes sense. Thanks Mike.

@SamGG
Contributor

SamGG commented Aug 28, 2018

The modified FCS.

MiltenyiFCS_modified.zip

@Andrew-InTheBox
Author

Wow, excellent detailed analysis. Thanks @mikejiang @SamGG. I'm sorry to say I don't quite follow all the details though. Is there a summary of issues I can provide to Miltenyi regarding the encoding of their 3.1 files off the macsquant? It seems the two issues are wrong pointing or delimiter in the header causing wrong parsing of TEXT and DATA, and a comp matrix in an unusual/incorrect format, which isn't related to the zero values but disallows compensation in flowCore. Is this a correct interpretation?

Thank you all for investigating. As to the reason we weren't just using the 3.0 version: the 3.1 version contains the well ID keyword and the comp matrix, which are useful for analysis, but we could invent a workaround for that if absolutely needed.

@mikejiang
Member

I don't think it was the delimiter causing the DATA parsing issue. It may simply be that the Macsquantify file doesn't follow the FCS 3.1 standard and we don't know what rules to follow in order to process it properly.

@SamGG
Contributor

SamGG commented Aug 29, 2018

Good to hear from you. My summary is the following, but I prefer you wait for Mike's opinion.

  • end of TEXT and DATA is pointing 1 byte after the real position. flowCore is OK with one less, but not one more.
  • the delimiter is OK, but it is better to use as delimiter a character that never appears in keyword names and values; | and * are usually good choices.
  • IMHO the PAR17 and PAR24 are really zeros. So measurements in those channels are zeros.
  • The compensation matrix is made of zeros and ones, and some rows and columns sum up to zero. I don't know what can be derived from them. There is no compensation matrix in 3.0 file.
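
The first point can be cross-checked with plain arithmetic, using only numbers already quoted in this thread: the DATA start and end from the header (592317 and 3875217), the 32829 x 25 dimensions, and the 4-byte value size used in the readBin call. A short sketch under those assumptions:

```r
# Values quoted earlier in the thread.
data_start <- 592317
data_end   <- 3875217
n_events   <- 32829
n_par      <- 25
bytes_per_value <- 4

# Bytes the events actually need vs. bytes spanned by the header offsets
# when the end offset is taken as inclusive.
expected_bytes <- n_events * n_par * bytes_per_value
span_bytes     <- data_end - data_start + 1

print(expected_bytes)
print(span_bytes)
print(span_bytes - expected_bytes)
```

The span is 3282901 bytes while the events need 3282900, so the end offset is one byte larger than an inclusive last-byte offset would be.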

@Andrew-InTheBox
Author

Thanks @SamGG

@gfinak gfinak closed this as completed Nov 25, 2020