
How to best use peacoQC() in Spectre workflow? #149

Closed
denvercal1234GitHub opened this issue Jan 26, 2023 · 8 comments
Assignees
Labels
documentation Improvements or additions to documentation troubleshooting Troubleshooting an issue, that doesn't appear to be a fundamental error in the code.

Comments

@denvercal1234GitHub

Hi there,

I am following the tutorial (https://immunedynamics.io/spectre/simple-discovery/#6_Annotate_clusters).

I input the channel values of my data into Spectre. Then I run write.files() as below to export these channel values as FCS, without doing any transformation or clustering, because I now want to use these FlowJo-transformed FCS files in the PeacoQC package before feeding the QC'd files back into Spectre.

Is it better to use CSV-to-FCS script instead of this write.files()?

Thank you for your help.

write.files(cell.dat,
                    file.prefix = exp.name,
                    divide.by = sample.col,
                    write.csv = FALSE,
                    write.fcs = TRUE)

Related: sydneycytometry/CSV-to-FCS#6

@ghar1821
Member

I would use write.files instead of CSV-to-FCS. We haven't updated CSV-to-FCS for a while.

I'm not familiar with how PeacoQC works, but if it takes FCS files, then the output of write.files should work. Give it a try and let us know if it doesn't work (and what the error message is).
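For reference, a minimal sketch of what that round-trip might look like. This is illustrative only: the file name, output directory, and channel choices below are placeholders, and the PeacoQC arguments should be checked against the PeacoQC documentation.

```r
# Illustrative sketch -- file name, channels and output directory are placeholders.
library(flowCore)
library(PeacoQC)

# Read one of the FCS files written by Spectre::write.files()
ff <- read.FCS("channelFCS_FileNo_1.fcs",
               transformation = FALSE, truncate_max_range = FALSE)

# Channels to QC (adjust to your panel); PeacoQC typically uses the Time channel
channels <- c("Time", "FSC-A", "SSC-A")

# Run PeacoQC; with save_fcs = TRUE the cleaned FCS is written to output_directory
res <- PeacoQC(ff, channels,
               determine_good_cells = "all",
               save_fcs = TRUE,
               output_directory = "PeacoQC_results")
```

The cleaned files in `PeacoQC_results` could then be read back into Spectre with read.files()/read.flowSet() as usual.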

@denvercal1234GitHub
Author

denvercal1234GitHub commented Jan 27, 2023

Thanks @ghar1821 for your response! When you get a chance, would you mind helping me with the four questions below?

  1. To confirm: if I use write.files() to export the channel values as FCS without doing any transformation or clustering, will the transformation done by FlowJo be preserved?

  2. When I ran write.files() as below on my data.list file (i.e. before merging all the data.tables into one file), I got the warning below. That sounds concerning, so I decided to run write.files() instead on my cell.dat object (i.e., after merging), and there was no error. Is it OK to use FCS files converted from the cell.dat object?

Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder.
Warning: Item 2 has 103077 rows but longest item has 136260; recycled with remainder.
Warning: Item 3 has 98770 rows but longest item has 136260; recycled with remainder.
Warning: Item 4 has 99557 rows but longest item has 136260; recycled with remainder.
Warning: Item 5 has 86865 rows but longest item has 136260; recycled with remainder.
Warning: Item 6 has 100794 rows but longest item has 136260; recycled with remainder.
Warning: Item 7 has 80890 rows but longest item has 136260; recycled with remainder.
Warning: Item 8 has 80149 rows but longest item has 136260; recycled with remainder.
Warning: Item 9 has 20444 rows but longest item has 136260; recycled with remainder.
Warning: Item 10 has 92751 rows but longest item has 136260; recycled with remainder.
Warning: Item 11 has 78070 rows but longest item has 136260; recycled with remainder.
Warning: Item 12 has 90425 rows but longest item has 136260; recycled with remainder.
Warning: Item 13 has 84116 rows but longest item has 136260;
  3. When I then ran read.flowSet() as shown below, I got another warning, shown below. This does not look right, does it?
##fcs.dir1 is my directory of the FCS files converted from cell.dat object

cell.dat_FlowSet <- read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) 

Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308.
  4. Do you know how I might preserve the name of each FCS file (e.g., via the column "FileName") during conversion?

Thank you for your help!

> write.files(F37_LiveSinglets_channel_data.list,
+                     write.csv = FALSE,
+                     write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

@ghar1821
Member

Hi @denvercal1234GitHub

Answers to your questions:

  1. Yes, the transformation will be preserved. write.files essentially just writes out whatever is stored in the data.table.
  2. data.list looks to be a list of data.tables? The write.files function takes a single data.table as the dat argument, not a list, so you should use lapply (or something equivalent) to write out each data.table in the data.list variable as an FCS file.
  3. I'm not sure what that warning message means.
  4. I don't think that is possible with FCS files.

Rather than writing out FlowJo's transformed values as CSV files and then converting them to FCS files using Spectre, is it not possible to just get FlowJo to export the transformed values as FCS files directly?

@tomashhurst tomashhurst self-assigned this Jan 31, 2023
@tomashhurst tomashhurst added the troubleshooting Troubleshooting an issue, that doesn't appear to be a fundamental error in the code. label Jan 31, 2023
@denvercal1234GitHub
Author

denvercal1234GitHub commented Jan 31, 2023

Thanks @ghar1821 and @tomashhurst.

QUESTION 2.
data.list is the output in the format of a list of data.tables from Spectre::read.files(file.loc = "/Users/clusteredatom/Documents/.../LiveSinglets_Channel", file.type = ".csv", do.embed.file.names = TRUE)

Performing lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL)) still gave me one FCS file called channelFCS.fcs, and the Console showed a whole bunch of "NULL" entries for the data.tables, as below. So I don't think lapply() works this way.

$export_F37_10D_CD158_PE_Live
NULL

$export_F37_10D_KI_PE_Live
NULL

$export_F37_10D_CD12_PE_Live
NULL
...

And, if I run write.files() with divide.by = "FileNo", then I encounter the "Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder." warning again (and likewise for Items 2, 3, ...).

write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

If I instead run write.files() nested in lapply() on my cell.dat object (i.e., after merging), which does not make sense since cell.dat is not a list but just a data.table, similar output is shown in the Console:

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
$`FSC-A`
NULL

$`SSC-A`
NULL

$TOX
NULL

$CD3
NULL
...

Lastly, if I simply run write.files() on my cell.dat object with divide.by = "FileNo", I get one FCS output file per sample (named "channelFCS_FileNo_1.fcs", etc.), and there is no warning.

If I set the divide.by=NULL, I will only get 1 FCS file.

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

In conclusion, would you say running write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo") is the best option?

Although now the file names of the outputted FCS files will just carry the FileNo, which I will need to match back to the FileNo column of the cell.dat object - which creates more room for human error - which was why I had QUESTION 4.

I am OK with this, but the issue is that when I then used these exported FCS files in flowCore::read.flowSet() to convert them into a flowSet object (in order to use the PeacoQC package), I got the $PnR NA is invalid warning mentioned in QUESTION 3.

# fcs.dir1 is directory to exported FCS files from Spectre::write.files(F37_data.list_cell.dat)

F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) # this will be a flowSet object
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.

QUESTION 3.
The warning $PnR NA is invalid as mentioned above after running flowCore::read.flowSet() was also produced after running flowCore::read.FCS(), which is being discussed here: RGLab/flowCore#243

@denvercal1234GitHub
Author

@ghar1821 commenting at RGLab/flowCore#243

@ghar1821
Member

ghar1821 commented Feb 2, 2023

Pasting the reply I initially wrote for RGLab/flowCore#243 here:

Performing lapply(F37_data.list, function(x) write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL)) still gave me one FCS file called channelFCS.fcs, and the Console showed a whole bunch of "NULL" entries for the data.tables, as below. So I don't think lapply() works this way.

$export_F37_10D_CD158_PE_Live
NULL

$export_F37_10D_KI_PE_Live
NULL

$export_F37_10D_CD12_PE_Live
NULL
...

The NULLs shown in the console appear because the write.files function does not return anything, so the console just prints NULL for every element in the list.
You only get one FCS file because every data.table is written to the same FCS file (the one with the channelFCS prefix). You should change the file.prefix parameter so that every data.table is given a different file.prefix. Something like the following, where dt is a list of data.tables and each element is named after the file the data.table was read from:

lapply(seq_along(dt), function(i) {
    x <- dt[[i]]
    # strip the .csv extension from the element name
    # (fixed = TRUE so the dot is matched literally, not as a regex wildcard)
    fname <- gsub(".csv", "", names(dt)[i], fixed = TRUE)
    write.files(x, file.prefix = fname, write.csv = FALSE, write.fcs = TRUE)
})

And, if I run write.files() with divide.by = "FileNo", then I encounter the "Warning: Item 1 has 98073 rows but longest item has 136260; recycled with remainder." warning again (and likewise for Items 2, 3, ...).

write.files(F37_LiveSinglets_channel_data.list, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

write.files should not be able to write out a list, so I'm surprised you didn't get an error. Is F37_LiveSinglets_channel_data.list actually a data.table? The warning may mean some of the markers have missing values for some cells. It looks to me like the marker in column 1 has 98,073 cells while the one in column 2 has 103,077, and the marker with the most non-empty values has 136,260. Can you check?

If I run write.files() nested in the lapply() instead on my cell.dat object (i.e., after merging), which does not make sense since cell.dat is not a list but just a dt, there was similar outputs shown on Console:

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

lapply(F37_data.list_cell.dat, function(x) Spectre::write.files(x, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = NULL))
$`FSC-A`
NULL

$`SSC-A`
NULL

$TOX
NULL

$CD3
NULL
...

No this doesn't make sense. Don't do it.

Lastly, if I simply run write.files() on my cell.dat object with divide.by = "FileNo", I get one FCS output file per sample (named "channelFCS_FileNo_1.fcs", etc.), and there is no warning.

If I set the divide.by=NULL, I will only get 1 FCS file.

F37_data.list_cell.dat <- Spectre::do.merge.files(dat = F37_data.list)

write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo")

This is the expected behaviour. If divide.by is not NULL, write.files will simply break up your data.table into several chunks, each containing only the cells whose value in the divide.by column is the same, e.g. chunk 1 only contains cells from FileNo 1, chunk 2 only contains cells from FileNo 2, etc.
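Conceptually, the chunking can be sketched with data.table's own split method. This is purely illustrative (the column names and values below are made up) and is not Spectre's actual implementation:

```r
# Illustrative sketch of divide.by-style chunking -- not Spectre's real code.
library(data.table)

dat <- data.table(FileNo = c(1, 1, 2),
                  FSC_A  = c(0.1, 0.2, 0.3))

# One chunk per unique FileNo value; write.files would then write
# one FCS file per chunk, named after the FileNo value.
chunks <- split(dat, by = "FileNo")
names(chunks)  # one element per FileNo
```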

In conclusion, would you say running write.files(F37_data.list_cell.dat, write.csv = FALSE, write.fcs = TRUE, file.prefix = "channelFCS", divide.by = "FileNo") is the best option?

Yes.

Although now the file names of the outputted FCS files will just carry the FileNo, which I will need to match back to the FileNo column of the cell.dat object - which creates more room for human error - which was why I had QUESTION 4.

What about using FileName as divide.by?
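Assuming a FileName column was embedded when the files were read in (e.g. via do.embed.file.names = TRUE in Spectre::read.files, as earlier in this thread), that would look like:

```r
# Assumes F37_data.list_cell.dat has a "FileName" column embedded at read time.
# Each output FCS file is then named after the original file rather than a number.
write.files(F37_data.list_cell.dat,
            write.csv   = FALSE,
            write.fcs   = TRUE,
            file.prefix = "channelFCS",
            divide.by   = "FileName")
```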

I am OK with this, but the issue is that when I then used these exported FCS files in flowCore::read.flowSet() to convert them into a flowSet object (in order to use the PeacoQC package), I got the $PnR NA is invalid warning mentioned in QUESTION 3.

# fcs.dir1 is directory to exported FCS files from Spectre::write.files(F37_data.list_cell.dat)

F37_data.list_cell.dat_FlowSet <- flowCore::read.flowSet(path=fcs.dir1, pattern="*.fcs", transformation = FALSE, truncate_max_range = FALSE) # this will be a flowSet object
Warning: $PnR NA is invalid. If it is a numeric string, this could be because it is larger than R's numeric limit: 1.79769313486232e+308. The assigned $PnR value will be imputed from the maximum value of the data in this channel.

QUESTION 3. The warning $PnR NA is invalid as mentioned above after running flowCore::read.flowSet() was also produced after running flowCore::read.FCS(), which is being discussed here: RGLab/flowCore#243

I think you will still get the warning regardless of whether you use 1 or 4 FCS files. I have a feeling this is because the data.table does not store the instrument range - though I'm not 100% sure, as I don't normally use FCS files. Whether this matters for downstream analysis really depends on what your subsequent steps are.
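One quick way to check is to inspect the $PnR keywords flowCore parsed from one of the exported files. This is a sketch; the file name below is a placeholder:

```r
# Illustrative sketch -- the file name is a placeholder.
library(flowCore)

ff <- read.FCS("channelFCS_FileNo_1.fcs",
               transformation = FALSE, truncate_max_range = FALSE)

keyword(ff, "$P1R")  # instrument range for parameter 1; NA here triggers the warning
range(ff)            # the per-channel ranges flowCore ended up assigning
```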

I'll see if I can find the time to write up a guide on how to use PeacoQC for preprocessing and Spectre for the subsequent downstream analysis.

@ghar1821
Member

ghar1821 commented Feb 2, 2023

From @denvercal1234GitHub reply originally posted at RGLab/flowCore#243:

Thank you so much for your time @ghar1821!

I will try divide.by = "FileName", check on QUESTION 3, and let you know.

And yes please, having a guide on how to incorporate the following packages into the Spectre workflow would be very useful and would increase the packages' usage a lot:

@ghar1821 ghar1821 self-assigned this Feb 2, 2023
@ghar1821 ghar1821 added the documentation Improvements or additions to documentation label Feb 2, 2023
@denvercal1234GitHub
Author

denvercal1234GitHub commented Feb 6, 2023

The issue was traced to the file naming of PE markers across wells. See RGLab/flowCore#243

Solution: change the marker names before exporting the data from the cytometer into FlowJo and subsequent R packages.

Thank you again @ghar1821 for taking the time to look into the issue. And, I look forward to reading the tutorials whenever you get a chance to do it.

Keeping this open because I'm unsure whether Givanna's documentation tag would disappear if it were closed.
