Consider AnnData & CITE-seq postprocessing #356

sjspielman · 2023-06-27T17:00:06Z

Related to #353

As we move towards releasing scpca-nf with CITE-seq post-processing, we discussed in the linked issue above keeping integration and cell-typing as "experimental features" for now to avoid cherry-picking commits. But, there's one other change to be aware of hdf5 exports containing AnnData formatted results! This has more practical considerations as well in terms of internally re-processing data for the the portal, since we don't want to release AnnData files at this time. Since this may require code changes, I've opened this as a separate issue.

A couple options:

Make the anndata conversion process optional, to be controlled by some config
Simply comment out the process
(more options?)

@allyhawkins @jashapiro

The text was updated successfully, but these errors were encountered:

jashapiro · 2023-06-27T19:13:05Z

I guess it depends how anxious we are to get the next release out. If we have some time I might make the process optional, with a config variable.

But if we are anxious to get the release out, I don't see a problem with leaving it as is and outputting and extra file (noting that in the release notes). I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

sjspielman · 2023-06-27T19:19:49Z

But if we are anxious to get the release out

We have portal updates planned for a deadline of 7/28, and scpca-nf itself should therefore be roughly end of next sprint.

I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

So basically, just don't copy over the hdf5 files into the portal is your view it sounds like?

outputting an extra file (noting that in the release notes).

To me, seems like this is not experimental if we get #355 in. I might favor holding off on merging #355 into development which makes this feature more in-progress, but I don't feel super strongly.

allyhawkins · 2023-06-27T19:35:19Z

I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

So basically, just don't copy over the hdf5 files into the portal is your view it sounds like?

I think this does probably make the most sense to minimize modifications to the workflow. I think all that would need to be done is when you copy things over from the nextflow-ccdl-results bucket to the portal inputs bucket is excluding any files with the .hdf5 extension.

outputting an extra file (noting that in the release notes).

To me, seems like this is not experimental if we get #355 in. I might favor holding off on merging #355 into development which makes this feature more in-progress, but I don't feel super strongly.

I think we could still merge it? The only thing is that it outputs two files for CITE-seq instead of just one for RNA. If we are including it in the release notes I'm not sure if it really matters either way.

sjspielman · 2023-06-28T13:23:36Z

Discussion has been discussed, so we can close this out now. We'll keep the code as is, and for portal releases do some selective copying.

sjspielman closed this as completed Jun 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider AnnData & CITE-seq postprocessing #356

Consider AnnData & CITE-seq postprocessing #356

sjspielman commented Jun 27, 2023

jashapiro commented Jun 27, 2023

sjspielman commented Jun 27, 2023 •

edited

Loading

allyhawkins commented Jun 27, 2023

sjspielman commented Jun 28, 2023

Consider AnnData & CITE-seq postprocessing #356

Consider AnnData & CITE-seq postprocessing #356

Comments

sjspielman commented Jun 27, 2023

jashapiro commented Jun 27, 2023

sjspielman commented Jun 27, 2023 • edited Loading

allyhawkins commented Jun 27, 2023

sjspielman commented Jun 28, 2023

sjspielman commented Jun 27, 2023 •

edited

Loading