Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider AnnData & CITE-seq postprocessing #356

Closed
sjspielman opened this issue Jun 27, 2023 · 4 comments
Closed

Consider AnnData & CITE-seq postprocessing #356

sjspielman opened this issue Jun 27, 2023 · 4 comments

Comments

@sjspielman
Copy link
Member

Related to #353

As we move towards releasing scpca-nf with CITE-seq post-processing, we discussed in the linked issue above keeping integration and cell-typing as "experimental features" for now to avoid cherry-picking commits. But, there's one other change to be aware of hdf5 exports containing AnnData formatted results! This has more practical considerations as well in terms of internally re-processing data for the the portal, since we don't want to release AnnData files at this time. Since this may require code changes, I've opened this as a separate issue.

A couple options:

  • Make the anndata conversion process optional, to be controlled by some config
  • Simply comment out the process
  • (more options?)

@allyhawkins @jashapiro

@jashapiro
Copy link
Member

I guess it depends how anxious we are to get the next release out. If we have some time I might make the process optional, with a config variable.

But if we are anxious to get the release out, I don't see a problem with leaving it as is and outputting and extra file (noting that in the release notes). I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

@sjspielman
Copy link
Member Author

sjspielman commented Jun 27, 2023

But if we are anxious to get the release out

We have portal updates planned for a deadline of 7/28, and scpca-nf itself should therefore be roughly end of next sprint.

I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

So basically, just don't copy over the hdf5 files into the portal is your view it sounds like?

outputting an extra file (noting that in the release notes).

To me, seems like this is not experimental if we get #355 in. I might favor holding off on merging #355 into development which makes this feature more in-progress, but I don't feel super strongly.

@allyhawkins
Copy link
Member

I don't think this affects processing for the portal at all as the extra file should simply be skipped and not added to the portal at this stage.

So basically, just don't copy over the hdf5 files into the portal is your view it sounds like?

I think this does probably make the most sense to minimize modifications to the workflow. I think all that would need to be done is when you copy things over from the nextflow-ccdl-results bucket to the portal inputs bucket is excluding any files with the .hdf5 extension.

outputting an extra file (noting that in the release notes).

To me, seems like this is not experimental if we get #355 in. I might favor holding off on merging #355 into development which makes this feature more in-progress, but I don't feel super strongly.

I think we could still merge it? The only thing is that it outputs two files for CITE-seq instead of just one for RNA. If we are including it in the release notes I'm not sure if it really matters either way.

@sjspielman
Copy link
Member Author

Discussion has been discussed, so we can close this out now. We'll keep the code as is, and for portal releases do some selective copying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants