Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plan new scpca-nf release with CITE-seq post-processing #353

Closed
sjspielman opened this issue Jun 26, 2023 · 13 comments
Closed

Plan new scpca-nf release with CITE-seq post-processing #353

sjspielman opened this issue Jun 26, 2023 · 13 comments
Assignees

Comments

@sjspielman
Copy link
Member

As part of closing out the #291 Epic, we'll want to actually cut the new scpca-nf release with cite-seq post-processing. This will almost certainly involve only including selected commits from development into the release.

@sjspielman sjspielman self-assigned this Jun 26, 2023
@jaclyn-taroni
Copy link
Member

@sjspielman, you probably want to use the release checklist template to track this: https://github.com/AlexsLemonade/scpca-nf/issues/new?assignees=&labels=release&projects=&template=release-checklist.md&title=Prepare+for+scpca-nf+release+vX.X.X

@jashapiro
Copy link
Member

I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.

@allyhawkins
Copy link
Member

I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.

If we do that it includes the cell type workflow and integration workflows. Right now they are separate so don't get run by the main workflow, but just noting that they do exist so maybe we don't want to include those?

@sjspielman sjspielman changed the title Cut new scpca-nf release with CITE-seq post-processing Plan new scpca-nf release with CITE-seq post-processing Jun 26, 2023
@jashapiro
Copy link
Member

I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.

If we do that it includes the cell type workflow and integration workflows. Right now they are separate so don't get run by the main workflow, but just noting that they do exist so maybe we don't want to include those?

Maybe not ideal, but I think I would rather have them in main as "experimental" than worry about the morass of manually picking out commits. The longer we keep development and main on separate paths, the more trouble we are setting ourselves up for.

Another option (which I don't know how I feel about), is branching off "development" with cell typing and integration as their own branches, then removing those workflows from development. I think this is likely to end up with some pretty error-prone merges too though (unless we remove them first, then branch and revert the removal commits in the new branches). No great options here, I guess.

@sjspielman
Copy link
Member Author

Argh, the template of course...

Let's continue to chat about plans for release here re-"which commits/features" and I'll open a better issue from template for the release itself.

@allyhawkins
Copy link
Member

Maybe not ideal, but I think I would rather have them in main as "experimental" than worry about the morass of manually picking out commits. The longer we keep development and main on separate paths, the more trouble we are setting ourselves up for.

I'm okay with including them and avoiding errors and issues with keeping things in sync. I just wanted to make sure everyone knew that those will also be present if we merge in all changes.

@sjspielman
Copy link
Member Author

I'm okay with including them and avoiding errors and issues with keeping things in sync. I just wanted to make sure everyone knew that those will also be present if we merge in all changes.

I'm also good either way, so this is helpful since my original plan was to not include cell-typing or integration! Since we'd call these features experimental for now, I imagine they would get loosely documented in release notes (my note to self..), but I wonder if there's anything else we'd want to add to internal or external instructions just reminding folks that they are experimental. That said, I don't think this is critical; any other opinions?

@sjspielman
Copy link
Member Author

sjspielman commented Jun 26, 2023

Noting also we have updated approaches for references which would get included in a new release containing everything in development. I would think this feature would need to be more fully documented vs celltyping/integration. I see changes towards internal docs in #310 and changes for external docs in #306; just confirming here whether there are any more docs needed for that feature (it seems set to me?).

@allyhawkins
Copy link
Member

Noting also we have updated approaches for references which would get included in a new release containing everything in development. I would think this feature would need to be more fully documented vs celltyping/integration. I see changes towards internal docs in #310 and changes for external docs in #306; just confirming here whether there are any more docs needed for that feature (it seems set to me?).

These updates have been accounted for in the docs, so we are good to release those.

I'm also good either way, so this is helpful since my original plan was to not include cell-typing or integration! Since we'd call these features experimental for now, I imagine they would get loosely documented in release notes (my note to self..), but I wonder if there's anything else we'd want to add to internal or external instructions just reminding folks that they are experimental. That said, I don't think this is critical; any other opinions?

I think including them as experimental still in the release notes is good.

@sjspielman
Copy link
Member Author

To get a sense of whether there are any sneaky changes that releasing all development commits will incur, I went ahead and did a quick-and-dirty comparison of a non-CITE-seq project (SCPCL000001) when run fresh (no -resume) on development code vs what's in the portal right now. There are some differences (often resulting from inherent stochasticity) that are worth being aware of:

  • salmon version differences (is this really minimal?)
  • different numbers of filtered/processed SCE cells! (blame empty drops filtering for this; portal has 7 more than development)
    • as a consequence, we have some minor differences in QC stats
  • most interesting to me was the HVG vector, which differs for 14 of the genes. this seems important!

So, the question is, do these changes raise to the level of needing to re-process all projects with this new release? If so, would it make more sense to cherry-pick such that only CITE-seq projects would be meaningfully affected by the re-release? Would a whole ScPCA refresh make sense at this stage anyways?

I'm attaching here the two QC reports for SCE version as well as a super quick notebook I compared SCEs in.
upload.zip

@jashapiro
Copy link
Member

Thanks for doing this comparison. It seems quite likely that most of the changes are to do with updated containers. Notably the salmon change happened a little while ago and is already in the newer releases, and the EmptyDrops changes are probably due to updates in scpcaTools, which also are already in main.

As we do record the pipeline version in the outputs (though apparently not in the QC report), and these changes appear quite minor (I expect the changes in HVG are for genes at the lower end of the vector, which you can confirm) I am not concerned that we need to reprocess everything at this stage. As we add functionality (cell typing), everything will get rerun through at least the later part of the pipelin (not including salmon as the design is not to require that).

@sjspielman
Copy link
Member Author

I expect the changes in HVG are for genes at the lower end of the vector, which you can confirm

Yep -

which(!(metadata(dev)$highly_variable_genes %in% metadata(portal)$highly_variable_genes))
> [1] 1911 1921 1938 1944 1953 1955 1958 1961 1968 1972 1975 1976 1977 1980 1981 1985 1986 1988 1990 1991 1992 1994
[23] 1995 1997

@sjspielman
Copy link
Member Author

Going to close out this discussion since we have a plan now moving forward. See #354 for the actual release issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants