Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mosdef #3355

Closed
10 tasks done
ldammer opened this issue Mar 20, 2024 · 39 comments
Closed
10 tasks done

mosdef #3355

ldammer opened this issue Mar 20, 2024 · 39 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK

Comments

@ldammer
Copy link

ldammer commented Mar 20, 2024

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @ldammer

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: mosdef
Title: MOSt frequently used and useful Differential Expression Functions
Version: 0.99.0
Authors@R: 
    c(
        person(
  given = "Leon", family = "Dammer", role = c("aut"), 
  email = "lc.dammer@gmail.com", comment = c(ORCID = "0009-0008-4132-7639")
        ),
        person(
  given = "Federico", family = "Marini", role = c("aut", "cre"), 
  email = "marinif@uni-mainz.de", comment = c(ORCID = "0000-0003-3252-7758")
        )
    )
Description: This package provides functionality to run a number of tasks in the 
    differential expression analysis workflow. This encompasses the most widely 
    used steps, from running various enrichment analysis tools with a unified interface
    to creating plots and beautifying table components linking to external websites
    and databases. This streamlines the generation of comprehensive analysis reports.
Imports:
    dplyr,
    DT,
    ggplot2,
    ggforce, 
    ggrepel,
    htmltools,
    methods,
    AnnotationDbi,
    topGO,
    GO.db,
    clusterProfiler, 
    goseq,
    utils,
    RColorBrewer,
    rlang,
    DESeq2,
    scales,
    SummarizedExperiment,
    S4Vectors,
    stats
Suggests: 
    knitr,
    rmarkdown,
    airway,
    macrophage,
    org.Hs.eg.db,
    GeneTonic,
    testthat (>= 3.0.0),
    TxDb.Hsapiens.UCSC.hg38.knownGene,
    BiocStyle
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
VignetteBuilder: knitr
URL: https://github.com/imbeimainz/mosdef
BugReports: https://github.com/imbeimainz/mosdef/issues
Config/testthat/edition: 3
biocViews:
    GeneExpression, Software, Transcription, Transcriptomics, DifferentialExpression,
    Visualization, ReportWriting, GeneSetEnrichment, GO

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Mar 20, 2024
@ldammer
Copy link
Author

ldammer commented Mar 20, 2024

cc coauthor @federicomarini

@federicomarini
Copy link

As a heads-up for the reviewing process:

  • We are aware of long-ish testing times, and we already set up the longtests as suggested by @lshep - thanks a lot again for the tip!
  • About the package: This is the place where we moved out some functions that were there in some other packages we developed (pcaExplorer, ideal, GeneTonic), with the aim to also achieve some higher modularity of the codebase.
    If this package is accepted, we will proceed with the expected cycles of replacing the dependencies/soft deprecating the older ways of doing things.

@lshep lshep added the pre-check passed pre-review performed and ready to be added to git label Mar 22, 2024
@bioc-issue-bot
Copy link
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Mar 22, 2024
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@federicomarini
Copy link

We tried to reproduce the error on our machines + on docker, and via GitHub actions but we were not really successful.
The message from the report is bit cryptic but seems to be related to other chunks (if at all - it builds fine pretty much on all other systems we tried).
Is there a way to get more details on this?

image

Thanks in advance!

@lshep
Copy link
Contributor

lshep commented Mar 25, 2024

I couldn't reproduce either. I'm going to manually kick off a new build for you and see if it clears up.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): mosdef_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@federicomarini
Copy link

I couldn't reproduce either. I'm going to manually kick off a new build for you and see if it clears up.

Perfect @lshep thanks for setting this a bit into movement.
The WARNING (due to R CMD check taking about 11-12 minutes) we got is "expected", as it is above the 10 minutes.
The longtests are already in place to properly cover all the codebase.

As for the WARNING by BiocCheck, we are a bit above the 5Mb due to a fair amount of documentation. If needed, we can try to cut it down a bit, but I guess this is one of the items where striving for very good docs/vignettes can be "almost detrimental".

Anyways: glad we could solve the un-reproducible ERROR!

@lshep lshep added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed pre-review on bioconductor git and access to on demand build but not assigned reviewer until build report clean labels Mar 27, 2024
@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package for an indepth review.
Please respond accordingly to any further comments from the reviewer.

@lazappi
Copy link

lazappi commented Apr 10, 2024

Hi @ldammer

Thanks for submitting mosdef 🎉! Below is my review of your package. Please reply here if anything is unclear or needs any further explanation.

What next?

Please address the comments as best as you can. When you are ready for me to check the package again please reply to let me know with a summary of changes you have made or any other responses. You can use the "Quote reply" option in the ... menu on this comment to reply directly to my points below.

Luke

Review

Key: 🚨 Required ⚠️ Recommended 🟢 Optional ❓ Question

General package development

  • ⚠️ There are some notes in the build report, please check these and address as many as possible
  • ⚠️ There are also some notes from BiocCheck::BiocCheck()
  • 🟢 Building the package was quite slow for me (I think because of the vignette). If the build system doesn't complain it's probably fine but something to keep in mind.

DESCRIPTION file

  • 🟢 There is already at least one biology related package with this name (https://mosdef.org/). It's not super related so probably ok but maybe consider something else.
  • ⚠️ Packages in the next release should depend on R (>= 4.4.0)

NAMESPACE file

  • 🚨 The naming of some function is inconsistent (x2y vs x_2_y). x_2_y or x_to_y is probably more consistent with the other functions but the main thing is that they match.
  • ❓ I'm not sure if all of the functions need to be exported? It seems like some of them are only meant to be used internally, maybe check.

Package data

  • ⚠️ The dataset documentation could be improved/expanded. The inst/script folder is used for documenting internal data but exported data should be documentd in the help file.

Documentation

README

  • 🚨 Please add Bioconductor installation instructions to the README

Vignette

  • ⚠️ The goseq chunk is disabled and you use a cached result. This is ok if it is clear but could lead to issues in the future so I would try to avoid it if possible.
  • 🚨 Please move the installation instructions to a separate Installation section
  • 🟢 I'm not sure how useful the DE section of the vignette is, given this is not part of the functionality of the package. Perhaps you could include the results as a dataset in the package and start from there?
  • ⚠️ I think more detail could be added to the vignette, particularly things like how to interpret the output.
  • ⚠️ Parts of the vignette are quite slow to run. Could you speed it up somehow (maybe by using a smaller example dataset)?
  • ⚠️ Some of the rendered tables are too wide for the vignette format. Maybe worth looking at (and why it's only some of them).

Code

R

  • 🟢 You mention future extension for {edgeR} output, how difficult would it be to do now? If it's not too much work I would encourage you to do it now as it would ~double your user base. Support of other formats (SummarizedExperiment, data.frame) would also be great if possible.
    • Even if you don't want to do this now it might be worth considering modifying argument names to be more general to make this easier in the future
  • ⚠️ Some of the function names/arguments could be clearer to better communicate what they do
  • 🚨 Please remove commented code that isn't used
  • ⚠️ I think you need to assign the output of match.arg() for it to work as expected. Some of your argument checks also seem redundant with match.arg().
  • ❓ Are the deseqresult2DEgenes() and deseqresult2df() functions redundant? The seem very similar to me.
  • 🟢 It's greate to have the checking functions but I would expect them to give an error/warning if something it not valid, not just a message (similar to validObject())
  • ⚠️ I think it would be useful to have a argument for the de_volcano to set the p-value threshold
  • ⚠️ A lot of the code in the enrichment functions is very similar. Could this be moved to a helper function(s) and reused?

@federicomarini
Copy link

Hi @lazappi , thanks for your comprehensive review, we'll get right onto it to improve it following your suggestions!

@ldammer and I will get back to you once we have addressed the issue, and we will do so in a separate branch+PR so that it can become easy for you to review in a glance the changes we introduced.

Federico

@federicomarini
Copy link

As for your point of accommodating the edgeR framework, which can take a while to implement properly, we are nonetheless trying to think ahead, and have some of the deseq-centric parameters renamed.

Mainly, this is our dds object, and we already did some brainstorming on some possible new names for that.
We thought of...

  1. data_de - not too bad in our mind, but possibly confused with res_de?
  2. data - a bit too generic, and also risky because data() is also a function
  3. exp_data - it is expression data, be it whatever container, probably leading to think that it could be "anything"?
  4. x - "old but gold", being (together with object) one of the classical ways of handling argument dispatch. Why not object, it is because often we'd need also the res_de
  5. de_container: just like x, but a bit more specific in telling users what should be in it. Which, of course, will be a job shared with the proper param entry in the documentation.

So far x and de_container are our favorites. I brought this up here because I would like to pick your brain a bit on this, so that there's another thinking head in this choice that could ideally be mid-long term affecting.

Thanks in advance for your input 😉

@lazappi
Copy link

lazappi commented Apr 11, 2024

This is really up to you but my preference is towards the clearest, most explicit naming. So either the object type (if that is required) or a description of what the object contains (e.g. something like de_results and expr_data in this case). You could also use S3 dispatch with methods for different classes if you prefer, which can be more work to set up but can also help with organising/modularising code.

@federicomarini
Copy link

You could also use S3 dispatch with methods for different classes if you prefer, which can be more work to set up but can also help with organising/modularising code.

That is the long term plan, which is why going the x way is probably something that kind of respects already the tradition of how people do that. The only "disadvantage" would be that x is not too explicit to specify what is needed - and we also recognize the value of the explicitness of the parameter name.
In the end, if we are clever in designing the functions, one would not even need to specify the parameter (if it is handled in the "positional" manner).

Thanks for sharing your thoughts, we will proceed with that in mind - and a slight preference for de_container 😉

@ldammer
Copy link
Author

ldammer commented Apr 12, 2024

Thank you again for your review @lazappi !
We addressed the points you raised in a separate branch, bioc_review, which we just merged into the devel branch (link to the PR for you to check all things at once: imbeimainz/mosdef#11.

You can find below our comments, in a point by point fashion for clarity

Review

Key: 🚨 Required ⚠️ Recommended 🟢 Optional ❓ Question

General package development

  • ⚠️ There are some notes in the build report, please check these and address as many as possible

Thanks for pointing this out.
Two notes are basically unavoidable as we decided to have the longtests set up, which requires us to have the extra folder, and the .BBSoptions for the Bioc build system.
The only remaining one is about the doc folder being bulky, but it is only related to the size of the vignette. We did some work to cut that down as well, but it still remained a bit above the 5mb limit. We hope this is not too much of a problem.

  • ⚠️ There are also some notes from BiocCheck::BiocCheck()

We did some changes to chop down the number of lines which were extra long, both in the code and in the documentation. As for the non-multiple of 4, we have been keeping consistently the tab-as-two-spaces setting on. The suppressMessages was a deliberate choice to have the messages of topGO displayed, as it tends to be pretty dominant (especially compared to the other companion functions). We are also both registered at the support site, so this should go away once run from the Bioc build systems.

  • 🟢 Building the package was quite slow for me (I think because of the vignette). If the build system doesn't complain it's probably fine but something to keep in mind.

We removed some dependencies we realised were unnecessary as well as have the tests run a bit quicker by using mock examples. We also shaved some time off of the vignette all of which should save a bit of time.
The enrichment functions do take most time to run, but we opted for a balance of using real data in the vignette, and moving the long running tests to the longtests folder. We hope that these changes solve the issue!

DESCRIPTION file

  • 🟢 There is already at least one biology related package with this name (https://mosdef.org/). It's not super related so probably ok but maybe consider something else.

We also saw this other project was existing, a bit after we started having mosdef encoded in our minds for this one package. When we realized that there were no substantial overlap that could confuse users of either software/project (which also use two different language implementations), we opted for keeping the original name as planned.

  • ⚠️ Packages in the next release should depend on R (>= 4.4.0)

Changed as suggested!

NAMESPACE file

  • 🚨 The naming of some function is inconsistent (x2y vs x_2_y). x_2_y or x_to_y is probably more consistent with the other functions but the main thing is that they match.

Thanks for pointing that out. These were remnants of the process of the moving/rewriting functions from other packages that fell through the cracks.
Indeed, the new names do look more homogeneous, and we think this will lead to less confusion for the future users.
This was also coupled to renaming some of the parameters, to follow a more harmonic way of labelling them - see point below for this.

  • ❓ I'm not sure if all of the functions need to be exported? It seems like some of them are only meant to be used internally, maybe check.

Good point, we did check the whole exported codebase. Although some of these functions seem to be only used internally in mosdef for now, they are meant to be re-used (by being exported) in the context of other existing packages (i.e pcaExplorer, GeneTonic, ideal).

Package data

  • ⚠️ The dataset documentation could be improved/expanded. The inst/script folder is used for documenting internal data but exported data should be documentd in the help file.

We added more detail to the creation and detail on the data objects in the mosdef-data.R file. We hope that these changes resolve the issue you raised about in this regard!

Documentation

README

  • 🚨 Please add Bioconductor installation instructions to the README

Done, we prioritized that over the GitHub approach!

Vignette

  • ⚠️ The goseq chunk is disabled and you use a cached result. This is ok if it is clear but could lead to issues in the future so I would try to avoid it if possible.

Thanks for this comment - we were aware that using a cached results would not be the most robust choice. Thanks to the changes we did to the rest of the codebase, we managed to save a little bit of time that we re-invested in setting this chunk to eval=TRUE, which was indeed our intention in the first place.

  • 🚨 Please move the installation instructions to a separate Installation section

We did that, thanks for the suggestion!

  • 🟢 I'm not sure how useful the DE section of the vignette is, given this is not part of the functionality of the package. Perhaps you could include the results as a dataset in the package and start from there?

Good point! We added some text saying that users experienced with the general DE workflow can move over to the following sections, that describe now more in detail the actual mosdef functionality. Still, for completeness we kept the section in (as a guarantee to work with the same object) - adding the computed dds results would have ended up in a too large package size.

  • ⚠️ I think more detail could be added to the vignette, particularly things like how to interpret the output.

A very good point, thanks for highlighting that. We added explanations for the outputs returned by the enrichment functions, for each algorithm detailing what exactly the columns contain. Moreover, we added extra detail on how their output can be picked up later in other functions in the mosdef package.

  • ⚠️ Parts of the vignette are quite slow to run. Could you speed it up somehow (maybe by using a smaller example dataset)?

Thanks for your comment on this. We did improve a bit overall, with the bottleneck that is put in place by actually running the enrichment functions (see also comment above) on real sized data, which we consider highly beneficial for a vignette reader. We did try to run them on smaller subsets and/or reducing the number of genes to include, but only obtained negligible time savings.

  • ⚠️ Some of the rendered tables are too wide for the vignette format. Maybe worth looking at (and why it's only some of them).

Thanks for this suggestion, we implemented a solution for this by means of the scrollX option for the DT datatable objects.

Code

R

  • 🟢 You mention future extension for {edgeR} output, how difficult would it be to do now? If it's not too much work I would encourage you to do it now as it would ~double your user base. Support of other formats (SummarizedExperiment, data.frame) would also be great if possible.

This is a very good point, that indeed would mean a substantial extension of our user base. We jumped on your suggestion and, even if the full support for edgeR and co is not there yet, we restructured the code of the whole package to make it (API-wise) framework-agnostic.
This means:

  • our dds parameter became a de_container, for which now there’s full support of the DESeqDataset objects
  • we kept the res_de as we thought it was already not too tightly related to the DESeq2 way; but we specify that the DESeqResults is just one of the object types being supported. As of now, the only one, but we will extend this to the other frameworks.
  • as a side effect, some functions were also renamed accordingly (mosdef_dds_check to mosdef_de_container_check)

We did see that some of the latest changes in edgeR itself went into a similar direction, but we agree that there can only be major benefits to supporting this other main framework.
Just as an historical note: we did start from a set of functionality centered around the DESeq2 framework because of its widespread adoption in our & neighbour groups 🙂But for sure, there is beauty in the interoperability that you suggest!

  • Even if you don't want to do this now it might be worth considering modifying argument names to be more general to make this easier in the future
  • ⚠️ Some of the function names/arguments could be clearer to better communicate what they do

We renamed some parameters (see also the comment above) to better fit them to more broad application (framework-agnostic) and renamed some parameters we thought were unclear. Please let us know if there are any other parameters you had in mind that warrant further specification.

  • 🚨 Please remove commented code that isn't used

Done!

  • ⚠️ I think you need to assign the output of match.arg() for it to work as expected. Some of your argument checks also seem redundant with match.arg().

This is now implemented as suggested, thanks for catching this! And yes, some checks are now removed as they are elegantly handled by these changes.

  • ❓ Are the deseqresult2DEgenes() and deseqresult2df() functions redundant? The seem very similar to me.

We did remove the “more specific” function to keep the one with a broader applicability.

  • 🟢 It's greate to have the checking functions but I would expect them to give an error/warning if something it not valid, not just a message (similar to validObject())

Good suggestion, we did elevate the failed checks to a warning instead of leaving that only as a message!

  • ⚠️ I think it would be useful to have a argument for the de_volcano to set the p-value threshold

Good suggestion, readily implemented. Sorry for missing that when we hard-coded that parameter in.

  • ⚠️ A lot of the code in the enrichment functions is very similar. Could this be moved to a helper function(s) and reused?

Thanks for the comment on this, we also realized that indeed some chunks were good candidates for elegant refactoring. We implemented that by having a function that better handles the information printing (depending on what parameters are being passed). This of course can (and should) continue as we will implement the support for other frameworks (see also the comment above).


Please let us know if we need to do some additional work. Thank you again for the feedback!

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: c77fd7eb4f6ddf9c96428e6059811ab693732c6a

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): mosdef_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@lazappi
Copy link

lazappi commented Apr 12, 2024

  • ⚠️ There are some notes in the build report, please check these and address as many as possible

Thanks for pointing this out. Two notes are basically unavoidable as we decided to have the longtests set up, which requires us to have the extra folder, and the .BBSoptions for the Bioc build system. The only remaining one is about the doc folder being bulky, but it is only related to the size of the vignette. We did some work to cut that down as well, but it still remained a bit above the 5mb limit. We hope this is not too much of a problem.

I will ask the core to reply here but I think we usually pretty strict about the size limit

  • 🟢 Building the package was quite slow for me (I think because of the vignette). If the build system doesn't complain it's probably fine but something to keep in mind.

We removed some dependencies we realised were unnecessary as well as have the tests run a bit quicker by using mock examples. We also shaved some time off of the vignette all of which should save a bit of time. The enrichment functions do take most time to run, but we opted for a balance of using real data in the vignette, and moving the long running tests to the longtests folder. We hope that these changes solve the issue!

There is also a warning from the build system about the build time. We can ask the core about this as well but it would be good to get it under 10 minutes if possible.

  • ⚠️ The dataset documentation could be improved/expanded. The inst/script folder is used for documenting internal data but exported data should be documentd in the help file.

We added more detail to the creation and detail on the data objects in the mosdef-data.R file. We hope that these changes resolve the issue you raised about in this regard!

Pointing to the scripts/ folder to more information is very useful as this isn't available to the the user. You should either include that information in the documentation or provide a link to where it is available. Alternatively, if these datasets are just intended for tests/examples and not users they could be moved to internal data and then it would be fine to document them in inst/script/.

The @format tag should describe the dataset after it is loaded, not the on-disk format. See https://roxygen2.r-lib.org/articles/rd-other.html for examples.

  • ⚠️ Parts of the vignette are quite slow to run. Could you speed it up somehow (maybe by using a smaller example dataset)?

Thanks for your comment on this. We did improve a bit overall, with the bottleneck that is put in place by actually running the enrichment functions (see also comment above) on real sized data, which we consider highly beneficial for a vignette reader. We did try to run them on smaller subsets and/or reducing the number of genes to include, but only obtained negligible time savings.

I think this might be worth looking into again given the build warning but we can see what the core says.

  • ⚠️ A lot of the code in the enrichment functions is very similar. Could this be moved to a helper function(s) and reused?

Thanks for the comment on this, we also realized that indeed some chunks were good candidates for elegant refactoring. We implemented that by having a function that better handles the information printing (depending on what parameters are being passed). This of course can (and should) continue as we will implement the support for other frameworks (see also the comment above).

Some of the checking code could maybe also be refactored, but up to you.

Apart from a couple of minor things which I think should be easy for you to fix I am pretty happy with this now. We just need to see what the core says about the size/time.

@lazappi
Copy link

lazappi commented Apr 12, 2024

@lshep Can you please give us an official opinion on the size and checking time warnings in the build report? Thanks!

@federicomarini
Copy link

Thanks for the swift reply!
For some of the points you mentioned, we were already in touch before initiating the submission process with @lshep about the runtime, so she might be the best person to ask as she is already somehow in the loops!

  • ⚠️ There are some notes in the build report, please check these and address as many as possible

Thanks for pointing this out. Two notes are basically unavoidable as we decided to have the longtests set up, which requires us to have the extra folder, and the .BBSoptions for the Bioc build system. The only remaining one is about the doc folder being bulky, but it is only related to the size of the vignette. We did some work to cut that down as well, but it still remained a bit above the 5mb limit. We hope this is not too much of a problem.

I will ask the core to reply here but I think we usually pretty strict about the size limit

Interestingly, one note might be due to an overflow of as.data.frame.numeric and similar warnings that seem to be there (hopefully temporary).
Maybe this gets better/goes away once that is solved?

  • 🟢 Building the package was quite slow for me (I think because of the vignette). If the build system doesn't complain it's probably fine but something to keep in mind.

We removed some dependencies we realised were unnecessary as well as have the tests run a bit quicker by using mock examples. We also shaved some time off of the vignette all of which should save a bit of time. The enrichment functions do take most time to run, but we opted for a balance of using real data in the vignette, and moving the long running tests to the longtests folder. We hope that these changes solve the issue!

There is also a warning from the build system about the build time. We can ask the core about this as well but it would be good to get it under 10 minutes if possible.

For that, we were indeed in touch with Lori and this was the driver to set up the longtests. We do understand that the bbs has a humongous daily task to take care of, and so far we are very close to being below 10 minutes for the whole process (only 22 seconds too much 😉 ).

Happy to adapt some more parts if this is required!

  • ⚠️ The dataset documentation could be improved/expanded. The inst/script folder is used for documenting internal data but exported data should be documentd in the help file.

We added more detail to the creation and detail on the data objects in the mosdef-data.R file. We hope that these changes resolve the issue you raised about in this regard!

Pointing to the scripts/ folder to more information is very useful as this isn't available to the the user. You should either include that information in the documentation or provide a link to where it is available. Alternatively, if these datasets are just intended for tests/examples and not users they could be moved to internal data and then it would be fine to document them in inst/script/.

Sorry, this was probably not clearly specified in these lines.
The scripts folder is indeed already in the inst/scripts folder. So we are indeed referring to a folder that does get installed with the package.

The @format tag should describe the dataset after it is loaded, not the on-disk format. See https://roxygen2.r-lib.org/articles/rd-other.html for examples.

Good point, sorry we missed that.
We will update that right away.

  • ⚠️ Parts of the vignette are quite slow to run. Could you speed it up somehow (maybe by using a smaller example dataset)?

Thanks for your comment on this. We did improve a bit overall, with the bottleneck that is put in place by actually running the enrichment functions (see also comment above) on real sized data, which we consider highly beneficial for a vignette reader. We did try to run them on smaller subsets and/or reducing the number of genes to include, but only obtained negligible time savings.

I think this might be worth looking into again given the build warning but we can see what the core says.

Ok, we will stay tuned!

  • ⚠️ A lot of the code in the enrichment functions is very similar. Could this be moved to a helper function(s) and reused?

Thanks for the comment on this, we also realized that indeed some chunks were good candidates for elegant refactoring. We implemented that by having a function that better handles the information printing (depending on what parameters are being passed). This of course can (and should) continue as we will implement the support for other frameworks (see also the comment above).

Some of the checking code could maybe also be refactored, but up to you.

True to that - we will invest more time and thoughts, especially upon expanding of the input types we will support.
Likely the first task in the coming devel cycle, to avoid doing it on a rush...

Apart from a couple of minor things which I think should be easy for you to fix I am pretty happy with this now. We just need to see what the core says about the size/time.

Thank you, this is good to hear. We will wait for the official core response on those aspects!

Leon & Federico

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 94350cb1810defd969319a9fac1a653633cbd767

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): mosdef_0.99.2.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 0f374a1d5909e2db5d10359a5b606d705fd2d964

@ldammer
Copy link
Author

ldammer commented Apr 22, 2024

Hi @lazappi, while waiting for the judgement of the core team, we actively kept on working on shaving down the size and time of the building and checking.

I am happy to say we managed to squeeze things in without losing any of the functionality/the user experience in the vignette.

As a summary to our interactions above:

  • The doc folder is now much better behaving, reducing the number of rows included in the interactive tables did the job mostly. The "individual file size" warning should be gone now.
  • Some of the examples got shortened on some parts that were not entirely needed, so this should take us below the 10 minutes limit.
  • Some previous commits solved the issue for the @format tag of the data objects

Since we are close to the deadline for inclusion in the upcoming release, we would be very grateful if you can check our latest edits - We tried our best to let the core team focused on all the other pending activities before release.

Leon & Federico

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): mosdef_0.99.3.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added OK and removed ERROR labels Apr 22, 2024
@lazappi
Copy link

lazappi commented Apr 22, 2024

Thanks for your work on reducing the build time, I can see the build system is happy with it now.

The only thing I'm not happy with is documenting the data objects in inst/scripts/. As I said previously, this is not the correct way to document exported datasets. Either you need to convert these to internal datasets or you need to complete the documentation in the man page. I'm happy for you to point to somewhere else that describes how the datasets were constructed but saying there is something in inst/scripts/ is not sufficient. It might be installed with the package but I don't think there is any easy way for a user to view it (I don't know how).

@federicomarini
Copy link

Thanks Luke for the reply!

I have used a couple of other times the equivalent of

file.edit(system.file("scripts", "create_mosdef_data.R", package = "mosdef"))

to access such scripts directly in the IDE. Would it be enough to specify this somewhere in their man pages (as well as in the vignette for completeness)?

Otherwise: What do you mean with "complete the documentation in the man page"? The only thing I can think of for that would be unevaluated code in the example for each object. If there is a better way we can follow as an example, we'd be happy to try that way.

Federico

@federicomarini
Copy link

For more context - we tried to follow the way described here -> https://contributions.bioconductor.org/docs.html#doc-inst-script (specifically, §12.4 and §12.3.3)

@lazappi
Copy link

lazappi commented Apr 23, 2024

For more context - we tried to follow the way described here -> contributions.bioconductor.org/docs.html#doc-inst-script (specifically, §12.4 and §12.3.3)

You will notice that 12.4 says "if data was included in the inst/extdata/ directory" which agrees with what I said. This kind of documentation is for internal data, not exported datasets. In fact, the R packages book suggests using data-raw/ for these kinds of scripts for exported data so they are not bundled with the package.

I would be ok with instructions for how to access the script file being in the documentation but I would prefer a link to GitHub (or somewhere) rather than opening the file locally. Alternatively, you could include the code directly using the @source tag. By "complete" I meant including this source information (you currently don't have an @source tag at all).

@federicomarini
Copy link

Sorry for misunderstanding this!
We will adjust the details with the link to the GitHub location for the script - and also learned the @source tag for Roxygen, thanks for pointing this out.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 18aa812e52d136ffcd131d299b6f9f3a4c9c2380

@ldammer
Copy link
Author

ldammer commented Apr 23, 2024

Hi @lazappi, we addressed this last point (see PR here: imbeimainz/mosdef#14)
Sorry again for the confusion! Now we added a @source tag, with a description and a link to the script directly on the persistent version on GitHub. Users can have a look at that with a simple click - and without the need to open the file locally.
We think this should solve the last point that needed to be addressed.
Thank you again for your valuable comments in the whole process. We think that the package improved a lot throughout the review!

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): mosdef_0.99.4.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/mosdef to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@lazappi
Copy link

lazappi commented Apr 24, 2024

Thanks for making this change! I'm still not sure this script belongs in inst/scripts but I am happy that users will now be able to view it if they are interested. I will accept the package now.

@lazappi lazappi added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels Apr 24, 2024
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@lazappi
Copy link

lazappi commented Apr 24, 2024

Congratulations on getting {mosdef} accepted into Bioconductor 🎉! It can take a couple of days to be picked up by the build system but then it should be available in Bioconductor devel and the next release.

@federicomarini
Copy link

Thanks for reviewing this in detail @lazappi ❤️
We will take all the suggested & optional feedback and use it in the further developments of the package!
Leon & Federico

@lshep
Copy link
Contributor

lshep commented Apr 25, 2024

The default branch of your GitHub repository has been added to Bioconductor's
git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/ldammer.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("mosdef"). The package 'landing page' will be created at

https://bioconductor.org/packages/mosdef

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

@lshep lshep closed this as completed Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK
Projects
None yet
Development

No branches or pull requests

5 participants