Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

motifmatchr #293

Closed
8 tasks done
AliciaSchep opened this issue Mar 1, 2017 · 39 comments
Closed
8 tasks done

motifmatchr #293

AliciaSchep opened this issue Mar 1, 2017 · 39 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution ERROR WARNINGS

Comments

@AliciaSchep
Copy link

AliciaSchep commented Mar 1, 2017

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Subversion
    (optionally via GitHub).

For help with submitting your package, please subscribe and post questions
to the bioc-devel mailing list.

@bioc-issue-bot
Copy link
Collaborator

Hi @AliciaSchep

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: motifmatchr
Type: Package
Title: Fast Motif Matching in R
Version: 0.99.0
Date: 2017-02-25
Authors@R: c(
    person("Alicia", "Schep", email = "aschep@gmail.com", role = c("aut","cre")),
    person("Stanford University", role = "cph"))
Maintainer: Alicia Schep <aschep@gmail.com>    
Description: Quickly find motif matches for many motifs and many sequences. 
    Wraps C++ code from the MOODS motif calling library, which was developed
    by Pasi Rastas, Janne Korhonen, and Petri Martinmäki.
License: GPL-3 + file LICENSE
Imports:
    Matrix,
    Rcpp,
    methods,
    TFBSTools,
    Biostrings,
    BSgenome.Hsapiens.UCSC.hg19,
    S4Vectors,
    SummarizedExperiment,
    GenomicRanges,
    IRanges
Depends:
    R (>= 3.3)
Suggests:
    testthat,
    knitr,
    rmarkdown
biocViews: MotifAnnotation
LinkingTo: Rcpp, RcppArmadillo
SystemRequirements: C++11
RoxygenNote: 6.0.1
VignetteBuilder: knitr
Encoding: UTF-8


@bioc-issue-bot
Copy link
Collaborator

Your package has been approved for building. Your package is
now submitted to our queue.

IMPORTANT: Please read the instructions for setting
up a push hook on your repository, or further changes to your
repository will NOT trigger a new build.

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation labels Mar 1, 2017
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170301030440.html

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

a704a3a make cleanup.win executable

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170301145041.html

@lawremi
Copy link

lawremi commented Mar 1, 2017

This package looks really nice. The assay components should probably have the same name as the accessors, for clarity. So $motif_scores instead of $scores. For match_motifs() why not just make the default of bg be explicit, instead of NULL? I think get_nuc_freqs() could just pass as.prob=TRUE to letterFrequency(). In which case, do you really need get_nuc_freqs()? Could just make a character method for letterFrequency(). You do not need to use paste0() with stop() because it already does that. Could match_motifs_helper() be simplified if SummarizedExperiment() treated a rowRanges=NULL as missing? Maybe we could make that change. The docs say it returns a GRanges when out="positions" but I think it actually returns a list, one per motif matrix. What about stacking those GRanges with a meta column indicating the matrix for each result?

@AliciaSchep
Copy link
Author

AliciaSchep commented Mar 2, 2017

Thanks for the feedback. I will update the assay components, and remove the unneccessary paste0 in stop calls.

For the bg argument, I can see that being more explicit would be better, although I think the reason I hadn't done this was because if the input is a GenomicRanges then the sequence is first extracted from BSgenome before being passed to get_nuc_freqs. Perhaps an alternative for greater clarity would be to have a default value like "input" to indicate that frequencies are taken from input (and if the argument is not that it has to be a vector)?

There is a small difference between get_nuc_freqs and letterFrequency with the argument as.prob = TRUE, in that if there are any "N" or other non ACGT basepairs the frequencies won't sum to 1 with letterFrequency, but they will with get_nuc_freqs. Perhaps I should document the function.

Having SummarizedExperiment treat rowRanges=NULL as missing would help make the code simpler! I think it'd be a nice feature in general.

For the docs for out=positions, I will update the documentation to be correct. I would prefer to output some kind of list rather than collapsing into one GRanges, but perhaps a GenomicRangesList would be better than a base list?

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

45d347a version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170308204302.html

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

8b40068 updated date
51cafc8 update readme to point to ref

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170309005940.html

@lawremi
Copy link

lawremi commented Mar 9, 2017

For match_motifs() (and any other generic with ... in the formals), the arguments that fall into ... do not need to be the same across methods. Consistency is a good thing, of course, but they could have different defaults for e.g. the bg argument. Right now most of the match_motifs() methods are completely ignoring e.g. genome, and the GenomicRanges method is ignoring ranges, which is confusing. You could change the signature of the GenomicRanges method to include subject="BSgenome", or even subject="ANY" since getSeq() is generic with many methods, then drop the genome argument from all methods. Then the user calls match_motifs(pwms, genome, ranges=gr) instead of match_motifs(pwms, gr, genome=genome) and the method formals are cleaner across the board.

But there is another inconsistency in that the background is estimated across the entire subject for most methods, but only within ranges (or subject currently) for the method in question. It seems like that behavior should be consistent across methods. Probably should just make the proposed BSgenome method like the others and compute the background across the entire genome. To allow for restricted background calculation, you could add a BSgenomeViews method, and have ranges default to ranges(subject). So the behavior of the GenomicRanges method could be achieved with: match_motifs(pwms, Views(genome, gr)).

@AliciaSchep
Copy link
Author

I disagree that the background is inconsistent among the methods. When supplying a DNAStringSet or character vector as the input, the background is taken across the input for which the motif matching is called. This is the same as when giving ranges and a genome-- the relevant DNA that is being assessed is what is being used. I think this makes sense in many applications, especially when looking at something like ATAC-seq or DNase-seq peaks where there is some GC bias in the assay and thus peaks and you want the background to be the sequence of the peaks themselves rather than the whole genome. The way I see it is that you could come to the match_motifs function with the peaks and have it extract the sequence or you might have already extracted the relevant sequences (perhaps you don't want to go through a BSgenome object or you want the sequences for something else). In those two scenarios, the behavior of match_motifs is identical, while with your proposed change it would not be. Of course, a user might in these cases or in other scenarios want to use the whole genome as background -- they could of course still be able to do that.

With the proposed change to make subject BSgenome and input ranges to the ranges argument, I think the bg argument could take as options: "input","genome", "even" or a numeric vector. "input" would be the behavior as it is now (and still the default). "genome" would take genome-wide frequencies (the character and DNAStringSet methods would also have a genome argument that would be optional and would be used for this purpose if provided and bg was set to "genome"). "even" would be 0.25 for all bases. I think this way of setting up would be consistent across inputs, but be more explicit than the current setup.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

96eb10b update r dependency
79cb80a make bg more explicit, update docs for more info o...

@AliciaSchep
Copy link
Author

To make the bg more explicit, I have updated it to take "subject","genome", or "even" as inputs or a numeric vector. I have added to the documentation to hopefully make this clearer. Now when the subject is character or DNAStringSet, a genome argument can also be given for use in computing genome-wide nucleotide frequencies if that is the desired method for background frequncies.

For the question of the "ranges" argument and GenomicRanges as subject, I think it would be confusing to change the GenomicRanges method to be a BSgenome method. When the subject is a character vector or DNAStringSet, the "ranges" input is meant to take as input the ranges that correspond to the input sequences (for either adding ranges to SummarizedExperiment outputs or making positions argument absolute rather than relative)-- no subsetting of those input sequences is done. For the proposed BSgenome method, ranges would be used to subset the genome, which would be inconsistent with what the "ranges" argument does in the other cases. Previously the purpose of the "ranges" argument was not very clearly explained in documentation — I have added some more explanation into the vignette and man page to hopefully make this more clear.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170322224327.html

@lawremi
Copy link

lawremi commented Mar 23, 2017

In match_motifs(), I would discourage having the genome default to hg19. It should try to get the BSgenome package for genome(subject) using BSgenome::getBSgenome(). Please remove the ranges=NULL argument from the formals for methods that do not use it. Currently, it's silently ignored which will lead to confusion.

It seems that the most natural object for the ranges + genome combination would often be a BSgenomeViews, so you might add a method for that.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

d1b95d8 version bump

@AliciaSchep
Copy link
Author

I removed default genome of hg19, referring instead to genome(subject). The genome argument allows either the genome build string (passed to BSgenome::getBSgenome()), a BSgenome object, DNAStringSet, or FaFile object. I have updated the documentation accordingly.

I removed the ranges=NULL where it was inappropriate.

I have also added a BSgenomeViews method.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, TIMEOUT, WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170328195303.html

@AliciaSchep
Copy link
Author

Hi @LiNk-NY,

I was wondering if there are further steps I should be taking before the package can be reviewed?

I asked about the WARNINGS on the bioc-devel mailing list and it was suggested I could probably ignore them.

The TIMEOUT issue appearing for MAC is new and confusing as the tests are fairly quick; I am not sure how to address that.

I have tried to address (in new code commits and/or comments) all of the helpful feedback from @lawremi.

Please let me know if there are additional steps I should be taking. Thanks!

@lawremi
Copy link

lawremi commented Mar 31, 2017

Good to go from my end. Nice contribution.

@LiNk-NY
Copy link

LiNk-NY commented Apr 3, 2017

Hi @AliciaSchep,
Sorry for the delay. I will review your package today.
Thanks for your valuable feedback @lawremi.

Regards,
Marcel

@LiNk-NY
Copy link

LiNk-NY commented Apr 3, 2017

Hi Alicia,
Thank you for submitting to Bioconductor! I have reviewed your package. Please see below.
If you have any questions, please feel free to contact me.
Regards,
Marcel


motifmatchr #293

Overall

Nicely written package. Not many issues with grammar to report.

Grammar

  • Avoid the use of underscores "_" in S4 generics and functions. Use camelCase naming style.
  • Perhaps, you could use matchMotif or motifmatchr as the main function name.

Vignette

  • Brief and informative 👍

Build issue

Please bump the version to start a clean build of the package.

@AliciaSchep
Copy link
Author

Thanks @LiNk-NY and @lawremi for the reviews!

For the question of "_" in the name, I had chosen match_motifs rather than matchMotifs to help distinguish it from the Biostrings function matchPWM. My (perhaps erroneous) understanding was that Bioconductor preferred camelCase, but that it wan't required (From the style guide: "These guidelines are really just preferences; they are not enforced."). I think motifmatchr would also be a reasonable name for the main function, my main reluctance to make the change is that some people are already using the package from github and that changing the names of the functions could lead to some confusion. I'd want to add some kind of deprecation notice for the current function names on the github version of the package, if that is allowed.

@LiNk-NY
Copy link

LiNk-NY commented Apr 10, 2017

Hi Alicia, @AliciaSchep

Yes, they are preferences but I tend to make exceptions when the system used supports the underscore use. For a recent package I reviewed, the underscores worked well with the S3 system. It's not very common in Bioconductor to see the S4 system with underscores.
Those are my thoughts but I would defer to Martin @mtmorgan on this.

Those who use your package from GitHub should expect there to be big changes as it is still in development.
A deprecation notice would only make sense to me if your package is already in development and you want to change names there. I don't think it would be necessary.

There are a couple of things that you could do to notify your alpha users:

  • include aliases (of old function names) that point to the new ones and have a note in the documentation
  • add a notice of the name changes to the NEWS file

Regards,
Marcel

@mtmorgan
Copy link
Contributor

I strongly encourage but do not require matchMotifs().

Deprecation is relatively painless and I'd encourage that, following the basic instructions on ?.Deprecated

@AliciaSchep
Copy link
Author

Thanks @LiNk-NY and @mtmorgan for the clarification. I will make the change to camelCase and add in a deprecation notice.

@LiNk-NY
Copy link

LiNk-NY commented Apr 25, 2017

Hi Alicia, @AliciaSchep
Any updates on the status of the changes? Much appreciated.
Regards,
Marcel

@AliciaSchep
Copy link
Author

Hi Marcel @LiNk-NY,

Sorry for the slowness -- I have made the changes locally but not yet pushed them. I have another package that "suggests" motifmatchr, chromVAR, that I plan to eventually also submit to Bioconductor (not quite ready yet, need to speed up some tests, spruce up docs a bit). Before pushing the changes to motifmatchr, I want to also update the vignettes for that package to the new motifmatchr method names as well as update all the methods in that package to camelCase as I imagine I'll be asked to do so when I eventually submit to Bioconductor. I hope to get that done in the next week or so, at which point I will push the changes to motifmatchr.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

d776c2b update to camelCase

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170518132655.html

@AliciaSchep
Copy link
Author

I have made the change to camelCase, adding in deprecation for all the old method names. The build is now showing an error for Windows; I will try to figure out why that might be although as it seems like a permission error I am not sure how to fix on my end.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

3f2981c version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/motifmatchr_buildreport_20170522143753.html

@LiNk-NY
Copy link

LiNk-NY commented May 30, 2017

Hi Alicia @AliciaSchep,

Thank you for making those changes. The package is looking good!

Minor note, be careful when using 1:n sequences. Use the more robust seq_len(n) or seq_along(vector).

Thank you for submitting to Bioconductor!

Best regards,
Marcel

@LiNk-NY LiNk-NY added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels May 30, 2017
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor svn repository and nightly builds. Additional
information will be sent to the maintainer email address in
the next several days.

Thank you for contributing to Bioconductor!

@AliciaSchep
Copy link
Author

Thanks Marcel @LiNk-NY

Can I still make changes to the github repo now or should I wait until the additional info from Bioconductor is sent, as mentioned by the issue bot?

@LiNk-NY
Copy link

LiNk-NY commented May 30, 2017

Hi Alicia @AliciaSchep,
You can still make changes to it.

Regards,
Marcel

@mtmorgan mtmorgan closed this as completed Jun 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution ERROR WARNINGS
Projects
None yet
Development

No branches or pull requests

5 participants