Contributing

Matt Dowle edited this page Sep 11, 2018 · 10 revisions

Filing issues

  • Please read and follow all the instructions at Support before filing; e.g. check NEWS first and search existing Issues.
  • Please use tags rather than words at the beginning of titles; e.g. [feature request], [bug].
  • One issue for one purpose. Don't report more than one bug or request several features in the same issue.

Filing issues is contributing. Thank you!

Pull Requests (PRs)

If you are not fixing an open issue and you are confident, you do not need to file a new issue before submitting the PR. It's easier for us to accept and merge a self-contained PR with everything in one place. If discussion is needed, it can be done in the PR. However, the PR's status must be passing tests before we will start to look at it. So, before you spend the time getting to that stage, it may save you time to create an issue first and start a discussion to see if your idea would be accepted in principle. If you are going to spend more than a day on the PR, creating an issue first lets other people know you are working on it to save duplicating effort.

  1. Every new feature or bug fix must have one or more new tests in inst/tests/tests.Rraw; see below for a bit more on how this file works. Please check that your new tests fail without the fix: the build system only checks that they pass with the fix, which is not sufficient. For example, run your new test against the current development version and verify that it actually fails.
  2. Unless the change is trivial (e.g. typo fix) there must be a new entry in NEWS. Please thank yourself by name and include what the thanks are for. Follow the prevailing style at the top of the file; e.g. "Problem with X in Y circumstance is fixed, [#123](issue link). Thanks to (them) for reporting and (me) for fixing, [PR#145](PR link)". These are the release notes that others quickly skim and search so please use relevant helpful keywords with that in mind.
  3. Please create the PR against the master branch. You can do that by forking the repository, creating a new branch for your feature/bugfix in the forked project, and then using that as a base for your pull requests. After your first successfully merged PR you will very likely be invited to be a project member. This will allow you to create your next branch directly in the project, which is easier and more convenient than forking.
  4. Just one feature/bugfix per PR please. Small changes are easier to review and accept than big sweeping changes. Sometimes big sweeping changes are needed and we just have to discuss those case by case.
  5. You do not need to separate formatting-only changes. Just make the format changes in your PR. When the PR is passing tests and we look at the PR's unified diff, we will subtract the formatting-only changes and make those to master directly for you. That will reduce your PR to logic changes only so that it can be more easily reviewed now and easier to look back on in future.
  6. GitHub enables us to squash commits together when merging, so you don't have to squash yourself.
  7. Your pull request's description is the place to put any benchmark results and to be a bit more verbose than the entry in the NEWS file, if you think that's appropriate. Include the text "Closes #ISSUE_NUM" (case insensitive, but the space must be present) so that GitHub links and closes the corresponding issue when the PR is merged. If multiple issues are being closed, add one "Closes #ISSUE_NUM" line per issue.
  8. Ensure that all tests pass by typing test.data.table() after installing your branch. Better still, run R CMD check --as-cran against the source package archive (.tar.gz file) built from your branch. You may want to add the --no-manual, --no-build-vignettes or --ignore-vignettes (R 3.3.0+) options to reduce the dependencies required to perform the check. PRs with failed tests can't be merged, and it is hard to debug every PR and explain why it fails and how to fix it. The less feedback required, the faster it is likely to be merged. Matt has added his dev cycle script here and Pasha has added a Makefile here.
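
The build and check options in item 8 can be combined roughly as in the sketch below. It assumes a POSIX shell in the root of your checkout. The names run, check_branch and DRY_RUN are illustrative only and are not part of R or data.table, and the version number is simply the one used elsewhere on this page.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# check_branch VERSION: build the source archive without vignettes, then
# check it with the manual and vignette steps skipped.
check_branch() {
  run R CMD build --no-build-vignettes . &&
  run R CMD check --as-cran --no-manual --ignore-vignettes "data.table_$1.tar.gz"
}

# Preview the two commands without running them.
DRY_RUN=1 check_branch 1.11.5
```

Drop DRY_RUN=1 to run the build and check for real. Skipping the manual and vignettes trades some coverage for fewer required dependencies, so a full check is still worth running when you can.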

Example of a good pull request: PR#2332. It has a NEWS entry. It passed existing tests and added a new one. One test was removed but the PR description clearly explained why upfront (without us having to ask). Benchmark results were included, which made the need for the change compelling. We didn't need to run anything ourselves. Everything was included in one PR in one place. In short, it was a pleasure to review and merge.
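
As a quick check of the "Closes #ISSUE_NUM" convention from item 7 before submitting, a draft description can be scanned for the issues it would close. This is a minimal sketch assuming a grep that supports -o (GNU or BSD). closed_issues is a hypothetical helper and the issue numbers are made up.

```shell
# closed_issues: read a PR description on stdin and print the issue numbers
# named in "Closes #N" lines (case insensitive, the space is required).
closed_issues() {
  grep -oiE 'closes #[0-9]+' | grep -oE '[0-9]+'
}

# The third line is ignored because the space is missing.
printf 'Speeds up a slow path.\n\ncloses #123\nCloses #456\nCloses#789\n' | closed_issues
# prints 123 and 456, one per line
```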

Testing

data.table uses a series of tests to exhibit code that is expected to work. These are stored in inst/tests/tests.Rraw. They come primarily from two places. First, when new features are implemented, the author constructs minimal examples demonstrating the expected common usage of the feature, including expected failures/invalid use cases (e.g., the initial set of tests for fwrite includes 28 tests). Second, kind users such as yourself report aberrant behavior that they happen upon in their everyday use of data.table (typically, some edge case that slipped through the cracks in the coding logic of the original author). We try to be thorough -- for example there were initially 141 tests of split.data.table, and that number has since grown!

When you file a pull request, you should add some tests to this file with this in mind -- for new features, try to cover possible use cases extensively; for bug fixes, include a minimal version of the problem you've identified and write a test to ensure that your fix indeed works, and thereby guarantee that your fix continues to work as the codebase is further modified in the future. We encourage you to scroll around in tests.Rraw a bit to get a feel for the types of examples that are being created, and how bugs are tested/features evaluated.

Using test

The function signature of test is test(num, x, y, error=NULL, warning=NULL, output=NULL):

  • num is a unique identifier for a test, helpful in identifying the source of a failure when a test does not pass. Currently, we use a manually-incremented system with tests numbered as n.m, where essentially n indexes an issue and m indexes aspects of that issue. For the most part, your new PR should only have one value of n (scroll to the end of tests.Rraw to see the next available ID) and then index the tests within your PR by increasing m.

  • x is an input object to be evaluated, y is the pre-defined output against which you are testing x. For example, to check that sum is working, you might set x = sum(1:5) and y = 15.

  • error: when you are testing behavior of code that you expect to fail with an error, supply the expected error message to this argument. error is interpreted as a regular expression, so you can abbreviate the message, but try to include its key portion so as not to accidentally match a different error message.

  • warning is the same as error, in the case that you expect your code to issue a warning. Note that since the code evaluates successfully, you should still supply y.

  • Use output if you are testing the printing/console output behavior of some feature. Again, regex-compatible.
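
Rather than scrolling to the end of tests.Rraw by eye, the next free value of n can be estimated from the shell. The sketch below assumes each test call starts at the beginning of a line with a literal number. next_test_id is a hypothetical helper, and the sample file with its test numbers only stands in for inst/tests/tests.Rraw.

```shell
# next_test_id FILE: print one more than the largest integer part n found in
# lines of the form test(n.m, ...).
next_test_id() {
  grep -oE '^test\([0-9]+' "$1" | tr -dc '0-9\n' | sort -n | tail -n 1 |
    awk '{ print $1 + 1 }'
}

# A small stand-in file showing x/y and error= style tests.
sample=$(mktemp)
cat > "$sample" <<'EOF'
test(1875, 1L + 1L, 2L)
test(1876.1, sum(1:5), 15L)
test(1876.2, sum("a"), error="invalid 'type'")
EOF

next_test_id "$sample"   # prints 1877
rm -f "$sample"
```

In a real checkout you would point the helper at inst/tests/tests.Rraw and still glance at the end of the file, since tests numbered in other ways would be missed by this pattern.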

References: If you are not sure how to issue a PR, but would like to contribute, these links should help get you started:

  1. How to Github: Fork, Branch, Track, Squash and Pull request.
  2. Squashing Github pull requests into a single commit.
  3. Github help - you'll need the fork and pull model.

Minimal first time PR

$ cd /tmp      # or anywhere safe to play
$ git config --global core.autocrlf false   # Windows only: preserve \n in test data
$ git clone https://github.com/Rdatatable/data.table.git
$ cd data.table
$ R CMD build .
$ R CMD check data.table_1.11.5.tar.gz
...
Status: OK

Congratulations - you've just compiled and tested the very latest version of data.table in development. Everything looks good. Now make your changes. Using an editor of your choice, edit the appropriate .R, .md, NEWS and tests.Rraw files. Test your changes:

$ R CMD build .
$ R CMD check data.table_1.11.5.tar.gz

Fix the problems and repeat the build and check steps until you get Status: OK. Now commit the change and push. Since this is a first-time PR and you're not a project member, this step should automatically ask you if you wish to fork the project. Say 'yes'. If that's not the case, please edit this wiki page to show exactly what happens for non-members.

$ git commit -am "Added/fixed something in somewhere"
$ git push
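
If that push is refused because you lack write access to the main repository, forking on GitHub and pushing a branch to the fork (item 3 under Pull Requests) is one way forward. The sketch below assumes you have already created the fork in the browser and committed your change locally. push_to_fork, the remote name fork, and YOUR_USERNAME are placeholders, not anything the project prescribes.

```shell
# push_to_fork FORK_URL BRANCH: register the fork as a remote, create BRANCH
# from the current commit, and push it to the fork with upstream tracking.
push_to_fork() {
  git remote add fork "$1" &&
  git checkout -b "$2" &&
  git push -u fork "$2"
}

# Hypothetical usage from inside the cloned data.table directory:
# push_to_fork https://github.com/YOUR_USERNAME/data.table.git my_new_branch
```

After the push, the fork's GitHub page should offer to open a pull request against the master branch of the main project.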

After your first successful non-trivial PR you'll likely be invited to be a project member. When you're a member, before making your changes, you would create a branch first:

$ git checkout -b my_new_branch

and then the commit and push shown above would push to the branch in the main project. The next time you refresh the GitHub page in your browser, a button appears which you can click to create the PR from the branch. And that's all there is to it.
