Contributing to arXiv projects
Thanks for your interest in contributing to arXiv software development! Here is a quick run-through of things you should know before diving in.
How to help
We have a list of projects at https://github.com/orgs/arXiv/projects. We try to put things here that are relatively self-contained (i.e. don't require a ton of coordination with other projects). If you see a project that looks interesting to you, take a look at the associated issues as well as the README of the associated repository.
Recommendations for making contributions:
- Make sure that you are working from a ticket. If you see something that needs doing, and there isn't a ticket, please make one. This helps us to keep tabs on what we're doing.
- Keep contributions bite-sized. We try to split up work into small, manageable pieces. This makes it easier for multiple people to work together, and especially facilitates code review. If you find yourself adding more than a few dozen lines of code, the task might be too big. Consider splitting the work into multiple tickets that you can deliver separately.
- Don't hesitate to ask questions. With a project this size, it is impossible to anticipate everything that contributors will need to know. Please ask lots of questions (just comment on the open issues, or create a new issue), and we'll do our best to answer them.
For a high-level overview of the arXiv-NG project, see the arXiv arXitecture. Of particular interest:
Take a look at branch management.
In general, we deliver work by raising a pull request from a feature branch (e.g. feature/issue-32-better-widgets) to the develop branch. You can also raise a PR from a forked repo to the develop branch of this (arXiv) repo.
For the PR to be merged:
- There should be an open issue documenting the feature, bug, or task to be completed. This helps to prevent going down rabbit holes or otherwise spending time on the wrong thing.
- All tests must be passing (and the contribution must have tests).
- Mypy type checking should pass.
- Pydocstyle should have no errors.
- Pylint should score at or above 8/10.
More on each of these below.
Please also include a description of the goals of the PR, the overall approach, and any unresolved questions or potential issues. Examples and/or screenshots are appreciated where applicable.
We use the built-in Python unittest framework to write all of our tests. Try to stick to the built-in tooling here.
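As a minimal sketch of what a test written with the built-in unittest framework can look like: the function normalize_title and its test case below are hypothetical, invented purely for illustration, and are not part of any arXiv codebase.

```python
import unittest


def normalize_title(title: str) -> str:
    """Collapse runs of whitespace in a title (hypothetical example function)."""
    return " ".join(title.split())


class TestNormalizeTitle(unittest.TestCase):
    """Exercise the public behavior of the hypothetical normalize_title."""

    def test_collapses_internal_whitespace(self) -> None:
        """Runs of spaces, tabs, and newlines become single spaces."""
        self.assertEqual(
            normalize_title("Better \t widgets\nnow"), "Better widgets now"
        )

    def test_strips_surrounding_whitespace(self) -> None:
        """Leading and trailing whitespace is removed."""
        self.assertEqual(normalize_title("  widgets  "), "widgets")
```

Assuming a file like this sits in a tests module, the standard library's discovery (python -m unittest) should pick it up without any extra tooling.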
Tests should live either in a tests/ module at the root of the repo, or in a tests module in a particular component of the application.
We aim for tests at the following levels of granularity:
Unit tests. We focus on unit tests for public functions/classes of modules. These should be consciously designed to avoid testing behavior that is implementation-specific.
Module tests. These test whole functional components within the application, making liberal use of Python's mock library. The kinds of modules that you might test include:
- Domain modules. These capture the core concepts, rules, and interactions within the application. You shouldn't need to mock anything, because the domain must not depend on any other components of the application.
- Controller modules. Mock service modules (e.g. for persistence), and test return values against whatever parameters might get passed in by the routes.
- Service integration modules. If possible, mock/spoof the external service that you're integrating with. For AWS services, moto is pretty nice.
Application tests. These tests run the whole application, possibly mocking external dependencies as needed. Use the Flask/Werkzeug test tooling for these tests.
For some useful background on test design, check out The Practical Test Pyramid.
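The controller-module tests described above might look roughly like the following sketch. WidgetStore and get_widget_controller are hypothetical names made up for this example; the point is only to show a persistence service being replaced with a mock so that the controller's return values can be checked for the parameters a route might pass in.

```python
import unittest
from typing import Any, Dict, Tuple
from unittest import mock


class WidgetStore:
    """Hypothetical persistence service; a real one would use a database."""

    def get_widget(self, widget_id: int) -> Dict[str, Any]:
        """Load a widget, raising KeyError if it does not exist."""
        raise RuntimeError("no database available in this sketch")


def get_widget_controller(
    store: WidgetStore, widget_id: int
) -> Tuple[Dict[str, Any], int]:
    """Hypothetical controller: return response data and an HTTP status."""
    try:
        widget = store.get_widget(widget_id)
    except KeyError:
        return {"reason": "no such widget"}, 404
    return {"widget": widget}, 200


class TestGetWidgetController(unittest.TestCase):
    """Test the controller with the persistence service mocked out."""

    def test_existing_widget(self) -> None:
        """A widget found by the service yields a 200 response."""
        store = mock.create_autospec(WidgetStore, instance=True)
        store.get_widget.return_value = {"id": 32, "name": "better"}
        data, status = get_widget_controller(store, 32)
        self.assertEqual(status, 200)
        self.assertEqual(data["widget"]["name"], "better")
        store.get_widget.assert_called_once_with(32)

    def test_missing_widget(self) -> None:
        """A KeyError from the service is translated into a 404."""
        store = mock.create_autospec(WidgetStore, instance=True)
        store.get_widget.side_effect = KeyError(99)
        data, status = get_widget_controller(store, 99)
        self.assertEqual(status, 404)
        self.assertIn("reason", data)
```

Using mock.create_autospec rather than a bare Mock means a call that does not match the real method signature fails the test, which helps keep the mock honest as the service evolves.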
Type annotations + static checks
We use type annotations throughout the running codebase (i.e. everything except tests). Check out Python's built-in typing library.
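For illustration only, annotated code using the typing library could look like the sketch below. The Widget type and both functions are hypothetical and exist just to show annotated parameters, return types, and a local variable.

```python
from typing import Dict, List, NamedTuple, Optional


class Widget(NamedTuple):
    """Hypothetical domain object used only for this illustration."""

    widget_id: int
    name: str
    tags: List[str]


def find_widget(widgets: Dict[int, Widget], widget_id: int) -> Optional[Widget]:
    """Return the widget with ``widget_id``, or ``None`` if it is unknown."""
    return widgets.get(widget_id)


def tag_counts(widgets: List[Widget]) -> Dict[str, int]:
    """Count how many widgets carry each tag."""
    counts: Dict[str, int] = {}
    for widget in widgets:
        for tag in widget.tags:
            counts[tag] = counts.get(tag, 0) + 1
    return counts
```

Returning Optional[Widget] makes the "not found" case explicit, so a type checker can flag callers that forget to handle None.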
We use mypy to type-check code. If everything checks out, the following should return no lines and exit 0, using the mypy.ini config in the root of this repo.
pipenv run mypy -p [APPLICATION] | grep -v "test.*" | grep -v "defined here"
You can also just run:
Linting + Documentation
Code should adhere as closely as possible to PEP 8. The following should exit 0:
pipenv run pydocstyle --convention=numpy --add-ignore=D401 [APPLICATION]
Or just run:
We use NumPy style for docstrings, and otherwise follow PEP 257. All modules, classes, and public functions (i.e. not starting with _) should have docstrings. It's also nice if you add docstrings for constants and class attributes.
A .pylintrc config file can be found in the root of this repository. The following should score >= 8/10.
pipenv run pylint [APPLICATION]
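To illustrate the docstring conventions described in this section, here is a small sketch of a module with a module docstring, a documented constant, and a NumPy-style function docstring. The module, the constant MAX_NAME_LENGTH, and the function truncate_name are all hypothetical; treat this as an example of the general shape rather than a template guaranteed to satisfy every check.

```python
"""Helpers for widget names (hypothetical module illustrating docstrings)."""

MAX_NAME_LENGTH = 20
"""int: Longest widget name kept before truncation."""


def truncate_name(name: str, limit: int = MAX_NAME_LENGTH) -> str:
    """
    Shorten a widget name to at most ``limit`` characters.

    Parameters
    ----------
    name : str
        The full widget name.
    limit : int
        Maximum number of characters to keep. Must not be negative.

    Returns
    -------
    str
        The name, cut off after ``limit`` characters if it was longer.

    Raises
    ------
    ValueError
        If ``limit`` is negative.

    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return name[:limit]
```

The string literal directly under the constant is one common way to document a module-level value; Python itself does not attach it to the constant, but documentation tools such as Sphinx can pick it up.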