Developer page

Ivan Menshikh edited this page Sep 20, 2018 · 67 revisions

The code for gensim is hosted here, on GitHub. Contributions in the form of pull requests are welcome, whether for code or documentation. You may also report an issue or bug here.

If you don't feel confident in your git and/or Python, you can get up to speed with these tutorials. If your contribution is more of an idea than code, use the gensim mailing list.

Documentation

Python docstrings are for an overview of the functionality, to anchor a class or method conceptually and check their parameters, not to describe how things work internally in detail. For all other cases, the code ought to be its own documentation. Any non-obvious tricks and coding patterns that may confuse an otherwise literate Python programmer need a source code comment.

Gensim is in permanent need of better tutorials, usage examples, as well as clearer docstrings. Contributions are most welcome.

When adding a new file to the repo, please add a corresponding rst file in docs/src and update apiref.rst.

API documentation that appears on the web is automatically generated from docstrings, via Sphinx:

tox -e docs  # generate new docs version, will be available in docs/src/_build/html
tox -e docs-upload  # generate new docs version and upload it to site (need ssh permissions)

Docstrings

All docstrings should follow the numpy-style format. The docstrings are used for generating documentation, so we really care about their quality. See the example of a numpy-style docstring on the sphinx page and the detailed numpy docstring guide.
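As a rough illustration of the numpy-style layout, here is a minimal sketch. The function `count_tokens` and its parameters are hypothetical and are not part of gensim; only the section layout (Parameters, Returns, Examples) is the point.

```python
def count_tokens(documents, lowercase=True):
    """Count how often each token occurs in a collection of documents.

    Parameters
    ----------
    documents : iterable of list of str
        Tokenized documents.
    lowercase : bool, optional
        If True, tokens are lowercased before counting.

    Returns
    -------
    dict of (str, int)
        Mapping from each token to its number of occurrences.

    Examples
    --------
    >>> count_tokens([["Human", "interface"], ["human"]])
    {'human': 2, 'interface': 1}

    """
    counts = {}
    for document in documents:
        for token in document:
            if lowercase:
                token = token.lower()
            # dict.get supplies 0 for tokens that have not been seen yet
            counts[token] = counts.get(token, 0) + 1
    return counts
```

The docstring states what the function does and what its parameters mean; the one non-obvious line in the body gets a source code comment instead.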

Git flow

Branching model follows http://nvie.com/posts/a-successful-git-branching-model/:

  • master branch is stable, HEAD is always the latest release
  • develop branch contains the latest code for the next release.
  • various feature branches, to be merged into develop upon completion
  • include the issue number in the name of the branch

For a new feature, branch off develop:

$ git checkout -b myfeature develop

To merge a feature back into develop:

$ git checkout develop
$ git merge --no-ff myfeature
$ git branch -d myfeature
$ git push --tags origin develop

Code style

PEP8 in general: no trailing whitespace in the source code, whitespace on empty Python lines (lines separating blocks of code/methods etc.) and so on. No vertical indents (only hanging indents). To check your code style automatically (except indents), please run

tox -e flake8

Making a new release

Check that all CI checks in the gensim repository passed for the last commit in 'develop'.

To start a new release, first, branch off develop:

export RELEASE=X.Y.Z
git checkout -b release-${RELEASE} develop

Bump the version in setup.py, docs/src/conf.py and gensim/__init__.py (see commit example).

git commit -m "bump version to ${RELEASE}"
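Because the version lives in three files, it is easy to miss one. The following is only a sketch of a consistency check, not part of the gensim tooling: the helper names and the regular expression are hypothetical, and it assumes each file assigns the version as a quoted X.Y.Z string to a name such as `version`, `release` or `__version__`.

```python
import re

# Hypothetical pattern: matches assignments such as version='1.2.3',
# release = "1.2.3" or __version__ = '1.2.3' (assumed file layout).
VERSION_RE = re.compile(
    r"""\b(?:__version__|version|release)\s*=\s*['"](\d+\.\d+\.\d+[\w.]*)['"]"""
)


def find_versions(text):
    """Return the set of X.Y.Z version strings assigned in `text`."""
    return set(VERSION_RE.findall(text))


def unbumped_files(contents, expected):
    """Return the sorted names of files whose versions differ from `expected`.

    `contents` maps a file name to that file's text.
    """
    return sorted(
        name for name, text in contents.items()
        if find_versions(text) != {expected}
    )
```

For example, passing the text of the three files as `contents` and the value of RELEASE as `expected` returns an empty list once every file has been bumped.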

If you update README.md, also update LONG_DESCRIPTION in setup.py.

Also, don't forget to update CHANGELOG.md:

git add CHANGELOG.md
git commit -m "bump CHANGELOG to ${RELEASE}"

To finalize the release, re-generate Cython files (if changed):

cython gensim/corpora/_mmreader.pyx
cython gensim/_matutils.pyx
cython gensim/models/fasttext_inner.pyx
cython gensim/models/doc2vec_inner.pyx
cython gensim/models/word2vec_inner.pyx
cython gensim/models/_utils_any2vec.pyx
cython --cplus gensim/models/word2vec_corpusfile.pyx
cython --cplus gensim/models/doc2vec_corpusfile.pyx
cython --cplus gensim/models/fasttext_corpusfile.pyx

git add gensim/corpora/_mmreader.c gensim/_matutils.c gensim/models/fasttext_inner.c gensim/models/doc2vec_inner.c gensim/models/word2vec_inner.c gensim/models/_utils_any2vec.c gensim/models/word2vec_corpusfile.cpp gensim/models/doc2vec_corpusfile.cpp gensim/models/fasttext_corpusfile.cpp
git commit -m "regenerated C files with Cython"
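The file list passed to `git add` above follows from the Cython commands: by default Cython writes a `.c` file next to each `.pyx` file, and with `--cplus` it writes a `.cpp` file instead. A small sketch of that mapping, using a hypothetical helper name that is not part of gensim:

```python
from pathlib import PurePosixPath


def generated_source(pyx_path, cplus=False):
    """Return the file Cython writes for `pyx_path` (.c, or .cpp with --cplus)."""
    suffix = ".cpp" if cplus else ".c"
    return str(PurePosixPath(pyx_path).with_suffix(suffix))
```

This can help when a new `.pyx` module is added and the release commands need to be extended to match.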

Then merge the release branch into master, tag it, and merge master back into develop:

git checkout master
git merge --no-ff release-${RELEASE}
git tag -a ${RELEASE} -m "${RELEASE}"
git push --tags origin master
git checkout develop
git merge --no-ff master
git push origin develop

Add a text description of the release at https://github.com/piskvorky/gensim/tags

Update https://github.com/MacPython/gensim-wheels (BUILD_COMMIT in Travis/AppVeyor config + update gensim subrepository, commit example)

After CI in the gensim-wheels repo finishes, download all wheels and generate the sdist:

tox -e download-wheels

Test gensim with the test PyPI and check that everything looks good on the gensim page after uploading:

tox -e test-pypi

Upload all wheels to the "wild" PyPI:

tox -e upload-wheels

Update the conda-forge repository, see commit example.

And update documentation at http://radimrehurek.com/gensim:

tox -e docs
cd docs/src
make upload

Tweet about the successful release :+1:

Legal

By submitting your contribution to be included in the gensim project, you agree to assign joint ownership of your changes (your code patch, documentation fix, whatever) to me, Radim Řehůřek.

This means I will have the full rights to incorporate, distribute and/or further modify your changes, without any fees or restrictions from you.

This is needed in open-source projects because you are automatically the copyright owner of your contribution by law, and I couldn't do anything with it without your permission.

An example file header:

# Copyright (C) 2016 Radim Rehurek <radim@rare-technologies.com>
# Copyright (C) 2016 Your Name <me@gmail.com>