## DSCI 524 - Collaborative Software Development

### Lecture 6: Introduction to Continuous Deployment (CD), package documentation & publishing

#### 2020-03-11

## Lecture 6 learning objectives:
By the end of this lecture, students should be able to:
- [Define continuous deployment](#Continuous-Deployment-(CD))
- [Explain why continuous deployment is superior to manually deploying software](#Why-use-CD?)
- [Use GitHub Actions to set up automated deployment of Python packages upon push to the master branch](#Using-GitHub-Actions-to-perform-CD-for-your-Python-package)
- [Explain semantic versioning, and define what constitutes patch, minor, major and breaking changes](#Semantic-versioning)
- [Write conventional commit messages that are useful for semantic release](#Conventional-commit-messages)
- [Generate well-formatted function and package-level documentation for Python packages using Sphinx & Read the Docs](#Package-level-documentation-for-Python)
- [Generate well-formatted function and package-level documentation for R using Roxygen and `pkgdown`](#Package-documentation-for-R)
- [Publish Python packages to test PyPI](#Publishing-your-Python-package-for-this-milestone:)
- [Publish R packages to GitHub, and document how to install them via `devtools::install_github`](#Publishing-your-R-package-for-this-milestone:)

## Continuous Deployment (CD)

Continuous deployment is the practice of automatically deploying software that has successfully passed your test suite.

For example, upon merging a pull request to master, an automated process builds the Python package and publishes it to PyPI without further human intervention.

### Why use CD?

- because deploying a new version takes little to no effort, new features can be rolled out quickly and frequently
- bug fixes can also be implemented and released quickly
- deployment can be done by many contributors, not just one or two people with a high level of software engineering expertise

### Why use CD?

Perhaps this story is more convincing:

*The company, let’s call them ABC Corp, had 16 instances of the same software, each as a different white label hosted on separate Linux machines in their data center. What I ended up watching (for 3 hours) was how the client remotely connected to each machine individually and did a “capistrano deploy”. For those unfamiliar, Capistrano is essentially a scripting tool which allows for remote execution of various tasks. The deployment process involved running multiple commands on each machine and then doing manual testing to make sure it worked.*

*The best part was that this developer and one other were the only two in the whole company who knew how to run the deployment, meaning they were forbidden from going on vacation at the same time. And if one of them was sick, the other had the responsibility all for themselves. This deployment process was done once every two weeks.*

[*Source*](https://levelup.gitconnected.com/heres-why-continuous-integration-and-deployment-is-so-important-to-the-software-development-c0caeead5881)

## Using GitHub Actions to perform CD for your Python package

We will build on what we learned last class about continuous integration with GitHub Actions for Python. What do we need to change to make continuous deployment work for our package?

- event trigger: we should only deploy after merging pull requests to master

- runner & Python version: we should deploy only once per merged pull request, so a single runner and Python version is enough

- bump the package version

- create a release whose name corresponds to that version

- build the package!

- publish the package to (test) PyPI
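The event-trigger change can be sketched in Python as a gate that decides whether a run should deploy. The helper `should_deploy` is hypothetical and is not part of `release.yml`, where the trigger is declared in YAML. It relies on two environment variables that GitHub Actions sets on the runner: `GITHUB_EVENT_NAME` and `GITHUB_REF`. Merging a pull request into master appears as a `push` event on `refs/heads/master`.

```python
import os
from typing import Mapping, Optional


def should_deploy(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical gate: deploy only for a push to master (e.g. a merged PR)."""
    # Fall back to the real process environment when no mapping is given.
    env = os.environ if env is None else env
    return (
        env.get("GITHUB_EVENT_NAME") == "push"
        and env.get("GITHUB_REF") == "refs/heads/master"
    )
```

Note that a direct push to master would also pass this check. The check cannot tell a merge apart from any other push to the branch.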

### Exercise: read [`release.yml`](https://github.com/UBC-MDS/cookiecutter-ubc-mds/blob/master/%7B%7Bcookiecutter.project_slug%7D%7D/.github/workflows/release.yml)

To make sure we understand what is happening in our workflow that performs CD, let's convert each **step** to a human-readable explanation:

1. Checks out our repository files from GitHub and puts them on the runner

2. Sets up Python on the runner

3. Installs our package and its dependencies

4. Checks the style of the package code and test suite

5. Runs the test suite

6. Uploads the coverage report to codecov.io

7. 

8.

9.

10.

11.

12.

> Note: I filled in the steps we went over last class, so you can just fill in the new stuff

### How can we automate version bumping?

Let's look at the step that accomplishes this:

```yaml
- name: Bump version and tagging and publish
  run: |
    git config --local user.email "action@github.com"
    git config --local user.name "GitHub Action"
    git pull origin master
    poetry run semantic-release version
    poetry version $(grep "version" */__init__.py | cut -d "'" -f 2 | cut -d '"' -f 2)
    git commit -m "Add changes" -a
```
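The `grep ... | cut ...` pipeline above pulls the version string out of `__init__.py`. As a sketch, the same idea can be written in Python. The function name `read_version` is hypothetical, and it assumes the file contains a line such as `__version__ = "0.1.0"`.

```python
import re
from typing import Optional

# Matches an assignment like: __version__ = "0.1.0"  (single or double quotes)
VERSION_LINE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_source: str) -> Optional[str]:
    """Return the __version__ string from the text of an __init__.py, if present."""
    match = VERSION_LINE.search(init_source)
    return match.group(1) if match else None
```

The shell version matches any line containing `version`. This sketch only matches the `__version__` assignment, so a comment or docstring mentioning "version" will not confuse it.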

Our key command in this step is `poetry run semantic-release version`. 

[Python semantic-release](https://python-semantic-release.readthedocs.io/en/latest/) is a Python tool that parses commit messages, looking for keywords that indicate how to bump the version. It bumps the version in the `__init__.py` file of your package. We then use `grep` and `cut` to read that version back out of `__init__.py` and pass it to `poetry version`, which updates the `pyproject.toml` file.

To understand how it works so that we can use it, we need to understand **semantic versioning** and how to write **conventional commit** messages.

Let's unpack each of these in turn.

## Semantic versioning

- When we make changes and publish new versions of our packages, we should tag these with a version number so that we and others can view and use older versions of the package if needed. 


- These version numbers should also communicate something about how the underlying code has changed from one version to the next. 

- Semantic versioning is an agreed-upon "code" among developers that gives meaning to version number changes. With it, developers and users can make meaningful predictions about how the code has changed between versions just by looking at the version numbers.

- Semantic versioning treats version 1.0.0 as the release that defines the public API; changes after that are measured against it.
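Because version numbers carry meaning, tools have to compare them numerically, not as plain text. The helper `parse_version` below is a hypothetical sketch that handles only plain `MAJOR.MINOR.PATCH` strings. It does not handle pre-release or build labels.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


# Strings compare character by character, so "3.10.0" sorts before "3.9.0"...
assert "3.10.0" < "3.9.0"
# ...but integer tuples compare field by field, which matches semantic versioning.
assert parse_version("3.10.0") > parse_version("3.9.0")
```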

## Semantic versioning

Given a version number `MAJOR.MINOR.PATCH` (e.g., `2.3.1`), increment the:

- MAJOR version when you make incompatible API changes (often called breaking changes 💥) 

- MINOR version when you add functionality in a backwards compatible manner ✨↩️

- PATCH version when you make backwards compatible bug fixes 🐞
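The three rules above can be written as a small function. This is a minimal sketch: the name `bump` and its `level` argument are hypothetical. Note that a bigger bump resets the smaller fields to zero.

```python
def bump(version: str, level: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        # Incompatible (breaking) API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if level == "minor":
        # New backwards compatible functionality: reset patch.
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        # Backwards compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level!r}")
```

For example, starting from `2.3.1`, a major bump gives `3.0.0`, a minor bump gives `2.4.0`, and a patch bump gives `2.3.2`.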

*Source: https://semver.org/*

### Semantic versioning case study

**Case 1:** In June 2009, Python bumped versions from 3.0.1. Some changes in the new release included:
- Addition of an ordered dictionary type
- A pure Python reference implementation of the import statement
- New syntax for nested with statements

**Case 2:** In Dec 2017, Python bumped versions from 3.6.3. Some changes in the new release included:

- Fixed several issues in printing tracebacks (`PyTraceBack_Print()`).
- Fix the interactive interpreter looping endlessly when no memory.
- Fixed an assertion failure in Python parser in case of a bad `unicodedata.normalize()`

**Case 3:** In Dec 2008, Python bumped versions from 2.6. Some changes in the new release included:
- `print` became a function
- dividing two integers with `/` produced a float instead of an integer
- Some well-known APIs no longer returned lists (e.g., `dict.keys`, `dict.values`, `map`)

### Exercise: name that semantic version release

Read the three cases posted above and state whether each should be a major, minor or patch version bump. Then write down the next version number the new release should have.

> Your answer here

## Conventional commit messages

Python semantic-release by default uses a parser that works on the conventional (or Angular) commit message style, which is:

```
<type>(optional scope): succinct description of the change

(optional body: the motivation for the change and contrast this with previous behavior)

(optional footer: note BREAKING CHANGES here, as well as any issues to be closed)
```


How to affect semantic versioning with conventional commit messages:
- a commit with the type `fix` leads to a patch version bump
- a commit with the type `feat` leads to a minor version bump
- a commit whose body or footer starts with `BREAKING CHANGE:` leads to a major version bump (such commits can be of any type)

> Note - commit types other than `fix` and `feat` are allowed. Recommended ones include `docs`, `style`, `refactor`, `test`, `ci` and [others](https://github.com/angular/angular/blob/master/CONTRIBUTING.md#type).
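To see how a tool can pick apart the header line `<type>(optional scope): description`, here is a simplified sketch using a regular expression. It is not python-semantic-release's own parser. The names `Header` and `parse_header` are hypothetical, and the sketch does not check whether the type is one of the recommended ones.

```python
import re
from typing import NamedTuple, Optional

# <type>(optional scope): succinct description
HEADER = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]*)\))?: (?P<description>.+)$")


class Header(NamedTuple):
    type: str
    scope: Optional[str]
    description: str


def parse_header(message: str) -> Optional[Header]:
    """Parse the first line of a commit message; return None if it is not conventional."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER.match(first_line)
    if match is None:
        return None
    return Header(match["type"], match["scope"], match["description"])
```

For example, `parse_header("docs: fix typo")` gives type `docs`, no scope, and description `fix typo`. A free-form message like `"Fixed stuff"` returns `None`.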

### An example of a conventional commit message

```
git commit -m 'feat(ratings): added the ability to initialize a project even if a pyproject.toml file exists

This has been a feature requested by different poetry users (including me!).

closes #1639'
```

What kind of version bump would this result in?

### Another example of a conventional commit message

```
git commit -m 'feat: change to use of `%>%` to add new layers to ggplot objects

This is a joke! But an interesting piece of related trivia is that this is how the original ggplot was designed!

BREAKING CHANGE: `+` operator will no longer work for adding new layers to ggplot objects after this release'
```

What kind of version bump would this result in?

### Some practical notes for usage in your packages:

0. You must add `python-semantic-release` as a dev dependency via poetry

1. You must add the following to the tool section of your `pyproject.toml` file for this to work (filling in `<package_name>` with the appropriate value):
    ```
    [tool.semantic_release]
    version_variable = "<package_name>/__init__.py:__version__"
    version_source = "commit"
    upload_to_pypi = "false"
    patch_without_tag = "true"
    ```
2. Because `patch_without_tag = "true"` is set above, `semantic-release` bumps the patch version when a pull request is merged to master, even if none of the merged commits include `feat` or `BREAKING CHANGE:`.
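Putting the rules together, a release-level decision might look like the sketch below. The largest bump requested by any commit wins. If no commit asks for one, the hypothetical `patch_without_tag` argument (mirroring the setting of the same name) falls back to a patch bump. The function names are hypothetical, and the classification is simplified compared with the library's real parser.

```python
from typing import Iterable, Optional

LEVELS = ["patch", "minor", "major"]  # ordered from smallest to largest bump


def commit_level(message: str) -> Optional[str]:
    """Classify one commit message using simplified conventional-commit rules."""
    header, _, rest = message.partition("\n")
    # A body or footer line starting with BREAKING CHANGE: means a major bump.
    if any(line.startswith("BREAKING CHANGE:") for line in rest.splitlines()):
        return "major"
    if header.startswith(("feat:", "feat(")):
        return "minor"
    if header.startswith(("fix:", "fix(")):
        return "patch"
    return None  # docs, style, refactor, test, ci, ... do not request a bump


def release_level(messages: Iterable[str], patch_without_tag: bool = True) -> Optional[str]:
    """Pick the largest bump requested by the commits since the last release."""
    levels = [lvl for lvl in (commit_level(m) for m in messages) if lvl is not None]
    if levels:
        return max(levels, key=LEVELS.index)
    return "patch" if patch_without_tag else None
```

For example, `release_level(["fix: handle empty input", "feat: add a plotting function"])` returns `"minor"`. A batch containing only `docs:` commits returns `"patch"` while `patch_without_tag` is left as `True`.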

### Demo of Continuous Deployment

- <https://github.com/ttimbers/convertempPy>

### What about CD with R packages

- This is not a common practice (yet!). One reason could be CRAN's policy that established packages should be updated no more often than every 1-2 months.

- Semantic versioning is used in Tidyverse R packages, but versions are bumped manually.

## Package-level documentation for Python

TODO!!!

## Package documentation for R

There are several levels of documentation possible for R packages:
- code-level documentation (Roxygen-style comments)
- vignettes
- package websites (via `pkgdown`)

### Code-level documentation (Roxygen-style comments)

- We learned the basics of how to write Roxygen-style comments in DSCI 511
- In the package context, there are Namespace tags you should know about:
    - `@export` - this should be added to all package functions you want your user to know about
    - `@noRd` - this should be added to internal helper functions that you don't want your user to know about (it stops Roxygen from generating a help page for them)

### Vignettes
- Think of your vignette as a demonstration of how someone would use your package to solve a problem.

- It should demonstrate how the individual functions in your package work, as well as how they can be integrated together.

- To create a template for your vignette, run:
    ```
    usethis::use_vignette("package_name-vignette")
    ```
    
- Add content to that file and knit it when done.

As an example, here's the `dplyr` vignette: <https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html>

### Package websites (via [`pkgdown`](https://pkgdown.r-lib.org/))

- Vignettes are very helpful; however, they are not very discoverable. A website is a much easier way to share your package with others.

- The `pkgdown` R package lets you build a beautiful website for your R package in 4 steps!

    1. Install `pkgdown`: `install.packages("pkgdown")`

    2. Run `pkgdown::build_site()` from the root of your project, then commit and push the files it generates.

    3. Turn on GitHub pages in your package repository, setting `master branch / docs folder` as the source.
    
    4. Oh wait, there's no step 4! 🎉

In addition to the beautiful website, `pkgdown` automatically links to your vignette under the articles section of the website!!! 🎉🎉🎉

## Publishing your Python package for this milestone:

For this course, we will only publish your package on test PyPI. You will use continuous deployment via the `release.yml` workflow file to do this.

To get your package's README and important links to show up on the test PyPI page for your package, add the following information to the `[tool.poetry]` table in `pyproject.toml`:

```
readme = "README.md"
homepage = "https://github.com/<github_username>/<github_repo>"
repository = "https://github.com/<github_username>/<github_repo>"
documentation = 'https://<package_name>.readthedocs.io'
```

## Publishing your R package for this milestone:

For this course, we will only publish your package on GitHub, not CRAN. For this to work, you need to push your package code to GitHub and give users these instructions to download, build and install your package:

```
# install.packages("devtools")
devtools::install_github("ttimbers/convertempr")
```

Next week we will talk about publishing on CRAN.

## Summary

 What did we learn today? Biggest take homes?
 
 - 
 
 - 
 
 - 
 

## Where to next:

- The package indices, PyPI and CRAN
- Peer review of data science software packages
- Working with other teams (specifications, opening issues, how to ask for help, etc) 
- Licenses

### Semantic versioning case study - answers

In Dec 2008, Python bumped versions from 2.6 to 3.0.0. Some changes in the 3.0.0 release included:
- `print` became a function
- dividing two integers with `/` produced a float instead of an integer
- Some well-known APIs no longer returned lists (e.g., `dict.keys`, `dict.values`, `map`)
- and many more (see [here](https://docs.python.org/3.0/whatsnew/3.0.html) if interested)

[*Source*](https://docs.python.org/3.0/whatsnew/3.0.html)

In 2009, Python bumped versions from 3.0.1 to 3.1.0. Some changes in the 3.1.0 release included:
- Addition of an ordered dictionary type
- A pure Python reference implementation of the import statement
- New syntax for nested with statements

[*Source*](https://www.python.org/download/releases/3.1/)

In 2017, Python bumped versions from 3.6.3 to 3.6.4. Some changes in the 3.6.4 release included:

- Fixed several issues in printing tracebacks (`PyTraceBack_Print()`).
- Fix the interactive interpreter looping endlessly when no memory.
- Fixed an assertion failure in Python parser in case of a bad `unicodedata.normalize()`

[*Source*](https://docs.python.org/3.6/whatsnew/changelog.html#python-3-6-4-final)