# Code Versioning, Style and Quality 

We already know that versioning code is critical, even for data scientists who are not writing software. During model development (or data analysis, when we are not building machine learning models), our code goes through many changes and iterations, and these are difficult to track by saving a separate copy of each version. This is where version control comes in. For this class, we will use git, with remote hosting on GitHub (or DagsHub).

We all write our code in our own way, but when collaborating on a team, adhering to a particular coding style, agreed upon by the team, makes it easier for another team member to read and understand our code. It can be difficult to break habits and begin coding in a style we are not accustomed to, but efficiency for the team is far more important than efficiency for ourselves. Luckily, there are tools we can use to help us format our code, as well as check the quality of our code.

## Branching

When collaborating on code, a branching strategy helps keep the codebase clean and avoids breaking things. Branching means that changes we commit to one branch do not affect other branches. For example, we can have one *main* branch that contains our *clean production code* and, to keep it clean, create a second branch (which we can call *develop*) for code development. Every change we make is committed to one of these branches. When we are ready, we can *merge* our develop branch into the main branch.
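One way to picture this is to treat each branch as a name attached to its own commit history. The sketch below is a toy model for building intuition, not how git stores data internally; the `ToyRepo` class and its methods are hypothetical names made up for illustration.

```python
class ToyRepo:
    """Toy stand-in for a repository: each branch name maps to its own commit history."""

    def __init__(self):
        self.branches = {"main": []}
        self.current = "main"

    def commit(self, message):
        # A commit is recorded only on the branch that is currently checked out.
        self.branches[self.current].append(message)

    def checkout(self, name, create=False):
        if create:
            # A new branch starts from the history of the current branch.
            self.branches[name] = list(self.branches[self.current])
        self.current = name

    def merge(self, name):
        # Simplified fast-forward merge: bring over the commits the current branch lacks.
        # Assumes the current branch has no new commits of its own since branching.
        mine = self.branches[self.current]
        theirs = self.branches[name]
        mine.extend(theirs[len(mine):])


repo = ToyRepo()
repo.commit("initial model code")
repo.checkout("develop", create=True)
repo.commit("try a new feature")

print(repo.branches["main"])  # main is untouched by the commit on develop
repo.checkout("main")
repo.merge("develop")
print(repo.branches["main"])  # after merging, main has both commits
```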

Different teams will have different practices, and you should follow those practices, assuming they are sound. For our purposes, working independently does not require branching, but let's play around with it anyway by following the steps in the Lab section below.

Some commands you'll want to be familiar with:

`git branch`: to see all branches and current branch  
`git branch <branch_name>`: to create a new branch  
`git checkout <branch_name>`: to switch to another branch  
`git checkout -b <branch_name>`: to create and switch to a new branch  
`git branch -d <branch_name>`: to delete a branch (git refuses if the branch has unmerged commits; `-D` forces the deletion)  
`git merge <branch_name>`: to merge a branch (branch_name) into the current branch

## Linting 

There are a few linters and formatters we can choose from, and you are not required to use any particular one. For this lab, however, I will be using [pylint](https://pypi.org/project/pylint/) (for linting), [python black](https://black.readthedocs.io/en/stable/) (for formatting), and [isort](https://pycqa.github.io/isort/) (for sorting the import statements in our scripts).

First, we can install pylint, black and isort. We do **not** need to add these to our requirements.txt file, or worry as much about versions, as they would not be required for running the model code.  

`pip install pylint black isort`

When using pylint, black, and isort we will most likely want to add some configuration. These tools can be very opinionated, and we may want to ignore some of their suggestions. To configure all three, we will now create a `pyproject.toml` file, which can hold the settings for several different tools used in a Python project.

`touch pyproject.toml`

For now, we'll leave it empty and run pylint to see what kinds of things it complains about. For the available options, we can look at the documentation on the website or run `pylint --help`.

We can run pylint on single python files:

`pylint <file_name>`

or on files in a directory:

`pylint <dir_name>`

or on every Python file in the current directory and its subdirectories by setting the `--recursive` option to `y`:

`pylint --recursive=y .`
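To get a feel for the output, here is a small, hypothetical script with a few problems that pylint would typically report under its default settings. The message names in the comments are real pylint checks, but the exact output depends on your pylint version and configuration.

```python
# missing-module-docstring (C0114): this file has no docstring at the top.
import os  # unused-import (W0611): imported but never used


def Mean(values):  # invalid-name (C0103) and missing-function-docstring (C0116)
    scratch = 0  # unused-variable (W0612): assigned but never read
    return sum(values) / len(values)


print(Mean([1, 2, 3]))  # prints 2.0
```

The script still runs correctly, which is the point: a linter reports style and maintainability problems that the interpreter does not care about.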

There may be pylint integrations and extensions that you can use with your IDE. For example, in VSCode you can add the pylint extension and then enable linting with pylint in the VSCode settings, and this will allow you to see the complaints you get from pylint directly in your .py files.

Next, we can add pylint's `messages_control` settings to the `pyproject.toml` file, which will look like this:

```
[tool.pylint.messages_control]
disable = [
     "missing-final-newline",
     "missing-function-docstring"
]
```

This will keep pylint from complaining about functions without docstrings and files that do not end in a newline. From here, we can go through pylint's output, change the code where we agree with it, and extend the configuration where we don't, in order to clean things up.
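Besides project-wide settings, pylint also honors inline `# pylint: disable=...` comments, which silence a message in one place instead of everywhere. A minimal sketch, where the `rmse` function is a hypothetical example:

```python
def rmse(errors):  # pylint: disable=missing-function-docstring
    # Root mean squared error of a list of prediction errors.
    squared = [e * e for e in errors]
    return (sum(squared) / len(squared)) ** 0.5


print(rmse([3, 3]))
```

Inline comments suit one-off exceptions, while `pyproject.toml` is the better home for rules the whole team has agreed to ignore.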

## Formatting

Python black and isort will help us format our code. Before we start, let's check for uncommitted changes and commit them, so that if we don't like how black and isort change our code, we can always roll back.

`git status`, then `git add <file_name>` and `git commit -m "<message>"`

Now, let's move on to python black. Python black, as we said before, is very opinionated, but we can configure certain things we'd want it to ignore in the pyproject.toml. First, we should look at the documentation on the website or run `black --help`.

We can get a preview of the changes that black will make by running a diff. This will show us the changes, without actually applying the changes to our files. We can run it on a single file like this:

`black --diff <file_name>`

Or we can run it on multiple files like this:

`black --diff . | less`

As we scroll through the changes that black wants to make, we can decide for ourselves which changes we'd like to ignore, and then check the documentation to see if there's a way to suppress those particular changes in the configuration. For example, we may not want black to change all of our quotes to double-quotes. We can add the skip-string-normalization parameter to the configuration like below.

```
[tool.black]
line-length = 100
target-version = ['py39']
skip-string-normalization = true
```
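To make the effect concrete, the sketch below pairs a hypothetical, loosely formatted function with approximately how black would rewrite it under the configuration above. The function names are made up for illustration, and the single quotes survive only because of the `skip-string-normalization` setting.

```python
# Before formatting: valid Python, but with inconsistent spacing.
def scale_before(values,factor = 2,label = 'scaled'):
    return {label:[v*factor for v in values]}


# After formatting: spaces after commas and around operators, none around default values.
def scale_after(values, factor=2, label='scaled'):
    return {label: [v * factor for v in values]}


# Formatting changes only the layout, so both versions return the same result.
print(scale_before([1, 2]) == scale_after([1, 2]))  # prints True
```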

After adding these settings, we can recheck the changes black will make.

`black --diff <file_name>` or `black --diff . | less`

Once we are ready to apply the changes, after we've added everything to the configuration that we want, we can run:

`black <file_name>` or `black .`

Then we can look at the changes that were made using `git diff`, and if we are unhappy, we can roll back the changes (for example with `git checkout -- <file_name>`) and try again. Now we can re-run pylint, and we should see far fewer formatting complaints now that black has formatted our code for us.

`pylint <file_name>`

From here, let's move on to isort. First, look at the documentation on the website, or run `isort --help`. Luckily, we can run the same kind of diff on isort, so we can preview the changes before they are applied. 

`isort --diff <file_name>` or `isort --diff . | less`

We'll notice that isort likes to put `import ...` statements first, followed by `from package_name import ...` statements, and that it sorts each group alphabetically. If we prefer a different sorting method, we can add a configuration to the pyproject.toml file. For example, if we want to sort by string length instead of alphabetically, we can make the following change:

```
[tool.isort]
length_sort = true
```
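For comparison, here is a small import block arranged the way isort's default settings would order it. This is a sketch using only standard-library modules, chosen arbitrarily; the comment about `length_sort` describes the expected effect rather than verified isort output.

```python
# Default isort style: plain "import x" lines first, then "from x import y" lines,
# each group in alphabetical order.
import json
import os
import sys
from collections import Counter
from pathlib import Path

# With length_sort = true, shorter module names would be expected to come first
# (os, then sys, then json), though isort's exact tie-breaking may differ.

# Use every import once so that a linter does not flag any of them as unused.
word_counts = Counter(["a", "b", "a"])
summary = {
    "cwd_name": Path(os.getcwd()).name,
    "python_major": sys.version_info.major,
    "counts": word_counts,
}
print(json.dumps(summary))
```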

Again, once we are happy with the changes, we can apply them.

`isort <file_name>` or `isort .`

Then we can review the result with `git diff`.

# Branching and Code Quality Lab

## Overview

In this lab you will practice with git branching, code linting and formatting.

## Goal

The goal of this lab is to become familiar with the use of branches in the model development process and with the importance of code styling. Although we will use pylint, isort, and python black in this lab, we are **not** trying to learn all we can about these tools. When it comes to linting and formatting, there are other tools out there that you might like better.

## Instructions

### Branching

When you are working independently on a project, code branching is overkill. Still, it is worth practicing to see how it works and how your files are affected.

1. Start out clean - commit and push all uncommitted changes.  
2. Create a new text file called 'branching.txt'. You will delete it later.  
3. Add a line of text to it.  
4. Commit changes.  
5. Now, create and checkout a new branch called b1. `git checkout -b b1`  
6. Add a second line of text to your branching.txt file.  
7. Commit changes.  
8. Switch back to the main branch (`git checkout main`) and look at your text file. You should notice that the second line of text is no longer there.  
9. Switch back to the b1 branch (`git checkout b1`), make more changes and commit them.  
10. Feel free to push the main and b1 branches to GitHub (`git push origin main` and `git push origin b1`); you will then see more than one branch on GitHub. You can open a pull request and merge, but it's not necessary.  
11. When you are all done, switch back to main, delete branching.txt (and commit the deletion), and delete branch b1 (`git branch --delete b1`). If b1 was never merged, git will refuse and suggest `git branch -D b1`, which forces the deletion. Note that this only deletes the *local* branch. You will also need to delete the *remote* branch b1, either on GitHub or with `git push origin --delete b1`.
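If you would like to see steps 2 through 8 play out without touching your own repository, the sketch below replays them in a temporary directory. It assumes `git` is installed and on your PATH (it returns `None` otherwise); the helper names, commit messages, and placeholder user identity are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so commits work even if git has no global user configured.
GIT_OPTIONS = [
    "-c", "user.name=Lab User",
    "-c", "user.email=lab@example.com",
    "-c", "commit.gpgsign=false",
]


def git(repo, *args):
    """Run one git command inside repo and raise an error if it fails."""
    subprocess.run(["git", *GIT_OPTIONS, *args], cwd=repo, check=True, capture_output=True)


def run_branching_demo():
    """Replay lab steps 2-8 and return the file text as seen on main and on b1."""
    if shutil.which("git") is None:
        return None  # git is not installed, so there is nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        notes = repo / "branching.txt"
        git(repo, "init", "--quiet")
        # Name the initial branch main regardless of the local git default.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/main")
        notes.write_text("first line\n")
        git(repo, "add", "branching.txt")
        git(repo, "commit", "--quiet", "-m", "Add branching.txt")
        git(repo, "checkout", "--quiet", "-b", "b1")
        notes.write_text("first line\nsecond line\n")
        git(repo, "commit", "--quiet", "-am", "Add second line on b1")
        on_b1 = notes.read_text()
        git(repo, "checkout", "--quiet", "main")
        on_main = notes.read_text()
        return on_main, on_b1


print(run_branching_demo())
```

When git is available, the first element of the result holds only the first line, because the second line was committed on b1 and never merged into main.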

### Linting and Formatting

1. Run the linter of your choice and begin making changes to your code.  
2. Add to your `pyproject.toml` file to have the linter ignore what you want it to ignore.  
3. Run python black with the `--diff` option, and then add any configurations you want before applying those changes.  
4. Run isort with the `--diff` option, and then add any configurations you want before applying those changes.  
5. Commit changes to git and push to GitHub.  

### Documentation

Now is a good time to go over each of your .py scripts and add docstrings and other code comments that will make your code more readable. 
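As a starting point, a docstring can be as short as one summary line plus a description of the arguments and the return value. The module and function below are hypothetical examples, and the layout follows one common convention (Google style) rather than anything this course requires.

```python
"""Utilities for splitting a dataset (hypothetical example module)."""


def train_test_split_indices(n_rows, test_fraction=0.2):
    """Split row indices into train and test sets, keeping the original order.

    Args:
        n_rows: Total number of rows in the dataset.
        test_fraction: Share of rows, between 0 and 1, held out for testing.

    Returns:
        A tuple (train_indices, test_indices) of lists of integers.
    """
    # The last rows form the test set, which suits time-ordered data.
    n_test = int(n_rows * test_fraction)
    indices = list(range(n_rows))
    return indices[: n_rows - n_test], indices[n_rows - n_test :]


train, test = train_test_split_indices(10)
print(len(train), len(test))  # prints 8 2
```

The inline comment explains why the code does something, while the docstring explains how to use the function; both are worth adding where they help a reader.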

### Final Project

We will return to branching, linting and styling later when we cover other topics in CI/CD/CT. For the final project, your team should decide on a coding style and a branching strategy. This part is easy and **does not** require you to compare any tools. Just study the branching strategies and different coding styles, make a choice, and defend it.