In [15]:
from dsc.notebook import embed_website


# Good Git commit messages

- Recall that using Git alone does not result in good version control...
<div align="left">
<img src="./figures/vc_intro_commitstrip.png" alt="VC" width=700/>
</div>

- You don't need to care about the content of your commits and their messages if you 
    - Never look back.
    - Don't use ```git log```.
    - Don't collaborate.
- However, if you are interested in why something happened and want to maintain a project over a long period of time, you should definitely try to make [atomic commits](https://www.freshconsulting.com/insights/blog/atomic-commits/), use rebase to tidy up your commit history, and try to write good commit messages.  
- Writing good commit messages is hard and often neglected. 
- [How to Write a Git Commit Message](https://cbea.ms/git-commit/): "A well-crafted Git commit message is the best way to communicate context about a change to fellow developers (and indeed to their future selves). A diff will tell you what changed, but only the commit message can properly tell you why."
- [Peter Hutterer](http://who-t.blogspot.com/2009/12/on-commit-messages.html): "A commit message shows whether a developer is a good collaborator".
- Writing descriptive commit messages is useful for code reviews and future development.

- Similar to source code, there are some [style guidelines](https://cbea.ms/git-commit/) that you should consider when writing a commit message.
- Separate the subject from the (optional) body with a blank line
- The subject line should be
    - Capitalized
    - Limited to 50 characters (72 is the hard limit)
    - Not ended with a period
    - Written in the imperative (which is Git's convention, e.g., ```Merge branch x into y```)
- Wrap the body at 72 characters
- If required, use the body to explain
    - What is done in the commit.
    - Why it is done.
    - Why you chose this way and no other.  
- Do not explain how the update was done. This is documented by the commit itself.
- If possible, refer to a pull/merge request, issue, or comment that explains the commit.

- An example:
    
```
Add load and store functions to the data module

If we modularize the notebook into scripts, we need functions that
store and load processed data.
The pyscaffold_test.data module has been extended by these functions
so that
- scripts/fit.py stores the data that it processes.
- scripts/predict.py loads the processed data.

Resolves: #4
```
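
Most of the formal rules above can be checked mechanically. The following is a minimal sketch of such a check; the function name `check_commit_message` and the wording of the reported problems are made up for illustration, and the imperative mood is not checked because that needs human judgment.

```python
def check_commit_message(message: str) -> list:
    """Return a list of style problems; an empty list means none were found."""
    problems = []
    lines = message.strip("\n").split("\n")
    subject, body = lines[0], lines[1:]

    if not subject[:1].isupper():
        problems.append("subject should be capitalized")
    if len(subject) > 72:
        problems.append("subject exceeds the hard limit of 72 characters")
    elif len(subject) > 50:
        problems.append("subject is longer than 50 characters")
    if subject.endswith("."):
        problems.append("subject should not end with a period")
    # The first line after the subject must be blank if there is a body.
    if body and body[0].strip():
        problems.append("separate subject and body with a blank line")
    # Body lines are numbered from 2 because line 1 is the subject.
    for number, line in enumerate(body, start=2):
        if len(line) > 72:
            problems.append(f"line {number} is longer than 72 characters")
    return problems


print(check_commit_message("Add load and store functions\n\nResolves: #4"))
print(check_commit_message("fixed stuff.\nno blank line before this text"))
```

A check like this could be run by hand before pushing; it only covers the formal rules and says nothing about whether the message explains the what and why.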

- When writing commit messages with a body, you should use ```git commit``` to open the text editor and not ```git commit -m```.
- The subject line of a commit is displayed in GitHub/GitLab and when you use ```git log --oneline```.
- There are also some [VS Code extensions](https://zhauniarovich.com/post/2020/2020-03-using-vscode-as-git-editor/) that help to format a commit message.
- So that each commit can be attributed to its author, it is important that you configure Git in your project repo to use
    - Your @hm.edu email as ```user.email```.
    - The first letter of your first name followed by your last name as ```user.name```, e.g., fspanhel.
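
As a rough sketch, the naming rule and the matching `git config` calls could look as follows. The helper names, the person "Ada Lovelace", and the address `lovelace@hm.edu` are hypothetical; `git config user.name` and `git config user.email` without `--global` write to the configuration of the current repo only.

```python
import subprocess


def git_identity_commands(first_name: str, last_name: str, email: str) -> list:
    """Build the `git config` calls that set the author identity for one repo."""
    # First letter of the first name followed by the last name, in lower case.
    user_name = (first_name[0] + last_name).lower()
    return [
        ["git", "config", "user.name", user_name],
        ["git", "config", "user.email", email],
    ]


def apply_identity(repo_dir: str, commands: list) -> None:
    """Run the commands inside repo_dir (requires Git and an existing repo)."""
    for command in commands:
        subprocess.run(command, cwd=repo_dir, check=True)


# Hypothetical student; only prints the commands instead of running them.
for command in git_identity_commands("Ada", "Lovelace", "lovelace@hm.edu"):
    print(" ".join(command))
```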

# Working on a collaborative project with a Git management application 

## Motivation
- A Git management application is a hosting service for software development and Git version control.
- Popular examples are 
    - [GitHub](https://github.com): The largest source code host. Best for public repos. Owned by Microsoft.
    - [GitLab](https://gitlab.com/gitlab-org/gitlab): Available as a hosted service (gitlab.com) or self-hosted. Best for private repos.
    - [BitBucket](https://bitbucket.org/product): From Atlassian with Jira integration. Best for private repos.
- A Git management application hosts remote Git repos that you can use to perform pull and push operations.
- Moreover, it simplifies collaboration with Git and improves code development by providing
    - Access control.
    - Issues to describe and discuss code changes and their tracking.
    - Merge/Pull requests to discuss, review and approve code changes.
    - Wikis for documentation.
    - An easy way to fork repos.
    - CI/CD.
- We will discuss issues and merge/pull requests later in detail.

## Workflow overview
- Each fundamental change to the code should start with an **issue** that describes the task or feature.
- Such an **issue** can be created in the browser user interface of the Git management application.
- An issue is assigned to someone to work on it. 
- Every code change related to this issue is done in a specific **feature branch**.
- A **merge/pull request** is submitted by the developer when they want to discuss their work on the feature or have finished it.
- A merge/pull request is a browser user interface provided by the Git management application to discuss, review, and integrate code changes.
- When the code changes on the feature branch are completed, a reviewer is assigned.
- If the reviewer agrees that the code is finished, the browser user interface can be used to merge the feature branch into the target branch.
- Click [here](https://docs.gitlab.com/ee/topics/gitlab_flow.html) for a more detailed overview of how to use GitLab for collaboration.

## Playground
- In the following, you can use the [playground](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground) to get to know GitLab and how it can be used for your project.

## [Issues](https://docs.gitlab.com/ee/user/project/issues/)
- Each fundamental change to the code should start with an issue that describes the task or feature.
- By creating an issue you 
    - Inform your team about what should be discussed or done.
    - Keep feature branches from getting too big because you have to think about the scope when writing the issue.
    - Assign responsibilities. 
    - Improve your workflow by
        - Making use of the issue board to easily see what should be done and what has been done. 
        - Referring to it in commit messages or merge/pull requests. 
- Every issue must be addressed eventually. You can also close an issue without modifying your code. In this case, the issue should explain why no change was made.

### Creating an issue
- To create an issue, click on `Issues` in the left sidebar of your GitLab project and use the resulting page to create a new issue.
- Click [here](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground/-/issues/1) to see an issue of the [playground](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground).
- The issue title should describe the outcome of the issue. For instance, prefer "Cross validation should consider more than one fold" to "Cross validation considers only one fold".
- Create a [label](https://docs.gitlab.com/ee/user/project/labels.html) for this issue, e.g., to create a corresponding column for the issue board.
- When it is clear who should work on the issue, assign the issue to this person.
- Create a branch for the issue from the main branch to work on it. 
- For each issue there should be only one branch. But there can be one branch for several issues.
- You can also use tasks to break the issue down into smaller parts and create a pull/merge request.

## [Issue board](https://docs.gitlab.com/ee/user/project/issue_board.html)
- With the issue board you can plan, organize, and visualize your workflow and manage your project.
- It shows the issues your team is working on, the status of each issue and who the issue is assigned to.
- You can use it as a [Scrum](https://en.wikipedia.org/wiki/Scrum_(software_development)) or [Kanban](https://en.wikipedia.org/wiki/Kanban_(development)) board. 
- For Data Science projects I recommend using it as a Kanban board.
- You can create a new [list](https://docs.gitlab.com/ee/user/project/issue_board.html#issue-board-terminology) (a column on the issue board that displays issues matching certain attributes) by referring to a [label](https://docs.gitlab.com/ee/user/project/labels.html).
- You can also use multiple issue boards (e.g., one for data processing, one for modeling...).
- Click [here](https://www.youtube.com/watch?v=vjccjHI7aGI&feature=youtu.be) to watch a video presentation of the issue board.

In [16]:
embed_website("https://docs.gitlab.com/ee/user/project/issue_board.html")

## [Pull/merge requests](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
- A pull request (PR) and a merge request (MR) are the same concept. 
    - GitHub and Bitbucket call this process a pull request because the first manual action is to pull the feature branch.
    - GitLab calls this process a merge request because the final action is to merge the feature branch.
- Open a MR when you have **completed** your work on the feature branch or would like to **discuss** the work in progress.
- A MR is a browser user interface to **discuss** and **review** code changes and to **merge** the feature branch into another branch.
- For instance, if you would like feedback or need help, you can create a MR, comment on your code, and notify the relevant people so that they see your comments right next to the corresponding commit. 
- In this case, you don't assign the MR to anyone and start the title of the MR with "Draft: " to prevent an unintended merge.
- Note that your feature branch is public when there is a corresponding MR.
- If the code is ready to be merged, assign the MR/PR to a reviewer who can then merge the changes.
- However, the reviewer might ask you to add further changes to the feature branch before accepting the MR. 

### Creating a MR
- You can create a MR directly from an [issue](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground/-/issues/1).
- To see the resulting MR click [here](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground/-/merge_requests/1).
- When you create a MR, include a summary of the changes and the problem they solve. Refer to the corresponding issues and also refer to the MR in these issues. 
- If you use [appropriate keywords](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically), the referenced issues will automatically be closed when the MR is merged. 
- If you want a review for this MR, select a reviewer in the right sidebar who can approve the MR.
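
To get a feeling for how closing keywords work, here is a deliberately simplified sketch that finds issue references such as `Resolves: #4` in a MR description. The pattern is my own approximation covering only the close/fix/resolve word families followed by a single `#<number>`; it is not the pattern GitLab actually uses, which supports more keywords and reference styles (see the linked documentation).

```python
import re

# Simplified approximation of closing keywords; NOT GitLab's actual pattern.
CLOSING_PATTERN = re.compile(
    r"\b(?:clos(?:e[sd]?|ing)|fix(?:e[sd]|ing)?|resolv(?:e[sd]?|ing)):?\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(description: str) -> list:
    """Return the issue numbers that follow a closing keyword."""
    return [int(number) for number in CLOSING_PATTERN.findall(description)]


print(issues_closed_by("Add load and store functions.\n\nResolves: #4"))
print(issues_closed_by("Related to #3, see the discussion there."))
```

Note that a plain mention such as "Related to #3" only creates a link to the issue and does not close it.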

An example pull request from the [GitHub docs](https://docs.github.com/assets/cb-155985/images/help/pull_requests/pull-request-body.png):
<div align="left">
<img src="https://docs.github.com/assets/cb-155985/images/help/pull_requests/pull-request-body.png" alt="PR example" width=1000/>
</div>

### Discussing and reviewing
- You can use the body of the MR for a general discussion. 
- If you want to point something out to the reviewers, you can add comments directly to specific lines of changed files. 

<div align="left">
<img src="https://docs.github.com/assets/cb-37772/images/help/pull_requests/pull-request-comment.png" alt="Comment example" width=1000/>
</div>

- Conversely, reviewers can comment on the whole MR or add comments to specific lines to ask questions or provide suggestions.
- When commenting, you can choose to start a review, which bundles several comments. 
- Resolve the comment when the discussion about it is finished.
- Note: To check whether the code of the MR runs, you can fetch and then check out the corresponding remote feature branch, e.g., ```git fetch --all && git checkout origin/feature_branch```

### Updating
- If new commits are added to the branch of the MR, the MR is updated accordingly. 
- This is also necessary if improvements are required before the MR can be approved.
- Besides adding a commit directly with Git, you can also use comments in the browser user interface to make direct [suggestions](https://docs.gitlab.com/ee/user/project/merge_requests/reviews/suggestions.html).
    - This is especially useful for small commits, e.g., typos.
- When you resolve a comment with a commit, you should refer to the comment link in the commit message (so that we can see which commits resolve comments in the MR).
- Conversely, when you resolve a comment with a commit, you should mention the corresponding commit hash in the comment thread, e.g., "Resolved by [commit hash]".

### Approving
- The assigned reviewers can approve the MR.
- Depending on the settings, approval might be required so that the merge of the MR can be performed.

### Merging
- After each comment has been resolved or approval has been given, the code of the feature branch can be integrated into the target branch. 
- Depending on the settings, GitLab uses different [strategies](https://gitlab.lrz.de/help/user/project/merge_requests/methods/index.md) to merge the branches. 
- By default, a merge commit is always created, even if a fast-forward merge is possible. 
- It is also possible to [squash merges](https://gitlab.lrz.de/help/user/project/merge_requests/squash_and_merge.md) and delete the feature branch automatically. 
- In general, you should delete the feature branch when it is no longer needed. 

# Code reviews
- [Google's best practices for good reviews](https://github.com/google/eng-practices/blob/master/review/index.md)
    - Design: Is the code well-designed and appropriate for your system?
    - Functionality: Does the code behave as the author likely intended? Is the way the code behaves good for its users?
    - Complexity: Could the code be made simpler? Would another developer be able to easily understand and use this code when they come across it in the future?
    - Tests: Does the code have correct and well-designed automated tests?
    - Naming: Did the developer choose clear names for variables, classes, methods, etc.?
    - Comments: Are the comments clear and useful?
    - Style: Does the code follow our style guides?
    - Documentation: Did the developer also update relevant documentation?

# Git workflows


- Git does not dictate how to interact with it but offers a high degree of freedom in how it can be used.
- If your team has no convention for how to work with Git, working with Git may be more cumbersome than it should be.
- A Git workflow is a guideline for how to use Git consistently and efficiently.
- In particular, it often provides guidance for managing, creating, and combining branches.
- There is no workflow that works best for all teams and projects.
- As a result, many workflows have been proposed.

## [Archetypical flows](https://www.atlassian.com/git/tutorials/comparing-workflows)
- [Centralized workflow](https://www.atlassian.com/git/tutorials/comparing-workflows#centralized-workflow)
    - One central remote repo.
    - There are no feature branches and everybody pushes to the main branch.
- [Feature branch workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow): (read this at least once!)
    - Uses one central remote repo and multiple branches.
    - The main branch is the official project history into which all relevant changes eventually get merged and which should never contain broken code. 
<!-- This is the project repository at GitHub, GitLab, ... -->
    - The development of each feature should be encapsulated in a corresponding feature branch that is branched off from the main branch.
    - Feature branches are developed locally and pushed to the corresponding feature branch of the remote repo.
    - A pull/merge request is then submitted to integrate the feature branch into the main branch.
- [Forking workflow](https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow)
    - Each developer has their own remote repo. 
    - Often used for open-source projects. 
    - A developer forks the official remote repo of the project maintainer, i.e., the developer (or a Git hosting service on their behalf) creates their own copy of the repo on a remote and pushes changes to this private remote repo. 
    - Pull/merge requests are used to integrate changes from a forked repo into the official remote repo.
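
The local part of the feature branch workflow described above can be sketched as a short sequence of Git calls. This is only an illustration under some assumptions: the remote is called `origin`, the main branch is called `main`, and the function name, branch name, and commit subject are made up. The commands are only printed, not executed, and the pull/merge request itself is opened afterwards in the browser.

```python
import shlex


def feature_branch_commands(branch: str, subject: str) -> list:
    """Plan the Git calls for one feature; nothing is executed here."""
    return [
        ["git", "checkout", "main"],        # start from the main branch
        ["git", "pull"],                    # bring it up to date
        ["git", "checkout", "-b", branch],  # create the feature branch
        # ... edit files here ...
        ["git", "add", "--all"],            # stage the changes
        ["git", "commit", "-m", subject],   # subject-only commit
        ["git", "push", "--set-upstream", "origin", branch],
    ]


for command in feature_branch_commands("alovelace_#4", "Add load and store functions"):
    print(shlex.join(command))
```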

## [Concrete flows](https://www.gitkraken.com/learn/git/best-practices/git-branch-strategy)
- [Git flow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow): Sophisticated feature branch workflow. 
    - The first popular workflow. 
    - Assigns specific roles to different branches and how and when they should interact. 
    - May result in merge hell. 
    - Click [here](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) and [here](https://www.youtube.com/watch?v=_w6TwnLCFwA) for more information.
- [GitHub flow](https://docs.github.com/en/get-started/quickstart/github-flow): A lightweight feature branch workflow. Contributors without write access fork the repo before working on it. 
- [Atlassian flow](https://www.atlassian.com/blog/git/simple-git-workflow-is-simple): Like GitHub flow but rebases feature branches before merging so that the actual merge commit is just a marker for the feature branch and does not include any changed files.
- [GitLab flow](https://docs.gitlab.com/ee/topics/gitlab_flow.html#mergepull-requests-with-gitlab-flow): Click [here](https://www.youtube.com/watch?v=InKNIvky2KE&feature=youtu.be) for a video.
- [Trunk-based flow](https://www.atlassian.com/continuous-delivery/continuous-integration/trunk-based-development): Feature branch workflow with very short-lived branches that are integrated into the main branch potentially several times a day. 
    - Common practice among DevOps teams who use [CI/CD](https://www.atlassian.com/continuous-delivery).
    - Click [here](https://cloud.google.com/architecture/devops/devops-tech-trunk-based-development) for more information.
- Click [here](https://docs.gitlab.com/ee/topics/gitlab_flow.html) and [here](https://www.gitkraken.com/learn/git/best-practices/git-branch-strategy) for a comparison of some flows.


## Git flow
- Uses two main branches to record the history of the project
    - The branch `main` is the official release history and contains the production-ready code.
    - The `dev` branch contains additional development changes for the next release.
- The other branches have a limited lifetime, since they are eventually merged into `main` and/or `dev` and then removed. They are categorized as
    - Feature branches: Branch off from `dev` and are integrated back into `dev`.
    - Release branches: Branch off from `dev` and are integrated back into `main`.
    - Hotfix branches: Branch off from `main` and are integrated back into `main` and `dev`.
- Note:
    - If `dev` is considered to be the main branch, then the feature branch workflow is used for `dev` and the feature branches.
    - If something is merged into `main`, it must also be merged into `dev`. 
- See [here](https://nvie.com/posts/a-successful-git-branching-model) and [here](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) for more details.
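
The branch rules above fit into a small lookup table. The following sketch encodes the table as written and applies the note that everything merged into `main` must also reach `dev`; the names `BranchRule`, `GIT_FLOW_RULES`, and `merge_targets` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchRule:
    source: str     # branch it is created from
    targets: tuple  # branches it is integrated back into


# The three temporary branch types as listed above.
GIT_FLOW_RULES = {
    "feature": BranchRule(source="dev", targets=("dev",)),
    "release": BranchRule(source="dev", targets=("main",)),
    "hotfix": BranchRule(source="main", targets=("main", "dev")),
}


def merge_targets(kind: str) -> tuple:
    """Return all branches that must receive the changes of this branch type."""
    targets = list(GIT_FLOW_RULES[kind].targets)
    # Note from above: whatever is merged into main must also be merged into dev.
    if "main" in targets and "dev" not in targets:
        targets.append("dev")
    return tuple(targets)


for kind in GIT_FLOW_RULES:
    print(kind, GIT_FLOW_RULES[kind].source, merge_targets(kind))
```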

Source: https://nvie.com/posts/a-successful-git-branching-model
<div>
<img src="./figures/gitflow.png" alt="Git flow" width=700/>
</div>

# Your data science challenge project

## The Git workflow

- For data science projects that focus on analyses and experimentation, I think that a simple feature branch workflow is adequate.
- However, for this course I suggest using a simplified Git flow for the project work, which is basically a feature branch workflow with two branches, `main` and `submission`, that record the project's history.  
<!--  -->
    1. In general, you create a feature branch from the (up-to-date) branch `main`.
    1. When a feature branch is completed, create two merge requests, each with the target branch being
        1. `main`. 
        2. `submission`.
    1. You close the merge request by performing the merge into `main` without deleting the feature branch (!).
    1. After I have merged the feature branch into `submission`, you merge `submission` back into `main`.
<br>
<!--  -->
<br>  
- Using this Git flow I hope that
    1. It is ensured that I regularly check your code and give you feedback when I review the MR for the submission branch.
    2. Further development of your code is possible even if I have not yet merged the MR for the submission branch. 

## Branches and permissions
- I am the maintainer of the project repo and you have a developer role.
- Each project repo will start with the branches `main` and `submission` and the following permissions
<br>
<div>
<img src="./figures/permission.png" alt="Permissions" width=700/>
</div>
<br>

- That is, you can only integrate changes on `main` and `submission` by creating a MR. 
    - You can perform the merge into `main`.
    - Only I can perform the merge into `submission`.

## Merge method
- The project uses the [default merge commit](https://gitlab.lrz.de/help/user/project/merge_requests/methods/index.md#merge-commit) to perform the merge of a MR.
- Moreover, commits are automatically [squashed](https://gitlab.lrz.de/help/user/project/merge_requests/methods/index.md) when merging is done.
    - Note that an explicit merge commit is always done.
    - This is contrary to ```git merge --squash``` which does not add a merge commit.


## Recommendations

### General 
- Make use of all the tools that we have discussed in this course.
- Start with a very simple model and focus on getting results.
- You can always iterate and improve later but it is important that the basic structure is in place.
- Ask questions and discuss.

### Branches
- First, create an issue for each significant feature. Then create the branch.
- You don't have to create an issue if you are just fixing a bug and this is evident from the MR.
- Ideally, one person should work on one branch.
- Prefer short-lived branches that focus on one feature over developing several features on one feature branch.
    - Reduces the likelihood of merge conflicts.
    - I can give feedback more frequently. 
    - Always create at least one issue for a merge request and refer to this issue in the merge commit message. 
- Rebase private feature branches to tidy the commit history so that each commit contains an isolated and complete change.
- Never ever rebase `main` or other public branches.
- Use descriptive names for feature branches, like ```<author>__<branch-name>``` or ```<author>_#<issue-number>```.
    - Click [here](https://stackoverflow.com/a/11886179) if you want to describe branches locally. However, referring to issues is a better way.
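
The two naming patterns can be generated consistently with a small helper. This is just a sketch; the function name `feature_branch_name` and the way the topic is turned into a lower-case, hyphen-separated name are my own choices and not a course requirement.

```python
import re


def feature_branch_name(author: str, issue_number: int = None, topic: str = None) -> str:
    """Build '<author>_#<issue-number>' or '<author>__<branch-name>'."""
    if issue_number is not None:
        return f"{author}_#{issue_number}"
    if topic:
        # Lower-case the topic and replace runs of other characters by hyphens.
        slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
        return f"{author}__{slug}"
    raise ValueError("provide an issue number or a topic")


print(feature_branch_name("fspanhel", issue_number=4))
print(feature_branch_name("fspanhel", topic="Add load/store functions"))
```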

### Merge requests
<!-- Use merge requests for significant changes instead of pushing directly to `main` so that code can be discussed during development and a code review is possible when the branch is finished.  -->
- Use merge requests for discussion and code reviews.
- Do not delete the feature branch before it is merged into `submission` (!)
    - I will delete the feature branch after it is merged into `submission`.
    - If you have deleted the feature branch you can checkout a new branch from the commit before the corresponding merge commit on `main` and use this branch as the source for the MR into `submission`.
- Let each merge request be approved by at least one person who was not involved in the development of the feature branch (the author of the MR cannot approve but people who have added commits can).
- You can also refer to me if you want feedback during a merge request for `main`.
- You can leave the MR description empty if it refers to an issue that explains everything.
- Don't forget to create a merge request to integrate the changes into `submission` (!)

- [This merge request](https://gitlab.lrz.de/fspanhel/dsc_gitlab_playground/-/merge_requests/1) illustrates what a merge request with a referenced issue should look like.  

<div>
<img src="./figures/mr_form.png" alt="MR" width=900/>
</div>

- If you use pull or merge requests in your code review process, don't use git rebase after creating the pull request.
- As soon as you make the pull or merge request, other developers will be investigating your commits, which means that it’s a public branch. 
- Rewriting its history will make it impossible for Git and your teammates to track any follow-up commits.
- Before you open the pull or merge request, you could rebase the feature branch onto the main branch. However, the project settings already ensure that commits are squashed automatically and that the merge is always a [three-way merge with an explicit merge commit](https://gitlab.lrz.de/help/user/project/merge_requests/methods/index.md#merge-commit).