<img src="resources/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Collaborative Coding with Git/GitHub Demo SWDB 2022</h1>
<h3 align="center"></h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h2>Introduction to Git</h2>
<br>
<p> Git is an open-source version control tool. It allows a developer to record a history of all changes to all files in a project, and it is the de facto industry standard for collaborative software development. Git enables us to do several important things:

<ul>
<li> Compare the differences between versions of the code
<li> Temporarily run other versions of the code, or revert back to an earlier version if we encounter problems
<li> Publish our code on websites like GitHub or Bitbucket
<li> Collaborate with others by automatically merging their changes with ours
</ul>

Managing code changes with a team of people all developing in parallel is really difficult! Version control systems like git make this easier, but still come with a steep learning curve. We want you to learn some basic concepts today, but becoming comfortable with git will just take practice.
</div>
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Tools for working with git</h3>
<br>
There are three major tools we will introduce for interacting with git:

<ol>
    <li><b>git</b> itself. This is the software that runs under the hood to keep track of our code changes. We will rarely interact with git <i>directly</i> during this session, but it may be required later.
    <li><b>GitHub.com</b> is a commercial service + web application (free for open source projects) that allows us to store and publish code on the web. It implements some very nice tools for online collaboration that will be introduced later in the course.
    <li><b>GitHub Desktop</b> is an application that enables you to interact with GitHub using a GUI instead of the command line or a web browser.
</ol>

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Getting started with GitHub Desktop</h3>

<div style='color: #000; background-color: #CCD; font-family: monospace; padding: 15px; margin: 10px; margin-left: 30px; border-radius: 3px'>
<ol>
<li>First: fire up GitHub Desktop on your laptop!

<li>In order to begin tracking code changes, we need to <i>initialize</i> a git repository in the package folder.

<ul>
    <li>From "File", select "New repository..."
    <ul>
        <li><b>Name:</b> "my_repo"
        <li><b>Description:</b> "A Test Repository"
        <li><b>Local Path:</b> Select your "GitHub" folder
        <li><b>Initialize this repository with a README:</b> Select this option
        <li><b>Git ignore:</b> "Python"
        <li><b>License:</b> "MIT License"
    </ul>
    <li>Click "Create repository"
</ul>
    
<h3>What has changed?</h3>

<ul>
    <li>A subfolder "my_repo/.git" has been created by <b>git</b>, which is where it will store all code version information
    <li>A ".gitattributes" file was created with some basic GitHub Desktop configs
    <li>A boilerplate README.md file was created
</ul>

<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
</div>


<h2>Definitions:</h2>

<p><b>Repository (n):</b> A folder that contains all of the files associated with a project and the history of changes made to each file. A git repository contains a `.git` subfolder that stores all data about the history of commits and the configuration of the repository.

<p><b>Commit (n):</b> A snapshot of the state of all files in your repository at one point in time. A commit includes some metadata:
<ul>
    <li>Author
    <li>Creation date
    <li>A short description (written by the developer)
    <li>A unique ID (also called a "hash")
    <li>Parent ID (which commit came before this one)
</ul>

<p><b>Commit (v):</b> To create a new commit.
    
<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
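To make the definition of a commit concrete, here is a minimal Python sketch of a commit as a snapshot plus metadata. The class name <code>ToyCommit</code>, the author name, and the dates are hypothetical, and the ID is computed over a JSON string purely for illustration; git hashes its own internal object format, not this one.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyCommit:
    """A toy stand-in for a commit: a snapshot of files plus metadata."""
    author: str
    date: str                 # creation date, kept as a plain string here
    message: str              # short description written by the developer
    files: dict               # file name -> file contents (the snapshot)
    parent_id: Optional[str]  # ID of the commit that came before, if any

    @property
    def commit_id(self) -> str:
        # Hash everything the commit contains, so any change to the snapshot
        # or to the metadata yields a different ID.
        payload = json.dumps(
            [self.author, self.date, self.message, self.files, self.parent_id],
            sort_keys=True,
        )
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two hypothetical commits: the second records the first as its parent.
first = ToyCommit("Ada", "2022-08-20", "Initial commit",
                  {"README.md": "# my_repo\n"}, None)
second = ToyCommit("Ada", "2022-08-21", "Add Hello World",
                   {"README.md": "# my_repo\n## Hello World!\n"},
                   first.commit_id)

print(second.parent_id == first.commit_id)
```

Because the parent ID is part of what gets hashed, each ID in this sketch depends on the whole chain of commits before it, which is the idea behind using a hash as the unique ID.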

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Our First Commit</h3>

<p> Open the newly created README.md file in your favorite text editor. At the end of the file add a Hello World message and save, so your file looks like:
<pre><code>
# my_repo
 A Test Repo

\#\# Hello World!

</code></pre>

<p> In GitHub Desktop, make sure the Current repository is set to my_repo. We can see that our README.md file has changed since our previous commit. We can commit this change to main by clicking the Commit to main button, and we can follow the history of our changes in the History tab. It's important to note that we can revert back to any commit whenever we want! It's like a Super Control-Z.
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Creating Branches</h3>

<p> Great, we have a way to track changes in a project. How does that help with collaboration? Luckily, git has another feature, branches, that allows you to work and commit changes without affecting the main source of truth. In GitHub Desktop, open the drop-down menu in the Current branch tab and select "New branch." Call the new branch `feature-branch`.

<p> Let's add a new feature to this branch. Create a new python file called `my_module.py` with the contents:
<pre><code>
def hello_world():
    print("Hello World!")
</code></pre>
In GitHub Desktop, we see that one file has changed, together with its changed contents. Let's commit these changes to the feature branch as we did before. Cool, so what? Haven't we already done this? The cool thing about branches is that you can check out the main branch without altering anything in your feature-branch. In the Current branch tab, go ahead and open up main again. If you look in your my_repo directory, the my_module.py file is gone! But don't worry, we committed the changes in the feature-branch, so we can pull it up whenever we want.

<p> Let's do this exercise one more time to show how git helps with collaboration. Make sure your Current branch is set to main. Now create a new branch called feature2-branch. In this branch, create a file called goodbye.py with the following contents:
<pre><code>
def goodbye():
    print("Goodbye!")
</code></pre>

<p> In GitHub Desktop, go ahead and commit the changes to the feature2-branch. Now we're ready to merge everything together...
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Merging Branches</h3>

<p> We have two pieces of code, but right now, they exist in separate branches. Go ahead and check out the main branch again. Once the Current branch tab is set to main, from the very top menu, select Squash and merge into current branch. Select feature-branch, and hit the Squash and merge button. Perfect! Now we have the my_module.py file in main. Repeat the process with feature2-branch. Now we have the goodbye.py file in main. Branching and merging allows different people to work on modifying a source of truth and to merge their changes once they're finished. You should also delete your feature branches once they've been merged. You can open the Branch tab from the top menu, and then right click on the feature branches you want to delete.
</div>

<h2>Definitions:</h2>

<p><b>Branch (n)</b> 
<ol>
<li>A chain of related commits. This is <b>not</b> the definition understood by git.
<li>A named pointer to a specific commit. Unlike tags, branches have a special property: when a branch is checked out, creating a new commit will cause the branch to point to the newly created commit. In this way, branches usually point to the most recent commit in any chain of commits.
</ol>

<p><b>Branch (v)</b> To create a new branch.
    
<p><b>Merge (v)</b> To create a new commit that combines all changes from two branches.
    
<p><b>Checkout (v)</b> To <i>replace</i> the contents of the files that are currently present in the repository with the contents stored in a specific commit. Git has the capability to check out <i>any</i> commit, which lets you temporarily run other versions of the code.
    
<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
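The "named pointer" definition of a branch can be sketched with a toy model in Python. The class <code>ToyRepo</code> and the commit IDs <code>c1</code>, <code>c2</code> are hypothetical and exist only for this illustration; real git stores much more than this.

```python
class ToyRepo:
    """Toy model: commits form a chain via parent links, and each branch is
    just a name that points at one commit."""

    def __init__(self):
        self.parents = {}               # commit ID -> parent commit ID (or None)
        self.branches = {"main": None}  # branch name -> commit ID it points to
        self.current = "main"           # the branch that is checked out
        self._counter = 0

    def commit(self):
        """Create a commit on the checked-out branch and advance that branch."""
        self._counter += 1
        new_id = f"c{self._counter}"
        self.parents[new_id] = self.branches[self.current]
        self.branches[self.current] = new_id
        return new_id

    def branch(self, name):
        """Create a new branch pointing at the current commit."""
        self.branches[name] = self.branches[self.current]

    def checkout(self, name):
        """Switch the checked-out branch."""
        self.current = name


repo = ToyRepo()
repo.commit()                     # c1 on main
repo.branch("feature-branch")     # feature-branch also points at c1
repo.checkout("feature-branch")
repo.commit()                     # c2: only feature-branch moves
print(repo.branches)              # {'main': 'c1', 'feature-branch': 'c2'}
```

Creating a branch copies nothing but a pointer in this model, and only the checked-out branch moves when a commit is made, so main still points at the older commit.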

<div>
<h3>Git good practice #1: Make clean commits.</h3>
<br>
    <i>Staging</i> gives us finer control over which changes are included in each commit. First we tell git which changes will be included in the next commit, and then we commit all staged changes at once. In this simple example, it may seem like an unnecessary extra step. If git already knows which files have changed, why not just commit all changes automatically?

<p>When working with any version control system, it is helpful for each code snapshot to introduce changes that are <i>complete</i> and <i>related</i>. Ideally, each new commit should add one new feature, or bugfix, or concept, and the commit should not contain unfinished ideas. Why is this good practice? Sometimes we need to go back through the history of changes to find where a bug was introduced, or to revert a set of changes, or even just to understand the structure of the commit history (which can become quite complex).

<p>But many of us don't program that way--we like to plow through the code, making many unrelated changes here and there. In practice, it is very easy (and common) to lose track of which files you have changed in between commits. Perhaps you added some temporary debugging code that you forgot to remove, or started a new file but forgot to ask git to track it, or forgot about an unfinished piece of code. 

<p><b>Get into the habit of checking the state of your repository before each commit.</b> Carefully review your work before you stage and commit it--errors will be much easier to catch and resolve at this time, while you still remember why you made these changes. Make an effort to organize your changes into logical, complete commits.

<p><b>BUT:</b> at the same time, balance this with the need to develop quickly; sometimes the extra organization really isn't worth the extra effort, especially when you are prototyping a new project (my commit frequency varies between once per day and several times per hour, depending on the task). Over time you will learn what works for you.
<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
</div>
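One way to build the habit of checking the repository state is to look at the output of <code>git status --porcelain</code>, which prints one line per changed file: two status characters (staged state, then working-tree state), a space, and the path. The helper below is a minimal sketch that sorts such lines into three groups; the function name <code>summarize_status</code> and the sample text are hypothetical, and renamed files are not given special treatment.

```python
def summarize_status(porcelain_output):
    """Split `git status --porcelain` output into staged, unstaged, untracked."""
    staged, unstaged, untracked = [], [], []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            untracked.append(path)    # git is not tracking this file yet
            continue
        if index_state != " ":
            staged.append(path)       # will be part of the next commit
        if worktree_state != " ":
            unstaged.append(path)     # changed, but not staged
    return {"staged": staged, "unstaged": unstaged, "untracked": untracked}


# Hypothetical sample: one staged edit, one unstaged edit, one untracked file.
example = "M  README.md\n M my_module.py\n?? debug_scratch.py\n"
print(summarize_status(example))
```

In a real repository the text could come from <code>subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout</code>. A forgotten debugging file would show up in the untracked group before it slips into a commit.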

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<h3>Revert commits</h3>

<p> It's possible to revert a particular commit, or to return to any point in the commit history. In the main branch, go to History. Right click on the most recent commit and select "Revert Commit." The goodbye.py file that we added when we merged the feature2-branch is now gone. But don't worry, reverts are themselves commits in the history, so we can revert the revert if we need to. Go ahead and right click on the most recent commit (the revert commit) and revert it. We have now added another commit to our history that reverts the revert. If we inspect the my_repo folder, we see that the goodbye.py file has returned.
</div>
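The idea that a revert is itself a new commit can be sketched with a toy history in Python. This is an illustration under simplifying assumptions, not how git implements it: snapshots are plain dictionaries, the helper names <code>diff</code> and <code>revert</code> are hypothetical, and changes are undone one whole file at a time, whereas git works on line-level changes.

```python
def diff(parent, child):
    """Return the changes that turn snapshot `parent` into snapshot `child`.

    Each entry maps a file name to (old_content, new_content); None means
    the file does not exist on that side.
    """
    names = set(parent) | set(child)
    return {
        name: (parent.get(name), child.get(name))
        for name in names
        if parent.get(name) != child.get(name)
    }


def revert(history, index):
    """Append a new snapshot that undoes the changes made by history[index]."""
    parent = history[index - 1] if index > 0 else {}
    changes = diff(parent, history[index])
    new_snapshot = dict(history[-1])       # start from the latest state
    for name, (old, _new) in changes.items():
        if old is None:
            new_snapshot.pop(name, None)   # file was added: remove it
        else:
            new_snapshot[name] = old       # file was edited or deleted: restore
    history.append(new_snapshot)
    return new_snapshot


history = [
    {"README.md": "# my_repo"},
    {"README.md": "# my_repo", "goodbye.py": 'print("Goodbye!")'},
]
revert(history, 1)   # goodbye.py disappears, recorded as a new snapshot
revert(history, 2)   # reverting the revert brings it back
print(len(history), sorted(history[-1]))
```

Nothing is ever removed from <code>history</code>: each revert only appends, which is why the earlier states remain available.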

<div style="border-left: 3px solid #000; border-top: 1px solid #DDD; padding: 20px; padding-left: 10px; background: #F0FAFF; ">

A few quick thoughts:

<ul>
    <li>If you are finding yourself confused, don't worry. Learning to use version control effectively will take time and practice.
    <li>If this seems more complicated than it needs to be, hold on. When we start doing real collaborative development, the benefit should become more apparent.
    <li>An obvious question you may have is: what happens if you change the same file from two different branches? Often merging just works magically in these situations. When it fails, there is a merge conflict and we have many tools (to be discussed next) for dealing with this situation.
</ul>
<div> <!-- NOTE: this div is a workaround for a jupyter HTML export bug --> </div>
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
    
<h2> Using branches and pull requests to code collaboratively</h2>


<ol>
<h3>Steps in your development environment</h3>

<li><strong>Check status</strong>: <code>git status</code> (check what branch you are on, see if up to date with main)</li>
<li><strong>Sync with main branch on remote</strong>: <code>git pull origin main</code></li>
<li><strong>Create feature branch</strong>: <code>git checkout -b feature1</code></li>
<li>Make changes to code</li>
<li><strong>Stage changes</strong>: <code>git add --all</code> </li>
<li><strong>Commit changes</strong>: <code>git commit -m &quot;add descriptive message&quot;</code></li>
<li><strong>Pull from main again (and maybe merge)</strong>:<ul>
<li><code>git checkout main</code></li>
<li><code>git pull</code></li>
<li>(if new change then merge)</li>
<li><code>git checkout feature1</code></li>
<li><code>git merge main</code></li>
</ul>
</li>
<li><strong>Push branch to remote</strong>: <code>git push origin feature1</code></li>


<h3>GitHub steps</h3>

<li>Move to GitHub. Create a pull request</li>
<li>Review changes (team can review files/commits, comment on the code, recommend additional changes)</li>
<li><strong>Optional: implement revised changes</strong>: repeat steps 4-8 (the push in step 8 updates the pull request)</li>
<li>Merge pull request into main (GitHub will tell you if it is OK to merge; if not, it will report merge conflicts, discussed later)</li>


<h3>Back to development environment</h3>


<li><strong>Switch to main</strong>: <code>git checkout main</code></li>
<li><strong>Merge remote main to local</strong>: <code>git pull origin main</code></li>
<li><strong>Good job!</strong></li>
</ol>
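The command-line steps in the development environment can be collected into a small Python helper. This is a minimal sketch: the function names <code>start_feature</code>, <code>publish_feature</code>, and <code>run</code> are hypothetical, the git commands themselves are the ones listed in steps 1-3 and 5-8, and it assumes git is installed and the remote is called <code>origin</code>.

```python
import subprocess


def start_feature(branch, remote="origin", base="main"):
    """Commands for steps 1-3: check status, sync with main, create the branch."""
    return [
        ["git", "status"],
        ["git", "pull", remote, base],
        ["git", "checkout", "-b", branch],
    ]


def publish_feature(branch, message, remote="origin", base="main"):
    """Commands for steps 5-8: stage, commit, merge in the latest main, push."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "checkout", base],
        ["git", "pull"],
        ["git", "checkout", branch],
        ["git", "merge", base],
        ["git", "push", remote, branch],
    ]


def run(commands, cwd="."):
    """Run each command in order inside `cwd`, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)


# Print the commands instead of running them, so nothing touches a real repo.
for argv in start_feature("feature1"):
    print(" ".join(argv))
```

Calling <code>run(start_feature("feature1"))</code> inside a repository would execute the first three steps; the code edits of step 4 happen in between, and <code>run(publish_feature("feature1", "add descriptive message"))</code> covers the rest. Because <code>check=True</code> raises on a non-zero exit code, a merge conflict in step 7 stops the sequence before the push.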

<h3>Dealing with Conflicts</h3>

Let's say two developers branch off of main to work on different feature branches. Developer1 modifies file1 and merges that modification into the main branch. In the meantime, developer2 also makes changes to file1. If developer2 tries to merge their changes into main, git will refuse to merge until the merge conflicts are resolved. To resolve the conflicts, developer2 first performs step 7 (pull main and merge it into the feature branch).
<p>
Git will report that there are merge conflicts that need to be resolved and will list the conflicted files. Developer2 then has to decide which changes to keep and which to discard. Once the conflicts are resolved, developer2 can stage and commit the resolved files (steps 5 and 6) and then continue with steps 8-12.
</div>
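When a merge stops with conflicts, git writes both versions into the conflicted file, separated by marker lines that start with <code>&lt;&lt;&lt;&lt;&lt;&lt;&lt;</code>, <code>=======</code>, and <code>&gt;&gt;&gt;&gt;&gt;&gt;&gt;</code>. The sketch below pulls the two sides out of such a file so they can be compared. The function name <code>find_conflicts</code> and the sample text are hypothetical, and the sample assumes git's default conflict style.

```python
def find_conflicts(text):
    """Return (ours, theirs) pairs for every conflict block found in `text`."""
    conflicts, ours, theirs, state = [], [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs, state = [], [], "ours"      # a conflict block opens
        elif line.startswith("=======") and state == "ours":
            state = "theirs"                          # switch to the other side
        elif line.startswith(">>>>>>>") and state == "theirs":
            conflicts.append(("\n".join(ours), "\n".join(theirs)))
            state = None                              # the block is closed
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return conflicts


# Hypothetical file1.py after developer2 merges main into their feature branch.
conflicted = '''\
<<<<<<< HEAD
def find_files_with_string(base_dir, string="average"):
=======
def find_files_with_string(base_dir):
>>>>>>> main
    all_files = []
'''

for ours, theirs in find_conflicts(conflicted):
    print("current branch:", ours)
    print("incoming (main):", theirs)
```

Resolving the conflict means editing the file so that only the desired lines remain and all three marker lines are gone. An empty result from <code>find_conflicts</code> is a quick sanity check before staging the file.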

<h2>One developer working on a feature</h2>

<img src="resources/git-demo-single.png">

<h2>Collaboration with multiple features</h2>

<img src="resources/git-demo-collab.png">

# Demo 1: Add multiple features (no conflicts)

In [None]:
# Feature1 --> Add to file1.py

def normalize_image(image, min_value=0, max_value=1):
    """ normalize image to min and max
    Inputs:
        image: 3D numpy array
        min_value: float: minimum value to normalize to
        max_value: float: maximum value to normalize to
    Return:
        normalized_image: 3D numpy array
    """
    normalized_image = (image - image.min()) / (image.max() - image.min())
    normalized_image = normalized_image * (max_value - min_value) + min_value
    return normalized_image


In [None]:
# Feature 2 --> add to file2.py

import numpy as np
from colorsys import hls_to_rgb

def generate_random_colors(n, lightness_range=(0, 1), saturation_range=(0, 1), random_seed=0, order_colors=False):
    '''Get n distinct colors specified in HLS (Hue, Lightness, Saturation) colorspace. Hue is random.

    Inputs:
        n (int) - number of desired colors
        lightness_range (2 value tuple) - desired range of lightness values (from 0 to 1)
        saturation_range (2 value tuple) - desired range of saturation values (from 0 to 1)
        random_seed (int) - seed for random number generator (ensures repeatability)
        order_colors (bool) - if True, colors will be ordered by hue, if False, hue order will be random

    Returns:
        list of tuples containing RGB values (which can be used as a matplotlib palette)
    '''
    np.random.seed(random_seed)
    colors = []

    hues = np.random.rand(n)
    if order_colors:
        hues = np.sort(hues)

    for hue in hues:
        lightness = np.random.uniform(lightness_range[0], lightness_range[1])
        saturation = np.random.uniform(saturation_range[0], saturation_range[1])
        colors.append(hls_to_rgb(hue, lightness, saturation))

    return colors


# Demo 2: Add features (with conflicts)

In [None]:
# Developer1 Adds function to file1.py

import os 
from pathlib import Path

def find_files_with_string(base_dir):
    """Search all subfolders of base_dir for files with "average" in the name."""
    all_files = []
    for root, dirs, files in os.walk(base_dir):
        for file in files:
            if "average" in file:
                all_files.append(Path(os.path.join(root, file)))
    return all_files


In [None]:
# Developer2 ALSO works on the same function and adds to file1.py

import os
from pathlib import Path

def find_files_with_string(base_dir, string="average"):
    """Find all files with string in name in all subfolders of base_dir.

    Parameters:
    -----------
    base_dir : str
        path to base directory
    string : str
        string to search for in file names
    
    Returns:
    --------
    all_files : list of pathlib.Path
        paths of all matching files
    """
    all_files = []
    # search all subfolders for files with string in name
    for root, dirs, files in os.walk(base_dir):
        for file in files:
            if string in file:
                all_files.append(Path(os.path.join(root, file)))
    return all_files