**2023-01-19 `07-Project one - Day 1 - Projects & Collaboration with Git`**

**Objectives**

* Articulate the requirements for Project 1.
* Draw and interpret diagrams of Git branching workflows.
* Create new branches with Git.
* Push local branches to GitHub.
* Delete branches with Git.

**Presentation**
* [07.1 Projects & Collaboration with Git](https://ucb.bootcampcontent.com/UCB-Coding-Bootcamp/UCB-VIRT-DATA-PT-11-2022-U-LOLC/-/blob/main/slides/Data-07.1_Projects_and_Collaboration_with_Git.pdf)

**Supplemental**
* [Branch Workflow](https://ucb.bootcampcontent.com/UCB-Coding-Bootcamp/UCB-VIRT-DATA-PT-11-2022-U-LOLC/-/blob/main/07-Project-1-Week-1/1/Supplemental/BranchWorkflow.md)
* [Git Recipes](https://ucb.bootcampcontent.com/UCB-Coding-Bootcamp/UCB-VIRT-DATA-PT-11-2022-U-LOLC/-/blob/main/07-Project-1-Week-1/1/Supplemental/GitRecipes.md)

# ==========================================

### 1.01 Instructor Do: Intro to Git (0:30)

# Workflow Diagrams

Imagine you're working on a Git project.

So far, you've made three different commits, all on your `main` branch. We'd diagram this as follows:

```bash
(main) | [m1] -> [m2] -> [m3]
```

…where `[m1]` is the first commit on the `main` branch, `[m2]` is the second, etc. The `m` comes from the fact that these commits are on the **m**ain branch.

## Branching

Whenever you want to either _add something new_ or _fix something broken_, you should create a new branch for your work.

Consider the illustration of the `main` branch above. All of the work in the commits `[m1]`, `[m2]`, and `[m3]` happened in sequence: First we did the work in `[m1]`; then the work in `[m2]`; and, finally, the work in `[m3]`.

Let's imagine that we've been working with Uber ride data, and we're interested in finding out whether there's a correlation between a rider's age and the time they request a driver.

Let's say that, in `[m3]`, we've finally managed to use Pandas to massage our data into just the shape we need to start analyzing it. Our next task is to write the Python that actually analyzes this newly well-formed data.

Getting this right will take a lot of experimentation, debugging, and discussion with colleagues, so it's a good opportunity to create a new branch.

```bash
(main) | [m1]-[m2]-[m3]
                      \
(data_analysis)        \ -> …
```

Note the ellipsis. The `…` indicates that we've _created_ the `data_analysis` branch, and also **checked it out** (i.e., "moved" to it), but that we haven't actually done any work yet.

Remember: When we create a new branch, the files on the new branch are _the same_ as the files on the branch we were on immediately before. In this case, the files on `data_analysis` are the same as the files in `[m3]`, _until we change and commit something_.

Let's say we finish our analysis of riders' ages (determining the average age of riders in different regions, etc.) and decide this is a good point to stop and commit our changes.

```bash
(main) | [m1]-[m2]-[m3]
                      \
(data_analysis)        \ -> [sb1]
```

Now, commit `[sb1]` on the `data_analysis` branch has the cleanup code from `[m3]`, and _also_ the code for analyzing age data. Emphatically, `[m3]` does _not_ have code for analyzing age data.

This is an extremely important concept. Now that we've switched to the `data_analysis` branch, changing files and committing things _will not_ change `main`, _at all_. Everything we do applies _only_ to `data_analysis`.
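To make the isolation concrete, here is a minimal Python sketch of a toy model in which a branch is just a name pointing at a commit. It is an illustration only, not how Git stores data on disk; the `ToyRepo` and `Commit` classes and the file names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    label: str
    files: frozenset                    # names of files in this snapshot
    parent: Optional["Commit"] = None   # previous commit, if any


class ToyRepo:
    """Toy model: a branch is just a name pointing at a commit."""

    def __init__(self):
        self.branches = {"main": None}
        self.current = "main"

    def commit(self, label, new_files=()):
        parent = self.branches[self.current]
        inherited = parent.files if parent else frozenset()
        # Only the pointer of the branch we are on moves forward.
        self.branches[self.current] = Commit(
            label, inherited | frozenset(new_files), parent
        )

    def checkout_new(self, name):
        # Like `git checkout -b <name>`: the new branch starts at the current commit.
        self.branches[name] = self.branches[self.current]
        self.current = name


repo = ToyRepo()
repo.commit("m1", ["rideData.csv"])
repo.commit("m2", ["clean_data.py"])
repo.commit("m3", ["cleanedRideData.csv"])

repo.checkout_new("data_analysis")
same_at_creation = repo.branches["data_analysis"] is repo.branches["main"]

repo.commit("sb1", ["analyze_age.py"])  # hypothetical analysis file

print(same_at_creation)                                  # True
print(repo.branches["main"].label)                       # m3
print(repo.branches["data_analysis"].label)              # sb1
print("analyze_age.py" in repo.branches["main"].files)   # False
```

At the moment the branch is created, both names point at the same commit. After the commit on `data_analysis`, only that pointer has moved, and `main` still points at `[m3]`.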

## Merging

After we finish analyzing age data, we'll want to update `main` with the new code from `data_analysis`.

The most common way to do this is via [merge](https://git-scm.com/docs/git-merge).

Merging takes the changes you've made on one branch and integrates them into another.

So, if we add a `helpers.py` file on the `data_analysis` branch, then merge `data_analysis` into `main`, `main` will also have the most recent version of `helpers.py` that we committed.

```bash
(main) | [m1]-[m2]-[m3]--------------[m4]
                      \               / (M)
(data_analysis)        \-[sb1]-[sb2]-/
```

Now, we've made one more commit to the `data_analysis` branch, in `[sb2]`. Then, we **merge** it into main. This means that `[m4]` has all the files from `[m3]`, _plus_ any changes and new files from `[sb2]`.
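The rule "all the files from `[m3]`, plus any changes and new files from `[sb2]`" can be sketched in Python as a simplified three-way merge. This sketch works on whole files, whereas Git itself also merges line by line within a file; the function name `merge_snapshots` and the file contents are assumptions made for illustration.

```python
def merge_snapshots(base, ours, theirs):
    """Simplified file-level three-way merge.

    Each snapshot maps a file name to its content. `base` is the commit
    where the two branches diverged.
    """
    merged = {}
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o      # both sides agree
        elif o == b:
            result = t      # only the other branch changed this file
        elif t == b:
            result = o      # only our branch changed this file
        else:
            raise ValueError(f"conflict in {name}")
        if result is not None:  # None means the file was deleted
            merged[name] = result
    return merged


# Hypothetical contents for the commits in the diagram above.
m3 = {"clean_data.py": "cleanup v1"}
sb2 = {
    "clean_data.py": "cleanup v1",
    "analyze_data.py": "age analysis",
    "helpers.py": "helpers v2",
}

# main has not moved since the branch point, so base and ours are both m3.
m4 = merge_snapshots(base=m3, ours=m3, theirs=sb2)
print(sorted(m4))  # ['analyze_data.py', 'clean_data.py', 'helpers.py']
```

When both branches change the same file in different ways, this sketch raises an error. That corresponds loosely to a merge conflict, which Git asks you to resolve by hand.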

# ==========================================

### 1.02.1 Everyone Do: Creating a Project Repo (0:10)

* In your group, choose _one_ member to do these steps. This will be the repo that the group shares throughout the project.

* Go to [GitHub](https://github.com/), and click on the plus button in the top right to create a new repo.

  ![Creating a new repo on GitHub.](../Images/03-add-repo.png)

  * Fill out the fields on the new repo page.

  * You _should_ initialize with a `.gitignore`.

  * You should choose `Python` in the gitignore dropdown.

  * You should edit the `.gitignore` file and add:

  ``` python
  # DS_Store
  .DS_Store
  ```

  ![New project configuration.](../Images/03-new-project.png)

  * This is sufficient to create a repository that everyone can share.

* After creating your group's repository, send the remote URL (i.e., the link to the repo) to your teammates.

  * Team members will `git clone` this link.

* By default, only the creator of the repo can push changes.

* Add **collaborators**.

  * Navigate to the repository settings.

  ![Repository settings](../Images/03-settings.png)

  * Navigate to the collaborators tab, and enter your password when prompted.

  ![Repository collaborators](../Images/03-collaborators.png)

  * From here, you can search for your teammates by username.

  ![Adding collaborators](../Images/03-add-collaborator.png)

  * Everyone in your group should now be able to make changes to the shared repo.

* Remember that _everyone in the group must clone the new repository_.

  * Make sure that everyone has done this before moving on.

# ------

### 1.02.2 Students Do: Workflows (0:10)

# Review Questions

This document contains review questions for Git basics.

## Instructions

For the diagramming exercises below, either **draw your solutions on paper**, or use the interface provided at [Git Viz](https://peleke.github.io/git-viz/).

### Overview

* Consider the example from the lecture, where we created a branch for our data analysis. Why did we create a new branch for this? Why _not_ simply do this on `main`?

* Write down two advantages to creating branches instead of working directly on `main`.

- - -

#### Branching

* **For the exercises below, consider the following commit history:**

  ```bash
  (main) | [m1] -> [m2] -> [m3] -> [m4]
  ```

* Draw a new branch called `plotting_data`. It should branch from the second commit to `main`.

* When you first create `plotting_data`, are the files on that branch the same as the files in `[m1]`? In `[m2]`? Why, or why not?

* Add two commits to the `main` branch.

* Add two commits to the `plotting_data` branch, named `[pd1]` and `[pd2]`.

* Are the files in `[pd1]` and `[m3]` the same? Why, or why not?

- - -

### Merging

* Merge `[pd2]` with `main`.

* Explain how this merge changes the files in `main`.

* **For the problems below, consider the following history.**

  ```bash
  (main)        | [m1]-[m2]-[m3]-[m4]- - -- - -- - -[m5]
                                \               / (M)
  (plotting_data) |              \-[pd1]-[pd2]-/
  ```

* Assume `[m4]` on `main` updates `clean_data.py`, but doesn't change the directory structure.

* Assume the root project directory looks as follows at each commit:

  ```bash
  [m4]
  root/
    |_analyze_data.py
    |_clean_data.py
    |_output/
      |_cleanedRideData.csv
    |_Resources/
      |_rideData.csv

  [pd2]
  root/
    |_analyze_data.py
    |_clean_data.py
    |_helpers.py
    |_plot_data.py
    |_output/
      |_cleanedRideData.csv
      |_plots.pdf
    |_Resources/
      |_rideData.csv
  ```

* When we merge `main` and `plotting_data`, which version of each file do we get?

* Draw the directory structure for the last commit to `main` (after the merge) and label each file with the branch it originated from. Assume that the only files changed on `plotting_data` were `helpers.py` and `plot_data.py`.


# Solved
# Review Questions

This document contains review questions for Git basics.

## Instructions

### Overview

* **Problem**: Consider the example from the lecture, where we created a branch for our data analysis. Why did we create a new branch for this? Why _not_ simply do this on `main`?

* **Solution**: Doing the work directly on `main` would make it harder to keep that work organized. Branches let us keep our work sandboxed and organized.

* **Problem**: Write two advantages to creating branches instead of working directly on `main`.

* **Solution**:

  1. We can isolate our experiments to a single branch: if we break something on our `data_analysis` branch, we at least know that the code on `main` still works.

  2. We can focus on writing _one_ new feature at a time, instead of having work for a handful of different features and bugfixes in a single branch.

- - -

### Branching

For the problems below, consider the following commit history:

  ```bash
  (main) | [m1] -> [m2] -> [m3] -> [m4]
  ```

* **Problem**: Draw a new branch, called `plotting_data`. It should branch from the second commit to `main`.

* **Solution**:

  ```bash
  (main)          | [m1]-[m2]
                          \
  (plotting_data) |         \ -> …
  ```

* **Problem**: When you first create `plotting_data`, are the files on that branch the same as the files in `[m1]`? In `[m2]`? Why, or why not?

* **Solution**: The files on `plotting_data` are the same as the files in `[m2]`.

  This is because we created and checked out `plotting_data` while we were on `[m2]`. Our files start out exactly as they were in `[m2]`, and Git records any further commits on `plotting_data`, _not_ on `main`.

* **Problem**: Add two commits to the `main` branch.

* **Solution**:

  ```bash
  (main)          | [m1]-[m2]-[m3]-[m4]
                          \
  (plotting_data) |         \ -> …
  ```

* **Problem**: Add two commits to the `plotting_data` branch, named `[pd1]` and `[pd2]`.

* **Solution**:

  ```bash
  (main)          | [m1]-[m2]-[m3]-[m4]
                          \
  (plotting_data) |         \-[pd1]-[pd2]
  ```

* **Problem**: Are the files in `[pd1]` and `[m3]` the same? Why, or why not?

* **Solution**: No. The code in `[pd1]` and the code in `[m3]` are _not_ the same. Both commits start from `[m2]`, but each adds its own changes on a separate branch. The two branches contain _different_ work, so `[m3]` probably contains a patch or bugfix totally unrelated to `[pd1]`.

- - -

### Merging

* **Problem**: Merge `[pd2]` with `main`.

* **Solution**:

  ```bash
  (main)          | [m1]-[m2]-[m3]-[m4]- - --[m5]
                          \                /
  (plotting_data) |         \-[pd1]-[pd2]-/
  ```

* **Problem**: Explain how this merge changes the files in `main`.

* **Solution**: `main` now has the most recent changes to `clean_data.py` made on _either_ the `main` or `plotting_data` branch. It also has the new files added on the `plotting_data` branch.

* **For the problems below, consider the following history.**

  ```bash
  (main)          | [m1]-[m2]-[m3]-[m4]- - -- - -- - -[m5]
                                \               / (M)
  (plotting_data) |              \-[pd1]-[pd2]-/
  ```

  * Assume `[m4]` on `main` updates `clean_data.py`, but doesn't change the directory structure.

  * Assume the root project directory looks as follows at each commit:

    ```bash
    [m4]
    root/
      |_analyze_data.py
      |_clean_data.py
      |_output/
        |_cleanedRideData.csv
      |_Resources/
        |_rideData.csv

    [pd2]
    root/
      |_analyze_data.py
      |_clean_data.py
      |_helpers.py
      |_plot_data.py
      |_output/
        |_cleanedRideData.csv
        |_plots.pdf
      |_Resources/
        |_rideData.csv
    ```

* **Problem**:

  1. When we merge `main` and `plotting_data`, which version of each file do we get?

  2. Draw the directory structure for the last commit to `main` (after the merge) and label each file with the branch it comes from. Assume that the only files changed on `plotting_data` were `helpers.py` and `plot_data.py`.

* **Solution**:

  ```bash
  [m5]
  root/
    |_analyze_data.py (main)
    |_clean_data.py (main)
    |_helpers.py (plotting_data)
    |_plot_data.py (plotting_data)
    |_output/ (mixed)
      |_cleanedRideData.csv (main)
      |_plots.pdf (plotting_data)
    |_Resources/ (main)
      |_rideData.csv (main)
  ```
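The labeling in this solution can be reproduced with a short Python sketch. It assumes a hypothetical branch-point commit whose contents match `[m4]` except for an older `clean_data.py`, and it assumes no file was deleted or changed on both branches. The function name `label_origins` and the version strings are illustrative only.

```python
def label_origins(base, main, branch, branch_name):
    """Label each merged file with the branch its final version comes from.

    Each argument except `branch_name` maps a file path to a version string.
    A file is credited to `branch_name` if the branch added or changed it
    relative to `base`; otherwise it is credited to main.
    """
    labels = {}
    for path in sorted(set(main) | set(branch)):
        changed_on_branch = branch.get(path) != base.get(path)
        labels[path] = branch_name if changed_on_branch else "main"
    return labels


# Hypothetical snapshot at the commit where plotting_data branched off.
branch_point = {
    "analyze_data.py": "v1",
    "clean_data.py": "v1",
    "output/cleanedRideData.csv": "v1",
    "Resources/rideData.csv": "v1",
}
m4 = {**branch_point, "clean_data.py": "v2"}  # [m4] updates clean_data.py
pd2 = {
    **branch_point,
    "helpers.py": "v1",
    "plot_data.py": "v1",
    "output/plots.pdf": "v1",
}

origins = label_origins(branch_point, m4, pd2, "plotting_data")
for path, origin in origins.items():
    print(f"{path} ({origin})")
```

The printed labels agree with the directory listing above: only the files added on `plotting_data` are credited to that branch, and everything else comes from `main`.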

- - -

# ==========================================

### 1.03 Everyone Do: Creating Branches (0:10)

# Branch Demo

## 0. Getting the Repo

Before we can work with Git, we must either **create a new repository**, or **clone one from GitHub**.

Note that, in the examples below, we use `git status` before every `git commit`. This is a best practice that helps ensure a deliberate commit history. For brevity's sake, this line will be omitted in future files, **but assume we've always run `git status` before any `git commit`**.

### Clone from GitHub

If someone has already shared a repository on GitHub, you can **clone** it to your local machine with `git clone`.

```bash
# Clone an existing repo.
git clone <repo_url>
# Navigate into newly created repo directory
cd <repo_name>
```

## 1. Add Files

Next, we simply develop as normal, and `commit` our changes whenever we make significant progress.

In general, it's best to **commit early** and **commit often**. Frequent snapshots ensure you'll never be far away from a "last working version".

```bash
# Create a file, called clean_data.py
touch clean_data.py

# Add and commit clean_data.py...
git add clean_data.py
git status
git commit -m "First commit."

# Add cleanup code to clean_data.py...
git add clean_data.py
git status
git commit -m "Clean up provided data."

# Add code to export clean data...Note that `add .` adds
# everything in the current folder
git add .
git status
git commit -m "Export clean data as CSV."
```

## 2. Create Branches

To create a new, isolated development history, we must create **branches**.

```bash
# Create a new branch and switch to it
# Equivalent on Git 2.23+: `git switch -c data_analysis`
git checkout -b data_analysis
```

Alternatively, we can create a branch and then switch to it as two separate steps, though this is uncommon.

```bash
git branch new_branch_name
git checkout new_branch_name
```

Once we've created a new branch, we can develop as normal:

```bash
# Create analysis.ipynb (e.g., in Jupyter), then add and commit it
git add analysis.ipynb
git status
git commit -m "Add Jupyter Notebook for data analysis."

# Add notebook cells summarizing data
git add analysis.ipynb
git status
git commit -m "Add summary tables to Jupyter Notebook."

# Export analyzed data and/or plots
git add .
git status
git commit -m "Export analysis results and save plots as PNG files."
```

## 3. Merge

Once we've developed and tested the changes on our `data_analysis` branch, we can include them in `main` by **merging** the two branches.

```bash
# Move back to main
git checkout main

# Merge changes on data_analysis with code on main
git merge data_analysis

# Delete the data_analysis branch
git branch -d data_analysis
```

**N.b.**, deleting the `data_analysis` branch isn't necessary, but it's best practice to prune unneeded branches.
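If you find yourself repeating these three steps for every finished branch, they can be collected in a small script. The sketch below only assembles and prints the commands as a dry run; the helper name `merge_and_prune_commands` is hypothetical, and each argument list could be handed to `subprocess.run` inside a real repository.

```python
import shlex


def merge_and_prune_commands(branch, base="main"):
    """Return the Git commands that merge `branch` into `base` and delete it."""
    return [
        ["git", "checkout", base],        # move back to the base branch
        ["git", "merge", branch],         # bring the branch's commits into base
        ["git", "branch", "-d", branch],  # prune the merged branch
    ]


commands = merge_and_prune_commands("data_analysis")
for args in commands:
    # Dry run: print each command instead of executing it.
    print(shlex.join(args))
```

Note that `git branch -d` refuses to delete a branch that has not been merged, which makes it a safe default for pruning.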


# ==========================================

### 1.04 Student Do: Working with Branches (0:10)

## Instructions

### Diagramming

Refer to the series of `git` commands your instructor walked through in lecture. Draw a branch diagram describing the commits on `main` and `data_analysis`.

### Practicing Branch Workflows

Next, get some practice working with branches by following these instructions:

1. Create a new directory, and initialize a Git repo inside it.

2. Create a `hello.py` that simply prints `"Hello, World"`.

3. Add and commit your `hello.py`.

4. Create a new branch, called `helpers`.

5. Create a file called `helpers.py`.

6. Inside of `helpers.py`, write a function, called `greet`, that accepts a name and prints: `"Hello, <name>"`. For example, `greet("Jane")` would print: `"Hello, Jane"`.

7. Add and commit your changes.

8. Inside of `hello.py`, import `greet` from `helpers.py`, and use it to print `"Hello, World"`.

9. Move back to your `main` branch.

10. Merge `main` with your `helpers` branch.

11. Delete your `helpers` branch.


# branches.sh
```bash
# Create a new directory, and initialize a Git repo inside of it.
mkdir git_practice
cd git_practice
git init

# Create a `hello.py` that simply prints "Hello, World".
echo 'print("Hello, World")' > hello.py

# Add and commit your `hello.py`.
git add hello.py
git commit -m "First commit."
# If Git named your default branch `master`, rename it: git branch -M main

# Create and checkout a new branch, called `helpers`.
git branch helpers
git checkout helpers
# Or: git checkout -b helpers

# Create helpers.py, write the greet function in it, then add and commit
git add helpers.py
git commit -m "Add greet function to helpers.py."

# Update hello.py
git add hello.py
git commit -m "Refactor hello.py to use greet function."

# Move back to your `main` branch.
git checkout main

# Merge `main` with your `helpers` branch.
git merge helpers

# Delete your `helpers` branch.
git branch -d helpers

```

# diagram.txt
```
(main)          | [m0]-[m1]-[m2]----------------------[m3]
                               \                     /
(data_analysis) |               \-[da0]-[da1]-[da2]-/
```

# hello.py
```python
"""Hello

This module simply prints "Hello, world" to the console.

Example:

  $ python hello.py

"""

from helpers import greet

greet("World")

```

# helpers.py
```python
"""Helpers

This module contains utility functions for the Git demonstration.
"""


def greet(name):
    """Print a greeting for the given name."""
    print(f"Hello, {name}")

```

# ==========================================

### 1.05 Instructor Do: Push (0:10)

# push.sh

```bash
# Create a git repository somewhere...
# ...Then, add a remote called origin to your local repo
# Skip if you've cloned a repo!
# git remote add origin <repo_url>

# Switch to main
git checkout main

# Now, push the main branch to GitHub
git push origin main

# Create and checkout a new branch
git checkout -b push_example

# Push new branch to GitHub
git push origin push_example

# Switch to main
git checkout main

# After others have pushed their branches to GitHub...
git pull

```

# setUpstream.sh

```bash
# Switch to main
git checkout main

# Set origin/main as the default branch to push to
# when on main
git push -u origin main

# Now, when we're on main, this is the same as:
# 'git push origin main'
git push

# Switch to push_example
git checkout push_example

# Set origin/push_example as the default branch to push to
# when on push_example
git push -u origin push_example

# Now, when we're on push_example, this is the same as:
# git push origin push_example
git push

```

# ==========================================

### 1.06 Students Do: Remotes & Push (0:10)

## Instructions

* Create a new GitHub repo to associate with the local Git repo you've been working on thus far. Add it as the `origin` remote.

* Next, checkout your `main` branch, and push it to GitHub.

* Create a new branch, and change something in your project (edit one of your existing files, add a new one, etc). Add and commit this change, and push your new branch to GitHub.

* Checkout `main`, and merge the branch you just created into it.

* Push the updated `main` branch to GitHub.

## Bonuses

* Use the `-u` flag the first time you use `git push` to set default upstream branches for `main` and the branch you create.


# solution.sh
```bash
# Create a new GitHub repo to associate with the local Git repo you've been working on thus far. Add it as the `origin` remote.
git remote add origin <github_repo_url>

# Next, checkout your `main` branch, and push it to GitHub.
git checkout main
git push origin main
# Or: git push -u origin main

# Create and checkout a new branch
git branch add_gitignore
git checkout add_gitignore
# Or: git checkout -b add_gitignore

# Change something in your project: this adds a .gitignore file
echo ".DS_Store" > .gitignore
git add .gitignore
git commit -m "Add .gitignore file."

# Push this branch to GitHub
git push -u origin add_gitignore

# Checkout `main`, and merge it with the branch you just created.
git checkout main
git merge add_gitignore

# Push the updated `main` branch to GitHub.
git push
# Or: git push origin main

```

# ==========================================

### BREAK (0:10)

# ==========================================

# Everyone Do: Project Work (0:45)
* Spend the remainder of class working with your groups on your project.

# ==========================================

### Rating Class Objectives

* Rate your understanding of each objective on a scale of 1 to 5.

```python
title = "07-Project one - Day 1 - Projects & Collaboration with Git"
objectives = [
    "Articulate the requirements for Project 1",
    "Draw and interpret diagrams of Git branching workflows",
    "Create new branches with Git",
    "Push local branches to GitHub",
    "Delete branches with Git",
]
rating = []
total = 0
# Ask for a 1-5 self-rating on each objective.
for objective in objectives:
    rate = input(objective + "? ")
    total += int(rate)
    rating.append(objective + ". (" + rate + "/5)")
print("=" * 96)
print(f"Self Evaluation for: {title}")
print("-" * 24)
for line in rating:
    print(line)
print("-" * 64)
print("Average: " + str(total / len(objectives)))
```