**Exercise 0.0**: Why do you think we need to track (`git add`) the `.gitignore` file?


As we saw during the tutorial, the repository only contains the files that we explicitly add to it via `git add`. Even "special" files like `.gitignore` need to be explicitly added to the repository, so that anyone who clones it benefits from the file-ignoring setup that you have decided to use.
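
A minimal sketch of this point, run in a throwaway repository (the temporary directory, the identity values and the `*.log` pattern are placeholders chosen for illustration): a freshly created `.gitignore` shows up as untracked until it is added and committed like any other file.

```shell
# Work in a temporary directory so nothing here touches a real repository.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "*.log" > .gitignore                  # hypothetical ignore rule

# Before `git add`, git reports .gitignore as untracked ("??").
status_before="$(git status --porcelain)"
echo "before: $status_before"

git add .gitignore
git commit -q -m "Track the ignore rules"

# After the commit, .gitignore is listed among the tracked files.
tracked="$(git ls-files)"
echo "tracked: $tracked"
```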


**Exercise 0.1**: Can you think of (at least) two reasons why we typically don't want to track binary files in a git repository?


There are several reasons why we don't want to track binary files in a git repository:

* Git represents the state of our repository as a snapshot of all the files tracked in the repository at a given commit. This is different from other VCS tools where the state of each file is represented as a series of deltas (changes). This design choice enables git users to change branches and clone repositories very quickly, provided *the sizes of the files in the repositories are not too large*. In general, compiled code or media binaries tend to be larger in size than the text representation of the code that generated them, so it is preferable to keep large binary files out of git's snapshots to honor this design philosophy. You can read more about how git stores data [here](https://stackoverflow.com/questions/8198105/how-does-git-store-files).

* As we will see later, two of the most powerful tools git provides are the ability to `diff` (compare) files and the ability to resolve conflicts whenever two branches contain conflicting commits. Both tools rely on comparing files line by line, because in most source code line breaks carry the semantics of structuring different parts of the code. Since binary files don't necessarily use line breaks to organize their contents, they usually lead to uninterpretable diffs.

* In many cases we only care about the current (or the last few) states of a binary file and not all of its intermediate states. Git is designed to store the full history of tracked files, which means that the size of the repository's object database (the `.git` directory, where git stores all this information) can grow very quickly if large binaries that change frequently are tracked.

* There are better tools to store and version large binaries. See a few options [here](https://www.perforce.com/blog/storing-large-binary-files-in-git-repositories).
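
To make the point about diffs concrete, here is a small sketch in a throwaway repository (file names, contents and identity values are made up for illustration). It commits a text file and a file containing NUL bytes, changes both, and asks for a diff: the text file gets a line-by-line diff, while git only reports that the binary file differs.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

printf 'hello\n' > notes.txt
printf '\000\001\002\003' > blob.bin       # NUL bytes make git treat the file as binary
git add -A
git commit -q -m "Add a text file and a binary file"

# Change both files in the working tree.
printf 'hello world\n' > notes.txt
printf '\000\001\002\003\004' > blob.bin

text_diff="$(git diff -- notes.txt)"       # readable, line-based diff
binary_diff="$(git diff -- blob.bin)"      # only says that the files differ
echo "$text_diff"
echo "$binary_diff"
```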


**Exercise 0.3**: Let's practice a bit more with the concepts of tracking, staging and committing. To this end, please try to follow these steps in order:
1. Create two empty new files, start tracking and commit them (`A.txt` and `B.txt`). 
2. Change a few lines in `A.txt` and `B.txt` and stage the changes.
3. *Unstage* `B.txt`. Hint: Google `git reset HEAD`. 
4. Now commit the changes in `A.txt`
5. Stage `B.txt` and commit it (in a different commit!)

In [None]:
%%bash
# 1)
rm -Rf exercise_0_3
mkdir exercise_0_3
git init exercise_0_3
cd exercise_0_3
echo "A" > A.txt
echo "B" > B.txt
git add -A
git commit -m 'my two new files!' # note: `git commit -a` would not replace the `git add -A` here, because it only stages files that are already tracked
git status

In [None]:
%%bash
# 2)
cd exercise_0_3
echo "A2" >> A.txt
echo "B2" >> B.txt
git add -A
git status

In [None]:
%%bash
# 3)
cd exercise_0_3
git reset HEAD B.txt
git status
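
On Git 2.23 or newer, `git restore --staged` is an alternative to `git reset HEAD` for unstaging. The sketch below repeats steps 2 and 3 in a throwaway repository (the temporary path and identity values are placeholders), so it is safe to run anywhere; it assumes your installed git provides `git restore`.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "A" > A.txt
echo "B" > B.txt
git add -A
git commit -q -m "my two new files"

echo "A2" >> A.txt
echo "B2" >> B.txt
git add -A

# Unstage B.txt only; its edit stays in the working tree.
git restore --staged B.txt

staged="$(git diff --cached --name-only)"   # what the next commit would contain
unstaged="$(git diff --name-only)"          # modified but not staged
echo "staged:   $staged"
echo "unstaged: $unstaged"
```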

In [None]:
%%bash
# 4)
cd exercise_0_3
git commit -m 'committing changes only in A.txt'

In [None]:
%%bash
# 5)
cd exercise_0_3
git add B.txt
git commit -m 'now committing changes in B.txt'

**Exercise 1.0**: Show the commit history only for the file (aka. path) `test.py`. 

In [None]:
%%bash
cd ../tutorial/my_first_repo
git log test.py

**Exercise 1.1**: Show the commit history only for the last two commits.

In [None]:
%%bash
cd ../tutorial/my_first_repo
git log -n 2

**Exercise 1.2**: Show the commit history only for the first two commits.

In [None]:
%%bash
cd ../tutorial/my_first_repo
git log  --reverse --pretty=oneline | head -2
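
A caveat worth knowing: `git log --reverse -n 2` does not return the first two commits, because git applies the `-n` limit before reversing the order. The sketch below shows the difference in a throwaway repository with three commits (commit messages, file name and identity values are made up).

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# Three commits whose messages record their order.
for msg in first second third; do
    echo "$msg" >> file.txt
    git add file.txt
    git commit -q -m "$msg"
done

# Oldest two commits: reverse the full history, then keep two lines.
oldest_two="$(git log --reverse --format=%s | head -2)"

# Limit first, then reverse: these are the newest two commits, oldest first.
limited_then_reversed="$(git log --reverse -n 2 --format=%s)"

echo "$oldest_two"
echo "---"
echo "$limited_then_reversed"
```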

**Exercise 1.4**: You will now modify the `my_first_repo/plot_compare_reduction.py` file so that:
1. The values in the variable `C_OPTIONS` are `[1, 10, 100, 500]`
2. Instead of plotting a figure, it saves the chart in a PNG file (hint: [plt.savefig](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html))

It's important that you stage and commit **each of the changes separately**. Make sure that you accomplished what you wanted by:
1. Checking each diff relative to each previous commit.
2. Running the resulting code


In [None]:
%%bash
cd ../tutorial/my_first_repo/
# makes the changes in C_OPTIONS, line 51
sed '51s/\[1, 10, 100, 1000\]/\[1, 10, 100, 500\]/' plot_compare_reduction.py > plot_compare_reduction.py.tmp
mv plot_compare_reduction.py.tmp plot_compare_reduction.py
git commit -a -m 'Updating C_OPTIONS to desired values'

In [None]:
%%bash
cd ../tutorial/my_first_repo
# replaces plt.show() with plt.savefig(), line 90
sed '90s/plt.show()/plt.savefig("plot_compare_reduction.png")/' plot_compare_reduction.py > plot_compare_reduction.py.tmp
mv plot_compare_reduction.py.tmp plot_compare_reduction.py
git commit -a -m 'Outputting figure into PNG file'

In [None]:
%%bash 
cd ../tutorial/my_first_repo
git diff 5fe3763~1 5fe3763 # <-- you will need to replace these SHAs with the
                           # ones you obtained in earlier cells
git diff 5fe3763 b98eac6
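
If you would rather not copy SHAs by hand, relative references such as `HEAD~1` name the same commits. A sketch in a throwaway repository, with a made-up two-line `script.py` standing in for `plot_compare_reduction.py` (identity values are placeholders too):

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'C_OPTIONS = [1, 10, 100, 1000]' > script.py
echo 'plt.show()' >> script.py
git add script.py
git commit -q -m "Initial version"

sed 's/1000/500/' script.py > script.py.tmp && mv script.py.tmp script.py
git commit -q -a -m "Updating C_OPTIONS to desired values"

sed 's/plt.show()/plt.savefig("plot.png")/' script.py > script.py.tmp && mv script.py.tmp script.py
git commit -q -a -m "Outputting figure into PNG file"

# HEAD is the newest commit, HEAD~1 its parent, HEAD~2 its grandparent.
first_change="$(git diff HEAD~2 HEAD~1)"    # the C_OPTIONS commit
second_change="$(git diff HEAD~1 HEAD)"     # the savefig commit
echo "$first_change"
echo "$second_change"
```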

In [None]:
%%bash
cd ../tutorial/my_first_repo/
python plot_compare_reduction.py

**Exercise 1.5**: Can you visualize the differences between the `$USER/experiment` branch and the `master` branch? (Hint: you can use `git diff`.)

In [None]:
%%bash
cd ../tutorial/my_first_repo/
git diff master $USER/experiment
# No differences at this stage! Let's add a small change to the 
# $USER/experiment branch and see the output

In [None]:
%%bash
cd ../tutorial/my_first_repo/
# Uncomment the next line if you are not already on the $USER/experiment branch:
#git checkout $USER/experiment
echo 'print("Done!")' >> plot_compare_reduction.py
git commit -a -m 'adding done log at the end of the script'

In [None]:
%%bash
cd ../tutorial/my_first_repo/
# Now we see the difference between the tip of the two branches
git diff master $USER/experiment

**Exercise 1.6**: Let's practice moving in and out of branches... it's a bit of work but we'll use what you do here later to work on `merge` and `rebase`. Please do the following:
1. Make a few changes in the experiment code (for example: try another classifier or dimensionality reduction method) while working on the `$USER/experiment` branch. 
2. Introduce confidence intervals to the bar charts while working on the `$USER/add_cis_to_plot` branch

**Important**: make sure you develop each change in the right branch, and that you stage and commit your work incrementally. Incremental commits are in general preferable over big long ones.

*Questions*:
1. After you've made the changes, what are the differences between the `$USER/experiment` and the `$USER/add_cis_to_plot` branches? 
2. Are there any conflicting changes? (a conflicting change is a change performed on the same line of code)

In [None]:
%%bash
# First we will make the changes from step (1). For the sake of giving
# a solution, we provide some example changes in
# resources/plot_compare_reduction_experiment.py. Feel free
# to apply whatever changes you prefer instead.
cd ../tutorial/my_first_repo/
git checkout $USER/experiment
cp ../../solutions/resources/plot_compare_reduction_experiment.py plot_compare_reduction.py
git commit -a -m 'Experimenting with SGDClassifier and Truncated SVD'
git diff master

In [None]:
%%bash
# Now we perform the changes for step (2). Again we will use a pre-prepared
# resource as an example: resources/plot_compare_reduction_w_ci.py.
cd ../tutorial/my_first_repo/
git checkout $USER/add_cis_to_plot
cp ../../solutions/resources/plot_compare_reduction_w_ci.py plot_compare_reduction.py
git commit -a -m 'Adding confidence Intervals'
git diff master

In [None]:
%%bash
# Question 1: After you've made the changes, what are the differences between the
# $USER/experiment and the $USER/add_cis_to_plot branches?
cd ../tutorial/my_first_repo/
git diff $USER/add_cis_to_plot..$USER/experiment
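
For Question 2, one way to check whether two branches conflict is to attempt a merge without committing it and then abort. The sketch below does this in a throwaway repository where two branches edit the same line; the branch names `experiment` and `add_cis_to_plot`, the file `script.py` and its contents are stand-ins, not the tutorial's actual files, so the result says nothing about your own branches.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'classifier = "LinearSVC"' > script.py
git add script.py
git commit -q -m "Initial version"

git checkout -q -b experiment master
echo 'classifier = "SGDClassifier"' > script.py
git commit -q -a -m "Try another classifier"

git checkout -q -b add_cis_to_plot master
echo 'classifier = "LinearSVC"  # with confidence intervals' > script.py
git commit -q -a -m "Touch the same line"

# Try the merge without committing; a non-zero exit status signals conflicts.
if git merge --no-commit --no-ff experiment >/dev/null 2>&1; then
    conflicts=""
else
    conflicts="$(git diff --name-only --diff-filter=U)"   # unmerged paths
fi
git merge --abort                          # leave the branch as it was
leftovers="$(git status --porcelain)"
echo "conflicting files: $conflicts"
```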

**Exercise 1.7**: We will now create a situation where reverting can create a mess. Please follow the next steps of havoc-making:
0. Create and check out a new branch off master called `revert_havoc`
1. Create a new python module `dependency.py` with a function named `my_dependency()`. Stage and commit the changes.
2. Now make a call to that function somewhere in `plot_compare_reduction.py` (Don't forget to import the module first). Stage and commit those changes.
3. Now revert the commit created in Step 1. What changes does it induce? Will your code work after the revert?

In [None]:
%%bash
cd ../tutorial/my_first_repo/
git checkout -b revert_havoc master
printf "def my_dependency():\n    print('Dependency called!')\n" > dependency.py
git add dependency.py
git commit -m 'adding my dependency'

In [None]:
%%bash
cd ../tutorial/my_first_repo/
printf "\nimport dependency\ndependency.my_dependency()\n" >> plot_compare_reduction.py
# If you run your script, it should work as is and you should see 
# `Dependency called!` on std out:
# python plot_compare_reduction.py
git commit -a -m 'adding call to dependency from plot_compare_reduction.py'

In [None]:
%%bash
cd ../tutorial/my_first_repo/
# now we revert the first change
git revert --no-edit 4c778d2 # <-- replace this with the commit SHA of your first change; --no-edit keeps the default message so no editor opens

In [None]:
%%bash
cd ../tutorial/my_first_repo/
# But oh well... the changes from your second commit are still there... 
tail plot_compare_reduction.py
# so your code will fail:
python plot_compare_reduction.py
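
The same breakage can be reproduced without copying SHAs, and one way out is to revert the revert. The following sketch runs in a throwaway repository with a stub `main.py` standing in for `plot_compare_reduction.py` (all names other than `dependency.py` and `my_dependency` are placeholders); `--no-edit` accepts the default commit message so no editor opens.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'print("plotting")' > main.py
git add main.py
git commit -q -m "Initial version"

printf "def my_dependency():\n    print('Dependency called!')\n" > dependency.py
git add dependency.py
git commit -q -m "adding my dependency"

printf "\nimport dependency\ndependency.my_dependency()\n" >> main.py
git commit -q -a -m "adding call to dependency"

# Revert the commit that added dependency.py (the parent of HEAD).
git revert --no-edit HEAD~1 >/dev/null

# The module is gone, but main.py still imports it.
[ -e dependency.py ] && module_after_revert="present" || module_after_revert="missing"
import_lines="$(grep -c '^import dependency$' main.py)"

# Reverting the revert brings dependency.py back.
git revert --no-edit HEAD >/dev/null
[ -e dependency.py ] && module_after_fix="present" || module_after_fix="missing"

echo "after revert: $module_after_revert, import lines left: $import_lines"
echo "after reverting the revert: $module_after_fix"
```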

**Exercise 2.1**: Pair up with another student and add their repository as one of your remotes.

In [None]:
%%bash
cd ../tutorial/my_first_repo/
git remote -v
# suppose the following is my colleague's repo:
git remote add my_friend git@github.com:atibaup/my_friends_repo.git

**Exercise 2.2**: Fetch from your colleague's remote, which you added in Exercise 2.1.

In [None]:
%%bash
cd ../tutorial/my_first_repo/
# This will fail unless there is a master branch in your colleague's repo
git fetch my_friend master 
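
If you have no partner at hand, you can rehearse Exercises 2.1 and 2.2 with a local directory playing the role of your colleague's repository. In this sketch the paths, the file contents and the identity values are placeholders; after the fetch, the colleague's branch is visible as the remote-tracking branch `my_friend/master`.

```shell
work="$(mktemp -d)"

# The "colleague's" repository, with one commit on master.
git init -q "$work/friend"
cd "$work/friend"
git symbolic-ref HEAD refs/heads/master    # name the initial branch explicitly
git config user.name "Friend"              # placeholder identity
git config user.email "friend@example.com" # placeholder identity
echo "hello from a friend" > README.md
git add README.md
git commit -q -m "Friend's first commit"

# Your own repository: add the remote, then fetch from it.
git init -q "$work/mine"
cd "$work/mine"
git remote add my_friend "$work/friend"
git fetch -q my_friend

remotes="$(git remote)"
remote_branches="$(git branch -r)"         # remote-tracking branches
echo "remotes: $remotes"
echo "remote branches: $remote_branches"
```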