# Advanced Command Line Git
## Helpful tips and tricks
Here, we'll be integrating a data science workflow with git.  There are two main things to remember.

**1.  Before doing any git work, make sure you've SAVED your notebook.  You can do this by typing CTRL+S or Command+S, clicking on the floppy disk, or File->Save.**

**2.  We're using a package called `nbdev` to save us a bit of struggle with using Jupyter notebooks.  Run the nbdev commands before anything that would require you to push to the remote repository.  Reload the page after running these commands.  If the browser asks you whether you want to reload or overwrite, choose RELOAD.**

**References:** This notebook is based on the book "Oh Shit, Git!" by Katie Sylor-Miller and Julia Evans.  For more information, see [Oh Shit, Git!](https://ohshitgit.com).

Here, we'll use the seaborn planets dataset and do a little investigative analysis using Pandas.  If you've forgotten or are unfamiliar with any of the code, please feel free to _directly reference_ (i.e., copy/paste) the solutions or adapt from the reference code [here](https://github.com/vanderbilt-data-science/srp-python-2020/blob/master/04.00-DS-with-Python-solns.ipynb).

In [None]:
#import statements
import pandas as pd
import numpy as np
import seaborn as sns

## Loading and basic data overview

In [None]:
#load data
planets = sns.load_dataset('planets')

In [None]:
# Get a preview of the data
planets.head(10)

In [None]:
# Get some info
planets.info()

In [None]:
# Stats about the dataset
planets.describe()

In [None]:
#Count na values
planets.isnull().sum()

## Working with git: warmup
Here, we're going to just review how to add and commit changes!

In the code cell below, add an extra 0 so that the `10` at the end of the line becomes `100`.  Run that cell and the cell that follows.  Save the notebook.  Now, we'll commit that change by typing the following commands:

```
git add .
git commit -m "added decade column to divide by 100"
git push origin master
```

In [None]:
#mutate new column decade
planets['decade'] = 10*(planets['year']//10)
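As a quick check of what this formula does, here is a minimal sketch using a few made-up years (they are assumptions for illustration, not values taken from the dataset).  Floor division by 10 followed by multiplication by 10 rounds a year down to its decade, while the edited version with `100` produces something quite different.

```python
# Made-up example years, not taken from the planets dataset
years = [1989, 2009, 2014]

# Original formula: floor-divide by 10, then scale back up by 10
decades = [10 * (year // 10) for year in years]

# Edited formula from the warmup: floor-divide by 100 instead
edited = [10 * (year // 100) for year in years]

print(decades)  # [1980, 2000, 2010]
print(edited)   # [190, 200, 200]
```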

## Working with git: merge conflicts
### Handling missing values
In this section, we'll look at two ways to deal with missing values.  One way is just to completely drop the missing values.  Another way might be to use mean imputation.

**Partner 1**: Fill in the following cell with code to drop rows with any missing values (see reference cell 16).  
**Partner 2**: Fill in the following cell with the code to do mean imputation to fill in the missing values (see reference cell 13).

You can do both of these things in place using the `inplace` keyword.
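As a rough illustration of the two strategies, here is a minimal sketch on a toy DataFrame.  The frame `toy` and its values are made up for this example and are not the planets data, and the reference notebook may do this differently.  Passing `numeric_only=True` is an assumption that recent pandas versions need when the frame also holds text columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the planets dataset
toy = pd.DataFrame({
    "method": ["Transit", "Radial Velocity", "Transit", "Imaging"],
    "mass": [1.0, np.nan, 3.0, np.nan],
    "distance": [10.0, 20.0, np.nan, 40.0],
})

# Strategy 1: drop every row that has at least one missing value
dropped = toy.dropna()

# Strategy 2: fill missing values in each numeric column with that column's mean
imputed = toy.fillna(toy.mean(numeric_only=True))

print(dropped.shape)
print(imputed)
```

Dropping keeps only the complete rows, so it can shrink the data a lot.  Mean imputation keeps every row but replaces each gap with the column average.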

In [None]:
%%bash
source /home/bellcs1/miniconda3/etc/profile.d/conda.sh
conda activate git-python-workshop
nbdev_clean_nbs

The following code can be used to resolve the merge conflict if desired.

In [None]:
%%bash
source /home/bellcs1/miniconda3/etc/profile.d/conda.sh
conda activate git-python-workshop
nbdev_fix_merge 10-git-python.ipynb

Don't forget to refresh the page after running the above cell! 

**Partner 1**: add, commit, and push your changes.  
**Partner 2**: add and commit your changes.  After committing, run `git pull`.  Oh no, a merge conflict!  You might need your partner to help you with the following instructions.  Does your notebook even open anymore?  Go back to `setup.ipynb` and run the last cell, or use the cell in the solutions.  Now, try to open your notebook.

In it, you should see a line of left arrows (`<<<<<<<`), a code chunk, a line of equals signs (`=======`), another code chunk, and a line of right arrows (`>>>>>>>`).  These markers delimit the conflicting areas of your code.  The code between `<<<<<<<` and `=======` is your code, and the code between `=======` and `>>>>>>>` is the remote repo code.  Choose to keep the cell which performs the mean imputation.  Delete the other code chunk and all of the surrounding markers.
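To make the layout of a conflict concrete, here is a minimal sketch that splits conflicted text into its two sides.  It assumes a plain-text file with a single conflict block marked by git's standard `<<<<<<<`, `=======`, and `>>>>>>>` lines.  The helper `split_conflict` and the sample content are hypothetical and are not part of git or nbdev.

```python
def split_conflict(text):
    """Split text with one git conflict block into (ours, theirs) line lists."""
    ours, theirs = [], []
    side = None  # None: outside the conflict; "ours" or "theirs": inside it
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = "ours"
        elif line.startswith("=======") and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>>"):
            side = None
        elif side == "ours":
            ours.append(line)
        elif side == "theirs":
            theirs.append(line)
    return ours, theirs


# Hypothetical conflicted content; the code lines are placeholders
conflicted = """<<<<<<< HEAD
planets.fillna(planets.mean(numeric_only=True), inplace=True)
=======
planets.dropna(inplace=True)
>>>>>>> origin/master
"""

ours, theirs = split_conflict(conflicted)
print("local :", ours)
print("remote:", theirs)
```

Resolving the conflict by hand amounts to keeping one of these two sides (or a blend of both) and deleting the three marker lines.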

Now that that has been resolved, save, run the following cell, and then add, commit, and push your changes.

**Partner 1:** run `git pull` after Partner 2 has finished to get the most recent version of the code.

In [None]:
%%bash
source /home/bellcs1/miniconda3/etc/profile.d/conda.sh
conda activate git-python-workshop
nbdev_clean_nbs

## Working with git: `git stash`
There are many scenarios in which you'd want to use git stash.  Sometimes, you can be working on some particular part of the code and there are some changes in the remote repository that you want to pull down now.  However, you're midway through your work and don't want to make a commit of your local changes.

Another scenario is that you're midway through some code changes and realize that you want to commit only some files, or parts of files, and not all of them.  Here's how you can do that!

**Partner 1:** Fill in the `#Partner 1` cell to compute the two columns `orbital_yr` and `orbital_weeks` (cell 31 of reference).  When you're finished, add, commit, and push your changes to the remote repository.  
**Partner 2:** Fill in the `#Partner 2` cell to generate the `method_min` column (cell 34 of reference).  **After** your partner has committed their changes, type the following:

```
git stash
git pull
```

You now have the new code from the remote repo.  Next, type `git stash pop` to reapply your stashed changes on top of it.  Resolve merge conflicts if necessary.  Finally, add, commit, and push your changes.

**Partner 1:** Make sure to pull the fresh changes from the repo.

In [None]:
#Partner 1


In [None]:
%%bash
source /home/bellcs1/miniconda3/etc/profile.d/conda.sh
conda activate git-python-workshop
nbdev_clean_nbs

In [None]:
#Partner 2


In [None]:
%%bash
source /home/bellcs1/miniconda3/etc/profile.d/conda.sh
conda activate git-python-workshop
nbdev_clean_nbs

## Working with git: changing commit messages
Sometimes, you make a mistake in your commit message locally and you want to change it!  Let's see how we can do this.  First, we'll make a small change to the code and do a commit on our local branch.

The change we'll make is to mutate a new column called `decade`.

In [None]:
#mutate new column decade


Use the commit message "closes issue #3"

#### Changing commit messages
If you use GitHub Issues, you may know that you can close issues with commits.  However, due to formatting issues, the above commit message would not close the issue!  You may catch this before you push up to the remote repository.  To change the last local commit message at the command line, type:

`git commit --amend -m "adds decade col to df and closes #3"`

This amends your most recent local commit with the commit message you specify using the `-m` option.

#### Changing commit message after pushing to the remote
You can also change the commit message after pushing to the remote, but this isn't a great idea for a collaborative repository.  Changing the commit message also changes the commit hash, so versions already pulled to your collaborators' local machines will have a different commit history.  Proceed at your own peril!  Descriptions of how to do this will not be provided here!
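To see the hash change for yourself, here is a minimal sketch that amends a commit in a throwaway repository.  It assumes `git` is installed and on your `PATH`.  The helper `git`, the file `notes.txt`, and the demo identity are all hypothetical names chosen for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run a git command in cwd and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as repo:
    git("init", cwd=repo)
    # Throwaway identity so the commit works without any global git config
    git("config", "user.name", "Workshop Demo", cwd=repo)
    git("config", "user.email", "demo@example.com", cwd=repo)

    Path(repo, "notes.txt").write_text("decade column\n")
    git("add", "notes.txt", cwd=repo)
    git("commit", "-m", "closes issue #3", cwd=repo)
    hash_before = git("rev-parse", "HEAD", cwd=repo)

    # Rewrite the message of the most recent commit
    git("commit", "--amend", "-m", "adds decade col to df and closes #3", cwd=repo)
    hash_after = git("rev-parse", "HEAD", cwd=repo)
    message_after = git("log", "-1", "--format=%s", cwd=repo)

print(hash_before != hash_after)
print(message_after)
```

The two hashes differ because the commit message is part of what git hashes, so an amended commit is a new commit as far as everyone else's history is concerned.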

## Working with git: Removing uncommitted changes

There are many scenarios in which you'd want to remove your uncommitted changes.  Maybe you've changed a file and your pursuits didn't work out.  Maybe you haven't made many relevant changes and just want to replace the code with the original.  Here's how you can do that!

Whoops, maybe we made the following change in a cell, incorrectly computing something.  Maybe there's a whole trail of errors and we just want to get rid of it, and replace it with the most recent local version in the repo.

**Try it yourself!**  Fill in the following cell with code to compute fictitious columns `x10` and `dist_mi` (cell 32 of reference)

Having decided that we want to trash these uncommitted changes, we can just run the following command:
`git checkout origin/master 10-git-python-solns`.  This will give us the version of the code from `origin/master` in our local git repo.  Run this command now and check that the previous cell is now empty.

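If we made LOTS of changes in lots of files and wanted to trash them all, we could instead reset to the branch of interest (here, the remote-tracking branch `origin/master`) with `git reset --hard origin/master`.  Be aware that this also discards any local commits that are not on `origin/master`.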

## Working with git: reverting commits
If you've noticed, the first commit we ever made was a mistake!  Decade should be divided by 10, not 100!  How can we undo this change?

We can use `git revert`.  This will create a patch that will undo that change and add it as a new commit.  To do this, we need to first find the commit hash in which we did this.  We can use `git log` to do this.

**Try it yourself!**
Find the commit that you want to revert using `git log`.  Read the commit messages; this is why it is extremely helpful to write good commit messages.  Find the first 8 characters of the commit hash.  In this example, the commit hash is `f845de23`.  Then type:

`git revert f845de23`

Now, enter a commit message and the bad commit is patched!

## Working with git: moving commits to other branches
Since most of our projects follow the GitHub workflow, we sometimes make all of our commits while accidentally on the wrong branch!  How can we fix this?

### Setting the stage
Let's first make a new branch using the command:
`git branch method_branch`

Let's make a commit.  Use cell #38 of the reference to do a `groupby` of the different methods and display the counts of each of the different types.  Add and commit your changes.  Then type `git branch`.
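If grouping is unfamiliar, a count per group might look like the following minimal sketch.  The toy frame and its values are made up for illustration, and the reference cell may be written differently.

```python
import pandas as pd

# Hypothetical toy data, not the planets dataset
toy = pd.DataFrame({
    "method": ["Transit", "Transit", "Imaging"],
    "year": [2009, 2010, 2011],
})

# Group the rows by method, then count the non-missing years in each group
counts = toy.groupby("method")["year"].count()

print(counts)
```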

In [None]:
#We can also return a particular column


Whoops, we were on the wrong branch when we made our commits!!  How can we change this?  We're going to use the `cherry-pick` command, which can make a new commit with the same changes, but to a different parent.  First, find the hash of the commit you want using `git log branch`.  Our command would be `git log master`.  Assuming our commit has hash `de45e3a8`, we can use the following commands:

`git checkout method_branch` checks out the new branch  
`git cherry-pick de45e3a8` gets the commit and adds it to the current branch  
`git checkout master` checks out the branch we accidentally committed to  
`git reset --hard HEAD^` deletes the most recent commit from the branch we're on (master)

Note that if you have uncommitted changes, `git reset --hard` will delete them.  Run `git stash` to save uncommitted changes if you have them and you want to keep them.

## Working with git: git history
`git reflog` records every change to where your branches and `HEAD` point (commits, checkouts, resets, merges, and so on), so you can almost always get back to an earlier state.  `git reflog`'s output shows the most recent actions first.  To roll back to an earlier state, you can type:

`git reset --hard HASH`

or

`git reset --hard HEAD@{number}`,

where `HASH` or `number` identifies the reflog entry just BEFORE the terrible action taken.