## 1: Git Remotes

One of the most useful ways to work with git is together with GitHub, which lets you push your code to remote repositories. This enables you to:

- Share your code with others and build a portfolio.
- Collaborate with others on your project and build your code together.
- Download and use code others have created.

[Here's an example](https://github.com/VikParuchuri/evolve-music2) of a remote repository on GitHub. Your public repositories appear on your GitHub profile, which makes them a great way to build a portfolio and get noticed by recruiters.

Remote repositories aren't just useful for building a portfolio: pushing to GitHub also lets you collaborate with others on your code. For example, the Linux kernel, which has thousands of contributors, is developed with git, and a mirror of its repository is hosted on GitHub. Many companies, including Google and Facebook, also use GitHub to collaborate on their open source projects.

Remote repositories also let you access and use code you didn't write. For instance, [this repo](https://github.com/amznlabs/amazon-dsstne) lets you download DSSTNE, Amazon's deep learning library, and start training models. Since the repository is public, anyone can access, download, and use it. Repositories on GitHub can also be private, in which case they are hidden from everyone except the people you give access to.

To get a copy of a remote repository onto your own computer, you clone it. Cloning copies a repository from one location into a folder on your computer. The copy retains all of its git history, and you can work with it just like a git repository you created yourself.

To clone a remote repository, we use the git clone command. When cloning from GitHub, we pass it the GitHub URL of the repository.

Here's how we'd clone the Amazon deep learning repo from GitHub:

    git clone https://github.com/amznlabs/amazon-dsstne.git

https://github.com/amznlabs/amazon-dsstne.git is the URL of the git repository that we're cloning. This will automatically create a folder called amazon-dsstne in our current folder, and place the repository there.
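
To see what cloning does without touching GitHub, here is a minimal sketch that uses a local bare repository as a stand-in for the remote. It assumes a POSIX shell with git installed; the temporary folder and the names source and chatbot.git are hypothetical, and the GIT_AUTHOR_*/GIT_COMMITTER_* variables only supply an identity for the example commits.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)   # throwaway folder for the whole example
cd "$WORK"

# Build a small repo with two commits on a branch named master.
git init -q source
git -C source symbolic-ref HEAD refs/heads/master
echo "README" > source/README.md
git -C source add README.md
git -C source commit -q -m "Add the initial version of README.md"
echo "This project needs no installation!" >> source/README.md
git -C source commit -q -am "Update README.md"

# Publish it as a bare repo whose name ends in .git, like a GitHub URL.
git clone -q --bare source chatbot.git

# Clone it: git names the new folder chatbot, dropping the .git suffix.
git clone -q "$WORK/chatbot.git"
ls

# The clone carries the full history, not just the latest files.
git -C chatbot log --oneline
```

As with a GitHub URL ending in .git, the clone lands in a folder named chatbot, and git log inside it lists both commits.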

Since we're working with a simplified remote repository for the purposes of this mission, we'll clone a repository slightly differently:

    git clone /dataquest/user/git/chatbot

This will clone the repository from /dataquest/user/git/chatbot, a path on our local computer, to our current folder, and place it into a folder called chatbot.

If we specify a second argument to git clone, we can change the folder the repository is saved to:

    git clone /dataquest/user/git/chatbot silentbot

The above code will place the chatbot repository into a folder called silentbot.
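
Both forms of git clone can be tried with local paths. This sketch assumes a POSIX shell with git installed; the temporary folder stands in for /dataquest/user/git, and every name in it is hypothetical.

```shell
set -e
# Hypothetical identity so the example commit works without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for /dataquest/user/git/chatbot: an ordinary local repository.
git init -q chatbot
echo "README" > chatbot/README.md
git -C chatbot add README.md
git -C chatbot commit -q -m "Add README.md"

mkdir clones
cd clones
# One argument: the folder is named after the last part of the path.
git clone -q "$WORK/chatbot"
# Two arguments: the second one sets the destination folder.
git clone -q "$WORK/chatbot" silentbot
ls   # chatbot  silentbot
```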

    %%bash
    git clone https://github.com/amznlabs/amazon-dsstne.git

Output:

    Cloning into 'amazon-dsstne'...


## 2: Making Changes To Cloned Repositories 

Now that we've cloned a repository (or repo for short), we can make changes the same way we did in the last mission: edit files, add them to the staging area, and commit the changes. These changes are reflected in the local version of the repo, but not in the remote version.

Here's a diagram showing how the local repo and the remote repo are separate:

<img src="data/diagram.png"/>

After making the commit in the diagram, the local repo will have one more commit than the remote repo, and the file README.md will be different.
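
A minimal sketch of the diagram's situation, assuming a local bare repository (remote.git, a hypothetical name, as are seed and local) plays the role of the remote. After the commit, git rev-list --count counts the commits reachable from one ref but not the other, confirming that the local master is one commit ahead of origin/master while the remote is unchanged.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
cd "$WORK"

# Seed a bare repository with one commit; it stands in for the remote.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "README" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add the initial version of README.md"
git clone -q --bare seed remote.git

# Clone it, edit README.md, and commit, as in the diagram.
git clone -q remote.git local
cd local
echo "This project needs no installation!" >> README.md
git add README.md
git commit -q -m "Update README.md"

git rev-list --count origin/master..master   # prints 1: local is ahead
git rev-list --count master..origin/master   # prints 0: remote has nothing new
```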

README.md is a file you'll often see in GitHub projects. A README helps people understand what a project is about and how to install it. README files are commonly written in Markdown, a plain-text format that uses simple symbols to create headings, lists, links, and other structures. Markdown files use the .md extension.

As in the diagram above, we'll add a line to README.md, then commit it to the repository. When updating shared repositories, write informative commit messages so that others can figure out what each commit does without reading through the code. This becomes especially important when debugging code that many people work on.

    %%bash
    cd amazon-dsstne/
    ls
    # Append a line to README.md and print the file
    echo "appended line" >> README.md
    cat README.md
    # Stage only the changed file, then commit it with a descriptive message
    git add README.md
    git commit -m "Append a line to README.md"
    git status

Output:

    Dockerfile
    FAQ.md
    LICENSE
    NOTICE
    README.md
    benchmarks
    docs
    samples
    src
    talks
    tst


    # Amazon DSSTNE: Deep Scalable Sparse Tensor Network Engine

    DSSTNE (pronounced "Destiny") is an open source software library for training and deploying recommendation
    models with sparse inputs, fully connected hidden layers, and sparse outputs. Models with weight matrices
    that are too large for a single GPU can still be trained on a single host. DSSTNE has been used at Amazon
    to generate personalized product recommendations for our customers at Amazon's scale. It is designed for
    production deployment of real-world applications which need to emphasize speed and scale over experimental
    flexibility.

    DSSTNE was built with a number of features for production recommendation workloads:

    * **Multi-GPU Scale**: Training and prediction
    both scale out to use multiple GPUs, spreading out computation
    and storage in a model-parallel fashion for each layer.
    * **Large Layers**: Model-parallel scaling enables larger n

## 3: The Master Branch

In the last screen, when you ran git status, your output looked something like this:

    On branch master
    Your branch is ahead of 'origin/master' by 1 commit.
      (use "git push" to publish your local commits)

    nothing to commit, working directory clean

The first two lines mention the terms branch, master, and origin, all of which may be unfamiliar. We'll look at branch and master in this screen, and origin in the next screen.

Every git repository consists of one or more branches. Each branch is a slightly different version of the code. We'll dive into branches and how they work in the next mission; for now, the important fact is that the main branch of a git repo is typically called master (many newer repositories, including new GitHub repositories, name it main instead). Developers create separate branches when they want to work on new features for a project, then merge the commits from those branches back into master when the features are ready.

All of the changes we've made so far have been on the master branch of the chatbot repo. The master branch is usually the most up-to-date shared version of any code project.

We can check which branch we're on with git branch, which lists the local branches in the repo and marks the currently active one with an asterisk (*).

    %%bash
    cd amazon-dsstne/
    git branch

Output:

    * master
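
To see how git branch marks the active branch when a repo has more than one, here is a sketch in a throwaway repository; the branch name new-feature and the file contents are hypothetical. git rev-parse --abbrev-ref HEAD prints just the active branch's name, which can be handy in scripts.

```shell
set -e
# Hypothetical identity so the example commit works without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
git symbolic-ref HEAD refs/heads/master
echo "README" > README.md
git add README.md
git commit -q -m "Add README.md"

# Create a second branch (hypothetical name) without switching to it.
git branch new-feature
# Lists both branches; the * marks master, the active one.
git branch
# Prints only the active branch's name: master.
git rev-parse --abbrev-ref HEAD
```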


## 4: Pushing Changes To The Remote

Once you've made changes to the local version of the repo, you can push those changes to the remote repo so that your changes can be viewed by everyone. Changes you make locally are only reflected in your local repo. Unless you push these changes to the remote, the remote repo doesn't change.

To do this, you'll need to use the **git push** command, which pushes commits from your local repo to the remote repo. Here's a diagram showing what happens when you run git push:

<img src="data/diagram2.png"/>

As you can see, until you push the branch to the remote repo, the changes are only in your local repo. Once you push to the remote, it's updated with your latest changes. Anyone else who pulls from the remote repo will then have access to the same two commits as you do in your local repo.

When you run git push, you'll need to specify both the name of a remote to push to, and the name of a branch to push. When you clone a repo, git automatically names the remote repo origin. This means that the following command will push the master branch to the remote repo:

    git push origin master
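
Since pushing to a GitHub repo you don't own fails, the push workflow is easiest to try end to end with a local bare repository standing in for the remote. A sketch, assuming a POSIX shell with git installed; the names seed, remote.git, and local are hypothetical.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
cd "$WORK"

# Seed a bare repository with one commit; it acts as the remote.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
echo "README" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "Add the initial version of README.md"
git clone -q --bare seed remote.git

# Clone (git names the remote origin), commit, then push master to origin.
git clone -q remote.git local
cd local
echo "This project needs no installation!" >> README.md
git commit -q -am "Update README.md"
git push -q origin master

# The remote's master now points at the same commit as the local one.
test "$(git rev-parse HEAD)" = "$(git -C ../remote.git rev-parse master)"
echo "remote is up to date"
```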

It's possible, but rare, that a remote will have a name other than origin. In cases where you're unsure, you can list remotes using **git remote**.

git remote will list all of the remotes. If you specify the -v option, you'll get additional information about where the remote repos are located.
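
A small sketch of git remote and git remote -v, assuming an empty local bare repository as the remote. The second remote, backup, is a hypothetical name added only to show that remotes other than origin are listed too; git remote add just records the URL and does not contact it.

```shell
set -e
WORK=$(mktemp -d)
cd "$WORK"

# An empty bare repo is enough to stand in for the remote here.
git init -q --bare remote.git
git clone -q remote.git local
cd local

# A second remote under a hypothetical name; nothing is contacted yet.
git remote add backup "$WORK/backup.git"

git remote      # names only: backup and origin
git remote -v   # each name with its URL, once for fetch and once for push
```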

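    %%bash
    cd amazon-dsstne/
    # List the remotes; a fresh clone has one, named origin
    git remote
    # We don't have permission to push to amznlabs/amazon-dsstne, so the push
    # is commented out. Running it produced the 403 error in the output below.
    #git push origin master

Output:

    origin

    remote: Permission to amznlabs/amazon-dsstne.git denied to austinmw.
    fatal: unable to access 'https://github.com/amznlabs/amazon-dsstne.git/': The requested URL returned error: 403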


## 5: Viewing Individual Commits

As you may recall from the previous mission, git stores the history of the repo as a series of commits. Each commit is a snapshot of all the tracked files in the working directory at a specific point in time, along with a reference to the commit before it (its parent). Files that didn't change are stored only once, which keeps the full history compact, and git computes the difference between any two commits when you ask for it. The working directory is the folder on your computer where you edit files, add the changes, and make commits. Commits are stored separately from the working directory, so git can use them to reconstruct the working directory as it was at any point in its history.
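
To check what a commit actually contains, git cat-file -p prints a raw git object. This sketch builds a two-commit repository in a temporary folder (names and contents are hypothetical) and prints the latest commit, which points to a tree (the snapshot of the tracked files) and to its parent commit, and then prints that tree.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
echo "README" > README.md
git add README.md
git commit -q -m "Add the initial version of README.md"
echo "This project needs no installation!" >> README.md
git commit -q -am "Update README.md"

# The latest commit object: a tree line (the snapshot of all tracked files),
# a parent line (the previous commit), author/committer, and the message.
git cat-file -p HEAD
# The tree itself: one entry pointing to the blob that stores README.md's contents.
git cat-file -p 'HEAD^{tree}'
```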

You can see the full commit history of the master branch of the local chatbot repo with git log. Here's the output you might get from git log:

    commit 6a95e94ea10caa28013b767510d4bc59369d83fa                                 
    Author: Dataquest <me@dataquest.io>                                             
    Date:   Wed May 18 21:56:27 2016 +0000                                          

        Updated README.md                                                           

    commit 8a1ca35dd5c5de8f93aa6cbbd153caa40233386c                                 
    Author: Dataquest <me@dataquest.io>                                             
    Date:   Wed May 18 21:55:33 2016 +0000                                          

        Add the initial version of README.md    

This history shows two commits: the first with the message Add the initial version of README.md, and the second with the message Updated README.md (git log lists the newest commit first). The great thing about git is that it stores both commits, so we can easily go back to a previous commit if we want to.
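
Because every commit is kept, an older version of a file stays retrievable. In this sketch (a throwaway repository with hypothetical contents), git log --oneline lists the two commits newest first, and git show HEAD~1:README.md prints README.md as it was in the previous commit without changing the working directory; HEAD~1 means the parent of the current commit.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
echo "This is a README file." > README.md
git add README.md
git commit -q -m "Add the initial version of README.md"
echo "This project needs no installation!" >> README.md
git commit -q -am "Updated README.md"

# One line per commit, newest first.
git log --oneline
# README.md as it was in the previous commit; the working directory is untouched.
git show HEAD~1:README.md
```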

In order to do this, we'll need to use the hash of the commit. The hash is a unique identifier for each commit, and allows us to perform operations like reverting to a specific commit. In the above output, the first commit has the id 8a1ca35dd5c5de8f93aa6cbbd153caa40233386c, and the second commit has the id 6a95e94ea10caa28013b767510d4bc59369d83fa.
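
Hashes can also be obtained from the command line. A sketch in a throwaway repository, assuming the default SHA-1 object format (where a hash is 40 hexadecimal characters): git rev-parse resolves a name like HEAD to its full hash, and --short gives an abbreviated prefix, which git accepts wherever a hash is expected as long as it is unambiguous.

```shell
set -e
# Hypothetical identity so the example commit works without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
echo "README" > README.md
git add README.md
git commit -q -m "Add the initial version of README.md"

# Full hash of the latest commit.
HASH=$(git rev-parse HEAD)
echo "$HASH"
# An abbreviated, still-unique prefix of the same hash.
SHORT=$(git rev-parse --short HEAD)
echo "$SHORT"
# Both forms name the same commit.
test "$(git rev-parse "$SHORT")" = "$HASH"
```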

We can use the git show command, along with a hash, to see what changed in a specific commit. Running git show 6a95e94ea10caa28013b767510d4bc59369d83fa in the above example would result in:

    commit 6a95e94ea10caa28013b767510d4bc59369d83fa                                 
    Author: Dataquest <me@dataquest.io>                                             
    Date:   Wed May 18 21:56:27 2016 +0000                                          

        Updated README.md                                                           

    diff --git a/README.md b/README.md                                              
    index f4871de..9c05964 100644                                                   
    --- a/README.md                                                                 
    +++ b/README.md                                                                 
    @@ -1,3 +1,3 @@                                                                 
     README

    -This is a README file.  It's typical for Github projects to have a README.  A README gives information about what the project is about, and usually how to install and use it.
    \ No newline at end of file                                                     
    +This is a README file.  It's typical for Github projects to have a README.  A README gives information about what the project is about, and usually how to install and use it.This project needs no installation!
    
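This indicates that README.md was changed in this commit: the sentence This project needs no installation! was appended to the existing line. git shows this as the old line being removed (prefixed with -) and the new version of the line being added (prefixed with +). a/README.md is the file state before the commit, and b/README.md is the file state after the commit.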
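
The diff above can be reproduced in a throwaway repository. This sketch assumes printf is used to write README.md without a trailing newline, mirroring the "No newline at end of file" marker; the file contents are hypothetical. git show --numstat summarizes the change as one line added and one deleted, matching the - and + lines in the full diff.

```shell
set -e
# Hypothetical identity so the example commits work without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
# printf writes the line without a trailing newline.
printf 'This is a README file.' > README.md
git add README.md
git commit -q -m "Add the initial version of README.md"
printf 'This is a README file.This project needs no installation!' > README.md
git commit -q -am "Updated README.md"

# Full diff for README.md: the old line with -, the edited line with +.
git show HEAD -- README.md
# Summary: lines added, lines deleted, file name (1, 1, README.md).
git show --numstat --format= HEAD
```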

git show displays its output in a pager, which lets you scroll through long output. Type q to exit the pager.
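
When you'd rather skip the pager, for example in notebook cells, git's global --no-pager option prints straight to the terminal (git already skips the pager when its output isn't going to a terminal). A sketch in a throwaway repository with hypothetical contents; -s hides the diff, and --format picks which header fields to print.

```shell
set -e
# Hypothetical identity so the example commit works without any git config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

WORK=$(mktemp -d)
git init -q "$WORK/repo"
cd "$WORK/repo"
echo "README" > README.md
git add README.md
git commit -q -m "Add README.md"

# Print the full commit and diff directly, without opening a pager.
git --no-pager show HEAD
# -s hides the diff; %h is the short hash, %an the author, %s the subject.
git --no-pager show -s --format='%h %an %s' HEAD
```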
