# 1. Introduction To Remote Repositories

One of the most useful ways to use Git is in conjunction with [GitHub](https://github.com/), a website built on Git that adds a familiar graphical interface. Using Git with GitHub allows us to push our code to remote repositories. This enables us to:

- Share our code with others and build a <span style="background-color: #196F3D; color:#E9F7EF">portfolio</span>.
- Collaborate with others on a project and build code together.
- Download and use code others have created.

[Here's](https://github.com/sandrofsousa/awesome-network-analysis) an example of a remote repository on GitHub. People can view your public repositories on your [GitHub profile](https://github.com/sandrofsousa). Using GitHub is a great way to build a portfolio and get recruiters to notice you.

Remote repositories aren't just useful for building a portfolio. Pushing to GitHub also allows us to collaborate with others on code. For example, thousands of different [contributors](https://github.com/torvalds/linux/graphs/contributors) are developing [Linux](https://github.com/torvalds/linux/) on GitHub. Many companies, including [Facebook](https://github.com/facebook) and [Google](https://github.com/google), also use GitHub to work on code projects across teams.

Remote repositories also enable us to access and use code we didn't write. For instance, [this repo](https://github.com/amzn/amazon-dsstne) will let us download Amazon's Deep Learning tools and start training models. Because the repository is public, anyone can download and use it. Repositories on GitHub can also be private, in which case they're hidden from everyone who hasn't been given access.

To download a remote repository to your own computer, you'll need to clone it. *Cloning* copies a repository from one location (in this case, a remote one) to a folder on your computer. The repository retains all of its Git history, and you can work with it just like you would with a Git repository you created yourself.

We use the [git clone](https://git-scm.com/docs/git-clone) command to clone a remote repository. If we were cloning a repository we found on GitHub, we'd specify the GitHub URL for that repository. We'll be able to edit files, add them to the staging area, and then commit the changes. The local version of the repo will then reflect the changes, but the remote version won't.

Review the following diagram carefully. It illustrates the relationship between the local repo and the remote repo, and how they're separate:

<img width="400" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0cUt4VTlZdjQtZ00">


Here's how we'd typically clone the [Amazon Deep Learning repo](https://github.com/amzn/amazon-dsstne) from GitHub:

>```git
>git clone https://github.com/amzn/amazon-dsstne.git
>```
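The separation between a clone and the repository it came from can be checked without any network access. The following is a minimal sketch, not part of the original exercise: it assumes Git 2.28 or newer (for `git init -b`), and the directory names `source` and `copy`, the demo identity, and the local path standing in for a GitHub URL are all hypothetical.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

# Identity for the demo commits, so the script does not rely on global Git config.
export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# Stand-in for a repository hosted on GitHub: a local repo with one commit.
git init -q -b master source
cd source
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
cd ..

# Clone it. A local path is used where a GitHub URL would normally go.
git clone -q source copy
cd copy
git log --oneline                 # the clone carries the existing history

# Commit a change in the clone only.
echo "Edited locally" >> README.md
git add README.md
git commit -q -m "Edit README locally"

# Count commits on each side: the clone has the new commit, the source does not.
local_count=$(git rev-list --count HEAD)
source_count=$(git -C ../source rev-list --count HEAD)
echo "commits in clone: $local_count, commits in source: $source_count"
```

The clone ends up with two commits while the source still has one, which matches the point made above: committing changes the local repo only.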



<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Clone the ["fast style transfer"](https://github.com/lengstrom/fast-style-transfer) project from GitHub to your local machine.
2. Show the commit history with <span style="background-color: #F9EBEA; color:##C0392B">git log</span>.

# 2. GitHub Integration

In the previous section and notebook, we explored the basics of Git version control. Now, we'll walk you through the process of setting Git up on your own machine and authenticating with GitHub. Afterwards, you'll be able to sync the changes you make to data science projects locally with GitHub. Finally, when you're ready, you can publish your code to GitHub and build a portfolio of projects for others to see.

### 2.1 Create a GitHub account

Use a web browser to navigate to GitHub and create an account. There are three main steps you'll have to complete:

- Create a personal account. Select a unique username and password and enter your email.
- Choose a plan. Select the free plan for now; it's enough for everything in this notebook. Your code is organized in repositories, and GitHub's free plan currently supports both public and private ones (private repositories used to require a paid plan). You can always upgrade to a paid plan later on.
- Read the GitHub [Hello World guide](https://guides.github.com/activities/hello-world/).


Complete the instructions in step 1 of the Hello World guide to create your first repository on GitHub.

<img width="800" alt="creating a repo" src="https://guides.github.com/activities/hello-world/create-new-repo.png">


### 2.2 Branches in a repository

Every Git repository consists of one or more <span style="background-color: #F9EBEA; color:##C0392B">branches</span>. Each branch contains a slightly different version of the code. The main branch of a Git repo has traditionally been called <span style="background-color: #F9EBEA; color:##C0392B">master</span>, which is the name this notebook uses; repositories created more recently, including new ones on GitHub, often call it main instead. Developers create separate branches when they want to work on new features for a project, then add the commits in those branches back into master when the features are ready.

We can check which branch we're on with the [git branch](https://git-scm.com/docs/git-branch) command. This command will list all of the branches in the repo. It will also highlight the currently active branch, and add an asterisk next to its name.
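As a rough illustration of what `git branch` reports, the sketch below builds a small repository and lists its branches. It assumes Git 2.28 or newer (for `git init -b`); the repository name `repo`, the branch name `new-feature`, and the demo identity are hypothetical, and `git branch <name>` is used only to have a second branch to list.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# A repo needs at least one commit before extra branches can be created.
git init -q -b master repo
cd repo
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"

# Create a second branch without switching to it.
git branch new-feature

# List all branches; the active one is marked with an asterisk.
git branch

# The name of the active branch on its own.
current=$(git rev-parse --abbrev-ref HEAD)
echo "active branch: $current"
```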

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Navigate to the DataScience repo created in the previous notebook.
2. Use the <span style="background-color: #F9EBEA; color:##C0392B">git branch</span> command to see the current branch of the project.


### 2.3 Pushing a repo to GitHub

Once we've made changes to the local version of a repo, we can push those changes to the remote repo so that everyone can see them. Edits we make locally are only reflected in our local repo. Unless we push them to the remote, the remote repo doesn't change.

To do this, we'll need to use the [git push](https://git-scm.com/docs/git-push) command, which pushes commits from our local repo to the remote repo. Here's a diagram showing what happens when we run <span style="background-color: #F9EBEA; color:##C0392B">git push</span>:


<img width="400" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0REJ0bjNfdGppUFE">

As the diagram shows, until we push the branch to the remote repo, the changes are only in our local repo. Pushing to the remote will update the remote with our latest changes. Anyone else who pulls from the remote repo will then have access to the same two commits that we have in our local repo.

When we run <span style="background-color: #F9EBEA; color:##C0392B">git push</span>, we need to specify both the name of the remote repo to push to, and the name of the branch we're pushing. When we clone a repo, Git automatically names the remote repo <span style="background-color: #F9EBEA; color:##C0392B">origin</span>. This means that the following command will push the <span style="background-color: #F9EBEA; color:##C0392B">master</span> branch to the remote repo:


<br>
<div class="alert alert-info" style="background-color:#F9F9F9; border: 0px">
<b>git push origin master</b>
</div>
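To watch a push change the remote, the sketch below uses a bare repository on the local disk in place of GitHub. This is an illustration under assumptions rather than the course's setup: it needs Git 2.28 or newer (for `git init -b`), and the names `seed`, `remote.git`, and `local`, along with the demo identity, are hypothetical.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# Build a one-commit repo and turn it into a bare "remote" (stand-in for GitHub).
git init -q -b master seed
cd seed
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
cd ..
git clone -q --bare seed remote.git

# Clone the remote; Git names it "origin" automatically.
git clone -q remote.git local
cd local

# Make a new commit locally.
echo "A second line" >> README.md
git add README.md
git commit -q -m "Edit README"
local_head=$(git rev-parse HEAD)

# Before the push, the remote's master still points at the old commit.
before=$(git --git-dir="$demo/remote.git" rev-parse master)

git push -q origin master

# After the push, the remote's master matches our local commit.
after=$(git --git-dir="$demo/remote.git" rev-parse master)
echo "local:  $local_head"
echo "remote before push: $before"
echo "remote after push:  $after"
```

The hash printed for the remote after the push is the same as the local one, because pushing transfers the commit itself rather than creating a new one.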


It's possible, but rare, that a remote will have a name other than <span style="background-color: #F9EBEA; color:##C0392B">origin</span>. If we're unsure, we can list the remote(s) associated with our local repo using [git remote](https://git-scm.com/docs/git-remote). The <span style="background-color: #F9EBEA; color:##C0392B">git remote</span> command will list all of the repo's remotes. If we specify the <span style="background-color: #F9EBEA; color:##C0392B">-v</span> option, we'll get additional information about where the remote repos are located.
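The behaviour of `git remote` can be tried out locally. The sketch below is only an illustration: it assumes Git 2.28 or newer (for `git init -b`), and the directory names `project`, `cloned`, and `fresh`, as well as the demo identity, are hypothetical. A local path plays the role of the GitHub URL.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# A small repo to clone from.
git init -q -b master project
cd project
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
cd ..

# A clone gets a remote called "origin" pointing at where it was cloned from.
git clone -q project cloned
cloned_remotes=$(git -C cloned remote)
echo "remotes in clone: $cloned_remotes"
git -C cloned remote -v           # also shows the fetch and push locations

# A repo created with git init has no remote until we add one.
git init -q -b master fresh
fresh_before=$(git -C fresh remote)
git -C fresh remote add origin "$demo/project"
fresh_after=$(git -C fresh remote)
echo "remotes in fresh repo after adding one: $fresh_after"
```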


<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Navigate to the DataScience repo created in the previous notebook.
2. Use the <span style="background-color: #F9EBEA; color:##C0392B">git remote</span> command to view information about the remote repos.
3. Push the local repository to the GitHub repo created in the previous section.
4. Open a browser and view the new files in your GitHub account.

>> ```git
>> git remote add origin https://github.com/<your_github_user>/hello-world.git
>> git push -u origin master
>> ```


# 3. Commits And The Working Directory

Let's take a closer look at the working directory and how it interacts with commits. As you may recall from the previous sections, the Git commit workflow has three main components:

- The working directory
- The staging area
- Commits

The working directory is the folder we're version controlling with Git, and the contents of the working directory are what we see when we list the contents of the folder with <span style="background-color: #F9EBEA; color:##C0392B">ls</span>. In our case, <span style="background-color: #F9EBEA; color:##C0392B">~/DataScience</span> is the working directory. We can edit the working directory by changing or adding files.

So let's say our working directory looks like this:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0MHJMazlCU0JRWG8">

In this example, we have one file named <span style="background-color: #F9EBEA; color:##C0392B">README.md</span> in the working directory. There are no files in the staging area, and no commits.

When we run <span style="background-color: #F9EBEA; color:##C0392B">git add</span>, Git takes the changes in our working directory that aren't in the most recent commit yet and adds them to the staging area, like this:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0cHVoVE13dEd0eWc">

When we run <span style="background-color: #F9EBEA; color:##C0392B">git commit</span>, we create a commit that contains all of the changes Git added to the staging area. The commit has a unique commit hash, so we can refer to it later. Note how making a commit removes all changes from the staging area:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0c09BR1FWelVDWFk">


We now have a commit with the hash <span style="background-color: #F9EBEA; color:##C0392B">53d</span>. This commit is a snapshot of the working directory at the moment it contained a file called <span style="background-color: #F9EBEA; color:##C0392B">README.md</span> that had the text <span style="background-color: #F9EBEA; color:##C0392B">This is a README!</span>.

Next, we can add a new file to the working directory, and edit <span style="background-color: #F9EBEA; color:##C0392B">README.md</span>. This will only affect the working directory, where we're making changes. The staging area, the existing commit, and the remote all stay as they are:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0X0JqYkppZ0VFYk0">

Then we can use <span style="background-color: #F9EBEA; color:##C0392B">git add</span> to stage our changes:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0R3l5LXJMZFRQSms">

In this case, Git adds both the new file (bot.py) and the changed file to the staging area. Then we can commit the changes:

<img width="1000" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0emtTWEpsOHozbzQ">

We now have two commits, each storing a snapshot of our working directory at a different point in time. 
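The walk-through above can be replayed on the command line, using `git status --short` to see what is untracked and what is staged at each step. This is a sketch under assumptions: it needs Git 2.28 or newer (for `git init -b`), and the directory name `workdir`, the demo identity, and the contents written to `bot.py` are hypothetical.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

git init -q -b master workdir
cd workdir

# One file in the working directory, nothing staged, no commits.
echo "This is a README!" > README.md
git status --short                # "?? README.md": untracked

# Stage it, then commit it.
git add README.md
git status --short                # "A  README.md": staged for the first commit
git commit -q -m "Add README"
after_commit=$(git status --short)    # empty: the commit cleared the staging area

# Second round: edit README.md and add a new file.
echo "An extra line" >> README.md
echo 'print("bot")' > bot.py
git add README.md bot.py
staged=$(git diff --cached --name-only)   # names of the staged files
echo "staged files:"
echo "$staged"
git commit -q -m "Add bot.py and edit README"

commit_count=$(git rev-list --count HEAD)
echo "number of commits: $commit_count"
```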


We can pull up the difference between two commits with the [git diff](https://git-scm.com/docs/git-diff) command by passing the two commit hashes as arguments to <span style="background-color: #F9EBEA; color:##C0392B">git diff</span>. To save typing time, we can also write just the first few characters of each hash, as long as they uniquely identify the commit (Git accepts as few as four, which is usually enough in a small repo). The order in which we pass the two hashes to <span style="background-color: #F9EBEA; color:##C0392B">git diff</span> determines whether changes appear as additions or deletions: Git shows what it takes to get from the first commit to the second.
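The effect of argument order is easiest to see by running the comparison in both directions. The sketch below assumes Git 2.28 or newer (for `git init -b`); the repository name `repo`, the demo identity, and the file contents are hypothetical, and `git rev-parse --short` is used here only to obtain abbreviated hashes to pass along.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# Two commits: the second one appends a line to README.md.
git init -q -b master repo
cd repo
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
echo "A second line" >> README.md
git add README.md
git commit -q -m "Extend README"

# Abbreviated hashes of the older and the newer commit.
first=$(git rev-parse --short HEAD~1)
second=$(git rev-parse --short HEAD)

# Older to newer: the appended line shows up as an addition ("+").
git diff "$first" "$second"

# Newer to older: the same line shows up as a deletion ("-").
git diff "$second" "$first"
```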


<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Navigate to the DataScience repo created in the previous notebook.
2. Use the <span style="background-color: #F9EBEA; color:##C0392B">git log</span> command to view the commit history.
3. Use <span style="background-color: #F9EBEA; color:##C0392B">git diff</span> with two commit hashes as arguments, for example:
>```git
>git diff ab5a 1f1f
>```




# 4. Switching To A Specific Commit

Now that we know about commit hashes, we can use them to switch to a specific commit. Switching between commits allows us to quickly move between different historical versions of a project. If we introduce a change that causes issues and want to revert to an earlier version, for example, switching between commits will let us do so.

Commit hashes are permanent; Git preserves them and includes them in transfers between the local repo and the remote repo. For instance, let's say we have two commits, <span style="background-color: #F9EBEA; color:##C0392B">c12</span> and <span style="background-color: #F9EBEA; color:##C0392B">c53</span>. The following diagram shows what happens to them as we clone, commit, and push.

<img width="500" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0TWpzcF9tS01Ea3M">

<span style="background-color: #F9EBEA; color:##C0392B">c12</span> originally existed on the remote, and when we cloned the repo, the commit kept the same hash. This is because the commit is the same in the remote and our local repo: the same changes were made to the same files.

When we changed a file and made a commit locally, Git gave it the hash <span style="background-color: #F9EBEA; color:##C0392B">c53</span>. When we pushed this commit to the remote later on, it kept the same hash because it was still the same commit. In the diagram above, both the local repo and the remote repo have two commits, <span style="background-color: #F9EBEA; color:##C0392B">c12</span> and <span style="background-color: #F9EBEA; color:##C0392B">c53</span>. We can switch between commits in the local repo without changing what commits are in the remote repo. We can do this with the [git reset](https://git-scm.com/docs/git-reset) command:

<img width="500" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0b1lYVnBHaHZQMG8">


The diagram shows the commit on the left, and a representation of our working directory on the right. If we type <span style="background-color: #F9EBEA; color:##C0392B">git reset --hard c12</span>, Git switches back to the commit with the hash <span style="background-color: #F9EBEA; color:##C0392B">c12</span>, and changes all of the files in the working directory so that they're exactly the same as the files in the commit. This will essentially let us rewind the repo to past commits if there are problems with more recent ones, or if we want to see what the project looked like at an earlier point in time.

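The <span style="background-color: #F9EBEA; color:##C0392B">--hard</span> flag resets both the working directory and the Git history to a specific state. If we used the <span style="background-color: #F9EBEA; color:##C0392B">--soft</span> flag instead, Git would skip making changes to the working directory and only reset the Git history, leaving the undone changes in the staging area. If we omitted the flag altogether, Git would fall back to its default mode, <span style="background-color: #F9EBEA; color:##C0392B">--mixed</span>, which also leaves the working directory alone but clears the staging area.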
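The difference between the two flags can be observed by resetting the same repository twice. What follows is a minimal sketch, assuming Git 2.28 or newer (for `git init -b`); the repository name `repo`, the demo identity, and the file contents are hypothetical, and full hashes from `git rev-parse` stand in for short ones like c12.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# Two commits: the second one appends a line to README.md.
git init -q -b master repo
cd repo
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
echo "A second line" >> README.md
git add README.md
git commit -q -m "Extend README"

first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)

# --soft: history moves back to the first commit, the file is left untouched.
git reset -q --soft "$first"
soft_content=$(cat README.md)         # still contains the second line
soft_status=$(git status --short)     # "M  README.md": the change is staged

# Return to the newest commit, then try --hard.
git reset -q --hard "$second"

# --hard: history moves back and the file is rewritten to match the commit.
git reset -q --hard "$first"
hard_content=$(cat README.md)         # only the original line remains
echo "after --hard: $hard_content"
```

Note that `--hard` discards uncommitted changes in the working directory, so it is worth checking `git status` before using it on real work.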

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Use the git log command to find the commit hash corresponding to the oldest commit in the DataScience repo.
2. Use git reset to reset the DataScience repo to the oldest commit. 
3. Explore <span style="background-color: #F9EBEA; color:##C0392B">script.py</span> and see what text it contains at each step.






# 5. Pulling From A Remote Repo

Now that we've reverted our local DataScience repo to an older version, the remote repo actually has a newer commit that our local repo doesn't have. This often happens when other people make changes to a project's code, and then push those changes to a remote repo. Here's a diagram showing which commits exist in which locations:

<img width="500" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0aGZUNVNKYXRDcnM">


When the latest commit in our local repo is older than the latest commit in the remote repo, we can use [git pull](https://git-scm.com/docs/git-pull) to update the current branch with the latest commits. The <span style="background-color: #F9EBEA; color:##C0392B">git pull</span> command will also update our working directory so that it has the same files as the latest commit.

In our case, we'll be updating the <span style="background-color: #F9EBEA; color:##C0392B">master</span> branch, because the <span style="background-color: #F9EBEA; color:##C0392B">DataScience</span> repo only has a single branch.
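The situation described in this section, a local repo that is one commit behind its remote, can be recreated and resolved locally. This sketch makes several assumptions: Git 2.28 or newer (for `git init -b`), a bare repository on disk standing in for GitHub, and hypothetical names `seed`, `remote.git`, and `local`. The `--ff-only` option is an addition of this sketch; it makes the pull stop instead of merging if the histories have diverged.

```shell
set -eu

# Throwaway sandbox; every name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"

export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# A "remote" with two commits (bare repo standing in for GitHub).
git init -q -b master seed
cd seed
echo "This is a README!" > README.md
git add README.md
git commit -q -m "Add README"
echo "A second line" >> README.md
git add README.md
git commit -q -m "Extend README"
cd ..
git clone -q --bare seed remote.git

# Clone it, then rewind the local repo by one commit.
git clone -q remote.git local
cd local
git reset -q --hard HEAD~1
behind_head=$(git rev-parse HEAD)

# Pull the missing commit from the remote's master branch.
git pull -q --ff-only origin master

# The local branch and working directory now match the remote again.
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$demo/remote.git" rev-parse master)
echo "local:  $local_head"
echo "remote: $remote_head"
cat README.md
```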

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

> **Description**: 
1. Pull the latest commits from the DataScience remote repo.
2. Inspect the working directory and Git history to see what happened.



# 6. Brief Summary of Commands

>1. git clone
2. git push 
3. git branch
4. git remote 
5. git reset 
6. git pull