## 1: Version Control Systems

When you work on a team, you and your teammates will often change the same files. Imagine you're working on a project to make a Python script, and have a folder with the following two files:

    script.py
    README.md

Here are the contents of script.py:

    if __name__ == "__main__":
        print("Welcome to a script!")
    
Imagine that you and a coworker are both working on the project at the same time. You modify script.py like this:

    if __name__ == "__main__":
        print("Welcome to a script!")
        print("Here's my amazing contribution to this project!")
    
And your coworker does this:

    import math
    print(10 + 10)
    if __name__ == "__main__":
        print("Welcome to a script!")
    
Imagine you both have the folder on your local machine. To modify files, you make changes, then upload the entire folder to a centralized location, like Dropbox or Google Drive, to enable collaboration. Without a distributed version control system, whoever uploads last overwrites the other person's changes. This quickly becomes frustrating and unmanageable as the codebase grows. What if the folder had 100 files, and you modified 10 while your coworker modified 30 at the same time? You don't want to lose your changes every time your coworker uploads their version of the folder. Now, imagine that instead of just you and a coworker, it's a project with 10 or 100 contributors.

Companies face this problem every day, which is why distributed version control systems exist. With a distributed version control system, software will "merge" changes together intelligently, and enable multiple developers to work on a project at the same time.

Going back to the script.py file, if we intelligently merged the two versions, it would end up looking like this:

    import math
    print(10 + 10)
    if __name__ == "__main__":
        print("Welcome to a script!")
        print("Here's my amazing contribution to this project!")
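
To make the idea of merging concrete, here is a minimal sketch of a line-based merge written with Python's standard `difflib` module. It is not how Git itself merges; the function names `changed_regions` and `merge_lines` are made up for this illustration, and the sketch simply refuses to merge when both people edit the same spot.

```python
import difflib


def changed_regions(base, other):
    """Return (start, end, new_lines) for each span of base that other rewrote."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge_lines(base, mine, theirs):
    """Combine two sets of edits to base; raise ValueError if they collide."""
    # Using a set drops edits that both sides made identically.
    edits = sorted(set(changed_regions(base, mine)) | set(changed_regions(base, theirs)))
    merged, position, last_start = [], 0, None
    for start, end, new_lines in edits:
        # Overlapping spans, or two different edits at the same spot, are conflicts.
        if start < position or start == last_start:
            raise ValueError(f"conflicting edits near line {start + 1}")
        merged.extend(base[position:start])  # untouched lines before this edit
        merged.extend(new_lines)             # the lines one side wrote
        position, last_start = end, start
    merged.extend(base[position:])           # untouched lines after the last edit
    return merged


# The three versions of script.py from the example above.
base = ['if __name__ == "__main__":', '    print("Welcome to a script!")']
mine = base + ['    print("Here\'s my amazing contribution to this project!")']
theirs = ["import math", "print(10 + 10)"] + base

merged = merge_lines(base, mine, theirs)
print("\n".join(merged))
```

Because your change touches the end of the file and your coworker's touches the beginning, the two edits don't collide and both survive. If you had both rewritten the same line, the sketch would raise an error, which roughly corresponds to what Git calls a merge conflict.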
    
There are several version control systems, including Mercurial, which is distributed like Git, and Subversion, which is centralized. However, Git is by far the most popular.

Git is a command line tool that we can access by typing git in the shell. The first step in using Git is to initialize a folder as a repository. A repository tracks multiple versions of the files in the folder, and enables collaboration.

You can initialize a repository by typing git init inside the folder you want to initialize as a repository.

In [1]:
%%bash
# create new directory
mkdir random_numbers
cd random_numbers

# initialize as git directory
git init

Initialized empty Git repository in /Users/austin/OneDrive/Code/data_sci/dataquest/3_The_Command_Line/Git_And_Version_Control/random_numbers/.git/


## 2: The .git Folder

Initializing a Git repository will create a folder called .git inside the repository folder. There should now be a folder called .git inside our random_numbers folder. Typically, when folders and files are prefixed with a period (.), they are hidden: they don't show up by default when you list the files in the folder.

Let's verify that it's there with ls -al. As you may recall, the -a flag will show everything in a folder, even entries whose names start with a period (.).

In [2]:
%%bash
cd random_numbers/

# show that .git directory has been created
ls -al

total 0
drwxr-xr-x  3 austin  staff  102 Mar 14 19:20 .
drwxr-xr-x  5 austin  staff  170 Mar 14 19:20 ..
drwxr-xr-x  9 austin  staff  306 Mar 14 19:20 .git
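
The same check can be sketched in Python. This example builds a throwaway folder (the layout is invented for illustration, not the real random_numbers folder) and relies on the fact that `os.listdir` returns dot-prefixed entries, so hiding them is purely a matter of filtering on the name.

```python
import os
import tempfile

# Build a temporary folder containing a hidden .git folder and a visible file.
with tempfile.TemporaryDirectory() as folder:
    os.mkdir(os.path.join(folder, ".git"))
    open(os.path.join(folder, "README.md"), "w").close()

    entries = sorted(os.listdir(folder))
    hidden = [name for name in entries if name.startswith(".")]
    visible = [name for name in entries if not name.startswith(".")]

print("hidden:", hidden)
print("visible:", visible)
```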


## 3: Creating Some Files

Git works on the principle of adding files, making changes, then storing a checkpoint of those changes. These checkpoints are called commits.

Each commit records a snapshot of the project at that moment. Git doesn't store a fresh copy of every file in every commit, though: files that haven't changed are reused from earlier commits, and stored data is compressed, in part by saving only the differences between similar versions. When you compare two commits, Git shows you the diff, or the difference between the file in one commit and the next commit.

For example, if you created a file called README.md with this content:

    Welcome to my readme!

and then made a commit with it, Git would store the file. Let's say you later added another line to the file and made another commit:

    Welcome to my readme!
    Here's another line.

The diff between the two commits would be the single added line, Here's another line. Every project is a sequence of commits. Commits give us a powerful way to merge changes together from others, and to rewind time and reset to an earlier state of the repository.
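
To see what such a difference looks like, here is a small sketch using Python's standard `difflib` module on the two README.md versions above. The variable names and the `a/` and `b/` labels are chosen for illustration; Git computes its diffs with its own implementation.

```python
import difflib

# The README.md contents in the first and second commit, one string per line.
first_commit = ["Welcome to my readme!"]
second_commit = ["Welcome to my readme!", "Here's another line."]

diff = list(
    difflib.unified_diff(
        first_commit,
        second_commit,
        fromfile="a/README.md",
        tofile="b/README.md",
        lineterm="",
    )
)
print("\n".join(diff))

# Added lines start with a single "+"; the "+++" header line is not content.
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
```

Lines that start with a space are unchanged context, and lines that start with `+` were added.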

Before we make a commit, let's add some files to our folder.

In [6]:
%%bash
cd random_numbers/

# create README file
echo "Random number generator" > README.md
ls
cat README.md

README.md
Random number generator


In [8]:
%%bash
cd random_numbers/

# create file script.py
echo -e "if __name__ == \"__main__\":\n\tprint(\"10\")" > script.py
cat script.py

if __name__ == "__main__":
	print("10")


## 4: Git Status

Files can have three states in Git:

- committed -- the current version of the file has been added to a commit, and is stored by Git.
- staged -- the file is currently staged for the next commit, but isn't yet stored by Git.
- modified -- the file has been modified since the last commit, but isn't staged yet.

After we make any changes to a Git repository, we can run git status to see which state each file in the repository is in. Any files that don't show up in git status are in the committed state. A new file that has never been added to the repository shows up as untracked until we stage it.

Git will automatically show us which files have been modified since the last commit. If we're ready to commit the modified files, we can add them to the staging area using git add. Typing git add script.py will add script.py to the staging area, where it will be staged for the next commit.
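
As a rough illustration of these states, the sketch below classifies lines in the short format printed by `git status --porcelain`, where the first character describes the staging area and the second describes the working folder. The function name `classify` and the sample lines (including notes.txt) are invented for this example, and the sketch ignores cases such as renames.

```python
def classify(porcelain_line):
    """Map one line of `git status --porcelain` output to (path, state)."""
    worktree_status = porcelain_line[1]  # second column: working folder
    path = porcelain_line[3:]            # path starts after "XY "
    if porcelain_line.startswith("??"):
        return path, "untracked"         # Git has never been told about this file
    if worktree_status != " ":
        return path, "modified"          # changed, and the change isn't staged
    return path, "staged"                # change is queued for the next commit


# Hypothetical sample output, not taken from the repository in this lesson.
sample_output = ["?? README.md", " M script.py", "A  notes.txt"]
states = dict(classify(line) for line in sample_output)
print(states)
```

A file can have both staged and unstaged changes at once; this simplified sketch reports such a file as modified.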

In [16]:
%%bash
cd random_numbers/

# check status of the repo (already committed files don't show up)
git status

On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	README.md
	script.py

nothing added to commit but untracked files present (use "git add" to track)


In [18]:
%%bash
cd random_numbers/

# stage files for commit
git add script.py README.md    # to unstage files: git reset <file>

## 5: Configuring Git

Before we can make our first commit, we need to tell Git who we are so it can store that information along with the commit. This ensures that different team members can tell who made which commit.

We can do this by running git config. With the --global flag, this only needs to be run once per user account on a computer, as Git saves your details.

Git needs two pieces of information about you -- your email and your name. You can configure your email with:

    git config --global user.email "your.email@domain.com"

You can configure your name with:

    git config --global user.name "Your name"

In [19]:
%%bash
cd random_numbers/

# configure email
git config --global user.email "fake.email@domain.com"
# configure name
git config --global user.name "Fake Name"

## 6: Committing

Now that we have files that are staged, we can make our first commit. A commit is a way to store a snapshot of the files in the folder at a certain point in time. By building a history of these snapshots, we can easily rewind to an earlier point in time, or merge someone else's changes to files with ours.

To make a commit, just use git commit -m "Commit message here". It's customary to make the commit message something informative, so if you do have to rewind or merge code, it's obvious what changes were made when.
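
The idea of a history of snapshots can be sketched with a toy model in Python. The class `ToyHistory` and its methods are hypothetical names for this illustration only; Git's real storage is far more efficient and works differently.

```python
import copy


class ToyHistory:
    """A toy commit history: each commit keeps a full copy of the files."""

    def __init__(self):
        self.commits = []  # oldest commit first

    def commit(self, message, files):
        # Copy the files so later edits don't change the stored snapshot.
        self.commits.append({"message": message, "files": copy.deepcopy(files)})

    def log(self):
        # Newest commit first.
        return [entry["message"] for entry in reversed(self.commits)]

    def snapshot(self, index):
        # Rewind: return the files exactly as they were in that commit.
        return copy.deepcopy(self.commits[index]["files"])


history = ToyHistory()
files = {"script.py": 'if __name__ == "__main__":\n\tprint("10")\n'}
history.commit("Initial commit", files)

files["script.py"] = 'if __name__ == "__main__":\n\tprint("20")\n'
history.commit("Print 20 instead of 10", files)

print(history.log())
print(history.snapshot(0)["script.py"])
```

Informative messages pay off here: the log is the only description you have of what each snapshot contains.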

In [20]:
%%bash
cd random_numbers/

# make commit
git commit -m "Initial commit. Added script.py and README.md"

[master (root-commit) e620a15] Initial commit. Added script.py and README.md
 2 files changed, 3 insertions(+)
 create mode 100644 README.md
 create mode 100644 script.py


## 7: File Differences

Let's modify our files and make another commit to see how the process works. Before files are placed in the staging area, you can use **git diff** to see the line differences between the current versions of files in the folder, and the versions in the last commit. You can scroll up and down with the arrow keys, and exit git diff with the q key. If you want to see the differences after files are staged, you can use **git diff --staged**.

In [27]:
# %load random_numbers/script.py
if __name__ == "__main__":
	print("10")

10


In [30]:
%%writefile random_numbers/script.py

# change script to random number generator
if __name__ == "__main__":
	import numpy as np
	print(np.random.randint(10))

Overwriting random_numbers/script.py


In [33]:
%%bash
cd random_numbers/

# check difference between last commit
git diff

echo
# check status
git status

diff --git a/script.py b/script.py
index ca99880..c8aae0d 100644
--- a/script.py
+++ b/script.py
@@ -1,2 +1,5 @@
+
+# change script to random number generator
 if __name__ == "__main__":
-	print("10")
+	import numpy as np
+    print(np.random.randint(10))
\ No newline at end of file

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   script.py

no changes added to commit (use "git add" and/or "git commit -a")


## 8: Making A Second Commit

Now that we have a modified file, we can add the changes to the staging area using git add script.py, and then commit them using git commit.

In [34]:
%%bash
cd random_numbers/

# restage and commit
git add script.py
git commit -m "second commit"

[master 33391c4] second commit
 1 file changed, 4 insertions(+), 1 deletion(-)


## 9: Looking At The Commit History

You can look at the commit history of a repository using the git log command. This will show you a list of all the commits to the repository, most recent first. If the output is very long, it will allow you to scroll. You can scroll the log with the up and down arrows, and use the q key to exit.

In [35]:
%%bash
cd random_numbers/

# check commit history
git log

commit 33391c4179d3f17c2195538f2c6c313ff2fd106a
Author: Fake Name <fake.email@domain.com>
Date:   Tue Mar 14 19:53:00 2017 -0400

    second commit

commit e620a153b02b0bbb0da422c8613a58224ffc2159
Author: Fake Name <fake.email@domain.com>
Date:   Tue Mar 14 19:42:07 2017 -0400

    Initial commit. Added script.py and README.md


## 10: Seeing Commit Differences

You can use git log --stat to see more details about the commits in the git log output: for each commit, it lists the files that changed and how many lines were inserted and deleted.

In [36]:
%%bash
cd random_numbers/

# details about commits
git log --stat

commit 33391c4179d3f17c2195538f2c6c313ff2fd106a
Author: Fake Name <fake.email@domain.com>
Date:   Tue Mar 14 19:53:00 2017 -0400

    second commit

 script.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

commit e620a153b02b0bbb0da422c8613a58224ffc2159
Author: Fake Name <fake.email@domain.com>
Date:   Tue Mar 14 19:42:07 2017 -0400

    Initial commit. Added script.py and README.md

 README.md | 1 +
 script.py | 2 ++
 2 files changed, 3 insertions(+)
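
The insertion and deletion counts can be approximated with Python's standard `difflib` module. This is only a sketch: `line_stats` is a name made up for this example, and Git's own diff algorithm may count differently in other cases. The two lists hold the old and new versions of script.py, one string per line, with tab indentation.

```python
import difflib


def line_stats(old_lines, new_lines):
    """Count (insertions, deletions) needed to turn old_lines into new_lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    insertions = deletions = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            deletions += i2 - i1   # lines removed from the old version
            insertions += j2 - j1  # lines added in the new version
    return insertions, deletions


old_script = ['if __name__ == "__main__":', '\tprint("10")']
new_script = [
    "",
    "# change script to random number generator",
    'if __name__ == "__main__":',
    "\timport numpy as np",
    "\tprint(np.random.randint(10))",
]

insertions, deletions = line_stats(old_script, new_script)
print(f"{insertions} insertions(+), {deletions} deletions(-)")
```

Note that a changed line counts as one deletion plus one insertion, which is why replacing the single print line contributes to both totals.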
