# Concept: Git and GitHub


## Description

This concept gives you an insight into version control with Git and then walks you through the entire workflow with GitHub.


## Outline
- Introduction to version control systems (both regular and distributed)
- What is Git and Github?
- Working with Git with practical examples

## Pre-requisites
- Motivation

## Learning Outcomes
- Basics of version control
- Version control with Git
- Creating a repository on GitHub and pushing changes

## Chapter 1: Version Control System

### Description: This chapter covers everything you need to know about version control systems

### 1.1 Introduction to Version Control System

***

Before diving into what a version control system is, let's look at a scenario in which you may have unknowingly cooked up your own version control system!


**SCENARIO**

In a directory you have the following files:

- **Important_first.txt**
- **Important_march.txt**
- **Important_june.txt**
- **Important_august.txt**
- **Important_december.txt**

Looks familiar? What is happening here is that you modified the initial file **Important_first.txt** several times and saved each modified/updated version under a new name. The month names were added after the *_* symbol so that you can track/retrieve those versions using a clever label (the month of the year).

Pretty cool, right? You have now successfully created a system by means of which you can track and retrieve versions of the files over time. This is what a **Version Control System** does.


**Why do you need a Version Control System?**

This scheme of renaming files seems to meet your requirements. But manually tracking files by renaming them becomes very painful, especially in a software project containing hundreds of thousands of files. Large, fast-changing projects with many authors need a Version Control System to track changes and avoid general chaos.


**Scenario without Version Control System**

Imagine a team of 4 people developing software without using version control. Since they have tasks assigned, each of them writes some code and then merges it with the others' work.
<img src='../images/vcs.gif'>
As in the image, let's say that Bob finishes a module, and then Carol also finishes hers. When they combine their work, some merge conflicts arise (for now, you can think of merge conflicts as problems in merging the code or integrating the modules). Then Alice finishes her allotted work and tries to merge, and Ted completes his portion and tries to integrate it too.
You can easily imagine the mess that is created. Version control systems are used to prevent situations like these.




**Basic Functions of Version Control System**

Some of the most basic and important functions that a Version Control System performs are:
- **Backup and Restore**: Files are saved as they are edited, and you can jump to any moment in time. Need that file as it was last month? No problem.
- **Synchronization**: Lets people share files and stay up-to-date with the latest version.
- **Short-term undo**: Playing around with a file and messed it up? (That’s just like you, isn’t it?). Throw away your changes and go back to the *last known good* version.
- **Track Changes**: As files are updated, you can leave messages explaining why the change happened (stored in the VCS, not the file). This makes it easy to see how a file is evolving over time, and why.
- **Track Ownership**: A VCS tags every change with the name of the person who made it. Helpful for giving credit.
- **Sandboxing**: Making a big change? You can make temporary changes in an isolated area, test and work out the kinks before incorporating your changes.
- **Branching and merging**: A larger sandbox. You can branch a copy of your code into a separate area and modify it in isolation (tracking changes separately). Later, you can merge your work back into the common area.


From the next topic onwards, and throughout the rest of this concept, you will learn specifically about the Git version control system and also get introduced to GitHub.

### 1.2 Understanding Git

***

**What is Git and why use it?**

In the previous topic you understood the need for a Version Control System. Git is exactly such a system (specifically, a distributed VCS). It is also free and open source. **Essentially, it's a system that allows you to record changes to files over time, so that you can view specific versions of those files later on**. Over time, Git has become an industry standard for development. Being able to snapshot your code at a specific time is incredibly helpful as your codebase grows and you have to reference previous versions of it.

For example, when you edit a text file, Git can help you determine exactly what changed, who changed it, and why. Git isn't the only version control system out there, but it's by far the most popular.


**Difference between Git and other version control systems**

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about data. Conceptually, most other systems store information as a list of file-based changes. These other systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they store as a set of files and the changes made to each file over time (this is commonly described as delta-based version control).

<img src='../images/delta.png'>

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
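
To make the snapshot idea concrete, here is a minimal Python sketch of a content-addressed store. It is a toy model and not Git's actual implementation: the names `ObjectStore`, `put` and `snapshot` are made up for illustration, and real Git additionally compresses objects and stores directory trees. What it shows is why an unchanged file costs nothing extra: its content hashes to a key that is already stored.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}  # maps content hash -> file content

    def put(self, content: str) -> str:
        # The key is derived from the content itself, so an unchanged
        # file always maps to an object that is already stored.
        key = hashlib.sha1(content.encode("utf-8")).hexdigest()
        self.objects[key] = content
        return key

    def snapshot(self, files: dict) -> dict:
        # A snapshot records, for every file name, the hash of its content.
        return {name: self.put(content) for name, content in files.items()}


store = ObjectStore()

# First commit: two files.
snap1 = store.snapshot({"a.txt": "Milk\n", "b.txt": "Eggs\n"})

# Second commit: only a.txt changed.
snap2 = store.snapshot({"a.txt": "Milk\nSoup\n", "b.txt": "Eggs\n"})

# b.txt is unchanged, so both snapshots link to the same stored object.
print(snap1["b.txt"] == snap2["b.txt"])  # True
print(len(store.objects))                # 3 (not 4)
```

Two snapshots of two files each would need four stored objects if every file were saved again; here only three exist because `b.txt` is shared.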

<img src='../images/git.png'>


**Git lingo**

Let's understand some terminology that is important for properly understanding the workflow of Git. The terms are described below:

- **Snapshot**: A record of all your files at a given point in time, so that you can look them up later. Snapshots are how Git tracks your code history.
- **Commit**: The act of creating a snapshot is called a commit. The stored snapshot itself is also referred to as a commit.
- **Repository**: The location where all your files and their version history are stored.
- **Head**: The reference to the commit you currently have checked out, usually the most recent commit on the current branch, is called HEAD.
- **Branches**: Git follows a sort of tree-like analogy for keeping track of code when several people are collaborating on a project. The general procedure is to make a branch from the master branch, make the changes there, and then request that the code be merged into the master branch. All commits live on some branch, and there can be many branches in a single project repository. Refer to the image below to get a visual understanding of branches. <img src='../images/branch.png'>
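
The relationship between these terms can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Git's internals: `Commit` and `Repo` are hypothetical names, a commit holds only a message and a link to its parent, a branch is just a name pointing at a commit, and `head` is the name of the branch currently checked out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # previous snapshot in the history


class Repo:
    """Toy repository: branches are names that point at commits."""

    def __init__(self):
        self.branches = {"master": None}
        self.head = "master"  # name of the branch currently checked out

    def commit(self, message: str) -> Commit:
        new = Commit(message, parent=self.branches[self.head])
        self.branches[self.head] = new  # the current branch moves forward
        return new

    def branch(self, name: str) -> None:
        # A new branch starts at whatever commit HEAD points to now.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        self.head = name

    def log(self) -> list:
        # Walk from the current commit back through its parents.
        messages, current = [], self.branches[self.head]
        while current is not None:
            messages.append(current.message)
            current = current.parent
        return messages


repo = Repo()
repo.commit("add Milk")
repo.branch("experiment")
repo.checkout("experiment")
repo.commit("add Rice")

print(repo.log())  # ['add Rice', 'add Milk']
repo.checkout("master")
print(repo.log())  # ['add Milk']
```

Note that creating the branch copied no files: it only added one more name pointing at an existing commit, which is why the master history is untouched by the commit made on `experiment`.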

### 1.3 Branch, Merge and Conflicts

***

**Branching simplified**

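Branching is probably one of the most dreaded topics for beginners trying to get a grasp of Git. Branches let us work on a separate copy of the code so we can monkey with it in isolation. (In Git, a branch is not a literal copy in another folder; it is a lightweight, named line of development inside the same repository.)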

In a software project the main code is stored in the master branch (as the name suggests). Whenever a developer wants to make some changes to the main code base or add a new project module, they are expected to follow these steps in order:
- Make a new branch 
- Make the changes/add new modules to that branch itself and test the code properly
- Merge the branch with master branch 


**Visual guide to branching**

This may all seem too theoretical so far, so here is a picture which will hopefully convey branching more clearly.

<img src='../images/branching.png'>

Think of the main trunk (blue horizontal bar) as the master branch. The version **r4** has the list of items *Milk, Eggs and Soup*. It also has a version **r7** ahead of it but you decide to revert to **r4** and tinker with some changes there. So what you do is:
- Branch out of the master branch
- Add *Rice* to the list containing *Milk, Eggs and Soup*

Now it's up to you to decide whether the combination really worked out; if it did, merge the changes of this branch into master. If the experiment was unsuccessful, no worries; your code in the master branch is intact!


**Merging isn't easy**

After testing out your experimental code in a separate branch you merge it with the master branch. Sounds simple, eh? To do this, we open something called a **Pull Request** and merge the branches. Let's understand it with an example. Take the situation where there are two branches; one indicated by a red horizontal arrow (new branch) and the other by a green horizontal arrow (master branch).

<img src='../images/merging.png'>

- There is only *Milk* as content in the master branch
- Sue made a branch (green arrow) and added *Soup*
- Joe added *Juice* on the master branch, which until then contained only *Milk*
- To incorporate both *Juice* and *Soup* into the master branch, a **PULL** request needs to be submitted
- After Joe merges the files, Sue can do a regular "pull and update" to get the combined file from Joe. She doesn't have to merge again on her own.


**Rise of conflicts**

Git automatically merges changes to different parts of a file. But sometimes the changes are inconsistent, and conflicts arise when they don't gel. For example, take the situation described by the image below: Joe wants to remove eggs and replace them with cheese (-eggs, +cheese), and Sue wants to replace eggs with a hot dog (-eggs, +hot dog).

<img src='../images/conflict.png'>

At this point it's a race: if Joe checks in first, that's the change that goes through (and Sue can't make her change directly). When changes overlap and contradict like this, Git may report a conflict and not let you proceed until it is resolved. How do you get out of it? Two approaches are given below:
- **Re-apply your changes**: Sync to the latest version (r4) and re-apply your changes to this file: add hot dog to the list that already has cheese.
- **Override their changes with yours**: Check out the latest version (r4), copy over your version, and check your version in. In effect, this removes cheese and replaces it with hot dog.

Conflicts are infrequent but can be a pain when encountered.
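
The rule behind "merge automatically unless both sides changed the same thing" can be sketched as a three-way merge. The Python below is an illustrative simplification, not Git's merge algorithm: the function `three_way_merge` and the `line1`/`line2` keys are hypothetical, and each file is modelled as a dictionary of list entries rather than real text lines.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two edited copies of `base`; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:        # both sides agree (or neither changed it)
            value = o
        elif o == b:      # only "theirs" changed this entry
            value = t
        elif t == b:      # only "ours" changed this entry
            value = o
        else:             # both changed it, in different ways
            conflicts.append(key)
            continue      # conflicting entries are left out of `merged`
        if value is not None:  # None means the entry was deleted
            merged[key] = value
    return merged, conflicts


base = {"line1": "milk", "line2": "eggs"}
joe = {"line1": "milk", "line2": "cheese"}    # -eggs, +cheese
sue = {"line1": "milk", "line2": "hot dog"}   # -eggs, +hot dog

merged, conflicts = three_way_merge(base, joe, sue)
print(merged)     # {'line1': 'milk'}
print(conflicts)  # ['line2']

# If Sue had instead added a new entry, the merge would be clean.
sue_adds = {"line1": "milk", "line2": "eggs", "line3": "soup"}
clean, no_conflicts = three_way_merge(base, joe, sue_adds)
print(clean)         # {'line1': 'milk', 'line2': 'cheese', 'line3': 'soup'}
print(no_conflicts)  # []
```

The common ancestor (`base`) is what lets the merge tell "Joe changed this and Sue did not" apart from "both changed it", which is the only case a person has to resolve by hand.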

### 1.4 Organizing a distributed project with Git

***

Let's bring all the concepts together and see how to organize a project without getting into the commands (just structure and terminology).

**PUSH and PULL**

A term that we deliberately kept until now is the **PUSH** step. It sends a change to another repository (this may require permission). It is best described by the image below: you have a local copy of the **origin** repository in which you have made some changes (added a file **file.html**), and you decide to incorporate those changes into the master branch of **origin** as well (after testing them, obviously). To do that, you **PUSH** your commits.

Now suppose your colleague has the older version of the repository (without **file.html**) and wants these changes as well. They **PULL** from **origin** so that the new changes are incorporated into their copy. Note that this pull operation is different from a GitHub Pull Request, which asks a maintainer to review and merge a branch.

<img src='../images/model.jpg'>


**Distributed PUSH-PULL model**

Here's one way to distribute a push-pull model effectively:

<img src='../images/merge.png'>

Sue, Joe and Eve are working on a project. First they each get the project from the hosting service onto their own systems. Then each of them makes a branch for their own experiment, where they add *Soup*, *Juice* and *Eggs* (indicated by the red, green and orange arrows in the figure), and then they check their changes into the common experimental branch.

Later, a maintainer can review and pull changes from the experimental branch into a stable branch, which has the latest release. A distributed VCS gives you flexibility in how a project is maintained.
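
The push and pull steps described in this topic boil down to copying the commits that one repository has and the other lacks. The Python sketch below illustrates only that idea; `ToyRepo`, `transfer`, `push` and `pull` are hypothetical names, commits are plain strings, and real Git also checks ancestry and may reject a push.

```python
class ToyRepo:
    """Toy repository that stores an ordered list of commit messages."""

    def __init__(self, commits=()):
        self.commits = list(commits)


def transfer(source: ToyRepo, target: ToyRepo) -> None:
    """Copy the commits that `target` is missing from `source`, in order."""
    for commit in source.commits:
        if commit not in target.commits:
            target.commits.append(commit)


def push(local: ToyRepo, remote: ToyRepo) -> None:
    transfer(local, remote)   # local -> remote


def pull(local: ToyRepo, remote: ToyRepo) -> None:
    transfer(remote, local)   # remote -> local


origin = ToyRepo(["initial commit"])
yours = ToyRepo(origin.commits)       # your copy of origin
colleague = ToyRepo(origin.commits)   # your colleague's copy

yours.commits.append("add file.html")
push(yours, origin)        # origin now has file.html
pull(colleague, origin)    # and so does your colleague

print(colleague.commits)  # ['initial commit', 'add file.html']
```

Push and pull are the same transfer in opposite directions; the colleague never talks to your machine directly, only to the shared `origin`.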

## Chapter 2: Working with Git and GitHub

### Description: You will put the accumulated theory knowledge into use by learning Git commands and using GitHub 

### 2.1 What is GitHub? 

***

**Short introduction to GitHub**

<img src='../images/github.png'>

GitHub is a web-based hosting service for Git repositories and is mostly used for computer code. It combines the distributed version control and source code management (SCM) functionality of Git with its own features, such as access control and collaboration tools (bug tracking, feature requests, task management, and wikis) for every project.


**Are Git and GitHub the same thing?**

By now it's pretty evident that they are different things. Git is a distributed version control system, while GitHub is a web-based hosting service for repositories. In short, Git is a tool, and GitHub is a service for projects that use Git.


**Make your own GitHub account and set up SSH authentication**

But before proceeding further, let's make your own personal GitHub account. Then we add SSH authentication so that you don't have to type your password every time you push or pull.

Follow the instructions in this [link](http://docs.railsbridge.org/installfest/create_a_github_account) and you should be successful in creating your own account.


**Configure Tooling**

Before getting started with making repositories and push/pull, you first need to configure user information for all the local repositories. Follow along:

```shell
git config --global user.name "[name]"
git config --global user.email "[email address]"
git config --global color.ui auto
```
The functions of the three CLI commands are:

> **user.name** — Sets the name you want attached to your commit transactions

> **user.email** — Sets the email you want attached to your commit transactions

> **color.ui** — Enables helpful colorization of command line output

Now check your individual config variables with:

```shell
git config --global user.name
git config --global user.email
```

It should return the same name and email address that you had entered while configuring before.

### 2.2 Git PLUS GitHub

***

**Recap of Git**

Before diving into writing the code, recap the modus operandi of Git. With Git, you record local changes to your code using the `git` command-line tool.

<img src='../images/git_working.png'>


**Staging in Git**

With Git, you first make changes to your code files. When you are fully satisfied with the changes, you add the files you changed to a staging area and then commit them to the version history of your project (repository). This is similar to checking in at a hotel.

But why are there two steps instead of a single unified step? Let's say you have added a new file "file.txt". The first step, `git add`, places "file.txt" in the staging area (also called the index), and from then on Git tracks it. The second step, committing, records the staged files as a new snapshot in the repository's history.

*Git does this because Git's flexible: if a, b and c are changed, you can commit them separately or together*.
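
The two-step flow can be modelled with a small Python sketch. It is a toy under stated assumptions, not how Git stores data: `TinyRepo` is a hypothetical class, files are plain strings, and the index is simply a dictionary holding the content proposed for the next commit.

```python
class TinyRepo:
    """Toy model of Git's three areas: working files, index, history."""

    def __init__(self):
        self.working = {}   # files as you are currently editing them
        self.index = {}     # staging area: content of the next commit
        self.history = []   # committed snapshots, oldest first

    def add(self, name: str) -> None:
        # Staging copies the file's current content into the index.
        self.index[name] = self.working[name]

    def commit(self, message: str) -> None:
        # Committing records whatever is in the index, nothing else.
        self.history.append((message, dict(self.index)))


repo = TinyRepo()
repo.working = {"a.txt": "A", "b.txt": "B", "c.txt": "C"}

# Commit a.txt on its own...
repo.add("a.txt")
repo.commit("add a")

# ...then b.txt and c.txt together.
repo.add("b.txt")
repo.add("c.txt")
repo.commit("add b and c")

for message, files in repo.history:
    print(message, sorted(files))
# add a ['a.txt']
# add b and c ['a.txt', 'b.txt', 'c.txt']
```

Because a commit only looks at the index, you decide exactly which of your edited files go into each commit, even when all of them have changed.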

You will understand more about these two steps as you carry out the operations below:


**Setting up Git repository**

Now it's time to dive into the practical fun stuff. The first step is to start a new repository or obtain an already existing repository from GitHub.

*Creating new repository*

The steps are:
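```shell
# create a new directory
mkdir project

# navigate into it
cd project

# initialize a Git repository here (creates the hidden .git directory)
git init

# alternatively, skip mkdir/cd and run `git init project` from the parent
# directory; Git then creates the project directory for you
```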

*Work on existing repository*

For this, simply use the command:
```shell
git clone [url/ssh]
```

***`git init`***: Once you run this command, Git creates a hidden `.git` directory inside the main directory of your project. This directory stores the version history of your project and is what turns the project into a Git repository, enabling you to run Git commands on it.


**Create new repository on GitHub**

If you're creating a new repository, the next step is to create a matching repository on GitHub. Keep in mind that the repository we made using the CLI is our local repository; there also has to be a remote repository that holds all the code, and GitHub provides that. If you're working on an already existing repository, skip this step. Follow the instructions given in the [official guide](https://help.github.com/articles/create-a-repo/) to create a new repository.

Here is a screenshot of the repository "Project" I created on GitHub: <img src='../images/project.png'>


**First commit**

Now that there is a repository on GitHub for the same project, let's perform our first commit. First let's stage the files locally.

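```shell
# stage all the files
git add .

# commit them
git commit -m "message"

# add the GitHub repository as the remote named origin
git remote add origin [https/ssh]

# push the commits to GitHub
# (use main instead of master if that is your default branch name)
git push -u origin master
```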

What we did was: 
- **`git add .`**: Stages all new and modified files in the current directory in preparation for the commit
- **`git commit -m "[descriptive message about commit]"`**: Records the staged file snapshots permanently in the version history
- **`git remote add origin [https/ssh]`**: Registers the GitHub repository as a remote named origin, or in simple words, connects our local repository with the GitHub (remote) repository
- **`git push -u origin master`**: Pushes your committed changes to a branch (here master) of the remote (here origin). In simple words, this command pushes your code to GitHub. The `-u` flag remembers this remote branch as the upstream, so that later a plain `git push` is enough.

*You can also use `git status` in between to list all new or modified files to be committed*.


Below is a screenshot of the repository `project` after having performed all these steps sequentially.
<img src='../images/project_new.png'>

### 2.3 Don't mess with the Master: Working with branches

***

Up until now you have learnt to do the following things: make a new repository, add and commit changes, add a remote origin, and push and pull changes. All of it involved working solely on the master branch. However, working on big projects involves working extensively with branches, as you have already seen in the topic **Organizing a distributed project with Git**. Let's look at how to work with branches.


**Recap of branching concept**

A refresher always helps! A branch is essentially a unique set of code changes with a unique name. A repository can have one or more branches. The main branch, the one into which all changes eventually get merged back, is called **master**. This is the official working version of your project, and the one you see when you visit the project repository at github.com/yourname/projectname.


**Avoid messing with the master**

If you make changes to the master branch of a group project while other people are also working on it, your on-the-fly changes will ripple out to affect everyone else and very quickly there will be merge conflicts, weeping, rending of garments, and plagues of locusts. 

Also, the master branch is deployable i.e. it is your production code, ready to roll out into the world. The master branch is meant to be stable, and it is the social contract of open source software to never, ever push anything to master that is not tested, or that breaks the build. The entire reason GitHub works is that it is always safe to work from the master.


**Branching to experiment**

Everyone has new ideas and is curious to implement them. Since experimenting directly on the master branch is not recommended, developers create branches from master to experiment and make edits, additions and changes, and eventually merge such a branch back into master once the changes have been approved and are known to work. Master is then updated to contain all the new stuff.


**How to branch?**

Let's continue with the `project` repository that we created, containing the README, new.py and trial.py files. We will document every step sequentially as the branching is carried out.

- **Check for the presence of other branches**
    
    Before creating a new branch, first check for any existing branches. We can view all existing branches by typing `git branch -a` into the terminal, which tells Git that we want to see ALL the branches in this project, even ones that are not in our local workspace. It returns the following: <img src='../images/b1.png'>

    Its appearance may vary somewhat depending on your OS and terminal application, but the info is ultimately the same. The asterisk next to master in the first line of the output indicates that we are currently on that branch. The second line tells us that on our remote, named origin, there is a single branch, also called master. (Remember that our remote is the GitHub repo for this project).

- **Create a new branch**

   Let's create a new branch **experiment** where we will put a newer version of the trial.py file. How do you create a new branch (assuming this branch doesn't exist yet)? Simply type **`git checkout -b branchNameHere`**. In our case, we replace `branchNameHere` with `experiment`. We get the following output: <img src='../images/new_b.png'> So, now you have made a new branch and also switched from the **master** branch to the **experiment** branch.
   
   But suppose you want to switch back to **master**. You can achieve that by **`git checkout BranchName`**. Here, `BranchName` can be any branch (including master). While using **`git checkout master`** the output was: <img src='../images/switch_1.png'>
   
- **Making changes to new branch**

  Now let's make some changes on the **experiment** branch by adding the file `poem.py`. First we view all the files inside our repository with the `ls` command and then check our current branch with the command **`git branch`**. The output is shown below: <img src='../images/add.png'>

  Time to add and commit the file `poem.py` on the **experiment** branch. If you now switch to the master branch using **`git checkout master`** and list its contents using `ls`, you won't see `poem.py`. Isn't that what you expected?
  
  <img src='../images/and_commit.png'>
  
- **Merging changes**

    All that is left is to merge the changes into the master branch. First switch from the experiment branch to the master branch using **`git checkout master`**. Once on the master branch, all we have to do is run the merge command. A good way to do this is to type **`git merge BranchName --no-ff`**, where `BranchName` is the experiment branch. The additional `--no-ff` tells Git to always create a merge commit, even when a fast-forward would be possible, so that the history keeps a record that these commits came from a separate branch. This will make tracking changes easier in the future. Now if you run the **`ls`** command, observe how `poem.py` pops up! This means that we have successfully merged!
    
    <img src='../images/final_merge.png'>
    
- **Pushing changes**

    The final thing left to do is to push the changes to GitHub. Since the merge has already been committed, just use **`git push`** so that your changes get pushed to GitHub. <img src='../images/push.png'>
    
    On GitHub, observe how the change is now reflected. <img src='../images/final.png'>
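
The difference that the no-fast-forward option makes in the merging step can be pictured with a small Python model. This is a sketch under stated assumptions, not Git's implementation: `Commit` and `merge` are hypothetical names, and the function assumes master received no new commits after the branch was created (the situation in the walkthrough above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple = ()  # a merge commit has two parents


def merge(master: Commit, branch: Commit, no_ff: bool) -> Commit:
    """Return the commit master points to after merging `branch`.

    Assumes master has no commits of its own since the branch started.
    """
    if no_ff:
        # Always record an explicit merge commit with both parents.
        return Commit("Merge branch 'experiment'", parents=(master, branch))
    # Fast-forward: master simply moves to the tip of the branch.
    return branch


initial = Commit("initial commit")
poem = Commit("add poem.py", parents=(initial,))

fast_forward = merge(initial, poem, no_ff=False)
explicit = merge(initial, poem, no_ff=True)

print(fast_forward.message)   # add poem.py
print(len(explicit.parents))  # 2
```

After a fast-forward, nothing in the history shows that `poem.py` was developed on a separate branch; with the explicit merge commit, the two parents preserve that information.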
    

With this you have learnt the basics of Git and GitHub. Keep exploring and have fun!