### 1.0 Basic Workflow

#### 1.1 Version Control

Version Control System - a tool that manages the changes made to files and directories in a project.

Git can :

- keep track of changes to files
- notice conflicts between changes made by different people
- synchronize files between different computers

Each Git project has three parts :

- (1) the files
- (2) the directories that one creates and edits directly
- (3) extra information that Git records about the project's history

A repository is the combination of these three parts of a Git project.

Git stores all of its extra information in a directory called ".git" located in the root directory of the repository.

```shell
# Exercise 1 : Suppose the home directory "/home/repl" contains a
# repository called "dental", which has a subdirectory called "data".
# Where is information about the history of the files in
# "/home/repl/dental/data" stored?

# Answer: in the .git directory at the root of the repository
# /home/repl/dental/.git
```

#### 1.2 Check State of Repository

Run the command "git status," which displays a list of files that have been modified since the last time changes were saved.

```shell
# Exercise 2 : In the "dental" repository, use "git status" to discover
# which file(s) have been changed since the last save. Which files
# are listed?

$ cd dental
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#         modified:   report.txt
#
# no changes added to commit (use "git add" and/or "git commit -a")
```

Git has a "staging area" in which it stores files with changes one wants to save that haven't been saved yet. 

Putting files in the staging area is like putting things in a box.

Committing those changes is like putting that box in the mail.

Changes can be made to the things in the box, but once it is in the mail, changes cannot be made.

"git status" shows which files are in the staging area and which files have changes that haven't yet been put there. To compare a file as it currently is with what one last saved, one can use "git diff filename".

"git diff" without any filenames shows all the changes in the repository.

"git diff directory" shows the changes to the files in that directory.

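A minimal sketch of these commands, run in a throwaway repository created with "mktemp -d" so that no real project is touched. The repository name "dental", the file contents and the identity "Rep Loop <repl@example.com>" are illustrative assumptions.

```shell
set -e
cd "$(mktemp -d)"                          # throwaway directory for the demo
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity, stored in this repo only
git config user.email "repl@example.com"

# Save a first version of the report
printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt
git commit -q -m "Add report"

# Change the file but do not stage the change
printf '%s\n' "TODO: write executive summary." >> report.txt

git status            # report.txt is listed under "Changes not staged for commit"
git diff report.txt   # the change to this one file
git diff              # every change in the repository
git diff .            # changes to the files in the current directory
```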
#### 1.4 Diff

Diff - a formatted display of the differences between two sets of files.

Git displays diffs like :

```
diff --git a/report.txt b/report.txt
index e713b17..4c0742a 100644
--- a/report.txt
+++ b/report.txt
@@ -1,4 +1,5 @@
-# Seasonal Dental Surgeries 2017-18
+# Seasonal Dental Surgeries (2017) 2017-18
+# TODO: write new summary
```

This shows :

- the command used to produce the output ("diff --git"). In it, "a" and "b" are placeholders meaning "the first version" and "the second version."
- an "index" line showing keys into Git's internal database of changes.
- "--- a/report.txt" and "+++ b/report.txt", wherein lines being removed are prefixed with "-" and lines being added are prefixed with "+".
- a line starting with "@@" that tells where the changes are being made. The pairs of numbers are "start line" and "number of lines" (in the section of the file where the changes occurred). This diff output indicates changes starting at line 1, with 5 lines where there were once 4.
- a line-by-line listing of the changes with "-" showing deletions and "+" showing additions. Lines that have not changed are sometimes shown before and after the ones that have in order to give context; when they appear, they do not have either "+" or "-" in front of them.
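To see where "@@ -1,4 +1,5 @@" comes from, the sketch below rebuilds a similar change in a throwaway repository. The fourth line of the report and the identity are made up; the point is that replacing one line with two turns a 4-line section into a 5-line one, and Git shows up to three unchanged context lines after the change.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

# A 4-line report (line 4 is hypothetical filler)
printf '%s\n' "# Seasonal Dental Surgeries 2017-18" "" \
  "TODO: write executive summary." "See data/ for the raw counts." > report.txt
git add report.txt
git commit -q -m "Add report"

# Replace line 1 with two lines; lines 2-4 stay the same
printf '%s\n' "# Seasonal Dental Surgeries (2017) 2017-18" "# TODO: write new summary" "" \
  "TODO: write executive summary." "See data/ for the raw counts." > report.txt

# The hunk header reads "@@ -1,4 +1,5 @@": old lines 1-4 became new lines 1-5
git diff report.txt
```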

#### 1.5 Saving Changes

"git add [filename]"
- add a file to the staging area

"git diff -r HEAD"
- compare the current state of the files, including staged changes, with the most recent commit. "HEAD" is a shortcut meaning "the most recent commit"; "git diff HEAD" (without "-r") gives the same result.

"git diff -r HEAD path/to/file"
- restrict the results to a single file or directory 
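The difference between "git diff" and "git diff -r HEAD" shows up right after staging. In this sketch (hypothetical CSV contents and identity), plain "git diff" prints nothing because every change is already staged, while "git diff -r HEAD" still reports the staged change to "data/northern.csv".

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

# Two data files (hypothetical contents), committed once
mkdir data
printf '%s\n' "Date,Tooth,Age,Response" > data/northern.csv
printf '%s\n' "Date,Tooth,Age,Response" > data/eastern.csv
git add data
git commit -q -m "Add data files"

# Change one file and stage the change
printf '%s\n' "2017-01-11,canine,13,no" >> data/northern.csv
git add data/northern.csv

git diff                              # empty: nothing is changed but unstaged
git diff -r HEAD                      # the staged change, compared with the last commit
git diff -r HEAD data/northern.csv    # the same, restricted to one file
```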


```shell
# Exercise 3 : In the "dental" repository, "data/northern.csv" has
# been added to the staging area. Use "git diff" with "-r" and an
# argument to see how the files differ from the last saved revision.

$ cd dental
$ git add data/northern.csv
$ git diff -r HEAD

# Use a single Git command to view the changes in the file that have
# been staged (and only that file)

$ git diff -r HEAD data/northern.csv

# Add "data/eastern.csv" to the staging area

$ git add data/eastern.csv
```

#### 1.6 Edit File

Using nano as the text editor.

"nano filename"
- open filename for editing (or create the file if it does not exist)

"ctrl-K"
- delete a line

"ctrl-U"
- un-delete a line

"ctrl-O"
- save the file 

"ctrl-X"
- exit the editor

#### 1.7 Commit Changes

"git commit"
- to save the changes in the staging area

It always saves everything that is in the staging area as one unit.

When one commits changes, Git requires a log message. This serves the same purpose as a comment in a program : it tells the next person who examines the repository why the change was made.

By default, Git launches a text editor to let one write this message. To keep things simple, one can use :

"-m 'some message in quotes'"

on the command line to enter a single-line message like this :

"git commit -m 'Program appears to have become self-aware.'"

"--amend" flag
- changes the log message of the most recent commit

"git commit --amend -m 'new message'"

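A sketch of "-m" and "--amend" in a throwaway repository (identity and messages are illustrative). Amending replaces the most recent commit, so the history still holds one commit, now with the new message and a new hash.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt
git commit -q -m 'Program appears to have become self-aware.'
git log --oneline          # one commit with the original message

# Rewrite the log message of the most recent commit
git commit -q --amend -m 'Add report title.'
git log --oneline          # still one commit, with the new message and a new hash
```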
```shell
# Exercise 4 : In the "dental" repository, "report.txt" has been
# added to the staging area. Use a Git command to check the status
# of the repository.

$ cd dental
$ git add report.txt
$ git status

# Commit the changes in the staging area with the message "Adding
# a reference."

$ git commit -m "Adding a reference."
```

#### 1.8 View Repository's History

"git log"
- used to view the log of the project's history
- when run, Git automatically uses a pager to show one screen of output at a time. Press the space bar to go down a page or the "q" key to quit.

Log entries are shown most recent first, and look like this :

```
commit 0430705487381195993bac9c21512ccfb511056d
Author: Rep Loop <repl@datacamp.com>
Date:   Wed Sep 20 13:42:26 2017 +0000

    Added year to report title.
```

- the "commit" line displays a hash (a unique ID for the commit)
- the other lines tell who made the change, when, and what log message was written for the change

"git log path"
- inspect only the changes to particular files or directories
- the log for a file shows the commits that changed that file; the log for a directory shows every commit that added, deleted, or changed the contents of any file inside that directory
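The sketch below (hypothetical files, messages and identity) checks how path-limited logs behave: "data/northern.csv" is added in one commit and edited in another, so the log for the "data" directory lists both commits, while the log for "report.txt" lists only the commit that touched it.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

mkdir data
printf '%s\n' "Date,Tooth" > data/northern.csv
git add data && git commit -q -m "Add northern data"
printf '%s\n' "# Report" > report.txt
git add report.txt && git commit -q -m "Add report"
printf '%s\n' "2017-01-11,canine" >> data/northern.csv
git commit -q -a -m "Add a northern record"     # edits an existing file only

git log --oneline                  # all three commits, most recent first
git log --oneline -- report.txt    # only "Add report"
git log --oneline -- data          # "Add a northern record" and "Add northern data"
```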

### 2.0 Repositories

#### 2.1 : How Git Stores Information

Git uses a three-level structure to store the information in each commit.

1. Each commit contains metadata (author, commit message, and the time the commit happened).

2. Each commit also has a tree, which tracks the names and locations of the files in the repository when that commit happened.

3. There is a blob (binary large object) for each of the files listed in the tree. This contains a compressed snapshot of the contents of the file when the commit happened.

#### 2.2 Hash

Hash -
- unique identifier for every commit to a repository
- generated by a "hash function", which behaves like a pseudo-random number generator except that the same input always produces the same hash
- written as a 40-character hexadecimal string
- most of the time, one can give Git just the first 6-8 characters of a hash to identify the commit one means
- enables Git to share data efficiently between repositories
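Git's plumbing command "git cat-file" can show the three levels directly. In this sketch (one hypothetical commit), "-t" prints an object's type and "-p" its contents; "HEAD^{tree}" names the commit's tree and "HEAD:report.txt" the blob for one file. "git rev-parse" prints the full hash and a short one, and either form can be given to "git show". The 40-character length assumes a default SHA-1 repository.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt && git commit -q -m "Add report"

git cat-file -t HEAD               # commit
git cat-file -p HEAD               # tree hash, author, committer and log message
git cat-file -t 'HEAD^{tree}'      # tree
git cat-file -p 'HEAD^{tree}'      # one line per file: mode, type, hash, name
git cat-file -t HEAD:report.txt    # blob
git cat-file -p HEAD:report.txt    # the file's contents

full=$(git rev-parse HEAD)          # full hash: 40 hexadecimal characters (SHA-1)
short=$(git rev-parse --short HEAD) # abbreviated hash (at least 7 characters by default)
echo "$full $short"
git show --stat "$short"            # the short form is enough to find the commit

# A hash depends only on content: the same bytes always give the same blob hash
printf 'hello\n' | git hash-object --stdin
printf 'hello\n' | git hash-object --stdin
```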

#### 2.3 : View A Specific Commit

"git show"
- view the details of a specific commit

For example,

```
commit 0da2f7ad11664ca9ed933c1ccd1f3cd24d481e42
Author: Rep Loop <repl@datacamp.com>
Date:   Wed Sep 5 15:39:18 2018 +0000

    Added year to report title.

diff --git a/report.txt b/report.txt
index e713b17..4c0742a 100644
--- a/report.txt
+++ b/report.txt
@@ -1,4 +1,4 @@
-# Seasonal Dental Surgeries 2017-18
+# Seasonal Dental Surgeries (2017) 2017-18

 TODO: write executive summary.
```

This shows :

- the first part is the same as the output of "git log"
- the second part is the same as the output of "git diff"
- lines that are removed are prefixed with "-"
- lines that are added are prefixed with "+"

#### 2.4 Head

A hash is like an absolute path : it identifies one specific commit.

"HEAD" -

- equivalent of a relative path
- refers to the most recent commit

"HEAD~[n]" = the commit n commits before the most recent one (e.g. "HEAD~1" is the one just before "HEAD")

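A sketch of "HEAD~[n]" in a throwaway repository with three hypothetical commits: each "git show" below goes one step further back.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

# Three commits, each rewriting report.txt
for n in 1 2 3; do
  printf 'version %s\n' "$n" > report.txt
  git add report.txt
  git commit -q -m "Report version $n"
done

git show HEAD      # the most recent commit: "Report version 3"
git show HEAD~1    # one commit before it: "Report version 2"
git show HEAD~2    # two commits before it: "Report version 1"
```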
```shell
# Show the details of the most recent commit (HEAD)
$ git show
```

#### 2.5 Annotate File

"git log"
- displays the overall history of a project or file

"git annotate file"
- shows who made the last change to each line of a file and when
- prints as follows :

```
04307054        (  Rep Loop     2017-09-20 13:42:26 +0000       1)# Seasonal Dental Surgeries (2017) 2017-18
5e6f92b6        (  Rep Loop     2017-09-20 13:42:26 +0000       2)
5e6f92b6        (  Rep Loop     2017-09-20 13:42:26 +0000       3)TODO: write executive summary.
```

- (1) first eight digits of the hash
- (2) author (Rep Loop)
- (3) time of the commit
- (4) line number
- (5) contents of the line
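A sketch reproducing the annotate output above in a throwaway repository: the title is changed in a second commit, so line 1 is attributed to that commit and lines 2-3 to the first one. The identity is hypothetical, and the hashes and dates will differ from the example. ("git blame" reports the same information in a slightly different layout.)

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" "" "TODO: write executive summary." > report.txt
git add report.txt && git commit -q -m "Add report"

printf '%s\n' "# Seasonal Dental Surgeries (2017) 2017-18" "" "TODO: write executive summary." > report.txt
git commit -q -a -m "Added year to report title."

git annotate report.txt    # one output line per line of report.txt
```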

#### 2.6 "Git Show"

"git show [ID1]"
- shows the changes made in a particular commit

"git diff [ID1]..[ID2]"
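- shows the changes between two commits

"git diff HEAD~[N]..HEAD~[M]"
- shows the changes between the state of the repository N commits in the past and its state M commits in the past; for example, "git diff HEAD~1..HEAD~3" compares the state one commit back with the state three commits back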
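A sketch with four hypothetical commits, each adding one file, shows how the range endpoints matter: the state three commits back has only "a.txt", so diffing from one commit back to three commits back lists the two files that disappear along the way.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

# Four commits: add a.txt, b.txt, c.txt, d.txt in turn
for name in a b c d; do
  printf '%s\n' "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "Add $name.txt"
done

first=$(git rev-parse HEAD~3)              # hash of the commit that added a.txt
git diff --name-only "$first"..HEAD        # b.txt, c.txt, d.txt (added since then)
git diff --name-only HEAD~1..HEAD~3        # b.txt, c.txt (present one commit back, absent three back)
git diff HEAD~1..HEAD~3                    # full diff: both files shown as deleted
```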

#### 2.7 Add New Files

"git add"
- Git does not track new files by default; run "git add" on a new file at least once so that Git starts tracking it
- untracked files won't have a blob and won't benefit from version control

"git status"
- shows files that are in the repository but aren't being tracked

#### 2.8 Ignore Certain Files

.gitignore
- One can tell Git to stop paying attention to certain files by creating a file called ".gitignore" in the root directory of the repository and storing in it a list of "wildcard" patterns that specify these files

For example, if .gitignore contains :

```
build
*.mpl
```

then Git will ignore any file or directory called "build" and any file ending with ".mpl".
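A sketch of the ".gitignore" example above, together with how "git status" reports untracked files. The file names "build/output.bin", "figure.mpl" and "notes.txt" are hypothetical. In the short format, "??" marks an untracked file; ignored paths are not listed at all, and "git check-ignore -v" reports which pattern matched.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental

# The two patterns from the example
printf '%s\n' build '*.mpl' > .gitignore

mkdir build
printf 'binary\n' > build/output.bin   # matches "build"
printf 'plot\n' > figure.mpl           # matches "*.mpl"
printf 'notes\n' > notes.txt           # not ignored

git status --short      # "?? .gitignore" and "?? notes.txt"; build/ and figure.mpl are hidden
git check-ignore -v figure.mpl build/output.bin   # the .gitignore line that matched each path
```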

#### 2.9 Remove Unwanted Files

Git can help clean up untracked files that one doesn't want.

"git clean -n"
- shows a list of files that are in the repository, but whose history Git is not currently tracking.

"git clean -f"
- deletes untracked files
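A sketch of "git clean" in a throwaway repository ("scratch.txt" is a hypothetical untracked file). Note that "git clean -f" on its own leaves untracked directories (add "-d") and ignored files (add "-x") in place.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf 'keep\n' > report.txt
git add report.txt && git commit -q -m "Add report"
printf 'scratch\n' > scratch.txt     # never added, so untracked

git clean -n     # dry run: reports "Would remove scratch.txt" and deletes nothing
git clean -f     # deletes scratch.txt
ls               # only report.txt is left
```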

#### 2.10 How Git Is Configured

Git allows changes to its default settings.

"git config --list"
- see what the settings are

"git config --list --system"
- settings for every user on this computer

"git config --list --global"
- settings for the current user, applied to all of their projects

"git config --list --local"
- settings for one specific project

Each level overrides the one above it, so
- local settings (per-project) take precedence over
- global settings (per-user), which take precedence over
- system settings (for all users on the computer)

#### 2.11 Change Git Config

Set one's name and email address on every computer one uses, since Git records them in the log every time one commits a change.

"git config --global [setting] [value]"
- to change a configuration value for all projects on a particular computer
- specify the setting to be changed and the value to be set (name and email addresses are 'user.name' and 'user.email')
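A sketch of the configuration levels and of setting the name and email, under two assumptions: "HOME" is pointed at a throwaway directory so that "--global" writes to a temporary "~/.gitconfig" rather than the real one, and the names and address are made up. The local value overrides the global one, and the commit records whichever value wins.

```shell
set -e
unset GIT_CONFIG_GLOBAL XDG_CONFIG_HOME    # make "--global" use $HOME/.gitconfig
HOME="$(mktemp -d)"; export HOME           # throwaway home: the real per-user settings are untouched
cd "$(mktemp -d)"

# Per-user (global) settings; hypothetical name and address
git config --global user.name "Rep Loop"
git config --global user.email "repl@example.com"

git init -q dental && cd dental
# Per-project (local) override of the name only
git config --local user.name "Rep Loop (dental)"

git config --list --global     # user.name and user.email from ~/.gitconfig
git config --list --local      # includes the user.name override
git config user.name           # "Rep Loop (dental)": local wins over global
git config user.email          # not set locally, so the global value is used

printf 'x\n' > report.txt
git add report.txt && git commit -q -m "Add report"
git log -1 --format='%an <%ae>'   # the identity recorded in the log
```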

### 3.0 Undo

#### 3.1 Commit Changes Selectively

"git add path/to/file"
- for staging a single file

"git reset HEAD path/to/file"
- unstage a file (its changes stay in the working directory)
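A sketch of committing selectively in a throwaway repository (hypothetical data and identity): both files are staged by mistake, "git reset HEAD" unstages one of them, and only the other goes into the commit. The unstaged edit stays in the working directory.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

mkdir data
printf 'Date\n' > data/northern.csv
printf 'Date\n' > data/eastern.csv
git add data && git commit -q -m "Add data files"

# Edit both files
printf '2017-01-11\n' >> data/northern.csv
printf '2017-01-12\n' >> data/eastern.csv

git add data                           # stages both files
git reset -q HEAD data/eastern.csv     # unstage eastern.csv again
git status --short                     # "M  data/northern.csv" (staged), " M data/eastern.csv" (not staged)
git commit -q -m "Adding data from northern region."
```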

```shell
# Exercise 5 : From the output of "git status", two files were changed:

$ cd dental
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#         modified:   data/eastern.csv
#         modified:   data/northern.csv
#
# no changes added to commit (use "git add" and/or "git commit -a")

# Stage only the changes made to data/northern.csv

$ git add data/northern.csv

# Commit them as a new commit

$ git commit -m "Adding data from northern region."
```

#### 3.2 Re-Stage Files

Use "git add" periodically to save the most recent changes to a file to the staging area. 

This is particularly useful when the changes are experimental and one might want to undo them without cluttering up the repository's history.

#### 3.3 Undo Changes to Unstaged Files + Restore An Old Version of a File

"git checkout -- filename"
- discards the changes to filename that have not yet been staged
- the same command can go back even further in a file's history and restore older versions

Think of "commit" as saving the work and "checkout" as loading that saved version.

Restoring an old version ("git checkout [hash] filename") takes two arguments :
- (1) the hash that identifies the version one wants to restore
- (2) the name of the file

Restoring a file doesn't erase any of the repository's history; instead, once the restored file is committed, the restoration is saved as another commit.

"git log -[N] [file]"
- restricts the output to the N most recent commits
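A sketch of restoring an old version in a throwaway repository with two hypothetical drafts of "report.txt". "git checkout [hash] -- report.txt" copies the old version into both the working directory and the staging area; committing it adds a third commit instead of erasing the second.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf 'version 1\n' > report.txt
git add report.txt && git commit -q -m "First draft"
printf 'version 2\n' > report.txt
git commit -q -a -m "Second draft"

git log -2 --oneline report.txt       # the two most recent commits, to find the hash
old=$(git rev-parse HEAD~1)            # hash of "First draft"
git checkout "$old" -- report.txt      # bring back that version of the file (already staged)
git commit -q -m "Restore first draft"
git log --oneline                      # three commits: the history is intact
```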

#### 3.4 Undo Changes to Staged Files

"git reset"
- unstages files that were previously staged using "git add"

"git reset HEAD path/to/file"
"git checkout -- path/to/file"
- run these two commands in sequence to undo changes that have already been staged : the first unstages the file, and the second restores it to its last committed state
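A sketch of the two-step undo for a staged change, in a throwaway repository (file contents and identity are hypothetical).

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf 'original\n' > report.txt
git add report.txt && git commit -q -m "Add report"

printf 'unwanted edit\n' > report.txt
git add report.txt                   # the unwanted change is now staged

git reset -q HEAD report.txt         # step 1: unstage it (the edit is still in the file)
git checkout -- report.txt           # step 2: discard the edit from the working directory
git status --short                   # prints nothing: the working directory is clean
cat report.txt                       # original
```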



#### 3.5 Undo All Changes

"git reset HEAD [directory]"
- will unstage any files from the directory

"git reset"
- unstages everything

"git checkout -- [directory]"
- restore the files in the directory to their previous state

"git checkout -- ."
- revert all files in the current directory

### 4.0 Working With Branches

#### 4.1 : Branches

If one does not use version control, a common workflow is to create different subdirectories to hold different versions of the project in different states (e.g. "development" and "final"). However, it is easy to lose track of the versions.

Git supports the creation of branches, which allow for multiple versions of the project while tracking each version systematically.

Changes to one branch do not affect the other branches until they are merged back together.

Git uses blobs for files, trees for saved states of the repository, and commits to record the changes.

Branches are the reason Git needs both trees and commits : when two branches are merged, the new commit has two parent commits.

#### 4.2 : How To See What Branches The Repository Has

By default, every Git repository has a branch called "master".

"git branch"
- lists all of the branches in a repository

The branch one is currently on is shown with a " * " beside its name.

#### 4.3 : View The Differences Between Branches

Branches and revisions are closely connected.

"git diff revision-1..revision-2"
- shows the difference between two versions of a repository

"git diff branch-1..branch-2"
- shows the difference between two branches

#### 4.4 Switch From One Branch To Another

"git checkout [hash]"
- switches the repository to the state recorded in that commit

"git checkout [branch-name]"
- switches to that branch
- commit changes first : Git refuses to switch if doing so would overwrite uncommitted changes

#### 4.5 : Create Branch

"git checkout -b [branch-name]"
- creates a branch and then switches to it

The contents of the new branch are initially identical to the contents of the original. Once one starts making changes, they only affect the new branch.
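A sketch of listing, creating, comparing and switching branches in a throwaway repository. "git symbolic-ref" is used only to name the first branch "master" as in these notes (newer Git installations may default to another name); the branch "summary-statistics", the file "summary.txt" and the identity are hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git symbolic-ref HEAD refs/heads/master    # call the first branch "master", as in these notes
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt && git commit -q -m "Add report"

git checkout -q -b summary-statistics      # create a branch and switch to it
git branch                                 # "* summary-statistics" marks the current branch
printf 'Mean age: 27\n' > summary.txt      # hypothetical change that exists only on this branch
git add summary.txt && git commit -q -m "Add summary"

git diff --name-only master..summary-statistics   # summary.txt
git checkout -q master                     # switch back
ls                                         # report.txt only: master is unaffected
```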

#### 4.6 Merge Two Branches

When merging one branch ("source") into another ("destination"), Git incorporates the changes made to the source into the destination branch.

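If the changes don't overlap, the result is a new commit in the destination branch that includes everything from the source branch.

"git merge [source]"
- merges the source branch into the current branch, so switch to the destination branch first ("git checkout [destination]"); "git merge [source] [destination]" gives the same result only when the destination is already the current branch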

Git automatically opens an editor so that a log message for the merge can be written.
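A sketch of a merge without conflicts (hypothetical branch, files and identity). The destination branch, "master", is checked out first and then "summary-statistics" is merged into it; "--no-edit" accepts Git's default merge message instead of opening the editor, which keeps the sketch non-interactive.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git symbolic-ref HEAD refs/heads/master    # call the first branch "master"
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt && git commit -q -m "Add report"

# Source branch: adds summary.txt
git checkout -q -b summary-statistics
printf 'Mean age: 27\n' > summary.txt
git add summary.txt && git commit -q -m "Add summary"

# Destination branch: an unrelated change, so the histories diverge
git checkout -q master
printf 'Region,Count\n' > regions.csv
git add regions.csv && git commit -q -m "Add regions"

git merge --no-edit summary-statistics    # merge the source into the current branch
git log --oneline --graph                 # the merge commit has two parents
```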

#### 4.7 : Conflicts

Sometimes the changes in two branches will conflict with each other.

For example, bug fixes might touch the same lines of code, or analyses in two different branches may both append new (and different) records to a summary data file. In this case, Git relies on the user to reconcile the conflicting changes.

#### 4.8 : Merge Two Branches with Conflicts

If the "git merge" command fails due to a conflict, run "git status", which reminds the user which files have conflicts that need to be resolved by printing "both modified:" beside the files' names.

```shell
# For example,

$ cd dental
$ git branch
#      alter-report-title
#    * master
#      summary-statistics

$ git merge alter-report-title master
#    Auto-merging report.txt
#    CONFLICT (content): Merge conflict in report.txt
#    Automatic merge failed; fix conflicts and then commit the result.

$ git status
#    On branch master
#    You have unmerged paths.
#      (fix conflicts and run "git commit")
#    Unmerged paths:
#      (use "git add <file>..." to mark resolution)
#            both modified:   report.txt
#    no changes added to commit (use "git add" and/or "git commit -a")

# Edit the file to resolve the conflict
$ nano report.txt

# Add the merged file to the staging area
$ git add report.txt

# Commit the merge
$ git commit -m "log message"
```
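A sketch that reproduces this kind of conflict in a throwaway repository. Both branches rewrite line 1 of "report.txt" (the alternative title on "alter-report-title" is made up), so "git merge" stops and exits with a non-zero status; "|| true" only keeps the script going. The resolved title written here stands in for editing the file in nano and is arbitrary.

```shell
set -e
cd "$(mktemp -d)"
git init -q dental && cd dental
git symbolic-ref HEAD refs/heads/master    # call the first branch "master"
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"

printf '%s\n' "# Seasonal Dental Surgeries 2017-18" > report.txt
git add report.txt && git commit -q -m "Add report"

git checkout -q -b alter-report-title
printf '%s\n' "# Dental Surgeries by Region 2017-18" > report.txt
git commit -q -a -m "Change report title"

git checkout -q master
printf '%s\n' "# Seasonal Dental Surgeries (2017) 2017-18" > report.txt
git commit -q -a -m "Added year to report title."

git merge alter-report-title || true   # CONFLICT (content): Merge conflict in report.txt
git status --short                     # "UU report.txt": both modified
cat report.txt                         # the <<<<<<<, ======= and >>>>>>> markers show both versions

# Resolve: write the wanted text, then stage and commit
printf '%s\n' "# Seasonal Dental Surgeries by Region (2017) 2017-18" > report.txt
git add report.txt
git commit -q -m "Merge alter-report-title into master"
```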

### 5.0 : Collaborating

#### 5.1 : Create Brand New Repository

"git init [project-name]"
- creates a repository for a new project in a new directory called project-name inside the current working directory

#### 5.2 : Turn An Existing Project Into A Git Repository

"git init" in the project's root directory or 

"git init /path/to/project"

#### 5.3 : Create a Copy of an Existing Repository

"git clone [URL or /existing/project] [new-project-name]"
- create a copy of an existing repository

#### 5.4 : Find Out Where a Cloned Repository Originated

Remotes are stored in the new repository's configuration to remember where the original repository was.

"git remote -v"
- "git remote" lists the names of the repository's remotes; adding "-v" (verbose) also shows their URLs

#### 5.5 : Define Remotes

When a repository is cloned, Git automatically creates a remote called "origin" that points to the original repository.

"git remote add [remote-name] [URL]"
- add more remotes

"git remote rm [remote-name]"
- remove an existing remote
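A sketch of cloning and managing remotes entirely on one machine: a local path stands in for the URL, and the extra remote "backup" with its "example.com" address is hypothetical ("git remote add" only records the address; it does not contact it).

```shell
set -e
cd "$(mktemp -d)"

# An original repository with one commit
git init -q dental && cd dental
git config user.name "Rep Loop"            # hypothetical identity
git config user.email "repl@example.com"
printf 'report\n' > report.txt
git add report.txt && git commit -q -m "Add report"
cd ..

git clone -q dental dental-copy     # a local path works like a URL
cd dental-copy
git remote                          # origin
git remote -v                       # origin, with its fetch and push addresses

git remote add backup https://example.com/dental.git   # hypothetical second remote
git remote -v
git remote rm backup
```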

#### 5.6 : Pull In Changes From a Remote Repository

Git keeps track of remote repositories so that one can pull changes from those repositories and push changes to them.

Remote repositories are often repositories on an online hosting service (e.g. GitHub).

A typical workflow is to pull a collaborator's work from the remote repository, do one's own work, and then push that work back to the remote so that the collaborator can access it.

"git pull [remote] [branch]"
- gets everything in branch in the remote repository "remote" and merges it into the current branch of one's local repository

One will not be able to pull in changes from a remote repository when doing so might overwrite things one has done locally. Either commit the local changes or revert them ("git checkout -- .") and try again.

#### 5.7 : Push My Changes To a Remote Repository

"git push [remote] [branch]"
- pushes changes made locally into a remote repository

```shell
# Add the file to the staging area
$ git add data/northern.csv

# Commit the changes with a message
$ git commit data/northern.csv -m "Added more northern data."

# Push the changes to the remote repository
$ git push origin master
```

#### 5.8 : Push Conflicts with Someone Else's Work

Git does not allow one to push changes to a remote repository unless one has merged the contents of the remote repository into one's own work.

```shell
# Push the local changes to the remote repository
$ git push origin master

# If this command fails, use "git pull" to bring the local repository
# up to date.
$ git pull origin master

# Try the push again now that the remote repository's changes have been
# merged into the local repository
$ git push origin master
```
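A sketch of the rejected push described above, with everything on one machine: a bare repository stands in for the hosting service, and "alice" and "bob" are hypothetical collaborators. Bob pushes first, so Alice's push is rejected until she pulls. "--no-rebase" asks "git pull" to merge (newer Git versions ask for this choice when the histories have diverged), and "--no-edit" accepts the default merge message.

```shell
set -e
cd "$(mktemp -d)"
git init -q --bare remote.git                        # stands in for the hosted repository
git -C remote.git symbolic-ref HEAD refs/heads/master

# Alice creates the project and pushes it
git init -q alice && cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"                         # hypothetical collaborator
git config user.email "alice@example.com"
mkdir data
printf 'Date,Tooth\n' > data/northern.csv
git add data/northern.csv && git commit -q -m "Add northern data"
git remote add origin ../remote.git
git push -q origin master
cd ..

# Bob clones it, commits and pushes first
git clone -q remote.git bob
cd bob
git config user.name "Bob"                           # hypothetical collaborator
git config user.email "bob@example.com"
printf 'Region,Count\n' > regions.csv
git add regions.csv && git commit -q -m "Add regions"
git push -q origin master
cd ../alice

# Alice commits without having Bob's work, so her push is rejected
printf '2017-01-11,canine\n' >> data/northern.csv
git commit -q -m "Added more northern data." data/northern.csv
if git push -q origin master; then echo "push unexpectedly succeeded"; else echo "push rejected"; fi

git pull -q --no-rebase --no-edit origin master     # merge Bob's work into Alice's branch
git push -q origin master                           # now succeeds
```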