<div align='center'>
<h1>The Structure of Git</h1>
<h2>A git tutorial for (computer) scientists</h2>
<h3>Dylan Simon, Flatiron Institute</h3>
</div>

<div align='center'>
<h1>Part 1: Data Structures</h1>
<h2>(Commands you'll never use again)</h2>
</div>

## git repository environment

Create a new repository in any existing directory

In [1]:
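# Prepare demo environment

# Wrap git so that the first use of each subcommand prints its
# one-line manual summary (via whatis) before running the real command.
GIT=$(which git)
declare -A git_helped=()
git() {
    if [[ -z ${git_helped[$1]} ]] ; then
        whatis -l "git-$1" >&2
        git_helped[$1]=1
    fi
    "$GIT" "$@"
}

# Start from a clean slate
rm -rf ~/myrepo

# Fixed identity and no pager for the demo
export GIT_AUTHOR_NAME="Dylan Simon" EMAIL="dylan-gst@dylex.net"
export GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME GIT_PAGER=cat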

In [2]:
mkdir -p ~/myrepo
cd ~/myrepo
git init

git-init (1)         - Create an empty Git repository or reinitialize an existing one
Initialized empty Git repository in /mnt/xfs1/home/dylan/myrepo/.git/


You can turn any (existing) directory into a git repository.  You'll more often create a new one based on an existing repository using clone, as we'll see later.  For now, we'll just start with an empty directory.

In [3]:
ls .git

HEAD  branches  config  description  hooks  info  objects  refs


All this does is create a .git directory with some stuff in it.  Normally you won't touch any of this stuff directly, but we'll see what some of it does.

In [4]:
cat .git/config

[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true


One of the files in this directory you might look at is config.  Git config files are simple ini-type files.  There is also your "global" config in ~/.gitconfig.

In [5]:
git config --local -l

git-config (1)       - Get and set repository or global options
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true


However, there is also a git command that lets you manipulate config files, so you don't have to edit them yourself.

In [6]:
git config -h
git config --help

usage: git config [options]

Config file location
    --global              use global config file
    --system              use system config file
    --local               use repository config file
    -f, --file <file>     use given config file

Action
    --get                 get value: name [value-regex]
    --get-all             get all values: key [value-regex]
    --get-regexp          get values for regexp: name-regex [value-regex]
    --replace-all         replace all matching variables: name value [value_regex]
    --add                 add a new variable: name value
    --unset               remove a variable: name [value-regex]
    --unset-all           remove all matches: name [value-regex]
    --rename-section      rename section: old-name new-name
    --remove-section      remove a section: name
    -l, --list            list all
    -e, --edit            open an editor
    --get-color <slot>    find the color configured: [default]
    --get-colorbool <slot>
         

[... remainder of the git-config(1) manual page output omitted ...]

You can get quick help on any git command with `git CMD -h`, or longer help with `git CMD --help` (the same as `man git-CMD`).  This is just to show that extensive documentation is available.

## git as an object (file) store

Create a new object based on some data (e.g., file contents)

In [7]:
echo 'Hello World!' > f
git hash-object -t blob -w f
rm f

git-hash-object (1)  - Compute object ID and optionally creates a blob from a file
980a0d5f19a64b4b30a87d4206aade58726b60e3


Unique identifier for this data: each distinct file gets its own **hash**

In [8]:
echo -e 'blob 13\0Hello World!' | sha1sum

980a0d5f19a64b4b30a87d4206aade58726b60e3  -
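
The same computation can be wrapped up for any file. This is a minimal sketch, assuming `sha1sum` is available and the repository uses git's default SHA-1 object format; `blob_hash` is a made-up helper name, not a git command.

```shell
# blob_hash FILE: print the git object ID of FILE's contents as a blob.
# (Hypothetical helper for illustration; assumes a SHA-1 repository.)
blob_hash() {
    # The hashed data is the header "blob <size in bytes>", a NUL byte,
    # and then the raw file contents.
    { printf 'blob %d\0' "$(wc -c < "$1")"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
echo 'Hello World!' > "$tmp"
blob_id=$(blob_hash "$tmp")
echo "$blob_id"
rm -f "$tmp"
```

The printed ID should match what `git hash-object` reported above, since the file name plays no part in the hash.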


In [9]:
git cat-file -t 980a0d5f19a64b4b30a87d4206aade58726b60e3 # object type
git cat-file -s 980a # any unique prefix of hash         # object size
git cat-file -p 980a0d5                                  # contents
file1=980a0d5f19a64b4b30a87d4206aade58726b60e3 # save for later

git-cat-file (1)     - Provide content or type and size information for repository objects
blob
13
Hello World!


In [10]:
file2=$( echo 'Something completely different.' \
         | git hash-object -t blob -w --stdin )
echo $file2

1a0985327d433bdfc3ea3c2b0a0443b3545064ac


In [11]:
git cat-file -p $file2

Something completely different.


### Where'd the data go?

In [12]:
find .git/objects -type f

.git/objects/98/0a0d5f19a64b4b30a87d4206aade58726b60e3
.git/objects/1a/0985327d433bdfc3ea3c2b0a0443b3545064ac
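
Each of these "loose" objects is a zlib-compressed file named after its own hash. As a small sketch of the naming scheme (the helper name `object_path` is made up for illustration):

```shell
# object_path HASH: print where a loose object with this ID is stored,
# relative to the repository root. (Hypothetical helper for illustration.)
object_path() {
    # The first two hex digits name the directory; the rest name the file.
    printf '.git/objects/%s/%s\n' \
        "$(printf '%s' "$1" | cut -c1-2)" \
        "$(printf '%s' "$1" | cut -c3-)"
}

path=$(object_path 980a0d5f19a64b4b30a87d4206aade58726b60e3)
echo "$path"
```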


## Collecting objects: trees (directories)

In [13]:
( echo -e "100644 blob $file1\\thello.txt" \
; echo -e "100644 blob $file2\\tother.txt" \
) | git mktree

git-mktree (1)       - Build a tree-object from ls-tree formatted text
011ed906a8c5b0c0c14c0cad0a69d3969251b71f


A directory with two files, references to their contents by hash

In [14]:
tree1=011ed906a8c5b0c0c14c0cad0a69d3969251b71f
git cat-file -t $tree1
git cat-file -p $tree1

tree
100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3	hello.txt
100644 blob 1a0985327d433bdfc3ea3c2b0a0443b3545064ac	other.txt
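
Each tree entry is a mode, an object type, a hash, and a name, with a TAB before the name. A sketch of building such lines with `printf`, which avoids depending on how `echo -e` treats escapes (`tree_entry` is a made-up helper name):

```shell
# tree_entry MODE TYPE HASH NAME: print one line in the format git mktree reads.
# (Hypothetical helper for illustration.)
tree_entry() {
    # Fields are separated by single spaces, except for a TAB before the name.
    printf '%s %s %s\t%s\n' "$1" "$2" "$3" "$4"
}

entry=$(tree_entry 100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3 hello.txt)
echo "$entry"
```

Inside the demo repository, several such lines could be grouped and piped into `git mktree`, just like the `echo -e` version above.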


In [15]:
( echo -e "100644 blob $file1\\tREADME" \
; echo -e "040000 tree $tree1\\tstuff" \
) | git mktree

git-mktree (1)       - Build a tree-object from ls-tree formatted text
c3595f6745f977f2450eeeb5bd94ccd2e4fba498


Another directory, containing the first directory, nested

In [16]:
tree2=c3595f6745f977f2450eeeb5bd94ccd2e4fba498
git cat-file -p $tree2

100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3	README
040000 tree 011ed906a8c5b0c0c14c0cad0a69d3969251b71f	stuff


In [17]:
git ls-tree -tr $tree2

git-ls-tree (1)      - List the contents of a tree object
100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3	README
040000 tree 011ed906a8c5b0c0c14c0cad0a69d3969251b71f	stuff
100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3	stuff/hello.txt
100644 blob 1a0985327d433bdfc3ea3c2b0a0443b3545064ac	stuff/other.txt


## A *tree* is a "snapshot" of a directory

Just as a *blob* is a snapshot of a file

In [18]:
file3=$( echo 'New and improved.' \
         | git hash-object -t blob -w --stdin )
tree2a=$( ( echo -e "100644 blob $file3\\tREADME" \
          ; echo -e "040000 tree $tree1\\tstuff" \
          ) | git mktree )
echo $tree2a
git ls-tree -tr $tree2a

git-mktree (1)       - Build a tree-object from ls-tree formatted text
674e727fabfeb840b5c4e36f2c33610dfb50458e
100644 blob f25e220dd7c5d3082f9754786f7fd6fcae6db473	README
040000 tree 011ed906a8c5b0c0c14c0cad0a69d3969251b71f	stuff
100644 blob 980a0d5f19a64b4b30a87d4206aade58726b60e3	stuff/hello.txt
100644 blob 1a0985327d433bdfc3ea3c2b0a0443b3545064ac	stuff/other.txt


### Comparing trees
Comparing snapshots is what git's for

In [19]:
git diff-tree -p $tree2 $tree2a

git-diff-tree (1)    - Compares the content and mode of blobs found via two tree objects
diff --git a/README b/README
index 980a0d5..f25e220 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-Hello World!
+New and improved.


## Versioning: commit (revision)
A *commit* is a tree and some metadata

In [20]:
export GIT_AUTHOR_DATE="2017-06-20T18:00:00"
export GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE

In [21]:
commit1=$( git commit-tree -m "Added some text files to my tree" $tree2 )
echo $commit1

git-commit-tree (1)  - Create a new commit object
2e3e05b09b77bef997d7c789eb92c670e3b3ec88


In [22]:
git cat-file -p $commit1

tree c3595f6745f977f2450eeeb5bd94ccd2e4fba498
author Dylan Simon <dylan@dylex.net> 1497996000 -0400
committer Dylan Simon <dylan@dylex.net> 1497996000 -0400

Added some text files to my tree


   * Tree hash
   * Author name, email, timestamp
   * Committer name, email, timestamp (if someone else re-commits changes with annotations)
   * Commit message: arbitrary text (with `-m` or in editor)

In [23]:
git config --get 'user.name' # ~/.gitconfig

Dylan Simon


  * *Parent* commit(s), pointing to previous revision(s)

In [24]:
export GIT_AUTHOR_DATE="2017-06-22T15:00:00"
export GIT_COMMITTER_DATE=$GIT_AUTHOR_DATE

In [25]:
commit2=$( git commit-tree -p $commit1 -m "Improved README with new stuff" $tree2a )
echo $commit2

git-commit-tree (1)  - Create a new commit object
7c3b7428f484576aae429c427dd12f608a8276e4


In [26]:
git cat-file -p $commit2

tree 674e727fabfeb840b5c4e36f2c33610dfb50458e
parent 2e3e05b09b77bef997d7c789eb92c670e3b3ec88
author Dylan Simon <dylan@dylex.net> 1498158000 -0400
committer Dylan Simon <dylan@dylex.net> 1498158000 -0400

Improved README with new stuff
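
A commit ID is computed the same way as a blob ID, just with a `commit` header, so it covers the tree, the parents, and all the metadata. The sketch below checks this in a throwaway repository; it assumes the default SHA-1 object format, and the name and email are placeholders.

```shell
# Work in a throwaway repository so nothing existing is touched.
repo=$(mktemp -d) && cd "$repo" && git init -q
# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Build a commit of the empty tree, using only plumbing commands.
tree=$(git mktree < /dev/null)
commit=$(git commit-tree -m "Empty snapshot" "$tree")

# Re-derive the ID: hash "commit <size>", a NUL byte, then the raw contents.
size=$(git cat-file -s "$commit")
manual=$( { printf 'commit %d\0' "$size"; git cat-file commit "$commit"; } \
          | sha1sum | cut -d' ' -f1 )

echo "$commit"
echo "$manual"
```

Both lines should be identical. Changing anything in the commit, including a parent hash, would give a different ID.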


By walking back through the chain of parents to the root, you can see the previous states (history) of the repository

Commits do not represent "diffs": they represent a snapshot state of the files
<div align='right'>But git can easily construct these diffs</div>

Commits can have multiple parents, in the case of *merges* (but usually only 2).

Commit objects form a DAG (directed, acyclic graph)
<div align='right'><em>Why?</em></div>

## Examining commit objects

#### Reference objects indirectly
   * `COMMIT^`: commit's (first) parent
   * `COMMIT~N`: same as `COMMIT` followed by *N* `^`s (the *N*th ancestor, following first parents)
   * `COMMIT^{/TEXT}`: the youngest commit reachable from `COMMIT` whose message matches the regular expression `TEXT`
   * `COMMIT:PATH`: the tree or blob object for `PATH` within `COMMIT`
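
To see what one of these expressions resolves to, `git rev-parse` prints the resulting hash. A small sketch in a throwaway repository with a chain of three commits (the identity and commit messages are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# A chain of three commits: c1 <- c2 <- c3, all with the empty tree.
tree=$(git mktree < /dev/null)
c1=$(git commit-tree -m "first" "$tree")
c2=$(git commit-tree -p "$c1" -m "second" "$tree")
c3=$(git commit-tree -p "$c2" -m "third" "$tree")

parent=$(git rev-parse "$c3^")            # first parent of c3
grandparent=$(git rev-parse "$c3~2")      # two steps back from c3
by_text=$(git rev-parse "$c3^{/first}")   # found by commit message

echo "$parent $grandparent $by_text"
```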

In [27]:
git show "$commit2^{/text files}:stuff/hello.txt"

git-show (1)         - Show various types of objects
Hello World!


In [28]:
git log --graph $commit2

git-log (1)          - Show commit logs
* commit 7c3b7428f484576aae429c427dd12f608a8276e4
| Author: Dylan Simon <dylan@dylex.net>
| Date:   Thu Jun 22 15:00:00 2017
|
|     Improved README with new stuff
|
* commit 2e3e05b09b77bef997d7c789eb92c670e3b3ec88
  Author: Dylan Simon <dylan@dylex.net>
  Date:   Tue Jun 20 18:00:00 2017

      Added some text files to my tree


* `git log [START..]END`: start with END and display each parent (stopping before START)
* `git log -- PATH ...`: only commits with changes under `PATH`
* `git log -p`: include diffs for each commit (can add other `diff` options, too)
* `git log --graph`: include the commit graph
* Many more to filter/search/reformat commit display

In [29]:
git diff $commit1 $commit2

git-diff (1)         - Show changes between commits, commit and working tree, etc
diff --git a/README b/README
index 980a0d5..f25e220 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-Hello World!
+New and improved.


* `git diff COMMIT1 COMMIT2 [-- PATH ...]`: show changes between two commits, restricted to changes under `PATH`
* `git diff --stat`: summarize changes, one line per file
* `git diff --word-diff`: show edits to individual words instead of lines
* Many other standard `diff` options (`-a`, `-b`, `-U5`, ...)

### What about renames?

Git doesn't know anything about renames... but it can guess

* `git diff -M<p>`: detect renames as files at least *p*0% similar
* `git diff -C<p>`: detect copies as well
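
A sketch of that guessing, comparing two snapshots in a throwaway repository. The file names are made up, and the exact similarity score git prints may vary, so only the status letter matters here.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Snapshot 1: a ten-line file called old.txt.
seq 1 10 > old.txt
git add old.txt
before=$(git write-tree)

# Snapshot 2: the same file renamed, with one line appended.
git mv old.txt new.txt
echo 11 >> new.txt
git add new.txt
after=$(git write-tree)

# Without rename detection: an unrelated addition and deletion.
plain=$(git diff --no-renames --name-status "$before" "$after")
# With -M: one entry with status R followed by a similarity score.
renamed=$(git diff -M --name-status "$before" "$after")

echo "$plain"
echo "$renamed"
```

Neither tree records that a rename happened. Git infers it at diff time from how similar the deleted and added contents are.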

## Commit *refs* (branches...)

Commits are static, immutable objects, so we need some way to represent the changing state of a repository.

In [30]:
git update-ref refs/heads/master $commit1
git show-ref

git-update-ref (1)   - Update the object name stored in a ref safely
git-show-ref (1)     - List references in a local repository
2e3e05b09b77bef997d7c789eb92c670e3b3ec88 refs/heads/master


In [31]:
git update-ref refs/heads/master $commit2
cat .git/refs/heads/master

7c3b7428f484576aae429c427dd12f608a8276e4


By convention (and because all the higher-level git commands expect it), branch names start with `refs/heads/`, but can be any valid file name after that, like `dylan/refactor` or `release/2.x`.
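
A sketch showing that a branch is nothing more than a name for a commit hash. It assumes git's default "files" ref storage, where a freshly updated ref is a one-line text file; the branch name and identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tree=$(git mktree < /dev/null)
commit=$(git commit-tree -m "Empty snapshot" "$tree")

# Point a branch at the commit; the name may contain slashes.
git update-ref refs/heads/demo/topic "$commit"

# Read the ref both ways: straight from the file, and through git.
from_file=$(cat .git/refs/heads/demo/topic)
from_git=$(git rev-parse demo/topic)

echo "$from_file"
echo "$from_git"
```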

In [32]:
git show master

commit 7c3b7428f484576aae429c427dd12f608a8276e4
Author: Dylan Simon <dylan@dylex.net>
Date:   Thu Jun 22 15:00:00 2017

    Improved README with new stuff

diff --git a/README b/README
index 980a0d5..f25e220 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-Hello World!
+New and improved.


### ... And tags, *revisions*

In [33]:
export GIT_COMMITTER_DATE="2017-06-27 11:00:00"

References can also represent tags, under `refs/tags/` by convention.  Unlike branch references, you usually don't move tags after they're created.

In [34]:
git tag -m "Presented at SciCon 2017-06-27" scicon17 master
git show-ref

git-tag (1)          - Create, list, delete or verify a tag object signed with GPG
7c3b7428f484576aae429c427dd12f608a8276e4 refs/heads/master
0dabf67c491a5b1094b5ace27137dc6490c1186d refs/tags/scicon17


Tags are also a type of object, the last kind we'll see, containing just a name, annotation (message), and commit hash.  Notice that naming references is rather flexible, as you can leave off `refs/` and another part (`tags/`, `heads/`, or `remotes/`).

In [35]:
git cat-file -p tags/scicon17

object 7c3b7428f484576aae429c427dd12f608a8276e4
type commit
tag scicon17
tagger Dylan Simon <dylan@dylex.net> 1498575600 -0400

Presented at SciCon 2017-06-27


*Revision*: any way to refer to a commit object (branch, tag, hash, `^`, ...)
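
Because an annotated tag is its own object, the tag name resolves to the tag object first. The `^{commit}` suffix "peels" it down to the commit it names. A sketch in a throwaway repository (tag name and identity are placeholders):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Placeholder identity (an assumption for this example only).
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tree=$(git mktree < /dev/null)
commit=$(git commit-tree -m "Empty snapshot" "$tree")
git tag -m "Demo annotation" v-demo "$commit"

tag_type=$(git cat-file -t v-demo)              # type of the object the ref names
tag_object=$(git rev-parse v-demo)              # ID of the tag object itself
tag_target=$(git rev-parse "v-demo^{commit}")   # ID of the commit it points to

echo "$tag_type $tag_object $tag_target"
```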

## Isn't this wasteful?

In [36]:
echo $commit2 | git pack-objects --revs pack
ls -l pack-*

git-pack-objects (1) - Create a packed archive of objects
Counting objects: 8, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (5/5), done.
b7935c25be9722a8f9a11512b5e08be15506ac7e
Writing objects: 100% (8/8), done.
Total 8 (delta 0), reused 0 (delta 0)
-r--r--r-- 1 dylan dylan 1296 Jun 27 16:53 pack-b7935c25be9722a8f9a11512b5e08be15506ac7e.idx
-r--r--r-- 1 dylan dylan  666 Jun 27 16:53 pack-b7935c25be9722a8f9a11512b5e08be15506ac7e.pack


In [37]:
rm -f pack-*

#### Linux kernel:
   * raw source: 700MB
   * commits: 600k (since 2005)
   * git repo: 1.3GB
   
 "Don't worry about it."

<div align='center'>
<h1>Part 2: Workflows?</h1>
<h2>Commands you'll use every day</h2>
</div>

## *index*: "cache" between filesystem and trees
Going between your local *working tree* (disk) and git

In [38]:
git read-tree $tree2
git ls-files

git-read-tree (1)    - Reads tree information into the index
git-ls-files (1)     - Show information about files in the index and the working tree
README
stuff/hello.txt
stuff/other.txt


In [39]:
git checkout-index -a
ls -R

git-checkout-index (1) - Copy files from the index to the working tree
.:
README  stuff

./stuff:
hello.txt  other.txt


Working with the index is one of the most important parts of day-to-day git

### From files to index
*Staging* changes to the index, from local changes made on disk

In [40]:
echo 'New and improved.' > README
git diff

diff --git a/README b/README
index 980a0d5..f25e220 100644
--- a/README
+++ b/README
@@ -1 +1 @@
-Hello World!
+New and improved.


In [41]:
git add README

git-add (1)          - Add file contents to the index


In [42]:
git write-tree
echo $tree2a

git-write-tree (1)   - Create a tree object from the current index
674e727fabfeb840b5c4e36f2c33610dfb50458e
674e727fabfeb840b5c4e36f2c33610dfb50458e
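
The index itself can be inspected with `git ls-files --stage`. The sketch below stages one file in a throwaway repository and checks that the index entry holds the blob hash of the staged contents (the file name is a placeholder).

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo 'Hello World!' > hello.txt
git add hello.txt

# Each index entry records: mode, blob ID, stage number, then a TAB and the path.
entry=$(git ls-files --stage)
echo "$entry"

# The blob ID in the index is the hash of the staged file contents.
staged_id=$(echo "$entry" | cut -d' ' -f2)
file_id=$(git hash-object hello.txt)
echo "$staged_id $file_id"
```

So `git add` already writes the blob, and `git write-tree` only has to assemble tree objects from the index entries.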


In [43]:
git rm -f stuff/other.txt
git ls-files

git-rm (1)           - Remove files from the working tree and from the index
rm 'stuff/other.txt'
README
stuff/hello.txt


In [44]:
git mv README README.md
ls

git-mv (1)           - Move or rename a file, a directory, or a symlink
README.md  stuff


#### quick reference

   * `git add -u [FILE|DIR] ...`: update *existing* index files from disk
   * `git add     FILE|DIR  ...`: update index files from disk (creating new index entries)
   * `git add -A [FILE|DIR] ...`: update index to exactly match disk (including *removing* index entries)
   * `git add -p            ...`: *interactively* ask what to "stage" to index
   * `git mv SRC DST`: rename file/dir on disk and in index
   * `git rm         FILE`: remove file from disk and index (if no local changes)
   * `git rm -r       DIR`: remove dir from disk and index (if no local changes)
   * `git rm -f       ...`: remove from disk and index (even if they don't match!)
   * `git rm --cached ...`: remove from index only (but not disk)
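
To see how a few of these differ, here is a minimal sketch in a scratch repository. The file names (`tracked.txt`, `doomed.txt`, `new.txt`) and the temporary directory are invented for the example, and it assumes Git 2.x, where `git add -u` and `git add -A` without a path cover the whole working tree.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q .

echo one > tracked.txt
echo two > doomed.txt
git add tracked.txt doomed.txt      # both files now have index entries

# Three kinds of change on disk: modify, delete, create
echo changed >> tracked.txt
rm doomed.txt
echo three > new.txt

git add -u                          # picks up the modification and the deletion only
after_u=$(git ls-files | tr '\n' ' ')

git add -A                          # also creates an entry for the untracked file
after_A=$(git ls-files | tr '\n' ' ')

git rm -q --cached new.txt          # drop the index entry, keep the file on disk
after_rm=$(git ls-files | tr '\n' ' ')

echo "add -u:      $after_u"
echo "add -A:      $after_A"
echo "rm --cached: $after_rm"
ls new.txt                          # still present in the working tree
```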

### From index to files
*Checking out* the index to overwrite local changes

In [45]:
echo 'maybe not...' >> README.md
git diff

diff --git a/README.md b/README.md
index f25e220..63a9b90 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,2 @@
 New and improved.
+maybe not...


In [46]:
git checkout -- README.md
git diff-files

git-checkout (1)     - Checkout a branch or paths to the working tree
git-diff-files (1)   - Compares files in the working tree and the index


#### quick reference

   * `git checkout -- FILE|DIR ...`: overwrite files on disk from index
   * `git checkout -p`: interactively ask what to "discard" from disk

Arguments before the `--` make `git checkout` do other things, such as switching branches or checking out files from a specific revision.
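
One detail worth checking by hand is that `git checkout -- FILE` restores from the *index*, not from the last commit. The sketch below assumes a scratch repository with a made-up file `note.txt` and a placeholder identity passed on the command line. It also shows the `REV -- PATH` form for contrast.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q .

echo 'committed' > note.txt
git add note.txt
# Identity passed inline so the sketch does not depend on any global config
git -c user.name=Demo -c user.email=demo@example.com commit -q -m 'first version'

echo 'staged' > note.txt
git add note.txt                    # index now differs from the commit
echo 'scratch' > note.txt           # disk now differs from the index

git checkout -- note.txt            # overwrite disk from the index
from_index=$(cat note.txt)

git checkout HEAD -- note.txt       # overwrite index and disk from the commit
from_head=$(cat note.txt)

echo "from index: $from_index, from HEAD: $from_head"
```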

## Committing: from index to branches

### *HEAD*: your current branch

In [47]:
git symbolic-ref HEAD
cat .git/HEAD

git-symbolic-ref (1) - Read, modify and delete symbolic refs
refs/heads/master
ref: refs/heads/master


A symbolic ref is a reference to another ref, kind of like a symbolic link or "pointer pointer".  `HEAD` is a pointer to the current branch (name).
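
A small sketch of the "pointer to a name" idea, in a scratch repository. The branch name `demo` is arbitrary, and reading `.git/HEAD` directly assumes the default file-based ref storage.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Point HEAD at a branch name of our choosing
git symbolic-ref HEAD refs/heads/demo

via_git=$(git symbolic-ref HEAD)     # ask git where HEAD points
via_file=$(cat .git/HEAD)            # read the same fact straight off disk
echo "$via_git"
echo "$via_file"

# HEAD only stores a name: the branch has no commit behind it yet
git rev-parse --verify -q HEAD || echo "HEAD names a branch with no commits yet"
```

Because `HEAD` holds a branch name rather than a commit id, a new commit only has to move the branch; `HEAD` follows along without changing.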

In [51]:
git checkout -b dylan/work # shows brief status of local changes
git branch -va

D	README
A	README.md
D	stuff/other.txt
Switched to a new branch 'dylan/work'
* dylan/work 7c3b742 Improved README with new stuff
  master     7c3b742 Improved README with new stuff


#### quick reference

   * `git checkout REVISION`: switch to a branch (or "detached" revision)
   * `git checkout -b NEWBRANCH [REV]`: create and switch to a new branch pointing to REV (default `HEAD`); `git branch NEWBRANCH [REV]` just creates
   * `git checkout -m BRANCH`: switch to a branch, merging un-committed changes on disk into the new working tree
   * `git checkout REV -- PATH ...`: overwrite files on disk from revision (but don't switch branches)
   * `git branch -va`: list all branches and latest commit
   * `git branch -m OLDBRANCH NEWBRANCH`: rename branch
   * `git branch -d OLDBRANCH`: delete branch
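
The branch commands above can be tried in sequence with a sketch like this one. The branch names (`main`, `topic/work`, `topic/idea`, `topic/plan`), the scratch directory, and the inline identity are all placeholders chosen for the illustration.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # fixed starting branch name for the sketch

echo 'Hello World!' > README
git add README
git -c user.name=Demo -c user.email=demo@example.com commit -q -m 'initial'

git checkout -q -b topic/work              # create a branch at HEAD and switch to it
git branch topic/idea                      # create only; HEAD stays on topic/work
git branch -m topic/idea topic/plan        # rename
current=$(git symbolic-ref --short HEAD)

git checkout -q main
git branch -q -d topic/work                # allowed: topic/work has no commits main lacks
remaining=$(git branch --format='%(refname:short)' | tr '\n' ' ')
echo "was on: $current; branches left: $remaining"
```

All of these branches point at the same single commit, so creating, renaming, and deleting them only touches refs and never copies any files.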

In [49]:
man gitglossary

GITGLOSSARY(7)                    Git Manual                    GITGLOSSARY(7)



NAME
       gitglossary - A Git Glossary

SYNOPSIS
       *

DESCRIPTION
       alternate object database
           Via the alternates mechanism, a repository can inherit part of its
           object database from another object database, which is called
           "alternate".

       bare repository
           A bare repository is normally an appropriately named directory with
           a .git suffix that does not have a locally checked-out copy of any
           of the files under revision control. That is, all of the Git
           administrative and control files that would normally be present in
           the hidden .git sub-directory are directly present in the
           repository.git directory instead, and no other files are present
           and checked out. Usually publishers of public repositories make
           bare repositories available.

       blob object
           Untyped object,

           to be pre-verified and potentially aborted, and allow for a
           post-notification after the operation is done. The hook scripts are
           found in the $GIT_DIR/hooks/ directory, and are enabled by simply
           removing the .sample suffix from the filename. In earlier versions
           of Git you had to make them executable.

       index
           A collection of files with stat information, whose contents are
           stored as objects. The index is a stored version of your working
           tree. Truth be told, it can also contain a second, and even a third
           version of a working tree, which are used when merging.

       index entry
           The information regarding a particular file, stored in the index.
           An index entry can be unmerged, if a merge was started, but not yet
           finished (i.e. if the index contains multiple versions of that
           file).

       master
           The default development branch. Wheneve

           $GIT_DIR/refs/ directory, or in the $GIT_DIR/packed-refs file.

       reflog
           A reflog shows the local "history" of a ref. In other words, it can
           tell you what the 3rd last revision in this repository was, and
           what was the current state in this repository, yesterday 9:14pm.
           See git-reflog(1) for details.

       refspec
           A "refspec" is used by fetch and push to describe the mapping
           between remote ref and local ref.

       remote-tracking branch
           A regular Git branch that is used to follow changes from another
           repository. A remote-tracking branch should not contain direct
           modifications or have local commits made to it. A remote-tracking
           branch can usually be identified as the right-hand-side ref in a
           Pull: refspec.

       repository
           A collection of refs together with an object database containing
           all objects which are reachable from th