# Git Basics using Jupyter Notebook

---

## Dynamic Computing

Because Jupyter Notebook computes dynamically, it is easy to keep an analysis up to date with data that changes over time.

Case in point: the Linux Git repository.

The Linux project is an open source project with many developers, users, and documenters across many countries, and a development history of more than 20 years. Linus Torvalds, the originator of Linux, also developed Git to version-control a project as complex as Linux.

Let us try to learn the basics of Git using the Linux Git repository itself as a case study :)

The [Linux Git repository](https://github.com/torvalds/linux) is hosted on GitHub.

This repository hosts the Linux kernel source tree.

In this notebook, we will mine the Git-related data of this repository using commonly used Git commands.

## Executing terminal commands in Jupyter Notebook

[Reference](https://support.anaconda.com/hc/en-us/articles/360023858254-Executing-Terminal-Commands-in-Jupyter-Notebooks)

Prefixing a command with `!` (pronounced "bang") runs it as a Linux terminal command from within the notebook. This is useful when we want system-related information or want to use other system tools for computing.

In this notebook, we will use the following system tools to mine data from the Linux Git repository:

- `wc`: print newline, word, and byte counts for each file
- `tail`: output the last part of files
- `head`: output the first part of files
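
To make the behavior of these three tools concrete, here is a minimal Python sketch of what each one does to a stream of lines. The function names `head`, `tail`, and `count_lines` are chosen for illustration; they only approximate the real tools, and `count_lines` mirrors `wc -l` alone.

```python
from collections import deque
from itertools import islice


def head(lines, n=10):
    """Return the first n lines, like `head -n`."""
    return list(islice(lines, n))


def tail(lines, n=10):
    """Return the last n lines, like `tail -n`."""
    return list(deque(lines, maxlen=n))


def count_lines(lines):
    """Count the lines, like `wc -l`."""
    return sum(1 for _ in lines)


# Illustrative input: twenty numbered lines.
sample = [f"line {i}" for i in range(1, 21)]

print(head(sample, 3))
print(tail(sample, 2))
print(count_lines(sample))
```

In the notebook itself, the shell tools are usually more convenient, because they can be chained directly onto a `!git ...` command with a pipe.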

In [18]:
!man head | head -20

HEAD(1)                          User Commands                         HEAD(1)

NAME
       head - output the first part of files

SYNOPSIS
       head [OPTION]... [FILE]...

DESCRIPTION
       Print  the  first  10 lines of each FILE to standard output.  With more
       than one FILE, precede each with a header giving the file name.

       With no FILE, or when FILE is -, read standard input.

       Mandatory arguments to long options are  mandatory  for  short  options
       too.

       -c, --bytes=[-]NUM
              print  the  first  NUM bytes of each file; with the leading '-',
              print all but the last NUM bytes of each file


In [19]:
!git init

Reinitialized existing Git repository in /home/fubar/Documents/GITprojects/MyGitProjects/GitBasics/.git/


In [14]:
!git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/
	MiningGitRepository.ipynb
	env-git/

nothing added to commit but untracked files present (use "git add" to track)


In [20]:
!git clone https://github.com/torvalds/linux.git

Cloning into 'linux'...
remote: Enumerating objects: 6996587, done.
remote: Total 6996587 (delta 0), reused 0 (delta 0), pack-reused 6996587
Receiving objects: 100% (6996587/6996587), 2.55 GiB | 4.73 MiB/s, done.
Resolving deltas: 100% (5794196/5794196), done.
Checking out files: 100% (65695/65695), done.


In [22]:
!cd linux/

In [28]:
!git status

On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore
	.ipynb_checkpoints/
	MiningGitRepository.ipynb
	env-git/

nothing added to commit but untracked files present (use "git add" to track)


In [29]:
!pwd

/home/fubar/Documents/GITprojects/MyGitProjects/GitBasics
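
The `!pwd` output above still shows the parent directory because each `!` command runs in its own temporary subshell, so a `cd` issued there does not affect later cells. The sketch below reproduces that effect in plain Python with `subprocess`, assuming a Linux-like system with a POSIX shell. The temporary directory is only a stand-in for `linux/`. Inside a notebook, the `%cd` magic changes the directory persistently.

```python
import os
import shlex
import subprocess
import tempfile

start = os.getcwd()

with tempfile.TemporaryDirectory() as target:
    # `cd` runs inside a child shell, which exits as soon as the command ends.
    subprocess.run(f"cd {shlex.quote(target)}", shell=True, check=True)
    after_cd = os.getcwd()

    # Passing cwd= runs one command in another directory, much like `git -C <path>`.
    result = subprocess.run(
        ["pwd"], cwd=target, capture_output=True, text=True, check=True
    )
    child_dir = result.stdout.strip()

print(after_cd == start)  # the parent process never changed directory
print(child_dir)
```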


In [30]:
!git status linux

On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)


In [31]:
!man git | head -20

GIT(1)                            Git Manual                            GIT(1)

NAME
       git - the stupid content tracker

SYNOPSIS
       git [--version] [--help] [-C <path>] [-c <name>=<value>]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [--super-prefix=<path>]
           <command> [<args>]

DESCRIPTION
       Git is a fast, scalable, distributed revision control system with an
       unusually rich command set that provides both high-level operations and
       full access to internals.

       See gittutorial(7) to get started, then see giteveryday(7) for a useful
       minimum set of commands. The Git User’s Manual[1] has a more in-depth

In [33]:
!git -C linux status

On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean


In [34]:
!git -C linux log --oneline | wc -l

872387


## Latest 10 version tags of Linux

In [36]:
!git -C linux tag | tail -10

v5.3-rc4
v5.3-rc5
v5.3-rc6
v5.3-rc7
v5.3-rc8
v5.4-rc1
v5.4-rc2
v5.4-rc3
v5.4-rc4
v5.4-rc5
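
By default `git tag` lists tags in lexicographic order, so `tail -10` returns the latest versions only while that order happens to agree with the version order. The sketch below shows a version-aware sort key in Python. The tag list is an illustrative sample rather than the repository's real tag list, and `version_key` is a name chosen here. Git can also sort tags itself with `git tag --sort=version:refname`.

```python
import re


def version_key(tag):
    """Sort key for tags such as 'v5.4-rc5' or 'v5.10'."""
    match = re.fullmatch(r"v(\d+(?:\.\d+)*)(?:-rc(\d+))?", tag)
    if match is None:
        return ((-1,), 0, 0)  # tags in any other format sort first
    numbers = tuple(int(part) for part in match.group(1).split("."))
    rc = match.group(2)
    # A final release (no -rc suffix) ranks after all of its release candidates.
    return (numbers, 1, 0) if rc is None else (numbers, 0, int(rc))


# Illustrative sample of tags.
tags = ["v5.10", "v5.4-rc5", "v5.4", "v5.3-rc8", "v5.4-rc1", "v5.9"]

print(sorted(tags))                   # lexicographic order
print(sorted(tags, key=version_key))  # version order
```

With two-digit minor versions such as `v5.10`, the lexicographic order places the tag before `v5.3-rc8`, which is why a version-aware key matters.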


## 2.3 Viewing the commit history

Project development progresses as a series of snapshots.
These snapshots are linked to each other through commits.

Let us check the commit history of the Linux repository.
`git log` is the most basic command for this. By default, it lists commits in reverse chronological order.

The Linux project is relatively mature, with a timeline of more than 20 years. This maturity is evident from the total number of commits, which exceeds `870k` (872,387 at the time of this run). The count is obtained by piping the output of `git log --oneline` to `wc -l`, which counts the lines.

Let us check some of the commit messages. Because the Linux repository is so large, let us limit the output to the latest 3 commits by piping the output of `git log` to `head -3`. An alternative without a pipe is `git log -3`. The latter has the advantage of retaining Git's color coding and ref decorations such as `(HEAD -> master)`.

For the oldest 3 commits, pipe the output to `tail -3`.


The option `-C linux` makes Git run as if it had been started in the `linux` directory. It is needed here because each `!` command runs in its own subshell, so the effect of `!cd linux/` does not carry over to later cells.

### First and latest 3 commits

Two of the three oldest commits carry a `[PATCH]` prefix, while the three latest are merges of tagged pull requests. It seems that the Linux repository follows a convention for commit messages.

In [12]:
!git -C linux log --oneline | head -3

e472c64aa4fa Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
320000e72ec0 Merge tag 'iommu-fixes-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
b66b449872d3 Merge tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2


In [13]:
!git -C linux log --oneline -3

e472c64aa4fa (HEAD -> master) Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
320000e72ec0 Merge tag 'iommu-fixes-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
b66b449872d3 Merge tag 'gfs2-v5.4-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2


In [14]:
!git -C linux log --oneline | tail -3

baaa2c512dc1 [PATCH] Avoid deadlock in sync_page_io by using GFP_NOIO
8d38eadb7a97 [PATCH] mmtimer build fix
1da177e4c3f4 Linux-2.6.12-rc2
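
Each `--oneline` line is an abbreviated hash followed by the commit subject, so message conventions can be tallied with a few lines of Python. This is a minimal sketch: the lines are copied from the outputs above, while `parse_oneline`, `classify`, and the three labels are names chosen here for illustration.

```python
from collections import Counter


def parse_oneline(line):
    """Split one `git log --oneline` line into (abbreviated hash, subject)."""
    commit_hash, _, subject = line.partition(" ")
    return commit_hash, subject


def classify(subject):
    """Label a subject by its prefix; the labels are illustrative."""
    if subject.startswith("[PATCH]"):
        return "patch"
    if subject.startswith("Merge "):
        return "merge"
    return "other"


# Lines copied from the outputs above.
lines = [
    "e472c64aa4fa Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma",
    "baaa2c512dc1 [PATCH] Avoid deadlock in sync_page_io by using GFP_NOIO",
    "8d38eadb7a97 [PATCH] mmtimer build fix",
    "1da177e4c3f4 Linux-2.6.12-rc2",
]

counts = Counter(classify(parse_oneline(line)[1]) for line in lines)
print(counts)
```

Running the same tally over the full `git log --oneline` output would show how common each convention is across the whole history.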


### Power of `git log`

Some of the popular options of `git log` are:

- `-p` or `--patch`: show the difference (patch) introduced in each commit
- `-<number>`, `-n <number>`, or `--max-count=<number>`: limit the number of commits to `<number>`
- `--stat`: give the diff stats for each commit
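
When the same options are needed from Python code rather than from a `!` cell, they can be assembled into an argument list for `subprocess`. The sketch below is one possible way to do that: `git_log_command` and `run_git_log` are hypothetical helper names, and running the command assumes Git is installed and a clone exists at the given path.

```python
import subprocess


def git_log_command(repo, max_count=None, patch=False, stat=False, oneline=False):
    """Build the argument list for `git -C <repo> log` with the options listed above."""
    command = ["git", "-C", repo, "log"]
    if oneline:
        command.append("--oneline")
    if patch:
        command.append("--patch")
    if stat:
        command.append("--stat")
    if max_count is not None:
        command.append(f"--max-count={max_count}")
    return command


def run_git_log(repo, **options):
    """Run the command and return its output (needs Git and a clone at `repo`)."""
    completed = subprocess.run(
        git_log_command(repo, **options), capture_output=True, text=True, check=True
    )
    return completed.stdout


# Only builds the command; nothing is executed here.
print(git_log_command("linux", max_count=2, stat=True))
```

Passing the arguments as a list avoids shell quoting issues, and because the output is captured rather than sent to a terminal, it normally arrives without color codes.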

In [10]:
!git -C linux log -p -2

commit e472c64aa4fa6150c6076fd36d101d667d71c30a (HEAD -> master)
Merge: 320000e72ec0 bacdcb6675e1
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Oct 31 07:34:09 2019 +0000

    Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
    
    Pull dmaengine fixes from Vinod Koul:
     "A few fixes to the dmaengine drivers:
    
       - fix in sprd driver for link list and potential memory leak
    
       - tegra transfer failure fix
    
       - imx size check fix for script_number
    
       - xilinx fix for 64bit AXIDMA and control reg update
    
       - qcom bam dma resource leak fix
    
       - cppi slave transfer fix when idle"
    
    * tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
      dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
      dmaengine: qcom: bam_dma: Fix resource leak
      dmaengine: sprd: Fix the 

In [11]:
!git -C linux log --stat -2

commit e472c64aa4fa6150c6076fd36d101d667d71c30a (HEAD -> master)
Merge: 320000e72ec0 bacdcb6675e1
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Oct 31 07:34:09 2019 +0000

    Merge tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
    
    Pull dmaengine fixes from Vinod Koul:
     "A few fixes to the dmaengine drivers:
    
       - fix in sprd driver for link list and potential memory leak
    
       - tegra transfer failure fix
    
       - imx size check fix for script_number
    
       - xilinx fix for 64bit AXIDMA and control reg update
    
       - qcom bam dma resource leak fix
    
       - cppi slave transfer fix when idle"
    
    * tag 'dmaengine-fix-5.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
      dmaengine: cppi41: Fix cppi41_dma_prep_slave_sg() when idle
      dmaengine: qcom: bam_dma: Fix resource leak
      dmaengine: sprd: Fix the 

### Limiting log output

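`--since=2.weeks`, `--since='2019-11-01'`, or `--since='1 year 2 day 3 minutes ago'`: show only commits more recent than the given date.

`--author=<author>`: show only commits whose author matches the given pattern. A short pattern such as `Linus` can therefore match more authors than `Linus Torvalds`.

Let us check how many commits Linus Torvalds has made to the repository.

Let us check how many commits have been made to the repository since 01 October 2019.

Let us check how many commits Linus Torvalds has made to the repository since 01 October 2019.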
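
To illustrate what a date limit does, the sketch below parses a timestamp in the format that `git log` prints by default and compares it with a cutoff. It is a simplification under stated assumptions: the cutoff is taken as midnight UTC, the second timestamp is made up for contrast, and Git itself filters on the committer date, which can differ from the author date shown on the `Date:` line.

```python
from datetime import datetime, timezone

# Format of the `Date:` line in default `git log` output.
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_git_date(text):
    """Parse a timestamp such as 'Thu Oct 31 07:34:09 2019 +0000'."""
    return datetime.strptime(text, GIT_DATE_FORMAT)


def is_since(date_text, since):
    """Return True when the timestamp is at or after the cutoff."""
    return parse_git_date(date_text) >= since


# Assumed cutoff: 01 October 2019, midnight UTC.
since = datetime(2019, 10, 1, tzinfo=timezone.utc)

print(is_since("Thu Oct 31 07:34:09 2019 +0000", since))  # date from the log output above
print(is_since("Mon Sep 30 23:59:59 2019 +0000", since))  # made-up earlier timestamp
```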


In [15]:
!git -C linux log --oneline --author='Linus Torvalds' | wc -l 

28417


In [16]:
!git -C linux log --oneline --author=Linus | wc -l

31846


In [17]:
!git -C linux log --oneline --author=Torvalds | wc -l 

28417


In [20]:
!git -C linux log --oneline --since=2019-10-01 | wc -l

1277


In [21]:
!git -C linux log --oneline --since=2019-10-01 --author='Linus Torvalds' | wc -l

142


In [22]:
!git -C linux log --oneline --author='Vinod Koul' | wc -l

991
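
The counts above differ because `--author` matches a pattern against the author rather than comparing whole names: `Linus` matches every author whose name contains it, while `Linus Torvalds` and `Torvalds` are narrower. The sketch below imitates this with Python's `re` module, whose syntax is not identical to Git's default pattern syntax. Only the first author line comes from the outputs above; the other two are hypothetical.

```python
import re

# Only the first entry appears in the outputs above; the other two are hypothetical.
authors = [
    "Linus Torvalds <torvalds@linux-foundation.org>",
    "Linus Example <linus@example.org>",
    "Another Developer <dev@example.org>",
]


def matching_authors(pattern, author_lines):
    """Return the author lines that contain a match for the pattern."""
    return [line for line in author_lines if re.search(pattern, line)]


for pattern in ("Linus Torvalds", "Linus", "Torvalds"):
    print(pattern, len(matching_authors(pattern, authors)))
```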


### Git's "pickaxe" -S option

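The `-S` option takes a string and shows only the commits that changed the number of occurrences of that string, for example to find when a function was added or removed:

`git log -S function_name`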

In [None]:
!git -C linux log --oneline -S 'alsa'
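
The idea behind the pickaxe can be sketched in a few lines of Python: a change counts as a hit when the number of occurrences of the search string differs before and after it. `pickaxe_hit` and the file contents below are hypothetical, and Git applies the comparison per file across each commit rather than to whole strings like this.

```python
def pickaxe_hit(before, after, needle):
    """Return True when the change alters how often `needle` occurs."""
    return before.count(needle) != after.count(needle)


# Hypothetical versions of one file.
old = "int probe(void);\n"
added = "int probe(void);\nint alsa_init(void);\n"
moved = "int alsa_init(void);\nint probe(void);\n"

print(pickaxe_hit(old, added, "alsa_init"))    # the string is introduced
print(pickaxe_hit(added, moved, "alsa_init"))  # the string only moves
```

A line that merely moves within a file leaves the count unchanged, so it is not reported.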

### Check the remote of the current clone of the repository

In [2]:
!git -C linux remote --verbose

origin	https://github.com/torvalds/linux.git (fetch)
origin	https://github.com/torvalds/linux.git (push)


### Fetch the latest commits to the repository

In [4]:
!git -C linux fetch origin

remote: Enumerating objects: 1812, done.
remote: Counting objects: 100% (1812/1812), done.
remote: Total 2292 (delta 1812), reused 1812 (delta 1812), pack-reused 480
Receiving objects: 100% (2292/2292), 699.73 KiB | 1.36 MiB/s, done.
Resolving deltas: 100% (1922/1922), completed with 683 local objects.
From https://github.com/torvalds/linux
   e472c64aa4fa..56cfd2507d3e  master     -> origin/master
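
The last line of the fetch output says that the remote-tracking ref `origin/master` moved from `e472c64aa4fa` to `56cfd2507d3e`. The local `master` branch and the working tree are not changed by a fetch. The sketch below parses that line and builds a `git log` command for the newly fetched range. The regular expression and `parse_fetch_update` are written for this example and cover only this one line format.

```python
import re

# Matches lines of the form "<old>..<new>  <branch> -> <remote-tracking ref>".
FETCH_LINE = re.compile(
    r"^\s*(?P<old>[0-9a-f]+)\.\.(?P<new>[0-9a-f]+)\s+(?P<branch>\S+)\s+->\s+(?P<ref>\S+)$"
)


def parse_fetch_update(line):
    """Return the parts of a fetch update line as a dict, or None if it does not match."""
    match = FETCH_LINE.match(line)
    return match.groupdict() if match else None


# Line copied from the fetch output above.
update = parse_fetch_update("   e472c64aa4fa..56cfd2507d3e  master     -> origin/master")
print(update)

# Command that would list the commits brought in by the fetch.
print(f"git -C linux log --oneline {update['old']}..{update['new']}")
```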
