# 3.2 Linux/Unix Operating System, GNU, editors, git 

**Before we start:**
* Point out update to table of commonly used format specifications at the end of Python notebook 1.1
 
In this notebook we switch between the Bash and Python 3 kernels. Check the kernel name at the top right, next to the circle. We start with Bash. 

## Review

### Terminal FAQs
* How do commands entered on the terminal work: options and arguments.
* Where am I in the file structure? What is the [hierarchical tree structure](https://docstore.mik.ua/orelly/unix/upt/ch01_19.htm)? 
* What is the difference between an absolute and a relative path name? And what does that imply for the `cp` command? 
* What does the `cat` command do? Is there a `mouse` or a `dog` command? 
* I really did not get this example 
```bash
 ls -laRt / > dirs.txt
 zip dirs.zip dirs.txt
``` 
Can you unpack that for me?
* Can you demonstrate what the difference between these command line editors is? Which one should I choose? And do I really need to use one? Can't I just use the built-in JupyterLab editor?

### Git FAQs
* What is a repository anyways?
* What is the difference between a snapshot, a version and a commit?
* I have created a new file in a repository directory. I want to add it to the repository and connect it to GitLab? What should I do?
* I have modified a file. I forgot if it is already under version control. How can I find that out?
* OK, turns out the file is tracked. How do I add this modification to the GitLab repository?
* Can you remind me of the basic workflow of git?
* You said you updated the _course repo_ on GitLab and that we should _pull_ it. What does that mean and what should I actually do? 

## Git - part II

### Additional useful commands
command | what it does
--------|--------------
`git log --all --oneline --decorate --graph` | shows a formatted log of past commits
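`git branch branchname`  | create a new branch called `branchname` (without a name: list the existing branches)
`git checkout branchname` | switch to the branch `branchname`; given a file name instead (`git checkout filename`), restore that file from the last commit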
`git clean` | removes all untracked files

* Demonstrate how to use `git checkout` to retrieve a deleted or changed tracked file from last commit
* `git clean` can be dangerous; use with options `-i` (interactive), `-n` (dry run), `-d` (directory) and `-f` (force). 
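
Both points can be tried safely in a throwaway repository. The following is a minimal sketch, not part of the course repo: the scratch directory from `mktemp -d`, the placeholder user identity and the file names `notes.txt` and `scratch.tmp` are all made up for illustration.

```bash
# Work in a throwaway repository so nothing in the course repo is touched.
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"

echo "first version" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Delete the tracked file, then restore it from the last commit.
rm notes.txt
git checkout -- notes.txt
restored=$(cat notes.txt)
echo "restored content: $restored"

# Create an untracked file and ask git clean what it WOULD remove (-n = dry run).
touch scratch.tmp
dry_run=$(git clean -n)
echo "$dry_run"
```

The `--` in `git checkout -- notes.txt` only tells git that what follows is a file name and not a branch name. Because of `-n`, the untracked file is still there afterwards; replacing `-n` with `-f` would actually delete it.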

#### Branching & merging

A key feature of any distributed version control system (DVCS) is support for development work on a separate development branch while the master branch stays intact with the latest working version. This is a very useful advanced feature that many people use all the time.  

```shell
git branch dev_abc
git checkout dev_abc
```

Any changes you now make to the repository are committed to that branch. When you try to push the commit to the remote for the first time, git will tell you that the remote branch does not exist yet and which command to use to create it. 

You can look at the changes introduced in the `dev_abc` branch on the GitLab page of the repository. 

If you switch the `HEAD` back to `master` (`git checkout master`) you may use `git merge` to merge the `dev_abc` branch with your main branch:
```shell
git merge dev_abc
```
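
The whole branch-and-merge cycle can be rehearsed in a scratch repository before trying it on real work. This is a sketch under assumptions: the repository lives in a temporary directory, the user identity is a placeholder, the file `code.txt` is invented, and the name of the main branch is read from git because it may be `master` or `main` depending on the installation.

```bash
demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"

# The default branch may be called master or main depending on the git setup.
main_branch=$(git symbolic-ref --short HEAD)

echo "stable" > code.txt
git add code.txt
git commit -q -m "Working version"

# Create the development branch and switch to it.
git branch dev_abc
git checkout -q dev_abc
echo "new feature" >> code.txt
git commit -q -a -m "Add feature on dev_abc"

# Back on the main branch the file is still in its old state.
git checkout -q "$main_branch"
before_merge=$(cat code.txt)

# Merge the development branch into the main branch.
git merge -q dev_abc
after_merge=$(cat code.txt)

echo "before merge: $before_merge"
echo "after merge:"
echo "$after_merge"
```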


### How to get and stay out of trouble
The most common way to get confused with your git repo is when you say `git pull` and you get a message like this:
```
error: Your local changes to the following files would be overwritten by merge:
        README.md
Please commit your changes or stash them before you merge.
Aborting
```
What has happened? Well, for a start - read the error message. Git is very good at providing useful messages. This situation happens when you have made changes to a tracked file that are not committed. Then git does not know how to merge them, but it also will not overwrite your changes. 

What can you do? Decide if you want to keep the changes, then add/commit. Or if you want to discard, then checkout the file, i.e. recreate it from the last commit with `git checkout filename`.

Then repeat the `git pull` command. 
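
One way to avoid the surprise is to check for uncommitted changes to tracked files before pulling. The sketch below assumes a scratch repository and a hypothetical helper function `tracked_files_clean`; it relies on `git diff --quiet`, which returns a non-zero exit status when there are differences.

```bash
# Hypothetical helper: succeeds only if no tracked file has uncommitted changes.
tracked_files_clean() {
    git diff --quiet && git diff --cached --quiet
}

demo_repo=$(mktemp -d)
cd "$demo_repo"
git init -q
git config user.name "Demo User"           # placeholder identity, local to this repo
git config user.email "demo@example.com"
echo "v1" > README.md
git add README.md
git commit -q -m "Add README"

# An uncommitted change to a tracked file: this is what blocks "git pull".
echo "local edit" >> README.md
if tracked_files_clean; then state_before="clean"; else state_before="dirty"; fi

# Option 1 (keep the change):    git add README.md; git commit -m "..."
# Option 2 (discard the change): restore the file from the last commit.
git checkout -- README.md
if tracked_files_clean; then state_after="clean"; else state_after="dirty"; fi

echo "before: $state_before, after: $state_after"
```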

You can avoid the problem by being deliberate in what you do in your repo and generally keep it clean. Make changes to a notebook and commit. Remove files that you do not want to track. 

There are _best practice_ solutions to all possible problems in git, such as undoing a commit, etc. For a beginner, an easier, if somewhat more radical, way out is to copy the untracked and/or uncommitted files out of the repo, remove the repo, clone it again and copy the files in question back into the repo. 

## Advanced terminal and shell commands
In this section we will cover additional commands to be used in the terminal.

### System operations via Python
There are Python modules that allow you to run operating system or terminal commands from Python. This lets you do things inside Python that you would otherwise do in the shell. 

**Temporarily switch kernel to Python 3.**

In [None]:
import sys

Which version of Python

In [None]:
print(sys.version) 

Create 5 files with names `samples_nn.dat` where `nn` takes numbers from 7 to 11. Each file contains the number `nn`. Then create a directory called `samples` and move the files into the directory.

In [None]:
import subprocess

In [None]:
for i in range(7,12):
    # the with statement closes the file automatically
    with open("samples_"+str(i)+".dat",'w') as f:
        f.write(str(i))

In [None]:
!ls sam*dat

Change the program so that the file names have always the same number of characters by padding single-digit numbers with zeros. This can be done with the string method `zfill`.

In [None]:
for i in range(7,12):
    # zfill(2) pads single-digit numbers with a leading zero
    fname = "samples_"+str(i).zfill(2)+".dat"
    with open(fname,'w') as f:
        f.write(str(i))

In [None]:
subprocess.call(['mkdir','samples'])
for i in range(7,12):
    fname = "samples_"+str(i).zfill(2)+".dat"
    with open(fname,'w') as f:
        f.write(str(i))
    # the file is closed at this point, so it is safe to move it
    subprocess.call(['mv',fname,'samples'])

In [None]:
!cat samples/samples_07.dat

There are other libraries that do similar things, such as `os` and `shutil`.

**Switch kernel back to Bash.**

### More file system commands
command | what it does
--------|--------------
  `df` | report file system disk space usage  
  `more/less/tail/head` | pager commands that show all or part of a text or data file
  `touch` | change file timestamps 
  `grep`  | print lines matching a pattern
  `wc` | print newline, word, and byte counts for each file
  `sed` | stream editor for filtering and transforming text
  `rmdir` | remove empty directory
  `history` | show command history

#### Examples:
* List directory content

In [None]:
ls 

* Show the top 10 lines of the file `Index.md`:

In [None]:
head Index.md

* How many lines does the entire file have?

In [None]:
wc Index.md

* Check the `man` page to see what the other output is.
* Check the `man` page to find out how to show the first 15 lines. Then, pipe the output of that command into the `wc` command. What should be the result?
* Use a combination of the `head` and `tail` command to show just line number 15.
* Scroll through the entire `Index.md` file using the `less` command. Try the `/` command for searching and `g` and `G` for going to the top and bottom.
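
One possible way to work through the `head`, `tail` and `wc` exercises is sketched here on a generated practice file, so that the result does not depend on the content of `Index.md`. The file name `practice.txt` and the scratch directory are made up for this example.

```bash
# Build a 40-line practice file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
for i in $(seq 1 40); do
    echo "this is line $i"
done > practice.txt

# wc -l counts lines only.
total=$(wc -l < practice.txt)

# head -n 15 prints the first 15 lines; piping into wc -l counts them.
first15=$(head -n 15 practice.txt | wc -l)

# The last line of the first 15 lines is line number 15.
line15=$(head -n 15 practice.txt | tail -n 1)

echo "total lines: $total"
echo "lines from head: $first15"
echo "line 15: $line15"
```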

#### More examples:
* Again, at the terminal use a pager command to see what a notebook file actually looks like.
* The following command looks for all lines that contain the word _superduper_ in the file named `3.2_Linux_terminal_git.ipynb` - i.e. this notebook file:

In [None]:
# for the next cell make sure you are in the course repository directory
pwd

* Show the lines that contain the word "superduper" 

In [None]:
grep "superduper" "3.2_Linux_terminal_git.ipynb" 

* Demonstrate `history` 

### More about the shell
The most common shell to use these days is called the [Bash](https://en.wikipedia.org/wiki/Bash_(Unix_shell)) shell. It is the interactive program responsible for many things you do at the command line. It is also a scriptable programming language, which means you can write Bash programs (also called scripts). We will not use this much in this class, but it is very useful to know of its existence and to be aware of the basic concepts explained below.

#### Processes
command | what it does
--------|--------------
  `\|`  | a shell feature called a _pipe_ which allows you to pipe the output of one command as input into another command
  `>` | redirect the output of a command into a file
  `top` | show all processes
  `ps` | report a snapshot of the current processes
  `jobs` | show currently running jobs or programs
  `kill` | send a signal to a process
  `nohup` and `&` and `nice` | `&` sends a job to the background, `nohup` keeps it running after you log out, `nice` lowers its scheduling priority
  Ctrl-C | press Ctrl (control) and C at the same time - this kills an interactive process
  Ctrl-Z | press Ctrl (control) and Z at the same time - this stops (suspends) an interactive process
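
A short sketch of how `&` and `kill` from the table work together. The `sleep 30` command merely stands in for any long-running program, and the variable names are made up for this example.

```bash
# Start a long-running command in the background with "&".
sleep 30 &
bg_pid=$!                  # $! holds the process ID of the last background job

# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$bg_pid" 2>/dev/null; then
    was_running="yes"
else
    was_running="no"
fi

# Send the default signal (TERM) to end the process, then collect its exit status.
kill "$bg_pid"
exit_status=0
wait "$bg_pid" 2>/dev/null || exit_status=$?

# Bash reports 128 + signal number for a process ended by a signal (TERM is 15).
echo "was running: $was_running, exit status after kill: $exit_status"
```

In an interactive terminal you would typically use `jobs` to list background jobs and `kill %1` to address a job by its job number instead of its process ID.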


* How many lines are there in the file `3.2_Linux_terminal_git.ipynb` that contain the word `superduper`?

In [None]:
grep "superduper" "3.2_Linux_terminal_git.ipynb" | wc -l

Modify file content with the stream editor `sed`. The following replaces the word _superduper_ with the words _mountain goat_ in this file and writes the result to standard output, which is then redirected to the file `tt.ipynb`. Have a look at that notebook.

In [None]:
sed 's/superduper/mountain goat/g' 3.2_Linux_terminal_git.ipynb > tt.ipynb

Want to learn more about stream editing? Check the `sed` man page.

* Another example for output redirection

In [None]:
ls | grep READ > out.txt
cat out.txt

#### Variables 
Like any programming language, Bash allows you to define and work with variables, and to show their content with the `echo` command:

In [None]:
name="Edward"
names="Alfred Paul Ellis Roxie Sam"

echo The name is $name
echo The other names are $names

A special class of variables are those that have special meaning to the environment, and will be recognized by the shell. They can be set in the `.bashrc` file, see below. Examples for environment variables are: 

Variable | Comment
--|--
`EDITOR` | The command line editor to be used when needed, e.g. when the `git commit` command is used without the `-m` option (see [Git notebook](2._Intro_Git.ipynb))
 `PATH` | A list of absolute directory file path addresses where the shell will look for executables.
 
##### Assigning output of command to variable
A command like `ls` will print the output to the terminal. If you would like to do an operation on each file in a directory, for example, it is useful to be able to interpret the output of a command as a string. The _back quotes_ (left, top on your keyboard below the `Esc` key) will accomplish that. Consider the following example:

In [None]:
files=`ls`
echo $files
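
Bash also accepts the form `$(command)`, which does the same as the back quotes and is easier to read and to nest. The sketch below uses a scratch directory with three made-up file names and then loops over the captured output.

```bash
workdir=$(mktemp -d)
cd "$workdir"
touch alpha.dat beta.dat gamma.dat

# $(...) is equivalent to the back quotes.
files=$(ls)
echo "files:" $files

# Count the files by looping over the words stored in the variable.
count=0
for f in $files; do
    count=$((count + 1))
done
echo "number of files: $count"
```

Looping over `$files` splits the string at white space, so this only works as intended when no file name contains a space.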

##### Access substring
Often it is required to do something with just a part of a string. A substring can be accessed in the following way (experiment with the two number arguments to figure out what they mean):

In [None]:
echo ${files:3:12}
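
To check your conclusion, here is the same syntax applied to a string whose characters are easy to count. The variable `astring` is made up for this sketch.

```bash
astring="abcdefghijklmnopqrstuvwxyz"

# ${variable:offset:length}: the offset counts from 0, length is the number of characters.
part=${astring:3:12}
echo "$part"          # prints defghijklmno

# Without a length the substring runs to the end of the string.
tail_part=${astring:20}
echo "$tail_part"     # prints uvwxyz

# ${#variable} gives the length of the string.
echo "length: ${#astring}"
```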

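#### Wildcards (glob patterns)
These shell patterns are often loosely called regular expressions, but they follow the simpler rules of file name matching (_globbing_):
* `*` stands for _any_ sequence of characters
* `?` stands for exactly one arbitrary character
* `[1-5]` or `[14d]` stands for exactly one character from a range or a list of characters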

These can be used, for example in the `ls` or `mv` or `cp` commands.
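
The patterns can be tried out in a scratch directory. The file names below are invented so that each pattern selects a different subset; `echo` is used instead of `ls` only to get the matches on one line.

```bash
workdir=$(mktemp -d)
cd "$workdir"
touch run_1.dat run_2.dat run_5.dat run_10.dat run_d.dat notes.txt

all_dat=$(echo *.dat)             # * matches any sequence of characters
one_char=$(echo run_?.dat)        # ? matches exactly one character
in_range=$(echo run_[1-3].dat)    # [1-3] matches one character from the range 1 to 3
in_list=$(echo run_[5d].dat)      # [5d] matches one character from the list 5, d

echo "all .dat files: $all_dat"
echo "single character: $one_char"
echo "range 1-3: $in_range"
echo "list 5 or d: $in_list"
```

Note that `run_10.dat` is matched by `*.dat` but not by `run_?.dat`, because `10` is two characters long.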

#### Escaping special characters
Study the following examples, which demonstrate working with variables and show how the special meaning of the `$` character can be escaped.

In [None]:
ABC=Orwell.1984
echo The variable is shown with the command 
echo "echo \$ABC"

In [None]:
echo The value of the variable ABC is $ABC

##### String manipulation
```
set 
basename
dirname
```

In [None]:
set -- abc def 123 fg10
echo $1

In [None]:
astr="abc def 123 fg10"
set -- $astr
echo $2

Note that in the above example `set --` assigns the words, in sequence, to the _positional parameters_ `$1`, `$2`, `$3` etc. The list starts with `1` (and not with `0` as it would in Python).

In [None]:
basename /usr/local/share/doc/foo/foo.txt

In [None]:
dirname /usr/local/share/doc/foo/foo.txt
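
A typical use of the two commands is to build a new file name from an existing one. This sketch reuses the path from the cells above; the `.bak` extension and the variable names are made up for illustration, and no file is actually created.

```bash
path=/usr/local/share/doc/foo/foo.txt

dir=$(dirname "$path")             # everything up to the last slash
file=$(basename "$path")           # the last path component
stem=$(basename "$path" .txt)      # a second argument strips that suffix

echo "directory: $dir"
echo "file name: $file"
echo "name without extension: $stem"

# Derive an output file name from the input file name.
backup="$dir/$stem.bak"
echo "backup name: $backup"
```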

### Networking
This is a list of common networking commands. We will not need these (much) in this course, but it is useful for you to know they exist.

command | what it does
--------|--------------
  `ssh` | login to a remote computer
  `scp` | copy a file from/to a remote computer
  `whoami` | print the user name you are logged in as
  `rsync`  | incrementally synchronize files and directories across the network
  `ftp`  | file-transfer protocol, transfer files from remote locations using the ftp protocol
  `wget` | transfer files from remote locations using the http protocol

* `wget` example: download the file `example.sh` from the internet:


In [None]:
wget https://raw.githubusercontent.com/UVic-CompPhys/physmath248-2018/master/examples/example.sh

### Shell scripts

Combine a sequence of shell commands in a file and use it as a program, called in this case a _shell script_. You have to make the file with the script executable (`chmod u+x file_name.sh`). See file [`example.sh`](./examples/example.sh). The `#` character signals a comment line. You can execute a shell script by typing the name of the shell script file and hitting Return:
```
./example.sh
```


The `./` at the beginning ensures that the shell runs the `example.sh` file in the current directory, and not a command of the same name in some other directory listed in the `PATH` environment variable. The `PATH` environment variable is a list of directories in which the shell will look for executable files. One common use case is that people create a `bin` directory (bin for binary, which usually implies executable, but shell scripts are plain `ASCII` text files and can also be executed). They place all personal programs and shell scripts in that `bin` dir and then add the `$HOME/bin` dir to the `PATH` variable. Now those programs are available from anywhere in your directory tree. The `HOME` environment variable contains the path name of your home directory. Try `echo $HOME` on the command line.
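
The `bin` directory idea can be tested without touching your home directory. In this sketch a scratch directory stands in for `$HOME/bin`, and the script name `hello.sh` is made up.

```bash
# A scratch directory stands in for $HOME/bin so nothing in your home is changed.
mybin=$(mktemp -d)

# Write a two-line shell script into that directory.
cat > "$mybin/hello.sh" << 'EOF'
#!/bin/bash
echo "hello from a script"
EOF

chmod u+x "$mybin/hello.sh"        # make the script executable

# Append the directory to PATH; now the script is found without ./ or a full path.
export PATH=$PATH:$mybin
output=$(hello.sh)
found_at=$(command -v hello.sh)

echo "$output"
echo "found at: $found_at"
```

The change to `PATH` only lasts for the current shell session. To make it permanent, the `export` line would go into `.bashrc`.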

#### Customize your shell/CLI
You can define variables and aliases in the `.bashrc` file. The details of how this is set up depend on the particular Linux/Unix/Mac flavour.

```
   alias
   environment variables
```

At startup the shell will execute the shell commands in `.bashrc`.
There may already be a `.bashrc` in your home directory. Have a look in there; it may suggest that you add your own aliases to a separate file (on Debian/Ubuntu systems this is typically `.bash_aliases`). Otherwise add aliases and environment variables directly to the `.bashrc` file. Add:
* an environment variable setting, e.g. `export PATH=$PATH:$HOME/bin`
* an alias to redefine the `rm` command to always ask if a file should really be removed
* an alias for your editor   

#### Flow control in bash

Bash provides elaborate flow control that allows you to create powerful tools to manipulate files in a file system.

In [None]:
names="Alfred Paul Ellis Roxie Sam"
for name in $names
do
echo "Hello $name! How are you doing?" > "$name.txt"
done

# and check output:
ls *txt
cat $name.txt

In [None]:
if [ -f Sam.txt ] 
then
echo Sam.txt does already exist.
fi

In [None]:
# Another way to do this is the logical "and" operator "&&"
# think about it ....
[ -f Sam.txt ] && echo Sam.txt does already exist.

`&&` is the _and_ operator, and `||` is the _or_ operator. Try an example with the `||` operator.
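
A possible example with `||`, sketched in an empty scratch directory so that `Sam.txt` is guaranteed not to exist at the start. The variable name `status` is made up.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# "||" runs the right-hand command only if the left-hand command fails.
[ -f Sam.txt ] || echo "Sam.txt does not exist yet."

# Create the file only if it is missing.
[ -f Sam.txt ] || touch Sam.txt

# Both operators combined: report which of the two cases applies.
[ -f Sam.txt ] && status="present" || status="missing"
echo "Sam.txt is $status"
```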

#### Dot files (configuration files, hidden files)
We have seen that `ls -a` shows us _all_ files, including the _hidden_ files. These are files that start with a `.` and are usually configuration files of programs and applications that one does not need to see most of the time. These so-called _dot_ files contain default settings and set environment variables. Environment variables are bash variables that have a special meaning. An example is the `EDITOR` variable that sets your default choice of command-line editor. E.g., if your choice of command-line editor is `nano` then you could set the `EDITOR` variable to that value:
```
export EDITOR=nano
```
Here `export` makes sure the variable is known in all child processes as well.
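
The effect of `export` can be made visible by asking a child shell what it sees. In this sketch `local_only` and `MY_EDITOR` are made-up variable names, and `bash -c` starts the child process.

```bash
# A plain shell variable is not passed on to child processes ...
local_only="nano"
child_sees_local=$(bash -c 'echo "${local_only:-unset}"')

# ... but an exported variable is.
export MY_EDITOR="nano"
child_sees_exported=$(bash -c 'echo "${MY_EDITOR:-unset}"')

echo "child sees local_only: $child_sees_local"
echo "child sees MY_EDITOR: $child_sees_exported"
```

The expression `${name:-unset}` prints the word `unset` when the variable is empty or not defined, which is how the difference shows up in the output.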

#### Example
The following cell writes a file on the command line using `cat`:

In [None]:
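# Careful: ">" overwrites an existing ~/.bashrc; use ">>" to append instead.
# Quoting the delimiter ('%%') keeps $PATH and $HOME unexpanded in the file.
cat > ~/.bashrc << '%%'
export EDITOR="nano"
export PATH=$PATH:$HOME/bin
alias ed="emacs -nw"
alias git_log='git log --all --oneline --decorate --graph'
alias rm="rm -i"
%%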

In order to make the changes to your `.bashrc` file known to the shell you _source_ your `.bashrc` file:

```
source ~/.bashrc
```

Now try `echo $EDITOR` to see if that environment variable has been set. 
Whenever a new terminal is started the `.bashrc` is automatically sourced.

#### Convert a notebook
Individual notebooks can be converted to static formats using the terminal command `jupyter nbconvert --to html notebookname.ipynb` or via the _Download as_ option in the _File_ menu.

### Resources
You can find numerous online tutorials and support resources on the internet, such as:
* <http://linuxcommand.org/lc3_learning_the_shell.php>
* <http://www.emacswiki.org/emacs/LearningEmacs>
* [bash tutorial](http://www.funtoo.org/Bash_by_Example,_Part_1)
* [youtube shell/terminal tutorial](https://www.youtube.com/watch?v=oxuRxtrO2Ag) covers some of what we did in class, and then some more
