# Basic Commands In the Unix Shell

For Windows 10 users, activate [bash](http://www.numerama.com/tech/158150-le-shell-bash-sous-windows-10-ce-quil-faut-savoir.html).


## Unix Shell
The shell is a command programming language that provides an interface to the UNIX operating system. The documentation of a Unix command is displayed with the `man` command. Example:
```bash
man ls
```



In [1]:
!man man

man(1)                                                                  man(1)



NAME
       man - format and display the on-line manual pages

SYNOPSIS
       man  [-acdfFhkKtwW]  [--path]  [-m system] [-p string] [-C config_file]
       [-M pathlist] [-P pager] [-B browser] [-H htmlpager] [-S  section_list]
       [section] name ...


DESCRIPTION
       man formats and displays the on-line manual pages.  If you specify
       section, man only looks in that section of the manual.  name  is  normally
       the  name of the manual page, which is typically the name of a command,
       function, or file.  However, if name contains  a  slash  (/)  the

## Directories
The shell should start you in your home directory. This is your individual space on the UNIX system for your files. You can find out the name of your current working directory by typing:
```sh
pwd
```

In the terminal, type the letters 'p', 'w', 'd', and then press "enter". Always conclude each command by pressing the "enter" key. The response that follows on the next line is the name of your home directory, where the name following the last slash should be your username. The directory structure can be conceptualized as an inverted tree.

In the Jupyter notebook, a Unix shell command can be executed by prefixing it with the escape character "!". In a terminal, type the command directly, without the "!".

In [2]:
!pwd

/Users/navaro/big-data


Some Unix commands (not all) are also available as Jupyter magic commands, such as `%pwd`:

In [3]:
%pwd

'/Users/navaro/big-data'
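
For comparison, plain Python can report the same locations without going through the shell. This is only a small sketch; the variable names `cwd` and `home` are arbitrary.

```python
import os
from pathlib import Path

# Current working directory, the Python counterpart of `pwd`
cwd = Path.cwd()
print(cwd)

# Home directory, i.e. where a bare `cd` takes you
# (expanduser returns "~" unchanged if no home directory can be determined)
home = os.path.expanduser("~")
print(home)
```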

## Home directory

No matter where in the directory structure you are, you can always get back to your home directory by typing:
```sh
cd
```

### Create a new subdirectory named "primer" :
```sh
mkdir primer
```

In [4]:
%rm -rf primer  # remove primer directory if it exists
%mkdir  primer  # make the new directory 

Now change to the "primer" subdirectory, making it your current working directory:
```sh
cd primer
pwd
```

In [5]:
%cd primer

/Users/navaro/big-data/primer


## Files

Create a file using `date` command and `whoami`:
```sh
date >> first.txt
whoami >> first.txt
```
`date` and `whoami` are not Jupyter magic commands, so in the notebook they must be prefixed with "!":

In [6]:
!date >> first.txt
!whoami >> first.txt

### List files and directories
Files live within directories. You can see a list of the files in your "primer" directory (which should be your current working directory) by typing:
```sh
ls
```

In [7]:
%ls

first.txt


### Display file content
You can view a text file with the following command:
```sh
cat first.txt
```
("cat" is short for concatenate - you can use this to display multiple files together on the screen.) If you have a file that is longer than your 24-line console window, use instead "more" to list one page at a time or "less" to scroll the file down and up with the arrow keys. Don't use these programs to try to display binary (non-text) files on your console - the attempt to print the non-printable control characters might alter your console settings and render the console unusable.

In [8]:
%cat first.txt

Jeu  9 nov 2017 21:56:02 CET
navaro


- Copy file "first" using the following command:

```sh
cp first.txt 2nd.txt
```
By doing this you have created a new file named "2nd.txt" which is a duplicate of file "first.txt". Get the file listing with:
```sh
ls
```

In [9]:
%cp first.txt 2nd.txt
%ls

2nd.txt    first.txt


- Now rename the file "2nd" to "second":
```sh
mv 2nd.txt second.txt
```
Listing the files still shows two files because you haven't created a new file, just changed an existing file's name:
```sh
ls
```

In [10]:
%mv 2nd.txt second.txt
%ls

first.txt   second.txt


If you "cat" the second file, you'll see the same content as in your first file:
```sh 
cat second.txt
```

In [11]:
%cat second.txt

Jeu  9 nov 2017 21:56:02 CET
navaro


"mv" will allow you to move files, not just rename them. Perform the following commands:
```sh 
mkdir sub
mv second.txt sub
ls sub
ls
```
This creates a new subdirectory named "sub", moves "second.txt" into "sub", then lists the contents of both directories.

In [12]:
%mkdir sub
%mv second.txt sub
%ls sub

second.txt


You can list even more information about files by using the "-l" option with "ls". Among other things, this lists the file access permissions, the owner's username and group name, the file size in bytes, and the date and time of the last modification. The letter 'd' (the first character on the line) indicates a directory.

In [13]:
%ls -l

total 8
-rw-r--r--  1 navaro  staff   36  9 nov 21:56 first.txt
drwxr-xr-x  3 navaro  staff  102  9 nov 21:56 sub/


Next perform the following commands:
```sh
cd sub
pwd
ls -l
cd ..
pwd
```


In [14]:
# go to sub directory
%cd sub 
# current working directory
%pwd  
# list files with permissions
%ls -l 
# go to parent directory
%cd ..  
# current working directory
%pwd     

/Users/navaro/big-data/primer/sub
total 8
-rw-r--r--  1 navaro  staff  36  9 nov 21:56 second.txt
/Users/navaro/big-data/primer


'/Users/navaro/big-data/primer'

Finally, clean up the duplicate files by removing the "second.txt" file and the "sub" subdirectory:
```sh
rm sub/second.txt
rmdir sub
ls -l
cd
```
This shows that you can refer to a file in a different directory using the relative path name to the file (you can also use the absolute path name to the file - something like "/Users/username/primer/sub/second.txt", depending on your home directory). You can also include the ".." within the path name (for instance, you could have referred to the file as "../primer/sub/second.txt").

In [15]:
%rm sub/second.txt
%rmdir sub
%ls -l
%cd ..
%rm -r primer

total 8
-rw-r--r--  1 navaro  staff  36  9 nov 21:56 first.txt
/Users/navaro/big-data
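
The same walk-through can be sketched in Python with `pathlib` and `shutil`. This is an illustrative equivalent, not part of the original exercise: it works in a temporary directory (an assumption made so that nothing in your home directory is touched), and the file content is a placeholder.

```python
import shutil
import tempfile
from pathlib import Path

# Throwaway directory standing in for the home directory
base = Path(tempfile.mkdtemp())

primer = base / "primer"
primer.mkdir()                                        # mkdir primer

first = primer / "first.txt"
first.write_text("hello\n")                           # placeholder for `date >> first.txt`

shutil.copy(first, primer / "2nd.txt")                # cp first.txt 2nd.txt
(primer / "2nd.txt").rename(primer / "second.txt")    # mv 2nd.txt second.txt

sub = primer / "sub"
sub.mkdir()                                           # mkdir sub
shutil.move(str(primer / "second.txt"), str(sub))     # mv second.txt sub

listing = sorted(p.name for p in primer.iterdir())    # ls
print(listing)

# A path containing ".." reaches the same file as the direct path
via_parent = sub / ".." / "sub" / "second.txt"
same_file = via_parent.resolve() == (sub / "second.txt").resolve()
print(same_file)

(sub / "second.txt").unlink()                         # rm sub/second.txt
sub.rmdir()                                           # rmdir sub (must be empty)
shutil.rmtree(base)                                   # rm -r on what is left
```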


## Connect to a server

Remote login to another machine can be accomplished using the "ssh" command:
```
ssh -l myname host
```
where "myname" will be your username on the remote system (possibly identical to your username on this system) and "host" is the name (or IP address) of the machine you are logging into. 

Transfer files between machines using "scp". 
- To copy file "myfile" from the remote machine named "host":
```sh
scp myname@host:myfile .
```
(The "." refers to your current working directory, meaning that the destination for "myfile" is your current directory.)
- To copy file "myfile" from the local machine to the remote machine named "host":
```sh
scp myfile myname@host:
```
(Nothing after the ":" means that the destination is your home directory on the remote machine.)

## Summary Of Basic Shell Commands
```sh
% pico myfile	#text edit file "myfile"
% ls	#list files in current directory
% ls -l	#long format listing
% touch myfile  #create new empty file "myfile"
% cat myfile	#view contents of text file "myfile"
% more myfile	#paged viewing of text file "myfile"
% less myfile	#scroll through text file "myfile"
% head myfile   #view the first 10 lines of text file "myfile"
% tail myfile   #view the last 10 lines of text file "myfile"
% cp srcfile destfile	#copy file "srcfile" to new file "destfile"
% mv oldname newname	#rename (or move) file "oldname" to "newname"
% rm myfile	#remove file "myfile"
% mkdir subdir	#make new directory "subdir"
% cd subdir	#change current working directory to "subdir"
% rmdir subdir	#remove (empty) directory "subdir"
% pwd	#display current working directory
% date	#display current date and time of day
% ssh -l myname host	#remote shell login of username "myname" to "host"
% scp myname@host:myfile .	#remote copy of file "myfile" to current directory
% scp myfile myname@host:	#copy of file "myfile" to remote server
% firefox &	#start Firefox web browser (in background)
% jobs      # display programs running in background
% kill %n  # kill job number n (use jobs to get this number)
% man -k "topic"	#search manual pages for "topic"
% man command	#display man page for "command"
% exit	#exit a terminal window
% logout	#logout of a console session
```

## Redirecting 

Redirection is usually implemented by placing one of the characters `<`, `>`, `|` or `>>` between two commands, or between a command and a file.

- Use  > to redirect output.
```bash
ls *.ipynb > file_list.txt
```
executes `ls`, placing the output in file_list.txt, as opposed to displaying it at the terminal, which is the usual destination for standard output. This will clobber any existing data in file_list.txt.

In [29]:
!ls *.ipynb > file_list.txt

- Use < to redirect input.
```bash
wc < file_list.txt
```
executes `wc`, with file_list.txt as the source of input, as opposed to the keyboard, which is the usual source for standard input.



### Python example

In [34]:
%%file test_stdin.py
#!/usr/bin/env python3
import sys

# input comes from standard input
k = 0
for file in sys.stdin:
    k +=1
    print('file {} : {}'.format(k,file))

Overwriting test_stdin.py


In [35]:
!python3 test_stdin.py < file_list.txt

file 1 : 01.MapReduce.ipynb

file 2 : 02.Containers.ipynb

file 3 : 03.ParallelComputation.ipynb

file 4 : 04.AsynchonousProcessing.ipynb

file 5 : 05.UnixCommands.ipynb

file 6 : 06.Hadoop.ipynb

file 7 : 07.PySpark.ipynb

file 8 : 08.Dask.ipynb



You can combine the two capabilities: read from an input file and write to an output file.

In [36]:
!python3 test_stdin.py < file_list.txt > output.txt

In [37]:
%cat output.txt

file 1 : 01.MapReduce.ipynb

file 2 : 02.Containers.ipynb

file 3 : 03.ParallelComputation.ipynb

file 4 : 04.AsynchonousProcessing.ipynb

file 5 : 05.UnixCommands.ipynb

file 6 : 06.Hadoop.ipynb

file 7 : 07.PySpark.ipynb

file 8 : 08.Dask.ipynb



To append output to the end of the file, rather than clobbering it, use the `>>` operator. For example, `date >> output.txt` appends today's date to the end of the file output.txt.


In [38]:
!date >> output.txt
%cat output.txt

file 1 : 01.MapReduce.ipynb

file 2 : 02.Containers.ipynb

file 3 : 03.ParallelComputation.ipynb

file 4 : 04.AsynchonousProcessing.ipynb

file 5 : 05.UnixCommands.ipynb

file 6 : 06.Hadoop.ipynb

file 7 : 07.PySpark.ipynb

file 8 : 08.Dask.ipynb

Jeu  9 nov 2017 22:01:19 CET
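
The three operators can also be reproduced from Python, which makes explicit what the shell does: it opens the files and hands them to the command as standard input and standard output. The sketch below assumes a Unix-like system with `sort` and `wc` on the PATH; the file names and their content are made up for the illustration.

```python
import subprocess
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
file_list = workdir / "file_list.txt"   # hypothetical input file
output = workdir / "output.txt"         # hypothetical output file
file_list.write_text("a.ipynb\nb.ipynb\nc.ipynb\n")

# sort -r < file_list.txt > output.txt   (mode "w" truncates the file, like >)
with open(file_list) as src, open(output, "w") as dst:
    subprocess.run(["sort", "-r"], stdin=src, stdout=dst, check=True)

# wc -l < file_list.txt >> output.txt    (mode "a" appends to the file, like >>)
with open(file_list) as src, open(output, "a") as dst:
    subprocess.run(["wc", "-l"], stdin=src, stdout=dst, check=True)

# Three sorted names from the first command, then the line count from the second
lines = output.read_text().split()
print(lines)
```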


## Permissions
Every file on the system has associated with it a set of permissions. Permissions tell UNIX what can be done with that file and by whom. There are three things you can (or can't) do with a given file:
- read,
- write (modify),
- execute.

Unix permissions specify what the 'owner', the 'group' and 'all' (everyone else) can do.

If you try `ls -l` on the command prompt you get something like the following:
```bash
-rw-r--r--  1 navaro  staff   15799  5 oct 15:57 01.MapReduce.ipynb
-rw-r--r--  1 navaro  staff   18209 12 oct 16:04 02.Containers.ipynb
-rw-r--r--  1 navaro  staff   37963 12 oct 21:28 03.ParallelComputation.ipynb
```


Three bits specify access permissions:
- **r** read,
- **w** write,
- **x** execute.



### Example 
```
rwxr-xr--
```
- the owner can do anything with the file,
- members of the group can only read or execute it,
- the rest of the world can only read it.
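
Each `rwx` triplet is often written as one octal digit (r = 4, w = 2, x = 1), so the example above corresponds to mode 754. As a small sketch, Python's standard `stat.filemode` converts such a number back into the string shown by `ls -l`; the helper name `describe` is an assumption made for this illustration.

```python
import stat

def describe(mode_bits):
    """Return the nine-character rwx string for the given permission bits."""
    # S_IFREG marks a regular file; filemode() then yields e.g. '-rwxr-xr--',
    # and the leading file-type character is dropped.
    return stat.filemode(stat.S_IFREG | mode_bits)[1:]

print(describe(0o754))   # owner 7 = rwx, group 5 = r-x, others 4 = r--
print(describe(0o644))   # owner 6 = rw-, group 4 = r--, others 4 = r--
```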

## chmod
To set or modify a file's permissions, use the `chmod` program. Only the owner of a file (or the superuser) may use `chmod` to alter its permissions. `chmod` has the following syntax:
```bash
chmod [options] mode file(s)
```

- The 'mode' part specifies the new permissions for the file(s) that follow as arguments. A mode specifies first which users' permissions should be changed, and then which access types should be changed.
- The mode starts with one or more letters specifying which users are affected by the change: `u` (owner), `g` (group), `o` (the rest of the world) or `a` (all three).
- `+` adds and `-` removes the access types (`r`, `w`, `x`) that follow.

Suppose the original permissions of script.py are `rw-------`. Applying the following commands one after the other:

- `chmod u+x script.py` sets the permissions to `rwx------`
- `chmod a+x script.py` sets the permissions to `rwx--x--x`
- `chmod g+r script.py` sets the permissions to `rwxr-x--x`
- `chmod o-x script.py` sets the permissions to `rwxr-x---`
- `chmod og+w script.py` sets the permissions to `rwxrwx-w-`
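
The sequence above can be checked with a small simulation on the permission bits. This is a simplified sketch, not a reimplementation of `chmod`: the names `apply_symbolic`, `rwx`, `WHO` and `WHAT` are hypothetical, and only modes of the form letters, `+` or `-`, letters are handled.

```python
import stat

# Bits covered by each class of users and by each access type
WHO = {"u": 0o700, "g": 0o070, "o": 0o007, "a": 0o777}
WHAT = {"r": 0o444, "w": 0o222, "x": 0o111}

def apply_symbolic(mode, spec):
    """Apply one simplified symbolic mode such as 'og+w' to permission bits."""
    op = "+" if "+" in spec else "-"
    who, perms = spec.split(op)
    who_mask = 0
    for letter in who:
        who_mask |= WHO[letter]
    perm_mask = 0
    for letter in perms:
        perm_mask |= WHAT[letter]
    bits = who_mask & perm_mask          # only the selected users' bits change
    return mode | bits if op == "+" else mode & ~bits

def rwx(mode):
    """Format permission bits the way `ls -l` does, without the file type."""
    return stat.filemode(stat.S_IFREG | mode)[1:]

mode = 0o600                             # rw-------
history = []
for spec in ["u+x", "a+x", "g+r", "o-x", "og+w"]:
    mode = apply_symbolic(mode, spec)
    history.append(rwx(mode))
    print(spec, rwx(mode))
```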






## Download files

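```bash
mkdir books
# download quietly (-q) and save the result as books/972.txt (-O)
wget -q -O books/972.txt http://www.gutenberg.org/ebooks/972.txt.utf-8
# behind a proxy, pass the proxy settings with -e (replace URL with the address to download)
wget -e use_proxy=yes -e http_proxy=127.0.0.1:8080 URL
```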

## Pipelining
```bash
ls | grep ipynb
```
executes `ls`, using its output as the input for `grep`.
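
What the shell does for a pipe can be made explicit with Python's `subprocess` module: the standard output of the first process is connected to the standard input of the second. The sketch assumes a Unix-like system with `ls` and `grep` available; the directory and file names are invented for the example.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical directory with a few files to list
workdir = Path(tempfile.mkdtemp())
for name in ["a.ipynb", "b.txt", "c.ipynb"]:
    (workdir / name).touch()

# Equivalent of: ls workdir | grep ipynb
ls = subprocess.Popen(["ls", str(workdir)], stdout=subprocess.PIPE)
grep = subprocess.Popen(["grep", "ipynb"], stdin=ls.stdout,
                        stdout=subprocess.PIPE, text=True)
ls.stdout.close()            # so that ls is notified if grep exits early
matches, _ = grep.communicate()
ls.wait()
print(matches.split())
```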

### Exercise 5.1

- Pipe `cat *.ipynb` output to `sort` command.
- Pipe `ls` output to `wc` command.
- Pipe `cat 05.UnixCommands.ipynb` to `less` command.


## Chained pipelines

The redirection and piping tokens can be chained together to create complex commands. 

### Exercise 5.2

Chain Unix commands to display the word counts of the file `sample.txt`.

Hints:

- `fmt -n` takes text as input and reformats it into paragraphs with no line longer than n.
- `sort` sorts its input alphabetically.
- `tr -d str` deletes every character listed in str from its input.
- `uniq -c` writes one copy of each distinct line of its (sorted) input, preceded by the number of occurrences.



In [41]:
from lorem import text
with open('sample.txt', 'w') as f:
    f.write(text())

In [48]:
!cat sample.txt | tr '[:upper:]' '[:lower:]' | tr -d "." | fmt -1 | sort | uniq -c | sort -r

   9 quisquam
   6 porro
   6 numquam
   6 non
   5 velit
   5 sed
   5 modi
   5 etincidunt
   5 amet
   4 voluptatem
   4 tempora
   4 dolorem
   4 aliquam
   4 adipisci
   3 sit
   3 quiquia
   3 ipsum
   3 eius
   3 dolor
   2 ut
   2 neque
   2 magnam
   2 labore
   2 
   1 quaerat
   1 est
   1 dolore
   1 consectetur
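
For comparison, the same count can be sketched in pure Python with `collections.Counter`. The sample sentence is made up (the notebook's `sample.txt` is randomly generated), `word_counts` is a hypothetical helper, and unlike the pipeline above it removes all punctuation rather than only the full stops.

```python
import string
from collections import Counter

def word_counts(text):
    """Count words after lower-casing the text and removing punctuation."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return Counter(cleaned.split())

# Hypothetical sample text
sample = "Dolor sit amet. Sit dolor dolor."
counts = word_counts(sample)

# most_common() orders by decreasing count, like `sort | uniq -c | sort -r`
for word, n in counts.most_common():
    print(n, word)
```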


### Exercise 5.3

- Create a Python script mapper.py to count words from stdin. The script prints out every word found in stdin followed by the value 1, separated by a tab.
```text
Consectetur	1
adipisci	1
quiquia	1
sit	1
```


In [67]:
%%file mapper.py
#!/usr/bin/env python3
from __future__ import print_function # harmless on Python 3; str.maketrans below requires Python 3
import sys, string
# translation table that deletes every punctuation character
translator = str.maketrans('', '', string.punctuation)
# input comes from standard input
for line in sys.stdin:
    line = line.strip().lower() # remove leading and trailing whitespace, lower-case
    line = line.translate(translator)   # strip punctuation
    words = line.split() # split the line into words
    # emit one (word, 1) pair per word
    for word in words:
        # write the results to standard output;
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        # tab-delimited; the trivial word count is 1
        print('%s\t%s' % (word, 1))

Overwriting mapper.py


File `mapper.py` must be executable.

In [68]:
!chmod +x mapper.py

In [69]:
!cat sample.txt | ./mapper.py | sort | uniq -c | sort -r

   9 quisquam	1
   6 porro	1
   6 numquam	1
   6 non	1
   5 velit	1
   5 sed	1
   5 modi	1
   5 etincidunt	1
   5 amet	1
   4 voluptatem	1
   4 tempora	1
   4 dolorem	1
   4 aliquam	1
   4 adipisci	1
   3 sit	1
   3 quiquia	1
   3 ipsum	1
   3 eius	1
   3 dolor	1
   2 ut	1
   2 neque	1
   2 magnam	1
   2 labore	1
   1 quaerat	1
   1 est	1
   1 dolore	1
   1 consectetur	1


### Exercise 5.4

- Create a Python script reducer.py to read the sorted output of mapper.py. The script prints out every word and its number of occurrences.
```bash
cat sample.txt | ./mapper.py | sort | ./reducer.py
```

```text
7	porro
7	eius
6	non
6	dolore
```



In [78]:
%%file reducer.py
#!/usr/bin/env python3
from __future__ import print_function
import sys

current_word = None
current_count = 0

for line in sys.stdin:

    # parse the input we got from mapper.py
    word = line.split('\t')[0]

    # this IF-switch only works because the map output is sorted
    # by key (here: word) before it is passed to the reducer
    # (by `sort` here, by Hadoop in a MapReduce job)
    if current_word == word:
        current_count += 1
    else:
        # a new word starts: write the result for the previous word
        if current_word is not None:
            print('{}\t{}'.format(current_count, current_word))
        current_word = word
        current_count = 1

# write the result for the last word
if current_word is not None:
    print('{}\t{}'.format(current_count, current_word))

Overwriting reducer.py


In [79]:
!cat sample.txt | ./mapper.py | sort | ./reducer.py | sort -r

9	quisquam
6	porro
6	numquam
6	non
5	velit
5	sed
5	modi
5	etincidunt
5	amet
4	voluptatem
4	tempora
4	dolorem
4	aliquam
4	adipisci
3	sit
3	quiquia
3	ipsum
3	eius
3	dolor
2	ut
2	neque
2	magnam
2	labore
1	quaerat
1	est
1	dolore
1	consectetur
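
The reducer relies on its input being sorted by word. A compact way to see why is `itertools.groupby`, which, like the reducer, only merges consecutive lines with the same key. The function `reduce_pairs` and the sample lines are hypothetical and only serve as an illustration.

```python
from itertools import groupby

def reduce_pairs(lines):
    """Sum tab-separated (word, count) lines; consecutive equal words are merged."""
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)

# Hypothetical mapper output, in order of appearance
mapped = ["sit\t1", "dolor\t1", "sit\t1", "amet\t1", "dolor\t1", "sit\t1"]

# Without sorting, a word that reappears later starts a new group
unsorted_result = list(reduce_pairs(mapped))

# After sorting, each word forms a single run and gets its full count
sorted_result = list(reduce_pairs(sorted(mapped)))
print(sorted_result)
```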
