<DIV ALIGN=CENTER>

# Introduction to Unix    
## Professor Robert J. Brunner
  
</DIV>  
-----
-----

## Unix Overview

The Unix operating system is a complex technology that underlies many
current operating systems, including both Linux and Mac OS X. In this
lesson, we will briefly review basic Unix concepts:

- Unix shell
- Unix process model
- Unix filesystem

Next, we will discuss a number of important Unix commands that allow you
to work with a shell.

-----

## The Unix Shell

A standard [Unix](http://en.wikipedia.org/wiki/Unix) system provides a
command-line interface to allow a user to interact with the underlying
hardware. While this may lack the ease of use most modern graphical
operating systems provide, a command line has several benefits:

1. Flexible and powerful access to the underlying hardware
2. Interactions are identical between local and remote hardware
3. Commands can be connected or their input/output redirected.

The standard command-line interface on a Unix system is provided by a
program called a _shell_. While several different shells exist, one of the
more popular is the [Bash
shell](http://en.wikipedia.org/wiki/Bash_(Unix_shell)), which is what we
will use by default in this course.

A shell allows a user to run one or more programs, to redirect the input
and output of programs, and to connect programs together by using
pipes. It also allows a user to move around the filesystem and to
automate basic processing by writing small programs known as shell
scripts.

Commands are entered at the shell prompt. In this course, we can have
one of three different shell prompts:

1. The JupyterHub Server terminal prompt, which is something similar to
`data_scientist@temp_host:~$`.
2. The Host operating system prompt, which is provided by the
Boot2Docker application on Mac OS X or Windows and looks like
`bash-3.2$`.
3. The Docker container system prompt, which looks identical to the
JupyterHub Server Terminal prompt, but is obtained by connecting to a
running docker container from within a host shell.

Of these, you will be primarily using only the JupyterHub terminal prompt.

Any commands shown in this lesson include the relevant prompt to help
clarify where you should enter the command text.

-----

## The Unix Process Model

Unix operating systems support running multiple processes at the same
time. In fact, a central tenet of Unix is that the whole is greater than
the sum of its parts: combining multiple small running programs gives
greater power than any one program alone. Even if a user has no running
processes, the system itself will have a number of processes running. A
process is identified by a process identifier, or `pid`.

Several Unix commands are available for working directly with processes:

### `ps`
used to find out which processes are currently running. By default, `ps`
only lists the user's processes, although flags can be used to list
system processes or processes being run by other users (on a multi-user
system).

    $ ps
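
As a minimal sketch of such flags, the commands below assume a
POSIX-style `ps` that accepts the `-p`, `-o`, and `-e` options; `$$` is a
shell variable that expands to the pid of the current shell. The prompt
is omitted so the lines can be pasted as a script, and the output will
differ from system to system.

```shell
# Show only the current shell, printing its pid and command name.
ps -p "$$" -o pid,comm

# Capture the pid that ps reports; "pid=" suppresses the column header.
reported_pid=$(ps -p "$$" -o pid= | tr -d ' ')
echo "this shell has pid $reported_pid"

# Count the lines printed for every process on the system, not only
# your own (the count includes one header line).
process_lines=$(ps -e | wc -l)
echo "ps -e printed $process_lines lines"
```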
 
### `top`
used to continually monitor running processes to see how they are
consuming resources (e.g., memory or CPU). To exit from a running `top`
display, press the `q` key.

    $ top

### `kill`
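used to send a signal to a running process. By default, `kill` sends the
terminate signal (`SIGTERM`), which a process can catch in order to shut
down cleanly; the `-9` flag instead sends the kill signal (`SIGKILL`),
which cannot be caught or ignored. To send a signal to a specific
process you must have the correct pid (and have sufficient permission to
send the signal). For example, to forcibly kill a running process with
`pid` of 323: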

    $ kill -9 323
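
The following is a small sketch of the full cycle rather than a
definitive recipe: it uses `sleep` as a stand-in for a long-running
program, `$!` (the pid of the most recent background command) to find
the target, and `wait` to collect the exit status. With no flag, `kill`
sends the terminate signal (`SIGTERM`).

```shell
# Start a long-running stand-in process in the background and record its pid.
sleep 60 &
target_pid=$!

# Ask it to terminate; with no flag, kill sends SIGTERM (signal 15).
kill "$target_pid"

# Collect the exit status. Bash reports 128 plus the signal number
# for a process that was ended by a signal.
exit_status=0
wait "$target_pid" || exit_status=$?
echo "process $target_pid ended with status $exit_status"
```

Trying `kill` without `-9` first is usually the gentler choice, since it
gives the program a chance to clean up before exiting.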

### User Processes
Unix allows a program to be run in the background, thereby allowing the
user to continue working in the current shell. You can automatically
place a job in the background by adding an `&` at the end of the program
invocation:

    $ my-program &

You can also suspend a currently running program by pressing `CTRL-Z`
(which sends the suspend signal), and subsequently send it _into the
background_ by entering `bg` at the new Unix prompt and hitting return.
Likewise, you can bring a background job _to the foreground_ by entering
`fg` and hitting return at the Unix prompt.

If you have more than one job running in the background, you can list
running background jobs with the `jobs` command, which will list each
job along with its _job id_. You can use the _job id_ to selectively
bring a job to the foreground by passing the _job id_ to `fg` at the
Unix prompt.
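
A brief sketch of background jobs in a script-friendly form is shown
below. It uses `sleep` as a stand-in for real programs, and it uses the
`wait` builtin instead of `fg`, because `fg` and `bg` are meant for
interactive use at a prompt. The variable names are illustrative only.

```shell
# Start two short-running commands in the background.
# $! holds the pid of the most recently started background command.
sleep 1 &
first_pid=$!
sleep 1 &
second_pid=$!

# List the background jobs known to this shell, with their job ids.
jobs

# Block until each job has finished; wait returns that job's exit status.
wait "$first_pid"
first_status=$?
wait "$second_pid"
second_status=$?
echo "exit statuses: $first_status $second_status"
```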

### Standard Streams
By default Unix processes read input from a standard input stream, known
as `stdin`, write their output to a standard output stream, known as
`stdout`, and report any error conditions to a standard error stream,
known as `stderr`. By default, `stdin` is the keyboard, while `stdout`
and `stderr` are the display. However, these can be changed as needed at
the Unix command line.

The Unix process model allows commands to be connected by employing a
[Unix pipe](http://en.wikipedia.org/wiki/Pipeline_(Unix)) to connect the
`stdout` of one command to the `stdin` of a second command. This process
can continue, allowing multiple processes to be chained together into a
pipeline. This concept will make more sense later when we have covered
more Unix commands, but the basic syntax is to use the pipe character
`|` to connect commands together into a pipeline.

    command1 | command2 | command3
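
As a concrete illustration of this syntax, the sketch below chains the
standard `printf`, `sort`, `uniq`, and `wc` utilities; the sample words
are made up for the example.

```shell
# Three commands chained together: printf emits four lines, sort orders
# them, and uniq -c collapses adjacent duplicates while counting them.
printf 'pear\napple\npear\nfig\n' | sort | uniq -c

# The same idea, capturing the number of distinct lines in a variable.
distinct=$(printf 'pear\napple\npear\nfig\n' | sort | uniq | wc -l)
echo "distinct lines: $distinct"
```

Each command only needs to read `stdin` and write `stdout`; none of them
knows what sits on the other side of the pipe.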
    
-----           

## The Unix Filesystem

The [Unix filesystem](http://en.wikipedia.org/wiki/Unix_filesystem)
provides for data storage and retrieval from the underlying hardware, as
well as interprocess communication through pipes. The Unix filesystem is
based on a single rooted tree model. The root of the tree is known as
the __root__ directory, and is denoted by the `/` character.
Sub-directories branch off from this root directory to form the entire
filesystem hierarchy.

Files and directories have owners and groups (for example, I am the
owner of this lesson, but the entire class would have group access). A
special owner is known as root, or the superuser. If you have sufficient
privileges, you can run a command as the superuser by prefixing it with
the `sudo` command. Each entry in the file system has a permission mask
that specifies what the owner, the group, and the entire world (or all)
can do to the particular item.

For example, in our JupyterHub terminal we have the following root directory:

![Docker root](images/docker-root.png)

listing a number of standard directories, including `bin`, `dev`,
`home`, `usr`, `var`, and `lib` subdirectories. Inside each of these
directories can be additional directories or files, thus forming the
entire tree-like structure. To list these directories, we use the
`ls` command.

### `ls`
used to list the contents of a directory. The directory is supplied as a
parameter, for example to list the contents of the root folder:

    $ ls /
  
The `ls` command takes a number of different parameters. Two of the more
useful ones are:

- `-a` to list all files and directories. Any entry with a `.` or dot as
the first character is by default hidden when listing the contents of a
directory.
- `-l` to list the long format of each entry. This is useful to see the
permissions and owner of a directory or file.
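
To see the effect of these two flags without depending on any particular
system directory, the sketch below builds a throwaway directory with
`mktemp -d`; the file names `visible.txt` and `.hidden` are made up for
the example.

```shell
# Build a throwaway directory holding one ordinary file and one hidden (dot) file.
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden"

# A plain listing omits the dot file.
plain_listing=$(ls "$demo_dir")
echo "$plain_listing"

# -a adds the hidden file plus the . and .. entries.
all_listing=$(ls -a "$demo_dir")
echo "$all_listing"

# -l and -a can be combined to show permissions and owners for every entry.
ls -la "$demo_dir"

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```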

On our docker container we can display the full listing for all files in
the `/usr` directory. 

![Docker list](images/docker-ls.png)

In this listing the first two entries show the current directory,
indicated by a single `.` character, and then the parent directory,
indicated by two `.` characters. After this the full directory listing
is shown in alphanumeric sorted order. 

-----

## File Permissions

In a long directory listing the first column specifies the _mode_ and
_permission_ in a specific order: `muuugggaaa`

`m` stands for the mode, which can be `d` for a directory or a `-`
character for a regular file. Other
[modes](http://en.wikipedia.org/wiki/Unix_file_types) are more advanced
(and beyond this lesson), and include `l` for a link, `p` for a pipe,
and `s` for a socket.

The next segment contains three _triads_, or permission groupings for
user (u) specific permission, group (g) specific permission, and world
or all (a) specific permissions. There are three types of permissions
available: read, write, and execute, and they are listed in that order.
If a permission is not granted it is indicated by a `-` character in the
relevant position. So `-rwxr-xr-x` means the entry is a file that can be
read, written, and executed by the user, but only read and executed by
the group to which the file belongs or by anyone who can access the file.
Unix file permissions can be confusing, but with practice will begin to
make sense.
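
One way to practice is to pull a permission string apart yourself. The
sketch below is only an illustration: it creates a scratch file with
`mktemp`, sets its permissions with `chmod` (described later in this
lesson), and uses `cut` to slice the first column of `ls -l` into the
mode and the three triads.

```shell
# Create a scratch file and give it the permissions discussed above.
scratch=$(mktemp)
chmod u=rwx,go=rx "$scratch"

# The first ten characters of a long listing are the mode plus the three triads.
mode_string=$(ls -l "$scratch" | cut -c1-10)
echo "$mode_string"

# Split the string into its four parts.
echo "mode:  $(echo "$mode_string" | cut -c1)"
echo "user:  $(echo "$mode_string" | cut -c2-4)"
echo "group: $(echo "$mode_string" | cut -c5-7)"
echo "all:   $(echo "$mode_string" | cut -c8-10)"

rm -f "$scratch"
```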

The owner, group, and permissions can be changed for an item by using
the `chown`, `chgrp`, and `chmod` commands. Each of these commands can take
`-R` as a flag to indicate that the operation should be performed
recursively if the item is a directory. This will change every
sub-directory or file contained within that directory or sub-directory.

### `chown`
used to change the owner of a file or directory. May require superuser
privileges. For example, to change the owner of _myfile_ to user _rb_:

    $ chown rb myfile
    
### `chgrp`
used to change the group of a file or directory. May require superuser
privileges. For example, to change the group of _myfile_ to group _www_:

    $ chgrp www myfile

### `chmod`
used to change specific permissions of a file or directory. May require
superuser privileges. For example, to change the permissions to allow
anyone to read _myfile_:

    $ chmod a+r myfile

Note that this command can also use octal notation to specify the target
permission, which is more compact, but sometimes more susceptible to
user error.
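
To make the two notations concrete, here is a small sketch that applies
the same permissions once symbolically and once in octal. In octal, each
digit is the sum of read (4), write (2), and execute (1) for the user,
group, and all, in that order. The scratch files are created with
`mktemp` purely for the demonstration.

```shell
# Two scratch files that start with no permissions at all.
file_a=$(mktemp)
file_b=$(mktemp)
chmod 000 "$file_a" "$file_b"

# Symbolic notation: the user may read and write, everyone may read.
chmod u+rw,a+r "$file_a"

# Octal notation for the same result: 6 = rw- (4+2), 4 = r--.
chmod 644 "$file_b"

# Both files should now show the same permission string.
mode_a=$(ls -l "$file_a" | cut -c1-10)
mode_b=$(ls -l "$file_b" | cut -c1-10)
echo "$mode_a $mode_b"

rm -f "$file_a" "$file_b"
```

Symbolic notation adds or removes individual permissions, while octal
notation always replaces the entire permission mask, which is where the
extra risk of user error comes from.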

In our docker container, we only have the _root_ user and _root_ group, thus
we can't try out the first two commands; however, we can change file or
directory permissions.

-----

## Standard Streams

The Unix file system allows for the standard streams to be
[redirected](http://en.wikipedia.org/wiki/Redirection_(computing)). This
was discussed before in the context of process pipes; however, we can
also redirect the standard streams to be files. For example, we can
change `stdin` to be a file so that a command reads its input from the
file, or change `stdout` and/or `stderr` so that the output of a program
is written to a file. In the Bash shell, redirection like this is
accomplished by using the `<`, `>`, `&`, or `2` characters as follows:

### Redirect `stdin`
to redirect `stdin` to be a file called _infile_:

    command < infile

### Redirect `stdout`
to redirect `stdout` to be a file called _outfile_:

    command > outfile

### Redirect `stderr`
to redirect `stderr` to be a file called _errfile_:

    command 2> errfile

### Redirect multiple streams
These redirections can be combined to redirect one or more of `stdin`,
`stdout`, and `stderr`. For example, this will change the `stdin` and
`stdout`:

    command < infile > outfile
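
As a worked instance of this form, the sketch below uses `sort` as the
command. The file names follow the _infile_ and _outfile_ placeholders
above, and the files live in a temporary directory created just for the
example.

```shell
work_dir=$(mktemp -d)
infile="$work_dir/infile"
outfile="$work_dir/outfile"

# Create the input file by redirecting the stdout of printf.
printf 'cherry\napple\nbanana\n' > "$infile"

# sort reads its stdin from infile and writes its stdout to outfile.
sort < "$infile" > "$outfile"

sorted_contents=$(cat "$outfile")
echo "$sorted_contents"

rm -rf "$work_dir"
```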

To redirect the `stderr` to `stdout`:

    command 2>&1 

To redirect the `stderr` to `stdout` and capture the result in _outfile_:

    command > outfile 2>&1 
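
The sketch below shows these `stderr` redirections in action. The `emit`
function is a hypothetical stand-in for a real command; it writes one
line to each output stream, using `>&2` to send the second `echo` to
`stderr`.

```shell
work_dir=$(mktemp -d)

# A hypothetical command that writes one line to each output stream;
# ">&2" sends that echo to stderr instead of stdout.
emit() {
    echo "normal output"
    echo "error output" >&2
}

# Send the two streams to separate files.
emit > "$work_dir/outfile" 2> "$work_dir/errfile"

# Send both streams to the same file: stdout is pointed at the file
# first, then stderr is pointed at wherever stdout now goes.
emit > "$work_dir/combined" 2>&1

out_text=$(cat "$work_dir/outfile")
err_text=$(cat "$work_dir/errfile")
combined_lines=$(wc -l < "$work_dir/combined")
echo "outfile: $out_text"
echo "errfile: $err_text"

rm -rf "$work_dir"
```

The order of the redirections matters: writing `2>&1` before
`> outfile` would point `stderr` at the original `stdout` (normally the
display) and only then move `stdout` to the file.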

### `tee`
When using Unix pipes, you may want to capture the output stream in a
file, while still continuing with the pipe. To do this, you use the `tee`
command. To save the output of `command1` in _myfile_ and also pipe the
output into `command2`:

    command1 | tee myfile | command2
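
As a concrete version of this pattern, the sketch below uses `printf` as
`command1` and `sort` as `command2`; the file names and sample words are
made up for the example.

```shell
work_dir=$(mktemp -d)

# tee writes the unsorted lines to myfile and also passes them along to sort.
printf 'cherry\napple\nbanana\n' | tee "$work_dir/myfile" | sort > "$work_dir/sorted"

# myfile keeps the original order, while sorted holds the pipeline's final output.
saved_first=$(head -n 1 "$work_dir/myfile")
sorted_first=$(head -n 1 "$work_dir/sorted")
echo "myfile starts with $saved_first; sorted output starts with $sorted_first"

rm -rf "$work_dir"
```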

-----

## Unix File System commands

There are a number of Unix commands that we can use to view, move,
create, and change files and directories. Some of the more useful ones
include:        

### `pwd`
used to find out the name of the current working directory.

    $ pwd

### `cd`
used to change the current working directory. If a directory is
specified, we change to that directory, otherwise we change to the
user's home directory. Directory names can be absolute (starting with
the root directory, or `/`) or relative, where we use two `.` characters
to signal the parent directory of the current directory (one `.`
character represents the current directory):

    $ cd /notebooks
    $ cd ..
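
The difference between absolute and relative names can be checked with
`pwd` after each move. The sketch below works inside a temporary
directory so that it does not depend on `/notebooks` existing; the
`project` sub-directory is a made-up name.

```shell
# Build a small hypothetical directory tree to move around in.
base_dir=$(mktemp -d)
mkdir "$base_dir/project"

# Absolute name: starts from the root directory.
cd "$base_dir/project"
after_absolute=$(pwd)

# Relative name: .. is the parent of the current directory.
cd ..
after_parent=$(pwd)

# Relative name: descend into a sub-directory of the current directory.
cd project
after_relative=$(pwd)

echo "$after_absolute"
echo "$after_parent"
echo "$after_relative"

# Leave the temporary tree before removing it.
cd /
rm -rf "$base_dir"
```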

### `pushd`
a shell builtin command that changes directories, but pushes the current
directory onto a stack variable for later retrieval.

    $ pushd /notebooks

### `popd`
a shell builtin command that returns to the directory most recently
saved by `pushd`, and removes that directory from the stack.

    $ popd
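
A round trip with these two builtins might look like the sketch below.
It assumes the Bash shell used in this course (other shells may not
provide `pushd` and `popd`), and it visits a temporary directory rather
than `/notebooks` so that it runs anywhere.

```shell
start_dir=$(pwd)
other_dir=$(mktemp -d)

# pushd changes to other_dir and remembers where we came from.
# Both builtins print the directory stack, which is silenced here.
pushd "$other_dir" > /dev/null
inside=$(pwd)

# popd returns to the remembered directory.
popd > /dev/null
back=$(pwd)

echo "visited $inside, returned to $back"
rmdir "$other_dir"
```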

### `touch`
used to make a new, empty file, with the name specified on the command
line. If the file already exists, its contents are left alone and only
its timestamps are updated. For example, to make a new file called
_myfile_:

    $ touch myfile

### `mkdir`
used to make a new directory, with the name specified on the command
line. Might require superuser privileges. For example to make a new
directory called _mytest_:

    $ mkdir mytest

### `rmdir`
used to remove an empty directory. Might require superuser privileges.
For example, to delete a directory called _mytest_:

    $ rmdir mytest

### `rm`
used to remove files or directories. To forcibly remove all entries
(including non-empty directories) you can use the `-rf` flags, where
`-r` removes directories recursively and `-f` suppresses any
confirmation prompts. For example, to remove _myfile_:

    $ rm myfile

![Docker rm](images/docker-rm.png)
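
The commands in this section fit together as in the sketch below, which
runs entirely inside a temporary directory so that nothing of value can
be deleted by accident. The names `mytest` and `myfile` match the
examples above; the result variables are illustrative only.

```shell
work_dir=$(mktemp -d)
cd "$work_dir"

mkdir mytest            # new, empty directory
touch mytest/myfile     # new, empty file inside it

# rmdir refuses to remove a directory that still has contents.
if rmdir mytest 2> /dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

rm mytest/myfile        # remove the file
rmdir mytest            # now the empty directory can be removed

if [ -e mytest ]; then
    still_there="yes"
else
    still_there="no"
fi
echo "first rmdir: $rmdir_result; mytest still exists: $still_there"

# Leave the temporary directory, then remove it and anything left inside.
cd /
rm -rf "$work_dir"
```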

### <font color ='red'>Warning:</font> Removing files or directories at the Unix command prompt is permanent!

-----

### Additional References

1. A UNIX [Tutorial for Beginners](http://www.ee.surrey.ac.uk/Teaching/Unix/)
2. The [Linux Command Line (PDF)](http://sourceforge.net/projects/linuxcommand/?source=dlp)
3. [Introduction to Linux](http://www.tldp.org/LDP/intro-linux/html/index.html), a hands on guide.
4. A [Bash Shell Programming Introduction](http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html)

-----
