# Introduction to Shell

## Chapter 1: Manipulating files and directories

In [None]:
# Current directory
! pwd

In [None]:
# List contents
! ls

In [None]:
# Change directory
! cd directory_name

# Move home
! cd ~

# Move up a directory
! cd ..

In [None]:
# Copy files (overwrites file if already exists)
! cp original.txt duplicate.txt

# Copy multiple files into the backup folder
! cp seasonal/autumn.csv seasonal/winter.csv backup

In [None]:
# Move files up a directory
! mv autumn.csv winter.csv ..

In [None]:
# Rename files using mv also
! mv course.txt old-course.txt

In [None]:
# Remove files (removes both files here)
! rm thesis.txt backup/thesis-2017-08.txt

# Remove directories (only works if the directory is empty)
! rmdir old_directory

# Create directory
! mkdir new_directory

## Chapter 2: Manipulating Data

In [None]:
# View a file's contents
! cat course.txt

# View piece by piece
! less seasonal/spring.csv seasonal/summer.csv
# Inside less, type :n to move to the next file and :q to quit
    
# View first 10 lines of a file
! head seasonal/summer.csv

# View the first 5 lines of a file
! head -n 5 seasonal/winter.csv

In [None]:
# List everything below a directory (-R means recursive)
! ls -R

In [None]:
# -F marks directories with a trailing / and executables with a trailing *, which makes the listing easier to read
! ls -R -F

In [None]:
# get help
! man head

In [None]:
# Show everything except the first 6 rows
! tail -n +7 file_name

In [None]:
# Select columns of a file
! cut -f 2-5,8 -d , values.csv

In [None]:
# See what you just ran
! history

### Grep:

`head` and `tail` select rows, `cut` selects columns, and `grep` selects lines according to what they contain. In its simplest form, `grep` takes a piece of text followed by one or more filenames and prints all of the lines in those files that contain that text. For example, `grep bicuspid seasonal/winter.csv` prints lines from winter.csv that contain "bicuspid".

`grep` can search for patterns as well; we will explore those in the next course. What's more important right now is some of grep's more common flags:

* -c: print a count of matching lines rather than the lines themselves
* -h: do not print the names of files when searching multiple files
* -i: ignore case (e.g., treat "Regression" and "regression" as matches)
* -l: print the names of files that contain matches, not the matches
* -n: print line numbers for matching lines
* -v: invert the match, i.e., only show lines that don't match
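The flags above can be tried on a small file. This is a minimal sketch: the four data rows are made up for illustration (the real `seasonal/winter.csv` is not shown in these notes), and the file lives in a temporary directory.

```shell
# Hypothetical sample data, not the course's actual winter.csv
workdir=$(mktemp -d)
cat > "$workdir/winter.csv" <<'EOF'
Date,Tooth
2017-01-03,bicuspid
2017-01-05,incisor
2017-02-01,Bicuspid
2017-02-14,molar
EOF

# -c: count matching lines instead of printing them
exact_count=$(grep -c bicuspid "$workdir/winter.csv")

# -i: ignore case, so "Bicuspid" matches as well
any_case_count=$(grep -c -i bicuspid "$workdir/winter.csv")

# -n: prefix each matching line with its line number
numbered=$(grep -n molar "$workdir/winter.csv")

# -v: keep only lines that do NOT match (here, everything but the header)
data_rows=$(grep -c -v Tooth "$workdir/winter.csv")

echo "exact=$exact_count any_case=$any_case_count numbered=$numbered data_rows=$data_rows"
```

Flags can be combined, as `-c -i` and `-c -v` are here.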

In [None]:
! grep bicuspid seasonal/winter.csv

In [None]:
# Combine files
! paste -d , seasonal/autumn.csv seasonal/winter.csv

## Chapter 3: Combining tools

#### Storing outputs

In [None]:
# Store first 5 rows of summer.csv as top.csv
! head -n 5 seasonal/summer.csv > top.csv

#### Using a command's output as input

In [None]:
# Get lines 3-5 from winter.csv
! head -n 5 seasonal/winter.csv > top.csv
! tail -n 3 top.csv

# Use a pipe to do both steps at once, without the intermediate file
! head -n 5 seasonal/winter.csv | tail -n 3

# Select the values in column 2 that don't contain "Tooth"
! cut -d , -f 2 seasonal/summer.csv | grep -v Tooth

#### Counting records
* Use the command `wc` (short for "word count"), which prints the number of lines, words, and characters in a file.
    * Print just one of them with `-c` (characters; strictly, bytes), `-w` (words), or `-l` (lines).
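As a quick check of the three flags, here is a minimal sketch on a two-line file. The file name `sample.txt` and its contents are made up for illustration. Reading from standard input (`<`) keeps `wc` from printing the file name next to the count.

```shell
# Hypothetical two-line file in a temporary directory
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

lines=$(wc -l < "$workdir/sample.txt")   # number of lines
words=$(wc -w < "$workdir/sample.txt")   # number of whitespace-separated words
chars=$(wc -c < "$workdir/sample.txt")   # number of bytes, newlines included

echo "lines=$lines words=$words chars=$chars"
```

Some versions of `wc` pad their output with leading spaces, so compare the results numerically rather than as strings.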

In [None]:
# Count the lines in spring.csv that contain 2017-07
! grep 2017-07 seasonal/spring.csv | wc -l

#### Specify Multiple Files at Once

Wildcards:
* `*` matches zero or more characters, so `seasonal/*.csv` matches every file in `seasonal` whose name ends with `.csv`.
* `?` matches a single character, so `201?.txt` will match `2017.txt` or `2018.txt`, but not `2017-01.txt`.
* `[...]` matches any one of the characters inside the square brackets, so `201[78].txt` matches `2017.txt` or `2018.txt`, but not `2016.txt`.
* `{...}` matches any of the comma-separated patterns inside the curly brackets (with no spaces after the commas), so `{*.txt,*.csv}` matches any file whose name ends with `.txt` or `.csv`, but not files whose names end with `.pdf`.
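To see which names each pattern picks up, the sketch below creates a few empty files in a temporary directory and lets `echo` print the expanded patterns. The file names are made up for illustration, and the `{...}` form assumes the shell is bash, since plain `sh` may not support brace expansion.

```shell
# Hypothetical file names, created empty in a temporary directory
workdir=$(mktemp -d)
touch "$workdir/2016.txt" "$workdir/2017.txt" "$workdir/2018.txt" \
      "$workdir/2017-01.txt" "$workdir/notes.csv" "$workdir/report.pdf"

# Each $(...) runs in a subshell, so the cd does not affect the caller
single_char=$(cd "$workdir" && echo 201?.txt)    # exactly one character after 201
bracket=$(cd "$workdir" && echo 201[78].txt)     # 7 or 8 after 201
braces=$(cd "$workdir" && echo {*.txt,*.csv})    # every .txt file, then every .csv file

echo "$single_char"
echo "$bracket"
echo "$braces"
```

`report.pdf` appears in none of the three results, and `2017-01.txt` is matched only by `*.txt`.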

In [None]:
# Get the first column from all of these files
! cut -d , -f 1 seasonal/winter.csv seasonal/spring.csv seasonal/summer.csv seasonal/autumn.csv

# Better to use a wildcard
! cut -d , -f 1 seasonal/*.csv

#### Sorting
As its name suggests, `sort` puts data in order. By default it does this in ascending alphabetical order, but the flags `-n` and `-r` can be used to sort numerically and reverse the order of its output, while `-b` tells it to ignore leading blanks and `-f` tells it to fold case (i.e., be case-insensitive). Pipelines often use grep to get rid of unwanted records and then sort to put the remaining records in order.
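The difference between the default order and `-n` is easiest to see on numbers of different lengths. This is a minimal sketch with made-up values; `paste -s` is used only to join the sorted lines onto one line for display.

```shell
# Hypothetical input: three numbers, one per line
workdir=$(mktemp -d)
printf '10\n9\n100\n' > "$workdir/numbers.txt"

# Default: compares text character by character, so "100" sorts before "9"
alphabetical=$(sort "$workdir/numbers.txt" | paste -s -d ' ' -)

# -n: compares numeric values
numeric=$(sort -n "$workdir/numbers.txt" | paste -s -d ' ' -)

# -n -r: numeric, largest first
reverse_numeric=$(sort -n -r "$workdir/numbers.txt" | paste -s -d ' ' -)

echo "alphabetical:    $alphabetical"
echo "numeric:         $numeric"
echo "reverse numeric: $reverse_numeric"
```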

In [None]:
! cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort -r

#### Removing duplicates

Another command that is often used with `sort` is `uniq`, whose job is to remove duplicated lines. More specifically, it removes *adjacent* duplicated lines. 
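Because `uniq` only compares neighbouring lines, its result depends on whether the input is sorted first. A minimal sketch, using made-up values in a temporary file:

```shell
# Hypothetical input: "molar" appears three times, but not all adjacent
workdir=$(mktemp -d)
printf 'molar\nincisor\nmolar\nmolar\n' > "$workdir/teeth.txt"

# Without sorting, only the adjacent pair of "molar" lines is collapsed
unsorted_lines=$(uniq "$workdir/teeth.txt" | wc -l)

# After sorting, all duplicates are adjacent, so each value appears once
sorted_lines=$(sort "$workdir/teeth.txt" | uniq | wc -l)

echo "uniq alone: $unsorted_lines lines; sort then uniq: $sorted_lines lines"
```

This is why pipelines usually put `sort` directly before `uniq`.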

In [None]:
! cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort | uniq -c # -c for count

#### Stop a program

The commands and scripts that you have run so far have all executed quickly, but some tasks will take minutes, hours, or even days to complete. You may also mistakenly put redirection in the middle of a pipeline, causing it to hang. If you decide that you don't want a program to keep running, you can press `Ctrl` + `C` to end it. This is often written `^C` in Unix documentation; note that the 'c' can be lower-case.

## Chapter 4: Batch Processing

### How Shell stores information
Like other programs, the shell stores information in variables. Some of these, called environment variables, are available all the time. Environment variables' names are conventionally written in upper case, and a few of the more commonly-used ones are shown below.

Variable | Purpose                           | Example value
---------|-----------------------------------|----------------------
`HOME`   | User's home directory             | `/home/repl`
`PWD`    | Present working directory         | Same as `pwd` command
`SHELL`  | Which shell program is being used | `/bin/bash`
`USER`   | User's ID                         | `repl`

To get a complete list (which is quite long), you can type `set` in the shell.

#### Print a variable's value
* Use `echo`, and put `$` in front of the variable's name to get its value rather than the name itself.

In [8]:
! echo hello DataCamp!

hello DataCamp!


In [9]:
! echo USER

USER


In [10]:
! echo $USER

MikaelaKlein


In [None]:
# Create a shell variable (no spaces around the =)
# Each ! command runs in its own shell, so set and use the variable on one line
! training=seasonal/summer.csv; echo $training

#### Repeating Commands

In [None]:
! for filetype in gif jpg png; do echo $filetype; done

Notice these things about the loop:

1. The structure is `for` ...variable... `in` ...list... `;` `do` ...body... `;` `done`
2. The list of things the loop is to process (in our case, the words `gif`, `jpg`, and `png`).
3. The variable that keeps track of which thing the loop is currently processing (in our case, `filetype`).
4. The body of the loop that does the processing (in our case, `echo $filetype`).

Notice that the body uses `$filetype` to get the variable's value instead of just `filetype`, just like it does with any other shell variable. Also notice where the semi-colons go: the first one comes between the list and the keyword `do`, and the second comes between the body and the keyword `done`.
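The sketch below shows the loop variable taking each value from the list in turn. The names `count` and `listing` are made up for illustration. The loop is written across several lines here, with newlines standing in for the semi-colons.

```shell
count=0
listing=""
for filetype in gif jpg png
do
    # $filetype holds the current item; count tracks how many we have seen
    count=$((count + 1))
    listing="$listing$count:$filetype "
done

echo "$listing"
```

Inside the body, `$filetype` gives the current value, while the bare name is used only when the variable is being set.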

In [None]:
# Loop over multiple files
! for filename in seasonal/*.csv; do echo $filename; done

In [None]:
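# Record the names of a set of files in a variable, then loop over them
# (kept on one line because each ! command runs in its own shell)
! files=seasonal/*.csv; for f in $files; do echo $f; done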

In [None]:
# Run a pipeline in the loop body (prints the second line of each file)
! for file in seasonal/*.csv; do head -n 2 $file | tail -n 1; done

In [None]:
# Do many things in a single loop (separate the commands with semi-colons)
! for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done

## Chapter 5: Creating new tools

#### Editing files
Unix has a bewildering variety of text editors. For this course, we will use a simple one called Nano. If you type `nano filename`, it will open `filename` for editing (or create it if it doesn't already exist). You can move around with the arrow keys, delete characters using backspace, and do other operations with control-key combinations:

* `Ctrl + K`: delete a line.
* `Ctrl + U`: un-delete a line.
* `Ctrl + O`: save the file ('O' stands for 'output'). You will also need to press Enter to confirm the filename!
* `Ctrl + X`: exit the editor.

#### Recording Commands

When you are doing a complex analysis, you will often want to keep a record of the commands you used. You can do this with the tools you have already seen:

1. Run `history`.
2. Pipe its output to `tail -n 10` (or however many recent steps you want to save).
3. Redirect that to a file called something like `figure-5.history`.

This is better than writing things down in a lab notebook because it is guaranteed not to miss any steps. It also illustrates the central idea of the shell: simple tools that produce and consume lines of text can be combined in a wide variety of ways to solve a broad range of problems.

In [None]:
! cp seasonal/s* ~
! grep -h -v Tooth spring.csv summer.csv > temp.csv
! history | tail -n 3 > steps.txt

In [None]:
# Create your own shell script
! nano dates.sh
# Inside dates.sh, write: cut -d , -f 1 seasonal/*.csv
# Then run it
! bash dates.sh

#### Pass file names to scripts

A script that processes specific files is useful as a record of what you did, but one that allows you to process any files you want is more useful. To support this, you can use the special expression `$@` (dollar sign immediately followed by at-sign) to mean "all of the command-line parameters given to the script".

For example, if `unique-lines.sh` contains `sort $@ | uniq`, when you run:

> `bash unique-lines.sh seasonal/summer.csv`

the shell replaces `$@` with `seasonal/summer.csv` and processes one file. If you run this:

> `bash unique-lines.sh seasonal/summer.csv seasonal/autumn.csv`

it processes two data files, and so on.
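The following sketch builds `unique-lines.sh` with the contents given above and runs it on one file, then on two. The CSV rows are made up for illustration, and everything is written to a temporary directory.

```shell
# Hypothetical sample data; the real seasonal files are not shown in these notes
workdir=$(mktemp -d)
printf 'Date,Tooth\n2017-01-25,wisdom\n2017-01-25,wisdom\n' > "$workdir/summer.csv"
printf 'Date,Tooth\n2017-02-01,canine\n' > "$workdir/autumn.csv"

# The script body from the text; the quoted 'EOF' keeps $@ from expanding here
cat > "$workdir/unique-lines.sh" <<'EOF'
sort $@ | uniq
EOF

# One file: $@ becomes a single filename
one_file=$(bash "$workdir/unique-lines.sh" "$workdir/summer.csv" | wc -l)

# Two files: $@ becomes both filenames, so sort merges them before uniq runs
two_files=$(bash "$workdir/unique-lines.sh" "$workdir/summer.csv" "$workdir/autumn.csv" | wc -l)

echo "unique lines in one file: $one_file; in two files: $two_files"
```

With two files, the shared header line `Date,Tooth` is counted once, because `sort` places both copies next to each other.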

As a reminder, to save what you have written in Nano, type `Ctrl + O` to write the file out, then Enter to confirm the filename, then `Ctrl + X` to exit the editor.

In [None]:
! nano count-records.sh
# Inside count-records.sh, write: tail -q -n +2 $@ | wc -l
# Then run it
! bash count-records.sh seasonal/*.csv > num-records.out

As well as `$@`, the shell lets you use `$1`, `$2`, and so on to refer to specific command-line parameters. You can use this to write commands that feel simpler or more natural than the shell's. For example, you can create a script called `column.sh` that selects a single column from a CSV file when the user provides the filename as the first parameter and the column as the second:

> `cut -d , -f $2 $1`

and then run it using:

> `bash column.sh seasonal/autumn.csv 1`

Notice how the script uses the two parameters in reverse order.
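As a sketch of how the two positional parameters are filled in, the block below writes `column.sh` with the one-line body from the text and calls it twice. The sample rows are made up for illustration.

```shell
# Hypothetical sample data in a temporary directory
workdir=$(mktemp -d)
printf 'Date,Tooth\n2017-01-11,canine\n' > "$workdir/autumn.csv"

# The script body from the text: $1 is the filename, $2 is the column number
cat > "$workdir/column.sh" <<'EOF'
cut -d , -f $2 $1
EOF

first_column=$(bash "$workdir/column.sh" "$workdir/autumn.csv" 1)
second_column=$(bash "$workdir/column.sh" "$workdir/autumn.csv" 2)

echo "$first_column"
echo "$second_column"
```

The caller gives the filename first and the column second, while `cut` receives them the other way round.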

#### Loops

Shell scripts can also contain loops. You can write them using semi-colons, or split them across lines without semi-colons to make them more readable:

In [None]:
# Inside the script: print the first and last data records of each file.
for filename in $@
do
    head -n 2 $filename | tail -n 1
    tail -n 1 $filename
done
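The loop above can be tried end to end with the sketch below. It is a defensive variant, not the exact script from the text: it quotes `"$@"` and `"$filename"` so that file names containing spaces are handled as single items. The script name `first-last.sh` and the sample rows are made up for illustration.

```shell
# Hypothetical sample data; one file name contains a space on purpose
workdir=$(mktemp -d)
printf 'Date,Tooth\n2017-01-11,canine\n2017-03-01,molar\n' > "$workdir/autumn.csv"
printf 'Date,Tooth\n2017-02-02,incisor\n2017-04-09,wisdom\n' > "$workdir/winter 2017.csv"

cat > "$workdir/first-last.sh" <<'EOF'
# Print the first and last data records of each file.
for filename in "$@"
do
    head -n 2 "$filename" | tail -n 1   # second line = first data record
    tail -n 1 "$filename"               # last line = last data record
done
EOF

records=$(bash "$workdir/first-last.sh" "$workdir/autumn.csv" "$workdir/winter 2017.csv")
echo "$records"
```

With the unquoted `$@` form, the name `winter 2017.csv` would be split into two words and the script would look for two files that do not exist.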