In [0]:
# https://www.datacamp.com/courses/introduction-to-shell-for-data-science

## 1. Manipulating files and directories

**Where am I?**

To find out where you are in the filesystem, run the command pwd (short for "print working directory"). This prints the absolute path of your current working directory, which is where the shell runs commands and looks for files by default.


In [0]:
! pwd

**How can I identify files and directories?**

To find out what's there, type ls (which is short for "listing") and press the enter key. On its own, ls lists the contents of your current directory (the one displayed by pwd). If you add the names of some files, ls will list them, and if you add the names of directories, it will list their contents.

In [0]:
! ls

**How else can I identify files and directories?**

The shell decides if a path is absolute or relative by looking at its first character: if it begins with /, it is absolute, and if it doesn't, it is relative.

1. You are in /home/repl. Use ls with a relative path to list the file /home/repl/course.txt (and only that file).

In [0]:
! ls course.txt 

2. You are in /home/repl. Use ls with a relative path to list the file /home/repl/seasonal/summer.csv (and only that file).

In [0]:
! ls seasonal/summer.csv

3. You are in /home/repl. Use ls with a relative path to list the contents of the directory /home/repl/people.

In [0]:
! ls people

**How can I move to another directory?**

Just as you can move around in a file browser by double-clicking on folders, you can move around in the filesystem using the command cd (which stands for "change directory").

1. You are in /home/repl/. Change directory to /home/repl/seasonal using a relative path.

In [0]:
! cd seasonal 

2. Use pwd to check that you're there.

In [0]:
! pwd

3. Use ls without any paths to see what's in that directory.

In [0]:
! ls

**How can I move up a directory?**

The parent of a directory is the directory above it. 

You can always give the absolute path of your parent directory to commands like cd and ls. More often, though, you will take advantage of the fact that the special path .. (two dots with no spaces) means "the directory above the one I'm currently in".

A single dot on its own, ., always means "the current directory", so ls on its own and ls . do the same thing, while cd . has no effect (because it moves you into the directory you're currently in).

One final special path is ~ (the tilde character), which means "your home directory", such as /home/repl. No matter where you are, ls ~ will always list the contents of your home directory, and cd ~ will always take you home.

In [0]:
! cd ~/../. # The path means 'home directory', 'up a level', 'here'.
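As a rough illustration of the special paths described above, the sketch below builds a throwaway directory tree and moves around it. The scratch directory, and the temporary reassignment of HOME so that ~ points inside it, are assumptions made only for this demo.

```shell
# Build a throwaway tree; every path here is hypothetical.
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/home/seasonal"
HOME="$demo/home"            # assumption: pretend this is the home directory

cd "$demo/home/seasonal"
cd .                         # "." is the current directory, so nothing changes
same=$(pwd)

cd ..                        # ".." is the parent directory
parent=$(pwd)

cd seasonal
cd ~                         # "~" leads back to the home directory
home_again=$(pwd)

echo "$same"
echo "$parent"
echo "$home_again"
```

The first line printed ends in home/seasonal; the other two both end in home.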

**How can I copy files?**

You will often want to copy files, move them into other directories to organize them, or rename them. One command to do this is cp, which is short for "copy".

1. Make a copy of seasonal/summer.csv in the backup directory (which is also in /home/repl), calling the new file summer.bck.

In [0]:
! cp seasonal/summer.csv backup/summer.bck

2. Copy spring.csv and summer.csv from the seasonal directory into the backup directory without changing your current working directory (/home/repl).

In [0]:
! cp seasonal/spring.csv seasonal/summer.csv backup

**How can I move a file?**

While cp copies a file, mv moves it from one directory to another, just as if you had dragged it in a graphical file browser. 

- You are in /home/repl, which has sub-directories seasonal and backup. Using a single command, move spring.csv and summer.csv from seasonal to backup.

In [0]:
! mv seasonal/spring.csv seasonal/summer.csv backup

**How can I rename files?**

mv can also be used to rename files. If you run:

> $ mv course.txt old-course.txt

then the file course.txt in the current working directory is "moved" to the file old-course.txt.

*One warning: just like cp, mv will overwrite existing files. If, for example, you already have a file called old-course.txt, then the command shown above will replace it with whatever is in course.txt.*
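The overwrite behavior in the warning above can be seen in a small sketch. The scratch directory and the file contents are made up for the demo.

```shell
# Work in a throwaway directory so no real files are touched.
work=$(mktemp -d)
cd "$work"

echo "new notes" > course.txt
echo "old notes" > old-course.txt

# mv replaces the existing old-course.txt without asking.
mv course.txt old-course.txt

result=$(cat old-course.txt)
echo "$result"

# Running "mv -i" instead would prompt before overwriting.
```

After the move, course.txt no longer exists and old-course.txt holds the text that used to be in course.txt.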

1. Go into the seasonal directory.

In [0]:
! cd seasonal

2. Rename the file winter.csv to be winter.csv.bck.

In [0]:
! mv winter.csv winter.csv.bck

3. Run ls to check that everything has worked.

In [0]:
! ls

**How can I delete files?**

We can copy files and move them around; to delete them, we use rm, which stands for "remove". As with cp and mv, you can give rm the names of as many files as you'd like.

1. You are in /home/repl. Go into the seasonal directory.
2. Remove autumn.csv.
3. Go back to your home directory.
4. Remove seasonal/summer.csv without changing directories again.

In [0]:
! cd seasonal

In [0]:
! rm autumn.csv

In [0]:
! cd .. # or cd ~

In [0]:
! rm seasonal/summer.csv

**How can I create and delete directories?**

mv treats directories the same way it treats files: if you are in your home directory and run mv seasonal by-season, for example, mv changes the name of the seasonal directory to by-season. However, rm works differently.

If you try to rm a directory, the shell prints an error message telling you it can't do that, primarily to stop you from accidentally deleting an entire directory full of work. Instead, you can use a separate command called rmdir. For added safety, it only works when the directory is empty, so you must delete the files in a directory before you delete the directory.
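A minimal sketch of the safety check described above, run in a scratch directory. The people directory and agarwal.txt are recreated here only for the demo.

```shell
work=$(mktemp -d)
cd "$work"

mkdir people
touch people/agarwal.txt

# rmdir refuses to remove a directory that still contains files.
if rmdir people 2>/dev/null; then
    first_try=removed
else
    first_try=refused
fi

# Empty the directory first, then rmdir succeeds.
rm people/agarwal.txt
rmdir people

if [ -d people ]; then
    second_try=present
else
    second_try=gone
fi

echo "$first_try $second_try"
```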

1. Without changing directories, delete the file agarwal.txt in the people directory.

In [0]:
! rm people/agarwal.txt 

2. Now that the people directory is empty, use a single command to delete it.

In [0]:
! rmdir people

3. Since a directory is not a file, you must use the command mkdir directory_name to create a new (empty) directory. Use this command to create a new directory called yearly below your home directory.

In [0]:
! mkdir yearly

4. Now that yearly exists, create another directory called 2017 inside it without leaving your home directory.

In [0]:
! mkdir yearly/2017 

**Wrapping up**

You will often create intermediate files when analyzing data. Rather than storing them in your home directory, you can put them in /tmp, which is where people and programs often keep files they only need briefly. (Note that /tmp is immediately below the root directory /, not below your home directory.) 

1. Use cd to go into /tmp
2. List the contents of /tmp without typing a directory name
3. Make a new directory inside /tmp called scratch.
4. Move /home/repl/people/agarwal.txt into /tmp/scratch. We suggest you use the ~ shortcut for your home directory and a relative path for the second rather than the absolute path.

In [0]:
! cd /tmp

In [0]:
! ls

In [0]:
! mkdir scratch

In [0]:
! mv ~/people/agarwal.txt scratch

## 2. Manipulating data

**How can I view a file's contents?**

Before you rename or delete files, you may want to have a look at their contents. The simplest way to do this is with cat, which just prints the contents of files onto the screen. (Its name is short for "concatenate", meaning "to link things together", since it will print all the files whose names you give it, one after the other.)

- Print the contents of course.txt to the screen.

In [0]:
! cat course.txt

**How can I view a file's contents piece by piece?**

You can use cat to print large files and then scroll through the output, but it is usually more convenient to page the output. The original command for doing this was called more, but it has been superseded by a more powerful command called less.



- Use less seasonal/spring.csv seasonal/summer.csv to view those two files in that order. Press spacebar to page down, :n to go to the second file, and :q to quit.

In [0]:
! less seasonal/spring.csv seasonal/summer.csv

**How can I look at the start of a file?**

A quick way to figure out what a file contains is to look at its first few rows.

We can do this in the shell using a command called head. As its name suggests, it prints the first few lines of a file (where "a few" means 10).

In [0]:
! head seasonal/autumn.csv

**How can I type less?**

One of the shell's power tools is tab completion. If you start typing the name of a file and then **press the tab key**, the shell will do its best to auto-complete the path. 

**How can I control what commands do?**

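Commands accept flags (also called options) that change their behavior; for example, head -n 3 prints only the first 3 lines of a file. A flag's name usually indicates its purpose (for example, -n is meant to signal "number of lines"). Command flags don't have to be a - followed by a single letter, but it's a widely-used convention.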

- Display the first 5 lines of winter.csv in the seasonal directory.

In [0]:
! head -n 5 seasonal/winter.csv

**How can I list everything below a directory?**

In order to see everything underneath a directory, no matter how deeply nested it is, you can give ls the flag -R (which means "recursive"). 

To help you know what is what, ls has another flag -F that prints a / after the name of every directory and a * after the name of every runnable program. Run ls with the two flags, -R and -F, and the absolute path to your home directory to see everything it contains. (The order of the flags doesn't matter, but the directory name must come last.)

In [0]:
! ls -R -F ~

**How can I get help for a command?**

To find out what commands do, people used to use the man command (short for "manual").

1. Read the manual page for the tail command to find out what putting a + sign in front of the number used with the -n flag does.
2. Use tail with the flag -n +7 to display all but the first six lines of seasonal/spring.csv.

In [0]:
! man tail

In [0]:
! tail -n +7 seasonal/spring.csv
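To make the difference between the -n forms concrete, here is a small sketch on a nine-line file. The file sample.txt is invented for the demo and is not one of the course files.

```shell
work=$(mktemp -d)
cd "$work"

# Create a file whose lines are line1 ... line9.
printf 'line%s\n' 1 2 3 4 5 6 7 8 9 > sample.txt

# head -n 2 keeps the first two lines.
first_two=$(head -n 2 sample.txt)

# tail -n 2 keeps the last two lines.
last_two=$(tail -n 2 sample.txt)

# tail -n +7 starts at line 7, i.e. it skips the first six lines.
from_seven=$(tail -n +7 sample.txt)

echo "$first_two"
echo "$last_two"
echo "$from_seven"
```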

**How can I select columns from a file?**

head and tail let you select rows from a text file. If you want to select columns, you can use the command cut. It has several options (use man cut to explore them), but the most common is something like:

 cut -f 2-5,8 -d , values.csv


which means "select columns 2 through 5 and column 8, using comma as the separator". cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
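The command above can be tried on a small file. The contents of values.csv below are invented purely for illustration.

```shell
work=$(mktemp -d)
cd "$work"

# Two rows of eight comma-separated columns (made-up data).
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' > values.csv

# Keep columns 2 through 5 and column 8.
selected=$(cut -f 2-5,8 -d , values.csv)
echo "$selected"
```

The selected fields are printed joined by the same delimiter, so each output row has five columns.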

**What can't cut do?**

cut is a simple-minded command. In particular, it doesn't understand quoted strings. 
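A short sketch of the limitation: when a quoted field contains the delimiter, cut still splits on it. The file everyone.csv and its contents are assumptions made for this example.

```shell
work=$(mktemp -d)
cd "$work"

# The Name column is quoted and contains a comma (made-up data).
printf 'Name,Age\n"Johel,Ranjit",28\n"Sharma,Rupinder",26\n' > everyone.csv

# Asking for "column 2" returns half of each name, not the age.
second=$(cut -f 2 -d , everyone.csv)
echo "$second"
```

Only the header row gives the intended value; on the data rows cut returns the text after the first comma, including the closing quote.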

**How can I repeat commands?**

One of the biggest advantages of using the shell is that it makes it easy for you to do things over again. If you run some commands, you can then press the up-arrow key to cycle back through them. You can also use the left and right arrow keys and the delete key to edit them. Pressing return will then run the modified command.

Even better, history will print a list of commands you have run recently. Each one is preceded by a serial number to make it easy to re-run particular commands: just type !55 to re-run the 55th command in your history (if you have that many). You can also re-run a command by typing an exclamation mark followed by the command's name, such as !head or !cut, which will re-run the most recent use of that command.

1. Run head summer.csv in your home directory (which should fail).
2. Change directory to seasonal.
3. Re-run the head command with !head.
4. Use history to look at what you have done.
5. Re-run head again using ! followed by a command number.

In [0]:
! head summer.csv

In [0]:
! cd seasonal

In [0]:
! !head

In [0]:
! history

In [0]:
! !5

**How can I select lines containing specific values?**

head and tail select rows, cut selects columns, and grep selects lines according to what they contain. In its simplest form, grep takes a piece of text followed by one or more filenames and prints all of the lines in those files that contain that text. For example, grep bicuspid seasonal/winter.csv prints lines from winter.csv that contain "bicuspid".

grep can search for patterns as well; we will explore those in the next course. What's more important right now is some of grep's more common flags:

- -c: print a count of matching lines rather than the lines themselves
- -h: do not print the names of files when searching multiple files
- -i: ignore case (e.g., treat "Regression" and "regression" as matches)
- -l: print the names of files that contain matches, not the matches
- -n: print line numbers for matching lines
- -v: invert the match, i.e., only show lines that don't match
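Several of these flags can be compared side by side on a tiny file. The file sample.csv and its rows are invented for the demo.

```shell
work=$(mktemp -d)
cd "$work"

printf 'Date,Tooth\n2017-01-05,molar\n2017-02-11,incisor\n2017-03-02,Molar\n' > sample.csv

# -c counts matching lines; matching is case-sensitive by default.
count=$(grep -c molar sample.csv)

# -i ignores case, so "Molar" is counted too.
count_nocase=$(grep -c -i molar sample.csv)

# -n prefixes each match with its line number.
numbered=$(grep -n incisor sample.csv)

# -v inverts the match; combined with -c it counts non-matching lines.
not_molar=$(grep -v -c molar sample.csv)

echo "$count $count_nocase $not_molar"
echo "$numbered"
```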

1. Print the contents of all of the lines containing the word molar in seasonal/autumn.csv by running a single command while in your home directory. Don't use any flags.

In [0]:
! grep molar seasonal/autumn.csv

2. Invert the match to find all of the lines that don't contain the word molar in seasonal/spring.csv, and show their line numbers. Remember, it's considered good style to put all of the flags before other values like filenames or the search term "molar".

In [0]:
! grep -v -n molar seasonal/spring.csv 

3. Count how many lines contain the word incisor in autumn.csv and winter.csv combined. (Again, run a single command from your home directory.)

In [0]:
! grep -c incisor seasonal/autumn.csv seasonal/winter.csv

**Why isn't it always safe to treat data as text?**

The SEE ALSO section of the manual page for cut refers to a command called paste that can be used to combine data files instead of cutting them up.

Read the manual page for paste, and then run paste to combine the autumn and winter data files in a single table using a comma as a separator. What's wrong with the output from a data analysis point of view?

Ans: The last few rows have the wrong number of columns, because joining the lines with columns creates only one empty column at the start, not two.
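The problem is easier to see with two files of different lengths. The sketch below uses shortened, made-up stand-ins for the autumn and winter files.

```shell
work=$(mktemp -d)
cd "$work"

# Three rows in one file, one row in the other (made-up data).
printf '2017-01-05,canine\n2017-01-17,wisdom\n2017-01-18,canine\n' > autumn.csv
printf '2017-01-03,bicuspid\n' > winter.csv

# Join corresponding lines with a comma.
combined=$(paste -d , autumn.csv winter.csv)
echo "$combined"
```

The first output row has four columns, but once winter.csv runs out paste adds only a single empty field, so the remaining rows have three columns and no longer line up.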

## 3. Combining tools

**How can I store a command's output in a file?**

All of the tools you have seen so far let you name input files. Most don't have an option for naming an output file because they don't need one. Instead, you can use redirection to save any command's output anywhere you want. 

If you run this command:

> $ head -n 5 seasonal/summer.csv > top.csv

nothing appears on the screen. Instead, head's output is put in a new file called top.csv. The greater-than sign > tells the shell to redirect head's output to a file. It isn't part of the head command; instead, it works with every shell command that produces output.



- Combine tail with redirection to save the last 5 lines of seasonal/winter.csv in a file called last.csv.

In [0]:
! tail -n 5 seasonal/winter.csv > last.csv

**How can I use a command's output as an input?**

Suppose you want to get lines from the middle of a file. More specifically, suppose you want to get lines 3-5 from one of our data files. You can start by using head to get the first 5 lines and redirect that to a file, and then use tail to select the last 3:

$ head -n 5 seasonal/winter.csv > top.csv

$ tail -n 3 top.csv

A quick check confirms that this is lines 3-5 of our original file, because it is the last 3 lines of the first 5.

1. Select the last two lines from seasonal/winter.csv and save them in a file called bottom.csv.
2. Select the first line from bottom.csv in order to get the second-to-last line of the original file.

In [0]:
! tail -n 2 seasonal/winter.csv > bottom.csv

In [0]:
! head -n 1 bottom.csv

**What's a better way to combine commands?**

Using redirection to combine commands has two drawbacks:

1. It leaves a lot of intermediate files lying around (like top.csv).
2. The commands to produce your final result are scattered across several lines of history.

Instead of sending head's output to a file, add a vertical bar and the tail command without a filename:

$ head -n 5 seasonal/summer.csv | tail -n 3

The pipe symbol tells the shell to use the output of the command on the left as the input to the command on the right.
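The two approaches can be compared directly. In this sketch data.txt is a made-up eight-line file, and the pipe version gives the same lines as the version that goes through an intermediate file.

```shell
work=$(mktemp -d)
cd "$work"

printf 'r%s\n' 1 2 3 4 5 6 7 8 > data.txt

# Redirection: write the first five lines to a file, then read it back.
head -n 5 data.txt > top.txt
via_file=$(tail -n 3 top.txt)

# Pipe: feed head's output straight into tail, leaving no file behind.
via_pipe=$(head -n 5 data.txt | tail -n 3)

echo "$via_pipe"
```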

- Use cut to select all of the tooth names from column 2 of the comma delimited file seasonal/summer.csv, then pipe the result to grep, with an inverted match, to exclude the header line containing the word "Tooth". 

In [0]:
! cut -d, -f 2 seasonal/summer.csv | grep -v Tooth

**How can I combine many commands?**

You can chain any number of commands together. For example, this command:

> $ cut -d , -f 1 seasonal/spring.csv | grep -v Date | head -n 10

will:

1. select the first column from the spring data;
2. remove the header line containing the word "Date"; and
3. select the first 10 lines of actual data.

- In the previous exercise, you used the following command to select all the tooth names from column 2 of seasonal/summer.csv:

> $ cut -d , -f 2 seasonal/summer.csv | grep -v Tooth


Extend this pipeline with a head command to only select the very first tooth name.

In [0]:
! cut -d , -f 2 seasonal/summer.csv | grep -v Tooth | head -n 1

**How can I count the records in a file?**

The command wc (short for "word count") prints the number of characters, words, and lines in a file. You can make it print only one of these using -c, -w, or -l respectively.

- Count how many records in seasonal/spring.csv have dates in July 2017. To do this, use grep with a partial date to select the lines and pipe this result into wc with an appropriate flag to count the lines.

In [0]:
! grep 2017-07 seasonal/spring.csv | wc -l

**How can I specify many files at once?**

Most shell commands will work on multiple files if you give them multiple filenames.

To make your life better, the shell allows you to use wildcards to specify a list of files with a single expression. The most common wildcard is * , which means "match zero or more characters". Using it, we can shorten the cut command above to this:

> $ cut -d , -f 1 seasonal/*

or:

> $ cut -d , -f 1 seasonal/*.csv

- Write a single command using head to get the first three lines from both seasonal/spring.csv and seasonal/summer.csv, a total of six lines of data, but not from the autumn or winter data files. Use a wildcard instead of spelling out the files' names in full.

In [0]:
! head -n 3 seasonal/s*

**What other wildcards can I use?**

The shell has other wildcards as well, though they are less commonly used:

- ? matches a single character, so 201?.txt will match 2017.txt or 2018.txt, but not 2017-01.txt.
- [...] matches any one of the characters inside the square brackets, so 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt.
- {...} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} (with no space after the comma) matches any file whose name ends with .txt or .csv, but not files whose names end with .pdf.
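These wildcards can be tried safely with echo, which just prints what the shell expanded. The empty files below are created only for the demo, and the curly-bracket form relies on bash-style brace expansion, which a strictly POSIX sh may not provide.

```shell
work=$(mktemp -d)
cd "$work"

# Empty files with hypothetical names.
touch 2016.txt 2017.txt 2018.txt 2017-01.txt notes.csv report.pdf

# ? matches exactly one character, so 2017-01.txt is not included.
one_char=$(echo 201?.txt)

# [78] matches a single 7 or 8.
bracket=$(echo 201[78].txt)

echo "$one_char"
echo "$bracket"

# Brace form (bash): expands to every .txt and .csv name, but not report.pdf.
echo {*.txt,*.csv}
```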

**How can I sort lines of text?**

As its name suggests, sort puts data in order. By default it does this in ascending alphabetical order, but the flags -n and -r can be used to sort numerically and reverse the order of its output, while -b tells it to ignore leading blanks and -f tells it to fold case (i.e., be case-insensitive). Pipelines often use grep to get rid of unwanted records and then sort to put the remaining records in order.
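The difference between the default ordering and -n shows up as soon as numbers have different widths. The file nums.txt is a made-up example.

```shell
work=$(mktemp -d)
cd "$work"

printf '10\n9\n100\n' > nums.txt

# Default: compares text character by character, so 100 sorts before 9.
as_text=$(sort nums.txt)

# -n: compares numeric values.
as_numbers=$(sort -n nums.txt)

# -n -r: numeric, largest first.
descending=$(sort -n -r nums.txt)

echo "$as_text"
echo "$as_numbers"
echo "$descending"
```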

- Remember the combination of cut and grep to select all the tooth names from column 2 of seasonal/summer.csv?

> $ cut -d , -f 2 seasonal/summer.csv | grep -v Tooth


Starting from this recipe, sort the names of the teeth in seasonal/winter.csv (not summer.csv) in descending alphabetical order. To do this, extend the pipeline with a sort step.

In [0]:
! cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort -r

**How can I remove duplicate lines?**

Another command that is often used with sort is uniq, whose job is to remove duplicated lines. More specifically, it removes *adjacent* duplicated lines. 
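A small sketch of why uniq is usually paired with sort. The file teeth.txt is invented for the demo.

```shell
work=$(mktemp -d)
cd "$work"

printf 'molar\ncanine\nmolar\nmolar\n' > teeth.txt

# On its own, uniq only merges the two adjacent "molar" lines at the end.
unsorted=$(uniq teeth.txt)

# Sorting first makes all duplicates adjacent, so each name appears once.
sorted=$(sort teeth.txt | uniq)

echo "$unsorted"
echo "$sorted"
```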

Write a pipeline to:

- get the second column from seasonal/winter.csv,
- remove the word "Tooth" from the output so that only tooth names are displayed,
- sort the output so that all occurrences of a particular tooth name are adjacent; and
- display each tooth name once along with a count of how often it occurs.


The start of your pipeline is the same as the previous exercise:

> $ cut -d , -f 2 seasonal/winter.csv | grep -v Tooth

Extend it with a sort command, and use uniq -c to display unique lines with a count of how often each occurs rather than using uniq and wc.

In [0]:
! cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort | uniq -c

**How can I save the output of a pipe?**

The shell lets us redirect the output of a sequence of piped commands:

> $ cut -d , -f 2 seasonal/*.csv | grep -v Tooth > teeth-only.txt

However, > must appear at the end of the pipeline.

**How can I stop a running program?**

The commands and scripts that you have run so far have all executed quickly, but some tasks will take minutes, hours, or even days to complete. You may also mistakenly put redirection in the middle of a pipeline, causing it to hang up. If you decide that you don't want a program to keep running, you can type Ctrl + C to end it. This is often written ^C in Unix documentation; note that the 'c' can be lower-case.

**Wrapping up**

To wrap up, you will build a pipeline to find out how many records are in the shortest of the seasonal data files.

1. Use wc with appropriate parameters to list the number of lines in all of the seasonal data files. (Use a wildcard for the filenames instead of typing them all in by hand.)

2. Add another command to the previous one using a pipe to remove the line containing the word "total".

3. Add two more stages to the pipeline that use sort -n and head -n 1 to find the file containing the fewest lines.

In [0]:
! wc -l seasonal/*

In [0]:
! wc -l seasonal/*.csv | grep -v total

In [0]:
! wc -l seasonal/*.csv | grep -v total | sort -n | head -n 1

## 4. Batch processing

**How does the shell store information?**

Like other programs, the shell stores information in variables. Some of these, called environment variables, are available all the time. Environment variables' names are conventionally written in upper case, and a few of the more commonly-used ones are shown below.

>Variable | Purpose | Value
>--- | --- | ---
>HOME  | User's home directory | /home/repl
>PWD  | Present working directory | Same as pwd command
>SHELL | Which shell program is being used | /bin/bash
>USER | User's ID | repl

To get a complete list (which is quite long), you can type set in the shell.



- Use set and grep with a pipe to display the value of HISTFILESIZE, which determines how many old commands are stored in your command history. What is its value?

In [0]:
! set | grep HISTFILESIZE # 2000

**How can I print a variable's value?**

A simpler way to find a variable's value is to use a command called echo, which prints its arguments. Typing

> $ echo hello DataCamp!

prints

> hello DataCamp!

If you try to use it to print a variable's value like this:

> $ echo USER

it will print the variable's name, USER.

To get the variable's value, you must put a dollar sign $ in front of it. Typing

> $ echo $USER

prints

> repl


This is true everywhere: to get the value of a variable called X, you must write $X. (This is so that the shell can tell whether you mean "a file named X" or "the value of a variable named X".)

- The variable OSTYPE holds the name of the kind of operating system you are using. Display its value using echo.

In [0]:
! echo $OSTYPE

**How else does the shell store information?**

The other kind of variable is called a shell variable, which is like a local variable in a programming language.

To create a shell variable, you simply assign a value to a name:

> $ training=seasonal/summer.csv

without any spaces before or after the = sign. Once you have done this, you can check the variable's value with:

> ! echo $training

1. Define a variable called testing with the value seasonal/winter.csv.
2. Use head -n 1 SOMETHING to get the first line from seasonal/winter.csv using the value of the variable testing instead of the name of the file.

In [0]:
! testing=seasonal/winter.csv

In [0]:
! head -n 1 $testing

**How can I repeat a command many times?**

Shell variables are also used in loops, which repeat commands many times. If we run this command:

> ! for filetype in gif jpg png; do echo $filetype; done

it produces:

> gif
> jpg
> png

Notice these things about the loop:

1. The structure is for ...variable... in ...list... ; do ...body... ; done
2. The list of things the loop is to process (in our case, the words gif, jpg, and png).
3. The variable that keeps track of which thing the loop is currently processing (in our case, filetype).
4. The body of the loop that does the processing (in our case, echo $filetype).

Notice that the body uses $filetype to get the variable's value instead of just filetype, just like it does with any other shell variable. Also notice where the semi-colons go: the first one comes between the list and the keyword do, and the second comes between the body and the keyword done.

- Modify the loop so that it prints:

> docx
> odt
> pdf

Please use filetype as the name of the loop variable.

In [0]:
! for filetype in docx odt pdf; do echo $filetype; done

**How can I repeat a command once for each file?**

You can always type in the names of the files you want to process when writing the loop, but it's usually better to use wildcards. Try running this loop in the console:

> ! for filename in seasonal/*.csv; do echo $filename; done

It prints:

> seasonal/autumn.csv
> seasonal/spring.csv
> seasonal/summer.csv
> seasonal/winter.csv


because the shell expands seasonal/*.csv to be a list of four filenames before it runs the loop.

- Modify the wildcard expression to people/* so that the loop prints the names of the files in the people directory regardless of what suffix they do or don't have. Please use filename as the name of your loop variable.

In [0]:
! for filename in people/*; do echo $filename; done

**How can I record the names of a set of files?**

People often set a variable using a wildcard expression to record a list of filenames. For example, if you define datasets like this:

> $ datasets=seasonal/*.csv

you can display the files' names later using:

> ! for filename in $datasets; do echo $filename; done


This saves typing and makes errors less likely.
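One detail worth seeing in action: the assignment stores the wildcard pattern itself, and the shell expands it when the unquoted variable is used. The sketch below assumes a scratch seasonal directory containing two empty files.

```shell
work=$(mktemp -d)
cd "$work"
mkdir seasonal
touch seasonal/autumn.csv seasonal/spring.csv

# The variable holds the pattern, not a list of names.
datasets=seasonal/*.csv

# Using $datasets unquoted lets the shell expand the pattern here.
names=""
for filename in $datasets; do
    names="$names$filename "
done

echo "$names"
```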

**A variable's name versus its value**

A common mistake is to forget to use $ before the name of a variable. When you do this, the shell uses the name you have typed rather than the value of that variable.
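A minimal sketch of the mistake, reusing the training variable name from earlier in these notes:

```shell
training=seasonal/summer.csv

# With the dollar sign, echo receives the variable's value.
with_dollar=$(echo $training)

# Without it, echo receives the literal word "training".
without_dollar=$(echo training)

echo "$with_dollar"
echo "$without_dollar"
```

The same thing happens with any command: head -n 1 training would look for a file literally named training.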

**How can I run many commands in a single loop?**

Printing filenames is useful for debugging, but the real purpose of loops is to do things with multiple files. This loop prints the second line of each data file:

> ! for file in seasonal/*.csv; do head -n 2 $file | tail -n 1; done


It has the same structure as the other loops you have already seen: all that's different is that its body is a pipeline of two commands instead of a single command.

- Write a loop that produces the same output as

> $ grep -h 2017-07 seasonal/*.csv


but uses a loop to process each file separately. Please use file as the name of the loop variable, and remember that the -h flag used above tells grep not to print filenames in the output.

In [0]:
! for file in seasonal/*.csv; do grep -h 2017-07 $file; done

**Why shouldn't I use spaces in filenames?**

It's easy and sensible to give files multi-word names like July 2017.csv when you are using a graphical file explorer. However, this causes problems when you are working in the shell, because the shell treats each space-separated word as a separate parameter. To refer to such files, you have to quote their names so that the shell treats each one as a single parameter.
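The effect of quoting can be checked with a throwaway file. The file name comes from the paragraph above; its contents are made up.

```shell
work=$(mktemp -d)
cd "$work"

echo "Date,Tooth" > "July 2017.csv"

# Quoted: head receives one parameter, the full file name.
quoted=$(head -n 1 "July 2017.csv")

# Unquoted: head receives two parameters, July and 2017.csv, and finds neither.
if head -n 1 July 2017.csv > /dev/null 2>&1; then
    unquoted=ok
else
    unquoted=failed
fi

echo "$quoted"
echo "$unquoted"
```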

**How can I do many things in a single loop?**

The loops you have seen so far all have a single command or pipeline in their body, but a loop can contain any number of commands. To tell the shell where one ends and the next begins, you must separate them with semi-colons:

> ! for f in seasonal/*.csv; do echo $f; head -n 2 $f | tail -n 1; done


## 5. Creating new tools

**How can I edit a file?**

Unix has a bewildering variety of text editors. For this course, we will use a simple one called Nano. If you type nano filename, it will open filename for editing (or create it if it doesn't already exist). You can move around with the arrow keys, delete characters using backspace, and do other operations with control-key combinations:

- Ctrl + K: delete a line.
- Ctrl + U: un-delete a line.
- Ctrl + O: save the file ('O' stands for 'output').
- Ctrl + X: exit the editor.

**Instructions:**

Run nano names.txt to edit a new file in your home directory and enter the following four lines:

> Lovelace

> Hopper

> Johnson

> Wilson

To save what you have written, type Ctrl + O to write the file out, then Enter to confirm the filename, then Ctrl + X and Enter to exit the editor.

In [0]:
! nano names.txt

In [0]:
# Text typed into Nano (file contents, not shell commands):
# Lovelace
# Hopper
# Johnson
# Wilson

**How can I record what I just did?**

When you are doing a complex analysis, you will often want to keep a record of the commands you used. You can do this with the tools you have already seen:

1. Run history.
2. Pipe its output to tail -n 10 (or however many recent steps you want to save).
3. Redirect that to a file called something like figure-5.history.

**Instructions**
1. Copy the files seasonal/spring.csv and seasonal/summer.csv to your home directory.
2. Use grep with the -h flag (to stop it from printing filenames) and -v Tooth (to select lines that don't match the header line) to select the data records from spring.csv and summer.csv in that order and redirect the output to temp.csv.
3. Pipe history into tail -n 3 and redirect the output to steps.txt to save the last three commands in a file. (You need to save three instead of just two because the history command itself will be in the list.)

In [0]:
! cp seasonal/spring.csv seasonal/summer.csv ~

In [0]:
! grep -h -v Tooth spring.csv summer.csv > temp.csv

In [0]:
! history | tail -n 3 > steps.txt

**How can I save commands to re-run later?**

You have been using the shell interactively so far. But since the commands you type in are just text, you can store them in files for the shell to run over and over again. To start exploring this powerful capability, put the following command in a file called headers.sh:

> $ head -n 1 seasonal/*.csv


This command selects the first row from each of the CSV files in the seasonal directory. Once you have created this file, you can run it by typing:

> $ bash headers.sh


This tells the shell (which is just a program called bash) to run the commands contained in the file headers.sh, which produces the same output as running the commands directly.

**Instructions:**

1. Use nano dates.sh to create a file called dates.sh that contains this command:

> $ cut -d , -f 1 seasonal/*.csv

to extract the first column from all of the CSV files in seasonal.

2. Use bash to run the file dates.sh.

In [0]:
! nano dates.sh

In [0]:
! cut -d , -f 1 seasonal/*.csv

In [0]:
! bash dates.sh

**How can I re-use pipes?**

A file full of shell commands is called a *shell script*, or sometimes just a "script" for short. Scripts don't have to have names ending in .sh, but this lesson will use that convention to help you keep track of which files are scripts.

Scripts can also contain pipes. For example, if all-dates.sh contains this line:

> $ cut -d , -f 1 seasonal/*.csv | grep -v Date | sort | uniq

then:

> $ bash all-dates.sh > dates.out

will extract the unique dates from the seasonal data files and save them in dates.out.

**Instructions:**

1. A file teeth.sh in your home directory has been prepared for you, but contains some blanks. Use Nano to edit the file and replace the two _ _ _ _ placeholders with seasonal/*.csv and -c so that this script prints a count of the number of times each tooth name appears in the CSV files in the seasonal directory.

In [0]:
! nano teeth.sh

In [0]:
! cut -d , -f 2 seasonal/*.csv | grep -v Tooth | sort | uniq -c

In [0]:
! bash teeth.sh > teeth.out

In [0]:
! cat teeth.out

**How can I pass filenames to scripts?**

A script that processes specific files is useful as a record of what you did, but one that allows you to process any files you want is more useful. To support this, you can use the special expression $@ (dollar sign immediately followed by at-sign) to mean "all of the command-line parameters given to the script". For example, if unique-lines.sh contains this:

> ! sort $@ | uniq

then when you run:

> ! bash unique-lines.sh seasonal/summer.csv

the shell replaces $@ with seasonal/summer.csv and processes one file. If you run this:

> ! bash unique-lines.sh seasonal/summer.csv seasonal/autumn.csv

it processes two data files, and so on.
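The whole round trip can be sketched in a scratch directory. The two data files are invented, and the script is written with a here-document instead of Nano so that the example is self-contained. It also uses "$@" in double quotes, which is an addition to the version above so that file names containing spaces would still be passed through intact.

```shell
work=$(mktemp -d)
cd "$work"

# Made-up input files.
printf 'b\na\nb\n' > one.txt
printf 'c\na\n' > two.txt

# Create the script; the quoted EOF keeps $@ from being expanded now.
cat > unique-lines.sh <<'EOF'
sort "$@" | uniq
EOF

# One file name: $@ stands for just that name.
single=$(bash unique-lines.sh one.txt)

# Two file names: $@ stands for both, so sort reads both files.
both=$(bash unique-lines.sh one.txt two.txt)

echo "$single"
echo "$both"
```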

1. Edit the script count-records.sh with Nano and fill in the two _ _ _ _ placeholders with $@ and -l respectively so that it counts the number of lines in one or more files, excluding the first line of each.

In [0]:
! nano count-records.sh

In [0]:
! tail -q -n +2 $@ | wc -l

2. Run count-records.sh on seasonal/*.csv and redirect the output to num-records.out using >.

In [0]:
! bash count-records.sh seasonal/*.csv > num-records.out 

**How can one shell script do many things?**

Our shell scripts so far have had a single command or pipe, but a script can contain many lines of commands. For example, you can create one that tells you how many records are in the shortest and longest of your data files, i.e., the range of your datasets' lengths.

Note that in Nano, "copy and paste" is achieved by navigating to the line you want to copy, pressing CTRL + K to cut the line, then CTRL + U twice to paste two copies of it.

In [0]:
! nano range.sh

In [0]:
! wc -l $@ | grep -v total

In [0]:
! nano range.sh

In [0]:
! wc -l $@ | grep -v total | sort -n | head -n 1

In [0]:
! nano range.sh

In [0]:
! wc -l $@ | grep -v total | sort -n -r | head -n 1

In [0]:
! bash range.sh seasonal/*.csv > range.out 

**How can I write loops in a shell script?**

Shell scripts can also contain loops. You can write them using semi-colons, or split them across lines without semi-colons to make them more readable:

> # Print the first and last data records of each file.
> for filename in $@
> do
>     head -n 2 $filename | tail -n 1
>     tail -n 1 $filename
> done

(You don't have to indent the commands inside the loop, but doing so makes things clearer.)

The first line of this script is a comment to tell readers what the script does. Comments start with the # character and run to the end of the line. Your future self will thank you for adding brief explanations like the one shown here to every script you write.
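To see such a script run end to end, the sketch below writes it to a file with a here-document and applies it to one small data file. The script name first-last.sh and the contents of spring.csv are assumptions for the demo, and the variables are double-quoted here as an extra precaution against file names with spaces.

```shell
work=$(mktemp -d)
cd "$work"

# A header line followed by three made-up records.
printf 'Date,Tooth\n2017-01-25,wisdom\n2017-02-19,canine\n2017-03-07,molar\n' > spring.csv

cat > first-last.sh <<'EOF'
# Print the first and last data records of each file.
for filename in "$@"
do
    head -n 2 "$filename" | tail -n 1
    tail -n 1 "$filename"
done
EOF

records=$(bash first-last.sh spring.csv)
echo "$records"
```

The header is skipped because head -n 2 followed by tail -n 1 selects the second line of the file.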

**Instructions**

1. Fill in the placeholders in the script date-range.sh with $filename (twice), head, and tail so that it prints the first and last date from one or more files.

In [0]:
! nano date-range.sh

In [0]:
for filename in $@
do
  cut -d , -f 1 $filename | grep -v Date | sort | head -n 1
  cut -d , -f 1 $filename | grep -v Date | sort | tail -n 1
done  

2. Run date-range.sh on all four of the seasonal data files using seasonal/*.csv to match their names.

In [0]:
! bash date-range.sh seasonal/*.csv 

3. Run date-range.sh on all four of the seasonal data files using seasonal/*.csv to match their names, and pipe its output to sort to see that your scripts can be used just like Unix's built-in commands.

In [0]:
! bash date-range.sh seasonal/*.csv | sort