### Intro to Software Carpentry: http://swcarpentry.github.io/slideshows/introducing-software-carpentry/index.html#slide-0

You need to download some files to follow this lesson:

* Download <b>shell-novice-data.zip</b> and move the file to your Desktop.
* Unzip/extract the file (ask your instructor if you need help with this step). You should end up with a new folder called data-shell on your Desktop. 
* Create a new folder on your Desktop called <b>sio-swc</b>.
* Move the <b>data-shell</b> folder into the <b>sio-swc</b> folder.
* Open a terminal and type:


In [7]:
cd



#### In the lesson, you will find out how to access the data in this folder.

Check to make sure the data files have been installed to \\workshop\data-shell\

#### Introducing the shell
* The shell is a <b>command line interface</b> (CLI).
* The shell uses text commands instead of a graphical interface.
* The heart of a CLI is a <b>read-evaluate-print loop, or REPL</b>: when the user types a command and then presses the Enter (or Return) key, the computer <b>reads it, executes it, and prints its output</b>. The user then types another command, and so on until the user logs off.

* Commands are typed into a program called a <b>command shell</b>. What you type goes into the shell, which then figures out what commands to run and orders the computer to execute them. The shell is called “the shell” because it encloses the operating system in order to hide some of its complexity and make it simpler to interact with.

* Why learn the Unix command shell? The command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is nearly essential for running a variety of specialised tools and resources, including high-performance computing systems. As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with them is becoming a necessary skill.

* Introduce Researcher Data: Nelle’s pipeline, Researcher data set and files used 



#### Nelle’s Pipeline: Starting Point
Nelle Nemo, a marine biologist, has just returned from a six-month survey of the North Pacific Gyre, where she has been sampling gelatinous marine life in the Great Pacific Garbage Patch. She has 300 samples in all, and now needs to:
* Run each sample through an assay machine that will measure the relative abundance of 300 different proteins. The machine’s output for a single sample is a file with one line for each protein.
* Calculate statistics for each of the proteins separately using a program her supervisor wrote called <b>goostat</b>.
* Compare the statistics for each protein with corresponding statistics for each other protein using a program one of the other graduate students wrote called goodiff.
* Write up results. Her supervisor would really like her to do this by the end of the month so that her paper can appear in an upcoming special issue of Aquatic Goo Letters.

It takes about half an hour for the assay machine to process each sample. The good news is that it only takes two minutes to set each one up. Since her lab has eight assay machines that she can use in parallel, this step will “only” take about two weeks.

The bad news is that if she has to run goostat and goodiff by hand, she’ll have to enter filenames and click “OK” 45,150 times (300 runs of goostat, plus 300 × 299 / 2 = 44,850 runs of goodiff). At 30 seconds each, that will take more than two weeks. Not only would she miss her paper deadline, the chances of her typing all of those commands right are practically zero.
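
The counts in the paragraph above can be checked with shell arithmetic. This is only a small sketch; the variable names are made up for illustration and are not part of the lesson data.

```shell
# Check the arithmetic for Nelle's manual workload (variable names are illustrative).
samples=300
goostat_runs=$samples                             # 300 runs of goostat
goodiff_runs=$(( samples * (samples - 1) / 2 ))   # one goodiff run per pair
total_runs=$(( goostat_runs + goodiff_runs ))
total_hours=$(( total_runs * 30 / 3600 ))         # 30 seconds per run, in whole hours

echo "goodiff runs: $goodiff_runs"
echo "total runs:   $total_runs"
echo "hours at 30 seconds each: $total_hours"
```

The result, 376 hours of nonstop clicking, is a little over 15 days, which matches the “more than two weeks” estimate.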

The next few lessons will explore what she should do instead. More specifically, they explain how she can use a command shell to automate the repetitive steps in her processing pipeline so that her computer can work 24 hours a day while she writes her paper. As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.

In [8]:
pwd

/Users/rotsuji


### Lesson 1:  Files and Directories Objectives
* Explain the similarities and differences between a file and a directory.
* Translate an absolute path into a relative path and vice versa.
* Construct absolute and relative paths that identify specific files and directories.
* Explain the steps in the shell’s read-run-print cycle.
* Identify the actual command, flags, and filenames in a command-line call.
* Demonstrate the use of tab completion, and explain its advantages.


#### Lesson starts here:

* The part of the operating system responsible for managing files and directories is called the file system. 

* The file system organizes our data into files, which hold information, and directories (also called “folders”), which hold files or other directories.

* Several commands are frequently used to create, inspect, rename, and delete files and directories.

* Let’s take a look at the shell window and start exploring:


First look at the shell:
**$**

* The dollar sign is a **prompt**. All shell commands are typed after the prompt.

Let’s start by typing: 


In [9]:
whoami

rotsuji


The output is the ID of the current user; it shows us who the shell thinks we are.

What’s happening when we type whoami? The shell:
* finds a program called **whoami**,
* runs that program,
* displays that program’s output, then
* displays a new prompt to tell us that it’s ready for more commands.
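
The first step in that list, finding the program, can be made visible. As a minimal sketch, the standard shell built-in command -v reports which file the shell would run for a given command name; the variable name whoami_path is just for illustration, and the path printed will differ between systems.

```shell
# Ask the shell where it would find the whoami program, without running it.
whoami_path=$(command -v whoami)
echo "The shell would run: $whoami_path"
```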

Next, let’s find out where we are by running a command called pwd:


In [10]:
pwd

/Users/rotsuji


**pwd** stands for <b>“print working directory”</b>; it shows your current working directory.

The output you see is your **home directory.**


Note: The directory path will look different on Windows, for example **/c/Users/Reid** in Git Bash or **C:\Documents and Settings\Reid**.

#### Explain home directory:


To understand what a “home directory” is, let’s have a look at how the file system as a whole is organized.

* show file system image  http://swcarpentry.github.io/shell-novice/fig/filesystem.svg

At the top is the **root directory** that holds everything else. We refer to it using a slash character / on its own; this is the leading slash in /Users/nelle.

After this illustration, you’ll be learning commands to explore your own filesystem, which will be constructed in a similar way, but will not be exactly identical.

Inside that directory are several other directories: bin (which is where some built-in programs are stored), data (for miscellaneous data files), Users (where users’ personal directories are located), tmp (for temporary files that don’t need to be stored long-term), and so on.

We know that our current working directory /Users/nelle is stored inside /Users because /Users is the first part of its name. Similarly, we know that /Users is stored inside the root directory / because its name begins with /.
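
The same reasoning can be tried out with the standard dirname command, which removes the last component of a path and leaves the parent directory. This is only an illustrative sketch: dirname works on the text of the path, so /Users/nelle does not need to exist on your machine, and the variable names are made up.

```shell
# dirname strips the last component of a path, leaving the parent directory.
parent_of_home=$(dirname /Users/nelle)
parent_of_users=$(dirname /Users)

echo "$parent_of_home"     # /Users
echo "$parent_of_users"    # /
```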



Typically, when you open a new command prompt you will be in your home directory to start.

Now let’s learn the command that will let us see the contents of our own filesystem. We can see what’s in our home directory by running ls, which stands for “listing”:


In [12]:
ls

Adlm				anaconda
Applications			backup-rstudio-desktop
Applications (Parallels)	data
Conferencing			data-carpentry
Creative Cloud Files		git_test
Desktop				libswc
Documents			my_project
Downloads			myprojects
Google Drive			pebble-dev
Google Drive MBPRO		planets
LibCarp-lesson-two.docx		planets-test
Library				sd-workshop
Movies				sio-swc
Music				stickies 5_2016.txt
My_R_projects			swCarpentry-shell-data
Pictures			swc_files
Public				to-do stickie.txt
README.md			todo - stickies .txt
USERNAME.github.com		vagrant_getting_started
VirtualBox VMs			vagrant_vms


**ls** prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns. 

We can make its output more comprehensible by using the **flag -F**, which tells **ls to add a trailing / to the names of directories:**

In [13]:
ls -F

Adlm/				anaconda/
Applications/			backup-rstudio-desktop/
Applications (Parallels)/	data/
Conferencing/			data-carpentry/
Creative Cloud Files/		git_test/
Desktop/			libswc/
Documents/			my_project/
Downloads/			myprojects/
Google Drive/			pebble-dev/
Google Drive MBPRO/		planets/
LibCarp-lesson-two.docx		planets-test/
Library/			sd-workshop/
Movies/				sio-swc/
Music/				stickies 5_2016.txt
My_R_projects/			swCarpentry-shell-data/
Pictures/			swc_files/
Public/				to-do stickie.txt
README.md			todo - stickies .txt
USERNAME.github.com/		vagrant_getting_started/
VirtualBox VMs/			vagrant_vms/


A trailing forward slash marks a name as a directory; names without one (such as README.md) are plain files.

We can also use **ls** to see the contents of a different directory. 

Let’s take a look at our Desktop directory by running **ls -F Desktop**, i.e., the command ls with the **arguments -F and Desktop**. The second argument (the one without a leading dash) tells ls that we want a listing of something other than our current working directory. We will run this command after a brief look at how to get help.


**ls** has lots of other options. To find out what they are, we can type:

In [1]:
ls --help

ls: illegal option -- -
usage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]


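Introduce the **man** help pages, e.g. **$ man ls**. (As the error above shows, the macOS version of ls does not accept --help; on Linux and in Git Bash, ls --help prints a usage summary.)

Note: Git Bash on Windows does not include man. Use a web search for information instead.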

To navigate through the man pages, you may use the **up and down arrow keys** to move line-by-line, or try the **“b” and spacebar keys to skip up and down by full page.** 

**Quit the man pages by typing “q”.**


In [18]:
ls -F Desktop

ANSM presentation FINAL Copies/
ConnectingtoLibraryserversinOSX.pdf
DSC_1742.JPG*
DSC_7135.JPG*
GoogleRefineCheatSheets.pdf
Introduction-to-OpenRefine-handout-CC-BY.pdf
LC-data/
Launcher.app@
MSFTFreeEbooks.txt
MyPDF.PDF
OpoenRefinetutorial.pdf
ParallelsDesktop-12.0.0-41273.dmg
Screen Shot 2016-09-07 at 2.15.01 PM.png
Screen Shot 2016-09-07 at 2.16.48 PM.png
Screen Shot 2016-09-07 at 2.25.50 PM.png
Screen Shot 2016-09-07 at 2.27.50 PM.png
Software carpentry stuff/
Travel Funding Request Form Calc_Rev2014.02.21.pdf
Ubuntu Linux 14.04 Desktop@
Unix Shell Lesson Images/
blender tutorial/
desktop items for review/
desktop stuff 5-2016/
libcarp-data-notes/
libcarp-data-notes - clean/
lifecycle__intestinal_parasites_1.png
lifecycle__intestinal_parasites_2.png
otsuji_reid.pdf
remy_on_trike.jpg
sio-swc/
slice.avi
workshop/
workshop_old/


Your output should be a list of all the files and sub-directories on your Desktop, including the sio-swc directory (which contains data-shell) that you set up before the lesson. Take a look at your Desktop to confirm that your output is accurate.
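
The Desktop listing above also contains names ending in * and @. With -F, ls marks executable files with * and symbolic links with @, in addition to marking directories with /. The sketch below builds a throwaway directory to show all three markers; every file and directory name in it is made up for illustration.

```shell
# Build a temporary directory with one of each kind of entry (names are illustrative).
demo=$(mktemp -d)
mkdir "$demo/folder"
touch "$demo/plain.txt" "$demo/script.sh"
chmod +x "$demo/script.sh"                 # make script.sh executable
ln -s "$demo/plain.txt" "$demo/shortcut"   # symbolic link to plain.txt

# -F appends / to directories, * to executables, and @ to symbolic links.
listing=$(ls -F "$demo")
echo "$listing"

rm -r "$demo"                              # clean up the temporary directory
```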

As you may now see, using a bash shell depends strongly on the idea that your files are organized in a hierarchical file system.

Now that we know the data-shell directory is located on our Desktop in the sio-swc folder, we can do two things.
First, we can look at its contents, using the same strategy as before, passing a directory name to ls:

In [4]:
pwd

/Users/rotsuji


In [5]:
ls -F Desktop/sio-swc/data-shell

Desktop/		molecules/		pizza.cfg
creatures/		north-pacific-gyre/	solar.pdf
data/			notes.txt		writing/


Second, we can change our location to a different directory, so that we are no longer located in our home directory.

The command to change locations is **cd** followed by a directory name to change our working directory. cd stands for “change directory”, which is a bit misleading: the command doesn’t change the directory, it changes the shell’s idea of what directory we are in.

Let’s say we want to move to the **data-shell directory** we saw above. We can use the following series of commands to get there.

To start from your home directory, run cd with no arguments:

In [15]:
cd 



In [16]:
pwd

/Users/rotsuji


In [12]:
cd desktop



In [13]:
cd sio-swc



In [14]:
cd data-shell



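These commands move us from our home directory onto our Desktop, then into the sio-swc directory, then into the data-shell directory. (Typing desktop in lowercase works here because the default macOS file system is case-insensitive; on a case-sensitive system such as Linux you would need to type Desktop.) cd doesn’t print anything, but if we run pwd after it, we can see that we are now in Desktop/sio-swc/data-shell inside our home directory.

To check, type: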

In [19]:
pwd

/Users/rotsuji/Desktop/sio-swc/data-shell


In [20]:
ls -F

Desktop/		molecules/		pizza.cfg
creatures/		north-pacific-gyre/	solar.pdf
data/			notes.txt		writing/


We now know how to go down the directory tree, but how do we go up? We might try the following:

In [21]:
cd data-shell

bash: cd: data-shell: No such file or directory


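Error! cd can only see sub-directories inside your current directory, and data-shell does not contain another directory called data-shell. (Windows users will see a similar “No such file or directory” message.)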
    
There is a shortcut in the shell to move up one directory level that looks like this:

In [22]:
cd ..



.. is a special directory name meaning “the directory containing this one”, or more succinctly, the parent of the current directory. Sure enough, if we run pwd after running cd .., we see that we are back in sio-swc:

In [23]:
pwd

/Users/rotsuji/Desktop/sio-swc


The special or hidden directory .. doesn’t usually show up when we run ls. If we want to display it, we can give ls the -a flag:

In [24]:
ls -F -a

./		.DS_Store	README.md
../		.git/		data-shell/


-a stands for “show all”; it forces ls to show us file and directory names that begin with ., such as .. (the parent directory).
As you can see, it also displays another special directory that’s just called ., which means “the current working directory”. It may seem redundant to have a name for it, but we’ll see some uses for it soon.
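
To experiment with . and .. without touching your own files, you can try something like the sketch below. It assumes a throwaway directory created with mktemp; the sio-swc/data-shell names simply mirror the workshop layout and the variable names are illustrative.

```shell
# Work in a temporary copy of the workshop layout (created only for this demo).
demo=$(mktemp -d)
mkdir -p "$demo/sio-swc/data-shell"
cd "$demo/sio-swc/data-shell"

ls -a                          # even an empty directory lists . and ..

here=$(cd . && pwd)            # . is the current directory, so nothing changes
parent=$(cd .. && pwd)         # .. is the parent directory
echo "$here"
echo "$parent"

cd / && rm -r "$demo"          # leave the demo directory, then clean it up
```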

These then, are the basic commands for navigating the filesystem on your computer: pwd, ls and cd. Let’s explore some variations on those commands. 

What happens if you type cd on its own, without giving a directory?

In [25]:
cd



How can you check what happened? pwd gives us the answer!

In [26]:
pwd

/Users/rotsuji


It turns out that cd without an argument will return you to your home directory, which is great if you’ve gotten lost in your own filesystem.

Let’s try returning to the data-shell directory from before. Last time, we used three commands, but we can string together the list of directories to move to data-shell in one step:

In [27]:
cd desktop/sio-swc/data-shell



Check that we’ve moved to the right place by running pwd and ls -F.

In [28]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


In [29]:
ls -F

Desktop/		molecules/		pizza.cfg
creatures/		north-pacific-gyre/	solar.pdf
data/			notes.txt		writing/


This confirms that we’re in the directory we expect.

**Shortcuts:** The shell interprets the character ~ (tilde) at the start of a path to mean “the current user’s home directory”, e.g. /Users/rotsuji on this Mac or /c/Users/Reid in Git Bash.
Another shortcut is the - (dash) character. cd will translate - into the previous directory we were in, which is faster than having to remember, then type, the full path.

**The difference between cd .. and cd - is that the former brings you up, while the latter brings you back.**
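
A short sketch of the two shortcuts side by side, again using a throwaway directory so that nothing in your own files is affected. The directory and variable names are illustrative only.

```shell
# Temporary copy of the workshop layout (created only for this demo).
demo=$(mktemp -d)
mkdir -p "$demo/sio-swc/data-shell"
cd "$demo/sio-swc/data-shell"

cd ..                          # up: now in sio-swc
after_up=$(basename "$PWD")

cd - > /dev/null               # back: returns to data-shell (cd - also prints the path)
after_back=$(basename "$PWD")

echo "after cd ..: $after_up"
echo "after cd - : $after_back"
echo ~                         # ~ expands to the current user's home directory

cd / && rm -r "$demo"          # leave the demo directory, then clean it up
```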

#### Introduce Nelle's pipeline: organizing files

Let’s look at the researcher's files:

First, she creates a directory called north-pacific-gyre (to remind herself where the data came from). 
Inside that, she creates a directory called 2012-07-03, which is the date she started processing the samples. She used to use names like conference-paper and revised-results, but she found them hard to understand after a couple of years. (The final straw was when she found herself creating a directory called revised-revised-results-3.)

Each of her physical samples is labelled according to her lab’s convention with a unique ten-character ID, such as “NENE01729A”. 

This is what she used in her collection log to record the location, time, depth, and other characteristics of the sample, so she decides to use it as part of each data file’s name. 

Since the assay machine’s output is plain text, she will call her files NENE01729A.txt, NENE01812A.txt, and so on. All 1520 files will go into the same directory.


Introduce tab completion while looking at Nelle's files:

In [30]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


In [31]:
ls north-pacific-gyre/2012-07-03

NENE01729A.txt	NENE01751B.txt	NENE01971Z.txt	NENE02040A.txt	NENE02043B.txt
NENE01729B.txt	NENE01812A.txt	NENE01978A.txt	NENE02040B.txt	goodiff
NENE01736A.txt	NENE01843A.txt	NENE01978B.txt	NENE02040Z.txt	goostats
NENE01751A.txt	NENE01843B.txt	NENE02018B.txt	NENE02043A.txt


This is a lot to type, but we can let the shell do most of the work through what is called **tab completion.** If we type:

**$ ls nor**

and then press Tab (the tab key on the keyboard), the shell automatically completes the directory name for us:

**$ ls north-pacific-gyre/**

If we press Tab again, Bash will add 2012-07-03/ to the command, since it’s the only possible completion:

**$ ls north-pacific-gyre/2012-07-03/**

In [34]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell/north-pacific-gyre/2012-07-03


## CHALLENGE SET 1 - end of section

http://swcarpentry.github.io/shell-novice/02-filedir/#absolute-vs-relative-paths

### Lesson 2:  Files and Directories Objectives

* Create a directory hierarchy that matches a given diagram.
* Create files in that hierarchy using an editor or by copying and renaming existing files.
* Display the contents of a directory using the command line.
* Delete specified files and/or directories.


We now know how to explore files and directories, but how do we create them in the first place? Let’s go back to our **data-shell** directory
	
If you are not at your data-shell directory use the cd command:

In [36]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell/north-pacific-gyre


In [37]:
cd ..



Now let’s check that we are in our data-shell directory and see what it contains:

In [38]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


In [39]:
ls -F

Desktop/		molecules/		pizza.cfg
creatures/		north-pacific-gyre/	solar.pdf
data/			notes.txt		writing/


Let’s create a new directory called thesis using the command **mkdir thesis (which has no output):**

In [40]:
mkdir thesis



As you might (or might not) guess from its name, mkdir means “make directory”. Since thesis is a relative path (i.e., doesn’t have a leading slash), the new directory is created in the current working directory:

In [41]:
ls -F

Desktop/		north-pacific-gyre/	thesis/
creatures/		notes.txt		writing/
data/			pizza.cfg
molecules/		solar.pdf


However, there’s nothing in it yet.

#### Advice:

Complicated names of files and directories can make your life very painful when working on the command line. Here we provide a few useful tips for the names of your files from now on.

* Don’t use whitespace.

Whitespace can make a name more meaningful, but since whitespace is used to separate arguments on the command line, it is better to avoid it in the names of files and directories. You can use - or _ instead of whitespace.

* Don’t begin the name with -.

Commands treat names starting with - as options.

* Stay with letters, numbers, ., - and _.

Many of the other characters have a special meaning on the command line that we will learn about during this lesson. Some will simply make your command fail, but some of them can even make you lose data.

If you need to refer to names of files or directories that have whitespace or another non-alphanumeric character you should put quotes around the name.
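
The effect of quoting can be seen by counting how many arguments a command receives. In the sketch below, count_args is a hypothetical helper defined only for this illustration, and “field notes.txt” is a made-up file name created in a temporary directory.

```shell
# count_args is a hypothetical helper: it prints how many arguments it was given.
count_args() { echo "$#"; }

unquoted=$(count_args field notes.txt)      # whitespace splits this into 2 arguments
quoted=$(count_args "field notes.txt")      # quotes keep it together as 1 argument
echo "unquoted: $unquoted, quoted: $quoted"

# The same rule applies to real commands such as touch and ls.
demo=$(mktemp -d)
touch "$demo/field notes.txt"
listed=$(ls "$demo")
echo "$listed"

rm -r "$demo"                               # clean up the temporary directory
```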


In [42]:
ls -F thesis



ls shows nothing because the directory is empty.

Let’s create and add a file.  Let’s change our working directory to thesis using cd, then run a text editor called Nano to create a file called draft.txt:

In [43]:
cd thesis



In [44]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell/thesis


Run nano draft.txt to create the draft.txt file:

**$ nano draft.txt**

Windows users can run notepad draft.txt, or nano in Git Bash.

Let’s type in a few lines of text:

**It’s not “publish or perish” anymore,
	it’s “share and thrive”.**

Mac users: use Control-O to write our data to disk, then press Return to accept the suggested file name.
(Windows users working in Notepad: save the text file directly to the \\workshop\data-shell\thesis\ folder.)

Once our file is saved, we can use Control-X to quit the nano editor and return to the shell. (Unix documentation often uses the shorthand ^A to mean “control-A”.) nano doesn’t leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:


In [45]:
ls

draft.txt


Now, let’s tidy up by running rm draft.txt:

In [46]:
rm draft.txt



This command removes files (“rm” is short for “remove”). If we run ls again, its output is empty once more, which tells us that our file is gone:

In [47]:
ls



The thesis directory is now empty.

**Deletion note:**  The Unix shell doesn’t have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unhooked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there’s no guarantee they’ll work in any particular situation, since the computer may recycle the file’s disk space right away. 

* Deleting is forever.

Keep this in mind when deleting files in the shell.

In [48]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell/thesis


Next, we are going to re-create that file in the thesis folder and then move up one directory using cd ..

Re-create the file by typing:


**$ nano draft.txt** (or **notepad draft.txt**)

Type something in the text file, then double-check that the file has been created by typing **ls**:

In [49]:
ls

draft.txt


In [50]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell/thesis


Move up one directory by typing cd ..:

In [51]:
cd ..



In [52]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


If we try to remove the entire thesis directory using **rm thesis**, we get an error message:

In [53]:
rm thesis

rm: thesis: is a directory


On Linux or in Git Bash, the message looks like this instead: rm: cannot remove `thesis': Is a directory

This happens because rm by default only works on files, not directories.

The right command is **rmdir**, which is short for “remove directory”.

It doesn’t work yet either, though, because the directory we’re trying to remove isn’t empty:

In [54]:
rmdir thesis

rmdir: thesis: Directory not empty


On Linux or in Git Bash, the message looks like this instead: rmdir: failed to remove `thesis': Directory not empty

This little safety feature can save you a lot of grief, particularly if you are a bad typist. 

To really get rid of thesis we must first delete the file draft.txt:

In [55]:
rm thesis/draft.txt



The directory is now empty, so rmdir can delete it:

In [56]:
rmdir thesis



The directory should now be deleted. We can double-check by running:

In [57]:
ls -F

Desktop/		molecules/		pizza.cfg
creatures/		north-pacific-gyre/	solar.pdf
data/			notes.txt		writing/


TIP: If you are sure you want to delete a directory and all its contents, you can use

$ rm -r [directory name]

**This removes everything in the directory, then the directory itself. If the directory contains sub-directories, rm -r does the same thing to them, and so on. It’s very handy, but can do a lot of damage if used without care.**
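
The three removal commands can be compared safely inside a throwaway directory. This is a sketch under that assumption: the thesis layout, the chapters sub-directory and its files are made up for the demo, and nothing outside the temporary directory is touched.

```shell
# Build a small, disposable directory tree to practise on.
demo=$(mktemp -d)
cd "$demo"
mkdir -p thesis/chapters
touch thesis/draft.txt thesis/chapters/intro.txt

rm thesis    2>/dev/null || echo "rm alone refuses to delete a directory"
rmdir thesis 2>/dev/null || echo "rmdir refuses because thesis is not empty"

rm -r thesis                   # removes the contents, then the directory itself
remaining=$(ls -A)             # empty if thesis and everything in it is gone
echo "left over: '$remaining'"

cd / && rm -r "$demo"          # leave the demo directory, then clean it up
```

Practising in a temporary directory first is a cheap way to build confidence before running rm -r on real data.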

Let’s create that directory and file one more time.

(Note that this time we’re running nano with the path thesis/draft.txt, rather than going into the thesis directory and running nano on draft.txt there.)

In [59]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


In [60]:
mkdir thesis



**$ nano (or notepad) thesis/draft.txt**

In [61]:
ls thesis

draft.txt


Since draft.txt isn’t a particularly informative name, we are going to change the file’s name using **mv**, which is short for “move”:

**$ mv thesis/draft.txt thesis/quotes.txt**

When we run the move command, the first parameter tells mv what we’re “moving”, while the second is where it’s to go.

In this case, we’re moving thesis/draft.txt to thesis/quotes.txt, which has the same effect as renaming the file.

Let’s check to make sure the file was renamed.

In [73]:
ls thesis

quotes.txt


Sure enough, ls shows us that thesis now contains one file called quotes.txt.

**Caution:** One has to be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name, which could lead to data loss. An additional flag, mv -i (also spelled mv --interactive in the GNU version found on Linux and in Git Bash; the macOS version only accepts -i), can be used to make mv ask the user for confirmation before overwriting.
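
The difference between mv and mv -i can be tried out on two disposable files. In this sketch the file contents are made up, and the answer “n” is piped in so that the example runs without waiting for someone to type; in a live session you would type the answer at the prompt yourself.

```shell
# Two disposable files in a temporary directory (contents are illustrative).
demo=$(mktemp -d)
cd "$demo"
echo "old text" > quotes.txt
echo "new text" > draft.txt

# With -i, mv asks before overwriting; here we answer "n" to decline.
echo n | mv -i draft.txt quotes.txt || true
kept=$(cat quotes.txt)         # quotes.txt still holds its old contents

# Without -i, mv overwrites quotes.txt without asking.
mv draft.txt quotes.txt
replaced=$(cat quotes.txt)     # quotes.txt now holds the contents of draft.txt

echo "after declining: $kept"
echo "after plain mv:  $replaced"

cd / && rm -r "$demo"          # leave the demo directory, then clean it up
```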


Now let’s say we want to move the quotes.txt file into the current working directory. We use mv once again, but this time we’ll just use the name of a directory as the second parameter to tell mv that we want to keep the filename, but put the file somewhere new. (This is why the command is called “move”.) In this case, the directory name we use is the special directory name . that we mentioned earlier.


In [76]:
mv thesis/quotes.txt .



The effect is to move the file from the directory it was in to the current working directory. ls now shows us that thesis is empty:

In [77]:
ls thesis



Further, ls with a filename or directory name as a parameter only lists that file or directory. We can use this to see that quotes.txt is still in our current directory:

In [78]:
ls quotes.txt

quotes.txt


For the last part of this lesson we’ll be using the command **cp** to make a copy of a file.  

The **cp** command works very much like **mv**, except it copies a file instead of moving it. 



In [81]:
cp quotes.txt thesis/quotations.txt



We can check that it did the right thing using **ls** with two paths as parameters:


In [82]:
ls quotes.txt thesis/quotations.txt

quotes.txt		thesis/quotations.txt


To prove that we made a copy, let’s delete the quotes.txt file in the current directory and then run that same ls again.

In [83]:
rm quotes.txt



In [84]:
ls quotes.txt thesis/quotations.txt

ls: quotes.txt: No such file or directory
thesis/quotations.txt


On Linux or in Git Bash, the output looks like this instead:

**ls: cannot access quotes.txt: No such file or directory
thesis/quotations.txt**

This time it tells us that it can’t find quotes.txt in the current directory, but it does find the copy in thesis that we didn’t delete.
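
The same experiment can be repeated in a throwaway directory to confirm that the copy is independent of the original. This is a minimal sketch; the file contents and the variable name copy_text are made up for illustration.

```shell
# Disposable thesis directory and quotes file (created only for this demo).
demo=$(mktemp -d)
cd "$demo"
mkdir thesis
echo "share and thrive" > quotes.txt

cp quotes.txt thesis/quotations.txt      # copy: both files now exist
rm quotes.txt                            # delete the original

copy_text=$(cat thesis/quotations.txt)   # the copy is unaffected by the deletion
echo "$copy_text"

cd / && rm -r "$demo"                    # leave the demo directory, then clean it up
```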

## CHALLENGE SET 2 - end of section

http://swcarpentry.github.io/shell-novice/03-create/#renaming-files

### Lesson 3:  Pipes and Filters
* Redirect a command’s output to a file.
* Process a file instead of keyboard input using redirection.
* Construct command pipelines with two or more stages.
* Explain what usually happens if a program or pipeline isn’t given any input to process.
* Explain Unix’s “small pieces, loosely joined” philosophy.


Now that we know a few basic commands, let’s take a look at the shell’s most powerful feature: the ease with which it lets us combine existing programs in new ways. 

We’ll start with a directory called molecules that contains six files describing some simple organic molecules. The .pdb extension indicates that these files are in Protein Data Bank format, a simple text format that specifies the type and position of each atom in the molecule.

To get started, make sure you are in the data-shell folder. If not, navigate to the folder in the shell.

You can check by typing:

In [85]:
pwd

/Users/rotsuji/desktop/sio-swc/data-shell


In [86]:
ls -F

Desktop/		north-pacific-gyre/	thesis/
creatures/		notes.txt		writing/
data/			pizza.cfg
molecules/		solar.pdf


Change to the molecules directory.