# First Morning, Introductions, and Lesson Plans

Your instructors are: James Hart, Christopher Hann-Soden, Dat Mai, Sumayah Rahmam, and David DeTomaso



## Topics:
- Expectations for the course
- Navigating the UNIX shell
- Viewing the content of files
- How to get help
- Jupyter Notebooks

## Introduction

Welcome to the CCB Introduction to Programming for Bioinformatics bootcamp!

Overview for today:

1. Course goals, what we will cover, and why biologists should know how to code
2. Class structure & organization
3. Learn how to function in a UNIX environment
4. Write your first python program

### What are you going to learn?
You are going to learn python :O)

Python is a simple and powerful programming language that is used for many applications, from simple tasks to large software development projects. It has become popular as both a first language for beginning students and an everyday one for advanced programmers. Python is used by a range
of companies including Netflix, Google, Microsoft, YouTube, and Industrial Light & Magic.


Our goal is to show you how to apply programming to the problems and tasks that you face in the lab. By the end of this course, you will be able to do the following:

- Extract data from large files.
- Parse a .fasta file and translate a file of sequences.
- Organize your data in python's data structures and write it out to files.
- Perform automated tasks quickly and efficiently.
- Create data analysis pipelines that incorporate pieces of specialized bioinformatics software.
- Apply statistical tests to your data.
- Make publication-quality figures with your data.

Although we will mostly focus on programming as it pertains to biology, our aim is for you to leave this course with a sufficiently generalized knowledge of programming (and the confidence to read the manuals) that you will be able to apply your skills to whatever you happen to be working on.

### What are you not going to learn?

With only one week to teach this course, there are many topics that we are not going to cover (for example, classes and unit testing). This course is intended to be an introduction to the basics with a focus on the practical uses of coding in biology. From this course, you will gain the ability to at least "talk the talk" and understand where and how to seek out more information should you choose to go further in your development as a "coder." If you want to learn more advanced things, your best bet is to take a full class on coding in a more formal setting to understand the theory and delve into more applied skills (like software development). We are (primarily) experimental biologists that use programming to manage and handle large data sets, allowing us to speed up tasks, efficiently use
existing software to answer biological questions, build pipelines for data analysis and management, and perform basic statistical tests and analysis.


## Things to Keep in Mind

### Coding is Hard

Coding has a STEEP learning curve! IT'S HARD! Learning to program is like learning a new language. Furthermore, the world of programming has its own culture and lexicon, and for novice coders, it can be a little intimidating. Learning a new language requires adjusting the way you think about solving a problem and communicating that solution. Even more confusing, each programmer develops his/her own style (accent). In practice, this means that there is almost never a "RIGHT ANSWER," but rather that there are almost infinite ways to solve almost any problem. The good news is if you like problem solving, you'll love this course! The bad news is that it will be hard. Don't despair! Our goal is to create a safe space for you to immerse yourself in learning this important skill. Ask questions, be engaged, and keep coming to class, even if you're behind!

In the course of the class, you will be exposed to several styles and see several ways to get to more complicated solutions. The ultimate goal, though, is to give you the tools to begin to develop a style of your own. Later on, you may also want to read PEP 8, which is a [style guide for Python](http://www.python.org/dev/peps/pep-0008/), much like the vaunted Strunk and White is a style guide for English.

You have two incredibly useful resources at your disposal during the labs: first, you have us, the TAs and instructors, who are all familiar with the language and here to help you out. Second, you have the extensive documentation about python and programming that we will be introducing you to in the course of the class. Here are links to some of these resources:

- [Learning Python](http://proquest.safaribooksonline.com/1-56592-464-9)
- [Python Pocket Reference](http://proquest.safaribooksonline.com/9780596802011)
- [Python Website (documentation)](https://docs.python.org/release/2.7.11/)
- [Linux Pocket Guide](http://proquest.safaribooksonline.com/9780596806347)
- [Python Code Visualization](http://www.pythontutor.com/visualize.html#mode=edit)

### Python is BIG

Thousands of lab scientists, computer scientists, and programmers have used it, contributed to it, and extended it to their own sub-fields and sub-sub-fields. And they keep improving, and thus changing, it! We couldn't get to all of it even if we had much longer than one week. The subjects we will not cover in any depth include object-oriented programming, writing parallel programs, integrating your code with faster code written in C or C++, and a host of other powerful-but-subtle methods and topics. We will teach you enough that you should be able to go learn about them if and when you want.

### Python is evolving

As with most things in the computing world, Python moves fast and change is constant. At the moment, there are two major versions of Python available: Python 2.7 and Python 3.5. In this course, we'll be using Python 2.7. Why aren't we using the more up-to-date Python 3? There's a bit of history: For most of Python's lifetime, each new version of the language would introduce new features, but would try very, very hard to not break any code that other people had already written; in other words, most changes were backwards compatible. In about 2006, however, Python's creator and "Benevolent Dictator For Life" Guido van Rossum decided that there were a number of things that he had gotten wrong in the original Python specification, and would like to change. Making those changes would break other people's code, so it was decided to make them all at once in a new major version, Python 3. Further confusing matters, Python 2 was actually upgraded to its current version, 2.7, after the release of 3.1. I (and the rest of your instructors) still pretty much exclusively use Python 2.7 in our own work. Most of the differences between the two are fairly minor, and although Python 3 was first released almost 10 years ago, there are still some add-ons to the core language that haven't completely switched.



## Course Schedule

Over the course of this week, we will learn the very basics of programming practice and the
fundamentals of Python syntax, including:

- reading and writing information from/to files
- how to do interesting and complicated things with the information
- incorporating other people's code to do more faster, with less effort
- basic data analysis

Our daily schedule will generally proceed like this:

Start at 9am

1-2 hour lecture

2-2.5 hours lab for exercises

Lunch from 1 - 2

1-2 hour lecture

2-2.5 hours lab for exercises

Leave at 5 pm

You will have a number of exercises each day covering the breadth of the lectures. You will not be graded on these but it's REALLY important for you to be able to demonstrate what you've just learned. Just like learning French, one learns programming by doing. If you only finish half the problems, you've only really learned half the material. You get what you put in.

*Questions?*

# Using the UNIX shell

You will spend nearly all of your time in one of two places: the shell, or the interactive Jupyter notebooks. The shell allows you to move and copy files, run programs, and more, while the notebooks are where you will write your programs. We will focus mostly on the shell this morning and touch on the basic usage of the Jupyter notebooks. We will begin Python this afternoon.

A large fraction of what you can do in the shell (also called the "command line") can be done using the windowed operating system you're used to. For the simplest of tasks, the command line may seem like a step backwards, but for anything even mildly more complicated (for example, "move every file with 2012 anywhere in its name to the folder Backup"), it can save a lot of time. And then there are the programs that can only be run from the command line, which are much easier to write and more flexible in what they can do.
___________
### Informative Interlude: Some notes on the formatting of the lessons for this course

Periodically in these lessons, we may stop with an informative interlude outlined with a horizontal
line above and below (like the one two lines up!). In this case, we're taking a quick break to discuss
this and other aspects of the formatting.

For this and all further examples, a $ represents your shell prompt, and boldface indicates the
commands to type at the prompt. The text below will be used for output you should see when you take the
described action.

Finally, when we use actual python code examples, they will be contained in a separate box:

In [1]:
print "This is where code will appear."

This is where code will appear.


You'll notice that some of the words are in different colors. These words mean special things in Python. The notebook software understands that, and will color the code to make the structure more clear. Many editors like aquamacs also have a "syntax highlighting" feature, and this can actually be a useful hint when something inevitably goes wrong.
_____________

## Moving around different folders

One of the basic concepts is that your shell is always based somewhere in the directory structure.
An analogy here is if you have only a single window open (e.g. in the Finder on a Mac, or the
directory browser in Linux).


### pwd - Print Working Directory
>*Where am I?* 

Prints the directory in which you are at the current moment. If you create any files, they will appear in this spot. When you first open the terminal shell, you will be in your "home" directory.

```bash
$ pwd

/home/james
```

### cd - Change Directory 
>*Move to a new directory*

Given a path, this command moves your "current location" to the specified directory.
```bash
$ cd Dropbox/2017_Winter_Python

$ pwd
/home/james/Dropbox/2017_Winter_Python
```

To go up one directory, use the command cd ..
```bash
$ cd ..
$ pwd
/home/james/Dropbox
```

Thus far, these have been relative paths (i.e. relative to your current directory), but you can also use an absolute path (which will start with a /):

```bash
$ pwd
/home/james/Dropbox
$ cd /usr/local/bin
```

A shortcut for your home directory is ~:

```bash
$ cd ~
$ pwd
/home/james
```

And you can use these as part of a path as well:

```bash
$ cd ~/Dropbox/2017_Winter_Python
```

Another way to get to the home directory is to simply type "cd" with nothing after it.
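
As a minimal sketch of this shortcut: the starting directory /tmp below is only an assumption (it exists on most UNIX systems), and the path that pwd prints will be whatever your own home directory is.

```bash
# Move somewhere else first; /tmp is just an example location.
cd /tmp
# cd with nothing after it returns you to your home directory.
cd
# pwd should now print the same path that is stored in HOME.
pwd
echo "$HOME"
```

The two printed lines should match, which is a quick way to convince yourself where a bare cd takes you.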

An aside on directories...
Directories in UNIX are set up the same way as your regular computer. Just as you would open up a window into your directories and click to open up folders, here you use cd to go through the directories. You are just typing the command instead of clicking.

### ls - Lists contents of a directory (LiSt)
>*Shows the files and directories inside the current directory*

```bash
$ cd ~/Dropbox/2017_Winter_Python
$ ls
1.1_Why_We_Program                 5.1_Intro_to_Plotting
1.2_The_Very_Basics                5.2_Biopython_and_System_Calls
2.1_Functions_and_Methods          linux_install.txt
2.2_Control_Flow                   linux_install.txt~
3.1_Lists_Dictionaries_and_Tuples  macos_install.txt
3.2_Fancy_Data_Structures          macos_install.txt~
4.1_Reading_and_Writing_Files      ProgramFiles
4.2_Modules_Numpy_Scipy
```

ls has many options. Here are some of the more useful ones to know:

```bash
$ ls -l
```
lists the long form of the directory entries: permissions, owners, sizes, and the date each file was last modified

```bash
$ ls -lh
```
lists the long form of the directory entries, but with the sizes in a human-readable format (i.e. MB and GB instead of the number of bytes)

```bash
$ ls -lt
```
shows long listing, and sorts by modification time

```bash
$ ls -lr
```
reverses the list

```bash
$ ls ..
```
list contents of the directory above

```bash
$ ls A_PATH
```
list contents in the directory specified by A_PATH, which can be either relative or absolute.

```bash
$ ls -ltr
```
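combines the -l, -t, and -r options: a long listing sorted by modification time, with the most recently modified files at the bottom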


## Making your mark...

```bash
$ cd ~/Dropbox/2017_Winter_Python
```

### mkdir - Create a given directory (MaKe DIRectory)
>Exactly what it says - lets you create new directories.

```bash
$ cd 1.1_Why_We_Program
$ mkdir Notes
$ cd Notes
$ echo 'Hello World' > python_notes.txt
$ ls
python_notes.txt
$ mkdir data
$ ls
data python_notes.txt
```

### cp - copy file or directory (CoPy)
>Create a copy of the original file

Make a copy of python_notes.txt called python_notes2.txt

```bash
$ cp python_notes.txt python_notes2.txt
$ ls
data/
python_notes.txt
python_notes2.txt
```
Create a file called project_notes.txt and make a copy of it called backup.txt


```bash
$ echo 'To Do' > project_notes.txt
$ ls
data/
project_notes.txt
python_notes.txt
python_notes2.txt
```

```bash
$ cp project_notes.txt backup.txt
$ ls
backup.txt
data/
project_notes.txt
python_notes.txt
python_notes2.txt
```
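
The examples above only copy single files. As a hedged sketch of the "directory" half of the heading: with the common GNU and BSD versions of cp, copying a directory together with its contents needs the -r (recursive) option. The names data, data_backup, and counts.txt are made up for illustration, and the example works in a throwaway directory so none of your own files are touched.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# Make a small directory with one file in it.
mkdir data
echo '42' > data/counts.txt

# Copy the directory and everything inside it.
cp -r data data_backup

# The copy contains its own counts.txt.
ls data_backup
```

Without -r, cp will typically refuse to copy a directory and print a message instead.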


### mv - move files or directories (MoVe)
>Rename a file or directory. Renaming is the same as moving within the same directory.

Example:

```bash
$ mv backup.txt project_notes.txt
$ ls
data/
project_notes.txt
python_notes.txt
python_notes2.txt
```
Here we've renamed the file backup.txt to project_notes.txt

Because project_notes.txt already existed, this overwrites the old file with whatever was in backup.txt

Essentially, we've restored the backup.

### rm - delete a file or directory (ReMove)
>Delete a file or directory.

Delete the file somefile.txt

```bash
rm somefile.txt
```

Delete somedirectory

```bash
rm -r somedirectory
```
\*Note the "-r" (stands for "recursive") is needed to delete directories and all their contents.


## Peeking inside files

### less - view contents of a file
less shows the contents of a file, and allows you to scroll and search the contents. However, less can only be used for simple text files, so you cannot reliably view contents of, say, MS Word documents with less. Fortunately, most of the files we'll be dealing with will be plain text files.

So let's see how this works. Download this [Pythons of the World](https://intro-prog-bioinfo-2015.wikispaces.com/file/view/pythons_of_the_world.txt/556090701/pythons_of_the_world.txt) text file and save it to your ~/PythonCourse directory. To read this file, type:

```bash
$ less pythons_of_the_world.txt

The Pythonidae, commonly known simply as pythons, from the Greek word Python (πυθων), are a family of nonvenomous (though see the section "Toxins" below) snakes found in Africa, Asia and Australia. Among its members are some of the largest snakes in the world. Eight genera and 26 species are currently recognized.[2]

Contents [hide] 
1 Geographic range
2 Conservation
3 Behavior
4 Feeding
5 Toxins
6 Reproduction
7 Captivity
8 Genera
9 Taxonomy
10 References
11 External links
Geographic range[edit]
Pythons are found in sub-Saharan Africa, Nepal, India, Sri Lanka, Burma, southern China, Southeast Asia and from the Philippines southeast through Indonesia to New Guinea and Australia.[1]
```

Some useful navigational tips for less:
- Use the "enter" key to progress one line at a time through the text.
- You can use the arrow keys to move up or down a line in the text.
- The spacebar will advance an entire page.
- You can search for a word by typing a slash (i.e. /) followed by the search word, then pressing enter.
- To quit, type q.
- To see the full help screen, type h.

_____________________________
### Optional Informative Interlude: UNIX names tend to be overly clever

As you've seen with the basic commands thus far, the names are generally descriptive abbreviations of the program's function. For example, mkdir is for making a directory, ls is for listing the contents of a directory, etc. However, programmers, especially UNIX programmers, tend to get increasingly clever as things progress. Unaware of the fact that this practice makes things opaque, the typical programmer cries out for attention by making program names self-referentially clever. less is a good example of this. In the olden days, the most basic ways to view a text file could not divide files into individual pages, thus a multipage document would scroll off the screen before the first page could be read. As a solution, a program called more was written, which paused at the bottom of each page and prompted the user to press the spacebar for "more." The program name here is reasonably descriptive, but more had some noticeable feature deficiencies: you could neither advance the text one line at a time nor navigate backward in the document without reloading the whole file. The program written to accommodate these features is less. The cleverness of the name is revealed by the paradoxical adage "[less is more](http://en.wikipedia.org/wiki/Ludwig_Mies_van_der_Rohe)." Your teachers and TAs may use the more command interchangeably with less throughout the class.
___________________________

### cat - join two files together (conCATenate) 
>If given just one file, cat will print the contents of the file to the screen. Given multiple files, it will print one after another.

Let's start by making two files, cat1.txt and cat2.txt:

```bash
$ echo 'HEY EVERYONE!!!' > cat1.txt
$ echo 'WISH I WAS OUTSIDE PLAYING' > cat2.txt
```

To view the contents, type:

```bash
$ cat cat1.txt
HEY EVERYONE!!!
$ cat cat2.txt
WISH I WAS OUTSIDE PLAYING
$ cat cat1.txt cat2.txt
HEY EVERYONE!!!
WISH I WAS OUTSIDE PLAYING
```




## Special characters

### wildcard matching with the *

The star functions as a "wild-card" character that matches any number of characters.

```bash
$ ls
cat1.txt cat2.txt pythons_of_the_world.txt
$ ls cat*
cat1.txt cat2.txt
```

The star can go anywhere in a list of arguments you're supplying, even in the middle of words! There are [other wildcards you can use](https://en.wikibooks.org/wiki/A_Quick_Introduction_to_Unix/Wildcards) but * is the most common.
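
To make the "anywhere, even in the middle" point concrete, here is a small sketch. The file names are invented for illustration, and touch (not covered above) simply creates empty files; everything happens in a throwaway directory.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# touch creates empty files; these names are only examples.
touch sample_2011.txt sample_2012.txt sample_2012.csv notes.txt

# Star at the end: every name starting with "sample_2012".
ls sample_2012*

# Star in the middle: starts with "sample_" and ends with ".txt".
ls sample_*.txt

# Star at the start: every name ending in ".txt".
ls *.txt
```

The first ls should list the two 2012 files, the second the two sample .txt files, and the third all three .txt files including notes.txt.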


### pipe |
>(the one above the backslash "\" key)

Piping with | connects UNIX commands, allowing the output of one command to "flow through the pipe" to another. This lets you chain programs together, such that each one only needs to worry about one step of the process (either generating, filtering, or modifying data), without knowing or caring where it came from or where it's going to.


```bash
$ ls | cat
```

A common use of the pipe is to pipe the output to less, to allow scrolling through the first bits of output without overloading the screen

```bash
$ ls | less
```
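
Pipes become more useful once more than two commands are chained. The sketch below is one possible illustration, using grep and wc (both described further down this page) and made-up file names in a throwaway directory.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# Example files; touch just creates them empty.
touch gel_2011.txt gel_2012.txt blot_2012.txt notes.txt

# Step 1 | Step 2: list the files, keep only names containing 2012.
ls | grep '2012'

# Add a third step: count how many names survived the filter.
ls | grep '2012' | wc -l
```

Each command only handles its own step: ls generates the names, grep filters them, and wc counts whatever reaches it. The exact spacing around the count printed by wc can differ between systems.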



### Output to a file with > and >>

In addition to redirecting output to another command, the results can be sent into a file with the >

```bash
$ cat cat1.txt cat2.txt > wishes.txt
$ cat wishes.txt
HEY EVERYONE!!!
WISH I WAS OUTSIDE PLAYING
```

Or you can append to the end of a file with >>

```bash
$ echo "Just kidding, I love Programming!" >> wishes.txt
$ cat wishes.txt
HEY EVERYONE!!!
WISH I WAS OUTSIDE PLAYING
Just kidding, I love Programming!
```
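
One difference between the two is worth seeing side by side: > replaces whatever was already in the file, while >> keeps it and adds to the end. A minimal sketch, using an invented file name (notes.txt) in a throwaway directory:

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

echo 'first draft' > notes.txt
echo 'second draft' > notes.txt        # > overwrites: "first draft" is gone
cat notes.txt                          # prints: second draft

echo 'a later addition' >> notes.txt   # >> appends to the end
cat notes.txt                          # prints both remaining lines
```

Because > silently discards the old contents, it is worth double checking the file name before pressing enter.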


## Other useful commands (in your own time)

### head - print first few lines of a file

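By default, head prints the top 10 lines of the input file. To print a different number of lines, say 12, use the -n option (for example, head -n 12 pythons_of_the_world.txt). The default behavior looks like this: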
```bash
$ head pythons_of_the_world.txt
The Pythonidae, commonly known simply as pythons, from the Greek word Python (πυθων), are a family of nonvenomous (though see the section "Toxins" below) snakes found in Africa, Asia and Australia. Among its members are some of the largest snakes in the world. Eight genera and 26 species are currently recognized.[2]

Contents [hide] 
1 Geographic range
2 Conservation
3 Behavior
4 Feeding
5 Toxins
6 Reproduction
7 Captivity
```
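
Here is a small sketch of the -n option on a file short enough to check by eye. The file name five_lines.txt is made up, and printf is used only as a quick way to write five numbered lines.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# Write five short lines: "line 1" through "line 5".
printf 'line %s\n' 1 2 3 4 5 > five_lines.txt

# Ask for only the first two lines instead of the default ten.
head -n 2 five_lines.txt
```

This prints "line 1" and "line 2". If a file has fewer lines than the number requested, head simply prints the whole file.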

### tail - print the last few lines of a file

```bash
$ tail pythons_of_the_world.txt
^ Jump up to: a b c d e McDiarmid RW, Campbell JA, Touré T. 1999. Snake Species of the World: A Taxonomic and Geographic Reference, vol. 1. Herpetologists' League. 511 pp. ISBN 1-893777-00-6 (series). ISBN 1-893777-01-4 (volume).
^ Jump up to: a b c d e "Pythonidae". Integrated Taxonomic Information System. Retrieved 15 September 2007.
Jump up ^ "Huge, Freed Pet Pythons Invade Florida Everglades", National Geographic News. Accessed 16 September 2007.
Jump up ^ Hardy, David L. (1994). "A re-evaluation of suffocation as the cause of death during constriction by snakes". Herpetological Review 229: 45-47.
Jump up ^ Mehrtens JM. 1987. Living Snakes of the World in Color. New York: Sterling Publishers. 480 pp. ISBN 0-8069-6460-X.
Jump up ^ Stidworthy J. 1974. Snakes of the World. Grosset & Dunlap Inc. 160 pp. ISBN 0-448-11856-4.
Jump up ^ Carr A. 1963. The Reptiles. Life Nature Library. Time-Life Books, New York. 192 pp. LCCCN 63-12781.
Jump up ^ Bryan G. Fry, Nicolas Vidal, Janette A. Norman, Freek J. Vonk, Holger Scheib, S. F. Ryan Ramjan, Sanjaya Kuruppu, Kim Fung, S. Blair Hedges, Michael K. Richardson, Wayne. C. Hodgson, Vera Ignjatovic, Robyn Summerhayes, Elazar Kochva (2006). "Early evolution of the venom system in lizards and snakes". Nature 439 (7076): 584–588. doi:10.1038/nature04328. PMID 16292255.
Jump up ^ Bryan G. Fry, Eivind A. B. Undheim, Syed A. Ali, Jordan Debono, Holger Scheib, Tim Ruder, Timothy N. W. Jackson, David Morgenstern, Luke Cadwallader, Darryl Whitehead, Rob Nabuurs, Louise van der Weerd, Nicolas Vidal, Kim Roelants, Iwan Hendrikx, Sandy Pineda Gonzalez, Alun Jones, Glenn F. King, Agostinho Antunes, Kartik Sunagar (2013). "Squeezers and leaf-cutters: differential diversification and degeneration of the venom system in toxicoferan reptiles". Molecular & Cellular Proteomics 12 (7): 1881–1899. doi:10.1074/mcp.M112.023143.
Jump up ^ "The Keeping of Large Pythons" at Anapsid. Accessed 16 September 2007.
```




### grep (Global Regular Expression Print)
Searches for the "search string" in a text file and prints out all lines where it finds the desired text. The search string can be a simple word, or a [complicated specification of matches/mismatches](http://www.robelle.com/smugbook/regexpr.html).

**Format:**
```bash
grep 'search_string' file1 [file2 ...]
```

**Examples**

```bash
$ grep "python" pythons_of_the_world.txt
The Pythonidae, commonly known simply as pythons, from the Greek word python (πυθων), are a family of nonvenomous snakes found in Africa, Asia and Australia. Among its members are some of the largest snakes in the world. Eight genera and 26 species are currently recognized.[2]
In the United States, an introduced population of Burmese pythons, Python molurus bivittatus, has existed as an invasive species in the Everglades National Park since the late 1990s.[3]
Many species have been hunted aggressively, which has decimated some, such as the Indian python, Python molurus.
Black-headed python,
Larger specimens usually eat animals about the size of a house cat, but larger food items are known: some large Asian species have been known to take down adult deer, and the African rock python, Python sebae, has been known to eat antelope. Prey is swallowed whole, and may take anywhere from several days or even weeks to fully digest.
Contrary to popular belief, even the larger species, such as the reticulated python, P. reticulatus, do not crush their prey to death; in fact, prey is not even noticeably deformed before it is swallowed. The speed with which the coils are applied is impressive and the force they exert may be significant, but death is caused by suffocation, with the victim not being able to move its ribs to breathe while it is being constricted.[5][6][7]
Apodora Kluge, 1993 1 0 Papuan python Most of New Guinea, from Misool to Fergusson Island
Bothrochilus Fitzinger, 1843 1 0 Bismark ringed python The islands of the Bismark Archipelago, including Umboi, New Britain, Gasmata (off the southern coast), Duke of York and nearby Mioko, New Ireland and nearby Tatau (off the east coast), the New Hanover Islands and Nissan Island
Leiopython Hubrecht, 1879 1 0 D'Albert's water python Most of New Guinea (below 1200 m), including the islands of Salawati and Biak, Normanby, Mussau, as well as a few islands in the Torres Strait
Carpet python,
Green tree python,
Albino Burmese python,
Borneo short-tailed python,
```


The -c argument counts the number of lines with a match (not the number of matches).

```bash
$ grep -c python pythons_of_the_world.txt
13
```

The -v argument inVerts the search (i.e. prints lines that *don't* contain your search string).
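
A tiny file makes -c and -v easy to verify by hand. In this sketch the file snakes.txt and its three lines are invented for illustration.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# Three lines, two of which contain the word "python".
printf '%s\n' 'Burmese python' 'King cobra' 'Carpet python' > snakes.txt

# -c counts the matching lines.
grep -c 'python' snakes.txt    # prints: 2

# -v prints only the lines that do NOT match.
grep -v 'python' snakes.txt    # prints: King cobra
```

Note that grep is case-sensitive by default, so a line containing only "Python" with a capital P would not be counted here.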

### cut - extract columns from a text table
**Format:**
```bash
cut -f column_number(s) -d delimiter file
```
Many of the data files we'll be dealing with are actually tables, usually separated by tabs. The cut command will pull out the column numbers you specify and print them out to the shell, while leaving the original file alone.
```bash
$ cat Snake_Data.csv
Burmese python,Python,Pythonidae
King cobra,Ophiophagus,Elapidae
Night adder,Causus,Viperidae

$ cut -f 1 -d ',' Snake_Data.csv
Burmese python
King cobra
Night adder
```

Grab multiple columns using commas


```bash
$ cut -f 2,3 -d ',' Snake_Data.csv
Python,Pythonidae
Ophiophagus,Elapidae
Causus,Viperidae
```


By default, the delimiter is set to a tab
```bash
$ cut -f 1 Snake_Data.tsv
Burmese python
King cobra
Night adder
```



## Permissions

Unlike the computers you are used to, UNIX doesn't automatically know what to do with files (e.g. it won't know to use Word to open a .doc document), and it doesn't even know whether a file is data or a program (and as we'll see with the programs we write, it might be different things at different times).

The first thing that controls a file is the file's permissions. You can control who can read, write, and execute (run as a program) each of your files. This command lists the permissions:

```bash
$ ls -la
```

The first letter tells you whether it is a directory.

The next set of letters tell you if a file is readable (r), writable (w), or executable (x).

The 2nd-4th letters tell you what *your* permissions are, 5th-7th tell you what your group's permissions are, and the last three tell you what everyone else's permissions are. Unix was designed to be a multi-user operating system, so even if you're the only one who uses the computer, it maintains the distinction for you, versus your group, versus everyone else.

### chmod - Modify permissions.
chmod [flags] [filename]

```bash
$ echo 'script' >script.py
$ ls -l script.py
-rw-rw-r-- 1 james james 7 May 25 13:05 script.py
$ chmod +x script.py
$ ls -l script.py
-rwxrwxr-x 1 james james 7 May 25 13:05 script.py
```
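
In the example above, +x added execute permission for you, your group, and everyone else at once. chmod also accepts a letter in front of the + or - saying whose permission to change: u for the user who owns the file, g for the group, and o for others. The sketch below assumes a made-up file name (demo_script.py) in a throwaway directory; the exact ls -l output depends on your system's default permissions.

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

echo 'script' > demo_script.py

# Add execute permission for the owner only.
chmod u+x demo_script.py
ls -l demo_script.py

# Remove write permission from the group and from everyone else.
chmod go-w demo_script.py
ls -l demo_script.py
```

After the first chmod, the fourth character of the permissions string should be an x; after the second, the group and "everyone else" triplets should have a - where the w would be.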

If you try running a program and it's not working at some point in the class, double check the permissions!!!

## Getting Help

### man - read the MANual page for a command
>*What does that command do again?*

man command_name

Most commands have many useful flags beyond what I've shown you. For information on a particular command, look at the MANual pages with man.
```bash
$ man wc
WC(1)                            User Commands                           WC(1)

NAME
       wc - print newline, word, and byte counts for each file

SYNOPSIS
       wc [OPTION]... [FILE]...
       wc [OPTION]... --files0-from=F

DESCRIPTION
       Print newline, word, and byte counts for each FILE, and a total line if
       more than one FILE is specified.  A word is a non-zero-length  sequence
       of characters delimited by white space.

       With no FILE, or when FILE is -, read standard input.

       The  options  below  may  be  used  to select which counts are printed,
       always in the following order: newline, word, character, byte,  maximum
       line length.

       -c, --bytes
              print the byte counts

       -m, --chars
              print the character counts

       -l, --lines
              print the newline counts

       --files0-from=F
              read  input  from the files specified by NUL-terminated names in
              file F; If F is - then read names from standard input

       -L, --max-line-length
              print the maximum display width

       -w, --words
              print the word counts

       --help display this help and exit

       --version
              output version information and exit

```
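
Once you have found a flag in a man page, the quickest way to understand it is to try it on a small file. As a hedged illustration of the -l and -w flags listed above, using an invented three-line file (snakes.txt) in a throwaway directory:

```bash
# Start in a fresh, empty scratch directory (mktemp -d creates one).
cd "$(mktemp -d)"

# Three lines of two words each.
printf '%s\n' 'Burmese python' 'King cobra' 'Night adder' > snakes.txt

# -l reports the number of lines (3 for this file).
wc -l snakes.txt

# -w reports the number of words (6 for this file).
wc -w snakes.txt
```

The count is followed by the file name, and the amount of spacing in front of the number varies between systems.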


## Final Installation Check
Ok, now that we have a handle on the terminal, let's do a final installation check of the programs we had you install.

In your terminal window type:

```bash
which python
```

This prints out the full system path to the version of python that runs when you type in "python" on the terminal. We want to make sure that this is the Anaconda distribution of Python that we had you install, and not the system python. Make sure there is a folder in the path that says "Anaconda" or "Miniconda" and if not, raise your hand.

Next we'll make sure IPython and Jupyter Notebook are both installed.

Type

```bash
ipython
```

And you should get a few lines of text and a prompt that looks like:

```bash
In [1]:
```

If this is what you are seeing, everything is installed correctly.  Type "exit" and hit return to go back to the terminal.

Lastly, to check on Jupyter Notebook type this in the terminal:

```bash
jupyter notebook
```

You should see a bunch of lines of text, followed by your browser opening to the Jupyter Notebook home page. It might take 5-10 seconds to launch, but if it isn't coming up, raise your hand and someone will come by to help you out.



## Jupyter Notebook Intro

We'll be using Jupyter Notebooks to teach much of the class.  While there are many ways to write and execute Python code, we've found that Jupyter Notebooks can be a useful tool, both for learning Python and for writing your own data analysis scripts.

The main advantage of Jupyter Notebooks is that they provide you with an environment in which you can develop python programs in small chunks (called cells). The cells can easily be run, modified, and re-run, which is very helpful when trying to fix problems in your code, or when just experimenting with Python in general.

Also, the output of each cell is embedded in the notebook, right underneath the code, for a nice resulting presentation.  Text, and even images and plots, can all be embedded in the notebook, allowing for the easy creation of a nice data-analysis workflow with results and the code that produced the results side-by-side.

### Live Jupyter Notebook Demo

On the overhead, I'm going to walk you through some Jupyter Notebook basics. Follow along on your own laptops.

- Launching Jupyter Notebook (we just did this)
- Creating a new Notebook
- Entering some code, running a cell
- Saving a script and running from the command line

#### Creating a New Notebook
To create a new notebook, click the "New" button in the top-right area of the page after the web-browser launches.  Select "Python 2" from the list.

#### Running a cell
In the first cell, type the following:

```python
print "Hello World!"
```

Then, to run the cell, hold SHIFT and press ENTER.  You'll see the text "Hello World!" below the first cell, and a second, empty cell is created for you below.  You can also create new cells yourself by using the "Insert" menu at the top of the page.

#### Saving a Script
In the second cell, type this:
```python
print "I'm excited to be learning Python this week!"
```

Then, let's give the notebook a better title.  At the top of the page, next to the Jupyter logo, click "Untitled" and change the title to "FirstScript".

Now, go to the "File" menu, select "Download As", then select "Python".

This will download a file, FirstScript.py.  Make sure to use "Save As" when your browser asks you what to do with the file.

To run the file, open a new terminal (our original terminal is still busy running Jupyter Notebook), use "cd" to navigate to the Downloads folder, and then run the script using this command:

```bash
$ python FirstScript.py
```

And you'll see the two lines of text output to the terminal.

### Running a Script

Python scripts can be run from the UNIX shell, as we just did with FirstScript.py. As a reminder, from a terminal in the folder that contains the script, use this command:

```bash
$ python FirstScript.py
```

Similar to other UNIX commands, Python scripts can take additional arguments as inputs. In fact, many scripts require additional input, such as a file or a sequence of nucleotides. If you aren't sure what a script expects, look at the script itself - hopefully the author has left comments on how it works!

Let's take a look at the Greet_User.py script in the 1.1 folder using less:
```bash
$ less Greet_User.py
```



I usually put a commented header on all of my scripts - here, it is the lines between the triple quotes (''') - more on these later! The header declares the name of the script and its inputs (it takes a name), and gives a brief description of the script for easy reference later.

To run the script, I give it a name (I'll use my own):
```bash
$ python Greet_User.py James
Greetings James!
```
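As a rough illustration of how a script can pick up an argument from the command line, here is a minimal sketch. It is not the contents of Greet_User.py: the file name Greet_Sketch.py and the function name greeting are made up for this example. It assumes the argument is read from Python's standard sys.argv list, and it is written so that it should run under both Python 2 and Python 3.

```python
'''
Greet_Sketch.py (hypothetical name; NOT the course's Greet_User.py)

Usage: python Greet_Sketch.py <name>

Prints a greeting for the name given on the command line.
'''
import sys


def greeting(name):
    # Build the greeting text for a single name
    return "Greetings " + name + "!"


if __name__ == "__main__":
    # sys.argv[0] is the script name; the first argument is sys.argv[1]
    if len(sys.argv) > 1:
        print(greeting(sys.argv[1]))
    else:
        # No name was given, so remind the user how to call the script
        print("Usage: python Greet_Sketch.py <name>")
```

Saved as Greet_Sketch.py, running `python Greet_Sketch.py James` would print `Greetings James!`, and running it with no name prints the usage line instead.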



## Summary

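Today we learned how to get around in the UNIX shell, view the contents of files, get help, and run Python code from a Jupyter Notebook and from the command line.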

### List of commands we covered
1. **pwd** - Print Working Directory
2. **cd** - Change Directory
3. **ls** - List files
4. **mkdir** - Make a new directory
5. **cp** - Copy a file
6. **echo** - Just repeat it back (useful when combined with | or >)
7. **mv** - Move (or rename) a file or folder
8. **rm** - Delete a file or folder
9. **less** - Browse the contents of a text file
10. **head** - Peek at the beginning of the file
11. **tail** - Peek at the end of the file
12. **cat** - Paste files together
13. **grep** - Return (or count) lines that match a pattern
14. **cut** - Extract columns from a text file in tabular form
15. **man** - Manual:  Get help for a command
16. **python** - Run a Python script

### Special Characters
1. **Wildcard: \***
Placeholder - makes command apply to all files that match.

2. **Pipe: |**
Used to chain commands together

3. **Chevron: > or >>**
Overwrites (>) or appends (>>) output to a file.

### Other useful commands
1. **gzip** - Used for zipping/unzipping files that end in .gz
2. **tar** - Used for bundling/unbundling archives that end in .tar
3. **find** - Search for files that match a pattern
4. **wget** - Download a file from the internet

## Exercises

### 1) Greetings!

Use the Greet_User.py script to greet 3 different users and save the output to a single file called greetings.txt. This file should contain the output of the 3 different greetings.

### 2) Someone else's code

Look at the two scripts Rev_Comp.py and Get_Fasta_Seq.py using less (or your favorite text editor). Both have a commented section similar to the Greet_User.py script, and have examples of how to run them.

A) Use the Rev_Comp.py script to find the reverse complement of the sequence AATTGGCC. 

B) Use the Get_Fasta_Seq.py script to find the sequence of the 3rd chromosome (named chrIII) of the Yeast_Genome.fasta file. Save the result to a file called chrIII.seq.

### 3) Good old wc

Look up the man page for the 'wc' command. Use the wc command and the chrIII.seq file to determine the number of nucleotides on the 3rd chromosome.

Can you do this without first creating the chrIII.seq file?

### 4) Building a pipeline

Use the Rev_Comp.py and Get_Fasta_Seq.py scripts to generate the reverse complement of the 3rd chromosome (chrIII) of the Yeast_Genome.fasta file in a single line using a pipe. Save the result to a file called chrIII_revcomp.seq. Check your result using wc on both files to determine if they are the same length. 


### 5) Star-Struck

Use wc or ls (or a combination of both or even more) along with appropriate wild-card characters (\*) to do the following within the 1.1 folder. Record the commands you used. The computer should do the counting, not you!

A) List all text (.txt) files

B) List all files starting with the letter 'c'

C) List all files within the Notes folder

D) Count the number of characters in the pythons_of_the_world.txt file

E) Count the number of lines in each text (.txt) file 

F) Count the number of files/folders within the 1.1 folder 

G) Count the number of files starting with the letter 'c'

H) Count the total number of characters in all files within the 1.1 folder (should be a single number)

I) Count the total number of lines in all text (.txt) files within the 1.1 folder (should be a single number)


### 6) Wait, you said this wouldn't be on the test!

Look at the "Other Useful Commands" section of the lecture. 

A) Count the number of chromosomes in the whole yeast genome file using commands from the lecture.

B) Determine the total size of the genome in base-pairs (FASTA header lines don't count!)

C) CHALLENGE. Count the number of times a line in pythons_of_the_world.txt starts with 'Python'. How many times is 'python' the second word in the line? The third?

### 7) Moving beyond the lecture
Use Google and any other references you want to find a command that tells you how much disk space you have left. Use the 'man' command to see how it works. How much space is left on your system? Find the option that makes the command report sizes in gigabytes and megabytes ('human-readable' form).
