# Lesson 1: Introduction to Python

Welcome to the Fu Lab Python Mini-course. In the next month, you will learn the basics of python in the context of using it for genetic analyses. This course is going to be presented in jupyter notebook - it is not necessary to also use the notebook, but it is a useful way of organizing code. 

Many of these lessons are taken directly or adapted from [a python bootcamp at UC Berkeley](https://intro-prog-bioinfo-2015.wikispaces.com) that I helped to teach from 2012-2016. 

## Topics:
- Expectations for the course
- Navigating the UNIX shell
- Viewing the content of files
- How to get help
- Text editors

### Expectations for the Course

The idea behind the course is to create an environment for you to intensively work on developing your python programming skills and give you a crash course in the basics of the language. By the end of the course, you will not be an expert on python, but you should be able to write simple scripts and have the groundwork and general vocabulary to effectively use the internet and/or other resources to teach you fancier tricks. Most importantly, you should be able to ask yourself a question and start envisioning what ways you can write code that will help you arrive at an answer. 

Perhaps a tiresome phrase for you by now, but it is very true: learning to code is like learning a new language. It takes lots of practice, and it continuously feels hard and unintuitive until suddenly, it starts sticking. Just keep trying and asking questions--it will just get easier and easier!

Some potentially useful references include:
- [style guide for Python](http://www.python.org/dev/peps/pep-0008/)
- [Learning Python](http://proquest.safaribooksonline.com/1-56592-464-9)
- [Python Pocket Reference](http://proquest.safaribooksonline.com/9780596802011)
- [Python Website (documentation)](https://docs.python.org/release/2.7.11/)
- [Linux Pocket Guide](http://proquest.safaribooksonline.com/9780596806347) (for Unix)
- [Python Code Visualization](http://www.pythontutor.com/visualize.html#mode=edit)


## Basic Tools Required

Before we touch any python, let's go over some of the main tools you will use as you write in python, as well as another language - Unix. 

The shell is the way you interact with your operating system. Most of you are used to the graphical user interface (GUI) you see when you turn on a computer - clicking the start button allows you to access a folder, which contains files that you double click to open within the relevant default application. However, there is also a command line interface (CLI), which is very useful as well. On both Linux and Mac operating systems, the CLI is the **Terminal**, in which we use the Unix language for navigation. On Windows, the CLI is the **Command Prompt**, which uses a different language. If you have a Windows operating system, you should have already installed cygwin, a CLI that allows you to use UNIX on a Windows computer. 

The next vital piece in writing python code is a text editor. Your python code is usually written to a text file ending in ".py", and then you can use the Unix commands to access and run the python script from your shell (Terminal). Sometimes, you may also want to use an interpreter, or a program that allows you to directly execute commands without calling on a script you save to file. Here, we will gain familiarity with the Terminal (or cygwin) shell and the Unix language, and use jupyter notebook to write our code to file and as an interpreter. Jupyter notebook can also be used as a shell, making it a powerful one stop shop for all the main tools you use to begin coding. 



## Navigating the UNIX shell

We will begin by briefly exploring how to use Terminal (if you are using a Windows computer, then opening the Terminal means opening cygwin). Open the Terminal - you should see < username >@< whatever computer >. Type 'pwd'. This prints a file path indicating the set of folders leading to your current directory. That is, if you were navigating as you normally would on your computer, you would have to click on each of these folders, from left to right, to end up in the folder the Terminal is waiting in (your 'working directory'). 

* **pwd**   = print working directory 
* **ls**    = list all files/folders in the directory
* **cd**    = change the working directory (type the filepath to the requested directory)
* **echo**  = print to screen whatever string follows the command
* **mkdir** = make a new directory
* **rmdir** = remove a directory (can only be done if the directory is empty)
* **cat**   = print all contents of a file to screen
* **rm**    = remove a file
* **cp**    = copy a file (to a new filename and/or directory)
* **mv**    = move a file (rename file or move to a different directory)
* **head**  = print the first few lines of file to Terminal (default = 10)
* **tail**  = print the last few lines of file to Terminal (default = 10)
* **grep**  = search for an expression and print line(s) containing that expression
* **cut**   = print certain columns of the file (based on delimiter or by character)
* **less**  = opens a file within the Terminal screen as a text file (use q to exit)
* **man**   = gives manual for UNIX command you include after 'man'


Try some of the following. 

```bash
pwd
cd /< file path to your Documents directory >/
mkdir Python_MiniCourse2017/
cd Python_MiniCourse2017/
echo hello
echo hello > hello.txt     ## '>' indicates rather than print to screen, print to the specified file. 
cat hello.txt
echo "how are you" > hello.txt  ## What happened?
cat hello.txt
echo "I am excited to learn Unix commands!" >> hello.txt ##What happened here?
cat hello.txt

rmdir Python_MiniCourse2017/  #What error did we get? 
cd ..
rmdir Python_MiniCourse2017/  #Now what error did we get?
cd Python_MiniCourse2017/
ls
mkdir testdir/
ls testdir/
echo "Let's fill the test directory!" > testdir/testfile.txt
ls testdir/
cat testdir/testfile.txt
rmdir testdir/
rm testdir/testfile.txt
rmdir testdir/
ls

cp hello.txt hello_copy.txt
mkdir Lesson1
mv hello.txt Lesson1/
cat hello.txt
cat Lesson1/hello.txt
cp hello_copy.txt Lesson1/hello_copy2.txt
ls Lesson1/
```

#### Important: Be very careful with rm and rmdir and the '>' sign. You can easily delete an important document, and these deleted documents are not saved in your Trash or Recycle Bin! 

Let's download this file: [Pythons of the World](https://intro-prog-bioinfo-2015.wikispaces.com/file/view/pythons_of_the_world.txt/556090701/pythons_of_the_world.txt). Make a new directory in Python_MiniCourse2017/ called "resources" and move the Pythons of the World text file to the resources folder under the new name "worldpythons.txt". While you can do this using the GUI, I encourage you to do this using only Unix commands. 

### Reading through files

Earlier, we used cat to quickly look at the contents of our files. 'cat' stands for concatenate, so its actual function is to put the contents of files together and then print the result to screen. For instance, if you were to write two filenames after 'cat', you would get both files printed to screen, in the order the filenames were given.

```bash
cd ~/Documents/Python_MiniCourse2017/Lesson1/
cat ../resources/worldpythons.txt hello.txt    ##The ../ is shorthand for 'go up one directory'. 
cd ../resources/
```

A problem with looking at files through cat is that it is difficult to view large files printed to Terminal. 'head' and 'tail' allow you to look at the head or tail of the file, with a default of 10 lines. 

```bash
head worldpythons.txt
tail worldpythons.txt
```

However, sometimes you might want to look at the whole file. Another command you can use is 'less', which opens the file within the Terminal, allowing you to look through the document one screen at a time. It opens from the top of the file, and you use arrow keys to scroll up and down. The space bar allows you to move a screen's worth of text, allowing faster scrolling. The default is word wrapping, but you can use '-S' to chop the long lines (thus you would need left and right arrow keys to view the long sentences). Typing /word will search for 'word' in the file and highlight wherever it appears. Type 'q' to exit.

```bash
less worldpythons.txt
```

We can also search these files using grep. 

```bash
grep Aspidites worldpythons.txt
```

### Extra options for above commands

Above, we went through some of the basic functions of some UNIX commands. However, there are many sub-options for each of these commands that you may find useful. Here, we will illustrate a few commonly used ones and run through how to get more information on these options using 'man', a command that allows you to read the manual for the UNIX command in question.

```bash
ls -lrth
man ls
```

Here, the '-l' option uses the long listing format. That is, more detail is provided. '-t' sorts your files by time, from most recently edited to the oldest edited. Since the filenames are printed to screen, we normally read the end of the printed output--thus, we reverse the order using '-r', printing from the oldest edited to the most recently edited. Lastly, the '-l' option also includes the file sizes, but they are shown as raw byte counts, which can be very large numbers. Adding '-h' makes them human-readable: sizes are reported in K (kilobytes), M (megabytes), G (gigabytes) and T (terabytes) for easy reading. 

Other useful commands include:

```bash
head -n 1 worldpythons.txt
tail -n 1 worldpythons.txt
grep -n Aspidites worldpythons.txt
```

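'cut' is a command that lets you selectively pull out columns, either by character position ('-c') or by fields separated by a delimiter ('-d' sets the delimiter and '-f' selects the fields). For instance:

```bash
cut -c2-4 ../Lesson1/hello.txt          ## characters 2 through 4 of each line
cut -d' ' -f2-4 ../Lesson1/hello.txt    ## space-separated fields 2 through 4 of each line
```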








## Special characters

### wildcard matching with the *

The star functions as a "wild-card" character that matches any number of characters.

```bash
ls
ls *txt
```

The star can go anywhere in a list of arguments you're supplying, even in the middle of words! There are [other wildcards you can use](https://en.wikibooks.org/wiki/A_Quick_Introduction_to_Unix/Wildcards) but * is the most common.


### pipe |
>(the one above the backslash "\" key)

Piping with | connects UNIX commands, allowing the output of one command to "flow through the pipe" to another. This lets you chain programs together, such that each one only needs to worry about one step of the process (either generating, filtering, or modifying data), without knowing or caring where it came from or where it's going to.

```bash
env
```

The 'env' command returns a list of environment variables.  We won't go over what they mean here, but rather we're using this to demonstrate that some commands might return too much text to usefully view in the terminal.

However, you can use the | character to direct the output to another program that shows just a portion of it.  
For example, use the head command to show just the first few lines:


```bash
env | head
```

Or, say you were just trying to find the value of the HOME variable.  You could use grep to isolate just this portion.

```bash
env | grep HOME
```

## Permissions

Unlike the computers you are used to, UNIX doesn't automatically know what to do with files (e.g. it won't know to use Word to open a .doc document), and it doesn't even know whether a file is data or a program (and as we'll see with the programs we write, it might be different things at different times).

The first thing that controls a file is the file's permissions. You can control who can read, write, and execute (run as a program) each of your files. This command lists the permissions:

```bash
ls -la
```

The first letter tells you whether it is a directory.

The next set of letters tell you if a file is readable (r), writable (w), or executable (x).

The 2nd-4th letters tell you what *your* permissions are, 5th-7th tell you what your group's permissions are, and the last three tell you what everyone else's permissions are. Unix was designed to be a multi-user operating system, so even if you're the only one who uses the computer, it maintains the distinction for you, versus your group, versus everyone else.

### chmod - Modify permissions.
chmod [flags] [filename]

```bash
echo 'script' > script.py
ls -l script.py
chmod +x script.py
ls -l script.py
```

If you try running a program and it's not working at some point in the class, double check the permissions!!!

### Other useful commands

I will not explain these now, but in the exercises for today's lesson, you will explore these UNIX commands and test them out. For now we will move on to the Jupyter notebook and our first python commands!
1. **gzip** - Used for zipping/unzipping files that end in .gz
2. **tar** - Used for bundling/unbundling archives that end in .tar
3. **find** - Search for files that match a pattern
4. **wget** - Download a file from the internet

## Jupyter notebook basics

[Jupyter notebook](http://jupyter-notebook.readthedocs.io/en/latest/) was previously known as IPython notebook, and it is the main tool we will be using to write and execute our scripts. Not only is it useful for developing our scripts, but it is also a great organizer and actual notebook. All of the lectures, including this one, are provided in a notebook, so you will get many examples of its utility. While we do not explore all of the functions of the notebook, we highlight some of the basics here so you can navigate and write in these notebooks easily. 

* The easiest way of accessing the notebook is opening a terminal and typing in "jupyter notebook". Currently, this might not be working for Windows users - another way is to use the Anaconda Navigator and click on the "Jupyter Notebook" in your start screen. This will do the same. 
* What opens is a web-based user interface with the files in the directory you opened in terminal (or your home directory when opening through Anaconda Navigator). Going to the top right and clicking "New" followed by "Python 2" under 'Notebook' will open your first jupyter notebook (ends in .ipynb) like the one we observe here. 

    Note that while we open and use the notebook in a web browser, the notebook is still running from that Terminal screen. Exiting the Terminal screen will close the notebook, so minimize this Terminal screen and open a new one if you want to further use Terminal. 

```bash
jupyter notebook
```

* Each time you open a notebook, you are opening a kernel. Sometimes you may need to restart or reconnect to a kernel (look under the Kernel option). Other times you may want to interrupt the kernel, stopping whatever action is being undertaken. I use these most when I accidentally run something too large for my notebook to handle - interrupting stops the running command but keeps everything you have already run (such as your variables), while restarting stops the kernel and clears everything you previously ran within it. 

* Each of these squares where we can write text is called a cell. By default each cell is a python interpreter. Going to "Cell"->"Cell Type"->"Markdown" turns it into a text displayer, allowing you to write just plain notes, as this cell currently does. I will not be teaching all the different options for Markdown, but [this page](http://nestacms.com/docs/creating-content/markdown-cheat-sheet) provides some of the formatting you might be interested in.

* The top bar includes many different options. While the notebook usually saves automatically at regular intervals, there is a 'save' button, followed by buttons to insert new cells, move cells around, and run the cells. Pressing Shift+Enter will also run the active cell. You can also switch the cell format in the dropdown box. 

* Magic commands: One useful aspect of the notebook is its [magic commands](https://ipython.org/ipython-doc/3/interactive/magics.html). We won't delve into magic commands here, but there are a few I use often, particularly %%bash and %%writefile

In [None]:
%%bash
ls
## This allows you to treat the cell as a Terminal screen, and you can use UNIX commands

In [None]:
%%writefile ~/Documents/Python_MiniCourse2017/Lesson1/file_writefromnotebook.txt
I am writing this file from within the notebook. 

I can write whatever I want and it will be written to a text file in the specified folder. 

If I don't write a filepath, this text file will be written into the home directory of the notebook.

* Last but not least, the default is that the cell treats what you type as if you wrote Python commands. For this final section of the day, we will begin to start writing in python!

## Writing in Python - you're ready to begin!

In this lesson, you've learned about the Terminal, written commands in the UNIX language, and gotten a brief introduction to the Jupyter notebook. However, you haven't started working with python yet! Here, we will begin with a simple command:

In [None]:
print "hello world"

As you can see, when I typed
```python
print "hello world"
```
and pressed Shift+Enter, the notebook processed what I wrote in the cell and output it below the cell. Here, I used the command 
```python
print
```
to display the string "hello world". "print" is similar to echo in UNIX - it tells the computer to print to screen whatever comes after the command.

However, you can also run these python scripts outside of the notebook. Below, we write 
```python
print "hello world"
print
```
into a text file using the magic command %%writefile and then run it through the bash/Terminal. Typically, python scripts are saved with the extension ".py" to indicate that the file is a python script. 

In [None]:
%%writefile ~/Documents/Python_MiniCourse2017/Lesson1/myfirstscript.py
print "hello world"
print

In [None]:
%%bash

python ~/Documents/Python_MiniCourse2017/Lesson1/myfirstscript.py
##You can also do this directly in Terminal

## Variables: integers, floating-point numbers (decimal), strings (text)

Computer programming is useful because it allows the programmer to tell the computer to perform operations that are too boring, tedious, or difficult for the programmer to do by hand. A useful computer program needs to be able to interact with the user, perform operations on changing sets of data, and make decisions about how to proceed based on conditions specific to each instance of its execution. To achieve these tasks, computer programs employ variables.

Variables in computer science are different from algebraic variables, just like algebraic variables are different from statistical variables or experimental variables. In Python, a variable is a datum with a human-readable name which is assigned a given value. Variables can be reassigned to different values as the logic of the program dictates (variables have variable values, hence variables). 

In Python, everything and every type of data, from numbers and text to vectors and functions, is called an **object**, and objects are stored in memory. Technically, the values of variables in Python are the memory addresses of these objects.  Variables point at (reference) data (objects). It is often easier to think of variables as 'storing' values, though some situations will require an understanding of the more technical memory-based definition.
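To make the idea of variables pointing at objects a little more concrete, here is a minimal sketch. The names `a`, `b` and `same_object` are illustrative only, and `print(...)` is written with parentheses and a single argument so that the snippet should behave the same under Python 2 and Python 3.

```python
# Two names can reference the same object in memory.
a = 'hello'
b = a                 # b now points at the same string object as a

same_object = a is b  # 'is' asks: do both names reference one object?
print(same_object)    # True

# Reassigning a makes it point at a different object; b is unaffected.
a = 'goodbye'
print(b)              # hello
```

Thinking of `b` as a label attached to the string, rather than a box holding a copy, explains why changing what `a` points at leaves `b` alone.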

Python programs use variables to store parameters taken in from the user, the execution environment, or the data your program is being called upon to process.

These variables are named whatever you like, within the strictures of a few basic rules:

1. Python variable names are case-sensitive, so Var and var are different variables.
2. Though variable names can contain letters, numbers and underscores ( _ ), they MUST start with a letter (a-z, A-Z) or an underscore, never a number.
3. Variable names CANNOT contain spaces or special non-alphanumeric characters (e.g. holyS#+%? is naughty, but holyMackerel is kid tested, mother approved), nor can they be any of the following words that already have special meaning in python:

In [None]:
    and    as       assert   break    class      continue   def      del
    elif   else     except   exec     finally    for        from     global
    if     import   in       is       lambda     not        or       pass
    print  raise    return   try      while      with       yield

For the most part, ipython will remind you that these words are off-limits by coloring these words in helpful ways when you type them.


Here are some invalid python variable names:

* **1sample**
* **sampleA.1**
* **class**

And here are some good alternatives:

* **sample_1**
* **SampleA1**
* **bootcamp_class**
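As a rough preview, the naming rules can be checked in code. The sketch below is only an illustration: `is_valid_name` is a hypothetical helper, not something built into Python, and it uses a function, a loop and imported modules, all of which are covered in later lessons. It also accepts a leading underscore, which Python permits.

```python
import keyword
import re

def is_valid_name(name):
    """Return True if name follows the variable naming rules."""
    # Letters, digits and underscores only, and no leading digit.
    follows_pattern = re.match(r'^[A-Za-z_][A-Za-z0-9_]*$', name) is not None
    # Reserved words are off-limits even though they match the pattern.
    return follows_pattern and not keyword.iskeyword(name)

for candidate in ['1sample', 'sampleA.1', 'class',
                  'sample_1', 'SampleA1', 'bootcamp_class']:
    print(candidate + ' -> ' + str(is_valid_name(candidate)))
```

The first three names should be reported as invalid and the last three as valid. Note that `keyword.iskeyword` reflects the Python version you are running, so words such as `print` are only reserved under Python 2.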

## Variables, Objects, and Types

Variables can reference (store) many different types of objects. Today we'll talk about three types of objects: integers, floating point (i.e. decimal) numbers, and strings.

Run the following example, through which we'll explore a few properties of variables:

In [None]:
# by the way, the pound sign (#) at the start of a line
# makes that line a comment, ignored by the interpreter
 
s = 'hella world'
i = 42
f = 3.14159
print s
print 'the variable s is type',type(s)
 
print i
print 'the variable i is type',type(i)
 
print f
print 'the variable f is type',type(f)


In general, variables are assigned by typing the name you want to use, followed by a single equals sign, then the value you'd like to store. This is the same whether the variable you're assigning is an object of type str (a character string), int (whole number), float (non-integer real number), or any number of other fancier things you'll be using over the rest of the course.
 
 
While (as your program tells you with the handy type() function) i is currently an integer, that doesn't mean it cannot change. You can easily reassign i to be anything that takes your fancy, including the value of another variable. You would do this with a statement such as the following:

In [None]:
i = s
 
print i
print 'the variable i is now type',type(i)

There are plenty of cases where this is exactly what you want to do, but bear in mind that once a variable is re-assigned to a new value, the old value is lost forever.

As an example, consider the case where (for some reason) you want to swap the values of two variables s and i. The first step might appear to be a line very much like the i = s statement above, but if you do this, the value of i is lost forever, meaning you can never assign it to s. This may seem like a rather abstract problem, (unless you've read ahead to today's exercises) but you'll encounter similar situations more often than you might think.
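The pitfall described above can be seen in a short sketch. It reuses the values of `s` and `i` from the earlier cell and only shows the naive attempt going wrong; working out a correct swap is left for the exercises. `print(...)` is written with parentheses so the snippet should run under both Python 2 and Python 3.

```python
s = 'hella world'
i = 42

# Naive attempt at swapping: the first assignment overwrites i ...
i = s
# ... so by the time we reach this line, 42 is already gone.
s = i

print(s)  # hella world
print(i)  # hella world
```

Both names now reference the same string, and nothing in the program refers to 42 any more.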

## Numerical operations

Numerical values can be subjected to a wide variety of operations. While the full list is quite extensive (see [this link](http://docs.python.org/lib/typesnumeric.html) for the full workup), the most common operations should be familiar. For the most part, we use basic arithmetic operators exactly as you're used to seeing them.


Note that standard mathematical order of operations applies, but it's far easier ... and safer ... to explicitly order compound operations using parentheses.
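As a small illustration of why explicit parentheses are safer, consider the sketch below. The variable names are made up for this example, and `print(...)` is written with parentheses so it should run under both Python 2 and Python 3.

```python
# Multiplication binds more tightly than addition.
implicit = 2 + 3 * 4
# Parentheses make the intended grouping explicit.
explicit_same = 2 + (3 * 4)
explicit_other = (2 + 3) * 4

print(implicit)        # 14
print(explicit_same)   # 14
print(explicit_other)  # 20

# Exponents bind more tightly than a leading minus sign,
# which surprises many people.
negated_square = -2 ** 2
squared_negative = (-2) ** 2

print(negated_square)    # -4
print(squared_negative)  # 4
```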

In [None]:
i = 42
f = 3.14159
 
# addition uses the plus sign (+)
# (we avoid the names sum and pow, which would hide Python's built-in functions)
total = i + f
# subtraction uses the minus sign (-)
diff = i - f
# multiplication uses the asterisk (*)
prod = i * f
# division uses the slash (/)
quo = i / f
# and exponents use a double-asterisk (**)
power = i ** f
 
print 'sum',total
print 'diff',diff
print 'prod',prod
print 'quo',quo
print 'pow',power
 
x = 5
print "x = ", x
x = x + 1
print "now x is one more than before = ", x
x += 1
print "now x is one more than before = ", x
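One detail about division is worth a brief sketch. In the cell above, `i / f` involves a float, so the result keeps its decimals. When both operands are integers, Python 2 (the version used in this course) drops the fractional part, whereas Python 3 does not. The snippet below sticks to operations that give the same answer in both versions; the variable names are illustrative only.

```python
# Under Python 2, 7 / 2 with two integers gives 3; under Python 3 it gives 3.5.
# The operations below behave identically in both versions.
whole = 7 // 2      # floor division: always drops the fractional part
exact = 7 / 2.0     # one float operand is enough to keep the decimals
remainder = 7 % 2   # modulo: what is left over after floor division

print(whole)      # 3
print(exact)      # 3.5
print(remainder)  # 1
```

If a division result looks suspiciously round, checking whether both operands are integers is a good first step.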

Before we end, here are a few more functions related to the type of the variable you are using. Perhaps you have an integer but you'd rather treat it as a floating-point number (with decimals), or turn it into a string. You can use coercion functions such as:
```python
int()
float()
str()
``` 
to turn an object into an integer, floating-point number, or string, respectively.

In [2]:
x_int = 5
y_flt = 10.2
z_str = '4'

x_str = str(x_int)
x_flt = float(x_int)

y_int = int(y_flt)
y_str = str(y_flt)

z_int = int(z_str)
z_flt = float(z_str)

print x_int, x_str, x_flt

#result = x_int+y_flt+z_int; print result, type(result)  # What happens when you add floats and integers?
#result = x_int+y_int+z_int; print result, type(result)  # What happens when you add all integers?
#result = x_int+y_flt+z_str; print result, type(result)  #What happens when you add a string to numbers?
#result = x_str+y_str+z_str; print result, type(result)  # What happens when you add all strings?


5 5 5.0
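Two behaviours of the coercion functions tend to surprise newcomers, so here is a short sketch of them. The variable names are made up for this example, the `try`/`except` construct is covered in a later lesson, and `print(...)` is written with parentheses so the snippet should run under both Python 2 and Python 3.

```python
# int() drops the fractional part instead of rounding.
truncated_pos = int(10.9)     # 10, not 11
truncated_neg = int(-10.9)    # -10: truncation is toward zero
rounded = int(round(10.9))    # round first if you want the nearest integer

print(truncated_pos)  # 10
print(truncated_neg)  # -10
print(rounded)        # 11

# int() cannot parse a string that contains a decimal point ...
try:
    int('4.5')
except ValueError:
    print('int() cannot parse the string 4.5 directly')

# ... but going through float() first works.
via_float = int(float('4.5'))
print(via_float)      # 4
```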


We will end today's lesson here. Take a look at the exercises for Lesson 1 in the exercises folder.