NIA Intro to Python Class - May 15, 2017

# Day 1

## About the instructor
* Chris Coletta, Computer Scientist
* Human Genetics Section (Schlessinger Lab), LGG
* christopher.coletta@nih.gov
* x8170
* Room 10C222
* [LinkedIn](https://www.linkedin.com/in/chriscoletta/), [personal webpage](http://www.chriscoletta.com), [twitter](https://twitter.com/blahblahetcetc)

## Course format
* Bootcamp style - no prior programming knowledge assumed
* 6 total hours of instruction
* No homework
* Goal of this course: Spreadsheet Manipulation
    * Read in an Excel file
    * Do some transformations on the data
    * Visualize the data
* Roadmap
    * Day 1: Background; the IDE; basic syntax & data types
    * Day 2: Iterable data types; [Operators](https://en.wikipedia.org/wiki/Operator_%28computer_programming%29)
    * Day 3: Controlling the [flow](https://en.wikipedia.org/wiki/Control_flow) of your program
    * Day 4: Data manipulation/visualization

## Python fast facts

* General-purpose programming language
* [Open-source software](https://en.wikipedia.org/wiki/Open-source_software)
* Free
* Started in 1989 by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum)
* Emphasizes code readability => Lower barrier to entry than other programming languages

## Help Learning Python

* [Python for Scientists and Engineers](http://pythonforengineers.com/python-for-scientists-and-engineers/) - Free Book by Shantnu Tiwari
* Google search!
* Use Python's built-in <code>help()</code> function
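
As a small illustration of the last bullet, here is a minimal sketch of asking Python for documentation. The choice of <code>len</code> is only an example; any function or value can go inside the parentheses.

```python
# Ask Python to describe the built-in len() function.
help(len)

# The same description is stored on the function itself as its "docstring".
doc_text = len.__doc__
print(doc_text)
```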

## Ecosystem of Python Data Analysis Software

[Anaconda](https://www.continuum.io/downloads) is one of many Python "distributions" that bundles the following three types of software:

### "Core" Python
* The Python interpreter - understands the syntax of the [Python](https://www.python.org/) language
* [Python Standard Library](https://docs.python.org/3/library/)
    * Built-in tools, mathematical functions, algorithms
    * Organized into sub-units called "packages" that you <code>import</code>
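
A minimal sketch of what importing from the Standard Library looks like, using the <code>math</code> package as an arbitrary example:

```python
# Load the math package from the Standard Library.
import math

root = math.sqrt(16)   # square root of 16
print(root)            # 4.0
print(math.pi)         # the constant pi
```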
   
### Third-party packages
* Hundreds of them. My favorites:
    * [NumPy](http://www.numpy.org/) - Linear algebra/matrices
    * [SciPy](https://docs.scipy.org/doc/scipy/reference/) - Statistics + math
    * [statsmodels](http://www.statsmodels.org/stable/index.html) - Linear models/regression
    * [matplotlib](https://matplotlib.org/) - Makes plots/figures
    * [Seaborn](https://seaborn.pydata.org/) - Really nice plots/figures
    * [Pandas](http://pandas.pydata.org/) - Spreadsheet replacement/data manipulation
    * [Scikit-learn](http://scikit-learn.org/) - Machine Learning
    * [Scikit-image](http://scikit-image.org/) - Image processing
    * [Biopython](http://biopython.org/wiki/Biopython) - Bioinformatics
    * [WND-CHARM](https://github.com/wnd-charm/wnd-charm) - NIA in-house image analysis/machine learning
    
### IDE
* [Jupyter Notebook](http://jupyter.org/) - Creates sharable documents containing live code, equations, visualizations and explanatory text.
* [Spyder](https://pythonhosted.org/spyder/) - "Scientific PYthon Development EnviRonment"

## IDE Concepts
* [Integrated Development Environment](https://en.wikipedia.org/wiki/Integrated_development_environment) - The software app you use to build and test your code
* Compare and contrast how the user interfaces with Python and Excel
    * Excel: Little cubby holes that you can shove data into
    * Python: Give it a command to enter data
    * Excel: You're the customer in the restaurant: All possible operations listed in the MENU
    * Python: You're the chef in the restaurant: Write your own program by following recipes/[cookbooks](http://chimera.labs.oreilly.com/books/1230000000393)
    * Excel: Doesn't really talk to other files
    * Python: Input/output to other files is fundamental
    * Excel: Sandbox: input and output to the same place
    * Python & Jupyter Notebook: Clear workflow, like a cooking recipe or driving directions. Good for reproducible science.   
* Example notebooks [here](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) and [here](http://nb.bianp.net/sort/views/)
* Jupyter components
    * Do coding inside web browser
    * Browser communicates with a "kernel" (on local machine or in the cloud)
    * <code>nbconvert</code> saves a notebook as a .py file, HTML, PDF, LaTeX, etc.

## Exploring the Jupyter IDE
* Do the user interface tour

### Cell types
#### Markdown cells 
* [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) - Document-formatting style that is easily convertible to HTML
* Headings preceded by #
* unordered lists preceded by a \*
* ordered lists preceded by a number
* Math equations go in between two \$, example: $t=\frac{\hat{\beta}-\beta_{{H}_0}}{s.e.(\hat{\beta})}$
* Create links like [this](https://www.nia.nih.gov/)
* insert images from the web like this: ![nia logo](https://calerie.duke.edu/sites/calerie.duke.edu/themes/calerie/images/logo-nia@2x.png)
    
#### Code Cells
* commands go in here
* tab-completion

### Interacting with cells
#### Command mode
* Press Esc - box turns blue
* Useful shortcuts:
    * b = Insert cell below
    * a = Insert cell above
    * dd = Delete cell
    * Shift + up or down = select/highlight two or more cells
    * Shift + M = merge highlighted cells into one

#### Edit mode
* Double click to edit - box turns green
* Useful shortcuts
    * Ctrl + Shift + - = split cell at cursor location
    * Enter = gives you a new line inside the same cell
    * Shift + Enter = Run the code in this cell and go to the next one
    * Ctrl + Enter = Run the code in this cell and stay on this one

## Linux-style file system commands

Here are some useful Linux file commands that Jupyter notebook understands:

<code>pwd</code> - "Print working directory"

In [None]:
pwd

<code>ls</code> - "list files in present working directory"

In [None]:
ls

<code>mkdir</code> - "make a new folder"

In [None]:
mkdir NewFolder

In [None]:
ls

<code>cd</code> - "Change directory: switch the present working directory to another folder"

In [None]:
cd NewFolder

<code>cd ..</code> - "Go up one folder"

In [None]:
cd ..

<code>cd ~</code> - "Go home! (tilde character)"

In [None]:
cd ~

<code>rmdir FolderName</code> - "Delete this directory (it must be empty)"

In [None]:
rmdir NewFolder

Other commands include:
* <code>cp</code> which copies a file from one place to another
* <code>mv</code> which moves a file from one place to another, or changes the name of a file.
* more...
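
The same operations can also be done from ordinary Python code. The sketch below uses the Standard Library's <code>os</code> module; the folder name <code>NewFolder</code> mirrors the cells above, and the temporary scratch folder is an assumption added here so that the example does not touch your real files.

```python
import os
import tempfile

# Work inside a throwaway scratch folder (an assumption for this sketch).
start_dir = os.getcwd()          # like pwd
scratch = tempfile.mkdtemp()
os.chdir(scratch)                # like cd

os.mkdir("NewFolder")            # like mkdir NewFolder
listing = os.listdir(".")        # like ls
print(listing)

os.chdir("NewFolder")            # like cd NewFolder
os.chdir("..")                   # like cd ..

os.rmdir("NewFolder")            # like rmdir NewFolder
after_delete = os.listdir(".")
print(after_delete)

# Clean up: go back to where we started and remove the scratch folder.
os.chdir(start_dir)
os.rmdir(scratch)
```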

## First steps with Python syntax

FYI: Python is a case sensitive language, so <code>True</code> is not the same as <code>true</code>.
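
A minimal sketch of what case sensitivity means in practice. The <code>try</code>/<code>except</code> wrapper is only there so the cell finishes instead of stopping at the error; it is not something you need to understand yet.

```python
# True (capital T) is a built-in value.
print(True)

# true (lowercase t) is just an undefined name, so Python raises a NameError.
try:
    true
    lowercase_worked = True
except NameError as err:
    lowercase_worked = False
    print("NameError:", err)
```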

### Statement

Your Python code is broken up into statements. Write one statement per line, or put several on one line separated by semicolons.
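
A short sketch of both layouts. The names <code>a</code> through <code>d</code> are placeholders chosen for this example.

```python
# One statement per line:
a = 1
b = 2

# Several statements on one line, separated by semicolons:
c = 3; d = 4

print(a, b, c, d)
```

One statement per line is usually easier to read, so the semicolon form is best kept for very short statements.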

### Comments
Lines preceded by a hash symbol "#" are ignored by the Python interpreter

In [None]:
# Run me! nothing happens!!!

### Assignment

* An assignment statement has a name on the left side of an equals sign and a value on the right.
* It gives a name to a value.
* Names can have upper and lowercase letters, numbers (as long as it's not the first character), as well as underscores (Shift + -).
* Don't use a name that is also a [Python Syntax keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords)
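
If you are unsure whether a name is reserved, the Standard Library's <code>keyword</code> module can tell you. This is only a sketch of one way to check; the names tested below are arbitrary examples.

```python
import keyword

# Every reserved word in this version of Python.
print(keyword.kwlist)

for_is_reserved = keyword.iskeyword("for")             # a reserved word
my_name_is_reserved = keyword.iskeyword("my_fav_number")  # an ordinary name

print(for_is_reserved)        # True
print(my_name_is_reserved)    # False
```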

In [None]:
my_fav_number = 42

In [None]:
f00 = "asdfasdf"

See the value attached to the name by typing the name

In [None]:
my_fav_number

### Print function
Use the <code>print</code> function to see multiple values at once

In [None]:
print( my_fav_number, f00)

### Code-completion

Hit the TAB key to use code completion to help you type faster. 

In [None]:
my_

### Scalar Data types

#### Integer
<code>int</code> - a whole number with no decimal point: ..., -2, -1, 0, 1, 2, 3, ...

In [None]:
1

BTW: You can use the Python <code>type()</code> function to have Python tell you the type of any named value.

In [None]:
type( my_fav_number )

#### Float

<code>float</code>s are numbers with a decimal point, such as 3.14
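
A minimal sketch showing where floats come from, assuming Python 3 (as in the rest of these notes). The names are placeholders for this example.

```python
my_float = 3.14
print(type(my_float))   # <class 'float'>

# Dividing with / always gives a float in Python 3, even for two integers.
half = 7 / 2
print(half)             # 3.5

# Mixing an int and a float gives a float.
mixed = 2 + 0.5
print(mixed)            # 2.5
```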

#### PEMDAS operators

1. Parentheses - <code>()</code>
2. Exponent - <code>**</code>
3. Multiplication and Division - <code>*</code> and <code>/</code> (equal priority, evaluated left to right)
4. Addition and Subtraction - <code>+</code> and <code>-</code> (equal priority, evaluated left to right)
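
The order of operations can be checked with a few one-line calculations. The numbers are arbitrary; the comments give the result of each expression.

```python
r1 = 2 + 3 * 4      # 14: multiplication happens before addition
r2 = (2 + 3) * 4    # 20: parentheses go first
r3 = 2 * 3 ** 2     # 18: exponent happens before multiplication
r4 = 8 / 4 * 2      # 4.0: division and multiplication run left to right
r5 = 10 - 4 + 1     # 7: subtraction and addition run left to right

print(r1, r2, r3, r4, r5)
```

When in doubt, add parentheses: they cost nothing and make the intended order obvious.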

### Boolean Expressions

A <code>bool</code> can only have a value of <code>True</code> or <code>False</code>.

In [None]:
True

### <code>and</code>, <code>or</code>, and <code>not</code>

<code>and</code> and <code>or</code> are "binary operators", meaning you slap them in between two truth values to make one value. <code>not</code> is a unary operator that negates the value after it.

In [None]:
True and False

In [None]:
True or False

In [None]:
my_bool_value = True and False
print( my_bool_value )

In [None]:
not True
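
For reference, here is a sketch that runs through the remaining combinations. The last line also shows that when the three operators are mixed, <code>not</code> is applied first, then <code>and</code>, then <code>or</code>.

```python
print(True and True)     # True
print(False and False)   # False
print(False or False)    # False
print(False or True)     # True
print(not False)         # True

# Evaluated as ((not False) and False) or True
result = not False and False or True
print(result)            # True
```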

## Keeping track of your named values

In [None]:
whos

In [None]:
?whos

In [None]:
_

In [None]:
56

In [None]:
_

### Strings

* A string (<code>str</code>) is a data type that contains zero or more characters
* Strings are surrounded by matching single or double quotes
* You choose whether to use single or double quotes based on what's in the string.

In [None]:
"Hello, world!"

In [None]:
'Hello, world!'
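
A minimal sketch of choosing the quote style based on what is in the string. The sentences are made-up examples.

```python
# Contains an apostrophe, so wrap it in double quotes.
greeting = "What's up?"

# Contains double quotes, so wrap it in single quotes.
reply = 'She said "hello" to me.'

print(greeting)
print(reply)
```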

### Escape characters 
* Escape characters use a backslash followed by a single letter
    * '\n' - a newline character
    * '\t' - a tab character
    * '\\' - a backslash character (escape the escape character)
    * "\"" - a double quote (but why wouldn't you just use '"'?)

In [None]:
some_escape_chars = 'line one\nline two'
print( some_escape_chars )
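
The other escape characters in the list work the same way. In this sketch the text values, including the Windows-style folder path, are made up for illustration.

```python
tabbed = "name\tvalue"          # \t becomes one tab character
path = "C:\\data\\results"      # each \\ becomes one backslash
quoted = "say \"hi\""           # \" becomes a double quote

print(tabbed)
print(path)
print(quoted)

# An escape sequence counts as a single character.
newline_length = len("\n")
print(newline_length)           # 1
```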

In [None]:
# triple double quotes capture the newlines and both kinds of quotes,
# but backslashes are still treated as escape characters
no_need_for_escape_chars = """ hello
what's up? nuttin'
"whatchu say to me?"
word.
here are some backslashes: \ \\ \\\ \\\\"""
print (no_need_for_escape_chars)

### String Operations

#### Repeat

In [None]:
n = "hello"
echo = n * 5 # Repeat the string 5 times
echo

#### Concatenate

In [None]:
# Concatenate with +
boast = "I am the very model of a "
occupation = "modern major general."
a_string = boast + occupation
a_string

In [None]:
# To concatenate strings, everything must already be a string
# See the problem here?
profound_statement = "The answer to life, the universe, and everything is " + 42

In [None]:
# Try converting the value to a string
profound_statement = "The answer to life, the universe, and everything is " + str(42)
profound_statement

#### Slicing strings into substrings

In [None]:
# Return a substring using brackets with the begin and end index separated by a colon
profound_statement[3:22]

In [None]:
# Just because you just returned a substring from a string
# doesn't mean you changed the original string. 
profound_statement

In [None]:
# [begin index:end index:step]
profound_statement[3:32:3] # take every 3rd letter

In [None]:
# Leave the index blank to default to the beginning or end of the string
profound_statement[:25]

In [None]:
profound_statement[25:]

In [None]:
# If an index is negative, it counts back from the end of the string
profound_statement[-25:]
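
The indexing rules are easier to verify on a short string. This sketch uses the made-up value <code>"Python"</code>, and the comments give the result of each line.

```python
word = "Python"     # positions: P=0  y=1  t=2  h=3  o=4  n=5

print(word[0])      # P     (counting starts at 0)
print(word[-1])     # n     (negative index counts from the end)
print(word[1:4])    # yth   (begin index included, end index excluded)
print(word[:2])     # Py
print(word[2:])     # thon
print(word[-3:])    # hon
print(word[::2])    # Pto   (every 2nd letter)
```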

In [None]:
# Reverse a string by using a negative step value
"a man, a plan, a canal, panama"[::-1]