NIA Intro to Python Class - July 17, 2017

# Day 1 - Introduction to Python

## About the instructor
* Chris Coletta, Computer Scientist
* Human Genetics Section (Schlessinger Lab), LGG
* christopher.coletta@nih.gov
* x8170
* Room 10C222
* [LinkedIn](https://www.linkedin.com/in/chriscoletta/), [personal webpage](http://www.chriscoletta.com)

## Course format
* Bootcamp style - no prior programming knowledge assumed
* 6 total hours of instruction
* No homework
* Goal of this course: Spreadsheet Manipulation
    * Read in an Excel file
    * Do some transformations on the data
    * Visualize the data
* Roadmap
    * Day 1: Background; the IDE; basic syntax; data types; basic operators
    * Day 2: Complex data types; reading in data; slicing and sorting
    * Day 3: Statistics
    * Day 4: Visualization

## Python fast facts!

* General-purpose programming language
* [Open-source software](https://en.wikipedia.org/wiki/Open-source_software)
* Free
* Started in 1989 by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum)
* Emphasizes code readability => Lower barrier to entry than other programming languages
* Case-sensitive
<!--* Counts from zero! (like C/C++, unlike MATLAB) -->

## Help Learning Python

* [Python for Scientists and Engineers](http://pythonforengineers.com/python-for-scientists-and-engineers/) - Free Book by Shantnu Tiwari
* Google the error message
    * Questions and answers on StackOverflow.com
* Use Jupyter's built-in <code>?</code> operator
* Use Python <code>help()</code> command to see documentation
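
As a small sketch of the last bullet, <code>help()</code> prints the documentation for whatever you pass to it. The choice of <code>len</code> and the name <code>doc_text</code> are only for illustration. The <code>?</code> form works only inside Jupyter/IPython, so it appears here as a comment.

```python
# Ask Python for the documentation of the built-in len() function.
help(len)

# The same text is stored on the function itself as its "docstring".
doc_text = len.__doc__
print(doc_text)

# Inside a Jupyter cell you could instead type:  len?
```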

## Ecosystem of Python Data Analysis Software

[Anaconda](https://www.continuum.io/downloads) is one of many Python "distributions." It bundles the following three types of software:

### "Core" Python
* The Python interpreter - understands the syntax of the [Python](https://www.python.org) language
* [Python Standard Library](https://docs.python.org/3/library/)
    * Built-in tools, mathematical functions, algorithms
    * Organized into sub-units called "packages" that you <code>import</code>
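
Here is a minimal sketch of what importing from the Standard Library looks like. The <code>math</code> module is just one example, and the name <code>root</code> is made up for illustration.

```python
# Load the "math" module from the Python Standard Library.
import math

root = math.sqrt(16)   # square root
print(root)            # 4.0
print(math.pi)         # the constant pi
```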
   
### Third-party packages
* There are hundreds of them. My favorites:
    * [NumPy](http://www.numpy.org/) - Linear algebra/matrices
    * [SciPy](https://docs.scipy.org/doc/scipy/reference/) - Statistics + math
    * [statsmodels](http://www.statsmodels.org/stable/index.html) - Linear models/regression
    * [matplotlib](https://matplotlib.org/) - Makes plots/figures
    * [Seaborn](https://seaborn.pydata.org/) - Really nice plots/figures
    * [Pandas](http://pandas.pydata.org/) - Spreadsheet replacement/data manipulation
    * [Scikit-learn](http://scikit-learn.org/) - Machine Learning
    * [Scikit-image](http://scikit-image.org/) - Image processing
    * [Biopython](http://biopython.org/wiki/Biopython) - Bioinformatics
    * [WND-CHARM](https://github.com/wnd-charm/wnd-charm) - NIA in-house image analysis/machine learning
    
### IDEs
* [Jupyter Notebook](http://jupyter.org/) - Creates sharable documents containing live code, equations, visualizations and explanatory text.
* [Spyder](https://pythonhosted.org/spyder/) - "Scientific PYthon Development EnviRonment"

## IDE Concepts
* [Integrated Development Environment](https://en.wikipedia.org/wiki/Integrated_development_environment) - The software app you use to build and test your code
* Example notebooks [here](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) and [here](http://nb.bianp.net/sort/views/)
* Compare and contrast how the user interfaces with Python and Excel
    * Excel: Little cubby holes that you can shove data into
    * Python: Give it a command to enter data
    * Excel: You're the customer in the restaurant: All possible operations listed in the MENU
    * Python: You're the chef in the restaurant: Write your own program by following recipes/[cookbooks](http://chimera.labs.oreilly.com/books/1230000000393)
    * Excel: Doesn't really talk to other files
    * Python: Input/output to other files is fundamental
    * Excel: Sandbox: input and output to the same place
    * Python & Jupyter Notebook: Clear workflow, like a cooking recipe or driving directions. Good for reproducible science. 
* Jupyter components
    * Do coding inside web browser
    * Browser communicates with a "kernel" (on local machine or in the cloud)
    * Can optionally "Download as..." notebook into a .py, HTML, PDF, LaTeX, etc

## Exploring the Jupyter IDE
* Do the user interface tour

### Cell types
#### Markdown cells 
* [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) - Document-formatting style that is easily convertible to HTML
* Headings preceded by #
* Unordered lists preceded by a \*
* Ordered lists preceded by a number
* Math equations go in between two \$, example: $t=\frac{\hat{\beta}-\beta_{{H}_0}}{s.e.(\hat{\beta})}$
* Create links like [this](https://www.nia.nih.gov/)

#### Code Cells
* commands go in here
* tab-completion

### Interacting with cells
#### Command mode
* Press Esc - box turns blue
* Useful shortcuts:
    * b = Insert cell below
    * a = Insert cell above
    * dd = Delete cell
    * Shift + up or down = select/highlight two or more cells
    * Shift + M = merge highlighted cells into one

#### Edit mode
* Double click to edit - box turns green
* Useful shortcuts
    * Ctrl + Shift + - = split cell at cursor location
    * Enter = gives you a new line inside the same cell
    * Shift + Enter = Runs the code in this cell and goes to the next one
    * Ctrl + Enter = Runs the code in this cell and stays on this one

## Linux-style file system commands

Here are some useful Linux file commands that Jupyter notebook understands:

### <code>pwd</code>

<code>pwd</code> stands for "print working directory." It tells you where you are on the filesystem right now.

In [1]:
pwd

'/Users/colettace/courses/July2017_NIA_Python_Course'

### <code>ls</code>

<code>ls</code> will list files in present working directory.

In [2]:
ls

NIAPythonDay1_July2017.ipynb  NIAPythonDay4_July2017.ipynb*
NIAPythonDay1_July2017.pdf    images/
NIAPythonDay2_July2017.ipynb* samplefile.xlsx
NIAPythonDay3_July2017.ipynb*


### <code>mkdir</code>

<code>mkdir foldername</code> makes a new folder named <code>foldername</code>.

In [3]:
mkdir NewFolder

In [4]:
ls

NIAPythonDay1_July2017.ipynb  NIAPythonDay4_July2017.ipynb*
NIAPythonDay1_July2017.pdf    NewFolder/
NIAPythonDay2_July2017.ipynb* images/
NIAPythonDay3_July2017.ipynb* samplefile.xlsx


### <code>cd</code>

<code>cd foldername</code> changes the present working directory to <code>foldername</code>.

In [5]:
cd NewFolder

/Users/colettace/courses/July2017_NIA_Python_Course/NewFolder


### <code>cd ..</code>

<code>cd ..</code> means make the current working directory one folder up.

In [6]:
cd ..

/Users/colettace/courses/July2017_NIA_Python_Course


### <code>cd ~</code>

<code>cd ~</code> (tilde character) means make the current working directory your home folder.

In [7]:
cd ~

/Users/colettace


### <code>rmdir</code>

<code>rmdir foldername</code> means delete the folder named <code>foldername</code> in the current working directory. The folder must be empty.

In [8]:
cd /Users/colettace/courses/July2017_NIA_Python_Course

/Users/colettace/courses/July2017_NIA_Python_Course


In [9]:
rmdir NewFolder

### Some other commands

* <code>cp</code> which copies a file from one place to another
* <code>mv</code> which moves a file from one place to another, or changes the name of a file.
* more...
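
For comparison, here is a rough sketch of the same folder operations done from Python itself with the Standard Library's <code>os</code> module. The temporary scratch folder and the names <code>scratch</code>, <code>start</code>, <code>after_mkdir</code> and <code>after_rmdir</code> are assumptions made for this example, so that none of your own files are touched.

```python
import os
import tempfile

# Work inside a throwaway folder so nothing in your own files is touched.
with tempfile.TemporaryDirectory() as scratch:
    start = os.getcwd()              # like pwd
    os.chdir(scratch)                # like cd
    os.mkdir("NewFolder")            # like mkdir NewFolder
    after_mkdir = os.listdir(".")    # like ls
    os.rmdir("NewFolder")            # like rmdir NewFolder
    after_rmdir = os.listdir(".")
    os.chdir(start)                  # go back to where we started

print(after_mkdir)   # ['NewFolder']
print(after_rmdir)   # []
```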

## The Python Statement

* Your Python code is broken up into statements
* One statement per line, except:
    * You can put two statements on one line if they are separated by a semi-colon ;
    * You can break up one statement over multiple lines using a backslash, which is called the "continuation" character.
* Can have multiple statements inside a code cell
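
The two exceptions above can be sketched like this. The names <code>width</code>, <code>height</code> and <code>area</code> are made up for illustration.

```python
# Two statements on one line, separated by a semicolon.
width = 3; height = 4

# One statement spread over two lines using the backslash continuation character.
area = width * \
       height

print(area)   # 12
```

In everyday code, one statement per line is usually the easiest to read.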

## Comments
Lines preceded by a hash symbol "#" are ignored by the Python interpreter.

In [10]:
# Run me! nothing happens!!!

## Assignment, i.e., give a value a name

* An assignment puts a name on the left side of an equal sign and a value on the right.
* It gives a name to a value.
* Names can have upper and lowercase letters, digits (as long as the first character is not a digit), as well as underscores (Shift + -).
* Don't use a name that is also a [Python Syntax keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords)

In [11]:
my_fav_number = 42

In [12]:
f00 = "asdfasdf"

See the value attached to the name by typing the name

In [13]:
my_fav_number

42
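
If you are unsure whether a name is allowed, Python can check it for you. This is only a sketch: the candidate names and the <code>results</code> dictionary are invented for the example, and it uses a <code>for</code> loop, which is covered at the end of Day 1.

```python
import keyword

results = {}
for candidate in ["my_fav_number", "f00", "2fast", "for"]:
    # isidentifier() checks the spelling rules; iskeyword() checks the reserved words.
    usable = candidate.isidentifier() and not keyword.iskeyword(candidate)
    results[candidate] = usable
    print(candidate, usable)
```

Here <code>2fast</code> fails because it starts with a digit, and <code>for</code> fails because it is a Python keyword.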

## <code>print()</code> function
Use the <code>print</code> function to output one or more values at once.

In [14]:
print( my_fav_number, f00)

42 asdfasdf


## Code-completion using <code>TAB</code> key

Hit the TAB key to use code completion to help you type faster. Most IDEs have this option. Usually a pop-up menu will appear.

In [15]:
my_fav_number

42

## Python Data Types: what are they, and why do we care?

* Different types of data, different data types
* Each type has its own various "superpowers," i.e., functionality.
* Advanced programmers often define their own types with their own functionality
* Here, "simple" means that these are types that are built into core Python, and you can use them right away.
* "Fancy" means simply that you need to use the <code>import</code> command before you use them.

### What's the difference between "scalar" and "iterable"?
* A scalar is a single value, so you can't loop over it.
* An iterable holds items that you can loop over one at a time.
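
A quick sketch of the difference, using a string as the iterable and an integer as the scalar. The <code>try</code>/<code>except</code> lines are not covered in this course; they are here only so the example keeps running after the error. The names <code>letters_seen</code> and <code>message</code> are made up.

```python
# A string is iterable: the loop visits one character at a time.
letters_seen = []
for letter in "abc":
    letters_seen.append(letter)
    print(letter)

# An integer is a scalar: trying to loop over it raises a TypeError.
try:
    for piece in 42:
        print(piece)
except TypeError as err:
    message = str(err)
    print("Cannot loop over an int:", message)
```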

### Scalar Data Types (simple)
* integer <code>int</code>: counting numbers
* float <code>float</code>: decimal numbers
* boolean <code>bool</code>: true/false

### Iterable Data Types (simple)
* string <code>str</code>: words
* list <code>list</code>: collection of things (ordered)
* dictionary <code>dict</code>: map one value to another (looked up by key, not by position)
* set <code>set</code>: unique collection of things (unordered)

### Iterable Data Types (fancy)
* NumPy multi-dimensional <code>array</code>: data, images
* Pandas <code>DataFrame</code>: spreadsheet analog

And many more...

## Scalar Data Types: Integer (<code>int</code>)

* A counting number 1, 2, 3, -89 ..., 0

In [16]:
-23

-23

## Scalar Data Types: Float (<code>float</code>)

* Decimal numbers
* An accurate approximation to many many decimal places, but technically not an EXACT representation
* If you want to know more about why decimal numbers are called "floats", click [here](https://en.wikipedia.org/wiki/Floating-point_arithmetic).

In [17]:
3.14159

3.14159

In [18]:
1/3

0.3333333333333333
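
The "not an EXACT representation" point shows up in a well-known example. The name <code>total</code> is made up, and <code>math.isclose()</code> is one common way (not the only one) to compare floats safely.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```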

## PEMDAS operators

1. Parentheses - <code>()</code>
2. Exponent - <code>**</code>
3. Multiplication - <code>*</code>
4. Division - <code>/</code>
5. Addition - <code>+</code>
6. Subtraction - <code>-</code>

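Multiplication and division share one precedence level and are evaluated left to right; the same goes for addition and subtraction.

Example: What is $9-3\div\frac{1}{3}+1=?$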

In [19]:
9 - 3 / 1/3 + 1

9.0

In [20]:
9 - 3 / (1/3) + 1

1.0
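
A few more small cases, as a sketch of how the ordering plays out. The names <code>same_level</code>, <code>grouped</code> and <code>power_first</code> are invented for this example.

```python
same_level = 8 / 4 * 2      # same precedence, left to right: (8 / 4) * 2
grouped = 8 / (4 * 2)       # parentheses force the multiplication first
power_first = 2 + 3 ** 2    # exponent before addition: 2 + 9

print(same_level, grouped, power_first)   # 4.0 1.0 11
```

When in doubt, add parentheses: they cost nothing and make the intent obvious.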

## Using the <code>type()</code> function

Use this to have Python tell you the data type of any expression or named value.

In [21]:
type( my_fav_number )

int

In [22]:
type( 3.14159 )

float
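
<code>type()</code> also works on whole expressions, which is a handy way to see what an operator gives back. This is a small sketch; the list name <code>kinds</code> is made up. Note that <code>print()</code> shows the type as <code>&lt;class 'int'&gt;</code>, while a bare Jupyter cell shows just <code>int</code>.

```python
kinds = [type(7), type(7 / 2), type(2 + 3.0), type("7")]

print(type(7))         # <class 'int'>
print(type(7 / 2))     # <class 'float'>: division always gives a float
print(type(2 + 3.0))   # <class 'float'>: mixing int and float gives a float
print(type("7"))       # <class 'str'>: quotes make it a string, not a number
```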

## Scalar Data Types: Boolean (<code>bool</code>)

Bools can only have a value of <code>True</code> or <code>False</code>.

In [23]:
True

True

## Boolean operators <code>and</code>, <code>or</code>, and <code>not</code>

<code>and</code> and <code>or</code> are "binary operators", meaning you slap them in between two truth values to make one value.

In [24]:
True and False

False

In [25]:
True or False

True

In [26]:
my_bool_value = True and False
print( my_bool_value )

False


<code>not</code> is a unary operator that negates the value after it.

In [27]:
not True

False
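
The three operators can be combined in one expression. Here is a sketch with made-up names (<code>raining</code>, <code>have_umbrella</code>, <code>stay_dry</code>, <code>get_wet</code>).

```python
raining = True
have_umbrella = False

stay_dry = (not raining) or have_umbrella
get_wet = raining and not have_umbrella

print(stay_dry)   # False
print(get_wet)    # True
```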

## Some math operators

* <code><</code> less than
* <code><=</code> less than or equal to
* <code>></code> greater than
* <code>>=</code> greater than or equal to
* <code>==</code> is equal to
* <code>!=</code> is not equal to

Note that the double equal sign <code>==</code> is a comparison operator, not an assignment!!

In [28]:
5 < 6

True

In [29]:
6 <= 6

True

In [30]:
-6 <= 6

True

In [31]:
6 != 6

False

In [32]:
not( 6 == 6 )

False
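
Comparisons produce bools, so they combine naturally with <code>and</code>, <code>or</code> and <code>not</code>. A sketch, with <code>temperature</code> and <code>comfortable</code> as made-up names:

```python
temperature = 72              # a single = assigns the name

comfortable = temperature >= 65 and temperature <= 80
print(comfortable)            # True
print(temperature == 72)      # True: a double == asks a question
print(temperature == 72.0)    # True: an int and a float can compare equal
```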

## Using <code>whos</code> command to keep track of named values

In [33]:
whos

Variable        Type    Data/Info
---------------------------------
f00             str     asdfasdf
my_bool_value   bool    False
my_fav_number   int     42


## Iterable Data Types: Strings (<code>str</code>)

* A data type that contains zero or more characters
* Strings are surrounded, a.k.a. "delimited" by matching single or double quotes
* You choose whether to use single or double quotes based on what's in the string.

In [34]:
"Hello, world!"

'Hello, world!'

In [35]:
'Hello, world!'

'Hello, world!'

I repeat: ***No difference between single- and double-quoted strings!!!!*** I promise!

In [36]:
"Can't"

"Can't"

In [37]:
'"Really," she said?'

'"Really," she said?'

By the way, I'm ***not*** talking about the backtick `, which shares a key with the tilde ~ character. Backtick is ***different*** than a single quote ', which shares a key with the double quote ".
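
You can have Python confirm the promise above. This is a sketch with made-up names (<code>same</code>, <code>contraction</code>, <code>escaped</code>). The backslash form on the last lines is an alternative to switching quote styles.

```python
same = 'Hello, world!' == "Hello, world!"
print(same)                     # True

# Double quotes on the outside let you use an apostrophe on the inside.
contraction = "Can't"

# A backslash before the apostrophe also works inside single quotes.
escaped = 'Can\'t'
print(contraction == escaped)   # True
```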

## Iterable Data Types: Lists (<code>list</code>)

* Container for a collection of values
* Can all be the same type or different, doesn't matter.
* Items delimited by commas, all surrounded by brackets [], not parentheses ()
* The order of the values in the list is remembered

![https://www.javatpoint.com/python/images/elementsinalists.png](https://www.javatpoint.com/python/images/elementsinalists.png)

In [38]:
a_list = [ 1, 2, 3, 1, "a dog" ]

### Get the ith element from a list using bracket notation

In [39]:
a_list[0]

1

In [40]:
a_list[4]

'a dog'

### Negative index counts from the back of the list

In [41]:
a_list[-1]

'a dog'

### Use the "unpacking" syntax to get values out of small lists

In [42]:
a_few_things = [ "hello", "goodbye", 42 ]

In [43]:
first, second, third = a_few_things

In [44]:
first

'hello'
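
To tie indexing and unpacking together, here is a sketch using a made-up list called <code>pets</code>. The <code>try</code>/<code>except</code> lines are not part of this course; they only keep the example running after the error. The name <code>unpack_error</code> is also invented.

```python
pets = ["cat", "dog", "fish"]

print(pets[0])                # cat  : counting starts at zero
print(pets[-1])               # fish : -1 is the last item
print(pets[-1] == pets[2])    # True

# Unpacking needs exactly one name per item in the list.
try:
    one, two = pets
except ValueError as err:
    unpack_error = str(err)
    print("Unpacking failed:", unpack_error)
```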

## Iterable Data Types: Dictionaries (<code>dict</code>)

* A <code>dict</code> is a one-way associative array, where "keys" are mapped to "values."
* Note: Since Python 3.7, a <code>dict</code> remembers the order in which you inputted the key-value pairs; older versions of Python do not guarantee this
    * for those you need <code>collections.OrderedDict</code>

![https://developers.google.com/edu/python/images/dict.png](https://developers.google.com/edu/python/images/dict.png)

### Create a <code>dict</code> with stuff in it

The keys are separated from the values by a colon (:), and the key-value pairs are separated by commas.

In [45]:
simple = { 1 : 'a', 2 : 'b', 3: 'c'}

In [46]:
simple

{1: 'a', 2: 'b', 3: 'c'}

### Access an element in a <code>dict</code> using its key and bracket notation <code>[]</code>

In [47]:
simple[1]

'a'

### Create an empty <code>dict</code>

Declare empty dict with {}, or dict().

In [48]:
{}

{}

### Add a new key-value pair to an existing <code>dict</code> using bracket notation <code>[]</code>

In [49]:
simple['new_key'] = 'new_value'

In [50]:
simple

{1: 'a', 2: 'b', 3: 'c', 'new_key': 'new_value'}
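
A few more things bracket notation and keys can do, sketched with a made-up dictionary called <code>ages</code>. The <code>in</code> operator and the <code>.get()</code> method are standard parts of <code>dict</code> but are not otherwise covered in these notes.

```python
ages = {"Ann": 34, "Bob": 27}

ages["Bob"] = 28          # assigning to an existing key replaces its value
ages["Cy"] = 41           # assigning to a new key adds a pair

print("Ann" in ages)      # True: the key exists
print("Dee" in ages)      # False
print(ages.get("Dee"))    # None, instead of the KeyError that ages["Dee"] raises
print(ages)
```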

## Iterable Data Types: Sets (<code>set</code>)

* Similar to math concept of sets; has operations like union, intersection, etc.
* Sets are unindexed, unordered, and contain no duplicates.
* My personal favorite of the Python standard types!

### Create a <code>set</code> with stuff in it

Declare a set by putting values inside braces.

In [51]:
a_set = {'set', 'of', 'words'}

### Create an empty <code>set</code>

Make an empty set using <code>set()</code>.

In [52]:
empty_set = set() # not {}, that would be an empty dict

In [53]:
empty_set

set()

In [54]:
empty_set.add( 'hi')

In [55]:
empty_set

{'hi'}

In [56]:
first = {1,2,3,4,5}
second = {4,5,6,7,8}

### Set Union Operator (<code>|</code>) - "or"

![https://cdn.programiz.com/sites/tutorial2program/files/set-union.jpg](https://cdn.programiz.com/sites/tutorial2program/files/set-union.jpg)

In [57]:
first | second

{1, 2, 3, 4, 5, 6, 7, 8}

### Set Intersection Operator (<code>&</code>) - "and"

![https://cdn.programiz.com/sites/tutorial2program/files/set-intersection.jpg](https://cdn.programiz.com/sites/tutorial2program/files/set-intersection.jpg)

In [58]:
first & second 

{4, 5}

### Set Difference Operator (<code>-</code>)

![https://cdn.programiz.com/sites/tutorial2program/files/set-difference.jpg](https://cdn.programiz.com/sites/tutorial2program/files/set-difference.jpg)

In [59]:
first - second

{1, 2, 3}

### Set Symmetric Difference Operator (<code>^</code>)

![http://www.itmaybeahack.com/book/python-2.6/html/_images/p2c6-symmdiff.png](http://www.itmaybeahack.com/book/python-2.6/html/_images/p2c6-symmdiff.png)

In [60]:
first ^ second

{1, 2, 3, 6, 7, 8}
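
Each set operator also has a spelled-out method that does the same job. Here is a sketch with two made-up sets, <code>odds</code> and <code>primes</code>, plus a quick demonstration of the "no duplicates" rule using the invented name <code>unique_letters</code>.

```python
odds = {1, 3, 5, 7}
primes = {2, 3, 5, 7}

print(odds.union(primes) == (odds | primes))                  # True
print(odds.intersection(primes) == (odds & primes))           # True
print(odds.difference(primes) == (odds - primes))             # True
print(odds.symmetric_difference(primes) == (odds ^ primes))   # True

# Duplicates disappear when a list is turned into a set.
unique_letters = set(["a", "b", "a", "c", "b"])
print(len(unique_letters))   # 3
```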

## How many elements in an iterable? Use <code>len()</code>

In [61]:
len( first ^ second )

6

In [62]:
len(a_list)

5

## Can you change a value's type? Yes!

Use these functions to "[coerce](https://en.wikipedia.org/wiki/Type_conversion)" a value from one type to another:

* <code>int()</code>
* <code>float()</code>
* <code>bool()</code>
* <code>str()</code>
* <code>list()</code>
* <code>dict()</code>
* <code>set()</code>
* et al.

In [63]:
a_string = '45'

In [64]:
56 + a_string

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [65]:
56 + int(a_string)

101

In [66]:
str( 56 ) + a_string

'5645'

In [67]:
list( "listify me!")

['l', 'i', 's', 't', 'i', 'f', 'y', ' ', 'm', 'e', '!']

In [68]:
set( "listify me!" )

{' ', '!', 'e', 'f', 'i', 'l', 'm', 's', 't', 'y'}

In [69]:
float( 3 )

3.0

In [70]:
int( 3.14159 )

3

In [71]:
bool( "a_string")

True

In [72]:
bool( "" )

False
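
A few edge cases worth knowing, as a sketch. The <code>try</code>/<code>except</code> lines are not covered in this course and are here only to catch the error; <code>truncated</code> and <code>conversion_error</code> are made-up names.

```python
truncated = int(-3.9)
print(truncated)     # -3: int() drops the decimal part, it does not round
print(bool(0))       # False
print(bool([]))      # False: an empty list
print(bool([0]))     # True: a list with one item in it

# Not every string can become a number.
try:
    int("forty-five")
except ValueError as err:
    conversion_error = str(err)
    print("Conversion failed:", conversion_error)
```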

## Iterating over items in a <code>list</code> using a <code>for</code> loop

* Statements you want to be repeated inside the loop should be *indented* below the first line.
* Use the <code>TAB</code> key to indent.

In [73]:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June']

In [74]:
for m in months:
    print( m )

Jan
Feb
Mar
Apr
May
June
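
Indentation decides what is inside the loop and what comes after it. A sketch with made-up rainfall numbers; the names <code>rainfall</code>, <code>amount</code> and <code>total</code> are invented for the example.

```python
rainfall = [2.5, 0.0, 1.25, 3.0]   # made-up numbers

total = 0
for amount in rainfall:
    total = total + amount           # indented: runs once per item
    print("Running total:", total)
print("Done. Total =", total)        # not indented: runs once, after the loop
```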


## Iterating over items in a <code>dict</code> using a <code>for</code> loop using <code>.items()</code> syntax

In [75]:
num_days_in_month = { 'Jan' : 31, 'Feb' : 28, 'Mar' : 31, 'Apr' : 30 }

In [76]:
for m, d in num_days_in_month.items():
    print( "There are", d, "days in", m )

There are 31 days in Jan
There are 28 days in Feb
There are 31 days in Mar
There are 30 days in Apr
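
Besides <code>.items()</code>, you can loop over just the keys or just the values. A sketch, with a made-up dictionary <code>days</code> and the invented name <code>total_days</code>:

```python
days = {"Jan": 31, "Feb": 28, "Mar": 31}

for month in days:               # a plain loop gives you the keys
    print(month, days[month])

total_days = 0
for count in days.values():      # .values() gives you only the values
    total_days = total_days + count
print(total_days)                # 90
```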


## Day 1 review

1. Python ecosystem of tools
2. Jupyter Notebook is code, output and documentation all in one document
3. Type code into cells, and to run them you press Shift-Enter
4. Tab completion is nice
5. Different data types for different data
6. Operators take one or more input values and turn them into other values *based on the input values' types*
7. Converting data from one type to another using the function syntax, e.g., <code>int()</code>
8. Iterating over iterables using a for loop