Introduction to Python Through Jupyter!
---

Who Am I?
---

- Barry Moore II, PhD (bmooreii@pitt.edu)
    - PhD from University at Buffalo in Theoretical Chemistry
    - Twitter (random programming and personal stuff): @chiroptical
    - GitHub (lots of Python): @chiroptical
    - Website (random blog posts): https://chiroptical.dev
    - Streaming Haskell: https://twitch.tv/chiroptical
- Center for Research Computing
    - Technical Director
    - Advanced Research Computing Team at Pitt
    - 10,200 compute cores
    - 90 GPUs

### Set Expectations

- Started 8 years ago to solve problems in Theoretical Chemistry
    - The point: I was a scientist similar to you
    - It takes a long time to be okay at programming, even longer to be good
    - Learned from doing it very wrong in a low level programming language
- Python is a great language to start with
    - It isn't terribly hard to be productive
    - Doesn't mean it is easy!
- Languages are tools!
    - You learned how to use spreadsheets, you can do this!
    - There are a lot of languages to choose from
        - I always suggest using the one which people around you use!
- You won't learn __any__ language in 3 hours!
    - This workshop is meant to get you started.

Initial Steps
---

- In your browser navigate to https://hub.crc.pitt.edu
- Click "Start My Server"
- Under "Select a job profile:" choose "Host Process" and click "Spawn"
- Get this notebook:
    - First open up a Terminal via `File` $\rightarrow$ `New Launcher` $\rightarrow$ `Terminal`
    - `git clone https://github.com/chiroptical/nolecture-notebooks.git`

What did I just connect to?
---

- __JupyterHub__:
    - Multi-user server for Jupyter Notebooks
- What is a Jupyter Notebook? (from https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)
> The notebook extends the console-based approach to interactive computing in a qualitatively new direction, providing a web-based application suitable for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results. The Jupyter notebook combines two components:
>
> __A web application__: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output.
>
> __Notebook documents__: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.

    - Note: Jupyter Notebooks are not just for Python, but we are using them to learn Python
        - We have kernels for R and Julia if you want to use them
        - Other kernels exist, please ask me if you want to install them

What is Python?
---

- Python is an interpreted programming language. This means the code you write is read by an _interpreter_, a program which translates your text into lower-level instructions and runs them for you
    - This is opposed to compiled languages, where a compiler translates the whole program ahead of time into a form which can be run by the computer
- There are two flavors of Python
    - 2.7 (deprecated 2020)
    - __3.7__
        - We are using this now!

Why Jupyter Notebooks and Python?
---

- Allows you to integrate code and documentation seamlessly
- Allows you to share your Notebooks online (e.g. on GitHub)
- Allows you to explore and massage data on the fly
    
#### Let's Explore the User Interface

- Drop down menus
- Tool bar
- Keystrokes

Helper Syntax
---

In the Notebook, you will encounter some syntax you __must__ understand:

1. `<Shift-Enter>`: This is a keystroke.
    - Hold `Shift`, press `Enter`.
    - Keystrokes will always start with a capital letter!
2. `<var1> <operator> <var2>`: This is a pattern, I want you to fill it in!
    - patterns will always start with a lowercase letter
    - Given the above pattern, using the multiplication operator (`*`) one could multiply the variables `a` and `b` together in the following cell:

In [None]:
a = 1
b = 2
# pattern was `<var1> <operator> <var2>`
a * b

Note the omission of `<` and `>`! It is imperative you understand this.

Notebook Cells
---

Basically 2 types:

1. Markdown:
    - A simple language which converts to HTML
    - Jupyter Docs: http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html
    - Use these to document your work
    - Write equations using $\LaTeX$: $ \frac{1}{N}\sum_i n_i $, or on its own line:
$$ \frac{1}{N}\sum_i n_i $$
    - Write code: `x = 1`, or highlighted on its own line:
```python
x = 1
```
2. Code:
    - Where you write your code.

### Modes

- The blue bar on the left represents the current cell focus
- Two modes:
    - Edit mode: blinking cursor, for typing documentation and code
        - Press `<Esc>` to exit edit mode
    - Command mode: no cursor, for commands
        - Press `<Enter>` to enter edit mode on cell

### How do you work with them:

- In edit mode, run the following keystrokes on the next cell:
    - `<Shift-Enter>`
    - `<Ctrl-Enter>`

In [None]:
x = 1
print(x)

- In command mode:
    - Move up, or down, cells using `<Up>` (`<Down>`)
    - Enter edit mode on focused cell: `<Enter>`
- In edit mode:
    - Enter command mode with `<Esc>`
- Creating cells:
    - In command mode:
        - Above current cell, press `a`
        - Below current cell, press `b`
- Converting between types:
    - In command mode:
        - convert to Markdown cell, press `m`
        - convert to code cell, press `y`

Active Learning
---

- Using a variant of Process Oriented Guided Inquiry Learning (POGIL) during this workshop
- Every section will introduce a concept followed by some tasks to reinforce the section
- Keep this quote in mind:
> "Mistakes are the portals to discovery" - James Joyce

    - __Python generates useful error messages, read them!__
- Learn at your own pace
- Struggle a little, but don't forget we are here to help!

Let's look at a simple example:

Addition in Python
---

The addition operator is `+` and is used to add two values

- Syntax: `<value> <operator> <value>`
- Example: `1 + 2`

### Tasks

1. Using the addition operator, add
    - 2 and 3
    - 4 and 5
2. The multiplication operator is `*`, repeat task 1 using multiplication
3. If I wrote `"hello" + "world"` what do you think would happen? Try it.

Python Syntax
---

### Comments

- Lines beginning with `#` are completely ignored
- Everything after `#` on a line is ignored
- Comments help increase the readability of code

In [None]:
# print(1)
print(1) # this is ignored

### Jupyter Tips

- Comments can be separated into Markdown cells, but don't need to be
- In edit mode on code cells:
    - Try highlighting some code with your mouse and use `<Ctrl-/>` a few times
        - This behavior is called "toggling comments"

### Whitespace

In Python, unlike many programming languages, whitespace is important! Run the following cell:

In [None]:
print("hello")
 print("world")

Oh, we are hit with an `IndentationError`. First, we should discuss the anatomy of an error message in Python

### Anatomy of an Error

```
1.  File "<ipython-input-1-2f431114b51f>", line 2
2.    print("world")
3.    ^
4. IndentationError: unexpected indent
```

- Line 1:
    - Summarizes where the error occurred in the cell
    - The part in quotes is specific to Jupyter
    - `line 2` suggests the error occurred on line 2 (relative to the top of the cell)
- Line 2 & 3:
    - Summarizes where the error occurred on the line
    - The `^` is pointing the user to where the problem was detected, here the indented `print` call
- Line 4:
    - Summarizes what error occurred

### Blocks

Python is arranged into commonly indented _blocks_. The code above is properly written in one block as:

In [None]:
print("hello")
print("world")

Case Sensitivity
---

Python is case sensitive, i.e. you can __NOT__ do the following

In [None]:
Z = 1
print(z)

- You should get a `NameError` because only `Z` is defined, not `z`!

Variables
---

- Variables represent "objects" via unique "names"
- Names represent a single object only
    - The programmer is allowed to change the object a name represents
- Variable and name are often used interchangeably
- Syntax: `<name> = <object>`
    - The equal sign above is called the "assignment operator"
- Example:

In [None]:
x = 1
x

- Above, `x` is a name and `1` is represented by an `int` object
- `int` is the "type" of `x`

Types
---

- Every object has a type. Types:
    - Allow Python to understand what can be done with, or to, an object
    - Are dynamically assigned for you, e.g. `1` was automatically determined to be an `int`
    - Can be determined with the following syntax: `type(<object>)`

In [None]:
x = 1
type(x), type(1)

Aside: Printing
---

There are many ways to print things in Python. I will show you 2:

- Syntax: `print(<object-0>, <object-1>, ...)`

In [None]:
x = 1
print(x)
print(x, x)
print(x, x, x)

- Jupyter is special: without `print`, it will only display the value of the last line of the cell
    - Syntax: `<object-0>, <object-1>, ...`
    - Example below will only print one thing!
    - Output will look a little different than using `print` directly

In [None]:
x = 1
x
x, x
x, x, x

#### Tasks:

1. Define a variable named `x` equal to `1`, print `x` and the type of `x`
    - Is `x` a name or an object?
    - Is `1` a name or an object?
2. Define a variable named `y` equal to `"a"` (including quotes!), print `y` and the type of `y`
3. Define a variable named `z` equal to `1.0`, print `z` and the type of `z`
4. Define a variable named `b` equal to `True`, print `b` and the type of `b`
5. If you used the same `print` style for 1 - 4, try the other style with 4.

Aside: Data Persistence
---

- Because you ran the cells above, the names can be accessed from anywhere in the notebook. Try printing the variable `b` from above. Is it the same?

- Understanding data persistence is key to using Jupyter Notebooks
    - It is the main weakness of Notebooks
    - If you know it is there you can strive to make notebooks shareable

Operators
---

- Operators are syntax which represent an arithmetic or relational operation
- A "unary" operator takes one input and the operator is in the prefix position
    - Syntax: `<operator><object>`
    - Examples: `-1` or `+1`
- A "binary" operator takes 2 inputs and the operator is in the infix position
    - Syntax: `<object-0> <operator> <object-1>`
    - Examples: `1 + 1` or `1 - 1`
- The binary arithmetic operators
    - Addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`)
    - Floor division (`//`): quotient without fractional parts
    - Modulus (`%`): remainder left over after floor division
    - Exponentiation (`**`)
- Precedence
    - The order of operations you expect from mathematics is obeyed
        - Use parentheses to override the precedence
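
As a quick illustration of the operators and precedence rules above, here is a small sketch. The numbers are arbitrary and the variable names are my own, chosen only for this example.

```python
# Floor division drops the fractional part, modulus gives what is left over
quotient = 7 // 2             # 3
remainder = 7 % 2             # 1
power = 2 ** 3                # 8

# Multiplication binds tighter than addition; parentheses override that
without_parens = 1 + 2 * 3    # 7
with_parens = (1 + 2) * 3     # 9

# Exponentiation binds tighter than the unary minus
negated_square = -2 ** 2      # -4, i.e. -(2 ** 2)

print(quotient, remainder, power, without_parens, with_parens, negated_square)
```

If you are ever unsure about precedence, adding parentheses costs nothing and makes your intention obvious.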

#### Tasks:

1. Run the definition cell below
2. Use `+`, `-` unary operators on `a`
3. Use `+`, `-`, `*`, `/` binary operators on combinations of:
    - `a` with `a`
    - `b` with `b`
    - `a` with `b`
4. Some of the binary operators yield results which are different from the input types, why do you think this is? 

In [None]:
# Definitions
a = 2.2
b = 4

Aside: Syntactic Sugar
---

"Syntactic sugar" is a term for syntax which is easier to type but is translated to more verbose syntax, consider the following cell

In [None]:
a = a + 1

This cell can be "sugared" using the `+=` addition-assignment operator, e.g.

In [None]:
a += 1

#### Tasks

1. Run the definition cell
2. Use the multiplication assignment operator to compute the area of a circle if `a` is the radius, i.e. $ \pi r^2 $

In [None]:
# definitions
a = 2.0
pi = 3.14159

Comparison Operators
---

- Binary comparison operators return boolean values and are used to compare objects:
    - `==`: is equal to
    - `!=`: is not equal to
    - `>`, `<`: greater, less than (respectively)
    - `>=`, `<=`: greater, less than or equal to (respectively)
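
A minimal sketch of the comparison operators in action. The values of `x` and `y` are arbitrary, and the names on the left are only there so the results can be printed together.

```python
x = 5
y = 7

is_equal = x == y          # False
is_different = x != y      # True
is_smaller = x < y         # True
at_least_five = x >= 5     # True

print(is_equal, is_different, is_smaller, at_least_five)
print(type(is_smaller))    # every comparison produces a `bool`
```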

#### Tasks

Convert the textual representation of a comparison into code, example:

- "2 is less than or equal to 1"

In [None]:
2 <= 1

- Convert the following text:
    - 4 is not equal to 3
    - 25 is odd (hint: what's 25 % 2, how about 20 % 2)
    - 16 is not odd
    - 13 is not even
    - 20 is between 10 and 30

Boolean Operators
---

- Booleans represent truth values, in Python use `True` or `False` (note capitals!)
- A unary boolean operator `not`
    - Syntax: `not <boolean>`
- Two binary boolean operators `and`, `or`
    - Syntax: `<boolean> <operator> <boolean>`

#### Tasks

1. We are going to do something a little different here. I gave you the syntax for the boolean operators and I would like you to tell me what they do. First, an example. Fill in the following statement: `not` is a unary operator which ...

For this problem, we need the syntax for `not`, i.e. `not <boolean>`. We know there are two boolean values, i.e. `True` and `False`. Therefore, running the following cells should help me understand what `not` does.

In [None]:
not True

In [None]:
not False

Neat, the `not` keyword inverts a boolean expression. Given `True` it returns `False` and vice-versa. Filling in the statement now:

`not` is a unary operator which takes a boolean value as input and inverts it.

Now your turn, use `True` and `False` and the following operators

- `and`
- `or`

To fill-in the following statements:
    
- `and` is a binary operator which <does what?>
- `or` is a binary operator which <does what?>

Boolean Operators Cont.
---

You now have some documentation for how the boolean operators work. Let's combine the boolean and comparison operators to make some comparisons!

#### Tasks

1. Evaluate the following expressions (without the interpreter!):
    - `1 == 2 or 2 == 2`
    - `1 == 2 and 2 == 2`
    - `not 1 == 2 and 2 == 2`
2. Enter the expressions and check your answers 

Data Structures
---

- Data structures group values together with operations on them
    - The operations could:
        - Provide information about the data structure
        - Modify the structure itself
- One of the simplest data structures is a `tuple`
- Tuples can contain many values in "order", but are "immutable"
    - Syntax: `(<object>, <objects>...)`
    - Example:

In [None]:
a_tuple = (10, 9, 8)

Definitions
---

- "Ordered" means objects within the data structure can be accessed by an "index", pictorially

```
tuple (10, 9, 8)
        ^  ^  ^
index   0  1  2
```

- To access index `n` from a data structure `c`, syntax: `c[n]`
- Example:

In [1]:
example_tuple = (3, 2, 1)
print(example_tuple[0])
print(example_tuple[1])
print(example_tuple[2])

3
2
1


### Tasks

1. Looking at the above `example_tuple`,
    - What are the objects in the data structure?
    - What are the indices?
2. Using `a_tuple` from the definitions cell, try to extract `9` using the index

In [None]:
# definitions
a_tuple = (10, 9, 8)

Definitions Cont.
---

- "Immutable" means the object can _NOT_ change, example:

In [None]:
a_tuple[0] = 12

- You should have gotten a `TypeError` because this object cannot change
- You can add items to tuples, but you need to reassign the name to a new object
    - First, an example:

In [None]:
a_tuple = a_tuple + (7,)
a_tuple

### Breaking Down the Syntax

```python
a_tuple = a_tuple + (7,)
```

- `a_tuple` (right side of equals) resolves to the object `(10, 9, 8)`
- `(7,)` is a new tuple object with one element (note: `(7)` would just be an integer and would fail)
- The two objects on the right side can be added to form a new object `(10, 9, 8, 7)` and reassigned to the name `a_tuple` (left side of equals)

### Back to Data Structures

- Tuples aren't terribly flexible, but they might be perfectly acceptable depending on what you are doing
- "Sets" are used to represent a unique group of things
    - Syntax: `{<object>, <objects>, ...}`
    - Example:

In [None]:
a_set = {4, 10, 8}
a_set

- Sets are "mutable" (can be changed) and "unordered" (can not be accessed via an index)
    - To add elements, syntax: `a_set.add(<object>)`
        - Elements of a set are unique: adding an element that already exists has no effect
    - To remove elements, syntax: `a_set.remove(<object>)`
        - Trying to remove an object which doesn't exist will fail with a `KeyError`
- "Lists" are much more flexible than tuples or sets
    - Syntax: `[<object>, <objects>, ...]`
    - Example:

In [None]:
a_list = [3, 4, 5]
a_list

- Lists are mutable and ordered
    - To add items, use `a_list.append(<object>)`
    - To remove items,
        - by index, `a_list.pop(<index>)`
        - by object, `a_list.remove(<object>)`
            - will only remove first occurrence if duplicate objects exist
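
To make the set and list operations above concrete, here is a short sketch. The data (`colors`, `numbers`) is made up for this example.

```python
# Sets: unique and unordered
colors = {"red", "green"}
colors.add("blue")       # a new element is added
colors.add("red")        # already present, so nothing changes
colors.remove("green")   # removes an existing element
# colors.remove("purple") would raise a KeyError

# Lists: ordered, duplicates allowed
numbers = [3, 4, 5, 4]
numbers.append(6)         # [3, 4, 5, 4, 6]
popped = numbers.pop(0)   # removes and returns the object at index 0
numbers.remove(4)         # removes only the first 4

print(colors, numbers, popped)
```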
        
### Aside: `append`

Appending many items to a list one at a time can be slow. If you know the size of the list beforehand, it is best to preallocate the list and then fill in the items, example:

In [None]:
num_items = 20
empty_list = [None] * num_items
# some command/s to insert many items
empty_list

### Task

1. Could you preallocate a `tuple` object? Why or why not? 

### Back to Data Structures

- The final data structure I will discuss is a "dictionary"
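- Dictionaries are mutable and, since Python 3.7, remember the order in which keys were inserted (they are still not accessed by a positional index)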
- Dictionaries are accessed using "keys", each key is associated with a "value"
    - Syntax: `{<key0>: <value0>, <key1>: <value1>, ...}`
        - Keys must be unique
        - Values could be duplicates
    - Example:

In [None]:
a_dict = {'a': 1, 'b': 2}
a_dict

- Above, the keys are `'a'` and `'b'` with values `1` and `2`
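
Since dictionaries are accessed using keys, a small sketch of reading and changing values may help. The dictionary `ages` and its contents are hypothetical example data.

```python
ages = {"alice": 30, "bob": 25}   # hypothetical example data

alice_age = ages["alice"]   # look up a value by its key
ages["carol"] = 41          # assigning to a new key adds a key-value pair
ages["bob"] = 26            # assigning to an existing key replaces the value
del ages["alice"]           # remove a key-value pair

print(alice_age, ages)
# ages["dave"] would raise a KeyError because the key does not exist
```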

### Data Structures: Utilities

Many functions and operators work on several kinds of containers. I will go over a few useful ones now.

- Does the data structure contain a specific element?
    - Syntax: `<object> in <container>`
    - Examples:

In [None]:
# Reminder
a_tuple, a_set, a_list, a_dict

In [None]:
9 in a_tuple

In [None]:
4 in a_set

In [None]:
5 in a_list

In [None]:
'a' in a_dict

In [None]:
1 in a_dict

In [None]:
1 in a_dict.values()

- How many elements does the container have?
    - Syntax: `len(<container>)`
    - Examples:

In [None]:
len(a_tuple), len(a_set), len(a_list), len(a_dict)

Loops
---

- "Loops" abstract away repetition 

### Motivation

In [None]:
a = [1, 2, 3, 4, 5]
a

Let's say I want to double every element in `a` to produce `b`, I could write:

In [None]:
b = []
b.append(a[0] * 2)
b.append(a[1] * 2)
b.append(a[2] * 2)
b.append(a[3] * 2)
b.append(a[4] * 2)
b

- This repetition will become unbearable quickly. What if a list had thousands of entries?
- __Important note before continuing: loops can be difficult for beginners__
    - Reminders:
        - Accessing elements of data structures via bracket syntax: `a_data_structure[an_index]`
        - In Python, the indices of data structures start at 0

### Loops Handle Repetitive Tasks

- In the above problem, you want to abstract over the indices of `a`
    - __What are the values of the indices of `a`?__
        - Do not continue until you answer this question
        - If you are unsure of your answer, ask me!

- The `for` loop syntax:
```python
for <temporary> in <data_structure>:
    <body>
```
- Example (same problem as above):

In [None]:
b = []
for idx in [0, 1, 2, 3, 4]:
    b.append(a[idx] * 2)

### Breaking Down the Syntax

- `idx` temporarily takes on each value "in" the list `[0, 1, 2, 3, 4]`
- The "body" of the `for` loop multiplies `a[idx]` by 2 and `append`s the result to `b`
    - The loop will "iterate" through the body five times because `len([0, 1, 2, 3, 4])` is `5`
    - In table form:
    
|Iteration|`idx`|`a[idx]`|`a[idx] * 2`|
|---|---|---|---|
|0|0|1|2|
|1|1|2|4|
|2|2|3|6|
|3|3|4|8|
|4|4|5|10|

- The body is an indented block (by 4 spaces) which contains a procedure
    - Most times, Jupyter will handle the spacing for you automatically
    - Use `<Tab>` to indent right and `<Shift-Tab>` to indent left
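
The temporary in a `for` loop does not have to be an index. As a sketch, the same doubling problem can also loop over the elements of `a` directly (the name `value` is my own choice here):

```python
a = [1, 2, 3, 4, 5]
b = []
for value in a:            # `value` takes on 1, 2, 3, 4, 5 in turn
    b.append(value * 2)    # the indented body runs once per element

print(b)
```

Looping over indices is still useful when you need the position itself, e.g. to read from two lists at once.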

Aside: `for` with `range`
---

- The `range` function will generate an "iterable" (think list, but we will discuss in detail later)
- Syntax: `range(n)` where `n` is an integer
- Example:
    - The `list` function is used to show you what is inside a `range` object

In [2]:
list(range(5))

[0, 1, 2, 3, 4]

- `range` objects are often used to represent the indices of an object
    - Rewriting the example above using `for` and `range`:

In [4]:
a = [1, 2, 3, 4, 5]
b = []
for idx in range(5):
    b.append(a[idx] * 2)
b

[2, 4, 6, 8, 10]

- Awesome, but the `range(5)` isn't generic
    - Using the `len` function on `a` we get the number of elements, i.e. 5
- Example:

In [5]:
a = [1, 2, 3, 4, 5]
b = []
for idx in range(len(a)):
    b.append(a[idx] * 2)
b

[2, 4, 6, 8, 10]

Tasks
---

1) Using a `for` loop with the `range` function generate `c` by doing the element-wise addition of `a` with `b`
- Hints:
    - You need the `append` function from the `list`s section
    - `c` below should equal `[6, 8, 10, 12]`

In [None]:
# Definitions
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = []

### Motivation for the next task

Appending to lists can be slow! If you know the length of the list it is best to do the following.

You can assign to values inside a list just like any other value. Try the following cell.

In [None]:
c = [None, None]
print(c)
c[0] = 10
c[1] = 20
c

2) Do "1)", but this time using a preallocated `c`
- Hint: `c` below should equal `[6, 8, 10, 12]`

In [None]:
# Definitions
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [None] * len(a) # i.e. `[None] * 4` or `[None, None, None, None]` printed below for clarity
c

Flow Control: `if`
---

- The keyword `if` takes a comparison, if the comparison is `True` the block will run
- You can add the following to extend an `if`:
    - `elif` means "else if" and provides additional comparisons to match
    - `else` will run if none of the previous `if` or `elif`s match
    - Only the block under the first `True` comparison will run, the others will be ignored
- Syntax:
```python
if <comparison0>:
    # procedure if comparison0 True
elif <comparison1>:
    # procedure if comparison1 True
else:
    # procedure if neither comparison0 nor comparison1 is True
```  
- Important notes
    - `elif` and `else` are not strictly necessary, but the chain must start with `if`
    - Any number of `elif`s can be used, but only one `else`
    - `else` must be last
    - Python won't warn you if
        - You make the same comparisons multiple times
        - Create an `if`, `elif`, or `else` which would never be `True`
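
Here is a minimal sketch of a complete `if`/`elif`/`else` chain. The variable `temperature` and the thresholds are made up for illustration.

```python
temperature = 15   # hypothetical value in degrees Celsius

if temperature < 0:
    description = "freezing"
elif temperature < 20:
    description = "cool"      # first True comparison, so this block runs
elif temperature < 30:
    description = "warm"      # also True for 15, but never reached
else:
    description = "hot"

print(temperature, "is", description)
```

Note that `temperature < 30` is also `True` here, but only the first matching block runs.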

#### Tasks:

1. Before running the code below, what do you think is printed?
2. Run the code, what is the result? Why?

In [None]:
x = -12

if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is negative")
elif x > 0:
    print(x, "is positive")
else:
    print("I don't recognize", x)

Aside: Error versus Bug
---

- The code above runs, but doesn't produce the result we expect. This is known as a "bug".
- An "error" will result in the code not running, e.g. `TypeError` and `IndentationError` which we saw before

### Tasks Cont.

1. Fix the bug in the code below
2. Try different values for `x`
3. Will the `else` block ever run?

In [None]:
x = -12

if x == 0:
    print(x, "is zero")
elif x > 0:
    print(x, "is negative")
elif x > 0:
    print(x, "is positive")
else:
    print("I don't recognize", x)

Functions
---

- A "function" is a procedure which takes input and produces output
- Examples (which we saw earlier)
    - `len` took a container and produced an integer (the number of elements)
    - `range` took an integer and produced an "iterable" with that many elements in ascending order
- Defining your own functions, syntax:

```python
def <function_name>(<input0>, <input1>, ...):
    # Do something with input0 and input1 to produce result
    result = <input0> ... <input1>
    return result
```

- Example, squaring an input:

In [1]:
def square(x):
    return x * x

square(2)

4

### Breaking Down the Syntax:

- `def` is the keyword to start a function definition
- `square` is the name of the function
- `return` passes back information to the caller of the function

### Notes

- Functions don't have to return anything (behind the scenes they return `None`)
- When the interpreter evaluates a `return` it will ignore anything else in the function
- Use meaningful function names to make your intention clear
    - How helpful is the function if it was named `awesome_function`?
- Function names should be lowercase and words separated by `_`
    - This is true for variable names too
    - You may also see `camelCase` (e.g. `awesome_function` -> `awesomeFunction`)
        - Use this only if working on code which is already using this convention
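
A short sketch of two of the notes above: a function with more than one input, and a function without a `return`. Both function names are my own and only meant as examples.

```python
def hypotenuse(a, b):
    # Two inputs, one output
    return (a ** 2 + b ** 2) ** 0.5

def greet(name):
    # No `return` statement, so the caller gets `None` back
    print("hello", name)

length = hypotenuse(3, 4)
nothing = greet("world")
print(length, nothing)
```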

#### Tasks

1. Verify that a function with multiple `return` statements only returns the first occurrence
2. Write the definition of a `cube` function
3. Write the definition of the area for a circle given a radius
    - $ A = \pi r^2 $
4. Write the definition of the `multiply` function

Aside: Data Descriptors and Methods
---

- Objects can have "members", they include:
    - "Data descriptors" are simply variables
    - "Methods" are functions, sometimes called "member functions"
- Members of objects can be accessed via `.`
- Syntax `<object>.<data_descriptor>` or `<object>.<method>(<args>)`
- The `help` function can be used to list the members of an object

In [None]:
help(int)

### Notes

- Methods which look like `__<name>__` aren't meant to be called directly
    - They are used by the interpreter to handle things like printing, adding, subtraction, etc.
- When the signature of a method shows `self` or `/`, ignore them when calling the method
    - i.e. `bit_length` (below) doesn't take any arguments

In [None]:
help(int.bit_length)

- Example

In [None]:
a = 1
a.bit_length()

### Tasks

1. What are the data descriptors available for an integer?
2. Check out the `help` for a `list`
3. Given the definitions cell below, try to use one of the methods defined for `list` on `a`
    - Preferably, one you haven't seen before!
    - Feel free to try more than one

In [None]:
# definitions
a = [1, 2, 3]

Anonymous Functions
---

These functions are essentially syntactic sugar for simple functions. They can also be unnamed and used inside other functions (we will look at this later).

- Syntax:

```python
lambda x: x * 2
```

- What do you think this function would do?

### Break Down the Syntax

- `lambda` starts the anonymous function
- `x` names a single input, use `x, y` for multiple inputs
- `:` separates the inputs from the body
- `x * 2` is the body of the function

### Notes

- We can "desugar" the above to:

```python
def double(x):
    return x * 2
```

- Note when desugaring a name is required!
    - You _can_ name anonymous functions if you like, e.g.

```python
double = lambda x: x * 2
```

- Examples:

In [2]:
cube = lambda x: x ** 3
cube(2)

8

In [3]:
exclaim = lambda x: x + "!"
exclaim("hello")

'hello!'

### Tips

- Anonymous functions should only be used for simple functions, e.g. double, square, and exclaim
- Anonymous functions are typically used as inputs for other functions (in detail later)
    - Simple example:

In [8]:
def apply_unary_f(f, x):
    return f(x)

apply_unary_f(lambda x: x * 2, 2)

4
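
Another place you will likely meet anonymous functions is the `key` argument of the built-in `sorted` function, which expects a function of one input. A small sketch, with a made-up list of words:

```python
words = ["banana", "fig", "cherry"]   # hypothetical example data

# `key` is applied to each element, and the results decide the order
by_length = sorted(words, key=lambda w: len(w))
by_last_letter = sorted(words, key=lambda w: w[-1])

print(by_length)
print(by_last_letter)
```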

#### Tasks

1. Write an anonymous `addition` function
2. Given the definition for `apply_unary_f` above, try to write `apply_binary_f`
3. Use your `addition` and `apply_binary_f` to add 2 and 3

Iterable and Iterator
---

Previously, we looked at `for` loops with the `range` function and I told you `range` was an "iterable", but didn't explain further. An "iterable" is any object you can loop over. A `range` is very similar to a list, e.g. you can access elements via the bracket syntax, but the behavior behind the scenes is different. A list keeps every object in memory, but the `range` object only stores 3 data descriptors: `start`, `stop`, and `step`. Using the descriptors, one doesn't need to store every element inside the range; each one is simply calculated when needed.

Iterators are similar to iterables, but the operation behind the scenes is a bit different too. Iterators contain enough information to generate the next object in the series. The `next` function will get the next object in the series.

- Example:

In [None]:
a = iter(range(2))
next(a)

In [None]:
next(a)

In [None]:
next(a)

### Notes

- `range`s can be converted into iterators via the `iter` function
- `a` is an iterator which can be thought of like `[0, 1]`
- The first `next` "consumes" the `0` object from the iterator
- The second `next` consumes the `1`
- The third `next` fails with a `StopIteration` error because the iterator is empty!
- Iterable objects which are not iterators (e.g. ranges and lists) are not consumed when you loop over them
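
The difference between a `range` and an iterator made from it can be sketched as follows. The particular range is arbitrary and the variable names are only for illustration.

```python
r = range(0, 10, 2)
print(r.start, r.stop, r.step)   # the three stored values
print(r[1], len(r))              # elements are calculated on demand

first_pass = list(r)
second_pass = list(r)            # a range can be looped over again

it = iter(r)
consumed_once = list(it)         # uses up every element of the iterator
consumed_twice = list(it)        # nothing is left, so this is empty

print(first_pass, second_pass, consumed_once, consumed_twice)
```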

Example Iterators
---

- I am going to show a lot of examples below. __Try to guess what is printed before running the examples!__
    - Reading other people's code is a great way to learn

### Enumerate

- `enumerate`: packages each index with each element in a container
- Example (without `enumerate`):

In [None]:
l = [1, 2, 3]
for idx in range(len(l)):
    print(idx, l[idx])

- Example ("packed"):

In [None]:
m = [4, 5, 6]
for i in enumerate(m):
    print(i)

### Notes

- What is the type of each `i` above?
- `tuple`s can be "unpacked", example:

In [None]:
a, b = (0, 1)
print(a)
print(b)

- Coming back to the example and using the unpacking syntax

In [None]:
m = [4, 5, 6]
for idx, i in enumerate(m):
    print(idx, i)

### Zip

- `zip` packages multiple containers elementwise into tuples
- Example (unpacking the tuples):

In [None]:
left = [1, 2, 3]
right = [3, 2, 1]
for l, r in zip(left, right):
    print(l, r, l + r)

### Notes

- A zip will only create tuples if enough entries exist in both lists
- How many lines will the cell below print (feel free to guess)?

In [None]:
a = [1, 2]
b = [2]
for x, y in zip(a, b):
    print(x, y)

- Because the `2` in `a` has no complement, it is ignored
- Note you can combine more than 2 things! example:

In [None]:
a = [1, 2]
b = [3, 4]
c = [5, 6]
for x, y, z in zip(a, b, c):
    print(x, y, z)

### Maps

- `map` applies a function to each element in a container
- Example:

In [None]:
a = [1, 2, 3]
for x in map(lambda x: x * 2, a):
    print(x)

- Combined with a `zip`, remember `tuple`s are ordered!

In [None]:
a = [1, 2, 3]
b = [6, 5, 4]
for x in map(lambda t: t[0] + t[1], zip(a, b)):
    print(x)

### Filter

- `filter` takes a function which returns a boolean and a container; only the elements for which the function returns `True` are kept
- Example:

In [None]:
for evens in filter(lambda x: x % 2 == 0, range(5)):
    print(evens)
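
One thing worth knowing: `enumerate`, `zip`, `map`, and `filter` all produce iterators, so nothing is calculated until you loop over them and they can only be consumed once. A sketch, using the `list` function to collect the results (the data is made up):

```python
numbers = [1, 2, 3, 4]

doubled = map(lambda x: x * 2, numbers)
print(doubled)                 # shows a map object, not the values

doubled_list = list(doubled)   # loop over the iterator and keep the results
leftover = list(doubled)       # the iterator is now used up

small = list(filter(lambda x: x < 3, numbers))
pairs = list(enumerate(["a", "b"]))

print(doubled_list, leftover, small, pairs)
```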

### Tasks

We are going to use the interval notation you may have seen in mathematics class:

- `[begin, end]` - means both begin and end are included
- `(begin, end)` - means neither begin nor end are included
- `[begin, end)` - means begin is included, but end is not


1. Use `enumerate` and `range` (not `filter`) to print the index and value of even integers between [0, 10)
2. Use `enumerate` and `zip` to sum `a` and `b` into `c` (definitions below)
    - Hint: use `list(enumerate(zip(a, b)))` to see the shape of what to unpack
3. Use `map` to print the squares of values in the range [0, 10) using only anonymous functions
4. Use `filter` to print the cubes of even values in the range [0, 10) using only anonymous functions

In [None]:
# definitions for "2."
a = [1, 2, 3]
b = [3, 4, 5]
c = [None] * 3

List Comprehensions
---

We have looked at a few examples of adding two lists element-wise, but there is actually some sugar which will help you out! Comprehensions are my favorite Python language constructs (yes, that's a thing!).

- Syntax:
```python
[<body> for <temporary> in <container/iterable/iterator>]
```
- Example:

In [None]:
a = [1, 2, 3]
b = [3, 2, 1]
c = [x + y for x, y in zip(a, b)]
c

### Breaking Down the Syntax

- `for x, y in zip(a, b)` should look familiar
    - `x` and `y` are unpacked from each `tuple` returned by the `zip` iterator
- `x + y` is simply the body
- Wrapping these up in brackets we have our comprehension

### Notes

- Comprehensions are great for simple operations, but don't get carried away
    - Sometimes it is more readable to have a `for` loop
- Another example with a filter clause:

In [None]:
a = [1, 2, 3]
b = [3, 4, 5]
c = [x + y for x, y in zip(a, b) if (x + y) % 2 == 0]
c

- This is very similar to the example above, but we added `if (x + y) % 2 == 0` to keep only the even results
    - Here the sums are 4, 6, and 8, which are all even, so nothing is dropped
- Again, I want to emphasize that it is really easy for comprehensions to __decrease__ readability of code
    - Unreadable code is also unmaintainable!
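
As a rough sketch, the filter clause corresponds to an `if` inside the loop. The values in `b` are changed here (an assumption for illustration) so that one sum is odd and actually gets dropped:

```python
a = [1, 2, 3]
b = [3, 5, 5]  # sums are 4, 7, 8

# Comprehension with a filter clause
c = [x + y for x, y in zip(a, b) if (x + y) % 2 == 0]

# Equivalent loop: the filter clause becomes an `if` around the append
c_loop = []
for x, y in zip(a, b):
    total = x + y
    if total % 2 == 0:
        c_loop.append(total)

print(c)
print(c_loop)
```

If the condition or the body grows beyond this, the loop version is usually easier to read.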

### Dictionary Comprehensions

- Similarly to list comprehensions, one can also do dictionary comprehensions!
- Some examples:

In [None]:
d = {'one': 1, 'two': 2, 'three': 3}
keys_become_values = {k: k for k in d}
keys_become_values

In [None]:
d = {'one': 1, 'two': 2, 'three': 3}
values_become_keys = {v: v for v in d.values()}
values_become_keys

In [None]:
d = {'one': 1, 'two': 2, 'three': 3}
values_doubled = {k: v * 2 for k, v in d.items()}
values_doubled

### Notes

- One can add the filter clause to the end of a dictionary comprehension
- The `<body>` of the dictionary comprehension must be a `<key>: <value>` pair
- You can use dictionaries in list comprehensions, but the bodies must be values not key-value pairs
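
A short sketch of the three notes above. The dictionary `prices` and its contents are made up for illustration:

```python
prices = {'apple': 3, 'kiwi': 7, 'banana': 2}  # hypothetical data

# Filter clause at the end of a dictionary comprehension
short_names = {k: v for k, v in prices.items() if len(k) <= 5}
print(short_names)

# The body must be a `<key>: <value>` pair; here the keys are upper-cased
shouting = {k.upper(): v for k, v in prices.items()}
print(shouting)

# A list comprehension over a dictionary: the body is a single value
labels = [k + "=" + str(v) for k, v in prices.items()]
print(labels)
```
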

### Tasks

1. Given the definitions below,
    - Make `my_even` from `my_dict` which only contains even values
    - Make `my_odd_squared` from `my_dict` which contains squares of the odd values
    - Build `favorite_songs_by` from `songs` (values) and `artists` (keys)
    - Invert `favorite_songs_by`, i.e. keys become values, values become keys

In [None]:
# definitions
my_dict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
songs = ["Tom Sawyer", "Money", "Simple Man"]
artists = ["Rush", "Pink Floyd", "Lynyrd Skynyrd"]

Reading and Writing Files
---

- Opening files using the `with` syntax:

```python
with open(<file>, <mode>) as <name>:
    # do something with name
```

### Breaking Down the Syntax

- `<file>` is the name of a file on the filesystem
- `<mode>` can be:
    - `"r"`: read
    - `"w"`: write (overwrites contents)
    - `"a"`: append
- Write and append mode will create the file if it doesn't exist
- `<name>` is how you will refer to the file in the body
- Example (using `<file_object>.write`):

In [None]:
with open("hello.world.txt", "w") as f:
    f.write("Hello, World!\n")
    f.write("Is there anybody in there?\n")

- `"\n"` represents the creation of a new line
    - Try removing it and running the above code, what happened?
- Example (using `<file_object>.writelines`)

In [None]:
with open("hello.world.txt", "a") as f:
    content = ["Yes, it's me\n", "Carmen Sandiego\n"]
    f.writelines(content)
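
The examples above cover `"w"` and `"a"`. Here is a minimal sketch of `"r"` mode; it writes its own small file first so it can run on its own, and the file name `read.example.txt` is made up:

```python
# Write a small file first so this sketch stands on its own
with open("read.example.txt", "w") as f:
    f.write("first line\n")
    f.write("second line\n")

# "r" mode: read everything into one string
with open("read.example.txt", "r") as f:
    text = f.read()
print(text)

# "r" mode: loop over the file object to get one line at a time
lines = []
with open("read.example.txt", "r") as f:
    for line in f:
        lines.append(line)
print(lines)  # the "\n" at the end of each line is kept
```
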

### Tasks

- Given the file `data.csv`,
    - The file object returned by `open` can be used as an iterator over the lines (feel free to use `with` syntax if you prefer it)
    - `print` returns `None`
        - This is why `[None, None, None, None]` appears at the end of the cell's output
    - The type of `l` below is `str` (string)

In [None]:
[print(l) for l in open("data.csv", "r")]

For each step, I suggest you save the intermediate values into variables.

1. Use a list comprehension to read "data.csv" into a list
    - Hint: use `<str>.strip()` to remove new line characters
2. Use another list comprehension to split each string into two values
    - Hint: `"10,11".split(',')` is `["10", "11"]`
3. Use another list comprehension to convert the strings to integers
    - Hint: `int("10")` will convert `"10"` to `10`
4. Add every pair of values together via a list comprehension
5. Write each sum, on individual lines, to a file named "sums.csv" using the `with` syntax
    - Hint: an integer can be converted to a string, for example `str(1)` is `"1"`

Modules
---

There is a funny saying,

> Python is the second best language for everything

This turns out to be really helpful because there is a module for everything!

### How do I find modules?

- You will download most packages from PyPI, or the [Python Package Index](https://pypi.org/)
- First, you will need to know what packages to get
    - To find them, use Google. Search examples:
        - "Python Data Science Libraries"
        - "Python Health Sciences Libraries"
    - Or, Google a specific problem
        - "Python how do I read CSV files?"
- I try to prefer the standard library (modules that ship with Python) to external packages (when it makes sense)
    - For example, the standard library module `csv` includes a CSV reader, but the external library `pandas` has a lot more functionality for data manipulation
- Example, importing `csv`

In [None]:
import csv

- Remember the `help` function?

In [None]:
help(csv)

### Digesting `help`

- That was a lot of information!
    - `pandas` would be even more
- To access the `reader` function (towards the end of help output), we would use:

In [None]:
csv.reader

- If you only need the `reader` function:

In [None]:
from csv import reader
reader == csv.reader # Are they the same?
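
For a feel of what `reader` does, here is a minimal sketch. The file name `reader.example.csv` and its contents are made up, and passing `newline=""` to `open` follows the recommendation in the `csv` documentation:

```python
import csv

# Made-up file name and contents, only for illustration
with open("reader.example.csv", "w") as f:
    f.write("10,11\n")
    f.write("20,21\n")

# csv.reader splits each line on commas and yields a list of strings
with open("reader.example.csv", "r", newline="") as f:
    rows = [row for row in csv.reader(f)]
print(rows)
```

Note that the values are still strings; converting them to numbers is a separate step.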

### Tips

- I tend to only import the functions I need, but feel free to import the entire library
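- The PEP 8 style guide asks for one module per `import` statement (for example, `import os` and `import sys` on separate lines)
    - Importing several names from the same module on one line is fine, although some people prefer one name per line

```python
# fine according to PEP 8
from csv import reader, writer
# also fine, one name per line
from csv import reader
from csv import writer
```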

- Sometimes, modules have a recommended way of importing them
    - Example, `pandas`:

In [None]:
import pandas as pd

- Now, you can access members via `pd.<member>`
    - `pd` is called an "alias"
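
Aliases work with any module. A small sketch using the standard library module `statistics`, where the alias `stats` is my own choice for illustration and not an official convention:

```python
import statistics as stats

# Members are reached through the alias, just like `pd.<member>`
average = stats.mean([1, 2, 3])
print(average)
```
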

### Installing a Module

- I have installed a lot of modules for you, but if you are missing one it must be installed via:
        
```python
import sys
!{sys.executable} -m pip install --user numpy
```
    
- If you want to use the other kernels, please ask me
- The `--user` flag is the secret sauce. Without it, you will get a permission denied error
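
Before installing anything, it can help to check whether the current kernel already has the module. A minimal sketch using `importlib.util.find_spec` from the standard library; the helper name `is_installed` is made up for this example:

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be found by the current kernel."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed("csv"))
print(is_installed("a_module_that_does_not_exist"))
```
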

Project: Population Analysis
---

You are given the US Census information for 2016 and 2017 for each state (including DC and Puerto Rico) in CSV format. Use Python to determine:

1. The average population for 2016 and 2017
2. The states with the largest and smallest populations

### Why this problem?

Yes, this is a relatively simple problem, but it brings together a lot of what you learned today. Most importantly, you will need to determine an appropriate data structure to store the information. Knowing which data structure to use is the most important decision when analyzing data. For example, if you were given a person's full name and needed to find their phone number, what would you use? Likely, you said a phone book. Would you still use a phone book if you were given someone's phone number and needed to find their name?
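
The phone book analogy can be sketched with dictionaries. The names and numbers below are hypothetical:

```python
# Hypothetical names and numbers, only for illustration
phone_book = {"Ada": "555-0100", "Grace": "555-0199"}

# Name -> number is a direct lookup
print(phone_book["Ada"])

# Number -> name needs a search through every entry...
for name, number in phone_book.items():
    if number == "555-0199":
        print(name)

# ...unless we build a second dictionary keyed by number
reverse_book = {number: name for name, number in phone_book.items()}
print(reverse_book["555-0199"])
```

The same data can be easy or awkward to query depending on which value is used as the key.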

Most of the problems you will deal with are simply reading data files, doing some processing, and then spitting out some statistical result. Python is fantastic for this.

### Tasks

1. Open the data file "states.csv" using the `with` syntax
    - Note: this is a proper CSV file which means it has a header on line 0
2. Read the first line, there are 3 columns separated by commas
    - Hint: print this line and think about how you want to store your data!
3. Use a `for` loop to read the remaining lines and parse them into your data structure
    - Hint: the function `int` can convert a string to an integer
4. Generate the following information
    - Average population for 2016 and 2017
    - State with the largest population
    - State with the smallest population
        - Hints:
            - `min`, `max`, and `sum` are built-in functions which operate on lists. Test them!
            - A `mean` function exists, inside the `statistics` module, which operates similarly to the above functions
            - Lists have an `index(<object>)` method to find a particular object's index
5. Was the data structure you chose optimal? Could you rearrange it and make your life easier?
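
The functions from the hints can be tried on a toy example first. This is only a sketch with made-up values (not the census data), and it assumes two lists kept in the same order, which is one possible layout rather than the recommended one:

```python
from statistics import mean

# Made-up values, not the census data
names = ["a", "b", "c"]
values = [30, 10, 20]

print(min(values), max(values), sum(values))
print(mean(values))

# `index` finds where the largest value sits; since the two lists are
# in the same order, the same position gives the matching name
position = values.index(max(values))
print(names[position])
```
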

Additional Resources
---

1. Me! Don't hesitate to email me (bmooreii@pitt.edu) and we can make an appointment
2. If you want to install Python and Jupyter on your own machine
    - Check out [Anaconda](https://www.anaconda.com/download)
        - It is easy to install and includes Jupyter
3. This workshop is loosely modeled after [A Whirlwind Tour of Python](http://nbviewer.jupyter.org/github/jakevdp/WhirlwindTourOfPython/blob/master/Index.ipynb)
4. [Stack Overflow](https://stackoverflow.com). If you "Google it", the answer will probably come from Stack Overflow.
    - Pro tip, search for: `python <search phrase>`
    - As you get better you will easily be able to filter good answers from poor ones

#### [Python Documentation](https://docs.python.org/3/library/index.html)

#### [Python Built-in Functions](https://docs.python.org/3/library/functions.html)

#### Notable Python Libraries

1. [docopt](http://docopt.org) - command line arguments (not helpful in a Jupyter Notebook)
2. [pandas](http://pandas.pydata.org) - great for processing data
3. [matplotlib](https://matplotlib.org) - plotting tool, can view inside Jupyter Notebooks!
4. [requests](http://docs.python-requests.org/en/master) - HTTP Requests
5. [numba](https://numba.pydata.org) - just-in-time compilation for Python (make your code run faster)
6. [subprocess](https://docs.python.org/3/library/subprocess.html) - run external commands
7. [numpy](https://docs.scipy.org/doc/numpy/dev/) - great array package for doing math
8. [sympy](https://docs.sympy.org/latest/index.html) - symbolic math
9. [scikit-learn](http://scikit-learn.org/stable/documentation.html) - machine learning package
10. [dataset](https://dataset.readthedocs.io/en/latest/) - deal with SQL databases as if they were dictionaries

#### Useful Books

- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) - Free! If you don't mind reading online
- [Hands-On Machine Learning with Scikit-Learn and TensorFlow](https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291) - Great for machine learning and data science
    - I own this book if you want to borrow it

Quick Survey
---

- Press `<Ctrl-Enter>` to run the following cell

In [None]:
from IPython.display import IFrame
IFrame("https://forms.gle/N3h1vUYWneHs9Rkf8", width=760, height=500)