## Intro to Python and Jupyter Notebook

A Jupyter notebook lets you write and execute Python code in your web browser. Jupyter notebooks make it very easy to tinker with code and execute it in bits and pieces; for this reason Jupyter notebooks are widely used in scientific computing. A Jupyter notebook is made up of a number of cells. Each cell can contain Python code or text. Every time a new cell is created, it's classified as a code cell. If you want it to be a text cell, choose `Markdown` instead of `Code` in the toolbar. You can execute any cell by clicking on it and pressing `Shift-Enter`, or clicking the `Run` button in the toolbar. When you do so, the code in the cell will run, and the output of the cell will be displayed beneath the cell. For example:

In [34]:
x = 1 + 2
print(x)

3


Global variables are shared between cells. Executing the second cell thus gives the following result:

In [35]:
y = 2 * x
print(y)

6


By convention, Jupyter notebooks are expected to be run from **top to bottom**. Failing to execute some cells or executing cells out of order can result in errors.

To edit any cell that has already been run, double click on it. If you do this to a text cell, what appears might look a little funny if you haven't been exposed to the markup language *Markdown*. Unlike cumbersome word processing applications, text written in Markdown can be easily shared between computers, mobile phones, and people. It's quickly becoming the writing standard for academics, scientists, writers, and many more. Websites like GitHub and Reddit use Markdown to style their comments. You won't need to be an expert with Markdown for this class, but if you'd like to learn more, [this link](https://www.markdowntutorial.com/) takes you to a really fun tutorial.

### Why Python?

* Python is a clear and powerful programming language
* Elegant syntax makes programs easy to read
* Large standard library supports many common programming tasks
* Python can be used in two different modes
    * Interactive mode makes it easy to experiment with the language
    * Standard mode is for running scripts and programs from start to finish
* Python programs are typically much shorter than equivalent C, C++, or Java programs
* High-level language and data types allow concise complex operations

### Modules 

* Python **modules** are libraries of code (like packages in R)
* They are imported using the `import` statement
* Python comes with several modules but you can also write your own modules
* Large Python programs are usually organized into modules and then loaded
* For example, mathematical functions are available in the `math` module

In [36]:
from math import pi, sqrt
import math
print(math.pi)
print(math.e)
print(math.sqrt(10))
print(math.sin(math.pi/2))

3.141592653589793
2.718281828459045
3.1622776601683795
1.0


### Object Types

* Every piece of data stored in a Python program is an object
* Each object has a **type**, **value**, and **identity**
* For example:
  * `str` (type)
  * `"hello"` (value)
  * `2915232` (identity)
* An **attribute** is a name that is attached to a specific object and is accessed with a dot between the two: `object.attribute`
* Some attributes are callable (functions) and others are noncallable (variables)
* Objects are characterized by data attributes and methods
  * A **data attribute** is a value attached to an object
  * A **method** is a function attached to an object that performs some operation on the object
* Object type determines which operations the object supports, i.e., which operations you can perform on it
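
As a small illustration of the terms above, the following sketch inspects the type and identity of a string and then uses one method and one data attribute. The variable names are made up for this example, and the integer returned by `id` will differ from run to run.

```python
s = "hello"

obj_type = type(s)      # the type of the object: str
obj_id = id(s)          # the identity: an integer that is unique while the object exists

shouted = s.upper()     # upper is a method (a callable attribute)

z = 3 + 4j
real_part = z.real      # real is a data attribute (a noncallable attribute)

print(obj_type, obj_id, shouted, real_part)
```

Because the type determines the supported operations, `s.upper()` works for a string, whereas an integer has no `upper` method.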

#### Numbers

* Numbers are one type of object in Python
* Python provides 3 numeric types: integers, floating-point numbers, and complex numbers
* Integers have unlimited precision
* Different numeric types can be freely mixed
* Numbers support all the usual arithmetic operations

In [37]:
125 + 25.21
125 + 25
125 * 25
125 ** 25
5 / 2       # true division (always returns a float)
5 // 2      # floor division

2
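
A notebook cell displays only the value of its last expression, which is why a single number appears above. To see several results at once, one option is to assign them to names and print them, as in this sketch (the variable names are illustrative only):

```python
big = 2 ** 200          # integers grow as large as needed
mixed = 3 + 0.5         # int + float gives a float
z = (1 + 2j) * 2        # complex numbers mix with integers too
true_div = 5 / 2        # true division
floor_div = 5 // 2      # floor division
remainder = 5 % 2       # remainder of the floor division

print(big)
print(mixed, z, true_div, floor_div, remainder)
```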

We very commonly need to go beyond built-in functions and operations. We can import the `math` module for this purpose:

In [38]:
import math
math.sqrt(math.pi)
math.sin(math.sqrt(math.pi)/2)

0.7746914034386123

`random` is a very useful module when dealing with simple settings that require random sampling of a sequence of objects. 

In [39]:
import random
random.random()
random.choice([10, 15, 20, 25, 30, 35, 40])
random.choice(["Monday", "Wednesday", "Friday"])

'Monday'
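
Results from `random` change on every run. When repeatable results are needed, a common approach is to fix the seed first. The sketch below assumes an arbitrary seed value of 42 and also shows `random.sample`, which draws several distinct items at once.

```python
import random

random.seed(42)   # arbitrary seed; fixing it makes the results repeatable

days = ["Monday", "Wednesday", "Friday"]
day = random.choice(days)        # one item from the sequence
pair = random.sample(days, 2)    # two distinct items, drawn without replacement
number = random.random()         # a float in the interval [0.0, 1.0)

print(day, pair, number)
```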

#### Boolean Operations 
* An **expression** is a combination of objects and operators that computes a value
* Many expressions involve what is known as the Boolean data type
* The Boolean data type is a data type that has only two values: true and false
* In Python, the Boolean type is `bool`, and it has two values: `True`, `False`
* Compare this to the `int` type which can represent any integer
* Operations involving logic, so-called Boolean operations, take in one or more Boolean objects and return one Boolean object
* There are only 3 Boolean operations (`x` and `y` are Boolean; listed here by ascending priority)

In [40]:
x or y

x and y

not x

False
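
The priority ordering matters when operations are combined. As a short sketch with two illustrative Boolean variables `p` and `q`:

```python
p = True
q = False

either = p or q        # True if at least one operand is true
both = p and q         # True only if both operands are true
negated = not p        # flips the value

# not binds tighter than and, which binds tighter than or
combined = p or q and not p          # parsed as p or (q and (not p))
grouped = (p or q) and (not p)       # parentheses change the grouping

print(either, both, negated, combined, grouped)
```

Here `combined` is `True` while `grouped` is `False`, so adding parentheses is a cheap way to make the intent explicit.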

#### Comparisons 

* There are 8 comparison operations in Python
* These are commonly used for numeric types
* Can also be used for other types, like sequences, which are compared lexicographically, i.e., element by element from left to right
* The result of a comparison is either `True` or `False` (returns a Boolean type)
* All have the same priority (higher than Boolean)
* Here, `x` and `y` are any objects for which the comparison operators are defined. The last two compare object identity and its negation

In [41]:
x < y
x <= y
x > y
x >= y
x == y
x != y
x is y       # object identity
x is not y   # negated object identity
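
The difference between equality (`==`) and identity (`is`) is easiest to see with two lists that hold the same values. This is a minimal sketch with made-up variable names:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a                        # c is another name for the same list object

equal = a == b               # True: the two lists have the same value
same_object = a is b         # False: they are two separate objects
alias = a is c               # True: both names refer to one object

ordered = [1, 2, 3] < [1, 2, 4]   # True: decided at the first position that differs
chained = 1 < 2 <= 2              # True: comparisons can be chained

print(equal, same_object, alias, ordered, chained)
```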

#### Sequences

* In Python, a **sequence** is a collection of objects ordered by **position**
* There are 3 basic sequence types: lists, tuples, and range objects
* Additional sequence types exist for representing strings (text)
* All sequences support **common sequence operations** and each sequence type has its **additional type specific operations**
* These data types are called sequences because the objects they contain form a sequence
* Indexing for sequences starts from zero (0)
* Slicing logic: `[from position, to position)`, i.e., the start position is included and the end position is excluded
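
Zero-based indexing and half-open slices behave the same way for every sequence type. The following sketch uses a list and a range object; the names are illustrative only.

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]       # indexing starts at 0
last = letters[-1]       # negative indices count from the end
middle = letters[1:3]    # positions 1 and 2; position 3 is excluded
head = letters[:2]       # an omitted start means "from the beginning"
tail = letters[-2:]      # an omitted stop means "to the end"

# the same slicing rule applies to other sequences, such as ranges
some_numbers = list(range(10)[2:5])

print(first, last, middle, head, tail, some_numbers)
```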


#### Strings

* Strings are **immutable** sequences of characters
* String literals are enclosed in single (`'hi'`), double (`"hi"`), or triple quotes (`'''Computer says "No."'''`)
* Examples of common sequence operations on strings:

In [4]:
S = "Python"
print(len(S)) # print the length of the string
print(S[0])   # print the first character in the string
print(S[0:6]) # slicing - print the 0 - 5 positions
print(S[-1])  # print the last character in the string
"y" in S      # membership
"z" not in S

6
P
Python
n


True

* **Polymorphism** is the idea that the meaning of an operation, such as `+` and `*`, depends on the objects being operated on
* This is like in math: each operation has to be defined separately for each type of object
* Here, `+` is referred to as "concatenation" and `*` as "repetition"

In [42]:
S = "Python"
print(3*5)
print(3*S)
print(3 + 5)
print(S + ' is phun')
#print("eight is " + 8)     # string object + integer object (!) 
                            # This creates an error since strings and integers are different types
print("eight is " + str(8)) # string object + string object

15
PythonPythonPython
8
Python is phun
eight is 8


The above operations on strings were really generic sequence operations. Strings are a type of sequence and support all sequence operations. In addition, strings have operations all their own, which are available as **string methods**. Methods are functions attached to objects and they are triggered with a call expression.

In [43]:
S = 'Python'
S.find('y')

1

In [44]:
S.replace('y', 'Y')

'PYthon'

In [45]:
name = "Amy Poehler"
name.split(' ')   # returns ['Amy', 'Poehler']; name itself is unchanged
print(name)

Amy Poehler


In [46]:
name.upper()

'AMY POEHLER'

In [47]:
name.lower()

'amy poehler'

#### Lists 

* Lists are **mutable** sequences of objects of any type, typically used to store homogeneous items
* Lists are a type of sequence
* Strings vs. lists: Sequences of characters vs. sequences of any objects
* Strings vs. lists: Strings are immutable whereas lists are mutable
* It is common in practice for lists to hold objects of one type only (which may consist of many kinds of nested objects)

In [48]:
names = ["Peter", "John", "Mary"]
names[0]
names[1]
names.append("Tom")
names = names + ["Jim"]
names

['Peter', 'John', 'Mary', 'Tom', 'Jim']

In [49]:
import random
random.choice(names) 

'Peter'

In [50]:
names.reverse() # reverses the list in place (common mutable sequence operation)
names.sort()    # sorts the list in place (list method)

**NOTE:** Because lists are mutable, the above method calls actually *modify the original list*. Because strings are immutable, they cannot modify the string itself, so many methods return a new string. Therefore, with strings, if you want to store the result, you need to assign it to a variable.
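
The difference can be checked directly. In this sketch (variable names are made up), the list method changes the list and returns `None`, while the string method leaves the string alone and returns a new one:

```python
names = ["Peter", "John", "Mary"]
result = names.sort()      # sorts in place; the method itself returns None

s = "Python"
s.upper()                  # the new string is discarded, s is unchanged
upper_s = s.upper()        # assign the result to keep it

print(names, result)
print(s, upper_s)
```

A common mistake that follows from this is writing `names = names.sort()`, which replaces the list with `None`.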

#### Tuples 

* Tuples are **immutable** sequences typically used to store heterogeneous data
* Tuples are best viewed as a single object consisting of several parts
* You can use a tuple to return multiple values from a function (coming up)
* Although tuples and lists can often be used to perform similar tasks, a short tuple uses less memory than a short list because Python optimizes the performance of certain list methods (such as `append()`) by slightly over-allocating memory

In [51]:
T = (1,2,3,4)
len(T)
T + (5,6)             # concatenation
T[0]
# T[1] = 20           # immutable (!) raises TypeError: 'tuple' object does not support item assignment
x = 12.3; y = 22.5
coordinates = (x,y)   # tuple packing
(x, y) = coordinates  # tuple unpacking
x, y = coordinates    # tuple unpacking
coordinates = [(x,x**2) for x in range(10)]
for x, y in coordinates:
    print(x,y)

0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
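
Tuple immutability can also be observed without interrupting a notebook cell by catching the exception. The sketch below does that and then shows one possible workaround, building a modified copy through a list; the names `as_list` and `T2` are illustrative only.

```python
T = (1, 2, 3, 4)

try:
    T[1] = 20                  # tuples do not support item assignment
except TypeError as err:
    message = str(err)

# to get a changed version, build a new tuple instead
as_list = list(T)
as_list[1] = 20
T2 = tuple(as_list)

print(message)
print(T, T2)
```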

#### Ranges 

* Ranges are **immutable** sequences of integers commonly used in `for` loops
* Can use negative steps but cannot use non-integer steps
* Note that `range` objects are different from `list` objects
* Although you could use a list in `for` loops, `range` is much more memory efficient as it stores only start, stop, and step values, calculating individual items as needed

In [52]:
list(range(5))      # stop
list(range(1,6))    # start, stop
list(range(0,11,2)) # start, stop, step

[0, 2, 4, 6, 8, 10]
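
A few more properties of ranges, as a hedged sketch: negative steps count downwards, a large range still supports `len`, indexing, and membership tests without building a list, and a non-integer step raises a `TypeError`. The last lines show one common way around that restriction by scaling integers.

```python
countdown = list(range(10, 0, -3))   # negative step

big = range(1_000_000)               # no list of a million items is created
size = len(big)
last = big[-1]
contains = 999_999 in big

try:
    range(0, 1, 0.1)                 # non-integer steps are not allowed
except TypeError:
    float_step_allowed = False
else:
    float_step_allowed = True

tenths = [i / 10 for i in range(10)] # scale integers to get fractional steps

print(countdown, size, last, contains, float_step_allowed)
```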

#### Sets 

* Sets are unordered collections of distinct objects
* There are two types of sets: `set` is **mutable** whereas `frozenset` is **immutable**
* Because a set is unordered, it *cannot be indexed*
* The elements of a set are never duplicated, i.e., adding the same element again to a set has no effect (compared to `list.append()`)
* Useful for keeping track of distinct objects and for doing mathematical set operations (union, intersection, set difference)
* Sets come with non-mutating and mutating methods, where the mutating methods can be called only on instances of type `set`


In [53]:
nodes = set()         # empty set
nodes = set([2,4,6,8,10,12])
nodes.add(14)
nodes.add(6)          # adding a single item at a time 
nodes.update([6,14])  # adding multiple items at a time
nodes.pop()           # remove and return an arbitrary element
nodes.remove(4)       # remove a specified element
females = set([2,6,8])
males = nodes.difference(females)
print(nodes)          # difference is a nonmutating method
2 in males            # testing set membership

{6, 8, 10, 12, 14}


False
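
The mathematical set operations are also available as operators. This sketch uses two small illustrative sets and shows that a `frozenset` supports the non-mutating operations but has no `add` method:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b              # same as a.union(b)
common = a & b             # same as a.intersection(b)
only_a = a - b             # same as a.difference(b)

unique = set([1, 1, 2, 2, 3])      # duplicates are dropped

frozen = frozenset(a)
can_add = hasattr(frozen, "add")   # mutating methods are missing
frozen_union = frozen | b          # non-mutating operations still work

print(union, common, only_a, unique, can_add, frozen_union)
```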

#### Dictionaries 

* Dictionaries are mappings from key objects to value objects
* Consist of key-value pairs, where keys must be immutable and the values can be anything
* Dictionaries themselves are mutable, and can be seen as associative arrays
* Dictionaries can be used for performing fast lookups on unordered data
* Note that dictionaries are not sequences, so their items cannot be accessed by position; since Python 3.7, keys and values are iterated over in insertion order (earlier versions made no ordering guarantee)

In [54]:
md = {}      # create an empty dictionary
md = dict()  # create an empty dictionary
age = {'Tim': 29, 'Jim': 31, 'Pam': 27, 'Sam': 35} # create a dictionary named age
age['Tim']                                         # names are keys and numbers are values
age['Tim'] = age['Tim'] + 1
age['Tim'] += 1
age['Tom'] = 45

In [55]:
# To get a new view of the dictionary's keys
age.keys()


dict_keys(['Tim', 'Jim', 'Pam', 'Sam', 'Tom'])

In [56]:
# To get a new view of the dictionary's values

age.values()

dict_values([31, 31, 27, 35, 45])
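
Some other everyday dictionary operations, sketched with a smaller version of the `age` dictionary: membership tests look at keys, `get` supplies a default instead of raising a `KeyError`, and `items` yields key-value pairs. In Python 3.7 and later, iteration follows insertion order.

```python
age = {'Tim': 29, 'Jim': 31}
age['Pam'] = 27                    # adding a new key-value pair

has_tim = 'Tim' in age             # membership is tested against the keys
missing = age.get('Sam', 0)        # default value instead of a KeyError

pairs = []
for name, years in age.items():    # iterate over key-value pairs
    pairs.append((name, years))

del age['Jim']                     # remove a key-value pair
remaining = list(age)              # the keys that are left

print(has_tim, missing, pairs, remaining)
```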

### Python Compound Statements

* Compound statements contain (groups of) other statements and they affect or control the execution of those other statements in some way
* Compound statements typically span multiple lines
* A compound statement consists of one or more **clauses** where a clause consists of a header and a block or suite of code
* The clause headers of a particular compound statement start with a keyword, end with a colon, and are **all at the same indentation level**
* A block or suite of code of each clause however must be indented to indicate that it forms a group of statements that logically fall under the header
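* Python does not mandate a particular indentation width, but indentation must be consistent within a block, and Python 3 raises an error when tabs and spaces are mixed inconsistently
* 4 spaces per level is the most common choice and the one recommended by the PEP 8 style guide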
* Here's an example of a compound statement with 1 clause (1 header line + 2 lines of code in the block)

In [57]:
x = 3
y = 1

if x > y:
    difference = x - y
    print("x is greater than y")
    
print("But this gets printed no matter what!")

x is greater than y
But this gets printed no matter what!


#### The Python `if` statement:

if test:
    (block of code)
elif test:
    (block of code)
else:
    (block of code)

* The `if` statement selects from among one or more actions, and it runs the block associated with the first `if` or `elif` test that is true, or the `else` suite if all are false
* This example computes the absolute value of a difference, i.e., the distance between the numbers `x` and `y`
* Python has the built-in `abs` function for this purpose, so this is just a pedagogical example

In [58]:
if x > y:
    absval = x - y
elif y > x:
    absval = y - x
else:
    absval = 0
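
To check the logic against the built-in `abs`, the same three clauses can be wrapped in a function. The function name `distance` is made up for this sketch.

```python
def distance(x, y):
    """Return the absolute value of the difference between x and y."""
    if x > y:
        absval = x - y
    elif y > x:
        absval = y - x
    else:
        absval = 0
    return absval

print(distance(3, 1), abs(3 - 1))
print(distance(1, 3), abs(1 - 3))
print(distance(2, 2), abs(2 - 2))
```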

#### The Python `for` statement:

for target in sequence:
    (block of code)
    
* The `for` loop is a sequence iteration that assigns items in sequence to targets one at a time and runs the block of code for each item
* Unless the loop is terminated early with the `break` statement, the block of code is run as many times as there are items in the sequence
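
As a sketch of early termination, the loop below stops at the first negative number, so the items after it are never visited. The list and variable names are illustrative only.

```python
numbers = [4, 7, 10, -3, 8]

first_negative = None
checked = []
for n in numbers:
    if n < 0:
        first_negative = n
        break               # leave the loop immediately
    checked.append(n)

print(first_negative, checked)
```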


In [60]:
names = ['Peter', 'John', 'Mary', 'Helen', 'Tom', 'Nicholas'] 
for name in names:
    print(name)
    
for x in [0,1,2,3,4,5,6,7,8,9,10]:
    print(x)
    
for x in range(11):
    print(x)

Peter
John
Mary
Helen
Tom
Nicholas
0
1
2
3
4
5
6
7
8
9
10
0
1
2
3
4
5
6
7
8
9
10


A common operation is to take an existing list, apply some operation to all of the items of the list, and create a new list containing the results. In Python, there is a construct for this task known as a **list comprehension**. Consider the following approach to computing squares of a list of numbers:

In [61]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = []
for n in numbers:
    squares.append(n*n)

This can be easily implemented as a list comprehension:

In [62]:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squares = [n*n for n in numbers]

If we just care about the squares, we can do the following:

In [64]:
squares = [n*n for n in range(0,11)]
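
A list comprehension can also filter items with an `if` clause. The sketch below keeps only the squares of even numbers and shows the equivalent explicit loop for comparison; the variable names are made up.

```python
# comprehension with a condition
even_squares = [n*n for n in range(11) if n % 2 == 0]

# the equivalent explicit loop
even_squares_loop = []
for n in range(11):
    if n % 2 == 0:
        even_squares_loop.append(n*n)

# comprehensions work on any sequence, not just numbers
lengths = [len(name) for name in ["Peter", "John", "Mary"]]

print(even_squares, lengths)
```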

#### Python `for` Statement and Handling Files

* We commonly want to read data from a file or write data to a file
* The syntax for reading a file line-by-line:

In [None]:
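with open('input.txt') as F:      # the with statement closes the file automatically
    for line in F:
        line = line.rstrip()      # remove the trailing newline
        line = line.split(',')    # split the line into a list of fields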

Or more economically:

In [None]:
with open('input.txt') as F:
    for line in F:
        line = line.rstrip().split(',')

* Writing a file line-by-line:

In [None]:
F = open('output.txt', 'w')
F.write('Hello there! \n')    # need the newline character here (\n)
F.close()
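
Writing and reading can be combined into a small round trip. This sketch assumes a hypothetical file named `example.csv` placed in a temporary directory so that it does not touch any existing files, and it uses the `with` statement, which closes each file automatically.

```python
import os
import tempfile

# hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# write two comma-separated lines
with open(path, "w") as f:
    f.write("Tim,29\n")
    f.write("Jim,31\n")

# read the file back line by line
rows = []
with open(path) as f:
    for line in f:
        rows.append(line.rstrip().split(","))

print(rows)
```

Note that everything read from a text file is a string, so the ages would need `int()` before doing arithmetic with them.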

#### The Python `while` statement:

The `while` statement is used for repeated execution as long as an expression is true:

while expression:
    (block of code)
else:
    (block of code)

* The while loop repeatedly tests the expression
* If the expression is true, it executes the first block of code
* If the expression is false, it executes the second block of code, if present, and terminates the loop
* Whether to use `for` or `while` depends on whether you are looping over a known sequence or looping while some condition is true
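
As a minimal sketch of a `while` loop, the code below halves a number with floor division until it reaches 1. The number of iterations is not known in advance, which is the typical case for `while`. The variable names are illustrative only.

```python
value = 100
halvings = 0

while value > 1:
    value = value // 2      # floor division keeps the value an integer
    halvings += 1
else:
    finished = True         # runs once the expression becomes false

print(value, halvings, finished)
```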


### Python Functions

* Functions are devices for grouping statements so that they can be easily run more than once in a program
* Functions maximize code reuse and minimize code redundancy
* Functions enable dividing larger tasks into smaller chunks (procedural decomposition)
* Functions are written using the `def` statement
* The `return` statement sends the result object back to the caller

In [66]:
def add(a,b):
    result = a + b    
    return result

my_sum = add(2,3)
my_sum

5

* All names assigned in a function are local to that function and exist only while the function runs (unless using the `global` statement)
* Arguments are matched by position
* Use tuples to return multiple values from a function:

In [67]:
def add_and_subtract(a,b):
    sm = a + b
    df = a - b    
    return (sm, df)       # parentheses are optional here

my_sum, my_diff = add_and_subtract(5, 18)
print(my_sum, my_diff)

23 -13
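
Two of the points above can be checked with a short sketch. The name `local_total` is made up for this example: it is assigned inside the function, so it is not visible after the call. The sketch also shows that arguments can be passed by keyword instead of by position.

```python
def add(a, b):
    local_total = a + b     # local name, exists only during the call
    return local_total

by_position = add(2, 3)         # a=2, b=3 matched by position
by_keyword = add(b=3, a=2)      # matched by name, so the order does not matter

try:
    local_total                 # not defined outside the function
except NameError:
    visible_outside = False
else:
    visible_outside = True

print(by_position, by_keyword, visible_outside)
```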


* Functions do not exist until Python reaches and runs the `def` statement
* A function is not executed until the given function is called using the `function()` syntax
* The `def` statement creates an object and assigns it to a name
* Let's look at a slightly more complex example
* This function returns the intersection of two sequences:

In [68]:
def intersect(seq1,seq2):
    res = []
    for x in seq1:
        if x in seq2:
            res.append(x)
    return res
 
A = [1,3,5,7,9]
B = [2,3,4,5,6]
C = intersect(A,B)
C

[3, 5]

### Understanding Common Errors

* Not reading and/or understanding common error messages
* Not understanding basic built-in types (e.g., sets have no internal ordering)
* Trying to do an operation that is not supported by the object
* Accessing the object in a wrong way (e.g., key vs. index)
* Trying to modify immutable objects
* Not understanding nested objects
* Not knowing the type of an object
* Trying to operate on two objects of different type (e.g., trying to concatenate a string and a number)
* **Incorrect Indexing:** remember that indexing starts at 0 in Python, rather than 1, like in R
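
Several of these mistakes produce distinct exception types, and learning to recognize them makes error messages easier to read. The sketch below triggers four of them on purpose and records the exception names; the dictionary `errors` and its keys are made up for this example.

```python
errors = {}
age = {'Tim': 29}
names = ['Peter', 'John']

try:
    age[0]                      # dictionaries are accessed by key, not by index
except KeyError as err:
    errors['wrong access'] = type(err).__name__

try:
    names[2]                    # valid indices are 0 and 1
except IndexError as err:
    errors['bad index'] = type(err).__name__

try:
    "eight is " + 8             # str + int is not supported
except TypeError as err:
    errors['mixed types'] = type(err).__name__

try:
    "Python"[0] = "J"           # strings are immutable
except TypeError as err:
    errors['immutable'] = type(err).__name__

print(errors)
print(type(age), type(names))   # type() tells you what kind of object you have
```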
