# STA 141B - Lecture 3

January 11, 2022

#### Announcement

* Remote teaching until Jan 28
* Homework 1 posted, due Jan 19 at 11:59 pm
* Discussion session this week: GitHub and hints on hw1 (please attend)
* Zoom links (OH, Lectures, DS) can be found on Piazza --> "Resources"
* Thinking about group members?


#### Last time

* Python basics
* Tuples, lists, and dictionaries
* Methods and attributes (use . to access methods and attributes)
* If, while, for statements
* Modules


#### Today


* Modules and Packages
* Iteration: Loops, Comprehensions, and Generators
* NumPy and pandas

## Jupyter and Markdown

Jupyter breaks sections of the notebook into _cells_. You can choose the type of cell in the `Cell -> Cell Type` menu. Use "Code" for cells that contain code and "Markdown" for cells that contain text or images.

Code cells are set up to run Python code. When you open a Jupyter notebook, Jupyter runs a Python session called a _kernel_ in the background. Each time you run a code cell, the code is sent to the kernel, and then the results are printed in the notebook. The kernel maintains state between cells, so code you run in one cell can affect code you run in another cell.

__Caution!__ The state of the kernel depends on the order you run cells in, not the order cells appear in the notebook.

You can stop or restart the kernel using the `Kernel` menu. This is mostly useful when you want to cancel a computation.

Markdown cells allow you to input text and format it using the Markdown language. You can learn more about Markdown [here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

* This
* is
* a
* list

**This is bold**

Inline math goes between single dollar signs: $\int$

Display math goes between double dollar signs:

$$\int$$

In [2]:
x = 3

In [3]:
x

3

In [4]:
y

NameError: name 'y' is not defined

In [5]:
y = 3

In [6]:
y

3

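* Write one statement per line. Example: [here][link1]
* Convert notebooks (.ipynb) to .pdf or .html, for example with `jupyter nbconvert --to html notebook.ipynb` (PDF export with `--to pdf` also requires a LaTeX installation)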

[link1]: https://pedantic-python.readme.io/docs/one-statement-per-line
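
Here is a small illustration of the one-statement-per-line guideline. Both versions print 3, but the second is easier to read and to debug:

```python
# Harder to read: several statements squeezed onto one line
x = 1; y = 2; print(x + y)

# Easier to read: one statement per line
x = 1
y = 2
print(x + y)
```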

### Modules and Packages

A module is a file containing Python code. Any Python file (a .py file) is a module. Its name is the file's base name without the .py extension.

A package is a collection of Python modules organized in a directory.

Python's import command lets us load code from a module to use in our script or notebook. Note: import is like a combination of R's source() and library() functions.

Python provides many built-in modules for common tasks (see the [list](https://docs.python.org/3/py-modindex.html)).

In [1]:
# import pandas
import pandas

In [2]:
# import numpy
import numpy

In [3]:
# rename modules
import numpy as np

In [4]:
# import math
import math
# use math.pi
math.pi

3.141592653589793

In [1]:
# import your own (self-written) modules
import sys
# sys.path.append("path/to/folder")  # folder that contains file_name.py
# import file_name                   # module name = file name without .py
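
Below is a minimal sketch of the `sys.path.append` approach. It uses a hypothetical module named `mymodule` with a hypothetical function `greet()`. To keep the example self-contained, it first writes the module to a temporary folder. In practice you would point `sys.path.append` at the folder that already holds your own `.py` file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module to a temporary folder so the example is self-contained.
# "mymodule" and greet() are hypothetical names used only for illustration.
folder = tempfile.mkdtemp()
Path(folder, "mymodule.py").write_text(
    "def greet(name):\n"
    "    return 'Hello, ' + name\n"
)

# Add the folder to the list of places Python searches when importing
sys.path.append(folder)
importlib.invalidate_caches()  # make sure Python notices the newly created file

import mymodule  # the file name without the .py extension

print(mymodule.greet("STA 141B"))  # Hello, STA 141B
```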

The difference between __import module__ and __from module import foo__ is mainly subjective. Pick the one you like best and be consistent in your use of it. Here are some points to help you decide.

#### import module

__Pros:__
* Less maintenance of your import statements. Don't need to add any additional imports to start using another item from the module

__Cons:__
* Typing module.foo in your code can be tedious and redundant (tedium can be minimized by using import module as mo then typing mo.foo)

#### from module import foo

__Pros:__
* Less typing to use foo
* More control over which items of a module can be accessed

__Cons:__
* To use a new item from the module you have to update your import statement
* You lose context about foo. For example, it's less clear what ceil() does compared to math.ceil()
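
A small illustration of the two styles, using the standard `math` module:

```python
import math
from math import ceil

# Style 1: import the module, then qualify each name with "math."
print(math.ceil(2.1))   # 3
print(math.floor(2.9))  # 2 (no extra import needed to use floor())

# Style 2: import one name, then use it without the "math." prefix
print(ceil(2.1))        # 3
# Calling floor(2.9) here would raise NameError, because only ceil was imported
```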

Reference: [here](https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import)


More on __import__ can be found [here](https://docs.python.org/3/reference/import.html)

### Iteration

The three most important methods to repeat code for identical or similar tasks are:

<ol>
    <li>Loops (while and for)</li>
    <li>Comprehensions, Generators, and map()</li>
    <li>Vectorization (NumPy arrays and functions)</li>
</ol>    
    
These methods have tradeoffs. In general:

<ul>
    <li>Loops are the most flexible -- particularly while loops.</li>
    <li>Generators tend to use the least memory.</li>
    <li>Vectorization tends to be fastest. (Try to use vectorization as much as possible.)</li>
</ul>    
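
To make these tradeoffs concrete, the sketch below does one task three ways: it sums the squares of 0 to 9,999. It assumes NumPy is installed. All three results should agree, but which version is fastest depends on your machine.

```python
import numpy as np

n = 10_000

# 1. Loop: most flexible, but every step runs in the Python interpreter
total_loop = 0
for i in range(n):
    total_loop += i**2

# 2. Generator expression: values are produced one at a time, no list is stored
total_gen = sum(i**2 for i in range(n))

# 3. Vectorization: NumPy squares and sums the whole array in compiled code
arr = np.arange(n, dtype=np.int64)  # explicit 64-bit ints avoid overflow
total_vec = int(np.sum(arr**2))

print(total_loop, total_gen, total_vec)
```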
    
There are other methods for iteration, like recursion (more info [here][recursion1] and [here][recursion2]), but they are not common in statistical computing with Python.

[recursion1]: https://greenteapress.com/thinkpython2/html/thinkpython2006.html#sec62
[recursion2]: https://greenteapress.com/thinkpython2/html/thinkpython2007.html#sec74

#### 1. Loop tips and tricks

An iterable object is an object that can be iterated over, element by element. Examples: tuples, lists, strings.

Python's for-loops can automatically get elements from iterable objects.
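
Under the hood, a for loop calls `iter()` on the object to get an iterator. It then calls `next()` repeatedly until a `StopIteration` exception signals the end. A minimal sketch of that mechanism:

```python
word = 'hi'
it = iter(word)   # get an iterator over the string

print(next(it))   # h
print(next(it))   # i

# A third call has no element left, so it raises StopIteration,
# which is the signal a for loop uses to stop.
try:
    next(it)
except StopIteration:
    print('done')
```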

In [5]:
# DO THIS
for x in 'hello':
    print(x)

h
e
l
l
o


In [6]:
# NOT THIS
x = 'hello'
for i in [0, 1, 2, 3, 4]:
    print(x[i])

h
e
l
l
o


The range() function returns a sequence of integers. range(start, stop) begins at start and stops just before stop, so range(1, 5) gives 1, 2, 3, 4.

In [8]:
for i in range(1, 5):
    print(i)

1
2
3
4
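
`range()` also accepts an optional third argument, the step size. A few variations follow, converted with `list()` only so the values are visible:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]  (start defaults to 0)
print(list(range(1, 5)))       # [1, 2, 3, 4]     (stop is excluded)
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]     (step of 3)
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]  (negative step counts down)
```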


You can use list() to convert objects like ranges to lists.

Generally, you'll only need to do this for visual inspection. You DO NOT need to convert ranges into lists to use them in loops.

In [7]:
list(range(5))

[0, 1, 2, 3, 4]

Looping over a dictionary directly gives its keys. To iterate over the keys and values together, use the .items() method.

In [10]:
x = {'hello': 1, "goodbye": 2}

for elt in x:
    print(elt, x[elt])

hello 1
goodbye 2


In [11]:
for key, val in x.items():
    print(key, val)

hello 1
goodbye 2


In [12]:
list(x.items())

[('hello', 1), ('goodbye', 2)]
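
The `.keys()` and `.values()` methods work the same way when you only need one side of each pair:

```python
x = {'hello': 1, 'goodbye': 2}

for key in x.keys():      # same as `for key in x:`
    print(key)

for val in x.values():    # values only
    print(val)

print(sum(x.values()))    # 3
```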

__Zipping__ two sequences together means combining them into a sequence of tuples where:

<ul>
    <li>The first element of each tuple is an element from the first sequence.</li>
    <li>The second element of each tuple is an element from the second sequence.</li>
 </ul>

Usually it only makes sense to zip sequences that are the __same length__. If the lengths differ, zip() stops as soon as the shortest sequence runs out.

The zip() function zips two or more sequences. Use it to iterate over multiple sequences at the same time.

In [2]:
x = [1, 2, 3]
y = [4, 5, 6]

for x_elt, y_elt in zip(x, y):
    print(x_elt, y_elt)

1 4
2 5
3 6


In [6]:
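zip(x, y)        # returns a lazy zip object (not displayed: only the last line's value is shown)
list(zip(x, y))  # list() reveals its contents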

[(1, 4), (2, 5), (3, 6)]

In [14]:
list(zip(x, y, [7, 8, 9]))

[(1, 4, 7), (2, 5, 8), (3, 6, 9)]

In [14]:
print(list(zip(x, y, [7, 3])))
print(list(zip(x, y, [7, 3, 4, 8])))
print(type(zip(x,y)))
zip(x, y) + zip(y, x)  # TypeError: zip objects do not support +

[(1, 4, 7), (2, 5, 3)]
[(1, 4, 7), (2, 5, 3), (3, 6, 4)]
<class 'zip'>


TypeError: unsupported operand type(s) for +: 'zip' and 'zip'
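
To keep every element of the longer sequences instead, the standard library's `itertools.zip_longest()` pads the missing positions with a fill value. Zip objects cannot be added, so to combine them, convert them to lists or chain them lazily with `itertools.chain()`:

```python
from itertools import chain, zip_longest

x = [1, 2, 3]
y = [4, 5, 6]

# Pads the shorter sequence with None (or with the value given by fillvalue=)
print(list(zip_longest(x, [7, 3])))               # [(1, 7), (2, 3), (3, None)]
print(list(zip_longest(x, [7, 3], fillvalue=0)))  # [(1, 7), (2, 3), (3, 0)]

# Combining two zips: lists support +, or chain() joins the zip objects lazily
print(list(zip(x, y)) + list(zip(y, x)))
print(list(chain(zip(x, y), zip(y, x))))
# both print [(1, 4), (2, 5), (3, 6), (4, 1), (5, 2), (6, 3)]
```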

The enumerate() function zips together index numbers and a sequence. In other words, the function enumerates a sequence.

In [17]:
# If you absolutely must use index numbers, at least use enumerate() to get them
x = 'hello'

enumerate(x)
list(enumerate(x))

[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')]

In [None]:
for i, x_elt in enumerate(x):
    print("Position", i, "is", x_elt)
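
`enumerate()` also takes an optional `start` argument. This is handy when you want counting to begin at 1 instead of 0:

```python
x = 'hello'
for i, x_elt in enumerate(x, start=1):
    print("Letter", i, "is", x_elt)
# Letter 1 is h
# ...
# Letter 5 is o

print(list(enumerate('abc', start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```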

#### 2. Comprehensions and generators

A comprehension is a Python expression that transforms a sequence, element-by-element. The notation is similar to mathematical set notation:

In [17]:
# [x for x in Z]
[x**2 for x in range(5)]

[0, 1, 4, 9, 16]

In [18]:
import math
[math.sqrt(x) for x in range(5)] # think of this as Python's lapply()

[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0]

You can include a condition in a comprehension:

In [19]:
# Get all squares of even numbers from 0...10
# [x for x in Z if W]

x = [x**2 for x in range(11) if x % 2 == 0]
x

[0, 4, 16, 36, 64, 100]

In [20]:
[math.sin(y) for y in x]

[0.0,
 -0.7568024953079282,
 -0.2879033166650653,
 -0.9917788534431158,
 0.9200260381967907,
 -0.5063656411097588]
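
For comparison, the filtered comprehension `[x**2 for x in range(11) if x % 2 == 0]` does the same work as the loop below. The comprehension is just a more compact way to write it:

```python
squares = []
for x in range(11):
    if x % 2 == 0:             # the "if" clause of the comprehension
        squares.append(x**2)   # the expression at the front of the comprehension

print(squares)  # [0, 4, 16, 36, 64, 100]
print(squares == [x**2 for x in range(11) if x % 2 == 0])  # True
```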

You can also iterate over the elements inside nested lists (subelements).

In [19]:
x = [[1, 2, 3], [4, 5, 6]] # goal: print 1, 2, 3, 4, 5, 6

In [20]:
for sublist in x:
    for elt in sublist:
        print(elt)

1
2
3
4
5
6


In [21]:
[elt for sublist in x for elt in sublist]

[1, 2, 3, 4, 5, 6]

<mark>This is tricky!</mark> The `for` clauses appear in the same order as in the equivalent nested for loops, so the outermost loop comes first in the comprehension. This can be counterintuitive.
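
One way to check the order is to build pairs from two ranges. The first `for` clause (the outer loop) changes slowest, exactly as in the equivalent nested loops:

```python
pairs = [(i, j) for i in range(2) for j in range(3)]
print(pairs)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# The same result with explicit nested loops, written in the same order
pairs_loop = []
for i in range(2):          # outer loop: first "for" in the comprehension
    for j in range(3):      # inner loop: second "for"
        pairs_loop.append((i, j))
print(pairs == pairs_loop)  # True
```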

A comprehension surrounded by [ ] is called a list comprehension and produces a list.

A comprehension surrounded by { } (and including : ) is called a dictionary comprehension and produces a dictionary.

In [24]:
x = ["hello", "goodbye"]

lens = {name: len(name) for name in x} # map each name to its length
lens

{'hello': 5, 'goodbye': 7}

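Curly braces without : create a set instead. Like dictionary keys, set elements cannot repeat, so duplicates are dropped. A comprehension in { } without : is a set comprehension and produces a set.

In [24]:
x = {1, 2, 2, 4}  # a set, not a dictionary: the duplicate 2 is dropped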
x

{1, 2, 4}

In [27]:
x = [1, 2, 2, 4]
x

[1, 2, 2, 4]

In [26]:
{x**2 for x in [-1, 0, 1]}

{0, 1}
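
A related pitfall: empty braces `{}` create an empty dictionary, not an empty set. Use `set()` for an empty set. A quick check with `type()`:

```python
empty = {}
print(type(empty))        # <class 'dict'>
print(type(set()))        # <class 'set'>
print(type({1, 2}))       # <class 'set'>
print(type({1: 'a'}))     # <class 'dict'>
```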

#### 3. Generator Expressions
There's no such thing as a tuple comprehension. Instead, a comprehension surrounded by ( ) is called a generator expression.

In [3]:
y = (x**2 for x in range(11) if x % 2 == 0)

In [4]:
# to print out result
for i in y:
    print(i, end=" ")

0 4 16 36 64 100 

In [5]:
sum(y) # returns 0: the loop above already consumed every value of y

0
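
A generator can be consumed only once. To use the values again, create a new generator, or store the values in a list if you need them several times:

```python
# Re-create the generator, since each generator can be consumed only once
y = (x**2 for x in range(11) if x % 2 == 0)
print(sum(y))  # 220 (0 + 4 + 16 + 36 + 64 + 100)
print(sum(y))  # 0, because y is now exhausted again
```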

A generator yields one item at a time and computes each item only when it is requested. A list comprehension, by contrast, builds the whole list at once, so Python must reserve memory for every element. This makes generator expressions more memory efficient than lists.

In [30]:
from sys import getsizeof
  
comp = [i for i in range(10000)] # list 
gen = (i for i in range(10000)) # generator
  
#gives size for list comprehension
x = getsizeof(comp) 
print("x = ", x)
  
#gives size for generator expression
y = getsizeof(gen) 
print("y = ", y)

x =  87616
y =  112


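The timings below show a dramatic difference, but interpret them carefully. Creating a generator expression does not compute any elements until the generator is consumed. The second timing therefore only measures setting up the generator, while the first measures building the entire list. Generator expressions save memory, but they are not automatically faster.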

In [31]:
import timeit
  
print(timeit.timeit('''list_com = [i for i in range(100) if i % 2 == 0]''', number=1000000))

4.941507946999991


In [32]:
import timeit
  
print(timeit.timeit('''gen_exp = (i for i in range(100) if i % 2 == 0)''', number=1000000))

0.3718962149996514
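
A fairer comparison makes both versions produce every value, for example by summing them. This is a sketch: the numbers vary by machine, and neither approach is guaranteed to win.

```python
import timeit

n_runs = 10_000

# Both statements compute every even number below 100 and add them up
t_list = timeit.timeit('sum([i for i in range(100) if i % 2 == 0])', number=n_runs)
t_gen = timeit.timeit('sum(i for i in range(100) if i % 2 == 0)', number=n_runs)

print('list comprehension:  ', t_list)
print('generator expression:', t_gen)
```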


Python's itertools module has functions for manipulating generators and iterable objects.
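
A few examples from `itertools`, all of which are in the standard library:

```python
from itertools import count, islice, takewhile

# count() is an infinite iterator: 0, 1, 2, ...
# islice() takes just the first 5 values, so the infinite iterator is safe to use
print(list(islice(count(), 5)))                                   # [0, 1, 2, 3, 4]

# takewhile() stops at the first value that fails the condition
print(list(takewhile(lambda v: v < 20, (i**2 for i in count()))))  # [0, 1, 4, 9, 16]
```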

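A generator is a special kind of iterable that computes its elements on demand. Example: generator expressions. Ranges behave similarly and produce their elements on demand instead of storing them all, although technically a range is a lazy sequence rather than a generator.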

Generators are especially useful for working with data that are __too large__ to fit in memory. While making a huge list (say $10^9$ elements) might use enough memory to crash Python, making a generator with the same number of elements uses almost no memory.
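
A minimal sketch of the memory point follows. A `range` over $10^9$ integers takes only a few bytes because nothing is computed until it is needed. You can also write your own generator as a function that uses `yield`; the function name `evens` is hypothetical and used only for illustration.

```python
from sys import getsizeof

big = range(10**9)
print(getsizeof(big))  # a few dozen bytes, not gigabytes

def evens(limit):
    """Yield even numbers below `limit`, one at a time (hypothetical helper)."""
    n = 0
    while n < limit:
        yield n
        n += 2

gen = evens(10**9)
print(next(gen), next(gen), next(gen))  # 0 2 4
print(getsizeof(gen))  # also small: values are produced on demand
```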

See more examples [here](https://zacks.one/python-generators/)
