# Hello Everybody!

Welcome to the REU Programming Workshop. The rough agenda will be:

- Brief overview of what Anaconda is (further help & discussion after the workshop will be available).
- Introduction to Python and Jupyter
  - We will use Google Colab for simplicity
- Basic use of Python in Jupyter
- Introduction to NumPy

# Anaconda

Anaconda is a mostly platform-independent Python distribution and package manager that makes installing and managing scientific software a lot easier! **Highly recommended**.

- https://www.anaconda.com
- The most standardized way to install Python and related tools on Windows and macOS, with a very large community!
- Bundles many of the most widely used scientific Python packages (NumPy, SciPy, pandas, Matplotlib, Jupyter) out of the box!

# Python

Python is one of the most popular programming languages in the scientific community, except for very high-performance work. Most of this workshop is aimed at introducing the common tools available to Python users.

- If you install Anaconda, you will have Python!



# Jupyter

Jupyter is a convenient, "interactive" interface built on top of the Python programming language. It lets you execute little pieces of code at a time, and it lets you mix text and code to create a self-contained writeup.

- We will mostly use https://colab.research.google.com during this workshop, a convenient web-based tool that includes most of Jupyter's feature set.
- Jupyter supports using "Markdown", a very popular text annotation convention, to format text (e.g. **bold**, *italic*). My bookmarked cheat sheet is https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

# **CODING TIME**



---



Below, we will run our first line of code in Jupyter.

In [None]:
# this is a comment
# Click this cell, and press CTRL + Enter to execute this cell.
print("Hello World")

Hello World


## Universal Code State ("Runtime")

Jupyter saves code state across cells in a Python "runtime".

In [None]:
# this is one cell
a = 5

In [None]:
# this is another cell
# make sure to run (CTRL+Enter) the previous cell before this one!
print(a)

5


## Importing Packages

Python (and therefore Jupyter) allows you to "import" packages to extend the core Python functionality. Here, we will use NumPy to compute the sine of 5 radians.

In [None]:
import numpy as np
print("The sin of 5 radians is", np.sin(5))

The sin of 5 radians is -0.9589242746631385
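There are a few equivalent ways to write an import. The sketch below compares them; it also uses the standard library's `math` module, which is not part of the workshop material and is included only for contrast.

```python
import math
import numpy as np
from numpy import sin  # import a single name directly

print(np.sin(5))    # function accessed through the package alias
print(sin(5))       # the same function, imported by name
print(math.sin(5))  # the standard library's sine, which works on single numbers only
```

The `np.` prefix is usually preferred because it makes clear which package a function comes from.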


Once you have imported a package, you can view the description ("docstring") of one of its functions while you are typing the function call.

- In Jupyter, use `Shift + Tab` to do this

In [None]:
# type np.sin() and see the popup!
# np.cos()

You can also view documentation using the built-in `help` function.

In [None]:
help(np.array)

Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array, the
        newly c

## Restarting Runtime

Sometimes, you may want to "restart runtime" when you've done something silly, such as overwriting a name you still need:

In [None]:
np = 5
print(np.sin(5))

AttributeError: 'int' object has no attribute 'sin'

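Oh no! The name `np` now refers to the integer 5 instead of the NumPy package. Now, comment out the above cell, "restart runtime", then "run before" (both are in Colab's Runtime menu).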

- **NB:** It's good practice to make sure your Jupyter notebook runs from start to finish, if you want to email it to somebody. It's like showing your work!

In [None]:
print(np.sin(5))

-0.9589242746631385
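Restarting is the safest fix because it clears everything. As a lighter alternative for this particular mistake, re-running the import also rebinds the name. A minimal sketch, assuming the only damage was overwriting `np`:

```python
import numpy as np

np = 5  # oops: the name np now refers to an int
try:
    np.sin(5)
except AttributeError as err:
    message = str(err)
    print("Broken:", message)

import numpy as np  # re-running the import rebinds the name to the package
print(np.sin(5))
```

This only repairs the one name; any other leftover state in the runtime stays as it was.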


# Useful Programming Features

### Lists and indexing

In [None]:
# get a list of all numbers below 20
numbers = list(range(20))

In [None]:
# print every number starting with the 10th (zero-indexed!)
print(numbers[10: ])

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


In [None]:
# print every number until the second to last (exclusive)
print(numbers[ :18])
print(numbers[ :-2]) # negative notation!

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]


In [None]:
# print every third number
print(numbers[::3])

[0, 3, 6, 9, 12, 15, 18]


In [None]:
# GETTING WILD
print(numbers[1:-1:2])

[1, 3, 5, 7, 9, 11, 13, 15, 17]


Just remember that the syntax is `[start:stop:stride]`
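A few more cases of the same syntax, as a small sketch: omitted parts fall back to defaults, a negative stride walks backwards, and out-of-range bounds are clipped rather than raising an error.

```python
numbers = list(range(20))

# Omitted parts fall back to defaults: start=0, stop=len(numbers), stride=1.
first_five = numbers[:5]
evens = numbers[0:20:2]          # same result as numbers[::2]

# A negative stride walks backwards, so this reverses the list.
reversed_numbers = numbers[::-1]

# Slices never raise IndexError; out-of-range bounds are clipped.
clipped = numbers[15:100]

print(first_five)                # [0, 1, 2, 3, 4]
print(evens)
print(reversed_numbers[:3])      # [19, 18, 17]
print(clipped)                   # [15, 16, 17, 18, 19]
```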

### Defining functions

It is almost always recommended to split your code up into separate, well-named functions for readability. For example:

In [None]:
def list_sum(lst):
  """
  This is a docstring. It is recommended to at least document your code somewhat if it can get messy. For instance:

  Given a list, returns the sum of the entries of the list.

  Many fancier documentation conventions exist, but any are better than none.
  """
  total = 0  # avoid the name `sum`, which would shadow Python's built-in sum()
  for item in lst:
    total += item
  return total

In [None]:
# in a new cell, we can call list_sum!
my_lst = [1, 2, 3, 4, 5]
print("The sum of the list is", list_sum(my_lst))

The sum of the list is 15
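The docstring is not just a comment: Python stores it on the function, and it is what `help` and the `Shift + Tab` popup display. The sketch below redefines a shortened `list_sum` so that it runs on its own, and checks it against Python's built-in `sum`.

```python
def list_sum(lst):
    """Given a list, return the sum of its entries."""
    total = 0
    for item in lst:
        total += item
    return total

my_lst = [1, 2, 3, 4, 5]

# The docstring is stored on the function object itself.
print(list_sum.__doc__)

# Python's built-in sum() does the same job, so we can use it as a check.
print(list_sum(my_lst) == sum(my_lst))  # True
```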


# Numpy In Depth

"Numeric Python" (not num-pee):

- Mathematical functions (sine like before!)
- Faster numerical computations (pure Python loops are slow; NumPy runs its array operations in precompiled C code)

## Numpy Mathematical Functions

In [None]:
# import numpy again ("np" is convention)
import numpy as np

# Numpy uses radians by default
print("The sine of pi / 6 is", np.sin(np.pi / 6))

# Numpy can convert for us though
print(np.cos(np.radians(60)))

The sine of pi / 6 is 0.49999999999999994
0.5000000000000001


### Bonus topic!

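Ohh, that's ugly. Neither result is exactly 0.5 because floating-point numbers cannot represent values like pi / 6 exactly, so tiny rounding errors creep in. There are two solutions: rounding, and "string formatting".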
- For a string formatting cheat sheet, check out: https://kapeli.com/cheat_sheets/Python_Format_Strings.docset/Contents/Resources/Documents/index

In [None]:
print(np.round(np.sin(np.pi / 6), 6)) # round to millionths
print("{:.6g}".format(np.sin(np.pi / 6))) # print up to 6 significant digits (trailing zeros dropped)
print("{:.6f}".format(np.sin(np.pi / 6))) # print exactly 6 (zero padded) digits after the decimal point

0.5
0.5
0.500000
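The same format codes also work inside "f-strings" (available in Python 3.6 and later), which many people find easier to read. A short sketch; the variable names are only for illustration:

```python
import numpy as np

value = np.sin(np.pi / 6)

# f-strings accept the same format specs as str.format().
fixed = f"{value:.6f}"    # fixed-point: 6 digits after the decimal point
sci = f"{value:.3e}"      # scientific notation: 3 digits after the decimal point
padded = f"{value:10.2f}" # right-aligned in a field 10 characters wide

print(fixed)   # 0.500000
print(sci)     # 5.000e-01
print(padded + "|")
```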


### Back on track

In [None]:
print("The natural log of e is", np.log(np.e))
print("The base 10 log of 5 is {:.5g}".format(np.log10(5)))
print("The inverse sine of 0.5 in degrees is {:.5g}".format(np.degrees(np.arcsin(0.5))))

The natural log of e is 1.0
The base 10 log of 5 is 0.69897
The inverse sine of 0.5 in degrees is 30


## Numpy utility functions

In [None]:
print("10 numbers between 0 and 1 inclusive")
print(np.linspace(0, 1, 10))

10 numbers between 0 and 1 inclusive
[0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]
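Note that 10 points span only 9 gaps, which is why the spacing above is 1/9 rather than 0.1. The sketch below shows two optional `np.linspace` arguments, `retstep` and `endpoint`, that help when the spacing matters.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points.
points, step = np.linspace(0, 1, 10, retstep=True)
print(step)  # about 0.111, i.e. 1/9

# Asking for 11 points gives the "round" spacing of 0.1.
eleven = np.linspace(0, 1, 11)
print(eleven)

# endpoint=False leaves out the stop value, like range() and np.arange().
open_ended = np.linspace(0, 1, 10, endpoint=False)
print(open_ended)
```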


In [None]:
print("Count from 0 (inclusive) to 20 (exclusive) by 2s")
print(np.arange(0, 20, 2))

Count from 0 (inclusive) to 20 (exclusive) by 2s
[ 0  2  4  6  8 10 12 14 16 18]


Fun problem! Print a list containing all perfect squares below 100, two ways:

In [None]:
# way 1
squares = []
for i in range(100):
  if np.sqrt(i) == np.floor(np.sqrt(i)):
    squares.append(i)
print(squares)

# way 2
vals = np.arange(100)
square_idxs = np.where(np.sqrt(vals) == np.floor(np.sqrt(vals)))[0]
print(vals[square_idxs])

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[ 0  1  4  9 16 25 36 49 64 81]
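Two further variations, offered as a sketch rather than as the intended solution: a boolean mask can index the array directly without `np.where`, and the squares can also be generated instead of searched for.

```python
import numpy as np

# Variation A: index with the boolean mask directly (no np.where needed).
vals = np.arange(100)
roots = np.sqrt(vals)
squares_masked = vals[roots == np.floor(roots)]
print(squares_masked)

# Variation B: generate the squares instead of testing every number.
limit = 100
num_roots = int(np.ceil(np.sqrt(limit)))  # 10, so the roots are 0..9
squares_direct = np.arange(num_roots) ** 2
print(squares_direct)
```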


## 2D Arrays

Numpy supports multidimensional arrays! You can slice them in all sorts of fun ways.

In [None]:
twod_arr = np.array([[1, 2], [3, 4]])
print(twod_arr)

[[1 2]
 [3 4]]


In [None]:
print(twod_arr[0], twod_arr[0, :]) # specify multiple indices via a comma; later indices can be omitted

[1 2] [1 2]


In [None]:
print(twod_arr[:, 0]) # slice notation in the first index this time

[1 3]


In [None]:
print(twod_arr[1, 1]) # a single element

4
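Slicing is easier to see on a slightly larger array. The sketch below builds a hypothetical 3 x 4 array with `reshape` and combines row and column slices; the last line uses a boolean mask, which was not covered above.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns, values 0..11
print(grid.shape)        # (3, 4)

lower_left = grid[1:, :2]   # rows 1 and 2, columns 0 and 1
last_row = grid[-1, ::2]    # last row, every other column
large = grid[grid > 8]      # boolean mask: entries greater than 8

print(lower_left)
print(last_row)          # [ 8 10]
print(large)             # [ 9 10 11]
```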


## The strength of numpy

Loops are **very** slow in Python, and for numerical work on large arrays you should try to avoid them as much as possible. Let's see a few examples.

In [None]:
import time

# using loops
num_nums = 1000000
a = list(range(num_nums))
start = time.time()
for idx in range(num_nums):
  a[idx] = a[idx]**2
print("Took {:.6f} seconds".format(time.time() - start))

# using numpy
a = np.arange(num_nums)
start = time.time()
a = a**2
print("Took {:.6f} seconds".format(time.time() - start))

Took 0.375018 seconds
Took 0.001565 seconds


This is especially useful for all of the built-in NumPy functions!

In [None]:
# using loops
num_nums = 1000000
a = list(range(num_nums))
start = time.time()
for idx in range(num_nums):
  a[idx] = np.sin(a[idx])
print("Took {:.6f} seconds".format(time.time() - start))

# using numpy
a = np.arange(num_nums)
start = time.time()
a = np.sin(a)
print("Took {:.6f} seconds".format(time.time() - start))

Took 1.651563 seconds
Took 0.022850 seconds
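Single measurements with `time.time()` can be noisy. One alternative is the standard library's `timeit` module, which repeats the measurement. This is a sketch with hypothetical helper functions and a smaller array so that it runs quickly; unlike the cells above, each timing here includes creating the data.

```python
import timeit
import numpy as np

num_nums = 10_000  # smaller than above so the example runs quickly
repeats = 20

def square_with_loop():
    values = list(range(num_nums))
    for idx in range(num_nums):
        values[idx] = values[idx] ** 2
    return values

def square_with_numpy():
    return np.arange(num_nums) ** 2

# Both approaches must agree before a timing comparison means anything.
assert square_with_loop() == square_with_numpy().tolist()

# timeit calls each function `number` times and returns the total seconds.
loop_time = timeit.timeit(square_with_loop, number=repeats)
numpy_time = timeit.timeit(square_with_numpy, number=repeats)
print("Loop:  {:.6f} seconds per run".format(loop_time / repeats))
print("NumPy: {:.6f} seconds per run".format(numpy_time / repeats))
```

In a notebook, the `%timeit` magic offers the same idea in a single line.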


NumPy is designed around this style, so ordinary arithmetic operators work elementwise on whole arrays: you can multiply and add NumPy arrays without writing a loop!

In [None]:
a = np.arange(10)
b = np.arange(10)
print(2 * a)
print(a * b)

[ 0  2  4  6  8 10 12 14 16 18]
[ 0  1  4  9 16 25 36 49 64 81]
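A few more elementwise operations, sketched with small arrays so the results are easy to check by hand. The last lines contrast this with plain Python lists, where `*` repeats the list instead of multiplying its entries.

```python
import numpy as np

a = np.arange(5)          # [0 1 2 3 4]
b = np.arange(5, 10)      # [5 6 7 8 9]

print(a + b)              # elementwise sum: [ 5  7  9 11 13]
print(a + 10)             # the scalar is added to every entry
print(a * b)              # elementwise product: [ 0  6 14 24 36]
print(np.dot(a, b))       # dot product = sum of elementwise products: 80

# Plain lists behave differently: * repeats the list.
repeated = [0, 1, 2] * 2
print(repeated)           # [0, 1, 2, 0, 1, 2]
```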


### Challenge example & problem: create a 2d array of products

We will try to create a 2D array such that `a[i, j] = i * j`. Let's see how much faster vectorization is!

In [None]:
# First, a python way
def twod_products_python(n):
  res = np.zeros((n, n))
  for i in range(n):
    for j in range(n):
      res[i, j] = i * j
  return res
print(twod_products_python(4))

[[0. 0. 0. 0.]
 [0. 1. 2. 3.]
 [0. 2. 4. 6.]
 [0. 3. 6. 9.]]


In [None]:
# how fast is Python?
import time
start = time.time()
res = twod_products_python(2000)
print("Took {:.6f} seconds".format(time.time() - start))

Took 0.792772 seconds


In [None]:
# Can we do better?
def twod_products_vec(n):
  arr_vals = np.arange(n)
  return np.outer(arr_vals, arr_vals)

In [None]:
start = time.time()
res_vec = twod_products_vec(2000)
print("Took {:.6f} seconds".format(time.time() - start))
print(len(np.where(res_vec != res)[0])) # count entries where the two results differ (should be 0)

Took 0.032668 seconds
0
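`np.outer` is not the only vectorized option. As a sketch of an alternative, the same table can be built with "broadcasting": multiplying a column of shape `(n, 1)` by a row of shape `(1, n)` produces an `(n, n)` result. The function name below is hypothetical.

```python
import numpy as np

def twod_products_broadcast(n):
    arr_vals = np.arange(n)
    # Column vector (n, 1) times row vector (1, n) broadcasts to (n, n).
    return arr_vals[:, np.newaxis] * arr_vals[np.newaxis, :]

res_broadcast = twod_products_broadcast(4)
print(res_broadcast)

# Check against np.outer: every entry should match.
matches_outer = np.array_equal(res_broadcast, np.outer(np.arange(4), np.arange(4)))
print(matches_outer)  # True
```

Broadcasting generalizes to other operations as well, for example replacing `*` with `+` gives a table of sums.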
