# PSTAT 160A Fall 2021: Getting started with Python and Jupyter notebooks

What is Python?

* Python is an interpreted language (no separate compilation step is needed, unlike in C or Java).
* Python is a scripting language (compact code, fast prototyping).
* Python is one of the most frequently taught programming languages in university CS departments.
* Many machine learning libraries are developed in Python (e.g. Scikit-Learn, PyBrain, Theano, PyTorch, ...).

What is a Jupyter Notebook? 

* An interactive document that contains both executable computer code (e.g. Python) and rich text elements (paragraphs, equations, figures, links, LaTeX, HTML, etc.).
* Can be used for numerical simulations, statistical modeling, data visualization, machine learning, and much more...
* Code is executed by a notebook kernel (“computational engine”), e.g., the IPython kernel executes Python code, but kernels for many other languages exist too.

Some useful links:
* The Python Standard Library: https://docs.python.org/3/library/index.html
* The Python Language Reference: https://docs.python.org/3/reference/index.html#reference-index
* **NumPy** package: http://www.numpy.org
* SciPy package: https://docs.scipy.org/doc/scipy-1.2.1/reference/ <br>
(in particular <tt>scipy.stats</tt> https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html)
* **Matplotlib** package: https://matplotlib.org

Tutorials and guides:
* Learning Python: https://www.codecademy.com/learn/learn-python-3 
* Jupyter: https://jupyter.org, https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html
* Very useful Jupyter Notebook shortcuts: https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330

Another useful package (not used in PSTAT 160A):
* Pandas: http://pandas.pydata.org

## 1. Python basics:

In [1]:
print("hello world")

hello world


In [2]:
x = 3

In [3]:
x

3

In [4]:
type(x)

int

In [5]:
x = 1.0

In [6]:
x

1.0

In [7]:
type(x)

float

In [8]:
x = float(1)
x, type(x)

(1.0, float)

In [9]:
x = int(1.0)
x, type(x)

(1, int)

In [10]:
x = 5.4
x , type(x)

(5.4, float)

### Operators

In [None]:
1+2

In [None]:
1.0+2.0

In [None]:
1.0+2

In [None]:
2/4

In [None]:
2**3

Operators can be applied to more complex types of objects, and the way they apply depends on these types:

In [12]:
a = [1,2,3]
a

[1, 2, 3]

In [13]:
type(a)

list

In [14]:
a = list([1,2,3])
a

[1, 2, 3]

In [15]:
L=[3,6,8,10]
L

[3, 6, 8, 10]

In [16]:
L[0], L[1]

(3, 6)

In [17]:
a[0],a[1],a[2]

(1, 2, 3)

In [18]:
b = list([2,3,4])
b

[2, 3, 4]

In [19]:
a[1] + b[1]

5

In [None]:
a + b
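
For lists, `+` concatenates instead of adding element by element. A minimal sketch of the difference, reusing the lists `a` and `b` from above (the names `concatenated`, `elementwise` and `repeated` are introduced here only for illustration):

```python
a = [1, 2, 3]
b = [2, 3, 4]

# For lists, + joins the two lists end to end
concatenated = a + b

# An element-wise sum needs an explicit loop or a comprehension
elementwise = [x + y for x, y in zip(a, b)]

# * with an integer repeats the list
repeated = a * 2

print(concatenated)
print(elementwise)
print(repeated)
```

NumPy arrays, introduced in Section 2, apply `+` element-wise instead.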

### List, range in the Standard Python Library

Always check the Python Standard Library documentation to learn about built-in types, built-in functions, etc.


In [None]:
x = [1,2,3]
x

In [None]:
type(x)

In [None]:
x[0]

In [None]:
x[0]+x[1]+x[2]

In [None]:
L = x

Check https://docs.python.org/3/tutorial/datastructures.html?highlight=class%20list for class "list"!

In [None]:
sum(L)

In [None]:
L.append(90)
L

In [None]:
x
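
Note that `L = x` does not copy the list: both names refer to the same object, which is why `x` also shows the appended value. A small sketch of the difference between an alias and a copy (the name `x_copy` is illustrative):

```python
x = [1, 2, 3]

L = x               # L and x are two names for the same list object
L.append(90)        # the change is visible through both names

x_copy = x.copy()   # an independent (shallow) copy
x_copy.append(-1)   # x is not affected

print(x)            # [1, 2, 3, 90]
print(L is x)       # True
print(x_copy is x)  # False
```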

In [None]:
x.count(2)

In [None]:
y = [1,1,2,2,2,4,90]

In [None]:
y.count(9)

In [None]:
x.index(3)

In [None]:
x.index(90)

In [None]:
cars = ['BMW','Audi','Benz','Volkswagen']
cars

In [None]:
type(cars)

In [None]:
cars[0]

In [None]:
cars.append('Porsche')

In [None]:
cars

In [None]:
cars.count('BMW')

In [None]:
cars.index('Benz')

Check https://docs.python.org/3/library/stdtypes.html?highlight=range#range for type "range"!

In [None]:
list(range(15))

In [None]:
list(range(4,15,3))

In [None]:
list(range(0,10,1))

In [None]:
z = list(range(2,10,2))
z

In [None]:
z.append(9)
z

### Precedence of operators

In [None]:
1+2*3

In [None]:
(1+2)*3

In [None]:
1.0/2.0/2.0

In [None]:
1.0/(2.0/2.0)
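
Two further precedence rules are easy to get wrong: exponentiation binds more tightly than a leading minus sign, and chained `**` groups from right to left, while `/` groups from left to right. A short sketch (variable names are illustrative):

```python
neg_square = -2**2          # parsed as -(2**2)
paren_square = (-2)**2      # parentheses force the sign to be squared too
tower = 2**3**2             # parsed as 2**(3**2)
left_to_right = 1.0/2.0/2.0 # parsed as (1.0/2.0)/2.0

print(neg_square, paren_square, tower, left_to_right)
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.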

### Functions

In [None]:
def f(x):
    y = x**2
    return y

In [None]:
f(4)

A function can be treated like any other variable:

In [None]:
f = lambda x: x**2

In [None]:
f(3)

In [None]:
g= lambda y: y**(2*y-4)
g(3)

A function does not even need a name:

In [None]:
(lambda x: x**2)(2)
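
Because functions are ordinary objects, they can also be passed as arguments to other functions. A minimal sketch, where `apply_twice` is a hypothetical helper written for this example while `map` and `sorted` are Python built-ins:

```python
def apply_twice(func, value):
    """Apply func to value, then apply it again to the result."""
    return func(func(value))

square = lambda x: x**2

result = apply_twice(square, 3)          # (3**2)**2
squares = list(map(square, [1, 2, 3]))   # map applies square to each element
by_length = sorted(["pear", "fig", "banana"], key=len)  # len is used as the sort key

print(result, squares, by_length)
```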

### Iterators

In [None]:
for i in range(1,11,1):
    print(i)     

In [None]:
a = list(range(10))
a
for i in range(10):
    a[i] = f(i)
    print(a[i])

In [None]:
a = []
for i in range(10):
    a.append(f(i))

In [None]:
a

In [None]:
a = [0]
i = 0

while a[i] < 10:
    a.append(i+1)
    i =  i + 1

In [None]:
a

In [None]:
a=[]
a

In [None]:
x = int(-3)

In [None]:
x

In [None]:
x= 5

In [None]:
if x < 0:
    print('x is negative')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('x is larger than 1')

See https://docs.python.org/3/reference/compound_stmts.html for <tt>for</tt> loops and <tt>if</tt>/<tt>while</tt> statements.
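
The pattern of filling a list with repeated `append` calls can often be written more compactly as a list comprehension. A sketch that repeats the squaring loop from above (`f` is redefined here so the block runs on its own; the other names are illustrative):

```python
def f(x):
    return x**2

# Loop version, as in the cells above
a_loop = []
for i in range(10):
    a_loop.append(f(i))

# Equivalent list comprehension
a_comp = [f(i) for i in range(10)]

# A comprehension can also filter with a condition
even_squares = [f(i) for i in range(10) if i % 2 == 0]

# enumerate yields (index, value) pairs
for index, value in enumerate(a_comp[:3]):
    print(index, value)
```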

## 2. NumPy Package:

* Python is a simple and compact scripting language, but relatively slow!
* NumPy is the fundamental package for scientific computing with Python.

Quickstart Tutorial for NumPy: https://www.numpy.org/devdocs/user/quickstart.html

Documentation: https://www.numpy.org/devdocs/genindex.html

### Performance evaluation

To check that Numpy provides a computational benefit over standard Python, we can compare the running time of a similar computation performed in Python and in Numpy.

In [None]:
import numpy
import time

In [None]:
## Adding two vectors in python 
a = [i for i in range(1000000)] 
b = [1 for i in range(1000000)]
c = [0 for i in range(1000000)] # output vector (initialized to zero)


In [None]:
## Start the computation
start = time.process_time()

for i in range(1000000):
    c[i] = a[i] + b[i]

end = time.process_time()

print("%.3f seconds"%(end-start))

In [None]:
## Adding two vectors in numpy 

a = numpy.arange(1000000)
b = numpy.ones(1000000)
c = numpy.zeros(1000000)

## Start the computation
start = time.process_time()

numpy.add(a,b,out=c)

end = time.process_time()

print("%.3f seconds"%(end-start))
print("%.4f seconds"%(end-start))

The TimeIt “Magic Command”:

In [None]:
A = [i for i in range(100000)]
B = [1 for i in range(100000)]
%timeit [x + y for x, y in zip(A, B)]  # element-wise sum of two lists (A + B would concatenate them)

In [None]:
a = numpy.arange(100000)
b = numpy.ones(100000)
%timeit a+b
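
`%timeit` is only available inside IPython and Jupyter. In a plain Python script, the standard library module `timeit` can be used for a similar comparison. A rough sketch (the array size and the number of repetitions are arbitrary choices, and the measured times depend on the machine):

```python
import timeit
import numpy

n = 100000
list_a = list(range(n))
list_b = [1] * n
array_a = numpy.arange(n)
array_b = numpy.ones(n)

# Total time in seconds for 10 repetitions of each computation
t_list = timeit.timeit(lambda: [x + y for x, y in zip(list_a, list_b)], number=10)
t_numpy = timeit.timeit(lambda: array_a + array_b, number=10)

print("list comprehension: %.4f seconds" % t_list)
print("numpy addition:     %.4f seconds" % t_numpy)
```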

### Numpy basics

Numpy arrays can be directly initialized by the function <tt>numpy.array</tt>:

In [None]:
import numpy as np

In [None]:
m = np.array([[1.0,2.0],[3.0,4.0]])
m

In [None]:
m = numpy.array([[1.0,2.0],[3.0,4.0]])

In [None]:
m

In [None]:
print(m)

In [None]:
type(m)

Numpy arrays can be initialized to specific values (<tt>numpy.zeros</tt>, <tt>numpy.ones</tt>, . . . ):

In [None]:
numpy.ones(10)

In [None]:
numpy.zeros([3,2])

In [None]:
np.zeros(5)

Special numpy arrays (e.g. identity) can be created easily:

In [None]:
numpy.identity(2)

In [None]:
numpy.diag([1.0,2.0,3.0])

Multidimensional arrays can be created:

In [None]:
a = numpy.ones([3,3,3]) 
a

The properties of an array:

In [None]:
a = numpy.ones([2,2])   
type(a), a.shape, a.size, a.ndim, a.dtype

In [None]:
a = numpy.ones([3,3,3]) 
type(a), a.shape, a.size, a.ndim, a.dtype

Selection on arrays:

In [None]:
a = numpy.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
a

In [None]:
a[0]

In [None]:
a[0,:]

In [None]:
a[:,0]

In [None]:
a[0:3]

In [None]:
a[1:2]

In [None]:
a[1][0]

In [None]:
a[1,0]

In [None]:
a[:2]

In [None]:
a[:,2]

In [None]:
a[:,2][1]

In [None]:
a[:,2][-1]

In [None]:
a[:,2][:-1]

In [None]:
a[:,:2]

In [None]:
a[1:3,1:3]

In [None]:
a[1:3,1:2]

In [None]:
a[[0,3]]

In [None]:
a[[0,3],2]

In [None]:
a[[0,3]][:,2]

In [None]:
a[[0,3]][:,[0,2]]

In [None]:
a = numpy.ones([3,3,3]) 

In [None]:
a[0]

In [None]:
a[0][0]
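
One detail worth knowing about selection: basic indexing and slicing return a view that shares memory with the original array, whereas indexing with a list of positions returns a copy. A small sketch (the names `row_view` and `rows_copy` are illustrative):

```python
import numpy

a = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

row_view = a[0]          # basic indexing: a view into a
row_view[0] = 100        # also changes a[0, 0]

rows_copy = a[[0, 3]]    # indexing with a list: a new array (copy)
rows_copy[0, 0] = -1     # a is left unchanged

print(a[0, 0])                            # 100
print(numpy.shares_memory(a, row_view))   # True
print(numpy.shares_memory(a, rows_copy))  # False
```

Calling `.copy()` on a slice gives an independent array when a view is not wanted.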

Extend arrays:

In [None]:
a = numpy.array([1, 2, 3])
a

In [None]:
numpy.append(a,4)

In [None]:
numpy.append(a,[4,5,6])

In [None]:
numpy.append(a,[4,5,6],axis=0)

In [None]:
b = numpy.append([a],[[4,5,6]],axis=1)
b

In [None]:
b.shape, b.ndim

In [None]:
a.shape, a.ndim

In [None]:
numpy.append([a],[[4,5,6]],axis=0)
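
Unlike `list.append`, `numpy.append` does not modify its argument; it returns a new array. For joining arrays, `numpy.concatenate` and `numpy.vstack` are common alternatives. A brief sketch (result names are illustrative):

```python
import numpy

a = numpy.array([1, 2, 3])

extended = numpy.append(a, [4, 5, 6])        # new 1-D array; a itself is unchanged
joined = numpy.concatenate([a, [4, 5, 6]])   # same result for 1-D input
stacked = numpy.vstack([a, [4, 5, 6]])       # stack as rows: shape (2, 3)

print(a.shape, extended.shape, stacked.shape)
```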

Operations (multiplication and division operators apply element-wise!):

In [None]:
a = numpy.array([[1.0,2.0],[3.0,4.0]])
b = numpy.array([[2.0,3.0],[4.0,5.0]])

In [None]:
a+b

In [None]:
a*b

In [None]:
a/b

In [None]:
a+10

In [None]:
a**2

Matrix multiplication

In [None]:
a = numpy.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
a

In [None]:
b = numpy.array([[1.0,2.0,1.0,2.0],[3.0,4.0,2.0,1.0]])
b

In [None]:
a.shape

In [None]:
b.shape

In [None]:
numpy.dot(a,b)

In [None]:
numpy.dot(a,b).shape

In [None]:
a = numpy.array([[1.0,2.0],[3.0,4.0]])

In [None]:
numpy.dot(a,a)

In [None]:
a**2
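
For 2-D arrays, the `@` operator gives the same matrix product as `numpy.dot`, while `**` stays element-wise. A quick sketch contrasting the two (result names are illustrative):

```python
import numpy

a = numpy.array([[1.0, 2.0], [3.0, 4.0]])

matrix_product = a @ a       # matrix product, same as numpy.dot(a, a)
elementwise_square = a**2    # squares each entry separately

print(matrix_product)
print(elementwise_square)
```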

Reshaping:

In [None]:
a = numpy.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
a

In [None]:
a.flatten()

In [None]:
numpy.ravel(a)

In [None]:
numpy.ravel(a).shape

In [None]:
a

In [None]:
a.reshape([2,3])
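
Two details about reshaping: passing `-1` lets NumPy infer one dimension, and `flatten` always returns a copy while `ravel` returns a view when it can. A sketch under those assumptions (names other than `a` are illustrative):

```python
import numpy

a = numpy.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

b = a.reshape(2, -1)      # -1 lets NumPy infer the missing dimension (here 3)
flat_copy = a.flatten()   # always a copy
flat_view = a.ravel()     # a view for this contiguous array

flat_copy[0] = -1.0       # does not touch a
flat_view[1] = 20.0       # changes a[0, 1]

print(b.shape)
print(a)
```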

Numpy Reduce-type Functions:

In [None]:
a = numpy.array([[1.0,2.0],[5.0,6.0],[3.0,4.0]])
a

In [None]:
a.sum()

In [None]:
numpy.sum(a)

In [None]:
numpy.cumsum(a)

In [None]:
a.cumsum()

In [None]:
a.sum(axis=0)

In [None]:
a.sum(axis=1)

In [None]:
a.mean()

In [None]:
numpy.mean(a)

In [None]:
a.mean(axis=0)

In [None]:
a.max()

In [None]:
numpy.max(a)

In [None]:
a.argmax()

In [None]:
numpy.argmax(a)

In [None]:
numpy.unravel_index(numpy.argmax(a),a.shape)
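
The `axis` argument names the dimension that is collapsed by the reduction. A short sketch on the same array, also showing `keepdims=True`, which keeps the reduced axis with length 1 so the result can be combined with the original array (names other than `a` are illustrative):

```python
import numpy

a = numpy.array([[1.0, 2.0], [5.0, 6.0], [3.0, 4.0]])

col_sums = a.sum(axis=0)   # collapse the rows: one sum per column
row_sums = a.sum(axis=1)   # collapse the columns: one sum per row

# keepdims=True gives shape (3, 1) instead of (3,)
row_means = a.mean(axis=1, keepdims=True)
centered = a - row_means   # subtract each row's mean from that row

# Position of the maximum as (row, column)
row, col = numpy.unravel_index(a.argmax(), a.shape)

print(col_sums, row_sums)
print(centered)
print(row, col)
```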

### Mathematical functions:

In [None]:
numpy.abs(-8)

In [None]:
numpy.log(1)

In [None]:
numpy.log(numpy.exp(3))

In [None]:
numpy.exp(numpy.log(2))

In [None]:
x = numpy.arange(0,1,0.1)
x

In [None]:
numpy.exp(x)

In [None]:
numpy.exp(x).sum()

In [None]:
len(x)

In [None]:
y = 0
z = 0
for i in range(0,10,1):
    y = numpy.exp(x[i])
    z = z + y 

In [None]:
z
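
The explicit loop and the vectorized expression compute the same sum; the vectorized form is shorter and usually faster. A sketch that checks the agreement with `numpy.isclose` (the name `z_vectorized` is illustrative):

```python
import numpy

x = numpy.arange(0, 1, 0.1)

# Explicit loop over the entries of x
z = 0.0
for value in x:
    z = z + numpy.exp(value)

# Vectorized equivalent
z_vectorized = numpy.exp(x).sum()

print(z, z_vectorized)
print(numpy.isclose(z, z_vectorized))
```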

More functions:

In [None]:
a = numpy.array([1,2,3,4,5,6,7,8,9,10]) 

In [None]:
numpy.less_equal(a,5)

In [None]:
sum(numpy.less_equal(a,5))

In [None]:
a <= 5

In [None]:
sum(a <= 5)

In [None]:
numpy.where(a == 5)
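
A comparison such as `a <= 5` produces a boolean array that can itself be used as an index to select the matching entries. A minimal sketch (names other than `a` are illustrative):

```python
import numpy

a = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

mask = a <= 5                         # boolean array, one entry per element
selected = a[mask]                    # keep only the entries where mask is True
n_small = numpy.count_nonzero(mask)   # number of True entries
positions = numpy.where(a == 5)[0]    # where returns a tuple of index arrays

print(selected, n_small, positions)
```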

Some Linear Algebra:

In [None]:
import numpy.linalg

In [None]:
M = numpy.array([[2.0,3.0,1.0],[5.0,8.0,3.0],[4.0,5.0,6.0]])

In [None]:
print(M)

In [None]:
print(M.T)

In [None]:
M = numpy.dot(M,M.T)

In [None]:
print(M)

In [None]:
numpy.linalg.inv(M)

In [None]:
numpy.dot(M,numpy.linalg.inv(M))

In [None]:
numpy.dot(M,M)

In [None]:
numpy.linalg.matrix_power(M, 2)

In [None]:
numpy.linalg.matrix_power(M, 2)[0,:][2]

In [None]:
eigenval, eigenvect = numpy.linalg.eig(M) # check documentation on function eig

In [None]:
eigenval

In [None]:
eigenvect

In [None]:
numpy.linalg.norm(eigenvect[:,0],2)

In [None]:
numpy.dot(M,eigenvect[:,0])-eigenval[0]*eigenvect[:,0]

In [None]:
numpy.dot(M,eigenvect[:,1])-eigenval[1]*eigenvect[:,1]

In [None]:
numpy.dot(M,eigenvect[:,2])-eigenval[2]*eigenvect[:,2]
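
Since the matrix used above is of the form $MM^T$, it is symmetric, and `numpy.linalg.eigh` (specialized for symmetric matrices) can be used in place of `eig`. The sketch below checks all eigenpairs at once; the name `S` for the symmetric matrix is introduced here for illustration:

```python
import numpy

M = numpy.array([[2.0, 3.0, 1.0], [5.0, 8.0, 3.0], [4.0, 5.0, 6.0]])
S = M @ M.T    # symmetric by construction

# eigh returns real eigenvalues in ascending order;
# the columns of eigenvect are the corresponding eigenvectors
eigenval, eigenvect = numpy.linalg.eigh(S)

# Check S v = lambda v for every column (broadcasting scales column j by eigenval[j])
residual = S @ eigenvect - eigenvect * eigenval
print(numpy.allclose(residual, 0))

# The eigenvectors are orthonormal
print(numpy.allclose(eigenvect.T @ eigenvect, numpy.identity(3)))
```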

## 3. Plot basics

Matplotlib is a Python 2D plotting library:

In [None]:
import matplotlib
from matplotlib import pyplot

In [None]:
pyplot.figure(figsize=(5,5))

x = numpy.linspace(0, 2, 100)  ## Check linspace in the documentation

pyplot.plot(x, x, label="linear")
pyplot.plot(x, x**2, label="quadratic")
pyplot.plot(x, x**3, label="cubic")

pyplot.xlabel("x label")
pyplot.ylabel("y label")

pyplot.title("Simple Plot")

pyplot.legend()

pyplot.show()

## 4. Sampling from distributions with NumPy:

Check https://docs.scipy.org/doc/numpy/reference/routines.random.html for random sampling.

In [None]:
import numpy, numpy.random
import numpy.random as npr

Sampling from a uniform distribution between 0 and 1:

In [None]:
print(numpy.random.uniform(0,1,[10]))

In [None]:
print(numpy.random.binomial(10,0.3,[15]))

In [None]:
print(numpy.random.uniform(0,1,[10]))

In [None]:
print(numpy.random.uniform(0,1,[10]))

Sampling from an exponential distribution with scale parameter 10 (i.e. $\lambda$=1/10):

In [None]:
z = numpy.random.exponential(10,10000)  # 10000 samples with scale 10
z = npr.exponential(5,7)                # overwrites z: 7 samples with scale 5 (npr is numpy.random)
z

In [None]:
print(numpy.mean(z))

In [None]:
z.mean()

In [None]:
z.std()

In [None]:
numpy.random.exponential(1,[3,3])
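
For an exponential distribution, both the mean and the standard deviation equal the scale parameter $1/\lambda$, so a large sample should give estimates close to it. A sketch using NumPy's `default_rng` generator interface instead of the legacy `numpy.random` functions used above (the seed and sample size are arbitrary choices):

```python
import numpy

rng = numpy.random.default_rng(0)   # seeded generator, so the run is repeatable

scale = 10.0                        # scale = 1/lambda
z = rng.exponential(scale, 100000)

print(z.mean())   # close to scale
print(z.std())    # also close to scale
```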

Sampling from a normal distribution with mean $\mu$=10 and standard deviation $\sigma$=0.01:

In [None]:
samples = numpy.random.normal(10,0.01,[10])
print(samples)

In [None]:
samples.mean()

In [None]:
numpy.mean(samples)

In [None]:
sum(samples)/10

In [None]:
samples.std()

In [None]:
numpy.sqrt(sum(numpy.power(samples-numpy.mean(samples), 2))/10)

In [None]:
numpy.std(samples)
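
By default, `numpy.std` divides by $n$, which matches the manual formula above. The sample standard deviation, which divides by $n-1$, is obtained with `ddof=1`. A sketch comparing the two (the seed is arbitrary and the variable names are illustrative):

```python
import numpy

rng = numpy.random.default_rng(0)
samples = rng.normal(10, 0.01, 10)
n = len(samples)

squared_dev = numpy.sum((samples - samples.mean())**2)

pop_std = numpy.sqrt(squared_dev / n)           # same as numpy.std(samples)
sample_std = numpy.sqrt(squared_dev / (n - 1))  # same as numpy.std(samples, ddof=1)

print(pop_std, numpy.std(samples))
print(sample_std, numpy.std(samples, ddof=1))
```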

Display the histogram of the samples, along with the probability density function:

In [None]:
mu, sigma = 0, 0.1

samples = numpy.random.normal(mu, sigma, 1000)

In [None]:
## Make sure that matplotlib.pyplot is imported

pyplot.hist(samples, 30, density = False)   # Check the matplotlib documentation on hist

#pyplot.show()

In [None]:
count, bins, ignored = pyplot.hist(samples, 30, density=True)

pyplot.plot(bins, 1/(sigma * numpy.sqrt(2 * numpy.pi)) * numpy.exp( - (bins - mu)**2 / (2 * sigma**2) ), 
         linewidth=2, color='r')
pyplot.show()

In [None]:
from scipy.stats import norm

count, bins, patches = pyplot.hist(samples, 30, density=True)

pyplot.plot(bins, norm.pdf(bins,mu,sigma), linewidth=2, color='r')
pyplot.show()
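
With `density=True`, the bar heights are scaled so that the total area of the histogram is 1, which is what makes it comparable to a probability density function. This can be checked without plotting by using `numpy.histogram`, which returns bar heights and bin edges. A sketch (the seed is arbitrary; `widths` and `total_area` are illustrative names):

```python
import numpy

rng = numpy.random.default_rng(1)
mu, sigma = 0, 0.1
samples = rng.normal(mu, sigma, 1000)

# Bar heights and bin edges, computed without drawing anything
density, bins = numpy.histogram(samples, bins=30, density=True)
widths = numpy.diff(bins)

# Heights times widths sum to 1 when density=True
total_area = numpy.sum(density * widths)
print(total_area)
```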

Sampling from a discrete uniform distribution with values in $\{1,2,\ldots,9,10\}$:

In [None]:
numpy.random.randint(1,11,10)  # the upper bound is exclusive, so 11 is needed to include 10
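
`numpy.random.randint(low, high, size)` includes `low` but excludes `high`, so covering $\{1,\ldots,10\}$ requires `high=11`. A quick empirical check (the seed and the number of draws are arbitrary choices):

```python
import numpy

numpy.random.seed(0)   # arbitrary seed, only to make the run repeatable

# high is exclusive: randint(1, 11) draws from 1, 2, ..., 10
draws = numpy.random.randint(1, 11, 10000)

print(draws.min(), draws.max())
```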

Seed the random generator (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.seed.html?highlight=seed)

In [None]:
numpy.random.seed(2)

print(numpy.random.uniform(0,1,[10]))

In [None]:
print(numpy.random.uniform(0,1,[10]))
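
Seeding fixes the whole stream of random numbers: successive calls keep producing new values, and re-seeding with the same number restarts the stream from the beginning. A minimal sketch (variable names are illustrative):

```python
import numpy

numpy.random.seed(2)
first = numpy.random.uniform(0, 1, 3)
second = numpy.random.uniform(0, 1, 3)       # continues the stream: different numbers

numpy.random.seed(2)                         # re-seeding restarts the stream
first_again = numpy.random.uniform(0, 1, 3)

print(numpy.array_equal(first, first_again))
print(numpy.array_equal(first, second))
```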

Making discrete choices:

In [None]:
fruits = ["watermelon","apple","grape","lemon","banana","cherry"]

In [None]:
print(numpy.random.choice(fruits))
print(numpy.random.choice(fruits))
print(numpy.random.choice(fruits))
print(numpy.random.choice(fruits))

In [None]:
print(numpy.random.choice(fruits,5))

Specify a certain distribution of fruits to choose from:

In [None]:
p = [0.05,0.75,0.05,0.05,0.05,0.05]
sample = numpy.random.choice(fruits,[20],p=p)
print(sample)

In [None]:
sum(numpy.char.count(sample,"apple"))
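
With a larger sample, the observed frequencies should approach the probabilities in `p`. One way to tabulate them is `numpy.unique` with `return_counts=True`. A sketch under that approach (the seed and sample size are arbitrary; `frequencies` and `n_apple` are illustrative names):

```python
import numpy

rng = numpy.random.default_rng(0)

fruits = ["watermelon", "apple", "grape", "lemon", "banana", "cherry"]
p = [0.05, 0.75, 0.05, 0.05, 0.05, 0.05]

sample = rng.choice(fruits, size=10000, p=p)

# Distinct labels and how often each one occurs
labels, counts = numpy.unique(sample, return_counts=True)
frequencies = dict(zip(labels.tolist(), (counts / len(sample)).tolist()))

# Counting a single label with a boolean comparison
n_apple = int(numpy.sum(sample == "apple"))

print(frequencies)
print(n_apple)
```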

Another way to make discrete choices:

In [None]:
# Define fruits probabilities
p = [0.05,0.75,0.05,0.05,0.05,0.05]

# Cumulate them
l = numpy.cumsum([0]+p[:-1]) # lower-bounds
h = numpy.cumsum(p) # upper-bounds
         
# Draw a number between 0 and 1
u = numpy.random.uniform(0,1)

# Find which basket it belongs to
s = (u>l)*(u<h)

print(s)

print(numpy.argmax(s))

# retrieve the label
fruits[numpy.argmax(s)]
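
The same basket lookup can be done for many draws at once with `numpy.searchsorted`, which finds where each value falls among the sorted upper bounds. This is a sketch of one possible vectorized variant, not a replacement for the cell above; the values in `u` are fixed by hand so the result is easy to follow:

```python
import numpy

fruits = ["watermelon", "apple", "grape", "lemon", "banana", "cherry"]
p = [0.05, 0.75, 0.05, 0.05, 0.05, 0.05]

h = numpy.cumsum(p)   # upper bounds of the baskets

# Hand-picked values; in practice use numpy.random.uniform(0, 1, size)
u = numpy.array([0.0, 0.03, 0.5, 0.82, 0.99])

# For each u, index of the first basket whose upper bound exceeds it
index = numpy.searchsorted(h, u, side="right")
index = numpy.minimum(index, len(fruits) - 1)   # guard against round-off in h[-1]

drawn = [fruits[i] for i in index]
print(drawn)
```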

Plot some samples:

In [None]:
x1 = numpy.random.uniform(0,1,[100,1])
x2 = numpy.random.uniform(0,1,[100,1]) * x1**2
X = numpy.concatenate([x1,x2],axis=1)

In [None]:
pyplot.figure(figsize=(5,5))
pyplot.scatter(X[:,0],X[:,1],color="black",s=10)
pyplot.show()

In [None]:
m = X.mean(axis=0)
m

In [None]:
pyplot.figure(figsize=(5,5))
pyplot.scatter(*X.T,color="black",s=10)
pyplot.scatter([m[0]],[m[1]],color="red",s=30)
pyplot.show()