# Extending Python: Numpy 

## Python limits
* Python is a solid general-purpose language, but
* On its own it lacks many basic tools for scientific work

    * Numerical/vector stuff:  Numpy
    * Graphics: Matplotlib
    * Database/data processing: Pandas

# Vectors

* What is a vector?
* Why do we need them?
* Also matrices, and arrays more generally


# Vectors as lists

In [1]:
# Try a vector stored as a plain Python list
a = [1, 2.5, 4.5]
print(type(a))
print(a)
print(a[1])
print(a[0:2])
print(2*a)  # For a list, 2*a repeats the list twice rather than doubling each element.
print(sum(a)) # sum() accepts any iterable of numbers, e.g. a list or a range
print('-------------------------')
b = range(1, 11, 1)
print(sum(b))
print(type(b))

<class 'list'>
[1, 2.5, 4.5]
2.5
[1, 2.5]
[1, 2.5, 4.5, 1, 2.5, 4.5]
8.0
-------------------------
55
<class 'range'>


## Problems

* This works, but it is often slow
* Some operations do surprising things (2*a repeats the list)
* Lists have no built-in math functions (e.g. no elementwise log)
* On to numpy
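For comparison, here is a minimal sketch of what plain lists force you to write for the two problems above: an explicit loop (here a list comprehension) to double each element, and another loop to apply `math.log`, which only accepts a single number.

```python
import math

a = [1, 2.5, 4.5]

# With a plain list, 2*a repeats the list; to double each element
# we have to loop over it explicitly.
doubled = [2 * v for v in a]

# math.log works on one number at a time, so again we loop.
logs = [math.log(v) for v in a]

# Passing the whole list to math.log is an error.
try:
    math.log(a)
except TypeError as err:
    print("math.log on a list fails:", err)

print(doubled)  # [2, 5.0, 9.0]
print(logs)
```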

## Loading numpy

* You first need to load the numpy tools into Python
* This is done with **import**
* The following loads numpy and gives it the short alias np

In [2]:
import numpy as np

## Define a basic numpy vector

In [3]:
a = np.array([0.5,1,3.5,10.,11,20.32])
print(type(a))
print(a)
print(a[1])
print(a[:2]) # slicing works like range: it includes the start index and excludes the end index
print(a[3:])

<class 'numpy.ndarray'>
[ 0.5   1.    3.5  10.   11.   20.32]
1.0
[0.5 1. ]
[10.   11.   20.32]


## Many useful built-in functions
* They operate on each element
* And they do this quickly

In [4]:
print(np.sum(a))
print(np.log(a))
print(np.sqrt(a))

46.32
[-0.69314718  0.          1.25276297  2.30258509  2.39789527  3.01160562]
[0.70710678 1.         1.87082869 3.16227766 3.31662479 4.50777107]
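One rough way to check the speed claim is to time the built-in `sum` on a list against `np.sum` on an equivalent array. This is only a sketch: the sizes and repetition count are arbitrary choices, and the timings depend on the machine, so no particular speedup is claimed here. Both sums give the same answer.

```python
import timeit
import numpy as np

n = 100_000
values_list = [float(i) for i in range(n)]
values_arr = np.arange(n, dtype=float)

# Time 10 repetitions of each sum; exact numbers vary by machine.
t_list = timeit.timeit(lambda: sum(values_list), number=10)
t_numpy = timeit.timeit(lambda: np.sum(values_arr), number=10)
print(f"built-in sum on a list: {t_list:.4f} s")
print(f"np.sum on an array:     {t_numpy:.4f} s")

# Both approaches compute the same total.
print(sum(values_list), np.sum(values_arr))
```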


# Vectorized notation makes sense

In [5]:
print(a)
print(2.*a)
print(2.*a + 1.)
print(a ** 2)

[ 0.5   1.    3.5  10.   11.   20.32]
[ 1.    2.    7.   20.   22.   40.64]
[ 2.    3.    8.   21.   23.   41.64]
[2.500000e-01 1.000000e+00 1.225000e+01 1.000000e+02 1.210000e+02
 4.129024e+02]


## Initialization functions

In [6]:
zvec = np.zeros(100)
ovec = np.ones(100)
print(zvec)
print('---------------------------------------------------------------------')
print(ovec)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0.]
---------------------------------------------------------------------
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1.]
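`zeros` and `ones` are not the only initializers. The sketch below shows a few other standard numpy constructors (`np.full`, `np.arange`, `np.linspace`) and the `dtype` argument; which of these you need will depend on the problem.

```python
import numpy as np

# A vector of 5 copies of the same value
sevens = np.full(5, 7.0)

# Evenly spaced values: arange takes a step, linspace takes a count
steps = np.arange(0.0, 1.0, 0.25)   # excludes the end point
grid = np.linspace(0.0, 1.0, 5)     # includes both end points

# zeros/ones default to floats; dtype can request integers instead
izeros = np.zeros(3, dtype=int)

print(sevens)  # [7. 7. 7. 7. 7.]
print(steps)   # [0.   0.25 0.5  0.75]
print(grid)    # [0.   0.25 0.5  0.75 1.  ]
print(izeros)  # [0 0 0]
```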


# Moving to more dimensions

* Moving to matrices or arrays is pretty easy

In [7]:
print(a)
print('---------------------------------------')
b = np.array([a,2.*a,3.*a])  # Unlike a list, 2.*a on an array multiplies every element instead of repeating the array.
print(b[1,2])  # For a 2-D array the first index selects the row and the second selects the column
# Slices work along each dimension too:
print(b[1,:])
print(b[:,3])
print(b[1:,3:])

[ 0.5   1.    3.5  10.   11.   20.32]
---------------------------------------
7.0
[ 1.    2.    7.   20.   22.   40.64]
[10. 20. 30.]
[[20.   22.   40.64]
 [30.   33.   60.96]]
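To see the row/column layout directly, an array's `shape` and `ndim` attributes report its dimensions. The sketch below rebuilds `b` from the cell above and also shows that `b[1, 2]` picks the same element as the chained form `b[1][2]` (which first extracts row 1, then indexes into it).

```python
import numpy as np

a = np.array([0.5, 1, 3.5, 10., 11, 20.32])
b = np.array([a, 2.*a, 3.*a])

print(b.shape)  # (3, 6): 3 rows, 6 columns
print(b.ndim)   # 2

# Row 1, column 2, written two ways
print(b[1, 2], b[1][2])  # 7.0 7.0
```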


## Functions are clever with multi dimensions
* They can go in different directions

In [8]:
print(b.sum(axis=0)) # axis=0: sum down each column (collapses the rows)
print(b.sum(axis=1)) # axis=1: sum across each row (collapses the columns)
print(b.sum()) # all

[  3.     6.    21.    60.    66.   121.92]
[ 46.32  92.64 138.96]
277.91999999999996
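The `axis` argument is easiest to check on a tiny matrix whose sums can be done by hand. In this sketch (the matrix `m` is just an example), `axis=0` collapses the 2 rows and leaves one sum per column, while `axis=1` collapses the 3 columns and leaves one sum per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

col_sums = m.sum(axis=0)   # collapse the rows: one sum per column
row_sums = m.sum(axis=1)   # collapse the columns: one sum per row

print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]
print(m.sum())   # 21
```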


In [9]:
# Initializations: the shape can be given as a list or a tuple
zmat = np.zeros([2,2])
omat = np.ones((2,2))
print(zmat)

[[0. 0.]
 [0. 0.]]


## Array types
* Numpy arrays are regular in size (every row has the same length)
* They are homogeneous in data type (every element has the same type)
* This allows for big speed improvements
* By contrast, each element of a list can be any type of object

In [10]:
zeroin = 3*np.ones((5,5), dtype='i')
boolin = zeroin == 0   # The other comparisons (>, <, >=, <=, !=) also work element by element
print(zeroin)
print(boolin)

[[3 3 3 3 3]
 [3 3 3 3 3]
 [3 3 3 3 3]
 [3 3 3 3 3]
 [3 3 3 3 3]]
[[False False False False False]
 [False False False False False]
 [False False False False False]
 [False False False False False]
 [False False False False False]]
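A short sketch of the two points above: the other comparison operators produce boolean arrays just like `==` (summarized here with `np.all`/`np.any`), and because arrays are homogeneous, mixing integers and floats in `np.array` converts every element to float.

```python
import numpy as np

zeroin = 3 * np.ones((5, 5), dtype='i')

# Other comparison operators also work element by element
print(np.all(zeroin > 2))    # True: every entry is greater than 2
print(np.any(zeroin != 3))   # False: no entry differs from 3

# Homogeneous types: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]
```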


# Random numbers
* We will spend a lot of this course looking at computer generated random numbers
* Numpy is very good at generating lots of random numbers

In [11]:
# normal random variables
x = np.random.standard_normal(10)  # Draw samples from a standard Normal distribution (mean=0, stdev=1)
print(x)
b = x > 0
print(b)

[ 0.66379807 -0.26366298 -1.09646654  1.24233598 -0.24371558  0.48746955
 -0.71283256  2.1401661   1.53358738 -1.27614891]
[ True False False  True False  True False  True  True False]
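Each run of the cell above prints different numbers. If you want reproducible draws, one option is numpy's newer `Generator` interface created with `np.random.default_rng(seed)` (available since NumPy 1.17; the notebook itself uses the older `np.random.*` functions). The seed 42 below is arbitrary. With many draws, the sample mean and standard deviation should land close to 0 and 1.

```python
import numpy as np

# A seeded generator makes the draws reproducible (seed 42 is arbitrary)
rng = np.random.default_rng(42)
x = rng.standard_normal(100_000)

# With many draws the sample mean and std should be near 0 and 1
print(x.mean(), x.std())

# Re-creating the generator with the same seed gives the same draws
rng2 = np.random.default_rng(42)
print(np.array_equal(x, rng2.standard_normal(100_000)))  # True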


In [12]:
# uniform random variables
# note that the arguments are passed by keyword (low, high, size)
y = np.random.uniform(low=0,high=5,size=20) 
print(y)
# Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). 
# In other words, any value within the given interval is equally likely to be drawn by uniform.

N = 100 
t = np.random.uniform(low=0,high=5,size=N)
b = t < 1.  # Comparing an array gives an array of True/False values, not the values of t that satisfy t < 1
print('---------------------------------------------------------------------')
print(t)
print('---------------------------------------------------------------------')
print(b)

# How does sum treat booleans?
print(np.sum(b)/float(N))

[4.54431452 3.81335284 0.79306135 1.33636898 3.23467085 3.39745179
 3.8689281  0.78736628 3.9762934  4.61023095 4.84451405 4.00168422
 2.52671744 0.02685906 1.17047372 1.54314577 3.17438685 1.08258221
 1.00431483 3.06678079]
---------------------------------------------------------------------
[1.68163407 0.53541839 4.70537851 4.68805219 3.59021193 4.45809439
 3.50957354 3.24734759 2.00841484 1.39713224 1.47492174 2.13817811
 3.99336559 3.58482128 3.86333271 3.13757767 2.61385876 1.64511798
 4.41493882 3.59667519 0.060439   4.39898496 0.73650444 2.35069959
 1.23710875 0.27146647 2.77057895 3.02336204 4.17633348 1.87996485
 0.30450671 4.58244891 2.4158041  2.34004319 4.14133553 4.76502728
 3.93670982 3.43114279 2.33018768 4.97351951 1.76324029 1.30669795
 3.38211472 0.52756778 3.39384873 3.11400667 0.60799406 3.70518787
 2.23339871 1.33521429 0.60016414 1.12190568 0.32852656 2.51828817
 4.39456217 3.75563352 0.93402725 4.89508952 2.03802028 4.96815115
 2.19906597 3.16592125 1.37845229 4
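To answer the question in the cell: `np.sum` counts each `True` as 1 and each `False` as 0, so `np.sum(b)/float(N)` is the fraction of draws below 1. For a uniform draw on [0, 5) that fraction should be close to 1/5. The sketch below uses a larger sample and a seeded generator (both assumptions for illustration) so the estimate is stable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily
N = 100_000
t = rng.uniform(low=0, high=5, size=N)
below_one = t < 1.

# True counts as 1 and False as 0, so the sum counts the True entries
count = np.sum(below_one)
frac = count / N
print(count, frac)  # frac should be near 1/5 = 0.2

# The mean of a boolean array is the same fraction
print(np.mean(below_one))
```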

## Vectorizing
* The notation allows vectorized operations
* Be a little careful with this

In [13]:
N = 10
x = np.random.uniform(low=0,high=1,size=N)
# check what arange does
y = np.arange(N)
print(x)
print(y)
print('---------------------------------------------------------------------')
# x and y line up element by element; the scalar 1. is stretched (broadcast) to length N
z = 2.*x + y + 1.
# multiply the elements pairwise
zz = x*y
print(z)
print(zz)

[0.23431332 0.57261324 0.00504411 0.32216295 0.74269058 0.28778846
 0.44922927 0.20396729 0.95182474 0.04858854]
[0 1 2 3 4 5 6 7 8 9]
---------------------------------------------------------------------
[ 1.46862664  3.14522647  3.01008822  4.64432591  6.48538116  6.57557693
  7.89845855  8.40793457 10.90364948 10.09717709]
[0.         0.57261324 0.01008822 0.96648886 2.97076232 1.43894231
 2.69537564 1.427771   7.61459793 0.4372969 ]


In [15]:
# what happens when different lengths?
#y2 = np.arange(2*N)
#z2 = 2.*x + y2 + 1.
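If the commented-out lines are run, the lengths 10 and 20 cannot be lined up element by element, and numpy raises a `ValueError` rather than guessing. (A scalar like `1.` is fine because numpy broadcasts it to any length.) A sketch that catches the error:

```python
import numpy as np

N = 10
x = np.random.uniform(low=0, high=1, size=N)
y2 = np.arange(2 * N)

# Lengths 10 and 20 cannot be matched element by element
failed = False
try:
    z2 = 2. * x + y2 + 1.
except ValueError as err:
    failed = True
    print("ValueError:", err)
```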

## Some interesting things about boolean vectors
* Booleans have some interesting properties
* Will be useful to us


In [16]:
vbool = np.array([True, False, False])
print(vbool)
print(np.mean(vbool))

[ True False False]
0.3333333333333333
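Besides `np.mean` (the fraction of `True` entries), a few other standard functions summarize a boolean vector: `np.sum` and `np.count_nonzero` count the `True` entries, and `np.any`/`np.all` test whether at least one or every entry is `True`.

```python
import numpy as np

vbool = np.array([True, False, False])

print(np.sum(vbool))            # 1: number of True entries
print(np.count_nonzero(vbool))  # 1: same count
print(np.any(vbool))            # True: at least one entry is True
print(np.all(vbool))            # False: not every entry is True
```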


## Fancy subscripts
* Booleans also let us do useful things, such as extracting sub-vectors
* This will be really useful
* Many other vectorized/matrix languages can do this too

In [17]:
vbool = np.array([True, False, True, False])
x = np.array([1., 2., 3., 4.])
# a boolean mask pulls out the entries where the mask is True
print(x[vbool])
# This is a conditional mean
y = np.random.uniform(low=0., high=1.,size=10)
print(y)
print(y[y>0.5])
print(np.mean(y[y>0.5]))

[1. 3.]
[0.05846048 0.22907226 0.81105388 0.12579169 0.98952304 0.57292456
 0.94870342 0.6636032  0.62594541 0.91307308]
[0.81105388 0.98952304 0.57292456 0.94870342 0.6636032  0.62594541
 0.91307308]
0.789260940997343
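Masks can be combined and used in a few more ways. In this sketch (the values in `y` are made up for illustration), two conditions are joined with `&`; the parentheses are required because `&` binds more tightly than the comparisons. `np.where` returns the indices where a mask is `True`, and a mask on the left of `=` overwrites only the selected entries.

```python
import numpy as np

y = np.array([0.05, 0.23, 0.81, 0.13, 0.99, 0.57])  # example values

# Combine conditions with & (and) or | (or); parentheses are required
middle = (y > 0.2) & (y < 0.9)
print(y[middle])            # [0.23 0.81 0.57]

# Indices where the mask is True
print(np.where(middle)[0])  # [1 2 5]

# Overwrite only the entries selected by a mask
y[y < 0.5] = 0.
print(y)                    # [0.   0.   0.81 0.   0.99 0.57]
```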


## Functions and data types
* Python works out what an operation means from the types it is given, even inside functions
* The data type changes what the function does

In [18]:
def f(x):
    return 2*x + 1

a = np.arange(4.)
print(a)
amat = np.reshape(a,(2,2))
print(amat)
print(type(f(1)))
print(type(f(1.)))
print(type(f(amat)))

[0. 1. 2. 3.]
[[0. 1.]
 [2. 3.]]
<class 'int'>
<class 'float'>
<class 'numpy.ndarray'>
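The same `f` behaves very differently on a plain list: `2*x` repeats the list, and adding the integer `1` to a list then raises a `TypeError`. Converting the list to an array first gives elementwise arithmetic. A short sketch:

```python
import numpy as np

def f(x):
    return 2*x + 1

# On a list, 2*x repeats the list and "+ 1" then fails
try:
    f([1, 2])
except TypeError as err:
    print("TypeError:", err)

# On an array, the same function works element by element
print(f(np.array([1, 2])))  # [3 5]
```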


# Some tricky things about Python and memory
* Be very careful with this
* It can be confusing



In [19]:
a = np.arange(10)
b = a
print(a) # This is the tricky part
b[3]=-99
print(a)

[0 1 2 3 4 5 6 7 8 9]
[  0   1   2 -99   4   5   6   7   8   9]


* b is not a copy of a: it is a second name for the same array
* Changing b therefore changes a
* To create an independent copy, use the copy method

In [20]:
a = np.arange(10)
b = a.copy()
b[3]=-99
print(a)
print(b)


[0 1 2 3 4 5 6 7 8 9]
[  0   1   2 -99   4   5   6   7   8   9]
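You can check whether two names refer to the same array with Python's `is` operator, and whether two arrays use the same underlying memory with `np.shares_memory`. A sketch contrasting plain assignment with `.copy()`:

```python
import numpy as np

a = np.arange(10)
b = a           # another name for the same array
c = a.copy()    # a new array with its own memory

print(b is a)                  # True
print(c is a)                  # False
print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # False
```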


## Now it gets even trickier

In [21]:
a = np.arange(10)
b = a[3:5]  # a basic slice is a view into a, not a copy
#b = a[3:5].copy()  # This is different! With .copy(), b is independent, so changes to b do not affect a.
b[1] = -99.
print(a)
b[0:2] = -11*np.ones(2)
print(a)

# a = [0 1 2 3 4 5 6 7 8 9]
# b = [      x x          ]


[  0   1   2   3 -99   5   6   7   8   9]
[  0   1   2 -11 -11   5   6   7   8   9]
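Whether indexing gives a view or a copy depends on the kind of index. A basic slice such as `a[3:5]` is a view (its `base` attribute points back to `a`), while indexing with a list of integers ("fancy" indexing) such as `a[[3, 4]]` returns a new copy. This sketch shows the difference by writing through each one.

```python
import numpy as np

a = np.arange(10)

view = a[3:5]       # basic slice: a view into a's memory
fancy = a[[3, 4]]   # integer-list indexing: a new copy

print(view.base is a)              # True: the view is backed by a
print(np.shares_memory(a, fancy))  # False: the copy is independent

fancy[0] = -1
print(a[3])  # 3: unchanged, because fancy is a copy
view[0] = -1
print(a[3])  # -1: changed, because view shares memory with a
```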


# Summary
* This is a brief introduction to numpy
* More information is in the textbook **and** in the online documentation
* Numpy is essential for this course, and it does a lot