**Introduction to Numpy**

We sometimes want lists to behave like vectors, i.e. we want to add them componentwise and multiply them by scalars (the vector space operations).

Lists don't behave in the way we might want them to for mathematical applications: + concatenates, and multiplying by an integer repeats the list.

In [1]:
x=[1,2]
y=[3,4]
x+y
print(x+y)

[1, 2, 3, 4]


In [2]:
5*x

[1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

In [3]:
2.3*x

TypeError: can't multiply sequence by non-int of type 'float'
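
Without numpy, the vector space operations have to be written out by hand. The following is a minimal sketch using list comprehensions; the names vec_sum and scaled are only illustrative.

```python
# Vector operations on plain lists, written out by hand (no numpy).
x = [1, 2]
y = [3, 4]

# Componentwise addition: pair up the entries and add them.
vec_sum = [a + b for a, b in zip(x, y)]

# Scalar multiplication: multiply every entry by the scalar.
scaled = [2.3 * a for a in x]

print(vec_sum)
print(scaled)
```

This works, but every operation needs its own loop, which is the bookkeeping that numpy arrays take care of.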

**Numpy to the rescue!**

We can make a list of floats into a numpy array.

In [4]:
import numpy as np

x=[1.,2.,3.,4.,5.]
y=[5.,6.,7.,8.,9.]
xv=np.array(x) # construct an array
yv=np.array(y)
print(type(xv))
print(xv)
print(xv+yv)
print(3.*xv)

<class 'numpy.ndarray'>
[1. 2. 3. 4. 5.]
[ 6.  8. 10. 12. 14.]
[ 3.  6.  9. 12. 15.]


**Dimension/Shape**

When we create an array from a simple list of numbers we get a *vector*, also referred to as a 1-dimensional (1-d) array.

The *shape* attribute of the array gives a 1-tuple containing the number of elements.

In [5]:
xv.shape

(5,)

Indexing works as it does for a list, including negative indices.

In [6]:
print(type(xv[0]))
print(xv[0])
print(xv[-1])

<class 'numpy.float64'>
1.0
5.0


Slicing uses the same syntax as for lists.

In [7]:
xv[1:3]

array([2., 3.])
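
One difference from lists is worth knowing: a list slice is a new list, whereas a basic slice of a numpy array is a view onto the same data. The sketch below illustrates this; the variable names are only illustrative.

```python
import numpy as np

x_list = [1., 2., 3., 4., 5.]
xv = np.array(x_list)

# A list slice is a new list: changing it leaves the original list alone.
list_slice = x_list[1:3]
list_slice[0] = 99.
print(x_list)

# A basic slice of an array is a view: changing it changes the original array.
array_slice = xv[1:3]
array_slice[0] = 99.
print(xv)

# An explicit copy is independent of the original array.
independent = xv[1:3].copy()
independent[0] = -1.
print(xv)
```

If we want to modify a slice without touching the original array, calling copy() on the slice first is the usual approach.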

**Type conversion**

Note that when we created the numpy array, the values were converted to numpy floats (numpy.float64), which are not the same type as Python floats.

In [8]:
print(type(x))
print(type(x[0]))
print(type(xv))
print(type(xv[0]))

<class 'list'>
<class 'float'>
<class 'numpy.ndarray'>
<class 'numpy.float64'>


Even if we convert back to a list with list(), the values are still numpy floats rather than Python floats.

Remember that a list can hold objects of any type, and numpy provides a new type of object.

In [9]:
x=list(xv)
print(type(x))
print(x[0])
print(type(x[0]))

<class 'list'>
1.0
<class 'numpy.float64'>


**Conversion from numpy type**

If we wish, we can convert a numpy float back to a Python float with float().

In [10]:
import numpy as np
x=[1.,2.,3.]
xv=np.array(x)
u=xv[0]
print(type(u))
v=float(u)
print(type(v))

<class 'numpy.float64'>
<class 'float'>
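
To convert a whole array at once, the array method tolist() can be used instead of list(); for a single value, item() is an alternative to float(). A short sketch, with illustrative variable names:

```python
import numpy as np

xv = np.array([1., 2., 3.])

# list(xv) keeps the numpy scalar type of each element.
as_list = list(xv)
print(type(as_list[0]))

# xv.tolist() converts each element to a built-in Python type.
as_python_list = xv.tolist()
print(type(as_python_list[0]))

# item() converts a single numpy scalar to a Python scalar.
v = xv[0].item()
print(type(v))
```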


**Some commonly used methods**

**Norms**

There are various *norms* that are used to describe the *size* of a vector $x = (x_1,\ldots,x_d).$

The Euclidean or $L_2$ norm: $\sqrt{\sum_{i=1}^d x_i^2}$

The $L_1$ norm: $\sum_{i=1}^d \vert x_i\vert$

The $L_{\infty}$ norm: $\max_{i=1,\ldots,d} \vert x_i \vert$

These (among others) are available through np.linalg.norm.

In [11]:
x=np.array([3,4])
print(np.linalg.norm(x,2))
print(np.linalg.norm(x,1))
print(np.linalg.norm(x,np.inf)) # np.inf selects the L-infinity (max) norm


5.0
7.0
4.0
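
As a check, the three norms can also be computed directly from their definitions. This sketch uses a vector with a negative entry so that the absolute values matter; the names l2, l1 and linf are only illustrative.

```python
import numpy as np

x = np.array([3, -4])

# Each norm computed directly from its definition.
l2 = np.sqrt((x ** 2).sum())   # square root of the sum of squares
l1 = np.abs(x).sum()           # sum of absolute values
linf = np.abs(x).max()         # largest absolute value

# Compare with the library function.
print(l2, np.linalg.norm(x, 2))
print(l1, np.linalg.norm(x, 1))
print(linf, np.linalg.norm(x, np.inf))
```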


**Sum**

We can sum the elements in a numpy array.

In [16]:
x=np.array([1,-2,3])
x.sum()

2

**Dot products**

The dot product between $x=(x_1,\ldots,x_d)$ and $y=(y_1,\ldots,y_d)$ is defined as $\sum_{i=1}^d x_i y_i.$

In [17]:
x=np.array([1,2,3,4,5])
y=np.array([6,7,8,9,10])
print(x.dot(y))
print((x*y).sum()) # equivalent way

130
130


**min** and **max**

In [18]:
print(x.min())
print(x.max())

1
5


**mean** and **standard deviation**

In [19]:
print(x.mean())
print(x.std())

3.0
1.4142135623730951
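
By default std() divides by the number of elements (the population standard deviation). The sketch below recomputes the mean and standard deviation from their definitions and shows the ddof argument, which switches to dividing by n-1; the variable names are only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Mean and population standard deviation from the definitions.
mean = x.sum() / x.size
pop_std = np.sqrt(((x - mean) ** 2).sum() / x.size)
print(mean, pop_std)

# ddof=1 divides by n-1 instead of n (the sample standard deviation).
sample_std = x.std(ddof=1)
print(sample_std)
```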


**Special numpy arrays**

Numpy provides some special arrays. 

**zeros** can be used to create an array of zeros.

In [20]:
import numpy as np
np.zeros(5)

array([0., 0., 0., 0., 0.])

**ones** is used for an array of ones.

In [21]:
np.ones(7)

array([1., 1., 1., 1., 1., 1., 1.])

**linspace**

The numpy linspace function creates an array of equispaced values, a kind of array we often need in applications.
Here we create 10 equispaced values from 2.3 to 4.7, with both endpoints included.

In [22]:
import numpy as np
xvec=np.linspace(2.3,4.7,10)
print(xvec)

[2.3        2.56666667 2.83333333 3.1        3.36666667 3.63333333
 3.9        4.16666667 4.43333333 4.7       ]
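
Because both endpoints are included, 10 values leave 9 gaps, so the spacing is (4.7-2.3)/9. A small sketch confirming this with np.diff; the names start, stop, num, step and gaps are only illustrative.

```python
import numpy as np

start, stop, num = 2.3, 4.7, 10
xvec = np.linspace(start, stop, num)

# num points including both endpoints leave num-1 equal gaps.
step = (stop - start) / (num - 1)

# np.diff returns the differences between consecutive entries.
gaps = np.diff(xvec)
print(step)
print(np.allclose(gaps, step))
```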


**Numpy mathematical functions**

There are many standard mathematical constants and functions (like we saw in the math library) available in numpy.

These include

- pi
- sqrt
- exp
- log
- log10
- sin
- cos

In [23]:
import numpy as np
x=np.pi
print(np.sqrt(x))
print(np.exp(x))
print(np.log(x))
print(np.log10(x))
print(np.sin(x))
print(np.cos(x))

1.7724538509055159
23.140692632779267
1.1447298858494002
0.49714987269413385
1.2246467991473532e-16
-1.0


**Applying a function componentwise**

We can apply a *numpy* function to a **list** of values or a **numpy array** and the result is a numpy array.
Here we compute the square roots of the entries of an array componentwise.

In [24]:
xvec=np.array([1.,2.,3.,4.,5.])
yvec=np.sqrt(xvec)

print(type(yvec))
print(yvec)

<class 'numpy.ndarray'>
[1.         1.41421356 1.73205081 2.         2.23606798]
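
The same call also accepts a plain Python list, and the result is still a numpy array. A minimal sketch, with illustrative names:

```python
import numpy as np

x_list = [1., 4., 9.]

# A numpy function applied to a plain list returns a numpy array.
roots = np.sqrt(x_list)
print(type(roots))
print(roots)
```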


**Non-numpy functions**

Componentwise application doesn't necessarily work for non-numpy functions, such as those in the math library.

In [25]:
import numpy as np
import math

x=np.linspace(2.3,4.7,10)
y=math.sqrt(x)
print(y)

TypeError: only size-1 arrays can be converted to Python scalars

Nor does it work for our own function when that function is built from math library calls.

In [26]:
import numpy as np
import math
def myfunction(x):
    y=math.exp(x/100)
    w=math.sin(y)
    return(w)
x=np.linspace(2.3,4.7,100)
y=myfunction(x)

TypeError: only size-1 arrays can be converted to Python scalars

**Numpifying a function**

We can remedy this by creating a numpy function out of ours using numpy.frompyfunc. Here, we need to specify the number of arguments and the number of values returned by our function.

In [27]:
import numpy as np
import math
def myfunction(x):
    y=math.exp(x/100)
    w=math.sin(y)
    return(w)
f=np.frompyfunc(myfunction,1,1)
# myfunction is the original function; the first 1 is the number of input arguments and the second 1 is the number of outputs.
x=np.linspace(2.3,4.7,10)
y=f(x)
print(y)

[0.8538130683746732 0.8552322866178672 0.8566488839419311
 0.8580628083923817 0.8594740075987639 0.8608824287723679
 0.8622880187039431 0.86369072376141 0.8650904898875696 0.8664872625978106]
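
One thing to be aware of is that a function created with frompyfunc returns an array of dtype object, which is why the values above are printed at full precision. The sketch below converts the result to floats with astype and, as an alternative, writes the same function directly with numpy functions; the names y_obj and y_direct are only illustrative.

```python
import math
import numpy as np

def myfunction(x):
    y = math.exp(x / 100)
    return math.sin(y)

f = np.frompyfunc(myfunction, 1, 1)
x = np.linspace(2.3, 4.7, 10)

# The result holds Python objects rather than numpy floats.
y_obj = f(x)
print(y_obj.dtype)

# astype(float) converts it to an ordinary float array.
y = y_obj.astype(float)
print(y.dtype)

# The same computation written with numpy functions works componentwise directly.
y_direct = np.sin(np.exp(x / 100))
print(np.allclose(y, y_direct))
```

When a function can be expressed with numpy functions, as here, that version avoids the conversion step altogether.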


**Numpy data types**

A numpy array can hold various data types, including booleans, ints, and floats. 

Typically, unlike in a Python list, all of the elements of a numpy array have the same type.

The *dtype* attribute reveals the type.

In [31]:
import numpy as np

x=np.array([True,False])
print(x.dtype)
print(x)

x=np.array([1,2,3])
print(x.dtype)

x=np.array([1.,2.,3.])
print(x.dtype)

bool
[ True False]
int32
float64
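
Since all elements share one type, numpy picks a common type when the input is mixed, and the type can also be requested explicitly. A short sketch, with illustrative variable names:

```python
import numpy as np

# Mixing ints and a float gives an array of floats.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# The dtype argument requests a type explicitly.
as_float = np.array([1, 2, 3], dtype=float)
print(as_float.dtype)

# astype converts an existing array; nonzero values become True.
flags = np.array([0, 1, 2]).astype(bool)
print(flags)
```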


**Ints**

There is a limit to the size of a 32 bit *signed* int. 

If we have $32$ bits of storage and we want our integer to possibly have a sign, that means we can store the numbers

- 0
- 1,...,$2^{31}-1$
- -1,...,$-2^{31}$

In the following we see that when we store an array of integers, the storage type depends on the sizes of the numbers in our list (the number of bits required to store them).

When an integer to be stored requires more than 32 bits, all of the numbers are stored as 64 bit integers. The int32 default shown here is platform dependent; on many systems numpy uses int64 by default, even for small integers.

In [32]:
x=np.array([-2**31,2**31-1])
print(x.dtype)
x=np.array([-2**32,2**31])
print(x.dtype)

int32
int64


So even if some of the integers could be stored using 32 bits, every int in the array uses 64 bits if any one of them requires it.


In [33]:
x=np.array([-2**32,2**31,0,1,2,3])
for i in range(6):
    print(type(x[i]))

<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
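
The limits of each integer type can be looked up with np.iinfo rather than worked out by hand. A minimal sketch; the names info32 and info64 are only illustrative.

```python
import numpy as np

# np.iinfo reports the smallest and largest value of an integer type.
info32 = np.iinfo(np.int32)
print(info32.min, info32.max)

info64 = np.iinfo(np.int64)
print(info64.min, info64.max)

# These agree with the powers of two given above.
print(info32.min == -2**31, info32.max == 2**31 - 1)
```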


**Storing Objects**

If we try to store ints requiring more than 64 bits, what happens?

In [34]:
x=2**65
type(x)

int

In [35]:
x=np.array([2**64,2**65,2**66])
x.dtype

dtype('O')

Here the 'O' stands for "object". numpy allows us to store Python objects; each element is stored as a pointer to a Python object somewhere in memory. When numpy can't find a more specific numeric or fixed-size type to represent the data, it falls back on the generic object dtype. This is often the case when the array contains mixed types that don't fit into the regular numeric categories, or contains non-numeric Python objects.

We can create numpy arrays holding Python objects of various types.

In [36]:
d={1:"one",2:"two"}
L=[6,7,[8,9]]
x=np.array([1,2,3,6.7,8.8,L,d,True, False,"dog","cat","rhinocerous"],dtype="object")
print(x)
print(x.dtype)

[1 2 3 6.7 8.8 list([6, 7, [8, 9]]) {1: 'one', 2: 'two'} True False 'dog'
 'cat' 'rhinocerous']
object


In [37]:
type(x[6])

dict

**Multidimensional arrays**

We can create 2-d arrays (matrices) and higher dimensional arrays. 

These also allow for addition (of arrays of the same shape) and scalar multiplication.

Addition and multiplication both act componentwise.

In [38]:
import numpy as np
A=np.array([[1,2,3],[4,5,6]])
B=np.array([[7,8,9],[10,11,12]])
print(A)
print("\n")
print(A.shape)
print("\n")
print(2*A)
print("\n")
print(A+B)

[[1 2 3]
 [4 5 6]]


(2, 3)


[[ 2  4  6]
 [ 8 10 12]]


[[ 8 10 12]
 [14 16 18]]


**Componentwise multiplication**

We can also multiply componentwise. The * operator performs componentwise multiplication, while A.dot(B) performs matrix multiplication.

In [41]:
print(A * B)

[[ 7 16 27]
 [40 55 72]]


**Matrix multiplication**

As you probably anticipated, matrix multiplication is available. For 2-d arrays, np.matmul(A,B) gives the same result as A.dot(B). The number of columns of A must equal the number of rows of B.

In [45]:
import numpy as np
A=np.array([[1,2,3,4],[5,6,7,8]])
B=np.array([[7,8,9,10,11],[12,13,14,15,16],[17,18,18,20,21],[22,23,24,25,26]])
print(A.shape)
print(B.shape)
np.matmul(A,B)

(2, 4)
(4, 5)


array([[170, 180, 187, 200, 210],
       [402, 428, 447, 480, 506]])
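
The sketch below checks, on a small example, that np.matmul, the dot method and the @ operator agree for 2-d arrays, and that each entry of the product is a dot product of a row with a column. The matrix B here is a made-up example built with np.arange, not the one used above.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # shape (2, 4)
B = np.arange(20).reshape(4, 5)              # shape (4, 5), entries 0..19

# A (2, 4) matrix times a (4, 5) matrix gives a (2, 5) matrix.
C = np.matmul(A, B)
print(C.shape)

# Three equivalent ways to multiply 2-d arrays.
print(np.array_equal(C, A.dot(B)))
print(np.array_equal(C, A @ B))

# Entry (0, 0) is the dot product of row 0 of A with column 0 of B.
print(C[0, 0], A[0, :].dot(B[:, 0]))
```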

**Special matrices**


**zeros** and **ones** have multidimensional analogues: pass the shape as a tuple.

In [46]:
import numpy as np
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [47]:
np.ones((5,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

**Identity** and **diagonal** matrices

In [52]:
np.eye(4,4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [49]:
np.diag([1,2,3,4])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
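
A brief sketch of how these matrices behave under matrix multiplication: the identity leaves a matrix unchanged, and a diagonal matrix applied on the left scales the rows. It also shows that np.diag applied to a 2-d array extracts its diagonal. The names A, I and D are only illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Multiplying by the identity matrix leaves A unchanged.
I = np.eye(2)
print(np.array_equal(I.dot(A), A))

# A diagonal matrix on the left scales the rows of A.
D = np.diag([10, 100])
print(D.dot(A))

# Given a 2-d array, np.diag returns its diagonal as a 1-d array.
print(np.diag(A))
```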

**Sampling** 

Numpy has several functions for drawing pseudo-random samples from specific distributions, and arrays of samples of any size and dimension can be created.

In [53]:
import numpy as np
np.random.choice([0,1,2,3],size=(5,3))

array([[3, 1, 1],
       [3, 1, 2],
       [3, 1, 1],
       [0, 2, 2],
       [1, 3, 2]])

We can sample from any discrete probability distribution with finitely many possible values.

In [54]:
np.random.choice([0,1,2,3],size=25,p=(.1,.2,.3,.4))

array([2, 2, 2, 3, 3, 3, 0, 1, 2, 2, 1, 3, 1, 3, 2, 3, 1, 3, 0, 3, 1, 2,
       3, 1, 2])

In this case, p gives the probabilities corresponding to the values 0, 1, 2, 3. The entries of p must sum to 1, otherwise an error is raised.
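
With a large sample, the observed frequencies should be close to the probabilities in p. The sketch below checks this; the seed, the sample size of 100000 and the variable names are arbitrary choices made for the illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so that the run is repeatable

values = [0, 1, 2, 3]
p = [.1, .2, .3, .4]

# Draw a large sample from the distribution given by p.
sample = np.random.choice(values, size=100_000, p=p)

# Count how often each value occurs and convert to frequencies.
freq = np.bincount(sample, minlength=4) / sample.size
print(freq)
```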

By default, if size is not specified, a single value is generated.

In [61]:
np.random.choice(['a','b','c'])

'b'

In [56]:
np.random.normal(50,5,size=(4,3))

array([[49.31241693, 39.45351907, 44.27341816],
       [43.76871162, 48.88048408, 45.35151608],
       [44.37695644, 53.70405881, 46.23681854],
       [53.74187783, 52.72283648, 54.15503063]])

In this case, 50 is the mean and 5 is the standard deviation. The call generates a 4-by-3 array, i.e. 12 values, drawn from this normal distribution.
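
The size argument controls only the shape of the result. With a much larger sample, the sample mean and standard deviation should come out close to 50 and 5. A sketch, where the seed and the sample size of 100000 are arbitrary choices:

```python
import numpy as np

np.random.seed(0)  # fix the seed so that the run is repeatable

# size=(4, 3) gives a 4-by-3 array, i.e. 12 values.
small = np.random.normal(50, 5, size=(4, 3))
print(small.shape, small.size)

# A large sample has mean and standard deviation close to the parameters.
large = np.random.normal(50, 5, size=100_000)
print(large.mean(), large.std())
```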

**Random permutation**

We can generate a random permutation of a sequence. 

The function np.random.permutation returns a shuffled copy of the input as a numpy array.

In [71]:
np.random.permutation(range(10))

array([1, 7, 8, 9, 6, 5, 3, 4, 2, 0])

In [68]:
np.random.permutation(["A","B","C","D","E"])

array(['C', 'E', 'D', 'A', 'B'], dtype='<U1')
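
The sketch below confirms that permutation leaves its input unchanged and only reorders the elements. For comparison it also uses np.random.shuffle, which reorders an array in place and returns None. The seed and variable names are arbitrary choices for the illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so that the run is repeatable

original = np.arange(10)
shuffled = np.random.permutation(original)

# The input is unchanged; the output holds the same elements in a new order.
print(original)
print(shuffled)
print(np.array_equal(np.sort(shuffled), original))

# shuffle reorders its argument in place and returns None.
deck = np.arange(10)
result = np.random.shuffle(deck)
print(deck, result)
```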