# EMATM0048: SDPA 
## Teaching Session 7A: Numpy
`Original - Zahraa Abdallah `
`Modified - Qiang Liu`

NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.

In [2]:
import numpy as np
import time
start_time = time.time()
my_arr = np.arange(10000000)
print("time for Numpy", time.time() - start_time)
start_time = time.time()
my_list = list(range(10000000))
print("time for Lists", time.time() - start_time)

time for Numpy 0.18106937408447266
time for Lists 3.102631092071533
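The cell above only times array creation. A fairer sketch of the speed claim times the same arithmetic (a sum of squares) done both ways. The array size, `time.perf_counter` and the explicit `int64` dtype (chosen so the squares cannot overflow where the default integer is 32-bit) are choices made for this example, and the timings will differ from machine to machine.

```python
import time
import numpy as np

n = 1_000_000
my_arr = np.arange(n, dtype=np.int64)
my_list = list(range(n))

# Vectorized: one expression squares and sums every element in compiled code
start = time.perf_counter()
total_np = int((my_arr * my_arr).sum())
t_numpy = time.perf_counter() - start

# Pure Python: the interpreter runs the loop body once per element
start = time.perf_counter()
total_py = sum(x * x for x in my_list)
t_python = time.perf_counter() - start

print("numpy :", t_numpy)
print("python:", t_python)
print("same result:", total_np == total_py)
```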


## Numpy's basic data structure: the ndarray

- An ndarray stores homogeneous data, i.e., all elements have the same type.
- Every array has a shape and a dtype.
- It supports convenient slicing, indexing and efficient vectorized computation.

In [6]:
import numpy as np
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)  # the ints are upcast to float so all elements share one dtype
print(arr1)

[6.  7.5 8.  0.  1. ]


Specify the element type at creation with `dtype` (here, 32-bit floats):

In [4]:
a = np.array([[1,2,3],[4,5,6]],dtype=np.float32)
a

array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)

`ndarray:`
- Arrays can have any number of dimensions, including zero (a scalar).
- Arrays are typed, e.g. `np.uint8`, `np.int64`, `np.float32`, `np.float64`.
- Arrays are dense. Each element of the array exists and has the same type.
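A short sketch of the first two points: a zero-dimensional array holding a single scalar, and `astype`, the method that converts an existing array to another dtype (the earlier cell only sets the dtype at creation time). The variable names are illustrative.

```python
import numpy as np

scalar = np.array(3.5)              # a zero-dimensional array
print(scalar.ndim, scalar.shape)    # 0 ()

ints = np.array([1, 2, 3])
floats = ints.astype(np.float32)    # astype returns a NEW array with the requested dtype
print(floats, floats.dtype)         # [1. 2. 3.] float32

mixed = np.array([1, 2.5, 3])       # mixed input is upcast to one common dtype
print(mixed.dtype)                  # float64
```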


## Creating an array

- Using lists or tuples
- homogeneous data: zeros, ones
- diagonal elements: diag, eye
- numerical ranges: arange, linspace, logspace
- random numbers: rand, randint
- Reading from files


In [7]:
# list of lists -> 2D array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)  # nested lists become a 2D array (a matrix)
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [8]:
# homogeneous data: zeros, ones
array = np.zeros((2, 3))  # 2 rows, 3 columns
print(array)
array = np.ones((2, 3))
print(array)

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]


In [10]:
# diagonal elements: diag, eye
array = np.eye(3)  # identity matrix: ones on the main diagonal
print(array)
# a diagonal matrix with the given values on the main diagonal
array = np.diag([1, 2, 3])
print(array)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1 0 0]
 [0 2 0]
 [0 0 3]]


In [11]:
# numerical ranges: arange, linspace, logspace
# arange is an array-valued version of the built-in Python range function
# (and faster than range for building large sequences of numbers)
array = np.arange(0, 10, 2)
print(array)

[0 2 4 6 8]
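`linspace` and `logspace` are listed above but not demonstrated. A minimal sketch: `linspace` fixes the number of points rather than the step, and `logspace` spaces its points evenly in the exponent (base 10 by default).

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, stop included by default
lin = np.linspace(0, 1, 5)
print(lin)   # values 0.0, 0.25, 0.5, 0.75, 1.0

# logspace(start, stop, num): values from 10**start to 10**stop, evenly spaced in log10
log = np.logspace(0, 2, 3)
print(log)   # values 1.0, 10.0, 100.0
```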


In [14]:
#random numbers: rand, randint
array = np.random.randint(0, 10, (3,3))
print (array)

[[6 3 3]
 [5 9 6]
 [6 2 2]]
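`rand` from the list above is not shown, and `randint` produces different numbers on every run. The sketch below shows `np.random.rand` together with the newer `np.random.default_rng` Generator API, whose seed (42 here, an arbitrary choice) makes the draws reproducible. As with `randint`, the upper bound of `integers` is excluded.

```python
import numpy as np

u = np.random.rand(2, 3)               # uniform floats in [0, 1); shape given as separate ints
print(u.shape)                         # (2, 3)

rng = np.random.default_rng(seed=42)   # seeded generator: same numbers on every run
r = rng.integers(0, 10, size=(3, 3))   # integers from 0 up to, but excluding, 10
print(r)
```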

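Reading from files is also listed without an example. This sketch uses in-memory `io.StringIO`/`io.BytesIO` objects in place of real files so that it runs anywhere; with files on disk you would pass a path instead (the name `"data.csv"` mentioned in the comment is hypothetical).

```python
import io
import numpy as np

# A comma-separated text "file" held in memory; a path such as "data.csv" works the same way
text = io.StringIO("1,2,3\n4,5,6")
from_text = np.loadtxt(text, delimiter=",")   # parsed as float64 by default
print(from_text)

# Binary .npy round trip: np.save writes the array, np.load reads it back
buf = io.BytesIO()
np.save(buf, from_text)
buf.seek(0)                                   # rewind before reading
restored = np.load(buf)
print(np.array_equal(from_text, restored))    # True
```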

## Type, size and shape of an array

- All elements of an ndarray are of the same type.
- The `ndarray.dtype` attribute specifies the data type of each element.
- The `ndarray.shape` attribute is a tuple giving the size of each dimension.
- The `ndarray.size` attribute gives the total number of elements in the array.



In [15]:
arr = np.random.randint(0,10,(2,3))
print(arr)
print (arr.size, arr.shape, arr.dtype)

[[1 0 2]
 [8 7 8]]
6 (2, 3) int32
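The `int32` above is platform-dependent: the default integer dtype was 32-bit on Windows before NumPy 2.0 and is 64-bit on most Linux and macOS builds. Fixing the dtype explicitly makes the related attributes `ndim`, `itemsize` and `nbytes` predictable, as in this sketch:

```python
import numpy as np

arr = np.zeros((2, 3), dtype=np.int64)
print(arr.ndim)      # 2  -> number of axes, i.e. len(arr.shape)
print(arr.itemsize)  # 8  -> bytes per element for int64
print(arr.nbytes)    # 48 -> arr.size * arr.itemsize
```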


**Reshaping** an array: 

- Total number of elements cannot change.
- Use -1 for one axis to let NumPy infer its length
- Row-major by default (MATLAB is column-major)

In [16]:
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)
print(a)
a = a.reshape(2, -1)  # -1: NumPy infers 3 columns
print(a)
a = a.ravel()  # flatten back to 1D
print(a)


[[1 2]
 [3 4]
 [5 6]]
[[1 2 3]
 [4 5 6]]
[1 2 3 4 5 6]
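The row-major default and the fixed element count can both be checked directly. In this sketch, `order="F"` fills the array column by column (the Fortran/MATLAB convention), and a shape whose element count does not match raises a `ValueError`.

```python
import numpy as np

a = np.arange(1, 7)
row_major = a.reshape(2, 3)             # C order (default): fill row by row
col_major = a.reshape(2, 3, order="F")  # Fortran/MATLAB order: fill column by column
print(row_major)   # rows [1 2 3] and [4 5 6]
print(col_major)   # rows [1 3 5] and [2 4 6]

try:
    a.reshape(4, 2)    # 8 slots for 6 elements
except ValueError as err:
    print("ValueError:", err)
```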


----------

## Accessing Arrays - Slicing and Indexing
### 1. Simple indexing

1D arrays: indexing and slicing work as for lists
- first element has index 0
- negative indices count from the end
- slices: [start:stop:step]


In [17]:
a = np.array([1,2,3,4,5,6])
a = a.reshape(3,2)

In [18]:
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [19]:
a[2].shape

(2,)

In [20]:
a[2,:]

array([5, 6])

In [21]:
a[2:, :].shape

(1, 2)
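The two shapes above differ because an integer index removes its axis while a slice keeps it, even when the slice covers a single row. A sketch on the same 3x2 array, adding comma-separated indices for 2D arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

print(a[1, 0])                    # 3: one integer per axis gives a single element
print(a[1][0])                    # 3: same element, via an intermediate row array
print(a[:, 0])                    # [1 3 5]: first column; the integer drops axis 1
print(a[::2])                     # rows 0 and 2; a slice keeps the axis
print(a[2].shape, a[2:3].shape)   # (2,) (1, 2)
```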

### Slicing, careful it's a view!
A basic slice returns a view, not a copy: any modification to the slice is reflected in the source array. NumPy is designed this way to avoid copying data unnecessarily, which matters for large arrays.

In [22]:
arr = np.arange(10)
print(arr)          # [0 1 2 3 4 5 6 7 8 9]

[0 1 2 3 4 5 6 7 8 9]


In [23]:
arr_slice = arr[5:8]
print(arr_slice)            # [5 6 7]

[5 6 7]


**Caution:** assigning to the slice also changes the original array.

After `arr_slice[1] = 12345`, `arr_slice` becomes `[5 12345 7]` and the original `arr` becomes `[0 1 2 3 4 5 12345 7 8 9]`.

In [24]:
arr_slice[1] = 12345
print(arr)                      # [    0     1     2     3     4     5 12345     7     8     9]

[    0     1     2     3     4     5 12345     7     8     9]


In [25]:
arr_slice[:] = 64
print(arr)                      # [ 0  1  2  3  4 64 64 64  8  9]

[ 0  1  2  3  4 64 64 64  8  9]
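When the original must stay untouched, copy the slice explicitly with `.copy()`. In this sketch, `np.shares_memory` confirms which array still shares memory with `arr`.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]                # basic slice: shares memory with arr
independent = arr[5:8].copy()  # explicit copy: owns its own data

independent[:] = 64            # arr is unaffected
print(arr)                     # [0 1 2 3 4 5 6 7 8 9]
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))  # True False
```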


### 2. Boolean indexing
Boolean indexing allows you to select data subsets of an array that satisfy a given condition.

In [26]:
#simple example
arr = np.array([10, 20])
idx = np.array([True, False]) # True means you want to get this value, and False means you don't want to get the value
arr[idx]

array([10])

In [27]:
arr

array([10, 20])

In [28]:
idx

array([ True, False])

In [32]:
arr = np.random.randn(10)  # generate 10 values from the standard normal distribution
arr

array([-0.0634447 ,  0.20659623, -0.85112325, -0.65702913, -0.03834749,
       -1.89775369, -0.16361688, -0.32801767, -0.74659077,  0.91234896])

In [33]:
arr<0.5

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
       False])

In [34]:
# index with a boolean mask directly
arr[arr < 0.5]  # keep only the values < 0.5

array([-0.0634447 ,  0.20659623, -0.85112325, -0.65702913, -0.03834749,
       -1.89775369, -0.16361688, -0.32801767, -0.74659077])

In [35]:
# combine conditions with & (and) or | (or); each condition needs its own parentheses
arr[(arr<0.5)&(arr>0)]

array([0.20659623])

In [36]:
#setting the value based on a boolean indexing array
arr[arr< 0] = 0
arr

array([0.        , 0.20659623, 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.91234896])
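Two points worth checking with a small, fixed array: unlike a slice, selecting with a boolean mask returns a copy (although assigning through a mask, as in the cell above, does modify the original), and masks are combined with `~`, `&` and `|` because Python's `not`, `and` and `or` raise a `ValueError` on arrays with more than one element.

```python
import numpy as np

arr = np.array([-2, -1, 0, 1, 2])
mask = arr > 0

selected = arr[mask]    # boolean indexing returns a NEW array (a copy)
selected[:] = 100
print(arr)              # unchanged: [-2 -1  0  1  2]

print(arr[~mask])                    # ~ negates the mask: [-2 -1  0]
print(arr[(arr < -1) | (arr > 1)])   # | is element-wise "or": [-2  2]
```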

### 3. Fancy indexing
Fancy indexing selects elements using lists or arrays of integer positions (list-of-locations indexing).

In [37]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [40]:
arr[6]  # select only the row at index 6 (the 7th row)

array([6., 6., 6., 6.])

In [41]:
#To select out a subset of the rows in a particular order,
#you can simply pass a list or ndarray of integers specifying the desired order
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [42]:
# or using negative indexing
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

Passing multiple index arrays does something slightly different; it selects a 1D array of
elements corresponding to each tuple of indices:

In [43]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [44]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

Take a moment to understand what just happened: the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected.
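To get a rectangular block instead (the chosen rows, and within each of them the chosen columns), one option is `np.ix_`, which turns the two index lists into an open mesh. The sketch also shows the equivalent two-step version.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Rows 1, 5, 7, 2 and, within each of them, columns 0, 3, 1, 2
block = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(block)

# Two-step equivalent: pick the rows first, then reorder the columns
same = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]
print(np.array_equal(block, same))   # True
```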

---------------

## Scalar-array operations
We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [45]:
A = np.array([[1,1],[3,3]])
A

array([[1, 1],
       [3, 3]])

In [46]:
A * 2

array([[2, 2],
       [6, 6]])

In [47]:
A + 2

array([[3, 3],
       [5, 5]])

### Element-wise array-array operations: Broadcasting
When we add, subtract, multiply or divide arrays with each other, the operation is applied element-wise by default. Vectorized operations between arrays of different sizes, and between arrays and scalars, follow the rules of broadcasting. For arrays of the same shape the idea is simple: corresponding elements are combined.

In [48]:
print (A)
print (A * A) # element-wise multiplication

[[1 1]
 [3 3]]
[[1 1]
 [9 9]]


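What about the dot product? `*` is always element-wise; the matrix (dot) product is computed with the `@` operator or `np.dot`.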
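A quick check on the same matrix `A` makes the difference between the two products concrete:

```python
import numpy as np

A = np.array([[1, 1], [3, 3]])
elementwise = A * A        # [[1 1] [9 9]]: each entry squared
matrix = A @ A             # [[ 4  4] [12 12]]: rows of A times columns of A
print(elementwise)
print(matrix)
print(np.array_equal(matrix, np.dot(A, A)))   # True for 2D arrays
```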

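The case of arrays of different shapes is slightly more complicated.
When operating on two arrays, NumPy compares their shapes dimension by dimension, starting with the trailing (rightmost) dimensions and working its way forward. Two dimensions are compatible when they are equal or when one of them is 1; a missing leading dimension counts as 1.
See http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html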


In [49]:
# v1 has fewer dimensions than A, so it is stretched along the rows to match A: this is broadcasting
v1 = np.arange(0, 2)
print(A)   # shape (2, 2)
print(v1)  # shape (2,)
A * v1

[[1 1]
 [3 3]]
[0 1]


array([[0, 1],
       [0, 3]])
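The 1D `v1` above is repeated down the rows. To repeat values across the columns instead, give the second operand shape `(2, 1)`; the trailing sizes 2 and 1 are then compatible. The names `row` and `col` are illustrative.

```python
import numpy as np

A = np.array([[1, 1], [3, 3]])    # shape (2, 2)
row = np.array([0, 1])            # shape (2,): treated as (1, 2), repeated down the rows
col = np.array([[10], [20]])      # shape (2, 1): repeated across the columns

print(A * row)   # [[0 1] [0 3]]
print(A + col)   # [[11 11] [23 23]]
```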

### When can broadcasting fail?
Broadcasting fails when, for some dimension, the two sizes differ and neither of them is 1. In that case NumPy raises an error instead of guessing how to align the arrays.

In [51]:
A= np.ones([7,8])
A

array([[1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1.]])

In [52]:
B= np.ones([9,3])
B

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In this case broadcasting fails: the trailing dimensions are 8 and 3, which differ, and neither of them is 1.

In [53]:
A+B

ValueError: operands could not be broadcast together with shapes (7,8) (9,3) 
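Broadcasting can also stretch both operands in one operation, as long as every pair of sizes is compatible. In this sketch a `(3, 1)` column and a `(4,)` row combine into a `(3, 4)` table:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4)

table = col + row                  # both operands are stretched to (3, 4)
print(table.shape)                 # (3, 4)
print(table)                       # entry [i, j] equals i + j
```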

----------

## Universal Functions: Fast Element-wise Array Functions
A universal function, or ufunc, is a function that performs element-wise operations on
data in ndarrays. You can think of them as fast vectorized wrappers for simple functions
that take one or more scalar values and produce one or more scalar results. For a full list of ufuncs, see https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs

In [54]:
#Many ufuncs are simple elementwise transformations, like sqrt or exp:
arr = np.random.randint(0, 10, (3,3))
print(arr)

[[2 7 6]
 [7 5 2]
 [4 2 2]]


In [55]:
np.sqrt(arr)

array([[1.41421356, 2.64575131, 2.44948974],
       [2.64575131, 2.23606798, 1.41421356],
       [2.        , 1.41421356, 1.41421356]])

In [56]:
np.exp(arr)  # e**x for every element (e is Euler's number)

array([[   7.3890561 , 1096.63315843,  403.42879349],
       [1096.63315843,  148.4131591 ,    7.3890561 ],
       [  54.59815003,    7.3890561 ,    7.3890561 ]])
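`sqrt` and `exp` take one array. Other ufuncs, such as `np.add` and `np.maximum`, take two and combine them element by element; ufuncs also provide a `reduce` method, which links them to the aggregations that follow. The arrays `x` and `y` are illustrative.

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([4, 2, 6])

print(np.add(x, y))       # [5 7 9], same as x + y
print(np.maximum(x, y))   # [4 5 6], the larger value at each position
print(np.add.reduce(x))   # 9, applies add along the axis, like x.sum()
```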

A set of mathematical functions which compute statistics about an entire array or about
the data along an axis are accessible as array methods. Aggregations (often called
reductions) like sum, mean, and standard deviation std can either be used by calling the
array instance method or using the top level NumPy function:

In [57]:
arr.mean()
#or 
np.mean(arr)

4.111111111111111

In [58]:
arr.sum()

37

Functions like mean and sum take an optional axis argument which computes the statistic
over the given axis, resulting in an array with one fewer dimension:

In [60]:
arr

array([[2, 7, 6],
       [7, 5, 2],
       [4, 2, 2]])

In [62]:
arr.mean(axis=0)  # mean of each column (reduces over the rows, axis 0)

array([4.33333333, 4.66666667, 3.33333333])

In [61]:
arr.mean(axis=1)  # mean of each row (reduces over the columns, axis 1)

array([5.        , 4.66666667, 2.66666667])
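The same `axis` argument works for the other reductions. This sketch fixes the values of `arr` shown above (since `randint` gives different numbers on each run) and also shows `keepdims=True`, which keeps the reduced axis with length 1 so the result can be broadcast back against `arr`.

```python
import numpy as np

arr = np.array([[2, 7, 6],
                [7, 5, 2],
                [4, 2, 2]])

print(arr.sum(axis=0))      # [13 14 10]: column sums
print(arr.max(axis=1))      # [7 7 4]: row maxima
print(arr.argmax())         # 1: position of the first maximum in the flattened array
print(arr.cumsum(axis=1))   # running totals along each row

row_means = arr.mean(axis=1, keepdims=True)  # shape (3, 1) instead of (3,)
centered = arr - row_means                   # broadcasts across each row
print(centered.mean(axis=1))                 # every row now averages to 0
```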