# NumPy

Examples taken from [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)

In [1]:
!pip install numpy



In [2]:
import numpy as np

In [3]:
np.__version__

'1.18.2'

Creating an array filled with zeros

In [4]:
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

Creating an array filled with ones

In [5]:
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

Creating a multi-dimensional array filled with a value using [np.full()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.full.html)

In [7]:
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

[numpy.arange()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) returns evenly spaced values within a given interval. The `stop` value itself is excluded.

numpy.arange([start, ]stop, [step, ]dtype=None)

In [8]:
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

[numpy.linspace()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) returns evenly spaced numbers over a specified interval.

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

In [9]:
np.linspace(0, 1, 5) #An array of five values evenly spaced between 0 and 1

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
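
The difference between the two functions is easy to miss, so here is a minimal sketch comparing them: `arange` is driven by a step size and leaves out the stop value, while `linspace` is driven by a sample count and includes the stop value unless `endpoint=False` is passed. The variable names are illustrative only.

```python
import numpy as np

# arange takes a step and excludes the stop value (half-open interval).
by_step = np.arange(0, 1, 0.25)

# linspace takes a number of samples and includes the stop value by default.
# retstep=True also returns the spacing that was computed.
by_count, step = np.linspace(0, 1, 5, retstep=True)

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 1, 4, endpoint=False)

print(by_step)    # [0.   0.25 0.5  0.75]
print(by_count)   # [0.   0.25 0.5  0.75 1.  ]
print(step)       # 0.25
print(half_open)  # [0.   0.25 0.5  0.75]
```

When the step is not an integer, `linspace` is usually the safer choice, because the number of elements returned by `arange` can be affected by floating-point rounding.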

[numpy.random.random()](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.random.html) Return random floats in the half-open interval [0.0, 1.0).

numpy.random.random(size=None)

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

In [10]:
np.random.random((3,3))

array([[0.98101268, 0.15318138, 0.62891768],
       [0.15801779, 0.36585347, 0.48044594],
       [0.59080572, 0.97395976, 0.00742054]])
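
As a rough illustration of the `size` parameter described above, the sketch below draws a single value and then a (2, 3, 4) block and checks the sample count and the interval. The seed value is an arbitrary choice made only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable

single = np.random.random()          # size=None: a single float is returned
block = np.random.random((2, 3, 4))  # 2 * 3 * 4 = 24 samples are drawn

print(type(single).__name__)  # float
print(block.shape)            # (2, 3, 4)
print(block.size)             # 24

# Every sample lies in the half-open interval [0.0, 1.0).
in_range = bool(((block >= 0.0) & (block < 1.0)).all())
print(in_range)               # True
```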

[numpy.random.normal()](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.normal.html) Draw random samples from a normal (Gaussian) distribution

numpy.random.normal(loc=0.0, scale=1.0, size=None)

In [11]:
np.random.normal(0, 1, (3,3))

array([[ 0.47980846, -0.70177807,  1.02991489],
       [ 1.2607065 ,  0.64446464, -1.34445208],
       [ 0.2486889 ,  0.54356973, -1.42027255]])

[np.random.randint()](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.randint.html) 

Return random integers from low (inclusive) to high (exclusive). 

numpy.random.randint(low, high=None, size=None, dtype='l')

In [12]:
np.random.randint(0,10,(3,3))

array([[9, 8, 7],
       [5, 3, 9],
       [3, 9, 8]])
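
The inclusive/exclusive bounds can be checked with a short sketch. It also shows the single-argument form, where the one bound given is treated as the exclusive upper limit and the range starts at 0. The seed and the `die` example are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable

samples = np.random.randint(0, 10, size=10_000)

# low (0) may be drawn, high (10) is never drawn.
print(samples.min() >= 0)  # True
print(samples.max() <= 9)  # True

# With a single bound, values come from [0, bound).
# Adding 1 shifts them to 1..6, like rolling a die.
die = np.random.randint(6, size=5) + 1
print(die.min() >= 1 and die.max() <= 6)  # True
```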

[np.eye()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.eye.html) Return a 2-D array with ones on the diagonal and zeros elsewhere.

numpy.eye(N, M=None, k=0, dtype=<class 'float'>, order='C')

In [13]:
np.eye(3) #Create an identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

## NumPy Array Attributes

For a complete list, see the [ndarray reference](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.html).

In [80]:
import numpy as np
np.random.seed(0)

x1 = np.random.randint(10, size=6) # One-dimensional array
print(f'x1: {x1}')
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
print(f'x2: {x2}')
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
print(f'x3: {x3}')

x1: [5 0 3 3 7 9]
x2: [[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
x3: [[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]


In [15]:
print(f'x3 ndim: {x3.ndim}') # number of dimensions
print(f'x3 shape: {x3.shape}') # size of each dimension
print(f'x3 size: {x3.size}') # total number of elements
print(f'x3 dtype: {x3.dtype}') # data type of the elements
print(f'x3 itemsize: {x3.itemsize} bytes') # size of one element in bytes
print(f'x3 nbytes: {x3.nbytes} bytes') # total size of the array in bytes

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 dtype: int32
x3 itemsize: 4 bytes
x3 nbytes: 240 bytes
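
The attributes above are related to each other: `size` is the product of the entries in `shape`, and `nbytes` is `size * itemsize`. A minimal sketch of those relationships follows. The dtype is set to `int32` explicitly here, as an assumption, because the default integer type (and therefore `itemsize`) depends on the platform.

```python
import numpy as np

# dtype fixed explicitly so the byte counts do not depend on the platform.
arr = np.zeros((3, 4, 5), dtype=np.int32)

print(arr.ndim == len(arr.shape))             # True
print(arr.size == 3 * 4 * 5)                  # True
print(arr.itemsize)                           # 4
print(arr.nbytes == arr.size * arr.itemsize)  # True
```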


## Slicing

In [16]:
x = np.arange(10)

In [17]:
x[:5]

array([0, 1, 2, 3, 4])

In [18]:
x[5:]

array([5, 6, 7, 8, 9])

In [19]:
x[4:7]

array([4, 5, 6])

In [20]:
x[::2]

array([0, 2, 4, 6, 8])

In [21]:
x[1::2]

array([1, 3, 5, 7, 9])

In [22]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [23]:
x[5::-2]

array([5, 3, 1])

In [24]:
two_dim_array2 = np.random.randint(10, size=(5, 5))

In [25]:
two_dim_array2

array([[4, 3, 4, 4, 8],
       [4, 3, 7, 5, 5],
       [0, 1, 5, 9, 3],
       [0, 5, 0, 1, 2],
       [4, 2, 0, 3, 2]])

In [26]:
two_dim_array2[0:2, 2:4]

array([[4, 4],
       [7, 5]])

In [27]:
two_dim_array2[:2, 1:]

array([[3, 4, 4, 8],
       [3, 7, 5, 5]])

### Accessing array rows and columns
- Accessing single rows or columns
- Can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

In [28]:
print(two_dim_array2[:, 0])

[4 4 0 0 4]


In [29]:
print(two_dim_array2[0, :])

[4 3 4 4 8]


In [30]:
print(two_dim_array2[0]) #same as two_dim_array2[0, :]

[4 3 4 4 8]


### Slicing - views vs copies
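NumPy slicing differs from Python list slicing in that **NumPy slices are views of the original array, not copies.** Modifying a slice therefore modifies the original array too. This is helpful when you work with large amounts of data, because a piece of the array can be accessed and processed without duplicating the underlying buffer.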

In [31]:
two_dim_sub = two_dim_array2[:2,:2]

In [32]:
two_dim_sub

array([[4, 3],
       [4, 3]])

In [33]:
two_dim_sub[0,0] = 99

In [34]:
two_dim_array2

array([[99,  3,  4,  4,  8],
       [ 4,  3,  7,  5,  5],
       [ 0,  1,  5,  9,  3],
       [ 0,  5,  0,  1,  2],
       [ 4,  2,  0,  3,  2]])
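
Whether two arrays share data can also be checked directly. The sketch below uses `np.shares_memory` to confirm that a basic slice is a view of its parent array, that an explicit `.copy()` is independent, and that a Python list slice behaves like a copy. The array and variable names are illustrative.

```python
import numpy as np

original = np.arange(16).reshape(4, 4)
view = original[:2, :2]            # basic slicing returns a view
copied = original[:2, :2].copy()   # explicit copy owns its own data

print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, copied))  # False

view[0, 0] = 99     # writes through to `original`
copied[0, 1] = -1   # leaves `original` untouched
print(original[0, 0])  # 99
print(original[0, 1])  # 1

# A Python list slice, by contrast, is a new list.
numbers = list(range(5))
head = numbers[:2]
head[0] = 99
print(numbers[0])  # 0
```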

#### Creating array copies

In [35]:
two_dim_sub_copy = two_dim_array2.copy()

In [36]:
two_dim_sub_copy

array([[99,  3,  4,  4,  8],
       [ 4,  3,  7,  5,  5],
       [ 0,  1,  5,  9,  3],
       [ 0,  5,  0,  1,  2],
       [ 4,  2,  0,  3,  2]])

In [37]:
two_dim_sub_copy[0,0] = 4
print(two_dim_sub_copy)
print(two_dim_array2)

[[4 3 4 4 8]
 [4 3 7 5 5]
 [0 1 5 9 3]
 [0 5 0 1 2]
 [4 2 0 3 2]]
[[99  3  4  4  8]
 [ 4  3  7  5  5]
 [ 0  1  5  9  3]
 [ 0  5  0  1  2]
 [ 4  2  0  3  2]]


## Reshaping
The size of the reshaped array needs to match the size of the original array.

In [38]:
one_dim_array = np.arange(0,10)

In [39]:
one_dim_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
one_dim_array.reshape(2,5) # The number of elements needs to match

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [41]:
one_dim_array.reshape(5,2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [42]:
one_dim_array.reshape(3,3) # This is expected to fail - mismatch in number of elements.

ValueError: cannot reshape array of size 10 into shape (3,3)
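
One way to avoid the error above is to check the element count first, or to pass `-1` for one dimension so that NumPy infers it from the array size. This is a small sketch; `can_reshape` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

one_dim_array = np.arange(10)

def can_reshape(arr, shape):
    """Hypothetical helper: True if arr has exactly as many elements as shape needs."""
    return arr.size == int(np.prod(shape))

print(can_reshape(one_dim_array, (2, 5)))  # True
print(can_reshape(one_dim_array, (3, 3)))  # False

# -1 asks NumPy to infer that dimension from the total number of elements.
inferred = one_dim_array.reshape(-1, 5)
print(inferred.shape)  # (2, 5)
```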

In [43]:
one_dim_array[:,np.newaxis] # column vector

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

In [44]:
one_dim_array[:,np.newaxis].shape # column vector

(10, 1)

In [45]:
one_dim_array[np.newaxis,:] # row vector

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [46]:
one_dim_array[np.newaxis,:].shape # row vector

(1, 10)

## Array Concatenation and Splitting

### Concatenation of arrays

In [47]:
x = np.array([1,2,3])
y = np.array([3,2,1])
np.concatenate([x,y])

array([1, 2, 3, 3, 2, 1])

#### Concatenating more than two arrays at once

In [48]:
z = [99, 99, 99]
print(np.concatenate([x,y,z]))

[ 1  2  3  3  2  1 99 99 99]


#### Concatenating two dimensional arrays

In [49]:
grid = np.array([[1,2,3],
                [4,5,6]])

In [50]:
np.concatenate([grid,grid]) # Concatenate along the first axis

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [51]:
np.concatenate([grid,grid], axis=1) # Concatenate along the second axis (zero-indexed)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [52]:
x = np.array([1,2,3])
grid = np.array([[9,8,7],
                [6,5,4]])
np.vstack([x,grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [53]:
y = np.array([[99],
             [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
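
For the inputs used above, `np.vstack` and `np.hstack` give the same result as `np.concatenate` with an explicit axis. The sketch below spells out that equivalence; the result names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])

# vstack treats the 1-D x as a (1, 3) row, then joins along axis 0.
stacked_rows = np.concatenate([x[np.newaxis, :], grid], axis=0)

# hstack on 2-D inputs joins along axis 1.
stacked_cols = np.concatenate([grid, y], axis=1)

print(np.array_equal(stacked_rows, np.vstack([x, grid])))  # True
print(np.array_equal(stacked_cols, np.hstack([grid, y])))  # True
```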

#### Splitting of arrays

In [54]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1,x2,x3 = np.split(x, [3,5]) # Providing indexes for split points
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [55]:
grid = np.arange(16).reshape((4,4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [56]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [57]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


## Universal Functions (*ufuncs*)
#### Array arithmetic

In [58]:
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division

x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


Unary negation, exponentiation (`**`), and modulus (`%`)

In [59]:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

-x =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2 =  [0 1 0 1]


Note: These arithmetic operators are [wrappers around NumPy's array arithmetic functions](https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html#Exploring-NumPy's-UFuncs); for example, `+` calls `np.add`.
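
To make the note concrete, here is a short sketch that pairs each operator used above with its corresponding ufunc and checks that both give the same result. The `pairs` dictionary is just a convenience for this example.

```python
import numpy as np

x = np.arange(4)

# Each entry: (result via operator, result via the matching ufunc).
pairs = {
    "x + 5": (x + 5, np.add(x, 5)),
    "x - 5": (x - 5, np.subtract(x, 5)),
    "-x": (-x, np.negative(x)),
    "x * 2": (x * 2, np.multiply(x, 2)),
    "x / 2": (x / 2, np.divide(x, 2)),
    "x // 2": (x // 2, np.floor_divide(x, 2)),
    "x ** 2": (x ** 2, np.power(x, 2)),
    "x % 2": (x % 2, np.mod(x, 2)),
}

for expr, (via_operator, via_ufunc) in pairs.items():
    # Prints True for every expression.
    print(expr, np.array_equal(via_operator, via_ufunc))
```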

In [60]:
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

#### Trigonometric functions

In [61]:
theta = np.linspace(0, np.pi, 3)

In [62]:
print("theta = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]


In [63]:
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

x =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]


#### Exponents and logarithms

In [64]:
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

x = [1, 2, 3]
e^x = [ 2.71828183  7.3890561  20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3  9 27]


In [65]:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

x = [1, 2, 4, 10]
ln(x) = [0.         0.69314718 1.38629436 2.30258509]
log2(x) = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]


In [66]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))

exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]
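
`np.expm1` and `np.log1p` exist because the direct expressions `np.exp(x) - 1` and `np.log(1 + x)` lose precision when `x` is very small. A minimal sketch of the effect, using an arbitrarily chosen tiny value:

```python
import numpy as np

tiny = 1e-20  # arbitrary very small input

# exp(tiny) rounds to exactly 1.0 in double precision, so the difference is 0.0.
naive_exp = np.exp(tiny) - 1.0
precise_exp = np.expm1(tiny)

# 1.0 + tiny also rounds to exactly 1.0, so the naive logarithm is 0.0.
naive_log = np.log(1.0 + tiny)
precise_log = np.log1p(tiny)

print(naive_exp, precise_exp)
print(naive_log, precise_log)
```

The naive versions return zero, while the specialised functions return a value very close to `tiny` itself.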


## Aggregations

In [67]:
L = np.random.random(100)
np.sum(L)

48.99279074870485

In [68]:
big_array = np.random.rand(1000000)
np.sum(big_array)

500204.4078203073

In [69]:
print(np.min(big_array))
print(np.max(big_array))

1.4057692298008462e-06
0.9999994392723005


### Multidimensional aggregates
For a discussion of NumPy axes, see [NumPy axes explained](https://www.sharpsightlabs.com/blog/numpy-axes-explained/).

In [70]:
M = np.random.random((3, 4))
print(M)

[[0.73705298 0.17818634 0.24485169 0.53432654]
 [0.2410882  0.26206357 0.79908324 0.80585532]
 [0.73896809 0.33675004 0.26077314 0.97274849]]


In [71]:
M.sum()

6.111747655205464

In [72]:
M.min(axis=0) # Find the minimum value within each column, 0-axis

array([0.2410882 , 0.17818634, 0.24485169, 0.53432654])

The **axis** keyword specifies the dimension of the array that will be collapsed. 
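
A small sketch of what "collapsed" means in practice: for a (3, 4) array, `axis=0` removes the row dimension and leaves one value per column, while `axis=1` removes the column dimension and leaves one value per row. The array here is a simple illustrative one, not the random `M` above.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

col_sums = M.sum(axis=0)  # collapses the 3 rows -> one value per column
row_sums = M.sum(axis=1)  # collapses the 4 columns -> one value per row

print(col_sums.shape)  # (4,)
print(row_sums.shape)  # (3,)
print(col_sums)        # [12 15 18 21]
print(row_sums)        # [ 6 22 38]

# keepdims=True keeps the collapsed axis with length 1.
print(M.sum(axis=1, keepdims=True).shape)  # (3, 1)
```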

For a list of NumPy aggregate functions, see [NumPy aggregate and statistical functions](https://www.pythonprogramming.in/numpy-aggregate-and-statistical-functions.html).

## Broadcasting

For arrays of the same size, binary operations are performed on an element-by-element basis.

In [73]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

Broadcasting allows binary operations to be performed on arrays of different sizes. In the example below, the scalar value of *5* is stretched, or broadcast, into the array [5, 5, 5] before it is added to array *a*. 

In [74]:
a + 5

array([5, 6, 7])

In [75]:
M = np.ones((3,3))
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [76]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
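
Broadcasting can also stretch both operands at once. In the sketch below, a row of shape (3,) and a column of shape (3, 1) are both broadcast to (3, 3) before being added. The last part shows that shapes which cannot be matched this way raise a `ValueError`. The variable names are illustrative.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # shape (3, 1)

# a is stretched down the rows, b is stretched across the columns.
total = a + b
print(total.shape)  # (3, 3)
print(total)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# Shapes (3, 2) and (3,) are incompatible: the trailing dimensions 2 and 3
# differ and neither of them is 1.
try:
    np.ones((3, 2)) + a
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```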