# NumPy
Python's standard library offers only basic numeric tools (such as the `math` module) and no efficient multidimensional arrays, so for mathematical and scientific computation we rely on third-party libraries. NumPy is one of the most important of these, and usually the first one learned for data analysis, machine learning and scientific computing. It is a foundational library of the scientific Python ecosystem and a building block of Scikit-Learn, SciPy, pandas and TensorFlow.

More than 4900 packages have NumPy as a dependency. It is fair to say that NumPy is one of the main reasons for the success of machine learning in Python.

NumPy's features fall into three groups:

  * mathematical functions
  * the `random` submodule
  * the `ndarray` object

NumPy is known for its:

  * `Syntax`: compact, vectorized syntax; a single line of code can apply an operation to every element of an array, even one with 100,000 elements.
  * `Speed`: most of its core code is implemented in C, which makes it much faster than equivalent pure-Python loops.


**1. The basics**
Data manipulation in Python is almost always equated with NumPy array manipulation. NumPy arrays are `homogeneous`: all elements have the same type (`dtype`). In NumPy, dimensions are called axes.
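A short sketch of what homogeneity means in practice: when a list mixes integers and floats, NumPy converts everything to one common dtype, and `ndim` reports the number of axes.

```python
import numpy as np

# Mixed int/float input: NumPy picks a single common dtype for all elements
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# A nested list becomes a 2D array: axis 0 runs down the rows, axis 1 across the columns
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)     # 2
print(grid.shape)    # (2, 3)
```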

**2. Installation**

In [3]:
# pip install numpy
import numpy as np

In [4]:
# check the version
np.__version__

'1.16.4'

**3. Array creation**

In [5]:
# array from list
# Explicit specification of datatype
np.array([1, 2, 3, 4, 5], dtype='float32')            

array([1., 2., 3., 4., 5.], dtype=float32)

In [6]:
# Create a matrix or multidimensional arrays 
np.array([[1,2],[3,4],[5,6]])

array([[1, 2],
       [3, 4],
       [5, 6]])

In [7]:
# array from scratch
# create an uninitialized array of length 5 (its values are whatever happens to be in memory)
np.empty(5)

array([4.e-323, 0.e+000, 0.e+000, 0.e+000, 0.e+000])

In [8]:
np.zeros(5, dtype = int)

array([0, 0, 0, 0, 0])

In [9]:
np.ones(5, dtype = int)

array([1, 1, 1, 1, 1])

In [10]:
np.ones((3, 5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [11]:
np.full((3, 5), 5)

array([[5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5],
       [5, 5, 5, 5, 5]])

In [13]:
# creating a linear sequence array
np.arange(0, 10, 2) # start, stop (excluded), step

array([0, 2, 4, 6, 8])

In [14]:
# 5 evenly spaced values from 0 to 1, both endpoints included
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [15]:
# Create an array of 3 random numbers, uniformly distributed in [0, 1)
np.random.random(3)

array([0.31684969, 0.5207489 , 0.29376911])

In [16]:
# Create a 3x3 array of random numbers, uniformly distributed in [0, 1)
np.random.random((3,3))

array([[0.67865059, 0.72706797, 0.32483579],
       [0.71062684, 0.85812981, 0.58742637],
       [0.68897833, 0.43210816, 0.84317505]])

In [17]:
# Create a 3x3 array of normally distributed random numbers (mean 0, standard deviation 1)
np.random.normal(0,1,(3,3))

array([[-2.29381057,  0.81093582, -0.50219528],
       [-0.12890928, -0.86438662, -0.19642092],
       [ 1.44680796,  0.29918196, -1.74417982]])
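The cells above use the older global-state `np.random` functions. As a sketch, assuming NumPy 1.17 or later (newer than the 1.16.4 shown above), the same kinds of arrays can be drawn from a seeded `Generator` created with `np.random.default_rng`, which keeps its state in an object instead of in module-level global state.

```python
import numpy as np

# Requires NumPy >= 1.17: a Generator object with its own seeded state
rng = np.random.default_rng(42)

u = rng.random(3)                  # 3 uniform values in [0, 1)
m = rng.random((3, 3))             # 3x3 uniform values in [0, 1)
n = rng.normal(0, 1, size=(3, 3))  # 3x3 normal values, mean 0, std 1

# Re-creating the generator with the same seed reproduces the same draws
rng2 = np.random.default_rng(42)
print(np.array_equal(u, rng2.random(3)))  # True
```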

In [18]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

**4. Array Attributes**

  * `ndarray.ndim`: number of axes (dimensions) of the array
  * `ndarray.shape`: tuple giving the size of the array along each dimension
  * `ndarray.size`: number of elements of the array
  * `ndarray.dtype`: type of the elements in the array
  * `ndarray.itemsize`: size in bytes of each element of the array
  * `ndarray.nbytes`: total size in bytes of the array

In [19]:
# Seed for reproducibility
np.random.seed(1)  

# 1D array
x1 = np.random.randint(10, size=5)  

# 2D array
x2 = np.random.randint(10, size=(2, 3))  

# 3D array
x3 = np.random.randint(10, size=(2, 3, 5))  

In [20]:
x1

array([5, 8, 9, 5, 0])

In [21]:
x2

array([[0, 1, 7],
       [6, 9, 2]])

In [22]:
x3

array([[[4, 5, 2, 4, 2],
        [4, 7, 7, 9, 1],
        [7, 0, 6, 9, 9]],

       [[7, 6, 9, 1, 0],
        [1, 8, 8, 3, 9],
        [8, 7, 3, 6, 5]]])

In [23]:
# Print out the array attributes
print("The Array Attributes are:")
print("-"*25)
print(f"x3 ndim:      {x3.ndim}")
print(f"x3 shape:     {x3.shape}")
print(f"x3 size:      {x3.size}")
print(f"x3 dtype:     {x3.dtype}")
print(f"x3 itemsize:  {x3.itemsize}")
print(f"x3 nbytes:    {x3.nbytes}")

The Array Attributes are:
-------------------------
x3 ndim:      3
x3 shape:     (2, 3, 5)
x3 size:      30
x3 dtype:     int64
x3 itemsize:  8
x3 nbytes:    240


For a 3D array with l layers, n rows and m columns, the shape is (l, n, m); `x3` above has 2 layers of 3 rows and 5 columns.
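The attributes are related: `size` is the product of the entries of `shape`, and `nbytes` is `size * itemsize`. A quick check on an array with a fixed dtype (`int32` is chosen here only so the byte counts don't depend on the platform's default integer type):

```python
import numpy as np

# 2 layers, 3 rows, 5 columns of 32-bit integers
a = np.zeros((2, 3, 5), dtype=np.int32)

print(a.ndim)       # 3
print(a.size)       # 30 = 2 * 3 * 5
print(a.itemsize)   # 4 bytes per int32
print(a.nbytes)     # 120 = 30 * 4

# The same relationships hold for any array
assert a.size == np.prod(a.shape)
assert a.nbytes == a.size * a.itemsize
```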

**5. Array indexing**

Indexing accesses a single element of an array. As in standard Python, indexing is `zero-based`, and 1D arrays can be indexed much like lists; negative indices count from the end.

In a multidimensional array, use a `comma-separated` tuple of indices to access an element, e.g. `x2[row, column]`.

In [24]:
x1

array([5, 8, 9, 5, 0])

In [25]:
# access the first element
x1[0]

5

In [26]:
# access the last element
x1[-1]

0

In [27]:
x2

array([[0, 1, 7],
       [6, 9, 2]])

In [28]:
x2[0, 0]

0

In [29]:
x2[0, 1]

1

In [30]:
x2[-1, -1]

2
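Indexing also works on the left-hand side of an assignment. One caveat: an array has a fixed dtype, so assigning a float into an integer array silently truncates it. A minimal sketch on a fresh array holding the `x2` values, so the cells above are unaffected:

```python
import numpy as np

a = np.array([[0, 1, 7],
              [6, 9, 2]])

a[0, 0] = 12        # row 0, column 0
a[1, -1] = 3.99     # float assigned into an int array: truncated to 3

print(a)
# [[12  1  7]
#  [ 6  9  3]]

# a[1][2] also works, but it first builds the row a[1] and then indexes it;
# a[1, 2] does the lookup in one step
print(a[1][2] == a[1, 2])  # True
```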

**6. Array slicing**

Whereas indexing accesses single elements, slicing accesses subarrays using the colon (`:`) character. The slicing syntax follows that of standard Python lists:

`x[start:stop:step]`
The `stop` index is excluded. The default values are start=0, stop=size of dimension, step=1. With a negative step the defaults are swapped, so `x[::-1]` runs from the last element to the first.

In [35]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [37]:
x[:5]

array([0, 1, 2, 3, 4])

In [38]:
x[5:7]

array([5, 6])

In [39]:
x[::2]

array([0, 2, 4, 6, 8])

In [40]:
x[1::2]

array([1, 3, 5, 7, 9])

In [41]:
# slicing using a negative step value
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [42]:
x[::-2]

array([9, 7, 5, 3, 1])

In [43]:
# multidimensional
x2

array([[0, 1, 7],
       [6, 9, 2]])

In [45]:
x2[0, : ]

array([0, 1, 7])

In [46]:
x2[:, 0]

array([0, 6])

In [47]:
x2[:2, :3] #two rows, three columns

array([[0, 1, 7],
       [6, 9, 2]])

In [48]:
x2[:, ::2] # all rows, every other column

array([[0, 7],
       [6, 2]])

In [49]:
x2[::-1, ::-1] # reverse both the rows and the columns

array([[2, 9, 6],
       [7, 1, 0]])

In [50]:
x2.copy() # independent copy: changes to it do not affect x2

array([[0, 1, 7],
       [6, 9, 2]])

In [51]:
# Slicing returns a view of the original array, not a copy
x2_sub = x2[:2,:2]
print(x2_sub)

[[0 1]
 [6 9]]


In [52]:
# Change first element to 10
x2_sub[0,0] = 10
print(x2_sub)

[[10  1]
 [ 6  9]]


In [53]:
x2

array([[10,  1,  7],
       [ 6,  9,  2]])
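The cells above show that a slice is a *view*: it shares memory with `x2`, so writing to `x2_sub` changed `x2`. To modify a subarray without touching the original, take an explicit copy with `.copy()`. A sketch with fresh arrays:

```python
import numpy as np

original = np.array([[0, 1, 7],
                     [6, 9, 2]])

view = original[:2, :2]          # a view: shares memory with original
copy = original[:2, :2].copy()   # an independent copy

copy[0, 0] = 99    # does not affect original
view[0, 0] = 10    # does affect original

print(original[0, 0])                    # 10
print(np.shares_memory(view, original))  # True
print(np.shares_memory(copy, original))  # False
```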

**7. Reshaping arrays**

In [54]:
x4 = np.arange(1,10)
x4

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [62]:
x4.reshape(1,9)

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [55]:
x5 = x4.reshape((3,3))
x5

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [56]:
x6 = np.array([1,2,3]) # 1D array: neither a row nor a column vector
x6

array([1, 2, 3])

In [59]:
x6.reshape(1,3) # row vector, shape (1, 3)

array([[1, 2, 3]])

In [60]:
x6.reshape(3,1) # column vector, shape (3, 1)

array([[1],
       [2],
       [3]])
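Two conveniences when reshaping: one dimension may be given as `-1`, in which case NumPy infers it from the total size, and `np.newaxis` inserts a length-1 axis, a compact way to turn a 1D array into a row or column vector. A short sketch:

```python
import numpy as np

x = np.arange(1, 10)

# -1 lets NumPy infer the missing dimension: 9 elements / 3 rows = 3 columns
print(x.reshape(3, -1).shape)   # (3, 3)

v = np.array([1, 2, 3])
row = v[np.newaxis, :]          # same as v.reshape(1, 3)
col = v[:, np.newaxis]          # same as v.reshape(3, 1)
print(row.shape, col.shape)     # (1, 3) (3, 1)

# reshape needs the sizes to match: 9 elements cannot fill a 2x5 array
try:
    x.reshape(2, 5)
except ValueError as err:
    print("ValueError:", err)
```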

**8. Transposing arrays**

In [63]:
arr = np.arange(15).reshape((3,5))

In [64]:
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [66]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
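`arr.T` reverses the order of the axes, so a (3, 5) array becomes (5, 3), and element `[i, j]` of the result is element `[j, i]` of the original. For arrays with more than two axes, `np.transpose` also accepts an explicit axis order. A sketch:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
t = arr.T
print(t.shape)                 # (5, 3)
print(t[4, 2] == arr[2, 4])    # True

# 3D: .T reverses all axes; np.transpose can reorder them explicitly
cube = np.zeros((2, 3, 5))
print(cube.T.shape)                           # (5, 3, 2)
print(np.transpose(cube, (1, 0, 2)).shape)    # (3, 2, 5)
```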

**9. Sorting arrays**

In [67]:
arr = np.random.randn(10)

In [68]:
np.sort(arr)

array([-1.30486124, -1.27321995, -0.74362701, -0.43712177, -0.42645009,
       -0.38057504, -0.36945748,  0.09837051,  1.0149868 ,  1.3814073 ])

In [69]:
arr2 = np.random.random((5, 3))
arr2

array([[0.44613451, 0.22212455, 0.07336417],
       [0.46923853, 0.09617226, 0.90337017],
       [0.11949047, 0.52479938, 0.083623  ],
       [0.91686133, 0.91044838, 0.29893011],
       [0.58438912, 0.56591203, 0.61393832]])

In [70]:
# Sort each column independently (along axis 0, i.e. down the rows)
np.sort(arr2, axis=0)

array([[0.11949047, 0.09617226, 0.07336417],
       [0.44613451, 0.22212455, 0.083623  ],
       [0.46923853, 0.52479938, 0.29893011],
       [0.58438912, 0.56591203, 0.61393832],
       [0.91686133, 0.91044838, 0.90337017]])

In [71]:
# Indices of the sorted elements
np.argsort(arr)

array([0, 8, 2, 3, 4, 1, 7, 6, 9, 5])

In [72]:
np.argsort(arr2, axis=0)

array([[2, 1, 0],
       [0, 0, 2],
       [1, 2, 3],
       [4, 4, 4],
       [3, 3, 1]])
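`np.argsort` returns the indices that would sort the array, so indexing the array with them reproduces `np.sort`. A common use is ordering the rows of a table by one column. A sketch with small fixed data (the `scores` table is purely illustrative):

```python
import numpy as np

a = np.array([3.2, -1.0, 0.5, 2.1])
order = np.argsort(a)
print(order)                                 # [1 2 3 0]
print(np.array_equal(a[order], np.sort(a)))  # True

# Illustrative table: column 0 is an id, column 1 a score; sort rows by score
scores = np.array([[1, 90],
                   [2, 75],
                   [3, 82]])
by_score = scores[np.argsort(scores[:, 1])]
print(by_score)
# [[ 2 75]
#  [ 3 82]
#  [ 1 90]]

# np.sort returns a sorted copy; the .sort() method sorts in place
a.sort()
```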

**10. Array concatenation**

Multiple arrays can be combined into one with array concatenation. Concatenating, or joining, arrays is done with `np.concatenate`, `np.vstack`, and `np.hstack`.

In [74]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x, y])

array([1, 2, 3, 4, 5, 6])

In [75]:
# Concatenate more than two arrays at once:
z = [10, 100, 1000]
print(np.concatenate([x, y, z]))

[   1    2    3    4    5    6   10  100 1000]


In [76]:
# Concatenate along the first axis
arr = np.array([[1, 2, 3],
                 [4, 5, 6]])
np.concatenate([arr, arr])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [77]:
x = np.array([1, 2, 3])
arr = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Vertically stack the arrays
np.vstack([x, arr])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [78]:
# Horizontally stack the arrays
y = np.array([[55],
              [55]])
np.hstack([arr, y])

array([[ 9,  8,  7, 55],
       [ 6,  5,  4, 55]])
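`np.concatenate` joins along axis 0 by default; passing `axis=1` joins side by side, which for 2D arrays matches `np.hstack`. `np.vstack` accepted the 1D array `x` above because it treats it as a row of shape (1, 3), whereas `np.concatenate` requires all inputs to have the same number of dimensions. A sketch:

```python
import numpy as np

arr = np.array([[9, 8, 7],
                [6, 5, 4]])
y = np.array([[55],
              [55]])

side_by_side = np.concatenate([arr, y], axis=1)
print(np.array_equal(side_by_side, np.hstack([arr, y])))  # True

x = np.array([1, 2, 3])
try:
    np.concatenate([x, arr])          # 1D and 2D: dimensions differ
except ValueError as err:
    print("ValueError:", err)

# Adding a leading axis makes the shapes compatible: (1, 3) + (2, 3) -> (3, 3)
stacked = np.concatenate([x[np.newaxis, :], arr])
print(np.array_equal(stacked, np.vstack([x, arr])))  # True
```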

**11. Splitting of arrays**

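Splitting, the opposite of concatenation, is implemented by the functions `np.split`, `np.hsplit`, and `np.vsplit`. `np.split` takes either a number of equal-sized pieces or a list of split points.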

In [80]:
x = np.arange(9)
x1, x2, x3 = np.split(x, 3)
print(x1, x2, x3)

[0 1 2] [3 4 5] [6 7 8]


In [81]:
x1, x2, x3 = np.split(x, [3,5])
print(x1, x2, x3)

[0 1 2] [3 4] [5 6 7 8]
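`np.vsplit` and `np.hsplit` apply the same idea to 2D arrays: `vsplit` cuts along the rows (axis 0) and `hsplit` along the columns (axis 1). As with `np.split`, a list gives the split points. A sketch:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

upper, lower = np.vsplit(grid, [2])   # rows 0-1 and rows 2-3
left, right = np.hsplit(grid, [2])    # columns 0-1 and columns 2-3
print(upper)
# [[0 1 2 3]
#  [4 5 6 7]]
print(left)
# [[ 0  1]
#  [ 4  5]
#  [ 8  9]
#  [12 13]]

# An integer that does not divide the axis evenly raises a ValueError
try:
    np.split(np.arange(9), 2)
except ValueError as err:
    print("ValueError:", err)
```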


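**12. Universal functions (ufuncs)**

Universal functions (ufuncs) apply an operation element-wise to whole arrays in compiled code. They come in two kinds:

`unary ufuncs` operate on a single input

`binary ufuncs` operate on two inputs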

`Unary ufuncs  : abs, sqrt, square, exp, log, sign, ceil, floor, cos, sin, tan`

`Binary ufuncs : add, subtract, multiply, divide, power, maximum, minimum, mod`


In [82]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [83]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [84]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [85]:
x = np.random.randn(10)
y = np.random.randn(10)

In [86]:
x

array([ 0.19311796, -1.55300741, -0.01001354,  0.64916553,  1.55857293,
        0.23835181,  0.10808662,  1.01526292,  0.95963678, -0.12730315])

In [87]:
y

array([ 0.69855416, -2.22549508, -1.26822935,  1.39218201,  1.92383191,
        0.99109127,  0.3806108 , -0.74776636, -1.3176775 ,  1.33291008])

In [88]:
np.maximum(x, y)

array([ 0.69855416, -1.55300741, -0.01001354,  1.39218201,  1.92383191,
        0.99109127,  0.3806108 ,  1.01526292,  0.95963678,  1.33291008])

In [90]:
np.minimum(x, y)

array([ 0.19311796, -2.22549508, -1.26822935,  0.64916553,  1.55857293,
        0.23835181,  0.10808662, -0.74776636, -1.3176775 , -0.12730315])

In [91]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y) # write the result directly into y
print(y)

[ 0. 10. 20. 30. 40.]
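Python's arithmetic operators on arrays are shorthand for these ufuncs (`+` for `np.add`, `**` for `np.power`, `%` for `np.mod`, and so on), and binary ufuncs work element-wise between arrays of the same shape or between an array and a scalar. A quick check:

```python
import numpy as np

x = np.arange(5)
y = np.array([2, 2, 2, 2, 2])

print(np.array_equal(x + y, np.add(x, y)))        # True
print(np.array_equal(x ** 2, np.power(x, 2)))     # True
print(np.array_equal(x % 3, np.mod(x, 3)))        # True
print(np.abs(np.array([-2, 0, 3])))               # [2 0 3]
```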


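**13. Aggregates for ufuncs**

The `reduce` method repeatedly applies a given operation to the elements of an array until a single result remains; `accumulate` does the same but keeps every intermediate result.

The `outer` method computes the output of the operation for every pair of elements drawn from two inputs.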

In [92]:
x = np.arange(1,6)
x

array([1, 2, 3, 4, 5])

In [94]:
np.add.reduce(x)

15

In [95]:
np.multiply.reduce(x)

120

In [96]:
# Accumulate
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15])

In [97]:
np.multiply.accumulate(x)

array([  1,   2,   6,  24, 120])

In [98]:
# outer
np.multiply.outer(x, x)

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
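These methods have familiar shortcuts: `np.add.reduce` gives the same result as `np.sum`, `np.multiply.reduce` as `np.prod`, and the `accumulate` versions match `np.cumsum` and `np.cumprod`. `outer` works with any binary ufunc, for example `np.add.outer`. A sketch:

```python
import numpy as np

x = np.arange(1, 6)

assert np.add.reduce(x) == np.sum(x) == 15
assert np.multiply.reduce(x) == np.prod(x) == 120
assert np.array_equal(np.add.accumulate(x), np.cumsum(x))
assert np.array_equal(np.multiply.accumulate(x), np.cumprod(x))

# outer for addition: entry [i, j] is x[i] + x[j]
table = np.add.outer(x, x)
print(table[0])     # [2 3 4 5 6]
print(table.shape)  # (5, 5)
```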

**14. Array Aggregation**

Key aggregation functions available in NumPy:

  * `np.sum`: compute the sum of elements
  * `np.prod`: compute the product of elements
  * `np.mean`: compute the mean of elements
  * `np.std`: compute the standard deviation
  * `np.var`: compute the variance
  * `np.min`: find the minimum value
  * `np.max`: find the maximum value
  * `np.argmin`: find the index of the minimum value
  * `np.argmax`: find the index of the maximum value
  * `np.median`: compute the median of elements
  * `np.percentile`: compute rank-based statistics (percentiles) of elements
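Applied to a small array with known values, the functions above give the following (a sketch; note that `np.std` and `np.var` use the population formulas, dividing by N, unless `ddof=1` is passed):

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.sum(data))             # 40
print(np.prod(data))            # 201600
print(np.mean(data))            # 5.0
print(np.median(data))          # 4.5
print(np.var(data))             # 4.0 (population variance)
print(np.std(data))             # 2.0
print(np.min(data), np.max(data))        # 2 9
print(np.argmin(data), np.argmax(data))  # 0 7
print(np.percentile(data, 25))  # 4.0
```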

In [100]:
# Arithmetic operations
x = np.arange(5)

print("x      =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2)  # floor division

x      = [0 1 2 3 4]
x + 5  = [5 6 7 8 9]
x - 5  = [-5 -4 -3 -2 -1]
x * 2  = [0 2 4 6 8]
x / 2  = [0.  0.5 1.  1.5 2. ]
x // 2 = [0 0 1 1 2]


In [104]:
numbers = np.random.rand(100000)

In [105]:
print(f'Minimum: {np.min(numbers):.4f}, Maximum: {np.max(numbers):.4f}')

Minimum: 0.0000, Maximum: 1.0000


In [106]:
# Equivalent, shorter method syntax
numbers.min(), numbers.max(), numbers.sum()

(6.737492946884416e-07, 0.9999809276713819, 49904.84716574471)

In [107]:
# The built-in min() loops over the elements in Python; np.min and .min() run in compiled code
%timeit min(numbers)
%timeit np.min(numbers)
%timeit numbers.min()

5.26 ms ± 164 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
29.5 µs ± 727 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
28 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


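**15. Multi-dimensional aggregation**

Aggregations can be computed along a single axis with the `axis` argument: `axis=0` aggregates down each column, and `axis=1` across each row.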

In [109]:
arr = np.random.random((3,4))
arr

array([[0.9545806 , 0.23699656, 0.70800002, 0.95491124],
       [0.81762344, 0.29703103, 0.49466323, 0.71154474],
       [0.31744334, 0.50475494, 0.77333829, 0.44346663]])

In [110]:
arr.sum()

7.2143540478610655

In [111]:
# axis=0: sum down each column (one value per column)
arr.sum(axis = 0)

array([2.08964737, 1.03878253, 1.97600154, 2.10992261])

In [112]:
# axis=1: sum across each row (one value per row)
arr.sum(axis = 1)

array([2.85448842, 2.32086243, 2.03900319])
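With small integers the axis semantics are easy to verify: the axis you pass is the one that gets collapsed, so a (2, 3) array gives 3 values for `axis=0` and 2 values for `axis=1`. A sketch:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))    # [5 7 9]        one value per column
print(m.sum(axis=1))    # [ 6 15]        one value per row
print(m.mean(axis=0))   # [2.5 3.5 4.5]
print(m.max(axis=1))    # [3 6]
print(m.shape, m.sum(axis=0).shape, m.sum(axis=1).shape)  # (2, 3) (3,) (2,)
```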

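**16. Boolean operations**

Comparison operators (`<`, `>`, `==`, `!=`, `<=`, `>=`) work element-wise and return Boolean arrays. Boolean arrays can be combined with the bitwise operators, which are shorthand for these ufuncs:

  * `&`: np.bitwise_and
  * `|`: np.bitwise_or
  * `^`: np.bitwise_xor
  * `~`: np.bitwise_not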

In [114]:
x = np.arange(1,6)
x

array([1, 2, 3, 4, 5])

In [115]:
x < 3

array([ True,  True, False, False, False])

In [116]:
x > 3

array([False, False, False,  True,  True])

In [117]:
x == 3

array([False, False,  True, False, False])

In [118]:
x != 3

array([ True,  True, False,  True,  True])

In [119]:
np.count_nonzero(x < 4)

3

In [120]:
np.sum(x<4)

3

In [121]:
np.any(x > 8)

False

In [122]:
np.all(x<8)

True

In [123]:
np.sum((x > 2) & (x < 5))

2
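Two details about the last cell. The parentheses are required: `&` binds more tightly than `<` and `>`, so without them Python would evaluate `2 & x` first. A Boolean array can also be used as an index (a mask) to select the matching elements. A sketch:

```python
import numpy as np

x = np.arange(1, 6)
mask = (x > 2) & (x < 5)
print(mask)           # [False False  True  True False]
print(x[mask])        # [3 4]  only the elements where mask is True
print(x[x % 2 == 1])  # [1 3 5]

# Without parentheses this is the chained comparison x > (2 & x) < 5,
# which needs a single True/False for a whole array and raises ValueError
try:
    x > 2 & x < 5
except ValueError as err:
    print("ValueError:", err)
```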

**17. File input & output with arrays**

NumPy can save `ndarray` objects to disk files and load them back. The functions `np.save` and `np.load` handle binary files with the `.npy` extension, while `np.savetxt` and `np.loadtxt` handle plain text files.

In [None]:
# Save input array with .npy extension
np.save('outfile', x)

# Save input array as a text file
np.savetxt('outfile.txt', x)

# Load and reconstruct array from outfile.npy (np.save added the .npy extension)
print(np.load('outfile.npy'))

# Load and reconstruct array from outfile.txt (values come back as floats)
print(np.loadtxt('outfile.txt'))
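`np.save` stores one array per file. As a sketch (the file names are arbitrary), `np.savez` stores several arrays in a single `.npz` archive under keyword names, and writing into a temporary directory avoids leaving files behind. `np.savetxt` writes numbers as floats by default, so a text round trip returns `float64`:

```python
import os
import tempfile

import numpy as np

x = np.arange(1, 6)
y = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, x=x, y=y)            # several arrays, one file

    with np.load(path) as data:         # dict-like archive, keyed by name
        x_loaded = data["x"]
        y_loaded = data["y"]

    txt = os.path.join(tmp, "x.txt")
    np.savetxt(txt, x)
    x_txt = np.loadtxt(txt)             # comes back as float64

print(np.array_equal(x, x_loaded), np.array_equal(y, y_loaded))  # True True
print(x_txt.dtype)  # float64
```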

**18. References**

NumPy documentation: https://docs.scipy.org/doc/numpy/

Jake VanderPlas (2016). Python Data Science Handbook: Essential Tools for Working with Data.

Wes McKinney (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython.
