## NumPy N-dimensional Arrays

#### Motivation:

* Fast mathematical operations on entire arrays of data without writing loops
* Linear algebra
* Random number generation
* Data munging and cleaning
* Subsetting and filtering
* Transformation and other array computations
* Sorting
* Finding unique elements
* Set operations
* Expressing conditional logic as array expressions instead of loops with if-elif-else branches
* Group-wise data manipulations (aggregation, transformation, function application)

##### NumPy-based algorithms are generally 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.
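
A rough way to check this claim on your own machine is to double a million numbers once with a pure-Python list comprehension and once with a single NumPy expression. This is a sketch: the array size and repetition count are arbitrary choices, and the measured speedup depends on the hardware and NumPy build, so no particular number is promised.

```python
import timeit
import numpy as np

# One million integers, stored once as a NumPy array and once as a Python list
n = 1_000_000
np_arr = np.arange(n)
py_list = list(range(n))

def loop_double():
    # Pure Python: the interpreter handles every element individually
    return [x * 2 for x in py_list]

def vectorized_double():
    # NumPy: one expression whose loop runs in compiled code
    return np_arr * 2

# Both approaches produce the same values
assert np.array_equal(np.array(loop_double()), vectorized_double())

# Time 10 repetitions of each; the exact figures vary from machine to machine
t_loop = timeit.timeit(loop_double, number=10)
t_vec = timeit.timeit(vectorized_double, number=10)
print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.3f}s  speedup: {t_loop / t_vec:.0f}x")
```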

In [None]:
import numpy as np

In [None]:
arr = [1,2,3,4]
arr

[1, 2, 3, 4]

In [None]:
arr = np.array([1,2,3,4,5])
arr

array([1, 2, 3, 4, 5])

In [None]:
arr2 = np.array([1,2,3,4.0,5])
arr2

array([1., 2., 3., 4., 5.])

In [None]:
arr3 = np.array(['hello',2])
arr3

array(['hello', '2'], dtype='<U21')
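
As the last two cells show, NumPy picks one dtype that can hold every element: a single float upcasts the integers, and a single string turns everything into strings. The sketch below also shows the two standard ways to control the dtype yourself, the `dtype=` argument and `astype`. The exact integer dtype is platform dependent, so it is only checked loosely here.

```python
import numpy as np

# NumPy infers one dtype for the whole array from its elements
ints = np.array([1, 2, 3])          # an integer dtype (int64 on most platforms)
floats = np.array([1, 2, 3.0])      # one float upcasts everything to float64
strings = np.array(['hello', 2])    # one string forces a Unicode string dtype

print(ints.dtype, floats.dtype, strings.dtype)

# The dtype can be requested explicitly when the array is created ...
f32 = np.array([1, 2, 3], dtype=np.float32)

# ... or changed afterwards with astype, which returns a new array.
# Casting float to int truncates toward zero.
truncated = np.array([1.7, -2.2, 3.9]).astype(np.int64)
print(f32.dtype, truncated)
```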

In [None]:
l1 = [1,'2',4.0]
l1

[1, '2', 4.0]
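
The list above keeps each element's own Python type, while an array stores every element with the same dtype. A minimal sketch that makes the contrast visible by converting the same list to an array:

```python
import numpy as np

l1 = [1, '2', 4.0]

# A Python list keeps the type of each element separately
print([type(x).__name__ for x in l1])   # ['int', 'str', 'float']

# Converting the same list to an array coerces all elements to one dtype,
# here a Unicode string dtype (kind 'U')
a1 = np.array(l1)
print(a1, a1.dtype.kind)
```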

In [None]:
a = np.arange(6) 
print(a) 

[0 1 2 3 4 5]


In [None]:
np.linspace(1, 2, 4)  

array([1.        , 1.33333333, 1.66666667, 2.        ])

In [None]:
np.arange(10, 30, 5)

array([10, 15, 20, 25])
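
`np.arange` takes a step size and excludes the stop value, whereas `np.linspace` takes a number of points and includes the stop value unless `endpoint=False` is passed. A short sketch comparing the two on the interval from 0 to 1 (the particular step and point counts are arbitrary):

```python
import numpy as np

# arange: step of 0.25, stop value 1 is excluded (half-open interval [0, 1))
a = np.arange(0, 1, 0.25)          # values 0, 0.25, 0.5, 0.75

# linspace: 5 evenly spaced points, stop value 1 is included by default
b = np.linspace(0, 1, 5)           # values 0, 0.25, 0.5, 0.75, 1

# endpoint=False makes linspace exclude the stop value, matching arange here
c = np.linspace(0, 1, 4, endpoint=False)

print(a)
print(b)
print(c)
```

For non-integer steps, `linspace` is usually the safer choice because the number of points is fixed up front instead of depending on floating-point round-off.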

In [None]:
np.random.rand()

0.006346772644350573
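
Calls such as `np.random.rand()` return different values on every run. When results need to be reproducible, one option is NumPy's `Generator` API, created with `np.random.default_rng(seed)`; it is used below in place of the legacy functions shown in this notebook, and the seed 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce the same stream of numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

u1 = rng1.random(3)                 # 3 uniform floats in [0, 1)
u2 = rng2.random(3)
print(np.array_equal(u1, u2))       # True

# Integers in [0, 10), comparable to np.random.randint(0, 10, (3, 3))
ints = rng1.integers(0, 10, size=(3, 3))
print(ints)
```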

In [None]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
a = [1,2]
b = [3,4]

print(a)
print(b)

[1, 2]
[3, 4]


In [None]:
arr = [[1,2],[3,4],[6,7]]
arr

[[1, 2], [3, 4], [6, 7]]

In [None]:
len(arr[0])

2
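
The nested list above only acts like a table by convention: its size has to be read off with `len`. Converting it to a 2-D array makes the shape explicit and allows column access, as in this small sketch using the same values:

```python
import numpy as np

nested = [[1, 2], [3, 4], [6, 7]]
m = np.array(nested)

print(m.shape)                      # (3, 2): 3 rows, 2 columns
print(len(nested), len(nested[0]))  # 3 2: the same information from the list
print(m[:, 0])                      # first column; a list of lists has no direct equivalent
```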

In [None]:
# shape is given as (rows, columns)
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
np.full((4,5),3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [None]:
np.random.random((3,3))

array([[0.83181479, 0.05619459, 0.83762906],
       [0.96017102, 0.79294208, 0.45985624],
       [0.14032915, 0.16030955, 0.44373191]])

In [None]:
np.random.randint(0,10,(3,3))

array([[2, 3, 4],
       [2, 5, 9],
       [5, 5, 6]])

In [None]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
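
The constructors above default to `float64`. They also accept a `dtype` argument, and the `*_like` variants (`np.zeros_like`, `np.ones_like`, `np.full_like`) build an array with the same shape and dtype as an existing one. A brief sketch; the template array is just an example:

```python
import numpy as np

# Request an integer array instead of the default float64
zi = np.zeros((2, 3), dtype=int)
print(zi)

# The *_like functions copy shape and dtype from an existing array
template = np.arange(6).reshape(2, 3)
o = np.ones_like(template)
f = np.full_like(template, 7)
print(o.shape, f)
```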

In [None]:
# n-dimensional array

x1 = np.random.randint(10,size=5)
x2 = np.random.randint(10,size=(2,5))
x3 = np.random.randint(10,size=(2,2,2))

print(x1)
print('\n')

print(x2)
print('\n')

print(x3)


[6 2 2 4 4]


[[0 3 3 1 8]
 [9 9 0 5 6]]


[[[1 5]
  [4 4]]

 [[7 5]
  [8 8]]]
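
Every array reports its number of axes (`ndim`), its length along each axis (`shape`) and its total element count (`size`). The sketch rebuilds the three arrays above with a seeded `Generator`, an assumption made only so the output is repeatable; `rng.integers(10, size=...)` draws from [0, 10) like `np.random.randint(10, size=...)`.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.integers(10, size=5)
x2 = rng.integers(10, size=(2, 5))
x3 = rng.integers(10, size=(2, 2, 2))

for name, x in [("x1", x1), ("x2", x2), ("x3", x3)]:
    # ndim = number of axes, shape = length of each axis, size = total elements
    print(name, "ndim:", x.ndim, "shape:", x.shape, "size:", x.size)
```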


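##### Random numbers are useful, for example, for generating sample data whose distribution we want to visualize. We can create them with np.random.randn, a function in the np.random module that draws samples from the standard normal distribution (mean 0, standard deviation 1).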

In [None]:
data = np.random.randn(2, 4)
data

array([[-0.19832708, -0.15764621,  0.88544037,  0.36486148],
       [ 0.98874872, -1.1725344 ,  0.79148217, -0.23354255]])

In [None]:
data.mean()

0.15856031131613918

In [None]:
data.std()

0.6904374166569202
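
`mean()` divides the sum of all elements by their count, and `std()` by default (`ddof=0`) is the square root of the mean squared deviation from that mean. This sketch recomputes both by hand to confirm the definitions; `rng.standard_normal` is used as a seeded stand-in for `np.random.randn`.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((2, 4))   # same distribution as np.random.randn(2, 4)

# Mean: sum of all elements divided by their count
manual_mean = data.sum() / data.size
# Standard deviation (NumPy default ddof=0): sqrt of the mean squared deviation
manual_std = np.sqrt(((data - manual_mean) ** 2).mean())

print(np.isclose(manual_mean, data.mean()), np.isclose(manual_std, data.std()))

# ddof=1 divides by (n - 1) instead, giving the sample standard deviation
print(data.std(ddof=1))
```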

## Mathematical operations on numpy arrays

In [None]:
arr = np.random.random(100)
print(arr)

[0.88929922 0.77930962 0.40514615 0.89979832 0.68614969 0.20205667
 0.107712   0.39330353 0.16517556 0.09826604 0.96227779 0.55339067
 0.74481862 0.34377244 0.39260275 0.20716695 0.6515574  0.78264983
 0.69228467 0.8582067  0.64205919 0.86325104 0.61955729 0.84001506
 0.84403731 0.07976155 0.16038319 0.01041679 0.97559198 0.00858289
 0.84388646 0.3189685  0.53425383 0.56852231 0.38112356 0.3476578
 0.67829646 0.91569244 0.96231417 0.59558709 0.768116   0.86787681
 0.88312977 0.47370671 0.46566799 0.33949082 0.76158728 0.59645061
 0.95032255 0.13324071 0.57713626 0.62811099 0.77893485 0.86204346
 0.32019515 0.54442625 0.49189051 0.79312995 0.37442414 0.01238912
 0.55487967 0.5783675  0.87496156 0.54815827 0.72580478 0.07599328
 0.22142366 0.19023634 0.15305167 0.11521121 0.1610467  0.59437559
 0.28808862 0.45814229 0.50165613 0.55165886 0.27596329 0.21079497
 0.42963506 0.77229563 0.37442844 0.09468375 0.3174934  0.49974508
 0.98100813 0.26959662 0.5785209  0.68046458 0.06305069 0.53231

In [None]:
np.sum(arr)

50.74215374224535

In [None]:
np.max(arr)

0.981008130240093

In [None]:
np.min(arr)

0.008582892806858755
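
On a multi-dimensional array, aggregations such as `sum`, `max` and `min` reduce over all elements by default, or over one axis when `axis` is given: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row). A small example with fixed values:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.sum())          # 21: all elements
print(a.sum(axis=0))    # [5 7 9]: one total per column
print(a.sum(axis=1))    # [ 6 15]: one total per row
print(a.max(axis=1))    # [3 6]: largest value in each row
print(np.argmin(a))     # 0: position of the smallest element in the flattened array
```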

In [None]:
data = np.random.randn(2,4)
data

array([[-0.67359414, -2.22456168,  1.69108421,  0.81756886],
       [-0.02306208,  2.56906686, -0.58663781, -0.1678238 ]])

In [None]:
# multiplication

data*2

array([[-1.34718828, -4.44912337,  3.38216842,  1.63513772],
       [-0.04612415,  5.13813371, -1.17327561, -0.3356476 ]])

In [None]:
# addition

data+2

array([[ 1.32640586, -0.22456168,  3.69108421,  2.81756886],
       [ 1.97693792,  4.56906686,  1.41336219,  1.8321762 ]])

In [None]:
self_add = data + data
self_add

array([[-1.34718828, -4.44912337,  3.38216842,  1.63513772],
       [-0.04612415,  5.13813371, -1.17327561, -0.3356476 ]])

In [None]:
data - data

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])
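
Arithmetic between two arrays of the same shape is applied element by element, so `*` is not a matrix product; NumPy uses the `@` operator for matrix multiplication. A minimal comparison on a 2x2 array:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

print(a * a)   # element-wise products: 1, 4, 9, 16
print(a @ a)   # matrix product: [[7, 10], [15, 22]]
print(a - a)   # element-wise differences: all zeros
```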

In [None]:
arr = [1,2,3,4]

for i in range(len(arr)):
  arr[i] = arr[i] +2

arr


[3, 4, 5, 6]

In [None]:
np.array([1,2,3,4])+2

array([3, 4, 5, 6])

##### Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [None]:
1/data 

array([[ -1.48457348,  -0.44952676,   0.59133661,   1.22313856],
       [-43.36123006,   0.38924639,  -1.70462932,  -5.95863048]])

In [None]:
# square of each element of array

data ** 2

array([[4.53729066e-01, 4.94867469e+00, 2.85976581e+00, 6.68418838e-01],
       [5.31859367e-04, 6.60010451e+00, 3.44143914e-01, 2.81648273e-02]])
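
One way to picture scalar propagation: the scalar behaves as if it were an array of the same shape filled with that value. The sketch checks this equivalence with `np.full` and uses small values whose results are exact in floating point; the specific numbers are illustrative only.

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [4.0, 8.0]])

# data + 2 gives the same result as adding an array filled with 2
print(np.array_equal(data + 2, data + np.full(data.shape, 2.0)))   # True

print(1 / data)    # reciprocals: 1.0, 0.5, 0.25, 0.125
print(2 ** data)   # scalar base, array exponents: 2, 4, 16, 256
```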

##### Comparison of arrays

In [None]:
data2 = np.random.randn(2, 4)
data2


array([[-1.35548559, -0.34444791, -1.96986954, -1.73608008],
       [ 0.71340975, -0.7318783 ,  2.31771313,  0.5983998 ]])

In [None]:
data>data2

array([[ True, False,  True,  True],
       [False,  True, False, False]])
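
A comparison produces a boolean array of the same shape, which can be used directly as an index to filter elements. Together with `np.where`, this is how the "conditional logic as array expressions" from the motivation list looks in practice. A sketch with a small fixed array:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])

mask = x > 0                   # element-wise comparison gives a boolean array
print(mask)                    # [False False False  True  True]
print(x[mask])                 # [1 2]: keep only the elements where mask is True
print(mask.sum())              # 2: True counts as 1

# np.where replaces an if/else loop: take x where positive, otherwise 0
print(np.where(x > 0, x, 0))   # [0 0 0 1 2]

# Combine conditions with & and |; the parentheses are required
print(x[(x < 0) | (x == 2)])   # [-2 -1  2]
</test>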

## Dimension Checking

In [None]:
# check the shape or order of data

data.shape

(2, 4)

##### The shape (order) is 2x4: 2 rows and 4 columns

In [None]:
# check the dimension of data

data.ndim

2

##### data is a 2-dimensional array

In [None]:
data.dtype

dtype('float64')
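
`shape`, `ndim`, `size` and `dtype` change in predictable ways when an array is reshaped or transposed. In this sketch, passing `-1` to `reshape` asks NumPy to infer that dimension from the total number of elements.

```python
import numpy as np

m = np.arange(8).reshape(2, 4)                  # 2 rows, 4 columns
print(m.shape, m.ndim, m.size, m.dtype.kind)    # (2, 4) 2 8 i

print(m.T.shape)                 # (4, 2): the transpose swaps the axes
print(m.reshape(4, -1).shape)    # (4, 2): -1 is inferred as 8 / 4 = 2
```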

In [None]:
np.empty((2, 3))

array([[0.0e+000, 4.9e-324, 9.9e-324],
       [1.5e-323, 2.0e-323, 2.5e-323]])

In [None]:
np.zeros((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [None]:
# np.empty((2, 3, 2))

#### It’s not safe to assume that np.empty will return an array of all zeros. np.empty allocates memory without initializing it, so the array may contain uninitialized “garbage” values; use np.zeros when you actually need zeros.
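
A typical safe use of `np.empty` is a buffer whose every element is overwritten before it is read; when zeros are needed, `np.zeros` states that intent directly. A minimal sketch (the squares are just sample values):

```python
import numpy as np

# Safe: every element of the uninitialized buffer is written before use
squares = np.empty(5)
for i in range(5):
    squares[i] = i ** 2
print(squares)   # [ 0.  1.  4.  9. 16.]

# When zeros are required, request them explicitly
buf = np.zeros((2, 3))
print(buf)
```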

## Indexing

In [None]:
arr = np.array([1, 2, 3, 4])
arr[1:3]

array([2, 3])

In [None]:
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
arr[:1]

array([[1, 2, 3, 4, 5]])

In [None]:
arr[:,1:]

array([[ 2,  3,  4,  5],
       [ 7,  8,  9, 10]])
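
Single elements are addressed with one index per axis, and negative indices count from the end. An important detail when slicing is that a basic slice is a view on the original data, so writing to it modifies the original array; `.copy()` gives an independent array. A sketch on the same 2x5 array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 2])      # 8: row 1, column 2
print(arr[-1, -1])    # 10: last row, last column

# A slice is a view: changing it changes arr as well
row_view = arr[0, :2]
row_view[0] = 99
print(arr[0, 0])      # 99

# .copy() creates an independent array
row_copy = arr[1, :2].copy()
row_copy[0] = -1
print(arr[1, 0])      # 6: unchanged
```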

## Sorting

In [None]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

In [None]:
y = np.array([[2,1,4,3,5],[10,7,9,8,6]])
np.sort(y, axis=1)  # sort within each row

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
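
The `axis` argument decides which direction is sorted: `axis=1` sorts within each row, `axis=0` within each column. For this particular `y`, every column is already in ascending order, so sorting along `axis=0` leaves it unchanged. The sketch also shows `np.argsort`, descending order via reversal, and `np.unique` (which returns sorted unique values), all standard NumPy functions.

```python
import numpy as np

y = np.array([[2, 1, 4, 3, 5], [10, 7, 9, 8, 6]])
print(np.sort(y, axis=1))   # rows sorted: [[1 2 3 4 5] [6 7 8 9 10]]
print(np.sort(y, axis=0))   # columns sorted: same as y, since each column is already ordered

x = np.array([2, 1, 4, 3, 5])
print(np.argsort(x))        # [1 0 3 2 4]: indices that would sort x
print(np.sort(x)[::-1])     # [5 4 3 2 1]: descending order

print(np.unique([3, 1, 3, 2, 1]))   # [1 2 3]: sorted unique values
```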

In [None]:
help(np.arange)

Help on built-in function arange in module numpy:

arange(...)
    arange([start,] stop[, step,], dtype=None, *, like=None)
    
    Return evenly spaced values within a given interval.
    
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range` function, but returns an ndarray rather than a list.
    
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use `numpy.linspace` for these cases.
    
    Parameters
    ----------
    start : integer or real, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : integer or real
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off 

## Book Reference: Python for Data Analysis (2nd Ed)
## Book: Python for DataScience