# Python Numpy Notes
### Numpy >> Lists
- Speed: It uses fixed data types for all elements
- Uses contiguous memory
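
A minimal sketch of the two points above. The variable names are illustrative and not part of the original notes, and the dtype is set explicitly so the output does not depend on the platform.

```python
import numpy as np

py_list = [1, 2.5, "three"]                 # a Python list may mix element types
arr = np.array([1, 2, 3], dtype=np.int64)   # every element of an array shares one dtype

print(arr.dtype)                              # int64
print(arr.itemsize)                           # 8 (bytes per element)
print(arr.flags['C_CONTIGUOUS'])              # True: stored in one contiguous block
print(arr.nbytes == arr.size * arr.itemsize)  # True
```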

### Applications of Numpy
- Matlab Replacement
- Plotting using Matplotlib
- Backend with Pandas
- Machine Learning with Tensorflow

## Basics of Numpy

In [43]:
#import numpy
import numpy as np

### creating arrays

In [44]:
#create a 0D array
zeroDarr = np.array(1)
print(zeroDarr)
print(zeroDarr.ndim)

1
0


In [45]:
#create a 1D array
oneDarr = np.array([1,2,3])
print(oneDarr)
print(oneDarr.ndim)

[1 2 3]
1


In [46]:
#create a 2D array
twoDarr = np.array([[1,2,3],
                    [4,5,6]])
print(twoDarr)
print(twoDarr.ndim)

[[1 2 3]
 [4 5 6]]
2


In [47]:
#create a 3D array
threeDarr = np.array([[[1,2,3],
                        [4,5,6]],
                        
                        [[7,8,9],
                        [10,11,12]]])
print(threeDarr)
print(threeDarr.ndim)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
3


##### using py sequences

In [48]:
#for single list => creates a 1D array
l1 = [1,2,3]
arr1 = np.array(l1)
print(arr1)
print(arr1.ndim)

[1 2 3]
1


In [49]:
#for one nested list => creates a 2D array
l2 = [[1,2,3], [4,5,6]]
arr2 = np.array(l2)
print(arr2)
print(arr2.ndim)

[[1 2 3]
 [4 5 6]]
2


In [50]:
#for multiple nested lists => multidimensional array

In [51]:
#using repeated concatenations:
arr3 = np.array([0]*3)
print(arr3)

[0 0 0]


In [52]:
arr4 = np.array([[[0]*4]*3]*2)
print(arr4)

[[[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]

 [[0 0 0 0]
  [0 0 0 0]
  [0 0 0 0]]]


##### constant arrays

In [53]:
#array of all zeros of specified shape
zeroarr = np.zeros((2,3))
print(zeroarr)

[[0. 0. 0.]
 [0. 0. 0.]]


In [54]:
#array of all ones of specified shape
onearr = np.ones((2,3))
print(onearr)

[[1. 1. 1.]
 [1. 1. 1.]]


In [55]:
#array filled with a given value n, of specified shape (here n = 5)
narr = np.full((3,2), 5)
print(narr)

[[5 5]
 [5 5]
 [5 5]]


In [56]:
#array of random values of specified shape
ranarr = np.random.random((3,2))
print(ranarr)

[[0.86113028 0.70411613]
 [0.70880284 0.83730733]
 [0.4101436  0.75001776]]


In [57]:
#array of a specified number of random integers between a start and end value
#here, start = 1, end = 10 (exclusive), number of values = 9
ranintarr = np.random.randint(1,10,9)
print(ranintarr)

[6 1 5 3 2 3 7 6 3]


In [58]:
#array of specified number of random values that follow the standard normal distribution
randnormarr = np.random.randn(5)
print(randnormarr)

[ 1.40207178 -0.46627063  0.15945616  0.38055144 -0.57468483]


In [59]:
#n x n identity matrix
identity = np.eye(3)
print(identity)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [60]:
#py sequence to array
tup = (1,2,3)
seqarr = np.asarray(tup)
print(seqarr)
print(type(seqarr))

[1 2 3]
<class 'numpy.ndarray'>


In [61]:
#sequential arrays with specified steps
#here, start = 2, end = 10 (exclusive), step = 2
steparr = np.arange(2,10,2)
print(steparr)

[2 4 6 8]


In [62]:
#sequential array with start = 0, end = 5, step = 1, end exclusive
steparr2 = np.arange(5)
print(steparr2)

[0 1 2 3 4]


In [63]:
#alt for arange()
#n evenly spaced values over a given interval (floats by default)
#linspace(i, j, n)
#[i, j] => interval (both endpoints included) and n => number of elements
steparr3 = np.linspace(2, 10, 5)
print(steparr3)

[ 2.  4.  6.  8. 10.]


## some basic array functions

In [64]:
a = np.array([[1,2,3],
            [4,5,6]])
print(a)

[[1 2 3]
 [4 5 6]]


In [65]:
#to get dimensions of an array
a.ndim

2

In [66]:
#to get shape of an array
a.shape

(2, 3)

In [67]:
#to get size of an array: total elements
a.size

6

In [68]:
#to get datatype of an array
a.dtype

dtype('int32')

In [69]:
#to get size of one array item
a.itemsize

4

In [70]:
#to get memory used by array
# also = size * itemsize
a.nbytes

24

In [71]:
#to change the shape/dimensions
print(a.reshape(6))
print(a.reshape(6).ndim)
print(a.reshape(6).shape)

[1 2 3 4 5 6]
1
(6,)


In [72]:
b = a
#to check if 2 arrays share the same memory
np.shares_memory(a,b)

True

In [73]:
#to copy an array to another array without sharing memory:
c = np.copy(a)
print(c)
np.shares_memory(a,c)

[[1 2 3]
 [4 5 6]]


False

In [74]:
#to print the position of maximum value along an axis
#here, axis = 0: for each column, the row index of the maximum value
np.argmax(a, axis = 0)

array([1, 1, 1], dtype=int64)

In [75]:
#to print the position of minimum value along an axis
#here, axis = 0: for each column, the row index of the minimum value
np.argmin(a, axis = 0)

array([0, 0, 0], dtype=int64)

In [76]:
type(a)

numpy.ndarray

In [77]:
a.transpose()

array([[1, 4],
       [2, 5],
       [3, 6]])

## Array Traversal

In [78]:
#using the ndenumerate function:
m = np.array([0, 1, 2, 3, 4, 5])
m

array([0, 1, 2, 3, 4, 5])

In [79]:
#for a 1D array
for i in np.ndenumerate(m):
    print(i)

((0,), 0)
((1,), 1)
((2,), 2)
((3,), 3)
((4,), 4)
((5,), 5)


In [80]:
#for a 2D array
n = m.reshape(2,3)
for i in np.ndenumerate(n):
    print(i)

((0, 0), 0)
((0, 1), 1)
((0, 2), 2)
((1, 0), 3)
((1, 1), 4)
((1, 2), 5)


In [81]:
#for a 3D array:
n = m.reshape(1, 2, 3)
for i in np.ndenumerate(n):
    print(i)

((0, 0, 0), 0)
((0, 0, 1), 1)
((0, 0, 2), 2)
((0, 1, 0), 3)
((0, 1, 1), 4)
((0, 1, 2), 5)


#### Array Traversal Working:
- **Row-major or C-** ordering (C): the ordering of the original data is preserved if you traverse the columns within a row, and then proceed to the next row.
- **Column-major or F-** ordering (Fortran): the ordering of the original data is preserved if you traverse the rows within a given column, and then move on to the next column.
- By default numpy uses row-major ordering
<br><br>
- Working of row-major ordering for conversion of arr(24) to arr(2,3,4)
1. Create an empty array of the desired shape: (2, 3, 4).
2. Start by inserting the 0th element from the input array into the (0, 0, 0) element of the output array.
3. Advance the index by *increasing the index of the last axis, first*, and inserting the following element from the input array.
4. If you reach the end of an axis (axis-2, for instance, only has 4 slots), reset the index for that axis to 0, and advance the index of the preceding axis. Go back to 3.
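
The four steps above can be sketched as an explicit loop. This is only an illustration of the ordering, not how NumPy implements `reshape` internally; the names `flat` and `out` are made up for the example.

```python
import numpy as np

flat = np.arange(24)                          # input array: 0, 1, ..., 23
out = np.empty((2, 3, 4), dtype=flat.dtype)   # step 1: empty array of the desired shape

# np.ndindex yields index tuples with the last axis advancing fastest:
# (0, 0, 0), (0, 0, 1), ..., (0, 0, 3), (0, 1, 0), ...  (steps 2 to 4)
for value, idx in zip(flat, np.ndindex(*out.shape)):
    out[idx] = value

print(np.array_equal(out, flat.reshape(2, 3, 4)))  # True
print(out[0, 1, 0])                                # 4
```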

## Vectorization
Vectorization maps a function over an array, applying it to each element and producing a new result array, without writing an explicit Python loop.
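
As a small illustration, the sketch below computes the same result with an explicit loop and with a single vectorized call. The array `values` is an arbitrary example.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

# Explicit Python loop: apply the function one element at a time.
looped = np.array([np.sqrt(v) for v in values])

# Vectorized call: the same function applied to the whole array at once.
vectorized = np.sqrt(values)

print(vectorized)                          # [1. 2. 3.]
print(np.array_equal(looped, vectorized))  # True
```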

### Mathematical Functions:

#### Unary Functions:

In [82]:
uarr = np.array([1,2,3])
uarr

array([1, 2, 3])

In [83]:
#absolute value:
np.absolute(uarr)

array([1, 2, 3])

In [84]:
#square root
np.sqrt(uarr)

array([1.        , 1.41421356, 1.73205081])

In [85]:
#sin
#other functions like cos, tan
np.sin(uarr)

array([0.84147098, 0.90929743, 0.14112001])

In [86]:
#natural log
np.log(uarr)

array([0.        , 0.69314718, 1.09861229])

In [87]:
#log to the base 10
np.log10(uarr)

array([0.        , 0.30103   , 0.47712125])

In [88]:
#log to the base 2
np.log2(uarr)

array([0.       , 1.       , 1.5849625])

In [89]:
#e^x
np.exp(uarr)

array([ 2.71828183,  7.3890561 , 20.08553692])

#### Binary Functions:

In [90]:
barr = np.array([-1,-2,-3])
barr

array([-1, -2, -3])

In [91]:
#multiply 2 arrays
#multiple methods for element wise multiplication
print(uarr * barr)
print(np.multiply(uarr, barr))

[-1 -4 -9]
[-1 -4 -9]


In [92]:
#dot product
print(sum(np.multiply(uarr, barr)))
print(np.dot(uarr, barr))
print(np.matmul(uarr, barr))

-14
-14
-14


In [93]:
#cross product
print(np.cross(uarr, barr.reshape(1,3)))

[[0 0 0]]


In [94]:
#element wise addition
print(np.add(uarr, barr))
print(uarr + barr)

[0 0 0]
[0 0 0]


In [95]:
#element wise subtraction
print(np.subtract(uarr, barr))
print(uarr - barr)

[2 4 6]
[2 4 6]


In [96]:
#element wise division
print(np.divide(uarr, barr))
print(uarr / barr)

[-1. -1. -1.]
[-1. -1. -1.]


In [97]:
# element wise x^y
print(np.power(barr, uarr))
print(barr ** uarr)

[ -1   4 -27]
[ -1   4 -27]


In [98]:
# element wise modulus : x % y
print(np.mod(barr, uarr))
print(barr % uarr)

[0 0 0]
[0 0 0]


In [99]:
#element wise maximum of 2 arrays
print(np.maximum(barr, uarr))

[1 2 3]


In [100]:
#element wise minimum of 2 arrays
print(np.minimum(barr, uarr))

[-1 -2 -3]


#### Sequential functions:

In [101]:
sarr = np.array([1,2,3])
sarr

array([1, 2, 3])

In [102]:
#mean of all values
np.mean(sarr)

2.0

In [103]:
#median of all values
np.median(sarr)

2.0

In [104]:
#variance of all values
np.var(sarr)

0.6666666666666666

In [105]:
#standard deviation of all values
np.std(sarr)

0.816496580927726

In [106]:
#minimum from the array
np.min(sarr)

1

In [107]:
#maximum from the array
np.max(sarr)

3

In [108]:
#sum of all values
np.sum(sarr)

6

These functions by default apply to the whole array<br>
Use ```axis = value``` to apply it to a specific axis
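
A short sketch of the `axis` argument on a 2D array. The array `a` here is the same 2 x 3 example used earlier in these notes.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.sum(a))           # 21      -> whole array
print(np.sum(a, axis=0))   # [5 7 9] -> collapses the rows, one result per column
print(np.sum(a, axis=1))   # [ 6 15] -> collapses the columns, one result per row
print(np.mean(a, axis=1))  # [2. 5.]
```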

## Array broadcasting
- a mechanism used by NumPy to permit vectorized mathematical operations between arrays of unequal, but compatible shapes.
- an array will be treated as if its contents have been replicated along the appropriate dimensions, such that the shape of this new, higher-dimensional array suits the mathematical operation being performed
- `np.broadcast_to(array_name, shape)` broadcasts an array to the specified shape
- NumPy doesn’t replicate the data in memory; it broadcasts the existing contents to fill the desired shape (if compatible)
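
One way to see that no data is replicated is to inspect the result of `np.broadcast_to`. The sketch below uses illustrative names (`row`, `view`) and checks memory sharing and strides.

```python
import numpy as np

row = np.array([1, 2, 3])
view = np.broadcast_to(row, (4, 3))

print(view.shape)                    # (4, 3)
print(np.shares_memory(row, view))   # True: same underlying buffer
print(view.strides[0])               # 0: moving to the next row advances 0 bytes
print(view.flags.writeable)          # False: the broadcast result is read-only
```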

### Rules for Broadcasting
- Align the two shapes so that their trailing dimensions line up; a missing leading dimension is treated as size 1
- Each pair of aligned dimensions must satisfy at least one of these conditions:
1. the aligned dimensions have the same size
2. one of the dimensions has a size of 1
- If any pair satisfies neither condition, the arrays are incompatible

### Examples for compatibility check
array-1: 4 x 3<br>
array-2:     3<br>
result-shape: 4 x 3

array-1:     5 x 2<br>
array-2: 5 x 4 x 2<br>
result-shape: INCOMPATIBLE

array-1: 8 x 1 x 3<br>
array-2: 8 x 5 x 3<br>
result-shape: 8 x 5 x 3

array-1: 1 x 3 x 2<br>
array-2:     8 x 2<br>
result-shape: INCOMPATIBLE

array-1: 3 x 1<br>
array-2:     1 x 4<br>
result-shape: 3 x 4
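
The five examples above can be checked with a small helper that follows the rules literally. `broadcast_shape` is a hypothetical function written for these notes, not part of NumPy; the last line cross-checks one case against NumPy itself.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if incompatible (illustrative helper)."""
    result = []
    # Walk both shapes from the trailing dimension backwards;
    # a missing leading dimension is treated as size 1.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        da = shape_a[-i] if i <= len(shape_a) else 1
        db = shape_b[-i] if i <= len(shape_b) else 1
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)
        else:
            return None
    return tuple(reversed(result))

print(broadcast_shape((4, 3), (3,)))          # (4, 3)
print(broadcast_shape((5, 2), (5, 4, 2)))     # None
print(broadcast_shape((8, 1, 3), (8, 5, 3)))  # (8, 5, 3)
print(broadcast_shape((1, 3, 2), (8, 2)))     # None
print(broadcast_shape((3, 1), (1, 4)))        # (3, 4)

# Cross-check one case against NumPy's own broadcasting.
print((np.zeros((3, 1)) + np.zeros((1, 4))).shape)  # (3, 4)
```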

In [109]:
#example
broad_arr = np.array([1,2,3])
broad_arr

array([1, 2, 3])

In [110]:
broad_arr.shape

(3,)

In [111]:
np.broadcast_to(broad_arr, (4,3))

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

##### Adding size-1 dimensions to an array
- dimensions of size-1 are special in that they can be broadcasted to any size
- can introduce size-1 dimensions to an array without changing the overall size (i.e. total number of entries in an array)

In [112]:
#using reshape function
broad_arr.reshape((1,3,1,1))

array([[[[1]],

        [[2]],

        [[3]]]])

In [113]:
#using the newaxis object
new_axis_arr = broad_arr[np.newaxis, :, np.newaxis, np.newaxis]
new_axis_arr

array([[[[1]],

        [[2]],

        [[3]]]])
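
A common reason to add a size-1 dimension is to make two 1D arrays broadcast against each other. The sketch below is an illustrative example with made-up arrays `col` and `row`.

```python
import numpy as np

col = np.array([1, 2, 3])[:, np.newaxis]   # shape (3, 1)
row = np.array([10, 20, 30, 40])           # shape (4,)

# Shapes (3, 1) and (4,) broadcast to (3, 4): every col value is added to every row value.
table = col + row
print(table.shape)   # (3, 4)
print(table)
# [[11 21 31 41]
#  [12 22 32 42]
#  [13 23 33 43]]
```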

## Indexing

### Basic Indexing
Always returns a view of data
> <aside>
💡 Given an N-dimensional array, `x`, `x[index]` invokes **basic indexing** whenever `index` is a *tuple* containing any combination of the following types of objects:

- integers
- slice objects
- Ellipsis objects
- numpy.newaxis objects
</aside>
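
A minimal sketch showing that basic indexing with a slice gives a view: writing through the result changes the original array. The names `x` and `row` are illustrative.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
row = x[0, :]                      # basic indexing (integer + slice) -> view

print(np.shares_memory(x, row))    # True
row[0] = 99                        # writing through the view changes x
print(x[0, 0])                     # 99
```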

##### using indexes and slices
already done

##### using tuples

In [114]:
#using tuples
tup_arr = np.array([[1,2,3],
                    [4,5,6]])
tup_arr[(1,2)]

6

##### using ellipsis
- represented by: `...`
- only one inside an index

check example:

In [115]:
ellip_arr = np.array([[[ 0,  1,  2,  3],
               [ 4,  5,  6,  7]],

              [[ 8,  9, 10, 11],
               [12, 13, 14, 15]],

              [[16, 17, 18, 19],
               [20, 21, 22, 23]]])
ellip_arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [116]:
# equivalent: ellip_arr[:, :, 0]
ellip_arr[..., 0]

array([[ 0,  4],
       [ 8, 12],
       [16, 20]])

In [117]:
# using an explicit tuple
ellip_arr[(Ellipsis, 0)]

array([[ 0,  4],
       [ 8, 12],
       [16, 20]])

In [118]:
# equivalent: `ellip_arr[0, :, 1]`
ellip_arr[0, ..., 1]

array([1, 5])

### Advanced Indexing
- Always produces copy of underlying data

<aside>
💡 Given an N-dimensional array, `x`, `x[index]` invokes **advanced indexing** whenever `index` is:

- an integer-type or boolean-type `numpy.ndarray`
- a `tuple` with at least one *sequence*-type object as an element (e.g. a list, tuple, or ndarray)
</aside>

##### Integer Array Indexing:

**Into 1D Arrays**
- Build an integer array of indices with any shape, then use it to index the original array; each index selects the value at that position in the original array
- The indexing array can have an arbitrary shape; *the resulting array will match that shape*

In [119]:
new_arr = np.array([ 0, -1, -2, -3, -4, -5])
new_arr

array([ 0, -1, -2, -3, -4, -5])

In [120]:
index = np.array([2, 4, 0, 4, 4, 4])
new_arr[index]

array([-2, -4,  0, -4, -4, -4])

In [121]:
#advanced indexing returns a copy
np.shares_memory(new_arr, new_arr[index])

False

In [122]:
# utilizing a 2D-array as an index
index_2d = np.array([[ 1,  2,  0],
                     [ 5,  5,  5],
                     [ 2,  3,  4]])

In [123]:
# the resulting shape matches the shape of the indexing array
new_arr[index_2d]

array([[-1, -2,  0],
       [-5, -5, -5],
       [-2, -3, -4]])

**Into ND Arrays**
- in order to perform this variety of indexing on an N-dimensional array, we must specify N index-arrays; one for each dimension

In [124]:
new_arr = np.array([[[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 8,  9, 10, 11]],

               [[12, 13, 14, 15],
                [16, 17, 18, 19],
                [20, 21, 22, 23]]])
new_arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [125]:
# specifies subsequent sheets to access
ind0 = np.array([0, 1, 0])

# specifies subsequent rows to access
ind1 = np.array([0, 2, 1])

# specifies subsequent columns to access
ind2 = np.array([3, 3, 0])

In [126]:
new_arr[ind0, ind1, ind2]
#Here,
#3:  sheet-0, row-0, column-3
#23:  sheet-1, row-2, column-3
#4:  sheet-0, row-1, column-0

array([ 3, 23,  4])

##### Boolean Array Indexing:
- for selecting contents from an array based on logical conditions
- Working:
Suppose `x` is an N-dimensional array, and `ind` is a boolean-value array *of the same shape as* `x`. Then `x[ind]` returns a 1-dimensional array, which is formed by traversing `x` and `ind` using row-major ordering. Wherever an element of `ind` is `True`, the corresponding entry of `x` is added to the end of the resulting array

In [127]:
# advanced indexing using a boolean-array
bool_arr = np.array([[[-0.26,  0.49,  0.18],
                [ 0.43,  0.3 ,  0.29]],

               [[-0.44,  0.3 ,  0.28],
                [ 0.27, -0.09, -0.13]]])
bool_arr

array([[[-0.26,  0.49,  0.18],
        [ 0.43,  0.3 ,  0.29]],

       [[-0.44,  0.3 ,  0.28],
        [ 0.27, -0.09, -0.13]]])

In [128]:
# `True` wherever `bool_arr` is positive
bool_ind = bool_arr > 0
bool_ind

array([[[False,  True,  True],
        [ True,  True,  True]],

       [[False,  True,  True],
        [ True, False, False]]])

In [129]:
#displaying the array
bool_arr[bool_ind]

array([0.49, 0.18, 0.43, 0.3 , 0.29, 0.3 , 0.28, 0.27])

##### Boolean to integer indexing
- use `numpy.where(booleanArrayName)` to convert a boolean index array into integer index arrays
- it produces the *tuple* of index-arrays that access the `True` entries of that array, via integer array indexing

In [130]:
bool_ind

array([[[False,  True,  True],
        [ True,  True,  True]],

       [[False,  True,  True],
        [ True, False, False]]])

In [131]:
np.where(bool_ind)
#a tuple of three integer-valued index-arrays is returned, one for each dimension of bool_ind

(array([0, 0, 0, 0, 0, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 1, 1, 0, 0, 1], dtype=int64),
 array([1, 2, 0, 1, 2, 1, 2, 0], dtype=int64))

In [132]:
#using the index arrays to select the same entries
ind0, ind1, ind2 = np.where(bool_ind)
bool_arr[ind0, ind1, ind2]

array([0.49, 0.18, 0.43, 0.3 , 0.29, 0.3 , 0.28, 0.27])

In [133]:
# unpacking the arrays is not necessary, you can use the tuple as an index
bool_arr[np.where(bool_ind)]

array([0.49, 0.18, 0.43, 0.3 , 0.29, 0.3 , 0.28, 0.27])

## Stacking Arrays
- `np.vstack((arrays))`: stack arrays vertically, i.e. row-wise (1D arrays must have the same length)
- `np.hstack((arrays))`: stack arrays horizontally, i.e. column-wise (1D arrays may have different lengths)
- `np.dstack((arrays))`: stack depth-wise, i.e. along the 3rd axis (1D and 2D arrays must have the same shape)
- `numpy.column_stack(tup)`: stacks 1D arrays as the columns of a 2D array; for 2D inputs it behaves like `hstack`
- `numpy.row_stack(tup)`: an alias for `vstack`; joins arrays in sequence vertically (row-wise)

In [134]:
#using the stack function
arr_1 = np.array([1, 2, 3])
arr_2 = np.array([2, 3, 4])
print(arr_1)
print(arr_2)

[1 2 3]
[2 3 4]


In [135]:
arr_stack=np.stack((arr_1,arr_2),axis=0)
print(arr_stack)

[[1 2 3]
 [2 3 4]]


In [136]:
print(arr_stack.shape)

(2, 3)


In [137]:
x = np.array([1,2,3])
y = np.array([-1,-2,-3])
print(x,y)

[1 2 3] [-1 -2 -3]


##### vertically stack x and y

In [138]:
np.vstack([x,y])

array([[ 1,  2,  3],
       [-1, -2, -3]])

##### horizontally stack x and y

In [139]:
np.hstack([x,y])

array([ 1,  2,  3, -1, -2, -3])

##### dimensional stacking

In [140]:
#using dstack
dstack_arr=np.dstack((arr_1, arr_2))
print(dstack_arr)

[[[1 2]
  [2 3]
  [3 4]]]
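
For 1D inputs, `column_stack` and `hstack` give different results. The sketch below reuses the values of `x` and `y` from the cells above to show the difference.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([-1, -2, -3])

print(np.hstack((x, y)).shape)        # (6,)   -> joined end to end
print(np.column_stack((x, y)).shape)  # (3, 2) -> each 1D array becomes a column
print(np.column_stack((x, y)))
# [[ 1 -1]
#  [ 2 -2]
#  [ 3 -3]]
```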


## Splitting Arrays
- split an array into multiple sub arrays
- returns list of split arrays
- `numpy.split(array, indices_or_sections, axis)`: split into equal parts; raises an error if a requested number of sections does not divide the array equally
- `numpy.array_split(array, indices_or_sections, axis)`: split into equal or near-equal parts
- `numpy.hsplit(array, indices_or_sections)`: split an array horizontally (column-wise) ⇒ `split` with axis = 1
- `numpy.vsplit(array, indices_or_sections)`: split an array vertically (row-wise) ⇒ `split` with axis = 0
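
A short sketch of the four functions listed above. The arrays `arr` and `grid` are made up for the example.

```python
import numpy as np

arr = np.arange(6)

print(np.split(arr, 3))        # three equal parts: [0 1], [2 3], [4 5]
print(np.array_split(arr, 4))  # near-equal parts of sizes 2, 2, 1, 1

try:
    np.split(arr, 4)           # 6 elements cannot be divided into 4 equal parts
except ValueError as err:
    print("split failed:", err)

grid = np.arange(12).reshape(2, 6)
left, right = np.hsplit(grid, 2)   # same as np.split(grid, 2, axis=1)
top, bottom = np.vsplit(grid, 2)   # same as np.split(grid, 2, axis=0)
print(left.shape, top.shape)       # (2, 3) (1, 6)
```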

## Statistics with Numpy
Some basic functions:
![Numpy Statistics image](Untitled.png)