
# NumPy

**NumPy**--which stands for Numerical Python--is the foundational package for performing scientific computing. In addition, it provides the primary data structure--*the n-dimensional array*--on which the **pandas** package is built. NumPy includes extensive functionality, but we will use it primarily for:

* Fast (vectorized) array operations for data processing
* Efficient descriptive statistics
* Manipulations for merging multiple data sets

In [None]:
# Import statement
import numpy as np
arr2d = np.array([[1,2,3],[4,5,6],[5,5,5],[6,6,6]])
print(arr2d)
np.split(arr2d,[1,2],axis=0)

[[1 2 3]
 [4 5 6]
 [5 5 5]
 [6 6 6]]


[array([[1, 2, 3]]),
 array([[4, 5, 6]]),
 array([[5, 5, 5],
        [6, 6, 6]])]

## ndarrays

The ndarray is an n-dimensional array object, similar to a list but designed to facilitate fast computation. To make that speed possible, every element of an array must have the same data type (its *dtype*). We will mostly focus on numerical (int, float) and boolean arrays.
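
As a small illustration of the single-type rule, the sketch below casts a mixed list and then assigns a float into an integer array. The variable names `mixed` and `ints` are illustrative only.

```python
import numpy as np

# Mixed Python values are converted to one common dtype.
mixed = np.array([1, 2.5, True])
print(mixed, mixed.dtype)    # every element is stored as a float

# A float assigned into an integer array is truncated to fit the dtype.
ints = np.array([1, 2, 3])
ints[0] = 9.7
print(ints, ints.dtype)
```

If the truncation is not wanted, one option is to cast the array first with `astype(float)`.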

Arrays will most likely be loaded from external data sources (later), but for now, we can create them via casting (using the np.**array** function) or using one of the following generating functions (or class of functions):

* np.**arange**(*start*, *stop*, *step*) (similar to Python's built-in range function, but returns an array)
* np.**zeros**(*shape*), np.**ones**(*shape*) (where *shape* is a sequence of dimension sizes)
* np.random.**rand**(*d0*,*d1*,...,*dn*) (uniform samples on [0, 1), where *d0*,*d1*,...,*dn* are dimension sizes)
* np.random.**randn**(*d0*,*d1*,...,*dn*) (standard normal samples, where *d0*,*d1*,...,*dn* are dimension sizes)
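
NumPy also offers a `Generator` interface for random numbers, created with `np.random.default_rng`. The sketch below is an optional alternative to the functions listed above, not something the rest of this notebook relies on; the seed value is an arbitrary choice made so that the results are repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # arbitrary seed, for repeatability only

uniform = rng.random((2, 3))             # uniform on [0, 1), like np.random.rand(2, 3)
normal = rng.standard_normal((2, 3))     # standard normal, like np.random.randn(2, 3)
integers = rng.integers(0, 10, size=5)   # random integers in [0, 10)

print(uniform)
print(normal)
print(integers)
```

Note that the generator methods take the shape as a single tuple, whereas `rand` and `randn` take the dimension sizes as separate arguments.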

In [None]:
# Casting from list
np.array([1,5,-1,2,4])
# Generating functions (a cell displays only its last expression, here np.random.randn)
np.arange(0,10,2)
np.zeros((2,3))
np.ones((3,3))
np.random.rand(2,2,2)
np.random.randn(2,2,2)


array([[[ 1.20912068,  1.12284447],
        [ 0.40954276, -0.66342321]],

       [[ 0.3263184 , -0.47849018],
        [ 0.21049037,  0.12509471]]])

In [None]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# np.arange
arr1d = np.arange(10)
print(arr1d)
arr1d = np.arange(1,10,2)
print(arr1d)
arr1d_dec = np.arange(10,1,-2)
print(arr1d_dec)

[0 1 2 3 4 5 6 7 8 9]
[1 3 5 7 9]
[10  8  6  4  2]


In [None]:
# np.ones, np.zeros
arr1d = np.zeros(5)
print(arr1d)
arr2d = np.zeros((3,4))
print(arr2d)
arr2d = np.ones((5,10))
print(arr2d)
arr3d = np.ones((2,3,3))
print(arr3d)

[0. 0. 0. 0. 0.]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
[[[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]

 [[1. 1. 1.]
  [1. 1. 1.]
  [1. 1. 1.]]]


In [None]:
# np.random.rand: uniform distribution [0,1), np.random.randn: normal distribution
arr2d = np.random.rand(3,3)
arr2d

array([[0.2153479 , 0.70359464, 0.5600838 ],
       [0.96829485, 0.28933264, 0.33225845],
       [0.25028016, 0.34548529, 0.03108037]])

In [None]:
# Common attributes
print(arr2d.ndim) # number of dimensions
print(arr2d.shape) # shape of array
print(arr2d.dtype) # data type

2
(3, 3)
float64


In [None]:
# Casting to other dtypes
x = np.arange(10).astype(float)
x

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

See https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html for additional details on ndarray objects.

## Array Operations

As previously stated, arrays are designed to support fast computation and comparison. The most common types of operations are:

* Between arrays and scalars
* Universal functions (np.func)
    - Unary (performed on a single array): abs, sqrt, exp, log, ceil, floor, logical_not, and more
    - Binary (performed between two arrays): +, -, /, *, **, minimum, maximum, mod, >, >=, <, <=, ==, !=, logical_and, logical_or, logical_xor
* Mathematical and statistical functions - Available as NumPy functions (np.func) and array methods (arr.func)
    - Aggregation: mean, sum, std, var, min/max, argmin/argmax
    - Non-aggregation: cumsum, cumprod
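
To make the distinction between binary universal functions and aggregations concrete, here is a short sketch with two small hand-written arrays (`a` and `b` are illustrative names). The element-wise versions of min and max are `np.minimum` and `np.maximum`, while `min`/`max` and `argmin`/`argmax` aggregate over a single array.

```python
import numpy as np

a = np.array([1, 5, 3])
b = np.array([4, 2, 3])

print(np.maximum(a, b))               # element-wise larger value: [4 5 3]
print(np.mod(a, b))                   # same as a % b: [1 1 0]
print(np.logical_and(a > 2, b > 2))   # [False False  True]
print(a.max(), a.argmax())            # aggregation over one array: 5 1
```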

In [None]:
# Broadcasting with a scalar
arr2d = np.random.rand(3,3)
print(arr2d)
arr2d + 2

[[0.30349514 0.45868961 0.02249532]
 [0.77317518 0.98036493 0.25075424]
 [0.37857635 0.76541034 0.12933758]]


array([[2.30349514, 2.45868961, 2.02249532],
       [2.77317518, 2.98036493, 2.25075424],
       [2.37857635, 2.76541034, 2.12933758]])

In [None]:
# Broadcasting with 1-d array / list
arr2d = np.random.rand(3,3)
print(arr2d,'\n')
print(arr2d + [1,2,3],'\n')
print(arr2d + [[1],[2],[3]])

[[0.74355566 0.50028852 0.83989381]
 [0.90622982 0.14108609 0.81939683]
 [0.66058322 0.93727066 0.17653179]] 

[[1.74355566 2.50028852 3.83989381]
 [1.90622982 2.14108609 3.81939683]
 [1.66058322 2.93727066 3.17653179]] 

[[1.74355566 1.50028852 1.83989381]
 [2.90622982 2.14108609 2.81939683]
 [3.66058322 3.93727066 3.17653179]]
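
The two results above differ only in the shape of the second operand. The sketch below repeats the idea with an array of zeros so the effect is easy to read, and shows what happens when the shapes cannot be aligned. The names `grid`, `row`, and `col` are illustrative.

```python
import numpy as np

grid = np.zeros((3, 3))
row = np.array([1, 2, 3])     # shape (3,): added to every row
col = row[:, np.newaxis]      # shape (3, 1): added to every column

print(grid + row)
print(grid + col)

# Shapes that cannot be aligned raise an error.
try:
    grid + np.array([1, 2])
except ValueError as err:
    print('ValueError:', err)
```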


In [None]:
# Comparison with a scalar
arr2d = np.random.rand(3,3)
print(arr2d)
print(arr2d.shape)
result = arr2d > 0.5
print(result.shape)
result

[[0.24364183 0.07978034 0.91736463]
 [0.96033726 0.43645403 0.82866617]
 [0.21212383 0.77177958 0.21509636]]
(3, 3)
(3, 3)


array([[False, False,  True],
       [ True, False,  True],
       [False,  True, False]])

In [None]:
# Unary functions
arr2d = np.random.rand(3,3)
print(arr2d,'\n')
np.sqrt(arr2d)

[[0.58799391 0.36296019 0.17689022]
 [0.31090154 0.59463525 0.54700181]
 [0.06345681 0.89102851 0.01249155]] 



array([[0.76680761, 0.60246178, 0.42058319],
       [0.55758546, 0.77112596, 0.73959571],
       [0.25190635, 0.94394307, 0.11176561]])

In [None]:
# Binary functions
arr2d = np.random.rand(3,3)
print(arr2d)
arr2d * arr2d

[[0.59851519 0.04745244 0.28351904]
 [0.36166917 0.77390481 0.29898737]
 [0.16090933 0.85507011 0.8579086 ]]


array([[0.35822043, 0.00225173, 0.08038305],
       [0.13080459, 0.59892865, 0.08939345],
       [0.02589181, 0.7311449 , 0.73600717]])

In [None]:
# Aggregation function
arr2d = np.random.rand(3,3)
print(arr2d,'\n')
np.mean(arr2d)

[[0.75016969 0.27372699 0.0486709 ]
 [0.89186565 0.52930071 0.22995081]
 [0.98663099 0.23810289 0.13712192]] 



0.4539489520208044

In [None]:
arr2d = np.random.rand(3,3)
print(arr2d,'\n')
x = np.mean(arr2d,axis=0)
x

[[0.38418307 0.1488595  0.95301889]
 [0.43851462 0.1973881  0.43497509]
 [0.20159641 0.91749105 0.64455275]] 



array([0.34143137, 0.42124622, 0.67751557])

In [None]:
np.mean(arr2d,axis=1)

array([0.49535382, 0.35695927, 0.58788007])

In [None]:
arr3d = np.array([[[1,2,3],[2,2,2]],[[1,1,1],[3,3,3]]])
print(arr3d, '\n')
x = np.mean(arr3d, axis=-2)
print(x)

[[[1 2 3]
  [2 2 2]]

 [[1 1 1]
  [3 3 3]]] 

[[1.5 2.  2.5]
 [2.  2.  2. ]]


In [None]:
# Non-aggregation function
arr2d = np.random.rand(3,4)
print(arr2d,'\n')
print(arr2d.cumsum(axis=1),'\n')
print(arr2d.cumsum(axis=0),'\n')
print(arr2d.cumsum())

[[0.48131556 0.54887534 0.068902   0.12644259]
 [0.97582188 0.04636399 0.90432557 0.01362739]
 [0.69326072 0.32028956 0.04976183 0.83662271]] 

[[0.48131556 1.0301909  1.0990929  1.22553549]
 [0.97582188 1.02218586 1.92651144 1.94013883]
 [0.69326072 1.01355028 1.06331211 1.89993482]] 

[[0.48131556 0.54887534 0.068902   0.12644259]
 [1.45713744 0.59523932 0.97322758 0.14006998]
 [2.15039816 0.91552888 1.02298941 0.9766927 ]] 

[0.48131556 1.0301909  1.0990929  1.22553549 2.20135737 2.24772135
 3.15204693 3.16567432 3.85893504 4.1792246  4.22898643 5.06560914]


In [None]:
print(arr2d,'\n')
arr2d.cumsum(axis=0)

[[0.92611917 0.90722632 0.09125576 0.84093905]
 [0.04984312 0.41894851 0.45788067 0.15396453]
 [0.98801729 0.31434004 0.73592878 0.83175202]] 



array([[0.92611917, 0.90722632, 0.09125576, 0.84093905],
       [0.97596229, 1.32617483, 0.54913643, 0.99490358],
       [1.96397958, 1.64051487, 1.28506521, 1.82665559]])

In [None]:
print(arr2d,'\n')
arr2d.cumsum(axis=1)

[[0.61475176 0.37156081 0.007913   0.04343997]
 [0.41730467 0.78642276 0.50445412 0.14762043]
 [0.92974985 0.57657874 0.39663184 0.96718814]] 



array([[0.61475176, 0.98631257, 0.99422557, 1.03766554],
       [0.41730467, 1.20372743, 1.70818155, 1.85580198],
       [0.92974985, 1.50632859, 1.90296044, 2.87014858]])

## Arrays vs. Lists

Arrays may seem similar to lists (e.g., both are mutable, iterable sequences), but they are distinct data structures: operators such as + and * act element-wise on arrays, whereas they concatenate and repeat lists. Use an array whenever you perform large-scale computations or comparisons.
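
A rough way to see the difference is to time the same computation both ways. This is only a sketch: the size `n` and the repeat count are arbitrary choices, and the measured times depend on the machine.

```python
import timeit
import numpy as np

n = 100_000
data_list = list(range(n))
data_arr = np.arange(n)

list_result = [x * 2 for x in data_list]   # Python-level loop
arr_result = data_arr * 2                  # single vectorized operation

list_time = timeit.timeit(lambda: [x * 2 for x in data_list], number=20)
arr_time = timeit.timeit(lambda: data_arr * 2, number=20)
print(f'list: {list_time:.4f}s  array: {arr_time:.4f}s')
```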

In [None]:
# List operations
L = list(range(10))
print(L + L)
print(L * 3)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
# Array operations
arr = np.arange(10)
print(arr)
print(arr + arr)
print(arr * 3)

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]
[ 0  3  6  9 12 15 18 21 24 27]


In addition, lists place no restrictions on the lengths of nested sequences, whereas an array must be rectangular: every nested sequence at a given depth must have the same length. Otherwise, NumPy cannot build a regular n-dimensional array.

In [None]:
np.array([[1,2,3],[4,5,6,7],[8,9]])

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

In [None]:
x = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(x)
print(x.ndim)
print(x.shape)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
2
(3, 3)


## Indexing and Slicing

Indexing and slicing arrays is similar to lists...

In [None]:
arr = np.arange(2,20,2)
print(arr)
print(arr[5])
print(arr[-1])
print(arr[5:8])
print(arr[::2])
print(arr[::-2])
print(arr[10:1:-2])

[ 2  4  6  8 10 12 14 16 18]
12
18
[12 14 16]
[ 2  6 10 14 18]
[18 14 10  6  2]
[18 14 10  6]


...but unlike lists, array slices are views on the original array, so any updates to the array slice will be reflected in the original array. Consider the following example, in which we combine indexing with assignment (which also works as you would expect).

In [None]:
# list operations
print(L)
list_slice = L[5:8]
print(list_slice)
list_slice[0] = -10
print(list_slice)
print(L)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7]
[-10, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
# array operations
arr = np.arange(10) # reset arr so that it matches the list L above
print(arr)
arr_slice = arr[5:8]
print(arr_slice)
arr_slice[0] = -10
print(arr_slice)
print(arr)

[0 1 2 3 4 5 6 7 8 9]
[5 6 7]
[-10   6   7]
[  0   1   2   3   4 -10   6   7   8   9]


Indexing with index arrays or boolean arrays is a convenient way of filtering an array. These operations return a copy of the selected data, as opposed to a view on the original array as in the case of slicing.
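
One way to check whether a result is a view or a copy is `np.shares_memory`. The sketch below uses a small illustrative array named `base`.

```python
import numpy as np

base = np.arange(10)

view = base[2:5]               # basic slice: a view
fancy = base[[2, 3, 4]]        # index array: a copy
masked = base[base > 6]        # boolean array: a copy
explicit = base[2:5].copy()    # slice turned into an independent copy

print(np.shares_memory(base, view))       # True
print(np.shares_memory(base, fancy))      # False
print(np.shares_memory(base, masked))     # False
print(np.shares_memory(base, explicit))   # False

# Assigning through a boolean (or index) array still modifies the original in place.
base[base > 6] = -1
print(base)
```

Calling `.copy()` on a slice is the usual way to get list-like behavior, where changes to the slice leave the original untouched.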

In [None]:
# Index arrays - Each element in the index array is replaced by the corresponding value in the array
arr = np.arange(2,22,2)
print(arr)
x = arr[np.array([1,0,1,3,9,-1])]
print(x)
x[0] = 5
print(x)
print(arr)

[ 2  4  6  8 10 12 14 16 18 20]
[ 4  2  4  8 20 20]
[ 5  2  4  8 20 20]
[ 2  4  6  8 10 12 14 16 18 20]


In [None]:
# Boolean arrays - Each element in the array is returned
# if the corresponding boolean scalar is True
arr2d = np.random.rand(2,3)
print(arr2d,'\n')
x = np.array([[True,True,False],[True,True,False]])
print(x)
arr2d[x]
#arr[np.array([True, False, False, False, False, True, False, False, False, False])]

[[0.14394267 0.03321684 0.21479921]
 [0.8880587  0.81656329 0.0234431 ]] 

[[ True  True False]
 [ True  True False]]


array([0.14394267, 0.03321684, 0.8880587 , 0.81656329])

In [None]:
print(arr,'\n')

[ 2  4  6  8 10 12 14 16 18 20] 



In [None]:
# Filtering via conditional
print(arr,'\n')
arr % 5 == 0
arr[arr % 5 == 0]

[ 2  4  6  8 10 12 14 16 18 20] 



array([10, 20])

You can also use boolean arrays to learn about your data.

In [None]:
# How much of my data satisfies a given condition?
arr = np.random.randn(2,3)
print(arr,'\n')
print(arr > 0)
(arr > 0).sum(axis=1)

[[-0.67033248  1.31016519 -1.1328515 ]
 [-0.68515227  0.20298363  0.52371892]] 

[[False  True False]
 [False  True  True]]


array([1, 2])

In [None]:
# Do any of my data satisfy a given condition?
print(arr)
(arr > 3).any()

[[-0.47920883 -0.36391979  1.05943069]
 [-0.54791359  0.86788046 -0.27630787]]


False

In [None]:
# Do all of my data satisfy a given condition?
np.abs(arr) < 3
(np.abs(arr) < 3).all()

True
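
Because True counts as 1 and False as 0, the same boolean array can answer several questions at once. The sketch below uses fixed, hypothetical values in `data` so the results are repeatable.

```python
import numpy as np

data = np.array([[-0.5, 1.2, -1.1],
                 [-0.7, 0.2, 0.5]])   # hypothetical values

positive = data > 0
print(np.count_nonzero(positive))   # number of positive entries: 3
print(positive.mean())              # fraction of positive entries: 0.5
print(positive.any(axis=0))         # per column: is any entry positive?
print(positive.all(axis=1))         # per row: are all entries positive?
```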

Indexing and slicing multi-dimensional arrays is fairly intuitive. Whereas a 1-dimensional array contains 0-dimensional values (scalars), a 2-dimensional array is an array of 1-d arrays, where the first dimension represents the position of each 1-d array, and the second dimension refers to a specific position within each 1-d array (where all 1-d arrays have the same length). Similarly, a 3-dimensional array has 3 dimensions corresponding to the position of each 2-d array, 1-d array, and scalar value, respectively. And so on, for higher dimensions.

![](https://i.stack.imgur.com/R2IDC.png "Multi-dimensional arrays")

When indexing and slicing into an n-d array, each dimension is accessed in order, either via successive indexing or slicing operations or a sequence of dimensional indices or slices. To retain all of the elements for a particular dimension, use the ':' operator.
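
A quick way to build intuition is to check the shape that each kind of index returns. The sketch below uses `np.arange` so that every value is predictable; `cube` is an illustrative name.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube[1, 2, 3])         # single scalar: 23
print(cube[1][2][3])         # same element via successive indexing
print(cube[0].shape)         # (3, 4): the first 2-d array
print(cube[:, 0].shape)      # (2, 4): first row of every 2-d array
print(cube[:, :, 0].shape)   # (2, 3): first column of every 2-d array
print(cube[..., 0].shape)    # (2, 3): '...' stands for the leading ':' slices
```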

In [None]:
# 2-d array
arr2d = np.random.rand(3,3)
arr2d

array([[0.60556636, 0.6701663 , 0.09333857],
       [0.02562839, 0.35311706, 0.33897661],
       [0.25076129, 0.47261466, 0.86936739]])

In [None]:
# Index specific element
print(arr2d[0][0])
print(arr2d[0,0])

0.6055663648109673
0.6055663648109673


In [None]:
# Slice rows
print(arr2d,'\n')
#print(arr2d[0])
print(arr2d[0,:])

[[0.48131556 0.54887534 0.068902   0.12644259]
 [0.97582188 0.04636399 0.90432557 0.01362739]
 [0.69326072 0.32028956 0.04976183 0.83662271]] 

[0.48131556 0.54887534 0.068902   0.12644259]


In [None]:
# Slice columns
arr2d[:,1]

array([0.54887534, 0.04636399, 0.32028956])

In [None]:
# Slicing rows and columns at once
# arr2d[x1:x2:x3,y1:y2:y3]
print(arr2d)
print('=====')
arr2d[:2,1:]

[[0.48131556 0.54887534 0.068902   0.12644259]
 [0.97582188 0.04636399 0.90432557 0.01362739]
 [0.69326072 0.32028956 0.04976183 0.83662271]]
=====


array([[0.54887534, 0.068902  , 0.12644259],
       [0.04636399, 0.90432557, 0.01362739]])

In [None]:
# 3-d array
arr3d = np.random.rand(2,3,3).astype(float)
arr3d

array([[[0.62993964, 0.35701787, 0.36612931],
        [0.73867842, 0.85932564, 0.94576253],
        [0.95081841, 0.40172061, 0.42424592]],

       [[0.98741268, 0.4607928 , 0.70855408],
        [0.14907228, 0.1152063 , 0.68200722],
        [0.89877565, 0.46832205, 0.35275677]]])

In [None]:
# Index specific 2-d array
print(arr3d)
print('=====')
print(arr3d[1,:2,1:])

[[[0.62993964 0.35701787 0.36612931]
  [0.73867842 0.85932564 0.94576253]
  [0.95081841 0.40172061 0.42424592]]

 [[0.98741268 0.4607928  0.70855408]
  [0.14907228 0.1152063  0.68200722]
  [0.89877565 0.46832205 0.35275677]]]
=====
[[0.4607928  0.70855408]
 [0.1152063  0.68200722]]


In [None]:
# Slicing 3-d array
print(arr3d)
print('====')
print(arr3d[:,1,:])

[[[0.62993964 0.35701787 0.36612931]
  [0.73867842 0.85932564 0.94576253]
  [0.95081841 0.40172061 0.42424592]]

 [[0.98741268 0.4607928  0.70855408]
  [0.14907228 0.1152063  0.68200722]
  [0.89877565 0.46832205 0.35275677]]]
====
[[0.73867842 0.85932564 0.94576253]
 [0.14907228 0.1152063  0.68200722]]


In [None]:
print(arr3d)
print('====')
print(arr3d[:,:,1])

[[[0.62993964 0.35701787 0.36612931]
  [0.73867842 0.85932564 0.94576253]
  [0.95081841 0.40172061 0.42424592]]

 [[0.98741268 0.4607928  0.70855408]
  [0.14907228 0.1152063  0.68200722]
  [0.89877565 0.46832205 0.35275677]]]
====
[[0.35701787 0.85932564 0.40172061]
 [0.4607928  0.1152063  0.46832205]]


## Other Important Array Methods

### Conditional Logic

We saw that ternary expressions were a convenient way for us to generate conditional values:

*expr1* if *cond* else *expr2*

There are several ways to perform this task for a list (which we could then cast to an array):

1. Use a **for** loop
2. Use **map** with a lambda function
3. Use a list comprehension

For arrays, we use the np.**where** function!
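
For comparison, here is a sketch of the same conditional written once as a list comprehension and once with np.where. The input `values` is hypothetical.

```python
import numpy as np

values = [3, -1, 4, -1, 5]

# List version: ternary expression inside a comprehension
clipped_list = [v if v > 0 else 0 for v in values]

# Array version: np.where(cond, x, y) picks x where cond is True, else y
arr = np.array(values)
clipped_arr = np.where(arr > 0, arr, 0)

print(clipped_list)
print(clipped_arr)
```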

In [None]:
np.where?

In [None]:
# Flip a coin N times
N = 10
print(np.random.rand(N)>0.5)
print(np.where(np.random.rand(N) > 0.5, 'H', 'T'))

[False  True  True False  True  True  True  True False False]
['T' 'T' 'T' 'H' 'T' 'H' 'H' 'T' 'H' 'T']


In [None]:
# Select element-wise from a or b according to a random condition
N = 10
a = np.arange(N)
b = np.zeros(N)
cond = np.random.rand(N) > 0.5
print(cond)
np.where(cond, a, b)

[ True  True False False False  True  True  True  True False]


array([0., 1., 0., 0., 0., 5., 6., 7., 8., 0.])

In [None]:
# Nested conditions
N = 10
a = np.ones(N)
b = np.zeros(N)
c = -np.ones(N)
np.where(np.random.rand(N) > 2/3, a, np.where(np.random.rand(N) > 1/2, b, c))

array([ 1., -1., -1.,  0.,  1.,  0.,  1.,  1.,  1.,  0.])
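
When conditions are nested more than one level deep, `np.select` can be easier to read. The sketch below is a variation on the cell above rather than an exact equivalent: it assumes a single random draw `u` and the thresholds 2/3 and 1/3, whereas the cell above draws two independent random arrays.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
u = rng.random(10)

# Conditions are checked in order; the first one that matches wins.
result = np.select([u > 2/3, u > 1/3], [1.0, 0.0], default=-1.0)

# The same logic written as nested np.where calls on the same draw
nested = np.where(u > 2/3, 1.0, np.where(u > 1/3, 0.0, -1.0))

print(result)
print(np.array_equal(result, nested))
```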

### Sorting

In [None]:
arr = np.random.rand(10)
arr

array([0.06195323, 0.88768089, 0.86317584, 0.77258537, 0.65005799,
       0.89596947, 0.97069916, 0.13337671, 0.01450898, 0.97775632])

In [None]:
# Return a copy of sorted array
np.sort(arr)

array([0.01450898, 0.06195323, 0.13337671, 0.65005799, 0.77258537,
       0.86317584, 0.88768089, 0.89596947, 0.97069916, 0.97775632])

In [None]:
print(arr)

[0.06195323 0.88768089 0.86317584 0.77258537 0.65005799 0.89596947
 0.97069916 0.13337671 0.01450898 0.97775632]


In [None]:
arr = np.random.rand(3,4)
print(arr)
print('====')
np.sort(arr,axis=0)

[[0.32666841 0.08898182 0.02987685 0.73887098]
 [0.31437894 0.24798777 0.31687965 0.22043315]
 [0.40666184 0.73296905 0.16242382 0.15241341]]
====


array([[0.31437894, 0.08898182, 0.02987685, 0.15241341],
       [0.32666841, 0.24798777, 0.16242382, 0.22043315],
       [0.40666184, 0.73296905, 0.31687965, 0.73887098]])

In [None]:
# Return sorting indices
arr = np.random.rand(2,3,4)
print(arr)
print('====')
#print(np.sort(arr,axis=0))
#print('====')
#print(np.sort(arr,axis=2))
#print('=====')
print(np.argsort(arr,axis=1))

[[[0.6725305  0.88951798 0.02213206 0.91467875]
  [0.09962658 0.53670697 0.35614075 0.0531732 ]
  [0.37131937 0.209439   0.70323189 0.68513849]]

 [[0.46281206 0.38574652 0.99535896 0.04864965]
  [0.94032864 0.40192301 0.24287751 0.58413134]
  [0.03821    0.19245988 0.08880189 0.54538272]]]
====
[[[1 2 0 1]
  [2 1 1 2]
  [0 0 2 0]]

 [[2 2 2 0]
  [0 0 1 2]
  [1 1 0 1]]]
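
Sorting indices are most useful for reordering other data in the same way. The sketch below uses hypothetical `scores` and `names` arrays, and `np.take_along_axis` to apply the indices to an n-d array.

```python
import numpy as np

scores = np.array([0.72, 0.15, 0.98, 0.40])   # hypothetical data
names = np.array(['a', 'b', 'c', 'd'])

order = np.argsort(scores)     # indices that would sort scores in ascending order
print(order)                   # [1 3 0 2]
print(scores[order])           # same values as np.sort(scores)
print(names[order])            # reorder a second array the same way
print(names[order[::-1]])      # descending order

# For n-d arrays, np.take_along_axis applies argsort results along an axis.
table = np.array([[3, 1, 2],
                  [9, 7, 8]])
idx = np.argsort(table, axis=1)
print(np.take_along_axis(table, idx, axis=1))
```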


### Set Logic

In [None]:
import numpy as np
arr1 = np.arange(10)
arr2 = np.arange(20,0,-2)
print(arr1, arr2)

[0 1 2 3 4 5 6 7 8 9] [20 18 16 14 12 10  8  6  4  2]


In [None]:
# Membership
7 in arr2

False
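
The `in` operator tests a single value. To test every element of one array against another, `np.isin` returns a boolean array that can then be used as a filter. The sketch recreates the same `arr1` and `arr2` so that it runs on its own.

```python
import numpy as np

arr1 = np.arange(10)
arr2 = np.arange(20, 0, -2)

mask = np.isin(arr1, arr2)   # True where an element of arr1 also appears in arr2
print(mask)
print(arr1[mask])            # [2 4 6 8]
```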

In [None]:
# Unique elements
print(arr1 * arr2, np.unique(arr1 * arr2))

[ 0 18 32 42 48 50 48 42 32 18] [ 0 18 32 42 48 50]


In [None]:
# Set operations - np.intersect1d, np.union1d, np.setdiff1d, np.setxor1d
print(arr1,arr2)
print('===')
print(np.setdiff1d(arr1,arr2))
print('===')
print(np.setxor1d(arr1,arr2))
print('===')
print(np.union1d(arr1,arr2))
print('===')
print(np.intersect1d(arr1,arr2))

[0 1 2 3 4 5 6 7 8 9] [20 18 16 14 12 10  8  6  4  2]
===
[0 1 3 5 7 9]
===
[ 0  1  3  5  7  9 10 12 14 16 18 20]
===
[ 0  1  2  3  4  5  6  7  8  9 10 12 14 16 18 20]
===
[2 4 6 8]


## Manipulating and Combining Arrays

Sometimes, you will need to manipulate or combine multiple arrays of data prior to performing any analysis. There are a lot of built-in functions for these purposes. Very rarely will you need to develop your own code.

### Manipulating Arrays

In [None]:
arr = np.arange(8)
print(arr.shape)
arr2d = np.random.rand(3,5)
arr2d.shape[0]

(8,)


3

In [None]:
arr = np.arange(8)
print(arr)
print('=====')

[0 1 2 3 4 5 6 7]
=====


In [None]:
# Reshaping arrays: an important operation in machine learning workflows

print(arr.reshape((-1,2)))
print('=====')
print(arr)
print('=====')
print(arr.reshape((2,-1,2)))
print(arr.reshape((2,1,-1,2))) # -1 lets NumPy infer the size of that dimension

[[0 1]
 [2 3]
 [4 5]
 [6 7]]
=====
[0 1 2 3 4 5 6 7]
=====
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
[[[[0 1]
   [2 3]]]


 [[[4 5]
   [6 7]]]]


In [None]:
# Transpose
print(arr)
arr.reshape((2,4)).T # methods and attributes can be chained together

[0 1 2 3 4 5 6 7]


array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])

In [None]:
# Flatten
arr = np.arange(24)
print(arr)
print('=======')
print(arr.reshape((2,-1,4)))
print('=======')
print(arr.reshape((2,3,4)).flatten('C')) # row-major C-language style
print('=======')
print(arr.reshape((2,3,4)).flatten('F')) # column-major Fortran-language style

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
=======
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
=======
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
=======
[ 0 12  4 16  8 20  1 13  5 17  9 21  2 14  6 18 10 22  3 15  7 19 11 23]
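
A related method is `ravel`. The sketch below shows the practical difference: `flatten` always returns a copy, while `ravel` returns a view when the memory layout allows it, as it does for the contiguous array used here. The names are illustrative.

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))

flat_copy = grid.flatten()   # always a new array
flat_view = grid.ravel()     # a view here, because grid is contiguous

flat_copy[0] = 100
print(grid[0, 0])            # 0: the copy is independent

flat_view[0] = 100
print(grid[0, 0])            # 100: the view writes through to grid
```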


### Combining and Splitting Arrays

In [None]:
arr1 = np.random.rand(4,2)
arr2 = np.random.rand(4,2)
print(arr1)
print('=======')
print(arr2)

[[0.3086021  0.81009785]
 [0.96017877 0.19622608]
 [0.9607127  0.6717326 ]
 [0.70845416 0.0972709 ]]
=======
[[0.14757709 0.57373485]
 [0.73985223 0.17124275]
 [0.35440685 0.73904348]
 [0.59075312 0.13297693]]


In [None]:
# Concatenate along axis 1: the columns of arr2 are appended to the right of arr1
np.concatenate([arr1, arr2], axis=1)

array([[0.3086021 , 0.81009785, 0.14757709, 0.57373485],
       [0.96017877, 0.19622608, 0.73985223, 0.17124275],
       [0.9607127 , 0.6717326 , 0.35440685, 0.73904348],
       [0.70845416, 0.0972709 , 0.59075312, 0.13297693]])

In [None]:
# Horizontal stacking (same result as concatenating along axis 1)
np.hstack([arr1, arr2]) # also, np.column_stack for 2-d arrays

array([[0.3086021 , 0.81009785, 0.14757709, 0.57373485],
       [0.96017877, 0.19622608, 0.73985223, 0.17124275],
       [0.9607127 , 0.6717326 , 0.35440685, 0.73904348],
       [0.70845416, 0.0972709 , 0.59075312, 0.13297693]])

In [None]:
# Concatenate along axis 0: the rows of arr2 are appended below arr1
np.concatenate([arr1, arr2], axis=0)

array([[0.3086021 , 0.81009785],
       [0.96017877, 0.19622608],
       [0.9607127 , 0.6717326 ],
       [0.70845416, 0.0972709 ],
       [0.14757709, 0.57373485],
       [0.73985223, 0.17124275],
       [0.35440685, 0.73904348],
       [0.59075312, 0.13297693]])

In [None]:
# Vertical stacking (same result as concatenating along axis 0)
np.vstack([arr1, arr2]) # np.row_stack is an older alias of np.vstack

array([[0.3086021 , 0.81009785],
       [0.96017877, 0.19622608],
       [0.9607127 , 0.6717326 ],
       [0.70845416, 0.0972709 ],
       [0.14757709, 0.57373485],
       [0.73985223, 0.17124275],
       [0.35440685, 0.73904348],
       [0.59075312, 0.13297693]])
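
The relationships between these functions can be checked directly. The sketch below uses two small hand-written arrays (`a` and `b` are illustrative) and also shows `np.stack`, which joins arrays along a new axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

side_by_side = np.hstack([a, b])   # shape (2, 4)
on_top = np.vstack([a, b])         # shape (4, 2)

print(np.array_equal(side_by_side, np.concatenate([a, b], axis=1)))   # True
print(np.array_equal(side_by_side, np.column_stack([a, b])))          # True for 2-d inputs
print(np.array_equal(on_top, np.concatenate([a, b], axis=0)))         # True

# np.stack joins along a NEW axis instead of an existing one.
print(np.stack([a, b]).shape)      # (2, 2, 2)
```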

In [None]:
np.split?

In [None]:
# Splitting arrays
#np.split?
arr2 = np.random.rand(100,40)
x= np.split(arr2,[2,10,20],axis=0)
print(len(x))
print(x[3].shape)
print(type(x))
np.split(arr1, [1,2], axis=0) # also, np.hsplit, np.vsplit

4
(80, 40)
<class 'list'>


[array([[0.3086021 , 0.81009785]]),
 array([[0.96017877, 0.19622608]]),
 array([[0.9607127 , 0.6717326 ],
        [0.70845416, 0.0972709 ]])]
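
The second argument of np.split can also be an integer, which requests that many equal parts. A minimal sketch of that case, and of `np.array_split` for lengths that do not divide evenly:

```python
import numpy as np

arr = np.arange(10)

# An integer asks for equal parts; np.split raises if the length cannot be divided evenly.
try:
    np.split(arr, 3)
except ValueError as err:
    print('ValueError:', err)

# np.array_split allows parts of unequal length.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```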

## File Input/Output

There are two primary ways to save/load NumPy arrays to/from a file:

* Binary format (.npy) - np.**save** and np.**load**
* Delimited text file (.txt) - np.**savetxt** and np.**loadtxt** (also, np.**genfromtxt** for files with missing data)
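
As a sketch of the missing-data case, the example below reads hypothetical file contents from an in-memory buffer (`io.StringIO`) instead of a real file; the second row has an empty field.

```python
import io
import numpy as np

# Hypothetical file contents; the second row has an empty field.
text = io.StringIO('1.0,2.0,3.0\n4.0,,6.0\n')

data = np.genfromtxt(text, delimiter=',')
print(data)                    # the empty field is read as nan
print(np.isnan(data).sum())    # 1
```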

In [None]:
arr2d = np.random.rand(3,5)
print(arr2d)
# Print current working directory
%pwd

[[0.76613026 0.48018365 0.91837183 0.69230982 0.81001985]
 [0.95447955 0.75505663 0.61443833 0.42985265 0.11627129]
 [0.49143105 0.34747085 0.69516257 0.69172734 0.00974311]]


'/content'

In [None]:
# Save arr2d to binary file
np.save('test', arr2d) # function will add the .npy extension

In [None]:
# Delete arr2d and re-load from file
del arr2d
arr2d = np.load('test.npy') # must include .npy extension
arr2d

array([[0.76613026, 0.48018365, 0.91837183, 0.69230982, 0.81001985],
       [0.95447955, 0.75505663, 0.61443833, 0.42985265, 0.11627129],
       [0.49143105, 0.34747085, 0.69516257, 0.69172734, 0.00974311]])

In [None]:
# Save arr2d to text file
np.savetxt('test.txt', arr2d, fmt='%.4f', delimiter=',')

In [None]:
# Delete arr2d and re-load from file
del arr2d
arr2d = np.loadtxt('test.txt', delimiter=',')
arr2d

array([[0.8142, 0.6197, 0.9781, 0.2362, 0.509 ],
       [0.7547, 0.3282, 0.3924, 0.6913, 0.3947],
       [0.4907, 0.1055, 0.2457, 0.0116, 0.4858]])
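
The two formats differ in precision: the binary format stores values exactly, while a text file keeps only the digits requested by `fmt`. The sketch below checks this with a small hypothetical array written to a temporary directory; the file names are illustrative.

```python
import os
import tempfile
import numpy as np

original = np.array([[0.123456789, 1.5],
                     [2.0, 3.987654321]])   # hypothetical values

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, 'demo.npy')
    txt_path = os.path.join(tmp, 'demo.txt')

    np.save(npy_path, original)
    np.savetxt(txt_path, original, fmt='%.4f', delimiter=',')

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path, delimiter=',')

print(np.array_equal(original, from_npy))           # True: binary format is exact
print(np.array_equal(original, from_txt))           # False: values were rounded
print(np.allclose(original, from_txt, atol=1e-4))   # True: equal to 4 decimal places
```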