# <a id='toc1_'></a>[Analyze and Prepare data using Python](#toc0_)

Lesson: 03

Time: 00:00:00

**Table of contents**<a id='toc0_'></a>    
- [Analyze and Prepare data using Python](#toc1_)    
  - [NumPy](#toc1_1_)    


---

## <a id='toc1_1_'></a>[NumPy](#toc0_)

In [1]:
import numpy as np

An array of the given shape, filled with `fill_value` (the dtype is inferred from `fill_value` unless `dtype` is passed):

In [6]:
np.full(shape=(2,3), fill_value=6)

array([[6, 6, 6],
       [6, 6, 6]])

Identity matrix/array:

In [7]:
np.identity(n=3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
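
As a small illustrative sketch of the two constructors above: `np.full` accepts an explicit `dtype`, and `np.eye` is a closely related function that also handles non-square shapes and shifted diagonals. The variable names are only for illustration.

```python
import numpy as np

# The dtype is inferred from fill_value unless it is given explicitly
ints = np.full(shape=(2, 3), fill_value=6)
floats = np.full(shape=(2, 3), fill_value=6, dtype=np.float64)
print(ints.dtype.kind, floats.dtype)  # integer kind 'i' vs. float64

# np.eye covers np.identity and adds non-square shapes and a diagonal offset k
wide = np.eye(2, 3)        # 2 rows, 3 columns, ones on the main diagonal
shifted = np.eye(3, k=1)   # ones on the first diagonal above the main one
print(wide)
print(shifted)
print(np.array_equal(np.identity(3), np.eye(3)))  # True
```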

Change the data type (`astype` returns a new array; converting floats to integers drops the fractional part):

In [13]:
x = np.array([1.0, 3.2, 0.8, 4.0, 7.98])
print(x)
print(x.astype(np.int64))

[1.   3.2  0.8  4.   7.98]
[1 3 0 4 7]


In [14]:
y = np.array([1, 3, 0, 4, 7])
print(y)
print(y.astype(np.float64))

[1 3 0 4 7]
[1. 3. 0. 4. 7.]
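
Because `astype` truncates rather than rounds, it can help to round first when the nearest integer is wanted. A minimal sketch, using a slightly different sample array that includes a negative value:

```python
import numpy as np

x = np.array([1.0, 3.2, 0.8, -0.8, 7.98])

truncated = x.astype(np.int64)          # drops the fractional part (toward zero)
rounded = np.round(x).astype(np.int64)  # round to the nearest integer, then convert

print(truncated)  # [1 3 0 0 7]
print(rounded)    # [ 1  3  1 -1  8]
print(x.dtype)    # float64: astype returned new arrays, x is unchanged
```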


_Slice_ an array and _change_ values through the slice (a slice is a view, so the original array changes too):

In [41]:
arr = np.arange(10)
print(arr)  # The original values of arr

x = arr[2:6]
print(x)
print()

x[1] = 17
print(x)
print(arr)  # arr changed too, because x is a view into arr
print()

x[:] = 64
print(x)
print(arr)  # arr[2:6] is now 64 as well, because x is a view into arr

[0 1 2 3 4 5 6 7 8 9]
[2 3 4 5]

[ 2 17  4  5]
[ 0  1  2 17  4  5  6  7  8  9]

[64 64 64 64]
[ 0  1 64 64 64 64  6  7  8  9]


With `copy()`, the slice owns its data, so changing it does not affect the original array:

In [40]:
arr = np.arange(10)
print(arr)  # The original values of arr

x = arr[2:6].copy()
print(x)
print()

x[1] = 17
print(x)
print(arr)  # arr is unchanged, because x is an independent copy
print()

x[:] = 64
print(x)
print(arr)  # arr is still unchanged

[0 1 2 3 4 5 6 7 8 9]
[2 3 4 5]

[ 2 17  4  5]
[0 1 2 3 4 5 6 7 8 9]

[64 64 64 64]
[0 1 2 3 4 5 6 7 8 9]
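
One way to check whether two arrays share data, rather than inferring it from printed values, is `np.shares_memory` together with the `base` attribute. A short sketch of the view and copy cases:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:6]            # basic slice: a view into arr's data
copied = arr[2:6].copy()   # independent data

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False
print(view.base is arr)               # True: the view points back at arr
print(copied.base is None)            # True: the copy owns its data
```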


Boolean indexing _(select the rows of one array where a boolean array is True)_:

In [65]:
names = np.array(['ali', 'sara', 'taha', 'ali'])
print(names)
print(names == 'ali')

['ali' 'sara' 'taha' 'ali']
[ True False False  True]


In [66]:
data = np.random.randint(low=0, high=10, size=(4, 3))  # random integers in [0, 10)
print(data)
print()

print(data[names == 'ali'])
print()

print(~data[names == 'ali'])  # On integer data, ~ is bitwise NOT (-x - 1); to negate the mask instead, use data[~(names == 'ali')]

[[6 2 6]
 [3 9 8]
 [8 8 7]
 [8 2 6]]

[[6 2 6]
 [8 2 6]]

[[-7 -3 -7]
 [-9 -3 -7]]
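
To select the rows that do *not* match, the `~` operator has to be applied to the boolean mask, not to the selected data. A minimal sketch, with a deterministic `np.arange` array standing in for the random `data`:

```python
import numpy as np

names = np.array(['ali', 'sara', 'taha', 'ali'])
data = np.arange(12).reshape(4, 3)  # stand-in for the random data above

mask = names == 'ali'
not_ali = data[~mask]            # ~ on a boolean array is logical NOT
also_not_ali = data[names != 'ali']

print(not_ali)                   # rows 1 and 2
print(np.array_equal(not_ali, also_not_ali))  # True
print(~np.array([6, 2]))         # [-7 -3]: on integers, ~ is bitwise NOT
```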


In [67]:
print(data[names == 'ali', 1:])

[[2 6]
 [2 6]]


In [68]:
mask1 = (names == 'ali') | (names == 'taha')
mask2 = (names == 'ali') & (names == 'taha')

print(mask1)
print(data[mask1])
print()

print(mask2)
print(data[mask2])

[ True False  True  True]
[[6 2 6]
 [8 8 7]
 [8 2 6]]

[False False False False]
[]
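
When a mask combines many equality tests with `|`, `np.isin` is a possible shorthand. The sketch below also shows that the Python keywords `and`/`or` do not work on arrays, which is why `&` and `|` are used above.

```python
import numpy as np

names = np.array(['ali', 'sara', 'taha', 'ali'])

mask_or = (names == 'ali') | (names == 'taha')
mask_isin = np.isin(names, ['ali', 'taha'])   # same result as the chained |
print(mask_isin)                              # [ True False  True  True]
print(np.array_equal(mask_or, mask_isin))     # True

# The keyword `or` asks for the truth value of a whole array, which is ambiguous
raised = False
try:
    (names == 'ali') or (names == 'taha')
except ValueError as err:
    raised = True
    print('ValueError:', err)
```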


Set negative values to zero (in place, using a boolean mask):

In [78]:
x = np.random.randn(3, 4)
print(x)
print()

x[x < 0] = 0
print(x)

[[ 0.77211152 -0.76597199  0.75102038  0.33594753]
 [-0.68683318  1.85224896  0.75384187  1.84777921]
 [-0.25003729  1.63777146  0.89882982 -0.60706259]]

[[0.77211152 0.         0.75102038 0.33594753]
 [0.         1.85224896 0.75384187 1.84777921]
 [0.         1.63777146 0.89882982 0.        ]]
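
The masked assignment above modifies `x` in place. If the original array should be kept, `np.clip` or `np.maximum` return a new array instead. A small sketch with fixed values in place of random ones:

```python
import numpy as np

x = np.array([[0.5, -1.2], [-0.3, 2.0]])

clipped = np.clip(x, 0, None)   # lower bound 0, no upper bound
floored = np.maximum(x, 0)      # element-wise maximum against the scalar 0

print(clipped)
print(np.array_equal(clipped, floored))  # True
print(x.min())                           # -1.2: x itself was not modified
```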


Fancy indexing (_Indexing using integer arrays_):

In [99]:
# Create data
arr = np.empty(shape=(5, 3))

for i in range(arr.shape[0]):  # arr.shape[0] == 5
    arr[i] = 5*i+1  # an arbitrary value, broadcast to every element of row i
    
print(arr)

[[ 1.  1.  1.]
 [ 6.  6.  6.]
 [11. 11. 11.]
 [16. 16. 16.]
 [21. 21. 21.]]


In [104]:
# Fancy indexing: pass a list of row indices (hence the double brackets); negative indices count from the end
print( arr[[0, 2, -2, -5, -3, 1]] )

[[ 1.  1.  1.]
 [11. 11. 11.]
 [16. 16. 16.]
 [ 1.  1.  1.]
 [11. 11. 11.]
 [ 6.  6.  6.]]
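
Unlike basic slicing, fancy indexing returns a copy of the selected data. The sketch below illustrates this, and also that *assigning* through a fancy index still writes into the original array:

```python
import numpy as np

arr = np.arange(15).reshape(5, 3)

picked = arr[[0, 2]]   # fancy indexing: the result is a copy
picked[:] = -1         # modifies only the copy
print(arr[0])                         # [0 1 2]
print(np.shares_memory(arr, picked))  # False

arr[[0, 2]] = 99       # assignment through a fancy index writes into arr
print(arr[:3])
```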


Fancy indexing with several index arrays:

In [115]:
a = np.arange(35).reshape((7, 5))
print(a)
print()

print( a[[6, 0, 2], [2, 4, 3]] )  # The index arrays are paired element-wise: picks (6, 2), (0, 4), (2, 3)
print()

print( a[[2, 6]][:, [0, 3, 1]] )  # Rows 2 and 6 first, then columns 0, 3, 1 of those rows

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]
 [25 26 27 28 29]
 [30 31 32 33 34]]

[32  4 13]

[[10 13 11]
 [30 33 31]]
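
The chained form `a[[2, 6]][:, [0, 3, 1]]` builds an intermediate array. `np.ix_` is one way to select the same rectangular block in a single indexing step, as sketched here:

```python
import numpy as np

a = np.arange(35).reshape((7, 5))

pairs = a[[6, 0, 2], [2, 4, 3]]        # element-wise pairs (row, column)
chained = a[[2, 6]][:, [0, 3, 1]]      # rows first, then columns
block = a[np.ix_([2, 6], [0, 3, 1])]   # every combination of the rows and columns

print(pairs)                           # [32  4 13]
print(block)
print(np.array_equal(block, chained))  # True
```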


Transposing arrays and Swapping axes:

In [119]:
arr = np.arange(8).reshape((2, 4))
print(arr)
print()

print(arr.T)  # Transpose

[[0 1 2 3]
 [4 5 6 7]]

[[0 4]
 [1 5]
 [2 6]
 [3 7]]


In [124]:
z = np.arange(60).reshape((3, 4, 5))  # 3*4*5 == 60
print(z)
print()

print(z.shape)

[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]
  [15 16 17 18 19]]

 [[20 21 22 23 24]
  [25 26 27 28 29]
  [30 31 32 33 34]
  [35 36 37 38 39]]

 [[40 41 42 43 44]
  [45 46 47 48 49]
  [50 51 52 53 54]
  [55 56 57 58 59]]]

(3, 4, 5)


In [127]:
# Axes swapping
axe = z.swapaxes(0, 1)  # Returns a view with axes 0 and 1 swapped: (3, 4, 5) -> (4, 3, 5); z itself keeps its shape
print(axe)
print()

print(axe.shape)

[[[ 0  1  2  3  4]
  [20 21 22 23 24]
  [40 41 42 43 44]]

 [[ 5  6  7  8  9]
  [25 26 27 28 29]
  [45 46 47 48 49]]

 [[10 11 12 13 14]
  [30 31 32 33 34]
  [50 51 52 53 54]]

 [[15 16 17 18 19]
  [35 36 37 38 39]
  [55 56 57 58 59]]]

(4, 3, 5)


In [129]:
# Equivalent to z.swapaxes(0, 1): the tuple gives the new order of the axes
print(z.transpose((1, 0, 2)))

[[[ 0  1  2  3  4]
  [20 21 22 23 24]
  [40 41 42 43 44]]

 [[ 5  6  7  8  9]
  [25 26 27 28 29]
  [45 46 47 48 49]]

 [[10 11 12 13 14]
  [30 31 32 33 34]
  [50 51 52 53 54]]

 [[15 16 17 18 19]
  [35 36 37 38 39]
  [55 56 57 58 59]]]
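
The equivalence of the two calls can be checked directly instead of comparing printed output. This sketch also illustrates that both return views onto the same data, and that `.T` on a 3-D array reverses all axes:

```python
import numpy as np

z = np.arange(60).reshape((3, 4, 5))

swapped = z.swapaxes(0, 1)
transposed = z.transpose((1, 0, 2))

print(np.array_equal(swapped, transposed))  # True
print(z.shape, swapped.shape)               # (3, 4, 5) (4, 3, 5)
print(np.shares_memory(z, swapped))         # True: no data was copied
print(swapped[1, 2, 3] == z[2, 1, 3])       # True: the first two indices trade places
print(z.T.shape)                            # (5, 4, 3)
```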


Separate the fractional and integer parts:

In [132]:
x = np.array([2.6, 8.5, -9])

d, i = np.modf(x)

print(np.modf(x))
print(d)  # Fractional part (keeps the sign of the input, hence -0.)
print(i)  # Integer part

(array([ 0.6,  0.5, -0. ]), array([ 2.,  8., -9.]))
[ 0.6  0.5 -0. ]
[ 2.  8. -9.]
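
A quick sanity check on `np.modf`: the two parts add back up to the input, and both carry the sign of the input. A minimal sketch with one extra negative value added to the sample array:

```python
import numpy as np

x = np.array([2.6, 8.5, -9.0, -2.6])
frac, whole = np.modf(x)

print(whole)                         # [ 2.  8. -9. -2.]
print(np.allclose(frac + whole, x))  # True
print(np.signbit(frac))              # [False False  True  True]
```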


Element-wise maximum of two arrays:

In [135]:
x = np.random.randn(4)
y = np.random.randn(4)

print(x)
print(y)
print()

print(np.maximum(x, y))  # x and y must have the same shape or be broadcastable to a common shape

[-0.06623655  0.55547904 -2.48191337  2.92861181]
[-0.17258407  1.05849755  0.39515491  0.09070186]

[-0.06623655  1.05849755  0.39515491  2.92861181]
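
`np.maximum` compares two inputs element by element, while `np.max` reduces a single array. The sketch below, with small integer arrays chosen for readability, shows both, along with broadcasting against a scalar and against a row:

```python
import numpy as np

x = np.array([1, 5, 3])
y = np.array([4, 2, 6])

print(np.maximum(x, y))  # [4 5 6]: element-wise
print(np.maximum(x, 4))  # [4 5 4]: the scalar is broadcast
print(np.max(x))         # 5: a reduction over one array

m = np.arange(6).reshape(2, 3)
row = np.array([2, 2, 2])
print(np.maximum(m, row))  # row is broadcast across both rows of m

# Shapes that cannot be broadcast together raise an error
raised = False
try:
    np.maximum(np.arange(3), np.arange(4))
except ValueError:
    raised = True
print(raised)            # True
```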


Conditional selection with `np.where()`:

In [140]:
# Without np.where: a Python-level loop that is slow on large arrays and returns a list
arr1 = np.array([1, 5, 8])
arr2 = np.array([4, 7, 12])
condition = np.array([True, False, True])

r = [(x if cond else y) for x, y, cond in zip(arr1, arr2, condition)]

print(r)

[1, 7, 8]


In [142]:
# With np.where: vectorized, and the result is an ndarray
res = np.where(condition, arr1, arr2)  # arr1 where condition is True, else arr2
print(res)

[1 7 8]


Another example:

In [165]:
x = np.random.randn(2, 3)
print(x)
print()

print(x > 0)
print()

print(np.where(x > 0, 13, -1))  # 13 If x > 0 Else -1
print()

print(np.where(x > 0, 1, x))  # 1 If x > 0 Else x

[[ 0.72039162 -0.74075494 -0.7425139 ]
 [-0.02910374  0.62051568 -0.96183579]]

[[ True False False]
 [False  True False]]

[[13 -1 -1]
 [-1 13 -1]]

[[ 1.         -0.74075494 -0.7425139 ]
 [-0.02910374  1.         -0.96183579]]
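
Two further uses of `np.where` that build on the examples above, sketched with fixed values instead of random ones: called with only a condition it returns the indices of the True entries, and calls can be nested to choose between more than two outcomes.

```python
import numpy as np

x = np.array([[0.7, -0.7, -0.1],
              [-0.2, 0.6, -0.9]])

# With a single argument, np.where returns one index array per dimension
rows, cols = np.where(x > 0)
print(rows, cols)  # [0 1] [0 1]: positives at (0, 0) and (1, 1)

# Nested calls give a three-way choice: 1, -1, or 0
signs = np.where(x > 0.5, 1, np.where(x < -0.5, -1, 0))
print(signs)
```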
