<img src="./assets/numpy_logo.png" alt="logo" width="400"/>

## Introduction to NumPy

NumPy stands for 'Numerical Python'. It is an open-source Python library for mathematical and scientific computing. It provides multi-dimensional arrays and matrices, along with many high-level mathematical functions that operate on them. Among other things, it contains:

→ a powerful N-dimensional array object

→ sophisticated (broadcasting) functions

→ tools for integrating C/C++ and Fortran code

→ useful linear algebra, Fourier transform, and random number capabilities

## Installing NumPy
To work with NumPy locally, you can install it with:\
`pip install numpy`\
or\
`conda install numpy`

In our case, 4Geeks has already prepared the whole environment so that you can work comfortably.

## Why should we use NumPy?

NumPy is a library for numerical computation in Python. We will use it mainly because it lets us create and modify matrices and operate on them with ease.

NumPy is, like Pandas, Matplotlib or Scikit-Learn, one of the packages you cannot skip when learning Machine Learning, mainly because it provides an array data structure with several benefits over regular Python lists: it is more compact, it gives faster read and write access to items, and it is more convenient and more efficient.

For example, we will see later in the bootcamp that working with images means dealing with three-dimensional arrays. For an image as large as 3840 x 2160, that means 3×3840×2160 = 24883200 entries!!! 😱😱😱.

Working with arrays of that size is practically impossible with lists and dictionaries if we want efficient and fast code.
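
As a quick sanity check of the arithmetic above, the sketch below allocates an array of that size and asks NumPy how many entries it holds. The channel-first layout `(3, 2160, 3840)` and the `uint8` dtype are assumptions made for illustration; image libraries often store the channels last.

```python
import numpy as np

# Hypothetical 3840 x 2160 image with 3 channels.
# uint8 (one byte per entry) is an assumption; it is a common choice for images.
image = np.zeros((3, 2160, 3840), dtype=np.uint8)

print(image.ndim)    # number of dimensions
print(image.size)    # total number of entries: 3 * 2160 * 3840
print(image.nbytes)  # memory used by the data, in bytes
```

With one byte per entry, the number of bytes equals the number of entries; a `float64` array of the same shape would need eight times as much.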

#### Exercise: Import the numpy package under the name `np` (★☆☆)

`numpy` is commonly imported as `np`, so we highly recommend using this alias.

In [109]:
import numpy as np

## What is an array and why is it important for machine learning?

An array is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key.


![alt text](assets/1D.png "1D")

Arrays can have several dimensions. Neural networks sometimes deal with 4D arrays. The array is the central data structure of the NumPy library, where it is called a NumPy array. Later, we will also use another kind of array called a tensor.

![alt text](./assets/3D.png "3D")
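
To make the idea of dimensions concrete, here is a minimal sketch that builds arrays with 1 to 4 dimensions and prints their `ndim` and `shape`. The variable names and the `(8, 3, 32, 32)` shape are hypothetical, chosen only to resemble a small batch of images.

```python
import numpy as np

vector = np.array([1, 2, 3])               # 1D
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2D
cube = np.zeros((2, 3, 4))                 # 3D
batch = np.zeros((8, 3, 32, 32))           # 4D, e.g. a batch of 8 small images

for arr in (vector, matrix, cube, batch):
    print(arr.ndim, arr.shape)
```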

#### Exercise: Print the numpy version and the configuration (★☆☆)

You can print the version of any Python package using `name_of_package.__version__`. NumPy's build configuration can be printed with `np.show_config()`.

In [110]:
import numpy as np
print(np.__version__)

1.22.1


#### Exercise: Create a null vector of size 10 (★☆☆)

A `null vector` is an array of zeros (`0`), also called an `initialization vector`.

Check the function `np.zeros` (https://numpy.org/doc/stable/reference/generated/numpy.zeros.html)

In [111]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

#### Exercise: Create a vector of ones with size 10 (★☆☆)
Check the function `np.ones` (https://numpy.org/doc/stable/reference/generated/numpy.ones.html)

In [112]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

#### Exercise: Create a 1D array with specific start value, end value and number of values (★☆☆)

Check the function `np.linspace` (https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)

In [113]:
np.linspace(0,1,10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

#### Run: Create a vector (1D array) with random integers from 10 to 49 and dimension 1x35 (★☆☆)

When the `dimension` is written as `1x35`, it means a one-dimensional array with 35 items (length=35).

Check the module `np.random`, which allows you to create random arrays (https://numpy.org/doc/1.16/reference/routines.random.html)

In [114]:
import numpy as np

## 10 random numbers between (0, 1)
print(np.random.random(10))
## 35 random floats between (10, 49); np.random.randint(10, 50, 35) would give integers instead
print(np.random.random(35)*39 + 10)

[0.17015129 0.56698851 0.78633601 0.61929506 0.33563124 0.52511488
 0.96553364 0.66138627 0.69108346 0.14637404]
[14.64038766 12.61994874 36.47640297 46.92855401 17.40408699 36.93033973
 15.22658446 33.79570819 34.10398268 32.24198395 42.94639059 19.97154715
 40.60873308 38.13560069 29.91463209 30.92864331 30.49871559 11.06603372
 42.20564822 20.37699238 39.80666037 27.42600448 20.0764982  21.9318803
 35.92904278 16.24998151 32.82138259 38.90567735 22.89829944 19.99623869
 39.87117795 47.58082493 45.09642731 45.32184815 19.46151858]
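
The cell above produces floats. For random integers, as the exercise title asks, one option is the `Generator` API returned by `np.random.default_rng`. The following is a minimal sketch; the seed value is arbitrary and only there to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, an assumption for reproducibility

# `high` is exclusive, so 50 is needed to make 49 a possible value.
random_ints = rng.integers(low=10, high=50, size=35)

print(random_ints.shape)
print(random_ints.min(), random_ints.max())
```

The legacy call `np.random.randint(10, 50, 35)` follows the same rule: the upper bound is excluded.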


In [115]:
## Uniform versus normal distribution
print(np.random.rand(10)) # 10 random values, uniformly distributed on [0, 1)
print(np.random.normal(loc = 0, scale = 1, size = 10)) # 10 random values with distribution N(0,1)

[0.01265982 0.008621   0.62328959 0.18069473 0.87350141 0.52001206
 0.5792359  0.47185601 0.86864877 0.71425068]
[-0.74033174 -2.20189508 -0.19214094  0.57960522 -0.35271032 -0.78765776
 -2.07322635  0.47274558  1.74585191  0.81600542]


In [116]:
## Did you notice the difference between both functions? 
print(np.random.normal(loc = -5, scale = 33, size = 10)) # 10 random values from a normal distribution with mean -5 and standard deviation 33

[-18.40661625 -49.03996609   2.24733284 -30.07038028  58.08383174
   1.67404869  27.55824293  20.59327864 -46.20942568 -28.78463119]


In [117]:
## 10 random values with uniform distribution. That is, all values have the same probability
print(np.random.uniform(-30,100,10)) # All values are between -30 and 100.

[ -7.25137029  80.66010982  69.55042418   8.97720122  32.83064284
  43.81216802  75.1565566   74.67776193  30.07124677 -17.68552314]


In [118]:
# 10 integer values from 0 to 99 (the upper bound, 100, is excluded).
print(np.random.randint(0, 100, 10))

[24 21 27 53 63 93 36 23 20 15]


In [119]:
# 10 random values from a chi-square distribution with 5 degrees of freedom
print(np.random.chisquare(5,10))

[ 4.43124733  1.67610969  5.30452866  2.99524636  3.3875423   4.94196446
  3.37233463 11.42520907 11.40818609  4.35991753]


The examples above cover the most common distributions and random values you will use throughout the bootcamp. Now, let's work with those arrays.

#### Exercise: Reverse one of the vectors we created before (first element becomes last) (★☆☆)
Try with `[::-1]`

In [120]:
rand_int_vect = np.random.randint(0, 100, 10)
print(rand_int_vect)
print(rand_int_vect[::-1])

[ 4 91 44 12 20  2 39 13 88  0]
[ 0 88 13 39  2 20 12 44 91  4]


#### Exercise: Create a 5x5 identity matrix (★☆☆)
Check the function `np.eye` (https://numpy.org/devdocs/reference/generated/numpy.eye.html)

In [121]:
print(np.eye(5))

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


#### Exercise: Find indices of non-zero elements from [1,2,0,0,4,0] (★☆☆)
Check the function `where` (https://numpy.org/devdocs/reference/generated/numpy.where.html)

In [122]:
print(np.where(np.array([1,2,0,0,4,0]) != 0)[0])

[0 1 4]


#### Exercise: Create a 10x10 array with random values and find the minimum and maximum values (★☆☆)
Check the functions `min` (https://numpy.org/devdocs/reference/generated/numpy.min.html) and `max` (https://numpy.org/devdocs/reference/generated/numpy.max.html)

In [123]:
arr_10_by_10 = np.random.random((10,10))
print(arr_10_by_10.min())
print(arr_10_by_10.max())

0.011180237653739655
0.9750409586840012


#### Exercise: Create a random vector of size 30 and find the mean value (★☆☆)

In [124]:
print(np.random.random(30).mean())

0.5287919134203217


#### Exercise: Define a function that takes your date of birth (yyyy/mm/dd) as input and returns a random array with the following dimensions: (★★☆)
$$(yyyy-1900) \times |mm - dd|$$

In [125]:
def birth_date(date_of_birth: str) -> np.ndarray:
    # Split "yyyy/mm/dd" into its year, month and day parts
    date = date_of_birth.split("/")
    size = (int(date[0]) - 1900, abs(int(date[1]) - int(date[2])))
    return np.random.random(size)

birthdate = input("enter your birthday: ")
print(birth_date(birthdate).shape)

(90, 0)


## What is the difference between a Python list and a NumPy array?

- A Python list can contain elements of different data types, whereas the elements of a NumPy array are always homogeneous (same data type).
- NumPy arrays are faster and more compact than Python lists.

## Why are NumPy arrays faster than lists?

- A NumPy array stores fixed-size elements and uses less memory than a Python list.
- NumPy arrays are allocated in contiguous memory.
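
The points above can be checked with a small experiment. This is a rough sketch, not a benchmark: the list size is estimated by adding the size of the list itself to the size of each Python `int` it points to, and the exact numbers depend on the Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print(list_bytes)                      # rough estimate for the list
print(np_array.nbytes)                 # 8 bytes per int64 element
print(np_array.flags["C_CONTIGUOUS"])  # data sits in one contiguous block

# Mixing types in an array forces a single common dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```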

#### Exercise: Convert the list `my_list = [1, 2, 3]` to a numpy array (★☆☆)

In [126]:
my_list = [1, 2, 3]
numpy_my_list = np.array(my_list)
print(my_list)
print(numpy_my_list)

[1, 2, 3]
[1 2 3]


#### Exercise: Convert the tuple `my_list = (1, 2, 3)` to a numpy array (★☆☆)

In [127]:
my_list = (1, 2, 3)
numpy_my_list = np.array(my_list)
print(my_list)
print(numpy_my_list)

(1, 2, 3)
[1 2 3]


#### Exercise: Convert the list of tuples `my_list = [(1,2,3), (4,5)]` to a numpy array (★☆☆)

In [128]:
my_list = [(1, 2, 3), (4,5)]
numpy_my_list = np.array(my_list, dtype=object) # tuples of different lengths need an object array
print(my_list)
print(numpy_my_list)

[(1, 2, 3), (4, 5)]
[(1, 2, 3) (4, 5)]


#### Exercise: Resize a random array of dimensions 5x12 into 12x5 (★☆☆)
Check `reshape` from `numpy` (https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)

In [129]:
size1 = 5,12
size2 = 12,5
arr1 = np.random.random(size1)
arr2 = arr1.reshape(size2)
print(arr1.shape)
print(arr2.shape)

(5, 12)
(12, 5)


#### Exercise: Create a function that normalizes a 5x5 random matrix (★☆☆)
Remember from statistics (https://en.wikipedia.org/wiki/Normalization_(statistics)) that:
$$ x_{norm} = \frac{x - \bar{x}}{\sigma}$$


In [130]:
arr_to_normalize = np.random.random((5,5))
arr_normalized = (arr_to_normalize - arr_to_normalize.mean()) / arr_to_normalize.std()
print(arr_normalized)

[[ 0.66849046 -1.08481439  0.53956816 -1.33280875 -0.04194667]
 [-1.28818316  0.78670414 -0.04460656  1.91373548  1.801395  ]
 [ 0.27834995  1.11897937  0.28683637  1.42621624 -1.43363307]
 [-0.02034645 -0.53134063 -1.30608893 -1.09176177  0.64709749]
 [ 0.03128855 -0.72761433 -1.04949426  1.02819807 -0.57422031]]


## Stacking numpy arrays

Stacking joins a sequence of arrays with the same shape along a new axis.
`numpy.stack(arrays, axis)` returns a stacked array that has one more dimension than the input arrays.

### You have two ways to do it:


![alt text](./assets/stack.jpeg "stack")


### or


![alt text](./assets/stack2.jpeg "stack")




#### Exercise: Generate two random arrays with integers and apply the stacking using `stack` (★★☆)

In [131]:
arr_1 = np.random.randint(0, 10, (2,2))
arr_2 = np.random.randint(0, 10, (2,2))
print(arr_1)
print()
print(arr_2)
print()
print(np.stack((arr_1, arr_2)))

[[6 0]
 [8 6]]

[[4 2]
 [8 9]]

[[[6 0]
  [8 6]]

 [[4 2]
  [8 9]]]


#### Exercise: Generate two random arrays with integers and apply the stacking using `hstack` and `vstack` (★★☆)

In [132]:
arr_1 = np.random.randint(0, 10, (2,2))
arr_2 = np.random.randint(0, 10, (2,2))
print(arr_1)
print()
print(arr_2)
print()
print(np.hstack((arr_1, arr_2)))
print()
print(np.vstack((arr_1, arr_2)))

[[9 2]
 [8 8]]

[[3 2]
 [7 4]]

[[9 2 3 2]
 [8 8 7 4]]

[[9 2]
 [8 8]
 [3 2]
 [7 4]]
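
A compact way to compare the three functions is to look at the shapes they return. The sketch below uses two small fixed arrays (hypothetical names `a` and `b`) instead of random ones so the result is easy to follow.

```python
import numpy as np

a = np.arange(4).reshape(2, 2)
b = np.arange(4, 8).reshape(2, 2)

print(np.stack((a, b)).shape)           # new leading axis: (2, 2, 2)
print(np.stack((a, b), axis=-1).shape)  # new trailing axis: (2, 2, 2)
print(np.hstack((a, b)).shape)          # joined side by side: (2, 4)
print(np.vstack((a, b)).shape)          # joined one on top of the other: (4, 2)
```

`stack` adds a dimension, while `hstack` and `vstack` keep the number of dimensions and grow an existing axis.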


## Basic maths in numpy

You can perform typical math operations such as:

- Addition, subtraction, multiplication and division between two arrays.
- Operations on an array using the `sum()` and `cumsum()` functions.
- Minimum and maximum value of an array.
- Exponent/power, square root and cube root functions.

or even apply common trigonometric functions:

- `numpy.sin()`: Sine function
- `numpy.cos()`: Cosine function
- `numpy.tan()`: Tangent function
- `numpy.sinh()`: Hyperbolic sine function
- `numpy.cosh()`: Hyperbolic cosine function
- `numpy.tanh()`: Hyperbolic tangent function
- `numpy.arcsin()`: Inverse sine function
- `numpy.arccos()`: Inverse cosine function
- `numpy.arctan()`: Inverse tangent function
- `numpy.pi`: Value of pi
- `numpy.hypot(w,h)`: Hypotenuse $c = \sqrt{(w^2 + h^2)}$
- `numpy.rad2deg()`: Radians to degrees
- `numpy.deg2rad()`: Degrees to radians
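
A few of the functions listed above, applied to small inputs. This is only an illustrative sketch; the `values` array is made up for the example.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

print(values.sum())       # total of all elements
print(values.cumsum())    # running total
print(np.sqrt(values))    # element-wise square root
print(np.hypot(3, 4))     # hypotenuse of a right triangle with sides 3 and 4
print(np.rad2deg(np.pi))  # pi radians expressed in degrees
print(np.sin(np.pi / 2))  # sine of a right angle
```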

#### Exercise: Generate two random 8-dimensional vectors and apply the most common operations between vectors: addition, subtraction, multiplication, division (★☆☆)

Check the math functions here: https://numpy.org/doc/stable/reference/routines.math.html

In [133]:
vect_1 = np.random.random(8)
vect_2 = np.random.random(8)

print(vect_1 + vect_2)
print(vect_1 - vect_2)
print(vect_1 * vect_2)
print(vect_1 / vect_2)

[1.24214463 0.60153961 0.50078769 0.34281326 0.6413206  1.16579181
 1.04891684 1.18886541]
[-0.14408727 -0.44325631  0.33400171 -0.23559925  0.52669248 -0.39844605
 -0.15437937 -0.27788168]
[0.38054053 0.04134344 0.03480779 0.01550348 0.03347179 0.30007782
 0.26909839 0.33404568]
[ 0.79211665  0.15149686  5.00515336  0.18535907 10.18958561  0.49055568
  0.74340587  0.62109121]


#### Exercise: Generate two random matrices with dimensions between 5 and 10, e.g. 5x7 vs 8x9, and try to multiply them. Were you able to do the matrix multiplication? Why? (★★☆)

In [134]:
matrix_1 = np.random.random((np.random.randint(5,10), np.random.randint(5,10)))
matrix_2 = np.random.random((np.random.randint(5,10), np.random.randint(5,10)))


# `*` multiplies element by element, so mismatched shapes fail:
#print(matrix_1 * matrix_2) #ValueError: operands could not be broadcast together with shapes (9,7) (6,5)
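
For a true matrix product, the rule is that the number of columns of the left matrix must equal the number of rows of the right one. The sketch below checks that rule with a hypothetical helper, `can_matmul`, which is not part of NumPy.

```python
import numpy as np

def can_matmul(left, right):
    """Return True when the inner dimensions allow left @ right (2D inputs)."""
    return left.shape[1] == right.shape[0]

m1 = np.ones((5, 7))
m2 = np.ones((8, 9))
m3 = np.ones((7, 4))

print(can_matmul(m1, m2))  # 7 columns vs 8 rows
print(can_matmul(m1, m3))  # 7 columns vs 7 rows
print((m1 @ m3).shape)     # result has the outer dimensions
```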

#### Exercise: Given 2 numpy arrays as matrices, output the result of multiplying the 2 matrices (as a numpy array). Were you able to do the matrix multiplication? (★★☆)

$$ a = \left(\begin{matrix}
0 & 1 & 2\\ 
3 & 4 & 5\\ 
6 & 7 & 8
\end{matrix}\right)$$

$$ b = \left(\begin{matrix}
2 & 3 & 4\\ 
5 & 6 & 7\\ 
8 & 9 & 10
\end{matrix}\right)$$




In [135]:
a = np.linspace(0, 8, 9).reshape((3,3))
b = np.linspace(2, 10, 9).reshape((3,3))
print(a)
print(b)
print(a * b) # element-wise product, not the matrix product

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
[[ 2.  3.  4.]
 [ 5.  6.  7.]
 [ 8.  9. 10.]]
[[ 0.  3.  8.]
 [15. 24. 35.]
 [48. 63. 80.]]
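
Note that `*` multiplies entry by entry. For the matrix product of the same `a` and `b`, one option is the `@` operator, which for 2D arrays matches `np.dot`. A minimal sketch, using integer arrays for readability:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)      # 0..8
b = np.arange(2, 11).reshape(3, 3)  # 2..10

elementwise = a * b     # multiplies matching entries
matrix_product = a @ b  # rows of a times columns of b

print(elementwise)
print(matrix_product)
```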


#### Exercise: Multiply a 5x3 matrix by a 3x2 matrix (real matrix product) (★★☆)

In [136]:
matrix_1 = np.random.random((5, 3))
matrix_2 = np.random.random((3, 2))

print(np.dot(matrix_1, matrix_2))

[[0.96937599 1.25642485]
 [0.44064618 0.3261438 ]
 [0.56112449 0.62241882]
 [1.17447833 1.02918752]
 [0.98484107 1.03109583]]


## Data types

Do you think the following proposition is true?
`8==8`

You will surely say yes, which is true mathematically, but computationally two equal values are not always the same, at least in terms of memory. For example, run the following cell:

In [137]:
import sys

# int64
x = np.array(123)
print("int64: " + str(sys.getsizeof(x)))

# int8
x = np.array(123,dtype=np.int8)
print("int8: " + str(sys.getsizeof(x)))

# float32
x = np.array(123,dtype=np.float32)
print("float32: " + str(sys.getsizeof(x)))

int64: 104
int8: 97
float32: 100


It turns out that there are many computational representations of the same number, and you can create arrays with different data types (dtypes) depending on what you need:

- Boolean: `np.bool_`
- Char: `np.byte`
- Short: `np.short`
- Integer: `np.intc`
- Long: `np.int_`
- Float: `np.single` & `np.float32`
- Double: `np.double` & `np.float64`
- `np.int8`: integer (-128 to 127)
- `np.int16`: integer (-32768 to 32767)
- `np.int32`: integer (-2147483648 to 2147483647)
- `np.int64`: integer (-9223372036854775808 to 9223372036854775807)
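
The integer ranges in this list can be read directly from NumPy with `np.iinfo`. The sketch below also shows what can happen when a value does not fit: casting with `astype` does not raise an error, it wraps around.

```python
import numpy as np

for dtype in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    # name, bytes per element, smallest and largest representable value
    print(np.dtype(dtype).name, np.dtype(dtype).itemsize, info.min, info.max)

# 200 is outside the int8 range, so the cast wraps around to 200 - 256.
wrapped = np.array([200]).astype(np.int8)
print(wrapped)
```

This is one reason to pick a dtype deliberately: a smaller dtype saves memory, but only if the data fits in its range.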


Sometimes you will need to load, create or export arrays with different data types.


## Harder exercises

The next exercises relate to real situations you could face while working in data science and machine learning, and we will frequently be talking about matrices and two-dimensional arrays.

#### Exercise: Subtract the mean of each row of a matrix (★★☆)

In [138]:
print("Original matrix:\n")
X = np.random.rand(3, 10)
print(X)
print("\nSubtract the mean of each row of the said matrix:\n")
Y = X - X.mean(axis=1, keepdims=True)
print(Y)
print("\nSubtract the mean of each column of the said matrix:\n")
Z = X - X.mean(axis=0, keepdims=True)
print(Z)

Original matrix:

[[0.97835697 0.71032861 0.28002062 0.79942258 0.94623987 0.88835211
  0.71610242 0.35405523 0.04132503 0.33410359]
 [0.3922052  0.31959137 0.39025686 0.14879659 0.1714645  0.56382451
  0.69586562 0.20816189 0.13658679 0.42129119]
 [0.62248081 0.83951912 0.85810788 0.28784417 0.10636269 0.13556521
  0.4438836  0.30714327 0.73359478 0.59328099]]

Subtract the mean of each row of the said matrix:

[[ 0.37352627  0.10549791 -0.32481008  0.19459188  0.34140917  0.2835214
   0.11127172 -0.25077547 -0.56350567 -0.27072711]
 [ 0.04740075 -0.02521308  0.0454524  -0.19600787 -0.17333995  0.21902006
   0.35106117 -0.13664256 -0.20821766  0.07648674]
 [ 0.12970256  0.34674087  0.36532963 -0.20493408 -0.38641556 -0.35721304
  -0.04889466 -0.18563498  0.24081653  0.10050273]]

Subtract the mean of each column of the said matrix:

[[ 0.31400931  0.08718225 -0.22944116  0.38740147  0.53821752  0.35910483
   0.09748521  0.06426843 -0.2625105  -0.115455  ]
 [-0.27214246 -0.303555   -0.

#### Exercise: How to get the dates of yesterday, today and tomorrow? (★★☆)
Check `np.datetime64`, `np.timedelta64` in numpy (https://numpy.org/doc/stable/reference/arrays.datetime.html)

In [139]:
today = np.datetime64('today', 'D')
yesterday = today - np.timedelta64(1, 'D')
tomorrow = today + np.timedelta64(1, 'D')

print("today", today)
print("yesterday", yesterday)
print("tomorrow", tomorrow)

today 2022-01-20
yesterday 2022-01-19
tomorrow 2022-01-21


#### Exercise: How to get all the dates corresponding to the month of December 2022? (★★☆)
Combine `arange` with `datetime`


In [140]:
all_dates_solution1 = [(np.datetime64('2022-12', 'D') + np.timedelta64(day, 'D')).astype("str") for day in np.arange(31)]
all_dates_solution2 = np.arange('2022-12', '2023-01', dtype='datetime64[D]')
print(all_dates_solution1)
print(all_dates_solution2)


['2022-12-01', '2022-12-02', '2022-12-03', '2022-12-04', '2022-12-05', '2022-12-06', '2022-12-07', '2022-12-08', '2022-12-09', '2022-12-10', '2022-12-11', '2022-12-12', '2022-12-13', '2022-12-14', '2022-12-15', '2022-12-16', '2022-12-17', '2022-12-18', '2022-12-19', '2022-12-20', '2022-12-21', '2022-12-22', '2022-12-23', '2022-12-24', '2022-12-25', '2022-12-26', '2022-12-27', '2022-12-28', '2022-12-29', '2022-12-30', '2022-12-31']
['2022-12-01' '2022-12-02' '2022-12-03' '2022-12-04' '2022-12-05'
 '2022-12-06' '2022-12-07' '2022-12-08' '2022-12-09' '2022-12-10'
 '2022-12-11' '2022-12-12' '2022-12-13' '2022-12-14' '2022-12-15'
 '2022-12-16' '2022-12-17' '2022-12-18' '2022-12-19' '2022-12-20'
 '2022-12-21' '2022-12-22' '2022-12-23' '2022-12-24' '2022-12-25'
 '2022-12-26' '2022-12-27' '2022-12-28' '2022-12-29' '2022-12-30'
 '2022-12-31']


#### Exercise: Extract the integer part of a random array of positive numbers using 2 different methods (★★☆)

In [141]:
arr_random_1 = np.random.rand(10) * 10
print(arr_random_1.astype("int"))
print(np.int64(arr_random_1))


[9 9 9 3 0 0 3 9 6 6]
[9 9 9 3 0 0 3 9 6 6]


#### Exercise: Create a 5x5 matrix with row values ranging from 0 to 4 (★★☆)

In [142]:
m_5x5 = np.tile(np.arange(5), (5, 1)) # repeat the row [0 1 2 3 4] five times
print(m_5x5)

[[0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]
 [0 1 2 3 4]]


#### Exercise: Consider a generator function that generates 10 integers and use it to build an array (★★☆)

In [143]:
def generate():
    for n in range(10):
        yield n
gen_arr_10int = np.fromiter(generate(),dtype=int)

print(gen_arr_10int)

[0 1 2 3 4 5 6 7 8 9]


#### Exercise: Create a vector of size 10 with values ranging from 0 to 1, both excluded (★★☆)

In [144]:
print(np.linspace(0,1,12)[1:-1]) # 12 points minus the two endpoints leaves 10 values

[0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
 0.63636364 0.72727273 0.81818182 0.90909091]


#### Exercise: Create a random vector of size 10 and sort it (★★☆)

In [145]:
print(np.sort(np.random.rand(10)))

[0.02693832 0.09478285 0.28442163 0.30480718 0.33903514 0.59908453
 0.67816034 0.69776666 0.84697679 0.9023805 ]


#### Exercise: Consider two random arrays A and B, check if they are equal (★★☆)

In [146]:
A = np.random.rand(2,10)
B = np.random.rand(2,10)
print(np.allclose(A, B))
print(np.isclose(A, B))
##print(np.all(A, B)) # TypeError: the second argument of np.all is an axis, not an array
##print(np.any(A, B)) # TypeError: same reason; use np.all(A == B) instead
print(np.equal(A, B))
print(A == B)


False
[[False False False False False False False False False False]
 [False False False False False False False False False False]]
[[False False False False False False False False False False]
 [False False False False False False False False False False]]
[[False False False False False False False False False False]
 [False False False False False False False False False False]]
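
When a single True/False answer is enough, `np.array_equal` and `np.allclose` are usually the most convenient. The sketch below contrasts exact equality with equality within a tolerance; the small arrays and the `1e-12` perturbation are made up for illustration.

```python
import numpy as np

A = np.array([[0.1, 0.2], [0.3, 0.4]])
B = A.copy()
C = A + 1e-12  # tiny perturbation, like a rounding error

print(np.array_equal(A, B))  # exact comparison, same shape and same values
print(np.array_equal(A, C))  # the tiny difference makes them unequal
print(np.allclose(A, C))     # equal within the default tolerance
```

For floating-point results, `np.allclose` is often the safer check, because two computations that should agree can differ in the last digits.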


#### Exercise: Consider a random 10x2 matrix representing cartesian coordinates, convert them to polar coordinates (★★★)
Suggestion: check how to calculate the "square of a matrix"

In [147]:
cartesian_coordinates = np.random.rand(10,2)
print(cartesian_coordinates)
# r = sqrt(x^2 + y^2) and theta = arctan2(y, x), which also handles x = 0 and all quadrants
x, y = cartesian_coordinates[:,0], cartesian_coordinates[:,1]
polar_coordinates = np.column_stack((np.hypot(x, y), np.arctan2(y, x))) # one (r, theta) pair per row
print(polar_coordinates)


[[0.00876709 0.87568928]
 [0.7671717  0.34402272]
 [0.72531661 0.3804504 ]
 [0.55997912 0.31693355]
 [0.20684382 0.53932858]
 [0.03455476 0.19600555]
 [0.41567398 0.41057984]
 [0.71456698 0.47783102]
 [0.50645578 0.01771741]
 [0.1761605  0.77783416]]
[[0.87573317 1.56078502]
 [0.84077586 0.42154745]
 [0.81904011 0.48307857]
 [0.64344657 0.51502445]
 [0.57763283 1.20457622]
 [0.19902815 1.39629462]
 [0.58426079 0.7792329 ]
 [0.85960948 0.58940905]
 [0.50676559 0.03496887]
 [0.79753276 1.34807779]]


#### Exercise: Create random vector of size 10 and replace the maximum value by 0 (★★☆)

In [148]:
rand_vect = np.random.randint(1,20,size=10)
print(rand_vect)
print(rand_vect.max())
rand_vect[rand_vect == rand_vect.max()]=0
print(rand_vect)

[12  8 19  4 10 15  1  6  5 11]
19
[12  8  0  4 10 15  1  6  5 11]


#### Exercise: How to print all the values of an array? (★★☆)

In [149]:
rand_vect = np.random.rand(12,20)*20 + 3

with np.printoptions(threshold=np.inf):
    print(rand_vect)
print(rand_vect.tolist())

[[11.27166147  7.40252353  3.20513442  3.55456502 12.81399402 13.33661125
  11.33455752 20.59021434 20.17638929 20.38986815 16.16432033 12.34901572
  14.8791023  22.3918881   5.21154672 20.76971407 10.2256909   7.92965157
  17.84681487  3.30254641]
 [10.77860431 17.80409861  8.71064626  5.46175406 21.19002388  4.13045399
   6.79870402  6.00854728  8.80830817 18.32453434 17.39620945 13.71247483
  15.8908031  18.21499207 17.17962249 20.19378986 11.71764751 21.57682867
  11.06915984  6.77663358]
 [20.93628331 10.75209713 19.51739505 16.95013359 14.12173277  7.51278112
   9.35637403  8.50983697 20.13914152 21.00561253  4.4770449  21.35289335
   7.80006838  5.3402917  19.6910711   9.25964977 12.08545846 20.31145389
  21.33467883  8.40942046]
 [14.56120477 13.2524418  14.6903374  19.72460334  9.82992021 11.24932527
  17.04934149 19.88166195 17.52684146  7.54965848  9.63774408 17.5098669
   4.94150495  5.98401461 17.16219029 16.79645087 22.55141529  5.49365174
  21.75258495 11.45082343]
 [21.

#### Exercise: How to convert a float (32 bits) array into an integer (32 bits) in place?
Check https://stackoverflow.com/a/4396247/5989906

In [150]:
rand_vect = np.random.rand(12,20).astype(np.float32) *20 + 3
y = rand_vect.view('int32')
y[:] = rand_vect
print(y)

[[22 11  7  3  9 18 19  5 22  3  8  7  7 20  7 10  7  3 13 10]
 [17 20  9 22 14 12 11  4 13 14 17  4 15  6  7  4 17  4 18 11]
 [12  8 12 14 10 15 17  3 20  8 21 19  4 21 14 22 19 22 21 18]
 [ 9  4 14 20  6  4  5 11 17  9  4 13 20  9 13  9  5 21 17  8]
 [13 19 15 13 20 18 16 14 13  9 14 15  5 20 19  5  6 19 13  5]
 [13 22  6 20 18 13  4 16 14  7  8 17 13 18  4 10  8  3 14 21]
 [21 18 10  3  6 15 21 22 21 13 15 22 20 16 11 11  9  4  5 21]
 [ 8 13 19 13 21 16 13 15 22 11  7  5 20 16 13 16  7 17 10  4]
 [11  7 14 13 19 19 18 11 20 16 16 15 19  9 10 21  4 20 19  3]
 [ 5 15  7 12  7  4 21 17 19  7 12 14 13 10  7  3 11 13  4 15]
 [19  5 12 18 11 17  8 18 10 18 18 20 20  9 12 21  7  3  5  9]
 [19  7 19  5  5 15 21 17  8 15  5  8 13 21 20 22  7 12 15  6]]


#### Exercise: Subtract the mean of each row of a matrix (★★☆)

In [151]:
mtx_a = np.random.randint(1,20,size=(3,3))
print(mtx_a)
print(mtx_a.mean(axis=1))
print(mtx_a - mtx_a.mean(axis=1).reshape(-1,1))
print(mtx_a - mtx_a.mean(axis=1, keepdims=True))

[[13 17 17]
 [17  1 18]
 [19  5  4]]
[15.66666667 12.          9.33333333]
[[ -2.66666667   1.33333333   1.33333333]
 [  5.         -11.           6.        ]
 [  9.66666667  -4.33333333  -5.33333333]]
[[ -2.66666667   1.33333333   1.33333333]
 [  5.         -11.           6.        ]
 [  9.66666667  -4.33333333  -5.33333333]]
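
Both versions above work because the row means end up with shape `(3, 1)`, which broadcasts against the `(3, 3)` matrix. The following sketch shows the shapes involved on a non-square matrix, where forgetting `keepdims=True` would raise an error.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(3, 4)

row_means_flat = X.mean(axis=1)                 # shape (3,)
row_means_kept = X.mean(axis=1, keepdims=True)  # shape (3, 1)

# (3, 4) minus (3, 1) broadcasts row by row; (3, 4) minus (3,) would not.
centered = X - row_means_kept

print(row_means_flat.shape, row_means_kept.shape)
print(centered.mean(axis=1))  # every row now has mean 0
```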


#### Exercise: How to sort an array by the nth column? (★★☆)

In [152]:
mtx_a = np.random.randint(1,20,size=(3,5))
print(mtx_a)
print()
#mtx_a = mtx_a.tolist()
cols = [1]
print(mtx_a[np.lexsort(mtx_a.T[cols])])
print()
print(mtx_a[ mtx_a[:,1].argsort()])
print(mtx_a[ mtx_a[:,1].argsort(),:])
mtx_a.view('i8,i8,i8,i8,i8').sort(order=['f1'], axis=0)
print(mtx_a)


[[17 15 10 14 17]
 [13 14 19  6  2]
 [11 17 17 13 11]]

[[13 14 19  6  2]
 [17 15 10 14 17]
 [11 17 17 13 11]]

[[13 14 19  6  2]
 [17 15 10 14 17]
 [11 17 17 13 11]]
[[13 14 19  6  2]
 [17 15 10 14 17]
 [11 17 17 13 11]]
[[13 14 19  6  2]
 [17 15 10 14 17]
 [11 17 17 13 11]]


#### Exercise: Find the position of the minimum of a 2D matrix (★★☆)

In [153]:
mtx_a = np.random.randint(1,20,size=(3,5))
print(mtx_a.min())
print(mtx_a == mtx_a.min())
print(np.where(mtx_a == mtx_a.min()))

1
[[False  True False False False]
 [False False False False False]
 [False False False False False]]
(array([0]), array([1]))
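
Another possible approach combines `np.argmin` with `np.unravel_index`. It is sketched here on a small fixed matrix (a made-up example) so the answer is easy to verify by eye.

```python
import numpy as np

mtx = np.array([[7, 3, 9],
                [4, 1, 8]])

flat_index = np.argmin(mtx)                         # position in the flattened array
row, col = np.unravel_index(flat_index, mtx.shape)  # convert to (row, column)

print(flat_index)
print(row, col)
```

Unlike the `np.where` version, `np.argmin` reports only the first occurrence when the minimum appears more than once.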


#### Exercise: Read an image using openCV, check its dimensions, normalize the numbers and show the image (★★★)
Check: https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/

In [154]:
import cv2
path = "/workspace/ml_pre_work/02-numpy/assets/1D.png"
img = cv2.imread(path)
print(img.size)
print(img.shape)
#img = (img - img.mean()) / img.std()
norm_img = np.zeros((img.shape[0],img.shape[1]))
final_img = cv2.normalize(img,  norm_img, 0, 255, cv2.NORM_MINMAX)
#cv2.imshow('image', img)
cv2.imwrite('/workspace/ml_pre_work/02-numpy/assets/normalized_image.jpg', final_img)

825300
(393, 700, 3)


True

#### Exercise: Considering a four-dimensional array, how to get the sum over the last two axes at once? (★★★)

In [155]:
A = np.random.randint(0,10,(3,2,3,4))
print(A.shape)
#print(A)
total = A.reshape(A.shape[:-2] + (-1,)).sum(axis=-1) # flatten the last two axes, then sum
print(total)
total = A.sum(axis=(-2,-1)) # sum over both axes directly
print(total)

(3, 2, 3, 4)
[[66 55]
 [65 63]
 [66 49]]
[[66 55]
 [65 63]
 [66 49]]


#### Exercise: How to get the diagonal of a dot product? (★★★)

In [156]:
A = np.random.randint(0,10,(3,2))
B = np.random.randint(0,10,(2,3))
c = np.dot(A,B)
print(c)
print(c.diagonal())

[[ 65  83  94]
 [ 81 105 117]
 [ 51  51  75]]
[ 65 105  75]
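
The solution above computes the whole product and then keeps only its diagonal. When only the diagonal is needed, the off-diagonal entries can be skipped. Here is a sketch of two alternatives, assuming `A` has shape `(n, k)` and `B` has shape `(k, n)`; the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
A = rng.integers(0, 10, (3, 2))
B = rng.integers(0, 10, (2, 3))

diag_full = np.diag(A @ B)                 # builds the full 3x3 product first
diag_sum = np.sum(A * B.T, axis=1)         # row i of A times column i of B
diag_einsum = np.einsum("ij,ji->i", A, B)  # the same sum written with einsum

print(diag_full)
print(diag_sum)
print(diag_einsum)
```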


#### Exercise: Consider an array of dimension (5,5,3), how to multiply it by an array with dimensions (5,5)? (★★★)

In [157]:
A = np.random.randint(0,10,(5,5,3))
B = np.random.randint(0,10,(5,5))
C = (A.T * B.T).T
print(C)
D = A * B[:,:,None]
print(D)
print(D == C)

[[[ 0  0  0]
  [21 12 21]
  [54  6  6]
  [ 6 24  0]
  [20  8  8]]

 [[30 24 54]
  [20 32 24]
  [27  3  3]
  [12  4 10]
  [ 1  9  8]]

 [[ 0 42  0]
  [ 4 16  8]
  [28 28 42]
  [40 24 40]
  [28 12 24]]

 [[ 8  0 10]
  [ 0 63  7]
  [ 0  0  0]
  [ 2  8 18]
  [10 40 35]]

 [[24 64 56]
  [ 0 36 12]
  [16 48  8]
  [49 56 49]
  [30 30 48]]]
[[[ 0  0  0]
  [21 12 21]
  [54  6  6]
  [ 6 24  0]
  [20  8  8]]

 [[30 24 54]
  [20 32 24]
  [27  3  3]
  [12  4 10]
  [ 1  9  8]]

 [[ 0 42  0]
  [ 4 16  8]
  [28 28 42]
  [40 24 40]
  [28 12 24]]

 [[ 8  0 10]
  [ 0 63  7]
  [ 0  0  0]
  [ 2  8 18]
  [10 40 35]]

 [[24 64 56]
  [ 0 36 12]
  [16 48  8]
  [49 56 49]
  [30 30 48]]]
[[[ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]]

 [[ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]]

 [[ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  True]
  [ True  True  Tr

#### Exercise: How to swap two rows of an array? (★★★)

In [158]:
A = np.random.randint(0,10,(5,5))
print(A)
A[-2:] = np.flip(A[-2:],axis=0)
print(A)

[[2 1 0 4 0]
 [0 6 5 0 1]
 [1 3 0 8 5]
 [1 2 9 1 3]
 [1 0 0 8 3]]
[[2 1 0 4 0]
 [0 6 5 0 1]
 [1 3 0 8 5]
 [1 0 0 8 3]
 [1 2 9 1 3]]
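
The `np.flip` approach works for two adjacent rows. To swap any two rows, one option is indexing with a list on both sides of the assignment. A minimal sketch, swapping rows 0 and 3 of a made-up matrix:

```python
import numpy as np

A = np.arange(25).reshape(5, 5)
original = A.copy()

# Indexing with a list makes a copy on the right-hand side,
# so both rows are read before either one is overwritten.
A[[0, 3]] = A[[3, 0]]

print(A)
```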


#### Exercise: Read an image using openCV and transpose it. What did you get exactly? Was the image rotated? Moved? Reflected with respect to an axis? (★★★)

In [159]:
path = "/workspace/ml_pre_work/02-numpy/assets/1D.png"
img = cv2.imread(path)
print(img.size)
print(img.shape)

final_img = cv2.transpose(img)
print(final_img.size)
print(final_img.shape)

cv2.imwrite('/workspace/ml_pre_work/02-numpy/assets/transposed_image.jpg', final_img)
# Transposing swaps rows and columns: the image is reflected across its main diagonal (a 90-degree rotation combined with a flip)

825300
(393, 700, 3)
825300
(700, 393, 3)


True

#### Exercise: Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]? (★★★)

In [160]:
from numpy.lib import stride_tricks

Z = np.arange(1,15,dtype=np.uint32)
R = stride_tricks.as_strided(Z,(11,4),(4,4))
print(R)

[[ 1  2  3  4]
 [ 2  3  4  5]
 [ 3  4  5  6]
 [ 4  5  6  7]
 [ 5  6  7  8]
 [ 6  7  8  9]
 [ 7  8  9 10]
 [ 8  9 10 11]
 [ 9 10 11 12]
 [10 11 12 13]
 [11 12 13 14]]
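
`as_strided` needs the strides in bytes, which is easy to get wrong. NumPy 1.20 and later also provide `sliding_window_view`, which builds the same kind of windows from the window length alone. A sketch, assuming a recent enough NumPy:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Z = np.arange(1, 15)

# Every run of 4 consecutive values becomes one row.
R = sliding_window_view(Z, window_shape=4)

print(R.shape)
print(R[0], R[-1])
```

The result is a read-only view of `Z`, so no data is copied.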


#### Exercise: How to find the most frequent value in an array? (★★★)

In [161]:
vals, counts = np.unique(R, return_counts=True)
mode_value = np.argwhere(counts == np.max(counts))
print("list of modes: ", vals[mode_value].flatten().tolist())
print("mode frequency: ", np.max(counts))

list of modes:  [4, 5, 6, 7, 8, 9, 10, 11]
mode frequency:  4


#### Exercise: How to get the n largest values of an array (★★★)

In [162]:
N = np.random.randint(0,200,(2,2,3,2))
print(N)
n = 4
ind = np.unravel_index(np.argsort(N, axis=None), N.shape) #for N-dimensional array
print(np.unique(N[ind])[-n:])

[[[[ 66  46]
   [156  47]
   [ 25 177]]

  [[143 148]
   [ 94 180]
   [  4  66]]]


 [[[185  82]
   [171  41]
   [183 127]]

  [[ 10  28]
   [  8  62]
   [ 95 148]]]]
[177 180 183 185]
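
Sorting the whole array is not strictly necessary to find the n largest values. A possible alternative is `np.partition`, sketched below on a small made-up array.

```python
import numpy as np

values = np.array([66, 46, 156, 47, 25, 177, 143, 148])
n = 3

# np.partition moves the n largest values to the last n positions (in no particular order);
# np.sort then orders just those n values.
largest = np.sort(np.partition(values.ravel(), -n)[-n:])

print(largest)
```

Unlike the `np.unique` version, this keeps repeated values, so a maximum that appears twice is counted twice.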


#### Exercise: Consider a large vector Z, compute Z to the power of 3 using 3 different methods (★★★)

In [182]:
A_vect = np.linspace(1,200,2000)
pow_vect = np.full(2000, 3.0) # exponent 3 for every element
A_vect_1 = A_vect**3
A_vect_2 = np.power(A_vect,3)
A_vect_3 = np.float_power(A_vect, pow_vect)
#print(A_vect_3)
print(np.sum(A_vect_1 == A_vect_2))
print(np.sum(A_vect_1 == A_vect_3))
print(np.sum(A_vect_2 == A_vect_3))

2000
2000
2000


#### Exercise: Given a two dimensional array, how to extract unique rows? (★★★)

In [194]:
Z = np.random.randint(0,2,(16,4))
Z_unique = np.unique(Z, axis=0)
print(Z_unique)

[[0 0 0 0]
 [0 0 1 1]
 [0 1 0 0]
 [0 1 0 1]
 [0 1 1 1]
 [1 0 0 0]
 [1 0 0 1]
 [1 0 1 0]
 [1 0 1 1]
 [1 1 0 0]]


#### Exercise: Can you have an array of strings? Can you mix different data types in the same array? Can you operate (add, sub, mult) on arrays with different data types? (★★★)

In [205]:
ARR = np.array([["rtef",("fag",34.3),"fag"], ["rtef",45,"fag"], ["rt","fag",23], ["ref","fag","fa"]], dtype=object) # the nested tuple makes the input ragged, so ask for dtype=object explicitly
print(ARR)
print(ARR.dtype)
ARR = np.array([[27.8,"fag","fag"], ["rtef",45,"fag"], ["rt","fag",23], ["ref",True,"fa"]])
print(ARR)
print(ARR.dtype)

[['rtef' ('fag', 34.3) 'fag']
 ['rtef' 45 'fag']
 ['rt' 'fag' 23]
 ['ref' 'fag' 'fa']]
object
[['27.8' 'fag' 'fag']
 ['rtef' '45' 'fag']
 ['rt' 'fag' '23']
 ['ref' 'True' 'fa']]
<U32
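
The cell above covers the first two questions. For the third one, here is a short sketch of what happens when arrays with different dtypes are combined: numeric types are promoted to a common type, while adding numbers to strings is rejected. The array names are made up for the example.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([0.5, 1.5, 2.5], dtype=np.float64)
words = np.array(["a", "b", "c"])

mixed_sum = ints + floats
print(mixed_sum.dtype)  # int32 + float64 is promoted to float64

raised = False
try:
    ints + words
except TypeError as err:  # numeric and string arrays cannot be added
    raised = True
    print("TypeError:", err)
```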


