### Numpy

In this notebook we are going to learn how to work with and manipulate `ndarrays` using numpy.

<p align="center"><img width="200" src="numpy.png" alt="img"/></p>

We are going to learn the following:

* array creation ✅
* input and output ✅
* array datatypes and shapes ✅
* array manipulations  ✅
* binary operations
* string operations
* date time functions
* linear algebra
* statistics
* sorting, searching and counting


### Array Creation

* In this section we are going to learn how to create and initialize arrays in numpy. First, we need to import the `numpy` package with the alias `np`.

In [1]:
import numpy as np

Let's check the version of `numpy` that we are using.

In [2]:
np.__version__

'1.24.3'

We can create an array using the `empty` function, which accepts the shape of the array to be created. Note that `empty` does not initialize the entries, so the array contains whatever values happen to be in memory.

In [3]:
array = np.empty(shape=(2, 4))
array

array([[0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
        0.00000000e+000],
       [0.00000000e+000, 6.77858066e-321, 1.55542291e-311,
        6.95293141e-310]])
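Because the contents of an `empty` array are arbitrary, the usual pattern is to allocate it and then overwrite every element before reading from it. The following is a minimal sketch of that pattern; the variable name `buffer` and the fill value `0.5` are only illustrative.

```python
import numpy as np

# np.empty only allocates memory; the initial contents are arbitrary.
buffer = np.empty(shape=(2, 4))

# Overwrite every element before reading from the array.
buffer[:] = 0.5

print(buffer.shape)  # (2, 4)
print(buffer.dtype)  # float64 (the default dtype)
```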

We can also create an `empty` array with the shape of the existing array using the `empty_like` function as follows:

In [4]:
arr1 = np.array([[2, 4], [4, 7]])
array = np.empty_like(arr1)
array

array([[  981201803, -1501991202],
       [ 1268211782,  1562989479]])

We can create an array of `0s` and `1s` using the `eye` function.

> The `eye` function returns a 2-D array with `ones` on the diagonal and `zeros` elsewhere.


In [5]:
array = np.eye(4)
array

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

One use case of the `eye` function is to create `one_hot_encoding` vectors from the active index of a class. Let's implement a function that creates one-hot vectors for our classes (red, green, blue).

In [6]:
classes = "red, green, blue".split(', ')
def one_hot(index: int):
    return np.eye(len(classes))[index]

# blue
one_hot(2)

array([0., 0., 1.])
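The same indexing trick works for several class indices at once, because indexing the identity matrix with a list of integers selects one row per index. The sketch below assumes a hypothetical helper name `one_hot_batch`; it is not part of numpy.

```python
import numpy as np

classes = ["red", "green", "blue"]

def one_hot_batch(indices):
    """Return one row per index, each row a one-hot vector (hypothetical helper)."""
    # Indexing the identity matrix with an integer array picks whole rows.
    return np.eye(len(classes))[np.asarray(indices)]

# blue, red, green, blue
encoded = one_hot_batch([2, 0, 1, 2])
print(encoded.shape)  # (4, 3)
```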

We can also create arrays with `1s` on the main diagonal using the `identity` function.

In [7]:
array = np.identity(4)
array

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

We can create a `one_hot` encoding function using the numpy `identity` function as follows:

In [8]:
def one_hot(index: int):
    return np.identity(len(classes))[index]
# red
one_hot(0)

array([1., 0., 0.])

We can create arrays of `1s` using the `ones` or `ones_like` functions in numpy as follows:

In [9]:
ones = np.ones(shape=(2,3))
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [10]:
ones = np.ones_like(array)
ones

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

We can also create arrays of `zeros` using the `zeros` and `zeros_like` functions in numpy as follows:

In [11]:
zeros = np.zeros(shape=(2, 3))
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [12]:
zeros = np.zeros_like(ones)
zeros

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

The `zeros` and `ones` functions only create arrays filled with `0s` and `1s`. Let's say we want to create an array filled with `4s`; how can we do that?

> With the help of `full` and `full_like` we can create an array filled with a specified value.

In [13]:
fours = np.full(shape=(2, 3), fill_value=4 )
fours

array([[4, 4, 4],
       [4, 4, 4]])

In [14]:
fours = np.full_like(zeros, fill_value=4 )
fours

array([[4., 4., 4., 4.],
       [4., 4., 4., 4.],
       [4., 4., 4., 4.],
       [4., 4., 4., 4.]])

We are not limited to numbers when creating an array with the `full` function; the fill value can be of any type. For example, in the following code cell we create a `2x3` array filled with the boolean value `True`.

In [15]:
np.full(shape=(2, 3), fill_value=True )

array([[ True,  True,  True],
       [ True,  True,  True]])

We can create arrays using the `array` function. This function is very flexible, as it allows us to create arrays from our own values without specifying the shape of the array.

> Note that all values in an array share a single datatype. If the inputs have mixed types, numpy converts them to a common type (for example, integers become floats when mixed with a float).

In [16]:
array = np.array([2, 3, 4, 5])
array

array([2, 3, 4, 5])

In [17]:
array = np.array([[2, 3, 4], [5, 6, 7.]])
array

array([[2., 3., 4.],
       [5., 6., 7.]])

In [18]:
array = np.array([True, False, True, True, False])
array

array([ True, False,  True,  True, False])
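To see how `np.array` settles on a single datatype for mixed inputs, we can inspect the resulting `dtype`. This is a small sketch with illustrative variable names; the integer width depends on the platform, so only the `kind` of the dtype is printed for that case.

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])      # ints are promoted to float
bools_and_ints = np.array([True, 2, 3])      # bools are promoted to int
numbers_and_text = np.array([1, 2.5, "a"])   # everything becomes a string

print(ints_and_floats.dtype)        # float64
print(bools_and_ints.dtype.kind)    # 'i' (a signed integer type)
print(numbers_and_text.dtype.kind)  # 'U' (a Unicode string type)
```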

We can create an array from a file. Suppose we have a file with the following contents:
    
```
12 15 -17 8 7 89 77
13 14 -17 8 7 89 77
```
If the file is named `numbers.txt`, we can load these numbers into a numpy array using the `fromfile` function as follows. The result is a flat 1-D array; the line structure of the file is not preserved.

In [19]:
array = np.fromfile('numbers.txt', sep=' ')
array

array([ 12.,  15., -17.,   8.,   7.,  89.,  77.,  13.,  14., -17.,   8.,
         7.,  89.,  77.])

The good thing with numpy is that we can also generate arrays from iterables using the `fromiter` function. The following example generates an array from an iterable object.

In [20]:
numbers = iter(range(0, 10 , 2))
array = np.fromiter(numbers, dtype=np.int32)
array

array([0, 2, 4, 6, 8])

In [21]:
numbers = (x for x in range(0, 11, 2))
array = np.fromiter(numbers, dtype=np.int32)
array

array([ 0,  2,  4,  6,  8, 10])

We can generate arrays from strings using the `fromstring` function. In the following code cell we generate an array of numbers from a string:

In [22]:
numbers = '12 15 -17 8 7 89 77'
array = np.fromstring(numbers, sep=' ')
array

array([ 12.,  15., -17.,   8.,   7.,  89.,  77.])

In numpy we can generate an array of evenly spaced numbers using the `arange` function. We can specify the start, stop and step; only `stop` is required, with `start` defaulting to `0` and `step` to `1`.

In [23]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [24]:
a = np.arange(start=0, stop=-10, step=-2)
a

array([ 0, -2, -4, -6, -8])
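`arange` is driven by a step size and excludes the stop value. When you would rather fix the number of points and include the endpoint, `np.linspace` is the usual companion. A short sketch comparing the two (variable names are illustrative):

```python
import numpy as np

# arange: start=2, stop=11 (excluded), step=2.
evens = np.arange(2, 11, 2)

# linspace: 5 evenly spaced points, endpoint included by default.
grid = np.linspace(0.0, 1.0, num=5)

print(evens)  # [ 2  4  6  8 10]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```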

### Input and Output

In this section we are going to learn how to save numpy arrays to files and load them back. The first function that we are going to learn is `save`.

The `np.save()` function allows us to save a numpy array as a binary file with the extension `.npy`. In the following example we create a 3-D array and save it as a binary file.



In [25]:
filename = 'myarray.npy'
array = np.random.rand(4, 4, 7)

In [26]:
np.save(filename, array)
print("Saved")

Saved


In [27]:
array[0][0]

array([0.08255779, 0.73669617, 0.45539636, 0.3304466 , 0.13448963,
       0.05222079, 0.68525543])

Now that we have saved our array in a binary file, we can load this file as an array using the `load` method.

In [28]:
my_array = np.load(filename)

In [29]:
my_array[0][0]

array([0.08255779, 0.73669617, 0.45539636, 0.3304466 , 0.13448963,
       0.05222079, 0.68525543])
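`np.save` stores a single array per file. When several arrays belong together, `np.savez` writes them into one `.npz` archive and `np.load` gives them back by name. The sketch below uses a temporary directory and hypothetical names (`bundle.npz`, `weights`, `labels`) so that it does not touch the files created above.

```python
import os
import tempfile

import numpy as np

weights = np.arange(6, dtype=np.float32).reshape(2, 3)
labels = np.array([0, 1, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bundle.npz")
    # Keyword names become the keys used to look the arrays up again.
    np.savez(path, weights=weights, labels=labels)

    # The loaded archive behaves like a dictionary of arrays.
    with np.load(path) as bundle:
        restored_weights = bundle["weights"]
        restored_labels = bundle["labels"]

print(np.array_equal(weights, restored_weights))  # True
```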

We can also save our numpy arrays as `text` files using the `savetxt` function. For example, let's save our array to a text file. **But `savetxt` only saves `1D` and `2D` arrays, whilst `np.save()` allows us to save arrays of any dimension, as they are stored in a binary file.**

In [30]:
array = np.random.rand(100, 100)

np.savetxt('myarray.txt', array)
print("Saved!!")

Saved!!


In [31]:
array[0][:3]

array([0.07184347, 0.5274782 , 0.34224964])

With the `savetxt()` function, if the filename ends with the extension `.gz`, the file is automatically saved in compressed `gzip` format. Let's try to save our array in `.gz` format.

In [32]:
np.savetxt('myarray.gz', array)
print("Saved!!")

Saved!!


We can load these arrays from a `.txt` file using the `loadtxt` method as follows:

In [33]:
a = np.loadtxt('myarray.txt')
a[0][:3]

array([0.07184347, 0.5274782 , 0.34224964])

We can also load the `.gz` format file in an array using the `loadtxt()` method as it understands `gzipped` files transparently.

In [34]:
a = np.loadtxt('myarray.gz')
a[0][:3]

array([0.07184347, 0.5274782 , 0.34224964])

### Array Datatypes and Shapes

The first thing that we need to understand is that numpy arrays have a shape and a datatype. In this section we are going to create different arrays with different shapes and datatypes. Let's create some arrays holding different kinds of values.


In [35]:
array1 = np.array([1, 2, 5, -8, 7, 9])
array2 = np.array([1, 2, 5, -8, 7, 9.])
array3 = np.array([True, False, True])
array4 = np.array(["True", "False", "True"])
array5 = np.array(['c', 'a', 'r'])
array6 = np.array([4j, 3j])


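We can check the datatype that is used in an array using the `.dtype` attribute. Note that the default integer type is platform dependent, so `array1.dtype` may be `int64` rather than `int32` on your machine.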

In [36]:
array1.dtype

dtype('int32')

In [37]:
array2.dtype

dtype('float64')

In [38]:
array3.dtype

dtype('bool')

In [39]:
array4.dtype

dtype('<U5')

In [40]:
array5.dtype

dtype('<U1')

In [41]:
array6.dtype

dtype('complex128')

Casting datatypes in numpy can be done using the `astype` method. Types for numbers are as follows:

1. floats

* `float16`
* `float32`
* `float64`

2. integers
* `int8`
* `int16`
* `int32`
* `int64`
* `uint8`
* `uint16`
* `uint32`
* `uint64`

> Datatypes can be passed as a string (`'float32'`) or as a numpy type (`np.float32`).

In [42]:
a = array1.astype('float32') 
a.dtype, a

(dtype('float32'), array([ 1.,  2.,  5., -8.,  7.,  9.], dtype=float32))

In [43]:
a = array1.astype(np.bool_)
a

array([ True,  True,  True,  True,  True,  True])

In [44]:
a = array1.astype(np.float16)
a

array([ 1.,  2.,  5., -8.,  7.,  9.], dtype=float16)

You can specify the datatype while creating an array.

In [45]:
b = np.arange(10, dtype=np.float16)
b, b.dtype

(array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float16),
 dtype('float16'))

You can change the datatype using the array method `astype()` after creating an array as follows:

In [46]:
b = np.arange(10).astype('float16')
b, b.dtype

(array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float16),
 dtype('float16'))
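Casting can lose information, so it helps to know what `astype` does with values that do not fit the target type. The following sketch shows that converting floats to integers drops the fractional part, and uses `np.can_cast` to ask whether a conversion is considered safe. The variable names are illustrative.

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5])

# Float -> int conversion drops the fractional part (rounds toward zero).
truncated = values.astype(np.int32)
print(truncated)  # [ 1 -1  2]

# astype returns a new array; the original keeps its dtype.
print(values.dtype)  # float64

# np.can_cast reports whether a conversion is "safe" (no loss of information).
print(np.can_cast(np.int32, np.float64))  # True
print(np.can_cast(np.float64, np.int32))  # False
```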

In this section of the notebook we are going to have a look at working with array shapes in numpy.

In [47]:
array = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8.]
])
array

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

We can check the shape of an array by calling the `.shape` property on an array as follows:

In [48]:
array.shape

(2, 4)

In [49]:
b.shape

(10,)

We can check the total number of elements in an array by calling the `.size` property on an array as follows:

In [50]:
array.size

8

In [51]:
b.size

10
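Alongside `.shape` and `.size`, a few related attributes describe an array's layout: `.ndim` (number of axes), `.itemsize` (bytes per element) and `.nbytes` (total bytes). A quick sketch on an illustrative `2x4` array of `float64`:

```python
import numpy as np

grid = np.zeros((2, 4), dtype=np.float64)

print(grid.ndim)      # 2  (number of axes)
print(grid.size)      # 8  (product of the shape)
print(grid.itemsize)  # 8  (bytes per float64 element)
print(grid.nbytes)    # 64 (size * itemsize)
```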

### Array Manipulation

In this section we are going to have a look at how we can manipulate arrays in numpy. We are going to have a look at:

1. reshaping arrays

* reshaping an array means that we are changing the shape of an array from one shape to another. To reshape an array we use the method called `reshape()`
    

In [52]:
b.reshape(5, 2)

array([[0., 1.],
       [2., 3.],
       [4., 5.],
       [6., 7.],
       [8., 9.]], dtype=float16)

> Using `-1` for a dimension tells numpy to infer the size of that dimension from the total number of elements and the other dimensions. Only one dimension can be `-1`. Example:

In [53]:
b.reshape(2, -1)

array([[0., 1., 2., 3., 4.],
       [5., 6., 7., 8., 9.]], dtype=float16)

In [54]:
np.arange(100).reshape(2, 5, 10)

array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]])
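A reshape only succeeds when the new shape holds exactly the same number of elements. The sketch below, using an illustrative 12-element array, shows `-1` being inferred and the `ValueError` raised when no size fits.

```python
import numpy as np

values = np.arange(12)

# -1 is inferred: 12 elements / 3 rows = 4 columns.
print(values.reshape(3, -1).shape)     # (3, 4)
print(values.reshape(-1, 2, 3).shape)  # (2, 2, 3)

# 12 is not divisible by 5, so no shape fits and numpy raises ValueError.
try:
    values.reshape(5, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)  # True
```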

The `transpose()` method is used to reverse or permute the dimensions of an array. Let's create a new array `c`.

In [55]:
c = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8.]
])
c

array([[1., 2., 3., 4.],
       [5., 6., 7., 8.]])

In [56]:
c.shape

(2, 4)

There are two ways of transposing an array in numpy: using the property `T` on an array, or using the `np.transpose()` function.

In [57]:
t = c.T
t

array([[1., 5.],
       [2., 6.],
       [3., 7.],
       [4., 8.]])

In [58]:
t.shape

(4, 2)

In [59]:
t = np.transpose(c)
t.shape

(4, 2)

In array manipulation we have a method called `flatten`, which returns a copy of the array collapsed into one dimension.

In [60]:
t.flatten()

array([1., 5., 2., 6., 3., 7., 4., 8.])

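You can specify the order in which the elements are read when the array is flattened; by default it's `C` (row-major). `A` uses column-major order if the array is Fortran-contiguous in memory (as the transposed array `t` is) and row-major order otherwise.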

In [61]:
t.flatten(order='A') # order = {‘C’, ‘F’, ‘A’, ‘K’}

array([1., 2., 3., 4., 5., 6., 7., 8.])
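A closely related method is `ravel`. Both produce a 1-D result, but `flatten` always returns a copy while `ravel` returns a view when the memory layout allows it. The sketch below, on an illustrative array, uses `np.shares_memory` to show the difference.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

flat_copy = matrix.flatten()  # always a new array
flat_view = matrix.ravel()    # a view when the memory layout allows it

print(np.shares_memory(matrix, flat_copy))  # False
print(np.shares_memory(matrix, flat_view))  # True

# Writing through the view changes the original; the copy is unaffected.
flat_view[0] = 99
print(matrix[0, 0])  # 99
print(flat_copy[0])  # 0
```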

The `squeeze()` method removes axes of length one from an array.

In [62]:
x = np.array([[[0], [1], [2]]])
print(x.shape)

np.squeeze(x).shape

(1, 3, 1)


(3,)

In [63]:
b.squeeze()

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float16)

We have a function called `expand_dims` that inserts a new axis of length one at the given `axis` position in the expanded array shape.

In [64]:
b

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float16)

In [65]:
np.expand_dims(b, axis=0)

array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]], dtype=float16)

In [66]:
np.expand_dims(b, axis=1)

array([[0.],
       [1.],
       [2.],
       [3.],
       [4.],
       [5.],
       [6.],
       [7.],
       [8.],
       [9.]], dtype=float16)

In [67]:
b.dtype

dtype('float16')

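We can join our arrays in numpy using the `concatenate` function. When the inputs have different datatypes, the result uses a common type (`float32` in the example below).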

In [68]:
d = np.arange(4, dtype=np.float32)
e = np.arange(5, dtype=np.int16)
d, e

(array([0., 1., 2., 3.], dtype=float32), array([0, 1, 2, 3, 4], dtype=int16))

In [69]:
joined_array = np.concatenate([d, e])
joined_array

array([0., 1., 2., 3., 0., 1., 2., 3., 4.], dtype=float32)

In [70]:
d = np.arange(4, dtype=np.float32).reshape(2, -1)
e = np.arange(8, dtype=np.int16).reshape(2, -1)
d, e

(array([[0., 1.],
        [2., 3.]], dtype=float32),
 array([[0, 1, 2, 3],
        [4, 5, 6, 7]], dtype=int16))

> When concatenating arrays, their shapes must match in every dimension except the one being joined. Here `d` has shape `(2, 2)` and `e` has shape `(2, 4)`, so they can be joined along axis `1` because both have `2` rows. Another important thing is that if we specify the axis to be `1`, our arrays must have at least two dimensions.

In [71]:
joined_array = np.concatenate([d, e], axis=1)
joined_array

array([[0., 1., 0., 1., 2., 3.],
       [2., 3., 4., 5., 6., 7.]], dtype=float32)

The above can be achieved using `hstack`, which stacks arrays horizontally (column-wise); the arrays must have the same shape along every axis except the second. Example:

In [72]:
joined_array = np.hstack([d, e])
joined_array

array([[0., 1., 0., 1., 2., 3.],
       [2., 3., 4., 5., 6., 7.]], dtype=float32)
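The shape requirement for `concatenate` can be checked directly. In this sketch the arrays `left` and `right` (illustrative names) have shapes `(2, 2)` and `(2, 4)`: joining along axis `1` works, while joining along axis `0` fails because the column counts differ.

```python
import numpy as np

left = np.zeros((2, 2))
right = np.ones((2, 4))

# axis=1: both arrays have 2 rows, so the columns are joined (2 + 4 = 6).
print(np.concatenate([left, right], axis=1).shape)  # (2, 6)

# axis=0: the column counts differ (2 vs 4), so numpy raises ValueError.
try:
    np.concatenate([left, right], axis=0)
    concat_failed = False
except ValueError:
    concat_failed = True

print(concat_failed)  # True
```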

We have a function called `stack` that allows us to stack arrays together. In simple terms, it joins a sequence of arrays of the same shape along a new axis.

In [73]:
d = np.arange(10, dtype=np.float32)
e = np.arange(10, dtype=np.int16)
f = np.arange(10, dtype=np.int16)

e, d, f

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16),
 array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int16))

In [74]:
s = np.stack([d, e, f], axis=0)
s

array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]], dtype=float32)

The above can be achieved using `vstack`, which stacks arrays vertically (row-wise). Example:

In [75]:
s = np.vstack([d, e, f])
s

array([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.],
       [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]], dtype=float32)

We can also stack arrays along axis `1`, so that each input array becomes a column.

In [76]:
s = np.stack([d, e, f], axis=1)
s

array([[0., 0., 0.],
       [1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.],
       [5., 5., 5.],
       [6., 6., 6.],
       [7., 7., 7.],
       [8., 8., 8.],
       [9., 9., 9.]], dtype=float32)
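The effect of the `axis` argument is easiest to see in the resulting shapes. The sketch below uses three illustrative 1-D arrays and also checks that, for 1-D inputs, `vstack` matches `stack(axis=0)` and `column_stack` matches `stack(axis=1)`.

```python
import numpy as np

x = np.arange(10)
y = np.arange(10)
z = np.arange(10)

rows = np.stack([x, y, z], axis=0)
cols = np.stack([x, y, z], axis=1)

print(rows.shape)  # (3, 10): one row per input
print(cols.shape)  # (10, 3): one column per input

# Equivalent helpers for 1-D inputs.
print(np.array_equal(np.vstack([x, y, z]), rows))        # True
print(np.array_equal(np.column_stack([x, y, z]), cols))  # True
```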

We can repeat an array using the function called `tile`, which repeats the whole array the given number of times.

In [77]:
a = np.array([[2, 3], [3, 4]])

tiled = np.tile(a, 2)
tiled

array([[2, 3, 2, 3],
       [3, 4, 3, 4]])

We can also repeat each element of an array by calling the function `repeat`, for example:

In [78]:
repeated = np.repeat(a, 2, axis=0)
repeated

array([[2, 3],
       [2, 3],
       [3, 4],
       [3, 4]])

In [79]:
repeated = np.repeat(a, 2, axis=1)
repeated

array([[2, 2, 3, 3],
       [3, 3, 4, 4]])
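The difference between `tile` and `repeat` is clearest on a 1-D array. A minimal sketch with an illustrative array `values`:

```python
import numpy as np

values = np.array([1, 2, 3])

tiled_values = np.tile(values, 2)       # whole array repeated
repeated_values = np.repeat(values, 2)  # each element repeated

print(tiled_values)     # [1 2 3 1 2 3]
print(repeated_values)  # [1 1 2 2 3 3]
```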

We can add elements to an array with the function `append`. It returns a new array and leaves the original unchanged. Let's add a row to our `a` array.

In [80]:
np.append(a, [[5, 6]], axis=0)

array([[2, 3],
       [3, 4],
       [5, 6]])

We have a numpy function called `argmax`, which returns the index of the largest item in an array.

In [84]:
d = np.array([-2, 6, 8, -7, 29, 23, 6])
d

array([-2,  6,  8, -7, 29, 23,  6])

In [85]:
np.argmax(d)

4

We have a function called `argmin` which is the opposite of `argmax`. This function returns the index of the smallest element in an array.

In [87]:
np.argmin(d)

3

We also have a function called `argsort`, which returns the indices that would sort the array in `ascending` order.

In [88]:
np.argsort(d)

array([3, 0, 1, 6, 2, 5, 4], dtype=int64)
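The indices returned by `argsort` are useful for reordering a second array in the same way, or for picking the largest items. The sketch below pairs the values with a hypothetical array of labels called `names`.

```python
import numpy as np

scores = np.array([-2, 6, 8, -7, 29, 23, 6])
names = np.array(["a", "b", "c", "d", "e", "f", "g"])

order = np.argsort(scores)  # indices that sort scores in ascending order
sorted_scores = scores[order]
print(sorted_scores)        # [-7 -2  6  6  8 23 29]

# Reversing the index array gives descending order; keep the top three labels.
top_three = names[order[::-1][:3]]
print(top_three)            # ['e' 'f' 'c']
```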

We can sort our array in `ascending` order using the `sort` function.

In [90]:
np.sort(d)

array([-7, -2,  6,  6,  8, 23, 29])

We have a `ceil` function that returns an array of numbers rounded upward.

In [91]:
arr = np.array([0.5, 0.4, 3.1, 3.7])
np.ceil(arr)

array([1., 1., 4., 4.])

We also have a `floor` function, which is the opposite of `ceil`: it rounds numbers downward.

In [93]:
np.floor(arr)

array([0., 0., 3., 3.])

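We also have a function called `round`, which rounds to the nearest integer. Values exactly halfway between two integers are rounded to the nearest even integer, which is why `0.5` becomes `0.` below.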

In [95]:
np.round(arr)

array([0., 0., 3., 4.])
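The halfway case can be checked directly on a few illustrative values. The sketch also shows the optional second argument, which sets the number of decimal places.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Values exactly halfway between two integers go to the nearest even integer.
rounded = np.round(halves)
print(rounded)  # [0. 2. 2. 4.]

# The second argument sets the number of decimal places.
print(np.round(3.14159, 2))  # 3.14
```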

We have the `sum` function that returns the sum of all elements in an array.

In [96]:
np.sum(arr)

7.7

We also have the `prod` function, which returns the product of all elements in an array.

In [99]:
np.prod(arr)

2.2940000000000005

We have the `max` function, which returns the maximum element in an array.

In [102]:
np.max(arr)

3.7

We also have the `min` function, which is the opposite of the `max` function.

In [103]:
np.min(arr)

0.4

The `absolute` function converts negative elements to `positive`, leaving positive elements unchanged.

In [104]:
np.absolute(d)

array([ 2,  6,  8,  7, 29, 23,  6])

We have `cumprod` and `cumsum`, which return an array with the `cumulative` product and `sum` respectively.

In [105]:
np.cumprod(d)

array([     -2,     -12,     -96,     672,   19488,  448224, 2689344])

In [106]:
np.cumsum(d)

array([-2,  4, 12,  5, 34, 57, 63])

We can calculate the `mean` using the `mean` function.

In [107]:
np.mean(d)

9.0

This is equivalent to dividing the sum by the number of elements:

In [109]:
np.sum(d)/d.size

9.0

We can also calculate the `median` using the `median` function.

In [111]:
np.median(d)

6.0

We can calculate the standard deviation using the `std` function.

In [113]:
np.std(d)

11.904380946285519

The `variance` can be calculated using the `var` function as follows: 

In [114]:
np.var(d)

141.71428571428572

We can calculate the square root of a value using the `sqrt` function. The standard deviation is the square root of the variance:

In [115]:
var = np.var(d)
std = np.sqrt(var)

print(f"std: {std}")
print(f"var: {var}")

std: 11.904380946285519
var: 141.71428571428572


Let's implement the formula that calculates the variance and the `std` in numpy.

<p align="center"><img width="200" src="var.png" alt="std"/></p>

In [134]:
var = np.mean((d-d.mean())**2)
std = var ** 0.5
print(f"std: {std}")
print(f"var: {var}")

std: 11.904380946285519
var: 141.71428571428572
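The values above are the population variance and standard deviation, which divide by the number of elements `N`. For the sample versions, which divide by `N - 1`, `np.var` and `np.std` accept a `ddof` argument. A short sketch on the same data:

```python
import numpy as np

d = np.array([-2, 6, 8, -7, 29, 23, 6])

population_var = np.var(d)      # divides by N
sample_var = np.var(d, ddof=1)  # divides by N - 1

# The two differ only by the factor N / (N - 1).
n = d.size
print(np.isclose(sample_var, population_var * n / (n - 1)))  # True
print(np.isclose(np.std(d, ddof=1), np.sqrt(sample_var)))    # True
```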


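We can do some trigonometry on arrays. Note that `np.sin`, `np.cos` and `np.tan` expect angles in radians, so the values in `angles` below are interpreted as radians, not degrees. If your angles are in degrees, convert them with `np.deg2rad` first.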

In [136]:
angles = np.array([0, 45, 90, 180, 360])
angles

array([  0,  45,  90, 180, 360])

In [137]:
np.sin(angles)

array([ 0.        ,  0.85090352,  0.89399666, -0.80115264,  0.95891572])

In [138]:
np.cos(angles)

array([ 1.        ,  0.52532199, -0.44807362, -0.59846007, -0.28369109])

In [139]:
np.tan(angles)

array([ 0.        ,  1.61977519, -1.99520041,  1.33869021, -3.38014041])
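The outputs above treat the values in `angles` as radians. If the values are meant as degrees, they can be converted first, as in this sketch. The names `angles_deg`, `angles_rad` and `expected` are illustrative.

```python
import numpy as np

angles_deg = np.array([0, 45, 90, 180, 360])
angles_rad = np.deg2rad(angles_deg)  # same as angles_deg * np.pi / 180

sines = np.sin(angles_rad)

# Floating-point results are only approximately exact, so compare with allclose.
expected = np.array([0.0, np.sqrt(2) / 2, 1.0, 0.0, 0.0])
print(np.allclose(sines, expected))  # True
```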

We can do some mathematics on arrays. Let's try to `add` two arrays together.

In [140]:
a = np.array([2, 3, 4, 4])
b = np.array([3, 3, 8, -9])

In [141]:
a + b

array([ 5,  6, 12, -5])

We can do element-wise subtraction:

In [142]:
a - b

array([-1,  0, -4, 13])

The same applies to multiplication:

In [143]:
a * b

array([  6,   9,  32, -36])

Element-wise division:

In [147]:
a/b

array([ 0.66666667,  1.        ,  0.5       , -0.44444444])

Element-wise exponentiation:

In [148]:
b ** a

array([   9,   27, 4096, 6561])

### Copying arrays

Let's say we have the following array `a`, and we then assign it to a second variable `b`:


In [150]:
a = np.array([True, False, True, True])
a

array([ True, False,  True,  True])

In [151]:
b = a
b, a

(array([ True, False,  True,  True]), array([ True, False,  True,  True]))

In [152]:
b[-1] = False
b, a 

(array([ True, False,  True, False]), array([ True, False,  True, False]))

If we change the value of the last element in array `b`, this also affects `a`, because `b = a` does not copy the array: both names refer to the same array object. To solve this issue we use the method called `copy`.

In [153]:
a = np.array([True, False, True, True])
b = a.copy()
b, a

(array([ True, False,  True,  True]), array([ True, False,  True,  True]))

In [154]:
b[-1] = False
b, a 

(array([ True, False,  True, False]), array([ True, False,  True,  True]))
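The same sharing happens with basic slicing: a slice is a view onto the original data, not a copy. The sketch below uses `np.shares_memory` to tell the two apart; the variable names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.arange(5)

view = a[1:4]                # basic slicing returns a view
independent = a[1:4].copy()  # an explicit copy owns its own data

view[0] = 100        # also changes a[1]
independent[1] = -1  # leaves a untouched

print(a)                                 # [  0 100   2   3   4]
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```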

### References

1. [numpy.org](https://numpy.org/doc/stable/reference/routines.html#)