<img src='img/NumPy_logo_2020.svg' alt='NumPy logo' height='50%'>

# Importing `NumPy`

In [None]:
import numpy as np

# `NumPy` and the Data Science ecosystem:

`NumPy` is a multi-dimensional arrays library, forming the basis for scientific computing in Python.

It underpins most data science libraries, so a solid understanding of it is crucial for any data science practitioner.

## _On the shoulders of giants_

<img src='img/Python-scientific-stack.jpg' alt='pydata-stack' width='1000' height='300'>

# Why `NumPy`?

This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this. 

The difference between `Python` and `C`:

`Python` is a dynamically typed language: a variable's type is determined at runtime from the value it holds. Statically typed languages such as `C` and `Java`, by contrast, require every variable's type to be declared at compile time.

```c
/* C code */
int result = 0;
for (int i=0; i<100; i++){
    result += i;
}
```

```python
# Python code
result = 0
for i in range(100):
    result += i
```

The flexibility of the `Python` language comes at a cost.

## Understanding datatypes in `Python`:

### A `Python` integer is more than just an integer:

In `Python`, if we define an integer as `x = 100`, `x` is not just a _raw_ integer. It is a pointer to a compound structure that holds several values, such as the object's type and its size, in addition to the integer value itself.

In `C`, by comparison, an integer variable is simply a label for a position in memory.

![Alt text](img/cint_vs_pyint.png)
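
To make this overhead concrete, here is a minimal sketch that compares the memory taken by a Python integer object with that of a raw 64-bit integer. It assumes the CPython interpreter, where `sys.getsizeof` reports the size of the whole object; the exact number varies with the Python version and platform.

```python
import sys

import numpy as np

x = 100

# Size in bytes of the full Python integer object (type, size, value, ...)
python_int_bytes = sys.getsizeof(x)

# Size in bytes of a raw 64-bit integer, as a C program would store it
raw_int_bytes = np.dtype(np.int64).itemsize

print(f"Python int object:  {python_int_bytes} bytes")
print(f"Raw 64-bit integer: {raw_int_bytes} bytes")
```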

### A `Python` list is more than just a list:

In [None]:
list_of_integers = [0, 1, 2, 3, 4, 5, 6]

In [None]:
print(type(list_of_integers))
print([type(elem) for elem in list_of_integers])

In [None]:
list_of_floats = [1.4, 2.5, 6.7, 9.1, 3.14]

In [None]:
print(type(list_of_floats))
print([type(elem) for elem in list_of_floats])

In [None]:
list_of_strings = ["Hello", "World"]

In [None]:
print(type(list_of_strings))
print([type(elem) for elem in list_of_strings])

Now, because of Python's dynamic typing, we can even create **heterogeneous** lists:

In [None]:
list_of_mixed_types = [True, "2", 3.0, 4]

In [None]:
print(type(list_of_mixed_types))
print([type(elem) for elem in list_of_mixed_types])

This figure shows the difference between a `NumPy` array and a `Python` list:

![Alt text](img/array_vs_list.png)

The `NumPy` array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier.
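
The difference in layout also shows up in memory use. The sketch below is a rough estimate under the assumption of CPython: it adds the size of the list's block of pointers to the size of each integer object, and compares the total with the size of the array's single data block.

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# List: one block of pointers plus one full Python object per element
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array: one contiguous block of raw 64-bit integers
array_bytes = arr.nbytes

print(f"list:  about {list_bytes} bytes")
print(f"array: {array_bytes} bytes")
```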

# The `NumPy` array data structure (`ndarray`):

An array object represents a multidimensional, **homogeneous** array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)

This is the key difference between a dynamically typed list and a fixed-type array.

All elements of a `NumPy` array **must be** of the same data type.

`NumPy` arrays are typed arrays of fixed size. Python lists are **heterogeneous**: their elements may be objects of any type. `NumPy` arrays are **homogeneous** and can contain objects of only one type. An `ndarray` consists of two parts:

    1. The actual data that is stored in a contiguous block of memory
    2. The metadata describing the actual data
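
Both parts can be inspected from Python. The following is a minimal sketch: `dtype`, `shape`, and `strides` are part of the metadata, while `tobytes()` returns a copy of the raw data block.

```python
import numpy as np

arr = np.arange(6, dtype=np.int32).reshape(2, 3)

# Metadata describing the data
print(arr.dtype)    # type of each element
print(arr.shape)    # size of each dimension
print(arr.strides)  # bytes to step to reach the next element per dimension

# The actual data: 6 elements of 4 bytes each in one contiguous block
raw = arr.tobytes()
print(len(raw))
```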


## One, Two, Three, and Four dimensional arrays (vector, matrix, and tensor):

![1d-2d-3d-arrays](img/numpy-1d-2d-3d-arrays.png)

![4d-arrays](img/numpy-4d-array.png)

### A word on one dimensional arrays:

A one-dimensional array can be thought of as a column or row vector.

In [None]:
A = np.arange(1, 6)

In [None]:
A

In [None]:
A.shape

![1d-array-equivalence](img/one-dimensional-array.png)

## Creating arrays:

Unlike Python lists, a NumPy array is constrained to elements of a single type. If the types do not match, NumPy will upcast if possible.

### From `Python` lists:

In [None]:
one_d_arr = np.array([1, 2, 3, 4, 5])
two_d_arr = np.array([[10, 20, 30], [40, 50, 60]])
three_d_arr = np.array(
    [[[12, 14, 16, 18], [22, 24, 26, 28]], [[33, 35, 37, 39], [43, 45, 57, 49]]]
)

In [None]:
array_a_3d = np.array([[[1, 2], [5, 7]], [[5, 7], [3, 10]]])

array_b_3d = np.array([[[1, 6], [3, 9]], [[8, 3], [9, 1]]])

array_c_3d = np.array([[[9, 2], [2, 2]], [[4, 7], [3, 9]]])

array_d_3d = np.array([[[8, 1], [5, 7]], [[0, 3], [2, 7]]])

array_e_3d = np.array([[[4, 2], [1, 7]], [[2, 8], [9, 6]]])

array_f_3d = np.array([[[1, 8], [3, 1]], [[3, 7], [1, 9]]])

array_g_3d = np.array([[[4, 3], [4, 7]], [[4, 8], [3, 6]]])

array_h_3d = np.array([[[6, 5], [5, 6]], [[9, 3], [9, 4]]])

array_i_3d = np.array([[[8, 9], [5, 1]], [[2, 1], [3, 4]]])

four_d_arr = np.array(
    [
        array_a_3d,
        array_b_3d,
        array_c_3d,
        array_d_3d,
        array_e_3d,
        array_f_3d,
        array_g_3d,
        array_h_3d,
        array_i_3d,
    ]
)

### Using built-in functions:

- `np.arange`
- `np.zeros`
- `np.ones`
- `np.full`
- `np.eye`

In [None]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

In [None]:
zeros = np.zeros((10, 2))  # ten rows by two columns

In [None]:
zeros

In [None]:
# create a three-by-five array of floats initialized with ones
ones = np.ones((3, 5), dtype=np.float32)

In [None]:
ones

In [None]:
pie_array = np.full(shape=(2, 2), fill_value=3.14, dtype=float)

In [None]:
pie_array
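
`np.eye` appears in the list above but has no example. A minimal sketch: it creates a two-dimensional array with ones on the diagonal and zeros elsewhere (an identity matrix).

```python
import numpy as np

# 3x3 identity matrix (floats by default)
identity = np.eye(3)
print(identity)
```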

### Random arrays:

- `np.random.randint`
- `np.random.random`
- `np.random.randn`
- `np.random.normal`
- `np.random.uniform`

In [None]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

In [None]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

In [None]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))
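
The two remaining functions from the list can be sketched as follows. Seeding the generator is optional and is done here only so that repeated runs give the same values; the seed value and the interval bounds are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # optional: make the random values repeatable

# 3x3 array drawn from the standard normal distribution
# (note: randn takes the dimensions as separate arguments, not a tuple)
standard_normal_values = np.random.randn(3, 3)

# 3x3 array drawn uniformly from the interval [-1, 1)
uniform_values = np.random.uniform(-1.0, 1.0, (3, 3))

print(standard_normal_values)
print(uniform_values)
```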

## Array attributes

Each array has the following attributes:

- `ndim`: the number of dimensions.
- `shape`: the size of each dimension.
- `size`: the total number of elements in the array.
- `dtype`: the data type of the array.
- `itemsize`: the size (in bytes) of each array element.
- `nbytes`: the total size (in bytes) of the array (`itemsize` * `size`).

In [None]:
def print_array_attributes(input_arr: np.ndarray):
    print(f"shape: {input_arr.shape}")
    print(f"ndim: {input_arr.ndim}")
    print(f"size: {input_arr.size}")
    print(f"dtype: {input_arr.dtype}")
    print(f"nbytes: {input_arr.nbytes}")
    print(f"itemsize: {input_arr.itemsize}")

In [None]:
print_array_attributes(one_d_arr)

In [None]:
print_array_attributes(two_d_arr)

In [None]:
print_array_attributes(three_d_arr)

In [None]:
print_array_attributes(four_d_arr)

# `NumPy` data types:

As we mentioned earlier, `NumPy` arrays contain values of a single type (**homogeneous** arrays).

Common `NumPy` data types (full list can be found [here](https://numpy.org/doc/stable/user/basics.types.html)):

- `bool_`: Boolean (True or False) stored as a byte
- `int32`: 32-bit signed integer (-2147483648 to 2147483647)
- `int64`: 64-bit signed integer (-9223372036854775808 to 9223372036854775807)
- `float16`: Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
- `float32`: Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
- `float64`: Double precision float: sign bit, 11 bits exponent, 52 bits mantissa

When creating arrays, we can use the `dtype` argument to specify the data type of elements in the array.
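
The ranges and bit layouts listed above do not need to be memorized. As a minimal sketch, `np.iinfo` and `np.finfo` report them for integer and floating point types respectively.

```python
import numpy as np

# Limits of the integer types
int32_info = np.iinfo(np.int32)
int64_info = np.iinfo(np.int64)
print(int32_info.min, int32_info.max)
print(int64_info.min, int64_info.max)

# Total bits, exponent bits, and mantissa bits of the float types
for float_type in (np.float16, np.float32, np.float64):
    info = np.finfo(float_type)
    print(float_type.__name__, info.bits, info.nexp, info.nmant)
```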

## boolean arrays:

In [None]:
bool_array = np.array([True, False, False])  # NumPy will infer the appropriate dtype
another_bool_array = np.array(
    [1, 0, 0], dtype=np.bool_
)  # explicitly specify the data type using the dtype argument

In [None]:
print(bool_array)
print(another_bool_array)

In [None]:
print(bool_array.dtype)
print(another_bool_array.dtype)

## Numeric arrays:

In [None]:
int_array = np.array([20, 30, 40])
float_array = np.array([3.14, 2.5, 0.9])
float32_array = np.array([3.14, 2.5, 0.9], dtype=np.float32)

In [None]:
print(int_array.dtype)
print(float_array.dtype)
print(float32_array.dtype)

In [None]:
print(float_array.nbytes)
print(float32_array.nbytes)

It's important to pay attention to the array data type. Choosing the smallest data type that can represent the data reduces memory use (here, `float32` needs half the bytes of `float64`), which can also improve performance.

## Mixed types:

What happens when we create an array with mixed types?
- integer and float.
- float and string.
- bool and float.

etc ...

As we mentioned before, unlike `Python` lists, a `NumPy` array is constrained to elements of a single type. Therefore, if the types do not match, `NumPy` will convert all the elements to a common type that can represent every one of them, if possible (a process known as **upcasting**).



In [None]:
mixed_array_1 = np.array([1, 1.5, 2])  # upcast integer to float
mixed_array_2 = np.array([3.14, 2.9, "Hello", "world"])  # upcast float to unicode
mixed_array_3 = np.array([True, False, 9.9])  # upcast bool to float
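
To see the result of upcasting, we can inspect the `dtype` of each array. The sketch below recreates the three arrays under shorter, illustrative names so that it runs on its own.

```python
import numpy as np

int_and_float = np.array([1, 1.5, 2])
float_and_str = np.array([3.14, 2.9, "Hello", "world"])
bool_and_float = np.array([True, False, 9.9])

print(int_and_float.dtype, int_and_float)    # integers become floats
print(float_and_str.dtype, float_and_str)    # floats become strings
print(bool_and_float.dtype, bool_and_float)  # booleans become floats
```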

# Array indexing, slicing, and reshaping:

- Indexing of arrays: Getting and setting the value of individual array elements.
- Slicing of arrays: Getting and setting smaller subarrays within a larger array.
- Reshaping of arrays: Changing the shape of a given array.

## Indexing:

### One dimensional arrays:

In [None]:
one_d_arr

Accessing the first element of a one dimensional array:

In [None]:
one_d_arr[0]

Accessing the last element of a one dimensional array:

In [None]:
one_d_arr[-1]

Setting a value of a given index:

In [None]:
one_d_arr[0] = 99

In [None]:
one_d_arr

### Two dimensional arrays:

In [None]:
two_d_arr

In [None]:
# accessing first row and first column
two_d_arr[0, 0]

In [None]:
# accessing second row and last column
two_d_arr[1, -1]

Setting a value of a given index:

In [None]:
two_d_arr[0, 2] = 100

In [None]:
two_d_arr

## Slicing:

### One dimensional arrays:

In [None]:
print(one_d_arr)

In [None]:
# first three elements
one_d_arr[:3]

In [None]:
# last two elements
one_d_arr[-2:]

In [None]:
# 2 as step, starting from zero
one_d_arr[::2]

### Two dimensional arrays:

In [None]:
two_d_arr

In [None]:
# all columns of first row, equivalent to: two_d_arr[0]
two_d_arr[0, :]

In [None]:
# all rows, from the second column onward
two_d_arr[:, 1:]
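
Slices can also be used for *setting* subarrays, not only for getting them. A minimal sketch, using a small illustrative array named `grid`:

```python
import numpy as np

grid = np.array([[10, 20, 30], [40, 50, 60]])

# Set every row's columns from the second one onward to zero
grid[:, 1:] = 0

print(grid)
```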

## Reshaping:

### Example 1:

In some cases, we might need to convert an array to a different shape. For example:

1. One-dimensional array (vector) to two-dimensional array (matrix)
2. Two-dimensional array (matrix) to three-dimensional array (tensor)

This might occur, for example, when preparing our data to be fed to a machine learning model. Most commonly, a data array should be of two dimensions with `n` rows and `m` columns, where `n` is the number of samples and `m` is the number of features.

However, we might have this data as a one dimensional array.

The figure below shows a one-dimensional dataset, where each sample consists of three features (`Age`, `Height`, and `Weight`), and what it looks like after reshaping.

![one-d-array-before-reshape](img/one-d-array-before-reshape.png)

![2d-array-reshaped](img/two-d-array-after-reshape.png)

It's safe to say that the reshaped layout is the more common one in most applications, which is why reshaping is so important.

A `NumPy` array has a `.reshape` method that accepts the new shape, which should be _compatible_ with the old shape.

In [None]:
sample_data_one_d = np.array(
    [30, 180, 60, 26, 167, 70, 36, 159, 66, 27, 168, 80], dtype=np.int16
)

In [None]:
sample_data_one_d.shape

In [None]:
sample_data_two_d = sample_data_one_d.reshape(4, 3)

In [None]:
sample_data_two_d.shape

In [None]:
print(sample_data_two_d)

It's possible to pass `-1` for one of the dimensions to let `NumPy` _infer_ it (this is useful when we care about only one dimension).

In [None]:
# infer the number of rows, given that the number of columns should be 3
sample_data_one_d.reshape(-1, 3)

In [None]:
# infer the number of columns, given that the number of rows should be 4
sample_data_one_d.reshape(4, -1)
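
A shape is _compatible_ when the product of its dimensions equals the number of elements in the array. As a minimal sketch with an illustrative 12-element array, an incompatible shape raises a `ValueError`:

```python
import numpy as np

values = np.arange(12)

# 4 * 3 == 12, so this shape is compatible
reshaped = values.reshape(4, 3)

# 5 * 3 == 15 != 12, so this shape is not compatible
raised = False
try:
    values.reshape(5, 3)
except ValueError as error:
    raised = True
    print(f"ValueError: {error}")
```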

### Example 2:

Reshaping a one dimensional array to a matrix with one column can be handy in certain situations.

In [None]:
y_true = np.array([True, False, True, True, False, True, True])

In [None]:
y_true.shape

`NumPy` has a special constant called `newaxis`, which (as the name implies) is used to add a new dimension to an existing array.

- **1D** array will become **2D** array

- **2D** array will become **3D** array

- **3D** array will become **4D** array

and so on

In [None]:
column_vector = y_true[:, np.newaxis]
row_vector = y_true[np.newaxis, :]

In [None]:
print(column_vector)

In [None]:
print(column_vector.shape)

In [None]:
print(row_vector)

In [None]:
print(row_vector.shape)

# Explaining array axes:

In simple words, `NumPy` axes are the directions along the _rows_ and _columns_ of a two-dimensional array.

The figure below illustrates this:

![2d-array-axes](img/numpy-2d-array-axes.png)

For three dimensional arrays:

![three-d-arrays-axes](img/numpy-3d-array-axes.png)

`NumPy` array axes are numbered starting from 0.

Many functions in `NumPy` have an `axis` argument.

Examples include (for **2D** arrays):

- `arr.sum(axis)`: along which axis the _sum_ is performed. 
- `arr.mean(axis)`: along which axis the _mean_ is performed.
- `arr.min(axis)`/`arr.max(axis)`: along which axis the _min_/_max_ is performed.
- `np.concatenate((arr_1, arr_2), axis)`: along which axis `arr_1` and `arr_2` are concatenated (horizontally or vertically).

In [None]:
website_session_data = np.array(
    [
        [10, 17, 3, 9],
        [0, 0, 1, 2],
        [4, 2, 0, 1],
        [15, 20, 3, 1],
        [0, 0, 7, 2],
        [9, 5, 2, 20],
    ]
)

In [None]:
website_session_data.sum(axis=1)

In [None]:
website_session_data.sum(axis=0)

![2d-array-sum-over-rows-and-columns](img/2d-array-sum-axis-0-and-1.png)
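
The same `axis` argument works for the other functions listed above. A minimal sketch with a small illustrative array: `axis=0` collapses the rows (one result per column), while `axis=1` collapses the columns (one result per row).

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])

column_means = table.mean(axis=0)  # one mean per column
row_means = table.mean(axis=1)     # one mean per row
column_mins = table.min(axis=0)    # one minimum per column

print(column_means)
print(row_means)
print(column_mins)
```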

# Combining arrays:

- `np.concatenate`: concatenate a sequence of (two or more) arrays along an axis.
- `np.hstack`: stack a sequence of (two or more) arrays **horizontally** (_column-wise_). For 2D arrays, equivalent to `np.concatenate(..., axis=1)`.
- `np.vstack`: stack a sequence of (two or more) arrays **vertically** (_row-wise_). For 2D arrays, equivalent to `np.concatenate(..., axis=0)`.

In [None]:
A = np.array(
    [[31, 180, 77], [19, 149, 58], [50, 167, 86], [46, 178, 74], [29, 179, 81]]
)

In [None]:
B = np.array([[66], [80], [110], [92], [102]])

In [None]:
C = np.array([[17, 159, 81], [24, 177, 83]])

In [None]:
A.shape

In [None]:
B.shape

In [None]:
C.shape

![2d-arrays-horizontal-concatenation](img/2d-arrays-horizontal-concatenation.png)

In [None]:
merged_along_column_1 = np.concatenate((A, B), axis=1)

In [None]:
merged_along_column_2 = np.hstack((A, B))

In [None]:
print(merged_along_column_1)
print(merged_along_column_1.shape)

In [None]:
print(merged_along_column_2)
print(merged_along_column_2.shape)

![2d-arrays-vertical-concatenation](img/2d-arrays-vertical-concatenation.png)

In [None]:
merged_along_row_1 = np.concatenate((A, C), axis=0)

In [None]:
merged_along_row_2 = np.vstack((A, C))

In [None]:
print(merged_along_row_1)
print(merged_along_row_1.shape)

In [None]:
print(merged_along_row_2)
print(merged_along_row_2.shape)

# Computation on arrays

The main reason that `NumPy` is so important in the Python data science world is because it provides an easy and flexible interface to optimized computation with arrays of data.

The key to making it fast is to use _vectorized_ operations, generally implemented through NumPy's _universal functions_ (ufuncs).

## `NumPy` universal functions motivation:

Let's compare the running time of `Python` and `NumPy` when calculating the mean of a large array:

In [None]:
def python_mean(input_array):
    n = len(input_array)
    total_sum = 0.0
    for i in range(n):
        total_sum = total_sum + input_array[i]
    return total_sum / n

In [None]:
def numpy_mean(input_array):
    return input_array.mean()

We'll create a one-dimensional array with one million elements:

In [None]:
big_array = np.random.randint(low=1, high=100, size=1000000)

In [None]:
%time python_mean(big_array)

In [None]:
%time numpy_mean(big_array)

We can clearly see that `NumPy`'s implementation is faster than writing loops in `Python`.

The main reason `NumPy`'s functions are fast is that they perform the looping and calculations in `C` (a compiled language), whereas loops written in `Python` are very slow because of the language's dynamic, flexible typing.
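
The `%time` magic only works inside IPython or Jupyter. As a sketch of the same comparison in a plain Python script, the standard library's `timeit` module can be used instead. The array size and repetition count below are arbitrary choices that keep the run short, and the measured times depend on the machine.

```python
import timeit

import numpy as np


def python_mean(input_array):
    # Explicit Python loop over every element
    total_sum = 0.0
    for value in input_array:
        total_sum = total_sum + value
    return total_sum / len(input_array)


values = np.random.randint(low=1, high=100, size=100_000)

# Total time in seconds for 3 runs of each implementation
loop_seconds = timeit.timeit(lambda: python_mean(values), number=3)
numpy_seconds = timeit.timeit(lambda: values.mean(), number=3)

print(f"Python loop: {loop_seconds:.4f} s")
print(f"NumPy mean:  {numpy_seconds:.4f} s")
```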

## Arithmetic functions

### One-dimensional arrays:

This is an example of applying an operation on a `NumPy` array with a scalar.

In [None]:
damascus_sep_22_temperature_celsius = np.array(
    [
        29.9,
        27.9,
        26.5,
        26.3,
        24.8,
        25.2,
        25.2,
        25.8,
        25.9,
        25.3,
        24.4,
        23.9,
        24.1,
        24.8,
        23.3,
        23.3,
        21.7,
        23.2,
        24.8,
        25.3,
        26.4,
        24.9,
        23.5,
    ]
)

In [None]:
damascus_sep_22_temperature_fahrenheit = (
    1.8 * damascus_sep_22_temperature_celsius
) + 32.0

In [None]:
print(damascus_sep_22_temperature_fahrenheit)

**TODO**: here add example of summing two one-d arrays.
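
A minimal sketch of summing two one-dimensional arrays of the same length. The arrays and their values are made up for illustration; the addition is performed element by element.

```python
import numpy as np

# Hypothetical sales counts for three products
morning_sales = np.array([3, 5, 2])
evening_sales = np.array([4, 1, 6])

# Element-wise sum: total_sales[i] = morning_sales[i] + evening_sales[i]
total_sales = morning_sales + evening_sales

print(total_sales)
```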

### Two-dimensional arrays:

In [None]:
negative_positive_arr = np.arange(-4, 5).reshape((3, 3))

In [None]:
negative_positive_arr

In [None]:
absolute_arr = np.abs(negative_positive_arr)

**TODO**: add example of operation on a two-d array with a scalar.

**TODO**: add example of operation on 2 two-d arrays (such as sum, multiply).
This is to emphasize that operations on pairs of arrays are performed element-wise.
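
As a sketch of both cases, using small made-up arrays: an operation with a scalar is applied to every element, and an operation between two arrays of the same shape is applied to each pair of corresponding elements. Note that `*` is an element-wise product, not a matrix product.

```python
import numpy as np

first = np.array([[1, 2], [3, 4]])
second = np.array([[10, 20], [30, 40]])

# Two-dimensional array with a scalar: every element is multiplied by 10
scaled = first * 10

# Two two-dimensional arrays: operations pair up matching positions
elementwise_sum = first + second
elementwise_product = first * second

print(scaled)
print(elementwise_sum)
print(elementwise_product)
```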

## Aggregation functions:

Aggregation functions are most useful when working with large amounts of data. A common step is to first calculate summary (or descriptive) statistics on the data, such as **mean**, **standard deviation**, **sum**, **min**, and **max**.

### One-dimensional arrays:

In [None]:
damascus_sep_22_temperature_celsius.mean()

In [None]:
damascus_sep_22_temperature_celsius.min()

In [None]:
damascus_sep_22_temperature_celsius.max()
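
The sum and the standard deviation work the same way. A minimal sketch on a short made-up array (by default, `std` computes the population standard deviation):

```python
import numpy as np

readings = np.array([20.0, 22.0, 24.0])

total = readings.sum()
spread = readings.std()  # population standard deviation (ddof=0)

print(total)
print(spread)
```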

### Two-dimensional arrays:

For multi-dimensional arrays, aggregation functions take an additional argument specifying the _axis_ along which the aggregate function is computed.

In [None]:
two_d_data = np.array([[1, 2], [5, 3], [4, 6]])

In [None]:
two_d_data

In [None]:
two_d_data.max(axis=0)

In [None]:
two_d_data.max(axis=1)

![2d-array-aggregation](img/numpy-matrix-aggregation.png)

# Array broadcasting:

## Same size arrays:

Recall that for arrays of the **same size**, binary operations are performed on an **element-by-element** basis.

### One dimensional arrays

In [None]:
math_grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])
science_grades = np.array([31, 66, 67, 54, 68, 77, 55, 81])

In [None]:
all_grades = math_grades + science_grades

In [None]:
print(f"math_grades shape: {math_grades.shape}")
print(f"science_grades shape: {science_grades.shape}")
print(f"all_grades shape: {all_grades.shape}")

In [None]:
print(all_grades)

![one-d-same-size-arrays-computation](img/one-d-same-size-arrays-computation.png)

### Two dimensional arrays:

In [None]:
data = np.array([[1, 2], [3, 4]])

In [None]:
ones = np.ones((2, 2), dtype=np.int32)

In [None]:
data.shape

In [None]:
ones.shape

In [None]:
data + ones

![2d-element-wise-arithmetic](img/numpy-matrix-arithmetic.png)

## Different size arrays:

What if the two arrays we are working with have different shapes?

Broadcasting allows these types of binary operations to be performed on arrays of **different sizes**. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array.

If the shapes of the two arrays are different, and certain rules are met, the smaller array is "broadcast" across the larger array.

### Example 1:

The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [None]:
a = np.array([1, 2, 3])
b = 2

In [None]:
a.shape

In [None]:
np.shape(b)  # b is a plain Python int with no .shape attribute; np.shape(b) returns ()

In [None]:
result = a * b

In [None]:
print(result)
print(result.shape)

We can think of the scalar `b` being stretched during the arithmetic operation into an array with the same shape as `a`. The new elements in `b`, as shown in the figure below, are simply copies of the original scalar. The stretching analogy is only _conceptual_. `NumPy` is smart enough to use the original scalar value without actually making copies, so that broadcasting operations are as memory and computationally efficient as possible.

![scalar-stretching-to-array](img/scalar-stretching-to-array.png)

### Example 2:

In [None]:
# let's create a 2D array of 4 rows and 3 columns
data = np.array([[10, 50, 30], [20, 60, 60], [30, 70, 90], [40, 80, 120]])

In [None]:
# and create a one-dimensional array of 3 elements
scaling_factors = np.array([0.1, 0.2, 0.3])

In [None]:
print(data.shape)

In [None]:
print(scaling_factors.shape)

In [None]:
scaled_data = data * scaling_factors

In [None]:
scaled_data.shape

In [None]:
scaled_data

![2d-matrix-and-vector-arithmetic](img/broadcasting/1-broadcasting-data-and-scaling-factors.png)

![padding-vector-with-ones](img/broadcasting/2-broadcasting-padding-scaling-factors.png)

![stretching-vector-to-match-matrix-in-shape](img/broadcasting/3-broadcasting-stretching-scaling-factors.png)

![performing-arithmetic-operation](img/broadcasting/4-broadcasting-calculating-result.png)

## Broadcasting rules:

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

1. Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
2. Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
3. Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
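
The three rules can be sketched with a few shapes. The arrays below are illustrative; only their shapes matter.

```python
import numpy as np

matrix = np.ones((4, 3))
row = np.arange(3)                   # shape (3,)
column = np.arange(4).reshape(4, 1)  # shape (4, 1)

# Rule 1: (3,) is padded to (1, 3); Rule 2: it is stretched to (4, 3)
print((matrix + row).shape)

# Rules 1 and 2 applied to both operands: (4, 1) and (3,) -> (4, 3)
print((column + row).shape)

# Rule 3: (4, 3) and (4,) -> (1, 4); 3 and 4 disagree and neither is 1
rule_3_raised = False
try:
    matrix + np.arange(4)
except ValueError as error:
    rule_3_raised = True
    print(f"ValueError: {error}")
```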


# Advanced indexing:

## Boolean masks:

### Example 1:

Boolean masks help us extract (or locate) elements in an array based on some criterion.

For example, we can imagine we have a one-dimensional `NumPy` array of student grades and we want to *select* (or count) the grades above a specified threshold.

We first construct a boolean array (*mask*) of the same shape of the original array. The boolean array at index *i* is set to `True` if the criterion holds for the corresponding element in the original array, and `False` otherwise.

Then, the *mask* can be used to get the elements that meet the condition. This is known as a *masking operation*.

Notice that the comparison operator `>` is a universal function and is performed element-wise with no need to use for-loops. Other comparison operators like `==`, `!=`, `<` work the same.

In [None]:
grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])

In [None]:
above_60_mask = grades > 60

In [None]:
grades[above_60_mask]

![one-d-boolean-mask-example](img/one-d-boolean-mask.png)

In [None]:
above_60_mask.dtype

Since `above_60_mask` is a `NumPy` array of type boolean, we can get the sum of its elements, where `False` is interpreted as `0` and `True` is interpreted as `1`.

In [None]:
above_60_mask.sum()

This way we can answer questions like: **What is the number of students who scored above 60?**

### Example 2:

This example is like the previous one, but here we show how to combine information from two masks using *bitwise* logic.

If we have two (or more) boolean masks and want, for example, to keep only the elements that satisfy both, we can use the `&` operator (bitwise AND, which acts as an element-wise logical AND on boolean arrays).

In this example, given students' grades in two subjects, we retrieve the students who succeeded (scored above 60) in both subjects.

![boolean-mask-bitwise-logic](img/boolean-mask-with-and.png)

In [None]:
students = np.array(
    ["Ahmed", "Sami", "Rami", "Mouna", "Qusai", "Yamen", "Huda", "Ramez"]
)
math_grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])
science_grades = np.array([31, 66, 67, 54, 68, 77, 55, 81])

In [None]:
math_mask = math_grades > 60
science_mask = science_grades > 60

In [None]:
print(f"science_mask: {science_mask}")
print(f"math_mask: {math_mask}")

In [None]:
succeed_mask = math_mask & science_mask

In [None]:
students[succeed_mask]
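
Masks can be combined with other bitwise operators in the same way. As a sketch on the same data, `|` keeps elements that satisfy at least one mask and `~` inverts a mask. The variable names `passed_any` and `passed_none` are illustrative.

```python
import numpy as np

students = np.array(
    ["Ahmed", "Sami", "Rami", "Mouna", "Qusai", "Yamen", "Huda", "Ramez"]
)
math_grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])
science_grades = np.array([31, 66, 67, 54, 68, 77, 55, 81])

math_mask = math_grades > 60
science_mask = science_grades > 60

# Students who passed at least one subject
passed_any = students[math_mask | science_mask]

# Students who passed neither subject
passed_none = students[~(math_mask | science_mask)]

print(passed_any)
print(passed_none)
```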

## Fancy indexing:

We have shown how we can index `NumPy` arrays using the following methods:
- Scalar indexing: `x[0]`. Select first element in array `x`.
- Slicing: `x[:5]`. Select first `5` elements in array `x`.
- Boolean indexing: `x[mask]`. Select values from array `x` at positions where `mask` is `True`. Both `x` and `mask` should be of the same shape.

However, sometimes we might need to access several elements using their indices. For example, suppose we have the following two arrays of students and their grades, and we want to get the grades of only the students `Ahmed`, `Qusai`, and `Ramez`:

In [None]:
students = np.array(
    ["Ahmed", "Sami", "Rami", "Mouna", "Qusai", "Yamen", "Huda", "Ramez"]
)
math_grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])

In [None]:
students_indices = [0, 4, 7]  # positions of Ahmed, Qusai, and Ramez

In [None]:
print(math_grades[students_indices])

![fancy-indexing-example](img/fancy-indexing.png)
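
Because the two arrays are aligned by position, the same list of indices can be applied to both. A minimal sketch that checks which names the indices select (the variable names `selected_names` and `selected_grades` are illustrative):

```python
import numpy as np

students = np.array(
    ["Ahmed", "Sami", "Rami", "Mouna", "Qusai", "Yamen", "Huda", "Ramez"]
)
math_grades = np.array([80, 91, 56, 73, 59, 61, 48, 77])

indices = [0, 4, 7]

selected_names = students[indices]
selected_grades = math_grades[indices]

print(selected_names)
print(selected_grades)
```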

# NumPy's Structured Arrays

## Motivation:

| Name   |   Age |   Weight |   Height |
|:-------|------:|---------:|---------:|
| Alice  |    25 |     55   |      160 |
| Bob    |    45 |     85.5 |      175 |
| Cathy  |    37 |     68   |      159 |
| Doug   |    19 |     61.5 |      168 |


With the approach we have used so far, we need four separate sequences, one for each variable:

In [None]:
name = ["Alice", "Bob", "Cathy", "Doug", "Jack"]
age = [25, 45, 37, 19, 33]
weight = [55.0, 85.5, 68.0, 61.5, 77.0]
height = [160, 175.5, 159.9, 168, 182.0]

This is harder to maintain than a single array.

## Structured arrays:

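Structured arrays are arrays with compound data types: each element is a record made up of named fields, and each field has its own data type.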

In [None]:
data = list(zip(name, age, weight, height))

In [None]:
print(data)

In [None]:
numpy_data = np.array(
    data,
    dtype=np.dtype(
        [("name", "U10"), ("age", "i4"), ("weight", "f4"), ("height", "f4")]
    ),
)

In [None]:
numpy_data.shape

The handy thing with structured arrays is that you can now refer to values either by index or by name:

In [None]:
# Get all names
numpy_data["name"]

In [None]:
# Get first row of data
numpy_data[0]

In [None]:
# Get the name from the last row
numpy_data[-1]["name"]

Using boolean masking for advanced filtering:

In [None]:
# Get names where age is under 30
numpy_data[numpy_data["age"] < 30]["name"]
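
Fields can also be used for aggregation and sorting. The sketch below rebuilds the same records under the illustrative name `people` so that it runs on its own; `np.sort` accepts an `order` argument naming the field to sort by.

```python
import numpy as np

people = np.array(
    [
        ("Alice", 25, 55.0, 160.0),
        ("Bob", 45, 85.5, 175.5),
        ("Cathy", 37, 68.0, 159.9),
        ("Doug", 19, 61.5, 168.0),
        ("Jack", 33, 77.0, 182.0),
    ],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4"), ("height", "f4")],
)

# Aggregate a single field
mean_weight = people["weight"].mean()

# Sort the records by the "age" field, youngest first
sorted_by_age = np.sort(people, order="age")

print(mean_weight)
print(sorted_by_age["name"])
```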

# Use case 1: dot product:

![dot-product](img/numpy-matrix-dot-product-1.png)
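
A minimal sketch of the dot product in code. The values are made up and are not necessarily those shown in the figure. For one-dimensional arrays, `np.dot` multiplies matching elements and sums the results; for two-dimensional arrays, the `@` operator performs matrix multiplication.

```python
import numpy as np

# Dot product of two vectors: 1*4 + 2*5 + 3*6
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
vector_dot = np.dot(u, v)

# Matrix product of two 2x2 matrices
left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])
matrix_product = left @ right

print(vector_dot)
print(matrix_product)
```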

# Use case 2: mean squared error

$$\Large MSE=\frac{1}{n} \sum_{i=1}^{n} (y_{i}-\hat{y_{i}})^2$$

$$\Large n: \text{number of samples}$$

$$\Large y_{i}: \text{real value}$$

$$\Large \hat{y_{i}}: \text{predicted value}$$

![mse-plot](img/mse-plot.png)

In [None]:
y_true = np.array([0.9, 0.5, 0.3, 0.2, 0.8])
y_pred = np.array([0.3, 0.6, 0.7, 0.5, 0.6])

In [None]:
n = len(y_true)

In [None]:
mse = np.square(y_true - y_pred).sum() * 1.0 / n

In [None]:
print(mse)
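
The same computation can be wrapped in a small reusable function. This is a sketch: `mean_squared_error` is an illustrative name defined here, and `np.mean` replaces the explicit sum divided by `n`.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between real and predicted values
    return np.mean((y_true - y_pred) ** 2)


y_true = np.array([0.9, 0.5, 0.3, 0.2, 0.8])
y_pred = np.array([0.3, 0.6, 0.7, 0.5, 0.6])

print(mean_squared_error(y_true, y_pred))
```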

# Use case 3

Churn prediction example

# Use case 4:

Centering an array example
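
One possible sketch of centering, assuming it means subtracting each column's mean from that column so that every feature has mean zero. The data is made up; the subtraction relies on broadcasting the `(2,)` array of means across the `(3, 2)` data array.

```python
import numpy as np

# 3 samples (rows) with 2 features (columns)
samples = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Mean of each column, shape (2,)
feature_means = samples.mean(axis=0)

# Broadcasting subtracts the matching mean from every row
centered = samples - feature_means

print(centered)
print(centered.mean(axis=0))
```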

# Topics we didn't cover

- `np.where` and `np.select`
- Set functions: `np.intersect1d`, `np.isin`, `np.setdiff1d`
- Linear algebra module.
- Saving and loading data: `np.save` and `np.load`

# Resources:

1. [100 numpy exercises (with solutions)](https://github.com/rougier/numpy-100)
2. [Stanford CS231n Convolutional Neural Networks for Visual Recognition - Python Numpy Tutorial](https://cs231n.github.io/python-numpy-tutorial/)
3. [A Visual Intro to NumPy and Data Representation](https://jalammar.github.io/visual-numpy/)
4. [DataCamp - Python Numpy Array Tutorial](https://www.datacamp.com/tutorial/python-numpy-tutorial)
5. [Scientific Computing in Python: Introduction to NumPy and Matplotlib](https://sebastianraschka.com/blog/2020/numpy-intro.html)
6. [Random Number Generator Using Numpy Tutorial - DataCamp](https://www.datacamp.com/tutorial/numpy-random)
7. [Look Ma, No For-Loops: Array Programming With NumPy](https://realpython.com/numpy-array-programming/)
8. [Numpy Vectorization](https://medium.com/@mikeliao/numpy-vectorization-d4adea4fc2a)
9. [1000x faster data manipulation: vectorizing with Pandas and Numpy](https://www.youtube.com/watch?v=nxWginnBklU&ab_channel=PyGotham2019)
10. [NumPy tutorials](https://numpy.org/numpy-tutorials/index.html)
11. [Reshape numpy arrays in Python — a step-by-step pictorial tutorial](https://towardsdatascience.com/reshaping-numpy-arrays-in-python-a-step-by-step-pictorial-tutorial-aed5f471cf0b)
12. [Numpy Axes, Explained](https://www.sharpsightlabs.com/blog/numpy-axes-explained/)