# <a id='contents'>Contents</a>

* [Preface](#preface)
* [Installation](#installation)
* [Introduction](#introduction)
    * [Comparison between NumPy Arrays and Lists](#comparison)
    * [Importing](#import)
    * [Creating an Array and Data Types](#create_array)
        * [Simple Array](#simple_array)
        * [Data Types](#data_types)
        * [Array Range](#range)
        * [Random Arrays](#random)
    * [Other Attributes](#attributes)
* [Mathematical Functions](#math)
* [Array Manipulation Routines](#manipulation)
    * [Reshape](#reshape)
    * [Concatenate](#concatenate)
    * [Transpose](#transpose)
* [Indexing on ndarrays](#indexing)
* [Array Creation Routines](#creation)
    * [Linspace](#linspace)
    * [Zeros](#zeros)
    * [Ones](#ones)
    * [Identity](#identity)
* [Boolean Logic](#bool)
    * [Boolean Logic in Python](#bool-Python)
    * [Boolean Operators and Arrays in NumPy](#bool-operators-arrays)
    * [Boolean Functions](#bool-functions)
        * [Where](#where)
        * [All](#all)
        * [Any](#any)
* [Conclusion](#conclusion)

# <a id='preface'>Preface</a>

This notebook presents a concise introduction to NumPy together with some applications. Many articles and videos cover NumPy at various levels of detail, including the [NumPy documentation](https://numpy.org/doc/1.26/). You are free to explore them; however, I highly recommend the [Python Data Science Handbook by Jacob T. VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html) as well. The author clearly explains what goes on behind NumPy and shows the essential functionality of the library.

As the author of this notebook, my objective is not so much to teach you NumPy (although I am supposed to) as to encourage you to apply your math in practice. Once we have studied the library sufficiently, we will use NumPy to put that knowledge into practice.

Who knows, maybe you will develop your own neural network with NumPy from scratch in the future?

My note for beginners in Python: do not worry about the difficulty of the course. Python basics are essential at first, but studying a library is mostly about learning its functions (methods), attributes, and their applications. You will experience the importance of NumPy in computational mathematics, so we will be diving into a different side of Python.

If you are studying fields not related to IT, such as chemical engineering or petroleum engineering, it is very likely that you will learn things that university will never teach you :) So, try to enjoy it. Have a nice reading!


Warm Regards,

Mahammad Mehdi, tutor


# <a id='installation'>Installation</a>

You just need to uncomment and run the cell below. pip is Python's package installer. It downloads packages from [PyPI](https://pypi.org/), a central repository of Python packages. Many developers publish their libraries there so that others can install them easily. NumPy is one of these libraries, and its use is explained in the following sections.

In [17]:
# !pip install numpy -q

# <a id='introduction'>Introduction</a>

NumPy is a Python library that is widely used in the data science community. The primary reasons are its speed, lower memory consumption, and large collection of mathematical functions. In NumPy, instead of lists or tuples, we use arrays with one or more dimensions to store and process data.

## <a id='comparison'>Comparison between NumPy Arrays and Lists</a>

* Arrays must contain a single type of data, whereas lists accept any mix of types. Each element of a list carries its own class information (int, float, another list, tuple, and so on). An array, by contrast, is told the data type once, up front, so it can treat all of its elements uniformly instead of inspecting them one by one.

    Think about this example. You are organizing a party for university staff and students. The party is your list, and the names of the participants, written on a sheet of paper, are its values. Next to each name, the person's class (think of it as a group of people), such as student, teacher, professor, or cleaner, is written as well. Storing both the value and the class for every entry consumes memory.

    Now you want to organize another party just for students, excluding everyone else from the university. This party is your array, and the names on the paper are again its values. This time the class information, student, is written once at the top of the paper instead of next to every name. You know that everyone at the party is a student, so you avoid repeating the same class info.
    
    This scenario shows the crucial difference between NumPy arrays and Python lists. Always keep it in mind.

    The data type of an array can be changed later, but the conversion must be valid. For example, you can't convert a non-numeric string to int or float. The plain-Python call below raises ValueError for the same reason:
                                           int('123a')


* This feature reduces memory consumption because the same class information is not repeated for every element. It also makes processing faster. NumPy's core is written in C, which contributes to the speed, since compiled C code generally runs much faster than equivalent Python loops.


* NumPy provides element-wise operations between a scalar and a vector, a scalar and a matrix, a vector and a matrix, and two matrices. The dot product is easy to compute as well. There are many useful functions for element-wise operations and the dot product, and for computing the mean, median, maximum, minimum, variance, standard deviation, and more. We will explore more of NumPy's mathematical functionality throughout the notebook.
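
The memory argument above can be checked with a rough sketch. It assumes CPython, where sys.getsizeof reports the size of a single object; the variable names are illustrative, and the exact figure for the list varies between Python versions.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list stores references to separate int objects, and every int object
# carries its own type information.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores the raw 8-byte values; the type is recorded once in dtype.
array_bytes = arr.nbytes

print("list :", list_bytes, "bytes (approximate)")
print("array:", array_bytes, "bytes")
```

The array needs exactly 1000 * 8 bytes for its data, while the list needs several times more because each integer is a full Python object.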

## <a id='import'>Importing</a>

By importing, we make the library's functions available in our running notebook. The generally accepted alias for NumPy is np; however, you are free to choose your own. In any case, I prefer to stick with np.

In [18]:
import numpy as np

## <a id='create_array'>Creating an Array and Data Types</a>

### <a id='simple_array'>Simple Array</a>

Simply, we create a list (or tuple) and convert it to the array.

In [19]:
# Create a list
xs = [1, 2, 3, 4]

# Convert it to a NumPy array
arr = np.array(xs)
print(arr)


[1 2 3 4]


### <a id='data_types'>Data Types</a>

Just like in Python, NumPy also recognizes specific data types. 

The below shows examples for Python:
- int
- float
- str

In NumPy, we have similar data types, each with a fixed size in memory:
- int8 
- int16
- int32
- int64
- float16
- float32
- float64

The number is the number of bits used for each element of the array. For instance, an array with the data type int8 stores each value in 8 bits, or 1 byte.

NumPy also has a double type, which is equivalent to float64. This is worth knowing for interviews, where double is sometimes mistaken for one of Python's built-in data types.

You might want to explore more from the [documentation](https://numpy.org/doc/stable/user/basics.types.html).
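
To make the bit sizes concrete, here is a small sketch that prints the value range of each integer type with np.iinfo and the size in bytes of each float type. It only inspects the types listed above.

```python
import numpy as np

# Integer types: the number of bits limits the range of storable values
for int_type in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(int_type)
    print(info.dtype, info.min, info.max)

# Float types: itemsize is the number of bytes per element
for float_type in (np.float16, np.float32, np.float64):
    print(np.dtype(float_type), np.dtype(float_type).itemsize)

# double is just another name for float64
same_type = np.dtype(np.double) == np.dtype(np.float64)
print(same_type)
```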

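Let's go back to the previous array and see its data type by using the **dtype** attribute. The default integer type depends on your platform and NumPy version; it is int32 in the output below and int64 on many other systems.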

In [20]:
arr.dtype

dtype('int32')

We might want to change the data type.

In [21]:
arr.astype(np.int8)

array([1, 2, 3, 4], dtype=int8)

However, the dtype of arr itself is still the same.

In [22]:
arr.dtype

dtype('int32')

The astype function converts the data from one type to another. It returns a new array and leaves the original unchanged, so you need to assign the result to a variable to keep it.

In [23]:
arr = arr.astype(np.int8)

In [24]:
arr.dtype

dtype('int8')

In [25]:
arr

array([1, 2, 3, 4], dtype=int8)

We might also create an array with pre-defined data type.

In [26]:
xs = [1, 2, 3, 4]
arr = np.array(xs, dtype = np.float16)
print(arr)

[1. 2. 3. 4.]


In [27]:
arr.dtype

dtype('float16')

We created the array with the 16-bit float data type.

### <a id='range'>Array Range</a>

Just like the range function in Python, NumPy provides the arange function, which creates an array of evenly spaced values from a start value up to, but not including, a stop value.

In [28]:
# We create a list from 0 up to (but not including) 10 with a step of 1
list(range(0, 10, 1))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [29]:
# We create an array from 0 up to (but not including) 10 with a step of 1
np.arange(0,10,1)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [30]:
# Similarly, with a step of 2
np.arange(2, 11, 2)

array([ 2,  4,  6,  8, 10])

Unlike the range function, which only accepts integers, arange also accepts a float step.

In [31]:
np.arange(2, 11, 0.5)

array([ 2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,
        7.5,  8. ,  8.5,  9. ,  9.5, 10. , 10.5])

In [32]:
np.arange(2.0, 11.0, 1)

array([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

### <a id='random'>Random Arrays</a>

Creating "randomized" arrays is also possible in NumPy. For comparison, the code below first does the same with Python's random library.

In [33]:
from random import randint

# We create a list of 10 "random" integers from 1 to 20 (both ends included)
xs = [randint(1, 20) for _ in range(10)]
xs

[1, 1, 6, 5, 14, 17, 1, 16, 1, 11]

In [34]:
# Similarly, we create an array of 10 "random" integers; note that the upper bound is excluded here, so the values range from 1 to 19
arr = np.random.randint(1, 20, 10)
arr

array([16, 17, 15, 12, 17, 16, 12,  2,  7, 10])

In [35]:
arr.dtype

dtype('int32')
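
If you need the same "random" numbers on every run, one option is to create a generator with a fixed seed. This is a sketch using np.random.default_rng, NumPy's newer random interface, rather than the np.random.randint call used above; the seed 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# high is excluded, so high=21 yields integers from 1 to 20
arr = rng.integers(low=1, high=21, size=10)
print(arr)

# A second generator with the same seed reproduces the same values
arr_again = np.random.default_rng(42).integers(low=1, high=21, size=10)
print(np.array_equal(arr, arr_again))
```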

## <a id='attributes'>Other Attributes</a>

As briefly introduced, **dtype** is an attribute of NumPy arrays. Here are some more attributes:
- **dtype**: the data type of an array.
- **shape**: the shape of an array. Until now, we have only worked with 1D arrays.
- **ndim**: the number of dimensions. Again, we have only worked with 1D arrays.
- **size**: the size of an array, that is, how many elements it contains.
- **nbytes**: the total memory occupied by the array's elements, in bytes.
- **itemsize**: the memory occupied by one element of the array, in bytes.


In [36]:
# Here is a nested list to be converted to an array. The list contains 2 lists, each of which has 3 values
xs = [
    [1, 2, 3],
    [4, 5, 6]
]

print(xs)

[[1, 2, 3], [4, 5, 6]]


In [37]:
arr = np.array(xs)
print(arr)

[[1 2 3]
 [4 5 6]]


In [38]:
arr.dtype

dtype('int32')

2 lists with 3 values each, so the shape is (2, 3)

In [39]:
arr.shape

(2, 3)

The number of dimensions is equal to the number of values in the shape attribute. The shape (2, 3) indicates 2 dimensions.

In [40]:
arr.ndim

2

In total, we have 6 values

In [41]:
arr.size

6

The following attribute tells us how many bytes the array's data occupies. In our case, it is 24 bytes.

In [42]:
arr.nbytes

24

The number of bytes per element is given by the itemsize attribute. In our case, it is 4 bytes.

In [43]:
arr.itemsize

4

Notice that nbytes equals itemsize multiplied by size.

In [44]:
print('Estimated the memory consumed by the array:', arr.size * arr.itemsize)
print('Actual memory:', arr.nbytes)

Estimated the memory consumed by the array: 24
Actual memory: 24
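
The relation between nbytes, size, and itemsize holds for every data type. The sketch below sets the dtype explicitly, since the default integer type differs between platforms.

```python
import numpy as np

xs = [[1, 2, 3], [4, 5, 6]]

for dtype in (np.int8, np.int32, np.int64, np.float64):
    arr = np.array(xs, dtype=dtype)
    # size is always 6 here; only itemsize changes with the dtype
    print(arr.dtype, arr.itemsize, arr.nbytes, arr.size * arr.itemsize == arr.nbytes)

small = np.array(xs, dtype=np.int8)
large = np.array(xs, dtype=np.int64)
```

The same six values take 6 bytes as int8 and 48 bytes as int64, which is why choosing a suitable dtype matters for large arrays.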


After this gentle introduction, we will continue with the [NumPy documentation](https://numpy.org/doc/1.26/). Throughout the rest of the notebook, you will read about many functions, categorized with references to the corresponding docs.

# <a id='math'>Mathematical Functions</a>

[reference](https://numpy.org/doc/stable/reference/routines.math.html)

Functions to cover:
* add - adding two arrays
* subtract - subtracting one array from another
* multiply - multiplying two arrays
* divide - dividing one array by another
* sum - summing all elements of the array
* prod - getting product of all elements of the array
* max - getting the maximum value from the array
* min - getting the minimum value from the array

They do simply what their names suggest. The add function adds two arrays, for instance. Let's create a couple of randomized arrays and apply the above functions.

In [45]:
arr1 = np.random.randint(1, 10, size = 5)
arr2 = np.random.randint(1, 10, size = 5)

In [46]:
arr1

array([1, 2, 3, 5, 1])

In [47]:
arr2

array([5, 9, 9, 9, 3])

In [32]:
# element-wise addition
np.add(arr1, arr2)

array([ 6, 11, 12, 14,  4])

In [33]:
# element-wise subtraction
np.subtract(arr1, arr2)

array([-4, -7, -6, -4, -2])

In [34]:
# element-wise multiplication
np.multiply(arr1, arr2)

array([ 5, 18, 27, 45,  3])

In [35]:
# element-wise division
np.divide(arr1, arr2)

array([0.2       , 0.22222222, 0.33333333, 0.55555556, 0.33333333])

Of course, there is another way to do the same, which is easier to write.

In [36]:
arr1 + arr2

array([ 6, 11, 12, 14,  4])

In [37]:
arr1 - arr2

array([-4, -7, -6, -4, -2])

In [38]:
arr1 * arr2

array([ 5, 18, 27, 45,  3])

In [39]:
arr1 / arr2

array([0.2       , 0.22222222, 0.33333333, 0.55555556, 0.33333333])

The operator syntax above works on NumPy arrays and returns exactly the same output as the functions before.
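
As a quick check of this equivalence, the sketch below uses two fixed arrays (so that the results are reproducible) and compares each function with its operator. It also shows an operation between an array and a scalar, which is applied to every element.

```python
import numpy as np

a = np.array([1, 2, 3, 5, 1])
b = np.array([5, 9, 9, 9, 3])

# Each function gives the same result as the matching operator
print(np.array_equal(np.add(a, b), a + b))
print(np.array_equal(np.subtract(a, b), a - b))
print(np.array_equal(np.multiply(a, b), a * b))
print(np.array_equal(np.divide(a, b), a / b))

# A scalar is combined with every element of the array
doubled = a * 2
shifted = a + 10
print(doubled)
print(shifted)
```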

The other functions are used as follows.

In [40]:
arr1

array([1, 2, 3, 5, 1])

In [41]:
np.sum(arr1)

12

In [42]:
np.prod(arr1)

30

In [43]:
np.max(arr1)

5

In [44]:
np.min(arr1)

1

# <a id='manipulation'>Array Manipulation Routines</a>

[reference](https://numpy.org/doc/stable/reference/routines.array-manipulation.html)

Functions to cover:
* reshape
* concatenate
* transpose

## <a id='reshape'>Reshape</a>

We created the following array earlier when discussing dimensions. But what if I want the shape (3, 2) instead of (2, 3), with the same elements in the same order? Let's first do it manually!

In [45]:
xs = [
    [1, 2, 3],
    [4, 5, 6]
]

arr = np.array(xs)
print("array:")
print(arr)
print()
print("Shape:", arr.shape)

array:
[[1 2 3]
 [4 5 6]]

Shape: (2, 3)


In [46]:
xs = [
    [1, 2],
    [3, 4],
    [5, 6]
]


arr = np.array(xs)
print("array:")
print(arr)
print()
print("Shape:", arr.shape)

array:
[[1 2]
 [3 4]
 [5 6]]

Shape: (3, 2)


It is like laying out all the elements in the same sequence and regrouping them into a different shape. There is a useful function that does this.

In [47]:
arr

array([[1, 2],
       [3, 4],
       [5, 6]])

In [48]:
arr.shape

(3, 2)

In [49]:
arr.reshape(2, 3)

array([[1, 2, 3],
       [4, 5, 6]])

The reshape function returns a new array object and leaves the shape of the original unchanged. So again, you need to assign the result to a variable to keep it.

In [50]:
print("Previous shape:", arr.shape)
arr = arr.reshape(2, 3)
print("Final shape:", arr.shape)

Previous shape: (3, 2)
Final shape: (2, 3)
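
Two details of reshape are worth a quick sketch: the new shape must hold exactly the same number of elements, and one dimension can be given as -1 so that NumPy works it out from the size.

```python
import numpy as np

arr = np.arange(1, 7)          # 6 elements: 1..6

# -1 asks NumPy to infer that dimension: 6 elements / 3 columns = 2 rows
inferred = arr.reshape(-1, 3)
print(inferred.shape)

# 6 elements cannot fill a 4 x 2 shape, so NumPy raises ValueError
try:
    arr.reshape(4, 2)
    failed = False
except ValueError as error:
    failed = True
    print("ValueError:", error)
```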


## <a id='concatenate'>Concatenate</a>

Imagine you have a couple of lists and you want to combine them into a single list. In Python, this is called concatenation.

In [51]:
xs = [1, 5, 17]
ys = [38, 18, 5]

zs = xs + ys

In [52]:
# We combined lists to get this
zs

[1, 5, 17, 38, 18, 5]

This is not limited to lists. You can do the same with strings and tuples, but that is outside our scope. So, let's apply similar syntax to NumPy arrays.

In [53]:
arr1 = np.array([1, 5, 17])
arr2 = np.array([38, 18, 5])

In [54]:
arr1 + arr2

array([39, 23, 22])

As covered previously, + performs addition rather than concatenation on NumPy arrays. To concatenate two different arrays into one single array, we use the concatenate function.

In [55]:
np.concatenate([arr1, arr2])

array([ 1,  5, 17, 38, 18,  5])

In [56]:
# or
np.concatenate((arr1, arr2))

array([ 1,  5, 17, 38, 18,  5])

You might notice that the function expects a collection of arrays to concatenate. The examples above pass a list and a tuple of arrays. I personally prefer passing a tuple, but you are free to choose either. To avoid confusion due to double round brackets, I will use a list of arrays from here on. Let's have another example!

In [57]:
arr1 = np.random.randint(1, 20, size = 5)
arr2 = np.random.randint(1, 20, size = 3)
arr3 = np.random.randint(1, 20, size = 2)
arr4 = np.random.randint(1, 20, size = 6)
arr5 = np.random.randint(1, 20, size = 7)

In [58]:
newarr = np.concatenate(
    [arr1, arr2, arr3, arr4, arr5]
)
newarr

array([ 4,  1, 17, 10, 11,  5, 10,  2, 15, 16,  8, 14,  4, 14,  8, 18, 16,
       17, 10,  9,  8,  2,  5])

In [59]:
newarr.shape

(23,)

We created 5 randomly generated arrays of different sizes and combined them into one single array.

We will cover concatenating multi-dimensional arrays in further tutorials.

Keyword to look up before that tutorial: axis in NumPy

## <a id='transpose'>Transpose</a>

NumPy arrays have a **T** attribute, which gives access to the transposed version of the array.

In [60]:
arr = np.arange(1, 21)
arr = arr.reshape(5, 4)

In [61]:
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [62]:
# Transpose of the matrix
arr.T

array([[ 1,  5,  9, 13, 17],
       [ 2,  6, 10, 14, 18],
       [ 3,  7, 11, 15, 19],
       [ 4,  8, 12, 16, 20]])

In [63]:
arr.shape

(5, 4)

In [64]:
arr.T.shape

(4, 5)

NOTE: DO NOT CONFUSE TRANSPOSE WITH RESHAPE! THEY ARE FUNDAMENTALLY DIFFERENT.

Reshaping keeps the elements in the same row-by-row sequence and only regroups them into rows of a different length. Transposing flips the array over its main diagonal, so rows become columns.

Check the below example.

In [65]:
arr = np.arange(1, 13)
arr = arr.reshape(3, 4)
arr

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [66]:
arr.shape

(3, 4)

Let's reshape it!

In [67]:
arr_reshaped = arr.reshape(4, 3)
arr_reshaped

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Let's get transpose!

In [68]:
arr_transpose = arr.T
arr_transpose

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [69]:
print("The shape of reshaped array:", arr_reshaped.shape)
print("The shape of transpose array:", arr_transpose.shape)

The shape of reshaped array: (4, 3)
The shape of transpose array: (4, 3)


Both new arrays have the same shape, but when you look at the values closely, you will see that they are arranged differently. Unlike the reshaped array, the transposed array does not keep the order 1-12 along the rows. Instead, the 1-12 order is preserved when reading down the columns, because we flipped the matrix over its diagonal.
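
The difference can also be stated element by element: in the transpose, the element at row j and column i is the element at row i and column j of the original. A small sketch with the same 3 x 4 array:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)
reshaped = arr.reshape(4, 3)
transposed = arr.T

# Same shape, different arrangement
same_values = np.array_equal(reshaped, transposed)
print(same_values)

# Transpose swaps row and column indexes
swapped = all(
    transposed[j, i] == arr[i, j]
    for i in range(arr.shape[0])
    for j in range(arr.shape[1])
)
print(swapped)
```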

For the record, transpose will make it easier to implement the optimization of ML models in the future.

It is also worth remembering that NumPy has a transpose function.

In [70]:
arr.transpose()

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [71]:
np.transpose(arr)

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

I am not going into more detail, but please note that this function is especially useful when you are dealing with arrays of three or more dimensions. How could you transpose an array with 3 or more dimensions?
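
As a hint for that question, here is a minimal sketch. For a 3D array, T reverses the order of all axes, while the axes argument of np.transpose lets you choose the new order yourself. The shape (2, 3, 4) is just an example.

```python
import numpy as np

arr3d = np.arange(24).reshape(2, 3, 4)

# T reverses the axes: (2, 3, 4) -> (4, 3, 2)
print(arr3d.T.shape)

# axes=(1, 0, 2) swaps the first two axes and keeps the last one
swapped = np.transpose(arr3d, axes=(1, 0, 2))
print(swapped.shape)

# The element at [i, j, k] moves to [j, i, k]
print(swapped[2, 1, 3] == arr3d[1, 2, 3])
```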

# <a id='indexing'>Indexing on ndarrays</a>

[reference](https://numpy.org/doc/stable/user/basics.indexing.html)

Indexing a 1D array works just like indexing a list.

In [1]:
xs = [1, 5, 10, 4, 2, 18, 20, 13, 0, 19]

print(xs[0])
print(xs[1])
print(xs[2])

1
5
10


Let's convert it into the array.

In [7]:
arr = np.array(xs)

print(arr[0])
print(arr[1])
print(arr[2])

1
5
10


One advantage of NumPy arrays is that you can request multiple indexes at once. Let's say I want to get the elements at the above indexes.

In [74]:
# Just don't forget the double brackets. We pass a list of the indexes we want.
arr[[0, 1, 2]]

array([ 1,  5, 10])

In [8]:
# or
indexes = [0, 1, 2]
arr[indexes]

array([ 1,  5, 10])

In [10]:
# A tuple does not work here: it is read as one index per dimension, i.e. arr[0, 1, 2]
indexes = (0, 1, 2)
arr[indexes]

IndexError: too many indices for array: array is 1-dimensional, but 3 were indexed
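
The error message follows from how NumPy reads a tuple: one index per dimension, so arr[(0, 1, 2)] means the same as arr[0, 1, 2]. The sketch below shows that the tuple form is valid when the array really has that many dimensions; the 3D array is only an illustration.

```python
import numpy as np

arr = np.array([1, 5, 10, 4, 2, 18, 20, 13, 0, 19])

# A list selects several elements along the first (and only) dimension
picked = arr[[0, 1, 2]]
print(picked)

# A tuple is one index per dimension, which fits a 3D array
cube = np.arange(24).reshape(2, 3, 4)
print(cube[(0, 1, 2)] == cube[0, 1, 2])

# On the 1D array, the same tuple asks for 3 dimensions and fails
try:
    arr[(0, 1, 2)]
    failed = False
except IndexError:
    failed = True
print(failed)
```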

Slicing also works the same way in 1D arrays.

In [76]:
xs[0:3]

[1, 5, 10]

In [77]:
arr[0:3]

array([ 1,  5, 10])

Slicing with step.

In [78]:
xs[1:9:2]

[5, 4, 18, 13]

In [79]:
arr[1:9:2]

array([ 5,  4, 18, 13])

What if we have a multi-dimensional array instead of just a 1D array?

In [13]:
arr = np.arange(1, 21).reshape(4, 5)
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

Let's first understand what we get when we ask for index 0.

In [81]:
arr[0]

array([1, 2, 3, 4, 5])

Index 0 returns a 1D array from the 2D array: the first row of the matrix.

Can we also get number 4 from this array? Similarly, we do the below.

In [82]:
newarr = arr[0]
newarr[3]

4

In [83]:
# or
arr[0][3]

4

It might remind you of a nested list, a list containing other lists, from which we get values in the same way. However, NumPy offers a different way of indexing, too.

In [84]:
arr[0, 3]

4

You just put a comma and it returns what you need. 

In mathematical notation, for a matrix A, you would refer to that value as shown in the picture below.

![notation.png](attachment:notation.png)

Notice that Python indexing starts at 0, so you need to subtract 1 from the mathematical (1-based) indexes to get the right values.

Let's say I want to get the element 14 from the array. As we can see, it is in the 3rd row and 4th column. Remember that index 0 returned the **1st** row. So the 3rd row has index 2, and the 4th column has index 3.

In [85]:
arr[2, 3]

14

In [86]:
# or imagine like this
row_index = 3
column_index = 4
arr[row_index - 1, column_index - 1]

14

Slicing an ndarray. See the following example. I take the first row again, and I want to get 3 and 4 together in one slice.

In [87]:
arr[0]

array([1, 2, 3, 4, 5])

In [88]:
newarr = arr[0]
newarr[2:4]

array([3, 4])

In [89]:
# or
arr[0][2:4]

array([3, 4])

Again NumPy says...

In [90]:
arr[0, 2:4]

array([3, 4])

What we have done is get the third and fourth column values of the first row.

You can also get the first 2 rows

In [91]:
arr[:2]

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

And extract second column from there as an example.

In [92]:
arr[:2, 1]

array([2, 7])

Let's make it more interesting. Slicing both rows and columns at the same time. 

In [15]:
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

In [94]:
arr[:2, :2]

array([[1, 2],
       [6, 7]])

I got a smaller matrix out of the matrix by slicing the first two rows and the first two columns.

See the other examples.

In [95]:
# third and fourth rows together with third and fourth columns
arr[2:4, 2:4]

array([[13, 14],
       [18, 19]])

In [18]:
# first two rows, and every second column starting from the first (three columns in total)
arr[:2, 0::2]

array([[ 1,  3,  5],
       [ 6,  8, 10]])

In [97]:
# calling the first and fifth columns from 1st-3rd rows
arr[0:3, [0, 4]]

array([[ 1,  5],
       [ 6, 10],
       [11, 15]])

In [98]:
# calling the first and fourth rows from 2nd-4th columns
arr[[0, -1], 1:4]

array([[ 2,  3,  4],
       [17, 18, 19]])

In [1]:
# calling these indexes - 1st row 3rd column, 4th row 1st column, 1st row 5th column, and 3rd row 5th column
arr[[0, 3, 0, 2], [2, 0, 4, 4]]

array([ 3, 16,  5, 15])

In [19]:
# Same as the previous example, but the extra brackets around the row indexes make the result a 2D array
arr[[[0, 3, 0, 2]], [2, 0, 4, 4]]

array([[ 3, 16,  5, 15]])
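
When both the row index and the column index are lists, NumPy pairs them up position by position. A sketch of the equivalent plain-Python loop, with the index lists named rows and cols for illustration:

```python
import numpy as np

arr = np.arange(1, 21).reshape(4, 5)
rows = [0, 3, 0, 2]
cols = [2, 0, 4, 4]

# Indexing with two lists picks the elements (0, 2), (3, 0), (0, 4), (2, 4)
picked = arr[rows, cols]

# The same selection written as an explicit loop over the index pairs
looped = [int(arr[r, c]) for r, c in zip(rows, cols)]

print(picked)
print(looped)
```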

# <a id='creation'>Array Creation Routines</a>

[reference](https://numpy.org/doc/stable/reference/routines.array-creation.html)

* array - already introduced 
* arange - already introduced 
* linspace
* zeros
* ones
* identity

## <a id='linspace'>Linspace</a>

Imagine you are asked to create an array of size 10 from 1 to 10 in which adjacent values are equally spaced. See the example below, implemented with np.arange.

In [101]:
np.arange(1, 11, 1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Here we didn't provide the size of the array; only start, stop, and step (the distance, in our context) were given. However, the question was different: you don't know the step. You are expected to build this array from start, stop, and **size**.

The linspace function helps in this case.

In [7]:
np.linspace(start = 1, stop = 10, num = 10)

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

The num parameter is the number of values in the array, from start to stop with both ends included. You will also observe that the returned array holds floats. The example below shows the reason.

What if I want to create an array from 1 to 10 with num = 21? What step would I need?

In [103]:
arr = np.linspace(1, 10, num = 21)
arr

array([ 1.  ,  1.45,  1.9 ,  2.35,  2.8 ,  3.25,  3.7 ,  4.15,  4.6 ,
        5.05,  5.5 ,  5.95,  6.4 ,  6.85,  7.3 ,  7.75,  8.2 ,  8.65,
        9.1 ,  9.55, 10.  ])

In [104]:
print('Difference between some adjacent values:')
print(arr[1] - arr[0])
print(arr[2] - arr[1])
print(arr[3] - arr[2])

Difference between some adjacent values:
0.44999999999999996
0.44999999999999996
0.4500000000000002


It seems that approximately 0.45 is the step needed to fit 21 samples in the range from 1 to 10. The tiny deviations in the printed differences come from floating-point rounding.
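
The step can also be computed directly: num values leave num - 1 equal gaps between start and stop. In addition, linspace can return the step it used through its retstep parameter. A short sketch:

```python
import numpy as np

start, stop, num = 1, 10, 21

# num values leave (num - 1) equal gaps between start and stop
expected_step = (stop - start) / (num - 1)

# retstep=True makes linspace return the array together with its step
arr, step = np.linspace(start, stop, num=num, retstep=True)

print(expected_step)
print(step)
print(np.isclose(step, expected_step))
```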

## <a id='zeros'>Zeros</a>

You can create a zero matrix as follows.

In [105]:
zero_matrix = np.zeros(5)
zero_matrix

array([0., 0., 0., 0., 0.])

In [106]:
zero_matrix = np.zeros((5, 5))
zero_matrix

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Notice how I passed the shape of the matrix as a tuple.

## <a id='ones'>Ones</a>

The same principle applies to ones.

In [107]:
one_matrix = np.ones(5)
one_matrix

array([1., 1., 1., 1., 1.])

In [108]:
one_matrix = np.ones((5, 5))
one_matrix

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

## <a id='identity'>Identity</a>

You can create an identity matrix.

In [109]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
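
These creation functions return float arrays by default. The sketch below passes the dtype parameter to get integers instead, and checks the defining property of the identity matrix with np.dot; the matrix A is made up for illustration.

```python
import numpy as np

# dtype controls the element type of the created array
zero_matrix = np.zeros((2, 3), dtype=np.int8)
one_matrix = np.ones((2, 3), dtype=np.int8)
print(zero_matrix.dtype, one_matrix.dtype)

# Multiplying a matrix by the identity matrix leaves it unchanged
A = np.arange(1, 10).reshape(3, 3)
I = np.identity(3)
unchanged = np.array_equal(np.dot(A, I), A)
print(unchanged)
```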

# <a id='bool'>Boolean Logic</a>

This time I refer to the [Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.06-boolean-arrays-and-masks.html), since boolean logic is well covered there, in my opinion.

However, you can also find [NumPy docs](https://numpy.org/doc/stable/reference/routines.logic.html) useful.

Things to cover:
* boolean logic in Python
* boolean operators and arrays in NumPy
* boolean functions

## <a id='bool-Python'>Boolean Logic in Python</a>

Boolean values in both Python and NumPy are True and False. We use the following operators in Python to work with boolean logic.

In [54]:
a = 3
b = 4

In [116]:
# b bigger than a?
b > a

True

In [117]:
# a bigger than b?
b < a

False

In [118]:
# a bigger than or equal to b?
b <= a

False

In [119]:
# equal?
a == b

False

In [120]:
# not equal?
a != b

True

In [55]:
# a bigger than 5?
a > 5

False

In [56]:
# a not bigger than 5?
not a > 5

True

In [122]:
# is a positive and odd number?
(a > 0) and (a % 2 == 1)

True

In [123]:
# is a positive or odd number?
(a > 0) or (a % 2 == 1)

True

In [157]:
# is a negative or even number?
(a < 0) or (a % 2 == 0)

False

In [125]:
bool(0)

False

In [126]:
bool(1)

True

In [9]:
bool('a')

True

## <a id='bool-operators-arrays'>Boolean Operators and Arrays in NumPy</a>

Operators:
* *>*
* <
* *>=*
* <=
* &
* |
* ~

& is the element-wise replacement for **and**

| is the element-wise replacement for **or**

~ is the element-wise replacement for **not**

Let's create a new array.

In [22]:
arr = np.array([2, 7, 4, 9, 10, 3])
arr

array([ 2,  7,  4,  9, 10,  3])

There is no element in the array lower than 2.

In [129]:
arr < 2

array([False, False, False, False, False, False])

There is only one element in the array lower than or equal to 2.

In [130]:
arr <= 2

array([ True, False, False, False, False, False])

Three elements are lower than or equal to 4.

In [57]:
4 >= arr

array([ True, False,  True, False, False,  True])

Three elements are neither equal to nor lower than 4, that is, greater than 4.

In [66]:
~(4 >= arr)

array([False,  True, False,  True,  True, False])

This is what happens when you use **not** instead:

In [25]:
not (4 >= arr)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

We briefly investigated how an array behaves with a few operators. NumPy returns a new array of boolean values with the same shape as **arr**. We can assign it to a new variable.

In [3]:
newarr = (arr == 10)

In [133]:
newarr

array([False, False, False, False,  True, False])

In [134]:
newarr.dtype

dtype('bool')

In [135]:
newarr.itemsize

1

In [136]:
newarr.nbytes

6

In [137]:
# Searching for not being equal to 10
arr != 10

array([ True,  True,  True,  True, False,  True])

This time we searched for values equal to 10 and examined some attributes, such as dtype, itemsize, and nbytes. We see that the data type is bool and that each of the 6 elements takes 1 byte. In total, this array consumes 6 bytes of memory.

So, sometimes you might have multiple conditions instead of just one as shown above. What if I am looking for even values bigger than 5 in the array? Let's first apply this logic in plain Python.

In [138]:
arr

array([ 2,  7,  4,  9, 10,  3])

In [6]:
xs = [2, 7, 4, 9, 10, 3]

In [5]:
for element in xs:
    print(    
        (element > 5) and (element % 2 == 0),
        end = ' '    
    )

False False False False True False 

Notice that I defined two different conditions: element > 5 and element % 2 == 0. I wrapped each in brackets and joined them with **and** to state that both must hold, not just either one. Python also lets you omit the brackets here, but I prefer to keep them since they make the code more readable.

Nevertheless, brackets are a must in NumPy. We will explain why later.

Let's look at both conditions separately.

In [141]:
arr > 5

array([False,  True, False,  True,  True, False])

In [142]:
arr % 2 == 0

array([ True, False,  True, False,  True, False])

What we are looking for is True at the same index in both returned arrays. For instance, index 0 holds False in one array and True in the other. The same is true of indexes 1, 2, and 3, whereas the last index is False in both.

In [143]:
# Those indexes will return False for this reason
print(True and False)
print(False and True)
print(False and False)

False
False
False


Index 4, on the other hand, is True in both arrays. This is a good indication that our answer is at index 4!

In [144]:
# That index will return True for this reason
print(True and True)

True


Let's combine conditions.

In [145]:
(arr > 5) & (arr % 2 == 0)

array([False, False, False, False,  True, False])

Here we are! The second-to-last index is True while the others are False. To express **and**, we use **&** with NumPy arrays. If you use **and** instead, the following error occurs.

In [146]:
(arr > 5) and (arr % 2 == 0)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So, be careful with this case. The same goes for **or**. For instance, let's replace **and** with **or**.

In [147]:
(arr > 5) or (arr % 2 == 0)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Therefore, we use **|** instead.

In [148]:
(arr > 5) | (arr % 2 == 0)

array([ True,  True,  True,  True,  True, False])

Now, let's understand why most of the elements are True.

The first condition marks values bigger than 5 as True, as shown below.

In [149]:
arr > 5

array([False,  True, False,  True,  True, False])

What values are there that are not bigger than 5? Let's look at the array.

In [150]:
arr

array([ 2,  7,  4,  9, 10,  3])

The values at the False indexes are 2, 4, and 3.

Let's see the second condition.

In [151]:
arr % 2 == 0

array([ True, False,  True, False,  True, False])

Let's investigate different combinations of True and False with **or** operator.

In [152]:
print(True or True)
print(True or False)
print(False or True)

True
True
True


If we see True in the result of (arr > 5) | (arr % 2 == 0), it is because of one of the cases above. The case below, on the other hand, shows how we get False.

In [153]:
print(False or False)

False


3 sits at a False index since it is neither divisible by 2 nor bigger than 5.

In [154]:
arr

array([ 2,  7,  4,  9, 10,  3])

In [155]:
(arr > 5) | (arr % 2 == 0)

array([ True,  True,  True,  True,  True, False])
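As a side note, **&** and **|** on boolean arrays give the same result as the functions np.logical_and and np.logical_or. The short sketch below rebuilds arr from the values shown above and checks this. The variable names `both` and `either` are only there for illustration.

```python
import numpy as np

# Same values as the arr used in this section
arr = np.array([2, 7, 4, 9, 10, 3])

# Elementwise "and": True only where both conditions hold
both = (arr > 5) & (arr % 2 == 0)
# Elementwise "or": True where at least one condition holds
either = (arr > 5) | (arr % 2 == 0)

# The function forms give the same boolean arrays
print(np.array_equal(both, np.logical_and(arr > 5, arr % 2 == 0)))   # True
print(np.array_equal(either, np.logical_or(arr > 5, arr % 2 == 0)))  # True
```

Which form you use is mostly a matter of taste. The operators are shorter, while the functions do not depend on brackets around each condition.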

One more note about multiple conditions: be careful with brackets. Without brackets around each condition, the expressions below raise an error.

In [7]:
arr > 5 & arr % 2 == 0

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [9]:
arr > 5 | arr % 2 == 0

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
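Why do these fail? In Python, % binds more tightly than & and |, and those in turn bind more tightly than comparisons such as > and ==. So the first expression is read as the chained comparison arr > (5 & (arr % 2)) == 0, and a chained comparison needs a single True or False from the array in the middle. The sketch below, which rebuilds arr from the values shown above, reproduces the error and then the bracketed version. The names `raised` and `result` are only for illustration.

```python
import numpy as np

arr = np.array([2, 7, 4, 9, 10, 3])

# Without brackets, Python reads the expression as the chained comparison
#   arr > (5 & (arr % 2)) == 0
# which needs a single True/False from an array and therefore fails.
try:
    arr > 5 & arr % 2 == 0
    raised = False
except ValueError:
    raised = True
print(raised)   # True

# With brackets, each comparison is evaluated first, then combined elementwise.
result = (arr > 5) & (arr % 2 == 0)
print(result)
```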

Now we have the True and False indexes, but how do we use them? In the examples above, we read the matching values off the array by hand with the help of the boolean array.

Fortunately, a NumPy array can return a new array containing only the values that satisfy a given condition. Let's extract only the values bigger than 5.

In [12]:
arr

array([ 2,  7,  4,  9, 10,  3])

In [18]:
newarr = (arr > 5)
newarr

array([False,  True, False,  True,  True, False])

In [11]:
arr[newarr]

array([ 7,  9, 10])

As we can see, 7, 9, and 10 were returned in a new array. See the other examples below.

In [26]:
# This is the same as what we did above
arr[arr > 5]

array([ 7,  9, 10])

In [29]:
# odd numbers bigger than 5 are returned in a new array
# don't forget to add brackets around each condition
arr[(arr > 5) & (arr % 2 == 1)]

array([7, 9])
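Two details about this kind of selection are worth a quick sketch. First, the **~** operator flips a boolean array, so it selects the values that do not meet the condition. Second, selecting with a boolean array returns a copy, so changing the result does not change arr. The names `mask`, `selected`, and `rest` below are only for illustration.

```python
import numpy as np

arr = np.array([2, 7, 4, 9, 10, 3])
mask = arr > 5          # boolean array, same shape as arr

selected = arr[mask]    # values where mask is True
rest = arr[~mask]       # ~ flips True and False, so these are the values <= 5

# Boolean indexing returns a copy, so editing it leaves arr untouched
selected[0] = 100
print(selected)  # [100   9  10]
print(rest)      # [2 4 3]
print(arr)       # [ 2  7  4  9 10  3]
```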

## <a id='bool-functions'>Boolean Functions</a>

[reference](https://numpy.org/doc/stable/reference/routines.logic.html)

Functions to cover:
* where
* all
* any

### <a id='where'>Where</a>

Remember that we created a boolean array based on a given condition, such as arr > 5.

In [11]:
arr > 5

array([False,  True, False,  True,  True, False])

At which indexes are True values located?

In [13]:
i = 0
for boolean in arr > 5:
    if boolean:
        print(i, end = ' ')
    i += 1

1 3 4 

If we only want the indexes, we can use the where function. Where the hell are those True values?

In [36]:
np.where(arr > 5)

(array([1, 3, 4], dtype=int64),)
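Notice that the result is wrapped in a tuple: with only a condition, np.where returns one index array per dimension. For a 1-D array we can take the first item. The sketch below also shows the three-argument form, which picks values elementwise. The names `indexes` and `replaced` are only for illustration.

```python
import numpy as np

arr = np.array([2, 7, 4, 9, 10, 3])

# With only a condition, np.where returns a tuple: one index array per dimension
indexes = np.where(arr > 5)[0]
print(indexes)          # [1 3 4]

# Those indexes can be used to pull the values back out of arr
print(arr[indexes])     # [ 7  9 10]

# With three arguments, np.where picks elementwise: arr where True, 0 where False
replaced = np.where(arr > 5, arr, 0)
print(replaced)         # [ 0  7  0  9 10  0]
```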

### <a id='all'>All</a>

You are given a boolean array and asked to identify whether all the elements are True.

In [26]:
arr_all_True = np.array([True, True, True])
arr_not_all_True = np.array([False, True, True])

In [48]:
print(arr_all_True)
for boolean in arr_all_True:
    if not boolean:
        print('There is one False at least')
        break
else:
    print('All of them are True')

[ True  True  True]
All of them are True


In [49]:
print(arr_not_all_True)
for boolean in arr_not_all_True:
    if not boolean:
        print('There is one False at least')
        break
else:
    print('All of them are True')

[False  True  True]
There is one False at least


NumPy provides a function that returns True or False depending on whether all the elements are True.

In [33]:
np.all(arr_all_True)

True

In [34]:
np.all(arr_not_all_True)

False
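In practice, the boolean array usually comes from a condition rather than being typed by hand. A minimal sketch, reusing the values of arr from the earlier examples:

```python
import numpy as np

arr = np.array([2, 7, 4, 9, 10, 3])

# Is every value bigger than 1?  The comparison gives all True values.
print(np.all(arr > 1))   # True

# Is every value bigger than 5?  2, 4, and 3 fail, so the answer is False.
print(np.all(arr > 5))   # False

# The method form on the boolean array gives the same answer
print((arr > 5).all())   # False
```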

### <a id='any'>Any</a>

You are given a boolean array and asked to identify whether at least one element is True.

In [42]:
arr_any_True = np.array([False, True, True])
arr_all_False = np.array([False, False, False])

In [46]:
print(arr_any_True)
for boolean in arr_any_True:
    if boolean:
        print('There is one True at least')
        break
else:
    print('There is no True at all')

[False  True  True]
There is one True at least


In [47]:
print(arr_all_False)
for boolean in arr_all_False:
    if boolean:
        print('There is one True at least')
        break
else:
    print('There is no True at all')

[False False False]
There is no True at all


NumPy provides a function that returns True or False depending on whether at least one element is True.

In [51]:
np.any(arr_any_True)

True

In [52]:
np.any(arr_all_False)

False
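This is also what the earlier ValueError message was hinting at with a.any() and a.all(): an if statement needs a single True or False, and these functions collapse a boolean array into one. A minimal sketch, reusing the values of arr from the earlier examples (the names `message` and `mask` are only for illustration):

```python
import numpy as np

arr = np.array([2, 7, 4, 9, 10, 3])

# any() collapses the boolean array into one value that "if" can use
if np.any(arr > 5):
    message = 'At least one value is bigger than 5'
else:
    message = 'No value is bigger than 5'
print(message)

# "at least one is True" is the same as "not all are False"
mask = arr > 5
print(np.any(mask) == (not np.all(~mask)))   # True
```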

# <a id='conclusion'>Conclusion</a>

Well done! You have just finished the essential functionalities of NumPy, which we will benefit from throughout the course.

What we haven't covered is mainly the axis concept, which is postponed to another notebook since it is a bit harder to visualize. Furthermore, Linear Algebra will be one of our crucial topics. Broadcasting will be touched on as well, together with axes. You will also come back to what you learned in this notebook while dealing with axes. For example, some functions, such as sum, max, min, all, and any, take a parameter that defines the axis. Even an n-dimensional array can be transposed along different axes.

Because we will be applying (more or less) pure math in NumPy to strengthen our understanding in practice, NumPy is the primary library and deserves to be explained in greater detail than even Pandas and Matplotlib.

Imagine developing your own ML model in a script written purely in NumPy... (We will have fun with such things sometimes, not every time.)

So, be excited and enjoy learning!

Should you have any questions, just contact me :)