# **NumPy Basic Tutorial**



## Prerequisites
- Basic Python knowledge.
- A rudimentary understanding of how computers work.
- Fundamental linear algebra know-how.

## Introduction

### **What is NumPy?**

NumPy is short for "Numerical Python". It was created in 2005 by Travis Oliphant as an open source project to be used freely. The official documentation for NumPy defines NumPy as "the fundamental package for scientific computing in Python."

*So, NumPy is simply a Python library.*

Its importance lies in providing an efficient and convenient way to work with arrays. It provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays, including mathematics, sorting, shape manipulation, basic linear algebra, basic statistical operations, and random simulation, to name a few.

As w3schools puts it aptly,

- NumPy is a Python library.
- NumPy is used for working with arrays.
- NumPy is short for "Numerical Python".

### **Why NumPy?**

A question that novice users of this library often ask is why we use NumPy when Python already offers a built-in array-like type in the form of `lists`.

NumPy aims to provide an array object that is **up to 50 times faster than traditional Python lists**, not to mention the many supporting functions that make working with `ndarrays` (array objects in NumPy) much more convenient.

#### **Why is NumPy so quick in comparison to lists?**

Unlike lists, NumPy arrays are stored in one contiguous block of memory, and it is this contiguous storage that allows efficient access and manipulation of arrays.
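
The contiguous layout can be inspected directly. The following is a minimal sketch, assuming an explicit 64-bit integer `dtype` so that the byte counts are the same on every platform; the variable names are illustrative only.

```python
import numpy as np

# A 2-D array with an explicit 64-bit integer dtype (8 bytes per element).
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# True when the elements sit in one contiguous, row-major block of memory.
is_contiguous = arr.flags["C_CONTIGUOUS"]

# Bytes to step over to reach the next row and the next column, respectively.
strides = arr.strides

print(is_contiguous)   # True
print(arr.itemsize)    # 8
print(strides)         # (24, 8)
```

Each row holds 3 elements of 8 bytes, so the next row starts 24 bytes further along in the same block.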

### **Other notable differences between NumPy arrays and standard Python lists/sequences**

- NumPy arrays have a fixed size at creation. Python lists, conversely, can grow dynamically. Changing the size of an `ndarray` requires creating a new array and deleting the original.

- The elements in a NumPy array are all required to be of the same data type, and thus take up exactly the same size in memory.

- As mentioned earlier, NumPy arrays support numerous operations that typically execute more efficiently and with less code than their built-in Python counterparts.

- A growing number of other scientific and mathematical Python packages use NumPy arrays. They often convert input to NumPy arrays prior to processing and often output NumPy arrays, cementing NumPy's place as an essential package in the Python community.
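
The first three differences can be observed in a few lines. This is only an illustrative sketch; the variable names are made up for the example.

```python
import numpy as np

# Fixed size: np.append does not grow the array in place, it returns a new one.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.shape, b.shape)            # (3,) (4,)

# Single data type: mixing integers and a float stores every element as a float.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                 # float64

# Less code: arithmetic applies to every element without an explicit loop.
doubled_list = [x * 2 for x in [1, 2, 3]]
doubled_arr = a * 2
print(doubled_list, doubled_arr)   # [2, 4, 6] [2 4 6]
```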

## Installation

This section assumes that you already have Python and pip installed on your system.

Install NumPy using the following command on your terminal:

`pip install numpy`

If the command fails, use a Python distribution that comes with NumPy pre-installed, such as Anaconda.

Alternatively, you can use online Jupyter Notebook environments such as Google Colab or Kaggle, which have NumPy pre-installed. Such environments let anyone write and execute Python code in the browser, running entirely in the cloud.

## Tutorial

### **Importing NumPy**

After a successful installation, import NumPy into your project with the `import` statement.

In [1]:
import numpy

arr = numpy.array([1, 2, 3])
print(arr)

[1 2 3]


### **Importing NumPy as `np`**

NumPy is generally imported under the alias `np`. An alias, for those unaware, works exactly like an alternate name. This alias has been universally accepted by the Python community, and you are bound to find it almost everywhere NumPy is used. It also saves us the hassle of typing out the entire name when referring to the library.

An alias is created by using the `as` keyword while importing:

In [2]:
import numpy as np

arr = np.array([1, 2, 3])
print(arr)

[1 2 3]


### **Checking NumPy Version**

In [3]:
print(np.__version__)

1.21.6


### **Creating a NumPy `ndarray` Object**
A few code blocks ago, we created an `ndarray` by using the `array()` function. It accepts a `list`, `tuple` or any array-like object as a parameter and converts it into an `ndarray`. We can check the type of the object by using the built-in `type()` function.

In [4]:
arr1 = np.array([1, 2, 3])
arr2 = np.array((4, 5, 6))
print(f"arr1: {arr1}")
print(f"arr2: {arr2}")
print(f"arr1 type: {type(arr1)}")
print(f"arr2 type: {type(arr2)}")

arr1: [1 2 3]
arr2: [4 5 6]
arr1 type: <class 'numpy.ndarray'>
arr2 type: <class 'numpy.ndarray'>


### **Dimensions in Arrays**
Dimensions in arrays are implemented using nested arrays, i.e., arrays that have arrays as their elements.

Let's explore this in greater detail starting with:

#### **0-D Arrays**
0-D arrays are simply scalars in mathematical terms. NumPy arrays provide the `ndim` attribute, which returns an integer giving the number of dimensions the array has.

In [5]:
arr = np.array(8)
print(f"arr: {arr}")
print(f"\nNumber of Dimension/s: {arr.ndim}")

arr: 8

Number of Dimension/s: 0


#### **1-D Arrays**

1-D arrays are arrays that have 0-D arrays as their elements. They are also referred to as uni-dimensional arrays.

In [6]:
arr = np.array([1, 2, 3])
print(f"arr: {arr}")
print(f"\nNumber of Dimension/s: {arr.ndim}")

arr: [1 2 3]

Number of Dimension/s: 1


#### **2-D Arrays**
An array that has 1-D arrays as its elements is a 2-D array. These are often used to represent matrices or 2nd order tensors.

In [7]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"arr: {arr}")
print(f"\nNumber of Dimension/s: {arr.ndim}")

arr: [[1 2 3]
 [4 5 6]]

Number of Dimension/s: 2


#### **3-D Arrays**
An array that has 2-D arrays, i.e., matrices, as its elements is called a 3-D array. These are used to represent 3rd order tensors.

In [8]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(f"arr: {arr}")
print(f"\nNumber of Dimension/s: {arr.ndim}")

arr: [[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

Number of Dimension/s: 3


#### **Higher Dimensional Arrays**
Arrays can be n-dimensional, a fact highlighted by the name `ndarray` given to arrays in NumPy. A minimum number of dimensions can be requested explicitly by using the `ndmin` argument.

In [9]:
arr = np.array([1, 2, 3], ndmin = 8)
print(f"arr: {arr}")
print(f"\nNumber of Dimension/s: {arr.ndim}")

arr: [[[[[[[[1 2 3]]]]]]]]

Number of Dimension/s: 8


### **NumPy Array Indexing**
Array indexing, just like list indexing, is used to access an array element. NumPy arrays are zero-indexed, i.e., the first element has index 0, the second element has index 1, and so on.

In [10]:
arr = np.array([1, 2, 3])
print(f"1-st element: {arr[0]}")
print(f"2-nd element: {arr[1]}")

1-st element: 1
2-nd element: 2


Elements of 2-D and higher dimensional arrays are accessed using comma-separated integers, one index per dimension.

2-D arrays work like a table or matrix: the first index selects the row and the second index selects the column.

An element can also be reached by indexing one dimension at a time: first select the row, then select the element within that row. Slicing is covered in much greater detail later.

In [11]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
print(f"1-st element on 2-nd row of arr1: {arr1[1, 0]}")
print(f"1-st element on 2-nd row of arr1 (chained indexing): {arr1[1][0]}")

1-st element on 2-nd row of arr1: 4
1-st element on 2-nd row of arr1 (chained indexing): 4


Accessing 3-D arrays is explained in great detail by w3schools, as follows:

In [12]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr[0, 1, 2])

6


#### **Example Explained**
`arr[0, 1, 2]` prints the value `6`.

And this is why:

The first number represents the first dimension, which contains two arrays:

[[1, 2, 3], [4, 5, 6]]

and:

[[7, 8, 9], [10, 11, 12]]

Since we selected `0`, we are left with the first array:

[[1, 2, 3], [4, 5, 6]]

The second number represents the second dimension, which also contains two arrays:

[1, 2, 3]

and:

[4, 5, 6]

Since we selected `1`, we are left with the second array:

[4, 5, 6]

The third number represents the third dimension, which contains three values:

4

5

6

Since we selected `2`, we end up with the third value:

6
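
The walk-through above can be reproduced one index at a time. This is a small sketch; the intermediate names `first`, `second` and `third` are chosen only for illustration.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

first = arr[0]        # 2-D block selected by the first index
second = first[1]     # 1-D row selected by the second index
third = second[2]     # scalar selected by the third index

print(first.shape)    # (2, 3)
print(second)         # [4 5 6]
print(third)          # 6
```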

#### **Negative Indexing**
Negative indices are used to access an array from its end.

In [13]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Last element of last row: {arr[-1, -1]}")

Last element of last row: 6


### **NumPy Array Slicing**

Array slicing is similar to list slicing in its definition.

A slice takes the form `[start:end]`.

If a step is to be included, the form becomes `[start:end:step]`.

- If start is not passed, it is considered as 0. 

- If end is not passed, it is considered as the length of the array for that dimension.

- The default value for step is 1.

Negative slicing is also supported.

**Point of great importance: The start index is included, i.e., the element at that index will be a part of the output, while the end index is not included, i.e., the element at that index is not a part of the output.**

In [14]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(arr)
print(f"Elements from index-1 to index-5: {arr[1:5]}")
print(f"Elements from index-5 to the end of the array: {arr[5:]}")
print(f"Elements from the beginning to index-5 (not included): {arr[:5]}")
print(f"Elements from index-5 from the end to index-1 from the end: {arr[-5:-1]}")
print(f"Every other element from index-1 to index-5: {arr[1:5:2]}")


arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print("\n", arr)
print(f"From the first row, elements from index-1 to index-5: {arr[0, 1:5]}")
print(f"From both rows, elements from index-1 to index-3: {arr[0:, 1:3]}")

[1 2 3 4 5 6 7 8]
Elements from index-1 to index-5: [2 3 4 5]
Elements from index-5 to the end of the array: [6 7 8]
Elements from the beginning to index-5 (not included): [1 2 3 4 5]
Elements from index-5 from the end to index-1 from the end: [4 5 6 7]
Every other element from index-1 to index-5: [2 4]

 [[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
From the first row, elements from index-1 to index-5: [2 3 4 5]
From both rows, elements from index-1 to index-3: [[2 3]
 [7 8]]
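
The default values for start, end and step listed earlier in this section can be checked by comparing a shortened slice with its fully written form. This is a minimal sketch; the `same_*` names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Omitted bounds fall back to the defaults: start=0, end=length, step=1.
same_start = np.array_equal(arr[:3], arr[0:3])
same_end = np.array_equal(arr[5:], arr[5:len(arr)])
same_step = np.array_equal(arr[1:5], arr[1:5:1])
print(same_start, same_end, same_step)   # True True True

# A negative step walks the array backwards.
reversed_arr = arr[::-1]
print(reversed_arr)                      # [8 7 6 5 4 3 2 1]
```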


### **NumPy Array Iterating**

Iterating over arrays is similar to iterating through Python lists/tuples.

Iterating, for the uninitiated, means going through the elements one by one. It is implemented using the trusted `for` loop and is best understood through multiple examples.

In [15]:
arr = np.array([1, 2, 3])

print("Iterating through a 1-D array (on the elements):")
for x in arr:
  print(x)


Iterating through a 1-D array (on the elements):
1
2
3


In [16]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Iterating through a 2-D array (on the elements):")
for x in arr:
  print(x)


Iterating through a 2-D array (on the elements):
[1 2 3]
[4 5 6]


As seen, the loop iterates through each row instead of each scalar element: it prints each row in its entirety rather than the scalars it contains. In general, iterating over an n-D array yields its (n-1)-D sub-arrays one by one. To get the actual values, i.e., the scalars, iterate over each sub-array as well, as in the following piece of code.

In [17]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
  for y in x:
    print(y)

1
2
3
4
5
6


The same concept applies to 3-D and higher dimensional arrays.

In [18]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  print(x)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In [19]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  for y in x:
    for z in y:
      print(z)

1
2
3
4
5
6
7
8
9
10
11
12
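
Nested loops need one more level for every extra dimension. As an alternative sketch (not part of the walk-through above), the `flat` attribute and `np.ndenumerate()` visit every scalar regardless of how many dimensions the array has; the names `values` and `found` are illustrative.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# arr.flat is an iterator over every scalar, whatever the number of dimensions.
values = [int(x) for x in arr.flat]
print(values)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# np.ndenumerate also yields the index of each scalar as a tuple.
found = [index for index, value in np.ndenumerate(arr) if value == 6]
print(found)    # [(0, 1, 2)]
```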


### **NumPy Array Shape**
The shape of an array is the number of elements in each dimension.

To find the shape of an `ndarray`, use its `shape` attribute. It returns a tuple in which each index corresponds to a dimension and the value at that index is the number of elements along that dimension.


In [20]:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(f"arr shape: {arr.shape}")

arr shape: (2, 5)


(2, 5) demonstrates that the array has 2 dimensions (indicated by the number of elements in the tuple). The first dimension contains 2 elements (corresponding to the 2 rows in the above example) and the second has 5 (5 columns).
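
The relationship between `shape`, `ndim` and the total number of elements can be checked directly. This short sketch uses the `size` attribute, which is not introduced elsewhere in this tutorial.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr.shape)        # (2, 5)
print(len(arr.shape))   # 2, the same value as arr.ndim
print(arr.size)         # 10, the product of the entries in shape
```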

### **NumPy Array Copy and View**
What are these two new terms? As the names suggest, a copy is a new array that duplicates the original array, while a view is simply another way of looking at the original array's data.

- The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.

- The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

The difference between a copy and a view is best explained through examples. Let us make an array and a copy of the array, change the original array, and display the two of them.

**The copy is not affected by the changes made to the original array.**

In [21]:
arr_original = np.array([1, 2, 3, 4, 5])
arr_copy = arr_original.copy()
arr_original[0] = 8
print(f"The original array is: {arr_original}")
print(f"The copy of the array is: {arr_copy}")

The original array is: [8 2 3 4 5]
The copy of the array is: [1 2 3 4 5]


Let us replicate the same procedure but make a view of the array instead.

In [22]:
arr_original = np.array([1, 2, 3, 4, 5])
arr_view = arr_original.view()
arr_original[0] = 8
print(f"The original array is: {arr_original}")
print(f"The view of the array is: {arr_view}")

The original array is: [8 2 3 4 5]
The view of the array is: [8 2 3 4 5]


Thus, **the view is affected by the changes made to the original array.** 

Let us create an array, make a view, change the view and display both the arrays.

In [23]:
arr_original = np.array([1, 2, 3, 4, 5])
arr_view = arr_original.view()
arr_view[0] = 8
print(f"The original array is: {arr_original}")
print(f"The view of the array is: {arr_view}")

The original array is: [8 2 3 4 5]
The view of the array is: [8 2 3 4 5]


**The original array is affected by the changes made to the view.**

#### **Check if array is a copy or view through ownership**

In [24]:
arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)

None
[1 2 3 4 5]


The `base` of the copy is `None` because the copy owns its data. The `base` of the view is the original array, because the view borrows its data from that array.
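
The same `base` check shows that basic slicing returns a view rather than a copy. This is a minimal sketch; `part` and `safe` are illustrative names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

part = arr[1:4]            # basic slicing returns a view, not a copy
print(part.base is arr)    # True

part[0] = 99               # writing through the view changes the original
print(arr.tolist())        # [1, 99, 3, 4, 5]

safe = arr[1:4].copy()     # an explicit copy is independent of the original
safe[0] = 0
print(arr[1])              # 99
```

When a slice needs to be modified without touching the original array, calling `.copy()` on it is one straightforward option.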

### **NumPy Array Reshaping**
Reshaping means changing the shape of an array. By reshaping, we can add or remove dimensions or change the number of elements in each dimension.

In [25]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)

print(f"The original array: {arr}")
print(f"The re-shaped array is: {newarr}")

The original array: [ 1  2  3  4  5  6  7  8  9 10 11 12]
The re-shaped array is: [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [26]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(2, 3, 2)

print(f"The original array: {arr}")
print(f"The re-shaped array is: {newarr}")

The original array: [ 1  2  3  4  5  6  7  8  9 10 11 12]
The re-shaped array is: [[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


#### **Re-shape into any shape**
We can re-shape an array into any shape as long as the array contains exactly the number of elements that the new shape requires.

For example, a 1-D array with 8 elements can be reshaped into a (4, 2) array, as a (4, 2) array contains 4 x 2 = 8 elements. However, it cannot be reshaped into a (3, 3) array because that would require 3 x 3 = 9 elements, one more element than what is supplied.

In [27]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(3, 3)

print(f"The re-shaped array is: {newarr}")

ValueError: cannot reshape array of size 8 into shape (3,3)
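
If the shape comes from user input or a computation, the error can be caught instead of stopping the program. This is a sketch of one way to do so; the `reshaped` flag is an illustrative name.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

try:
    arr.reshape(3, 3)            # 9 slots requested, only 8 elements available
    reshaped = True
except ValueError as err:
    reshaped = False
    print(f"Reshape failed: {err}")

print(arr.reshape(4, 2).shape)   # (4, 2)
```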

#### **Unknown Dimension**
You are allowed one unknown dimension: pass `-1` for it and NumPy will calculate the value from the size of the array and the other dimensions.

In [28]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
newarr = arr.reshape(2, 2, -1)
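
Printing the shape shows the value NumPy worked out for the unknown dimension. This is a small sketch; `flat` is an illustrative name.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

newarr = arr.reshape(2, 2, -1)   # 8 / (2 * 2) = 2, so -1 becomes 2
print(newarr.shape)              # (2, 2, 2)

flat = newarr.reshape(-1)        # -1 on its own flattens the array to 1-D
print(flat.shape)                # (8,)
```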

### **NumPy Joining Array**
Joining means putting the contents of two or more arrays into a single array. Arrays are joined along an axis.

The `concatenate()` function accepts arrays to be joined along with the axis. If the axis is not explicitly stated, it takes the default value of 0.

Along axis-0:


In [29]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

[1 2 3 4 5 6]


Along axis-1:

In [30]:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
print(arr)

[[1 2 5 6]
 [3 4 7 8]]


#### **Stacking**
Stacking is the same as concatenation, except that stacking is done along a new axis.

Two 1-D arrays can be joined along a new second axis, which places them side by side as columns.

In [31]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.stack((arr1, arr2), axis=1)
print(arr)

[[1 4]
 [2 5]
 [3 6]]


To stack arrays horizontally (side by side), use `hstack()`. For 1-D arrays, this simply joins them end to end.

In [32]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.hstack((arr1, arr2))
print(arr)

[1 2 3 4 5 6]


To stack arrays vertically (one on top of the other, as rows), use `vstack()`.

In [33]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.vstack((arr1, arr2))
print(arr)

[[1 2 3]
 [4 5 6]]


To stack arrays along the third axis (depth), use `dstack()`.

In [34]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.dstack((arr1, arr2))
print(arr)

[[[1 4]
  [2 5]
  [3 6]]]
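
The joining functions above differ mainly in the shape of their result. The sketch below collects those shapes for the same two 1-D inputs; the `shapes` dictionary is only an illustrative helper.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

shapes = {
    "concatenate": np.concatenate((arr1, arr2)).shape,      # existing axis
    "stack axis=0": np.stack((arr1, arr2), axis=0).shape,   # new first axis
    "stack axis=1": np.stack((arr1, arr2), axis=1).shape,   # new second axis
    "hstack": np.hstack((arr1, arr2)).shape,
    "vstack": np.vstack((arr1, arr2)).shape,
    "dstack": np.dstack((arr1, arr2)).shape,
}
for name, shape in shapes.items():
    print(f"{name}: {shape}")
# concatenate: (6,)
# stack axis=0: (2, 3)
# stack axis=1: (3, 2)
# hstack: (6,)
# vstack: (2, 3)
# dstack: (1, 3, 2)
```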


### **NumPy Splitting Array**
Splitting is the reverse of joining. Joining merges multiple arrays into 1, while splitting breaks 1 array into multiple.

Implemented using the `array_split()` function.

In [35]:
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(f"First array: {newarr[0]}")
print(f"Second array: {newarr[1]}")
print(f"Third array: {newarr[2]}")

First array: [1 2]
Second array: [3 4]
Third array: [5 6]


If the array cannot be divided into equal parts, `array_split()` adjusts the sizes of the parts accordingly.
A `split()` function also exists, but it does not adjust the parts when the source array cannot be divided equally; it raises an error instead.

In [36]:
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 4)
print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]
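
The difference between `array_split()` and `split()` shows up as soon as the array cannot be divided equally. This is a minimal sketch; the `split_ok` flag is an illustrative name.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements divide evenly into 3 parts, so split() works.
even_parts = [part.tolist() for part in np.split(arr, 3)]
print(even_parts)   # [[1, 2], [3, 4], [5, 6]]

# 6 elements do not divide evenly into 4 parts, so split() raises an error.
try:
    np.split(arr, 4)
    split_ok = True
except ValueError as err:
    split_ok = False
    print(f"split failed: {err}")
```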


The same syntax is used when splitting 2-D arrays.

In [37]:
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
print(newarr)

[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]


You can also specify the axis along which the split is to be performed.

In [38]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
newarr = np.array_split(arr, 3, axis=1)
print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]


### **NumPy Searching Arrays**
To search an array, use the `where()` function. It returns the indices of the elements that fulfill the given condition, as a tuple with one index array per dimension.

In [39]:
arr = np.array([1, 2, 3, 4, 5, 4, 4])
print(np.where(arr == 4))

(array([3, 5, 6]),)


The condition can be any element-wise boolean expression, for example one that finds the numbers divisible by 2, i.e., the even numbers.

In [40]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
print(np.where(arr%2 == 0))

(array([1, 3, 5, 7]),)
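
The indices returned by `where()` can be used to fetch the matching values. The sketch below also shows boolean masking as a shorter route to the same values; the names `indices` and `evens` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

indices = np.where(arr % 2 == 0)[0]   # take the index array out of the tuple
print(indices)                        # [1 3 5 7]
print(arr[indices])                   # [2 4 6 8]

# A boolean mask selects the same values without going through indices.
evens = arr[arr % 2 == 0]
print(evens)                          # [2 4 6 8]
```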


### **Data Types in NumPy**
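NumPy comes with its own data types in addition to the basic Python data types (strings, integers, floats, booleans and complex numbers). Each NumPy data type is identified by a one-character code: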

- i - integer
- b - boolean
- u - unsigned integer
- f - float
- c - complex float
- m - timedelta
- M - datetime
- O - object
- S - (byte) string
- U - unicode string
- V - fixed chunk of memory for other type (void)
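
The one-character code of an array's data type is available as `dtype.kind`. The following sketch uses made-up sample arrays to show a few of the codes from the list above.

```python
import numpy as np

# Illustrative sample arrays, one per kind of data.
samples = {
    "integers": np.array([1, 2, 3]),
    "floats": np.array([1.0, 2.5]),
    "booleans": np.array([True, False]),
    "complex": np.array([1 + 2j]),
    "text": np.array(["apple", "kiwi"]),
}

kinds = {name: arr.dtype.kind for name, arr in samples.items()}
print(kinds)
# {'integers': 'i', 'floats': 'f', 'booleans': 'b', 'complex': 'c', 'text': 'U'}
```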

#### **Checking Data Type of Array**
`dtype` attribute returns the data type of the array.

In [41]:
arr = np.array([1, 2, 3])
print(f"Data Type of array: {arr.dtype}")

Data Type of array: int64


The `array()` function also accepts `dtype` that allows us to define what the expected data type of the array elements must be. We can also define the size for `i`, `u`, `f`, `S`, and `U`.

In [42]:
arr = np.array([1, 2, 3], dtype='S')
print(f"arr: {arr}")
print(f"Data Type of array: {arr.dtype}")

arr = np.array([1, 2, 3, 4], dtype='i4')
print(f"arr: {arr}")
print(f"Data Type of array: {arr.dtype}")

arr: [b'1' b'2' b'3']
Data Type of array: |S1
arr: [1 2 3 4]
Data Type of array: int32


#### **If a Value cannot be converted**
NumPy will raise a ValueError.

In [43]:
arr = np.array(['a', '1', '2', '3'], dtype='i')

ValueError: invalid literal for int() with base 10: 'a'

#### **Converting datatype of existing arrays**

The `astype()` method creates a copy of the array and takes the target data type as a parameter.

In [44]:
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(f"arr: {newarr}")
print(f"Data Type of array: {newarr.dtype}")

arr: [1 2 3]
Data Type of array: int32
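
Two details of `astype()` are worth checking: converting floats to integers drops the fractional part rather than rounding, and the original array is left untouched. This is a small sketch with illustrative values.

```python
import numpy as np

arr = np.array([1.9, -1.9, 0.0])

as_int = arr.astype("i")     # the fractional part is dropped, not rounded
as_bool = arr.astype(bool)   # zero becomes False, everything else True

print(as_int.tolist())       # [1, -1, 0]
print(as_bool.tolist())      # [True, True, False]
print(arr.tolist())          # [1.9, -1.9, 0.0] (the original is unchanged)
```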


### **NumPy Sorting Arrays**
Sorting puts the elements in an ordered sequence. It is implemented using the `sort()` function, which returns a sorted copy of the array. String arrays are sorted alphabetically, numeric arrays in ascending order, and boolean arrays in ascending order with `False` treated as 0 and `True` as 1. Multidimensional arrays are sorted along their last axis by default, so each row of a 2-D array is sorted independently.

In [45]:
arr = np.array([3, 2, 0, 1])
print(f"1-D integer array: {np.sort(arr)}")

arr = np.array(['banana', 'cherry', 'apple'])
print(f"1-D strings array: {np.sort(arr)}")

arr = np.array([True, False, True])
print(f"1-D boolean array: {np.sort(arr)}")

arr = np.array([[3, 2, 4], [5, 0, 1]])
print(f"2-D integer array: {np.sort(arr)}")

1-D integer array: [0 1 2 3]
1-D strings array: ['apple' 'banana' 'cherry']
1-D boolean array: [False  True  True]
2-D integer array: [[2 3 4]
 [0 1 5]]
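
The axis along which a multidimensional array is sorted can be chosen with the `axis` argument. The sketch below also shows one way to get descending order, by reversing the sorted result; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)                # default axis=-1: each row is sorted
by_col = np.sort(arr, axis=0)        # each column is sorted
descending = np.sort(arr)[:, ::-1]   # sort each row, then reverse it

print(by_row.tolist())       # [[2, 3, 4], [0, 1, 5]]
print(by_col.tolist())       # [[3, 0, 1], [5, 2, 4]]
print(descending.tolist())   # [[4, 3, 2], [5, 1, 0]]
print(arr.tolist())          # [[3, 2, 4], [5, 0, 1]] (np.sort returned copies)
```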
