
## **NumPy Introduction**
**NumPy:** a Python library for working with arrays. The name is short for "Numerical Python".
* It has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
* NumPy aims to provide an array object that is much faster than traditional Python lists for numerical work (figures of up to 50 times are often quoted, but the actual speedup depends on the operation and the array size).
* NumPy is faster than lists because each array is stored in one contiguous block of memory, unlike a list, whose elements are separate Python objects, so the data can be accessed and processed very efficiently. This behaviour is called locality of reference. NumPy is also optimized to take advantage of modern CPU architectures.
* NumPy is written partly in Python, but the parts that require fast computation are written in C or C++.
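
The speed claim above can be checked with a rough timing sketch. The array size and the sum-of-squares task are arbitrary choices made for illustration, and the timings will differ from machine to machine, so treat the numbers as indicative only.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Both approaches compute the same sum of squares
list_result = sum(x * x for x in py_list)
numpy_result = int(np.sum(np_arr * np_arr))

# Time each approach; the numbers vary from machine to machine
list_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=5)
numpy_time = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=5)

print("Same result ->", list_result == numpy_result)
print("List time  ->", round(list_time, 4), "seconds")
print("NumPy time ->", round(numpy_time, 4), "seconds")
```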

In [1]:
#Importing numpy, we also use an alias 'np' for easier referencing
import numpy as np

In [2]:
#To check the numpy version
print(np.__version__)

1.21.6


**NumPy** is used to work with arrays. The array object in NumPy is called **ndarray**.
* **type():** this built-in Python function tells us the type of the object passed to it.

In [3]:
#Creating a NumPy array from a list
import numpy as np
arr = np.array([1,2,3,4,5])
print(arr)
print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>


In [4]:
#Using a tuple to create a NumPy array
import numpy as np
arr = np.array((1,2,3,4,5))
print(arr)
print(type(arr))

[1 2 3 4 5]
<class 'numpy.ndarray'>


**Dimension in arrays:** one level of array depth (nesting).<br>
**Nested array:** an array that has arrays as its elements.

* **0-D Arrays:** the individual elements of an array. Also called scalars.
* **1-D Arrays:** an array that has 0-D arrays as its elements. Also called a uni-dimensional array.
* **2-D Arrays:** an array that has 1-D arrays as its elements. These are often used to represent matrices.
* **3-D Arrays:** an array that has 2-D arrays (matrices) as its elements. These are often used to represent a 3rd-order tensor.

**ndim attribute:** returns an integer that tells us how many dimensions the array has.

In [5]:
import numpy as np

#0-D Array
a = np.array(37)

#1-D Array
b = np.array([1,2,3])

#2-D Array
c = np.array([[1,2,3],[4,5,6]])

#3-D Array
d = np.array([[[1,2,3],[4,5,6]],[[6,7,8],[9,10,11]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


In [6]:
#You can define the minimum number of dimensions using the ndmin argument
import numpy as np
arr = np.array([1,2,3], ndmin=5)

print('The array -> ', arr)
print('Number of dimensions in array ->', arr.ndim)

The array ->  [[[[[1 2 3]]]]]
Number of dimensions in array -> 5


**Array Indexing:** accessing an array element by referring to its index number. Indexes in NumPy arrays start at 0.

In [7]:
import numpy as np
arr = np.array([1,2,3,4,5])

#To get the first element in the array
print("The first element ->", arr[0])

#The addition of the third and fourth element 
print("The sum of the third and fourth element ->", arr[2] + arr[3])

The first element -> 1
The sum of the third and fourth element -> 7


To access elements in multi-dimensional arrays, we can use comma-separated integers, one index per dimension.<br>
**Negative indexing:** use negative indexes to access an array from the end.

In [8]:
import numpy as np

#access in 2D Arrays
arr2D = np.array([[1,2,3],[4,5,6]])
print("2nd element on 1st row ->", arr2D[0,1])

#access in 3D Arrays
arr3D = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print("3rd element of the second array of the first array ->", arr3D[0,1,2])

2nd element on 1st row -> 2
3rd element of the second array of the first array -> 6


In [9]:
import numpy as np
arrNg = np.array([[1,2,3,4],[5,6,7,8]])

#using negative indexing
print("The last element from the second array ->", arrNg[1,-1])

The last element from the second array -> 8


## **NumPy Array Slicing**
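**Slicing:** taking elements from one given index to another given index.<br>
We pass a slice instead of an index, like this: [start:end], and we can also define the step: [start:end:step].<br>
The result includes the start index, but excludes the end index. For a positive step, an omitted start defaults to 0, an omitted end defaults to the length of the array in that dimension, and an omitted step defaults to 1.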

In [10]:
import numpy as np

arr = np.array([1,2,3,4,5,6,7])
print("The array ->", arr)

#slice elements from index 1 to index 5 (not included)
print("From index 1 to 5 ->", arr[1:5])

#slice elements from the beginning to index 4 (not included)
print("From beginning to index 4 ->", arr[:4])

The array -> [1 2 3 4 5 6 7]
From index 1 to 5 -> [2 3 4 5]
From beginning to index 4 -> [1 2 3 4]


**Negative Slicing:** use the minus operator to refer to an index from the end.

In [11]:
import numpy as np

arr = np.array([1,2,3,4,5,6,7])
print("The array ->", arr)

#slice from the index 3 from the end to index 1 from the end
print("From index 3 from the end to index 1 from the end ->", arr[-3:-1])

The array -> [1 2 3 4 5 6 7]
From index 3 from the end to index 1 from the end -> [5 6]


**Step:** the third value in a slice sets the step, that is, how many positions to advance between selected elements.

In [12]:
import numpy as np

arr = np.array([1,2,3,4,5,6,7])
print("The array ->", arr)

#return every element from index 1 with a step of 2
print("Element from index 1 with a step of 2", arr[1::2])

The array -> [1 2 3 4 5 6 7]
Element from index 1 with a step of 2 [2 4 6]
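
The step can also be negative, in which case the slice walks backwards through the array. A small sketch (the variable names `reversed_arr` and `backwards` are only illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

# A negative step walks the array from the end towards the beginning
reversed_arr = arr[::-1]

# Start at index 5 and move backwards two positions at a time,
# stopping before index 0 is reached
backwards = arr[5:0:-2]

print("Reversed ->", reversed_arr)
print("From index 5 backwards with a step of 2 ->", backwards)
```

Here `backwards` picks the elements at indexes 5, 3 and 1.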


In [13]:
import numpy as np

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("The array ->", arr)

#from the second row, slice elements from index 1 to index 4 (not included)
print("2nd element, slice from index 1 to index 4 ->", arr[1,1:4])

#from both rows, return the element at index 2
print("Both elements return index 2 ->", arr[0:2, 2])

The array -> [[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
2nd element, slice from index 1 to index 4 -> [7 8 9]
Both elements return index 2 -> [3 8]


## **NumPy Data Types**
By default, Python has these data types:
* *strings* - used to represent text data.
* *integer* - used to represent whole numbers.
* *float* - used to represent real numbers.
* *boolean* - used to represent True or False.
* *complex* - used to represent complex numbers.


NumPy has some extra data types, and refers to each kind of data type with a one-character code:
* **i** - integer
* **b** - boolean (this is the code reported by `dtype.kind`; when passed as a dtype string, `'b'` means a signed 8-bit integer and `'?'` means boolean)
* **u** - unsigned integer
* **f** - float
* **c** - complex float
* **m** - timedelta
* **M** - datetime
* **O** - object
* **S** - string (byte string)
* **U** - unicode string
* **V** - fixed chunk of memory for other types (void)

The NumPy array object has a property called **dtype** that returns the data type of the array.

In [14]:
import numpy as np

#integer array
arrNum = np.array([1,2,3,4,5])
print("This array datatype ->", arrNum.dtype)

#string array
arrStr = np.array(['one', 'two', 'three'])
print("This array datatype ->", arrStr.dtype)

This array datatype -> int64
This array datatype -> <U5
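
As a small sketch of how the one-character codes relate to a dtype, the `kind` and `itemsize` attributes of a dtype report the code and the size of one element in bytes. The array names below are only illustrative.

```python
import numpy as np

int_arr = np.array([1, 2, 3], dtype='i4')
str_arr = np.array(['one', 'two', 'three'])

# kind is the one-character code; itemsize is the size of one element in bytes
print(int_arr.dtype, int_arr.dtype.kind, int_arr.dtype.itemsize)
print(str_arr.dtype, str_arr.dtype.kind, str_arr.dtype.itemsize)
```

The string array needs 20 bytes per element because its longest string has 5 characters and each unicode character takes 4 bytes.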


We use the ***array()*** function to create arrays. This function can take an optional argument, **dtype**, that allows us to define the expected data type of the array elements.

For **i**, **u**, **f**, **S** and **U** we can define the size as well.

If a type is given to which the elements cannot be cast, NumPy will raise a ValueError.
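
A minimal sketch of that failure case: the string 'a' cannot be converted to an integer, so the call raises a ValueError. The flag `conversion_failed` is only there to make the outcome easy to inspect.

```python
import numpy as np

try:
    # 'a' cannot be interpreted as an integer, so the conversion fails
    arr = np.array(['a', '2', '3'], dtype='i')
    conversion_failed = False
except ValueError as error:
    conversion_failed = True
    print("Conversion failed ->", error)
```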

In [15]:
import numpy as np

arr = np.array([1,2,3,4,5], dtype='S')

print("The array ->", arr)
print("The datatype of array elements ->", arr.dtype)

The array -> [b'1' b'2' b'3' b'4' b'5']
The datatype of array elements -> |S1


In [16]:
import numpy as np

arr = np.array([1,2,3,4], dtype='i4')

print("The array ->", arr)
print("The datatype of array elements ->", arr.dtype)

The array -> [1 2 3 4]
The datatype of array elements -> int32


**Converting datatype on existing arrays**<br>
The **astype()** method creates a copy of the array and takes the target data type as a parameter.<br>
The data type can be specified using a string, like **'f'** for float or **'i'** for integer, or you can use the data type directly, like **float** for float and **int** for integer.

In [17]:
import numpy as np

arr = np.array([1.1, 2.1, 3.1])

#copying the array and changing data type to integer
arrcopy = arr.astype('i')

print("Original array ->", arr)
print("Copy array ->", arrcopy)

Original array -> [1.1 2.1 3.1]
Copy array -> [1 2 3]


In [18]:
import numpy as np

arr = np.array([1,0,3])

#copying the array and changing data type to boolean
newarr = arr.astype(bool)

print("Original array ->", arr)
print("Copy array ->", newarr)

Original array -> [1 0 3]
Copy array -> [ True False  True]


## **NumPy Array Copy vs View**
* The **copy** owns the data: changes made to the copy will not affect the original array, and changes made to the original array will not affect the copy.
* The **view** does not own the data: changes made to the view will affect the original array, and changes made to the original array will affect the view.

In [19]:
import numpy as np

arr = np.array([1,2,3,4,5])

#making a copy of the original array
arrcopy = arr.copy()

#making a change to the original array
arr[0] = 8

print("Original array ->", arr)
print("Copy array ->", arrcopy)

Original array -> [8 2 3 4 5]
Copy array -> [1 2 3 4 5]


In [20]:
import numpy as np

arr = np.array([1,2,3,4,5])

#making a view of the original array
arrview = arr.view()

#making a change to the original array
arr[0] = 8

print("Original array ->", arr)
print("View array ->", arrview)

Original array -> [8 2 3 4 5]
View array -> [8 2 3 4 5]


In [21]:
import numpy as np

arr = np.array([1,2,3,4,5])

#making a view of the original array
arrview = arr.view()

#making a change to the view array
arrview[0] = 9

print("Original array ->", arr)
print("View array ->", arrview)

Original array -> [9 2 3 4 5]
View array -> [9 2 3 4 5]


Every NumPy array has the attribute **base**, which returns **None** if the array owns its data. Otherwise, the **base** attribute refers to the original object.

In [22]:
import numpy as np

arr = np.array([6,7,8,9,0])

arrcopy = arr.copy()
arrview = arr.view()

print("Copy array base ->", arrcopy.base)
print("View array base ->", arrview.base)

Copy array base -> None
View array base -> [6 7 8 9 0]
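
Views do not only come from **view()**. As a sketch, basic slicing also returns a view, while indexing with a list of positions returns a copy. The names `sliced` and `picked` are only illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Basic slicing returns a view that shares memory with the original
sliced = arr[1:4]

# Indexing with a list of positions returns an independent copy
picked = arr[[1, 2, 3]]

# Change the original after both results were created
arr[1] = 99

print("Slice (view)         ->", sliced)
print("Index list (copy)    ->", picked)
print("Slice shares memory  ->", np.shares_memory(arr, sliced))
print("Picked shares memory ->", np.shares_memory(arr, picked))
```

Only the slice sees the change made to the original array.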


## **NumPy Array Shape**
The shape of an array is the number of elements in each dimension. The **shape** attribute returns a tuple in which each entry is the number of elements in the corresponding dimension.

In [23]:
import numpy as np

arrone = np.array([1,2,3])
arrtwo = np.array([[1,2,3],[4,5,6]])
arrtri = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])

print("1st array shape ->", arrone.shape)
print("2nd array shape ->", arrtwo.shape)
print("3rd array shape ->", arrtri.shape)

1st array shape -> (3,)
2nd array shape -> (2, 3)
3rd array shape -> (2, 2, 3)


In [24]:
#creating an array with 5 dimensions using ndmin with values 1,2,3,4, and verifying that the last dimension has size 4

import numpy as np

arr = np.array([1,2,3,4], ndmin=5)

print("The original array ->", arr)
print("The shape of array ->", arr.shape)

The original array -> [[[[[1 2 3 4]]]]]
The shape of array -> (1, 1, 1, 1, 4)


## **NumPy Array Reshape**
Reshaping means changing the shape of an array. By reshaping we can add or remove dimensions or change the number of elements in each dimension.<br>
We can reshape an array into any shape as long as the total number of elements is the same in both shapes.

In [25]:
#reshape from 1-D to 2-D
import numpy as np

arr = np.array([1,2,3,4,5,6])

newarr = arr.reshape(3,2)
print(newarr)

[[1 2]
 [3 4]
 [5 6]]


In [26]:
#reshape from 1-D to 3-D
import numpy as np

arr = np.array([1,2,3,4,5,6,7,8])

newarr = arr.reshape(2,2,2)
print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
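
The rule that both shapes must hold the same number of elements can be seen in a short sketch. The flag `reshape_failed` is only there to make the outcome easy to inspect.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# 2 * 4 = 8 elements, so this shape is compatible
ok = arr.reshape(2, 4)
print(ok.shape)

try:
    # 3 * 3 = 9 elements, but the array only has 8
    arr.reshape(3, 3)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("Reshape failed ->", error)
```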


In [27]:
#Is a reshape a view or a copy?
import numpy as np

arr = np.array([1,2,3,4,5,6,7,8])

print(arr.reshape(2,4).base)
#base is the original array, so here the reshape is a view

[1 2 3 4 5 6 7 8]
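
A reshape is not always a view. As a sketch, when the memory layout does not allow the new shape without moving data (for example after a transpose), NumPy returns a copy instead. `np.shares_memory` is used here to check which case applies.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array can reuse the same memory
same_memory = arr.reshape(3, 2)

# The transpose is not contiguous in row-major order,
# so flattening it requires a copy
flat_transposed = arr.T.reshape(-1)

print("Plain reshape shares memory ->", np.shares_memory(arr, same_memory))
print("Transposed reshape shares memory ->", np.shares_memory(arr, flat_transposed))
```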


**Unknown dimension**<br>
You are allowed to have one "unknown" dimension, meaning that you do not have to specify an exact number for one of the dimensions in the reshape method.<br>
Pass **-1** as the value, and NumPy will calculate this number for you.

In [28]:
import numpy as np

arr = np.array([1,2,3,4,5,6,7,8])

newarr = arr.reshape(2,2,-1)
print(newarr)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


**Flattening the arrays**<br>
Flattening means converting a multidimensional array into a 1-D array.<br>
We can use **reshape(-1)** to do this.

In [29]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4 5 6 7 8 9]
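
Besides **reshape(-1)**, NumPy arrays also have **flatten()** and **ravel()** methods. A short sketch of the difference: **flatten()** always returns a copy, while **ravel()** returns a view when the memory layout allows it.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() always returns a copy
flat_copy = arr.flatten()

# ravel() returns a view when the memory layout allows it
flat_view = arr.ravel()

# Change the original after both results were created
arr[0, 0] = 100

print("flatten() ->", flat_copy)
print("ravel()   ->", flat_view)
```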


## **NumPy Array Iterating**
**Iterating:** going through elements one by one.<br>
We can do this with a basic ***for*** loop.

In [30]:
#Iterate on the elements of the 1-D array
import numpy as np

arr = np.array([1,2,3,4])

for x in arr:
    print(x)

1
2
3
4


In [31]:
#Iterate on the elements of the 2-D array
import numpy as np

arr = np.array([[1,2,3],[4,5,6]])

for x in arr:
    print(x)

[1 2 3]
[4 5 6]


In [32]:
#Iterate on each scalar element of the 2-D array
import numpy as np

arr = np.array([[1,2,3],[4,5,6]])

for x in arr:
    for y in x:
        print(y)

1
2
3
4
5
6


In [33]:
#Iterate on the 2-D arrays inside the 3-D array
import numpy as np

arr = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])

for x in arr:
    print(x)

[[1 2]
 [3 4]]
[[5 6]
 [7 8]]


In [34]:
#Iterate on each scalar element of the 3-D array
import numpy as np

arr = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])

for x in arr:
    for y in x:
        for z in y:
            print(z)

1
2
3
4
5
6
7
8


**Iterating arrays using nditer()**<br>
**nditer()** is a helper function that can be used for anything from very basic to very advanced iterations.<br>
It lets us visit each scalar element without writing one nested loop per dimension.

In [35]:
import numpy as np

arr = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])

for x in np.nditer(arr):
    print(x)

1
2
3
4
5
6
7
8


**Iterating array with different data types**<br>
We can use the **op_dtypes** argument and pass it the expected datatype to change the datatype of elements while iterating.<br>
NumPy does not change the datatype of the elements in place (where they are stored in the array), so it needs some other space to perform this action. That extra space is called a ***buffer***, and to enable it in ***nditer()*** we pass ***flags=['buffered']***.

In [36]:
#Iterate through the array as a string
import numpy as np

arr = np.array([1,2,3])

for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
    print(x)

b'1'
b'2'
b'3'


In [37]:
#Iterate through every scalar element of the 2D array skipping 1 element
import numpy as np

arr = np.array([[1,2,3],[4,5,6]])

for x in np.nditer(arr[:,::2]):
    print(x)

1
3
4
6


**Enumerated iteration using ndenumerate()**<br>
Enumeration means mentioning the sequence number of things one by one. **ndenumerate()** yields the index of each element together with its value.

In [38]:
#Enumerate on 1D arrays
import numpy as np

arr = np.array([1,2,3])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0,) 1
(1,) 2
(2,) 3


In [39]:
#Enumerate on 2D array
import numpy as np

arr = np.array([[1,2,3],[4,5,6]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6


## **NumPy Array Join**
Joining means putting the contents of two or more arrays in a single array. In NumPy, arrays are joined along an axis.

In [40]:
#Joining two arrays
import numpy as np

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr = np.concatenate((arr1, arr2))
print(arr)

[1 2 3 4 5 6]


In [41]:
#Joining 2D arrays along axis 1 (each row gets longer)
import numpy as np

arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[5,6],[7,8]])

arr = np.concatenate((arr1, arr2), axis=1)
print(arr)

[[1 2 5 6]
 [3 4 7 8]]
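
To make the role of the axis concrete, here is a sketch that joins the same two 2-D arrays along both axes and compares the resulting shapes. The names `below` and `beside` are only illustrative.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# axis=0 (the default) places arr2 below arr1
below = np.concatenate((arr1, arr2), axis=0)

# axis=1 places arr2 to the right of arr1
beside = np.concatenate((arr1, arr2), axis=1)

print(below.shape)
print(beside.shape)
```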


**Joining array using stack functions**<br>
**Stacking:** same as concatenation, the only difference is that stacking is done along a new axis.<br>
We can stack two 1-D arrays along a new second axis, which places each input array in its own column of a 2-D result.<br>
We pass a sequence of arrays that we want to join to the ***stack()*** function along with the axis. If the axis is not explicitly passed, it is taken as 0.

In [42]:
import numpy as np

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr = np.stack((arr1, arr2), axis=1)
print(arr)

[[1 4]
 [2 5]
 [3 6]]
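
For comparison, a sketch of **stack()** with the default axis next to axis=1. With axis 0 each input array becomes a row; with axis 1 each input array becomes a column. The names `as_rows` and `as_columns` are only illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Default axis=0: each input array becomes a row
as_rows = np.stack((arr1, arr2))

# axis=1: each input array becomes a column
as_columns = np.stack((arr1, arr2), axis=1)

print(as_rows.shape)
print(as_columns.shape)
```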


In [43]:
#stacking along rows
import numpy as np

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr = np.hstack((arr1, arr2))
print(arr)

[1 2 3 4 5 6]


In [44]:
#stacking along columns
import numpy as np

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr = np.vstack((arr1, arr2))
print(arr)

[[1 2 3]
 [4 5 6]]


In [45]:
#stacking along depth(height)
import numpy as np

arr1 = np.array([1,2,3])
arr2 = np.array([4,5,6])

arr = np.dstack((arr1, arr2))
print(arr)

[[[1 4]
  [2 5]
  [3 6]]]


## **NumPy Array Split**
Splitting is the reverse operation of joining.<br>
The difference between **split()** and **array_split()** is that **split()** does not adjust the pieces when the array cannot be divided equally: it raises a ValueError instead, whereas **array_split()** returns pieces of unequal length.

In [46]:
import numpy as np

arr = np.array([1,2,3,4,5,6])

newarr = np.array_split(arr, 3)
print(newarr)

[array([1, 2]), array([3, 4]), array([5, 6])]


In [47]:
import numpy as np

arr = np.array([1,2,3,4,5,6])

newarr = np.array_split(arr, 4)
print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]
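
For contrast, a sketch of how **split()** behaves on the same array: it works when the division is even and raises a ValueError when it is not. The flag `split_failed` is only there to make the outcome easy to inspect.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements divide evenly into 3 parts, so split() works
even_parts = np.split(arr, 3)
print(even_parts)

try:
    # 6 elements cannot be divided into 4 equal parts
    np.split(arr, 4)
    split_failed = False
except ValueError as error:
    split_failed = True
    print("split() failed ->", error)
```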


In [48]:
#You can access the split arrays as array elements of the result
import numpy as np

arr = np.array([1,2,3,4,5,6])

newarr = np.array_split(arr, 3)
print('The first element ->', newarr[0])
print('The second element ->', newarr[1])
print('The third element ->', newarr[2])

The first element -> [1 2]
The second element -> [3 4]
The third element -> [5 6]


Use the same syntax when splitting 2D arrays.<br>
In addition, you can specify which axis you want to split along.<br>
Alternatively, you can use **hsplit()**, **vsplit()**, and **dsplit()**.

In [49]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])

newarr = np.array_split(arr, 3)
print(newarr)

[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]


In [50]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])

newarr = np.array_split(arr, 3, axis=1)
print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]


In [51]:
import numpy as np

arr = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])

newarr = np.vsplit(arr, 3)
print(newarr)

[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]
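
In the same way, **hsplit()** splits a 2-D array along its columns. A sketch showing that, for a 2-D array, it gives the same pieces as **array_split()** with axis=1 (the names `columns` and `same_columns` are only illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Split into 3 pieces along the columns
columns = np.hsplit(arr, 3)

# The same split expressed with array_split and an explicit axis
same_columns = np.array_split(arr, 3, axis=1)

for piece in columns:
    print(piece)
```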


## **NumPy Array Search**
You can search an array for a certain value and return the indexes where there is a match. To search an array, use the **where()** function.

In [52]:
import numpy as np

arr = np.array([1,2,3,4,5,4,4])

x = np.where(arr == 4)
print(x)

(array([3, 5, 6]),)


In [53]:
#You can pass a condition to the where() function
#In this case we are finding indexes where values are odd
import numpy as np

arr = np.array([1,2,3,4,5,6,7,8])

x = np.where(arr%2 == 1)
print(x)

(array([0, 2, 4, 6]),)
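
The tuple returned by **where()** can be used directly as an index to get the matching values. **where()** also accepts two extra arguments that choose a value per element depending on the condition. A small sketch (the names `odd_values` and `labels` are only illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Use the indexes returned by where() to pick the matching values
odd_indexes = np.where(arr % 2 == 1)
odd_values = arr[odd_indexes]

# Three-argument form: choose a value for each element
labels = np.where(arr % 2 == 1, 'odd', 'even')

print(odd_values)
print(labels)
```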


**searchsorted():** performs a binary search in the array, and returns the index where the specified value would be inserted to maintain the sort order. The array is assumed to be sorted already.<br>
By default the method returns the left-most such index, that is, the first position whose element is greater than or equal to the value.<br>
With **side='right'** it returns the right-most index instead, that is, the first position whose element is strictly greater than the value.

In [54]:
import numpy as np

arr = np.array([5,6,7,8,9])

x = np.searchsorted(arr, 7)
print(x)

2


In [55]:
#You can specify the right side
import numpy as np

arr = np.array([5,8,11,13,19])

x = np.searchsorted(arr, 9, side='right')
print(x)

2
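
The two sides only give different answers when the value is already present in the array. A sketch with a duplicated value (the array contents are chosen just for illustration):

```python
import numpy as np

arr = np.array([5, 6, 7, 7, 8])

# Left side: insert before the existing 7s
left = np.searchsorted(arr, 7)

# Right side: insert after the existing 7s
right = np.searchsorted(arr, 7, side='right')

print("left  ->", left)
print("right ->", right)
```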


**Multiple values**<br>
To search for more than one value, pass an array with the specified values.

In [56]:
import numpy as np

arr = np.array([1,2,3,5,7])

x = np.searchsorted(arr, [4,6])
print(x)

[3 4]


## **NumPy Array Sort**
Sorting means putting elements in an ordered sequence.<br>
An *ordered sequence* is any sequence that has an order corresponding to its elements, like numeric or alphabetical, ascending or descending.<br>
The **np.sort()** function returns a sorted copy of the array, leaving the original unchanged.

In [57]:
import numpy as np

arr = np.array([13,4,2,7,21])
print(np.sort(arr))

[ 2  4  7 13 21]


In [58]:
#sort the array alphabetically
import numpy as np

arr = np.array(['banana', 'apple', 'cake'])
print(np.sort(arr))

['apple' 'banana' 'cake']


In [59]:
#sort a boolean array
import numpy as np

arr = np.array([True, False, True])
print(np.sort(arr))

[False  True  True]


In [60]:
#sorting a 2D array sorts each row separately (the last axis)
import numpy as np

arr = np.array([[5,1,4],[21,6,9]])
print(np.sort(arr))

[[ 1  4  5]
 [ 6  9 21]]
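
A few related operations, sketched under the same example data as above: sorting in descending order by reversing the sorted copy, getting the sorting order with **argsort()**, and sorting in place with the array's own **sort()** method.

```python
import numpy as np

arr = np.array([13, 4, 2, 7, 21])

# np.sort returns a new array; reversing it gives descending order
descending = np.sort(arr)[::-1]

# argsort returns the indexes that would sort the array
order = np.argsort(arr)

print("Descending ->", descending)
print("Sort order ->", order)
print("Original   ->", arr)

# The sort() method of the array sorts in place and returns None
arr.sort()
print("After arr.sort() ->", arr)
```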


## **NumPy Array Filter**
Filtering means getting some elements out of an existing array and creating a new array out of them.<br>
In NumPy, you filter an array using a boolean index list.<br>
If the value at an index is *True*, that element is included in the filtered array; if the value at that index is *False*, that element is excluded from the filtered array.

In [61]:
#keeping only the elements at index 0 and 2
import numpy as np

arr = np.array([41, 42, 43, 44])
x = [True, False, True, False]

newarr = arr[x]
print(newarr)

[41 43]


In [62]:
#creating the filter array
import numpy as np

arr = np.array([41, 42, 43, 44])

#create an empty list
filter_arr = []

#go through each element in arr
for element in arr:
    #if the element is higher than 42, set the value to True, otherwise False:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)
        
newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]
[43 44]


In [63]:
#create a filter array that will return only even elements from the original array
import numpy as np

arr = np.array([12,34,15,54,19,31,97,23])

#create an empty list
filter_arr = []

#go through each element in arr
for element in arr:
    #if the element is completely divisible by 2, set value to True, otherwise False
    if element%2 == 0:
        filter_arr.append(True)
    else:
        filter_arr.append(False)
        
newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[True, True, False, True, False, False, False, False]
[12 34 54]


**Creating filter directly from array**<br>
NumPy lets us build the filter without a loop: use the array itself in the condition, in place of the loop variable, and the comparison is applied to every element.

In [64]:
import numpy as np

arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42
newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False False  True  True]
[43 44]


In [65]:
import numpy as np

arr = np.array([12,34,15,54,19,31,97,23])

filter_arr = arr%2 == 0

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[ True  True False  True False False False False]
[12 34 54]
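
Conditions can also be combined. As a sketch, the element-wise operators `&` (and) and `|` (or) join two boolean arrays; each condition needs its own parentheses because these operators bind more tightly than the comparisons. The variable names are only illustrative.

```python
import numpy as np

arr = np.array([12, 34, 15, 54, 19, 31, 97, 23])

# Even AND greater than 20
even_and_large = arr[(arr % 2 == 0) & (arr > 20)]

# Smaller than 15 OR greater than 90
small_or_large = arr[(arr < 15) | (arr > 90)]

print(even_and_large)
print(small_or_large)
```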
