# Learning NumPy

*NumPy* is a popular Python library that supports large, multidimensional arrays. *NumPy* stands for 'Numerical Python', and it is commonly used when working with large datasets.

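While *NumPy* arrays are similar to standard Python lists, they are often many times faster for large-scale numerical work, because the elements are stored in one contiguous block of memory with a single fixed type and operations run in compiled code rather than in the Python interpreter. The exact speedup depends on the operation and the size of the data.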
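A rough way to check the speed difference yourself is the minimal sketch below, which uses the standard-library `timeit` module. The array size and repeat count are arbitrary choices, and the timings it prints will vary by machine and NumPy version, so no particular speedup is implied.

```python
import timeit

import numpy as np

# Arbitrary benchmark size; adjust to taste.
n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Double every element: a Python list comprehension vs. one vectorized NumPy operation.
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=10)
numpy_time = timeit.timeit(lambda: np_arr * 2, number=10)

print(f"list:    {list_time:.4f} s")
print(f"numpy:   {numpy_time:.4f} s")
print(f"speedup: {list_time / numpy_time:.1f}x")
```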

In [1]:
import numpy as np

### Introduction

A *NumPy* array may be created using the `np.array()` function.

In [2]:
# an array depicting number of notifications received each day for a week.
notifications = np.array([157, 270, 210, 269, 137, 205, 225])

print(notifications)
print(notifications.shape) # the shape attribute gives the size of the array along each dimension, i.e. (rows, columns) for a 2-D array

[157 270 210 269 137 205 225]
(7,)
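Besides `shape`, every array carries a few other descriptive attributes. The sketch below prints the number of dimensions, the total element count and the element type; the exact integer type (for example `int64` or `int32`) depends on the platform and NumPy version.

```python
import numpy as np

notifications = np.array([157, 270, 210, 269, 137, 205, 225])

print(notifications.ndim)   # number of dimensions: 1
print(notifications.size)   # total number of elements: 7
print(notifications.dtype)  # element type, platform-dependent (e.g. int64)
```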


*NumPy* arrays may be indexed and sliced just like lists.

In [3]:
print(notifications[0:2])
print(notifications[4:])
print(notifications[:3])
print(notifications[-2])

[157 270]
[137 205 225]
[157 270 210]
205
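Slices also accept a step, exactly as with lists. Arrays additionally accept a boolean mask of the same length, which selects only the positions marked `True`; the exercise at the end of this notebook relies on this kind of indexing. The threshold of 200 below is just an illustrative value.

```python
import numpy as np

notifications = np.array([157, 270, 210, 269, 137, 205, 225])

# A third slice value sets the step, as with lists.
print(notifications[::2])    # every other day: [157 210 137 225]
print(notifications[::-1])   # reversed: [225 205 137 269 210 270 157]

# A comparison produces a boolean array, which can be used as a mask.
busy = notifications > 200
print(busy)                  # [False  True  True  True False  True  True]
print(notifications[busy])   # [270 210 269 205 225]
```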


### Multi-Dimensional Arrays

One can also create multidimensional arrays in *NumPy*. 

For example, consider 3 egg cartons, each holding 12 eggs in 2 rows of 6. A machine has assessed every egg and predicted its freshness on a scale from 0 (completely rotten) to 1 (fresh).

In [4]:
egg_carton1 = np.array([
  [0.89, 0.90, 0.83, 0.89, 0.97, 0.98], 
  [0.95, 0.95, 0.89, 0.95, 0.23, 0.99]
])

egg_carton2 = np.array([
  [0.89, 0.95, 0.84, 0.92, 0.94, 0.93], 
  [0.93, 0.95, 0.02, 0.03, 0.23, 0.99]
])

egg_carton3 = np.array([
  [0.83, 0.95, 0.89, 0.54, 0.37, 0.92], 
  [0.98, 0.99, 0.19, 0.23, 0.89, 0.91]
])

print(egg_carton1)

[[0.89 0.9  0.83 0.89 0.97 0.98]
 [0.95 0.95 0.89 0.95 0.23 0.99]]


Now we can check which carton is the safest buy by calculating the average freshness of each carton with the `np.average()` function.

In [5]:
carton1=np.average(egg_carton1)
carton2=np.average(egg_carton2)
carton3=np.average(egg_carton3)

print(carton1)
print(carton2)
print(carton3)

0.8683333333333333
0.7183333333333333
0.7241666666666666


From the output above, carton 1 has the highest average freshness, so it is the safest buy.
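The three cartons can also be stacked into a single 3-D array and compared in one go. In this sketch, `np.average()` averages over the rows and columns of each carton at once, `np.argmax()` picks the carton with the highest average, and a freshness threshold of 0.5 (an arbitrary, hypothetical cut-off) is used to count the eggs that could be called "fresh".

```python
import numpy as np

# The same three cartons as above, stacked into one array of shape (3, 2, 6).
cartons = np.array([
    [[0.89, 0.90, 0.83, 0.89, 0.97, 0.98],
     [0.95, 0.95, 0.89, 0.95, 0.23, 0.99]],
    [[0.89, 0.95, 0.84, 0.92, 0.94, 0.93],
     [0.93, 0.95, 0.02, 0.03, 0.23, 0.99]],
    [[0.83, 0.95, 0.89, 0.54, 0.37, 0.92],
     [0.98, 0.99, 0.19, 0.23, 0.89, 0.91]],
])

# Average over the rows (axis 1) and columns (axis 2) of each carton.
averages = np.average(cartons, axis=(1, 2))
print(averages)

# 0-based index of the carton with the highest average freshness.
best = np.argmax(averages)
print(f"carton {best + 1} is the safest buy")

# Number of eggs per carton above the hypothetical 0.5 threshold.
fresh_counts = np.sum(cartons > 0.5, axis=(1, 2))
print(fresh_counts)  # [11  9  9]
```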

### Operators and Functions

*NumPy* supports the standard arithmetic operators. The basic ones are:
+ addition (`+`)
+ subtraction (`-`)
+ multiplication (`*`)
+ division (`/`)
+ modulo (`%`)

These operators are applied element-wise, i.e. to every element of the *NumPy* array at once.

In [6]:
arr=np.array([1, 2, 3, 4, 5, 6])

print(arr+6)
print(arr*3)

[ 7  8  9 10 11 12]
[ 3  6  9 12 15 18]
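The remaining operators from the list work the same way, and two arrays of the same shape are combined element by element. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

print(arr - 1)   # [0 1 2 3 4 5]
print(arr / 2)   # true division always gives floats: [0.5 1.  1.5 2.  2.5 3. ]
print(arr % 2)   # [1 0 1 0 1 0]

# Two arrays of the same shape are combined element by element.
other = np.array([10, 20, 30, 40, 50, 60])
print(arr + other)  # [11 22 33 44 55 66]
print(other * arr)  # [ 10  40  90 160 250 360]
```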


*NumPy* also provides many math functions. Some of the basic ones are:
+ `np.min()`
+ `np.max()`
+ `np.sum()`
+ `np.average()` or `np.mean()` (equivalent unless weights are passed to `np.average()`)

By default, these functions apply to the entire array. However, you can use the *axis* parameter to apply them row-wise or column-wise.
+ *axis=0* means each column produces a separate result
+ *axis=1* means each row produces a separate result

In [7]:
# declare a 2D array
mat=np.array([[1, 2, 3], [3, 4, 5]]) 

# Operation performed on the entire array
print(np.min(mat))
# Operation performed on each column
print(np.sum(mat, axis=0))
# Operation performed on each row
print(np.average(mat, axis=1))

1
[4 6 8]
[2. 4.]
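One way to remember the *axis* rule is by the shape of the result: reducing along `axis=0` leaves one value per column, and reducing along `axis=1` leaves one value per row. The sketch below checks this on the same `mat`, and also confirms that `np.average()` and `np.mean()` agree when no weights are given.

```python
import numpy as np

mat = np.array([[1, 2, 3], [3, 4, 5]])

col_max = np.max(mat, axis=0)   # one value per column: [3 4 5]
row_min = np.min(mat, axis=1)   # one value per row: [1 3]
print(col_max, col_max.shape)   # shape (3,) because mat has 3 columns
print(row_min, row_min.shape)   # shape (2,) because mat has 2 rows

# Without weights, np.average and np.mean give the same result.
print(np.array_equal(np.average(mat, axis=1), np.mean(mat, axis=1)))  # True
```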


The `reshape()` method may be used to change the shape of an array, as long as the new shape holds the same number of elements.

In [8]:
print(mat) # old shape
mat2=mat.reshape(3, 2) # changing the shape of the array
print(mat2) # new shape

[[1 2 3]
 [3 4 5]]
[[1 2]
 [3 3]
 [4 5]]
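Passing `-1` for one dimension lets NumPy infer it from the array's size, and asking for a shape with a different number of elements raises a `ValueError`. A brief sketch:

```python
import numpy as np

mat = np.array([[1, 2, 3], [3, 4, 5]])

# -1 asks NumPy to infer that dimension from the total size (6 elements here).
print(mat.reshape(-1))      # flattened to 1-D: [1 2 3 3 4 5]
print(mat.reshape(3, -1))   # same result as reshape(3, 2)

# The new shape must hold exactly the same number of elements.
try:
    mat.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```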


`np.arange()` works like Python's built-in `range()` function: it creates an array of evenly spaced values within a given interval. Its parameters are:
+ start: the first value of the range (defaults to 0)
+ stop: where the range ends (exclusive)
+ step: the difference between consecutive elements; a negative step counts down (defaults to 1)

In [9]:
arr1=np.arange(2, 10, 2)
print(arr1)

arr2=np.arange(10, 1, -2)
print(arr2)

[2 4 6 8]
[10  8  6  4  2]
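Like `range()`, `np.arange()` can be called with only a stop value, and unlike `range()` it also accepts a non-integer step. With non-integer steps, floating-point rounding can make the number of elements hard to predict, which is why the NumPy documentation suggests `np.linspace()` for that case.

```python
import numpy as np

print(np.arange(5))            # start defaults to 0, step to 1: [0 1 2 3 4]
print(np.arange(0, 1, 0.25))   # float step: [0.   0.25 0.5  0.75]
```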


### Copy vs View

+ Use the `.copy()` method to create a copy of a *NumPy* array, and
+ Use the `.view()` method to create a view.

In [10]:
# declare an array
arr=np.array([1, 2, 3, 4, 5, 6])

# create a copy
arr2=arr.copy()
print(arr2)

# create a view
arr3=arr.view()
print(arr3)

[1 2 3 4 5 6]
[1 2 3 4 5 6]


Both print the same values. The difference is:

+ A *copy* is a new, independent array with its own data. Any change made to the original is not reflected in the copy, and vice versa.
+ A *view* is a new array object that shares the original array's data. Hence any change made to the original is reflected in the view, and vice versa.

In [11]:
# make a change to arr
arr[0]=10
print(arr)

# arr2 is a copy of arr, so it remains the same
print(arr2)

# arr3 is a view of arr, so it changes
print(arr3)

[10  2  3  4  5  6]
[1 2 3 4 5 6]
[10  2  3  4  5  6]


In [12]:
# similarly, make a change in arr2 (the copy)
arr2[1]=20

# arr remains the same
print(arr)

# make a change in arr3 (the view)
arr3[2]=30

# arr changes
print(arr)

[10  2  3  4  5  6]
[10  2 30  4  5  6]
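You can check which kind of array you have through its `base` attribute: a view's `base` refers to the array that owns the data, while a copy owns its data and has `base` set to `None`. Basic slicing also returns a view, so writing through a slice changes the original; `np.shares_memory()` confirms whether two arrays overlap in memory.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
copied = arr.copy()
viewed = arr.view()

# A view points back at the array that owns the data; a copy owns its own data.
print(viewed.base is arr)   # True
print(copied.base is None)  # True

# Basic slicing returns a view, so modifying the slice modifies arr.
part = arr[1:3]
part[0] = 99
print(arr)                            # [ 1 99  3  4  5  6]
print(np.shares_memory(arr, part))    # True
print(np.shares_memory(arr, copied))  # False
```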


### Iteration

You can iterate over a *NumPy* array just like a normal list.

In [13]:
for i in arr:
    print(i)

10
2
30
4
5
6


This approach becomes cumbersome with higher-dimensional arrays, because each extra dimension needs another nested loop. The `np.nditer()` function instead visits every element in a single loop, whatever the number of dimensions.

In [14]:
# 2D array
arr=np.array([[1, 2, 3], [4, 5, 6]])

# using for loops
for i in arr:
    for j in i:
        print(j, end=" ")

print()

# using np.nditer()
for i in np.nditer(arr):
    print(i, end=" ")

1 2 3 4 5 6 
1 2 3 4 5 6 
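The benefit shows more clearly with a 3-D array, where plain loops would need three levels of nesting while `np.nditer()` still needs one. The array below is just an illustrative example.

```python
import numpy as np

# A 3-D array of shape (2, 2, 2) holding the values 1 to 8.
cube = np.arange(1, 9).reshape(2, 2, 2)

# One loop visits all 8 elements, in row-major (C) order.
for x in np.nditer(cube):
    print(x, end=" ")
print()
```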

The data type of the elements can be changed during iteration with the `op_dtypes` parameter. *NumPy* cannot change the data type of the elements in place, so it needs extra buffer space, which is enabled with `flags=['buffered']`.

In [15]:
for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
    print(x)

b'1'
b'2'
b'3'
b'4'
b'5'
b'6'
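The same mechanism works for numeric conversions. The sketch below casts the integers to `float64` while iterating; when the whole array needs the new type, the `astype()` method converts it up front instead, which is usually simpler.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Cast while iterating: the buffer holds float64 copies of the elements.
for x in np.nditer(arr, flags=['buffered'], op_dtypes=['float64']):
    print(x, end=" ")
print()

# Convert the whole array at once instead.
as_float = arr.astype(np.float64)
print(as_float.dtype)  # float64
```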


To iterate with a step, slice the array before passing it to `np.nditer()`.

In [16]:
for x in np.nditer(arr[:, ::2]): # iterate over every other column of each row
    print(x)

1
3
4
6


If you want to display the index during iteration, use the `np.ndenumerate()` function.

In [17]:
for i, x in np.ndenumerate(arr):
    print(i, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6
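The index tuple yielded by `np.ndenumerate()` can be used directly to address the same position in another array. The sketch below fills a second array this way; it is only meant to show the mechanics, since `arr * 2` produces the same result in a single vectorized step.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
doubled = np.zeros_like(arr)

# idx is a tuple such as (0, 2), usable as an index into any array of the same shape.
for idx, x in np.ndenumerate(arr):
    doubled[idx] = x * 2

print(doubled)
```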


### Exercise

Below is data for 50 passengers from the Titanic dataset. Each row of the array contains 4 values:
1. passenger ID
2. survived? (1-yes, 0-no)
3. passenger class (1-upper, 2-middle, 3-lower)
4. age

Analyse the data and find the following information:

1. What is the shape of this array?
2. What is the average age of the passengers?
3. What is the passenger ID of the oldest passenger? Of the youngest?
4. What percentage of passengers survived?
5. What percentage of passengers survived in each passenger class?

In [18]:
passengers = np.array([
   [1, 0, 3, 22],
   [2, 1, 1, 38],
   [3, 1, 3, 26],
   [4, 1, 1, 35],
   [5, 0, 3, 35],
   [6, 0, 3, 18],
   [7, 0, 1, 54],
   [8, 0, 3, 2],
   [9, 1, 3, 27],
  [10, 1, 2, 14],
  [11, 1, 3, 4],
  [12, 1, 1, 58],
  [13, 0, 3, 20],
  [14, 0, 3, 39],
  [15, 0, 3, 14],
  [16, 1, 2, 55],
  [17, 0, 3, 2],
  [18, 1, 2, 12],
  [19, 0, 3, 31],
  [20, 1, 3, 8],
  [21, 0, 2, 35],
  [22, 1, 2, 34],
  [23, 1, 3, 15],
  [24, 1, 1, 28],
  [25, 0, 3, 8],
  [26, 1, 3, 38],
  [27, 0, 3, 2],
  [28, 0, 1, 1],
  [29, 1, 3, 5],
  [30, 0, 3, 18],
  [31, 0, 1, 40],
  [32, 1, 1, 70],
  [33, 1, 3, 33],
  [34, 0, 2, 66],
  [35, 0, 1, 28],
  [36, 0, 1, 42],
  [37, 1, 3, 5],
  [38, 0, 3, 18],
  [39, 0, 3, 18],
  [40, 1, 3, 14],
  [41, 0, 3, 40],
  [42, 0, 2, 27],
  [43, 0, 3, 29],
  [44, 1, 2, 0],
  [45, 1, 3, 19],
  [46, 0, 3, 33],
  [47, 0, 3, 14],
  [48, 1, 3, 22],
  [49, 0, 3, 41],
  [50, 0, 3, 18]
])

In [19]:
# 1. shape of the array
print(passengers.shape)

# 2. average age of the passengers
print(np.average(passengers[:, 3]))

# 3. ID (column 0) of the oldest passenger
maxage = np.max(passengers[:, 3])
oldest = passengers[passengers[:, 3] == maxage]
print(oldest[0, 0])

# ID (column 0) of the youngest passenger
minage = np.min(passengers[:, 3])
youngest = passengers[passengers[:, 3] == minage]
print(youngest[0, 0])


(50, 4)
25.5
32
44
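A more compact alternative uses `np.argmax()` and `np.argmin()`, which return the row index of the largest and smallest age directly. To keep the sketch short it uses a small subset of the rows above rather than the full array. Note that these functions return only the first match, whereas the boolean mask approach above would keep every tied row.

```python
import numpy as np

# A subset of the passenger rows above: [id, survived, class, age].
sample = np.array([
    [1, 0, 3, 22],
    [28, 0, 1, 1],
    [32, 1, 1, 70],
    [44, 1, 2, 0],
])

ages = sample[:, 3]
# Row index of the first maximum / minimum age, then read the ID in column 0.
oldest_id = sample[np.argmax(ages), 0]
youngest_id = sample[np.argmin(ages), 0]
print(oldest_id, youngest_id)  # 32 44
```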


In [20]:
# 4. percentage of passengers that survived
survived = passengers[passengers[:, 1] == 1]
survival_rate = (survived.shape[0]/passengers.shape[0])*100
print(survival_rate)

44.0
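Because `True` counts as 1 and `False` as 0, the mean of a boolean array is the fraction of `True` values, so a survival rate can be computed in one line. The 5-passenger column below is a hypothetical sample, not taken from the data above.

```python
import numpy as np

# Hypothetical survival column (1-yes, 0-no) for 5 passengers.
survived = np.array([0, 1, 1, 1, 0])

# Mean of the boolean mask = fraction who survived.
rate = np.mean(survived == 1) * 100
print(rate)  # 60.0
```

Applied to the full array, `np.mean(passengers[:, 1] == 1) * 100` should give the same 44.0 as the cell above.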


In [21]:
# 5. percentage of passengers that survived based on their passenger class
first_class = passengers[passengers[:, 2] == 1]
second_class = passengers[passengers[:, 2] == 2]
third_class = passengers[passengers[:, 2] == 3]

first_class_survived = first_class[first_class[:, 1] == 1]
second_class_survived = second_class[second_class[:, 1] == 1]
third_class_survived = third_class[third_class[:, 1] == 1]

first_class_survival_rate = (first_class_survived.shape[0]/first_class.shape[0])*100
second_class_survival_rate = (second_class_survived.shape[0]/second_class.shape[0])*100
third_class_survival_rate = (third_class_survived.shape[0]/third_class.shape[0])*100

print(first_class_survival_rate)
print(second_class_survival_rate)
print(third_class_survival_rate)

50.0
62.5
37.5
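The three near-identical blocks above can be folded into one loop over the distinct class values returned by `np.unique()`. This sketch runs on a small subset of the rows above so that it stays self-contained; the dictionary `rates` is just a convenient, illustrative container for the results.

```python
import numpy as np

# A subset of the passenger rows above: [id, survived, class, age].
sample = np.array([
    [1, 0, 3, 22],
    [2, 1, 1, 38],
    [3, 1, 3, 26],
    [4, 1, 1, 35],
    [5, 0, 3, 35],
    [6, 0, 3, 18],
    [7, 0, 1, 54],
    [10, 1, 2, 14],
    [21, 0, 2, 35],
])

rates = {}
for pclass in np.unique(sample[:, 2]):
    in_class = sample[sample[:, 2] == pclass]
    # Fraction of survivors in this class, as a percentage.
    rates[int(pclass)] = np.mean(in_class[:, 1] == 1) * 100
    print(f"class {pclass}: {rates[int(pclass)]:.1f}% survived")
```

Applied to the full `passengers` array, the same loop should reproduce the three rates printed above.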


### Sources

+ *Codedex* - https://www.codedex.io/numpy
+ *w3schools* - https://www.w3schools.com/python/numpy/default.asp