# Numpy

## Import Libraries

In [3]:
import numpy as np  # Numerical Python

In [1]:
presidents_heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]

How many presidents are taller than 188 cm?

In [2]:
cnt = 0
for height in presidents_heights:
    if height > 188:
        cnt +=1
print(cnt)

5


The numpy way, which compares every element at once and counts the `True` results:

In [5]:
heights_arr = np.array(presidents_heights)
print((heights_arr > 188).sum())

5


The array class in Numpy is called `ndarray`, short for **n-dimensional array**:

In [7]:
print(type(heights_arr))

<class 'numpy.ndarray'>


## Size and Shape

In [6]:
print(heights_arr.size)

45


The attribute `size` gives the total number of elements in an array. For a 1darray like this one, it equals the result of Python's built-in function `len`, which computes the length of sized objects such as str, list, and dict:

In [10]:
print(len(heights_arr))

45
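The two values agree here only because `heights_arr` is one-dimensional. For a 2darray, `len` returns the length of the first axis (the number of rows), while `size` counts every element. A minimal sketch with a small hypothetical array:

```python
import numpy as np

# A small hypothetical 2darray with 2 rows and 3 columns
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.size)  # total number of elements: 6
print(len(arr))  # length of axis 0, i.e. the number of rows: 2
```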


The `shape` attribute tells us the size of the array along each dimension:

In [9]:
print(heights_arr.shape)

(45,)


The output is a **tuple** (recall that the built-in tuple type is immutable, whereas a list is mutable) containing a single value, which indicates that there is only one dimension, i.e., axis 0. Along axis 0 there are 45 elements, one for each president. Here, heights_arr is a 1darray.
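For comparison, a 2darray has a two-element shape tuple, one entry per axis, and the number of axes is available as `ndim`. A short sketch; the small arrays are illustrative only:

```python
import numpy as np

heights = np.array([189, 170, 189, 163])  # first four presidents' heights
grid = np.zeros((2, 4))                    # a hypothetical 2darray filled with zeros

print(heights.shape, heights.ndim)  # (4,) 1
print(grid.shape, grid.ndim)        # (2, 4) 2
print(grid.shape[1])                # size along axis 1: 4
```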

## Reshape

In [11]:
presidents_ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46,
    54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55,
    56, 61, 52, 69, 64, 46, 54, 47, 70
]

In [12]:
heights_and_ages = presidents_heights + presidents_ages 
# convert a list to a numpy array
heights_and_ages_arr = np.array(heights_and_ages)
print(heights_and_ages_arr.shape)

(90,)


This produces one long 1darray of 90 elements. It would be clearer to align height and age for each president by reorganizing the data into a 2 by 45 matrix, where the first row contains all heights and the second row contains all ages. To do this, call `numpy.ndarray.reshape` with the new dimensions given as a tuple; it returns a new array with the requested shape:

In [21]:
heights_and_ages_arr = heights_and_ages_arr.reshape((2,45))

In [22]:
print(heights_and_ages_arr)

[[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 182 185 191]
 [ 57  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]
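Note that `reshape` leaves the shape of the original array unchanged, and the new shape must hold exactly the same number of elements as the old one. A minimal sketch with a small illustrative array:

```python
import numpy as np

flat = np.arange(6)           # [0 1 2 3 4 5], shape (6,)
grid = flat.reshape((2, 3))   # the same 6 elements arranged as 2 rows by 3 columns

print(flat.shape)  # (6,): the original keeps its shape
print(grid)
# [[0 1 2]
#  [3 4 5]]

try:
    flat.reshape((4, 2))      # 8 slots for only 6 elements
except ValueError as err:
    print("ValueError:", err)
```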


Numpy can infer one dimension for us if we pass -1 for it. For example, given a 2darray `arr` of shape (3,4), `arr.reshape(-1)` returns a 1darray of shape (12,), while `arr.reshape((-1,2))` returns a 2darray of shape (6,2).
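The example above can be checked directly. Only one dimension may be given as -1, since numpy can infer at most one unknown size:

```python
import numpy as np

arr = np.arange(12).reshape((3, 4))  # a 2darray of shape (3, 4)

print(arr.reshape(-1).shape)       # (12,)
print(arr.reshape((-1, 2)).shape)  # (6, 2): numpy infers 12 / 2 = 6 rows
```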

## Data Type

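Another characteristic of a numpy array is that it is **homogeneous**, meaning every element has the same data type, which is stored in the `dtype` attribute (the default integer type depends on the platform; on some systems it is `int32` rather than `int64`):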

In [14]:
print(heights_arr.dtype)

int64


### Type coercion

In [17]:
heights_float = [
    189.0,  # we mixed a float number in
    170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183,
    193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182,
    188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
heights_float_arr = np.array(heights_float)
print(heights_float_arr)
print("\n")
print("Type of heights_float_arr:", heights_float_arr.dtype)

[189. 170. 189. 163. 183. 171. 185. 168. 173. 183. 173. 173. 175. 178.
 183. 193. 178. 173. 174. 183. 183. 180. 168. 180. 170. 178. 182. 180.
 183. 178. 182. 188. 175. 179. 183. 193. 182. 183. 177. 185. 188. 188.
 182. 185. 191.]


Type of heights_float_arr: float64


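Numpy supports several data types, such as `int` (integer), `float` (floating-point number), and `bool` (boolean values, True and False). The number after the data type, e.g., `int64`, is its **bitsize**: the number of bits used to store each element. Because an array is homogeneous, mixing a single float into the list above made numpy convert every element to `float64`, which is why each printed value ends in a decimal point.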
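To see the coercion rule and the bitsize directly, one can inspect `dtype` and `itemsize` (the number of bytes per element, so the bitsize is 8 * itemsize). A minimal sketch:

```python
import numpy as np

mixed = np.array([189.0, 170])  # one float forces every element to float
flags = np.array([True, False])

print(mixed.dtype)                        # float64
print(flags.dtype)                        # bool
print(np.dtype(np.int64).itemsize * 8)    # 64 bits per int64 element
print(np.dtype(np.float32).itemsize * 8)  # 32 bits per float32 element
```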

## Indexing

We can use array indexing to select individual elements from arrays. As with Python lists, numpy **indexing starts from 0**.

To access the height of the **3rd** president Thomas Jefferson in the 1darray `heights_arr`:

In [18]:
print(heights_arr[2])

189
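Negative indices count from the end, again as with Python lists. A short sketch on the first five heights only:

```python
import numpy as np

heights_arr = np.array([189, 170, 189, 163, 183])  # first five presidents

print(heights_arr[0])   # 189, the 1st president
print(heights_arr[2])   # 189, the 3rd president (index = position - 1)
print(heights_arr[-1])  # 183, the last element of this short array
```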


In a **2darray**, there are two axes, axis 0 and axis 1. Axis 0 runs vertically down the **rows**, whereas axis 1 runs horizontally across the **columns**.

In the 2darray heights_and_ages_arr, recall that its shape is (2, 45). To find Thomas Jefferson’s age at the beginning of his presidency, we need the second row, where ages are stored, and the third column:

In [23]:
print(heights_and_ages_arr[1,2])

57


In a 2darray, the **row is axis 0** and the **column is axis 1**, so an index is written as `[row, column]`: numpy first looks up the position along the rows, then along the columns. In our example, heights_and_ages_arr[1,2] accesses the second row (index 1, ages) and the third column (index 2, the third president), giving Thomas Jefferson’s age.
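An equivalent spelling is chained indexing, `arr[1][2]`, which first extracts row 1 as a 1darray and then indexes into it; the numpy documentation notes this is less efficient because it creates a temporary array. A sketch using the first three presidents:

```python
import numpy as np

# heights in row 0, ages in row 1, for the first three presidents
arr = np.array([[189, 170, 189],
                [ 57,  61,  57]])

print(arr[1, 2])  # 57: row index 1 (ages), column index 2 (3rd president)
print(arr[1][2])  # 57: the same element via two separate indexing steps
print(arr[0])     # [189 170 189]: a single index selects a whole row
```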

## Slicing

What if we want to inspect the **first three elements of the first row** in a 2darray? We use **":"** to select elements from the starting index **up to, but not including, the ending index**. This is called **slicing**:

In [24]:
print(heights_and_ages_arr[0, 0:3])

[189 170 189]


When the starting index is 0, we can omit it as shown below:

In [25]:
print(heights_and_ages_arr[0, :3])

[189 170 189]


What if we’d like to see **the entire fourth column**? Use a bare **":"** in the row position to select all rows:

In [27]:
print(heights_and_ages_arr[:, 3])

[163  57]


Numpy slicing syntax follows that of a Python list: `arr[start:stop:step]`. When any of these are omitted, they default to **start=0**, **stop=size of the dimension**, and **step=1** (with a negative step, the defaults flip so the slice runs from the end toward the beginning).
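A few slices on the first eight heights illustrate the `start:stop:step` defaults:

```python
import numpy as np

heights = np.array([189, 170, 189, 163, 183, 171, 185, 168])

print(heights[1:4])   # [170 189 163]: indices 1, 2, 3 (stop is excluded)
print(heights[::2])   # [189 189 183 185]: every second element
print(heights[5:])    # [171 185 168]: from index 5 to the end
print(heights[::-1])  # all eight elements in reverse order
```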

## Assigning Single Values

Sometimes you need to change the values of particular elements in an array. For example, suppose the fourth entry in heights_arr is incorrect: it should be 165 instead of 163. We can reassign the correct value with:

In [28]:
heights_arr[3] = 165
print(heights_arr)

[189 170 189 165 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
 182 183 177 185 188 188 182 185 191]
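Because the array is homogeneous, an assigned value is converted to the array's dtype. Assigning a float into an integer array therefore drops the fractional part without any warning, as this small sketch (with a hypothetical value of 165.7) shows:

```python
import numpy as np

heights_arr = np.array([189, 170, 189, 163])  # an integer array

heights_arr[3] = 165.7  # the float is cast to the array's integer dtype
print(heights_arr)      # [189 170 189 165]: the fractional part is dropped
```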


Single values in a 2darray are assigned the same way, using one index per axis. For example, to change the fourth height (row 0, column 3) in heights_and_ages_arr to 165:

In [29]:
heights_and_ages_arr[0, 3] = 165
print(heights_and_ages_arr)

[[189 170 189 165 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 182 185 191]
 [ 57  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]


We can also use slicing to assign to multiple elements at once. For example, to replace every value in the first row of heights_and_ages_arr with the row's mean, 180:

In [30]:
heights_and_ages_arr[0,:] = 180
print(heights_and_ages_arr)

[[180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180
  180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180
  180 180 180 180 180 180 180 180 180]
 [ 57  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]
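Rather than typing 180 by hand, the mean could be computed from the row itself. `.mean()` returns a float, which is cast back to the integer dtype on assignment, and the single scalar is broadcast to every element of the slice. A sketch with a small hypothetical array:

```python
import numpy as np

arr = np.array([[189, 170, 181],
                [ 57,  61,  57]])

row_mean = arr[0, :].mean()  # (189 + 170 + 181) / 3 = 180.0
arr[0, :] = row_mean         # the scalar is broadcast to every element of row 0
print(arr)
# [[180 180 180]
#  [ 57  61  57]]
```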


Slices on both axes can be combined to change any rectangular subarray. For example, to set the upper-left 2 by 2 corner to 0:

In [31]:
heights_and_ages_arr[:2, :2] = 0
print(heights_and_ages_arr)

[[  0   0 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180
  180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180
  180 180 180 180 180 180 180 180 180]
 [  0   0  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]


Combining assignment with slicing makes it easy to update values in a subarray. For more on basic slicing and advanced indexing in numpy, check out this [link](https://numpy.org/doc/stable/reference/arrays.indexing.html).
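One consequence worth knowing: a basic slice is a *view* that shares memory with the original array, so writing through the slice changes the original. Calling `.copy()` gives an independent array instead. A minimal sketch:

```python
import numpy as np

arr = np.array([[189, 170, 189],
                [ 57,  61,  57]])

first_row = arr[0, :]    # a view into arr, not a copy
first_row[:] = 0         # writes through to arr
print(arr[0])            # [0 0 0]

ages = arr[1, :].copy()  # an independent copy
ages[:] = 99
print(arr[1])            # [57 61 57]: arr is unaffected
```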

## Assigning an Array to an Array

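In addition, a 1darray or a 2darray can be assigned to a subset of another 2darray, as long as its shape matches the shape of the selected subarray. Because the previous cells overwrote most of the values in heights_and_ages_arr, the cell below first rebuilds it from the original data.

If we want to update both the height and the age of the first president with new data, we can supply the data in a list: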

In [33]:

heights = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]

ages = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70]

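heights_and_ages = heights + ages
# rebuild the original 2 by 45 array, since earlier cells overwrote its values
heights_and_ages_arr = np.array(heights_and_ages)
heights_and_ages_arr = heights_and_ages_arr.reshape((2,45))

# replace column 0 (the first president) with new [height, age] values
heights_and_ages_arr[:, 0] = [190, 58]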
print(heights_and_ages_arr)

[[190 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 182 185 191]
 [ 58  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]


We can also update data in a subarray with a numpy array as such:

In [34]:
new_record = np.array([[180, 183, 190], [54, 50, 69]])
heights_and_ages_arr[:, 42:] = new_record
print(heights_and_ages_arr)

[[190 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 180 183 190]
 [ 58  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  50  69]]


Note that the values in the last three columns have changed.
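The right-hand side must match (or broadcast to) the shape of the selected subarray. A sketch with a small hypothetical array, showing one assignment that fits and one that does not:

```python
import numpy as np

arr = np.zeros((2, 5), dtype=int)

arr[:, 2:] = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3) fits arr[:, 2:]
print(arr)

try:
    arr[:, 3:] = np.array([[1, 2, 3], [4, 5, 6]])  # arr[:, 3:] has shape (2, 2)
except ValueError as err:
    print("ValueError:", err)
```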

## Combining Two Arrays

If we reshape both heights_arr and ages_arr to shape (45, 1), turning each into a single column, we can stack them horizontally (column by column) into a 2darray using `hstack`:

In [40]:
heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47,
    55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70
]

heights_arr = np.array(heights)
ages_arr = np.array(ages)

heights_arr = heights_arr.reshape((45, 1))
ages_arr = ages_arr.reshape((45, 1))

height_age_arr = np.hstack((heights_arr, ages_arr))
print(height_age_arr.shape)
print(height_age_arr[:10,])


(45, 2)
[[189  57]
 [170  61]
 [189  57]
 [163  57]
 [183  58]
 [171  57]
 [185  61]
 [168  54]
 [173  68]
 [183  51]]


Similarly, if we want to combine the arrays vertically (by row), we can use `vstack`.

In [37]:
heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47,
    55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70
]


heights_arr = np.array(heights)
ages_arr = np.array(ages)

heights_arr = heights_arr.reshape((1,45))
ages_arr = ages_arr.reshape((1,45))

height_age_arr = np.vstack((heights_arr, ages_arr))
print(height_age_arr.shape)
print(height_age_arr)

(2, 45)
[[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
  174 183 183 180 168 180 170 178 182 180 183 178 182 188 175 179 183 193
  182 183 177 185 188 188 182 185 191]
 [ 57  61  57  57  58  57  61  54  68  51  49  64  50  48  65  52  56  46
   54  49  51  47  55  55  54  42  51  56  55  51  54  51  60  62  43  55
   56  61  52  69  64  46  54  47  70]]
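Unlike `hstack`, `vstack` promotes 1darrays of shape (N,) to rows of shape (1, N) on its own, so the explicit reshape to (1, 45) above is optional. A sketch with the first three presidents:

```python
import numpy as np

heights = np.array([189, 170, 189])
ages = np.array([57, 61, 57])

stacked = np.vstack((heights, ages))  # each 1darray becomes one row
print(stacked.shape)                  # (2, 3)
print(stacked)
# [[189 170 189]
#  [ 57  61  57]]
```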


## Concatenate

More generally, we can use the function `numpy.concatenate` to link arrays together. To join two 2darrays side by side (along axis 1, adding columns), pass `axis=1`, which gives the same result as `numpy.hstack`; to join them vertically (along axis 0, adding rows) like `numpy.vstack`, pass `axis=0`.

In [42]:
heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47,
    55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70
]


heights_arr = np.array(heights)
ages_arr = np.array(ages)

heights_arr = heights_arr.reshape((45,1))
ages_arr = ages_arr.reshape((45,1))

# equivalent to: height_age_arr = np.hstack((heights_arr, ages_arr))
height_age_arr = np.concatenate((heights_arr, ages_arr), axis=1)

print(height_age_arr.shape)
print(height_age_arr[:10,])

(45, 2)
[[189  57]
 [170  61]
 [189  57]
 [163  57]
 [183  58]
 [171  57]
 [185  61]
 [168  54]
 [173  68]
 [183  51]]
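For the vertical case, `axis=0` reproduces the `vstack` result, provided the inputs are already 2darrays (`concatenate` does not add a dimension the way `vstack` does for 1darrays). A sketch with the first three presidents:

```python
import numpy as np

heights_row = np.array([[189, 170, 189]])  # shape (1, 3)
ages_row = np.array([[57, 61, 57]])        # shape (1, 3)

by_rows = np.concatenate((heights_row, ages_row), axis=0)
print(by_rows.shape)                                                # (2, 3)
print(np.array_equal(by_rows, np.vstack((heights_row, ages_row))))  # True
```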


## Operations

### Mathematical Operations on Arrays

Performing mathematical operations on arrays is straightforward, because arithmetic operators are applied element-wise. For instance, to convert the heights from centimeters to feet, knowing that 1 centimeter equals 0.0328084 feet, we multiply the height column by that factor:

In [43]:
print(height_age_arr[:,0]*0.0328084)

[6.2007876 5.577428  6.2007876 5.3477692 6.0039372 5.6102364 6.069554
 5.5118112 5.6758532 6.0039372 5.6758532 5.6758532 5.74147   5.8398952
 6.0039372 6.3320212 5.8398952 5.6758532 5.7086616 6.0039372 6.0039372
 5.905512  5.5118112 5.905512  5.577428  5.8398952 5.9711288 5.905512
 6.0039372 5.8398952 5.9711288 6.1679792 5.74147   5.8727036 6.0039372
 6.3320212 5.9711288 6.0039372 5.8070868 6.069554  6.1679792 6.1679792
 5.9711288 6.069554  6.2664044]


Other mathematical operations for **addition**, **subtraction**, **division** and **power** (+, -, /, **) work the same way on arrays.
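The other operand can be either a scalar or another array of the same shape. A short sketch on the first three presidents; the quantities are chosen only to demonstrate the operators:

```python
import numpy as np

heights_cm = np.array([189, 170, 189])
ages = np.array([57, 61, 57])

heights_m = heights_cm / 100   # division by a scalar, element-wise (result is float64)
squared = heights_m ** 2       # power, element-wise
same = heights_m * heights_m   # array * array, element by element
print(np.allclose(squared, same))  # True
print(ages + 4)                # [61 65 61]: age four years into the term
```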

### Numpy Array Methods

In addition, there are several methods in numpy to perform more complex calculations on arrays. For example, the sum() method finds the sum of all the elements in an array:

In [44]:
heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47,
    55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70
]

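# convert both lists to column vectors of shape (45, 1) before stacking them
heights_arr = np.array(heights).reshape((45, 1))
ages_arr = np.array(ages).reshape((45, 1))
height_age_arr = np.hstack((heights_arr, ages_arr))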

print(height_age_arr.sum())

10575


The sum of all heights and ages combined is 10575. **To sum heights and ages separately, specify axis=0**: the sum is then taken down the rows (collapsing axis 0), which yields one sum per column, or column sum. To obtain row sums instead, specify axis=1. Here we want the column sums, i.e., the total of all heights and the total of all ages:

In [45]:
print(height_age_arr.sum(axis=0))

[8100 2475]
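Summing with `axis=1` instead collapses the columns and yields one value per row, here height plus age for each president (not a meaningful quantity, but it shows the resulting shape). A sketch with the first three presidents:

```python
import numpy as np

height_age_arr = np.array([[189, 57],
                           [170, 61],
                           [189, 57]])  # shape (3, 2)

print(height_age_arr.sum(axis=0))  # [548 175]: one sum per column
print(height_age_arr.sum(axis=1))  # [246 231 246]: one sum per row
```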


Other operations, such as `.min()`, `.max()`, `.mean()`, work in a similar way to `.sum()`.
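For example, the column-wise mean, minimum, and maximum of the full data set can be sketched as follows (using `np.column_stack` to build the (45, 2) array directly from the two lists):

```python
import numpy as np

# heights (cm) and ages at the start of the presidency, as listed above
heights = [
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180,
    168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191
]
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65, 52, 56, 46, 54, 49, 51, 47,
    55, 55, 54, 42, 51, 56, 55, 51, 54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 46, 54, 47, 70
]
height_age_arr = np.column_stack((heights, ages))  # shape (45, 2)

print(height_age_arr.mean(axis=0))  # column means: 180.0 cm and 55.0 years
print(height_age_arr.min(axis=0))   # [163  42]
print(height_age_arr.max(axis=0))   # [193  70]
print(height_age_arr.max())         # 193: without axis, over all elements
```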

## Comparisons

Comparison operators such as `<`, `>`, `>=`, `<=`, and `==` are applied element-wise and return a boolean array, which lets us filter the data. For example, in the height_age_arr dataset, we might be interested only in those presidents who started their presidency younger than 55 years old:

In [46]:
print(height_age_arr[:, 1] < 55)

[False False False False False False False  True False  True  True False
  True  True False  True False  True  True  True  True  True False False
  True  True  True False False  True  True  True False False  True False
 False False  True False False  True  True  True False]
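The boolean array can also be used as an index to keep only the rows where it is `True`. A sketch with the first ten presidents:

```python
import numpy as np

# first ten presidents: column 0 is height (cm), column 1 is age
height_age_arr = np.column_stack((
    [189, 170, 189, 163, 183, 171, 185, 168, 173, 183],
    [57, 61, 57, 57, 58, 57, 61, 54, 68, 51],
))

young = height_age_arr[:, 1] < 55  # boolean mask, one entry per row
print(height_age_arr[young])       # only the rows where the mask is True
# [[168  54]
#  [183  51]]
```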


To find out **how many rows satisfy a condition**, call `.sum()` on the resulting 1d boolean array, since **True is treated as 1 and False as 0 in the sum**. For example, (height_age_arr[:, 1] == 51).sum() shows that exactly five presidents started their presidency at age 51:

In [47]:
(height_age_arr[:, 1] == 51).sum()

5
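Besides `.sum()`, `np.count_nonzero` counts the `True` values, and `.mean()` on a boolean array gives the fraction of rows that satisfy the condition. Conditions can be combined element-wise with `&` (and) and `|` (or), with each comparison wrapped in parentheses. A sketch on the first ten presidents:

```python
import numpy as np

ages = np.array([57, 61, 57, 57, 58, 57, 61, 54, 68, 51])  # first ten presidents

print(np.count_nonzero(ages < 55))         # 2
print((ages < 55).mean())                  # 0.2, i.e. 20% of these ten
print(((ages >= 55) & (ages < 60)).sum())  # 5: ages from 55 up to, but not including, 60
```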