# NumPy

In [1]:
import numpy as np

## Vectorized Operations

Adding a number to every element of a plain Python list requires a loop or a list comprehension; adding it to the list directly raises an error:

In [2]:
[1, 2, 3] + 2

TypeError: can only concatenate list (not "int") to list
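
For comparison, the plain-Python workaround mentioned above might look like the following minimal sketch. The variable names are illustrative and do not come from the original notebook.

```python
# Plain-Python alternative: add 2 to every element without NumPy.
numbers = [1, 2, 3]

# Explicit loop
with_loop = []
for n in numbers:
    with_loop.append(n + 2)

# Equivalent list comprehension
with_comprehension = [n + 2 for n in numbers]

print(with_loop)           # [3, 4, 5]
print(with_comprehension)  # [3, 4, 5]
```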

NumPy makes this, and almost any other element-wise operation, easier. Operations on arrays are **vectorized**: they apply to every element at once.

In [4]:
a = np.array([1, 2, 3, 4, 5])

In [11]:
a += 2

In [12]:
a

array([3, 4, 5, 6, 7])

In [13]:
a - 10

array([-7, -6, -5, -4, -3])

In [14]:
a * 2

array([ 6,  8, 10, 12, 14])

In [15]:
a / 20

array([0.15, 0.2 , 0.25, 0.3 , 0.35])

In [16]:
a

array([3, 4, 5, 6, 7])

In [17]:
a == 5

array([False, False,  True, False, False])

In [18]:
a < 5

array([ True,  True, False, False, False])

In [19]:
a >= 6

array([False, False, False,  True,  True])

## Indexing

In [20]:
a

array([3, 4, 5, 6, 7])

In [21]:
a[0]

3

In [22]:
a[:3]

array([3, 4, 5])

Comparison operators produce boolean arrays, or *masks*. A mask can be used to index into an array, extracting only the elements where the mask is `True`.

Demo:

1. Create a boolean mask
2. Index into the array
3. Negate the mask

In [25]:
mask = [True, True, True, False, False]
a[mask]

array([3, 4, 5])

In [26]:
a

array([3, 4, 5, 6, 7])

In [27]:
a < 5

array([ True,  True, False, False, False])

In [28]:
a[a < 5]

array([3, 4])

In [29]:
a[a > 5]

array([6, 7])

In [32]:
a[a % 2 == 1]

array([3, 5, 7])

In [31]:
odd_number_filter = a % 2 == 1
a[odd_number_filter]

array([3, 5, 7])

In [34]:
# a <= 3 or a > 6
a[(a <= 3) | (a > 6)]

array([3, 7])

To combine boolean arrays, you **cannot** use `and` / `or` / `not`.

Instead, use the element-wise operators below, wrapping each comparison in parentheses:

- `&`: and
- `|`: or
- `~`: not
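
A small sketch of why the Python keywords fail, assuming the same five-element array used earlier (rebuilt here so the block runs on its own). NumPy raises a `ValueError` because `or` asks for a single truth value for the whole array. The parentheses matter because `|` and `&` bind more tightly than comparisons such as `<=` and `>`.

```python
import numpy as np

a = np.array([3, 4, 5, 6, 7])

# `or` needs one truth value per operand, which is ambiguous for an array.
try:
    (a <= 3) or (a > 6)
    raised = False
except ValueError:
    raised = True

# The element-wise operators combine the masks position by position.
either = (a <= 3) | (a > 6)
both = (a >= 4) & (a <= 6)
neither = ~either

print(raised)      # True
print(a[either])   # [3 7]
print(a[both])     # [4 5 6]
print(a[neither])  # [4 5 6]
```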

In [35]:
elements_less_than_3 = a <= 3
greater_than_6 = a > 6

In [36]:
elements_less_than_3

array([ True, False, False, False, False])

In [37]:
greater_than_6

array([False, False, False, False,  True])

In [38]:
elements_less_than_3 | greater_than_6

array([ True, False, False, False,  True])

In [39]:
~ elements_less_than_3

array([False,  True,  True,  True,  True])

In [42]:
a[elements_less_than_3 | greater_than_6]

array([3, 7])

In [44]:
a[(a <= 3) | (a > 6)]

array([3, 7])

## Methods

- `min`
- `max`
- `sum`
- `std`
- `any`, `all`

In [45]:
a

array([3, 4, 5, 6, 7])

In [46]:
a.min()

3

In [47]:
a.max()

7

In [48]:
a.sum()

25

In [49]:
a.std()

1.4142135623730951
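
Note that `std` defaults to the population standard deviation (`ddof=0`), which is why the result above is the square root of 2. The following is a hedged sketch of the difference on the same five values; the sample version (`ddof=1`) divides by `n - 1` instead of `n`.

```python
import numpy as np

a = np.array([3, 4, 5, 6, 7])

population_std = a.std()     # divides by n (ddof=0, the default)
sample_std = a.std(ddof=1)   # divides by n - 1

# The population value computed by hand from the definition.
by_hand = np.sqrt(((a - a.mean()) ** 2).mean())

print(population_std)  # about 1.414
print(sample_std)      # about 1.581
print(by_hand)         # matches population_std
```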

- `.all` -- every single element is `True`
- `.any` -- at least one element is `True`

In [52]:
# Are all the elements in a less than 10?
(a < 10).all()

True

In [54]:
# Are there any negative numbers?
(a < 0).any()

False

## More Examples

In [55]:
# Generate 1000 random integers between 1 and 100 (inclusive; randint's upper bound is exclusive)
np.random.seed(123)
a = np.random.randint(1, 101, 1000)

In [56]:
a

array([ 67,  93,  99,  18,  84,  58,  87,  98,  97,  48,  74,  33,  47,
        97,  26,  84,  79,  37,  97,  81,  69,  50,  56,  68,   3,  85,
        40,  67,  85,  48,  62,  49,   8, 100,  93,  53,  98,  86,  95,
        28,  35,  98,  77,  41,   4,  70,  65,  76,  35,  59,  11,  23,
        78,  19,  16,  28,  31,  53,  71,  27,  81,   7,  15,  76,  55,
        72,   2,  44,  59,  56,  26,  51,  85,  57,  50,  13,  19,  82,
         2,  52,  45,  49,  57,  92,  50,  87,   4,  68,  12,  22,  90,
        99,   4,  12,   4,  95,   7,  10,  88,  15,  84,  71,  13,  55,
        28,  39,  18,  62,  75, 100, 100,  66,  48,  17,   6,  87,  47,
        16,  60,  41,  26,  46,  50,   1,  36,  30,   2,  84,  69,  31,
         8,  94,  61,  66,  77,  68,  45,  52,   8,  89,  71,  14,  29,
        64,  85,  37,  97,  41,  89,  64,  59,  78,   9,  79,   7,  66,
        95,  71,  41,  75,  77,  77,  26,   8,  14,  45,   2,  42,  79,
        57,  88,  64,  98,   4,  18,  89,  88,  70,  98,  51,   

We can use NumPy to answer some questions about this data:

In [57]:
# 1. How many data points are there?
a.shape

(1000,)

In [61]:
# 2. How many data points are greater than 70? (filter, then check .shape)
a[a > 70].shape

(311,)

In [66]:
# 2. How many data points are greater than 70? (sum the mask: each True counts as 1)
(a > 70).sum()

311
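
Both approaches give the same count because `True` is treated as 1 when a boolean array is summed. Here is a minimal sketch on a small array (an assumption for illustration, not the random data above), with `np.count_nonzero` shown as a third option:

```python
import numpy as np

small = np.array([3, 4, 5, 6, 7])
mask = small > 5  # [False, False, False, True, True]

count_by_shape = small[mask].shape[0]      # length of the filtered array
count_by_sum = mask.sum()                  # True counts as 1, False as 0
count_by_nonzero = np.count_nonzero(mask)  # counts the True entries directly

print(count_by_shape, count_by_sum, count_by_nonzero)  # 2 2 2
```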

In [68]:
# 3. What is the sum of the odd numbers?
a[a % 2 == 1].sum()

23401

In [78]:
# How many data points are less than 10?
a[a < 10].shape

(91,)

In [72]:
# 4. Take all the numbers between 30 and 80 (inclusive), square them, what is the highest resulting number?
(a[(a >= 30) & (a <= 80)] ** 2).max()

6400

In [79]:
# 4. The same question, broken into named steps
at_least_30 = a >= 30
at_most_80 = a <= 80

in_our_desired_range = at_least_30 & at_most_80

desired_numbers = a[in_our_desired_range]
desired_numbers_squared = desired_numbers ** 2

desired_numbers_squared.max()

6400

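`np.where` produces values conditionally, based on a boolean array: `np.where(condition, x, y)` takes elements from `x` where `condition` is `True` and from `y` everywhere else.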
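
As a small illustration, here is a sketch of `np.where` on a five-element array (chosen for demonstration; it is not the random data above):

```python
import numpy as np

small = np.array([3, 4, 5, 6, 7])

# Where the condition is True take small ** 2, otherwise keep small.
result = np.where(small % 2 == 1, small ** 2, small)

print(result)  # [ 9  4 25  6 49]
```

The result always has the same shape as the condition, which is what distinguishes `np.where` from mask indexing: indexing drops elements, while `np.where` replaces them.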

In [80]:
# 5. Square the odd numbers in the array. What is the average of the resulting data set? (np.where)
odd_numbers_squared = np.where(a % 2 == 1, a ** 2, a)

odd_numbers_squared.mean()

1595.057

In [83]:
# 6. Square the even numbers in the array. Remove any odd number less than 40.
#    Double odd numbers greater than 80. What is the sum of the resulting dataset?
evens_squared = np.where(a % 2 == 0, a ** 2, a)
# Keep everything except the odd numbers below 40
is_small_odd = (evens_squared % 2 == 1) & (evens_squared < 40)
x = evens_squared[~is_small_odd]
# Double only the odd numbers above 80
x = np.where((x % 2 == 1) & (x > 80), x * 2, x)
x.sum()

## Matrices

- construction
- manipulating
- axis=

In [None]:
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
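
The notebook stops here, so the following is only a sketch of how the three topics listed above (construction, manipulation, `axis=`) might be demonstrated with `m`. None of it comes from the original, and the name `matrix` is an assumption.

```python
import numpy as np

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Construction: a nested list becomes a 2-D array.
matrix = np.array(m)
print(matrix.shape)        # (3, 3)

# Manipulation: index with [row, column]; slices work on each dimension.
print(matrix[0, 2])        # 3
print(matrix[:, 0])        # [1 4 7]  (first column)
print(matrix * 10)         # vectorized, as with 1-D arrays

# axis=0 collapses the rows (one result per column);
# axis=1 collapses the columns (one result per row).
print(matrix.sum())        # 45
print(matrix.sum(axis=0))  # [12 15 18]
print(matrix.sum(axis=1))  # [ 6 15 24]
```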