# NumPy

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-numpy" data-toc-modified-id="Why-numpy-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why numpy</a></span></li><li><span><a href="#Creating-arrays" data-toc-modified-id="Creating-arrays-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Creating arrays</a></span><ul class="toc-item"><li><span><a href="#Custom" data-toc-modified-id="Custom-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Custom</a></span><ul class="toc-item"><li><span><a href="#1-dimensional" data-toc-modified-id="1-dimensional-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>1 dimensional</a></span></li><li><span><a href="#2-dimensional" data-toc-modified-id="2-dimensional-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>2 dimensional</a></span></li><li><span><a href="#3-dimensional" data-toc-modified-id="3-dimensional-2.1.3"><span class="toc-item-num">2.1.3&nbsp;&nbsp;</span>3 dimensional</a></span></li></ul></li><li><span><a href="#Built-in" data-toc-modified-id="Built-in-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Built-in</a></span><ul class="toc-item"><li><span><a href="#arange" data-toc-modified-id="arange-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span><code>arange</code></a></span></li><li><span><a href="#linspace" data-toc-modified-id="linspace-2.2.2"><span class="toc-item-num">2.2.2&nbsp;&nbsp;</span><code>linspace</code></a></span></li></ul></li></ul></li><li><span><a href="#Properties" data-toc-modified-id="Properties-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Properties</a></span></li><li><span><a href="#Slicing" data-toc-modified-id="Slicing-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Slicing</a></span><ul class="toc-item"><li><span><a href="#Conditional-slicing" data-toc-modified-id="Conditional-slicing-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Conditional slicing</a></span></li></ul></li><li><span><a href="#Useful-array-methods" 
data-toc-modified-id="Useful-array-methods-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Useful array methods</a></span></li><li><span><a href="#Broadcasting" data-toc-modified-id="Broadcasting-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Broadcasting</a></span></li><li><span><a href="#Further-materials" data-toc-modified-id="Further-materials-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Further materials</a></span></li></ul></div>

In [1]:
import numpy as np
import pandas as pd

Convention for this notebook:
 - `a` will always be a 1-dimensional array.
 - `b` will always be a 2-dimensional array.
 - `c` will always be a 3-dimensional array.

## Why numpy

NumPy (Numerical Python) is used to do numerical computations **efficiently** in Python.

In [None]:
lst = [1, 2, 3, 4, 5]

In [None]:
type(lst)

I want to multiply all elements by `2`

In [None]:
lst * 2

This did not work as expected: multiplying a list by `2` repeats the list instead of doubling each element.
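
A minimal sketch of the difference, using a hypothetical `prices` list (not part of the notebook's data): multiplying a list repeats it, while a list comprehension or a NumPy array multiplies each element.

```python
import numpy as np

prices = [1, 2, 3]  # hypothetical example data

# Multiplying a list by an integer repeats the list
repeated = prices * 2

# A list comprehension doubles each element, one at a time
doubled_list = [p * 2 for p in prices]

# A NumPy array applies the multiplication element-wise
doubled_array = np.array(prices) * 2

print(repeated)
print(doubled_list)
print(doubled_array)
```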

In [None]:
a = np.array(lst)

In [None]:
type(a)

In [None]:
lst

In [None]:
a

In [None]:
a * 2

As expected

In [None]:
a + 100

In [None]:
our_money = [100, 240, 1100, 85]

In [None]:
our_money = np.array(our_money)

I am given 2% compound interest yearly. I have $240. How much will I have in 3 years' time?

In [None]:
240 * (1.02) ** 3

Let's build a function

In [None]:
def my_money(initial_money, years, perc_interest, r=0):
    factor = 1 + perc_interest / 100
    
    return round(initial_money * factor ** years, r)

In [None]:
our_money

In [None]:
(our_money * 1.02 ** 3).round(0)
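
As a sketch, the same calculation can be wrapped in a function that accepts either a single amount or a whole array. The names `my_money_vectorized`, `savings` and `after_3_years` are hypothetical, and `np.round` is used here so that arrays are rounded element-wise.

```python
import numpy as np

def my_money_vectorized(initial_money, years, perc_interest, r=0):
    """Compound interest for a single amount or an array of amounts."""
    factor = 1 + perc_interest / 100
    # np.round works element-wise on arrays and also accepts plain numbers
    return np.round(np.asarray(initial_money) * factor ** years, r)

savings = np.array([100, 240, 1100, 85])
after_3_years = my_money_vectorized(savings, years=3, perc_interest=2)
print(after_3_years)
```

One call now handles every amount in the array, with no explicit loop.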

NumPy operations also take less time, because they run in optimized C code.

In [None]:
lst2 = list(range(1_000_000))

In [None]:
type(lst2)

In [None]:
lst2[:20]

In [None]:
%%timeit
list(map(lambda x: x * 2, lst2))

In [None]:
%%timeit
doubles = [n * 2 for n in lst2]

In [None]:
# build the array version of lst2 (it was never defined above)
arr2 = np.array(lst2)
arr2

In [None]:
type(arr2)

In [None]:
%%timeit
doubles = arr2 * 2
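
Outside a notebook, where the `%%timeit` magic is not available, the standard-library `timeit` module gives a comparable measurement. This is a sketch with hypothetical names (`numbers_list`, `numbers_array`) and a smaller input than above; the exact timings depend on the machine.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the notebook's list, to keep the sketch quick
numbers_list = list(range(n))
numbers_array = np.arange(n)

# Total time for 10 repetitions of each approach
t_list = timeit.timeit(lambda: [x * 2 for x in numbers_list], number=10)
t_array = timeit.timeit(lambda: numbers_array * 2, number=10)

print(f"list comprehension: {t_list:.4f} s")
print(f"numpy array:        {t_array:.4f} s")
```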

## Creating arrays

### Custom

#### 1 dimensional

In [None]:
a = np.array([1, 2, 3, 11])

In [None]:
type(a)

In [None]:
a

In [None]:
a.shape

In [None]:
a.ndim

In [None]:
a[2]

#### 2 dimensional

In [None]:
b = np.array([[1, 2, 3, 11], [4, 5, 6, 23], [5, 5, 5, 5]])

In [None]:
b

In [None]:
b.shape

Meaning 3 rows, 4 columns.

In [None]:
b.ndim

#### 3 dimensional

In [None]:
c = np.array([
    [[55, 66, 3], [40, 90, 3]],
    [[10, 10, 3], [10, 11, 3]],
    [[8, 9, 354], [6, 75, 34]],
    [[2, 3, 443], [3, 4, 199]]
])

In [None]:
c

In [None]:
c.shape

Example of 3-dimensional arrays: RGB images

Loading an image

In [None]:
import cv2

In [None]:
import matplotlib.pyplot as plt

In [None]:
img = cv2.imread("capit.jpeg")

In [None]:
type(img)

In [None]:
# to gray
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

In [None]:
type(img_gray)

In [None]:
# show the image with matplotlib (OpenCV loads images as BGR, so convert to RGB first)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

In [None]:
plt.imshow(img_gray, cmap="gray")

White is 255  
Black is 0
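
As a small sketch of this encoding, a grayscale image is just a 2-D array of 8-bit values. The array `tiny` is a hypothetical example, not taken from the image loaded above.

```python
import numpy as np

# A hypothetical 4x4 grayscale image: unsigned 8-bit values from 0 to 255
tiny = np.zeros((4, 4), dtype=np.uint8)  # all black
tiny[:, 2:] = 255                        # right half white

print(tiny)
print(tiny.min(), tiny.max())
```

Passing `tiny` to `plt.imshow(tiny, cmap="gray")` would show a black left half and a white right half.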

In [None]:
# we can zoom an image
plt.imshow(img_gray[70:120, 120:170], cmap="gray")

In [None]:
img_gray[100:110, 150:160]

In [None]:
img[100, 150]

In [None]:
img_gray.shape

In [None]:
plt.imshow(img[:, 20:-20])

In [None]:
plt.imshow(img[:, 40:])

### Built-in

In [None]:
# generate array full of zeros
np.zeros((3, 5))

In [None]:
np.zeros((3, 5, 2))

In [None]:
np.zeros((3, 5), dtype=int)

In [None]:
# generate array full of ones
np.ones((2, 3))

In [None]:
np.ones((2, 3)) * 4

In [None]:
np.eye(4)

In [None]:
# generate numbers from 0 to 1 randomly
np.random.random(10)

In [None]:
# generate a random 10x10 grayscale image (high is exclusive, so 256 allows values up to 255)
img_random = np.random.randint(low=0, high=256, size=(10, 10))

In [None]:
img_random

In [None]:
plt.imshow(img_random, cmap="gray")

In [None]:
# generate a random 10x10 color image (3 channels; high is exclusive)
img_random_col = np.random.randint(low=0, high=256, size=(10, 10, 3))

In [None]:
# show the color image, not the grayscale one
plt.imshow(img_random_col)

In [None]:
# generate numbers taken from a normal distribution
np.random.randn(5)

In [None]:
# generate numbers taken from a normal distribution in a 3x3 array
np.random.randn(3, 3)

In [None]:
# generate 100 random integers from 1 to 6 (simulate throwing a die)
np.random.randint(1, 7, 100)
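
The calls above give different results on every run. As an optional sketch, a seeded `Generator` created with `np.random.default_rng` makes the same kind of draws reproducible. The seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

uniform = rng.random(10)               # floats in [0, 1)
normal = rng.standard_normal((3, 3))   # standard normal values in a 3x3 array
dice = rng.integers(1, 7, size=100)    # integers from 1 to 6 (7 is excluded)

print(uniform.round(2))
print(dice.min(), dice.max())
```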

#### `arange`

Similar to Python's built-in `range`, but it returns an array.

In [None]:
# numbers from 0 up to (but not including) 20, with a step of 3
np.arange(0, 20, 3)

#### `linspace`

In [None]:
# 9 equally spaced numbers between 0 and 2 (both endpoints included)
np.linspace(0, 2, 9)

In [None]:
np.linspace(0, 1, 3)

In [None]:
np.linspace(0, 4, 5)

In [None]:
np.linspace(0, 4, 6)
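
The sketch below contrasts the two functions: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default. The variable names are hypothetical.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
stepped = np.arange(0, 20, 3)

# linspace: fixed number of points, the stop value is included by default
spaced = np.linspace(0, 4, 5)

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 2, 9, retstep=True)

print(stepped)
print(spaced)
print(step)
```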

## Properties

In [None]:
b = np.random.randint(0, 100, (3, 4))

In [None]:
b

In [None]:
b.shape

In [None]:
b.size

In [None]:
b.ndim

In [None]:
c

In [None]:
c.shape

In [None]:
c.ndim

In [None]:
b.dtype

## Slicing

In [None]:
a = np.array([4, 6, 88, 100, 244, 444])

In [None]:
a[1:3]

In [None]:
a[1:]

In [None]:
a[:-2]

In [None]:
b = np.random.randint(0, 100, (3, 4))

In [None]:
b

In [None]:
b.shape

We access an array's entries with the syntax `b[rows, cols]`.

In [None]:
b[0, 2]

In [None]:
b[2, 1]

In [None]:
b[0, :]

In [None]:
b[:, 1]

In [None]:
b[:, 1:3]

In [None]:
b

In [None]:
b[:2, :-1]

In [None]:
b[1:, 0:2]

In [None]:
b[:,-1]

In [None]:
b[1:3, 0:2].shape
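
Because `b` is random, the results above change on every run. The sketch below repeats a few of the same slices on a fixed, hypothetical array `grid`, so they can be checked by eye.

```python
import numpy as np

# A fixed 3x4 array so the results are easy to verify
grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

single = grid[0, 2]       # row 0, column 2
first_row = grid[0, :]    # all columns of row 0
second_col = grid[:, 1]   # all rows of column 1
block = grid[1:, 0:2]     # rows 1 onwards, columns 0 and 1

print(single)
print(first_row)
print(second_col)
print(block)
```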

### Conditional slicing

In [None]:
a = np.random.randint(0, 100, 9)

In [None]:
a

In [None]:
a.shape

In [None]:
a + 1

In [None]:
a > 50

In [None]:
a % 3

In [None]:
a % 3 == 0

`&` is vectorized `and` for numpy arrays  
`|` is vectorized `or` for numpy arrays  

In [None]:
a

In [None]:
a > 50

In [None]:
a % 3 == 0

In [None]:
(a > 50) & (a % 3 == 0)

In [None]:
(a > 50) | (a % 3 == 0)
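
A sketch on fixed, hypothetical data makes the results checkable. It also shows why each condition is wrapped in parentheses, and adds `~` as the vectorized `not`.

```python
import numpy as np

values = np.array([12, 51, 60, 75, 33, 98])  # hypothetical data

# Parentheses are required: & and | bind more tightly than > and ==
big_and_mult3 = (values > 50) & (values % 3 == 0)
big_or_mult3 = (values > 50) | (values % 3 == 0)

# ~ negates a boolean array (vectorized "not")
not_big = ~(values > 50)

print(values[big_and_mult3])
print(values[big_or_mult3])
print(values[not_big])
```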

In [None]:
a

In [None]:
a > 50

In [None]:
(a > 50).sum()

In [None]:
a

In [None]:
a[2:5]

In [None]:
a[a > 50]

In [None]:
a[a > 60]

In [None]:
a

In [None]:
a[a % 5 == 0]

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({"names": ["paola", "borja", "ana", "angela"], "ages": [80, 82, 77, 54], "has_children": [True, True, False, True]})

In [None]:
df

In [None]:
df.ages

In [None]:
df.ages + 10

In [None]:
df.ages > 78

`df[subset_condition]` returns the rows of the dataframe that fulfill the condition

In [None]:
df

In [None]:
df[(df.ages > 60) & (df.has_children)]

In [None]:
df[(df.ages > 60) | (df.has_children)]
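
The filtering above can be read in two steps, sketched here with the same data under hypothetical names (`people`, `mask`, `selected`): first build a boolean mask with one value per row, then use it to select rows.

```python
import pandas as pd

people = pd.DataFrame({
    "names": ["paola", "borja", "ana", "angela"],
    "ages": [80, 82, 77, 54],
    "has_children": [True, True, False, True],
})

# The mask is a boolean Series with one value per row
mask = (people["ages"] > 60) & people["has_children"]

# Rows where the mask is True are kept
selected = people[mask]

print(mask)
print(selected)
```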

In [None]:
import seaborn as sns

In [None]:
df[df.has_children]

In [None]:
df

In [None]:
df[df.names.str.contains("o")]

**Exercise**: how many families have more than 2 children?

In [None]:
childrens = np.random.randint(0, 5, 100)

In [None]:
childrens.shape

In [None]:
childrens

In [None]:
childrens > 2

In [None]:
(childrens > 2).sum()

A bit of pandas

In [None]:
df = pd.read_csv("../datasets/pokemon.csv")

In [None]:
df.shape

In [None]:
df.head()

In [None]:
df.sample(5)

In [None]:
df["Type 1"].value_counts()

In [None]:
df.head()

In [None]:
df_legend = df.loc[df.Legendary, ["Name", "Attack", "Defense"]]

In [None]:
df_legend["Quot_attack_defense"] = (df_legend.Attack / df_legend.Defense).round(2)

In [None]:
df_legend.head()

In [None]:
df_legend.sort_values("Quot_attack_defense", ascending=False)

Reminder: use `&` for "and" and `|` for "or" in numpy

## Useful array methods

In [None]:
a = np.random.randint(0, 1000, 30)

In [None]:
a

In [None]:
type(a)

In [None]:
a.max()

In [None]:
a.min()

In [None]:
a.sum()

In [None]:
a.mean()

In [None]:
# standard deviation
a.std()

`dir(object)` lists all the attributes and methods available on the object

In [None]:
[method for method in dir(a) if "__" not in method][:20]

In [None]:
b = np.random.randint(0, 1000, (4, 5))

In [None]:
b

In [None]:
b.max()

In [None]:
b.max(axis=0)

In [None]:
b.max(axis=1)

In [None]:
b.mean()

In [None]:
b.mean(axis=0)

In [None]:
b.mean(axis=1)
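
The `axis` argument names the dimension that gets collapsed. A sketch on a small, hypothetical `table` shows which direction each value refers to.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])  # 2 rows, 3 columns

col_max = table.max(axis=0)    # collapses the rows: one result per column
row_max = table.max(axis=1)    # collapses the columns: one result per row
row_mean = table.mean(axis=1)  # mean of each row

print(col_max)
print(row_max)
print(row_mean)
```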

In [None]:
a = np.random.random(20) * 5

In [None]:
a

In [None]:
a.round(2)

In [None]:
b

In [None]:
b.shape

In [None]:
# flatten into a 1D array
b.flatten()

In [None]:
a

In [None]:
a.shape

In [None]:
a.reshape((5, 4))

In [None]:
# raises ValueError: 20 elements cannot fill a 5x5 array (25 entries)
a.reshape((5, 5))
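
A reshape only works when the new shape has the same total number of elements. The sketch below, with a hypothetical 20-element array `values`, shows a valid reshape, a dimension inferred with `-1`, and the error raised by an incompatible shape.

```python
import numpy as np

values = np.arange(20)  # 20 elements

ok = values.reshape((5, 4))          # 5 * 4 == 20, so this works
inferred = values.reshape((-1, 10))  # -1 lets NumPy infer the size: 2 rows

try:
    values.reshape((5, 5))           # 5 * 5 == 25, which is not 20
except ValueError as err:
    message = str(err)
    print("reshape failed:", message)
```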

In [None]:
b

In [None]:
b.shape

In [None]:
# "rotated": rows and columns are swapped
b.transpose()

In [None]:
b.transpose().shape

In [None]:
# equivalent shorthand
b.T

In [None]:
b = np.random.randint(0, 100, (3, 3))

In [None]:
b

In [None]:
b / 2

In [None]:
1 / b

In [None]:
# inverse of a matrix
b_inv = np.linalg.inv(b)

In [None]:
# matrix or vector multiplication
np.dot(b, b_inv).round(3)
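
A random integer matrix is not guaranteed to be invertible, so this sketch uses a fixed, hypothetical matrix `m` with a non-zero determinant. It also uses the `@` operator for the matrix product and `np.allclose` to compare against the identity without rounding.

```python
import numpy as np

m = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # hypothetical invertible matrix (determinant 5)

m_inv = np.linalg.inv(m)

# @ is the matrix-multiplication operator, equivalent to np.dot for 2-D arrays
product = m @ m_inv

# Compare with the identity up to floating-point error
print(np.allclose(product, np.eye(2)))
```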

In [None]:
b = np.random.randint(0, 10, (3, 3))

In [None]:
b

In [None]:
b ** 2

In [None]:
2 ** b

In [None]:
b

In [None]:
b == 3

In [None]:
b < 10

In [None]:
(b < 10).all()

In [None]:
(b < 6).all()

In [None]:
(b < 6).any()

In [None]:
(b < 0).any()

In [None]:
b = np.random.randint(0, 10, (5, 5))

In [None]:
b2 = np.random.randint(0, 10, (5, 5))

In [None]:
b

In [None]:
b2

In [None]:
b2 < 5

In [None]:
# what coordinates match a condition?
np.argwhere(b2 < 5)

In [None]:
b == 3

In [None]:
np.argwhere(b == b2)
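
A sketch on a small, hypothetical matrix `m` shows the shape of the `np.argwhere` result, next to `np.where`, which returns the same positions as separate row and column arrays.

```python
import numpy as np

m = np.array([[7, 2],
              [1, 9]])

coords = np.argwhere(m < 5)    # one [row, col] pair per matching entry
rows, cols = np.where(m < 5)   # the same positions as two index arrays

print(coords)
print(m[rows, cols])           # the matching values themselves
```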

## Broadcasting

Example

In [None]:
df.head()

In [None]:
dff = df[["Defense", "Attack", "Speed"]]

How many Pokémon have all 3 properties below the mean?

In [None]:
dff.head()

In [None]:
dff.mean()

In [None]:
dff.shape

In [None]:
dff.mean().shape

In [None]:
# deviations
devs = dff - dff.mean()

In [None]:
devs.head()

In [None]:
(devs < 0).all(axis=1).sum()

The explanation below is adapted from the [CS231n Python NumPy tutorial](https://cs231n.github.io/python-numpy-tutorial/#numpy).

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. 

Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

For example, suppose that we want to add a constant vector to each row of a matrix.

1. We can do it *manually*, NOT making use of broadcasting

In [None]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

Think of the rows as products: shirt, hat, trousers, boots  
Think of the columns as days: Monday, Tuesday, Wednesday

In [None]:
x

In [None]:
v = np.array([1, 0, 1])

In [None]:
v

In [None]:
# create an array of zeros with the same shape as x
y = np.zeros(x.shape)

In [None]:
y

In [None]:
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    print(i)
    y[i, :] = x[i, :] + v

In [None]:
x

In [None]:
y

2. With the use of broadcasting

In [None]:
x

In [None]:
v

In [None]:
# add v to each row of x using broadcasting
y2 = x + v  

In [None]:
y2

In [None]:
y == y2

In [None]:
(y == y2).all()

In [None]:
x

In [None]:
x.shape

In [None]:
v.shape

In [None]:
v2 = np.array([1, 2, 3, 4])

In [None]:
v2

In [None]:
v2.shape

In [None]:
# raises ValueError: shapes (4, 3) and (4,) cannot be broadcast together
x + v2

In [None]:
v3 = v2.reshape(4, 1)
v3

In [None]:
x + v3
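
The behaviour above follows from the broadcasting rule: shapes are compared from the last axis backwards, and each pair of sizes must be equal or one of them must be 1. The sketch below rebuilds the same values under the same names and spells out the shapes involved; `row_wise` and `col_wise` are hypothetical names.

```python
import numpy as np

x = np.arange(1, 13).reshape(4, 3)   # shape (4, 3), same values as x above
v = np.array([1, 0, 1])              # shape (3,)
v2 = np.array([1, 2, 3, 4])          # shape (4,)

row_wise = x + v                     # (4, 3) with (3,)   -> (4, 3)
col_wise = x + v2[:, np.newaxis]     # (4, 3) with (4, 1) -> (4, 3)

try:
    x + v2                           # (4, 3) with (4,): 3 != 4, not compatible
except ValueError as err:
    print("broadcast failed:", err)

print(row_wise)
print(col_wise)
```

`v2[:, np.newaxis]` gives the same `(4, 1)` shape as `v2.reshape(4, 1)`.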

**Example**: how much do product prices deviate from the mean?

In [None]:
x

In [None]:
# mean by day
x.mean(axis=0)

In [None]:
x - x.mean(axis=0)

3 dimensions: 2 shops, 4 products, 3 days

In [None]:
c = np.random.randint(0, 100, (2, 4, 3))

In [None]:
c

In [None]:
c.mean(axis=0)

In [None]:
c.mean(axis=1)

In [None]:
c.mean(axis=2)
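
A sketch with a fixed, hypothetical `sales` array of the same shape shows which axis each mean collapses. It also uses `keepdims=True`, which keeps the reduced axis with size 1 so the result broadcasts back against the original array.

```python
import numpy as np

# hypothetical data: 2 shops, 4 products, 3 days
sales = np.arange(24).reshape(2, 4, 3)

across_shops = sales.mean(axis=0)     # shape (4, 3): average over the 2 shops
across_products = sales.mean(axis=1)  # shape (2, 3): average over the 4 products
across_days = sales.mean(axis=2)      # shape (2, 4): average over the 3 days

# deviation of each day from that product's mean over the 3 days
deviation = sales - sales.mean(axis=2, keepdims=True)

print(across_shops.shape, across_products.shape, across_days.shape)
print(deviation[0])
```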

## Further materials

[NumPy Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)