# Numpy's ufuncs
Universal functions (ufuncs) operate element-wise on arrays.

In [1]:
# element-wise operation with a Python list
# build list b by adding 5 to each element of list a
a = [1,3,2,4,3,1,4,2]
b = [val + 5 for val in a]
print(b)

[6, 8, 7, 9, 8, 6, 9, 7]


In [2]:
# same actions with numpy array
import numpy as np
a = np.array(a)

b = a + 5 #element-wise
print(a)
print(a.shape) # 1-D array with 8 elements (not a column vector)
print(b)

[1 3 2 4 3 1 4 2]
(8,)
[6 8 7 9 8 6 9 7]
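Ufuncs also combine two arrays of the same shape, pairing elements position by position. A minimal sketch with two small illustrative arrays `x` and `y` (not from the cells above):

```python
import numpy as np

x = np.array([1, 3, 2, 4])
y = np.array([10, 20, 30, 40])

# each output element combines the elements at the same position
print(x + y)    # [11 23 32 44]
print(x * y)    # [ 10  60  60 160]
print(y // x)   # [10  6 15 10]
```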


## ufuncs are fast

In [3]:
a = list(range(100_000))
%timeit [val+5 for val in a]

8.43 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [4]:
a = np.array(a)
%timeit a+5

60.9 µs ± 4.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
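`%timeit` is an IPython/Jupyter magic. Outside a notebook, the standard-library `timeit` module gives a comparable measurement. A sketch (variable names are illustrative; absolute timings depend on the machine):

```python
import timeit

import numpy as np

a_list = list(range(100_000))
a_arr = np.array(a_list)

# both approaches compute the same values
assert [v + 5 for v in a_list] == (a_arr + 5).tolist()

# run each approach 20 times and report the average per loop
n = 20
t_list = timeit.timeit(lambda: [v + 5 for v in a_list], number=n)
t_arr = timeit.timeit(lambda: a_arr + 5, number=n)
print(f"list comprehension: {t_list / n * 1e3:.3f} ms per loop")
print(f"numpy ufunc:        {t_arr / n * 1e3:.3f} ms per loop")
```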


There are many ufuncs available in numpy:
- Arithmetic operators: `+ - * / // % **`
- Bitwise operators: `& | ~ ^ >> <<`
- Comparison operators: `< > >= <= == !=`
- Trig family: `np.sin`, `np.cos`, `np.tan`, ...
- Exponential family: `np.exp`, `np.log`, `np.log10`, ...
- Special functions: `scipy.special.*`

... and many, many more.
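The operators in this list are shorthands for named ufuncs (`+` calls `np.add`, `>` calls `np.greater`, and so on), and functions such as `np.sin` are ufunc objects themselves. A small sketch:

```python
import numpy as np

a = np.array([1, 3, 2, 4])

# operators are shorthands for ufuncs
assert np.array_equal(a + 5, np.add(a, 5))
assert np.array_equal(a * 2, np.multiply(a, 2))
assert np.array_equal(a ** 2, np.power(a, 2))

# comparison ufuncs return boolean arrays
print(a > 2)              # [False  True False  True]
print(np.greater(a, 2))   # same result

# np.sin, np.exp, np.log, ... are ufunc objects as well
print(isinstance(np.sin, np.ufunc))   # True
print(np.exp(np.log(a)))              # a again, as floats (up to rounding)
```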

# Numpy's aggregations

Aggregations are functions that summarize the values in an array (e.g. min, max, sum, mean).

In [5]:
from random import random
c = [random() for i in range(100_000)]
%timeit min(c)

1.31 ms ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
c = np.array(c)
%timeit c.min()

37 µs ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
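Besides the method form (`c.min()`), each aggregation is also available as a function (`np.min(c)`). A small sketch on a hand-written array so the results are easy to check:

```python
import numpy as np

c = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

print(c.sum())           # 14.0
print(c.min(), c.max())  # 1.0 5.0
print(c.mean())          # 2.8
print(c.argmax())        # 4  (index of the largest value)

# the function form gives the same result as the method form
print(np.min(c) == c.min())  # True
```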


#### Numpy aggregations also work on multi-dimensional arrays

In [6]:
M = np.random.randint(0,10,(3,5))
print(M)

[[4 2 3 6 5]
 [5 1 1 6 2]
 [5 1 0 0 3]]


In [14]:
M.sum(axis=0) # sum down each column (one value per column)

array([14,  4,  4, 12, 10])

In [15]:
M.sum(axis=1) # sum across each row (one value per row)

array([20, 15,  9])

Many more aggregations are available, e.g. `mean`, `std`, `var`, `prod`, `argmin`, `argmax`, `any`, `all`.
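The `axis` argument names the dimension that gets collapsed: `axis=0` combines the rows (one result per column) and `axis=1` combines the columns (one result per row). The sketch below hard-codes the `M` printed above so the sums are reproducible; it also shows `np.nansum`, one of the NaN-aware variants that skip missing values:

```python
import numpy as np

# the M printed above, hard-coded so the results are reproducible
M = np.array([[4, 2, 3, 6, 5],
              [5, 1, 1, 6, 2],
              [5, 1, 0, 0, 3]])

print(M.sum(axis=0))   # [14  4  4 12 10]  one value per column
print(M.sum(axis=1))   # [20 15  9]  one value per row
print(M.max(axis=1))   # [6 6 5]
print(M.sum())         # 44  no axis: every element is combined

# NaN-aware variants skip missing values
x = np.array([1.0, np.nan, 3.0])
print(np.sum(x), np.nansum(x))   # nan 4.0
```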

# Numpy's broadcasting

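Broadcasting is the set of rules by which ufuncs operate on arrays of different sizes and/or dimensions. Shapes are compared starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1, and a size-1 (or missing) dimension is stretched to match the other.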
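A short sketch of broadcasting in action: a scalar with an array, a `(3, 1)` array with a `(3,)` array, a practical column-centering example, and a pair of shapes that cannot be broadcast:

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3).reshape(3, 1)   # shape (3, 1)

# a scalar is stretched to the shape of the array
print(a + 5)   # [5 6 7]

# (3, 1) with (3,): the missing leading axis of a is treated as 1,
# then both size-1 axes are stretched, giving shape (3, 3)
print(b + a)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# practical use: center each column of a matrix
X = np.array([[1.0, 2.0],
              [3.0, 6.0]])
X_centered = X - X.mean(axis=0)  # (2, 2) minus (2,) -> (2, 2)
print(X_centered.mean(axis=0))   # each column mean is now 0

# trailing dimensions 3 and 4 differ and neither is 1, so this fails
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("incompatible shapes:", err)
```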

# Numpy's slicing, masking, and fancy indexing

In [16]:
L = [2,3,5,7,11]
L[0] # integer index
L[1:3] #slice for multiple elements

[3, 5]

#### Numpy arrays are similar...

In [10]:
L = [2,3,5,7,11]
L = np.array(L)
print(L)
print(L[0])
print(L[1:3])

[ 2  3  5  7 11]
2
[3 5]
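One difference from Python lists matters in practice: slicing a list makes a copy, but slicing a NumPy array returns a view that shares memory with the original. A short sketch, which also shows negative indices and a step:

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

print(L[-1])    # 11  (negative indices count from the end)
print(L[::2])   # [ 2  5 11]  (every second element)

# a slice is a view: writing through it changes L
s = L[1:3]
s[0] = 99
print(L)        # [ 2 99  5  7 11]

# .copy() gives an independent array
t = L[1:3].copy()
t[0] = -1
print(L)        # still [ 2 99  5  7 11]

# Python lists behave differently: slicing copies
lst = [2, 3, 5, 7, 11]
part = lst[1:3]
part[0] = 99
print(lst)      # [2, 3, 5, 7, 11]
```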


#### Numpy offers other fast and convenient indexing options as well

- **Masking**: indexing with boolean masks 

In [11]:
L = np.array(L)
print(L)

# A mask is a boolean array:
mask = np.array([False,True,True,False,True])
L[mask]

[ 2  3  5  7 11]


array([ 3,  5, 11])

**Masks** are often constructed using **comparison** operators and **boolean** logic.

In [30]:
L = np.array(L)
mask = ((L<4) | (L>8)) # "|" is element-wise OR; the inner parentheses are needed because | binds tighter than < and >
L[mask]

array([ 2,  3, 11])
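Python's `and`, `or` and `not` do not work element-wise on arrays (using them on a multi-element array raises a `ValueError`), so masks are combined with `&`, `|` and `~`. A few more mask patterns:

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

mask = (L > 2) & (L < 10)   # element-wise AND; parentheses are required
print(L[mask])              # [3 5 7]
print(L[~mask])             # [ 2 11]  (~ inverts the mask)

# True counts as 1, so summing a mask counts the matches
print(mask.sum())           # 3

# a mask can also select elements to overwrite
L2 = L.copy()
L2[L2 > 6] = 0
print(L2)                   # [2 3 5 0 0]
```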

- **Fancy Indexing**: passing a list or array of indices.


In [31]:
L = np.array(L)
ind = [0,4,2]
L[ind]

array([ 2, 11,  5])
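With fancy indexing, the shape of the result follows the shape of the index array rather than the shape of the indexed array, and indices may repeat. A short sketch:

```python
import numpy as np

L = np.array([2, 3, 5, 7, 11])

# the result takes the shape of the index array, not of L
ind = np.array([[3, 1],
                [0, 4]])
print(L[ind])
# [[ 7  3]
#  [ 2 11]]

# indices may repeat
print(L[[0, 0, 4]])   # [ 2  2 11]
```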

- **Multi-dimension**: use commas to separate indices!

In [32]:
M = np.arange(6).reshape(2,3)
print(M)

[[0 1 2]
 [3 4 5]]


In [34]:
# multiple indices separated by commas
M[0,1] # a slice like [a:b] selects a range of values; [r,c] picks one specific element

1

In [35]:
# mixing slices and indices
M[:,1]

array([1, 4])

In [36]:
# masking the full array
M[np.abs(M-3)<2]

array([2, 3, 4])

In [38]:
# mixing fancy indexing and slicing
M[[1,0],:2] # fancy indexing on the rows, slicing on the columns

array([[3, 4],
       [0, 1]])

In [51]:
# fancy indexing (a list) on a 2D array selects rows, so row 1 comes before row 0
print(M[[1,0],:]) 

[[3 4 5]
 [0 1 2]]
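Passing a list for *both* axes does not select a sub-grid: the two index arrays are paired element by element. To pick a sub-grid, `np.ix_` builds index arrays that broadcast against each other. A sketch:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

rows = np.array([0, 1, 1])
cols = np.array([2, 0, 2])
# pairs the indices: picks M[0, 2], M[1, 0], M[1, 2]
print(M[rows, cols])   # [2 3 5]

# sub-grid: rows 1 and 0, columns 0 and 2
print(M[np.ix_([1, 0], [0, 2])])
# [[3 5]
#  [0 2]]
```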


In [50]:
# slicing the second axis of a 2D array selects columns
print(M[:,:2])

[[0 1]
 [3 4]]


In [13]:
# mixing masking and slicing
M = np.arange(6).reshape(2,3)
print(M)
print('\n',M[M.sum(axis=1)>4,1:])

[[0 1 2]
 [3 4 5]]

 [[4 5]]


## Computing the nearest neighbors

In [15]:
# 1000 points in 3 dimensions
X = np.random.random((1000,3))
X.shape

(1000, 3)

In [17]:
# broadcasting to find pairwise differences
diff = X.reshape(1000,1,3) - X
diff.shape

(1000, 1000, 3)

In [20]:
# aggregate to find pairwise squared distances (the square root is skipped: it does not change which neighbor is nearest)
D = (diff**2).sum(axis=2)
D.shape

(1000, 1000)
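The broadcasting approach builds a `(1000, 1000, 3)` intermediate array (about 24 MB of float64). A common alternative, swapped in here for illustration and not part of the original walkthrough, expands the squared distance as $\|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\,x\cdot y$, which needs only `(1000, 1000)` intermediates. A sketch that checks both give the same `D` (a seeded `rng` replaces `np.random.random` so the run is reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 3))

# broadcasting version, as in the cells above: (1000, 1000, 3) intermediate
D_broadcast = ((X.reshape(1000, 1, 3) - X) ** 2).sum(axis=2)

# expansion version: |x - y|^2 = |x|^2 + |y|^2 - 2 x.y
sq = (X ** 2).sum(axis=1)                       # squared norms, shape (1000,)
D_expand = sq[:, None] + sq[None, :] - 2 * (X @ X.T)

# rounding can leave tiny negative values; clip before any sqrt
D_expand = np.maximum(D_expand, 0)

print(D_expand.shape)                      # (1000, 1000)
print(np.allclose(D_broadcast, D_expand))  # True
```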

In [22]:
# set diagonal to infinity to skip self-neighbors
i = np.arange(1000)
D[i,i] = np.inf

In [24]:
# print the indices of the nearest neighbors
i = np.argmin(D,1)
print(i[:10])

[793 225 977 873 476 181 186 385 769 603]
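`np.argmin` gives only the single nearest neighbor. For the `k` nearest, one option is to sort each row with `np.argsort`; `np.argpartition` finds the `k` smallest without fully sorting them. A sketch, with `k = 3` chosen arbitrarily and a seeded `rng` for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 3))
D = ((X.reshape(1000, 1, 3) - X) ** 2).sum(axis=2)
np.fill_diagonal(D, np.inf)   # same effect as D[i, i] = np.inf

k = 3  # number of neighbors (arbitrary choice)

# full sort of each row: column j is the (j+1)-th nearest neighbor
nearest_sorted = np.argsort(D, axis=1)[:, :k]

# partial sort: the k smallest per row, in no guaranteed order
nearest_part = np.argpartition(D, k, axis=1)[:, :k]

# both give the same neighbor sets; the first column matches np.argmin
same_sets = np.array_equal(np.sort(nearest_sorted, axis=1),
                           np.sort(nearest_part, axis=1))
print(same_sets)                                                   # True
print(np.array_equal(nearest_sorted[:, 0], np.argmin(D, axis=1)))  # True
```

`argpartition` avoids a full sort, which tends to pay off for long rows, but the `k` neighbors it returns are not ordered by distance.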


In [26]:
# double check with scikit-learn
from sklearn.neighbors import NearestNeighbors

d,i = NearestNeighbors().fit(X).kneighbors(X,2)
print(i[:10,1]) # column 0 is each point itself (distance 0); column 1 is its nearest other point

[793 225 977 873 476 181 186 385 769 603]
