In [1]:
import platform
platform.uname()

uname_result(system='Windows', node='Laptop_HSoni', release='10', version='10.0.17134', machine='AMD64', processor='Intel64 Family 6 Model 37 Stepping 5, GenuineIntel')

In [2]:
platform.python_version()

'3.7.2'

In [3]:
import numpy as np
print("NumPy Version:",np.__version__)

NumPy Version: 1.15.4


# NumPy Array Operations

Avoid explicit Python loops over array elements, because they are computationally inefficient. Instead, NumPy provides many efficient functions and operators that act on the whole array at once.
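
As a rough illustration of this point, the sketch below computes the same result with an explicit Python loop and with a single whole-array expression. The helper name `add_scalar_loop` is hypothetical and exists only for this comparison.

```python
import numpy as np

def add_scalar_loop(values, scalar):
    """Add a scalar to every element with an explicit Python loop (the slow path)."""
    result = np.empty(len(values), dtype=float)
    for i in range(len(values)):
        result[i] = values[i] + scalar
    return result

data = np.arange(5)

looped = add_scalar_loop(data, 3.5)   # element by element, in Python
vectorized = data + 3.5               # one whole-array operation

print(looped)
print(vectorized)
print(np.array_equal(looped, vectorized))  # both approaches give the same values
```

The two results are identical; the difference is that the whole-array form runs its loop in compiled code rather than in the Python interpreter.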

In [4]:
my_list = [2, 10, 8, 14]
x = np.asarray(my_list)
type(x)

numpy.ndarray

Let's do some addition, subtraction, multiplication, and division with scalar values:

In [5]:
print("x  = ",x)
print("x + 3.5 = ", x + 3.5)
print("x - 3.5 = ", x - 3.5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division
print("x ** 2 =", x ** 2) # exponentiation (power)

x  =  [ 2 10  8 14]
x + 3.5 =  [ 5.5 13.5 11.5 17.5]
x - 3.5 =  [-1.5  6.5  4.5 10.5]
x * 2 = [ 4 20 16 28]
x / 2 = [1. 5. 4. 7.]
x // 2 = [1 5 4 7]
x ** 2 = [  4 100  64 196]


In [6]:
x ** 2

array([  4, 100,  64, 196], dtype=int32)

In [7]:
print("-x = ", -x)
print("x % 3 = ", x % 3)

-x =  [ -2 -10  -8 -14]
x % 3 =  [2 1 2 2]


In [8]:
y = np.r_[1:5] + 3
y 

array([4, 5, 6, 7])

In [9]:
print("x + y = ", x + y)
print("x - y = ", x - y)
print("x * y =", x * y)
print("x / y =", x / y)
print("x // y =", x // y) # floor division

x + y =  [ 6 15 14 21]
x - y =  [-2  5  2  7]
x * y = [ 8 50 48 98]
x / y = [0.5        2.         1.33333333 2.        ]
x // y = [0 2 1 2]


In [10]:
y // x

array([2, 0, 0, 0], dtype=int32)

In [11]:
# several operations strung together in one expression
-(0.5*x + 1) ** 2

array([ -4., -36., -25., -64.])
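
The results above mix integer and floating-point outputs. As a small sketch of why, the result type depends on both the operator and the operands. The exact integer width (`int32` or `int64`) varies by platform, so the check below only tests whether the dtype is an integer type.

```python
import numpy as np

x = np.asarray([2, 10, 8, 14])

# Integer array combined with an integer scalar stays an integer array.
print(np.issubdtype((x * 2).dtype, np.integer))
print(np.issubdtype((x // 2).dtype, np.integer))

# True division always produces floating-point values.
print((x / 2).dtype)

# A floating-point scalar promotes the whole result to floating point.
print((x + 3.5).dtype)
```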

# Working with Multidimensional Arrays

In [12]:
x = np.arange(1, 21)
y = np.arange(1, 21)
np.random.seed(42)
np.random.shuffle(x)
np.random.shuffle(y)
x.shape = 5, 4
y.shape = 5, 4
print(x)
print(y)

[[ 1 18 16  2]
 [ 9  6 12  4]
 [19 17 14  3]
 [10 20  5 13]
 [ 8 11 15  7]]
[[20 17 16  6]
 [ 5 13 15  8]
 [ 4  7  3 10]
 [14 11 19  9]
 [18 12  1  2]]


In [13]:
print("x + y = \n", x + y)
print("x - y = \n", x - y)
print("x * y = \n", x * y)
print("x / y = \n", x / y)

x + y = 
 [[21 35 32  8]
 [14 19 27 12]
 [23 24 17 13]
 [24 31 24 22]
 [26 23 16  9]]
x - y = 
 [[-19   1   0  -4]
 [  4  -7  -3  -4]
 [ 15  10  11  -7]
 [ -4   9 -14   4]
 [-10  -1  14   5]]
x * y = 
 [[ 20 306 256  12]
 [ 45  78 180  32]
 [ 76 119  42  30]
 [140 220  95 117]
 [144 132  15  14]]
x / y = 
 [[ 0.05        1.05882353  1.          0.33333333]
 [ 1.8         0.46153846  0.8         0.5       ]
 [ 4.75        2.42857143  4.66666667  0.3       ]
 [ 0.71428571  1.81818182  0.26315789  1.44444444]
 [ 0.44444444  0.91666667 15.          3.5       ]]


## Vector Products

The **dot** or **inner product** $c = a \cdot b$ of the vectors $a$ and $b$, each of size $m$, is defined as the scalar

\begin{equation}
c=\sum\limits_{k=1}^{m}{{{a}_{k}}{{b}_{k}}}
\end{equation}

It can also be written in the form $c = a^{T}b$. 

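In NumPy, the dot product is computed with **`dot(a,b)`** or **`inner(a,b)`**. The two functions agree for one-dimensional arrays but behave differently for two-dimensional arrays.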

The outer product $c=a\otimes b$ is defined as the matrix with entries

\begin{equation}
c_{ij}={{a}_{i}}{{b}_{j}}
\end{equation}

An alternative notation is $c = ab^{T}$.

The NumPy function for the outer product is **`outer(a,b)`**.
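
As a quick check of these definitions, the sketch below builds both products by hand and compares them with the NumPy functions. Entry $(i, j)$ of the outer product is the product of element $i$ of the first vector and element $j$ of the second. The names `dot_by_sum` and `outer_by_loop` are illustrative only.

```python
import numpy as np

a = np.array([7, 3])
b = np.array([2, 1])

# Dot product from the definition: c = sum over k of a_k * b_k
dot_by_sum = sum(a[k] * b[k] for k in range(len(a)))

# Outer product from the definition: c_ij = a_i * b_j
outer_by_loop = np.empty((len(a), len(b)), dtype=a.dtype)
for i in range(len(a)):
    for j in range(len(b)):
        outer_by_loop[i, j] = a[i] * b[j]

print(dot_by_sum == np.dot(a, b))
print(np.array_equal(outer_by_loop, np.outer(a, b)))
```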

In [14]:
x = np.array([7, 3])
y = np.array([2, 1])
A = np.array([[1, 2], [3, 2]])
B = np.array([[1, 1], [2, 2]])

In [15]:
# Dot product
print("dot(x,y) =\n",np.dot(x,y)) # {x}.{y}
print("dot(A,x) =\n",np.dot(A,x)) # [A]{x}
print("dot(A,B) =\n",np.dot(A,B)) # [A][B]

dot(x,y) =
 17
dot(A,x) =
 [13 27]
dot(A,B) =
 [[5 5]
 [7 7]]


In [16]:
# Inner product
print("inner(x,y) =\n",np.inner(x,y)) # {x}.{y}
print("inner(A,x) =\n",np.inner(A,x)) # [A]{x}
print("inner(A,B) =\n",np.inner(A,B)) # [A][B_transpose]

inner(x,y) =
 17
inner(A,x) =
 [13 27]
inner(A,B) =
 [[ 3  6]
 [ 5 10]]


In [17]:
# Outer product
print("outer(x,y) =\n",np.outer(x,y))
print("outer(A,x) =\n",np.outer(A,x))
print("outer(A,B) =\n",np.outer(A,B))

outer(x,y) =
 [[14  7]
 [ 6  3]]
outer(A,x) =
 [[ 7  3]
 [14  6]
 [21  9]
 [14  6]]
outer(A,B) =
 [[1 1 2 2]
 [2 2 4 4]
 [3 3 6 6]
 [2 2 4 4]]
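
The outputs above show that `dot` and `inner` differ once the inputs are two-dimensional, and that `outer` treats its inputs as flat vectors. A minimal sketch that makes those relationships explicit, reusing the same `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2], [3, 2]])
B = np.array([[1, 1], [2, 2]])

# For 2-D inputs, inner contracts over the last axis of both arrays,
# which matches multiplying A by the transpose of B.
print(np.array_equal(np.inner(A, B), np.dot(A, B.T)))

# outer flattens its inputs before forming the product.
print(np.array_equal(np.outer(A, B), np.outer(A.ravel(), B.ravel())))

# For 2-D arrays, the @ operator gives the same matrix product as dot.
print(np.array_equal(A @ B, np.dot(A, B)))
```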


## Specifying output

For large calculations, it is sometimes useful to specify the array in which the result will be stored. Rather than creating a temporary array, you can use the `out` argument to write results directly to the memory location where you'd like them to be.

In [18]:
x = np.r_[1:6]
y = np.zeros(x.shape,dtype=x.dtype)
# y = np.empty(x.shape)
np.multiply(x, 10, out=y)

array([10, 20, 30, 40, 50])

In [19]:
print(y)

[10 20 30 40 50]


In [20]:
y = np.zeros(10,dtype=x.dtype)
np.power(2, x, out=y[::2])

array([ 2,  4,  8, 16, 32])

In [21]:
print(y)

[ 2  0  4  0  8  0 16  0 32  0]


If we had instead written `y[::2] = 2 ** x`, NumPy would first have created a temporary array to hold the results of `2 ** x` and then copied those values into `y` in a second operation. This makes little difference for such a small computation, but for very large arrays the memory savings from careful use of the `out` argument can be significant.
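
One way to confirm that the result really is written in place is to check that the returned array is the one passed as `out` and that it shares memory with `y`. This is a small sketch; the names `target` and `result` are introduced here for illustration.

```python
import numpy as np

x = np.arange(1, 6)
y = np.zeros(10, dtype=x.dtype)

target = y[::2]                      # a view onto every second element of y
result = np.power(2, x, out=target)  # results are written through the view

print(result is target)              # the function returns the array passed as out
print(np.shares_memory(result, y))   # and that array shares memory with y
print(y)
```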

## Broadcasting

Broadcasting allows binary operations (addition, subtraction, multiplication, etc.) to be performed on arrays of different shapes, provided the shapes are compatible.

In [22]:
x = np.ones((3, 3))
x

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [23]:
y = np.r_[0:3]
y

array([0, 1, 2])

In [24]:
x + y

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

Here the one-dimensional array `y` is stretched, or broadcast, to match the shape of `x`: it is treated as a single row and repeated for each of the three rows of `x`.

In [25]:
x*y

array([[0., 1., 2.],
       [0., 1., 2.],
       [0., 1., 2.]])
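
To make the stretching visible, the sketch below uses `np.broadcast_to` to show the array that `y` is effectively expanded into, and then reshapes `y` into a column so that it broadcasts along the other axis. The names `y_stretched` and `y_column` are illustrative.

```python
import numpy as np

x = np.ones((3, 3))
y = np.arange(3)

# Make the implicit stretching explicit: y is repeated for every row of x.
y_stretched = np.broadcast_to(y, x.shape)
print(y_stretched)

# Reshaping y into a column (shape (3, 1)) broadcasts along the other axis.
y_column = y[:, np.newaxis]
print(x + y_column)

# A row and a column broadcast against each other to a full 3 x 3 grid.
print((y + y_column).shape)
```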

## Use Case

Imagine you have a data set of 10 instances (observations), each with three attributes (variables). We will store it in a $10 \times 3$ array:

In [26]:
X = np.random.randint(1, 50,(10, 3))
X

array([[ 9, 39, 18],
       [ 4, 25, 14],
       [ 9, 26,  2],
       [20, 28, 47],
       [ 7, 44,  8],
       [47, 35, 14],
       [17, 36, 40],
       [ 4,  2,  6],
       [42,  4, 29],
       [18, 26, 44]])

In [27]:
Xmean = X.mean(axis=0) # mean of each column
# Xmean = X.mean(0)
Xmean

array([17.7, 26.5, 22.2])

And now we can center the `X` array by subtracting the mean (this is a broadcasting operation):

In [28]:
X_centered = X - Xmean
X_centered

array([[ -8.7,  12.5,  -4.2],
       [-13.7,  -1.5,  -8.2],
       [ -8.7,  -0.5, -20.2],
       [  2.3,   1.5,  24.8],
       [-10.7,  17.5, -14.2],
       [ 29.3,   8.5,  -8.2],
       [ -0.7,   9.5,  17.8],
       [-13.7, -24.5, -16.2],
       [ 24.3, -22.5,   6.8],
       [  0.3,  -0.5,  21.8]])

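**Fact:** The deviations from the mean sum to zero in each column: $\sum{\left( X-\overline{X} \right)}=0$. Let's verify this, rounding the result to remove floating-point error.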

In [29]:
np.round(X_centered.sum(0)) # column sums, rounded to remove floating-point error

array([0., 0., 0.])
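
Because the column sums are zero only up to floating-point rounding error, a tolerance-based comparison is another way to run this check. The sketch below assumes an arbitrary seed, so its data differ from the array shown above.

```python
import numpy as np

rng = np.random.RandomState(0)       # arbitrary seed, used only for reproducibility
X = rng.randint(1, 50, (10, 3))

X_centered = X - X.mean(axis=0)      # shape (10, 3) minus shape (3,) broadcasts over rows

# Compare with a tolerance instead of testing for exact equality.
print(np.allclose(X_centered.sum(axis=0), 0.0))
print(np.allclose(X_centered.mean(axis=0), 0.0))
```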