
# Numpy Introduction
* Author: Johannes Maucher
* Last Update: 02.07.2019

<figure align="center">
<img width="800" src="https://maucher.home.hdm-stuttgart.de/Pics/DS_Python_Libs_All.png">
</figure>

In [77]:
import numpy as np
print(np.__version__)

1.18.1


## Numpy Array
In NumPy, arrays are the main type of object. An array arranges its elements in a table with one or more dimensions. Array elements are indexed by non-negative integers, starting at 0.
### Create a numpy array
Create a 1-dimensional array and inspect its type, element type, number of dimensions, shape and size:

In [78]:
A=np.array([1,2,3.0])
print(A)
print("Type of variable:     ",type(A))
print("Type of elements:     ",A.dtype)
print("Number of dimensions: ",A.ndim)
print("Shape:                ",A.shape)
print("Size:                 ",A.size)

[1. 2. 3.]
Type of variable:      <class 'numpy.ndarray'>
Type of elements:      float64
Number of dimensions:  1
Shape:                 (3,)
Size:                  3
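
The element type above is `float64` because one list entry (`3.0`) is a float, so NumPy promotes all elements to a common type. A minimal sketch of this promotion and of requesting a type explicitly with the `dtype` argument (the variable names are illustrative only):

```python
import numpy as np

# Mixed int/float input: all elements are promoted to a common float type.
mixed = np.array([1, 2, 3.0])

# Pure integer input keeps an integer type (the exact width is platform dependent).
ints = np.array([1, 2, 3])

# An explicit dtype overrides the inferred type.
as_float32 = np.array([1, 2, 3], dtype=np.float32)

print(mixed.dtype, ints.dtype, as_float32.dtype)
```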


Create a 2-dimensional array and inspect its type, element type, number of dimensions, shape and size:

In [79]:
B=np.array([[1,2,3,4],[5,6,0,1],[1,2,0,1]])
print(B)
print("Type of variable:     ",type(B))
print("Type of elements:     ",B.dtype)
print("Number of dimensions: ",B.ndim)
print("Shape:                ",B.shape)
print("Size:                 ",B.size)

[[1 2 3 4]
 [5 6 0 1]
 [1 2 0 1]]
Type of variable:      <class 'numpy.ndarray'>
Type of elements:      int64
Number of dimensions:  2
Shape:                 (3, 4)
Size:                  12


Create a 2-dimensional array of zeros:

In [80]:
Z=np.zeros((5,2))
print(Z)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


Create a 3-dimensional array of ones:

In [81]:
O=np.ones((2,3,4))
print(O)

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]


Create a 2-dimensional array of random integers between 10 (inclusive) and 20 (exclusive):

In [82]:
RI=np.random.randint(10,20,(5,3))
print(RI)

[[11 15 17]
 [18 18 15]
 [19 13 12]
 [16 11 15]
 [18 17 10]]
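
NumPy 1.17 and later also provide a `Generator` interface for random numbers. The following is only a sketch of that alternative, not a replacement for the call above; the seed value 42 and the name `RI2` are arbitrary assumptions, chosen so that the result is reproducible:

```python
import numpy as np

rng = np.random.default_rng(42)           # seeded generator -> reproducible draws
RI2 = rng.integers(10, 20, size=(5, 3))   # low is inclusive, high is exclusive by default

print(RI2)
print(RI2.min() >= 10, RI2.max() < 20)
```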


Create a 2-dimensional array of random numbers according to a Gaussian normal distribution with mean 0 and standard deviation 1.

In [83]:
RN=np.random.randn(5,5)
print(RN)

[[-1.39  0.21 -1.13 -1.45  0.78]
 [-0.93  0.03 -1.69  0.77  1.06]
 [ 0.44  0.14 -0.08  0.59  1.53]
 [ 1.66 -0.61 -0.15 -0.45 -0.98]
 [-0.33 -0.15 -1.1   0.16  0.08]]


Create a 2-dimensional array of random numbers according to a Gaussian normal distribution with mean m=100 and standard deviation s=5.

In [84]:
mean=100
sigma=5
RN100=sigma*np.random.randn(4,8)+mean
print(RN100)

[[101.64 111.99  94.61  97.88 102.6   97.92 100.68 103.95]
 [ 97.18  95.62 103.18  97.36  92.64 102.82  97.88 107.17]
 [100.09 100.77  98.56 103.43  97.46  93.08 105.31 103.3 ]
 [ 94.97  96.92  98.49 101.46  91.93  97.89  87.23 104.67]]
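
Scaling by `sigma` and shifting by `mean` works because, if $X$ is standard normal, then $\sigma X + m$ is normal with mean $m$ and standard deviation $\sigma$. A hedged check with a large sample (the sample size and the seed are arbitrary choices, not part of the lecture):

```python
import numpy as np

np.random.seed(0)                        # fixed seed, only for reproducibility
mean, sigma = 100, 5
sample = sigma * np.random.randn(100000) + mean

# The sample statistics should be close to the requested parameters.
print("sample mean:", sample.mean())
print("sample std: ", sample.std())
```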


Set the print options so that only 2 digits after the decimal point are printed:

In [85]:
np.set_printoptions(precision=2)
print(RN100)

[[101.64 111.99  94.61  97.88 102.6   97.92 100.68 103.95]
 [ 97.18  95.62 103.18  97.36  92.64 102.82  97.88 107.17]
 [100.09 100.77  98.56 103.43  97.46  93.08 105.31 103.3 ]
 [ 94.97  96.92  98.49 101.46  91.93  97.89  87.23 104.67]]


**Check whether the data is normally distributed:**

In [86]:
from scipy import stats
k2, p = stats.normaltest(RN.flatten())
alpha = 1e-3
print("p = {:g}".format(p))

if p < alpha:  # null hypothesis: x comes from a normal distribution
    print("The null hypothesis can be rejected")
else:
    print("The null hypothesis cannot be rejected")

p = 0.818722
The null hypothesis cannot be rejected


Throw a fair die 30 times. The first element of the resulting array is the number of times a 1 has been thrown, the last element is the number of times a 6 has been thrown.

In [87]:
RM=np.random.multinomial(30,[1/6.]*6)
print(RM)

[4 6 3 7 5 5]
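
The multinomial draw above corresponds to throwing a die 30 times and counting how often each face appears. As an illustrative alternative, the individual throws can be simulated and counted explicitly; `throws` and `counts` are names introduced here, and the seed is arbitrary:

```python
import numpy as np

np.random.seed(1)                               # arbitrary seed for reproducibility
throws = np.random.randint(1, 7, 30)            # 30 throws, faces 1..6
counts = np.bincount(throws, minlength=7)[1:]   # drop the unused bin for face 0

print(counts)         # counts[0] = number of 1s, ..., counts[5] = number of 6s
print(counts.sum())   # always 30, like the sum of the multinomial result
```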


Create a 1-dimensional array containing a sequence of numbers that starts at 2, ends before 18 and has a step size of 4.

In [88]:
S=np.arange(2,18,4)
print(S)

[ 2  6 10 14]
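
When the number of points rather than the step size is known, `np.linspace` is a common alternative; unlike `arange`, it includes the end point by default and returns floats. A small sketch (`LS` is a name introduced here):

```python
import numpy as np

S = np.arange(2, 18, 4)       # start 2, stop before 18, step 4
LS = np.linspace(2, 14, 4)    # 4 evenly spaced values from 2 to 14 (inclusive)

print(S)    # integer array
print(LS)   # float array with the same values
```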


Reshape the 1-dimensional array to a 2-dimensional array. Note that the number of elements in the original and the reshaped array must be equal.

In [89]:
SR = S.reshape((2,2))
print(SR)

[[ 2  6]
 [10 14]]


In _reshape()_, one element of the shape may be given as _-1_. NumPy then computes this element from the remaining shape elements and the total number of elements in the array. The _reshape()_ call below produces a _2-dimensional array_ with 1 row and 4 columns.

In [90]:
SRR=SR.reshape((1,-1))
print(SRR)

[[ 2  6 10 14]]
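
The `-1` placeholder can stand at any single position of the shape. A brief sketch, which also shows what happens when the requested shape does not match the number of elements (the flag `reshape_failed` is introduced here for illustration):

```python
import numpy as np

S = np.arange(2, 18, 4)           # 4 elements

print(S.reshape((-1, 2)).shape)   # first dimension inferred: (2, 2)
print(S.reshape((4, -1)).shape)   # last dimension inferred:  (4, 1)

# A shape that does not fit the number of elements raises a ValueError.
try:
    S.reshape((3, -1))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)
```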


The _flatten()_ method returns a copy of the array collapsed into _1 dimension_.

In [91]:
SRRR=SR.flatten()
print(SRRR)

[ 2  6 10 14]


### Accessing Array Elements

In [92]:
print(B)

[[1 2 3 4]
 [5 6 0 1]
 [1 2 0 1]]


Access the element in the first row and third column:

In [93]:
b=B[0,2]
print(b)

3


Access the second row:

In [94]:
r=B[1,:]
print(r)

[5 6 0 1]


Access the first column:

In [95]:
c=B[:,0]
print(c)

[1 5 1]


Access elements 2 and 3 of the second row:

In [96]:
p=B[1,1:3]
print(p)

[6 0]


Access the lower right (2x2) subarray of B:

In [97]:
l=B[1:,2:]
print(l)

[[0 1]
 [0 1]]


Negative indices count from the end of an axis. For example, `B[-1,:]` is the last row of B. Thus the lower right (2x2) subarray of B can also be obtained by:

In [98]:
l2=B[-2:,-2:]
print(l2)

[[0 1]
 [0 1]]


Reversing columns and rows in an array:

In [99]:
RB=B[::-1,::-1]
print("B=\n",B)
print("Reverse order of columns and rows:\nRB=\n",RB)

B=
 [[1 2 3 4]
 [5 6 0 1]
 [1 2 0 1]]
Reverse order of columns and rows:
RB=
 [[1 0 2 1]
 [1 0 6 5]
 [4 3 2 1]]


Access several elements at once by indexing with an array of positions:

In [100]:
a=10*np.arange(10)
idx=np.array([1,9,3])
print(a)
print(a[idx])

[ 0 10 20 30 40 50 60 70 80 90]
[10 90 30]
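
Indexing with an integer array, as in `a[idx]` above, selects the elements at the listed positions in the given order. A sketch of the same idea for a 2-dimensional array (the contents are those of the 3x4 array `B` shown earlier; `rows`, `cols` and `picked` are illustrative names):

```python
import numpy as np

B = np.array([[1, 2, 3, 4], [5, 6, 0, 1], [1, 2, 0, 1]])

rows = B[[2, 0], :]          # rows 2 and 0, in that order
cols = B[:, [0, 3]]          # first and last column
picked = B[[0, 1], [3, 1]]   # the two elements B[0,3] and B[1,1]

print(rows)
print(cols)
print(picked)
```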


### Type-Cast of Numpy Arrays
Given a 2-dimensional list with entries of different types:

In [102]:
mylist=[[1,2,3,"tom","hanks",78.0],
        [4,5,6,"meryl","streep",78.0]]

Convert this list into a numpy-array:

In [104]:
myArray=np.array(mylist)
myArray

array([['1', '2', '3', 'tom', 'hanks', '78.0'],
       ['4', '5', '6', 'meryl', 'streep', '78.0']], dtype='<U21')

Since the 2-dimensional list contains strings, all entries of the array are stored as strings: the elements of an array cannot have different types.

Next, we want to extract only the numeric columns and convert them to a numeric datatype:

In [105]:
numericCols=[0,1,2,5]
myNumericArray=myArray[:,numericCols].astype(np.float16)
myNumericArray

array([[ 1.,  2.,  3., 78.],
       [ 4.,  5.,  6., 78.]], dtype=float16)
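
The column selection matters, because `astype` cannot convert strings that do not represent numbers. A minimal sketch of what happens if a text column is included; the array is typed in directly as strings to mirror the one shown above, and `conversion_failed` is a name introduced here:

```python
import numpy as np

myArray = np.array([["1", "2", "3", "tom", "hanks", "78.0"],
                    ["4", "5", "6", "meryl", "streep", "78.0"]])

numeric = myArray[:, [0, 1, 2, 5]].astype(np.float64)   # numeric strings convert fine
print(numeric)

try:
    myArray[:, [0, 3]].astype(np.float64)                # column 3 holds names
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("Conversion failed:", err)
```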

### Filter Numpy-Arrays by Value

In [106]:
B=np.random.randint(0,10,(8,4))
B

array([[6, 9, 3, 5],
       [0, 4, 6, 5],
       [7, 6, 1, 4],
       [2, 1, 4, 6],
       [9, 1, 5, 8],
       [6, 1, 2, 9],
       [7, 7, 8, 0],
       [3, 3, 9, 1]])

Given the array `B`, we want to extract all rows whose first element is greater than 5:

In [107]:
filteredRows=B[B[:,0]>5,:]
filteredRows

array([[6, 9, 3, 5],
       [7, 6, 1, 4],
       [9, 1, 5, 8],
       [6, 1, 2, 9],
       [7, 7, 8, 0]])
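
The expression `B[:,0]>5` yields one boolean per row, and indexing with this mask keeps the rows marked `True`. A sketch that makes the mask visible and combines two conditions; the array is a fixed copy of the random output shown above, and the second condition is an assumption added for illustration:

```python
import numpy as np

# Fixed copy of the random array shown above, so the result is reproducible.
B = np.array([[6, 9, 3, 5], [0, 4, 6, 5], [7, 6, 1, 4], [2, 1, 4, 6],
              [9, 1, 5, 8], [6, 1, 2, 9], [7, 7, 8, 0], [3, 3, 9, 1]])

mask = B[:, 0] > 5                        # one boolean per row
print(mask)

# Combine conditions elementwise with & (and) or | (or); the parentheses are required.
both = B[(B[:, 0] > 5) & (B[:, 3] < 6)]
print(both)
```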

### Copying and Editing Array Elements

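Copy B to CB by assignment. Since `B` was overwritten in the filter example, it is first reset to the 3x4 array used before:

In [25]:
B=np.array([[1,2,3,4],[5,6,0,1],[1,2,0,1]])
CB=B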
print(CB)

[[1 2 3 4]
 [5 6 0 1]
 [1 2 0 1]]


Editing part of B:

In [26]:
B[1:,2:]=np.zeros((2,2))
print("B=\n",B)
print("CB=\n",CB)

B=
 [[1 2 3 4]
 [5 6 0 0]
 [1 2 0 0]]
CB=
 [[1 2 3 4]
 [5 6 0 0]
 [1 2 0 0]]


As shown above, assignment with __=__ only copies the reference, so `B` and `CB` refer to the same array. A **true copy** can be created as follows:

In [27]:
CB=B.copy()
B[1:,2:]=np.ones((2,2))
print("B=\n",B)
print("CB=\n",CB)

B=
 [[1 2 3 4]
 [5 6 1 1]
 [1 2 1 1]]
CB=
 [[1 2 3 4]
 [5 6 0 0]
 [1 2 0 0]]
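
The difference between a reference, a view and a copy can be checked directly. The following sketch uses `np.shares_memory` for this; the names `alias`, `view` and `copy` are introduced here for illustration:

```python
import numpy as np

B = np.array([[1, 2, 3, 4], [5, 6, 0, 1], [1, 2, 0, 1]])

alias = B          # same object, no data copied
view = B[1:, 2:]   # basic slicing returns a view on B's data
copy = B.copy()    # independent data

B[2, 3] = 99
print(alias is B)                # True
print(view[1, 1], copy[2, 3])    # 99 1
print(np.shares_memory(B, view), np.shares_memory(B, copy))   # True False
```

Note that slices behave like the assignment case: writing to `B` is visible through `view`, while `copy` keeps the old value.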


## Basic Operations

### Add and multiply array elements with a scalar

In [28]:
print(B)

[[1 2 3 4]
 [5 6 1 1]
 [1 2 1 1]]


In [29]:
print(3*B)

[[ 3  6  9 12]
 [15 18  3  3]
 [ 3  6  3  3]]


In [30]:
print(100+B)

[[101 102 103 104]
 [105 106 101 101]
 [101 102 101 101]]


### Elementwise addition and multiplication of matrices of the same size

In [31]:
A=np.random.randint(0,7,B.shape)
print("B=\n",B)
print("A=\n",A)

B=
 [[1 2 3 4]
 [5 6 1 1]
 [1 2 1 1]]
A=
 [[0 3 2 2]
 [3 5 5 1]
 [6 6 5 4]]


In [32]:
S=A+B
print(S)

[[ 1  5  5  6]
 [ 8 11  6  2]
 [ 7  8  6  5]]


In [33]:
P=A*B
print(P)

[[ 0  6  6  8]
 [15 30  5  1]
 [ 6 12  5  4]]
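
Elementwise operations also work for arrays of different but compatible shapes, a mechanism NumPy calls broadcasting. This goes slightly beyond the equal-size case above and is shown only as a sketch; `row` and `col` are illustrative arrays:

```python
import numpy as np

B = np.array([[1, 2, 3, 4], [5, 6, 1, 1], [1, 2, 1, 1]])   # shape (3, 4)
row = np.array([10, 20, 30, 40])                            # shape (4,)
col = np.array([[100], [200], [300]])                       # shape (3, 1)

print(B + row)   # row is added to every row of B
print(B * col)   # each row of B is scaled by its own factor
```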


### Some other elementwise operations:

In [34]:
P2=B**2
print("All elements of B raised to a power of 2\n P2=\n",P2)

All elements of B raised to a power of 2
 P2=
 [[ 1  4  9 16]
 [25 36  1  1]
 [ 1  4  1  1]]


In [35]:
SR=np.sqrt(B)
print("Elementwise square root\n SR=\n",SR)

Elementwise square root
 SR=
 [[1.   1.41 1.73 2.  ]
 [2.24 2.45 1.   1.  ]
 [1.   1.41 1.   1.  ]]


In [36]:
SI=np.sin(B)
print("Elementwise sine\n SI=\n",SI)

Elementwise sine
 SI=
 [[ 0.84  0.91  0.14 -0.76]
 [-0.96 -0.28  0.84  0.84]
 [ 0.84  0.91  0.84  0.84]]


### Dot product

In [37]:
a=np.arange(3)
b=np.array([2,4,1])
print(a)
print(b)

[0 1 2]
[2 4 1]


In [38]:
c=np.dot(a,b)
print(c)

6
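
The dot product of two 1-dimensional arrays is the sum of their elementwise products. A short sketch that computes it by hand and with the `@` operator:

```python
import numpy as np

a = np.arange(3)          # [0 1 2]
b = np.array([2, 4, 1])

by_hand = np.sum(a * b)   # 0*2 + 1*4 + 2*1
with_at = a @ b           # @ operator, equivalent to np.dot for 1-D arrays

print(by_hand, with_at)   # 6 6
```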


### Matrix multiplication

In [39]:
print("B=\n",B)
print("A=\n",A)

B=
 [[1 2 3 4]
 [5 6 1 1]
 [1 2 1 1]]
A=
 [[0 3 2 2]
 [3 5 5 1]
 [6 6 5 4]]


The following line raises an error, because the number of columns of $A$ (4) differs from the number of rows of $B$ (3).

In [40]:
C=np.dot(A,B)
print("C=\n",C)

ValueError: shapes (3,4) and (3,4) not aligned: 4 (dim 1) != 3 (dim 0)

Transposing $B$ makes the shapes compatible:

In [None]:
BT=np.transpose(B)
print("BT=\n",BT)
C=np.dot(A,BT)
print("C=\n",C)

The same operation can also be implemented as follows:

In [None]:
C2=A.dot(BT)
print("C2=\n",C2)
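
For 2-dimensional arrays, `np.dot` computes the matrix product; the `@` operator and the `.T` attribute are common shorthand for the same computation. A sketch with the matrices `A` and `B` printed above:

```python
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])
B = np.array([[1, 2, 3, 4], [5, 6, 1, 1], [1, 2, 1, 1]])

C = A @ B.T    # (3, 4) times (4, 3) gives (3, 3)
print(C.shape)
print(C)
print(np.array_equal(C, np.dot(A, np.transpose(B))))   # True
```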

### Simple statistical operations on arrays

In [None]:
print("A=\n",A)

In [None]:
print("Minimum = %3d\nMaximum = %3d\nMean = %3.2f"%(A.min(),A.max(),A.mean()))

In [None]:
print("Minimum = %s\nMaximum = %s\nMean = %s"%(A.min(axis=0),A.max(axis=0),A.mean(axis=0)))

In [None]:
print("Minimum = %s\nMaximum = %s\nMean = %s"%(A.min(axis=1),A.max(axis=1),A.mean(axis=1)))
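
Without an `axis` argument the statistics are computed over all elements; `axis=0` reduces along the rows and returns one value per column, `axis=1` returns one value per row. A sketch with the matrix `A` from above (the `keepdims` call is an addition for illustration):

```python
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])

print(A.min(axis=0))   # one value per column: [0 3 2 1]
print(A.max(axis=1))   # one value per row:    [3 5 6]
print(A.mean())        # over all elements:    3.5

# keepdims=True keeps the reduced axis with length 1.
print(A.sum(axis=1, keepdims=True).shape)   # (3, 1)
```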

## Stacking and splitting arrays 

In [None]:
print("A=\n",A)
print("B=\n",B)

In [None]:
V=np.vstack((A,B))
print("V=\n",V)

In [None]:
H=np.hstack((A,B))
print("H=\n",H)

In [None]:
Hs=np.hsplit(H,(3,5))
print("1st part=\n",Hs[0])
print("2nd part=\n",Hs[1])
print("3rd part=\n",Hs[2])

In [None]:
Vs=np.vsplit(V,(2,))
print("1st part=\n",Vs[0])
print("2nd part=\n",Vs[1])
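
For 2-dimensional arrays, `vstack` and `hstack` correspond to `np.concatenate` along axis 0 and axis 1. The split indices passed to `hsplit` and `vsplit` mark the positions before which the array is cut. A sketch with the matrices `A` and `B` from above:

```python
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])
B = np.array([[1, 2, 3, 4], [5, 6, 1, 1], [1, 2, 1, 1]])

V = np.concatenate((A, B), axis=0)   # same as np.vstack for 2-D arrays
H = np.concatenate((A, B), axis=1)   # same as np.hstack for 2-D arrays
print(V.shape, H.shape)              # (6, 4) (3, 8)

# Split indices (3, 5) cut H before column 3 and before column 5.
parts = np.hsplit(H, (3, 5))
print([p.shape for p in parts])      # [(3, 3), (3, 2), (3, 3)]
```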

## Ordering array elements

In [None]:
b=np.random.randint(0,100,15)
print(b)

In [None]:
bs=np.sort(b)
print(bs)

In [None]:
sortIdx=np.argsort(b)
print(sortIdx)

In [None]:
bss=b[sortIdx]
print(bss)
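
`argsort` returns the indices that would sort the array, which is useful when a second array has to be reordered in the same way. A sketch with small fixed arrays (the values and the name `M` are made up for illustration):

```python
import numpy as np

b = np.array([42, 7, 93, 15])
order = np.argsort(b)     # indices that would sort b
print(order)              # [1 3 0 2]
print(b[order[::-1]])     # descending order: [93 42 15  7]

# Sort the rows of a 2-D array by the values in its first column.
M = np.array([[3, 30], [1, 10], [2, 20]])
M_sorted = M[np.argsort(M[:, 0])]
print(M_sorted)
```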

## Array Query

In [None]:
print(A)

In [None]:
print(A.nonzero())

In [None]:
print(np.where((A>2) & (A<5)))

In [None]:
print(np.where((A>2) & (A<5),np.ones(A.shape),np.zeros(A.shape)))
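
With a single argument, `np.where` returns one index array per dimension for the positions where the condition holds; with three arguments it chooses elementwise between two alternatives. A sketch with the matrix `A` from above (the replacement value -1 is an arbitrary choice):

```python
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])
cond = (A > 2) & (A < 5)

rows, cols = np.where(cond)    # one index array per dimension
print(rows, cols)
print(np.argwhere(cond))       # same positions as (row, column) pairs
print(A[cond])                 # the matching values themselves
print(np.where(cond, A, -1))   # keep matches, replace everything else by -1
```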

In [None]:
print(A)

In [None]:
print(B)

In [None]:
print(A>B)

In [None]:
print(np.any(A>B,axis=0))

In [None]:
print(np.any(A>B,axis=1))

In [None]:
print(np.all(A>B,axis=0))

In [None]:
print(np.all(A>B,axis=1))
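
`np.any` and `np.all` reduce a boolean array along the given axis: `any` is true if at least one element is true, `all` only if every element is true. A sketch with the matrices `A` and `B` from above:

```python
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])
B = np.array([[1, 2, 3, 4], [5, 6, 1, 1], [1, 2, 1, 1]])

greater = A > B
print(greater)
print(np.any(greater, axis=0))   # per column: is A larger in at least one row?
print(np.all(greater, axis=1))   # per row: is A larger in every column?
print(greater.sum())             # number of positions where A > B
```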

## Import and Export Data from/to Files 

In [None]:
print(A)

Save array A to a binary file in NumPy .npy format. In the example below the file is saved in the working directory. However, an arbitrary path can be specified in the string-parameter of _save()_.

In [None]:
np.save("binFileA",A)

Load an array that has been saved to a binary file back into NumPy:

In [None]:
AL=np.load("binFileA.npy")
print(AL)

Save the array to a text file. Note that _savetxt()_ also has parameters for the column separator (_delimiter_, a single space by default) and the line separator (_newline='\n'_). Thus this method and the corresponding _loadtxt()_ method can also be used to write and read .csv files.

In [None]:
np.savetxt("textFileA",A,fmt="%4.2f")

Load data from a text file into a NumPy array:

In [None]:
AT=np.loadtxt("textFileA")
print(AT)
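
A minimal sketch of a .csv round trip with the `delimiter` parameter. The file name `A.csv` and the use of a temporary directory are assumptions made here so that the example leaves no files behind:

```python
import os
import tempfile
import numpy as np

A = np.array([[0, 3, 2, 2], [3, 5, 5, 1], [6, 6, 5, 4]])

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "A.csv")           # hypothetical file name
    np.savetxt(path, A, fmt="%d", delimiter=",")   # comma-separated integers
    AC = np.loadtxt(path, delimiter=",")           # loadtxt returns floats by default

print(AC)
print(np.array_equal(A, AC))
```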