# Programming for Data Analysis - NumPy

## **Introduction to numerical computing**

NumPy is a Python library for creating and manipulating numerical data.
NumPy provides a high-performance multidimensional array object, and tools for working with these arrays. \
References and detailed documentation can be found at https://numpy.org/doc/.

In [18]:
# Import the library under its conventional alias, np
import numpy as np

*_NumPy arrays are memory-efficient containers that provide fast numerical operations._*

In [None]:
L = range(1000)     #all integers from 0 to 999

# time it takes to square 0-999 with a Python list comprehension:
%timeit [i**2 for i in L]

#time it takes using numpy:
a = np.arange(1000)
%timeit a**2
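
The timing above shows the speed side of the claim; the memory side can be checked too. The sketch below is only a rough estimate: it adds up sys.getsizeof for the list and each of its integer objects (counting CPython's cached small integers as if they were separate), and the exact byte counts depend on the Python version and platform. The dtype is fixed to int64 so the array size does not depend on the platform default.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # fixed dtype: 8 bytes per element

# Python list: the list object (an array of pointers) plus one int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# NumPy array: one contiguous buffer of n fixed-size items
array_bytes = arr.nbytes  # equals n * arr.itemsize

print("list :", list_bytes, "bytes (rough estimate)")
print("array:", array_bytes, "bytes")
```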

In a NumPy array, the number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension. \
<a id="1darrays" ></a>
**_1D arrays_**

In [None]:
a = np.array([12,34,56,78,90]) #rank 1 array
print("The array you first created is:", a)
print("The type of the array is ", type(a))
print("The dimensions of the array are:", a.ndim)
print("The shape of the array is:", a.shape)
print("The length of the array is:", len(a))
print("The first, second and third elements of the array are:", a[0], a[1], a[2])

In [None]:
#Changing elements in the array
a[0] = 100
print("The array after modification is:", a)

 <a id="multi" ></a>
**_Multidimensional Arrays_**

In [None]:
# 2D arrays
b = np.array([[12,34,56],[98,76,54]])
print("The array you created is:", b)
print("The dimensions of the array are:", b.ndim)
print("The shape of the array is:", b.shape)
print("The first, second and third columns of the array:", b[:,0], b[:,1], b[:,2])
print("The first and second rows of the array:", b[0,:], b[1,:]) 

In [None]:
# 3D arrays
c = np.array([[[11], [22]], [[33], [44]]])
print("The array you created is:" , c)
print("The dimensions of the array are:", c.ndim)
print("The shape of the array is:", c.shape)

#### _Exercise_
* Create a simple two-dimensional array. First, redo the examples above; then create your own, for instance with odd numbers counting backwards in the first row and even numbers in the second.

* Use the functions len() and numpy.shape() on these arrays. How do they relate to each other, and to the ndim attribute of the arrays?
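
One possible answer to the exercise, as a sketch (the array values are just one choice): len() only reports the size of the first axis, np.shape() reports the size along every axis, and ndim is the number of entries in the shape tuple.

```python
import numpy as np

b = np.array([[9, 7, 5, 3, 1],      # odd numbers counting backwards
              [2, 4, 6, 8, 10]])    # even numbers

print(len(b))        # 2: size of the first axis only (number of rows)
print(np.shape(b))   # (2, 5): size along every axis, same as b.shape
print(b.ndim)        # 2: number of axes, same as len(b.shape)
```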

**_Main functions for creating arrays_** 
* np.arange(), np.linspace()
* np.ones(), np.zeros(), np.full(), np.eye(), np.diag()
* np.random.rand(), np.random.randn()

In [None]:
#create arrays of ordered and equally spaced values: np.arange() and np.linspace()
a = np.arange(1, 9, 2) # start, end (exclusive), step
print(a)


b = np.linspace(0, 1, 6)   # start, end (included by default), number of values
print(b)
c = np.linspace(0, 1, 5, endpoint=False) #values are spaced like b but the last one is excluded
print(c)
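
The spacing produced by np.linspace() follows directly from the number of values requested. A small worked check, assuming the same start and end as above: with the endpoint included, num values leave num - 1 gaps; with endpoint=False, there are num gaps. Both b and c therefore have a step of 0.2 (up to floating-point rounding).

```python
import numpy as np

start, stop = 0, 1

b = np.linspace(start, stop, 6)                  # endpoint included
c = np.linspace(start, stop, 5, endpoint=False)  # endpoint excluded

# endpoint included: 6 values -> 5 gaps
step_b = (stop - start) / (6 - 1)
# endpoint excluded: 5 values -> 5 gaps
step_c = (stop - start) / 5

print(np.diff(b), step_b)  # every gap is 0.2, up to rounding
print(np.diff(c), step_c)  # every gap is 0.2, up to rounding
```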

In [6]:
a_ = np.zeros((3, 3)) #array of zeros with a defined shape (3,3)
print(a_, "Shape of array:", a_.shape)

b_ = np.ones((2, 2))  #array of ones with a defined shape (2,2)
print(b_, "Shape of array:", b_.shape)

c_ = np.full((2,2), 7) # (2,2) array filled with 7s
print(c_, "Shape of array:", c_.shape)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] Shape of array: (3, 3)
[[1. 1.]
 [1. 1.]] Shape of array: (2, 2)
[[7 7]
 [7 7]] Shape of array: (2, 2)


In [10]:
d_ = np.eye(3) #identity matrix of size 3, diagonal is filled with 1, zeroes elsewhere
print(d_,"Shape of array:", d_.shape)

e_ = np.diag(np.array([1, 2, 3, 4])) #diagonal is equal to the elements of the array, zeroes elsewhere
print(e_,"Shape of array:", e_.shape)

# notice that because we give an array of integers, the zeroes are also integers now
# we can specify the data type of an array using the dtype parameter:

e_ = np.diag(np.array([1, 2, 3, 4], dtype=float))
print(e_)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] Shape of array: (3, 3)
[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]] Shape of array: (4, 4)
[[1. 0. 0. 0.]
 [0. 2. 0. 0.]
 [0. 0. 3. 0.]
 [0. 0. 0. 4.]]


In [None]:
a__ = np.random.rand()       # a single random float in [0, 1)
print(a__)

b__ = np.random.randn(5)      # 5 samples from the standard normal distribution (mean=0, sd=1)
print(b__)
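
Random values change on every run. A sketch of two ways to make them repeatable: fixing the global seed for the legacy functions used above, or using the newer Generator interface (np.random.default_rng, available since NumPy 1.17). The seed values 0 and 42 are arbitrary.

```python
import numpy as np

# Legacy functions: fixing the global seed gives repeatable results
np.random.seed(0)
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True: same seed, same numbers

# Generator interface: the seed belongs to this rng object only
rng = np.random.default_rng(seed=42)
u = rng.random(5)            # uniform floats in [0, 1), like np.random.rand(5)
z = rng.standard_normal(5)   # standard normal samples, like np.random.randn(5)
print(u)
print(z)
```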

#### **Indexing and Slicing** 
_Indices begin at 0, like other Python sequences (and C/C++). In contrast, in MATLAB, indices begin at 1._

In [None]:
a = np.arange(10)
print("Original array a:", a)
print(a[0], a[2], a[6])  #first, third and seventh element of the array
print("Using the slice ::-1, you can print the original array in reverse:",  a[::-1])

**_Slicing_** \
\
A slice is written start:end:step, where end is exclusive. Each part is optional: by default, start is 0, end is the length of the array (so the last element is included), and step is 1:

In [None]:
print(a)
print(a[2:9:3]) #start, end, step
print("Going through the elements with a step of 2:", a[::2]) #start and end omitted, so the whole array is traversed with a step of 2
print(a[1:3]) #start and end are specified, step is taken as 1 by default

print("The first 4 elements of the array:", a[:4])
print(a[4:]) #start is specified, end & step by default
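
For non-negative bounds, a slice a[start:end:step] selects the same positions as Python's range(start, end, step). The sketch below checks this on the array above and also shows negative values, which count from the end of the array (with a negative step, the slice walks backwards). Note that indexing with a list, as in the second line, returns a copy rather than a view.

```python
import numpy as np

a = np.arange(10)

print(a[2:9:3])                  # [2 5 8]
print(a[list(range(2, 9, 3))])   # [2 5 8]: same positions, given as a list of indices

print(a[-3:])      # last three elements: [7 8 9]
print(a[8:2:-2])   # from index 8 down to, but excluding, index 2: [8 6 4]
```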

**_Indexing in Multidimensional Arrays_**

In [20]:
a_ = np.diag(np.arange(3))
print(a_)
#First index is row, second is column
print("The element in the third row, second column of the matrix is:", a_[2,1])     #also works: a_[2][1]
print("The element in the third row, third column of the matrix is:", a_[2,2])
print("The element in the first row, third column of the matrix is:", a_[0,2])

[[0 0 0]
 [0 1 0]
 [0 0 2]]
The element in the third row, second column of the matrix is: 0
The element in the third row, third column of the matrix is: 2
The element in the first row, third column of the matrix is: 0


#### _Exercise_
* Try slicing a 2D array: select a subset of its rows and columns.
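
A few possible slices to start from, as a sketch on an assumed 3x4 example array: in a 2D array, each axis gets its own start:end:step, separated by a comma (rows first, then columns).

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
print(m)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m[0:2, 1:3])   # rows 0-1, columns 1-2 -> [[1 2] [5 6]]
print(m[:, -1])      # last column -> [ 3  7 11]
print(m[::2, ::2])   # every other row and column -> [[ 0  2] [ 8 10]]
```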

_Note:_ A basic slice returns a view, not a copy: it shares memory with the original array, so modifying the view modifies the original array as well. To prevent this, make an explicit copy of the slice (for example with .copy()) and perform all modifications on the copy.
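
A minimal demonstration of the difference between a view and a copy; np.shares_memory() reports whether two arrays use the same underlying data.

```python
import numpy as np

a = np.arange(5)
view = a[1:4]          # basic slicing returns a view
view[0] = 100          # writes through to a: a[1] is now 100
print(a)

b = np.arange(5)
safe = b[1:4].copy()   # an independent copy
safe[0] = 100          # b is unchanged
print(b)

print(np.shares_memory(a, view), np.shares_memory(b, safe))  # True False
```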

### **Numerical Operations in NumPy**

**Elementwise Operations** 
Addition, Subtraction, Multiplication, Division

In [None]:
#one array and a scalar
a = np.array([0,1,2,3])
print("Original array:", a)
print("Addition with a scalar: ", a+1)
print("Subtraction by a scalar: ", a-2)
print("Multiplication by a scalar: ",a*2)
print("Division by a scalar: ", a/2)

In [None]:
#between 2 arrays
b = np.array([10,11,12,13])
print("Adding the two arrays:",  a + b)
print("Subtracting a from b:",  b - a)
print("Multiplying the two arrays:",  a * b)
print("Dividing a by b:",  a/b)
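
Elementwise operations pair up elements position by position, so the shapes must be compatible. The sketch below shows what happens when they are not, and why the scalar case works: NumPy "broadcasts" the scalar to the array's shape, and the same rule lets a 1D row be combined with every row of a 2D array (the zero matrix m is just an illustrative example).

```python
import numpy as np

a = np.array([0, 1, 2, 3])
b = np.array([10, 11, 12, 13])
print(a + b)                 # [10 12 14 16]

error_message = ""
try:
    a + np.array([1, 2, 3])  # shapes (4,) and (3,) cannot be paired up
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# broadcasting: the 1D array a is added to each row of the (2, 4) array
m = np.zeros((2, 4))
print(m + a)                 # each row becomes [0. 1. 2. 3.]
```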

In [None]:
#joining arrays into one:
c = np.concatenate((a,b))
print(c)

#also works with multidimensional arrays: the shapes must match on every axis except the one being joined (here axis=0)
d = np.concatenate((np.zeros((2,3)), np.ones((5,3))), axis=0)
print(d)
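
The axis argument decides which dimension grows. A short sketch with illustrative shapes: along axis=0 the arrays are stacked as extra rows (column counts must agree), along axis=1 as extra columns (row counts must agree).

```python
import numpy as np

top = np.zeros((2, 3))
bottom = np.ones((5, 3))
left = np.zeros((2, 3))
right = np.ones((2, 1))

# axis=0 stacks rows: the number of columns must agree
print(np.concatenate((top, bottom), axis=0).shape)   # (7, 3)

# axis=1 stacks columns: the number of rows must agree
print(np.concatenate((left, right), axis=1).shape)   # (2, 4)
```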

**Matrix Multiplication and Transposition** 

In [None]:
x = np.ones((3,2))
y = np.full((2,4),3)
product_matrix = np.dot(x,y)

print(product_matrix)
print(product_matrix.shape)
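
For 2D arrays, the @ operator (Python 3.5+) computes the same matrix product as np.dot(). The sketch also contrasts it with *, which multiplies element by element: each entry of x @ y below is 1*3 + 1*3 = 6, while for the 2x2 matrix of 3s, * gives 9 everywhere and @ gives 3*3 + 3*3 = 18.

```python
import numpy as np

x = np.ones((3, 2))
y = np.full((2, 4), 3)

p1 = np.dot(x, y)
p2 = x @ y
print(np.array_equal(p1, p2))   # True
print(p2)                       # every entry is 6.0

sq = np.full((2, 2), 3)
print(sq * sq)                  # elementwise: every entry is 9
print(sq @ sq)                  # matrix product: every entry is 18
```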

In [23]:
a = np.arange(9)
b = a.reshape(3,3)
print(a)
print(a.shape)
print(b)
print(b.shape)

[0 1 2 3 4 5 6 7 8]
(9,)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
(3, 3)


In [25]:
#Transpose
print("The original matrix:\n", b)
print("The transpose of the matrix:\n", b.T)

The original matrix:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
The transpose of the matrix:
 [[0 3 6]
 [1 4 7]
 [2 5 8]]


**Other Operations**

In [29]:
# Elementwise comparison of two arrays (shapes must match, or be broadcast-compatible)
a = np.arange(3)
b = np.ones(3)
print("a:",a,"b:",b)
print(a == b)
print(a < b)

a: [0 1 2] b: [1. 1. 1.]
[False  True False]
[ True False False]
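
A comparison returns a boolean array, one value per position. To reduce it to a single answer, .any() and .all() can be used, and np.array_equal() checks whether two arrays have the same shape and values. A boolean array can also be used to select elements.

```python
import numpy as np

a = np.arange(3)
b = np.ones(3)

mask = a == b                  # [False  True False]
print(mask.any())              # True: at least one position matches
print(mask.all())              # False: not every position matches
print(np.array_equal(a, b))                    # False
print(np.array_equal(a, np.array([0, 1, 2])))  # True

print(a[a < b])                # elements of a where a < b: [0]
```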


In [30]:
x = np.array([10,4,2,20,99])
print(x)
print("Sum of all the elements of the array is:", x.sum())
print("The minimum value in the array is:", x.min())
print("The largest value in the array is:", x.max())
print("The index of the minimum value:", x.argmin())
print("The index of the maximum value:", x.argmax())

#argmin() and argmax() return indices; the next two lines check that indexing x with them gives the actual minimum and maximum
print(x[x.argmin()] == x.min())
print(x[x.argmax()] == x.max())

[10  4  2 20 99]
Sum of all the elements of the array is: 135
The minimum value in the array is: 2
The largest value in the array is: 99
The index of the minimum value: 2
The index of the maximum value: 4
True
True


In [None]:
# Basic statistics
print("The mean of the array:", x.mean())           #also works with np.mean(x)
print("The median of the array:", np.median(x))
print("The standard deviation of the array:", np.std(x))
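
A worked check of these statistics for x = [10, 4, 2, 20, 99]. The squared deviations from the mean 27 add up to 289 + 529 + 625 + 49 + 5184 = 6676. By default np.std() divides this by n (ddof=0, the population standard deviation); passing ddof=1 divides by n - 1 instead (the sample standard deviation), which is what many statistics packages report.

```python
import numpy as np

x = np.array([10, 4, 2, 20, 99])

print(x.mean())                  # 27.0
print(np.median(x))              # 10.0, the middle of the sorted values [2 4 10 20 99]

pop_sd = np.std(x)               # sqrt(6676 / 5), about 36.54
sample_sd = np.std(x, ddof=1)    # sqrt(6676 / 4), about 40.85
print(pop_sd, sample_sd)

# the population formula written out by hand
squared_dev = (x - x.mean()) ** 2
print(np.sqrt(squared_dev.sum() / len(x)))  # same value as pop_sd
```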

In [None]:
# Sum values along a given axis only
y = np.arange(6).reshape(2,3)
print(y)
print("Sum of each column:", y.sum(axis = 0))
print("Sum of each row:", y.sum(axis = 1))

# With more dimensions... don't lose track of your axes
z = np.random.rand(2,3,4)
print(z.shape)
print(z.sum(axis = 1).shape)
print(z.sum(axis = (0,2)).shape)
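
As the shapes above show, the summed axis disappears from the result. Passing keepdims=True keeps it with size 1, so the result still lines up with the original array; a sketch of where this helps is dividing each row by its own sum.

```python
import numpy as np

y = np.arange(6).reshape(2, 3)

row_sums = y.sum(axis=1)                      # shape (2,): the axis is removed
row_sums_kept = y.sum(axis=1, keepdims=True)  # shape (2, 1): the axis is kept with size 1
print(row_sums.shape, row_sums_kept.shape)

# (2, 3) divided by (2, 1): each row is divided by its own sum
fractions = y / row_sums_kept
print(fractions.sum(axis=1))                  # each row now sums to 1
```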

### **Saving and Loading Data Files** 

In [None]:
my_data = np.random.rand(10,2)
print(my_data)

# saving as numpy file: easy to import again with numpy
np.save('my_data.npy', my_data)

# saving as text file: can be opened with text editor, excel, etc.
np.savetxt('my_data.txt', my_data)
np.savetxt('my_data.csv', my_data, delimiter=';')

#relative file names are resolved against the current working directory, which by default is the notebook's folder

In [None]:
imported_data = np.load('my_data.npy')
print(imported_data)

#same for text files using np.loadtxt
imported_data2 = np.loadtxt('my_data.csv', delimiter=";")
print(imported_data2)

# relative paths are resolved against the current working directory (by default, the notebook's folder)
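
Several arrays can also be stored in one file with np.savez(), each under a keyword name, and read back by name from the object returned by np.load(). The sketch writes to a temporary folder so it leaves no files behind, and checks both round trips; the text format loses no precision here because np.savetxt() writes with fmt='%.18e' by default.

```python
import os
import tempfile
import numpy as np

a = np.random.rand(10, 2)
b = np.arange(5)

with tempfile.TemporaryDirectory() as folder:
    # several arrays in one .npz archive, each stored under a keyword name
    path = os.path.join(folder, "arrays.npz")
    np.savez(path, a=a, b=b)

    with np.load(path) as archive:
        print(archive.files)   # names of the stored arrays
        a_back = archive["a"]
        b_back = archive["b"]

    # text round trip, as in the cells above
    txt_path = os.path.join(folder, "a.csv")
    np.savetxt(txt_path, a, delimiter=";")
    a_txt = np.loadtxt(txt_path, delimiter=";")

print(np.array_equal(a, a_back), np.array_equal(b, b_back))
print(np.allclose(a, a_txt))
```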

#### _Exercise_

Try using NumPy to solve exercises 6 and 8 of the exercise sheet.