# Programming for Data Analysis - NumPy

## **Introduction to numerical computing**

NumPy is a Python library for creating and manipulating numerical data.
NumPy provides a high-performance multidimensional array object, and tools for working with these arrays. \
References and detailed documentation can be found at https://numpy.org/doc/.

In [None]:
# Importing the library
import numpy as np

# we use the 'as' keyword to give an alias to our library name: we can now simply write 'np' instead of 'numpy'.
# you can technically use any alias you want, but the most popular libraries have conventions: numpy is usually np.

<a id="1darrays" ></a>
## 1D arrays

In [None]:
# Creating an array:
a = np.array([12,34,56,78,90])
print(a)

In [None]:
# We can check some properties of our array
print("The number of dimensions:", a.ndim)
print("The shape of the array:", a.shape )
print("The length of the array:", len(a) )

In [None]:
# As with lists, we can look at specific elements in the array

print(a[0])         # first element of the array: remember we count from zero!
print(a[-1])        # last element of the array: we can count backwards
print(a[3:])        # from the fourth element (index 3) to the last
print(a[::2])       # from first to last, taking every second element

In [None]:
# Changing elements in the array
a[0] = 100
print("The array after modification is:", a)
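
One detail worth knowing when modifying arrays: a slice of a NumPy array is a view onto the same data rather than a copy, so writing to the slice also changes the original array. The sketch below illustrates this; the names `original`, `view` and `independent` are illustrative and not part of the notebook.

```python
import numpy as np

original = np.array([12, 34, 56, 78, 90])

view = original[1:4]                 # a slice is a view onto the same data
view[0] = -1                         # this also changes original[1]
print(original)

independent = original[1:4].copy()   # .copy() creates a separate array
independent[0] = 999                 # original is left untouched
print(original)
```

If you want to experiment on part of an array without touching the source data, calling `.copy()` first is the usual safeguard.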

**Basic mathematical operations**

We can perform simple operations between a 1D array (vector) and a scalar.

In [None]:
b = np.array([1,2,3,4,5,6,7])
print("The original array:", b)

print("Addition:", b+10)
print("Subtraction:", b-3)
print("Multiplication:", b*2)
print("Division:", b/5)

We can also perform operations between arrays.

In [None]:
a = np.array([1,2,3])
b = np.array([7,8,9])
print("The arrays are:", a, "and", b)

print("Addition:", a+b)
print("Subtraction:", a-b)
print("Multiplication:", a*b)
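
These operations are applied element by element, which is different from how plain Python lists behave. A short comparison, using illustrative names (`list_a`, `arr_a`, ...) that do not appear elsewhere in the notebook:

```python
import numpy as np

list_a, list_b = [1, 2, 3], [7, 8, 9]
arr_a, arr_b = np.array(list_a), np.array(list_b)

print(list_a + list_b)   # lists: + concatenates the two lists
print(arr_a + arr_b)     # arrays: + adds element by element
print(arr_a * arr_b)     # arrays: * multiplies element by element
```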

**Special operations functions**

NumPy has shortcuts for the most common operations:

- _np.linalg.norm()_:  norm of an array (for a 1D array, the Euclidean norm by default)
- _np.dot()_:  dot product of two arrays.

In [None]:
# Norm of an array
a_norm = np.linalg.norm(a)
print("The norm of array", a, "is", a_norm)


# Dot product (or inner product) of two arrays:
a_b_product = np.dot(a,b)
print("The dot product of", a, "and", b, "is", a_b_product)
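
To see what these two shortcuts compute, the sketch below rebuilds them from the element-wise operations shown earlier. It assumes the default Euclidean norm; the names `v`, `w`, `norm_by_hand` and `dot_by_hand` are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([7, 8, 9])

norm_by_hand = np.sqrt(np.sum(v**2))   # square root of 1 + 4 + 9
dot_by_hand = np.sum(v * w)            # 1*7 + 2*8 + 3*9

print(norm_by_hand, np.linalg.norm(v))
print(dot_by_hand, np.dot(v, w))
```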

In [None]:
# What happens if you try to perform operations on arrays of different sizes?

c = [1,2,3,4,5,6,7]
print(b+c)
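
Running the cell above stops with an error, because NumPy cannot pair up the elements of a 3-element array with those of a 7-element sequence. If you want to observe the error without interrupting the notebook, one option is to catch it, as in this sketch (the names `short`, `long_list` and `result` are illustrative):

```python
import numpy as np

short = np.array([7, 8, 9])
long_list = [1, 2, 3, 4, 5, 6, 7]

try:
    result = short + long_list      # shapes (3,) and (7,) do not match
except ValueError as err:
    result = None
    print("NumPy refused the operation:", err)
```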

### Exercises

1. Use NumPy to calculate the norm of the following vectors (from Exercise 2, slide 19):

- $a=\begin{pmatrix}3&-1\end{pmatrix}$
- $b=\begin{pmatrix}4&3&2\end{pmatrix}$
- $c=\begin{pmatrix}1&2&-1&3&1\end{pmatrix}$

2. Use NumPy to calculate the dot product of these two arrays (from Exercise 3, slide 33):

    $\begin{pmatrix}1&2&-3\end{pmatrix} \begin{pmatrix}4\cr-5\cr6\cr\end{pmatrix}$


 <a id="multi" ></a>
## Multidimensional Arrays

Multidimensional arrays work mostly like 1D arrays, but the extra dimensions require more care when manipulating the data.

In [None]:
# 2D arrays
A = np.array([[1,2,3],[4,5,6]])
print(A)

In [None]:
# General size properties
print("The number of dimensions:", A.ndim)
print("The shape of the array:", A.shape)

In [None]:
# Indexing
## For 2D arrays, remember it's always [row, column].
print("The first column:", A[:,0])
print("The second row:", A[1,:]) 
print("The element in the last row and second column:", A[-1,1])
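
When indexing 2D arrays it helps to keep an eye on the shape of the result: an integer index removes that dimension, while a slice keeps it. A small sketch with an illustrative array `M`:

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])

print(M[1, :].shape)     # integer row index: 1D result, shape (3,)
print(M[:, 0].shape)     # integer column index: 1D result, shape (2,)
print(M[1:2, :].shape)   # slice instead of integer: still 2D, shape (1, 3)
print(M[0:2, 1:3])       # both rows, last two columns
```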

**Basic matrix operations**

In [None]:
# Addition, subtraction, multiplication or division by a scalar work just like for 1D arrays:
print(A+2)
print(A*10)

In [None]:
# You can add, subtract, multiply or divide 2D arrays element by element if they have the same shape

B = np.array([[10,20,30],[40,50,60]])
print("A:", A)
print("B:", B)

print(A+B)
print(A*B) #etc.

**Specific operations functions**

In [None]:
#Transpose of a matrix
print(A.T)

In [None]:
# Matrix and vector product: same function as before
A = np.array([[1,2,3],[4,5,6]])
x = np.array([2,-1,1])

print(A)
print(x)

print(np.dot(A,x))
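
One way to read the matrix-vector product is as one dot product per row of the matrix. The sketch below checks this on the same numbers, using illustrative names (`A_demo`, `x_demo`, `row_by_row`) so that the notebook's own variables are not overwritten:

```python
import numpy as np

A_demo = np.array([[1, 2, 3], [4, 5, 6]])
x_demo = np.array([2, -1, 1])

# Dot product of each row of the matrix with the vector
row_by_row = np.array([np.dot(row, x_demo) for row in A_demo])

print(row_by_row)               # [3 9]
print(np.dot(A_demo, x_demo))   # same result in a single call
```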

In [None]:
# Matrix and matrix product
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[2,-3],[1, 0], [9,-5]])

print("A")
print(A)
print("B")
print(B)

print("A.B")
print(np.dot(A,B))

print("B.A")
print(np.dot(B,A))
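
The two products above have different shapes because the inner dimensions must match and the outer dimensions give the shape of the result. The sketch below checks this and also uses the `@` operator, which for 2D arrays gives the same result as `np.dot()`. The names `P` and `Q` are illustrative.

```python
import numpy as np

P = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
Q = np.array([[2, -3], [1, 0], [9, -5]])    # shape (3, 2)

print((P @ Q).shape)    # (2, 3) times (3, 2) gives (2, 2)
print((Q @ P).shape)    # (3, 2) times (2, 3) gives (3, 3)
print(np.array_equal(P @ Q, np.dot(P, Q)))

try:
    P @ P               # (2, 3) times (2, 3): inner dimensions differ
    product_failed = False
except ValueError:
    product_failed = True
print("P @ P raised an error:", product_failed)
```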

### Exercises

4. Use NumPy to perform these matrix products (from Exercise 4, slide 64).

$a) \begin{bmatrix}1&3\cr0&2\end{bmatrix} \begin{bmatrix}-1\cr2\end{bmatrix}$

$b) \begin{bmatrix}-2&1\cr1&2\end{bmatrix} \begin{bmatrix}1\cr3\end{bmatrix}$

$c) \begin{bmatrix}0&2\cr-1&1\end{bmatrix} \begin{bmatrix}3\cr1\end{bmatrix}$

$d) \left(\begin{bmatrix}1&3\cr0&2\end{bmatrix}+\begin{bmatrix}-2&1\cr-1&2\end{bmatrix}\right) \begin{bmatrix}3\cr1\end{bmatrix}$

5. Use NumPy to calculate all the possible products among the following matrices (from Exercise 5, slide 64):

$A=\begin{bmatrix}1&2&3\end{bmatrix}$

$B=\begin{bmatrix}1\cr2\end{bmatrix}$

$C=\begin{bmatrix}2&1\cr-3&0\cr1&2\end{bmatrix}$

$D=\begin{bmatrix}-2&5\cr5&0\end{bmatrix}$

$E=\begin{bmatrix}-1&1&3\cr-1&-4&0\cr0&2&5\end{bmatrix}$

## Creating arrays

So far we have created all our arrays manually by passing lists (or lists of lists) to the np.array() function. Let's take a look at some NumPy functions that generate specific arrays.

**_Main functions for creating arrays_** 
* np.arange(), np.linspace()
* np.ones(), np.zeros(), np.full(), np.eye(), np.diag()
* np.random.rand(), np.random.randn()

In [None]:
# Create arrays of ordered and equally spaced values: np.arange() and np.linspace()
a = np.arange(1, 11, 2)      # start, end (exclusive), step
print(a)

b = np.linspace(0, 1, 6)   # start, end (inclusive), number of values
print(b)
c = np.linspace(0, 1, 5, endpoint=False) #values are spaced like b but the last one is excluded
print(c)
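
With `np.linspace()` you choose the number of values and NumPy works out the spacing. When the end point is included, the spacing is `(stop - start) / (num - 1)`. A short sketch of that relationship, with illustrative names (`grid`, `step`, `returned_step`):

```python
import numpy as np

start, stop, num = 0, 1, 6

grid = np.linspace(start, stop, num)
step = (stop - start) / (num - 1)      # spacing when the end point is included
print(step)                            # 0.2

# The same values rebuilt from the step
print(np.allclose(grid, start + step * np.arange(num)))

# retstep=True also returns the spacing that linspace used
grid_again, returned_step = np.linspace(start, stop, num, retstep=True)
print(returned_step)
```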

In [None]:
a = np.zeros((3, 3)) #array of zeros with a defined shape (3,3)
print(a, "Shape of array:", a.shape)

b = np.ones((2, 2))  #array of ones with a defined shape (2,2)
print(b, "Shape of array:", b.shape)

c = np.full((2,2), 7) #array of (2,2) of 7s 
print(c, "Shape of array:", c.shape)

In [None]:
d_ = np.eye(3) #identity matrix of size 3, diagonal is filled with 1, zeroes elsewhere
print(d_,"Shape of array:", d_.shape)

e_ = np.diag(np.array([1, 2, 3, 4])) #diagonal is equal to the elements of the array, zeroes elsewhere
print(e_,"Shape of array:", e_.shape)

# notice that because we give an array of integers, the zeroes are also integers now
# we can specify the data type of an array using the dtype parameter:

e_ = np.diag(np.array([1, 2, 3, 4], dtype=float))
print(e_)
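
You can inspect the data type of any array through its `dtype` attribute, and convert an existing array with `astype()`. A minimal sketch, with illustrative names (`int_diag`, `float_diag`, `converted`):

```python
import numpy as np

int_diag = np.diag(np.array([1, 2, 3, 4]))
float_diag = np.diag(np.array([1, 2, 3, 4], dtype=float))

print(int_diag.dtype)            # an integer type (exact width depends on the platform)
print(float_diag.dtype)          # float64
print(np.zeros((2, 2)).dtype)    # zeros() produces floats by default

converted = int_diag.astype(float)   # returns a new array of floats
print(converted.dtype)
```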

In [None]:
a = np.random.rand(6)      # 6 random floats in the interval [0, 1)
print(a)

b = np.random.randn(5)    # 5 random floats from a normal distribution (mean=0, sd=1)
print(b)
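
Random arrays change every time the cell is run. If you need the same numbers again, for example to reproduce an analysis, one option is to fix the seed of the random generator first. A sketch, where the seed value 42 and the names `first` and `second` are arbitrary choices:

```python
import numpy as np

np.random.seed(42)             # fix the starting point of the generator
first = np.random.rand(3)

np.random.seed(42)             # same seed again
second = np.random.rand(3)

print(first)
print(second)
print(np.array_equal(first, second))   # identical draws
```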

## Loading and saving data files

Creating arrays is fun and useful, but when it comes to data analysis you rarely input your data by hand into Python. Instead, you import files that contain the data you want to analyze.

In [None]:
my_data = np.random.rand(10,2)
print(my_data)

# saving as numpy file: easy to import again with numpy
np.save('my_data.npy', my_data)

# saving as text file: can be opened with text editor, excel, etc.
np.savetxt('my_data.txt', my_data)
np.savetxt('my_data.csv', my_data, delimiter=';')

# by default, files will be saved in the same directory as the jupyter notebook

In [None]:
imported_data = np.load('my_data.npy')
print(imported_data)

#same for text files using np.loadtxt
imported_data2 = np.loadtxt('my_data.csv', delimiter=";")
print(imported_data2)

# by default, numpy looks for the file in the same directory as the notebook
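
It is good practice to check that what you load matches what you saved. The sketch below does a round trip through both formats. It writes to a temporary folder so it does not touch your files; the file names and variable names are illustrative.

```python
import os
import tempfile

import numpy as np

demo = np.random.rand(4, 2)

with tempfile.TemporaryDirectory() as folder:
    npy_path = os.path.join(folder, "demo.npy")
    csv_path = os.path.join(folder, "demo.csv")

    np.save(npy_path, demo)                      # binary NumPy format
    np.savetxt(csv_path, demo, delimiter=";")    # text format

    from_npy = np.load(npy_path)
    from_csv = np.loadtxt(csv_path, delimiter=";")

print(np.array_equal(demo, from_npy))   # binary round trip
print(np.allclose(demo, from_csv))      # text round trip
```

The text comparison uses `np.allclose()` because values written as text are compared up to a small tolerance rather than bit for bit.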

## Some more useful operations


**Basic statistics**

In [None]:
data = np.array([7,2,8,10,14,16,3,18,12,6])

print("The mean:", data.mean())           # you can also use np.mean(data)
print("The median:", np.median(data))
print("The standard deviation:", np.std(data))

print("The sum of all the elements:", data.sum()) # you can also use np.sum(data)
print("The product of all the elements:", data.prod()) # you can also use np.prod(data)

print("The maximum value:", data.max())
print("The minimum value:", data.min())
print("The index of the maximum value:", data.argmax())
print("The index of the minimum value:", data.argmin())
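
Two options are worth knowing here. By default `np.std()` divides by the number of elements (population standard deviation), and `ddof=1` switches to the sample version. On 2D arrays, the `axis` parameter computes a statistic per column or per row. A sketch with illustrative names (`scores`, `table`), where the reshape into 2 rows is only for demonstration:

```python
import numpy as np

scores = np.array([7, 2, 8, 10, 14, 16, 3, 18, 12, 6])

population_sd = np.std(scores)            # divides by N
sample_sd = np.std(scores, ddof=1)        # divides by N - 1
by_hand = np.sqrt(np.mean((scores - scores.mean())**2))
print(population_sd, by_hand, sample_sd)

table = scores.reshape((2, 5))            # 2 rows of 5 values
print(table.mean(axis=0))                 # one mean per column (5 values)
print(table.mean(axis=1))                 # one mean per row (2 values)
```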


**Combining arrays together**

Combining 1D arrays is easy, but multidimensional arrays require care with their shapes.

In [None]:
# Create two 1D arrays
a = np.array([1,2,3])
b = np.array([4,5,6])

# Combining the two arrays into one long array:
a_then_b = np.concatenate((a,b))
print("Concatenation of", a, "and", b, "is", a_then_b)

# Stacking the two arrays into a matrix with two rows:
AB = np.array([a,b])
print(AB, ": arrays stacked into a 2D array")

# Reshaping one long 1D array into an array with more dimensions
c = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
new_c = c.reshape((3,4))
print(new_c, ": reshaped array")
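
NumPy also has dedicated stacking functions, and `reshape()` can work out one dimension for you if you pass `-1`. The total number of elements must still fit the new shape. A sketch with illustrative names (`top`, `bottom`, `flat`, ...):

```python
import numpy as np

top = np.array([1, 2, 3])
bottom = np.array([4, 5, 6])

as_rows = np.vstack((top, bottom))            # shape (2, 3), same as np.array([top, bottom])
as_columns = np.column_stack((top, bottom))   # shape (3, 2)
print(as_rows)
print(as_columns)

flat = np.arange(1, 13)                       # 12 values
auto = flat.reshape((3, -1))                  # -1 lets NumPy infer the 4 columns
print(auto.shape)

try:
    flat.reshape((5, -1))                     # 12 values cannot fill 5 equal rows
    reshape_failed = False
except ValueError:
    reshape_failed = True
print("Reshape to 5 rows raised an error:", reshape_failed)
```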

**Logical tests**

In [None]:
# Comparing elements of two arrays (must be same shape)
a = np.arange(3)
b = np.ones(3)
print("a:",a)
print("b:",b)
print("Equal?", a == b)
print("a<b?", a < b)
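
A comparison returns an array of booleans, which you can summarize. The sketch below asks whether any or all elements pass the test and counts how many do; the names `left`, `right` and `equal_mask` are illustrative.

```python
import numpy as np

left = np.arange(3)     # [0 1 2]
right = np.ones(3)      # [1. 1. 1.]

equal_mask = left == right
print(equal_mask)                      # [False  True False]
print(equal_mask.any())                # is at least one element equal?
print(equal_mask.all())                # are all elements equal?
print(equal_mask.sum())                # how many are equal (True counts as 1)
print(np.array_equal(left, right))     # whole-array comparison in one call
```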

In [None]:
# Testing each element of one array: np.where()
## you will find this function is very handy in many situations!

a = np.array([1,-5,3,-2,-8,6,3])
where_negative = np.where(a<0)  # returns a tuple holding the indices of the elements that pass the test
print("Indices of elements of a that are <0:", where_negative[0])

edit_negative = np.where(a<0, 100, a)  # where the test is True, replace by 100, otherwise don't change
print(edit_negative)
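
A closely related technique is to use the boolean test itself as an index (a "mask"). This selects, or overwrites, the matching elements directly. A sketch with illustrative names (`values`, `negatives`, `edited`):

```python
import numpy as np

values = np.array([1, -5, 3, -2, -8, 6, 3])

negatives = values[values < 0]    # keep only the elements that pass the test
print(negatives)                  # [-5 -2 -8]

edited = values.copy()            # work on a copy to keep the original intact
edited[edited < 0] = 100          # overwrite the matching elements in place
print(edited)

# Same result as the three-argument form of np.where()
print(np.array_equal(edited, np.where(values < 0, 100, values)))
```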