# Exercise 0.2 - NumPy
prepared by M.Hauser

`NumPy` is *the* fundamental package for scientific computing with Python. It provides fast (MATLAB-like) multi-dimensional arrays (nd-arrays), which hold elements of a single data type and can have an arbitrary number of dimensions. It also provides many mathematical functions (`mean`, `std`, ...) and array operations (slicing, broadcasting, ...).

In [None]:
# numpy is generally abbreviated as np

import numpy as np

This course is not about processing data - we will mostly use data that was already prepared and is stored in NetCDF format (see later exercises). However, one function to create data that we will use is `np.arange`.


## np.arange

`np.arange` creates evenly spaced values within a given interval. 

In [None]:
np.arange(10)

This creates an array with 10 elements. Notice that it starts at 0 and ends at 9: the `stop` value itself is not included.

The array has one dimension:

In [None]:
np.arange(10).shape

`arange` stands for 'a range' (i.e. not for arrange). The call signatures are

`np.arange(stop)`

`np.arange(start, stop)`


`np.arange(start, stop, step)`
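As a small sketch of the three call forms (the specific numbers are chosen only for illustration):

```python
import numpy as np

# stop only: starts at 0 and counts up to, but not including, stop
only_stop = np.arange(5)

# start and stop: stop is still excluded
start_stop = np.arange(2, 5)

# start, stop, and step: every third value below 10
with_step = np.arange(0, 10, 3)

print(only_stop)   # [0 1 2 3 4]
print(start_stop)  # [2 3 4]
print(with_step)   # [0 3 6 9]
```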


### Exercise

 * Create an array with elements from 1 to 12

### Solution

In [None]:
np.arange(1, 13)

It can be a bit confusing that you need to set `stop = 13` for this array (because `stop` itself is excluded) - but you'll get used to it. This also holds when you set `step`:

In [None]:
np.arange(0, 1, 0.2)

### Exercise
 * Create an array from 0 to 5 in steps of 0.5

### Solution

In [None]:
np.arange(0, 5.1, 0.5)
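Because `stop` is excluded, the solution above uses a stop value slightly larger than 5. With non-integer steps, floating-point rounding can make it hard to predict whether the last value is included. One possible alternative, not used elsewhere in this exercise, is `np.linspace`, which takes the number of points instead of a step and includes the end point by default. A minimal sketch:

```python
import numpy as np

# 11 evenly spaced points from 0 to 5, end point included
a = np.linspace(0, 5, 11)

# the arange solution, for comparison
b = np.arange(0, 5.1, 0.5)

print(a)
print(np.allclose(a, b))
```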

## Other ways to create arrays

Arrays can be created in several ways.

#### from a list

In [None]:
# with a list
a = np.asarray([1, 2, 3])

print(a.shape)
print(a.size)
print(a.ndim)
print(a.dtype)

# only the last expression in a cell is displayed automatically (without print)
a
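Two related cases may be worth trying: a nested list produces a 2D array, and a list that mixes integers and floats is stored with a single float dtype. The values below are made up for illustration:

```python
import numpy as np

# a nested list gives a 2D array: 2 rows, 3 columns
nested = np.asarray([[1, 2, 3], [4, 5, 6]])
print(nested.shape, nested.ndim)

# integers and floats mixed: all elements are stored as floats
mixed = np.asarray([1, 2.5, 3])
print(mixed.dtype)
```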

#### array with random values

In [None]:
# a 2D array of random numbers

y = np.random.rand(3, 5)

print(y.shape)
print(y.size)
print(y.ndim)
# note the different dtype
print(y.dtype)

y

#### array of zeros, ones, and identity matrix

In [None]:
# other ways to create arrays

y = np.zeros((1, 3))
print(y)
print("")

y = np.ones((2, 2)) * 3.1
print(y)
print("")


# identity matrix
y = np.eye(3)
print(y)

Yes, the syntax is not entirely consistent: you do `np.random.rand(3, 4)` but `np.zeros((3, 4))`.
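The difference can be seen side by side in the following sketch. The last part uses NumPy's newer `Generator` interface (`np.random.default_rng`), which is not used elsewhere in this exercise and takes the shape as a tuple again:

```python
import numpy as np

# np.zeros and np.ones take the shape as a single tuple
z = np.zeros((3, 4))

# np.random.rand takes each dimension as a separate argument
r = np.random.rand(3, 4)

# the Generator interface takes the shape as a tuple
rng = np.random.default_rng()
g = rng.random((3, 4))

print(z.shape, r.shape, g.shape)
```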

### Exercises

1. Create an array from a tuple, e.g. `t = (1, 2, 3)`.
2. `np.random.rand` creates uniformly distributed numbers in `[0, 1)`. Can you create random integers between 0 and 55? Hint: `np.random.randint?`. (There are other ways, of course.)
3. Can you create random numbers from a normal distribution? Hint: `np.random.ra<Tab>`.

### Solution

In [None]:
# 1
t = (1, 2, 3)
t = np.asarray(t)
print(t)

# 2
print(np.random.randint(0, 56))  # the upper bound is exclusive
print(int(np.random.rand() * 56))  # scale [0, 1) to [0, 56), then truncate

# 3
np.random.randn()

## Array indexing

Arrays are indexed with square brackets.

#### Create 1D array

In [None]:
# create an array from 0..49 with shape (50,)
x_1d = np.arange(50)
print(x_1d.shape)
print(x_1d)

#### Get one element - as number, using `x[0]`

In [None]:
print("The value:", x_1d[0])
print("The shape:", x_1d[0].shape)
print("The type:", type(x_1d[0]))
print("It's not an ndarray!")

#### Get one element - as array, using `x[0:1]`

In [None]:
print("The value:", x_1d[0:1])
print("The shape:", x_1d[0:1].shape)
print("The type:", type(x_1d[0:1]))
print("It's an ndarray!")

#### Get the first ten elements

In [None]:
print("x[0:10]:", x_1d[0:10])
print("x[:10]:", x_1d[:10])

#### Get the last element(s)

In [None]:
print("Get the last element:")
print(x_1d[-1])

print("Get the last ten elements:")
print(x_1d[-10:])

#### Create 2D array

In [None]:
# create an array from 0..49 with shape (5, 10)
x_2d = np.arange(50).reshape(5, 10)
print(x_2d.shape)
x_2d

#### Get the first row

In [None]:
x_2d[0, :]

#### Get the last column (1D)

In [None]:
x_2d[:, -1]
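This mirrors the difference between `x[0]` and `x[0:1]` in the 1D case: an integer index drops the axis, while a slice keeps it. A short sketch (the names `col_1d` and `col_2d` are only for illustration):

```python
import numpy as np

x_2d = np.arange(50).reshape(5, 10)

col_1d = x_2d[:, -1]   # integer index: the column axis is dropped
col_2d = x_2d[:, -1:]  # slice: the column axis is kept with length 1

print(col_1d.shape)  # (5,)
print(col_2d.shape)  # (5, 1)
```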

#### Get the first element

In [None]:
x_2d[0, 0]

#### Looping through the 2D array - this happens row by row

In [None]:
for i in x_2d:
    print(i)

#### Looping through a 1D array - this happens element by element

In [None]:
# x_2d.flat iterates over all elements as if the array were 1D; [:5] takes the first five
for i in x_2d.flat[:5]:
    print(i)

## Calculations

You can do all sorts of calculations. Usually they are done element by element.

#### create 2 random arrays

In [None]:
x = np.random.rand(3, 5)
y = np.random.rand(3, 5)
print(x)
print(y)

#### add both arrays - this is done element by element

In [None]:
z = x + y

z

#### a scalar is added to all elements

In [None]:
# add a scalar
x = x + 1

x
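Multiplication with `*` follows the same element-by-element rule, which can surprise readers coming from MATLAB. As a small sketch with made-up values, the matrix product is written with `@` instead:

```python
import numpy as np

a = np.asarray([[1, 2], [3, 4]])
b = np.asarray([[10, 20], [30, 40]])

# element by element: each entry of a times the matching entry of b
elementwise = a * b

# matrix product: rows of a combined with columns of b
matrix_product = a @ b

print(elementwise)
print(matrix_product)
```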

### functions

NumPy offers all sorts of mathematical functions, e.g. `mean`

#### Mean of all elements

In [None]:
y.mean()

#### Mean over all rows (one value per column)

In [None]:
y.mean(axis=0)

#### Mean over all columns (one value per row)

In [None]:
y.mean(axis=1)
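To see which axis is collapsed, it can help to use a small array with known values instead of random numbers. The array `small` below is only an illustration:

```python
import numpy as np

# 2 rows, 3 columns:
# [[0 1 2]
#  [3 4 5]]
small = np.arange(6).reshape(2, 3)

# axis=0 averages over the rows, leaving one value per column
mean_axis0 = small.mean(axis=0)

# axis=1 averages over the columns, leaving one value per row
mean_axis1 = small.mean(axis=1)

print(mean_axis0)  # [1.5 2.5 3.5]
print(mean_axis1)  # [1. 4.]
```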

### Exercise

Calculate the standard deviation of

 * the whole array
 * the rows 
 * the columns

### Solution

#### all elements

In [None]:
y.std()

#### over all rows (one value per column)

In [None]:
y.std(axis=0)

#### over all columns (one value per row)

In [None]:
print(y.std(axis=1))

## Broadcasting

Broadcasting can be very helpful as it allows us to add arrays with different shapes.

In [None]:
# create 2 arrays
x = np.arange(15).reshape(3, -1)
y = np.asarray([0.1, 0.2, 0.3, 0.4, 0.5])

print("x.shape:", x.shape)
print("y.shape:", y.shape)

(Note: `reshape` changes the shape of the array. The `-1` in `reshape(3, -1)` tells NumPy to infer the length of that dimension from the total number of elements, here 15 / 3 = 5.)

Although `x` and `y` do not have the same shape, we can add them:

In [None]:
x + y
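This works because the last dimension of `x` (length 5) matches the length of `y`, so `y` is added to every row. An array with one value per row does not line up this way and needs an extra axis of length 1 first. A minimal sketch, where `col` is a made-up example array:

```python
import numpy as np

x = np.arange(15).reshape(3, -1)    # shape (3, 5)
col = np.asarray([100, 200, 300])   # shape (3,), one value per row

# shapes (3, 5) and (3,) cannot be broadcast: 5 does not match 3
try:
    x + col
except ValueError as err:
    print("Broadcasting failed:", err)

# adding an axis gives shape (3, 1), which broadcasts along the columns
result = x + col[:, np.newaxis]
print(result.shape)
print(result)
```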

### Optional exercise

- Create a Python list with a million integers.
  Look at how to time a piece of Python code (e.g. look at the `time` module).
  Calculate the execution time for adding a number to each element of the list.
  Repeat the above using a NumPy array.
- Repeat the first exercise, calculating the dot product of two lists/arrays.

In [None]:
# Measure the execution time of adding a number to 1_000_000 integers using a Python list


In [None]:
# Measure the execution time of adding a number to 1_000_000 integers using NumPy


### Solution

In [None]:
import time

data = list(range(1000000))
# perf_counter is a monotonic, high-resolution clock intended for timing
start = time.perf_counter()
data_add = [i + 5 for i in data]
end = time.perf_counter()
print(end - start)

In [None]:
import time

# create the array directly instead of converting a Python list
data = np.arange(1000000)
start = time.perf_counter()
data_add = data + 5
end = time.perf_counter()
print(end - start)
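The solution above covers the addition; the dot product from the second bullet could be timed along the same lines. The following is one possible sketch, not the course's reference solution. It uses `timeit` from the standard library instead of `time`, and the helper names `dot_list` and `dot_numpy` are made up for this example:

```python
import timeit

import numpy as np

n = 1_000_000
list_a = list(range(n))
list_b = list(range(n))

# int64 is requested explicitly so the large sum cannot overflow
# on platforms where the default integer is 32 bit
arr_a = np.arange(n, dtype=np.int64)
arr_b = np.arange(n, dtype=np.int64)


def dot_list():
    # dot product in pure Python: multiply pairwise, then sum
    return sum(a * b for a, b in zip(list_a, list_b))


def dot_numpy():
    # dot product using NumPy
    return np.dot(arr_a, arr_b)


repeats = 3
t_list = timeit.timeit(dot_list, number=repeats) / repeats
t_numpy = timeit.timeit(dot_numpy, number=repeats) / repeats

print("list: ", t_list, "seconds per call")
print("numpy:", t_numpy, "seconds per call")
```

Averaging over a few repeats reduces the influence of a single slow run; the exact timings depend on the machine.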