<!-- dom:TITLE: Introduction to Python (MOD510): NumPy arrays (ndarray) -->
# Introduction to Python (MOD510): NumPy arrays (ndarray)
<!-- dom:AUTHOR: Oddbjørn Nødland -->
<!-- Author: -->  
**Oddbjørn Nødland**

Date: **Aug 20, 2019**

In [1]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

**Summary.** The aim of this workbook is to provide a rapid introduction to Python
NumPy array objects, and to show some examples of how you can work with
them. Both similarities and differences between NumPy arrays and ordinary Python
lists will be discussed.

Note that only 1-dimensional examples will be given here. Multidimensional
arrays will be covered in a separate notebook.








## Introductory examples
<div id="numpy_arrays"></div>

An alternative to using built-in Python lists is to use
[NumPy](https://www.numpy.org/) arrays (objects of the type *ndarray*).
Most of the operations you can apply to lists, you can apply to ndarrays also.
For example, you can access and change elements by index in exactly the same
way, you can slice them, you can iterate over them etc.:

In [2]:
# Examples of numpy array creation and manipulation:

# Create array filled with zeros:
a = np.zeros(10)  # default: array of floats
print('NumPy array of zeros (floats):', a)
# Try to uncomment this line (it raises a ValueError):
# a[0] = 'Not allowed.'

b = np.zeros(10, dtype='int')  # specify int as number type
print('NumPy array of zeros (ints):', b)
# Floats are automatically converted to ints (fractional part dropped, not rounded):
b[0] = 1.5
print('Type of b[0]:', type(b[0]))
print('After changing first entry, b=', b)

first_ten_integers = np.arange(1, 11)
for n in first_ten_integers:
    print(n)

# Create array of ones:
c = np.ones(10)
print('c=', c)

# Fill array with a constant value:
c.fill(42)
print('c=', c)
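
The type behaviour mentioned in the comments above can be checked directly. The following is a minimal sketch; the names `float_arr` and `int_arr` are chosen for illustration only.

```python
import numpy as np

float_arr = np.zeros(3)            # dtype float64 by default
int_arr = np.zeros(3, dtype=int)   # integer dtype

# Assigning a float to an integer array drops the fractional part
# (truncation toward zero, not rounding):
int_arr[0] = 1.9
int_arr[1] = -1.9
print(int_arr)

# Assigning a string that cannot be parsed as a number raises an error,
# and the array is left unchanged:
try:
    float_arr[0] = 'Not allowed.'
except ValueError as err:
    print('ValueError:', err)
print(float_arr)
```

Note that the conversion to int happens silently, so it is worth checking the `dtype` of an array before storing non-integer values in it.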

In [3]:
n_arr = np.zeros(4)
n_arr[2] = 3  # set third entry

for n in n_arr:
    print(n)

sliced_array = n_arr[1:]  # access all except the first entry

There are also many special routines that you can apply to them, e.g.:

In [4]:
# Concatenate arrays:
x_arr = np.array([1,2,3])
y_arr = np.array([4,5,6])
z_arr = np.concatenate([x_arr,y_arr])
print(z_arr)

# Split array into list of subarrays:
x = np.arange(6)
y = np.split(x, 3)  # divide into 3 evenly sized pieces
print(y)

# Append elements:
z = np.append(x, [6, 7, 8])
print(z)
# Note that the new elements have been added to a copy of x;
# the latter is unchanged:
print(x)

# Reverse order:
w = np.flip(x)
print(w)

While less flexible than lists in several respects, ndarrays are very useful
for numerical computation. One reason for this is that they take up less
space in memory; because they have a fixed size at creation, and can only
store objects of a single type, they need to store less information than
a corresponding list would. Indeed, if the size of an *ndarray* is
altered during program execution, what really happens is that a new array
is created.
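
To make the memory argument concrete, the sketch below compares a list and an array holding the same 1000 floats. The list figure is only a rough estimate (the size of the list container plus the size of each float object, as reported by `sys.getsizeof` in CPython), and the variable names are illustrative.

```python
import sys
import numpy as np

n = 1000
values_list = [float(i) for i in range(n)]
values_arr = np.arange(n, dtype=np.float64)

# A float64 array stores 8 bytes per element in one contiguous buffer:
print('Array data buffer (bytes):', values_arr.nbytes)

# A list stores references to separate Python float objects.
# Rough estimate: container size plus the size of each float object.
list_bytes = sys.getsizeof(values_list) + sum(sys.getsizeof(v) for v in values_list)
print('List, rough estimate (bytes):', list_bytes)

# "Resizing" with np.append returns a new array; the original keeps its size:
longer_arr = np.append(values_arr, 1000.0)
print(values_arr.size, longer_arr.size)
print('Shares memory with original:', np.shares_memory(values_arr, longer_arr))
```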

## Vectorization
<div id="numpy_vectorization"></div>

However, the main advantage of NumPy arrays is that they support
[vectorized computation](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html),
meaning that we can avoid time-consuming loops and apply a function
to a whole array with a single call.
Vectorization works because NumPy makes use of optimized,
pre-compiled C code behind the scenes, but as a Python programmer,
you do not need to worry about all of the messy details.
This is illustrated below:

In [5]:
# Evaluate cosine function on an entire array in one go:

# 100 equidistant numbers from 0 to 2*pi:
x_values = np.linspace(0, 2 * np.pi, 100)  # notice: NumPy version of pi
func_eval = np.cos(x_values)  # notice: NumPy version of cos(x)

# Check that the function looks as expected:
fig_cos = plt.figure()
plt.xlabel('x')
plt.ylabel('y')
plt.scatter(x_values, func_eval, label='cos(x)')
plt.legend()

# A (much) more time-consuming alternative would be to use a for loop
# (valid for both lists and arrays):
func_eval = []
for x in x_values:
    fx = np.cos(x)
    func_eval.append(fx)

Note the usage of *np.cos* rather than the version of the cosine function
from the [math](https://docs.python.org/3/library/math.html) module of the
Python standard library. NumPy provides many well-known mathematical functions such as sin,
cos, and exp; they are examples of so-called
[universal functions](https://docs.scipy.org/doc/numpy/reference/ufuncs.html).
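
The difference between the two cosine functions can be seen in a small sketch such as the one below. The array `angles` and the flag `math_cos_failed` are illustrative names, not part of the original notebook.

```python
import math
import numpy as np

angles = np.linspace(0, np.pi, 5)

# np.cos is a universal function: it accepts single numbers and arrays alike
print(np.cos(0.0))
print(np.cos(angles))
print('np.cos is a ufunc:', isinstance(np.cos, np.ufunc))

# math.cos only accepts a single number; an array with several
# elements raises a TypeError:
math_cos_failed = False
try:
    math.cos(angles)
except TypeError as err:
    math_cos_failed = True
    print('TypeError:', err)
```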

As is evident from the next code snippet, using vectorized computation
on NumPy arrays instead of for loops has the potential to reduce the
CPU time by several orders of magnitude!

In [6]:
# Python lists or numpy arrays for numerical computation?

def f(x_vals):
    return x_vals*x_vals

def f2(x_vals):
    f_eval = []
    for x in x_vals:
        f_eval.append(x*x)
    return f_eval

N = 100
x_values = np.linspace(-2, 2, N)
# Uncomment these lines (takes time...):
#%timeit f(x_values)
#%timeit f2(x_values)
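
The `%timeit` magic is only available in IPython and Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module, as below. The function names and the array size are chosen for illustration, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

def square_vectorized(x_vals):
    # One call operating on the whole array
    return x_vals * x_vals

def square_loop(x_vals):
    # Explicit Python loop over the elements
    return [x * x for x in x_vals]

x_values = np.linspace(-2, 2, 10_000)

# Both approaches should give the same numbers:
same_result = np.allclose(square_vectorized(x_values), square_loop(x_values))
print('Same result:', same_result)

# Total time for 100 calls of each function:
t_vec = timeit.timeit(lambda: square_vectorized(x_values), number=100)
t_loop = timeit.timeit(lambda: square_loop(x_values), number=100)
print(f'vectorized: {t_vec:.4f} s, loop: {t_loop:.4f} s')
```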

For simple functions only involving standard arithmetic operations,
and/or functions provided by the NumPy library, it is easy to build
new functions that support vectorized computation:

In [7]:
# Vectorized code for simple custom functions:

def quadratic_polynomial(x, a=1, b=0, c=0):
    """
    Represents a quadratic polynomial function in a single variable:

        f(x) = a*x^2 + b*x + c

    If no coefficients are specified, by default: f(x)=x^2.

    :param x: A single number given as argument to the function,
                or else a NumPy array of values.
    :return: f(x); either a single number or a NumPy array.
    """
    return a*x*x + b*x + c

x_vals = np.linspace(-2, 2, 50)  # 50 values from x=-2 to x=+2
f_vals = quadratic_polynomial(x_vals)

# Plot function:
fig_poly2 = plt.figure()
plt.scatter(x_vals, f_vals)

def custom_function(x):
    """
    A function f=f(x) that uses elementary trigonometric functions,
    plus addition and multiplication operators.

    :param x: A single number given as argument to the function,
                or else a NumPy array of values.
    :return: f(x); either a single number or a NumPy array.
    """
    return x*np.sin(2*x)+x*x+2.0

f2_vals = custom_function(x_vals)
fig_custom_func = plt.figure()
plt.scatter(x_vals, f2_vals)

There are options to vectorize more complicated functions as well, though we
will not go much into that here. As a simple example, we show how you can
define a 'split function':

In [8]:
# Example of a function that is defined differently in different domains:
def split_function(x):
    return np.where(x <= 0, -1.0, 1.0)

f3_vals = split_function(x_vals)
fig_split_func = plt.figure()
plt.scatter(x_vals, f3_vals)

# Visualize all three functions on [-2,2] in the same plot.
func_plot, ax = plt.subplots()
ax.plot(x_vals, f_vals, color='blue', linestyle='-')
ax.plot(x_vals, f2_vals, color='red', linestyle='--')
# Note: ax.plot joins neighbouring points with straight lines, so the
# jump at x=0 is drawn as a steep line segment:
ax.plot(x_vals, f3_vals, color='green', linestyle='-.')
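
The reason for using `np.where` rather than an ordinary `if` statement is that a condition on an array with several elements has no single truth value. The sketch below shows this, and also how calls to `np.where` can be nested to handle more than two domains. Both functions are hypothetical examples, not part of the original notebook.

```python
import numpy as np

def split_function_with_if(x):
    # Works for a single number, but not for an array with several elements
    if x <= 0:
        return -1.0
    return 1.0

def three_piece_function(x):
    # Hypothetical example: -1 for x < -1, x for -1 <= x <= 1, and 1 for x > 1
    return np.where(x < -1, -1.0, np.where(x > 1, 1.0, x))

x_vals = np.linspace(-2, 2, 9)

if_version_failed = False
try:
    split_function_with_if(x_vals)
except ValueError as err:
    if_version_failed = True
    print('ValueError:', err)

print(three_piece_function(x_vals))
```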

In general, working with *ndarrays* is reminiscent of working
with vectors in mathematics, in that you can perform arithmetic
operations on them (element-by-element) very easily, e.g.:

In [9]:
# Arithmetic on numpy arrays:
x_vals = np.linspace(0, 1, 10)
f1_vals = np.sqrt(x_vals)  # square root of x
f2_vals = x_vals**2        # x squared

# Add values together:
f3_vals = f1_vals+f2_vals  # sqrt(x) + x^2
print(f3_vals)

# Multiply all values in x_vals by a scalar:
print(2.0*x_vals)

# Subtract the same number from all entries of the array:
y_vals = x_vals - 10.0
print(y_vals)
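
This is one place where arrays and lists differ: for lists, the `+` and `*` operators mean concatenation and repetition rather than arithmetic. A short comparison, with illustrative variable names:

```python
import numpy as np

values_list = [1.0, 2.0, 3.0]
values_arr = np.array(values_list)

# For arrays, + and * act element by element:
print(values_arr + values_arr)
print(2 * values_arr)

# For lists, + concatenates and * repeats:
print(values_list + values_list)
print(2 * values_list)
```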

In [10]:
# Initialize array of zeros:
c_arr = np.zeros(10)
c_arr[0] = 1.0  # Set first entry
print('Before vectorized calculation, c=', c_arr)
# Update array:
c_arr[1:] = c_arr[0:-1] + c_arr[1:]
print('After vectorized calculation update, c=', c_arr)

For the last calculation above, we might attempt to get the same end result
with a for loop:

In [11]:
# A slower version that supposedly does the same thing:
c_arr = np.zeros(10)
c_arr[0] = 1.0  # Set first entry
print('Before calculation in for loop, c=', c_arr)
for i in range(1, len(c_arr)):
    c_arr[i] += c_arr[i-1]
print('After calculation in for loop, c=', c_arr)
# Why do we get a different result than for the vectorized code?

In [12]:
# On the other hand, this works:
c_arr = np.zeros(10)
c_arr[0] = 1.0  # Set first entry
c_old = c_arr.copy()
for i in range(1, len(c_arr)):
    c_arr[i] = c_old[i-1] + c_old[i]
print(c_arr)
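
The difference between the two results comes down to the order of evaluation. In the vectorized statement, the whole right-hand side is computed from the old values before anything is written back, whereas the in-place loop reads entries it has already updated. The sketch below puts the two side by side on a shorter array (the names `c_start`, `c_vec` and `c_loop` are illustrative):

```python
import numpy as np

c_start = np.zeros(5)
c_start[0] = 1.0

# Vectorized update: the right-hand side is evaluated from the old values
# before the result is assigned
c_vec = c_start.copy()
c_vec[1:] = c_vec[0:-1] + c_vec[1:]

# In-place loop: entry i-1 has already been updated when entry i is computed
c_loop = c_start.copy()
for i in range(1, len(c_loop)):
    c_loop[i] += c_loop[i-1]

print('Vectorized:', c_vec)
print('Loop:      ', c_loop)
# The in-place loop accumulates a running sum, like np.cumsum:
print('Cumulative:', np.cumsum(c_start))
```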

## Boolean indexing
<div id="numpy_boolean"></div>

Another useful feature of NumPy arrays is that they allow so-called
Boolean indexing, e.g.:

In [13]:
# Boolean indexing is possible with numpy:
numbers = np.array([-3, -2, -1, 0, 1, 2, 3])
print('Before:', numbers)

# Replace all negative values with zero:
number_is_negative = numbers < 0.0
numbers[number_is_negative] = 0.0
print('After first array modification:', numbers)

# Replace all numbers divisible by 3 with zero:
number_is_div_by_three = numbers % 3 == 0
numbers[number_is_div_by_three] = 0.0
print('After second array modification:', numbers)
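
Boolean masks can also be combined and counted. The following sketch uses the element-wise operators `&` (and), `|` (or) and `~` (not); the mask names are illustrative.

```python
import numpy as np

numbers = np.array([-3, -2, -1, 0, 1, 2, 3])

# A comparison gives an array of booleans, one per element:
is_positive = numbers > 0
print(is_positive)

# Combine masks with &, | and ~; the parentheses are required because
# these operators bind more tightly than the comparisons
is_positive_and_odd = (numbers > 0) & (numbers % 2 == 1)

# Indexing with a mask selects the matching elements:
print(numbers[is_positive_and_odd])

# True counts as 1, so summing a mask counts the matches:
print(is_positive.sum())
```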

## An important difference between *ndarrays* and lists
<div id="ndarray_list_important_diffs"></div>

Slicing a NumPy array behaves differently from slicing a Python list:
a slice of a list is a new list, whereas a slice of an array is a view
of the original data:

In [14]:
# Changing a slice of an ndarray changes the original:
x = [i for i in range(10)]
x_arr = np.array(x)

y = x[:]  # for a list, this is a (shallow) copy
y_arr = x_arr[:]  # for a NumPy array, it is a view of the original array

print('List before:', x)
print('Array before:', x_arr)
y[0] = 99
y_arr[0] = 99
print('List after:', x)
print('Array after:', x_arr)

If one is not aware of this, it can be the source of many hard-to-find
bugs. For more information about slicing and indexing in NumPy, see, e.g.,
the [NumPy indexing documentation](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).
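
One way to guard against such bugs is to request an independent copy explicitly with the array's `copy` method. The function `np.shares_memory` reports whether two arrays overlap in memory. A minimal sketch, with illustrative names `view_of_x` and `copy_of_x`:

```python
import numpy as np

x_arr = np.arange(10)

view_of_x = x_arr[2:5]         # basic slice: a view into x_arr
copy_of_x = x_arr[2:5].copy()  # explicit, independent copy

print('View shares memory:', np.shares_memory(x_arr, view_of_x))
print('Copy shares memory:', np.shares_memory(x_arr, copy_of_x))

copy_of_x[0] = 99   # leaves x_arr untouched
print(x_arr)
view_of_x[0] = -1   # changes x_arr[2]
print(x_arr)
```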