# Python for scientific computing

Python is extremely popular for scientific computing, due to such factors as
- the accessible and flexible nature of the language itself,
- the huge range of high quality scientific libraries now available,
- the fact that the language and libraries are open source,
- the popular Anaconda Python distribution (with Jupyter and Spyder), which simplifies installation and management of those libraries, and
- the recent surge of interest in using Python for machine learning and artificial intelligence.

## Python libraries (modules)

In terms of popularity, the big four in the world of scientific Python libraries are
- Pandas (data handling),
- NumPy (calculations, plus a basic *array* data type for vectors and matrices),
- SciPy (calculations; builds on NumPy by adding numerical methods that are routinely used in science, such as interpolation, optimization, and root finding), and
- Matplotlib (plotting, with a focus on data stored in NumPy arrays).

Pure Python is easy to write but not very fast. Python is dynamically typed, so the interpreter has to check the type of each variable at run time before it can decide what an operation such as `+` means:

In [None]:
# integers
a, b = 10, 10 
a + b

In [None]:
# strings
a, b = 'foo', 'bar' 
a + b

In [None]:
# lists
a, b = ['foo'], ['bar'] 
a + b
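
The three cells above use the same `+` operator, yet Python does something different each time because it looks up the operand types when the line runs. A minimal sketch of that behaviour (the helper name `describe_add` is hypothetical and only used for illustration):

```python
def describe_add(a, b):
    """Return the type name of the first operand and the result of a + b."""
    # Python decides what `+` means only after inspecting the operand types.
    return type(a).__name__, a + b

int_result = describe_add(10, 10)             # integer addition
str_result = describe_add('foo', 'bar')       # string concatenation
list_result = describe_add(['foo'], ['bar'])  # list concatenation

# Mixing incompatible types is only detected when the line is executed.
try:
    10 + 'foo'
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(int_result, str_result, list_result, mixed_ok)
```

These run-time lookups happen on every operation, which is part of why a long Python loop is slow.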

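To perform calculations fast, we need to use functions from the libraries above that have been **vectorized**. These functions operate on whole vectors and matrices at once. The data are stored as arrays of a single type, typically in one contiguous block of memory, and the loops over the elements run in compiled code rather than in the Python interpreter.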
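
To make the memory layout concrete, the sketch below inspects a small NumPy array. It assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Every element has the same type, so no per-element type check is needed.
dtype_name = values.dtype.name      # 'float64'
bytes_per_item = values.itemsize    # 8 bytes per float64
total_bytes = values.nbytes         # 4 elements * 8 bytes

# The elements sit next to each other in one block of memory.
is_contiguous = values.flags['C_CONTIGUOUS']

# A vectorized operation is applied to the whole array in one call.
squares = values**2

print(dtype_name, bytes_per_item, total_bytes, is_contiguous, squares)
```

A Python list, by contrast, stores references to separate objects that may each have a different type.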

In [None]:
import random
import numpy as np

In [None]:
n = 1_000_000 # this is the same as writing
n = 1000000

In [None]:
# to measure time elapsed:
import time

n = 1_000_000
start_time = time.perf_counter()  # clock intended for measuring short durations

y = 0 # Will accumulate and store sum
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2

print(f'Runtime: {time.perf_counter() - start_time:.4f} seconds')

In Jupyter, we can use a *cell magic* to measure how long the code in a cell takes to execute. It has to be on the first line of the cell:

%%time
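
Outside Jupyter, where `%%time` is not available, the standard-library `timeit` module gives a comparable measurement. A minimal sketch, using a hypothetical helper `sum_of_squares` and a smaller number of draws to keep the run short:

```python
import random
import timeit

def sum_of_squares(n):
    """Sum the squares of n uniform random draws with a plain loop."""
    total = 0.0
    for _ in range(n):
        total += random.uniform(0, 1)**2
    return total

# Call the function 3 times and report the total elapsed time in seconds.
elapsed = timeit.timeit(lambda: sum_of_squares(10_000), number=3)
print(f'3 runs took {elapsed:.4f} seconds')
```

Repeating the call several times smooths out noise from other processes on the machine.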



Vectorization is much faster than for-loops and while-loops:

Draw 1 million random numbers and sum their squares with a for-loop:

In [None]:
%%time
# base Python

y = 0 # Will accumulate and store sum
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2

Draw 1 million random numbers and sum their squares with vectorized NumPy functions:

In [None]:
%%time
# vectorized functions from numpy (np)

x = np.random.uniform(0, 1, n)
y = np.sum(x**2)
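
As a sanity check that the loop and the vectorized version compute the same quantity, both can be compared with the theoretical mean: for $x$ uniform on $[0, 1]$, $E[x^2] = 1/3$, so each sum should be close to $n/3$. A sketch, using seeded generators (`random.seed` and `np.random.default_rng`, additions that are not in the cells above) so that the run is reproducible:

```python
import random
import numpy as np

n = 1_000_000

# Loop version: one Python-level iteration per draw.
random.seed(0)
loop_sum = 0.0
for _ in range(n):
    loop_sum += random.uniform(0, 1)**2

# Vectorized version: one call draws all n numbers, one call sums them.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, n)
vec_sum = np.sum(x**2)

# Both averages should be close to 1/3.
print(loop_sum / n, vec_sum / n)
```

The two sums differ slightly because the two generators produce different random streams, not because the methods disagree.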

In [None]:
def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

Another example of how vectorization is much faster than loops, this time finding the maximum of the function `f` defined above over a grid:

In [None]:
%%time

# loop version: evaluate f at one grid point at a time
grid = np.linspace(-3, 3, 1000)
m = -np.inf
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z

In [None]:
%%time
# vectorized version
x, y = np.meshgrid(grid, grid)
np.max(f(x, y))
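
To see what `np.meshgrid` does in the vectorized cell, here is the same computation on a tiny 3-point grid, where the loop and the vectorized maximum can be compared directly. The names `small_grid`, `loop_max`, and `vec_max` are illustrative.

```python
import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

small_grid = np.linspace(-3, 3, 3)   # the points -3, 0 and 3

# meshgrid returns two 2-D arrays that together hold every (x, y) pair.
x, y = np.meshgrid(small_grid, small_grid)
grid_shape = x.shape                 # (3, 3)

# Loop version: evaluate f one pair at a time.
loop_max = -np.inf
for xi in small_grid:
    for yi in small_grid:
        loop_max = max(loop_max, f(xi, yi))

# Vectorized version: evaluate f on all 9 pairs in one call.
vec_max = np.max(f(x, y))

print(grid_shape, loop_max, vec_max)
```

Both versions find the maximum at the origin, where `f(0, 0)` equals 1. With the 1000-point grid used above, `meshgrid` builds two arrays of one million elements each, so the speed gain is paid for with extra memory.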

The plot below shows the surface of the function `f(x, y)` defined above:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

xgrid = np.linspace(-3, 3, 50)
ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
                y,
                f(x, y),
                rstride=2,
                cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25) 
ax.set_zlim(-0.5, 1.0)
ax.set_xlabel('$x$', fontsize=14)
ax.set_ylabel('$y$', fontsize=14)
plt.show()