![intro_banner](./Images/23-1-2.3-Banner.png)

---

This notebook will introduce the main Python libraries dedicated to scientific computing.

Quoting the documentation from [Scipy](https://www.scipy.org/about.html) :

SciPy refers to several related but distinct entities:
* The SciPy ecosystem, a collection of open source software for scientific computing in Python.
* The community of people who use and develop this stack.
* Several conferences dedicated to scientific computing in Python - SciPy, EuroSciPy, and SciPy.in.
* The [SciPy library](https://www.scipy.org/scipylib/index.html), one component of the SciPy stack, providing many numerical routines.

Scientific computing in Python builds upon a small core of packages:
* [Python](https://www.python.org/), a general purpose programming language. It is interpreted and dynamically typed and is very well suited for interactive work and quick prototyping, while being powerful enough to write large applications in.
* [NumPy](http://www.numpy.org/), the fundamental package for numerical computation. It defines the numerical array and matrix types and basic operations on them.
* The [SciPy library](https://www.scipy.org/scipylib/index.html), a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics, and much more.
* [Matplotlib](http://matplotlib.org/), a mature and popular plotting package that provides publication-quality 2-D plotting, as well as rudimentary 3-D plotting.


# Numpy

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (plotting library). This combination is widely used as a replacement for MATLAB, a popular but costly platform for technical computing.

**Arrays**

A numpy array is a grid of values, all of the same type. It is indexed by a tuple of non-negative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension. NumPy’s array class is called ndarray. 

We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [None]:
import numpy as np  # np is a short name for numpy

A = np.array([[1, 2, 3], [4, 5, 6]])  # Create a rank 2 array
print(A.shape)  # Prints "(2, 3)"
print(A[0, 0], A[0, 1], A[1, 0])  # Prints "1 2 4"

print("Create a 2x2 array of all zeros")
A = np.zeros((2, 2))  # Create an array of all zeros
print(A)

print("Create a 4x2 array of all ones")
A = np.ones((4, 2))  # Create an array of all ones
print(A)

print("Create an array of constant values")
A = np.full((2, 2), 7)  # Create a constant array
print(A)

print("Create 2x2 identity matrix")
A = np.eye(2)  # Create a 2x2 identity matrix
print(A)

print("Create an array filled with random values")
A = np.random.random((2, 2))  # Create an array filled with random values
print(A)

In [None]:
A = np.array([1, 2, 3])  # Create a rank 1 array
print("The object A is of type: ", type(A))
print("Its shape is: ", A.shape)
print("The array is: ", A)
print("The first element of an array has index '0', e.g. A[0] ", A[0])
print(
    "We can manually access all the elements of this array: ", A[0], A[1], A[2]
)  # Prints "1 2 3"
print(
    "The value of an element of an array can be edited by assigning a new value, like this A[0] = 5"
)
A[0] = 5
print("The final array is: ", A)

Numpy offers several methods to generate arrays of different dimensions
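
Besides the constructors used above, ranges and reshaping are common ways to generate arrays. The snippet below is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

# Evenly spaced integers: start, stop (excluded), step
r = np.arange(0, 10, 2)
print(r)  # [0 2 4 6 8]

# A fixed number of evenly spaced values, both ends included
l = np.linspace(0.0, 1.0, 5)
print(l)  # [0.   0.25 0.5  0.75 1.  ]

# Reshape a rank 1 array of 12 values into a rank 2 array of shape (3, 4)
g = np.arange(12).reshape(3, 4)
print(g.shape)  # (3, 4)
```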

**Array indexing**

Numpy offers several ways to index arrays (i.e. to access elements of an array).

Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [None]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("First create a 3x4 array\n", A)
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; B is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]

B = A[:2, 1:3]
print("Now, let's slice this array into a 2x2 array\n", B)
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(
    "Let's print the value for the first row / second column: ", A[0, 1]
)  # Prints "2"
B[0, 0] = 77  # B[0, 0] is the same piece of data as A[0, 1]
print("After setting B[0, 0] = 77, the same element of A is: ", A[0, 1])  # Prints "77"
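
Integer indices can also be mixed with slices. As a short sketch (variable names are illustrative), note that an integer index removes a dimension, whereas a slice of length one keeps it:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row_r1 = A[1, :]    # integer index: rank 1 view of the second row
row_r2 = A[1:2, :]  # slice: rank 2 view of the same row
print(row_r1, row_r1.shape)  # [5 6 7 8] (4,)
print(row_r2, row_r2.shape)  # [[5 6 7 8]] (1, 4)

col_r1 = A[:, 1]    # rank 1 view of the second column
col_r2 = A[:, 1:2]  # rank 2 view of the same column
print(col_r1.shape, col_r2.shape)  # (3,) (3, 1)
```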

Boolean array indexing

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])

bool_idx = a > 2  # Find the elements of a that are bigger than 2;
# this returns a numpy array of Booleans of the same
# shape as a, where each slot of bool_idx tells
# whether that element of a is > 2.

print(bool_idx)  # Prints "[[False False]
#          [ True  True]
#          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])  # Prints "[3 4 5 6]"
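
A boolean mask can also be used to count, combine conditions, or overwrite the selected elements. The following sketch reuses the same small array; `clipped` and `even_big` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])
mask = a > 2

# True counts as 1, so summing the mask counts the matching elements
print(mask.sum())  # 4

# Conditions are combined with & (and) and | (or), each wrapped in parentheses
even_big = a[(a > 2) & (a % 2 == 0)]
print(even_big)  # [4 6]

# Assigning through a mask modifies only the selected elements
clipped = a.copy()
clipped[mask] = 0
print(clipped)  # rows: [1 2], [0 0], [0 0]
```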

**Basic maths with numpy**

In [None]:
A = np.array([1, 2, 3, 4])
B = np.arange(4)
C = A + B
print("Print the sum of 2 arrays: ", C)
print("We can compute the square of an array: ", A ** 2)

Note that, unlike in MATLAB, `*` is elementwise multiplication, not matrix multiplication.

We instead use the **dot** function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:

In [None]:
A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

print("Matrix A\n", A)
print("Matrix B\n", B)

print("Elementwise product:\n", A * B)
print("Matrix product:\n", A.dot(B))  # Matrix product
print(
    "Matrix product can also be computed with the @ operator:\n", A @ B
)  # another matrix product
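
The same function also handles the vector cases mentioned above. A small sketch with arbitrary values:

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[1, 2], [3, 4]])

# Inner product of two vectors: 9*11 + 10*12
inner = v.dot(w)
print(inner)  # 219

# Matrix / vector product: one value per row of x
mv = x.dot(v)
print(mv)  # [29 67]

# np.dot and the @ operator give the same result here
print(np.array_equal(np.dot(x, v), x @ v))  # True
```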

Numpy provides many useful functions for performing computations on arrays: basic arithmetic, trigonometry, etc.

In [None]:
A = np.array([[1, 2], [3, 4]])

print("Matrix A\n", A)
print("Sum of all elements:", np.sum(A))  # Compute sum of all elements; prints "10"
print(
    "Sum along columns", np.sum(A, axis=0)
)  # Compute sum of each column; prints "[4 6]"
print("Sum along rows", np.sum(A, axis=1))  # Compute sum of each row; prints "[3 7]"
print("Cosine\n", np.cos(A))
print("Maximum value is: ", np.max(A))
print("Minimum value is: ", np.min(A))

# Matplotlib

[Matplotlib](https://matplotlib.org/) is a library for creating static, animated, and interactive visualizations in Python

In [None]:
from matplotlib import pyplot as plt

# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()  # You must call plt.show() to make graphics appear.

In [None]:
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()

Display images with imshow()

In [None]:
img = plt.imread('./Images/24-MatplotLibLogo.jpeg') 
logo_plot = plt.imshow(img)
# Hide the axes, since an image does not need them.
logo_plot.axes.get_xaxis().set_visible(False) 
logo_plot.axes.get_yaxis().set_visible(False)

Let's draw a 3D surface of the sine of the distance to the origin, $z = 2\sin\left(\sqrt{x^2 + y^2}\right)$:

In [None]:
from matplotlib import cm
from matplotlib.ticker import LinearLocator
import numpy as np

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})

# Make data.
X = np.arange(-5, 5, 0.05)
Y = np.arange(-5, 5, 0.05)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = 2 * np.sin(R)

# Plot the surface.
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm,
                       linewidth=0, antialiased=False)

# Customize the z axis.
ax.set_zlim(-2.01, 2.01)
ax.zaxis.set_major_locator(LinearLocator(5))
# A StrMethodFormatter is used automatically
ax.zaxis.set_major_formatter('{x:.01f}   ')

# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.4, aspect=5)

plt.show()

# Scikit-image

[scikit-image](https://scikit-image.org/) is a collection of algorithms for image processing

## Image segmentation

First, open an image and display it with rasterio

In [None]:
import rasterio
from rasterio.plot import show
from skimage.exposure import histogram
from matplotlib import pyplot as plt
import numpy as np
src = rasterio.open("./Products/USA/Crop_colorado.jpg");
img = src.read()
show(img);

Stretch the image intensity range to increase the contrast, which helps a segmentation algorithm isolate the most prominent circular crops

In [None]:
from skimage import exposure
scale = exposure.rescale_intensity(img, in_range=(130, 250))
show(scale);

Use an edge-detection filter (Sobel) to detect the edges of the circles

In [None]:
from skimage.filters import sobel
elevation = sobel(scale)
show(exposure.rescale_intensity(elevation[0], in_range=(.05, .4)));

Finally, detect the objects (i.e. the crops properly irrigated)

In [None]:
from skimage.segmentation import watershed
markers = np.zeros_like(scale)
markers[img < 160] = 1  # Seed label 1: dark pixels
markers[img > 200] = 2  # Seed label 2: bright pixels
segmentation = watershed(elevation, markers) # Segmentation algorithm
plt.tight_layout() # Display
show(segmentation[0]);
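
The three steps above (intensity thresholds as seeds, Sobel edges as an elevation map, then watershed) can be tried on a synthetic image without any data file. This is only a sketch: the disk image and the 0.2 / 0.8 thresholds are arbitrary assumptions, not values tuned for the crop image.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

# Synthetic image: a bright disk with a soft edge on a dark background
yy, xx = np.mgrid[0:64, 0:64]
r = np.hypot(yy - 32, xx - 32)          # distance to the image centre
image = np.clip((20 - r) / 8, 0, 1)     # 1 inside, 0 outside, ramp in between

# Elevation map: high values where the intensity changes quickly (edges)
elevation = sobel(image)

# Seeds: pixels that are clearly background (1) or clearly object (2)
markers = np.zeros(image.shape, dtype=np.int32)
markers[image < 0.2] = 1
markers[image > 0.8] = 2

# Watershed floods the elevation map from the seeds and labels every pixel
labels = watershed(elevation, markers)
print(np.unique(labels))          # [1 2]
print(labels[32, 32], labels[0, 0])  # 2 1
```

Pixels on the soft edge are not seeded; the watershed assigns them to whichever seed reaches them first.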

## Image manipulation

Scikit-image offers several tools for image manipulation, such as filters, morphology detection, histogram manipulation etc.

In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

from skimage import data
from skimage.util.dtype import dtype_range
from skimage.util import img_as_ubyte
from skimage import exposure
from skimage.morphology import disk
from skimage.morphology import ball
from skimage.filters import rank

# Load an example image
img = src.read()[0]  # Reuse the first band of the image opened in the previous section
plt.imshow(img)
plt.title("Original image");

The histogram manipulation tools may be useful to improve pattern detection, for example in the agricultural area shown in the previous cell

In [None]:
# Global equalization of the histogram
img_rescale = exposure.equalize_hist(img)
plt.imshow(img_rescale)
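
To give an intuition of what global equalization does, here is a simplified NumPy sketch of the idea: each pixel is mapped through the cumulative distribution of grey levels. It is not scikit-image's actual implementation, and the random low-contrast image is a hypothetical stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical low-contrast 8-bit image: values squeezed into [100, 140)
img = rng.integers(100, 140, size=(64, 64)).astype(np.uint8)

hist = np.bincount(img.ravel(), minlength=256)  # number of pixels per grey level
cdf = hist.cumsum() / img.size                  # cumulative distribution, in [0, 1]
img_eq = cdf[img]                               # map each pixel through the CDF

print(img.min(), img.max())        # narrow input range
print(img_eq.min(), img_eq.max())  # output spread over (0, 1]
```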

In [None]:
# Local equalization, computed within a disk of radius 25 around each pixel
selem = disk(25)
img_eq = rank.equalize(img, selem)
plt.imshow(img_eq)

To detect edges in an image, one can use [OpenCV](https://opencv.org) instead of scikit-image.

OpenCV is an open source, cross-platform library that includes hundreds of computer vision algorithms. It is much more complex than scikit-image, but is mentioned here for reference.

In [None]:
import cv2 as cv
edges = cv.Canny(img_eq, 10, 500, 1)
plt.imshow(edges, cmap="Greens")

# Pandas

Pandas is a Python package providing fast and flexible data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
- Any other form of observational / statistical data sets. The data need not be labeled at all to be placed into a pandas data structure



In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

## Object creation

**Pandas Series** (similar to numpy arrays)

Manual creation of a Series

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

Range of dates created with the `date_range` function (it returns a DatetimeIndex)

In [None]:
dates = pd.date_range("20130101", periods=6)
dates

**DataFrames** (i.e. tabular data)

Create a DataFrame with 6 rows and 4 columns. The index can be any sequence of labels (e.g. the `dates` DatetimeIndex created in the cell just above)

In [None]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

To check the type of each column

In [None]:
df.dtypes

Display only first 3 rows


In [None]:
df.head(3)

Display last 2 rows only

In [None]:
df.tail(2)

Note that head and tail can be used to create new DataFrames (i.e. not only for display purposes)

In [None]:
new_df = df.head(3)
new_df

To create a new column : 

In [None]:
df["E"] = [1, 2, 3, 4, 5, 6]
df

## Selection

Select a single column


In [None]:
df["A"]

Select by slicing

As with numpy arrays or basic lists, rows of a DataFrame can be selected by slicing:

In [None]:
df[0:3]

Select by label


In [None]:
df.loc[dates[0]]

In [None]:
df.loc[dates[1:3], "A"]

Boolean indexing

DataFrames can be indexed with conditions, e.g.: 

In [None]:
df[df["A"] > 0]

Selecting values from a DataFrame where a boolean condition is met. Values that do not meet the condition are replaced by NaN.

In [None]:
df[df > 0]
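
Because the DataFrame above is filled with random values, the difference between the two forms of boolean indexing is easier to see on fixed data. A small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

# Condition on one column: keeps only the rows where it holds
rows = df_demo[df_demo["A"] > 0]
print(rows.index.tolist())  # [0, 2]

# Condition on the whole DataFrame: same shape, other values become NaN
masked = df_demo[df_demo > 0]
print(masked)
```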

## Basic operations

Pandas offers many useful functions to simplify data analysis on DataFrames (or Series)

Compute the mean of all columns

In [None]:
df.mean()

Sum columns

In [None]:
df.sum(0, skipna=False)

Sum rows (note the first argument is 1 instead of 0)

In [None]:
df.sum(1, skipna=False)
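
The effect of the axis argument and of `skipna` is easier to follow on a tiny DataFrame that contains a missing value. The data below is hypothetical:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [10.0, 20.0, 30.0]})

# axis=0: one sum per column; missing values are skipped by default
col_sums = df_demo.sum(axis=0)
print(col_sums["A"], col_sums["B"])  # 3.0 60.0

# skipna=False: a column containing NaN sums to NaN
col_sums_strict = df_demo.sum(axis=0, skipna=False)
print(col_sums_strict["A"])  # nan

# axis=1: one sum per row
row_sums = df_demo.sum(axis=1)
print(row_sums.tolist())  # [11.0, 22.0, 30.0]
```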

Minimum value of each column

In [None]:
df.min()

Get maximum value of a single column

In [None]:
print(df["A"].max())
print(df.A.max())

Display general statistics on DataFrame

In [None]:
df.describe()

It is also possible to use user-defined functions

In the example below, we will multiply the last column ("E") by 3

In [None]:
df["E"].apply(lambda x: 3 * x)

Aggregation functions are similar to SQL aggregate functions (SUM, AVG, etc.)

In [None]:
df.agg(["sum"])

Compute sum and mean for each column

In [None]:
df.agg(["sum", "mean"])
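
`agg` also accepts a dictionary to apply a different function to each column. A short sketch on hypothetical data:

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [1, 2, 3], "B": [10.0, 20.0, 30.0]})

# List of functions: one row per function, one column per original column
summary = df_demo.agg(["sum", "mean"])
print(summary)

# Dictionary: one function per column, returned as a Series
per_column = df_demo.agg({"A": "sum", "B": "mean"})
print(per_column)
```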

Fetch the names of all columns

In [None]:
df.columns

Pandas relies on Matplotlib to plot graphs

In [None]:
plt.figure()
df.plot()

# SciPy


SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering

From its [documentation](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html) : SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data. With SciPy, an interactive Python session becomes a data-processing and system-prototyping environment rivaling systems, such as MATLAB, IDL, Octave, R-Lab, and SciLab.

**Note**: SciPy is an extensive and complex library; only some basic examples will be shown here. Please refer to the [documentation](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html) for more.

## Interpolation

In [None]:
from scipy.interpolate import interp1d
from matplotlib import pyplot as plt
import numpy as np

**Linear and cubic interpolation**

In [None]:
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
f = interp1d(x, y)
f2 = interp1d(x, y, kind='cubic')
xnew = np.linspace(0, 10, num=41, endpoint=True)

plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best')
plt.show()
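
The difference between the two kinds of interpolation can also be checked numerically. In this sketch the samples come from the hypothetical function y = x², chosen because a cubic interpolant should reproduce a quadratic, while linear interpolation only averages the neighbouring samples:

```python
import numpy as np
from scipy.interpolate import interp1d

x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = x_demo ** 2  # samples of y = x^2

linear = interp1d(x_demo, y_demo)
cubic = interp1d(x_demo, y_demo, kind="cubic")

# Halfway between x = 1 and x = 2 the true value is 1.5^2 = 2.25
lin_val = float(linear(1.5))   # average of 1 and 4
cub_val = float(cubic(1.5))
print(lin_val, cub_val)
```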

**Spline interpolation**

In [None]:
from scipy import interpolate
x = np.arange(0, 2*np.pi+np.pi/4, 2*np.pi/8)
y = np.sin(x)

tck = interpolate.splrep(x, y, s=0)
xnew = np.arange(0, 2*np.pi, np.pi/50)
ynew = interpolate.splev(xnew, tck, der=0)

In [None]:
plt.figure()
plt.plot(x, y, 'x', xnew, ynew, xnew, np.sin(xnew), x, y, 'b')
plt.legend(['Data', 'Cubic Spline', 'True', 'Linear'])
plt.axis([-0.05, 6.33, -1.05, 1.05])
plt.title('Cubic-spline interpolation')
plt.show()

## Analysis of linear systems

Scipy offers tools to solve linear systems


$$
\begin{array}{c}
3x + 2y = 12 \\
2x - y = 1
\end{array}
$$


In [None]:
from scipy.linalg import solve

a = np.array([
    [3, 2],
    [2, -1]
])
b = np.array([12, 1]).reshape((2, 1))
x = solve(a, b)

print("The solution of the system is : x = {0} and y = {1}".format(x[0], x[1]))
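
A quick way to validate a solution is to substitute it back into the system. A minimal sketch for the same two equations (`coeffs`, `rhs` and `sol` are illustrative names):

```python
import numpy as np
from scipy.linalg import solve

coeffs = np.array([[3.0, 2.0], [2.0, -1.0]])
rhs = np.array([12.0, 1.0])

sol = solve(coeffs, rhs)
print(sol)  # x = 2, y = 3

# The residual coeffs @ sol - rhs should be zero up to rounding errors
residual = coeffs @ sol - rhs
print(np.allclose(residual, 0.0))  # True
```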

Scipy can be helpful to perform matrix operations

Example : finding the determinant of

$$
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
$$


In [None]:
from scipy import linalg
A = np.array([[1,2],[3,4]])
x = linalg.det(A)
print("The determinant is: ", x)
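
For a 2x2 matrix the result is easy to check by hand:

$$
\det
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
= 1 \cdot 4 - 2 \cdot 3 = -2
$$

The value printed by the code may differ from -2 by a tiny rounding error, since the computation is done in floating point.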

Computing norms

In [None]:
import numpy as np
from scipy import linalg

A = np.array([[1,2],[3,4]])

linalg.norm(A)
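
Called without an `ord` argument on a matrix, `linalg.norm` returns the Frobenius norm (the square root of the sum of the squared entries). Other norms can be requested explicitly, as in this short sketch:

```python
import numpy as np
from scipy import linalg

M = np.array([[1, 2], [3, 4]])

fro = linalg.norm(M)               # Frobenius norm: sqrt(1 + 4 + 9 + 16)
one = linalg.norm(M, 1)            # largest column sum of absolute values
inf = linalg.norm(M, np.inf)       # largest row sum of absolute values
print(fro, one, inf)
```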

# Rasterio

Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. [**Rasterio**](https://rasterio.readthedocs.io/en/latest/) reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON


In [None]:
import rasterio
from matplotlib import pyplot as plt
from rasterio.plot import show
src = rasterio.open('./Products/Cameroon/Lake_Lagdo_crop.tif')

In [None]:
show(src.read())

Get information about the product

In [None]:
print("Width: ", src.width)
print("Height: ", src.height)
print("File name: ", src.files)
print("Is the system a projected one ?", src.crs.is_projected)
print("Projection system used: ", src.crs) # https://spatialreference.org/ref/epsg/32633/

To get the spatial coordinates of a pixel, use the dataset’s *xy()* method. The coordinates of the center of the image can be computed like this.

In [None]:
print("Central coordinates: ", src.xy(src.height // 2, src.width // 2))
print("Coordinates of the bounding box", src.bounds)
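
To make the pixel-to-coordinate mapping concrete, here is a minimal sketch in plain Python of what an affine geotransform does. The coefficients are hypothetical (a north-up grid with 10 m pixels), not those of the product above, and `pixel_center` is an illustrative helper, not a rasterio function. It assumes, as `xy()` does by default, that the returned coordinates refer to the centre of the pixel.

```python
# Hypothetical affine coefficients for a north-up raster with 10 m pixels:
#   x = a * col + b * row + c
#   y = d * col + e * row + f
a, b, c = 10.0, 0.0, 500000.0    # pixel width, rotation term, x of the top-left corner
d, e, f = 0.0, -10.0, 1000000.0  # rotation term, negative pixel height, y of the top-left corner


def pixel_center(row, col):
    """Return the (x, y) coordinates of the centre of pixel (row, col)."""
    col_c, row_c = col + 0.5, row + 0.5  # shift from the pixel corner to its centre
    x = a * col_c + b * row_c + c
    y = d * col_c + e * row_c + f
    return x, y


print(pixel_center(0, 0))    # (500005.0, 999995.0)
print(pixel_center(10, 20))  # (500205.0, 999895.0)
```

The negative coefficient `e` reflects that row numbers increase downwards while northings increase upwards.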

Since the product is a geotiff, it is possible to fetch geospatial information

In [None]:
print("The product central longitude and latitude are: lon = {0}, lat = {1}".format(src.lnglat()[0], src.lnglat()[1]))

However, collecting the longitude and latitude for other parts of the product is more complicated, since every point of the image needs to be converted.

Rasterio can map the pixels of a destination raster with an associated coordinate reference system and transform to the pixels of a source image with a different coordinate reference system and transform. This process is known as reprojection.

Fortunately, rasterio provides several utilities to make this processing easier. In the cells below we will:

* Open a product with rasterio and get all the information about its projection system
* Use rasterio's utilities to reproject the GeoTIFF
* Write the GeoTIFF on the disk, in a new coordinate reference system
* Display the result

In [None]:
from rasterio.warp import calculate_default_transform, reproject, Resampling

dst_crs = "EPSG:4326"  # WGS 84 (World Geodetic System 1984), used by GPS

# Open source image and get its parameters
with rasterio.open("./Products/Cameroon/Lake_Lagdo_crop.tif") as src:
    transform, width, height = calculate_default_transform(
        src.crs, dst_crs, src.width, src.height, *src.bounds
    )
    kwargs = src.meta.copy()
    kwargs.update(
        {"crs": dst_crs, "transform": transform, "width": width, "height": height}
    )
    # Write output image after reprojection
    with rasterio.open(
        "./Products/Cameroon/Lake_Lagdo_crop.wgs84.tif",
        "w",
        **kwargs
    ) as dst:
        for i in range(1, src.count + 1): # Loop over all bands
            reproject(
                source=rasterio.band(src, i),
                destination=rasterio.band(dst, i),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=dst_crs,
                resampling=Resampling.nearest,
            )

Let's open this GeoTIFF with rasterio

In [None]:
src = rasterio.open("./Products/Cameroon/Lake_Lagdo_crop.wgs84.tif") # Open file
img = src.read() # Read file as a numpy array

Fetch the coordinates, to later display the image as a new layer on a map

In [None]:
x1, y1, x2, y2 = src.bounds  # Get coordinates of image bounds
print("Coordinates of the bounding box in the EPSG:4326 reference system\n")
print("Bottom left: ", (x1, y1))
print("Top right: ", (x2, y2))

In [None]:
import folium
lon, lat = src.lnglat()  # Get longitude and latitude
m = folium.Map(location=[lat, lon], zoom_start=12)

folium.raster_layers.ImageOverlay(
    image=img[0], bounds=[[y1, x1], [y2, x2]], opacity=0.7
).add_to(m)

m

*Notice the water level difference between both images*

Rasterio also provides a show_hist() function for generating histograms of single or multiband rasters:

In [None]:
from rasterio.plot import show_hist
show_hist(src, bins=50, lw=0.0, stacked=False, alpha=0.3, histtype='stepfilled', title="Histogram")

# Numba

[**Numba**](http://numba.pydata.org/) is an open source [JIT compiler](https://www.ibm.com/docs/en/ztpf/1.1.0.15?topic=reference-jit-compiler) that translates a subset of Python and NumPy code into fast machine code. Numba-compiled numerical algorithms in Python can approach the speeds of C or Fortran.

No need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your Python function, and Numba does the rest. 

Numba's performance depends on what your code looks like. If your code is numerically oriented (does a lot of math), uses NumPy a lot, and/or has a lot of loops, then Numba is often a good choice.

In [None]:
from numba import njit
import math
import numpy as np

def std(xs):
    """
    This function takes a list of numbers, and returns the standard deviation of these numbers. 
    """
    mean = 0
    for x in xs: 
        mean = mean + x
    mean = mean / len(xs)
    # compute the variance
    ms = 0
    for x in xs:
        ms += (x-mean)**2
    variance = ms / len(xs)
    std = math.sqrt(variance)
    return std

@njit
def jit_std(xs):
    """
    This function takes a list of numbers, and returns the standard deviation of these numbers. 
    It is the exact same method as above but will use Numba to speed up computations
    """
    mean = 0
    for x in xs: 
        mean = mean + x
    mean = mean / len(xs)
    # compute the variance
    ms = 0
    for x in xs:
        ms = ms + ((x-mean)**2)
    variance = ms / len(xs)
    std = math.sqrt(variance)
    return std
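
As a sanity check, the loop above divides by the number of elements, so it computes the population standard deviation and should agree with `np.std` (whose default is `ddof=0`). The sketch below repeats the same algorithm under the hypothetical name `std_loop`, without Numba, and compares the two on a small random sample:

```python
import math
import numpy as np


def std_loop(xs):
    """Population standard deviation computed with explicit loops."""
    mean = 0.0
    for x in xs:
        mean += x
    mean /= len(xs)
    ms = 0.0
    for x in xs:
        ms += (x - mean) ** 2
    return math.sqrt(ms / len(xs))


sample = np.random.default_rng(0).normal(0, 1, 10_000)
print(np.isclose(std_loop(sample), np.std(sample)))  # True
```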

Let's execute the same computation without Numba, then with Numba and compare the computation time

Without Numba: 

In [None]:
%%time
a = np.random.normal(0, 1, int(1e6))
std(a)

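With Numba, the computation is much faster! Note that the first call to a compiled function also includes the compilation time, so running the cell a second time gives a better idea of the actual speed.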

In [None]:
%%time
jit_std(a)