# Introduction to Python Packages

Contributors: Daniel Lay

This is Part 2 of an introductory notebook. It assumes that you understand functions, lists, print statements, and so on, at least at the level of the first Jupyter notebook.

For this notebook, you don't need to install anything on your computer. In general, you will want to install all of the packages used below; a guide to Anaconda is linked in the resources page of the previous notebook.

As a simple bit of motivation, suppose we want to compute the eigenvalue decomposition of a matrix (you will learn about this in the summer course if you're unfamiliar). There are a number of algorithms mentioned on the Wikipedia page, https://en.wikipedia.org/wiki/Eigenvalue_algorithm, but how do you know which one to use? Do you even want to write your own? After all, doing so is time-consuming and error-prone.

The answer is no: you do not have to write your own! Python lets you use what is called a *package*, a collection of functions and classes, whenever you want. This makes it easy to use pre-written code for common things you might want to do, such as:
- Matrix operations (transposes, inverses, eigenvalue decomposition, etc.)
- Plotting data (scatterplot, contour plot, 3d graphics, etc.)
- Using mathematical functions (trigonometric functions, polynomials, special functions, etc.)
- Reading from/writing to a file

So, for an enormous number of things you want to do, you basically don't have to write *any of the underlying code yourself*. These applications, and many more, will be detailed throughout the summer course and the rest of the Jupyter notebooks. For now, let's first learn how to use a package.

## Importing packages; numpy

The first package we'll use is called 'numpy', which stands for Numerical Python. It is used for array and matrix operations. To use a package, we *import* it, using the 'import' statement. We can also give the package a shorter name when we import it, using 'as':

In [1]:
import numpy as np

Now, 'np' behaves much like an instance of the classes you've been introduced to in the previous notebook: you reach the functions inside it with a '.'. To understand this, let's use an example. You're already familiar with a list, and numpy has an analogous object, an array:

In [2]:
my_list = [1,2,3]
my_array = np.array([1,2,3])

print('my_list:',my_list)
print('my_array:',my_array)

my_list: [1, 2, 3]
my_array: [1 2 3]


We created an instance of an array by calling 'np.array', with a list as an argument. It looks like 'my_array' contains the same information as 'my_list'. But what if we want to perform algebraic operations on the elements? Say, for instance, we want to square each one. With an array, it is trivial: you simply use the '\*\*' operator on the array:

In [3]:
print('my_array squared:',my_array**2)

my_array squared: [1 4 9]


That couldn't be any easier! If you're feeling confident, try to accomplish the same with 'my_list' (I won't show it - it's basically always better to use an array).

The same is true for a whole lot of operations, such as multiplication by a number:

In [4]:
print('2 * my_array:',2*my_array)

2 * my_array: [2 4 6]
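To see why this matters, here is a small sketch comparing the same multiplication on a list and on an array. The variable names 'doubled_list' and 'doubled_array' are just ones I made up for illustration:

```python
import numpy as np

my_list = [1, 2, 3]
my_array = np.array([1, 2, 3])

# Multiplying a list by an integer repeats the list; it does no arithmetic
doubled_list = 2 * my_list
print('2 * my_list:', doubled_list)

# Multiplying an array by a number scales each element
doubled_array = 2 * my_array
print('2 * my_array:', doubled_array)
```

The list version gives back a list that is twice as long, which is usually not what you want when working with numbers.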


Commonly in physics, your data is multi-dimensional - for instance, you could have data defined on an $(x,y)$ grid. The natural way to store this data, then, is to have an object that uses two indices: one for the $x$ coordinate, and another for the $y$ coordinate. Numpy makes this easy:

In [5]:
my_2d_array = np.array([[1,2,3],
                        [4,5,6]])

print('my_2d_array:')
print(my_2d_array)

my_2d_array:
[[1 2 3]
 [4 5 6]]


You may notice that we created this array by feeding in a list of lists. The reason to use an array instead is, again, the ease of algebraic operations. These work the same way as with 'my_array'.
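As a quick check that the operations from before carry over to two dimensions, here is a minimal sketch. The names 'squared' and 'doubled' are only for illustration:

```python
import numpy as np

my_2d_array = np.array([[1, 2, 3],
                        [4, 5, 6]])

# Both operations act on every element, keeping the (2, 3) layout
squared = my_2d_array**2
doubled = 2 * my_2d_array

print('my_2d_array squared:')
print(squared)
print('2 * my_2d_array:')
print(doubled)
```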

There are a lot of ways to make arrays. An incomplete list includes the following:

In [6]:
#An array of ones, of length 5
print('np.ones:',np.ones(5))

#The integers from 0 to 4 (5 itself is excluded), similar to Python's 'range' function
print('np.arange:',np.arange(5))

#Linearly spaced values from -1 to 1, for a total of 11 values
print('np.linspace',np.linspace(-1,1,11))

#Random numbers
print('np.random.rand',np.random.rand(5))

np.ones: [1. 1. 1. 1. 1.]
np.arange: [0 1 2 3 4]
np.linspace [-1.  -0.8 -0.6 -0.4 -0.2  0.   0.2  0.4  0.6  0.8  1. ]
np.random.rand [0.33410863 0.19856239 0.40811521 0.14106184 0.40119994]
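A few more creation routines that I find handy are sketched below. This is not part of the cell above: 'np.zeros' and 'np.eye' accept a shape or size directly, and 'np.random.default_rng' is an alternative to 'np.random.rand' that can be seeded so the random numbers are the same on every run. The seed value and variable names here are arbitrary choices for the example:

```python
import numpy as np

# A 2-by-3 array of zeros; the shape is passed as a tuple
zeros_2d = np.zeros((2, 3))
print('np.zeros:')
print(zeros_2d)

# The 3-by-3 identity matrix
identity = np.eye(3)
print('np.eye:')
print(identity)

# A seeded random number generator (the seed 0 is an arbitrary choice)
rng = np.random.default_rng(seed=0)
samples = rng.random(5)  # 5 uniform random numbers in [0, 1)
print('rng.random:', samples)
```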


One other thing that makes arrays incredibly useful is the ability to *reshape* them. For example, we can generate an array laid out like 'my_2d_array' from above (now holding the values 0 through 5) much more easily:

In [7]:
my_2d_array = np.arange(6).reshape((2,3))
print('my_2d_array:')
print(my_2d_array)

my_2d_array:
[[0 1 2]
 [3 4 5]]
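Reshaping only works when the total number of elements stays the same. The sketch below shows two things, under names I chose for illustration: passing -1 lets numpy work out one dimension for you, and asking for an impossible shape raises a 'ValueError':

```python
import numpy as np

flat = np.arange(6)

# -1 tells numpy to infer this dimension from the total number of elements
three_by_two = flat.reshape((3, -1))
print('three_by_two:')
print(three_by_two)

# 6 elements cannot fill a (4, 2) array, so numpy refuses
try:
    flat.reshape((4, 2))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```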


The *shape* of an array is just the number of elements along each axis (the number of axes is the number of dimensions). You see that the shape of 'my_2d_array' is (2,3), meaning it has 2 rows and 3 columns. Numpy is nice: it keeps track of the shape of an array on its own. To print this, you do the following:

In [8]:
print('my_2d_array.shape:',my_2d_array.shape)

my_2d_array.shape: (2, 3)


You can even have 3, 4, 5, etc. dimensional arrays (numpy caps the number of dimensions: the limit was 32 in older versions, and is 64 as of NumPy 2.0). This is something to remember when writing code in the future. For instance, when studying quantum mechanics, you may have 3 eigenstates, each defined in $(x,y,z)$ space. The natural way to store this in your code is a 4-dimensional array: the first dimension/index is for which state you're considering, and the remaining 3 are $(x,y,z)$.
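Here is a rough sketch of what that 4-dimensional layout could look like. The grid sizes are made up for the example, and the array is just filled with zeros as a placeholder for real wavefunction data:

```python
import numpy as np

n_states = 3
nx, ny, nz = 4, 5, 6  # hypothetical number of grid points in x, y, z

# First index picks the state; the remaining three are (x, y, z)
states = np.zeros((n_states, nx, ny, nz))
print('states.shape:', states.shape)
print('states.ndim:', states.ndim)

# Selecting one state leaves a 3-dimensional array on the (x, y, z) grid
first_state = states[0]
print('first_state.shape:', first_state.shape)
```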

To understand basic matrix operations, suppose we want to multiply two matrices, $M$ and $N$. They must have compatible dimensions: if $M$ is a matrix of shape $(a,b)$, then $N$ must be of shape $(b,c)$ for the multiplication $MN$ to make sense, and the result has shape $(a,c)$. We can check this by using 'M.shape', but numpy does this for us when actually trying to multiply them. There are two equivalent ways to write the product: either 'np.matmul(M,N)', or the shorthand 'M @ N'.
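As a small sketch of the shape rule, take $a=2$, $b=3$, $c=4$. The matrices here are just filled with ones, and the variable names are for illustration only:

```python
import numpy as np

M = np.ones((2, 3))  # shape (a, b)
N = np.ones((3, 4))  # shape (b, c)

product_function = np.matmul(M, N)
product_operator = M @ N

# The result has shape (a, c)
print('product shape:', product_function.shape)

# Both ways of writing the product give the same array
print('same result:', np.array_equal(product_function, product_operator))
```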

Suppose we try to multiply 'my_2d_array' with itself. We see that numpy raises a 'ValueError', telling us that the dimensions are incompatible:

In [10]:
print(np.matmul(my_2d_array,my_2d_array))

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)

One way we could make the shapes compatible is by taking the transpose of 'my_2d_array'. We accomplish this using the '.T' attribute. You see that the shapes now make sense for matrix multiplication:

In [11]:
print('my_2d_array.shape:',my_2d_array.shape)
print('my_2d_array.T.shape:',my_2d_array.T.shape)

my_2d_array.shape: (2, 3)
my_2d_array.T.shape: (3, 2)


Notice that you can chain operations involving a '.', as in 'my_2d_array.T.shape' - this is true everywhere in Python.
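To make the chaining concrete, here is a minimal sketch that does the same thing in one line and in separate steps. The intermediate names are ones I picked for the example:

```python
import numpy as np

# Everything in one chained expression, read left to right
chained_shape = np.arange(6).reshape((2, 3)).T.shape
print('chained:', chained_shape)

# The same thing, one step at a time
flat = np.arange(6)
reshaped = flat.reshape((2, 3))
transposed = reshaped.T
stepwise_shape = transposed.shape
print('stepwise:', stepwise_shape)
```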

Indeed, the matrix multiplication now works:

In [13]:
print('my_2d_array @ my_2d_array.T:')
print(my_2d_array @ my_2d_array.T)

my_2d_array @ my_2d_array.T:
[[ 5 14]
 [14 50]]


The last thing I want to talk about with numpy is indexing. We have a 2d array; how do we actually get items out of it (or edit the contents of it)? Well, like a list, we can use square brackets.
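A minimal sketch of indexing a 2d array with square brackets is below. Indices start at 0, just like with lists, and you can give one index per axis separated by a comma. The variable names are only for illustration, and I edit a copy so the original array is left alone:

```python
import numpy as np

my_2d_array = np.arange(6).reshape((2, 3))

# A single element: row 1, column 2
element = my_2d_array[1, 2]
print('element:', element)

# A whole row, and a whole column (':' means "everything along this axis")
first_row = my_2d_array[0]
second_column = my_2d_array[:, 1]
print('first_row:', first_row)
print('second_column:', second_column)

# Editing works the same way; here we change one entry of a copy
edited = my_2d_array.copy()
edited[0, 0] = 10
print('edited:')
print(edited)
```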

For the sake of time, I will now move on from numpy. You will see it a lot in the future, but for now I'll just leave you with some common use cases, and the corresponding functions:

- Trace of a matrix: np.trace
- Eigenvalue decomposition: np.linalg.eig, np.linalg.eigh
- Singular value decomposition: np.linalg.svd
- General linear algebra: see methods within np.linalg
- Polynomial fitting, special polynomials (Chebyshev, etc.): see np.polynomial
- Random numbers (sampled from Gaussian, uniform distributions, etc.): see np.random
- General products and sums over indices, including matrix-vector products (can be confusing, but is useful for speed): np.einsum
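Coming back to the motivating question from the start of this notebook, here is a short sketch of an eigenvalue decomposition using 'np.linalg.eigh', which is meant for symmetric (or Hermitian) matrices. The matrix 'A' is a small example I made up; its eigenvalues are 1 and 3:

```python
import numpy as np

# A small symmetric matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns the eigenvalues in ascending order, and the
# eigenvectors as the columns of the second output
eigenvalues, eigenvectors = np.linalg.eigh(A)
print('eigenvalues:', eigenvalues)

# Rebuild A from its decomposition as a sanity check
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print('reconstruction matches:', np.allclose(reconstructed, A))

# The trace of a matrix equals the sum of its eigenvalues
print('trace matches:', np.isclose(np.trace(A), eigenvalues.sum()))
```

For a matrix that is not symmetric, 'np.linalg.eig' is the more general choice.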

And, of course, the package itself has a beginner's guide: https://numpy.org/doc/stable/user/absolute_beginners.html

## Plotting

TODO: plot nonsense array from numpy, plot random scatter plot, maybe do eigenvalue decomposition and plot that or something. Want a mix of line/scatter/contour/3d

## Common functions

TODO: use np.sin, scipy.special.whatever, and so on. then use those in plots

## Resources

TODO: common packages for common use cases