![University Logo](../Durham_University.svg)

# Introduction to NumPy

### Recap: Python lists are containers for arbitrary data
 - We create a list by enclosing its elements in `[]`
 - Data is accessed by *index*, with the index starting at 0.

In [None]:
# Create a list with arbitrary data

# Accessing the data in the list
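A minimal sketch of this cell, assuming the same example list that the later cells copy for reference:

```python
# A list can hold values of different types side by side
my_list = [1, 'hello', True, 3.14]

# Indexing starts at 0, so index 0 is the first element
first = my_list[0]
second = my_list[1]
print(first, second)
print(len(my_list))
```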


#### You can count from the back with negative indices and get a slice using the colon

In [None]:
# Copy of our list for reference
my_list = [1, 'hello', True, 3.14]

# Accessing the data in the list using negative index

# Slicing the list with indexes
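One possible way to fill in this cell; the variable names `last`, `middle` and `every_other` are illustrative only:

```python
my_list = [1, 'hello', True, 3.14]

# Negative indices count from the end: -1 is the last element
last = my_list[-1]

# A slice start:stop includes start but excludes stop
middle = my_list[1:3]

# A third value sets the step: every second element
every_other = my_list[::2]

print(last, middle, every_other)
```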


#### Lists are mutable: You can also modify the data by index

In [None]:
# Copy of our list for reference
my_list = [1, 'hello', True, 3.14]

# Modifying the data in the list
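A short sketch of in-place modification, using the reference list from this cell:

```python
my_list = [1, 'hello', True, 3.14]

# Assigning to an index replaces that element in place
my_list[1] = 'world'

# Lists can also grow after creation
my_list.append(42)

print(my_list)
```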


#### NumPy Arrays: Key Features

- Homogeneous: Only contain elements of the same data type.
- Fixed Size: Size cannot be changed once created.
- Memory-Efficient: Elements are stored compactly in a typed buffer, which uses less memory than a Python list of objects.
- Vectorized Operations: Perform operations on entire arrays at once.
- Multidimensional: Efficiently store and manipulate multi-dimensional data.
- Built-in Functions: Ease mathematical operations and data manipulation.


#### We use NumPy by importing it

Typically we use the alias `np`, as projects tend to use NumPy a lot.

In [None]:
# import NumPy module


We can now access anything in NumPy via `np.<name>`, for example `np.pi` or `np.array`.

In [None]:
# Use pi from NumPy
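As a sketch, the import and a first use of the `np` alias could look like this; the circle radius is an arbitrary illustrative value:

```python
import numpy as np

# Constants and functions are reached through the np alias
print(np.pi)

# Hypothetical use: circumference of a circle with radius 1.0
radius = 1.0
circumference = 2 * np.pi * radius
print(circumference)
```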


#### We can create NumPy arrays from lists

In [None]:
# create a list with floats from 1 to 5

# convert into a NumPy array

# display



The object type is a NumPy array. The data type of the contained elements (`dtype`) depends on the content.

In [None]:
# check type and dtype
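A minimal sketch of creating an array from a list and inspecting it; `float_list` and `arr` are illustrative names:

```python
import numpy as np

# A list with floats from 1 to 5
float_list = [1.0, 2.0, 3.0, 4.0, 5.0]

# Convert the list into a NumPy array
arr = np.array(float_list)

# type() describes the container, dtype describes the elements
print(type(arr))
print(arr.dtype)
```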


The data type of an array's elements can be changed by casting, for example to float.

If a single string is present, everything will be cast to a string! In general, you should pass homogeneous lists as input.

In [None]:
# Show casting in an array with a string
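The following sketch illustrates both points with made-up values: an explicit cast with `astype`, and the implicit cast that happens when one element is a string.

```python
import numpy as np

# Explicit cast: integers to floats
int_arr = np.array([1, 2, 3])
float_arr = int_arr.astype(float)
print(float_arr.dtype)

# Implicit cast: one string turns every element into a string
mixed = np.array([1, 2.5, 'three'])
print(mixed.dtype)
print(mixed)
```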


#### We can also read NumPy arrays from disk
For this first step we will use a set of bond lengths determined by X-ray diffraction. Let us have a look at the first lines of the file. 

In [None]:
data1_path = '../Data/presentation/bond_lengths.txt'

# Read data and display the first three lines
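The layout of `bond_lengths.txt` is not shown here, so the sketch below writes a small stand-in file first. Its contents (two element symbols and a length per line) are an assumption made purely for illustration. The loop uses `for`, `if` and `break` to stop after three lines.

```python
import os
import tempfile

# Hypothetical file contents: element 1, element 2, bond length
sample = "C C 1.54\nC H 1.09\nSi O 1.62\nC C 1.53\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'bond_lengths_sample.txt')
    with open(sample_path, 'w') as handle:
        handle.write(sample)

    # Read the file line by line and stop after the first three lines
    first_lines = []
    with open(sample_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 3:
                break
            first_lines.append(line.rstrip('\n'))

for line in first_lines:
    print(line)
```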


Speaker Notes:

In this section, explain: opening files in Python; control statements, including the `for` loop, the `if` statement and `break`. End on the statement that we have different data types!

## Let us now load the data using NumPy
Because the columns contain different data types, we need to specify the type of each one. We load every column into a separate variable.

In [None]:
# Load data from txt, unpack and demonstrate data types
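One way this could be done is `np.loadtxt` with a structured `dtype` and `unpack=True`. The sketch reads from an in-memory text buffer with invented contents, and the column names `element1`, `element2` and `length` are assumptions; the real file may be laid out differently.

```python
import io
import numpy as np

# Hypothetical stand-in for the file: element 1, element 2, bond length
sample = io.StringIO("C C 1.54\nC H 1.09\nSi O 1.62\nC C 1.53\n")

# One (name, type) pair per column: two short strings and a float
column_types = [('element1', 'U2'), ('element2', 'U2'), ('length', 'f8')]

# unpack=True returns one array per column
element1, element2, length = np.loadtxt(sample, dtype=column_types, unpack=True)

print(element1.dtype, element2.dtype, length.dtype)
```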


Speaker Notes:

In this section: Revise element unpacking, and function arguments.

#### A quick look at what we loaded

In [None]:
# Do some slices through our array with start stop and skip
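A sketch of `start:stop:step` slicing, using a small invented `length` array in place of the loaded data:

```python
import numpy as np

# Hypothetical bond lengths standing in for the loaded column
length = np.array([1.54, 1.09, 1.62, 1.53, 1.10, 1.63])

# start:stop (stop is excluded)
middle = length[1:4]

# A step of 2 takes every second element
every_second = length[::2]

# start, stop and step combined
odd_positions = length[1:6:2]

print(middle, every_second, odd_positions)
```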


#### Let us have a look at the NumPy datatypes:

| Python  | NumPy      | NumPy short code |
|---------|------------|------------------|
| float   | float64    | f8               |
| int     | int64      | i8               |
| bool    | bool       | b1               |
| complex | complex128 | c16              |
| str     | str_       | <U{n}            |

A list of all data types can be accessed with

In [None]:
np.sctypeDict.keys()
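As a brief illustration of the short codes in the table, a sketch that compares some of them with the full type names:

```python
import numpy as np

# Short codes and full names refer to the same data types
print(np.dtype('f8') == np.float64)
print(np.dtype('i8') == np.int64)
print(np.dtype('c16') == np.complex128)

# The string form of the bool dtype ends in the short code b1
bool_code = np.dtype(bool).str
print(bool_code)

# For strings, n is the length of the longest entry
symbols = np.array(['Si', 'C'])
print(symbols.dtype)
```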

#### Let us determine the mean bond length and its standard deviation
For this we can use NumPy's built-in functions. The full list can be found in the routines section of the [NumPy documentation](https://numpy.org/doc/stable/reference/routines.html), specifically under [statistics](https://numpy.org/doc/stable/reference/routines.statistics.html).

In [None]:
# print mean and std of bond lengths
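A minimal sketch with invented bond lengths. Note that `np.std` computes the population standard deviation by default; passing `ddof=1` gives the sample standard deviation instead.

```python
import numpy as np

# Hypothetical bond lengths standing in for the loaded column
length = np.array([1.54, 1.09, 1.62, 1.53])

mean_length = np.mean(length)
std_length = np.std(length)            # population standard deviation (ddof=0)
sample_std = np.std(length, ddof=1)    # sample standard deviation

print(f"mean = {mean_length:.3f}, std = {std_length:.3f}")
```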


#### NumPy supports more sophisticated slicing than Python lists

In [None]:
# first five entries


In [None]:
# slicing by indexing
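The two cells above could be sketched as follows, again with an invented `length` array. Indexing with a list of positions is something plain Python lists do not support.

```python
import numpy as np

# Hypothetical bond lengths standing in for the loaded column
length = np.array([1.54, 1.09, 1.62, 1.53, 1.10, 1.63])

# First five entries
first_five = length[:5]

# Indexing with a list of positions picks exactly those elements
picked = length[[0, 2, 5]]

print(first_five)
print(picked)
```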


#### We can create conditional selections with masks
A mask is an array of booleans with the same shape as the array we want to apply it to.

In [None]:
# create a mask array for values where element 1 is Si


In [None]:
# Apply our mask to length


Very often you would not store the mask in a variable.

In [None]:
# Create the mask in line
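A sketch of the masking steps above, assuming arrays named `element1` and `length` with invented values:

```python
import numpy as np

# Hypothetical columns standing in for the loaded data
element1 = np.array(['C', 'C', 'Si', 'C', 'Si'])
length = np.array([1.54, 1.09, 1.62, 1.53, 1.63])

# Comparing an array with a value gives a boolean mask
mask = element1 == 'Si'
print(mask)

# Applying the mask keeps only the entries where it is True
si_lengths = length[mask]

# The same selection with the mask created in line
si_inline = length[element1 == 'Si']

print(si_lengths)
```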


#### Let us calculate means and standard deviations of different bond types
We will use the [logical operations](https://numpy.org/doc/stable/reference/routines.logic.html#logical-operations) for this purpose. `np.logical_and(arr1, arr2)` can be used to get the elementwise truth of "arr1 and arr2"

In [None]:
# np.logical_and can be used to get the elementwise truth of "arr1 and arr2"
# C-C bonds


In [None]:
# Show & as shorthand


In [None]:
# Also get C-H bond lengths
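The three cells above could look roughly like this. The data are invented, and the parentheses around each comparison are required when using `&`, because `&` binds more tightly than `==`.

```python
import numpy as np

# Hypothetical columns standing in for the loaded data
element1 = np.array(['C', 'C', 'Si', 'C', 'C'])
element2 = np.array(['C', 'H', 'O', 'C', 'H'])
length = np.array([1.54, 1.09, 1.62, 1.53, 1.10])

# C-C bonds: both conditions must hold elementwise
cc_mask = np.logical_and(element1 == 'C', element2 == 'C')

# & is the shorthand for the same elementwise operation
cc_mask_short = (element1 == 'C') & (element2 == 'C')

cc_mean = np.mean(length[cc_mask])
cc_std = np.std(length[cc_mask])

# C-H bonds
ch_mask = (element1 == 'C') & (element2 == 'H')
ch_mean = np.mean(length[ch_mask])

print(cc_mean, cc_std, ch_mean)
```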


#### NumPy arrays can be multidimensional

In [None]:
# create a simple 2D array


In [None]:
# second row or third column


In [None]:
# we can also use this for assignment


Note that a slice is a view of the original data and not a new object in memory, so modifying the slice also modifies the original array.
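A sketch of 2D indexing, assignment and view behaviour, using a small illustrative array named `grid`:

```python
import numpy as np

# A simple 2D array: 3 rows, 3 columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Second row and third column
second_row = grid[1]
third_column = grid[:, 2]
print(second_row, third_column)

# Indexing also works for assignment
grid[0, 0] = 100

# second_row is a view: changing it changes grid as well
second_row[0] = -4
print(grid)
```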

#### We can change the arrangement with reshape

In [None]:
# create 12 evenly spaced values with np.linspace and store them in a new variable


In [None]:
# use reshape
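A short sketch of `np.linspace` followed by `reshape`; the endpoints 0 and 11 are chosen only so that the values are easy to read:

```python
import numpy as np

# 12 evenly spaced values from 0 to 11 (both endpoints included)
values = np.linspace(0, 11, 12)

# Rearrange the 12 values into 3 rows and 4 columns
table = values.reshape(3, 4)
print(table)

# -1 lets NumPy work out the remaining dimension
two_rows = values.reshape(2, -1)
print(two_rows.shape)
```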


### Calculating with NumPy arrays: basic arithmetic
Let us load a new dataset. This time we skip the first column, which contains the atom type, and load only the coordinates. Because all loaded columns have the same type, NumPy will detect the type correctly on its own.

In [None]:
data_path2 = '../Data/presentation/molecule1.xyz'

# Load the numeric coordinate columns (1, 2, 3) from file
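The contents of `molecule1.xyz` are not shown, so this sketch reads an invented three-atom example from memory. It assumes the common XYZ layout with two header lines (atom count and comment), which is why `skiprows=2` is used; adjust this if the real file differs.

```python
import io
import numpy as np

# Hypothetical XYZ-style text: atom count, comment line, then symbol x y z
sample = io.StringIO(
    "3\n"
    "illustrative molecule\n"
    "O 0.000 0.000 0.000\n"
    "H 0.757 0.586 0.000\n"
    "H -0.757 0.586 0.000\n"
)

# Skip the two header lines and read only columns 1, 2 and 3
coords = np.loadtxt(sample, skiprows=2, usecols=(1, 2, 3))

print(coords.shape, coords.dtype)
```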


Mathematical operations are calculated elementwise. Here we take the difference between two points.

In [None]:
# Difference between arrays


The distance is the norm of that vector

In [None]:
# calculate the length of that vector
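A sketch of both steps with two invented points. `np.linalg.norm` gives the Euclidean length, which is also written out by hand for comparison.

```python
import numpy as np

# Two hypothetical atomic positions
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0]])

# Elementwise difference between the two points
difference = coords[1] - coords[0]

# The distance is the norm of the difference vector
distance = np.linalg.norm(difference)

# The same value written out: square, sum, square root
distance_manual = np.sqrt(np.sum(difference ** 2))

print(difference, distance)
```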


#### Broadcasting an operation
We now want to calculate the distance of the first atom to all other atoms in the molecule

In [None]:
# Calculate the distance of all atoms to atom 1
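A possible sketch with invented coordinates. Subtracting a single position of shape `(3,)` from an array of shape `(n, 3)` is broadcast across every row, so no loop is needed.

```python
import numpy as np

# Hypothetical coordinates: one row per atom
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 4.0, 0.0],
                   [0.0, 0.0, 2.0]])

# coords[0] has shape (3,) and is broadcast across all rows
differences = coords - coords[0]

# axis=1 takes the norm of each row separately
distances = np.linalg.norm(differences, axis=1)

print(distances)
```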


# Linear Algebra
We now want to apply an inversion matrix to an atomic position.

In [None]:
# Create an inversion matrix


In [None]:
# use matmul with @


#### If we want to apply it to all coordinates we can use np.einsum

In [None]:
# apply the inversion matrix using einsum
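The steps above could be sketched as follows, assuming inversion through the origin (the negative identity matrix) and invented coordinates. In the `einsum` string, `j` is the summed index and `n` runs over the atoms.

```python
import numpy as np

# Inversion through the origin maps (x, y, z) to (-x, -y, -z)
inversion = -np.eye(3)

# Apply it to a single position with the @ operator
position = np.array([1.0, 2.0, 3.0])
inverted = inversion @ position
print(inverted)

# Hypothetical coordinates: one row per atom
coords = np.array([[1.0, 2.0, 3.0],
                   [0.0, -1.0, 4.0]])

# result[n, i] = sum over j of inversion[i, j] * coords[n, j]
all_inverted = np.einsum('ij,nj->ni', inversion, coords)
print(all_inverted)
```

For this simple case `coords @ inversion.T` gives the same result; `einsum` makes the index pattern explicit.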


# Summary and Key Takeaways

- **NumPy Arrays**: NumPy arrays are homogeneous, fixed-size, memory-efficient data structures that support vectorized operations and multi-dimensional data.

- **Array Operations**: NumPy provides a wide range of operations that can be performed on arrays, including arithmetic operations, reshaping, indexing, slicing, and statistical operations.

- **Broadcasting**: NumPy's broadcasting feature allows for arithmetic operations between arrays of different shapes.

- **Boolean Indexing**: NumPy arrays support boolean indexing, which can be used to filter data based on certain conditions.

- **Data Reading**: NumPy can be used to read data from files, enabling further data manipulation and analysis.

- **Statistical Measures**: We learned how to calculate mean, standard deviation, and other statistical measures using NumPy.

- **Advanced Operations**: We explored advanced operations like `np.einsum`, which expresses operations on multi-dimensional arrays in Einstein summation notation.

And now we will move on to the examples.