# A Simple Introduction to Python
by Alexander S. Tygesen, DTU Energy

Why do we use Python?

* Interpreted language
* Excellent for scripting
* ___simple___
* Does not use {} all over the place

It differs a lot from "standard" programming languages like C and Java.

We will not dive too deep into Python; here we only go through some of the basics to get started. Many things can be found on Google!

I think the best way to learn Python is to just do it and see what happens.

Note that cells are executed using `shift+enter`.

### Learning by doing

As any good programmer would, let's start by printing out "Hello World"

In [None]:
print('Hello World')

In [None]:
print(3 + 5)

One useful feature in Python is that you can combine strings using the `+` operator

In [None]:
print('Hello ' + 'World')

Assignment is done using `=`

In [None]:
a = 2
b = 3
c = a + b
print(c)

In [None]:
a = 'Foo'
b = 'Bar'
print(a + b)

## Data structures

There are some important data structures in Python that are good to know.

* Lists
* Dictionaries
* Tuples
* NumPy arrays

#### A `list` is an ordered collection of arbitrary objects

In [None]:
# a list
l = [1, ('eggs', 7), 'spam', 1.35]
print(l)
print(l[1])
print(l[-2])  # indexing with negative numbers counts from the end

It is important to note that Python starts list indexing from 0.
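
As a small sketch of zero-based indexing, the example below shows indexing and slicing side by side. The list `letters` is made up for illustration. Note that a slice `start:stop` includes `start` but excludes `stop`.

```python
# Hypothetical example list, not part of the original notebook
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[0])    # first element
print(letters[1:3])  # elements at index 1 and 2 (index 3 is excluded)
print(letters[:2])   # the first two elements
print(letters[-1])   # last element
print(len(letters))  # number of elements
```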

#### A `dict`  is a mapping from keys to values 

In [None]:
d = {'s': 0, 'p': 1}
print(d)
print(d['p'])
del d['s']
print(d)

#### A `tuple` is an ordered collection like a list but is *immutable*

Tuples are useful as keys in a `dict`, among other things. However, most often a list is sufficient.

In [None]:
# with a list we can reassign values
x = [2, 3]
x[0] = 100
print(x)
# this is not possible with a tuple
y = (2, 3)
print('y =', y)
try:
    y[0] = 100
except TypeError as err:  # tuples do not support item assignment
    print(err)
print('y =', y)
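
Because a tuple is immutable, it can be used as a dictionary key, while a list cannot. The following is a minimal sketch; the `grid` dictionary and its contents are invented for illustration.

```python
# Hypothetical data: map (x, y) grid coordinates to a label
grid = {(0, 0): 'origin', (1, 0): 'right'}
print(grid[(1, 0)])

# A list cannot be used as a key, because it is mutable (not hashable)
try:
    grid[[0, 1]] = 'up'
except TypeError as err:
    print('TypeError:', err)
```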

## NumPy

NumPy is perhaps one of the most important packages for scientific computing in Python. The name is a contraction of "Numerical Python", so it is not surprising that it is used a lot in such fields.

Perhaps the most powerful feature of NumPy is the `array`: a simple list-like container with features that are a lot more intuitive and useful for numerical science.

#### Let's try

In Python, modules are "imported" like so

In [None]:
import numpy as np

Note the `as np` part here. In Python, you can rename imported modules to more convenient names. NumPy is conventionally imported as `np`.

Now we can start using it

In [None]:
x = np.array([1, 2, 3])
print(x)
print(x.mean())

x[1] = 5
print(x)
print(x.mean())

NumPy contains far too many useful features to cover here, but you can solve most numerical problems using NumPy (or the related package `scipy`).

We can also do matrices (or multidimensional arrays, as they are called in NumPy lingo)

In [None]:
a = np.array([[1, 2, 3], [7, 8, 9]])

print(a)
print('Shape:', a.shape)
print('Number of dimensions:', a.ndim)

We can do many of the standard matrix type operations

In [None]:
print(a.T)  # .T is the transpose
b = np.dot(a, a.T)  # dot-product between a and a.T
print(b)

However, be careful when using the `*` operator. If you are used to, say, MATLAB, note that NumPy's `*` behaves like MATLAB's `.*`: it is an element-wise multiplication, __NOT__ matrix multiplication.

In [None]:
m = np.array([[1, 2], [3, 4]])
print(m * m)  # Element wise
print(np.dot(m, m))  # matrix-matrix multiplication
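
As a brief aside, Python 3.5 and later also provide the `@` operator for matrix multiplication, which NumPy arrays support. The sketch below reuses the same small matrix `m` to show that, for 2D arrays, it agrees with `np.dot`.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# For 2D arrays, m @ m gives the same result as np.dot(m, m)
print(m @ m)
print(np.array_equal(m @ m, np.dot(m, m)))
```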

`+` and `-` also work elementwise, which is often quite useful, especially for 1D arrays

In [None]:
v1 = np.array([4, 5, 6])
v2 = np.array([0, 1, 2])

print(v1+v2)
print(v1-v2)

ASE relies quite heavily on numpy arrays, and many of the results you will receive from ASE, e.g. when asking for the forces, will be in numpy arrays.

Numpy is easy to learn, but hard to master.

## Plotting with Matplotlib

Matplotlib is a MATLAB-style plotting library, which is very commonly used for plotting data.

In [None]:
# For technical reasons, this is used in Jupyter to show figures inside the notebook
# It only needs to be run once per notebook
%matplotlib inline

#### Let's try plotting some stuff

In [None]:
import matplotlib.pyplot as plt

# Construct some data
x = np.linspace(0, 2 * np.pi, 50)  # 50 linearly spaced values in [0; 2*pi]
y = np.sin(x)

plt.plot(x, y, label='sin(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('My sine curve')
plt.legend(loc='best')

## String manipulation

Strings can be manipulated in Python in many ways. We are mostly going to use the `format` method to insert values into our strings, as it provides a nice and convenient way to make pretty, readable output. Again, there are many ways of manipulating strings, and I recommend using Google to learn more about this topic.

The `format` method uses `{}` as placeholders in a string, which are filled in by the arguments passed to `format`. Without anything inside `{}`, it simply inserts whatever you pass, in order.

In [None]:
a = 2
b = 4.7

s1 = 'Hurray!'

lst = [1, 2, 'a', [-1, -2]]

mystr = 'a is {}, b is {}, a+b = {}. {}'.format(a, b, a+b, s1)
print(mystr)

anotherstr = 'Here is a list: {}'.format(lst)
print(anotherstr)

Sometimes, though, floats contain a lot of digits, and we use a floating-point format specifier, as we don't care about more than, say, 3 decimals

In [None]:
from numpy import pi

s = 'pi with many digits: {}\npi with fewer digits: {:.3f}'.format(pi, pi)
print(s)
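
As an alternative that is not used in the rest of this notebook, Python 3.6 and later support "f-strings", which accept the same format specifiers but place the expression directly inside the braces. A minimal sketch:

```python
from math import pi  # same value as numpy's pi

a = 2
b = 4.7

# Expressions inside {} are evaluated directly
s1 = f'a is {a}, b is {b}'
# The format specifier goes after a colon, as with .format()
s2 = f'pi with 3 decimals: {pi:.3f}'
print(s1)
print(s2)
```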

## Functions

Functions are simple to make in Python. They consist of a `def` statement, function name, arguments, a function body and optionally a `return` statement. A function without a `return` statement simply returns `None`.

Functions can take two kinds of arguments: positional arguments and keyword arguments.

Notice that in Python, indentation defines the scope of a statement, which makes the code a lot more human-readable.

In [None]:
def f(x, m=2, n=1):
    y = x + n
    return y**m  # The ** operator is simply the power operator, so in this case it means y^m

print(f(5))
print(f(5, n=8))

In the function above, `f` is the function name, `x` is a positional argument, and `m` and `n` are keyword arguments with default values. If you do not pass a value for a keyword argument, its default value is used.
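
To make the different calling styles concrete, here is a short sketch that repeats the definition of `f` from above and calls it in a few ways. The comments spell out the arithmetic for each call.

```python
def f(x, m=2, n=1):
    y = x + n
    return y**m

print(f(5))            # defaults: (5 + 1)**2
print(f(5, 3))         # m=3 passed by position: (5 + 1)**3
print(f(5, n=8, m=1))  # keywords can come in any order: (5 + 8)**1
print(f(x=5))          # x can also be passed by keyword
```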

## Conditional statements

In Python, if/else statements look like the following

In [None]:
x = 3
if x > 4:
    print('Greater')
else:
    print('Smaller')


# Define some function
def myfunc(a):
    if a > 4:
        print(a, 'is greater than 4')
    elif a == 4:
        print('The input is 4')
    else:
        print(a, 'is smaller than 4')
        
myfunc(3)
myfunc(5)
myfunc(4)

## Loops

Loops are extremely convenient in Python, as we will see in a moment.

The traditional C/Java-style loops would look something like

In [None]:
x = [8, 11, 13, 21]
for i in range(len(x)):
    print(x[i])

However, in Python, we can loop through so-called "iterables" directly. `list`, `tuple` and `np.array` are all examples of iterables which can be looped through in this manner

In [None]:
for i in x:
    print(i)

In [None]:
x = ('Spam', 'Eggs', 'Foo', 23)
for s in x:
    print(s)
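
If you need both the index and the value, the built-in `enumerate` combines the two loop styles shown above. A minimal sketch using the same list of numbers as before:

```python
x = [8, 11, 13, 21]

# enumerate yields (index, value) pairs, so no range(len(x)) is needed
for i, value in enumerate(x):
    print(i, value)
```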

Looping through dictionaries is a little different, but simple enough. The iterator is generated by using the `items` method

In [None]:
mydict = {'Foo': 'Bar',
         'Eggs': 'Bacon',
         12: [1, 9, 18, 4]}

for key, value in mydict.items():
    print(key, value)

Quite often we generate data in a loop and then store it in a list. One way to do this is with what's called a "list comprehension". It has the general form of

```python
[f(x) for x in iterable if condition(x)]
```
The `if` condition is optional, but it's quite convenient for filtering data.

In [None]:
mylst = [i**2 for i in range(10) if i%2==0]
print(mylst)

If the operation is more complex, it can still be done in a traditional `for` loop, e.g. with the `append` method of a `list`.

In [None]:
mylst = []
for i in range(10):
    if i%2 == 0:
        mylst.append(i**2)
print(mylst)

## Reading and writing files

There are many ways to read & write files in Python.

Working with a file consists of opening it, reading or writing, and closing it again. A file can be opened in several "modes", the most common being `r` (read), `w` (write) and `a` (append). Note that `w` mode overwrites anything already in the file, while `a` adds to the end of the file, so be careful not to accidentally delete content with `w` mode.

This can be done using a call to `open`, followed by a call to the file's `close` method. This, however, is __not__ the recommended way: you will probably forget to close the file, or if your program crashes, the file might not be closed properly.
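
For comparison, here is a minimal sketch of what the manual approach looks like when done carefully. The file name is hypothetical and placed in a temporary directory so that nothing in your working directory is touched. The `try`/`finally` is needed to make sure the file is closed even if the write fails.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'manual_example.txt')

f = open(path, 'w')
try:
    f.write('Hello World\n')
finally:
    # Runs even if the write above raises an exception
    f.close()

print(f.closed)
```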

This is how you should do it the "automatic" way:

In [None]:
with open('myfile.txt', 'w') as f: # Open the file
    print('Hello World', file=f)
    print(123 + 53, file=f)
    
    try:
        print('Writing to f', file=f)  # Inside the with block the file is still open, so this works
    except ValueError:  # Would only be raised if the file stream were closed
        print('Nope, cant do that!')

# When we indent back out, the file is closed for us
try:
    print('Writing to f', file=f) # We can no longer print to that file stream
except ValueError:  # Writing to a closed filestream raises ValueError
    print('Nope, cant do that!')
    
# Let's append something
with open('myfile.txt', 'a') as f:
    print('Appending this line to the file', file=f)

Note that we never explicitly call `close`; Python does it for us automatically.

Filestreams in read-mode are iterable, meaning that we can loop through a file

In [None]:
with open('myfile.txt', 'r') as f:
    for line in f:
        line = line.strip()  # Remove newline characters
        print(line)

__Note__: Be careful with file IO when working in parallel calculations! See the section on printing in parallel further down for how to do that.

# ASE (Atomic Simulation Environment)

ASE is a module designed for working with atoms. It uses ångström (Å) as the unit of length and electron volts (eV) as the unit of energy.

In essence, ASE contains the `Atoms` object, which is a collection of `Atom` objects. Thus, when we loop through the `Atoms` object, we get `Atom` objects. The `Atoms` object can then be associated with a so-called `calculator` object, which is just an object that knows how to calculate energies and forces, e.g. GPAW.

ASE and GPAW are quite complex modules, but there are good tutorials for doing many things, which can be found on their respective wiki pages.

https://wiki.fysik.dtu.dk/ase/

https://wiki.fysik.dtu.dk/gpaw/

#### Let's see what that looks like

Here we set up a CO molecule with a bond length of 1.1 Å.

In [None]:
from ase import Atoms
d = 1.1
atoms = Atoms('CO', positions=[[0, 0, 0], [0, 0, d]])

ASE contains tools to visualize the system. This opens a new window for viewing the atoms

In [None]:
from ase.visualize import view
view(atoms)

And as mentioned, we can loop through the `Atoms` object to get `Atom` objects

In [None]:
print(atoms)
for atom in atoms:
    print(atom)

As you can see, the first print statement shows the `Atoms` object, which can contain more than a single atom, while each `Atom` object only contains 1 atom.

From this point on, we can start building more complex systems. But before we get to that, let's take a look at how ASE can do simulations.

In [None]:
from ase.calculators.emt import EMT

calc = EMT()
atoms.calc = calc  # attach the calculator (set_calculator() is deprecated in recent ASE versions)
print(atoms.get_forces())

Now, the `EMT` calculator is very primitive, and typically will not do a very good job, but it's fast! We will need to turn to GPAW to do any worthwhile simulations.

#### Let's take a look at a real example in today's exercise on calculating bulk structures for iron and titanium, which will use many of the things we just saw.

Don't worry if you didn't understand everything the first time. It has a rather steep learning curve.

# Submitting jobs to the queue

Often, DFT calculations take a while to run. A normal execution of a Python program only runs on 1 core, which is rarely enough for any real simulations, so we turn to parallel execution of the programs.

To make a long story short, this process is simplified by the GPAW software, as it takes care of the parallel code for you! However, we need to start GPAW in the correct manner before it can run in parallel.

Running things in parallel takes up most, if not all, of the available CPUs, depending on how many cores you request. This is why we run those calculations on the queue, where you are assigned a slot on a machine and your simulation can run without disturbing others.

We have set up a program called `qsub.py`, which will take care of most of that for you. The syntax in the terminal is

```bash
qsub.py -t T -p NPROC myscript.py
```
which will submit `myscript.py` to the queue, requesting `NPROC` cores for a duration of `T` hours. So for example, it could look something like
```bash
qsub.py -t 1 -p 8 myscript.py
```
which would submit `myscript.py` to the queue for 1 hour on 8 processors. We can then look at our queue with the command 

```bash
qstat -u $USER
```
which gives us information about the jobs we currently have in the queue, and whether they are waiting to start, running or completed. You can delete a job from the queue with the command

```bash
qdel JOBID
```
where `JOBID` is the ID number of the job, which we can get with the `qstat` command above.


# A few neat tricks

See also https://wiki.fysik.dtu.dk/ase/tips.html


### Looping through a series of systems
Clever utilization of `ase.build` (or databases, see https://wiki.fysik.dtu.dk/ase/ase/db/db.html) can be extremely useful, and make reusing code much easier. Often, our calculator objects don't need (many) adjustments depending on the system, so we can reuse them in a loop. Any adjustments could be handled using `if` statements.

In [None]:
# Looping through a list of systems, e.g. molecules
from ase.build import molecule
from ase.calculators.emt import EMT
import pickle

energies = {}  # We can use the system names as keys
mysystems = ['NH3', 'H2', 'N2']

for name in mysystems:
    system = molecule(name)
    calc = EMT()  # We could use any calculator

    system.calc = calc  # attach the calculator (set_calculator() is deprecated in recent ASE versions)
    en = system.get_potential_energy()  # We could potentially also do a relaxation here
    energies[name] = en  # Store this value
    
# Now "energies" is a dictionary which contains all of our potential energies
# and we can easily see which energy belongs to which system

# Let's store "energies" to disk, so we can read it later
# We use the "pickle" module for that
# note the 'wb' means we write 'myenergies.pckl' as a binary file (not human readable)
with open('myenergies.pckl', 'wb') as f:
    pickle.dump(energies, f)
    
# Imagine that we now want to read those energies in a 2nd file, for post-processing after an expensive DFT run
# note the 'rb' to read binary files
with open('myenergies.pckl', 'rb') as f:
    energies_loaded = pickle.load(f)
    
print(energies_loaded == energies) # check that the loaded version is identical to the original
print(energies_loaded)
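
If you would rather have a human-readable file, the standard-library `json` module is one possible alternative to `pickle` for simple dictionaries like this one. The sketch below uses made-up numbers rather than real energies, and a hypothetical file name in a temporary directory. Keep in mind that JSON only handles basic types (strings, numbers, lists, dictionaries with string keys), so it is not a drop-in replacement for `pickle` in general.

```python
import json
import os
import tempfile

# Made-up values for illustration only, not real calculation results
energies = {'NH3': 1.5, 'H2': 0.25, 'N2': 0.5}

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'myenergies.json')

with open(path, 'w') as f:  # text mode, not 'wb'
    json.dump(energies, f, indent=2)

with open(path, 'r') as f:
    energies_loaded = json.load(f)

print(energies_loaded == energies)
```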

### Printing in parallel

When you submit your jobs using the `qsub.py` file, your jobs will be running in parallel on the specified number of cores. This means that every time you do a `print`, each processor will do that print, so if you do a `print('Hello')` on 4 processors, you would get `'Hello'` back 4 times.

This is not very nice, and ASE has an entire module to handle this, see https://wiki.fysik.dtu.dk/ase/ase/parallel.html

The most useful features of this module are `parprint` and `paropen`. They also work correctly even if you are not running in parallel. Consider the following example

In [None]:
%%writefile parallel_example.py
from ase.parallel import parprint, paropen, world

# world.rank tells us which processor is executing the code
print('Hello from print statement on rank', world.rank)

parprint('Hello from parprint on rank', world.rank)

with paropen('a_parallel_file.txt', 'w') as f:
    print(123, file=f)
    print('A line!', file=f)
    print('Foo', file=f)
    print('Theres only one of me!... right?', file=f)

The following runs the above code in parallel on 4 cores, without submitting it to the queue. We will not be needing this in general, but in this case it's easier than submitting to the queue, as we get the output back directly.

In [None]:
!mpiexec -np 4 gpaw-python parallel_example.py

Depending on the execution order, you might even see that the print statements aren't in order. That's due to the nature of how parallelism works, but we will not be discussing that subject in this course, as that's an entire topic itself.

Suffice it to say that `parprint` makes printing our outputs much cleaner, so you should probably use it whenever you are printing anything from a parallel calculation.

About the file we wrote to in parallel, let's have a look. What would have happened if we didn't use `paropen`?

In [None]:
with open('a_parallel_file.txt', 'r') as f:
    for line in f:
        line = line.strip()
        print(line)

### Deleting specific atoms from an atoms object

We can also remove particular atoms from an `atoms` object, which is quite convenient using a list comprehension. The following example removes all `H` atoms from the molecule.

In [1]:
from ase.build import molecule
atoms = molecule('CH3CH2OH')
idx = [atom.index for atom in atoms if atom.symbol=='H']
del atoms[idx]