## CHY1610: Introduction to Scientific Computing for Chemists
## Dr Daniel Cole
* Room: BEDB.2.29
* email: daniel.cole@ncl.ac.uk

## Workshop 3: Conditional Statements, Data Files & Functions.

### Conditional Statements

We saw at the end of last week that we can use `for` loops to perform repetitive operations very efficiently. Usually we will not need (or want) to perform every operation in our code on every piece of data. An important concept is the use of *conditional statements* that are used to control the flow of operations in a program. In Python, these usually use the `if...elif...else` construction (`elif` is shorthand for `else if`), and the *comparison operators* that we met in week 1:

In [None]:
a = 3
if a < 3:
    print('a is less than 3')
elif a > 3:
    print('a is greater than 3')
else:
    print('a must equal 3')

Hopefully it's clear what the code is doing; try changing the value of `a` to check. The code following each `if` or `elif` statement is only executed if the corresponding condition evaluates to `True` (e.g. `a < 3` for the first condition). Note the use of indentation and the `:` following each `if`, `elif` or `else` keyword, as we saw last time for `for` loops.
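
One point worth spelling out is that only the first branch whose condition is `True` gets executed; later `elif` tests are skipped even if they would also be `True`. Here is a minimal sketch using a made-up pH reading (the variable names and thresholds are illustrative assumptions, not course data):

```python
# hypothetical pH measurement:
ph = 3.0

if ph < 7:
    label = 'acidic'
elif ph < 10:
    # 3.0 < 10 is also True, but this branch is never reached
    # because the first condition has already matched:
    label = 'mildly basic'
else:
    label = 'strongly basic'

print('The solution is', label)
```

Try changing `ph` to 8.5 or 12 to see the other branches being selected.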

**Example:** Write some code to check whether the measured boiling point of a substance falls in the range expected for typical hydrocarbons (between for example -100 and 250 degrees Celsius).

In [None]:
measured_temp = 300
min_temp = -100
max_temp = 250

if measured_temp < min_temp:
    print("Temperature outside expected range")
elif measured_temp > max_temp:
    print("Temperature outside expected range")
else:
    print("Temperature in expected range")

Actually, we can simplify the code above by chaining together the comparison. Note that the `if` now tests whether the measured temperature is both lower than the maximum **and** higher than the minimum expected temperature:

In [None]:
measured_temp = 30
min_temp = -100
max_temp = 250

if max_temp > measured_temp > min_temp:
    print("Temperature in expected range")
else:
    print("Temperature outside expected range")
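
A chained comparison is shorthand for two separate comparisons joined by `and`. The short sketch below (reusing the same variable names, with `chained` and `explicit` introduced purely for illustration) stores the result of each form so that they can be compared:

```python
measured_temp = 30
min_temp = -100
max_temp = 250

# chained comparison:
chained = min_temp < measured_temp < max_temp
# the same test written out in full with 'and':
explicit = (min_temp < measured_temp) and (measured_temp < max_temp)

print(chained, explicit)
```

Both variables hold the same Boolean value whatever temperature is chosen, so the choice between the two forms is a matter of readability.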

As discussed in week 1, we can also string together multiple comparisons using the `and`, `not` and `or` logic operators:

In [None]:
raining = True
day = "Saturday"
temperature = 22

# check whether the conditions are right to go for a walk:
if day == "Saturday" and temperature > 15 and not raining:
    print("Go for a walk")
else:
    print("Stay indoors")

The above code checks several conditions to decide whether it's a nice day for a walk. In this case all of the expressions must evaluate to `True` for the condition to be met. Here we're advised to stay indoors because it's raining.

In [None]:
raining = True
day = "Saturday"
temperature = 22

if day == "Saturday" and (temperature > 15 or not raining):
    print("Go for a walk")
else:
    print("Stay indoors")

In this second example, we have relaxed the conditions a bit, so it only has to be warm **or** not raining in order for us to decide to go outside. Note that parentheses (brackets) have been used around the `or` test to make it clearer what we intend (and in fact they are crucial here, as otherwise `and` takes precedence over `or` and is evaluated first).
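
To see why the parentheses matter, here is a small sketch with a different (made-up) set of conditions, a cold but dry Sunday, for which the two versions of the test disagree. The names `with_brackets` and `without_brackets` are just illustrative:

```python
# hypothetical conditions: a cold, dry Sunday
raining = False
day = "Sunday"
temperature = 10

# 'or' is evaluated first because of the brackets:
with_brackets = day == "Saturday" and (temperature > 15 or not raining)
# without brackets, 'and' is evaluated first, i.e. this is read as
# (day == "Saturday" and temperature > 15) or (not raining):
without_brackets = day == "Saturday" and temperature > 15 or not raining

print('With brackets:   ', with_brackets)
print('Without brackets:', without_brackets)
```

Without the brackets, the test passes on any dry day, regardless of the day of the week, which is not what was intended.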

**Question 1.** Edit the code below to allow for a walk on any day of the weekend (not just Saturday).

In [None]:
raining = True
day = "Sunday"
temperature = 22

if day == "Saturday" and (temperature > 15 or not raining):
    print("Go for a walk")
else:
    print("Stay indoors")

Importantly, we can embed the conditional statement inside the `for` loop to perform different operations depending on the values of the data:

In [None]:
for i in range(6):
    # Test whether i is divisible by 2:
    if i % 2 == 0:
        print(i,'is even')
    else:
        print(i,'is odd')

Note the various levels of indentation. The `if` and `else` statements are indented once to indicate they are inside the `for` loop. The `print` statements are indented again because they sit inside the `if` and `else` blocks. As a simple rule, always add an extra level of indentation following the `:` in a `for` loop or `if` statement.
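
The same pattern is useful for picking out particular values from a data set. As a rough sketch, assuming a made-up list of boiling points (the values and variable names are illustrative only), we can flag the values above the expected range and count the rest:

```python
# hypothetical boiling points in degrees Celsius:
boiling_points = [-42.1, 36.1, 98.4, 174.1, 287.0]
max_temp = 250

n_in_range = 0
for bp in boiling_points:
    if bp > max_temp:
        print(bp, 'is above the expected range')
    else:
        # indented twice: inside both the loop and the else block
        n_in_range += 1

# not indented: runs once, after the loop has finished
print(n_in_range, 'values are in range')
```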

A `for` loop continues for a fixed number of iterations. This is useful when we know beforehand how long we want to loop for, but this is not always the case. In contrast, we can use a `while` loop to continue for as long as some condition holds:

In [None]:
i = 0
while i < 10:
    print(i)
    i += 3

Here we continue to print `i` while the conditional statement evaluates to `True`. After `9` has been printed, `i` is increased to 12, the condition `i < 10` is `False`, and we exit from the loop. Note the importance of *initialising* `i` on the first line (in this case to zero). Without that line, Python would either stop with a `NameError` or, in a Jupyter notebook, silently carry on from whatever value `i` was left with by a previously run cell. Here are some more examples illustrating the `break`, `continue` and `else` statements:

In [None]:
i = 0
while i < 100000:
    i += 1
    if ( i % 23 == 0 and i % 857 == 0 ):
        print(i, 'is divisible by 23 and 857')
        break

The `break` command inside a loop immediately halts the loop once the conditional statement has been satisfied (in this case we're only interested in the smallest number that is divisible by both 23 and 857). If you remove it, the code will find all answers that satisfy the requested condition.

Similarly, `continue` moves on to the next iteration in the loop, without executing the rest of the code:

In [None]:
i = 0
while i < 10:
    i += 1
    if ( i % 2 == 0):
        continue
    print(i,'is odd')

In [None]:
# (avoid naming a variable 'list', which would hide Python's built-in list type)
numbers = [0, 4, 5, 8, 45]
for i in numbers:
    if i < 0:
        print(i,'is negative')
        break
else:
    print('error: cannot find any negative numbers')

The last piece of code is designed to pick the first occurrence of a negative number out of a list. Your code might depend on being able to find that number, so we have used the `else` statement to print an error message if it cannot be found. More formally, an `else` clause can follow a `for` (or `while`) loop that contains a `break` statement. If the loop finishes without the `break` being triggered, then the `else` clause is executed.
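
For comparison, here is a sketch of the opposite case, using a made-up list that does contain negative numbers. The variable `first_negative` is introduced here only to store the result:

```python
# hypothetical data containing two negative numbers:
numbers = [0, 4, -5, 8, -45]

first_negative = None
for i in numbers:
    if i < 0:
        first_negative = i
        print(i, 'is negative')
        break   # leaving the loop via break skips the else clause
else:
    print('error: cannot find any negative numbers')
```

This time the `break` is triggered at the first negative value, so the loop never reaches the later entries and the error message is not printed.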

By now you might have noticed that there is often more than one way to perform a given task in Python, and the choice of method often comes down to *readability* of the code or *efficiency* (time saved when running the code). Don't worry about this too much at the moment; in this course we're focussing on producing code that works. Very rarely will you be penalised for not doing something in the most efficient way possible.

**Example.** Write some code to test whether **36563** is a prime number.

In [None]:
i = 36563
j = 1
while j < i-1:
    j+=1
    if ( i % j == 0 ):
        print(i,'is divisible by',j)
        break
else:
    print(i,'is a prime number!')

(NB if you ever get stuck in a very long or *infinite loop* in Jupyter (this would happen if we left out line 4 above), you can go to Kernel > Interrupt in the menu bar to stop the calculation.)
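
One common way to guard against an accidental infinite loop is to add a cap on the number of iterations to the `while` condition. The sketch below is a made-up example (the starting concentration, threshold and `max_steps` are all illustrative assumptions) that counts how many half-lives it takes for a concentration to fall below a threshold:

```python
conc = 1.0          # hypothetical starting concentration (mol/L)
threshold = 0.01    # stop once the concentration falls below this value
max_steps = 1000    # safety cap on the number of iterations

n_half_lives = 0
while conc > threshold and n_half_lives < max_steps:
    conc /= 2            # each half-life halves the concentration
    n_half_lives += 1

print('Stopped after', n_half_lives, 'half-lives with concentration', conc)
```

Even if the threshold could never be reached (e.g. a negative value), the loop would still stop after `max_steps` iterations.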

**Question 2.** Here is a random DNA sequence. Write some code to output the complementary DNA sequence (the complementary base pairs are A-T and C-G). Include a warning to be output if the string contains any erroneous bases (i.e. not a, t, c, or g). The output should be formatted in the same way, i.e. no spaces or line breaks (see week 2 for the use of `end=''`): ```cttaccctatagtatgtctaggcagttagctgtctaccatgccgcagacttatcagttcgactgcttcgagctggttaggaaagctacaccgtgtccgcg```
*(Hint: you can loop over the characters of a string, just as you would for a list).*

### File Input / Output

We don't usually want to copy data in/out of our Python programs by hand. Instead, we will typically read input data from a file and/or write edited data to an output file.

In [None]:
f = open('myfile.txt', 'w')
f.close()

The above code snippet opens a file for writing (denoted by the `'w'` character). This will create a new file (or over-write a file with the same name) in the directory where this Jupyter notebook sits (e.g. your OneDrive). Note that we have also closed the file, but this will also happen automatically when the program terminates. We can re-direct the output of the `print` command to our file:

In [None]:
f = open('myfile.txt', 'w')
print('Hello World!', file=f)
f.close()

Check that the file `myfile.txt` has been written to your directory and contains the printed text.
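
An alternative that is common in Python code is the `with` statement, which closes the file automatically at the end of the indented block, even if an error occurs part-way through. A short sketch (the file name `myfile_with.txt` is just an illustrative choice, and `read()` returns the whole file as a single string):

```python
with open('myfile_with.txt', 'w') as f:
    print('Hello World!', file=f)
# the file is closed automatically once the indented block ends

with open('myfile_with.txt', 'r') as f:
    contents = f.read()

print(contents, end='')
```

Both styles work; the rest of this workshop uses explicit `open()` and `close()` calls.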

**Example.** Recall that last week we used a `for` loop to calculate the change in concentration of a reactant A with time. Here is the code block again, where this time I have modified the last line to print the time and final concentration to a file `data.csv` (I have also shortened the time step so we don't have too much data to print later):

In [None]:
# set initial concentration [A]_0 and rate constant:
conc_a = 1.0 # mol/L
k = 0.1 # s^-1

f = open('data.csv', 'w')

# discrete time step to use (in seconds):
timestep = 0.01

# total time (in seconds):
tot_time = 10

# iterate over the required number of time steps:
for i in range(int(tot_time / timestep)):
    change_in_a = -k * conc_a * timestep # change in concentration of [A]
    conc_a += change_in_a # update new concentration of [A]

# print final answer after loop has terminated:
print(tot_time, conc_a, sep=',', file=f)
f.close()

Check the code works and you get the expected output printed to `data.csv`. Note that I have used `sep=','` here to separate the data by a comma (rather than a space). This will make the file readable by Excel later on.

**Question 3.** Change the code above to instead write the concentration `conc_a` and time (in seconds) to a file every 10 time steps. *(Hints: You will need to move the `print` statement inside the `for` loop, think about how to calculate the time from the step counter `i`, and use the `%` mathematical operator to only output every 10th step).*

**(Not assessed).** Check that you can open your new `data.csv` file in Excel, and plot the decay in concentration with time. Note that we will cover how to do this in the Jupyter notebook, without exporting to Excel, next week.

Similarly, we can read in the data from a file one line at a time using the `readline()` command, or all at once using `readlines()`. Let's first write a new file to experiment with:

In [None]:
f2 = open('data_forreading.dat', 'w')

# list of (arbitrary) data for storing:
data = [3.4, 2.6, 4.7, 5.0, 9.5]

# print line numbers and data to file:
for i in range(5):
    print('Line {0} {1}'.format(i, data[i]), file=f2)
f2.close()

And now read from the file, one line at a time:

In [None]:
f2 = open('data_forreading.dat', 'r')
# print first line:
print(f2.readline())
# print second line:
print(f2.readline())
f2.close()

(Note that `readline()` returns the new-line character at the end of each line along with the data. Since `print()` then adds a new line of its own, the output above is double-spaced.)
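
The hidden new-line character can be made visible with the built-in `repr()` function, and removed with the string method `rstrip()`. A minimal sketch, using a small throwaway file (the name `newline_demo.dat` is an arbitrary choice):

```python
# write a small two-line file to experiment with:
f3 = open('newline_demo.dat', 'w')
print('Line 0 3.4', file=f3)
print('Line 1 2.6', file=f3)
f3.close()

# read back the first line only:
f3 = open('newline_demo.dat', 'r')
line = f3.readline()
f3.close()

print(repr(line))              # repr() shows the trailing '\n' explicitly
stripped = line.rstrip('\n')   # remove the new-line character
print(repr(stripped))
```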

We can also read the entire contents of a file at once using `readlines()`, which returns a list of lines that we can loop over just as we would any other list:

In [None]:
f2 = open('data_forreading.dat', 'r')
for line in f2.readlines():
    print(line, end='')
f2.close()

In the example above we have used a `for` loop to iterate over the contents of the file. Note that the `end=''` argument stops `print` from adding a second new line after each line of the file.

**Example.** Take the file `data_forreading.dat` and use it to output the square of the data on each line.

In [None]:
f2 = open('data_forreading.dat', 'r')
for line in f2.readlines():
    # create a list of substrings:
    fields = line.split(" ")
    print('The square of the input data on line {0} is \
{1:6.3f}'.format(fields[1], float(fields[2])**2))

f2.close()

This example illustrates an important point. The data is read in as a string (type `str`). This is useful as it means we can perform operations like `split(" ")` (see Week 2), which splits the string into substrings that were separated by a blank space (`" "`). We can then operate on elements of the data separately, so e.g. `fields[1]` contains the line number and `fields[2]` contains the data. Note that `fields[2]` is still a string, so we need to convert it into a floating point number with `float(fields[2])` before performing maths on it.
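
The same read, split and convert steps can be used to accumulate a result over the whole file. As a sketch, the code below writes a small file in the same format as before (the file name `mean_demo.dat` is an arbitrary choice) and then calculates the mean of the data column:

```python
# write a small data file in the 'Line <number> <value>' format:
data = [3.4, 2.6, 4.7, 5.0, 9.5]
f4 = open('mean_demo.dat', 'w')
for i in range(len(data)):
    print('Line {0} {1}'.format(i, data[i]), file=f4)
f4.close()

total = 0.0
n_lines = 0
f4 = open('mean_demo.dat', 'r')
for line in f4.readlines():
    # split() with no argument splits on any whitespace,
    # which also discards the trailing new-line character:
    fields = line.split()
    # convert the string to a number before adding it up:
    total += float(fields[2])
    n_lines += 1
f4.close()

mean = total / n_lines
print('Mean of {0} values: {1:6.3f}'.format(n_lines, mean))
```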

### Functions

Once your code starts getting longer, *functions* are invaluable for making sure you are not continually having to re-write the same code, and for breaking down complex problems into more manageable chunks. A function is a set of statements that can be called from anywhere in your code. A function is defined using the `def` statement:

In [None]:
def add(a, b):
    sum_ab = a + b
    return sum_ab

print(add(14, 19))

Following the `def` statement, there is a function name and the *arguments* `(a, b)` that the function expects to receive.

The function itself is indented following the colon `:`, and the `return` statement specifies what information should be returned to the main code. The function can be called any number of times with different arguments.

In the above case, the function `add` is called on the last line. The arguments `a` and `b` are set to 14 and 19 respectively, and the function returns the sum of the two arguments (33). (We avoid the name `sum` for the function, since Python already has a built-in function of that name.)
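
To illustrate calling one function several times with different arguments, here is a small sketch that converts temperatures from degrees Celsius to kelvin. The function name `celsius_to_kelvin` and the list of temperatures are made up for this example:

```python
def celsius_to_kelvin(temp_c):
    # convert a temperature from degrees Celsius to kelvin
    return temp_c + 273.15

# call the same function once for each temperature in the list:
for temp_c in [-100, 0, 25, 250]:
    print(temp_c, 'C =', celsius_to_kelvin(temp_c), 'K')
```

The conversion is written only once, so if it ever needed correcting there would be a single place to change.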

In [None]:
import math
def roots(a, b, c):
    """Return the roots of ax^2 + bx + c"""
    d = b * b - 4 * a * c
    r1 = (-b + math.sqrt(d)) / (2 * a)
    r2 = (-b - math.sqrt(d)) / (2 * a)
    return r1, r2

In [None]:
print(roots.__doc__)
a = 1
b = -1
c = -6
roots(a, b, c)

In the above example, we are calculating the solutions to the equation: ax$^2$ + bx + c = 0. Note that the function returns two values inside a *tuple* (the roots obtained using the + and - signs in the quadratic formula). We have also used a *docstring* as the first line of the function, enclosed in three quotation marks. The docstring gives information on how to use the function, which arguments should be provided, and what values are returned. This can be printed later (as with `print(roots.__doc__)` in the code block above) and used as a basis for documenting your code.
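
A returned tuple can either be stored as a single object or *unpacked* into separate variables. The sketch below repeats the `roots` function so that it runs on its own; the names `both`, `x1` and `x2` are illustrative:

```python
import math

def roots(a, b, c):
    """Return the roots of ax^2 + bx + c"""
    d = b * b - 4 * a * c
    r1 = (-b + math.sqrt(d)) / (2 * a)
    r2 = (-b - math.sqrt(d)) / (2 * a)
    return r1, r2

# the returned tuple can be stored as a whole ...
both = roots(1, -1, -6)
# ... or unpacked straight into two separate variables:
x1, x2 = roots(1, -1, -6)

print(both)
print(x1, x2)
```

Unpacking is convenient when the two values are going to be used separately later in the code.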

Note also that in the `roots` function, we have defined the variable `d` inside the function. We say that `d` is a *local* variable, while for example `a` and `b` defined above are *global* variables (that is, they are available everywhere in the program). That is all fine, but just note that `d` is undefined outside the function, e.g. if we try to print it we get an error:

In [None]:
print(a)
print(b)
print(c)
print(d)
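
The sketch below shows the same behaviour in a form that runs to completion. It uses `try` and `except`, which are not covered in this workshop and appear here only to catch the error; the function `discriminant` and the variable `disc` are made up for the example:

```python
def discriminant(a, b, c):
    disc = b * b - 4 * a * c   # 'disc' is local to the function
    return disc

# the returned value is available in the main code:
value = discriminant(1, -1, -6)
print(value)

caught = False
try:
    print(disc)                # 'disc' itself does not exist out here
except NameError as err:
    caught = True
    print('NameError:', err)
```

If a value calculated inside a function is needed elsewhere, it should be passed back using `return`.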

**Question 4.** Use the function below to calculate the equilibrium constant for the reaction:

$$2\text{H}_2\text{(g)} + \text{O}_2\text{(g)} \rightarrow 2\text{H}_2\text{O(g)} $$

from the following data:

| Component | $\Delta_fH$ / kJ mol$^{-1}$ | $S$ / kJ mol$^{-1}$ K$^{-1}$ |
| --- | --- | --- |
| O$_2$ | 0 | 0.2050 |
| H$_2$ | 0 | 0.1307 |
| H$_2$O | -241.8 | 0.1888 |

Make use of a loop to perform your calculation at five different temperatures (300K, 310K, 320K, 330K, 340K). *(Hint: I have included comments in the cell below to help you).*


In [None]:
def equilibrium_constant(deltaH_products, deltaH_reactants, S_products, S_reactants, T):
    """Return equilibrium constant from enthalpy/entropy of 
    reactants/products at given temperature"""
    deltaH = deltaH_products - deltaH_reactants
    deltaS = S_products - S_reactants
    Gibbs_free_energy = deltaH - T * deltaS
    K = math.exp(-1.0 * Gibbs_free_energy/(R * T))
    return K

In [None]:
import math

# define global variable (universal gas constant):
R = 8.3144626181 * 1e-3 # kJ/K/mol

# calculate total enthalpy & entropy changes
# of reactants and products (remember that we 
# need 2 moles of H2 and H2O for every mole of O2):


# use a loop to calculate K at 5 different temperatures:



### Learning outcomes

In today's workshop, you have learned how to:
* Use **conditional** statements to control flow through our code;
* **Read** and **write** from data files;
* Define **functions** to simplify our code.