# Programming for Chemists: Key data types and uses

**Importance for scientists:**
* Data types are required for every program using numerical methods.
* Scientific data may be a mixture of output data types and we need to know how best to process these data.

In this session we will cover the key data types in Python, their common uses and how they interact with one another. A data type is a classification of data items that determines which values an item can take and which operations can be performed on it. Numeric, non-numeric and Boolean (true/false) data are the most commonly used data types; however, each programming language has its own classification, largely reflecting its programming philosophy.

Python differs from many other programming languages in not requiring the user to specify the data type when declaring a variable. Python is **dynamically typed** rather than "statically typed": types belong to values and are checked while the program runs. Python is also completely **object oriented**. Object Oriented Programming (OOP) is a computer programming model that organises software design around data, or objects, rather than functions and logic. OOP focuses on the **objects that developers want to manipulate** rather than the logic required to manipulate them.

You do not need to declare variables before using them, or declare their type, as the Python interpreter automatically determines the type from the value that is assigned. We can check the data type of a value or variable using the `type()` function:

In [None]:
type(1)

This returns `int` defining an integer data type which we will now explore in greater detail along with other numeric data types.
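As a minimal sketch of what this means in practice, the same variable name can be bound to values of different types, and `type()` reports the type of whatever value the name currently refers to. The variable name `sample` is purely illustrative.

```python
sample = 1                  # an integer
first_type = type(sample)

sample = 1.5                # the same name now refers to a float
second_type = type(sample)

sample = "benzene"          # ...and now to a string
third_type = type(sample)

print(first_type, second_type, third_type)
```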

## Numeric data types

There are three distinct **numeric types** in Python: 

1. Integers
2. Floating point numbers
3. Complex numbers

### **Integers**

An `integer` is a whole number, positive or negative, that can be written without a fractional/decimal component, e.g. 1, 12, -643656 etc.

<font color='red'>Assign the integer -643656 to the variable x and print its type:</font>

### **Floating Point Numbers**

`Floating point numbers` are numbers that **do** have a fractional part, e.g. 7.3715 or 3.14159. The "float" refers to the fact that a number's decimal point can be placed anywhere relative to the significant digits of the number. In general, floating point numbers are represented approximately to a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base. For example, the floating point number 4.1384 can be written as:

\\[
4.1384 = \underbrace{41384}_{\text{significand}} \times \underbrace{10}_{\text{base}} \overbrace{^{-4}}^{\text{exponent}}
\\]

The position of the decimal point is indicated by the exponent, and thus the floating-point representation is very similar to scientific notation. Understanding floating point numbers and their limitations is vital as they are one of the most used data types in all of science. Say an astrophysicist wants to use the speed of light, $c \approx 299\,792\,458 \text{ m s}^{-1}$, and Newton's gravitational constant, $G \approx 0.0000000000667 \text{ m}^{3} \text{ kg}^{-1} \text{ s}^{-2}$, in the same calculation. How do we handle the difference in magnitudes between these numbers? With floating point numbers!
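As a short illustration, Python accepts scientific notation directly using `e` to separate the significand from the base-10 exponent, so both constants can be written compactly and used in the same expression. The values are those quoted above; the variable names are just for illustration.

```python
c = 2.99792458e8    # speed of light in m/s, i.e. 2.99792458 x 10^8
G = 6.67e-11        # gravitational constant as quoted above, i.e. 6.67 x 10^-11

print(c)            # 299792458.0
print(G)            # 6.67e-11

# Both forms of writing G describe the same floating point number
print(G == 0.0000000000667)   # True

# Numbers of very different magnitude can be combined in one calculation
print(c * G)
```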

In Python, floating point numbers are usually implemented using the `double` type from the `C` programming language, meaning they are accurate to approximately 15-16 significant decimal digits. The precision of floating point numbers can in principle vary between computers, but for nearly all applications this does not pose an issue.

Floating point numbers have the advantage of being able to represent very large and very small numbers, but unfortunately many decimal numbers cannot be represented exactly once they are converted into a binary (base 2) representation. Because computer memory is limited, the fractional part has to be 'cut off' at some point before continuing with the calculation; the `double` data type keeps approximately 15-16 significant decimal digits. Any digits past this point should be considered **incorrect**. Unfortunately these error digits can cause headaches for programmers as they can accumulate throughout a calculation and cause rounding errors.

Let us look at a simple example of precision loss due to representing decimal numbers as floating point numbers, also known as **floating point rounding error**. Consider the addition of 0.1 and 0.2:

In [None]:
print(0.1 + 0.2)

This is technically incorrect as the answer should be **exactly** 0.3, but for nearly all applications this number is 'numerically' correct. This will not affect an engineer building a bridge, as it does not matter that the number is incorrect at around the 17th decimal place, but other scientific disciplines cannot be as relaxed. The ATLAS experiment at the Large Hadron Collider (LHC) relies crucially on the ability to track charged particles with exquisite precision (10 microns over a 10m length) and high reliability (over 99% of roughly 1000 charged particles per collision correctly identified)$^{[1]}$. It is critical that any rounding errors are not ignored as they could result in a missed collision or a mis-identified one.
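One practical consequence is that floating point numbers should rarely be compared with `==`. The sketch below shows one common alternative, comparing within a tolerance using `math.isclose()` from the standard library; the tolerance of `1e-9` in the last line is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

# An exact comparison fails because of the rounding error
print(total == 0.3)               # False

# Comparing within a (relative) tolerance behaves as expected
print(math.isclose(total, 0.3))   # True

# The same idea written by hand with an absolute tolerance
print(abs(total - 0.3) < 1e-9)    # True
```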

Floating point errors should be taken very seriously by scientists as some major catastrophes have resulted due to misunderstanding of how numbers are represented on computers, including:

* **1991 Patriot missile defence disaster$^{[2]}$:** An American Patriot missile defence battery failed to intercept an incoming Scud missile, which killed 28 soldiers. The internal clock of the defence system measured and stored time as an integer value in units of tenths of a second, so this number was multiplied by 1/10 to produce the time in seconds. The value 1/10 has a non-terminating binary representation and was chopped at approximately 7 digits after the decimal point. The Patriot battery had been active for around 100 hours, and these truncations of the time measurements resulted in an effective time error of ~ 0.34 seconds. A Scud missile travels at 1,676 metres per second, travelling more than half a kilometre in this time. This resulted in the system mis-tracking the incoming Scud projectile, instead scanning an area of airspace more than 500 metres from the target.

* **1996 Ariane 5 rocket explosion$^{[3]}$:** Shortly after the launch of the rocket, the inertial guidance system produced a number which was interpreted by the rocket's on-board computer as a course change. The on-board computer then reacted correctly to get back on the right course based on that number. However, even though the number from the guidance system looked like a course change, it was not. The guidance system had actually shut down because of a number conversion error which was not handled correctly. The shut down was caused when the software attempted to convert a 64-bit floating point number related to the rocket's horizontal velocity into a 16-bit signed integer, which can only represent 65,536 distinct values (-32,768 to 32,767). For the first few seconds of flight, the rocket’s velocity was low, meaning the conversion between these two values was successful. As the rocket’s velocity increased, the 64-bit value became too large and could no longer fit in a 16-bit variable. This resulted in an `integer overflow` error which was not handled correctly, resulting in the **\$500 million** unmanned rocket exploding seconds after launch, a catastrophe that could have been prevented by just a few lines of code. Below is a clip of the sudden, incorrect course change the Ariane 5 rocket made:

<center><img src='https://media.giphy.com/media/YqWdjFCg5igsXDF9W2/giphy.gif'></center>
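The size of the Patriot clock error can be reproduced with a few lines of arithmetic. This is a rough sketch rather than a model of the real system: it assumes that 1/10 was chopped after 23 binary digits following the binary point (roughly 7 decimal digits), which is the figure used in commonly cited analyses of the failure and is not stated in the text above. `fractions.Fraction` is used so that the sketch itself does not introduce rounding error.

```python
from fractions import Fraction

FRACTION_BITS = 23    # ASSUMPTION: binary digits kept after the binary point

exact_tenth = Fraction(1, 10)

# Chop (truncate) 1/10 to the assumed number of binary digits
chopped_tenth = Fraction(int(exact_tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = exact_tenth - chopped_tenth

# The clock counted tenths of a second for around 100 hours
ticks = 100 * 60 * 60 * 10
clock_drift = float(error_per_tick * ticks)      # seconds

scud_speed = 1676                                # metres per second
tracking_error = clock_drift * scud_speed        # metres

print(f"Clock drift: {clock_drift:.2f} s")
print(f"Tracking error: {tracking_error:.0f} m")
```

Under this assumption the drift comes out at roughly 0.34 seconds, consistent with the figure quoted above.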

A useful feature of Python is that **integers** are implemented using "arbitrary precision" as they are stored as a `digit array` object which is variable in length and hence integer representations are only limited by the memory of the computer. Arbitrary precision integers mean that as long as the integer can fit in your computer's memory, mathematical operations applied to said integer should result in exactly correct numerical answers; with no rounding errors.   
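A quick way to see the difference between arbitrary precision integers and floating point numbers is to add 1 to a very large number and then subtract the original number again. The choice of `2 ** 100` is arbitrary.

```python
big = 2 ** 100
print(big)    # every digit of this integer is exact

# Integer arithmetic is exact: adding 1 really changes the number
int_difference = (big + 1) - big
print(int_difference)      # 1

# The same calculation with floats loses the 1 to rounding
float_difference = (float(big) + 1) - float(big)
print(float_difference)    # 0.0
```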

### **Complex Numbers**

`Complex numbers` are an extension of the familiar real number system in which all numbers are expressed as a sum of a real part and an imaginary part. Imaginary numbers are real multiples of the imaginary unit (the square root of -1), often written `i` in mathematics or `j` in engineering. Python has built-in support for complex numbers, which are written with this latter notation; the imaginary part is written with a `j` suffix, e.g., `3 + 1j`. They can also be defined using the `complex()` function.

In [None]:
# Two ways to define a complex number in Python
3 + 1j

complex(3, 1)

We can access the real and imaginary parts using the `.real` and <font color='red'>`.imag` attributes:</font>

In [None]:
x = 3 + 1j

print(x.real)

print(x.imag)
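Complex numbers support the usual arithmetic too. The short sketch below shows the modulus, the complex conjugate and a square root of a negative number using the standard library `cmath` module; the value of `z` is chosen only because it gives round numbers.

```python
import cmath

z = 3 + 4j

print(abs(z))                # 5.0, the modulus of z
print(z.conjugate())         # (3-4j)
print(z * z.conjugate())     # (25+0j), the modulus squared

# cmath.sqrt works for negative inputs, unlike math.sqrt
print(cmath.sqrt(-1))        # 1j
```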

### **Arithmetic Operators**

Python supports standard mathematical operations such as `+`, `-`, `*`, `/` along with several others listed in the table below.

| Operator | Name                    | Example   |
|:---------|:------------------------|:----------|  
| `+`      | Addition                | `x + y`   |
| `-`      | Subtraction             | `x - y`   |
| `*`      | Multiplication          | `x * y`   |
| `/`      | Division                | `x / y`   | 
| `%`      | Modulus                 | `x % y`   | 
| `**`     | Exponentiation (power)  | `x ** y`  |
| `//`     | Floor division          | `x // y`  |

Most of these are commonplace, but we will discuss the meaning of `Modulus` and `Floor division`. The `Modulus` operator yields the remainder when the first operand is divided by the second. Consider the following examples:

In [None]:
x = 20
y = 3

print(x % y) # As 20 is not a multiple of 3 we are left with a remainder of 2

# It also works for floating point numbers
x = 15.79354
y = 3

print(x % y)

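**Floor division**, also referred to as integer division, divides and then rounds the result down to the nearest whole number (towards negative infinity). For positive numbers this is simply the whole-number part of the quotient. It also works when an operand is a floating point number, though the result’s **type** is then `float` rather than `int`: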

In [None]:
x = 30.60245
y = 5

x // y # Prints the integer part but the type is `float`
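Because floor division rounds down rather than towards zero, the result for negative numbers can be surprising. The sketch below compares `//` with truncation using `int()`, and shows the built-in `divmod()` function, which returns the floor division result and the modulus together.

```python
print(7 // 2)         # 3
print(-7 // 2)        # -4, rounded down towards negative infinity
print(int(-7 / 2))    # -3, int() truncates towards zero instead

# divmod returns (x // y, x % y) as a tuple
quotient, remainder = divmod(-7, 2)
print(quotient, remainder)    # -4 1

# The two results always satisfy: x == quotient * y + remainder
print(quotient * 2 + remainder)    # -7
```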

Here are examples of each of the operations in the above table:

In [None]:
x = 12
y = 5.3

# Addition
print("Addition:", x + y)

# Subtraction
print("Subtraction:", x - y)

# Multiplication
print("Multiplication:", x * y)

# Division
print("Division:", x / y)

# Modulus
print("Modulus:", x % y)

# Exponentiation
print("Exponentiation:", x ** y)

# Floor division
print("Floor division:", x // y)

## Boolean 

Booleans are a data type that exists in almost all programming languages, acting as a "switch" representing "True" or "False". You can declare a boolean value in your code using the keywords `True` and `False` (note the uppercase). <font color='red'>Let us assign a boolean data type to a variable:</font>

`python_is_fun = True`

More commonly, a boolean value is returned as a result of some kind of comparison. Consider the comparison of two integers:

In [None]:
x = 1
y = 10

x == y # Does x == y?

Note the use of double equals `==` which represents the `Equal to` operator. This is one of many comparison operators in Python, with others listed in the table below.

| Operator | What it means            | Example  |
|:---------|:-------------------------|:---------|
| `==`     | Equal to                 | x == y   |
| `!=`     | Not equal to             | x != y   |
| `<`      | Less than                | x < y    |
| `>`      | Greater than             | x > y    |
| `<=`     | Less than or equal to    | x <= y   |
| `>=`     | Greater than or equal to | x >= y   |

Here are some more examples of comparison operators. <font color='red'>Work out the (True/False) output before typing and running them, or try some of your own:</font>

`1 != 4`

`23 > 23`

`56 <= 100`

`"String" == "string"`

What if we have multiple conditions that decide the output of a program? We can use logical operators to evaluate whether two or more expressions are `True` or `False`. There are three logical operators that are used to combine or invert expressions, each of which evaluates to a Boolean value of either `True` or `False`. These operators are `and`, `or`, and `not` and are defined in the table below.

| Operator   |      What it means                                   | Example    |
|----------  |:-------------                                        |:---------  |
| `and`      |  True if **both** the operands are true                  | x and y    |
| `or`       |  True if **at least one** of the operands is true        | x or y     |
| `not`      |  True if operand is false (complements the operand)  | not x      |

Consider some more complex examples given below utilising `and`, `or` and `not` for comparisons:

In [None]:
(1 != 1) and (2 < 3)

In [None]:
(10 > 1) or (4 < 1)

In [None]:
x = True
not x

In [None]:
x = False

3 > 1 and not x
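To connect this to laboratory data, here is a small sketch that stores the results of comparisons in variables and combines them with logical operators. The variable names and the threshold values are hypothetical and chosen only for illustration.

```python
ph = 3.1
temperature = 110

# Comparisons return Booleans that can be stored like any other value
is_acidic = ph < 7                # True
is_too_hot = temperature > 100    # True

# Combine the conditions with logical operators
safe_to_handle = is_acidic and not is_too_hot
needs_attention = (ph < 2) or is_too_hot

print("Safe to handle:", safe_to_handle)      # False
print("Needs attention:", needs_attention)    # True
```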

Boolean logic is a very useful way to make your code behave differently based on current conditions. Booleans see their main use in conjunction with the `if`, `elif` and `else` conditional statements, which we will cover in the next session.

## Sequence data types

A sequence is an ordered collection of similar or different data types allowing for storage of multiple values in an efficient and organised fashion. Python has a variety of sequence data types but we will focus on the three key ones:

1. Strings
2. Lists
3. Tuples

### **String** 

A string value is a collection of one or more characters put in single, double or triple quotes.

In [None]:
print('This is a string in single quotes')

print("This is a string in double quotes")

print('''This is a string in triple quotes''')

`''` and `""` are equivalent. If you have an apostrophe in your string, it is easier to use `""` so the string does not terminate at the apostrophe. If you have double quotes in the string, it is easier to use `''` so the string does not terminate at the quotes. Triple quotes (both `"""` and `'''` are permitted) allow the string to contain line breaks. A Python string can contain as many characters as you wish. The only limit is your machine’s memory.
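The quoting rules above can be sketched as follows. The example strings are made up for illustration.

```python
# Double quotes let the string contain an apostrophe
apostrophe = "Avogadro's number"

# Single quotes let the string contain double quotes
quoted = 'The label read "flammable"'

# Triple quotes let the string span several lines
multi_line = """Step 1: weigh the sample
Step 2: dissolve in water"""

print(apostrophe)
print(quoted)
print(multi_line)
print(len(multi_line.splitlines()))    # 2 lines
```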

**String formatting:**

The `.format()` method formats the specified value(s) and inserts them into placeholders in the string, <font color='red'>specified using `{}`:</font>

`print("Tom has {} apples and gives 3 to {}".format(no_apples, person))`

In [None]:
no_apples = 
person =
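Separately from the exercise above, here is a brief sketch of two other things placeholders can do: control how a number is displayed, and refer to values by name. The variable names and values are hypothetical.

```python
compound = "lemon juice"
ph = 2.2345

# {:.1f} formats the number as a float with one decimal place
message = "The pH of {} is {:.1f}".format(compound, ph)
print(message)    # The pH of lemon juice is 2.2

# Placeholders can also be named, which makes long strings easier to read
named = "{name} has a pH of {value}".format(name=compound, value=ph)
print(named)      # lemon juice has a pH of 2.2345
```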


### **List**

A list object is an ordered collection of one or more data items, not necessarily of the same type, put in square brackets.

In [None]:
lab_shopping = ["Conical flasks", "Bunsen burners", "Test tubes"] # A list of strings

data_type_list = ["String", 22, 5.36456, True] # A list of multiple data types

A very important quality of Python lists is that they use **zero-based indexing**, meaning the first item in the list has index 0, **not** 1; an index of 1 refers to the second item, and so on. Specific items are extracted from the list by their index using the notation `List[index]`. Let us extract the first two items from the following list:

In [None]:
lst = ["Item 1", "Item 2", "Item 3"] # Create a list

print(lst[0]) # Print the first item, index 0

print(lst[1]) # Print the second item, index 1

You can also use negative indices to count backwards from the end of the list: an index of -1 is the last item, -2 is the second to last, and so on. Consider extracting the last item of the following list:

In [None]:
lst = [1, 2, 3, 4, 5] # A list of integers

lst[-1] # Extract the last item of the list by using a negative index
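As a short illustration, a negative index `-n` refers to the same item as the positive index `len(lst) - n`:

```python
lst = [1, 2, 3, 4, 5]

print(lst[-1])    # 5, the last item
print(lst[-2])    # 4, the second to last item

# A negative index is shorthand for counting from the length of the list
print(lst[-1] == lst[len(lst) - 1])    # True
```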

There are a lot of useful operations that can be applied to lists including: 

In [None]:
# Create a list
lst = [1, 5] 

# Add an item, x, to the end of the list using .append(x)
lst.append(7) 
print("Item appended:", lst)

# Insert an item in the list at a specified position using .insert(i,x). i is the index of the element before which to insert, x is the item to insert
lst.insert(1, 3) 
print("Item inserted:", lst)

# Count the number of times an item appears in the list using .count(x), e.g. the number 1:
print("Count items:", lst.count(1)) 

# Reverse the order of elements in the list in place using .reverse()
lst.reverse() 
print("Reverse list:", lst)

# Count the number of items in a list using len(list)
print("Number of items:", len(lst)) 

### **Tuple** 

Lists are enclosed in square brackets `[ ]` and their elements and size can be changed: they are **mutable**. Tuples are enclosed in parentheses `( )` and cannot be updated: they are **immutable**.

In [None]:
our_first_tuple = ("apple", "banana", "orange") # Make a tuple
print(our_first_tuple)

You can access tuple items just like we did with list items, by referring to the index number, inside square brackets:

In [None]:
our_first_tuple[0]
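The difference in mutability can be seen by trying to change the first item of a list and of a tuple. In this sketch the attempt to modify the tuple raises a `TypeError`, which is caught here only so that the example runs to the end.

```python
fruits_list = ["apple", "banana", "orange"]
fruits_list[0] = "pear"    # Lists are mutable, so this works
print(fruits_list)

fruits_tuple = ("apple", "banana", "orange")
try:
    fruits_tuple[0] = "pear"    # Tuples are immutable, so this fails
except TypeError as error:
    print("Could not change the tuple:", error)

print(fruits_tuple)    # The tuple is unchanged
```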

There is a culture of using tuples for collections of **heterogeneous** objects which are data with high variability of data types and formats; for example an address broken into name, street, city, county and post code.

Lists on the other hand are often used for **homogeneous** objects which share the same data type or format. It should be noted that these are just conventions and it is up to the user what structure they use. The important deciding factor for choosing a tuple or list is the mutability of your data. Must it be mutable? Use a **list**. Must it not be mutable? Use a **tuple**.

Tuples may seem a limited version of a Python list but they do have unique advantages:
    
* Tuples are typically slightly faster to create and use less memory than lists. If you're defining a constant set of values and are only ever going to iterate through it, use a tuple instead of a list.
* It makes your code safer if you “write-protect” data that does not need to be changed. Using a tuple instead of a list guards against accidental modification, as Python raises an error if code tries to change one of its items.
* Some tuples can be used as dictionary keys (specifically, tuples that contain only immutable values like strings, numbers, and other such tuples). Lists can **never** be used as dictionary keys, because lists are mutable. We will discuss dictionaries in the next section.
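The last point can be sketched briefly, using the dictionary syntax that is introduced in the next section. The dictionary `grid` and its contents are hypothetical.

```python
# Tuples of numbers work as dictionary keys, e.g. grid coordinates
grid = {(0, 0): "origin", (1, 2): "sample A"}
print(grid[(1, 2)])    # sample A

# A list cannot be used as a key: Python raises a TypeError
list_key_allowed = True
try:
    bad_grid = {[0, 0]: "origin"}
except TypeError as error:
    list_key_allowed = False
    print("Lists cannot be keys:", error)
```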

## Dictionaries

Dictionaries are one of the most useful data types in the Python language. They are Python’s implementation of a data structure that is more generally known as an **associative array**. A dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its associated value.

Dictionaries in Python are defined by enclosing a comma-separated list of key-value pairs in curly braces `{}`. A colon `:` separates each key from its associated value. Let's consider a dictionary which stores some useful scientific data. Note, the key-value pairs can all be defined on a single line, but it is sometimes clearer if each entry is put on a separate line:

In [11]:
# Create a dictionary to hold useful scientific data from an experiment
experiment_info = {
                   'pressure'    : 20,
                   'temperature' : 110,
                   'ph'          : 3.1,
                   'salinity'    : 1.025,
                   'threshold'   : False
                  }

print(experiment_info)

{'pressure': 20, 'temperature': 110, 'ph': 3.1, 'salinity': 1.025, 'threshold': False}


When we print the entries in the dictionary they are displayed in the order they were defined, but that is irrelevant when it comes to retrieving them as dictionary elements are not accessed by numerical index like lists and tuples. A value is retrieved from a dictionary by specifying its corresponding key in square brackets (`[]`):

In [12]:
# Print the ph from the dictionary
experiment_info['ph']

3.1
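Asking for a key that does not exist raises a `KeyError`. As a minimal sketch, two common ways to guard against this are the `in` operator, which tests whether a key is present, and the dictionary `.get()` method, which returns `None` (or a default of your choice) instead of raising an error. The dictionary is redefined in shortened form so that the example is self-contained.

```python
experiment_info = {'pressure': 20, 'ph': 3.1}

# Test whether a key is present before using it
has_ph = 'ph' in experiment_info
has_volume = 'volume' in experiment_info
print(has_ph, has_volume)    # True False

# .get() returns None for a missing key instead of raising a KeyError
volume = experiment_info.get('volume')
print(volume)                # None

# A default value can be supplied as the second argument
volume_or_default = experiment_info.get('volume', 0)
print(volume_or_default)     # 0
```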

To add a new key to an existing dictionary, we just need to assign a new key and value. <font color='red'>Let's add a `volume` key and value to our `experiment_info` dictionary:</font>

In [13]:
experiment_info[] = 30 
print(experiment_info)

{'pressure': 20, 'temperature': 110, 'ph': 3.1, 'salinity': 1.025, 'threshold': False, 'volume': 30}


If you want to update an entry in a dictionary, assign a new value to an existing key. Let's change the temperature:

In [None]:
experiment_info['temperature'] = 50 
print(experiment_info)

To delete an entry from a dictionary, use the `del` statement, specifying the key to delete. Let's delete the threshold entry:

In [None]:
del experiment_info['threshold']
print(experiment_info)

In most applications the exact type and number of keys is unknown so it is more common to first define an empty dictionary and add entries later. Any data types can be added to a dictionary including `lists`, `tuples` and even other dictionaries!

In [None]:
# Create empty dictionary as '{}'
new_experiment = {}

# Add temperature to empty dictionary
new_experiment['temperature'] = 200
# Add volume to dictionary
new_experiment['volume'] = 20
# Add list to dictionary
new_experiment['other'] = [3, 5, 6]
# Add dictionary to dictionary
new_experiment['runs'] = {'exp1': 10, 'exp2': 5}

print(new_experiment)

Retrieving the values in the sublist or subdictionary requires an additional index or key:

In [None]:
# Extract the second item from the 'other' list
print(new_experiment['other'][1])

# Extract the number of runs for experiment 1 from the sub dictionary called 'runs'
print(new_experiment['runs']['exp1'])


The values contained in a dictionary don’t need to be the same type and neither do the keys:

In [None]:
# Form a dictionary of multiple data types for keys and values
data_dict = {33: 'String 1', 
             5.34: 'String 2', 
             True: 5.791
            }

# Print the dictionary
print(data_dict)

# Print the associated values in the dictionary using the keys 
print(data_dict[33])

print(data_dict[5.34])

print(data_dict[True])


A key can appear in a dictionary **only once**; duplicate keys are not allowed. A dictionary maps each key to a single corresponding value, so if the same key is assigned more than once the most recent value replaces the earlier one. There are **no restrictions** on dictionary values, which can be any type of object Python supports, including mutable types like lists and dictionaries, and user-defined objects.
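The sketch below shows what happens when a key is repeated. It also shows a subtle case worth knowing about when mixing key types: Python treats `True` and `1` as equal, so they count as the same key. The dictionary names are illustrative.

```python
# The later value replaces the earlier one for a repeated key
readings = {'ph': 3.1, 'ph': 7.0}
print(readings)    # {'ph': 7.0}

# True == 1 in Python, so these two keys collide
mixed = {1: 'integer one', True: 'boolean true'}
print(len(mixed))    # 1
print(mixed[1])      # boolean true
```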

## Checking Types

As your programs become larger and more complex it can sometimes be useful to check the data types of variables, which is easily done using the `type()` function. This can be especially useful when checking the output of a particular command, as data types may change when they are not supposed to.

In [None]:
x = 10 
y = 23.67 
z = 'This is a string'

# Check the type of x
print(type(x))

# Check the type of y
print(type(y))

# Check the type of z
print(type(z))
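When you want to test for a type rather than just display it, the built-in `isinstance()` function returns a Boolean. The sketch below also shows a typical case where a check like this helps: a number that has been read in as a string.

```python
x = 10
y = 23.67
raw_value = '10'    # a number stored as a string, e.g. read from a file

print(isinstance(x, int))               # True
print(isinstance(y, (int, float)))      # True, a tuple checks several types
print(isinstance(raw_value, int))       # False, it is a string

# Convert the string to an integer before doing arithmetic with it
converted = int(raw_value)
print(converted + 2)                    # 12
```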


## Review

In this session we covered:
    
* Numeric data types: integers, floating point numbers and complex numbers.
* Booleans, which evaluate to 'True' or 'False', and how to use them with comparison and logical operators.
* Sequence data types including strings, lists and tuples, learning how to manipulate them.
* How to use dictionaries to store and access key-value pairs.

## Exercise

Below is a dictionary containing some experimental information on an object colliding with a surface head on. We want to calculate the following quantities:

* The force the object exerts on the surface which can be calculated using Newton's second law

\\[ 
F = ma.
\\]
    
* The momentum the object has when it collides with the surface which can be calculated using

\\[
p = mv.
\\]

Extract the required values from the dictionary, calculate these two values, then add them to the dictionary.

In [None]:
exp_data = {
            'angle'        : 90,
            'velocity'     : 134,
            'mass'         : 0.3,
            'acceleration' : 49,   
           }

## Exercise

The following code does not run. Your task is to fix the program! **Hint: there are multiple mistakes.**

In [None]:
A = '5'

A + 2

cool_dict = {
            'B' : 2,
            'C' : 4,
            'D' : 1
            }

cool_dict['A' = A

print(cooldict)

## References

[1] A. Simone. *Precision measurements at the LHC.* PoS. **RADCOR2019**. 001. 2020 

[2] ["Patriot missile defense, Software problem led to system failure at Dhahran, Saudi Arabia; GAO report IMTEC 92-26"](https://www.gao.gov/products/IMTEC-92-26). US General Accounting Office. February 27, 1992. Archived from the original on January 6, 2018.

[3] ["Ariane 5 Flight 501 Failure, Report by the Inquiry Board"](https://web.archive.org/web/20000815230639/http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html). Archived from the original (PDF) on 15 August 2000.