# Python Scripting and Data Processing Tutorial
Python is a multi-purpose programming language that is useful for a wide variety of applications.
One way to use Python is interactively in a Jupyter Notebook, as we are doing here.
Python is run through the Python interpreter, which takes and runs the code in the notebook cell by cell.
To run the current cell, press `shift+enter`.
These examples are largely based on the [MOLSSI Python Tutorials](https://education.molssi.org/python_scripting_cms/01-introduction/index.html) created by Ashley Ringer-McDonald.
This notebook is meant to be worked through interactively. To start, run the cell below.

In [None]:
import checks


## Variables and Data Types

In Python, variables store values and are declared in the form:

```Python
variable_name = variable_value
```

The value is then saved under that variable name and can be referenced later.

Variable values are categorized by data type, which describes what the variable is.
In Python there are three basic data types: integers, floats and booleans, which are sometimes called primitive types.
- Integers are whole numbers without a decimal, and can be positive or negative, for example `x = -3`.
- Floating point values, or floats, are numerical values with decimals, and can also be declared in scientific notation, e.g. `T = 346.1` or `dH = 1.56e5`, where the number following `e` is the power of 10 that the value is multiplied by (so `1.56e5` is `1.56 * 10**5`).
- Booleans are the logical values `True` and `False`, and come into play more when discussing control flow.

When declaring a variable in Python, the data type can be specified with a "type hint", which indicates to readers and to tools such as type checkers what data type the variable is intended to hold. The interpreter itself does not enforce it.

The syntax for a type hint is:

```Python
variable_name: variable_type = variable_value
```

These are not required, but help to make code more readable.
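
As a small sketch of the three primitive types and of type hints, the following uses illustrative variable names that are not part of the original tutorial. It also shows that a type hint is only a label: Python runs the last assignment without complaint even though the hint and the value disagree.

```python
# Illustrative names only; they are not used elsewhere in this notebook.
n_atoms: int = 3               # integer: whole number, no decimal point
boiling_point: float = 373.15  # float: number with a decimal point
energy: float = 1.56e5         # scientific notation, i.e. 1.56 * 10**5
is_liquid: bool = True         # boolean: True or False

print(type(n_atoms), type(boiling_point), type(is_liquid))
print(energy == 156000.0)      # True

# Type hints are not checked when the code runs:
# the hint says int, but the stored value is a float.
mislabeled: int = 2.5
print(type(mislabeled))        # <class 'float'>
```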

### Using Variables

The code cells below illustrate a few examples of using variables.

In [1]:
deltaH: float = 5.415e2 # kJ/mol
print(deltaH)

541.5


A few things to note about this cell: first, comments are denoted with `#`, and second, we can use the `print()` function to output information. Printing variables is often a useful way to understand what your code is doing.

Next, let's look at some calculations.

In [2]:
deltaH: float = -541.5   #kJ/mole
deltaS: float =  10.4     #kJ/(mole K)
temp: int = 298      #Kelvin
deltaG: float = deltaH - temp * deltaS

print(deltaG)

-3640.7000000000003


In this cell we can see some math operations used, and the result assigned to the variable `deltaG`.
An additional detail illustrates how data types work in Python.
When `temp` is declared it is an integer, since it has no decimal point, but when it is used to calculate `deltaG` the result is a floating point value.
This is because math operations accept integers and floats interchangeably, and return a floating point value whenever at least one of the values is a float.
The next example illustrates this more directly.

In [4]:
x: float = 1.1
y: int = 2
print(f"x is an {type(x)}")
print(f"y is an {type(y)}")
print(type(x * y))

x is an <class 'float'>
y is an <class 'int'>
<class 'float'>


The function `type()` returns the type of a variable, and as we can see multiplying an int with a float returns a float.
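
A short sketch of how the result type depends on the operands and the operator, using illustrative variables `a` and `b`. Note that division with `/` returns a float even when both operands are integers, while floor division with `//` keeps integers as integers.

```python
a: int = 6
b: int = 3

print(type(a + b))    # <class 'int'>: both operands are integers
print(type(a * 1.0))  # <class 'float'>: one operand is a float
print(type(a / b))    # <class 'float'>: the / operator always returns a float
print(a / b)          # 2.0
print(type(a // b))   # <class 'int'>: floor division of two integers
```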

Next, let's look at a few more examples of how variables behave in Python.

In [43]:
print(deltaG)
deltaG * 1000
print(deltaG)

-3640.7000000000003
-3640.7000000000003


This example illustrates two ideas:
- variables persist between cells in a notebook
- if the result of an operation is not assigned to a variable name, the variable's value does not change

If we wanted to save the result of that calculation, we would have to overwrite the previous value of the variable:

In [44]:
print(deltaG)
deltaG = deltaG * 1000
print(deltaG)

-3640.7000000000003
-3640700.0000000005


In most situations you will want to keep both values. In that case, create a new variable to store the new value.

In [None]:
print(deltaG)
deltaG_joules = deltaG * 1000
print(deltaG)
print(deltaG_joules)

It is also possible to assign several variables at once. This is sometimes done for the sake of compactness.

In [7]:
#I can assign all these variables at once
deltaH, deltaS, temp = -541.5, 10.4, 298
deltaG = deltaH - temp * deltaS
print(deltaG)

-3640.7000000000003
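
A minimal sketch of the same syntax with illustrative names: the values on the right are matched to the names on the left by position, so the number of names and values must agree. The same form can be used to swap two variables.

```python
# Values are matched to names by position.
pressure, volume = 1.0, 22.4
print(pressure, volume)  # 1.0 22.4

# The right-hand side is evaluated first, so this swaps the two values.
pressure, volume = volume, pressure
print(pressure, volume)  # 22.4 1.0
```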


### Composite Data Types

We have previously discussed the primitive data types, which represent a single value.
This is not the only kind of data type in Python, or in programming languages more generally.
Beyond primitive types, there are composite types (also known as compound or complex types) which contain several primitive values in a single variable.
Python has many composite types, so we will not enumerate them all here.
We'll go into two important ones, lists and strings, in the subsequent sections.

### Lists

Lists in Python group several variables or values together.
They are declared using square brackets `[]` and the values are separated by commas.
The function `len()` returns the number of items in a list.
The following code block demonstrates declaring a list and printing its length.

In [23]:
# This is a list
energy_kcal = [-13.4, -2.7, 5.4, 42.1]

# print the list length
print('The length of this list is', len(energy_kcal))

The length of this list is 4


Accessing a particular value of a list, a process called *indexing*, is done by placing square brackets with the item number you want after the list name.
In Python counting starts at 0, so the first element of the list is `list[0]`.
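
A brief sketch of indexing with an illustrative list (not one used elsewhere in this notebook). Python also accepts negative indices, which count backwards from the end of the list.

```python
# Illustrative list of three values
bond_lengths = [0.96, 1.09, 1.54]

print(bond_lengths[0])   # first element: 0.96
print(bond_lengths[2])   # third (last) element: 1.54
print(bond_lengths[-1])  # negative indices count from the end: 1.54
```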

#### Exercise
In the cell below assign the variable `temp2` the second element of the list `temps` by indexing `temps`.


<details><summary>Answer</summary>

```python
temp2 = temps[1]
```

</details>

In [None]:
temps = [78.1, 64.3, 98.7]

# Add your code here
temp2

Let's go over a few more details about lists.
- Lists can be printed to see their elements
- Like with variables, individual list elements can be updated
- List elements can also be used in calculations without overwriting them
- New values can be added to a list with the syntax `list.append(value)`
- In Python, lists can store elements of any data type

The next example will use these facts about lists.

In [24]:
energy_kcal = [-13.4, -2.7, 5.4, 42.1]

print(f"the original list is {energy_kcal}")

# Adding one to the first list value
energy_kcal[0] = energy_kcal[0] + 1

# Appending a new value to the list

energy_kcal.append(True)
print(f"now the list is {energy_kcal}")

the original list is [-13.4, -2.7, 5.4, 42.1]
now the list is [-12.4, -2.7, 5.4, 42.1, True]


#### List Slicing

Often you'll need some subset of a list. For example, we might want just the first few elements of the previous list.
This is called a *slice*.
The syntax for slicing a list is:

```python
new_list = list_name[start:end]
```

It is important to note that in Python **slices go up to but do not include the `end` element**.
So the slice `short_list = list_name[0:2]` includes `list_name[0]` and `list_name[1]` but not `list_name[2]`.
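
The slicing rule can be sketched with an illustrative list of letters. The last two lines show a shorthand that Python supports: leaving out `start` begins the slice at index 0, and leaving out `end` runs the slice to the end of the list.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[0:2])  # ['a', 'b']: indices 0 and 1, but not 2
print(letters[1:4])  # ['b', 'c', 'd']
print(letters[:2])   # ['a', 'b']: omitted start means index 0
print(letters[3:])   # ['d', 'e']: omitted end means "to the end"
```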

#### Exercise 
In the next cell create a new list containing the first 3 elements of the list `masses`.

```python
    masses = [15.998, 1.01, 14.01, 12.01]
```

<details><summary>Answer</summary>

`new_masses = masses[:3]`

</details>

### Strings

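Strings are a composite data type in Python that store text.
A string can be declared with either single quotes (`'`) or double quotes (`"`).
Strings are indexed and sliced in the same way as lists. Unlike lists, strings are immutable: an individual character cannot be reassigned in place, so a change is made by building a new string.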
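
A short sketch of string indexing and slicing, using an illustrative string. Because a Python string cannot be modified in place, the last step builds a new string with `+` rather than changing the original.

```python
name: str = "carbon dioxide"

print(name[0])     # c
print(name[:6])    # carbon
print(len(name))   # 14 (the space counts as a character)

# Build a new string from a slice plus new text; `name` itself is unchanged.
other = name[:6] + " monoxide"
print(other)       # carbon monoxide
print(name)        # carbon dioxide
```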

#### Exercise

In the next cell capitalize the first character in the string `hello`.

```python
    hello: str = "hello world"
```

Afterwards, create a new string, `word`, containing the first word of `hello` by slicing it.

<details><summary>Answer</summary>

```python
hello: str = "hello world"

# Strings are immutable, so build a new string instead of assigning to hello[0]
hello = 'H' + hello[1:]
word = hello[:5]
```

</details>

### Type Casting

It is important to note that the string `'1'` is not the same as the integer `1`. This is because the first is the character 1 and the second represents the number 1.
Let's try adding them together and see what happens. Don't panic when you see an error.

In [26]:
txt_1: str = '1'
num_1: int = 1

num_1  + txt_1

TypeError: unsupported operand type(s) for +: 'int' and 'str'

When we do this, Python gives us a `TypeError`.
This indicates that the data types used in a process are not compatible.
This is because you cannot add one to text.

If we want to add a number from a string to a numerical value, we need to perform a type cast.
This transforms the string into a numerical type.
The syntax for type casting is shown below.

In [29]:
int(txt_1) + num_1

2
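
The same idea works for other types. The following is a small sketch with illustrative variable names, using the built-in functions `float()`, `int()` and `str()`.

```python
txt_mass: str = '12.5'

mass: float = float(txt_mass)  # string -> float
print(mass + 0.5)              # 13.0

print(int(3.9))                # 3: casting a float to int drops the decimal part, it does not round
print(str(42) + ' K')          # 42 K: int -> string so it can be joined to other text
```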

## Control Flow

Ordinarily, code is run line by line, from the top of the cell to the bottom of the cell.
Controlling how code executes is called control flow.
In this section we'll go over the main ways to control how code executes.

### Repeating an operation many times: for loops

Let's say we have a list of temperature values in Celsius, and want to convert them to Kelvin.
We could go through each element and add 273.15 to convert it like this:

```python
    temps = [37.5, 22.3, 45.8]
    temps_k = []

    temps_k.append(temps[0] + 273.15)
    temps_k.append(temps[1] + 273.15)
    temps_k.append(temps[2] + 273.15)
```

This is obviously impractical, as a new line is needed for each additional element of the list.
A `for` loop lets us repeat a section of code.
The basic structure of a `for` loop based on the example above, where we want to alter each element of a list, is:

```python
    for variable in list:
        do stuff to variable
```

There are a few important things to note about `for` loops:
- A colon `:` is required at the end of every `for` statement
- Python uses indentation to denote the code that is repeated inside the loop; in a notebook this is entered with the **tab** key

Let's write a `for` loop based on the unit conversion example.

In [31]:
temps = [37.5, 22.3, 45.8]
temps_k = []

for number in temps:
    print(number)
    temps_k.append(number + 273.15)

print(f"Temps in kelvin {temps_k}")

37.5
22.3
45.8
Temps in kelvin [310.65, 295.45, 318.95]


### Making choices: logic statements

Within code, you often have to perform different operations on a variable based on its value.
This is handled by an `if` statement.
The `if` statement allows for conditional execution of code.
Let's look at a simple example below.


In [32]:
nums = [5, -3, 2, 1, -7]
negs = []

for num in nums:
    if num < 0:
        negs.append(num)

print(negs)

[-3, -7]


The `if` statement works by evaluating a logical statement, and if it is true, executes the indented code following the `if` statement.
There are several logical operations in Python:
- equal to `==`
- not equal to `!=`
- greater than `>`
- less than `<`
- greater than or equal to `>=`
- less than or equal to `<=`

Finally, we come back to the last primitive type, booleans.
As previously stated, booleans are logical values, either `True` or `False`.
When these logical operations are used, they return a boolean value, for example:


In [35]:
print(1 == 1)
print(1 == 2)

True
False


Multiple logical statements can be put together with `and` and `or`.
The `and` operator returns `True` only if both sides of the statement are true.
The `or` operator returns `True` if one or both sides of the operator are true.

In [37]:
print("True and True: ", True and True)
print("True and False: ", True and False)
print("False and False: ", False and False)
print("True or True: ", True or True)
print("True or False: ", True or False)
print("False or False: ", False or False)

True and True:  True
True and False:  False
False and False:  False
True or True:  True
True or False:  True
False or False:  False


Finally there is the `not` operator, which returns the opposite of the given logical value.
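
A minimal sketch of `not`, using an illustrative variable `is_negative`:

```python
is_negative = -3 < 0

print(is_negative)      # True
print(not is_negative)  # False
print(not (1 == 2))     # True: 1 == 2 is False, and not False is True
```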

These logical operations, paired with `if` statements, let us control specifically what code is run.
Let's look at a few examples in the following cells.

In [42]:
nums = [14, 9, -4, 2, -1, -11, 0, 12, 17]

print("first loop")
for num in nums:
    if num < 0 or num == 0:
        print(num)

print("second loop")
for num in nums:
    if num - 2 > 0 and num + 1 < 14:
        print(num)

first loop
-4
-1
-11
0
second loop
9
12


### Writing Functions

Functions give us a way to organize code into blocks that perform a specific task.
This has multiple benefits:
- It organizes code into more understandable segments
- It makes code modular, letting us reuse the code for a process
- It makes code more testable

In Python, the syntax for defining a function is:

```python
    def function_name(parameter: parameter_type) -> return_type:
        * function code *
        return value_to_return
```

We define a function using the `def` keyword, followed by the function's name.
This name is used to refer to that function later in the body of the code.
After the name, the function's parameters are given in parentheses; a function can take any number of parameters.
The parameter's type can optionally be given as well, with a colon `:` used to separate the parameter name from the type.
After the parameters, the function's return type can optionally be given using `->` followed by the data type name, and the line ends with a colon `:`.
Giving the parameter and return data types is optional, and Python does not check them when the code runs, so they will not prevent a `TypeError` on their own, but they make the code easier to understand.
It is important to note that defining a function does not run it.
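
The difference between defining and calling a function can be sketched as follows. The function `double` is a hypothetical example, not part of the original tutorial. The last line also illustrates that type hints are not checked at run time: passing a string where the hint says `float` still runs, because `*` is defined for strings too.

```python
def double(value: float) -> float:
    # This line only runs when the function is called.
    print("double() was called")
    return value * 2

# Nothing has been printed so far: the def block only defines the function.
result = double(2.5)  # prints "double() was called"
print(result)         # 5.0

# The hint says float, but Python does not enforce it.
print(double("ab"))   # prints "double() was called", then abab
```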

Let's look at an example where defining a function is helpful.
In the next cell we have a list of temperature measurements in Fahrenheit called `fah_temps` that we want to convert to Celsius.
We can define a function called `fah_to_c` to make this easier.

In [51]:
def fah_to_c(temp: float) -> float:
    temp_c: float = (temp - 32) * 5/9
    return temp_c

fah_temps = [32.0, 212.0, 54.2]
c_temps = []
for temp in fah_temps:
    c_temps.append(fah_to_c(temp))

print(c_temps)

[0.0, 100.0, 12.333333333333336]


Since the Fahrenheit to Celsius conversion is simple, this is a very short function.
It could easily be done in a `for` loop without making a function, but as processes get more complex, defining a function will help simplify your code.

Let's look at a few other features of functions in Python: default values for parameters and returning multiple values.
A default parameter value in Python lets you give a standard value that can still be overridden when the function is called.
The syntax for adding a default parameter is an equals sign `=` after the parameter name and type (if you give types for parameters).
Multiple values can also be returned by listing them, separated by commas, after `return` on the last line of the function; Python returns them together as a tuple.
Let's look at a quick example of a function `in_between` that takes a list and two values, and returns the number of list elements between those values along with a list of those elements.

In [52]:
def in_between(numbers, lower=1, upper=4):
    between = []
    count = 0

    for num in numbers:
        if num > lower and num < upper:
            count = count + 1
            between.append(num)
    return count, between

bet_test = [3, 2, 0, 5]
print(in_between(bet_test))

(2, [3, 2])
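
As a follow-up sketch, the defaults can be overridden by passing `lower` and `upper` explicitly, and the two returned values can be unpacked into separate variables. The function is repeated here so the block runs on its own; the names `values`, `count` and `found` are illustrative.

```python
def in_between(numbers, lower=1, upper=4):
    between = []
    count = 0

    for num in numbers:
        if num > lower and num < upper:
            count = count + 1
            between.append(num)
    return count, between

values = [3, 2, 0, 5]

# Override both defaults by name and unpack the two returned values.
count, found = in_between(values, lower=0, upper=6)
print(count)  # 3
print(found)  # [3, 2, 5]

# Override only lower (by position); upper keeps its default of 4.
print(in_between(values, 2))  # (1, [3])
```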


## File Processing

Now that we've got the hang of control flow, we can move on to using Python to help us do science!
