<a href="https://colab.research.google.com/github/Teoroo-CMC/DoE_Course_Material/blob/main/Week_1/Workshop_1/1_getting_started_with_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Prelude

Python is a programming language used well beyond software development. In many other fields, such as materials science, it is popular because calculations, data analysis and data visualization can be done quickly, reproducibly and to a high standard.

This tutorial is not a complete introduction to the Python programming language. Rather, it is intended to give an overview of the tools that are useful in a scientific environment.

There are many resources on the internet for learning more about Python and the libraries mentioned in this document. You can refer to those during and after this lab.

## Jupyter Notebooks

In bigger software projects, Python code is usually written in files with the suffix '.py'.
In a scientific environment it is often more convenient to use Jupyter notebooks (files with the suffix '.ipynb') instead, since these support interactive data processing, analysis, and visualization in an easily shared format.

The document that you are viewing right now is such a notebook. A notebook consists of cells containing either Python code or Markdown text. Notebooks are executed in a so-called kernel. Kernels are programming-language-specific processes that run independently and interact with the notebook. You can see whether a kernel is currently running in the top-right corner of the screen.

To run a cell of the notebook, select it (which gives it a green border) and either:
- Press 'Run' at the top of the screen, or
- Press 'Ctrl' and 'Enter'.

## Using NumPy, Pandas and Matplotlib

Usually, the first few lines of Python code (or the first Jupyter Notebook cell) contain so-called import statements. These tell the Python interpreter which of the packages that we have installed on our computer we want to use in this notebook.

In [1]:
import numpy as np # Now Numpy can be accessed via np.<name of variable or function to use>
import pandas as pd # Now Pandas can be accessed via pd.<name of variable or function to use>
# ignore the following line, it only tells the matplotlib library to display the plots within this notebook directly
%matplotlib inline  

So, here we import these tools and assign them abbreviated names ('import X as Y'). We do not have to use abbreviations of package names, and we can use other names if we prefer, but we choose to respect the Python tradition in these exercises.

### Disclaimer: 

Most of the Basic Python and NumPy code from this notebook has been taken from:

https://medium.com/@manmeet20singh11/introduction-to-python-jupyter-notebook-numpy-and-pandas-a57dc1086a8f

The examples given in the tutorial at that link are rather extensive and can serve as a good starting point for diving into the Python and NumPy philosophies.

Some of the exercises are taken from:

https://raw.githubusercontent.com/weisscharlesj/SciCompforChemists/master/Book%5FPDF/SciCompforChemists%5F1.0.6a.pdf

A good and more complete introduction to the programming language can be found here:

https://docs.python.org/3/tutorial/

You don't have to read everything here. This website is more a place to look things up.

## 1. Basic Python

Python source code consists of the actual code and comments.
The code is read and executed by the Python interpreter line by line. This is how you instruct the computer what to do.
Comments document the code, i.e. they make it more readable to someone who did not take part in writing it.
Everything after a hash sign (#) on a line is ignored by the interpreter.

### Python data types

Python offers four primitive data types (strings, Booleans, integers and floats) and a series of built-in non-primitive data types, such as tuples, lists, dictionaries and sets, which allow you to bundle data together.

Integers ("whole numbers") and floats (numbers with a decimal point) are two different ways to represent numbers on a computer.
Booleans are a data type with only two possible values, "True" and "False" (which behave like 1 and 0 in arithmetic). They are used heavily when working with algorithms and logic.

Strings are used to represent text.
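
As a small sketch of how these types behave in practice (the variable names below are illustrative and not used elsewhere in this notebook): dividing two integers with `/` yields a float, `//` keeps an integer, and comparisons produce Booleans.

```python
# Illustrative names only; none of these variables are used elsewhere in the notebook.
n_atoms = 7                        # an integer
half = n_atoms / 2                 # "/" always returns a float: 3.5
whole_half = n_atoms // 2          # "//" (floor division) keeps an integer: 3

is_large = n_atoms > 5             # a comparison evaluates to a Boolean: True
label = "atoms: " + str(n_atoms)   # numbers must be converted before joining them with text

print(type(half), type(whole_half), type(is_large))
print(int(is_large), label)        # True becomes 1 when converted to an integer
```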

Read through and execute the following cells and study the different outputs and the code that produces them.

#### Primitive datatypes

In [3]:
### Integers and floats

x = 4          # Create an Integer and assign it to a variable with the name x
print(x)       # Prints "4"
print(type(x)) # Prints "<class 'int'>"
print(x + 1)   # Addition; prints "5"
print(x - 1)   # Subtraction; prints "3"
print(x * 2)   # Multiplication; prints "8"
print(x ** 2)  # Exponentiation; prints "16"
x += 1         # Basically the same as x = x + 1
print(x)       # Prints "5"
x *= 2         # Basically the same as x = x * 2
print(x)       # Prints "10"

y = 1.5        # Create a Float and assign it to a variable with the name y
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "1.5 2.5 3.0 2.25"


y              # If the last line of a cell is just a variable name (regardless of its type), its value is displayed (only works in Jupyter notebooks!)

4
<class 'int'>
5
3
8
16
5
10
<class 'float'>
1.5 2.5 3.0 2.25


1.5

In [4]:
# Booleans

t = True       # Create a Boolean and assign it to a variable with the name t
f = False      # Create a Boolean and assign it to a variable with the name f
print(type(t), type(f)) # Prints "<class 'bool'> <class 'bool'>"

<class 'bool'> <class 'bool'>


In [5]:
# Strings

hello = 'hello'    # String literals can use single quotes
world = "world"    # or double quotes; it does not matter.
print(hello)       # Prints "hello"
print(len(hello))  # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print(hw)  # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print(hw12)  # prints "hello world 12"

hello
5
hello world
hello world 12
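
Besides the `%` operator shown above, Python 3.6 and later also support f-strings, which many people find easier to read. A minimal sketch, using the same values under fresh (illustrative) variable names:

```python
greeting = 'hello'
target = 'world'
number = 12

# Expressions inside {} are evaluated and inserted into the string
message = f'{greeting} {target} {number}'
print(message)                   # prints "hello world 12"

# A format specifier after the colon controls the number of decimals
ratio = 2 / 3
print(f'ratio = {ratio:.3f}')    # prints "ratio = 0.667"
```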


#### Non-primitive datatypes

In [6]:
# Lists 

# List items are ordered, changeable, and allow duplicate values.

A = [3, 1, 2]         # Create a list
print(A, A[0])        # Prints "[3, 1, 2] 3" - counting starts with 0 in Python!
print(A[-1])          # Negative indices count from the end of the list; prints "2"
A[2] = 'element3'     # Lists can contain elements of different types
print(A)              # Prints "[3, 1, 'element3']"

A.append('element4')  # Use the append method of List and add a new element to the end of the list
print(A)              # Prints "[3, 1, 'element3', 'element4']"
b = A.pop()           # Use the pop method of List to remove and return the last element of the list
print(b, A)           # Prints "element4 [3, 1, 'element3']"

[3, 1, 2] 3
2
[3, 1, 'element3']
[3, 1, 'element3', 'element4']
element4 [3, 1, 'element3']


In [7]:
# List slicing

# A slice is a new list containing a copy of part of the original list
nums = list(range(5))     # an alternative way to create a list
print(nums)               # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]        # Assign a new sublist to a slice
print(nums)               # Prints "[0, 1, 8, 9, 4]"

[0, 1, 2, 3, 4]
[2, 3]
[2, 3, 4]
[0, 1]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 8, 9, 4]
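
One detail worth checking for yourself: slicing a Python list produces a new list, so changing the slice leaves the original untouched, whereas plain assignment only creates a second name for the same list. A short sketch with illustrative names:

```python
original = [0, 1, 2, 3, 4]
part = original[1:3]      # a new list: [1, 2]
part[0] = 100             # modify the slice only

print(part)               # prints "[100, 2]"
print(original)           # prints "[0, 1, 2, 3, 4]" - the original list is unchanged

alias = original          # plain assignment does NOT copy; both names refer to one list
alias[0] = -1
print(original)           # prints "[-1, 1, 2, 3, 4]"
```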


In [8]:
# Tuples

# A tuple is a collection which is ordered and unchangeable.

thistuple = ("apple", "banana", "cherry")

print(thistuple, thistuple[0])        # Prints "('apple', 'banana', 'cherry') apple" - counting starts with 0 in Python!
print(thistuple[-1])                  # Negative indices count from the end of the tuple; prints "cherry"

# thistuple[0] = "pineapple"          # This will not work since tuples are unchangeable (it raises a TypeError)
print(thistuple)

('apple', 'banana', 'cherry') apple
cherry
('apple', 'banana', 'cherry')


In [9]:
# Sets

# A set is a collection which is both unordered and unindexed.

thisset = {"apple", "banana", "cherry"}
print(thisset)        # Printing a set works, but the order of the items is not guaranteed
# print(thisset[1])   # This does not work with sets. They cannot be indexed.

anotherset = {"pineapple", "raspberry", "banana", "cloudberry"}

# Sets offer useful operations, e.g.:
print(thisset.union(anotherset))                   # Returns a new set containing all items from both sets.
print(thisset.symmetric_difference(anotherset))    # Returns a new set containing the items that are in exactly one of the two sets.

{'apple', 'banana', 'cherry'}
{'cloudberry', 'raspberry', 'pineapple', 'apple', 'banana', 'cherry'}
{'pineapple', 'apple', 'raspberry', 'cloudberry', 'cherry'}


In [10]:
# Dictionaries

d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"

print(d)              # Prints {'cat': 'cute', 'dog': 'furry'}
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d)              # Prints {'cat': 'cute', 'dog': 'furry', 'fish': 'wet'}
d['fish'] = 'orange'
print(d['fish'])      # Prints "orange"
del d['fish']         # Remove an element from a dictionary
print(d)              # Prints {'cat': 'cute', 'dog': 'furry'}

cute
True
{'cat': 'cute', 'dog': 'furry'}
{'cat': 'cute', 'dog': 'furry', 'fish': 'wet'}
orange
{'cat': 'cute', 'dog': 'furry'}


### Loops

In [11]:
# A loop lets you repeat operations iteratively

# The most common loop is the for loop
integerlist = [0, 1, 2, 3, 4]

for element in integerlist:
    # Do something smart...
    print(element)

0
1
2
3
4


In [12]:
# the same result as...
for element in range(5):    # Create a range object instead of a list
    # Do something smart...
    print(element)

0
1
2
3
4


In [13]:
# Another common loop is the while loop

# This gives the same result as the for loops above
counter = 0                 # Create a counter
while counter < 5:          # Repeat as long as the condition is True
    # Do something smart...
    print(counter)
    counter += 1

0
1
2
3
4


### Python functions

Often we need to perform the same task many times and do not want to rewrite the lines that perform it each time. For this purpose we define functions.

A function usually takes zero or more arguments, performs some task using these arguments and then returns some object or information. Inside the definition of a function, we write the code that performs the task when the function is run. Then, every time we want to run the function, we just call it.

In [14]:
# Defining functions

def sign(x):    # the function takes in one argument - here its name is x
    
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print('sign of {} is {}'.format(x, sign(x))) # the function is called on numbers from the list which we loop through
    
# Prints "sign of -1 is negative", "sign of 0 is zero", "sign of 1 is positive"

sign of -1 is negative
sign of 0 is zero
sign of 1 is positive


Phew... that was a lot to take in. Don't worry if you didn't get everything so far. The cells above are only meant to give you a feeling for how Python works at its core. Hold on a little longer: now we can introduce NumPy, one of the most widely used Python libraries for handling numbers, vectors and matrices.

## 2. NumPy

In [15]:
# The NumPy array

a = np.array([1, 2, 3])   # Create a rank 1 array using Numpy 
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])    # Create a two-dimensional (rank 2) array
print(b.shape)                     # Prints the array's shape "(2, 3)"

# Similarly, the elements of 2D arrays can be referred to through their row and column indices
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"
                                   

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]
(2, 3)
1 2 4


In [16]:
# Other ways of creating an array

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[1. 1.]]"

c = np.full((2,2), 7)  # Create a constant array (integers, because the fill value 7 is an integer)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[1. 0.]
                      #          [0. 1.]]"

e = np.random.random((2,2))  # Create an array filled with random values between 0 and 1
print(e)                     # Might print "[[0.91940167 0.08143941]
                             #               [0.68744134 0.87236687]]"
    
# Creating an array from a list
F = [1,2,3,4]
f = np.array(F)
print(f)             # Prints "[1 2 3 4]"

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.0508782  0.6594542 ]
 [0.94451021 0.24630679]]
[1 2 3 4]


In [17]:
# NumPy math

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; x + y and np.add(x, y) both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(list(x) + list(y)) # Note the difference: adding two lists concatenates them instead of summing the elements!
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

# Performing a mathematical operation row- and column-wise
x = np.array([[1,2],[3,4]])
print(x)
print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

[[ 6.  8.]
 [10. 12.]]
[array([1., 2.]), array([3., 4.]), array([5., 6.]), array([7., 8.])]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]
[[1 2]
 [3 4]]
10
[4 6]
[3 7]
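
The second line of output above looks different because `+` means different things for lists and for arrays. A compact sketch of the contrast, with illustrative names:

```python
import numpy as np

list_a = [1, 2]
list_b = [3, 4]
print(list_a + list_b)      # lists are concatenated: prints "[1, 2, 3, 4]"
print(list_a * 2)           # lists are repeated: prints "[1, 2, 1, 2]"

arr_a = np.array(list_a)
arr_b = np.array(list_b)
print(arr_a + arr_b)        # arrays are added elementwise: prints "[4 6]"
print(arr_a * 2)            # arrays are scaled elementwise: prints "[2 4]"
```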


When performing array operations such as addition or multiplication, it is very important to make sure that the shapes of the arrays involved are compatible. Most of the time a mismatch raises an error, but in some cases NumPy silently combines the arrays in a way you did not intend, and such mistakes are tedious to debug.
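
To illustrate why shape problems can be hard to spot, here is a hedged sketch (the array names are illustrative): NumPy's broadcasting rules let a column of shape (3, 1) combine with a row of shape (3,), and the result is silently a (3, 3) array rather than an error.

```python
import numpy as np

column = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([10, 20, 30])         # shape (3,)

result = column + row                # no error: the shapes are broadcast to (3, 3)
print(result.shape)                  # prints "(3, 3)"
print(result)

# If an elementwise sum of three numbers was intended, flatten the column first
flat_sum = column.ravel() + row
print(flat_sum)                      # prints "[11 22 33]"
```

Printing `.shape` before combining arrays is a cheap way to catch this kind of mistake early.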

For convenience, NumPy arrays also allow operations using the "*", "/", "+" and "-" signs. **The last line in the cell below adds two arrays with incompatible shapes; run it to see what happens!**

In [8]:
x = np.arange(9)
x = np.reshape(x, (3,3))

print(x)

y = np.ones((12))*np.arange(12)
y = np.reshape(y, (3,4))
print(y)

z = x+y

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]


ValueError: operands could not be broadcast together with shapes (3,3) (3,4) 

Great job! You are now familiar with some important NumPy functionality. Of course, you don't have to remember all the function names. Just search the web or the NumPy documentation to find the functions that suit your task.

## 3. Pandas

Pandas is helpful whenever you have data that can be arranged in columns (tabular or relational data).
The main object Pandas provides is the so-called DataFrame, which stores data in rows and columns and offers some handy functions.
Here we will only cover some very basic things you can do with Pandas. We will not use much of Pandas' functionality in our labs, except for the purpose of displaying data in a table and then plotting the columns.

So in this notebook we will only go through:

- writing data into a DataFrame

- importing data from an external file

- writing a DataFrame to an external file

- very basic statistical description of the data

- adding columns to a DataFrame

Plotting data directly from a Pandas DataFrame will (extensively) be covered in the **seaborn\_tutorial.ipynb** notebook.

In [9]:
# Ways of creating a pandas DataFrame

# From a dictionary
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

purchases = pd.DataFrame(data)

purchases # Reminder: If the last line of a cell is just a variable name (regardless of its type), its value is displayed (only works in Jupyter notebooks!)

Unnamed: 0,apples,oranges
0,3,0
1,2,3
2,0,7
3,1,2


In the cell above, we have generated a DataFrame object called "purchases", using information contained in a dictionary called "data".

In [10]:
# From numpy arrays

column_names = ["apples", "oranges"]
data = np.array([[3,0],[2,3],[0,7],[1,2]])

purchases = pd.DataFrame(data = data, columns = column_names)

purchases 

Unnamed: 0,apples,oranges
0,3,0
1,2,3
2,0,7
3,1,2


In [11]:
# From a file

sales = pd.read_csv('https://raw.githubusercontent.com/Teoroo-CMC/DoE_Course_Material/main/Week_1/Workshop_1/monthly_sales.csv', index_col = 0, on_bad_lines='skip')

sales 

month,2018 sales %,2019 sales %,2020 sales %
Jan,92.82,91.65,100.18
Feb,99.8,89.33,89.33
Mar,102.51,90.49,98.63
Apr,90.49,102.12,100.96
May,102.12,88.94,104.45
Jun,105.61,94.37,99.02
Jul,100.96,99.8,102.12
Aug,101.35,87.78,95.92
Sep,97.86,93.98,103.67
Oct,90.88,106.0,96.31
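
Writing a DataFrame back to a file, listed in the overview above, works with `DataFrame.to_csv`. The sketch below uses a small made-up table and a temporary file name (both are assumptions for illustration, not course data) and reads the file back to check the round trip.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data, not taken from monthly_sales.csv
small = pd.DataFrame(
    {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]},
    index=["June", "July", "August", "September"],
)

# Write to a temporary directory so no existing file is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "purchases_example.csv")
small.to_csv(out_path)                         # the index is written as the first column

restored = pd.read_csv(out_path, index_col=0)  # use the first column as the index again
print(restored)
```

Passing `index_col=0` when reading mirrors the `read_csv` call used for the sales data, so the row labels survive the round trip.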


In [12]:
# Some statistics information about the data from a DataFrame, per column
# mean value
# standard deviation
# minimum value
# maximum value
# the 25%, 50% (median) and 75% percentiles

sales.describe()

Unnamed: 0,2018 sales %,2019 sales %,2020 sales %
count,12.0,12.0,12.0
mean,98.4075,94.465,99.020833
std,4.874198,6.163882,4.438307
min,90.49,87.78,89.33
25%,95.145,89.2325,96.2125
50%,100.185,92.815,99.6
75%,101.5425,99.9925,102.4125
max,105.61,106.0,104.45


In [13]:
# Selecting a row
sales.loc["Jun"]

 2018 sales %    105.61
 2019 sales %     94.37
 2020 sales %     99.02
Name: Jun, dtype: float64

In [15]:
# Row-wise stats: the smallest September value across the three years
# (use sales.loc["Sep"].mean() instead to get the row mean)

september_min = sales.loc["Sep"].min()
print(september_min)

93.98
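
Pandas can also compute such statistics for every row or every column in one call through the `axis` argument. A minimal sketch on a small made-up table (the numbers and the name `demo` are illustrative, not the course data):

```python
import pandas as pd

# Hypothetical data laid out like the sales table: months as rows, years as columns
demo = pd.DataFrame(
    {"2018": [90.0, 100.0], "2019": [95.0, 105.0], "2020": [100.0, 110.0]},
    index=["Jan", "Feb"],
)

print(demo.loc["Jan"].mean())   # mean of one row; prints "95.0"
print(demo.mean(axis=1))        # mean of every row (across the columns)
print(demo.mean(axis=0))        # mean of every column (down the rows)
print(demo.min(axis=1))         # smallest value in every row
```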


In [16]:
# Adding an extra column
predictions = np.round(100 + (np.random.rand(12,1)-np.full((12,1),0.5))*15, 2)

index_list = sales.index.values.tolist() #extract row names from the sales DataFrame

# Create a new DataFrame with the same indices as before
sales_2021 = pd.DataFrame(data = predictions, columns = ['2021 sales %'], index = index_list)

# Merge these two DataFrames into one
sales_updated = pd.concat([sales, sales_2021], axis = 1)

sales_updated

Unnamed: 0,2018 sales %,2019 sales %,2020 sales %,2021 sales %
Jan,92.82,91.65,100.18,101.26
Feb,99.8,89.33,89.33,97.46
Mar,102.51,90.49,98.63,101.6
Apr,90.49,102.12,100.96,98.82
May,102.12,88.94,104.45,97.18
Jun,105.61,94.37,99.02,106.71
Jul,100.96,99.8,102.12,96.96
Aug,101.35,87.78,95.92,99.71
Sep,97.86,93.98,103.67,100.38
Oct,90.88,106.0,96.31,98.57


#### Questions to practice Python

1) A 1.6285 L (V) flask contains 1.220 moles (n) of ideal gas at 273.0 K (T). Calculate the pressure (P) for the 
   above system by assigning all values to variables (e.g., V, n, R, and T) and performing
   the mathematical operations on the variables.
    
2) Calculate the distance of point (23, 81) from the origin on an xy-plane first using the 
   math.hypot() function and then by the following distance equation: $d = \sqrt{x^2 + y^2}$.
    
3) Assign x = 12 and then increase the value by 32 without typing “x = 32”.

4) Create the following variable elements = 'NaKBrClNOUP' and slice it to 
   obtain the following strings.
   
       a) NaK
       b) UP
       c) NKrlOP

5) For DNA = 'ATTCGCCGCTTA', use Boolean logic to show that the DNA sequence 
   is a palindrome. Hint: this will require a Boolean 
   logic operator to evaluate as True.
   
   
6) Create a list of even numbers from 18 → 88 including 88. Using list methods, 
   perform the following transformations in order on the same list (you may have to google the required methods):
   
       a) Reverse the list
       b) Remove the last value (i.e., 18)
       c) Append 16
   
   
7) Create a tuple of even numbers from 18 → 320 including 320. 

8) Can you reverse, remove, or append values to the tuple?

9) Write a Python script that prints out “PV = nRT” twenty times.

10) The isotope 137Cs has a half-life of about 30.2 years. Using a while loop, determine 
    how many half-lives it takes for a 500.0 g sample to decay until there is less 
    than 10.00 grams left. Create a counter (counter = 0) and add 1 to it each cycle of
    the while loop to keep count.
    
11) Complete a function calculating the Boltzmann distribution P = exp(-E/kT) taking parameters E (energy) and T (temperature). Then, use this function to plot the distribution against energy for different temperatures T = 100 K, 200 K and 500 K.


In [26]:
# Question 1 

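V = 1.6285 # Volume in L
n = 1.220  # Amount of ideal gas in mol
T = 273.0  # Temperature in K

def ideal_gas_law(V, n, T):
    R = 8.314 # Gas constant in J/(mol K), which equals L kPa/(mol K)
    pressure = (n*R*T)/V # With V in L, the pressure comes out in kPa

    return pressure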


pressure = ideal_gas_law(V, n, T)
print(pressure)


1700.3750936444578


In [30]:
#Q2

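import math

distance1 = math.hypot(23, 81)          # Distance of the point (23, 81) from the origin
distance2 = math.sqrt(23**2 + 81**2)    # The same distance from d = sqrt(x^2 + y^2)

print(round(distance1, 3))
print(round(distance2, 3))

84.202
84.202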


In [14]:
#Q3

#3) Assign x = 12 and then increase the value by 32 without typing “x = 32”.

x = 12 

x += 32                 # Increase x by 32; the same as x = x + 32


print(x) 


44


In [25]:
#4) Create the following variable elements = 'NaKBrClNOUP' and slice it to 
#   obtain the following strings.
#   
#       a) NaK
#       b) UP
#       c) NKrlOP

elements = 'NaKBrClNOUP'

a = elements[0:3]  # The stop index is exclusive, so this gives the characters at indices 0-2
b = elements[-2:]  # The last two characters; writing [-2:-1] would leave out the last one
c = elements[::2]  # A step of 2 takes every second character, starting at index 0

print(a)
print(b)
print(c)


NaK
UP
NKrlOP


In [None]:
#5) For DNA = 'ATTCGCCGCTTA', use Boolean logic to show that the DNA sequence 
#   is a palindrome. Hint: this will require a Boolean 
#   logic operator to evaluate as True.



In [None]:
def boltzmann(E, T):
    '''
    Calculate the Boltzmann distribution.
    
    Args:
        E (float): Energy in kJ/mol.
        T (float): Temperature in Kelvin.

    Returns:
        float: Boltzmann distribution.
    '''
    pass  # This keyword is just a placeholder and does nothing. You have to replace it with some nice code.

12) DNA is composed of two strands of the nucleotides adenine (A), thymine (T), 
    guanine (G), and cytosine (C). The two strands are lined up with adenine always 
    opposite of thymine and guanine opposite cytosine. For example, if one strand is 
    ATGGC, then the opposite strand is TACCG. Write a function that takes in a DNA 
    strand as a string and prints the opposite DNA strand of nucleotides.
    
13) Create two random NumPy arrays with shapes (5, 5), perform a matrix multiplication and print the result. 
    You may have to find the functions you need in the NumPy documentation: https://numpy.org/