# Programming for Chemists


The aim of this course is to provide you with the skills to utilise computers in your own scientific work, from data management, analysis and presentation to solving real scientific problems. The beginning of the course focuses on the formal aspects of the Python language, then transitions to solving scientific problems.

## Learning Outcomes

At the end of this course participants should be able to demonstrate:

* Knowledge of the Python 3 programming language.
* An understanding of key data types and structures in Python.
* The skills to write and develop simple programs in Python.
* The ability to import data from files for analysis and presentation.
* The ability to detect errors in programs.
* The ability to describe and document the results of a programming project.

## Resources

1. Recommended textbook:
    * Learning Scientific Programming with Python, C. Hill, 2016. Cambridge University Press. **Sussex Ebook link:** https://sussex-primo.hosted.exlibrisgroup.com/permalink/f/c622i2/44SUS_ALMA_DS51141926100002461
2. Supplementary reading:
    * Python notes for professionals: https://books.goalkicker.com/PythonBook/
3. Recommended online resources:
    * Official Python documentation: https://docs.python.org/3/
4. Mathematical/computer programming problems (good for application of what you learn, but some are very advanced):
    * Project Euler: https://projecteuler.net/

## Running Python Code Locally

The goal of this course is for you to feel comfortable using Python in your own work. Programming only in Jupyter notebooks in a web browser is not a realistic way to achieve this, so I wrote a separate, short tutorial on how to set up an easy and versatile solution for Python programming locally on your own computer using the Visual Studio Code software. Follow [this link](https://adambaskerville.github.io/posts/LocalProgramming/) if this interests you.

## Course Overview

1. Introduction and Jupyter Notebook.
2. Key data types and uses.
3. Control Flow Statements.
4. Basic NumPy and arrays. 
5. File input/output using pandas and plotting using matplotlib.
6. Physical chemistry using NumPy, SciPy and Sympy.
7. Hands on machine learning for chemists.

# Introduction to Programming for Chemists

**Q. Why learn to program as a chemist?**
* The world of chemistry is changing: laboratories are generating increasing quantities of digital data, and chemists need the ability to efficiently process, analyze, and visualize these data.
* An increasing number of chemistry and biochemistry related jobs specifically ask for programming experience.
* It opens more opportunities in different areas of science outside of your degree subject.
* It will make you a more efficient and effective scientist.

**Q. Why use the Python programming language?**

* It is a powerful, general-purpose programming language.
* It is a high-level language, meaning it handles low-level details such as memory management for you, so you can concentrate on the problem you are solving.
* It has a large variety of data structures such as lists, tuples, dictionaries and sets.
* It can easily interface with lower-level languages such as `C`, `C++`, `Fortran` and `Rust`.
* [It is becoming increasingly popular](https://stackoverflow.blog/2017/09/06/incredible-growth-python/):


<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/PythonGrowth.png" width="600" height="600" /></center>

* It has a shallow learning curve with a clean and simple syntax.

**Example:** Consider printing items from a shopping list using Python vs. using C. To run the Python code cell, click on the code, then hold <kbd>SHIFT</kbd> and press <kbd>ENTER</kbd>. This notebook cannot run the C code as it only works with one kernel at a time. You do not need to understand the syntax here as we will discuss it in a future session.

**Python Syntax:**

In [None]:
shopping = ["Bread", 'Oranges', 'Soup', 'Tea'] 
for item in shopping:
    print(item)

**C Syntax (cannot be run in this notebook):**

In [None]:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STRING_LENGTH 20
#define NUMBER_OF_STRINGS 4
int main()
{
    int i;
    char shopping[NUMBER_OF_STRINGS][MAX_STRING_LENGTH + 1];
    strcpy(shopping[0], "Bread");
    strcpy(shopping[1], "Oranges");
    strcpy(shopping[2], "Soup");
    strcpy(shopping[3], "Tea");
    
    for (i=0; i<NUMBER_OF_STRINGS; i++) {
        fprintf(stdout, "%s\n", shopping[i]);
    }
}

# Output:
Bread
Oranges
Soup
Tea

Python was designed to be a highly readable language, with a relatively uncluttered visual layout that uses English keywords where other languages use punctuation. Python aims to be simple and consistent in the design of its syntax, as the example above shows.

## Google Colab


Google Colab is the platform we will use to host and interact with the Jupyter notebooks. To access the Jupyter notebooks in this course:

1. Visit my website: https://adambaskerville.github.io/tabs/progchem/. 
2. Click the `Open in Colab` button.

    [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/adambaskerville/ProgrammingForChemists/blob/master/)

3. You can then select which notebook you want to view and interact with. You can save copies of your version of the notebooks to your local drive or to cloud storage, but **cannot overwrite the master copies on GitHub** as all your changes are done locally on your computer.
4. When I update the worksheets, Binder may take slightly longer to start the next time you use it, as it has to re-initialize the server.
5. If Google Colab is not working you can use Binder instead, but the sessions will take longer to load.

    [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/adambaskerville/ProgrammingForChemists/HEAD)


## Jupyter Notebook

* **The Jupyter notebook is what you are using now.** It is an open document that can contain documentation, code, interactive elements, and the output of your Python code such as plots and numerical values.
* It is an interactive environment, ideal for data processing and visualisation and suitable for sharing with others.
* It is Free and Open Source Software (FOSS) and accessible anywhere, either online or downloadable for offline use.
* Each interactive session will take place at a computer using a Jupyter Notebook and any required data files.
* **Note:** you do not *need* a Jupyter notebook to run Python code. Python code is most commonly developed in text editors locally and run through your computer's terminal. The Jupyter notebook is used in these tutorials because it allows text and usable code cells to sit in the same document, without the need for UNIX commands. In session 4 we will download and use `VScode`, providing a local and very versatile solution for your programming requirements in this course and beyond.

### The Notebook Interface

The Jupyter interface will hopefully not look entirely alien; after all, Jupyter is essentially an advanced word processor. Take a look around the menus and get a feel for the options. There are two terms that you should notice, cells and kernels, which are key to understanding Jupyter and to what makes it more than just a word processor:

* A **kernel** is a “computational engine” that executes the code contained in a notebook document.
* A **cell** is a container for text to be displayed in the notebook or code to be executed by the notebook’s kernel.

Cells form the body of a Jupyter notebook, and they come in two main types:

* A **code** cell contains code to be executed in the kernel and displays its output below the cell.
* A **Markdown** cell contains text formatted using the Markdown language and displays its output in-place when it is run.

By **double clicking** the text elements you can **modify the text**, giving you full control over the notebook and allowing you to add your own notes or reword parts to make them clearer to you. None of this affects the master branch. To exit the editor hold <kbd>SHIFT</kbd> and press <kbd>ENTER</kbd>. See the following gif for what this looks like:

<center><img src="https://raw.githubusercontent.com/adambaskerville/ProgrammingForChemists/master/images/markdownEdit.gif" width="auto" height="auto" /></center>


To use Python in a Jupyter notebook, type your Python code into a code cell which is one of these: 

In [None]:
# This is a code cell!

and hold <kbd>SHIFT</kbd> and press <kbd>ENTER</kbd> to run the code, or click the play button at the left-hand side of the cell. The output is printed below the cell. You are **highly encouraged** to add your own code to the example code cells as you progress through the sessions, to experiment with what you are learning and to test any ideas you have.

<font color='red'>Throughout the notebooks are red text statements highlighting where code is required or suggested from you.</font>

### Inserting and Deleting Cells

You can insert new code cells above or below the currently selected one by clicking `Insert -> Code cell` on the top menu.

You can insert new text (markdown) cells by clicking `Insert -> Text cell` on the top menu.

You can delete cells by clicking `Edit -> Delete selected cells` on the top menu.

### Unhiding Cells 

Google Colab has an annoying feature whereby it will automatically collapse cells in order to shorten the length of the notebook. This is a minor inconvenience, but we can expand all the cells to be visible by doing the following: 

1. Click `Edit -> Select all cells.` 
2. Click `View -> Expand sections.`

### Saving Your Notebooks

All of the changes you make or add to the notebooks can be downloaded and saved to your local computer. Remember that **anything not saved before you exit the notebooks will be deleted.** If you want to download the Jupyter notebook you can go to `File -> Download -> Download .ipynb`. A variety of applications can open and run these files, including `VScode`. You can also save the notebooks to Google Drive from the `File` menu.

## Getting Started with Python 3

### Hello World!

A tradition when learning a new language is to print "Hello World!" to the screen, which we will now do. To print this in Python we call the `print` function and write "Hello World!" inside its parentheses, enclosed in quotation marks. The quotation marks mark the text as a **string**, one of Python's data types; we will cover strings in more detail in the next session. To learn the syntax and how to run the code, <font color='red'>type `print("Hello World!")` into the code box below, then run the code</font>.

Note the importance of the quotation marks in the above code example. Run the following code which does not include them:

In [None]:
print(Hello World)

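This is invalid syntax: without the quotation marks Python reads `Hello` and `World` as two separate names rather than as text, and two names written side by side with nothing between them is not valid Python. Running the cell therefore produces a `SyntaxError`.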
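For comparison, here is a minimal sketch of the working form. It prints the greeting using both double and single quotation marks, which Python treats identically for strings. The last line uses the built-in `type` function, which is not covered in this session and is included only to show that the quoted text is a string (`str`).

```python
print("Hello World!")        # double quotation marks
print('Hello World!')        # single quotation marks work in the same way

# type() is a built-in function (not covered in this session) that reports the data type
print(type("Hello World!"))
```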

Now we have got to grips with Jupyter, let's learn some Python fundamentals. 

### Creating Variables and Assigning Values
One of the most important things that we want to do is create **variables** which represent a quantity whose value can change. To do this in Python, we need to specify the variable name, and then assign a value to it, which is done using the following syntax, `variable name = value`. 

<font color='red'>Let's assign the value 10 to the letter `x` as follows, printing the value to the screen as well:</font>

`x = 10`

Assignment always attaches the value on the right of `=` to the name on the left, so the two sides cannot be swapped. <font color='red'>The following will give you a syntax error:</font>

`10 = x`
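The sketch below illustrates the direction of assignment: Python first works out the value on the right of `=` and then attaches it to the name on the left. The name `x` follows the text above; the arithmetic is chosen purely for illustration.

```python
x = 10       # the name x now refers to the value 10
print(x)

x = x + 5    # the right-hand side (10 + 5) is evaluated first, then the result is assigned to x
print(x)
```

This is why `x = x + 5` is a perfectly sensible statement in Python, even though it would make no sense as a mathematical equation.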

There are strict rules for naming of variables:

1. Variable names must start with a letter or an underscore:

In [None]:
x = 10 # valid
_y = 10 # valid

9x = 10 # Invalid as starts with numeral
$y = False # Invalid as starts with symbol

2. The remainder of your variable name may consist of letters, numbers and underscores.

3. Names are case sensitive:

In [None]:
x = 9 # Define variable lower case x
y = X*5 # 'Accidentally' call upper case X 
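If you are unsure whether a name is allowed, one option is to ask Python itself. The sketch below uses the string method `str.isidentifier()` and the standard-library function `keyword.iskeyword()`; neither is covered in this session, and the list of candidate names is made up for illustration. Note that reserved words such as `for` pass the first check but still cannot be used as variable names, which is an additional rule not listed above.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration
candidates = ["x", "_y", "9x", "$y", "bond_length_2", "for"]

results = {}
for name in candidates:
    # A usable name must follow the naming rules AND not be a reserved word
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, results[name])

# Names are case sensitive, so these are two different variables
mass = 1.008
Mass = 15.999
print(mass, Mass)
```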

When you use `=` to do an assignment, what's on the left of `=` is a **name** for the **object** on the right. `=` assigns the **reference** of the object on the right to the **name** on the left. That is:

```a_name = an_object # "a_name" is now a name for the reference to the object "an_object"```
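A short sketch of what "a name for an object" means in practice is given below. The variable names and values are illustrative only, and the `is` operator (which checks whether two names refer to the same object) is not covered in this session.

```python
a_name = 3.14              # a_name refers to a number object
another_name = a_name      # a second name for the SAME object; no copy is made
same_object = a_name is another_name
print(same_object)

another_name = 2.72        # rebind another_name to a different object
print(a_name)              # a_name still refers to the original object
print(another_name)
```

Rebinding one name never changes what the other name refers to.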

You can assign multiple values to multiple variables in one line, but there must be an equal number of names on the left and values on the right of the `=` operator. <font color='red'>This is done using the syntax:</font>

`x, y, z = 1, 2, 3`

In [None]:

print(x, y, z)
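As a hedged illustration of multiple assignment, the sketch below also shows two consequences of the rule: values can be swapped in a single line, and a mismatch between the number of names and values raises a `ValueError`. The `try`/`except` construct used to catch the error is not covered in this session, and the names `a`, `b` and `message` are introduced only for this example.

```python
x, y, z = 1, 2, 3
print(x, y, z)

# Swap two values without needing a temporary variable
x, y = y, x
print(x, y)

# Two names on the left but three values on the right: Python raises a ValueError
try:
    a, b = 1, 2, 3
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```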

You can also assign a single value to several variables simultaneously. <font color='red'>This is done using the syntax:</font>

`x = y = z = 1`

In [None]:

print(x, y, z)

Variable values can be updated with new values throughout the program:

In [None]:
x = 1
y = 2
print(x + y)

x = 3 # Assign a different object to x
print(x + y)

### Comments
In the above examples you may have noticed multiple `#` symbols with text written after them. These are known as **comments** and are a crucial and often underused aspect of programming languages. Comments are text in a program that Python ignores when the code is run. Including comments in programs makes code more readable for humans:

* It provides some information or explanation about what each part of a program is doing. 
* It is a good idea to write comments while you are writing or updating a program as it is easy to forget your thought process later on.
* Comments written later may be less useful in the long term.
* Comments are greatly appreciated by anyone who uses your program or is tasked with modifying parts of it, as they need an insight into your thought process.

Try to avoid **W.E.T.** comments, meaning you 'Wrote Everything Twice'. These are comments that simply restate the code, such as:

In [None]:
print(a)  # prints a

Your comments should be **D.R.Y.** (Don’t Repeat Yourself) and offer insight into how the code truly functions. Do not be afraid to write about limitations or future changes you want to make with the code as **no program is perfect.**
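As a sketch of what D.R.Y. comments might look like, the example below explains the purpose, the units and a known limitation of a short calculation instead of restating each line. The ideal gas calculation and all numerical values are illustrative choices, not part of the course material.

```python
# Estimate the pressure of an ideal gas from p = nRT / V.
# Limitation: real gases deviate from this at high pressure or low temperature.
R = 8.314      # molar gas constant in J / (mol K)
n = 1.0        # amount of gas in mol
T = 298.15     # temperature in K
V = 0.0248     # volume in m^3

p = n * R * T / V   # pressure in Pa
print(p)
```

Each comment here adds information that the code alone does not give, such as the units of each quantity.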

# Review

In this session we covered:

* Motivation for learning to program.
* Motivation for using Python.
* How to use a Jupyter notebook.
* How to print "Hello World!".
* How to assign variables in Python.
* How to write comments in Python.

# Exercise


1. Assign the number 45 to the letter f.
2. Assign the number 13 to the letter g. 
3. Print both these variables to screen. 
4. Reassign f so that it takes the value of g, then print the sum of the two numbers.
5. Add a comment at the top of the code box explaining what the short code does.

# Glossary

Throughout this course many acronyms and phrases will be used, presented here for easy lookup:

* **AI:** Artificial Intelligence.
* **Algorithm:** A list of steps to finish a task.
* **Code:** The language that programmers create and use to tell a computer what to do.
* **CPU:** Central Processing Unit. This is the electronic circuitry within a computer that executes instructions that make up a computer program.
* **Data:** Information. Often, quantities, characters, or symbols that are the inputs and outputs of computer programs.
* **Debugging:** Finding and fixing problems in an algorithm or program.
* **DL:** Deep Learning.
* **Function:** A piece of code that you can easily call over and over again.
* **GPU:** Graphical Processing Unit. A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images.
* **Hard Drive (HDD):** This is an example of non-volatile storage, where non-volatile means the disk keeps its data even after being powered off.
* **Loop:** The action of doing something over and over again.
* **ML:** Machine Learning.
* **Program:** An algorithm that has been coded into something that can be run by a machine.
* **Pseudocode:** Pseudocode is a plain language description of the steps in an algorithm or another system.
* **RAM:** Random Access Memory. This is a form of computer memory that can be read and changed in any order, typically used to store working data and machine code.
* **Variable:** A placeholder for a piece of information that can change.
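Several of these terms can be seen together in one small sketch. The function name `total_mass` and the rounded atomic masses are assumptions made for illustration, and functions, lists and loops are all covered properly in later sessions.

```python
def total_mass(masses):
    """Function: a reusable piece of code. Algorithm: add each mass to a running total."""
    total = 0.0                 # variable: a placeholder whose value changes
    for mass in masses:         # loop: repeat the same step for every item
        total = total + mass
    return total

# Data: approximate atomic masses of H, H and O in g/mol
water = total_mass([1.008, 1.008, 15.999])
print(water)
```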

# Library Information

This section is included for information on the relevant libraries used throughout this course:
    
- numpy        = 1.16.4
- scipy        = 1.3.1
- sympy        = 1.4
- pandas       = 1.1.1
- scikit-learn = 0.20.3
- seaborn      = 0.9.0
- matplotlib   = 3.1.0
- deepchem     = 2.3.0
- tensorflow   = 1.14
- gast         = 0.2.2
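To compare these versions with what is installed in your own environment, one possible approach is sketched below. It assumes Python 3.8 or newer, where the standard-library module `importlib.metadata` is available; the variable names are illustrative, and any package that is missing is simply reported as not installed.

```python
from importlib.metadata import PackageNotFoundError, version

packages = ["numpy", "scipy", "sympy", "pandas", "scikit-learn",
            "seaborn", "matplotlib", "deepchem", "tensorflow", "gast"]

installed = {}
for package in packages:
    try:
        installed[package] = version(package)   # version string of the installed package
    except PackageNotFoundError:
        installed[package] = "not installed"

for package, number in installed.items():
    print(f"{package:<12} = {number}")
```

Your versions will probably be newer than those listed above, in which case some functions may behave slightly differently from the course notebooks.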