# Unit 03: Loops, pandas and simple plotting

<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="right"/></a>

Author: Dr Claire Hobday   

Email: claire.hobday@ed.ac.uk

## Learning objectives:

By the end of this unit, you should be able to
- use in-built functionality in python
- import modules and libraries 
- use the `math` module to do some simple scientific computing tasks
- develop more `pandas` skills to deal with large volumes of data
- use logical operations to filter data
- understand and use loops and conditional statements to do repetitive tasks, including:
    - `for`
    - `if`
    - `else`/`elif`
    - `while`
    - `break`
- combine these tools to analyse data in a large file containing information about the periodic table
- use tools from python to understand trends in the periodic table

Some of the material was adapted from [Dr. Matteo Degiacomi](https://github.com/Degiacomi-Lab/python4science/blob/master/2_Python_numerical_data.ipynb), as well as [Software Carpentries](http://swcarpentry.github.io/python-novice-gapminder/index.html).

## Table of contents
- [3.1 Working with Pandas](#31-pandas)  
   - [3.1.1 Importing libraries](#311-importing-libraries)
   - [Tasks 3.1.1](#tasks-311)
   - [3.1.2 Import the pandas library](#312-import-the-pandas-library)
   - [3.1.3 Accessing the dataframe](#313-accessing-the-dataframe)
- [3.2 Loops](#313-loops)  
   - [3.2.1 For loops](#321-for-loops)
   - [Tasks 3.2.1](#tasks-321)    
   - [3.2.2 Conditional Loops](#322-conditional-loops)    
   - [Tasks 3.2.2](#tasks-322)
- [3.3 Boolean indexing](#33-boolean-indexing)    
   - [Tasks 3.3.1](#tasks-331)
- [3.4 Using Pandas DataFrames with loops to analyse data](#34-using-pandas-dataframes-with-loops-to-analyse-data)
   - [Task 3.4.1](#task-341)
- [3.5 Plotting with Pandas](#35-plotting-with-pandas)
   - [Tasks 3.5.1](#tasks-351)
- [3.6 Feedback](#36-feedback)


   

**<span style="color:black">Jupyter Cheat Sheet</span>**
- To run the currently highlighted cell and move focus to the next cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To run the currently highlighted cell and keep focus in the same cell, hold <kbd>Ctrl</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- To get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;

### Links to documentation

You can find useful documentation for `math` and `pandas`, as well as the anaconda and PyPI package repositories, at:
- [math](https://docs.python.org/3/library/math.html)
- [pandas](https://pandas.pydata.org/)
- [anaconda](https://anaconda.org)
- [PyPI](https://pypi.org)

# 3.1 Working with Pandas <a id='31-pandas'></a>

## 3.1.1 Importing libraries <a id="311-importing-libraries"></a>

### Most of the power of a programming language is in its libraries.

- A library is a collection of files (called modules) that contains functions for use by other programs.
- A library may also contain data values (e.g. numerical constants) and other things.
- A library’s contents are supposed to be related, but there’s no way to enforce that.
- The Python standard library is an extensive suite of modules that comes with Python itself.
- Many additional libraries are available from anaconda or PyPI (the Python Package Index, see above for links).

### A program must import a library module before using it.
- Use `import` to load a library module into a program’s memory.
- Then refer to things from the module as `module_name.thing_name`.
  - Python uses `.` to mean “part of”.
- We will be using a library called `pandas`

### Import specific items from a library module to shorten programs.
- Use `from ... import ...` to load only specific items from a library module.
- Then refer to them directly without library name as prefix.

Run the cell below to import the general libraries and helper functions used in this unit (we will import `pandas` separately later on):

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Make the helper functions accessible
import sys
import os.path
sys.path.append(os.path.abspath('../'))
from helper_functions.mentimeter import Mentimeter

Run the cell below to import $\cos$ and $\pi$ from the `math` library:

In [None]:
# We are importing the function cos and the constant pi from the library math
from math import cos, pi

print('cos(pi) is', cos(pi))

### Create an alias for a library module when importing it to shorten programs.

- Use `import ... as ...` to give a library a short alias while importing it.
- Then refer to items in the library using that shortened name.

In [None]:
import math as m

print('cos(pi) is', m.cos(m.pi))

# Tasks 3.1.1 <a id="tasks-311"></a> 

<div class="alert alert-success">
<b>Task 3.1.1 a: Jigsaw Puzzle (Parsons Problem) Programming Example</b>
</div>

Rearrange the following statements so that a random DNA base and its index in the string are printed. Remember that you've already imported `math` above! You can test your answer in the empty cell below to understand what it is doing. 

In [None]:
from IPython.display import IFrame
IFrame('https://parsons.herokuapp.com/puzzle/7cf55d16a0454de580f31418505f3b54', width=1000, height=400)

In [None]:
# Test out the code in this cell once you have the right order!

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.1 a </summary>
    
```python
import math 
import random
bases = "ACTTGCTTGAC" 
n_bases = len(bases)
idx = random.randrange(n_bases)
print(f"Random base {bases[idx]} base index {idx}.")

```
 </details>

<div class="alert alert-success">
<b>Task 3.1.1 b: Importing With Aliases</b>
</div>


1. Fill in the blanks so that the program below prints `90.0`.
2. Rewrite the program so that it uses `import` without `as`.
3. Which form do you find easier to read?

In [None]:
# Question 1
import math as m
angle = # FIXME.degrees(# FIXME.pi / 2)
print(# FIXME)

In [None]:
# Question 2

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.1 b </summary>
    
Filling in the right variables:  
    
```python
import math as m
angle = m.degrees(m.pi / 2)
print(angle)   
    
```
Rewriting the program using `import` without `as`:
    
```python
import math
angle = math.degrees(math.pi / 2)
print(angle)  
    
```   
    
*Explanation*:
  
Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.

 </details>

<div class="alert alert-success">
<b>Task 3.1.1 c: Importing specific items</b>
</div>


1. Import the exponential function from the math library.
2. Use it to work out $e^{10}$.
3. Import a function from math which will allow you to raise a number to a power of your choice.
4. Use it to calculate $6^5$.

In [None]:
# FIXME


<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.1 c </summary>
    
Filling in the right variables:  
    
```python
#1
from math import exp 
#2
number= exp(10)
print(number)
#3
from math import pow
#4
powered = pow(6,5)
print(powered)   
```   
    
*Explanation*:
  
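The first part should be relatively straightforward. For part 3, you may need to search the `math` library documentation (linked above) to find the function that raises a number to the power of another number. Note that `math.pow` always returns a float (`7776.0`), whereas Python's `**` operator, as in `6 ** 5`, returns the integer `7776`.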

 </details>

<div class="alert alert-info"> <b>Key Points</b></div>

- Most of the power of a programming language is in its libraries.
- A program must import a library module in order to use it.
- Import specific items from a library to shorten programs.
- Create an alias for a library when importing it to shorten programs.

## 3.1.2 Importing the pandas library <a id="312-import-the-pandas-library"></a>

In [Unit 02](../Unit_02/Unit_02_variables.ipynb), we looked at a couple of different ways of opening files; in this unit we are going to use pandas exclusively. 

Pandas is a Python library that works much like Excel, but with the added advantage that we can manipulate the data programmatically. 


In [None]:
# The community-agreed alias for pandas is pd,
# and the pandas documentation assumes this alias throughout.
import pandas as pd

### Import the data

Now, we need to import the data into what pandas calls a `DataFrame`, which takes the input data and formats it as a sort of "spreadsheet" with this form:



![pandas-data-structure.svg](images/pandas-data-structure.svg)

In [None]:
# Use pandas to read the csv file:
data = pd.read_csv('files/ptable.csv') 
# View the imported dataframe; note how the index (the row labels 0, 1, 2, ...) is shown in bold:
data 

Now that we've imported this data, we will learn some more fundamental Python concepts that we will use to examine it later in this unit. 

<br>

## 3.1.3 Accessing the dataframe <a id="313-accessing-the-dataframe"></a>

We are now going to view the dataframe in a few different ways:
- `data.head()` shows the first 5 rows of the dataframe. Note how Python counts from 0.
- `data.tail()` shows the last 5 rows of the dataframe.
- `data.columns` lists all the column headers, which are the properties associated with each element.
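`head()` and `tail()` also accept the number of rows to show, and `.shape` gives the size of a dataframe as `(rows, columns)`. A minimal sketch, using a small made-up dataframe (`df`, with hypothetical columns `x` and `y`) rather than the periodic-table file, so that it runs on its own:

```python
import pandas as pd

# A small made-up dataframe: x = 0..9 and y = x squared
df = pd.DataFrame({"x": range(10), "y": [v ** 2 for v in range(10)]})

print(df.head(3))          # first 3 rows (the default is 5)
print(df.tail(2))          # last 2 rows
print(df.shape)            # (rows, columns) -> (10, 2)
print(list(df.columns))    # ['x', 'y']
```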

Test them out below:

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.columns

We might also be interested in knowing what the datatypes are for the columns.

In [None]:
print('Period is data type', data['Period'].dtype)

It is also possible to change the datatype of a column in a dataframe using the `.astype()` method.

In [None]:
data['Period'] = data['Period'].astype(float)

In [None]:
print('Period is data type', data['Period'].dtype)

<div class="alert alert-info">
Once we have learned a little more about loops and filtering data, we will come back and analyse this data.
</div>

# 3.2 Loops

<a id='313-loops'></a>
![loop.png](images/loop.png)


A loop repeats a set of statements, for example once for each item in a collection.
There are several kinds of loops that are useful in different situations; we are going to go through some of the most common ones. 

## 3.2.1 For loops
<a id='321-for-loops'></a>
A `for` loop is used to iterate over a sequence, such as a list, tuple, dictionary or string.

Everything inside the body of the `for` statement is executed once for each item in the sequence.

![loop2.png](images/loop2.png)
    
  

In Python, a `for` loop has the structure:
```Python
for variable in iterable:
    statement(s) 
```

where:
* `variable` is the loop variable, which holds the current item
* `iterable` is a collection of objects such as a list or tuple
* `statement(s)` in the loop body (denoted by the **indent**) are executed once for each item in **iterable**.

The loop variable `variable` takes on the value of the next element in `iterable` each time through the loop, until we have iterated through all items in `iterable`.
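The loop variable works the same way whatever the iterable is. A short sketch, using made-up example collections (the dictionary `masses` holds standard atomic masses purely for illustration), showing loops over a string, a tuple and a dictionary. Note that looping over a dictionary gives its keys, while `.items()` gives key-value pairs:

```python
# Over a string: one character at a time
for letter in "CHO":
    print(letter)

# Over a tuple: one item at a time
for charge in (-1, 0, 1):
    print(charge)

# Over a dictionary: the loop variable takes each KEY in turn
masses = {"H": 1.008, "C": 12.011, "O": 15.999}
for symbol in masses:
    print(symbol, masses[symbol])

# .items() gives (key, value) pairs, unpacked into two loop variables
for symbol, mass in masses.items():
    print(symbol, mass)
```

After each loop finishes, the loop variable keeps the last value it was given.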

Let's take a look at some simple examples that show us how powerful `for` loops can be, and how they must be properly structured to be interpreted by Python. 

### Example 3.2.1

The first line of the `for` loop must end with a colon, and the body must be indented.


In [None]:
for number in [2,3,5]:
    print(number)

This `for` loop is equivalent to:

In [None]:
print(2)
print(3)
print(5)

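The `for` loop is a much more concise way of doing this task than typing out `print(value)` for every value.

The two cells below each contain a common mistake: in the first, the body of the loop is not indented; in the second, the first line is missing its colon. Run them, read the error messages, and then fix them.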

In [None]:
# FIXME
for number in [2,3,5]:
print(number)

In [None]:
# FIXME
for number in [2,3,5]
    print(number)

### Example 3.2.2

Loop variables can be called anything, but try to make their names as meaningful as possible.

In [None]:
for kitten in [2, 3, 5]:
    print(kitten)

In [None]:
for numbers in [2,3,5]:
    print(numbers)

### Example 3.2.3

The body of a loop can contain many statements.
However, it's best practice to keep the body of a loop to no more than a few lines.

In [None]:
primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)

### Example 3.2.4


Use `range` to iterate over a sequence of numbers.

The built-in function `range` produces a sequence of numbers.

- A `range` is not a list: the numbers are produced on demand, which makes looping over large ranges more memory-efficient. It is also much easier than typing out a long list of consecutive numbers by hand.
- `range(N)` produces the numbers `0..N-1`.
- These are exactly the legal indices of a list or character string of length `N`.
- For example, `range(5)` produces `0, 1, 2, 3, 4`.
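A short sketch of `range` in action. Wrapping it in `list()` is done here only so that we can see all the numbers at once; the string `bases` is an illustrative example:

```python
# range(N) gives 0, 1, ..., N-1
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range(start, stop) starts at `start` and stops BEFORE `stop`
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step) counts in steps of `step`
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

# range(len(...)) produces exactly the legal indices of a sequence
bases = "ACGT"
for i in range(len(bases)):
    print(i, bases[i])
```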

In [None]:
print('a range is not a list: range(0, 3)')
for number in range(0, 3):
    print(number)

### Example 3.2.5 


The Accumulator pattern turns many values into one.

A common pattern in programs is to:
 1. Initialize an accumulator variable to zero, the empty string, or the empty list.
 2. Update the variable with values from a collection.

In [None]:
# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)

- Read `total = total + (number + 1)` as:
    - Add 1 to the current value of the loop variable `number`.
    - Add that to the current value of the accumulator variable `total`.
    - Assign that to `total`, replacing the current value.
- We have to add `number + 1` because `range` produces `0..9`, not `1..10`.
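The accumulator above can be written a little more simply by starting the range at 1, and Python's built-in `sum()` function performs the same accumulation in a single call. A minimal sketch:

```python
# Accumulator loop: range(1, 11) gives 1..10, so no "+ 1" is needed
total = 0
for number in range(1, 11):
    total = total + number
print(total)               # 55

# The built-in sum() does the same accumulation in one call
print(sum(range(1, 11)))   # 55
```

Because `sum` is a built-in function, naming one of your own variables `sum` hides it for the rest of the session, so `sum(...)` will stop working until the kernel is restarted.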

## Tasks 3.2.1 <a id="tasks-321"></a>

<div class="alert alert-success">
<b>Task 3.2.1 a: Practice Accumulating 1</b>
</div>

Fill in the blanks 

In [None]:
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word) # FIXME
print(total)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 a </summary>
    
```python
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
```
</details>  

<div class="alert alert-success">
<b>Task 3.2.1 b: Practice Accumulating 2</b>
</div>
Fill in the blanks 

In [None]:
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____ # FIXME
for word in ["red", "green", "blue"]:
    lengths.____(____) # FIXME
print(lengths)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 b </summary>
    
```python
 
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)

```


 </details>

<div class="alert alert-success">
<b>Task 3.2.1 c: Practice Accumulating 3</b>
</div>
Fill in the blanks

In [None]:
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____  # FIXME
for ____ in ____: # FIXME
    ____  # FIXME
print(result)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 c </summary>
    
```python
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

```


 </details>


<div class="alert alert-success">
<b>Task 3.2.1 d Create a whole loop</b>
</div>

Start out with an empty string `acronym = ""`.
Write a loop that uses the words `'red'`, `'green'` and `'blue'` and the string method `upper()` so that, by the end of the loop, `print(acronym)` displays `RGB`.

In [None]:
# FIXME
acronym = ""


print(acronym)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 d </summary>
    
```python
acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)

```


 </details>

<div class="alert alert-success">
<b>Task 3.2.1 e: Cumulative Sum</b>
</div>

Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be `[1, 3, 5, 10]`.

In [None]:
# FIXME
cumulative.append(sum)
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 e </summary>
    
```python
data = [1,2,2,5]
cumulative = []
sum = 0
for number in data:
    sum += number
    cumulative.append(sum)
print(cumulative)

```


 </details>

<div class="alert alert-success">
<b>Task 3.2.1 f: Identifying Variable Name Errors</b>
</div>

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of `NameError` do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.

In [None]:
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.1 f </summary>
    
```python
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

```
##### Explanation:
The variable `message` needs to be initialized and Python variable names are case sensitive: `number` and `Number` refer to different variables.

 </details>


<div class="alert alert-info"> <b>Key Points</b></div>

- A for loop executes commands once for each value in a collection.
- A `for` loop is made up of a collection, a loop variable, and a body.
- The first line of the `for` loop must end with a colon, and the body must be indented.
- Indentation is always meaningful in Python.
- Loop variables can be called anything (but it is strongly advised to give the loop variable a meaningful name).
- The body of a loop can contain many statements.
- Use `range` to iterate over a sequence of numbers.
- The Accumulator pattern turns many values into one.

## 3.2.2 Conditional loops <a id='322-conditional-loops'></a>

Programming languages often borrow terms from natural languages. Here we will see how Python interprets **conditional** statements. In English, a conditional sentence describes what should happen *if* some condition is true. 

E.g. If it rains, take an umbrella. Or, if the pH is below 7, it's acidic. 

Notice how the first phrase controls the content of the second phrase.

E.g. If it's sunny, wear sunscreen. Or, if the pH is above 7, it's basic. 


We could take this analogy further and allow more options,
e.g. if the pH is above 7, it's basic; else if it's below 7, it's acidic; otherwise (or *else*), it's neutral. Notice how we can categorise the information using these conditional statements. 


We can use `if` statements to allow our computer programs to do different things for different data. 
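A minimal sketch of the pH analogy in Python, assuming a single illustrative value stored in `pH`. It uses `if`, `elif` and `else`, which are covered step by step in the examples that follow; a pH of exactly 7 falls through to the final `else` branch:

```python
pH = 7.0   # illustrative value; try 3.2 or 11.5 as well

if pH > 7:
    category = "basic"
elif pH < 7:
    category = "acidic"
else:
    # Only reached when pH is neither above nor below 7
    category = "neutral"

print("pH", pH, "is", category)
```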

### Use `if` statements to control whether or not a block of code is executed

### Example 3.2.6 - `if`

Use an `if` statement to control whether or not a block of code is executed.
- An `if` statement (more properly called a conditional statement) controls whether some block of code is executed or not.
- Structure is similar to a `for` statement:
    - First line opens with `if` and ends with a colon
    - Body containing one or more statements is indented (usually by 4 spaces or a tab)

In [None]:
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

In [None]:
mass = 2.07
if mass > 3.0:
    print(mass, 'is large')

Things that you should notice:
- The importance of ending the first line of the `if` statement with a colon.
- How the computer does not print anything in the second code block, as the value does not meet the `if` condition. 

### Example 3.2.7 - `if`


Conditionals are often used inside loops.
- Not much point using a conditional when we know the value (as above).
- But useful when we have a collection to process.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(mass, 'is large')

### Example 3.2.8 - `if` and `else`

Use `else` to execute a block of code when an `if` condition is not true.
- `else` can be used following an `if`.
- Allows us to specify an alternative to execute when the `if` condition is not met.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(mass, 'is large')
    else:
        print(mass, 'is small')

### Example 3.2.9 - `if` and `elif`

Use `elif` to specify additional tests.
- May want to provide several alternative choices, each with its own test.
- Use `elif` (short for “else if”) and a condition to specify these.
- Always associated with an `if`.
- Must come before the `else` (which is the “catch all”).

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 9.0:
        print(mass, 'is HUGE')
    elif mass > 3.0:
        print(mass, 'is large')
    else:
        print(mass, 'is small')

### Example 3.2.10 - order of conditions

Conditions are tested once, in order.
- Python steps through the branches of the conditional in order, testing each in turn.
- So ordering matters.

In [None]:
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')

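Our value meets the condition in the first `if` statement, so none of the `elif` statements are evaluated and the program prints `grade is C`, even though 85 also satisfies `grade >= 80`. To get the intended result, the highest threshold must be tested first.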
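One way to fix the ordering, sketched below, is to test the highest threshold first (the variable `letter` and the `'below C'` catch-all are introduced here for illustration):

```python
grade = 85

# Highest threshold first, so each grade reaches the right branch
if grade >= 90:
    letter = 'A'
elif grade >= 80:
    letter = 'B'
elif grade >= 70:
    letter = 'C'
else:
    letter = 'below C'   # catch-all for grades under 70

print('grade is', letter)   # grade is B
```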

### Example 3.2.11 - using conditionals to evolve the values of variables


In the example below we use `if` and `else` within a `for` loop in order to change the value of `velocity`.

Notice:
- the indent for the `for` loop and also for the `if` and `else` statements.
- the use of the colon at the end of `for`, `if` and `else` statements.
- The program must have a `print` statement outside the body of the loop to show the final value of velocity, since its value is updated by the last iteration of the loop.

In [None]:
velocity = 10.0
# Execute the loop 5 times
for i in range(5): 
    print("try",i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)

# Tasks 3.2.2 <a id="tasks-322"></a>

<div class="alert alert-success">
<b>Task 3.2.2 a: Trimming values:</b>
</div>

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.

In [None]:
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____ # FIXME
for value in original:
    if ____: # FIXME
        result.append(0)
    else:
        ____ # FIXME
print(result)

Output should look like this:   
```[0, 1, 1, 1, 0, 1]```

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.2 a </summary>
    
```python
    
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value<0.0:
        result.append(0)
    else:
        result.append(1)
print(result)

```


 </details>

<div class="alert alert-success">
<b>Task 3.2.2 b: Initializing</b>
</div>

Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

What are the advantages and disadvantages of using this method to find the range of the data?

In [None]:
values = [...some test data...] # FIXME
smallest, largest = None, None
for v in values:
    if ____: # FIXME
        smallest, largest = v, v
    ____: # FIXME
        smallest = min(____, v) # FIXME
        largest = max(____, v) # FIXME
print(smallest, largest)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.2.2 b </summary>
    
```python
values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)

```


 </details>

<div class="alert alert-info"> <b>Key Points</b></div>

- Use `if` statements to control whether or not a block of code is executed.
- Conditionals are often used inside loops.
- Use `else` to execute a block of code when an `if` condition is *not* true.
- Use `elif` to specify additional tests.
- Conditions are tested once, in order.
- Create a table showing variables’ values to trace a program’s execution.

# 3.3 Boolean indexing <a class="anchor" id="33-boolean-indexing"></a>

<img src="https://upload.wikimedia.org/wikipedia/commons/c/ce/George_Boole_color.jpg" width="250" style="float: right">

Closely related to `if`, `elif` and `else` statements are **Booleans**.


**George Boole** was a 19th-century self-taught English mathematician, philosopher and logician. He is known for Boolean algebra, which is based on variables being **True** or **False**, denoted as **1** and **0** respectively. 

The operations in Boolean algebra are **and** denoted as $\wedge$, **or** denoted as $\vee$ , and **not** denoted as $\neg$.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/ae/Vennandornot.svg/2560px-Vennandornot.svg.png" width="300" style="float: center" title="Venn diagram"> <em><center>Venn diagram</center></em>

In fact, every `if` statement we have written already asks Python to perform a Boolean test. If the condition evaluates to `True`, the indented block is executed; if it evaluates to `False`, the block is skipped. The Boolean value `True` or `False` is what determines the fate of our `if` statement. 
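A comparison is itself an expression that evaluates to a Boolean. A short sketch (the variable names are illustrative) showing this, together with Python's keyword operators `and`, `or` and `not`, which combine single `True`/`False` values:

```python
mass = 3.54
is_large = mass > 3.0             # a comparison evaluates to a Boolean
print(is_large, type(is_large))   # True <class 'bool'>

# Keyword operators for single Boolean values
print(True and False)   # False  (AND)
print(True or False)    # True   (OR)
print(not True)         # False  (NOT)

# Booleans behave like the numbers 1 and 0
print(int(True), int(False))   # 1 0
print(True + True)             # 2
```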


### Bitwise Operators

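In Python there are several ways to carry out the same Boolean operations. Here we are going to use the bitwise operators, which compare numbers bit by bit in their binary representation. In pandas, `&`, `|` and `~` are applied element by element to Series of Booleans, which is why we use them (rather than Python's `and`, `or` and `not`) to filter data.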

<div class="alert alert-info">
    
| Operator | Name | Description |
| :- | :- | :- |
| `&` | AND | Sets each bit to 1 if both bits are 1 |
|  &#124; | OR | Sets each bit to 1 if one of two bits is 1 |
| `^` | XOR | Sets each bit to 1 if only one of two bits is 1 |
| `~` | NOT | Inverts all the bits |
| `<<` | Zero fill left shift | Shift left by pushing zeros in from the right and let the leftmost bits fall off |
| `>>` | Signed right shift | Shift right by pushing copies of the leftmost bit in from the left, and let the rightmost bits fall off |

</div>
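To see what the table means in practice, here is a short sketch applying each operator to two small integers, written in binary with the `0b` prefix (the values 5 and 3 are arbitrary examples):

```python
x = 0b0101   # 5 in binary
y = 0b0011   # 3 in binary

print(x & y, bin(x & y))   # 1 0b1     (bits set in both)
print(x | y, bin(x | y))   # 7 0b111   (bits set in either)
print(x ^ y, bin(x ^ y))   # 6 0b110   (bits set in exactly one)
print(~x)                  # -6        (for Python integers, ~x equals -x - 1)
print(x << 1)              # 10        (shift left: multiply by 2)
print(x >> 1)              # 2         (shift right: floor-divide by 2)

# On the Booleans True (1) and False (0), & | ^ act like AND, OR, XOR
print(True & False, True | False, True ^ True)   # False True False
```

Be aware that `~` applied to a plain Python `True` gives `-2` rather than `False` (and is deprecated in recent Python versions), whereas on a pandas Series of Booleans `~` flips each value.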


### Boolean Tests <a class="anchor" id="booltest"></a>

Boolean tests on an array produce an array of booleans:

<img src="images/BooleanOp.png" width="500">


The image shows, for each value in the series, the Boolean outcome of two tests. 


In [None]:
# Create a pandas Series (a 1D labelled array, like one column of an Excel sheet)
a = pd.Series([32, 2, 65, 29, 7, 14, 57, 81, 27, 0, 56])

To print variables together with strings, we can use [f-strings](https://realpython.com/python-f-strings/).

The structure of f-strings is as follows: 
```python
my_variable = 4
print(f"some text before a variable: {my_variable}")
```
This prints the following:
```
some text before a variable: 4
```
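f-strings can also control how a value is displayed, by adding a format specifier after a colon inside the braces. A brief sketch, with illustrative variables `symbol` and `mass`:

```python
symbol = "C"
mass = 12.011

print(f"{symbol} has atomic mass {mass}")       # C has atomic mass 12.011
print(f"{symbol} has atomic mass {mass:.1f}")   # C has atomic mass 12.0 (1 decimal place)
print(f"{symbol:>3}|{mass:8.2f}|")              # "  C|   12.01|" (right-aligned, fixed width)
print(f"{mass = }")                             # mass = 12.011 (needs Python 3.8+)
```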

In [None]:
# Take a look at the format of the series
# "\n" means new-line, so the variable gets printed on the line below followed by an empty line 
print(f"Series a \n {a} \n") 

In [None]:
# Declare tests: each comparison returns a Series of Booleans (one True/False per value)
c = a > 15
d = a < 0

print (f"condition c = a>15 \n {c} \n")
print (f"condition d = a<0 \n {d} \n")


Here we have applied the two tests c and d separately. 
What if we want to know where both tests are satisfied at the same time?

In [None]:
# AND logic
print(c & d)

The output is `False` at every index, because no value can be both greater than 15 and less than 0. 

Now let's try OR (`|`), which gives `True` wherever at least one of the tests is satisfied:

In [None]:
# OR logic
print(c | d)

###  Boolean Indexing in Series <a class="anchor" id="boolind"></a>
We can also use an array of Booleans to index another array, i.e. only the elements corresponding to `True` are extracted from the indexed array.
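When a Boolean mask is written inline, each comparison must be wrapped in parentheses, because `&` and `|` bind more tightly than `>` and `<`; without them, an expression such as `a > 15 & a < 60` is parsed differently and typically raises a `ValueError`. A minimal sketch, recreating the series `a` used in this section:

```python
import pandas as pd

a = pd.Series([32, 2, 65, 29, 7, 14, 57, 81, 27, 0, 56])

# Parentheses around each comparison are required
between = a[(a > 15) & (a < 60)]
print(between.tolist())   # [32, 29, 57, 27, 56]

# ~ negates the whole mask: everything NOT between 15 and 60
outside = a[~((a > 15) & (a < 60))]
print(outside.tolist())   # [2, 65, 7, 14, 81, 0]
```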

<img src="images/Boolean.png" width="500">

In [None]:
c = a > 15
d = a < 0
a_cVd = a[c | d]
print(a_cVd)

### Boolean Indexing in DataFrames

Using these same logical principles, we can index whole dataframes to recover just the data we need. 
Let's take a look at the dataframe below:

In [None]:
# Dictionary of lists
# (named student_data rather than dict, so we don't hide Python's built-in dict)
student_data = {'name': ["Toni", "James", "Claire", "Valentina"],
                'degree': ["Chemistry", "Medicinal and Biological Chemistry", "Chemical Physics", "Chemical Physics"],
                'score': [90, 77, 61, 98]}

# Create a dataframe from the dictionary
df = pd.DataFrame(student_data)

print(df)


Now let's use the equality operator `==` to select just the students who study Medicinal and Biological Chemistry (MBC).

In [None]:
# Using a comparison operator for filtering of data
    
Cr = df["degree"] == "Medicinal and Biological Chemistry"
print(Cr)
# print(df['degree'] == 'Medicinal and Biological Chemistry')

In [None]:
# Apply the indexing to our dataframe to return only those that fit our criteria
print(df[Cr])

In [None]:
# Apply the indexing to our dataframe to return only those that DO NOT fit our criteria
print(df[~Cr])

# Tasks 3.3.1 <a id="tasks-331"></a>

<div class="alert alert-success">
    <b>Task 3.3.1 a: Mass-spectrometry</b>

Using the mass-spectrometry data in the file `ms.txt`, find the m/z values in the region between m/z 6400 and 6600. 
</div>


In [None]:
# FIXME
# Read the file ms.txt into the dataframe, and make sure to give the data column names
ms_data = pd.read_csv(filepath_or_buffer=____, sep=___, header=____, names = [_____])

# Criteria for slicing data

print(Output)

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.3.1 a </summary>
    
```python
ms_data = pd.read_csv(filepath_or_buffer="files/ms.txt", sep="\t", header=None, names=["m/z", "intensity"])

# Criteria for slicing data
Crit1 = ms_data["m/z"] > 6400
Crit2 = ms_data["m/z"]< 6600
Output = ms_data[Crit1 & Crit2 ]
print(Output)

```

*Explanation*: we need to know the path to the file, and we need to look at the file to work out how the columns of data are separated. Does the data have headers? What should the column names be?
 </details>

<div class="alert alert-success">
    <b>Task 3.3.1 b</b>: Find the m/z value at which the peak (maximum intensity) occurs between m/z 6400 and 6600. 
</div>

In [None]:
# FIXME


<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.3.1 b </summary>
    
```python
intensity = Output["intensity"]
m_z = Output["m/z"]
max_value = intensity.max()
max_index = intensity.idxmax()

print("peak", max_value, "at m/z", m_z[max_index])

```


 </details>

# 3.4 Using Pandas DataFrames with loops to analyse data <a id='34-using-pandas-dataframes-with-loops-to-analyse-data'></a>

We are now going to return to the dataframe we looked at earlier, containing information about the periodic table, and pull some more information out of it.

In [None]:
ptable = pd.read_csv("files/ptable.csv")

In [None]:
# The rows of the dataframe are labelled 0 to 117, one per element.
# Row 0 is hydrogen.
print(ptable.loc[0, :]) 

We can see that there are some variables which return `NaN` (which stands for "Not a Number"), we can remove or fill these values easily in pandas.
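It's worth checking exactly what these two methods do before applying them to the periodic table. A minimal sketch on a tiny made-up dataframe (`toy`, with hypothetical columns `name` and `value`); note that both methods return a new dataframe and leave the original unchanged:

```python
import numpy as np
import pandas as pd

# A tiny made-up dataframe with one missing value (np.nan)
toy = pd.DataFrame({"name": ["x", "y", "z"],
                    "value": [1.0, np.nan, 3.0]})

dropped = toy.dropna()   # removes every row that contains at least one NaN
filled = toy.fillna(0)   # replaces every NaN with 0

print(dropped)
print(filled)
print(toy)               # unchanged: both methods returned NEW dataframes
```

Be careful with `fillna(0)` on physical data: replacing an unknown melting point with 0, for example, would make that element appear to melt at 0 K.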

In [None]:
# drop all values
ptable.dropna() 
# or fill with zeros 
ptable.fillna(0)

We are going to add a new column to our dataframe containing the mass number, calculated from the numbers of neutrons and protons, which we already have in two columns. This is just a simple addition of the two columns.

In [None]:
# add new column to the dataframe
ptable["Calculated_mass_number"] = ptable["NumberofNeutrons"] + ptable["NumberofProtons"]

Calculate the difference between our calculated value `Calculated_mass_number` and the value `AtomicMass` given in the original data. <br>
Think about why the values would be different.

In [None]:
ptable["Difference_in_mass"] = ptable["Calculated_mass_number"] - ptable["AtomicMass"]
# Loop over every row: column 1 holds the element name, and the last column (-1) is the new difference
for i in range(len(ptable)):
    Name = ptable.iloc[i, 1]
    Difference = ptable.iloc[i, -1]
    print("Difference in mass for", Name, Difference)
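
`ptable.iloc[i, 1]` and `ptable.iloc[i, -1]` depend on column *positions*, and these change if columns are added or reordered. The alternative sketch below assumes the same column names as `ptable`. It uses `zip` to pair the two columns, so each value is picked by name. The numbers are rounded values typed in by hand, not read from `files/ptable.csv`.

```python
import pandas as pd

# Hypothetical, rounded values; the real numbers come from files/ptable.csv
df = pd.DataFrame({
    "Element": ["Hydrogen", "Helium", "Lithium"],
    "AtomicMass": [1.008, 4.003, 6.94],
    "Calculated_mass_number": [1, 4, 7],
})
df["Difference_in_mass"] = df["Calculated_mass_number"] - df["AtomicMass"]

# Pair each element name with its difference by column name rather than by position
for name, difference in zip(df["Element"], df["Difference_in_mass"]):
    print("Difference in mass for", name, round(difference, 3))
```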

Now let's use some Boolean logic on the periodic table data. Below we will look for all elements that exist as solids at standard temperature and pressure.

In [None]:
solids = ptable[ptable['Phase'] =='solid']
solids
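
The comparison `ptable['Phase'] == 'solid'` produces a Boolean Series with one `True`/`False` per row. Indexing with that Series keeps only the `True` rows. The minimal sketch below uses a hypothetical mini table and shows two further uses of the same mask. It assumes that `'liquid'` and `'gas'` are the other labels in the `Phase` column.

```python
import pandas as pd

# Hypothetical mini table; 'liquid' and 'gas' are assumed labels alongside 'solid'
df = pd.DataFrame({
    "Element": ["Hydrogen", "Carbon", "Bromine", "Iron"],
    "Phase": ["gas", "solid", "liquid", "solid"],
})

# The comparison produces a Boolean Series: True where the element is a solid
is_solid = df["Phase"] == "solid"

# True counts as 1, so summing the mask counts the solids
print(is_solid.sum())  # 2

# .loc with the mask and a column name returns just the element names
print(df.loc[is_solid, "Element"].tolist())  # ['Carbon', 'Iron']
```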

# Task 3.4.1 <a id="task-341"></a>

<div class="alert alert-success">
    <b>Task 3.4.1 </b>: Set up Boolean tests to check which elements are liquid at 297 K, and extract the names of those elements.
</div>


In [None]:
# What columns from the dataframe will we need?

# FIXME

# Set Boolean tests to check what is liquid at 297 K (room T)

# FIXME

# Run the test and print names of elements that are liquid at room temperature

# FIXME


<details> <summary {style="color:green;font-weight:bold"}> Click here to see the full solution to Task 3.4.1 </summary>

    
```python
    
# What columns from the dataframe will we need?    
element = ptable["Element"]
melting = ptable["MeltingPoint"]
boiling = ptable["BoilingPoint"]
    
# Boolean to check what is liquid at 297 K (room T)
    
T1 = melting<297
T2 = boiling>297

criteria = T1 & T2
print(element[criteria])

```
    

</details>
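When you write the two tests on one line, each comparison needs its own parentheses. In Python, `&` binds more tightly than `<` and `>`. The sketch below uses approximate melting and boiling points (in kelvin) typed in by hand, not the values in `ptable`.

```python
import pandas as pd

# Approximate melting and boiling points in kelvin, typed in by hand for illustration
df = pd.DataFrame({
    "Element": ["Hydrogen", "Bromine", "Mercury", "Gallium", "Iron"],
    "MeltingPoint": [14.0, 265.8, 234.3, 302.9, 1811.0],
    "BoilingPoint": [20.3, 332.0, 629.9, 2673.0, 3134.0],
})

T = 297  # room temperature in K

# & binds more tightly than < and >, so each comparison needs its own parentheses
liquid = (df["MeltingPoint"] < T) & (df["BoilingPoint"] > T)

print(df.loc[liquid, "Element"].tolist())  # ['Bromine', 'Mercury']
```

Gallium is excluded because its melting point is just above 297 K.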


# 3.5 Plotting with Pandas <a id='35-plotting-with-pandas'></a>

Python offers many ways to make plots. To begin with, we will use the plotting built into pandas, which uses the popular `matplotlib` library behind the scenes. In later sessions we will use matplotlib directly, but for now we will stick to pandas plotting.

A pandas dataframe can plot its own columns with matplotlib in a single command. <br>

The simplest thing we can do is plot all the numeric variables against the index with:

In [None]:
ptable.plot()

However, this plot is not very informative. We need to use what we learned in the previous session and earlier in this session to plot the data more sensibly. <br>
To see which columns we can plot against each other, use this command:

In [None]:
ptable.columns

Now, let's take two variables from the column headers and plot them against each other.<br>
Let's see whether we can spot trends across the periods of the periodic table.

In [None]:
ptable.plot.scatter(x = 'AtomicNumber', y = 'AtomicRadius')

The trend within each period is clear. Atomic radius decreases as atomic number increases across a period, then jumps up at the start of the next period.
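`DataFrame.plot.scatter` returns a matplotlib `Axes` object, so you can add axis labels and a title after plotting. The sketch below uses made-up radii that fall across one period, purely for illustration. The two `matplotlib` lines at the top select a non-interactive backend so the code runs without a display. Leave them out in the notebook, or your plots will not appear.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit these two lines in a notebook
import pandas as pd

# Made-up radii that shrink across one period (illustrative numbers only)
df = pd.DataFrame({
    "AtomicNumber": [3, 4, 5, 6, 7, 8, 9, 10],
    "AtomicRadius": [1.6, 1.3, 1.1, 0.9, 0.8, 0.7, 0.6, 0.5],
})

# DataFrame.plot.scatter returns a matplotlib Axes that we can label further
ax = df.plot.scatter(x="AtomicNumber", y="AtomicRadius")
ax.set_xlabel("Atomic number")
ax.set_ylabel("Atomic radius (arbitrary units)")
ax.set_title("Illustrative trend across one period")
```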

Now let's try a type of plot other than a scatter plot.
We can use a pie chart to show how many elements were discovered in each location.

Pandas has a handy `value_counts()` method that counts how many times each distinct value appears in a column. We can use it on the column holding this data.

In [None]:
counting = ptable.discovery_location.value_counts()
print(counting)

In [None]:
# Then we can plot these data as a pie chart

counting.plot.pie(figsize=(10,10))
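
`value_counts()` returns a Series whose index holds the distinct values and whose entries hold the counts, sorted from most to least frequent. That Series is what `plot.pie` draws. The sketch below uses a hypothetical list of discovery locations, not the real `discovery_location` column. `autopct` is passed through to matplotlib's `pie` to print percentages on the slices. As before, omit the two `matplotlib` backend lines in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit these two lines in a notebook
import pandas as pd

# Hypothetical discovery locations (not the real ptable data)
locations = pd.Series(["Sweden", "England", "Sweden", "France", "Sweden", "England"])

counts = locations.value_counts()  # sorted from most to least frequent
print(counts["Sweden"], counts["England"], counts["France"])  # 3 2 1

# autopct is handed on to matplotlib and labels each slice with its percentage
ax = counts.plot.pie(figsize=(5, 5), autopct="%1.0f%%")
ax.set_ylabel("")  # remove the default y-axis label
```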

We can also loop over a selection of the columns in `ptable` (`ptable.columns[4:9]` picks the columns at positions 4 to 8) and plot each one as a function of "Element".

In [None]:
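import matplotlib.pyplot as plt

# columns[4:9] selects the columns at positions 4 to 8
for i in ptable.columns[4:9]:
    print(i)
    fig = plt.figure(figsize=(15,10))
    plt.plot(ptable["Element"], ptable[i])
    plt.xticks(rotation=90)
    plt.show()

# Note how we rotate the x tick labels by 90 degrees so the element names are printed on their side!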
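
A slice like `ptable.columns[4:9]` depends on the column order. Listing the columns by name makes the loop easier to read. The sketch below uses approximate melting and boiling points (in kelvin) typed in by hand. It builds each figure with `plt.subplots` and calls `plt.close(fig)` because it runs without a display. In a notebook you would call `plt.show()` instead, and leave out the two backend lines.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Approximate values in kelvin, typed in by hand for illustration
df = pd.DataFrame({
    "Element": ["Hydrogen", "Lithium", "Sodium", "Potassium"],
    "MeltingPoint": [14.0, 453.7, 371.0, 336.5],
    "BoilingPoint": [20.3, 1615.0, 1156.0, 1032.0],
})

columns_to_plot = ["MeltingPoint", "BoilingPoint"]  # chosen by name, not by position
axes = []

for column in columns_to_plot:
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df["Element"], df[column], marker="o")
    ax.set_ylabel(column)
    ax.tick_params(axis="x", labelrotation=90)  # element names on their side
    axes.append(ax)
    plt.close(fig)  # free the figure; in a notebook use plt.show() instead
```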

# Tasks 3.5.1 <a id="tasks-351"></a>

<div class="alert alert-success">
<b>Task 3.5.1 a: Selecting individual values in a dataframe</b>
</div>

Using the ptable dataframe, write an expression to find and print the boiling point of argon.

In [None]:
# FIXME



<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.5.1 </summary>

Like most things in Python, this can be approached in a few ways. For example, we know that argon is the 18th element in the periodic table, so it is at index 17 in our dataframe (Python counts from zero).
    
```python
print(ptable.loc[17, 'BoilingPoint'])

```

    
    
Another way to search for this would be to do some Boolean indexing.
    
```python    
Criteria_ar = ptable["Element"] == "Argon"
print(Criteria_ar) 
print(ptable[Criteria_ar]["BoilingPoint"])
  
```
 </details>
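Both answers above print a Series, not a bare number. Passing the Boolean mask and the column name to `.loc` in one step gives a one-element Series. `.item()` then turns it into a plain value, and raises an error if the Series does not contain exactly one element. The sketch below uses approximate boiling points typed in by hand.

```python
import pandas as pd

# Approximate boiling points in kelvin, typed in by hand for illustration
df = pd.DataFrame({
    "Element": ["Neon", "Argon", "Krypton"],
    "BoilingPoint": [27.1, 87.3, 119.9],
})

# .loc with a Boolean mask and a column name gives a one-element Series
argon_bp = df.loc[df["Element"] == "Argon", "BoilingPoint"]

# .item() turns a one-element Series into a plain number
print(argon_bp.item())  # 87.3
```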

<div class="alert alert-success">
<b>Task 3.5.2: Slicing dataframes</b>
</div>

Have a look at these two ways of slicing. What is different about them?

In [None]:
print(ptable.iloc[4:6, 1:3])

In [None]:
print(ptable.loc['4':'6', 'AtomicNumber':'Symbol'])

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.5.2 </summary>

Notice how the second statement produces additional columns and many additional rows compared with the first statement.

What conclusion can we draw? A numerical slice (the first example) omits the final index in the range provided (i.e. 6), while a named slice, `'AtomicNumber':'Symbol'`, includes the final element, "Symbol".

A quirk of not naming our index is that the second method prints all rows between 40 and 60.

 </details>
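The end-point rule can be checked on a small hypothetical DataFrame. `.iloc` slices by position and excludes the stop value. `.loc` slices by label and includes it. This sketch passes integer labels to `.loc`, matching the integer index that `read_csv` creates by default.

```python
import pandas as pd

# Hypothetical DataFrame with the default integer index 0..7 and four columns
df = pd.DataFrame({
    "AtomicNumber": range(1, 9),
    "Element": ["Hydrogen", "Helium", "Lithium", "Beryllium",
                "Boron", "Carbon", "Nitrogen", "Oxygen"],
    "Symbol": ["H", "He", "Li", "Be", "B", "C", "N", "O"],
    "Phase": ["gas", "gas", "solid", "solid", "solid", "solid", "gas", "gas"],
})

by_position = df.iloc[4:6, 1:3]                  # rows 4, 5; columns 1, 2 (stop excluded)
by_label = df.loc[4:6, "AtomicNumber":"Symbol"]  # rows 4, 5, 6; AtomicNumber to Symbol (stop included)

print(by_position.shape)  # (2, 2)
print(by_label.shape)     # (3, 3)
```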


<div class="alert alert-success">
<b>Task 3.5.3: Use a loop to identify all symbols with 1 letter</b>
</div>

Using the `Symbol` column of the dataframe `ptable`, use a loop to identify and count all the elements that have a one-letter symbol.

In [None]:
# FIXME

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.5.3 </summary>
    
```python
OneLetter = []
for x in ptable.Symbol:
    # keep symbols that are exactly one character long
    if len(x) == 1:
        OneLetter.append(x)
print(len(OneLetter))
```

 </details>
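The same count can be obtained without an explicit loop. The `.str` accessor applies string methods to every entry of a column, so `.str.len() == 1` builds a Boolean mask directly. The sketch below uses a short hypothetical list of symbols instead of `ptable.Symbol`.

```python
import pandas as pd

# Hypothetical list of symbols standing in for ptable.Symbol
symbols = pd.Series(["H", "He", "Li", "B", "C", "N", "O", "Ne"])

# .str.len() gives the length of every symbol; comparing with 1 gives a Boolean mask
one_letter = symbols[symbols.str.len() == 1]

print(len(one_letter), one_letter.tolist())  # 5 ['H', 'B', 'C', 'N', 'O']
```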


# 3.6 Feedback <a id="36-feedback"></a>

We are running this course for the first time, so it's helpful for us to know what you liked about today's session and what you think could be improved. This will help us make your future sessions better, and will also help us in future years.
Please fill in these two Mentimeter Q&As. They are anonymous, and your answers will not be shown.