In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Make the helper functions accessible
import sys
import os.path
sys.path.append(os.path.abspath('../'))
from helper_functions.mentimeter import Mentimeter

# Session 3: Loops, pandas and simple plotting
***
<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align="right"/></a>

Author: Dr Claire Hobday   
Email: claire.hobday@ed.ac.uk

Some of the material was adapted from [Dr. Matteo Degiacomi](https://github.com/Degiacomi-Lab/python4science/blob/master/2_Python_numerical_data.ipynb), as well as [Software Carpentries](FIXME).


## Learning outcomes:
> - import modules and libraries 
> - using [math](https://docs.python.org/3/library/math.html) module to do some simple scientific computing tasks
> - developing more [pandas](https://pandas.pydata.org/) skills to deal with large volumes of data
> - using logical operations to filter data
> - understand and use loops and conditional statements to do repetitive tasks, including:
>  - `for`
>  - `if`
>  - `else`/`elif`
>  - `while`
>  - `break`

**Jupyter cheat sheet**:
- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;
- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;

## Table of Contents
1. [Working with Pandas](#pandas)    
2. [Loops](#loops)  
   2.1 [For loops](#s_loops)    
   2.2 [Conditional Loops](#c_loops)    
   2.3 [Loops and Pandas](#p_loops)    
3. [Plotting with Pandas](#plotting)

<div class="alert alert-info">
    <b>The aims of today's session</b> <br>
-  Learning how to use in-built functionality in python <br>
- Combining these tools together to analyse data on a large `.csv` file that contains information about the elements of the periodic table. <br>
- We are going to use these tools to understand trends within the periodic table. 

</div>
   

# 1. Working with Pandas
***
<a id='pandas'></a>

### Most of the power of a programming language is in its libraries.

- A library is a collection of files (called modules) that contains functions for use by other programs.
- May also contain data values (e.g., numerical constants) and other things.
- A library’s contents are supposed to be related, but there’s no way to enforce that.
- The Python standard library is an extensive suite of modules that comes with Python itself.
- Many additional libraries are available from Anaconda or PyPI (the Python Package Index).

### A program must import a library module before using it.
- Use `import` to load a library module into a program’s memory.
- Then refer to things from the module as `module_name.thing_name`.
  - Python uses `.` to mean “part of”.
- We will be using a library called `pandas`
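
As a minimal sketch of the plain `import` form described in the list above, the cell below loads the standard-library `math` module and reaches into it with `.`. The variable name `root` is only chosen for illustration.

```python
# Load the whole module, then use the module name as a prefix
import math

root = math.sqrt(16)  # sqrt is "part of" math, so we write math.sqrt
print('pi is', math.pi)
print('sqrt(16) is', root)
```

Because the whole module was imported, every item taken from it needs the `math.` prefix.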

### Import specific items from a library module to shorten programs.
- Use `from ... import ...` to load only specific items from a library module.
- Then refer to them directly without library name as prefix.

In [2]:
# We are importing the function cos and the value pi from the math library
from math import cos, pi

print('cos(pi) is', cos(pi))

cos(pi) is -1.0


### Create an alias for a library module when importing it to shorten programs.

- Use `import ... as ...` to give a library a short alias while importing it.
- Then refer to items in the library using that shortened name.

In [3]:
import math as m

print('cos(pi) is', m.cos(m.pi))

cos(pi) is -1.0


## 1. Tasks 

<div class="alert alert-success">
<b>Task 1.1: Jigsaw Puzzle (Parson’s Problem) Programming Example</b>
</div>
Rearrange the following statements so that a random DNA base and its index in the string are printed. Remember that you've already imported `math` above! You can test your answer in the empty cell below to understand what it's doing. 

In [4]:
from IPython.display import IFrame
IFrame('https://parsons.herokuapp.com/puzzle/7cf55d16a0454de580f31418505f3b54', width=1000, height=400)

In [5]:
#test out the code in this cell, once you have the right order!

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 1.1. </summary>
    
```python
import math 
import random
bases = "ACTTGCTTGAC" 
n_bases = len(bases)
idx = random.randrange(n_bases)
print(f"Random base {bases[idx]} base index {idx}.")

```
 </details>

<div class="alert alert-success">
<b>Task 1.2: Importing With Aliases</b>
</div>


1. Fill in the blanks so that the program below prints `90.0`.
2. Rewrite the program so that it uses `import` without `as`.
3. Which form do you find easier to read?

In [6]:
# Task 1.2 question 1
import math as m
angle = #FIXME.degrees(#FIXME.pi / 2)
print(#FIXME)

SyntaxError: invalid syntax (3298776963.py, line 3)

In [None]:
# space to rewrite the program for Task 1.2 question 2

##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 1.2 </summary>
    
Filling in the right variables:  
    
```python
import math as m
angle = m.degrees(m.pi / 2)
print(angle)   
    
```
Re-writing the program without an import as:
    
```python
import math
angle = math.degrees(math.pi / 2)
print(angle)  
    
```   
    
*Explanation*:
  
Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.

 </details>

<div class="alert alert-success">
<b>Task 1.3: Importing specific items</b>
</div>


1. Import the exponential function from the math library.
2. Use it to work out e^10.
3. Import a function from math which will allow you to raise a number to a power of your choice.
4. Use it to raise 6 to the power of 5 (note that `^` is not the power operator in Python).

In [None]:
# space to try out task 1.3


##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 1.3 </summary>
    
One possible solution:  
    
```python
#1
from math import exp 
#2
number= exp(10)
print(number)
#3
from math import pow
#4
powered = pow(6,5)
print(powered)   
```   
    
*Explanation*:
  
The first part should be relatively straightforward. For part 3, you may need to google the math library and read the documentation to find the function that you need to raise a number to the power of another number. 

 </details>

<div class="alert alert-info"> <b>Key Points</b></div>

- Most of the power of a programming language is in its libraries.
- A program must import a library module in order to use it.
- Import specific items from a library to shorten programs.
- Create an alias for a library when importing it to shorten programs.

## Import the pandas library
***
In session one, we looked at a couple of different ways of opening files. In this session we are going to use pandas exclusively. 

Pandas is a Python library that works much like Excel, with the added advantage that we can manipulate the data programmatically. 


In [None]:
import pandas as pd
#The community agreed alias for pandas is pd, so loading pandas as pd is assumed standard practice for all of the pandas documentation.

### Import the data
***

Now, we need to import the data into what pandas calls a `DataFrame`, which takes the input data and formats it as a sort of "spreadsheet" with this form:



![pandas-data-structure.svg](images/pandas-data-structure.svg)

In [None]:
data = pd.read_csv('files/ptable.csv') #use pandas to read the csv file, 
data #view the imported dataframe, note how the index column "element" is in bold. 

Now that we've imported this data, we will learn some more fundamental Python concepts so that we can interrogate the data at the end of the session. 

<br>

## Accessing the dataframe
***
We are now going to view the dataframe in different ways: <br>
- `data.head()` shows us the first 5 rows of the dataframe. Note how Python counts from 0. <br>
- `data.tail()` shows us the last 5 rows of the dataframe. <br>
- `data.columns` lists all the column headers, which are properties associated with the elements. It is an attribute rather than a function, so it takes no brackets.

Test them out below

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.columns

We might also be interested in knowing what the datatypes are for the columns.

In [None]:
print('Period is data type', data['Period'].dtype)

It is also possible to change the datatype of a column in a dataframe using the `.astype()` method.

In [None]:
data['Period'] = data['Period'].astype(float)

In [None]:
print('Period is data type', data['Period'].dtype)

<div class="alert alert-info">
Once we learn a little more about how to play around with arrays, we will come back and analyse this data.
</div>

# 2. Loops
---
<a id='loops'></a>
![loop.png](images/loop.png)


A loop is used for iterating over a set of statements.
There are many different kinds of loops that can be useful in different situations. We are going to go through some of the most common types of loops. 

## 2.1. For loops
<a id='s_loops'></a>
This loop is used for iterating over some kind of a sequence. That can be a list, tuple, dictionary, string, etc.

Everything that is inside the for statement is going to be executed a number of times.
![loop2.png](images/loop2.png)
    
  
***
If we think about how we would define a **for loop** in python it would have the structure:
```Python
for var in iterable:
    statement(s) 
```
***
Where:
* **var** is a variable 
* **iterable** is a collection of objects such as a list or tuple
* **statement(s)** in the loop body (denoted by the indent) are executed once for each item in **iterable**.

The loop variable **var** takes on the value of the next element in **iterable** each time through the loop, until we have iterated through all items in **iterable**.
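
To illustrate that the **iterable** does not have to be a list, here is a small sketch that loops over a string, a tuple and a dictionary. The names `base`, `period`, `symbol` and `atomic_numbers` are made up for this example.

```python
# A string is iterated one character at a time
for base in "ACGT":
    print(base)

# A tuple behaves just like a list
for period in (1, 2, 3):
    print(period)

# Looping over a dictionary gives its keys, in insertion order
atomic_numbers = {"H": 1, "He": 2, "Li": 3}
for symbol in atomic_numbers:
    print(symbol, atomic_numbers[symbol])
```

After each loop finishes, the loop variable still holds the last item it was given.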

Let's take a look at some simple examples that show us how powerful `for` loops can be, and how they must be properly structured to be interpreted by Python. 

### Example 2.1.1
***
The first line of the for loop must end with a colon, and the body must be indented.


In [None]:
for number in [2,3,5]:
    print(number)

This `for` loop is equivalent to:

In [None]:
print(2)
print(3)
print(5)

We can see that the `for` loop is a much more efficient way of doing this task than typing `print(value)` for every value. 

Notice that the first line of the `for` loop must end with a colon, and the body must be indented. The colon at the end of the first line signals the start of a block of statements. The indent shows nesting, and it can take many forms as long as it's consistent, e.g. 4 spaces or a tab. 
When using Python within a Jupyter Notebook, as soon as you finish the `for` line with a colon, the next line is indented automatically. However, let's look at what happens when you leave out the indent or the colon.

In [None]:
for number in [2,3,5]:
print(number)

In [None]:
for number in [2,3,5]
    print(number)

### Example 2.1.2
***



Loop variables can be called anything, so please try to make them as meaningful as possible.

In [None]:
for kitten in [2, 3, 5]:
    print(kitten)

In [None]:
for numbers in [2,3,5]:
    print(numbers)

### Example 2.1.3
***

The body of a loop can contain many statements.
However, it's best practice to keep the body of a loop to no more than a few lines.

In [None]:
primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)

### Example 2.1.4
***

Use `range` to iterate over a sequence of numbers.

The built-in function `range` produces a sequence of numbers.

Not a list: the numbers are produced on demand to make looping over large ranges more efficient. It's easier than typing out a list of consecutive numbers such as `[0,1,2,3,4,5,6]` by hand.

`range(N)` is the numbers `0..N-1`

Exactly the legal indices of a list or character string of length N

e.g. `range(5)` would be `0,1,2,3,4`
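
A short sketch of these points: wrapping a `range` in `list()` shows the numbers it would produce, and `range` also accepts an optional start and step. The string `bases` is just an example value.

```python
# Convert to a list to see the numbers a range produces
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 6)))      # [2, 3, 4, 5]: start at 2, stop before 6
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]: step of 3

# range(len(...)) gives exactly the legal indices of a string
bases = "ACGT"
for i in range(len(bases)):
    print(i, bases[i])
```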

In [None]:
print('a range is not a list: range(0, 3)')
for number in range(0, 3):
    print(number)

### Example 2.1.5 
***

The Accumulator pattern turns many values into one.
- A common pattern in programs is to:
 1. Initialize an accumulator variable to zero, the empty string, or the empty list.
 1. Update the variable with values from a collection.

In [None]:
# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)

- Read `total = total + (number + 1)` as:
 - Add 1 to the current value of the loop variable `number`.
 - Add that to the current value of the accumulator variable `total`.
 - Assign that to `total`, replacing the current value.
- We have to add `number + 1` because `range` produces 0..9, not 1..10.
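
One alternative sketch of the same accumulator: starting the range at 1 avoids the `+ 1` inside the loop. This relies on `range(start, stop)` stopping one before `stop`.

```python
# Sum the integers 1 to 10 by starting the range at 1
total = 0
for number in range(1, 11):  # 11 itself is not included
    total = total + number
print(total)
```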

## Tasks 2.1

<div class="alert alert-success">
<b>Task 2.1.1: Practice Accumulating 1</b>
</div>

Fill in the blanks 

In [None]:
# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.1 </summary>
    
```python
total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)
```
</details>  

<div class="alert alert-success">
<b>Task 2.1.2: Practice Accumulating 2</b>
</div>
Fill in the blanks 

In [None]:
# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.2 </summary>
    
```python
 
lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)

```


 </details>

<div class="alert alert-success">
<b>Task 2.1.3: Practice Accumulating 3</b>
</div>
Fill in the blanks

In [None]:
# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.3 </summary>
    
```python
words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

```


 </details>


<div class="alert alert-success">
<b>Task 2.1.4: Create a whole loop</b>
</div>

Start out with an empty string `acronym = ""`.
Write a loop over the words 'red', 'green', 'blue' that uses the string method `upper()` so that, by the end of the loop, `print(acronym)` shows "RGB".

In [None]:
# Your solution here:
acronym = ""


print(acronym)

##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.4 </summary>
    
```python
acronym = ""
for word in ["red", "green", "blue"]:
    acronym = acronym + word[0].upper()
print(acronym)

```


 </details>

<div class="alert alert-success">
<b>Task 2.1.5: Cumulative Sum</b>
</div>

Reorder and properly indent the lines of code below so that they print a list with the cumulative sum of data. The result should be `[1, 3, 5, 10]`.

In [None]:
cumulative.append(sum)
for number in data:
cumulative = []
sum += number
sum = 0
print(cumulative)
data = [1,2,2,5]

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.5 </summary>
    
```python
data = [1,2,2,5]
cumulative = []
sum = 0
for number in data:
    sum += number
    cumulative.append(sum)
print(cumulative)

```


 </details>

<div class="alert alert-success">
<b>Task 2.1.6: Identifying Variable Name Errors</b>
</div>

1. Read the code below and try to identify what the errors are without running it.
2. Run the code and read the error message. What type of `NameError` do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
3. Fix the error.
4. Repeat steps 2 and 3, until you have fixed all the errors.

In [None]:
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.1.6 </summary>
    
```python
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

```
##### Explanation:
The variable `message` needs to be initialized and Python variable names are case sensitive: `number` and `Number` refer to different variables.

 </details>


<div class="alert alert-info"> <b>Key Points</b></div>

- A for loop executes commands once for each value in a collection.
- A `for` loop is made up of a collection, a loop variable, and a body.
- The first line of the `for` loop must end with a colon, and the body must be indented.
- Indentation is always meaningful in Python.
- Loop variables can be called anything (but it is strongly advised to have a meaningful name to the looping variable).
- The body of a loop can contain many statements.
- Use `range` to iterate over a sequence of numbers.
- The Accumulator pattern turns many values into one.

## 2.2. Conditional loops
***
<a id='c_loops'></a>

Computer programming is often referred to as a "language", and we often use similar nomenclature to traditional languages. Here we will discover how conditional statements are interpreted by Python. Conditionals work much like conditional sentences in spoken languages: they describe what happens when an "if" clause is satisfied. 

E.g. If it rains, take an umbrella. Or, if the pH is below 7, it's acidic. 

Notice how the first phrase controls the content of the second phrase.

E.g. If it's sunny, wear sunscreen. Or if the pH is above 7, it's basic. 


We could take this analogy and allow more options,
e.g.  if the pH is above 7, it's basic. Otherwise (or else) it's acidic. Notice how we can categorise the information using these conditional statements. 


We can use if statements to allow our computer programs to do different things for different data. 

### Use `if` statements to control whether or not a block of code is executed.

### Example 2.2.1 - `if`
***

Use an `if` statement to control whether or not a block of code is executed.
- An `if` statement (more properly called a conditional statement) controls whether some block of code is executed or not.
- Structure is similar to a `for` statement:
 - First line opens with if and ends with a colon
 - Body containing one or more statements is indented (usually by 4 spaces or a tab)

In [None]:
mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

In [None]:
mass = 2.07
if mass > 3.0:
    print (mass, 'is large')

Things that you should notice:
- The importance of ending the first line of the `if` statement with a colon.
- How nothing is printed by the second code block, because `mass` does not meet the `if` condition. 

### Example 2.2.2 - `if`
***

Conditionals are often used inside loops.
- Not much point using a conditional when we know the value (as above).
- But useful when we have a collection to process.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(mass, 'is large')

### Example 2.2.3 - `if` and `else`
***

Use `else` to execute a block of code when an `if` condition is not true.
- `else` can be used following an if.
- Allows us to specify an alternative to execute when the `if` branch isn’t taken.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 3.0:
        print(mass, 'is large')
    else:
        print(mass, 'is small')

### Example 2.2.4 - `if` and `elif`
***

Use `elif` to specify additional tests.
- May want to provide several alternative choices, each with its own test.
- Use `elif` (short for “else if”) and a condition to specify these.
- Always associated with an `if`.
- Must come before the `else` (which is the “catch all”).

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for mass in masses:
    if mass > 9.0:
        print(mass, 'is HUGE')
    elif mass > 3.0:
        print(mass, 'is large')
    else:
        print(mass, 'is small')

### Example 2.2.5 - order of conditions
***

Conditions are tested once, in order.
- Python steps through the branches of the conditional in order, testing each in turn.
- So ordering matters.

In [None]:
grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')

We can see here that our condition is met in the first conditional `if` statement, so none of the `elif` statements are evaluated.
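
A sketch of one way to reorder the tests so that each grade reaches the intended branch: test the most restrictive condition first. The variable `letter` and the final `else` branch are additions for this illustration.

```python
grade = 85
if grade >= 90:
    letter = 'A'
elif grade >= 80:
    letter = 'B'
elif grade >= 70:
    letter = 'C'
else:
    letter = 'below C'  # catch-all added for this sketch
print('grade is', letter)
```

With the tests in descending order, a grade of 85 fails the first test and is caught by the second.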

### Example 2.2.6 - using conditionals to evolve the values of variables
***

In the example below we use `if` and `else` within a `for` loop in order to change the value of `velocity`.

Notice:
- the indent for the `for` loop and also for the `if` and `else` statements.
- the use of the colon at the end of `for`, `if` and `else` statements.
- The program must have a `print` statement outside the body of the loop to show the final value of velocity, since its value is updated by the last iteration of the loop.

In [None]:
velocity = 10.0
for i in range(5): # execute the loop 5 times
    print("try",i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)

## Tasks 2.2

<div class="alert alert-success">
<b>Task 2.2.1: Trimming values:</b>
</div>

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where they were zero or positive.

In [None]:
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)

Output should look like this:   
```[0, 1, 1, 1, 0, 1]```

##### Solution
<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.2.1 </summary>
    
```python
    
original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value<0.0:
        result.append(0)
    else:
        result.append(1)
print(result)

```


 </details>

<div class="alert alert-success">
<b>Task 2.2.2: Initializing</b>
</div>

Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

What are the advantages and disadvantages of using this method to find the range of the data?

In [None]:
values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)

##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.2.2 </summary>
    
```python
values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)

```


 </details>

<div class="alert alert-info"> <b>Key Points</b></div>

- Use `if` statements to control whether or not a block of code is executed.
- Conditionals are often used inside loops.
- Use `else` to execute a block of code when an `if` condition is *not* true.
- Use `elif` to specify additional tests.
- Conditions are tested once, in order.
- Create a table showing variables’ values to trace a program’s execution.

----
## 2.3 Boolean Indexing <a class="anchor" id="bool"></a>

<img src="https://upload.wikimedia.org/wikipedia/commons/c/ce/George_Boole_color.jpg" width="250" style="float: right">

Closely related to `if`, `elif` and `else` conditions are Booleans.


**George Boole** was a 19th century self-taught English mathematician, philosopher and logician. He is known for Boolean algebra, which is based on variables being **True** or **False**, denoted as **1** and **0** respectively. 

The operations in Boolean algebra are **and** denoted as $\wedge$, **or** denoted as $\vee$ , and **not** denoted as $\neg$.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/ae/Vennandornot.svg/2560px-Vennandornot.svg.png" width="300" style="float: center" title="Venn diagram"> <em><center>Venn diagram</center></em>

In fact, in using `if` we have already asked Python to do a Boolean operation. If the condition in our `if` statement is true, the indented block is executed. The Boolean value `True` or `False` returned by the condition is what determines whether the body of the `if` statement runs. 
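
As a small sketch of this idea, a comparison can be stored in a variable and inspected before it is handed to `if`. The names `pH`, `is_acidic` and `is_very_acidic`, and the thresholds used, are assumptions made for this example.

```python
pH = 5.5
is_acidic = pH < 7  # a comparison evaluates to True or False
print(is_acidic, type(is_acidic))

# and, or, not combine Boolean values
is_very_acidic = is_acidic and pH < 3
print(is_very_acidic)
print(not is_acidic)

# the stored Boolean can be used directly as the condition
if is_acidic:
    print('pH', pH, 'is acidic')
```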


### Bitwise Operators

In Python there are several ways to do the same Boolean operations. We are going to use the bitwise operators, which compare binary values bit by bit. Applied to a pandas Series of Booleans, they act element by element. 

<div class="alert alert-info">
    
| Operator | Name | Description |
| :- | :- | :- |
| `&` | AND | Sets each bit to 1 if both bits are 1 |
|  &#124; | OR | Sets each bit to 1 if one of two bits is 1 |
| `^` | XOR | Sets each bit to 1 if only one of two bits is 1 |
| `~` | NOT | Inverts all the bits |
| `<<` | Zero fill left shift | Shift left by pushing zeros in from the right and let the leftmost bits fall off |
| `>>` | Signed right shift | Shift right by pushing copies of the leftmost bit in from the left, and let the rightmost bits fall off |

</div>
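
The table can be checked on plain integers. This is a minimal sketch in which `x` and `y` are arbitrary example values written in binary with the `0b` prefix.

```python
x = 0b1100  # 12 in decimal
y = 0b1010  # 10 in decimal

print(bin(x & y))  # 0b1000: bits set in both
print(bin(x | y))  # 0b1110: bits set in either
print(bin(x ^ y))  # 0b110: bits set in exactly one
print(x << 1)      # 24: shift left by one bit
print(x >> 2)      # 3: shift right by two bits
print(~x)          # -13: all bits inverted

# On Booleans, & and | behave like "and" and "or"
print(True & False, True | False)
```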


### Boolean Tests <a class="anchor" id="booltest"></a>

Boolean tests on an array produce an array of booleans:

<img src="images/BooleanOp.png" width="500">


Here we can see, for each value in the series, what the Boolean outcome would be for two tests. 


In [None]:
#declare a series dataframe (just 1D dataframe, like one column of an excel sheet)
a = pd.Series([32, 2, 65, 29, 7, 14, 57, 81, 27, 0, 56])

#take a look at the format of the series
print("Series a")
print(a)
print(" ")

#declare tests: each comparison gives a series of booleans
c = a > 15
d = a < 0

print ("condition c = a>15 ", c)
print(" ")
print ("condition d = a<0 ", d)


Tests c and d have each been evaluated separately. 
What if we want both tests to be satisfied at the same time?

In [None]:
#AND logic
print(c & d)

Here, you can see that every entry in the output is False: no value can be both greater than 15 and less than 0. 

Now if we try `or` :

In [None]:
#OR logic
print(c | d)

###  Boolean Indexing in Series <a class="anchor" id="boolind"></a>
We can also use an array of booleans to index another array, i.e. only elements corresponding to **True** are extracted from the indexed array.

<img src="images/Boolean.png" width="500">

In [None]:
c = a>15
d = a<0
a_cVd = a[c | d]
print(a_cVd)
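
The tests can also be written directly inside the square brackets. The sketch below assumes we want values between 15 and 60 (the upper limit is chosen only for illustration). Each comparison needs its own parentheses, because `&` binds more tightly than `>` and `<`; without them the expression typically fails with an error.

```python
import pandas as pd

a = pd.Series([32, 2, 65, 29, 7, 14, 57, 81, 27, 0, 56])

# Each comparison is wrapped in parentheses before combining with &
in_window = (a > 15) & (a < 60)
print(a[in_window])
```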

### Boolean Indexing in DataFrames

Using these same logical principles, we can index whole dataframes to recover just the data we need. 
Let's take a look at the dataframe below:

In [None]:
# dictionary of lists (named student_data so that we don't overwrite Python's built-in dict)
student_data = {'name':["Toni", "James", "Claire", "Valentina"],
        'degree': ["Chemistry", "Medicinal and Biological Chemistry", "Chemical Physics", "Chemical Physics"],
        'score':[90, 77, 61, 98]}
 
# creating a dataframe from a dictionary
df = pd.DataFrame(student_data)

print(df)


Now let's use a comparison operator to pick out just those who study Medicinal and Biological Chemistry (MBC).

In [None]:
# using a comparison operator for filtering of data
    
Cr=df["degree"] == "Medicinal and Biological Chemistry"
print(Cr)
#print(df['degree'] == 'Medicinal and Biological Chemistry')

In [None]:
# Apply the indexing to our dataframe to return only those that fit our criteria
print(df[Cr])

In [None]:
# Apply the indexing to our dataframe to return only those that DO NOT fit our criteria
print(df[~Cr])
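
Several criteria can be combined before indexing. The sketch below rebuilds the same small table so that it runs on its own. The score threshold of 80 and the names `is_chem_phys`, `high_score` and `top_chem_phys` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ["Toni", "James", "Claire", "Valentina"],
    'degree': ["Chemistry", "Medicinal and Biological Chemistry",
               "Chemical Physics", "Chemical Physics"],
    'score': [90, 77, 61, 98],
})

# Two separate Boolean tests, one per column
is_chem_phys = df["degree"] == "Chemical Physics"
high_score = df["score"] > 80

# Keep only the rows where both tests are True
top_chem_phys = df[is_chem_phys & high_score]
print(top_chem_phys)
```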

---
## Tasks 2.3
---

<div class="alert alert-success">
    <b>TASK 2.3.1 </b> : Using the mass spec data in the file ms.txt, find m/z values in the region between m/z 6400 and 6600. 
</div>


In [None]:
#Your solution here

#Read the file ms.txt into the dataframe, and make sure to give the data column names
ms_data = pd.read_csv(filepath_or_buffer=____, sep=___, header=____, names = [_____])

#criteria for slicing data

print(Output)

##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.3.1 </summary>
    
```python
ms_data = pd.read_csv(filepath_or_buffer="files/ms.txt", sep="\t", header=None, 
                 names=["m/z", "intensity"])

#criteria for slicing data
Crit1 = ms_data["m/z"] > 6400
Crit2 = ms_data["m/z"]< 6600
Output = ms_data[Crit1 & Crit2 ]
print(Output)

```

**Explanation**: we need to know the path to the file, how the columns of data are separated (look at the file to work this out), whether the data has headers, and what the column names should be.
 </details>

<div class="alert alert-success">
    <b>TASK 2.3.2 </b> : Assess at what m/z value there is a peak between 6400 and 6600. 
</div>

In [None]:
#Your solution here


##### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 2.3.2 </summary>
    
```python
intensity = Output["intensity"]
m_z = Output["m/z"]
max_value = intensity.max()
max_index = intensity.idxmax()

print("peak", max_value, "at m/z", m_z[max_index])

```


 </details>

## 2.3. Using pandas dataframe with loops to analyse data
<a id='p_loops'></a>

We are now going to play around with the dataframe we looked at earlier containing information about the periodic table and pull out some more information from this.

In [None]:
ptable = pd.read_csv("files/ptable.csv")

In [None]:
print(ptable.loc[0, :]) # the entries in the df are ordered 0 to 117 for each element. 0 is hydrogen.

We can see that some variables return "NaN" (Not a Number), which marks missing data. These values can easily be removed or filled in pandas.
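
A minimal sketch of both options, using a tiny made-up table rather than the real `ptable`. The element names, the values and the fill value of `0.0` are placeholders; a sensible fill value depends on the analysis.

```python
import numpy as np
import pandas as pd

# Small made-up table with one missing value
demo = pd.DataFrame({
    "Element": ["A", "B", "C"],
    "MeltingPoint": [10.0, np.nan, 30.0],
})

# True where a value is missing
print(demo["MeltingPoint"].isna())

# Option 1: remove rows that have no melting point
dropped = demo.dropna(subset=["MeltingPoint"])
print(dropped)

# Option 2: replace missing melting points with a placeholder value
filled = demo.fillna({"MeltingPoint": 0.0})
print(filled)
```

Both `dropna` and `fillna` return a new dataframe and leave `demo` unchanged.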

In [None]:
ptable

We are going to add a new column to our dataframe containing the mass number, calculated from the numbers of neutrons and protons, which we already have in two columns. This is a simple addition of the two columns.

In [None]:
# add new column to the dataframe
ptable["Calculated_mass_number"] = ptable["NumberofNeutrons"] + ptable["NumberofProtons"]

Calculate the difference between our calculated value `Calculated_mass_number` and the value originally given in the dataframe, `AtomicMass`. <br>
Think about why the values would be different.

In [None]:
ptable["Difference_in_mass"] = ptable["Calculated_mass_number"] - ptable["AtomicMass"]
for i in range(len(ptable)):
    Name = ptable.iloc[i,1]
    Difference = ptable.iloc[i, -1]
    print("Difference in mass for", Name, Difference)
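
The loop above picks columns by position, which breaks if the column order changes. As an alternative sketch, the columns can be selected by name and looped over together with the built-in `zip`. The small `demo` table and its values are made up so that the example runs on its own.

```python
import pandas as pd

# Made-up stand-in for ptable with just the two columns we need
demo = pd.DataFrame({
    "Element": ["X", "Y"],
    "Difference_in_mass": [-0.5, 0.25],
})

# zip pairs up the entries of the two columns, row by row
for name, difference in zip(demo["Element"], demo["Difference_in_mass"]):
    print("Difference in mass for", name, difference)
```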

---
Now let's use some Boolean logic on the periodic table data. Below we will look for all elements that exist as solids at standard temperature and pressure. 

In [None]:
solids = ptable[ptable['Phase'] =='solid']
solids

<div class="alert alert-success">
    <b>TASK 2.3.3 </b> : Set up Boolean tests to check which elements are liquid at 297 K, and extract the names of those elements.
</div>


In [None]:
##WHAT columns from the dataframe will we need?



#SET Boolean tests to check what is liquid at 297 K (room T)



#RUN the test and print names of elements that are liquid at room temperature



<details>
    <summary> <mark> COMPLETE SOLUTION:</mark> </summary>

    
```python
    
#what columns from the dataframe will we need?    
element = ptable["Element"]
melting = ptable["MeltingPoint"]
boiling = ptable["BoilingPoint"]
    
#boolean to check what is liquid at 297 K (room T)
    
T1 = melting<297
T2 = boiling>297

criteria = T1 & T2
print(element[criteria])

```
    

</details>


# 3. Plotting with Pandas
***
<a id='plotting'></a>

Plotting can be done in many ways within Python. At first we will just plot straight from pandas, which uses the popular library `matplotlib` in the background. In later sessions we will use `matplotlib` directly, but for now we will mostly stick to pandas plotting.

We can use the pandas dataframe to easily plot columns with matplotlib. <br>

The easiest thing we could do is to plot all variables against our index with:

In [None]:
ptable.plot()

However, we can see that this is not very informative, so we should use what we learned in the previous session and earlier in this session to plot the data more sensibly. <br>
We can list the columns that we can plot against each other with this command:

In [None]:
ptable.columns

Now, let's take two variables from the column headers and plot them against each other.<br>
Let's see if we can see trends in the periods of the periodic table.

In [None]:
ptable.plot.scatter(x = 'AtomicNumber', y = 'AtomicRadius')

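We can clearly see the trend within each period of the periodic table: the atomic radius jumps up at the start of a period and then decreases as the atomic number increases across it.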

Now let's try another type of plot that isn't a scatter plot. 
We can use a pie chart to show how many elements were discovered in each location. 

Pandas has a handy `value_counts()` function that we can use on the column holding these data.

In [None]:
counting = ptable.discovery_location.value_counts()
print(counting)
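
To see what `value_counts()` is doing, here is a minimal sketch on a made-up Series (the place names are placeholders, not data from `ptable`). It returns one entry per distinct value, sorted from most to least frequent; missing values are left out unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Made-up locations, including one missing entry
locations = pd.Series(
    ["Sweden", "England", "Sweden", np.nan, "France", "Sweden", "England"]
)

# Count how many times each distinct value occurs (NaN is ignored by default)
counts = locations.value_counts()
print(counts)

# Include the missing entries in the tally as well
counts_with_nan = locations.value_counts(dropna=False)
print(counts_with_nan)
```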

In [None]:
# then we can plot these data as a pie chart

counting.plot.pie(figsize=(10,10))

We can also loop over our columns in ptable and plot them as a function of "Element".

In [None]:
for i in ptable.columns[4:9]:
    print(i)
    fig = plt.figure(figsize=(15,10))
    plt.plot(ptable["Element"],ptable[i])
    plt.xticks(rotation=90)
    plt.show()
    
#note how we rotate the xticks 90 degrees to see the name on its side!

## Tasks 3.1

<div class="alert alert-success">
<b>Task 3.1.1: Selecting individual values in a dataframe</b>
</div>

Using the ptable dataframe, write an expression to find and print the boiling point of argon.

In [None]:
#put your solution here



#### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.1 </summary>

Like most things in Python, this can be approached in a few ways. For example, we know that argon is the 18th element in the periodic table, so it will be at index 17 in our dataframe (Python counts from zero).
    
```python
print(ptable.loc[17, 'BoilingPoint'])

```

    
    
Another way to search for this would be to do some Boolean indexing.
    
```python    
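# Boolean test: True for the row whose element name is Argon
criteria_ar = ptable["Element"] == "Argon"
print(criteria_ar)
# select the matching row and the BoilingPoint column in one step
print(ptable.loc[criteria_ar, "BoilingPoint"])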
 
    
```
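
A third option, sketched here on a made-up dataframe rather than `ptable`, is to make the element names the index with `set_index`, so that `.loc` can look a value up by name. The names and numbers below are placeholders.

```python
import pandas as pd

# Made-up dataframe standing in for ptable (placeholder values)
toy = pd.DataFrame({
    "Element": ["A", "B", "C"],
    "BoilingPoint": [10.0, 20.0, 30.0],
})

# Use the element names as row labels instead of 0, 1, 2, ...
by_name = toy.set_index("Element")

# .loc[row label, column label] now returns a single value
bp = by_name.loc["B", "BoilingPoint"]
print(bp)
```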

    
 </details>

<div class="alert alert-success">
<b>Task 3.1.2: Slicing dataframes</b>
</div>

Have a look at these two ways of slicing, what is different about them?

In [None]:
print(ptable.iloc[4:6, 1:3])

In [None]:
print(ptable.loc['4':'6', 'AtomicNumber':'Symbol'])

#### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.2 </summary>
    

Notice how the second statement produces additional columns and many additional rows compared to the first statement.

What conclusion can we draw? A numerical slice (the first example) omits the final index (i.e. 6) in the range provided, while a named slice, `'AtomicNumber':'Symbol'`, includes the final element `"Symbol"`.

A funny quirk of not naming our index is that all rows between 40 and 60 are printed with the second method. 
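
The difference in how the end of a slice is treated can be checked on a small made-up dataframe (a sketch, not `ptable`), here using integer row labels with `.loc`:

```python
import pandas as pd

# Made-up dataframe with the default integer index 0..4
toy = pd.DataFrame({
    "a": [0, 1, 2, 3, 4],
    "b": [10, 11, 12, 13, 14],
    "c": [20, 21, 22, 23, 24],
})

# Position-based slice: the end positions (row 3, column 2) are excluded
by_position = toy.iloc[1:3, 0:2]
print(by_position.shape)

# Label-based slice: the end labels (row 3, column "c") are included
by_label = toy.loc[1:3, "a":"c"]
print(by_label.shape)
```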

 </details>


<div class="alert alert-success">
<b>Task 3.1.3: Use a loop to identify all symbols with 1 letter</b>
</div>

Using the `Symbol` column in the dataframe `ptable`, use a loop to identify and count all the elements that have a one-letter symbol.

In [None]:
#Put your solution here

#### Solution

<details><summary {style='color:green;font-weight:bold'}> Click here to see solution to Task 3.1.3 </summary>
    
```python
one_letter = []
for x in ptable.Symbol:
    # keep the symbols that are exactly one character long
    if len(x) == 1:
        one_letter.append(x)
print(len(one_letter))

```
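
For comparison, pandas can do a count like this without an explicit loop by using the `.str` accessor. This is only a sketch on a made-up list of symbols; with the real data you would use `ptable.Symbol` in place of `symbols`.

```python
import pandas as pd

# A handful of symbols standing in for ptable.Symbol
symbols = pd.Series(["H", "He", "Li", "C", "N", "Ne"])

# .str.len() gives the length of each string; the comparison gives a boolean mask
short_symbols = symbols[symbols.str.len() == 1]
print(short_symbols.tolist())
print(len(short_symbols))
```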

 </details>


## Feedback on the session

We are running this course for the first time, so it's helpful for us to know what you liked about the session today and what you think could be improved. This will help us make your future sessions better, as well as help us in future years. 
Please fill in these two Mentimeter Q&As. They are anonymous and your answers will not be shown. 

In [None]:
positive_feedback = Mentimeter(vote = 'https://www.menti.com/al5m2471wyzw')
positive_feedback.show()

In [None]:
critical_feedback = Mentimeter(vote = 'https://www.menti.com/alrrb51myncy')
critical_feedback.show()