<a href="https://colab.research.google.com/github/DavidSenseman/BIO5853/blob/master/Lesson_01_6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---------------------------
**COPYRIGHT NOTICE:** This Jupyterlab Notebook is a Derivative work of [Jeff Heaton](https://github.com/jeffheaton) licensed under the Apache License, Version 2.0 (the "License"); You may not use this file except in compliance with the License. You may obtain a copy of the License at

> [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

------------------------

# **BIO 5853: Biostatistics**

##### **Module 1: Getting Started with Python**

* Instructor: [David Senseman](mailto:David.Senseman@utsa.edu), [Department of Integrative Biology](https://sciences.utsa.edu/integrative-biology/), [UTSA](https://www.utsa.edu/)


### Module 1 Material

* Part 1.1: Course Overview 
* Part 1.2: Installing Python, Miniconda and Jupyter Lab
* Part 1.3: Introduction to Jupyterlab AI, Google CoLab
* Part 1.4: Python Basics 1 -- Strings, Variables and Indexing
* Part 1.5: Python Basics 2 -- Numbers, Booleans, Operators and Comparisons
* **Part 1.6: Python Basics 3 -- Lists, Dictionaries and Sets**
* Part 1.7: Python Basics 4 -- Conditionals and Loops
* Part 1.8: Python Basics 5 -- Packages, NumPy arrays and Matplotlib
* Part 1.9: Python Basics 6 -- Pandas and File Handling

### Lesson Setup

Run the next code cell to load the necessary packages.

In [None]:
# You MUST run this code cell first
import os
import shutil
path = '/'
memory = shutil.disk_usage(path)
dirpath = os.getcwd()
print("Your current working directory is : " + dirpath)
print("Disk", memory)

### Google CoLab Instructions

The following code ensures that Google CoLab is running the correct version of TensorFlow. Running it will also map your GDrive to `/content/drive`.

In [None]:
# You must run this cell second
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    from google.colab import auth
    auth.authenticate_user()
    COLAB = True
    print("Note: using Google CoLab")
    %tensorflow_version 2.x
    import requests
    gcloud_token = !gcloud auth print-access-token
    gcloud_tokeninfo = requests.get('https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=' + gcloud_token[0]).json()
    print(gcloud_tokeninfo['email'])
except:
    print("Note: not using Google CoLab")
    COLAB = False

# Part 1.6: Python Basics 3 -- Lists, Dictionaries and Sets

Python includes Lists, Sets, Dictionaries, and other data structures as built-in types. The syntax for each is similar, which can cause confusion for beginning Python programmers. 

This course will focus primarily on Lists, Sets, and Dictionaries. Tuples are included below for comparison because they are easily confused with lists. It is important to understand the differences between these fundamental collection types.

* **List -** A list is a mutable ordered collection that allows duplicate elements.
* **Tuple -** A tuple is an immutable ordered collection that allows duplicate elements.
* **Dictionary -** A dictionary is a mutable collection of key/value pairs that Python indexes by key. Since Python 3.7, a dictionary keeps its items in insertion order.
* **Set -** A set is a mutable unordered collection with no duplicate elements.

Most Python collections are _mutable_, meaning the program can add and remove elements after definition. One notable exception is the Python **tuple**, an _immutable_ collection whose items cannot be added, removed, or replaced after its definition.

It is also essential to understand that an ordered collection means that items maintain their order as the program adds them to a collection. This order might not be any specific ordering, such as alphabetic or numeric.

Lists and tuples are very similar in Python and are often confused. The significant difference is that a list is mutable, but a tuple isn’t. So, we use a list when we want a collection of similar items that may change, and a tuple when we know ahead of time exactly what information goes into it.
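As a quick illustration of the four collection types described above, the sketch below builds one of each from the same letters and prints its type and length. The variable names (`bases_list`, `bases_tuple`, `bases_dict`, `bases_set`) are made up for this illustration and are not used elsewhere in the lesson.

```python
# Illustrative names only; they are not part of the lesson's exercises
bases_list = ['A', 'C', 'G', 'T', 'A']           # list: mutable, ordered, duplicates allowed
bases_tuple = ('A', 'C', 'G', 'T', 'A')          # tuple: immutable, ordered, duplicates allowed
bases_dict = {'A': 'adenine', 'C': 'cytosine'}   # dictionary: key/value pairs
bases_set = {'A', 'C', 'G', 'T', 'A'}            # set: the duplicate 'A' is dropped

print(type(bases_list).__name__, len(bases_list))     # list 5
print(type(bases_tuple).__name__, len(bases_tuple))   # tuple 5
print(type(bases_dict).__name__, len(bases_dict))     # dict 2
print(type(bases_set).__name__, len(bases_set))       # set 4
```

The list and tuple both keep the repeated `'A'`, while the set keeps only one copy of it.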

## Lists and Tuples

For a Python programmer, lists and tuples look very similar. Both lists and tuples hold an ordered collection of items. It is possible to get by as a programmer using only lists and ignoring tuples.

The primary difference is that a list is enclosed by square brackets `[]`, while a tuple is enclosed by parentheses `()`. The following example and exercise define a list and a tuple.

### Example 1: Create a list called `myList`

The code in the cell below uses square brackets `[ ]` to create a list called `myList`. (Note: `list` is the name of a Python **built-in type**, so don't use it as a variable name without adding something to it.)

In [None]:
# Example 1: Create a list

# Use square brackets to create a list
myList = ['a', 'b', 'c', 'd']

# Print output
print(myList)

The output is 
~~~text
['a', 'b', 'c', 'd']
~~~

### **Exercise 1: Create a tuple called `myTuple`** 

In the cell below use parentheses `( )` to create a tuple called `myTuple`. (Note: `tuple` is the name of a Python **built-in type**, so don't use it as a variable name without adding something to it.)

In [None]:
# Insert your code for Exercise 1 here



If your code is correct you should see the following output.
~~~text
('a', 'b', 'c', 'd')
~~~

### Example 2: Change the contents of myList

As mentioned above, a list is _mutable_, which means its contents can be changed after it has been created. 

The code in the cell below demonstrates that the program can change a list. This example uses square bracket `[ ]` indexing to change the _second_ element in `myList`. (Remember: Python starts counting sequences at `0`.)  

In [None]:
# Example 2: Change contents of myList

# Change the second element
myList[1] = 'Z'

# Print output
print(myList)

The output is
~~~text
['a', 'Z', 'c', 'd']
~~~

The output shows that the contents of a list can be changed after it has been created.

### **Exercise 2: Change the contents of myTuple**

As mentioned above, a tuple is _immutable_, which means its contents can **not** be changed after it has been created. 

Using **Example 2** as a template, write the code in the cell below to change the _second_ element in `myTuple` to the letter `Z`. 

In [None]:
# Insert your code for Exercise 2 here



If your code is correct you should have triggered the following error message:

![_](https://biologicslab.co/BIO1173/images/class_01/class_1_6_error.png)

Tuples are immutable. Once they are created their contents can not be changed. 
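The sketch below shows the same behavior in a form that does not stop the notebook. It uses a `try`/`except` block, which has not been covered yet, to catch the error and print its message. The names `demo_tuple` and `new_tuple` are hypothetical and chosen only for this illustration.

```python
# Hypothetical tuple used only for this sketch
demo_tuple = ('a', 'b', 'c', 'd')

try:
    demo_tuple[1] = 'Z'        # item assignment is not allowed on a tuple
except TypeError as err:
    print("TypeError:", err)

# To get a "changed" tuple, build a new one from pieces of the old one
new_tuple = demo_tuple[:1] + ('Z',) + demo_tuple[2:]
print(new_tuple)               # ('a', 'Z', 'c', 'd')
print(demo_tuple)              # the original is untouched: ('a', 'b', 'c', 'd')
```

Building a new tuple leaves the original unchanged, which is what "immutable" means in practice.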

### Difference Between List and Tuple

The primary difference that you will see _syntactically_ is that a list is enclosed by square brackets `[ ]`, and a tuple is enclosed by parentheses `( )`. 

The following code defines both list and tuple.


In [None]:
myList = ['a', 'b', 'c', 'd']
myTuple = ('a', 'b', 'c', 'd')

print(myList)
print(myTuple)

If your code is correct you should see the following output:
~~~text
['a', 'b', 'c', 'd']
('a', 'b', 'c', 'd')
~~~
The primary difference you will see _programmatically_ is that a list is mutable, which means the program can change it. On the other hand, a tuple is immutable, which means the program cannot change it. 

##  Python Dictionaries

A Python _dictionary_ or `dict` is a _mutable_ collection of _key/value pairs_. (Since Python 3.7, a dictionary also preserves the order in which its items were inserted.) A mutable data type in Python is a type of object whose value can be changed after it is created. This means that you can modify, add, or remove elements within the object without creating a new instance of it.

For example, lists (list) and dictionaries (dict) in Python are mutable data types. You can change the elements of a list or update the key-value pairs of a dictionary without creating a new list or dictionary. An important aspect of mutability is that if you have multiple references to the same mutable object, any modifications made to the object will be reflected in all the references.

In contrast, immutable data types like strings (str), tuples (tuple), and numbers (int, float) cannot be changed after they are created. If you want to modify these types, you need to create a new instance with the desired changes.
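The two points above can be sketched in a few lines: a change made through one reference to a mutable list is visible through every other reference, while "changing" an immutable string actually produces a new string. All variable names here are hypothetical.

```python
# Two names that refer to the SAME list object
first_ref = ['a', 'b', 'c']
second_ref = first_ref           # no copy is made here
second_ref.append('d')           # modify the list through the second name

print(first_ref)                 # ['a', 'b', 'c', 'd']
print(first_ref is second_ref)   # True: both names point at one object

# Strings are immutable: upper() returns a new string
word = 'gene'
upper_word = word.upper()
print(word, upper_word)          # gene GENE
```

If an independent list is wanted instead of a second reference, `first_ref.copy()` creates one.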

Like other collection types, `dict` can be called with a collection argument to create a dictionary with the elements of the argument. However, those elements must be `tuples` or `lists` of two elements: a key and a value. Dictionary literals are enclosed within curly braces `{}`, and the key/value pairs are separated by commas. Within each pair, the key and its value are separated by a colon `:`.

### Example 3: Create a `dict` called `DNABaseDict`

The code in the cell below uses Python's `dict()` function to create a dictionary called `DNABaseDict`. 

The dictionary's `keys` are the single-letter abbreviations of the four bases in DNA. Each `key` has an associated `value` which is the name of the base. Each `key/value` pair in this example is a two-element `list`, since it is defined using square brackets `[ ]`.   

In [None]:
# Example 3

# create dict using dict() function
DNABaseDict = dict((['A', 'adenine'],
                    ['C', 'cytosine'],
                    ['G', 'guanine'],
                    ['T', 'thymine']
                   ))

# Print out the dictionary
DNABaseDict

The output is 
~~~text
{'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'T': 'thymine'}
~~~
Python uses curly braces `{ }` with `key: value` pairs to denote that the output is a `dict`. 

### **Exercise 3: Create a `dict` called `RNABaseDict`**

In the cell below use Python's `dict()` function to create a dictionary called `RNABaseDict`. The dictionary's `keys` should be the single-letter abbreviations of the four bases in RNA with the `value` being the name of the base. Print out the `RNABaseDict`. You should already know the four nitrogenous bases in RNA. If you don't, you can "Google" it.   

In [None]:
# Insert your code for Exercise 3 here



If your code for Exercise 3 is correct you should see 
~~~text
{'A': 'adenine', 'C': 'cytosine', 'G': 'guanine', 'U': 'uracil'}
~~~

-----------------------

### **Why the curly braces `{}` ?**

Even though parentheses `()` were used to define the `DNABaseDict` dictionary in Example 3, Python printed out this dictionary using curly braces `{}`. 

>So why the curly braces? 

Because they are so frequently used, Python provides a notation for dictionaries that is similar to `sets`: a comma-separated series of key/value pairs enclosed in curly braces `{}`. Within each pair, a colon `:` separates the key from its value. When Python prints out a dictionary, it uses the curly brace format.  

-----------------------

### Example 4: Create `dnaBaseDict` using curly braces `{}`

As explained above, you can also create a Python dictionary using curly braces `{}`. The cell below shows how to create a dictionary called `dnaBaseDict` using curly braces. This dictionary is similar to `DNABaseDict` created in Example 3 except that lower case letters are used for the `keys`. 

In [None]:
# Example 4: Create dictionary using {}

# Create dictionary using curly braces
dnaBaseDict = {'a': 'adenine', 't': 'thymine', 'g': 'guanine',  'c': 'cytosine'}

# Print out the dictionary
dnaBaseDict


If your code is correct you should see the following output:
~~~text
{'a': 'adenine', 't': 'thymine', 'g': 'guanine', 'c': 'cytosine'}
~~~

### **Exercise 4: Create `rnaBaseDict` using curly braces `{}`**

In the cell below create a dictionary called `rnaBaseDict` using curly braces. Use lower case letters for the `keys`. 

In [None]:
# Insert your code for Exercise 4 here




If your code for Exercise 4 is correct you should see 
~~~text
{'a': 'adenine', 'u': 'uracil', 'g': 'guanine', 'c': 'cytosine'}
~~~

---------------------------------------------------------------
## Unique Keys

The keys of a mapping must be **_unique_** within the collection, because the dictionary has no way to distinguish different values indexed by the same key. 
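A small sketch of what uniqueness means in practice: if the same key is written twice in a dictionary literal, Python does not raise an error. It keeps a single entry for that key, holding the last value given. The dictionary name `codon_demo` is hypothetical.

```python
# 'UUU' appears twice; only one entry survives, holding the last value
codon_demo = {'UUU': 'Phe', 'AAA': 'Lys', 'UUU': 'F'}

print(codon_demo)        # {'UUU': 'F', 'AAA': 'Lys'}
print(len(codon_demo))   # 2
```

Because the earlier value is silently replaced, duplicated keys in a hand-typed dictionary are an easy mistake to miss.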

----------------------------------------------------------------

### Bioinformatics Example of a Dictionary

Dictionaries are the natural Python representation for tabular data. For example, they can be used to hold rows obtained from a database table: the table’s primary key is the dictionary’s key, and the value can be a tuple, with each element of the tuple representing one column of the table. There is nothing strange about a dictionary key being part of the value associated with it — in fact, this is quite common. 
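As a minimal sketch of the table idea described above, the hypothetical dictionary below stores three rows of an amino acid table. The key is the 3-letter code, and each value is a tuple whose elements act as the columns. Note that the key also appears as the first element of its own row.

```python
# Hypothetical three-row "table": key = 3-letter code (the primary key),
# value = one row stored as a tuple (3-letter code, full name, 1-letter code)
amino_acid_rows = {
    'Phe': ('Phe', 'phenylalanine', 'F'),
    'Lys': ('Lys', 'lysine', 'K'),
    'Trp': ('Trp', 'tryptophan', 'W'),
}

# Look up one row by its key, then pick a column by position
lys_row = amino_acid_rows['Lys']
print(lys_row)        # ('Lys', 'lysine', 'K')
print(lys_row[1])     # lysine
```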

The code in the cell below shows how to create the RNA codon translation table using a Python dictionary.  

In [None]:
# Bioinformatics Example

RNA_codon_table = {
#                        Second Base
#        U             C             A             G
# U
    'UUU': 'Phe', 'UCU': 'Ser', 'UAU': 'Tyr', 'UGU': 'Cys',     # UxU
    'UUC': 'Phe', 'UCC': 'Ser', 'UAC': 'Tyr', 'UGC': 'Cys',     # UxC
    'UUA': 'Leu', 'UCA': 'Ser', 'UAA': '---', 'UGA': '---',     # UxA
    'UUG': 'Leu', 'UCG': 'Ser', 'UAG': '---', 'UGG': 'Trp',     # UxG
# C
    'CUU': 'Leu', 'CCU': 'Pro', 'CAU': 'His', 'CGU': 'Arg',     # CxU
    'CUC': 'Leu', 'CCC': 'Pro', 'CAC': 'His', 'CGC': 'Arg',     # CxC
    'CUA': 'Leu', 'CCA': 'Pro', 'CAA': 'Gln', 'CGA': 'Arg',     # CxA
    'CUG': 'Leu', 'CCG': 'Pro', 'CAG': 'Gln', 'CGG': 'Arg',     # CxG
# A
    'AUU': 'Ile', 'ACU': 'Thr', 'AAU': 'Asn', 'AGU': 'Ser',     # AxU
    'AUC': 'Ile', 'ACC': 'Thr', 'AAC': 'Asn', 'AGC': 'Ser',     # AxC
    'AUA': 'Ile', 'ACA': 'Thr', 'AAA': 'Lys', 'AGA': 'Arg',     # AxA
    'AUG': 'Met', 'ACG': 'Thr', 'AAG': 'Lys', 'AGG': 'Arg',     # AxG
# G
    'GUU': 'Val', 'GCU': 'Ala', 'GAU': 'Asp', 'GGU': 'Gly',     # GxU
    'GUC': 'Val', 'GCC': 'Ala', 'GAC': 'Asp', 'GGC': 'Gly',     # GxC
    'GUA': 'Val', 'GCA': 'Ala', 'GAA': 'Glu', 'GGA': 'Gly',     # GxA
    'GUG': 'Val', 'GCG': 'Ala', 'GAG': 'Glu', 'GGG': 'Gly'      # GxG
}


-------------------

### Why not just use a Python `list` instead of a `dict`?

You might be asking yourself whether you could create the RNA codon table using a Python `list` instead of a `dict`. After all, a `list` can also hold every codon together with its amino acid. 

The answer is that while you could use a `list` to make an RNA codon table, it would be much harder to use. If our codon table was a `list`, we would have to write a lot of code in order to search through it to find the translation we want. 

As you will see below, using a dictionary for our codon table makes looking up the amino acid for a particular codon a trivial task.
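To make the comparison concrete, here is a rough sketch using a hypothetical three-codon table stored both ways. With a list of pairs, the program has to step through the pairs until it finds the right codon (this uses a `for` loop, which is covered in Part 1.7). With a dictionary, the lookup is a single expression.

```python
# Hypothetical three-codon table, stored as a list of (codon, amino acid) pairs
codon_pairs = [('UUU', 'Phe'), ('AAA', 'Lys'), ('AUG', 'Met')]

# The same information stored as a dictionary
codon_dict = dict(codon_pairs)

# List version: examine each pair until the codon matches
found = None
for codon, amino_acid in codon_pairs:
    if codon == 'AAA':
        found = amino_acid
        break
print(found)               # Lys

# Dictionary version: index directly by the codon
print(codon_dict['AAA'])   # Lys
```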

-------------------

### Simple function to look-up an amino acid from the RNA codon table

In order to use our `RNA_codon_table` we first need to create the simple function shown in the cell below. The process of writing Python functions will be covered in a later lesson. 

In [None]:
# Create translation function

def translate_RNA_codon(codon):
    return RNA_codon_table[codon]
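
One thing to be aware of: indexing a dictionary with a key it does not contain raises a `KeyError`. The sketch below is an optional variation, not part of the lesson's function. It uses the dictionary `get()` method, which returns a default value instead of raising an error. The names `mini_codon_table` and `lookup_codon` are hypothetical, and the table holds only two codons to keep the example short.

```python
# Hypothetical two-codon table used only for this sketch
mini_codon_table = {'UUU': 'Phe', 'AAA': 'Lys'}

def lookup_codon(codon):
    # get() returns the second argument when the key is missing
    return mini_codon_table.get(codon, 'unknown')

print(lookup_codon('UUU'))         # Phe
print(lookup_codon('XYZ'))         # unknown
print('XYZ' in mini_codon_table)   # False
```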

### Example 5: Find the amino acid coded by 'UUU'

The **_genetic code_** was first 'broken' in 1961 by Marshall Nirenberg and Heinrich Matthaei working at the National Institutes of Health (NIH). (In the picture below, Nirenberg can be seen sitting in his lab with Matthaei standing at the left.) 

![](https://biologicslab.co/BIO1173/images/class_01/Nierenberg_matthaei.png)

To break the genetic code, Nirenberg and Matthaei synthesized an artificial strand of mRNA containing only one nucleotide, uracil ('U'). This special RNA is called **_poly(U)_**. 

When they placed their _poly(U)_ into a test tube containing ribosomes, a protein with repeating copies of the amino acid _phenylalanine_ (Phe) was produced. From this they concluded that the codon "UUU" coded for the amino acid **phenylalanine**. 

The Python code in the cell below shows how to use the `translate_RNA_codon()` function to look-up the amino acid coded by 'UUU'.

In [None]:
# Example 5: Use codon dictionary

# Look-up codon UUU
translate_RNA_codon('UUU')

If your code is correct you should see the following output:
~~~text
'Phe'
~~~
Just as Nirenberg and Matthaei found in their pioneering experiment, the codon `UUU` codes for the amino acid `Phe` or phenylalanine. 

### **Exercise 5: Find the amino acid coded by 'AAA'**

Within a few years, Nirenberg and Matthaei repeated their experiment using mRNA strands containing only adenine (A) or only cytosine (C). (For technical reasons, they weren't able to synthesize mRNA containing only guanine (G).)

In the cell below, write the Python code needed to find the amino acid coded by 'AAA'.

In [None]:
# Insert your code for Exercise 5 here

# Use function to look-up codon AAA
translate_RNA_codon('AAA')

If your code is correct you should see the following output:
~~~text
'Lys'
~~~
`Lys` is the 3-letter abbreviation for the amino acid **lysine**. Nirenberg and Matthaei found that their synthetic _poly(A)_ created a polypeptide that only contained the amino acid lysine. 

## Python Sets

A Python **set** is an _unordered_ collection of items that contains _**no** duplicates_. As we will see, if you try to add an item that is already in a set, nothing happens. 

Since strings behave as collections, a string can be used as the argument for a call to `set()`. The resulting set will contain a **single-character string** for each unique character that appears in the argument. The order in which the elements of a set are printed will not necessarily bear any relation to the order in which they were added, as shown in Example 6.
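As a small sketch of calling `set()` on a string, the example below uses a made-up 7-letter sequence. The names `sequence` and `sequence_bases` are hypothetical. No expected output is shown for the printed set itself because the order of its items can vary.

```python
# Made-up DNA sequence used only for this sketch
sequence = 'GATTACA'

# set() keeps one single-character string per unique character
sequence_bases = set(sequence)

print(sequence_bases)         # four letters, in no guaranteed order
print(len(sequence))          # 7 characters in the string
print(len(sequence_bases))    # 4 unique characters in the set
```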

### Example 6: Create a Set called `DNABases_set`

The code below shows how to create a set of single-character strings called `DNABases_set` using curly braces `{}`.

In [None]:
# Example 6

# Create a set called DNABases_set
DNABases_set = {'T', 'C', 'A', 'G'}

# Print DNABases_set
print(DNABases_set)

If your code is correct you should see something _similar_ to the following output:
~~~text
{'C', 'A', 'G', 'T'}
~~~
You should notice two things. First, our set is **not** the string "TCAG", but a collection of 4 single-character strings ("letters"): A, C, G and T. 

Second, the order of the "letters" when we created the set (TCAG) may not be preserved. When we printed out the `DNABases_set`, the "letters" may have been printed in a different order, and that order can change from one run to the next. 

Also, because a set is an _unordered_ collection, its items have no index positions. If you tried to use square brackets `[]` to index an item in a set, you would get the error message: `TypeError: 'set' object is not subscriptable`. 
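Since a set cannot be indexed, the usual ways to work with its contents are to test membership with the `in` operator or to ask for a sorted list of its items. The sketch below shows both, using a hypothetical set named `demo_bases`.

```python
# Hypothetical set used only for this sketch
demo_bases = {'T', 'C', 'A', 'G'}

# Membership tests work on sets even though indexing does not
print('A' in demo_bases)     # True
print('U' in demo_bases)     # False

# sorted() returns a new LIST with the items in alphabetical order
print(sorted(demo_bases))    # ['A', 'C', 'G', 'T']
```

Note that `sorted()` returns a list, so its result can be indexed even though the set itself cannot.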

### **Exercise 6: Create a Set called `RNABases_set`**

In the cell below, create a new set called `RNABases_set` and print it out. Remember that in RNA the base _uracil_ substitutes for the base _thymine_.

In [None]:
# Insert your code for Exercise 6 here



If your code is correct you should see something _similar_ to the following output:
~~~text
{'U', 'A', 'G', 'C'}
~~~
The order of the "letters" might be different.

### Example 7: Algebraic set operations - Union

In Python there are a number of operations and functions that work on different _collection_ types such as _sets_. In this example, we show one example of an operation called _union_. 

The "adding" of one set with another is called the _union_ of the two sets. In Python, you can use the `|` operator to create a **union** of two sets as shown in the next cell.

In [None]:
# Example 7: Union of 2 sets

# Create a new set called AddBases_set
AddBases_set =  {'X', 'Y', 'Z', 'U', 'U', 'A','A'}

# Use | to create union 
RNABases_set_union = RNABases_set | AddBases_set

# Print the new set
print(RNABases_set_union)

If your code is correct you should see something similar to the following output:
~~~text
{'A', 'U', 'C', 'Y', 'Z', 'G', 'X'}
~~~
Notice that when we add the two sets together, only the letters `XYZ` were added to `RNABases_set`, not the additional `U`s and `A`s. 

>Why?

Because every element in a set must be **unique**. Since our original `RNABases_set` already contained the letters U and A, they were not added, only the new letters, X, Y and Z. In other words, a set can only contain one example of each element.
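Sets also provide a `union()` method that gives the same result as the `|` operator. The sketch below uses two hypothetical sets, `rna_demo` and `extra_demo`, to show that the union holds each letter once and that the original sets are left unchanged.

```python
# Hypothetical sets used only for this sketch
rna_demo = {'A', 'C', 'G', 'U'}
extra_demo = {'X', 'Y', 'Z', 'U', 'A'}

# union() returns a new set; it is equivalent to rna_demo | extra_demo
combined = rna_demo.union(extra_demo)

print(len(combined))                         # 7: A, C, G, U, X, Y, Z
print(combined == (rna_demo | extra_demo))   # True
print(len(rna_demo))                         # 4: the original set is unchanged
```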

### **Exercise 7: Try to create a set with duplicated items**

Because each element in a set must be _unique_, when you try to create a set with duplicated items, you don't get an error, but only one item will be added to the set. 

In the cell below, create a set called `RNABases_set2` with `{'U', 'A', 'A', 'G', 'U', 'C', 'C'}` and then print out the set. 

In [None]:
# Insert your code for Exercise 7 here



If your code is correct, you should see something similar to the following output (the order of the letters might be different):
~~~text
{'A', 'C', 'G', 'U'}
~~~
The new `RNABases_set2` only contains one example of each item.

### Example 8: Algebraic set operations - Intersection

Another algebraic set operation is **intersection**. 

The cell below uses the `&` operator to find the intersection of two sets.

In [None]:
# Example 8: Set intersection using & operator

# Create 2 sets using curly braces
let1_set = {'a','b','c','d','e'}
let2_set = {'c','d','e','f','g'}

# Use `&` to find their intersection
let_set_intersection = let1_set & let2_set

# Print out the intersection
print(let_set_intersection)

The output from the algebraic set operation, intersection, is:
~~~text
{'c', 'd', 'e'}
~~~
Set intersection is the set of elements that _both_ sets have in common. In this example, only the letters `c`, `d` and `e` were contained in both sets. 

### **Exercise 8: Algebraic set operations - Intersection**

In Example 8, set intersection was found using the `&` operator. Python also offers the `intersection()` method for accomplishing the same thing.  

In the cell below, use the `intersection()` method to find the intersection between the same two sets, `let1_set` and `let2_set`, used in Example 8.

(**HINT:** The use of Python _methods_ was covered in Class_01_04. Methods are called using _dot notation_. In this case, the `intersection()` method is attached (by the dot) to the first set and its argument is the second set.) 

In [None]:
# Insert your code for Exercise 8 here



If your code is correct you should see the following output:
~~~text
{'c', 'd', 'e'}
~~~

### Example 9: Use `add()` method with sets

A list is always enclosed in square brackets `[ ]`, a tuple in parentheses `( )`, and similarly a set is enclosed in curly braces `{ }`. (The one exception is an empty set, which must be created with `set()` because `{}` creates an empty dictionary.)
Programs can dynamically add items to a set as they run by using the `add()` method. It is important to note that the **_append method_** adds items to lists, whereas the **_add method_** adds items to a set. 

In [None]:
# Example 9: Use add() method

# Manually add items, sets do not allow duplicates
mySet = set()
mySet.add('a')
mySet.add('b')
mySet.add('c')
mySet.add('c')
print(mySet)

The output is the set 
~~~text
{'a', 'b', 'c'} 
~~~
Sets can only contain _unique_ items so there is only one `c` in the final set.

### **Exercise 9: Use `append()` method with lists** 

While programs can dynamically add items to a set with the `add()` method you must use the `append()` method to add items to a list. 

In the cell below use the `append()` method to add items to a list called `myList` using **Example 9** as a template.

In [None]:
# Insert your code for Exercise 9 here



If your code is correct you should see the following list:  
~~~text
['a', 'b', 'c', 'c'] 
~~~
Lists (but _not_ sets) can contain _duplicate_ items so the letter `c` appears twice in this list.

## **Lesson Turn-in**

Make sure that you have completed all of the code cells and run them in sequential order (the last code cell should be number 23). 

NOTE: You **won't** be able to use the **Restart the kernel and run all the cells** icon for this lesson, since there is an intentional error in **Exercise 2**. Whenever there is an error, the Python Interpreter stops. Simply go to the code cell after **Exercise 2** and manually run all of the remaining code cells in order, using either Shift+Enter or by clicking on the **Run this cell and advance** icon in the toolbar menu.   

When all of the cells have been run, use **File --> Print... --> Save to PDF** to generate a PDF of your JupyterLab notebook. Save your PDF as `Lesson_01_6.lastname.pdf` where _lastname_ is your last name, and upload the file to Canvas.