## Data types: sequences, dictionaries and sets
### BIOINF 575 - Fall 2022

___
##### Jupyter lab documentation
https://jupyterlab.readthedocs.io/en/stable/user/interface.html
##### Keyboard shortcuts:
https://gist.github.com/kidpixo/f4318f8c8143adee5b40    might be a bit outdated     
https://ipython.readthedocs.io/en/1.x/interactive/notebook.html#keyboard-shortcuts     
Also in Settings > Advanced Settings Editor > Keyboard Shortcuts
___


#### Magics

https://ipython.readthedocs.io/en/stable/interactive/magics.html

https://towardsdatascience.com/top-10-magic-commands-in-python-to-boost-your-productivity-1acac061c7a9



In [None]:
# see the objects you added to your environment
%whos 

In [None]:
# list of magic commands
%lsmagic

In [None]:
# see the history of the code you ran
%history

#### Python - Data types

Typing is dynamic but strict in Python (everything is an object and every object has a type).<br><br>
The principal built-in types are numerics, sequences, mappings, classes, instances and exceptions.<br>
Numeric Types — int (subtype: bool), float, complex.<br>
Sequence Types — string, list, tuple, range.<br><br>
There are two major categories: mutable and immutable types.<br>
<b>Immutable</b> objects <b>cannot be changed</b>, while <b>mutable</b> objects <b>can be changed</b>.<br>
Integers, floats, strings, tuples, and booleans are immutable.<br>
Lists, dictionaries, and sets are mutable.<br>
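
A minimal sketch of the difference between a mutable and an immutable type. The variable names (`bases`, `seq`, `new_seq`) are illustrative and not part of the notebook.

```python
# lists are mutable: an element can be replaced in place
bases = ["A", "C", "G", "T"]
bases[0] = "U"

# strings are immutable: item assignment raises TypeError
seq = "ACGT"
try:
    seq[0] = "U"
    string_changed = True
except TypeError:
    string_changed = False

# "changing" a string really means building a new string object
new_seq = "U" + seq[1:]
print(bases, seq, new_seq, string_changed)
```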

https://docs.python.org/3/library/stdtypes.html

https://media.geeksforgeeks.org/wp-content/uploads/20191023173512/Python-data-structure.jpg

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20191023173512/Python-data-structure.jpg" width="400">




<b>Keywords and their description:</b><br>

In [None]:
help("keywords")

In [None]:
help("del")

<b>Calling a function:</b><br>

\<function name\><b>(</b>parameters<b>)</b>
    
Example: 
```python
print("This is a test")
```


In [None]:
# builtin functions
# https://docs.python.org/3/library/functions.html#built-in-funcs

#dir(__builtins__)

Get more information regarding a function or a data type using the <b>help(\<function\_name\>)</b> function or <b>\<function_name\>?</b> or <b>\<function_name\>??</b>.

In [None]:
x = input() # expects an input from the keyboard

In [None]:
x

In [None]:
# check the type
type(x)

In [None]:
# explicit conversion 
x = int(x)

In [None]:
x

In [None]:
type(x)

In [None]:
# evaluate string expressions
eval("x+5")

In [None]:
# build strings that have new lines using triple quotes
# we want to test some more builtin functions

sequence_data = '''>fasta file sequence
AAACGTACG
AAACGTA'''

In [None]:
# evaluating a name in a cell displays its repr (quotes and \n escapes are shown)
sequence_data

In [None]:
# string representation
str(sequence_data)

In [None]:
# evaluates special characters like tabs and new lines
print(sequence_data)

In [None]:
# canonical representation
# the one that, when passed to eval,
# recreates the object
repr(sequence_data)

In [None]:
eval(repr(sequence_data))

In [None]:
eval(repr(sequence_data)) == sequence_data

#### Working with the numerical data types

In [None]:
import math # use this for more complex math functions

In [None]:
dir(math)

In [None]:
# to format the numbers for a nice display use the format builtin function
help(format)

In [None]:
1000.5

In [None]:
# display 15 decimals

format(1000.5, "0.15f")

In [None]:
10**3

#### Floating point error - approximation error
https://docs.python.org/3/tutorial/floatingpoint.html  
"Floating-point numbers are represented in computer hardware as base 2 (binary) fractions.     
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions.       
A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine.
"

https://www.geeksforgeeks.org/floating-point-error-in-python/


In [None]:
format(0.1, "0.30f")

In [None]:
0.6 + 0.7
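
Because of this approximation, comparing floats with `==` can give surprising results. A small sketch of two common workarounds, `math.isclose` and `round`; the choice of two decimals is an assumption made for the example.

```python
import math

total = 0.6 + 0.7
print(format(total, "0.20f"))        # shows the stored approximation

# exact equality can fail because of the approximation
print(0.1 + 0.2 == 0.3)              # False

# compare with a tolerance instead
print(math.isclose(0.1 + 0.2, 0.3))  # True

# or round to the number of decimals that matters
print(round(total, 2) == 1.3)        # True
```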

#### Bit-wise operations - apply <i>and</i> and <i>or</i> to each digit of the binary representation

1 is True   
0 is False      
& is and    
| is or


In [None]:
# 4 & 3 
# 100 & 
# 011
# ---
# 000 = 0

4 & 3

In [None]:
# 100 |
# 011
# ---
# 111 = 7

4 | 3

In [None]:
# any empty structure when converted to bool will be False 
bool('')

In [None]:
bool(0)

In [None]:
# any non-empty structure when converted to bool will be True 
bool("test")

In [None]:
bool(-8)

In [None]:
# looks for the first False
# the and operator treats each operand as True or False 
# returns the first falsy (empty or zero) operand, or the last operand if all are truthy
100 and "test"

In [None]:
# looks for the first True
# the or operator treats each operand as True or False 
# returns the first truthy (non-empty or non-zero) operand, or the last operand if all are falsy
100 or "test"

In [None]:
bool(100 or "test")

In [None]:
bool(0 or "test")

In [None]:
bool(0 and "test")
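
The cells above can be summarized in one sketch: `and` and `or` return one of their operands, not necessarily `True` or `False`. The names `r1` to `r6` and `gene_name` are illustrative.

```python
# and: returns the first falsy operand, otherwise the last operand
r1 = 100 and "test"     # both truthy -> "test"
r2 = 0 and "test"       # 0 is falsy  -> 0
r3 = "" and 100         # "" is falsy -> ""

# or: returns the first truthy operand, otherwise the last operand
r4 = 100 or "test"      # 100 is truthy -> 100
r5 = 0 or "test"        # 0 is falsy    -> "test"
r6 = 0 or ""            # both falsy    -> ""

# a common use: fall back to a default when a value is empty
gene_name = "" or "unknown"
print(r1, r2, repr(r3), r4, r5, repr(r6), gene_name)
```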

___
### String - a sequence of characters that is immutable (unchangeable) and ordered (characters can be accessed by index)

<img src='https://journals.plos.org/ploscompbiol/article/figure/image?size=original&download=&id=10.1371/journal.pcbi.1004867.g001' width="600"/>

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004867


<img src='https://media.geeksforgeeks.org/wp-content/cdn-uploads/20200204160843/strings.jpg' width="400"/>

https://www.geeksforgeeks.org/python-strings/



#### Create a string - quotes (single, double, triple), str()
##### ' '    
##### " "     
##### ''' ''' - allows for multiple lines     
##### """ """ - allows for multiple lines

##### Use double quotes if you have single quotes in the string and the other way around.   
E.g. "We'll see more of these examples"

#### Sequence operations

<img src = "sequence_ops.png" width = "600"/>

https://docs.python.org/3/library/stdtypes.html


### EXAMPLES

In [None]:
"AGC" in "CCGTAGCTTAAGAA"

In [None]:
not "AGC" in "CCGTAGCTTAAGAA"

In [None]:
"AGC" not in "CCGTAGCTTAAGAA"

In [None]:
# using concatenation and multiplication to create a string

DNA_seq = 3 * "A" + 6 * "C" + "\n" + "T" + 8 * "G"
DNA_seq

In [None]:
print(DNA_seq)

#### Indexing - starts from 0
#### Get the element at a certain position from a sequence 
```python
s[i]
```

In [None]:
# int objects are not sequences - error

#10[0]


In [None]:
# to return the 3rd element use index 2 because it starts from 0

"AACGT"[2] 


In [None]:
DNA_seq

In [None]:
# use index 10 to get 11th character (the new line is also one of the characters)

DNA_seq[10]


#### Slicing - get a range of elements - give start:stop:step
#### Start is included, stop is not included
```python
s[i:j:k]
```
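
A short sketch of the most common slice patterns on a made-up nine-base sequence (the sequence and variable names are illustrative). Omitted values default to the beginning, the end, and a step of 1.

```python
seq = "ATGGCCTAA"

first_codon = seq[0:3]     # 'ATG'
last_codon = seq[-3:]      # 'TAA'
without_ends = seq[3:-3]   # 'GCC'
every_third = seq[::3]     # first base of each codon: 'AGT'
reversed_seq = seq[::-1]   # 'AATCCGGTA'

print(first_codon, last_codon, without_ends, every_third, reversed_seq)
```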

In [None]:
# get one character - at index 2 - stop index not included
"AACGT"[2:3]

In [None]:
# slicing out one element gives the same result as indexing for strings -- not for lists 
"AACGT"[2:3] == "AACGT"[2]

In [None]:
# get two characters
"AACGT"[2:4]

In [None]:
# go to the end
"AACGT"[2:]

In [None]:
# start from the beginning
"AACGT"[:3]

In [None]:
# take all elements - make a copy 
"AACGT"[:]

In [None]:
# take only every other element - use the step
# starts from the beginning - with the first element

"AACGT"[::2]


In [None]:
# no error if the stop index goes past the end
"AACGT"[2:30]

In [None]:
# negative indexing or slicing starts from the end 
 
"AACGT"[-1]

In [None]:
"AACGT"[-4:-1]

In [None]:
"AACGT"[-4:]

In [None]:
"AACGT"[-2:5]

In [None]:
# reverse a sequence - negative step

"AACGT"[::-1]


#### Length of a sequence 
```python
len(s)
```

In [None]:
len("AACGT")

In [None]:
DNA_seq

In [None]:
len(DNA_seq)

#### Order of characters: ASCII code     
Computers store everything as 0s and 1s, so each character is stored as a number (its code); the ASCII codes are the first 128 Unicode code points.      
https://python-reference.readthedocs.io/en/latest/docs/str/ASCII.html

In [None]:
# ord - return the ASCII code of a character
ord("A")

In [None]:
ord("a")

In [None]:
# only for characters not strings - error
# ord("aa")

In [None]:
ord("\n")

In [None]:
# chr - return the character with a given ASCII code

chr(68)

In [None]:
chr(99)

In [None]:
# compare strings - characters are compared one by one by their code, so uppercase letters come before lowercase ones
"AMAZING" > "amino acid"

In [None]:
"amino acid" > "amino Acid"

In [None]:
min("amino acid")

In [None]:
min("aminoAcid")

In [None]:
max("amino Acid")
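
A sketch of how the character codes drive string comparison and sorting, and how to compare without regard to case. The example words are illustrative.

```python
# comparison goes character by character using the character codes
print(ord("Z"), ord("a"))               # 90 97

# every uppercase letter has a smaller code than any lowercase letter
upper_first = "Zebra" < "apple"         # True

# normalize the case for a case-insensitive comparison
ignoring_case = "Zebra".lower() < "apple".lower()   # False

names = ["alanine", "Serine", "lysine"]
by_code = sorted(names)                 # ['Serine', 'alanine', 'lysine']
by_letter = sorted(names, key=str.lower)  # ['alanine', 'lysine', 'Serine']
print(upper_first, ignoring_case, by_code, by_letter)
```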

#### Find the index of a subsequence
```python
# x - subsequence, i and j are optional start and stop position
s.index(x,i,j) 
```
    

In [None]:
# help(str.index)

In [None]:
DNA_seq

In [None]:
DNA_seq.index("ACC")

In [None]:
# use start index - error not found

# DNA_seq.index("ACC", 5)


In [None]:
# use end index - error not found

# DNA_seq.index("CCC", 5, 6)


#### Count the occurrences of a subsequence
```python
# x - subsequence, i and j are optional start and stop position
s.count(x,i,j) 
```


In [None]:
# help(str.count)

In [None]:
DNA_seq

In [None]:
DNA_seq.count("AA")

In [None]:
DNA_seq.count("CC")

In [None]:
# use start and end position

DNA_seq.count("CC", 10,2000)


#### String specific functions
* strip - remove leading and trailing whitespace or specific characters
* startswith - checks if string starts with substring
* endswith - checks if string ends with substring
* find - returns index of substring in string

In [None]:
# remove whitespace characters (space, tab, new line) at the beginning and end of the string

"\n AAAGC TTTAC \n GGGTA \t\n ".strip()


In [None]:
# check if the string startswith or endswith a certain substring

"ACGTAAC".startswith("AC")

In [None]:
# check endswith

"ACGTAAC".endswith("GAC")


In [None]:
# find - same as index but no error if not found - returns -1

help(str.find)


In [None]:
DNA_seq

In [None]:
DNA_seq.find("AA")

In [None]:
DNA_seq.find("DONE")

In [None]:
# error
# DNA_seq.index("DONE")
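
A sketch of how the two methods behave when the substring is missing, and why the result of `find` should be checked before it is used as an index. The sequence and the `message` variable are illustrative.

```python
seq = "CCGTAGCTTAAGAA"

pos_found = seq.find("TAG")      # 3
pos_missing = seq.find("DONE")   # -1

# pitfall: -1 is a valid index (the last character), so always check for it
if pos_missing == -1:
    message = "motif not found"
else:
    message = "motif at index " + str(pos_missing)

# index raises ValueError for a missing substring
try:
    seq.index("DONE")
    raised = False
except ValueError:
    raised = True

print(pos_found, pos_missing, message, raised)
```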

#### <font color = "red">Exercise</font>

A restriction enzyme is an enzyme that cleaves DNA into fragments at or near specific recognition sites within molecules known as restriction sites. EcoRI is a restriction endonuclease enzyme isolated from species <i>E. coli</i>.  
EcoRI recognition site with cutting pattern indicated by a green line.

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/EcoRI_restriction_enzyme_recognition_site.svg/350px-EcoRI_restriction_enzyme_recognition_site.svg.png" width = "100" /> 

https://en.wikipedia.org/wiki/EcoRI  

- Find the position of the string "GAATTCT" in the following two DNA sequences:    
"AACGTCAAGGTTCCTA"  
"ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
- If you find "GAATTCT", split the sequence into the two sequences that result from the cut


In [None]:
## Write your solution here



#### More string functions

In [None]:
# replace substring in string
help(str.replace)

In [None]:
# translate specific characters into other characters
help(str.translate)

In [None]:
help(str.maketrans)
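
A sketch of the three methods on an illustrative sequence: `replace` substitutes a substring, while `maketrans` builds a character-by-character table that `translate` applies in a single pass. The mappings used here (T to U, upper to lower case, deleting `-` and spaces) are chosen only for illustration.

```python
dna = "ATGCTTAG"

# replace: substitute every occurrence of a substring
rna = dna.replace("T", "U")              # 'AUGCUUAG'

# maketrans builds a character-to-character table,
# translate applies it to every character
table = str.maketrans("ACGT", "acgt")
lower_dna = dna.translate(table)         # 'atgcttag'

# a third argument to maketrans lists characters to delete
clean = "ATG-CTT AG".translate(str.maketrans("", "", "- "))  # 'ATGCTTAG'

print(rna, lower_dna, clean)
```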

#### <font color = "red">Exercise</font>
- Compute the GC content (percentage of C and G bases) in the following DNA sequence:  
"AACGTCAAGGTTCCTA"  
- Compute the complement of the sequence (A<=>T,C<=>G)
- Reverse the sequence complement (compute the reverse strand on the DNA)


In [None]:
## Write your solution here



___
### List - a collection of elements that: allows duplicates, is ordered (can access its elements by index), and is mutable (changeable)

#### Instead of characters in our sequence we can have any type of data, and make changes to it

A list may be constructed in several ways:<br>

Using a pair of square brackets to denote the empty list: []   
Using square brackets, separating items with commas: [a], [a, b, c]   
Using a list comprehension: [x for x in iterable]  
Using the type constructor: list() or list(iterable)  
A list is a dynamic array of references: a contiguous block of memory that holds references to its elements  
https://docs.python.org/3/faq/design.html#how-are-lists-implemented-in-cpython   
https://www.tutorialspoint.com/difference-between-std-vector-and-std-array-in-cplusplus

Mutable sequence type operations:
    
<img src = "list_ops.png" width = "500"/>

_____
If two variables refer to the same object, they both point to the same location in memory, and a change made through either one of them shows in both.


<img src="http://henry.precheur.org/python/list1.png" width=200 /><img src="http://henry.precheur.org/python/list2.png" width=200 />

To make a (shallow) copy of a list use subsetting/slicing:<br>
```python
copy_list = initial_list[:]
```
Similarly we can make a shallow copy using the .copy() method or the list() constructor:<br>
```python
copy_list = initial_list.copy()  
#or
copy_list = list(initial_list)
```

Make a deep copy to be able to change any element without changing the initial list
```python
import copy
copy_list = copy.deepcopy(initial_list)
```
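
A compact sketch of the three cases, using the `is` operator to check whether two names refer to the same object. The list contents and the names `alias`, `shallow`, and `deep` are illustrative.

```python
import copy

initial_list = [1, ["ACGT", "AAA"]]

alias = initial_list                  # same object, no copy
shallow = initial_list[:]             # new outer list, shared inner list
deep = copy.deepcopy(initial_list)    # new outer list and new inner list

print(alias is initial_list)          # True
print(shallow is initial_list)        # False
print(shallow[1] is initial_list[1])  # True: the nested list is shared
print(deep[1] is initial_list[1])     # False

# changing the nested list through the shallow copy also changes the original
shallow[1].append("CCC")
print(initial_list)   # [1, ['ACGT', 'AAA', 'CCC']]
print(deep)           # [1, ['ACGT', 'AAA']]
```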


In [None]:
# list with different types of elements

test_list = [1, [2, 3], 4, ["ACGT", "AAA"], [DNA_seq, (7, 8, 9)]]
test_list_backup = test_list 

In [None]:
test_list 

In [None]:
test_list_backup

In [None]:
# change the second element in the backup list - index 1

test_list_backup[1] = 543


In [None]:
test_list_backup

In [None]:
# the change also shows in the test_list

test_list


In [None]:
# making a shallow copy

test_list_backup = test_list[:]


In [None]:
test_list_backup

In [None]:
# change the third element (index 2) to 400 

test_list_backup[2] = 400


In [None]:
test_list_backup

In [None]:
# the change does not show in test_list
# the shallow copy is a new list object, so replacing 
# one of its first-level elements does not affect 
# the original list

test_list


In [None]:
# change "ACGT" to ["AC", "GT"] in backup list

test_list_backup[3][0] = ["AC", "GT"]


In [None]:
test_list_backup

In [None]:
# the change also shows in test_list
# it is at the second level of the list: the shallow copy 
# shares the nested lists with the original instead of 
# copying them to a different location in memory

test_list


In [None]:
# make a deep copy to be able to change any element without changing the initial list
import copy

In [None]:
test_list_backup = copy.deepcopy(test_list)

In [None]:
# change 'AAACCCCCC\nTGGGGGGGG' to ('AAACCCCCC','TGGGGGGGG') in the backup list

test_list_backup[4][0] = ("AAACCCCCC","TGGGGGGGG")

In [None]:
# check the lists
test_list_backup


In [None]:
# the change does not show in test_list
# it is at the second level of the list, and 
# the deep copy made copies of these elements in 
# a different location in memory
# it does that for elements at all levels

test_list


In [None]:
print(test_list)

#### <font color = "red">Exercise</font> 

Retrieve list elements using indices
* Retrieve the tuple from the test_list
* Retrieve the number 8 from the list


In [None]:
# Write your solution here



##### Exploring COVID cases by county in Michigan 

In [None]:
county_covid_no = [130, 18, 55, 80, 150, 30, 200]

#### Add elements
* append - add element at the end
* insert - add element at index
* extend - can add multiple elements at the end

In [None]:
# reporting for another county
# add the object as element of the list

county_covid_no.append(31)


In [None]:
county_covid_no

In [None]:
# reporting for specific county
# at given index insert object in the list

county_covid_no.insert(2,19)


In [None]:
county_covid_no

In [None]:
# what if we want to add multiple elements at once
# append will add a list as an individual element

county_covid_no.append([40, 38])


In [None]:
county_covid_no

In [None]:
# multiple counties start reporting 
# add each element of the given object as an element of the list

county_covid_no.extend([55, 22, 0])


In [None]:
county_covid_no

In [None]:
# a string is a sequence - extend the list with the string NONE
# it will add each element of the string as a list element

county_covid_no.extend("NONE")


In [None]:
county_covid_no

#### Remove element
* pop - retrieve the value and remove element
    * pop() - remove last element or 
    * pop(index) - remove element at index
* remove - remove first occurrence of value
* del - delete element(s) retrieved by indexing/subsetting 
* clear - removes all list elements 

In [None]:
# reporting stopped for the last county 
# remove last element and retrieve value

county_covid_no.pop()

In [None]:
county_covid_no

In [None]:
# reporting stopped for the 10th county (index 9)
# remove element at index and retrieve value

county_covid_no.pop(9)


In [None]:
county_covid_no

In [None]:
# reporting stopped in more counties
# delete range of elements from the list

del county_covid_no[11:14]


In [None]:
county_covid_no

In [None]:
# reporting for the first county with 55 cases stopped
# remove first occurrence of value 55

county_covid_no.remove(55)


In [None]:
county_covid_no

In [None]:
# remove bad reporting 
# remove first occurrence of value "N"

county_covid_no.remove("N")


In [None]:
county_covid_no

In [None]:
# try again - if the value is not present in the list a ValueError is raised

# county_covid_no.remove("N")


In [None]:
county_covid_no

In [None]:
# clear/empty the list

county_covid_no.clear()


In [None]:
county_covid_no

#### Other list functions
* reverse - reverses the list
* sort - sorts the list

In [None]:
# reset the list

county_covid_no = ["A", 130, 18, 19, 80, 150, 30, 200, 31, 55, 22]


In [None]:
# reverse list

county_covid_no.reverse()


In [None]:
county_covid_no

In [None]:
# sort the list
# raises a TypeError: the values have to be comparable, and int and str are not

# county_covid_no.sort()


In [None]:
county_covid_no

In [None]:
# remove the last element and then sort

bad_number = county_covid_no.pop()
print("removed bad number: ", bad_number)
county_covid_no.sort()


In [None]:
county_covid_no
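
A sketch contrasting the list method `sort` with the built-in function `sorted`. The numbers and county names are illustrative values, not data from the notebook.

```python
cases = [130, 18, 55, 80]

ordered = sorted(cases)                   # returns a new sorted list
descending = sorted(cases, reverse=True)  # largest value first
unchanged = cases[:]                      # cases is still in its original order here

result = cases.sort()                     # sorts in place and returns None

names = ["Wayne", "oakland", "Kent"]
by_name = sorted(names, key=str.lower)    # case-insensitive order

print(ordered, descending, unchanged, result, cases, by_name)
```

Assigning the result of `cases.sort()` to a variable is a common mistake, since the method returns `None`.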

<b>String join()</b> - makes a string out of the elements of a list (all elements must be strings), using the string it is called on as a separator/delimiter

In [None]:
sequence = "-".join(["AACGT", "CCC", "TACG", "ATT"])
sequence

<b>String split()</b> - always creates a list - splits a string by a given separator/delimiter (default: any whitespace)

In [None]:
seq_list = sequence.split(sep = "-")
seq_list
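
A sketch of three details that are easy to miss with `split` and `join`; the strings and variable names are illustrative.

```python
# split with no argument splits on any run of whitespace
fields = "AACGT  CCC\tTACG\n".split()      # ['AACGT', 'CCC', 'TACG']

# split with an explicit separator keeps empty fields
with_empty = "AACGT--CCC".split("-")       # ['AACGT', '', 'CCC']

# join needs strings, so convert numbers first
counts = [3, 6, 8]
line = "\t".join([str(c) for c in counts])  # '3\t6\t8'

print(fields, with_empty, repr(line))
```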

#### <font color = "red">Exercise</font> 

* Assign an empty list to a variable amino_acids.  
* Add the following elements to the list: "arginine", "alanine"   
* Add the elements of the following list to the amino_acids list all at once:
    - \["asparagine", "aspartic acid", "cysteine", "glutamine", "glutamic acid"] 
* Remove cysteine
* Sort the amino_acids list
* Remove the last element


In [None]:
## Write your solution here




_______________
### Range - an immutable sequence of numbers

It is commonly used for looping a specific number of times.  
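The numbers are not stored in memory; each one is computed when needed (lazy evaluation). A range is a sequence type, not a generator: it supports len(), indexing, and repeated iteration.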

```python
# stop is not included
# start and step are optional 
# if no start is provided then the range starts from 0
# if no step is provided step 1 is used

range(stop)
range(start, stop[, step])  
```

In [None]:
range(5)

In [None]:
list(range(5))
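
A sketch showing that a range behaves like other sequences even though its numbers are computed on demand. The start, stop, and step values are illustrative.

```python
evens = range(0, 10, 2)

print(len(evens))        # 5
print(evens[1])          # 2
print(6 in evens)        # True
print(list(evens))       # [0, 2, 4, 6, 8]

# a negative step counts down; stop is still excluded
countdown = list(range(10, 0, -3))
print(countdown)         # [10, 7, 4, 1]
```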

#### <font color = "red">Exercise</font>
- Use the function range() to compute/create a list with numbers between 5 and 20 divisible by 3



____
### Tuple - a collection of elements, allows duplicates, is ordered, and is <u>unchangeable</u>
NO changes - NO additions, deletions, substitutions of elements!     
Typically a bit faster to create and more memory-efficient than lists.   
All general sequence operations apply. 

Tuples may be constructed in a number of ways:
* Using a pair of parentheses to denote the empty tuple: ()
* Using a trailing comma for a singleton tuple: a, or (a,)
* Separating items with commas: a, b, c or (a, b, c)
* Using the tuple() built-in: tuple() or tuple(iterable)

      
UNPACKING a tuple assigns each element of the tuple to a variable:
``` python
    gene_symbol, sequence, copy_number = ("GENE1", "AACTGA", 3)
```



In [None]:
# Only two methods available:
#  'count',
#  'index'

# dir(tuple)

In [None]:
# suitable for unchangeable sequences

nucleotides = ("A", "C", "G", "T")


In [None]:
nucleotides
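
A sketch of tuple immutability, unpacking, and the trailing comma needed for a one-element tuple. The names `changed`, `not_a_tuple`, and `singleton` are illustrative.

```python
nucleotides = ("A", "C", "G", "T")

# item assignment is not allowed on a tuple
try:
    nucleotides[0] = "U"
    changed = True
except TypeError:
    changed = False

# unpacking needs as many variables as there are elements
gene_symbol, sequence, copy_number = ("GENE1", "AACTGA", 3)

# a single-element tuple needs the trailing comma
not_a_tuple = ("A")    # just the string 'A'
singleton = ("A",)     # a tuple with one element

print(changed, gene_symbol, copy_number, type(not_a_tuple), type(singleton))
```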

__________________

### Dictionaries - a collection of key:value pairs, keeps insertion order (Python 3.7+), has no duplicate keys, is changeable, and is indexed by key (not by position).
### The mapping type

Dictionaries can be created by:
- placing a comma-separated list of key: value pairs within braces
    - ```{key1: value1, key2: value2, ..., key_n: value_n}```
    - item - ```key: value```
- by the dict constructor

dict(**kwarg)  
dict(mapping, **kwarg)<br>
dict(iterable, **kwarg)<br>

Dictionaries keep insertion order (Python 3.7+), but you access elements by key, not by index.     
Keys are unique and must be hashable, which in practice means of an immutable type.


<img src = "https://upload.wikimedia.org/wikipedia/commons/5/5b/GooglePythonClass_Day1_Part3_Pic.jpg" width = 400/>

https://commons.wikimedia.org/wiki/File:GooglePythonClass_Day1_Part3_Pic.jpg


In [None]:
#dir(dict)

In [None]:
# create empty dictionary

dict()


In [None]:
type({})

In [None]:
# create a dictionary from keys available in a sequence 
# and assign the same value to all

nucleotides_counts = dict.fromkeys("ACGT", 0)
nucleotides_counts
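
One caveat with `dict.fromkeys`, sketched below with illustrative names: the same value object is assigned to every key. That is harmless for an immutable value such as `0`, but a mutable value such as a list ends up shared by all keys.

```python
# fine for immutable values such as 0
counts = dict.fromkeys("ACGT", 0)
counts["A"] = counts["A"] + 1         # only the value for "A" changes

# with a mutable value every key refers to the same list
shared = dict.fromkeys("ACGT", [])
shared["A"].append(7)                 # shows up under every key

# a dict comprehension creates a separate list per key
separate = {base: [] for base in "ACGT"}
separate["A"].append(7)

print(counts, shared, separate)
```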


____
<img src = "https://labster-image-manager.s3.amazonaws.com/v2/PSL/053389c7-1738-464e-a13c-f68e79268fa2/PSL_Amino_acid_list.en.png" width = "450"/>

https://theory.labster.com/list-amino-acids/

In [None]:
# create dictionary using  {}
# provide key: value pairs separated by comma

amino_acids_map = {"Ser": "Serine", "Lys": "Lysine", "Ala": "Alanine"}


In [None]:
amino_acids_map

In [None]:
# create dictionary using dict()
# provide key_name = value pairs separated by comma

amino_acids_dict = dict(Ala = "Alanine", Ser = "Serine", Lys= "Lysine")


In [None]:
amino_acids_dict

In [None]:
# the order does not matter, the content does

amino_acids_map == amino_acids_dict


<b>Common sequence operations that apply to dictionaries</b>


|Operation                  |Description:                                  |
| --------                  | ----------                                   |
|<b>x in d</b>              |True if a key of d is equal to x, else False|
|<b>x not in d</b>          |False if a key of d is equal to x, else True|
|<b>len(d)</b>              |length of d - number of keys|
|<b>min(d)</b>              |smallest key of d|
|<b>max(d)</b>              |largest key of d|

____

In [None]:
amino_acids_map

In [None]:
# check if key is in dictionary

"Ser" in amino_acids_map


In [None]:
# length of dictionary - number of keys

len(amino_acids_map)


In [None]:
min(amino_acids_map)

Dictionaries methods:

| Method       | Description                                                                                                 |
|--------------|-------------------------------------------------------------------------------------------------------------|
| clear      | Removes all the elements from the dictionary                                                                |
| copy       | Returns a shallow copy of the dictionary                                                                    |
| fromkeys   | Returns a dictionary with the specified keys and value                                                      |
| get        | Returns the value of the specified key, or a default (None unless specified) if the key is missing          |
| items      | Returns a view of the dictionary's (key, value) tuples                                                      |
| keys       | Returns a view of the dictionary's keys                                                                     |
| pop        | Removes the element with the specified key and returns its value                                            |
| popitem    | Removes and returns the last inserted key-value pair                                                        |
| setdefault | Returns the value of the specified key. If the key does not exist: insert the key, with the specified value |
| update     | Updates the dictionary with the specified key-value pairs                                                   |
| values     | Returns a view of the dictionary's values                                                                   |

#### Retrieving a dictionary value - subset by key or use the get function

In [None]:
amino_acids_map

In [None]:
amino_acids_map["Lys"]

In [None]:
# key error

# amino_acids_map["Leu"]


In [None]:
help(dict.get)

In [None]:
amino_acids_map.get("Leu")
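
A sketch of `get` with and without a default value, using a small illustrative dictionary (`codon_table`) rather than the notebook's `amino_acids_map`.

```python
codon_table = {"ATG": "Met", "TGG": "Trp"}

present = codon_table.get("ATG")            # 'Met'
missing = codon_table.get("AAA")            # None: key is missing, no error
with_default = codon_table.get("AAA", "?")  # '?': the default we supplied

# get never adds the key to the dictionary
still_missing = "AAA" in codon_table        # False

print(present, missing, with_default, still_missing)
```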

#### To add a new dictionary element - subset the dictionary using the new key and assign the value, or use the setdefault or update methods

In [None]:
amino_acids_map

In [None]:
# if a new key is used then a new element is added

amino_acids_map["Cys"] = "Cysteine" 


In [None]:
amino_acids_map

In [None]:
# if an existing key is used then the value is updated
# here "Leu" is a new key, so an element is added (the misspelled value is corrected below)

amino_acids_map["Leu"] = "Leu cinne"


In [None]:
amino_acids_map

In [None]:
help(dict.setdefault)

In [None]:
# inserts a new element if the key does not exist and returns the value for the key
# here "Leu" already exists, so its current value is returned and nothing changes

amino_acids_map.setdefault("Leu", "Lcin")


In [None]:
amino_acids_map

#### The update function updates the value if the key exists or adds a new dictionary element if the key does not exist in the dictionary

In [None]:
# correcting the value using the update method

amino_acids_map.update({"Leu": "Leucine"})


In [None]:
amino_acids_map

In [None]:
# adding new values using the update method
amino_acids_map.update({"Val": "Valine", "Gln": "Glutamine"})

In [None]:
# return a collection containing the dictionary keys

amino_acids_map.keys()


In [None]:
# return a collection containing the dictionary values

amino_acids_map.values()


In [None]:
# return a collection of tuples of two containing 
# the dictionary items as tuples of (key, value)

amino_acids_map.items()
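
The objects returned by `keys()`, `values()`, and `items()` are views: they follow later changes to the dictionary and cannot be indexed by position. A sketch with an illustrative dictionary:

```python
amino_acids = {"Ser": "Serine", "Lys": "Lysine"}

keys_view = amino_acids.keys()
amino_acids["Ala"] = "Alanine"      # added after the view was created

# the view reflects the new key; convert to a list to index it
key_list = list(keys_view)          # ['Ser', 'Lys', 'Ala']

# items() yields (key, value) tuples that can be sorted and unpacked
pairs = sorted(amino_acids.items())
first_key, first_value = pairs[0]   # 'Ala', 'Alanine'

print(key_list, first_key, first_value)
```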


#### Removing elements: 
* del - deletes the element selected by key
* pop() - retrieves and removes an element by key
* popitem() - retrieves and removes the last inserted element
* clear() - removes all elements

In [None]:
amino_acids_map.popitem()

In [None]:
amino_acids_map
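
A sketch of the removal options on an illustrative dictionary. Passing a default to `pop` is one way to avoid a `KeyError` when the key may be missing.

```python
amino_acids = {"Ser": "Serine", "Ala": "Alanine", "Cys": "Cysteine"}

removed = amino_acids.pop("Ser")          # returns 'Serine'
missing = amino_acids.pop("Gly", None)    # default avoids a KeyError
del amino_acids["Ala"]                    # removes the item, returns nothing
last_item = amino_acids.popitem()         # ('Cys', 'Cysteine')

print(removed, missing, last_item, amino_acids)
```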

#### <font color = "red">Exercise</font> 
* Remove the element with key "Lys" from the dictionary
* Create a string with all the amino acid keys
* Use the get function to retrieve the amino acid Tryptophan (the key is Trp); if it does not exist, return the empty string
* Add the amino acid Tryptophan to the dictionary

In [None]:
" ".join(sorted(amino_acids_map.keys()))

_______________
### Sets - a collection of elements, is unordered, has no duplicates, can be changeable or unchangeable.

A set object is an unordered collection of distinct objects.<br>
A set is mutable, unless it is a frozenset.<br>
To create a set, place comma-separated elements inside braces or use the set([iterable]) constructor.<br>
Elements cannot be changed/updated in place - they have to be hashable (in practice, of an immutable type) - but can be added and removed.<br>
The update() function can be used to add multiple elements.<br>

https://www.w3schools.com/python/python_ref_set.asp

<img src = "set_ops.png" width =  "840"/>

In [None]:
# dir(set)

In [None]:
# to create an empty set always use the set function
# if you use {}, you create an empty dictionary

example_set = set()
example_set 


In [None]:
# you can use braces to create the set if it is not empty

model_organisms = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}


In [None]:
# sequence operators and functions still apply:
# in, len, min, max the same as for dict - nothing related to indices

"human" in model_organisms


In [None]:
# add method to add an element - similar to append for lists

model_organisms.add("yeast")


In [None]:
model_organisms

In [None]:
# update to add multiple elements - similar to extend for lists

model_organisms.update(["zebrafish","frog"])


In [None]:
model_organisms

In [None]:
# a string is an iterable of characters, so each character is added as a separate element
model_organisms.update("AB")

In [None]:
model_organisms

In [None]:
# remove element from set using the remove function
# raises exception if element does not exist

model_organisms.remove("rat")
model_organisms

In [None]:
# remove element from set using the discard function
# does not raise exception if element does not exist

model_organisms.discard("rat")
model_organisms

In [None]:
model_organisms.discard("frog")
model_organisms
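
A sketch of three set properties mentioned above: duplicates are dropped, elements must be hashable, and a `frozenset` cannot be modified. All names and values are illustrative.

```python
# building a set from a sequence removes duplicates
bases_seen = set("AACGTTTA")            # {'A', 'C', 'G', 'T'}

# set elements must be hashable, so a list cannot be an element
try:
    bad = {["human", "mouse"]}
    allowed = True
except TypeError:
    allowed = False

# a frozenset is immutable: it has no add or remove methods
fixed = frozenset(["human", "mouse"])
can_add = hasattr(fixed, "add")         # False

# because it is hashable, a frozenset can be an element of another set
groups = {fixed, frozenset(["yeast"])}

print(sorted(bases_seen), allowed, can_add, len(groups))
```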

 ________   
Set operations:
* Union - all elements from both sets    
* Intersection - common elements   
* Difference - elements specific to one set

<img src = "https://www.typescript-training.com/static/a1f71546ccdea12ccbc62e7643b6aaa1/2bef9/venn.png" width = "300"/>

https://www.typescript-training.com/course/fundamentals-v3/06-union-and-intersection-types/
_______

In [None]:
study_transcriptomics = {"human", "mouse", "rat", "fruit fly", "worm", "E coli"}
study_proteomics = {"rat", "zebrafish", "frog", "yeast", "worm"}

In [None]:
# combine two sets using the union function

study_overall = study_transcriptomics.union(study_proteomics)
study_overall
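
Each set method also has an operator form. A sketch on two small illustrative sets (`group_a` and `group_b`, not the study sets above):

```python
group_a = {"human", "mouse", "worm"}
group_b = {"worm", "yeast"}

union = group_a | group_b        # same as group_a.union(group_b)
common = group_a & group_b       # intersection: {'worm'}
only_a = group_a - group_b       # difference: elements only in group_a
either = group_a ^ group_b       # symmetric difference: in exactly one set

is_subset = {"worm"} <= group_a  # subset test: True

print(union, common, only_a, either, is_subset)
```

The operators require both operands to be sets, while the methods also accept any iterable, such as a list.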


https://docs.python.org/3/library/stdtypes.html#set

#### <font color = "red">Exercise</font> 

Given the previous 2 sets compute the intersection and difference.<br> 
Update study_proteomics with the elements of study_transcriptomics.<br>
Check if study_proteomics is now equal to study_overall. <br>


____
#### Resources:


https://docs.python.org/3/library/stdtypes.html      
https://www.tutorialspoint.com/python/python_variable_types.htm       
https://www.geeksforgeeks.org/python-data-types/      
https://www.oreilly.com/library/view/bioinformatics-programming-using/9780596804725/ch01.html      
http://doxey.uwaterloo.ca/tutorials/Python.html       
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf     
https://docs.python.org/3/tutorial/datastructures.html     
https://www.tutorialspoint.com/python/python_lists.htm     
https://www.w3schools.com/python/python_lists.asp
https://docs.python.org/3/tutorial/introduction.html     
https://www.w3schools.com/python/python_strings.asp     
https://www.w3schools.com/python/python_sets.asp     
https://docs.python.org/3/tutorial/datastructures.html#dictionaries    
https://docs.python.org/3/tutorial/datastructures.html#sets     
https://www.tutorialspoint.com/python3/python_dictionary.htm    
https://www.tutorialspoint.com/python_data_structure/python_sets.htm   



