#  Summary of previous notebook
1. Data types and their associated methods
    * Strings, float, integer, lists
    * format: variable_name.method()
    * special tricks such as variable_name.[tab key] will bring up method options for data type of variable_name   
2. Special manipulations with strings, i.e. slicing, escape characters

# Summary of this notebook:
1. Lists
2. Tuples

# Reminder: Debugging strategies
## We have already used common debugging strategies:
----------------
1. print statements
2. googling error terms
3. COMMENTING OUR CODE!
   
## We haven't used yet:
1. Assertions
2. LLM?
3. Others?
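
Since assertions are on the "not yet used" list, here is a minimal sketch of what one looks like. The list `codons` and its contents are made up for illustration; the point is only that `assert` stays silent when its condition is true and raises an `AssertionError` when it is false.

```python
# Hypothetical data, invented for this example
codons = ["ATG", "CCA", "TGA"]

# assert does nothing when the condition is True...
assert len(codons) == 3, "expected exactly three codons"

# ...and raises AssertionError (carrying our message) when it is False
try:
    assert codons[0] == "TTT", "first codon should be TTT"
except AssertionError as err:
    message = str(err)
    print("Assertion failed:", message)
```

An assertion is a way of writing down what you believe is true at that point in the program, so a wrong belief surfaces immediately instead of several cells later.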

# 1. Lists are a bit like strings...but mutable.
--------
1. Homogeneous data types
2. Access individual values
    * Remember: counting starts at 0
3. POINTERS.

#### [Pointers](https://xkcd.com/138)

In [None]:
# very, very short review on slicing
myDog="CHIHUAHUA"
#function: len()
#method: variable_name.method_name()
#() versus []
print(len(myDog))
# [inclusive lower index: exclusive upper index]
print(myDog[-3:])
print(myDog[:3])
print(myDog[-1])
print(myDog[3:])
# ----------------------------------------
# print(list_name[lowerbound:upperbound:increment])
# ----------------------------------------
# in interval notation the slice covers [lower, upper): lower is included, upper is excluded
print(myDog[2::3])
print(myDog[2::-2])

# Lists (and loops)

Programming is particularly useful for repetitive processing of data sets that have multiple pieces of information (i.e. not just one sequence but multiple sequences, or even an unknown number of sequences).

**WE COUNT FROM 0**

## Lists

* Built-in **data type** which holds a *flexible* number of pieces of related information that are *homogeneous*, i.e. the same type (such as strings or numbers, although you can also have lists of lists, which would allow you to combine different data types)
* Allow us to store **many** elements in a **single variable**
        * Similar to strings in how they are indexed
            * Start counting at '0'
        * Use square brackets: []
* Lists do not have to be of a fixed length!
* **mutable**
    * Lists are objects which share similarities with strings but, unlike strings, lists are mutable (they can be changed after they are created). This is because a list is actually full of pointers to objects, and each pointer can be redirected to a different object (a string holds the actual characters, which cannot be swapped out)
    * they have methods, and many of those methods change the list in place (which is only possible because lists are mutable)
    
* You will sometimes need to create a list that is initially empty: empty_list=[]
* _Assignment_ to lists:
        list_name=[item_1,item_2]

## Check in with pythontutor.org
How are lists stored in comparison to strings and integers?
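
A minimal sketch of what the visualizer shows, using made-up names (`seqs`, `alias`, `copy_of_seqs`): assigning a list to a second name does not copy it, so both names point to the same object, whereas "changing" a string always builds a new string.

```python
# Hypothetical names, made up for this sketch
seqs = ["ATG", "CCA"]
alias = seqs              # NOT a copy: both names point to the same list object
alias.append("TGA")       # changing the list through one name...
print(seqs)               # ...is visible through the other name too
print(alias is seqs)      # True: one object, two names

copy_of_seqs = seqs[:]    # a full slice builds a new list
copy_of_seqs.append("TAA")
print(seqs)               # the original list does not gain "TAA"

# Strings are immutable, so "changing" one really creates a new string object
dog = "CHIHUAHUA"
same_dog = dog
dog = dog + "!"           # dog now points to a brand-new string
print(same_dog)           # the old string is untouched
```

If you need an independent list, make the copy explicitly (for example with a full slice, as above) before you start modifying it.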

## Loops (covered in Module 4)
* Repetition with variation
* Allows us to process **lists** or **other iterable objects**  one element at a time
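
As a preview only (loops are covered properly in Module 4), here is a small sketch of processing a list one element at a time. The variable `name_lengths` is introduced just for this illustration.

```python
apes = ["Gorilla gorilla", "Homo sapiens", "Pan troglodytes"]

name_lengths = []              # start with an empty list and fill it up
for ape in apes:               # 'ape' takes each element of the list in turn
    print("Processing:", ape)
    name_lengths.append(len(ape))

print(name_lengths)            # one length per species name
```

The same three lines of loop code would work unchanged whether the list held three names or three thousand.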

## More about lists
*Access individual values in a list by:*

        1. Using [number]
            * list_name[0], list_name[1], etc
            
        2. Using [-1]
             * Last element (if you don’t know how long a list is this is very useful)
             
        3. .index() method
            * See next cell for example
            * you need to use () instead of [] <-- if you forget this, you will raise an exception (and then you will almost certainly have to google the exception to figure out what went wrong)
            
        4. Access a portion of a list
            * Specify a range of elements from list using the format: [initial inclusive num: exclusive last num]
            * This method doesn't overwrite or modify original list!
            * examples:
                    1. item_list[0:2] <-- access first and second item. NB: the upper bound is exclusive, so if you
                       also want the item at index n (in this example, index 2), you need to write it as:
                            * item_list[0:n+1], i.e. item_list[0:3]
                    2. item_list[3:] <-- access fourth up to the last item
                    3. item_list[:5] <-- access the first five items
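
The four ways of accessing values listed above can be sketched on one small list. The contents of `item_list` are placeholders chosen for this example.

```python
# Placeholder contents, chosen only for illustration
item_list = ["a", "b", "c", "d", "e", "f"]

first = item_list[0]             # 1. index with a number (counting starts at 0)
last = item_list[-1]             # 2. -1 is the last element, whatever the length
where_c = item_list.index("c")   # 3. .index() uses (), not []

first_two = item_list[0:2]       # 4. slices: upper bound is exclusive
first_three = item_list[0:3]     #    so 0:3 is needed to include index 2
fourth_on = item_list[3:]        #    fourth item up to the last
first_five = item_list[:5]       #    the first five items

print(first, last, where_c)
print(first_two, first_three, fourth_on, first_five)
print(item_list)                 # slicing did not modify the original list
```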

In [None]:
# you could write your sequence as a list, for instance, but strings make more sense
# lists can be made up of any data types, not just letters, so they are more flexible than strings.
SeqAsList=['A','T','C','G','A','T','C','G','A','T','C','G','A','T','C']
# remember the SeqAsList.tab to see methods for list data type
#SeqAsList.

In [None]:
# DON'T RUN THIS CELL YET!
# Here is a basic example that we will build on:
# List of great apes
apes=["Gorilla gorilla", "Homo sapiens","Pan troglodytes"]
# WHAT WILL THIS DO?
print("The first Primate is "+apes[0])
print("The second Primate is "+apes[1])
print("The third Primate is "+apes[2])

# how to combine string with variable, two ways:
#print("The first Primate is ",apes[0:2])
#print("The first Primate is "+str(apes[0:2]))

#if you don’t know which element Homo sapiens is but you know that it is part of the list
# Note: we are parking the results into a new variable
primate_index=apes.index("Homo sapiens")
# What value does primate_index now hold? 1
print(primate_index)

#if you want the first two elements
first_couple=apes[0:2]
print(first_couple)

Why do you need lists?
* No fixed length so you can add items to the end of the list with .append()
* you can use len() to get the length of the list
* concatenate two lists together with + (just like you can do with strings)

Unlike strings (which are immutable), you can modify a list:
 * Once you have a list, you can easily replace one item with another (see next cell)
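
A short sketch of the points above, using a made-up list called `bases` (and a second made-up name, `more`): appending, measuring, concatenating, and then replacing an item by assigning to its index.

```python
# Made-up example data
bases = ["A", "T", "C"]

bases.append("G")            # no fixed length: the list grows in place
print(len(bases))            # number of items now in the list

more = bases + ["N", "N"]    # + builds a NEW list; bases itself is not changed

bases[0] = "U"               # lists are mutable: replace the item at index 0
print(bases)
print(more)                  # built before the replacement, so it still starts with "A"
```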

In [None]:
# Take a moment to try to finish the code: Replace item_3 with item_3b in the following list:
list_items=["item_1","item_2", "item_3"]


__Some Useful list methods for adding to existing lists:__

BEWARE: They each do slightly different things even though they are all ways of adding to a list

1. .append("new_item") <-- adds **one** object to the end of an existing list (the added object could itself be a list)
2. .extend() <-- similar to append, **but extending with a second list adds each element of the second list to the end of the existing list one at a time**
3. \+ <-- concatenate; similar to extend, but it creates a new list instead of changing the existing one
4. .join() <-- connects the strings in a list into a single string. Technically it is a *string* method: you call it on the connector (such as ","), and it takes an 'iterable' (something that can be iterated over, like a string or a list) whose elements are all strings, like so:
**This is a tiny bit tricky because you are using NO separator for joining, so you call .join() on an empty string, indicated by "". Note that this is functionally similar to starting with an empty string and then adding each element of the list to it.**
list_dummy=['D','O','G']
print("".join(list_dummy))
5. .insert(number, "new_item") <-- inserts new_item before the element currently at index number
Warning: it is prudent to determine the length of an existing list before adding to it with the len(list_name) function

In [None]:
# .join is a bit tricky because it is utilized with a bizarre syntax so here is an example
list_dummy=['D','O','G']
list_dummy.append('y')
print(list_dummy)
print("*".join(list_dummy))
print(" ".join(list_dummy))
print("-".join(list_dummy))
print(len(list_dummy))
list_dummy.insert(2,'g')
print(list_dummy)

In [None]:
#Demonstrated are a few ways of adding items to a list and HOW THEY DIFFER:
# ------------------------------
#concatenate operator
L=["M","O","U","S","E"]
L=L+[8,9]
print(L)
print("***********")
#now we will set up a new list and use .extend method
L2=[1,2,3]
# ------------------------
# Usually I want you to practice good programming hygiene and assign the results of the
# method to a new variable. However, important note:
#  .extend and .append return None so, in case you try it,
# it is incorrect to assign the results of these methods to a new variable, like so:

#L3=L2.extend([4,5])
#print(L3)

# in fact, you can uncomment (un-hash) the above two lines and see what gets printed out. I'll wait.
# --------------------------
L2.extend([4,5])
print("Extend: ",L2)

# concatenate
L5= L+L2
print("Concatenate: ",L5)

#finally we will compare these two methods with .append method
L4=[10,11,12]
L4.append([13,14,15])
print("append: ",L4)
print("With append, what is the position where the entire list is added: ", L4.index([13,14,15]))
print("~~~~~~~~")
print("So all three items are added as ONE element: ", L4[3])


In [None]:
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]

print("The first Primate is: "+apes[0])
print("The second Primate is: "+apes[1])
print("The third Primate is: "+apes[2])

#what if you don't know which element Homo sapiens is but you know that it is a member
# of the list?
print("*"*20)
primate_index=apes.index("Homo sapiens")
print(primate_index)
# look at the beautiful print statements that orient me in the program. Excellent for
# troubleshooting and following along!
print("-"*10)
#now you are adding an additional primate by appending it
apes.append("Pan paniscus")
print("************************************")
print(apes)

#this list is increasingly poorly named since it is now a bunch of primates and not apes.
#apes.insert(len(apes),"Macaca arctoides")
apes.insert(2,"Macaca arctoides")
print(apes)

#replace the Macaca with Hylobates agilis
apes[2]="Hylobates agilis"
print(apes)

#add two lists together
monkeys=["Papio ursinus","Macaca arctoides"]
primates=apes+monkeys
print("~"*20)
print(primates)
print(str(len(apes))+" apes")
print(str(len(monkeys))+" monkeys")
print(str(len(primates))+" primates")

The first Primate is: Homo sapiens
The second Primate is: Pan troglodytes
The third Primate is: Gorilla gorilla
********************
0
----------
************************************
['Homo sapiens', 'Pan troglodytes', 'Gorilla gorilla', 'Pan paniscus']
['Homo sapiens', 'Pan troglodytes', 'Macaca arctoides', 'Gorilla gorilla', 'Pan paniscus']
['Homo sapiens', 'Pan troglodytes', 'Hylobates agilis', 'Gorilla gorilla', 'Pan paniscus']
~~~~~~~~~~~~~~~~~~~~
['Homo sapiens', 'Pan troglodytes', 'Hylobates agilis', 'Gorilla gorilla', 'Pan paniscus', 'Papio ursinus', 'Macaca arctoides']
5 apes
2 monkeys
7 primates


__Some Useful list methods for removing items from existing lists:__
Once again, some subtle differences between these methods...

1. .remove("old_item") <-- removes the **first** matching item; if the item is not in the list, it raises a ValueError
2. .pop(index) <-- removes the item at index and returns it to you (so you could assign it to a variable, if you wanted to double check that it should be deleted); with no index, it removes the last item
3. del list_name[index] <-- not a method but a statement (the parentheses in del(list_name[index]) are optional): it removes the item at index and does not return it to you. It supports slicing syntax, so you can delete everything from a set index onward in a list
4. .clear() <-- removes all items from a list
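
Two behaviours from the list above are easy to miss, so here is a small sketch of them: `.remove()` raising a `ValueError` when the item is absent, and `.clear()` emptying the list in place. The names `error_seen` and `after_remove` are introduced only for this illustration.

```python
n = [1, 3, 5, 7, 7]

# .remove() complains if the item is not in the list
error_seen = False
try:
    n.remove(2)
except ValueError as err:
    error_seen = True
    print("remove failed:", err)

# Checking membership first avoids the exception
if 7 in n:
    n.remove(7)               # only the FIRST 7 is removed
after_remove = n[:]           # keep a copy so we can look at this stage later
print(after_remove)

n.clear()                     # empties the list in place
print(n)
```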

In [None]:
#demonstrates the seemingly subtle differences between the above two methods and one command:
n=[1,3,5,7,7,9,11,13,15]

print("Here's our list: ", n)
# OR convert it to a string so we can concatenate it in a print statement
# print("Here's our list: " +str(n))
print("Let's investigate the standard ways of removing items from lists: ")
print("First, .pop")
print("-----------")
#what will this print? Will there be a return variable when using .pop method?
remove_pop = n.pop()
print("Does the .pop method return anything? or is it just None? "+ str(remove_pop))
# Instead of popping off the last element, you can specify the element
remove_pop_1 = n.pop(1)
print(remove_pop_1)

#Now print n
print("What does the list look like now?")
print(n)
print("*******************************************")
print("Okay. Now, use .remove")
print("-----------")
rem_remove =n.remove(7)
print("Does the .remove method return anything? or is it just None? "+str(rem_remove))
print("What does the list look like now?")
# the .remove method should remove ONLY THE FIRST 7 THAT IT ENCOUNTERS SO THERE SHOULD STILL BE ONE PRESENT
print(n)
print("Finally, the del statement ")
print("-----------")
del(n[0])
print("What does the list look like now?")
#What should be printed?
print(n)
#Slice
del(n[3:])
print("What does the list look like now?")
#What should be printed?
print(n)

Some other useful list methods:
https://docs.python.org/3/tutorial/datastructures.html

1. .index(element) <- returns the index of the first occurrence of an element
2. list_name[-1] <- returns the last element of a list
3. .reverse() <- reverses the list in place: it changes the list it is called on and returns None
4. .sort(key=None, reverse=False) <- sorts the list in place: it also changes the list it is called on and returns None
5. .count(element) <- counts how many times element is present in a list

We will also see other useful complex data structures later (like lists of dictionaries or lists of tuples) but, for now, we can also have a list of lists:

[[1,2,3],[4,5,6],[7,8],9]
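
A minimal sketch of three items from this section: indexing into a list of lists, `.count()`, and the `key` argument of `.sort()`. The lists `seq` and `names` are made up for this example, and `str.lower` is just one possible key (it makes the sort ignore upper versus lower case).

```python
nested = [[1, 2, 3], [4, 5, 6], [7, 8], 9]
inner = nested[0]            # the first element is itself a list
six = nested[1][2]           # second inner list, third item
print(inner, six, len(nested))

# Made-up sequence for counting
seq = ["A", "T", "C", "G", "A", "T"]
how_many_a = seq.count("A")
print(how_many_a)

# Made-up names: by default capital letters sort before lower-case ones
names = ["genus", "Class", "order"]
names.sort(key=str.lower)    # compare the lower-case version of each string
print(names)
```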

In [None]:
hierarchy=["Kingdom","Phylum","Class","order","family","genus","species"]
print("Our initial list is: "+str(hierarchy))
print("~~~~~~~~")
print("Let's see what happens when we reverse this list")
hierarchy.reverse()
print(hierarchy)
print("*******")
print("What has happened? The original list has been modified!")
print(hierarchy)
print("..............")
print("What about sort? Does that also modify the list?")
hierarchy.sort()
print("by using .sort(), the original hierarchy list has been modified")
print(hierarchy)
print("~~~~~~~~~ reverse = True ~~~~~~~~~~~")
hierarchy.sort(reverse=True)
print(hierarchy)
print("****** Another example of slicing *******")
# Does this give you what you expect? replace -1, with -2
# try moving the -1 to [-1:] and [-1:0] to see what prints out
print(hierarchy[:-1])

Our initial list is: ['Kingdom', 'Phylum', 'Class', 'order', 'family', 'genus', 'species']
~~~~~~~~
Let's see what happens when we reverse this list
['species', 'genus', 'family', 'order', 'Class', 'Phylum', 'Kingdom']
*******
What has happened? The original list has been modified!
['species', 'genus', 'family', 'order', 'Class', 'Phylum', 'Kingdom']
..............
What about sort? Does that also modify the list?
by using .sort(), the original hierarchy list has been modified
['Class', 'Kingdom', 'Phylum', 'family', 'genus', 'order', 'species']
~~~~~~~~~ reverse = True ~~~~~~~~~~~
['species', 'order', 'genus', 'family', 'Phylum', 'Kingdom', 'Class']
****** Another example of slicing *******
['species', 'order', 'genus', 'family', 'Phylum', 'Kingdom']


In [None]:
# Simple example: I stole this from Codecademy more than a dozen years ago
# and cut it down to demonstrate lists
# The list of animals is stored in one list variable, zoo_animals.
# This makes it easier to keep track of than if all the animals were separate variables

zoo_animals = ["pangolin", "cassowary", "sloth","platypus"]
print(id(zoo_animals))
# Here is a primitive stand-in for a loop: we print each animal by hand. We have not yet learned about conditions/criteria so don't worry about 'if' yet.
if len(zoo_animals) > 3:
    print("The first animal at the zoo is the " + zoo_animals[0])
    print(id(zoo_animals[0]))
    print("The second animal at the zoo is the " + zoo_animals[1])
    print(id(zoo_animals[1]))
    print("The third animal at the zoo is the " + zoo_animals[2])
    print(id(zoo_animals[2]))
    print("The fourth animal at the zoo is the " + zoo_animals[3])
    print(id(zoo_animals[3]))
print(zoo_animals.index("cassowary"))
# how would we add a Manatee?
# could we add two elements, a Manatee and a Narwhal, simultaneously?

# In Class Questions!

1. (2 minutes) Mixed_list=["why", "was", 6, "afraid", "of", 7,"?"]
Make up your own 'mixed' list (note that while this is not verboten, we usually use lists with homogeneous data types; it isn't good programming hygiene to mix our data types in a list).

2. (3 minutes) Can you make a "list of lists"?

3. (5 minutes) a. Add a fifth animal to the zoo above.
   b. Replace the sloth with a capybara.
   
4. (10 minutes) BATTLESHIP! Can you use lists to build a battleship board? A battleship board is 5 X 5 full of "O"s. For now, just build the board; we will expand on this example (which was stolen from a Codecademy example) as we learn about for loops.
   
5. (10 minutes) *Pseudocode question:* We all know that there is degeneracy in the universal genetic code. This means that every amino acid has one or more nucleotide codons. https://en.wikipedia.org/wiki/Codon_degeneracy
How can a nucleotide string be fed into a program where the codons are translated using a list?

In [None]:
# Question 1
# A list of mixed items would be called heterogeneous. As you can see, Python will allow you to create this hideous creation, but you shouldn't.


['why', 'was', 6, 'afraid', 'of', 7, '?']


In [None]:
# Question 2


[[1, 1, 1, 1], [1, 1, 1], [1, 1]]


In [None]:
#Question 3


['pangolin', 'cassowary', 'sloth', 'platypus', 'Eurasier']
2
['pangolin', 'cassowary', 'Capybara', 'platypus', 'Eurasier']


In [None]:
#Question 4

In [None]:
# list of lists by creating an empty board list and filling it.


# Pseudocode question:  

We know that there is degeneracy in the universal genetic code. This means that every amino acid has one or more nucleotide codons. https://en.wikipedia.org/wiki/Codon_degeneracy
How can a nucleotide string be fed into a program where the codons are translated using a list?
--------------

Answer:


# Tuples!


## What is a tuple?
* Another built-in data type
* Tuples can be thought of as an immutable list
* Like lists: multiple elements, you can iterate over elements and you retrieve a particular element by using []
* Unlike lists: use () to define them
* Immutability means that once you have assigned a tuple, you cannot change any of the elements!

      * This may seem a little odd but it can be great for troubleshooting a program to know that an element hasn’t changed
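
A minimal sketch of the points above, using a made-up tuple called `record`: it is defined with (), indexed with [], can be iterated over, and refuses item assignment.

```python
# Made-up record: sequence, accession number, identification code
record = ("atgctga", "ABC123", 1)

print(record[0])             # retrieve an element with [], just like a list
print(len(record))           # len() works on tuples too

for field in record:         # tuples can be iterated over
    print(field)

# Trying to change an element raises a TypeError
assignment_failed = False
try:
    record[0] = "tttt"
except TypeError as err:
    assignment_failed = True
    print("Cannot change a tuple:", err)
```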
      

## Why have an immutable data structure?
* Fancy reasons such as optimization or troubleshooting code
    * tuples can be slightly faster than lists for some operations
* Mostly because… **it allows tuples to be used as keys to a dictionary (which a list cannot do)**
<div class="alert alert-block alert-warning">
EXAMPLE

        x={("CCA","CCC","CCG","CCT"):"Pro"} <-- fine (because the key is an immutable data type, a tuple)
    
        x={["CCA","CCC","CCG","CCT"]:"Pro"} <-- NOT FINE. RAISES A TypeError (unhashable type: 'list') because the key is a mutable data type, a list, so it can't be a key of a dictionary
</div>
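
The example above can be run as a short sketch. The names `codon_table`, `bad_table` and `error_text` are introduced only for this illustration, and the `try`/`except` is there so that the failing line does not stop the cell.

```python
# A tuple of codons works as a dictionary key
codon_table = {("CCA", "CCC", "CCG", "CCT"): "Pro"}
amino_acid = codon_table[("CCA", "CCC", "CCG", "CCT")]
print(amino_acid)

# A list of the same codons does not: lists are mutable, so Python refuses
error_text = ""
try:
    bad_table = {["CCA", "CCC", "CCG", "CCT"]: "Pro"}
except TypeError as err:
    error_text = str(err)
    print("List as key fails:", error_text)
```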

* Using a tuple is a type of safe coding, like "write protected"
* Tuples are usually used with HETEROGENEOUS data (elements of different types of data) where the position of the element tells you something about the data. THINK OF AN INDIVIDUAL ROW IN AN EXCEL SPREADSHEET - DIFFERENT TYPES OF INFORMATION STORED AT DIFFERENT POSITIONS
            * i.e. Tuple of three elements: sequence, accession number, genetic code. You would always want these elements to be in the same order!
            * Gene sequence records are often stored in tuples because of their immutable nature. This immutability ensures that once the elements are assigned, they cannot be changed!
  
<div class="alert alert-block alert-warning">
EXAMPLE

    # tuples where each element contains, in the following order: sequence, accession number, identification code
            tuple_1=("atgctga","ABC123", 1)
            tuple_2=("tcgcgcg","DEF456", 1)
</div>


### Immutable data types cannot be changed once they are created.
* this is very much unlike a list which can be changed
* This means you cannot append, swap, sort etc

        * tuples have no methods that change them! They only have the .count() and .index() methods, and you CAN still use functions such as len() on them (since none of these modify the tuple).
        
        * tuples CAN be sliced, however.
        
        * tuples CAN be converted to a list (and lists can be converted to tuples)
        
        * Technically, by the way, we have already seen an immutable data type: Strings!
     
     EXAMPLE:
              S="spam"
              S[1]="c"
              
              this raises an exception (a TypeError) because once a string is set you cannot overwrite any of the characters in "spam".
             
             * STRINGS ARE IMMUTABLE EVEN THOUGH THEY HAVE METHODS: STRING METHODS RETURN A NEW STRING AND LEAVE THE ORIGINAL UNCHANGED!
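
A minimal sketch of what you CAN do with immutable data, using a made-up tuple `t`: the two tuple methods, slicing, converting to a list and back, and a string method that hands back a new string. All variable names here are introduced for illustration.

```python
# Made-up tuple with a repeated value so that .count() has something to count
t = ("atgctga", "ABC123", 1, 1)

ones = t.count(1)            # tuples have exactly two methods: .count() ...
where = t.index("ABC123")    # ... and .index(); neither modifies the tuple
front = t[:2]                # slicing gives back a new, shorter tuple

as_list = list(t)            # convert to a list when you need to modify...
as_list.append("new")
back = tuple(as_list)        # ...and convert back to a tuple afterwards
print(ones, where, front, back)
print(t)                     # the original tuple never changed

s = "spam"
upper_s = s.upper()          # string methods return a NEW string
print(s, upper_s)            # s itself is still lower case
```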

In [None]:
# you can't change a string, remember. Trying to do so throws an exception:
S="spam"
S[1]="c"

TypeError: 'str' object does not support item assignment

# An example of a genomic tuple.

This is how genomic information might be used (it is, of course, simplified).

In [None]:
tuple_1=("atgctga","ABC123", 1)
tuple_2=("tcgcgc","DEF456",1)
# combine into nested tuple
tuple_3=(tuple_1,tuple_2)

# you can iterate over the tuple
for item in tuple_3:
    print("Here's the entire tuple: ",item)

# even though we aren't using the third item in each tuple, we still need a third loop variable to unpack it.
for seq, name, loc in tuple_3:
    print("Here is the sequence: ", seq, " of ", name, " and it is : ", len(seq), " nt long")

# we can index into the nested tuple to pull out an individual element
print(tuple_3[0][0])

Here's the entire tuple:  ('atgctga', 'ABC123', 1)
Here's the entire tuple:  ('tcgcgc', 'DEF456', 1)
Here is the sequence:  atgctga  of  ABC123  and it is :  7  nt long
Here is the sequence:  tcgcgc  of  DEF456  and it is :  6  nt long
atgctga


# In Class Question:
grades = [("Elsa",(90,89,95)),("John", (91, 88, 94)),("Terry", (80, 100, 90))]
* How would we take this list of tuples and iterate through each item?
* Can you calculate the average score for each student?

In [None]:
# Question
grades = [("Elsa",(90,89,95)),("John", (91, 88, 94)),("Terry", (80, 100, 90))]


Here is the total tuple:  ('Elsa', (90, 89, 95))
Here is the name of the learner:  Elsa
Here are the scores:  (90, 89, 95)
91.33333333333334
Here is the total tuple:  ('John', (91, 88, 94))
Here is the name of the learner:  John
Here are the scores:  (91, 88, 94)
91.0
Here is the total tuple:  ('Terry', (80, 100, 90))
Here is the name of the learner:  Terry
Here are the scores:  (80, 100, 90)
90.0


* Summary of tuples
    * IMMUTABLE
    * can act as KEYS for dictionaries
    * HAVE NO METHODS THAT CHANGE THEM BECAUSE THEY ARE IMMUTABLE (only .count() and .index(); likewise, string methods return new strings rather than changing the original)



