#  Summary of previous notebook 
---
1. Data types and their associated methods
    * Strings, float, integer, lists
    * format: variable_name.method()
    * special tricks such as variable_name.[tab key] will bring up method options for data type of variable_name   
2. Special manipulations with strings, i.e. slicing and escape characters

## [Rosalind Problem](https://rosalind.info/problems/revc/)
This problem utilizes what we learned in Day2B.

**Problem**
In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'.

The reverse complement of a DNA string `s` is the string `sc` formed by reversing the symbols of `s`, then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC").

__Given:__ A DNA string `s` of length at most 1000 bp.

__Return:__ The reverse complement `sc` of `s`.

Sample Dataset: AAAACCCGGT

Sample Output: ACCGGGTTTT
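
One possible approach to this problem, sketched under the assumption that the input contains only uppercase `A`, `C`, `G` and `T`: reverse the string with a slice, then swap each base for its complement. The variable names `reversed_s` and `complement_table` are illustrative and are not part of the Rosalind problem; other strategies are just as legitimate.

```python
# Sketch of one way to build a reverse complement (assumes uppercase A/C/G/T only)
s = "AAAACCCGGT"

# Step 1: reverse the string with a slice that steps backwards
reversed_s = s[::-1]

# Step 2: swap each base for its complement (A<->T, C<->G)
complement_table = str.maketrans("ACGT", "TGCA")
sc = reversed_s.translate(complement_table)

print(sc)  # ACCGGGTTTT
```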

## Common debugging strategies: 

Strategies we've used:
1. Print statements
2. Googling error terms
3. COMMENTING OUR CODE!
   
Strategies we haven't used yet:
1. Assertions
2. LLM?
3. Others?

## Slicing Review

In [None]:
# very, very small review on slicing
myDog="CHIHUAHUA"
#() versus []
print(len(myDog))
print(myDog[-3:])
print(myDog[:3])
print(myDog[-1])
print(myDog[3:])

# Summary of this notebook: 
---
1. Lists
   - what they are
   - slicing
   - methods
       - how to add items
       - how to remove items
2. More Rosalind Problems

# Lists
---
Programming is particularly useful for repetitive processing of data sets that have multiple pieces of information (i.e. not just one sequence but multiple sequences, or even an unknown number of sequences). 

Lists are a bit like strings... but mutable. 

1. Homogeneous data types
2. Access individual values
    * Remember: **Counting starts at 0!**
3. [POINTERS](https://xkcd.com/138)

## Characteristics of lists
* Built-in **data type** that holds a *flexible* number of pieces of related information that are *homogeneous*, meaning the same type (such as strings or numbers). You can also have lists of lists, which would allow you to combine different data types.
* Allow us to store **many** elements in a **single variable**
    * Similar to strings in how they are indexed 
        * Start counting at '0'
        * Use square brackets: []
* Lists do not have to be of a fixed length!
* Lists are **mutable**
    * Lists are objects which share similarities with strings but, unlike strings, lists are mutable (they can be changed after they are created). Under the hood, a list is full of pointers to objects, and those pointers can be replaced, added, or removed (a string, in contrast, holds its characters directly)
    * Lists have methods, and because lists are mutable, many of those methods change the list in place
* You will sometimes need to create a list that is initially empty: `empty_list=[]`
* _Assignment_ to lists:
        `list_name=[item_1,item_2]`
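
A small sketch of what "mutable" means in practice. The variable names `bases` and `seq` are made up for illustration, and the `try`/`except` is only there so the example keeps running after the error.

```python
# Lists are mutable: an item can be replaced after the list is created
bases = ["A", "T", "C", "G"]
bases[0] = "U"
print(bases)  # ['U', 'T', 'C', 'G']

# Strings are immutable: the same kind of assignment raises a TypeError
seq = "ATCG"
try:
    seq[0] = "U"
except TypeError as error:
    print("Strings cannot be changed in place:", error)

# An empty list can be created first and filled later
empty_list = []
empty_list.append("first item")
print(len(empty_list))  # 1
```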

### Check in with [pythontutor](https://pythontutor.com/)
How are lists stored in comparison to strings and integers? 

## Indexing 
Access individual values in a list by:

1. Using `[number]`
    * `list_name[0]`, `list_name[1]`, etc
2. Using `[-1]`
    * Last element (if you don’t know how long a list is this is very useful)
3. .index() method
    * See next cell for example
    * Note you need to use () instead of [] 
        * if you forget this, you will raise an exception (and then you will almost certainly have to google the exception to figure out what went wrong)
4. Access a portion of a list
    * Specify a range of elements from list using the format: `[initial inclusive num: exclusive last num]`
        * This method doesn't overwrite or modify original list!
        * Examples: 
            1. `item_list[0:2]` <-- access first and second item. 
                * The last number is exclusive, so to include the item at index `n` you need to write the slice as `item_list[0:n+1]` (in this example, `item_list[0:3]` includes the item at index 2)
            2. `item_list[3:]` <-- access fourth item up to the last item
            3. `item_list[:5]` <-- access the first five items
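
The slicing rules above can be sketched with a short, made-up list (the name `item_list` and its contents are only for illustration):

```python
# Hypothetical list used only to illustrate slicing
item_list = ["a", "b", "c", "d", "e", "f"]

first_two = item_list[0:2]
print(first_two)       # ['a', 'b']  (index 2 is excluded)
print(item_list[0:3])  # ['a', 'b', 'c']  (now index 2 is included)
print(item_list[3:])   # ['d', 'e', 'f']
print(item_list[:5])   # ['a', 'b', 'c', 'd', 'e']
print(item_list[-1])   # f

# Slicing returns a new list; the original list is unchanged
print(item_list)       # ['a', 'b', 'c', 'd', 'e', 'f']
```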

In [None]:
# you could write your sequence as a list, for instance, but strings make more sense
# lists can be made up of any data types, not just letters, so they are more flexible than strings. 
SeqAsList=['A','T','C','G','A','T','C','G','A','T','C','G','A','T','C']
# remember the SeqAsList.tab to see methods for list data type
#SeqAsList.

### [Rosalind Problem](https://rosalind.info/problems/ini3/)

**Given:** A string `s` of length at most 200 letters and four integers `a`, `b`, `c` and `d`.

**Return:** The slice of this string from indices `a` through `b` and `c` through `d` (with space in between), inclusively. In other words, we should include elements `s[b]` and `s[d]` in our slice.

Sample Dataset: <br>
`s = HumptyDumptysatonawallHumptyDumptyhadagreatfallAlltheKingshorsesandalltheKingsmenCouldntputHumptyDumptyinhisplaceagain.` <br>
`a = 22` <br>
`b = 27` <br>
`c = 97` <br>
`d = 102`

Sample Output: <br>
Humpty Dumpty
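
A minimal sketch of one way to solve this with the sample dataset. Because the problem asks for *inclusive* indices and Python slices exclude the last number, each slice ends at `b + 1` or `d + 1`. The names `first_word` and `second_word` are illustrative.

```python
s = "HumptyDumptysatonawallHumptyDumptyhadagreatfallAlltheKingshorsesandalltheKingsmenCouldntputHumptyDumptyinhisplaceagain."
a = 22
b = 27
c = 97
d = 102

# Add 1 to the end of each slice so that s[b] and s[d] are included
first_word = s[a:b + 1]
second_word = s[c:d + 1]

print(first_word + " " + second_word)  # Humpty Dumpty
```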

### Indexing Practice

In [None]:
# DON'T RUN THIS CELL YET!
# Here is a basic example that we will build on:
# List of great apes
apes=["Gorilla gorilla","Homo sapiens","Pan troglodytes"]
# WHAT WILL THIS DO?
print("The first Primate is "+apes[0])
print("The second Primate is "+apes[1])
print("The third Primate is "+apes[2])

#if you don’t know which element Homo sapiens is but you know that it is part of the list
# Note: we are parking the results into a new variable
primate_index=apes.index("Homo sapiens")
# What value does primate_index now hold?
print(primate_index)

#if you want the first two elements
first_couple=apes[0:2]
print(first_couple)

## Modifying Lists

Why do you need lists?
* No fixed length so you can add items to the end of the list with .append()
* you can use len() to get the length of the list
* concatenate two lists together with + (just like you can do with strings)

Unlike strings (which are immutable), you can modify a list: 
 * Once you have a list, you can easily replace one item with another (see next cell)

In [None]:
#Example: Replace item_3 in the following list:
list_items=["item_1","item_2", "item_3"]
list_items[2]="new_item"
print(list_items)

### Methods for adding to existing lists:

BEWARE: They each do slightly different things even though they are all ways of adding to a list.

1. `.append("new_item")` <-- adds **one** element to the end of an existing list
   * the added element could itself be a list
2. `.extend()` <-- similar to append
   * __but extending an existing list with a second list adds each element from the second list one at a time__
3. `+` <-- concatenate, similar to extend, but it creates a new list instead of changing the original
4. `.join()` <-- connects the strings in a list into a single string, but you need to use a connector like ",".
    * Technically, `.join()` is a string method: you call it on the connector, hand it an 'iterable' (something that can be iterated over, like a string or a list) of strings, and it returns one new string, like so: 
        * This is a tiny bit tricky because you are using NO separator for joining, so you use an empty string indicated by "".
            * Note that this is functionally similar to creating an empty string and then filling it up.
            * `example_list=['D','O','G']`
            * `print("".join(example_list))`
5. `.insert(number, "new_item")` <-- adds one element at index `number`, shifting the later elements to the right


**Warning:** it is prudent to check the length of an existing list with the `len(list_name)` function before adding to it

#### Example: Append, Extend, Concatenate, and Insert

In [None]:
# Setting up our list
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
print(apes)

# look at the beautiful print statements that orient me in the program. Excellent for
# troubleshooting and following along!
print("-"*20)

# ------------------------

#APPEND: now you are adding an additional primate by appending it
apes.append("Pan paniscus")
print(apes)
print("*"*20)

# ------------------------

# EXTEND: add two additional primates using extend
apes.extend(["Pongo abelii", "Pongo pygmaeus"])
print(apes)
print("."*20)

# ------------------------

# Extra note on EXTEND & APPEND

# Usually I want you to practice good programming hygiene 
# and assign the results of a method to a new variable. 
# However, important note: 
# .extend and .append change the list in place and return None (a special value that means "nothing"), 
# so it is incorrect to assign the results of these methods to a new variable, like so: 

#extend_test=apes.extend(["Papio ursinus","Macaca arctoides"])
#print(extend_test)

# in fact, you can un-hash the above two lines and see what gets printed out. I'll wait.

# --------------------------

# INSERT

#apes.insert(len(apes),"Macaca arctoides")
apes.insert(2,"Macaca arctoides")
print(apes)

#this list is increasingly poorly named since it is now a bunch of primates and not apes.
#replace the Macaca with "Hylobates agilis"
apes[2]="Hylobates agilis"
print(apes)
print("~"*20)

# --------------------------

# Concatenate
#Make a second list of monkeys, and then add to it using concatenation
monkeys=["Papio ursinus","Macaca arctoides"]
monkeys=monkeys+["Ateles paniscus", "Macaca radiata"]
print(monkeys)
print("-"*20)
print('\n')

# --------------------------

# Comparing concatenate, extend, and append
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
primates=apes+monkeys
print("Primate list created with concatenation: ", primates)
print('\n')

apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
apes.extend(monkeys)
print("Primate list created with extend: ", apes)
print('\n')

apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
apes.append(monkeys)
print("Primate list created with append: ", apes)
print('\n')

print("With append, the position where the entire list is added: ", apes.index(monkeys))
print("So all four items are added as ONE element: ", apes[3])


#### Example: Join

In [None]:
# .join is a bit tricky because it is utilized with a bizarre syntax so here is an example 
example_list=['D','O','G']
print(example_list)

print("-"*20)

print("*".join(example_list))
print(" ".join(example_list))
print("-".join(example_list))
print("".join(example_list))

print("-"*20)

print("Our list is ", type(example_list))
example_str = "".join(example_list)
print("The .join method produces an object of ", type(example_str))

## Methods for removing items from existing lists:

Once again, some subtle differences between these methods...

1. `.remove("old_item")` <-- removes the **first** matching item *if it finds it* (if there is no match, it raises a `ValueError`)
2. `.pop(index)` <-- removes the item at index and returns it to you (so you could assign it to a variable, if you wanted to double check that it should be deleted). With no index, it removes the last item
3. `del(list_name[index])` <-- not a method, but it removes the item at index and does not return it to you. It supports slicing syntax, so you can delete everything from a set index onward in a list
4. `.clear()` <-- removes all items from a list
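
A short sketch of two cases worth knowing about: what happens when `.remove()` does not find the item, and what `.clear()` leaves behind. The list `codons` is made up for illustration, and the `if`/`else` check is just one way to avoid the error.

```python
# Made-up list for illustration
codons = ["ATG", "TTT", "TAA", "TTT"]

# .remove() deletes only the first match and returns None
codons.remove("TTT")
print(codons)  # ['ATG', 'TAA', 'TTT']

# If the item is not in the list, .remove() raises a ValueError,
# so checking with "in" first avoids the error
if "GGG" in codons:
    codons.remove("GGG")
else:
    print("GGG is not in the list, nothing removed")

# .pop() with no argument removes and returns the last item
last_codon = codons.pop()
print(last_codon)  # TTT

# .clear() empties the list but keeps the variable
codons.clear()
print(codons)  # []
```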

In [None]:
#demonstrates the seemingly subtle differences between the above two methods and one command:
n=[1,3,5,7,9,7,11,13,15]

print("Here's our list: ", n)
# OR convert it to a string so we can concatenate it in a print statement
# print("Here's our list: " +str(n))
print("Let's investigate the standard ways of removing items from lists:\n")
print("First, .pop")
print("-----------")
#what will this print? Will there be a return value when using the .pop method? 
remove_pop = n.pop()
print("What does the .pop method return?: "+ str(remove_pop))
# Instead of popping off the last element, you can specify the index of the element to remove
remove_pop_1 = n.pop(1)
print("Now we've also removed the second element:", remove_pop_1)

#Now print n
print("What does the list look like now?")
print(n)
print("*******************************************\n")

print("Okay. Now, use .remove")
print("-----------")
rem_remove =n.remove(7)
print("What does the .remove method return?: "+str(rem_remove))
print("What does the list look like now?")
# the .remove method should remove ONLY THE FIRST 7 THAT IT ENCOUNTERS SO THERE SHOULD STILL BE ONE PRESENT
print(n)
print("*******************************************\n")

print("Finally, the del statement ")
print("-----------")
del(n[0])
print("What does the list look like now?")
#What should be printed?
print(n)
#Slice 
del(n[3:])
print("What does the list look like now?")
#What should be printed?
print(n)

## Some other useful list methods:

https://docs.python.org/3/tutorial/datastructures.html

1. `.index(element)` <- returns the index of an element
2. `list_name[-1]` <- returns the last element of a list
3. `.reverse()` <- reverses the list in place, so it changes the list that it is used on
4. `.sort(key=None, reverse=False)` <- sorts the list in place, so it also changes the list that it is used on (set `reverse=True` for descending order)
5. `.count(element)` <- counts how many times an element appears in a list.

We will also see other useful complex data structures later (like lists of dictionaries or lists of tuples) but, for now, we can also have a list of lists:
`[[1,2,3],[4,5,6],[7,8],9]`
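
A brief sketch of a few of the methods above, using made-up lists. Note that `sorted()` is a built-in function that is not covered in these notes; it is included here only as a contrast to `.sort()`, because it returns a new list instead of changing the original.

```python
# Made-up sequence stored as a list
bases = ["A", "T", "G", "A", "A", "C"]

# .count(element) needs the element you want counted
print(bases.count("A"))  # 3

# .sort() changes the list in place and returns None
bases.sort()
print(bases)  # ['A', 'A', 'A', 'C', 'G', 'T']

# sorted() returns a new sorted list and leaves the original unchanged
numbers = [3, 1, 2]
sorted_numbers = sorted(numbers)
print(numbers, sorted_numbers)  # [3, 1, 2] [1, 2, 3]

# Indexing into a list of lists uses two sets of square brackets
nested = [[1, 2, 3], [4, 5, 6], [7, 8], 9]
print(nested[1])     # [4, 5, 6]
print(nested[1][0])  # 4
```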

In [None]:
hierarchy=["Kingdom","Phylum","Class","order","family","genus","species"]
print("Our initial list is: "+str(hierarchy))
print("~~~~~~~~")

print("Let's see what happens when we reverse this list")
hierarchy.reverse()
print("What has happened? The original list has been modified!")
print(hierarchy)
print("..............")

print("What about sort? Does that also modify the list?")
hierarchy.sort()
print("by using .sort(), the original hierarchy list has been modified")
print(hierarchy)
print("~~~~~~~~~~~~~~~~~~~~")
print("What happens when we do .sort(reverse=True)?")
hierarchy.sort(reverse=True)
print(hierarchy)

print("*************")
print("Another example of slicing:")
# Does this give you what you expect? replace -1, with -2
# try moving the -1 to [-1:] and [-1:0] to see what prints out
print(hierarchy[:-1])

# EXTRA: In Class Questions with included answers!
---

1. (2 minutes) `Mixed_list=["why", "was", 6, "afraid", "of", 7,"?"] `
    * Make up your own 'mixed' list (note that while this is not forbidden, we usually use lists with homogeneous data types - it isn't good programming hygiene to mix our data types in a list). 
<br><br>
2. (3 minutes) Can you make a "list of lists"?

3. (5 minutes) `zoo_animals = ["pangolin", "cassowary", "sloth","platypus"]`
    1. Add a fifth animal to the zoo above. 
    2. Replace the sloth with a capybara.
<br><br>
4. (10 minutes) BATTLESHIP! Can you use lists to build a battleship board? A battleship board is 5 X 5 full of "O"s. For now, just build the board; we will expand on this example (which was stolen from a Code Academy example) as we learn about for loops.
   
5. (10 minutes) *Pseudocode question:* There is degeneracy in the universal genetic code, because most amino acids are encoded by more than one nucleotide codon. If you'd like, you can [read more about codon degeneracy](https://en.wikipedia.org/wiki/Codon_degeneracy).
How can a nucleotide string be fed into a program where the codons are translated using a list?

In [None]:
# Question 1
# A list of mixed items would be called heterogeneous. As you can see, Python will allow you to create this hideous creation, but you shouldn't. 
Mixed_list=["why", "was", 6, "afraid", "of", 7,"?"] 
print(Mixed_list)
# lists should be ONE type of data: all strings, all characters, all integers etc. 

In [None]:
# Question 2
# yes, we can make a list of lists - even lists of unequal length are still lists
listoflists=[[1,1,1,1],[1,1,1],[1,1]]
print(listoflists)

In [None]:
#Question 3
# how do we add a fifth animal? There are multiple ways, but perhaps the most straightforward is to use append
zoo_animals = ["pangolin", "cassowary", "sloth","platypus"]
# I currently have a new Eurasier puppy so I am adding that to my imaginary zoo since puppies are currently on my mind
zoo_animals.append("Eurasier")
print(zoo_animals)
# can we replace the sloth with the second best animal: a capybara (the first best is obviously a platypus)
# in this case the list is small, but we might want to start by identifying the location of what we are trying to replace
sloth_index=zoo_animals.index("sloth")
print(sloth_index)
# now that we know that this is in the third slot (index = 2), we can replace it using the index we just found
zoo_animals[sloth_index]="Capybara"
print(zoo_animals)

In [None]:
#Question 4 - we took suggestions for this together because the breakout rooms weren't working. 
# there were a few ways to produce a board. This first way is tedious, and probably isn't 
# entirely what we are after, but it is acceptable. It produces a list of lists. 
row1=[0,0,0,0,0]
row2=[0,0,0,0,0]
row3=[0,0,0,0,0]
row4=[0,0,0,0,0]
row5=[0,0,0,0,0]
board=[row1,row2,row3,row4,row5]
print(board)
# this strategy produces an actual 'board' instead of a list of lists. It is still not efficient, but it is probably closer to a 
# boardgame. Notice that there is uniformity of the elements and repetition which would make this a good candidate for the 
# for loops that we will learn about in Module 3!
print("`````")
print(row1[0],row2[0],row3[0],row4[0],row5[0])
print(row1[1],row2[1],row3[1],row4[1],row5[1])
print(row1[2],row2[2],row3[2],row4[2],row5[2])
print(row1[3],row2[3],row3[3],row4[3],row5[3])
print(row1[4],row2[4],row3[4],row4[4],row5[4])
print("`````")
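# another list of lists, built by repetition
# (caution: all five rows are the SAME list object, so changing one row changes them all)
row1=[[0,0,0,0,0]]*5
print(row1)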
print("`````")
# list of lists by creating an empty board list and filling it. 
board2=[]
board2.append(["O"]*5)
board2.append(["O"]*5)
board2.append(["O"]*5)
board2.append(["O"]*5)
board2.append(["O"]*5)
print(board2)
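
One pitfall with building a board by repetition is worth a sketch, since it ties back to the idea that lists hold pointers. Multiplying a list that contains another list copies the pointer, not the inner list, so every "row" is the same object. The names `shared_board` and `board` are made up for this illustration, and the `for` loop is a preview of Module 3.

```python
# Multiplying a list that contains a list repeats the SAME inner list five times
shared_board = [["O"] * 5] * 5
shared_board[0][0] = "X"
print(shared_board[1][0])  # X  (every row changed, because every row is the same object)

# Appending a fresh row each time gives five independent rows
board = []
for _ in range(5):
    board.append(["O"] * 5)
board[0][0] = "X"
print(board[1][0])  # O  (only the first row changed)
```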

## Pseudocode question:  

**Question:** There is degeneracy in the universal genetic code, because most amino acids are encoded by more than one nucleotide codon.
How can a nucleotide string be fed into a program where the codons are translated using a list?


**Answer:** There are multiple legitimate strategies for this question. One might be to cut up the string into trinucleotides (3 nucleotides) and place each trinucleotide into an element of a list, like so: 
"ATGTTTTTA" would be ["ATG","TTT","TTA"]

Then we will want to cycle through each element of this list and translate it into the appropriate amino acid using the codon rules. 
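
The strategy above could be sketched as follows. This is only one possible design, not the definitive answer: it assumes two hypothetical parallel lists, `codon_table` and `amino_acids`, that hold just the five codons needed for this example (a full table would have 64 entries), and it uses `for` loops, which we have not covered yet.

```python
sequence = "ATGTTTTTA"

# Step 1: cut the string into trinucleotides and store each one in a list
codons = []
for start in range(0, len(sequence) - 2, 3):
    codons.append(sequence[start:start + 3])
print(codons)  # ['ATG', 'TTT', 'TTA']

# Step 2: a partial codon table stored as two parallel lists (illustrative only)
# The amino acid for codon_table[i] is amino_acids[i]
codon_table = ["ATG", "TTT", "TTC", "TTA", "TTG"]
amino_acids = ["M", "F", "F", "L", "L"]

# Step 3: look up each codon's position and collect the matching amino acid
protein = []
for codon in codons:
    position = codon_table.index(codon)
    protein.append(amino_acids[position])

print("".join(protein))  # MFL
```

Notice that `TTT` and `TTC` both map to `F`, and `TTA` and `TTG` both map to `L`: that is the degeneracy the question describes.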
