# Iteration

Last time, we looked at basic types (`bool`, `int`, `str`, `float`). 

This time, we'll look at a few more types to be able to tackle our first 
[Rosalind Problem](https://rosalind.info/problems/dna/).

- `list`: an ordered collection of values that can be modified during execution.
- `dict`: a dictionary, i.e. a key-value mapping, like a collection of named variables.
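
As a minimal sketch of what 'modifiable' and 'key-value mapping' mean in practice (the names `bases` and `counts` are made up for illustration):

```python
# A list keeps its order and can be changed after it is created.
bases = ["A", "C", "G"]
bases.append("T")  # add an item to the end
bases[0] = "a"     # replace the item at index 0
print(bases)       # ['a', 'C', 'G', 'T']

# A dict maps each key to a value, a bit like a set of named variables.
counts = {"apples": 3, "pears": 1}
counts["apples"] = counts["apples"] + 1  # update the value stored under a key
print(counts)      # {'apples': 4, 'pears': 1}
```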

When we want to 'iterate' over some collection, interacting with each value,
we use a `for` loop.

## Lists

Let's try looking at each element in a list.

In [None]:
for value in [1, 2, 3, 4, 5]:
    if value == 3:
        continue  # skip the rest of the loop for this iteration
    print(value * 2)

To get a predictable sequence of integers, we can use `range`, which takes up
to three arguments: start, stop, and step. The stop value itself is not included.

In [None]:
for value in range(5):
    print(f"All : {value}")

for value in range(2, 10, 2):
    print(f"Even: {value}")

We can select a specific item from a list, or a range of items, using `[]`. 
The first item is always the '0th' item, and the last is the '-1st'.

In [None]:
my_list = list(range(10))

print(f"First: {my_list[0]}")
print(f"Last: {my_list[-1]}")

A 'slice' selects a range of values using the same 'start, stop, step' format as
`range`, but using `:` instead of `,` and indices instead of values.

In [None]:
print(f"Slice: {my_list[2:5]}")  # Slicing from index 2 to 4
print(f"Reverse: {my_list[::-1]}")  # Reverse the list, -1 step
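
Any of the three slice parts can be left out, in which case Python falls back to a default (the start of the list, the end of the list, or a step of 1). A small sketch, using a hypothetical list `letters` so that it does not depend on the cell above:

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[:3])     # stop only:  ['a', 'b', 'c']
print(letters[3:])     # start only: ['d', 'e', 'f']
print(letters[::3])    # step only:  ['a', 'd']
print(letters[1:5:2])  # all three:  ['b', 'd']
```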

Try to make a slice that selects every other item, starting at the end. Replace
`value` in the cell below with your slice.

In [None]:
print(f"Every other, starting at the end: {value}")

## Dicts 

A dictionary might be made with either of the following notations:

In [None]:
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = dict(a=1, b=2, c=3)  # Using keyword arguments
print(dict1 == dict2)  # True, both dictionaries are equal

We can get the value for a key with `[]` notation.

In [None]:
dict1["a"] == 1

In [None]:
dict1["d"] = 4  # Adding a new key-value pair
dict1["b"] = 0  # Updating an existing key-value pair

But we get a `KeyError` if we try to fetch a key that doesn't exist.

In [None]:
dict1["f"]
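
Fetching a missing key with `[]` raises a `KeyError`. As a sketch, the error can be caught with `try`/`except`, or avoided by checking with `in` first (the dictionary `ages` is a made-up example):

```python
ages = {"ada": 36, "alan": 41}

try:
    ages["grace"]  # this key does not exist
except KeyError as err:
    message = f"No such key: {err}"
    print(message)  # No such key: 'grace'

# `in` checks whether a key exists without raising an error
has_grace = "grace" in ages
print(has_grace)  # False
```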

We can instead use `get`, which returns `None`, or a default of our choosing,
when the key is missing.

In [None]:
dict1.get("f") is None  # Returns None if the key 'f' does not exist

In [None]:
dict1.get("f", 0) == 0  # Returns 0 if the key 'f' does not exist
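
One place a default comes in handy is keeping a tally when we don't know the keys in advance. This is only a sketch, and the list `fruit` is invented for the example:

```python
fruit = ["apple", "pear", "apple", "plum", "apple"]

tally = {}
for name in fruit:
    # get() returns 0 the first time we see a name, so we can always add 1
    tally[name] = tally.get(name, 0) + 1

print(tally)  # {'apple': 3, 'pear': 1, 'plum': 1}
```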

Just like lists, we can iterate over dictionaries, but we have to specify whether
we want the `keys`, the `values`, or all `items`.

In [None]:
for key in dict1.keys():
    print(f"Key: {key}, Value: {dict1[key]}")
for key in dict2:  # iterating over a dict directly yields its keys
    print(f"Key: {key}, Value: {dict2[key]}")

In [None]:
for value in dict1.values():
    print(f"Value: {value}")
for key, value in dict1.items():
    print(f"Key: {key}, Value: {value}")

## Strings

A `for` loop will iterate over each character in a string.


In [None]:
for letter in "abc":
    print(f"Letter: {letter}")

We can use the `enumerate` function to get both the index and the value.

In [None]:
for index, letter in enumerate("abc"):
    print(f"Index: {index}, Letter: {letter}")
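
`enumerate` counts from 0 by default, but it also accepts a `start` argument. A short sketch, in case we'd rather count positions from 1 (the variable names are illustrative):

```python
for position, letter in enumerate("abc", start=1):
    print(f"Position: {position}, Letter: {letter}")

# Collecting the pairs into a list shows what enumerate produces
pairs = list(enumerate("abc", start=1))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```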

# Problem

**Given**: A DNA string `s` of length at most 1000 nt.

**Return**: Four integers (separated by spaces) counting the respective number
of times that the symbols 'A', 'C', 'G', and 'T' occur in `s`.

Let's look at a sample:

In [None]:
s = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
ret = "20 12 17 21"

This repository comes with a validator to check solutions. No peeking!

In [None]:
import os

if os.getcwd().endswith("notebooks"):
    os.chdir("..")

from src.dna import validator

validator(s, ret)

What can you do to process this input `s` and return `ret`?

In [None]:
# your code here
ret = "something"

validator(s, ret)  # Validate your answer

Click on the triangle to see the hint.

<details><summary>Hint 1</summary>

How can this be broken down into steps? 

1. Look at each part of the string.
2. Categorize each as one of four types.
3. Add to a running sum based on the input.
4. Print out the string with these values.

</details>

Now, we'll try on a larger dataset:

In [None]:
with open("data/rosalind_dna.txt", "r") as file:
    s = file.read().strip()  # Get a larger DNA sequence from a file

# Your code here

More hints: 

<details><summary>Hint 2</summary>

Given the steps above, how can we translate them into code?

1. We can loop over each character in the string with `for char in s:`
2. We can use an `if` to check if the character is 'A', 'C', 'G', or 'T'

</details>

More: 

<details><summary>Hint 3</summary>

3. We can start a dictionary to count the occurrences of each nucleotide.
4. We can increment the count for each nucleotide in the dictionary and
   return the counts as a string.
</details>

One solution (there are many!): 

<details><summary>Solution</summary>

```python 
my_count = dict(A=0, C=0, G=0, T=0)
for letter in s:
    if letter in my_count:
        my_count[letter] += 1
my_return = f'{my_count["A"]} {my_count["C"]} {my_count["G"]} {my_count["T"]}'
```

</details>

# Advanced

There are other built-ins that can help us solve this problem in even fewer
lines of code.

<details><summary>Other methods</summary>

- `my_str.count(char)`: count the instances of `char` in `my_str` 
- `my_str.split(char)`: separate `my_str` into a list based on `char`. By
     default, it splits on whitespace
- `map(my_type, my_list)`: convert each item in `my_list` to `my_type`
- `zip(list1, list2)`: pair up the items at matching indices in each list

</details>
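
To get a feel for these built-ins before using them on DNA, here is a small sketch on unrelated data. The string `sentence` and the other names are invented for the example:

```python
sentence = "the cat sat on the mat"

# str.count: how many times does a substring occur?
n_t = sentence.count("t")
print(n_t)  # 5

# str.split: break a string into a list (on whitespace by default)
words = sentence.split()
print(words)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']

# map: apply a function (here, int) to every item; list() collects the results
numbers = list(map(int, "3 1 4".split()))
print(numbers)  # [3, 1, 4]

# zip: pair up items at matching positions in two lists
paired = list(zip(["x", "y", "z"], numbers))
print(paired)  # [('x', 3), ('y', 1), ('z', 4)]
```

Note that `map` and `zip` produce their results lazily, which is why the sketch wraps them in `list()` before printing.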

Try using one or two of these methods in a new solution. Or, try building 
a validator function.

In [None]:
# Your new code here