# <span style="color:red">Before you turn this in, make sure everything runs as expected.</span>

1. **RESTART THE KERNEL** – in the menubar, select Kernel$\rightarrow$Restart
2. **RUN ALL CELLS** – in the menubar, select Cell$\rightarrow$Run All
3. **VALIDATE THE NOTEBOOK** – in the menubar, click the Validate button

## <span style="color:blue">How to Answer Questions</span>

### <span style="color:blue">Python code answers</span>

Enter your answer any place that says
```python
# Enter your code here
```
<span style="color:red">**AND delete the text.**</span>
```python
raise NotImplementedError # No Answer - remove if you provide an answer
```

### <span style="color:blue">Written answers</span>

Enter your answer any place that says
```
YOUR ANSWER HERE.
```

In [None]:
ANUID = "u7522927"

---

# Quiz 1 -- Sequence comparison foundations

This quiz is worth 1% of the course.

**DO NOT USE NUMPY FOR THIS QUIZ.** If you do, you lose all marks for any question where `numpy` was used.

## Q1 -- how many k-mers

Consider the DNA sequence `ACGT`, which is 4 nucleotides long.

- When $k=1$, the number of 1-mers is 4: `['A', 'C', 'G', 'T']`.
- When $k=2$, the number of 2-mers is 3: `['AC', 'CG', 'GT']`.
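
As a rough check of the two counts above, the following sketch slides a window of length $k$ along the sequence and counts the start positions where the window still fits. The name `count_windows` is a hypothetical helper used only for illustration; it is not part of the quiz.

```python
def count_windows(seq: str, k: int) -> int:
    """Count the start positions where a window of length k fits inside seq."""
    count = 0
    for start in range(len(seq)):
        # the window covers indices start .. start + k - 1
        if start + k <= len(seq):
            count += 1
    return count


one_mers = count_windows("ACGT", 1)
two_mers = count_windows("ACGT", 2)
print(one_mers, two_mers)
```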

Complete the function `get_number_of_kmers()` so that it returns the number of k-mers when given a sequence length and a value for $k$.

As indicated by the function type hints, `get_number_of_kmers()` has two arguments:
- `seq_length` is an integer
- `k` is an integer

`get_number_of_kmers()` returns an integer.

### How I test your function

- I will use different values for `seq_length` and `k`
- You can assume the following:
    - `seq_length > 1`, `k > 0`
    - `seq_length > k`

In [1]:
# complete this function
def get_number_of_kmers(seq_length: int, k: int) -> int:
    # one k-mer starts at each of the first seq_length - k + 1 positions
    return seq_length - k + 1

In [2]:
# This part worth 0.05

# A test!
"""Q1 number of k-mers : correct name and callable"""
from gutils import check

check.allowed_modules()
check.expected_variables_exist(["get_number_of_kmers"], locals())
assert callable(get_number_of_kmers)

In [3]:
# This part worth 0.1

# A test!
"""Q1 number of k-mers : get_number_of_kmers() returns the correct values and type"""

'Q1 number of k-mers : get_number_of_kmers() returns the correct values and type'

## Q2 -- split a sequence into overlapping k-mers

Complete the function `get_kmers()`.

`get_kmers()` has two arguments

- `seq` is a string. This must be a required argument.
- `k` is an integer. This must be an optional argument.

`get_kmers()` returns a list of the ordered k-mers. The items in the list are strings.

**Tip:** If you don't modify the provided function signature, the argument types (required, optional) will be correct!

When you apply your function to `a_seq` (defined below) with `k=2`, it should return the following:

```ipython
In[1]: kmers = get_kmers(a_seq, k=2)
In[2]: kmers
Out[2]: ['AC', 'CG', 'GT', 'TA', 'AC', 'CG', 'GT', 'TA', 'AC']
``` 
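
To see why neighbouring items in this output overlap, the sketch below records the start index of each window. It is only an illustration: `windows_with_starts` is a hypothetical name, and the sequence is written out literally so the block runs on its own.

```python
def windows_with_starts(seq: str, k: int) -> list:
    """Pair each start index with the k characters that begin there."""
    pairs = []
    last_start = len(seq) - k  # final index where a full window fits
    for start in range(last_start + 1):
        pairs.append((start, seq[start : start + k]))
    return pairs


pairs = windows_with_starts("ACGTACGTAC", 3)
for start, window in pairs:
    print(start, window)
```

Each window begins one position after the previous one, so consecutive windows share `k - 1` characters.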

### How I test your function

- Values of `k>=1`
- I will check against `a_seq` and a randomly generated sequence
- The order of `k-mers` in the output is critical!

In [None]:
a_seq = "ACGTACGTAC"

In [4]:
# complete this function
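def get_kmers(seq: str, k: int = 1) -> list:
    result = []
    # one k-mer starts at each of the first len(seq) - k + 1 positions
    num_kmers = get_number_of_kmers(len(seq), k)
    for i in range(num_kmers):
        # slice the window of length k that starts at index i
        result.append(seq[i : i + k])
    return result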


In [5]:
# This part worth 0.05

# A test!
"""Q2 generate k-mers : get_kmers is defined and callable"""
check.allowed_modules()
check.expected_variables_exist(["get_kmers"], locals())
assert callable(get_kmers)

In [6]:
# This part worth 0.1

# A test!
"""Q2 generate k-mers : get_kmers does not fail when called with k"""

'Q2 generate k-mers : get_kmers does not fail when called with k'

In [7]:
# This part worth 0.1

# A test!
"""Q2 generate k-mers : returns a list of strings"""

'Q2 generate k-mers : returns a list of strings'

In [8]:
# This part worth 0.2

# A test!
"""Q2 generate k-mers : correct values on a random seq"""

'Q2 generate k-mers : correct values on a random seq'

## Definitions

**coordinate** the index positions for a specific element. For example,

```python
data = [[23, 14],
        [1, 100]]
```
Here the value 1 is obtained by using the following indices
```python
data[1][0]
```
The indexing values `1, 0` are a coordinate.
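
One way to make this concrete is to store the coordinate as a tuple and unpack it before indexing. This is a small sketch; the names `coordinate`, `row` and `col` are chosen here for illustration.

```python
data = [[23, 14],
        [1, 100]]

coordinate = (1, 0)    # (row index, column index)
row, col = coordinate  # unpack into the two indices
value = data[row][col]
print(value)
```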

**nested iteration** Nested loops over a multi-dimensional data structure. For example,

```python
series_1 = [1, 2]
series_2 = ["x", "y"]
for a in series_1:
    for b in series_2:
        print(a, b)  # do something involving a and b
```
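
Applied to a list of lists, the pattern might look like the sketch below, which records each coordinate together with the value found there. `grid` and `visited` are made-up names, and `enumerate` is just one convenient way to obtain the indices.

```python
grid = [[5, 6, 7],
        [8, 9, 10]]

visited = []
for row_index, row in enumerate(grid):       # outer loop: rows
    for col_index, value in enumerate(row):  # inner loop: columns
        visited.append((row_index, col_index, value))

print(visited)
```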

## Q3 -- nested iteration

In order to compare k-mers between sequences, we need to be able to loop over the possible coordinates and check values.
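
As a hedged sketch of where this is heading, the block below compares every 2-mer of one short sequence with every 2-mer of another and records a match as `1`. The sequences and the names `kmers_a`, `kmers_b` and `matches` are assumptions made for illustration; the quiz itself does not ask for this.

```python
kmers_a = ["AC", "CG", "GT"]  # 2-mers of "ACGT"
kmers_b = ["CG", "GT", "TA"]  # 2-mers of "CGTA"

matches = []
for a in kmers_a:      # rows follow the first sequence
    row = []
    for b in kmers_b:  # columns follow the second sequence
        row.append(1 if a == b else 0)
    matches.append(row)

print(matches)
```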

In [None]:
matrix = [[0, 0, 1, 1],
          [1, 0, 0, 1],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]

Complete the function `get_coords()` below. It takes a single argument (a list of lists, e.g. `matrix`) and returns the row coordinates and column coordinates of the elements that have the value `1`. For example, in `matrix` the element at coordinate `0, 2` has the value `1`.

`get_coords()` has one required argument
- `matrix` (a list of lists of integers)

`get_coords()` returns:
- a tuple of two lists of integers. The lists are the same length.
- the first list corresponds to the row coordinates for items that equal 1
- the second list corresponds to the column coordinates for items that equal 1
- the order of elements must correspond between the two lists
    - this means `matrix[<row coordinate #3>][<col coordinate #3>] == 1`

In the case of `matrix`, your function should return the following.

```python
 In[1]: row_coords, col_coords = get_coords(matrix)
 In[2]: row_coords
Out[2]: [0, 0, 1, 1, 2, 3, 3, 3]
 In[3]: col_coords
Out[3]: [2, 3, 0, 3, 1, 0, 1, 3]
```
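
The correspondence between the two lists can be checked by pairing them with `zip`. The sketch below copies the expected output from above rather than calling `get_coords()`, so it runs on its own.

```python
matrix = [[0, 0, 1, 1],
          [1, 0, 0, 1],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]
row_coords = [0, 0, 1, 1, 2, 3, 3, 3]
col_coords = [2, 3, 0, 3, 1, 0, 1, 3]

# every paired coordinate should point at a 1
all_ones = all(matrix[r][c] == 1 for r, c in zip(row_coords, col_coords))
# the number of coordinates should equal the number of 1s in the matrix
total_ones = sum(row.count(1) for row in matrix)
print(all_ones, len(row_coords) == total_ones)
```
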
### How I test your function

- Check it returns two separate lists which have the same length
- It correctly processes the provided `matrix`
- It correctly processes a randomly generated matrix.
    - generated matrices will not necessarily be square (i.e. number of rows `!=` number of columns)
    - generated matrices will have a fixed shape (all rows have the same number of elements)
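
A non-square input might look like the sketch below, which also shows one way to confirm that every row has the same length. The matrix `rect` is invented for illustration and is not one of the test matrices.

```python
rect = [[1, 0, 1],
        [0, 1, 0]]

n_rows = len(rect)     # number of rows
n_cols = len(rect[0])  # number of columns, taken from the first row
# a fixed shape means every row has n_cols elements
is_fixed_shape = all(len(row) == n_cols for row in rect)
print(n_rows, n_cols, is_fixed_shape)
```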

In [17]:
# complete this function
def get_coords(matrix: list) -> tuple:
    rowlist = []
    collist = []
    # visit every (row, column) coordinate, row by row
    for i in range(len(matrix)):
        for j in range(len(matrix[i])):
            if matrix[i][j] == 1:
                rowlist.append(i)
                collist.append(j)
    return (rowlist, collist)

print(get_coords([[0, 0, 1, 1],
                  [1, 0, 0, 1],
                  [0, 1, 0, 0],
                  [1, 1, 0, 1]]))

([0, 0, 1, 1, 2, 3, 3, 3], [2, 3, 0, 3, 1, 0, 1, 3])


In [18]:
# This part worth 0.05

# A test!
"""Q3 nested iter : defined and callable"""
check.allowed_modules()
check.expected_variables_exist(["get_coords"], locals())
assert callable(get_coords)

In [19]:
# This part worth 0.05

# A test!
"""Q3 nested iter : returns correct number of results and they have same length for provided matrix"""

'Q3 nested iter : returns correct number of results and they have same length for provided matrix'

In [20]:
# This part worth 0.1

# A test!
"""Q3 nested iter : returns correct value for provided matrix"""

'Q3 nested iter : returns correct value for provided matrix'

In [22]:
# This part worth 0.2

# A test!
"""Q3 nested iter : returns correct value for a random matrix"""

'Q3 nested iter : returns correct value for a random matrix'