## Assertions vs Raising Exceptions

- We should raise **exceptions** when we are checking the inputs to our function, i.e., we are checking to make sure the function is being used properly.
- We should use **assertions** to make sure the function operates as expected for a given input. This is almost always in a testing context.
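
A minimal sketch of the distinction, assuming a stand-in set `VALID_AMINO_ACIDS` in place of the amino acid dictionary that the real `seq_features.py` imports. The function raises an exception on bad input, while the tests use `assert` to check behavior. The second test is a plain-Python equivalent of the `with pytest.raises(RuntimeError):` form that appears in the test output in these notes.

```python
# Stand-in for the amino acid dictionary used in the notes (an assumption):
# the 20 standard one-letter amino acid codes.
VALID_AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')


def number_negatives(seq):
    """Count the negatively charged residues (D and E) in a protein sequence."""
    seq = seq.upper()

    # Input check: the caller gave us something we cannot work with,
    # so we raise an exception.
    for aa in seq:
        if aa not in VALID_AMINO_ACIDS:
            raise RuntimeError(aa + ' is not a valid amino acid.')

    return seq.count('D') + seq.count('E')


def test_number_negatives_for_short_sequence():
    # Behavior check: for valid input, assert the expected result.
    assert number_negatives('ACKLWTTAE') == 1


def test_number_negatives_for_invalid_amino_acid():
    # Plain-Python equivalent of `with pytest.raises(RuntimeError):`.
    try:
        number_negatives('Z')
    except RuntimeError:
        return
    raise AssertionError('DID NOT RAISE RuntimeError')
```

The test bodies are illustrative; only the test names and the `RuntimeError` check come from the runs shown in these notes.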

In [2]:
!pytest -v
#seq_features.py does not raise a RuntimeError yet, so the invalid amino acid test fails

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 5 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 20%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 40%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 60%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 80%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid FAILED [100%]

________________ test_number_negatives_for_invalid_amino_acid _________________

    def test_number_negatives_for_invalid_amino_acid():
        with pytest.raises(RuntimeError) as excinfo:
>           sf.number_negatives('Z')
E           Failed: DID NOT RAISE <class 'RuntimeError'>

test_seq_features.py:27: Failed
..\..\anaconda3\lib\site-packages\pyreadline\py3k_comp

In [4]:
!pytest -v
#seq_features.py has raise runtime error

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 5 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 20%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 40%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 60%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 80%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [100%]

..\..\anaconda3\lib\site-packages\pyreadline\py3k_compat.py:8
    return isinstance(x, collections.Callable)



In [14]:
#now runtime error accounts for all amino acids
#replaced bootcamp_utils w/ bioinfo_dicts because bootcamp_utils was a pain to work with
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 5 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 20%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 40%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 60%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 80%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [100%]

..\..\anaconda3\lib\site-packages\pyreadline\py3k_compat.py:8
    return isinstance(x, collections.Callable)



## Summary of TDD

Now that you have some experience with TDD and have an idea about what it is and how it works, let’s formalize things by writing out the basic principles of test-driven development.

1. Build your software out of **small functions** that do **one specific thing.**
2. Build unit tests for all of your functions.
3. Whenever you want to make any enhancements or adjustments to your code, write tests for them **first.**
4. Whenever you encounter a bug, write tests that reproduce the behavior, and then fix the code so that the entire test suite passes.
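
Principle 4 can be sketched as follows. This assumes a hypothetical earlier version of `number_negatives()` that forgot about lowercase input; the names `number_negatives_buggy`, `number_negatives_fixed`, and `check_lowercase` are made up for illustration. The check is written first and fails against the buggy version, then passes once the code is fixed.

```python
def number_negatives_buggy(seq):
    """Hypothetical earlier version: does not handle lowercase input."""
    return seq.count('D') + seq.count('E')


def number_negatives_fixed(seq):
    """Same function after the fix: normalize case before counting."""
    seq = seq.upper()
    return seq.count('D') + seq.count('E')


def check_lowercase(func):
    """Regression check that reproduces the bug for a given implementation."""
    assert func('acklwttae') == 1


# Step 1: the new check fails against the buggy code, confirming the bug.
try:
    check_lowercase(number_negatives_buggy)
    bug_reproduced = False
except AssertionError:
    bug_reproduced = True

# Step 2: after the fix, the same check passes and stays in the suite.
check_lowercase(number_negatives_fixed)
```

Keeping the check in the suite afterwards guards against the same bug coming back in a later change.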

## Improving the seq_features module using TDD: Practice

Now write a function that will calculate the total number of positively charged residues in a protein. In other words, let’s count the number of Lysine (K), Arginine (R) and Histidine (H) residues in the sequence.
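
One possible shape for the function and its first tests, as a sketch rather than the reference solution. `VALID_AMINO_ACIDS` is an assumed stand-in for the amino acid dictionary imported by the real module, and the test sequences are made up. The test names mirror those in the runs below.

```python
# Assumed stand-in for the module's amino acid dictionary:
# the 20 standard one-letter amino acid codes.
VALID_AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')


def number_positives(seq):
    """Count the positively charged residues (K, R, and H) in a protein sequence."""
    # Accept lowercase input
    seq = seq.upper()

    # Reject anything that is not a standard amino acid
    for aa in seq:
        if aa not in VALID_AMINO_ACIDS:
            raise RuntimeError(aa + ' is not a valid amino acid.')

    return seq.count('K') + seq.count('R') + seq.count('H')


def test_number_positives_single_R_K_or_H():
    assert number_positives('R') == 1
    assert number_positives('K') == 1
    assert number_positives('H') == 1


def test_number_positives_for_empty():
    assert number_positives('') == 0


def test_number_positives_for_short_sequences():
    # One R, one K, and two H
    assert number_positives('RCKLWTWHH') == 4


def test_number_positives_for_lowercase():
    assert number_positives('rcklwtwhh') == 4
```

In the TDD workflow, the tests would be written before the function body, starting from a prototype that returns nothing and therefore fails the first test.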

In [16]:
#generated prototype function for seq_features and respective test function in test_seq_features
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 6 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 16%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 33%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 50%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 66%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 83%]
test_seq_features.py::test_number_positives_single_R_K_or_H FAILED       [100%]

____________________ test_number_positives_single_R_K_or_H ____________________

    def test_number_positives_single_R_K_or_H():
>       assert sf.number_positives('R') == 1
E       assert None == 1
E         +None
E         -1

test_seq_features.py:31: AssertionError
..\..\anaconda3\lib\s

In [17]:
#appended return variable for number_positives(seqs)
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 6 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 16%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 33%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 50%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 66%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 83%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [100%]

..\..\anaconda3\lib\site-packages\pyreadline\py3k_compat.py:8
    return isinstance(x, collections.Callable)



In [1]:
#added number_positive variants for test functions
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 10 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 10%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 20%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 30%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 40%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 50%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [ 60%]
test_seq_features.py::test_number_positives_for_empty PASSED             [ 70%]
test_seq_features.py::test_number_positives_for_short_sequences PASSED   [ 80%]
test_seq_features.py::test_number_positives_for_lowercase FAILED         [ 90%]
test_seq_features.py::test_number_positives_for_inv

In [3]:
#update number_positives to handle failed cases
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 10 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [ 10%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 20%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 30%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 40%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 50%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [ 60%]
test_seq_features.py::test_number_positives_for_empty PASSED             [ 70%]
test_seq_features.py::test_number_positives_for_short_sequences PASSED   [ 80%]
test_seq_features.py::test_number_positives_for_lowercase PASSED         [ 90%]
test_seq_features.py::test_number_positives_for_inv

## Code refactoring and TDD
As we are building modules and functions, though we may try, we are not able to anticipate all the functionalities they must have. And by adding new functionalities, we might need to change our code substantially and even dramatically change the initial logic that worked so well up to this point. This is so common in programming that developers have a name for it: code refactoring.

For example, when we started writing `seq_features`, we did not anticipate that we would also want to count the positive charges. Beyond that, we broke one of the most important rules in programming: a function must do one thing, and only one thing, very well. It is clear that `number_negatives()` was doing three things:

1. Dealing with lowercase characters.
2. Raising exceptions for invalid amino acids in the input sequence.
3. Counting the negatively charged amino acids.

Turns out that `number_positives()` also needs to do items 1 and 2, and because of that we have repeated the following lines of code in two different functions, within the same module:
```python
# Convert sequence to upper case
seq = seq.upper()

# Check for a valid sequence
for aa in seq:
    if aa not in bootcamp_utils.aa.keys():
        raise RuntimeError(aa + ' is not a valid amino acid.')
```
and if we are trying to make this module more robust, every time we catch a bug, we will need to change identical code in two places. So let’s perform a code refactoring in order to keep the principle of functions doing only one thing as close to the truth as possible.

The first task, changing the input sequence to uppercase, uses a built-in Python string method, and wrapping it in another function is unnecessary. So, we can keep the `seq = seq.upper()` line in the functions.

Now, let’s write a function that will check whether the sequence is valid. That way we keep all the logic related to checking for invalid sequences in one part of the code, and we can call it anywhere we need it afterwards. So, your module `seq_features.py` should now contain a function like this:
```python
import bootcamp_utils


def is_valid_sequence(seq):
    """Raise a RuntimeError if `seq` contains an invalid amino acid."""
    for aa in seq:
        if aa not in bootcamp_utils.aa.keys():
            raise RuntimeError(aa + ' is not a valid amino acid.')
```
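
With the check factored out, both counting functions can call it instead of repeating the loop. The following is a sketch of how the refactored module might look, with `VALID_AMINO_ACIDS` as an assumed stand-in for `bootcamp_utils.aa.keys()` so that the example runs on its own.

```python
# Assumed stand-in for bootcamp_utils.aa.keys():
# the 20 standard one-letter amino acid codes.
VALID_AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')


def is_valid_sequence(seq):
    """Raise a RuntimeError if `seq` contains an invalid amino acid."""
    for aa in seq:
        if aa not in VALID_AMINO_ACIDS:
            raise RuntimeError(aa + ' is not a valid amino acid.')


def number_negatives(seq):
    """Count the negatively charged residues (D and E)."""
    seq = seq.upper()
    is_valid_sequence(seq)
    return seq.count('D') + seq.count('E')


def number_positives(seq):
    """Count the positively charged residues (K, R, and H)."""
    seq = seq.upper()
    is_valid_sequence(seq)
    return seq.count('K') + seq.count('R') + seq.count('H')
```

Because the validation now lives in one place, a bug fix in `is_valid_sequence()` applies to both counting functions, and the existing tests for `number_negatives()` and `number_positives()` should keep passing unchanged.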

In [4]:
#define function is_valid_sequence and add two test features 
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 12 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [  8%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 16%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 25%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 33%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 41%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [ 50%]
test_seq_features.py::test_number_positives_for_empty PASSED             [ 58%]
test_seq_features.py::test_number_positives_for_short_sequences PASSED   [ 66%]
test_seq_features.py::test_number_positives_for_lowercase PASSED         [ 75%]
test_seq_features.py::test_number_positives_for_inv

Refactoring tests is frowned upon and taken VERY seriously by developers; it is a big responsibility and should be done carefully, if ever. Keep adding tests related to `is_valid_sequence()`, but do not remove the tests already in the suite.

So, let’s add the exception tests for `is_valid_sequence()` in `test_seq_features.py`:
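
A sketch of what such tests might look like. The test names, the helper `raises_runtime_error`, and the stand-in `VALID_AMINO_ACIDS` are assumptions made for illustration. In the actual pytest suite, the `with pytest.raises(RuntimeError):` form used for `number_negatives()` would be the usual choice; the helper below does the same check in plain Python so the example runs without pytest.

```python
# Assumed stand-in for the module's amino acid dictionary:
# the 20 standard one-letter amino acid codes.
VALID_AMINO_ACIDS = set('ACDEFGHIKLMNPQRSTVWY')


def is_valid_sequence(seq):
    """Raise a RuntimeError if `seq` contains an invalid amino acid."""
    for aa in seq:
        if aa not in VALID_AMINO_ACIDS:
            raise RuntimeError(aa + ' is not a valid amino acid.')


def raises_runtime_error(seq):
    """Return True if is_valid_sequence(seq) raises a RuntimeError."""
    try:
        is_valid_sequence(seq)
    except RuntimeError:
        return True
    return False


def test_is_valid_sequence_for_invalid_amino_acid():
    # A single invalid character, alone or inside a longer sequence
    assert raises_runtime_error('Z')
    assert raises_runtime_error('ACDEZ')


def test_is_valid_sequence_for_valid_sequence():
    # All 20 standard amino acids, and the empty sequence
    assert not raises_runtime_error('ACDEFGHIKLMNPQRSTVWY')
    assert not raises_runtime_error('')
```

Note that `is_valid_sequence()` as written does not convert to uppercase, so callers are expected to do that first, as `number_negatives()` and `number_positives()` do.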

In [5]:
#add two tests for is_valid_sequence
!pytest -v

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 14 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [  7%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 14%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 21%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 28%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 35%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [ 42%]
test_seq_features.py::test_number_positives_for_empty PASSED             [ 50%]
test_seq_features.py::test_number_positives_for_short_sequences PASSED   [ 57%]
test_seq_features.py::test_number_positives_for_lowercase PASSED         [ 64%]
test_seq_features.py::test_number_positives_for_inv

In [7]:
#add larger encompassing test for is_valid_sequence
!pytest -v test_seq_features.py

platform win32 -- Python 3.8.11, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- C:\Users\Vivek\anaconda3\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\Vivek\git\bootcamp
plugins: anyio-2.2.0
collecting ... collected 15 items

test_seq_features.py::test_number_negatives_for_single_AA PASSED         [  6%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 13%]
test_seq_features.py::test_number_negatives_for_short_sequence PASSED    [ 20%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [ 26%]
test_seq_features.py::test_number_negatives_for_invalid_amino_acid PASSED [ 33%]
test_seq_features.py::test_number_positives_single_R_K_or_H PASSED       [ 40%]
test_seq_features.py::test_number_positives_for_empty PASSED             [ 46%]
test_seq_features.py::test_number_positives_for_short_sequences PASSED   [ 53%]
test_seq_features.py::test_number_positives_for_lowercase PASSED         [ 60%]
test_seq_features.py::test_number_positives_for_inv

In [9]:
%load_ext watermark
%watermark -v -p bioinfo_dicts,pytest,jupyterlab

The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
Python implementation: CPython
Python version       : 3.8.11
IPython version      : 7.27.0

bioinfo_dicts: unknown
pytest       : 6.2.4
jupyterlab   : 3.1.7

