# Unit Testing for DS in Python

**CI** (continuous integration): runs all unit tests whenever code is pushed, preventing "bad code" from reaching production.

**Unit**: any small, independent piece of code.

**Integration test**: checks that multiple units work well together, rather than each one independently.

**End-to-end test**: checks the whole software at once.


## Unit testing basics

**Basic test files**  
The **test_** prefix marks a file as containing unit tests (for example, test_row_to_list.py). These files are also called **test modules**. An example test module, test_row_to_list.py:

In [None]:
import pytest
# Assumption: row_to_list() lives in preprocessing_helpers, like convert_to_int()
from preprocessing_helpers import row_to_list

# The test_ prefix marks a unit test, not a regular function
def test_for_clean_row():
    # General form: assert boolean_expression
    assert row_to_list("1,801\t201,411\n") == ["1,801", "201,411"]

# Case where the function should return None
# Use `is` (not ==) when comparing with None
def test_for_missing_value():
    assert row_to_list("\t29,0\t") is None

Another example:

In [None]:
# Import the pytest package
import pytest

# Import the function convert_to_int()
from preprocessing_helpers import convert_to_int

# Complete the unit test name by adding a prefix
def test_on_string_with_one_comma():
  # Complete the assert statement
  assert convert_to_int("2,081") == 2081

To run a test:

In [None]:
pytest test_row_to_list.py

Output:
- . represents a passing test and F a failing one.
- A test that fails with an AssertionError usually means the function has a bug that needs to be fixed.
- Any other type of exception (like NameError) usually means something is wrong with the test itself: it crashed before the assert could be verified.
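
The two kinds of failure can be illustrated with a small sketch. The function buggy_double() and the helper outcome() are hypothetical and written only for this example; outcome() loosely imitates what pytest reports and is not how pytest works internally. The checks are plain functions without the test_ prefix, so pytest would not collect them.

```python
def buggy_double(x):
    # Hypothetical function with a deliberate bug: it should return x * 2
    return x + 1


def check_with_buggy_function():
    # The check is fine; the function is wrong, so this raises AssertionError
    assert buggy_double(3) == 6


def check_with_typo():
    # Typo in the function name: NameError is raised before the assert
    # can be verified, so the problem is in the test, not in the function
    assert bugy_double(3) == 6  # noqa: F821


def outcome(check):
    """Roughly classify a run the way the bullets above describe."""
    try:
        check()
    except AssertionError:
        return "F: the function has a bug"
    except Exception as error:
        return "F: the test is broken ({})".format(type(error).__name__)
    return ". (pass)"


print(outcome(check_with_buggy_function))
print(outcome(check_with_typo))
```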

## Intermediate unit testing

assert accepts an optional second argument, a message that is shown when the assertion fails:

In [None]:
def test_for_missing_value_with_message():

    actual = row_to_list("\t293,410\t")

    expected = None

    message = ("row_to_list('\t293,410\t') "
               "returned {0} instead "
               "of {1}".format(actual, expected)
              )

    # The message is only shown if the assertion fails
    assert actual is expected, message

When **comparing floats**, use pytest.approx():

In [3]:
assert 0.1 + 0.1 + 0.1 == pytest.approx(0.3)

It also works with NumPy arrays:

In [9]:
assert np.array([0.1 + 0.1]) == pytest.approx(np.array([0.2]))
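
A short sketch of why pytest.approx() is needed: a plain == comparison fails because 0.1 has no exact binary representation. The rel and abs keyword arguments of pytest.approx() set the tolerance explicitly; the numbers below are arbitrary illustration values.

```python
import pytest

# Binary floating point cannot represent 0.1 exactly, so the sum is slightly off
total = 0.1 + 0.1 + 0.1
print(total == 0.3)                 # False
print(total == pytest.approx(0.3))  # True

# Tolerances can be set explicitly with the rel and abs keyword arguments
assert 100.4 == pytest.approx(100, abs=0.5)
assert 100.4 != pytest.approx(100, abs=0.1)
```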

**Multiple assert statements**

In [1]:
import pytest


def test_on_string_with_one_comma():
    return_value = convert_to_int("2,081")
    
    assert isinstance(return_value, int)
    assert return_value == 2081

General template for a context:

In [None]:
with context_manager:
    # Does something when entering context
    print("This is part of the context")
    # Does something when leaving the context
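
To see the enter and leave steps of the template in action, here is a minimal sketch built with contextlib.contextmanager from the standard library. The names demo_context and events are made up for this example.

```python
from contextlib import contextmanager

events = []


@contextmanager
def demo_context():
    events.append("enter")     # runs when entering the with block
    try:
        yield
    finally:
        events.append("exit")  # runs when leaving, even after an exception


with demo_context():
    events.append("body")

print(events)  # ['enter', 'body', 'exit']
```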

Using pytest.raises(arg) to **test for exceptions**:

In [None]:
with pytest.raises(ValueError):
    # Does nothing on entering the context
    print("Part of the context")
    # On leaving: if the context raised ValueError, silence it
    # On leaving: if the context did not raise ValueError, raise an exception

In [7]:
# The test passes when the right exception is raised
with pytest.raises(ValueError):
    raise ValueError

In [6]:
# The test fails if no exception is raised
with pytest.raises(ValueError):
    pass

Failed: DID NOT RAISE <class 'ValueError'>

What if we need to check the exception message? Capture the exception and check the message on it after:

In [22]:
with pytest.raises(ValueError) as exception:
    raise ValueError('a')
assert exception.match("a")

Here the ValueError is raised as expected, but its message does not match the pattern, so match() raises an AssertionError:

In [23]:
with pytest.raises(ValueError) as exception:
    raise ValueError('a')
assert exception.match("b")

AssertionError: Pattern 'b' not found in 'a'
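
The message check can also be written with the match keyword argument of pytest.raises(), or by inspecting the captured exception through its value attribute. The sketch below uses a hypothetical function parse_age(), invented only for this illustration.

```python
import pytest


def parse_age(text):
    # Hypothetical function used only for this illustration
    age = int(text)
    if age < 0:
        raise ValueError("age must be non-negative, got {}".format(age))
    return age


def test_negative_age_with_match():
    # match= takes a regular expression and searches the exception message
    with pytest.raises(ValueError, match="non-negative"):
        parse_age("-3")


def test_negative_age_with_exception_info():
    with pytest.raises(ValueError) as exception_info:
        parse_age("-3")
    # exception_info.value is the exception instance itself
    assert str(exception_info.value) == "age must be non-negative, got -3"
```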

A **well tested function** has tests for these argument types:
- Bad arguments: arguments that raise an exception when passed;
- Special arguments: boundary values (neighbors of the acceptable values) and arguments with special logic;
- Normal arguments: not any of the above.

Recommended: one test for each special case and at least two for normal arguments.

**Tip**: using mapping from arguments to tuples
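
One possible way to keep the three argument types organized is a table of arguments and expected results fed to pytest.mark.parametrize. The function convert_to_int_sketch() below is a hypothetical stand-in written for this example; its behavior (TypeError for non-strings, ValueError for strings that are not numbers) is an assumption and not necessarily that of convert_to_int().

```python
import pytest


def convert_to_int_sketch(text):
    # Hypothetical stand-in written for this example only
    if not isinstance(text, str):
        raise TypeError("expected a string")
    return int(text.replace(",", ""))


# Bad arguments: passing them raises an exception
@pytest.mark.parametrize("bad_argument, error", [
    (None, TypeError),
    (2081, TypeError),
    ("", ValueError),
    ("abc", ValueError),
])
def test_bad_arguments(bad_argument, error):
    with pytest.raises(error):
        convert_to_int_sketch(bad_argument)


# Special arguments: a boundary value and the smallest value with a comma
@pytest.mark.parametrize("argument, expected", [("0", 0), ("1,000", 1000)])
def test_special_arguments(argument, expected):
    assert convert_to_int_sketch(argument) == expected


# Normal arguments: at least two
@pytest.mark.parametrize("argument, expected",
                         [("2,081", 2081), ("1,034,891", 1034891)])
def test_normal_arguments(argument, expected):
    assert convert_to_int_sketch(argument) == expected
```

Each tuple in a parametrize list becomes a separate test in the pytest report, so a failure points at the exact argument that caused it.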


**Test Driven Development** (TDD): writing unit tests before implementations.

## Test Organization and Execution

The tests folder mirrors the application folder. A one-to-one correspondence between Python modules and test modules is recommended.

**Test class** (PyTest): container for a single unit's tests

In [25]:
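class TestFunctionName(object):  # (object) is optional in Python 3
    def test_case1(self):
        pass  # placeholder: assertions for the first case go here

    def test_case2(self):
        pass  # placeholder: assertions for the second case go here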

## Testing Models, Plots and Much More

Example of workflow: setup, assert, and teardown.

In [None]:
def test_on_raw_data():
    # Setup: create the raw data file
    preprocess(raw_data_file_path,
               clean_data_file_path)
    
    with open(clean_data_file_path) as f:
        # splitlines() drops the trailing "\n" that readlines() would keep
        lines = f.read().splitlines()
    
    first_line = lines[0]
    assert first_line == '1801\t20'
    
    second_line = lines[1]
    assert second_line == '2002\t333'
    
    # Teardown: remove raw and clean data file

In PyTest, we use a **fixture** for the setup and teardown:

In [None]:
@pytest.fixture
def my_fixture():
    # Do setup here

    yield data  # the value the test receives

    # Do teardown here

In [None]:
def test_something(my_fixture):
    ...
    data = my_fixture
    ...
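
How the yield separates setup from teardown can be sketched with a plain generator and the standard library only. This illustrates the idea and is not pytest's actual implementation; the name raw_file_resource and the file content are made up for the example.

```python
import os
import tempfile


def raw_file_resource():
    # Setup: everything before the yield
    handle, path = tempfile.mkstemp(suffix=".txt")
    os.close(handle)
    with open(path, "w") as f:
        f.write("1,801\t201,411\n")

    yield path  # the test body runs while the generator is paused here

    # Teardown: everything after the yield
    os.remove(path)


resource = raw_file_resource()
path = next(resource)              # run the setup, receive the yielded value
exists_during_test = os.path.exists(path)
next(resource, None)               # resume after the yield: run the teardown
exists_after_test = os.path.exists(path)

print(exists_during_test, exists_after_test)  # True False
```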

For the example above:

In [None]:
import os
import pytest

@pytest.fixture
def raw_and_clean_data_file():
    raw_data_file_path = "raw.txt"
    clean_data_file_path = "clean.txt"
    
    with open(raw_data_file_path, "w") as f:
        f.write("asasasas"
                "asfrgehr")
    
    yield raw_data_file_path, clean_data_file_path
    
    os.remove(raw_data_file_path)
    os.remove(clean_data_file_path)
    

In [None]:
import os
import pytest

# preprocess() is assumed to be imported from the module under test
def test_on_raw_data(raw_and_clean_data_file):

    raw_path, clean_path = raw_and_clean_data_file
    preprocess(raw_path, clean_path)

    with open(clean_path) as f:
        # splitlines() drops the trailing "\n" that readlines() would keep
        lines = f.read().splitlines()

    first_line = lines[0]
    assert first_line == '1801\t20'

    second_line = lines[1]
    assert second_line == '2002\t333'

    # No teardown here: the fixture removes both files after the test

PyTest provides built-in fixtures, like **tmpdir**. It creates a temporary directory during setup and deletes it during teardown. We can do **fixture chaining** using tmpdir and our own fixture:

In [None]:
# setup of tmpdir() -> setup of raw_and_clean_data_file
# -> test -> teardown of raw_and_clean_data_file() 
# -> teardown of tmpdir()

@pytest.fixture
def raw_and_clean_data_file(tmpdir):
    
    raw_data_file_path = tmpdir.join("raw.txt")
    clean_data_file_path = tmpdir.join("clean.txt")
    
    with open(raw_data_file_path, "w") as f:
        f.write("asasasas"
                "asfrgehr")
    
    yield raw_data_file_path, clean_data_file_path
    
    # Now we can omit the teardown: tmpdir removes the files for us

**Mocking**  
- Test results should depend only on the behavior of the function being tested, not on its dependencies.
- Mocking means testing a function independently of its dependencies.
- We need two things: the **pytest-mock** package (installed separately) and unittest.mock (part of the standard library).


**MagicMock** and **mocker.patch()**  
- We want to replace potentially buggy dependencies with unittest.mock.MagicMock()
- mocker.patch("dependency name with module name") replaces the dependency and returns the MagicMock object that stands in for it
- During testing, the MagicMock() object can be programmed to behave as a bug-free component
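
A minimal sketch of what "programming" a MagicMock looks like, using only unittest.mock. The names db_write_mock and row_to_list_mock and the lambda are illustrative assumptions.

```python
from unittest.mock import MagicMock, call

# return_value: the mock returns the same programmed value on every call
db_write_mock = MagicMock(return_value=10)
print(db_write_mock("any", "arguments"))  # 10

# side_effect: calls are forwarded to a function that computes the result
row_to_list_mock = MagicMock()
row_to_list_mock.side_effect = lambda row: row.rstrip("\n").split("\t")
print(row_to_list_mock("1,801\t201,411\n"))  # ['1,801', '201,411']

# Mocks also record how they were called, which can be asserted on
db_write_mock.assert_called_once_with("any", "arguments")
assert row_to_list_mock.call_args_list == [call("1,801\t201,411\n")]
```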

In [None]:
def row_to_list_bug_free(row):
    # Dictionary with correct results for our desired behavior
    # This is like a row_to_list in dict form
    return_values = {
        '1,801\t201,411\n':["1,801", "201,411"],
        '1,7675,112\n': None
    }
    return return_values[row]

In [None]:
# Pass mocker as an argument
# In this case, we want to test preprocess(), not the functions
# it is built from, like row_to_list() and convert_to_int()
def test_on_raw_data(raw_and_clean_data_file,
                     mocker):
    raw_path, clean_path = raw_and_clean_data_file

    # Works like a bug-free replacement of row_to_list
    row_to_list_mock = mocker.patch(
        "data.preprocessing_helpers.row_to_list")
    row_to_list_mock.side_effect = row_to_list_bug_free

    # From here on, when we call preprocess (the function being tested),
    # the bug-free version of row_to_list will be used
    preprocess(raw_path, clean_path)

    # Read the output back before checking it
    with open(clean_path) as f:
        lines = f.read().splitlines()

    first_line = lines[0]
    assert first_line == '1801\t20'

    second_line = lines[1]
    assert second_line == '2002\t333'
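
Besides checking the output, a test can check how the mocked dependency was called, through the mock's call_args_list attribute. The sketch below uses unittest.mock.patch.object instead of mocker.patch() so that it runs without pytest-mock. The helpers namespace and count_valid_rows() are hypothetical stand-ins for a real module and for preprocess().

```python
from types import SimpleNamespace
from unittest.mock import call, patch

# Hypothetical stand-in for a helpers module, used only in this sketch
helpers = SimpleNamespace(
    row_to_list=lambda row: row.rstrip("\n").split("\t"))


def count_valid_rows(rows):
    # Function under test: it depends on helpers.row_to_list()
    return sum(1 for row in rows if helpers.row_to_list(row) is not None)


def row_to_list_bug_free(row):
    return_values = {
        "1,801\t201,411\n": ["1,801", "201,411"],
        "1,7675,112\n": None,
    }
    return return_values[row]


def test_count_valid_rows():
    rows = ["1,801\t201,411\n", "1,7675,112\n"]
    with patch.object(helpers, "row_to_list",
                      side_effect=row_to_list_bug_free) as row_to_list_mock:
        assert count_valid_rows(rows) == 1
    # The mock was called once per row, with exactly these arguments
    assert row_to_list_mock.call_args_list == [call(rows[0]), call(rows[1])]
```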
    

# Demystifying the Patch Function (Lisa Roach)

MagicMock  

**Target**: 'package.module.ClassName'    
**When should you mock**? When you don't want to actually call an object

my_module.py has two functions:

In [None]:
def foo():
    x = db_write()
    return x

def db_write():
    [...]

We want to test foo without calling db_write, so we mock db_write. The test.py file would be:

In [None]:
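import unittest
from unittest.mock import patch

import my_module

class TestFoo(unittest.TestCase):
    @patch('my_module.db_write')
    def test_foo(self, mock_write):
        # Program the mock; otherwise foo() returns a MagicMock, not 10
        mock_write.return_value = 10
        x = my_module.foo()
        self.assertEqual(x, 10)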

We are basically replacing db_write with a MagicMock when testing.
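
patch() can also be used as a context manager, which makes it easy to see that the replacement only lasts for the duration of the block. To keep the sketch self-contained, my_module is built in memory here. The body of db_write() (raising RuntimeError) is an assumption standing in for the real database write.

```python
import sys
import types
from unittest.mock import patch

# Build a stand-in my_module in memory so the sketch is self-contained
source = '''
def db_write():
    raise RuntimeError("would touch a real database")

def foo():
    x = db_write()
    return x
'''
my_module = types.ModuleType("my_module")
exec(source, my_module.__dict__)
sys.modules["my_module"] = my_module

# The target string names the module and the attribute to replace
with patch("my_module.db_write", return_value=10) as mock_write:
    result = my_module.foo()

print(result)  # 10
mock_write.assert_called_once_with()

# Outside the with block the original db_write is back in place
try:
    my_module.foo()
    restored = False
except RuntimeError:
    restored = True
print(restored)  # True
```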