Steve Bond edited this page Sep 6, 2017 · 14 revisions

The goal is to cover over 90% of the BuddySuite code base. We have selected pytest as our testing framework and implemented an output hashing paradigm to compare test runs against expected results. Hashes were chosen over full string comparisons to keep the amount of test data manageable; the alternatives were many hundreds of individual files or many thousands of lines within the test scripts themselves. The unfortunate tradeoff of the hashing system is a less informative error message when a test fails, but there are workarounds (see below).

Installing pytest

If you installed the Anaconda version of Python3, you likely already have pytest. Try updating it, and install if need be.

conda update pytest

or

conda install pytest

If you are not using Anaconda, then pip is the way to go.

pip install pytest

There are also a number of add-ons that make life better. I recommend installing them as well.

pip install pytest-xdist; pip install pytest-cov; pip install pytest-colordots

General structure of test scripts

  • Headers: Hashbang, encoding lines, and any notes
  • Imports
  • Non-test functions: A hashing function, file path generator to the unit test resource directory, and a Resource class are standard
  • List of resource files: Items may be called individually but this list is more useful for loops and comprehensions
  • Instantiation tests: These ensure that the buddy class is operating correctly
  • List of buddy objects: Created from the list of resource files and is in the same order
  • Helper function tests
  • Main API function tests (organized alphabetically by method name, not the short-form flags)
  • Command line UI tests

Tests are arranged in alphabetical order where appropriate.

Getting the full file path of a unit test resource file with resource()

Every developer will clone the BuddySuite repo in a different location on their system, so a small function has been written to get the appropriate file paths of resource files.

seqbuddy_obj = Sb.SeqBuddy(resource('Mnemiopsis_cds.fa'))
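For orientation, here is a minimal sketch of how such a helper could be written. It is not the BuddySuite implementation: the directory name "unit_test_resources" and the optional base_dir parameter are assumptions made for illustration.

```python
import os


def resource(file_name, base_dir=None):
    """Return the full path of a unit test resource file.

    Assumption: resources live in a 'unit_test_resources' directory that sits
    next to the test script. base_dir is a hypothetical override for that
    script directory; it is not part of the real helper.
    """
    if base_dir is None:
        # Resolve relative to this test module, not the current working directory
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, "unit_test_resources", file_name)
```

Anchoring the path to the test module rather than the working directory is what lets the same call work regardless of where the repo was cloned or where pytest was launched.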

Resource classes

These classes are used to organize resources (file paths and buddy objects), and to retrieve them easily using letter codes. Interact with the objects created in each test module (sb_resources, alb_resources, pb_resources) with the following four methods:

get(code="", mode="objs")

Returns an ordered dictionary of buddy objects or paths.

get_list(code="", mode="objs")

Returns all objects/paths as a list.

get_one(code, mode="objs")

Returns a single object/path.

deets(code)

Returns a dictionary with the full num/type/format of the code provided. May also include other useful facts, for example the number of trees in a resource.

Letter codes

Where a method can take an empty code, it will return all results for a given code class (number, type, format) unless values are specified. For example, the code "p c" restricts AlignBuddy resources to protein (p) alignments in CLUSTAL (c) format, but will return both single alignments and multiple alignments because neither 'o' nor 'm' was specified.

Number of alignments/trees in a resource

code Number
o one
m multi

Type of sequence in a resource

code Type
p protein
d DNA
r RNA

File formats

code Format
c Clustal
f FASTA
g GenBank
n NEXUS
k Newick
l NeXML
py PHYLIP
pr PHYLIP-relaxed
pss PHYLIP-sequential-strict
psr PHYLIP-sequential-relaxed
s Stockholm

Mode

If you want buddy objects returned, then just leave the mode argument as its default (i.e., 'objs'). To return paths to the resource files instead, set mode='paths'.
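The rule that an unspecified code class matches everything can be sketched as below. The letter tables are copied from this page, but parse_code() and CODE_CLASSES are hypothetical names for illustration and do not reflect how the Resource classes are actually implemented.

```python
# Letter codes copied from the tables above; the parsing logic is a hypothetical sketch.
CODE_CLASSES = {
    "num": {"o": "one", "m": "multi"},
    "type": {"p": "protein", "d": "DNA", "r": "RNA"},
    "format": {"c": "Clustal", "f": "FASTA", "g": "GenBank", "n": "NEXUS",
               "k": "Newick", "l": "NeXML", "py": "PHYLIP", "pr": "PHYLIP-relaxed",
               "pss": "PHYLIP-sequential-strict", "psr": "PHYLIP-sequential-relaxed",
               "s": "Stockholm"},
}


def parse_code(code=""):
    """Split a space-separated code string into the letters selected per class.

    A class with no letters in the code string falls back to all of its letters.
    """
    letters = code.split()
    known = {letter for table in CODE_CLASSES.values() for letter in table}
    unknown = [letter for letter in letters if letter not in known]
    if unknown:
        raise ValueError("Unknown letter code(s): %s" % ", ".join(unknown))

    selection = {}
    for class_name, table in CODE_CLASSES.items():
        chosen = [letter for letter in letters if letter in table]
        selection[class_name] = chosen if chosen else list(table)
    return selection
```

With this sketch, parse_code("p c") keeps both 'o' and 'm' in the number class because neither was named, which mirrors the "p c" example above.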

Writing a single unit test

Pytest looks for and executes all functions whose names begin with 'test_', and relies on assert statements as the basis of testing.

def test_instantiate_alignbuddy_from_file():
    assert type(Alb.AlignBuddy(alb_resources.get_one("o d f", mode="paths"))) == Alb.AlignBuddy

Creating hashes

If the output of a BuddySuite tool is a new/modified buddy object, then the contents of that object must be converted to an md5 hash before doing an assert comparison. Each test script has its own 'to_hash()' function (e.g., seqs_to_hash() in SeqBuddy and alignment_to_hash() in AlignBuddy), which converts the printed output of a buddy object to an md5 hash string.

def test_add_feature_pattern():
    tester = sb_resources.get_one("d f")
    tester = Sb.add_feature(tester, 'test', (1, 100), _pattern='α4')
    assert seqs_to_hash(tester) == '7330c5905e216575b8bb8f54db3a0610'

The comparison hash on the right of the assert statement MUST come from your system's md5 program which has been fed the output of your new BuddySuite tool through a pipe in your terminal window.

$: sb Mnemiopsis_cds.fa --my_new_tool | md5

This ensures that the actual output of your tool agrees with what's in the buddy object. Getting the proper hash can be a bit more involved depending on the expected output of your tool; if you run into any trouble let the core developers know so they can help you out.
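As a rough sketch of what a 'to_hash()' helper does, the functions below hash the printed form of an object with Python's hashlib. The bodies are assumptions for illustration, in particular the use of str() to obtain the printed output and the "hash" default for mode; the real helpers in each test script may differ.

```python
import hashlib


def string2hash(text):
    """Return the md5 hex digest of a string (encoded as UTF-8)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def seqs_to_hash(buddy_obj, mode="hash"):
    """Hash the printed form of an object, or return the raw text for debugging.

    Assumption: str(buddy_obj) yields exactly what the command line tool prints.
    """
    output = str(buddy_obj)
    if mode == "string":
        return output
    return string2hash(output)
```

Because md5 is sensitive to every byte, a single trailing newline that appears in the terminal output but not in the string being hashed (or the reverse) is enough to change the digest.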

Writing parametrized unit tests (i.e., looping)

It is often useful to run the same test on some (or all) of the shared buddy objects. Pytest has a special mark @pytest.mark.parametrize() that will allow you to iterate a test over a list.

hashes = ["e4a358ca57aca0bbd220dc6c04c88795", "3366fcc6ead8f1bba4a3650e21db4ec3",
          "365bf5d08657fc553315aa9a7f764286", "5891348e8659290c2355fabd0f3ba4f4"]
hashes = [(sb_obj, hashes[indx]) for indx, sb_obj in enumerate(sb_resources.get_list("d f g n s"))]

@pytest.mark.parametrize("seqbuddy,next_hash", hashes)
def test_complement(seqbuddy, next_hash):
    tester = Sb.complement(seqbuddy)
    assert seqs_to_hash(tester) == next_hash

Note that the first argument into @pytest.mark.parametrize() is a comma-separated list (in string format) of the parameters expected by the actual test function (e.g., "seqbuddy,next_hash"), while the second argument is the list to be looped over.
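The list handed to @pytest.mark.parametrize() is just a list of tuples, so it can also be built with zip(). The helper below is a hypothetical convenience, not part of BuddySuite; it adds a length check so that a missing or extra hash is reported when the module is loaded rather than partway through the run.

```python
def pair_with_hashes(objects, hashes):
    """Pair each object with its expected hash, refusing mismatched lengths."""
    objects = list(objects)
    hashes = list(hashes)
    if len(objects) != len(hashes):
        raise ValueError("Got %d objects but %d hashes" % (len(objects), len(hashes)))
    return list(zip(objects, hashes))


# Intended use inside a test module (sb_resources and hashes as defined above):
# @pytest.mark.parametrize("seqbuddy,next_hash",
#                          pair_with_hashes(sb_resources.get_list("d f g n s"), hashes))
```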

Testing for expected exceptions

If your function raises exceptions, this code also needs to be covered by a test! Fortunately, pytest has a handy context manager for this:

def test_complement_pep_exception():
    protein_sb_obj = sb_resources.get_one("p f")
    with pytest.raises(TypeError):
        Sb.complement(protein_sb_obj)

Dealing with test failures

The reason for a failure may not be immediately clear from the pytest traceback because of the hashing paradigm (it just says the hashes don't match up!). This can be particularly troublesome when parametrizing your test. To sort out the problem you will need to check the difference between the command line output you used to generate the original hash and the state of the object being passed into the to_hash() function in the test script.

  1. Run the tool from the command line and look at the output for obvious clues. Is there a new warning being printed to stdout? Is something clearly broken?

  2. If all looks normal, copy the output into Diff Checker.

  3. Change the to_hash() mode to 'string', and write the output to a file

def test_merge():
    tester = [sb_resources.get_one("d f"), sb_resources.get_one("p f")]
    tester = Sb.merge(tester)
    with open("test_output.fa", "w") as ofile:
        ofile.write(seqs_to_hash(tester, mode='string'))
    #assert seqs_to_hash(tester) == 'ce306df2c8d57c59baff51733ddb9ddc'

  4. If the test is parametrized, write an if statement to catch the proper hash

hashes = ["e4a358ca57aca0bbd220dc6c04c88795", "3366fcc6ead8f1bba4a3650e21db4ec3",
          "365bf5d08657fc553315aa9a7f764286", "5891348e8659290c2355fabd0f3ba4f4"]
hashes = [(sb_obj, hashes[indx]) for indx, sb_obj in enumerate(sb_resources.get_list("d f g n s"))]

@pytest.mark.parametrize("seqbuddy,next_hash", hashes)
def test_complement(seqbuddy, next_hash):
    tester = Sb.complement(seqbuddy)
    if next_hash == "365bf5d08657fc553315aa9a7f764286":
        with open("test_output.fa", "w") as ofile:
            ofile.write(seqs_to_hash(tester, mode='string'))
    else:
        assert seqs_to_hash(tester) == next_hash

  5. Copy the contents of that file to Diff Checker and see what's different.
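If you would rather compare the two outputs locally, Python's standard difflib module can produce the same kind of comparison. This is an optional alternative sketch, and show_diff() is a hypothetical helper name.

```python
import difflib


def show_diff(expected_text, actual_text):
    """Return a unified diff between the expected and actual output strings."""
    diff = difflib.unified_diff(
        expected_text.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    )
    # An empty string means the two outputs are identical
    return "".join(diff)
```

Feed it the command line output you used to generate the original hash and the contents of the file written by the test; lines starting with '-' appear only in the expected output and lines starting with '+' only in the test output.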

Testing code in the command line UI

Command line tests are structured a bit differently because we need to emulate a command line call and capture the output. Pytest includes an object called 'capsys' that does the heavy lifting for us if we pass it into the test function as an argument:

def test_num_seqs_ui(capsys):
    test_in_args = deepcopy(in_args)
    test_in_args.num_seqs = True
    Sb.command_line_ui(test_in_args, sb_resources.get_one("d f"), skip_exit=True)
    out, err = capsys.readouterr()
    assert out == '13\n'

Notice that we have made a copy of in_args and manually set the appropriate parameter (the exact values used will depend on how your tool is implemented in argparse), and that the appropriate command_line_ui() function was called with the skip_exit parameter set to True (this is required to keep the script going).
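The reason for copying in_args can be seen in the sketch below, which assumes in_args is an argparse.Namespace of default values shared by every test in the module. The two attributes shown and the make_args() helper are hypothetical.

```python
import argparse
from copy import deepcopy

# Hypothetical shared defaults; each real test module builds its own in_args.
in_args = argparse.Namespace(num_seqs=False, order_features_alphabetically=False)


def make_args(**overrides):
    """Return an independent copy of in_args with the given flags set."""
    test_in_args = deepcopy(in_args)
    for name, value in overrides.items():
        if not hasattr(test_in_args, name):
            # Catch typos instead of silently adding a new attribute
            raise AttributeError("in_args has no parameter named %r" % name)
        setattr(test_in_args, name, value)
    return test_in_args
```

Because each test modifies only its own copy, a flag set in one test cannot leak into the next one.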

If the output is a new or modified buddy object, then hashes are once again necessary:

def test_order_features_alphabetically_ui(capsys):
    test_in_args = deepcopy(in_args)
    test_in_args.order_features_alphabetically = [True]
    Sb.command_line_ui(test_in_args, sb_resources.get_one("d f"), skip_exit=True)
    out, err = capsys.readouterr()
    assert string2hash(out) == 'b831e901d8b6b1ba52bad797bad92d14'
