
DEVELOP-1437: Test data and automatic tests #46

Merged

Conversation

matrulda (Collaborator)

This PR adds test data and automatic tests for the pipeline.

In summary, this has been done:

  • The folder test_data has been added with the files needed to run a complete integration test of the pipeline. Among other things, it contains a run folder (210510_M03910_0104_000000000-JHGJL) from Illumina's public Demo space. The majority of files in this PR originate from this folder and do not need a detailed inspection.
  • The complete integration test can be run using the new test profile, like this:
nextflow run main.nf -profile dev,test,singularity
  • The folder tests contains:
    • Integration tests: In these tests the pipeline is run with the test profile and the output is then validated. Validation includes verifying that reports exist and that they contain the sections they should (a minimal sketch of this pattern follows this list).
    • Unit tests: These are unit tests for the Python scripts in bin that we use in the pipeline.
  • A GitHub Actions workflow has been added so that the tests run every time code is pushed to the repo.
  • All Python code has been formatted with black, and this is now enforced through a check in the test workflow.
  • I've added instructions on how to run the tests locally.
  • Paths to images have been updated so that we use images built for Singularity instead of Docker. Singularity can convert Docker images, but this became an issue when running the tests on GitHub's servers. Using native Singularity images also makes the pipeline faster and more efficient. Related to this change, I had to bump some of the software we use to new versions in order to find publicly hosted images. My assessment is that these upgrades (minor and patch) are safe.
  • The configs in config have been sorted into the subfolders nextflow_config and tools_config.
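
For illustration, a minimal sketch of what such an output check could look like. It assumes a session-scoped result_dir fixture that has already run the pipeline with the test profile; the report file name and section title below are placeholders, not the pipeline's actual output.

import os


def test_report_contains_expected_section(result_dir):
    # Placeholder report name and section title, for illustration only.
    report = os.path.join(str(result_dir), "multiqc_report.html")
    assert os.path.isfile(report)
    with open(report) as handle:
        content = handle.read()
    assert "General Statistics" in content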

GitEdvard (Contributor) left a comment:

Good work! I appreciate that the commits were relatively restricted in size and had good explanations. As you can see, I dived into the structure of the tests a bit. Let's see what you think about that! Aside from that, I think it's important to check whether the test directory is wiped between runs or not (if it's not, I think you should implement it).

Edit: I've checked now; the test directory is handled by tmpdir_factory, so you don't need to do anything.

test_data/test_config/fastq_screen.conf (resolved)
requirements-dev.txt (resolved)
README.md (outdated, resolved)
tests/integration_tests/test_validate_output.py (outdated, resolved)


@pytest.fixture(scope="session", autouse=True)
def result_dir(tmpdir_factory):
GitEdvard (Contributor):

When a function is named with a noun, I anticipate a rather simple fetch. But here, the main action of the test, namely starting a Nextflow pipeline, is somewhat hidden. I would like a separate and explicit call for this.
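
One way to read this suggestion, as a rough sketch with hypothetical names (the pipeline's real parameters may differ): move the pipeline launch into an explicitly named function and let the fixture only prepare the directory and delegate.

import subprocess

import pytest


def start_nextflow_pipeline(output_dir):
    # The explicit action: run the pipeline with the test profile.
    # The --output_dir parameter name is an assumption for this sketch.
    subprocess.run(
        ["nextflow", "run", "main.nf",
         "-profile", "dev,test,singularity",
         "--output_dir", str(output_dir)],
        check=True,
    )


@pytest.fixture(scope="session", autouse=True)
def result_dir(tmpdir_factory):
    # The fixture now only provides a directory; the pipeline start is
    # visible as a separate, explicit call.
    output_dir = tmpdir_factory.mktemp("results")
    start_nextflow_pipeline(output_dir)
    return output_dir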

matrulda (Collaborator Author):

Good point. I tried to address this in 58d7a4b

tests/integration_tests/test_validate_output.py (outdated, resolved)


def test_project_dirs_exist(project_reports_dir, projects):
    for project in projects:
GitEdvard (Contributor):

When reading tests like these, I would like an easily available picture of the folder tree generated by the pipeline. I find it a bit cumbersome that the folder paths are parameterized, and hence fragmented; to get a picture of the folder structure, I have to assemble them in a notebook on the side. The general advice I've gotten for automated tests is to favor readability and verboseness over the DRY principle. Could you consider writing them out as they are in each test? Like "for project in ['duck', 'wolf']", and project_reports_dir = r'path/to/reports'.
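
As a rough sketch of the more verbose style suggested here (the project names "duck" and "wolf" and the reports path are only placeholders, not the pipeline's actual layout):

import os


def test_project_dirs_exist(result_dir):
    # Paths and project names written out in the test itself, so the
    # expected folder tree is visible at a glance.
    project_reports_dir = os.path.join(str(result_dir), "reports", "projects")
    for project in ["duck", "wolf"]:
        assert os.path.isdir(os.path.join(project_reports_dir, project))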

matrulda (Collaborator Author):

Good point! I've tried to make it more readable in 58d7a4b. I think it better follows the Arrange, Act, Assert model now, even though the "Act" part comes from result_dir as input. I'd like to keep it this way to make use of the pytest fixture feature, but let me know if you still think it's too unclear.

matrulda (Collaborator Author) commented Feb 3, 2022

@GitEdvard Thanks for the comments! I've addressed them, let me know if you think it's ok.

GitEdvard (Contributor):

Great!

GitEdvard merged commit a0a20b1 into Molmed:dev on Feb 4, 2022