
Update benchmarking script with new CLI #56

Draft · wants to merge 19 commits into master
Conversation

victorlin (Collaborator) commented Apr 25, 2020

Opening this PR to track progress for #27. Keeping as draft until ready.

TODO:

  • split into 2 scripts
    • cov_benchmark.py
      • design high-level CLI
      • implement simulation parameters
      • implement alignment parameters
  • implement custom subprocess parameters (msbar, art_illumina, bowtie2)
      • convert counts to proportions
    • pangenome_cov_benchmark.py
      • per genome in pangenome: generate single-record FASTA, run cov_benchmark.py with predefined parameters
      • aggregate results in .csv file
  • update README
  • write containerized unit tests (?)
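The high-level CLI in the TODO list above could be sketched roughly as follows. This is illustrative only: the flag names, defaults, and structure are assumptions, not the actual interface of cov_benchmark.py.

```python
import argparse

def build_parser():
    # Hypothetical high-level CLI for cov_benchmark.py; every flag name
    # and default here is illustrative, not the script's real interface.
    p = argparse.ArgumentParser(
        prog="cov_benchmark.py",
        description="Simulate diverged reads and benchmark mapping coverage")
    p.add_argument("--genome", required=True,
                   help="single-record input FASTA")
    p.add_argument("--divergence", type=float, default=0.05,
                   help="simulated point-mutation rate passed to msbar")
    p.add_argument("--read-length", type=int, default=100,
                   help="read length passed to art_illumina")
    p.add_argument("--bowtie2-args", default="--very-sensitive-local",
                   help="extra arguments forwarded to bowtie2")
    return p

# Demo invocation with an example FASTA path.
args = build_parser().parse_args(["--genome", "example.fa"])
```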

@mathemage mathemage added this to Task In Progress in TODO List via automation Apr 26, 2020
mathemage (Collaborator) commented Apr 26, 2020

PR closes #27

@mathemage mathemage linked an issue Apr 26, 2020 that may be closed by this pull request
victorlin (Collaborator, Author) commented

I was able to do most of the finalizing work for this today. A few unit tests remain that I'm still figuring out how to implement, since msbar doesn't seem to support a random seed for generating reproducible output.

The updated README is mostly technical, without much insight into the biological context; @ababaian, if you have any comments or suggestions, feel free to modify directly or let me know!
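One common workaround for testing a tool with non-seedable random output, such as msbar, is to assert statistical properties of the result rather than exact bytes. The sketch below is a hypothetical illustration of that idea, not the PR's actual test code; the `divergence` helper and the fabricated mutated sequence are both assumptions.

```python
def divergence(ref: str, mutated: str) -> float:
    # Fraction of positions that differ between two equal-length
    # sequences: a crude proxy for msbar's point-mutation rate.
    assert len(ref) == len(mutated)
    diffs = sum(1 for a, b in zip(ref, mutated) if a != b)
    return diffs / len(ref)

# Property-based check: tolerate randomness, test the rate instead of
# the exact output. Here 'mutated' is fabricated deterministically for
# illustration (every 20th base changed, i.e. a 5% mutation rate).
ref = "ACGT" * 250
mutated = list(ref)
for i in range(0, len(mutated), 20):
    mutated[i] = "T" if mutated[i] != "T" else "A"
mutated = "".join(mutated)

# Assert the observed rate is close to the requested rate.
assert abs(divergence(ref, mutated) - 0.05) < 0.02
```

A real test would run msbar with a fixed mutation-rate setting and assert the observed divergence falls within a tolerance band, rather than comparing against a golden output file.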

ababaian (Owner) commented May 3, 2020

Hey @victorlin, can you remove the .fq files from the repo and host them on the S3 bucket instead? There is a test_data folder they would fit in.

--

I'm a bit tied up at the moment; I'll do a proper review tomorrow.

victorlin (Collaborator, Author) commented

@ababaian done - moved all the test input files to s3://serratus-public/test-data/benchmarker/. Also added a setup.sh script to copy the files into test/benchmarker/local/. Using local for the folder name since it's targeted by our repo's .gitignore.
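A setup.sh along these lines would pull the test inputs from the bucket into the git-ignored directory. This is a sketch, not the PR's actual script; the real download requires the AWS CLI and credentials, so the copy step here is guarded.

```shell
#!/usr/bin/env sh
# Sketch of setup.sh: fetch benchmarker test inputs from S3 into a
# git-ignored local/ directory (illustrative; the real script may differ).
DEST="test/benchmarker/local"
mkdir -p "$DEST"
if command -v aws >/dev/null 2>&1; then
    # Copy all test inputs from the public bucket into the local folder.
    aws s3 cp --recursive "s3://serratus-public/test-data/benchmarker/" "$DEST/"
else
    echo "aws CLI not found; skipping download" >&2
fi
```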

victorlin (Collaborator, Author) commented

A lot has happened since this was started. Is the benchmarker still needed?

The last task is unit testing msbar, but it seems non-trivial. We could merge this PR and track that as a separate task, or just close it out to make room for other exciting things.

rcedgar (Collaborator) commented May 24, 2020

IMO we don't need a mapping benchmark for the main Serratus search; we have settled on how to run bowtie2 and are unlikely to revisit benchmarking of mappers.

ababaian (Owner) commented
Ironically enough, I do think we /should/ revisit this, as there will be ways to improve on --very-sensitive-local. I just think that on the totem pole of priorities this has gotten bumped down quite a bit, as more pressing things always seem to come up. I'd say let's leave this open and swing back to it when time is a luxury we have.
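For context on what "improving on --very-sensitive-local" would mean: per the bowtie2 manual, that preset is shorthand for explicit seed and extension parameters, so tuning means varying those directly. The sketch below only builds the command strings for illustration; the index and file names are hypothetical, the tuned values are an arbitrary example sweep point, and bowtie2 is not actually executed.

```shell
# --very-sensitive-local expands (per the bowtie2 manual) to:
#   -D 20 -R 3 -N 0 -L 20 -i S,1,0.50
PRESET="--very-sensitive-local"
# Hypothetical sweep point varying the seed parameters directly.
TUNED="-D 25 -R 4 -N 1 -L 18 -i S,1,0.40"

# Hypothetical index/input/output names, for illustration only.
CMD="bowtie2 $PRESET -x cov_index -U reads.fq -S out.sam"
CMD_TUNED="bowtie2 $TUNED -x cov_index -U reads.fq -S out.sam"
echo "$CMD"
echo "$CMD_TUNED"
```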

@ababaian ababaian removed this from Task In Progress in TODO List Dec 9, 2020
Linked issue (may be closed by this pull request): Benchmarking statistics for divergence simulation tests