Feature to filter out refgene index source #61

Merged


@k-kom (Contributor) commented Aug 22, 2022

Problem

  • refGene.txt contains records on contigs such as chr6_cox_hap2, but sometimes we don't need contig records at all.
  • There is no way to ignore arbitrary rows in refGene.txt (or ncbiRefSeq.txt).

Implementation

  • I added an optional argument, filter-fns, to load-ref-genes and load-ref-seqs for filtering out unnecessary rows in these files.
  • I also renamed the function load-ncbi-file to load-genepred-file, because these files are in the GenePred format (refactoring only).
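The filter-fns idea can be sketched as follows. This is a minimal, self-contained sketch and not varity's actual implementation: it assumes each GenePred row has already been parsed into a map with a `:chr` key, and that filter-fns is a sequence of predicates that a row must all satisfy to be kept. The names `apply-filter-fns` and `main-chromosome?` are hypothetical, and treating any chromosome name containing `_` as a contig is a simplifying assumption based on UCSC-style names like chr6_cox_hap2.

```clojure
;; Hypothetical helper (not varity's API): keep only the rows that satisfy
;; every predicate in filter-fns. With no predicates, rows pass through unchanged.
(defn apply-filter-fns
  [filter-fns rows]
  (if (seq filter-fns)
    (filter (apply every-pred filter-fns) rows)
    rows))

;; Hypothetical predicate: true for chromosome names without "_" (e.g. "chr6"),
;; false for contig names such as "chr6_cox_hap2" or for a missing :chr.
(defn main-chromosome?
  [{:keys [chr]}]
  (boolean (and chr (not (re-find #"_" chr)))))

;; Toy rows, made up for illustration.
(def rows
  [{:name "tx1" :chr "chr6"}
   {:name "tx2" :chr "chr6_cox_hap2"}
   {:name "tx3" :chr "chrX"}])

(apply-filter-fns [main-chromosome?] rows)
;; => ({:name "tx1", :chr "chr6"} {:name "tx3", :chr "chrX"})
```

Passing the predicates as a sequence lets a caller combine several independent filters (for example, a chromosome filter and an accession filter) without the loader needing to know about any of them.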

…'s content

- For consistency, filtering out predicted-model accession numbers is now the caller's responsibility
- refGene.txt and ncbiRefSeq.txt are in [GenePred format](https://genome.ucsc.edu/FAQ/FAQformat.html#format9), not an NCBI-specific file format
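Since filtering out predicted-model accessions is now left to the caller, a caller that wants only curated RefSeq transcripts would supply that check itself. A minimal sketch, assuming rows are maps whose `:name` holds the transcript accession; `curated-accession?` is a hypothetical name, and the regex is the `^(NM|NR)_.+$` pattern visible in the diff reviewed in this PR. Predicted models carry XM_ or XR_ prefixes and are rejected. How the predicate is handed to load-ref-seqs (presumably via filter-fns) is an assumption here.

```clojure
;; Hypothetical caller-side predicate (not varity's API): true when the
;; accession in :name starts with NM_ or NR_ (curated RefSeq), false for
;; predicted models such as XM_/XR_ and for rows without a name.
(defn curated-accession?
  [{accession :name}]
  (boolean (and accession (re-find #"^(NM|NR)_.+$" accession))))

;; Toy rows with made-up accession suffixes.
(def rows
  [{:name "NM_tx1"} {:name "XM_tx2"} {:name "NR_tx3"} {:name nil}])

(filter curated-accession? rows)
;; => ({:name "NM_tx1"} {:name "NR_tx3"})
```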
@k-kom self-assigned this Aug 22, 2022
codecov bot commented Aug 22, 2022

Codecov Report

Merging #61 (b96df56) into master (e75733e) will increase coverage by 0.00%.
The diff coverage is 75.00%.

```diff
@@           Coverage Diff           @@
##           master      #61   +/-   ##
=======================================
  Coverage   45.63%   45.64%           
=======================================
  Files          16       16           
  Lines        1970     1974    +4     
  Branches       60       60           
=======================================
+ Hits          899      901    +2     
- Misses       1011     1013    +2     
  Partials       60       60           
```
| Impacted Files | Coverage Δ |
| --- | --- |
| src/varity/ref_gene.clj | 81.21% <75.00%> (-0.35%) ⬇️ |


@k-kom requested a review from federkasten August 22, 2022 10:03
@k-kom marked this pull request as ready for review August 22, 2022 10:13
```
@@ -58,24 +58,30 @@
(update m :cds-end-stat keyword)
(update m :exon-frames parse-exon-pos)))

(defn- load-ncbi-file
```
Member

Since we have already released this function, I think it would be better to keep it and mark it as deprecated for backward compatibility.

Contributor Author

@federkasten
Thank you for the review. That sounds good, but since this function is defined as private, any code using it from outside this namespace should be rewritten anyway. What do you think?

Member

Oops, sorry, I missed that this is a private function.

```
(when s
(re-find #"^(NM|NR)_.+$" s)))

(defn- load-genepred-file
```
Member

Thank you for finding an appropriate name! 🙏


@federkasten merged commit b41161f into chrovis:master Aug 30, 2022