parse repophlan demo #1 #15
genomesubsampler/parseRepophlan.py
Outdated
def parse_repophlan(repophlan_wscores_fp):
    """ Extract number of HGTs found.
testing
Priya is awesome. @sjanssen2
@serenejiang We just finished this PR! Can you please review?
I am a bit disappointed that I cannot make any comments, because the code seems very sane to me :-)
Hi @sjanssen2, thanks for reading the code! This is just a demo, though it may be slightly useful. Feel free to merge!
genomesubsampler/parseRepophlan.py
Outdated
def parse_repophlan(repophlan_wscores_fp):
    """ Compute basic statistics of RepoPhlAn-downloaded genomes
I think the PEP 8 standards say that you should end all sentences with periods, so please add a `.` at the end.
Furthermore, drop the leading whitespace. (I am not 100% sure I remember those standards correctly.)
I checked PEP257. You are right! I will revise.
cool
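For reference, a minimal sketch of the docstring style discussed in this thread, as I read PEP 257: the summary line starts immediately after the opening quotes and ends with a period. The function body is a placeholder, not the real implementation.

```python
def parse_repophlan(repophlan_wscores_fp):
    """Compute basic statistics of RepoPhlAn-downloaded genomes."""
    # Placeholder body: only the docstring style matters in this sketch.
    raise NotImplementedError


# Inspect the summary line: no leading whitespace, trailing period.
summary = parse_repophlan.__doc__.splitlines()[0]
print(repr(summary))
```

Tools that read `__doc__` (for example `help()`) then show the summary without a stray leading space.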
genomesubsampler/parseRepophlan.py
Outdated
    Parameters
    ----------
    repophlan_wscores_fp: string
The correct format is `repophlan_wscores_fp : str`.
In many other KL projects I saw `xxx : str`. However, in WGS-HGT it seems that most paragraphs were already written as `xxx: string`. I am okay with both, and I actually prefer `xxx : str`. But I wonder if this is part of PEP 8? I didn't see that.
Don't ask me :-) I find `xxx : str` easier to read.
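For what it's worth, the `name : type` layout with a space on both sides of the colon comes from the numpydoc docstring convention rather than from PEP 8. Below is a sketch of how the docstring might look in that style. The `Returns` type `list of str` is my assumption based on how `out` is used, and the body is a placeholder.

```python
def parse_repophlan(repophlan_wscores_fp):
    """Compute basic statistics of RepoPhlAn-downloaded genomes.

    Parameters
    ----------
    repophlan_wscores_fp : str
        File path to RepoPhlAn summary table with scores.

    Returns
    -------
    list of str
        Human-readable report of basic statistics of genomes.
    """
    # Placeholder body: only the docstring layout matters in this sketch.
    raise NotImplementedError


print(parse_repophlan.__doc__)
```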
genomesubsampler/parseRepophlan.py
Outdated
    Parameters
    ----------
    repophlan_wscores_fp: string
        file path to RepoPhlAn summary table with scores
Start with a capital letter and end with a period.
Okay will do.
genomesubsampler/parseRepophlan.py
Outdated
        Human-readable report of basic statistics of genomes
    """
    df = pd.read_table(repophlan_wscores_fp, index_col=0, header=0)
    out = []
Instead of using a list, I would prefer a dict, where the keys are the human-readable explanations and the values are the numbers/strings you gather from the dataframe. Your results will then be easier to use, because `res[1]` is less intuitive than `res['Number of RefSeq genomes']`.
@sjanssen2 I am afraid that I can't agree. `out` is a piece of sequential, multi-line information. Printing `res['Number of RefSeq genomes']` sounds awkward and imprecise. At this stage, these numbers are only there to inform the user.
OK. I am not really aware of what the purpose of this function is, so you are certainly right.
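One way the two views could be reconciled, sketched under assumptions: keep the statistics in a dict (which preserves insertion order in Python 3.7+) and derive the sequential report lines from it. `format_report` is a hypothetical helper, not part of this PR, and the RefSeq count is a made-up value for illustration.

```python
# Hypothetical statistics; the RefSeq count is invented for this sketch.
stats = {
    'Total number of genomes': 9,
    'Number of RefSeq genomes': 7,
}


def format_report(stats):
    """Render a label -> value mapping as report lines, in insertion order."""
    return ['%s: %s.' % (label, value) for label, value in stats.items()]


out = format_report(stats)
print('\n'.join(out))
```

Callers that need a single number can look it up by label, while the command-line output stays a plain sequence of lines.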
genomesubsampler/parseRepophlan.py
Outdated
                              file_okay=True),
              help='RepoPhlAn summary table with scores')
def _main(repophlan_wscores_fp):
    """ Parser for RepoPhlAn-downloaded genomes
Delete the leading whitespace and end with a period.
Sure!
genomesubsampler/parseRepophlan.py
Outdated
    """ Parser for RepoPhlAn-downloaded genomes
    """
    out = parse_repophlan(repophlan_wscores_fp)
    click.echo('\n'.join(out))
I think readability could be improved for the user
The current way is kind of comfortable to me. Do you have a suggestion?
class ParseRepophlanTests(TestCase):
    """ Tests for parseRepophlan.py """
My comments about PEP 8 from above apply to the docstrings here as well.
Sure!
        self.assertIn('Task completed.', res.output)

    str_basic_stats = ('Total number of genomes: 9.\n'
Why is this a class variable rather than an instance variable initialized in `setUp`? Constant names should be in all capital letters.
This is also a WGS-HGT legacy... Do you mean that I should write it as:
STR_BASIC_STATS = ('xxxxx...')
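A rough sketch of the two options being discussed, not the actual test file: a capitalized class-level constant that all tests share, next to a per-test fixture created in `setUp`. The test method and the `expected` attribute are hypothetical.

```python
import unittest


class ParseRepophlanTests(unittest.TestCase):
    # Class-level constant: defined once, shared by all tests, named in capitals.
    STR_BASIC_STATS = 'Total number of genomes: 9.\n'

    def setUp(self):
        # Per-test fixture: rebuilt before every test method runs.
        self.expected = self.STR_BASIC_STATS.splitlines()

    def test_expected_lines(self):
        # Hypothetical check that the fixture was derived from the constant.
        self.assertEqual(self.expected, ['Total number of genomes: 9.'])
```

An immutable string is safe to share at class level. `setUp` is the better place for anything a test might modify.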
@qiyunzhu said that this PR is mainly for demonstration purposes and is used to illustrate the pedantic process of having someone else comment on your code, which is often very helpful but sometimes also a little frustrating, because it is not only about right or wrong programming but also about style (which is no less important for long-term maintenance).
see above comment
Hello @sjanssen2, thank you for your careful comments on the coding style. I will respond and revise. @anupriyatripathi and @serenejiang, please refer to them as an example of how code review is typically practised.
Hi @sjanssen2, I think I took care of your comments. Want to take another look?
Compute basic statistics of RepoPhlAn-downloaded genomes (work in progress).