Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse repophlan demo #1 #15

Merged
merged 4 commits into from
May 24, 2017
Merged

parse repophlan demo #1 #15

merged 4 commits into from
May 24, 2017

Conversation

qiyunzhu
Copy link
Collaborator

Compute basic statistics of RepoPhlAn-downloaded genomes (work in progress).

@coveralls
Copy link

Coverage Status

Coverage decreased (-9.6%) to 90.4% when pulling c20351b on qiyunzhu:master into ce4e523 on biocore:master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-9.6%) to 90.4% when pulling c20351b on qiyunzhu:master into ce4e523 on biocore:master.

@qiyunzhu qiyunzhu changed the title (WIP) parse repophlan demo #1 parse repophlan demo #1 May 22, 2017
@coveralls
Copy link

coveralls commented May 22, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling e3070fc on qiyunzhu:master into ce4e523 on biocore:master.



def parse_repophlan(repophlan_wscores_fp):
""" Extract number of HGTs found.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Priya is awesome. @sjanssen2

@qiyunzhu
Copy link
Collaborator Author

@serenejiang We just finished this PR! can you please review?

@coveralls
Copy link

coveralls commented May 22, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 46922ea on qiyunzhu:master into ce4e523 on biocore:master.

Copy link
Collaborator

@sjanssen2 sjanssen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit disappointed that I cannot make any comments, because the code seems very sane to me :-)

@qiyunzhu
Copy link
Collaborator Author

Hi @sjanssen2 thanks for reading the code! Just a demo. Though it may be slightly useful. Feel free to merge!



def parse_repophlan(repophlan_wscores_fp):
""" Compute basic statistics of RepoPhlAn-downloaded genomes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pep8 standards say that you should end all sentences with periods, thus add a .
Furthermore, save the leading whitespace. (not 100% sure if I remember those standards correctly)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked PEP257. You are right! I will revise.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool


Parameters
----------
repophlan_wscores_fp: string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct format repophlan_wscores_fp : str

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From many other KL projects I saw this xxx : str. However in WGS-HGT it seems that most paragraphs were already written as xxx: string. I am okay to both, and I actually prefer xxx : str. But I wonder if this is part of PEP8? I didn't see that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't ask me :-) I find xxx : str more easy to read

Parameters
----------
repophlan_wscores_fp: string
file path to RepoPhlAn summary table with scores
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start with capital letter, end with period

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will do.

Human-readable report of basic statistics of genomes
"""
df = pd.read_table(repophlan_wscores_fp, index_col=0, header=0)
out = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using a list I would prefer a dict, where the key is the "human readable explanation" and the values are the numbers/strings you gather from the dataframe. It will then be easier to use your results, because res[1] is less intuitive than res['Number of RefSeq genomes']

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjanssen2 I am afraid that I can't agree. out is a piece of sequential, multiple-line information. printing res['Number of RefSeq genomes'] sounds awkward and not precise. At this stage, these numbers are only for user awareness purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I am not really aware what the purpose of this function is. Thus, you are certainly right.

file_okay=True),
help='RepoPhlAn summary table with scores')
def _main(repophlan_wscores_fp):
""" Parser for RepoPhlAn-downloaded genomes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete leading whitespace, end with .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

""" Parser for RepoPhlAn-downloaded genomes
"""
out = parse_repophlan(repophlan_wscores_fp)
click.echo('\n'.join(out))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think readability could be improved for the user

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current way is kindof comfortable to me. Do you have a suggestion?



class ParseRepophlanTests(TestCase):
""" Tests for parseRepophlan.py """
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my comments about pep8 from above apply to those docstrings here as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

self.assertIn('Task completed.', res.output)


str_basic_stats = ('Total number of genomes: 9.\n'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a class variable rather than an object variable initiated in setup? Constant names should be all capital letters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a WGS-HGT legacy... Do you mean that I should write it as:

STR_BASIC_STATS = ('xxxxx...')

@sjanssen2
Copy link
Collaborator

@qiyunzhu said that this PR is mainly for demonstration purposes and used to illustrate the pedantic process of having someone else commenting on your code - which often is very helpful but sometimes also a little frustrating because it is not only about wrong/right programming but also about styles (which is not less important for long term maintenance).
Thus, I went through this code again and commented every little detail that I found - Sorry for being such pedantic :-/

Copy link
Collaborator

@sjanssen2 sjanssen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment

@qiyunzhu
Copy link
Collaborator Author

Hello @sjanssen2 Thank you for your careful comments on the coding style. I will respond and revise. @anupriyatripathi and @serenejiang , please refer to them as an example how code review is typically practised.

@coveralls
Copy link

coveralls commented May 24, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 68e52d9 on qiyunzhu:master into ce4e523 on biocore:master.

@qiyunzhu
Copy link
Collaborator Author

Hi @sjanssen2 I think I took care of your comments. Wanna another look?

@serenejiang serenejiang merged commit f163703 into biocore:master May 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants