-
Notifications
You must be signed in to change notification settings - Fork 60
Add command line option to limit dataset years #115
base: master
Are you sure you want to change the base?
Conversation
Following @Irio PR with the option to limit dataset years, this PR: - Update the way to select an specific path to save the datasets - Fetch the `Dataset()` method for year, only if `years` is no None - Update the README.md to this new option
There will be some tests to add to rosie, but I think that it is a great start! |
Sure it is! Many thanks for that ; ) Two doubts: |
This feature is important, so I must consider accept as it is now, and open a new PR to that
I would like to do it now, but I think it is too much for one PR, but I can add too, or we can open an issue and I can do it following, what do you think @cuducos ? |
If the feature is importante I'd say the opposite: let's not accept non-tested code ; ) Also, if the feature is important I think using TLDR I honestly prefer tests and a high level CLI development tool before reviewing this PR bizarrely for the same reason you prefer to postpone tests and high level CI dev tool hahaha… |
Oh! Nice point, working on it!! \o\ |
So considering what we need to do:
|
981abff
to
fc420ec
Compare
fc420ec
to
a7809ca
Compare
Now that I'm full recovered, I will get back to it! |
Just add the tests with and without years for (serenata_rosie) ➜ rosie git:(anaschwendler-limit-years) ✗ python rosie.py test
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 152.857s
OK I'm completely sure that this it not the best way to do that, but I think that the logic is ok. I'll wait for @cuducos review, to see if he can help me with my tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better usage of docopt
can considerably enhance rosie.py
script.
What I suggest is as follows:
- Removing
argv
altogether - Using
docopt
as soon as possible in the script execution to extract the meaningfull arguments - Simplify
run
andtest
so they don't deal with argument parsing - Call
run
ortest
afterdocopt
is done with argument parsing
For example, on the bottom of the file:
def main():
arguments = docopt(__doc__)
if arguments.get('test'):
return test(**arguments)
if arguments.get('run'):
return run(**arguments)
if __name__ == '__main__':
main()
rosie.py
Outdated
python rosie.py test [chamber_of_deputies | federal_senate] | ||
|
||
Options: | ||
--years runs rosie wuth years filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: wuth
-> with
Rosie is a proper noun and should be capitalized.
In years filter the word years is an adjective (I guess) and should not be pluralized.
Also runs rosie wuth years filter doesn't tell much about what years filter does. What about: runs Rosie fetching data and finding suspicions for a specific set of years?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted 👍
README.md
Outdated
$ python rosie.py run chamber_of_deputies --path /my/serenata/directory/ | ||
``` | ||
|
||
Besides that, if you want to, you can run the project with years filter - for example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 52 uses a m-dash (instead of an hyphen) before for example. This line uses an hyphen. Either we go with m-dash or with hyphen, but swinging between both is weird.
Also a line break between text and mono-spaced block makes the raw markdown way easier to read.
rosie.py
Outdated
from sys import argv, exit | ||
from docopt import docopt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 suggests a blank line between native modules (sys
) and third-party modules (docopt
).
rosie.py
Outdated
|
||
Options: | ||
--years runs rosie wuth years filter | ||
--path runs rosie with specific directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rosie is a proper noun and should be capitalized.
Also runs runs rosie with specific directory doesn't tell much about what this directory is for What about: directory in which Rosie saves fetched data and the suspicions CSV?
If I'm not wrong this optional argument has a default (data/
? /tmp/serenata-data/
?) that should be documented as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not wrong this optional argument has a default (data/? /tmp/serenata-data/?) that should be documented as well.
Yes, serenata-data
, can you help me documenting that? There is something in the README.md
but I guess we can do more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you help me documenting that?
From the snippet of docopt
docs I pasted yesterday it's is as simple as replacing this very line we're commenting on. Just change:
--path runs rosie with specific directory
by:
--path runs rosie with specific directory [default: /tmp/serenata-data]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted
rosie.py
Outdated
exit(1) | ||
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/' | ||
|
||
target_directory = argv[argv.index('--path') + 1] if '--path' in argv else '/tmp/serenata-data/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use docopt
for the default value too. From their docs:
If you want to set a default value for an option with an argument, put it into the option's description, in the form
[default: <the-default-value>]
.
years = AVAILABLE_YEARS | ||
chamber_of_deputies = Dataset(self.temp_path, years) | ||
datasets = chamber_of_deputies.fetch() | ||
self.assertEqual(10, len(datasets)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hard-coded 10
will break tests every new year. len(AVAILABLE_YEARS)
wont ; )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted, I just didn't know we could do that
self.assertEqual(10, len(datasets)) | ||
|
||
def test_filter_years_fetch(self): | ||
years = [2017,2016] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8 recommends a space after this comma.
years = [2017,2016] | ||
chamber_of_deputies = Dataset(self.temp_path, years) | ||
datasets = chamber_of_deputies.fetch() | ||
self.assertEqual(3, len(datasets)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we expect 3
when len(years) == 2
? Can we document that? For example:
self.assertEqual(3, len(datasets)) # data from 2 years + suspicions file
rosie.py
Outdated
--path runs rosie with specific directory | ||
|
||
""" | ||
|
||
from sys import argv, exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of docopt
is to avoid using argv
(exit
is still needed for the test command).
rosie.py
Outdated
from sys import argv, exit | ||
from docopt import docopt | ||
|
||
|
||
def entered_command(argv): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this method when using docopt
.
Hey @cuducos I think I got something here :) I'm sure that there is a few things to fix, but its getting better, can you review for me? |
rosie.py
Outdated
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/' | ||
target_directory = args['<path>'] if args['--path'] else '/tmp/serenata-data/' | ||
|
||
target_module = 'chamber_of_deputies' if args['chamber_of_deputies'] else 'federal_senate' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuducos I'm not into this way of selecting target_module
do you have an better option for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using docopt
we should have in arguments
a dictionary like this:
{'--path': False,
'--years': False,
'chamber_of_deputies': True,
'core': False,
'federal_senate': False,
'run': True,
'test': False}
So I'd go for something like:
commands = ('chamber_of_deputies', 'federal_senate')
target_module, *_ = filter(lambda x: arguments.get(x), commands)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anaschwendler I tried to improve the docopt
description using their online tool – the output I;ve got with your code was too polluted so I guesses that something was wrong. I polished it with attention to details (line breaks, syntax etc) and I figure out minor details that could be improved and simplified. Take a look and check that writing this description properly helps us to have a neat arguments
dictionary.
This description:
Hi, this is Serenata's Rosie!
Usage:
rosie.py run (chamber_of_deputies | federal_senate) [--path=<path>] [--years=<years>]
rosie.py test [chamber_of_deputies | federal_senate]
Options:
--years=<years> runs Rosie fetching data and finding suspicions for a specific set of years (use a comma to separate years)
--path=<path> directory in which Rosie saves fetched data and the suspicions CSV [default: /tmp/serenata-data]
Gives us this output:
{
"--path": "/tmp/serenata-data",
"--years": None,
"chamber_of_deputies": True,
"federal_senate": False,
"run": True,
"test": False
}
My following comments will take into account this dictionary as our arguments
, ok?
rosie.py
Outdated
help() | ||
exit(1) | ||
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/' | ||
target_directory = args['<path>'] if args['--path'] else '/tmp/serenata-data/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to set the default value: docopt
will fill in that for you if not entered. target_directory = args['--path']
should be enough. Or move on directly with args['--path']
; )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I knew that should be more improvements, thanks for that, looks pretty good :)
rosie.py
Outdated
years = [int(num) for num in years.split(',')] | ||
klass.main(target_directory, years=years) | ||
else: | ||
klass.main(target_directory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this if
/else
here since:
- if
args['--years']
isNone
- and considering that the default value for
years
isNone
(and it is) - then
klass(target_dir)
is exactly equal toklass(target_dir, args['--years']
However I'm afraid we might need to convert the string 2016,2017
to a list of integers, i.e. [2016, 2017]
. For example:
if args['--years']:
args['--years'] = [int(v) for v in args['--years'].split(',')]
klass.main(target_directory, args['--years'])
rosie.py
Outdated
tests_path = os.path.join('rosie', 'chamber_of_deputies') | ||
tests = loader.discover(tests_path) | ||
elif args['federal_senate']: | ||
tests_path = os.path.join('rosie', 'federal_senate') | ||
tests = loader.discover(tests_path) | ||
else: | ||
tests = loader.discover('rosie') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor DRY suggestion. What about:
test_path = 'rosie'
if args['chamber_of_deputies']:
tests_path = os.path.join(test_path, 'chamber_of_deputies')
elif args['federal_senate']:
tests_path = os.path.join(test_path, 'federal_senate')
tests = loader.discover(test_path)
Hi @cuducos took a closer look on what you have said, and made some adjustments. I'm still not happy with the test script. But now the main script I think that may look nice. Can you review again for me? |
@@ -27,6 +30,18 @@ def setUp(self): | |||
def tearDown(self): | |||
shutil.rmtree(self.temp_path) | |||
|
|||
def test_quantity_of_downloaded_datasets_all_years_fetch(self): | |||
years = AVAILABLE_YEARS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? You use years
in line 35 but go back to AVAILABLE_YEARS
in 37.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't noticed, thanks /o/
What bugs you in this script? |
@cuducos It takes time by downloading the datasets from the source, I can't think of anything, but maybe there is another way to test that, without downloading the whole datasets |
You could mock HTTP requests, or you could mock only the classes (as internally the way |
This is just a template to help you make your point clear with this PR. :)
What is the purpose of this Pull Request?
Following @Irio #60 with the option to limit dataset years, this PR:
Dataset()
method for year, only ifyears
is no NoneWhat was done to achieve this purpose?
Understand what @Irio has done to achieve that new option and fix what was missing to make this new option works
How to test if it really works?
Run rosie with a few options like:
Who can help reviewing it?
@cuducos