Skip to content
This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Add command line option to limit dataset years #115

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

anaschwendler
Copy link
Collaborator

This is just a template to help you make your point clear with this PR. :)

What is the purpose of this Pull Request?
Following @Irio #60 with the option to limit dataset years, this PR:

  • Update the way to select an specific path to save the datasets
  • Fetch the Dataset() method for year, only if years is no None
  • Update the README.md to this new option

What was done to achieve this purpose?
Understand what @Irio has done to achieve that new option and fix what was missing to make this new option works

How to test if it really works?
Run rosie with a few options like:

$ python rosie.py run chamber_of_deputies --path /another/folder --years 2017
$ python rosie.py run chamber_of_deputies --path /another/folder
$ python rosie.py run chamber_of_deputies --years 2017,2016
$ python rosie.py run chamber_of_deputies 

$ python rosie.py run federal_senate --path /another/folder --years 2017
$ python rosie.py run federal_senate --path /another/folder
$ python rosie.py run federal_senate --years 2017,2016
$ python rosie.py run federal_senate 

Who can help reviewing it?
@cuducos

Irio and others added 6 commits June 18, 2017 03:52
Following @Irio PR with the option to limit dataset years, this PR:
- Update the way to select an specific path to save the datasets
- Fetch the `Dataset()` method for year, only if `years` is no None
- Update the README.md to this new option
@anaschwendler
Copy link
Collaborator Author

There will be some tests to add to rosie, but I think that it is a great start!

@cuducos
Copy link
Collaborator

cuducos commented Dec 8, 2017

There will be some tests to add to rosie, but I think that it is a great start!

Sure it is! Many thanks for that ; )

Two doubts:

  1. Should we wait for these tests before reviewing the PR?
  2. As the CLI gets more complicates I think we should adopt either argparse (⭐️ ), click (⭐️ ⭐️⭐️⭐️) or docopt ( ⭐️⭐️⭐️⭐️⭐️). What do you think of implementing that in this PR?

@anaschwendler
Copy link
Collaborator Author

Should we wait for these tests before reviewing the PR?

This feature is important, so I must consider accept as it is now, and open a new PR to that

As the CLI gets more complicates I think we should adopt either argparse (⭐️ ), click (⭐️ ⭐️⭐️⭐️) or docopt ( ⭐️⭐️⭐️⭐️⭐️). What do you think of implementing that in this PR?

I would like to do it now, but I think it is too much for one PR, but I can add too, or we can open an issue and I can do it following, what do you think @cuducos ?

@cuducos
Copy link
Collaborator

cuducos commented Dec 8, 2017

This feature is important, so I must consider accept as it is now, and open a new PR to that

If the feature is importante I'd say the opposite: let's not accept non-tested code ; ) Also, if the feature is important I think using argparse, click or docopt is very much welcomed: they are highly tested packages to avoid us reinventing the wheel and accidentally getting something wrong in in the way.

TLDR I honestly prefer tests and a high level CLI development tool before reviewing this PR bizarrely for the same reason you prefer to postpone tests and high level CI dev tool hahaha…

@anaschwendler
Copy link
Collaborator Author

Oh! Nice point, working on it!! \o\

@anaschwendler
Copy link
Collaborator Author

anaschwendler commented Dec 8, 2017

So considering what we need to do:

  • Test chamber_of_deputies with and without years
  • Include docopt to our CLI

@anaschwendler anaschwendler force-pushed the anaschwendler-limit-years branch 3 times, most recently from 981abff to fc420ec Compare December 8, 2017 18:46
@anaschwendler
Copy link
Collaborator Author

Now that I'm full recovered, I will get back to it!

@anaschwendler
Copy link
Collaborator Author

Just included docopt to our CLI, and had some doubts about documentation.

Docopt works fine, but when I run python rosie.py only, it gets a help > terminal, is it normal?

image

Independent of what I do, it don't change this option, what can be done about it?

@anaschwendler
Copy link
Collaborator Author

Just add the tests with and without years for chamber_of_deputies. The result is:

(serenata_rosie) ➜  rosie git:(anaschwendler-limit-years) ✗ python rosie.py test
..............................................................
----------------------------------------------------------------------
Ran 62 tests in 152.857s

OK

I'm completely sure that this it not the best way to do that, but I think that the logic is ok. I'll wait for @cuducos review, to see if he can help me with my tests.

Copy link
Collaborator

@cuducos cuducos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better usage of docopt can considerably enhance rosie.py script.

What I suggest is as follows:

  1. Removing argv altogether
  2. Using docopt as soon as possible in the script execution to extract the meaningfull arguments
  3. Simplify run and test so they don't deal with argument parsing
  4. Call run or test after docopt is done with argument parsing

For example, on the bottom of the file:

def main():
    arguments = docopt(__doc__)

    if arguments.get('test'):
        return test(**arguments)

    if arguments.get('run'):
        return run(**arguments)


if __name__ == '__main__':
    main()

rosie.py Outdated
python rosie.py test [chamber_of_deputies | federal_senate]

Options:
--years runs rosie wuth years filter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: wuth -> with

Rosie is a proper noun and should be capitalized.

In years filter the word years is an adjective (I guess) and should not be pluralized.

Also runs rosie wuth years filter doesn't tell much about what years filter does. What about: runs Rosie fetching data and finding suspicions for a specific set of years?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted 👍

README.md Outdated
$ python rosie.py run chamber_of_deputies --path /my/serenata/directory/
```

Besides that, if you want to, you can run the project with years filter - for example:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 52 uses a m-dash (instead of an hyphen) before for example. This line uses an hyphen. Either we go with m-dash or with hyphen, but swinging between both is weird.

Also a line break between text and mono-spaced block makes the raw markdown way easier to read.

rosie.py Outdated
from sys import argv, exit
from docopt import docopt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 suggests a blank line between native modules (sys) and third-party modules (docopt).

rosie.py Outdated

Options:
--years runs rosie wuth years filter
--path runs rosie with specific directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rosie is a proper noun and should be capitalized.

Also runs runs rosie with specific directory doesn't tell much about what this directory is for What about: directory in which Rosie saves fetched data and the suspicions CSV?

If I'm not wrong this optional argument has a default (data/? /tmp/serenata-data/?) that should be documented as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong this optional argument has a default (data/? /tmp/serenata-data/?) that should be documented as well.

Yes, serenata-data, can you help me documenting that? There is something in the README.md but I guess we can do more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help me documenting that?

From the snippet of docopt docs I pasted yesterday it's is as simple as replacing this very line we're commenting on. Just change:

    --path    runs rosie with specific directory

by:

    --path    runs rosie with specific directory [default: /tmp/serenata-data]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted

rosie.py Outdated
exit(1)
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/'

target_directory = argv[argv.index('--path') + 1] if '--path' in argv else '/tmp/serenata-data/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use docopt for the default value too. From their docs:

If you want to set a default value for an option with an argument, put it into the option's description, in the form [default: <the-default-value>].

years = AVAILABLE_YEARS
chamber_of_deputies = Dataset(self.temp_path, years)
datasets = chamber_of_deputies.fetch()
self.assertEqual(10, len(datasets))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hard-coded 10 will break tests every new year. len(AVAILABLE_YEARS) wont ; )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepted, I just didn't know we could do that

self.assertEqual(10, len(datasets))

def test_filter_years_fetch(self):
years = [2017,2016]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 recommends a space after this comma.

years = [2017,2016]
chamber_of_deputies = Dataset(self.temp_path, years)
datasets = chamber_of_deputies.fetch()
self.assertEqual(3, len(datasets))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we expect 3 when len(years) == 2? Can we document that? For example:

self.assertEqual(3, len(datasets))  # data from 2 years + suspicions file

rosie.py Outdated
--path runs rosie with specific directory

"""

from sys import argv, exit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of docopt is to avoid using argv (exit is still needed for the test command).

rosie.py Outdated
from sys import argv, exit
from docopt import docopt


def entered_command(argv):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this method when using docopt.

@anaschwendler
Copy link
Collaborator Author

Hey @cuducos I think I got something here :)

I'm sure that there is a few things to fix, but its getting better, can you review for me?

rosie.py Outdated
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/'
target_directory = args['<path>'] if args['--path'] else '/tmp/serenata-data/'

target_module = 'chamber_of_deputies' if args['chamber_of_deputies'] else 'federal_senate'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cuducos I'm not into this way of selecting target_module do you have an better option for that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using docopt we should have in arguments a dictionary like this:

{'--path': False,
 '--years': False,
 'chamber_of_deputies': True,
 'core': False,
 'federal_senate': False,
 'run': True,
 'test': False}

So I'd go for something like:

commands = ('chamber_of_deputies', 'federal_senate')
target_module, *_ =  filter(lambda x: arguments.get(x), commands)

Copy link
Collaborator

@cuducos cuducos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anaschwendler I tried to improve the docopt description using their online tool – the output I;ve got with your code was too polluted so I guesses that something was wrong. I polished it with attention to details (line breaks, syntax etc) and I figure out minor details that could be improved and simplified. Take a look and check that writing this description properly helps us to have a neat arguments dictionary.

This description:

Hi, this is Serenata's Rosie!

Usage:
  rosie.py run (chamber_of_deputies | federal_senate) [--path=<path>] [--years=<years>]
  rosie.py test [chamber_of_deputies | federal_senate]

Options:
  --years=<years>  runs Rosie fetching data and finding suspicions for a specific set of years (use a comma to separate years)
  --path=<path>    directory in which Rosie saves fetched data and the suspicions CSV [default: /tmp/serenata-data]

Gives us this output:

{
  "--path": "/tmp/serenata-data", 
  "--years": None, 
  "chamber_of_deputies": True, 
  "federal_senate": False, 
  "run": True, 
  "test": False
}

My following comments will take into account this dictionary as our arguments, ok?

rosie.py Outdated
help()
exit(1)
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/'
target_directory = args['<path>'] if args['--path'] else '/tmp/serenata-data/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to set the default value: docopt will fill in that for you if not entered. target_directory = args['--path'] should be enough. Or move on directly with args['--path'] ; )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew that should be more improvements, thanks for that, looks pretty good :)

rosie.py Outdated
years = [int(num) for num in years.split(',')]
klass.main(target_directory, years=years)
else:
klass.main(target_directory)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this if/else here since:

  • if args['--years'] is None
  • and considering that the default value for years is None (and it is)
  • then klass(target_dir) is exactly equal to klass(target_dir, args['--years']

However I'm afraid we might need to convert the string 2016,2017 to a list of integers, i.e. [2016, 2017]. For example:

if args['--years']:
   args['--years'] = [int(v) for v in args['--years'].split(',')]

klass.main(target_directory, args['--years'])

rosie.py Outdated
tests_path = os.path.join('rosie', 'chamber_of_deputies')
tests = loader.discover(tests_path)
elif args['federal_senate']:
tests_path = os.path.join('rosie', 'federal_senate')
tests = loader.discover(tests_path)
else:
tests = loader.discover('rosie')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor DRY suggestion. What about:

test_path = 'rosie'
if args['chamber_of_deputies']:
     tests_path = os.path.join(test_path, 'chamber_of_deputies')
elif args['federal_senate']:
     tests_path = os.path.join(test_path, 'federal_senate')
tests = loader.discover(test_path)

@anaschwendler
Copy link
Collaborator Author

anaschwendler commented Jan 7, 2018

Hi @cuducos took a closer look on what you have said, and made some adjustments. I'm still not happy with the test script. But now the main script I think that may look nice. Can you review again for me?

@@ -27,6 +30,18 @@ def setUp(self):
def tearDown(self):
shutil.rmtree(self.temp_path)

def test_quantity_of_downloaded_datasets_all_years_fetch(self):
years = AVAILABLE_YEARS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? You use years in line 35 but go back to AVAILABLE_YEARS in 37.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't noticed, thanks /o/

@cuducos
Copy link
Collaborator

cuducos commented Jan 7, 2018

I'm still not happy with the test script.

What bugs you in this script?

@anaschwendler
Copy link
Collaborator Author

@cuducos It takes time by downloading the datasets from the source, I can't think of anything, but maybe there is another way to test that, without downloading the whole datasets

@cuducos
Copy link
Collaborator

cuducos commented Jan 7, 2018

there is another way to test that, without downloading the whole datasets

You could mock HTTP requests, or you could mock only the classes (as internally the way self.years works might be already tested).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants