This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Add command line option to limit dataset years #60

Open: wants to merge 5 commits into base: master

Conversation

Irio
Collaborator

@Irio Irio commented Jun 18, 2017

Depends on datasciencebr/serenata-toolbox#97. Tests are still missing; they can be added once that pull request is merged.

@cuducos
Collaborator

cuducos commented Jun 20, 2017

LGTM… waiting for okfn-brasil/serenata-toolbox#97 then.

rosie.py Outdated
      klass = getattr(rosie, target_module)
-     klass.main(target_directory)
+     klass.main(target_directory, years=years)
Collaborator

This is failing in tests:

$ python rosie.py run federal_senate
Traceback (most recent call last):
  File "rosie.py", line 62, in <module>
    command()
  File "rosie.py", line 36, in run
    klass.main(target_directory, years=years)
UnboundLocalError: local variable 'years' referenced before assignment

If --years is not passed, the years variable is never assigned.
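
One way to avoid the UnboundLocalError is to bind years to a default before the conditional that reads the option. The snippet below is a minimal sketch, not the code from this PR: the helper name parse_years and the comma-separated value format are assumptions made for illustration.

```python
def parse_years(argv):
    """Return the years that follow ``--years`` in argv, or None if absent.

    Hypothetical helper: the name and the comma-separated format are
    assumptions, not part of rosie.py.
    """
    years = None  # always bound, even when --years is not passed
    if '--years' in argv:
        index = argv.index('--years')
        # Only read a value if one actually follows the flag.
        if index + 1 < len(argv):
            years = [int(year) for year in argv[index + 1].split(',')]
    return years
```

With this shape, `python rosie.py run federal_senate` would yield years=None instead of raising, and the callee can decide what None means (for example, all available years).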

@anaschwendler
Collaborator

Hello everyone, I'm getting back to this PR :)

I tested the command with --years and it seems to be working, but I'll take a closer look:

(serenata_rosie) ➜  rosie git:(irio-limit-years) ✗ python rosie.py run chamber_of_deputies --years 2017 /tmp/         
2017-12-08 13:58:34,684 - root - INFO - Merging all datasets…
2017-12-08 13:58:34,684 - root - INFO - Loading reimbursements-2017.xz…
2017-12-08 13:58:37,153 - root - INFO - Dropping rows without document_value or reimbursement_number…
2017-12-08 13:58:37,845 - root - INFO - Grouping dataset by applicant_id, document_id and year…
2017-12-08 13:58:37,846 - root - INFO - Gathering all reimbursement numbers together…
2017-12-08 13:58:40,804 - root - INFO - Summing all net values together…
2017-12-08 13:58:40,826 - root - INFO - Summing all reimbursement values together…
2017-12-08 13:58:40,852 - root - INFO - Generating the new dataset…
2017-12-08 13:58:41,999 - root - INFO - Casting changes to a new DataFrame…
2017-12-08 13:58:41,999 - root - INFO - Writing it to file…
2017-12-08 13:59:00,764 - root - INFO - Done.
Downloading 2016-09-03-companies.xz: 100%|██████████████████████████████████████████████████████████████| 4.84M/4.84M [00:03<00:00, 1.30Mb/s]

@@ -31,7 +30,12 @@ def run():
exit(1)
target_directory = argv[3] if len(argv) >= 4 else '/tmp/serenata-data/'
Collaborator

Just found the problem here: if we change how the arguments are passed, we also need to update this check: if len(argv) >= 4 else '/tmp/serenata-data/'

Collaborator

My plan: like @Irio's --years flag, I'll add a --path flag, so that if a path is given, it is the argument that follows the flag.
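
The plan above could be sketched with the standard library's argparse instead of indexing argv by position. This is only an illustration under assumptions: the function name parse_args, the positional names command and target_module, and accepting several years after --years are hypothetical. The default path is the one shown in the diff.

```python
import argparse

# Default directory, as it appears in the diff under review.
DEFAULT_PATH = '/tmp/serenata-data/'


def parse_args(args):
    """Parse arguments such as: run chamber_of_deputies --years 2017 --path /tmp/

    Hypothetical sketch: argument names are assumptions, not rosie.py's API.
    """
    parser = argparse.ArgumentParser(prog='rosie.py')
    parser.add_argument('command')
    parser.add_argument('target_module')
    # Both options get explicit defaults, so they are always defined.
    parser.add_argument('--years', type=int, nargs='+', default=None)
    parser.add_argument('--path', default=DEFAULT_PATH)
    return parser.parse_args(args)
```

Because both options are named, the target directory no longer depends on its position in argv, so adding --years cannot shift it.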

3 participants