Skip to content
This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Introducing Federal Senate script to Rosie! #51

Merged
merged 28 commits into from
May 23, 2017

Conversation

anaschwendler
Copy link
Collaborator

@anaschwendler anaschwendler commented May 16, 2017

This PR is a work in progress, but it has started today.
The things that need to be done are:

  • Create a sample randomic dataset to use to create tests.
  • Finish the adapter.py script, writing the tests first (that is the reason why I stopped the script)
  • Learn how to run the tests only for federal senate
  • Check if everything is working
  • :shipit:

Any opinion is important, so feel free to do so :)
(I never do this alone, so I stopped putting my name in the begining)

This PR is a work in progress, but it has started today.
The things that need to be done are:
- [ ] Create a sample randomic dataset to use to create tests.
- [ ] Finish the `adapter.py` script, writing the tests first (that is the reason why I stoped the script)
- [ ] Learn how to run the tests only for federal senate
- [ ] Check if everything is working
- [ ] :shipit:

Any opinion is important, so feel free to do so :)
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.4%) to 95.172% when pulling 94f3161 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-3.6%) to 92.939% when pulling dd19bfb on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-3.8%) to 92.799% when pulling 1241f08 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-3.8%) to 92.81% when pulling 3d5f2b9 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-4.2%) to 92.37% when pulling 1581441 on introducing-federal-senate into cceb0ff on master.

rosie.py Outdated
@@ -17,7 +17,7 @@ def help():


def run():
import rosie, rosie.chamber_of_deputies
import rosie, rosie.chamber_of_deputies, rosie.federal_senate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed it before but… multiple imports in not recommended ; )

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you see, there is a way to fix it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtemporal do you know a better way to fix it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import rosie
import rosie.chamber_of_deputies
import rosie.federal_senate

if self.settings.UNIQUE_IDS:
self.suspicions = self.dataset[self.settings.UNIQUE_IDS].copy()
else:
self.suspicions = self.dataset.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a chance we end up here without UNIQUE_IDS? I thought it was suppose to be a kind of standard/requirement…

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know what to do about it, I want to change it, and really don't know what is necessary for chamber_of_deputies

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will need to figure out a way to uniquely identify each federal senate reimbursements, otherwise the suspicions file will have all the columns that can be found in the original dataset... that's what we use the UNIQUE_IDS for, so if we are comfortable with all the columns in the suspicions file there's no reason to set an UNIQUE_IDS.

Copy link
Collaborator

@jtemporal jtemporal May 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the matter of creating a unique ID for each reimbursement, I tried combining date, cnpj_cpf and document_id and yet wasn't able to create a string that was unique ¯\_(ツ)_/¯

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document_id will never be a great combination for UNIQUE_IDS because there is some receipts that don't have one, or some receipts that have sem fatura problem :/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I thought combining those 3 columns might help... but it wasn't enough, I believe we need a brainstorm to figure this one out... So far I'm good with having all the columns in the suspicions file :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need consistency in this unique identifiers? I mean, considering Rosie runs today and tomorrow: is it really required that the document X today have exactly the same id as it would have tomorrow? If not we can bring pandas index (created by default) to the dataset.


def prepare_cpnj_cpf(self):
self._dataset = self._dataset[self._dataset['cnpj_cpf'].notnull()]
self._dataset['document_type'] = 'simple_receipt'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a rational for that? Can you mention it in a comment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a padronization for the core module :)
To run invalid_cpnj_cpf the dataset must have this field.

I can comment it on the code!

Minor refactor on Federal Senate Adapter:
 - Moved column creation to a method for it
 - created a condition so that only generates the big file if it doesn't
   exist facilitating tests

Tests assums that all steps worked successfully and test to see if final file
is as it should be:
 - columns renamed after COLUMNS variable
 - `document_type` column created and filled with `simple_receipt`
names now reflect what the method and testreally do
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.3%) to 95.246% when pulling 247a2db on introducing-federal-senate into cceb0ff on master.

@anaschwendler
Copy link
Collaborator Author

@jtemporal there is only one thing missing, @cuducos asked if we can comment on the code why we create another column for invalid_cpnj_cpf, I can do that later, but for now I will only admire that good work you made yesterday <3

👏

@cuducos cuducos force-pushed the introducing-federal-senate branch from a369611 to cb62ce8 Compare May 22, 2017 14:47
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.367% when pulling cb62ce8 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.367% when pulling cb62ce8 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.367% when pulling d704a0a on introducing-federal-senate into cceb0ff on master.

@anaschwendler
Copy link
Collaborator Author

We have a problem which is: we need Rosie to update the data from Federal Senate every time she runs just like she does with the Chamber of Deputies data.

Right now that doesn't happen. Also, she is expecting that we already have a federal_senate_reimbursements.xz because of how the data path is built on the adapter

We need help to mock update_datasets() or figure out a way to fix this issue.

On hold until we finish this, soon will be resolved.

cc @cuducos

@cuducos
Copy link
Collaborator

cuducos commented May 22, 2017

We need help to mock Adapter.update_datasets() or figure out a way to fix this issue.

Running the update method by default, but mocking it in tests: that sounds like a really good approach IMHO.

@anaschwendler
Copy link
Collaborator Author

Running the update method by default, but mocking it in tests: that sounds like a really good approach IMHO.

This idea sounds really good to me, that could be a way, if it looks good to @jtemporal we can work on it tomorrow, or I can try it later.

@jtemporal
Copy link
Collaborator

This idea sounds really good to me, that could be a way, if it looks good to @jtemporal we can work on it tomorrow, or I can try it later

that's the plan

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.367% when pulling 545ab57 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.367% when pulling 5d7f157 on introducing-federal-senate into cceb0ff on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.382% when pulling efe32b9 on introducing-federal-senate into cceb0ff on master.

A better approach to the required missing information on document_type
on the Federal Senate dataset
@coveralls
Copy link

Coverage Status

Coverage decreased (-1.2%) to 95.39% when pulling 9427bed on introducing-federal-senate into cceb0ff on master.

@cuducos cuducos changed the title [WIP] Introducing Federal Senate script to Rosie! Introducing Federal Senate script to Rosie! May 23, 2017
@cuducos cuducos merged commit 8a27187 into master May 23, 2017
@anaschwendler anaschwendler deleted the introducing-federal-senate branch May 23, 2017 15:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants