Introducing Federal Senate script to Rosie! #51
Conversation
This PR is a work in progress, but it has started today. The things that need to be done are:

- [ ] Create a sample random dataset to use to create tests.
- [ ] Finish the `adapter.py` script, writing the tests first (that is the reason why I stopped the script)
- [ ] Learn how to run the tests only for the federal senate
- [ ] Check if everything is working

Any opinion is important, so feel free to share it :)
By running `python rosie.py run federal_senate` we can start finding suspicious reimbursements! 🎉
`rosie.py` (outdated):

```diff
@@ -17,7 +17,7 @@ def help():
 def run():
-    import rosie, rosie.chamber_of_deputies
+    import rosie, rosie.chamber_of_deputies, rosie.federal_senate
```
I might have missed it before but… multiple `import`s is not recommended ; )
So, do you see a way to fix it?
@jtemporal do you know a better way to fix it?
```python
import rosie
import rosie.chamber_of_deputies
import rosie.federal_senate
```
```python
if self.settings.UNIQUE_IDS:
    self.suspicions = self.dataset[self.settings.UNIQUE_IDS].copy()
else:
    self.suspicions = self.dataset.copy()
```
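A toy run of this branch logic, with a hypothetical `Settings` class and invented column names (none of these stand-ins come from the actual codebase), might look like:

```python
import pandas as pd

# Hypothetical stand-ins for Rosie's settings and dataset (illustration only)
class Settings:
    UNIQUE_IDS = ['applicant_id', 'document_id']  # assumed column names

settings = Settings()
dataset = pd.DataFrame({
    'applicant_id': [1, 2],
    'document_id': [10, 20],
    'net_value': [100.0, 250.0],
})

# Same branch as above: keep only the ID columns when UNIQUE_IDS is set,
# otherwise carry every column of the dataset into the suspicions frame.
if settings.UNIQUE_IDS:
    suspicions = dataset[settings.UNIQUE_IDS].copy()
else:
    suspicions = dataset.copy()

print(list(suspicions.columns))  # → ['applicant_id', 'document_id']
```

This is what the discussion below is about: without `UNIQUE_IDS`, the `else` branch fires and the suspicions file inherits all columns.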
Is there a chance we end up here without `UNIQUE_IDS`? I thought it was supposed to be a kind of standard/requirement…
I really don't know what to do about it. I want to change it, but I don't know what is necessary for `chamber_of_deputies`.
We will need to figure out a way to uniquely identify each Federal Senate reimbursement, otherwise the suspicions file will have all the columns that can be found in the original dataset... that's what we use the `UNIQUE_IDS` for, so if we are comfortable with all the columns in the suspicions file there's no reason to set a `UNIQUE_IDS`.
On the matter of creating a unique ID for each reimbursement, I tried combining `date`, `cnpj_cpf` and `document_id` and yet wasn't able to create a string that was unique ¯\_(ツ)_/¯
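A quick way to check whether a column combination is unique can be sketched with pandas; the sample rows below are invented to mirror the problem described (two reimbursements sharing all three values):

```python
import pandas as pd

# Invented sample rows: the first two reimbursements share the same
# date, cnpj_cpf and document_id, so the combined key is not unique.
df = pd.DataFrame({
    'date': ['2017-03-01', '2017-03-01', '2017-03-02'],
    'cnpj_cpf': ['00000000000191', '00000000000191', '11111111000100'],
    'document_id': ['42', '42', '43'],
})

# Build a candidate key by joining the three columns…
key = df['date'] + '|' + df['cnpj_cpf'] + '|' + df['document_id']

# …and check for collisions.
print(key.duplicated().any())  # → True: the combination is not unique
```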
`document_id` will never make a great combination for `UNIQUE_IDS` because there are some receipts that don't have one, and some receipts have the `sem fatura` (no invoice) problem :/
That's why I thought combining those 3 columns might help... but it wasn't enough. I believe we need a brainstorm to figure this one out... So far I'm good with having all the columns in the suspicions file :)
Do we need consistency in these unique identifiers? I mean, considering Rosie runs today and tomorrow: is it really required that document X has exactly the same ID today as it would have tomorrow? If not, we can bring the pandas index (created by default) into the dataset.
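If cross-run consistency really isn't needed, promoting the default pandas `RangeIndex` to a column is a one-liner; a minimal sketch (the column name `id` is my choice, not from the codebase):

```python
import pandas as pd

# Any dataset gets a default RangeIndex (0, 1, 2, …) when loaded.
df = pd.DataFrame({'net_value': [100.0, 250.0, 80.0]})

# Promote that index to a regular 'id' column.
df = df.reset_index().rename(columns={'index': 'id'})

print(df['id'].is_unique)  # → True
```

The trade-off is exactly the one raised above: these IDs are unique within a run, but change if rows are dropped or reordered before the next run.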
`rosie/federal_senate/adapter.py` (outdated):

```python
def prepare_cpnj_cpf(self):
    self._dataset = self._dataset[self._dataset['cnpj_cpf'].notnull()]
    self._dataset['document_type'] = 'simple_receipt'
```
Is there a rationale for that? Can you mention it in a comment?
Yes, it is a standardization for the core module :) To run `invalid_cpnj_cpf` the dataset must have this field. I can comment it in the code!
Minor refactor on the Federal Senate Adapter:

- Moved column creation to a method of its own
- Created a condition so that the big file is only generated if it doesn't exist, facilitating tests

Tests assume that all steps worked successfully and check that the final file is as it should be:

- columns renamed after the `COLUMNS` variable
- `document_type` column created and filled with `simple_receipt`
Names now reflect what the method and test really do.
@jtemporal there is only one thing missing: @cuducos asked if we can comment in the code why we create another column 👏
Force-pushed from a369611 to cb62ce8
We have a problem: we need Rosie to update the data from the Federal Senate every time she runs, just like she does with the Chamber of Deputies data. Right now that doesn't happen. Also, she is expecting that we already have the data in place. We need help to mock the update in tests. On hold until we finish this; it will be resolved soon. cc @cuducos
Running the update method by default, but mocking it in tests: that sounds like a really good approach IMHO.
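A sketch of that approach with `unittest.mock`; the `Dataset` class and its `update`/`run` methods below are placeholders standing in for the real Rosie API, not its actual names:

```python
from unittest import mock

# Placeholder for the real class whose update() would hit the network.
class Dataset:
    def update(self):
        raise RuntimeError('would download Federal Senate data')

    def run(self):
        self.update()          # refresh the data by default on every run
        return 'suspicions'    # then go look for suspicious reimbursements

# In tests, patch update() so no download happens but run() still works.
with mock.patch.object(Dataset, 'update') as fake_update:
    result = Dataset().run()

fake_update.assert_called_once()
print(result)  # → suspicions
```

This keeps the default behavior (always update) while the test suite stays fast and offline.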
This idea sounds really good to me; that could be a way. If it looks good to @jtemporal we can work on it tomorrow, or I can try it later.
That's the plan.
A better approach to the required missing information on `document_type` in the Federal Senate dataset
(I never do this alone, so I stopped putting my name in the beginning)