
Fixed reimbursement with net value with comma #144

Merged
merged 6 commits on Sep 13, 2017

Conversation

@viniciusartur (Contributor) commented Sep 12, 2017

Fixing a bug affecting Rosie, reported in issue #77.

Bug:
Given a CSV with a reimbursement whose net_value contains a comma (e.g. 77,99)
When I translate this CSV using Serenata-Toolbox
Then I get a translated CSV whose net_value still contains a comma

Expected:
Then I get a translated CSV whose net_value contains a dot

Bug analysis:
The converter responsible for replacing the comma with a dot was removed in this commit.

Solution:
Use the decimal parameter of pandas.read_csv(), as suggested by @cuducos here.
A test was added to cover this specific scenario.
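
A minimal sketch of the idea (assuming the translation step loads the chamber CSV with pandas; the file names here are illustrative, not the exact toolbox code):

import pandas as pd

# decimal=',' makes pandas parse values such as "77,99" as the float 77.99,
# so the translated CSV is written back out with a dot as the decimal separator.
data = pd.read_csv('Ano-with-decimal-comma.csv', decimal=',')
data.to_csv('reimbursements.csv', index=False)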

@@ -0,0 +1,2 @@
congressperson_name,congressperson_id,congressperson_document,term,state,party,term_id,subquota_number,subquota_description,subquota_group_id,subquota_group_description,supplier,cnpj_cpf,document_number,document_type,issue_date,document_value,remark_value,net_value,month,year,installment,passenger,leg_of_the_trip,batch_number,reimbursement_number,reimbursement_value,applicant_id,document_id
ABELARDO CAMARINHA,141463,329,2011,SP,PSB,54,1,Maintenance of office supporting parliamentary activity,0,,TIM CELULAR S/A,04206050005140,20272-AB,0,2009-05-19 00:00:00,411.51,0,411.51,5,2009,0,,,406387,2950,0,1772,1615640
Contributor:

Is this file correct? The name says that the reimbursements have a comma, but the content has a value with a point.

Contributor Author:

I renamed it to reimbursements-with-decimal-point. Thank you for pointing this out.

@cuducos (Collaborator) left a comment:

Many, many thanks for that @viniciusartur! I added minor comments; what do you think about them? Overall it's really good, but I would like you to consider some minor tweaks to make the code even clearer ; )

Collaborator:

I guess we should have .csv as an extension in this filename, right? Also, can we have equivalent file names?

For example, one is …with-comma and the other …with-decimal-point. What about:

  • Ano-with-comma.csv
  • reimbursements-with-point.csv

Or

  • Ano-with-comma-as-decimal-separator.csv
  • reimbursements-with-point-as-decimal-separator.csv

Contributor Author:

I agree with the first option. I'm working on it.

@@ -67,6 +68,15 @@ def test_clean_2017_reimbursements(self):
            with self.subTest():
                assert(subquota in all_subquotas)

    def test_translate_csv_with_reimbursement_with_net_value_with_comma(self):
        csv_with_comma = os.path.join(self.fixtures_path, 'Ano-with-comma.csv')
        with open(os.path.join(self.fixtures_path, 'reimbursements-with-decimal-point'), 'r') as csv_expected:
Collaborator:

Can we define the path on another line so we stay under the PEP 8 suggestion of 80 columns?
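
For example (just a sketch; the exact variable name is up to you):

expected_path = os.path.join(self.fixtures_path,
                             'reimbursements-with-decimal-point')
with open(expected_path, 'r') as csv_expected:
    expected = csv_expected.read()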

Contributor Author:
I agree with that. I'm working on it.

            expected = csv_expected.read()

        xz_output = Dataset('')._translate_file(csv_with_comma)
        output = lzma.open(xz_output).read().decode('utf-8')
Collaborator:

We need a context manager here too (with lzma.open(…))… or at least to close the lzma file handle after reading it.
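
Something along these lines (just a sketch):

with lzma.open(xz_output) as xz_file:
    output = xz_file.read().decode('utf-8')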

Contributor Author:

I agree with that. I'm working on it.

@viniciusartur (Contributor Author):

To ensure ubiquitous language, we changed from comma to decimal-comma.

        xz_path = Dataset('')._translate_file(csv_with_decimal_comma)
        with lzma.open(xz_path) as xz_file:
            output = xz_file.read().decode('utf-8')
        assert(output == expected)
Collaborator:

Let's use unittest assertions to get better messages when a test fails? That would be self.assertEqual(output, expected).

@okfn-brasil deleted four comments from coveralls on Sep 13, 2017
@coveralls commented Sep 13, 2017


Coverage remained the same at 85.667% when pulling 626a151 on viniciusartur:master into f30d0a8 on datasciencebr:master.
