-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed reimbursement with net value with comma #144
Conversation
@@ -0,0 +1,2 @@ | |||
congressperson_name,congressperson_id,congressperson_document,term,state,party,term_id,subquota_number,subquota_description,subquota_group_id,subquota_group_description,supplier,cnpj_cpf,document_number,document_type,issue_date,document_value,remark_value,net_value,month,year,installment,passenger,leg_of_the_trip,batch_number,reimbursement_number,reimbursement_value,applicant_id,document_id | |||
ABELARDO CAMARINHA,141463,329,2011,SP,PSB,54,1,Maintenance of office supporting parliamentary activity,0,,TIM CELULAR S/A,04206050005140,20272-AB,0,2009-05-19 00:00:00,411.51,0,411.51,5,2009,0,,,406387,2950,0,1772,1615640 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file correct? The name says that the reimbursements has a comma, but the content has a value with a point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to reimbursements-with-decimal-point. Thank you for pointing out this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many many thanks for that @viniciusartur! I added minor comments, what do you think about them? Overall it's really good, but I would like you to consider some minor tweaks to make the code even clearer ; )
@@ -0,0 +1,2 @@ | |||
congressperson_name,congressperson_id,congressperson_document,term,state,party,term_id,subquota_number,subquota_description,subquota_group_id,subquota_group_description,supplier,cnpj_cpf,document_number,document_type,issue_date,document_value,remark_value,net_value,month,year,installment,passenger,leg_of_the_trip,batch_number,reimbursement_number,reimbursement_value,applicant_id,document_id | |||
ABELARDO CAMARINHA,141463,329,2011,SP,PSB,54,1,Maintenance of office supporting parliamentary activity,0,,TIM CELULAR S/A,04206050005140,20272-AB,0,2009-05-19 00:00:00,411.51,0,411.51,5,2009,0,,,406387,2950,0,1772,1615640 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should have .csv
as a extension in this filename, right? Also can we have equivalent file names?
For example, one is …with-comma
and the other …with-decimal-point
. What about:
Ano-with-comma.csv
reimbursements-with-point.csv
Or
Ano-with-comma-as-decimal-separator.csv
reimbursements-with-point-as-decimal-separator.csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the first option. I'm working on it.
@@ -67,6 +68,15 @@ def test_clean_2017_reimbursements(self): | |||
with self.subTest(): | |||
assert(subquota in all_subquotas) | |||
|
|||
def test_translate_csv_with_reimbursement_with_net_value_with_comma(self): | |||
csv_with_comma = os.path.join(self.fixtures_path, 'Ano-with-comma.csv') | |||
with open(os.path.join(self.fixtures_path, 'reimbursements-with-decimal-point'), 'r') as csv_expected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we define the path in another line so we keep under the PEP8 suggestion of 80 cols?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. I'm working on it.
expected = csv_expected.read() | ||
|
||
xz_output = Dataset('')._translate_file(csv_with_comma) | ||
output = lzma.open(xz_output).read().decode('utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a context manager here too (with lzma.open(…)
)… or at least to close the lzma file handler after reading it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with that. I'm working on it.
To ensure ubiquitous language, we changed from comma to decimal-comma. |
xz_path = Dataset('')._translate_file(csv_with_decimal_comma) | ||
with lzma.open(xz_path) as xz_file: | ||
output = xz_file.read().decode('utf-8') | ||
assert(output == expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use unittest
to get better messages when test fails? That would be self.assertEqual(output, expected)
Fixing bug on Rosie reported at issue #77
Bug:
Given a CSV with a reimbursement with net_value containing comma (ex: 77,99)
When I translate this CSV using Serenata-Toolbox
Then I have a CSV translated with net_value containing comma
Expected:
Then I have a CSV translated with net_value containing dot
Bug analysis:
Converter responsible for replacing comma with dot was removed on this commit.
Solution:
Using decimal attribute on pandas.read_csv(), as suggested by @cuducos here.
Test added to cover this specific scenario.