Skip to content
This repository has been archived by the owner on Sep 5, 2019. It is now read-only.

Wabstic import #158

Merged
merged 23 commits into from
Aug 9, 2019
Merged

Wabstic import #158

merged 23 commits into from
Aug 9, 2019

Conversation

gh-PonyM
Copy link
Contributor

@gh-PonyM gh-PonyM commented Aug 8, 2019

  • Improved errors when parsing major and proporz elections.
  • Switching expected headers in wabstic import scripts from gewahlt to gewaehlt. Fixed failing import of majorz election test dataset that raised "Missing column: gewahlt".
  • Import of new majorz and proporz test data was successfull

@gh-PonyM gh-PonyM requested a review from href August 8, 2019 15:43
@coveralls
Copy link

coveralls commented Aug 8, 2019

Coverage Status

Coverage decreased (-0.04%) to 94.844% when pulling e5a4fcd on wabstic_import into d21c358 on master.

Copy link
Contributor

@href href left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, aside from a few small things. You certainly want to avoid consuming all the CSV lines however - that's a blocker for me.

I think coercing None to 0 makes sense, as CandiateResult.votes wants it that way. When in doubt we probably want to stick with the existing model. A lot more thinking goes into building up the model than adding features later, so I guess we trust that it makes sense as is 😉

def line_is_relevant(line, number, district=None):
if district:
return line.sortwahlkreis == district and line.sortgeschaeft == number
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the else, since the if above always returns:

if district:
    return line.sortwahlkreis == district and line.sortgeschaeft == number

return line.sortgeschaeft == number

:param none_be_zero: raises ValueError if line.col is None
:return: integer value of line.col
"""
assert hasattr(line, col), 'Check done in load_csv'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 'Check done in load_csv' mean? Could you elaborate here?

If you need longer texts, you can also write assertions like this:

assert hasattr(line, col), f"""
     {col} was not found on {line} - this should have failed in load_csv
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to drop this line, since in load_csv in common.py, checks if the expected columns are there is performed already. The comment was a remainder of that.

return line.sortgeschaeft == number


def validate_integer(line, col, none_be_zero=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say non_be_zero should be named treat_none_as_zero. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better like this, I also added a default kwarg to return that default and changed the lines where the votes are received for the candidate.

entity_id = int(line.bfsnrgemeinde or 0)
def get_entity_id(line):
col = 'bfsnrgemeinde'
# try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out code should not be committed, unless it is meant to be activated in the near future (for example, if a Puppet deployment has been prepared and tested, but not yet rolled out).

@@ -79,6 +81,16 @@ def import_election_wabstic_majorz(
entities = principal.entities[election.date.year]
election_id = election.id

def has_no_lines(lines, filename):
if not list(lines):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You consume the whole list of lines, to check if there's at least one line. That seems wasteful. If lines is not a generator, you can just check for by running if lines.

If lines is a generator, don't have to consume it to check for it. You can do either this:

for line in lines:
     return False  # has lines

return True # has no lines

Or by getting the next value of the generator with a fallback:

if next(lines, None):
    return True # has lines

return False  # has no lines
     

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's definitively a blocker...

@@ -179,6 +185,16 @@ def import_election_wabstic_proporz(
entities = principal.entities[election.date.year]
election_id = election.id

def has_no_lines(lines, filename):
if not list(lines):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this should not traverse all lines.

Lukas added 5 commits August 9, 2019 18:16
- unclear which files could be empty and must be importable
- does not fit script flow causing errors anyway and aborting import
@gh-PonyM gh-PonyM merged commit 7991878 into master Aug 9, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants