Skip to content

Conversation

@willdurand
Copy link
Contributor

This PR makes the SGD/CGD data import slightly more robust:

  • it ignores too long aliases (> 100 chars)
  • it skips and logs rows with no name

Copy link
Contributor

@jmaupetit jmaupetit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Just a single logging statement to add.

aliases = filter(
lambda a: len(a) <= 100,
map(str, feature['aliases'].split('|'))
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a warning message when we ignore one or more aliases:

splitted_aliases = feature['aliases'].split('|')

aliases = filter(
    lambda a: len(a) <= 100,
    map(str, splitted_aliases)
)

if len(aliases) != len(splitted_aliases):
    logger.warning('Ignored long aliases (100+ chars) in {}'.format(feature['aliases'])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call

Copy link
Contributor

@jmaupetit jmaupetit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 99.366% when pulling 4fcf8b6 on long-aliases into 2d52a31 on master.

@willdurand willdurand merged commit 3ca9af3 into master Jan 22, 2018
@willdurand willdurand deleted the long-aliases branch January 22, 2018 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants