
Nickname surrounded by single quotes gets taken as the middle name. #74

Closed
xchek opened this issue Jun 1, 2018 · 3 comments


xchek commented Jun 1, 2018

Isn't it the usual convention to surround a nickname with quotes (and put it before the surname)? Or is this a limitation of nameparser?

Here are some example names (not real people):

Chatham 'Chip' Frankfort
Helen 'Ellie' Brown
derek73 (Owner) commented Jun 1, 2018

Currently the parser recognizes nicknames that are surrounded by parentheses or double quotes. This is controlled by a regex:

```python
("nickname", re.compile(r'\s*?[\("](.+?)[\)"]', re.U)),
```

And the parse_nicknames() method runs it:

```python
def parse_nicknames(self):
    """
    The content of parenthesis or double quotes in the name will
    be treated as nicknames. This happens before any other
    processing of the name.
    """
    # https://code.google.com/p/python-nameparser/issues/detail?id=33
    re_nickname = self.C.regexes.nickname
    if re_nickname.search(self._full_name):
        self.nickname_list = re_nickname.findall(self._full_name)
        self._full_name = re_nickname.sub('', self._full_name)
```
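To see why the examples in this issue fall through, here is a standalone sketch that mimics parse_nicknames() using only the re module. It is not nameparser's own code; the function name extract_nicknames is hypothetical. A single-quoted nickname is never matched, so it stays in the name and is later parsed as a middle name.

```python
import re

# The stock nickname pattern quoted above: an opening "(" or '"', a lazily
# matched body, then a closing ")" or '"'.
NICKNAME = re.compile(r'\s*?[\("](.+?)[\)"]', re.U)


def extract_nicknames(full_name):
    """Hypothetical helper mimicking parse_nicknames(): collect the matches,
    then strip them (and the whitespace before them) from the name."""
    nickname_list = NICKNAME.findall(full_name)
    remaining = NICKNAME.sub('', full_name)
    return nickname_list, remaining


print(extract_nicknames('Chatham "Chip" Frankfort'))  # (['Chip'], 'Chatham Frankfort')
print(extract_nicknames("Chatham 'Chip' Frankfort"))  # ([], "Chatham 'Chip' Frankfort")
```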

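You could make it look for single quotes by replacing that regex, something like:

```python
import re

from nameparser.config import CONSTANTS

# Accept ( ' " as the opening character and ) ' " as the closing one.
CONSTANTS.regexes.nickname = re.compile(r'\s*?[\(\'"](.+?)[\)\'"]', re.U)
```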

I think the only reason I didn't include single quotes is that I wasn't sure how to write the regex so that the closing character only matches the same character that was found as the opening one. There are also some odd edge-case names like "ab'ad al am'an" that I wasn't sure how to weed out. My regex chops are not very strong. If you come up with a better regex, I'd be happy to put it in the library.
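Both concerns can be reproduced with a small sketch, assuming a single-quote-aware pattern like the one suggested (with ")" kept in the closing character class). The function name naive_nicknames is hypothetical. Because the opening and closing characters are two independent classes, a mismatched pair still matches, and apostrophes inside a name are read as quotes.

```python
import re

# Single-quote-aware pattern: any of ( ' " opens, any of ) ' " closes.
NAIVE_NICKNAME = re.compile(r'\s*?[\(\'"](.+?)[\)\'"]', re.U)


def naive_nicknames(full_name):
    """Hypothetical helper returning every captured nickname."""
    return NAIVE_NICKNAME.findall(full_name)


# Works for the examples in this issue:
print(naive_nicknames("Chatham 'Chip' Frankfort"))  # ['Chip']
# But the closing character is not tied to the opening one:
print(naive_nicknames('Helen (Ellie" Brown'))       # ['Ellie']
# And apostrophes inside a name are taken as quotes:
print(naive_nicknames("ab'ad al am'an"))            # ['ad al am']
```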


boxabirds commented Aug 29, 2018

What about using groups? https://gist.github.com/bpeterso2000/11277541
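The linked gist is not reproduced here. One way to apply the "groups" idea (an assumption about what was meant) is a capturing group for the opening quote plus a backreference, so the closing quote must be the same character. The names SAME_QUOTE and quoted_nicknames are hypothetical. This fixes mismatched pairs but not the "ab'ad al am'an" case, and it cannot handle parentheses, whose opening and closing characters differ.

```python
import re

# Group 1 captures the opening quote; the backreference \1 requires the
# closing quote to be exactly the same character.
SAME_QUOTE = re.compile(r'''\s*?(['"])(.+?)\1''', re.U)


def quoted_nicknames(full_name):
    """Hypothetical helper returning the text between matching quotes."""
    return [m.group(2) for m in SAME_QUOTE.finditer(full_name)]


print(quoted_nicknames("Chatham 'Chip' Frankfort"))  # ['Chip']
print(quoted_nicknames('Chatham "Chip\' Frankfort'))  # []
print(quoted_nicknames("ab'ad al am'an"))             # ['ad al am']
```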

derek73 self-assigned this Aug 30, 2018
derek73 modified the milestones: v0.5.8, v0.5.9, v1.0 Aug 30, 2018
derek73 (Owner) commented Aug 30, 2018

Thanks @boxabirds for the tip. That put me on the right track. Once I dug into it, I realized that the three different ways to indicate a nickname follow slightly different rules. Single quotes cannot contain whitespace, but double quotes and parentheses can. Double quotes use the same character to open and close, but parentheses do not. So I ended up splitting them into three different regexes and didn't need to use groups in that way after all. I appreciate the prodding, though, because that's the way it should work, and now it will be fixed in v1.0.
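A minimal sketch of the three-regex approach, assuming only the rules stated in the comment above. The patterns and the names SINGLE_QUOTES, DOUBLE_QUOTES, PARENTHESES and split_nicknames are hypothetical; the regexes actually shipped in nameparser v1.0 may differ.

```python
import re

# Single quotes: same character on both sides, no whitespace inside.
SINGLE_QUOTES = re.compile(r"\s*?'([^\s']+)'", re.U)
# Double quotes: same character on both sides, whitespace allowed.
DOUBLE_QUOTES = re.compile(r'\s*?"(.+?)"', re.U)
# Parentheses: different opening and closing characters, whitespace allowed.
PARENTHESES = re.compile(r'\s*?\((.+?)\)', re.U)

NICKNAME_REGEXES = (SINGLE_QUOTES, DOUBLE_QUOTES, PARENTHESES)


def split_nicknames(full_name):
    """Return (nicknames, remaining_name), applying each regex in turn and
    stripping its matches before the next one runs."""
    nicknames = []
    for regex in NICKNAME_REGEXES:
        nicknames.extend(regex.findall(full_name))
        full_name = regex.sub('', full_name)
    return nicknames, full_name


print(split_nicknames("Chatham 'Chip' Frankfort"))  # (['Chip'], 'Chatham Frankfort')
print(split_nicknames("ab'ad al am'an"))            # ([], "ab'ad al am'an")
print(split_nicknames('Helen (Ellie Mae) Brown'))   # (['Ellie Mae'], 'Helen Brown')
```

Because a single-quoted nickname may not contain whitespace, an apostrophe inside a multi-word name no longer pairs with a later one. The trade-off is that a multi-word nickname has to use double quotes or parentheses.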

derek73 closed this as completed Aug 30, 2018