Enhancement request: Support '... Ph. D.' instead of only '... Ph.D.' #43

rolfhnelson · 2016-03-14T20:44:08Z

In 0.3.12:

HumanName('John Smith, Ph.D.') works as expected, but the common misspelling HumanName('John Smith, Ph. D.'), which incorrectly has a space between Ph. and D., now yields 'Ph. D. John Smith'. Personally, I would prefer to go back to 0.3.11's behavior, which left the misspelled title at the end.

derek73 · 2016-03-14T21:26:48Z

Interesting. Are you sure that it used to work that way? I have tried locally with every version back to v0.3.5, and the result for "John Smith, Ph. D." is always the same as in v0.3.12. Am I doing something wrong in my local environment, or are you perhaps mistaken about the previous behavior?

In general I try to avoid having the parser correct mistakes in the input, just because there are so many potential mistakes and correcting one frequently causes other valid input to not work. It's more important that it work correctly for input with no mistakes. But it would be nice if the parser could be a useful tool for that, because the reality is that these mistakes sometimes exist in the input.

One approach would be to use the preprocess() method to do some regex replacing on the whole string before it is parsed, correcting the mistake: whenever you find some variation of "ph. d.", replace it with "ph.d." or something similar. That would be fairly simple; you could do it pretty easily by subclassing HumanName, with little understanding of the class's inner workings. But that would actually change the string, so what you got back would not equal what you put in. Some people don't like that.
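A minimal sketch of the regex-replacement approach, assuming only Python's standard `re` module. `normalize_phd` is a hypothetical helper, not part of nameparser; it only collapses whitespace between "Ph." and "D." and leaves everything else untouched.

```python
import re

# Hypothetical pattern: "Ph." followed by whitespace and "D.", any case
# (e.g. "Ph. D.", "ph.  d."). The leading \b keeps it from matching the
# "ph" inside a longer word such as "Joseph".
_PH_D = re.compile(r"\b(ph)\.\s+(d)\.", re.IGNORECASE)


def normalize_phd(full_name: str) -> str:
    """Collapse spaced 'Ph. D.' variants to 'Ph.D.' before parsing.

    Hypothetical helper: this rewrites the input string, so the parsed
    result will no longer round-trip to the original text.
    """
    # Reuse the matched letters so the original capitalization is kept.
    return _PH_D.sub(lambda m: f"{m.group(1)}.{m.group(2)}.", full_name)


print(normalize_phd("John Smith, Ph. D."))  # John Smith, Ph.D.
print(normalize_phd("John Smith, Ph.D."))   # John Smith, Ph.D.
```

Since `HumanName` takes the full name string, calling `HumanName(normalize_phd(raw))` applies the same correction without subclassing, with the same trade-off that the parsed name no longer matches the raw input exactly.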

Another approach would be to make the parser recognize "Ph d" as a suffix. This would be somewhat difficult because at the moment the first thing the parser does is break up the string on spaces, so "ph" and "d" end up in different pieces. Maybe you could do something like what is done with conjunctions: whenever you find a "ph" by itself, connect it to the following piece, I guess only if that piece is a "d". But it's hard to imagine an agnostic solution that would be helpful for more than just "ph d". Can you think of other similar examples?
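The piece-joining idea can be sketched on a plain list of space-separated pieces. `join_ph_d` is a hypothetical function that assumes the pieces come from a simple `str.split()`; it does not hook into nameparser's actual conjunction handling.

```python
def join_ph_d(pieces: list[str]) -> list[str]:
    """Merge a lone 'ph' piece with an immediately following 'd' piece.

    Hypothetical sketch: after splitting on spaces, 'Ph.' and 'D.' are
    separate pieces, so glue them back together before suffix detection.
    """
    def bare(piece: str) -> str:
        # Ignore periods and case, so 'Ph.', 'ph' and 'PH.' all compare equal.
        return piece.replace(".", "").lower()

    joined: list[str] = []
    i = 0
    while i < len(pieces):
        if (
            bare(pieces[i]) == "ph"
            and i + 1 < len(pieces)
            and bare(pieces[i + 1]) == "d"
        ):
            # Keep the original spellings; only drop the space between them.
            joined.append(pieces[i] + pieces[i + 1])
            i += 2
        else:
            joined.append(pieces[i])
            i += 1
    return joined


print(join_ph_d("John Smith Ph. D.".split()))  # ['John', 'Smith', 'Ph.D.']
```

The check is deliberately narrow (only "ph" followed by "d"), which reflects the concern above that a general rule is hard to define without breaking valid names.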

Ideally, I'd like the parser to make it easy for each developer to correct the input for their particular use case. I'm not sure of the best way to do that, though, partly because I know so little about how people actually use this parser. Suggestions welcome.

rolfhnelson · 2016-03-15T01:02:44Z

Are you sure that it used to work that way?

No. Something about suffix handling changed in the last release, but I may have misremembered which test case originally caught my attention.

In general I try to avoid having the parser correct mistakes in the input, just because there are so many potential mistakes and correcting one frequently causes other valid input to not work.

That sounds wise.

But it's hard to imagine an agnostic solution that would be helpful for more than just "ph d". Can you think of other similar examples?

Not really; I think Ed.D. is the only other real example. These errors come up occasionally in older book author data. For example, http://clas.caltech.edu/record/418307?ln=en lists "Harrison, David, Ph. D" (sic).
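Ed.D. and the catalog entry without a trailing period suggest a slightly wider pattern than a Ph.-only rewrite. A minimal sketch using Python's standard `re` module; `normalize_doctorates` is a hypothetical name, and the pattern covers only spaced "Ph. D" and "Ed. D" variants.

```python
import re

# Hypothetical pattern: "Ph." or "Ed.", whitespace, then "D" with an optional
# final period. The \b after "d" rejects words like "Ph. Dawson".
_SPACED_DOCTORATE = re.compile(r"\b(ph|ed)\.\s+(d)\b\.?", re.IGNORECASE)


def normalize_doctorates(full_name: str) -> str:
    """Rewrite 'Ph. D', 'Ph. D.', 'Ed. D.' (any case) as 'Ph.D.' / 'Ed.D.'."""
    return _SPACED_DOCTORATE.sub(
        lambda m: f"{m.group(1)}.{m.group(2)}.", full_name
    )


print(normalize_doctorates("Harrison, David, Ph. D"))  # Harrison, David, Ph.D.
print(normalize_doctorates("Jane Doe, Ed. D."))        # Jane Doe, Ed.D.
```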

derek73 · 2016-03-15T01:19:36Z

The change in the last release with suffix handling is here: fcd7652

It does pertain to the handling of suffixes after a comma. Now the parser will only consider the name to be in the "Firstname Lastname, Suffix" format if the part before the first comma has more than one piece when split on spaces, the assumption being that "Lastname, Suffix" is not an expected/supported format. Does that break something in your data?
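The rule can be expressed as a small classifier. `comma_format` and its return labels are hypothetical; the sketch models only the piece-count condition stated here (anything else the real parser does is not reproduced) and splits with `str.split()`, which also tolerates repeated spaces.

```python
def comma_format(full_name: str) -> str:
    """Classify a comma-containing name by the piece-count rule.

    Hypothetical sketch: returns "suffix" for "Firstname Lastname, Suffix"
    and "lastname" for "Lastname, Firstname".
    """
    before_comma = full_name.split(",", 1)[0]
    # More than one space-separated piece before the first comma means
    # the text after it is treated as a suffix.
    if len(before_comma.split()) > 1:
        return "suffix"
    # A single piece: "Lastname, Suffix" is not a supported format,
    # so read it as "Lastname, Firstname".
    return "lastname"


print(comma_format("John Smith, Ph.D."))       # suffix
print(comma_format("Harrison, David, Ph. D"))  # lastname
```

Under this rule the catalog entry "Harrison, David, Ph. D" has only one piece before its first comma, so it is read in the "Lastname, Firstname" format.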

rolfhnelson · 2016-03-15T02:20:18Z

Does that break something in your data?

No.

derek73 added the enhancement label Mar 19, 2016

derek73 closed this as completed in 76a2b9e Aug 31, 2018

derek73 added this to the v1.0 milestone Aug 31, 2018

derek73 self-assigned this Aug 31, 2018

derek73 added a commit that referenced this issue Aug 31, 2018

Fix overzealous regex for "Ph. D." (#43)

8c55eb9
