First name is a known prefix #20

Closed
rlsjr opened this issue Jan 14, 2022 · 8 comments


@rlsjr

rlsjr commented Jan 14, 2022

I greatly appreciate your project. In the data I am using, I have a Mr. Del Richards. The parser returns "Del Richards" as the last name because "del" is in the prefix list, but Del is actually his first name. I am thinking the parser could treat the prefix as the first name when there is no first name, but there will be times when no first name is included. Any thoughts on how to parse this one name correctly?

@Kaotic3

Kaotic3 commented Jan 17, 2022

I am encountering the same problem.

I tried to open the base file and amend it to remove the prefix that was actually a name, but the file is read-only. My knowledge of Visual Studio 2022 is a bit limited, and I have no idea whether this is done on your end to protect the project or is something Visual Studio is doing.

For the poster above: if you go to your program, hold down CTRL, and click on the HumanName class, it should open the NameParser.HumanName file. There you will find the lists of all the prefixes, titles, and suffixes being used; "del" is in the prefixes. You may be able to amend them there, although I couldn't.

@rlsjr
Author

rlsjr commented Jan 17, 2022

Kaotic3, I do not want the prefix removed; it is valid. Removing it would incorrectly impact other names. We need a way to determine when a prefix is a first name instead of a prefix.

@aeshirey
Owner

I agree that "del" shouldn't be removed from the prefixes, but handling this case isn't clear-cut. The input "Mr. Del Richards" could be reasonably understood to be Title First Last or Title LastPrefix Last, so I see a few possibilities here:

  • Make prefix parsing case sensitive because surname prefixes are sometimes lowercase (e.g., Guido van Rossum). Unfortunately, this isn't always the case (e.g., Oscar De La Hoya); plus, making this library suddenly case aware seems like a breaking change.
  • Add logic to specifically handle the case when you have Title + LastPrefix + Last where LastPrefix is a single token. This would correctly handle "Mr. Del Richards" but would also incorrectly parse "Mr. Van Rossum". Also breaking.
  • Detect this case and parse it twice, yielding the alternate in the HumanName.AdditionalName field. This might be a bit complex to handle and would kick the disambiguation to the caller. What might the 'correct' handling of "Mr. Del Richards and Mr. Van Rossum" be? You'd get four names back and you'd have to decide yourself how to handle the results.
  • Add a ctor argument that dictates how parsing should be handled, e.g.:
    var del = new HumanName("Mr. Del Richards", Prefer.FirstOverPrefix); // first = "Del", last = "Richards"
    var van = new HumanName("Mr. Van Rossum",   Prefer.PrefixOverFirst); // first = "",    last = "Van Rossum"
  • Expose the configuration lists so the caller can modify them as desired. If you want to remove "del" from the list of Prefixes, that should fix the problem for you, but you'll need to know that's what you want.

The last two options seem the least bad. They still require the caller to disambiguate, but callers should know their data better than I do, and I can't think of a better way to handle it.

@rlsjr
Author

rlsjr commented Jan 17, 2022

I would not modify the configuration lists. With my data, removing the Del prefix would fix the one record but cause two others to be parsed incorrectly. I know my data should always have a first name, so a PreferFirstOverPrefix option could work. But how would Mr. David M Del Rio be parsed if PreferFirstOverPrefix is selected? I am sure you are a lot more knowledgeable about names than I am.

@aeshirey
Copy link
Owner

The ability to modify configuration now exists (in the source code, not yet in the NuGet package), but you're not obligated to use this.

I'm also adding in a Prefer enum that should do what you want:

public void FirstNameIsPrefix()
{
    // Default behavior
    var parsed_prefix = new HumanName("Mr. Del Richards");
    // Assert.AreEqual takes (expected, actual)
    Assert.AreEqual("Mr.", parsed_prefix.Title);
    Assert.AreEqual("", parsed_prefix.First);
    Assert.AreEqual("Del Richards", parsed_prefix.Last);
    Assert.AreEqual("Del", parsed_prefix.LastPrefixes);

    // A single prefix should be treated as a first name when no first exists
    var parsed_first = new HumanName("Mr. Del Richards", Prefer.FirstOverPrefix);
    Assert.AreEqual("Mr.", parsed_first.Title);
    Assert.AreEqual("Del", parsed_first.First);
    Assert.AreEqual("Richards", parsed_first.Last);
    Assert.AreEqual("", parsed_first.LastPrefixes);
}

Does this fit your needs?
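For illustration only, here is a minimal standalone sketch of the rule in the test comment above ("a single prefix should be treated as a first name when no first exists"), and of how such a rule would treat Mr. David M Del Rio. It does not call NameParserSharp and is not its implementation: the PreferSketch enum, ParsedSketch record, PrefixHeuristicSketch class, the tiny title/prefix lists, and the whitespace tokenizer are all hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical preference switch mirroring the idea in this thread; NOT the library's Prefer enum.
enum PreferSketch { PrefixOverFirst, FirstOverPrefix }

// Hypothetical result type; NOT NameParserSharp's HumanName.
record ParsedSketch(string Title, string First, string Middle, string Last);

static class PrefixHeuristicSketch
{
    // Assumed, deliberately tiny lists for illustration only.
    static readonly HashSet<string> Titles =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mr.", "mrs.", "dr." };
    static readonly HashSet<string> Prefixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "del", "van", "de", "la" };

    public static ParsedSketch Parse(string input, PreferSketch prefer)
    {
        // Naive whitespace tokenization (an assumption; real name parsing is more involved).
        var tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

        // Strip a leading title, if any.
        string title = "";
        if (tokens.Count > 0 && Titles.Contains(tokens[0]))
        {
            title = tokens[0];
            tokens.RemoveAt(0);
        }
        if (tokens.Count == 0) return new ParsedSketch(title, "", "", "");

        // Walk backwards from the final token, absorbing any surname prefixes in front of it.
        int lastStart = tokens.Count - 1;
        while (lastStart > 0 && Prefixes.Contains(tokens[lastStart - 1]))
            lastStart--;

        // Ambiguous case: exactly one prefix token before the surname and nothing else,
        // so no separate first name exists (e.g. "Mr. Del Richards").
        if (lastStart == 0 && tokens.Count == 2 && prefer == PreferSketch.FirstOverPrefix)
            return new ParsedSketch(title, tokens[0], "", tokens[1]);

        string first = lastStart > 0 ? tokens[0] : "";
        string middle = lastStart > 1 ? string.Join(" ", tokens.Skip(1).Take(lastStart - 1)) : "";
        string last = string.Join(" ", tokens.Skip(lastStart));
        return new ParsedSketch(title, first, middle, last);
    }
}
```

Under this sketch, "Mr. David M Del Rio" yields first "David", middle "M", last "Del Rio" with either preference, because a first name is already present. The preference only matters for inputs like "Mr. Del Richards", and FirstOverPrefix would equally turn "Mr. Van Rossum" into first "Van", last "Rossum", which is the trade-off described earlier in the thread.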

@rlsjr
Author

rlsjr commented Jan 17, 2022

I think that would fit my requirements. Thank you.

@aeshirey
Copy link
Owner

The new version is available on NuGet. You should be able to install it with Install-Package NameParserSharp -Version 1.5.0

@rlsjr
Author

rlsjr commented Jan 17, 2022

Thank you so much for your help. Update works great and all names are parsed as expected.
