Misidentifying corporations as persons #15

stevevance opened this issue Apr 1, 2015 · 3 comments


commented Apr 1, 2015

I parsed a list of the top 100 Cook County property tax bill recipients (by number of PINs they receive bills for).

All persons – by my review – were identified correctly as Person, but the following 8 names were mistakenly identified as persons:

  • 235 VAN BUREN
  • HUD
  • NULL

commented Apr 1, 2015

hey @stevevance - thx for trying it out!

I'll add BAFCO & HOMEWERKS-LAMONT to the training data as corporations, but for the other 6 strings, I'm not sure what the 'true' labels would even be. thoughts?

this library can parse name/household/corporation input strings, but doesn't verify whether a string is a name/household/corporation vs none of the three.


commented Apr 1, 2015

@cathydeng I think "235 VAN BUREN" should be ruled out as a Person name because it has a number in it. The others, I don't know. Maybe "null" should be a reserved word?


commented Apr 1, 2015

  • HUD is a corporation
  • Scenic Tree is probably a corporation

The scope of probablepeople is to parse names, not to decide whether a string is a name at all. So we won't handle:

  • Null, Not a name
  • Taxpayer, Not a name
  • 235 Van Buren (NB. there are a lot of businesses that are like "235 Van Buren, LLC")
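
Since probablepeople leaves these cases to the caller, one option is to screen strings before parsing them. Below is a minimal sketch of that idea, not part of probablepeople: the `prescreen` helper, the `PLACEHOLDERS` set, and the three return labels are all hypothetical names chosen for illustration. It only flags strings containing digits rather than rejecting them, because names like "235 Van Buren, LLC" can be legitimate businesses.

```python
import re

# Hypothetical list of placeholder values seen in the data; extend as needed.
PLACEHOLDERS = {"NULL", "TAXPAYER"}


def prescreen(raw):
    """Classify a raw string before handing it to a name parser.

    Returns 'placeholder' for empty or reserved values, 'has_digit' for
    strings containing a number (unlikely to be a person, but possibly a
    business), and 'ok' otherwise.
    """
    cleaned = raw.strip().upper()
    if not cleaned or cleaned in PLACEHOLDERS:
        return "placeholder"
    if re.search(r"\d", cleaned):
        return "has_digit"
    return "ok"
```

A caller could skip 'placeholder' rows entirely and review 'has_digit' rows separately instead of trusting a Person label for them.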
cathydeng closed this in a129caa on Apr 1, 2015