ERROR: Unable to tag this string because more than one area of the string has the same label #180

Open
rsingh2083 opened this issue May 12, 2017 · 8 comments


@rsingh2083

While tagging this

usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

I'm getting this error:

---------------------------------------------------------------------------
RepeatedLabelError                        Traceback (most recent call last)
<ipython-input-41-410055ac0cac> in <module>()
----> 1 usaddress.tag('Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States')

C:\Users\Rahul\Anaconda2\lib\site-packages\usaddress\__init__.pyc in tag(address_string, tag_mapping)
    176         else:
    177             raise RepeatedLabelError(address_string, parse(address_string),
--> 178                                      label)
    179 
    180         last_label = label

RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  Mr. Robbie Thomson,Cal. Hosp 2,Street 11, Block H,Jersey, New Jersey 121889,United States
PARSED TOKENS:    [(u'Mr.', 'Recipient'), (u'Robbie', 'Recipient'), (u'Thomson,', 'Recipient'), (u'Cal.', 'Recipient'), (u'Hosp', 'Recipient'), (u'2,', 'AddressNumber'), (u'Street', 'StreetNamePreType'), (u'11,', 'StreetName'), (u'Block', 'Recipient'), (u'H,', 'Recipient'), (u'Jersey,', 'Recipient'), (u'New', 'Recipient'), (u'Jersey', 'Recipient'), (u'121889,', 'AddressNumber'), (u'United', 'StreetName'), (u'States', 'StreetName')]
UNCERTAIN LABEL:  Recipient

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
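
For readers trying to see why this particular parse is rejected, here is a minimal, stdlib-only sketch of the kind of check the traceback above points at: a label may cover one contiguous run of tokens, and the error is raised when the same label starts a second, separate run. The function name `first_repeated_label` is hypothetical and is not part of usaddress; the token list is copied from the PARSED TOKENS output above.

```python
def first_repeated_label(tokens):
    """Return the first label that starts a second, non-adjacent run, or None.

    `tokens` is a list of (token, label) pairs, in the same shape as the
    PARSED TOKENS output shown in the error message.
    """
    seen = set()
    last_label = None
    for _token, label in tokens:
        # A label that differs from the previous one but was already used
        # earlier means the same label covers two separate areas.
        if label != last_label and label in seen:
            return label
        seen.add(label)
        last_label = label
    return None


# Tokens copied from the error output above (Python 3 string literals).
tokens = [
    ('Mr.', 'Recipient'), ('Robbie', 'Recipient'), ('Thomson,', 'Recipient'),
    ('Cal.', 'Recipient'), ('Hosp', 'Recipient'), ('2,', 'AddressNumber'),
    ('Street', 'StreetNamePreType'), ('11,', 'StreetName'),
    ('Block', 'Recipient'), ('H,', 'Recipient'), ('Jersey,', 'Recipient'),
    ('New', 'Recipient'), ('Jersey', 'Recipient'),
    ('121889,', 'AddressNumber'), ('United', 'StreetName'),
    ('States', 'StreetName'),
]

print(first_repeated_label(tokens))
```

Here the second run of Recipient begins at "Block", which matches the UNCERTAIN LABEL line in the error.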
@jeancochrane
Contributor

Hey @rsingh2083,

Thanks for filing this! That's a real doozy of an address. I haven't been able to figure out what it's referring to.

If you can confirm that this is a valid address pattern, we'd be happy to bring it in as training data. We'll need 4-5 more examples of the pattern to be able to train the model reliably.

@gl-ronak

I get a similar error for this address: 9234 N Loop 1604 W San Antonio TX 78249

@jeancochrane
Contributor

Hey @gl-ronak,

Can you tell me how you were expecting that address to be parsed? In particular, what does the second set of numerics (1604) refer to?

If you can find 3-4 more examples of this pattern, we'd be glad to bring it in as training data.

@NoahCardoza

NoahCardoza commented Sep 29, 2020

I just experienced a similar issue.

usaddress.RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada
PARSED TOKENS:    [('1407', 'AddressNumber'), ('7', 'StreetName'), ('Ave', 'StreetNamePostType'), ('NW,', 'StreetNamePostDirectional'), ('Calgary,', 'PlaceName'), ('AB', 'StateName'), ('T2N', 'OccupancyIdentifier'), ('0Z3,', 'OccupancyIdentifier'), ('Canada', 'PlaceName')]
UNCERTAIN LABEL:  PlaceName

I'm using this library to automate the parsing of data from Google Maps to input into a SF db of organizations we work with. I think I see where the error occurred (", Calgary,"); however, it is a Canadian address, so could that be expected?

@jeancochrane
Contributor

@NoahCardoza I think in this case there are actually two things going on:

  1. The Canadian postal code format is pretty different from ZIP codes, so the postal code is getting tagged as OccupancyIdentifier, which is probably throwing off the tagging of Canada and causing it to get tagged as a repeated PlaceName
  2. We don't support non-US countries, so there's no real valid tag for the Canada string anyway

I was able to get a slightly more sensible parse by removing Canada from the end of the string:

>>> usaddress.tag('1407 7 Ave NW, Calgary, AB T2N 0Z3')
(OrderedDict([('AddressNumber', '1407'), ('StreetName', '7'), ('StreetNamePostType', 'Ave'), ('StreetNamePostDirectional', 'NW'), ('PlaceName', 'Calgary'), ('StateName', 'AB T2N'), ('ZipCode', '0Z3')]), 'Street Address')
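
If the input comes from a source that always appends a country, the manual step above could be automated before tagging. The following is a rough sketch only: `strip_trailing_country` and the `TRAILING_COUNTRIES` list are hypothetical names introduced for illustration, not part of usaddress, and the list would need adjusting to your data.

```python
import re

# Hypothetical list of country names to drop; extend as needed for your data.
TRAILING_COUNTRIES = ("Canada", "United States", "USA")


def strip_trailing_country(address, countries=TRAILING_COUNTRIES):
    """Remove one trailing country name, plus the separator before it."""
    alternatives = "|".join(re.escape(name) for name in countries)
    # Match only at the very end of the string, after a comma or whitespace.
    pattern = r"[\s,]+(?:" + alternatives + r")\s*$"
    return re.sub(pattern, "", address, flags=re.IGNORECASE)


cleaned = strip_trailing_country('1407 7 Ave NW, Calgary, AB T2N 0Z3, Canada')
print(cleaned)
```

The cleaned string can then be passed to usaddress.tag as in the example above. Note that this only removes the country; the Canadian postal code is still split across StateName and ZipCode, as the output above shows.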

@NoahCardoza

Ah, that should probably be enough. We don't have many organizations in CA. That said, what are your thoughts on #254? I'm assuming you might not be merging it, seeing as the name of this project is usaddress?

@jeancochrane
Contributor

I don't expect we'll support Canadian addresses in the near future, but if you'd like to support them you might try training your own model using the supplemental training data in #254.

@LeandroLosaria

Hello,

I just encountered the same error:

ORIGINAL STRING: Bronx, New York City
PARSED TOKENS: [('Bronx,', 'PlaceName'), ('New', 'StateName'), ('York', 'PlaceName'), ('City', 'PlaceName')]
UNCERTAIN LABEL: PlaceName


It seems to be a valid place:
https://www.britannica.com/place/Bronx-borough-New-York-City
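
In case it helps anyone who needs to keep going when the labels repeat: the per-token output shown in PARSED TOKENS can still be consumed by merging adjacent tokens with the same label into runs, keeping them as an ordered list rather than a dict so that a repeated label is not a problem. This is only a sketch; `group_label_runs` is a hypothetical helper, not a usaddress function, and the punctuation stripping is my own assumption about what is convenient.

```python
from itertools import groupby


def group_label_runs(tokens):
    """Merge adjacent (token, label) pairs that share a label.

    Returns a list of (label, text) pairs in original order; the same
    label may appear more than once.
    """
    runs = []
    for label, group in groupby(tokens, key=lambda pair: pair[1]):
        # Join the tokens of one run and trim separators left on the ends.
        text = " ".join(token for token, _label in group).strip(" ,;")
        runs.append((label, text))
    return runs


# Tokens copied from the error output above.
tokens = [('Bronx,', 'PlaceName'), ('New', 'StateName'),
          ('York', 'PlaceName'), ('City', 'PlaceName')]

runs = group_label_runs(tokens)
print(runs)
```

This does not fix the labeling itself ("New" is still tagged StateName here); it only makes the repeated runs visible so they can be handled or reported.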


5 participants