LA SELVA BEACH, CA causes parser confusion about the *state* #406

@glind-cc

Description

The Input Address

'360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076'

Current Output

ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076
PARSED TOKENS: [('360', 'AddressNumber'), ('CAM', 'StreetName'), ('AL', 'StreetNamePostType'), ('BARRANCO,', 'PlaceName'), ('LA', 'StateName'), ('SELVA', 'PlaceName'), ('BEACH,', 'PlaceName'), ('CA,', 'StateName'), ('95076', 'ZipCode')]
UNCERTAIN LABEL: PlaceName

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

Expected Output

360 - AddressNumber
CAM AL BARRANCO - StreetName
??. - StreetNamePostType
LA SELVA BEACH - PlaceName
CA - StateName

Examples

  • Fail: '360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076'
  • Fail: '360 CAMINO AL BARRANCO, LA SELVA BEACH, CA, 95076'
  • OK: "360 Camino al Barranco, Selva Beach, CA"

Additional context

This is clearly meant to be "360 Camino al Barranco, La Selva Beach, CA", which is here: https://maps.app.goo.gl/bobVwDJjJ6ke5dRFA.

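It throws a repeated-label error because it sees the "LA" in La Selva Beach as a state (presumably as the abbreviation for Louisiana), so both StateName and PlaceName end up in two separate parts of the string.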
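
To make the failure concrete, here is a minimal sketch of a repeated-label check run on the PARSED TOKENS from the error output above. The helper `repeated_labels` is hypothetical and written only for illustration; it is not the library's own implementation, just one way to find labels that appear in more than one non-adjacent run.

```python
# Token/label pairs copied from the PARSED TOKENS line in the error output.
parsed = [
    ("360", "AddressNumber"),
    ("CAM", "StreetName"),
    ("AL", "StreetNamePostType"),
    ("BARRANCO,", "PlaceName"),
    ("LA", "StateName"),
    ("SELVA", "PlaceName"),
    ("BEACH,", "PlaceName"),
    ("CA,", "StateName"),
    ("95076", "ZipCode"),
]


def repeated_labels(pairs):
    """Hypothetical helper: labels that start a second, non-adjacent run."""
    seen = set()
    repeated = []
    previous = None
    for _token, label in pairs:
        if label != previous:
            # A new run begins; it is a repeat if this label was seen before.
            if label in seen and label not in repeated:
                repeated.append(label)
            seen.add(label)
        previous = label
    return repeated


print(repeated_labels(parsed))  # ['PlaceName', 'StateName']
```

PlaceName is the first label to repeat, which matches the UNCERTAIN LABEL reported in the error. StateName repeats as well, because "LA" and "CA," are both tagged as states.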

Editorial

  • features look correct
  • parsing looks correct

I think the problem is in TAGGER.tag(features), which is labeling "LA" as a StateName.

As an alternative solution, consider giving commas some weight during training, or special-casing states somehow.
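
As a rough illustration of why commas could be a useful signal here, the sketch below splits the failing input on commas and looks for two-letter tokens that fill a whole segment. The helpers `comma_segments` and `standalone_state_candidates` are hypothetical names made up for this example; this is not how the library tokenizes or trains, only a demonstration that "CA" stands alone between commas while "LA" does not.

```python
def comma_segments(address):
    """Split an address on commas, then whitespace-tokenize each segment."""
    segments = [part.split() for part in address.split(",")]
    return [seg for seg in segments if seg]  # drop empty segments


def standalone_state_candidates(address):
    """Two-letter alphabetic tokens that make up an entire comma segment."""
    return [
        seg[0]
        for seg in comma_segments(address)
        if len(seg) == 1 and len(seg[0]) == 2 and seg[0].isalpha()
    ]


address = "360 CAM AL BARRANCO, LA SELVA BEACH, CA, 95076"
print(comma_segments(address))
# [['360', 'CAM', 'AL', 'BARRANCO'], ['LA', 'SELVA', 'BEACH'], ['CA'], ['95076']]
print(standalone_state_candidates(address))  # ['CA']
```

A feature along these lines would only help for comma-delimited input; addresses written without commas would still need the tagger to tell "LA" the place-name word from "LA" the state.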

Thanks for looking into this!
