Proper hyphen linked address components (unit-address) splitting #137

Closed
davebulaval opened this issue Jul 4, 2022 · 1 comment
Labels
enhancement New feature or request

davebulaval commented Jul 4, 2022

Originally posted by @MasseGuillaume in #136 (comment)

I think the parsing for apartments in Canada can be improved:

Canada Post's addressing guidelines (https://www.canadapost-postescanada.ca/cpc/en/support/kb/sending/general-information/how-to-address-mail-and-parcels) state:

"Put a hyphen between the unit/suite/apartment number and the street number. Don’t use the # symbol."

address_parser("1-123 Rue Toto Montreal Canada")

obtained:

FormattedParsedAddress<StreetNumber='1-123', StreetName='rue toto', Municipality='montreal', Province='canada'>

expected:

FormattedParsedAddress<StreetNumber='123', StreetName='rue toto', Unit='1', Municipality='montreal', Province='canada'>
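
Until the models learn this split, one possible workaround is to post-process the parsed street number. The following is only a minimal sketch under the assumption that the input follows the Canada Post convention `<unit>-<street number>`; the helper `split_unit_street_number` is hypothetical and is not part of deepparse.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not part of deepparse). Assumes the Canada Post
# convention "<unit>-<street number>", e.g. "1-123".
_UNIT_STREET = re.compile(r"^\s*([0-9A-Za-z]+)\s*-\s*(\d+[A-Za-z]?)\s*$")


def split_unit_street_number(street_number: str) -> Tuple[Optional[str], str]:
    """Return (unit, street_number); unit is None when there is no hyphen."""
    match = _UNIT_STREET.match(street_number)
    if match is None:
        # No "<unit>-<number>" pattern: leave the value untouched.
        return None, street_number.strip()
    return match.group(1), match.group(2)


print(split_unit_street_number("1-123"))  # ('1', '123')
print(split_unit_street_number("123"))    # (None, '123')
```

A rule like this is only safe where the hyphen is known to separate a unit from a street number; hyphenated house numbers exist in other addressing conventions, so a learned solution is likely more robust.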

NB. libpostal gives the same incorrect result:

docker run -d -p 8080:8080 clicksend/libpostal-rest  
curl -X POST -d '{"query": "1-123 rue toto Montreal Quebec Canada"}' localhost:8080/parser | jq "."
[
  {
    "label": "house_number",
    "value": "1-123"
  },
  {
    "label": "road",
    "value": "rue toto"
  },
  {
    "label": "city",
    "value": "montreal"
  },
  {
    "label": "state",
    "value": "quebec"
  },
  {
    "label": "country",
    "value": "canada"
  }
]
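
The same idea can be applied to the JSON returned above. This is a hedged sketch, not a libpostal feature: the function `split_house_number` is hypothetical, and it assumes every hyphenated `house_number` is a Canadian-style `<unit>-<street number>` pair, emitting the unit under the label `unit`.

```python
import json
import re

# Hypothetical post-processing step (not part of libpostal or its REST wrapper).
HYPHENATED = re.compile(r"^([0-9A-Za-z]+)-(\d+)$")


def split_house_number(components):
    """Split a hyphenated house_number into unit and house_number entries."""
    result = []
    for component in components:
        match = None
        if component["label"] == "house_number":
            match = HYPHENATED.match(component["value"])
        if match:
            result.append({"label": "unit", "value": match.group(1)})
            result.append({"label": "house_number", "value": match.group(2)})
        else:
            # Copy other components unchanged.
            result.append(dict(component))
    return result


parsed = json.loads(
    '[{"label": "house_number", "value": "1-123"},'
    ' {"label": "road", "value": "rue toto"}]'
)
print(json.dumps(split_house_number(parsed), indent=2))
```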
davebulaval self-assigned this Jul 4, 2022
davebulaval added the enhancement (New feature or request) label Jul 4, 2022

davebulaval (Collaborator, Author) commented:

Out-of-the-box performance, evaluated on a new dataset for these cases, is as follows.

Model Type     Accuracy
FastText       86.50
FastTextAtt    87.72
BPEmb          71.85
BPEmbAtt       87.81
