CRFEntityExtractor or DIETClassifier splits an entity into multiple entities if it contains punctuation #6795

Closed
ridhimagarg opened this issue Sep 25, 2020 · 4 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

ridhimagarg commented Sep 25, 2020

Rasa version: 1.10.14

Python version: 3.7.5

Operating system: Ubuntu 19.10

Issue:

CRFEntityExtractor or DIETClassifier is splitting a single entity into multiple entities.
I was initially using Rasa 1.10.2 and it worked fine, but because of the ConveRTFeaturizer issue reported against Rasa 1.10.2 (the PolyAI model URL changed), I updated to the latest Rasa 1.x release, 1.10.14.

Now CRFEntityExtractor and DIETClassifier are not identifying the entities properly.

Error:

5306 Walnut Ave., Building A, Sacramento, CA 95841
{
  "intent": {
    "name": "address",
    "confidence": 0.9996100664138794
  },
  "entities": [
    {
      "entity": "insuredaddress",
      "start": 0,
      "end": 15,
      "confidence_entity": 0.8947956045473241,
      "value": "5306 Walnut Ave",
      "extractor": "CRFEntityExtractor"
    },
    {
      "entity": "insuredaddress",
      "start": 18,
      "end": 28,
      "confidence_entity": 0.9364809066380787,
      "value": "Building A",
      "extractor": "CRFEntityExtractor"
    },
    {
      "entity": "insuredaddress",
      "start": 30,
      "end": 40,
      "confidence_entity": 0.9457472912523117,
      "value": "Sacramento",
      "extractor": "CRFEntityExtractor"
    },
    {
      "entity": "insuredaddress",
      "start": 42,
      "end": 50,
      "confidence_entity": 0.9197204154102323,
      "value": "CA 95841",
      "extractor": "CRFEntityExtractor"
    },
    {
      "entity": "insuredaddress",
      "start": 0,
      "end": 15,
      "value": "5306 Walnut Ave",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "insuredaddress",
      "start": 18,
      "end": 28,
      "value": "Building A",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "insuredaddress",
      "start": 30,
      "end": 40,
      "value": "Sacramento",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "insuredaddress",
      "start": 42,
      "end": 50,
      "value": "CA 95841",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "GPE",
      "value": "Walnut Ave",
      "start": 5,
      "confidence": null,
      "end": 15,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "entity": "ORG",
      "value": "Building A",
      "start": 18,
      "confidence": null,
      "end": 28,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "entity": "GPE",
      "value": "Sacramento",
      "start": 30,
      "confidence": null,
      "end": 40,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "start": 0,
      "end": 4,
      "text": "5306",
      "value": 5306,
      "confidence": 1.0,
      "additional_info": {
        "value": 5306,
        "type": "value"
      },
      "entity": "number",
      "extractor": "DucklingHTTPExtractor"
    },
    {
      "start": 45,
      "end": 50,
      "text": "95841",
      "value": 95841,
      "confidence": 1.0,
      "additional_info": {
        "value": 95841,
        "type": "value"
      },
      "entity": "number",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "address",
      "confidence": 0.9996100664138794
    },
    {
      "name": "affirm",
      "confidence": 7.791238749632612e-05
    },
  ],
  "response_selector": {
    "default": {
      "response": {
        "name": null,
        "confidence": 0.0
      },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "5306 Walnut Ave., Building A, Sacramento, CA 95841"
}

If I enter the same address without punctuation, it works:

5306 Walnut Ave Building A Sacramento CA 95841                
{
  "intent": {
    "name": "address",
    "confidence": 0.9991991519927979
  },
  "entities": [
    {
      "entity": "insuredaddress",
      "start": 0,
      "end": 46,
      "confidence_entity": 0.9889847823938246,
      "value": "5306 Walnut Ave Building A Sacramento CA 95841",
      "extractor": "CRFEntityExtractor"
    },
    {
      "entity": "insuredaddress",
      "start": 0,
      "end": 46,
      "value": "5306 Walnut Ave Building A Sacramento CA 95841",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "FAC",
      "value": "Walnut Ave Building",
      "start": 5,
      "confidence": null,
      "end": 24,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "entity": "GPE",
      "value": "Sacramento",
      "start": 27,
      "confidence": null,
      "end": 37,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "entity": "CARDINAL",
      "value": "95841",
      "start": 41,
      "confidence": null,
      "end": 46,
      "extractor": "SpacyEntityExtractor"
    },
    {
      "start": 0,
      "end": 4,
      "text": "5306",
      "value": 5306,
      "confidence": 1.0,
      "additional_info": {
        "value": 5306,
        "type": "value"
      },
      "entity": "number",
      "extractor": "DucklingHTTPExtractor"
    },
    {
      "start": 41,
      "end": 46,
      "text": "95841",
      "value": 95841,
      "confidence": 1.0,
      "additional_info": {
        "value": 95841,
        "type": "value"
      },
      "entity": "number",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "address",
      "confidence": 0.9991991519927979
    },
  ],
  "response_selector": {
    "default": {
      "response": {
        "name": null,
        "confidence": 0.0
      },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "5306 Walnut Ave Building A Sacramento CA 95841"
}

Content of configuration file (config.yml) (if relevant):

NLU pipeline

language: en
pipeline:
  - name: SpacyNLP
    model: en_core_web_md
  - name: ConveRTTokenizer
    model_url: "https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz"
  - name: ConveRTFeaturizer
    model_url: "https://github.com/PolyAI-LDN/polyai-models/releases/download/v1.0/model.tar.gz"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CRFEntityExtractor
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: SpacyEntityExtractor
  - name: DucklingHTTPExtractor
    url: "http://0.0.0.0:8000"
    locale: "en_GB"
    timezone: "US/Pacific"
    timeout: 3
@sara-tagger
Collaborator

Thanks for the issue, @koaning will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@ridhimagarg
Author

Thanks, waiting for a solution.

This needs urgent action: I am using Rasa in production and it is stuck!

Thanks in advance!

@akelad
Contributor

akelad commented Sep 30, 2020

Hi @ridhimagarg, this behaviour is intentional as of PR #6191. We recognise that there might be different use cases like yours though, so I've opened another issue #6852 to address this.

For now, the workaround would be to combine the separate entities and fill a slot with the result in a custom action. Alternatively, you could use a list slot, which will store all the values in a list automatically.
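
A minimal sketch of the "combine the entities" idea, in plain Python so it runs without Rasa installed. The helper name `merge_entity_spans` is hypothetical and not part of Rasa. It assumes the message contains only one address, so everything between the first and last matching span (including the punctuation that caused the split) belongs to the same entity. The offsets are taken from the DIETClassifier output in the report above.

```python
from typing import Any, Dict, List, Optional


def merge_entity_spans(
    text: str,
    entities: List[Dict[str, Any]],
    entity_name: str,
    extractor: Optional[str] = None,
) -> Optional[str]:
    """Return the slice of `text` from the first to the last matching entity span.

    Hypothetical helper: assumes at most one logical entity of this type per message.
    Returns None when no entity matches.
    """
    spans = [
        (e["start"], e["end"])
        for e in entities
        if e.get("entity") == entity_name
        and (extractor is None or e.get("extractor") == extractor)
    ]
    if not spans:
        return None
    start = min(s for s, _ in spans)
    end = max(e for _, e in spans)
    return text[start:end]


# Data copied from the parse result in the issue (DIETClassifier entries only).
text = "5306 Walnut Ave., Building A, Sacramento, CA 95841"
entities = [
    {"entity": "insuredaddress", "start": 0, "end": 15, "extractor": "DIETClassifier"},
    {"entity": "insuredaddress", "start": 18, "end": 28, "extractor": "DIETClassifier"},
    {"entity": "insuredaddress", "start": 30, "end": 40, "extractor": "DIETClassifier"},
    {"entity": "insuredaddress", "start": 42, "end": 50, "extractor": "DIETClassifier"},
]

address = merge_entity_spans(text, entities, "insuredaddress", "DIETClassifier")
print(address)
```

In a custom action, the two inputs would presumably come from `tracker.latest_message["text"]` and `tracker.latest_message["entities"]`, and the merged value could be written to the slot with a `SlotSet` event. Filtering on one extractor avoids counting the same span twice when both CRFEntityExtractor and DIETClassifier are in the pipeline.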

@akelad akelad closed this as completed Sep 30, 2020
@ridhimagarg
Author

Thanks @akelad

It would be great if this enhancement could be implemented.
