Named Entity Recognition

code2k13 edited this page Jul 14, 2021 · 2 revisions

The 'entity.py' script performs NER (Named Entity Recognition) using spaCy on nlphose-compliant JSON records. It does not take any command line parameters. It adds an attribute called 'entities' to each JSON record, containing a list of all named entities found in the record's 'text' along with their types.
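To make the enrichment step concrete, here is a minimal sketch of how a script like this might add the 'entities' attribute. It is not the actual source of 'entity.py'. It assumes one JSON record per line on stdin and the spaCy model `en_core_web_sm`; the function names `add_entities` and `main` are hypothetical.

```python
import json
import sys


def add_entities(record, nlp):
    """Attach an 'entities' list built from the record's 'text' field.

    `nlp` is any callable that returns an object with an `ents` attribute,
    for example a loaded spaCy pipeline.
    """
    doc = nlp(record.get("text", ""))
    record["entities"] = [
        {"label": ent.label_, "entity": ent.text} for ent in doc.ents
    ]
    return record


def main():
    # Imported here so add_entities can be used without spaCy installed.
    import spacy

    # Assumption: the real script may load a different model.
    nlp = spacy.load("en_core_web_sm")

    # Assumption: records arrive as one JSON object per line.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        print(json.dumps(add_entities(json.loads(line), nlp)))
```

Calling `main()` at the end of the file would turn the sketch into a stdin-to-stdout filter that can sit in a pipeline after 'file2json.py'.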

Sample usage of the script is given below:

./file2json.py -n 3 data/1342-0.txt |\
./entity.py

This produces output similar to:

{
  "file_name": "1342-0.txt",
  "id": "6a5fe972-e2e6-11eb-9efa-42b45ace4426",
  "text": "Wickham were returned, and to lament over his absence from the Netherfield ball. He joined them on their entering the town, and attended them to their aunt’s where his regret and vexation, and the concern of everybody, was well talked over. To Elizabeth, however, he voluntarily acknowledged that the necessity of his absence _had_ been self-imposed.",
  "entities": [
    {
      "label": "PERSON",
      "entity": "Wickham"
    },
    {
      "label": "ORG",
      "entity": "Netherfield"
    },
    {
      "label": "PERSON",
      "entity": "Elizabeth"
    }
  ]
}

The 'label' field indicates the type of the named entity (for example 'PERSON' or 'ORG'), and the 'entity' field holds the matching text from the record.
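A downstream step can read the 'entities' attribute directly. The following sketch groups entity text by label for a single record; the helper name `entities_by_label` is hypothetical, and the record is a shortened copy of the sample output above.

```python
import json
from collections import defaultdict


def entities_by_label(record):
    """Group the 'entity' strings of a record by their 'label'."""
    grouped = defaultdict(list)
    for item in record.get("entities", []):
        grouped[item["label"]].append(item["entity"])
    return dict(grouped)


# Shortened copy of the sample record shown above ('text' omitted).
record = json.loads("""
{
  "file_name": "1342-0.txt",
  "entities": [
    {"label": "PERSON", "entity": "Wickham"},
    {"label": "ORG", "entity": "Netherfield"},
    {"label": "PERSON", "entity": "Elizabeth"}
  ]
}
""")

print(entities_by_label(record))
# {'PERSON': ['Wickham', 'Elizabeth'], 'ORG': ['Netherfield']}
```

Records without an 'entities' attribute yield an empty dictionary, so the helper can also be applied to records that have not passed through 'entity.py'.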