#**NLTK NE Chunker**

**Sources**

* [Lifting the Hood on NLTK's NE Chunker](https://mattshomepage.com/articles/2016/May/23/nltk_nec/)

#**Installs & Imports**

In [None]:
import nltk

#**Download resources (the pretrained NE chunker model and the data it depends on)**

In [None]:
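nltk.download('punkt')                       # tokenizer models used by nltk.word_tokenize
nltk.download('averaged_perceptron_tagger')  # POS tagger used by nltk.pos_tag
nltk.download('maxent_ne_chunker')           # the pretrained NE chunker loaded below
nltk.download('words')                       # English word list behind the chunker's en-wordlist feature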

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


True

In [None]:
pretrained_chunker_model = nltk.data.load("chunkers/maxent_ne_chunker/english_ace_multiclass.pickle")
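The loaded object is an NLTK `NEChunkParser`. The classifier-based tagger it wraps (reached through the private `_tagger` attribute in the cells below) labels tokens one at a time, left to right, and each decision can depend on the tags it has already assigned. That is why features such as `prevtag` appear among the weights. The sketch below reproduces only that greedy decoding loop in plain Python, under stated assumptions: `toy_features` and `toy_classify` are hypothetical stand-ins for the trained maxent model, and `iob_to_spans` is a hypothetical helper showing how IOB tags such as `B-GPE` and `I-PERSON` group into entities.

```python
# Sketch of greedy, left-to-right classifier-based tagging.
# toy_features and toy_classify are HYPOTHETICAL stand-ins for the trained
# maxent model; only the decoding loop mirrors how the chunker's tagger works.

def toy_features(tokens, index, history):
    """Build a small featureset from (word, pos) pairs and the tags chosen so far."""
    word, pos = tokens[index]
    return {
        "bias": True,
        "pos": pos,
        "shape": "upcase" if word[:1].isupper() else "other",
        # "<START>" is a placeholder chosen for this sketch.
        "prevtag": history[index - 1] if index > 0 else "<START>",
    }

def toy_classify(features):
    """Hypothetical rule-based scorer standing in for MaxentClassifier.classify."""
    if features["pos"] == "NNP" and features["shape"] == "upcase":
        # Continue an entity if the previous token was part of one.
        if features["prevtag"].startswith(("B-", "I-")):
            return "I-" + features["prevtag"][2:]
        return "B-ORGANIZATION"
    return "O"

def greedy_tag(tokens):
    """Tag tokens left to right, feeding each predicted tag back in as history."""
    history = []
    for i in range(len(tokens)):
        features = toy_features(tokens, i, history)
        history.append(toy_classify(features))
    return list(zip(tokens, history))

def iob_to_spans(tagged):
    """Group ((word, pos), IOB tag) pairs into (entity_type, text) spans."""
    spans, current_type, current_words = [], None, []
    for (word, _pos), tag in tagged:
        if tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)  # continue the open span
            continue
        if current_type is not None:  # close the open span
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if tag != "O":  # B- starts a span; a stray I- is treated as a start too
            current_type, current_words = tag[2:], [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans

tagged = greedy_tag([("I", "PRP"), ("like", "VBP"), ("New", "NNP"),
                     ("York", "NNP"), ("Times", "NNP"), ("articles", "NNS")])
print(iob_to_spans(tagged))  # → [('ORGANIZATION', 'New York Times')]
```

Because each tag is fixed as soon as it is chosen, an early mistake propagates through `prevtag` to later tokens; the real tagger shares that property since it decodes the same way.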

#**maxEnt_report()**

In [None]:
def maxEnt_report(chunker_model):
  maxEnt = chunker_model._tagger.classifier()
  print("Weight\tfeature_name\tlabel")
  maxEnt.show_most_informative_features()  # prints the table itself and returns None

In [None]:
maxEnt_report(pretrained_chunker_model)

Weight	feature_name	label
  10.125 bias==True and label is 'O'
   6.631 suffix3=='day' and label is 'O'
  -6.207 bias==True and label is 'I-GSP'
   5.628 prevtag=='O' and label is 'O'
  -4.740 shape=='upcase' and label is 'O'
   4.106 shape+prevtag=='<function shape at 0x8bde0d4>+O' and label is 'O'
  -3.994 shape=='mixedcase' and label is 'O'
   3.992 pos+prevtag=='NNP+B-PERSON' and label is 'I-PERSON'
   3.890 prevtag=='I-ORGANIZATION' and label is 'I-ORGANIZATION'
   3.879 shape+prevtag=='<function shape at 0x8bde0d4>+I-ORGANIZATION' and label is 'I-ORGANIZATION'
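The report is ordered by absolute weight, not signed weight: -6.207 ranks above 5.628. In a maxent model, a positive weight makes the paired label more likely whenever the feature fires, and a negative weight makes it less likely. The sketch below re-ranks weights copied from the output above to show that ordering. `most_informative` is a hypothetical helper, not NLTK's implementation, and the two `shape+prevtag` rows are left out because their feature names embed a function repr (`<function shape at 0x8bde0d4>`) rather than a readable value.

```python
# Weights copied from the maxEnt_report output, keyed by (feature, label).
# The shape+prevtag rows are omitted (their names contain a function repr).
weights = {
    ("bias==True", "O"): 10.125,
    ("suffix3=='day'", "O"): 6.631,
    ("bias==True", "I-GSP"): -6.207,
    ("prevtag=='O'", "O"): 5.628,
    ("shape=='upcase'", "O"): -4.740,
    ("shape=='mixedcase'", "O"): -3.994,
    ("pos+prevtag=='NNP+B-PERSON'", "I-PERSON"): 3.992,
    ("prevtag=='I-ORGANIZATION'", "I-ORGANIZATION"): 3.890,
}

def most_informative(weights, n=10):
    """Hypothetical helper: the n (feature, label, weight) triples with the largest |weight|."""
    items = [(feature, label, w) for (feature, label), w in weights.items()]
    items.sort(key=lambda item: abs(item[2]), reverse=True)
    return items[:n]

for feature, label, weight in most_informative(weights, n=5):
    # Same layout as the report: signed weight, feature, label.
    print(f"{weight:8.3f} {feature} and label is {label!r}")
```

Sorting by magnitude surfaces strong evidence against a label (such as `bias==True` for `I-GSP`) next to strong evidence for one, which a signed sort would push to the bottom.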


#**ne_report()**
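`ne_report()` hands each token's featureset to `MaxentClassifier.explain`, which prints, for the most probable labels, the weight of every feature that fires together with that label. Label names appear to be cut to seven characters to fit the columns, so `B-ORGAN` in the output is `B-ORGANIZATION`. In a log-linear (maxent) model, a label's score is the sum of those weights, and probabilities come from a softmax over the scores. The sketch below shows that arithmetic with hypothetical weights; `maxent_probs` is an illustrative helper that normalizes only over the labels it is given, whereas the real model normalizes over every label it knows.

```python
import math

def maxent_probs(active_weights):
    """Map label -> (total score, probability) from the weights of features that fire.

    Score = sum of active weights; probability = softmax over the scores.
    """
    totals = {label: sum(ws) for label, ws in active_weights.items()}
    # Subtract the largest score before exponentiating to avoid overflow;
    # this does not change the resulting probabilities.
    top = max(totals.values())
    exps = {label: math.exp(score - top) for label, score in totals.items()}
    z = sum(exps.values())
    return {label: (totals[label], exps[label] / z) for label in totals}

# Hypothetical active-feature weights for one token, per candidate label.
toy = {
    "B-GPE": [3.5, 2.5, -1.0],
    "O": [10.0, -6.5],
    "B-ORGANIZATION": [2.0, 0.5],
}
for label, (score, prob) in maxent_probs(toy).items():
    print(f"{label:15s} total={score:6.3f} prob={prob:.3f}")
```

A single large weight (like `bias==True` for `O`) can be outvoted by several moderate ones, which is how a capitalized proper noun such as "Apple" can end up with an entity tag despite the strong prior toward `O`.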

In [None]:
def ne_report(sentence, chunker_model, report_all=False):
  """Print maxent explanations for the chunker's per-token IOB tags.

  By default only tokens tagged as part of an entity are reported;
  set report_all=True to explain 'O' tags as well.
  """
  tagger = chunker_model._tagger          # classifier-based tagger inside the chunker
  maxEnt = tagger.classifier()            # its underlying MaxentClassifier
  tokens = nltk.word_tokenize(sentence)
  tagged_tokens = nltk.pos_tag(tokens)    # [(word, pos_tag), ...]
  previous_tags = []                      # tags assigned so far (the tagger's history)
  for i, (word, _pos) in enumerate(tagged_tokens):
    featureset = tagger.feature_detector(tagged_tokens, i, previous_tags)
    tag = tagger.choose_tag(tagged_tokens, i, previous_tags)
    if tag != 'O' or report_all:
      print(f"Explanation for why: {word} tagged as {tag}:")
      maxEnt.explain(featureset)
    previous_tags.append(tag)

In [None]:
ne_report("I am very excited about the next generation of Apple products.", pretrained_chunker_model)

Explanation for why: Apple tagged as B-GPE:
  Feature                                            B-GPE       O B-ORGAN   B-GSP
  --------------------------------------------------------------------------------
  prevtag=='O' (1)                                   3.767
  shape=='upcase' (1)                                2.701
  pos+prevtag=='NNP+O' (1)                           2.254
  en-wordlist==False (1)                             2.095
  label is 'B-GPE' (1)                              -2.005
  bias==True (1)                                    -1.975
  prevword=='of' (1)                                 0.742
  pos=='NNP' (1)                                     0.681
  nextpos=='nns' (1)                                 0.661
  prevpos=='IN' (1)                                  0.311
  wordlen==5 (1)                                     0.113
  nextword=='products' (1)                           0.060
  bias==True (1)                                            10.125
  prevtag=='O' 