# Ex.5 Part-of-Speech Tagging (Context & Suffixes)

This exercise builds a part-of-speech classifier whose feature detector examines the context in which a
word appears in order to determine which part-of-speech tag should be assigned. In particular, the identity of the previous word is included
as a feature.

The context of the previous word, features["prev-word"], is added to the features as well. For example, if the word before 'fly' is 'a', the POS of 'fly' is likely a noun; otherwise it is more likely a verb.

In [0]:
import nltk
from nltk.corpus import brown

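def pos_features(sentence, i):
    # Suffix features: the last one, two, and three characters of the word
    features = {"suffix(1)": sentence[i][-1:],
                "suffix(2)": sentence[i][-2:],
                "suffix(3)": sentence[i][-3:]}
    # Context feature: the previous word, or a marker at the sentence start
    if i == 0:
        features["prev-word"] = "<START>"
    else:
        features["prev-word"] = sentence[i-1]
    return features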

brown.sents()[0] -> ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', ....]

pos_features(brown.sents()[0], 8) -> {'suffix(1)': 'n', 'suffix(2)': 'on', 'suffix(3)': 'ion', 'prev-word': 'an'}
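To see what the previous-word feature contributes, the sketch below runs the feature detector on 'fly' in two contexts. The two example sentences are made up for illustration, and the function is repeated only so the snippet runs without NLTK or the Brown corpus.

```python
def pos_features(sentence, i):
    # Same feature detector as above, repeated so this snippet is self-contained
    features = {"suffix(1)": sentence[i][-1:],
                "suffix(2)": sentence[i][-2:],
                "suffix(3)": sentence[i][-3:]}
    features["prev-word"] = "<START>" if i == 0 else sentence[i - 1]
    return features

# Hypothetical sentences: 'fly' after 'a' (noun-like) and after 'Birds' (verb-like)
noun_context = ["I", "saw", "a", "fly"]
verb_context = ["Birds", "fly", "south"]

f_noun = pos_features(noun_context, 3)
f_verb = pos_features(verb_context, 1)

print(f_noun)
print(f_verb)
```

The suffix features are identical in both cases, so only "prev-word" gives the classifier any evidence for telling the two uses apart.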

untagged_sent is the sentence with the tag stripped from each word.

In [0]:
tagged_sents = brown.tagged_sents(categories='news')
featuresets = []
for tagged_sent in tagged_sents:
    untagged_sent = nltk.tag.untag(tagged_sent)
    for i, (word, tag) in enumerate(tagged_sent):
        featuresets.append((pos_features(untagged_sent, i), tag))
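The loop above can be traced on a single toy sentence. This is a minimal sketch: the three-word tagged sentence is invented for illustration, the list comprehension stands in for nltk.tag.untag, and only the "prev-word" feature is built to keep the output short.

```python
# Toy tagged sentence in the same (word, tag) format as brown.tagged_sents()
tagged_sent = [("The", "AT"), ("jury", "NN"), ("said", "VBD")]

# Pure-Python stand-in for nltk.tag.untag: keep the words, drop the tags
untagged_sent = [word for (word, tag) in tagged_sent]

# Pair the context features at each position with that position's gold tag
pairs = []
for i, (word, tag) in enumerate(tagged_sent):
    prev_word = "<START>" if i == 0 else untagged_sent[i - 1]
    pairs.append(({"prev-word": prev_word}, tag))

print(untagged_sent)
for features, tag in pairs:
    print(features, tag)
```

Each entry is a (features, tag) pair, which is the input format that the classifier's train method expects.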

In [0]:
# Hold out the first 10% of the feature sets for testing; train on the rest
size = int(len(featuresets) * 0.1)
train_set, test_set = featuresets[size:], featuresets[:size]
classifier = nltk.NaiveBayesClassifier.train(train_set)

In [4]:
print(nltk.classify.accuracy(classifier, test_set))

0.7891596220785678


However, this classifier is unable to learn the generalization that a
word is probably a noun if it follows an adjective, because it sees the previous word itself but has no access to the previous word's part-of-speech tag.

In general, simple classifiers always treat each input as *independent of all other inputs*. In many contexts, this makes perfect sense.
For example, decisions about whether names tend to be male or female can be made on a case-by-case basis. However, there are often cases,
such as part-of-speech tagging, where we are interested in solving classification problems that are *closely related to one another*.
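One possible way to let related decisions inform each other is to hand the feature detector the tags already assigned earlier in the sentence. The sketch below is an assumption for illustration, not part of the code above: the function name `pos_features_with_history`, the `history` parameter, and the "prev-tag" feature are all hypothetical.

```python
def pos_features_with_history(sentence, i, history):
    """Hypothetical variant: `history` holds the tags assigned to sentence[:i]."""
    features = {"suffix(1)": sentence[i][-1:],
                "suffix(2)": sentence[i][-2:],
                "suffix(3)": sentence[i][-3:]}
    if i == 0:
        features["prev-word"] = "<START>"
        features["prev-tag"] = "<START>"
    else:
        features["prev-word"] = sentence[i - 1]
        # The tag chosen for the previous word becomes a feature for this one
        features["prev-tag"] = history[i - 1]
    return features

# Toy input: 'quick' has already been tagged as an adjective (JJ)
sentence = ["the", "quick", "fox"]
history = ["AT", "JJ"]

feats = pos_features_with_history(sentence, 2, history)
print(feats)
```

With a "prev-tag" feature available, a classifier could in principle pick up the pattern that a word following an adjective is probably a noun. At tagging time the history would have to come from the classifier's own earlier predictions, so the words of a sentence would need to be tagged left to right.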