Week 6 Tutorial: Constituency Parser

Sources:

http://www.nltk.org/howto/parse.html

http://www.nltk.org/howto/generate.html

https://markgw.github.io/uh-nlp19/day4/


Import the required libraries

In [1]:
import nltk
from nltk.parse.generate import generate
from nltk.parse import ViterbiParser

Example of defining a CFG (context-free grammar)

In [2]:
grammar_1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "Melihat" | "ate" | "walked"
  NP -> "Amir" | "Mona" | "Devi" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)

Define an example sentence. Note that this sentence is ambiguous: the PP "with a telescope" can attach either to the noun phrase "a man" or to the verb phrase.

In [3]:
sent_1 = 'Amir Melihat a man with a telescope'.split()

Example of parsing with the top-down chart parser.

Note that more than one parse tree is produced.

In [4]:
td_parser = nltk.parse.TopDownChartParser(grammar_1)


for tree in td_parser.parse(sent_1):
    print(tree)

(S
  (NP Amir)
  (VP
    (V Melihat)
    (NP (Det a) (N man) (PP (P with) (NP (Det a) (N telescope))))))
(S
  (NP Amir)
  (VP
    (V Melihat)
    (NP (Det a) (N man))
    (PP (P with) (NP (Det a) (N telescope)))))


Example of parsing with the bottom-up chart parser.

Note that more than one parse tree is produced here as well.

In [5]:
bu_parser = nltk.parse.BottomUpChartParser(grammar_1)


for tree in bu_parser.parse(sent_1):
    print(tree)


(S
  (NP Amir)
  (VP
    (V Melihat)
    (NP (Det a) (N man))
    (PP (P with) (NP (Det a) (N telescope)))))
(S
  (NP Amir)
  (VP
    (V Melihat)
    (NP (Det a) (N man) (PP (P with) (NP (Det a) (N telescope))))))


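Example of parsing with the shift-reduce parser.

Follow the shift (S) and reduce (R) steps in the trace. In this run the parser reduces to the start symbol S before "with a telescope" has been read, ends with S PP on the stack, and therefore prints no parse tree.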

In [6]:
sr_parser = nltk.ShiftReduceParser(grammar_1, trace=2)

for tree in sr_parser.parse(sent_1):
    print(tree)

Parsing 'Amir Melihat a man with a telescope'
    [ * Amir Melihat a man with a telescope]
  S [ 'Amir' * Melihat a man with a telescope]
  R [ NP * Melihat a man with a telescope]
  S [ NP 'Melihat' * a man with a telescope]
  R [ NP V * a man with a telescope]
  S [ NP V 'a' * man with a telescope]
  R [ NP V Det * man with a telescope]
  S [ NP V Det 'man' * with a telescope]
  R [ NP V Det N * with a telescope]
  R [ NP V NP * with a telescope]
  R [ NP VP * with a telescope]
  R [ S * with a telescope]
  S [ S 'with' * a telescope]
  R [ S P * a telescope]
  S [ S P 'a' * telescope]
  R [ S P Det * telescope]
  S [ S P Det 'telescope' * ]
  R [ S P Det N * ]
  R [ S P NP * ]
  R [ S PP * ]
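
The outcome of the trace can be reproduced with a small, library-free sketch of a greedy shift-reduce loop. This is not NLTK's implementation: the `RULES` table and the `greedy_shift_reduce` helper are names made up for this illustration, and the sketch assumes a parser that reduces whenever it can and never backtracks.

```python
# Toy greedy shift-reduce recognizer (illustrative sketch, not NLTK's code).
# RULES holds only the part of grammar_1 needed for the example sentence.
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("NP", ("Amir",)),
    ("V", ("Melihat",)),
    ("Det", ("a",)),
    ("N", ("man",)),
    ("N", ("telescope",)),
    ("P", ("with",)),
]


def greedy_shift_reduce(tokens):
    """Return the final stack after greedy shift-reduce without backtracking."""
    stack, buffer = [], list(tokens)
    while True:
        # Reduce for as long as some right-hand side matches the top of the stack.
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
        if not buffer:
            return stack
        stack.append(buffer.pop(0))  # shift the next word onto the stack


final_stack = greedy_shift_reduce("Amir Melihat a man with a telescope".split())
print(final_stack)  # ['S', 'PP']
```

Because `V NP` is reduced to `VP` (and then to `S`) as soon as it appears, the later `PP` has nothing left to attach to. A chart parser avoids this by keeping all alternatives.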


Check whether the grammar is in Chomsky Normal Form (CNF).

Exercise: try converting the grammar to CNF!

In [7]:
print(grammar_1.is_chomsky_normal_form())

False
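
To see why the check returns False, it can help to list the productions that are neither of the form A -> B C (two nonterminals) nor A -> 'a' (a single terminal). The sketch below does this in plain Python on a hand-copied subset of grammar_1; `RULES`, `NONTERMINALS`, and `cnf_violations` are hypothetical names for this illustration, and the check ignores the finer points of strict CNF such as empty productions.

```python
# Hand-copied subset of grammar_1 as (lhs, rhs) pairs (illustrative only).
NONTERMINALS = {"S", "NP", "VP", "PP", "V", "Det", "N", "P"}
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("Amir",)),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("V", ("Melihat",)),
]


def cnf_violations(rules, nonterminals):
    """Return the rules that are neither A -> B C nor A -> 'a'."""
    bad = []
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            bad.append((lhs, rhs))
    return bad


violations = cnf_violations(RULES, NONTERMINALS)
for lhs, rhs in violations:
    print(lhs, "->", " ".join(rhs))
# VP -> V NP PP
# NP -> Det N PP
```

These two ternary rules are the ones the exercise needs to rewrite.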


Check the grammar's coverage

Note that grammar_1 does not yet contain the word 'I'.

In [8]:
sent_2 = 'I Melihat a man with a telescope'.split()

# check_coverage takes the whole token list, so a single call is enough
grammar_1.check_coverage(sent_2)

ValueError: Grammar does not cover some of the input words: "'I'".
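
Conceptually, this check compares the input tokens with the terminals of the grammar. A minimal plain-Python sketch of that idea follows; `GRAMMAR_WORDS` is a hand-copied list of the terminals in grammar_1 and `uncovered` is a hypothetical helper, not an NLTK function.

```python
# Terminals of grammar_1, copied by hand for this illustration.
GRAMMAR_WORDS = {
    "Melihat", "ate", "walked",
    "Amir", "Mona", "Devi",
    "a", "an", "the", "my",
    "man", "dog", "cat", "telescope", "park",
    "in", "on", "by", "with",
}


def uncovered(tokens, vocabulary):
    """Return the tokens that do not appear among the grammar's terminals."""
    return [word for word in tokens if word not in vocabulary]


missing = uncovered("I Melihat a man with a telescope".split(), GRAMMAR_WORDS)
print(missing)  # ['I']
```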

Add a production rule so that the word 'I' is covered by the grammar

In [None]:
grammar_1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "Melihat" | "ate" | "walked"
  NP -> "Amir" | "Mona" | "Devi" | "I" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)
print(grammar_1)

Check whether every word of sent_2 is now covered by grammar_1

In [None]:
# No exception here means every word in sent_2 is covered by grammar_1
grammar_1.check_coverage(sent_2)
print('all words covered:', sent_2)

Generate sentences from grammar_1

In [None]:
for sentence in generate(grammar_1, n=10):
    print(' '.join(sentence))

Try deriving a grammar from a constituency treebank file

The example used here is the first 5 sentences of the Indonesian constituency treebank kethu, https://github.com/ialfina/kethu

Upload the file **kethu_example.mrg** to your Google Drive!

Note that you need to **adjust the path to the .mrg file** to match your own Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
from nltk.corpus import BracketParseCorpusReader

ptb = BracketParseCorpusReader(r"/content/drive/My Drive/TU/PERKULIAHAN/NLP", r".*/*\.mrg")

print(ptb)
print(ptb.sents())
print(ptb.parsed_sents())

Induce a PCFG (Probabilistic Context-Free Grammar) from the constituency treebank

In [None]:
from nltk import Nonterminal, induce_pcfg

S = Nonterminal('S')

productions = []
for t in ptb.parsed_sents():
    productions += t.productions()
grammar_3 = induce_pcfg(S, productions)
print(grammar_3)
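
`induce_pcfg` estimates each rule probability by relative frequency: the count of a production divided by the count of all productions with the same left-hand side. The sketch below repeats that calculation in plain Python on a handful of made-up productions (not taken from the kethu treebank); `observed` and `relative_frequencies` are hypothetical names for this illustration.

```python
from collections import Counter

# Made-up (lhs, rhs) productions, as if collected from a few parse trees.
observed = [
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "N", "PP")),
    ("NP", ("Amir",)),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
]


def relative_frequencies(productions):
    """Estimate P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(productions)
    lhs_counts = Counter(lhs for lhs, _ in productions)
    return {
        rule: count / lhs_counts[rule[0]]
        for rule, count in rule_counts.items()
    }


probs = relative_frequencies(observed)
for (lhs, rhs), p in probs.items():
    print(f"{lhs} -> {' '.join(rhs)} [{p}]")
```

With only 5 sentences in the treebank, estimates of this kind are very coarse, and any rule that was not observed has no probability at all.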

Try parsing a sentence with the induced grammar

In [None]:
sent_3 = 'ribuan monyet amankan pesta'.split()
# example using the bottom-up chart parser
bu_parser = nltk.parse.BottomUpChartParser(grammar_3)

for tree in bu_parser.parse(sent_3):
    print(tree)

Test parsing with the Viterbi parser, which returns the single parse tree with the highest total probability

In [None]:
from nltk.parse import ViterbiParser

sent_3 = 'ribuan monyet amankan pesta'.split()
# example using the Viterbi parser; trace=2 prints verbose tracing output
parser = ViterbiParser(grammar_3, trace=2)
for t in parser.parse(sent_3):
    t.pretty_print()
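
The "total probability" of a parse tree is the product of the probabilities of all productions used in it. The plain-Python sketch below scores the two readings of the earlier ambiguous sentence under a toy PCFG and picks the more probable one. The probabilities in `RULE_PROB` are invented for this illustration, and `tree_prob` is a hypothetical helper; the real ViterbiParser finds the best tree with dynamic programming rather than by enumerating trees.

```python
# Toy PCFG with invented probabilities (each left-hand side sums to 1).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("Amir",)): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("Det", "N", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("Melihat",)): 1.0,
    ("Det", ("a",)): 1.0,
    ("N", ("man",)): 0.5,
    ("N", ("telescope",)): 0.5,
    ("P", ("with",)): 1.0,
}


def tree_prob(tree):
    """Multiply the probabilities of every production used in the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob


telescope_np = ("NP", ("Det", "a"), ("N", "telescope"))
with_pp = ("PP", ("P", "with"), telescope_np)

# Reading 1: the PP attaches to the noun phrase "a man".
np_attach = (
    "S",
    ("NP", "Amir"),
    ("VP", ("V", "Melihat"), ("NP", ("Det", "a"), ("N", "man"), with_pp)),
)
# Reading 2: the PP attaches to the verb phrase.
vp_attach = (
    "S",
    ("NP", "Amir"),
    ("VP", ("V", "Melihat"), ("NP", ("Det", "a"), ("N", "man")), with_pp),
)

best = max([np_attach, vp_attach], key=tree_prob)
print(tree_prob(np_attach), tree_prob(vp_attach))
print("best reading:", "VP attachment" if best is vp_attach else "NP attachment")
```

Under these invented numbers the VP-attachment reading scores slightly higher; with different rule probabilities the other reading would win.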