# Simple Malay Grammar

Author: Benjamin TAN

I will use the techniques taught in this course to implement a grammar parser of the standardized Malay language as spoken in Singapore and Malaysia. 


## Problem

### Introduction

The Malay language (*Bahasa Melayu* in Malay) is an Austronesian language officially spoken in Indonesia, Brunei, Malaysia and Singapore. In Malaysia, it is known as *Bahasa Malaysia* (Malaysian language) and in Indonesia, it is known as *Bahasa Indonesia* (Indonesian language). It is also unofficially spoken in East Timor and some regions of Thailand. Counting all speakers of Malay and Indonesian, the language has about 290 million speakers worldwide. [1]

Malay is officially written using the Latin script, known as *Rumi* in Brunei, Malaysia and Singapore, or *Latin* in Indonesia. An Arabic script, known as *Jawi*, also exists but is officially used only in Brunei. [2] As there are no syntactic differences between the two scripts, this project analyzes only Malay written in the Latin script, for simplicity.

### Aim

I will be implementing a simple grammar parser of the Malay language using symbolic NLP techniques taught in the course. 

As there are differences across dialects, this project will be based on the standard Malay of Malaysia and Singapore.


## Relevant Studies

In [72]:
from nltk import CFG, ChartParser

In [73]:
lab1_grammar = CFG.fromstring("""
# Grammar
S -> NP VP
NP -> Det N | NP PP
VP -> V | V NP | V PP | V NP PP | V PP PP
PP -> P NP

# Lexicon
NP -> 'kirk'
Det -> 'the' | 'my' | 'her' | 'his' | 'a' | 'some'
N -> 'dog' | 'daughter' | 'son' | 'sister' | 'aunt' | 'neighbour' | 'cousin'
V -> 'grumbles' | 'likes' | 'gives' | 'talks' | 'annoys' | 'hates' | 'cries'
P -> 'of' | 'to' | 'about'
""")
print(lab1_grammar)

Grammar with 33 productions (start state = S)
    S -> NP VP
    NP -> Det N
    NP -> NP PP
    VP -> V
    VP -> V NP
    VP -> V PP
    VP -> V NP PP
    VP -> V PP PP
    PP -> P NP
    NP -> 'kirk'
    Det -> 'the'
    Det -> 'my'
    Det -> 'her'
    Det -> 'his'
    Det -> 'a'
    Det -> 'some'
    N -> 'dog'
    N -> 'daughter'
    N -> 'son'
    N -> 'sister'
    N -> 'aunt'
    N -> 'neighbour'
    N -> 'cousin'
    V -> 'grumbles'
    V -> 'likes'
    V -> 'gives'
    V -> 'talks'
    V -> 'annoys'
    V -> 'hates'
    V -> 'cries'
    P -> 'of'
    P -> 'to'
    P -> 'about'


In [74]:
sentence = 'the cousin talks to the neighbour of her sister'.split()
print(sentence)

parser = ChartParser(lab1_grammar)
out = parser.parse(sentence)
for tree in out:
    print(tree)

['the', 'cousin', 'talks', 'to', 'the', 'neighbour', 'of', 'her', 'sister']
(S
  (NP (Det the) (N cousin))
  (VP
    (V talks)
    (PP (P to) (NP (Det the) (N neighbour)))
    (PP (P of) (NP (Det her) (N sister)))))
(S
  (NP (Det the) (N cousin))
  (VP
    (V talks)
    (PP
      (P to)
      (NP
        (NP (Det the) (N neighbour))
        (PP (P of) (NP (Det her) (N sister)))))))


In [75]:
def parse(grammar, string):
    # Takes a grammar and a string and parses the string.
    # Displays the grammar tree of the string
    print(string)
    sentence = string.split()
    parser = ChartParser(grammar)
    out = parser.parse(sentence)
    for tree in out:
        print(tree)

In [76]:
# Example
parse(lab1_grammar, 'my dog talks')

my dog talks
(S (NP (Det my) (N dog)) (VP (V talks)))


## Claim

### *Pola*

*Pola* is the Malay term for the grammatical pattern of a sentence. Each *pola* is a formula for building a basic grammatical sentence in Malay.

Example:   
***Pola:*** actor + verb  
**Malay Sentence:** "Saya makan."  
**Translation:** "I eat."

Here are some of the basic *pola* of Malay grammar:

* actor + verb
* actor + verb + complement
* verb + complement

From this, we will derive a set of basic rules of Malay grammar one *pola* at a time.
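
As a rough illustration of how a *pola* works as a formula, the sketch below tags each word as a noun-like word (`N`) or a verb (`V`) and looks the resulting sequence up in a table of the three *pola* listed above. The word lists are a small subset of the lexicon used later in this document, and the names `POLA`, `categorise` and `identify_pola` are hypothetical helpers, not part of NLTK.

```python
# Illustrative word lists; a small subset of the lexicon used later in this document.
PRONOUNS = {"saya", "aku", "kita", "kami", "anda", "awak", "dia", "mereka"}
NOUNS = {"nasi", "mee", "roti", "air"}
VERBS = {"makan", "minum", "tidur", "sayang", "dengarkan"}

# Each pola is written as a sequence of coarse categories.
POLA = {
    ("N", "V"): "actor + verb",
    ("N", "V", "N"): "actor + verb + complement",
    ("V", "N"): "verb + complement",
}


def categorise(word):
    """Return 'N' for nouns and pronouns, 'V' for verbs, None if unknown."""
    word = word.lower()
    if word in PRONOUNS or word in NOUNS:
        return "N"
    if word in VERBS:
        return "V"
    return None


def identify_pola(sentence):
    """Return the name of the pola a sentence follows, or None if none matches."""
    words = sentence.rstrip(".").split()
    categories = tuple(categorise(w) for w in words)
    return POLA.get(categories)


print(identify_pola("Saya makan."))      # actor + verb
print(identify_pola("Saya makan nasi"))  # actor + verb + complement
print(identify_pola("Makan nasi"))       # verb + complement
```

A flat lookup like this only recognises the exact patterns in the table, which is one reason for moving to context-free rules that can be combined.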

#### actor + verb

We start off with a basic set of grammar rules:

```
S -> NP VP  
NP -> N  
VP -> V  
```

We then add a set of common pronouns in Malay:

* *saya* - I, me (formal)
* *aku* - I, me (informal)
* *kita* - we (includes listener)
* *kami* - we (excludes listener)
* *anda* - you (formal)
* *awak* - you (informal)
* *dia* - he, she, him, her
* *mereka* - they, them

We also include a set of basic verbs:

* *ada* - to have
* *makan* - to eat
* *minum* - to drink
* *tidur* - to sleep
* *duduk* - to sit
* *sayang* - to love
* *bercakap* - to talk
* *dengar* - to listen
* *menyanyi* - to sing
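
Since the three rules above contain no recursion, the language they generate with these word lists is finite: every sentence is exactly one pronoun followed by one verb. The following sketch, which uses only the Python standard library and illustrative variable names, enumerates that language directly.

```python
from itertools import product

pronouns = ["saya", "aku", "kita", "kami", "anda", "awak", "dia", "mereka"]
verbs = ["ada", "makan", "minum", "tidur", "duduk", "sayang",
         "bercakap", "dengar", "menyanyi"]

# S -> NP VP, NP -> N, VP -> V: one pronoun followed by one verb.
sentences = [f"{n} {v}" for n, v in product(pronouns, verbs)]

print(len(sentences))   # 72 (8 pronouns x 9 verbs)
print(sentences[:3])    # ['saya ada', 'saya makan', 'saya minum']
```

Every string in `sentences` should be accepted by the grammar defined in the next cell, and any other word sequence should be rejected.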

In [77]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> NP VP
NP -> N
VP -> V

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka'
V -> 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi'
""")
print(malay_grammar)

Grammar with 20 productions (start state = S)
    S -> NP VP
    NP -> N
    VP -> V
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'


In [78]:
# I eat
parse(malay_grammar, 'saya makan')

# we drink
parse(malay_grammar, 'kita minum')

# we sleep
parse(malay_grammar, 'kami tidur')

# you sit
parse(malay_grammar, 'awak duduk')

# he loves
parse(malay_grammar, 'dia sayang')

# they talk
parse(malay_grammar, 'mereka bercakap')

# I listen
parse(malay_grammar, 'aku dengar')

# you sing
parse(malay_grammar, 'anda menyanyi')

saya makan
(S (NP (N saya)) (VP (V makan)))
kita minum
(S (NP (N kita)) (VP (V minum)))
kami tidur
(S (NP (N kami)) (VP (V tidur)))
awak duduk
(S (NP (N awak)) (VP (V duduk)))
dia sayang
(S (NP (N dia)) (VP (V sayang)))
mereka bercakap
(S (NP (N mereka)) (VP (V bercakap)))
aku dengar
(S (NP (N aku)) (VP (V dengar)))
anda menyanyi
(S (NP (N anda)) (VP (V menyanyi)))


#### actor + verb + complement

We extend the grammar with:

```
VP -> VP NP
```

We add some nouns to serve as complements:

* *nasi* - rice
* *mee* - noodles
* *roti* - bread
* *air* - water

In [79]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> NP VP
NP -> N
VP -> V | VP NP

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka' | 'nasi' | 'mee' | 'roti' | 'air'
V -> 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi'
""")
print(malay_grammar)

Grammar with 25 productions (start state = S)
    S -> NP VP
    NP -> N
    VP -> V
    VP -> VP NP
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'


In [80]:
# I eat rice
parse(malay_grammar, 'saya makan nasi')

# we drink water
parse(malay_grammar, 'kita minum air')

# we eat bread
parse(malay_grammar, 'kami makan roti')

# you eat noodles
parse(malay_grammar, 'awak makan mee')

# he loves me
parse(malay_grammar, 'dia sayang saya')

# I love you
parse(malay_grammar, 'saya sayang awak')

saya makan nasi
(S (NP (N saya)) (VP (VP (V makan)) (NP (N nasi))))
kita minum air
(S (NP (N kita)) (VP (VP (V minum)) (NP (N air))))
kami makan roti
(S (NP (N kami)) (VP (VP (V makan)) (NP (N roti))))
awak makan mee
(S (NP (N awak)) (VP (VP (V makan)) (NP (N mee))))
dia sayang saya
(S (NP (N dia)) (VP (VP (V sayang)) (NP (N saya))))
saya sayang awak
(S (NP (N saya)) (VP (VP (V sayang)) (NP (N awak))))


#### verb + complement

It is possible to form a sentence without the actor.

We add the following rule:

```
S -> VP
```

We also add one more verb to use with this rule:

* *dengarkan* - to listen (to someone)

In [81]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> NP VP | VP
NP -> N
VP -> V | VP NP

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka' | 'nasi' | 'mee' | 'roti' | 'air'
V -> 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan'
""")
print(malay_grammar)

Grammar with 27 productions (start state = S)
    S -> NP VP
    S -> VP
    NP -> N
    VP -> V
    VP -> VP NP
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'


In [82]:
# eat rice
parse(malay_grammar, 'makan nasi')

# drink water
parse(malay_grammar, 'minum air')

# listen to me
parse(malay_grammar, 'dengarkan saya')

makan nasi
(S (VP (VP (V makan)) (NP (N nasi))))
minum air
(S (VP (VP (V minum)) (NP (N air))))
dengarkan saya
(S (VP (VP (V dengarkan)) (NP (N saya))))


#### 'this is...'

The form 'this is...' is expressed by placing *itu* after the noun phrase. Hence, *itu* acts as the verb 'to be' to express 'this is...'.

Example:

* *nasi itu* - this is rice
* *air itu* - this is water


In [83]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> NP VP | VP
NP -> N
VP -> V | VP NP

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka' | 'nasi' | 'mee' | 'roti' | 'air'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan'
""")
print(malay_grammar)

Grammar with 28 productions (start state = S)
    S -> NP VP
    S -> VP
    NP -> N
    VP -> V
    VP -> VP NP
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'


In [84]:
# this is rice
parse(malay_grammar, 'nasi itu')

# this is water
parse(malay_grammar, 'air itu')

nasi itu
(S (NP (N nasi)) (VP (V itu)))
air itu
(S (NP (N air)) (VP (V itu)))


### Nouns

We can explore different constructions of nouns in Malay.

#### Adjectives

Source: https://ia802702.us.archive.org/10/items/practicalmalaygr00sheliala/practicalmalaygr00sheliala.pdf

In Malay, adjectives go after the noun.

* *rumah* - house
* *kuda* - horse
* *orang* - man


* *besar* - big
* *kuat* - strong
* *baik* - good

Examples:

* *rumah besar* - a big house
* *kuda kuat besar* - a big strong horse
* *orang baik* - a good man

To support adjectives, we add the following grammar rules:

```
NP -> NP AP  
AP -> AP A  
AP -> A  
```
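
These two recursive rules overlap: a noun followed by two adjectives can either group the adjectives into one `AP` first, or attach them to the `NP` one at a time. The sketch below counts parse trees for a noun phrase in plain Python, independently of NLTK. `RULES`, `LEXICON` and `count_parses` are illustrative names, and the counter assumes a grammar with no empty productions and no cycles of unary rules.

```python
from functools import lru_cache

# The noun-phrase rules above, written as plain Python data (names are illustrative).
RULES = {
    "NP": [("NP", "AP"), ("N",)],
    "AP": [("AP", "A"), ("A",)],
}
LEXICON = {
    "N": {"rumah", "kuda", "orang"},
    "A": {"besar", "kuat", "baik"},
}


def count_parses(tokens, start="NP"):
    """Count the distinct parse trees that derive `tokens` from `start`.

    Assumes no empty productions and no cycles of unary rules.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of ways `symbol` derives tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] in LEXICON.get(symbol, ()):
            total += 1
        for rhs in RULES.get(symbol, ()):
            total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives tokens[i:j].
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # The first symbol takes tokens[i:k]; each remaining symbol needs a token.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count(start, 0, len(tokens))


print(count_parses("rumah besar".split()))      # 1
print(count_parses("kuda kuat besar".split()))  # 2
```

The count of 2 for *kuda kuat besar* agrees with the two trees NLTK prints for 'anda ada kuda kuat besar' in this section. Keeping only one of the two recursive rules would remove this ambiguity.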

In [85]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | N
VP -> V | VP NP
AP -> AP A | A

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka' | 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan'
A -> 'besar' | 'kuat' | 'baik'
""")
print(malay_grammar)

Grammar with 37 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> N
    VP -> V
    VP -> VP NP
    AP -> AP A
    AP -> A
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'


In [86]:
# I have a big house
parse(malay_grammar, 'saya ada rumah besar')

# You have a big strong horse
parse(malay_grammar, 'anda ada kuda kuat besar')

# she has a good man
parse(malay_grammar, 'dia ada orang baik')

saya ada rumah besar
(S
  (NP (N saya))
  (VP (VP (V ada)) (NP (NP (N rumah)) (AP (A besar)))))
anda ada kuda kuat besar
(S
  (NP (N anda))
  (VP (VP (V ada)) (NP (NP (N kuda)) (AP (AP (A kuat)) (A besar)))))
(S
  (NP (N anda))
  (VP
    (VP (V ada))
    (NP (NP (NP (N kuda)) (AP (A kuat))) (AP (A besar)))))
dia ada orang baik
(S (NP (N dia)) (VP (VP (V ada)) (NP (NP (N orang)) (AP (A baik)))))


#### Quantity

Source: http://mylanguages.org/malay_adjectives.php

Words that express quantity are placed before the noun.

* *beberapa* - few, some
* *sedikit* - little
* *banyak* - many, much
* *bahagian* - part
* *semua* - whole, all

Examples:

* *beberapa epal* - some apples
* *semua orang* - everybody

We add the following grammar rule:

```
NP -> Q NP  
```

In [87]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | Q NP | N
VP -> VP NP | V
AP -> AP A | A

# Lexicon
N -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka' | 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang' | 'epal'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan'
A -> 'besar' | 'kuat' | 'baik'
Q -> 'beberapa' | 'sedikit' | 'banyak' | 'bahagian' | 'semua'
""")
print(malay_grammar)

Grammar with 44 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> Q NP
    NP -> N
    VP -> VP NP
    VP -> V
    AP -> AP A
    AP -> A
    N -> 'saya'
    N -> 'aku'
    N -> 'kita'
    N -> 'kami'
    N -> 'anda'
    N -> 'awak'
    N -> 'dia'
    N -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    N -> 'epal'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'
    Q -> 'beberapa'
    Q -> 'sedikit'
    Q -> 'banyak'
    Q -> 'bahagian'
    Q -> 'semua'


In [88]:
# everybody loves me
parse(malay_grammar, 'semua orang sayang saya')

# they eat some apples
parse(malay_grammar, 'mereka makan beberapa epal')

semua orang sayang saya
(S (NP (Q semua) (NP (N orang))) (VP (VP (V sayang)) (NP (N saya))))
mereka makan beberapa epal
(S
  (NP (N mereka))
  (VP (VP (V makan)) (NP (Q beberapa) (NP (N epal)))))


#### Possession

Possession is expressed by placing the pronoun after the noun.

* *rumah saya* - my house  
Literal: house + I
* *air kita* - our water  
Literal: water + we

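This can be modelled by giving pronouns their own category (`PN`) and letting a pronoun serve as an adjective phrase, in addition to standing alone as a noun phrase: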

```
AP -> PN
NP -> PN
```
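
To make the word order concrete, the following sketch glosses a two-word 'noun + pronoun' phrase into English, where the possessor comes first. The English glosses and the names `POSSESSIVE`, `NOUN_GLOSS` and `gloss_possession` are illustrative assumptions, not part of the grammar.

```python
# English possessive forms for the pronouns in this document (illustrative glosses).
POSSESSIVE = {
    "saya": "my", "aku": "my",
    "kita": "our", "kami": "our",
    "anda": "your", "awak": "your",
    "dia": "his/her", "mereka": "their",
}
NOUN_GLOSS = {"rumah": "house", "air": "water"}


def gloss_possession(phrase):
    """Translate a two-word 'noun + pronoun' phrase into English word order."""
    noun, pronoun = phrase.split()
    return f"{POSSESSIVE[pronoun]} {NOUN_GLOSS[noun]}"


print(gloss_possession("rumah saya"))  # my house
print(gloss_possession("air kita"))    # our water
```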


In [89]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | Q NP | PN | N
VP -> VP NP | V
AP -> AP A | PN | A

# Lexicon
PN -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka'
N -> 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang' | 'epal'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan'
A -> 'besar' | 'kuat' | 'baik'
Q -> 'beberapa' | 'sedikit' | 'banyak' | 'bahagian' | 'semua'
""")
print(malay_grammar)

Grammar with 46 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> Q NP
    NP -> PN
    NP -> N
    VP -> VP NP
    VP -> V
    AP -> AP A
    AP -> PN
    AP -> A
    PN -> 'saya'
    PN -> 'aku'
    PN -> 'kita'
    PN -> 'kami'
    PN -> 'anda'
    PN -> 'awak'
    PN -> 'dia'
    PN -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    N -> 'epal'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'
    Q -> 'beberapa'
    Q -> 'sedikit'
    Q -> 'banyak'
    Q -> 'bahagian'
    Q -> 'semua'


In [91]:
# this is my house
parse(malay_grammar, 'rumah saya itu')

# this is our water
parse(malay_grammar, 'air kita itu')

rumah saya itu
(S (NP (NP (N rumah)) (AP (PN saya))) (VP (V itu)))
air kita itu
(S (NP (NP (N air)) (AP (PN kita))) (VP (V itu)))


### Verb Negation

* *tidak*, *tak* - 'not', used to negate verbs and adjectives  
*Saya tidak makan* - I do not eat  
Literal: I + not + eat
  
* *bukan* - 'not be', used to negate nouns  
*Anda bukan kawan saya* - You are not my friend  
Literal: you + not be + friend + I

* *jangan* - 'do not', used as negative imperatives  
*jangan makan roti* - Do not eat bread  
Literal: do not + eat + bread
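
The choice between the three negators described above can be sketched as a small lookup on the category of the word being negated. This is only an illustration of the rule of thumb: the `CATEGORY` table and the functions `negator_for` and `negate` are hypothetical helpers, and an empty subject is assumed to mark an imperative.

```python
# Illustrative categories for a few words used in this document.
CATEGORY = {
    "makan": "V", "minum": "V", "tidur": "V",
    "besar": "A", "baik": "A",
    "kawan": "N", "rumah": "N", "roti": "N",
}


def negator_for(word, imperative=False):
    """Pick the negation word that goes in front of `word`."""
    if imperative:
        return "jangan"   # negative imperative: 'do not'
    category = CATEGORY.get(word)
    if category in ("V", "A"):
        return "tidak"    # negates verbs and adjectives
    if category == "N":
        return "bukan"    # negates nouns
    raise ValueError(f"unknown word: {word!r}")


def negate(subject, predicate):
    """Negate `predicate`; an empty subject marks an imperative."""
    first = predicate.split()[0]
    negator = negator_for(first, imperative=not subject)
    return " ".join(part for part in (subject, negator, predicate) if part)


print(negate("saya", "makan"))       # saya tidak makan
print(negate("anda", "kawan saya"))  # anda bukan kawan saya
print(negate("", "makan roti"))      # jangan makan roti
```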

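We add the following rules to support negation. Note that they list *jangan* as an ordinary verb (`V`), which cannot take a verb phrase as its complement, so the imperative *jangan makan roti* receives no parse under this grammar.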

```
VP -> Neg VP
Neg -> 'tidak' | 'tak'
V -> 'bukan' | 'jangan'
```

In [96]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | Q NP | PN | N
VP -> Neg VP | VP NP | V
AP -> AP A | PN | A

# Lexicon
PN -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka'
N -> 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang' | 'epal' | 'kawan'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan' | 'bukan' | 'jangan'
A -> 'besar' | 'kuat' | 'baik'
Q -> 'beberapa' | 'sedikit' | 'banyak' | 'bahagian' | 'semua'
Neg -> 'tidak' | 'tak'
""")
print(malay_grammar)

Grammar with 52 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> Q NP
    NP -> PN
    NP -> N
    VP -> Neg VP
    VP -> VP NP
    VP -> V
    AP -> AP A
    AP -> PN
    AP -> A
    PN -> 'saya'
    PN -> 'aku'
    PN -> 'kita'
    PN -> 'kami'
    PN -> 'anda'
    PN -> 'awak'
    PN -> 'dia'
    PN -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    N -> 'epal'
    N -> 'kawan'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    V -> 'bukan'
    V -> 'jangan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'
    Q -> 'beberapa'
    Q -> 'sedikit'
    Q -> 'banyak'
    Q -> 'bahagian'
    Q -> 'semua'
    Neg -> 'tidak'
    Neg -> 'tak'


In [97]:
# I do not eat
parse(malay_grammar, 'saya tidak makan')

# You are not my friend
parse(malay_grammar, 'anda bukan kawan saya')

# Do not eat bread
parse(malay_grammar, 'jangan makan roti')

saya tidak makan
(S (NP (PN saya)) (VP (Neg tidak) (VP (V makan))))
anda bukan kawan saya
(S
  (NP (PN anda))
  (VP (VP (V bukan)) (NP (NP (N kawan)) (AP (PN saya)))))
(S
  (NP (PN anda))
  (VP (VP (VP (V bukan)) (NP (N kawan))) (NP (PN saya))))
jangan makan roti


### Conjunctions

We introduce basic conjunctions.

* *dan* - and
* *atau* - or

We add the following rules.

```
NP -> NP C NP
VP -> VP C VP
C -> 'dan' | 'atau'
```
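
A rule of the form `NP -> NP C NP` is recursive on both sides, so a chain of conjuncts can be bracketed in more than one way. As a minimal sketch, assuming each conjunct is a single noun with no other source of ambiguity, the function below counts the parse trees for `n` conjuncts by splitting at the top-level conjunction. The name `coordination_parses` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def coordination_parses(n):
    """Number of trees NP -> NP C NP gives to n conjoined single-noun NPs."""
    if n < 1:
        raise ValueError("need at least one conjunct")
    if n == 1:
        return 1  # a lone noun phrase, no conjunction involved
    # The top-level conjunction leaves k conjuncts on the left, n - k on the right.
    return sum(coordination_parses(k) * coordination_parses(n - k)
               for k in range(1, n))


for n in range(1, 6):
    print(n, coordination_parses(n))
# 1 1
# 2 1
# 3 2
# 4 5
# 5 14
```

These are the Catalan numbers, so a phrase with three conjuncts such as *nasi dan mee dan roti* would be expected to yield two trees, and the count grows quickly with longer lists.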



In [98]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | Q NP | PN | N | NP C NP
VP -> Neg VP | VP NP | V | VP C VP
AP -> AP A | PN | A

# Lexicon
PN -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka'
N -> 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang' | 'epal' | 'kawan'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan' | 'bukan' | 'jangan'
A -> 'besar' | 'kuat' | 'baik'
Q -> 'beberapa' | 'sedikit' | 'banyak' | 'bahagian' | 'semua'
Neg -> 'tidak' | 'tak'
C -> 'dan' | 'atau'
""")
print(malay_grammar)

Grammar with 56 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> Q NP
    NP -> PN
    NP -> N
    NP -> NP C NP
    VP -> Neg VP
    VP -> VP NP
    VP -> V
    VP -> VP C VP
    AP -> AP A
    AP -> PN
    AP -> A
    PN -> 'saya'
    PN -> 'aku'
    PN -> 'kita'
    PN -> 'kami'
    PN -> 'anda'
    PN -> 'awak'
    PN -> 'dia'
    PN -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    N -> 'epal'
    N -> 'kawan'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    V -> 'bukan'
    V -> 'jangan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'
    Q -> 'beberapa'
    Q -> 'sedikit'
    Q -> 'banyak'
    Q -> 'bahagian'
    Q -> 'semua'
    Neg -> 'tidak'
    Neg -> 'tak'
    C -> 'dan'
    C -> 'atau'


In [99]:
# I eat rice and noodles
parse(malay_grammar, 'saya makan nasi dan mee')

# They have horses or houses
parse(malay_grammar, 'mereka ada kuda atau rumah')

saya makan nasi dan mee
(S
  (NP (PN saya))
  (VP (VP (V makan)) (NP (NP (N nasi)) (C dan) (NP (N mee)))))
mereka ada kuda atau rumah
(S
  (NP (PN mereka))
  (VP (VP (V ada)) (NP (NP (N kuda)) (C atau) (NP (N rumah)))))


### Prepositions

Source: http://mylanguages.org/malay_prepositions.php

Some basic prepositions are as follows:

* *tentang* - about
* *di* - at
* *untuk* - for
* *dari* - from, of
* *ke* - to

To support prepositions, we add the following rules:

```
NP -> NP PP
VP -> V PP | V NP PP | V PP PP
PP -> P NP
```


In [100]:
malay_grammar = CFG.fromstring("""
# Grammar
S -> VP | NP VP
NP -> NP AP | Q NP | PN | N | NP C NP | NP PP
VP -> Neg VP | V NP | V | VP C VP | V PP | V NP PP | V PP PP
AP -> AP A | PN | A
PP -> P NP

# Lexicon
PN -> 'saya' | 'aku' | 'kita' | 'kami' | 'anda' | 'awak' | 'dia' | 'mereka'
N -> 'nasi' | 'mee' | 'roti' | 'air' | 'rumah' | 'kuda' | 'orang' | 'epal' | 'kawan'
V -> 'itu' | 'ada' | 'makan' | 'minum' | 'tidur' | 'duduk' | 'sayang' | 'bercakap' | 'dengar' | 'menyanyi' | 'dengarkan' | 'bukan' | 'jangan'
A -> 'besar' | 'kuat' | 'baik'
Q -> 'beberapa' | 'sedikit' | 'banyak' | 'bahagian' | 'semua'
Neg -> 'tidak' | 'tak'
C -> 'dan' | 'atau'
P -> 'tentang' | 'di' | 'untuk' | 'dari' | 'ke'
""")
print(malay_grammar)

Grammar with 66 productions (start state = S)
    S -> VP
    S -> NP VP
    NP -> NP AP
    NP -> Q NP
    NP -> PN
    NP -> N
    NP -> NP C NP
    NP -> NP PP
    VP -> Neg VP
    VP -> V NP
    VP -> V
    VP -> VP C VP
    VP -> V PP
    VP -> V NP PP
    VP -> V PP PP
    AP -> AP A
    AP -> PN
    AP -> A
    PP -> P NP
    PN -> 'saya'
    PN -> 'aku'
    PN -> 'kita'
    PN -> 'kami'
    PN -> 'anda'
    PN -> 'awak'
    PN -> 'dia'
    PN -> 'mereka'
    N -> 'nasi'
    N -> 'mee'
    N -> 'roti'
    N -> 'air'
    N -> 'rumah'
    N -> 'kuda'
    N -> 'orang'
    N -> 'epal'
    N -> 'kawan'
    V -> 'itu'
    V -> 'ada'
    V -> 'makan'
    V -> 'minum'
    V -> 'tidur'
    V -> 'duduk'
    V -> 'sayang'
    V -> 'bercakap'
    V -> 'dengar'
    V -> 'menyanyi'
    V -> 'dengarkan'
    V -> 'bukan'
    V -> 'jangan'
    A -> 'besar'
    A -> 'kuat'
    A -> 'baik'
    Q -> 'beberapa'
    Q -> 'sedikit'
    Q -> 'banyak'
    Q -> 'bahagian'
    Q -> 'semua'
    Neg -> 'tidak'
    Neg -> 'tak'
    C -> 'dan'
    C -> 'atau'
    P -> 'tentang'
    P -> 'di'
    P -> 'untuk'
    P -> 'dari'
    P -> 'ke'


In [105]:
# You talk about him
parse(malay_grammar, 'awak bercakap tentang dia')

# The horse is for my friend
parse(malay_grammar, 'kuda itu untuk kawan saya')

awak bercakap tentang dia
(S (NP (PN awak)) (VP (V bercakap) (PP (P tentang) (NP (PN dia)))))
kuda itu untuk kawan saya
(S
  (NP (N kuda))
  (VP (V itu) (PP (P untuk) (NP (NP (N kawan)) (AP (PN saya))))))


## Evidence

## Discussion

## Bibliography

1. Wardhana, Dian Eka Chandra (2021). ["Indonesian as the Language of ASEAN During the New Life Behavior Change 2021"](http://ejournal.karinosseff.org/index.php/jswse/article/view/114). Journal of Social Work and Science Education. 1 (3): 266–280.

2. Pusat Rujukan Persuratan Melayu (2014), [Ejaan Rumi Baharu Bahasa Malaysia](https://prpm.dbp.gov.my/cari1?keyword=ejaan%20rumi%20baharu), retrieved 2014-10-04. 