# Morphology of Pangasinan
*Last updated: 2025-09-25*

This notebook cell provides a concise, sourced overview of the key **morphological features** of the Pangasinan language, including **case marking**, **pronouns**, **verb focus/aspect systems**, **reduplication**, **negation**, and **morphophonemics**. References are listed at the end, with links to primary descriptions (Benton 1971), the Rosetta Project grammar handout, and typological summaries.

---

## 1) Typological snapshot
- **Morphology:** highly **agglutinative** with productive affixation (prefixes, infixes, suffixes).  
- **Clause alignment & order:** analyses commonly describe a **focus/voice** system with **predicate‑initial** order typical of Philippine languages; grammatical relations are cued by verbal morphology and **case‑marking particles** rather than nominal inflection.  
  *Refs:* Benton (1971); Rosetta Project grammar handout.

---

## 2) Case marking & noun phrase structure
Pangasinan marks NP roles with **case particles (articles)**. Core forms for common nouns:

| Grammatical role (approx.) | Singular | Plural | Notes |
|---|---|---|---|
| **Nominative / absolutive** | **say**, **so** (clitic **=’y**) | **saray**, **so saray**; also **(i)ra so** in some paradigms | Subject/pivot of the clause. |
| **Genitive / ergative (agent, possessor)** | **na** (for common nouns), **ni** (for personal names) | **na saray** | Agents of transitive verbs and possessors. |
| **Oblique** | **ed** (clitic **=’d**), **dyad** (sentence‑initial variant) | **ed saray** | Locatives, goals, other obliques. |


**Linker** (adjectival/NP linker): **ya** with allomorphs **ya / ‑n / a** depending on phonological context (e.g., *maong **’y** baley* ‘the good town’).  
*Refs:* Benton (1971), Rosetta Project handout; cross‑check Wikipedia tables.
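
A minimal sketch of how the particle table could drive tagging of pre-tokenized text. `CASE_PARTICLES` and `tag_particles` are hypothetical names. The sketch folds personal-name *ni* into the genitive singular, omits *(i)ra so* and the clitic forms =’y/=’d, and uses `NOUN` as a placeholder token rather than a Pangasinan word.

```python
# Case particles from the table above, keyed by (role, number).
# Hypothetical helper; personal-name "ni" is folded into GEN sg for simplicity.
CASE_PARTICLES = {
    ("NOM", "sg"): ["say", "so"],
    ("NOM", "pl"): ["saray", "so saray"],
    ("GEN", "sg"): ["na", "ni"],
    ("GEN", "pl"): ["na saray"],
    ("OBL", "sg"): ["ed", "dyad"],
    ("OBL", "pl"): ["ed saray"],
}

# Reverse index: particle form -> (role, number)
PARTICLE_INDEX = {form: key for key, forms in CASE_PARTICLES.items() for form in forms}


def tag_particles(tokens):
    """Greedily tag case particles, preferring two-word forms such as 'so saray'."""
    tagged, i = [], 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2]).lower()
        if i + 1 < len(tokens) and pair in PARTICLE_INDEX:
            tagged.append((pair, PARTICLE_INDEX[pair]))
            i += 2
        else:
            # Non-particles are tagged None
            tagged.append((tokens[i], PARTICLE_INDEX.get(tokens[i].lower())))
            i += 1
    return tagged


print(tag_particles(["so", "saray", "NOUN", "ed", "NOUN"]))
# → [('so saray', ('NOM', 'pl')), ('NOUN', None), ('ed', ('OBL', 'sg')), ('NOUN', None)]
```

Trying two-word forms first keeps *so saray* or *ed saray* from being split into a singular particle plus a stray *saray*.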

---

## 3) Personal pronouns (independent & enclitic genitive)

Pangasinan distinguishes **inclusive vs. exclusive** in the 1st person plural and has a **dual inclusive** (speaker + addressee only). The table gives the most common independent (absolutive) forms and genitive enclitics:

| Person/number | Absolutive (independent) | Genitive (enclitic) | Oblique form (with *ed*) |
|---|---|---|---|
| **1sg** | **siák** | **‑ko** | *ed siák* |
| **1du incl** | **sikatá** | **‑ta** | *ed sikatá* |
| **1pl incl** | **sikatayó** | **‑tayo** | *ed sikatayó* |
| **1pl excl** | **sikamí** | **‑mi** | *ed sikamí* |
| **2sg** | **siká** | **‑mo** | *ed siká* |
| **2pl** | **sikayó** | **‑yo** | *ed sikayó* |
| **3sg** | **sikató** | **‑to** | *ed sikató* |
| **3pl** | **sikará** | **‑da** | *ed sikará* |



*Refs:* Benton (1971); Wikipedia summary table (to be used as a quick crib, cross‑checked against Benton).
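
One way to store the pronoun table is as (independent, enclitic) pairs keyed by person/number, from which both lookup directions and the *ed* + independent oblique follow. A sketch under that assumption: `PRONOUNS`, `fold`, `gloss_pronoun` and `oblique` are hypothetical names, and accent folding is added so that unaccented spellings (*siak*) still match.

```python
import unicodedata


def fold(text):
    """Lowercase and drop combining accents (siák -> siak) for lookups."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# person/number -> (independent absolutive form, genitive enclitic), per the table above
PRONOUNS = {
    "1sg": ("siák", "ko"),
    "1du.incl": ("sikatá", "ta"),
    "1pl.incl": ("sikatayó", "tayo"),
    "1pl.excl": ("sikamí", "mi"),
    "2sg": ("siká", "mo"),
    "2pl": ("sikayó", "yo"),
    "3sg": ("sikató", "to"),
    "3pl": ("sikará", "da"),
}
INDEPENDENT = {fold(ind): pn for pn, (ind, _) in PRONOUNS.items()}
GENITIVE = {gen: pn for pn, (_, gen) in PRONOUNS.items()}


def gloss_pronoun(token):
    """Return (person/number, 'ABS' or 'GEN') for a pronoun token, else None."""
    key = fold(token).lstrip("-\u2011")  # accept '-mo' or '‑mo' notation
    if key in GENITIVE:
        return GENITIVE[key], "GEN"
    if key in INDEPENDENT:
        return INDEPENDENT[key], "ABS"
    return None


def oblique(person_number):
    """Oblique pronoun: ed + the independent form (last column of the table)."""
    return "ed " + PRONOUNS[person_number][0]


print(gloss_pronoun("‑mo"), gloss_pronoun("sikami"), oblique("2pl"))
```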

---

## 4) Verb morphology: focus/voice × aspect

Pangasinan verbs mark **which participant is in focus** (actor, patient/theme, locative/referent, benefactive, instrumental), typically **together with aspect** (non‑completed vs. completed). Below is a compact mapping of commonly cited affixes; exact distribution can be root‑sensitive and dialectal.

### 4.1 Actor‑focus (AF) sets (illustrative)

- **man‑ / nan‑** — AF pair contrasting **non‑completed** (man‑) vs **completed** (nan‑) aspect.  

- **maN‑** (nasal‑assimilating) — AF; the nasal assimilates to the following consonant (e.g., *maN* + *p‑* → *mam‑*). Completed counterpart is often given as **aN‑** in older descriptions.  

- **oN‑** — AF set described in the literature; semantics vary by root class (see studies listed below).  
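
The place assimilation described for **maN‑** can be sketched as a three-way choice of nasal. This assumes a simplified, letter-based classification (labials → *m*, coronals → *n*, everything else, including velars and vowels, → *ng*) and keeps the root-initial consonant; it does not model the consonant deletion some Philippine languages show with nasal prefixes. `attach_nasal_prefix` is a hypothetical name and the roots are invented shapes, not Pangasinan words.

```python
# Place-of-articulation classes (assumption: simplified, orthographic letters).
LABIALS = set("pbm")
CORONALS = set("tdnsl")


def attach_nasal_prefix(base, root):
    """Attach a nasal-final prefix such as maN- (base 'ma') to a root.

    N surfaces as m before labials, n before coronals and ng elsewhere
    (velars, vowels and any other letter). The root is left unchanged.
    """
    first = root[:1].lower()
    if first in LABIALS:
        nasal = "m"
    elif first in CORONALS:
        nasal = "n"
    else:
        nasal = "ng"
    return base + nasal + root


for root in ["pakan", "takan", "kakan", "akan"]:  # hypothetical root shapes
    print(attach_nasal_prefix("ma", root))
```

The same function covers **paN‑** by passing `"pa"` as the base.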



### 4.2 Patient/theme‑focus and other non‑AF sets

- **Patient/Goal focus (PF):** **‑en** (non‑completed), **‑in‑** (completed).  

- **Theme/Goal focus:** **i‑** (non‑completed), **in‑** (completed).  

- **Referent/Locative focus:** **‑an** (non‑completed), **‑in‑…‑an** (completed).  

- **Benefactive focus:** **i‑…‑an** (non‑completed), **in‑…‑an** (completed).  

- **Instrumental focus:** **(i)pan‑**, **inpan‑**; **(i)pañgi‑**, **inpañgi‑**.  
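
The non-AF pairings above are regular enough to tabulate as (prefix, infix, suffix) templates. The sketch assumes that ‑in‑ goes after the first consonant and surfaces as a prefix *in‑* before a vowel-initial root (a common Philippine pattern, not checked against Benton), ignores digraphs such as *ng*, and applies no sound changes. `TEMPLATES` and `inflect` are hypothetical; *sulat* and *abak* are placeholder strings, not claims about attested forms. Instrumental focus is left out.

```python
VOWELS = set("aeiou")

# (focus, aspect) -> (prefix, infix, suffix), transcribed from the list above.
TEMPLATES = {
    ("PF", "noncompleted"): ("", "", "en"),
    ("PF", "completed"): ("", "in", ""),
    ("THEME", "noncompleted"): ("i", "", ""),
    ("THEME", "completed"): ("in", "", ""),
    ("LOC", "noncompleted"): ("", "", "an"),
    ("LOC", "completed"): ("", "in", "an"),
    ("BEN", "noncompleted"): ("i", "", "an"),
    ("BEN", "completed"): ("in", "", "an"),
}


def inflect(root, focus, aspect):
    """Concatenate the template's affixes around a root; no sound changes."""
    prefix, infix, suffix = TEMPLATES[(focus, aspect)]
    stem = root
    if infix:
        if stem[:1].lower() in VOWELS:
            stem = infix + stem  # assumption: -in- surfaces as in- before a vowel
        else:
            stem = stem[0] + infix + stem[1:]  # insert after the first consonant
    return prefix + stem + suffix


for focus, aspect in [("PF", "completed"), ("LOC", "completed"), ("BEN", "noncompleted")]:
    print(focus, aspect, inflect("sulat", focus, aspect))
```

Usage: `inflect("sulat", "LOC", "completed")` returns `"sinulatan"`.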



### 4.3 Causatives, reciprocals, ability

- **pa‑ / paka‑** (often on top of focus affixes) — causative/allowative (‘make/let X V’).  

- **maka‑** — ability/potential.  

- **mi‑ / aki‑** — reciprocal/distributive readings in some paradigms (cf. Benton’s exercises).  



> **Note.** Precise pairings of aspect with each focus set (e.g., AF *man‑/nan‑*) are discussed in Benton (1971) and subsequent studies. Root compatibility and subtle semantic classes are treated in work on **maN‑** and **oN‑** (see references).

---

## 5) Reduplication (very productive)

- **Plural nouns via partial reduplication**: multiple patterns are attested (CV‑, CVC‑, C1V‑, CVCV‑). Examples frequently cited in the literature include:  

  - *toó → totóo* ‘person → people’  

  - *báley → balbáley* ‘town → towns’  

  - *manók → manómanók* ‘chicken → chickens’  

  - *pláto → papláto* ‘plate → plates’  

- **Verbal frequentative/iterative** uses are also reported (reduplication marking repeated/ongoing action).  

*Refs:* Benton (1971); Rubino (WALS ch. 27; 2005 overview of reduplication).
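
The four plural patterns can be checked against the cited examples by copying a chunk of the accent-stripped root. A sketch, assuming CV/CVC/CVCV are fixed-length chunks and C1V is the first consonant plus the first vowel (skipping a cluster); stress shifts (*toó* → *totóo*) are not modelled, so the comparison ignores accents. `strip_accents`, `redup_chunk` and `reduplicate` are hypothetical names.

```python
import unicodedata

VOWELS = set("aeiou")


def strip_accents(text):
    """Remove combining accents so that ó compares equal to o."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def redup_chunk(root, pattern):
    """Return the (unaccented) chunk copied by a partial-reduplication pattern."""
    plain = strip_accents(root).lower()
    if pattern == "C1V":
        # first consonant + first vowel, skipping any cluster (pláto -> pa)
        vowel = next((ch for ch in plain if ch in VOWELS), "")
        return plain[0] + vowel
    length = {"CV": 2, "CVC": 3, "CVCV": 4}[pattern]
    return plain[:length]


def reduplicate(root, pattern):
    return redup_chunk(root, pattern) + root


examples = [("toó", "CV", "totóo"), ("báley", "CVC", "balbáley"),
            ("manók", "CVCV", "manómanók"), ("pláto", "C1V", "papláto")]
for root, pattern, cited in examples:
    built = reduplicate(root, pattern)
    print(root, pattern, built, strip_accents(built) == strip_accents(cited))
```

All four comparisons print `True`; because the copied chunk is unaccented, *toó* yields *totoó* rather than the cited *totóo*.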

---

## 6) Negation (useful surface facts)

- **Standard verbal negation:** **ag** (often realized as **aga** by allomorphy/phonotactics).  

- **Existential/possessive negation:** **anggapo** (NEG.exist) (you will also see **audi** in some handouts).  

*Refs:* Benton (1971); Rosetta Project handout; see recent discussion revising *ag/aga* distribution.

---

## 7) Morphophonemics you’ll encounter

- **Nasal assimilation** with **maN‑ / paN‑**: the nasal matches the place of articulation of the following consonant (e.g., **maN‑** + **p** → **mam‑**).  

- **Clitic sandhi**: case particles often cliticize as **=’y** (*so*) and **=’d** (*ed*) after vowels; the **linker** has allomorphs **ya / ‑n / a**.  

*Refs:* Benton (1971); Rosetta Project handout.
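
The clitic sandhi can be sketched over a token list: *so* and *ed* attach to a preceding vowel-final word as ’y and ’d. The sketch contracts whenever it can, although the text above says this happens *often* rather than always; `contract_particles` and `ends_in_vowel` are hypothetical names and *kana*/*kanak* are placeholder tokens.

```python
import unicodedata

VOWELS = set("aeiou")
CONTRACTIONS = {"so": "\u2019y", "ed": "\u2019d"}  # so -> ’y, ed -> ’d


def ends_in_vowel(word):
    """True if the last letter (accents ignored) is a vowel."""
    last = unicodedata.normalize("NFD", word[-1:].lower())[:1]
    return last in VOWELS


def contract_particles(tokens):
    """Attach so/ed to a preceding vowel-final word as ’y/’d."""
    out = []
    for tok in tokens:
        clitic = CONTRACTIONS.get(tok.lower())
        if clitic and out and ends_in_vowel(out[-1]):
            out[-1] += clitic
        else:
            out.append(tok)
    return out


print(contract_particles(["kana", "so", "kanak", "ed", "kanak"]))
```

Here *so* contracts after vowel-final *kana*, while *ed* stays free after consonant-final *kanak*.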

---

## 8) Quick mini‑reference

- **Case/article set:** **say/so (=’y)** NOM; **na / ni** GEN; **ed (=’d)** OBL; plurals **saray**, **so saray**; **dyad** (sentence‑initial oblique variant).  

- **Pronouns (core):** **siák ~ ‑ko** ‘I/my’; **sikatá ~ ‑ta** ‘we two (incl.)’; **sikatayó ~ ‑tayo** ‘we (incl.)’; **sikamí ~ ‑mi** ‘we (excl.)’; **siká ~ ‑mo** ‘you (sg.)’; **sikayó ~ ‑yo** ‘you (pl.)’; **sikató ~ ‑to** ‘he/she’; **sikará ~ ‑da** ‘they’.  

- **Focus markers (illustrative):** AF **man‑/nan‑**, **maN‑ (…)/aN‑**, **oN‑**; PF **‑en/‑in‑**; LOC/REF **‑an / ‑in‑…‑an**; BEN **i‑…‑an / in‑…‑an**; INST **(i)pan‑ / inpan‑**, **(i)pañgi‑ / inpañgi‑**; **maka‑** (ability); **pa‑/paka‑** (causative).  

- **Reduplication:** CV‑/CVC‑/C1V‑/CVCV‑ patterns for plural nouns; frequentative readings with verbs.  

---

## References & suggested reading (links)

- **Benton, Richard A. (1971). _Pangasinan Reference Grammar._** University of Hawai‘i Press / Humanities Open Book. Manifold project page: https://manifold.uhpress.hawaii.edu/projects/pangasinan-reference-grammar (see especially: *Personal pronouns*, *Case‑marking particles*, *Focus‑marking verbal affixes*, *Phonology*.)

- **Rosetta Project. _Pangasinan Grammar (Morphosyntax) handout._** Archive.org PDF (linker *ya* allomorphy; negation; focus examples): https://archive.org/download/rosettaproject_pag_morsyn-1/rosettaproject_pag_morsyn-1.pdf

- **WALS Online, Chapter 27 “Reduplication” (Rubino).** Notes Pangasinan plural‑by‑reduplication patterns: https://wals.info/chapter/27

- **Rubino, Carl (2005). “Reduplication: Form, function and distribution.”** (overview PDF): https://www.unice.fr/scheer/egg/Lagodekhi16/Rubino%2C%20Carl%20%282005%29%20-%20Reduplication%20Form%2C%20function%20and%20distribution.pdf

- **“A Closer Look at the Pangasinan Verbal Affixes maN‑ and oN‑.”** University of the Philippines Department of Linguistics (abstract): https://linguistics.upd.edu.ph/publication/a-closer-look-at-the-pangasinan-verbal-affixes-man-and-on/

- **“Revisiting the form of negation in Pangasinan.”** A concise discussion of **ag/aga** distribution: https://mitcho.com/research/pangasinan-neg.html

- **Wikipedia (quick reference only): _Pangasinan language_** (for at‑a‑glance tables; verify with Benton): https://en.wikipedia.org/wiki/Pangasinan_language



# Morphological Analysis for Pangasinan Lexicon

This notebook loads a JSON lexicon and applies simple rule-based morphological parsing to each headword.

```python
import json

# Load the lexicon (update the path as needed)
with open("pangasinan_enriched.json", "r", encoding="utf-8") as f:
    lexicon = json.load(f)

print("Entries loaded:", len(lexicon))
lexicon[:2]  # peek at the first entries
```

```
Entries loaded: 2595


[{'word': '1a',
  'meaning': 'interjection marking hesitation, agreement, disagreement, etc.',
  'source': 'Dictionary A, Dictionary A Sorted',
  'POS': 'INTERJECTION',
  'morphology': None},
 {'word': '2a',
  'meaning': 'linking particle, uniting adjectives or descriptive phrases with verbs and nouns, relative sentences to main sentence, etc. (also ya )',
  'source': 'Dictionary A, Dictionary A Sorted',
  'POS': 'PARTICLE',
  'morphology': None}]
```

```python
# Define morphological rules.
# analyze() tries rules of the same type in list order and keeps the first match,
# so a shorter form listed first shadows a longer one: ma- before maka-,
# pa- before paka-, i- before in-, and i-…-an before in-…-an.
# The genitive pronouns (-ko, -mo, -to, ...) are enclitics; matched as suffixes
# they can also strip root-final syllables from headwords.
rules = [
    {"type": "nasal_prefix", "form": "ma", "label": "maN-", "meaning": "actor focus (nasal-assimilating, non-completed)"},
    {"type": "nasal_prefix", "form": "a", "label": "aN-", "meaning": "actor focus (nasal-assimilating, completed)"},
    {"type": "nasal_prefix", "form": "pa", "label": "paN-", "meaning": "causative/instrumental nasal-assimilating"},
    {"type": "nasal_prefix", "form": "o", "label": "oN-", "meaning": "actor focus (nasal-assimilating variant)"},
    {"type": "prefix", "form": "man", "label": "man-", "meaning": "actor focus (non-completed)"},
    {"type": "prefix", "form": "nan", "label": "nan-", "meaning": "actor focus (completed)"},
    {"type": "prefix", "form": "ma", "label": "ma-", "meaning": "stative/adjectival"},
    {"type": "prefix", "form": "pa", "label": "pa-", "meaning": "causative/allowative"},
    {"type": "prefix", "form": "paka", "label": "paka-", "meaning": "causative/allowative (intensive)"},
    {"type": "prefix", "form": "maka", "label": "maka-", "meaning": "ability/potential"},
    {"type": "prefix", "form": "mi", "label": "mi-", "meaning": "reciprocal/distributive"},
    {"type": "prefix", "form": "aki", "label": "aki-", "meaning": "reciprocal/distributive"},
    {"type": "prefix", "form": "ipan", "label": "(i)pan-", "meaning": "instrumental focus (non-completed)"},
    {"type": "prefix", "form": "inpan", "label": "inpan-", "meaning": "instrumental focus (completed)"},
    {"type": "prefix", "form": "ipañgi", "label": "(i)pañgi-", "meaning": "instrumental/apparatus focus (non-completed)"},
    {"type": "prefix", "form": "inpañgi", "label": "inpañgi-", "meaning": "instrumental/apparatus focus (completed)"},
    {"type": "prefix", "form": "i", "label": "i-", "meaning": "theme/goal or benefactive focus (non-completed)"},
    {"type": "prefix", "form": "in", "label": "in-", "meaning": "theme/goal or benefactive focus (completed)"},
    {"type": "suffix", "form": "an", "label": "-an", "meaning": "locative/referent focus (non-completed)"},
    {"type": "suffix", "form": "en", "label": "-en", "meaning": "patient focus (non-completed)"},
    {"type": "suffix", "form": "in", "label": "-in", "meaning": "patient/theme focus (completed)"},
    {"type": "suffix", "form": "tayo", "label": "-tayo", "meaning": "1pl inclusive genitive enclitic"},
    {"type": "suffix", "form": "mi", "label": "-mi", "meaning": "1pl exclusive genitive enclitic"},
    {"type": "suffix", "form": "mo", "label": "-mo", "meaning": "2sg genitive enclitic"},
    {"type": "suffix", "form": "yo", "label": "-yo", "meaning": "2pl genitive enclitic"},
    {"type": "suffix", "form": "ko", "label": "-ko", "meaning": "1sg genitive enclitic"},
    {"type": "suffix", "form": "ta", "label": "-ta", "meaning": "1du inclusive genitive enclitic"},
    {"type": "suffix", "form": "to", "label": "-to", "meaning": "3sg genitive enclitic"},
    {"type": "suffix", "form": "da", "label": "-da", "meaning": "3pl genitive enclitic"},
    {"type": "circumfix", "form": {"prefix": "i", "suffix": "an"}, "label": "i-…-an", "meaning": "benefactive focus (non-completed)"},
    {"type": "circumfix", "form": {"prefix": "in", "suffix": "an"}, "label": "in-…-an", "meaning": "benefactive/referent focus (completed)"},
    {"type": "infix", "form": "in", "label": "-in-", "meaning": "completed aspect marker"},
    {"type": "reduplication", "form": "CV", "label": "CV-", "meaning": "partial reduplication (plural nouns)"},
    {"type": "reduplication", "form": "CVC", "label": "CVC-", "meaning": "partial reduplication (plural nouns)"},
    {"type": "reduplication", "form": "C1V", "label": "C1V-", "meaning": "partial reduplication (plural nouns)"},
    {"type": "reduplication", "form": "CVCV", "label": "CVCV-", "meaning": "partial reduplication (plural nouns)"},
    {"type": "reduplication", "form": "full", "label": "full", "meaning": "full reduplication (intensifier/frequentative)"}
]
```
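
Because the first matching rule of each type wins, one fix for the shadowing noted in the cell above is to try longer forms first. A sketch under that assumption; `form_length` and `longest_first` are hypothetical helpers, reduplication patterns are given length 0 so their relative order is untouched, and ties keep their original list order because Python's sort is stable (also with `reverse=True`).

```python
def form_length(rule):
    """Length of a rule's surface form; reduplication patterns count as 0."""
    if rule["type"] == "reduplication":
        return 0
    form = rule["form"]
    if isinstance(form, dict):  # circumfix: prefix + suffix
        return len(form.get("prefix", "")) + len(form.get("suffix", ""))
    return len(form)


def longest_first(rule_list):
    """Sort rules so longer forms are tried first; ties keep list order."""
    return sorted(rule_list, key=form_length, reverse=True)


demo = [
    {"type": "prefix", "form": "ma"},
    {"type": "prefix", "form": "maka"},
    {"type": "prefix", "form": "i"},
    {"type": "prefix", "form": "in"},
    {"type": "circumfix", "form": {"prefix": "i", "suffix": "an"}},
    {"type": "circumfix", "form": {"prefix": "in", "suffix": "an"}},
]
print([r["form"] for r in longest_first(demo) if r["type"] == "prefix"])
```

In the notebook this would be applied as `rules = longest_first(rules)` before calling `analyze`. Since `analyze` groups rules by type, only the order within each type matters. Doing so changes which rules fire, so the smoke tests and statistics below would need to be re-run.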

```python
# Morphological analyzer
from collections import defaultdict
from copy import deepcopy

# Nasal allomorphs are tried in list order with startswith(), so "n" is matched
# before "ng"/"ny" (the smoke test below parses mangabak as maN- + gabak).
NASAL_ALLOMORPHS = ["m", "n", "ng", "ny"]
VOWELS = set("aeiouáéíóúâêîôû")
# Chunk length per partial-reduplication pattern. C1V is approximated as a fixed
# 2-character chunk, so cluster-initial plurals such as papláto are not matched.
PARTIAL_REDUPE_LENGTHS = {"CV": 2, "CVC": 3, "C1V": 2, "CVCV": 4}

def _display_form(rule):
    # Human-readable form: the rule's label if present, else its raw form
    if "label" in rule:
        return rule["label"]
    form = rule["form"]
    if isinstance(form, dict):
        return f"{form.get('prefix', '')}…{form.get('suffix', '')}"
    return form

def _make_record(rule, extra=None):
    # One item of the "processes" list stored with each analysed word
    record = {
        "type": rule["type"],
        "meaning": rule["meaning"],
        "form": _display_form(rule),
        "normalized_form": deepcopy(rule["form"]),
    }
    if extra:
        record.update(extra)
    return record

def _apply_circumfix(stem, rule):
    # Strip prefix…suffix when both are present and a non-empty stem remains
    form = rule["form"]
    prefix = form.get("prefix", "")
    suffix = form.get("suffix", "")
    lower = stem.lower()
    if lower.startswith(prefix) and lower.endswith(suffix) and len(stem) > len(prefix) + len(suffix):
        if suffix:
            new_stem = stem[len(prefix):-len(suffix)]
        else:
            new_stem = stem[len(prefix):]
        return new_stem, _make_record(rule)
    return stem, None

def _apply_nasal_prefix(stem, rule):
    # Strip base + nasal allomorph (maN-, aN-, paN-, oN-). A vowel directly after
    # the base is also accepted with an empty allomorph (maabak -> abak).
    # With base "a", any word starting am-/an- matches.
    base = rule["form"]
    lower = stem.lower()
    if not lower.startswith(base):
        return stem, None
    remainder = stem[len(base):]
    rem_lower = remainder.lower()
    if not remainder:
        return stem, None
    for allo in NASAL_ALLOMORPHS:
        if rem_lower.startswith(allo):
            new_stem = remainder[len(allo):]
            return new_stem, _make_record(rule, {"applied_allomorph": allo})
    if rem_lower[0] in VOWELS:
        return remainder, _make_record(rule, {"applied_allomorph": ""})
    return stem, None

def _apply_prefix(stem, rule):
    # Strip a plain prefix if a non-empty stem remains
    prefix = rule["form"]
    if stem.lower().startswith(prefix) and len(stem) > len(prefix):
        return stem[len(prefix):], _make_record(rule)
    return stem, None

def _apply_suffix(stem, rule):
    # Strip a suffix if a non-empty stem remains
    suffix = rule["form"]
    if stem.lower().endswith(suffix) and len(stem) > len(suffix):
        return stem[:-len(suffix)], _make_record(rule)
    return stem, None

def _apply_infix(stem, rule):
    # Remove the first occurrence of the infix at index >= 1; the match is not
    # restricted to the position right after the first consonant.
    infix = rule["form"]
    lower = stem.lower()
    start = lower.find(infix, 1)  # avoid word-initial position
    if start != -1 and len(stem) > len(infix):
        new_stem = stem[:start] + stem[start + len(infix):]
        return new_stem, _make_record(rule, {"position": start})
    return stem, None

def _apply_reduplication(stem, rule):
    # "full": the word is two identical halves; partial: the first chunk repeats.
    # Comparison is case-insensitive but accent-sensitive (totóo, balbáley fail).
    pattern = rule["form"]
    lower = stem.lower()
    if pattern == "full":
        if len(stem) % 2 == 0:
            half = len(stem) // 2
            if lower[:half] == lower[half:]:
                return stem[:half], _make_record(rule)
        return stem, None
    chunk_len = PARTIAL_REDUPE_LENGTHS.get(pattern)
    if not chunk_len or len(stem) < chunk_len * 2:
        return stem, None
    chunk = stem[:chunk_len]
    if lower[:chunk_len] == lower[chunk_len:chunk_len * 2]:
        return stem[chunk_len:], _make_record(rule, {"partial_chunk": chunk})
    return stem, None

APPLY_FUNCS = {
    "circumfix": _apply_circumfix,
    "nasal_prefix": _apply_nasal_prefix,
    "prefix": _apply_prefix,
    "suffix": _apply_suffix,
    "infix": _apply_infix,
    "reduplication": _apply_reduplication,
}

# Affix types are stripped in this order. Because nasal_prefix precedes prefix,
# words beginning ma+n are normally parsed as maN- before the plain man- rule is tried.
PROCESS_ORDER = ["circumfix", "nasal_prefix", "prefix", "suffix", "infix"]

def analyze(word, rules):
    # Strip affixes type by type. Within a type, the first matching rule in list
    # order wins and the type is retried until nothing more applies (so several
    # prefixes can be stripped in sequence). Then at most one reduplication
    # pattern is tried on the remaining stem.
    stem = word
    processes = []
    grouped = defaultdict(list)
    for rule in rules:
        grouped[rule["type"]].append(rule)

    for rule_type in PROCESS_ORDER:
        applied = True
        while applied:
            applied = False
            for rule in grouped.get(rule_type, []):
                new_stem, record = APPLY_FUNCS[rule_type](stem, rule)
                if record is not None:
                    processes.append(record)
                    stem = new_stem
                    applied = True
                    break

    for rule in grouped.get("reduplication", []):
        new_stem, record = APPLY_FUNCS["reduplication"](stem, rule)
        if record is not None:
            processes.append(record)
            stem = new_stem
            break

    return {"root": stem if stem else word, "processes": processes}
```
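
Because `_apply_nasal_prefix` tries the allomorphs in list order, *mangabak* is split as maN‑ + *gabak* in the smoke test below. A longest-match variant is sketched here; `match_nasal` is a hypothetical helper, not part of the cell above. Longest match is itself a heuristic: a root that genuinely begins with *g* after an *n* nasal would now be mis-split.

```python
NASAL_ALLOMORPHS = ["m", "n", "ng", "ny"]


def match_nasal(remainder, allomorphs=NASAL_ALLOMORPHS):
    """Strip the longest nasal allomorph at the start of `remainder`.

    Returns (allomorph, rest), or (None, remainder) if no allomorph matches.
    """
    lower = remainder.lower()
    for allo in sorted(allomorphs, key=len, reverse=True):  # ng/ny before n
        if lower.startswith(allo):
            return allo, remainder[len(allo):]
    return None, remainder


print(match_nasal("ngabak"))  # remainder of "mangabak" after the base "ma"
```

Swapping this into `_apply_nasal_prefix` would change the recorded outputs, so the smoke tests and statistics would need to be re-run.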

```python
# Quick smoke tests for the analyzer
sample_words = [
    "mangabak",
    "inalabasan",
    "pinabak",
    "totóo",
    "balbáley",
    "maabak",
    "ábakábak",
]
for word in sample_words:
    print(word, "→", analyze(word, rules))
```

```
mangabak → {'root': 'gabak', 'processes': [{'type': 'nasal_prefix', 'meaning': 'actor focus (nasal-assimilating, non-completed)', 'form': 'maN-', 'normalized_form': 'ma', 'applied_allomorph': 'n'}]}
inalabasan → {'root': 'nalabas', 'processes': [{'type': 'circumfix', 'meaning': 'benefactive focus (non-completed)', 'form': 'i-…-an', 'normalized_form': {'prefix': 'i', 'suffix': 'an'}}]}
pinabak → {'root': 'pabak', 'processes': [{'type': 'infix', 'meaning': 'completed aspect marker', 'form': '-in-', 'normalized_form': 'in', 'position': 1}]}
totóo → {'root': 'totóo', 'processes': []}
balbáley → {'root': 'balbáley', 'processes': []}
maabak → {'root': 'abak', 'processes': [{'type': 'nasal_prefix', 'meaning': 'actor focus (nasal-assimilating, non-completed)', 'form': 'maN-', 'normalized_form': 'ma', 'applied_allomorph': ''}]}
ábakábak → {'root': 'ábak', 'processes': [{'type': 'reduplication', 'meaning': 'partial reduplication (plural nouns)', 'form': 'CVCV-', 'normalized_form': 'CVCV', 'parti
```
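
*totóo* and *balbáley* come back unanalysed because `_apply_reduplication` compares chunks with accents intact (*to* ≠ *tó*, *bal* ≠ *bál*). An accent-insensitive comparison is sketched below; `fold_char` and `strip_partial_redup` are hypothetical helpers. Folding character by character keeps the folded string aligned with the original, assuming precomposed (NFC) input.

```python
import unicodedata


def fold_char(ch):
    """Lowercase one character and drop its accent, keeping exactly one char."""
    return unicodedata.normalize("NFD", ch.lower())[0]


def strip_partial_redup(word, chunk_len):
    """Undo a copied prefix of `chunk_len` letters, ignoring accents.

    Returns the word minus the copy, or None if the first two chunks differ.
    """
    if len(word) < 2 * chunk_len:
        return None
    folded = "".join(fold_char(ch) for ch in word)  # same length as `word`
    if folded[:chunk_len] != folded[chunk_len:2 * chunk_len]:
        return None
    return word[chunk_len:]


print(strip_partial_redup("totóo", 2))     # CV
print(strip_partial_redup("balbáley", 3))  # CVC
```

The stripped root keeps the plural's stress (*tóo*), so the singular *toó* is not recovered exactly.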

```python
# Apply analysis to the whole lexicon
for entry in lexicon:
    entry["morphology"] = analyze(entry["word"], rules)

# Save enriched lexicon
with open("pangasinan_with_morphology.json", "w", encoding="utf-8") as f:
    json.dump(lexicon, f, ensure_ascii=False, indent=2)

print("Saved enriched lexicon with morphology!")
```

```
Saved enriched lexicon with morphology!
```


```python
# Show enrichment statistics for the lexicon
from collections import Counter
import json

with open("pangasinan_with_morphology.json", "r", encoding="utf-8") as f:
    enriched_lexicon = json.load(f)

AFFIX_TYPES = {"prefix", "suffix", "infix", "nasal_prefix", "circumfix"}
affix_counter = Counter()
nasal_allomorph_counter = Counter()
redup_counter = Counter()

total = len(enriched_lexicon)

for entry in enriched_lexicon:
    # "or {}" also covers entries whose morphology is null in the JSON
    morph = entry.get("morphology") or {}
    for proc in morph.get("processes", []):
        ptype = proc.get("type")
        label = proc.get("form")
        if ptype == "reduplication":
            redup_counter[label] += 1
        elif ptype in AFFIX_TYPES:
            affix_counter[label] += 1
            if ptype == "nasal_prefix":
                nasal_allomorph_counter[(label, proc.get("applied_allomorph", "∅"))] += 1

print(f"Total entries enriched: {total}")
print("\nAffix statistics:")
for affix, count in affix_counter.most_common():
    print(f"  {affix}: {count}")

if nasal_allomorph_counter:
    print("\nNasal assimilation allomorphs:")
    for (affix, allo), count in nasal_allomorph_counter.most_common():
        print(f"  {affix} +{allo or '∅'}: {count}")

print("\nReduplication statistics:")
for form, count in redup_counter.most_common():
    print(f"  {form}: {count}")
```

```
Total entries enriched: 2595

Affix statistics:
  pa-: 112
  ma-: 97
  i-: 97
  aN-: 84
  -in-: 61
  maN-: 49
  -an: 46
  paN-: 41
  -to: 36
  -ta: 33
  -da: 16
  -yo: 16
  mi-: 14
  -en: 14
  -mo: 12
  -ko: 11
  oN-: 6
  -in: 5
  nan-: 3
  inpan-: 2

Nasal assimilation allomorphs:
  aN- +m: 41
  aN- +n: 37
  maN- +∅: 26
  paN- +n: 24
  maN- +n: 17
  paN- +∅: 13
  aN- +∅: 6
  maN- +m: 6
  oN- +n: 5
  paN- +m: 4
  oN- +m: 1

Reduplication statistics:
  CVCV-: 17
  CV-: 12
  full: 10
  CVC-: 8
```
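
The counts above tally affix hits, not how many headwords received any analysis at all. A small sketch of that coverage figure follows; `coverage_report` is a hypothetical helper and the toy entries only mimic the record shape of `pangasinan_with_morphology.json` (they are not real data). In the notebook it would be called as `coverage_report(enriched_lexicon)`.

```python
from collections import Counter


def coverage_report(entries):
    """Count entries by how many morphological processes were found."""
    by_count = Counter(
        len((entry.get("morphology") or {}).get("processes", []))
        for entry in entries
    )
    analysed = sum(n for k, n in by_count.items() if k > 0)
    return analysed, len(entries), by_count


# Toy entries in the same shape as the enriched lexicon (not real data).
toy = [
    {"word": "w1", "morphology": {"root": "w1", "processes": []}},
    {"word": "w2", "morphology": {"root": "x", "processes": [{"form": "pa-"}]}},
    {"word": "w3", "morphology": None},
]
analysed, total, by_count = coverage_report(toy)
print(f"{analysed}/{total} entries have at least one process")
```

The `by_count` breakdown also shows how often several affixes were stripped from one headword, which is a quick signal of over-segmentation.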
