# Morphological Analysis for Pangasinan Lexicon

This notebook loads your JSON lexicon and applies simple rule-based morphological parsing.

It currently supports:
- Prefix detection (e.g., **ma-**)
- Suffix detection (e.g., **-an**)
- Full reduplication (e.g., **ábakábak**)

You can expand the rules as you study more morphology.
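
Forms such as **ábakábak** contain accented vowels, which Unicode can encode either as one precomposed character or as a base letter plus a combining accent. The two encodings look identical but compare as different strings, which can silently break prefix, suffix, and reduplication checks. The sketch below shows one way to guard against this by normalizing to NFC before analysis. The helper name `normalize_word` is hypothetical and not part of the notebook.

```python
import unicodedata

def normalize_word(word):
    """Return the word in NFC form so each accented vowel is a single code point."""
    return unicodedata.normalize("NFC", word)

# The same visible word typed two ways:
composed = "\u00e1bak"      # precomposed "á"
decomposed = "a\u0301bak"   # "a" followed by a combining acute accent

print(composed == decomposed)                                  # False: different code points
print(normalize_word(composed) == normalize_word(decomposed))  # True after normalization
```

Applying the same normalization to both lexicon words and rule forms keeps string comparisons consistent regardless of how the source dictionary was typed.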

In [1]:
import json

# Load your lexicon (update path as needed)
with open("pangasinan_enriched.json", "r", encoding="utf-8") as f:
    lexicon = json.load(f)

print("Entries loaded:", len(lexicon))
lexicon[:2]  # peek at first entries

Entries loaded: 2595


[{'word': '1a',
  'meaning': 'interjection marking hesitation, agreement, disagreement, etc.',
  'source': 'Dictionary A, Dictionary A Sorted',
  'POS': 'INTERJECTION',
  'morphology': None},
 {'word': '2a',
  'meaning': 'linking particle, uniting adjectives or descriptive phrases with verbs and nouns, relative sentences to main sentence, etc. (also ya )',
  'source': 'Dictionary A, Dictionary A Sorted',
  'POS': 'PARTICLE',
  'morphology': None}]
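
The headwords `'1a'` and `'2a'` begin with a digit, which looks like a homonym index carried over from the source dictionary. If that reading is right, the digit should be separated from the word before any affix matching. A minimal sketch under that assumption follows. `split_homonym_index` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
import re

# Assumption: a leading run of digits is a homonym index, not part of the word itself.
_HOMONYM = re.compile(r"^(\d+)(.+)$")

def split_homonym_index(headword):
    """Split '1a' into (1, 'a'); return (None, headword) when there is no index."""
    match = _HOMONYM.match(headword)
    if match:
        return int(match.group(1)), match.group(2)
    return None, headword

print(split_homonym_index("1a"))      # (1, 'a')
print(split_homonym_index("abakan"))  # (None, 'abakan')
```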

In [None]:
# Define morphological rules
rules = [
    {"type": "prefix", "form": "ma-", "meaning": "causative"},
    {"type": "suffix", "form": "-an", "meaning": "locative"},
    {"type": "suffix", "form": "-en", "meaning": "patient focus"},
    {"type": "reduplication", "form": "full", "meaning": "intensifier/repetition"}
]
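
The hyphen in forms such as `ma-` and `-an` marks the morpheme boundary and never appears in lexicon words, so any string comparison needs the bare affix. The sketch below illustrates the difference. `bare_form` and `sample_rules` are hypothetical names used only for this illustration.

```python
def bare_form(rule):
    """Return the affix without its boundary hyphen, e.g. 'ma-' -> 'ma'."""
    return rule["form"].strip("-")

# A small copy of the rule shapes used above, for illustration only
sample_rules = [
    {"type": "prefix", "form": "ma-", "meaning": "causative"},
    {"type": "suffix", "form": "-an", "meaning": "locative"},
]

word = "maabak"
print(word.startswith(sample_rules[0]["form"]))     # False: "ma-" includes the hyphen
print(word.startswith(bare_form(sample_rules[0])))  # True: compares against "ma"
```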

In [3]:
# Morphological analyzer
def analyze(word, rules):
    """Strip known affixes and full reduplication from `word`; return root and processes."""
    root = word
    processes = []

    # Prefixes: rule forms carry a boundary hyphen ("ma-"), so drop it before matching
    for r in rules:
        affix = r["form"].strip("-")
        if r["type"] == "prefix" and root.startswith(affix) and len(root) > len(affix):
            root = root[len(affix):]
            processes.append(r)

    # Suffixes: applied to whatever remains after prefix stripping
    for r in rules:
        affix = r["form"].strip("-")
        if r["type"] == "suffix" and root.endswith(affix) and len(root) > len(affix):
            root = root[:-len(affix)]
            processes.append(r)

    # Reduplication (simple: split the remaining form in half)
    for r in rules:
        if r["type"] == "reduplication":
            half = len(root) // 2
            if half > 0 and root[:half] == root[half:]:
                root = root[:half]
                processes.append(r)

    return {"root": root, "processes": processes}

In [None]:
# Try some test words
print(analyze("ábakábak", rules))
print(analyze("maabak", rules))
print(analyze("abakan", rules))

{'root': 'ábak', 'processes': [{'type': 'reduplication', 'form': 'full', 'meaning': 'intensifier/repetition'}]}
{'root': 'abak', 'processes': [{'type': 'prefix', 'form': 'ma-', 'meaning': 'causative'}]}
{'root': 'abak', 'processes': [{'type': 'suffix', 'form': '-an', 'meaning': 'locative'}]}


In [5]:
# Apply analysis to whole lexicon
for entry in lexicon:
    entry["morphology"] = analyze(entry["word"], rules)

# Save enriched lexicon
with open("pangasinan_with_morphology.json", "w", encoding="utf-8") as f:
    json.dump(lexicon, f, ensure_ascii=False, indent=2)

print("Saved enriched lexicon with morphology!")

Saved enriched lexicon with morphology!
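
Once every entry carries a `morphology` field, a quick tally of which processes fired is a useful sanity check on the rules. The sketch below counts processes by type and form. `count_processes` and the `sample` entries are hypothetical and only mimic the shape of the analyzer's output. In the notebook you would pass `lexicon` instead.

```python
from collections import Counter

def count_processes(entries):
    """Tally how often each (type, form) pair appears across analyzed entries."""
    counts = Counter()
    for entry in entries:
        morphology = entry.get("morphology") or {}  # tolerate entries with morphology = None
        for process in morphology.get("processes", []):
            counts[(process["type"], process["form"])] += 1
    return counts

# Hypothetical analyzed entries, shaped like the output of analyze()
sample = [
    {"word": "abakan", "morphology": {"root": "abak", "processes": [
        {"type": "suffix", "form": "-an", "meaning": "locative"}]}},
    {"word": "a", "morphology": {"root": "a", "processes": []}},
    {"word": "x", "morphology": None},
]

for (kind, form), n in count_processes(sample).most_common():
    print(kind, form, n)
```

A rule that fires on an unexpectedly large share of the lexicon is usually a sign that it is matching plain roots that merely begin or end with the same letters.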
