# 📓 Undoing Assimilation in the Bolinao Lexicon

---

## **1. Assimilation in Bolinao**

Assimilation happens when affixes or clitics adjust their form to match the sounds they attach to. In Bolinao, it shows up in several places:

### **1.1 Verb affixes (maN-, aN-, saN-)**
- `man + bayo → mambayo`
- `man + ka + mati → mangkamati`
- `man + giling → manggiling`
- `saN- + kata’gayan → sangkata’gayan`

Rule: the **/n/** in maN-/aN-/saN- changes place of articulation to match the first consonant of the root.
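
The rule above can be sketched as a small lookup from the root's first consonant to the nasal it triggers. This is a minimal illustration, not the notebook's method: `NASAL_FOR` and `attach_nasal_prefix` are hypothetical names, only the b, k, and g entries are backed by the examples above, and the p and m entries are extrapolated. Vowel-initial roots are not covered here.

```python
# Assumed mapping from a root's first consonant to the nasal it triggers.
# Attested in the examples above: b -> m, k -> ng, g -> ng.
# Extrapolated (unverified): p -> m, m -> m.
NASAL_FOR = {"b": "m", "p": "m", "m": "m", "k": "ng", "g": "ng"}

def attach_nasal_prefix(prefix_base: str, root: str) -> str:
    """Join a prefix base ('ma', 'a', 'sa') to a consonant-initial root."""
    nasal = NASAL_FOR.get(root[:1], "n")  # default: the nasal stays /n/
    return prefix_base + nasal + root

print(attach_nasal_prefix("ma", "bayo"))    # mambayo
print(attach_nasal_prefix("ma", "kamati"))  # mangkamati
print(attach_nasal_prefix("ma", "giling"))  # manggiling
```

This version keeps the root's first consonant, which matches the examples in this section.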

---

### **1.2 Pronoun prefixes (koN-, ikon-)**
- `koN- + ta → konta`
- `ikon + ta → ikonta`
- `koN- + mi → komi`

Rule: the **/N/** assimilates to the first consonant of the pronoun root; before /m/ the assimilated nasal merges with it, giving `komi` rather than `kommi`.
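
As a rough sketch of this pattern, the function below attaches the prefix and collapses the nasal when the pronoun already starts with the same sound. The names `NASAL_FOR` and `attach_pronoun` are hypothetical, and the merge step is an assumption inferred from `komi`, not a documented rule.

```python
# Assumed nasal mapping; only the default /n/ (before t) and m (before m)
# are attested in the pronoun examples above.
NASAL_FOR = {"m": "m", "b": "m", "p": "m", "k": "ng", "g": "ng"}

def attach_pronoun(prefix_base: str, pronoun: str) -> str:
    """Attach 'ko' or 'iko' plus an assimilating nasal to a pronoun root."""
    nasal = NASAL_FOR.get(pronoun[:1], "n")
    if pronoun.startswith(nasal):
        # Assumption: an identical nasal merges with the root (kom + mi -> komi)
        return prefix_base + pronoun
    return prefix_base + nasal + pronoun

print(attach_pronoun("ko", "ta"))   # konta
print(attach_pronoun("iko", "ta"))  # ikonta
print(attach_pronoun("ko", "mi"))   # komi
```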

---

### **1.3 Deictic pronouns (iti, isen, itaw)**
- `mo + iti → modti`
- `mo + isen → modsen`
- `mo + itaw → modtaw`

Rule: when the deictic encliticizes onto `mo`, its initial glottal stop + /i/ is replaced by **/d/**.
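
A minimal sketch of this substitution follows. The function name `attach_deictic` is hypothetical, and the fallback for hosts that do not end in a vowel is an assumption, since the examples above only show the host `mo`.

```python
VOWELS = "aeiou"

def attach_deictic(host: str, deictic: str) -> str:
    """Encliticize iti/isen/itaw onto a host, replacing the initial i with d."""
    if host and host[-1] in VOWELS and deictic.startswith("i"):
        return host + "d" + deictic[1:]
    # Assumption: in any other context the two words stay separate
    return f"{host} {deictic}"

for deictic in ("iti", "isen", "itaw"):
    print(attach_deictic("mo", deictic))  # modti, modsen, modtaw
```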

---

### **1.4 Linkers (a, nin)**
- `Mangansyon ya a anak → Mangansyon yay anak`
- `Aripen nako nin Dios → Aripen nakon Dios`

Rule: the linker contracts to **-y** (from `a`) or **-n** (from `nin`) and attaches to a preceding vowel-final word.
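
The contraction can be sketched as a two-entry lookup. `FUSED` and `fuse_linker` are hypothetical names, and leaving the linker as a separate word after a consonant-final host is an assumption drawn from the wording of the rule.

```python
VOWELS = "aeiou"
FUSED = {"a": "y", "nin": "n"}  # full linker -> contracted form

def fuse_linker(host: str, linker: str) -> str:
    """Contract a linker onto a vowel-final host; otherwise keep it separate."""
    if host and host[-1] in VOWELS and linker in FUSED:
        return host + FUSED[linker]
    return f"{host} {linker}"

print(fuse_linker("ya", "a"))      # yay
print(fuse_linker("nako", "nin"))  # nakon
print(fuse_linker("anak", "a"))    # anak a (consonant-final host, no contraction)
```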

---

# Undoing Assimilation

## 1. Data Exploration
We'll load the Bolinao lexicon CSV, inspect it, and prep for processing.

In [None]:
import pandas as pd
df = pd.read_csv("bolinao_lexicon.csv")
print(df.head())
df.info()  # info() prints its report directly and returns None
print("Unique words:", df['word'].nunique())

     word part_of_speech                                    meaning_english  \
0    a'lo              n  Pestle, a rounded piece of wood about five inc...   
1   a'nak              n  Referring to specific children individually, n...   
2   a'nem              n                    Six, the number following five.   
3   a'pat              n                                              Four.   
4  a'rong              n                                              Nose.   

  meaning_filipino                                     sample_bolinao  \
0            Halo.      Kustoy byat nansi a'lonman'ipambayo kon irik.   
1        Mga anak.  Si Ligaya a kaka sa sarba konran syam nin a'na...   
2            Anim.                         A'nem ray salay nan manok.   
3            Apat.                Nagbakasyon ako nin a'pat nin awro.   
4           Ilong.  Say a'rong ran Pilipino ket ambo' tuloy nin ma...   

                                      sample_english  upos  
0  Thepestlethat I am usi

## 2. Undoing Assimilation Rules

We'll create a function to generate possible root candidates for each assimilation type.

In [None]:
def undo_assimilation(word: str):
    candidates = []

    # --- Verb affixes (maN-, aN-, saN-) ---
    if word.startswith("mamb"):
        candidates.append("b" + word[4:])    # mambayo -> bayo
    elif word.startswith("mamp"):
        candidates.append("p" + word[4:])
    elif word.startswith("mamm"):
        candidates.append("m" + word[4:])
    elif word.startswith("mang"):
        candidates.append(word[4:])          # mangkamati -> kamati, manggiling -> giling
        candidates.append("k" + word[4:])    # k-initial root whose k was dropped
    elif word.startswith("manng"):
        candidates.append("ng" + word[5:])
    elif word.startswith("man"):
        candidates.append(word[3:])          # fallback

    if word.startswith("an"):
        candidates.append(word[2:])          # an+root
    if word.startswith("san"):
        candidates.append(word[3:])          # san+root
        if word.startswith("sang"):
            candidates.append(word[4:])      # sangkata'gayan -> kata'gayan

    # --- Pronoun prefixes (koN-, ikon-) ---
    if word.startswith("kon"):
        candidates.append(word[3:])          # konta -> ta
    elif word.startswith("kom"):
        candidates.append(word[2:])          # komi -> mi (nasal merged with the initial m)
    elif word.startswith("ikon"):
        candidates.append(word[4:])          # ikonra -> ra

    # --- Deictic pronouns ---
    if word.startswith("modtaw"):
        candidates.append("itaw")
    elif word.startswith("modt"):
        candidates.append("iti")             # modti -> iti
    elif word.startswith("mods"):
        candidates.append("isen")

    # --- Linkers (tightened) ---
    # Fused -y / -n only follow vowel-final hosts, so require a vowel before them.
    if 2 <= len(word) <= 5 and word[-2] in "aeiou":
        if word.endswith("y"):
            candidates.append(word[:-1] + " a")    # yay -> ya a
        if word.endswith("n"):
            candidates.append(word[:-1] + " nin")  # nakon -> nako nin

    # Drop empty strings and duplicates; sort for reproducible output
    return sorted(c for c in set(candidates) if c)


## 3. Cross-Checking with the Lexicon

Now we'll test each word, generate candidates, and keep those that exist in the lexicon, recording both meanings side by side so they can be compared.

In [None]:
confirmed = []

for idx, row in df.iterrows():
    word = row['word']
    meaning = row['meaning_english']
    upos_assimilated = row['upos']
    # Skip missing (NaN) entries, which are not strings
    roots = undo_assimilation(word) if isinstance(word, str) else []

    for r in roots:
        match = df[df['word'].str.strip() == r.strip()]
        if not match.empty:
            record = {
                "assimilated": word,
                "root_candidate": r,
                "meaning_assimilated": meaning,
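                # dropna/astype(str) keep join() from failing on missing values
                "meaning_root": "; ".join(match['meaning_english'].dropna().astype(str).unique()),
                "upos_assimilated": upos_assimilated,
                "upos_root": "; ".join(match['upos'].dropna().astype(str).unique())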
            }
            confirmed.append(record)

confirmed_df = pd.DataFrame(confirmed)
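
The loop above rescans the whole lexicon for every candidate. As a possible alternative, the same pairing can be sketched as a join with `explode` and `merge`. The toy frame, its placeholder glosses, and `toy_candidates` are all assumptions made for illustration; they stand in for `df` and `undo_assimilation`.

```python
import pandas as pd

# Toy stand-in for the lexicon; the glosses are placeholders, not real entries
toy = pd.DataFrame({
    "word": ["mambayo", "bayo", "konta", "ta"],
    "meaning_english": ["gloss 1", "gloss 2", "gloss 3", "gloss 4"],
})

def toy_candidates(word):
    """Hypothetical, cut-down stand-in for undo_assimilation."""
    if word.startswith("mamb"):
        return ["b" + word[4:]]
    if word.startswith("kon"):
        return [word[3:]]
    return []

# One row per (assimilated word, root candidate); words without candidates drop out
pairs = (
    toy.assign(root_candidate=toy["word"].map(toy_candidates))
       .explode("root_candidate")
       .dropna(subset=["root_candidate"])
       .rename(columns={"word": "assimilated", "meaning_english": "meaning_assimilated"})
)

# Keep only the candidates that are themselves lexicon entries
roots = toy.rename(columns={"word": "root_candidate", "meaning_english": "meaning_root"})
confirmed_toy = pairs.merge(roots, on="root_candidate", how="inner")
print(confirmed_toy)
```

Unlike the loop, this join returns one row per matching lexicon entry, so a root with several entries appears several times instead of having its meanings joined with `"; "`.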

## 4. Save Final Root Words

Export confirmed pairs for further linguistic analysis.

In [None]:
confirmed_df.to_csv("bolinao_root_words_assimilation.csv", index=False)

# Redoing Assimilation

In [None]:
import pandas as pd

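LEXICON_PATH = "Bolinao_Lexicon.csv"

try:
    try:
        # Try UTF-8 first: latin-1 accepts any byte sequence, so it only works as a last resort
        df_lexicon = pd.read_csv(LEXICON_PATH, encoding='utf-8')
    except UnicodeDecodeError:
        df_lexicon = pd.read_csv(LEXICON_PATH, encoding='latin-1')
    df_lexicon['word_clean'] = df_lexicon['word'].astype(str).str.lower().str.strip()
    existing_words = set(df_lexicon['word_clean'].unique())
    print(f"Lexicon loaded: {len(existing_words)} unique words.")

except FileNotFoundError:
    print(f"Error: {LEXICON_PATH} not found.")
    # Empty frame so later lookups do not hit an undefined name
    df_lexicon = pd.DataFrame(columns=['word', 'meaning_english', 'word_clean'])
    existing_words = set()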

core_wordlist = [
    {"root": "gamet", "meaning": "hand"}, {"root": "wiri", "meaning": "left"},
    {"root": "wanan", "meaning": "right"}, {"root": "bitih", "meaning": "leg/foot"},
    {"root": "daan", "meaning": "road/path"}, {"root": "tangoy", "meaning": "to swim"},
    {"root": "tapok", "meaning": "dust"}, {"root": "katat", "meaning": "skin"},
    {"root": "gorot", "meaning": "back"}, {"root": "tyan", "meaning": "belly"},
    {"root": "botol", "meaning": "bone"}, {"root": "agtay", "meaning": "liver"},
    {"root": "soso", "meaning": "breast"}, {"root": "abaya", "meaning": "shoulder"},
    {"root": "daya", "meaning": "blood"}, {"root": "olo", "meaning": "head"},
    {"root": "leey", "meaning": "neck"}, {"root": "sabot", "meaning": "hair"},
    {"root": "arong", "meaning": "nose"}, {"root": "angot", "meaning": "to sniff/smell"},
    {"root": "bebey", "meaning": "mouth"}, {"root": "ngipin", "meaning": "tooth"},
    {"root": "dila", "meaning": "tongue"}, {"root": "kalis", "meaning": "to laugh"},
    {"root": "akis", "meaning": "to cry"}, {"root": "soka", "meaning": "to vomit"},
    {"root": "kan", "meaning": "to eat"}, {"root": "inom", "meaning": "to drink"},
    {"root": "kayat", "meaning": "to bite"}, {"root": "sepsep", "meaning": "to suck"},
    {"root": "toly", "meaning": "ear"}, {"root": "ingar", "meaning": "to hear"},
    {"root": "mata", "meaning": "eye"}, {"root": "kit", "meaning": "to see"},
    {"root": "elek", "meaning": "to sleep"}, {"root": "taynep", "meaning": "to dream"},
    {"root": "tekre", "meaning": "to sit"}, {"root": "ideng", "meaning": "to stand"},
    {"root": "lalaki", "meaning": "man/male"}, {"root": "babayi", "meaning": "woman/female"},
    {"root": "anak", "meaning": "child"}, {"root": "ahawa", "meaning": "spouse"},
    {"root": "ina", "meaning": "mother"}, {"root": "tatay", "meaning": "father"},
    {"root": "bali", "meaning": "house"}, {"root": "atep", "meaning": "roof"},
    {"root": "ngaran", "meaning": "name"}, {"root": "robir", "meaning": "rope"},
    {"root": "tayi", "meaning": "to sew"}, {"root": "kadayem", "meaning": "needle"},
    {"root": "takaw", "meaning": "to steal"}, {"root": "pati", "meaning": "to kill"},
    {"root": "tadem", "meaning": "sharp"}, {"root": "obra", "meaning": "to work"},
    {"root": "tanem", "meaning": "to plant"}, {"root": "pili", "meaning": "to choose"},
    {"root": "pespes", "meaning": "to squeeze"}, {"root": "kotkot", "meaning": "to dig"},
    {"root": "haliw", "meaning": "to buy"}, {"root": "bantak", "meaning": "to throw"},
    {"root": "aso", "meaning": "dog"}, {"root": "manok", "meaning": "bird/chicken"},
    {"root": "salay", "meaning": "egg"}, {"root": "pakpak", "meaning": "wing"},
    {"root": "lompad", "meaning": "to fly"}, {"root": "ikoy", "meaning": "tail"},
    {"root": "olay", "meaning": "snake"}, {"root": "bolati", "meaning": "worm"},
    {"root": "gigang", "meaning": "spider"}, {"root": "kona", "meaning": "fish"},
    {"root": "yamot", "meaning": "root"}, {"root": "bonga", "meaning": "fruit"},
    {"root": "bato", "meaning": "stone"}, {"root": "boyangin", "meaning": "sand"},
    {"root": "ranom", "meaning": "water"}, {"root": "asin", "meaning": "salt"},
    {"root": "langit", "meaning": "sky"}, {"root": "bulan", "meaning": "moon"},
    {"root": "bitoen", "meaning": "star"}, {"root": "gonem", "meaning": "cloud"},
    {"root": "rapeg", "meaning": "rain"}, {"root": "kodor", "meaning": "thunder"},
    {"root": "kimat", "meaning": "lightning"}, {"root": "emot", "meaning": "warm"},
    {"root": "rayep", "meaning": "cold"}, {"root": "albet", "meaning": "wet"},
    {"root": "byat", "meaning": "heavy"}
]

def apply_assimilation(root):
    candidates = []
    r = root.lower().strip()
    if not r:
        return []

    # Rule 1: Bilabials (b, p) -> mamb- / mam-
    if r.startswith(('b', 'p')):
        candidates.append(f"mam{r[1:]}")   # Total Assimilation (bayo -> mamayo)
        if r.startswith('b'):
            candidates.append(f"mamb{r[1:]}") # Partial Assimilation (bayo -> mambayo)

    # Rule 2: Alveolars (d, s, t) -> man-
    elif r.startswith(('d', 's', 't')):
        candidates.append(f"man{r[1:]}")   # Total Assimilation (takaw -> manakaw)

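    # Rule 3: Velar (k) -> mang- (k dropped) or mangk- (k kept)
    elif r.startswith('k'):
        candidates.append(f"mang{r[1:]}")  # Total Assimilation (kimit -> mangimit)
        candidates.append(f"mang{r}")      # Partial Assimilation (k kept, cf. mangkamati in section 1.1)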

    # Rule 4: Glottal / Vowel -> mang-
    elif r[0] in 'aeiou':
        candidates.append(f"mang{r}")      # Standard (anak -> manganak)
        candidates.append(f"mang-{r}")     # Hyphenated variation

    # Rule 5: Other Consonants (l, r, w, y, g, h, n, m) -> mang- (No change to root)
    else:
        candidates.append(f"mang{r}")      # (gamet -> manggamet)

    return sorted(set(candidates))  # sorted so the sample tables are reproducible

results = []

for item in core_wordlist:
    root = item['root']
    root_meaning = item['meaning']

    generated_forms = apply_assimilation(root)

    for gen_word in generated_forms:
        status = "Verified in Lexicon" if gen_word in existing_words else "New Candidate"

        # Determine Meaning
        if status == "Verified in Lexicon":
            match = df_lexicon[df_lexicon['word_clean'] == gen_word]
            if not match.empty:
                final_meaning = match.iloc[0]['meaning_english']
            else:
                final_meaning = "Found (No definition)"
        else:
            # Generate Theoretical Meaning
            clean_root_meaning = root_meaning
            if clean_root_meaning.lower().startswith("to "):
                clean_root_meaning = clean_root_meaning[3:]

            final_meaning = f"[Predicted] To perform the action: {clean_root_meaning}"

        results.append({
            "Core Root": root,
            "Root Meaning": root_meaning,
            "Generated Word": gen_word,
            "Status": status,
            "Final Meaning": final_meaning
        })

df_results = pd.DataFrame(results)
verified = df_results[df_results['Status'] == "Verified in Lexicon"]
theoretical = df_results[df_results['Status'] == "New Candidate"]

print("--- SUMMARY ---")
print(f"Core Roots Tested: {len(core_wordlist)}")
print(f"Verified Assimilated Forms: {len(verified)}")
print(f"New Candidates Generated: {len(theoretical)}")

print("\n--- SAMPLE VERIFIED MATCHES (Table 6) ---")
print(verified[['Core Root', 'Generated Word', 'Final Meaning']].head(10).to_string(index=False))

print("\n--- SAMPLE NEW CANDIDATES (Table 7) ---")
print(theoretical[['Core Root', 'Generated Word', 'Final Meaning']].head(10).to_string(index=False))
df_results.to_csv("bolinao_core2_assimilation.csv", index=False)