# Conversion of Bribri text to human-readable orthographies
Rolando Coto-Solano. Last updated: May 8, 2021

The [Bribri language](https://en.wikipedia.org/wiki/Bribri_language) is spoken by approximately 7000 people in southern Costa Rica. Bribri has two major orthographies: the [Constenla (1998) system](http://www.editorial.ucr.ac.cr/lenguas/item/2341-curso-basico-de-bribri.html), and the [Jara (2013) system](https://www.lenguabribri.com/se-tt%C3%B6-bribri-ie-hablemos-en-bribri). In addition to this, Bribri writing is not fully standardized, so there is considerable spelling variation between documents.

To facilitate training, the sentences in the [AmericasNLP 2021 Shared Task](https://github.com/AmericasNLP/americasnlp2021/tree/main/data/bribri-spanish) use an intermediate representation of the orthography. This intermediate representation is meant for use by NLP algorithms; it unifies the existing orthographies but reduces the human readability of the text. If you are going to use the Bribri sentences in print, please convert the intermediate form into a human-readable form first (either Constenla or Jara).

---

# Conversión ortográfica del bribri a una ortografía legible
Rolando Coto-Solano. Última actualización: 8 de mayo del 2021

La [lengua bribri](https://en.wikipedia.org/wiki/Bribri_language) tiene aproximadamente 7000 hablantes, distribuidos en el sur de Costa Rica. El bribri tiene dos sistemas ortográficos principales: la ortografía de [Constenla (1998)](http://www.editorial.ucr.ac.cr/lenguas/item/2341-curso-basico-de-bribri.html), y la de [Jara (2013)](https://www.lenguabribri.com/se-tt%C3%B6-bribri-ie-hablemos-en-bribri). Además, la escritura del bribri no está estandarizada, así que existe mucha variación entre diferentes documentos escritos en la lengua.

Para facilitar el entrenamiento, las oraciones en la [tarea compartida de AmericasNLP 2021](https://github.com/AmericasNLP/americasnlp2021/tree/main/data/bribri-spanish) usan una representación intermedia para la ortografía. Esta representación intermedia está diseñada para usarse con algoritmos de procesamiento de lenguaje natural. Está hecha para unificar las ortografías existentes, pero al mismo tiempo reduce la legibilidad del texto. Si usted va a usar las oraciones bribri en algún documento impreso, por favor convierta la forma intermedia a una de las dos formas legibles por humanos (Constenla o Jara).

## Function structure

Input:

>`bribriInput`: A string with a sentence in Bribri.<br>
>`outputOrthography`: A string with one of two options: `constenla` or `jara`. This determines the transcription system used in the output. There are numerous differences between the two systems, but the most visible one is the marking of nasal vowels: Constenla marks them with a line underneath (a̠), while Jara marks them with a tilde above the vowel (ã).

Output:

>`bribriOutput`: A string where the special characters in `bribriInput` have been converted to Unicode output with the human-readable diacritics.

## Estructura de la función

Input:

>`bribriInput`: Una string con una oración en bribri.<br>
>`outputOrthography`: Una string con una de dos opciones: `constenla` o `jara`. Esto determinará cuál sistema ortográfico se usará en la salida. Hay numerosas diferencias entre los dos sistemas, pero la más visible es la marcación de las vocales nasales: en Constenla, las nasales se marcan con una línea debajo de la vocal (a̠); en Jara, se marcan con una virgulilla sobre la vocal (ã).

Output:

>`bribriOutput`: Una string en la que los caracteres especiales de `bribriInput` han sido convertidos a una forma legible por humanos, con los diacríticos correctos.
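
The nasal-vowel difference described above comes down to which combining mark accompanies the base vowel. The sketch below is illustrative only and is not part of the conversion function: it builds a nasal *a* in both styles and lists the Unicode name of each code point. The names `constenla_nasal_a`, `jara_nasal_a` and `code_point_names` are hypothetical. The Constenla mark is U+0320, as the function's own comments state; the Jara example assumes the standard combining tilde, U+0303.

```python
import unicodedata

# Constenla: base vowel followed by COMBINING MINUS SIGN BELOW (U+0320).
constenla_nasal_a = "a\u0320"

# Jara: base vowel with a tilde above. NFC normalization folds "a" + U+0303
# into the single precomposed code point U+00E3.
jara_nasal_a = unicodedata.normalize("NFC", "a\u0303")


def code_point_names(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


for label, text in (("constenla", constenla_nasal_a), ("jara", jara_nasal_a)):
    print(label, text, code_point_names(text))
```

Inspecting code points this way can help when a font renders the two marks poorly and the output is hard to check by eye.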

def convertToHumanSpelling(bribriInput, outputOrthography):

  bribriOutput = bribriInput

  punctuation = {
      " .":".", " ,":",", " !":"!", " ?":"?"   # Remove the space before punctuation
  }

  if (outputOrthography=="constenla"):

    # These use Sofía Flores' diacritic conventions,
    # where the line is a COMBINING MINUS SIGN BELOW 0x0320
    diacriticChars = {
      "ã":"a̠", "ẽ":"e̠","ĩ":"i̠", "õ":"o̠","ũ":"u̠",                  # Nasal low tone
      "Ã":"A̠", "Ẽ":"E̠","Ĩ":"I̠", "Õ":"O̠","Ũ":"U̠",                  # Nasal low tone, uppercase
      "áx":"á̠", "éx":"é̠", "íx":"í̠", "óx":"ó̠", "úx":"ú̠",           # Nasal falling tone
      "Áx":"Á̠", "Éx":"É̠", "Íx":"Í̠", "Óx":"Ó̠", "Úx":"Ú̠",           # Nasal falling tone, uppercase
      "àx":"à̠", "èx":"è̠", "ìx":"ì̠", "òx":"ò̠", "ùx":"ù̠",           # Nasal high tone
      "Àx":"À̠", "Èx":"È̠", "Ìx":"Ì̠", "Òx":"Ò̠", "Ùx":"Ù̠",           # Nasal high tone, uppercase
      "âx":"â̠", "êx":"ê̠", "îx":"î̠", "ôx":"ô̠", "ûx":"û̠",           # Nasal rising tone
      "Âx":"Â̠", "Êx":"Ê̠", "Îx":"Î̠", "Ôx":"Ô̠", "Ûx":"Û̠",           # Nasal rising tone, uppercase
      "éq":"ë́", "óq":"ö́", "èq":"ë̀", "òq":"ö̀", "êq":"ë̂", "ôq":"ö̂", # Lax vowels
      "Éq":"Ë́", "Óq":"Ö́", "Èq":"Ë̀", "Òq":"Ö̀", "Êq":"Ë̂", "Ôq":"Ö̂"  # Lax vowels,  uppercase
    }

    for c in diacriticChars: bribriOutput = bribriOutput.replace(c, diacriticChars.get(c))
    for c in punctuation: bribriOutput = bribriOutput.replace(c, punctuation.get(c))

  elif (outputOrthography=="jara"):

    diacriticChars = {
      "ã":"ã","ẽ":"ẽ","ĩ":"ĩ","õ":"õ","ũ":"ũ",                    # Nasal low tone
      "Ã":"Ã", "Ẽ":"Ẽ","Ĩ":"Ĩ", "Õ":"Õ","Ũ":"Ũ",                  # Nasal low tone, uppercase
      "áx":"ã́","éx":"ẽ́","íx":"ĩ́","óx":"ṍ","úx":"ṹ",               # Nasal falling tone
      "Áx":"Ã́","Éx":"Ẽ́","Íx":"Ĩ́","Óx":"Ṍ","Úx":"Ṹ",               # Nasal falling tone, uppercase
      "àx":"ã̀","èx":"ẽ̀","ìx":"ĩ̀","òx":"õ̀","ùx":"ũ̀",               # Nasal high tone
      "Àx":"Ã̀","Èx":"Ẽ̀","Ìx":"Ĩ̀","Òx":"Õ̀","Ùx":"Ũ̀",               # Nasal high tone, uppercase
      "âx":"ã̂","êx":"ẽ̂","îx":"ĩ̂","ôx":"õ̂","ûx":"ũ̂",               # Nasal rising tone
      "Âx":"Ã̂","Êx":"Ẽ̂","Îx":"Ĩ̂","Ôx":"Õ̂","Ûx":"Ũ̂",               # Nasal rising tone, uppercase
      "éq":"ë́","óq":"ö́","èq":"ë̀","òq":"ö̀","êq":"ë̂","ôq":"ö̂",      # Lax vowels
      "Éq":"Ë́", "Óq":"Ö́", "Èq":"Ë̀", "Òq":"Ö̀", "Êq":"Ë̂", "Ôq":"Ö̂"  # Lax vowels,  uppercase
    }

    coromaChanges = {
        "tk":"tch",
        "Ñãlàx":"Ñõlòx","ñãlàx":"ñõlòx",                   # road
        "Káx":"Kóx","káx":"kóx",                           # place
        "Kàxlĩ":"Kòxlĩ","kàxlĩ":"kòxlĩ",                   # rain
        "Káxwötã'":"Kóxwötã'","káxwötã'":"kóxwötã'",       # need
        "Káxwötã":"Kóxwötã","káxwötã":"kóxwötã",           # need
        "Dakarò":"Krò","dakarò":"krò"                      # chicken
    }

    for c in coromaChanges: bribriOutput = bribriOutput.replace(c, coromaChanges.get(c))
    for c in diacriticChars: bribriOutput = bribriOutput.replace(c, diacriticChars.get(c))
    for c in punctuation: bribriOutput = bribriOutput.replace(c, punctuation.get(c))
    
  else:

    print("Please specify one of the two available systems: constenla, jara")

  return(bribriOutput)
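
In the `jara` branch, the word-level `coromaChanges` table is applied before `diacriticChars`. Its keys are written in the intermediate notation (for example, with the `x` marker), so they can only match while those markers are still present in the text. The miniature below illustrates that ordering with a hypothetical helper, `apply_tables`, and two small hypothetical tables; it is a sketch under those assumptions, not a replacement for the function above. The strings use `\u` escapes so that the exact code points are unambiguous, and the order of the combining marks in the values is an assumption.

```python
def apply_tables(text, *tables):
    """Apply each replacement table in turn (hypothetical helper)."""
    for table in tables:
        for old, new in table.items():
            text = text.replace(old, new)
    return text


# "k\u00e1x" is "káx" and "k\u00f3x" is "kóx" in the intermediate notation.
word_changes = {"k\u00e1x": "k\u00f3x"}

# Nasal vowels written with combining tilde (U+0303) and acute (U+0301).
diacritics = {"\u00e1x": "a\u0303\u0301", "\u00f3x": "o\u0303\u0301"}

# Word-level change first, then diacritics: the word rule still sees "x".
word_first = apply_tables("k\u00e1x", word_changes, diacritics)

# Diacritics first: the "x" marker is consumed, so the word rule never matches.
diacritics_first = apply_tables("k\u00e1x", diacritics, word_changes)

print(word_first)
print(diacritics_first)
```

With the word-level table applied first, the result is based on *o*; with the order reversed, the vowel stays *a* because the intermediate key is no longer found.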

Example of how to use the function.

Ejemplo del uso de la función.

inputSentence = "Ye' shkèxnã bua'ë ."

print("Input           : " + inputSentence)

outputConstenla = convertToHumanSpelling(inputSentence,"constenla")
print("Output Constenla: " + outputConstenla)

outputJara = convertToHumanSpelling(inputSentence,"jara")
print("Output Jara     : " + outputJara)

Input           : Ye' shkèxnã bua'ë .
Output Constenla: Ye' shkè̠na̠ bua'ë.
Output Jara     : Ye' shkẽ̀nã bua'ë.
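
When comparing converted strings against expected values (for instance, in a unit test), it may help to normalize both sides first, since the same accented vowel can be encoded either as one precomposed code point or as a base letter plus combining marks. The helper `same_text` below is hypothetical, and the two strings are hand-built illustrations rather than actual output of the function.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same word encoded two ways (illustrative strings, not function output):
# precomposed e-with-tilde (U+1EBD) and a-with-tilde (U+00E3) ...
precomposed = "shk\u1ebd\u0300n\u00e3"
# ... versus base letters followed by combining tilde (U+0303) and grave (U+0300).
decomposed = "shke\u0303\u0300na\u0303"

print(precomposed == decomposed)           # raw comparison of code points
print(same_text(precomposed, decomposed))  # comparison after normalization
```

A raw `==` treats the two encodings as different strings, while the normalized comparison treats them as the same text.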
