Convert traditional orthography into Latin or pronunciation text.
Text is a TypeScript library that transforms traditional orthography into Latin (romanized) text, using the Talk spec. The resulting TalkText can be used to render Tone, a unique, modern, rune-like writing system for pronunciations.
Caveat: It's not always possible to transform traditional orthography into pronunciation text, especially for a language like English, where pronunciation can't be reliably derived from spelling. In English, and in some other languages, you must memorize individual cases. However, some languages get pretty close to correct pronunciation based purely on their native spelling, which is pretty cool, and this library takes advantage of that fact!
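To illustrate why a phonetic spelling makes automatic conversion feasible, here is a minimal sketch of rule-based conversion using a tiny, hypothetical subset of Spanish spelling rules. It is not this library's implementation: the rule table, the `toPronunciation` name, and the IPA output (rather than TalkText) are all assumptions. At each position it tries the longest matching grapheme first and copies characters without a rule through unchanged.

```typescript
// Hypothetical toy rule table: a tiny subset of Spanish spelling-to-sound
// rules (grapheme -> IPA). Context-dependent letters such as c, g, ll, r
// and y are deliberately left out, so they pass through unchanged.
const RULES = new Map<string, string>([
  ['ch', 'tʃ'],
  ['qu', 'k'], // in native words "qu" only appears before e or i
  ['rr', 'r'], // trilled r
  ['ñ', 'ɲ'],
  ['j', 'x'],
  ['h', ''], // silent when not part of "ch"
  ['v', 'b'],
])

// Length of the longest grapheme in the table.
const MAX_LEN = Math.max(...Array.from(RULES.keys(), key => key.length))

// Greedy longest-match conversion: at each position, try the longest
// candidate first; characters with no rule are copied as-is.
function toPronunciation(word: string): string {
  const input = word.normalize('NFC').toLowerCase()
  let output = ''
  let i = 0
  while (i < input.length) {
    let matched = false
    for (let len = MAX_LEN; len > 0; len--) {
      const chunk = input.slice(i, i + len)
      const sound = RULES.get(chunk)
      if (chunk.length === len && sound !== undefined) {
        output += sound
        i += len
        matched = true
        break
      }
    }
    if (!matched) {
      output += input.charAt(i)
      i += 1
    }
  }
  return output
}

console.log(toPronunciation('chiquito')) // tʃikito
console.log(toPronunciation('hijo')) // ixo
```

Longest-match is what lets a digraph like "ch" win over its single letters. Real rule sets also need context (Spanish "c" and "g", for instance, change before "e" and "i"), which this sketch ignores.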
- Script detection.
- Romanization transliterations of scripts/languages in various forms.
- Structured script data, such as which letters are vowels, etc.
- Keyboard layout data for various languages.
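Script detection, the first feature above, can be approximated by counting how many characters of the input fall into each Unicode script and picking the most common one. The sketch below is a hypothetical illustration of that idea, not how `@cluesurf/text/detect` actually works; the `detectScript` name, the script list, and the `{ form, count }` result shape are assumptions. It relies on Unicode property escapes (`\p{Script=...}`), which need an ES2018+ target.

```typescript
// Hypothetical subset of scripts to check, as [label, single-char test].
const SCRIPTS: Array<[string, RegExp]> = [
  ['chinese', /\p{Script=Han}/u],
  ['arabic', /\p{Script=Arabic}/u],
  ['hebrew', /\p{Script=Hebrew}/u],
  ['devanagari', /\p{Script=Devanagari}/u],
  ['tibetan', /\p{Script=Tibetan}/u],
  ['thai', /\p{Script=Thai}/u],
  ['latin', /\p{Script=Latin}/u],
]

type Detection = { form: string; count: number }

// Count characters per script and return the most frequent script,
// or undefined when no character belongs to a listed script.
function detectScript(text: string): Detection | undefined {
  const counts = new Map<string, number>()
  for (const char of text) {
    for (const [label, pattern] of SCRIPTS) {
      if (pattern.test(char)) {
        counts.set(label, (counts.get(label) ?? 0) + 1)
        break
      }
    }
  }
  let best: Detection | undefined
  for (const [form, count] of counts) {
    if (best === undefined || count > best.count) best = { form, count }
  }
  return best
}

console.log(detectScript('美丽的')) // { form: 'chinese', count: 3 }
```

Digits, spaces, and punctuation (Unicode script "Common") and combining marks such as Arabic vowel signs (script "Inherited") match none of these patterns, so they are simply not counted.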
```sh
npm install @cluesurf/text
```

Here are some API examples.

```ts
import detect from '@cluesurf/text/detect'

detect([...'美丽的']) //=> { form: 'chinese', rank: 1 }
```

For these languages you can currently call `make`:
| language | status |
|---|---|
| akkadian | ✔ |
| arabic | ✔ |
| chinese | ✔ |
| coptic | ✔ |
| devanagari | ✔ |
| finnish | ✔ |
| french | ✔ |
| geez | ✔ |
| georgian | ✔ |
| gothic | ✔ |
| gujarati | ✔ |
| gurmukhi | ✔ |
| hebrew | 🔧 |
| irish | 🔧 |
| italian | 🔧 |
| japanese | 🔧 |
| kannada | 🔧 |
| korean | 🔧 |
| latin | 🔧 |
| malayalam | 🔧 |
| navajo | 🔧 |
| old-norse | 🔧 |
| old-persian | 🔧 |
| oriya | 🔧 |
| pali | 🔧 |
| runic | 🔧 |
| swahili | 🔧 |
| tamil | 🔧 |
| telugu | 🔧 |
| thai | 🔧 |
| tibetan | 🔧 |
| turkish | 🔧 |
| ugaritic | 🔧 |
| vietnamese | 🔧 |
| welsh | 🔧 |
```ts
import make, {
  symbols,
  vowels,
  boundVowels,
  consonants,
} from '@cluesurf/text/arabic'

make('جَمِيل') //=> "djami_l"

// Inspect the structured script data, e.g. the Arabic vowels.
vowels.forEach(vowel => console.log(vowel))
```

```ts
import make from '@cluesurf/text/chinese'

make('měi lì de') //=> "me\\/i li\\ tO"
```

```ts
import toWylie from '@cluesurf/text/tibetan/wylie/to'
import fromWylie from '@cluesurf/text/tibetan/wylie/from'

toWylie('རིག་པ་') //=> "rig pa"
fromWylie('rig pa') //=> "རིག་པ"
```

Take the generated TalkText (the ASCII output from the base `make` calls) and convert it into a more compact, human-readable, "simplified" form.
```ts
import talk from '@cluesurf/talk'

talk('rIg ph~a') //=> "ṙịg pɦa"
```

Take the generated TalkText and convert it into a format compatible with ToneText fonts.

```ts
import make from '@cluesurf/text/chinese'
import tone from '@cluesurf/tone'

// The Chinese `make` produces TalkText; `tone` converts it to ToneText.
tone(make('měi lì de')) //=> "me8i li6 tO"
```
Here is a table of the languages we've looked at so far, showing which can and can't have their pronunciations generated automatically.
| language | automatic | note |
|---|---|---|
| Chinese (Mandarin) | yes but not perfect | Pinyin can be used to auto-generate pronunciations, but it doesn't always reflect how people actually say each word, so it's better to write each pronunciation manually when possible. |
| Korean | yes but not perfect | |
| Sanskrit | yes | In Devanagari, each letter has an exact pronunciation in Sanskrit, so pronunciations can be generated automatically with high accuracy. |
| Finnish | yes | |
| Navajo | yes | Since it was only fairly recently transcribed into a Latin alphabet, its spelling is phonetic for the most part. |
| Akkadian | yes | Because it is no longer spoken, there is at least a standard way of representing it. |
| Spanish | yes | Spanish spelling is largely phonetic, so pronunciation can mostly be derived from the written form. |
| Hebrew | partially yes, but only for consonants unless diacritics given | |
| Arabic | partially yes, but only for consonants unless diacritics given | |
| English | no | Too many words need to have pronunciation memorized. |
| Tibetan | no | Modern Tibetan pronunciation has evolved so far that the script is no longer phonetic. |
| Vietnamese | no | |
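The "partially yes" rows for Hebrew and Arabic hinge on whether vowel diacritics are present. As a hypothetical pre-check (`hasVowelMarks` is an illustrative name, not part of this library's API), a caller could test for the relevant combining marks before trusting automatic output. The sketch assumes Arabic harakat in U+064B to U+0652 (tanwin, fatha, damma, kasra, shadda, sukun) and Hebrew vowel points in U+05B0 to U+05BB plus U+05C7 (qamats qatan).

```typescript
// Arabic harakat: tanwin, fatha, damma, kasra, shadda, sukun (U+064B..U+0652).
const ARABIC_MARKS = /[\u064B-\u0652]/u
// Hebrew vowel points (niqqud): sheva through qubuts (U+05B0..U+05BB),
// plus qamats qatan (U+05C7).
const HEBREW_VOWELS = /[\u05B0-\u05BB\u05C7]/u

// Hypothetical helper: true if the text carries any vowel diacritics,
// i.e. automatic pronunciation has more than consonants to work with.
function hasVowelMarks(text: string): boolean {
  return ARABIC_MARKS.test(text) || HEBREW_VOWELS.test(text)
}

console.log(hasVowelMarks('جَمِيل')) // true (fatha and kasra present)
console.log(hasVowelMarks('جميل')) // false (consonants only)
```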
```ts
import hebrewSize from '@cluesurf/text/hebrew/size'
import thaiSize from '@cluesurf/text/thai/size'

// test a === b
// (קכג is the Hebrew number for 123: ק = 100, כ = 20, ג = 3)
test('קכג', hebrewSize.make(123))
test('๑๒๓', thaiSize.make(123))
test(123, thaiSize.read('๑๒๓'))

// Throws if the two values are not strictly equal.
function test(a: unknown, b: unknown) {
  if (a !== b) {
    throw new Error(`${a} != ${b}`)
  }
}
```

The goal of this library is to easily convert a number in JavaScript/TypeScript into a number in any of the world's writing systems. For example, the number 123 is written in Hebrew as קכג.
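Hebrew numerals are additive: each letter has a fixed value and a number is the sum of its letters, so 123 is ק (100) + כ (20) + ג (3). Here is a minimal sketch of a make/read pair for 1 to 999 under that scheme. It is an illustration, not this library's `hebrewSize` implementation; `makeHebrew` and `readHebrew` are hypothetical names. It omits geresh/gershayim punctuation and final letter forms, and follows the common convention of writing 15 and 16 as טו and טז (rather than יה and יו, which spell divine names).

```typescript
// Letter values for Hebrew numerals, largest first for greedy matching.
const LETTERS: Array<[string, number]> = [
  ['ת', 400], ['ש', 300], ['ר', 200], ['ק', 100],
  ['צ', 90], ['פ', 80], ['ע', 70], ['ס', 60], ['נ', 50],
  ['מ', 40], ['ל', 30], ['כ', 20], ['י', 10],
  ['ט', 9], ['ח', 8], ['ז', 7], ['ו', 6], ['ה', 5],
  ['ד', 4], ['ג', 3], ['ב', 2], ['א', 1],
]
const VALUES = new Map(LETTERS)

// Hypothetical make for 1..999: greedily take the largest letter value.
function makeHebrew(n: number): string {
  if (!Number.isInteger(n) || n < 1 || n > 999) {
    throw new RangeError(`unsupported number: ${n}`)
  }
  let rest = n
  let out = ''
  for (const [letter, value] of LETTERS) {
    // Convention: 15 and 16 are written טו (9 + 6) and טז (9 + 7).
    if (rest === 15) return out + 'טו'
    if (rest === 16) return out + 'טז'
    while (rest >= value) {
      out += letter
      rest -= value
    }
  }
  return out
}

// Hypothetical read: sum the letter values, skipping geresh/gershayim marks.
function readHebrew(text: string): number {
  let total = 0
  for (const char of text) {
    if (char === '\u05F3' || char === '\u05F4' || char === "'" || char === '"') continue
    const value = VALUES.get(char)
    if (value === undefined) throw new Error(`not a numeral letter: ${char}`)
    total += value
  }
  return total
}

console.log(makeHebrew(123)) // קכג
console.log(readHebrew('קכג')) // 123
```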
Each language or writing system has many quirks in how it writes numbers. For example, Korean has two separate number systems (because they evolved separately), and Chinese has distinct numbers for "general usage" and for "financial usage" (with "simplified" and "traditional" variants in both categories!). Some languages also don't count by 10 the way English does: they may count by 5, 16, or 60, or group numbers in other interesting ways. So it can get rather complex, though in most cases it's pretty straightforward.
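For positional systems that group by something other than 10 (base 60, for example), the core of `make` is repeated division by the base, and `read` is the reverse. A minimal, hypothetical sketch follows; `toDigits` and `fromDigits` are illustrative names, and mapping each digit value to a native glyph is left out.

```typescript
// Split a non-negative integer into digits in the given base,
// most significant digit first. Hypothetical helper.
function toDigits(n: number, base: number): number[] {
  if (!Number.isInteger(n) || n < 0) throw new RangeError(`bad number: ${n}`)
  if (!Number.isInteger(base) || base < 2) throw new RangeError(`bad base: ${base}`)
  if (n === 0) return [0]
  const digits: number[] = []
  let rest = n
  while (rest > 0) {
    digits.unshift(rest % base)
    rest = Math.floor(rest / base)
  }
  return digits
}

// Inverse: rebuild the number from its digits (Horner's rule).
function fromDigits(digits: number[], base: number): number {
  return digits.reduce((total, digit) => total * base + digit, 0)
}

console.log(toDigits(123, 60)) // [ 2, 3 ]
console.log(toDigits(123, 16)) // [ 7, 11 ]
console.log(toDigits(123, 5)) // [ 4, 4, 3 ]
```

So 123 in base 60 is 2 × 60 + 3. Systems that are not purely positional (additive ones like Hebrew, or ones with special forms) need extra rules on top of this.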
The goal is, for each writing system in the code folder, to convert a JavaScript number into the native writing system's number using its preferred standard system, and to convert the native format back into a JavaScript number. So, two functions.
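Thai writes numbers positionally in base 10 with its own ten digit glyphs, so for it such a make/read pair reduces to digit substitution. A minimal sketch, assuming only non-negative safe integers; `makeThai` and `readThai` are hypothetical names, not this library's `thaiSize` API. The Thai digits ๐ to ๙ occupy U+0E50 to U+0E59.

```typescript
// Thai digits ๐ to ๙ are the contiguous code points U+0E50 to U+0E59.
const THAI_ZERO = 0x0e50

// Hypothetical make: write each decimal digit with its Thai glyph.
function makeThai(n: number): string {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError(`unsupported number: ${n}`)
  }
  return String(n).replace(/[0-9]/g, digit =>
    String.fromCharCode(THAI_ZERO + Number(digit)),
  )
}

// Hypothetical read: accumulate the Thai digits back into a number.
function readThai(text: string): number {
  if (text.length === 0) throw new Error('empty input')
  let total = 0
  for (const char of text) {
    const digit = char.charCodeAt(0) - THAI_ZERO
    if (digit < 0 || digit > 9) throw new Error(`not a Thai digit: ${char}`)
    total = total * 10 + digit
  }
  return total
}

console.log(makeThai(123)) // ๑๒๓
console.log(readThai('๑๒๓')) // 123
```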
For now, we are only focusing on basic numbers, i.e. "cardinal numbers", not ordinal numbers or other types of numbers.
This library in general has two functions per writing system:

- `make`: generates a number in that writing system, given a regular input number.
- `read`: generates a regular number, given a number in that writing system.

So we have (TODO):

```ts
hebrewSize.make(123) // => קכג
hebrewSize.read('קכג') // => 123
```

See the code folder for the currently supported and planned writing systems. Once we are closer to finishing them, we will document them here in the readme.
MIT
Made by ClueSurf, meditating on the universe ¤. Follow the work on YouTube, X, Instagram, Substack, Facebook, and LinkedIn, and browse more of our open-source work here on GitHub.
