\chap{As Easy as A-B-C? The Lojban Letteral System and Its Uses}
\sect{What's a letteral, anyway?}
James Cooke Brown, the founder of the Loglan Project, coined
the word \q{letteral} (by analogy with \q{numeral}) to mean a
letter of the alphabet, such as \q{f} or \q{z}. A typical
example of its use might be
There are fourteen occurrences of the letteral\n
\T \q{e} in this sentence.
(Don't forget the one within quotation marks.) Using the word
\q{letteral} avoids confusion with \q{letter}, the kind you
write to someone. Not surprisingly, there is a Lojban gismu for
\q{letteral}, namely \q{lerfu}, and this word will be used in
the rest of this chapter.
Lojban uses the Latin alphabet, just as English does, right?
Then why is there a need for a chapter like this? After all,
everyone who can read it already knows the alphabet. The answer
is twofold:
First, in English there is a set of words that correspond
to and represent the English lerfu. These words are rarely
written down in English and have no standard spellings, but if
you pronounce the English alphabet to yourself you will hear
them: ay, bee, cee, dee ... . They are used in spelling out
words and in pronouncing most acronyms. The Lojban equivalents
of these words are standardized and must be documented.
Second, English has names only for the lerfu used in writing
English. (There are also English names for Greek and Hebrew
lerfu: English-speakers usually refer to the Greek lerfu
conventionally spelled \q{phi} as \q{fye}, whereas \q{fee}
would more nearly represent the name used by Greek-speakers.
Still, not all English-speakers know these English names.)
Lojban, in order to be culturally neutral, needs a more
comprehensive system that can handle, at least potentially, all
of the world's alphabets and other writing systems.
Letterals have several uses in Lojban: in forming acronyms
and abbreviations, as mathematical symbols, and as pro-sumti
--- the equivalent of English pronouns.
In earlier writings about Lojban, there has been a tendency
to use the word \q{lerfu} for both the letterals themselves and
for the Lojban words which represent them. In this chapter,
that tendency will be ruthlessly suppressed, and the term
\q{lerfu word} will invariably be used for the latter. The
Lojban equivalent would be \q{lerfu valsi} or \q{lervla}.
\sect{A to Z in Lojban, plus one}
The first requirement of a system of lerfu words for any
language is that they must represent the lerfu used to write
the language. The lerfu words for English are a motley crew:
the relationship between \q{doubleyou} and \q{w} is strictly
historical in nature; \q{aitch} represents \q{h} but has no
clear relationship to it at all; and \q{z} has two distinct
lerfu words, \q{zee} and \q{zed}, depending on the dialect of
English in question.
All of Lojban's basic lerfu words are made by one of three
rules:
\begin{itemize}
\item to get a lerfu word for a vowel, add \q{bu};
\item to get a lerfu word for a consonant, add \q{y};
\item the lerfu word for \q{'} is \q{.y'y}.
\end{itemize}
Therefore, the following table represents the basic Lojban alphabet and its lerfu words:
' a b c d e
.y'y. .abu by. cy. dy. .ebu
f g i j k l
fy. gy. .ibu jy. ky. ly.
m n o p r s
my. ny. .obu py. ry. sy.
t u v x y z
ty. .ubu vy. xy. .y.bu zy.
There are several things to note about this table. The
consonant lerfu words are a single syllable, whereas the vowel
and \q{'} lerfu words are two syllables and must be preceded by
a pause (since they all begin with a vowel). Another fact, not
evident from the table but important nonetheless, is that
\q{by} and its like are single cmavo of selma'o BY, as is
\q{.y'y}. The vowel lerfu words, on the other hand, are
compound cmavo, made from a single vowel cmavo plus the cmavo
\q{bu} (which belongs to its own selma'o, BU). All of the vowel
cmavo have other meanings in Lojban (logical connectives,
sentence separator, hesitation noise), but those meanings are
irrelevant when \q{bu} follows.
Here are some illustrations of common Lojban words spelled
out using the alphabet above:
ty. .abu ny. ry. .ubu\n
\q{t} \q{a} \q{n} \q{r} \q{u}
ky. .obu .y'y. .abu\n
\q{k} \q{o} \q{'} \q{a}
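Because the three rules are mechanical, spelling can be sketched in a few lines of code. The following Python function is a minimal sketch, not part of any Lojban standard: it assumes the input uses only the Lojban alphabet (with no \q{.} or \q{,} marks), and the names \texttt{lerfu\_word} and \texttt{spell} are hypothetical. It writes the pause marks exactly as the two examples above do.

```python
# A minimal sketch: spell a word written in the Lojban alphabet as lerfu words.
VOWELS = "aeiouy"
CONSONANTS = "bcdfgjklmnprstvxz"


def lerfu_word(letter):
    """Return the basic lerfu word for one letter of the Lojban alphabet."""
    if letter == "'":
        return ".y'y."               # special case: the apostrophe
    if letter == "y":
        return ".y.bu"               # y is written with its own pauses
    if letter in VOWELS:
        return "." + letter + "bu"   # vowel: leading pause, then add "bu"
    if letter in CONSONANTS:
        return letter + "y."         # consonant: add "y", then a pause
    raise ValueError(f"not a basic Lojban letter: {letter!r}")


def spell(word):
    """Spell out a Lojban word one lerfu word at a time."""
    return " ".join(lerfu_word(ch) for ch in word.lower())


print(spell("tanru"))  # ty. .abu ny. ry. .ubu
print(spell("ko'a"))   # ky. .obu .y'y. .abu
```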
Spelling out words is less useful in Lojban than in English,
for two reasons: Lojban spelling is phonemic, so there can be
no real dispute about how a word is spelled; and the Lojban
lerfu words sound more alike than the English ones do, since
they are made up systematically. The English words \q{fail} and
\q{vale} sound similar, but just hearing the first lerfu word
of either, namely \q{eff} or \q{vee}, is enough to discriminate
easily between them --- and even if the first lerfu word were
somehow confused, neither \q{vail} nor \q{fale} is a word of
ordinary English, so the rest of the spelling determines which
word is meant. Still, the capability of spelling out words does
exist in Lojban.
Note that the lerfu words ending in \q{y} were written (in
\exref{17.2.1} and \exref{17.2.2}) with pauses after them. It is not strictly necessary
to pause after such lerfu words, but failure to do so can in
some cases lead to ambiguities:
mi cy. claxu\n
I lerfu-\q{c} without\n
I am without (whatever is referred to by)\n
\T the letter \q{c}.
{\noindent}without a pause after \q{cy} would be interpreted as:
micyclaxu\n
(Observative:) doctor-without\n
Something unspecified is without a doctor.
A safe guideline is to pause after any cmavo ending in \q{y}
unless the next word is also a cmavo ending in \q{y}. The
safest and easiest guideline is to pause after all of them.
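The safe guideline is simple enough to state as a function. This is a minimal sketch, assuming the words arrive as a list without pause marks; the name \texttt{add\_pauses} is hypothetical. Since Lojban brivla never end in \q{y} and names always end in a consonant, any word ending in \q{y} can be taken to be a cmavo.

```python
def add_pauses(words):
    """Write a pause after each cmavo ending in "y", unless the next word
    also ends in "y" (the "safe guideline" for lerfu words)."""
    out = []
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else ""
        if word.endswith("y") and not next_word.endswith("y"):
            out.append(word + ".")   # pause before a non-y word or at the end
        else:
            out.append(word)
    return " ".join(out)


print(add_pauses(["mi", "cy", "claxu"]))  # mi cy. claxu
print(add_pauses(["by", "cy"]))           # by cy.
```

Under the safest guideline, the condition would reduce to \texttt{word.endswith("y")}, so every such word would get a pause.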
\sect{Upper and lower cases}
Lojban doesn't use lower-case (small) letters and upper-case
(capital) letters in the same way that English does; sentences
do not begin with an upper-case letter, nor do names. However,
upper-case letters are used in Lojban to mark irregular stress
within names, thus:
.iVAN.\n
the name \q{Ivan} in Russian/Slavic pronunciation.
It would require far too many cmavo to assign one for each
upper-case and one for each lower-case lerfu, so instead we
have two special cmavo \q{ga'e} and \q{to'a} representing upper
case and lower case respectively. They belong to the same
selma'o as the basic lerfu words, namely BY, and they may be
freely interspersed with them.
The effect of \q{ga'e} is to change the interpretation of
all lerfu words following it to be the upper-case version of
the lerfu. An occurrence of \q{to'a} causes the interpretation
to revert to lower case. Thus, \q{ga'e .abu} means not \q{a}
but \q{A}, and Ivan's name may be spelled out thus:
.ibu ga'e vy. .abu ny. to'a\n
i \optional{upper} V A N [lower]
The cmavo and compound cmavo of this type will be called
\q{shift words}.
How long does a shift word last? Theoretically, until the
next shift word that contradicts it or until the end of text.
In practice, it is common to presume that a shift word is only
in effect until the next word other than a lerfu word is encountered.
It is often convenient to shift just a single letter to
upper case. The cmavo \q{tau}, of selma'o LAU, is useful for
the purpose. A LAU cmavo must always be immediately followed by
a BY cmavo or its equivalent: the combination is grammatically
equivalent to a single BY. (See \sectref{17.14}
for details.)
A likely use of \q{tau} is in the internationally
standardized symbols for the chemical elements. Each element is
represented using either a single upper-case lerfu or one
upper-case lerfu followed by one lower-case lerfu:
tau sy.\n
\optional{single shift} S\n
S (chemical symbol for sulfur)
tau sy. .ibu\n
\optional{single shift} S i\n
Si (chemical symbol for silicon)
If a shift to upper-case is in effect when \q{tau} appears, it
shifts the next lerfu word only to lower case, reversing its
usual effect.
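The case-shift words behave like a small state machine, which the following Python sketch models. It assumes the theoretical reading (a shift lasts until contradicted or until the end of input), treats \q{tau} as reversing the case of the next lerfu word only, ignores pause marks, and recognizes only basic lerfu words; the function names are hypothetical.

```python
VOWELS = "aeiouy"
CONSONANTS = "bcdfgjklmnprstvxz"


def decode_basic(word):
    """Map one basic lerfu word (pause marks ignored) to its letter."""
    w = word.replace(".", "")
    if w == "y'y":
        return "'"
    if len(w) == 3 and w.endswith("bu") and w[0] in VOWELS:
        return w[0]                      # e.g. .abu -> a
    if len(w) == 2 and w[1] == "y" and w[0] in CONSONANTS:
        return w[0]                      # e.g. vy. -> v
    raise ValueError(f"not a basic lerfu word: {word!r}")


def decode(tokens):
    """Decode lerfu words, honoring the shift words ga'e, to'a and tau."""
    upper = False        # current shift state; lower case is the default
    flip_next = False    # set by tau: reverse the case of the next word only
    out = []
    for tok in tokens:
        if tok == "ga'e":
            upper = True
        elif tok == "to'a":
            upper = False
        elif tok == "tau":
            flip_next = True
        else:
            letter = decode_basic(tok)
            use_upper = upper != flip_next   # tau inverts the current case
            out.append(letter.upper() if use_upper else letter)
            flip_next = False
    return "".join(out)


print(decode([".ibu", "ga'e", "vy.", ".abu", "ny.", "to'a"]))  # iVAN
print(decode(["tau", "sy.", ".ibu"]))                           # Si
```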
\sect{The universal \q{bu}}
So far we have seen \q{bu} only as a suffix to vowel cmavo
to produce vowel lerfu words. Originally, this was the only use
of \q{bu}. In developing the lerfu word system, however, it
proved to be useful to allow \q{bu} to be attached to any word
whatsoever, in order to allow arbitrary extensions of the basic
lerfu word set.
Formally, \q{bu} may be attached to any single Lojban word.
Compound cmavo do not count as words for this purpose. The
special cmavo \q{ba'e}, \q{za'e}, \q{zei}, \q{zo}, \q{zoi},
\q{la'o}, \q{lo'u}, \q{si}, \q{sa}, \q{su}, and \q{fa'o} may
not have \q{bu} attached, because they are interpreted before
\q{bu} detection is done; in particular,
zo bu\n
the word \q{bu}
{\noindent}is needed when discussing \q{bu} in Lojban. It is also illegal
to attach \q{bu} to itself, but more than one \q{bu} may be
attached to a word; thus \q{.abubu} is legal, if ugly. (Its
meaning is not defined, but it is presumably different from
\q{.abu}.) It does not matter if the word is a cmavo, a cmene,
or a brivla. All such words suffixed by \q{bu} are treated
grammatically as if they were cmavo belonging to selma'o BY.
However, if the word is a cmene it is always necessary to
precede and follow it by a pause, because otherwise the cmene
may absorb preceding or following words.
The ability to attach \q{bu} to words has been used
primarily to make names for various logograms and other unusual
characters. For example, the Lojban name for the \q{happy face}
is \q{.uibu}, based on the attitudinal \q{.ui} that means
\q{happiness}. Likewise, the \q{smiley face}, written \q{:-)}
and used on computer networks to indicate humor, is called
\q{zo'obu}. The existence of these names does not mean that you
should insert \q{.uibu} into running Lojban text to indicate
that you are happy, or \q{zo'obu} when something is funny;
instead, use the appropriate attitudinal directly.
Likewise, \q{joibu} represents the ampersand character,
\q{\&}, based on the cmavo \q{joi} meaning \q{mixed and}.
Many more such lerfu words will probably be invented in the future.
The \q{.} and \q{,} characters used in Lojbanic writing to
represent pause and syllable break respectively have been
assigned the lerfu words \q{denpa bu} (literally, \q{pause bu})
and \q{slaka bu} (literally, \q{syllable bu}). The written
space is mandatory here, because \q{denpa} and \q{slaka} are
normal gismu with normal stress: \q{denpabu} would be a fu'ivla
(word borrowed from another language into Lojban) stressed
\q{denPAbu}. No pause is required between \q{denpa} (or
\q{slaka}) and \q{bu}, though.
\sect{Alien alphabets}
As stated in \sectref{17.1}, Lojban's goal of
cultural neutrality demands a standard set of lerfu words for
the lerfu of as many other writing systems as possible. When we
meet these lerfu in written text (particularly, though not
exclusively, mathematical text), we need a standard Lojbanic
way to pronounce them.
There are certainly hundreds of alphabets and other writing
systems in use around the world, and it is probably an
unachievable goal to create a single system which can express
all of them, but if perfection is not demanded, a usable system
can be created from the raw material which Lojban provides.
One possibility would be to use the lerfu word associated
with the language itself, Lojbanized and with \q{bu} added.
Indeed, an isolated Greek \q{alpha} in running Lojban text is
probably most easily handled by calling it \q{.alfas. bu}. Here
the Greek lerfu word has been made into a Lojbanized name by
adding \q{s} and then into a Lojban lerfu word by adding
\q{bu}. Note that the pause after \q{.alfas.} is still required.
Likewise, the easiest way to handle the Latin letters \q{h},
\q{q}, and \q{w} that are not used in Lojban is by a consonant
lerfu word with \q{bu} attached. The following assignments have
been made:
.y'y.bu h
ky.bu q
vy.bu w
As an example, the English word \q{quack} would be spelled in
Lojban thus:
ky.bu .ubu .abu cy. ky.\n
\q{q} \q{u} \q{a} \q{c} \q{k}
Note that the fact that the letter \q{c} in this word has
nothing to do with the sound of the Lojban letter \q{c} is
irrelevant; we are spelling an English word and English rules
control the choice of letters, but we are speaking Lojban and
Lojban rules control the pronunciations of those letters.
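With the three \q{bu} forms for \q{h}, \q{q}, and \q{w}, every letter of the 26-letter English alphabet has a lerfu word, so English spellings can be produced mechanically. The sketch below is hypothetical (the name \texttt{spell\_latin} is illustrative only) and, like the \q{quack} example, looks only at letters, never at English pronunciation.

```python
VOWELS = "aeiouy"
CONSONANTS = "bcdfgjklmnprstvxz"
# Latin letters not used in Lojban, with their assigned bu-forms.
EXTRA = {"h": ".y'y.bu", "q": "ky.bu", "w": "vy.bu"}


def latin_lerfu_word(letter):
    """Return the Lojban lerfu word for one letter of the Latin alphabet."""
    if letter in EXTRA:
        return EXTRA[letter]
    if letter == "y":
        return ".y.bu"
    if letter in VOWELS:
        return "." + letter + "bu"
    if letter in CONSONANTS:
        return letter + "y."
    raise ValueError(f"no lerfu word defined here for {letter!r}")


def spell_latin(word):
    """Spell a Latin-alphabet word letter by letter, ignoring pronunciation."""
    return " ".join(latin_lerfu_word(ch) for ch in word.lower())


print(spell_latin("quack"))  # ky.bu .ubu .abu cy. ky.
```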
A few more possibilities for Latin-alphabet letters used in
languages other than English:
ty.bu \th{} (thorn)
dy.bu \dh{} (edh)
However, this system is not ideal for all purposes. For one thing, it is
verbose. The native lerfu words are often quite long, and with \q{bu} added
they become even longer: the worst-case Greek lerfu word would be
\q{.Omikron. bu}, with four syllables and two mandatory pauses. In addition,
alphabets that are used by many languages have separate sets of lerfu words
for each language, and which set is Lojban to choose?
The alternative plan, therefore, is to use a shift word similar
to those introduced in \sectref{17.3}. After the
appearance of such a shift word, the regular lerfu words are
re-interpreted to represent the lerfu of the alphabet now in
use. After a shift to the Greek alphabet, for example, the
lerfu word \q{ty} would represent not Latin \q{t} but Greek
\q{tau}. Why \q{tau}? Because it is, in some sense, the closest
counterpart of \q{t} within the Greek lerfu system. In
principle it would be all right to map \q{ty.} to \q{phi} or
even \q{omega}, but such an arbitrary relationship would be
extremely hard to remember.
Where no obvious closest counterpart exists, some more or
less arbitrary choice must be made. Some alien lerfu may simply
not have any shifted equivalent, forcing the speaker to fall
back on a \q{bu} form. Since a \q{bu} form may mean different
things in different alphabets, it is safest to employ a shift
word even when \q{bu} forms are in use.
Shifts for several alphabets have been assigned cmavo of
selma'o BY:
lo'a Latin/Roman/Lojban alphabet
ge'o Greek alphabet
je'o Hebrew alphabet
jo'o Arabic alphabet
ru'o Cyrillic alphabet
The cmavo \q{zai} (of selma'o LAU) is used to create shift
words to still other alphabets. The BY word which must follow
any LAU cmavo would typically be a name representing the
alphabet with \q{bu} suffixed:
zai .devanagar. bu\n
Devanagari (Hindi) alphabet
zai .katakan. bu\n
Japanese katakana syllabary
zai .xiragan. bu\n
Japanese hiragana syllabary
Unlike the cmavo above, these shift words have not been
standardized and probably will not be until someone actually
has a need for them. (Note the \q{.} characters marking leading
and following pauses.)
In addition, there may be multiple visible representations
within a single alphabet for a given letter: roman vs. italics,
handwriting vs. print, Bodoni vs. Helvetica. These traditional
\q{font and face} distinctions are also represented by shift
words, indicated with the cmavo \q{ce'a} (of selma'o LAU) and a
following BY word:
ce'a .xelveticas. bu\n
Helvetica font
ce'a .xancisk. bu\n
handwriting
ce'a .pavrel. bu\n
12-point font size
The cmavo \q{na'a} (of selma'o BY) is a universal shift-word
cancel: it returns the interpretation of lerfu words to the
default of lower-case Lojban with no specific font. It is more
general than \q{lo'a}, which changes the alphabet only,
potentially leaving font and case shifts in place.
Several sections at the end of this chapter contain tables
of proposed lerfu word assignments for various languages.
\sect{Accent marks and compound lerfu words}
Many languages that make use of the Latin alphabet add
special marks to some of the lerfu they use. French, for
example, uses three accent marks above vowels, called (in
English) \q{acute}, \q{grave}, and \q{circumflex}. Likewise,
German uses a mark called \q{umlaut}; a mark which looks the
same is also used in French, but with a different name and function.
These marks may be considered lerfu, and each has a
corresponding lerfu word in Lojban. So far, no problem. But the
marks appear over lerfu, whereas the words must be spoken (or
written) either before or after the lerfu word representing the
basic lerfu. Typewriters (for mechanical reasons) and the
computer programs that emulate them usually require their users
to type the accent mark before the basic lerfu, whereas in
speech the accent mark is often pronounced afterwards (for
example, in German \q{a umlaut} is preferred to \q{umlaut a}).
Lojban cannot settle this question by fiat. Either it must be
left up to default interpretation depending on the language in
question, or the lerfu-word compounding cmavo \q{tei} (of selma'o
TEI) and \q{foi} (of selma'o FOI) must be used. These cmavo are
always used in pairs; any number of lerfu words may appear between
them, and the whole is treated as a single compound lerfu word. The
French word \q{\'{e}t\'{e}}, with acute accent marks on both
\q{e} lerfu, could be spelled as:
tei .ebu .akut. bu foi ty. tei .akut. bu .ebu foi\n
( \q{e} acute ) \q{t} ( acute \q{e} )
{\noindent}and it does not matter whether \q{akut. bu}
appears before or after \q{.ebu}; the \q{tei ... foi} grouping
guarantees that the acute accent is associated with the correct
lerfu. Of course, the level of precision represented by \exref{17.6.1} would rarely be required: it might
be needed by a Lojban-speaker when spelling out a French word
for exact transcription by another Lojban-speaker who did not
know French.
This system breaks down in languages which use more than one
accent mark on a single lerfu; some other convention must be
used for showing which accent marks are written where in that
case. The obvious convention is to represent the mark nearest
the basic lerfu by the lerfu word closest to the word
representing the basic lerfu. Any remaining ambiguities must be
resolved by further conventions not yet established.
Some languages, like Swedish and Finnish, consider certain
accented lerfu to be completely distinct from their unaccented
equivalents, but Lojban does not make a formal distinction,
since the printed characters look the same whether they are
reckoned as separate letters or not. In addition, some
languages consider certain 2-letter combinations (like \q{ll}
and \q{ch} in Spanish) to be letters; this may be represented
by enclosing the combination in \q{tei ... foi}.
In addition, when discussing a specific language, it is
permissible to make up new lerfu words, as long as they are
either explained locally or well understood from context: thus
Spanish \q{ll} or Croatian \q{lj} could be called \q{libu}, but
that usage would not necessarily be universally understood. \sectref{17.19} contains a table of proposed
lerfu words for some common accent marks.
\sect{Punctuation marks}
Lojban does not have punctuation marks as such: the denpa bu
and the slaka bu are really a part of the alphabet. Other
languages, however, use punctuation marks extensively. As yet,
Lojban does not have any words for these punctuation marks, but
a mechanism exists for devising them: the cmavo \q{lau} of
selma'o LAU. \q{lau} must always be followed by a BY word; the
interpretation of the BY word is changed from a lerfu to a
punctuation mark. Typically, this BY word would be a name or
brivla with a \q{bu} suffix.
Why is \q{lau} necessary at all? Why not just use a
\q{bu}-marked word and announce that it is always to be
interpreted as a punctuation mark? Primarily to avoid
ambiguity. The \q{bu} mechanism is extremely open-ended, and it
is easy for Lojban users to make up \q{bu} words without
bothering to explain what they mean. Using the \q{lau} cmavo
flags at least the most important of such nonce lerfu words as
having a special function: punctuation. (Exactly the same
argument applies to the use of \q{zai} to signal an alphabet
shift or \q{ce'a} to signal a font shift.)
Since different alphabets require different punctuation
marks, the interpretation of a \q{lau}-marked lerfu word is
affected by the current alphabet shift and the current font shift.
\sect{What about Chinese characters?}
Chinese characters (\q{han$^4$zi$^4$} in
Chinese, \q{kanji} in Japanese) represent an entirely different
approach to writing from alphabets or syllabaries. (A
syllabary, such as Japanese hiragana or Amharic writing, has
one lerfu for each syllable of the spoken language.) Very
roughly, Chinese characters represent single elements of
meaning; also very roughly, they represent single syllables of
spoken Chinese. There is in principle no limit to the number of
Chinese characters that can exist, and many thousands are in
regular use.
It is hopeless for Lojban, with its limited lerfu and shift
words, to create an alphabet which will match this diversity.
However, there are various possible ways around the problem.
First, both Chinese and Japanese have standard
Latin-alphabet representations, known as \q{pinyin} for Chinese
and \q{romaji} for Japanese, and these can be used. Thus, the
word \q{han$^4$zi$^4$} is conventionally
written with two characters, but it may be spelled out as:
.y'y.bu .abu ny. vo zy. .ibu vo\n
\q{h} \q{a} \q{n} 4 \q{z} \q{i} 4
The cmavo \q{vo} is the Lojban digit \q{4}. It is grammatical
to intersperse digits (of selma'o PA) into a string of lerfu
words; as long as the first cmavo is a lerfu word, the whole
will be interpreted as a string of lerfu words. In Chinese, the
digits can be used to represent tones. Pinyin is more usually
written using accent marks, the mechanism for which was
explained in \sectref{17.6}.
The Japanese company named \q{Mitsubishi} in English is
spelled the same way in romaji, and could be spelled out in
Lojban thus:
my. .ibu ty. sy. .ubu by. .ibu sy. .y'y.bu .ibu\n
\q{m} \q{i} \q{t} \q{s} \q{u} \q{b} \q{i} \q{s} \q{h} \q{i}
Alternatively, a really ambitious Lojbanist could assign lerfu
words to the individual strokes used to write Chinese
characters (there are about seven or eight of them if you are a
flexible human being, or about 40 if you are a rigid computer
program), and then represent each character with a \q{tei}, the
stroke lerfu words in the order of writing (which is
standardized for each character), and a \q{foi}. No one has as
yet attempted this project.
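Spelling romanized text with tone digits can be sketched the same way: letters become lerfu words and digits become cmavo of selma'o PA (\q{no} through \q{so} for 0 to 9). This sketch assumes plain pinyin or romaji with tone numbers rather than accent marks, and the function name is hypothetical. It rejects a string that starts with a digit, since the whole is read as a lerfu string only when the first cmavo is a lerfu word.

```python
DIGITS = dict(zip("0123456789",
                  ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]))
EXTRA = {"h": ".y'y.bu", "q": "ky.bu", "w": "vy.bu", "y": ".y.bu"}


def romanized_lerfu(text):
    """Spell romanized text (e.g. pinyin with tone numbers) as lerfu words and digits."""
    if text and text[0].isdigit():
        raise ValueError("a lerfu string must begin with a lerfu word, not a digit")
    words = []
    for ch in text.lower():
        if ch in DIGITS:
            words.append(DIGITS[ch])        # digit of selma'o PA, e.g. 4 -> vo
        elif ch in EXTRA:
            words.append(EXTRA[ch])         # h, q, w and y
        elif ch in "aeiou":
            words.append("." + ch + "bu")   # vowel lerfu word
        elif ch in "bcdfgjklmnprstvxz":
            words.append(ch + "y.")         # consonant lerfu word
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return " ".join(words)


print(romanized_lerfu("han4zi4"))  # .y'y.bu .abu ny. vo zy. .ibu vo
```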
\sect{lerfu words as pro-sumti}
So far, lerfu words have only appeared in Lojban text when
spelling out words. There are several other grammatical uses of
lerfu words within Lojban. In each case, a single lerfu word or
more than one may be used. Therefore, the term \q{lerfu string}
is introduced: it is short for \q{sequence of one or more lerfu words}.
A lerfu string may be used as a pro-sumti (a sumti which
refers to some previous sumti), just like the pro-sumti
\q{ko'a}, \q{ko'e}, and so on:
.abu prami by.\n
A loves B
In \exref{17.9.1}, \q{.abu} and \q{by.}
represent specific sumti, but which sumti they represent must
be inferred from context.
Alternatively, lerfu strings may be assigned by \q{goi}, the
regular pro-sumti assignment cmavo:
le gerku goi gy. cu xekri .i gy. klama le zdani\n
The dog, or G, is black. G goes to the house.
There is a special rule that sometimes makes lerfu strings more
advantageous than the regular pro-sumti cmavo. If no assignment
can be found for a lerfu string (especially a single lerfu
word), it can be assumed to refer to the most recent sumti
whose name or description begins in Lojban with that lerfu. So \exref{17.9.2} can be rephrased:
le gerku cu xekri. .i gy. klama le zdani\n
The dog is black. G goes to the house.
(A less literal English translation would use \q{D} for \q{dog}.)
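The fallback rule can be approximated in code, under simplifying assumptions that are not part of the rule itself: sumti are recorded as plain strings in the order they occur, a leading article such as \q{le} or \q{la} is skipped, only single-lerfu strings are handled, and explicit \q{goi} assignments are passed in separately and take precedence. All names in the sketch are hypothetical.

```python
# Articles to skip when looking at how a sumti begins (an illustrative subset).
ARTICLES = {"le", "lo", "la", "lei", "loi", "lai", "le'i", "lo'i", "la'i"}


def resolve_lerfu(letter, history, assignments=None):
    """Return the sumti a single-lerfu pro-sumti refers to, or None.

    An explicit assignment (as made with goi) wins; otherwise the most
    recent sumti whose name or description begins with the letter is used.
    """
    if assignments and letter in assignments:
        return assignments[letter]
    for sumti in reversed(history):                     # most recent first
        words = [w.strip(".") for w in sumti.split()]   # drop pause marks
        content = [w for w in words if w not in ARTICLES]
        if content and content[0].startswith(letter):
            return sumti
    return None                                         # left to context


# le gerku cu xekri. .i gy. klama le zdani
print(resolve_lerfu("g", ["le gerku"]))  # le gerku
```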
Here is an example using two names and longer lerfu strings:
la stivn. mark. djonz. merko\n
\T .i la .aleksandr. paliitc. kuzNIETsyf. rusko\n
\T .i symyjy. tavla .abupyky. bau la lojban.\n
Steven Mark Jones is-American.\n
\T Alexander Pavlovitch Kuznetsov is-Russian.\n
\T SMJ talks-to APK in Lojban.
Perhaps Alexander's name should be given as \q{ru'o.abupyky}, with the Cyrillic shift \q{ru'o}, since his name is natively written in the Cyrillic alphabet.
What about the following?
.abu dunda by. cy.\n
A gives B C
Does this mean that A gives B to C? No. \q{by. cy.} is a single
lerfu string, although written as two words, and represents a
single pro-sumti. The true interpretation is that A gives BC to
someone unspecified. To solve this problem, we need to
introduce the elidable terminator \q{boi} (of selma'o BOI).
This cmavo is used to terminate lerfu strings and also strings
of numerals; it is required when two of these appear in a row,
as here. (The other reason to use \q{boi} is to attach a free
modifier --- subscript, parenthesis, or what have you --- to a
lerfu string.) The correct version is:
.abu \optional{boi} dunda by. boi cy. [boi]\n
A gives B to C
{\noindent}where the two occurrences of \q{boi} in brackets are elidable,
but the remaining occurrence is not. Likewise:
xy. boi ro \optional{boi} prenu cu prami\n
X all persons loves.\n
X loves everybody.
{\noindent}requires the first \q{boi} to separate the lerfu string \q{xy.}
from the digit string \q{ro}.
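A simplified grouping pass shows why \q{boi} is needed: lerfu words are collected greedily into one lerfu string, digits continue a string that is already open, and only \q{boi} or some other word closes it. The sketch below is hypothetical and recognizes only basic lerfu words and a handful of PA cmavo; it is not a full Lojban parser.

```python
import re

# Basic lerfu words: consonant + y, vowel + bu, or y'y (pause marks optional).
LERFU = re.compile(r"^\.?(?:[bcdfgjklmnprstvxz]y|[aeiouy]\.?bu|y'y)\.?$")
DIGITS = {"no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so", "ro"}


def group(words):
    """Split a word list into items; each lerfu string becomes one list item."""
    items, current = [], None
    for w in words:
        if LERFU.match(w) or (current is not None and w in DIGITS):
            if current is None:
                current = []
            current.append(w)        # start or extend the open lerfu string
            continue
        if current is not None:
            items.append(current)    # any other word closes the string
            current = None
        if w != "boi":               # boi only terminates; it is not kept
            items.append(w)
    if current is not None:
        items.append(current)
    return items


print(group(".abu dunda by. cy.".split()))
# [['.abu'], 'dunda', ['by.', 'cy.']]
print(group(".abu dunda by. boi cy.".split()))
# [['.abu'], 'dunda', ['by.'], ['cy.']]
```

Without \q{boi}, \q{by. cy.} comes out as a single item, which is exactly the unwanted reading \q{BC}.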
\sect{References to lerfu}
The rules of \sectref{17.9} make it impossible
to use unmarked lerfu words to refer to lerfu themselves. In
the sentence:
.abu. cu lerfu\n
A is-a-letteral.
{\noindent}the hearer would try to find what previous sumti \q{.abu}
refers to. The solution to this problem makes use of the cmavo
\q{me'o} of selma'o LI, which makes a lerfu string into a sumti
representing that very string of lerfu. This use of \q{me'o} is
a special case of its mathematical use, which is to introduce a
mathematical expression used literally rather than for its value:
me'o .abu cu lerfu\n
the-expression \q{a} is-a-letteral.
Now we can translate \exref{17.1.1} into Lojban:
dei vasru vo lerfu\n
\T po'u me'o .ebu\n
this-sentence contains four letterals\n
\T which-are the-expression \q{e}.\n
This sentence contains four \q{e}s.
Since the Lojban sentence has only four \q{e} lerfu rather
than fourteen, the translation is not a literal one --- but \exref{17.10.4} is a Lojban truth just as \exref{17.1.1} is an English truth.
Coincidentally, the colloquial English translation of \exref{17.10.4} is also true!
The reader might be tempted to use quotation with ``lu ...
li'u'' instead of \q{me'o}, producing:
lu .abu li'u cu lerfu\n
\optional{quote} .abu [unquote] is-a-letteral.
(The single-word quote \q{zo} cannot be used, because \q{.abu}
is a compound cmavo.) But \exref{17.10.5} is
false, because it says:
The word \q{.abu} is a letteral
{\noindent}which is not the case; rather, the thing symbolized by the word
\q{.abu} is a letteral. In Lojban, that would be:
la'e lu .abu li'u cu lerfu\n
The-referent-of \optional{quote} .abu [unquote] is-a-letteral.
{\noindent}which is correct.
\sect{Mathematical uses of lerfu strings}
This chapter is not about Lojban mathematics, which is
explained in \chapref{18}, so the
mathematical uses of lerfu strings will be listed and
exemplified but not explained.
A lerfu string as mathematical variable:
li .abu du li by. su'i cy.\n
the-number a equals the-number b plus c\n
a = b + c
A lerfu string as function name (preceded by \q{ma'o} of
selma'o MAhO):
li .y.bu du li ma'o fy. boi xy.\n
the-number y equals the-number the-function f of x\n
y = f(x)
Note the \q{boi} here to separate the lerfu strings \q{fy}
and \q{xy}.
A lerfu string as selbri (followed by a cmavo of selma'o MOI):
le vi ratcu cu ny.moi le'i mi ratcu\n
the here rat is-nth-of the-set-of my rats\n
This rat is my Nth rat.
A lerfu string as utterance ordinal (followed by a cmavo of
selma'o MAI):
ny.mai\n
Nthly
A lerfu string as subscript (preceded by \q{xi} of selma'o XI):
xy. xi ky.\n
x sub k
A lerfu string as quantifier (enclosed in \q{vei ... ve'o}):
vei ny. \optional{ve'o} lo prenu\n
( \q{n} ) persons
The parentheses are required because \q{ny. lo prenu} would be
two separate sumti, \q{ny.} and \q{lo prenu}. In general, any
mathematical expression other than a simple number must be in
parentheses when used as a quantifier; the right parenthesis
mark, the cmavo \q{ve'o}, can usually be elided.
All the examples above have exhibited single lerfu words
rather than lerfu strings, in accordance with the conventions
of ordinary mathematics. A longer lerfu string would still be
treated as a single variable or function name: in Lojban,
\q{.abu by. cy.} is not the multiplication \q{a x b x c} but is
the variable \q{abc}. (Of course, a local convention could
exist that made the value of a variable like \q{abc}, with a
multi-lerfu-word name, equal to the values of the variables
\q{a}, \q{b}, and \q{c} multiplied together.)
There is a special rule about shift words in mathematical
text: shifts within mathematical expressions do not affect
lerfu words appearing outside mathematical expressions, and vice versa.
\sect{Acronyms}
An acronym is a name constructed of lerfu. English examples
are \q{DNA}, \q{NATO}, \q{CIA}. In English, some of these are
spelled out (like \q{DNA} and \q{CIA}) and others are
pronounced more or less as if they were ordinary English words
(like \q{NATO}). Some acronyms fluctuate between the two
pronunciations: \q{SQL} may be \q{ess cue ell} or \q{sequel}.
In Lojban, a name can be almost any sequence of sounds that
ends in a consonant and is followed by a pause. The easiest way
to Lojbanize acronym names is to glue the lerfu words together,
using \q{'} wherever two vowels would come together (pauses are
illegal in names) and adding a final consonant:
la dyny'abub. .i la ny'abuty'obub.\n
.i la cy'ibu'abub. .i la sykybulyl.\n
.i la .ibubymym. .i la ny'ybucyc.\n
DNA. NATO. CIA. SQL. IBM. NYC.
There is no fixed convention for assigning the final consonant.
In \exref{17.12.1}, the last consonant of the
lerfu string has been replicated into final position.
Some compression can be done by leaving out \q{bu} after
vowel lerfu words (except for \q{.y.bu}, wherein the \q{bu}
cannot be omitted without ambiguity). Compression is moderately
important because it's hard to say long names without
introducing an involuntary (and illegal) pause:
la dyny'am. .i la ny'aty'om.\n
.i la cy'i'am. .i la sykybulym.\n
.i la .ibymym. .i la ny'ybucym.\n
DNA. NATO. CIA. SQL. IBM. NYC.
In \exref{17.12.2}, the final consonant
\q{m} stands for \q{merko}, indicating the source culture of
these acronyms.
Another approach, which some may find easier to say and
which is compatible with older versions of the language that
did not have a \q{'} character, is to use the consonant \q{z}
instead of \q{'}:
la dynyzaz. .i la nyzatyzoz.\n
.i la cyzizaz. .i la sykybulyz.\n
.i la .ibymyz. .i la nyzybucyz.\n
DNA. NATO. CIA. SQL. IBM. NYC.
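All three naming styles follow one procedure: remove the pauses, optionally drop \q{bu} after vowel lerfu words other than \q{.y.bu}, insert a glue character (\q{'} or \q{z}) wherever two vowels would meet, and append a final consonant. The sketch below encodes that procedure. Its name and parameters are hypothetical, and the default final consonant (a copy of the last consonant of the glued string) is just the convention used in the first set of examples, since no fixed convention exists.

```python
VOWELS = set("aeiouy")


def acronym_name(lerfu_words, glue="'", compress=False, final=None):
    """Glue lerfu words into a Lojbanized acronym name (without pauses or la).

    glue:     inserted wherever one vowel would meet another ("'" or "z")
    compress: drop "bu" after vowel lerfu words, except after y
    final:    closing consonant; by default, repeat the last consonant
    """
    parts = []
    for word in lerfu_words:
        w = word.replace(".", "")                   # pauses are illegal in names
        if compress and len(w) == 3 and w.endswith("bu") and w[0] in "aeiou":
            w = w[0]                                # .abu -> a, but .y.bu stays
        if parts and parts[-1][-1] in VOWELS and w[0] in VOWELS:
            w = glue + w                            # keep the two vowels apart
        parts.append(w)
    name = "".join(parts)
    if final is None:
        final = next(c for c in reversed(name) if c.isalpha() and c not in VOWELS)
    return name + final


DNA = ["dy.", "ny.", ".abu"]
print(acronym_name(DNA))                                      # dyny'abub
print(acronym_name(DNA, compress=True, final="m"))            # dyny'am
print(acronym_name(DNA, glue="z", compress=True, final="z"))  # dynyzaz
```

The sketch returns the bare name; when writing it, add the pauses yourself, including the leading pause for names that begin with a vowel, as in \q{.ibymym.}.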
One more alternative to these lengthy names is to use the lerfu
string itself prefixed with \q{me}, the cmavo that makes sumti
into selbri:
la me dy ny. .abu\n
that-named what-pertains-to \q{d} \q{n} \q{a}
This works because \q{la}, the cmavo that normally
introduces names used as sumti, may also be used before a
predicate to indicate that the predicate is a (meaningful) name:
la cribe cu ciska\n
that-named \q{Bear} writes\n
Bear is a writer
\exref{17.12.5} does not of course refer to a
bear (\q{le cribe} or \q{lo cribe}) but to something else,
probably a person, named \q{Bear}. Similarly, ``me dy ny.
.abu'' is a predicate which can be used as a name, producing a
kind of acronym which can have pauses between the individual
lerfu words.
\sect{Computerized character codes}
Since the first application of computers to non-numerical
information, character sets have existed, mapping numbers
(called \q{character codes}) into selected lerfu, digits, and
punctuation marks (collectively called \q{characters}).
Historically, these character sets have only covered the
English alphabet and a few selected punctuation marks.
International efforts are now underway to create a unified
character set that can represent essentially all the characters
in essentially all the world's writing systems. Lojban can take
advantage of these encoding schemes by using the cmavo \q{se'e}
(of selma'o BY). This cmavo is conventionally followed by digit
cmavo of selma'o PA representing the character code, and the
whole string indicates a single character in some computerized
character set:
me'o se'ecixa cu lerfu\n
\T la .asycy'i'is. loi merko rupnu\n
the-expression \optional{code} 36 is-a-letteral\n
\T in-set ASCII\n
\T for-the-mass-of American currency-units.\n
The character code 36 in ASCII represents\n
\T American dollars.\n
\q{\$} represents American dollars.
Understanding \exref{17.13.1} depends on
knowing the value in the ASCII character set (one of the
simplest and oldest) of the \q{\$} character.
Therefore, the \q{se'e} convention is only intelligible to
those who know the underlying character set. For precisely
specifying a particular character, however, it has the
advantages of unambiguity and (relative) cultural neutrality,
and therefore Lojban provides a means for those with access to
descriptions of such character sets to take advantage of them.
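The arithmetic behind \q{se'ecixa} is ordinary positional
notation: the character code is written out digit by digit, and
each digit is replaced by its PA cmavo (\q{ci} for 3, \q{xa} for
6). The following Python sketch uses a hypothetical helper name,
\texttt{se\_e\_string}, and relies on Python's built-in
\texttt{ord}, which returns the Unicode code point; for ASCII
characters such as \q{\$} this equals the ASCII code. It also
accepts base 16, using \q{dau} through \q{vai} for the digits A
to F.

```python
# Lojban digit cmavo (selma'o PA), indexed by value; dau..vai are A..F.
PA_DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so",
             "dau", "fei", "gai", "jau", "rei", "vai"]


def se_e_string(ch, base=10):
    """Spell one character's code as se'e + digit cmavo (hypothetical helper).

    Uses ord(), i.e. the Unicode code point, which matches ASCII below 128.
    """
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    code = ord(ch)
    digits = []
    while True:
        code, rem = divmod(code, base)   # peel off the lowest digit
        digits.append(PA_DIGITS[rem])
        if code == 0:
            break
    return "se'e" + "".join(reversed(digits))   # most significant digit first
```

Here \texttt{se\_e\_string("\$")} gives \q{se'ecixa}, and
\texttt{se\_e\_string(chr(0x262E), 16)} gives \q{se'erexarerei}.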
As another example, the Unicode character set (also known as
ISO 10646) represents the international symbol of peace, an
inverted trident in a circle, using the base-16 value 262E. In
a suitable context, a Lojbanist may say:
me'o se'erexarerei sinxa le ka panpi\n
the-expression \optional{code} 262E is-a-sign-of\n
\T the quality-of being-at-peace
When a \q{se'e} string appears in running discourse, some
metalinguistic convention must specify whether the number is
base 10 (as in the ASCII example) or some other base (as in the
base-16 Unicode example), and which character set is in use.
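The need for such a convention is easy to demonstrate by reading
one digit string in two bases. In the Python sketch below (helper
name hypothetical; Unicode, via Python's \texttt{chr}, is assumed
as the character set), the \texttt{base} parameter stands in for
the metalinguistic convention. The three-letter digit cmavo are
tried before the two-letter ones so that \q{rei} (E) is not
misread as \q{re} (2) followed by stray letters.

```python
# Map each digit cmavo (selma'o PA) to its value; dau..vai are A..F.
PA_VALUES = {word: value for value, word in enumerate(
    ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so",
     "dau", "fei", "gai", "jau", "rei", "vai"])}


def decode_se_e(text, base=10):
    """Read a se'e digit string back as a character (hypothetical helper)."""
    if not text.startswith("se'e"):
        raise ValueError("expected a string starting with se'e")
    rest, value = text[len("se'e"):], 0
    if not rest:
        raise ValueError("no digits after se'e")
    while rest:
        # Longest match first: "rei" must win over "re".
        word = next((w for w in (rest[:3], rest[:2]) if w in PA_VALUES), None)
        if word is None:
            raise ValueError(f"unrecognized digit cmavo at {rest!r}")
        digit = PA_VALUES[word]
        if digit >= base:
            raise ValueError(f"{word!r} is not a digit in base {base}")
        value = value * base + digit
        rest = rest[len(word):]
    return chr(value)
```

Thus \q{se'ecixa} decodes to \q{\$} (code 36) in base 10 but to
the digit \q{6} (code 54) in base 16, and \q{se'erexarerei} is
rejected outright in base 10 because \q{rei} is not a decimal
digit.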
\sect{List of all auxiliary lerfu-word cmavo}
\begin{tabular}{lll}
cmavo & selma'o & meaning \\
\hline
bu & BU & makes previous word into a lerfu word \\
ga'e & BY & upper case shift \\
to'a & BY & lower case shift \\
tau & LAU & case-shift next lerfu word only \\
lo'a & BY & Latin/Lojban alphabet shift \\
ge'o & BY & Greek alphabet shift \\
je'o & BY & Hebrew alphabet shift \\
jo'o & BY & Arabic alphabet shift \\
ru'o & BY & Cyrillic alphabet shift \\
se'e & BY & following digits are a character code \\
na'a & BY & cancel all shifts \\
zai & LAU & following lerfu word specifies alphabet \\
ce'a & LAU & following lerfu word specifies font \\
lau & LAU & following lerfu word is punctuation \\
tei & TEI & start compound lerfu word \\
foi & FOI & end compound lerfu word \\
\end{tabular}
Note that a LAU cmavo must be followed by a BY cmavo or its
equivalent, where \q{equivalent} means any Lojban word
followed by \q{bu}, another LAU cmavo (together with its own
required sequel), or a compound lerfu word of the form
\q{tei ... foi}.
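Read as a grammar, this rule is recursive: a LAU cmavo needs a
lerfu word after it, and that lerfu word may itself begin with
another LAU cmavo. The Python sketch below checks a list of words
against the rule. It is illustrative only: the input is assumed
to be already split into words, with \q{bu} as a separate word
and pauses ignored; the BY set is a small assumed subset (the
consonant lerfu words and the shift words from the table above,
leaving out \q{se'e}, which takes digits); and the function names
are hypothetical.

```python
# Assumed, deliberately small subset of selma'o BY: consonant lerfu words
# plus the shift words from the table. se'e (which takes digits) is omitted.
BY = {c + "y" for c in "bcdfgjklmnprstvxz"} | {
    "ga'e", "to'a", "lo'a", "ge'o", "je'o", "jo'o", "ru'o", "na'a",
}
LAU = {"tau", "zai", "ce'a", "lau"}


def parse_lerfu(tokens, i):
    """Consume one lerfu word starting at tokens[i]; return the next index.

    Raises ValueError if no lerfu word starts at position i.
    """
    if i >= len(tokens):
        raise ValueError("expected a lerfu word, found end of input")
    tok = tokens[i]
    if i + 1 < len(tokens) and tokens[i + 1] == "bu":
        return i + 2                        # any word followed by bu
    if tok in BY:
        return i + 1                        # a BY cmavo
    if tok in LAU:
        return parse_lerfu(tokens, i + 1)   # LAU plus its required sequel
    if tok == "tei":
        j = i + 1
        while j < len(tokens) and tokens[j] != "foi":
            j = parse_lerfu(tokens, j)      # body of the compound
        if j >= len(tokens):
            raise ValueError("tei without a closing foi")
        return j + 1                        # skip the foi
    raise ValueError(f"{tok!r} cannot start a lerfu word")


def is_lerfu_string(tokens):
    """True if the whole token list splits into consecutive lerfu words."""
    i = 0
    try:
        while i < len(tokens):
            i = parse_lerfu(tokens, i)
    except ValueError:
        return False
    return True
```

For instance, \texttt{["tau", "py"]} and
\texttt{["lau", "tei", "by", "cy", "foi"]} are accepted, while a
bare \texttt{["tau"]} or \texttt{["tau", "xutla"]} (missing
\q{bu}) is rejected.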
\sect{Proposed lerfu words --- introduction}
The following sections contain tables of proposed lerfu
words for some of the standard alphabets supported by the
Lojban lerfu system. The first column of each list is the lerfu
(actually, a Latin-alphabet name sufficient to identify it).
The second column is the proposed name-based lerfu word, and
the third column is the proposed lerfu word in the system based
on using the cmavo of selma'o BY with a shift word.
These tables are not meant to be authoritative (several
authorities within the Lojban community have niggled over them
extensively, disagreeing with each other and sometimes with
themselves). They provide a working basis until actual usage is
available, rather than a final resolution of lerfu word
problems. Probably the system presented here will evolve
somewhat before settling down into a final, conventional form.
For Latin-alphabet lerfu words, see \sectref{17.2}
(for Lojban) and \sectref{17.5} (for
non-Lojban Latin-alphabet lerfu).
\sect{Proposed lerfu words for the Greek alphabet}
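\begin{tabular}{lll}
lerfu & name-based word & BY-based word \\
\hline
alpha & .alfas. bu & .abu \\
beta & .betas. bu & by \\
gamma & .gamas. bu & gy \\
delta & .deltas. bu & dy \\
epsilon & .Epsilon. bu & .ebu \\
zeta & .zetas. bu & zy \\
eta & .etas. bu & .e'ebu \\
theta & .tetas. bu & ty. bu \\
iota & .iotas. bu & .ibu \\
kappa & .kapas. bu & ky \\
lambda & .lymdas. bu & ly \\
mu & .mus. bu & my \\
nu & .nus. bu & ny \\
xi & .ksis. bu & ksis. bu \\
omicron & .Omikron. bu & .obu \\
pi & .pis. bu & py \\
rho & .ros. bu & ry \\
sigma & .sigmas. bu & sy \\
tau & .taus. bu & ty \\
upsilon & .Upsilon. bu & .ubu \\
phi & .fis. bu & py. bu \\
chi & .xis. bu & ky. bu \\
psi & .psis. bu & psis. bu \\
omega & .omegas. bu & .o'obu \\
rough breathing & .dasei,as. bu & .y'y \\
smooth breathing & .psiles. bu & xutla bu \\
\end{tabular}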
\sect{Proposed lerfu words for the Cyrillic alphabet}
The second column in this listing is based on the historical
names of the letters in Old Church Slavonic. Only the letters
used in Russian are shown; other languages written in Cyrillic
require more letters, whose lerfu words can be devised as needed.
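\begin{tabular}{lll}
lerfu & name-based word & BY-based word \\
\hline
a & .azys. bu & .abu \\
b & .bukys. bu & by \\
v & .vedis. bu & vy \\
g & .glagolis. bu & gy \\
d & .dobros. bu & dy \\
e & .iestys. bu & .ebu \\
zh & .jivet. bu & jy \\
z & .zemlias. bu & zy \\
i & .ije,is. bu & .ibu \\
short i & .itord. bu & \\
k & .kakos. bu & ky \\
l & .liudi,ies. bu & ly \\
m & .myslites. bu & my \\
n & .naciys. bu & ny \\
o & .onys. bu & .obu \\
p & .pokois. bu & py \\
r & .riytsis. bu & ry \\
s & .slovos. bu & sy \\
t & .tvriydos. bu & ty \\
u & .ukys. bu & .ubu \\
f & .friytys. bu & fy \\
kh & .xerys. bu & xy \\
ts & .tsis. bu & tsys. bu \\
ch & .tcriyviys. bu & tcys. bu \\
sh & .cas. bu & cy \\
shch & .ctas. bu & ctcys. bu \\
hard sign & .ier. bu & jdari bu \\
yeri & .ierys. bu & .y.bu \\
soft sign & .ieriys. bu & ranti bu \\
reversed e & .ecarn. bu & \\
yu & .ius. bu & .iubu \\
ya & .ias. bu & .iabu \\
\end{tabular}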
\sect{Proposed lerfu words for the Hebrew alphabet}
\begin{tabular}{lll}
lerfu & name-based word & BY-based word \\
\hline
aleph & .alef. bu & .alef. bu \\
bet & .bet. bu & by \\
gimel & .gimel. bu & gy \\
daled & .daled. bu & dy \\
he & .xex. bu & .y'y \\
vav & .vav. bu & vy \\
zayin & .zai,in. bu & zy \\
khet & .xet. bu & xy. bu \\
tet & .tet. bu & ty. bu \\
yud & .iud. bu & .iud. bu \\
kaf & .kaf. bu & ky \\
lamed & .LYmed. bu & ly \\
mem & .mem. bu & my \\
nun & .nun. bu & ny \\
samekh & .samex. bu & samex. bu \\
ayin & .ai,in. bu & .ai,in. bu \\
pe & .pex. bu & py \\
tzadi & .tsadik. bu & tsadik. bu \\
quf & .kuf. bu & ky. bu \\
resh & .rec. bu & ry \\
shin & .cin. bu & cy \\
sin & .sin. bu & sy \\
taf & .taf. bu & ty. \\
dagesh & .daGEC. bu & daGEC. bu \\
hiriq & .xirik. bu & .ibu \\
tzeirekh & .tseirex. bu & .eibu \\
segol & .seGOL. bu & .ebu \\
qubbutz & .kubuts. bu & .ubu \\
qamatz & .kamats. bu & .abu \\
patach & .patax. bu & .a'abu \\
sheva & .cyVAS. bu & .y.bu \\
kholem & .xolem. bu & .obu \\
shuruq & .curuk. bu & .u'ubu \\
\end{tabular}
\sect{Proposed lerfu words for some accent marks and multiple lerfu}
This list is intended to be suggestive, not complete: there
are lerfu, such as Polish \q{dark} l and Maltese h-bar, that do
not yet have lerfu words.
\begin{tabular}{ll}
lerfu & proposed lerfu word \\
\hline
acute & .akut. bu or .pritygal. bu [pritu galtu] \\
grave & .grav. bu or .zulgal. bu [zunle galtu] \\
circumflex & .cirkumfleks. bu or .midgal. bu [midju galtu] \\
tilde & .tildes. bu \\
macron & .makron. bu \\
breve & .brevis. bu \\
over-dot & .garmoc. bu [gapru mokca] \\
umlaut/trema & .relmoc. bu [re mokca] \\
over-ring & .garjin. bu [gapru djine] \\
cedilla & .seDIlys. bu \\
double-acute & .re'akut. bu [re akut.] \\
ogonek & .ogoniek. bu \\
hacek & .xatcek. bu \\
ligatured fi & tei fy. .ibu foi \\
Danish/Latin ae & tei .abu .ebu foi \\
Dutch ij & tei .ibu jy. foi \\
German es-zed & tei sy. zy. foi \\
\end{tabular}
\sect{Proposed lerfu words for radio communication}
There is a set of English words which are used, by
international agreement, as lerfu words (for the English
alphabet) over the radio, or in noisy situations where the
utmost clarity is required. Formally they are known as the
\q{ICAO Phonetic Alphabet}, and are used even in
non-English-speaking countries.
This table presents the standard English spellings and
proposed Lojban versions. The Lojbanizations are not
straightforward renderings of the English sounds, but make some
concessions both to the English spellings of the words and to
the Lojban pronunciations of the lerfu (thus \q{carlis. bu},
not \q{tcarlis. bu}).
\begin{tabular}{ll}
English & Lojban \\
\hline
Alfa & .alfas. bu \\
Bravo & .bravos. bu \\
Charlie & .carlis. bu \\
Delta & .deltas. bu \\
Echo & .ekos. bu \\
Foxtrot & .fokstrot. bu \\
Golf & .golf. bu \\
Hotel & .xoTEL. bu \\
India & .indias. bu \\
Juliett & .juliet. bu \\
Kilo & .kilos. bu \\
Lima & .limas. bu \\
Mike & .maik. bu \\
November & .novembr. bu \\
Oscar & .oskar. bu \\
Papa & .paPAS. bu \\
Quebec & .keBEK. bu \\
Romeo & .romios. bu \\
Sierra & .sieras. bu \\
Tango & .tangos. bu \\
Uniform & .Uniform. bu \\
Victor & .viktas. bu \\
Whiskey & .uiskis. bu \\
X-ray & .eksreis. bu \\
Yankee & .iankis. bu \\
Zulu & .zulus. bu \\
\end{tabular}
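For illustration, the list can be used directly to spell out a
word letter by letter. The Python sketch below copies the
proposed Lojban forms into a dictionary; the function name
\texttt{spell\_for\_radio} is hypothetical, and the words are
simply joined with spaces, without modeling pauses or stress
beyond what the spellings already mark.

```python
# Proposed Lojbanized ICAO words, copied from the table above.
ICAO_LOJBAN = {
    "a": ".alfas. bu", "b": ".bravos. bu", "c": ".carlis. bu",
    "d": ".deltas. bu", "e": ".ekos. bu", "f": ".fokstrot. bu",
    "g": ".golf. bu", "h": ".xoTEL. bu", "i": ".indias. bu",
    "j": ".juliet. bu", "k": ".kilos. bu", "l": ".limas. bu",
    "m": ".maik. bu", "n": ".novembr. bu", "o": ".oskar. bu",
    "p": ".paPAS. bu", "q": ".keBEK. bu", "r": ".romios. bu",
    "s": ".sieras. bu", "t": ".tangos. bu", "u": ".Uniform. bu",
    "v": ".viktas. bu", "w": ".uiskis. bu", "x": ".eksreis. bu",
    "y": ".iankis. bu", "z": ".zulus. bu",
}


def spell_for_radio(text):
    """Spell an English-alphabet string with the radio words (hypothetical).

    Raises KeyError for any character that is not an English letter.
    """
    return " ".join(ICAO_LOJBAN[ch] for ch in text.lower())
```

So \texttt{spell\_for\_radio("SOS")} produces
\q{.sieras. bu .oskar. bu .sieras. bu}.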