The following examples show how to convert between different representations of Pinyin.
Create the Converter and convert from standard Pinyin to Pinyin with tones represented by numbers:
>>> from cjklib.reading import * >>> targetOp = operator.PinyinOperator(toneMarkType='numbers') >>> pinyinConv = converter.PinyinDialectConverter( ... targetOperators=[targetOp]) >>> pinyinConv.convert(u'hànzì', 'Pinyin', 'Pinyin') u'han4zi4'
Convert Pinyin written with numbers, the ü (u with umlaut) replaced by character v and omitted fifth tone to standard Pinyin:
>>> sourceOp = operator.PinyinOperator(toneMarkType='numbers', ... yVowel='v', missingToneMark='fifth') >>> pinyinConv = converter.PinyinDialectConverter( ... sourceOperators=[sourceOp]) >>> pinyinConv.convert('nv3hai2zi', 'Pinyin', 'Pinyin') u'nǚháizi'
Or more elegantly:
>>> f = ReadingFactory() >>> f.convert('nv3hai2zi', 'Pinyin', 'Pinyin', ... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'v', ... 'missingToneMark': 'fifth'}) u'nǚháizi'
Decompose the reading of a dictionary entry from CEDICT into syllables and convert the ü-vowel and forms of Erhua sound:
>>> pinyinFrom = operator.PinyinOperator(toneMarkType='numbers', ... yVowel='u:', Erhua='oneSyllable') >>> syllables = pinyinFrom.decompose('sun1nu:r3') >>> print syllables ['sun1', 'nu:r3'] >>> pinyinTo = operator.PinyinOperator(toneMarkType='numbers', ... Erhua='twoSyllables') >>> pinyinConv = converter.PinyinDialectConverter( ... sourceOperators=[pinyinFrom], targetOperators=[pinyinTo]) >>> pinyinConv.convertEntities(syllables, 'Pinyin', 'Pinyin') [u'sun1', u'nü3', u'r5']
Or more elegantly with entities already decomposed:
>>> f.convertEntities(['sun1', 'nu:r3'], 'Pinyin', 'Pinyin', ... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'u:', ... 'Erhua': 'oneSyllable'}, ... targetOptions={'toneMarkType': 'numbers', ... 'Erhua': 'twoSyllables'}) [u'sun1', u'nü3', u'r5']
Fix cosmetic errors in Pinyin input (note tone mark and apostrophe):
>>> f.convert(u"Wǒ peí nǐ qù Xīān.", 'Pinyin', 'Pinyin') u"Wǒ péi nǐ qù Xī'ān."
Fix more errors in Pinyin input (note diacritics):
>>> string = u"Wŏ peí nĭ qù Xīān." >>> dialect = operator.PinyinOperator.guessReadingDialect(string) >>> f.convert(string, 'Pinyin', 'Pinyin', sourceOptions=dialect) u"Wǒ péi nǐ qù Xī'ān."
.. currentmodule:: cjklib.reading.converter
.. autoclass:: cjklib.reading.converter.PinyinDialectConverter :show-inheritance: :members: :undoc-members: