joniumGit/languager

Languager

ISO639 Language Service

This module identifies languages from ISO639-1 and ISO639-3 codes or ISO names and provides a convenient class for accessing related attributes. It is also possible to look up codes for languages based on their ISO names. Name-based lookup is slower, however, because the input is compared against every language name in lowercase.

Basic flows:

  • Code is 2 characters
    • Lookup long code
    • Lookup data
  • Code is 3 characters
    • Lookup data
  • Code is something else
    • Loop all language names to check for match
      • input_language == language_name.lower()
    • Lookup data

This means that name-based lookup is roughly n times slower than the other two options, where n is the number of language names. In practice this should rarely make a difference.
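The flows above can be sketched roughly as follows. This is an illustration only, not the generated module's code: `SHORT_TO_LONG`, `NAMES`, and `resolve_code` are hypothetical names, the tables hold just two entries, and lowercasing the input before comparison is an assumption.

```python
from typing import Optional

# Hypothetical miniature tables; the real module generates its tables from the SIL data.
SHORT_TO_LONG = {"zh": "zho", "fi": "fin"}
NAMES = {"zho": "Chinese", "fin": "Finnish"}


def resolve_code(value: str) -> Optional[str]:
    """Return the three-letter code for a code or name, or None if unknown."""
    if len(value) == 2:
        # Two characters: translate the short code to the long code.
        return SHORT_TO_LONG.get(value)
    if len(value) == 3:
        # Three characters: already a long code, only check that it is known.
        return value if value in NAMES else None
    # Anything else: scan every language name, compared in lowercase.
    wanted = value.lower()  # assumption: the input is lowercased as well
    for code, name in NAMES.items():
        if wanted == name.lower():
            return code
    return None
```

The first two branches are single dictionary lookups, while the last one walks every name, which is where the extra cost of name-based lookup comes from.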

Data

The data is taken from iso639-3.sil.org and is stored in the tables folder. Future releases will update these tables.

Code

The code in generator.py generates a single Python file that contains all lookup tables and methods.

Language

  • code: The ISO639-3 Code
  • short: ISO639-1 Code if available
  • deprecated: True if the definition is deprecated
  • macro: True if this language is in a macrolanguage group
  • parent: The parent macrolanguage
  • macros: Any languages belonging to this macrolanguage
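To make the attribute list concrete, here is a minimal stand-in for such a class. `LanguageSketch` is a hypothetical name, and the field types and defaults are assumptions; the class produced by generator.py may store these values differently (for example, as objects rather than code strings).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LanguageSketch:
    """Hypothetical stand-in mirroring the attributes listed above."""
    code: str                      # ISO639-3 code
    short: Optional[str] = None    # ISO639-1 code, if available
    deprecated: bool = False       # True if the definition is deprecated
    macro: bool = False            # True if in a macrolanguage group
    parent: Optional[str] = None   # code of the parent macrolanguage (assumed to be a string)
    macros: Tuple[str, ...] = ()   # codes of the member languages (assumed to be strings)


# Mandarin Chinese belongs to the Chinese macrolanguage.
cmn = LanguageSketch(code="cmn", macro=True, parent="zho")
# Only two member languages are listed here, for brevity.
zho = LanguageSketch(code="zho", short="zh", macros=("cmn", "yue"))
```

With this shape, `cmn.parent` points back to `zho.code`, and iterating over `zho.macros` yields the member languages.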

Examples

Checking the macrolanguages for Chinese:

from languager import get_language

lang = get_language('zho')
# lang = get_language('zh')
# lang = get_language('Chinese')
# lang = get_language('does not exist', default='zho')

for language in lang.macros:
    print(language)

# czo
# csp
# yue
# cnp
# cmn
# czh
# hak
# nan
# wuu
# cjy
# lzh
# gan
# mnp
# cpx
# hsn
# cdo