🌐 iso·gloss

ISO 639 and IETF Language Code Lookup Tool

isogloss is a Python-based command-line tool for looking up language details from ISO 639 codes and IETF (BCP-47) language tags. It reports each language's names, native names, and the other details associated with a given code or tag.

There is also a web-based version. Its BCP47 parser has some known issues, documented below in the "Errata" section.

Elsewhere, the word isogloss means a boundary line on a map denoting the regional use of a particular linguistic characteristic, but in this case it just seemed to fit.

Features

Look up language details using ISO 639-1, 639-2/B, 639-2/T, or 639-3 codes.
Look up language details by language name.
Look up language details using IETF BCP-47 language tags.
- Examples: en-GB, en-US, sv-SE, zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1, and so on.

Installation

Clone the repository to your local machine:

git clone https://github.com/thunderpoot/isogloss.git

Create a virtual environment and install the requirements:

python3.11 -m venv venv
source venv/bin/activate
pip install unidecode

Usage

The script can be run directly from the command line. Below are some examples of how to use it:

To look up information by ISO 639 code:

$ isogloss/isogloss.py -c swe
{
  "639-1": "sv",
  "Scope": "Individual",
  "Type": "Living",
  "Native name(s)": "svenska",
  "Other name(s)": "",
  "639-2/T": "swe",
  "639-2/B": "",
  "639-3": "swe",
  "Name(s)": "Swedish"
}
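
Because the output is JSON, the tool can also be driven from another script. The following is a minimal sketch, not part of isogloss itself: it assumes the command is run from the repository root, that the script can be started with the current Python interpreter, and that the output is a single JSON object as shown above. The helper names `build_command`, `parse_output`, and `lookup_code` are hypothetical.

```python
import json
import subprocess
import sys

# Path taken from the usage examples; adjust if the repository lives elsewhere.
SCRIPT_PATH = "isogloss/isogloss.py"


def build_command(code: str, script: str = SCRIPT_PATH) -> list[str]:
    """Build the argument list for an ISO 639 code lookup (the -c flag)."""
    return [sys.executable, script, "-c", code]


def parse_output(stdout: str) -> dict:
    """Decode the JSON object that the tool prints."""
    return json.loads(stdout)


def lookup_code(code: str, script: str = SCRIPT_PATH) -> dict:
    """Run the lookup as a subprocess and return the decoded result."""
    completed = subprocess.run(
        build_command(code, script),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_output(completed.stdout)
```

With the repository checked out, `lookup_code("swe")["Name(s)"]` would be expected to give the value shown in the example output above.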

To look up information by language name:

$ isogloss/isogloss.py -n "egyptian arabic"
{
    "Egyptian Arabic": "arz"
}

Example of lookup via native name:

$ isogloss/isogloss.py -n 日本語
{
    "\u65e5\u672c\u8a9e Nihongo": "jpn"
}
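
The `\uXXXX` sequences in this output are ordinary JSON escapes for non-ASCII characters, so any JSON parser restores the native name. A small sketch using the output above as a literal string (the variable names are illustrative only):

```python
import json

# The tool's output from the example above, kept as raw JSON text.
raw_output = '{"\\u65e5\\u672c\\u8a9e Nihongo": "jpn"}'

result = json.loads(raw_output)
native_name = next(iter(result))  # the only key in the object

print(native_name)                            # 日本語 Nihongo
print(json.dumps(result, ensure_ascii=False))  # {"日本語 Nihongo": "jpn"}
```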

Example of multiple results being found:

$ isogloss/isogloss.py -n norwegian
{
    "Norwegian Nynorsk": "nno",
    "Nynorsk, Norwegian": "nno",
    "Bokm\u00e5l, Norwegian": "nob",
    "Norwegian Bokm\u00e5l": "nob",
    "Norwegian": "nor",
    "Norwegian Sign Language": "nsl",
    "Traveller Norwegian": "rmg"
}

Language names are normalised, allowing for case-insensitive and accent-insensitive matching when searching:

$ isogloss/isogloss.py -n espanol
{
    "Judeo-espa\u00f1ol": "lad",
    "espa\u00f1ol": "spa"
}
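
One way this kind of matching can work is sketched below. This is an illustration under stated assumptions, not the tool's actual code: the installation step suggests isogloss relies on `unidecode`, whereas this sketch uses only the standard library's `unicodedata`; substring matching is assumed from the example results; and `normalise`, `find_names`, and `SAMPLE_NAMES` are hypothetical names with a handful of entries copied from the outputs above.

```python
import unicodedata


def normalise(text: str) -> str:
    """Strip combining accents and fold case so 'Español' matches 'espanol'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def find_names(query: str, names: dict[str, str]) -> dict[str, str]:
    """Return every name whose normalised form contains the normalised query."""
    needle = normalise(query)
    return {name: code for name, code in names.items() if needle in normalise(name)}


# A tiny sample mapping of names to ISO 639-3 codes, taken from the examples.
SAMPLE_NAMES = {
    "Judeo-español": "lad",
    "español": "spa",
    "Norwegian Bokmål": "nob",
    "Swedish": "swe",
}

print(find_names("ESPANOL", SAMPLE_NAMES))  # {'Judeo-español': 'lad', 'español': 'spa'}
```

Decomposing to NFKD before dropping combining marks handles Latin accents; scripts that need transliteration rather than accent stripping are outside what this sketch covers.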

To look up information by IETF language tag:

$ isogloss/isogloss.py -i fr-FR
{
    "Language": {
        "639-1": "fr",
        "Scope": "Individual",
        "Type": "Living",
        "Native name(s)": "fran\u00e7ais",
        "Other name(s)": "",
        "639-2/T": "fra",
        "639-2/B": "fre",
        "639-3": "fra",
        "Name(s)": "French"
    },
    "Region": "France"
}

$ isogloss/isogloss.py -i zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1
{
    "Primary Language": {
        "639-1": "zh",
        "639-2/B": "chi",
        "639-2/T": "zho",
        "639-3": "zho",
        "Deprecated": false,
        "Name(s)": "Chinese",
        "Native name(s)": "\u4e2d\u6587 Zh\u014dngw\u00e9n; \u6c49\u8bed; \u6f22\u8a9e H\u00e0ny\u01d4",
        "Other name(s)": "",
        "Scope": "Macrolanguage",
        "Type": "Living"
    },
    "Extended Languages": [
        {
            "639-1": "",
            "639-2/B": "",
            "639-2/T": "",
            "639-3": "cmn",
            "Deprecated": false,
            "Name(s)": "Mandarin Chinese",
            "Native name(s)": "",
            "Other name(s)": "",
            "Scope": "Individual",
            "Type": "Living"
        }
    ],
    "Script": "Han (Simplified variant)",
    "Region": "China",
    "Variant": "pinyin",
    "Extension": "ud1-p9t4",
    "Private Use": "x-private1"
}

$ isogloss/isogloss.py -i ar-ajp-apc-apd-Arab-CV-arevela-g-231243-r-sdarre-x-private-x-private1 | jq
{
  "Primary Language": {
    "639-1": "ar",
    "639-2/B": "",
    "639-2/T": "ara",
    "639-3": "ara",
    "Deprecated": false,
    "Name(s)": "Arabic",
    "Native name(s)": "العربية; al'Arabiyyeẗ",
    "Other name(s)": "",
    "Scope": "Macrolanguage",
    "Type": "Living"
  },
  "Extended Languages": [
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "Deprecated": true,
      "Language Name(s)": "South Levantine Arabic",
      "Language Type": "Living",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual"
    },
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "639-3": "apc",
      "Deprecated": false,
      "Name(s)": "Levantine Arabic",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual",
      "Type": "Living"
    },
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "639-3": "apd",
      "Deprecated": false,
      "Name(s)": "Sudanese Arabic",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual",
      "Type": "Living"
    }
  ],
  "Script": "Arabic",
  "Region": "Cabo Verde",
  "Variant": "arevela",
  "Extension": "g-231243-r-sdarre",
  "Private Use": "x-private-x-private1"
}

Files

data/consolidated_langs.json: Contains language data in JSON format used for the lookup.
data/region_names.json: Contains region data in JSON format used for the BCP47 lookup.
data/script_codes.json: Contains script code data in JSON format used for the BCP47 lookup.
data/deprecated-639-3.csv: Contains deprecated ISO 639-3 codes in CSV format, for quick reference.

Errata

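There are known issues with the BCP47 parser in the web interface. It uses regular expressions to validate input, and the examples below show how it behaves on malformed tags and on well-formed tags that it parses incorrectly.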

Examples of invalid tags (malformed):

en-GB-oed-x-private
de-CH-1901-co-phonebk-sc-gothic-x-bavaria

(and more)

Examples of inputs that reveal parsing bugs:

ca-valencia-nedis (Highlighted input section is missing "valencia")
en-US-u-islamcal (Variant "u" and Extension "islamcal", Extension section says "u - islamcal")
es-419-fonipa (Extended languages blank)
de-Latf-1901 (Region undefined)
sl-rozaj (rozaj is coloured differently in the result container to how it is in the highlighted input section)
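
For comparison, the sketch below walks the subtags in the order given by the language-tag grammar (RFC 4646 and its successor RFC 5646) instead of validating with a single regular expression. It is a simplified illustration, not the parser used by isogloss or its web interface: it ignores grandfathered tags and private-use-only tags, does no registry lookups, and `parse_tag` is a hypothetical name.

```python
import re


def parse_tag(tag: str) -> dict:
    """Split a language tag into components, raising ValueError if malformed."""
    subtags = tag.split("-")
    parsed = {"language": "", "extlangs": [], "script": None, "region": None,
              "variants": [], "extensions": {}, "private_use": []}

    def matches(pattern: str, index: int) -> bool:
        return index < len(subtags) and re.fullmatch(pattern, subtags[index]) is not None

    # Primary language: 2 to 8 letters.
    if not matches(r"[A-Za-z]{2,8}", 0):
        raise ValueError(f"bad primary language in {tag!r}")
    parsed["language"] = subtags[0].lower()
    i = 1

    # Extended languages: up to three 3-letter subtags after a 2-3 letter language.
    if len(parsed["language"]) <= 3:
        while len(parsed["extlangs"]) < 3 and matches(r"[A-Za-z]{3}", i):
            parsed["extlangs"].append(subtags[i].lower())
            i += 1

    # Script: exactly four letters.
    if matches(r"[A-Za-z]{4}", i):
        parsed["script"] = subtags[i].title()
        i += 1

    # Region: two letters or three digits.
    if matches(r"[A-Za-z]{2}|[0-9]{3}", i):
        parsed["region"] = subtags[i].upper()
        i += 1

    # Variants: 5-8 alphanumerics, or a digit followed by three alphanumerics.
    while matches(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", i):
        parsed["variants"].append(subtags[i].lower())
        i += 1

    # Extensions: a singleton (any alphanumeric except x) plus 2-8 character subtags.
    while matches(r"[A-WY-Za-wy-z0-9]", i):
        singleton = subtags[i].lower()
        i += 1
        parts = []
        while matches(r"[A-Za-z0-9]{2,8}", i):
            parts.append(subtags[i].lower())
            i += 1
        if not parts or singleton in parsed["extensions"]:
            raise ValueError(f"bad extension {singleton!r} in {tag!r}")
        parsed["extensions"][singleton] = parts

    # Private use: "x" followed by one or more 1-8 character subtags, to the end.
    if i < len(subtags) and subtags[i].lower() == "x":
        rest = subtags[i + 1:]
        if not rest or not all(re.fullmatch(r"[A-Za-z0-9]{1,8}", s) for s in rest):
            raise ValueError(f"bad private-use section in {tag!r}")
        parsed["private_use"] = [s.lower() for s in rest]
        i = len(subtags)

    if i != len(subtags):
        raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
    return parsed
```

Under this sketch, `de-Latf-1901` yields a script and a variant with no region, `en-US-u-islamcal` yields an extension keyed by the singleton `u`, and the two malformed examples above raise `ValueError`.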

Contributing

Contributions, issues, and feature requests are welcome!

Author

Written by T E Vaughan

Sponsorship

If you find this project useful, please consider sponsoring my work. <3

Related Standards and RFCs

The codes used in this program conform to the following standards and RFCs:

Standards

ISO 639 Language codes
ISO 3166-1 alpha-2 Country codes
ISO 15924 Script codes

RFCs

RFC 1766 Tags for the Identification of Languages
RFC 4646 Tags for Identifying Languages
RFC 4647 Matching of Language Tags

License

This project is MIT licensed.
