isogloss is a Python-based command-line tool for looking up language details from ISO 639 codes and IETF (BCP-47) language tags. It reports each language's names, native names, and any additional details associated with the code or tag.
There is also a web-based version here. The BCP-47 parser has some known issues, documented below in the "Errata" section.
Elsewhere, the word isogloss means a boundary line on a map denoting the regional use of a particular linguistic characteristic, but in this case it just seemed to fit.
- Look up language details using ISO 639-1, 639-2/B, 639-2/T, or 639-3 codes.
- Look up language details by language name.
- Look up language details using IETF BCP-47 language tags.
  - Examples: en-GB, en-US, sv-SE, zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1, and so on.

Clone the repository to your local machine:
git clone https://github.com/thunderpoot/isogloss.git
Create a virtual environment and install the requirements:
python3.11 -m venv venv
source venv/bin/activate
pip install unidecode
The script can be run directly from the command line. Below are some examples of how to use it:
To look up information by ISO 639 code:
$ isogloss/isogloss.py -c swe
{
  "639-1": "sv",
  "Scope": "Individual",
  "Type": "Living",
  "Native name(s)": "svenska",
  "Other name(s)": "",
  "639-2/T": "swe",
  "639-2/B": "",
  "639-3": "swe",
  "Name(s)": "Swedish"
}
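A lookup like the one above can be pictured as an index from every known code to its record. The following is a minimal sketch under that assumption; it is not isogloss's actual implementation. The field names are copied from the sample output, while `build_code_index` and the two sample records are hypothetical.

```python
# Field names come from the sample output above; everything else here is a
# hypothetical illustration, not the tool's real code or data layout.
CODE_FIELDS = ("639-1", "639-2/T", "639-2/B", "639-3")


def build_code_index(records):
    """Map every non-empty ISO 639 code in each record to that record."""
    index = {}
    for record in records:
        for field in CODE_FIELDS:
            code = record.get(field, "")
            if code:
                # setdefault keeps the first record seen for a given code.
                index.setdefault(code.lower(), record)
    return index


SWEDISH = {
    "639-1": "sv", "639-2/T": "swe", "639-2/B": "", "639-3": "swe",
    "Scope": "Individual", "Type": "Living",
    "Name(s)": "Swedish", "Native name(s)": "svenska", "Other name(s)": "",
}
FRENCH = {
    "639-1": "fr", "639-2/T": "fra", "639-2/B": "fre", "639-3": "fra",
    "Scope": "Individual", "Type": "Living",
    "Name(s)": "French", "Native name(s)": "fran\u00e7ais", "Other name(s)": "",
}

index = build_code_index([SWEDISH, FRENCH])
print(index["swe"]["Name(s)"])  # Swedish
print(index["fre"]["Name(s)"])  # French
```

Indexing all four code fields means a two-letter 639-1 code and a three-letter 639-2/B code both resolve to the same record.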
To look up information by language name:
$ isogloss/isogloss.py -n "egyptian arabic"
{
    "Egyptian Arabic": "arz"
}
Example of lookup via native name:
$ isogloss/isogloss.py -n 日本語
{
    "\u65e5\u672c\u8a9e Nihongo": "jpn"
}
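The `\uXXXX` sequences in this output are standard JSON string escapes for non-ASCII characters, so any JSON parser recovers the original text. A short sketch using Python's `json` module, with the output above pasted in as a raw string:

```python
import json

# The tool's output from the example above, kept verbatim in a raw string.
raw = r'{"\u65e5\u672c\u8a9e Nihongo": "jpn"}'

decoded = json.loads(raw)
# ensure_ascii=False writes the characters themselves instead of escapes.
readable = json.dumps(decoded, ensure_ascii=False, indent=4)
print(readable)
```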
Example of multiple results being found:
$ isogloss/isogloss.py -n norwegian
{
    "Norwegian Nynorsk": "nno",
    "Nynorsk, Norwegian": "nno",
    "Bokm\u00e5l, Norwegian": "nob",
    "Norwegian Bokm\u00e5l": "nob",
    "Norwegian": "nor",
    "Norwegian Sign Language": "nsl",
    "Traveller Norwegian": "rmg"
}
Language names are normalised, allowing for case-insensitive and accent-insensitive matching when searching:
$ isogloss/isogloss.py -n espanol
{
    "Judeo-espa\u00f1ol": "lad",
    "espa\u00f1ol": "spa"
}
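The matching shown in these examples can be sketched as follows. This is an assumption-laden illustration rather than the tool's own code: it uses the standard library's `unicodedata` as a stand-in for the `unidecode` package installed above, and it assumes substring matching, which is inferred from the sample results. The names `normalise`, `search_names`, and `NAMES` are hypothetical.

```python
import unicodedata


def normalise(text):
    """Case-fold text and strip combining accents (stdlib stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search_names(query, names):
    """Return each name/code pair whose normalised name contains the query."""
    needle = normalise(query)
    return {name: code for name, code in names.items()
            if needle in normalise(name)}


# A tiny hypothetical name table, using entries from the examples above.
NAMES = {
    "Judeo-espa\u00f1ol": "lad",
    "espa\u00f1ol": "spa",
    "Swedish": "swe",
    "Norwegian": "nor",
    "Norwegian Bokm\u00e5l": "nob",
}

print(search_names("espanol", NAMES))
print(search_names("NORWEGIAN", NAMES))
```

Normalising both the query and the stored names is what lets `espanol` find `español` while the original spelling is still returned in the result.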
To look up information by IETF language tag:
$ isogloss/isogloss.py -i fr-FR
{
    "Language": {
        "639-1": "fr",
        "Scope": "Individual",
        "Type": "Living",
        "Native name(s)": "fran\u00e7ais",
        "Other name(s)": "",
        "639-2/T": "fra",
        "639-2/B": "fre",
        "639-3": "fra",
        "Name(s)": "French"
    },
    "Region": "France"
}
$ isogloss/isogloss.py -i zh-cmn-Hans-CN-pinyin-ud1-p9t4-x-private1
{
    "Primary Language": {
        "639-1": "zh",
        "639-2/B": "chi",
        "639-2/T": "zho",
        "639-3": "zho",
        "Deprecated": false,
        "Name(s)": "Chinese",
        "Native name(s)": "\u4e2d\u6587 Zh\u014dngw\u00e9n; \u6c49\u8bed; \u6f22\u8a9e H\u00e0ny\u01d4",
        "Other name(s)": "",
        "Scope": "Macrolanguage",
        "Type": "Living"
    },
    "Extended Languages": [
        {
            "639-1": "",
            "639-2/B": "",
            "639-2/T": "",
            "639-3": "cmn",
            "Deprecated": false,
            "Name(s)": "Mandarin Chinese",
            "Native name(s)": "",
            "Other name(s)": "",
            "Scope": "Individual",
            "Type": "Living"
        }
    ],
    "Script": "Han (Simplified variant)",
    "Region": "China",
    "Variant": "pinyin",
    "Extension": "ud1-p9t4",
    "Private Use": "x-private1"
}
$ isogloss/isogloss.py -i ar-ajp-apc-apd-Arab-CV-arevela-g-231243-r-sdarre-x-private-x-private1 | jq
{
  "Primary Language": {
    "639-1": "ar",
    "639-2/B": "",
    "639-2/T": "ara",
    "639-3": "ara",
    "Deprecated": false,
    "Name(s)": "Arabic",
    "Native name(s)": "العربية; al'Arabiyyeẗ",
    "Other name(s)": "",
    "Scope": "Macrolanguage",
    "Type": "Living"
  },
  "Extended Languages": [
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "Deprecated": true,
      "Language Name(s)": "South Levantine Arabic",
      "Language Type": "Living",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual"
    },
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "639-3": "apc",
      "Deprecated": false,
      "Name(s)": "Levantine Arabic",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual",
      "Type": "Living"
    },
    {
      "639-1": "",
      "639-2/B": "",
      "639-2/T": "",
      "639-3": "apd",
      "Deprecated": false,
      "Name(s)": "Sudanese Arabic",
      "Native name(s)": "",
      "Other name(s)": "",
      "Scope": "Individual",
      "Type": "Living"
    }
  ],
  "Script": "Arabic",
  "Region": "Cabo Verde",
  "Variant": "arevela",
  "Extension": "g-231243-r-sdarre",
  "Private Use": "x-private-x-private1"
}
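The structure behind these results is positional: a tag is a primary language followed by optional extended languages, script, region, variants, extensions, and private use, each recognisable by its length and character class. A simplified sketch of that decomposition is below. It is not the parser isogloss uses; `parse_tag` and its lower-case keys are hypothetical, it is stricter than the tool about what counts as an extension, and it does not handle grandfathered tags or tags that consist only of private use.

```python
def _is_alpha(s, length):
    return len(s) == length and s.isascii() and s.isalpha()


def _is_region(s):
    return _is_alpha(s, 2) or (len(s) == 3 and s.isascii() and s.isdigit())


def _is_variant(s):
    if not (s.isascii() and s.isalnum()):
        return False
    return 5 <= len(s) <= 8 or (len(s) == 4 and s[0].isdigit())


def parse_tag(tag):
    """Split a language tag into labelled parts by subtag position and shape."""
    subtags = tag.split("-")
    n = len(subtags)
    parts = {"language": subtags[0].lower(), "extlangs": [], "script": None,
             "region": None, "variants": [], "extensions": [],
             "private_use": None}
    i = 1
    # Up to three 3-letter extended language subtags.
    while i < n and len(parts["extlangs"]) < 3 and _is_alpha(subtags[i], 3):
        parts["extlangs"].append(subtags[i].lower())
        i += 1
    # Optional 4-letter script, conventionally title-cased.
    if i < n and _is_alpha(subtags[i], 4):
        parts["script"] = subtags[i].title()
        i += 1
    # Optional region: 2 letters or 3 digits.
    if i < n and _is_region(subtags[i]):
        parts["region"] = subtags[i].upper()
        i += 1
    # Any number of variants.
    while i < n and _is_variant(subtags[i]):
        parts["variants"].append(subtags[i].lower())
        i += 1
    # Extensions start with a single character; "x" starts private use.
    while i < n:
        s = subtags[i].lower()
        if s == "x":
            parts["private_use"] = "-".join(subtags[i:])
            break
        if len(s) != 1:
            raise ValueError(f"unexpected subtag {subtags[i]!r} in {tag!r}")
        j = i + 1
        while j < n and len(subtags[j]) > 1:
            j += 1
        parts["extensions"].append("-".join(subtags[i:j]).lower())
        i = j
    return parts


print(parse_tag("zh-cmn-Hans-CN-x-private1"))
print(parse_tag("es-419-fonipa")["region"])  # 419
```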
- data/consolidated_langs.json: Contains language data in JSON format used for the lookup.
- data/region_names.json: Contains region data in JSON format used for the BCP-47 lookup.
- data/script_codes.json: Contains script code data in JSON format used for the BCP-47 lookup.
- data/deprecated-639-3.csv: Contains deprecated ISO 639-3 codes in CSV format, for quick reference.
There are known issues with the BCP47 parser in the web interface. It uses regular expressions to validate input, such that:
- en
- fr-CA
- i-klingon
- az-Arab-IR
- sr-Cyrl-RS
- zh-cmn-Hans
- ja-JP-x-tokyo
- uz-Cyrl-UZ-1992
- bo-Tibt-x-dialect
- zh-cmn-Hans-CN-x-private1
- hy-Latn-IT-arevela-x-test

- en-GB-oed-x-private
- de-CH-1901-co-phonebk-sc-gothic-x-bavaria

(and more)

- ca-valencia-nedis (Highlighted input section is missing "valencia")
- en-US-u-islamcal (Variant "u" and Extension "islamcal"; Extension section says "u - islamcal")
- es-419-fonipa (Extended languages blank)
- de-Latf-1901 (Region undefined)
- sl-rozaj (rozaj is coloured differently in the result container to how it is in the highlighted input section)
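For comparison, a regular-expression check for well-formedness can be sketched as below. This is not the web interface's pattern, which is not shown in this README. It follows the `langtag` grammar of RFC 5646 (the successor to RFC 4646), omits grandfathered tags, and does not check for duplicate variants or extension singletons. `LANGTAG` and `is_well_formed` are hypothetical names.

```python
import re

# One alternative per part of the tag, in the order the grammar requires.
LANGTAG = re.compile(r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)
    (?:-(?P<private>x(?:-[a-z0-9]{1,8})+))?
""", re.IGNORECASE | re.VERBOSE)


def is_well_formed(tag):
    """True if the whole tag matches the simplified langtag pattern."""
    return LANGTAG.fullmatch(tag) is not None


for tag in ("ca-valencia-nedis", "de-Latf-1901", "i-klingon",
            "en-GB-oed-x-private"):
    print(tag, is_well_formed(tag))
```

Under this pattern `i-klingon` and `en-GB-oed-x-private` are rejected because they rely on grandfathered forms, and `de-CH-1901-co-phonebk-sc-gothic-x-bavaria` is rejected because `co` is not introduced by a single-character extension singleton.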
Contributions, issues, and feature requests are welcome!
Written by T E Vaughan
If you find this project useful, please consider sponsoring my work. <3
The codes used in this program conform to the following ISO standards and IETF RFCs:
- ISO 639 Language codes
- ISO 3166-1 alpha-2 Country codes
- ISO 15924 Script codes

- RFC 1766 Tags for the Identification of Languages
- RFC 4646 Tags for Identifying Languages
- RFC 4647 Matching of Language Tags

This project is MIT licensed.
