Place names are unavailable in many relevant languages #586

Open
1ec5 opened this issue Nov 27, 2022 · 3 comments
Labels
  • bug (Something isn't working)
  • internationalization
  • openmaptiles (A change is needed in OpenMapTiles to support this)

Comments

@1ec5
Member

1ec5 commented Nov 27, 2022

This style uses vector tiles that contain place names in many languages, but the selection of languages leaves some room for improvement. For a style that’s focused on the Americas, particularly the United States, language support is nevertheless dominated by European languages:

Languages represented in United States place POI generated by OpenMapTiles/Planetiler
  • Albanian (sq)
  • Amharic (am)
  • Arabic (ar)
  • Armenian (hy)
  • Azerbaijani (az)
  • Basque (eu)
  • Belarusian (be)
  • Bosnian (bs)
  • Breton (br)
  • Bulgarian (bg)
  • Catalan (ca)
  • Chinese (zh)
  • Corsican (co)
  • Croatian (hr)
  • Czech (cs)
  • Danish (da)
  • Dutch (nl)
  • English (en)
  • Esperanto (eo)
  • Estonian (et)
  • Finnish (fi)
  • French (fr)
  • Georgian (ka)
  • German (de)
  • Greek (el)
  • Hebrew (he)
  • Hindi (hi)
  • Hungarian (hu)
  • Icelandic (is)
  • Indonesian (id)
  • Irish (ga)
  • Italian (it)
  • Japanese (ja)
  • Kannada (kn)
  • Kazakh (kk)
  • Korean (ko)
  • Kurdish (ku)
  • Latin (la)
  • Latvian (lv)
  • Lithuanian (lt)
  • Luxembourgish (lb)
  • Macedonian (mk)
  • Malayalam (ml)
  • Maltese (mt)
  • Norwegian (no)
  • Occitan (oc)
  • Polish (pl)
  • Portuguese (pt)
  • Romanian (ro)
  • Romansh (rm)
  • Russian (ru)
  • Scottish Gaelic (gd)
  • Serbian (sr)
  • Slovak (sk)
  • Slovenian (sl)
  • Spanish (es)
  • Swedish (sv)
  • Tamil (ta)
  • Telugu (te)
  • Thai (th)
  • Turkish (tr)
  • Ukrainian (uk)
  • Welsh (cy)
  • Western Frisian (fy)

This list excludes 11 of the 30 most spoken languages in the United States – even the fourth and fifth most spoken, Tagalog and Vietnamese. Of the many languages that have official status within the U.S. at the state/territorial level, only English and Spanish are supported. Support for explicitly tagged minority languages could be important for this style in the future. Natural features often have notable names in indigenous languages. There are many communities across the country that have points of interest and even streets primarily in an immigrant language. This project would be able to “Challenge the status quo” of raster maps more powerfully if it could expose a wider variety of languages.

The names themselves come from a combination of Wikidata and OpenStreetMap, but the decision about which languages to expose is made by OpenMapTiles. Historically, adding new language-qualified name fields has required a potentially painful tradeoff in tile size. However, nowadays it should be quite feasible to add new name fields only when tagged in OSM, relying on the client to fall back to another field when a given language is unavailable. Every GL JS–compatible renderer in the last several years has had support for the coalesce expression operator. #578 demonstrates the effective use of this expression operator in any style.
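
For illustration, here is a rough sketch of the client-side fallback described above. It is not taken from #578 or from this style's code. The helper name `localizedNameExpression` is hypothetical, and the sketch assumes the tiles expose names as `name:<code>` fields alongside a plain `name` field.

```typescript
// A style expression is plain JSON, so this alias is deliberately loose.
type Expression = unknown[];

// Hypothetical helper (not part of any library): builds a "coalesce"
// expression that tries each preferred language in order, then falls back
// to the untranslated name.
// Assumption: tiles carry names as `name:<code>` fields plus a plain `name`.
function localizedNameExpression(preferredLanguages: string[]): Expression {
  const lookups = preferredLanguages.map(
    (code): Expression => ["get", `name:${code}`]
  );
  return ["coalesce", ...lookups, ["get", "name"]];
}

// Example: prefer Vietnamese, then English, then whatever `name` holds.
const textField = localizedNameExpression(["vi", "en"]);
console.log(JSON.stringify(textField));
// prints ["coalesce",["get","name:vi"],["get","name:en"],["get","name"]]
```

The resulting array could be used as a symbol layer's `text-field` value. Since `coalesce` skips lookups that come back null, a feature that lacks `name:vi` would simply fall through to the next field.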

1ec5 added the openmaptiles and internationalization labels Nov 27, 2022
@ZeLonewolf
Member

Let's generate a list of the 11 missing languages so we can file a corresponding issue with OpenMapTiles.

@1ec5
Member Author

1ec5 commented Nov 28, 2022

Here are the gaps among the 35 most spoken languages in the U.S. Please correct me if I’ve flubbed anything:

  1. Chinese (zh) – supported, but not very usable without a distinction between zh-Hans and zh-Hant and/or between zh-CN, zh-HK, and zh-TW and/or between cmn and yue
  2. Tagalog (tl), Filipino (fil)
  3. Vietnamese (vi)
  4. Haitian Creole (ht)
  5. Yiddish (yi)
  6. Persian (fa), Tajik (tg)
  7. Gujarati (gu)
  8. Bengali (bn)
  9. Lao (lo)
  10. Urdu (ur)
  11. Punjabi (pa, pnb)
  12. Hmong (hmn)
  13. Swahili (sw)
  14. Khmer (km)
  15. Navajo (nv)

Note that GL JS also lacks support for some of these languages’ writing systems, though this is also true of some of the languages OpenMapTiles already supports, such as Hindi.

Here are the gaps among the languages with official status in U.S. states and territories:

  • Alaska
    • Inupiat (ik)
    • Central Siberian Yupik (ess)
    • Central Alaskan Yup’ik (esu)
    • Alutiiq (ems)
    • Unangan (Aleut) (ale)
    • Dena'ina (tfn)
    • Deg Xinag (ing)
    • Holikachuk (hoi)
    • Koyukon (koy)
    • Upper Kuskokwim (kuu)
    • Gwichʼin (gwi)
    • Lower Tanana (taa)
    • Upper Tanana (tau)
    • Tanacross (tcb)
    • Hän (haa)
    • Ahtna (aht)
    • Eyak (eya)
    • Tlingit (tli)
    • Haida (hai)
    • Coast Tsimshian (tsi)
  • American Samoa
    • Samoan (sm)
  • Guam
    • Chamorro (ch)
  • Hawaii
    • Hawaiian (haw)
  • Northern Mariana Islands
    • Carolinian (cal)
    • Chamorro (ch)
  • Oklahoma
    • Cherokee (chr)
  • South Dakota
    • Dakota (dak)
    • Lakota (lkt)

I think the broader point about this long list is that, these days, it’s kind of antiquated for a tileset to limit itself to a fixed set of name fields. Ideally, a tileset would just dump whatever name fields are in OSM for a given feature, substituting the corresponding Wikidata label where available. Backwards compatibility would be the only reason for limiting the languages, but I don’t know of any client that depends on tiles to contain only a fixed set of fields.
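
To make the "dump whatever name fields are in OSM" idea concrete, here is a minimal sketch of one possible reading of it. This is not how OpenMapTiles or Planetiler actually builds tiles. The function name `collectNameFields` and both input shapes are hypothetical, and letting the Wikidata label win over the OSM value is an assumption about what "substituting" means here.

```typescript
type Tags = Record<string, string>;

// Hypothetical sketch: keep every name field that OSM has for a feature,
// swapping in the Wikidata label for a language when one exists.
// Assumptions: `osmTags` holds raw OSM tags, and `wikidataLabels` is keyed
// by language code (e.g. "en"). The precedence (Wikidata over OSM) is a guess.
// For brevity, this does not filter out non-language `name:` subkeys.
function collectNameFields(osmTags: Tags, wikidataLabels: Tags): Tags {
  const result: Tags = {};
  for (const [key, value] of Object.entries(osmTags)) {
    if (key === "name") {
      result[key] = value; // the untranslated name passes through unchanged
      continue;
    }
    if (!key.startsWith("name:")) continue; // not a name field, so skip it
    const language = key.slice("name:".length);
    result[key] = wikidataLabels[language] ?? value;
  }
  return result;
}

// Placeholder values only; none of these are real names.
const fields = collectNameFields(
  { name: "local name", "name:en": "OSM English", "name:haw": "OSM Hawaiian", natural: "peak" },
  { en: "Wikidata English", fr: "Wikidata French" }
);
console.log(fields);
```

In this sketch a language appears in the output only when OSM tags it, so the Wikidata-only French label is dropped. That keeps the tile from growing with every label Wikidata happens to have.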

TomPohys pushed a commit to openmaptiles/openmaptiles that referenced this issue Jan 17, 2023
This PR adds eight missing languages, all of which are among the top 25 languages spoken in the United States, as discussed in osm-americana/openstreetmap-americana#586, and have at least 10,000 `name:xx` usages in the database. Additionally, most of these languages are the primary or national language in the countries where they are most commonly spoken.
@1ec5
Member Author

1ec5 commented Mar 25, 2023

The Planetiler-based tiles Americana uses by default now include many more languages, though some of the indigenous languages in #586 (comment) are still missing.
