`BCP47`

BCP47 provides tools to parse, validate, normalize, and match language tags following the BCP 47 standard. BCP 47 (Best Current Practice 47, currently RFC 5646 and RFC 4647) is the IETF standard that governs how human languages are identified in internet protocols. It defines tags like en-US (English, United States).

The package bundles access to the IANA Language Subtag Registry, the authoritative source of valid language, script, region, and variant subtags.

Installation

You can install the development version of BCP47 from GitHub with:

# install.packages('pak')
pak::pak('christopherkenny/BCP47')

Core Functions

Function	Description
`bcp_parse()`	Decompose a tag into its subtag components
`bcp_validate()`	Check whether subtags appear in the IANA registry
`bcp_normalize()`	Apply canonical casing and substitute preferred values
`bcp_match_language()`	Find the best available language for a set of preferences
`bcp_process_registry()`	Download and parse the IANA registry
`bcp_cache_*()`	Manage the local registry cache

Examples

Parsing

bcp_parse() decomposes a tag into its RFC 5646 components. All subtags are returned in lower-case.

library(BCP47)

bcp_parse('en-US')
#> $language
#> [1] "en"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] NA
#> 
#> $region
#> [1] "us"
#> 
#> $variants
#> NULL
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
bcp_parse('zh-Hans-CN')
#> $language
#> [1] "zh"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] "hans"
#> 
#> $region
#> [1] "cn"
#> 
#> $variants
#> NULL
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
bcp_parse('de-1901')
#> $language
#> [1] "de"
#> 
#> $extlang
#> NULL
#> 
#> $script
#> [1] NA
#> 
#> $region
#> [1] NA
#> 
#> $variants
#> [1] "1901"
#> 
#> $extensions
#> list()
#> 
#> $private
#> NULL
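
To show roughly how a tag breaks down, here is a minimal base-R sketch that classifies subtags by their shape and position, following the length rules in RFC 5646. It is not the package's implementation: `classify_subtags_sketch()` is a hypothetical helper, and it only handles the simple language-script-region-variant form (no extlang, extensions, private use, or grandfathered tags). It also returns an empty character vector for missing variants, where `bcp_parse()` returns NULL.

```r
# Hypothetical helper, not part of BCP47: classify subtags by shape.
# Assumes a simple tag of the form language[-script][-region][-variant...].
classify_subtags_sketch <- function(tag) {
  subtags <- strsplit(tolower(tag), '-', fixed = TRUE)[[1]]
  out <- list(
    language = subtags[1],
    script = NA_character_,
    region = NA_character_,
    variants = character(0)
  )
  for (s in subtags[-1]) {
    no_variants_yet <- length(out$variants) == 0
    if (grepl('^[a-z]{4}$', s) && is.na(out$script) &&
        is.na(out$region) && no_variants_yet) {
      # Four letters directly after the language: a script
      out$script <- s
    } else if (grepl('^([a-z]{2}|[0-9]{3})$', s) &&
               is.na(out$region) && no_variants_yet) {
      # Two letters or three digits: a region
      out$region <- s
    } else if (grepl('^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$', s)) {
      # 5 to 8 alphanumerics, or a digit followed by 3 alphanumerics: a variant
      out$variants <- c(out$variants, s)
    } else {
      stop('Subtag not supported by this sketch: ', s)
    }
  }
  out
}

classify_subtags_sketch('zh-Hans-CN')$script  # "hans"
classify_subtags_sketch('de-1901')$variants   # "1901"
```

The position checks matter: a four-letter subtag only counts as a script if no region or variant has been seen yet, which is why the shape alone is not enough.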

Language Matching

bcp_match_language() implements the RFC 4647 “Lookup” scheme. It finds the best available language for a user’s ordered list of preferences, progressively stripping subtags to find a match.

# User prefers en-US, then French. Only 'en' and 'de' are available.
bcp_match_language(c('en-US', 'fr'), c('en', 'de'))
#> [1] "en"

# Prefer Traditional Chinese, fall back to Simplified, then English
bcp_match_language(
  c('zh-Hant-TW', 'zh-Hans', 'en'),
  c('zh-Hans', 'en', 'fr')
)
#> [1] "zh-Hans"

# No match — return a default
bcp_match_language('pt-BR', c('fr', 'de'), default = 'en')
#> [1] "en"
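
The progressive truncation described above can be sketched in a few lines of base R. This is an illustration of the RFC 4647 Lookup idea under simplifying assumptions, not the code `bcp_match_language()` actually runs; `lookup_sketch()` is a hypothetical name.

```r
# Hypothetical helper, not part of BCP47: a sketch of RFC 4647 "Lookup".
lookup_sketch <- function(preferences, available, default = NA_character_) {
  available_lower <- tolower(available)
  for (pref in preferences) {
    if (identical(pref, '*')) next  # Lookup skips the wildcard range
    subtags <- strsplit(tolower(pref), '-', fixed = TRUE)[[1]]
    while (length(subtags) > 0) {
      # Compare case-insensitively, but return the tag as it was supplied
      candidate <- paste(subtags, collapse = '-')
      hit <- match(candidate, available_lower)
      if (!is.na(hit)) return(available[hit])
      # Truncate: drop the last subtag, then any single-character
      # subtag (an extension or private-use singleton) left at the end
      subtags <- subtags[-length(subtags)]
      while (length(subtags) > 0 && nchar(subtags[length(subtags)]) == 1) {
        subtags <- subtags[-length(subtags)]
      }
    }
  }
  default
}

lookup_sketch(c('en-US', 'fr'), c('en', 'de'))               # "en"
lookup_sketch('de-CH-x-foo', c('de-CH', 'fr'))               # "de-CH"
lookup_sketch('pt-BR', c('fr', 'de'), default = 'en')        # "en"
```

Each preference is exhausted before the next one is tried, so an earlier preference that matches only after truncation still wins over a later exact match.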

Validation and Normalization

bcp_validate() and bcp_normalize() check and canonicalize tags against the IANA registry. They download (and cache) the registry on first use.

# Check whether subtags are registered
bcp_validate('en-US') # TRUE
bcp_validate('xx-ZZ') # FALSE: 'xx' is not a registered language subtag

# Canonicalize casing and suppress default scripts
bcp_normalize('en-us') # "en-US"  (region uppercased)
bcp_normalize('en-Latn-US') # "en-US"  (Latn is the default script for English)
bcp_normalize('sr-latn') # "sr-Latn" (Latn is not the default for Serbian)
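
The casing part of normalization does not need the registry. As a rough sketch, RFC 5646 (section 2.1.1) says every subtag is lowercase except two-letter and four-letter subtags that are neither first nor after a singleton: those are uppercase and title case respectively. The hypothetical helper below applies only that rule. It is not how `bcp_normalize()` is implemented, and it does not suppress default scripts or substitute preferred values, since both require registry data.

```r
# Hypothetical helper, not part of BCP47: registry-free canonical casing.
canonical_case_sketch <- function(tag) {
  subtags <- strsplit(tolower(tag), '-', fixed = TRUE)[[1]]
  seen_singleton <- FALSE
  for (i in seq_along(subtags)) {
    s <- subtags[i]
    if (i > 1 && !seen_singleton) {
      if (nchar(s) == 2) {
        # Two-letter subtag after the first position: uppercase (region)
        subtags[i] <- toupper(s)
      } else if (nchar(s) == 4) {
        # Four-letter subtag after the first position: title case (script)
        subtags[i] <- paste0(toupper(substr(s, 1, 1)), substr(s, 2, 4))
      }
    }
    # Everything after an extension or private-use singleton stays lowercase
    if (nchar(s) == 1) seen_singleton <- TRUE
  }
  paste(subtags, collapse = '-')
}

canonical_case_sketch('en-us')           # "en-US"
canonical_case_sketch('SR-LATN')         # "sr-Latn"
canonical_case_sketch('EN-ca-X-CA')      # "en-CA-x-ca"
```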

Registry Access

The IANA registry is parsed into a tidy data frame you can query directly:

reg <- bcp_process_registry()
head(reg)

# Find all scripts
reg[reg$type == 'script', c('subtag', 'description')]

# Check the registry date
attr(reg, 'last_update')
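
For a sense of what the parsing step involves: the IANA registry is a plain-text file of records separated by `%%` lines, each record a set of `Key: value` fields. The sketch below parses a tiny hand-written excerpt in that format into a data frame with the `type`, `subtag`, and `description` columns used above. `parse_records_sketch()` is a hypothetical helper rather than the package's parser, and it assumes one value per field, no folded continuation lines, and a `Subtag` field in every record.

```r
# A small hand-written excerpt in the registry's record format
registry_lines <- c(
  '%%',
  'Type: language',
  'Subtag: en',
  'Description: English',
  'Suppress-Script: Latn',
  '%%',
  'Type: script',
  'Subtag: Latn',
  'Description: Latin'
)

# Hypothetical helper, not part of BCP47: parse "%%"-separated records.
parse_records_sketch <- function(lines) {
  # Each '%%' line starts a new record, so a running count labels the records
  record_id <- cumsum(lines == '%%')
  keep <- lines != '%%'
  records <- split(lines[keep], record_id[keep])
  rows <- lapply(records, function(rec) {
    key <- sub(':.*$', '', rec)                 # text before the first colon
    value <- trimws(sub('^[^:]*:', '', rec))    # text after the first colon
    fields <- setNames(value, key)
    data.frame(
      type = fields[['Type']],
      subtag = fields[['Subtag']],
      description = fields[['Description']],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

sketch_reg <- parse_records_sketch(registry_lines)
sketch_reg[sketch_reg$type == 'script', c('subtag', 'description')]
```

The real file also contains entries with several `Description` fields and with wrapped lines, which a production parser has to handle.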

Cache Management

Registry data is cached locally to avoid repeated downloads:

bcp_cache_path() # where the cache lives
bcp_cache_size() # how big it is
bcp_cache_update() # refresh from IANA
bcp_cache_clear() # delete the cache
