An ever-growing suite of libraries that aim to make portable scripting easier, safer, and more efficient. The libraries are written for scripts that target the "command interpretation services and common utility programs" as defined in the "Shell and Utilities" volume of the POSIX.1-2008 standard.
The suite libraries provide commands that enable commonly undertaken tasks to be accomplished more easily, while maintaining maximum portability, by providing more advanced features on top of the POSIX.1 standard, and providing workarounds for environments or utilities which can cause problems.
To the extent possible, every library in the suite is self-contained and has no dependencies outside the library except for a compatible POSIX.1 environment.
While some libraries may provide for being directly executed (i.e. as indistinguishable from a binary command), most libraries are intended to be "sourced" into any script where the library commands will then be available.
- Emulated arrays for any shell.
- Emulated double ended queues, queues, and stacks for any shell.
- The use of these is recommended ahead of other data types (e.g. emulated arrays) wherever possible as they are much faster.
- Both queues and stacks are specializations of double ended queues and so are provided in the same library.
- Argument processing made easy - like `getopt` or `getopts` but much more powerful, while requiring less work.
- Wrapper for `libgetargs.sh` which allows invoking without needing to be sourced.
Additional libraries are under active development including:
- `libcommand.sh` - Commands for invoking other commands.
- `libmap.sh` - Emulated associative arrays.
- `libpath.sh` - Commands for processing paths.
- `libshell.sh` - Commands for general shell functionality.
Several variables affect the libraries. Some of these should be set to specific values for the suite to work as intended[^1], while others determine suite settings.
All libraries assume the POSIX.1 defined default environment is in effect.
Much of the POSIX.1 standard is only defined for this environment.
- Should be set to `C` or the equivalent `POSIX`. (Other values may work, but are not supported.)
- Standards compliant shells and utilities will give `LC_ALL` precedence over other locale setting variables, however these additional locale variables may be (incorrectly) used in some cases and so may also need to be set. The standard also defines `LANG`, `LC_COLLATE`, `LC_CTYPE`, `LC_MESSAGES`, `LC_MONETARY`, `LC_NUMERIC`, and `LC_TIME`, with some platforms extending this further still.
- Note that setting the locale to `C` or `POSIX` does not preclude using characters outside the locale, but may affect how those characters are interpreted.
- Should be the standard defined default of ` \t\n` (`<space><tab><newline>`).
- A number of shell operations are defined with reference to the value contained in `IFS`, making it a powerful tool, but also a source of many frustrations. While the libraries in the Suite attempt to avoid any dependence on the value of `IFS`, it is near impossible to ensure this is actually the case as the effects of `IFS` are often somewhat hidden and can be easy to miss.
- Although much better supported in modern environments, traditionally the use of `IFS` was highly implementation dependent. Less mainstream environments may still retain quirks that make it difficult to use `IFS` portably.
- Should be set if any GNU tools are to be used.
- A non-standard environment variable used primarily by GNU provided tools, but may have been adopted by others. Tells these commands to more closely match the standard than they otherwise would.
- Many GNU tools significantly deviate from the standard unless this is set (regardless of the current locale). (Exactly which tools deviate and if it is actually an issue is hard to determine in advance.)
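The following is a minimal sketch of how a script might establish this environment before sourcing any library. It assumes the script is free to change these variables for its own process; it is not taken from the suite itself.

```sh
# Use the POSIX locale for the script and everything it runs.
LC_ALL=C
export LC_ALL

# Restore the default field separators: <space><tab><newline>.
# The trailing "x" protects the newline from being stripped by the
# command substitution and is removed afterwards.
IFS=$(printf ' \t\nx')
IFS=${IFS%x}

# Ask GNU tools to follow the standard more closely. The tools generally
# only check that the variable is set, so the value used here is arbitrary.
POSIXLY_CORRECT=1
export POSIXLY_CORRECT
```

Setting these once at the top of a script keeps the cost out of individual commands, at the price of changing the environment for everything the script invokes.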
A number of environment variables affect the functionality of each of the libraries. These include both variables that instruct the library to work-around specific platform issues, and variables that convey user preferences. (See also the information on compatibility.)
Where it is possible, platform specific issues are detected automatically, with the associated variables providing a way to force enabling or disabling specific work-arounds if necessary. Automatic detection should always be preferred - this detects use cases that are actually problematic and not more general issues.
Each configuration variable belongs to a specific CLASS:
- CONSTANT - a configuration option that is read only once when the library is first sourced and must not be set after this point (the `readonly` command may be used to enforce this).
- VARIABLE - a configuration option that can be modified at any point and may affect the next command.
Additionally, each configuration variable has a specific TYPE:
- TEXT - has a value that is arbitrary text with constraints defined by each specific variable.
- FLAG - enables or disables specific functionality. The value `0` (`<zero>`) turns a flag OFF, while any other text will turn a flag ON, EXCEPT for flags where automatic detection is applicable, where the value `A` is special and forces the use of automatic detection. A flag that is unset or set but null (i.e. empty) will use an appropriate default value.
Configuration variables are never modified by a library.
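The FLAG rules above can be sketched as a small classifier. `flag_state` is a hypothetical helper written for this illustration, not a suite command, and it treats `A` as automatic detection for every flag, whereas the suite only does so for flags that support detection.

```sh
# Hypothetical helper: report how a FLAG value would be interpreted.
#   $1 - the flag value (may be empty)
#   $2 - the state to report when the flag is unset or null
flag_state() {
  case ${1-} in
    '') printf '%s\n' "$2" ;;   # unset or null: use the default
    0)  printf '%s\n' OFF ;;    # <zero> turns the flag OFF
    A)  printf '%s\n' AUTO ;;   # forces automatic detection
    *)  printf '%s\n' ON ;;     # any other text turns the flag ON
  esac
}

flag_state 0 ON      # prints "OFF"
flag_state yes OFF   # prints "ON"
flag_state '' AUTO   # prints "AUTO"
```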
Many configuration variables can be set for all libraries with a single, suite wide variable. Where such a suite wide variable is available a library specific variable is always available in addition and has precedence. (Not all libraries use all suite wide settings.)
Suite wide variables include:
- Type: FLAG
- Class: VARIABLE
- Default: OFF
- [Enable]/Disable library error message output.
- OFF: error messages will be written to `STDERR` as: `[<IDENTIFIER>]: ERROR: <MESSAGE>`.
- ON: library error messages will be suppressed.
- Each library also stores the most recent error message in a library specific variable, which is unaffected by this flag.
- Unless otherwise stated, both the library versions of this option and the suite version can be modified between command invocations and should affect the next command.
- Does not affect errors from non-library commands, which may still produce output.
- Type: FLAG
- Class: VARIABLE
- Default: OFF
- Enable/[Disable] causing library errors to terminate the current (sub-)shell.
- OFF: errors stop any further processing, and cause a non-zero exit status, but do not cause an exception.
- ON: any library error will cause an "unset variable" shell exception using the `${parameter:?[word]}` parameter expansion, where `word` is set to an error message that should be displayed by the shell (this message is NOT suppressed by `BETTER_SCRIPTS_CONFIG_QUIET_ERRORS`).
- Unless otherwise stated, both the library versions of this option and the suite version can be modified between command invocations and should affect the next command.
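The effect of the `${parameter:?[word]}` expansion can be seen in isolation with a short sketch. The variable name `Missing` and the message text are invented for this example; the subshell is used so that only it, and not the calling script, is terminated.

```sh
# The expansion fails because Missing is unset, so the non-interactive
# (sub-)shell writes the message to standard error and exits.
if (
  unset Missing
  : "${Missing:?example error message}"
  echo 'not reached'
); then
  Status=0
else
  Status=$?
fi

echo "subshell exit status: $Status"
```

The exact non-zero status differs between shells, so callers should only rely on it being non-zero.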
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using only single digit shell parameters, i.e. `$0` to `$9`.
- OFF: Use multi-digit shell parameters.
- ON: Use only single-digit shell parameters.
- Multi-digit parameters are faster but may not be supported by all implementations.
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using only `shift` and not `shift N` for multiple parameters.
- OFF: Use `shift N`.
- ON: Use only `shift`.
- Multi-parameter `shift` is faster but may not be supported by all implementations.
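As an illustration of what the restricted forms involve, the sketch below reaches the tenth positional parameter using only plain `shift` and single-digit parameters. `nth_param` is a hypothetical helper, not a suite command, and `Index` is a global variable because POSIX.1 has no local variables.

```sh
# Hypothetical helper: print parameter number $1 of the remaining
# arguments without using `shift N` or `${10}` style parameters.
nth_param() {
  Index=$1
  shift
  while [ "$Index" -gt 1 ]; do
    shift
    Index=$((Index - 1))
  done
  printf '%s\n' "$1"
}

nth_param 10 a b c d e f g h i j k l   # prints "j"
```

Where they are supported, `${10}` or a single `shift 9` would replace the whole loop, which is why the multi-digit and multi-parameter forms are faster.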
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using alternatives to `/dev/null` as a redirection source/target (e.g. for output suppression).
- OFF: Use `/dev/null`.
- ON: Use an alternative to `/dev/null`.
- Using `/dev/null` as a redirection target is a common idiom, but not always possible (e.g. restricted shells generally forbid this). The alternative is to capture output (and ignore it), but this is much slower as it involves a subshell.
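The two approaches can be compared with a short sketch. This only illustrates the general technique described above; how the libraries implement their alternative is not shown here, and the variable names are invented.

```sh
# Common idiom: discard output by redirecting it to /dev/null.
printf 'noise\n' >/dev/null

# Alternative where that redirection is not permitted: capture the output
# in a command substitution and never use it. The assignment's exit status
# is that of the command substitution.
Ignored=$(printf 'noise\n' 2>&1)
Status=$?

echo "status: $Status"   # prints "status: 0"
```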
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using alternatives to `expr` for matching a "Basic Regular Expression (BRE)".
- OFF: Use `expr`.
- ON: Use an alternative command (i.e. `sed`).
- `expr` is much faster if it works correctly, but some implementations make that difficult, while `sed` is more robust for this use case.
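A minimal sketch of the same BRE capture performed with each utility follows. The sample text and pattern are invented for this example, and the libraries' own patterns and safeguards may differ.

```sh
Text='version-123'

# expr: the pattern is implicitly anchored at the start of the string and
# the text matched by the capture is written to standard output.
Digits=$(expr "$Text" : '[a-z]*-\([0-9]*\)')

# sed: an equivalent substitution that prints only when the match succeeds.
DigitsSed=$(printf '%s\n' "$Text" | sed -n 's/^[a-z]*-\([0-9]*\).*$/\1/p')

echo "$Digits $DigitsSed"   # prints "123 123"
```

The `expr` form avoids a pipeline, but its first operand can be misinterpreted if it resembles an operator or option, which is one reason a `sed` fallback can be more robust.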
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable ignoring `expr` exit status to indicate a match was made.
- OFF: Use `expr` exit status to determine if a match was made.
- ON: Use a workaround to determine if a match was made. (This requires a sub-shell and is therefore far slower.)
- Some versions of `expr` do not always properly set the exit status, making it impossible to determine if a match was actually made.
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- Disable/[Enable] using `expr` for any "Basic Regular Expression" (BRE) that includes nested captures.
- When set, any BRE that uses nested captures will not be used with `expr`, but will use a case specific work-around.
- Some versions of `expr` do not work well with or do not support nested captures.
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using `setopt` in Z Shell to ensure POSIX.1 like behavior.
- OFF: Use `setopt` to set the appropriate options.
- ON: Don't use `setopt`, even in Z Shell.
- Automatically enabled if Z Shell is detected.
- Any use of `setopt` is scoped as tightly as possible and should not affect other commands.
- If Z Shell is used and the current environment has already been set to be POSIX.1 compliant, forcing this ON may improve performance.
- Z Shell has some defaults that cause non-standard behavior, however it also provides `setopt` which can be tightly scoped to set options when required without impacting other platforms.[^2]
- Type: FLAG
- Class: CONSTANT
- Default: <automatic>
- [Disable]/Enable using the non-standard `fgrep` instead of `grep -F`.
- OFF: Use `grep -F`.
- ON: Use `fgrep`.
- Automatically enabled if Z Shell is detected.
- While `grep -F` is standard, it is not always supported - implementations that do not support it usually provide the non-standard `fgrep` instead.
Each library provides a number of variables that are set by the library to convey information outside of command invocation.
These variables must not be set by external commands except if this is explicitly permitted. Variables may use the `readonly` command to enforce this.
Along with any library only information variables, every library also provides a version of some standard variables:
- A whole number >= 1.
- Incremented when there are significant changes, or any changes break compatibility with previous library versions.
- Follows Semantic Versioning v2.0.0.
- A whole number >= 0.
- Incremented for significant changes that do not break compatibility with previous versions.
- Reset to 0 when `BS_<LIBRARY>_VERSION_MAJOR` changes.
- Follows Semantic Versioning v2.0.0.
- A whole number >= 0.
- Incremented for minor revisions or bugfixes.
- Reset to 0 when `BS_<LIBRARY>_VERSION_MINOR` changes.
- Follows Semantic Versioning v2.0.0.
- A string indicating a pre-release version.
- Always null for full-release versions.
- Possible values include `alpha`, `beta`, `rc`, etc. (a numerical suffix may also be appended).
- Follows Semantic Versioning v2.0.0.
- Full (numerical) version combining `BS_<LIBRARY>_VERSION_MAJOR`, `BS_<LIBRARY>_VERSION_MINOR`, and `BS_<LIBRARY>_VERSION_PATCH` as a single value.
- Can be used in numerical comparisons.
- Format is `MNNNPPP` where `M` is the `MAJOR` version, `NNN` is the `MINOR` version (3 digit, zero padded), and `PPP` is the `PATCH` version (3 digit, zero padded).
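The `MNNNPPP` format can be reproduced with simple arithmetic, as in the sketch below. The variable names and values are stand-ins chosen for this example, not the library variables themselves.

```sh
# Hypothetical values standing in for a library's MAJOR, MINOR and PATCH.
Major=1
Minor=2
Patch=3

# MINOR and PATCH each occupy three zero padded digits after MAJOR.
Version=$((Major * 1000000 + Minor * 1000 + Patch))
echo "$Version"   # prints "1002003"

# A numerical comparison against a minimum required version (1.2.0 here).
if [ "$Version" -ge 1002000 ]; then
  echo 'new enough'
fi
```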
- Full version combining `BS_<LIBRARY>_VERSION_MAJOR`, `BS_<LIBRARY>_VERSION_MINOR`, `BS_<LIBRARY>_VERSION_PATCH`, and `BS_<LIBRARY>_VERSION_RELEASE` as a formatted string.
- Format is `BetterScripts '<library>' vMAJOR.MINOR.PATCH[-RELEASE]`.
- Stores the error message of the most recent library error.
- ONLY valid immediately following a command from the appropriate library for which the exit status is not a success code.
- Valid even when error output is suppressed.
- Set (and non-null) once the library has been sourced.
- Dependent scripts can query if this variable is set to determine if a specific library has been sourced.
- Also serves as a guard to avoid errors caused by sourcing a library multiple times.
- Should be set to the location the Suite is installed.
- Multiple paths may be specified - formatted like the standard variable `PATH`.
- Useful for users if libraries are not installed in a location that is available in `PATH`.
- Currently used only by test helper scripts.
As each library in the Suite is designed to be independent, versioning is on a per-library basis; there is no versioning for the Suite as a whole.
Bundled releases will be made available with major changes and will be versioned using the date of the release. This version is not available in the
- POSIX.1-2008
- Also known as:
- The Open Group Base Specifications Issue 7
- IEEE Std 1003.1-2008
- The Single UNIX Specification Version 4 (SUSv4)
- The more recent POSIX.1-2017 is functionally identical to POSIX.1-2008, but incorporates some errata.
- FreeBSD SYSEXITS(3)
- Although not a standard, the values specified by SYSEXITS are widely used and are the only common exit codes generally available.
- Libraries use these values wherever possible, however other exit codes may occur:
  - Values returned by external commands are propagated where possible and appropriate.
  - As per POSIX.1 the value `1` is used as `false` for commands that require reporting a non-success, non-error exit status.
- Semantic Versioning v2.0.0
- Each library has its own version number, each of which complies with Semantic Versioning v2.0.0.
- Some libraries may provide version numbers for additional purposes, these also follow Semantic Versioning v2.0.0, but may not include all elements.
- Inclusive Naming Initiative.
The suite provided Makefile has targets that allow for installation of both libraries and documentation in configurable locations (by default libraries are installed in `/usr/local/bin`, Markdown documentation in `/usr/local/share/doc` and `man` page documentation in the appropriate `/usr/local/share/man` directory for the documentation category - note that these are not POSIX.1 specified).
Most of the suite libraries are intended to be sourced by other scripts using the `.` (aka dot) command, for which the standard says:

> If *file* does not contain a <slash>, the shell shall use the search path specified by *PATH* to find the directory containing *file*. Unlike normal command search, however, the file searched for by the dot utility need not be executable.

As such, libraries are installed as non-executable unless direct invocation is supported for a specific library.
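The quoted behavior can be demonstrated with a throwaway, non-executable file. Everything in this sketch (the directory, the file name `libexample.sh` and the function `example_greet`) is invented for illustration and is not part of the suite.

```sh
# Create a temporary directory holding a non-executable "library".
LibDir=${TMPDIR:-/tmp}/bs-example.$$
mkdir "$LibDir" || exit 1
printf '%s\n' 'example_greet() { echo "hello from a sourced library"; }' \
  >"$LibDir/libexample.sh"
chmod a-x "$LibDir/libexample.sh"

# The name contains no <slash>, so `.` searches PATH for the file.
PATH=$LibDir:$PATH
. libexample.sh

example_greet
rm -rf "$LibDir"
```

Once sourced, the function remains available to the script even though the file was never executable.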
More information about installation is available by invoking the `help` target from the Makefile, i.e. `make help`.
Up-to-date versions of the documentation for each library are always present in the main BetterScripts POSIX suite repository in both Markdown and `man` page formats.
Much of this documentation is generated from other files within the suite, with Markdown documentation for libraries being generated from comments in the libraries themselves, while `man` page documentation is generated from Markdown documentation for both common and library documentation.
All Markdown documentation aims to be compatible with the original Markdown specification, with reference to CommonMark to resolve any ambiguities. Although an extension to the original standard, footnotes are used throughout the Markdown documentation as they are highly useful, widely supported, and acceptably rendered by Markdown flavors that do not support them.
Documentation can be regenerated using the suite provided Makefile.
Library commands document arguments with a tag indicating argument usage:
- in: provides data TO the command.
- out: receives data FROM the command.
- in/out: provides data TO AND receives data FROM the command.
- ref: an additional tag indicating the argument is passed by NAME instead of VALUE.
  - For a typical POSIX.1 variable this means omitting the `$` from the name when passing it to the command, i.e. instead of passing `$Variable` (or `${Variable}`) use `Variable`.
  - Only POSIX.1 compliant names are permitted. Due to the security considerations of using `eval` with arbitrary text, POSIX.1 names are enforced for all variable names; providing a non-standard name will cause an error (even if the name is supported by the current shell).
  - Variables passed by name are GLOBAL variables. The use of `local` variables (as supported by many shells) will not work as expected. (POSIX.1 has no concept of `local` variables.)
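Passing by NAME can be sketched as follows. `set_by_name` is a hypothetical helper, not a suite command; it assumes the `C` locale for the character ranges and borrows the SYSEXITS value `EX_USAGE` (64) for its error status.

```sh
# Hypothetical helper: store $2 in the variable whose NAME is given as $1.
set_by_name() {
  # Accept only names made of letters, digits and underscores that do not
  # start with a digit, so arbitrary text never reaches eval.
  case $1 in
    '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*)
      echo 'invalid variable name' >&2
      return 64
      ;;
  esac
  eval "$1=\$2"
}

set_by_name Result 'hello world'   # pass the NAME: Result, not $Result
echo "$Result"                     # prints "hello world"
```

Because the assignment happens through `eval` in the current shell, `Result` is a global variable once the helper returns.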
The BetterScripts POSIX Suite is supported in any environment that is compatible with the "command interpretation services and common utility programs" as defined in the "Shell and Utilities" volume of the POSIX.1-2008 standard.
The number of environments that are at least partially POSIX.1 compliant is enormous - even if it were possible to test all of them, access to many is difficult as they are tied to proprietary/specialist systems. Therefore, much of the compatibility work for the Suite is based on resources such as "autoconf: Portable Shell Programming". While such resources are incredibly useful, they often omit details such as the specific platforms for which problems occur, or even dates for when the problem was discovered or last seen. The result is that it is highly likely some of the workarounds implemented are unnecessary.[^3]
Additionally:
- The POSIX.1 standard has remained relatively consistent between versions (as relates to functionality required by the Suite). Although the POSIX.1-2008 version of the standard is the reference version used for creating the Suite, it is likely that earlier versions will also be supported.
- Non-compliant shells and utilities may be supported by specific libraries, or specific commands within those libraries.
- A shell and/or utilities which are not supported may still be able to make use of the Suite - any such tool is termed compatible. The difference between shells and utilities which are supported and those which are compatible is that any erroneous behavior specific to the latter is not technically a bug and unlikely to be addressed.
- Commands are designed to be functionally equivalent regardless of the value of any of the standard specified shell options (e.g. `errexit`, `nounset`, etc).
- Where a shell or utility is known to deviate from the functionality required by a library a work-around may be provided if it is relatively simple, performant, and can be scoped to only affect library commands.
- Some common, but non-standard functionality is supported, for example, "restricted" shells.
- Tests for suite libraries are provided along with a test harness in which they are run. These are not intended to determine platform support, but are primarily for regression testing. Additionally, the test harness, while POSIX.1 compliant, may require a more capable platform than that of individual libraries. Still, if tests run successfully for a specific platform it is likely the platform will be fully supported.
The Suite has been tested in multiple operating systems including Ubuntu, Oracle Solaris, FreeBSD, OpenBSD, and Windows Subsystem for Linux.
Multiple implementations of "Shells and Utilities" have also been tested including: `sh`[^4], `bash`, `busybox`, `dash`, `ksh88`, `ksh93`, `mksh`, `oksh`, `pdksh`, `posh`, `yash`, and `zsh` (including "restricted" versions of these shells where known to exist) - all shells are tested in "default" mode along with any POSIX.1 compatibility mode.

Various implementations of utilities have also been tested.
- Libraries have been written to maximize performance without sacrificing configurability, safety or utility - with a general philosophy of "you don't pay for what you don't use".
- For most use cases library performance should not be an issue and will likely be far outweighed by other factors.[^5]
- Where library performance is an issue, configuration of each library can have a significant effect on performance. Where configuration is known to affect performance, this is noted.
- The most significant factor in the performance of any library is the specific external commands used by the library:
  - The shell used is the single most significant factor, for example, `bash` is highly user friendly and provides many advanced tools beyond those required by the standard, however the much less well specified `dash` performs significantly better for all suite libraries.
  - Utilities like `sed`, `grep`, `awk`, etc. are available in multiple implementations, each of which has its own performance characteristics.
- Many libraries in the Suite provide emulated versions of data structures that are not normally available. These are stored in a standard shell variable which is manipulated using standard utilities or the shell command language. Performance of these data structures is highly dependent on the size of the data stored. Although implementation dependent, shells tend to be optimized for processing short strings; with strings that may be hundreds or thousands of characters long, performance can rapidly decrease.
The tools and libraries in the Suite are subject to the limitations imposed by the particular environment in which they are invoked. Each implementation of the required utilities and command execution environment will have specific limitations that may be different to those in another implementation and may change between versions of the same utilities.[^6]
For most use cases it is deemed unlikely that these limitations will be an issue, however, there will be cases where some limitations may cause problems.
It is impossible to determine all the possible limitations that may exist or may be of issue (even when considering only those specified in the standard). However, of the known limitations, perhaps the most likely to be encountered across multiple libraries from the Suite is the command line length limit, which can be encountered in a number of scenarios, and in unexpected ways.
The standard specifies this as `{ARG_MAX}` and defines it as:

> The number of bytes available for [a] new process' combined argument and environment lists... It is implementation-defined whether null terminators, pointers, and/or any alignment bytes are included in this total.
The value for any particular environment can be queried using the command `getconf ARG_MAX`, though this value can only be used as a guide since it is impossible to know in advance how many bytes any command will require. For a modern system this value can be several million bytes, while on older systems it can be significantly less.
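A quick way to put the limit in context is to compare it with the size of the current environment, as in this sketch. The variable names are invented, and the environment figure is only an approximation since the accounting of terminators and pointers is implementation-defined.

```sh
# Query the limit for the current environment.
ArgMax=$(getconf ARG_MAX)
echo "ARG_MAX: $ArgMax bytes"

# Approximate size of the exported environment, which counts towards
# the same limit as the arguments of any command that is invoked.
EnvBytes=$(env | wc -c)
echo "environment: approximately $EnvBytes bytes"
```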
It is possible to increase the available command line length for commands by, for example:
- reducing the number (and size) of exported variables;
- avoiding characters that use more than a single byte.
Importantly, any variable which is exported and also has its contents used as an argument to a command will count TWICE towards this limit.
It is recommended that variables containing library data are not exported.
All libraries require a number of internal commands and variables to provide their functionality. These are distinguishable from other values by a prefix: commands are prefixed with `fn_bs_`, while variables are prefixed with `g_BS_`, `c_BS_`, or `i_BS`.
These are strictly for internal usage and must not be invoked or referenced outside the library to which they belong.
Footnotes

[^1]: While it would be possible to set some environment variables to the required values when needed by a specific library (e.g. setting the `POSIX` locale), this is not always easy to do while avoiding changing the state for the invoker and maintaining performance. Setting all variables as part of a command might be possible in many cases, but would require huge lines of code for each command, and setting variables may not even be possible (e.g. standard variables may be `readonly` in a restricted shell, while utilities like `env` cannot be used for shell builtins). Finally, setting these variables to the expected value assumes that other values do not work, which may not be true and may make some uses of the libraries impossible without any real need.

[^2]: Technically, since the default configuration of Z Shell is non-standard it is not supported by the suite, however this work-around is provided since it can be easily scoped, does not notably affect performance, and causes no issues with other environments. Similar work-arounds for other environments are not always possible (e.g. the GNU specific `POSIXLY_CORRECT` environment variable cannot so easily be dealt with).

[^3]: Legacy systems and software can often be found in older organizations, especially where the organizations (and hence the associated systems) are somewhat specialized. While many of these systems are no longer actively maintained by the original manufacturers, they continue to be used. Even those systems that are still maintained may contain long obsolete software. For example, Oracle Solaris 11.4 shipped in September 2023, yet contains a version of `ksh88` (i.e. the 1988 version of KornShell) - although this shell is largely similar to more modern shells it does deviate somewhat; here it simply serves as an example of how even maintained systems can continue to support very old software. Where practicable, the Suite is intended to support all such systems.

[^4]: While `sh` is often simply a link to another shell, this is not always true - for some platforms it is a unique shell.

[^5]: Initially there was an alternative version of each library which excluded as much of the main version as possible, while leaving most functionality intact, with the intention being that these versions would give the absolute best performance possible. This plan was ultimately abandoned as the resulting gains were surprisingly small despite the significant reduction in file sizes and customization - most of the expensive computation is unavoidable.

[^6]: Many of these limitations are specified in the standard, with specific constraints, though generally the actual value is "implementation defined".