Many people hate boilerplate. Sometimes they create whole new DSLs/prog.langs to avoid it. While that is not invalid, generating from a template and automatically compiling or otherwise processing the expansion is also valid and about 100X easier { in units of highly objective effort-ons ;-) }. It may also require less language learning from users & allow easier access to 3rd party libs.
Part of the culture of Nim is to be sufficiently general purpose/nice enough to
use the language everywhere - as its own macro language, for compiler config via
NimScript .nims files, etc. But command prompts are still a thing. Sometimes
you want to easily just run some calculation on some data in a file. Applying
insights of the first paragraph, we get something like rp in ~100 non-blank,
non-comment LoC (with an extra auto-from-CSV-headers named-field extension!).
The rp --help usage message (slightly reformatted) covers the basics.
rp [optional-params] Nim stmts to run (guarded by `where`); none => echo row
Gen & Run prelude,fields,begin,where,stmts,epilog row processor against input.
Defined within where & every stmt are:
s[fieldIdx] & row give MSlice ($ to get a Nim string)
fieldIdx.i gives a Nim int, fieldIdx.f a Nim float.
nf & nr (like AWK); NOTE: fieldIdx is 0-origin.
A generated program is left at outp.nim, easily copied for "utilitizing". If
you know AWK & Nim, you can learn rp FAST.
Examples (most need data):
seq 0 1000000 | rp -w'row.len<2' # Print short rows
rp 'echo s[1]," ",s[0]' # Swap field order
rp -b'var t=0' t+=nf -e'echo t' # Print total field count
rp -b'var t=0' -w'0.i>0' t+=0.i -e'echo t' # Total >0 field0 ints
rp 'let x=0.f' 'echo (1+x)/x' # cache field 0 parse
rp -d, -fa,b,c 'echo s[a],b.f+c.i.float' # named fields (CSV)
rp -mfoo echo\ s[2] # column of row matches
rp -pimport\ stats -bvar\ r:RunningStat r.push\ 0.f -eecho\ r
Add niceties (eg. import lenientops) to prelude in ~/.config/rp.
Options:
-p=, --prelude= strings {} Nim code for prelude/imports section
-b=, --begin= strings {} Nim code for begin/pre-loop section
-w=, --where= string "true" Nim code for row inclusion
-m=, --match= string "" row must match regex (IF -w="true")
-e=, --epilog= strings {} Nim code for epilog/end loop section
-f=, --fields= string "" delim-sep field names (match row0)
-g=, --genF= string "$1" make field names from this fmt; eg c$1
-n=, --nim= string "nim" path to a nim compiler (>=v1.4)
-r, --run bool true Run at once using nim r .. < input
-a=, --args= string "" "": -d:danger; '+' prefix appends
-c=, --cache= string "" "": --nimcache:/tmp/rp (--incr:on?)
-v=, --verbose= int 0 Nim compile verbosity level
-o=, --outp= string "/tmp/rpXXX" output executable; .nim NOT REMOVED
-s, --src bool false show generated Nim source on stderr
-i=, --input= string "/dev/stdin" path to mmap|read as input
-d=, --delim= string "white" inp delim chars; Any repeats => fold
-u, --uncheck bool false do not check&skip header row vs fields
-M=, --MaxCols= int 0 max split optimization; 0 => unbounded
-W=, --Warn= string "" "": --warning[CannotOpenFile]=off
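The "gen & run" idea behind that usage message can be sketched in a few lines of Nim. This is only an illustration under stated assumptions: `genRowProgram`, `pasteAll`, and the skeleton they emit are hypothetical names and layout, not rp's actual template (which uses MSlice, mmap, and more options).

```nim
import std/strutils

proc pasteAll(snippets: seq[string]; indentBy: int): string =
  ## Join snippets one per line, each indented by `indentBy` spaces.
  for snip in snippets:
    result.add indent(snip, indentBy) & "\n"

proc genRowProgram(prelude, beginCode: seq[string]; where: string;
                   stmts, epilog: seq[string]): string =
  ## Fill a fixed row-loop skeleton with user-supplied Nim fragments.
  ## The skeleton is an illustrative assumption, not rp's real template.
  let body = if stmts.len > 0: stmts else: @["echo row"]  # none => echo row
  result.add pasteAll(prelude, 0)
  result.add "import std/strutils\n"
  result.add pasteAll(beginCode, 0)
  result.add "var nr = 0\n"
  result.add "for row in stdin.lines:\n"
  result.add "  inc nr\n"
  result.add "  let s = row.splitWhitespace\n"
  result.add "  let nf = s.len\n"
  result.add "  if not (" & where & "): continue\n"
  result.add pasteAll(body, 2)
  result.add pasteAll(epilog, 0)

when isMainModule:
  # Roughly the source for: rp -b'var t=0' t+=nf -e'echo t'
  stdout.write genRowProgram(@[], @["var t = 0"], "true",
                             @["t += nf"], @["echo t"])
```

Writing that string to a file and handing it to `nim r` is all that remains, which is why this route is so much less work than a new interpreter.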
Corresponding to our first 7 examples (with the last two in swapped order) are
these awk commands:
seq 0 1000000 | awk 'length<2' # (17 v. 15) short rows
awk '{print $2," ",$1}' # (25 v. 29) Swap fields
awk '{t+=NF}END{print t}' # (32 v. 29) total fields
awk '{if($1>0)t+=$1}END{print t}' # (44 v. 35) Total >0 field0
awk '{x=0+$1;print(1+x)/x}' # (32 v. 33) cache parse
awk '/foo/{print $3}' # (19 v. 24) column of row matches
awk -F, 'BEGIN{a=1;b=2;c=3}{print $a,$b+$c}' # (41 v. 51) named CSV fields
The numbers in ()s are (rp v. awk key presses) counting SHIFTs, totaling 210
for rp vs. 216 for awk. That is with minimal SHIFT use (ie. one SHIFT down to
enter "}END{") for US keyboard layouts. awk needs much more shifting and
minimal ways may feel "unnatural" (most folks I know would not stay shifted
through that "}END{" sequence). Press counts for the Nim eg.s can also be
better trimmed with \ instead of '.[1] So, even biased towards awk a couple
ways, awk is still more program entry work (barely).
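As a quick check of the arithmetic, the per-example counts can be summed in Nim. The pairs are copied from the comments on the awk commands; the `presses` and `totals` names are just for this sketch.

```nim
# (rp, awk) key presses per example, copied from the annotated awk commands.
const presses = [(17, 15), (25, 29), (32, 29), (44, 35),
                 (32, 33), (19, 24), (41, 51)]

proc totals(counts: openArray[(int, int)]): (int, int) =
  ## Sum each side of the (rp, awk) pairs.
  for c in counts:
    result[0] += c[0]
    result[1] += c[1]

when isMainModule:
  let t = totals(presses)
  echo "rp: ", t[0], "  awk: ", t[1]   # rp: 210  awk: 216
```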
The point of key press analysis is A) to observe that any CLI is already "part
DSL" - language boundaries can ease syntax requirements on both & B) to just
ballpark interactive ergonomics in a "quickly write a pipeline stage in-line"
mode.[2] Repeated constructs can & should surely be saved in files /
abstracted. Nim shines brighter then (as would most prog.langs with abstraction
& ecosystems beyond awk). Said shining manifests in the example using the
std/stats.RunningStat type to get stats, which would require much more work in
awk.
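For reference, here is roughly what the RunningStat one-liner from the usage message amounts to as standalone Nim. It is a sketch, not rp's generated code: `columnStats` is a hypothetical helper that takes lines as a parameter (rather than reading stdin) and splits on whitespace with std/strutils instead of MSlice.

```nim
import std/[stats, strutils]

proc columnStats(lines: openArray[string]; col = 0): RunningStat =
  ## Push the float in whitespace-separated column `col` of every line.
  for line in lines:
    let fields = line.splitWhitespace
    if col < fields.len:
      result.push parseFloat(fields[col])

when isMainModule:
  let rs = columnStats(["3.5 a", "1.5 b", "4.0 c"])
  echo rs.n, " values, mean ", rs.mean, ", sd ", rs.standardDeviation
```

Getting the same running mean and standard deviation in awk means hand-writing the accumulation, which is the extra work referred to above.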
There are easy ideas to round out functionality whose value depends upon your
use case. It may be nice to automatically open files like print 1 > "myPath"
does in awk, for example, with some tiny autorp.nim module like this added as
an import in ~/.config/rp as prelude = "import autorp":
import std/[re, tables], cligen/mslice; export re
# NOTE: more imports => slower compiles
template af(nm, openMode) = # Auto-File tables
  var `nm Tab`: Table[string, File]
  proc nm*(fNm: string): File =
    try: `nm Tab`[fNm] except: (let f = open(fNm, openMode); `nm Tab`[fNm]=f; f)
af ar, fmRead; af aw, fmReadWrite; af aa, fmAppend
var cpTab: Table[string, Regex] # cached pattern: compiled expression
proc cp*(pat: string): Regex =
  try: cpTab[pat] except: (let f = pat.re; cpTab[pat] = f; f)
You can then just say, with 54 keydowns (including "rp ")[3]:
printf "%s\n" "brown bread mat hair 42" \
"blue cake mug shirt -7" \
"yellow banana window shoes 3.14" |
rp -w'row=~cp"w"' '"myPath".aw.write 4.f,"\n"'
where only table lookups occur on a per-row basis. It is, of course, more run-time efficient to spend 72 keydowns and elide those lookups instead:
rp -b'let p=cp"w";let o=aw"myPath"' -wrow=\~p 'o.write 4.f,"\n"'
About all that is lost via this approach vs. a new PL is fast interpreter
start-up time and automatic lifting of Table lookups into variables.
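The difference between the two commands can be sketched without any regex dependency. In this illustration `cp` caches a plain string as a stand-in for a compiled Regex and matching is a substring test from std/strutils; both are simplifying assumptions, and `builds`, `matchPerRow`, and `matchLifted` are hypothetical names.

```nim
import std/[strutils, tables]

var builds = 0                     # times the slow "compile" path ran
var cpTab: Table[string, string]   # stand-in for Table[string, Regex]

proc cp(pat: string): string =
  ## Return the cached "compiled" pattern, building it on first use only.
  if pat notin cpTab:
    inc builds
    cpTab[pat] = pat               # placeholder for an expensive pat.re
  cpTab[pat]

proc matchPerRow(rows: openArray[string]): int =
  ## Per-row style: one hash lookup for every row.
  for row in rows:
    if cp("w") in row: inc result

proc matchLifted(rows: openArray[string]): int =
  ## Lifted style: one lookup before the loop, as with -b'let p=cp"w"'.
  let p = cp("w")
  for row in rows:
    if p in row: inc result

when isMainModule:
  let rows = ["brown bread mat hair 42", "blue cake mug shirt -7",
              "yellow banana window shoes 3.14"]
  echo matchPerRow(rows), " ", matchLifted(rows), " ", builds
```

Either way the pattern is built once; the lifted form only saves the hashing per row, which is the step a purpose-built language could do automatically.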
As another example, the begin section, which has the split columns s in scope,
can also be used with ~/.config/rp to extend the generated template language
with new (probably terse) column-index-keyed procs, such as D1 to parse some
"DateTime" in "common format #1". Since --begin is a Strings sequence,
~/.config/rp code is injected first & in-order. Since rp uses mergeCfgEnv you
can do RP_CONFIG=x rp .. to try new configs, have other "dialects", etc.
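A column-index-keyed proc of that kind might look like the sketch below. Everything specific here is an assumption: the timestamp format chosen for "common format #1", the UTC zone, and the global `s`, which stands in for rp's split columns (real rp fields are MSlice, so `$s[col]` would be needed there).

```nim
import std/times

var s: seq[string]  # stand-in for the split columns rp keeps in scope

proc D1(col: int): DateTime =
  ## Parse column `col` as an ISO-like timestamp in UTC.
  ## The format string is an assumed "common format #1".
  parse(s[col], "yyyy-MM-dd'T'HH:mm:ss", utc())

when isMainModule:
  s = @["login", "2021-03-04T05:06:07"]
  let d = D1(1)
  echo d.year, " ", d.month, " ", d.monthday
```

Placed in a begin entry of ~/.config/rp, such a proc would be available to every where clause and statement without retyping it.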
Populating global namespaces with such features has trade-offs, in compile-time
duration if nothing else. So, it is deferred to ~/.config/rp authors.
With some nim.cfg --path tweaks and ~/.config/rp hacking, you can cover any awk
use case with a fully strongly type-checked, compilable prog.lang with terse
but general syntax. You can also use this as a prototyping environment, copying
generated code away from /tmp/rp*.nim to be the basis for new, standalone
programs (yes, with a cligen/[mfile,mslice] dependency as currently written),
probably instead compiled slowly with full optimizations.[4]
Some more discussion is here, which inspired Ben to write Prig and an article about it that is discussed (at least) here.
Few prog.langs have both easy to enter/terse expressions & fast compiles. For a
comparison point, see crp.md/crp.nim in this repo, which uses C for the
base-code language, or Ben's Go examples.
Footnotes
1. Down to 16+25+30+41+30+19+41=202 for rp. Similar \-optimizing for awk saves only 1 stroke at 215, for about 6% less key press work than awk. But sure, there may be shells not needing braces protected, single quotes need less "inline thought" than backslash, and 7 presses come just from the length of "rp" vs. "awk". Even so, it becomes a hard case to make that awk's syntax optimization saves much, but easy to argue it restricts libs & perf.
2. Ben's prig article (linked later) uses chars, not key presses. Visual length is easier to measure and a more appropriate metric for code reading vs. key presses for code entry. Which matters more all depends. Entry seems more common when selling 1-liners in my experience. Once one considers shell history / command edit analysis, finger reach/strain/etc., comparison gets complex fast. E.g., Caps-Lock can be a thing. Better methodology might start with X event logging over long, realistic sessions and real-time metrics use, but things then become rather user-idiosyncratic and you need pools of users.
3. Equivalent awk might be awk '/w/{print$5>"myPath"}' - only 33 keydowns, winning by quite a bit in this fancier case, at the cost of its bespoke syntax.