Performs frequency analysis of "words" in directories and files.
Most of the code has been copy-pasted from myself, lol
This needs a Rust toolchain. Recommended command, assuming you've downloaded and `cd`ed into the repo:

```sh
cargo install --path . --config 'build.rustflags="-C target-cpu=native"'
```
Invoke the program by passing the paths whose stats you want to get:

```sh
# example
# non-UTF8 paths are supported
fr3 file.txt directory/
```

```
file.txt
test 3
bruh 1
directory/
the 64
wiki 12
yeet_3 6
```

Or simply pass nothing, if you want stats about the working directory; identical to `fr3 .`
> [!IMPORTANT]
> The default regex is for human use only. It'll always be API-unstable, as it can change across patch versions, and maybe even across runs of the same version!
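If you call `fr3` from scripts, you can sidestep that instability by passing an explicit `--re` pattern. The pattern below is a hypothetical example, not `fr3`'s actual default; its matches can be previewed with plain `grep`:

```sh
# Pin the word definition explicitly (example pattern, NOT fr3's default):
#   fr3 --re '[A-Za-z0-9_]+' notes.txt
# Preview what that same pattern would match, using grep as a stand-in:
printf 'yeet_3, bruh!\n' | grep -oE '[A-Za-z0-9_]+'
# prints "yeet_3" and "bruh" on separate lines
```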
> [!NOTE]
> Tie-breaking is unspecified. That is, if 2 or more words have the same count, they will be printed in arbitrary order within their partition.
> I'm considering sorting ties lexicographically.
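Until then, you can make ties deterministic yourself by post-processing; a sketch assuming plain `word count` lines as in the examples here:

```sh
# Sort by count (field 2, numeric, descending), breaking ties
# lexicographically by word (field 1).
printf 'test 3\nbruh 3\nyeet 1\n' | sort -k2,2nr -k1,1
# prints: bruh 3, test 3, yeet 1 (each on its own line)
```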
You can define what a "word" is by passing a regex:

```sh
fr3 --re '[^,\n\r]+' table.csv
```

```
table.csv
my record 3
joe 2
zayda 1
1 1
0 1
2 1
id 1
name 1
column 1
```

Prepend `(?i)` for a case-insensitive regex. Note that the counting is always case-sensitive:
```sh
fr3 --re '(?i)\bthe\b' prose.md
```

```
prose.md
the 6
The 3
THE 1
```

The output format, while rudimentary, is mostly unambiguous. The only way (that I know of) it can be ambiguous is if your words contain `\n`, but that's just asking for trouble, so please let me pretend `\n` doesn't exist :3
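Since words may contain spaces (see the `table.csv` example above), the robust way to parse a stats line is to treat only the final whitespace-separated field as the count. A sketch, assuming the `word count` line shape shown in the examples:

```sh
# Split each stats line into word and count, taking the count from the
# final field so that words containing spaces survive intact.
printf 'my record 3\njoe 2\n' |
awk '{count=$NF; sub(/[[:space:]]+[^[:space:]]+$/, ""); print count "\t" $0}'
# prints the count, a tab, then the full word
```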
Here's one last example usage; list your top 8 most-used commands:

```sh
fr3 -r.+ ~/.bash_history -strue | head -8
# equivalent to
sort ~/.bash_history | uniq -c | sort -nr | head -8
```

Note the `-strue` (`--sort=true`): this is needed because pipes aren't TTYs.
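The classic pipeline it mirrors can be tried on any sample data, not just `~/.bash_history`:

```sh
# sort groups duplicate lines together, uniq -c counts each run,
# and sort -nr orders the counts descending.
printf 'ls\ngit status\nls\nls\ngit status\ncd ..\n' |
sort | uniq -c | sort -nr
# "3 ls" comes first (uniq -c left-pads the counts with spaces)
```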
This program is single-threaded, as it's IO-bound.
I'm considering adding support for matching delimiters by regex (i.e., splitting on a pattern instead of matching words).