This repository contains a collection of Bash functions that I have accumulated over the years. They are small convenience tools intended to shorten repetitive or cumbersome command-line workflows and to improve the interactive user experience.
The functions target tasks commonly encountered during day-to-day work on HPC systems, such as copying, renaming, or inspecting large numbers of files, often with lightweight progress or summary feedback. Each function is documented below with a brief description and an example use case.
- These functions are intended for interactive use in Bash on Linux-based HPC systems.
- They rely on GNU core utilities and may not behave identically on macOS or BSD systems.
- As with any bulk file operation, it is recommended to test on a small subset first.
- Note: commands that touch large numbers of files may generate substantial metadata traffic on shared filesystems.
Nothing to install. Just grab the script if you like what it does and put it in your bash profile.
These functions are provided as-is and are meant to be copied, adapted, or modified to suit local environments.
pfuns
Lists all functions detected in the bash profile along with the line they start on.
Works for all functions, not just the ones here.
--- Functions found in /home/user/.bash_profile ---
concatenate (line 140)
cpp (line 188)
dirdiff (line 233)
filetree (line 292)
glimpse (line 314)
histgrep (line 396)
lines (line 449)
math (line 528)
max_width (line 561)
note (line 596)
pfuns (line 610)
rename_pattern (line 661)
rm_but (line 709)
rm_top (line 777)
sizesum (line 803)
treesize (line 826)
update_timestamps (line 877)
----------------------------------------------------
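A listing like the one above can be approximated with `grep -n` and `sed`. This is a minimal sketch, not the repository's code: the name `pfuns_sketch` is hypothetical, and it assumes every definition starts at the beginning of a line as `name() {` or `function name() {`.

```shell
# Hypothetical sketch, not the repository's pfuns.
# Lists function names and the line each definition starts on.
# Assumes definitions look like `name() {` or `function name() {`
# and begin in the first column.
pfuns_sketch() {
    local file="${1:-$HOME/.bash_profile}"
    echo "--- Functions found in $file ---"
    # grep -n prefixes each match with "<line>:"; sed keeps name and line.
    grep -nE '^(function[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*[[:space:]]*\(\)' "$file" |
        sed -E 's/^([0-9]+):(function[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*).*/\3 (line \1)/'
}
```

Calling `pfuns_sketch ~/.bashrc` would scan a different file; definitions written without parentheses are not detected by this pattern.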
Concatenate multiple text files, inserting a separator between them.
Usage: concatenate [-s <separator>] <file> [file ...]
Options:
-s Separator characters
Example: concatenate -s "\n\n" *.sh > all.sh
Defaults to a single newline. E.g., can be used to paste all bash scripts in this repo together:
bash concatenate.sh
concatenate *.sh > all_funs.sh
concatenate ~/.bash_profile all_funs.sh > ~/.bash_profile_new
Copy a large number of files with a progress bar.
Usage: cpp <source> <destination>
Example: cpp ../*.fa tmp/
It does not do byte-tracking for single large files.
Simply displays a progress bar for the copying process.
cpp ../*.fa tmp/
[#........................................] 73/20339
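The per-file progress idea can be sketched as a loop that copies one file at a time and redraws a fixed-width bar. The name `cpp_sketch`, the 40-character width, and the choice to draw the bar on stderr are assumptions made for illustration; the actual function may work differently.

```shell
# Hypothetical sketch, not the repository's cpp.
# Copies files one by one and redraws a 40-character progress bar.
# The last argument is the destination; all others are source files.
cpp_sketch() {
    local total=$(( $# - 1 )) done_n=0 width=40 dest src filled bar
    [ "$total" -ge 1 ] || { echo "usage: cpp_sketch <source>... <destination>" >&2; return 1; }
    for dest in "$@"; do :; done               # dest ends up as the last argument
    for src in "$@"; do
        [ "$done_n" -eq "$total" ] && break    # do not copy the destination itself
        cp -- "$src" "$dest" || return 1
        done_n=$(( done_n + 1 ))
        filled=$(( done_n * width / total ))
        bar=$(printf '%*s' "$filled" '' | tr ' ' '#')$(printf '%*s' "$(( width - filled ))" '' | tr ' ' '.')
        printf '\r[%s] %d/%d' "$bar" "$done_n" "$total" >&2
    done
    printf '\n' >&2
}
```

Because progress is counted per file, a single very large file still shows no movement while it is being copied, which matches the limitation described above.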
Compare the contents of two directories.
Usage: dirdiff [-f] <dir1> <dir2>
Options:
-f Ignore file extensions; compare only basenames
Output semantics:
- <file> present only in <dir1>
+ <file> present only in <dir2>
The first argument is always the reference directory.
E.g., if tmp1 contains file1 and file2, and tmp2 contains file2 and file3, then:
dirdiff tmp1 tmp2
- file1
+ file3
i.e., compared to tmp1, tmp2 lacks file1 but has file3.
Can be more useful in certain cases with the -f flag, which ignores file extensions.
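One way to get this kind of output is to compare two sorted name lists with `comm`. The sketch below is an assumption about how such a comparison could be written, not the repository's implementation; `dirdiff_sketch` and `_dirdiff_names` are hypothetical names, and the `-f` handling simply strips the last extension from each name.

```shell
# Hypothetical sketch, not the repository's dirdiff.
# Prints the sorted, de-duplicated entry names of a directory.
# $1 = directory, $2 = 1 to strip the last file extension.
_dirdiff_names() {
    if [ "$2" = "1" ]; then
        ls -1 "$1" | sed 's/\.[^.]*$//'
    else
        ls -1 "$1"
    fi | sort -u
}

dirdiff_sketch() {
    local strip=0 a b
    if [ "$1" = "-f" ]; then strip=1; shift; fi
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    _dirdiff_names "$1" "$strip" > "$a"
    _dirdiff_names "$2" "$strip" > "$b"
    comm -23 "$a" "$b" | sed 's/^/- /'    # names only in the first directory
    comm -13 "$a" "$b" | sed 's/^/+ /'    # names only in the second directory
    rm -f "$a" "$b"
}
```

With `-f`, `x.fa` in one directory and `x.aln` in the other count as the same entry, so nothing is reported for them.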
Print a simple tree-like view of a directory.
Usage: filetree <directory>
Example: filetree .
filetree src
filetree tmp1
tmp1
|____file1
|____file2
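A view in this style can be produced by rewriting the paths that `find` prints. The following is a small sketch under that assumption; `filetree_sketch` is a hypothetical name, and the output is only tidy when the directory is given as a relative path.

```shell
# Hypothetical sketch, not the repository's filetree.
# Turns each leading path component printed by find into an indent marker.
# Intended for relative paths such as "." or "tmp1".
filetree_sketch() {
    find "${1:-.}" -print | sort |
        sed -e 's;[^/]*/;|____;g' -e 's;____|; |;g'
}
```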
This function is only relevant if you work in R.
dplyr::glimpse() is a very natural way of inspecting rectangular data with headers.
The bash function glimpse () calls data.table::fread() on the data and passes it to dplyr::glimpse() for display.
Preview the structure of a delimited text file via dplyr::glimpse().
gz files are supported.
Usage: glimpse [-h] [-n <rows>] <file>
Options:
-h Pass this flag for headerless files
-n Number of rows to parse through to get
to the data (should be > comments + any header)
Example: glimpse data.txt
glimpse -h data_no_header.csv
glimpse -n 5 data_with_comments.tsv.gz
glimpse gencc_2025-11-06.tsv
Rows: 1
Columns: 15
$ uuid <chr> "GENCC_000101-HGNC_10896-OMIM_182212-HP_0000006-GENCC_100001"
$ gene_curie <chr> "HGNC:10896"
$ gene_symbol <chr> "SKI"
$ disease_title <chr> "Shprintzen-Goldberg syndrome"
$ disease_original_curie <chr> "OMIM:182212"
$ classification_title <chr> "Definitive"
$ moi_curie <chr> "HP:0000006"
$ moi_title <chr> "Autosomal dominant"
$ submitted_as_date <dttm> 2018-03-30 13:31:56
$ submitted_as_public_report_url <lgl> NA
$ submitted_as_notes <lgl> NA
$ submitted_as_pmids <lgl> NA
$ submitted_as_assertion_criteria_url <chr> "PMID:\302\24028106320"
$ submitted_as_submission_id <int> 1034
$ submitted_run_date <IDate> 2020-12-24
Search shell history with optional negative filtering.
Usage: histgrep <pattern> [OPTIONS]
Options:
-v, --invert-match Invert match
--not <pattern> Exclude matches
Example: histgrep foldx
histgrep slurm --not sbatch
Make sure to add this line to ~/.bashrc for the date-time format to work:
export HISTTIMEFORMAT="%Y-%m-%d %T "
histgrep singularity --not module
2025-12-18 12:57:10 singularity --version
2025-12-18 12:57:19 which singularity
Print ranges or lists of lines from a file.
Usage: lines [-l] <range|list> <file>
Options:
-l Show line numbers
Examples: lines 5-10 file.txt
lines 1:3 file.txt
lines -l 2,4,7,9,32 file.txt
lines -l 3-5 ~/.bash_profile
3 if [ -f ~/.bashrc ]; then
4 . ~/.bashrc
5 fi
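The range and list syntax maps naturally onto `sed -n` print commands. The sketch below shows one possible translation; `lines_sketch` is a hypothetical name, and using `nl -ba` for the `-l` numbering is an assumption, so the spacing of the numbers may differ from the real output.

```shell
# Hypothetical sketch, not the repository's lines.
# Translates "5-10", "1:3" or "2,4,7" into a sed print script.
lines_sketch() {
    local number=0 expr
    if [ "$1" = "-l" ]; then number=1; shift; fi
    # Accept only digits plus the separators , - :
    if printf '%s' "$1" | grep -q '[^0-9,:-]'; then
        echo "invalid range or list: $1" >&2
        return 1
    fi
    # "2,4,7" becomes "2p;4p;7p"; "5-10" or "5:10" becomes "5,10p"
    expr=$(printf '%s' "$1" | sed -e 's/,/p;/g' -e 's/[-:]/,/g')p
    if [ "$number" -eq 1 ]; then
        nl -ba "$2" | sed -n "$expr"      # number every line, then select
    else
        sed -n "$expr" "$2"
    fi
}
```

Mixed specifications such as `1-3,7` also work with this translation, since each comma-separated item becomes its own print command.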
Perform basic arithmetic in one line.
Usage: math <expression>
Example: math 3+2+1^4.5
math "(3+2-1)^4.5" # use quotes when expression has brackets
Operators: +, -, * (or x), / (or :), ^ (or **)
This is a primitive wrapper around bc and awk.
I'm just very used to using R as a calculator, and I often miss this functionality.
math 3+2+2^5+1^42*12/3.14
40.8217
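The wrapper idea can be illustrated with `awk` alone: translate the alternative operators, check the characters, and let `awk` evaluate the result. This is a sketch under stated assumptions, not the actual function, which is described as using both `bc` and `awk`; `math_sketch` is a hypothetical name.

```shell
# Hypothetical sketch, not the repository's math (which also uses bc).
# Evaluates an arithmetic expression with awk.
math_sketch() {
    local expr
    # Map the alternative operators onto awk's: ** -> ^, x -> *, : -> /
    expr=$(printf '%s' "$*" | sed -e 's/\*\*/^/g' -e 's/x/*/g' -e 's|:|/|g')
    # Refuse anything except numbers, operators, parentheses and spaces,
    # because the expression is pasted into an awk program.
    if [ -z "$expr" ] || printf '%s' "$expr" | grep -q '[^0-9.+*/^() -]'; then
        echo "unsupported expression: $*" >&2
        return 1
    fi
    awk "BEGIN { print $expr }"
}
```

The character check matters because the expression becomes part of the `awk` program text; without it, arbitrary `awk` code could be injected through the argument.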
Report the maximum line width in a text file and where it occurs.
Usage: max_width <file>
Example: max_width query.fasta
max_width A0A024R1R8.fa
max width: 228 on line: 1297
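A report like this takes a single pass over the file in `awk`. The sketch below is one possible way to write it; `max_width_sketch` is a hypothetical name, and when several lines share the maximum width it reports the first one.

```shell
# Hypothetical sketch, not the repository's max_width.
# Tracks the longest line seen so far and the line number it occurred on.
max_width_sketch() {
    awk 'length($0) > max { max = length($0); line = NR }
         END { printf "max width: %d on line: %d\n", max, line }' "$1"
}
```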
Create a note file typed verbatim.
Usage: note README.txt
Exit: Ctrl-D
I found this useful when working in a directory and wanting to quickly jot down notes.
Not substantially faster than using an editor, but occasionally convenient (and fun).
Replace a pattern in file names, or remove it when no replacement is given.
Usage: rename_pattern <pattern> [replacement]
Example:
rename_pattern ".clustal" ".fa" # replace .clustal with .fa
rename_pattern _draft # remove _draft from file names
rename_pattern ".clustal" ".fa"
[##.......................................] 230/20339
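The renaming step itself can be sketched with plain parameter expansion, leaving out the progress bar. The name `rename_pattern_sketch` is hypothetical, and the sketch assumes the pattern is a literal string, that only files in the current directory are renamed, and that only the first occurrence in each name is replaced.

```shell
# Hypothetical sketch, not the repository's rename_pattern (no progress bar).
# Replaces the first literal occurrence of a pattern in each matching name
# in the current directory, or removes it when no replacement is given.
rename_pattern_sketch() {
    local pat="$1" rep="${2:-}" f new
    [ -n "$pat" ] || { echo "usage: rename_pattern_sketch <pattern> [replacement]" >&2; return 1; }
    for f in *"$pat"*; do
        [ -e "$f" ] || continue                   # the glob matched nothing
        # text before the pattern + replacement + text after the pattern
        new="${f%%"$pat"*}${rep}${f#*"$pat"}"
        if [ -n "$new" ] && [ "$new" != "$f" ]; then
            mv -n -- "$f" "$new"                  # -n: never overwrite an existing file
        fi
    done
}
```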
Remove all files in the current directory except those matching one or more patterns.
Usage: rm_but [-d] <pattern1> [pattern2 ...]
Options:
-d Include directories
Example: rm_but *.txt
rm_but -d README
E.g., with file.aln, file.fa, file.muscle in a directory:
rm_but ".aln"
Keeping: file.aln
removed 'file.fa'
removed 'file.muscle'
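The keep-or-remove decision can be sketched as a loop over the current directory that checks each name against the given patterns. This is an illustrative assumption, not the repository's code: `rm_but_sketch` is a hypothetical name, patterns are treated as literal substrings, and the `-d` option is omitted.

```shell
# Hypothetical sketch, not the repository's rm_but (no -d option).
# Removes regular files in the current directory unless their name
# contains one of the given literal patterns.
rm_but_sketch() {
    local f p keep
    # Refuse to run without a pattern; otherwise everything would be removed.
    [ "$#" -ge 1 ] || { echo "usage: rm_but_sketch <pattern>..." >&2; return 1; }
    for f in *; do
        [ -f "$f" ] || continue
        keep=0
        for p in "$@"; do
            case "$f" in *"$p"*) keep=1 ;; esac
        done
        if [ "$keep" -eq 1 ]; then
            echo "Keeping: $f"
        else
            rm -v -- "$f"
        fi
    done
}
```

An unquoted call such as `rm_but_sketch *.txt` also works, because the shell expands the glob to the file names before the function sees them and each name then matches itself.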
Remove top N lines of a text file.
Usage: rm_top <n> <file>
Example: rm_top 1 file.txt > file_no_header.txt
Really only useful for removing headers...
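Dropping the first N lines is a one-liner with `tail`, which can start printing from a given line number. A minimal sketch, with `rm_top_sketch` as a hypothetical name:

```shell
# Hypothetical sketch, not the repository's rm_top.
# Prints the file starting at line n+1, i.e. without its first n lines.
rm_top_sketch() {
    tail -n +"$(( $1 + 1 ))" "$2"
}
```

Like the example above, this writes to standard output and leaves the input file unchanged.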
Calculate the total size of regular files in a directory (excluding subdirectories).
Usage: sizesum <path>
Example: sizesum .
It is useful when results (files) are being generated in a directory while other directories are present.
sizesum .
52.27 GB
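Summing only the regular files at the top level can be sketched with GNU `find` and `awk`. The name `sizesum_sketch` is hypothetical, and the 1024-based units and two-decimal format are assumptions chosen to resemble the output above.

```shell
# Hypothetical sketch, not the repository's sizesum.
# Sums the sizes of regular files directly inside a directory
# (requires GNU find for -printf) and prints a human-readable total.
sizesum_sketch() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\n' |
        awk '{ total += $1 }
             END {
                 split("B KB MB GB TB", unit, " ")
                 i = 1
                 # Assumption: 1024-based units
                 while (total >= 1024 && i < 5) { total /= 1024; i++ }
                 printf "%.2f %s\n", total, unit[i]
             }'
}
```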
Display directory sizes (human-readable) for specified paths.
Usage: treesize -d <max_depth> <path> [<path> ...]
Options: -d <max_depth> Limit depth of recursion (default 1)
Example: treesize -d 2 .
treesize -d 2 .
13.5 GB .
6.0 GB ./a_subset
4.0 GB ./b_subset
860.3 MB ./c_subset
91.7 MB ./d_subset
Update timestamps for all files in a directory and its subdirectories.
Usage: update_timestamps <directory>
Example: update_timestamps /tmp/scratch/
I found this useful when working in a scratch directory that gets wiped every few weeks and backup is not immediately possible. Running this command gives every file a fresh timestamp, which buys you some time. (Scratch spaces with such rules exist for a reason, so this is only for emergency scenarios.)
update_timestamps /tmp/scratch/
Timestamps updated for files in /tmp/scratch/ and its subdirectories.
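Refreshing timestamps recursively amounts to running `touch` on every regular file that `find` reports. A minimal sketch, with `update_timestamps_sketch` as a hypothetical name:

```shell
# Hypothetical sketch, not the repository's update_timestamps.
# Touches every regular file under the given directory.
update_timestamps_sketch() {
    # "-exec ... {} +" batches many files into each touch call
    find "$1" -type f -exec touch {} + &&
        echo "Timestamps updated for files in $1 and its subdirectories."
}
```

On shared filesystems this touches the metadata of every file, so on very large trees it may generate substantial metadata traffic.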