# Strings: Parameter substitution

This notebook describes basic string manipulation in Bash. Many of the string operations are actually
variations of *parameter expansion*. Not all of the parameter expansion operations are described in this
notebook. See https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html for the
official parameter expansion documentation.

### Length of a string

The length of a string `str` is given by `${#str}`:

In [1]:
str="cisc220"
n=${#str}
echo $n

7


### Substrings

`${str:offset}` expands to the substring of `str` starting at `offset` and going to the end of the string.

`${str:offset:length}` expands to the substring of `str` starting at `offset` and consisting of `length` characters.

In [7]:
s="abcdefghij"
echo "${s:0}"       # all characters starting from first
echo "${s:1}"       # all characters starting from second
echo "${s:2}"       # all characters starting from third

echo "${s:3:5}"     # 5 characters starting from fourth
echo "${s:3:4}"     # 4 characters starting from fourth
echo "${s:3:3}"     # 3 characters starting from fourth

echo "${s:1:0}"     # 0 characters (starting from second)

abcdefghij
bcdefghij
cdefghij
defgh
defg
def



`offset` can be negative, in which case it is counted back from the end of the string. A space must be inserted
in front of the negative offset, however, because `:-` otherwise introduces a different kind of parameter
expansion (the default-value substitution described under *Unset or empty strings*):

In [6]:
s="abcdefghij"
echo "${s: -1}"      # all characters starting from last
echo "${s: -2}"      # all characters starting from second last
echo "${s: -3}"      # all characters starting from third last

j
ij
hij


`length` can be negative, in which case it is treated as an offset from the end of the string rather than as a
character count. The expansion yields the substring starting at index `offset` and going up to, but not
including, the character at that end-relative position:

In [11]:
s="abcdefghij"
echo "${s:0:-1}"     # all characters starting from first going to but not including last
echo "${s:1:-2}"     # all characters starting from second going to but not including second last
echo "${s: -5:-3}"   # all characters starting from fifth last going to but not including third last

abcdefghi
bcdefgh
fg


Observe that the space is not required before a negative `length` because the interpreter can determine that the
sequence `:-` does not signify a substitution in this context (because at the point where the `:-` occurs the
interpreter already knows that we are in a substring expression).
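
As a small illustration of how the length and substring expansions combine, the following sketch splits a fixed-width date stamp into fields and then reverses a string one character at a time. The variable names (`stamp`, `year`, `month`, `day`, `reversed`) and the `YYYYMMDD` layout are assumptions made for this example only.

```bash
#!/bin/bash

# Hypothetical fixed-width date stamp in YYYYMMDD form
stamp="20240315"

year=${stamp:0:4}       # 4 characters starting at index 0
month=${stamp:4:2}      # 2 characters starting at index 4
day=${stamp: -2}        # last 2 characters (note the space before -2)
echo "$year-$month-$day"

# Reverse a string by walking its indices from last to first
s="abcdefghij"
reversed=""
for (( i = ${#s} - 1; i >= 0; i-- )); do
    reversed+=${s:i:1}  # append the single character at index i
done
echo "$reversed"
```

Both `offset` and `length` are evaluated as arithmetic expressions, which is why the loop variable `i` can be used inside `${s:i:1}` without a leading `$`.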

### Removing the leading part of a string

`${str#pattern}` expands to the string formed by removing the shortest leading part of `str` that matches `pattern`
(the parameter `str` is not modified).

`${str##pattern}` expands to the string formed by removing the longest leading part of `str` that matches `pattern`
(the parameter `str` is not modified).

In [12]:
# get the extension of a filename (not including a path)
fname="file.txt"
echo ${fname#*.}
echo ${fname##*.}

fname="file.txt.zip"
echo ${fname#*.}
echo ${fname##*.}

txt
txt
txt.zip
zip


Inside a script, it is occasionally useful to get the name of the script. Recall that `$0` contains the
pathname used to call the script. To get just the name of the script (i.e., the basename of the script),
we can use `##` to remove everything up to and including the final `/`:

---

```sh
#!/bin/bash

# scriptname.sh

script=${0##*/}
echo "\$0          : $0"
echo "script name : $script"

```

---

In [2]:
./scripts/strings/scriptname.sh 

$0          : ./scripts/strings/scriptname.sh
script name : scriptname.sh
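
The same expansion can be tried on ordinary variables to see how it behaves for different ways of calling a script. The sketch below is only illustrative: the pathnames and the `usage` function are hypothetical and not part of `scriptname.sh`.

```bash
#!/bin/bash

# Hypothetical pathnames showing how a script might be invoked
for path in "./scripts/strings/scriptname.sh" "/usr/local/bin/scriptname.sh" "scriptname.sh"; do
    name=${path##*/}    # remove the longest leading match of */
    echo "$path -> $name"
done

# Typical use: a usage message that names the script however it was called
usage() {
    echo "usage: ${0##*/} FILE" >&2
}
```

When the pathname contains no `/` (the third case), the pattern does not match and the expansion is simply the original value.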


### Removing the trailing part of a string

`${str%pattern}` expands to the string formed by removing the shortest trailing part of `str` that matches `pattern` (the parameter `str` is not modified).

`${str%%pattern}` expands to the string formed by removing the longest trailing part of `str` that matches `pattern` (the parameter `str` is not modified).

In [13]:
# get the basename of a filename (not including a path)
fname="file.txt"
echo ${fname%.*}
echo ${fname%%.*}

fname="file.txt.zip"
echo ${fname%.*}
echo ${fname%%.*}

file
file
file.txt
file
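
The leading and trailing removals are often used together to take a pathname apart. The following is a minimal sketch under the assumption of a hypothetical pathname; the variable names (`dir`, `base`, `stem`, `ext`, `allext`) are chosen for illustration.

```bash
#!/bin/bash

# Hypothetical pathname used only for illustration
path="/home/user/archive/report.tar.gz"

dir=${path%/*}        # shortest trailing match of /*  removed -> directory part
base=${path##*/}      # longest leading match of */    removed -> file name
stem=${base%%.*}      # longest trailing match of .*   removed -> name before the first dot
ext=${base##*.}       # longest leading match of *.    removed -> last extension
allext=${base#*.}     # shortest leading match of *.   removed -> everything after the first dot

echo "dir:    $dir"
echo "base:   $base"
echo "stem:   $stem"
echo "ext:    $ext"
echo "allext: $allext"
```

The extension-related expansions are applied to `base` rather than to `path` so that a dot in a directory name cannot be mistaken for the start of an extension.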


### Replacing a substring

`${str/pattern/repl}` expands to the string formed by replacing the first substring of `str` that matches `pattern`
with `repl`.

`${str//pattern/repl}` expands to the string formed by replacing all substrings of `str` that match `pattern` with `repl`.

In [14]:
s="sparring with a purple porpoise"
echo "${s/p/t}"                          # replace first occurrence of p with t
echo "$s" 

echo "${s//p/t}"                         # replace all occurrences of p with t
echo "$s"

starring with a purple porpoise
sparring with a purple porpoise
starring with a turtle tortoise
sparring with a purple porpoise


`repl` can be the empty string, in which case the expansion yields the string formed by deleting occurrences
of `pattern` from `str`:

In [16]:
s="sparring with a purple porpoise"
echo "${s/p/}"                          # delete first occurrence of p
echo "$s" 

echo "${s//p/}"                         # delete all occurrences of p
echo "$s"

sarring with a purple porpoise
sparring with a purple porpoise
sarring with a urle oroise
sparring with a purple porpoise
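
Because `pattern` is a glob pattern rather than a fixed string, replacement can target classes of characters or variable-length runs. The sketch below shows a few such cases; the file name and the variable names (`safe`, `novowels`, `masked`) are assumptions made for the example.

```bash
#!/bin/bash

# Hypothetical file name containing spaces
fname="my summer holiday photos.jpg"
safe=${fname// /_}          # replace every space with an underscore
echo "$safe"

s="sparring with a purple porpoise"

# A bracket expression matches any one of the listed characters
novowels=${s//[aeiou]/}     # delete every lowercase vowel
echo "$novowels"

# * matches any run of characters; the longest match is the one replaced
masked=${s/p*p/--}          # from the first p through the last p
echo "$masked"
```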


### Case conversion

`${str,pattern}` expands to `str` with its first character converted to lowercase, provided that the first
character matches `pattern`; otherwise the expansion is equal to `str`.

`${str^pattern}` is similar to `${str,pattern}` but performs an uppercase conversion instead.

In [21]:
str="ABC"

# 1. replace A with a if A matches A
echo "1. "${str,A}

# 2. replace A with a if A is equal to the string ABC
echo "2. "${str,ABC}

# 3. replace A with a if A matches any one of A, B, or C
echo "3. "${str,[ABC]}

# 4. always replace A with a
echo "4. "${str,?}

# 5. always replace A with a (pattern missing, Bash assumes ?)
echo "5. "${str,}

1. aBC
2. ABC
3. aBC
4. aBC
5. aBC


In [23]:
str="xyz"

# 1. replace x with X if x matches x
echo "1. "${str^x}

# 2. replace x with X if x is equal to the string xyz
echo "2. "${str^xyz}

# 3. replace x with X if x matches any one of u, v, or x
echo "3. "${str^[uvx]}

# 4. always replace x with X
echo "4. "${str^?}

# 5. always replace x with X (pattern missing, Bash assumes ?)
echo "5. "${str^}

1. Xyz
2. xyz
3. Xyz
4. Xyz
5. Xyz


`${str,,pattern}` and `${str^^pattern}` convert every character of `str` that matches `pattern`, rather than
only the first character:

In [25]:
str="ABCXYZ"

# 1. replace each character of str with its lowercase version if that character matches B
echo "1. "${str,,B}

# 2. replace each character of str with its lowercase version if that character matches ABCXYZ
echo "2. "${str,,ABCXYZ}

# 3. replace each character of str with its lowercase version if that character matches any one of A-C
echo "3. "${str,,[A-C]}

# 4. always replace each character with its lowercase version
echo "4. "${str,,?}

# 5. always replace each character with its lowercase version (pattern missing, Bash assumes ?)
echo "5. "${str,,}

1. AbCXYZ
2. ABCXYZ
3. abcXYZ
4. abcxyz
5. abcxyz
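
Two common uses of case conversion are capitalizing a word and comparing input without regard to case. The following sketch assumes Bash 4.0 or later (where these expansions are available); the input values and variable names are hypothetical.

```bash
#!/bin/bash
# The ^ and , expansions need Bash 4.0 or later.

name="grace"                    # hypothetical user input
capitalized=${name^}            # uppercase the first character only
echo "$capitalized"             # Grace

answer="YeS"                    # hypothetical reply typed by a user
case ${answer,,} in             # compare in lowercase so that case is ignored
    y|yes) reply="accepted" ;;
    *)     reply="rejected" ;;
esac
echo "$reply"                   # accepted

s="purple porpoise"
vowels_up=${s^^[aeiou]}         # uppercase only the characters matching [aeiou]
echo "$vowels_up"               # pUrplE pOrpOIsE
```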


### Unset or empty strings

There are eight variations of parameter substitution that deal with unset and/or empty strings.
All of the variations test if the parameter is unset. Variations that include a `:` also test
if the parameter is equal to the empty string.

| Expansion | Result if `str` is unset | Result if `str` is empty | Result if `str` is non-empty |
| :---- | :---- | :---- | :---- |
| `${str-word}`  | `word` | empty string | value of `str` |
| `${str:-word}` | `word` | `word` | value of `str` |
| `${str=word}`  | `word`, assigns `word` to `str` | empty string | value of `str` |
| `${str:=word}` | `word`, assigns `word` to `str` | `word`, assigns `word` to `str` | value of `str` |
| `${str+word}`  | nothing is substituted | `word` | `word` |
| `${str:+word}` | nothing is substituted | nothing is substituted | `word` |
| `${str?word}`  | writes `word` to standard error, exits | empty string | value of `str` |
| `${str:?word}` | writes `word` to standard error, exits | writes `word` to standard error, exits | value of `str` |

In brief:

* the `-` variations are useful for producing default values from unset or empty variables
* the `=` variations are useful for assigning default values to unset or empty variables (but note that
you cannot assign to a positional parameter this way)
* the `+` variations are useful for producing alternate values when a variable is set (or, with `:+`, set and non-empty)
* the `?` variations are useful for indicating errors from unset or empty variables
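
The four families summarized above might be used as follows. This is a minimal sketch, separate from `empty.sh`; the function `greet`, the variables `outdir`, `verbose`, `flags`, and `required`, and the path `/tmp/output` are all hypothetical.

```bash
#!/bin/bash

# greet: hypothetical function whose first argument is optional
greet() {
    local name=${1:-world}          # default when $1 is unset or empty
    echo "hello, $name"
}
greet
greet ""
greet Bash

# Assign a default only if the variable is unset or empty.
# The : builtin discards the expanded value; only the assignment matters.
unset outdir
: "${outdir:=/tmp/output}"          # /tmp/output is an arbitrary example path
echo "$outdir"

# Produce an extra option only when a variable is set and non-empty
verbose=1
flags="-x${verbose:+ -v}"
echo "$flags"

# Report a missing required value; the subshell keeps the error
# from terminating this script
unset required
( : "${required:?must be set}" ) 2>/dev/null || echo "required is missing"
```

Running the `?` expansion inside a subshell is a design choice for the demonstration only; in a real script the expansion is usually left at top level so that the script stops when the value is missing.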

The script named `empty.sh` illustrates the various parameter substitutions. Calling the script with no
command line argument results in `str` being unset, otherwise `str` is assigned the value of the first
command line argument.

Run the following cell to run `empty.sh` with no command line arguments:

In [7]:
./scripts/strings/empty.sh

${UNSET-WORD} 	WORD	str=UNSET
${UNSET:-WORD}	WORD	str=UNSET
${UNSET=WORD} 	WORD	str=WORD
${UNSET:=WORD}	WORD	str=WORD
${UNSET+WORD} 		str=UNSET
${UNSET:+WORD}		str=UNSET
./scripts/strings/empty.sh: line 58: str: WORD


: 1

The first column is the parameter substitution (the script uses the word `UNSET` to indicate an unset variable),
the second column is the result of the substitution, and the third column is the value stored in `str` (again,
the script uses the word `UNSET` to indicate that `str` is unset).

Notice that `${str?word}` causes an error, so the script exits before `${str:?word}` can be attempted.

Run the following cell to run `empty.sh` with the empty string:

In [6]:
./scripts/strings/empty.sh ""

${-WORD} 		str=
${:-WORD}	WORD	str=
${=WORD} 		str=
${:=WORD}	WORD	str=WORD
${+WORD} 	WORD	str=
${:+WORD}		str=
${?WORD} 		str=
./scripts/strings/empty.sh: line 63: str: WORD


: 1

Run the following cell to run `empty.sh` with a non-empty string:

In [5]:
./scripts/strings/empty.sh xyz

${xyz-WORD} 	xyz	str=xyz
${xyz:-WORD}	xyz	str=xyz
${xyz=WORD} 	xyz	str=xyz
${xyz:=WORD}	xyz	str=xyz
${xyz+WORD} 	WORD	str=xyz
${xyz:+WORD}	WORD	str=xyz
${xyz?WORD} 	xyz	str=xyz
${xyz:?WORD}	xyz	str=xyz
