## Pattern matching: `grep`ing and globbing
* useful when we know the pattern of a string, but not its exact form
* first up: globbing, a feature of the shell itself (so details can differ between bash, zsh, ...)

Assume you want to list and count the IPython/Jupyter notebooks in the current directory:

In [None]:
ls

In [None]:
ls | wc -l

Basic glob syntax:
* `*` matches any string of characters, including the empty one (but not `/`, and by default not a leading `.`)
* `?` matches exactly ONE arbitrary character
* `**` matches paths across directory boundaries (in `bash` after `shopt -s globstar`, by default in `zsh`)

Not a glob, but related: brace expansion turns one word into several:
* `{x,y}` expands to both `x` and `y`, whether or not matching files exist (it happens before globbing)
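A self-contained sketch to try these patterns without depending on whatever is in the current directory: it creates a throwaway directory with made-up file names via `mktemp -d`. Note how a brace-expanded word that matches no file is left as literal text.

```bash
# Throwaway directory with made-up file names
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo"/{010,020,100}_intro.ipynb "$demo/notes.txt" "$demo/sub/030_deep.ipynb"
shopt -s globstar   # bash only; zsh understands ** out of the box

(
  cd "$demo" || exit 1
  echo *.ipynb          # any characters:   010_intro.ipynb 020_intro.ipynb 100_intro.ipynb
  echo 0?0_*.ipynb      # exactly one char: 010_intro.ipynb 020_intro.ipynb
  echo 0{1,3}0_*.ipynb  # braces expand first; 030_*.ipynb matches nothing here and stays literal
  echo **/*.ipynb       # also descends into sub/
)
```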

In [None]:
ls *.ipynb | wc -l

In [None]:
ls 0?0_*.ipynb | wc -l

In [None]:
ls 0{0,1,2,3,7}0_*.ipynb | wc -l

In [None]:
ls 0{0,1,2,3,7,10}0_*.ipynb | wc -l

In [None]:
diff /home/julgoe/Documents/utils/{fastAndDeep,fastAndDeep_old/fastAndDeep_betweenRepos}/src/experiment.py | wc -l

In [None]:
shopt -s globstar
ls ../**/*.tex | head

## Regex, `grep` and `sed`

* regular expressions (regex) allow more complicated patterns, including capturing ('matching') groups
* IMHO confusing and complicated, but they speed up tasks amazingly!
* very helpful resource when starting out: https://regex101.com/

For plain matching we use `grep`; more on that in a second.

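* general idea:
  * letters and digits match themselves; most punctuation (`.`, `*`, `[`, `(`, `\`, ...) has a special meaning and needs a leading `\` to be matched literally
  * `[]` denotes a set of characters and matches any ONE of them:
    * a leading `^` negates the set, e.g. `[^0-9]`
    * `-` abbreviates ranges like `a-z`
  * `.` matches any single character
  * each character/group can have a quantifier:
    * `*` arbitrary number of times (0 or more)
    * `+` at least once
    * `?` 0 or 1 time
    * `{n,m}` between n and m times
    * quantifiers are greedy by default (match as much as possible); in Perl-style regex a trailing `?` (e.g. `*?`) makes them lazy (match as little as possible). For us greedy is usually what we want
  * shorthand classes (Perl-style, `grep -P`): `\w` word characters, `\d` digits, `\s` whitespace
  * `^` is line start, `$` is line end
  * `()` forms a (capturing) group; combined with `|` it lists alternative patterns, e.g. `(foo|bar)`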
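To try the syntax above in isolation, a minimal sketch on a made-up string. It assumes GNU `grep` with PCRE support (`-P`); `-o` prints each match on its own line. The lazy `*?` only exists in this Perl-style mode.

```bash
s="foo bar 4206969, baz42"
echo "$s" | grep -oP '[a-z]+'      # runs of lowercase letters: foo, bar, baz
echo "$s" | grep -oP '[^a-z ,]+'   # negated set: 4206969, 42
echo "$s" | grep -oP '\d{2,3}'     # 2 to 3 digits, greedy: 420, 696, 42
echo "$s" | grep -oP 'ba\w'        # \w is one word character: bar, baz
echo "$s" | grep -oP '^\w+|\w+$'   # anchors: foo (line start), baz42 (line end)
echo "$s" | grep -oP 'b.*a'        # greedy .* runs to the last a: 'bar 4206969, ba'
echo "$s" | grep -oP 'b.*?a'       # lazy .*? stops at the first a: 'ba', 'ba'
```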

In [None]:
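# print the pattern, then the test string with all matches highlighted ("$1" quoted so the pattern reaches grep intact)
testpattern () { echo -n "Pattern '$1': "; echo "foobar4206969" | grep --color=auto -P "$1"; }
testpattern "foo"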

In [None]:
testpattern "[for]"

In [None]:
testpattern "[a-z079]"

In [None]:
testpattern "[a-z079]{2,5}"

In [None]:
testpattern "(foo|420)"

`grep` reads either from `stdin` or from the files it is given as arguments.
By default, the output is every line that matches the pattern.

important options are:
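* `--color=auto` (or `always`/`never`) to highlight the matches
* `-r` search recursively through all folders
* `-P` Perl-compatible regex (PCRE); not to be confused with `-E`, POSIX extended regex
* `-v` invert the match, i.e. print the lines that do NOT match
* `-i` case-insensitive
* `-o` only return the parts matching the pattern (i.e. the coloured bits above), each on its own line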
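A small sketch of these options on two made-up log files in a temporary directory (names and contents are invented for illustration):

```bash
# Made-up log files in a temporary directory
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'Error: disk full\nall good\nerror: retry\n' > "$dir/a.log"
printf 'ERROR in sub\nfine\n' > "$dir/sub/b.log"

grep 'error' "$dir/a.log"      # case matters: only 'error: retry'
grep -i 'error' "$dir/a.log"   # both error lines
grep -v 'rror' "$dir/a.log"    # inverted: 'all good'
grep -ri 'error' "$dir"        # recursive: matching lines of both files, prefixed with the file name
grep -oi 'error' "$dir/a.log"  # only the matched parts: 'Error' and 'error'
```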

In [None]:
echo "foobar4206969" | grep --color=auto -P "(foo|420)"
echo
echo "foobar4206969" | grep --color=auto -Po "(foo|420)"

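Combined with `-o`, two Perl-regex features are especially useful:
* `\K` means that the reported match begins at that point; what was matched before it must still be present, but is not printed
* similarly, `(?=PATTERN)` (a lookahead) requires PATTERN to follow, but doesn't include it in the match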
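The `iw` output and the interface name `wlp0s20f3` are machine-specific, so here is a self-contained sketch of the same two tricks on made-up strings (the SSID and file name are invented):

```bash
line=$'\tSSID: MyHomeNetwork'    # imitates one line of `iw ... link` output
echo "$line" | grep -oP 'SSID: \K.*'                 # MyHomeNetwork

file='042_regex_basics.ipynb'
echo "$file" | grep -oP '^[0-9]{3}_\K.*(?=\.ipynb)'  # regex_basics
echo "$file" | grep -oP '^\d+(?=_)'                  # 042
```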

In [None]:
iw dev wlp0s20f3 link

In [None]:
iw dev wlp0s20f3 link | grep -P "SSID:"

In [None]:
iw dev wlp0s20f3 link | grep -oP "SSID: \K.*"

In [None]:
ls | grep -oP "^[0-9]{3}_\K.*(?=\.ipynb)"

### `sed` and replacements

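* search for a pattern and replace it with something else
* syntax `'s/PATTERN/REPLACEMENT/OPTIONS'`, e.g. the option `g` replaces every match in a line instead of only the first
* the delimiter (`/` in this case) can be chosen freely, e.g. `#`; this helps when working with paths that contain slashes
* parts of the match can be reused in the replacement: `&` is the whole match, `\1`, `\2`, ... are the capturing groups
* the same syntax works in your favourite editor as well
  * in `vim`: select some lines, then type `:s/....` (vim fills in the range `'<,'>` for you)
  * or for the whole file do `:%s/...`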
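A few one-liners on made-up strings illustrating the points above (custom delimiter, groups, `&`, and the `g` option):

```bash
path='/home/user/data/run_07.csv'
echo "$path" | sed 's#/home/user#~#'                   # '#' as delimiter: ~/data/run_07.csv
echo "$path" | sed -E 's/run_([0-9]+)/experiment \1/'  # \1 reuses the group: /home/user/data/experiment 07.csv
echo "a-b-c" | sed 's/-/+/'                            # first match only: a+b-c
echo "a-b-c" | sed 's/-/+/g'                           # g: every match: a+b+c
echo "foo bar" | sed 's/[a-z]*/<&>/'                   # & is the whole match: <foo> bar
```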

In [None]:
ls | grep -P "[0-9]+.*\.ipynb" | sed 's/\.ipynb$//'

In [None]:
ls | grep -P "[0-9]+.*\.ipynb" | sed -E 's/(.*)\.ipynb$/IPython notebook: \1/'

`sed` can also work on files directly, and with `-i`/`--in-place` it overwrites them with the result. Example (this will change files, so be sure to only execute it in sensible places):

```bash
shopt -s globstar  # needed for ** in bash
sed -i 's/selfpredicting/self-predicting/g' **/*.tex
```
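Before running `sed -i` on real files it can help to try it on a throwaway copy. This sketch works on an invented `paper.tex` in a temporary directory and uses `-i.bak`, which keeps the original file next to the edited one:

```bash
# Throwaway file with made-up content, so nothing real is modified
tmp=$(mktemp -d)
printf 'A selfpredicting network.\nselfpredicting again.\n' > "$tmp/paper.tex"

sed -i.bak 's/selfpredicting/self-predicting/g' "$tmp/paper.tex"  # original kept as paper.tex.bak
cat "$tmp/paper.tex"                                  # both lines now say self-predicting
diff "$tmp/paper.tex.bak" "$tmp/paper.tex" || true    # exit status 1 only means "files differ"
```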


For a fun, more elaborate one, look e.g. [here](https://github.com/JulianGoeltz/automised_latex_template/blob/main/additionalInfo.md#applying-changes)

**Exercise:**
* Write a function `clearFilename` that gets a path and returns only its alphanumeric parts (bonus for doing it once with `sed` and once with `grep`)
* Write a function that takes a filename like 'foobar_2.pickle' and returns it with the extension, the number and the underscore removed. How would you automatically rename all (or a subset of) the files in a folder using this function?
* Given an `squeue` function like the one below (mimicking the output of the Slurm command), return all jobs running on a RyzenHost, showing only the last 3 digits of their job ID and their username

In [None]:
filename='foobar_2.pickle'

In [None]:
squeue () {
    cat <<-EOF
Thu Dec 01 19:54:51 2022
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) 
           4479803      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4486967      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4487128      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4487826      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4488503      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4492183      cube clustern quiggeld  PENDING       0:00   6:00:00      1 (launch failed requeued held) 
           4503427      cube clustern quiggeld  RUNNING    2:22:00   6:00:00      1 RyzenHost0 
           4503266      cube clustern quiggeld  RUNNING    4:11:31   6:00:00      1 RyzenHost0 
           4503251      cube clustern quiggeld  RUNNING    4:19:00   6:00:00      1 RyzenHost0 
           4503355  cubectrl     bash rheinema  RUNNING    3:03:15  12:00:00      1 HBPHost19 
           4459022      einc singular luboeins  RUNNING 21-05:45:01 UNLIMITED      1 EINCHost0 
           4460179      einc singular luboeins  RUNNING 20-01:38:19 UNLIMITED      1 EINCHost0 
           4502964    extoll     bash  thommes  RUNNING    7:00:00 UNLIMITED      1 EXTHost1 
EOF
}

squeue