# "Filtering file content with the `grep` command"
> Cheatsheet on the `grep` command in macOS

- toc: true 
- badges: true
- comments: true
- categories: [cheatsheet]

> Important: Ignore the exclamation mark `!` and the percent sign `%` at the beginning of command lines. This page is generated from a `jupyter notebook` whose cells run `python` code; in such a cell, a leading `!` runs the rest of the line as a `shell` command, and a leading `%` invokes one of IPython's "magic commands".

# References

## Documentation
- [`man` page documentation](https://ss64.com/osx/grep.html) (ss64.com)
- [Examples using `grep`](https://www.tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html) (tldp.org)
- [Character classes and bracket expressions](https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html) (gnu.org)
- [A large collection of Unix/Linux ‘grep’ command examples](https://alvinalexander.com/unix/edu/examples/grep.shtml) (alvinalexander.com)
- [`grep` OR, AND and NOT operators](https://www.thegeekstuff.com/2011/10/grep-or-and-not-operators/) (thegeekstuff.com)
- [15 practical Unix `grep` command examples](https://www.thegeekstuff.com/2009/03/15-practical-unix-grep-command-examples/) (thegeekstuff.com)

- [What's the difference between `\b` and `\<` in the `grep` command?](https://unix.stackexchange.com/questions/121739/whats-the-difference-between-b-and-in-the-grep-command) (unix.stackexchange)
- [Tutorial: Find Strings in Text Files Using Grep with Regular Expressions](https://thenewstack.io/tutorial-find-strings-in-text-files-using-grep-with-regular-expressions/) (Matt Zand, thenewstack)
- [Regular Expressions In grep examples](https://www.cyberciti.biz/faq/grep-regular-expressions/) (cyberciti.biz)
- [regex quickstart](https://www.rexegg.com/regex-quickstart.html) (Rex Egg)

## Sample text files
The sample text files used in this post ship with macOS:
- calendar files in `/usr/share/calendar`
- dictionary words in `/usr/share/dict/words`
- meanings of flowers in `/usr/share/misc/flowers`
- birthstones and birth flowers by month in `/usr/share/misc/birthtoken`
- the ASCII table in `/usr/share/misc/ascii`
- units in `/usr/share/misc/units.lib`

# System information

Version of `grep` used in this page:

```
!grep -V
```

```
grep (BSD grep) 2.5.1-FreeBSD
```


Operating system used in this page:

```
!uname -v
```

```
Darwin Kernel Version 19.4.0: Wed Mar  4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64
```


# Basic usage

## Match a string

In a single file:

```
!grep "Alan" /usr/share/calendar/calendar.birthday
```

```
06/07	Alan Mathison Turing died, 1954
06/23	Alan Mathison Turing born, 1912
```


In multiple files:

```
!grep "Alan" /usr/share/calendar/calendar.*
```

```
/usr/share/calendar/calendar.birthday:06/07	Alan Mathison Turing died, 1954
/usr/share/calendar/calendar.birthday:06/23	Alan Mathison Turing born, 1912
/usr/share/calendar/calendar.computer:06/07	Alan Mathison Turing died, 1954
/usr/share/calendar/calendar.computer:06/23	Alan Mathison Turing born, 1912
/usr/share/calendar/calendar.freebsd:06/06	Alan Eldridge <alane@FreeBSD.org> died in Denver, Colorado, 2003
/usr/share/calendar/calendar.history:06/28	Supreme Court decides in favor of Alan Bakke, 1978
```


## Insert line number: `-n`

```
!grep -n Alan /usr/share/calendar/calendar.birthday
```
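No output was recorded for this command. As an illustration only, here is `-n` on a small made-up input fed through standard input (the names are hypothetical, not taken from the calendar file): each matching line is prefixed with its line number and a colon.

```shell
# Made-up three-line input, fed to grep on standard input.
people='Ada Lovelace
Alan Turing
Grace Hopper'

# -n prefixes each matching line with its 1-based line number and a colon.
printf '%s\n' "$people" | grep -n "Alan"
# prints: 2:Alan Turing
```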

## Highlight match: `--color`
>Note: The highlighting is not visible once the `jupyter notebook` has been converted to `html`. `--color` works by wrapping each match in ANSI escape sequences, which a terminal renders as color.

```
!grep --color "Alan" /usr/share/calendar/calendar.birthday
```

```
06/07	Alan Mathison Turing died, 1954
06/23	Alan Mathison Turing born, 1912
```


## Match count: `-c`

In a single file:

```
!grep -c "Alan" /usr/share/calendar/calendar.birthday
```

```
2
```


In multiple files:

```
!grep -c "Alan" /usr/share/calendar/calendar.[bc]*
```

```
/usr/share/calendar/calendar.birthday:2
/usr/share/calendar/calendar.christian:0
/usr/share/calendar/calendar.computer:2
/usr/share/calendar/calendar.croatian:0
```
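`-c` counts matching *lines*, not occurrences: a line containing the pattern twice counts once. A sketch on made-up input; counting individual occurrences with `-o` (print each match on its own line) piped into `wc -l` is a common idiom, not a `grep` option of its own.

```shell
# Made-up input: "Alan" occurs three times, but on only two lines.
text='Alan met Alan
Turing
Alan Kay'

# -c reports the number of matching lines.
printf '%s\n' "$text" | grep -c "Alan"
# prints: 2

# -o prints every match on its own line, so wc -l counts occurrences.
printf '%s\n' "$text" | grep -o "Alan" | wc -l
# prints: 3 (macOS wc pads the number with leading spaces)
```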


## Case-insensitive match: `-i`

```
!grep --color -i "unix" /usr/share/calendar/calendar.computer
```

```
01/01	The Epoch (Time 0 for UNIX systems, Midnight GMT, 1970)
05/19	UNIX is 10000 days old, 1997
08/14	First Unix-based mallet created, 1954
```


## File names with a match: `-l`

```
!grep -l "Alan" /usr/share/calendar/calendar.*
```

```
/usr/share/calendar/calendar.birthday
/usr/share/calendar/calendar.computer
/usr/share/calendar/calendar.freebsd
/usr/share/calendar/calendar.history
```


## Byte offset of each matching line: `-b`

```
!grep -b "Alan" /usr/share/calendar/calendar.birthday
```

```
6906:06/07	Alan Mathison Turing died, 1954
7346:06/23	Alan Mathison Turing born, 1912
```
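The number before each line is a byte offset counted from 0 at the start of the input. A sketch on made-up input: the first two lines take 3 bytes each (`ab` or `cd` plus a newline), so the matching third line starts at byte 6. The match is placed at the very start of its line, so the offset of the line and the offset of the match coincide.

```shell
# Made-up input: "ab\n" occupies bytes 0-2 and "cd\n" bytes 3-5.
printf 'ab\ncd\nAlan Turing\n' | grep -b "Alan"
# prints: 6:Alan Turing
```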


## Include/exclude files: `--include`, `--exclude`

Exclude a file from the search:

```
!grep "Alan" --exclude /usr/share/calendar/calendar.computer /usr/share/calendar/calendar.*
```

```
/usr/share/calendar/calendar.birthday:06/07	Alan Mathison Turing died, 1954
/usr/share/calendar/calendar.birthday:06/23	Alan Mathison Turing born, 1912
/usr/share/calendar/calendar.freebsd:06/06	Alan Eldridge <alane@FreeBSD.org> died in Denver, Colorado, 2003
/usr/share/calendar/calendar.history:06/28	Supreme Court decides in favor of Alan Bakke, 1978
```


Search only files whose names match a glob pattern:

```
!grep "Alan" --include "calendar.*" /usr/share/calendar/*
```

```
/usr/share/calendar/calendar.birthday:06/07	Alan Mathison Turing died, 1954
/usr/share/calendar/calendar.birthday:06/23	Alan Mathison Turing born, 1912
/usr/share/calendar/calendar.computer:06/07	Alan Mathison Turing died, 1954
/usr/share/calendar/calendar.computer:06/23	Alan Mathison Turing born, 1912
/usr/share/calendar/calendar.freebsd:06/06	Alan Eldridge <alane@FreeBSD.org> died in Denver, Colorado, 2003
/usr/share/calendar/calendar.history:06/28	Supreme Court decides in favor of Alan Bakke, 1978
```
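`--exclude` also accepts a glob, so a full path is not required. A sketch with two hypothetical files in a throwaway directory (the directory and file contents are made up); the quotes keep the shell from expanding `*.computer` itself, so `grep` receives the pattern.

```shell
# Two hypothetical calendar-like files in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '06/23\tAlan Mathison Turing born, 1912\n' > "$dir/calendar.birthday"
printf '06/23\tAlan Mathison Turing born, 1912\n' > "$dir/calendar.computer"

# Skip every file whose name ends in ".computer"; -l lists the files that still match.
grep -l --exclude='*.computer' "Alan" "$dir"/*
# prints the path of calendar.birthday only
```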


## Whole word match: `-w`

```
!grep -w -n --color "Francis" /usr/share/calendar/calendar.birthday
```

```
31:01/22	Sir Francis Bacon born, 1561
273:11/20	Robert Francis Kennedy (RFK) born in Boston, Massachusetts, 1925
```


... as opposed to a plain substring match, which also finds `Francisco`:

```
!grep -n --color "Francis" /usr/share/calendar/calendar.birthday
```

```
31:01/22	Sir Francis Bacon born, 1561
104:03/30	Francisco Jose de Goya born, 1746
129:04/29	William Randolph Hearst born in San Francisco, 1863
151:05/30	Mel (Melvin Jerome) Blanc born in San Francisco, 1908
273:11/20	Robert Francis Kennedy (RFK) born in Boston, Massachusetts, 1925
```
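With `-w`, a match counts only if it is not preceded or followed by a word-constituent character (a letter, digit or underscore). The same contrast on made-up input, including an underscore case:

```shell
# Made-up input; the underscore in the last line is a word character.
names='Sir Francis Bacon
Francisco de Goya
Francis_Drake'

printf '%s\n' "$names" | grep -c "Francis"     # 3: substring match on every line
printf '%s\n' "$names" | grep -wc "Francis"    # 1: only "Sir Francis Bacon"
```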


## Lines before/after/around a match: `-B`, `-A`, `-C`

Show two lines before each match (with `-n`, context lines are marked with `-` and matching lines with `:`):

```
!grep -n --color -B2 "uncomputed" /usr/share/dict/words
```

```
212942-uncomputableness
212943-uncomputably
212944:uncomputed
```


Show three lines after each match:

```
!grep -n --color -A3 "uncomputed" /usr/share/dict/words
```

```
212944:uncomputed
212945-uncomraded
212946-unconcatenated
212947-unconcatenating
```


Show two lines before and three lines after each match:

```
!grep -n --color -B2 -A3 "uncomputed" /usr/share/dict/words
```

```
212942-uncomputableness
212943-uncomputably
212944:uncomputed
212945-uncomraded
212946-unconcatenated
212947-unconcatenating
```


Show two lines before and after each match (`-C2` is equivalent to `-B2 -A2`):

```
!grep -n --color -C2 "uncomputed" /usr/share/dict/words
```

```
212942-uncomputableness
212943-uncomputably
212944:uncomputed
212945-uncomraded
212946-unconcatenated
```
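When the context around two matches does not overlap, `grep` prints a `--` line between the groups. A sketch on the numbers 1 to 10 (made-up input), matching `3` and `8` with one line of context each:

```shell
# Numbers 1..10, one per line; -E enables the | alternation, -C1 adds one context line.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | grep -C1 -E '^(3|8)$'
# prints:
# 2
# 3
# 4
# --
# 7
# 8
# 9
```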


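# Regular expressions

Repetition operators, as written in extended regular expressions (`grep -E`). In basic regular expressions (plain `grep`), the braces must be escaped, as in `\{24\}` below, and `?` and `+` are ordinary characters unless escaped as `\?` and `\+` (a GNU extension):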
- `?` The preceding item is optional and matched at most once
- `*` The preceding item will be matched zero or more times
- `+` The preceding item will be matched one or more times
- `{n}` The preceding item is matched exactly n times
- `{n,}` The preceding item is matched n or more times
- `{,m}` The preceding item is matched at most m times
- `{n,m}` The preceding item is matched at least n times, but not more than m times
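A sketch of these operators on a made-up word list, using `grep -E` with `^` and `$` so that each pattern must match the whole line; the last command shows the escaped braces of basic syntax:

```shell
# Made-up word list.
words='color
colour
colouur'

printf '%s\n' "$words" | grep -E '^colou?r$'      # color, colour    (? : zero or one u)
printf '%s\n' "$words" | grep -E '^colou+r$'      # colour, colouur  (+ : one or more)
printf '%s\n' "$words" | grep -E '^colou*r$'      # all three        (* : zero or more)
printf '%s\n' "$words" | grep -E '^colou{2}r$'    # colouur          ({2} : exactly two)
printf '%s\n' "$words" | grep -E '^colou{1,2}r$'  # colour, colouur  ({1,2} : one or two)

# Basic syntax (no -E): the braces must be escaped.
printf '%s\n' "$words" | grep '^colou\{1,2\}r$'   # colour, colouur
```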

- `\<` matches the beginning of a word
- `\>` matches the end of a word
- `\b` matches a word boundary, at either the beginning or the end of a word

Character classes, used inside a bracket expression (hence the doubled brackets):
- `[[:alnum:]]`: Alphanumeric characters.
- `[[:alpha:]]`: Alphabetic characters.
- `[[:blank:]]`: Blank characters: space and tab.
- `[[:digit:]]`: Digits: ‘0 1 2 3 4 5 6 7 8 9’.
- `[[:lower:]]`: Lower-case letters: ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’.
- `[[:space:]]`: Space characters: tab, newline, vertical tab, form feed, carriage return, and space.
- `[[:upper:]]`: Upper-case letters: ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’.
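Written without the outer pair, `[:digit:]` is not treated as a class: depending on the implementation it either matches one of the characters `:`, `d`, `i`, `g`, `t`, or is rejected with an error. A sketch of a few classes on made-up lines:

```shell
# Made-up input; the last line starts with a tab.
lines=$(printf 'Room 101\nroom\nR2D2\n\tindented')

# Lines starting with an upper-case letter: "Room 101" and "R2D2".
printf '%s\n' "$lines" | grep '^[[:upper:]]'

# Number of lines containing at least one digit: 2.
printf '%s\n' "$lines" | grep -c '[[:digit:]]'

# Number of lines starting with a blank (space or tab): 1.
printf '%s\n' "$lines" | grep -c '^[[:blank:]]'
```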


## Anchors: `^`, `$`, `\<`, `\>`, `\b`

Lines beginning with the pattern:

```
!grep -n --color "^compute" /usr/share/dict/words
```

```
40564:compute
40565:computer
```


Lines ending with the pattern:

```
!grep -n --color "compute$" /usr/share/dict/words
```

```
40564:compute
117000:miscompute
164643:recompute
```


Words beginning with the pattern:

```
!grep -n --color '\<compute' /usr/share/dict/words
```

```
40564:compute
40565:computer
```


Words ending with the pattern:

```
!grep -n --color 'compute\>' /usr/share/dict/words
```

```
40564:compute
117000:miscompute
164643:recompute
```
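Because `/usr/share/dict/words` holds one word per line, `\<` and `\>` give the same results as `^` and `$` here; on running text they differ. A sketch on made-up lines, also showing `\b` (supported by GNU-style greps), which matches at either boundary:

```shell
# Made-up lines of running text.
text='compute the sum
please recompute
we compute'

printf '%s\n' "$text" | grep -c '^compute'       # 1: only at the start of a line
printf '%s\n' "$text" | grep -c '\<compute'      # 2: at the start of any word
printf '%s\n' "$text" | grep -c 'compute\>'      # 3: at the end of any word
printf '%s\n' "$text" | grep -c '\bcompute\b'    # 2: a whole word on both sides
```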


Words of a given length (here 24 characters):

```
!grep -n --color '\<.\{24\}\>' /usr/share/dict/words
```

```
72632:formaldehydesulphoxylate
140339:pathologicopsychological
175108:scientificophilosophical
200796:tetraiodophenolphthalein
203042:thyroparathyroidectomize
```
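In extended syntax (`-E`) the same interval needs no backslashes. A sketch on made-up words, anchored with `^` and `$` so that each line must be exactly five characters long:

```shell
# Made-up word list.
animals='cat
horse
mouse
hippopotamus'

# Basic syntax: escaped braces.
printf '%s\n' "$animals" | grep '^.\{5\}$'      # horse, mouse
# Extended syntax: plain braces.
printf '%s\n' "$animals" | grep -E '^.{5}$'     # horse, mouse
```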


Words of a fixed length with specified first and last characters:

```
!grep -n --color '\<y...h\>' /usr/share/dict/words
```

```
234368:yamph
234449:yarth
234632:yerth
234702:yirth
234824:youth
```


Words with specified first and last characters, of any length:

```
!grep -n '\<q.*x\>' /usr/share/dict/words
```

```
161224:quadratrix
161400:quadruplex
161963:quincunx
```
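`.` matches exactly one character and `.*` any number of them, including none. Since `.` also matches spaces, `\<q.*x\>` can span several words on running text; restricting the middle to letters with `[[:alpha:]]*` keeps the match inside a single word. A sketch on made-up lines:

```shell
# Made-up input; the last line holds two words.
text='qax
qx
quincunx
quick fox'

printf '%s\n' "$text" | grep -c '\<q.x\>'              # 1: qax
printf '%s\n' "$text" | grep -c '\<q.*x\>'             # 4: also qx and "quick fox"
printf '%s\n' "$text" | grep -c '\<q[[:alpha:]]*x\>'   # 3: "quick fox" no longer matches
```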


## Boolean `OR`

With `-E`, `|` separates alternatives, and a line is printed if it matches any of them:

```
!grep -n --color -E 'computer|hardware' /usr/share/dict/words
```

```
40565:computer
82436:hardware
82437:hardwareman
```


```
!grep -n --color -E 'Rose|Violet' /usr/share/misc/birthtoken
```

```
4:February:Amethyst:Violet
8:June:Pearl:Rose
```
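Two other ways to write the same `OR`, sketched on made-up lines in the `birthtoken` format: repeating `-e` (each `-e` adds one pattern), or `\|` in basic syntax, which is a GNU extension and may not be supported by every `grep`.

```shell
# Made-up lines in the month:stone:flower format.
tokens='February:Amethyst:Violet
March:Aquamarine:Jonquil
June:Pearl:Rose'

printf '%s\n' "$tokens" | grep -E 'Rose|Violet'       # extended syntax
printf '%s\n' "$tokens" | grep -e 'Rose' -e 'Violet'  # one -e per alternative
printf '%s\n' "$tokens" | grep 'Rose\|Violet'         # basic syntax, GNU extension
# each command prints the February and June lines
```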


## Character classes

Match whole words `gray` or `grey`:

```
!grep -n --color '\<gr[ae]y\>' /usr/share/dict/words
```

```
79755:gray
79976:grey
```


Match two-letter words made of an upper-case letter followed by a lower-case letter:

```
!grep --color '\<[[:upper:]][[:lower:]]\>' /usr/share/misc/birthtoken
```

```
May:Emerald:Lily Of The Valley
```


Match numbers of any length (`\d` is Perl-style shorthand for a digit; the `grep` used here accepts it, but it is not POSIX syntax, and `[[:digit:]]+` is the portable equivalent):

```
!grep -n --color -E '\d+' /usr/share/calendar/calendar.australia
```

```
4: * $FreeBSD: src/usr.bin/calendar/calendars/calendar.australia,v 1.7 2006/10/06 23:20:01 flz Exp $
10:LANG=en_AU.ISO8859-1
13:Jan 26	Australia Day
15:Apr 25	Anzac Day
20:Mar 18	Canberra Day (ACT)
21:8/MonFirst	Bank Holiday (ACT, NSW)
22:10/MonFirst	Labour Day (ACT, NSW, SA)
25:3/MonSecond	Labour Day (Vic)
29:Feb 11	Regatta Day (Tas)
30:Feb 27	Launceston Cup (Tas)
31:Mar 11	Eight Hours Day (Tas)
33:Oct 10	Launceston Show Day (Tas)
34:Oct 24	Hobart Show Day (Tas)
35:Nov 04	Recreation Day (N Tas)
39:Dec 26	Proclamation Day holiday (SA)
42:3/MonFirst	Labour Day (WA)
43:[
```

# TODO

- `-v` (invert the match: print the lines that do not match)
- search directory recursively