```text
match 24 files found?
skip No files found.
Corresponding regex: \d+ files? found\?
```
### whitespaces
Whitespace characters include the space (shown here as \_), tab (\t), newline (\n), and carriage return (\r). In addition to these individual metacharacters, \s matches any whitespace character.
```text
match 1. abc
match 2. abc
match 3. abc
skip 4.abc
Corresponding regex: \d\.\s+abc
```
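The pattern above can be tried out with a minimal Python sketch, assuming Python's built-in `re` module. The sample strings are illustrative: they use a single space, a tab, and a run of spaces to stand in for different kinds of whitespace.

```python
import re

# \s+ requires at least one whitespace character between the dot and "abc".
pattern = re.compile(r"\d\.\s+abc")

# Illustrative samples: one space, a tab, several spaces, and no whitespace.
samples = ["1. abc", "2.\tabc", "3.    abc", "4.abc"]

# Keep only the samples that contain a match.
matched = [s for s in samples if pattern.search(s)]
print(matched)
```

The last sample is skipped because `\s+` needs at least one whitespace character; `\s*` would accept it as well.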

### starting and ending
It is best practice to write regular expressions that are as specific as possible, so that false positives do not creep in. For example, a search for 'success' in a file would also match 'Error: unsuccessful attempt'. To tighten patterns, the **hat (^)** and **dollar ($)** signs mark the start and end of a line. ***Note***: This use of the hat sign is different from the one shown earlier in this tutorial, where it excludes characters.
```text
match Mission: successful
skip Last Mission: unsuccessful
skip Next Mission: successful upon capture of target
Corresponding regex: ^Mission: successful$
```
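As a rough illustration, the anchored pattern can be compared with an unanchored one in Python's `re` module. One implementation detail to keep in mind: in Python, `^` and `$` anchor the whole string by default, and the `re.MULTILINE` flag is what makes them apply to each line of a longer text.

```python
import re

lines = [
    "Mission: successful",
    "Last Mission: unsuccessful",
    "Next Mission: successful upon capture of target",
]

# Anchored: only a line that is exactly "Mission: successful" matches.
anchored = [s for s in lines if re.search(r"^Mission: successful$", s)]
print(anchored)

# Unanchored: the third line now matches too, because it contains the phrase.
unanchored = [s for s in lines if re.search(r"Mission: successful", s)]
print(unanchored)

# For multi-line text, re.MULTILINE lets ^ and $ match at every line break.
text = "\n".join(lines)
per_line = re.findall(r"^Mission: successful$", text, flags=re.MULTILINE)
whole_string = re.findall(r"^Mission: successful$", text)
print(per_line, whole_string)
```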

### match groups
Regular expressions can also extract information for further processing. This is done by defining groups of characters and capturing them with the parentheses metacharacters **(** and **)**. Any subpattern inside a pair of parentheses is captured as a group. For example, **^(IMG\d+\.png)$** captures the full image filename; if the extension is not required, **^(IMG\d+)\.png$** captures only the part before the period.
Nested groups can be used to extract multiple layers of information. Continuing the previous example, both the filename and the picture number can be extracted with a single pattern such as **^(IMG(\d+))\.png$**. Groups are numbered from left to right by the position of their opening parenthesis, so the first capture group is the contents of the first parenthesized group, and so on.
```text
capture Jan 1987 -> Jan 1987 1987
capture May 1969 -> May 1969 1969
capture Aug 2011 -> Aug 2011 2011
Corresponding regex: (\w+\s(\d+))
```
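The group numbering described above can be checked with a short Python sketch using the `re` module. The filename `IMG0042.png` is a made-up example, not taken from the tutorial.

```python
import re

# Nested groups: group 1 is the name without the extension, group 2 the number.
m = re.search(r"^(IMG(\d+))\.png$", "IMG0042.png")  # hypothetical filename
filename, number = m.group(1), m.group(2)
print(filename, number)

# The date pattern from the block above: outer group first, inner group second.
date_pattern = re.compile(r"(\w+\s(\d+))")
dates = [date_pattern.search(s).groups() for s in ["Jan 1987", "May 1969", "Aug 2011"]]
print(dates)
```

In Python, `group(0)` returns the full matched text, while `groups()` returns only the numbered capture groups, starting at group 1.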

### conditionals
The **| (logical OR, also known as the pipe)** denotes alternative sets of characters. For example, "Buy more (milk|bread|juice)" matches only the strings _Buy more milk_, _Buy more bread_, or _Buy more juice_.
```text
match I love cats
match I love dogs
skip I love logs
skip I love cogs
Corresponding regex: I love (cats|dogs)
```
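A small Python sketch of the same alternation, again assuming the built-in `re` module. Because the alternatives sit inside parentheses, the one that matched is also available as a capture group.

```python
import re

pattern = re.compile(r"I love (cats|dogs)")

samples = ["I love cats", "I love dogs", "I love logs", "I love cogs"]

# Map each sample to whether the whole string matches the pattern.
results = {s: bool(pattern.fullmatch(s)) for s in samples}
print(results)

# The parentheses also capture which alternative was found.
choice = pattern.search("I love dogs").group(1)
print(choice)
```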

### back referencing and other special characters
Back referencing varies between implementations. However, many systems let you reference captured groups with **\0** (usually the full matched text), **\1** (group 1), **\2** (group 2), and so on. For example, a replacement string of **"\2-\1"** puts the second captured group first and the first captured group second.
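
As one concrete case of how implementations differ, here is a minimal sketch using `re.sub` from Python's `re` module. Python accepts `\1` and `\2` in replacement strings, but it refers to the full match as `\g<0>` rather than `\0`. The input string is borrowed from the match groups example.

```python
import re

# "\2-\1" puts group 2 first and group 1 second.
swapped = re.sub(r"(\w+)\s(\d+)", r"\2-\1", "Jan 1987")
print(swapped)

# In Python the whole match is written \g<0> in a replacement string.
wrapped = re.sub(r"\w+\s\d+", r"[\g<0>]", "Jan 1987")
print(wrapped)
```

Other tools use different spellings for the same idea, so it is worth checking the documentation of the engine in use.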
Additionally, the special metacharacter \b matches the boundary between a word character and a non-word character. It is most useful for capturing entire words (for example, with the pattern \w+\b).
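
The effect of \b can be seen in a brief Python sketch; the sample sentences are invented for illustration. Note that Python needs a raw string (`r"..."`) here, since `"\b"` in an ordinary string literal is a backspace character.

```python
import re

# \w+\b collects whole words and leaves punctuation and spaces behind.
words = re.findall(r"\w+\b", "Buy more milk, bread and juice.")
print(words)

# \b on both sides keeps "cat" from matching inside "concatenate".
hits = re.findall(r"\bcat\b", "cat concatenate cat.")
print(hits)
```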