# Work in progress : )

# Introduction to regular expressions with examples

by Victoria Liao

This tutorial is written in Scala, but most of the patterns can be used in other languages.

-----------------------------------------------------------

### Table of contents

1. What is a regular expression?
1. Regular expression patterns
    - Basic tokens
    - Practical problems
1. Reference

-----------------------------------------------------------


## 1. What is a regular expression?

A string is a sequence of characters, and a regular expression is a pattern that describes which sequences of characters to match.

Regular expressions are useful for extracting information from text.

---------------------------------------
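As a small illustration of extracting information from text, the sketch below pulls a date out of a sentence. The sample sentence and the names `logLine`, `datePattern`, and `shippedOn` are made up for this example; the tokens `\d` and `{m}` that it uses are covered in section 2.

```scala
import scala.util.matching.Regex

// Hypothetical input, used only for illustration.
val logLine = "order 1042 shipped on 2021-06-05"

// Four digits, a dash, two digits, a dash, two digits.
val datePattern: Regex = "\\d{4}-\\d{2}-\\d{2}".r

// findFirstIn returns Some(match) for the first match, or None if there is none.
val shippedOn: Option[String] = datePattern.findFirstIn(logLine)

println(shippedOn) // Some(2021-06-05)
```

The order number `1042` is skipped because it is not followed by a dash, so the first complete match is the date.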

## 2. Regular expression patterns

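To keep things simple, every example in this section uses `Regex#findFirstIn`, which returns the first match wrapped in an `Option`: `Some(match)`, or `None` if nothing matches.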
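Because `findFirstIn` returns an `Option[String]`, its result is usually handled by pattern matching on `Some` and `None`. The following is a minimal sketch; the helper `describe` and the sample strings are hypothetical and exist only for this illustration.

```scala
import scala.util.matching.Regex

val digits: Regex = "\\d+".r

// Hypothetical helper: turn the Option returned by findFirstIn into a message.
def describe(text: String): String =
  digits.findFirstIn(text) match {
    case Some(found) => s"first match: $found"
    case None        => "no match"
  }

println(describe("room 42, floor 3")) // first match: 42
println(describe("no numbers here"))  // no match
```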

### 2.1 Basic tokens

#### Exact match 

1. "abc": Match a substring that is the same as the pattern

--------------------

#### Digit

1. \d: Any digit from 0 to 9
1. \D: Any non-digit character

--------------------

#### Wildcard

1. Dot ".": the wildcard

--------------------

#### Match character

1. [abc]: Match specific characters
1. [^abc]: Exclude specific characters

--------------------

#### Range

1. [a-z]: Match a char within the range
1. [^a-z]: Exclude a char within the range
1. [a-z0-9]: Match a char within multiple ranges

--------------------

#### Alphanumeric

1. "\w": Any alphanumeric character (or underscore)
1. "\W": Any non-alphanumeric character

----------------------

#### Repetitions 

1. {m}: m repetitions
1. {m,n}: m to n repetitions
1. {m,}: m or more repetitions
1. Kleene Star * : 0 or more repetitions
1. Kleene Plus + : 1 or more repetitions

----------------------

In [169]:
// Scala dependency
import scala.util.matching.Regex

import scala.util.matching.Regex

----------------------

### Exact match 

#### "abc": Match a substring that is the same as the pattern

`"foo 1"` matches `"foo 1"` in `"foo 1 fooo"`.


In [170]:
val pattern = "foo 1".r
val text = "foo 1 fooo"
pattern findFirstIn text

pattern: Regex = foo 1
text: String = "foo 1 fooo"
res169_2: Option[String] = Some("foo 1")

-----------------------------------

### Digit

#### \d: Any digit from 0 to 9 

The preceding backslash `\` distinguishes it from the literal `d` character and indicates that it is a metacharacter.

> **Note**: In an ordinary Scala string literal the backslash itself must be escaped, so `\d` is written `"\\d".r`. A triple-quoted string needs no escaping: `"""\d""".r`.

`"\\d"`:

- matches `1` in `1234`
- matches `2` in `2 foo`

-------------------------
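A Scala triple-quoted string does not process backslash escapes, so it offers an alternative to doubling the backslash. The sketch below builds `\d` both ways and compares the resulting pattern text; the names `escaped` and `raw` are chosen here for illustration.

```scala
import scala.util.matching.Regex

val escaped: Regex = "\\d".r    // ordinary literal: the backslash is doubled
val raw: Regex     = """\d""".r // triple-quoted literal: no doubling needed

// Both values hold the same pattern text, \d.
println(escaped.regex == raw.regex) // true
println(raw.findFirstIn("2 foo"))   // Some(2)
```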

#### \D: Any non-digit character

`"\\D"`:

- matches `" "` (space) in `1234 a`
- matches `a` in `a 2 foo`


In [171]:
val pattern = "\\d".r
val text = "1234"
pattern findFirstIn text

pattern: Regex = \d
text: String = "1234"
res170_2: Option[String] = Some("1")

In [172]:
val pattern = "\\d".r
val text = "2 foo"
pattern findFirstIn text

pattern: Regex = \d
text: String = "2 foo"
res171_2: Option[String] = Some("2")

In [173]:
val pattern = "\\D".r
val text = "1234 a"
pattern findFirstIn text

pattern: Regex = \D
text: String = "1234 a"
res172_2: Option[String] = Some(" ")

In [174]:
val pattern = "\\D".r
val text = "a 2 foo"
pattern findFirstIn text

pattern: Regex = \D
text: String = "a 2 foo"
res173_2: Option[String] = Some("a")

---------------------

### Wildcard

#### Dot ".": the wildcard

In poker, a wildcard is a card that can stand in for any other card in the deck. Similarly, `.` (dot) matches any single character (letter, digit, whitespace, punctuation), except that by default it does not match a line terminator.


**Note**: 
```
.  is the wildcard
\\. is the dot symbol / period
```
-----------
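The difference between the wildcard and an escaped dot can be checked directly. In this sketch the names `anyChar` and `literalDot` and the sample strings are illustrative; the last line relies on the JVM default that `.` does not match a line terminator.

```scala
val anyChar = "3.14".r      // "." matches any single character
val literalDot = "3\\.14".r // "\\." matches only a period

println(anyChar.findFirstIn("3x14"))    // Some(3x14)
println(literalDot.findFirstIn("3x14")) // None
println(literalDot.findFirstIn("3.14")) // Some(3.14)

// With default flags, "." finds nothing in a string that is only a newline.
println(".".r.findFirstIn("\n"))        // None
```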

`...\\.`: matches
- `"cat."`
- `"896."`
- `"?=+."`

and skips
- `"abc1"`

In [175]:
val pattern = "...\\.".r
val text = "cat."
pattern findFirstIn text

pattern: Regex = ...\.
text: String = "cat."
res174_2: Option[String] = Some("cat.")

In [176]:
val pattern = "...\\.".r
val text = "abc1"
pattern findFirstIn text

pattern: Regex = ...\.
text: String = "abc1"
res175_2: Option[String] = None

### Match character

#### [abc]: Match specific characters

Define the specific characters you want to match inside square brackets.

For example, the pattern `[abc]` will only match a single `a`, `b`, or `c` letter and nothing else.

-----------

`[cmf]an`: matches
- `"can"`
- `"man"`
- `"fan"`

and skips
- `"dan"`
- `"ran"`
- `"pan"`

-----------

#### [^abc]: Exclude specific characters

We exclude specific characters by placing a `^` (caret) right after the opening square bracket.
For example, the pattern `[^abc]` will match any single character except for the letters `a`, `b`, or `c`.

-----------

`[^cmf]an`: matches
- `"dan"`
- `"ran"`
- `"pan"`

and skips
- `"can"`
- `"man"`
- `"fan"`



In [177]:
val pattern = "[cmf]an".r
val text = "can"
pattern findFirstIn text

pattern: Regex = [cmf]an
text: String = "can"
res176_2: Option[String] = Some("can")

In [178]:
val pattern = "[cmf]an".r
val text = "dan"
pattern findFirstIn text

pattern: Regex = [cmf]an
text: String = "dan"
res177_2: Option[String] = None

In [179]:
val pattern = "[^cmf]an".r
val text = "dan"
pattern findFirstIn text

pattern: Regex = [^cmf]an
text: String = "dan"
res178_2: Option[String] = Some("dan")

In [180]:
val pattern = "[^cmf]an".r
val text = "can"
pattern findFirstIn text

pattern: Regex = [^cmf]an
text: String = "can"
res179_2: Option[String] = None

-------------------------------------------

### Range

#### [a-z]: Match a char within the range

Match one character from a run of sequential characters by using a dash to indicate the character range.

`[0-6]`: match any single digit character from `0` to `6` 

-------------------------------------------

#### [^a-z]: Exclude a char within the range

`[^n-p]`: match any single character except for letters `n` to `p`

-------------------------------------------

#### [a-z0-9]: Match a char within multiple ranges
Multiple character ranges can also be used in the same set of brackets.

`[A-Z0-9]`: match any single character from `A` to `Z` or from `0` to `9`

-------------------------------------------
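The three range patterns above can be tried with `findFirstIn` as follows. The sample strings and the value names are assumptions made for this sketch.

```scala
val zeroToSix = "[0-6]".r
val notNtoP = "[^n-p]".r
val upperOrDigit = "[A-Z0-9]".r

// 7, 8, 9 and the space are outside 0-6, so the first match is 5.
println(zeroToSix.findFirstIn("789 5"))   // Some(5)

// n, o and p are excluded, so the first match is e.
println(notNtoP.findFirstIn("nope"))      // Some(e)

// Lowercase letters are in neither range, so the first match is 7.
println(upperOrDigit.findFirstIn("ab7C")) // Some(7)
```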

In [181]:
val pattern = "[A-C][n-p][a-c]".r
val text = "Ana"
pattern findFirstIn text

pattern: Regex = [A-C][n-p][a-c]
text: String = "Ana"
res180_2: Option[String] = Some("Ana")

In [182]:
val pattern = "[A-C][n-p][a-c]".r
val text = "aax"
pattern findFirstIn text

pattern: Regex = [A-C][n-p][a-c]
text: String = "aax"
res181_2: Option[String] = None

In [183]:
val pattern = "[A-C0-9][A-C0-9]".r
val text = "A0x"
pattern findFirstIn text

pattern: Regex = [A-C0-9][A-C0-9]
text: String = "A0x"
res182_2: Option[String] = Some("A0")

-----------------------------

### Alphanumeric

#### "\w": Any alphanumeric character

Equivalent to the character range `[A-Za-z0-9_]`, so the underscore also counts.

`\\w`: matches
- `A` in `Ana`
- `0` in `*012`

and skips `"***"`

-------------------------
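One consequence of the equivalence with `[A-Za-z0-9_]` is that an underscore counts as a word character. A short sketch, assuming the JVM's default (ASCII) meaning of `\w`; the value names `word` and `explicit` are illustrative.

```scala
val word = "\\w".r
val explicit = "[A-Za-z0-9_]".r

// The underscore is the only word character in "*_*".
println(word.findFirstIn("*_*"))     // Some(_)
println(explicit.findFirstIn("*_*")) // Some(_)

// A space is not a word character.
println(word.findFirstIn("* *"))     // None
```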

#### "\W": Any non-alphanumeric character

`\\W`: matches `*` in `"***"`

and skips
- `Ana`
- `0123Bob` (note that `0123 Bob` would match, because the space is a non-alphanumeric character)


In [184]:
val pattern = "\\w".r
val text = "*012"
pattern findFirstIn text

pattern: Regex = \w
text: String = "*012"
res183_2: Option[String] = Some("0")

In [185]:
val pattern = "\\w".r
val text = "***"
pattern findFirstIn text

pattern: Regex = \w
text: String = "***"
res184_2: Option[String] = None

In [186]:
val pattern = "\\W".r
val text = "***"
pattern findFirstIn text

pattern: Regex = \W
text: String = "***"
res185_2: Option[String] = Some("*")

In [187]:
val pattern = "\\W".r
val text = "Ana"
pattern findFirstIn text

pattern: Regex = \W
text: String = "Ana"
res186_2: Option[String] = None

----------------------------

### Repetitions

#### {m}: m repetitions

`B{3}`:  match the `B` character exactly three times

----------------------------

#### {m,n}: m to n repetitions

`B{1,3}`: match the `B` character 1 to 3 times

----------------------------

#### {m,}: m or more repetitions

`B{3,}`: match the `B` character at least 3 times

----------------------------

> **Note**: `{,m}` is illegal in Java regular expressions, which Scala uses on the JVM; write `{0,m}` instead.

----------------------------
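Since `{,m}` is rejected, an upper bound on its own can be written as `{0,m}`. The sketch below shows this and uses `scala.util.Try` to observe the failure without crashing; it assumes the JVM regex engine, and the value names are illustrative.

```scala
import scala.util.Try

// "{0,3}" is the legal way to say "at most three".
val upToThree = "pur{0,3}".r
println(upToThree.findFirstIn("purrrrrrr")) // Some(purrr)
println(upToThree.findFirstIn("pu"))        // Some(pu)

// The pattern is compiled when .r is called, so the error surfaces here.
val illegal = Try("pur{,3}".r)
println(illegal.isFailure) // true
```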

#### Kleene Star * : 0 or more repetitions

`\d*`: match zero or more digits (the match may be empty)

----------------------------

#### Kleene Plus + : 1 or more repetitions

`\d+`: match one or more digits

----------------------------
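The difference between `*` and `+` shows up when the text does not start with a digit: `\d*` is satisfied by an empty match at the very first position, while `\d+` keeps searching until it finds a digit. A minimal sketch with illustrative names:

```scala
val star = "\\d*".r
val plus = "\\d+".r

// The star accepts zero digits, so the first match is the empty string at index 0.
println(star.findFirstIn("abc123")) // Some()

// The plus needs at least one digit, so it skips ahead to "123".
println(plus.findFirstIn("abc123")) // Some(123)

// When the text starts with digits, both behave the same.
println(star.findFirstIn("123abc")) // Some(123)
```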

#### Task

- Match: `aaaabcc`, `aabbbbc`, `aacc`
- Skip: `a`

#### Solutions

- `a\w+`
- `a{2}[abc]*`
- `aa+b*c+`
- `a{2,4}b{0,4}c{1,2}`

----------------------------
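The four solutions can be checked mechanically. The sketch below is one way to do it and is not part of the original exercise: it requires that the first match covers each whole string in the match list and that nothing is found in the skip list. The names `solutions`, `shouldMatch`, `shouldSkip`, and `results` are hypothetical.

```scala
val solutions = List("a\\w+", "a{2}[abc]*", "aa+b*c+", "a{2,4}b{0,4}c{1,2}")
val shouldMatch = List("aaaabcc", "aabbbbc", "aacc")
val shouldSkip = List("a")

// For each pattern, check both lists and record whether it passes.
val results: List[(String, Boolean)] = solutions.map { source =>
  val regex = source.r
  val matchesAll = shouldMatch.forall(t => regex.findFirstIn(t).contains(t))
  val skipsAll = shouldSkip.forall(t => regex.findFirstIn(t).isEmpty)
  (source, matchesAll && skipsAll)
}

results.foreach { case (source, ok) => println(s"$source -> $ok") }
```

Each of the four lines printed should end in `true`.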



In [188]:
val pattern = "pur{3}".r
val text = "purrrrr"
pattern findFirstIn text

pattern: Regex = pur{3}
text: String = "purrrrr"
res187_2: Option[String] = Some("purrr")

In [189]:
val pattern = "pur{1,3}".r
val text = "purrr"
pattern findFirstIn text

pattern: Regex = pur{1,3}
text: String = "purrr"
res188_2: Option[String] = Some("purrr")

In [190]:
val pattern = "pur{1,3}".r
val text = "pu"
pattern findFirstIn text

pattern: Regex = pur{1,3}
text: String = "pu"
res189_2: Option[String] = None

In [191]:
val pattern = "pur{3,}".r
val text = "purrrrrrr"
pattern findFirstIn text

pattern: Regex = pur{3,}
text: String = "purrrrrrr"
res190_2: Option[String] = Some("purrrrrrr")

In [192]:
// Note: {,m} is Illegal

// val pattern = "pur{,3}".r
// val text = "purrrrrrr"
// pattern findFirstIn text 

// Error msg you will get: 

// java.util.regex.PatternSyntaxException: Illegal repetition near index 2
// pur{,3}
//   ^
//   java.util.regex.Pattern.error(Pattern.java:2027)
//   java.util.regex.Pattern.closure(Pattern.java:3320)
//   java.util.regex.Pattern.sequence(Pattern.java:2213)
// ...

In [193]:
val pattern = "\\w+".r
val text = ""
pattern findFirstIn text

pattern: Regex = \w+
text: String = ""
res192_2: Option[String] = None

In [194]:
val pattern = "\\w*".r
val text = ""
pattern findFirstIn text

pattern: Regex = \w*
text: String = ""
res193_2: Option[String] = Some("")

In [195]:
val pattern = "\\w*".r
val text = "anyAlphanumeric"
pattern findFirstIn text

pattern: Regex = \w*
text: String = "anyAlphanumeric"
res194_2: Option[String] = Some("anyAlphanumeric")

-----------------------------------------------


# Reference

1. Regexone.com. 2021. RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and the ABCs. [online] Available at: [RegexOne - Learn Regular Expressions, 2021](https://regexone.com/lesson/introduction_abcs) [Accessed 5 June 2021].
1. Tutorialspoint.com. 2021. Scala - Regular Expressions - Tutorialspoint. [online] Available at: [Scala - Regular Expressions - Tutorialspoint, 2021](https://www.tutorialspoint.com/scala/scala_regular_expressions.htm) [Accessed 5 June 2021].
1. Dib, F., 2021. regex101: build, test, and debug regex. [online] regex101. Available at: [Dib, 2021](https://regex101.com/) [Accessed 5 June 2021].