# Introduction to regular expressions

by Victoria Liao

Note: this tutorial is written in Scala, but the patterns can be used in other languages.

-----------------------------------------------------------

### Table of contents

1. What is a regular expression
1. Regular expression patterns
    - Basic tokens
    - Practical problems
1. Reference

------------------------

Export the Jupyter notebook to Markdown / PDF / HTML:

```
jupyter nbconvert --to pdf your_jupyter_notebook.ipynb
jupyter nbconvert --to markdown your_jupyter_notebook.ipynb
jupyter nbconvert --to html your_jupyter_notebook.ipynb
```

## 1. What is a regular expression?

A string is a sequence of characters. A regular expression is a pattern, itself written as a string, that describes which sequences of characters to match.

Regular expressions are useful for extracting information from text.

---------------------------------------

----------------------
## 2. Regular expression patterns

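To keep things simple, every example in this section uses `findFirstIn`, which returns the first match wrapped in an `Option`: `Some(match)` if the pattern is found, `None` otherwise.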
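Since every example relies on `findFirstIn`, it may help to see how its `Option[String]` result can be handled. The following is a minimal sketch; the pattern, the sample strings, and the `describe` helper are hypothetical and used only for illustration.

```scala
import scala.util.matching.Regex

// Hypothetical sample pattern, chosen only for illustration.
val word: Regex = "cat".r

// findFirstIn returns Some(match) on success and None when nothing matches.
val found: Option[String] = word.findFirstIn("a cat sat")    // Some("cat")
val missing: Option[String] = word.findFirstIn("a dog sat")  // None

// Pattern matching turns the Option into a readable message.
def describe(result: Option[String]): String = result match {
  case Some(m) => s"matched '$m'"
  case None    => "no match"
}

val foundMessage: String = describe(found)      // "matched 'cat'"
val missingMessage: String = describe(missing)  // "no match"

// getOrElse supplies a fallback value when there is no match.
val fallback: String = missing.getOrElse("<none>")
```

Handling both cases explicitly avoids calling `.get` on a `None`, which would throw an exception.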

### 2.1 Basic tokens

#### Exact match 
1. "abc": Match a substring that is the same as the pattern
--------------------

#### Digit
1. \d: Any digit from 0 to 9
1. \D: Any Non-digit character
--------------------

#### Wildcard
1. Dot ".": the wildcard

--------------------

#### Match character
1. [abc]: Match specific characters
1. [^abc]: Exclude specific characters

--------------------

#### Range
1. [a-z]: Match a char within the range
1. [^a-z]: Exclude a char within the range
1. [a-z0-9]: Match a char within multiple ranges

--------------------

#### Alphanumeric

1. "\w": Any Alphanumeric character
1. "\W": Any Non-alphanumeric character
----------------------


```scala
// Scala dependency
import scala.util.matching.Regex
```

    import scala.util.matching.Regex

### Exact match 
#### "abc": Match a substring that is the same as the pattern


Pattern `"foo 1"` 

Match `"foo 1"` in 
 - `"foo 1 fooo"`
 - `"bar foo 1"`

```scala
val pattern = "foo 1".r
val text = "foo 1 fooo"
pattern findFirstIn text
```

    pattern: Regex = foo 1
    text: String = "foo 1 fooo"
    res34_2: Option[String] = Some("foo 1")

-----------------------------------

### Digit

#### \d: Any digit from 0 to 9 

The preceding backslash `\` distinguishes it from the plain character `d` and marks it as a metacharacter.

> Takeaway: in a regular Scala string literal the backslash itself must be escaped, so `\d` is written as `"\\d".r`
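As a small sketch of why the double backslash is needed, the example below compares a regular string literal with a triple-quoted string, which does not treat the backslash as an escape character. The value names and the sample text `"room 42"` are hypothetical.

```scala
import scala.util.matching.Regex

// In a regular string literal, "\\d" is the two characters \ and d.
val escaped: Regex = "\\d".r

// In a triple-quoted string, the backslash is kept as written.
val raw: Regex = """\d""".r

// Both literals produce the same regex source text.
val samePattern: Boolean = escaped.regex == raw.regex

// Either form finds the first digit in the sample text.
val firstDigit: Option[String] = raw.findFirstIn("room 42")
```

Triple-quoted strings can be easier to read once a pattern contains several backslashes, but either form works.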

-----------

Pattern `"\\d"`

 Match `1` in `1234`
 
 Match `2` in `2 foo`

-------------------------

#### \D: Any non-digit character


Pattern `"\\D"`

 Match `" "` (space) in `1234 a`
 
 Match `a` in `a 2 foo`
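To illustrate how `\d` and `\D` complement each other, here is a hedged sketch that goes slightly beyond `findFirstIn`. It uses `findAllIn` and `replaceAllIn`, two other methods of `scala.util.matching.Regex` that are not used elsewhere in this tutorial; the phone number is a made-up input.

```scala
import scala.util.matching.Regex

val digit: Regex = "\\d".r
val nonDigit: Regex = "\\D".r

// Hypothetical input, used only for illustration.
val phone = "(555) 123-4567"

// findAllIn returns an iterator over every match, not just the first one.
val allDigits: List[String] = digit.findAllIn(phone).toList

// Replacing every non-digit with the empty string keeps only the digits.
val digitsOnly: String = nonDigit.replaceAllIn(phone, "")

// The first non-digit character is the opening parenthesis.
val firstNonDigit: Option[String] = nonDigit.findFirstIn(phone)
```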


```scala
val pattern = "\\d".r
val text = "1234"
pattern findFirstIn text
```

    pattern: Regex = \d
    text: String = "1234"
    res35_2: Option[String] = Some("1")

```scala
val pattern = "\\d".r
val text = "2 foo"
pattern findFirstIn text
```

    pattern: Regex = \d
    text: String = "2 foo"
    res36_2: Option[String] = Some("2")

```scala
val pattern = "\\D".r
val text = "1234 a"
pattern findFirstIn text
```

    pattern: Regex = \D
    text: String = "1234 a"
    res37_2: Option[String] = Some(" ")

```scala
val pattern = "\\D".r
val text = "a 2 foo"
pattern findFirstIn text
```

    pattern: Regex = \D
    text: String = "a 2 foo"
    res38_2: Option[String] = Some("a")

---------------------

### Wildcard

#### Dot ".": the wildcard

In poker, a wildcard is a card that can stand in for any other card in the deck. Similarly, `.` (dot) matches any single character (letter, digit, whitespace, or symbol), except for line terminators such as `\n` by default.

```
Note: 
.  is the wildcard
\\. is the dot symbol / period
```
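The difference between the wildcard and the escaped dot can be sketched as follows. The patterns `a.c` and `a\.c` and the sample strings are hypothetical, chosen only to make the contrast visible.

```scala
import scala.util.matching.Regex

val wildcard: Regex = "a.c".r      // . matches any single character
val literalDot: Regex = "a\\.c".r  // \. matches only a period

val w1 = wildcard.findFirstIn("abc")    // the wildcard accepts 'b'
val w2 = wildcard.findFirstIn("a.c")    // the wildcard also accepts '.'
val d1 = literalDot.findFirstIn("abc")  // 'b' is not a period
val d2 = literalDot.findFirstIn("a.c")  // the period matches literally

// By default the dot does not match a line terminator such as \n.
val w3 = wildcard.findFirstIn("a\nc")
```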
-----------

Pattern `...\\.` 

Match 
- `"cat."`
- `"896."`
- `"?=+."`	

Skip	
- `abc1`

```scala
val pattern = "...\\.".r
val text = "cat."
pattern findFirstIn text
```

    pattern: Regex = ...\.
    text: String = "cat."
    res39_2: Option[String] = Some("cat.")

```scala
val pattern = "...\\.".r
val text = "abc1"
pattern findFirstIn text
```

    pattern: Regex = ...\.
    text: String = "abc1"
    res40_2: Option[String] = None

### Match character

#### [abc]: Match specific characters

Define the specific characters you want to match inside square brackets.

For example, the pattern `[abc]` will only match a single `a`, `b`, or `c` letter and nothing else.

-----------

Pattern `[cmf]an`

Match 
- `"can"`
- `"man"`
- `"fan"`	

Skip	
- `dan`
- `ran`
- `pan`

-----------

#### [^abc]: Exclude specific characters

We exclude specific characters by using the square brackets and the `^` (hat). 
For example, the pattern `[^abc]` will match any single character except for the letters `a`, `b`, or `c`.

-----------

Pattern `[^cmf]an`

Match 
- `dan`
- `ran`
- `pan`

Skip	
- `"can"`
- `"man"`
- `"fan"`	
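The two patterns can also be compared side by side. This is a minimal sketch that filters the six sample words with each pattern; the value names are hypothetical.

```scala
import scala.util.matching.Regex

val included: Regex = "[cmf]an".r
val excluded: Regex = "[^cmf]an".r

// The six sample words used in the Match and Skip lists.
val words = List("can", "man", "fan", "dan", "ran", "pan")

// Keep the words in which each pattern finds a match.
val matchedByIncluded: List[String] =
  words.filter(w => included.findFirstIn(w).isDefined)
val matchedByExcluded: List[String] =
  words.filter(w => excluded.findFirstIn(w).isDefined)
```

Each word ends up in exactly one of the two lists, because the first letter is either in the set `c`, `m`, `f` or it is not.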



```scala
val pattern = "[cmf]an".r
val text = "can"
pattern findFirstIn text
```

    pattern: Regex = [cmf]an
    text: String = "can"
    res41_2: Option[String] = Some("can")

```scala
val pattern = "[cmf]an".r
val text = "dan"
pattern findFirstIn text
```

    pattern: Regex = [cmf]an
    text: String = "dan"
    res42_2: Option[String] = None

```scala
val pattern = "[^cmf]an".r
val text = "dan"
pattern findFirstIn text
```

    pattern: Regex = [^cmf]an
    text: String = "dan"
    res43_2: Option[String] = Some("dan")

```scala
val pattern = "[^cmf]an".r
val text = "can"
pattern findFirstIn text
```

    pattern: Regex = [^cmf]an
    text: String = "can"
    res44_2: Option[String] = None

-------------------------------------------

### Range

#### [a-z]: Match a char within the range

Use a dash inside square brackets to match a single character from a range of sequential characters.

`[0-6]`: match any single digit character from `0` to `6` 

#### [^a-z]: Exclude a char within the range

`[^n-p]`: match any single character except for letters `n` to `p`

#### [a-z0-9]: Match a char within multiple ranges
Multiple character ranges can also be used in the same set of brackets 

`[A-Z0-9]`: match any single character from `A` to `Z` or from `0` to `9`
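As a further illustration of combining ranges, the sketch below uses a hypothetical hexadecimal-digit pattern with three ranges in one set of brackets, plus a negated range. The sample strings are made up.

```scala
import scala.util.matching.Regex

// Hypothetical example: a hexadecimal digit is 0-9, A-F, or a-f.
val hexDigit: Regex = "[0-9A-Fa-f]".r

// A negated range: any single character that is not a lowercase letter.
val notLowercase: Regex = "[^a-z]".r

val h1 = hexDigit.findFirstIn("xyz7")      // only '7' is in one of the ranges
val h2 = hexDigit.findFirstIn("ghij")      // no character is in any range
val n1 = notLowercase.findFirstIn("abcD")  // 'D' is outside a-z
val n2 = notLowercase.findFirstIn("abcd")  // every character is inside a-z
```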



```scala
val pattern = "[A-C][n-p][a-c]".r
val text = "Ana"
pattern findFirstIn text
```

    pattern: Regex = [A-C][n-p][a-c]
    text: String = "Ana"
    res45_2: Option[String] = Some("Ana")

```scala
val pattern = "[A-C][n-p][a-c]".r
val text = "aax"
pattern findFirstIn text
```

    pattern: Regex = [A-C][n-p][a-c]
    text: String = "aax"
    res46_2: Option[String] = None

```scala
val pattern = "[A-C0-9][A-C0-9]".r
val text = "A0x"
pattern findFirstIn text
```

    pattern: Regex = [A-C0-9][A-C0-9]
    text: String = "A0x"
    res47_2: Option[String] = Some("A0")

-----------------------------

### Alphanumeric

#### "\w": Any alphanumeric character

Equivalent to the character class `[A-Za-z0-9_]`, so it matches letters, digits, and the underscore.

Pattern  `\\w`

Match 
- `A` in `Ana`	
- `0` in `*012`

Skip `"***"`
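The equivalence between `\w` and `[A-Za-z0-9_]` can be checked on a few characters. This is only a sketch over a handful of hypothetical ASCII samples, not an exhaustive proof.

```scala
import scala.util.matching.Regex

val wordChar: Regex = "\\w".r
val explicitClass: Regex = "[A-Za-z0-9_]".r

// Hypothetical ASCII samples: letters, a digit, an underscore, and symbols.
val samples = List("A", "z", "7", "_", "*", " ", "-")

// For each sample, both patterns should agree on whether there is a match.
val agree: Boolean = samples.forall { s =>
  wordChar.findFirstIn(s).isDefined == explicitClass.findFirstIn(s).isDefined
}

// The underscore counts as a word character even though it is not a letter or digit.
val underscore: Option[String] = wordChar.findFirstIn("*_*")
```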

-------------------------

#### "\W":	Any Non-alphanumeric character

Pattern  `\\W`

Match  `*` in `"***"`

Skip 
- `Ana`	
- `0123 Bob`


```scala
val pattern = "\\w".r
val text = "*012"
pattern findFirstIn text
```

    pattern: Regex = \w
    text: String = "*012"
    res48_2: Option[String] = Some("0")

```scala
val pattern = "\\w".r
val text = "***"
pattern findFirstIn text
```

    pattern: Regex = \w
    text: String = "***"
    res49_2: Option[String] = None

```scala
val pattern = "\\W".r
val text = "***"
pattern findFirstIn text
```

    pattern: Regex = \W
    text: String = "***"
    res50_2: Option[String] = Some("*")

```scala
val pattern = "\\W".r
val text = "Ana"
pattern findFirstIn text
```

    pattern: Regex = \W
    text: String = "Ana"
    res51_2: Option[String] = None

## 3. Reference

1. Regexone.com. 2021. RegexOne - Learn Regular Expressions - Lesson 1: An Introduction, and the ABCs. [online] Available at: [RegexOne - Learn Regular Expressions, 2021](https://regexone.com/lesson/introduction_abcs) [Accessed 5 June 2021].
1. Tutorialspoint.com. 2021. Scala - Regular Expressions - Tutorialspoint. [online] Available at: [Scala - Regular Expressions - Tutorialspoint, 2021](https://www.tutorialspoint.com/scala/scala_regular_expressions.htm) [Accessed 5 June 2021].
1. Dib, F., 2021. regex101: build, test, and debug regex. [online] regex101. Available at: [Dib, 2021](https://regex101.com/) [Accessed 5 June 2021].