# Regular expressions, interactive examples

In [None]:
import sys
sys.path.append("../src/")

In [None]:
# Use the autoreload extension so that imported modules are reloaded automatically, without restarting the notebook
%load_ext autoreload
%autoreload 2

In [None]:
from regex_intro import show_regex_search_result

## Part 0: The math

Regular expressions were formulated by the mathematician Stephen Kleene, who used them to describe what are called regular languages. They are defined as follows:

Given a finite alphabet $\Sigma$, regular expressions are built from these rules:

- $\emptyset$, denoting the empty set, is a regular expression
- Every symbol $a_i \in \Sigma$ is a regular expression, written **$a_i$**
- If $x$ and $y$ are regular expressions, then $(x|y)$, $(xy)$, and $(x^*)$ are regular expressions
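
The three building rules map directly onto Python's `re` syntax. The following is a minimal sketch over the alphabet $\{a, b\}$; the helper name `in_language` is made up for this illustration and is not part of the notebook's `regex_intro` module.

```python
import re


def in_language(pattern: str, word: str) -> bool:
    """Return True if the whole word is matched by the pattern."""
    return re.fullmatch(pattern, word) is not None


# Alphabet {a, b}
print(in_language(r"(a|b)", "a"))   # union: a or b
print(in_language(r"(ab)", "ab"))   # concatenation: a followed by b
print(in_language(r"(a*)", "aaa"))  # Kleene star: zero or more a
print(in_language(r"(a*)", ""))     # the star also accepts the empty word
print(in_language(r"(ab)", "ba"))   # order matters in a concatenation
```

`re.fullmatch` is used here instead of a search because membership in a language is a statement about the whole word, not about a substring.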

## Part 1: Searching

### Simply searching for a string

As in a normal search bar, you can of course just search for a literal string.

In [None]:
regex = r""
test_string = "This will match test in this string"

show_regex_search_result(regex, test_string)  # this basically calls re.compile, then searches for all occurrences of the pattern in the text
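
For readers without the `regex_intro` helper at hand, here is a rough standalone equivalent using only the standard `re` module. It assumes the helper does something like compile-then-find-all, as the comment above suggests; the helper's real output format is not shown in this notebook.

```python
import re

test_string = "This will match test in this string"
pattern = re.compile(r"test")

# finditer yields one match object per non-overlapping occurrence
matches = [(m.group(), m.span()) for m in pattern.finditer(test_string)]
print(matches)
```

Each entry pairs the matched text with its `(start, end)` position in the string.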

### Special character classes

You can match any one character out of a set using `[]` brackets. For example:

```
[xy]  - Matches a single x or y
[a-e] - Matches any single character in the range a to e
```

The same works for digits, e.g. `[0-9]`.

In [None]:
regex = r""

test_string = "This searches for all occurrences of es in this string, in combination and on its own"

show_regex_search_result(regex, test_string)

In [None]:
regex = r""

test_string = "This searches for all occurrences between a and k in a given string"

show_regex_search_result(regex,test_string)
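
One possible way to fill in bracket patterns like the ones above, sketched with plain `re.findall` on short sample strings chosen for this illustration:

```python
import re

# [es] matches a single e or a single s, one character per match
es_hits = re.findall(r"[es]", "regexes")

# [a-k] matches any single lowercase letter from a to k
range_hits = re.findall(r"[a-k]", "a given string")

# Ranges work for digits too
digit_hits = re.findall(r"[0-9]", "room 42, floor 7")

print(es_hits)
print(range_hits)
print(digit_hits)
```

Note that a bracket expression always matches exactly one character, so `es` in a text yields two separate matches unless a quantifier is added.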

So to search for all word characters, we would need to write `[a-zA-Z0-9_]`. Writing this out every time gets old fast, so let's use some abbreviations:

```
\w - Matches any word character
\d - Matches a digit
\s - Matches a whitespace character
.  - Matches any character (except a newline)
```

In [None]:
regex = r""

test_string = "This searches for all Word characters, so basically everything"

show_regex_search_result(regex,test_string)

In [None]:
regex = r""

test_string = "This searches for digits, for example 5 or 8"

show_regex_search_result(regex,test_string)

In [None]:
regex = r""

test_string = "This text also has whitespaces, which can be matched like this"

show_regex_search_result(regex,test_string)

In [None]:
regex = r""

test_string = "This matches everything, also numbers like 9, but not the \n line return"

show_regex_search_result(regex,test_string)
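
A short sketch of the four shorthands, again with plain `re.findall` and sample strings made up for this illustration. One caveat worth knowing: for `str` patterns in Python 3, `\w` and `\d` also match non-ASCII letters and digits unless the `re.ASCII` flag is set, so `\w` is slightly broader than `[a-zA-Z0-9_]`.

```python
import re

text = "Room 42,\nfloor 7"

digits = re.findall(r"\d", text)        # every single digit
spaces = re.findall(r"\s", text)        # spaces and the newline
words = re.findall(r"\w", "a_1 b!")     # letters, digits and underscore
dots = re.findall(r".", "a\nb")         # everything except the newline

print(digits)
print(spaces)
print(words)
print(dots)
```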

### Quantifiers

You can quantify a given search pattern using special characters:

```
+     - Match 1 or more
*     - Match 0 or more
?     - Match 0 or 1
{x}   - Match x times
{x,}  - Match x times or more
{x,y} - Match between x and y times
```

In [None]:
# Match 1 or more
regex = r""

test_string = "This will match all t's that have characters before them"

show_regex_search_result(regex, test_string)

In [None]:
# Match 0 or more
regex = r""

test_string = "This will match all ts as well as the characters before t"

show_regex_search_result(regex, test_string)

In [None]:
# Match 0 or 1
regex = r""

test_string = "This will match all ts as well as the character before it, if it exists"

show_regex_search_result(regex, test_string)

In [None]:
# Match exactly 2, 2 or more, and between 2 and 4
regex_2 = r""
regex_2_ = r""
regex_2_4 = r""

test_string = "This will match all ts if they fulfill the specific boundary condition of items before it"

show_regex_search_result(regex_2, test_string)
show_regex_search_result(regex_2_, test_string)
show_regex_search_result(regex_2_4, test_string)
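
To see how the quantifiers differ, it can help to run all of them against the same input. This is a small sketch with a made-up sample string in which `t` is preceded by zero, one, two, three and five `a` characters:

```python
import re

text = "t at aat aaat aaaaat"

patterns = [
    r"a+t",      # one or more a, then t
    r"a*t",      # zero or more a, then t
    r"a?t",      # at most one a, then t
    r"a{2}t",    # exactly two a, then t
    r"a{2,}t",   # two or more a, then t
    r"a{2,4}t",  # between two and four a, then t
]

# Map each pattern to the list of substrings it matches
results = {p: re.findall(p, text) for p in patterns}

for p, hits in results.items():
    print(f"{p:10} {hits}")
```

The last word shows the difference between `{2,}` and `{2,4}`: the first takes all five `a` characters, while the second can take at most four and therefore starts its match one character later.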

### Some special search patterns

You can also match various other things, or modify search patterns with other characters (incomplete list):

```
^        - Negates a character class when placed first inside [] brackets; outside brackets it matches the start of the string
\W \D \S - Uppercase shorthands match everything the lowercase one does not, e.g. \D matches any character that is not a digit
\b       - Word boundary (start or end of a word); Python's re does not support \< and \>
\t       - Tab
\n       - Newline
\r       - Carriage return
```


In [None]:
regex = r""

test_string = "This searches for all non digit characters so 9 wouldn't be matched"

show_regex_search_result(regex,test_string)

regex = r""

test_string = "You can write dirty dirty regex like this, so that 9 is matched."

show_regex_search_result(regex,test_string)
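
A few of these patterns in action, as a sketch with a made-up sample string. The double negation `[^\D]` is only a guess at what a "dirty" way of matching a digit might look like; it is not necessarily the pattern the exercise has in mind.

```python
import re

text = "Call 911 now"

non_digits = re.findall(r"\D+", text)        # runs of non-digit characters
negated = re.findall(r"[^0-9 ]+", text)      # runs of anything but digits and spaces
first_word = re.findall(r"^\w+", text)       # word characters at the start of the string
n_words = re.findall(r"\bn\w*", text)        # words beginning with n
double_neg = re.findall(r"[^\D]", text)      # "not a non-digit", i.e. a digit

print(non_digits)
print(negated)
print(first_word)
print(n_words)
print(double_neg)
```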