In [2]:
from __future__ import division

from IPython.display import Image

import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
%matplotlib inline

```
1. This is a string

2. That is also a string

3. This is an illusion

4. THIS IS LOUD

that isn't thus

bob this is bob
bob bob_ ralph_ bobbobbobbybobbob
ababababab

6. tHiS	iS	CoFu SEd

777. THIS IS 100%-THE-BEST!!!

8888. this_is_a_fiiile.py

hidden bob
```

<a id="where-is-regex-implemented"></a>
## Where Is `regex` Implemented?

---

There are any number of places where `regex`s can be run: your text editor, the `bash` shell, Python, and even SQL. Regex support is typically baked into the standard library of programming languages.

In Python, the built-in module is called `re`, and it can be imported like so:

```python
import re
```

<a id="basic-regular-expression-syntax"></a>
## Basic Regular Expression Syntax
---

<a id="literals"></a>
### Literals

Literals are characters that match themselves: the pattern `a` matches the character "a" and nothing else. For example:

```
a
b
c
X
Y
Z
1
5
100
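```

These are all considered literals. Note that `100` is really three literals in a row (`1`, `0`, `0`), so it matches the exact text "100".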
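A quick way to see literals in action is to run a few literal-only patterns with `re.findall()`, which returns every non-overlapping match as a list. This sketch uses short samples copied from the practice text; the variable names are illustrative.

```python
import re

# A two-line sample copied from the practice text above.
text = "bob this is bob\nhidden bob"

# A pattern made only of literals matches that exact character sequence.
bobs = re.findall("bob", text)

# "100" is three literals in a row: '1', then '0', then '0'.
hundred = re.findall("100", "777. THIS IS 100%-THE-BEST!!!")

# Literals are case-sensitive by default, so "BOB" finds nothing here.
shouting = re.findall("BOB", text)

print(bobs)      # ['bob', 'bob', 'bob']
print(hundred)   # ['100']
print(shouting)  # []
```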

<a id="character-classes"></a>
### Character Classes

A character class, written inside square brackets, is a set of characters matched as an "or."

```
[io]
```

This class reads as "match either `i` or `o`."

You can include as many characters as you like between the brackets.

However many characters a class lists, it matches only a single character at a time.
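The sketch below checks both points: `[io]` can match either vowel, yet each match is exactly one character long. The sample strings are taken from, or modeled on, the practice text.

```python
import re

line = "3. This is an illusion"

# [io] matches exactly one character: an 'i' or an 'o'.
vowels = re.findall("[io]", line)

# Inside a longer pattern the class still consumes one character,
# so "hoop" does not match h[io]p (it would need two vowels there).
words = re.findall("h[io]p", "hip hop hoop")

print(vowels)  # ['i', 'i', 'i', 'i', 'o']
print(words)   # ['hip', 'hop']
```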

<a id="character-classes-can-also-accept-certain-ranges"></a>
### Character Classes Can Also Accept Certain Ranges

For example, the following will all work:
    
```
[a-f]
[a-z]
[A-Z]
[a-zA-Z]
[1-4]
[a-c1-3]
```
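Here is a small sketch of ranges, using a space-separated version of the mixed-case line from the practice text (the original uses tabs). Each `re.findall()` call returns the individual characters the class matched.

```python
import re

mixed = "tHiS iS CoFu SEd"

# [a-z] matches any single lowercase letter.
lower = re.findall("[a-z]", mixed)

# [A-Z] matches any single uppercase letter.
upper = re.findall("[A-Z]", mixed)

# Ranges can be combined in one class: 'a' to 'c', or '1' to '3'.
combined = re.findall("[a-c1-3]", "abcd 1234")

print(lower)     # ['t', 'i', 'i', 'o', 'u', 'd']
print(upper)     # ['H', 'S', 'S', 'C', 'F', 'S', 'E']
print(combined)  # ['a', 'b', 'c', '1', '2', '3']
```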

<a id="character-class-negation"></a>
### Character Class Negation

We can also add **negation** to character classes. For example:

```
[^a-z]
```

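This means match *ANY* single character that is *NOT* `a` through `z`. The `^` negates the class only when it comes first inside the brackets; anywhere else in the class it matches a literal `^`.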
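A minimal sketch of negation, plus the caret-position rule, using made-up sample strings:

```python
import re

# [^a-z] matches any single character that is not a lowercase letter:
# digits, spaces, punctuation, and uppercase letters all qualify.
not_lower = re.findall("[^a-z]", "ab1 C")

# A caret that is not the first character in the class is a literal '^'.
caret_or_a = re.findall("[a^]", "a^b")

print(not_lower)   # ['1', ' ', 'C']
print(caret_or_a)  # ['a', '^']
```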

### Exercise #1

#### Solution

`[Tt]h[^i][st]`

**Solution Breakdown:**  

`[Tt]` = _'T' or 't'_              
`h`    = _'h'_                      
`[^i]` = *Anything that is _not_ 'i'*  
`[st]` = _'s' or 't'_
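To check the solution, you can run it against a few lines hand-copied from the practice text. This is a sketch on a subset of the text, not the full block.

```python
import re

# A subset of the practice text containing several "Th"/"th" words.
text = """2. That is also a string
3. This is an illusion
that isn't thus
4. THIS IS LOUD"""

# [Tt] then 'h', then any character except 'i', then 's' or 't'.
matches = re.findall("[Tt]h[^i][st]", text)

print(matches)  # ['That', 'that', 'thus']
```

"This" is skipped because its third letter is `i`, and "THIS" is skipped because the pattern requires a lowercase `h`.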

#### Exercise #2

1. `[0-9]`
2. `\d`
3. `[^\D]` **or** `[^a-zA-Z\s\%\'!\-\._]`  
>_The second option in solution #3 works only for our text block, because it explicitly lists every non-digit character that appears there._
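All of the solutions can be compared side by side on the line of the practice text that contains digits. This is a sketch; the tailored last pattern is copied verbatim from solution #3 and only makes sense for this text.

```python
import re

line = "777. THIS IS 100%-THE-BEST!!!"

patterns = [
    r"[0-9]",                  # an explicit range of digits
    r"\d",                     # the digit shorthand
    r"[^\D]",                  # "not a non-digit", i.e. a digit
    r"[^a-zA-Z\s\%\'!\-\._]",  # tailored to the practice text only
]

# Each pattern should pull out the same six digits from this line.
results = {p: re.findall(p, line) for p in patterns}
for p, found in results.items():
    print(f"{p:26} -> {found}")
```

One caveat: in Python 3, `\d` on a `str` pattern matches any Unicode decimal digit, not just `0` to `9`, unless you pass `re.ASCII`. On this ASCII-only line the results are identical.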

<a id="exercise-3"></a>
## Exercise #3

---

Use an anchor and a character class to find **bab** and **bob** when they appear at the end of a line, but not elsewhere.

#### Solution

`b[oa]b$`
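One Python-specific detail: by default `$` matches only at the end of the whole string (or just before a final newline). To anchor at the end of *every* line, pass the `re.MULTILINE` flag. This sketch runs the solution both ways on a few lines copied from the practice text.

```python
import re

text = """bob this is bob
bob bob_ ralph_ bobbobbobbybobbob
ababababab
hidden bob"""

# Without re.MULTILINE, $ only matches at the very end of the string.
end_of_string = re.findall(r"b[oa]b$", text)

# With re.MULTILINE, $ also matches just before every newline.
end_of_each_line = re.findall(r"b[oa]b$", text, flags=re.MULTILINE)

print(end_of_string)     # ['bob']
print(end_of_each_line)  # ['bob', 'bob', 'bab', 'bob']
```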

#### Exercise #4

<a id="exercise-4"></a>
## Exercise #5
---

1. Find **bob**, but only if it occurs three times in a row without any spaces.
2. Find **bob** if it occurs twice in a row, with or without spaces.

#### Solution
1. `(bob){3}`
2. `(bob)( )?(bob)` **or**  `(bob ?){2}`
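The sketch below runs both solutions on the `bob` line of the practice text. For the second one it uses a non-capturing group `(?:...)`, a variation on the solution rather than the original, so that `re.findall()` returns whole matches instead of only the group's text.

```python
import re

line = "bob bob_ ralph_ bobbobbobbybobbob"

# 1. "bob" three times in a row with no spaces.
three = re.search(r"(bob){3}", line)

# 2. "bob" twice in a row, optionally separated by one space.
twice = re.findall(r"(?:bob ?){2}", line)

# With a capturing group, findall returns only the group's last repetition.
captured_only = re.findall(r"(bob ?){2}", line)

print(three.group())   # bobbobbob
print(three.group(1))  # bob
print(twice)           # ['bob bob', 'bobbob', 'bobbob']
print(captured_only)   # ['bob', 'bob', 'bob']
```

A quantified capture group such as `(bob){3}` only remembers its *last* repetition, which is why `group(1)` is a single `bob`.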

<a id="groups-and-capturing"></a>
## Groups and Capturing

---

In `regex`, parentheses, `()`, denote groups. A group can then be quantified as a single unit.

Additionally, groups can be designated as either "capture" or "non-capture." A capture group remembers the text it matched so you can retrieve it later (for example, with `.group(1)` in Python); a non-capture group only groups.

To mark a group as a capture group, just wrap it in parentheses: `(match_phrase)`.

To mark it as a non-capture group, punctuate it like so: `(?:match_phrase)`.
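To see the difference, you can run the same pattern with a capture group and with a non-capture group. In this sketch the sample string is adapted from the lyrics used later in this notebook, and `Pattern.groups` reports how many capture groups a compiled pattern has.

```python
import re

text = "hip, hip hop, the hippie"

# With a capture group, findall returns only the captured character.
captured = re.findall(r"h([io])p", text)

# With a non-capture group, the same pattern returns the whole matches.
whole = re.findall(r"h(?:[io])p", text)

# A compiled pattern reports how many capture groups it contains.
n_capture = re.compile(r"h([io])p").groups
n_noncapture = re.compile(r"h(?:[io])p").groups

print(captured)                 # ['i', 'i', 'o', 'i']
print(whole)                    # ['hip', 'hip', 'hop', 'hip']
print(n_capture, n_noncapture)  # 1 0
```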


### Exercise #6

1. `(bob)(?=_)`
2. `(bob)(?=_|\n)`
3. `(bob)(?!( |\n))`
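The prompts for this exercise aren't reproduced here, but the solutions use lookaheads: `(?=...)` succeeds if the text that follows matches, `(?!...)` succeeds if it does not, and neither consumes any characters. This sketch counts the matches for each solution on two lines from the practice text (newlines kept) and confirms that the lookahead character is left out of the match.

```python
import re

# Two lines from the practice text, each ending in a newline.
text = "bob this is bob\nbob bob_ ralph_ bobbobbobbybobbob\n"

solutions = {
    "followed by _": r"(bob)(?=_)",
    "followed by _ or newline": r"(bob)(?=_|\n)",
    "not followed by space or newline": r"(bob)(?!( |\n))",
}

# finditer yields match objects, one per non-overlapping match.
counts = {label: len(list(re.finditer(p, text))) for label, p in solutions.items()}
print(counts)

# The lookahead checks the '_' but does not consume it,
# so the match ends right before the underscore.
m = re.search(r"(bob)(?=_)", text)
print(m.group(), repr(text[m.end()]))  # bob '_'
```

If you use `re.findall()` with the third solution instead of `re.finditer()`, you get tuples such as `('bob', '')`, because the `( |\n)` inside the lookahead is a second capture group. Writing it as `(?! |\n)` avoids that.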

<a id="regex-in-python-and-pandas"></a>
## Regex in Python and `pandas`

---

Let's practice working with `regex` in Python and `pandas` using the string below.

In [None]:

my_string = """
I said a hip hop,
The hippie, the hippie,
To the hip, hip hop, and you don't stop, a rock it
To the bang bang boogie, say, up jump the boogie,
To the rhythm of the boogie, the beat.
"""

In [None]:
# Import Python's built-in regular expression module, `re`.
import re

<a id="regex-search-method"></a>
### The `re.search()` Method

In [None]:
# `re.search()` scans the string and returns a match object for the first match, or `None` if nothing matches.
mo = re.search('h([io])p', my_string)

In [None]:
# The entire matched text (group 0):
mo.group()

In [None]:
# The text captured by the first group (like $1 in other tools):
mo.group(1)
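Because `re.search()` returns `None` when nothing matches, calling `.group()` on a failed search raises an `AttributeError`. A common pattern, sketched here on a shortened copy of `my_string`, is to check the result first. `.span()` gives the start and end offsets of the match.

```python
import re

my_string = "I said a hip hop,\nThe hippie, the hippie,"

mo = re.search(r"h([io])p", my_string)
# Guard against None before calling .group().
if mo is not None:
    print(mo.group(), mo.group(1), mo.span())  # hip i (9, 12)

# No "hup" anywhere in the string, so search returns None.
no_match = re.search(r"h[u]p", my_string)
print(no_match)  # None
```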

<a id="regex-findall-method"></a>
### The `re.findall()` Method

In [None]:
mo = re.findall('h[io]p', my_string)

In [None]:
mo

In [None]:
# If the pattern contains capture groups, `re.findall()` returns only the captured text.
mo = re.findall('h([io])p', my_string)

In [None]:
mo
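When a pattern has more than one capture group, `re.findall()` returns a list of tuples, one tuple per match. A small sketch on an excerpt of `my_string`:

```python
import re

s = "I said a hip hop, the hippie"

# One capture group: a flat list of the captured strings.
one_group = re.findall(r"h([io])p", s)

# Two capture groups: a list of tuples, one per match.
two_groups = re.findall(r"(h)([io])p", s)

print(one_group)   # ['i', 'o', 'i']
print(two_groups)  # [('h', 'i'), ('h', 'o'), ('h', 'i')]
```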

<a id="using-pandas"></a>
### Using `pandas`

In [None]:
fish = pd.Series(['onefish', 'twofish', 'redfish', 'bluefish'])
fish

<a id="strcontains"></a>
### `str.contains`

In [None]:
# Get all fish that start with "b".
fish[fish.str.contains('^b')]

<a id="strextract"></a>
### `str.extract`

In [None]:
# `.str.extract()` returns the text matched by the capture group(s); with one group and `expand=False`, the result is a Series.
fish.str.extract('(.*)fish', expand=False)

<a id="independent-practice"></a>
## Independent Practice
---

Pull up the following tutorials for regular expressions in Python. 

- [TutorialsPoint](http://www.tutorialspoint.com/python/python_reg_expressions.htm)  
- [Google Regex Tutorial](https://developers.google.com/edu/python/regular-expressions) (findall)

In the cells below, import Python's `re` module and experiment with matching on the string.

Try out some of the following:
- Match with and without case sensitivity.
- Match using word boundaries (`\b`; try "bob").
- Use positive and negative lookaheads.
- Experiment with the multi-line flag.
- Try matching the second or third instance of a repetitive pattern ("ab" or "bob," for example).
- Try using `re.sub` to replace a matching string.
- Note the difference between `search` and `match`.
- What happens to the order of groups if they are nested?
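Here is one possible starting point for several of the items above, run on a few lines copied from `test`. It is a sketch, not the only approach, and it leaves the "second or third instance" item for you.

```python
import re

# A few lines copied from the `test` string.
text = "bob this is bob\nbob bob_ ralph_ bobbobbobbybobbob\nhidden bob"

# Case-insensitive matching.
loud = re.findall(r"this", "This THIS this", flags=re.IGNORECASE)

# \b is a word boundary: "bob_" and "bobbob" don't count as the word "bob".
whole_word = re.findall(r"\bbob\b", text)

# re.MULTILINE lets ^ match at the start of every line, not just the string.
line_starts = re.findall(r"^bob", text, flags=re.MULTILINE)
string_start = re.findall(r"^bob", text)

# re.sub replaces every match.
shouted = re.sub(r"\bbob\b", "BOB", "bob bob_ bob")

# re.match only tries the very start of the string; re.search scans all of it.
starts_hidden = re.match(r"hidden", text)
contains_hidden = re.search(r"hidden", text)

# Nested groups are numbered by their opening parenthesis, left to right.
nested = re.match(r"((b)(o)b)", "bob").groups()

print(loud, len(whole_word), line_starts, string_start)
print(shouted, starts_hidden, contains_hidden is not None, nested)
```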

In [None]:
test = """
1. This is a string

2. That is also a string

3. This is an illusion

4. THIS IS LOUD

that isn't thus

bob this is bob
bob bob_ ralph_ bobbobbobbybobbob
ababababab

6. tHiS	iS	CoFu SEd

777. THIS IS 100%-THE-BEST!!!

8888. this_is_a_fiiile.py

hidden bob

"""

<a id="extra-practice"></a>
## Extra Practice

---

Pull up the [Regex Golf](http://regex.alf.nu/) website and solve as many as you can!

If you get bored, try [Regex Crossword](https://regexcrossword.com/).