# Regular Expressions

- [Examples](#Examples)
- [Basic Regex](#Basic-Regex)
    - [Metacharacters](#Metacharacters)
    - [Repetition](#Repetition)
    - [Any of / None of](#Any-of-/-None-of)
    - [Anchors](#Anchors)
    - [Other Functions](#Other-Functions)
    - [Capture Groups](#Capture-Groups)
    - [Flags](#Flags)
    - [Usage with Pandas](#Usage-with-Pandas)

## Examples

Say I want to parse the following lines in a log file:

<div style="font-family: monospace; overflow: scroll; white-space: pre">GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 {200} 510348 "python-requests/2.21.0" 97.105.19.58
POST /users_accounts/file-upload [16/Apr/2019:193452+0000] HTTP/1.1 {201} 42 "User-Agent: Mozilla/5.0 (X11; Fedora; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36" 97.105.19.58
GET /api/v1/items?page=3 [16/Apr/2019:193453+0000] HTTP/1.1 {429} 3561 "python-requests/2.21.0" 97.105.19.58
</div>
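As a rough sketch of where this lesson is headed, one of the lines above could be split into fields with a single pattern. The field names (`method`, `path`, `timestamp`, and so on) are assumptions made for illustration, and the layout is inferred only from the three sample lines, so a real log may need a looser pattern.

```python
import re

# Hypothetical field names; the layout is inferred from the sample lines above.
LOG_PATTERN = re.compile(r'''
    ^(?P<method>[A-Z]+)\s+            # HTTP verb, e.g. GET
    (?P<path>\S+)\s+                  # request path, including any query string
    \[(?P<timestamp>[^\]]+)\]\s+      # everything between the square brackets
    (?P<protocol>\S+)\s+              # e.g. HTTP/1.1
    \{(?P<status>\d+)\}\s+            # status code inside the curly braces
    (?P<bytes>\d+)\s+                 # response size
    "(?P<user_agent>[^"]*)"\s+        # quoted user agent
    (?P<ip>\d+\.\d+\.\d+\.\d+)$       # IPv4 address at the end of the line
''', re.VERBOSE)

line = ('GET /api/v1/sales?page=86 [16/Apr/2019:193452+0000] HTTP/1.1 '
        '{200} 510348 "python-requests/2.21.0" 97.105.19.58')

# .match returns None when the line does not fit the pattern
fields = LOG_PATTERN.match(line).groupdict()
print(fields)
```

Every captured value comes back as a string, so `status` and `bytes` would still need `int(...)` before any arithmetic.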

Extract various components of an address:

In [None]:
import pandas as pd

addresses = pd.Series([
    '84 Rainey Street, Arlen, TX',
    '4 Privet Drive, Little Whinging, Surrey, U.K.',
    '740 Evergreen Terrace, Springfield',
    '1 Infinite Loop, Cupertino, California',
    'Wayne Manor, Gotham City',
    '124 Conch Street, Bikini Bottom',
])
addresses

In [None]:
data = addresses.str.extract(r'^(\d+)?\s*(.*?),\s*([\w\s]+)')
data.columns = ['house_no', 'street', 'city']
data

In [None]:
# find all the csv files referenced in the curriculum (this won't work for you)
# !(cd ~/codeup/curriculum/data-science/content && rg --vimgrep ".*pd.read_csv\(['\"](.+)['\"]\).*" -r '$1')

In [None]:
# find all the imports in .py files in the curriculum (this won't work for you)
# !(cd ~/codeup/curriculum/data-science/content && rg --vimgrep '^import\s+([\.\w]+)\s*(as\s*\w+)?.*$' -r '$1')

## Basic Regex

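- What is a regex? A pattern language for describing text. It is bigger than Python, and different languages and tools implement different flavors.
- Raw strings (`r'...'`) keep backslashes literal, so the pattern reaches the regex engine unchanged.
- `re.findall` returns every match (the `re` module has other functions as well).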
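A small sketch of why raw strings matter, assuming nothing beyond the standard library: the same characters typed with and without the `r` prefix produce different patterns, and `re.findall` shows the difference.

```python
import re

# In a normal string literal '\b' is a single backspace character;
# in a raw string r'\b' is a backslash plus 'b', which the regex
# engine reads as a word boundary.
assert len('\b') == 1
assert len(r'\b') == 2

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

# re.findall returns every non-overlapping match as a list of strings
matches_raw = re.findall(r'\bToday', subject)   # word boundary, then "Today"
matches_plain = re.findall('\bToday', subject)  # backspace character, then "Today"

print(matches_raw)    # ['Today']
print(matches_plain)  # []
```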

In [None]:
import re

In [None]:
# for demonstration in this lesson
from zgulde.hl_matches import hl_all_matches_nb as hl # pip install zgulde

In [None]:
subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

In [None]:
re.findall(r'H', subject)

In [None]:
re.findall(r'e', subject)

In [None]:
hl(r'e', subject)

In [None]:
hl(r'70', subject)

### Metacharacters

In [None]:
hl(r'\w', subject)

In [None]:
hl(r'\d', subject)

In [None]:
hl(r'\s', subject)
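For anyone following along without the highlighting helper, the same metacharacters can be checked with `re.findall`. This is only a sketch that redefines the lesson's `subject` so it runs on its own.

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

digits = re.findall(r'\d', subject)      # \d: any digit
spaces = re.findall(r'\s', subject)      # \s: any whitespace character
word_chars = re.findall(r'\w', subject)  # \w: letters, digits, and underscore
anything = re.findall(r'.', subject)     # . : any character except a newline

print(digits)       # ['3', '7', '0']
print(len(spaces))  # 11
```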

### Repetition

In [None]:
hl(r'\w+', subject)
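The quantifiers are easier to compare side by side. The sketch below uses `re.findall` on the lesson's `subject`, plus one short made-up string for the optional `?` case.

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

words = re.findall(r'\w+', subject)         # + : one or more word characters
numbers = re.findall(r'\d+', subject)       # runs of digits
two_digits = re.findall(r'\d{2}', subject)  # {2}: exactly two digits

# ? makes the preceding item optional (illustrative string, not from the lesson)
optional_s = re.findall(r'degrees?', 'one degree, two degrees')

print(numbers)     # ['3', '70']
print(two_digits)  # ['70']
print(optional_s)  # ['degree', 'degrees']
```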

### Any of / None of

In [None]:
hl(r'[aeiou]', subject)
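Square brackets match any one of the listed characters, ranges such as `A-Z` are allowed, and a leading `^` inside the brackets negates the set. A brief sketch on the lesson's `subject`:

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

# any one of: an uppercase letter followed by one or more lowercase letters
capitalized = re.findall(r'[A-Z][a-z]+', subject)

# none of: anything that is neither a word character nor whitespace
punctuation = re.findall(r'[^\w\s]', subject)

print(capitalized)  # ['Hello', 'Bayes', 'Today', 'Dec']
print(punctuation)  # [',', '!', '.']
```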

### Anchors

In [None]:
hl(r'^.', subject)

In [None]:
hl(r'.{3}$', subject)
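Anchors match positions rather than characters: `^` is the start of the string and `$` is the end. The same two patterns can be checked with `re.findall`, shown here as a sketch:

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

first_char = re.findall(r'^.', subject)      # the first character
last_three = re.findall(r'.{3}$', subject)   # the last three characters

# An anchored pattern only matches at that position
starts_with_hello = re.findall(r'^Hello', subject)
starts_with_bayes = re.findall(r'^Bayes', subject)

print(first_char)         # ['H']
print(last_three)         # ['es.']
print(starts_with_bayes)  # []
```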

### Other Functions

- `re.search`
- `re.sub`
- `re.compile` + flags
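A minimal sketch of the three functions listed above, using the lesson's `subject`; the replacement text `'#'` is just an arbitrary choice for illustration.

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

# re.search returns a match object for the first match, or None if there is none
match = re.search(r'\d+', subject)
first_number = match.group()
no_match = re.search(r'xyz', subject)

# re.sub replaces every match with the given replacement string
masked = re.sub(r'\d+', '#', subject)

# re.compile builds a reusable pattern object; flags are passed as the second argument
pattern = re.compile(r'bayes', re.IGNORECASE)
case_insensitive = pattern.findall(subject)

print(first_number)      # 3
print(no_match)          # None
print(masked)            # Hello, Bayes! Today is Dec # and the temperature is # degrees.
print(case_insensitive)  # ['Bayes']
```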

### Capture Groups

In [None]:
hl(r'\w+(\w)', subject)

In [None]:
# double letter: the backreference \1 matches whatever group 1 just captured
hl(r'(\w)\1', subject)
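Capture groups also change what the `re` functions return, which is easy to miss. The sketch below shows that `re.findall` returns only the group's contents once a pattern has a group, and that named groups (the names `month` and `day` are illustrative) can be read back as a dictionary.

```python
import re

subject = 'Hello, Bayes! Today is Dec 3 and the temperature is 70 degrees.'

# With one group, findall returns what the group captured, not the whole match:
# here, the last character of every run of two or more word characters
last_letters = re.findall(r'\w+(\w)', subject)

# Named groups: (?P<name>...) labels a group so it can be looked up by name
match = re.search(r'(?P<month>[A-Z][a-z]{2}) (?P<day>\d+)', subject)
date_parts = match.groupdict()

print(last_letters)    # ['o', 's', 'y', 's', 'c', 'd', 'e', 'e', 's', '0', 's']
print(match.group(0))  # Dec 3
print(date_parts)      # {'month': 'Dec', 'day': '3'}
```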

### Flags

In [None]:
re.compile(r'', re.IGNORECASE | re.MULTILINE | re.VERBOSE)
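To see what each of these flags does, here is a small sketch on a made-up three-line string (`text` is not part of the lesson data). `re.IGNORECASE` ignores letter case, `re.MULTILINE` lets `^` and `$` match at every line boundary, and `re.VERBOSE` allows whitespace and comments inside the pattern.

```python
import re

text = 'first line\nSecond line\nthird LINE'

any_case = re.findall(r'line', text, re.IGNORECASE)

# Without MULTILINE, ^ only matches at the very start of the string
first_word_only = re.findall(r'^\w+', text)
first_word_each_line = re.findall(r'^\w+', text, re.MULTILINE)

# Flags are combined with |
pattern = re.compile(r'''
    ^(\w+)    # first word on the line
    \s+       # separating whitespace
    (line)$   # the word "line" at the end of the line
''', re.IGNORECASE | re.MULTILINE | re.VERBOSE)
pairs = pattern.findall(text)

print(any_case)              # ['line', 'line', 'LINE']
print(first_word_only)       # ['first']
print(first_word_each_line)  # ['first', 'Second', 'third']
print(pairs)                 # [('first', 'line'), ('Second', 'line'), ('third', 'LINE')]
```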

### Usage with Pandas

In [None]:
pd.Series.str.extract
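A short sketch of `str.extract` on request lines like the ones in the log example. The column names `method` and `path` are assumptions chosen for illustration; with named groups, pandas uses the group names as column names, and rows that do not match come back as `NaN`.

```python
import pandas as pd

requests = pd.Series([
    'GET /api/v1/sales?page=86',
    'POST /users_accounts/file-upload',
])

# Each named group becomes a column in the resulting DataFrame
parts = requests.str.extract(r'^(?P<method>[A-Z]+)\s+(?P<path>\S+)')

# str.contains treats its argument as a regex by default
has_page = requests.str.contains(r'\?page=\d+')

print(parts)
print(has_page.tolist())  # [True, False]
```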