## NLP Basics: How to Use Regex

### Using regex in Python

**Regular expression:** a text string that describes a search pattern.

Python's built-in `re` module is the standard, most commonly used way to work with regular expressions in Python.

In [1]:
import re

re_test = 'This is a made up string to test 2 different regex methods'
re_test_messy = 'This          is a made up         string to test 2           different regex methods'
re_test_messy1 = 'This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods'

### Splitting a sentence into a list of words

In [3]:
re.split(r'\s', re_test)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
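Regex patterns are normally written as raw strings (`r'\s'`). In a plain string literal Python interprets backslash escapes before `re` ever sees the pattern: `'\s'` happens to survive because `\s` is not a recognised string escape, but recent Python versions warn about it (a `SyntaxWarning` from 3.12 on), and escapes such as `\b` are silently turned into other characters. A minimal check:

```python
import re

# In a raw string the backslash is kept, so the regex engine receives \s
assert r'\s' == '\\s'

# In a normal string '\b' becomes a single backspace character,
# while r'\b' is backslash + 'b', which re reads as a word boundary
print(len('\b'), len(r'\b'))  # 1 2

# With a raw string, \b works as a word boundary: words starting with "re"
words = re.findall(r'\bre\w*', 'regex re-use and pre-processing')
print(words)  # ['regex', 're']
```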

In [4]:
re.split(r'\s', re_test_messy)

['This',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'is',
 'a',
 'made',
 'up',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'string',
 'to',
 'test',
 '2',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 'different',
 'regex',
 'methods']

In [7]:
re.split(r'\s+', re_test_messy)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
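`re.split(r'\s+', ...)` behaves much like Python's plain `str.split()` with no argument, with one difference worth knowing: if the text starts or ends with whitespace, the regex version keeps empty strings at the ends. A small sketch with a made-up padded string:

```python
import re

padded = '  This   is a   padded string  '

# re.split on \s+ leaves empty strings where the text starts/ends with whitespace
regex_parts = re.split(r'\s+', padded)
print(regex_parts)  # ['', 'This', 'is', 'a', 'padded', 'string', '']

# str.split() with no argument splits on whitespace runs and drops the empty ends
plain_parts = padded.split()
print(plain_parts)  # ['This', 'is', 'a', 'padded', 'string']

# Stripping the text first makes the two approaches agree
assert re.split(r'\s+', padded.strip()) == plain_parts
```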

In [8]:
re.split(r'\s+', re_test_messy1)

['This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods']

In [10]:
# \W matches any non-word character (anything other than a letter, digit or underscore);
# \W+ splits the string at every run of such characters
re.split(r'\W+', re_test_messy1)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
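Splitting on `\W+` has a few edge cases that matter when tokenizing real text: trailing punctuation produces an empty string, the underscore counts as a word character, and apostrophes split contractions. A short sketch on made-up inputs:

```python
import re

# Trailing punctuation leaves an empty string at the end of the result
edge = re.split(r'\W+', 'Hello, world!')
print(edge)  # ['Hello', 'world', '']

# Underscore is part of \w, so \W+ does not split on it
snake = re.split(r'\W+', 'snake_case-name')
print(snake)  # ['snake_case', 'name']

# The apostrophe is a non-word character, so contractions are broken apart
contraction = re.split(r'\W+', "don't stop")
print(contraction)  # ['don', 't', 'stop']
```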

In [12]:
# \S+ matches runs of non-whitespace characters; findall returns every match as a list
re.findall(r'\S+', re_test)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [13]:
re.findall(r'\S+', re_test_messy)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [14]:
re.findall(r'\S+', re_test_messy1)

['This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods']

In [16]:
# \w+ matches runs of word characters (letters, digits, underscore)
re.findall(r'\w+', re_test_messy1)

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
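When the same pattern is applied to several strings, it can be compiled once with `re.compile()` and the resulting pattern object's `findall()` method reused. A sketch using the three test strings from above (redefined here so the block runs on its own), showing that `\w+` yields identical tokens for all of them:

```python
import re

re_test = 'This is a made up string to test 2 different regex methods'
re_test_messy = 'This          is a made up         string to test 2           different regex methods'
re_test_messy1 = 'This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods'

# Compile once; the pattern object offers findall(), split(), sub(), and more
word_pattern = re.compile(r'\w+')

tokens = [word_pattern.findall(text) for text in (re_test, re_test_messy, re_test_messy1)]

# Same 12 tokens regardless of which separators the string used
assert tokens[0] == tokens[1] == tokens[2]
print(tokens[0])
```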

### Replacing a specific string

In [19]:
pep8_test = 'I try to follow PEP8 guidelines'
pep7_test = 'I try to follow PEP7 guidelines'
peep8_test = 'I try to follow PEEP8 guidelines'

In [30]:
# Goal: find PEP8, PEP7 and PEEP8 and replace each with 'PEP8 Python Styleguide'

In [21]:
# Regex matching is case-sensitive by default: [a-z] matches lowercase letters only

re.findall('[a-z]+', pep8_test)

['try', 'to', 'follow', 'guidelines']
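Case sensitivity can be switched off with the `re.IGNORECASE` flag, so that `[a-z]` also matches uppercase letters:

```python
import re

pep8_test = 'I try to follow PEP8 guidelines'

# With IGNORECASE, [a-z]+ picks up the uppercase words too (digits still excluded)
mixed = re.findall(r'[a-z]+', pep8_test, flags=re.IGNORECASE)
print(mixed)  # ['I', 'try', 'to', 'follow', 'PEP', 'guidelines']
```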

In [22]:
re.findall('[A-Z]+', pep8_test)

['I', 'PEP']

In [23]:
re.findall('[A-Z0-9]+', pep8_test)

['I', 'PEP8']

In [28]:
re.findall('[A-Z]+[0-9]+', peep8_test)

['PEEP8']

In [33]:
re.sub('[A-Z]+[0-9]+', 'PEP8 Python Styleguide', peep8_test)

'I try to follow PEP8 Python Styleguide guidelines'
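The same substitution handles all three test strings, and `re.subn()` additionally reports how many replacements were made. Note that `[A-Z]+[0-9]+` is broad: it would also rewrite unrelated tokens such as `ISO9001`. The narrower pattern `PE+P[0-9]+` below is an illustrative alternative, not part of the original example:

```python
import re

pep8_test = 'I try to follow PEP8 guidelines'
pep7_test = 'I try to follow PEP7 guidelines'
peep8_test = 'I try to follow PEEP8 guidelines'

pattern = r'[A-Z]+[0-9]+'
replacement = 'PEP8 Python Styleguide'

# One pattern normalises all three variants
fixed = [re.sub(pattern, replacement, s) for s in (pep8_test, pep7_test, peep8_test)]
print(fixed[0])  # I try to follow PEP8 Python Styleguide guidelines

# re.subn() returns (new_string, number_of_replacements)
new_text, n = re.subn(pattern, replacement, pep7_test)
print(n)  # 1

# The broad pattern also catches ISO9001; PE+P[0-9]+ only matches PEP/PEEP variants
broad = re.sub(pattern, replacement, 'Follow ISO9001 and PEP8')
narrow = re.sub(r'PE+P[0-9]+', replacement, 'Follow ISO9001 and PEP8')
print(narrow)  # Follow ISO9001 and PEP8 Python Styleguide
```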

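### Other useful regex methods

* `re.search()`: returns the first match anywhere in the string, or `None`
* `re.match()`: matches only at the start of the string
* `re.fullmatch()`: succeeds only if the whole string matches the pattern
* `re.finditer()`: returns an iterator of match objects, including their positions
* `re.escape()`: escapes special characters so a string can be matched literally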
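A short sketch exercising each of these functions on a made-up sample sentence:

```python
import re

text = 'PEP8 was written in 2001; PEP7 covers C code.'

# re.search(): first match anywhere in the string
first_year = re.search(r'\d{4}', text).group()
print(first_year)  # 2001

# re.match(): only tries at the beginning of the string
at_start = re.match(r'PEP\d', text).group()
not_at_start = re.match(r'\d{4}', text)
print(at_start, not_at_start)  # PEP8 None

# re.fullmatch(): the entire string must match
whole_ok = re.fullmatch(r'PEP\d+', 'PEP8') is not None
whole_bad = re.fullmatch(r'PEP\d+', 'PEP8!') is not None
print(whole_ok, whole_bad)  # True False

# re.finditer(): match objects carry the matched text and its start index
positions = [(m.group(), m.start()) for m in re.finditer(r'PEP\d', text)]
print(positions)  # [('PEP8', 0), ('PEP7', 26)]

# re.escape(): '.' and '*' are escaped so they match literally
escaped_hits = re.findall(re.escape('3.5*2'), 'compute 3.5*2 or 3x5*2')
print(escaped_hits)  # ['3.5*2']
```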