# NLP Basics: Learning how to use regular expressions

### Using regular expressions in Python

Python's built-in `re` module is the most commonly used regex resource. More details can be found in the [official documentation](https://docs.python.org/3/library/re.html).

In [1]:
import re

# Three test sentences for practicing regular expressions

re_test = 'This is a made up string to test 2 different regex methods'
re_test_messy = 'This      is a made up     string to test 2    different regex methods'
re_test_messy1 = 'This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods'

### Splitting a sentence into a list of words

In [3]:
re.split(r'\s', re_test) # Split on each single whitespace character, producing a list of words

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
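Regex patterns in Python are usually written as raw strings (`r'...'`), so that backslashes reach the `re` module unchanged. Without the `r` prefix, Python processes escape sequences first: `'\s'` happens to survive (recent Python versions warn about it as an invalid escape), but `'\b'` becomes a backspace character instead of the regex word-boundary token. The sketch below illustrates this on `re_test`; the word-boundary pattern is an illustrative choice not used elsewhere in this notebook.

```python
import re

re_test = 'This is a made up string to test 2 different regex methods'

# In a raw string the backslash is kept, so r'\s' is the two characters "\" and "s"
assert r'\s' == '\\s'

# '\b' in a normal string is a backspace character (\x08), not a word boundary
plain = re.findall('\bis\b', re_test)
# r'\b' passes the regex word-boundary token through to re
raw = re.findall(r'\bis\b', re_test)

print(plain)  # []
print(raw)    # ['is']
```

The raw-string pattern finds the standalone word "is" but not the "is" inside "This", because there is no word boundary between "h" and "i".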

In [4]:
re.split(r'\s', re_test_messy) # Splitting the messy sentence on single whitespace leaves an empty string between each pair of consecutive spaces

['This',
 '',
 '',
 '',
 '',
 '',
 'is',
 'a',
 'made',
 'up',
 '',
 '',
 '',
 '',
 'string',
 'to',
 'test',
 '2',
 '',
 '',
 '',
 'different',
 'regex',
 'methods']

In [5]:
re.split(r'\s+', re_test_messy) # Adding "+" matches one or more whitespace characters, so each run of spaces counts as a single separator

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [6]:
re.split(r'\s+', re_test_messy1) # re_test_messy1 contains no whitespace, so the string comes back unsplit as a one-element list

['This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods']

In [7]:
re.split(r'\W+', re_test_messy1) # "\W+" matches one or more non-word characters (anything other than letters, digits, and underscore) and splits on them

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [8]:
# re.findall() takes the same arguments as re.split() (pattern, then string), but returns the matches themselves

re.findall(r'\S+', re_test)

# "\S+" matches one or more non-whitespace characters

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [9]:
re.findall(r'\S+', re_test_messy) # Runs of spaces are skipped, so no empty strings appear

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']

In [10]:
re.findall(r'\S+', re_test_messy1) # No whitespace here, so the entire string is a single "\S+" match

['This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods']

In [12]:
re.findall(r'\w+', re_test_messy1) # "\w+" matches runs of word characters, skipping the punctuation between them

['This',
 'is',
 'a',
 'made',
 'up',
 'string',
 'to',
 'test',
 '2',
 'different',
 'regex',
 'methods']
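Because `re.split()` returns the pieces *between* matches while `re.findall()` returns the matches themselves, the two approaches give the same words above but can diverge on edge cases. Leading or trailing whitespace is one such case; here is a minimal sketch, using a hypothetical `padded` string that is not part of the original notebook.

```python
import re

# Hypothetical example string with leading and trailing spaces
padded = '  made up   string  '

# split() yields an empty string before the first separator and after the last one
split_words = re.split(r'\s+', padded)
# findall() only returns the non-whitespace runs themselves
found_words = re.findall(r'\S+', padded)

print(split_words)  # ['', 'made', 'up', 'string', '']
print(found_words)  # ['made', 'up', 'string']
```

When tokenizing with `re.split()`, empty strings like these usually have to be filtered out afterwards; `re.findall()` avoids that extra step.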

### Replacing a specific string

In [None]:
pep8_test = 'I try to follow PEP8 guidelines'
pep7_test = 'I try to follow PEP7 guidelines'
peep8_test = 'I try to follow PEEP8 guidelines'
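The cell above only defines the test strings. One way to perform the replacement is `re.sub(pattern, repl, string)`, which returns a copy of `string` with every match of `pattern` replaced by `repl`. The sketch below assumes a pattern of one or more uppercase letters followed by one or more digits (`[A-Z]+[0-9]+`), which catches `PEP8`, `PEP7`, and the misspelled `PEEP8`; both the pattern and the replacement text are illustrative choices, not taken from the original notebook.

```python
import re

pep8_test = 'I try to follow PEP8 guidelines'
pep7_test = 'I try to follow PEP7 guidelines'
peep8_test = 'I try to follow PEEP8 guidelines'

# Assumed pattern: uppercase letters followed by digits, e.g. PEP8, PEP7, PEEP8
style_ref = r'[A-Z]+[0-9]+'

# findall() shows what the pattern picks out of each sentence
for sentence in (pep8_test, pep7_test, peep8_test):
    print(re.findall(style_ref, sentence))

# sub() replaces every match with the same (illustrative) replacement text
fixed = [re.sub(style_ref, 'PEP8 Python Styleguide', s)
         for s in (pep8_test, pep7_test, peep8_test)]
for sentence in fixed:
    print(sentence)
```

The lone capital `I` at the start of each sentence is not matched, because the pattern requires at least one digit right after the uppercase letters.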

### Other examples of regex methods

- re.search()
- re.match()
- re.fullmatch()
- re.finditer()
- re.escape()
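A quick tour of the methods listed above, applied to the notebook's test sentences. These are standard functions of the `re` module; the specific patterns are illustrative choices.

```python
import re

re_test = 'This is a made up string to test 2 different regex methods'
re_test_messy1 = 'This-is-a-made/up.string*to>>>>test----2""""""different~regex-methods'

# search(): first match anywhere in the string, or None
digit = re.search(r'\d+', re_test)
print(digit.group())  # 2

# match(): only tries to match at the very start of the string
print(re.match(r'\d+', re_test))          # None
print(re.match(r'\w+', re_test).group())  # This

# fullmatch(): the whole string must match the pattern
print(re.fullmatch(r'[\w ]+', re_test) is not None)  # True
print(re.fullmatch(r'\w+', re_test))                 # None

# finditer(): like findall(), but yields match objects (with .start(), .end(), ...)
separators = [m.group() for m in re.finditer(r'\W+', re_test_messy1)]
print(separators)

# escape(): treat special characters such as '.' and '*' literally
literal = re.escape('up.string*to')
print(re.search(literal, re_test_messy1).group())  # up.string*to

# Without escaping, '*' acts as a quantifier on 'g' and the literal text is not found
print(re.search('up.string*to', re_test_messy1))  # None
```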