# Regular Expressions

Here are some examples of regular expressions:

`.ob` -- matches `Bob`, `sob`, `8ob`, `!ob`, ...<br>
`Lo*l` -- matches `Ll`, `Lol`, `Looooool`, ...<br>
`Lo?l` -- matches `Ll` and `Lol`.<br>
`A.*e` -- matches `Ae`, `Abe`, `Arrrrrre`, ...<br>
`[BFG]ad` -- matches `Bad`, `Fad`, and `Gad`.<br>
`[0-9]*\.[0-9]*` -- matches `3.14159`, `1000.0`, ...<br>
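
The patterns listed above can be checked directly with Python's `re` module. The sketch below is only an illustration: the `examples` dictionary is a name introduced here, and `re.fullmatch` is used on the assumption that each listed string is meant to match the pattern in its entirety.

```python
import re

# Each pattern from the list, paired with strings it should match in full.
# `examples` is a hypothetical name used only for this sketch.
examples = {
    r'.ob': ['Bob', 'sob', '8ob', '!ob'],
    r'Lo*l': ['Ll', 'Lol', 'Looooool'],
    r'Lo?l': ['Ll', 'Lol'],
    r'A.*e': ['Ae', 'Abe', 'Arrrrrre'],
    r'[BFG]ad': ['Bad', 'Fad', 'Gad'],
    r'[0-9]*\.[0-9]*': ['3.14159', '1000.0'],
}

for pattern, candidates in examples.items():
    for text in candidates:
        # fullmatch succeeds only if the whole string matches the pattern
        print(pattern, text, bool(re.fullmatch(pattern, text)))

# `?` allows at most one 'o', so a longer run is rejected
print(re.fullmatch(r'Lo?l', 'Lool'))  # None
```

Note that `[0-9]*\.[0-9]*` also accepts a lone `.`, because `*` permits zero digits on either side.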

# 1) Importing key libraries

In [1]:
# Support both Python 2 and Python 3 with minimal overhead.
from __future__ import absolute_import, division, print_function

In [2]:
# I am an engineer: I care only about errors, not warnings. So let's be mavericks and ignore warnings.
import warnings
warnings.filterwarnings('ignore')

In [3]:
# What's life without style :). So, let's add style to our dataframes
from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))

In [8]:
import pandas as pd
import numpy as np
import re

In [5]:
# first install: pip install version_information
%reload_ext version_information
%version_information math, numpy, sys, platform

Software,Version
Python,3.6.3 64bit [MSC v.1900 64 bit (AMD64)]
IPython,7.4.0
OS,Windows 10 10.0.16299 SP0
math,The 'math' distribution was not found and is required by the application
numpy,1.16.2
sys,The 'sys' distribution was not found and is required by the application
platform,1.0.8
Sat Aug 17 11:08:20 2019 W. Europe Daylight Time


In [6]:
# Example 1

x = 'Begin With Review And Friend'

In [9]:
re.search(r'W[a-z]*', x)

<_sre.SRE_Match object; span=(6, 10), match='With'>

In [10]:
re.search(r'W[a-z]*', x).group()

'With'

In [11]:
re.match(r'W[a-z]*', x)

In [12]:
re.match(r'Begin', x)

<_sre.SRE_Match object; span=(0, 5), match='Begin'>

In [13]:
re.search(r'^Begin', x)

<_sre.SRE_Match object; span=(0, 5), match='Begin'>

In [14]:
re.search(r'^With', x)
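
The cells that print nothing returned `None`. As a compact side-by-side sketch of why (the variable names `at_start`, `anywhere`, and `anchored` are introduced here for illustration), `re.match` only tries position 0, while `re.search` scans the whole string unless the pattern is anchored with `^`:

```python
import re

x = 'Begin With Review And Friend'

# re.match only tries the pattern at position 0
at_start = re.match(r'W[a-z]*', x)      # None: the string starts with 'B'

# re.search scans forward until the pattern fits somewhere
anywhere = re.search(r'W[a-z]*', x)     # finds 'With' at span (6, 10)

# an explicit ^ pins search to the start, like match
anchored = re.search(r'^W[a-z]*', x)    # None again

print(at_start, anywhere.group(), anchored)  # None With None
```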

In [15]:
line = "3.8 Liters in 1 Gallon"

searchObj = re.search(r'([.0-9]*) liters in ([.0-9]*) gallon', line, re.I)

In [16]:
searchObj.groups()

('3.8', '1')

In [17]:
searchObj.group()

'3.8 Liters in 1 Gallon'

In [18]:
if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

searchObj.group() :  3.8 Liters in 1 Gallon
searchObj.group(1) :  3.8
searchObj.group(2) :  1
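
Calling `.groups()` or `.group()` on the result of a failed search raises `AttributeError`, because `re.search` returns `None` when nothing matches. One possible way to package the check is sketched below. The helper name `parse_conversion` is hypothetical, and the stricter number pattern `[0-9]+(?:\.[0-9]+)?` is an assumption chosen so that the captured text is always safe to pass to `float`.

```python
import re

def parse_conversion(line):
    """Return (liters, gallons) as floats, or None if the line does not match.

    Hypothetical helper for illustration only.
    """
    number = r'([0-9]+(?:\.[0-9]+)?)'   # digits with an optional fractional part
    m = re.search(number + r' liters in ' + number + r' gallon', line, re.I)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))

print(parse_conversion("3.8 Liters in 1 Gallon"))  # (3.8, 1.0)
print(parse_conversion("no numbers here"))         # None
```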


### The sub Function

In [19]:
phone = "858-959-5559 # This is Phone Number"

In [20]:
# delete Python-style comments
num = re.sub(r'#.*$', "", phone)
num

'858-959-5559 '

In [21]:
# this removes that ending space
num.strip()

'858-959-5559'

In [23]:
filename = 'sample_name_L001_R1_001.fastq'

In [24]:
filename

'sample_name_L001_R1_001.fastq'

In [25]:
re.sub(r'.*_(L00[12])_(R[12])_.*\.fastq', r'\1_\2', filename)

'L001_R1'
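
When a pattern targets a file extension, it matters whether the dot is escaped: a bare `.` matches any character, while `\.` matches only a literal dot. The sketch below compares the two; the names `loose`, `strict`, and the file name `odd` are made up for illustration.

```python
import re

loose = r'.*_(L00[12])_(R[12])_.*.fastq'     # '.' before fastq matches any character
strict = r'.*_(L00[12])_(R[12])_.*\.fastq'   # '\.' matches only a literal dot

good = 'sample_name_L001_R1_001.fastq'
odd = 'sample_name_L001_R1_001_fastq'        # hypothetical name with no real extension

print(re.sub(strict, r'\1_\2', good))  # L001_R1
print(re.sub(loose, r'\1_\2', odd))    # L001_R1 (accepted, probably unintentionally)
print(re.sub(strict, r'\1_\2', odd))   # unchanged: no match, so sub returns the input
```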

### The compile Function

Regular expressions can be compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions.


In [26]:
p = re.compile('ab*')
p

re.compile(r'ab*', re.UNICODE)

In [27]:
m = p.search('Drab')
m

<_sre.SRE_Match object; span=(2, 4), match='ab'>

In [28]:
m.group()

'ab'

In [29]:
m.span()

(2, 4)

In [30]:
m.start()

2

In [31]:
m.end()

4

In [32]:
q = re.compile('[a-z]+')
n = q.match('tempo')

In [35]:
q

re.compile(r'[a-z]+', re.UNICODE)

In [36]:
n

<_sre.SRE_Match object; span=(0, 5), match='tempo'>

In [37]:
n.group()

'tempo'

In [38]:
n.start(), n.end()

(0, 5)
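
A compiled pattern is mostly useful when the same expression is applied many times. As a rough sketch, the loop below reuses one pattern object for searching and then for substitution; the `words` list and the `found` dictionary are illustrative names, not part of the original notebook.

```python
import re

p = re.compile(r'ab*')

words = ['Drab', 'cabbage', 'xyz']   # hypothetical inputs

# Reuse the same compiled pattern for every word
found = {}
for w in words:
    m = p.search(w)
    found[w] = m.group() if m else None
print(found)  # {'Drab': 'ab', 'cabbage': 'abb', 'xyz': None}

# Pattern objects also offer sub, findall, split, and so on
print(p.sub('-', 'Drab cabbage'))  # Dr- c--ge
print(p.pattern)                   # ab*
```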

### The findall Function

In [39]:
regex = re.compile('[a-z]')
string = 'Here Is A String'
letters = regex.findall(string)
letters

['e', 'r', 'e', 's', 't', 'r', 'i', 'n', 'g']

In [40]:
regex = re.compile(r'[A-Za-z][a-z]*')
string = 'Here is a string.'
words = regex.findall(string)
words

['Here', 'is', 'a', 'string']

In [41]:
' '.join(words)

'Here is a string'
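
Two details of `findall` are easy to miss: when the pattern contains capture groups, it returns the groups rather than the whole match, and it never reports positions. The sketch below shows both, using `re.finditer` when spans are needed; `pairs` and `spans` are names introduced for this example.

```python
import re

text = 'Here is a string.'

# No groups: a list of whole matches
print(re.findall(r'[A-Za-z][a-z]*', text))

# With groups: a list of tuples, one item per group
pairs = re.findall(r'([A-Za-z])([a-z]*)', text)
print(pairs)  # [('H', 'ere'), ('i', 's'), ('a', ''), ('s', 'tring')]

# finditer yields match objects, so positions are available too
spans = [(m.group(), m.span()) for m in re.finditer(r'[A-Za-z][a-z]*', text)]
print(spans)
```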

In [42]:
numbers = [23, 45, 23, 22]
numbers = [str(x) for x in numbers]
numbers

['23', '45', '23', '22']

In [43]:
','.join(numbers)

'23,45,23,22'

### String Methods

In [44]:
string.find('string')

10

In [45]:
string.startswith('Here')

True

In [46]:
string.endswith('string.')

True

In [47]:
string.startswith('is', 5)

True
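
For fixed text, these string methods are usually simpler than a regular expression. For comparison, the sketch below shows rough `re` equivalents. It uses `re.escape` so that the `.` in `'string.'` is treated literally, and the variable `m` is introduced only for this example.

```python
import re

string = 'Here is a string.'

# str.find returns an index, or -1 when the substring is absent
print(string.find('string'), string.find('missing'))  # 10 -1

# Regex counterpart of endswith: escape the literal text, then anchor with $
m = re.search(re.escape('string.') + r'$', string)
print(m.start())  # 10

# Regex counterpart of startswith
print(string.startswith('Here') == bool(re.match('Here', string)))  # True
```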

#### Regular Expression Examples

| Example | Description
|:----------|:------------|
| python | Match "python".
| [Pp]ython | Match "Python" or "python"
| rub[ye] | Match "ruby" or "rube"
| [aeiou] | Match any one lowercase vowel
| [0-9] | Match any digit; same as [0123456789]
| [a-z] | Match any lowercase ASCII letter
| [A-Z] | Match any uppercase ASCII letter
| [a-zA-Z0-9] | Match any of the above
| [^aeiou] | Match anything other than a lowercase vowel
| [^0-9] | Match anything other than a digit
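
A few rows of this table, tried out in Python as a quick sketch (the sample `text` is made up for illustration):

```python
import re

text = 'Python 3, ruby and rube'   # hypothetical sample text

print(re.findall(r'[Pp]ython', text))        # ['Python']
print(re.findall(r'rub[ye]', text))          # ['ruby', 'rube']
print(re.findall(r'[0-9]', text))            # ['3']

# [^aeiou] matches everything except lowercase vowels, so deleting
# those matches leaves only the vowels behind
print(re.sub(r'[^aeiou]', '', 'regular'))    # eua
```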

#### Regular Expression Patterns

Except for the metacharacters `+ ? . * ^ $ ( ) [ ] { } | \`, all characters match themselves. You can match a metacharacter literally by preceding it with a backslash.
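
As a small sketch of what escaping changes, the example below searches for a price that contains two of the special characters listed above, `$` and `.`. The `price` string is invented for illustration; `re.escape` is the standard-library helper that adds the backslashes automatically.

```python
import re

price = 'Total: $3.50 (approx.)'   # hypothetical sample text

# Unescaped, '$' means end-of-string, so nothing can follow it
print(re.search(r'$3.50', price))            # None

# Escaped by hand: '$' and '.' are now literal characters
print(re.search(r'\$3\.50', price).group())  # $3.50

# re.escape builds the escaped pattern from plain text
print(re.escape('$3.50'))                    # \$3\.50
```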

The following table lists common regular expression syntax; most of it is available in Python's `re` module:

| Pattern | Description
|:----------|:------------|
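| ^ | Matches beginning of string; with re.MULTILINE, also matches at the beginning of each line.
| `$` | Matches end of string, or just before a newline at the end of the string; with re.MULTILINE, also matches at the end of each line.
| . | Matches any single character except newline. The re.DOTALL (re.S) flag allows it to match newline as well.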
| [...] | Matches any single character in brackets.
| [^...] | Matches any single character not in brackets
| re* | Matches 0 or more occurrences of preceding expression.
| re+ | Matches 1 or more occurrences of preceding expression.
| re? | Matches 0 or 1 occurrence of preceding expression.
| re{n} | Matches exactly n occurrences of preceding expression.
| re{n,} | Matches n or more occurrences of preceding expression.
| re{n,m} | Matches at least n and at most m occurrences of preceding expression. Write the bounds without a space after the comma.
| `a` &#124; `b` | Matches either a or b.
| (re) | Groups regular expressions and remembers matched text.
| \w | Matches word characters.
| \W | Matches nonword characters.
| \s | Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
| \S | Matches nonwhitespace.
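| \d | Matches digits. For ASCII text, equivalent to [0-9]; Python 3 string patterns also match other Unicode decimal digits unless re.ASCII is set.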
| \D | Matches nondigits.
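| \A | Matches beginning of string.
| \Z | Matches end of string. In Python's re, it matches only at the very end, not before a trailing newline.
| \z | Matches end of string in Perl-style engines. Not accepted by re in the Python 3.6 used here; use \Z instead.
| \G | Matches point where last match finished in Perl-style engines. Not supported by Python's re.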
| \b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
| \B | Matches nonword boundaries.
| \n, \t, etc. | Matches newlines, carriage returns, tabs, etc.
| \1...\9 | Matches the text captured by the nth group.
| \10 | Matches the text captured by group 10. In Python, an escape that starts with 0 or consists of three octal digits is read as an octal character code instead.
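
A few rows from the table, exercised with Python's `re` module. This is only a sketch: the sample strings are invented, and the flag names used (`re.M` for `re.MULTILINE`, `re.S` for `re.DOTALL`) are the ones Python provides.

```python
import re

# Quantifier with bounds: between 2 and 3 occurrences
print(re.fullmatch(r'a{2,3}', 'aaa') is not None)    # True
print(re.fullmatch(r'a{2,3}', 'aaaa') is not None)   # False

text = 'one\ntwo'   # hypothetical two-line sample

# ^ matches only at the start of the string unless re.M is given
print(re.findall(r'^\w+', text))          # ['one']
print(re.findall(r'^\w+', text, re.M))    # ['one', 'two']

# . skips newlines unless re.S is given
print(re.search(r'one.two', text))                    # None
print(re.search(r'one.two', text, re.S) is not None)  # True

# \b marks word boundaries; \1 repeats whatever group 1 captured
doubled = re.findall(r'\b(\w+) \1\b', 'it is is what it it is')
print(doubled)  # ['is', 'it']
```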