# Python Regular Expressions

A regular expression is a special sequence of characters that helps to match or find other strings or sets of strings, using a specialized syntax held in a pattern.

It is extremely useful for extracting information from text such as code, files, logs, spreadsheets, or even documents.

When using regular expressions, the first thing to recognize is that everything is essentially a character: we write patterns to match a specific sequence of characters, also referred to as a string. ASCII or Latin letters are those found on a standard keyboard, while Unicode covers text in other scripts. Characters also include digits, punctuation, and special characters such as $, #, @, !, and %.
 

Python's built-in __re module__ is used here. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.
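As a minimal sketch of that behaviour, the snippet below tries to compile a deliberately invalid pattern and catches the resulting exception. The pattern string and the variable names are illustrative only.

```python
import re

# An unbalanced parenthesis is not a valid pattern, so compilation fails.
bad_pattern = "(unclosed"

try:
    re.compile(bad_pattern)
    compiled_ok = True
except re.error as exc:
    # The exception message describes what is wrong with the pattern.
    compiled_ok = False
    print("Invalid pattern:", exc)
```

Catching re.error is mostly useful when the pattern comes from user input or a configuration file rather than from your own source code.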

In [2]:
import re

## 1. re.search() 

### Syntax : re.search(pattern, text_string, flags=0)
It takes a pattern and a text_string to scan, and returns a Match object when the pattern is found in the text_string. If the pattern is not found, search() returns None.

flags : Optional modifiers that change how the pattern is matched. You can combine several flags using bitwise OR (|).

By default, search() is case-sensitive.



In [3]:
patterns = ['name', 'is', 'Hii', 'George']
text_string = 'Hello, my name is Jeswin george'

In [4]:
for pattern in patterns:
    print("Searching for '{}' in '{}'".format(pattern, text_string))
    
    if re.search(pattern,  text_string):
        print('Match found\n')
    else:
        print('No matches found\n')

Searching for 'name' in 'Hello, my name is Jeswin george'
Match found

Searching for 'is' in 'Hello, my name is Jeswin george'
Match found

Searching for 'Hii' in 'Hello, my name is Jeswin george'
No matches found

Searching for 'George' in 'Hello, my name is Jeswin george'
No matches found



Observe that in the above example the __pattern 'George'__ is not found in the __text_string 'Hello, my name is Jeswin george'__, which shows that search() is case-sensitive.
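The flags argument described earlier can change this behaviour. The sketch below reuses the same text_string and shows re.IGNORECASE on its own and combined with re.MULTILINE through bitwise OR. The second input string ('Jeswin\nGeorge') is made up for the illustration.

```python
import re

text_string = 'Hello, my name is Jeswin george'

# Without flags the capital 'G' does not match the lowercase 'g'.
case_sensitive = re.search('George', text_string)

# re.IGNORECASE makes the comparison case-insensitive.
case_insensitive = re.search('George', text_string, flags=re.IGNORECASE)

# Flags combine with bitwise OR; re.MULTILINE lets ^ match at the start of each line.
multi = re.search('^george', 'Jeswin\nGeorge', flags=re.IGNORECASE | re.MULTILINE)

print(case_sensitive)             # None
print(case_insensitive.group())   # george
print(multi.group())              # George
```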

The Match object returned by search() holds information about the nature of the match, including the original input string, the regular expression used, and the location within the original string where the pattern occurs.

In [5]:
pattern = 'name'
text = 'Hello, my name is Jeswin george'

In [6]:
match = re.search(pattern, text)

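The __start()__ and __end()__ methods give the integer indexes into the string showing where the text matched by the pattern occurs. The value returned by end() is the index just past the last matched character, so it works directly as a slice boundary.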

In [7]:
ms = match.start()
me = match.end()
print("Text '{}' has a matched pattern '{}' starting from index {} to index {}".format(text,pattern,ms,me))

Text 'Hello, my name is Jeswin george' has a matched pattern 'name' starting from index 10 to index 14
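Besides start() and end(), the Match object also exposes the original input string, the compiled pattern, and the span as a tuple. The following sketch reuses the same pattern and text to show these attributes.

```python
import re

pattern = 'name'
text = 'Hello, my name is Jeswin george'
match = re.search(pattern, text)

print(match.string)        # the original input string
print(match.re.pattern)    # the regular expression used: name
print(match.span())        # (start, end) as a tuple: (10, 14)

# Slicing the text with start() and end() recovers the matched text.
matched_text = text[match.start():match.end()]
print(matched_text)        # name
```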


# Flags, Modifiers and Escape sequences

<img src = '1re.PNG'>
<img src = '2re.PNG'>
<img src = '3re.PNG'>
<img src = '4re.PNG'>

## 2. re.match()

This function attempts to match the RE pattern at the beginning of the string, with optional flags.

Here is the syntax for this function −

### re.match(pattern, string, flags=0)

The re.match function returns a match object on success, None on failure. 

We use the group(num) or groups() method of the match object to get the matched expression.

__group(num=0) : 	This method returns entire match (or specific subgroup num)__

__groups()  :	This method returns all matching subgroups in a tuple (empty if there weren't any)__
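To illustrate the two methods, here is a short sketch with a pattern that contains two parenthesized subgroups. The sample string and the pattern are assumptions chosen for the example.

```python
import re

line = "Jeswin george"

# Two capture groups, each matching one word.
match = re.match(r"(\w+) (\w+)", line)

print(match.group())    # entire match: Jeswin george
print(match.group(1))   # first subgroup: Jeswin
print(match.group(2))   # second subgroup: george
print(match.groups())   # all subgroups: ('Jeswin', 'george')

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.match("Jeswin", line)
print(no_groups.groups())   # ()
```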

In [8]:
line = " I am the watcher on the walls. I am the shield that guards the realms of men."
pattern = "am"
match = re.match(pattern, line)
search = re.search(pattern,line)

In [9]:
print("Output of re.match is : ",match)
print("Output of re.search is : ",search)

Output of re.match is :  None
Output of re.search is :  <_sre.SRE_Match object; span=(3, 5), match='am'>


Since the pattern does not appear at the start of the input line, it is not found using match(). The sequence does appear twice elsewhere in the text, though, so search() finds its first occurrence. The search() method of a compiled regular expression accepts optional start and end position parameters to limit the search to a substring of the input.
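The optional position parameters of a compiled pattern's search() can be sketched as follows. The example reuses the same line and pattern, and the variable names are illustrative.

```python
import re

line = " I am the watcher on the walls. I am the shield that guards the realms of men."
regex = re.compile("am")

# With no positions given, the first occurrence is found.
first = regex.search(line)

# Starting the search after the first match finds the second occurrence.
second = regex.search(line, first.end())

# Searching past the second match finds nothing more.
third = regex.search(line, second.end())

# An end position limits the search to line[0:3], which does not contain 'am'.
limited = regex.search(line, 0, 3)

print(first.span(), second.span(), third, limited)
```

Note that these positional arguments are available on the compiled pattern's search() method, not on the module-level re.search() function.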

## Difference between re.match() and re.search()

Python offers two different primitive operations based on regular expressions: match() checks for a match only at the beginning of the string, while search() checks for a match anywhere in the string.
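A small side-by-side sketch of the difference, using a made-up sentence: match() succeeds only when the pattern starts at index 0, whereas search() scans the whole string.

```python
import re

line = "I am the watcher on the walls."

# The pattern sits at the very beginning, so both functions succeed.
match_at_start = re.match("I am", line)
search_at_start = re.search("I am", line)

# The pattern occurs later in the string, so only search() finds it.
match_later = re.match("watcher", line)
search_later = re.search("watcher", line)

print(match_at_start.span(), search_at_start.span())
print(match_later, search_later.span())
```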

## 3. re.compile()

The re module includes module-level functions for working with regular expressions as text strings, but it is usually more efficient to compile the expressions your program uses frequently.

__The compile() function converts an expression string into a compiled pattern object (called RegexObject in older documentation).__



In [18]:
# Pre-compile the patterns
regexes = [ re.compile(p) for p in [ 'Python', 'Java']]
text = 'I like programming using python.'

for regex in regexes:
    print('Looking for {} in {} ->'.format(regex.pattern, text))
    
    if regex.search(text):
        print('found a match!')
    else:
        print('no match')

Looking for Python in I like programming using python. ->
no match
Looking for Java in I like programming using python. ->
no match


The module-level functions maintain a cache of compiled expressions, but the size of the cache is limited, and using compiled expressions directly means you can avoid the cache lookup overhead. By pre-compiling any expressions your module uses when the module is loaded, you shift the compilation work to application startup time, instead of a point where the program is responding to a user action.
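One common way to apply this advice is to compile patterns as module-level constants and reuse them inside functions. The sketch below assumes a hypothetical constant PYTHON_RE and helper mentions_python(); neither name comes from the re module.

```python
import re

# Compiled once, when the module is imported (hypothetical constant name).
PYTHON_RE = re.compile("python", re.IGNORECASE)


def mentions_python(text):
    """Return True if text contains 'python' in any letter case."""
    # Reuses the pre-compiled pattern; no compilation happens per call.
    return PYTHON_RE.search(text) is not None


print(mentions_python('I like programming using python.'))   # True
print(mentions_python('I like programming using Java.'))     # False
```

Passing re.IGNORECASE at compile time stores the flag in the pattern object, so callers do not have to repeat it on every search.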

## 4. re.findall()