Credits: https://github.com/bansalkanav/Machine_Learning_and_Deep_Learning

# Regular Expressions

**What are Regular Expressions?**  
A regular expression, often abbreviated as "regex" or "regexp," is a powerful tool used in computer science and text processing to describe and match patterns in strings. It's a sequence of characters that defines a search pattern. These patterns can be used for tasks like searching, extracting, replacing, and validating text.

Regular expressions are widely used in tasks like text searching and manipulation, data validation, parsing, and more. They are supported by many programming languages and text processing tools, making them a versatile and essential tool for working with text data.

**Why Regular Expressions?**  
Regular expressions are used for a variety of tasks in computer science, data processing, and text analysis due to their powerful pattern matching capabilities. Here are some key reasons why regular expressions are widely used:

1. **Pattern Matching:** Regular expressions allow you to search for specific patterns or sequences of characters within a larger body of text. This is useful for tasks like finding specific words, dates, email addresses, or other structured information.

2. **Text Extraction:** They can be used to extract specific pieces of information from a text document, such as names, phone numbers, URLs, or any other structured data.

3. **Data Validation:** Regular expressions are used to validate input data. For example, you can use them to check if a user-provided email address or phone number is in the correct format.

4. **Search and Replace:** They enable you to search for specific patterns in a text and replace them with something else. This is useful for tasks like cleaning up data or making text substitutions.

5. **Parsing and Tokenization:** Regular expressions are essential for breaking down text into smaller units or tokens. This is used in tasks like natural language processing and compiler design.

6. **Web Scraping:** When extracting data from websites, regular expressions can be used to locate and extract specific elements or information from HTML pages.

7. **Log File Analysis:** Regular expressions are invaluable for searching and parsing log files, allowing you to extract important information or identify patterns of interest.

8. **Pattern Validation:** They are used to validate whether a string adheres to a specific pattern or format, such as checking if a password meets certain criteria.

9. **Data Transformation and Cleaning:** Regular expressions can be used to clean and transform text data. For example, removing unnecessary characters or formatting.

10. **Language Agnostic:** Regular expressions are supported by most programming languages, making them a versatile tool that can be applied in a wide range of contexts.

11. **Efficiency:** A well-written regular expression can search and match text efficiently, even for complex patterns. A poorly written one (for example, one with nested quantifiers) can backtrack heavily and become slow.

Overall, regular expressions are a fundamental tool for working with text data and are an essential skill for tasks ranging from data preprocessing to text analysis and beyond. They provide a flexible and powerful means to perform complex pattern matching operations.


**Applications of Regular Expressions**  
Regular expressions (regex) are applied in a wide range of real-world scenarios across various domains. Here are some of the most important applications of regex:

1. **Data Validation and Form Input:**
   - Ensuring that user-provided data (like email addresses, phone numbers, passwords, etc.) adheres to specified formats before processing.

2. **Search and Replace in Text Editors:**
   - Find and replace operations in text editors or IDEs, allowing for quick and precise changes in code or documents.

3. **Log File Parsing and Analysis:**
   - Extracting relevant information from log files, helping to identify patterns, errors, or anomalies in system logs.

4. **Web Scraping and Data Extraction:**
   - Extracting specific information from web pages, like email addresses, phone numbers, product names, etc., for further analysis.

5. **Data Cleaning and Transformation:**
   - Preprocessing text data by removing unnecessary characters, fixing formatting issues, and standardizing data for analysis.

6. **Search Engines and Information Retrieval:**
   - Powering search engines for matching user queries to relevant content on websites or databases.

7. **URL Routing and Validation:**
   - Validating and parsing URLs to ensure they follow the correct format and extracting specific parameters from them.

8. **Lexical Analysis in Compiler Design:**
   - Tokenizing source code into meaningful units for further processing by a compiler.

9. **Natural Language Processing (NLP):**
   - Tokenizing sentences or words, extracting entities (like names, dates, locations), and performing advanced text processing tasks.

10. **Network Security and Firewall Rules:**
    - Defining and enforcing rules for allowing or blocking specific types of traffic based on patterns in network traffic logs.

11. **Database Querying and Validation:**
    - Validating and querying databases for specific patterns or formats, such as social security numbers, credit card numbers, etc.

12. **Formal Language Theory and Automata:**
    - In computer science theory, regex is used in the definition of regular languages and finite automata.

13. **Validation of Configuration Files:**
    - Ensuring that configuration files for software or systems follow the correct syntax and structure.

14. **Extracting Metadata from Documents:**
    - Parsing documents (like PDFs, Word documents) to extract metadata such as titles, authors, dates, etc.

15. **URL Rewriting in Web Servers:**
    - Modifying URLs on the fly to improve SEO or to direct traffic to specific pages.

16. **Pattern Matching in DNA Sequences:**
    - Identifying specific genetic sequences or motifs in DNA for biological research.

These are just some of the many real-world applications of regular expressions. Their versatility and powerful pattern-matching capabilities make them an invaluable tool in various fields of computer science and data processing.

**References for Practice**  
Try to solve all the interactive tutorials on the websites below:

https://regexone.com/lesson/introduction_abcs

https://regex101.com

## Meta Characters
`.`, `^`, `$`, `*`, `+`, `?`, `{`, `}`, `[`, `]`, `(`, `)`, `|`, `\`

## User-defined Character Classes
- `[abc]` - Match either a or b or c
- `[^abc]` - Match any character except a, b, or c
- `[a-z]` - Match a lowercase English letter
- `[A-Z]` - Match an uppercase English letter
- `[a-zA-Z]` - Match any English letter
- `[0-9]` - Match any digit
- `[a-zA-Z0-9_]` - Match any alphanumeric character or underscore
- `[^a-zA-Z0-9_]` - Match any character except an alphanumeric character or underscore
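
A quick way to check what a character class accepts is to run it through `re.findall`. The sketch below uses a made-up sample string (`sample` is purely illustrative) and collects the characters each class matches.

```python
import re

sample = "a7B_@ z"  # hypothetical sample string

lower = re.findall(r"[a-z]", sample)             # lowercase letters only
not_abc = re.findall(r"[^abc]", sample)          # everything except a, b and c
word_like = re.findall(r"[a-zA-Z0-9_]", sample)  # letters, digits and underscore

print(lower)      # ['a', 'z']
print(not_abc)    # ['7', 'B', '_', '@', ' ', 'z']
print(word_like)  # ['a', '7', 'B', '_', 'z']
```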

## Pre-defined Character Classes
- `\d` - Match a digit character, i.e. `[0-9]` for ASCII text
- `\D` - Match any character except a digit, i.e. `[^0-9]`
- `\w` - Match an alphanumeric character or underscore, i.e. `[a-zA-Z0-9_]` for ASCII text
- `\W` - Match any character except an alphanumeric character or underscore, i.e. `[^a-zA-Z0-9_]`
- `\s` - Match a whitespace character (space, tab, newline, etc.)
- `\S` - Match any character except whitespace
- `\t` - Match a tab character
- `\n` - Match a newline character
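
As a small illustration, the sketch below applies a few of these shorthand classes to a made-up string (`sample` is an assumption for the example) that contains a space, a tab, and a newline. It shows that `\s` covers all three, not just the space character.

```python
import re

sample = "id 42\tok\n"  # hypothetical sample: contains a space, a tab and a newline

digits = re.findall(r"\d", sample)       # digit characters
whitespace = re.findall(r"\s", sample)   # any whitespace, not only ' '
non_word = re.findall(r"\W", sample)     # anything that is not a letter, digit or '_'

print(digits)      # ['4', '2']
print(whitespace)  # [' ', '\t', '\n']
print(non_word)    # [' ', '\t', '\n']
```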

## Quantifiers
- `a*` - Match zero or more occurrences of `a`
- `a+` - Match one or more occurrences of `a`
- `a?` - Match at most one occurrence of `a`, i.e. 0 or 1
- `a{n}` - Match exactly n occurrences of `a`
- `a{m,n}` - Match at least m and at most n occurrences of `a` (no space after the comma)
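
The quantifiers can be tried out with `re.fullmatch`, which succeeds only when the whole string fits the pattern. In the sketch below, `matches` is a hypothetical helper written for this example, not part of the `re` module.

```python
import re

def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a*", ""))          # True: zero occurrences are allowed
print(matches(r"a+", ""))          # False: at least one 'a' is required
print(matches(r"a?", "aa"))        # False: at most one 'a' is allowed
print(matches(r"a{3}", "aaa"))     # True: exactly three
print(matches(r"a{2,3}", "aaaa"))  # False: four is above the upper bound

# With a space after the comma, Python treats the braces as literal text
print(matches(r"a{2, 3}", "aa"))   # False
```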

## Groups
- Capturing Groups - `(X)`
- Non Capturing Groups - `(?:X)`
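
The difference between the two kinds of group is easiest to see with `re.findall`, which returns the captured groups whenever the pattern contains any. The following sketch uses a made-up date string (`text` is illustrative only).

```python
import re

text = "2023-01-15 and 2024-12-31"  # hypothetical sample

# Capturing groups: findall returns one tuple of captured parts per match
captured = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)

# Non-capturing groups only structure the pattern,
# so findall returns the full matched text
uncaptured = re.findall(r"(?:\d{4})-(?:\d{2})-(?:\d{2})", text)

print(captured)    # [('2023', '01', '15'), ('2024', '12', '31')]
print(uncaptured)  # ['2023-01-15', '2024-12-31']
```

This is why a pattern that needs alternation but should still return whole matches from `re.findall` is usually written with `(?:...)`.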

## Importing the Required Module

In [7]:
import re

## re.fullmatch()

It returns a match object if and only if the entire string matches the pattern. Otherwise, it will return `None`.

**Syntax**  
```python
import re
re.fullmatch(<regex_pattern>, string)
```

### Regular Expression to match an Indian Phone number

**Rules:**  
1. Should start with either 6 or 7 or 8 or 9
2. Should contain exactly 10 digits

In [8]:
# Step 1 - Ask the user to enter a phone number
user_entered_mobile_number = input("Enter a mobile number to validate: ")

# Step 2 - Create a regex pattern for Indian Phone Numbers
regex = r"[6-9]\d{9}"

# Step 3 - Match the Regex with the user entered mobile number
match = re.fullmatch(regex, user_entered_mobile_number)

# Step 4 - If a match exists, print "valid"
if match:
    print(user_entered_mobile_number, 'is valid.')
else:
    print(user_entered_mobile_number, 'is not valid.')

Enter a mobile number to validate:  9090909090


9090909090 is valid.
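
To see why `re.fullmatch` is the right call for validation, the same pattern can be wrapped in a function and tried on a few inputs. This is a minimal sketch: `is_valid_mobile` and the sample numbers are made up for illustration.

```python
import re

PHONE_PATTERN = r"[6-9]\d{9}"

def is_valid_mobile(number):
    """Return True if `number` is exactly 10 digits and starts with 6-9."""
    return re.fullmatch(PHONE_PATTERN, number) is not None

for candidate in ["9090909090", "5090909090", "909090909", "90909090901"]:
    print(candidate, is_valid_mobile(candidate))

# re.match only anchors at the start, so it would accept the 11-digit input
print(re.match(PHONE_PATTERN, "90909090901") is not None)  # True
```

Only the first candidate passes: the second starts with 5, the third has 9 digits, and the fourth has 11.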


### Preceding `r` in a string 

The `r` at the start of the pattern string marks a Python "raw" string.  
In a raw string, backslashes are kept as literal characters instead of starting escape sequences, so regex tokens such as `\d` reach the `re` module unchanged.  
For example:  
`'\n'` is a single newline character, while `r'\n'` is two characters: `\` followed by `n`.

In [5]:
print('This is the first line. \n')
print('There are two new lines before this line.')

This is the first line. 

There are two new lines before this line.


In [6]:
print(r'This is the first line. \n')
print(r'There are two new lines before this line.')

This is the first line. \n
There are two new lines before this line.
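
The same difference can be checked by comparing string lengths. A short sketch (the variable names are illustrative):

```python
normal = '\n'   # one character: a newline
raw = r'\n'     # two characters: a backslash and the letter n

print(len(normal), len(raw))  # 1 2

# A raw string is just another way of writing the doubled backslash
print(raw == '\\n')           # True
print(r'\d' == '\\d')         # True
```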


### Regular Expression to validate a Python Identifier

**Rules:**  
1. Should contain only alphanumeric characters and underscores (i.e. no other special characters allowed)
2. Should never start with a digit
3. Has no length limit
4. Reserved words or keywords not allowed

In [9]:
# Step 1 - Ask the user to enter a python identifier
user_defined_identifier = input("Enter any identifier to validate:")

# Step 2 - Create a regex pattern for a valid Python Identifier
regex = r"[a-zA-Z_]\w*"

# Step 3 - Match the Regex with the user entered identifier
match = re.fullmatch(regex, user_defined_identifier)

# Step 4 - If a match exists, print "valid"
if match:
    print(user_defined_identifier, 'is valid.')
else:
    print(user_defined_identifier, 'is not valid.')

Enter any identifier to validate: var_1


var_1 is valid.


In [10]:
# Create a list of predefined keywords in Python
keywords = ['if', 'for', 'def', 'while', 'True', 'False', "None"]

# Step 1 - Ask the user to enter a python identifier
user_defined_identifier = input("Enter any identifier to validate:")

# Step 2 - Create a regex pattern for a valid Python Identifier
regex = r"[a-zA-Z_]\w*"

# Step 3 - Match the Regex with the user entered identifier
match = re.fullmatch(regex, user_defined_identifier)

# Step 4 - Report keywords first, then print "valid" if a match exists
if user_defined_identifier in keywords:
    print("You have entered a pre-defined python keyword.")
elif match:
    print(user_defined_identifier, 'is valid.')
else:
    print(user_defined_identifier, 'is not valid.')

Enter any identifier to validate: if


You have entered a pre-defined python keyword.
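
The hand-written keyword list above covers only a few keywords. As an alternative sketch, the standard library's `keyword.iskeyword` can do the keyword check instead. The function name `classify_identifier` and its return labels are assumptions made for this example.

```python
import keyword
import re

IDENTIFIER_PATTERN = r"[a-zA-Z_]\w*"

def classify_identifier(name):
    """Label `name` as 'keyword', 'valid' or 'invalid'."""
    if keyword.iskeyword(name):          # full keyword list from the standard library
        return "keyword"
    if re.fullmatch(IDENTIFIER_PATTERN, name):
        return "valid"
    return "invalid"

for name in ["var_1", "1var", "if", "class"]:
    print(name, "->", classify_identifier(name))
```

Here `class` is reported as a keyword even though it is missing from the shorter manual list.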


## re.findall()

It scans a string for every non-overlapping substring that matches a specified pattern and returns them as a list. The string is scanned left-to-right, and matches are returned in the order found. If the pattern contains capturing groups, the list holds the captured groups rather than the full matches.

**Syntax**
```python
import re
re.findall(<regex_pattern>, string)
```

In [11]:
string = """MediaTek MT8183 Processor4 GB LPDDR4 RAMAndroid Operating System29.46 cm (11.6 Inch) Display1 Year Pick and Drop Warranty
Intel Core i5 Processor (11th Gen)8 GB DDR4 RAMWindows 11 Operating System512 GB SSD39.62 cm (15.6 Inch) Display1 Year Carry-in Warranty
AMD Athlon Dual Core Processor8 GB DDR4 RAMWindows 11 Operating System512 GB SSD39.62 cm (15.6 Inch) Display1 Year Onsite Warranty
Apple M2 Processor8 GB Unified Memory RAMMac OS Operating System256 GB SSD34.54 cm (13.6 Inch) DisplayBuilt-in Apps: iMovie, Siri, GarageBand, Pages, Numbers, Photos, Keynote, Safari, Mail, FaceTime, Messages, Maps, Stocks, Home, Voice Memos, Notes, Calendar, Contacts, Reminders, Photo Booth, Preview, Books, App Store, Time Machine, TV, Music, Podcasts, Find My, QuickTime Player1 Year Limited Warranty
Intel Core i3 Processor (10th Gen)8 GB LPDDR4X RAM64 bit Windows 11 Operating System512 GB SSD39.62 cm (15.6 inch) Display1 Year Onsite Warranty
AMD Athlon Dual Core Processor4 GB DDR4 RAMDOS Operating System256 GB SSD39.62 cm (15.6 Inch) Display1 Year Onsite Warranty
Intel Celeron Dual Core Processor (4th Gen)8 GB DDR4 RAM64 bit Windows 11 Operating System256 GB SSD39.62 cm (15.6 inch) Display1 Years
MediaTek MT8183 Processor4 GB LPDDR4 RAMAndroid Operating System29.46 cm (11.6 Inch) Display1 Year Pick and Drop Warranty
Intel Core i3 Processor (12th Gen)8 GB LPDDR5 RAM64 bit Windows 11 Operating System256 GB SSD39.62 cm (15.6 Inch) Display1 Year International Travelers Warranty
AMD Ryzen 5 Hexa Core Processor8 GB DDR4 RAM64 bit Windows 11 Operating System512 GB SSD39.62 cm (15.6 inch) DisplayMicrosoft Office Home 2019 & Office 365, HP Documentation, HP SSRM, HP Smart1 Year Onsite Warranty
"""

### Extract the Processor Details

In [18]:
regex = r'(?:MediaTek|AMD|Intel|Apple)[\s\w]+Processor'

re.findall(regex, string)

['MediaTek MT8183 Processor',
 'Intel Core i5 Processor',
 'AMD Athlon Dual Core Processor',
 'Apple M2 Processor',
 'Intel Core i3 Processor',
 'AMD Athlon Dual Core Processor',
 'Intel Celeron Dual Core Processor',
 'MediaTek MT8183 Processor',
 'Intel Core i3 Processor',
 'AMD Ryzen 5 Hexa Core Processor']

### Extract the RAM Details

In [32]:
regex = r'(?:\([\w\s]+\))?\d+\sGB[\s\w]+RAM'

re.findall(regex, string)

['4 GB LPDDR4 RAM',
 '(11th Gen)8 GB DDR4 RAM',
 '8 GB DDR4 RAM',
 '8 GB Unified Memory RAM',
 '(10th Gen)8 GB LPDDR4X RAM',
 '4 GB DDR4 RAM',
 '(4th Gen)8 GB DDR4 RAM',
 '4 GB LPDDR4 RAM',
 '(12th Gen)8 GB LPDDR5 RAM',
 '8 GB DDR4 RAM']

### Extracting the OS Details

In [27]:
regex = r'(?:\d+\sbit|Android|Mac|Window|DOS)[\s\w]+Operating System'

re.findall(regex, string)

['Android Operating System',
 'Windows 11 Operating System',
 'Windows 11 Operating System',
 'Mac OS Operating System',
 '64 bit Windows 11 Operating System',
 'DOS Operating System',
 '64 bit Windows 11 Operating System',
 'Android Operating System',
 '64 bit Windows 11 Operating System',
 '64 bit Windows 11 Operating System']

### Extracting Storage Details

In [34]:
regex = r'[\d]+\s(?:GB|TB)\s(?:HDD|SSD)'

re.findall(regex, string)

['512 GB SSD',
 '512 GB SSD',
 '256 GB SSD',
 '512 GB SSD',
 '256 GB SSD',
 '256 GB SSD',
 '256 GB SSD',
 '512 GB SSD']

### Extract Display Details

In [35]:
regex = r'\d+\.?\d+\scm.*Display'

re.findall(regex, string)

['29.46 cm (11.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '34.54 cm (13.6 Inch) Display',
 '39.62 cm (15.6 inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 inch) Display',
 '29.46 cm (11.6 Inch) Display',
 '39.62 cm (15.6 Inch) Display',
 '39.62 cm (15.6 inch) Display']

### Extract the Warranty Details

In [36]:
regex = r'\d+\sYear.*Warranty'

re.findall(regex, string)

['1 Year Pick and Drop Warranty',
 '1 Year Carry-in Warranty',
 '1 Year Onsite Warranty',
 '1 Year Limited Warranty',
 '1 Year Onsite Warranty',
 '1 Year Onsite Warranty',
 '1 Year Pick and Drop Warranty',
 '1 Year International Travelers Warranty',
 '1 Year Onsite Warranty']
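
The list above has nine entries for ten records. The seventh record ends in `1 Years` with no `Warranty`, and by default `.` does not match a newline, so the match cannot run on into the next line. The sketch below reproduces this on a shortened, made-up input (`records` is illustrative) and contrasts it with the `re.DOTALL` flag.

```python
import re

# Hypothetical three-line input; the middle line has no 'Warranty'
records = "1 Year Onsite Warranty\n1 Years\n2 Year Limited Warranty\n"
pattern = r"\d+\sYear.*Warranty"

# Default: '.' stops at the end of each line
per_line = re.findall(pattern, records)

# With DOTALL, '.' also matches '\n', so one greedy match spans all lines
across_lines = re.findall(pattern, records, flags=re.DOTALL)

print(per_line)           # ['1 Year Onsite Warranty', '2 Year Limited Warranty']
print(len(across_lines))  # 1
```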

## Search and Replace

In [68]:
# re.sub(pattern, replacement, targetString)

import re

s = re.sub(r'\d', '#', 'a7b6j9k2h4')

print(s)

a#b#j#k#h#


In [71]:
# re.sub(regex, replacement, targetString)

import re

phone = '1233-123-345 # This is a phone'

print('Phone num: ', phone)

num = re.sub('#.*$', '', phone)

print('Phone num: ',num)

num = re.sub(r'\D', '', phone)

print('Phone num: ', num)

Phone num:  1233-123-345 # This is a phone
Phone num:  1233-123-345 
Phone num:  1233123345
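
`re.sub` also accepts backreferences in the replacement and an optional `count` argument. The sketch below is illustrative only: the date string is made up, and the second call reuses the digit-masking input from above.

```python
import re

date = "2023-01-15"  # hypothetical sample

# \1, \2, \3 in the replacement refer to the captured groups
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)
print(reordered)   # 15/01/2023

# count limits how many matches are replaced (left to right)
first_two = re.sub(r"\d", "#", "a7b6j9k2h4", count=2)
print(first_two)   # a#b#j9k2h4
```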


## re.split()

In [72]:
import re

l = re.split('-', 'I-Learn-Python-Regex')

print(l)

['I', 'Learn', 'Python', 'Regex']


In [73]:
import re

l = re.split(r'\W', 'abcdefg@gmail.com')

print(l)

['abcdefg', 'gmail', 'com']


In [75]:
import re

l = re.split('[.]', 'www.facebook.com')

print(l)

['www', 'facebook', 'com']
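
A few more `re.split` behaviours are worth knowing: a character class can split on several delimiters at once, `maxsplit` limits the number of splits, and a capturing group keeps the delimiters in the result. A short sketch with made-up inputs:

```python
import re

# Split on any run of commas, semicolons or whitespace
parts = re.split(r"[,;\s]+", "red, green;blue  yellow")
print(parts)        # ['red', 'green', 'blue', 'yellow']

# maxsplit=1 stops after the first split
limited = re.split("-", "I-Learn-Python-Regex", maxsplit=1)
print(limited)      # ['I', 'Learn-Python-Regex']

# A capturing group around the delimiter keeps it in the output
with_delims = re.split(r"(-)", "a-b")
print(with_delims)  # ['a', '-', 'b']
```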


## Creating a Regex Step by Step

### STEP-1 : Create pattern object

In [1]:
# compile() -> converts the pattern into a regex object

import re

pattern = re.compile('Python')

print(type(pattern))

<class 're.Pattern'>


### STEP-2 : Create Matcher object

In [2]:
# finditer() -> returns an iterator over all the matches in the target string

matcher = pattern.finditer('I Python am learning python Regex in Python!')

print(type(matcher))


<class 'callable_iterator'>


### STEP-3 : Iterate over the Matcher

In [3]:
# start() -> starting index of the matched string
# end() -> index just past the last matched character (end index + 1)
# group() -> returns the matched string

for m in matcher:
    print(type(m))
    print('Match is at:{}, End:{}, Pattern found: {}'.
          format(m.start(), m.end(), m.group()))
print('DONE!!')


<class 're.Match'>
Match is at:2, End:8, Pattern found: Python
<class 're.Match'>
Match is at:37, End:43, Pattern found: Python
DONE!!
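
One detail of the three steps above: the object returned by `finditer()` is an iterator, so it can be consumed only once. The sketch below reuses the same pattern and target string and collects the spans in a list (the names `spans` and `leftover` are illustrative).

```python
import re

pattern = re.compile("Python")
matcher = pattern.finditer("I Python am learning python Regex in Python!")

spans = [m.span() for m in matcher]  # first pass consumes the iterator
leftover = list(matcher)             # second pass finds nothing left

print(spans)     # [(2, 8), (37, 43)]
print(leftover)  # []
```

If the matches are needed more than once, store them in a list first or call `finditer()` again.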


## Using Various Combinations

In [1]:
# Pattern = 'ab'
# Target = 'abaababa'
# Way - 1

import re

pattern = re.compile('ab')

matcher = pattern.finditer('abaababa')

for m in matcher:
    print(m.group(), m.start(), m.end())

ab 0 2
ab 3 5
ab 5 7


In [2]:
# Way - 2

import re

matcher = re.compile('ab').finditer('abaababa')

for m in matcher:
    print(m.group(), m.start(), m.end())

ab 0 2
ab 3 5
ab 5 7


In [3]:
# Way - 3
# re.finditer(pattern, target)

import re

matcher = re.finditer('a+', 'aaaaabbcdegaaabdfgfdgaabjukilua')

for m in matcher:
    print(m.group(), m.start(), m.end())

aaaaa 0 5
aaa 11 14
aa 21 23
a 30 31


In [27]:
import re

matcher = re.finditer('ab', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at:{}, End:{}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at:0, End:2, Pattern found: ab
Match is at:3, End:5, Pattern found: ab
Match is at:5, End:7, Pattern found: ab
Total count:  3


In [28]:
matcher = re.finditer('ba', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at:{}, End:{}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at:1, End:3, Pattern found: ba
Match is at:4, End:6, Pattern found: ba
Match is at:6, End:8, Pattern found: ba
Total count:  3


In [33]:
lst = [1, 2.0, '3', True]

print(type(lst))

for i in lst:
    print(type(i))

<class 'list'>
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>


In [37]:
matcher = re.finditer('an', 'Kanav Bansal')

print(type(matcher))

for m in matcher:
    print(m.group(), m.start(), m.end())
    print(type(m))

<class 'callable_iterator'>
an 1 3
<class 're.Match'>
an 7 9
<class 're.Match'>


In [38]:
matcher = re.finditer('[abc]', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 2, End: 3, Pattern found: b
Total count:  2


In [39]:
matcher = re.finditer('[a-z]', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 2, End: 3, Pattern found: b
Match is at: 5, End: 6, Pattern found: k
Match is at: 7, End: 8, Pattern found: z
Total count:  4


In [40]:
matcher = re.finditer('[^a-z]', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 1, End: 2, Pattern found: 7
Match is at: 3, End: 4, Pattern found: @
Match is at: 4, End: 5, Pattern found:  
Match is at: 6, End: 7, Pattern found: 9
Total count:  4


In [41]:
matcher = re.finditer(r'\s', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 4, End: 5, Pattern found:  
Total count:  1


In [42]:
matcher = re.finditer(r'\w', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 1, End: 2, Pattern found: 7
Match is at: 2, End: 3, Pattern found: b
Match is at: 5, End: 6, Pattern found: k
Match is at: 6, End: 7, Pattern found: 9
Match is at: 7, End: 8, Pattern found: z
Total count:  6


In [43]:
matcher = re.finditer(r'\W', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 3, End: 4, Pattern found: @
Match is at: 4, End: 5, Pattern found:  
Total count:  2


In [44]:
matcher = re.finditer('^a', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Total count:  1


In [45]:
matcher = re.finditer('z$', 'a7b@ k9z')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 7, End: 8, Pattern found: z
Total count:  1


In [46]:
matcher = re.finditer('a', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 2, End: 3, Pattern found: a
Match is at: 3, End: 4, Pattern found: a
Match is at: 5, End: 6, Pattern found: a
Match is at: 7, End: 8, Pattern found: a
Total count:  5


In [47]:
matcher = re.finditer('a+', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 2, End: 4, Pattern found: aa
Match is at: 5, End: 6, Pattern found: a
Match is at: 7, End: 8, Pattern found: a
Total count:  4


In [48]:
matcher = re.finditer('a*', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 1, End: 1, Pattern found: 
Match is at: 2, End: 4, Pattern found: aa
Match is at: 4, End: 4, Pattern found: 
Match is at: 5, End: 6, Pattern found: a
Match is at: 6, End: 6, Pattern found: 
Match is at: 7, End: 8, Pattern found: a
Match is at: 8, End: 8, Pattern found: 
Total count:  8


In [49]:
matcher = re.finditer('a?', 'abaababa')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 1, End: 1, Pattern found: 
Match is at: 2, End: 3, Pattern found: a
Match is at: 3, End: 4, Pattern found: a
Match is at: 4, End: 4, Pattern found: 
Match is at: 5, End: 6, Pattern found: a
Match is at: 6, End: 6, Pattern found: 
Match is at: 7, End: 8, Pattern found: a
Match is at: 8, End: 8, Pattern found: 
Total count:  9


In [51]:
matcher = re.finditer('a?', 'abaabbababbb')

count = 0

for m in matcher:
    count += 1
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
print('Total count: ',count)

Match is at: 0, End: 1, Pattern found: a
Match is at: 1, End: 1, Pattern found: 
Match is at: 2, End: 3, Pattern found: a
Match is at: 3, End: 4, Pattern found: a
Match is at: 4, End: 4, Pattern found: 
Match is at: 5, End: 5, Pattern found: 
Match is at: 6, End: 7, Pattern found: a
Match is at: 7, End: 7, Pattern found: 
Match is at: 8, End: 9, Pattern found: a
Match is at: 9, End: 9, Pattern found: 
Match is at: 10, End: 10, Pattern found: 
Match is at: 11, End: 11, Pattern found: 
Match is at: 12, End: 12, Pattern found: 
Total count:  13
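
The `a*` and `a?` runs above report zero-length matches: wherever no `a` is present, the pattern matches the empty string at that position (start equals end), including once at the very end of the string. The sketch below compares the spans for `a+` and `a*` on the same target and separates out the empty ones.

```python
import re

target = 'abaababa'

plus_spans = [m.span() for m in re.finditer(r'a+', target)]
star_spans = [m.span() for m in re.finditer(r'a*', target)]

# Zero-length matches have identical start and end positions
empty_spans = [span for span in star_spans if span[0] == span[1]]

print(plus_spans)   # [(0, 1), (2, 4), (5, 6), (7, 8)]
print(empty_spans)  # [(1, 1), (4, 4), (6, 6), (8, 8)]
```

Because `a+` requires at least one character, it never produces these empty matches.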


## re.match()

In [56]:
# Used to match the given pattern at the beginning of the target string.
# If it finds the pattern, it returns a match object.
# We can then use start(), end(), group() on the match object.
# If nothing is found, it returns None.


import re

regex = input('Enter pattern: ')

m = re.match(regex, 'abcdefgh')

print(type(m))

if m is None:
    print('Match is not available at beginning of the string!')

else:
    print('Match found at beginning of the string.')
    print('Match is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))
    


Enter pattern: bcd
<class 'NoneType'>
Match is not available at beginning of the string!


## re.search()

In [59]:
# Searches the target string for the pattern, irrespective of location.
# If a match is found, it returns a match object for the first occurrence;
# otherwise it returns None.

# We can then use start(), end(), group() on the match object.


import re

regex = input('Enter pattern: ')

m = re.search(regex, 'abcdefghi')

if m is not None:
    print('Match available.')
    print('First occurrence is at: {}, End: {}, Pattern found: {}'.format(m.start(), m.end(), m.group()))

else:
    print('Match is not available in the whole string!')
    


Enter pattern: bcd
Match available.
First occurrence is at: 1, End: 4, Pattern found: bcd


## re.match() vs re.search()

In [61]:
target = 'Dogs are better than cats'

pattern = 'cats'

m = re.match(pattern, target)

if m is None:
    print('NO MATCH')
else:
    print('MATCH SUCC')
    
m = re.search(pattern, target)

if m is None:
    print('NO SEARCH')
else:
    print('SEARCH SUCC')

NO MATCH
SEARCH SUCC
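
For a side-by-side view, the sketch below runs `re.match`, `re.search`, and `re.fullmatch` on the same target string. The variable names are illustrative.

```python
import re

target = 'Dogs are better than cats'

at_start = re.match('Dogs', target)    # pattern sits at index 0, so match succeeds
anywhere = re.search('cats', target)   # search finds the pattern at the end
whole = re.fullmatch('Dogs', target)   # pattern does not cover the whole string
anchored = re.search('^cats', target)  # '^' pins search to the start, like match

print(at_start is not None)  # True
print(anywhere.span())       # (21, 25)
print(whole is None)         # True
print(anchored is None)      # True
```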


## Other Examples

In [16]:
# re.IGNORECASE flag

import re

s = 'I am learning Python'

# Without the flag, 'python$' would not match the capitalised 'Python'
m = re.search('python$', s, flags=re.IGNORECASE)

if m:
    print('Cool!!')
else:
    print('Not Cool!!')

In [23]:
# # Python Identifier

# import re

# s = input('Enter any identifier to validate: ')

# m = re.fullmatch('[a-zA-Z_]+[a-zA-Z0-9_]*', s)

# if m:
#     print(s, 'is valid')
# else:
#     print(s, 'not valid')

In [37]:
# # Match Phone number

# import re

# s = input('Enter a mobile number to validate: ')

# m = re.fullmatch('[6-9]\d{9}', s)

# if m:
#     print(s, 'is Valid')
# else:
#     print(s, 'not valid')