### What is a Regular Expression?

* A regular expression, also known as regex, is a special sequence of characters that defines a search pattern to find a string or set of strings, such as a word or a sentence.


* A regular expression is a search pattern.


* It can be used to check whether a piece of text matches a given search pattern, and a single expression can combine several alternative patterns.


* Once a pattern is found, regular expressions can also be used to replace or manipulate the matched text, or to split the text into pieces around the matches.


* A regular expression is a pattern of characters that is used to match and manipulate text. It is a sequence of characters and metacharacters that represent a set of rules to match one or more strings of text.

### How to Use Regular Expressions in Python


* 're' is a built-in Python module for working with regular expressions.


* To use regular expressions in Python, you first need to import the re module.

### Use of Regular Expression


* Search and replace: You can use regular expressions to search for a pattern in a string and replace it with another string.


* Text validation: Regular expressions can be used to validate whether a given string conforms to a specific pattern or format. For example, you could use regular expressions to check whether an email address or phone number is formatted correctly.


* Data extraction / Web scraping: You can use regular expressions to extract specific data from a larger body of text. For example, you could use regular expressions to extract all of the URLs from an HTML file, or to pull a specific item out of a web page.


* Parsing and tokenization: Regular expressions can be used to break a string down into smaller pieces or tokens, based on specific patterns or delimiters.


* Cleaning and preprocessing: Regular expressions can be used to clean and preprocess text data, such as removing unwanted characters or formatting, before further analysis.


* File handling: Regular expressions can be used to search for patterns within a file, such as finding all instances of a specific word or phrase within a text document.
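
The uses listed above can be sketched in a few lines. This is only a minimal illustration: the sample text, the variable names, and the deliberately simple email pattern are assumptions made for the example, not a complete validator.

```python
import re

text = "Contact: alice@example.com, bob@example.org. Call 555-1234 today!"

# Search and replace: mask every run of digits.
masked = re.sub(r"\d+", "#", text)

# Data extraction: pull out everything that looks like an email address.
# The pattern is intentionally simple and will not cover every valid address.
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", text)

# Text validation: check that the whole string has the form NNN-NNNN.
is_phone = re.fullmatch(r"\d{3}-\d{4}", "555-1234") is not None

# Tokenization: split on commas and/or whitespace.
tokens = re.split(r"[,\s]+", "red, green  blue,yellow")

print(masked)    # Contact: alice@example.com, bob@example.org. Call #-# today!
print(emails)    # ['alice@example.com', 'bob@example.org']
print(is_phone)  # True
print(tokens)    # ['red', 'green', 'blue', 'yellow']
```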

### Meta Characters used in Regular Expression




![12234737_be2bb558-a9dc-4977-9243-8b790f75fc92_lg.webp](attachment:12234737_be2bb558-a9dc-4977-9243-8b790f75fc92_lg.webp)

| Meta Characters | Meaning |
|-----------------|---------|
| . (dot) | Matches any single character except newline. |
| ^ (caret) | Matches the start of the string. |
| $ (dollar sign) | Matches the end of the string. |
| * (asterisk) | Matches zero or more occurrences of the preceding character or group. |
| + (plus) | Matches one or more occurrences of the preceding character or group. |
| ? (question mark) | Matches zero or one occurrence of the preceding character or group. |
| \| (vertical bar) | Matches either the expression before or after the vertical bar. |
| [] (square brackets) | Matches any single character that is in the specified set of characters. |
| [^] (caret within square brackets) | Matches any single character that is not in the specified set of characters. |
| () (parentheses) | Groups a sequence of characters together and creates a capture group. |
| {} (curly braces) | Matches a specified number of occurrences of the preceding character or group. |
| \ (backslash) | Escapes the following character, allowing it to be used as a literal character. |

| Quantifiers  | Meaning
|--------------|---------------------------------------------------------------------------------------------------------|
| * | Matches zero or more occurrences of the preceding character or group. For example, a* matches zero or more occurrences of the character 'a'.
|+ | Matches one or more occurrences of the preceding character or group. For example, a+ matches one or more occurrences of the character 'a'.
|? | Matches zero or one occurrence of the preceding character or group. For example, a? matches zero or one occurrence of the character 'a'.
|{m} | Matches exactly m occurrences of the preceding character or group. For example, a{3} matches exactly three occurrences of the character 'a'.
|{m,n} | Matches between m and n occurrences of the preceding character or group. For example, a{2,4} matches between two and four occurrences of the character 'a'.

| Wildcard  | Meaning
|--------------|---------------------------------------------------------------------------------------------------------|
|'.' (dot) | Matches any character except a newline character. For example, a. matches 'a' followed by any single character, such as 'ab' or 'a1'.

## Anchors

In [1]:
# 1) '^' - Matches the start of the string. For example, ^a matches any string that starts with the character 'a'.

# 2) '$' - Matches the end of the string. For example, a$ matches any string that ends with the character 'a'.

# 3) '\b' - Matches a word boundary. For example, \bcat\b matches the word 'cat' when it is a separate word and not part of a larger word.
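
The three anchors described above can be tried out directly. This is a small sketch; the word list and the sample sentence are made up for the example.

```python
import re

words = ["apple", "banana", "cherry pie"]

# ^a : keep the strings that start with 'a'
starts_with_a = [w for w in words if re.search(r"^a", w)]

# a$ : keep the strings that end with 'a'
ends_with_a = [w for w in words if re.search(r"a$", w)]

# \bcat\b : 'cat' as a separate word, not the 'cat' inside 'concatenate'
whole_word = re.findall(r"\bcat\b", "cat concatenate cat.")

print(starts_with_a)  # ['apple']
print(ends_with_a)    # ['banana']
print(whole_word)     # ['cat', 'cat']
```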

### Import the Regex Library in Python

In [2]:
import re 

### re.search(pattern, string, flags=0)

* Searches the string for the pattern. Returns a Match object if the pattern is found, otherwise returns None.


* If the pattern occurs several times in the string, only the first occurrence is returned.


* It is used to check whether a pattern is present in a string.


* The returned Match object also reports the index location (span) of the match.

In [3]:
re.search(r'sameer', 'sameer sameer sameer jhchc ahascja askjasscks sjosjk')

<re.Match object; span=(0, 6), match='sameer'>

In [4]:
re.search(r'sameer', 'sam sam sam') # if the pattern is not matched, re.search returns None, so the cell shows no output.

In [5]:
# print output of re.search()

match = re.search('sameer', 'sameer sameer sameer jhchc ahascja askjas')
print(match.group())

sameer
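
Because re.search() returns None when nothing matches, calling .group() on the result without a check raises an AttributeError. The sketch below shows one common way to guard against that and to read the index location from the Match object; the variable names are only illustrative.

```python
import re

text = "sameer sameer sameer"
match = re.search(r"sameer", text)

# Check the result before using it: None means "no match".
if match:
    print(match.group())  # the matched text -> sameer
    print(match.start())  # index where the match begins -> 0
    print(match.end())    # index just past the end of the match -> 6
    print(match.span())   # (start, end) as a tuple -> (0, 6)

no_match = re.search(r"sameer", "sam sam sam")
print(no_match is None)   # True
```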


### re.findall(pattern, string, flags=0)


* Finds a pattern in a string. Returns a list of the matched strings if the pattern is found, or an empty list if it is not.


* If the pattern matches several times, all non-overlapping occurrences are returned in a list of strings.


* It is used to find every occurrence of a pattern.

In [6]:
re.findall(r'sameer', 'sameer sameer sameer sam jhchc sam ahascja askjasscks sjosjk')

['sameer', 'sameer', 'sameer']

In [7]:
# print output of re.findall()

match = re.findall(r'sameer', 'sameer sameer sameer jhchc ahascja askjas')
print(match)

['sameer', 'sameer', 'sameer']


# Use of Meta Characters

In [8]:
a = "charlie and the chocolate factory"

b = "asameer130@gmail.com"

c = "hello"

d = "xyz, yz, xyzz, xxzzy, zyz, xxyz"

In [9]:
# An unescaped '.' matches any character, so it cannot be used as-is to find a literal dot (for example in an email address or URL).
# Here it simply matches the first character of the string.

match = re.search(r'.',b)  
print(match)              

<re.Match object; span=(0, 1), match='a'>


### Use of \ (backslash) : Escapes special characters, allowing them to be treated as literal characters.

* It removes the special meaning of the character that follows it.


* re.search() with an escaped character returns a Match object holding the matched text and its index location, or None if the pattern is not found.

In [10]:
match = re.search(r'\.',b)  
print(match)    

<re.Match object; span=(16, 17), match='.'>


In [11]:
match = re.search(r'\.c',b)  
print(match)   

<re.Match object; span=(16, 18), match='.c'>


In [12]:
match = re.search(r'\.ca',b)  
print(match)   

None


In [13]:
pattern = r'\d{3}-\d{2}-\d{4}' # Matches a social security number in the format XXX-XX-XXXX
text = 'John Doe: 123-45-6789'
matches = re.findall(pattern, text)
print(matches)

['123-45-6789']


### Use of [ ] (square brackets)

* Matches any single character inside the brackets.


* They are used to specify a character class, which is a set of characters.

In [14]:
pattern = r'[aeiou]'
text = 'apple, orange, grape'
matches = re.findall(pattern, text)
print(matches)

['a', 'e', 'o', 'a', 'e', 'a', 'e']


In [15]:
match = re.search(r'[.]',b)  
print(match) 

<re.Match object; span=(16, 17), match='.'>


In [16]:
match = re.findall(r'[a]',b)  
print(match) 

['a', 'a', 'a']


In [17]:
match = re.search(r'[hel]',c)  
print(match) 

<re.Match object; span=(0, 1), match='h'>


In [18]:
match = re.findall(r'[hel]',c)  
print(match) 

['h', 'e', 'l', 'l']


In [19]:
match = re.search(r'[a-l]',c)  
print(match)

<re.Match object; span=(0, 1), match='h'>


In [20]:
match = re.findall(r'[a-l]',c)  
print(match)

['h', 'e', 'l', 'l']


In [21]:
match = re.findall(r'[a-l]','bar jguyuy jhuiu kihiho')  
print(match)

['b', 'a', 'j', 'g', 'j', 'h', 'i', 'k', 'i', 'h', 'i', 'h']


In [22]:
match = re.findall(r'[0-5]','655466127')  
print(match)

['5', '5', '4', '1', '2']


### Use of [^] (caret within square brackets): Matches any single character that is not in the specified set of characters.

In [23]:
# A caret as the first character inside square brackets negates the set: it matches every character that is NOT in the set.

match = re.findall('[^a-l]',c)  
print(match)

['o']


In [24]:
match = re.findall('[^a]','sameer')  # Return everything except 'a'
print(match)

['s', 'm', 'e', 'e', 'r']


In [25]:
match = re.findall('[^ae]','sameer ameer air')  # Return everything except 'a' and 'e' (inside a set, '|' is a literal character, not OR)
print(match)

['s', 'm', 'r', ' ', 'm', 'r', ' ', 'i', 'r']
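
Inside square brackets the vertical bar has no special meaning, so it is not needed to separate the characters of a set. A short sketch to make this visible (the sample strings are made up for the example):

```python
import re

# '|' inside [...] is matched literally; it does not mean OR there.
with_bar = re.findall(r"[a|e]", "a|e b")
print(with_bar)     # ['a', '|', 'e']

# To exclude 'a' and 'e', list them directly after the caret.
without_ae = re.findall(r"[^ae]", "sameer")
print(without_ae)   # ['s', 'm', 'r']
```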


### Use of ^ (caret)


* Matches the beginning of a string.


* It is used to check whether a string starts with a specific word or character.


* By default it matches only at the very beginning of the string.

In [26]:
pattern = r'^Hello'
text = 'Hello, world!'
matches = re.findall(pattern, text)
print(matches)

['Hello']


In [27]:
pattern = r'^world'
text = 'Hello, world!'
matches = re.findall(pattern, text)
print(matches)

[]


In [28]:
match = re.search('^c',a)  
print(match) 

<re.Match object; span=(0, 1), match='c'>


In [29]:
match = re.search('^a',a)  
print(match)

None


In [30]:
match = re.findall('^c',a)  
print(match)

['c']


In [31]:
match = re.search('^s','sameer faf ckljkd ss sfan')  
print(match)

<re.Match object; span=(0, 1), match='s'>


In [32]:
match = re.findall('^f','sameer faf ckljkd ss sfan')  
print(match)

[]


In [33]:
match = re.findall('^ss','sameer faf ckljkd ss sfan')  
print(match)

[]
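
The functions above all accept a flags argument. As a small sketch of how it affects ^: with the re.MULTILINE flag, the caret also matches at the start of every line, not only at the start of the whole string. The three-line sample text is an assumption for the example.

```python
import re

text = "first line\nsecond line\nthird line"

# Default behaviour: ^ matches only at the start of the whole string.
default = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default)   # ['first']
print(per_line)  # ['first', 'second', 'third']
```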


### Use of  '.'  (dot)


* It matches any character except newline.


In [34]:
match = re.search('f.','sameer sf5af ckljkd ss sf5n')  
print(match)

<re.Match object; span=(8, 10), match='f5'>


In [35]:
match = re.search('f..','sameer faf ckljkd ss sfan')  
print(match)

<re.Match object; span=(7, 10), match='faf'>


In [36]:
match = re.findall('s..','sameer faf cklshjkd ss sfan')  
print(match)

['sam', 'shj', 'ss ', 'sfa']


In [37]:
match = re.findall('s.m','sameer faf ckljkd ss sfan sum shm')  
print(match)

['sam', 'sum', 'shm']


In [38]:
pattern = r'f..t'
text = 'foot, fight, fruit'
matches = re.findall(pattern, text)
print(matches) 

['foot']


### Use of $ (Dollar)


* Matches the end of a string.


* It is used to check whether a string ends with a specific word or character.


* By default it matches only at the end of the string (or just before a trailing newline).

In [39]:
pattern = r'world!$'
text = 'Hello, world!'
matches = re.findall(pattern, text)
print(matches)

['world!']


In [40]:
pattern = r'Hello$'
text = 'Hello, world!'
matches = re.findall(pattern, text)
print(matches)

[]


In [41]:
match = re.search('s$','sameer faf ckljkd ss sfan')  
print(match)

None


In [42]:
match = re.search('n$','sameer faf ckljkd ss sfan')  
print(match)

<re.Match object; span=(24, 25), match='n'>


In [43]:
match = re.search('an$','sameer faf ckljkd ss sfan')  
print(match)

<re.Match object; span=(23, 25), match='an'>


In [44]:
match = re.findall('an$','sameer an faf ckljkd ss sfan')  
print(match)

['an']


### Use of | (vertical bar) : Matches either the expression before or after the |


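* It means OR: the pattern matches if any one of the alternatives matches.


* Matches either the expression before or after the vertical bar, e.g. "(com|in|org)".


* A single regex can search for multiple patterns by using |, e.g. "^.*\.xml$|^.*\.jar$|^.*\.exe$" (the dot before each extension is escaped so that it matches a literal dot).


* Alternatives are tried from left to right, and the first one that matches at a given position is used.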

In [45]:
pattern = r'Hello|Hi'
text = 'Hello, world! Hi, there.'
matches = re.findall(pattern, text)
print(matches)

['Hello', 'Hi']


In [46]:
match = re.findall('sa|fa|c','sameer faf ckljkd ss sfan')  
print(match)

['sa', 'fa', 'c', 'fa']


In [47]:
match = re.findall('sa|so','sameer faf ckljkd ss sfan')  
print(match)

['sa']


In [48]:
match = re.findall('sa|lj','sameer faf ckljkd ss sfan')  
print(match)

['sa', 'lj']
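
A typical use of | is matching one of several file extensions. The sketch below uses a hypothetical list of file names and also shows why the dot before the extension should be escaped: an unescaped dot accepts any character in that position.

```python
import re

# Hypothetical file names, chosen only for illustration.
files = ["build.xml", "app.jar", "setup.exe", "notes.txt", "myxml"]

# Escaped dot + grouped alternatives: the name must end in .xml, .jar or .exe
strict = [name for name in files if re.search(r"^.*\.(xml|jar|exe)$", name)]

# Unescaped dot: '.' matches any character, so 'myxml' also gets through
loose = [name for name in files if re.search(r"^.*.xml$", name)]

print(strict)  # ['build.xml', 'app.jar', 'setup.exe']
print(loose)   # ['build.xml', 'myxml']
```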


### Use of '?' (Question Mark)


* Matches zero or one occurrence of the preceding character or group.


* It makes the preceding character or group optional.

In [49]:
pattern = r'colou?r'
text = 'color, colour, colouur'
matches = re.findall(pattern, text)
print(matches)

['color', 'colour']


In [50]:
match = re.findall('ab?','a, ab, abbbb')  
print(match)

['a', 'ab', 'ab']


In [51]:
match = re.findall('sam?','jjsammmm.,.,mm hdhdhdsaammmmmjdjsamdjsamhdhhdsam')  
print(match)

['sam', 'sa', 'sam', 'sam', 'sam']


In [52]:
match = re.findall('s|a|m?','a s m sam jjsammmm.,.,mm hdhdhdsaammmmmjdjsamdjsamhdhhdsam')  
print(match)

['a', '', 's', '', 'm', '', 's', 'a', 'm', '', '', '', 's', 'a', 'm', 'm', 'm', 'm', '', '', '', '', 'm', 'm', '', '', '', '', '', '', '', 's', 'a', 'a', 'm', 'm', 'm', 'm', 'm', '', '', '', 's', 'a', 'm', '', '', 's', 'a', 'm', '', '', '', '', '', 's', 'a', 'm', '']


In [53]:
match = re.findall('ab?','abc')  
print(match)

['ab']


In [54]:
match = re.findall('ab?','aaaaabbbbbbbbbc')  
print(match)

['a', 'a', 'a', 'a', 'ab']


In [55]:
match = re.findall('a?b','abbbbbbbbbc aaaab')  
print(match)

['ab', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'ab']


In [56]:
match = re.findall('b?','abbbbbbbbbc')  
print(match)

['', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', '', '']


In [57]:
match = re.findall('bb?','abbbbbbbbbc')  
print(match)

['bb', 'bb', 'bb', 'bb', 'b']


In [58]:
match = re.findall('b?b','abbbbbbbbbc')  
print(match)

['bb', 'bb', 'bb', 'bb', 'b']


In [59]:
match = re.findall('sa?','sameer sam sar sad sag')  
print(match)

['sa', 'sa', 'sa', 'sa', 'sa']


In [60]:
match = re.findall('sa?','ssssasasa samsa sarsa sadsa sagsa')  
print(match)

['s', 's', 's', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa']


In [61]:
match = re.findall('sa?s','sasasasasssss')  # 's', then an optional 'a' (zero or one time), then 's'
print(match)

['sas', 'sas', 'ss', 'ss']


In [62]:
match = re.findall('sa?','sasasa samsasa00 sarsasasa00 sadsa sagsa')  
print(match)

['sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa']


### Use of * (asterisk)


* Matches zero or more occurrences of the preceding character or group.


* Because zero occurrences are allowed, a pattern such as 'sa*' matches a lone 's' as well as 's' followed by any number of 'a'.

In [63]:
# Starts with 'g', then 'o' zero or more times, and ends with 'd'

pattern = r'go*d'
text = 'gd, god, good, goood'
matches = re.findall(pattern, text)
print(matches)

['gd', 'god', 'good', 'goood']


In [64]:
match = re.findall('sa*','saaasasa samsasa00 sarsasasa00 sadsa sagsa')  
print(match)

['saaa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa', 'sa']


In [65]:
match = re.findall('sa*','sun sameer')  
print(match)

['s', 'sa']


In [66]:
match = re.findall('ab*','aaaabbbcaaaab sameer aaabbb')  
print(match)

['a', 'a', 'a', 'abbb', 'a', 'a', 'a', 'ab', 'a', 'a', 'a', 'abbb']


In [67]:
match = re.findall('ab*','abbbc abbbbb bbb')  
print(match)

['abbb', 'abbbbb']


In [68]:
match = re.findall('ab*','abbbcbbabc a a a') 
print(match)

['abbb', 'ab', 'a', 'a', 'a']


In [69]:
match = re.findall('a*b','aaaabbbcbbabc  aaabbbbb') # any number of 'a' (including none) followed by exactly one 'b'
print(match)

['aaaab', 'b', 'b', 'b', 'b', 'ab', 'aaab', 'b', 'b', 'b', 'b']


### Use of + (plus)


* Matches one or more occurrences of the preceding character or group.

In [70]:
# Starts with 'g', then 'o' one or more times, and ends with 'd'

pattern = r'go+d'
text = 'gd, god, good, goood'
matches = re.findall(pattern, text)
print(matches)

['god', 'good', 'goood']


In [71]:
match = re.findall('x+','xxxxxxjxxxx hdhdhxjje')  # 'x' repeated one or more times
print(match)

['xxxxxx', 'xxxx', 'x']


In [72]:
match = re.findall('x+s','xxxxxsss') # 'x' repeated one or more times, followed by 's'
print(match)

['xxxxxs']


In [73]:
match = re.findall('x+a','xxxxxs') # 'x' repeated one or more times, followed by 'a'
print(match)

[]


### Use of {} (curly braces)

* {n}: Matches exactly n occurrences of the preceding character or group.


* Matches a specified number of occurrences

In [74]:
pattern = r'go{2}d'
text = 'gd, god, good, goood'
matches = re.findall(pattern, text)
print(matches)

['good']


In [75]:
pattern = r'go{2,3}d'
text = 'gd, god, good, goood, goooood'
matches = re.findall(pattern, text)
print(matches)

['good', 'goood']


In [76]:
match = re.findall('x{2,4}','xxxxxs jjjxx xcx ksksksxxx gggxj') # returns every run where 'x' occurs 2 to 4 times in a row
print(match)

['xxxx', 'xx', 'xxx']
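
Curly braces also accept open-ended ranges: {m,} means "m or more" and {,n} means "from zero up to n". A minimal sketch, using re.fullmatch on each word so that the whole word has to fit the pattern (the sample words are made up for the example):

```python
import re

words = "x xx xxx xxxx xxxxx".split()

# {3,} : three or more 'x'
at_least_three = [w for w in words if re.fullmatch(r"x{3,}", w)]

# {,2} : at most two 'x'
at_most_two = [w for w in words if re.fullmatch(r"x{,2}", w)]

print(at_least_three)  # ['xxx', 'xxxx', 'xxxxx']
print(at_most_two)     # ['x', 'xx']
```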


### Use of () (parentheses) : Groups multiple subexpressions into a single expression.

* Encloses part of a regex so that it can be treated as a single unit.


* Groups a sequence of characters together and creates a capture group.

In [77]:
# Returns every occurrence of 'foo' or 'bar', whether it is at the start, middle, or end of a word.

pattern = r'(foo|bar)'
text = 'foofoofoo, foobar, barfoobar'
matches = re.findall(pattern, text)
print(matches)

['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'bar']


In [78]:
# Matches one or more repetitions of the group 'fo'. With a capture group, findall returns only the text of the group's last repetition.

pattern = r'(fo)+'
text = 'foot, bfoot, bfooooooob'
matches = re.findall(pattern, text)
print(matches)

['fo', 'fo', 'fo']


In [79]:
match = re.findall('(x|y)','xxxxxs jjjxx yx xcx ksksksxxx') # returns every 'x' or 'y' in the text
print(match)

['x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'x', 'x', 'x', 'x', 'x', 'x']


In [80]:
match = re.findall('(a|b)','abc bbc cbbba') # returns every 'a' or 'b' in the text
print(match)

['a', 'b', 'b', 'b', 'b', 'b', 'b', 'a']


In [81]:
match = re.findall('(a|b)c','abc bbc cbbba bccc bn ac bg') # matches 'a' or 'b' followed by 'c'; findall returns only the captured group
print(match)

['b', 'b', 'b', 'a']
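
When a pattern contains a capture group, re.findall() returns the text of the group rather than the whole match, which explains the outputs above. Two ways to get the whole match instead are a non-capturing group (?:...) or re.finditer() with group(0). A short sketch on a made-up sample string:

```python
import re

text = "abc bbc ac"

# Capture group: findall returns only what the group captured.
captured = re.findall(r"(a|b)c", text)

# Non-capturing group (?:...): findall returns the whole match.
whole = re.findall(r"(?:a|b)c", text)

# finditer yields Match objects; group(0) is always the whole match.
via_iter = [m.group(0) for m in re.finditer(r"(a|b)c", text)]

print(captured)  # ['b', 'b', 'a']
print(whole)     # ['bc', 'bc', 'ac']
print(via_iter)  # ['bc', 'bc', 'ac']
```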


# Character Sets / Regex Sets


* It is a set of characters inside a pair of square brackets [ ] with a special meaning.

| Pattern | Matches |
|---------|---------|
| [arn] | Returns a match where one of the specified characters (a, r, or n) is present |
| [a-n] | Returns a match for any lowercase character alphabetically between a and n |
| [^arn] | Returns a match for any character EXCEPT a, r, and n |
| [0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
| [0-9] | Returns a match for any digit between 0 and 9 |
| [0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
| [a-zA-Z] | Returns a match for any character alphabetically between a and z, lowercase OR uppercase |
| [+*.(){}$] | In sets, metacharacters such as +, *, ., (, ), {, } and $ have no special meaning and match themselves |

In [82]:
pattern = r'[arn]'
text = 'The rat in the hat.'
matches = re.findall(pattern, text)
print(matches)

['r', 'a', 'n', 'a']


In [83]:
pattern = r'[a-n]'
text = 'The rat in the hat.'
matches = re.findall(pattern, text)
print(matches)

['h', 'e', 'a', 'i', 'n', 'h', 'e', 'h', 'a']


In [84]:
pattern = r'[^a-n]'
text = 'The rat in the hat.'
matches = re.findall(pattern, text)
print(matches)

['T', ' ', 'r', 't', ' ', ' ', 't', ' ', 't', '.']


In [85]:
pattern = r'[0123]'
text = 'The rat in the hat 987135286.'
matches = re.findall(pattern, text)
print(matches)

['1', '3', '2']


In [86]:
pattern = r'[^0123]'
text = 'The rat in the hat 987135286.'
matches = re.findall(pattern, text)
print(matches)

['T', 'h', 'e', ' ', 'r', 'a', 't', ' ', 'i', 'n', ' ', 't', 'h', 'e', ' ', 'h', 'a', 't', ' ', '9', '8', '7', '5', '8', '6', '.']


In [87]:
pattern = r'[0-9]'
text = 'The rat in the hat 987135286.'
matches = re.findall(pattern, text)
print(matches)

['9', '8', '7', '1', '3', '5', '2', '8', '6']


In [88]:
# The first digit must be between 0 and 4 and the second digit between 0 and 8, i.e. 00 to 48 (excluding numbers ending in 9).

pattern = r'[0-4][0-8]'
text = 'The rat in the hat 987135286.'
matches = re.findall(pattern, text)
print(matches)

['13', '28']


In [89]:
pattern = r'[a-zA-Z]'
text = 'The rat in the Hat 6464.'
matches = re.findall(pattern, text)
print(matches)

['T', 'h', 'e', 'r', 'a', 't', 'i', 'n', 't', 'h', 'e', 'H', 'a', 't']


In [90]:
pattern = r'[.]'
text = 'asameer130@gmail.com'
matches = re.findall(pattern, text)
print(matches)

['.']


In [91]:
pattern = r'.'
text = 'asameer130@gmail.com'
matches = re.findall(pattern, text)
print(matches)

['a', 's', 'a', 'm', 'e', 'e', 'r', '1', '3', '0', '@', 'g', 'm', 'a', 'i', 'l', '.', 'c', 'o', 'm']
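
The last row of the table says that metacharacters lose their special meaning inside a set. A quick sketch to check this, using a made-up sample string:

```python
import re

text = "price: $5.00 (approx) + tax*"

# Inside [...], these characters are matched literally, with no escaping needed.
symbols = re.findall(r"[+*.()$]", text)

print(symbols)  # ['$', '.', '(', ')', '+', '*']
```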


# Special Sequence in Regex


* In Python regular expressions, a "special sequence" refers to a backslash \ followed by a character that has a special meaning in the regular expression syntax. These special sequences are used to match specific types of characters or to represent special characters that cannot be matched directly.


* They make commonly used patterns easier to write.

| Special Sequences | Meaning |
|-------------------|---------|
| \A | Matches only at the start of the string. |
| \d | Matches any decimal digit character (equivalent to [0-9] for ASCII text). |
| \D | Matches any non-digit character (equivalent to [^0-9] for ASCII text). |
| \d+ | Matches one or more decimal digits. For example, the regex pattern "\d+" matches any run of one or more digits. |
| \w | Matches any word character: a letter, a digit, or an underscore (equivalent to [a-zA-Z0-9_] for ASCII text). |
| \W | Matches any non-word character (equivalent to [^a-zA-Z0-9_] for ASCII text). |
| \s | Matches any whitespace character (equivalent to [ \t\n\r\f\v] for ASCII text). |
| \S | Matches any non-whitespace character (equivalent to [^ \t\n\r\f\v] for ASCII text). |
| \b | Matches a word boundary (the position between a word character and a non-word character). |
| \B | Matches a position that is not a word boundary. |
| \n | Matches a newline character. |
| \t | Matches a tab character. |
| \Z | Matches only at the end of the string. |

# Use of Special Sequences

In [92]:
a = "Harry Potter"

### \A: Matches only at the start of the string. Similar to ^ (caret)

In [93]:
match = re.search(r'\AHar',a)
print(match)

<re.Match object; span=(0, 3), match='Har'>


In [94]:
# 'Pot' is present in the string, but the output is None because \A only finds the pattern at the beginning of the string. Similar to ^.

match = re.search(r'\APot',a) 
print(match)

None


In [95]:
match = re.findall(r'\APot','Pot Pot jxdxnPot bsjsjPot Potjjs') 
print(match)

['Pot']


### \Z: Matches the end of a string. Similar to $ (dollar sign)

In [96]:
pattern = r'world!\Z'
text = 'Hello, world! world!'
matches = re.findall(pattern, text)
print(matches)

['world!']


In [97]:
pattern = r'world!\Z'
text = 'world! hello'
matches = re.findall(pattern, text)
print(matches)

[]
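
\Z and $ are similar but not identical in Python: $ also matches just before a newline at the very end of the string, while \Z matches only at the true end. A minimal sketch with a sample string that ends in a newline:

```python
import re

text = "Hello, world!\n"

# $ tolerates a single trailing newline.
dollar = re.search(r"world!$", text)

# \Z does not: the string ends with '\n', not with 'world!'.
upper_z = re.search(r"world!\Z", text)

print(dollar)   # <re.Match object; span=(7, 13), match='world!'>
print(upper_z)  # None
```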


### \d: Matches any decimal digit character (equivalent to [0-9]).

In [98]:
match = re.search(r'\d','The number is 55 4.2a')
print(match)

<re.Match object; span=(14, 15), match='5'>


In [99]:
match = re.findall(r'\d','The number is 555 4.2a 7.5.6.5')
print(match)

['5', '5', '5', '4', '2', '7', '5', '6', '5']


### \d+: Matches one or more decimal digits. For example, the regex pattern "\d+" would match any string with one or more digits.

In [100]:
match = re.search(r'\d+','The number is 555 4.2a 555')
print(match)

<re.Match object; span=(14, 17), match='555'>


In [101]:
match = re.findall(r'\d+','The number is 555 4.2a 555')
print(match)

['555', '4', '2', '555']
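
In the output above, '4.2' is split into '4' and '2' because the dot is not a digit. If decimal numbers should stay in one piece, one possible pattern (a sketch, not the only way) adds an optional fractional part in a non-capturing group:

```python
import re

text = "The number is 555 4.2a 555"

# \d+ followed by an optional ".digits" part; (?:...) keeps findall
# returning the whole match instead of just the group.
numbers = re.findall(r"\d+(?:\.\d+)?", text)

print(numbers)  # ['555', '4.2', '555']
```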


### \D: Matches any non-digit character (equivalent to [^0-9]).

In [102]:
match = re.search(r'\D','The number is 55 4.2a')
print(match)

<re.Match object; span=(0, 1), match='T'>


In [103]:
match = re.findall(r'\D','The number is 55 4.2a')
print(match)

['T', 'h', 'e', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ', ' ', '.', 'a']


### \w: Matches any alphanumeric character or underscore (equivalent to [a-zA-Z0-9_]). Matching a word character:

In [104]:
match = re.search(r'\w','The number is 55 4.2a')
print(match)

<re.Match object; span=(0, 1), match='T'>


In [105]:
match = re.findall(r'\w','The number is 55 4.2a_$')
print(match)

['T', 'h', 'e', 'n', 'u', 'm', 'b', 'e', 'r', 'i', 's', '5', '5', '4', '2', 'a', '_']


### \W: Matches any non-alphanumeric character (equivalent to [^a-zA-Z0-9_]). Matching a non-word character.

In [106]:
match = re.search(r'\W','The number is 55 4.2a')
print(match)

<re.Match object; span=(3, 4), match=' '>


In [107]:
match = re.findall(r'\W','The number is 55 4.2a')
print(match)

[' ', ' ', ' ', ' ', '.']


### \s: Matches any whitespace character (equivalent to [ \t\n\r\f\v]). Matching a whitespace character.


In [108]:
match = re.search(r'\s','The number is 4.2a')
print(match)

<re.Match object; span=(3, 4), match=' '>


In [109]:
match = re.findall(r'\s','The number is 4.2a')
print(match)

[' ', ' ', ' ']


### \S: Matches any non-whitespace character (equivalent to [^ \t\n\r\f\v]). Matching a non-whitespace character.

In [110]:
match = re.search(r'\S','The number is 4.2a')
print(match)

<re.Match object; span=(0, 1), match='T'>


In [111]:
match = re.findall(r'\S','The number is 4.2a')
print(match)

['T', 'h', 'e', 'n', 'u', 'm', 'b', 'e', 'r', 'i', 's', '4', '.', '2', 'a']


### \b: Matches a word boundary (the beginning or end of a word).

In [112]:
# Returns 'number' wherever it appears as a whole word

match = re.findall(r'\bnumber\b','The number is 4.2a')
print(match)

['number']


In [113]:
# A raw string (r prefix) is needed for special sequences: without it, Python reads '\b' as a backspace character and the pattern does not match.

match = re.findall('\bnumber\b','The number is 4.2a')
print(match)

[]


In [114]:
# Using double quotes makes no difference; the r prefix is what matters.

match = re.findall("\bnumber\b",'The number is 4.2a')
print(match)

[]
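
The reason the two cells above return an empty list can be checked directly: in a normal Python string, '\b' is a single backspace character, while in a raw string it stays as a backslash followed by 'b', which is what the regex engine needs. A short sketch:

```python
import re

plain = '\bnumber\b'   # '\b' becomes one backspace character (\x08)
raw = r'\bnumber\b'    # backslash and 'b' are kept as two characters

print(len(plain), len(raw))   # 8 10
print(plain[0] == '\x08')     # True

text = 'The number is 4.2a'
with_plain = re.findall(plain, text)         # looks for backspace + 'number' + backspace
with_raw = re.findall(raw, text)             # word boundary + 'number' + word boundary
with_doubled = re.findall('\\bnumber\\b', text)  # doubling the backslash also works

print(with_plain)    # []
print(with_raw)      # ['number']
print(with_doubled)  # ['number']
```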


In [115]:
# Matches 'number' when it is followed by a word boundary (the end of a word)

match = re.findall(r'number\b','The number is 4.2a')
print(match)

['number']


In [116]:
# Matches 'number' when it starts at a word boundary (the beginning of a word)

match = re.findall(r'\bnumber','number is 4.2a')
print(match)

['number']


In [193]:
# Returns a match only when the whole word matches; 'numb' is only part of 'number', so nothing is returned

match = re.findall(r'\bnumb\b','The number is 4.2a')
print(match)

[]


### \B: Matches a non-word boundary.

In [197]:
# Returns 'cat' only when it is inside a word, not at the start or end of a word.

pattern = r'\Bcat\B'
text = 'The acata in the hat cat.'    
matches = re.findall(pattern, text)
print(matches) 

['cat']


In [119]:
pattern = r'\Bcat\B'
text = 'The acat in the hat.'
matches = re.findall(pattern, text)
print(matches)

[]


In [120]:
pattern = r'\Bcat\B'
text = 'The cata in the hat.'
matches = re.findall(pattern, text)
print(matches)

[]


In [121]:
# The pattern is returned for every match inside a word, however many there are.
# It never matches at the start or end of a word, only inside one.

pattern = r'\Bcat\B'
text = 'The acatcatcata in the hat.'
matches = re.findall(pattern, text)
print(matches)

['cat', 'cat', 'cat']


### \t: Matches a tab character.

In [122]:
pattern = r'\tWorld'
text = 'Hello\\World, Hello\nWorld'  # contains a backslash and a newline but no tab, so nothing matches
matches = re.findall(pattern, text)
print(matches)

[]


In [123]:
print('Hello World')

Hello World


In [124]:
print('Hello\tWorld')     

Hello	World


In [207]:
pattern = r'\tWorld'

text = 'Hello	World, Hi	World'

matches = re.findall(pattern, text)

print(matches)

['\tWorld', '\tWorld']


In [126]:
pattern = r'World'

text = 'Hello	World, Hi	World'

matches = re.findall(pattern, text)

print(matches)

['World', 'World']


In [127]:
pattern = r'Hello\tWorld'

text = 'Hello\tWorld, Hello\nWorld'

matches = re.findall(pattern, text)

print(matches)

['Hello\tWorld']


In [128]:
# Starts with 'Hello', then one or more whitespace characters, then 'World'

pattern = r'Hello\s+World'

text = 'Hello  World, Hello\tWorld'

matches = re.findall(pattern, text)

print(matches)

['Hello  World', 'Hello\tWorld']


### \n: Matches a newline character.

In [129]:
print('Hello\nWorld')

Hello
World


In [130]:
pattern = r'Hello\nWorld'

text = 'Hello\tWorld, Hello\nWorld'

matches = re.findall(pattern, text)

print(matches) 

['Hello\nWorld']


# Functions in Regex 


* The re library in Python provides functions for working with regular expressions. These functions are used to perform various tasks.

| Function | Meaning |
|----------|---------|
| re.search(pattern, string, flags=0) | Searches the string for the first occurrence of the pattern, and returns a Match object if found. If not found, returns None. |
| re.compile(pattern, flags=0) | Compiles a regular expression pattern into a regular expression object, which can be used for more efficient repeated matching. |
| re.match(pattern, string, flags=0) | Matches the pattern at the beginning of the string, and returns a Match object if found. If not found, returns None. |
| re.findall(pattern, string, flags=0) | Finds all non-overlapping occurrences of the pattern in the string, and returns them as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. |
| re.sub(pattern, repl, string, count=0, flags=0) | Replaces all occurrences of the pattern in the string with the replacement string repl, and returns the modified string. |
| re.split(pattern, string, maxsplit=0, flags=0) | Splits the string at all occurrences of the pattern, and returns the parts as a list of strings. |
| re.finditer(pattern, string, flags=0) | Returns an iterator that yields Match objects for each non-overlapping occurrence of the pattern in the string. |
| re.subn(pattern, repl, string, count=0, flags=0) | Works like sub(), but returns a tuple of the new string and the number of replacements made, rather than just the string. |
| re.escape(pattern) | Returns the string with regex special characters escaped with backslashes. This is useful for matching an arbitrary literal string that may contain regular expression metacharacters. |
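
As a quick tour of the functions in the table that have not appeared yet, the sketch below runs compile, split, finditer, subn and escape on one small sample string. The sample text and variable names are assumptions made for the example.

```python
import re

text = "apple, banana;cherry"

# re.compile: build the pattern once, then reuse the compiled object.
word = re.compile(r"[a-z]+")
words = word.findall(text)

# re.split: cut the text wherever the separator pattern matches.
parts = re.split(r"[,;]\s*", text)

# re.finditer: iterate over Match objects to get text and position.
positions = [(m.group(), m.start()) for m in word.finditer(text)]

# re.subn: like re.sub, but also reports how many replacements were made.
new_text, count = re.subn(r"[,;]\s*", " | ", text)

# re.escape: make a literal string safe to use as a pattern ('+' is special).
literal = re.escape("1+1=2")
found = re.search(literal, "we know 1+1=2") is not None

print(words)            # ['apple', 'banana', 'cherry']
print(parts)            # ['apple', 'banana', 'cherry']
print(positions)        # [('apple', 0), ('banana', 7), ('cherry', 14)]
print(new_text, count)  # apple | banana | cherry 2
print(found)            # True
```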

### 1) re.search(pattern, string, flags=0): Searches the string for the first occurrence of the pattern, and returns a Match object if found. If not found, returns None.

In [131]:
text = 'Hello, world world world!'
pattern = r'world'
match = re.search(pattern, text)
print(match)

<re.Match object; span=(7, 12), match='world'>


In [132]:
text = 'Hello, world world world!'
pattern = r'world'
match = re.search(pattern, text)
print(match.group())

world


In [133]:
text = 'Hello, world world world!'
pattern = r'hat'
match = re.search(pattern, text)
print(match)

None


### 2) re.findall(pattern, string, flags=0): Finds all non-overlapping occurrences of the pattern in the string, and returns them as a list of strings. The string is scanned left-to-right, and matches are returned in the order found.

In [134]:
text = 'Hello, world world world!'
pattern = r'world'
match = re.findall(pattern, text)
print(match)

['world', 'world', 'world']


In [135]:
text = 'Hello, world world world!'
pattern = r'hat'
match = re.findall(pattern, text)
print(match)

[]


In [136]:
text = 'The quick brown fox jumps over the lazy dog.'
pattern = r'\b\w{4}\b'
matches = re.findall(pattern, text)
print(matches)

['over', 'lazy']


In [137]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'\d+'

match = re.findall(pattern, text)

print(match)

['89', '90', '70']


In [138]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[A-Z]'

match = re.findall(pattern, text)

print(match)

['J', 'L', 'D']


In [139]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[A-Z][a-z]'

match = re.findall(pattern, text)

print(match)

['Jo', 'Li', 'Da']


In [140]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[A-Z][a-z]*'

match = re.findall(pattern, text)

print(match)

['John', 'Lisa', 'David']


In [141]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[A-Z][a-z].*'

match = re.findall(pattern, text)

print(match)

['John has scored 89 marks', 'Lisa has scored 90 marks', 'David has scored 70 marks']
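
One detail the examples above do not cover: when the pattern contains capturing groups, re.findall() returns the group contents instead of the whole match. The sketch below reuses the same sample text (without the blank lines); the variable names are only illustrative.

```python
import re

text = """John has scored 89 marks
Lisa has scored 90 marks
David has scored 70 marks"""

# One capturing group: findall returns a list of that group's text
names = re.findall(r'([A-Z][a-z]*) has scored', text)

# Two capturing groups: findall returns a list of tuples, one tuple per match
pairs = re.findall(r'([A-Z][a-z]*) has scored (\d+)', text)

# The tuples are easy to turn into a dictionary
scores = {name: int(marks) for name, marks in pairs}

print(names)
print(pairs)
print(scores)
```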


### 3) re.match(pattern, string, flags=0): Checks for a match only at the beginning of the string and returns a Match object if found. Returns None otherwise.

In [142]:
text = 'Hello, world world world!'
pattern = r'Hello'
match = re.match(pattern, text)
print(match)

<re.Match object; span=(0, 5), match='Hello'>


In [143]:
text = 'Hello, world world world!'
pattern = r'world'
match = re.match(pattern, text)
print(match)

None
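
To make the difference concrete, the following sketch runs re.match(), re.search() and re.fullmatch() on the same text. re.fullmatch() is not covered in the table above; it only succeeds when the whole string matches the pattern.

```python
import re

text = 'Hello, world world world!'

at_start = re.match(r'world', text)         # 'world' is not at position 0
anywhere = re.search(r'world', text)        # first occurrence anywhere in the string
whole = re.fullmatch(r'Hello', text)        # 'Hello' does not cover the whole string
whole_ok = re.fullmatch(r'Hello.*', text)   # '.*' consumes the rest of the string

print(at_start)
print(anywhere.span())
print(whole)
print(whole_ok.group())
```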


### 4) re.sub(pattern, repl, string, count=0, flags=0): Substitutes the matched string(s) with a replacement string.

In [208]:
text = 'Hello, world world!'

pattern = r'world'

replacement = 'Python'

new_text = re.sub(pattern, replacement, text)

print(new_text)

Hello, Python Python!


In [145]:
match = re.sub(r'world','Python','Hello, world!')

print(match)

Hello, Python!


In [146]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'\s'

replacement = ''

match = re.sub(pattern, replacement, text)

print(match)

Johnhasscored89marksLisahasscored90marksDavidhasscored70marks
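
The examples above always replace every match with a fixed string. As a small sketch of the other options, re.sub() also accepts a count, backreferences such as \1 in the replacement, and a function as the replacement. The helper name `double` is made up for this example.

```python
import re

text = 'Hello, world world!'

# count=1 replaces only the first occurrence
first_only = re.sub(r'world', 'Python', text, count=1)

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r'(\w+)-(\w+)', r'\2-\1', 'left-right')

# A function receives each Match object and returns the replacement text
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, 'John has scored 89 marks')

print(first_only)
print(swapped)
print(doubled)
```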


### 5) re.subn(pattern, repl, string, count=0, flags=0): Works like sub() in every way except its return value. It returns a tuple (new_string, number_of_replacements) rather than just the new string.

In [210]:
text = 'Hello, world!'

pattern = r'world'

replacement = 'Python'

new_text = re.subn(pattern, replacement, text)

print(new_text)

('Hello, Python!', 1)


In [148]:
match = re.subn(r'world','Python','Hello, world world world!')

print(match)

('Hello, Python Python Python!', 3)
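
Since the result is a tuple, it can be unpacked directly. A short sketch of using the count to check whether anything was replaced at all:

```python
import re

# Unpack the (new_string, number_of_replacements) tuple
new_text, n = re.subn(r'world', 'Python', 'Hello, world world world!')
if n:
    print(f'{n} replacement(s) made: {new_text}')

# When the pattern is not found, the string is unchanged and the count is 0
unchanged, zero = re.subn(r'hat', 'Python', 'Hello, world!')
print(unchanged, zero)
```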


### 6) re.split(pattern, string, maxsplit=0, flags=0): Splits the text into a list of substrings at each match of the pattern.

In [149]:
text = 'The quick brown fox jumps quick over the lazy dog.'
pattern = r'quick'
words = re.split(pattern, text)
print(words)

['The ', ' brown fox jumps ', ' over the lazy dog.']


In [150]:
text = 'The quick brown fox jumps quick over the lazy dog.'
pattern = r'quick'
words = re.split(pattern, text,maxsplit = 1)
print(words)

['The ', ' brown fox jumps quick over the lazy dog.']


In [151]:
text = 'The quick brown fox jumps quick over the lazy dog.'
pattern = r'quick'
words = re.split(pattern, text,maxsplit = 4)
print(words)

['The ', ' brown fox jumps ', ' over the lazy dog.']
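
Splitting on a fixed word is only one use. The sketch below shows two other common cases: splitting on any of several delimiters with a character class, and keeping the delimiter in the result by wrapping the pattern in a capturing group. The sample string with fruit names is made up for this example.

```python
import re

# Split on any run of commas, semicolons or whitespace
text = 'apple, banana;cherry  date'
parts = re.split(r'[,;\s]+', text)

# A capturing group keeps the delimiter in the returned list
with_delims = re.split(r'(quick)', 'The quick brown fox')

print(parts)
print(with_delims)
```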


### 7) re.compile(pattern, flags=0):  Compiles a regular expression pattern into a regular expression object, which can be used for more efficient repeated matching.

In [152]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = re.compile(r'[a-d]')

match = re.findall(pattern, text)

print(match)

['a', 'c', 'd', 'a', 'a', 'a', 'c', 'd', 'a', 'a', 'd', 'a', 'c', 'd', 'a']


In [212]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = re.compile(r'[0-9]')

match = re.findall(pattern, text)

print(match)

['8', '9', '9', '0', '7', '0']


In [154]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = re.compile(r'[0-9]+')

match = re.findall(pattern, text)

print(match)

['89', '90', '70']


In [155]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = re.compile(r'[0-9]*')

match = re.findall(pattern, text)

print(match)

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '89', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '90', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '70', '', '', '', '', '', '', '']


In [156]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[0-9]*'

match = re.findall(pattern, text)

print(match)

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '89', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '90', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '70', '', '', '', '', '', '', '']


In [157]:
text = """John has scored 89 marks

Lisa has scored 90 marks

David has scored 70 marks"""

pattern = r'[a-d]'

match = re.findall(pattern, text)

print(match)

['a', 'c', 'd', 'a', 'a', 'a', 'c', 'd', 'a', 'a', 'd', 'a', 'c', 'd', 'a']


In [158]:
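# The second argument of re.compile() is flags, not the text to search, so passing a string raises a TypeError.
match = re.compile(r'[a-d]','abcd')
print(match)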

TypeError: unsupported operand type(s) for &: 'str' and 'int'
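
The compiled object has its own methods (findall, search, sub and so on), so the text is passed to those methods rather than to re.compile(). The sketch below shows this usage, with a real flag (re.IGNORECASE) as the second argument of re.compile(); the variable names are only illustrative.

```python
import re

text = """John has scored 89 marks
Lisa has scored 90 marks
David has scored 70 marks"""

# Compile once, then reuse the pattern object
digits = re.compile(r'[0-9]+')
all_marks = digits.findall(text)
first_marks = digits.search(text).group()

# The second argument of re.compile() is for flags such as re.IGNORECASE
letters = re.compile(r'[a-d]', re.IGNORECASE)
found_letters = letters.findall('abcd ABCD')

print(all_marks)
print(first_marks)
print(found_letters)
```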

### 8) re.escape(text): Returns the string with regex special characters backslash-escaped (since Python 3.7 only characters that have a special meaning in a regex are escaped). This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters.

In [220]:
text = """John has scored 89 marks.

Lisa has $scored 90 marks.

David has scored 70 marks."""

match = re.escape(text)

print(match)

John\ has\ scored\ 89\ marks\.\
\
Lisa\ has\ \$scored\ 90\ marks\.\
\
David\ has\ scored\ 70\ marks\.


In [160]:
text = 'nhanck@.$*.'

match = re.escape(text)

print(match)

nhanck@\.\$\*\.
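
The typical use of re.escape() is to build a pattern from text that should be matched literally. In the sketch below (the sample sentence is made up), the unescaped '+' is read as a quantifier and the search finds the wrong substring, while the escaped version finds the literal text.

```python
import re

literal = '1+1=2'
text = 'We know 1+1=2, not 11=2.'

# Unescaped: '1+' means "one or more 1s", so this matches '11=2' instead
unescaped = re.findall(literal, text)

# Escaped: '+' is treated as a plain character
escaped = re.findall(re.escape(literal), text)

print(unescaped)
print(escaped)
```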


### 9) re.finditer(pattern, string, flags=0): Returns an iterator yielding match objects for all non-overlapping matches of a regular expression pattern in the input string.

In [161]:
pattern = r'\d+'  

text = 'The quick brown fox jumps over the lazy dog 123 times.'

match = re.finditer(pattern, text)

print(match)

<callable_iterator object at 0x00000208B28030A0>


In [221]:
pattern = r'\d+'  # Matches one or more digits

text = 'The quick brown fox jumps 85 over the lazy dog 123 times.'

# Use re.finditer() to find all matches
matches = re.finditer(pattern, text)

# Iterate over the matches and print their start and end positions
for match in matches:
    print('Match found:', match.group(), 'at position:', match.start(), '-', match.end())

Match found: 85 at position: 26 - 28
Match found: 123 at position: 47 - 50


In [163]:
pattern = r'\d+|[a-b]'  # Matches one or more digits, or a single 'a' or 'b'

text = 'The quick brown fox jumps over the lazy dog 123 times.'

# Use re.finditer() to find all matches
matches = re.finditer(pattern, text)

# Iterate over the matches and print their start and end positions
for match in matches:
    print('Match found:', match.group(), 'at position:', match.start(), '-', match.end())

Match found: b at position: 10 - 11
Match found: a at position: 36 - 37
Match found: 123 at position: 44 - 47
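
Because re.finditer() yields full Match objects, it combines well with named groups. The sketch below collects one dictionary per match using groupdict(); the group names `name` and `marks` are chosen for this example.

```python
import re

text = """John has scored 89 marks
Lisa has scored 90 marks
David has scored 70 marks"""

# (?P<name>...) gives a capturing group a name
pattern = re.compile(r'(?P<name>[A-Z][a-z]*) has scored (?P<marks>\d+)')

# groupdict() returns the named groups of one match as a dictionary
records = [match.groupdict() for match in pattern.finditer(text)]

for record in records:
    print(record['name'], '->', record['marks'])
```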


# Greedy VS Non-Greedy Approach / Regex

## Greedy Regex

In regular expressions, quantifiers such as *, +, ? and {m,n} are greedy by default, which means they match as many characters as possible while still allowing the overall pattern to match.

## Non-Greedy Regex

A non-greedy (lazy) quantifier matches as few characters as possible. You can make a greedy quantifier non-greedy by appending a ? to it, for example *?, +?, ?? or {m,n}?.

In [164]:
# Greedy approach

text = 'saaabbb jhhjbj aapp'

pattern = r'ab*' # one 'a' followed by zero or more 'b's; every match starts with 'a' and takes as many 'b's as it can

match = re.findall(pattern, text)

print(match)

['a', 'a', 'abbb', 'a', 'a']


In [165]:
# Optional quantifier (still greedy): here ? follows a character, not a quantifier, so it means "zero or one"

text = 'saaabbb aapp b aabb ab'

pattern = r'ab?' # one 'a' followed by at most one 'b'; the 'b' is taken whenever it is present
match = re.findall(pattern, text)

print(match)

['a', 'a', 'ab', 'a', 'a', 'a', 'ab', 'ab']


In [166]:
# Greedy approach

text = 'aaabbbbbb'

pattern = r'ab{3,5}' # one 'a' followed by 3 to 5 'b's; greedy, so it takes as many 'b's as allowed (5)

match = re.findall(pattern, text)

print(match)

['abbbbb']


In [167]:
# Non-Greedy approach

text = 'aaabbbbbb'

pattern = r'ab{3,5}?' # one 'a' followed by 3 to 5 'b's; non-greedy, so it stops at the minimum (3)

match = re.findall(pattern, text)

print(match)

['abbb']


In [168]:
# Greedy approach

text = 'A <div>foo</div> bar <div>baz</div> qux'

pattern = r'<div>.*</div>' # '<div>', then as many characters as possible, up to the last '</div>'

matches = re.findall(pattern, text)

print(matches)

['<div>foo</div> bar <div>baz</div>']


In [169]:
# Non-greedy approach

text = 'A <div>foo</div> bar <div>baz</div> qux'

pattern = r'<div>.*?</div>' # '<div>', then as few characters as possible, up to the nearest '</div>'
matches = re.findall(pattern, text)

print(matches)

['<div>foo</div>', '<div>baz</div>']


In [170]:
# Greedy approach

text = "<HTML><TITLE>My Page</TITLE></HTML>"

pattern = r'<.*>' # starts with '<' and ends with '>', with as many characters as possible in between

matches = re.findall(pattern, text)

print(matches)

['<HTML><TITLE>My Page</TITLE></HTML>']


In [171]:
# Non-Greedy approach

text = "<HTML><TITLE>My Page</TITLE></HTML>"

pattern = r'<.*?>' # starts with '<', then as few characters as possible, up to the nearest '>'
matches = re.findall(pattern, text)

print(matches)

['<HTML>', '<TITLE>', '</TITLE>', '</HTML>']


In [172]:
# Note: the position of ? matters. It must directly follow the quantifier it modifies, otherwise the result changes.

text = "<HTML><TITLE>My Page</TITLE></HTML>"

pattern = r'<.*>?' # here ? applies to '>' (making it optional), not to .*, so .* stays greedy
matches = re.findall(pattern, text)

print(matches)

['<HTML><TITLE>My Page</TITLE></HTML>']
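
A common alternative to the lazy quantifier, shown here only as a sketch, is a negated character class: [^>]* means "any characters except '>'", so the match cannot run past the first closing bracket. On this text both patterns give the same result.

```python
import re

text = "<HTML><TITLE>My Page</TITLE></HTML>"

# Lazy quantifier: as few characters as possible before '>'
lazy = re.findall(r'<.*?>', text)

# Negated character class: anything except '>' before the closing '>'
negated = re.findall(r'<[^>]*>', text)

print(lazy)
print(negated)
```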


# Match Object


* A Match object contains all the information about the search and its result. If no match is found, None is returned instead.


* e.g. <re.Match object; span=(0, 6), match='sameer'>

In [173]:
text = "John has scored 98 marks."

pattern = r'\d+'

match = re.search(pattern, text)

# Printing the Match object shows whether the pattern was found.
# If it was found, span gives the index positions of the match in the string,
# and match shows the text that the regex pattern matched.

print(match) 

<re.Match object; span=(16, 18), match='98'>


In [174]:
# '.re' returns the compiled regular expression object that produced this match

print(match.re)

re.compile('\\d+')


In [175]:
# '.string' returns the original string that was passed to the search

print(match.string)

John has scored 98 marks.


In [176]:
# '.start()' returns the index where the match begins

print(match.start())

16


In [177]:
# '.end()' returns the index just past the end of the match

print(match.end())

18


In [178]:
# '.span()' returns a (start, end) tuple with the index positions of the match

print(match.span())

(16, 18)


In [179]:
# '.group()' returns the part of the string that matched the pattern

print(match.group())

98
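
When the pattern contains capturing groups, the same Match object also gives access to each group. The sketch below extends the example with two named groups; the group names `name` and `marks` are chosen for this illustration.

```python
import re

text = "John has scored 98 marks."

match = re.search(r'(?P<name>[A-Z][a-z]+) has scored (?P<marks>\d+)', text)

whole = match.group(0)          # the entire match
name = match.group(1)           # first group, by number
marks = match.group('marks')    # a group can also be fetched by name
all_groups = match.groups()     # tuple with every group
marks_span = match.span('marks')
as_dict = match.groupdict()     # named groups as a dictionary

print(whole, name, marks, all_groups, marks_span, as_dict)
```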


# Phone Number and Email Verification and Web Scraping using Regex

In [223]:
# How to verify Phone Numbers.

phn_num = ['552-782-1495', '547-325-7456', '785-a65-8525', '513-5411-526']

pattern = r'^\d{3}-\d{3}-\d{4}$'

for phone_number in phn_num:
    match = re.match(pattern, phone_number)
    if match:
        print(f"{phone_number} :- is a valid phone number.\n")
    else:
        print(f"{phone_number} :- is not a valid phone number.\n")

552-782-1495 :- is a valid phone number.

547-325-7456 :- is a valid phone number.

785-a65-8525 :- is not a valid phone number.

513-5411-526 :- is not a valid phone number.
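
One subtlety with the pattern above: $ also matches just before a trailing newline, so '552-782-1495\n' would still be accepted by re.match(). A stricter sketch uses re.fullmatch(), which requires the whole string to match and needs no ^ or $. The helper name `is_valid_phone` is made up for this example.

```python
import re

PHONE = re.compile(r'\d{3}-\d{3}-\d{4}')

def is_valid_phone(value):
    """True only if the entire string has the form ddd-ddd-dddd."""
    return PHONE.fullmatch(value) is not None

# With ^...$ and re.match, a trailing newline slips through
loose = re.match(r'^\d{3}-\d{3}-\d{4}$', '552-782-1495\n') is not None

print(loose)
print(is_valid_phone('552-782-1495'))
print(is_valid_phone('552-782-1495\n'))
print(is_valid_phone('785-a65-8525'))
```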



In [181]:
# How to verify email address

emails = ['random.guy123@gmail.com', 'mr_x_in_bombay@gov.in', '1@ued.org','@gmail.com','abc!@yahoo.in', 'sam_12@gov.us',
          'neeraj@']

pattern = r'^.*@.*\.(com|in)$' # This regex works for simple cases, but it is too permissive (e.g. it accepts '@gmail.com').


for email in emails:
    match = re.match(pattern, email)
    if match:
        print(f"{email} :- is a valid email.\n")
    else:
        print(f"{email} :- is not a valid email.\n")

random.guy123@gmail.com :- is a valid email.

mr_x_in_bombay@gov.in :- is a valid email.

1@ued.org :- is not a valid email.

@gmail.com :- is a valid email.

abc!@yahoo.in :- is a valid email.

sam_12@gov.us :- is not a valid email.

neeraj@ :- is not a valid email.



In [182]:
# Emails that should match: random.guy123@gmail.com, mr_x_in_bombay@gov.in


emails = ['random.guy123@gmail.com', 'mr_x_in_bombay@gov.in', '1@ued.org','@gmail.com','abc!@yahoo.in', 'sam_12@gov.us',
          'neeraj@']

pattern = r'^\S{1,}@.*\.(com|in)$'

for email in emails:
    match = re.match(pattern, email)
    if match:
        print(f"{email} :- is a valid email.\n")
    else:
        print(f"{email} :- is not a valid email.\n")

random.guy123@gmail.com :- is a valid email.

mr_x_in_bombay@gov.in :- is a valid email.

1@ued.org :- is not a valid email.

@gmail.com :- is not a valid email.

abc!@yahoo.in :- is a valid email.

sam_12@gov.us :- is not a valid email.

neeraj@ :- is not a valid email.



In [183]:
# Emails that should match: random.guy123@gmail.com, mr_x_in_bombay@gov.in


emails = ['random.guy123@gmail.com', 'mr_x_in_bombay@gov.in', '1@ued.org','@gmail.com','abc!@yahoo.in', 'sam_12@gov.us',
          'neeraj@']

pattern = r'[\w.%_].*@.*\.(com|in)$'

for email in emails:
    match = re.match(pattern, email)
    if match:
        print(f"{email} :- is a valid email.\n")
    else:
        print(f"{email} :- is not a valid email.\n")

random.guy123@gmail.com :- is a valid email.

mr_x_in_bombay@gov.in :- is a valid email.

1@ued.org :- is not a valid email.

@gmail.com :- is not a valid email.

abc!@yahoo.in :- is a valid email.

sam_12@gov.us :- is not a valid email.

neeraj@ :- is not a valid email.
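
The patterns above still accept 'abc!@yahoo.in', because .* and \S allow any character before the @. One possible tightening, offered only as a sketch, limits the local part to letters, digits, underscore and a few punctuation marks, and limits the domain to letters, digits and hyphens. This is a deliberate simplification for these sample addresses (it keeps the same .com/.in restriction) and is not a complete email validator.

```python
import re

emails = ['random.guy123@gmail.com', 'mr_x_in_bombay@gov.in', '1@ued.org',
          '@gmail.com', 'abc!@yahoo.in', 'sam_12@gov.us', 'neeraj@']

# local part: word characters plus . % + -
# domain: one or more labels of letters, digits or hyphens, ending in .com or .in
pattern = re.compile(r'[\w.%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.(?:com|in)')

# fullmatch requires the whole address to fit the pattern
valid = [email for email in emails if pattern.fullmatch(email)]

print(valid)
```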



### Web Scraping using Regex

In [184]:
import requests

In [185]:
url = 'https://www.fakepersongenerator.com/random-address.html'
response = requests.get(url)
html_text = response.text

# Extract all phone numbers from the HTML
phone_numbers = re.findall(r'\d{3}-\d{3}-\d{4}', html_text)

# Print the results
for i in phone_numbers:
    print(i)

630-860-8836
708-288-0613


In [237]:
url = 'https://anonymsms.com/fake-phone-number/'

response = requests.get(url)

html_text = response.text    

# Extract all phone numbers from the HTML
phone_numbers = re.findall(r'[+]\d{12}', html_text)

# Print the results
for i in phone_numbers:
    print('\n', i)


 +447876517058

 +447876517058

 +447553799295

 +447553799295

 +995571013851

 +995571013851

 +447774693639

 +447774693639

 +447553873158

 +447553873158

 +380966218950

 +380966218950

 +447388189052

 +447388189052
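
Each number appears twice in the output above, most likely because the page contains it in more than one place. A small sketch of removing duplicates while keeping the original order uses dict.fromkeys(). The HTML string below is made up so that the example runs without a network request.

```python
import re

# Hypothetical HTML fragment standing in for response.text
html_text = ('<li><a href="/n/1">+447876517058</a> <span>+447876517058</span></li>'
             '<li>+995571013851</li>')

found = re.findall(r'\+\d{12}', html_text)

# dict keys are unique and keep insertion order, so this removes repeats
unique_numbers = list(dict.fromkeys(found))

for number in unique_numbers:
    print(number)
```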


In [186]:
url = 'https://usaddressgenerator.com/'
response = requests.get(url)
html_text = response.text

# Extract all phone numbers from the HTML
phone_numbers = re.findall(r'\(\d{3}\) \d{3}-\d{4}', html_text)

# Print the results
for i in phone_numbers:
    print(i)

(206) 463-5153
(207) 774-6884
(207) 774-6884
(208) 468-9709
(208) 489-3101
(208) 726-8729
(208) 726-8729
(208) 734-9611
(208) 734-9611
(951) 487-8914
(805) 643-9424
(805) 529-1832
(530) 275-4140
(530) 257-0914
(415) 467-3419
(760) 771-1781
(909) 267-9440
(714) 961-6811
(714) 826-8310
(951) 699-2321
(760) 637-2644
(714) 484-6711
(916) 415-0893
(760) 636-5648
(619) 283-2035
(805) 529-0264
(760) 200-3316
(661) 878-8389
(760) 379-2780
(805) 569-2622
(510) 888-9108
(805) 938-5595
(916) 339-2704
(310) 370-3228
(805) 733-0295
(303) 494-5435
(307) 576-2224
(307) 734-5242
(317) 440-5897
(317) 440-5897
(406) 844-3525
(410) 727-8665
(435) 257-2404
(435) 673-4805
(208) 752-4051
(208) 746-5727
(208) 624-7695
(208) 347-2820
(208) 459-6587
(208) 743-8191
(208) 836-5708
(208) 362-0456
(208) 773-4079
(303) 494-5435
(307) 576-2224
(307) 734-5242
(317) 440-5897
(317) 440-5897
(406) 844-3525
(410) 727-8665
(435) 257-2404
(435) 673-4805
(805) 497-6611
(907) 267-1712
(907) 267-1712
(916) 536-1262
(951) 679-