<a href="https://colab.research.google.com/github/Karthikraja131/Python/blob/main/python_Regex.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Regex
* Regex, short for Regular Expressions, is a powerful tool used in computer science and programming for searching, manipulating, and validating strings of text based on patterns.

* In Python, the **re** module provides support for regular expressions.

* **Pattern Matching:**  Regular expressions define a pattern that can be used to match character combinations in strings.
 For example, **\d** matches any digit, **\w** matches any word character (letters, digits, and underscore), and **.** matches any character except a newline.

 * **Special Characters:** Regular expressions use special characters to represent different types of patterns. For example, **^** matches the start of a string, **$** matches the end of a string, **\*** matches zero or more occurrences of the preceding element, **+** matches one or more occurrences, and **?** matches zero or one occurrence.

 * **Character Classes:** Character classes allow you to match specific sets of characters. For example, **[a-z]** matches any lowercase letter, **[0-9]** matches any digit, and **[^a-z]** matches any character that is not a lowercase letter.

 * **Quantifiers:** Quantifiers specify **how many occurrences** of a character or group should be matched. For example, **{n}** matches exactly n occurrences, **{n,}** matches at least n occurrences, and **{n,m}** matches between n and m occurrences.

 * **Grouping and Capturing:** Parentheses **()** group characters together and also create capturing groups. Capturing groups allow you to extract parts of a matched string.

 * **Flags:** Flags modify the behavior of regular expressions. For example, the re.IGNORECASE flag can be used to perform case-insensitive matching.
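
The building blocks listed above can be tried out in a few lines. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the original notebook.

```python
import re

# \d+ : one or more digits
digits = re.findall(r"\d+", "Order 66 shipped in 3 days")

# ^ and $ anchor the pattern to the start and end of the string
is_lower_word = bool(re.search(r"^[a-z]+$", "regex"))

# {n,m} quantifier: runs of 2 to 4 digits that stand alone as a word
short_numbers = re.findall(r"\b\d{2,4}\b", "7 42 1234 123456")

# Parentheses capture parts of the match
match = re.search(r"(\w+)@(\w+)\.com", "mail: user@example.com")
user, domain = match.groups()

# re.IGNORECASE makes the match case-insensitive
found = re.search(r"python", "I like PYTHON", flags=re.IGNORECASE)

print(digits)
print(is_lower_word)
print(short_numbers)
print(user, domain)
print(found.group())
```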




[a-z]  - a to z

[A-Z]  -  A to Z

[a-cA-C0-9]  - a to c, A to C and 0 to 9

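{ }  - number of occurrences

{4}  - exactly 4 occurrences

{4,6}  - 4 to 6 occurrences

r'...'  - raw string (backslashes are kept as written)

\s  - whitespace (space, tab, newline)

\d  - digits

\w  - word characters (letters, digits and underscore)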
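
The raw-string prefix matters because Python processes backslash escapes in ordinary string literals before `re` ever sees the pattern. The sketch below illustrates this with `\b`; the sample text and variable names are assumptions chosen for the example.

```python
import re

# In a normal string literal, "\b" is the single backspace character
normal_length = len("\b")
# In a raw string, r"\b" is a backslash followed by the letter b
raw_length = len(r"\b")
print(normal_length, raw_length)

text = "cat concat cats"
# Raw string: \b reaches the regex engine as a word boundary
with_raw = re.findall(r"\bcat\b", text)
# Normal string: the pattern looks for backspace characters around "cat"
without_raw = re.findall("\bcat\b", text)
print(with_raw)
print(without_raw)
```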

## Important Functions

The Python `re` module provides support for regular expressions. Some of the important functions in the `re` module are:

**1. `re.compile(pattern, flags=0)`:** This function compiles a regular expression **pattern** into a regular expression object, which can then be **used for matching.**

**2. `re.search(pattern, string, flags=0)`:** This function searches for the **first occurrence of a pattern** within a string and **returns a match object if found. It returns `None` if no match is found.**

**3. `re.match(pattern, string, flags=0)`:** This function **matches a pattern only at the beginning of the string.** It returns a match object if the pattern is found at the start of the string; otherwise, it returns `None`.

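**4. `re.findall(pattern, string, flags=0)`:** This function **finds all non-overlapping occurrences** of a pattern in a string and returns them as a list of strings. If the pattern contains capturing groups, the list holds the group contents instead (tuples when there is more than one group).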

**5. `re.finditer(pattern, string, flags=0)`:** This function finds all occurrences of a pattern in a string and **returns an iterator** that yields match objects for each match.

**6. `re.sub(pattern, repl, string, count=0, flags=0)`:** This function **replaces occurrences of a pattern in a string with a replacement string** (`repl`). The `count` parameter specifies the maximum number of substitutions to perform (default is 0, meaning all occurrences will be replaced).

**7. `re.split(pattern, string, maxsplit=0, flags=0)`:** This function splits a string into substrings using a regular expression pattern as the delimiter. The `maxsplit` parameter specifies the maximum number of splits to perform.

These are some of the key functions in the `re` module for working with regular expressions in Python. Regular expressions offer powerful tools for pattern matching and text manipulation tasks.
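
To see the functions side by side, here is a small sketch that applies each of them to the same string. The sample text and variable names are invented for illustration.

```python
import re

text = "apple=1, banana=22, cherry=333"
pattern = re.compile(r"\d+")             # compile once, reuse below

first = pattern.search(text)             # first match anywhere in the string
at_start = pattern.match(text)           # None: the string does not start with a digit
all_numbers = pattern.findall(text)      # list of matched strings
spans = [m.span() for m in pattern.finditer(text)]   # match objects carry positions
masked = pattern.sub("#", text)          # replace every match
limited = pattern.sub("#", text, count=1)            # replace only the first match
parts = re.split(r",\s*", text)          # split on a comma plus optional whitespace

print(first.group())
print(at_start)
print(all_numbers)
print(spans)
print(masked)
print(limited)
print(parts)
```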

## Pro Tips

* If one condition passes and a further condition must also be checked before proceeding, nest the checks (for example, nested `if` statements).

# To find the Mobile number pattern from the given string.




In [None]:
import re

mobileNumberPattern = re.compile(r"(0|91)?[6-9][0-9]{9}")
result = mobileNumberPattern.search("My mobile number is 8825417088")
print(result)

# (0|91)? matches an optional prefix: either '0' or '91', zero or one time.
# [6-9] matches the first digit, which must be from 6 to 9.
# [0-9]{9} matches the next 9 digits, each from 0 to 9.

<re.Match object; span=(20, 30), match='8825417088'>


* To find the mobile number pattern in the given string by using a function

In [None]:
import re
def mobileNumberPresent(text):
  mnoPattern = re.compile(r"(0|91)?[6-9][0-9]{9}")
  return mnoPattern.search(text)

sentence = input("Enter the sentence:")
result = mobileNumberPresent(sentence)
print(result)
if result is None:
  print("Number is not present")
else:
  print("Number is present")


Enter the sentence:My mobile number is 8852417088
<re.Match object; span=(20, 30), match='8852417088'>
Number is present
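
One caveat of the pattern `(0|91)?[6-9][0-9]{9}` is that `search` also accepts ten digits that are part of a longer run of digits. A possible refinement, sketched here under the assumption that a mobile number must not touch other digits, adds lookarounds. The names `loosePattern` and `strictPattern` and the sample strings are illustrative only.

```python
import re

loosePattern = re.compile(r"(0|91)?[6-9][0-9]{9}")
# (?<!\d) and (?!\d) require that no digit sits directly before or after the number
strictPattern = re.compile(r"(?<!\d)(0|91)?[6-9][0-9]{9}(?!\d)")

longRun = "Order id 88254170889999"
looseResult = loosePattern.search(longRun)     # matches the first ten digits of the id
strictResult = strictPattern.search(longRun)   # no match: the digits are part of a longer run
print(looseResult)
print(strictResult)

withPrefix = strictPattern.search("Call 918825417088 today")
print(withPrefix.group())
```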


Another format: numbers written as ddd-ddd-dddd

In [None]:
import re
def mobileNumberPresent(text):
  mnoPattern = re.compile(r"\d{3}-\d{3}-\d{4}")   # raw string keeps \d intact
  return mnoPattern.search(text)

sentence = input("Enter the sentence:")
result = mobileNumberPresent(sentence)
if result is None:
  print("Number is not present")
else:
  print("Number is present")

Enter the sentence:123-321-5436
Number is present


To find the telephone number for different locations

9144-22590000 : Chennai

91462-2521234  : Tirunelveli

9140-23456789  : Hyderabad

914562-212121   : Sivakasi

Observe that the first part of each telephone number has 4 to 6 digits and starts with 91, followed by a hyphen, and then 6 to 8 digits, the first of which is from 1 to 9.

In [None]:
import re
def telePhoneNumber(text):
  # 91 plus 2 to 4 digits, a hyphen, then 6 to 8 digits starting with 1 to 9
  telePattern = re.compile(r'((91)[0-9]{2,4})-([1-9][0-9]{5,7})')
  return telePattern.search(text)
sentence = input('Enter the tele phone number: ')
result = telePhoneNumber(sentence)
print(result)
if result is None:
  print("Telephone number is not present")
else:
  print("Telephone number is present")


Enter the tele phone number: my mobile number is 8825417080
None
Telephone number is not present
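
The four sample numbers listed earlier can be checked against the described format directly. The sketch below uses `fullmatch`, which requires the whole string to fit the pattern. It assumes the second part is limited to 6 to 8 digits, as the description says; the `samples` dictionary is just a convenient container for the example.

```python
import re

# 91 plus 2 to 4 digits, a hyphen, then 6 to 8 digits starting with 1 to 9
telePattern = re.compile(r'(91[0-9]{2,4})-([1-9][0-9]{5,7})')

samples = {
    "9144-22590000": "Chennai",
    "91462-2521234": "Tirunelveli",
    "9140-23456789": "Hyderabad",
    "914562-212121": "Sivakasi",
}

results = {}
for number, city in samples.items():
    # fullmatch returns None unless the entire string matches
    results[city] = telePattern.fullmatch(number) is not None
    print(city, results[city])

# A plain mobile number has no 91 prefix followed by a hyphen, so it is rejected
mobileAccepted = telePattern.fullmatch("8825417080") is not None
print(mobileAccepted)
```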


*  `match.groups()`   : returns the captured values as a tuple, one entry per group in the compiled pattern. It is a method of the match object, not of the `re` module.

In [None]:
import re
def telePhoneNumber(text):
  # (\d){2,4} repeats a one-digit group, so group 3 keeps only the last digit it matched
  telePattern = re.compile(r'(91)((\d){2,4})-([1-9][0-9]{5,7})')
  return telePattern.search(text)

text = "My telephone number is   911252-521129"
presentNumber = telePhoneNumber(text)
print(presentNumber.group(1))
print(presentNumber.group(2))
print(presentNumber.group(3))
print(presentNumber.group(4))
print(presentNumber.groups())



91
1252
2
521129
('91', '1252', '2', '521129')
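
Numbered groups become hard to follow once a pattern has several of them. One option, shown here as a sketch, is to name the groups with `(?P<name>...)`. The group names `country`, `area` and `subscriber` are assumptions made up for this example.

```python
import re

# Each (?P<name>...) is a capturing group that can be read back by name
telePattern = re.compile(
    r'(?P<country>91)(?P<area>\d{2,4})-(?P<subscriber>[1-9][0-9]{5,7})'
)
match = telePattern.search("My telephone number is   911252-521129")

print(match.group("country"))
print(match.group("area"))
print(match.group("subscriber"))
# groupdict() returns all named groups as a dictionary
parts = match.groupdict()
print(parts)
```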


What if there is more than one mobile number?

re.findall()

In [None]:
import re

def isMobileNumbers(text):
  # only numbers that carry the 91 prefix match this pattern
  pattern = re.compile(r"(91[6-9][0-9]{9})")
  number = pattern.findall(text)   # list of every match
  print(type(number))
  print(number)
  return number


sentence = " My mobile numbers are 8825417088,918880251241 and 918015219426"
result = isMobileNumbers(sentence)
for number in result:
  print(number)

<class 'list'>
['918880251241', '918015219426']
918880251241
918015219426
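
When the position of each number matters as well, `finditer` can be used in place of `findall`, since it yields match objects. A minimal sketch follows, reusing the same pattern; the variable `found` is introduced only for this example.

```python
import re

pattern = re.compile(r"91[6-9][0-9]{9}")
sentence = " My mobile numbers are 8825417088,918880251241 and 918015219426"

found = []
for match in pattern.finditer(sentence):
    # group() is the matched text, start() is its index in the sentence
    found.append((match.group(), match.start()))
    print(match.group(), "starts at index", match.start())
```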


To find the Birthday pattern

In [None]:
import re

def Bdy(text):
  # day 1-31 (leading zero optional), month 01-12, four-digit year starting with 0-2
  pattern = re.compile(r'(0?[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/([0-2][0-9]{3})')
  return pattern.findall(text)   # list of (day, month, year) tuples


sentence = input("Enter the text for Bdy finding : ")
for day, month, year in Bdy(sentence):
  print(day, month, year)

Enter the text for Bdy finding : 31/05/1995
31 05 1995


In [None]:
# Validate the given date
import re

datePattern = re.compile(r'([0-3][0-9])/([0-1][0-9])/(\d{4})')
dateFound = datePattern.search("My birth date is 31/02/1995")
if dateFound is not None:
  print(dateFound.groups())
  day = dateFound.group(1)
  month = dateFound.group(2)
  year = dateFound.group(3)
  print(day)
  print(month)
  print(year)

  if month in ["01","03","05","07","08","10","12"]:
    if int(day) > 31:
      print("Invalid date: the day can't be more than 31 for Jan, Mar, May, July, Aug, Oct and Dec")
  elif month in ["04","06","09","11"]:
    if int(day) > 30:
      print("Invalid date: the day can't be more than 30 for April, June, Sep and Nov")
  elif month == "02":
    y = int(year)
    # Leap year: divisible by 4, except century years that are not divisible by 400
    isLeap = (y % 4 == 0 and y % 100 != 0) or (y % 400 == 0)
    maxDay = 29 if isLeap else 28
    if int(day) > maxDay:
      print(f"Invalid date: the day can't be more than {maxDay} for Feb {year}")
  else:
    print("Invalid date: the month must be between 01 and 12")
else:
  print("Invalid date")




('31', '02', '1995')
31
02
1995
Invalid date: the day can't be more than 28 for Feb 1995
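
An alternative to hand-written calendar rules is to let the regex find the `dd/mm/yyyy` text and hand the calendar check to the standard library. The sketch below does this with `datetime.strptime`; the helper name `isValidDate` is hypothetical and not part of the original notebook.

```python
import re
from datetime import datetime

def isValidDate(text):
    # Hypothetical helper: locate dd/mm/yyyy with a regex,
    # then let datetime decide whether the date exists in the calendar
    found = re.search(r'\b(\d{2})/(\d{2})/(\d{4})\b', text)
    if found is None:
        return False
    try:
        datetime.strptime(found.group(0), "%d/%m/%Y")
        return True
    except ValueError:
        # raised for impossible dates such as 31/02/1995
        return False

print(isValidDate("My birth date is 31/02/1995"))
print(isValidDate("My birth date is 29/02/1996"))
print(isValidDate("My birth date is 31/05/1995"))
```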


**Password Checker**
* Password should be at least 8 characters long

* It should contain at least one uppercase letter and one lowercase letter

* It should contain at least one numeric value


In [None]:
import re

def PasswordChecker(password):
  if len(password) < 8:
    print("The password should be minimum 8 characters")
  else:
    if re.search('[a-z]', password) and re.search('[A-Z]', password) and re.search('[0-9]', password):
      print("Strong Password")
    else:
      print("Password should have a combination of uppercase letters, lowercase letters and numeric values")


password= 'Raja1235'
PasswordChecker(password)

Strong Password


In [None]:

def PasswordChecker(password):
  if len(password) < 8:
    print("The password should be minimum 8 characters")
  else:
    if re.search('[a-z]', password) and re.search('[A-Z]', password) and re.search('[0-9]', password):
      print("Strong Password")
    else:
      print("Password should have a combination of uppercase letters, lowercase letters and numeric values")


password = 'raja5231'
PasswordChecker(password)


Password should have a combination of uppercase letters, lowercase letters and numeric values
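
The same rules can also be folded into a single pattern by using lookaheads. This is only a sketch of that idea; `strongPassword` and `isStrongPassword` are names invented for the example, and the separate `re.search` calls above remain easier to read and to report errors from.

```python
import re

# Each (?=...) lookahead checks one rule without consuming characters;
# .{8,} then enforces the minimum length of 8
strongPassword = re.compile(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9]).{8,}$')

def isStrongPassword(password):
    return strongPassword.search(password) is not None

print(isStrongPassword('Raja1235'))   # has lowercase, uppercase and digits
print(isStrongPassword('raja5231'))   # no uppercase letter
print(isStrongPassword('Raj1'))       # too short
```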


**Word counter**




In [None]:
import re
def WordCounter(sentence):
  pattern = re.compile(r'\S+')        # \S+ = a run of non-whitespace characters, i.e. one word
  words = pattern.findall(sentence)
  return len(words)

paragraph = "My name is Karthikraja, i am doing Data scientist Job"
result = WordCounter(paragraph)
print("The paragraph contains", result, "words")

The paragraph contains 10 words


**Sentence Counter**

In [1]:
import re
def SentenceCounter(paragraph):
  pattern = re.compile(r'\.')               # \. matches a literal full stop
  count = pattern.findall(paragraph)        # one entry per full stop
  return len(count)

paragraph = " I like to build AI models because I love data science. I also like civil engineering."
result = SentenceCounter(paragraph)
print("The total number of sentences is", result)

The total number of sentences is 2
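
Counting full stops misses sentences that end with `?` or `!` and over-counts an ellipsis. A slightly more tolerant sketch treats any run of `.`, `!` or `?` as one sentence ending. The function name `countSentences` and the sample text are assumptions, and abbreviations such as "e.g." would still be miscounted.

```python
import re

def countSentences(paragraph):
    # One or more of . ! ? in a row count as a single sentence ending
    return len(re.findall(r'[.!?]+', paragraph))

mixed = countSentences("I like AI models. Do you? Yes... I do!")
plain = countSentences("I love data science. I also like civil engineering.")
print(mixed)
print(plain)
```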


## Function sub
* re.sub(pattern, repl, string, count=0, flags=0): This function replaces occurrences of a pattern in a string with a replacement string (repl). The count parameter specifies the maximum number of substitutions to perform (default is 0, meaning all occurrences will be replaced).

In [2]:
#By using sub function
import re
result = re.sub('[a-z]','$',"1a2b3c4d5e")
print(result)

1$2$3$4$5$


In [3]:
# using the count argument
import re
result = re.sub('[a-z]','$',"1a2b3c4d5e",count=2)
print(result)

1$2$3c4d5e


In [6]:
# using a text variable
import re
text = '1ka2r3th4thi5k6ra7ja'
result = re.sub('[a-z]','$',text)
print(result)

1$$2$3$$4$$$5$6$$7$$


In [9]:
# using a text variable and a different replacement character
import re
text = '1ka2r3th4thi5k6ra7ja'
result = re.sub('[a-z]','@',text)
print(result)

1@@2@3@@4@@@5@6@@7@@
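
The replacement does not have to be a fixed string. As a sketch of two other options `re.sub` supports, the example below uses group references (`\1`, `\2`) in the replacement and then passes a function as `repl`. The helper name `double` is made up for illustration.

```python
import re

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r'(\w+)-(\d+)', r'\2-\1', 'Bangalore-560076')
print(swapped)

# repl can also be a function that receives the match object
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d', double, '1a2b3c4d5e')
print(doubled)
```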


## Using subn function


In [10]:
#By using 'subn' function
import re
result = re.subn('[a-z]','$',"1a2b3c4d5e")       # subn returns a tuple: (new string, number of replacements made)
print(result)

('1$2$3$4$5$', 5)


In [14]:
print(result[0])
print(result[1])

1$2$3$4$5$
5


## split function

*  re.split(pattern, string, maxsplit=0, flags=0): This function splits a string into substrings using a regular expression pattern as the delimiter. The maxsplit parameter specifies the maximum number of splits to perform.

In [21]:
# example program
import re
text = 'Bangalore-560076'
result= re.split('-',text)
print(result)
for i in result:
  print(i)


['Bangalore', '560076']
Bangalore
560076
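
Where `re.split` differs from `str.split` is that the delimiter is a pattern. The following sketch splits on several delimiters at once, limits the number of splits with `maxsplit`, and keeps the delimiter by wrapping it in a capturing group. The extended sample string is an assumption for the example.

```python
import re

text = 'Bangalore-560076, Karnataka; India'

# Split on a hyphen, comma or semicolon, plus any whitespace that follows
allParts = re.split(r'[-,;]\s*', text)
print(allParts)

# maxsplit=1 stops after the first delimiter
firstSplit = re.split(r'[-,;]\s*', text, maxsplit=1)
print(firstSplit)

# A capturing group keeps the delimiter in the result
withDelimiter = re.split(r'(-)', 'Bangalore-560076')
print(withDelimiter)
```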


## Find strings at specific positions
*  $  -  matches at the end of the string

*  ^  -  matches at the beginning of the string

In [28]:
import re
text = 'today is the good friday'
result = re.search('day',text)    # returns a match object for the first occurrence, or None if the word is absent
print(result)

if result is None:
    print('The word is not present')
else:
    print('The word is present')

<re.Match object; span=(2, 5), match='day'>
The word is present


In [29]:
# to check whether the given word is at the end of the paragraph
import re
text = 'today is the good friday'
result = re.search('day$',text)

if result is None:
    print('The word is not present at the end of the para')
else:
    print('The word is present at the end of the para')

The word is present at the end of the para


In [60]:
# To check the starting word of the paragraph
import re
text = 'today is the day good fri'
result = re.search('^today',text)

if result is None:
    print('The word is NOT present at the starting of the para')
else:
    print('The word is present at the starting of the para')

The word is present at the starting of the para


In [52]:
# Find the first word boundary (\b) and capture one or more word characters (\w+)

import re

text = "This is a string with a starting word."

match = re.search(r"\b\w+", text)

if match:
  starting_word = match.group()
  print("Starting word:", starting_word)
else:
  print("No starting word found")


Starting word: This


In [55]:
# Match exact word at the beginning with ^

import re
text = "This is a string with a starting word."
specific_word = "This"  # Replace with your desired word

# re.escape treats the word literally; \b stops "This" from matching "Thistle"
match = re.search(rf"^{re.escape(specific_word)}\b", text)

if match:
  print(f"The specific word '{specific_word}' is present at the starting point.")
else:
  print(f"The specific word '{specific_word}' is not found at the starting point.")


The specific word 'This' is present at the starting point.
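
By default `^` and `$` refer to the start and end of the whole string. For text with several lines, the `re.MULTILINE` flag makes them match at every line. The sketch below compares both behaviours on a made-up three-line string.

```python
import re

text = "today is friday\nmonday was a holiday\ntoday is sunny"

# Without the flag, ^ matches only at the start of the whole string
startDefault = re.findall(r'^today', text)
# With re.MULTILINE, ^ matches at the start of every line
startMultiline = re.findall(r'^today', text, flags=re.MULTILINE)

# Without the flag, $ matches only at the end of the whole string
endDefault = re.findall(r'day$', text)
# With re.MULTILINE, $ matches at the end of every line
endMultiline = re.findall(r'day$', text, flags=re.MULTILINE)

print(startDefault, startMultiline)
print(endDefault, endMultiline)
```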
