# Regular Expression

A regular expression (regex), sometimes called a rational expression, is a sequence of characters that specifies a match pattern in text. Such patterns are typically used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
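Each of the three uses above maps onto a function in Python's built-in `re` module: `re.search` for "find", `re.sub` for "find and replace", and `re.fullmatch` for validation, where the whole string must fit the pattern. A minimal sketch using made-up sample strings:

```python
import re

text = "Call 0309-9923209 today"

# "find": locate the first match anywhere in the string
found = re.search(r"\d{4}-\d{7}", text)
print(found.group())  # 0309-9923209

# "find and replace": mask every digit
masked = re.sub(r"\d", "#", text)
print(masked)  # Call ####-####### today

# input validation: the entire string must match, not just part of it
print(bool(re.fullmatch(r"\d{11}", "03090023904")))    # True
print(bool(re.fullmatch(r"\d{11}", "0309002390412")))  # False
```

Using `re.fullmatch` instead of `re.search` for validation matters: `re.search(r"\d{11}", "0309002390412")` would succeed because 11 of the 13 digits match.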

In [1]:
import re

### Try to match Phone Number

The chat contains four phone number formats, and we need a single regular expression that matches all of them:
- 03090023904
- 0309-9923209
- (042) 12354900
- (042)12540090

In [2]:
chat = "My phone number::: === 03091135109 or (032) 73345918 or 0308-1292891 or (042)12393055"

In [3]:
# Raw strings (r"...") keep backslashes such as \d intact for the regex engine
pattern = r"\d{11}|\(\d{3}\) \d{8}|\d{4}-\d{7}|\(\d{3}\)\d{8}"
re.findall(pattern, chat)

['03091135109', '(032) 73345918', '0308-1292891', '(042)12393055']

The pattern matches all four phone number formats.
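The four alternatives can be folded into two by making the space after the area code optional (` ?`) and the hyphen after the first four digits optional (`-?`). The sketch below is one possible compact form, assuming only these four formats need to be accepted; the `\b` word boundaries also stop the pattern from pulling 11 digits out of a longer digit run, which the plain `\d{11}` alternative would do.

```python
import re

chat = "My phone number::: === 03091135109 or (032) 73345918 or 0308-1292891 or (042)12393055"

# Alternative 1: "(NNN)" area code, optional space, then 8 digits
# Alternative 2: 4 digits, optional hyphen, 7 digits, with \b on both sides
compact = r"\(\d{3}\) ?\d{8}|\b\d{4}-?\d{7}\b"

print(re.findall(compact, chat))
# ['03091135109', '(032) 73345918', '0308-1292891', '(042)12393055']

# The \b anchors reject an 11-digit slice of a longer number,
# while a bare \d{11} accepts it
print(re.findall(compact, "id 0309113510999"))    # []
print(re.findall(r"\d{11}", "id 0309113510999"))  # ['03091135109']
```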

### Try to match Email

In [4]:
chat = "My email can be :: abc@gmail.com :: abdullahkhan2265917@gmail.io :: hamza_09.khan@gmail.com :: miscrosot@outlook.com"

In [5]:
pattern = r"[0-9A-Za-z._]+@[a-zA-Z0-9]+\.[a-zA-Z]+"
re.findall(pattern, chat)

['abc@gmail.com',
 'abdullahkhan2265917@gmail.io',
 'hamza_09.khan@gmail.com',
 'miscrosot@outlook.com']

The pattern matches all four email addresses.
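This pattern works for the sample chat but has limits: the local part cannot contain `+` or `-`, and the domain allows only one dot, so multi-part domains get cut short. The comparison below uses a made-up sample string and a slightly wider pattern; it is still a practical approximation, not a full implementation of the email address grammar.

```python
import re

text = "Write to first.last+news@mail.example.co.uk or ops-team@my-site.org."

original = r"[0-9A-Za-z._]+@[a-zA-Z0-9]+\.[a-zA-Z]+"
# Local part may contain word chars, ".", "+", "-";
# domain is one or more labels joined by dots
extended = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

print(re.findall(original, text))
# ['news@mail.example']
print(re.findall(extended, text))
# ['first.last+news@mail.example.co.uk', 'ops-team@my-site.org']
```

The `(?:...)` is a non-capturing group: it lets `+` repeat the `.label` part without making `re.findall` return the group instead of the whole match. The trailing sentence period is left out because each `.` must be followed by at least one label character.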

### How a chatbot can extract order numbers and emails from user text

Customers phrase their messages in many different ways; our goal is to capture the order number and the email address from the text.

In [12]:
chat1 = "Hello, my order # 12345 and email: abdullahkhan4465918@outlook.com is here. Resolve the issue in product."
chat2 = "I have a problem with my order number 12900"
chat3 = "My order 34511 is showing some issues, i pay 300$ for it. Here is my microsoft email abcdef@microsoft.io"

In [13]:
pattern_order = r"order\D*(\d+)"  # "order", any non-digits, then capture the digits
pattern_email = r"[0-9A-Za-z._]+@[a-zA-Z0-9]+\.[a-zA-Z]+"

In [15]:
for chat in [chat1, chat2, chat3]:
    match = re.findall(pattern_order, chat)
    match2 = re.findall(pattern_email, chat)
    print(f"{match} == {match2}")
    print("")

['12345'] == ['abdullahkhan4465918@outlook.com']

['12900'] == []

['34511'] == ['abcdef@microsoft.io']
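A chatbot usually needs these values as structured fields rather than two printed lists. The sketch below wraps both patterns in a hypothetical helper, `extract`, that returns a dictionary with `None` for anything missing. It assumes only the first order number and first email in a message matter, and adds `re.IGNORECASE` so that "Order" at the start of a sentence is also found; the sample messages are made up.

```python
import re

# Hypothetical helper for illustration, not part of any library
ORDER = re.compile(r"order\D*(?P<order>\d+)", re.IGNORECASE)
EMAIL = re.compile(r"[0-9A-Za-z._]+@[a-zA-Z0-9]+\.[a-zA-Z]+")

def extract(message):
    """Return the first order number and email in message, or None for each."""
    order = ORDER.search(message)
    email = EMAIL.search(message)
    return {
        "order": order.group("order") if order else None,
        "email": email.group() if email else None,
    }

for msg in ["Order #777, contact me at a.b@mail.com", "Where is my parcel?"]:
    print(extract(msg))
# {'order': '777', 'email': 'a.b@mail.com'}
# {'order': None, 'email': None}
```

Compiling the patterns once with `re.compile` avoids re-parsing them for every message, and the named group `(?P<order>...)` makes the intent of the capture explicit.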



# Practice Questions (generated with ChatGPT)

**Question 01**: Text: "Date: 2022-02-25"

**What to match**: The date in the format YYYY-MM-DD.

In [16]:
chat = "Date can be ::: 2022-02-25 ::: 1989-5-9"
pattern = r"\d{4}-\d{1,2}-\d{1,2}"  # {1,2} also accepts single-digit months/days such as 1989-5-9
re.findall(pattern, chat)

['2022-02-25', '1989-5-9']
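Strictly speaking, `1989-5-9` is not in YYYY-MM-DD form, and a regex alone also accepts impossible dates such as February 30. One possible approach, sketched here with an extended sample string, is to require exactly two digits for month and day and then let `datetime.date.fromisoformat` reject dates that do not exist.

```python
import re
from datetime import date

chat = "Date can be ::: 2022-02-25 ::: 1989-5-9 ::: 2022-02-30"

# Exactly two digits for month and day, as YYYY-MM-DD requires
strict = r"\b\d{4}-\d{2}-\d{2}\b"
candidates = re.findall(strict, chat)
print(candidates)  # ['2022-02-25', '2022-02-30']

def is_real_date(s):
    """True if s parses as an actual calendar date."""
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False

valid = [d for d in candidates if is_real_date(d)]
print(valid)  # ['2022-02-25']
```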

**Question 02**: Text: "The quick brown fox jumps over the lazy dog."

**What to match**: Any word starting with the letter "q".

In [17]:
chat = "Words ::: quiet, cute ::: hello,quarter ::: make the food quickly"
pattern = r"\bq[a-zA-Z0-9]+"  # \b: the "q" must be at the start of a word
re.findall(pattern, chat)

['quiet', 'quarter', 'quickly']
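Without a `\b` word boundary, `q[a-zA-Z0-9]+` also matches a "q" in the middle of a word. A small comparison on an assumed sample string, including `re.IGNORECASE` for words that start with a capital "Q":

```python
import re

text = "quiet technique, Quarter antique"

print(re.findall(r"q[a-zA-Z0-9]+", text))                   # ['quiet', 'que', 'que']
print(re.findall(r"\bq[a-zA-Z0-9]+", text))                 # ['quiet']
print(re.findall(r"\bq[a-zA-Z0-9]+", text, re.IGNORECASE))  # ['quiet', 'Quarter']
```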

**Question 03**: Text: "Product code: ABC-123-XYZ"

**What to match**: The middle part of the product code (123).

In [18]:
chat = "resolve my issue. New Product code is this fEQ-556-Edd. Previous was eef-120-eif"
pattern = r"[A-Za-z]{3}-(\d{3})-[A-Za-z]{3}"  # capture only the 3-digit middle part
re.findall(pattern, chat)

['556', '120']
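Because the pattern has one capturing group, `re.findall` returns only the middle part. To get every part of each code, one option is named groups with `re.finditer`, sketched below; the group names `prefix`, `number`, and `suffix` are illustrative choices, not part of the question.

```python
import re

chat = "resolve my issue. New Product code is this fEQ-556-Edd. Previous was eef-120-eif"

# Named groups label each part of the code
code = re.compile(r"(?P<prefix>[A-Za-z]{3})-(?P<number>\d{3})-(?P<suffix>[A-Za-z]{3})")

parts = [m.groupdict() for m in code.finditer(chat)]
for p in parts:
    print(p)
# {'prefix': 'fEQ', 'number': '556', 'suffix': 'Edd'}
# {'prefix': 'eef', 'number': '120', 'suffix': 'eif'}
```

With several groups, `re.findall` would instead return a list of tuples such as `('fEQ', '556', 'Edd')`; `groupdict()` keeps the labels attached to the values.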

**Question 04**: Text: "Lorem ipsum dolor sit amet, consectetur adipiscing elit."

**What to match**: Any word ending with "ing".


In [19]:
chat = "I am waiting for someone. I am sure he is swimming right now."
pattern = r"\b[a-zA-Z]+ing\b"  # a whole word that ends in "ing"
re.findall(pattern, chat)

['waiting', 'swimming']
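The pattern `[a-zA-Z0-9]*ing` has no boundary after "ing", so it also matches the start of words that merely contain "ing". A comparison on an assumed sample sentence:

```python
import re

text = "He is singing in the kingdom"

print(re.findall(r"[a-zA-Z0-9]*ing", text))   # ['singing', 'king']
print(re.findall(r"\b[a-zA-Z]+ing\b", text))  # ['singing']
```

The trailing `\b` requires the word to end right after "ing", which rules out the "king" inside "kingdom".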

**Question 05**: Text: `"<div class='content'>Some text here</div>, <p>More text</p>"`

**What to match**: The content inside the HTML tags.

In [20]:

chat = "<div class='content'>Some text here</div>, <p>More text</p>"
pattern = "<[a-zA-Z0-9 ;:='\"-]*>([a-zA-Z0-9 ]*)<"
re.findall(pattern, chat)

['Some text here', 'More text']
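This regex only works while the text between tags contains letters, digits, and spaces: punctuation or a nested tag breaks it. For real HTML, a parser is more robust. The sketch below uses the standard library's `html.parser.HTMLParser`; the class name `TextInsideTags` and the sample string are illustrative, and the depth counter assumes well-formed input without void elements such as `<br>`, which open a tag that is never closed.

```python
import re
from html.parser import HTMLParser

class TextInsideTags(HTMLParser):
    """Collect non-empty text that appears inside at least one element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many elements are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        # Skip text between top-level elements, such as the ", " separator
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())

snippet = "<div class='content'>Some text, here!</div>, <p>More text</p>"

regex = r"<[a-zA-Z0-9 ;:='\"-]*>([a-zA-Z0-9 ]*)<"
print(re.findall(regex, snippet))  # ['More text']

parser = TextInsideTags()
parser.feed(snippet)
parser.close()
print(parser.chunks)  # ['Some text, here!', 'More text']
```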
