# 1. Groupings

We use groupings to create subpatterns within a larger pattern so that we can match, capture, and backreference text. There are two types of groupings
in regex: capturing groups, which save the matched subpattern so it can be referenced later in the pattern or in a replacement string, and non-capturing
groups, which group a subpattern without storing it for later use. We write capturing groups with parentheses (...), and non-capturing groups
with (?:...).
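
To make the difference concrete, here is a minimal sketch using only Python's built-in `re` module. The `sample` string and the date pattern are hypothetical illustrations, not part of the dataset used below.

```python
import re

# Hypothetical sample string, not taken from the dataset.
sample = "2023-08-08 and 2024-01-15"

# Capturing groups: each (...) stores the text it matched.
capturing = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
first = capturing.search(sample)
whole_match = first.group(0)   # the full match
captured = first.groups()      # one entry per capturing group

# Captured groups can be referenced in a replacement string as \1, \2, ...
reordered = capturing.sub(r"\3/\2/\1", sample)

# A backreference inside the pattern: \1 must repeat what group 1 matched.
doubled = re.search(r"\b(\w+) \1\b", "it was was a typo")

# Non-capturing group: (?:...) groups the year but stores nothing,
# so only the month appears in groups().
non_capturing = re.compile(r"(?:\d{4})-(\d{2})")
month_only = non_capturing.search(sample).groups()

print(whole_match, captured, reordered, doubled.group(1), month_only)
```

The non-capturing version is useful when parentheses are needed only for grouping or alternation and the extra entries in `groups()` would be noise.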

In [6]:
# Read the necessary dataset
import pandas as pd


df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/reviews_uk.csv")
df.head()

Unnamed: 0,review_id,text
0,txt1,"I recently visited London, and the British Mus..."
1,txt2,"While exploring Edinburgh, I had the chance to..."
2,txt3,"During my stay in Oxford, I attended lectures ..."
3,txt4,I watched a play at Shakespeare's Globe Theatr...
4,txt5,"My favorite British author is Charles Dickens,..."


In [7]:

text = "\n ".join(df['text'])
print(text)

I recently visited London, and the British Museum and Buckingham Palace were among the highlights of my trip.
 While exploring Edinburgh, I had the chance to try traditional haggis at a local restaurant and visit the historic Edinburgh Castle.
 During my stay in Oxford, I attended lectures at the prestigious University of Oxford and explored the charming Bodleian Library.
 I watched a play at Shakespeare's Globe Theatre in London, and the performance was outstanding.
 My favorite British author is Charles Dickens, and his novel 'Great Expectations' is a literary masterpiece.
 I traveled to Manchester, and the ancient stone circle left me in awe of its mysterious history.
 During my trip to Scotland, I explored the scenic Highlands and Loch Ness, hoping to catch a glimpse of the elusive Loch Ness Monster.
 I enjoyed a traditional afternoon tea in Windsor, where I had scones with clotted cream and visited Windsor Castle.
 I had a delightful fish and chips meal in a quaint English village

In [9]:
r"""
We define the author_regex regular expression pattern to match the phrase "author is" followed by a word (\w+). We use the re.compile() function to
compile the pattern into a regular expression object for later use in matching operations. We then use author_regex.search(text) to search
for the first match in text and store the result in author_match.

We define and compile another regular expression pattern called location_regex to match the phrase "Shakespeare's Globe Theatre in" followed by a
word (\w+), and use location_regex.search(text) to search for a match in text and store the result in location_match.

We define and compile a regular expression pattern travel_regex to match the word "visited", the word "visit", or the phrase "traveled to", and
use travel_regex.findall(text) to find all non-overlapping matches in text and store the results in travel_matches.

"""
import re

author_regex = re.compile(r"author is (\w+)")
author_match = author_regex.search(text)
location_regex = re.compile(r"Shakespeare's Globe Theatre in (\w+)")
location_match = location_regex.search(text)
travel_regex = re.compile(r"(?:visited|visit|traveled to)")
travel_matches = travel_regex.findall(text)

In [10]:
print("Capturing group result 1:", author_match)
print(f"author is {author_match.group(1)}\n")
print("Capturing group result 2:", location_match)
print(f"Shakespeare's Globe Theatre in {location_match.group(1)}\n")
print("Non-capturing group result:", travel_matches)
print(f"Different synonym of travel: {', '.join(travel_matches)}") 

Capturing group result 1: <re.Match object; span=(493, 510), match='author is Charles'>
author is Charles

Capturing group result 2: <re.Match object; span=(396, 433), match="Shakespeare's Globe Theatre in London">
Shakespeare's Globe Theatre in London

Non-capturing group result: ['visited', 'visit', 'traveled to', 'visited']
Different synonym of travel: visited, visit, traveled to, visited
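
The choice between capturing and non-capturing groups also changes what `findall()` returns. The sketch below illustrates this on a shortened, hypothetical stand-in for the review text (the `snippet` string is an assumption, not a row from the dataset).

```python
import re

# Shortened, hypothetical stand-in for the review text.
snippet = "I visited London, hope to visit Bath, and traveled to Manchester."

# No capturing group: findall returns the full text of each match.
full = re.findall(r"(?:visited|visit|traveled to)", snippet)

# One capturing group: findall returns only what that group captured.
places = re.findall(r"(?:visited|visit|traveled to) (\w+)", snippet)

# Two capturing groups: findall returns one tuple per match.
pairs = re.findall(r"(visited|visit|traveled to) (\w+)", snippet)

print(full)
print(places)
print(pairs)
```

Alternatives are tried from left to right, which is why `visited` is listed before `visit`: with the order reversed, `visit` would match first inside "visited" and the longer word would never be returned.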


# 2. Lookarounds

We use lookarounds (also known as zero-width assertions) to assert that a pattern is preceded or followed by another pattern without including the
preceding or following pattern in the match. There are two types, lookaheads and lookbehinds, and each can be either positive or negative.

Positive lookahead (?=...): This lookaround asserts that the succeeding characters must match the pattern inside the lookahead but doesn’t include them
in the match result. For example, foo(?=bar) would match foo only if it’s followed by bar, but bar would not be included in the match result.

Negative lookahead (?!...): This lookaround asserts that the succeeding characters must not match the pattern inside the lookahead. We use it to 
specify a negative condition that must not be met. For example, foo(?!bar) would match foo only if it’s not followed by bar.

Positive lookbehind (?<=...): This lookaround asserts that the preceding characters must match the pattern inside the lookbehind but doesn’t include
them in the match result. For example, (?<=foo)bar would match bar only if it’s preceded by foo, but foo would not be included in the match result.

Negative lookbehind (?<!...): This lookaround asserts that the preceding characters must not match the pattern inside the lookbehind. We use it to
specify a negative condition that must not be met. For example, (?<!foo)bar would match bar only if it’s not preceded by foo.
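
The four foo/bar examples above can be checked with a small sketch. The `sample` string and the `spans` helper are hypothetical names introduced only for this illustration; the match positions show that the asserted text is never part of the match.

```python
import re

# Hypothetical sample string containing "foobar", "foobaz", and "bazbar".
sample = "foobar foobaz bazbar"


def spans(pattern, text):
    """Return the (start, end) position of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


positive_lookahead = spans(r"foo(?=bar)", sample)    # "foo" in "foobar"
negative_lookahead = spans(r"foo(?!bar)", sample)    # "foo" in "foobaz"
positive_lookbehind = spans(r"(?<=foo)bar", sample)  # "bar" in "foobar"
negative_lookbehind = spans(r"(?<!foo)bar", sample)  # "bar" in "bazbar"

print(positive_lookahead)
print(negative_lookahead)
print(positive_lookbehind)
print(negative_lookbehind)
```

Each span is exactly three characters long: the lookaround constrains where the match may occur but contributes no characters to it.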



In [11]:
df = pd.read_csv("C:/Users/ariji/OneDrive/Desktop/Data/labeled_feedback.csv")
df.head()

Unnamed: 0,timestamp,username,feedback,label
0,08-08-2023 10:00,@TechEnthusiast,The new telecom product offers amazing connect...,Positive
1,08-08-2023 10:15,@GadgetGuru,The new telecom product is a game-changer! It'...,Positive
2,08-08-2023 10:30,@FrequentCaller,I've noticed a significant improvement in call...,Positive
3,08-08-2023 10:45,@BusinessOwner,The new product has enhanced our business oper...,Positive
4,08-08-2023 11:00,@DigitalNomad,"As a digital nomad, I rely on consistent inter...",Positive


In [24]:

r"""
Each pattern below consists of a lookaround only, so it matches a zero-width position rather than any text. str.contains() returns True for every
row of df['feedback'] that has at least one position satisfying the assertion.

We use a positive lookahead to find rows with a position that is followed by the \samazing pattern and store the result in match1.

We use a positive lookahead to find rows with a position that is followed by the lag-free. pattern and store the result in match2.

We use a negative lookahead to find rows with a position that is not followed by the problems pattern and store the result in match3.

We use a negative lookahead to find rows with a position that is not followed by the waste of pattern and store the result in match4.

We use a positive lookbehind to find rows with a position that is preceded by the satisfied pattern (followed by whitespace) and store the result
in match5.

We use a positive lookbehind to find rows with a position that is preceded by the great leader. pattern ("great leader" followed by a period) and
store the result in match6.

We use a negative lookbehind to find rows with a position that is not preceded by the not satisfied pattern and store the result in match7.

We use a negative lookbehind to find rows with a position that is not preceded by the highly recommend it. pattern and store the result in match8.

Note that an unanchored negative lookaround is satisfied at almost every position, so match3, match4, match7, and match8 are True for practically
every row, including rows that do contain the excluded phrase.

"""

match1 = df['feedback'].str.contains(r"(?=\samazing)", regex=True)
match2 = df['feedback'].str.contains(r"(?=lag-free\.)", regex=True)
match3 = df['feedback'].str.contains(r"(?!problems)", regex=True)
match4 = df['feedback'].str.contains(r"(?!waste\sof)", regex=True)
match5 = df['feedback'].str.contains(r"(?<=satisfied\s)", regex=True)
match6 = df['feedback'].str.contains(r"(?<=great\sleader\.)", regex=True)
match7 = df['feedback'].str.contains(r"(?<!not\ssatisfied\s)", regex=True)
match8 = df['feedback'].str.contains(r"(?<!highly\srecommend\sit\.)", regex=True)
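
Because a bare negative lookahead succeeds at almost any position, it does not exclude rows that contain the phrase. One common way to get a true "does not contain" filter is to anchor the lookahead at the start of the string and let it scan ahead with `.*`. The sketch below shows the difference using plain `re.search`, which is what `str.contains(..., regex=True)` applies to each row; the `rows` list is hypothetical and not taken from labeled_feedback.csv.

```python
import re

# Hypothetical feedback strings, not rows from labeled_feedback.csv.
rows = [
    "Great speeds and no dropouts.",
    "I had problems with the connection.",
]

# Bare negative lookahead: zero-width, satisfied at the very first position
# that is not the start of "problems", so every row matches.
bare = [bool(re.search(r"(?!problems)", row)) for row in rows]

# Anchored version: from the start of the string, assert that "problems"
# does not appear anywhere later on the line.
anchored = [bool(re.search(r"^(?!.*problems)", row)) for row in rows]

# A bare positive lookahead behaves like a plain substring search.
positive = [bool(re.search(r"(?=problems)", row)) for row in rows]

print(bare)
print(anchored)
print(positive)
```

The same anchored pattern can be passed to `df['feedback'].str.contains(...)` in place of the bare one. Since `.` does not match newlines by default, multi-line feedback would need the `re.DOTALL` flag for this approach to cover the whole string.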

In [25]:

print("Positive lookahead 1:", df[match1]['feedback'].tolist())
print("\nPositive lookahead 2:", df[match2]['feedback'].tolist())
print("\nNegative lookahead 1:", df[match3]['feedback'].tolist())
print("\nNegative lookahead 2:", df[match4]['feedback'].tolist())
print("\nPositive lookbehind 1:", df[match5]['feedback'].tolist())
print("\nPositive lookbehind 2:", df[match6]['feedback'].tolist())
print("\nNegative lookbehind 1:", df[match7]['feedback'].tolist())
print("\nNegative lookbehind 2:", df[match8]['feedback'].tolist())

Positive lookahead 1: ["The new telecom product offers amazing connectivity and lightning-fast speeds. I'm thoroughly impressed!"]

Positive lookahead 2: ["The new telecom product is a game-changer! It's made my online gaming experience so much smoother and lag-free."]

Negative lookahead 1: ["The new telecom product offers amazing connectivity and lightning-fast speeds. I'm thoroughly impressed!", "The new telecom product is a game-changer! It's made my online gaming experience so much smoother and lag-free.", "I've noticed a significant improvement in call quality and signal strength with the new telecom product. Great job!", 'The new product has enhanced our business operations by providing reliable internet for all our devices. A must-have for any office.', 'As a digital nomad, I rely on consistent internet wherever I go. The new telecom product has kept me connected no matter where I am!', 'While the new product offers good speeds, I experienced occasional dropouts in my connectio