# $$ Step\ 4\ : Regular\ expressions\ $$

_____________

**Regular Expressions (regex)** are a powerful tool used for searching and manipulating strings that match a specified pattern. They allow you to define a pattern that can match a wide range of strings, making them incredibly useful for filtering and processing text.

### Why Use Regular Expressions?

Regular expressions are particularly useful when you want to search for patterns in text rather than specific words or phrases. They are commonly used for:
- **Text validation** (e.g., validating email addresses or phone numbers).
- **String searching** (finding certain patterns in large amounts of text).
- **String manipulation** (replacing, splitting, or extracting parts of text).

### Regex Syntax

The exact syntax of regular expressions varies slightly between languages and regex engines, but the following elements are common to most of them, including Python's `re` module:
- **`.`**: Matches any single character except a newline.
- **`^`**: Matches the start of the string (or the start of each line when the `re.MULTILINE` flag is set).
- **`$`**: Matches the end of the string, or just before a trailing newline.
- **`[]`**: Matches any one character listed inside the brackets, e.g. `[abc]` or a range such as `[a-z]`.
- **`*`**: Matches zero or more occurrences of the preceding element (a character, a bracketed set, or a group).
- **`+`**: Matches one or more occurrences of the preceding element.
- **`?`**: Matches zero or one occurrence of the preceding element.
- **`|`**: Logical OR, matches the pattern before or after the `|`.
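
The elements listed above can be tried out one at a time. The following is a minimal sketch; the sample strings and the `examples` dictionary are made up for illustration and are not part of the original notebook.

```python
import re

# Hypothetical sample strings, chosen only to exercise each element.
examples = {
    "dot":      re.search(r"c.t", "cut"),         # '.' stands in for the 'u'
    "caret":    re.search(r"^The", "The end"),    # matches only at the start
    "dollar":   re.search(r"end$", "The end"),    # matches only at the end
    "brackets": re.search(r"[aeiou]", "sky"),     # no vowel in 'sky', so None
    "star":     re.search(r"ab*", "abbbc"),       # 'a' followed by zero or more 'b'
    "plus":     re.search(r"ab+", "ac"),          # needs at least one 'b', so None
    "question": re.search(r"colou?r", "color"),   # the 'u' is optional
    "pipe":     re.search(r"cat|dog", "hot dog"), # either alternative
}

for name, match in examples.items():
    # A failed search returns None, so guard before calling .group()
    print(name, "->", match.group() if match else None)
```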

___________________

In [2]:
import re 

## Raw Strings :

In Python string literals, a backslash followed by certain characters forms an escape sequence with a special meaning. For instance, `\n` represents a new line. However, there are times when we want the `\n` in our strings to be treated as a literal backslash followed by the letter 'n', instead of being interpreted as a new line.

To achieve this, we can prefix the string with the letter `r`, which tells Python to treat the string as a **raw string**. Because regex patterns use backslashes heavily (for example `\b` and `\w`), it is good practice to write them as raw strings.

In [3]:
# print text without using raw string indicator
my_folder = "C:\desktop\download\notes"
print(my_folder)

C:\desktop\download
otes


In [5]:
# include raw string indicator
my_folder = r"C:\desktop\download\notes"
print(my_folder)

C:\desktop\download\notes
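
The same issue affects regex patterns, which is the main reason raw strings appear throughout this section. As a small sketch (the sample text and variable names are illustrative assumptions): in a normal string literal `"\b"` is a single backspace character, whereas in a raw string it stays as the two characters that the regex engine reads as a word boundary.

```python
import re

text = "a cat sat"  # hypothetical sample text

# Without the r prefix, Python converts "\b" to a backspace character
# before the regex engine ever sees the pattern.
plain_pattern = "\bcat\b"
raw_pattern = r"\bcat\b"

print(len("\b"), len(r"\b"))           # one character versus two
print(re.search(plain_pattern, text))  # looks for backspace + 'cat' + backspace
print(re.search(raw_pattern, text))    # looks for 'cat' as a whole word
```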


## re.search :

The `re.search` function scans a string for the first location where a pattern matches. It follows the syntax `re.search("pattern", "string")`. If the pattern is found, it returns a `Match` object; otherwise, it returns `None`.

In [6]:
result_search = re.search("food", r"That restaurant has the best food")
print(result_search)

<re.Match object; span=(29, 33), match='food'>


In [7]:
print(result_search.group())  # returns just the matched text

food


In [9]:
# Sample review text
review = "I really enjoyed the course on Python programming. The concepts were well-explained, and the exercises were great!"

# Regex pattern for a product name that does not appear in the review
pattern = r"\bJavaScript course\b"  # \b marks a word boundary on each side

# Search for the pattern in the review
match = re.search(pattern, review)

print(match)

None
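
Because a failed search returns `None`, calling `.group()` on the result directly would raise an `AttributeError`. One way to guard against that is sketched below; `first_match_text` is a hypothetical helper name introduced only for this illustration.

```python
import re

review = "I really enjoyed the course on Python programming."

def first_match_text(pattern, text):
    """Return the matched text, or None when the pattern is absent."""
    match = re.search(pattern, text)
    # Only call .group() when a Match object was actually returned
    return match.group() if match else None

print(first_match_text(r"\bPython\b", review))
print(first_match_text(r"\bJavaScript course\b", review))
```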


## re.sub :

The `re.sub` function finds every occurrence of a pattern in a string and replaces it with another string.

It follows the syntax:
`re.sub("pattern to find", "replacement text", "string")`


In [10]:
string = r"The waitress Sarah was really rude to me "

new_string = re.sub(r"Sarah", r"Taylor", string)  # replace the name 'Sarah' with 'Taylor'

print(new_string)

The waitress Taylor was really rude to me 


In [11]:

review = r"Overall, I really loved the product. The quality of the items was excellent. The packaging was poor, and it made the unboxing experience frustrating. However, the delivery was fast, and the customer service was helpful. I'm satisfied with the purchase, but the packaging could be improved."

# Replace 'poor' with 'acceptable' and 'fast' with 'quick'
new_review = re.sub(r"poor", r"acceptable", review)
new_review = re.sub(r"fast", r"quick", new_review)

# Print the updated review
print(new_review)


Overall, I really loved the product. The quality of the items was excellent. The packaging was acceptable, and it made the unboxing experience frustrating. However, the delivery was quick, and the customer service was helpful. I'm satisfied with the purchase, but the packaging could be improved.
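
When several words need replacing, the chained `re.sub` calls can be folded into a single pass by giving `re.sub` a function as the replacement. This is a sketch under assumptions: the shortened review text and the `replacements` dictionary are illustrative, and the `\b` word boundaries are an addition that keeps words such as 'breakfast' from being altered.

```python
import re

review = "The packaging was poor. However, the delivery was fast."

# Hypothetical mapping of words to their replacements
replacements = {"poor": "acceptable", "fast": "quick"}

# Build one pattern like \b(poor|fast)\b from the dictionary keys
pattern = r"\b(" + "|".join(map(re.escape, replacements)) + r")\b"

# The function receives each Match object and returns the replacement text
new_review = re.sub(pattern, lambda m: replacements[m.group()], review)
print(new_review)
```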


## Applying Regex Syntax :

The real power of regex comes from combining its syntax elements to build more complex searches and replacements.

In [12]:
# Sample customer reviews
customer_reviews = [
    'john was extremely helpful and kind during my visit',
    'i had a great experience with anna, she helped me find exactly what i needed!',
    'the staff was really rude, and i would not recommend it to anyone',
    'michael helped me understand the products better, great service',
    'julia was very attentive, she guided me through every step',
    'david was able to find everything i was looking for, excellent service'
]

#### Find reviews containing a word that starts with 'j' (here, the names john and julia)

In [13]:
j_reviews = []
pattern_to_find = r"\b[jJ]\w+"  # \b is a word boundary, [jJ] is the first letter, \w+ is the rest of the word
for string in customer_reviews:
    if re.search(pattern_to_find, string):
        j_reviews.append(string)

print("Reviews with names starting with 'J':")
print(j_reviews)

Reviews with names starting with 'J':
['john was extremely helpful and kind during my visit', 'julia was very attentive, she guided me through every step']
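
The same filter can be written more compactly with a compiled pattern and a list comprehension. The sketch below uses a shortened copy of the review list and the `re.IGNORECASE` flag instead of `[jJ]`; the name `j_word` is an assumption for illustration. It also shows a limitation: the pattern matches any word beginning with 'j', not only names.

```python
import re

customer_reviews = [
    'john was extremely helpful and kind during my visit',
    'the staff was really rude, and i would not recommend it to anyone',
    'julia was very attentive, she guided me through every step',
]

# Compile once and reuse; IGNORECASE covers both 'j' and 'J'
j_word = re.compile(r"\bj\w+", re.IGNORECASE)

j_reviews = [review for review in customer_reviews if j_word.search(review)]
print(j_reviews)

# Limitation: an ordinary word such as 'just' also matches
print(j_word.search("it was just okay").group())
```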


#### Find reviews that mention the word 'service' :

In [14]:
service_reviews = []
pattern_to_find = r"service"  # Matches 'service' anywhere in the review, including inside longer words
for string in customer_reviews:
    if re.search(pattern_to_find, string):
        service_reviews.append(string)

print("\nReviews mentioning 'service':")
print(service_reviews)


Reviews mentioning 'service':
['michael helped me understand the products better, great service', 'david was able to find everything i was looking for, excellent service']


#### Find reviews that contain an exclamation mark :

In [15]:
exclamation_reviews = []
pattern_to_find = r"!"  # Searching for exclamation marks in reviews
for string in customer_reviews:
    if re.search(pattern_to_find, string):
        exclamation_reviews.append(string)

print("\nReviews containing an exclamation mark:")
print(exclamation_reviews)


Reviews containing an exclamation mark:
['i had a great experience with anna, she helped me find exactly what i needed!']


#### Replace the standalone word 'help' with 'assist' in the reviews (none of these reviews contain it, so the output is unchanged)

In [16]:
assist_reviews = []
pattern_to_find = r"\bhelp\b"  # Word boundaries match 'help' only as a full word, so 'helpful' and 'helped' are left alone
for string in customer_reviews:
    new_string = re.sub(pattern_to_find, 'assist', string)
    assist_reviews.append(new_string)

print("\nReviews with 'help' replaced by 'assist':")
print(assist_reviews)


Reviews with 'help' replaced by 'assist':
['john was extremely helpful and kind during my visit', 'i had a great experience with anna, she helped me find exactly what i needed!', 'the staff was really rude, and i would not recommend it to anyone', 'michael helped me understand the products better, great service', 'julia was very attentive, she guided me through every step', 'david was able to find everything i was looking for, excellent service']
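
The output above is identical to the input because the reviews only contain 'helpful' and 'helped', which `\bhelp\b` deliberately skips. To see which forms of a word are present before deciding what to replace, `re.findall` can list them. This is a sketch on a shortened copy of the review list; the variable names are illustrative.

```python
import re

reviews = [
    'john was extremely helpful and kind during my visit',
    'michael helped me understand the products better, great service',
]

# List every word that begins with 'help' in each review
variants = [re.findall(r"\bhelp\w*", review) for review in reviews]
print(variants)

# Replace one inflected form explicitly, leaving 'helpful' untouched
assisted = [re.sub(r"\bhelped\b", "assisted", review) for review in reviews]
print(assisted)
```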


#### Remove any words shorter than 4 characters (e.g., 'and', 'the', 'i')

In [17]:
filtered_reviews = []
pattern_to_find = r"\b\w{1,3}\b"  # \b for word boundaries, \w matches one word character, {1,3} repeats it 1 to 3 times
for string in customer_reviews:
    no_short_words = re.sub(pattern_to_find, "", string)  # Remove words shorter than 4 characters
    filtered_reviews.append(no_short_words.strip())

print("\nReviews with words shorter than 4 characters removed:")
print(filtered_reviews)


Reviews with words shorter than 4 characters removed:
['john  extremely helpful  kind during  visit', 'great experience with anna,  helped  find exactly what  needed!', 'staff  really rude,   would  recommend   anyone', 'michael helped  understand  products better, great service', 'julia  very attentive,  guided  through every step', 'david  able  find everything   looking , excellent service']
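
Removing words leaves the surrounding spaces behind, which is why the output above contains runs of double and triple spaces. A possible cleanup step, sketched here on a single review with illustrative variable names, collapses whitespace and then removes any space left in front of punctuation.

```python
import re

review = 'david was able to find everything i was looking for, excellent service'

# Step 1: remove words of 1 to 3 characters, as in the cell above
no_short_words = re.sub(r"\b\w{1,3}\b", "", review)

# Step 2: collapse runs of whitespace into a single space
collapsed = re.sub(r"\s+", " ", no_short_words).strip()

# Step 3: drop the space left in front of punctuation marks
tidy = re.sub(r"\s+([,.!?])", r"\1", collapsed)

print(collapsed)
print(tidy)
```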
