## 1. What Is a Regular Expression (Regex)?
A regular expression, often abbreviated as regex, is a sequence of characters that defines a search pattern. It is used to match and manipulate strings based on patterns, allowing you to find, replace, or extract specific parts of text. Regex is supported by most programming languages and many text-processing tools; Python provides it through the built-in `re` module, and Pandas accepts regex patterns in many of its string methods.
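To make the find, replace, and extract operations concrete, here is a minimal sketch using Python's built-in `re` module. The sample sentence and the patterns are made up for illustration.

```python
import re

text = "Order 42 shipped on 2023-05-01; order 7 is pending."

# Find: does the text contain a date in YYYY-MM-DD form?
has_date = re.search(r"\d{4}-\d{2}-\d{2}", text) is not None

# Extract: pull out every number that follows the word "order"
# (re.IGNORECASE lets "Order" and "order" both match; with one
# capture group, findall returns only the captured text)
order_numbers = re.findall(r"order (\d+)", text, flags=re.IGNORECASE)

# Replace: mask every run of digits with "#"
masked = re.sub(r"\d+", "#", text)

print(has_date)       # True
print(order_numbers)  # ['42', '7']
print(masked)         # Order # shipped on #-#-#; order # is pending.
```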
## 2. Getting Started with Pandas and Regex
Before we dive into using regex with Pandas, make sure Python is installed on your system. You can then install Pandas with pip:

```bash
pip install pandas
```

Once Pandas is installed, import it into your Python script or Jupyter Notebook under the conventional alias `pd`:

```python
import pandas as pd
```

Now that we have Pandas ready, let’s start exploring how to use regex with Pandas.

## 3. Using Regex with Pandas: Examples
### Example 1: Extracting Information
Let’s assume you have a dataset containing strings that include email addresses, and you want to extract all the email addresses from the dataset. Regex can be used to identify and extract these patterns. Here’s how you can achieve this using Pandas and regex:

```python
import re
import pandas as pd


# Sample dataset
data = {
    "text": [
        "Contact us at john@example.com for inquiries.",
        "Please email alice@example.com for more information.",
        "Reach out to support@example.com if you need assistance.",
    ]
}

df = pd.DataFrame(data)

# Define the regex pattern for matching email addresses
pattern = r"[\w\.-]+@[\w\.-]+"

# Apply the regex pattern to extract email addresses
df["email_addresses"] = df["text"].apply(lambda x: re.findall(pattern, x))

print(df)
```

```text
                                                text        email_addresses
0      Contact us at john@example.com for inquiries.     [john@example.com]
1  Please email alice@example.com for more inform...    [alice@example.com]
2  Reach out to support@example.com if you need a...  [support@example.com]
```


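In this example, the pattern `r"[\w\.-]+@[\w\.-]+"` matches one or more word characters, dots, or hyphens, followed by `@`, followed by another run of the same characters. This captures the common shape of an email address, but it is a simplification rather than a full validator (it would also match a string such as `a@b`). `re.findall()` is called on each value of the `text` column through `Series.apply()` and returns a list of every match in that string; the lists are stored in a new column called `email_addresses`.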
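Pandas also exposes regex methods directly on string columns through the `.str` accessor, so the `apply`/`lambda` step can usually be replaced by a vectorized call. The sketch below reuses the sample data and pattern from above, plus one made-up row with no address to show what happens when nothing matches. `Series.str.findall` returns one list of matches per row, like `re.findall` inside `apply`, and `Series.str.contains` returns a boolean mask that is handy for filtering rows.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Contact us at john@example.com for inquiries.",
        "Please email alice@example.com for more information.",
        "Reach out to support@example.com if you need assistance.",
        "No email address here.",  # illustrative row with no match
    ]
})
pattern = r"[\w\.-]+@[\w\.-]+"

# Vectorized equivalent of df["text"].apply(lambda x: re.findall(pattern, x))
df["email_addresses"] = df["text"].str.findall(pattern)

# Boolean mask: which rows mention an address at example.com?
mentions_example = df["text"].str.contains(r"@example\.com", regex=True)

print(df["email_addresses"].tolist())
# [['john@example.com'], ['alice@example.com'], ['support@example.com'], []]
print(mentions_example.tolist())
# [True, True, True, False]
```

Rows without a match get an empty list from `str.findall` and `False` from `str.contains`, so no special-casing is needed.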
### Example 2: Data Cleaning and Transformation
Suppose you have a dataset with a column containing messy strings that include various characters and symbols. You want to clean up these strings and extract relevant information using regex. Here’s an example of how you can achieve this using Pandas and regex:

```python
import re
import pandas as pd

# Sample dataset
data = {
    "raw_text": ["Product ID: 123-XYZ", "Product ID: 456-ABC", "Product ID: 789-PQR"]
}

df = pd.DataFrame(data)

# Define the regex pattern for extracting product IDs;
# the parentheses form capture group 1
pattern = r"Product ID: (\d+-\w+)"


def extract_product_id(text):
    """Return the captured product ID, or None if the pattern is not found."""
    match = re.search(pattern, text)  # search once and reuse the result
    return match.group(1) if match else None


# Apply the regex pattern to extract product IDs
df["product_id"] = df["raw_text"].apply(extract_product_id)

print(df)
```

```text
              raw_text product_id
0  Product ID: 123-XYZ    123-XYZ
1  Product ID: 456-ABC    456-ABC
2  Product ID: 789-PQR    789-PQR
```


In this example, the pattern `r"Product ID: (\d+-\w+)"` matches the literal text “Product ID: ” followed by one or more digits, a hyphen, and one or more word characters. The parentheses create a capture group, which lets us extract just the ID rather than the whole match. `re.search()` scans each value of the `raw_text` column for the first match, and `.group(1)` returns the text captured by the first group; rows without a match get `None`.
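The same extraction can be written without `apply` by using `Series.str.extract`, which applies a pattern with capture groups to every row and returns `NaN` where there is no match. In the sketch below, the extra non-matching row and the group names `number` and `code` are illustrative additions, not part of the original example. With named groups, `str.extract` returns a DataFrame whose columns take the group names, a convenient way to split one messy column into several clean ones.

```python
import pandas as pd

df = pd.DataFrame({
    "raw_text": [
        "Product ID: 123-XYZ",
        "Product ID: 456-ABC",
        "Product ID: 789-PQR",
        "No product here",  # illustrative row with no match
    ]
})

# One capture group + expand=False -> a Series; non-matching rows become NaN
df["product_id"] = df["raw_text"].str.extract(r"Product ID: (\d+-\w+)", expand=False)

# Named groups (hypothetical names "number" and "code") -> a DataFrame
# with one column per group, aligned on the original index
parts = df["raw_text"].str.extract(r"Product ID: (?P<number>\d+)-(?P<code>\w+)")
df = df.join(parts)

print(df)
```

Because `str.extract` works on the whole column at once, it avoids calling `re.search` row by row and handles missing matches without an explicit `if`.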

## 4. Conclusion

In this tutorial, we’ve explored how to use Pandas with regular expressions (regex) for text data manipulation tasks. We started by introducing the concept of regex and its importance in text processing. We then showed how to get started with Pandas and regex, and worked through two practical examples: extracting email addresses from free text and extracting product IDs from labeled strings.

Regular expressions offer a powerful way to manipulate and extract information from text data. When combined with Pandas, they become a valuable tool for data preprocessing, cleaning, and analysis. As you continue working with real-world datasets, you’ll likely encounter scenarios where regex can significantly simplify complex text manipulation tasks. By mastering the integration of Pandas with regex, you’ll be better equipped to handle various text-related challenges in your data analysis projects.