**Miguel Ramirez**

**Week 2 Assignment**

**Professor Cohen**

<u>Data Acquisition and Management</u>


# Difficulties of Using Regular Expressions in Python (with Examples and Output)
While regular expressions are a powerful tool for pattern matching in Python, they also present certain challenges:
### Complexity and Readability:
Example 1: Basic Digit Matching (Including Leading Zeros)

In [2]:
import re

pattern = r"\d+"
text = "001423"

match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match found")

Match found: 001423


Explanation: 

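Pattern: r"\d+" matches one or more consecutive digits. Leading zeros are digits too, so they are included in the match.

Text: every character of "001423" is a digit, so re.search returns the whole string.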
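
Because re.search scans for the first place the pattern matches, the same pattern also finds digits embedded in longer text. The following is a minimal sketch of how re.search, re.findall, and re.fullmatch differ for this pattern; the sample strings are made up for illustration.

```python
import re

pattern = r"\d+"

# re.search returns the first run of digits anywhere in the string.
first_run = re.search(pattern, "order 001423 shipped").group()
print(first_run)  # 001423

# re.findall returns every non-overlapping run of digits.
all_runs = re.findall(pattern, "a1b22c333")
print(all_runs)  # ['1', '22', '333']

# re.fullmatch succeeds only if the entire string is digits.
whole = re.fullmatch(pattern, "12a")
print(whole)  # None
```

When the goal is to validate an entire input rather than find digits inside it, re.fullmatch is usually the closer fit.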

____________________________________________________________________________________________________________________________________________

Example 2: Rejecting Leading Zeros in Numbers

In [7]:
import re

pattern = r"^(?!0)\d+$"
text = "00123"

match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match found")

No match found


Explanation: 

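Pattern: ^(?!0)\d+$ uses a negative lookahead, (?!0), to ensure the string of digits does **not** start with 0. The anchors ^ and $ require the digits to span the whole string.

Text: "00123" does not match because it starts with a zero.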
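
Two edge cases are worth checking with this pattern: it also rejects the single digit 0, and $ can match just before a trailing newline. The sketch below illustrates both. The helper name is_plain_integer and the alternative pattern inside it are assumptions added for illustration, not part of the original example.

```python
import re

no_leading_zero = r"^(?!0)\d+$"

# The lookahead rejects any string starting with 0, including "0" itself.
print(bool(re.search(no_leading_zero, "0")))      # False
print(bool(re.search(no_leading_zero, "10")))     # True

# "$" also matches just before a final newline, so this still matches.
print(bool(re.search(no_leading_zero, "123\n")))  # True

def is_plain_integer(text):
    """Hypothetical helper: accept "0" or digits with no leading zero."""
    # re.fullmatch must consume the entire string, newline included.
    return re.fullmatch(r"0|[1-9]\d*", text) is not None

print(is_plain_integer("0"))      # True
print(is_plain_integer("00123"))  # False
print(is_plain_integer("123\n"))  # False
```

Whether 0 should count as a valid number depends on the use case, so the alternative is only one possible choice.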

_________________________________________________________________________________________________________________________________________________________________________________________________________________________

Example 3: Email Validation Using Regular Expressions

We will now use a regular expression to validate email addresses, testing it on one valid and one invalid example.

In [6]:
import re

username_regex = r"^[a-zA-Z0-9._-]+"
domain_regex = r"[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}$"
email_regex = rf"{username_regex}@{domain_regex}"

# Example of valid and invalid email addresses
valid_email = "miguel.ramirez29@spsmail.cuny.edu"
invalid_email = "miguel.ramirez@.edu"

def validate_email(email):
    if re.match(email_regex, email):
        print(f"Valid email address: {email}")
    else:
        print(f"Invalid email address: {email}")

# Test valid email
validate_email(valid_email)

# Test invalid email
validate_email(invalid_email)

Valid email address: miguel.ramirez29@spsmail.cuny.edu
Invalid email address: miguel.ramirez@.edu


Explanation:

Username Regex: ^[a-zA-Z0-9._-]+ requires the username to begin at the start of the string and to consist of one or more letters, digits, dots, underscores, or hyphens.

Domain Regex: [a-zA-Z0-9._-]+\.[a-zA-Z]{2,}$ requires one or more letters, digits, dots, underscores, or hyphens, followed by a dot (.) and a top-level domain of at least two letters that ends the string.

Combining: rf"{username_regex}@{domain_regex}" joins the username and domain patterns with a literal @ to form the full email pattern.
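
This pattern is permissive: because the domain character class includes dots and hyphens, some malformed domains still pass. The sketch below shows two such inputs and one possible stricter variant. The names LABEL and strict_email and the test addresses are assumptions for illustration, and neither pattern is a complete implementation of the email address standards.

```python
import re

username_regex = r"^[a-zA-Z0-9._-]+"
domain_regex = r"[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}$"
email_regex = rf"{username_regex}@{domain_regex}"

# These malformed domains are accepted by the original pattern.
print(bool(re.match(email_regex, "jane@..com")))         # True
print(bool(re.match(email_regex, "jane@-example.com")))  # True

# Stricter sketch: each domain label starts and ends with a letter or digit.
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?"
# Braces are doubled because this is an f-string.
strict_email = re.compile(rf"[a-zA-Z0-9._-]+@(?:{LABEL}\.)+[a-zA-Z]{{2,}}")

for address in ["jane.doe@example.com", "jane@..com", "jane@-example.com"]:
    print(address, bool(strict_email.fullmatch(address)))
```

The stricter variant uses fullmatch, so it needs no ^ or $ anchors.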

______________________________________________________________________________________________________________________________________________________________________________________________________

Example 4: Lazy Quantifiers with Specific Pattern Matching

In [8]:
import re

pattern = r"a+?b+?c+?"
text = "abc"

match = re.search(pattern, text)
if match:
    print("Match found")
else:
    print("No match found")


Match found


Explanation:

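Pattern: a+?b+?c+? uses lazy quantifiers, so each of a+?, b+?, and c+? matches as few characters as possible while still letting the overall match succeed.

Text: "abc" matches the pattern. Because each letter appears only once, the greedy pattern a+b+c+ would match the same text.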
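
With the text "abc" the effect of the lazy quantifiers is not visible. The sketch below uses longer inputs, chosen here for illustration, where greedy and lazy quantifiers give different results.

```python
import re

text = "aabbcc"

# Greedy: each quantifier takes as many characters as it can.
greedy = re.search(r"a+b+c+", text).group()
# Lazy: each quantifier takes as few as it can while still allowing a match.
lazy = re.search(r"a+?b+?c+?", text).group()

print(greedy)  # aabbcc
print(lazy)    # aabbc

# The difference is clearer when a pattern ends with a closing delimiter.
html = "<b>bold</b>"
greedy_tags = re.findall(r"<.+>", html)
lazy_tags = re.findall(r"<.+?>", html)
print(greedy_tags)  # ['<b>bold</b>']
print(lazy_tags)    # ['<b>', '</b>']
```

In the lazy result only the final c+? stops early. The a+? and b+? parts are forced to expand, because the next part of the pattern cannot match until they do.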

___________________________________________________________________________________________________________________________________________________________________________

## Comprehensive Example: Validating Email and Matching Numbers without Leading Zeros

Finally, here’s a combined example that validates both an email and a number without leading zeros.

In [9]:
import re

# Email regex
username_regex = r"^[a-zA-Z0-9._-]+"
domain_regex = r"[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}$"
email_regex = rf"{username_regex}@{domain_regex}"

# Number regex (no leading zeros)
number_regex = r"^(?!0)\d+$"

def validate_email(email):
    if re.match(email_regex, email):
        print(f"Valid email: {email}")
    else:
        print(f"Invalid email: {email}")

def validate_number(number):
    if re.match(number_regex, number):
        print(f"Valid number: {number}")
    else:
        print(f"Invalid number (no leading zeros allowed): {number}")

# Test emails
valid_email = "jane.doe@example.com"
invalid_email = "jane.doe@.com"
validate_email(valid_email)
validate_email(invalid_email)

# Test numbers
valid_number = "12345"
invalid_number = "01234"
validate_number(valid_number)
validate_number(invalid_number)


Valid email: jane.doe@example.com
Invalid email: jane.doe@.com
Valid number: 12345
Invalid number (no leading zeros allowed): 01234
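
Since readability is one of the challenges named at the start, here is a sketch of the same two checks written with re.VERBOSE and precompiled patterns. The names EMAIL_RE, NUMBER_RE, is_valid_email, and is_valid_number are assumptions for illustration. The functions return booleans instead of printing, which makes them easier to reuse and test.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows inline comments.
EMAIL_RE = re.compile(
    r"""
    [a-zA-Z0-9._-]+      # username: letters, digits, dot, underscore, hyphen
    @                    # separator
    [a-zA-Z0-9._-]+      # domain name (same permissive character class)
    \.                   # dot before the top-level domain
    [a-zA-Z]{2,}         # top-level domain: at least two letters
    """,
    re.VERBOSE,
)
NUMBER_RE = re.compile(r"(?!0)\d+")  # digits that do not start with 0

def is_valid_email(email):
    # fullmatch replaces the ^ and $ anchors of the original pattern.
    return EMAIL_RE.fullmatch(email) is not None

def is_valid_number(number):
    return NUMBER_RE.fullmatch(number) is not None

for value in ["jane.doe@example.com", "jane.doe@.com"]:
    print(value, is_valid_email(value))
for value in ["12345", "01234"]:
    print(value, is_valid_number(value))
```

The character classes are the same as in the combined example. The only behavioral difference is that fullmatch does not accept a trailing newline.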


### Conclusion:

First Pattern (r"\d+") matches any sequence of digits, including those with leading zeros.

Second Pattern (r"^(?!0)\d+$") matches only strings made up entirely of digits that don’t start with a zero.

Third Pattern (rf"{username_regex}@{domain_regex}") helps validate email addresses.

Fourth Example demonstrates how lazy quantifiers work in regular expressions to match as little as possible.

**This approach provides flexibility for both number and email validation, while also showing how to fine-tune patterns for specific use cases.**