codeflash-ai bot commented Oct 25, 2025

📄 27% (1.27x) speedup for Urlizer.is_email_simple in django/utils/html.py

⏱️ Runtime: 7.18 milliseconds → 5.64 milliseconds (best of 37 runs)
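The 27% figure is consistent with speedup = (old - new) / new, which is why the same change can also be written as 1.27x, or as roughly 21% less runtime. A small check using only the timings reported above (the formula is inferred from the numbers, not taken from Codeflash documentation):

```python
def speedup_pct(old: float, new: float) -> float:
    """Speedup as the report appears to compute it: (old - new) / new, in percent."""
    return (old - new) / new * 100


def time_saved_pct(old: float, new: float) -> float:
    """Share of the original runtime that was eliminated, in percent."""
    return (old - new) / old * 100


# Overall runtimes reported in this PR, in milliseconds.
old_ms, new_ms = 7.18, 5.64
print(f"{speedup_pct(old_ms, new_ms):.1f}% faster")        # 27.3% faster
print(f"{old_ms / new_ms:.2f}x")                           # 1.27x
print(f"{time_saved_pct(old_ms, new_ms):.1f}% less time")  # 21.4% less time
```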

📝 Explanation and details

The optimization introduces validator instance caching to avoid repeatedly creating EmailValidator objects. In the original code, EmailValidator(allowlist=[]) creates a new validator instance on every call to is_email_simple(). The optimized version creates the validator only once and stores it as a class attribute _email_validator, reusing it for all subsequent calls.

Key changes:

  • Added a lazy initialization check using hasattr() to create the validator only once
  • Stored the validator as Urlizer._email_validator class attribute
  • Reused the cached validator instance for all email validations
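The optimized source is not shown here, so the following is only a minimal sketch of the pattern described above: a class attribute filled in lazily on first use and reused afterwards. To keep it runnable without Django, `StandInEmailValidator`, `ValidationError`, and `UrlizerSketch` are hypothetical stand-ins for Django's `EmailValidator`, `ValidationError`, and `Urlizer`. The stand-in validator checks far less than Django's does, and its construction counter exists only to make the reuse observable.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for django.core.exceptions.ValidationError."""


class StandInEmailValidator:
    """Hypothetical stand-in for EmailValidator that counts its constructions."""

    instances_created = 0

    def __init__(self, allowlist=None):
        StandInEmailValidator.instances_created += 1
        self.domain_allowlist = ["localhost"] if allowlist is None else allowlist

    def __call__(self, value):
        # Far weaker than Django's checks: only requires text on both sides of the last '@'.
        user, sep, domain = value.rpartition("@")
        if not sep or not user or not domain:
            raise ValidationError(value)


class UrlizerSketch:
    """Hypothetical stand-in for django.utils.html.Urlizer."""

    @staticmethod
    def is_email_simple(value):
        """Return True if value looks like an email address."""
        # Lazy initialization: build the validator on the first call only and
        # store it on the class so later calls reuse the same instance.
        if not hasattr(UrlizerSketch, "_email_validator"):
            UrlizerSketch._email_validator = StandInEmailValidator(allowlist=[])
        try:
            UrlizerSketch._email_validator(value)
        except ValidationError:
            return False
        return True


results = [UrlizerSketch.is_email_simple(v) for v in ("a@example.com", "nope", "b@example.org")]
print(results, StandInEmailValidator.instances_created)  # [True, False, True] 1
```

Because the validator keeps no per-call state, one shared instance is safe to reuse; at worst, two threads racing on the very first call each build one and the extra instance is discarded.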

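Why this improves performance:
Creating an EmailValidator is cheap but not free: each EmailValidator(allowlist=[]) call allocates a new object, runs the __new__ hook added by Django's @deconstructible decorator (which records the constructor arguments), and runs __init__ (which sets domain_allowlist). The regex patterns, by contrast, are class attributes that are compiled once and shared by every instance, so they are not recompiled per call. Caching the instance means this construction cost is paid once instead of on every call.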
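To illustrate what is and is not repeated per call, here is a sketch that mimics two traits of Django's EmailValidator: patterns stored as class attributes, and a `__new__` hook that records constructor arguments the way `@deconstructible` does. `MiniValidator` and its simplified pattern are hypothetical. Django compiles its patterns lazily on first use rather than eagerly at class creation as done here, which does not change the point.

```python
import re


class MiniValidator:
    """Hypothetical validator shaped like Django's EmailValidator (simplified)."""

    # Class attributes: evaluated once, when the class body runs, and shared
    # by every instance.
    user_regex = re.compile(r"^[a-z0-9._%+-]+\Z", re.IGNORECASE)
    domain_allowlist = ["localhost"]

    def __new__(cls, *args, **kwargs):
        # Per-instance work, modelled on @deconstructible: store the arguments.
        obj = super().__new__(cls)
        obj._constructor_args = (args, kwargs)
        return obj

    def __init__(self, allowlist=None):
        # Per-instance work, modelled on EmailValidator.__init__.
        if allowlist is not None:
            self.domain_allowlist = allowlist


first = MiniValidator(allowlist=[])
second = MiniValidator(allowlist=[])
print(first.user_regex is second.user_regex)  # True: one compiled pattern, shared
print(first is second)  # False: each call built and initialized a new object
```

Reusing `first` for every check skips both `__new__` and `__init__`, which is all the per-call work a cached instance avoids.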

The line profiler shows the optimization is most effective when is_email_simple() is called repeatedly: the 27% overall speedup comes from eliminating redundant validator instantiation. The per-test timings break down as follows:

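  • Single parametrized cases show modest 7-12% improvements
  • 1000-email batches show 22-49% speedups. The all-invalid batch gains the most (49.2%) because EmailValidator rejects a value without an '@' before running any regex, so the fixed construction cost that is saved makes up a larger share of each call
  • The mixed batch (500 valid, 500 invalid) shows a 27.5% improvement, in line with the overall figure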
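The reported batch timings can be broken down per call to see why the relative gains differ. This uses only the numbers from the generated tests (3.16 ms → 2.58 ms and 1.23 ms → 824 μs, 1000 calls each):

```python
def per_call_breakdown(old_us: float, new_us: float, calls: int = 1000):
    """Return (us per call before, us saved per call, saved share of the old call)."""
    before = old_us / calls
    saved = (old_us - new_us) / calls
    return before, saved, saved / before


# Batch timings reported in this PR, converted to microseconds (1000 calls each).
for name, old_us, new_us in [("valid", 3160, 2580), ("invalid (no '@')", 1230, 824)]:
    before, saved, share = per_call_breakdown(old_us, new_us)
    print(f"{name}: {before:.2f} us/call before, {saved:.2f} us saved ({share:.0%})")

# valid: 3.16 us/call before, 0.58 us saved (18%)
# invalid (no '@'): 1.23 us/call before, 0.41 us saved (33%)
```

The absolute saving is a few tenths of a microsecond in both cases, but it is about a third of an invalid call versus under a fifth of a valid one, which is why the all-invalid batch shows the larger percentage.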

This optimization is most valuable where is_email_simple() is called many times in sequence, such as when the urlize template filter processes large amounts of text.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 3045 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import re

# imports
import pytest
from django.core.exceptions import ValidationError
from django.core.validators import EmailValidator
from django.utils.html import Urlizer


# Copy of the original (pre-optimization) implementation, kept for reference.
def is_email_simple(value):
    """Return True if value looks like an email address."""
    try:
        EmailValidator(allowlist=[])(value)
    except ValidationError:
        return False
    return True

# unit tests

# --- Basic Test Cases ---
#------------------------------------------------
import re

# imports
import pytest
from django.core.exceptions import ValidationError
from django.core.validators import EmailValidator
from django.utils.html import Urlizer

# unit tests

# 1. Basic Test Cases
@pytest.mark.parametrize(
    "email,expected",
    [
        # Standard valid email
        ("user@example.com", True),
        # Valid email with dot in local part
        ("first.last@example.com", True),
        # Valid email with plus in local part
        ("user+tag@example.com", True),
        # Valid email with numeric domain
        ("user@123.com", True),
        # Valid email with subdomain
        ("user@mail.example.com", True),
        # Valid email with hyphen in domain
        ("user@my-domain.com", True),
        # Valid email with underscore in local part
        ("user_name@example.com", True),
        # Valid email with uppercase letters
        ("User@Example.COM", True),
        # Invalid: missing '@'
        ("userexample.com", False),
        # Invalid: missing domain
        ("user@", False),
        # Invalid: missing local part
        ("@example.com", False),
        # Invalid: missing TLD
        ("user@example", False),
        # Invalid: double '@'
        ("user@@example.com", False),
        # Invalid: space in email
        ("user name@example.com", False),
        # Invalid: leading dot in local part
        (".user@example.com", False),
        # Invalid: trailing dot in local part
        ("user.@example.com", False),
        # Invalid: double dot in local part
        ("user..name@example.com", False),
        # Invalid: double dot in domain
        ("user@example..com", False),
    ]
)
def test_is_email_simple_basic(email, expected):
    """Test basic valid and invalid email addresses."""
    codeflash_output = Urlizer.is_email_simple(email) # 152μs -> 135μs (12.3% faster)
    assert codeflash_output == expected

# 2. Edge Test Cases

@pytest.mark.parametrize(
    "email,expected",
    [
        # Empty string
        ("", False),
        # Only whitespace
        ("   ", False),
        # Quoted local part with special characters (matches EmailValidator's quoted-string pattern)
        ('"user!#$%&\'*+-/=?^_`{|}~"@example.com', True),
        # Email with unicode in local part
        ("usér@example.com", False),
        # Unicode in domain (EmailValidator retries the domain after IDNA/punycode conversion)
        ("user@exámple.com", True),
        # Local part of 64 chars
        ("a"*64 + "@example.com", True),
        # Local part > 64 chars (EmailValidator does not enforce the local-part length limit)
        ("a"*65 + "@example.com", True),
        # Long domain: three 63-char labels plus "com" (195 chars)
        ("user@" + ("a"*63 + ".")*3 + "com", True),
        # Domain > 255 chars (259; every label is valid and total domain length is not checked)
        ("user@" + ("a"*63 + ".")*4 + "com", True),
        # Email with TLD of 1 char (invalid)
        ("user@example.c", False),
        # Email with TLD of 63 chars (valid)
        ("user@example." + "a"*63, True),
        # Email with consecutive dots in domain
        ("user@ex..ample.com", False),
        # Email with dash at start of domain label
        ("user@-example.com", False),
        # Email with dash at end of domain label
        ("user@example-.com", False),
        # Email with underscore in domain (invalid)
        ("user@exam_ple.com", False),
        # IPv4 address literal as domain (accepted by EmailValidator's literal check)
        ("user@[192.168.1.1]", True),
        # Email with comment (not supported)
        ("user@example.com (User Name)", False),
        # Email with display name (not supported)
        ("John Doe <user@example.com>", False),
        # Email with tab character
        ("user\t@example.com", False),
        # Email with newline character
        ("user\n@example.com", False),
    ]
)
def test_is_email_simple_edge(email, expected):
    """Test edge cases for email validation."""
    codeflash_output = Urlizer.is_email_simple(email) # 188μs -> 167μs (12.4% faster)
    assert codeflash_output == expected

# 3. Large Scale Test Cases

def test_is_email_simple_large_batch_valid():
    """Test a large batch of valid emails."""
    # Construct 1000 valid emails
    emails = [f"user{i}@example{i%10}.com" for i in range(1000)]
    for email in emails:
        codeflash_output = Urlizer.is_email_simple(email) # 3.16ms -> 2.58ms (22.5% faster)
        assert codeflash_output is True

def test_is_email_simple_large_batch_invalid():
    """Test a large batch of invalid emails."""
    # Construct 1000 invalid emails (missing '@')
    emails = [f"user{i}example.com" for i in range(1000)]
    for email in emails:
        codeflash_output = Urlizer.is_email_simple(email) # 1.23ms -> 824μs (49.2% faster)
        assert codeflash_output is False

def test_is_email_simple_performance_on_large_input():
    """Test performance on long local and domain parts (under 1000 chars total)."""
    # Local part: 64 chars; domain: 63+1+63+1+63+1+61 = 253 chars including dots,
    # which is within the 255-char limit, so no trimming is needed.
    local = "a" * 64
    domain = ".".join(["b"*63, "c"*63, "d"*63, "e"*61])
    email = f"{local}@{domain}"
    codeflash_output = Urlizer.is_email_simple(email) # 16.0μs -> 15.0μs (7.22% faster)
    assert codeflash_output is True

def test_is_email_simple_mixed_large_batch():
    """Test a large mixed batch of valid and invalid emails."""
    emails = []
    expected = []
    for i in range(500):
        emails.append(f"user{i}@example.com")  # valid
        expected.append(True)
        emails.append(f"user{i}example.com")   # invalid
        expected.append(False)
    for email, exp in zip(emails, expected):
        codeflash_output = Urlizer.is_email_simple(email) # 2.37ms -> 1.85ms (27.5% faster)
        assert codeflash_output == exp

# 4. Additional: Regression and tricky cases

@pytest.mark.parametrize(
    "email,expected",
    [
        # Quoted local part containing '@' and dots (matches EmailValidator's quoted-string pattern)
        ('"very.unusual.@.unusual.com"@example.com', True),
        # Email with trailing space
        ("user@example.com ", False),
        # Email with leading space
        (" user@example.com", False),
        # Email with internal space
        ("user@ example.com", False),
        # Email with control character
        ("user@\x10example.com", False),
        # Email with emoji
        ("user😊@example.com", False),
    ]
)
def test_is_email_simple_regression(email, expected):
    """Test regression and tricky cases."""
    codeflash_output = Urlizer.is_email_simple(email) # 60.9μs -> 56.0μs (8.74% faster)
    assert codeflash_output == expected
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
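The user@exámple.com edge case above hinges on IDN handling: when a domain fails EmailValidator's ASCII domain check, Django retries it after converting it with Python's built-in idna codec, so an internationalized domain can still validate. The conversion step on its own, with a simplified and hypothetical `DOMAIN_RE` standing in for Django's domain pattern:

```python
import re

# Illustrative domain check (dot-separated ASCII labels, 2+ char TLD); a
# hypothetical simplification, not Django's exact domain pattern.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}\Z", re.IGNORECASE)

domain = "exámple.com"
ascii_domain = domain.encode("idna").decode("ascii")  # Python's built-in IDNA codec

print(bool(DOMAIN_RE.match(domain)))        # False: 'á' is not an ASCII letter
print(ascii_domain.startswith("xn--"))      # True: the label was punycode-encoded
print(bool(DOMAIN_RE.match(ascii_domain)))  # True: the converted domain passes
```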

To edit these changes, run git checkout codeflash/optimize-Urlizer.is_email_simple-mh6swosh and push.

codeflash-ai bot requested a review from mashraf-222 on October 25, 2025 at 21:37
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) and 🎯 Quality: High (Optimization Quality according to Codeflash) labels on Oct 25, 2025