Optimize strpos: (1) scalar needle, and (2) Unicode input #20753

@neilconway

Description

Is your feature request related to a problem or challenge?

We should optimize the strpos implementation in two ways:

  1. When the needle is a scalar, we can build a single memmem::Finder and reuse it for every row in the batch. It turns out this is significantly faster than using memchr, and the cost of constructing the finder is small because it is amortized over the batch.
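  2. We previously optimized strpos to use memchr for searching only when both haystack and needle are ASCII. That was needlessly conservative: UTF-8 is self-synchronizing, so a byte-level match of a valid UTF-8 needle inside a valid UTF-8 haystack always begins on a character boundary. It should therefore be safe to use memchr to search for matches for any combination of ASCII and non-ASCII needle and haystack; only the conversion from the matching byte offset to a character position depends on whether the haystack is ASCII.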
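
To make the two points concrete, here is a minimal sketch using only the standard library. It is not the DataFusion implementation: `find_bytes`, `strpos_sketch`, and `strpos_batch` are hypothetical names, the naive `windows` scan stands in for a `memmem::Finder` from the memchr crate, and the return convention (1-based character position, 0 when there is no match, 1 for an empty needle) is an assumption for illustration.

```rust
/// Byte-level substring search. A naive stand-in for a memmem-style
/// searcher; it never looks at character boundaries.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // `windows(0)` would panic, so handle the empty needle up front.
        return Some(0);
    }
    // If the needle is longer than the haystack, `windows` yields nothing.
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Assumed convention: 1-based character position, 0 if the needle is absent.
fn strpos_sketch(haystack: &str, needle: &str) -> usize {
    match find_bytes(haystack.as_bytes(), needle.as_bytes()) {
        None => 0,
        Some(byte_off) => {
            // Because UTF-8 is self-synchronizing, a byte-level match of a
            // valid needle in a valid haystack starts on a char boundary.
            debug_assert!(haystack.is_char_boundary(byte_off));
            // Convert the byte offset to a character count. For an
            // all-ASCII haystack this is simply `byte_off + 1`.
            haystack[..byte_off].chars().count() + 1
        }
    }
}

/// Scalar-needle case: the needle is fixed for the whole batch, so any
/// per-needle setup (in a real implementation, building the finder)
/// would happen once here, outside the per-row loop.
fn strpos_batch(haystacks: &[&str], needle: &str) -> Vec<usize> {
    haystacks.iter().map(|h| strpos_sketch(h, needle)).collect()
}
```

Under these assumptions, `strpos_sketch("héllo wörld", "wörld")` evaluates to 7: the match begins at byte offset 7, which is preceded by 6 characters.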

Metadata

Labels

enhancement: New feature or request
