Caching compiled re patterns #108

Open
pvarsh opened this issue Nov 2, 2017 · 4 comments

Comments

@pvarsh

pvarsh commented Nov 2, 2017

First of all, thank you for creating this Python port.

In my use case, I check a large number of short strings to see whether each is likely to be a phone number. I found that this makes a lot of calls to re.compile() (for example here). Caching the compiled patterns gives about a 6-10x speedup for my workload.

If this is something that would make sense given the contributing guidelines, I'd be happy to do a PR.
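
A minimal sketch of the kind of external cache described above, assuming a plain module-level dictionary keyed by pattern string and flags. The names `_PATTERN_CACHE`, `cached_compile`, and `looks_like_digits`, and the digit pattern itself, are hypothetical and are not taken from the library.

```python
import re

# Hypothetical module-level cache; not part of the phonenumbers API.
_PATTERN_CACHE = {}


def cached_compile(pattern, flags=0):
    """Return a compiled pattern, compiling each (pattern, flags) pair only once."""
    key = (pattern, flags)
    compiled = _PATTERN_CACHE.get(key)
    if compiled is None:
        compiled = re.compile(pattern, flags)
        _PATTERN_CACHE[key] = compiled
    return compiled


def looks_like_digits(text):
    """Illustrative check only: 7 to 15 digits and nothing else."""
    matcher = cached_compile(r"\d{7,15}\Z")
    return matcher.match(text) is not None
```

Unlike the cache inside `re`, this dictionary has no size limit, so it never evicts a pattern; that is reasonable when the set of patterns is fixed, as it is for static metadata.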

@daviddrysdale
Owner

Hmm, I was under the impression that the Python re code cached the compiled expressions, so getting a speed-up from an external cache surprises me slightly. I guess you might be blowing the size of the (global) cache -- are you doing lookups across lots of different countries?

Contributions welcomed -- thanks. As an initial thought, one possibility might be to add ..._re fields to the phonemetadata.py classes for each field that's a regexp (e.g. national_number_pattern_re), which get populated on construction (as all the metadata objects are supposed to be immutable).
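
One way the `..._re` idea could look, as a sketch only. `PhoneNumberDescSketch` is a hypothetical stand-in and not the real class from phonemetadata.py, and the example pattern is made up for illustration.

```python
import re


class PhoneNumberDescSketch(object):
    """Hypothetical stand-in for a metadata class; not the real phonemetadata.py code."""

    def __init__(self, national_number_pattern=None):
        self.national_number_pattern = national_number_pattern
        # Compile once at construction. This is only safe because the
        # metadata object is assumed to be immutable afterwards.
        if national_number_pattern is not None:
            self.national_number_pattern_re = re.compile(national_number_pattern)
        else:
            self.national_number_pattern_re = None


# Illustrative pattern, not real metadata.
desc = PhoneNumberDescSketch(national_number_pattern=r"[2-9]\d{9}")
is_match = desc.national_number_pattern_re.match("2125551234") is not None
```

Callers would then use the `_re` attribute directly instead of passing the pattern string to `re.compile()` or `re.match()` on every lookup.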

(However, we'd need to check how much effect precompiling 300 sets of metadata regexps would have on library startup...)

@pvarsh
Author

pvarsh commented Nov 28, 2017

Thank you for your response. I might get a chance to look deeper into it in the next week or two.

@moggers87

The cache in re is limited to 100 entries in CPython 2.7 and 512 in CPython 3.6, so on Python 2.7 at least it's quite easy to blow past re._MAXCACHE.
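
The limit on a given interpreter can be checked directly. This is a sketch: `re._MAXCACHE` is a private CPython implementation detail that may change or disappear, so the code reads it defensively, and `exceeds_re_cache` is a hypothetical helper.

```python
import re
import sys

# Private attribute; fall back to None if this interpreter does not define it.
max_cache = getattr(re, "_MAXCACHE", None)
print("Python %d.%d, re._MAXCACHE = %r" % (sys.version_info[0], sys.version_info[1], max_cache))


def exceeds_re_cache(distinct_pattern_count, limit=max_cache):
    """Return True if the number of distinct patterns is above the cache limit."""
    return limit is not None and distinct_pattern_count > limit
```

If the number of distinct patterns in regular use exceeds the limit, entries get evicted and later recompiled, which would be consistent with the speedup reported from an external cache.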

@lior1990

lior1990 commented Jul 4, 2018

Any chance to add an option/flag to compile everything in advance so the package runtime will be faster? Thanks!
