Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance suggestion #73

Closed
jvanasco opened this issue Jun 6, 2017 · 4 comments
Closed

performance suggestion #73

jvanasco opened this issue Jun 6, 2017 · 4 comments

Comments

@jvanasco
Copy link
Contributor

jvanasco commented Jun 6, 2017

I'm considering migrating to micawber from a custom oembed consumer, and wanted to suggest a performance improvement that I am willing to generate a PR for.

I'd like to extend the ProviderRegistry with a secondary internal register that nests providers under domain names.

this would allow users to optionally avoid a regex match against every provider and only test the domain.

some light tests on a quick mockup showed the lookups to run in 30% the time -- including the overhead of parsing the domain name from a url, but about 5% of the time if you have the domain already.

we would be using this on a high volume indexer, so this performance is a need.

@coleifer
Copy link
Owner

coleifer commented Jun 6, 2017

I would suggest subclassing the existing ProviderRegistry and overriding the register and unregister methods in your subclass. You can parse out the hostname using urlparse(url).netloc. The bootstrap_* methods all accept a ProviderRegistry instance so you can provide an instance of your subclass and get everything registered easily.

@jvanasco
Copy link
Contributor Author

jvanasco commented Jun 6, 2017

ah, subclassing is a good idea. i'll redo the PR I made.

@coleifer
Copy link
Owner

coleifer commented Jun 7, 2017

I was suggesting that you subclass it in your own application -- I don't plan to merge the optimization into the codebase in the interests of keeping things simple and extensible.

@coleifer coleifer closed this as completed Jun 7, 2017
@jvanasco
Copy link
Contributor Author

jvanasco commented Jun 7, 2017

That's fair.

Would you be interested at all in this reorg of migrating the providers into a dict? You can strip out the 'domain' key if you want.

master...jvanasco:feature-library

fyi, urlparse(url).netloc is the hostname + port (if present in a url). you can use urlparse(url).hostname to ensure only a hostname. some of the endpoints serve subdomains though, so a library like tldextract would be needed to parse out the actual "registered" domain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants