Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement: Type cast properties on demand #3089

Merged
merged 2 commits into from Apr 1, 2019

Conversation

Projects
None yet
2 participants
@pprkut
Copy link
Collaborator

commented Dec 5, 2018

I analyzed where time is spent when querying the library and found that most of it is actually type casting properties after retrieving them from the database. However, this isn't really necessary. We hardly ever need access to all properties, so it makes much more sense to delay the actual casting until a property is accessed.

Taking my previous baseline of 3.66s for master, doing the delayed type casting reduces this to 1.96s, or down to 1.45s when #2798 is also applied (a bit slower now than my first number with it now no longer being a poc).
Our of those remaining ~1.5s, about 0.5s is spent on the actual database queries, and the rest is object creation.

I'm open to ideas for further performance improvements here, but we get firmly into area of micro-optimizations (creating 11000 objects in 1s, we'd need to make every object creation 1/22000s faster to save another 0.5s). At least I didn't find any obvious candidates.

@sampsyo

This comment has been minimized.

Copy link
Member

commented Dec 6, 2018

Nice!! This seems like a really good optimization target. Excellent work with that profiler. 😃

I have a small idea for how to make the code a little more centralized and perhaps more maintainable: define a little "dict-like" wrapper that is responsible for performing the lazy conversion. Here's a quick sketch I put together:

class LazyConvertDict(object):
    def __init__(self, data, model_cls):
        self.data = data
        self.model_cls = model_cls
        self._converted = {}

    def _convert(self, key, value):
        return self.model_cls._type(key).from_sql(value)

    def __getitem__(self, key):
        if key in self._converted:
            return self._converted[key]
        elif key in self.data:
            value = self._convert(key, self.data[key])
            self._converted[key] = value
            return value

The model class could then just define _values_fixed and _values_flex to be instances of LazyConvertDict and access them normally—eliminating the need for the conversion handling inside of the lookup functions.

This would probably need a __setitem__ method too, but does the rough idea make sense?

@pprkut pprkut force-pushed the pprkut:delayed_type_casting branch from ea09bba to 5eced8d Mar 24, 2019

@pprkut

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 24, 2019

Uploaded a raw version that uses your suggested LazyConvertDict. Testing here looks fine. Performance for some reason seems to be even better with that version, not entirely sure why.
Please have a look. I'll perform cleanup, doc additions, test fixes within the next couple days :-)

@sampsyo
Copy link
Member

left a comment

Yay! Looking great already.

Show resolved Hide resolved beets/dbcore/db.py
Show resolved Hide resolved beets/dbcore/db.py Outdated

@pprkut pprkut force-pushed the pprkut:delayed_type_casting branch from 5eced8d to c90a270 Mar 31, 2019

@pprkut pprkut force-pushed the pprkut:delayed_type_casting branch from c90a270 to f9f2bdd Mar 31, 2019

@pprkut

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 31, 2019

AFAICS everything's fixed up now :)

@sampsyo

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

OK, this is truly awesome!! We should be maintaining a trophy room somewhere that lists all the performance bottlenecks you have found and defeated. Thank you for all your help along these lines!

@sampsyo sampsyo merged commit f9f2bdd into beetbox:master Apr 1, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

sampsyo added a commit that referenced this pull request Apr 1, 2019

Merge pull request #3089 from pprkut/delayed_type_casting
Performance improvement: Type cast properties on demand

sampsyo added a commit that referenced this pull request Apr 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.