Support for QueryResult caching #1251

Merged
mwjames merged 1 commit into master from query-cache on Nov 5, 2016

@mwjames mwjames commented Nov 6, 2015

Most of the work of tracking and updating query dependencies is already done by the QueryDependencyLinksStore.

To eliminate unnecessary SQL/SPARQL connections, QueryResultCache is meant to cache the subject list returned from the QueryEngine.

We don't cache the string output of a printer, which means we don't interfere with how a printer manipulates the displayed data. As the name suggests, QueryResultCache caches the return object (aka QueryResult) from the QueryEngine before it is forwarded to an individual printer.

If for some reason QueryDependencyLinksStore is not enabled, auto-invalidation of QueryResultCache items cannot occur. Instead (same as now, meaning manual intervention using the purge button), $GLOBALS['smwgQueryResultCacheRefreshOnPurge'] = true; can be set so that a purge action invalidates the cache for all queries stored with the corresponding article (aka subject).

Features and limitations

  • Queries with limit=0 or format=debug, and Special:Ask queries, are not cached
  • Redis is the preferred external store engine, both to avoid issues during serialization and because of the amount of data to be stored and requested
  • $GLOBALS['smwgQueryResultCacheType'] is set to CACHE_NONE, which means the feature is disabled; choosing an appropriate type is left to an administrator
  • $GLOBALS['smwgQueryResultCacheLifetime'] = 60 * 60 * 24; (one day) declares the lifetime of a cached item
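
To make the settings above concrete, here is a minimal LocalSettings.php sketch. It only uses the setting names from this description; CACHE_ANYTHING is a standard MediaWiki cache constant picked purely for illustration (any responsive cache type could be used instead), and the placement after SMW has been loaded is an assumption.

```php
// LocalSettings.php (sketch, assumed to run after Semantic MediaWiki is loaded)

// Enable the feature by choosing a cache type; CACHE_NONE keeps it disabled.
// CACHE_ANYTHING is only an illustrative choice, not a recommendation.
$GLOBALS['smwgQueryResultCacheType'] = CACHE_ANYTHING;

// Lifetime of a cached item in seconds (one day).
$GLOBALS['smwgQueryResultCacheLifetime'] = 60 * 60 * 24;

// Invalidate the cached queries of an article on a manual purge action.
$GLOBALS['smwgQueryResultCacheRefreshOnPurge'] = true;
```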

refs #1035, #1117

@mwjames mwjames added the new feature A new, or altered behaviour of an existing functionality that fundamentally impacts behaviour label Nov 6, 2015
@mwjames mwjames added this to the SMW 2.4 milestone Nov 6, 2015

mwjames commented Nov 7, 2015

@kghbln Once this is merged, let's experiment on the sandbox. Redis is probably not available as a cache there, so you might use CACHE_ANYTHING (locally I only tested it with Redis).

CACHE_ANYTHING will make MW issue DB requests to pull cache content from the database (normally we would want to move that load to a faster backend such as Redis).

I'm not sure how to quantify the impact of having the cache enabled, but we should see (or expect) fewer requests to the DB backend.

@mwjames mwjames changed the title [WIP] Add EmbeddedQueryResultCache Support for QueryResult caching Nov 5, 2016

mwjames commented Nov 5, 2016

After a year, this finally got some love!

@kghbln Some notes:

  • This feature is disabled by default; to enable it, smwgQueryResultCacheType needs to be set
  • smwgQueryResultCacheType should point to a responsive cache provider
  • smwgQueryResultCacheLifetime sets the lifetime for embedded queries and defaults to 1d
  • smwgQueryResultNonEmbeddedCacheLifetime (see notes in DefaultSettings.php)
  • smwgQueryResultCacheRefreshOnPurge, if enabled (it is by default), allows a query to be purged manually on the action=purge event
  • If smwgQueryResultCacheType and smwgEnabledQueryDependencyLinksStore are both enabled, the cache for a dedicated query gets evicted as soon as its dependencies are altered.
  • If smwgQueryResultCacheType is disabled, the _QUERY... (Has query) identifier remains the same, but once this feature is enabled the ID will change. I had to modify the composition of the ID so that all queries with the same signature can be found, allowing them to share the cached result and reducing possible cache fragmentation.
  • We don't cache the query printer output; we only cache the subject list produced by the query computation.
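
For the automatic eviction case in the notes above, a sketch of the combination might look as follows. Both setting names come from the notes; CACHE_ANYTHING is again just an illustrative MediaWiki cache constant, and setting the dependency store flag to true is an assumption about how it is switched on.

```php
// LocalSettings.php (sketch): cache plus dependency tracking.
// With both enabled, a cached query is expected to be evicted as soon as
// one of its dependencies is altered, without a manual purge.
$GLOBALS['smwgQueryResultCacheType'] = CACHE_ANYTHING; // illustrative cache type
$GLOBALS['smwgEnabledQueryDependencyLinksStore'] = true; // assumed boolean switch
```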

Below is a preliminary comparison of cached and non-cached response times. I'm unable to say how much computing the QueryResultCache can actually "save", or how high the retention rate (cached vs. non-cached queries) would be in a production environment.

2016-11-05 09:13:51 TAURUS mw-28-00: SMW\ParserData::updateStore :: Main_Page#0# (as DeferredCallableUpdate)
2016-11-05 09:13:52 TAURUS mw-28-00: SMW\CachedQueryResultPrefetcher::addQueryResultToCache doUpdate
2016-11-05 09:13:52 TAURUS mw-28-00: QueryResult from backend in (sec): 0.53615 (_QUERY7236492f22b1fab0f7b4194e3f938ba5) Main_Page#0#
2016-11-05 09:13:52 TAURUS mw-28-00: SMW\CachedQueryResultPrefetcher::addQueryResultToCache doUpdate
2016-11-05 09:13:52 TAURUS mw-28-00: QueryResult from backend in (sec): 0.57243 (_QUERYa8aeec541da03e00463a9585bb2653ec) Main_Page#0#
2016-11-05 09:13:52 TAURUS mw-28-00: SMW\CachedQueryResultPrefetcher::addQueryResultToCache doUpdate
2016-11-05 09:13:52 TAURUS mw-28-00: QueryResult from backend in (sec): 0.59305 (_QUERY7d0922981325cb036ae758e72e4fb192) Main_Page#0#
2016-11-05 09:13:52 TAURUS mw-28-00: SMW\ParserData::updateStore :: Main_Page#0# doUpdate
...
2016-11-05 09:35:13 TAURUS mw-28-00: SMW\SQLStore\QueryDependency\EntityIdListRelevanceDetectionFilter::getFilteredIdList procTime (sec): 0.001171
2016-11-05 09:39:10 TAURUS mw-28-00: QueryResult from cache in (sec): 0.00806 (_QUERY7236492f22b1fab0f7b4194e3f938ba5)
2016-11-05 09:39:10 TAURUS mw-28-00: SMW\SQLStore\QueryDependency\QueryDependencyLinksStore::doUpdateDependenciesBy (as DeferredCallableUpdate)
2016-11-05 09:39:10 TAURUS mw-28-00: SMW\SQLStore\QueryDependency\QueryDependencyLinksStore::doUpdateDependenciesBy procTime (sec): 0.018476
2016-11-05 09:39:10 TAURUS mw-28-00: QueryResult from cache in (sec): 0.00078 (_QUERYa8aeec541da03e00463a9585bb2653ec)
2016-11-05 09:39:10 TAURUS mw-28-00: SMW\SQLStore\QueryDependency\QueryDependencyLinksStore::doUpdateDependenciesBy (as DeferredCallableUpdate)
2016-11-05 09:39:10 TAURUS mw-28-00: SMW\SQLStore\QueryDependency\QueryDependencyLinksStore::doUpdateDependenciesBy procTime (sec): 0.014111
2016-11-05 09:39:10 TAURUS mw-28-00: QueryResult from cache in (sec): 0.00058 (_QUERY7d0922981325cb036ae758e72e4fb192)

mwjames commented Nov 5, 2016

Let's not wait another year!

@mwjames mwjames merged commit 15182f3 into master Nov 5, 2016
@mwjames mwjames deleted the query-cache branch November 5, 2016 22:56

mwjames commented Nov 5, 2016

Happy testing.

@kghbln kghbln added the wikidocu missing Code changes (mostly features) what have not yet been documented label Nov 6, 2016

mwjames commented Nov 20, 2016

@kghbln Did we enable this on the sandbox?

kghbln commented Nov 20, 2016

Did we enable this on the sandbox?

No, not if it is not shown on the setup page ... As for why: at the time this got merged I had switched to 2.4.x, and after getting back to normal I had a senior moment. However, this is now set to CACHE_ANYTHING, and I will probably give CACHE_MEMCACHED a shot at a later stage too, before I finally enable Redis on the box.

@mwjames mwjames mentioned this pull request Nov 26, 2016
2 tasks
@mwjames mwjames mentioned this pull request Dec 4, 2016
2 tasks

mwjames commented Dec 10, 2016

Just pulled some data from the sandbox: in 48% of the requests we hit the cache (or better, avoid computing a query condition). We'll add more data points on what causes the delete rates, but this requires a reset and a new version (that's why it is called transient stats).

{
    "misses": 3309,
    "deletes": 3060,
    "hits": {
        "embedded": 2987,
        "nonEmbedded": 94
    },
    "medianRetrievalResponseTime": {
        "cached": 0.00017727918244152,
        "uncached": 0.0027385082133287
    },
    "noCache": {
        "byLimit": 1078,
        "byOption": 35
    },
    "ratio": {
        "hit": 0.4822,
        "miss": 0.5178
    },
    "meta": {
        "version": "0.2",
        "cacheLifetime": {
            "embedded": 86400,
            "nonEmbedded": 600
        },
        "collectionDate": {
            "start": "2016-11-27 11:18:19",
            "update": "2016-12-10 18:36:41"
        }
    }
}
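
As a quick sanity check on the figures above, the reported ratio appears consistent with counting embedded plus non-embedded hits against misses. This is a reading of the numbers, not a statement about how the statistics code computes them; the speed-up is simply the quotient of the two median response times.

```latex
\text{hit ratio} = \frac{2987 + 94}{(2987 + 94) + 3309} = \frac{3081}{6390} \approx 0.4822
\qquad
\text{median speed-up} = \frac{0.0027385}{0.00017728} \approx 15.4
```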

mwjames commented Dec 10, 2016

New stats will include things like the following (entries under Undefined have no specific context):

    "deletes": {
        "onArticlePurge": 5,
        "onParserCachePurgeJob": 61,
        "onUndefined": 30
    },
    "hits": {
        "tempCache": 114,
        "embedded": 28,
        "nonEmbedded": {
            "SpecialAsk": 11,
            "Undefined": 1,
            "API": 1
        }
    },
  • onArticlePurge counts every time someone pushes the purge button
  • onParserCachePurgeJob counts the times a QueryDependency causes a ParserCachePurgeJob, which also evicts cached queries
  • SpecialAsk counts non-embedded queries from Special:Ask that hit the cache
  • API counts non-embedded queries from an API request that hit the cache
  • tempCache counts hits that previously would have been misses because the results had not yet been written to the back-end (we write the cache in deferred mode after everything is finished); the temporary cache now holds on to those results until they have been stored and allows other requests to use them right away.

mwjames commented Dec 17, 2016

Different statistics from the sandbox:

I changed (a23b92d) the default cacheLifeTime for embedded queries (those that can be purged automatically by the ParserCachePurgeJob) from 1d to 7d.

  • deletes.onPropertyTableIdReferenceDisposal counts purges triggered when a job deletes something
  • deletes.onParserCachePurgeJob shows that the QueryDependency is doing its job of invalidating pages related to an altered query dependency
  • deletes.onArticlePurge counts uses of the purge button
  • deletes.onArticleDelete counts article deletions
  • deletes.onArticleMove counts article moves that resulted in a cache purge
  • deletes.onUndefined is most likely a NewRevisionFromEditComplete (I forgot to set the context, fixed with 11acd84)
  • hits.nonEmbedded.Undefined: no idea (or better, no context and therefore no data)
{
    "misses": 1460,
    "deletes": {
        "onPropertyTableIdReferenceDisposal": 526,
        "onParserCachePurgeJob": 5565,
        "onUndefined": 30,
        "onArticlePurge": 7,
        "onArticleDelete": 2,
        "onArticleMove": 1
    },
    "hits": {
        "embedded": 909,
        "nonEmbedded": {
            "Undefined": 26,
            "SpecialAsk": 4,
            "API": 10
        },
        "tempCache": {
            "embedded": 299
        }
    },
    "medianRetrievalResponseTime": {
        "uncached": 0.0038845668627478,
        "cached": 0.00045126991159509
    },
    "noCache": {
        "byLimit": 179,
        "byOption": 2
    },
    "ratio": {
        "hit": 0.3837,
        "miss": 0.6163
    },
    "meta": {
        "version": "1",
        "cacheLifetime": {
            "embedded": 86400,
            "nonEmbedded": 600
        },
        "collectionDate": {
            "start": "2016-12-12 01:30:01",
            "update": "2016-12-17 14:42:04"
        }
    }
}
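
A note on reading the ratio in this data set: the reported 0.3837 is reproduced when only hits.embedded is counted against misses. Adding the nonEmbedded and tempCache hits would give a higher figure. Both calculations are an interpretation of the numbers above, not a description of how the statistics are actually derived.

```latex
\frac{909}{909 + 1460} = \frac{909}{2369} \approx 0.3837
\qquad
\frac{909 + 26 + 4 + 10 + 299}{(909 + 26 + 4 + 10 + 299) + 1460} = \frac{1248}{2708} \approx 0.4609
\qquad
\text{median speed-up} = \frac{0.0038846}{0.00045127} \approx 8.6
```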

@kghbln kghbln removed the wikidocu missing Code changes (mostly features) what have not yet been documented label Mar 12, 2017