Introduce `DependencyLinksValidator`, refactor update logic, refs 1117, 3537 #3644

Merged
merged 1 commit into from Jan 20, 2019

@mwjames
Contributor

commented Jan 20, 2019

This PR is made in reference to: #1117, #2283, #3537

This PR addresses or contains:

  • Introduces DependencyLinksValidator to validate a temporal attribute (see below), together with a refactored and entirely new update logic, which makes the following classes obsolete:
    • ParserCachePurgeJob
    • SpecialDeferredRequestDispatcher
    • DependencyLinksUpdateJournal, and
    • EntityIdListRelevanceDetectionFilter
  • Removes the setting smwgQueryDependencyAffiliatePropertyDetectionList, as it no longer serves any function (it was only used in EntityIdListRelevanceDetectionFilter)

This PR includes:

  • Tests (unit/integration)
  • CI build passed

Fixes #3537

Notes

The ParserCache is important and should not be disabled arbitrarily, yet we have to reject the cache if an embedded query has been detected to be potentially outdated, so that the MediaWiki Parser initiates a re-parse, thereby forcing an update of subsequent embedded queries, including their result sets and dependencies.

The rejection of the cache happens via the RejectParserCacheValue hook (since 3.0), and the new approach relies on the same hook but with a new DependencyLinksValidator class. The DependencyLinksValidator is responsible for detecting “… whether any embedded query for subject X contains archaic dependencies (those entities that are part of a query either as result subject, property, condition, or printrequest) that were updated recently ...”

To make a judgement about a temporal attribute (aka "recently"), a new field is required; therefore, the field smw_touched is added to track the “last touched” time of each entity. This field is updated during the general data processing (or when a query is updated), which avoids any post-processing or time lag due to job queuing. The detection happens on a local entity for a limited set of matches.

Subject X
  -> contains Query Y (last touched 12:00)
     -> contains result subject Foo (last touched 12:05)
        -> since Foo is "younger" it could contain new/altered assignments
           -> presents the likelihood of being outdated therefore makes
              it a (or contain) archaic dependency to X/Y
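
The comparison in the diagram above can be sketched in a few lines of Python. This is only an illustration of the idea, not the PHP implementation in DependencyLinksValidator: the function name, the dictionary of dependencies, and the entity "Bar" are hypothetical, and the date is arbitrary.

```python
from datetime import datetime


def has_archaic_dependencies(query_touched, dependency_touched):
    """Return True if any dependency was touched after the query itself.

    query_touched: datetime of the query's "last touched" value.
    dependency_touched: hypothetical mapping of entity name -> datetime.
    """
    return any(touched > query_touched for touched in dependency_touched.values())


# Query Y was last touched at 12:00, result subject Foo at 12:05.
query_y_touched = datetime(2019, 1, 8, 12, 0)
dependencies = {
    "Foo": datetime(2019, 1, 8, 12, 5),   # "younger" than the query
    "Bar": datetime(2019, 1, 8, 11, 30),  # older than the query
}

# Foo is younger than Y, so the cached output for subject X is suspect.
stale = has_archaic_dependencies(query_y_touched, dependencies)
```

A single "younger" dependency is enough to treat the parser cache entry as potentially outdated; the check does not need to know what changed in Foo.
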

#28 (1.5359ms, SMW\SQLStore\QueryDependency\DependencyLinksValidator::hasArchaicDependencies)
SELECT v.smw_id, v.smw_touched FROM "smw_fpt_ask"
INNER JOIN smw_object_ids AS p ON ((s_id=p.smw_id))
INNER JOIN smw_object_ids AS v ON ((o_id=v.smw_id))
WHERE p.smw_hash = '5132b31eb7f94c3956848a22dbcd1229298212d2' AND (p.smw_iw!=':smw') AND (p.smw_iw!=':smw-delete')

#29 (1.0739ms, SMW\SQLStore\QueryDependency\DependencyLinksValidator::hasArchaicDependencies)
SELECT COUNT(smw_id) AS count FROM "smw_object_ids"
INNER JOIN smw_query_links AS p ON ((p.o_id=smw_id))
WHERE p.s_id IN ('18458','18457','18456','18455','18454','18453','18452','18451','18450','18449','18448','18447','18413','18412','18411','18410','18409','18408','18407','18406','18405','18404','18403','18402','18341') AND (smw_touched > '2019-01-08 19:31:44')
LIMIT 1
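
To make the two-step lookup easier to follow, here is a minimal, runnable sketch that replays both queries against an in-memory SQLite database. It rests on several assumptions: the schema is reduced to the columns visible in the queries, the sample rows and hashes are invented, and the cutoff passed to the second query is taken as the oldest smw_touched among the embedded queries, which may differ from what hasArchaicDependencies actually uses.

```python
import sqlite3


def subject_has_archaic_dependencies(conn, subject_hash):
    """Replay the two logged queries for one subject (simplified sketch)."""
    # Step 1: find the queries embedded in the subject and their touched time.
    rows = conn.execute(
        """SELECT v.smw_id, v.smw_touched FROM smw_fpt_ask
           INNER JOIN smw_object_ids AS p ON s_id = p.smw_id
           INNER JOIN smw_object_ids AS v ON o_id = v.smw_id
           WHERE p.smw_hash = ?
             AND p.smw_iw != ':smw' AND p.smw_iw != ':smw-delete'""",
        (subject_hash,),
    ).fetchall()
    if not rows:
        return False

    query_ids = [row[0] for row in rows]
    cutoff = min(row[1] for row in rows)  # assumption: oldest query timestamp

    # Step 2: count dependencies of those queries touched after the cutoff.
    placeholders = ",".join("?" * len(query_ids))
    count = conn.execute(
        f"""SELECT COUNT(smw_id) FROM smw_object_ids
            INNER JOIN smw_query_links AS p ON p.o_id = smw_id
            WHERE p.s_id IN ({placeholders}) AND smw_touched > ?""",
        (*query_ids, cutoff),
    ).fetchone()[0]
    return count > 0


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE smw_object_ids (
        smw_id INTEGER PRIMARY KEY, smw_hash TEXT, smw_iw TEXT, smw_touched TEXT
    );
    CREATE TABLE smw_fpt_ask (s_id INTEGER, o_id INTEGER);
    CREATE TABLE smw_query_links (s_id INTEGER, o_id INTEGER);
    INSERT INTO smw_object_ids VALUES
        (1, 'hash-x', '', '20190108120000'),
        (2, 'hash-y', '', '20190108120000'),
        (3, 'hash-foo', '', '20190108120500');
    INSERT INTO smw_fpt_ask VALUES (1, 2);
    INSERT INTO smw_query_links VALUES (2, 3);
    """
)

# Row 1 is subject X, row 2 is query Y, row 3 is result subject Foo.
outdated = subject_has_archaic_dependencies(conn, "hash-x")
```

Because the second query is restricted to a known list of query IDs, its cost depends on the number of queries embedded in the page rather than on the overall size of smw_query_links.
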

Not yet clear, but having a huge smw_query_links table seems to be a part of it.

The table can become very tall, which isn’t an issue with the new approach since it limits access to a set of known IDs.

A quick test on a wiki with:

Table            Rows       Engine  Size
smw_object_ids   ~213,792   InnoDB  162.1 MiB
smw_query_links  ~647,873   InnoDB  56.1 MiB
smw_fpt_ask      42,872     InnoDB  7.6 MiB

reveals:

EXPLAIN SELECT v.smw_id,v.smw_touched FROM `smw_fpt_ask` INNER JOIN smw_object_ids AS p ON ((s_id=p.smw_id)) INNER JOIN smw_object_ids AS v ON ((o_id=v.smw_id)) WHERE p.smw_hash = '1d8e1f10cb5047c3a6e3fd868276fff8f12ad96e' AND (p.smw_iw!=':smw') AND (p.smw_iw!=':smw-delete')

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | p | ref | PRIMARY,smw_id,smw_hash,smw_iw,smw_iw_2,smw_id_2 | smw_hash | 43 | const | 1 | Using index condition; Using where
1 | SIMPLE | smw_fpt_ask | ref | s_id,o_id,s_id_2,o_id_2 | s_id_2 | 4 | mw-31-00-ql.p.smw_id | 1 | Using where; Using index
1 | SIMPLE | v | eq_ref | PRIMARY,smw_id,smw_id_2 | PRIMARY | 4 | mw-31-00-ql.smw_fpt_ask.o_id | 1 |

EXPLAIN SELECT COUNT(smw_id) as count FROM `smw_object_ids` INNER JOIN smw_query_links AS p ON ((p.o_id=smw_id)) WHERE p.s_id IN ('11772','12681') AND (smw_touched > '20190108203807') LIMIT 1

id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | p | range | s_id,o_id,s_id_2 | s_id_2 | 4 | NULL | 57 | Using where; Using index
1 | SIMPLE | smw_object_ids | eq_ref | PRIMARY,smw_id,smw_id_2 | PRIMARY | 4 | mw-31-00-ql.p.o_id | 1 | Using where

@mwjames mwjames added the enhancement label Jan 20, 2019

@mwjames mwjames added this to the SMW 3.1.0 milestone Jan 20, 2019

@@ -66,7 +66,7 @@
#
# @since 3.0
##
-'smwgUpgradeKey' => 'smw:2018-09-01',
+'smwgUpgradeKey' => 'smw:2019-01-19',

@mwjames

mwjames Jan 20, 2019

Author Contributor

@kghbln FYI: here is an example where we alter the upgrade key (as we are adding a new DB field), thereby forcing installations to run the appropriate update.
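
As a rough illustration of why changing the key forces an update, the check boils down to comparing the key an installation recorded at its last update with the key shipped in the current code. The sketch below is hypothetical: the function name and the way the recorded key is obtained are assumptions and do not describe how SMW stores or reads the key.

```python
# Key shipped with the code after this PR (taken from the diff above).
CURRENT_UPGRADE_KEY = "smw:2019-01-19"


def requires_update(recorded_key):
    """Return True when the installation last ran its update under a different key."""
    return recorded_key != CURRENT_UPGRADE_KEY


# An installation that was last updated under the previous key must run the update.
pending = requires_update("smw:2018-09-01")
```
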

@mwjames mwjames merged commit 90130dd into master Jan 20, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@mwjames mwjames deleted the qlinks-validator branch Jan 20, 2019

mwjames added a commit that referenced this pull request Jan 20, 2019

Switch index position, refs #3644
EXPLAIN SELECT shows `Using where; Using index` for `smw_object_ids`

[skip ci]
@mwjames

Contributor Author

commented Jan 20, 2019

@kghbln In short, there will no longer be any smw.parserCachePurge job, as I stopped using it and instead rely on smw_touched to check for "archaic" dependencies. This should work faster (that is, instantaneously) and with less computation to find out whether or not the parser cache needs to be rejected for a subject (== page) that is being viewed.

The tracking of queries and their members works as introduced in #1117, but the invalidation has changed. The on-wiki documentation has been superseded by [0], since the on-wiki version went to some length explaining limits and how the job update was expected to work.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/SQLStore/QueryDependency/README.md

@kghbln

Member

commented Jan 20, 2019
