MW 1.23 / User-defined properties disappearing from pages #347

Closed
thingles opened this Issue Jun 24, 2014 · 36 comments

Comments

Projects
None yet
5 participants
Contributor

thingles commented Jun 24, 2014

On WikiApiary I’m seeing a large number of pages losing properties for no reason. This bug was originally highlighted at thingles/WikiApiary#114 and has persisted now through a few iterations of master.

I've been seeing an increase in the number of data errors that bots are getting because properties that should be there are not. Looking at properties for WikiApiary for example shows:

screenshot

That should have an "Is active" and "Is audited" property. They are in the template and in the page. If I manually force a "refresh" of the page the properties will appear again.

screenshot2

I’m seeing this a lot. Various tasks executing from cron or routinely throwing exceptions because mandatory properties have gone missing.

Contributor

mwjames commented Jun 24, 2014

The potential PR #303 that could have caused such behaviour has been reverted and as we see now it did not introduce any inconsistencies.

The behaviour of of use-defined properties disappearing is similar to the uncovered redirect bug in MW 1.23.

The reason of use-defined properties not being parsed is that the InternalParseBeforeLinks hook (contains the part where text is parsed and [[ :: ]] is being replaced) is not executed and therefore annotation information of a page are not returned.

Since it is similar to the redirect issue there are two things currently that I would suggest to see if any change can be monitored.

The underlying cause for the redirect issue was the use of the ContentHandler therefore the first thing to test is to modify the SMW\ContentParser with:

    protected function hasContentHandler() {
        return defined( 'CONTENT_MODEL_WIKITEXT' );
    }

into

    protected function hasContentHandler() {
        return false;
    }

The above change is only to investigate if the use of the ContentHandler is causing this issue not an actual solution (it would only mean that besides the redirect bug another issue has been identified in connection with the ContentHandler and the InternalParseBeforeLinks hook).

If the change doesn't show any significant improvement it is suggested to switch back to MW 1.22+ as code between 1.9.2 and 1.9.3 did not alter any part of the parsing/annotation (besides #303 which has been reverted).

mwjames changed the title from Properties disappearing from pages to User-defined properties disappearing from pages Jun 24, 2014

mwjames added the bug label Jun 24, 2014

Contributor

thingles commented Jun 24, 2014

Does it make sense that this happens very infrequently and only for a subset of properties for a given page? If the annotations are being removed I don't get why most are still there.

I'll try applying that change by hand when I’m online later tonight.

Contributor

mwjames commented Jun 24, 2014

make sense that this happens very infrequently and only for a subset of properties

As I said on thingles/WikiApiary#114, I personally was unable to replicate the issue on either MW 1.23/MW 1.24 but the cause of the disappearance of user-defined properties can currently only be linked to the InternalParseBeforeLinks hook.

Above suggestions are made in anticipation of the symptoms and are not the result of an analysis of the actual cause (since it "happens very infrequently" with no indication of what is triggering the disappearance as normally no user interaction is involved).

Contributor

thingles commented Jul 2, 2014

@mwjames the change to hasContentHandler() doesn't appear to have any positive impact on this disappearing properties problem.

Contributor

mwjames commented Jul 2, 2014

hasContentHandler() doesn't appear to have any positive impact

Ahh...

I just looked at the history of InternalParseBeforeLinks which was last changed Feb 17, 2014 (taken into account the revert from Jun 04, 2014 for changes made on May 05, 2014).

The reason I'm fixated on InternalParseBeforeLinksis because it is the only place in SMW where wiki text is parsed and annotations are returned to the ParserOutput object which is later used to update the Store when all other wiki edit/parsing processes are finished.

If the InternalParseBeforeLinks returns with an empty ParserOutput object for the page in question at the time a page is saved the empty ParserOutput will not contain any user-defined properties and only be filled with special-properties (because they are added at the later point independently from the user text).

This leaves me currently only to suggest to return to the previous MW version (as the issue seem to be first appeared close to the date when the update was made).

PS: In order to analyze the problem, I need to know how to trigger the issue in order to make a suggestion on how to fix it.

Contributor

thingles commented Jul 2, 2014

Re PS: I totally get that. I wish I could figure out a way to trigger it to happen. Similar to the #191 I have no idea what is causing this to happen. :-\ The only thing I know on this one is a refresh manually will fix it, where #191 that will never fix it. I would happily grant shell access to my server to trusted folks if there was an interest in looking directly.

Also, FWIW, if anyone ever wanted to setup a duplicate of the WikiApiary site I dump a fresh XML backup weekly. I’m not sure that is helpful at all, but thought I would share.

Contributor

mwjames commented Jul 2, 2014

I would happily grant shell access to my server to trusted folks if there was an interest in looking directly.

This would only make sense if we know what triggers it but right now the cause is unknown (in terms of a nondeterministic behavior). For example, is it a bot write to a page, is a job run or is it something else that those pages share in common that would make them classifiable (as a group) in terms of their lost data (belongs to certain entity etc.).

Contributor

thingles commented Jul 2, 2014

For example, is it a bot write to a page, is a job run or is it something else that those pages share in common that would make them classifiable (as a group) in terms of their lost data (belongs to certain entity etc.).

Just highlighting that I don't believe a bot write to a page is related to this or #191. It has been mentioned many times as a potential reason, but @kghbln and I have found many example of pages with these issues that have no recent bot activity.

Owner

kghbln commented Jul 2, 2014

I would still like to throw in my comment at thingles/WikiApiary#114 (comment) To what I have seen it seems to be connected to edits to the respective template holding the properties. I think it is a weird behaviour but this the only direct cause and effect thing I observed so far.

Contributor

mwjames commented Jul 9, 2014

Bill Whitson has reported a similar behaviour for MediaWiki 1.22.0 + Semantic MediaWiki (Version 1.9-RC1) + Semantic Forms (Version 2.6).

Contributor

mwjames commented Jul 9, 2014

Continuing my thoughts from #397 (comment):

SESP properties generally seems not to be missing which means that the process responsible for the store update is still active (namely the update hook).

Before the SemanticData container reaches the update hook it is expected that all other information (user-defined properties etc.) about a page have been added. In case the SemanticData container is empty the StoreWriter will override all existing values for this particular subject (page) because of an empty container (SESP data remain because there are added at a later point after the container was emptied or set empty).

As I explained above, the only point where SemanticData can be "filled" with annotations from a page is during the execution of InternalParseBeforeLinks. If InternalParseBeforeLinks doesn't return any annotation the SemanticData container will be empty. If for some reason InternalParseBeforeLinks is never reached before the store update it is also set to an empty status.

In order for annotations to be stored successfully, the execution of InternalParseBeforeLinks hook is required.

Contributor

mwjames commented Jul 9, 2014

SMW, uses the ParserOutput as intermediary "vehicle" to pass its data around before it reaches the store updater. This is necessary so that during page processing (which involves a lot of MediaWiki doing something and sometimes calling some hooks in order for custom functions to add something as well) can continue without SMW having to write to the DB.

The annotations that were found during InternalParseBeforeLinks are temporally attached to the ParserOutput object which is passed further down to MW until it reaches its final processing point where the SMW Store restores the information found in ParserOutput and transforms it into its own internal SemanticData object.

If for some reason during the processing, the ParserOutput was overridden or destroyed then of course SMW will assume by the time it tries the restore the information that no data were stored and uses an empty container as initial state to update a subject (page).

Contributor

mwjames commented Jul 9, 2014

@thingles Could you add the following, I'd like to track some activities (+ is marking the added line).

LocalSettings.php

$wgDebugLogGroups = array( 'smw-report' => 'YOUR LOCALTION/../debug-smw-report-log.txt' );

InternalParseBeforeLinks.php

    $this->parser->getOutput()->setProperty(
        'smw-semanticdata-status',
        $parserData->getSemanticData()->getProperties() !== array()
    );

+   wfDebugLog( 'smw-report', __METHOD__ . ' ' . $this->parser->getTitle()->getDBKey() . ' Properties: ' . count( $parserData->getSemanticData()->getProperties() ) . "\n" );

    return true;

LinksUpdateConstructed.php

+   wfDebugLog( 'smw-report' ,__METHOD__ . ' ' . $this->linksUpdate->getTitle()->getDBKey() . ' Properties: ' . count( $parserData->getSemanticData()->getProperties() ) . "\n" );

    $parserData->updateStore();

    return true;

This should report all activities in connection with user-defined annotation storage. The expectation is that when a SMW\InternalParseBeforeLinks::performUpdate entry with a positive properties count is directly followed by a SMW\LinksUpdateConstructed::process entry with a positive count, store data are expected not to be empty.

If for some reason SMW\LinksUpdateConstructed::process is reported with a zero property count it means that an empty container was forwarded to the store updater.

Contributor

thingles commented Jul 9, 2014

@mwjames I added these debug messages as described. I let it run for a few hours. Here is the log file..

There are 63 times that LinksUpdateConstructed had 0.

I also see stuff like this that I’m not sure if it’s an error

2014-07-09 16:54:17 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 2
2014-07-09 16:54:17 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 2
2014-07-09 16:54:17 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 2
2014-07-09 16:54:17 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 2
2014-07-09 16:54:17 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 2
2014-07-09 16:54:18 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate WikiEducator Properties: 21

This pattern also seems odd…

2014-07-09 16:58:01 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Whois Properties: 0
2014-07-09 16:58:02 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Whois Properties: 5
2014-07-09 16:58:02 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Maxmind Properties: 1
2014-07-09 16:58:02 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Maxmind Properties: 6
2014-07-09 16:58:02 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/General Properties: 0
2014-07-09 16:58:02 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/General Properties: 5
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Extensions Properties: 1
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Extensions Properties: 6
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Interwikimap Properties: 0
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Interwikimap Properties: 2
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Namespaces Properties: 0
2014-07-09 16:58:03 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Namespaces Properties: 2
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki/Skins Properties: 1
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki/Skins Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:04 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 3
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 6
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 19
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 85
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 85
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 85
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 87
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 87
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 87
2014-07-09 16:58:07 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 0

This was a result in the edit of the whois record for that wiki.

It’s worth noting that the root page "Rain World Wiki" had no edits, however, it does transclude the subpages. That transclusion would cause the root page to be refreshed as well.

Contributor

mwjames commented Jul 9, 2014

2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Rain_World_Wiki Properties: 85
2014-07-09 16:58:06 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 87

This is an expected pair because it passes the properties from a page process to the updater (note 87 - 85 = 2, 2 means additional pre-defined properties such as category etc.).

What?

2014-07-09 16:58:07 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Rain_World_Wiki Properties: 0

The 2014-07-09 16:58:07 entry is disturbing because it is a single LinksUpdateConstructed without a corresponding InternalParseBeforeLinks and it has 0 properties which means the data container is empty and it will update the Store and override data from the previous 2014-07-09 16:58:06 update.

Looking at Rain_World_Wiki confirms reported activities, namely that it lost its user-defined properties because of the ZERO update from 2014-07-09 16:58:07.

Who?

Since there is no human interaction (last edit was 19:02, 24 June 2014‎ Audit Bee) recorded for "Rain World Wiki", who is making those updates from 2014-07-09?

How?

Transclusion ..., I don't really have time to go though all the templates and build a similar model therefore I would appreciated if you could post an simple example/description of how those master-page transclusion template scenarios are build and work together in WikiApiary so that I can try to create a simple example on my own (I don't have SF installed since it doesn't support Composer therefore any logic in a template or page should refrain from using SF functionality arraymap etc.). Preferably in a way so that I can copy and past.

mwjames added the discussion label Jul 10, 2014

Contributor

mwjames commented Jul 10, 2014

@thingles can you add an additional line for reporting in LinksUpdateConstructed.php

    wfDebugLog( ... )
+   wfDebugLog( 'smw-report' ,__METHOD__ . ' smw-semanticdata-status: ' . ( isset( $this->linksUpdate->mProperties['smw-semanticdata-status'] ) && $this->linksUpdate->mProperties['smw-semanticdata-status'] ? 'yes' : 'no' ) . "\n" );

I'm looking for entries with smw-semanticdata-status reporting no and a preceding SMW\LinksUpdateConstructed::process set to 0.

Contributor

thingles commented Jul 10, 2014

Transclusion …

The way WikiApiary uses transclusion is that Rain World Wiki transcludes each of the subpages that exist. So, the extension data is written in the wikitext of Rain World Wiki/Extensions, but is never parsed there, it is an a <includeonly> block. Then Rain World Wiki transcludes {{:{{PAGENAME}}/Extensions}} to pull in the properties. Note all pages are setting values via template calls.

The subpages do have edits. I believe that MediaWiki then fires off a job to refresh all pages that transclude that subpage. That is what triggers the update, I believe.

You could easily replicate this by having page Foo with

{{#set: PropA=foo}}
{{:{{PAGENAME}}/Bar}}

and then Foo/Bar

<includeonly>{{#set: PropB=bar}}</includeonly>

Then edit Foo/Bar and see Foo should get refreshed.

Contributor

thingles commented Jul 10, 2014

@thingles can you add an additional line for reporting in LinksUpdateConstructed.php

@mwjames I've added this and it’s now collecting. I'll let it run overnight and then share the details.

Contributor

thingles commented Jul 10, 2014

@mwjames additional line added and full log available.

Example lines I found that showed the behavior we saw before:

2014-07-10 03:15:57 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Dead_State_Wiki Properties: 86
2014-07-10 03:15:57 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process smw-semanticdata-status: yes
2014-07-10 03:15:57 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process Dead_State_Wiki Properties: 0
2014-07-10 03:15:57 pub2 mw_wikiapiary: SMW\LinksUpdateConstructed::process smw-semanticdata-status: yes
Contributor

mwjames commented Jul 10, 2014

@thingles @kghbln Blame MW, or the jobqueue or the template but don't blame SMW for what is happening.

With the following scenario I was able to recreate a SMW\LinksUpdateConstructed::process ... Properties: 0 entry:

LocalSettings (on a MediaWiki 1.23.0)

$wgJobRunRate = 0.01;

Template:Lorem_ipsum

<noinclude></noinclude><includeonly>{{#set:
 |Has number={{{number|}}}|Has quantity={{{quantity|}}}|Has date={{{date|}}}|Has Url={{{url|}}}|Has annotation uri={{{annotation uri|}}}|Has email={{{email|}}}|Has temperature={{{temperature|}}}|Has number5={{{number|}}} }}
}}</includeonly>

Page: ExampleOfEmptyTemplateEdit

{{:ExampleOfEmptyTemplateEdit/TransclusionOfValueTemplate}}

Page: ExampleOfEmptyTemplateEdit/TransclusionOfValueTemplate contains the actual template with values.

{{Lorem ipsum
|number=123456789
|quantity=13.25 km²
|date=1 Jan 2014
|url=http://loremipsum.org/Foo884
|annotation uri=http://loremipsum.org/foaf.rdf
|email=Lorem@ipsum.org
|temperature=133 °C
|boolean=false
}}

Issue

If I change Template:Lorem_ipsum with an edit and edit ExampleOfEmptyTemplateEdit/TransclusionOfValueTemplate on a continued basis without running the runJobs no update to the ExampleOfEmptyTemplateEdit page will be initiated due to the $wgJobRunRate setting.

Running runJobs after the edits, refreshJobs will update related pages together with ExampleOfEmptyTemplateEdit but this pages reports in the log with SMW\LinksUpdateConstructed::performUpdate ExampleOfEmptyTemplateEdit Properties: 0 (and or course lost all the user-defined properties).

Contributor

mwjames commented Jul 10, 2014

@thingles After wrecking my head about this issue I looked at RefreshLinksJob which has the following comment:

// If page_touched changed after this root job (with a good slave lag skew factor),
// then it is likely that any views of the pages already resulted in re-parses which
// are now in cache. This can be reused to avoid expensive parsing in some cases.

Experiment

Let's see if my hunch produces some positive results by disabling the following lines where instead of relying on RefreshLinksJob to be "intelligent" we are always re-parsing the content.

RefreshLinksJob.php (MW-core)

/*
        if ( isset( $this->params['rootJobTimestamp'] ) ) {
            $skewedTimestamp = wfTimestamp( TS_UNIX, $this->params['rootJobTimestamp'] ) + 5;
            if ( $page->getLinksTimestamp() > wfTimestamp( TS_MW, $skewedTimestamp ) ) {
                // Something already updated the backlinks since this job was made
                return true;
            }
            if ( $page->getTouched() > wfTimestamp( TS_MW, $skewedTimestamp ) ) {
                $parserOutput = ParserCache::singleton()->getDirty( $page, $parserOptions );
                if ( $parserOutput && $parserOutput->getCacheTime() <= $skewedTimestamp ) {
                    $parserOutput = false; // too stale
                }
            }
        }
*/

At all times the actual ParserOutput is fetched and not getDirty is doing some cheesy guessing here which produces a non-deterministic behaviour because sometimes it decides that the cache is too stale and sometimes it is not which would then force either a re-parse or not.

Having a re-parse is vital in order for InternalParseBeforeLinks to be executed.

mwjames removed the bug label Jul 10, 2014

Contributor

thingles commented Jul 11, 2014

@mwjames sorry for the delay. I've enabled the logging again and have commented out the lines you requested in includes/jobqueue/jobs/RefreshLinksJob.php. I'll let this go for a while, see what happens and share the log file. Very excited that you were able to replicate this behavior. That alone is a milestone!

Contributor

thingles commented Jul 12, 2014

@mwjames Grabbed the log file after running for the day with the commented out block in RefreshLinksJob.php.

I can’t seem to find any entries like the ones we saw before, but please verify on your end. Full log file.

With that said though WikiApiary is still losing millions of properties just today.

properties

Of note, the longstanding Semantic Forms bug 51479 made an appearance (/cc @yaronkoren @s7eph4n) :

2014-07-11 14:46:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123 Properties: 0
2014-07-11 14:46:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123 Properties: 0
2014-07-11 14:46:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123 Properties: 0
2014-07-11 14:46:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123 Properties: 0
2014-07-11 14:46:05 pub2 mw_wikiapiary: SMW\InternalParseBeforeLinks::performUpdate Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123 Properties: 0

FYI: I will be away from a computer most of this weekend.

Contributor

mwjames commented Jul 12, 2014

I can’t seem to find any entries like the ones we saw before

The SMW\InternalParseBeforeLinks::performUpdate with 0 I don't really mind because this was just to have a verification, SMW\LinksUpdateConstructed::performUpdate are the once that cause the issue when 0 is returned meaning an empty container.

With that said though WikiApiary is still losing millions of properties just today.

It can't be, RefreshLinksJob in connection with Template updates is the only place that could produce 0 ParserOutput and feet this to the LinksUpdateConstructed hook. Maybe their is another place that was introduced in MW 1.23 that does something like ParserCache::singleton()->getDirty( $page, $parserOptions ) that I'm not aware of.

The skew calculation as seen in the RefreshLinksJob was introduced in MW 1.23.

Contributor

thingles commented Jul 12, 2014

/cc @hexmode as maybe he is familiar with some related changes in 1.23. Hasn't this been observed in 1.22 as well though? Or is that specifically just the race condition issue.

Contributor

mwjames commented Jul 12, 2014

@hexmode Mark RefreshLinksJob introduced some changes in MW 1.23 (specifically skew calculation on parsed content [0]) which does some semi-intelligent guess work on behalf of the ParserOutput which prevents re-parsing of content and ultimately deletes SMW data [1]. This deletion is caused by the non-present of expected SMW data in the ParserOutput object that would normally be set during InternalParseBeforeLinks execution (this is similar to the redirect issue where parsing was circumvented) .

I leave it to Mark to decide how to proceed but the easiest and probably most satisfying
way to solve this is by adding something like wgUseRefreshJobSkewFactorCalculation which will only use ParserOuput skew factor calculation when it is enabled. It leaves users like @thingles to disable this functionality and ensures that page content is always parsed and not left to the job to decide when or when not to parse.

In RefreshLinksJob.php

if ( isset( $this->params['rootJobTimestamp'] &&
$wgUseRefreshJobSkewFactorCalculation ) ) {

[0] Looking at the RefreshLinksJob.php 1.22.8 we can clearly see that skew factor calculation was introduced in MW 1.23 (hence the non-present in MW 1.22.8).

[1] Issue can be reproduced by following #347 (comment)

mwjames changed the title from User-defined properties disappearing from pages to MW 1.23 / User-defined properties disappearing from pages Jul 12, 2014

mwjames added the mediawiki label Jul 12, 2014

Owner

JeroenDeDauw commented Jul 12, 2014

Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123

Haha. I wonder if this enables having fun :)

Contributor

s7eph4n commented Jul 12, 2014

Some_very_long_page_name_that_will_hopefully_never_get_created_ABCDEF123

Haha. I wonder if this enables having fun :)

What can I say. Blame it on evil globals...

@mwjames mwjames added a commit that referenced this issue Jul 14, 2014

@mwjames mwjames Hotfix for empty ParserOutput caused by RefreshLinksJob (MW 1.23+)
For details on the issue, see [0] as well as the method to replicate
the problem.

[0] #347 (comment)
e88bd38

@mwjames mwjames added a commit that referenced this issue Jul 14, 2014

@mwjames mwjames Hotfix for empty ParserOutput caused by RefreshLinksJob (MW 1.23+)
For details on the issue, see [0] as well as the method to replicate
the problem.

[0] #347#issuecomment-48595047

Change-Id: I5cf085ecad30a9807a16642226897a6b7caf8245
dc0bab6
Contributor

mwjames commented Jul 17, 2014

@thingles @kghbln What is the status? I'd like for you to test out #405 which doesn't fix the issue in RefreshLinksJob (as this a MW core issue) but it will mitigate (not solve) the problem by running an extra parse for all null edits which of course comes at an extra cost since we have to parse the whole page again (normally the data would be attached to the ParserOutput).

If you do check out this patch, revert all changes made in regards to this issue and only apply #405 and run composer dump-autoload if you fetch #405 via git.

Feedback is welcome so it can find its way into the 2.0/2.0.1 release.

mwjames referenced this issue Jul 17, 2014

Closed

2.0 release #367

24 of 24 tasks complete
Owner

kghbln commented Jul 17, 2014

@thingles It will be cool if you could switch WikiAPIary to #405 as suggested. Currently I only have very few content on my test wiki and the effect will proably be easier to see at WikiAPIary.

thingles referenced this issue in WikiApiary/WikiApiary Jul 20, 2014

Open

Import shodan list from wikiteam #170

Contributor

mwjames commented Jul 20, 2014

#405 has been merged into master, which means that if run composer update you should get the hotfix as well.

EDIT: Please make sure that all changes from above to SMW/MW-core are reverted before running the update.

Contributor

thingles commented Jul 20, 2014

@mwjames Thanks. I'll work on getting the update. On first attempt composer is complaining to me.

Your requirements could not be resolved to an installable set of packages.

  Problem 1
    - Installation request for mediawiki/semantic-media-wiki 1.9.x-dev -> satisfiable by mediawiki/semantic-media-wiki[1.9.x-dev].
    - remove mediawiki/semantic-media-wiki 1.9.x-dev|keep mediawiki/semantic-media-wiki dev-master
    - mediawiki/semantic-media-wiki 1.9.x-dev requires mediawiki/validator 1.*,>=1.0.1 -> satisfiable by mediawiki/validator[1.0.1].
    - Can only install one of: mediawiki/validator[2.0.4, 1.0.1].
    - Can only install one of: mediawiki/validator[2.0.4, 1.0.1].
    - mediawiki/semantic-media-wiki dev-master requires mediawiki/validator ~2.0,>=2.0.4 -> satisfiable by mediawiki/validator[2.0.4, 2.0.x-dev].
    - don't install mediawiki/validator 2.0.x-dev|install mediawiki/validator dev-master
    - Conclusion: don't install mediawiki/validator dev-master

I need to dig into what is happening there.

Contributor

mwjames commented Jul 20, 2014

It seems the current master (as of 2.0) needs mediawiki/validator": "~2.0,>=2.0.4" but your composer.lock file refers to dev-master": "1.9.x-dev" instead of dev-master": "2.0.x-dev"

@JeroenDeDauw Any thoughts?

Owner

JeroenDeDauw commented Jul 20, 2014

Well, something is referring to 1.9.x... It might well be the case that master of some extensions are not currently compatible with SMW master.

Contributor

thingles commented Jul 23, 2014

WikiApiary is now running 2.0-rc3 (ref WikiApiary/WikiApiary#177). Regarding above I updated my composer.json to ~2.0@dev as @JeroenDeDauw suggested in the email to Semantic MediaWiki users and all went well after that.

Contributor

mwjames commented Jul 24, 2014

While #405 tries to mitigate the effect of RefreshLinksJob has on the ParserOutput there is one scenario that we can't test (we were lucky to had a case to actual force the issue) and it will be nearly impossible for SMW to solve it.

#405 does a re-parse in case the ParserOutput is empty (as described here), now RefreshLinksJob does some magic to pull data from the cache and it could be that those data are stale but contain (however unlikely) some semantic data and in those cases the check in #405 will be negative and therefore it would not trigger an extra parse (as we expect those data to be the newest) and the update continues with data from the original ParserOutput.

If we still see those property values to decay then the only fix that avoid the issue is in core by doing #347 (comment) (well I hope not so but either ParserOutput contains the newest or no data anything in between is just a gamble).

@mwjames mwjames added a commit that referenced this issue Jul 24, 2014

@mwjames mwjames Clarify the reason behind #347
[skip ci]
dc47466

mwjames added this to the SMW 2.0 milestone Aug 5, 2014

mwjames closed this Aug 5, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment