
"rebuildData.php" emitting page content rather than page title #2001

Closed
kghbln opened this issue Nov 8, 2016 · 12 comments
Labels
bug

Comments

@kghbln
Member

@kghbln kghbln commented Nov 8, 2016

Setup and configuration

  • MediaWiki 1.28.0-rc.0 (7625c75) 15:14, 3. Nov. 2016
  • PHP 5.6.27-0+deb8u1 (apache2handler)
  • MariaDB 10.0.28-MariaDB-1~jessie
  • Semantic MediaWiki 2.4.1 (20996e2) 00:54, 7. Nov. 2016

Issue

When running "rebuildData.php" on the sandbox, the script sometimes emits the "page content" rather than the page title:

Example one

Script output:

(3460/1640)     Finished processing ID 3461 (Utilisateur:Lalquier)
(3462/1640)     Finished processing ID 3463 (This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the#0#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_)
(3463/1640)     Finished processing ID 3464 (Lalquier#2#_QUERYf2efbfaa4ebf54f9a38b0e5c7b2c76dd)

ID lookup:

[
    3463,
    {
        "smw_title": "Lalquier",
        "smw_namespace": "2",
        "smw_iw": "",
        "smw_subobject": "_QUERYf2efbfaa4ebf54f9a38b0e5c7b2c76dd",
        "smw_sortkey": "Lalquier"
    }
]

Wiki page: see here

Example two

Script output:

(3842/1640)     Finished processing ID 3843 (SEA:Internal#0#_QUERYc62ef87cee27cd0afd3f6a2ad07efd95)
(3843/1640)     Finished processing ID 3844 (This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the#0#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._"Is_specified_asWorld_Heritage_Site")_to_build_structured_and_queryable_content_provided_)
(3844/1640)     Finished processing ID 3845 (SubobjectTemplateLinkNone#0#_QUERY18ce294ba9208058a3e2d35798d8c299)

ID lookup:

[
    3844,
    {
        "smw_title": "SubobjectTemplateLinkNone",
        "smw_namespace": "0",
        "smw_iw": "",
        "smw_subobject": "_QUERY18ce294ba9208058a3e2d35798d8c299",
        "smw_sortkey": "SubobjectTemplateLinkNone"
    }
]

Wiki page: see here

This may be related to #1963, although links in values are not configured for this wiki.
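To make the odd labels easier to read, here is a minimal Python sketch that splits a label from the "Finished processing ID ..." lines into title, namespace and subobject. It assumes the `title#namespace#subobject` shape visible in the output above; that shape is inferred from these examples, not taken from the rebuildData.php source, and `parse_label`/`ProgressLabel` are hypothetical names. Applied to a shortened copy of the odd label, it shows that the "title" is just wiki text up to the first `#` and the "subobject" is the text after `#ask`.

```python
from typing import NamedTuple, Optional


class ProgressLabel(NamedTuple):
    title: str
    namespace: Optional[int]
    subobject: str


def parse_label(label: str) -> ProgressLabel:
    """Split a progress label, assuming the "title#namespace#subobject" shape
    seen in the output above (inferred, not taken from SMW's source)."""
    parts = label.split("#", 2)  # at most 3 parts; later '#' stay in the subobject
    if len(parts) < 3:
        # Plain page label such as "Utilisateur:Lalquier".
        return ProgressLabel(label, None, "")
    title, namespace, subobject = parts
    return ProgressLabel(title, int(namespace), subobject)


# Shortened tail of the odd label from example one.
odd = parse_label(
    "For_a_comprehensive_description_on_how_to_use_annotations_or_the"
    "#0#ask_parser_function,_please_have_a_look_at_the_getting_started"
)
print(odd.title[-6:], odd.namespace, odd.subobject[:19])  # prints: or_the 0 ask_parser_function
```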

@kghbln kghbln added the discussion label Nov 8, 2016
@mwjames

Contributor

@mwjames mwjames commented Nov 12, 2016

I cannot really place this content. Does the following query return any results?

SELECT * 
FROM  `smw_object_ids` 
WHERE  `smw_title` LIKE  '%This_page_supports_semantic_in-text_annotations%'
ORDER BY  `smw_object_ids`.`smw_iw` DESC 
LIMIT 0 , 30
@kghbln

Member Author

@kghbln kghbln commented Nov 12, 2016

See this Gist: https://gist.github.com/kghbln/d342edf390b9f48355992b72dba8d862

@mwjames

Contributor

@mwjames mwjames commented Nov 12, 2016

This is good, so we know there are no ghost pages. These are leftovers from #1963; the PropertyTableIdReferenceDisposer should catch them, but I'm guessing that because they have a subobject attached, [0] comes into play.

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/SQLStore/EntityRebuildDispatcher.php#L269-L271

@kghbln

Member Author

@kghbln kghbln commented Nov 12, 2016

I trust your assessment here. Note that these leftovers also get created when rebuilding data from scratch. Because I switched back and forth to 2.4.x, I always deleted the semantic backend and recreated all the data, including the leftovers.

@kghbln

This comment has been minimized.

Copy link
Member Author

@kghbln kghbln commented Mar 4, 2017

@mwjames Is this an issue that could or should be addressed with a pull request?

@mwjames

This comment has been minimized.

Copy link
Contributor

@mwjames mwjames commented Mar 4, 2017

I would have to find a way to replicate the issue locally, repeatedly and consistently to see what's causing it. During general testing I didn't come across such an issue, so it would be rather difficult for me to make time for an investigation from scratch.

@kghbln

Member Author

@kghbln kghbln commented Mar 4, 2017

Thanks for the info. Indeed the wikis showing this behaviour do not fall apart and the processes are not interrupted in any way. So there is something in the water but it is not top priority. Fair enough I believe. :)

@kghbln kghbln added bug and removed discussion labels Mar 4, 2017
@kghbln kghbln mentioned this issue Jan 6, 2018
@kghbln

Member Author

@kghbln kghbln commented Aug 7, 2018

This is still happening, but it has not done any harm for quite some time. This one may be reopened in the future if there are concerns.

@kghbln kghbln closed this Aug 7, 2018
@mwjames

Contributor

@mwjames mwjames commented Aug 7, 2018

@kghbln

Member Author

@kghbln kghbln commented Aug 7, 2018

[0] contains an analysis of what is happening for this particular case.

Thanks for the pointer. Great! This issue can no longer be observed!

@mwjames

Contributor

@mwjames mwjames commented Aug 11, 2018

{{#set:Description=This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.This page supports semantic in-text annotations (e.g. "Is specified asWorld Heritage Site") to build structured and queryable content provided by Semantic MediaWiki. For a comprehensive description on how to use annotations or the #ask parser function, please have a look at the getting started, in-text annotation, or inline queries help page.}}

@kghbln I was a bit unsatisfied with the analysis of the real cause of the issue, so let me reiterate.

The issue is twofold. First, an annotation to Description (a property of type Page) with a value containing # is parsed as title text with a fragment, which SMW identifies as a fake subobject. The parsing itself does no harm: while # should be avoided in a page title (or value annotation), using it within SMW doesn't cause any issues, even though it occupies the smw_subobject ID field.
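A minimal Python sketch of this first part, assuming the value is split at the first `#`, the title part is trimmed, and spaces become underscores. These rules are inferred from the `smw_title`/`smw_subobject` values in the query below, not from SMW's code, and `split_page_value` is a hypothetical name.

```python
from typing import Tuple


def split_page_value(value: str) -> Tuple[str, str]:
    """Hypothetical helper: turn a Page-type value containing '#' into a
    title plus a (fake) subobject, under the assumptions stated above."""
    title_text, sep, fragment = value.partition("#")  # split at the first '#' only
    title = title_text.strip().replace(" ", "_")
    # Any later '#' characters remain part of the subobject string.
    subobject = fragment.replace(" ", "_") if sep else ""
    return title, subobject


print(split_page_value("use annotations or the #ask parser function"))
# prints ('use_annotations_or_the', 'ask_parser_function')
```

Because only the first `#` splits the value, a long Description that repeats "#ask" several times yields one very long subobject string, which is what the query below tries to match.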

Second, when SMW tries to find an ID for such an entity, it uses the entire string to match the smw_subobject field, and in the above case that string exceeds the 255-character length restriction on MySQL/MariaDB.

SELECT /* SMWSql3SmwIds::getDatabaseIdAndSort */ smw_id, smw_sortkey, smw_sort
FROM `smw_object_ids`
WHERE smw_title = 'This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the'
  AND smw_namespace = '0'
  AND smw_iw = ''
  AND smw_subobject = 'ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the_#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.This_page_supports_semantic_in-text_annotations_(e.g._\"Is_specified_asWorld_Heritage_Site\")_to_build_structured_and_queryable_content_provided_by_Semantic_MediaWiki._For_a_comprehensive_description_on_how_to_use_annotations_or_the_#ask_parser_function,_please_have_a_look_at_the_getting_started,_in-text_annotation,_or_inline_queries_help_page.'
LIMIT 1

Unfortunately, the above query always returns false on MySQL/MariaDB, which causes the ID request to create a new ID whenever it tries to match the string value for that annotation. So the observed duplicates are caused by the way MySQL/MariaDB matches long content against length-restricted fields.
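The effect can be reproduced with a toy in-memory model in Python. This is a hypothetical sketch, not SMW code: it assumes the stored value is cut to a 255-character column width (as MySQL does in non-strict sql_mode) while the lookup compares the full string, so every lookup misses and a fresh ID is created each time.

```python
from typing import Dict, Optional, Tuple

MAX_FIELD_LEN = 255  # assumed width of smw_title/smw_subobject on MySQL/MariaDB

Row = Tuple[str, int, str, str]  # (title, namespace, interwiki, subobject)


class TruncatingIdTable:
    """Toy model of smw_object_ids on MySQL/MariaDB (hypothetical)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.next_id = 1

    @staticmethod
    def _stored(value: str) -> str:
        # Assumption: over-long values are silently truncated on insert
        # (non-strict sql_mode); strict mode would reject the insert instead.
        return value[:MAX_FIELD_LEN]

    def find_id(self, key: Row) -> Optional[int]:
        # The lookup compares the full, untruncated string, like the query above.
        for smw_id, row in self.rows.items():
            if row == key:
                return smw_id
        return None

    def get_or_create_id(self, key: Row) -> int:
        smw_id = self.find_id(key)
        if smw_id is None:
            smw_id = self.next_id
            self.next_id += 1
            title, namespace, iw, subobject = key
            self.rows[smw_id] = (self._stored(title), namespace, iw, self._stored(subobject))
        return smw_id


table = TruncatingIdTable()
long_key = ("Some_page", 0, "", "x" * 300)  # subobject longer than 255 chars
print([table.get_or_create_id(long_key) for _ in range(3)])  # prints [1, 2, 3]
```

A key whose strings fit within 255 characters is found again on the second call, so only long values produce duplicates.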

Recreating the same use case on Postgres and executing the same query doesn't create any duplicates for this entity, because Postgres has no length restriction on the involved fields.

MySQL/MariaDB should find the ID even with a truncated content value on a restricted field but it doesn't, so we have to account for this.
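One possible way to account for this, sketched in Python under the same assumptions (hypothetical names, not necessarily what SMW implemented): apply the column-width truncation to the lookup key as well, so the stored and requested values compare equal.

```python
from typing import Dict, Tuple

MAX_FIELD_LEN = 255  # assumed column width on MySQL/MariaDB

Row = Tuple[str, int, str, str]  # (title, namespace, interwiki, subobject)


def normalize(key: Row) -> Row:
    """Truncate the string fields of the lookup key to the column width,
    mirroring what the database stores (hypothetical mitigation)."""
    title, namespace, iw, subobject = key
    return (title[:MAX_FIELD_LEN], namespace, iw, subobject[:MAX_FIELD_LEN])


class NormalizedIdTable:
    def __init__(self) -> None:
        self.ids: Dict[Row, int] = {}

    def get_or_create_id(self, key: Row) -> int:
        stored = normalize(key)
        if stored not in self.ids:
            self.ids[stored] = len(self.ids) + 1
        return self.ids[stored]


table = NormalizedIdTable()
long_key = ("Some_page", 0, "", "x" * 300)
print([table.get_or_create_id(long_key) for _ in range(3)])  # prints [1, 1, 1]
```

The trade-off is that two different values sharing the same first 255 characters map to the same ID. Hashing the full value into a fixed-length column would avoid that, at the cost of a schema change.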

@kghbln

Member Author

@kghbln kghbln commented Aug 11, 2018

Very interesting information. Thanks a lot for the elaboration which will help others too.

Projects
None yet
2 participants