New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DISTINCT with GROUP BY in SQLStore::getPropertySubjects #3142

Merged
merged 1 commit into from Apr 28, 2018

Conversation

Projects
None yet
1 participant
@mwjames
Copy link
Contributor

mwjames commented Apr 28, 2018

This PR is made in reference to: #2040, #305, #2519

This PR addresses or contains:

  • Replaces DISTINCT with GROUP BY
  • Using GROUP BY smw_sort, smw_id to retain distinctiveness

This PR includes:

  • Tests (unit/integration)
  • CI build passed

Issue

The following analysis is based on a smw_object_ids table with 177,925 rows and a smw_di_wikipage table with 1,478,566 rows.

A SELECT DISTINCT in combination with a ORDER BY will in most events cause a filesort as demonstrated in the SMWSQLStore3Readers::getPropertySubjects method.

SELECT DISTINCT smw_title, smw_namespace, smw_iw, smw_subobject, smw_sortkey, smw_sort
FROM `smw_object_ids`
INNER JOIN `smw_di_wikipage` AS t1 ON t1.s_id=smw_id
WHERE t1.p_id='310167' AND smw_iw!=':smw' AND smw_iw!=':smw-delete' AND smw_iw!=':smw-redi'
ORDER BY smw_sort
LIMIT 51

14350.2071ms | SMWSQLStore3Readers::getPropertySubjects

The explain output shows that is has to read 177050 rows into a temporary_table before in can order them.

{
 "query_block": {
   "select_id": 1,
   "filesort": {
     "temporary_table": {
       "function": "buffer",
       "table": {
         "table_name": "smw_object_ids",
         "access_type": "range",
         "possible_keys": ["PRIMARY", "smw_iw", "smw_iw_2", "smw_id"],
         "key": "smw_iw",
         "key_length": "34",
         "used_key_parts": ["smw_iw"],
         "rows": 177050,
         "filtered": 100,
         "index_condition": "((smw_object_ids.smw_iw <> ':smw') and (smw_object_ids.smw_iw <> ':smw-delete') and (smw_object_ids.smw_iw <> ':smw-redi'))"
       },
       "table": {
         "table_name": "t1",
         "access_type": "ref",
         "possible_keys": ["p_id", "s_id", "s_id_2", "s_id_3", "p_id_2"],
         "key": "s_id",
         "key_length": "8",
         "used_key_parts": ["s_id", "p_id"],
         "ref": ["mw-master.smw_object_ids.smw_id", "const"],
         "rows": 3,
         "filtered": 100,
         "using_index": true,
         "distinct": true
       }
     }
   }
 }
}

Solution

Adding a new smw_sort,smw_id index and replacing DISTINCT with GROUP BY achieves an improvement in terms of query execution time from 14350.2071ms to 533.3791ms . This also avoids a temporary table which would cause a filesort and hereby a lengthen in execution time.

SELECT smw_title, smw_namespace, smw_iw, smw_subobject, smw_sortkey, smw_sort
FROM `smw_object_ids`
INNER JOIN `smw_di_wikipage` AS t1 ON t1.s_id=smw_id
WHERE t1.p_id='310167' AND smw_iw!=':smw' AND smw_iw!=':smw-delete' AND smw_iw!=':smw-redi'
GROUP BY smw_sort, smw_id
ORDER BY smw_sort LIMIT 51

533.3791ms | SMWSQLStore3Readers::getPropertySubjects
{
 "query_block": {
   "select_id": 1,
   "table": {
     "table_name": "smw_object_ids",
     "access_type": "index",
     "possible_keys": ["PRIMARY", "smw_iw", "smw_iw_2", "smw_id"],
     "key": "smw_sort",
     "key_length": "262",
     "used_key_parts": ["smw_sort", "smw_id"],
     "rows": 17,
     "filtered": 100,
     "attached_condition": "((smw_object_ids.smw_iw <> ':smw') and (smw_object_ids.smw_iw <> ':smw-delete') and (smw_object_ids.smw_iw <> ':smw-redi'))"
   },
   "table": {
     "table_name": "t1",
     "access_type": "ref",
     "possible_keys": ["p_id", "s_id", "s_id_2", "s_id_3", "p_id_2"],
     "key": "s_id",
     "key_length": "8",
     "used_key_parts": ["s_id", "p_id"],
     "ref": ["mw-master.smw_object_ids.smw_id", "const"],
     "rows": 3,
     "filtered": 100,
     "using_index": true
   }
 }
}

@mwjames mwjames added the performance label Apr 28, 2018

@mwjames mwjames added this to the SMW 3.0.0 milestone Apr 28, 2018

@mwjames mwjames merged commit aac21e2 into master Apr 28, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@mwjames mwjames deleted the db-group-by branch Apr 28, 2018

@mwjames mwjames referenced this pull request Jul 28, 2018

Merged

Add DataItemHandler::getIndexHint, refs 3142 #3261

1 of 2 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment