Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulltext searching in datastore query #3812

Merged
merged 9 commits into from
Aug 5, 2022
Merged

Fulltext searching in datastore query #3812

merged 9 commits into from
Aug 5, 2022

Conversation

dafeder
Copy link
Member

@dafeder dafeder commented Jul 29, 2022

This is a somewhat experimental feature for now and will not be fully documented until we have used it a bit more in the wild. But this allows fulltext boolean match queries against a datastore resource in MySQL if the column(s) being queries are properly indexed.

This PR also adds two new keys to the schema object in the datastore response. One is indexes which lists all standard indexes on the database table, as per the Drupal Schema API spec. Another is fulltext indexes, which follows the same pattern. This is not part of the official Drupal Schema spec but I felt was needed to signal to the user what columns are available for fulltext searching, and matches the rest of the spec as closely as possible.

QA steps

This will need to be test locally since for now we need to create the fulltext indexes manually. Set up a local site with the demo content.

  1. Ensure that you have a datastore at /api/1/datastore/query/d460252e-d42c-474a-9ea9-5287b1d595f6/0
  2. Try running a query before indexing. You should get a 400 error "You have attempted a fulltext match against a column that is not indexed for fulltext searching."
POST https://fulltext-query.ddev.site/api/1/datastore/query/d460252e-d42c-474a-9ea9-5287b1d595f6/0

{
  "conditions": [
    {
      "property": "city",
      "operator": "match",
      "value": "York"
    }
  ]
}
  1. Use drush dkan:dataset-info d460252e-d42c-474a-9ea9-5287b1d595f6 to get the table name of the datastore. Let's say its datastore_efee6d8c0ee35d0b1a161743806d018d.
  2. Connect to your db CLI with drush sqlc.
  3. Create a fulltext index on the city field:
ALTER TABLE datastore_efee6d8c0ee35d0b1a161743806d018d ADD FULLTEXT INDEX `cityfulltext` (`city`);
  1. Retry the query from step 2.
  2. Confirm that you see a result and got a single hit for "New York."
  3. Try again with a wildcard:
POST https://fulltext-query.ddev.site/api/1/datastore/query/d460252e-d42c-474a-9ea9-5287b1d595f6/0

{
  "conditions": [
    {
      "property": "city",
      "operator": "match",
      "value": "Yor*"
    }
  ]
}
  1. Confirm that you still get a hit for "New York".
  2. Check the schema object in the response and confirm that the fulltext index is listed properly.
  3. Now let's try an index with two columns. Go back to the db CLI and let's create a second index that combines city and state:
ALTER TABLE datastore_efee6d8c0ee35d0b1a161743806d018d ADD FULLTEXT INDEX `citystate` (`city`, `state`);
  1. We can try a few queries that should demonstrate how this can be leveraged:
POST https://fulltext-query.ddev.site/api/1/datastore/query/d460252e-d42c-474a-9ea9-5287b1d595f6/0

{
  "conditions": [
    {
      "property": "city,state",
      "operator": "match",
      "value": "San"
    }
  ]
}
POST https://fulltext-query.ddev.site/api/1/datastore/query/d460252e-d42c-474a-9ea9-5287b1d595f6/0

{
  "conditions": [
    {
      "property": "city,state",
      "operator": "match",
      "value": "Texas"
    }
  ]
}
  1. Confirm that in both cases you get multiple hits from both city and state fields. Also confirm that the newer fulltext index is included correctly in the schema object.

@dafeder
Copy link
Member Author

dafeder commented Jul 29, 2022

CodeClimate is complaining about 21 methods in SelectFactory. I think a refactor there that would actually lead to cleaner code would be a big lift and make more sense as part of work to pull AbstractDatbaseTable and its derivatives out of the rest of DKAN. That's way out of scope for this issue so I'd argue that we should mark this as a known issue for now but give it a pass. I'll leave that up to whoever reviews this though.

@dafeder dafeder marked this pull request as ready for review July 29, 2022 21:08
@dafeder dafeder requested a review from clayliddell July 29, 2022 21:12
Copy link
Contributor

@clayliddell clayliddell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall structure looks good. I've left a couple suggestions related to test coverage and one comment related to the new index portion of the schema.

Co-authored-by: Clayton Liddell <clayton.liddell@civicactions.com>
@dafeder
Copy link
Member Author

dafeder commented Aug 3, 2022

Whoops somehow merging those suggestions broke everything @clayliddell - will investigate. FWIW I'm not a big fan of that feature in github. Also I noticed a potential issue in my code - I just have that one placeholder :words for the matching terms, which will cause an issue if someone adds multiple MATCH conditions to a single query. I'll add an iterator to the end of it so we get :words0 words1 etc.

@clayliddell clayliddell self-requested a review August 5, 2022 21:10
@clayliddell clayliddell merged commit f432511 into 2.x Aug 5, 2022
@clayliddell clayliddell deleted the fulltext-query branch August 5, 2022 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants