search_api_solr support for EDTF #962

Open
seth-shaw-unlv opened this issue Oct 29, 2018 · 17 comments
Labels
Subject: Drupal related specifically to Drupal, usually pointing somewhere on drupal.org

@seth-shaw-unlv
Contributor

Currently EDTF field values do not fit the SOLR syntax for DateField or DateRangeField. (See SOLR "Working with Dates".)

E.g. EDTF uses a "/" to separate the beginning and end of a date range whereas SOLR wraps ranges in square brackets and uses " TO " as a separator. This would mean converting 2000/2018 to [2000 TO 2018].
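The conversion described above can be sketched in a few lines. This is a minimal Python illustration rather than the Drupal processor itself; the function name `edtf_to_solr_range` is hypothetical, and the sketch assumes plain closed intervals only (no EDTF qualifiers such as `~` or `?`, and no open-ended intervals).

```python
def edtf_to_solr_range(value: str) -> str:
    """Convert a simple EDTF interval such as '2000/2018' to Solr range syntax.

    Assumption: the value is either a single date or a closed 'start/end'
    interval. Qualifiers and open-ended intervals are not handled here.
    """
    value = value.strip()
    if "/" not in value:
        # Not an interval: pass the single date through unchanged.
        return value
    start, end = value.split("/", 1)
    # Solr wraps ranges in square brackets and separates the ends with " TO ".
    return f"[{start} TO {end}]"


print(edtf_to_solr_range("2000/2018"))
```

A real processor would also need to decide what to do with EDTF features that have no direct Solr equivalent.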

We can write a Drupal Search API index preprocessor to do the conversion. The simplest example processor to follow as a guide is probably IgnoreCase.

@ppound
Member

ppound commented Dec 4, 2018

@dannylamb, @seth-shaw-unlv If no one else is working on this feel free to assign it to me and I can take a look at it.

@seth-shaw-unlv
Contributor Author

@ppound, funny you should say that. I just started working on it this morning. The search_api processors are new to me, so we'll see how it goes. I'll ping you if I get stuck.

@seth-shaw-unlv
Contributor Author

So, it looks like my attempt to simply use widgets and formatters on a text field is coming back to bite me. The Drupal Search API wants to parse the values as dates before we even get to the Processor plugins, but DateTimePlus isn't a fan of our string value and throws an error before we can do anything about it.

It looks like we will need to create an actual FieldType to make this work...

@seth-shaw-unlv
Contributor Author

The index field's datatype setting is what sets the SOLR schema, so if we want SOLR to view a field as a date, we need to declare the datatype as such there. However, the Search API FieldsHelper will pull the field value and try to parse fields with the date data type using date_parse. To get around this behavior, we need to extend the SOLR DateRangeDataType and override getValue so that we can transform the value to an ISO-friendly format first.

@seth-shaw-unlv
Contributor Author

Never mind, providing a new DataType doesn't work either, because search_api_solr has a hard-coded list of data types it supports. So if we want this to work, we need to either extend or mimic the Datetime Range FieldType.

@ppound
Member

ppound commented Dec 5, 2018

OK, I poked around at this a bit as well before I saw your message. It sounds like we went down the same paths. There is some code here, ppound/controlled_access_terms@31b2f16, that will index the fields into Solr as daterange fields (after enabling the processor and setting the fields to the correct type in the search-api config). Searching within Solr works, but I haven't tried searching from within Drupal yet (which is probably where I'll get stuck too).

@seth-shaw-unlv
Contributor Author

@ppound I tried your branch and it still won't index in SOLR (6.6.5) for me. Also, the Search API Data Types will only use the String fallback.
[Screenshot: the EDTF Date Range data type shown as not supported, with String as the fallback data type]

I think we really will need to revamp EDTF to get it to work.

@ppound
Member

ppound commented Dec 5, 2018

Yeah I agree on the EDTF revamp.

Using the dr prefix in the datatype annotations will give us daterange fields in Solr, but nothing else in Drupal knows that they are date ranges.
[Screenshot taken 2018-12-05 at 12:57:34 PM]

@seth-shaw-unlv
Contributor Author

This doesn't seem to be documented anywhere, so I'm making a note here: using the Date Range data type requires SOLR 7.x. If you select the Date Range type with SOLR 5.x or 6.x it will silently fail to index the field; you have to use the Date data type and index end_value as a separate date field.
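For the Solr 5.x/6.x fallback, the start and end of a range have to be indexed as two separate date values. The following Python sketch shows one way that split could look; it is an illustration under stated assumptions, not code from the module. The helper names are hypothetical, and it assumes that a missing month or day should be widened to the first or last possible moment and that years are positive four-digit values.

```python
import calendar
from typing import Tuple


def _bound(date: str, end: bool) -> str:
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to a full Solr date string."""
    parts = [int(p) for p in date.split("-")]
    year = parts[0]
    # Assumption: a missing month widens to January (start) or December (end).
    month = parts[1] if len(parts) > 1 else (12 if end else 1)
    if len(parts) > 2:
        day = parts[2]
    else:
        # Assumption: a missing day widens to the first or last day of the month.
        day = calendar.monthrange(year, month)[1] if end else 1
    time = "23:59:59" if end else "00:00:00"
    return f"{year:04d}-{month:02d}-{day:02d}T{time}Z"


def edtf_to_date_pair(value: str) -> Tuple[str, str]:
    """Split a simple EDTF date or interval into (start, end) date strings."""
    start, _, end = value.strip().partition("/")
    # A single date is treated as a range covering its own precision.
    return _bound(start, end=False), _bound(end or start, end=True)


print(edtf_to_date_pair("1945/1947"))
```

The two returned strings would then go into two separate Date fields, for example the field's value and its end_value.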

@seth-shaw-unlv
Contributor Author

Made some progress.

I have a new EDTF FieldType that repurposes the existing widget and formatter. The Search API seems to work, as single values are successfully indexed in Solr 7.x as date ranges!

Multi-values don't work yet nor have I attempted the JSON-LD pieces. Also, don't enable the controlled_access_terms_default_configuration as I haven't updated those configs to use the new field yet. (Also, there is plenty of code cleanup that could be done.)

@seth-shaw-unlv
Contributor Author

Bah, I'm walking away from this. 😒 I've gotten SOLR to take the date ranges but not as single dates. Also, it doesn't appear that the search API wants to query them anyway; the facets module barely supports datetime and doesn't support datetime_range at all. You probably could get it to work by writing several custom plugins, but it doesn't seem worth it just to get a nice slider facet.

It looks like string-based EDTF, as suggested during the recent call, is the best way to go. It indexes just fine:
[Screenshot: the SOLR admin query screen showing the results of a query, including EDTF dates as strings]
and you can produce decent facets with it:
[Screenshot: a search results page including a date facet block on the right]

I think we may need to stick with that, for now.

@seth-shaw-unlv
Contributor Author

Note: if you want to spin up what I have so far:

  1. pull down the claw-playbook
  2. update the drupal_composer_dependencies variable in inventory/vagrant/group_vars/webserver/drupal.yml to use 'islandora/islandora_demo:dev-issue-962'.
  3. search for 'controlled_access_terms_default_configuration' and replace with 'controlled_access_terms_defaults' (should make three replacements)
  4. vagrant up

That should spin you up a fresh instance with all the various EDTF fields now set to EDTF FieldType instead of string.

@whikloj whikloj added this to the 1.x milestone Apr 11, 2019
@kspurgin
Contributor

My concern about indexing EDTF dates (and a number of other fields currently set up as strings by default) as strings in Solr is that the Solr string data type does not permit partial matches.

Thus, in your screenshot above, if you searched for 1945, you aren't going to get the item with 1945/1947 as a result.

Likewise, a search for 1946 will only return the 22 items with that exact value, and will not include the 4 with 1946-06 or 1946-06-14.

That's great for when you click on the facet value, but not so great if you let users type in a search. They will always be getting artificially small search sets.

Have we done something under the hood to help search work as expected on string fields? (I don't remember details, but on another project I worked on, I think we ended up defining a "string-like" Solr field type that didn't get any of the language-processing (stemming, etc) treatment but got whatever basic edge/ngram processing was necessary to make exact-but-partial string match work)
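To make the difference concrete, here is a small Python model of exact string matching versus a prefix ("edge n-gram") index. It only imitates the idea and is not how Solr implements either field type; the sample values, function names, and minimum gram length of 4 are assumptions chosen for illustration.

```python
from typing import List, Set

# Hypothetical indexed values, modelled on the examples in this thread.
INDEXED = ["1945/1947", "1946", "1946-06", "1946-06-14"]


def edge_ngrams(value: str, min_len: int = 4) -> Set[str]:
    """Return every prefix of value that is at least min_len characters long."""
    return {value[:n] for n in range(min_len, len(value) + 1)}


def exact_search(term: str) -> List[str]:
    """String-type behaviour: the whole value must equal the search term."""
    return [v for v in INDEXED if v == term]


def edge_search(term: str) -> List[str]:
    """Edge n-gram behaviour: the term may match any indexed prefix."""
    return [v for v in INDEXED if term in edge_ngrams(v)]


print(exact_search("1946"))
print(edge_search("1946"))
```

Note that prefix matching still would not find 1945/1947 on a search for 1947, since 1947 is not at the start of the value.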

I ask because I believe I'm looking at an out-of-the-box Islandora install that has some custom field types like Fulltext "edgestring" and Fulltext "ngramstring" in the data types area at the bottom of admin/config/search/search-api/index/default_solr_index/fields, but all they say for Description is "Custom full text field"

@seth-shaw-unlv
Contributor Author

seth-shaw-unlv commented Dec 14, 2020

I admit to being a SOLR novice. I haven't played with any of the other Fulltext variants to see how they impact search results (yet, it is on the list).

I should also note, while I'm at it, that there have been a number of conversations on this topic, mostly on Slack, since I last made an update in late 2018. The current thinking is that a Search API processor is the best way forward, instead of trying to extend the DateTime fieldtype. The most progress has been made by @joecorall and @elizoller, who have implemented year-based date facets (omitting months, days, etc.) by using a field processor to index the year of an EDTF date.
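As a rough idea of what indexing the year of an EDTF date involves, here is a hedged Python sketch. It is not the processor mentioned above; the function name is hypothetical, and expanding an interval into every year it covers is an assumption about what a year facet would want. Values without a plain four-digit leading year (for example, years with unspecified digits) are simply skipped.

```python
import re
from typing import List

# Matches a leading four-digit year, optionally negative.
YEAR = re.compile(r"^-?\d{4}")


def edtf_years(value: str) -> List[int]:
    """Return the year(s) to index for a simple EDTF date or interval."""
    years = []
    for endpoint in value.split("/"):
        match = YEAR.match(endpoint.strip())
        if match:
            years.append(int(match.group()))
    # Assumption: a closed interval contributes every year it spans.
    if len(years) == 2 and years[0] <= years[1]:
        return list(range(years[0], years[1] + 1))
    return years


print(edtf_years("1945/1947"))
```

With years indexed this way, an item dated 1945/1947 would appear under the 1945, 1946, and 1947 facet values.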

@joecorall
Member

FWIW, here's the processor being used for the EDTF year facet on Open Access Kent State: https://gist.github.com/joecorall/fa914809af3304cdd98194d929d1bad9

@kspurgin
Contributor

kspurgin commented Feb 5, 2021

meta-issue: #1748

elizoller added a commit to Islandora/controlled_access_terms that referenced this issue May 3, 2021
@seth-shaw-unlv
Contributor Author

Instead of leaving my EDTF-as-a-FieldType branch lying around cluttering things up, I decided to simply make a patch file and post it here in case anyone wants to come back and reference it.

@kstapelfeldt kstapelfeldt added Subject: Drupal related specifically to Drupal, usually pointing somewhere on drupal.org and removed drupal labels Sep 25, 2021

6 participants