upgrade to Solr 9 #572
Comments
refer to commits on Princeton-CDH/geniza#1359 |
With this upgrade, we'll need to make updates to the solr config file ( Note that this also seems to relate to some of the failed tests I'm running into (I'm using Solr 8.11.2). |
@laurejt I pulled your branch and uploaded the config files to a new configset in my existing docker solr 9.2 instance. I was able to create a new collection with that configset, and had no problem indexing works ( When I tried to index page content ( Not sure if we want to commit index field changes yet, so here's what I did:
Potential Issue: deprecation of Trie fields Is it okay to just convert these instances to their corresponding Point fields? It looks like the "equivalent" declarations do not set the |
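As a rough illustration of the Trie-to-Point conversion discussed here, the sketch below swaps the deprecated classes in a schema held as an XML string. The class names are Solr's own Trie and Point field classes; the `convert_trie_types` helper and the sample schema are hypothetical and are not taken from this project's managed-schema. It also drops `precisionStep`, which only applies to Trie fields. Whether other attributes should carry over is a separate decision that this sketch does not make.

```python
import xml.etree.ElementTree as ET

# Solr's deprecated Trie field classes and their Point replacements.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}


def convert_trie_types(schema_xml):
    """Return schema XML with Trie fieldType classes swapped for Point classes."""
    root = ET.fromstring(schema_xml)
    for field_type in root.iter("fieldType"):
        point_class = TRIE_TO_POINT.get(field_type.get("class"))
        if point_class is None:
            continue  # not a Trie type, leave untouched
        field_type.set("class", point_class)
        # precisionStep only applies to Trie fields, so remove it if present.
        field_type.attrib.pop("precisionStep", None)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample schema, for illustration only.
SAMPLE = """<schema name="example" version="1.6">
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" docValues="true"/>
  <fieldType name="string" class="solr.StrField"/>
</schema>"""

converted = convert_trie_types(SAMPLE)
print(converted)
```

Existing data still has to be reindexed after a change like this, since Trie and Point fields are stored differently in the index.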
I found another thing we need to handle, just added to our checklist: we have a couple of copy fields defined in the old version of the schema https://github.com/Princeton-CDH/ppa-django/blob/main/solr_conf/conf/managed-schema#L529-L530 This is so we can index things two different ways: in this case, one version for tokenized text-based search and one for facet/sort. |
Is there a reason that managed-schema#L135, doesn't set |
Thanks for flagging. I think the only date field we have is the last modified timestamp, which we aren't faceting on; we only use it for last-modified header checks. We use an integer field for the pubdate, which we do facet on. So I think this is ok, but we'll want to check the last-modified header behavior. (Could be formatting issues, maybe) |
Probably because I didn't know about this setting! It looks useful for non-contiguous text blocks; I can't think of anywhere this wouldn't be helpful. |
Oddly, additional I'm going to go ahead and ignore / drop these, since the type is deprecated as it is. |
One thing to note is that this holds true for all other basic numeric types (e.g., |
For |
Worth highlighting that the field type |
For |
Should dynamic fields of the form |
Sounds like an oversight and/or leftover; let's remove the redundant filter. |
I think that the facet and sort fields are ones we will want to look at once we get things basically working with solr 9 - I remember customizing those some (like adding unicode folding, I think?) but not the specifics. I do remember we had to use text instead of string because you can't apply filters like unicode folding to string fields. |
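To make the folding point concrete: Solr string fields keep the value verbatim with no analysis chain, so any normalization has to happen in a text field type. The Python sketch below only approximates what accent and case folding does to sort order; it is not Solr's folding filter, and the `fold_for_sort` helper and sample titles are made up for illustration.

```python
import unicodedata


def fold_for_sort(value):
    """Approximate folding: decompose, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Made-up sample values.
titles = ["Élégie", "elegy", "Zola", "Ärger"]

# Without folding, values sort by raw code point, so accented and
# uppercase initials end up separated from their unaccented neighbors.
print(sorted(titles))

# With folding applied as the sort key, the order follows the base letters.
print(sorted(titles, key=fold_for_sort))
```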
I like the idea of making them consistent with the other fields! The inheritance makes that more complicated, doesn't it? So then would |
To clarify, I don't mean to replace that part; just the lowercasing and removal of hyphens that occurs after. |
In the example / default schema |
Oh! Thank you for clarifying, I didn't read it properly with the code wrapping on GitHub. That sounds smart, although it may or may not be worth the overhead! Do we have an existing field that would handle this, or would we need to customize one? (I don't even remember now why we did it this way...) |
We might have to create one. I'm guessing |
Tested functionality described in testing notes as well as some advanced Solr syntax and fielded keyword searches. Everything looks good as far as I can tell. @rlskoeser kicking back to you. |
@mnaydan hyphenation filter is ready for you to test now. Do you need help finding examples? Here's a phrase I just found that works on the test site now but not in production: "ability to pass from one register" |
This is cool! Thanks for the example. Here are some more examples of it working: "slight alterations of manuscript readings" working on test but not on production; "conflicting phenomena" working on test but not on production. One strange case I found didn't highlight the whole phrase because (I figured out) there's a white space either at the end of the line or at the beginning of the next line. Here is the search without the white space, and the search with the white space. Since this was a cool bonus feature anyway, I think I'm in agreement with your original impulse to keep it as is and close this issue. We can always revisit later and make a new issue if we want to add "hyphen-whitespace" search functionality, after we have more information about the kinds of instances from our NLP work. @rlskoeser what do you think? |
@mnaydan thanks for the testing and the cool examples! I noticed, and see in your highlighting for the last example, that in a lot of cases the stemming is giving us matches for these hyphenated words, although not always. I'm inclined to close this and get a release out with the Solr upgrade, and then keep it in mind when we decide how to handle hyphenations for the NLP work. |
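For reference if the "hyphen-whitespace" case is revisited, here is a toy sketch of the difference between the two cases. It is plain Python regex, not the Solr filter configured for this project, and it makes no claim about how that filter tokenizes text. The `dehyphenate` helper, both patterns, and the sample strings are hypothetical.

```python
import re

# Joins a word split by a hyphen that is directly followed by a line break.
STRICT = re.compile(r"(\w)-\n(\w)")
# Hypothetical lenient variant: also tolerates spaces or tabs around the break.
LENIENT = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")


def dehyphenate(text, pattern=STRICT):
    """Rejoin words hyphenated across line breaks according to `pattern`."""
    return pattern.sub(r"\1\2", text)


# Made-up sample strings.
clean_break = "a hyphen-\nated word"
trailing_space = "a hyphen- \nated word"

print(repr(dehyphenate(clean_break)))              # word is rejoined
print(repr(dehyphenate(trailing_space)))           # stray space blocks the join
print(repr(dehyphenate(trailing_space, LENIENT)))  # lenient pattern rejoins it
```

Both patterns would also merge genuine compounds that happen to break at a line end, which is one reason to wait for the NLP findings before choosing a rule.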
testing notes
dev notes
note: use [skip ci] on commits until we get the hanging unit test issue resolved, so we don't have tests hanging on GitHub Actions