Merge pull request #7378 from QualitativeDataRepository/IQSS/820
Handle non-ascii chars in search
kcondon committed Mar 11, 2021
2 parents ae799ac + 768ed3c commit bae37ca
Showing 2 changed files with 13 additions and 1 deletion.
4 changes: 3 additions & 1 deletion conf/solr/8.8.1/schema.xml
@@ -38,7 +38,7 @@
catchall "text" field, and use that for searching.
-->

-<schema name="default-config" version="1.6">
+<schema name="default-config" version="1.7">
<!-- attribute "name" is the name of this schema and is only used for display purposes.
version="x.y" is Solr's version number for the schema syntax and
semantics. It should not normally be changed by applications.
@@ -566,6 +566,7 @@
<filter class="solr.KeywordRepeatFilterFactory" />
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
+<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
@@ -616,6 +617,7 @@
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.FlattenGraphFilterFactory" />
+<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
10 changes: 10 additions & 0 deletions doc/release-notes/820-non-ascii-chars-in-search.md
@@ -0,0 +1,10 @@
(review these notes if this gets into the same release as #7645 as the steps are included there - we expect to include this in the same release)

### Search with non-ASCII characters

Many languages include characters that have close analogs in ASCII, e.g. (á, à, â, ç, é, è, ê, ë, í, ó, ö, ú, ù, û, ü…). This release changes the default Solr configuration so that search can match words based on these associations, e.g. a search for Mercè would match the word Merce in a Dataset, and vice versa. This should generally be helpful, but it can produce false positives, e.g. "canon" will be found when searching for "cañon".
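
To illustrate the idea, here is a minimal Python sketch of ASCII folding with the original token preserved. It is only an approximation under stated assumptions: it strips combining marks using Unicode decomposition, whereas Solr's `ASCIIFoldingFilterFactory` uses its own mapping table and handles characters that do not decompose this way. The function names `fold_to_ascii` and `index_tokens` are hypothetical and are not part of Solr or Dataverse.

```python
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_tokens(token: str) -> set[str]:
    """Mimic preserveOriginal="true": keep the original token alongside its folded form."""
    return {token, fold_to_ascii(token)}


# "Mercè" is indexed under both spellings, so a search for "Merce" can find it.
print(index_tokens("Mercè"))

# The false positive from the note above: both words fold to the same form.
print(fold_to_ascii("cañon") == fold_to_ascii("canon"))  # True
```

Keeping the original token next to the folded one means the accented spelling is still indexed as written, while the folded spelling makes the unaccented search work as well.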

## Upgrade Instructions

1. You will need to replace or modify your `schema.xml` and restart Solr. Re-indexing is required to get full functionality from this change - the standard instructions for an incremental reindex could be added here.
