Fix indexing words with non-ASCII letters
`Xapian::QueryParser` converts words to lowercase using `Unicode::tolower` (see
github.com/xapian/xapian/blob/v1.4.23/xapian-core/queryparser/queryparser.lemony#L1244),
so doxyindexer should do the same; otherwise a term that originally contained
uppercase non-ASCII letters cannot be found.

In addition, the stemming algorithm should probably be selected for the
correct language, but this change at least allows words to be found
in the form in which they are written.
bolshakov-a committed Oct 20, 2023
1 parent 3bcabe5 commit 77dfe6d
Showing 1 changed file with 2 additions and 4 deletions.
6 changes: 2 additions & 4 deletions addon/doxysearch/doxyindexer.cpp
@@ -18,7 +18,6 @@
 #include <cstdlib>
 #include <iostream>
 #include <string>
-#include <algorithm>
 #include <sstream>
 #include <fstream>
 #include <iterator>
@@ -92,9 +91,8 @@ static void addWords(const std::string &s,Xapian::Document &doc,int wfd)
   std::istream_iterator<std::string> begin(iss),end,it;
   for (it=begin;it!=end;++it)
   {
-    std::string word = *it;
-    std::string lword = word;
-    std::transform(lword.begin(), lword.end(), lword.begin(), ::tolower);
+    const std::string word = *it;
+    const std::string lword = Xapian::Unicode::tolower(word);
     safeAddTerm(word,doc,wfd);
     if (lword!=word)
     {
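For readers who want to see why the byte-wise `::tolower` call missed non-ASCII letters, here is a minimal standard-library sketch. It assumes the indexed text is UTF-8 and that the program runs in the default "C" locale. Both `bytewiseLower` and `utf8LowerSketch` are hypothetical names introduced only for this illustration. `utf8LowerSketch` is a deliberately tiny stand-in that covers ASCII, Latin-1 capitals, and basic Cyrillic capitals; it is not how `Xapian::Unicode::tolower` is implemented, and real code should call the Xapian function as the commit does.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Byte-wise lowercasing, similar to the removed code. The unsigned char
// cast is added here so that bytes >= 0x80 are not passed to tolower as
// negative values.
std::string bytewiseLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Hypothetical, heavily simplified UTF-8-aware lowercasing. It handles only
// ASCII and two-byte sequences for U+00C0..U+00DE (Latin-1 capitals) and
// U+0400..U+042F (basic Cyrillic capitals). Everything else is copied as-is.
std::string utf8LowerSketch(const std::string &s)
{
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c < 0x80)
    {
      out += static_cast<char>(std::tolower(c));
    }
    else if ((c & 0xE0) == 0xC0 && i + 1 < s.size() &&
             (static_cast<unsigned char>(s[i + 1]) & 0xC0) == 0x80)
    {
      // Decode the two-byte sequence into a code point.
      const unsigned char c2 = static_cast<unsigned char>(s[i + 1]);
      unsigned cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
      if ((cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) || (cp >= 0x410 && cp <= 0x42F))
        cp += 0x20;
      else if (cp >= 0x400 && cp <= 0x40F)
        cp += 0x50;
      // Re-encode; the result still fits in two bytes for these ranges.
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
      ++i; // skip the continuation byte already consumed
    }
    else
    {
      out += s[i]; // longer sequences and stray bytes pass through unchanged
    }
  }
  return out;
}
```

With the byte-wise version, a word such as "Привет" comes back unchanged, so `lword!=word` is false and no lowercase term is indexed, while the query side lowercases the search word and therefore looks up a term that was never added. A code-point-aware conversion produces a distinct lowercase form, which is what the switch to `Xapian::Unicode::tolower` provides for all of Unicode.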