Wiki search fails if restricted to a namespace whose ID contains '-' or '.' #1659

Closed
jscheiber opened this issue Aug 9, 2016 · 7 comments · Fixed by #2286

Comments

@jscheiber

Applies to: "Detritus" and "Elenor of Tsort"
Modules: Wiki core search and searchform plugin

Description

Namespaces containing the special characters '-' or '.', such as
dev:releases:release_2016-04-02
or
cal:2015:date.20150801

are valid according to the DokuWiki documentation. However, when a search is restricted to such a namespace, the wiki search and related plugins (searchform) show neither quicksearch nor full search results, e.g.:

changelog @dev:releases:release_2016-04-02

fails to show the "changelog" page in this namespace, whereas a search starting in the parent namespace

changelog @dev:releases

shows all changelogs in all sub-spaces, including "release_2016-04-02".

Why is this issue important?

Besides the general confusion of not finding things in certain areas, this bug currently prevents us from using DokuWiki as a lightweight CRM, since many namespaces (e.g. for actions, dates, projects, contacts, offers) need a unique naming convention

  • that does not conflict with the "usual" pages in knowledge base namespaces
  • and that frequently needs a date identifier, which the bureaucracy plugin supplies in YYYY-MM-DD notation, i.e. with the '-' character

e.g.:
c.contactname ... for contacts
dt.YYYY-MM-DD_datename ... for dates, meetings
a.YYYY-MM-DD_actionname, prj.projectname, wp.workpackagename, ... etc.

Most of these namespaces automatically get a sidebar that allows searching within this contact / project / ... (searchform plugin).

Since the underscore is frequently used in all sorts of pages, it's easy to distinguish automated CRM pages from normal user pages if the prefix delimiter is a "." and users are asked not to use dots in their page names.
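To illustrate the naming convention described above, here is a small sketch that tells automated CRM pages apart from ordinary user pages. It assumes (hypothetically) that the type prefix is whatever precedes the first "." in a namespace segment, and that the prefixes are exactly the ones listed in this comment; it is not part of DokuWiki or the bureaucracy plugin.

```python
# Hypothetical classifier for the CRM naming convention described above.
# Assumption: a CRM namespace segment looks like "<prefix>.<rest>", and
# ordinary user pages never contain a ".".
CRM_PREFIXES = {
    "c": "contact",
    "dt": "date/meeting",
    "a": "action",
    "prj": "project",
    "wp": "workpackage",
}


def classify(page_id: str) -> str:
    """Return the CRM type of the deepest matching segment, else "user"."""
    for segment in reversed(page_id.split(":")):
        prefix, dot, _rest = segment.partition(".")
        if dot and prefix in CRM_PREFIXES:
            return CRM_PREFIXES[prefix]
    return "user"


if __name__ == "__main__":
    for pid in ["crm:c.contactname:start",
                "crm:dt.2016-08-08_kickoff:minutes",
                "kb:release_notes"]:
        print(pid, "->", classify(pid))
```

Note that the date-based IDs this convention produces contain both '.' and '-', i.e. exactly the characters that trigger the search problem.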

@Klap-in
Collaborator

Klap-in commented Aug 9, 2016

The query parser is the function ft_queryParser($Indexer, $query):
https://github.com/splitbrain/dokuwiki/blob/master/inc/fulltext.php#L552
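For comparison, a token-based approach keeps the whole `@namespace` token intact regardless of which characters it contains. The sketch below is a deliberately tiny, hypothetical stand-in (it only separates `@`-prefixed tokens from plain terms by splitting on whitespace); it is not DokuWiki's ft_queryParser and ignores its other syntax.

```python
# Toy query splitter (hypothetical, NOT DokuWiki's ft_queryParser).
# Splitting on whitespace means "-" and "." stay inside the namespace token.
def split_query(query: str) -> tuple[list[str], list[str]]:
    """Return (search terms, namespace restrictions) for a simple query."""
    terms: list[str] = []
    namespaces: list[str] = []
    for token in query.split():
        if token.startswith("@"):
            namespaces.append(token[1:])  # drop the leading "@"
        else:
            terms.append(token)
    return terms, namespaces


if __name__ == "__main__":
    print(split_query("changelog @dev:releases:release_2016-04-02"))
```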

@splitbrain
Collaborator

Only exact namespaces will work, e.g. @playground:dev:release does not work (it doesn't exist), but @playground:dev:releases works (note the trailing s).

The search for the zh-tw namespace shows that the feature is generally working.

@jscheiber could there be something else interfering with your search results? Can you reproduce the problem at dokuwiki.org? Does the problem only occur when searching from the searchform plugin or from the "normal" search field in the dokuwiki template as well?

@Klap-in
Collaborator

Klap-in commented Aug 9, 2016

The zh-tw example doesn't show this, because it is not searching for a page name.

@jscheiber
Author

jscheiber commented Aug 9, 2016

@splitbrain

1. Reproduction at dokuwiki.org

So far I could only partially reproduce it.
First, thanks for pointing me to the dokuwiki.org playground.
Test namespace setup, using pages page1 and page2 in the main namespace and in the LOG sub-space:

  • playground:jscheiber:start
  • playground:jscheiber:case_2016-08-08:start (title: TC1 - Test Case 2016-08-08)
  • playground:jscheiber:case_2016-08-08:page1 (title: TC1 - Page 1)
  • playground:jscheiber:case_2016-08-08:page2 (title: TC1 - Page 2)
  • playground:jscheiber:case_2016-08-08:log:page1 (title: Log 1)
  • playground:jscheiber:case_2016-08-08:log:page2 (title: Log 2)

Full search works fine with:
TC* @playground:jscheiber:case_2016-08-08
LOG @playground:jscheiber:case_2016-08-08
(finds TC and LOG strings in page titles)

It fails for a page name search:
page1 @playground:jscheiber:case_2016-08-08

but the page name search works if started in the parent namespace:
page1 @playground:jscheiber

The results are a bit different from our setup, since ours also fails for page titles, but this could be due to configuration settings.
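A toy model of the page name searches above, using the test page IDs from this comment. It assumes (a simplification, not DokuWiki's actual lookup code) that a page is in scope when its ID starts with `<namespace>:` and matches when its page name contains the search term. If the namespace is only read up to the first '-' (the cause identified later in this thread), the query effectively targets `case_2016`, which does not exist.

```python
# Toy page-name lookup restricted to a namespace (hypothetical model).
PAGES = [
    "playground:jscheiber:start",
    "playground:jscheiber:case_2016-08-08:start",
    "playground:jscheiber:case_2016-08-08:page1",
    "playground:jscheiber:case_2016-08-08:page2",
    "playground:jscheiber:case_2016-08-08:log:page1",
    "playground:jscheiber:case_2016-08-08:log:page2",
]


def lookup(term: str, namespace: str) -> list[str]:
    """Pages under `namespace` whose last ID segment contains `term`."""
    prefix = namespace + ":"
    return [
        page for page in PAGES
        if page.startswith(prefix) and term in page.split(":")[-1]
    ]


if __name__ == "__main__":
    # Full namespace: page1 in case_2016-08-08 and in its log sub-space.
    print(lookup("page1", "playground:jscheiber:case_2016-08-08"))
    # Namespace cut off at the first "-": nothing lives under "case_2016:".
    print(lookup("page1", "playground:jscheiber:case_2016"))
    # Parent namespace: finds both pages, as observed above.
    print(lookup("page1", "playground:jscheiber"))
```

The same prefix rule also explains the earlier remark that only exact namespaces work: `playground:jscheiber:case` is not a prefix of `playground:jscheiber:case_2016-08-08:` once the trailing colon is added.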

2. Searchform

The issue appears identically with searchform and with the core DokuWiki search field.

3. Interference

Some of our configuration settings that affect search obviously differ from the dokuwiki.org settings.
E.g. quicksearch at dokuwiki.org seems not to take page titles into account, but it does find page names, ...
And we run DokuWiki with ~60 plugins.

@jscheiber
Author

jscheiber commented Aug 9, 2016

Added 2 more examples at the dokuwiki.org playground:

  1. A namespace with a dot '.': same issue as with '-'
  2. A namespace with the same sub-pages as above but only underscore delimiters in the NS: all searches work fine

Namespace Example with a dot - Issue reproduced

  • playground:jscheiber:spr.search:start (title: Software Problem Report - Namespace Search)
  • playground:jscheiber:spr.search:page1 (title: SPR Description)
  • playground:jscheiber:spr.search:page2 (title: SPR Testing)
  • playground:jscheiber:spr.search:log:page1 (title: SPR Logfile 1)
  • playground:jscheiber:spr.search:log:page2 (title: SPR Logfile 2)

Searches:

  1. SPR @playground:jscheiber:spr.search (text search is OK)
  2. page1 @playground:jscheiber:spr.search (page name search fails)
  3. page1 @playground:jscheiber (page name search works from the parent namespace)

Namespace Example with only underscore delimiter - everything OK

  • playground:jscheiber:spr_search_ul:start (title: Software Problem Report 2 - Namespace Search - no special chars)
  • playground:jscheiber:spr_search_ul:page1 (title: SPR Description - nochars)
  • playground:jscheiber:spr_search_ul:page2 (title: SPR Testing - nochars)
  • playground:jscheiber:spr_search_ul:log:page1 (title: SPR Logfile 1 - nochars)
  • playground:jscheiber:spr_search_ul:log:page2 (title: SPR Logfile 2 - nochars)

Searches:

  1. SPR @playground:jscheiber:spr_search_ul
  2. page1 @playground:jscheiber:spr_search_ul
  3. page1 @playground:jscheiber

All searches work, if namespace does not contain '.' or '-'.

@micgro42
Collaborator

The root cause is that the responsible regex does not account for - or ., but only for \w and :
https://github.com/splitbrain/dokuwiki/blob/81693bed0f4feedff78c111364b23b78ad979c93/inc/fulltext.php#L236

This could easily be fixed by adjusting the regex.

However, I'm wondering why we are fiddling with regexes when we have a query parser. Or would using the query parser cost too much performance?
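A sketch of the regex behaviour described in this comment, written with Python's `re`. The pattern is an approximation of the description (an `@` followed by a run of `\w` and `:`), not a copy of the line in fulltext.php, and Python's `\w` is not identical to PCRE's. The widened class `[\w:.\-]` is one possible adjustment, not necessarily the one that was merged.

```python
import re

# Approximation of the namespace pattern described above:
# "@" followed by word characters and colons only.
NARROW = re.compile(r"(?:^| )@([\w:]+)")
# One possible adjustment: also accept "." and "-".
WIDE = re.compile(r"(?:^| )@([\w:.\-]+)")


def split_namespace(query: str, pattern: re.Pattern) -> tuple:
    """Return (namespace, remaining query) as a regex-based lookup would."""
    match = pattern.search(query)
    if not match:
        return None, query
    # Remove the matched " @namespace" part; whatever the regex did not
    # consume stays glued to the search term.
    remaining = query.replace(match.group(0), "", 1)
    return match.group(1), remaining


if __name__ == "__main__":
    q = "changelog @dev:releases:release_2016-04-02"
    print(split_namespace(q, NARROW))
    print(split_namespace(q, WIDE))
```

In this sketch the narrow pattern yields the namespace `dev:releases:release_2016` and leaves `changelog-04-02` as the search term, so both the namespace and the term are wrong; the widened pattern yields `dev:releases:release_2016-04-02` and `changelog`.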

micgro42 self-assigned this Mar 23, 2018
micgro42 mentioned this issue Mar 23, 2018
micgro42 added a commit that referenced this issue Mar 26, 2018
The regex for the pagename lookup didn't account for `-` and `.` being
valid characters for namespaces, which led to wrong results in the
quicksearch and pagename lookup. The full search, which already used the
queryParser, showed the correct results.

This fixes #1659

4 participants