
Ignore BBCode tags in search #17

Closed
crazedpsyc opened this Issue · 2 comments

Michael Smith
Owner

Currently, searching the forums returns unparsed BBCode in the results, and the BBCode tags themselves can be matched by a search (e.g. "code" will match every post containing [code] blocks).

John Barrett
Collaborator

This still appears to be an issue. One approach I considered is filtering the results in Perl: strip the BBCode from the returned text, then check that it still matches. I think this would break paging (and possibly other things) badly, though, since rows would be discarded after the database has already split the results into pages.
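To illustrate the paging concern, here is a minimal sketch in Python (standing in for the Perl layer). The `filter_page` helper and the sample rows are hypothetical: it assumes the database has already applied LIMIT/OFFSET, so any rows dropped by post-filtering leave the page short and the total count no longer lines up.

```python
import re

# Same crude pattern as the strip_bbcode idea in this thread:
# remove every [...] span, brackets included.
BBCODE_TAG = re.compile(r"\[[^\]]*\]")


def strip_bbcode(text: str) -> str:
    return BBCODE_TAG.sub("", text)


def filter_page(page: list[str], term: str) -> list[str]:
    """Keep only rows whose text still matches once BBCode is stripped.

    `page` stands in for one page of rows the database already returned
    (i.e. after LIMIT/OFFSET); matching is case-insensitive substring.
    """
    needle = term.lower()
    return [row for row in page if needle in strip_bbcode(row).lower()]


# Hypothetical page of 3 rows, all matched by the raw search for "code".
page = [
    "[code]printf()[/code]",
    "see the source code below",
    "[code]x = 1[/code]",
]
print(len(filter_page(page, "code")))  # 1: the page shrinks from 3 rows to 1
```

Refilling the page would mean fetching more rows and re-filtering, which is why pushing the stripping into the query itself looks more attractive.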

Another option is to create a PostgreSQL function that strips BBCode from the content. For comparison, our current search returns 5 comments containing 'code':

ddgc=# select count(*) from comment where content ilike '%code%';
 count 
-------
     5
(1 row)

If we add a 'strip_bbcode' function to our schema:

ddgc=# create function strip_bbcode(TEXT)
       returns TEXT as $$
       select regexp_replace($1,'\[[^\]]*\]','','g')
       $$ language sql;

To demonstrate what this does:

ddgc=# select strip_bbcode('[code]printf()[/code]');
 strip_bbcode 
--------------
 printf()
(1 row)
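The same regex can be checked outside the database. This is a sketch in Python that mirrors the SQL function; `matches` is a hypothetical helper approximating `strip_bbcode(content) ilike '%term%'` (Python's `lower()` is only an approximation of ILIKE's case folding, and `%`/`_` in the term are not treated as wildcards).

```python
import re

# Mirrors: regexp_replace($1, '\[[^\]]*\]', '', 'g')
# The pattern matches "[", then any run of non-"]" characters, then "]".
BBCODE_TAG = re.compile(r"\[[^\]]*\]")


def strip_bbcode(text: str) -> str:
    # re.sub replaces all non-overlapping matches, like the 'g' flag.
    return BBCODE_TAG.sub("", text)


def matches(content: str, term: str) -> bool:
    # Rough equivalent of: strip_bbcode(content) ILIKE '%term%'
    return term.lower() in strip_bbcode(content).lower()


print(strip_bbcode("[code]printf()[/code]"))       # printf()
print(matches("[code]printf()[/code]", "code"))    # False
print(matches("[b]Source code[/b] here", "code"))  # True
```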

We can then:

ddgc=# select count(*) from comment where strip_bbcode(content) ilike '%code%';
 count
-------
     2
(1 row)

So we only get results where the text itself contains 'code'. Note that strip_bbcode is fairly crude as it stands: it removes every square-bracketed span, so non-BBCode text such as `array[0]` would be stripped too.
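A less crude variant could strip only recognised tags. This is a minimal Python sketch under an assumption: the `KNOWN_TAGS` list is hypothetical and would have to match the tags the forum's BBCode parser actually supports. The same pattern should carry over to `regexp_replace` with the `'gi'` flags, though that has not been tested here.

```python
import re

# Hypothetical whitelist of supported tag names (an assumption).
KNOWN_TAGS = ("b", "i", "u", "code", "quote", "url", "img")

# Matches an opening tag with an optional "=value" (e.g. [url=http://x])
# or a closing tag (e.g. [/url]), but only for names in KNOWN_TAGS.
TAG_RE = re.compile(
    r"\[/?(?:" + "|".join(map(re.escape, KNOWN_TAGS)) + r")(?:=[^\]]*)?\]",
    re.IGNORECASE,
)


def strip_known_bbcode(text: str) -> str:
    return TAG_RE.sub("", text)


print(strip_known_bbcode("[code]x = array[0][/code]"))           # x = array[0]
print(strip_known_bbcode("[URL=http://example.com]link[/url]"))  # link
```

Unlike the original pattern, `[0]` survives because `0` is not a known tag name.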

There may be a case for a search index table that stores the stripped text and is refreshed at regular intervals, so that potentially expensive regexes aren't run on every search.
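The index-table idea can be sketched in Python as a hypothetical `SearchIndex` that precomputes stripped, lower-cased text per comment. The class, its methods and the sample comments are illustrative assumptions; in PostgreSQL the equivalent might be a table or materialized view refreshed on a schedule. The trade-off is that results can be stale between refreshes.

```python
import re
from dataclasses import dataclass, field

BBCODE_TAG = re.compile(r"\[[^\]]*\]")


def strip_bbcode(text: str) -> str:
    return BBCODE_TAG.sub("", text)


@dataclass
class SearchIndex:
    """Hypothetical stand-in for a search index table:
    comment id -> stripped, lower-cased text."""
    rows: dict[int, str] = field(default_factory=dict)

    def refresh(self, comments: dict[int, str]) -> None:
        # The regex runs here, once per refresh, rather than on every search.
        self.rows = {cid: strip_bbcode(body).lower() for cid, body in comments.items()}

    def search(self, term: str) -> list[int]:
        # Searches only touch the precomputed text.
        needle = term.lower()
        return sorted(cid for cid, text in self.rows.items() if needle in text)


comments = {1: "[code]printf()[/code]", 2: "the code is below", 3: "[b]Code[/b] review"}
index = SearchIndex()
index.refresh(comments)
print(index.search("code"))  # [2, 3]
```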

Michael Smith
Owner

Fixed in dezi-search.

crazedpsyc closed this