New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove wildcard on the fulltext search fixes #4606 #4711

Closed
wants to merge 1 commit into
base: release-2.1
from

Conversation

Projects
None yet
5 participants
@albertlast
Collaborator

albertlast commented Apr 14, 2018

The fulltext operator of the rdbms didn't understand wildcards,
to provide a better result is the best to remove them.

fix issue: #4606
#4605 is also fixed
when fulltext is used.

Remove wildcard on the fulltext search
Signed-off-by: albertlast albertlast@hotmail.de

@Gwenwyfar Gwenwyfar added the Search label Apr 14, 2018

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 27, 2018

This doesn't fix the issues in my tests.
screen shot 2018-04-27 at 3 26 45 pm

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 27, 2018

Which kind of index you use?
Fulltext, custom?

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 27, 2018

That was with a fulltext index on MySQL.

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 27, 2018

i mean in the smf setting,
which kind of indey you use.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 27, 2018

Fulltext
screen shot 2018-04-27 at 3 51 37 pm

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 27, 2018

and you search for %% ?

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 27, 2018

For **

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 27, 2018

I've just submitted #4735 to fix #4606. It doesn't do anything about #4605 though, which is what I think your PR here is primarily aimed at.

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 28, 2018

Nope the primary aim of this pr is,
to supress wildcard in fulltext search,
because they shouldn't work.

Can you please paste your search qry,
which was used by your smf?

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 28, 2018

I already pasted it above. It is two asterisks.

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 28, 2018

I asked for the sql query,
which is used.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 28, 2018

the primary aim of this pr is,
to supress wildcard in fulltext search,
because they shouldn't work.

Exactly. That's what #4605 is about.

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Apr 28, 2018

yeah but pr #4735 doesn't fix it,
it's only hide the highlighting.
so the search run with wildcard to the database.

btw before you test my pr,
you need to clear your search cache.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 28, 2018

yeah but pr #4735 doesn't fix it,
it's only hide the highlighting.
so the search run with wildcard to the database.

Of course. #4605 and #4606 are distinct issues. #4735 addressed the second, not the first.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 28, 2018

I asked for the sql query,
which is used.

I will test that later today and let you know.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 30, 2018

INSERT IGNORE INTO dev_tmp_log_search_messages
   (id_msg)
SELECT m.id_msg
FROM dev_messages AS m
WHERE m.body LIKE '%%'
   AND m.id_board != 3
LIMIT 6000

in .../Sources/SearchAPI-Fulltext.php line 269, which took 0.00066686 seconds at 0.16149187 into request.
@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 30, 2018

Oh, sorry. Disregard my last comment. That was done on a different branch. I'll retest on the correct code shortly.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 30, 2018

Okay, here are the relevant queries that were used when the user enters lorem and ** in the search form, using a full text index and the code in this PR:

Searching for lorem:

INSERT IGNORE INTO dev_tmp_log_search_messages
   (id_msg)
SELECT m.id_msg
FROM dev_messages AS m
WHERE m.id_board != 3
   AND MATCH (body) AGAINST ('+lorem' IN BOOLEAN MODE)
LIMIT 6000 
in .../Sources/SearchAPI-Fulltext.php line 270, which took 0.00043201 seconds at 0.03601694 into request.

Searching for **:

SELECT m.id_msg
FROM dev_messages AS m
WHERE m.body LIKE '%%'
   AND m.id_board != 3
   AND MATCH (body) AGAINST ('+\"\"' IN BOOLEAN MODE)
LIMIT 6000 
in .../Sources/SearchAPI-Fulltext.php line 270, which took 0.00067592 seconds at 0.03723812 into request.
@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Apr 30, 2018

I should also have included this difference in my last post:

Searching for lorem:

INSERT IGNORE INTO dev_tmp_log_search_topics
   (id_topic)
SELECT t.id_topic
FROM dev_topics AS t
   INNER JOIN dev_log_search_subjects AS subj1 ON (subj1.id_topic = t.id_topic)
WHERE subj1.word LIKE '%lorem%'
   AND t.id_board != 3
LIMIT 1200 
in .../Sources/Search.php line 1350, which took 0.0004251 seconds at 0.03452396 into request.

Searching for **:

INSERT IGNORE INTO dev_tmp_log_search_topics
   (id_topic)
SELECT t.id_topic
FROM dev_topics AS t
WHERE t.id_board != 3
LIMIT 1200 
in .../Sources/Search.php line 1350, which took 0.00024986 seconds at 0.03593707 into request.
@jdarwood007

This comment has been minimized.

Member

jdarwood007 commented Jun 22, 2018

@Sesquipedalian Any objections to merging this? Seems like all we are ding is adding a few characters to a search blacklist.

@Sesquipedalian

This comment has been minimized.

Member

Sesquipedalian commented Jun 26, 2018

I honestly don't know what's going on with this PR now. It doesn't appear to be doing what (I think) @albertlast intends for it to do, but I am not sure how he wants to change it. I posted my result reports above in order to answer his questions. The ball is in his court now.

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Jun 26, 2018

It's removed * more not.
And this in my eyes works fine.

@jdarwood007

This comment has been minimized.

Member

jdarwood007 commented Jun 27, 2018

If I understand this all properly, its removing * and % as words capable of making it to the search index?

@jdarwood007

This comment has been minimized.

Member

jdarwood007 commented Jul 13, 2018

@Sesquipedalian
https://stackoverflow.com/questions/19777359/mysql-query-match-against-using-wildcard
Anything under 4 characters appears to be ignored by match.

From what I can tell, this doesn't fix #4606 as mentioned.

I can actually get MATCH to do a weird thing if I use ****

From this fix here, I can't get a reproducible change to occur after I apply the PR.

We do later on in the code actually remove % though:
$query_params['complex_body_' . $count++] = empty($modSettings['search_match_words']) || $search_data['no_regexp'] ? '%' . strtr($regularWord, array('' => '\', '%' => '\%')) . '%' : '[[:<:]]' . addcslashes(preg_replace(array('/([[]$.+*?|{}()])/'), array('[$1]'), $regularWord), '\'') . '[[:>:]]';

I think its a valid thing we need to fix, just need to find the proper fix. Or I need to figure out how to reproduce this.

@albertlast albertlast closed this Aug 31, 2018

@frandominguez03

This comment has been minimized.

Member

frandominguez03 commented Aug 31, 2018

Why did you close this? @albertlast

@albertlast

This comment has been minimized.

Collaborator

albertlast commented Aug 31, 2018

I lost intrested about this issue,
since the pr got not merge for 4 months,
i guess the issue is not a issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment