full text search slow compared to mysql #720

Closed
tomindo opened this issue Jun 5, 2019 · 24 comments

@tomindo
commented Jun 5, 2019

Hi guys,
I have deployed RediSearch with Redis 5.0.5 and created an index, with Redis memory usage at 8.5 GB. The index info is below. The issue is that this FT.SEARCH query comes back slowly, in around 0.113s:
FT.SEARCH Ehann\RediSearch\Index 'pillow|bungalow' INFIELDS 3 title description brand
The same query using MySQL 5.7 full-text search comes back in 0.005s. I assumed RediSearch would be faster than MySQL. Is there any way to make full-text search on RediSearch faster?
index_name: Ehann\RediSearch\Index
index_options:
fields:
  title: type TEXT, WEIGHT 1.5
  link: type TEXT, WEIGHT 1, NOINDEX
  redirect_url: type TEXT, WEIGHT 1, NOINDEX
  description: type TEXT, WEIGHT 1
  image_link: type TEXT, WEIGHT 1, NOINDEX
  brand: type TEXT, WEIGHT 1
num_docs: 4799389
max_doc_id: 4799389
num_terms: 2210592
num_records: 91894656
inverted_sz_mb: 620.21772384643555
total_inverted_index_blocks: 0
offset_vectors_sz_mb: 155.36215496063232
doc_table_size_mb: 397.16822910308838
sortable_values_size_mb: 0
key_table_size_mb: 105.53219985961914
records_per_doc_avg: 19.14715727356128
bytes_per_record_avg: 7.0770753851018275
offsets_per_term_avg: 1.7727801712430373
offset_bits_per_record_avg: 8
gc_stats:
  current_hz: 1
  bytes_collected: 0
  effectiv_cycles_rate: 0
cursor_stats:
  global_idle: 0
  global_total: 0
  index_capacity: 128
  index_total: 0

@mnunberg

Collaborator

commented Jun 5, 2019

How many documents actually match your query?

@tomindo

Author

commented Jun 5, 2019

20 documents matched

@mnunberg

Collaborator

commented Jun 5, 2019

@tomindo

Author

commented Jun 5, 2019

Yes, it is a bit faster at 0.110s, but still way slower than MySQL :(. Actually it matches 10 documents. What else should I change in the RediSearch config or the Redis server itself? Also, is there anything wrong with the index or the documents? Any help would be appreciated.
Thanks

@mnunberg

Collaborator

commented Jun 5, 2019

How are you measuring the performance? The best way would be to run it in a loop and see what you get. 100ms is way slower than I'd expect for anything.

Also, what version of RediSearch are you using?
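
To make the "run it in a loop" suggestion concrete, here is a minimal Python sketch of a timing harness. The function name `time_in_loop` and its parameters are hypothetical, not part of RediSearch or any client library; it simply times whatever callable it is given and reports the spread, so that one slow first call does not dominate the measurement.

```python
import statistics
import time
from typing import Callable, Dict


def time_in_loop(run_query: Callable[[], object],
                 iterations: int = 100,
                 warmup: int = 5) -> Dict[str, float]:
    """Call run_query repeatedly and summarise per-call latency in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Warm-up calls are executed but not timed (connection setup, caches).
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

With a Redis client, `run_query` would be a small function that sends the FT.SEARCH command from this thread; with redis-py, for example, that could go through `execute_command`. Note that this measures round-trip time from the client, which includes network and client overhead on top of the server-side query time.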

@mnunberg

Collaborator

commented Jun 5, 2019

It would be more helpful if you could post the output of the results. The first number of the result set indicates how many documents actually match your query, whereas the remaining entries are simply the results that were returned. It is possible that you have a lot of matches and RediSearch spends its time sorting them.

Let's first verify that we're comparing apples to apples here.

Also, the true size of your index (not the documents!) is only 620MB:

inverted_sz_mb: 620.21772384643555
@tomindo

Author

commented Jun 5, 2019

There you go
"FT.SEARCH Ehann\RediSearch\Index 'pillow|bungalow' NOCONTENT"

  1. (integer) 270285
  2. "2918063"
  3. "4378322"
  4. "3532056"
  5. "3128363"
  6. "2816258"
  7. "2782965"
  8. "2541161"
  9. "2092349"
  10. "1407850"
  11. "1082958"
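
For anyone reading the reply above: the first element is the total number of matching documents, and the remaining elements are the document ids actually returned. A small Python sketch of that split follows; the helper name `split_nocontent_reply` is hypothetical, and it assumes the client hands back the reply as a flat list, as redis-cli prints it here.

```python
from typing import List, Tuple, Union

ReplyItem = Union[int, str, bytes]


def split_nocontent_reply(reply: List[ReplyItem]) -> Tuple[int, List[str]]:
    """Split an FT.SEARCH ... NOCONTENT reply into (total_matches, returned_ids)."""
    if not reply:
        raise ValueError("empty reply")
    total = int(reply[0])
    # Some clients return ids as bytes, others as str; normalise to str.
    ids = [item.decode() if isinstance(item, bytes) else str(item)
           for item in reply[1:]]
    return total, ids


# Reply copied from the output above.
reply = [270285, "2918063", "4378322", "3532056", "3128363", "2816258",
         "2782965", "2541161", "2092349", "1407850", "1082958"]
total, ids = split_nocontent_reply(reply)
print(total, len(ids))  # 270285 10
```

So the query matches 270285 documents, of which 10 are returned.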
@mnunberg

Collaborator

commented Jun 5, 2019

So it indicates that you have almost 300k matches. I am guessing most of the time is spent just sorting those matching result entries. Does MySQL full text search have any kind of default sorting order?
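
As a toy illustration of why the number of matches matters even when only 10 results are returned, the Python sketch below picks the 10 best of N scored hits. This is not RediSearch's actual implementation; `fake_matches` is a hypothetical stand-in that generates random scores, and the only point is that every match has to be scored and compared before the top 10 are known.

```python
import heapq
import random
from typing import Iterable, List, Tuple


def top_k(scored_matches: Iterable[Tuple[float, int]],
          k: int = 10) -> List[Tuple[float, int]]:
    """Return the k highest-scoring (score, doc_id) pairs, best first."""
    return heapq.nlargest(k, scored_matches)


def fake_matches(n: int, seed: int = 0) -> List[Tuple[float, int]]:
    """Hypothetical stand-in for scored hits: n (score, doc_id) pairs."""
    rng = random.Random(seed)
    return [(rng.random(), doc_id) for doc_id in range(1, n + 1)]


matches = fake_matches(270285)  # match count reported in this thread
best = top_k(matches, k=10)
print(len(best))
```

The work grows with the number of matches rather than with the LIMIT, which is why a query matching a handful of documents and one matching hundreds of thousands can differ a lot in latency while both return 10 rows.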

@tomindo

Author

commented Jun 5, 2019

The MySQL query runs with LIMIT 10, which is the same as the default for RediSearch. I don't think MySQL has a default sort order unless we specify ORDER BY. Also, is there any way to disable the sorting in RediSearch?

@mnunberg

Collaborator

commented Jun 5, 2019

@tomindo

Author

commented Jun 5, 2019

Looks like it doesn't match anything, and it still takes 0.110s:
FT.AGGREGATE Ehann\RediSearch\Index 'pillow|bungalow' NOCONTENT

  1. (integer) 1

With "limit 0 10" it takes 0.006s, but there is no result:
FT.AGGREGATE Ehann\RediSearch\Index 'pillow|bungalow' limit 0 10

  1. (integer) 1
  2. (empty list or set)
  3. (empty list or set)
  4. (empty list or set)
  5. (empty list or set)
  6. (empty list or set)
  7. (empty list or set)
  8. (empty list or set)
  9. (empty list or set)
  10. (empty list or set)
  11. (empty list or set)
@mnunberg

Collaborator

commented Jun 5, 2019

What version of RediSearch are you using?

@tomindo

Author

commented Jun 5, 2019

it is master branch

@mnunberg

Collaborator

commented Jun 6, 2019

Ok, I see.

You will still get the results if you use LIMIT 0 10. The "empty" results appear because you haven't selected any fields to return. You can use the LOAD or APPLY keyword to do that, e.g.

FT.AGGREGATE Ehann\RediSearch\Index 'pillow|bungalow' LOAD 1 @title
@tomindo

Author

commented Jun 6, 2019

The Redis server crashed when I ran your query. It has crashed a few times before when I added some switches.

------ DUMPING CODE AROUND EIP ------
Symbol: cfree (base: 0x7fa5b50b8580)
Module: /lib64/libc.so.6 (base 0x7fa5b5033000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x7fa5b50b8580 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
17283:M 05 Jun 2019 11:28:31.217 # dump of function (hexdump of 156 bytes):
488b0569093400488b004885c00f85bf0000004885ff0f84b4000000488b47f8488d4ff0a8027528a804488d3daf113400740c4889c84825000000fc488b3831d24889cee9a7b9ffff0f1f80000000008b15fe0b340085d2752e483b05cf0b34007625483d00000002771d4883e0f8488d1400488905b60b34004889159f0b340090eb080f1f40004883e0f8488b57f04889cf488d3410488b054a08

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/antirez/redis/issues
@mnunberg

Collaborator

commented Jun 6, 2019

Is that the only stack trace available? Can you provide the full log?

@mnunberg

Collaborator

commented Jun 6, 2019

Does it only crash when you use the LOAD?

@tomindo

Author

commented Jun 6, 2019

yes. I used your query
FT.AGGREGATE Ehann\RediSearch\Index 'pillow|bungalow' LOAD 1 @title

@tomindo

Author

commented Jun 6, 2019

That piece of the log is from the Redis log when it crashed on the latest CentOS 7.

@mnunberg

Collaborator

commented Jun 6, 2019

Can you open a new issue with the crash included in it, with the entire log? That piece does not help me at all. I need a stack trace of the functions.

@tomindo

Author

commented Jun 7, 2019

I don't have the stack trace.

@tw-bert

commented Jun 8, 2019

@tomindo The crash you found is interesting, do you need guidance on how to copy and paste the stacktrace?

In the redis log file, there are clear from and to markers.

If you have accidentally deleted the redis logfile, can you reproduce the issue?

If you can't reproduce, what do you want Mark to do?

@tomindo

Author

commented Jun 11, 2019

Sorry guys, I was able to grep the stack trace and have put it in a separate ticket.

@mnunberg

Collaborator

commented Jun 29, 2019

Closing, as I think we've adequately explained the performance differences per the above.

@mnunberg mnunberg closed this Jun 29, 2019
