
Searching multiple pools with "from *" is much slower #5070

Closed
SoftTools59654 opened this issue Mar 11, 2024 · 5 comments · Fixed by #5174
Labels
bug (Something isn't working), community

Comments

@SoftTools59654

When searching for text in datasets of roughly 100 GB and 80 GB: when from * is used to read both pools, the read rate from the M.2 drive is only about 40 MB/s, but when the same text is searched without from *, the read rate reaches about 2 GB/s. The CPU is fully busy while the data loads from the drive very slowly.
There is a big time difference between using from * and searching the pools one by one. I tested this several times and the result was the same each time.
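
To put the comparison in query terms, the two shapes being compared are roughly the following (a sketch only; the pool names and search value are placeholders, not my actual data):

$ time zed query 'from * | "needle"'      # search both pools at once (slow)
$ time zed query 'from pool1 | "needle"'  # search each pool individually (fast)
$ time zed query 'from pool2 | "needle"'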

@SoftTools59654 SoftTools59654 added the bug (Something isn't working) label Mar 11, 2024
@SoftTools59654 SoftTools59654 changed the title from "When using the form * command, the speed of reading the hard drive decreases" to "When using the from * command, the speed of reading the hard drive decreases" Mar 12, 2024
@philrz philrz transferred this issue from brimdata/zui Mar 14, 2024
@philrz
Contributor

philrz commented Mar 14, 2024

Repro is with Zed commit 38763f8.

While I don't have the benefit of the user's specific data to test with, I've been able to reproduce a similar effect locally on my MacBook using the data from s3://brim-sampledata/wrccdc/zng/wrccdc.zng.

I first created two separate pools, each holding a subset of the wrccdc data.

$ zed -version
Version: v1.14.0-16-g38763f82

$ zed create wrccdc1
pool created: wrccdc1 2det49I4UKZauguZUCoKy3PyKKl

$ zq '_path=="conn"' wrccdc.zng | zed load -use wrccdc1 -
(1/1) 20.82GB 0B/s
2deuG2mkjlka5VL3kOThO7wHfO2 committed

$ zed create wrccdc2
pool created: wrccdc2 2deuL3vhFQikxnty2CdIZsIFUkr

$ zq '_path!="conn"' wrccdc.zng | zed load -use wrccdc2 -
(1/1) 2.98GB 0B/s
2deuWwPS3FaaS5qHhZ5RGjFs6x5 committed

FWIW the wrccdc1 pool is 520132400 records and 20 GB in size, and the wrccdc2 pool is 52937660 records and 3.0 GB in size.
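
Record counts like these can be confirmed with a per-pool count() query along the following lines (a sketch, mirroring the count() usage shown later in this issue):

$ zed query 'from wrccdc1 | count()'
$ zed query 'from wrccdc2 | count()'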

Doing a "needle in a haystack" search for a value that I know is present in both pools, we see fast search performance on each pool individually.

$ time zed query 'from wrccdc1 | CuYcyl29a2g8pYV1s' 
{_path:"conn",ts:2017-03-24T19:59:23.231562Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},proto:"tcp"(=zenum),service:"http",duration:182.422ms,orig_bytes:438(uint64),resp_bytes:588(uint64),conn_state:"SF",local_orig:null(bool),local_resp:null(bool),missed_bytes:0(uint64),history:"ShADadfF",orig_pkts:5(uint64),orig_ip_bytes:650(uint64),resp_pkts:5(uint64),resp_ip_bytes:800(uint64),tunnel_parents:null(|[string]|)}

real	0m13.831s
user	0m0.037s
sys	0m0.027s

$ time zed query 'from wrccdc2 | CuYcyl29a2g8pYV1s' 
{_path:"files",ts:2017-03-24T19:59:23.399396Z,fuid:"F4NAir1FuTH6Oax2J4",tx_hosts:|[184.72.208.63]|,rx_hosts:|[10.10.7.65]|,conn_uids:|["CuYcyl29a2g8pYV1s"]|,source:"HTTP",depth:0(uint64),analyzers:|["MD5","SHA1","SHA256"]|,mime_type:"text/json",filename:null(string),duration:0s,local_orig:null(bool),is_orig:false,seen_bytes:35(uint64),total_bytes:null(uint64),missing_bytes:0(uint64),overflow_bytes:0(uint64),timedout:false,parent_fuid:null(string),md5:"dd82bbeb20a441bf55e62d42255de7f4",sha1:"90b01d6f0c6891651e8a729892e92592dd581762",sha256:"da1791b1b8361af6383790c6e77ef6eb149b22ed7e2d59b61ac2deb7884edf4d",extracted:null(string),extracted_cutoff:null(bool),extracted_size:null(uint64)}
{_path:"http",ts:2017-03-24T19:59:23.306076Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},trans_depth:1(uint64),method:"GET",host:"lc68.dsr.livefyre.com",uri:"/livecountping/98236508/663740012478?__=bye2msvgpn91&routed=1&jid&siteId=366439&networkId=livefyre.com",referrer:"http://www.asigra.com/blog/what-multi-tenancy-how-secure-it",version:"1.1",user_agent:"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",request_body_len:0(uint64),response_body_len:35(uint64),status_code:200(uint64),status_msg:"OK",info_code:null(uint64),info_msg:null(string),tags:|[]|(|[zenum=string]|),username:null(string),password:null(string),proxied:null(|[string]|),orig_fuids:null([string]),orig_filenames:null([string]),orig_mime_types:null([string]),resp_fuids:["F4NAir1FuTH6Oax2J4"],resp_filenames:null([string]),resp_mime_types:["text/json"]}

real	0m2.133s
user	0m0.033s
sys	0m0.021s

However, doing the search across both pools at once with from * as described by the user does indeed result in a significant slow-down, bringing the search time from a combined ~15 seconds to over 11 minutes.

$ time zed query 'from * | CuYcyl29a2g8pYV1s'
{_path:"conn",ts:2017-03-24T19:59:23.231562Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},proto:"tcp"(=zenum),service:"http",duration:182.422ms,orig_bytes:438(uint64),resp_bytes:588(uint64),conn_state:"SF",local_orig:null(bool),local_resp:null(bool),missed_bytes:0(uint64),history:"ShADadfF",orig_pkts:5(uint64),orig_ip_bytes:650(uint64),resp_pkts:5(uint64),resp_ip_bytes:800(uint64),tunnel_parents:null(|[string]|)}
{_path:"files",ts:2017-03-24T19:59:23.399396Z,fuid:"F4NAir1FuTH6Oax2J4",tx_hosts:|[184.72.208.63]|,rx_hosts:|[10.10.7.65]|,conn_uids:|["CuYcyl29a2g8pYV1s"]|,source:"HTTP",depth:0(uint64),analyzers:|["MD5","SHA1","SHA256"]|,mime_type:"text/json",filename:null(string),duration:0s,local_orig:null(bool),is_orig:false,seen_bytes:35(uint64),total_bytes:null(uint64),missing_bytes:0(uint64),overflow_bytes:0(uint64),timedout:false,parent_fuid:null(string),md5:"dd82bbeb20a441bf55e62d42255de7f4",sha1:"90b01d6f0c6891651e8a729892e92592dd581762",sha256:"da1791b1b8361af6383790c6e77ef6eb149b22ed7e2d59b61ac2deb7884edf4d",extracted:null(string),extracted_cutoff:null(bool),extracted_size:null(uint64)}
{_path:"http",ts:2017-03-24T19:59:23.306076Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},trans_depth:1(uint64),method:"GET",host:"lc68.dsr.livefyre.com",uri:"/livecountping/98236508/663740012478?__=bye2msvgpn91&routed=1&jid&siteId=366439&networkId=livefyre.com",referrer:"http://www.asigra.com/blog/what-multi-tenancy-how-secure-it",version:"1.1",user_agent:"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",request_body_len:0(uint64),response_body_len:35(uint64),status_code:200(uint64),status_msg:"OK",info_code:null(uint64),info_msg:null(string),tags:|[]|(|[zenum=string]|),username:null(string),password:null(string),proxied:null(|[string]|),orig_fuids:null([string]),orig_filenames:null([string]),orig_mime_types:null([string]),resp_fuids:["F4NAir1FuTH6Oax2J4"],resp_filenames:null([string]),resp_mime_types:["text/json"]}

real	11m10.353s
user	0m0.213s
sys	0m0.082s

And if I drop those pools and load the entire data set into one pool, it runs fast whether I target it directly or via from *.
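
For completeness, the original pools were removed before the reload with something like the following (a sketch; the exact drop invocation isn't shown in my transcript):

$ zed drop wrccdc1
$ zed drop wrccdc2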

$ zed create wrccdc
pool created: wrccdc 2dew93Xeb0JfVr9l8cZXtcbtVKE

$ zed load -use wrccdc wrccdc.zng 
(1/1) 23.80GB/23.80GB 0B/s 100.00%
2dexIZCPsegXBLF2yGT2ICu3vq0 committed

$ time zed query 'from wrccdc | CuYcyl29a2g8pYV1s'
{_path:"files",ts:2017-03-24T19:59:23.399396Z,fuid:"F4NAir1FuTH6Oax2J4",tx_hosts:|[184.72.208.63]|,rx_hosts:|[10.10.7.65]|,conn_uids:|["CuYcyl29a2g8pYV1s"]|,source:"HTTP",depth:0(uint64),analyzers:|["MD5","SHA1","SHA256"]|,mime_type:"text/json",filename:null(string),duration:0s,local_orig:null(bool),is_orig:false,seen_bytes:35(uint64),total_bytes:null(uint64),missing_bytes:0(uint64),overflow_bytes:0(uint64),timedout:false,parent_fuid:null(string),md5:"dd82bbeb20a441bf55e62d42255de7f4",sha1:"90b01d6f0c6891651e8a729892e92592dd581762",sha256:"da1791b1b8361af6383790c6e77ef6eb149b22ed7e2d59b61ac2deb7884edf4d",extracted:null(string),extracted_cutoff:null(bool),extracted_size:null(uint64)}
{_path:"http",ts:2017-03-24T19:59:23.306076Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},trans_depth:1(uint64),method:"GET",host:"lc68.dsr.livefyre.com",uri:"/livecountping/98236508/663740012478?__=bye2msvgpn91&routed=1&jid&siteId=366439&networkId=livefyre.com",referrer:"http://www.asigra.com/blog/what-multi-tenancy-how-secure-it",version:"1.1",user_agent:"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",request_body_len:0(uint64),response_body_len:35(uint64),status_code:200(uint64),status_msg:"OK",info_code:null(uint64),info_msg:null(string),tags:|[]|(|[zenum=string]|),username:null(string),password:null(string),proxied:null(|[string]|),orig_fuids:null([string]),orig_filenames:null([string]),orig_mime_types:null([string]),resp_fuids:["F4NAir1FuTH6Oax2J4"],resp_filenames:null([string]),resp_mime_types:["text/json"]}
{_path:"conn",ts:2017-03-24T19:59:23.231562Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},proto:"tcp"(=zenum),service:"http",duration:182.422ms,orig_bytes:438(uint64),resp_bytes:588(uint64),conn_state:"SF",local_orig:null(bool),local_resp:null(bool),missed_bytes:0(uint64),history:"ShADadfF",orig_pkts:5(uint64),orig_ip_bytes:650(uint64),resp_pkts:5(uint64),resp_ip_bytes:800(uint64),tunnel_parents:null(|[string]|)}

real	0m17.494s
user	0m0.040s
sys	0m0.078s

$ time zed query 'from * | CuYcyl29a2g8pYV1s'
{_path:"files",ts:2017-03-24T19:59:23.399396Z,fuid:"F4NAir1FuTH6Oax2J4",tx_hosts:|[184.72.208.63]|,rx_hosts:|[10.10.7.65]|,conn_uids:|["CuYcyl29a2g8pYV1s"]|,source:"HTTP",depth:0(uint64),analyzers:|["MD5","SHA1","SHA256"]|,mime_type:"text/json",filename:null(string),duration:0s,local_orig:null(bool),is_orig:false,seen_bytes:35(uint64),total_bytes:null(uint64),missing_bytes:0(uint64),overflow_bytes:0(uint64),timedout:false,parent_fuid:null(string),md5:"dd82bbeb20a441bf55e62d42255de7f4",sha1:"90b01d6f0c6891651e8a729892e92592dd581762",sha256:"da1791b1b8361af6383790c6e77ef6eb149b22ed7e2d59b61ac2deb7884edf4d",extracted:null(string),extracted_cutoff:null(bool),extracted_size:null(uint64)}
{_path:"http",ts:2017-03-24T19:59:23.306076Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},trans_depth:1(uint64),method:"GET",host:"lc68.dsr.livefyre.com",uri:"/livecountping/98236508/663740012478?__=bye2msvgpn91&routed=1&jid&siteId=366439&networkId=livefyre.com",referrer:"http://www.asigra.com/blog/what-multi-tenancy-how-secure-it",version:"1.1",user_agent:"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",request_body_len:0(uint64),response_body_len:35(uint64),status_code:200(uint64),status_msg:"OK",info_code:null(uint64),info_msg:null(string),tags:|[]|(|[zenum=string]|),username:null(string),password:null(string),proxied:null(|[string]|),orig_fuids:null([string]),orig_filenames:null([string]),orig_mime_types:null([string]),resp_fuids:["F4NAir1FuTH6Oax2J4"],resp_filenames:null([string]),resp_mime_types:["text/json"]}
{_path:"conn",ts:2017-03-24T19:59:23.231562Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},proto:"tcp"(=zenum),service:"http",duration:182.422ms,orig_bytes:438(uint64),resp_bytes:588(uint64),conn_state:"SF",local_orig:null(bool),local_resp:null(bool),missed_bytes:0(uint64),history:"ShADadfF",orig_pkts:5(uint64),orig_ip_bytes:650(uint64),resp_pkts:5(uint64),resp_ip_bytes:800(uint64),tunnel_parents:null(|[string]|)}

real	0m18.229s
user	0m0.042s
sys	0m0.047s

@philrz philrz changed the title from "When using the from * command, the speed of reading the hard drive decreases" to "Searching multiple pools with "from *" is much slower" Mar 14, 2024
@SoftTools59654
Author

Thank you for the thorough write-up.

This problem also exists during import and export:

The maximum hard drive read speed is about 120 MB/s out of a possible 2 GB/s.
In most cases, CPU usage is below 40% of total capacity.
RAM usage grows gradually until it occupies all of the available memory.

All of this increases import and export times.

@SoftTools59654
Author

Hello,
Has there been any solution to this problem?
It is really a big problem.

@philrz
Contributor

philrz commented Jul 2, 2024

@SoftTools59654: No, there's not yet been a solution here. While we understand the performance hit is significant, it turns out to be a very difficult problem to solve. One of our developers has recently revisited it and is working on designing an approach. We'll update here when there's progress to report.

@mattnibs mattnibs removed their assignment Jul 3, 2024
mattnibs added a commit that referenced this issue Jul 9, 2024
@philrz
Contributor

philrz commented Jul 11, 2024

Verified in Zed commit 6e61673.

Using the same MacBook, data set, and query that I used above for the 11+ minute repro, the same query now completes in just over 15 seconds. While it's not guaranteed that the software will always be able to leverage all available CPU, in this case I can see that all my cores were near 100% busy during the query.

$ zed -version
Version: v1.16.0-15-g6e61673c

$ time zed query 'from * | CuYcyl29a2g8pYV1s'
{_path:"files",ts:2017-03-24T19:59:23.399396Z,fuid:"F4NAir1FuTH6Oax2J4",tx_hosts:|[184.72.208.63]|,rx_hosts:|[10.10.7.65]|,conn_uids:|["CuYcyl29a2g8pYV1s"]|,source:"HTTP",depth:0(uint64),analyzers:|["MD5","SHA1","SHA256"]|,mime_type:"text/json",filename:null(string),duration:0s,local_orig:null(bool),is_orig:false,seen_bytes:35(uint64),total_bytes:null(uint64),missing_bytes:0(uint64),overflow_bytes:0(uint64),timedout:false,parent_fuid:null(string),md5:"dd82bbeb20a441bf55e62d42255de7f4",sha1:"90b01d6f0c6891651e8a729892e92592dd581762",sha256:"da1791b1b8361af6383790c6e77ef6eb149b22ed7e2d59b61ac2deb7884edf4d",extracted:null(string),extracted_cutoff:null(bool),extracted_size:null(uint64)}
{_path:"http",ts:2017-03-24T19:59:23.306076Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},trans_depth:1(uint64),method:"GET",host:"lc68.dsr.livefyre.com",uri:"/livecountping/98236508/663740012478?__=bye2msvgpn91&routed=1&jid&siteId=366439&networkId=livefyre.com",referrer:"http://www.asigra.com/blog/what-multi-tenancy-how-secure-it",version:"1.1",user_agent:"Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",request_body_len:0(uint64),response_body_len:35(uint64),status_code:200(uint64),status_msg:"OK",info_code:null(uint64),info_msg:null(string),tags:|[]|(|[zenum=string]|),username:null(string),password:null(string),proxied:null(|[string]|),orig_fuids:null([string]),orig_filenames:null([string]),orig_mime_types:null([string]),resp_fuids:["F4NAir1FuTH6Oax2J4"],resp_filenames:null([string]),resp_mime_types:["text/json"]}
{_path:"conn",ts:2017-03-24T19:59:23.231562Z,uid:"CuYcyl29a2g8pYV1s",id:{orig_h:10.10.7.65,orig_p:64135(port=uint16),resp_h:184.72.208.63,resp_p:80(port)},proto:"tcp"(=zenum),service:"http",duration:182.422ms,orig_bytes:438(uint64),resp_bytes:588(uint64),conn_state:"SF",local_orig:null(bool),local_resp:null(bool),missed_bytes:0(uint64),history:"ShADadfF",orig_pkts:5(uint64),orig_ip_bytes:650(uint64),resp_pkts:5(uint64),resp_ip_bytes:800(uint64),tunnel_parents:null(|[string]|)}

real	0m15.383s
user	0m0.031s
sys	0m0.021s

I'd not studied non-search operations like count() before, but this is similarly fast now.

$ zed -version
Version: v1.16.0-15-g6e61673c

$ time zed query 'from * | count()'
573070060(uint64)

real	0m19.663s
user	0m0.028s
sys	0m0.019s

I'd not baselined count() before because this issue was opened specifically regarding search. But stepping back to commit 6b4afe5, which was right before these changes merged, I can see that this operation was not performing as badly as the search did, though the change definitely still helped.

$ zed -version
Version: v1.16.0-12-g6b4afe50

$ time zed query 'from * | count()'
573070060(uint64)

real	1m7.218s
user	0m0.043s
sys	0m0.029s

This fix is available now for those willing to compile and run zed off the tip of main. It will also be in the next GA Zui and Zed releases, which I hope to assemble in the next few days. The fix will be mentioned in the Zed release notes.
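
For anyone who wants to try it before the release, one way to build zed from the tip of main is sketched below (an assumption on my part that a recent Go toolchain is available; adjust to your environment):

$ go install github.com/brimdata/zed/cmd/zed@main   # installs into $(go env GOPATH)/bin
$ zed -version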

Thanks @mattnibs!
