Show filename when searching across multiple files with zq #5202

Open
philrz opened this issue Aug 5, 2024 · 0 comments

tl;dr

In a directory containing multiple files ending in .log, a user executes a search like:

zq -i line '"test"' *.log

Alongside each search result the user would like a way to also display the name of the file each result came from.

Details

Repro is with Zed commit c39086b.

This issue was originally surfaced in a community Slack thread. In the user's own words:

Is there a way when searching across a glob pattern for multiple files in a directory such as *.log to have the file the search result came from also listed? For example zq -i line '"test"' *.log

similar to grep -rnwio . -e "test" which would list the file and the containing string. I had a thought that using from might have gotten me there but not the right usage.

@mattnibs acknowledged to the user that we don't currently have a way of doing this and recommended using shell tools to bridge the gap for now. For example, using the zed-sample-data, this shows the baseline problem, where the matching line is returned with no indication of which file it came from:

$ zq -version
Version: v1.17.0-11-gc39086ba

$ zq -i line '"thinkwithgoogle"' *.log.gz
"1521912845.237311\t144c918fa2aca4461d3535a237d311cb5102c1919096e0fa9b73ab95af4876fc\t3\t08434F2704007BF2\tCN=*.appspot.com,O=Google Inc,L=Mountain View,ST=California,C=US\tCN=Google Internet Authority G3,O=Google Trust Services,C=US\t1520451204.000000\t1527706320.000000\trsaEncryption\tsha256WithRSAEncryption\trsa\t2048\t65537\t-\t*.appspot.com,*.thinkwithgoogle.com,*.withgoogle.com,*.withyoutube.com,appspot.com,thinkwithgoogle.com,withgoogle.com,withyoutube.com\t-\t-\t-\tF\t-\tT\tF"

And here's the recommended approach from @mattnibs working as intended:

$ find . -name "*.log.gz"  | xargs -I {} zq -i line '"thinkwithgoogle" | {file:"{}",value:this}' {}
{file:"./x509.log.gz",value:"1521912845.237311\t144c918fa2aca4461d3535a237d311cb5102c1919096e0fa9b73ab95af4876fc\t3\t08434F2704007BF2\tCN=*.appspot.com,O=Google Inc,L=Mountain View,ST=California,C=US\tCN=Google Internet Authority G3,O=Google Trust Services,C=US\t1520451204.000000\t1527706320.000000\trsaEncryption\tsha256WithRSAEncryption\trsa\t2048\t65537\t-\t*.appspot.com,*.thinkwithgoogle.com,*.withgoogle.com,*.withyoutube.com,appspot.com,thinkwithgoogle.com,withgoogle.com,withyoutube.com\t-\t-\t-\tF\t-\tT\tF"}

The user confirmed this solution should be workable for now.
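
For repeated use, the same workaround can be wrapped in a small shell function. The following is only a sketch of that idea, not something offered in the original thread: the name search_with_filename is hypothetical, and it assumes the search term and filenames contain no double quotes or backslashes, since both are spliced directly into the Zed query string. It runs zq once per file, just as the xargs version does.

```shell
# Hypothetical helper (not part of zq): run one zq search per file and
# wrap each hit in a record that carries the name of the file it came from.
# Assumes neither the search term nor the filenames contain double quotes
# or backslashes, because both are interpolated into the query text.
search_with_filename() {
  pattern=$1
  shift
  for f in "$@"; do
    # Skip arguments that do not exist, e.g. an unmatched glob.
    [ -e "$f" ] || continue
    zq -i line "\"$pattern\" | {file:\"$f\",value:this}" "$f"
  done
}

# Example invocation (commented out so that sourcing this file runs nothing):
# search_with_filename thinkwithgoogle *.log.gz
```

Compared with the find/xargs pipeline, the function passes each filename to zq as a separate quoted argument, so names containing spaces are handled, and the shell expands the glob instead of find.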

In terms of how we might address this more directly in the future, @mattnibs offered the following thoughts:

Maybe a solution would be to add globbing to the file source operator, much as we do for pool sources, and then have some flag that lets you put the source name/details on each value produced from the source.

When talking about decorating each value in the source maybe you have a -each flag that accepts a function where the first argument is the this value and the second is an info record describing the source and the result of the function would be the new value. So you could do something like:

func describe(value, info): (
 { value, info }
)
file -each=describe *.log

Discussing this reminded us all of another issue, brimdata/zui#2931, where a user asked about doing something similar with from * and wanted to see the name of the pool each result came from.
