make more machine readable #36
Comments
We sure can talk about this, but first and foremost I’d like to know your use case. :)
@stapelberg I'm trying to put together a simple CLI tool to perform searches like: Fetching the JSON result pages is straightforward; the tricky part is getting the queryid in the first place. @josch are you planning to do the same?
@FedericoCeratto I explained my use case to Michael in a private email, which is why it's not listed here. Let me give a quick overview. For detecting dpkg trigger cycles I need a service which provides the contents of binary control archives in a machine-readable way. DCS currently neither supplies the contents of binary control files nor provides the queryid in a machine-readable way, which is why I ended up setting up binarycontrol.debian.net to have something working now. I'd like to move the functionality of this service to codesearch.debian.net though. As far as the required functionality goes, the dpkg trigger cycle detection code needs:
The binarycontrol.debian.net service currently has a simple GET interface returning plain text lists. As long as I can do searches in a machine-readable way (a codesearch_cli.py script seems to do exactly that) I'll be happy.
@stapelberg I'm still unsure how to fetch the queryid value; without it, the only viable option is HTML scraping, which is not ideal and might create additional load on the service.
@FedericoCeratto Hm, thinking about it, I think a command-line client should also just use the websocket interface. That way it gets streaming progress updates and results as they are available (which it can discard in the first version, but possibly preview in a later version, if desired). What do you think?
@stapelberg Progress updates in a CLI tool seem a bit unnecessary: the wait time is usually quite short. Also, the user might want to pipe the output into another tool, like less, awk or grep, without receiving any additional lines other than the matching files.
@FedericoCeratto It’d be common to provide status updates on stderr (like curl and wget do) so that stdout is the actual output. There is no documentation on the websocket interface yet (and don’t hold your breath), so you’ll need to look at static/instant.js for that (and/or the server side, and what your browser receives).
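To illustrate the stderr/stdout split suggested above, here is a minimal Python sketch. The function names `report_progress` and `emit_match` and the message wording are hypothetical and are not taken from any existing client.

```python
import sys


def report_progress(files_processed, files_total, stream=None):
    """Write a status line to stderr so it never mixes with the results."""
    # Hypothetical helper: resolve the stream at call time, not import time.
    stream = sys.stderr if stream is None else stream
    print(f"progress: {files_processed}/{files_total} files", file=stream)


def emit_match(line, stream=None):
    """Write one result line to stdout, the only stream a pipe will see."""
    stream = sys.stdout if stream is None else stream
    print(line, file=stream)
```

With this split, piping the tool into less, awk or grep passes on only the lines written by `emit_match`, while the progress lines still show up on the terminal.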
@stapelberg I have a basic version working at https://github.com/FedericoCeratto/debian-code-search-cli - I'm simply consuming all the available data from the websocket stream and then running the JSON queries. Is there a way to tell whether the search has completed before I start consuming data from the websocket? Thank you!
@FedericoCeratto You need to consume all data. All messages contain a “Type” member, except for the search results themselves. If the Type is “progress”, the query is completed if and only if FilesProcessed == FilesTotal. After that event, you can start fetching the JSON files.
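The completion rule described above could be sketched in Python roughly as follows. Only `Type`, `FilesProcessed` and `FilesTotal` come from this thread; the function names and the shape of the result messages (`Path`, `Line` in the usage note) are assumptions for illustration, and the websocket transport is left out so the sketch works on any iterable of raw JSON strings.

```python
import json


def is_search_complete(message):
    """True only for a progress message with FilesProcessed == FilesTotal."""
    if message.get("Type") != "progress":
        return False
    if "FilesProcessed" not in message or "FilesTotal" not in message:
        return False
    return message["FilesProcessed"] == message["FilesTotal"]


def consume_stream(raw_messages):
    """Consume every message; collect results and note whether the query finished.

    Messages without a "Type" member are treated as search results.
    """
    results = []
    completed = False
    for raw in raw_messages:
        message = json.loads(raw)
        if "Type" not in message:
            results.append(message)
        elif is_search_complete(message):
            completed = True
    return results, completed
```

For example, feeding it `'{"Type": "progress", "FilesProcessed": 3, "FilesTotal": 3}'` after a result such as `'{"Path": "a.c", "Line": 1}'` yields that one result and `completed` set to true, at which point the JSON files can be fetched.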
Thanks for your help - the script seems to be working now!
@josch Sorry for dropping the ball for so long on this issue. With the tool that Federico implemented, is there still anything left to be done, or can we close this issue now?
Yes. Thank you!
Hi,
Correct me if I'm wrong, but as far as I can see, the only machine-readable interface in codesearch is the JSON results listing the packages containing a match for the current query. But obtaining this JSON data requires parsing the HTML of the result page. It would be nice if it were possible to make a query which returned either the link to the JSON or the JSON directly.