make more machine readable #36

Closed
josch opened this issue Dec 13, 2014 · 12 comments

Comments

@josch

josch commented Dec 13, 2014

Hi,

Correct me if I'm wrong, but as far as I can see, the only machine-readable interface in codesearch is the JSON results listing the packages that contain a match for the current query. Obtaining this JSON data, however, requires parsing the HTML of the result page. It would be nice if a query could return either the link to the JSON or the JSON itself.

@stapelberg
Contributor

We can certainly talk about this, but first and foremost: what is your use case? :)

@FedericoCeratto

@stapelberg I'm trying to put together a simple CLI tool to perform searches like:
./codesearch_cli.py 'foo path:debian/watch'

Fetching the JSON result pages is straightforward; the tricky part is getting the queryid in the first place. @josch, are you planning to do the same?

@josch
Author

josch commented Jan 10, 2015

@FedericoCeratto I explained my use case to Michael in a private email, which is why it's not listed here. Let me give a quick overview.

For detecting dpkg trigger cycles, I need a service that provides the contents of binary control archives in a machine-readable way. DCS currently neither supplies the contents of binary control files nor provides the queryid in a machine-readable way, which is why I ended up setting up binarycontrol.debian.net to have something working now. I'd like to move the functionality of this service to codesearch.debian.net, though.

As far as the required functionality goes, the dpkg trigger cycle detection code needs:

  • finding binary packages with a specific file (DEBIAN/triggers)
  • limiting the results by suite (testing, unstable)
  • providing the contents of certain files in control metadata of a specific binary package

The binarycontrol.debian.net service currently has a simple GET interface returning plain-text lists. As long as I can do searches in a machine-readable way (a codesearch_cli.py script seems to do exactly that), I'll be happy.

@FedericoCeratto

@stapelberg I'm still unsure how to fetch the queryid value; without it, the only viable option is HTML scraping, which is not ideal and might create additional load on the service.
Any suggestions?

@stapelberg
Contributor

@FedericoCeratto Hm, thinking about it, I think a command-line client should also just use the websocket interface. That way it gets streaming progress updates and results as they are available (which it can discard in the first version, but possibly preview in a later version, if desired).

What do you think?

@FedericoCeratto

@stapelberg progress updates in a CLI tool seem a bit unnecessary: the wait time is usually quite short, and the user might want to pipe the output into another tool, such as less, awk, or grep, without receiving any lines other than the matching files.
Anyhow, is there documentation on how to use the websocket interface?

@stapelberg
Contributor

@FedericoCeratto It’d be common to provide status updates on stderr (like curl and wget do) so that stdout is the actual output.

There is no documentation on the websocket interface yet (and don’t hold your breath), so you’ll need to look at static/instant.js for that (and/or the server side, and what your browser receives).
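
For illustration, the stderr/stdout split mentioned above could be sketched in Python roughly as follows. The function names and the output formats are hypothetical and are not taken from any existing client; the only point is that status lines go to stderr while matches go to stdout, so that piping into less, awk, or grep sees only the matches.

```python
import sys


def format_progress(files_processed, files_total):
    """Build a one-line status string (hypothetical format)."""
    return f"progress: {files_processed}/{files_total} files"


def report_progress(files_processed, files_total, stream=None):
    """Write a status update to stderr so that stdout stays clean."""
    target = sys.stderr if stream is None else stream
    print(format_progress(files_processed, files_total), file=target)


def emit_match(path, line_number, stream=None):
    """Write one matching location to stdout (hypothetical format)."""
    target = sys.stdout if stream is None else stream
    print(f"{path}:{line_number}", file=target)
```

With this split, running such a tool as `tool query | grep foo` would still show progress on the terminal while grep receives only the match lines.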

@FedericoCeratto

@stapelberg I have a basic version working at https://github.com/FedericoCeratto/debian-code-search-cli - I'm simply consuming all the available data from the websocket stream and then running JSON queries. Is there a way to tell whether the search has completed before I start consuming data from the websocket? Thank you!

@stapelberg
Contributor

@FedericoCeratto You need to consume all data. All messages contain a “Type” member, except for the search results themselves. If the Type is “progress”, the query is completed if and only if FilesProcessed == FilesTotal. After that event, you can start fetching the JSON files.
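
A minimal sketch of the completion rule described above, in Python. It assumes each websocket message arrives as a JSON object in a string, and it deliberately leaves out the websocket connection itself, since the endpoint is not given in this thread. Only `Type`, `FilesProcessed`, and `FilesTotal` come from the description; the function names are hypothetical.

```python
import json


def is_search_complete(message):
    """True for a progress message in which FilesProcessed == FilesTotal."""
    return (
        message.get("Type") == "progress"
        and "FilesTotal" in message
        and message.get("FilesProcessed") == message["FilesTotal"]
    )


def consume_until_complete(raw_messages):
    """Read messages until the completing progress event is seen.

    Messages without a "Type" member are treated as search results and
    collected. Returns (completed, results); completed is False if the
    stream ended before the completing progress event arrived.
    """
    results = []
    for raw in raw_messages:
        message = json.loads(raw)
        if "Type" not in message:
            results.append(message)
        elif is_search_complete(message):
            return True, results
    return False, results
```

Once `consume_until_complete` reports completion, a client could go on to fetch the JSON result files; messages of any other type are simply skipped in this sketch.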

@FedericoCeratto

Thanks for your help - the script seems to be working now!

@stapelberg
Contributor

@josch Sorry for dropping the ball for so long on this issue. With the tool that Federico implemented, is there still anything left to be done, or can we close this issue now?

@josch
Author

josch commented Sep 17, 2015

Yes. Thank you!

@josch josch closed this as completed Sep 17, 2015

3 participants