make more machine readable #36

josch · 2014-12-13T06:36:52Z

Hi,

Correct me if I'm wrong, but as far as I can see, the only machine-readable interface in codesearch is the JSON result listing the packages that contain a match for the current query. Obtaining this JSON data, however, requires parsing the HTML of the result page. It would be nice if it were possible to make a query that returned either the link to the JSON or the JSON directly.

stapelberg · 2014-12-13T17:01:47Z

We sure can talk about this, but first and foremost I’d like to know your use-case? :)

FedericoCeratto · 2015-01-10T20:13:52Z

@stapelberg I'm trying to put together a simple CLI tool to perform searches like:
./codesearch_cli.py 'foo path:debian/watch'

Fetching the JSON result pages is straightforward; the tricky part is getting the queryid in the first place. @josch, are you planning to do the same?

josch · 2015-01-10T20:24:10Z

@FedericoCeratto I explained my use case to Michael in a private email, which is why it's not listed here. Let me give a quick overview.

For detecting dpkg trigger cycles, I need a service that provides the contents of binary control archives in a machine-readable way. DCS currently neither supplies the contents of binary control files nor provides the queryid in a machine-readable way, which is why I ended up setting up binarycontrol.debian.net to have something working now. I'd like to move the functionality of this service to codesearch.debian.net, though.

As far as the required functionality goes, the dpkg trigger cycle detection code needs:

- finding binary packages with a specific file (DEBIAN/triggers)
- limiting the results by suite (testing, unstable)
- providing the contents of certain files in the control metadata of a specific binary package

The binarycontrol.debian.net service currently has a simple GET interface returning plain-text lists. As long as I can do searches in a machine-readable way (a codesearch_cli.py script seems to do exactly that), I'll be happy.

FedericoCeratto · 2015-01-29T00:28:24Z

@stapelberg I'm still unsure how to fetch the queryid value; without it, the only viable option is HTML scraping, which is fragile and might create additional load on the service.
Any suggestions?

stapelberg · 2015-01-29T07:27:17Z

@FedericoCeratto Hm, thinking about it, I think a command-line client should also just use the websocket interface. That way it gets streaming progress updates and results as they are available (which it can discard in the first version, but possibly preview in a later version, if desired).

What do you think?

FedericoCeratto · 2015-01-29T12:37:26Z

@stapelberg Progress updates in a CLI tool seem a bit unnecessary: the wait time is usually quite short, and the user might want to pipe the output into another tool, like less, awk or grep, without receiving any lines other than the matching files.
Anyhow, is there documentation on how to use the websocket interface?

stapelberg · 2015-01-29T18:29:56Z

@FedericoCeratto It’d be common to provide status updates on stderr (like curl and wget do) so that stdout is the actual output.
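A minimal sketch of that stdout/stderr split in Python, assuming each websocket message has already been decoded from JSON into a dict. The `Type`, `FilesProcessed` and `FilesTotal` members are the ones described later in this thread; the `Path` field printed for results is a hypothetical placeholder, since the result format is not documented here.

```python
import sys
from typing import Optional, TextIO


def report(msg: dict, out: Optional[TextIO] = None, err: Optional[TextIO] = None) -> None:
    """Route one decoded message: progress to stderr, results to stdout."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if msg.get("Type") == "progress":
        # Status goes to stderr (as curl and wget do), so `| grep` never sees it.
        err.write(f"{msg.get('FilesProcessed')}/{msg.get('FilesTotal')} files\n")
    elif "Type" not in msg:
        # Results carry no "Type" member; "Path" is a hypothetical field name.
        out.write(f"{msg.get('Path')}\n")
    # Any other message type is ignored in this sketch.
```

With this split, `./codesearch_cli.py 'foo path:debian/watch' | less` shows only results, while progress stays visible on the terminal.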

There is no documentation on the websocket interface yet (and don’t hold your breath), so you’ll need to look at static/instant.js for that (and/or the server side, and what your browser receives).

FedericoCeratto · 2015-01-29T22:00:07Z

@stapelberg I have a basic version working at https://github.com/FedericoCeratto/debian-code-search-cli. I'm simply consuming all the available data from the websocket stream and then starting to run the JSON queries. Is there a way to tell whether the search has completed before I start consuming data from the websocket? Thank you!

stapelberg · 2015-01-31T10:18:06Z

@FedericoCeratto You need to consume all data. All messages contain a “Type” member, except for the search results themselves. If the Type is “progress”, the query is completed if and only if FilesProcessed == FilesTotal. After that event, you can start fetching the JSON files.
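The completion rule above can be sketched in Python as follows. This assumes each websocket frame arrives as a JSON string. Following the advice to consume all data, the loop never breaks early; it only records whether a finished "progress" message was seen. The `Path` key in the usage example is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def consume_messages(raw_messages: Iterable[str]) -> Tuple[List[dict], bool]:
    """Split websocket frames into search results and a completion flag.

    Assumes every frame is a JSON object. Objects without a "Type" member
    are search results; a "progress" message with
    FilesProcessed == FilesTotal marks the query as completed.
    """
    results: List[dict] = []
    completed = False
    for raw in raw_messages:  # consume everything, do not stop early
        msg = json.loads(raw)
        msg_type = msg.get("Type")
        if msg_type is None:
            results.append(msg)  # a search result
        elif msg_type == "progress":
            processed = msg.get("FilesProcessed")
            total = msg.get("FilesTotal")
            if processed is not None and processed == total:
                completed = True  # now it is safe to fetch the JSON files
    return results, completed
```

For example, feeding it a partial progress message, one result such as `{"Path": "foo/debian/watch"}`, and a final progress message with equal counts returns that single result and `True`.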

FedericoCeratto · 2015-03-16T23:33:57Z

Thanks for your help - the script seems to be working now!

stapelberg · 2015-09-17T18:36:58Z

@josch Sorry for dropping the ball for so long on this issue. With the tool that Federico implemented, is there still anything left to be done, or can we close this issue now?

josch · 2015-09-17T19:24:50Z

Yes. Thank you!

josch closed this as completed Sep 17, 2015
