make more machine readable #36

Closed
josch opened this Issue Dec 13, 2014 · 12 comments

josch commented Dec 13, 2014

Hi,

Correct me if I'm wrong, but as far as I can see, the only machine-readable interface in codesearch is the JSON results listing the packages containing a match for the current query. But obtaining this JSON data requires parsing the HTML of the result page. It would be nice if a query could return either the link to the JSON or the JSON directly.

Contributor

stapelberg commented Dec 13, 2014

We sure can talk about this, but first and foremost I’d like to know your use-case? :)

FedericoCeratto commented Jan 10, 2015

@stapelberg I'm trying to put together a simple CLI tool to perform searches like:
./codesearch_cli.py 'foo path:debian/watch'

Fetching the JSON result pages is straightforward; the tricky part is getting the queryid in the first place. @josch, are you planning to do the same?

josch commented Jan 10, 2015

@FedericoCeratto I explained my use case to Michael in a private email, which is why it's not listed here. Let me give a quick overview.

For detecting dpkg trigger cycles I need a service which provides the contents of binary control archives in a machine-readable way. DCS currently neither supplies the contents of binary control files nor provides the queryid in a machine-readable way, which is why I ended up setting up binarycontrol.debian.net to have something working now. I'd like to move the functionality of this service to codesearch.debian.net, though.

As far as the required functionality goes, the dpkg trigger cycle detection code needs:

  • finding binary packages with a specific file (DEBIAN/triggers)
  • limiting the results by suite (testing, unstable)
  • providing the contents of certain files in control metadata of a specific binary package

The binarycontrol.debian.net service currently has a simple GET interface returning plain text lists. As long as I can do searches in a machine readable way (a codesearch_cli.py script seems to do exactly that) I'll be happy.

FedericoCeratto commented Jan 29, 2015

@stapelberg I'm still unsure how to fetch the queryid value; without that, the only viable option is HTML scraping, which is not ideal and might create additional load on the service.
Any suggestions?

Contributor

stapelberg commented Jan 29, 2015

@FedericoCeratto Hm, thinking about it, I think a command-line client should also just use the websocket interface. That way it gets streaming progress updates and results as they are available (which it can discard in the first version, but possibly preview in a later version, if desired).

What do you think?

FedericoCeratto commented Jan 29, 2015

@stapelberg Progress updates in a CLI tool seem a bit unnecessary: the wait time is usually quite short, and the user might want to pipe the output into another tool (less, awk, grep) without receiving any lines other than the matching files.
Anyhow, is there documentation on how to use the websocket interface?

Contributor

stapelberg commented Jan 29, 2015

@FedericoCeratto It’d be common to provide status updates on stderr (like curl and wget do) so that stdout is the actual output.

There is no documentation on the websocket interface yet (and don’t hold your breath), so you’ll need to look at static/instant.js for that (and/or the server side, and what your browser receives).
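
A minimal sketch of the stderr/stdout split suggested above, for illustration only. The function names and the output format are hypothetical and are not taken from codesearch or from any existing client.

```python
import sys


def report_progress(files_processed, files_total, stream=None):
    """Write a status line to stderr so it never ends up in a pipe."""
    if stream is None:
        stream = sys.stderr
    print(f"searched {files_processed}/{files_total} files", file=stream)


def emit_result(path, line_number, text, stream=None):
    """Write one match to stdout; this is the tool's actual output."""
    if stream is None:
        stream = sys.stdout
    print(f"{path}:{line_number}:{text}", file=stream)


if __name__ == "__main__":
    # Hypothetical values, only to show which stream each line goes to.
    report_progress(10, 40)
    emit_result("foo/debian/watch", 3, "version=3")
```

With this split, piping the tool into `grep` or `less` only sees the matches, while the status lines still show up in the terminal.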

FedericoCeratto commented Jan 29, 2015

@stapelberg I have a basic version working at https://github.com/FedericoCeratto/debian-code-search-cli - I'm simply consuming all the available data from the websocket stream and then running the JSON queries. Is there a way to tell whether the search has completed before I start consuming data from the websocket? Thank you!

Contributor

stapelberg commented Jan 31, 2015

@FedericoCeratto You need to consume all data. All messages contain a “Type” member, except for the search results themselves. If the Type is “progress”, the query is completed if and only if FilesProcessed == FilesTotal. After that event, you can start fetching the JSON files.
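
The completion rule described above could be sketched roughly as follows. This assumes each websocket message arrives as a JSON-encoded object, which the thread does not state explicitly. `Type`, `FilesProcessed` and `FilesTotal` are the member names given in the comment; the function names and the sample result field are hypothetical.

```python
import json


def is_search_complete(message):
    """True only for a progress message whose two counters are equal."""
    if message.get("Type") != "progress":
        return False
    if "FilesProcessed" not in message or "FilesTotal" not in message:
        return False
    return message["FilesProcessed"] == message["FilesTotal"]


def consume(raw_messages):
    """Read every message; return (search results, whether the query finished)."""
    results = []
    completed = False
    for raw in raw_messages:
        message = json.loads(raw)
        if "Type" not in message:
            # Messages without a "Type" member are the search results themselves.
            results.append(message)
        elif is_search_complete(message):
            completed = True
    return results, completed


if __name__ == "__main__":
    # Hand-written sample messages; "path" is a made-up result field.
    sample = [
        json.dumps({"Type": "progress", "FilesProcessed": 1, "FilesTotal": 3}),
        json.dumps({"path": "foo/debian/watch"}),
        json.dumps({"Type": "progress", "FilesProcessed": 3, "FilesTotal": 3}),
    ]
    results, completed = consume(sample)
    print(len(results), completed)
```

Here `raw_messages` is any iterable of message strings, so the same logic can be fed from whichever websocket client library is used. Once `completed` is true, the client can start fetching the JSON result files.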

FedericoCeratto commented Mar 16, 2015

Thanks for your help - the script seems to be working now!

Contributor

stapelberg commented Sep 17, 2015

@josch Sorry for dropping the ball for so long on this issue. With the tool that Federico implemented, is there still anything left to be done, or can we close this issue now?

josch commented Sep 17, 2015

Yes. Thank you!

@josch josch closed this Sep 17, 2015
