"papertrail-archives" archive file downloader #58

Open
troy opened this issue Nov 5, 2015 · 0 comments

troy commented Nov 5, 2015

Right now, there's no built-in way to bulk-download many archive files at once; the current method is a shell script. This issue is to create a new papertrail-archives CLI program which downloads either the newest N archive files or all archives before, after, or between given dates/times.

Papertrail has an API endpoint at /api/v1/archives.json which returns this response. It uses the same authentication as the other API endpoint.

See bin/papertrail-add-group and lib/papertrail/cli_add_group.rb for examples of how to structure a new standalone command and supporting library. This will be bin/papertrail-archives and something like lib/papertrail/cli_archives.rb.

Arguments

It requires at least 1 of these 3 arguments (--newest cannot be combined with the other two, which may be used together):

--newest N
--min-time MIN
--max-time MAX

Examples:

papertrail-archives --newest 15
papertrail-archives --min-time '2015-01-15 00:00:00' 
papertrail-archives --min-time '2015-01-15 00:00:00' --max-time '2015-01-19'

Date parsing

Use the same behavior as the min-time and max-time arguments to the main papertrail command. That is, call parse_time on the CLI input and then pass those return values as API query params.

The client doesn't need to do any time comparisons nor anything beyond what's already in the codebase.
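
As a rough sketch of that flow: the CLI strings go through a time parser and the results become query params. The real code would call the existing parse_time; parse_time_stand_in below is a hypothetical substitute built on Time.parse so the example runs on its own, and sending epoch seconds (to_i) is an assumption about what the API expects.

```ruby
require 'time'

# Hypothetical stand-in for the codebase's parse_time helper.
# The real helper may accept more formats and return something different.
def parse_time_stand_in(string)
  Time.parse(string)
end

# Builds the query params for /api/v1/archives.json from CLI options.
# Only the keys the user actually supplied are included.
def archive_query_params(options)
  params = {}
  # Assumption: the server wants epoch seconds for min_time / max_time.
  params[:min_time] = parse_time_stand_in(options[:min_time]).to_i if options[:min_time]
  params[:max_time] = parse_time_stand_in(options[:max_time]).to_i if options[:max_time]
  params
end
```

No comparison happens on the client; the values are only parsed and forwarded.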

API client

This will entail a new lib/archives.rb for doing the API query (very similar to lib/search_query.rb), and probably a new lib/archive.rb model (very similar to lib/event.rb).

Note: The server doesn't currently honor min_time and max_time query parameters. Those will be added before this is released.

Behavior

  • If newest is given, hit the API target with no params and download the first n elements/files in the response array
  • If min-time and/or max-time are given, pass min_time and/or max_time to the server and download all resulting elements/files. The server will handle the time constraints.
  • If newest and either of the other args are given, or no arguments are given, refuse to run
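
The argument rules above could be enforced with something like the following. This is only a sketch using Ruby's standard OptionParser; parse_archive_options is a hypothetical name, and the real command may structure its option handling differently (see cli_add_group.rb).

```ruby
require 'optparse'

# Hypothetical helper: parses argv and applies the rules from this issue.
# Raises ArgumentError when the combination of arguments is not allowed.
def parse_archive_options(argv)
  options = {}
  OptionParser.new do |opts|
    opts.on('--newest N', Integer, 'Download the newest N archives') { |n| options[:newest] = n }
    opts.on('--min-time MIN', 'Earliest time to download') { |t| options[:min_time] = t }
    opts.on('--max-time MAX', 'Latest time to download') { |t| options[:max_time] = t }
  end.parse(argv)

  time_given = options.key?(:min_time) || options.key?(:max_time)
  if options.key?(:newest) && time_given
    raise ArgumentError, '--newest cannot be combined with --min-time or --max-time'
  elsif !options.key?(:newest) && !time_given
    raise ArgumentError, 'one of --newest, --min-time, or --max-time is required'
  end
  options
end
```

The bin script would rescue ArgumentError, print the message, and exit non-zero.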

In the API response, the URL to download a given archive file is in the download href hypermedia URL - example.
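
For illustration, picking the files to fetch out of the response might look like this. The JSON shape is an assumption: an array of objects, each with a filename and a download link nested under _links.download.href. archive_downloads is a hypothetical helper name.

```ruby
require 'json'

# Returns [{ filename:, url: }, ...] for the archives to download.
# Assumption: the response body is a JSON array and each element carries
# "filename" and "_links" => { "download" => { "href" => ... } }.
def archive_downloads(json_body, newest: nil)
  archives = JSON.parse(json_body)
  # With --newest N, keep only the first N elements of the response array.
  archives = archives.first(newest) if newest
  archives.map do |archive|
    { filename: archive['filename'], url: archive.dig('_links', 'download', 'href') }
  end
end
```

In the min-time/max-time case, newest stays nil and every element the server returned is downloaded.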

Downloads

Files may be large. I propose adding something like this as a new HttpClient.download method, since it only uses Net::HTTP but writes the files gradually.
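
A minimal sketch of the gradual write, using only Net::HTTP. The method names and the headers parameter are hypothetical, not the eventual HttpClient.download signature. The point is that read_body with a block yields the body in chunks, so the whole file never sits in memory.

```ruby
require 'net/http'
require 'uri'

# Writes a response body to disk chunk by chunk.
# Works with any object whose read_body yields string chunks.
def write_body_to_file(response, path)
  File.open(path, 'wb') do |file|
    response.read_body { |chunk| file.write(chunk) }
  end
  path
end

# Hypothetical download method: streams url to path.
# headers would carry whatever authentication the other API calls use.
def stream_download(url, path, headers = {})
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # The block form hands over the response before the body is read.
    http.request(Net::HTTP::Get.new(uri, headers)) do |response|
      response.value # raises unless the status is 2xx
      write_body_to_file(response, path)
    end
  end
end
```

As written, this does not follow redirects; a 3xx response makes response.value raise.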

Writing files

  • Write to current directory using the filename API response attribute. The filename values are unique.
  • Output a line each time a file completes. We'll figure out what it says later, but it'll probably just be the datestamp.
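
Roughly, the write loop could be as small as this. save_archives is a hypothetical name, the actual transfer is left to the caller's block, and the printed line is a placeholder until the output format is decided.

```ruby
# Saves each archive into dir under its API-provided filename and prints
# one line per completed file. The block performs the actual download.
def save_archives(downloads, dir: Dir.pwd, out: $stdout)
  downloads.each do |download|
    # basename guards against a filename that contains path separators.
    path = File.join(dir, File.basename(download[:filename]))
    yield download[:url], path
    out.puts download[:filename] # placeholder output
  end
end
```

Because the filename values are unique, no collision handling is sketched here.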

Notes

  • All of the download URLs use at least 1 redirect
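
Since Net::HTTP does not follow redirects on its own, the download path needs a small loop. A sketch, with hypothetical method names and an arbitrary limit of 5 hops:

```ruby
require 'net/http'
require 'uri'

# Returns the URL a redirect response points to, or nil if the response
# is not a redirect. Relative Location values are resolved against the
# URL that was just requested.
def redirect_target(current_url, response)
  return nil unless response.is_a?(Net::HTTPRedirection)
  location = response['location']
  raise 'redirect without a Location header' if location.nil?
  URI.join(current_url, location).to_s
end

# Follows up to limit redirects, then yields the final response with its
# body still unread so the caller can stream it to disk.
# Open question: whether auth headers should be sent to the redirect target.
def with_final_response(url, headers = {}, limit = 5)
  limit.times do
    uri = URI(url)
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(uri, headers)) do |response|
        next_url = redirect_target(url, response)
        if next_url
          url = next_url # try again with the new location
        else
          response.value # raises unless the status is 2xx
          return yield(response)
        end
      end
    end
  end
  raise "gave up after #{limit} redirects"
end
```

The block passed to with_final_response would call read_body with a block and write each chunk to the file.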
@troy troy changed the title Implement archive downloader "papertrail-archives" archive file downloader Nov 5, 2015
@zakwilson zakwilson mentioned this issue Nov 12, 2015