yertl command-line script interpreter #148

Open
preaction opened this issue Dec 27, 2015 · 5 comments
@preaction
Owner

We could have a yertl command-line script interpreter that works similarly to Logstash:

#!/usr/bin/env yertl
use ETL::Yertl 'Script'; # optional, only needed if run via perl, default if run by `yertl` command
input file => '/var/log/httpd.log'; # Tails the file and allows for rotation by default
input zeromq => 'tcp://127.0.0.1:5000'; # Multiple inputs can be specified
input stdin =>; # Find a way to fix needing to quote/=>
filter grok => '%{LOG.HTTP_COMMON}'; # Filters are run sequentially, so grok should likely come first
output sql => driver => 'SQLite', database => 'httpd.db'; # Defaults to insert
output file => '/path/to/file', format => 'yaml'; # Defaults to default Yertl format
output stdout =>; # Find a way to fix needing to quote/=>

@preaction
Owner Author

The default input should be 'default', and work like Perl's ARGV filehandle (read the files named in the arguments, or STDIN when there are none). The default output should be stdout.
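
To make the ARGV behaviour concrete, here is a minimal sketch that assumes the default input simply delegates to Perl's `<>` operator. The helper name `read_default_input` is hypothetical and not part of Yertl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Yertl function): read every line from the
# files named in @args, or from STDIN when no arguments are given.
# This is the standard behaviour of Perl's <> operator on @ARGV.
sub read_default_input {
    my (@args) = @_;
    local @ARGV = @args;    # <> consumes @ARGV, so work on a localized copy
    my @lines;
    while ( my $line = <> ) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Usage: my @lines = read_default_input(@ARGV);
```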

filter should take a subref to allow for Perl-based filtering.
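
A rough sketch of what subref filters could look like, assuming filters run in registration order and that returning undef drops the document. Both the `@FILTERS` registry and `run_filters` are hypothetical names for illustration, and the drop-on-undef rule is an assumption rather than anything decided in this issue.

```perl
use strict;
use warnings;

my @FILTERS;    # hypothetical registry, kept in registration order

# Hypothetical: register a Perl sub as a filter.
sub filter {
    my ($code) = @_;
    die "filter expects a code reference" unless ref $code eq 'CODE';
    push @FILTERS, $code;
    return $code;
}

# Run one document through every filter in order. A filter that returns
# undef drops the document (an assumption made for this sketch).
sub run_filters {
    my ($doc) = @_;
    for my $f (@FILTERS) {
        $doc = $f->($doc);
        return unless defined $doc;
    }
    return $doc;
}

# Keep only server errors, then tag whatever survives.
filter sub { my ($doc) = @_; $doc->{status} >= 500 ? $doc : undef };
filter sub { my ($doc) = @_; return +{ %$doc, flagged => 1 } };
```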

input/output should both accept filehandles. Should that be considered a file even if it's a pipe or a socket?

@preaction
Owner Author

input and output should both have format options. The default format for output should be 'default'. The default for input is trickier: grok requires lines, while everything else requires documents. For consistency, we should probably use 'default' as the default input format, but for ease of use we could default to 'lines' if grok is the first filter.
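
The input-format rule discussed here could be sketched as follows. The function name `resolve_input_format` is hypothetical, and letting an explicit format option always win is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the input format. An explicit format wins;
# otherwise use 'lines' when grok is the first filter, else 'default'.
sub resolve_input_format {
    my ( $explicit, @filter_names ) = @_;
    return $explicit if defined $explicit;
    return 'lines' if @filter_names && $filter_names[0] eq 'grok';
    return 'default';
}
```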

@preaction
Owner Author

In the future, it'd be nice if input/output could have attached filters. So, the filter command should return something that can be used as an argument to input/output.

@preaction
Owner Author

It might be nice to do filtering in forked processes for performance.

@preaction
Owner Author

preaction commented Sep 20, 2018

This should still be done even as we create the new Perl API. The Script API should export three functions: input, output, and transform (which replaces the existing transform function). input registers a new input, output registers an output, and transform registers a transform. Once the script is done, the pipelines are constructed:

  • All input streams are combined into a single object (create a Yertl::MultiInput class) that starts reading from each input to pipe documents through
    • If no input is defined, use the default input of @ARGV+STDIN
  • The input format is set to 'lines' if the first transform registered is a LineTransform class
  • Loop over the transforms and chain them together
  • All output streams are combined into a single object (create a Yertl::MultiOutput class) that writes to all the output streams when a document is received
    • If no output is defined, use the default output of STDOUT

Once the pipelines are constructed, the loop can be run.
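
The construction steps above might be sketched roughly like this. It is a simplification under stated assumptions: inputs are plain code references that return the next document or undef when exhausted, outputs are code references that receive a document, inputs are drained one after another instead of concurrently, and the 'lines' format switch is left out. `build_pipeline` is a hypothetical name, and the closures stand in for the proposed Yertl::MultiInput and Yertl::MultiOutput classes, which do not exist yet.

```perl
use strict;
use warnings;

# Hypothetical registries filled by the three exported functions.
my ( @INPUTS, @TRANSFORMS, @OUTPUTS );

sub input     { push @INPUTS,     $_[0]; return }
sub transform { push @TRANSFORMS, $_[0]; return }
sub output    { push @OUTPUTS,    $_[0]; return }

# Hypothetical: build one runnable loop from everything registered so far.
sub build_pipeline {
    # Default input: @ARGV files, or STDIN when there are none
    my @inputs = @INPUTS ? @INPUTS : ( sub { my $line = <>; return $line } );
    # Default output: STDOUT
    my @outputs = @OUTPUTS ? @OUTPUTS : ( sub { print STDOUT $_[0] } );
    my @transforms = @TRANSFORMS;

    return sub {
        for my $next (@inputs) {
            while ( defined( my $doc = $next->() ) ) {
                $doc = $_->($doc) for @transforms;    # chain transforms in order
                $_->($doc) for @outputs;              # fan out to every output
            }
        }
        return;
    };
}

# Usage: one in-memory input, one transform, one collecting output.
my @queue = ( 'a', 'b' );
my @seen;
input     sub { shift @queue };
transform sub { uc $_[0] };
output    sub { push @seen, $_[0] };
build_pipeline()->();
# @seen now holds ('A', 'B')
```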
