Write Counter output even when TypedPipe has no elements (tallyAll) #1875

Open
connord-stripe opened this issue Oct 18, 2018 · 1 comment

@connord-stripe

I noticed that the new tally API for counters won't create a diagnostic counter (if I understand the naming correctly) when the associated TypedPipe has 0 elements. I'd prefer that it still create the counter and report a value of 0.

The following repro works for me, assuming you have an input that contains data. I'm looking in our jobtracker (timberlake) for information about the counters. If you replace the false below with true, the counter name does appear in our jobtracker.

def execution: Execution[Unit] = {
  ...
  val output = TypedPipe.from(WritableSequenceFile[BytesWritable, BytesWritable](input))
    .filter(_ => false) // drop every element, so the pipe reaching tallyAll is empty
    .tallyAll("Counts", "testtesttest")
    .writeExecution(WritableSequenceFile[BytesWritable, BytesWritable](nestedOutput))

  Execution
    .sequence(List(output)) // could likely be simplified
    .unit
}

Also a small nit: the current docs (https://twitter.github.io/scalding/api/#com.twitter.scalding.typed.TypedPipe) don't seem to reference the tally functionality. Might they need to be regenerated?

@johnynek
Collaborator

Here is where counters are incremented:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/Operations.scala#L694

In the (currently non-existent) cleanup method, if we know all the possible counters that could be updated, we could increment them with 0. Maybe that would work.
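
To make the idea concrete, here is a minimal sketch of what "increment with 0 in cleanup" could look like. It is not Scalding's or Cascading's actual code: `CounterSink`, `TallyOperation`, and `TallyCleanupSketch` are hypothetical stand-ins, and the sketch assumes the operation is handed the full list of counters it might touch.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the framework's counter reporting; not a real Scalding API.
final class CounterSink {
  private val values = mutable.LinkedHashMap.empty[(String, String), Long]

  // Adds `by` to the counter, creating it on first touch (even when `by` is 0).
  def increment(group: String, name: String, by: Long): Unit = {
    val key = (group, name)
    values.update(key, values.getOrElse(key, 0L) + by)
  }

  def snapshot: Map[(String, String), Long] = values.toMap
}

// Hypothetical operation that is told up front which counters it may update.
final class TallyOperation(known: List[(String, String)], sink: CounterSink) {
  // Called once per element.
  def operate(group: String, name: String): Unit = sink.increment(group, name, 1L)

  // Called once at the end: touch every known counter with 0 so that it is
  // reported even if operate never ran.
  def cleanup(): Unit = known.foreach { case (g, n) => sink.increment(g, n, 0L) }
}

object TallyCleanupSketch {
  def run(elements: Seq[Int]): Map[(String, String), Long] = {
    val sink = new CounterSink
    val op = new TallyOperation(List(("Counts", "testtesttest")), sink)
    elements.foreach(_ => op.operate("Counts", "testtesttest"))
    op.cleanup()
    sink.snapshot
  }
}
```

With an empty input, `TallyCleanupSketch.run(Seq.empty)` still contains the `("Counts", "testtesttest")` key, with value 0. Whether a zero increment is enough to make a real Hadoop counter show up in the jobtracker is exactly the open question above.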

The bigger issue is that we currently allow totally dynamic counters, even though many users define them statically. I think we can pattern match to discover the static cases, since I believe we use a case class to make the TupleConverter.
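
As a rough illustration of the pattern-matching idea, the sketch below separates statically known counters from dynamic ones. The `CounterSpec`, `StaticCounter`, and `DynamicCounter` types are hypothetical and are not the case classes Scalding actually uses; the sketch only assumes that the static case is represented by a case class carrying the group and name.

```scala
// Hypothetical model of how a counter's key is determined; not Scalding's types.
sealed trait CounterSpec

// The group and name are fixed when the pipe is built, so they are known before any data flows.
final case class StaticCounter(group: String, name: String) extends CounterSpec

// The key is computed from each element, so it cannot be known up front.
final case class DynamicCounter(keyOf: Any => (String, String)) extends CounterSpec

object StaticCounterDiscovery {
  // Keep only the counters whose keys can be discovered by pattern matching.
  def staticKeys(specs: List[CounterSpec]): List[(String, String)] =
    specs.collect { case StaticCounter(group, name) => (group, name) }
}
```

Only the keys returned by `staticKeys` could be pre-registered with 0; dynamic counters would still appear only once an element produces them.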
