cascading counters in ESTap #87

ppat · 2013-09-14T19:44:20Z

Can we add counters (FlowProcess.increment) to ESTap to indicate how many tuples were attempted, how many were successfully added to the index, how many timed out, how many failed, etc.?
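
As a rough sketch of the kind of counters being requested, assuming a writer that reports through a callback shaped like Cascading's `FlowProcess.increment(Enum, long)`. The `EsTapCounter`, `WriteOutcome`, `CounterSink`, `InMemoryCounters` and `CountingWriter` names are hypothetical and exist only for illustration; none of them are part of es-hadoop or Cascading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical counter names; not part of es-hadoop or Cascading.
enum EsTapCounter { TUPLES_ATTEMPTED, TUPLES_INDEXED, TUPLES_TIMED_OUT, TUPLES_FAILED }

// Hypothetical outcome of a single index attempt.
enum WriteOutcome { INDEXED, TIMED_OUT, FAILED }

// Stand-in for the increment(Enum, long) method that Cascading's FlowProcess exposes.
interface CounterSink {
    void increment(Enum<?> counter, long amount);
}

// In-memory sink so the sketch runs without Hadoop or Cascading on the classpath.
class InMemoryCounters implements CounterSink {
    private final Map<Enum<?>, Long> values = new HashMap<>();

    @Override
    public void increment(Enum<?> counter, long amount) {
        values.merge(counter, amount, Long::sum);
    }

    long get(Enum<?> counter) {
        return values.getOrDefault(counter, 0L);
    }
}

class CountingWriter {
    // Bumps the "attempted" counter once per tuple, plus the counter matching its outcome.
    static void record(CounterSink sink, List<WriteOutcome> outcomes) {
        for (WriteOutcome outcome : outcomes) {
            sink.increment(EsTapCounter.TUPLES_ATTEMPTED, 1);
            switch (outcome) {
                case INDEXED:
                    sink.increment(EsTapCounter.TUPLES_INDEXED, 1);
                    break;
                case TIMED_OUT:
                    sink.increment(EsTapCounter.TUPLES_TIMED_OUT, 1);
                    break;
                case FAILED:
                    sink.increment(EsTapCounter.TUPLES_FAILED, 1);
                    break;
            }
        }
    }
}
```

In a real tap, the `CounterSink` would presumably delegate to the `FlowProcess` handed to the sink, so the same code path could serve both Hadoop and local mode.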

costin · 2013-10-24T18:08:21Z

Can you expand on this? As far as I know, Cascading already does this through the HadoopTupleEntrySchemeCollector (the counters are stored under cascading.flow.SliceCounters), which ESTap uses automatically (in the Hadoop case).
Are these not enough for you?

ppat · 2013-10-24T18:28:32Z

I was not aware of that, but we have both Hadoop ES Taps and Local ES Taps (for smaller/faster indices). Are there corresponding slice counters for Local Mode too? Also, do the Slice Counters correspond to the number of docs that were indexed? What about failures, etc.?

I can't find any mention of Slice Counters in the Cascading docs. I will look them up in the Cascading source code when I get a second to see whether they support what I need, but I wanted to give you some quick feedback.

costin · 2013-10-24T18:39:41Z

I can't comment on local mode, but I assume it is supported as well. A quick browse through the source indicates that SliceCounters and StepCounters are used by the FlowStream (through SinkStage and SourceStage).

I'd be happy to provide some support for it but generally speaking, monitoring happens best close to the source (i.e. within Hadoop and/or Cascading).
As for the way the counters are used: the tap records reads/writes to/from ES. There's no notion of failure since, if one occurs, the job simply fails (as opposed to moving on and disregarding some tuples).

dump typoinfo in favor of object inspector, as the type does not change but its format can, causing issues. the oi variant should be just as fast and also provide better interoperability, as we're using the provider code instead of guessing its format. fix elastic#87

costin · 2014-02-20T23:17:07Z

Hi.

I've added stats for Hadoop (see #141) and I'm currently looking into providing dedicated stats for Cascading as well. They should be in master shortly.

costin · 2014-03-18T12:57:18Z

Done. In Hadoop mode, the stats are reported through the Hadoop infrastructure, while in local mode they are reported directly to Cascading.
In both cases, they are available through Flow#getStats().
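
As a sketch of how a caller might read such stats once a flow completes: Cascading's stats objects expose counters by group and name (for example `getCounterValue(String group, String counter)` on `CascadingStats`). The snippet below uses a hypothetical `CounterLookup` interface as a stand-in for that lookup so it runs without Cascading on the classpath. `StatsReport` is also hypothetical, and the group and counter names passed in are up to the caller; the names es-hadoop actually registers are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a group/name counter lookup such as the one on Cascading's stats objects.
interface CounterLookup {
    long getCounterValue(String group, String counter);
}

class StatsReport {
    // Collects the requested counters of one group into an ordered name -> value map.
    static Map<String, Long> collect(CounterLookup stats, String group, String... counters) {
        Map<String, Long> report = new LinkedHashMap<>();
        for (String counter : counters) {
            report.put(counter, stats.getCounterValue(group, counter));
        }
        return report;
    }
}
```

Against a real flow, the lookup would presumably delegate to the stats object returned by `Flow#getStats()`, which is the same entry point in Hadoop and local mode.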

fix #87

costin added the v1.3.0.M3 label Feb 20, 2014

costin closed this as completed in 26ff9a9 Mar 18, 2014

costin added a commit that referenced this issue Apr 8, 2014

Expose es-hadoop stats in Local Cascading mode

de47cc0

fix #87
