cascading counters in ESTap #87

Closed
ppat opened this issue Sep 14, 2013 · 5 comments

Comments

ppat commented Sep 14, 2013

Can we add counters (FlowProcess.increment) to ESTap to indicate how many tuples were attempted, how many were successfully added to the index, how many timed out, how many failed, etc.?

Member

costin commented Oct 24, 2013

Can you expand on this? As far as I know, Cascading already does this through the HadoopTupleEntrySchemeCollector (they are stored under cascading.flow.SliceCounters) - which ESTap uses automatically (in case of Hadoop).
Are these not enough for you?

Author

ppat commented Oct 24, 2013

I was not aware of that, but we have both Hadoop ES Taps and Local ES Taps (for smaller/faster indices). Are there corresponding slice counters for local mode too? Also, do the Slice Counters correspond to the number of docs that were indexed? What about failures, etc.?

I can't find any mention of Slice Counters in the Cascading docs. When I get a moment I'll look through the Cascading source to see whether they support what I need, but I wanted to give you some quick feedback.

Member

costin commented Oct 24, 2013

I can't comment on the local support, but I assume it is supported as well. Quickly browsing through the source indicates that SliceCounters and StepCounters are used by the FlowStream (through SinkStage and SourceStage).

I'd be happy to provide some support for it but generally speaking, monitoring happens best close to the source (i.e. within Hadoop and/or Cascading).
As for the way the counters are used: the tap records reads/writes to/from ES. There's no notion of failure since, if one occurs, the job simply fails (as opposed to moving on and disregarding some tuples).

bwmeier pushed a commit to bwmeier/elasticsearch-hadoop that referenced this issue Dec 19, 2013
dump typeinfo in favor of object inspector as the type does not change but its
format can, causing issues. the oi variant should be just as fast and also provide
better interoperability as we're using the provider code instead of guessing its format

fix elastic#87
Member

costin commented Feb 20, 2014

Hi.

I've added stats for Hadoop (see #141) and I'm currently looking into providing dedicated stats for Cascading as well. They should be in master shortly.

Member

costin commented Mar 18, 2014

Done. In Hadoop mode, the stats are reported through the Hadoop infrastructure, while in local mode they are reported directly to Cascading.
In both cases, they are available through Flow#getStats().
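
For readers following the thread, here is a minimal, self-contained sketch of the counting pattern discussed above. It is not the elasticsearch-hadoop implementation and it does not use Cascading itself: `CounterSink`, `InMemoryCounters`, the `ESTapSketch` group and the `DOCS_*` counter names are all hypothetical, with `increment(group, counter, amount)` only shaped like the `FlowProcess.increment` call mentioned in the original request. It assumes the fail-fast behaviour described earlier, so a failed write is counted once and then rethrown rather than skipped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

public class CounterSketch {

    /** Hypothetical stand-in, shaped like FlowProcess.increment(group, counter, amount). */
    interface CounterSink {
        void increment(String group, String counter, long amount);
    }

    /** In-memory sink, playing the role of a local-mode stats store (assumption). */
    static final class InMemoryCounters implements CounterSink {
        private final Map<String, Long> values = new TreeMap<>();

        @Override
        public void increment(String group, String counter, long amount) {
            // Accumulate per "group.counter" key.
            values.merge(group + "." + counter, amount, Long::sum);
        }

        long get(String group, String counter) {
            return values.getOrDefault(group + "." + counter, 0L);
        }
    }

    /** Hypothetical counter group name, not one used by the real tap. */
    static final String GROUP = "ESTapSketch";

    /**
     * Passes each document to the writer, counting attempts and successes.
     * A failure is counted and then rethrown, so the surrounding job fails
     * instead of silently dropping the tuple.
     */
    static void writeAll(List<String> docs, Consumer<String> writer, CounterSink counters) {
        for (String doc : docs) {
            counters.increment(GROUP, "DOCS_ATTEMPTED", 1);
            try {
                writer.accept(doc);
            } catch (RuntimeException e) {
                counters.increment(GROUP, "DOCS_FAILED", 1);
                throw e;
            }
            counters.increment(GROUP, "DOCS_INDEXED", 1);
        }
    }

    public static void main(String[] args) {
        InMemoryCounters counters = new InMemoryCounters();
        // A no-op writer stands in for the actual write to Elasticsearch.
        writeAll(Arrays.asList("a", "b", "c"), doc -> { }, counters);
        System.out.println("attempted=" + counters.get(GROUP, "DOCS_ATTEMPTED")
                + " indexed=" + counters.get(GROUP, "DOCS_INDEXED")
                + " failed=" + counters.get(GROUP, "DOCS_FAILED"));
    }
}
```

In a real flow the sink would be whatever the platform provides (Hadoop counters or Cascading's local stats), and the values would be read back after the run through Flow#getStats() as noted above.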

costin added a commit that referenced this issue Apr 8, 2014