
Conversation

@tz70s (Contributor) commented Sep 3, 2018:

Previously, the error message was not put as a tag into the failed trace.
Adding it is nice to have and useful for troubleshooting within a request context via a modern UI such as Jaeger or Zipkin.

This also includes a minor change to the primitive action flow, which previously was not enclosed in a failed trace.

I'm not sure folks agree on adopting a tracing system, given the conflicting goal that we already have the transid in the logs. IMHO, at the very least, traces shouldn't be confusing once a user decides to bring tracing into their system: whether or not an accurate error message exists, we should know which span failed.

The semantics follow the OpenTracing semantic conventions:
https://github.com/opentracing/specification/blob/master/semantic_conventions.md
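
(For reference, a minimal sketch of what those conventions prescribe, assuming the io.opentracing Java API; the markFailed helper below is hypothetical, not this PR's code.)

import scala.collection.JavaConverters._

import io.opentracing.Span
import io.opentracing.tag.Tags

// Hypothetical helper: mark a span as failed per the OpenTracing
// semantic conventions. The standardized "error" tag flags the span,
// and a timestamped log entry carries the details.
private def markFailed(span: Span, cause: Throwable): Unit = {
  Tags.ERROR.set(span, true)
  span.log(Map("event" -> "error", "error.object" -> cause).asJava)
}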

Snapshots in Jaeger: (two screenshots: first page, second page)

Signed-off-by: Tzu-Chiao Yeh su3g4284zo6y7@gmail.com

Description

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-io commented:

Codecov Report

Merging #4000 into master will decrease coverage by 5.08%.
The diff coverage is 46.66%.


@@            Coverage Diff             @@
##           master    #4000      +/-   ##
==========================================
- Coverage    85.9%   80.82%   -5.09%     
==========================================
  Files         146      146              
  Lines        7095     7102       +7     
  Branches      443      439       -4     
==========================================
- Hits         6095     5740     -355     
- Misses       1000     1362     +362
Impacted Files Coverage Δ
...la/src/main/scala/whisk/common/TransactionId.scala 93.44% <100%> (ø) ⬆️
...ala/whisk/common/tracing/OpenTracingProvider.scala 21.15% <14.28%> (-1.77%) ⬇️
...isk/core/controller/actions/PrimitiveActions.scala 88.4% <71.42%> (-1.23%) ⬇️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
.../scala/src/main/scala/whisk/utils/Exceptions.scala 20% <0%> (-20%) ⬇️
...whisk/connector/kafka/KafkaProducerConnector.scala 75.67% <0%> (-16.22%) ⬇️
... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0242a60...e3c3899. Read the comment docs.

@chetanmeh (Member) commented:

Surfacing the error state in traces would be useful. @sandeep-paliwal thoughts?


postedFuture andThen {
case Success(_) => transid.finished(this, startLoadbalancer)
case Failure(e) => transid.failed(this, startLoadbalancer, e.getMessage, logLevel = ErrorLevel)
Contributor commented:

Should the error log level be eliminated here? An activation failure doesn't always imply a system failure.

@tz70s (Contributor, Author) commented Sep 5, 2018:

Thanks for the reminder; we can use the default warn level, or I can match on exception types for different levels. However, I think logging here is quite redundant, because we've already logged this at doInvoke.

Should this get changed as well?
https://github.com/apache/incubator-openwhisk/blob/0242a60aeb976d070abda76a8ac2aa7e3306e393/core/controller/src/main/scala/whisk/core/controller/Actions.scala#L292

BTW, OpenTracing provides no standardized warn tag; is this fine?
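
(A rough sketch of what matching on exception types could look like; the exception types and the levelFor helper below are illustrative, not part of this PR.)

import java.util.concurrent.TimeoutException

import akka.event.Logging.{ErrorLevel, InfoLevel, LogLevel, WarningLevel}

// Illustrative only: derive the log level from the failure type, so
// user-caused failures are not reported as system errors.
private def levelFor(t: Throwable): LogLevel = t match {
  case _: IllegalArgumentException => InfoLevel    // e.g. bad user input
  case _: TimeoutException         => WarningLevel // e.g. a slow activation
  case _                           => ErrorLevel   // unexpected system failure
}

// Then, in the andThen blocks above:
// case Failure(e) => transid.failed(this, startLoadbalancer, e.getMessage, logLevel = levelFor(e))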

Contributor commented:

I think that using the standard warn level would be fine.

The snippet you provided depicts a system error, so the error log level in that place is fine IMO.

As long as it doesn't disturb anyone else, I'm fine with leaving the error tag for tracing. My main point was to avoid flooding the logs with messages produced by users, since that could significantly hamper performance.

@tz70s (Contributor, Author) commented:

Ah yes, it's a system error, thanks!

}
} andThen {
case Success(_) => transid.finished(this, startActivation)
case Failure(e) => transid.failed(this, startActivation, e.getMessage, logLevel = ErrorLevel)
Contributor commented:

The same question regarding the log level...

@sandeep-paliwal (Contributor) commented:

@chetanmeh I agree. Having error information in traces for failed invocations will be very useful.


private def setErrorTags(span: Span, message: => String): Unit = {
span.setTag("error", true)
span.setTag("message", message)
@sandeep-paliwal (Contributor) commented:

A tag is usually added to make it easy to search for a specific span. Since the key in this case is always going to be "error", do you think this is a more appropriate case for span.log()?

@tz70s (Contributor, Author) commented:

Hi @sandeep-paliwal, thanks for the suggestion.
I think the big difference between a tag and a log is the timestamp; as you mentioned, a tag is much easier to search for a specific span, and it makes explicit what's going on.

I've tried the error-capturing guidance from the OpenTracing data model standard, but it's not that useful for seeing what's really going on in Jaeger.

Some code snippet (with the import it needs):

import scala.collection.JavaConverters._

private def setErrorTags(span: Span, message: => String): Unit = {
  span.log(Map("event" -> "error", "message" -> message).asJava)
}

Snapshots: (two Jaeger UI screenshots; the error is not explicit)

Am I missing something? If not, I think an alternative is to still use error as a tag and attach the log; then we can eliminate the message tag field, if necessary. WDYT?

i.e.

private def setErrorTags(span: Span, message: => String): Unit = {
  span.setTag("error", true)
  span.log(Map("event" -> "error", "message" -> message).asJava)
}

@sandeep-paliwal (Contributor) commented:

Yeah, I see that just adding a log to the trace is not that helpful in the UI. I guess using a tag is the better option then. Is there any use case for searching all failed invocations with a specific error? In that case, the error message as a tag would also be more useful than log().

@tz70s (Contributor, Author) commented:

Yeah, it might be useful if someone knows the internal error messages. One drawback is polluting the tracing system if it's mixed with other systems besides OpenWhisk, but that can be filtered by module. I think we can keep it as a tag for the advanced usage you mentioned.
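
(For that advanced usage, a hedged example: the Jaeger UI's search form has a Tags field that accepts logfmt-style key=value pairs, so a query like the one below should surface only failed spans; the message value is illustrative.)

error=true message="some internal error"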

Signed-off-by: Tzu-Chiao Yeh <su3g4284zo6y7@gmail.com>
@chetanmeh (Member) left a comment:

LGTM

@chetanmeh chetanmeh merged commit 77884e5 into apache:master Sep 25, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
Put error message as a tag into the failed trace. This is nice to have and useful for troubleshooting within a request context via a modern UI such as Jaeger or Zipkin.
