Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-8534. Enable asynchronous service logging #4663

Merged
merged 6 commits into from
Jun 22, 2023

Conversation

tanvipenumudy
Copy link
Contributor

@tanvipenumudy tanvipenumudy commented May 5, 2023

What changes were proposed in this pull request?

Currently, asynchronous service logging is not supported in Ozone. However, we do support asynchronous audit logging.
Enabling asynchronous service logging can improve performance by allowing the logging thread to continue executing without waiting for the actual logging operation to complete, reducing the impact of logging on the overall system performance.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8534

How was this patch tested?

NA

Copy link
Contributor

@duongkame duongkame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions:

  • Service logs are not a lot because we don't expect to log per request unless there's an unexpected issue. Audit logs are where we want to start with imho.
  • I may not have done enough homework, but I'm wondering if the custom appender to wire rolling file appender to AsyncAppender is too common to be implemented by a lib user like us. Can't it be achieved just by log4j config?
    If it's necessary, should it be in some other places than tools.

@tanvipenumudy
Copy link
Contributor Author

tanvipenumudy commented May 9, 2023

Thank you @duongkame for the review.

  • Thank you for bringing up these points - I think it is a valid point that the service logs typically do not generate a significant volume of logs unless there are unexpected issues.
  • For audit logging in Ozone, I believe it is already set to asynchronous logging by default via the Log4j 2 properties: om-audit-log4j2.properties - similarly for s3g, scm and dn.
  • Log4j offers the AsyncAppender class, which allows us to perform logging asynchronously. AsyncRFAAppender offers an advantage over AsyncAppender when it comes to combining asynchronous logging with the rolling file functionality + it provides any additional configuration options if needed.

I think we will need to reevaluate if going by the standard AsyncAppender would suffice and be a simpler implementation when compared to the custom AsyncRFAAppender if we choose to make the service logs asynchronous.

@tanvipenumudy
Copy link
Contributor Author

@jojochuang, @smengcl could you please take a look at this as well, thanks.

Copy link
Contributor

@smengcl smengcl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tanvipenumudy . This change by itself looks fine to me.

--

Hadoop has been using Async log4j appender for NN/DN metrics logger (as least in the tests) as well:

https://github.com/apache/hadoop/blob/ff8eac517a3b86aa8167a6bc1b981d8884181ec2/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties#L33-L58

AsyncRFAAppender class looks the same as the Hadoop one but with some comments removed:

https://github.com/apache/hadoop/blob/ff8eac517a3b86aa8167a6bc1b981d8884181ec2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/AsyncRFAAppender.java

@smengcl smengcl marked this pull request as ready for review May 18, 2023 18:41
@jojochuang
Copy link
Contributor

jojochuang commented May 18, 2023

Hadoop's use of log4j for audit logging is archaic. We recently made a change to make it easier to migrate to log4j2.
https://issues.apache.org/jira/browse/HADOOP-18631 (two commits)

Basically, use log4j.properties to control AsyncAppender.

So, keep log4j to log4j2 migration in mind. You don't want to repeat what we went through in Hadoop.

@adoroszlai
Copy link
Contributor

Thanks @tanvipenumudy for working on this. Findbugs is failing, please check.

M M IS: Inconsistent synchronization of org.apache.hadoop.ozone.utils.AsyncRFAAppender.conversionPattern; locked 50% of time  Unsynchronized access at AsyncRFAAppender.java:[line 106]

https://github.com/tanvipenumudy/ozone/actions/runs/4891447776/jobs/8731974658#step:6:2283

@adoroszlai adoroszlai marked this pull request as draft May 19, 2023 13:27
@tanvipenumudy tanvipenumudy marked this pull request as ready for review June 14, 2023 09:07
@tanvipenumudy
Copy link
Contributor Author

Could you please take a final look at the change, @adoroszlai, @duongkame, @jojochuang, thanks!

Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tanvipenumudy for updating the patch.

@adoroszlai adoroszlai merged commit 17802a0 into apache:master Jun 22, 2023
@adoroszlai
Copy link
Contributor

Thanks @tanvipenumudy for the patch, @duongkame, @jojochuang, @smengcl for the reviews.

errose28 added a commit to errose28/ozone that referenced this pull request Jun 25, 2023
* master: (96 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  HDDS-8879. Cleanup SecurityConfig and related class initialization (apache#4921)
  ...
errose28 added a commit to errose28/ozone that referenced this pull request Jun 26, 2023
* tmp-dir-refactor: (99 commits)
  HDDS-8586 Recon. - API for Count of deletePending keys and amount of data mapped to such keys. (apache#4923)
  Fix SCM HA finalization compat test
  HDDS-8908. Intermittent failure in TestBlockDeletion#testBlockDeletion (apache#4958)
  HDDS-8910. Replace LockManager with striped lock in ContainerStateManager (apache#4962)
  HDDS-8917. Move protobuf conversion out of the lock in PipelineStateManagerImpl (apache#4965)
  HDDS-8825. Use apache/hadoop 3.3.5 docker image (apache#4963)
  HDDS-8906. Avoid stream when getting in-service healthy nodes (apache#4960)
  HDDS-8907. Store volume count when storage report is updated (apache#4957)
  HDDS-8905. PipelineManager metrics should not be synchronized (apache#4959)
  HDDS-8553. Improve scanner integration tests. (apache#4936)
  HDDS-8854. Avoid unnecessary DatanodeDetails creation for NodeStateManager lookup (apache#4925)
  HDDS-8315. [Snapshot] Added unit tests for SnapshotDiffManager (apache#4716)
  HDDS-7968. [Snapshot] Improve KeyDeletingService to reclaim eligible key blocks in snapshot's deletedTable (apache#4935)
  HDDS-8838. Update default datanode check empty containter on disk to false (apache#4937)
  HDDS-8763. Support RocksDB iterator with ByteBuffer. (apache#4942)
  HDDS-8543. FSO directory should reflect bucket/cluster default replication (apache#4947)
  HDDS-8898. Replication limit should not be less than reconstruction weight (apache#4954)
  HDDS-8739. Snapdiff should return complete absolute path in Diff Entry (apache#4823)
  HDDS-8908. Mark TestBlockDeletion#testBlockDeletion as flaky
  HDDS-8534. Support asynchronous service logging (apache#4663)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants