Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scoped context #2438

Merged
merged 21 commits into from Apr 15, 2024
Merged

Scoped context #2438

merged 21 commits into from Apr 15, 2024

Conversation

rgoers
Copy link
Member

@rgoers rgoers commented Apr 3, 2024

This adds support for a ScopedContext and a ResourceLogger as requested in apache/logging-log4j-kotlin#71 and similar to #2214

Support for running child threads and propagating the ThreadContext and ScopedContext was added.
ThreadContext, ScopedContext, and custom ContextDataProviders can all be accessed via ContextData.getValue(key) in log4j-api. This necessitated moving ContextDataProvider support into the API.

Also note that ScopedContext supports arbitrary data objects being added and retrieved. However, the overall ContextMap (a logical entity when accessed from the ContextData class and a real entity in the LogEvent) still only supports Strings since that is the lowest common denominator.

Example usage for wrapping a simple code block.

ScopedContext.where("id", UUID.randomUUID())
    .where("ipAddress", request.getRemoteAddr())
    .where("loginId", session.getAttribute("loginId"))
    .where("hostName", request.getServerName())
    .run(new Worker());

private class Worker implements Runnable {
    private static final Logger LOGGER = LogManager.getLogger(Worker.class);

    public void run() {
        LOGGER.debug("Performing work");
        String loginId = ScopedContext.get("loginId");
    }
}

Creating Threads:

BlockingQueue<Runnable> workQueue = new ArrayBlockingQueue<>(5);
ExecutorService executorService = new ThreadPoolExecutor(1, 2, 30, TimeUnit.SECONDS, workQueue);
Future<?> future = ScopedContext.where("id", UUID.randomUUID())
    .where("ipAddress", request.getRemoteAddr())
    .where("loginId", session.getAttribute("loginId"))
    .where("hostName", request.getServerName())
    .run(executorService, new Worker());
try {
    future.get();
} catch (ExecutionException ex) {
    logger.warn("Exception in worker thread: {}", ex.getMessage());
}

Note that this syntax supports virtual threads.

ResourceLogger can be used to include resource data in all log events coming from a specific Logger.

     private class User {

        private final String loginId;
        private final String role;
        private int loginAttempts;
        private final ResourceLogger logger;

        public User(final String loginId, final String role) {
            this.loginId = loginId;
            this.role = role;
            logger = ResourceLogger.newBuilder()
                .withClass(this.getClass())
                .withSupplier(new UserSupplier())
                .build();
        }

        public void login() throws Exception {
            ++loginAttempts;
            try {
                authenticator.authenticate(loginId);
                logger.info("Login succeeded");
            } catch (Exception ex) {
                logger.warn("Failed login");
                throw ex;
            }
        }


        private class UserSupplier implements Supplier<Map<String, String>> {

            public Map<String, String> get() {
                Map<String, String> map = new HashMap<>();
                map.put("LoginId", loginId);
                map.put("Role", role);
                map.put("Count", Integer.toString(loginAttempts));
                return map;
            }
        }
    }

Data from the UserSupplier will appear as context data in every log event that uses the Logger. ResourceLogger is able to do this by using a ScopedContext internally.

Note that instead of ContextDataProviders returning their respective maps, the context map being constructed is now passed to each provider to add its own keys and values. This inherently makes building the context map garbage free, which was creating a spaghetti nightmare previously.

With the move of ContextDataProvider to the API ContextDataInjectors are no longer needed. They are deprecated in this PR to be removed in 3.0.0. Note that with this PR they are never used unless a user specifies the system property to declare one.

@ppkarwasz
Copy link
Contributor

For completeness' sake, there is a related OpenTelemetry proposal: open-telemetry/oteps#207.

@ppkarwasz
Copy link
Contributor

@rgoers,

Can you look at #2442? I think that your ScopedContext#runWhere implementation could be replaced by calls to the new ThreadContextMap#restore and ThreadContextMap#putAll methods, as shown in this test:

assertThat(executorService.submit(() -> {
threadContext.put(KEY, newThreadValue);
final Object newThreadSaved = threadContext.restore(mainThreadSaved);
try {
assertThat(threadContext.get(KEY)).isEqualTo(mainThreadValue);
throw throwable;
} catch (final RuntimeException e) {
assertThat(e).isSameAs(throwable);
assertThat(threadContext.get(KEY)).isEqualTo(mainThreadValue);
} finally {
threadContext.restore(newThreadSaved);
}
assertThat(threadContext.get(KEY)).isEqualTo(newThreadValue);
}))
.succeedsWithin(Duration.ofSeconds(1));

where threadContext is an instance of ThreadContextMap.

What do you think?

@rgoers
Copy link
Member Author

rgoers commented Apr 4, 2024

@ppkarwasz
I looked at the Open Telemetry proposal. While the naming might give the impression it is similar to ScopedContext they are really unrelated. The Open Telemetry proposal is ONLY for telemetry information, not data used by the application.
I looked at 2442. To be honest I have no idea what use case it solves but I don't see how it would be useful for ScopedContext. ThreadContextMap is primarily Map<String, String> where ScopedContext is Map<String, Object>. Furthermore, there are so many ThreadContextMap implementations I could easily see saving one and then trying to "restore" it into a different Map implementation (somehow). The ThreadContextMap situation is such a mess I have given up trying to understand it.

Looking a second time at your example I now see what you are attempting to do but I don't think you even need to modify ThreadContextMap to do that. I could just do:

Map<String, String> oldContext = ThreadContext.getContext();
try {
    ThreadContext.clearMap();
    ThreadContext.putAll(newMap);
    op.run();
} finally {
    ThreadContext.clearMap();
    ThreadContext.putAll(oldContext);
}    

To provide more configurability for the `ScopedContext` service, this PR
moves its implementation details to `log4j-core` and replaces it with a
`ScopedContextProvider` interface.

In Log4j API only a NO-OP version of the provider is present, but each
implementation of the API can provide its own.
@ppkarwasz
Copy link
Contributor

Would you consider merging #2445 into this PR?

@ppkarwasz
Copy link
Contributor

Looking a second time at your example I now see what you are attempting to do but I don't think you even need to modify ThreadContextMap to do that. I could just do:

Map<String, String> oldContext = ThreadContext.getContext();
try {
    ThreadContext.clearMap();
    ThreadContext.putAll(newMap);
    op.run();
} finally {
    ThreadContext.clearMap();
    ThreadContext.putAll(oldContext);
}    

Sure, this is possible without any changes to ThreadContextMap, but it is not necessarily efficient: depending on the implementation it might need to convert the internal representation to Map and create copies. Transferring data between threads is a rare, but expensive operation, so we might consider optimizing it.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this version of ScopedContext much better.

Note: could we remove ResourceLogger from this PR, so that we can iron the details of ScopedContext first?

@@ -20,7 +20,7 @@
*/
@Export
/**
* Bumped to 2.22.0, since FormattedMessage behavior changed.
* Bumped to 2.24.0, since FormattedMessage behavior changede.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be reverted, since the PR does not contain any changes to the o.a.l.l.message package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was merging I found you had changed the version in 2.x with this message. I had changed the message and was getting a conflict. Obviously my attempt to reconcile the conflict wasn't successful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we should remove ResourceLogger from the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that ResourceLogger should allow multiple implementations that can be changed via configuration.

The version in this PR puts the map in the ScopedContext as @rocketraman requires, but as you commented yourself, it also makes sense to use a MapMessage. I think companies should be able to change the behavior of ResourceLogger without changing a single line of code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've thought about this. If I wanted to support the prior version of ResourceLogger I would implement ResourceMessageLogger.or similar. I wouldn't allow ResourceLogger to be overridden. As it stands the ResourceLogger class itself is only 25 lines including comments and blank lines. Heck, the builder class is larger. I just don't see the value in making this more complicated than it needs to be.

Comment on lines 157 to 160
@SuppressWarnings("unchecked")
public static <T> T get(String key) {
return (T) provider.getValue(key);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this method should be in the ScopedContext class: having ScopedContextProvider#getValue is enough for integrators to extract values from the context, but normal users should not use ScopedContext as a generic service to set values that they retrieve further down the stack. If the value is important, they should pass it as method parameter.

Note that the absence of getters is in line with what OpenTelemetry does for tracing: Span has no getters. As far as I understood that is one of the reasons they invented Baggage.

My rule of thumb is:

  • for technical debt, baggage and similar, users should use Baggage or the (soon to be deprecated) ThreadContext,
  • for pure logging purposes they should use ScopedContext instead.

BTW: the signature of this method is terrible: it is a ClassCastException waiting to happen and Error Prone will soon point it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs to be in the API. Otherwise it will drastically reduce the usage of the class. As far as OpenTelemetry goes, it is solving a different problem then we are so I am not willing to base decisions we make on what they have decided to do.

As far as the signature goes, you are correct. I will change it to return Object, which is what the users are storing. That makes it slightly less convenient but they can provide the sugar to make it prettier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs to be in the API. Otherwise it will drastically reduce the usage of the class. As far as OpenTelemetry goes, it is solving a different problem then we are so I am not willing to base decisions we make on what they have decided to do.

I don't base my decision on OTel, but I happen to agree with them. Can you give a logging-related use case, where a user needs to retrieve a key he previously stored in the context map?

I know our opinions differ, but binary compatibility makes the decision of including get or not asymmetrical: if we remove it, we can add it in 2.25.0; if we add it now, we can not remove it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two examples:

  1. https://github.com/apache/logging-log4j-audit/blob/master/log4j-audit/log4j-audit-api/src/main/java/org/apache/logging/log4j/audit/rest/RequestContextHeaderInterceptor.java - This supports the ThreadContext. I would need to modify it to support the ScopedContext.
  2. We frequently include the tenant id in the ThreadContextMap. We then might have an endpoint that is /dosomething/current. That means that instead of passing the tenant id in the url the endpoint should retrieve it from the ThreadContextMap. If ScopedContext is used instead then it would need to be able to get the value. Likewise, we may use a datasource or some other resource based on the tenantId. We do NOT pass the tenant id as a parameter through each and every call in the system so we need to be able to retrieve it.

Again, everyone who has requested that the MDC support objects wants that so they can retrieve them. There is no other logical reason they would want that since logging can only log them as strings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We frequently include the tenant id in the ThreadContextMap. We then might have an endpoint that is /dosomething/current. That means that instead of passing the tenant id in the url the endpoint should retrieve it from the ThreadContextMap.

@rgoers, with all due respect, this is a very bad idea. Your application is (ab)using the logging framework for context propagation in between its components. What is your application gonna use ScopedContext for next? Passing the total cost while issuing invoices? What is Log4j gonna support next? Introducing SystemContext as an in memory key value store?

I understand the use case and I agree that ScopedContext of Log4j can help there, but it doesn't mean that it should. MDC has a getter and that was a mistake. If you need a [scoped] context propagation mechanism, I would suggest implementing it as a separate library engineered for that particular use case. I am in favor of only dealing with logging in Log4j and nothing else.

Copy link
Member Author

@rgoers rgoers Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vy with all due respect, get off your high horse. Applications have been using the MDC this way for years and will also use the ScopedContext to do the same, only in a safer manner. They (and I) are not going to stop doing it just because you say so. You are about 20 years too late to argue against the MDC. Arguing against a get method in ScopedContext for this reason is simply silly because all you are accomplishing is saying that users have to continue using something that requires much more care to use safely.

I should note that the reason for doing this instead of using a custom ThreadLocal is really, really simple. As a user I WANT to include the data in logs AND I want to be able to access it from my app. I do NOT want to have to store it in the ThreadContext AND in a private ThreadLocal.

One other thought. %X{tenantId} doesn't work without a get method. Neither does a Filter. Also note that %X will retrieve values in the ScopedContext - See https://github.com/apache/logging-log4j2/blob/ScopedContext/log4j-core/src/main/java/org/apache/logging/log4j/core/impl/ContextData.java#L100 and https://github.com/apache/logging-log4j2/blob/ScopedContext/log4j-core/src/main/java/org/apache/logging/log4j/core/lookup/ContextMapLookup.java#L45 and https://github.com/apache/logging-log4j2/blob/ScopedContext/log4j-core/src/main/java/org/apache/logging/log4j/core/impl/ScopedContextDataProvider.java#L40. It is all exrremely simple. Also note that ANY ContextDataProvider will now be included, even custom ones provided by users, but the get method has to be implemented to work. The interface provides a default method but that will usually be overridden as it is generally going to be less efficient.

FYI - See

Usages of get

Usages of getting the context map - (If you claim that this is somehow "different" then you will have to explain how since I can still get an individual value this way except it is slower). Also note that I will definitely modify log4j-audit to support ScopedContext.

Not that it matters, but the original 1.2 API supported objects - https://logging.apache.org/log4j/1.x/apidocs/org/apache/log4j/MDC.html.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the questions about the lack of Object support in ThreadContext I have seen were logging-related (e.g. this SO question).

In those cases users don't need to retrieve the object value they put in the thread context, but they want to see an appropriate JSON representation of the value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppkarwasz We could go round and round on this for days. The fact of the matter is the get method is required to be able to avoid having to copy the map regardless of whether it is an object or a map. Being able to support objects is just a bonus. However, I would NEVER recommend that the object be converted to JSON on every logging call. The JSON String should be cached - which is one of the reasons I wanted the Renderable interface as toString wouldn't obviously do that.

As for how MDC data gets used outside of logging - in this link you will find many examples - https://www.tabnine.com/code/java/methods/org.slf4j.MDC/get

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgoers,

Are we talking about ScopedContext#get or ScopedContextProvider#getValue? The latter is required by many logging-related components, the former isn't! The only reason the method is there is to allow users that abused MDC as general-purpose key-value store, then did the same thing with ThreadContext to continue this dubious practice with ScopedContext.

Copy link
Member Author

@rgoers rgoers Apr 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew you were going to say that. However, I simply don't consider it a dubious practice and, in fact, recommend it. https://logging.apache.org/log4j-audit/latest/requestContext.html. I simply see no point in requiring users who are passing data from client to server to have to store the object in a ThreadContext or ScopedContext and also store it in their own ThreadLocal. It simply makes no sense. Making them have to use the Provider to access it is just confusing to users.

@ppkarwasz ppkarwasz assigned ppkarwasz and rgoers and unassigned ppkarwasz Apr 5, 2024
@rgoers
Copy link
Member Author

rgoers commented Apr 6, 2024

At this point I am very happy with where this is and am prepared to merge. If you have technical reasons as to why you would veto the commit please let me know.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in my comments above, I disagree on many aspects (like having the default implementation in log4j-api), but I won't stand in your way if you want to merge it.

Here are some minor comments on the implementation details:

Comment on lines +42 to +49
ScopedContext.where("key1", "Log4j2").run(() -> assertThat(ScopedContext.get("key1"), equalTo("Log4j2")));
ScopedContext.where("key1", "value1").run(() -> {
assertThat(ScopedContext.get("key1"), equalTo("value1"));
ScopedContext.where("key2", "value2").run(() -> {
assertThat(ScopedContext.get("key1"), equalTo("value1"));
assertThat(ScopedContext.get("key2"), equalTo("value2"));
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests could use ScopedContextProviderSuite. The idea behind introducing that class in my PR was as a sort of mini TCK that all implementations must pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking into doing this. I found it easier to modify an actual test class than modify a base class and a pair of test classes for the initial testing.

@@ -20,7 +20,7 @@
*/
@Export
/**
* Bumped to 2.22.0, since FormattedMessage behavior changed.
* Bumped to 2.24.0, since FormattedMessage behavior changede.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that ResourceLogger should allow multiple implementations that can be changed via configuration.

The version in this PR puts the map in the ScopedContext as @rocketraman requires, but as you commented yourself, it also makes sense to use a MapMessage. I think companies should be able to change the behavior of ResourceLogger without changing a single line of code.

* @return The current context Instance.
*/
private Optional<Instance> getContext() {
return Optional.ofNullable(scopedContext.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might allocate an instance of Optional at each call. Maybe we could return a constant:

private final ScopedContext.Instance EMPTY_INSTANCE = new Instance(this, Collections.emptyMap());

if nothing is attached to the ThreadLocal?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This suggestion is included in the #2450 proposal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it will return a static empty Optional if the value is null. Returning an empty instance just confuses things.

@rgoers
Copy link
Member Author

rgoers commented Apr 11, 2024

I have done major refactoring to the PR based on some of your comments. So I am not sure how much of the comments above now apply.

I should like to point out that when I started with this ScopedContext was a single, relatively simple class. It is now several classes that, while they may improve flexibility that is likely to never be needed, also make the code pretty complicated.

@jvz
Copy link
Member

jvz commented Apr 12, 2024

Would it be possible to key a scoped context on something besides ThreadContext? One idea I had was to use a span id (like from OpenTelemetry) as the key for the logical context associated with the span. Then it becomes a matter of supplying the current span id in order to look up or modify the context data. While servlet-style processing would essentially pair span ids to thread ids in a one-to-one fashion, other execution models like reactive streams, coroutines, and virtual threads, could use different bookkeeping mechanisms for associating a span with some execution context.

@rgoers
Copy link
Member Author

rgoers commented Apr 13, 2024

@jvz I don't understand your comment Matt. While a ScopedContext performs a similar function as the ThreadContext (but in a much safer way) it shares no code with it at all. Did you really mean ThreadLocal instead of ThreadContext?

Note that due to the refactoring suggested by Piotr it is now possible to replace the "anchor" mechanism. That said, I am NOT in favor of requiring any third party library with Log4j API. But if you wanted to do that for the implementation provided in Log4j Core I'd be more open to that - especially if it only used that implementation if the dependency was available.

@rgoers rgoers merged commit 86ff06c into 2.x Apr 15, 2024
9 checks passed
@rgoers rgoers deleted the ScopedContext branch April 15, 2024 06:09
@jvz
Copy link
Member

jvz commented Apr 16, 2024

Yes, I meant ThreadLocal, oops. What I'm hoping for is an API in Log4j that can be easily implemented to support other runtimes without Log4j needing to be aware of them or depend on them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants