New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default ApplicationContext to use AsyncLocal instead of TLS #729

Closed
j055 opened this Issue Jun 11, 2017 · 20 comments

Comments

Projects
None yet
2 participants
@j055

j055 commented Jun 11, 2017

'Using CSLA 4 Data Access' says the following about ClientContext;

Like the LocalContext value, ClientContext is maintained on the current thread or HttpContext

However this is not the case in a console application.

In my tests ApplicationContext.User, LocalContext and GlobalContext are only available on the current thread but setting ClientContext overwrites data on all other threads. This is definitely not what you want.

@j055 j055 changed the title from ClientContext is not maintained only on the current thread. to ApplicationContext.ClientContext is NOT thread safe. Jun 22, 2017

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 7, 2017

Member

This depends on your platform, something that wasn't true when the book was written.

The UWP platform, for example, doesn't have the concept of thread-local storage, so the values are in a static location, which affects all threads.

It all depends on which IContextManager is being used at runtime. There are managers for ASP.NET, WPF, UWP, and now (in 4.6.600) for Windows Forms because Microsoft changed how they manage the CurrentPrincipal value (unfortunately).

Member

rockfordlhotka commented Jul 7, 2017

This depends on your platform, something that wasn't true when the book was written.

The UWP platform, for example, doesn't have the concept of thread-local storage, so the values are in a static location, which affects all threads.

It all depends on which IContextManager is being used at runtime. There are managers for ASP.NET, WPF, UWP, and now (in 4.6.600) for Windows Forms because Microsoft changed how they manage the CurrentPrincipal value (unfortunately).

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 8, 2017

I'm just talking about the default implementation, the one in the csla.dll.

Read my last paragraph. If globalcontext can work on it's own thread so can clientcontext. This is a bug.

j055 commented Jul 8, 2017

I'm just talking about the default implementation, the one in the csla.dll.

Read my last paragraph. If globalcontext can work on it's own thread so can clientcontext. This is a bug.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 8, 2017

Member

Looking back through the history, I think the bug is in the book's prose, not in the code.

The behavior of ClientContext has always been at the AppDomain level rather than per-thread.

https://github.com/MarimerLLC/csla/blame/v4.5.701/Source/Csla/ApplicationContext.cs#L860

I'm not eager to change it now, because platforms like UWP and Xamarin don't support per-thread storage, so in reality the current console app behavior is closer to the other modern platforms as-is.

Member

rockfordlhotka commented Jul 8, 2017

Looking back through the history, I think the bug is in the book's prose, not in the code.

The behavior of ClientContext has always been at the AppDomain level rather than per-thread.

https://github.com/MarimerLLC/csla/blame/v4.5.701/Source/Csla/ApplicationContext.cs#L860

I'm not eager to change it now, because platforms like UWP and Xamarin don't support per-thread storage, so in reality the current console app behavior is closer to the other modern platforms as-is.

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 8, 2017

For me it's more about why LocalContext and GlobalContext work single threaded in a console app and ClientContext does not than how aligned a console app is with the other modern platforms.

I think console apps are more likely to be used in a multi-user environment. Azure Functions or WebJobs are a modern example of where they can be used.

I realize the ClientContext implementation has been around for a long time but I worry about the implementation inconsistencies between the other similar context classes (it's caught me out big-time) and like the idea that CSLA can be used flexibly in a single user and multi-user environment.

j055 commented Jul 8, 2017

For me it's more about why LocalContext and GlobalContext work single threaded in a console app and ClientContext does not than how aligned a console app is with the other modern platforms.

I think console apps are more likely to be used in a multi-user environment. Azure Functions or WebJobs are a modern example of where they can be used.

I realize the ClientContext implementation has been around for a long time but I worry about the implementation inconsistencies between the other similar context classes (it's caught me out big-time) and like the idea that CSLA can be used flexibly in a single user and multi-user environment.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 10, 2017

Member

I'm not interested in changing the console app scenario - I guess I'm not sure how a console app becomes multiuser unless you start self-hosting http endpoints or something?

Clearly if the same AppDomain is used across multiple Azure Function or WebJob services instances then this would be a problem.

Certainly one increasingly common way to create microservices is to build them as a console app that is hosted in Docker. But in that case each service instance gets its own Docker container (isolation) and its own process within the container (isolation). They are written in .NET Core (at least for Linux containers), not full .NET, so there is no AppDomain, and so in this case they use yet another IContextManager implementation that's unique to .NET Core.

I haven't fully explored Azure Function or WebJob hosting, so those could be problematic. But I very much doubt it. They both run full .NET and most likely rely on individual AppDomain scopes per running instance to provide isolation - so the current default implementation wouldn't cause a problem.

If they don't use individual AppDomain scopes per running instance then things are obviously more complex, because they'd be more like ASP.NET. This is why ASP.NET provides HttpContext - because that's the only thing that exists reliably on a per-instance basis. So the question then, would be what is the HttpContext equivalent in an Azure Function and/or WebJob?

Member

rockfordlhotka commented Jul 10, 2017

I'm not interested in changing the console app scenario - I guess I'm not sure how a console app becomes multiuser unless you start self-hosting http endpoints or something?

Clearly if the same AppDomain is used across multiple Azure Function or WebJob services instances then this would be a problem.

Certainly one increasingly common way to create microservices is to build them as a console app that is hosted in Docker. But in that case each service instance gets its own Docker container (isolation) and its own process within the container (isolation). They are written in .NET Core (at least for Linux containers), not full .NET, so there is no AppDomain, and so in this case they use yet another IContextManager implementation that's unique to .NET Core.

I haven't fully explored Azure Function or WebJob hosting, so those could be problematic. But I very much doubt it. They both run full .NET and most likely rely on individual AppDomain scopes per running instance to provide isolation - so the current default implementation wouldn't cause a problem.

If they don't use individual AppDomain scopes per running instance then things are obviously more complex, because they'd be more like ASP.NET. This is why ASP.NET provides HttpContext - because that's the only thing that exists reliably on a per-instance basis. So the question then, would be what is the HttpContext equivalent in an Azure Function and/or WebJob?

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 10, 2017

WebJobs/functions are multithreaded running in a single AppDomain, so thread safety is key. Really we just need a dictionary object which exists only on the current thread. So LocalContext and GlobalContext serve the purpose at the moment but could that change in the future?

j055 commented Jul 10, 2017

WebJobs/functions are multithreaded running in a single AppDomain, so thread safety is key. Really we just need a dictionary object which exists only on the current thread. So LocalContext and GlobalContext serve the purpose at the moment but could that change in the future?

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 10, 2017

Member

We can easily have a different IContextManager implementation that stores ClientContext in TLS - you'd have to make that the current context manager in the initialization of the AppDomain hosting the function/job.

I'm rather surprised that they run everything in the same AppDomain - that means static values are shared across all requests - making the entire runtime environment for functions/jobs extremely complex and dangerous, because that would imply that all developers building code for this environment knows to use appropriate locking technologies, avoid deadlocks and race conditions, etc.

Are you sure this is accurate information?

Really, the more I think about this the worse it sounds. Things like ADO.NET aren't always inherently threadsafe either, so multiple concurrent functions/jobs running in the same non-isolated context like this, access data for different users (for example) - wow - scary stuff!

If this is really how it works then functions/jobs seem too dangerous to use in practice.

Member

rockfordlhotka commented Jul 10, 2017

We can easily have a different IContextManager implementation that stores ClientContext in TLS - you'd have to make that the current context manager in the initialization of the AppDomain hosting the function/job.

I'm rather surprised that they run everything in the same AppDomain - that means static values are shared across all requests - making the entire runtime environment for functions/jobs extremely complex and dangerous, because that would imply that all developers building code for this environment knows to use appropriate locking technologies, avoid deadlocks and race conditions, etc.

Are you sure this is accurate information?

Really, the more I think about this the worse it sounds. Things like ADO.NET aren't always inherently threadsafe either, so multiple concurrent functions/jobs running in the same non-isolated context like this, access data for different users (for example) - wow - scary stuff!

If this is really how it works then functions/jobs seem too dangerous to use in practice.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 11, 2017

Member

I've asked around a bit about the runtime within which functions/jobs run, because it just seems unlikely that Microsoft would have thrown everyone into the deep end of the pool with these technologies.

My understanding is this.

Web jobs are started as separate child processes on the kudu/scm site w3wp.exe

For functions different languages are invoked a little differently, i.e. with PowerShell each invocation is done in a new runspace, with .NET cached delegates with in app current domain.
But each function app instance is its own process and i.e. http calls is an APiController so it’s using the standard HttpControllerContext, HttpRequestMessage, HttpResponseMessage, etc.

So jobs are highly isolated by a process boundary.

Functions (in .NET) are isolated via standard ASP.NET behaviors, which does imply that you'll need to reference the CSLA NuGet package for ASP.NET in your function so you use the correct IContextManager that maintains all per-request state on HttpContext.

Member

rockfordlhotka commented Jul 11, 2017

I've asked around a bit about the runtime within which functions/jobs run, because it just seems unlikely that Microsoft would have thrown everyone into the deep end of the pool with these technologies.

My understanding is this.

Web jobs are started as separate child processes on the kudu/scm site w3wp.exe

For functions different languages are invoked a little differently, i.e. with PowerShell each invocation is done in a new runspace, with .NET cached delegates with in app current domain.
But each function app instance is its own process and i.e. http calls is an APiController so it’s using the standard HttpControllerContext, HttpRequestMessage, HttpResponseMessage, etc.

So jobs are highly isolated by a process boundary.

Functions (in .NET) are isolated via standard ASP.NET behaviors, which does imply that you'll need to reference the CSLA NuGet package for ASP.NET in your function so you use the correct IContextManager that maintains all per-request state on HttpContext.

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 11, 2017

Web jobs are started as separate child processes on the kudu/scm site w3wp.exe

Yes but it's the continuously running WebJob with batching of triggers (that might not be the right terminology) that is the problem. e.g. a method with a queue trigger will read up to 16 queue messages in parallel. This happens in the same process. This is default, 'out-of-the-box' behaviour.

j055 commented Jul 11, 2017

Web jobs are started as separate child processes on the kudu/scm site w3wp.exe

Yes but it's the continuously running WebJob with batching of triggers (that might not be the right terminology) that is the problem. e.g. a method with a queue trigger will read up to 16 queue messages in parallel. This happens in the same process. This is default, 'out-of-the-box' behaviour.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 11, 2017

Member

That does pose some interesting challenges then maybe - if Microsoft didn't provide any isolation model for that environment, and the webjobs are multi-tenant (for example) then I suspect CSLA simply can't work - there's too much cached configuration information at the AppDomain level that'll cause issues.

In a single tenant scenario it may work though, because there wouldn't be contention for varying cached configuration values.

I wonder if they provide a .NET synchronization context on a per-job level within the environment? That's the underlying tech beneath HttpContext, and would provide a place to store per-job state.

Otherwise, some years ago, I'd considered building my own per-task state manager to run within Silverlight/WinRT/UWP to address a similar issue. There might even still be an open issue in the backlog to do that work, though I deprioritized it since it would have solved an issue that nobody ever encountered 😄

WebJobs might bring that issue back to higher priority unless Microsoft has provided a solution already (one we can leverage with a new context manager).

Member

rockfordlhotka commented Jul 11, 2017

That does pose some interesting challenges then maybe - if Microsoft didn't provide any isolation model for that environment, and the webjobs are multi-tenant (for example) then I suspect CSLA simply can't work - there's too much cached configuration information at the AppDomain level that'll cause issues.

In a single tenant scenario it may work though, because there wouldn't be contention for varying cached configuration values.

I wonder if they provide a .NET synchronization context on a per-job level within the environment? That's the underlying tech beneath HttpContext, and would provide a place to store per-job state.

Otherwise, some years ago, I'd considered building my own per-task state manager to run within Silverlight/WinRT/UWP to address a similar issue. There might even still be an open issue in the backlog to do that work, though I deprioritized it since it would have solved an issue that nobody ever encountered 😄

WebJobs might bring that issue back to higher priority unless Microsoft has provided a solution already (one we can leverage with a new context manager).

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 11, 2017

We are running a multi-tenant application and it does work in the standard webjob implementation as above, but this is only because we started using the LocalContext instead of the ClientContext to store the individual tenant data for each request.

cached configuration information at the AppDomain level

Of course we don't store data in static variables but you've made me think this is a fragile environment because of cached CSLA configuration. But I don't think we mind if all our tenants use this configuration. What sort of stuff are you thinking of?

j055 commented Jul 11, 2017

We are running a multi-tenant application and it does work in the standard webjob implementation as above, but this is only because we started using the LocalContext instead of the ClientContext to store the individual tenant data for each request.

cached configuration information at the AppDomain level

Of course we don't store data in static variables but you've made me think this is a fragile environment because of cached CSLA configuration. But I don't think we mind if all our tenants use this configuration. What sort of stuff are you thinking of?

@rockfordlhotka rockfordlhotka changed the title from ApplicationContext.ClientContext is NOT thread safe. to ApplicationContext doesn't work for WebJob/Azure Function scenarios Jul 17, 2017

@rockfordlhotka rockfordlhotka self-assigned this Jul 17, 2017

@rockfordlhotka rockfordlhotka added this to the 4.7.100 milestone Jul 17, 2017

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 17, 2017

Member

I've been out for a few days (having fun at Rock USA), but had feelers out to get more info about the whole webjob/function runtime situation.

It does appear that Microsoft has "dumped us in the deep end of the pool", in that they've given us a multi-job runtime with no concept of isolation between concurrent jobs. Basically the same model Unix gave us back in the late 1980's. Not that this is inherently bad, but it kind of flies in the face of all the programming models we've had since 1996 when MTS and EJB and similar app server technologies provided a safer model for concurrent server code.

It might be the case that AsyncLocal<T> could help (as per Stephen Cleary). Some info on it, and a similar/older tech can be found here: https://stackoverflow.com/questions/31707362/how-do-the-semantics-of-asynclocal-differ-from-the-logical-call-context

Member

rockfordlhotka commented Jul 17, 2017

I've been out for a few days (having fun at Rock USA), but had feelers out to get more info about the whole webjob/function runtime situation.

It does appear that Microsoft has "dumped us in the deep end of the pool", in that they've given us a multi-job runtime with no concept of isolation between concurrent jobs. Basically the same model Unix gave us back in the late 1980's. Not that this is inherently bad, but it kind of flies in the face of all the programming models we've had since 1996 when MTS and EJB and similar app server technologies provided a safer model for concurrent server code.

It might be the case that AsyncLocal<T> could help (as per Stephen Cleary). Some info on it, and a similar/older tech can be found here: https://stackoverflow.com/questions/31707362/how-do-the-semantics-of-asynclocal-differ-from-the-logical-call-context

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Jul 21, 2017

Member

btw @j055, if you do research AsyncLocal or other synchronization context solutions please share your findings!

Member

rockfordlhotka commented Jul 21, 2017

btw @j055, if you do research AsyncLocal or other synchronization context solutions please share your findings!

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Jul 21, 2017

Yes of course. Just dealing with a big migration of our platform to Azure, then dealing with the consequences of a major outage in one of their UK data centres. Interesting times.

j055 commented Jul 21, 2017

Yes of course. Just dealing with a big migration of our platform to Azure, then dealing with the consequences of a major outage in one of their UK data centres. Interesting times.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Sep 6, 2017

Member

I have arrived at a planned course of action. My plan is this:

Change the default context provider so it stops using thread local storage (TLS) and AppDomain local storage; instead it will now use AsyncLocal<T> (where available).

AsyncLocal<T> has similar, but slightly different, semantics. Most notably, each time a new thread is created (via the thread pool or directly), the "parent" thread's context is copied to the new thread.

This is actually desirable - and we've built other functionality to overcome the fact that this didn't happen before. Still, it is a breaking change - but a good one - and since we're going from 4.6 to 4.7, now is the time to make such a change.

This won't affect .NET 4 or 4.5, because those platforms don't support AsyncLocal.

Also, the User property in netstandard1.5, netstandard1.6, or UWP is affected, because it used to be in a static string, and now it is in an AsyncLocal value.

Member

rockfordlhotka commented Sep 6, 2017

I have arrived at a planned course of action. My plan is this:

Change the default context provider so it stops using thread local storage (TLS) and AppDomain local storage; instead it will now use AsyncLocal<T> (where available).

AsyncLocal<T> has similar, but slightly different, semantics. Most notably, each time a new thread is created (via the thread pool or directly), the "parent" thread's context is copied to the new thread.

This is actually desirable - and we've built other functionality to overcome the fact that this didn't happen before. Still, it is a breaking change - but a good one - and since we're going from 4.6 to 4.7, now is the time to make such a change.

This won't affect .NET 4 or 4.5, because those platforms don't support AsyncLocal.

Also, the User property in netstandard1.5, netstandard1.6, or UWP is affected, because it used to be in a static string, and now it is in an AsyncLocal value.

rockfordlhotka added a commit to rockfordlhotka/csla that referenced this issue Sep 6, 2017

rockfordlhotka added a commit to rockfordlhotka/csla that referenced this issue Sep 6, 2017

#729 Revert changes to GlobalContext - values need to flow up the cha…
…in, and AsyncLocal doesn't provide for that

Also clean up some tests (remove invalid ones, remove commented out ones)
@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Sep 6, 2017

Member

Turns out that GlobalContext can't be implemented via AsyncLocal because its values need to flow back up the call stack when changed.

At least by default, AsyncLocal copies the values down to each new thread or context, but any changes at a lower level don't affect the parent's state. I don't currently see a way to flow those changes back up the stack, so I'm reverting to the previous behavior for GlobalContext.

Member

rockfordlhotka commented Sep 6, 2017

Turns out that GlobalContext can't be implemented via AsyncLocal because its values need to flow back up the call stack when changed.

At least by default, AsyncLocal copies the values down to each new thread or context, but any changes at a lower level don't affect the parent's state. I don't currently see a way to flow those changes back up the stack, so I'm reverting to the previous behavior for GlobalContext.

rockfordlhotka added a commit to rockfordlhotka/csla that referenced this issue Sep 6, 2017

@rockfordlhotka rockfordlhotka changed the title from ApplicationContext doesn't work for WebJob/Azure Function scenarios to Change default ApplicationContext to use AsyncLocal instead of TLS Sep 7, 2017

@j055

This comment has been minimized.

Show comment
Hide comment
@j055

j055 Nov 22, 2017

I've being doing some testing with WebJobs and the default ApplicationContext in 4.7.100-Beta-17100905.

It appears that executing functions using the default batching implementation gives the desired isolation between concurrent executing methods.

I setup a queue with 100 messages before running the following webjob function. All the messages appeared against the correct InvocationId which is NOT the case when I reference the CSLA 4.6 (as expected).

public class Functions
    {
        private static readonly StreamWriter Log = new StreamWriter("webjob.log");
        
        public static async Task ProcessQueueMessage([QueueTrigger("queue")] string message, Microsoft.Azure.WebJobs.ExecutionContext context)
        {
            Csla.ApplicationContext.ClientContext.Add("message", message);
            await Log.WriteLineAsync($"{DateTime.Now:o} {message} {context.InvocationId}");
            Thread.Sleep(100); // Does this prove anything?
            message = (string)Csla.ApplicationContext.ClientContext["message"];
            await Log.WriteLineAsync($"{DateTime.Now:o} {message} {context.InvocationId}");
            Log.Flush();
        }
    }

I guess I should check with the Webjob people that AsyncLocal is supposed to give the correct isolation. Haven't seen anything official looking yet.

j055 commented Nov 22, 2017

I've being doing some testing with WebJobs and the default ApplicationContext in 4.7.100-Beta-17100905.

It appears that executing functions using the default batching implementation gives the desired isolation between concurrent executing methods.

I setup a queue with 100 messages before running the following webjob function. All the messages appeared against the correct InvocationId which is NOT the case when I reference the CSLA 4.6 (as expected).

public class Functions
    {
        private static readonly StreamWriter Log = new StreamWriter("webjob.log");
        
        public static async Task ProcessQueueMessage([QueueTrigger("queue")] string message, Microsoft.Azure.WebJobs.ExecutionContext context)
        {
            Csla.ApplicationContext.ClientContext.Add("message", message);
            await Log.WriteLineAsync($"{DateTime.Now:o} {message} {context.InvocationId}");
            Thread.Sleep(100); // Does this prove anything?
            message = (string)Csla.ApplicationContext.ClientContext["message"];
            await Log.WriteLineAsync($"{DateTime.Now:o} {message} {context.InvocationId}");
            Log.Flush();
        }
    }

I guess I should check with the Webjob people that AsyncLocal is supposed to give the correct isolation. Haven't seen anything official looking yet.

@rockfordlhotka

This comment has been minimized.

Show comment
Hide comment
@rockfordlhotka

rockfordlhotka Nov 22, 2017

Member

Thank you, I really appreciate you posting your findings here!

Member

rockfordlhotka commented Nov 22, 2017

Thank you, I really appreciate you posting your findings here!

@j055

This comment has been minimized.

Show comment
Hide comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment