diff --git a/docs/async/async-in-depth.md b/docs/async/async-in-depth.md new file mode 100644 index 0000000000000..3a52f4c9ab062 --- /dev/null +++ b/docs/async/async-in-depth.md @@ -0,0 +1,122 @@ +# Async In Depth + +By [Phillip Carter](https://github.com/cartermp) + +Writing I/O-bound or CPU-bound asynchronous code is straightforward with the `async` and `await` keywords. The two key types involved are `Task` and `Task<T>`. This article explains the fairly complex machinery used under the covers. + +## Task and Task&lt;T&gt; + +Tasks are constructs used to implement what is known as the [Promise Model of Concurrency](https://en.wikipedia.org/wiki/Futures_and_promises). In short, they offer you a "promise" that work will be completed at a later point, letting you coordinate that work with a clean API. + +* `Task` represents a single operation which does not return a value. +* `Task<T>` represents a single operation which returns a value of type `T`. + +It’s important to reason about Tasks as abstractions of work happening asynchronously, and *not* an abstraction over multithreading. In fact, unless explicitly started on a new thread via `Task.Run`, a Task will start on the current thread and delegate work to the Operating System. + +You can learn more about Tasks and different ways to interact with them in the [Task-based Asynchronous Pattern (TAP) Article](https://msdn.microsoft.com/en-us/library/hh873175(v=vs.110).aspx). + +Lastly, as explained in the TAP article, Tasks are awaitable. This means that using `await` will allow your application or service to perform useful work while the task is running by yielding control to its caller until the task is done. If you’re using `Task<T>`, the `await` keyword will additionally “unwrap” the value returned when the Task is complete. The details of how this works are explained further below. 
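As a minimal sketch of the difference between the two task types (the class and method names here are hypothetical, chosen only for illustration), awaiting a `Task` produces no value, while awaiting a `Task<int>` unwraps the `int`:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical names, for illustration only.
public static class TaskUnwrapSketch
{
    // Completes after a short delay without producing a value.
    public static async Task PauseAsync()
    {
        await Task.Delay(10);
    }

    // Completes after a short delay and produces an int.
    public static async Task<int> ComputeAnswerAsync()
    {
        await Task.Delay(10);
        return 42;
    }

    public static async Task<int> RunAsync()
    {
        // Awaiting a Task yields no value; it only waits for completion.
        await PauseAsync();

        // Awaiting a Task<int> "unwraps" the int once the task completes.
        Task<int> pending = ComputeAnswerAsync();
        int answer = await pending;
        return answer;
    }
}
```

Holding the task in a variable before awaiting it, as `RunAsync()` does, is optional; it is written that way here only to make the two steps (obtaining the task, then unwrapping it) visible.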
+ +## Deeper Dive into Tasks for an I/O-Bound Operation + +Here’s a 10,000-foot view of what happens with a typical async I/O call: + +```csharp +public async Task<string> GetHtmlAsync() +{ + var client = new HttpClient(); + + // Execution is still synchronous here! + // The task handle "getHtmlTask" represents the active HTTP request. + var getHtmlTask = client.GetStringAsync("http://www.dotnetfoundation.org"); + + // Execution of GetHtmlAsync() is yielded to the caller here! + var html = await getHtmlTask; + + return html; +} +``` + +The call to `GetStringAsync()` makes its way through the .NET libraries and runtime (perhaps hitting other async calls) until it reaches a system interop call (such as `P/Invoke` into Windows). It's worth noting that if an `await` is ever encountered in the library layer, a `Task` object will be passed back to `GetHtmlAsync()`, `GetHtmlAsync()` will reach its `await`, and control over `GetHtmlAsync()` will also be yielded. Whether or not this happens, the interop layer of the runtime will make the proper System API call (such as `write()` to a socket on Linux), thus leaving user space and entering kernel space. This is where the real "magic" of async I/O happens. + +After the System API call, the request is now in kernel space, making its way to the networking subsystem of the OS (such as `/net` in the Linux Kernel). Here the OS will handle the networking request *asynchronously*. Details may be different depending on the OS used (the device driver call may be scheduled as a signal is sent back to the runtime, or a device driver call may be made and *then* a signal sent back), but eventually the runtime will be informed that the networking request is in progress. At this time, the work for the device driver will either be scheduled, in-progress, or already finished (the request is already out "over the wire") - but because this is all happening asynchronously, the device driver is able to immediately handle something else! 
+ +For example, in Windows an OS thread makes a call to the network device driver and asks it to perform the networking operation via an I/O Request Packet (IRP) which represents the operation. The device driver receives the IRP, makes the call to the network, marks the IRP as "pending", and returns to the OS. Because the OS thread now knows that the IRP is "pending", it doesn't have any more work to do for this job and "returns" back to the runtime so that it can be used to perform other work. + +Once the info from the OS makes it back to the .NET runtime, the runtime will then create a `Task` or `Task<T>` which will be returned to `GetHtmlAsync()` and assigned to the `getHtmlTask` variable. Note that at this point, although the I/O request is happening asynchronously, the system which called `GetHtmlAsync()` is still running synchronously! When the `await` keyword is encountered, only then is execution yielded to the caller of `GetHtmlAsync()`, and the execution context that it was called in will be free to do other work. + +**TODO:** Diagram of the above two paragraphs + +When the request is fulfilled and data comes back through the device driver, it notifies the CPU of new data received via an interrupt. How this interrupt gets handled will vary depending on the OS, but eventually the data will be passed through the OS until it reaches a system interop call (for example, in Linux an interrupt handler will schedule the bottom half of the IRQ to pass the data up through the OS asynchronously). Note that this *also* happens asynchronously! + +Once the data is passed into the runtime, it is then queued up as the result for the `Task` which corresponds to `getHtmlTask`. The caller of `GetHtmlAsync()` will eventually return execution to `GetHtmlAsync()`, and the result of the request is "unwrapped" into a `string`, which is then assigned to the `html` variable. 
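The hand-off described above can be imitated in a few lines. This is only a sketch and not how the runtime is actually implemented: it assumes a hypothetical `FakeDriver` class standing in for the OS and device driver, with a delayed continuation playing the role of the interrupt. `TaskCompletionSource<string>` and `Task.Delay` are real .NET APIs; every other name is made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the OS and device driver; not a real .NET API.
public static class FakeDriver
{
    public static Task<string> StartRequest(string payload)
    {
        var completion = new TaskCompletionSource<string>();

        // The delayed continuation plays the role of the interrupt: it runs
        // later and supplies the result. No thread blocks in the meantime.
        Task.Delay(50).ContinueWith(t => completion.TrySetResult("response to " + payload));

        // Returned right away, while the request is still "pending".
        return completion.Task;
    }
}

public static class IoSketch
{
    public static async Task<string> GetDataAsync()
    {
        // Still synchronous: the request has been started and we hold its task.
        Task<string> pending = FakeDriver.StartRequest("ping");

        // Control is yielded to the caller here until the result is supplied.
        string result = await pending;
        return result;
    }
}
```

The point of the sketch is the shape of the flow: the task is handed back before the work is done, and the result arrives later through a completion callback rather than through a thread that sat waiting.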
+ +**TODO:** Diagram of the above two paragraphs + +Throughout this entire process, a key takeaway is that **no thread is 100% dedicated to running the task**. Tasks have no thread affinity. Although work is executed in some contexts (after all, the OS does have to make its way through passing data to a device driver and responding to an interrupt), there is no thread dedicated to sitting there and *waiting* for data from the request to come back. This allows the system to handle a much larger volume of work rather than waiting for some I/O call to finish. + +Although the above may seem like a lot of work to be done, when measured in terms of wall clock time, it’s minuscule compared to the time it takes to do the actual I/O work. Although not at all precise, a potential timeline for such a call would look like this: + +0-1————————————————————————————————————————————————–2-3 + +* Time spent from points `0` to `1` is everything up until an async method yields control to its caller. +* Time spent from points `1` to `2` is the time spent on I/O. +* Finally, time spent from points `2` to `3` is passing control back (and potentially a value) to the async method, at which point it is executing again. + +### What does this mean for a server scenario? + +This model works well with a typical server scenario workload. Because async I/O Tasks aren't an abstraction over threading, the server threadpool can service a much higher volume of web requests than if each thread were dedicated to running a particular request. Consider two servers: one that uses async code, and one that does not. For the purpose of this example, each server only has 5 threads available to service requests. + +Say each server receives 6 concurrent requests, each of which asks for a resource that requires I/O of some sort. 
The server *without* async code has to queue up the 6th request until one of the 5 threads has finished the I/O-bound work and written a response: + +**TODO:** non-async diagram of server + +That's not an ideal scenario. The server *with* async still queues up the 6th request, but because it uses `async` and `await`, each of its threads is freed up when the I/O-bound work starts, rather than when it finishes: + +**TODO:** async diagram of same server + +As you can see, the 5 threads doing I/O-bound work are freed after they start that work, allowing one of them to service the 6th request much sooner. When an I/O-bound job is complete, its result is placed in a queue and the next available thread picks it up and writes the response. + +Although this is a contrived example, it works in a very similar fashion in the real world. In fact, you can expect a server to be able to handle an order of magnitude more requests using `async` and `await` than if it were dedicating a thread for each request it receives. + +### What does this mean for a client scenario? + +The biggest gain for using `async` and `await` for a client app is an increase in responsiveness. Although you can make an app responsive by spawning threads manually, the act of spawning a thread is an expensive operation relative to just using `async` and `await`. Especially for something like a mobile game, impacting the UI thread as little as possible where I/O is concerned is crucial. + +More importantly, because I/O-bound work spends virtually no time on the CPU, dedicating an entire CPU thread to perform barely any useful work would be a poor use of resources. + +**TODO:** Diagram showing yielding I/O stuff as UI thread can now do other work + +Additionally, dispatching work to the UI thread (such as updating a UI) is very simple with `async` methods, and does not require extra work (such as calling a thread-safe delegate). 
+ +## Deeper Dive into Task and Task&lt;T&gt; for a CPU-Bound Operation + +CPU-bound `async` code is a bit different from I/O-bound `async` code. Because the work is done on the CPU, there's no way to get around dedicating a thread to the computation. The use of `async` and `await` here doesn't buy you anything other than a clean way to interact with a background thread and keep the caller of the async method responsive. + +Here's a 10,000-foot view of a CPU-bound async call: + +```csharp +// The int result type is illustrative; it assumes DoExpensiveCalculation() returns an int. +public async Task<int> CalculateResult(InputData data) +{ + // This queues up the work on the threadpool. + var expensiveResultTask = Task.Run(() => DoExpensiveCalculation(data)); + + // Note that at this point, you can do some other work concurrently, + // as CalculateResult() is still executing! + + // Execution of CalculateResult() is yielded here! + var result = await expensiveResultTask; + + return result; +} +``` + +`CalculateResult()` executes on the thread it was called on. When it calls `Task.Run`, it queues the expensive CPU-bound operation, `DoExpensiveCalculation()`, on the thread pool and receives a `Task<T>` handle. `DoExpensiveCalculation()` is eventually run concurrently on the next available thread. It's possible to do concurrent work while `DoExpensiveCalculation()` is busy on another thread, because the thread which called `CalculateResult()` is still executing. + +Once `await` is encountered, the execution of `CalculateResult()` is yielded to its caller, allowing other work to be done with the current thread while `DoExpensiveCalculation()` is churning out a result. Once it has finished, the result is queued up to run on the main thread. Eventually, the main thread will return to executing `CalculateResult()`, at which point it will have the result of `DoExpensiveCalculation()`. + +### Why does async help here? + +`async` and `await` are the best practice for being responsive while performing CPU-bound work. This is a decision you'll have to evaluate. 
If there is value in adding responsiveness to an operation that's CPU-bound, `async` and `await` are a great way to make that happen. + +It's important to note that if you don't gain anything from adding responsiveness to your CPU-bound work, `async` and `await` will actually be a performance hit over just calling the code directly on the same thread. This is because there is overhead in scheduling work on the threadpool and the runtime's coordination of Tasks to represent the work being done. \ No newline at end of file diff --git a/docs/languages/csharp/async.md b/docs/languages/csharp/async.md index 83d3c68c4d940..63e7361f69a9e 100644 --- a/docs/languages/csharp/async.md +++ b/docs/languages/csharp/async.md @@ -2,44 +2,135 @@ By [Phillip Carter](https://github.com/cartermp) -C# and Visual Basic share a language-level asynchronous programming model which allows for easily writing asynchronous code without having to juggle callbacks or conform to a library which supports asynchrony. It follows what is known as the [Task-based Asynchronous Pattern (TAP)](https://msdn.microsoft.com/en-us/library/hh873175%28v=vs.110%29.aspx). +If you have any I/O-bound needs (such as requesting data from a network or accessing a database), you'll want to utilize asynchronous programming. You could also have CPU-bound code, such as performing an expensive calculation, which is also a good scenario for writing async code. -The core of TAP are the `Task` and `Task<T>` objects, which model asynchronous operations, supported by the `async` and `await` keywords (`Async` and `Await` in VB), which provide a natural developer experience for interacting with Tasks. The result is the ability to write asynchronous code which cleanly expresses intent, as opposed to callbacks which express intent far less cleanly. 
There are other ways to approach async code than `async` and `await` outlined in the TAP article linked above, but this document will focus on the language-level constructs from this point forward. +C# has a language-level asynchronous programming model which allows for easily writing asynchronous code without having to juggle callbacks or conform to a library which supports asynchrony. It follows what is known as the [Task-based Asynchronous Pattern (TAP)](https://msdn.microsoft.com/en-us/library/hh873175%28v=vs.110%29.aspx). -For example, you may need to download some data from a web service when a button is pressed, but don’t want to block the UI thread. It can be accomplished simply like this: +## Basic Overview of the Asynchronous Model -```cs +At the core of async programming are the `Task` and `Task<T>` objects, which model asynchronous operations. They are supported by the `async` and `await` keywords. The model is fairly simple in most cases: + +For I/O-bound code, you `await` an operation which returns a `Task` or `Task<T>` inside of an `async` method. + +For CPU-bound code, you `await` an operation which is started on a background thread with the `Task.Run` method. + +The `await` keyword is where the magic happens, because it yields control to the caller of the method which performs the `await`. It is what ultimately allows a UI to be responsive, or a service to be elastic. + +There are other ways to approach async code than `async` and `await`, as outlined in the TAP article linked above, but this document will focus on the language-level constructs from this point forward. + +### I/O-Bound Example: Downloading data from a web service + +You may need to download some data from a web service when a button is pressed, but don’t want to block the UI thread. It can be accomplished simply like this: + +```csharp private readonly HttpClient _httpClient = new HttpClient(); ... 
-button.Clicked += async (o, e) => +downloadButton.Clicked += async (o, e) => { + // This line will yield control to the UI as the request + // from the web service is happening. + // + // The UI thread is now free to perform other work. var stringData = await _httpClient.GetStringAsync(URL); - DoStuff(stringData); + DoSomethingWithData(stringData); }; - ``` And that’s it! The code expresses the intent (downloading some data asynchronously) without getting bogged down in interacting with Task objects. -For those who are more theoretically-inclined, this is an implementation of the [Future/Promise concurrency model](https://en.wikipedia.org/wiki/Futures_and_promises). +### CPU-Bound Example: Performing a Calculation for a Game + +Say you're writing a mobile game where pressing a button can inflict damage on many enemies on the screen. Performing the damage calculation can be expensive, and doing it on the UI thread would cause the entire game to pause as the calculation is performed! + +The best way to handle this is to start a background thread which does the work using `Task.Run`, and `await` its result. This will allow the UI to feel smooth as the work is being done. + +```csharp +private DamageResult CalculateDamageDone() +{ + // Code omitted: + // + // Does an expensive calculation and returns + // the result of that calculation. +} + + +calculateButton.Clicked += async (o, e) => +{ + // This line will yield control to the UI while CalculateDamageDone() + // performs its work. The UI thread is free to perform other work. + var damageResult = await Task.Run(() => CalculateDamageDone()); + DisplayDamage(damageResult); +}; +``` + +And that's it! This code cleanly expresses the intent of the button's click event, it doesn't require managing a background thread manually, and it does so in a non-blocking way. 
+ +### What happens under the covers -A few important things to know before continuing: +There are a lot of moving pieces where asynchronous operations are concerned. If you're curious about what's going on under the covers of `Task` and `Task<T>`, check out the [Async in-depth](async-in-depth.md) article for more information. -* Async code uses `Task` and `Task<T>`, which are constructs used to model the work being done in an asynchronous context. [More on Task and Task&lt;T&gt;](#more-on-task-and-task-t) -* When the `await` keyword is applied, it suspends the calling method and yields control back to its caller until the awaited task is complete. This is what allows a UI to be responsive and a service to be elastic. +On the C# side of things, the compiler transforms your code into a state machine which keeps track of things like yielding execution when an `await` is reached, resuming execution when a background job has finished, and so on. + +For the theoretically inclined, this is an implementation of the [Promise Model](https://en.wikipedia.org/wiki/Futures_and_promises). + +## Key Pieces to Understand + +* Async code can be used for both I/O-bound and CPU-bound code, but differently for each scenario. +* Async code uses `Task` and `Task<T>`, which are constructs used to model work being done in the background. +* When the `await` keyword is applied, it suspends the calling method and yields control back to its caller until the awaited task is complete. * `await` can only be used inside an async method. -* Unless an async method has an `await` inside its body, it will never yield! -* `async void` should **only** be used on Event Handlers (where it is required). -## Example (C#) -The following example shows how to write basic async code for both a client app and a web service. The code, in both cases, will count the number of times ”.NET” appears in the HTML of “dotnetfoundation.org”. 
+ +## Recognize CPU-Bound and I/O-Bound Work + +The first two examples of this guide showed how you can use `async` and `await` for I/O-bound and CPU-bound work. It's key that you can identify whether a job you need to do is I/O-bound or CPU-bound, because it can greatly affect the performance of your code and could potentially lead to misusing certain constructs. + +Here are two questions you should ask before you write any code: + +1. Will my code be "waiting" for something, such as data from a database? + + If your answer is "yes", then your work is **I/O-bound**. + +2. Will my code be performing an expensive computation? + + If your answer is "yes", then your work is **CPU-bound**. + +If the work you have is **I/O-bound**, use `async` and `await` *without* `Task.Run`. You *should not* use the Task Parallel Library. The reason for this is outlined in the [Async in Depth article](../../async/async-in-depth.md). + +If the work you have is **CPU-bound**, you have a further question to ask: + +Can the work be parallelized? If it can, then you should use the Task Parallel Library. If not, just use the current thread. Spawning a new thread won't help if you can't parallelize the work. + +## More Examples + +The following examples demonstrate various ways you can write async code in C#. They cover a few different scenarios you may come across. + +### Extracting Data from a Network + +This snippet downloads the HTML from www.dotnetfoundation.org and counts the number of times the string ".NET" occurs in the HTML. It uses ASP.NET MVC to define a web controller method which performs this task, returning the number. 
+ +*Note: you shouldn't ever use Regexes if you plan on doing actual HTML parsing.* + +```csharp +private readonly HttpClient _httpClient = new HttpClient(); + +[HttpGet] +[Route("DotNetCount")] +public async Task<int> GetDotNetCountAsync() +{ + // Suspends GetDotNetCountAsync() to allow the caller (the web server) + // to accept another request, rather than blocking on this one. + var html = await _httpClient.GetStringAsync("http://dotnetfoundation.org"); + + // The dot is escaped so that only a literal ".NET" is counted. + return Regex.Matches(html, @"\.NET").Count; +} +``` -Client app snippet (Universal Windows App): +Here's the same scenario written for a Universal Windows App, which performs the same task when a Button is pressed: -```cs +```csharp private readonly HttpClient _httpClient = new HttpClient(); private async void SeeTheDotNets_Click(object sender, RoutedEventArgs e) @@ -63,60 +154,66 @@ private async void SeeTheDotNets_Click(object sender, RoutedEventArgs e) NetworkProgressBar.IsEnabled = false; NetworkProgressBar.Visibility = Visibility.Collapsed; } - ``` -Web service snippet (ASP.NET MVC): +### Waiting for Multiple Tasks to Complete -```cs -private readonly HttpClient _httpClient = new HttpClient(); +You may find yourself in a situation where you need to retrieve multiple pieces of data concurrently. The `Task` API contains two methods, `Task.WhenAll` and `Task.WhenAny`, which allow you to write asynchronous code which performs a non-blocking wait on multiple background jobs (waiting until either all have finished or one has finished). -[HttpGet] -[Route("DotNetCount")] -public async Task<int> GetDotNetCountAsync() -{ - // Suspends GetDotNetCountAsync() to allow the caller (the web server) - // to accept another request, rather than blocking on this one. - var html = await _httpClient.DownloadStringAsync("http://dotnetfoundation.org"); +This example shows how you might grab `User` data for a set of `userId`s. 
- return Regex.Matches(html, ".NET").Count; +```csharp + +public async Task<User> GetUser(int userId) +{ + // Code omitted: + // + // Given a user Id {userId}, returns a User object corresponding + // to the entry in the database with {userId} as its Id. } +public async Task<IEnumerable<User>> GetUsers(IEnumerable<int> userIds) +{ + var tasks = new List<Task<User>>(); + + foreach (int userId in userIds) + { + tasks.Add(GetUser(userId)); + } + + return await Task.WhenAll(tasks); +} ``` -## More on Task and Task&lt;T&gt; - -As mentioned before, Tasks are constructs used to represent operations working in the background. - -* `Task` represents a single operation which does not return a value. -* `Task<T>` represents a single operation which returns a value of type `T`. - -Tasks are awaitable, meaning that using `await` will allow your application or service to perform useful work while the task is running by yielding control to its caller until the task is done. If you’re using `Task<T>`, the `await` keyword will additionally “unwrap” the value returned when the Task is complete. - -It’s important to reason about Tasks as abstractions of work happening in the background, and _not_ an abstraction over multithreading. In fact, unless explicitly started on a new thread via `Task.Run`, a Task will start on the current thread and delegate work to the Operating System. - -Here’s a 10,000 foot view of what happens with a typical async call: - -The call (such as `GetStringAsync` from `HttpClient`) makes its way through the .NET libraries until it reaches a system interop call (such as `P/Invoke` on Windows). This eventually makes the proper System API call (such as `write` to a socket file descriptor on Linux). That System API call is then dealt with in the kernel, where the I/O request is sent to the proper subsystem. 
Although details about scheduling the work on the appropriate device driver are different for each OS, eventually an “incomplete task” signal will be sent from the device driver, bubbling its way back up to the .NET runtime. This will be converted into a `Task` or `Task<T>` by the runtime and returned to the calling method. When `await` is encountered, execuction is yielded and the system can go do something else useful while the Task is running. +Here's another way to write this a bit more succinctly, using LINQ: -When the device driver has the data, it sends an interrupt which eventually allows the OS to bubble the result back up to the runtime, which will the queue up the result of the Task. Eventually execution will return to the method which called `GetStringAsync` at the `await`, and will “unwrap” the return value from the `Task` which was being awaited. The method now has the result! +```csharp -Although many details were glossed over (such as how “borrowing” compute time on a thread pool is coordinated), the important thing to recognize here is that **no thread is 100% dedicated to running the initiated task**. This allows threads in the thread pool of a system to handle a larger volume of work rather than having to wait for I/O to finish. - -Although the above may seem like a lot of work to be done, when measured in terms of wall clock time, it’s miniscule compared to the time it takes to do the actual I/O work. Although not at all precise, a potential timeline for such a call would look like this: - -0-1————————————————————————————————————————————————–2-3 +public async Task<User> GetUser(int userId) +{ + // Code omitted: + // + // Given a user Id {userId}, returns a User object corresponding + // to the entry in the database with {userId} as its Id. +} -* Time spent from points `0` to `1` is everything up until an async method yields control to its caller. -* Time spent from points `1` to `2` is the time spent on I/O. 
-* Finally, time spent from points `2` to `3` is passing control back (and potentially a value) to the async method, at which point it is executing again. +public async Task<IEnumerable<User>> GetUsers(IEnumerable<int> userIds) +{ + var tasks = userIds.Select(id => GetUser(id)); + return await Task.WhenAll(tasks); +} +``` -Tasks are also used outside of the async programming model. They are the foundation of the Task Parallel Library, which supports the parallelization of CPU-bound work via [Data Parallelism](https://msdn.microsoft.com/en-us/library/dd537608%28v=vs.110%29.aspx) and [Task Parallelism](https://msdn.microsoft.com/en-us/library/dd537609%28v=vs.110%29.aspx). +Although it's less code, take care when mixing LINQ with asynchronous code. Because LINQ uses deferred (lazy) execution, async calls won't happen immediately as they do in a `foreach` loop unless you force the generated sequence to iterate, for example with a call to `.ToList()` or `.ToArray()`. In the example above, `Task.WhenAll` enumerates the sequence, which is what starts the calls. ## Important Info and Advice Although async programming is relatively straightforward, there are some details to keep in mind which can prevent unexpected behavior. +* `async` **methods need to have an** `await` **keyword in their body or they will never yield!** + +This is important to keep in mind. If `await` is not used in the body of an `async` method, the C# compiler will generate a warning, but the code will compile and run as if it were a normal method. + * **You should add “Async” as the suffix of every async method name you write.** This is the convention used in .NET to more easily differentiate synchronous and asynchronous methods. Note that certain methods which aren’t explicitly called by your code (such as event handlers or web controller methods) don’t necessarily apply. Because these are not explicitly called by your code, being explicit about their naming isn’t as important. 
@@ -157,3 +254,6 @@ Don’t depend on the state of global objects or the execution of certain method A recommended goal is to achieve complete or near-complete [Referential Transparency](https://en.wikipedia.org/wiki/Referential_transparency_%28computer_science%29) in your code. Doing so will result in an extremely predictable, testable, and maintainable codebase. +## Other Resources + +* Lucian Wischik's [Six Essential Tips for Async](https://channel9.msdn.com/Series/Three-Essential-Tips-for-Async) is a wonderful resource for async programming. \ No newline at end of file