
Improve performance of persistent write-ahead log #57

Closed
8 tasks done
sakno opened this issue Mar 13, 2021 · 78 comments
Assignees: sakno
Labels: enhancement (New feature or request)

Comments

@sakno
Collaborator

sakno commented Mar 13, 2021

See suggestions in #56 .
Tasks:

  • Add SkipAsync to IAsyncBinaryReader
  • Move the identifier of the log record to its metadata
  • Rewrite the lock mechanism of PersistentState so that compaction can run in parallel with writes or reads
  • Provide a buffering API for log entries
  • Reuse the buffering API at the transport level
  • Split the file buffer sizes for regular log entries and the snapshot entry
  • fsync for every write must be disabled by default (FileOptions.WriteThrough flag)
  • Introduce in-memory caching of log entries
sakno added the enhancement (New feature or request) label Mar 13, 2021
sakno self-assigned this Mar 13, 2021
@sakno
Collaborator Author

sakno commented Mar 16, 2021

@potrusil-osi, you were right about SkipAsync. The bug was in the StreamSegment.Position property setter; now it's fixed. A second minor performance improvement has also been pushed to the develop branch: the command identifier used by the Interpreter Framework is now part of the log entry metadata and doesn't require serialization. However, there is another side to this change: a log entry representing a snapshot must not be passed to the interpreter, because a snapshot doesn't have its own identifier.

@potrusil-osi

@sakno, it is unfortunate that I cannot pass the snapshot to the interpreter anymore. I currently use it to store the in-memory state, and the snapshot command is how it is often initialized. It is easy and elegant. Isn't there a way to work around the limitation?

@sakno
Collaborator Author

sakno commented Mar 17, 2021

Could you please provide a simple example of how you use a custom command type for the snapshot? Do you serialize the identifier manually inside SnapshotBuilder?

@sakno
Collaborator Author

sakno commented Mar 17, 2021

@potrusil-osi , support for snapshot entries is back. The CommandHandler attribute now has a special parameter indicating that the marked method is a handler for the snapshot log entry:

[CommandHandler(IsSnapshotHandler = true)]
public ValueTask HandleSnapshot(SnapshotCommand command, CancellationToken token)
{
    // Apply the snapshot to the in-memory state here.
    return ValueTask.CompletedTask;
}

@potrusil-osi

Thanks for such a fast response! I guess you don't want me to give you the example anymore... But no, I was not serializing the identifier inside SnapshotBuilder - it just used its own instance of CommandInterpreter to apply individual entries (including the previous snapshot) and then wrote the SnapshotCommand-based entry at the very end. It worked.

After getting the latest changes (and updating the CommandHandler for snapshot) I'm consistently getting this exception in a follower:

Unhandled exception. DotNext.Net.Cluster.Consensus.Raft.PersistentState+MissingLogEntryException: Log entry with relative index 1 doesn't exist in partition ...\Logs\partition-0\sp-0\0
   at DotNext.Net.Cluster.Consensus.Raft.PersistentState.Partition.ReadAsync(StreamSegment reader, Memory`1 buffer, Int32 index, Boolean refreshStream, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.Partition.cs:line 86
   at DotNext.Net.Cluster.Consensus.Raft.PersistentState.ReadAsync[TResult](LogEntryConsumer`2 reader, DataAccessSession session, Int64 startIndex, Int64 endIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 170
   at DotNext.Net.Cluster.Consensus.Raft.PersistentState.ReadAsync[TResult](LogEntryConsumer`2 reader, Int64 startIndex, Int64 endIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 234
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.<DotNext.Net.Cluster.Consensus.Raft.IRaftStateMachine.MoveToCandidateState>g__PreVoteAsync|76_0(Int64 currentTerm) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 750
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.DotNext.Net.Cluster.Consensus.Raft.IRaftStateMachine.MoveToCandidateState() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 717
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__139_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

The 0 partition file mentioned in the error is completely empty (unlike the same file in the leader).

@sakno
Collaborator Author

sakno commented Mar 17, 2021

The physical layout of log entries on disk in 3.1.0 is not compatible with previous versions. As a result, if the new PersistentState uses files generated by a previous version, you may get unpredictable results. I'll include this fact in the release notes. For now, you need to delete the entire directory containing the log entries and snapshot.

@potrusil-osi

I delete the logs every time before my tests. I'll keep digging into what could be causing it...

@sakno
Collaborator Author

sakno commented Mar 17, 2021

Oh, binary data representation is efficient when transferring over the wire but very fragile and requires a lot of testing... I fixed an issue that caused invalid calculation of the Content-Length header when transferring log entries over the wire. A corrupted stream of log entries could probably cause the error you described above. The fix is pushed.

@potrusil-osi

The last fix seems to resolve the error, but I'm getting new errors now.

This one occurs periodically on the node that is the very first leader:

System.Net.Http.HttpRequestException: Response status code does not indicate success: 500 (Internal Server Error).
   at System.Net.Http.HttpResponseMessage.EnsureSuccessStatusCode()
   at DotNext.Net.Cluster.Consensus.Raft.Http.RaftClusterMember.SendAsync[TResult,TMessage](TMessage message, CancellationToken token) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftClusterMember.cs:line 132

And this occurs on startup if I do not clear the logs from the previous run:

System.ArgumentException: Log entry must have associated command identifier (Parameter 'entry')
   at ...SdsQueueAuditTrail.ApplyAsync(LogEntry entry) in .../SdsQueueAuditTrail.cs:line 32
   at DotNext.Net.Cluster.Consensus.Raft.PersistentState.ApplyAsync(Int64 startIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 812
   at DotNext.Net.Cluster.Consensus.Raft.PersistentState.ReplayAsync(CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/PersistentState.cs:line 857
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.StartAsync(CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 354
   at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.StartAsync(CancellationToken token) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpCluster.cs:line 127
   at DotNext.Net.Cluster.Consensus.Raft.Http.Hosting.RaftHostedCluster.StartAsync(CancellationToken token) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/Hosting/RaftHostedCluster.cs:line 88
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)

@sakno
Collaborator Author

sakno commented Mar 17, 2021

Fixed. Command id was not passed over the wire correctly.

@potrusil-osi

Everything works well now. Thanks!

@potrusil-osi

@sakno, after all the optimizations I'm getting much better numbers... The suggestion to have a buffer with a different size during compaction (described in #56) still holds. I have modified dotNext locally to give me that option, and writing to the snapshot file is much faster thanks to that. I set the buffer to approximately the size of the snapshot.

@sakno
Collaborator Author

sakno commented Mar 18, 2021

Great news! Currently I'm working on buffered network I/O that allows copying log entry content received over the network before calling AppendAsync. This will reduce the lock contention time between the replication process and user code calling AppendAsync manually. Additionally, your suggestion has been added to the task list.

@sakno
Collaborator Author

sakno commented Mar 18, 2021

Did you change the buffer size passed to the FileStream constructor? There may be another issue. I'm using FileOptions.WriteThrough to bypass any intermediate buffers, so it's strange to me that bufferSize has any effect. Could you please remove the WriteThrough flag and check the performance again? Without this flag, I/O is less durable but might be more performant. If so, I'll add this flag to the configuration of PersistentState.
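For context, here is a minimal illustration of the flag being discussed, using the standard FileStream constructor; the file names and buffer size are arbitrary, and this is not the actual PersistentState code:

using System.IO;

// Durable but slower: WriteThrough bypasses intermediate OS buffers on every write.
using var durable = new FileStream("wal-writethrough.bin", FileMode.Append, FileAccess.Write, FileShare.Read, bufferSize: 4096, FileOptions.WriteThrough);

// Less durable but typically faster: writes go through the OS cache and are flushed later.
using var cached = new FileStream("wal-cached.bin", FileMode.Append, FileAccess.Write, FileShare.Read, bufferSize: 4096, FileOptions.None);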

@potrusil-osi

I'm getting about a 50% performance increase without that flag! I wonder if I need to test everything again to figure out the ideal buffer sizes.

@sakno
Collaborator Author

sakno commented Mar 18, 2021

I guess that without the WriteThrough flag the buffer size has little impact on performance.

@sakno
Collaborator Author

sakno commented Mar 18, 2021

The WriteThrough flag is now disabled by default but can be enabled via the Options.WriteThrough property. The changes are pushed to the develop branch.

@sakno
Collaborator Author

sakno commented Mar 19, 2021

Buffer sizes for regular log entries and the snapshot are now configured separately in the Options class.

@sakno
Collaborator Author

sakno commented Mar 20, 2021

@potrusil-osi , I've finished the buffering API for log entries. It has two use cases:

  1. For buffering before calling AppendAsync from user code
  2. For buffering before calling AppendAsync by transport infrastructure

In the case of the Interpreter Framework, you can use it easily as follows:

// Buffer the log entry produced by the interpreter before appending it.
var entry = interpreter.CreateLogEntry<MyCommand>(...);
using var buffered = await BufferedRaftLogEntry.CopyAsync(entry, options, token);
await log.AppendAsync(buffered, token);

This makes sense only for large log entries. Also, you need to experiment with the settings of the RaftLogEntryBufferingOptions class, an instance of which must be passed to the CopyAsync method. It has a MemoryThreshold option: if the size of a log entry is larger than this threshold, disk persistence is used for buffering. This is relevant for transport-level buffering, because reading from disk is faster than reading over the network, but it probably makes no sense for custom log entries constructed by the Interpreter. As a result, use buffered log entries only for appropriate commands.
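For illustration, configuring the buffering options might look like the sketch below; MemoryThreshold is the only member named above, so the chosen value is arbitrary and the exact shape of RaftLogEntryBufferingOptions should be checked against the actual API:

// Sketch only: log entries larger than the threshold fall back to disk buffering.
var options = new RaftLogEntryBufferingOptions
{
    MemoryThreshold = 64 * 1024 // ~64 KB; tune experimentally as suggested above
};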

@sakno
Collaborator Author

sakno commented Mar 24, 2021

Now I'm halfway to a new version of PersistentState with parallel compaction. It will have the following characteristics:

  • Log compaction and appending new entries can be done in parallel
  • Parallel reads are allowed
  • Reads are mutually exclusive with log compaction or writes (so you cannot read in parallel with log compaction)

At the moment I have implemented one innovative thing: a concurrent sorted linked list for partitions. This hand-written data structure is much better than SortedDictionary<K, V> from .NET: search is O(1) on average for the new structure compared to O(log n). Moreover, the new structure provides lock-free add/remove operations.

The remaining half is about writing a special lock type on top of AsyncTrigger.

@sakno
Collaborator Author

sakno commented Mar 27, 2021

@potrusil-osi , everything is ready for beta-testing. It was really challenging 😎 There are a few options for background compaction available:

Option # 1:
If you're using the UsePersistenceEngine extension method to register a custom WAL derived from PersistentState, and PersistentState.Options.CompactionMode is set to Background, then the library automatically registers a background compaction worker.

By default, the worker implements incremental compaction. This means that only one partition will be compacted at a time, which gives you the best performance with minimal interference with writes. However, this approach has a drawback: if there are too many writes, disk space will grow faster than compaction can reclaim it.

To replace incremental compaction, you need to implement the DotNext.IO.Log.ILogCompactionSupport interface in your custom WAL. It requires only one method, which will be called by the compaction worker on every commit. As a result, you are able to define how deep the compaction should be. PersistentState.ForceCompactionAsync(long count, CancellationToken) now gives this opportunity. The first parameter accepts the compaction factor: zero means that no compaction should be performed, and 1 means that compaction should be minimal (as in the incremental approach). In other words, this factor is how many partitions you want to squash. The number of partitions available for squashing can be obtained via the PersistentState.CompactionCount property.
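As an illustration of the first option with a custom compaction depth, a sketch of such a WAL is shown below. The single member required by ILogCompactionSupport is assumed here to be a ForceCompactionAsync(CancellationToken) method invoked on every commit, and the PersistentState constructor shape is also an assumption, so verify both against the actual API:

using System.Threading;
using System.Threading.Tasks;
using DotNext.IO.Log;
using DotNext.Net.Cluster.Consensus.Raft;

internal sealed class MyWriteAheadLog : PersistentState, ILogCompactionSupport
{
    public MyWriteAheadLog(string path, int recordsPerPartition, Options configuration)
        : base(path, recordsPerPartition, configuration)
    {
    }

    // Called by the compaction worker on every commit:
    // squash every partition that is currently eligible for compaction.
    ValueTask ILogCompactionSupport.ForceCompactionAsync(CancellationToken token)
        => ForceCompactionAsync(CompactionCount, token);
}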

Option # 2:
If you're not using the UsePersistenceEngine extension method, then you need to write a Hosted Service and register it in the DI container. The service is responsible for running compaction in the background. You can use the built-in background compaction worker as an example.

I would appreciate it if you were able to contribute a good benchmark. Unfortunately, BenchmarkDotNet is not suitable for load testing. Also, it would be great if you could share the final measurements of your application with background compaction.

@sakno
Collaborator Author

sakno commented Mar 27, 2021

As an option, an EWMA (exponentially weighted moving average) can be used as the basis for an adaptive background compaction algorithm. Over the application's lifetime, the algorithm can compute an exponentially weighted average of the compaction factor and pass it to the ForceCompactionAsync method. It's really trivial. However, its applicability is limited to situations where the application has a fluctuating workload. If not, the EWMA will converge to the maximum compaction factor, which leads to high interference with writes. Moreover, the existing incremental approach should work fine with a fluctuating workload.
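A minimal sketch of that idea, outside the library (the smoothing factor and the integration point are application-specific choices, not part of dotNext):

using System;

// Exponentially weighted moving average of the observed compaction factor.
// A smoothing factor close to 1 reacts quickly to workload changes; close to 0 smooths heavily.
internal sealed class CompactionFactorEwma
{
    private readonly double smoothing;
    private double average;

    internal CompactionFactorEwma(double smoothing) => this.smoothing = smoothing;

    // Call on every commit with the current value of PersistentState.CompactionCount.
    internal void Observe(long compactionCount)
        => average = smoothing * compactionCount + (1D - smoothing) * average;

    // Factor to pass to PersistentState.ForceCompactionAsync(long, CancellationToken).
    internal long Factor => (long)Math.Round(average);
}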

@potrusil-osi

hey @sakno, thank you for so much work! I'll try to take advantage of the most recent changes as soon as possible.

I have two questions:

  1. What exactly did you mean when you asked me for "a good benchmark"? So far, I'm testing the code as part of a bigger application. It would be possible to extract just the state machine (the "simple queue" I described earlier) and simulate its usage by the application, but then the question is whether that specific state machine is a good benchmark for you, since it is so simple and specific to our use case (it supports just adding items and clearing the queue up to a certain sequence number). Which leads to the next question...
  2. I already took advantage of the first version of ForceCompactionAsync, which helped in many areas. But to take it to the next level I'd like to optimize the clear operation so that:
    • Compaction runs only after the clear is committed - that is already done.
    • When the compaction runs, it doesn't even try to read those partition files that contain only items that have been cleared. This is obviously not possible now. I have optimized this process by skipping the deserialization of most of the content of such items, but the partition files are still processed, metadata read, etc. I'm not sure how exactly to support such a use case. Can you think of anything? Or am I asking for too much here?

@sakno
Collaborator Author

sakno commented Mar 29, 2021

  1. I understand that it may be problematic to convert a portion of a real app into a benchmark. If that's not possible or too hard, you may share only the measurements.
  2. SkipAsync is very cheap if the IAsyncBinaryReader is backed by a stream (which is true for PersistentState). I cannot implement app-specific optimizations because the current WAL is general-purpose. However, you can reorganize your binary format and place BLOBs at the beginning of the stream so they can be skipped easily with one call (see the sketch below). And, of course, you don't need to save the skipped content to the new snapshot during compaction.
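A rough sketch of that layout idea follows; the header struct is hypothetical, and the ReadAsync<T>/SkipAsync call shapes are assumptions about the IAsyncBinaryReader API rather than a verified surface:

using System.Threading;
using System.Threading.Tasks;
using DotNext.IO;

// Hypothetical fixed-size header written in front of each BLOB.
internal readonly struct EntryHeader
{
    internal readonly long SequenceNumber;
    internal readonly int PayloadLength;

    internal EntryHeader(long sequenceNumber, int payloadLength)
        => (SequenceNumber, PayloadLength) = (sequenceNumber, payloadLength);
}

internal static class EntryReader
{
    // During compaction, read only the header and jump over the BLOB in one call.
    internal static async ValueTask<long> ReadSequenceNumberAsync(IAsyncBinaryReader reader, CancellationToken token)
    {
        var header = await reader.ReadAsync<EntryHeader>(token);
        await reader.SkipAsync(header.PayloadLength, token);
        return header.SequenceNumber;
    }
}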

@sakno
Collaborator Author

sakno commented Mar 29, 2021

Another compaction mode has been added in the latest commit. Now there are three compaction modes:

  • Sequential, which is the only compaction mode available in 3.0.x or earlier. It offers the best optimization of disk space at the cost of commit performance.
  • Background, which allows compaction to run in parallel with writes.
  • A new one, Foreground, which runs during the commit process in parallel with applying the committed entries. It requires no ceremony, unlike Background, but is much more efficient than Sequential.

The Foreground compaction mode tries to squash, in parallel, the same number of previously committed entries as the number of entries requested for the commit. For instance, suppose you have 3 previously committed entries and you want to commit another 3. In this case, the compaction process squashes the 3 previously committed entries in parallel with applying the 3 newly committed entries. If the overridden ApplyAsync method has approximately the same performance as the overridden SnapshotBuilder.ApplyAsync, the overhead will be close to zero.
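For completeness, selecting the new mode would look roughly like this; the Options.CompactionMode property and the three mode names come from the description above, while the enum type name itself is an assumption to verify against the API:

using DotNext.Net.Cluster.Consensus.Raft;

// Sketch: enable the new Foreground compaction mode.
var options = new PersistentState.Options
{
    CompactionMode = PersistentState.CompactionMode.Foreground
};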

@sakno
Collaborator Author

sakno commented Mar 29, 2021

Also, I have an idea about your second question. I can provide an optional interface called IRandomAccessSnapshotSupport with a single method. You can implement this interface in the class derived from SnapshotBuilder. In this case I'll be able to expose the existing snapshot as a read-only memory-mapped file, and you will get random access to the snapshot contents at high speed instead of sequential reading and skipping. Of course, at the cost of your RAM (but without GC pressure). What do you think?

@potrusil-osi

I think I didn't explain my use case very well. The biggest pain point is that when the compaction is executed in the background there are many partition files full of add operations (the snapshot file is quite small). When a clear is committed and the compaction is triggered, almost all of the partitions (including the previous snapshot) can be dropped, except the add entries in the last few partitions (which are squashed into a relatively small new snapshot).

To support that specific use case, the state implementation would probably need to be able to set custom metadata for the snapshot and all the partitions, access it during compaction, and influence the compaction algorithm to skip/delete certain partition files based on custom logic. But I'm not sure all this would make any sense for a general-purpose WAL.

@potrusil-osi

Hi @sakno, I like both optimizations! Hopefully I will have time to take advantage of them before the research project this is part of is finished.

@sakno
Collaborator Author

sakno commented Apr 13, 2021

Here is a benchmark of various PersistentState caching modes on my NVMe SSD:

Method                                    Mean        Error       StdDev
ReadLogEntriesWithFullCacheAsync          4.230 us    0.0179 us   0.0139 us
ReadLogEntriesWithMetadataCacheAsync      22.512 us   0.4412 us   0.8605 us
ReadLogEntriesWithoutMetadataCacheAsync   42.370 us   0.8241 us   1.4214 us

You can try this benchmark on your machine: run the DotNext.Benchmarks project with dotnet run -c Bench and select PersistentStateBenchmark.

@potrusil-osi

My results are not as good as yours...

Method                                    Mean         Error       StdDev
ReadLogEntriesWithFullCacheAsync          8.268 us     0.0766 us   0.0716 us
ReadLogEntriesWithMetadataCacheAsync      213.049 us   8.0940 us   23.8652 us
ReadLogEntriesWithoutMetadataCacheAsync   227.656 us   4.4002 us   3.9007 us

@sakno
Collaborator Author

sakno commented Apr 15, 2021

The actual result depends on the hardware. In the case of the ReadLogEntriesWithFullCacheAsync benchmark, the result depends on the CPU because the log entry is fully cached in memory and disk I/O is not needed. The last two benchmarks depend on disk I/O. In my case, it's an NVMe SSD which uses PCIe as the bus. In your case it's an SSD with a SATA interface or even an HDD. The ReadLogEntriesWithMetadataCacheAsync result looks very suspicious because of the high standard deviation. It seems like some other task was running at the same time as the benchmark.

@sakno
Collaborator Author

sakno commented Apr 20, 2021

@potrusil-osi , do you have an estimate of the time needed to complete your research work? I'm ready to publish the release.

@potrusil-osi

We have two more weeks. After the last two optimizations you provided, the local results on my computer are pretty impressive. In our testing cluster the results are not as good, so we are digging deeper to find the bottlenecks.

@potrusil-osi

@sakno, back to your analysis of the benchmark results: I also have an NVMe SSD drive (Samsung PM981 1TB). Any thoughts why my results are so much worse than yours? Are you running the benchmark in Windows?

@sakno
Collaborator Author

sakno commented Apr 20, 2021

No, I ran these tests on Linux. But I have a machine with Windows 10:

Method                                    Mean         Error       StdDev
ReadLogEntriesWithFullCacheAsync          9.568 us     0.1045 us   0.0872 us
ReadLogEntriesWithMetadataCacheAsync      226.568 us   4.5002 us   3.9893 us
ReadLogEntriesWithoutMetadataCacheAsync   284.843 us   5.6811 us   5.3141 us

The results are close to yours. But the Windows 10 machine has a SATA SSD instead of an NVMe SSD on PCIe. I think your laptop has two drives: the system partition running on NVMe and the user partition running on SATA. The benchmark creates the WAL in the temp directory; with two drives, this directory is usually mounted on the SATA SSD by default.

Also, in your case 23.8652 us is a very abnormal StdDev. Try running this benchmark again without any background tasks.

@potrusil-osi

There is only one drive. But after disabling half the services the results are a bit better:

Method                                    Mean         Error       StdDev
ReadLogEntriesWithFullCacheAsync          6.396 us     0.0646 us   0.0604 us
ReadLogEntriesWithMetadataCacheAsync      106.715 us   0.7805 us   0.6517 us
ReadLogEntriesWithoutMetadataCacheAsync   140.691 us   0.6169 us   0.5771 us

@sakno
Collaborator Author

sakno commented Apr 20, 2021

In the benchmark I didn't try to choose optimal configuration parameters. However, I changed the default buffer size to 4096 bytes, which is equal to the filesystem cluster size, and now the results are much better on a SATA SSD but remain the same for NVMe. Pull the latest changes from the develop branch and try again.

@potrusil-osi

After getting the latest develop:

Method                                    Mean        Error       StdDev
ReadLogEntriesWithFullCacheAsync          6.208 us    0.0295 us   0.0247 us
ReadLogEntriesWithMetadataCacheAsync      59.048 us   0.2609 us   0.2313 us
ReadLogEntriesWithoutMetadataCacheAsync   89.919 us   0.5084 us   0.4507 us

@sakno
Collaborator Author

sakno commented Apr 20, 2021

Thanks for sharing the results! Also, I managed to shave off a few microseconds by optimizing hot paths within the UnsafeReadAsync method, which is the workhorse for all read operations.

@afink-osi

Hi @sakno, I am a colleague of @potrusil-osi, who is out of the office this week. We have deployed our application to a cluster on dedicated hardware to continue testing. Unfortunately, we are not seeing the same degree of benefit from the optimizations as was seen when testing locally: our overall throughput numbers are only about 80% of what was achieved in the local testing. In particular, we are seeing two issues that we would like to get your thoughts on.

  1. Write Lock Acquisition

We are seeing that the write lock acquisition in AppendCached() is taking significantly more time (~36ms) than writing the entry to disk in UnsafeAppend(), which takes ~1ms on average. With the compaction improvements, we are seeing an average compaction time of ~9ms. Do you have any thoughts on why it takes so long to acquire the write lock in AppendCached()?

  2. Read Lock Acquisition

In the ReadBuffered() and ReadUnbuffered() methods, we are seeing that a read lock is being acquired, which on average takes ~4ms. We have configured PersistentState.Options with CopyOnReadOptions, which, from the previous discussion, indicates that this operation shouldn't require a read lock at all. Is it expected that a lock is still required for this operation?

@sakno
Collaborator Author

sakno commented Apr 27, 2021

Hi @afink-osi ,
How many parallel writes do you have? Are reads/writes intermixed? What's the average size of a log entry record?

@sakno
Collaborator Author

sakno commented Apr 27, 2021

We are seeing that the write lock acquisition in AppendCached() is taking significantly more time (~36ms) as compared to the writing of the entry to disk in UnsafeAppend() which takes on average (~1ms).

It depends on the number of parallel writes. For instance, if you have 36 parallel writes and each write operation takes ~1 ms, then the last writer needs to wait ~36 ms.

@sakno
Collaborator Author

sakno commented Apr 27, 2021

Additionally, there is another project, FASTER, that offers a general-purpose and high-performance WAL. It might be a good candidate for Raft but requires a lot of research.

@sakno
Collaborator Author

sakno commented Apr 28, 2021

@afink-osi , @potrusil-osi , let's try again. What I've done: a cached log entry is not persisted until commit. As a result, AppendCachedAsync will complete synchronously in most cases and lock contention will be low.

@afink-osi

Hi @sakno, thanks for the quick response and for the clarification! Yesterday when running tests, we were seeing replication throughput at ~28% of the throughput of our baseline (the application running without replication). Today, we were able to run some tests using the same storage volume as before but with your latest changes, and the results are very impressive! We are now running at about 65% of the baseline, which is much more in line with the local tests that @potrusil-osi was running before, and we are seeing minimal time waiting for locks. We also tested using memory instead of the SSD disk (which may be acceptable for our use case) for log storage and pushed this further to ~72% of the baseline.

One more question: In the new method PersistCachedEntryAsync(), there is a call to WriteAsync() but we do not see any call to a Flush() method. Are these entries being flushed somewhere else?

@sakno
Collaborator Author

sakno commented Apr 28, 2021

I missed the call to FlushAsync, my bad. Now it's added. Flushing happens only if the partition is switched or the end of the log is reached, for better I/O performance, especially if cached log entries are small.

@sakno
Collaborator Author

sakno commented Apr 28, 2021

Also, I added a special TryGetBufferWriter method to the IAsyncBinaryWriter interface that enables fast synchronous serialization. When you append log entries with the addToCache parameter, the writer returns a non-null result. This method is supported only if the IAsyncBinaryWriter is backed by a pipe or a dynamic in-memory buffer. In other cases, it returns null.
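Based on that description, usage inside a log entry's serialization path would look roughly like the following; the return type of TryGetBufferWriter is assumed to be a nullable IBufferWriter<byte>, and the fallback helper is a placeholder for application-specific asynchronous serialization:

using System.Buffers;
using System.Threading;
using System.Threading.Tasks;
using DotNext.IO;

internal static class FastSerialization
{
    // Take the synchronous fast path when the writer exposes an in-memory buffer.
    internal static async ValueTask WriteEntryAsync(IAsyncBinaryWriter writer, byte[] payload, CancellationToken token)
    {
        var buffer = writer.TryGetBufferWriter(); // null unless backed by a pipe or in-memory buffer
        if (buffer is not null)
        {
            buffer.Write(payload); // synchronous copy straight into the buffer
        }
        else
        {
            await FallbackWriteAsync(writer, payload, token); // asynchronous path
        }
    }

    // Placeholder for the asynchronous serialization used when no buffer is available.
    private static ValueTask FallbackWriteAsync(IAsyncBinaryWriter writer, byte[] payload, CancellationToken token)
        => ValueTask.CompletedTask;
}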

@sakno
Collaborator Author

sakno commented May 1, 2021

@potrusil-osi , @afink-osi , I've added support for diagnostic events to the persistent WAL. You can configure them via the PersistentState.Options type:

  • LockContentionCounter measures the rate of lock contention
  • LockDurationCounter measures the duration of the lock in acquired state, in ms
  • ReadCounter measures the rate of reads, in the number of log entries
  • WriteCounter measures the rate of writes, in the number of log entries
  • CompactionCounter measures the rate of log compaction, in the number of log entries
  • CommitCounter measures the rate of commits, in the number of log entries

You can use the dotnet-counters tool from the .NET SDK to track the counters or submit the values to a monitoring tool.

@sakno
Collaborator Author

sakno commented May 2, 2021

@potrusil-osi , @afink-osi , I would like to ask you guys to check the develop branch again. It contains another write optimization based on memory-mapped files. Read performance remains the same. Also, the replication process is now optimized to reduce the number of reads from the WAL. Thanks in advance!

@afink-osi

@sakno The new counters look very helpful. I didn't get a chance to try them yet but will hopefully look at them tomorrow. We were able to test the new changes on the develop branch, and we are seeing an additional 11% increase in throughput in our application compared to before these latest optimizations!

@sakno
Collaborator Author

sakno commented May 3, 2021

Nice to hear that, @afink-osi ! Let me know when you check these counters.

@sakno
Collaborator Author

sakno commented May 9, 2021

@afink-osi , @potrusil-osi , did you have a chance to finalize your research project? If so, I would like to release a new version of .NEXT.

@afink-osi

@sakno We have completed our research project; unfortunately, we did not have time to dig into the counters you added before our time window closed. Thank you for your quick and thorough responses and improvements throughout this work! You may proceed with your release.

@potrusil-osi

Hi @sakno, I also wanted to thank you for all the work. It's been a great experience to work with you on improving the WAL performance!

@sakno
Collaborator Author

sakno commented May 11, 2021

.NEXT 3.1.0 has been released. Closing issue.

sakno closed this as completed May 11, 2021