This repository was archived by the owner on Mar 2, 2022. It is now read-only.
Tags: WukLab/LegoOS
Better ExCache (pcache)
Recently we added two optimizations to pcache:
- Use a free list instead of a bitmap
- Add piggyback for non-victim cache eviction
A free list keeps line allocation fast even when associativity is high. It may add some extra lock contention, but this is the best we can do in software. We already piggybacked on victim cache flushes. We recently made per-set eviction the default, and now we piggyback on it as well. Both are pure optimizations. But... they do make the code a little more complex, especially the path that fills pcache from remote memory.
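To show why a free list holds up at high associativity, here is a minimal sketch of a per-set free list next to a bitmap scan. Everything in it (`NR_WAYS`, `struct cache_set`, `alloc_line`, `bitmap_alloc`) is hypothetical and not LegoOS's actual pcache code; it also assumes the caller already holds the per-set lock, which is where the extra contention mentioned above would come from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-set layout; NR_WAYS and all names are illustrative. */
#define NR_WAYS 8

struct line {
    int in_use;
    struct line *next_free;   /* link, valid only while the line is free */
};

struct cache_set {
    struct line ways[NR_WAYS];
    struct line *free_head;   /* singly linked list of free ways */
};

/* Put every way on the free list; ways[0] ends up at the head. */
static void set_init(struct cache_set *set)
{
    set->free_head = NULL;
    for (int i = NR_WAYS - 1; i >= 0; i--) {
        set->ways[i].in_use = 0;
        set->ways[i].next_free = set->free_head;
        set->free_head = &set->ways[i];
    }
}

/*
 * O(1) allocation: pop the head, no scan over the ways.
 * Returns NULL when the set is full, i.e. the caller must evict.
 * Assumes the caller holds the per-set lock.
 */
static struct line *alloc_line(struct cache_set *set)
{
    struct line *l = set->free_head;

    if (!l)
        return NULL;
    set->free_head = l->next_free;
    l->next_free = NULL;
    l->in_use = 1;
    return l;
}

/* O(1) free: push the line back on the head. Same locking assumption. */
static void free_line(struct cache_set *set, struct line *l)
{
    l->in_use = 0;
    l->next_free = set->free_head;
    set->free_head = l;
}

/* Bitmap alternative for comparison: a linear scan, up to NR_WAYS probes. */
static int bitmap_alloc(unsigned long *bitmap)
{
    for (int i = 0; i < NR_WAYS; i++) {
        if (!(*bitmap & (1UL << i))) {
            *bitmap |= 1UL << i;
            return i;
        }
    }
    return -1;
}
```

The free list pays a pointer per line, but both allocation and free stay constant-time no matter how many ways a set has, while the bitmap scan grows with associativity once the set fills up.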
Able to run ResNet with ImageNet
After fixing a memory leak on the memory side, we are able to train on ImageNet using TensorFlow ResNet. We tried "--batch_size=1024", which has around 70GB of resident memory. Who runs this on CPU?? LegoOS, which currently does not support a GPU monitor. But as you know, writing a GPU monitor is DOABLE.
Kind of a stable net layer
Oh well, we've fixed some bugs in the FIT layer. Basically, we ended up posting the recv WR to the QP twice, every single time. And the worst thing is, there was no decent error checking for ib_post_recv(). My bad. Damn. I should have gone through this. I've been trying to keep decent error checking all the time, but as the repo gets bigger and has more contributors, it is sometimes hard to control. After all, we are all human. Anyway, the patches from these days fixed this issue on both the lego and linux-fit sides. Along with that, we've also added some decent RPC profiling code, which can profile different message sizes and emulate highly contended multi-threaded RPC. There is still HUGE ROOM for improvement in our net code. But let us hope this at least gives us a stable net layer. Lesson learned, lastweek.
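The bug above is easy to reproduce in a toy model. The sketch below uses a hypothetical mock receive queue (`struct mock_qp`, `mock_post_recv`, `RQ_DEPTH` are all invented for illustration; this is not the verbs API), assuming only that posting to a full queue fails with an error code, as a real RQ would. It contrasts reposting one WR per consumed completion, with the return value checked, against the double post.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical mock of a QP receive queue; not the real verbs API. */
#define RQ_DEPTH 4

struct mock_qp {
    int posted;   /* outstanding receive WRs */
};

/* Mimics a post that fails once the receive queue is full. */
static int mock_post_recv(struct mock_qp *qp)
{
    if (qp->posted >= RQ_DEPTH)
        return -ENOMEM;
    qp->posted++;
    return 0;
}

/* One receive completion consumes exactly one posted WR. */
static void mock_poll_recv_cq(struct mock_qp *qp)
{
    if (qp->posted > 0)
        qp->posted--;
}

/* Correct pattern: repost exactly one WR per completion and check it. */
static int handle_recv_completion(struct mock_qp *qp)
{
    int ret;

    mock_poll_recv_cq(qp);
    ret = mock_post_recv(qp);
    if (ret) {
        fprintf(stderr, "post_recv failed: %d\n", ret);
        return ret;
    }
    return 0;
}

/* Buggy pattern: reposts twice per completion; the first result is ignored. */
static int handle_recv_completion_buggy(struct mock_qp *qp)
{
    mock_poll_recv_cq(qp);
    mock_post_recv(qp);
    return mock_post_recv(qp);
}
```

With a full queue, the correct handler keeps the queue at `RQ_DEPTH` forever, while the buggy one fails on its very first call. Without a return-value check, that failure simply disappears.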
Fixed victim double free bug in victim hit code path
Added back the pi_lock as an IRQ-disabled spinlock
Merge pull request #79: Infiniband, PCI, and DMA update
This pull request includes three major parts:
PCI: Ported the PCI core subsystem from Linux. Not everything is ported, only the major data structures.
DMA: We reframed the DMA APIs. Underneath, we are using the x86 pci-nommu DMA ops. We did not port drivers/iommu/intel-iommu.c. Hopefully this simple nommu approach works everywhere.
Infiniband: Walked through ib_core, mlx4_core, and mlx4_ib. I think we have a solid IB stack. It has been tested with:
- 1P and 1M: RAMFS microbenchmark
- 1P, 1M, and 1S: TF MNIST
Reviewed by: Yutong, Yiying
OSDI Eval Commit Point
All OSDI experiments were carried out before this commit. This includes all the recent bug fixes, the network thread model, piggy-backed flush/miss, and cache-aware VA allocation.
Various Updates on this pre-release: zerofill, vNode, DirtyFlush, and a few other fixes
Processor: User pgtable uses per-PTE, per-PMD lock
Before this, all pgtable operations were protected by one spinlock in mm. This is bad for multi-threaded applications. We now use a per-PTE-page and per-PMD-page lock. The spinlock is embedded within `struct page`. The spinlock is 4 bytes, and as long as `struct page` is not larger than 64 bytes, we are fine. This optimization applies to Processor only, since it is the one that manipulates the user pgtables. Memory probably needs something similar. Later.
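A minimal userspace sketch of the split-lock idea, under stated assumptions: the 4-byte `spinlock_t` built on a C11 `atomic_int`, the simplified `struct page` layout, and the helpers `pt_lockptr` and `set_pte_locked` are all hypothetical, not LegoOS code. It shows the two properties described above: each page-table page carries its own lock inside its `struct page`, and a compile-time check keeps `struct page` within 64 bytes. Only the PTE level is modeled; PMD pages would follow the same pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 4-byte spinlock built on a C11 atomic int. */
typedef struct {
    atomic_int locked;
} spinlock_t;

_Static_assert(sizeof(spinlock_t) == 4, "spinlock is expected to be 4 bytes");

static void spin_init(spinlock_t *l) { atomic_init(&l->locked, 0); }

static void spin_lock(spinlock_t *l)
{
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
        ;   /* busy-wait until the holder releases it */
}

static void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Hypothetical, simplified struct page; the real field layout differs. */
struct page {
    unsigned long flags;
    int refcount;
    spinlock_t ptl;             /* split lock for this page-table page */
    void *mapping;
    unsigned long index;
    struct page *lru_next, *lru_prev;
};

_Static_assert(sizeof(struct page) <= 64, "struct page must fit in 64 bytes");

typedef uint64_t pte_t;
#define PTRS_PER_PTE 512
#define NR_PT_PAGES  16

/* Each page-table page has its own struct page, hence its own lock. */
static struct page pt_page_structs[NR_PT_PAGES];
static pte_t pte_tables[NR_PT_PAGES][PTRS_PER_PTE];

static void pt_locks_init(void)
{
    for (int i = 0; i < NR_PT_PAGES; i++)
        spin_init(&pt_page_structs[i].ptl);
}

/* Kernel code would derive the struct page from the pgtable pointer; we index. */
static spinlock_t *pt_lockptr(unsigned int table)
{
    return &pt_page_structs[table].ptl;
}

static void set_pte_locked(unsigned int table, unsigned int idx, pte_t val)
{
    spinlock_t *ptl = pt_lockptr(table);

    spin_lock(ptl);             /* only this PTE page is serialized */
    pte_tables[table][idx] = val;
    spin_unlock(ptl);
}
```

Two threads updating PTEs in different page-table pages now take different locks, instead of all funneling through the single per-mm spinlock.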
This tag marks a milestone where:
- munmap/mremap behaviour changed, and rmap_get_pte_locked can catch bugs without any doubt
- the processor-side loader bug is fixed, so the execv() syscall can be used
- besides, the basic environment is hooked up with pcache: creation and cleanup are decent
We should be able to run any program at this point. Pcache should not be a concern any longer.
pcache: sync refcount between evict and normal users
This is really a nasty fix, but it covers most race conditions. The problem can be described as: two threads are using the same pcm, where one of them is trying to evict it while the other is using it. The evicting one is pcache_evict_line. The others can be munmap, mremap, or the wp handler. So how do we sync between them? We actually need help from two spinlocks (the pte lock and the pcache lock) and the pcache refcount. The pte lock ensures that the other parties (munmap, mremap, wp) see a safe pcm. And the rule is: once you drop the pte lock and acquire it again, you must check whether the pte has changed, because it can be unmapped by eviction in the meantime. Background eviction against live usage is really hard to do. A garbage collector such as Java's never reclaims live objects. Here, we do have the danger of reclaiming a live pcm (one used concurrently by other threads).
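The refcount-plus-revalidation rule can be sketched in a few lines. This is a single-threaded model under assumptions: `struct pcm`, `pcm_try_get`, `evict_line`, and `pte_unchanged` are hypothetical names, the mapping itself is assumed to hold one reference, and the pte lock is assumed held around the calls rather than modeled (the pcache lock is left out entirely). Eviction only proceeds if it can freeze the refcount from 1 to 0, and a user that re-takes the pte lock compares the PTE against its earlier snapshot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Hypothetical pcache line metadata: the mapping itself holds one reference. */
struct pcm {
    atomic_int refcount;
    bool evicted;
};

/* Take a reference only while the line is live (refcount > 0). */
static bool pcm_try_get(struct pcm *pcm)
{
    int old = atomic_load(&pcm->refcount);

    while (old > 0) {
        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(&pcm->refcount, &old, old + 1))
            return true;
    }
    return false;
}

static void pcm_put(struct pcm *pcm)
{
    atomic_fetch_sub(&pcm->refcount, 1);
}

/*
 * Evictor side, assumed to run with the pte lock held. Eviction succeeds
 * only if the mapping's own reference is the last one: freeze 1 -> 0.
 */
static bool evict_line(struct pcm *pcm, pte_t *ptep)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&pcm->refcount, &expected, 0))
        return false;           /* someone else is using it: back off */
    *ptep = 0;                  /* unmap the line */
    pcm->evicted = true;
    return true;
}

/* User side: after re-taking the pte lock, confirm the PTE is unchanged. */
static bool pte_unchanged(const pte_t *ptep, pte_t snapshot)
{
    return *ptep == snapshot;
}
```

A user such as the wp handler would snapshot the PTE and call `pcm_try_get` under the pte lock, drop the lock for slow work, then re-take it and bail out if `pte_unchanged` fails. Once the evictor has frozen the count to 0, `pcm_try_get` can no longer revive the line.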