
DMap compaction deletes non-empty fragments after some time #160

Closed
buraksezer opened this issue Apr 23, 2022 · 1 comment
Assignees: buraksezer
Labels: bug (Something isn't working), dmap, release:v0.5.x (Issues related with 0.5.x), release-blocker

Comments

@buraksezer
Owner

It looks like the compaction goroutine removes non-empty fragments. To reproduce the error, insert some keys and wait 10 minutes. You can decrease the compaction interval to see the error sooner.
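
For context, by "compaction interval" I mean the ticker-driven pass that drops fragments it considers empty. Below is a minimal, self-contained sketch of that kind of janitor loop; it is only an illustration with made-up types and names, not Olric's actual janitor code.

package main

import (
	"fmt"
	"sync"
	"time"
)

// fragment stands in for a DMap fragment; only the key count matters here.
type fragment struct {
	mtx  sync.RWMutex
	data map[string][]byte
}

func (f *fragment) length() int {
	f.mtx.RLock()
	defer f.mtx.RUnlock()
	return len(f.data)
}

func main() {
	// One fragment per partition; PartID 1 is genuinely empty.
	fragments := map[int]*fragment{
		0: {data: map[string][]byte{"some-key": []byte("some-value")}},
		1: {data: map[string][]byte{}},
	}

	// The janitor wakes up on every tick and deletes the fragments it
	// believes are empty. A much shorter interval than the default makes
	// the behavior visible quickly.
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()

	for i := 0; i < 3; i++ {
		<-ticker.C
		for partID, f := range fragments {
			if f.length() == 0 {
				delete(fragments, partID)
				fmt.Printf("Empty DMap fragment has been deleted on PartID: %d\n", partID)
			}
		}
	}
}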

➜  olric git:(master) olricd -c cmd/olricd/olricd-local.yaml
2022/04/23 13:51:23 [INFO] pid: 93542 has been started
2022/04/23 13:51:23 [INFO] Olric 0.5.0-beta.2 on darwin/arm64 go1.18.1 => olric.go:310
2022/04/23 13:51:23 [INFO] Join completed. Synced with 0 initial nodes => discovery.go:59
2022/04/23 13:51:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:51:23 [INFO] The cluster coordinator has been bootstrapped => discovery.go:42
2022/04/23 13:51:23 [INFO] Memberlist bindAddr: 127.0.0.1, bindPort: 3322 => routingtable.go:413
2022/04/23 13:51:23 [INFO] Cluster coordinator: 127.0.0.1:3320 => routingtable.go:414
2022/04/23 13:51:23 [INFO] Node name in the cluster: 127.0.0.1:3320 => olric.go:371
2022/04/23 13:51:23 [INFO] Olric bindAddr: 127.0.0.1, bindPort: 3320 => olric.go:377
2022/04/23 13:51:23 [INFO] Replication count is 1 => olric.go:379
2022/04/23 13:52:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:53:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:54:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:55:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:56:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:57:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:58:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 13:59:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 14:00:23 [INFO] Routing table has been pushed by 127.0.0.1:3320 => operations.go:86
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 0 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 1 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 2 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 3 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 4 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 5 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 6 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 7 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 8 => janitor.go:65
2022/04/23 14:01:23 [INFO] Empty DMap fragment (kind: Primary) has been deleted: dmap.bench on PartID: 9 => janitor.go:65
...

When you try to stop the server:

...
^C2022/04/23 14:09:44 [INFO] Signal catched: interrupt
2022/04/23 14:09:44 [INFO] Awaiting for background tasks
2022/04/23 14:09:44 [INFO] Press CTRL+C or send SIGTERM/SIGINT to quit immediately
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => eviction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
2022/04/23 14:09:44 [ERROR] Failed to acquire semaphore: context canceled => compaction.go:82
...
@buraksezer added the bug (Something isn't working), dmap, release-blocker, and release:v0.5.x (Issues related with 0.5.x) labels on Apr 23, 2022
@buraksezer self-assigned this on Apr 23, 2022
@buraksezer
Owner Author

I checked every possible component and tried to find a real problem in DMaps and their compaction implementation. Everything seems okay.

ERROR: Failed to acquire semaphore: context canceled:

This message is quite normal: a semaphore limits the number of concurrent compaction workers, and there are worker goroutines in the background waiting for their turn. When the node shuts down, their context is canceled and the pending acquisitions fail. I filtered out the context-canceled message.
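
To illustrate the shutdown behavior (a minimal standalone sketch, not Olric's compaction code): every worker acquires a weighted semaphore before it runs, and cancelling the context on shutdown makes all pending Acquire calls return context.Canceled, which is safe to filter out:

package main

import (
	"context"
	"errors"
	"log"
	"sync"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	// Allow only a few "compaction" workers to run at the same time.
	sem := semaphore.NewWeighted(2)

	ctx, cancel := context.WithCancel(context.Background())

	var wg sync.WaitGroup
	for partID := 0; partID < 16; partID++ {
		wg.Add(1)
		go func(partID int) {
			defer wg.Done()

			// Workers beyond the limit block here until a slot frees up.
			if err := sem.Acquire(ctx, 1); err != nil {
				// On shutdown the context is canceled and every waiting
				// worker fails here. That is expected, so don't log it.
				if errors.Is(err, context.Canceled) {
					return
				}
				log.Printf("[ERROR] Failed to acquire semaphore: %v", err)
				return
			}
			defer sem.Release(1)

			// Simulate the actual compaction work.
			time.Sleep(100 * time.Millisecond)
		}(partID)
	}

	// Simulate a shutdown while most workers are still waiting.
	time.Sleep(150 * time.Millisecond)
	cancel()
	wg.Wait()
}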

I used the following code snippet to create load on a single Olric node. It generates random keys and inserts them into the cluster. I couldn't detect any missing keys. It seems that memtier_benchmark somehow triggers DMap fragment creation but doesn't insert any keys, so the deleted fragments really were empty. Everything looks fine.

package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log"
	"sync"
	"time"

	"github.com/buraksezer/olric"
	"golang.org/x/sync/semaphore"
)

func main() {
	c, err := olric.NewClusterClient([]string{"localhost:3320"})
	if err != nil {
		log.Fatalf("olric.NewClusterClient returned an error: %v", err)
	}

	dm, err := c.NewDMap("bucket-of-arbitrary-items")
	if err != nil {
		log.Fatalf("olric.NewDMap returned an error: %v", err)
	}

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var mtx sync.Mutex
	keys := make(map[string]struct{})

	var wg sync.WaitGroup
	sem := semaphore.NewWeighted(50)

	for i := 0; i < 1000000; i++ {

		err = sem.Acquire(ctx, 1)
		if err != nil {
			log.Println(err)
			continue
		}

		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			defer sem.Release(1)

			// Use locally scoped errors here; sharing the outer err variable
			// across goroutines would be a data race.
			token := make([]byte, 16)
			if _, err := rand.Read(token); err != nil {
				log.Fatalf("Failed to read random bytes: %v", err)
			}

			key := hex.EncodeToString(token)
			if err := dm.Put(ctx, key, fmt.Sprintf("Olric Rocks-%d", idx)); err != nil {
				log.Fatalf("Failed to call Put: %v", err)
			}

			mtx.Lock()
			keys[key] = struct{}{}
			mtx.Unlock()

			if idx%100000 == 0 {
				fmt.Println("PUT ==>", idx)
			}
		}(i)
	}

	wg.Wait()

	mtx.Lock()
	fmt.Println("Number of unique keys", len(keys))
	mtx.Unlock()

	for key := range keys {
		gr, err := dm.Get(ctx, key)
		if err != nil {
			log.Fatalf("Failed to call Get: %v", err)
		}

		_, err = gr.String()
		if err != nil {
			log.Fatalf("Failed to read Get response: %v", err)
		}
	}

	// Use a separate context with a timeout for the graceful shutdown.
	closeCtx, closeCancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer closeCancel()

	err = c.Close(closeCtx)
	if err != nil {
		log.Printf("Failed to close ClusterClient: %v", err)
	}
}
