Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] The StorageManager cache might not function effectively under heavy IO pressure #1626

Closed
3 tasks done
rickyma opened this issue Apr 8, 2024 · 0 comments · Fixed by #1627
Closed
3 tasks done

Comments

@rickyma
Copy link
Contributor

rickyma commented Apr 8, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

I think the following code snipet in HybridStorageManager could be refactored after #383:

public boolean write(Storage storage, ShuffleWriteHandler handler, ShuffleDataFlushEvent event) {
  StorageManager underStorageManager = eventOfUnderStorageManagers.getIfPresent(event);
  if (underStorageManager == null) {
    return false;
  }
  return underStorageManager.write(storage, handler, event);
}

When we cannot obtain the cached storageManager from eventOfUnderStorageManagers, we should try other methods to acquire the storageManager and refresh the cache, instead of directly returning false.

Currently, the actual write logic is not truly executed.
There may be a scenario where retry failures occur rapidly, and such retries are meaningless:
35879725f6a9ba276ca6a37c5249f29

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
rickyma added a commit to rickyma/incubator-uniffle that referenced this issue Apr 8, 2024
zuston pushed a commit that referenced this issue Apr 10, 2024
…rs cache (#1627)

### What changes were proposed in this pull request?

Remove the meaningless eventOfUnderStorageManagers.

### Why are the changes needed?

Fix #1626 & #1620.
It's also a follow-up PR for #383.

This cache only makes sense when the event retries after a failure. However, after the event fails, it is not appropriate to continue taking the original storageManager from the cache(because events usually fail due to high IO pressure or disk damage or disk full). In this case, the cache seems to be meaningless, so there is a contradiction here, we should remove it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested in our env
jerqi pushed a commit that referenced this issue Apr 30, 2024
…rs cache (#1627)

### What changes were proposed in this pull request?

Remove the meaningless eventOfUnderStorageManagers.

### Why are the changes needed?

Fix #1626 & #1620.
It's also a follow-up PR for #383.

This cache only makes sense when the event retries after a failure. However, after the event fails, it is not appropriate to continue taking the original storageManager from the cache(because events usually fail due to high IO pressure or disk damage or disk full). In this case, the cache seems to be meaningless, so there is a contradiction here, we should remove it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested in our env
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant