Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid getting meta sync lock if no need #17172

Merged
merged 2 commits into from Mar 31, 2023

Conversation

secfree
Copy link
Contributor

@secfree secfree commented Mar 31, 2023

What changes are proposed in this pull request?

Avoid getting meta sync lock if no need.

Why are the changes needed?

After backporting #16241, "ls /" encountered the following exception

2023-03-30 14:00:51,657 ERROR FileSystemMasterClientServiceHandler - Exit (Error): ListStatus: request=path: "/"
options {
  loadMetadataType: ONCE
  commonOptions {
    syncIntervalMs: 86400000
    ttl: -1
    ttlAction: DELETE
  }
  recursive: false
  loadMetadataOnly: false
}

java.lang.RuntimeException: Call cancelled by trackers: GRPC_CLIENT_TRACKER
        at alluxio.master.file.RpcContext.throwIfCancelled(RpcContext.java:107)
        at alluxio.master.file.InodeSyncStream.sync(InodeSyncStream.java:322)
        at alluxio.master.file.DefaultFileSystemMaster.syncMetadata(DefaultFileSystemMaster.java:3816)
        at alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1042)
        ...

We found that "ls /" was waiting for the WRITE lock of "/" in MetadataSyncLockManager even it did not need to sync metadata. As other clients were "ls" other paths and they hold READ lock on "/", "ls /" got stuck.

Does this PR introduce any user facing changes?

NO

@secfree
Copy link
Contributor Author

secfree commented Mar 31, 2023

cc @elega

@ChunxuTang
Copy link
Contributor

Thanks for the contribution!

@yyongycy
Copy link
Contributor

yyongycy commented Mar 31, 2023

Did you see this issue in the master branch? If not, why is this only triggered in your branch?
On the other hand, "ls /" should be quite frequently. If this is blocked, others should also see that.

@yyongycy
Copy link
Contributor

@tcrain any thoughts?

@secfree
Copy link
Contributor Author

secfree commented Mar 31, 2023

Did you see this issue in the master branch? If not, why is this only triggered in your branch?

No. We didn't use the master branch in our production clusters.

@secfree
Copy link
Contributor Author

secfree commented Mar 31, 2023

On the other hand, "ls /" should be quite frequently. If this is blocked, others should also see that.

I have not tested that if the master branch also has the following problem:

We found that "ls /" was waiting for the WRITE lock of "/" in MetadataSyncLockManager even it did not need to sync metadata.

If yes, then the PR is still needed. If no, then it may just be an issue in our internal branch. I think this point can be confirmed from code.

@yyongycy
Copy link
Contributor

On the other hand, "ls /" should be quite frequently. If this is blocked, others should also see that.

I have not tested that if the master branch also has the following problem:

We found that "ls /" was waiting for the WRITE lock of "/" in MetadataSyncLockManager even it did not need to sync metadata.

If yes, then the PR is still needed. If no, then it may just be an issue in our internal branch. I think this point can be confirmed from code.

Could you please help double check to see if the master branch has this problem? If the master branch has no such problem, there must be some problem someplace else triggered this issue in your branch.

@yyongycy
Copy link
Contributor

yyongycy commented Mar 31, 2023

Since yimin confirmed, I am fine with this pr. Thanks for fixing it.

Copy link
Contributor

@yyongycy yyongycy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yyongycy
Copy link
Contributor

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit 875f3f2 into Alluxio:master Mar 31, 2023
17 checks passed
@Xenorith Xenorith added the area-master Alluxio master label Apr 25, 2023
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 18, 2023
### What changes are proposed in this pull request?

Avoid getting meta sync lock if no need.

### Why are the changes needed?

After backporting Alluxio#16241, "ls /" encountered the following exception

```
2023-03-30 14:00:51,657 ERROR FileSystemMasterClientServiceHandler -
Exit (Error): ListStatus: request=path: "/"
options {
loadMetadataType: ONCE
commonOptions {
syncIntervalMs: 86400000
ttl: -1
ttlAction: DELETE
}
recursive: false
loadMetadataOnly: false
}

java.lang.RuntimeException: Call cancelled by trackers:
GRPC_CLIENT_TRACKER
at alluxio.master.file.RpcContext.throwIfCancelled(RpcContext.java:107)
at alluxio.master.file.InodeSyncStream.sync(InodeSyncStream.java:322)
at
alluxio.master.file.DefaultFileSystemMaster.syncMetadata(DefaultFileSystemMaster.java:3816)
at
alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1042)
...
```

We found that "ls /" was waiting for the WRITE lock of "/" in
MetadataSyncLockManager even it did not need to sync metadata. As other
clients were "ls" other paths and they hold READ lock on "/", "ls /" got
stuck.

### Does this PR introduce any user facing changes?

NO

pr-link: Alluxio#17172
change-id: cid-405084e218b150b4ffcce08294b30b3dd867a309
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 18, 2023
### What changes are proposed in this pull request?

Avoid getting meta sync lock if no need.

### Why are the changes needed?

After backporting Alluxio#16241, "ls /" encountered the following exception

```
2023-03-30 14:00:51,657 ERROR FileSystemMasterClientServiceHandler -
Exit (Error): ListStatus: request=path: "/"
options {
loadMetadataType: ONCE
commonOptions {
syncIntervalMs: 86400000
ttl: -1
ttlAction: DELETE
}
recursive: false
loadMetadataOnly: false
}

java.lang.RuntimeException: Call cancelled by trackers:
GRPC_CLIENT_TRACKER
at alluxio.master.file.RpcContext.throwIfCancelled(RpcContext.java:107)
at alluxio.master.file.InodeSyncStream.sync(InodeSyncStream.java:322)
at
alluxio.master.file.DefaultFileSystemMaster.syncMetadata(DefaultFileSystemMaster.java:3816)
at
alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1042)
...
```

We found that "ls /" was waiting for the WRITE lock of "/" in
MetadataSyncLockManager even it did not need to sync metadata. As other
clients were "ls" other paths and they hold READ lock on "/", "ls /" got
stuck.

### Does this PR introduce any user facing changes?

NO

pr-link: Alluxio#17172
change-id: cid-405084e218b150b4ffcce08294b30b3dd867a309
jiacheliu3 pushed a commit to jiacheliu3/alluxio that referenced this pull request May 19, 2023
### What changes are proposed in this pull request?

Avoid getting meta sync lock if no need.

### Why are the changes needed?

After backporting Alluxio#16241, "ls /" encountered the following exception

```
2023-03-30 14:00:51,657 ERROR FileSystemMasterClientServiceHandler -
Exit (Error): ListStatus: request=path: "/"
options {
loadMetadataType: ONCE
commonOptions {
syncIntervalMs: 86400000
ttl: -1
ttlAction: DELETE
}
recursive: false
loadMetadataOnly: false
}

java.lang.RuntimeException: Call cancelled by trackers:
GRPC_CLIENT_TRACKER
at alluxio.master.file.RpcContext.throwIfCancelled(RpcContext.java:107)
at alluxio.master.file.InodeSyncStream.sync(InodeSyncStream.java:322)
at
alluxio.master.file.DefaultFileSystemMaster.syncMetadata(DefaultFileSystemMaster.java:3816)
at
alluxio.master.file.DefaultFileSystemMaster.listStatus(DefaultFileSystemMaster.java:1042)
...
```

We found that "ls /" was waiting for the WRITE lock of "/" in
MetadataSyncLockManager even it did not need to sync metadata. As other
clients were "ls" other paths and they hold READ lock on "/", "ls /" got
stuck.

### Does this PR introduce any user facing changes?

NO

pr-link: Alluxio#17172
change-id: cid-405084e218b150b4ffcce08294b30b3dd867a309
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-master Alluxio master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants