[Bug] Memory leak caused by UserGroupInformation.createProxyUser #772

Closed
3 tasks done
zuston opened this issue Mar 28, 2023 · 5 comments · Fixed by #773

@zuston
Member

zuston commented Mar 28, 2023

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

Too many Hadoop FileSystem instances are created and held in the FileSystem cache, which causes a memory leak in the shuffle server.

The FileSystem cache key is built from the scheme, authority, and UGI. The scheme and authority do not change between calls, but the UGI does: each call to createProxyUser returns a new UGI instance, so every FileSystem.get() call produces a different key and adds a new entry to the cache.

We should cache the proxy user UGI so that a new FileSystem instance is not cached on every call.
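To make the mechanism concrete, here is a minimal plain-Java sketch of the leak and of the proposed fix. It is a simplified model, not Hadoop or Uniffle code: `FakeUgi`, `FsKey`, `fsCache`, `proxyUserCache`, and the `hdfs` / `nn:8020` values are hypothetical stand-ins, and the only behavior assumed is the one described above (the key is scheme + authority + UGI, and each proxy user creation yields a distinct UGI instance).

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified model of the leak and the fix; none of these classes are Hadoop's. */
class ProxyUserCacheSketch {

  /** Stand-in for a UGI: no equals/hashCode override, so identity semantics apply. */
  static final class FakeUgi {
    final String user;
    FakeUgi(String user) { this.user = user; }
  }

  /** Stand-in for the filesystem cache key: scheme + authority + UGI. */
  static final class FsKey {
    final String scheme;
    final String authority;
    final FakeUgi ugi;

    FsKey(String scheme, String authority, FakeUgi ugi) {
      this.scheme = scheme;
      this.authority = authority;
      this.ugi = ugi;
    }

    @Override public boolean equals(Object o) {
      if (!(o instanceof FsKey)) return false;
      FsKey k = (FsKey) o;
      // The UGI part is compared by identity, so a fresh UGI never matches.
      return scheme.equals(k.scheme) && authority.equals(k.authority) && ugi == k.ugi;
    }

    @Override public int hashCode() {
      return Objects.hash(scheme, authority, System.identityHashCode(ugi));
    }
  }

  static final Map<FsKey, Object> fsCache = new ConcurrentHashMap<>();
  static final Map<String, FakeUgi> proxyUserCache = new ConcurrentHashMap<>();

  /** Models createProxyUser: always returns a new instance. */
  static FakeUgi createProxyUser(String user) { return new FakeUgi(user); }

  /** The proposed fix: reuse one proxy UGI per user name. */
  static FakeUgi getCachedProxyUser(String user) {
    return proxyUserCache.computeIfAbsent(user, ProxyUserCacheSketch::createProxyUser);
  }

  /** Models a cached filesystem lookup; the Object stands in for a filesystem. */
  static Object getFileSystem(String scheme, String authority, FakeUgi ugi) {
    return fsCache.computeIfAbsent(new FsKey(scheme, authority, ugi), k -> new Object());
  }

  static int leakyCacheSize(int calls) {
    fsCache.clear();
    for (int i = 0; i < calls; i++) {
      getFileSystem("hdfs", "nn:8020", createProxyUser("alice"));
    }
    return fsCache.size();
  }

  static int fixedCacheSize(int calls) {
    fsCache.clear();
    proxyUserCache.clear();
    for (int i = 0; i < calls; i++) {
      getFileSystem("hdfs", "nn:8020", getCachedProxyUser("alice"));
    }
    return fsCache.size();
  }

  public static void main(String[] args) {
    System.out.println("without proxy-user cache: " + leakyCacheSize(100));
    System.out.println("with proxy-user cache:    " + fixedCacheSize(100));
  }
}
```

In this model the filesystem cache grows by one entry per call when a fresh proxy user is created each time, and stays at one entry once the proxy user is looked up by name. A real fix would also need to decide when cached proxy UGIs are evicted, which this sketch does not cover.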

Affects Version(s)

master

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
@zuston
Member Author

zuston commented Mar 28, 2023

cc @advancedxy @jerqi @smallzhongfeng This is a critical bug. I will fix it ASAP.

@advancedxy
Contributor

Do you have any heap dump that indicates this memory leak?

I just checked the related code. Shouldn't proxy users created from the same UGI be equal to each other?

@zuston
Member Author

zuston commented Mar 28, 2023

> Do you have any heap dump that indicates this memory leak?

This bug was found by dumping the whole heap.

> I just checked the related code. Shouldn't proxy users created from the same UGI be equal to each other?

I don't get your point.

@advancedxy
Contributor

> > Do you have any heap dump that indicates this memory leak?
>
> This bug was found by dumping the whole heap.

Is it possible for you to upload a screenshot here?

> > I just checked the related code. Shouldn't proxy users created from the same UGI be equal to each other?
>
> I don't get your point.

I mean that two proxy UGIs for the same proxy user should be equal to each other; that is what I was checking.
But never mind, the hash code of a proxy UGI is different for each instance.
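The point about hash codes can be illustrated with a small plain-Java model. This is a sketch under an assumption, not Hadoop source: as far as I can tell, `UserGroupInformation` bases `equals` and `hashCode` on the identity of its underlying `javax.security.auth.Subject`, and the hypothetical `FakeUgi` class below imitates that. Under that assumption, two proxy UGIs for the same user name are never equal unless they are the same instance.

```java
import javax.security.auth.Subject;

/** Model of identity-based UGI equality; FakeUgi is hypothetical, not a Hadoop class. */
class UgiEqualitySketch {

  static final class FakeUgi {
    private final String user;
    private final Subject subject;

    FakeUgi(String user, Subject subject) {
      this.user = user;
      this.subject = subject;
    }

    // Assumption: equality follows the identity of the wrapped Subject,
    // not the user name.
    @Override public boolean equals(Object o) {
      return o instanceof FakeUgi && subject == ((FakeUgi) o).subject;
    }

    @Override public int hashCode() {
      return System.identityHashCode(subject);
    }
  }

  /** Models createProxyUser: every call wraps a brand-new Subject. */
  static FakeUgi createProxyUser(String user) {
    return new FakeUgi(user, new Subject());
  }

  /** Two proxy users built separately for the same name. */
  static boolean separatelyCreatedAreEqual() {
    FakeUgi a = createProxyUser("alice");
    FakeUgi b = createProxyUser("alice");
    return a.equals(b);
  }

  /** The same instance compared with itself. */
  static boolean sameInstanceIsEqual() {
    FakeUgi a = createProxyUser("alice");
    return a.equals(a);
  }

  public static void main(String[] args) {
    System.out.println("separately created, same user: " + separatelyCreatedAreEqual());
    System.out.println("same instance:                 " + sameInstanceIsEqual());
  }
}
```

This is why reusing one cached proxy UGI per user is enough to make the cache keys match: the key comparison then sees the same instance on every call.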

@zuston
Member Author

zuston commented Mar 28, 2023

[Image: g1]
[Image: g2]

zuston added a commit that referenced this issue Mar 29, 2023
### What changes were proposed in this pull request?

1. Cache the proxy user UGI to avoid a memory leak.

### Why are the changes needed?

Fix: #772 

Too many Hadoop FileSystem instances are created and held in the FileSystem cache,
which causes a memory leak in the shuffle server.

The FileSystem cache key is built from the scheme, authority, and UGI.
The scheme and authority do not change between calls, but the UGI does: each call to
createProxyUser returns a new UGI instance, so every `FileSystem.get()` call produces
a different key and adds a new entry to the cache.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
1. Existing UTs
2. Added tests
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Apr 5, 2023
jerqi pushed a commit that referenced this issue Apr 13, 2023
jerqi added a commit that referenced this issue Apr 13, 2023
zuston added a commit that referenced this issue Apr 17, 2023
jerqi added a commit that referenced this issue Apr 17, 2023
zuston added a commit to zuston/incubator-uniffle that referenced this issue Apr 17, 2023
zuston added a commit that referenced this issue Apr 17, 2023
…773) (#824)

xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Jun 20, 2023
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this issue Jun 20, 2023