Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement][tez] Optimize the method of obtain the vertex id. #986

Closed
3 tasks done
zhengchenyu opened this issue Jul 3, 2023 · 4 comments · Fixed by #990
Closed
3 tasks done

[Improvement][tez] Optimize the method of obtain the vertex id. #986

zhengchenyu opened this issue Jul 3, 2023 · 4 comments · Fixed by #990

Comments

@zhengchenyu
Copy link
Collaborator

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

For now, vertex id is extract from vertex name. This way only support the vertex name like "Map 0", "Reduce 1", generally generated from hive.
For tez examples, the vertex name is arbitrary, so we can't get the vertex id. So we need a new way to get vertex id.

How should we improve?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 3, 2023
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 3, 2023
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 3, 2023
@zhengchenyu zhengchenyu changed the title [Improvement] Optimize the method of obtain the vertex id. [Improvement][tez] Optimize the method of obtain the vertex id. Jul 3, 2023
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 4, 2023
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 4, 2023
zhengchenyu added a commit to zhengchenyu/incubator-uniffle that referenced this issue Jul 4, 2023
@lifeSo
Copy link
Collaborator

lifeSo commented Jul 5, 2023

It is great, if can obtain vertex id throuth:

          int sourceVertexId = dag.getVertex(edge.getSourceVertexName()).getVertexId().getId();
          int destinationVertexId = dag.getVertex(edge.getDestinationVertexName()).getVertexId().getId();

The only one it modify lots of file, but it is ok.

but when could not get vertexid, eg:

int sourceVertexId = this.conf.getInt(RSS_SHUFFLE_SOURCE_VERTEX_ID, -1);

could we throw exception, then we could know something wrong, instead get -1 and wrong result.

@zhengchenyu
Copy link
Collaborator Author

zhengchenyu commented Jul 5, 2023

It is great, if can obtain vertex id throuth:

          int sourceVertexId = dag.getVertex(edge.getSourceVertexName()).getVertexId().getId();
          int destinationVertexId = dag.getVertex(edge.getDestinationVertexName()).getVertexId().getId();

The only one it modify lots of file, but it is ok.

but when could not get vertexid, eg:

int sourceVertexId = this.conf.getInt(RSS_SHUFFLE_SOURCE_VERTEX_ID, -1);

could we throw exception, then we could know something wrong, instead get -1 and wrong result.

@lifeSo

assert is used to throw exception. You can see the below code.

    assert sourceVertexId != -1;
    assert destinationVertexId != -1;

And in fact, the vertex id must exists, so in general way, we will never throw exception.

@lifeSo
Copy link
Collaborator

lifeSo commented Jul 6, 2023

sourceVertexId

Greate! @zhengchenyu zhengchenyu
How about the assert check is in computeShuffleId() method.
Then users who call this methods do not assert check explicitly.

@zhengchenyu
Copy link
Collaborator Author

@lifeSo

That's a good idea, as only computeShuffleId use vertex id.
But just like the code show below. I check whether vertex id is -1, but the -1 is the default value of config. If we want to move assert to computeShuffleId, I think getInt also should move to computeShuffleId. How about this way?

    int sourceVertexId = this.conf.getInt(RSS_SHUFFLE_SOURCE_VERTEX_ID, -1);
    int destinationVertexId = this.conf.getInt(RSS_SHUFFLE_DESTINATION_VERTEX_ID, -1);
    assert sourceVertexId != -1;
    assert destinationVertexId != -1;

@jerqi jerqi closed this as completed in #990 Jul 6, 2023
jerqi pushed a commit that referenced this issue Jul 6, 2023
#990)

### What changes were proposed in this pull request?

Optimize the method of obtain the vertex id.

### Why are the changes needed?

For now, vertex id is extract from vertex name. This way only support the vertex name like "Map 0", "Reduce 1", generally generated from hive.
For tez examples, the vertex name is arbitrary, so we can't get the vertex id. So we need a new way to get vertex id.

Fix: #986

### How was this patch tested?

integration test, unit test, test in yarn cluster, test in tez local mode.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants