Skip to content

Basic Master to Workers protobuf#25

Merged
ConorGriffin37 merged 1 commit intodevelopfrom
feature/inital_worker_master_proto
Oct 4, 2017
Merged

Basic Master to Workers protobuf#25
ConorGriffin37 merged 1 commit intodevelopfrom
feature/inital_worker_master_proto

Conversation

@ConorGriffin37
Copy link
Copy Markdown

No description provided.

Comment thread proto/mrworker.proto Outdated

package mrworker;

// The MRWorkerRegistrationService is used by workers to register their IP and Port with the Master.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment that this service is implemented by the master and not the worker

Comment thread proto/mrworker.proto Outdated

message RegisterWorkerRequest {
// The workers ip and port where its MRWorkerService is exposed.
string worker_ip = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would honestly combine them.

Comment thread proto/mrworker.proto Outdated
}

message ReduceOperationResult {
enum Status {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status enum is repeated in 2 messages, I would just make it top level

Comment thread proto/mrworker.proto Outdated
OperationStatus operation_status = 2;
}

message MapOperationRequest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like in the most common use case multiple map operations will be sent to a worker at once rather than a single operation, should the request be something that is comprised of multiple "MapOperation" messages?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as new map and reduce operations will be sent to the worker as it completes them it will probably not be the most common use case. We would have to change the API in other ways to support this though. It might be too complicated to tackle this in the first sprint.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sending one at a time is good enough. Once a map reduce is done the scheduler should schedule another one rather than have the worker wonder which one it should do next.

Comment thread proto/mrworker.proto Outdated
string mapper_file_path = 2;
}

message ReduceOperationRequest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment to the MapOperationRequest comment, I think this could be restructured so multiple ReduceOperations are sent in a single request.

Copy link
Copy Markdown
Member

@VoyTechnology VoyTechnology left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rearrange the order. MapRequest and MapResult, then ReduceRequest and ReduceResponse.

Comment thread proto/mrworker.proto
AVAILABLE = 0;
BUSY = 1;
};
enum OperationStatus {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You moved the OperationStatus out of MapOperationResult but not out of here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what the best way to do this was, as they didn't completely overlap. One includes an inprogress state and the other does not. I considered calling the other one CompletedOperationStatus but it seemed a bit long, could possibly do it to minimize confusion though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would they differ? Map or Reduce can also be in progress, so it looks the same for me.

Copy link
Copy Markdown
Author

@ConorGriffin37 ConorGriffin37 Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the API doesn't support getting the result of Map and Reduces that are in progress. The master would poll worker status until the reduce was done, in which case it would request the result.

Copy link
Copy Markdown
Member

@VoyTechnology VoyTechnology Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why it can't return IN_PROGRESS, or just not use the field. I think making the proto simpler and slightly easier to use makes more sense in this case.

Comment thread proto/mrworker.proto
};

message MapOperationResult {
message MapResult {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need such complexity of both key and file path? If each map reduce have a specific ID, we can save the intermediate data in a known (configurable) location. Then the key would become just a file in that path /tmp/mr475728/intermediate/$key, with the /tmp part being configurable in master or worker.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we we do need key and specific file path. A given worker could produce multiple intermediate results for the same key over a series of map-reduce operations so the file name won't just be able to be the key.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure tbh.

CC @GoldenBadger

Comment thread proto/mrworker.proto Outdated
repeated MapResult map_results = 2;
}

message ReduceOperationResult {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ReduceOperationResponse would be more consistent than Result

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way it would not exactly fit the convention because typically we use [RPC_NAME]Response

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I would just kick out Operation out of there. PerformMap and PerformReduce both for RPCs and messages

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And GetMapResult? GetMapResultResponse seems kind of weird though.

Comment thread proto/mrworker.proto Outdated
string intermediate_key = 1;
// The file paths of the intermediate files produced by the map operations.
repeated string input_file_paths = 2;
// The path to the binary that performs the reudce operation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/reudce/reduce

Comment thread proto/mrworker.proto
string worker_address = 1;
}

// TODO(conor): Consider ways to change this API to support workers handling multiple map and reduce operations at once
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we have a character limit? This should be added to CONTRIBUTING.md

Copy link
Copy Markdown
Contributor

@GoldenBadger GoldenBadger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are many trivial commits in this PR now, please squash before merging.

Comment thread proto/mrworker.proto Outdated
FAILED = 2;
};
WorkerStatus worker_status = 1;
// The status of the last assigned operation. This field will not be included if there is no assigned operation.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. If you don't specify something in proto3 it gets assigned the default value. Therefore IN_PROGRESS.
https://developers.google.com/protocol-buffers/docs/proto3#default

@ConorGriffin37 ConorGriffin37 force-pushed the feature/inital_worker_master_proto branch from cf6aa28 to 70b971d Compare October 4, 2017 23:25
@ConorGriffin37 ConorGriffin37 merged commit ce6a0f9 into develop Oct 4, 2017
@ConorGriffin37 ConorGriffin37 deleted the feature/inital_worker_master_proto branch October 4, 2017 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants