Basic Master to Workers protobuf #25
package mrworker;
// The MRWorkerRegistrationService is used by workers to register their IP and Port with the Master.
I would add a comment saying that this service is implemented by the master, not by the workers.
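A minimal sketch of how that comment could be worded in the proto. Only the service name and RegisterWorkerRequest come from the diff; the RegisterWorker RPC and the RegisterWorkerResponse message are hypothetical names used for illustration.

```protobuf
syntax = "proto3";

package mrworker;

message RegisterWorkerRequest {
  string worker_ip = 1;
}

// Hypothetical response message; its contents are not part of this PR.
message RegisterWorkerResponse {}

// Implemented by the master, NOT by the workers. Each worker calls this
// service on the master to announce where its own MRWorkerService is exposed.
service MRWorkerRegistrationService {
  // Hypothetical RPC name.
  rpc RegisterWorker(RegisterWorkerRequest) returns (RegisterWorkerResponse);
}
```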
message RegisterWorkerRequest {
  // The worker's IP and port, where its MRWorkerService is exposed.
  string worker_ip = 1;
I would honestly combine the IP and port into a single field.
}
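Combining the two into one field could look like the sketch below. The name worker_address matches the field that appears later in this diff; the "host:port" format is an assumption, chosen because it is the form gRPC clients typically accept as a dial target, so the master could pass it through unchanged.

```protobuf
syntax = "proto3";

package mrworker;

message RegisterWorkerRequest {
  // Where the worker's MRWorkerService is exposed, as one string.
  // Assumed format: "host:port", e.g. "10.0.0.5:50051".
  string worker_address = 1;
}
```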
message ReduceOperationResult {
  enum Status {
The Status enum is repeated in two messages; I would just make it top-level.
  OperationStatus operation_status = 2;
}
message MapOperationRequest {
I feel like, in the most common use case, multiple map operations will be sent to a worker at once rather than a single one. Should the request be made up of multiple "MapOperation" messages?
I think that, since new map and reduce operations will be sent to a worker as it completes earlier ones, this will probably not be the most common use case. We would also have to change the API in other ways to support it, which might be too complicated to tackle in the first sprint.
I think sending one at a time is good enough. Once a map or reduce operation is done, the scheduler should schedule another one rather than have the worker wonder which one it should do next.
  string mapper_file_path = 2;
}
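For reference, the batched shape proposed in the first comment might look like the sketch below. Every name except mapper_file_path is hypothetical, and the thread leaned toward keeping one operation per request.

```protobuf
syntax = "proto3";

package mrworker;

// Hypothetical: a single unit of map work.
message MapOperation {
  string input_file_path = 1;  // hypothetical field
  string mapper_file_path = 2;
}

// Hypothetical batched request: several map operations in one RPC.
message MapOperationsRequest {
  repeated MapOperation operations = 1;
}
```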
message ReduceOperationRequest {
Similar to my comment on MapOperationRequest: I think this could be restructured so that multiple ReduceOperations are sent in a single request.
VoyTechnology left a comment:
I would rearrange the order: MapRequest and MapResult, then ReduceRequest and ReduceResponse.
    AVAILABLE = 0;
    BUSY = 1;
  };
  enum OperationStatus {
You moved the OperationStatus out of MapOperationResult but not out of here.
I wasn't sure what the best way to do this was, as they don't completely overlap: one includes an in-progress state and the other does not. I considered calling the other one CompletedOperationStatus, but it seemed a bit long. I could still do it to minimize confusion, though.
Why would they differ? A map or reduce can also be in progress, so they look the same to me.
Because the API doesn't support getting the result of maps and reduces that are still in progress. The master would poll the worker's status until the reduce was done, at which point it would request the result.
I don't see why it can't return IN_PROGRESS, or just leave the field unset. I think making the proto simpler and slightly easier to use makes more sense in this case.
  };
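A sketch of the single top-level enum suggested in this thread, including IN_PROGRESS and shared by both result messages. The name SUCCEEDED for value 1 is an assumption (the diff only shows IN_PROGRESS as the default and FAILED = 2), as is the field number used in MapOperationResult.

```protobuf
syntax = "proto3";

package mrworker;

// One enum for both map and reduce results.
enum OperationStatus {
  IN_PROGRESS = 0;
  SUCCEEDED = 1;  // hypothetical name for value 1
  FAILED = 2;
}

message MapOperationResult {
  OperationStatus operation_status = 1;  // field number assumed
}

message ReduceOperationResult {
  OperationStatus operation_status = 2;
}
```

When moving an enum to the top level, note that protobuf enum values follow C++ scoping rules: IN_PROGRESS, SUCCEEDED and FAILED become siblings of the enum and must not clash with values of any other top-level enum in package mrworker. The protobuf style guide's convention of prefixing values with the enum name (e.g. OPERATION_STATUS_FAILED) avoids such clashes.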
message MapOperationResult {
  message MapResult {
Do we actually need the complexity of both a key and a file path? If each map-reduce job has a specific ID, we can save the intermediate data in a known (configurable) location. Then each key would just be a file under that path, e.g. /tmp/mr475728/intermediate/$key, with the /tmp part being configurable in the master or worker.
I think we do need both the key and a specific file path. A given worker could produce multiple intermediate results for the same key over a series of map-reduce operations, so the file name can't just be the key.
I am not so sure tbh.
  repeated MapResult map_results = 2;
}
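A sketch of the MapResult shape argued for in the second reply, with an explicit path per result. The field names intermediate_key and file_path are hypothetical (intermediate_key mirrors the field of that name elsewhere in this diff), and the outer message's other fields are omitted.

```protobuf
syntax = "proto3";

package mrworker;

message MapOperationResult {
  message MapResult {
    // Hypothetical field names.
    string intermediate_key = 1;
    // Stored explicitly, because one worker may write several files for the
    // same key across successive map-reduce operations.
    string file_path = 2;
  }
  repeated MapResult map_results = 2;
}
```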
message ReduceOperationResult {
I think ReduceOperationResponse would be more consistent than ReduceOperationResult.
Either way, it would not exactly fit the convention, because we typically use [RPC_NAME]Response.
Honestly, I would just drop "Operation" from the names: PerformMap and PerformReduce, for both the RPCs and the messages.
And GetMapResult? GetMapResultResponse seems kind of weird though.
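Applying the [RPC_NAME]Request / [RPC_NAME]Response convention to the names floated in this thread gives something like the sketch below. The service name MRWorkerService comes from the diff; the RPC and message names are the proposals above, and message bodies are left empty for brevity.

```protobuf
syntax = "proto3";

package mrworker;

// Message bodies omitted; only the naming pattern is shown.
message PerformMapRequest {}
message PerformMapResponse {}
message PerformReduceRequest {}
message PerformReduceResponse {}
message GetMapResultRequest {}
message GetMapResultResponse {}

service MRWorkerService {
  rpc PerformMap(PerformMapRequest) returns (PerformMapResponse);
  rpc PerformReduce(PerformReduceRequest) returns (PerformReduceResponse);
  rpc GetMapResult(GetMapResultRequest) returns (GetMapResultResponse);
}
```

This makes the trade-off concrete: the convention is mechanical and predictable, but it yields GetMapResultResponse, the name the last comment finds awkward.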
  string intermediate_key = 1;
  // The file paths of the intermediate files produced by the map operations.
  repeated string input_file_paths = 2;
  // The path to the binary that performs the reduce operation.
  string worker_address = 1;
}
// TODO(conor): Consider ways to change this API to support workers handling multiple map and reduce operations at once
Do we have a line-length limit? If so, it should be added to CONTRIBUTING.md.
GoldenBadger left a comment:
Since there are many trivial commits in this PR now, please squash before merging.
    FAILED = 2;
  };
  WorkerStatus worker_status = 1;
  // The status of the last assigned operation. This field will not be included if there is no assigned operation.
This is incorrect. In proto3, a field you don't set takes its default value, which for an enum is the value numbered 0. Therefore it would read as IN_PROGRESS.
https://developers.google.com/protocol-buffers/docs/proto3#default
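Because an unset proto3 enum reads as its zero value, the comment's intent (the field "will not be included") needs explicit presence tracking. One option, sketched below, is proto3 `optional` (enabled by default since protoc 3.15), which records whether the field was set. The enclosing message name, the field number of operation_status and the name SUCCEEDED are assumptions.

```protobuf
syntax = "proto3";

package mrworker;

// Hypothetical message name; the enclosing message isn't visible in the diff.
message WorkerStatusResponse {
  enum WorkerStatus {
    AVAILABLE = 0;
    BUSY = 1;
  }
  enum OperationStatus {
    IN_PROGRESS = 0;
    SUCCEEDED = 1;  // hypothetical name for value 1
    FAILED = 2;
  }
  WorkerStatus worker_status = 1;
  // `optional` gives the field explicit presence, so "no assigned
  // operation" (unset) is distinguishable from IN_PROGRESS (0).
  optional OperationStatus operation_status = 2;  // field number assumed
}
```

Generated code then exposes presence, e.g. has_operation_status() in C++, hasOperationStatus() in Java, and a pointer-typed field in Go. An alternative that avoids `optional` is to make the zero value an explicit OPERATION_STATUS_UNSPECIFIED meaning "no assigned operation", at the cost of renumbering the existing values.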
Updated from cf6aa28 to 70b971d.