This repository has been archived by the owner on Aug 4, 2022. It is now read-only.

[REEF-1978] Adding Checkpoint handler for IMRU master #1429

Open
wants to merge 4 commits into
base: master

Conversation

jwang98052
Contributor

  • Add IMRUCheckpointHandler to handle task state persistence
  • Add a configuration module for IMRUCheckpointHandler
  • Update IMRUJobDefinition to allow clients to set the checkpoint configuration
  • Add an UpdateTaskStateCodec implementation
  • Update the IMRU examples to set the checkpoint configuration and call the checkpoint handler
  • Update test cases

JIRA: REEF-1978
This closes #

jwang98052 added 3 commits February 2, 2018 12:33

JIRA: [REEF-1978](https://issues.apache.org/jira/browse/REEF-1978)
This closes  #
Contributor

markusweimer left a comment

I did a surface level read:

  • Please document all public classes, interfaces and their public members.
  • Make all new public classes sealed.
  • Check constructor parameters in the constructor instead of in methods to facilitate early failure and concise code.
  • Reformat all log lines not to contain #### and such. Also, consider moving them to higher log levels.
  • Reduce the number of new public classes and interfaces where possible.
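
The first three points of the list above can be sketched as follows. This is only an illustration, not code from the PR: `SketchCheckpointHandler` and its delegate-based dependency are hypothetical names chosen to keep the example self-contained. It shows a documented, sealed class that validates its constructor arguments once, so the methods need no repeated checks.

```csharp
using System;

/// <summary>
/// Hypothetical handler used only to illustrate the review points;
/// it is not the IMRUCheckpointHandler from this PR.
/// </summary>
public sealed class SketchCheckpointHandler
{
    private readonly Func<string, bool> _exists;

    /// <summary>
    /// Creates the handler and fails early if the dependency is missing.
    /// </summary>
    /// <param name="exists">Assumed stand-in for a file system existence check.</param>
    public SketchCheckpointHandler(Func<string, bool> exists)
    {
        if (exists == null)
        {
            throw new ArgumentNullException(nameof(exists));
        }
        _exists = exists;
    }

    /// <summary>
    /// Returns true if a checkpoint exists at the given path.
    /// No null check on the dependency is needed here: the constructor did it.
    /// </summary>
    public bool HasCheckpoint(string path)
    {
        return _exists(path);
    }
}
```

Because the class is sealed and validated at construction, every instance a caller can obtain is already in a usable state.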

@@ -174,5 +176,24 @@ protected virtual IConfiguration BuildMapperFunctionConfig()
GenericType<BroadcastReceiverReduceSenderMapFunction>.Class)
.Build();
}

/// <summary>
/// Build checkpoint configuration. Subclass can override it.
Contributor

I'd prefer not to use public abstract classes as APIs. Consider re-structuring this using composition instead of inheritance.

Contributor Author

This follows the existing pattern for getting configurations. It lets clients share the same CreateJobDefinitionBuilder while overriding the configuration in their own way. If we really want to change it, that needs to be done in a different PR, as the change must be consistent across the other methods.

/// <summary>
/// Build checkpoint configuration. Subclass can override it.
/// </summary>
protected override IConfiguration BuildCheckpointConfig()
Contributor

Same as above.

Contributor Author

Same as above. It is a sample client class that mainly contains driver configurations. I will make the class internal.

/// </summary>
protected override IConfiguration BuildCheckpointConfig()
{
var filePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + "state.txt");
Contributor

Please use the temp files generated by REEF, not System.Path.

Contributor Author

Done


private void PersistState()
{
Logger.Log(Level.Info, "$$$$$$$$$$$ State to save: {0}", _taskState.Input[0]);
Contributor

Please reformat all log lines and consider moving them to more fine grained log levels.

Contributor Author

yeah

{
var obj = (UpdateTaskState<int[], int[]>)_stateHandler.Restore(_stateCodec);

if (obj != null)
Contributor

How is obj used? Also, what if it is null?

Contributor Author

obj is used to update the current state in memory:
_taskState.Update(obj);

If it is null, the checkpoint handler was not able to get any old state, for whatever reason, and the current state in memory stays unchanged.
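
The behavior described in this reply can be sketched as below. The types are hypothetical stand-ins (`SketchTaskState` is not the PR's `UpdateTaskState<int[], int[]>`), and the restore step is passed in as a delegate so the example stays self-contained. A null result leaves the in-memory state untouched.

```csharp
using System;

/// <summary>Hypothetical, simplified task state; not a REEF type.</summary>
public sealed class SketchTaskState
{
    public int[] Input { get; private set; }

    public SketchTaskState(int[] input)
    {
        Input = input;
    }

    /// <summary>Overwrites the in-memory state with a restored one.</summary>
    public void Update(SketchTaskState restored)
    {
        Input = restored.Input;
    }
}

public static class RestoreSketch
{
    /// <summary>
    /// Applies the restored state if there is one.
    /// Returns false and changes nothing when restore() yields null.
    /// </summary>
    public static bool TryRestore(SketchTaskState current, Func<SketchTaskState> restore)
    {
        var restored = restore();
        if (restored == null)
        {
            return false; // no old state available: keep what is in memory
        }
        current.Update(restored);
        return true;
    }
}
```

Returning a bool makes the "nothing was restored" case visible to the caller, which may be worth logging.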

/// <returns></returns>
public ITaskState Restore(ICodec<ITaskState> codec)
{
if (!string.IsNullOrEmpty(_checkpointFilePath))
Contributor

This should be validated in the constructor.

Contributor Author

It is not for validation but for backward compatibility. If the client doesn't set it, we will do nothing but return null.

Contributor Author

Yeah, it can be moved to constructor.
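
One way to reconcile the two concerns in this thread (constructor-time checks and backward compatibility for clients that never set a path) is to decide once, in the constructor, whether checkpointing is enabled. The sketch below is an assumption about how that could look, not the PR's code: `OptionalCheckpointSketch` is a hypothetical name, and it builds a `System.Uri` directly rather than going through the REEF file system.

```csharp
using System;

/// <summary>Hypothetical handler whose checkpoint path is optional.</summary>
public sealed class OptionalCheckpointSketch
{
    // Null when the client did not configure a checkpoint path.
    private readonly Uri _checkpointUri;

    public OptionalCheckpointSketch(string checkpointFilePath)
    {
        // The check and the URI creation both happen exactly once, here.
        _checkpointUri = string.IsNullOrEmpty(checkpointFilePath)
            ? null
            : new Uri(checkpointFilePath, UriKind.RelativeOrAbsolute);
    }

    /// <summary>True when a checkpoint path was configured.</summary>
    public bool IsEnabled
    {
        get { return _checkpointUri != null; }
    }

    /// <summary>
    /// Reads the state through the supplied reader, or returns null
    /// when checkpointing is not configured (the backward-compatible path).
    /// </summary>
    public string Restore(Func<Uri, string> read)
    {
        return IsEnabled ? read(_checkpointUri) : null;
    }
}
```

The methods still return null for unconfigured clients, but the string check is no longer repeated in each of them.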

{
if (!string.IsNullOrEmpty(_checkpointFilePath))
{
var files = _fileSystem.GetChildren(_fileSystem.CreateUriForPath(_checkpointFilePath));
Contributor

The URI should have been created in the constructor.

Contributor Author

Yes, it can be. Persist can be called many times, but Restore is called only once per recovery, so there is not much to optimize.

var localLatestFlagfile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N").Substring(0, 4));
var localLatestStatefile = Path.Combine(Path.GetTempPath() + Guid.NewGuid().ToString("N").Substring(0, 4));

_fileSystem.CopyToLocal(latestFlagFile, localLatestFlagfile);
Contributor

The copy to local should not be necessary. Can't you just .Open() the remote file?

Contributor Author

Yeah, we can read directly from remote. I just wanted to match the way the data is written, to ensure the state format is the same.
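
Reading straight from the opened stream, as suggested in this thread, might look like the sketch below. It assumes only that the file system can hand back a `System.IO.Stream` for the remote file. The `open` delegate and the `RemoteReadSketch` name are hypothetical, used here in place of the real file system call.

```csharp
using System;
using System.IO;

public static class RemoteReadSketch
{
    /// <summary>
    /// Reads all bytes from a stream supplied by the caller,
    /// without staging the data in a local temp file first.
    /// </summary>
    public static byte[] ReadAll(Func<Stream> open)
    {
        using (var stream = open())
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

The resulting bytes can be passed to the same codec that is used when writing, so the state format stays consistent without the temp files and their cleanup.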

/// <returns></returns>
public bool GetResult()
{
if (!string.IsNullOrEmpty(_checkpointFilePath) && _fileSystem.Exists(_resultFileUrl))
Contributor

The checks of readonly attributes should move to the constructor.

Contributor Author

It is not really for validation but for backward compatibility. We want to make sure the code still works if the client doesn't configure it.

}
catch (Exception e)
{
Exceptions.Throw(e, "Unable to deserialize checkpoint configuration", Logger);
Contributor

Remove the Exceptions use while you are at it :). Better yet, remove the useless catch here altogether.

Contributor Author

It was just to follow the existing pattern. Yes, the try/catch is not necessary.

@jwang98052
Contributor Author

@markusweimer I have addressed your review comments.

2 participants