Genode real-time capable checkpoint/restore mechanism

General workflow in migration

Checkpoint component A
Serialize data
Transfer to new node
Deserialize data
Restore state of component A and restart it
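
The serialize/transfer/deserialize steps above can be sketched as a byte-level round trip. This is a minimal illustration, not the format used by Rtcr: the `Snapshot` struct, its fields, and the length-prefixed layout are assumptions made for the example, and the transfer step is modeled as simply handing over the byte buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical snapshot of a component; the fields are illustrative only.
struct Snapshot {
    std::string               component_name;
    std::vector<std::uint8_t> memory;
};

// Append a 32-bit value in little-endian byte order.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Read a 32-bit little-endian value; at() throws if the buffer is too short.
static std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in.at(pos++)) << (8 * i);
    return v;
}

// Layout: [name length][name bytes][memory length][memory bytes]
std::vector<std::uint8_t> serialize(const Snapshot& s) {
    std::vector<std::uint8_t> out;
    put_u32(out, static_cast<std::uint32_t>(s.component_name.size()));
    out.insert(out.end(), s.component_name.begin(), s.component_name.end());
    put_u32(out, static_cast<std::uint32_t>(s.memory.size()));
    out.insert(out.end(), s.memory.begin(), s.memory.end());
    return out;
}

Snapshot deserialize(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    Snapshot s;

    std::uint32_t name_len = get_u32(in, pos);
    if (name_len > in.size() - pos) throw std::out_of_range("truncated name");
    auto first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.component_name.assign(first, first + name_len);
    pos += name_len;

    std::uint32_t mem_len = get_u32(in, pos);
    if (mem_len > in.size() - pos) throw std::out_of_range("truncated memory");
    first = in.begin() + static_cast<std::ptrdiff_t>(pos);
    s.memory.assign(first, first + mem_len);
    return s;
}
```

On the new node, `deserialize` yields a `Snapshot` equal to the one passed to `serialize`, which is the precondition for restoring and restarting the component.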

Accessing target component's resources

Parent/child approach
Target component = child component
The parent provides custom services to the child, i.e., it intercepts the services the child uses
At child creation: PD, CPU, and RAM sessions
At child runtime: all other sessions, such as RM, LOG, and Timer sessions
Custom services use the real services in the background
Parent stores information about the state of each session
Creation arguments
Update arguments
Method invocations
The parent restores the inner state of the used sessions from this information
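
The interception idea can be illustrated with a small wrapper that forwards every call to the real service and records what it saw. This is a sketch in plain C++ rather than Genode code: `Log_session`, `Real_log`, `Intercepting_log`, `Session_info`, and the argument strings are hypothetical names chosen for the example and do not reflect the actual Rtcr or Genode interfaces.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical session interface, standing in for a service such as LOG.
struct Log_session {
    virtual ~Log_session() = default;
    virtual void write(const std::string& msg) = 0;
};

// Stand-in for the real service that does the actual work.
struct Real_log : Log_session {
    std::vector<std::string> lines;
    void write(const std::string& msg) override { lines.push_back(msg); }
};

// Bookkeeping the parent keeps for one intercepted session.
struct Session_info {
    std::string              creation_args;  // arguments used to open the session
    std::string              update_args;    // most recent update arguments, if any
    std::vector<std::string> invocations;    // log of method invocations
};

// Custom session handed to the child: records the call, then forwards it.
class Intercepting_log : public Log_session {
    Log_session& _real;
    Session_info _info;

public:
    Intercepting_log(Log_session& real, std::string creation_args)
    : _real(real) { _info.creation_args = std::move(creation_args); }

    // Remember the latest update arguments (assumed to replace older ones).
    void update(const std::string& args) { _info.update_args = args; }

    void write(const std::string& msg) override {
        _info.invocations.push_back("write:" + msg);
        _real.write(msg);  // the real service runs in the background
    }

    const Session_info& info() const { return _info; }
};
```

Because the child only ever sees the wrapper, the parent holds everything needed to recreate an equivalent session later: the creation arguments to reopen it, the update arguments to adjust it, and the invocation log to replay.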

Checkpoint/Restore

Component's name: Rtcr (Real-time checkpointer/restorer)
Checkpoint in userland
Service method: checkpoint() => it uses Checkpointer::checkpoint
Pause target during checkpoint
Read information about the capability space and map
Store intercepted session information (not dataspace content) in the parent's address space
Store dataspace content in the parent's address space
Resume target after checkpoint
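
The checkpoint sequence above (pause, copy, resume) can be sketched with a scope guard so that the target is resumed even if copying fails. The `Target`, `Checkpoint`, and `Pause_guard` types are hypothetical stand-ins; the real `Checkpointer::checkpoint` operates on Genode sessions and is not reproduced here.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the state the parent can see of its child.
struct Target {
    bool paused = false;
    std::map<unsigned, std::string>                  cap_map;        // slot -> description
    std::map<std::string, std::string>               session_state;  // session -> recorded info
    std::map<std::string, std::vector<std::uint8_t>> dataspaces;     // name -> content
};

// Copy of the target state, kept in the parent's address space.
struct Checkpoint {
    bool taken_while_paused = false;
    std::map<unsigned, std::string>                  cap_map;
    std::map<std::string, std::string>               session_state;
    std::map<std::string, std::vector<std::uint8_t>> dataspaces;
};

// Pauses the target on construction and resumes it on destruction.
class Pause_guard {
    Target& _target;

public:
    explicit Pause_guard(Target& t) : _target(t) { _target.paused = true; }
    ~Pause_guard() { _target.paused = false; }
    Pause_guard(const Pause_guard&) = delete;
    Pause_guard& operator=(const Pause_guard&) = delete;
};

Checkpoint checkpoint(Target& target) {
    Pause_guard guard(target);                     // 1. pause target
    Checkpoint c;
    c.taken_while_paused = target.paused;
    c.cap_map       = target.cap_map;              // 2. capability information
    c.session_state = target.session_state;        // 3. session information
    c.dataspaces    = target.dataspaces;           // 4. dataspace content
    return c;                                      // 5. guard resumes target
}
```

Copying while the target is paused keeps the three parts of the checkpoint consistent with each other; later changes in the target do not affect the stored copy.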
Restore in userland
Service method: restore() => it uses Restorer::restore
Recreate empty child without sessions
Recreate sessions and their RPC objects
Restore state of sessions and their RPC objects
Restore capability space and map with new capabilities
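
Since the recreated sessions and RPC objects come with new capabilities, the stored capability entries have to be rewritten before they are handed back to the child. One way to picture this, under the assumption that the restorer builds an old-to-new translation table while recreating the objects, is the sketch below. `Cap_id`, `Cap_entry`, and `restore_cap_map` are hypothetical names, and the flat integer identifiers are a simplification of real capabilities.

```cpp
#include <map>
#include <stdexcept>
#include <vector>

using Cap_id = unsigned;  // simplified stand-in for a capability

// One checkpointed entry: which slot of the child held which capability.
struct Cap_entry {
    unsigned slot;
    Cap_id   cap;
};

// Rewrite stored entries so that every slot refers to the capability of the
// recreated object. Fails loudly if an object was not recreated.
std::vector<Cap_entry> restore_cap_map(const std::vector<Cap_entry>& stored,
                                       const std::map<Cap_id, Cap_id>& old_to_new) {
    std::vector<Cap_entry> restored;
    restored.reserve(stored.size());
    for (const Cap_entry& entry : stored) {
        auto it = old_to_new.find(entry.cap);
        if (it == old_to_new.end())
            throw std::runtime_error("no recreated object for stored capability");
        restored.push_back({entry.slot, it->second});
    }
    return restored;
}
```

The slots stay the same so that references held by the child remain valid, while the capabilities behind them point to the newly created objects.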
Incremental checkpointing as optimization
Approach
At checkpoint time, store only the changes since the last checkpoint
Mark/trace "dirty pages" by using page-fault exceptions
Parent provides a custom RAM session to the child which allocates managed dataspaces (=region maps) instead of usual dataspaces
The managed dataspace is filled with usual dataspaces (called designated dataspaces)
Each designated dataspace occupies its own exclusive area in the managed dataspace, so together they cover the managed dataspace completely
To detect accesses, all designated dataspaces are first detached from the managed dataspace
When a region in the managed dataspace is accessed, a page fault is triggered
The page fault is resolved by a thread which attaches the corresponding designated dataspace to the faulting region
Now the target component can use (read, write, execute) the region in the managed dataspace without disruption
When a checkpoint is performed, this designated dataspace is stored in the parent's address space and detached from the managed dataspace
Now the managed dataspace is ready to mark/trace accessed regions again
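
The marking cycle described above can be simulated without any kernel support. In the following sketch, a detached designated dataspace is modeled as a cleared flag, the page-fault handler thread as the private `_touch` method, and the parent's address space as a `std::map`. The class `Managed_dataspace` and its methods are hypothetical and only mirror the terminology of this document.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

class Managed_dataspace {
    std::size_t               _chunk;     // size of one designated dataspace
    std::vector<std::uint8_t> _memory;    // content of all designated dataspaces
    std::vector<bool>         _attached;  // one flag per designated dataspace
    std::size_t               _faults = 0;

    static std::size_t _count(std::size_t size, std::size_t chunk) {
        if (chunk == 0) throw std::invalid_argument("chunk size must be > 0");
        return (size + chunk - 1) / chunk;
    }

    // Models the fault handler: an access to a detached region "faults" once
    // and attaches the corresponding designated dataspace.
    void _touch(std::size_t addr) {
        if (addr >= _memory.size()) throw std::out_of_range("address outside dataspace");
        std::size_t idx = addr / _chunk;
        if (!_attached[idx]) { ++_faults; _attached[idx] = true; }
    }

public:
    Managed_dataspace(std::size_t size, std::size_t chunk)
    : _chunk(chunk), _memory(size, 0), _attached(_count(size, chunk), false) {}

    void write(std::size_t addr, std::uint8_t v) { _touch(addr); _memory[addr] = v; }
    std::uint8_t read(std::size_t addr) { _touch(addr); return _memory[addr]; }
    std::size_t faults() const { return _faults; }

    // Copy every attached designated dataspace into `store` (keyed by index),
    // detach it again, and return how many were copied.
    std::size_t checkpoint(std::map<std::size_t, std::vector<std::uint8_t>>& store) {
        std::size_t copied = 0;
        for (std::size_t i = 0; i < _attached.size(); ++i) {
            if (!_attached[i]) continue;
            std::size_t begin = i * _chunk;
            std::size_t end   = std::min(begin + _chunk, _memory.size());
            store[i].assign(_memory.begin() + static_cast<std::ptrdiff_t>(begin),
                            _memory.begin() + static_cast<std::ptrdiff_t>(end));
            _attached[i] = false;
            ++copied;
        }
        return copied;
    }
};
```

Note that reads mark a region as well as writes, matching the description above in which any access to a detached region faults. A second checkpoint without intervening accesses copies nothing, which is the saving that incremental checkpointing aims for.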
Tweaks
The granularity of the marking mechanism can be modified by changing the size of designated dataspaces
Increasing the designated dataspace size
Decreases the chance that a page fault occurs, which lowers the overhead while the target component is running (runtime overhead)
Increases the duration of the checkpoint, because each dataspace is larger and takes longer to copy (checkpoint overhead)
Decreasing the designated dataspace size
Increases the chance that a page fault occurs, which raises the runtime overhead
Decreases the duration of the checkpoint, because each dataspace is smaller
A balance between runtime and checkpoint overhead has to be found with regard to the locality of the target's memory usage
Accessing adjacent memory regions benefits from large designated dataspaces
Accessing scattered memory regions benefits from small designated dataspaces
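
The trade-off can be made concrete with a rough cost model: for a given list of accessed addresses and a designated dataspace size, count how many distinct designated dataspaces are touched (one page fault each) and how many bytes the next checkpoint would copy. The function `estimate` and the struct `Cost` are hypothetical helpers for this illustration; they ignore the actual cost of a fault or a copy and assume each touched dataspace is copied in full.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <vector>

struct Cost {
    std::size_t faults;        // page faults between two checkpoints
    std::size_t bytes_copied;  // bytes copied at the next checkpoint
};

// Estimate both overheads for one checkpoint interval.
Cost estimate(const std::vector<std::size_t>& accessed_addresses, std::size_t chunk) {
    if (chunk == 0) throw std::invalid_argument("chunk size must be > 0");
    std::set<std::size_t> touched;  // indices of touched designated dataspaces
    for (std::size_t addr : accessed_addresses)
        touched.insert(addr / chunk);
    return {touched.size(), touched.size() * chunk};
}
```

For accesses that sweep a contiguous 64 KiB range, a 64 KiB designated dataspace needs a single fault where 4 KiB dataspaces need sixteen, while the copied volume is the same. For four accesses that are 64 KiB apart, both sizes cause four faults, but the 64 KiB variant copies sixteen times as much data.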

argos-research/genode-CheckpointRestore-SharedMemory
