Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: AMD ROCm support with plugin #1519

Closed
wants to merge 29 commits into from

Conversation

rajbhar
Copy link
Contributor

@rajbhar rajbhar commented Jun 18, 2021

Hello CRIU Community and Maintainers,

Can you please review this "RFC" pull request as we are still adding more features and making some changes to our API IOCTL design. Further to our earlier announcement https://www.spinics.net/lists/amd-gfx/msg62112.html , we made some more enhancements in terms of functionalities and based on some feedback that we received on the mailing list with regards to placement of the plugin and how it should be built etc.

This PR includes another important enhancement that brings in multi-GPU support. The checkpoint / restore time is improved now compared to what we had earlier when we made the announcement, as we added multithreading support and we are further optimizing it using the DMA engines, to copy VRAM contents to image files. This will be added as soon as its ready, probably on top of the feedback we receive here. There are some changes made to core CRIU itself and one possible recent regression was noticed during our testing so we seek your guidance as we are not sure if we understand that well enough.

This work was tested on multi GPU AMD systems with real pytorch workloads such as BERT(Transformers) and BERT Large. We also tested some Tensorflow workloads and found some probable limitations of protobuf that its unable to handle buffer objects more than 4GB. We are still investigating this more but would like to inquire here if any such issue was observed or reported before?

Link to the kernel changes: https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/commits/fxkamd/criu-wip

We again thank you for your time and appreciate your feedback.

rajbhar and others added 26 commits June 17, 2021 15:53
This recently introduced commit breaks amdgpu_plugin and perhaps more
such consumers e.g. the CR operation fails on a BERT pytorch workloads
with systems having more than one GPU. I am not sure about how to handle
systemcfg file but the current solution looks like a workaround so
revert it.

This reverts commit ea8921b.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Unsupported VMAs that provide special plugins for backup can be treated
as regular VMAs and criu should save their metadata in the dump files.
There can be several special backup plugins that might run at different
times during checkpoint and restore.

To checkpoint tasks that have opened AMD KFD/RenderDXXX device files
directly directly or via AMD ROCt library, we can use special plugins
such as kfd_plugin.so or similar that implement various EXT and other
new hooks to support checkpointing and restore.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
kfd_ioctl.h contains the definitions for the APIs and required arguments
to call the ioctls so simply copy the header as is for amdgpu plugin.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
This applies to Checkpoint-Restore operations for a device file. When
restoring VMAs for the device files, criu runs sys_mmap in the pie
restore context but the offsets and file path within a device file may
change during restore operation so it needs to be adjusted properly. A
future patch will add a consumer for this plugin.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
To support Checkpoint Restore with AMDGPUs for ROCm workloads, introduce
a new plugin to assist CRIU with the help of AMD KFD kernel driver. This
initial commit just provides the basic framework to build up further
capabilities. Like CRIU, the amdgpu plugin also uses protobuf to serialize
and save the amdkfd data which is mostly VRAM contents with some metadata.
We generate a data file "kfd.<id>.img" during the dump stage. On restore
this file is read and extracted to re-create various types of buffer
objects that belonged to the previously checkpointed process. Upon
restore the mmap page offset within a device file might change so we use
the new hook to update and adjust the mmap offsets for newly created
target process. This is needed for sys_mmap call in pie restorer phase.

With the current implementation (amdgpu_plugin), we support:
 - Only compute workloads such (Non Gfx) are supported
 - Single GPU
 - GPU visible inside a container
 - AMD GPU Gfx 9 Family
 - Pytorch Benchmarks such as BERT Base

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Userptr buffer objects are system memory that criu handles so no need to
allocate space for such buffer objects. We should allocate space only
for GTT and VRAM memory since that is what we should save in our kfd
image file. The contents for userptr buffer objects will be drained by
criu via vmsplice since those are paged system memory and saved as
regular pages-*.img files.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
PROCPIDMEM interface has memory bandwidth bottleneck during checkpoint
so it should be used only when direct mmap/memcpy is not possible.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
During criu restore phase when a device is getting restored with the
help of a plugin, some device specific operations need to be delayed
until criu finalizes the VMA placements in address space of the target
process. But by the time criu finalizes this, its too late since pie
phase is over and control is back to criu master process.

This change allows an external trigger to each resuming task to check
whether it has a device specific operation pending such as issuing an
ioctl call. Since this is called from criu master process context,
supply the pid of the target process and give a chance to each plugin
registered to run device specific operation if the target pid is valid.

A consumer for this will be added in a seperate patch.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
For restarting MMU notifiers and to kick-start the evicted queues that
were created during a previuos criu restore plugin operation, we need an
external trigger that runs outside the target process context i.e. from
within master restorer context for each target (restored) pid. This
external trigger is required because we need to wait till criu pie
finalizes the VMAs relocation within the target process's address space.

When kfd recieves this ioctl, it starts the userptr worker and resumes
the MMU notifiers.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
This adds the capabilities to Checkpoint and Restore user mode queues
that are associated with a typical ROCm process such as memory queue
descriptors, control stacks and compute unit masks etc.

Also, serialize queue information returned during dump ioctl and store
them into image file.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Plugin should log extra debug data only with -DDEBUG.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Use fopen/fwrite/fread to be able to handle bigger files.

Try to open /dev/kfd earlier during restore, so we can fail earlier if
amdgpu driver is not loaded.

Memcopy data directly into bo_info_test structure during dump to reduce
number of memcpy.

Moved some variable declarations so that the variables are only valid
during the scope where they are used.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Add support for dumping and restoring events during CRIU checkpoint and
restore. Events are used to send notifications to applications when
certain tasks are completed or exceptions have occured.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
 - placeholder and initial documentation.
 - keep with CRIU main documentation

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
The doorbell offset depends on the gpu_id. When restoring on a different
node, the gpu_id may be different, so we need to re-adjust the doorbell
offsets.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Add a Dockerfile to setup CRIU with its dependencies in an official ROCm
pytorch container. Support for plugin will be added in a future patch.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
This sets up the pytorch environment for BERT Transformers and also sets
up CRIU along with all its dependencies including amdgpu plugin for
supporting CR with AMDGPUs.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Restore operation fails when we perform CR operation of multiple
independent kfd proceses because criu caches the ids for the device
files with same mnt_ids, inode pair. This might not be the optimal
solution but a unique id is required for each kfd process for a
successful restore. Decoded fdinfo files also confirm a duplicate id for
amdgpu devices and thus cache lookup during checkpoint prevents the
actual metadata for the device VMA write in the image files so on a
subsequent restore the desired id is not found and hence restore fails
even though the actual image file e.g. renderDXXX.img exits.

This can be tested with any simple kfd application that uses AMD ROCt
library and creates a VRAM buffer object. CR on such individual
processes works fine but when we invoke more than one process from a
shell script and perform CR on this shell script, the restore operation
fails without this change.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
 - Make ROCm amdgpu plugin a part of standard criu build, install and
uninstall process.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Parse local system topology in /sys/class/kfd/kfd/topology/nodes/ and
store properties for each gpu in the CRIU image files. The gpu
properties can then be used later during restore to make the process is
restored on gpu's with similar properties.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
The device topology on the restore node can be different from the
topology on the checkpointed node. The GPUs on the restore node may
have different gpu_ids, minor number. or some GPUs may have different
properties as checkpointed node. During restore, the CRIU plugin
determines the target GPUs to avoid restore failures caused by trying
to restore a process on a gpu that is different.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Add optional parameters to override default behavior during restore.
These parameters are passed in as environment variables before executing
CRIU.

List of parameters:

KFD_FW_VER_CHECK - disable firmware version check
KFD_SDMA_FW_VER_CHECK - disable SDMA firmware version check
KFD_CACHES_COUNT_CHECK - disable caches count check
KFD_NUM_GWS_CHECK - disable num_gws check
KFD_VRAM_SIZE_CHECK - disable VRAM size check
KFD_NUMA_CHECK - preserve NUMA regions

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Adding unit tests for GPU remapping code when checkpointing and
restoring on different nodes with different topologies.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Implement multi-threaded code to read and write contents of each GPU VRAM
BOs in parallel in order to speed up dumping process when using multiple
GPUs.

Moving opening and closing of file descriptors outside of loop so
that they are only performed once per GPU.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Libhsakmt(thunk) uses a shared memory file in /dev/shm/hsakmt_shared_mem
and its semaphore in /dev/shm/hsakmt_shared_mem. Adding a check during
checkpoint to see if these two files exist. If they exist then the
plugin will try to restore them during restore.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
@adrianreber
Copy link
Member

I think, from my point of view, it would make more sense to have multiple pull request. Start with a smaller pull request setting up the CRIU build-system infrastructure for plugins. Provide a minimal dummy plugin as an example.

Also, as you mentioned that the plugin requires additional dependencies, the plugin should not be built automatically just by typing make. The plugin should be built only when explicitly requested.

Existing CI runs should of course be successful.

@checkpoint-restore/maintainers and other thoughts? Everyone happy with the location of the plugins? I suggested plugins in an email thread on the CRIU mailing list two months ago. Just highlighting this here in case someone from @checkpoint-restore/maintainers did not see it.

@rst0git
Copy link
Member

rst0git commented Jun 24, 2021

It fails to compile on fedora 34:

$ make amdgpu_plugin
protoc-c --proto_path=. --c_out=. criu-amdgpu.proto
gcc -g -Wall -Werror -D _GNU_SOURCE -shared -nostartfiles -fPIC amdgpu_plugin.c amdgpu_plugin_topology.c criu-amdgpu.pb-c.c -o amdgpu_plugin.so -iquote../../../criu/include -iquote ../../criu/include -lpthread -lrt
amdgpu_plugin_topology.c: In function ‘parse_topo_node_mem_banks’:
amdgpu_plugin_topology.c:466:40: error: ‘%s’ directive writing up to 255 bytes into a region of size between 0 and 299 [-Werror=format-overflow=]
  466 |                 sprintf(bank_path, "%s/%s", path, dirent_node->d_name);
      |                                        ^~
amdgpu_plugin_topology.c:466:17: note: ‘sprintf’ output between 2 and 556 bytes into a destination of size 300
  466 |                 sprintf(bank_path, "%s/%s", path, dirent_node->d_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
amdgpu_plugin_topology.c:475:53: error: ‘/properties’ directive writing 11 bytes into a region of size between 1 and 300 [-Werror=format-overflow=]
  475 |                         sprintf(properties_path, "%s/properties", bank_path);
      |                                                     ^~~~~~~~~~~
amdgpu_plugin_topology.c:475:25: note: ‘sprintf’ output between 12 and 311 bytes into a destination of size 300
  475 |                         sprintf(properties_path, "%s/properties", bank_path);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
amdgpu_plugin_topology.c: In function ‘parse_topo_node_iolinks’:
amdgpu_plugin_topology.c:553:42: error: ‘%s’ directive writing up to 255 bytes into a region of size between 0 and 299 [-Werror=format-overflow=]
  553 |                 sprintf(iolink_path, "%s/%s", path, dirent_node->d_name);
      |                                          ^~
amdgpu_plugin_topology.c:553:17: note: ‘sprintf’ output between 2 and 556 bytes into a destination of size 300
  553 |                 sprintf(iolink_path, "%s/%s", path, dirent_node->d_name);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
amdgpu_plugin_topology.c:562:53: error: ‘/properties’ directive writing 11 bytes into a region of size between 1 and 300 [-Werror=format-overflow=]
  562 |                         sprintf(properties_path, "%s/properties", iolink_path);
      |                                                     ^~~~~~~~~~~
amdgpu_plugin_topology.c:562:25: note: ‘sprintf’ output between 12 and 311 bytes into a destination of size 300
  562 |                         sprintf(properties_path, "%s/properties", iolink_path);
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[1]: *** [Makefile:20: amdgpu_plugin.so] Error 1
make: *** [Makefile:334: amdgpu_plugin] Error 2

On Ubuntu20.04 and Fedora34 some warnings are treated as errors. This
fixes those format overflow and truncate warnings.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
To build amdgpu_plugin, we need the include headers for libdrm, add this
to alpine, openjdk, focal, ppc, mips etc. containers.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
@rajbhar
Copy link
Contributor Author

rajbhar commented Jun 25, 2021

I have force pushed the branch and added two commits for testing purpose. There are still 2-3 more fixes required in the plugin to fix lint and centos errors.

@rajbhar
Copy link
Contributor Author

rajbhar commented Jun 25, 2021

@rst0git could you please advise if this ( f81a453 ) approach to fix some container build dependencies is OK ?

@@ -452,7 +452,7 @@ static int parse_topo_node_mem_banks(struct tp_node *node, const char *dir_path)

while ((dirent_node = readdir(d_node)) != NULL) {
char line[300];
char bank_path[300];
char bank_path[1024];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please apply these changes in the "criu/plugin: Support AMD ROCm Checkpoint Restore with KFD" commit, which introduced amdgpu_plugin_topology.c?
This could be done, for example, with git rebase -i 311aee4ff^, then change pick to edit on the first line and save.

edit 311aee4ff criu/plugin: Support AMD ROCm Checkpoint Restore with KFD

After the changes have been applied you can use git commit -a --ammend and git rebase --continue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please advise if this ( f81a453 ) approach to fix some container build dependencies is OK ?

The build dependencies for amdgpu_plugin should be optional and CRIU should work even if libdrm-dev is not installed. As far as I know, the current CI doesn't have AMD GPU available to run tests for the plugin, but it would be good to have tests for it.

I am not sure If I understood you here. I made the change to the CI docker scripts but the tests were not triggered. I am not sure if maintainers need to manually trigger those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently CRIU would fail to compile if libdrm is not installed. Instead, when libdrm is not installed it should skip building amdgpu_plugin

I made the change to the CI docker scripts but the tests were not triggered.

The CI tests are triggered on git push, for example, you can see in the Alpine test logs that libdrm-dev has been installed:

Step 3/11 : RUN apk update && apk add 	$CC 	bash 	build-base 	coreutils 	git 	gnutls-dev 	libaio-dev 	libcap-dev 	libnet-dev 	libnl3-dev 	nftables 	nftables-dev 	pkgconfig 	protobuf-c-dev 	protobuf-dev 	py3-pip 	py3-protobuf 	python3 	sudo 	libdrm-dev
...
(59/113) Installing libdrm (2.4.106-r0)
(60/113) Installing libpciaccess-dev (0.16-r0)
(61/113) Installing libdrm-dev (2.4.106-r0)
...

@@ -13,6 +13,7 @@ endif
FOOTER := footer.txt
SRC1 += crit.txt
SRC1 += compel.txt
SRC1 += amdgpu_plugin.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name amdgpu_plugin (i.e., man amdgpu_plugin) is global.
What do you think about using something like criu-amdgpu-plugin instead?

#RUN mkdir -p /home/criu_build_dir
WORKDIR /root/criu_build_dir
#RUN cd /home/criu_build_dir
RUN git clone https://github.com/checkpoint-restore/criu.git
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Dockerfile like this one (and the one in commit "criu/plugin: Dockerfile for AMD criu repo") would be more useful as a test in scripts/build/, but if it is part of the repository it could use the following instead of git clone.

COPY . /criu
WORKDIR /criu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -0,0 +1,275 @@
Supporting ROCm with CRIU
=========================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be added with commit "criu/plugin: Support AMD ROCm Checkpoint Restore with KFD".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@@ -147,7 +147,7 @@ HOSTCFLAGS += $(WARNINGS) $(DEFINES) -iquote include/
export AFLAGS CFLAGS USERCLFAGS HOSTCFLAGS

# Default target
all: flog criu lib crit
all: flog criu lib crit amdgpu_plugin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Adrian mentioned in his comment, the plugin should not be built automatically by typing make. This is because it requires additional dependencies that are optional for CRIU.

To enable the build we could use something like AMDGPU_PLUGIN=1 make instead?

Copy link
Contributor Author

@rajbhar rajbhar Jul 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, I've added nmk based dependencies check on libdrm in the Makefile

@rst0git
Copy link
Member

rst0git commented Jun 27, 2021

could you please advise if this ( f81a453 ) approach to fix some container build dependencies is OK ?

The build dependencies for amdgpu_plugin should be optional and CRIU should work even if libdrm-dev is not installed. As far as I know, the current CI doesn't have AMD GPU available to run tests for the plugin, but it would be good to have tests for it.

@Snorch
Copy link
Member

Snorch commented Jun 30, 2021

Can we please have plain (linear without merge commits) commit tree for this pull request?

Hard to review this kind of thing:

Output of ` git log --graph --oneline`
*   4f864a19f (HEAD -> rajbhar-amd-rocm-support) Merge branch 'criu-dev' into criu-dev
|\  
| * 98186981d zdtm: add network namespace locking test
| * a226a1b92 test: remove exec test
| * efe927fff criu: add unit testing for config file parser
| * c98b1d9e5 test: add tests for configuration file parsing
| * c627d0c83 config: make configuration file parser more robust
| * 062200697 criu: add cleanup_free attribute
| * 178c90240 bfd: loop through read()/write() when the action is incomplete
| * d46e20e9d ci: disable some tests on CentOS 7
| * 46bb2d7e5 ci: remove old workarounds
| * 3518b4544 ci: switch CentOS 7 test to Cirrus CI
| * 2a89750da ci: disable -x during print_env()
| * f10b887e6 zdtm: allow ignore taint via environment variable
| * 89787bfb6 scripts/vagrant: Use vagrant 2.2.16
| * 7bd807fab scripts/vagrant: Use Fedora 34
| * 5e81dcadf add PKG_CONFIG default in a few more places
| * 7cfe7b4ce crtools: improve error handling on signal setting
| * d89f7dabf build: respect $PKG_CONFIG settings
| * 185036b94 docker-test: use latest containerd release
* | f81a45309 script/builds: add build dependepncy for libdrm
* | a5df3add4 criu/plugin: fix build warnings
* | d83894294 criu/plugin: Restore libhsakmt shared memory files
* | ee928e112 criu/plugin: Read and write BO contents in parallel
* | c87fdf511 criu/plugin: Add unit tests for GPU remapping
* | 98eddc948 criu/plugin: Add parameters to override mapping
* | 16778cc30 criu/plugin: Remap GPUs on checkpoint restore
* | 84135f4ed criu/plugin: Implement system topology parsing
* | 274aabd9e criu/plugin: Add build options for amdgpu plugin
* | d83ddd5b5 criu/plugin: Add whitepaper document
* | 8268b61f8 criu/files: *RFC* Don't cache fd for amdgpu devices
* | 95c9258d3 criu/plugin: Dockerfile for AMD criu repo
* | 1199c20d8 criu/plugin: Pytorch container with criu
* | e4819aa49 criu/plugin: Re-adjust doorbell offset for queues
* | 9ff89731e criu/plugin: Add initial documentation for ROCm support.
* | 0c3230492 criu/plugin: Dump and restore events
* | 24a67616f criu/plugin: Support larger memory footprints
* | c08db4ace criu/plugin: dump debug logs selectively
* | 0a6771fa3 criu/plugin: Add support for dumping and restoring queues
* | ffeb86bf3 criu/plugin: Implement restore late hook for kfd
* | ca99c1504 criu/restore: Introduce restore late stage hook
* | f55fe4894 criu/plugin: optimization for large bar read
* | 4c1b8f30b criu/plugin: Optimize the proto image size
* | 311aee4ff criu/plugin: Support AMD ROCm Checkpoint Restore with KFD
* | 4028ddced criu/files-reg: Add offset and file path plugin
* | 4028ddced criu/files-reg: Add offset and file path plugin
* | 71463130c criu/plugin: Initialize AMD KFD header
* | fbfa5566a criu/parse: Treat some unsupported VMAs as regular
* | 51efa5cdd Revert "Allow systemcfg proc file to be dumped"
|/  
* 7686b939d zdtm/tun_ns: add per-test dependencies

Copy link
Member

@Snorch Snorch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a big work, nice! But I would agree with @adrianreber that spliting it into separate pr's would make review easier.

@@ -593,14 +593,49 @@ static int handle_vma(pid_t pid, struct vma_area *vma_area,
}
} else if (*vm_file_fd >= 0) {
struct stat *st_buf = vma_area->vmst;
struct stat st_kfd, st_dri_min;
char img_path[128];
int ret;

if (S_ISREG(st_buf->st_mode))
/* regular file mapping -- supported */;
else if (S_ISCHR(st_buf->st_mode) && (st_buf->st_rdev == DEVZERO))
/* devzero mapping -- also makes sense */;
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we please fix the codding style, while on it, as if/else/else-if branches can't have mixed braces/no-braces formating. In this case we should have brackets everywhere. https://www.kernel.org/doc/html/v4.10/process/coding-style.html#placing-braces-and-spaces

  2. Can we please put the code from the below branch to new bool function (for instance let's name it handle_plugin_vma), not to overgrow the size of handle_vma(). Probably something like this would be ok:

else if (handle_plugin_vma(...)) {
        /* mapping would be handled by external plugin */;
} else {
        pr_err("Can't handle non-regular mapping on %d's map %"PRIx64"\n", pid, vma_area->e->start);
        goto err;
}
  1. We should probably check plugin availability here, if there is no proper plugin we should not just skip this mapping - we should fail.

Or even: you can detect that mapping is plugin-dumpable via calling plugin itself via another new hook, that would be a nice symmetric design.

goto err;
ret = stat("/dev/kfd", &st_kfd);
if (ret == -1) {
pr_perror("fstat error for /dev/kfd\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pr_perror() calls everywhere please remove excess newline specifiers '\n', pr_perror() adds '\n' by itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@rajbhar
Copy link
Contributor Author

rajbhar commented Jul 8, 2021

@Snorch @rst0git Thanks for your review comments. I will rebase and re-upload the branch but I am not sure how to address builds errors for some of the automation tests e.g. Openj9 failure. I was hoping that the dependency on drm.h will be addressed when the container is built with the libdrm dependency and hence I added that in the Dockerfile. I understand that I'll still need to decouple building the plugin from main criu build but could you please explain what am i missing for the build failures other than clang and lint errors, specially for the cross compilation and other architectures?

@rst0git
Copy link
Member

rst0git commented Jul 10, 2021

@rajbhar I can replicate the Alpine build error locally as follows:

docker build  -t criu-alpine -f scripts/build/Dockerfile.alpine .
kfd_ioctl.h:26:10: fatal error: drm/drm.h: No such file or directory
   26 | #include <drm/drm.h>
      |          ^~~~~~~~~~~
compilation terminated.

It can be fixed with:

--- a/plugins/amdgpu/kfd_ioctl.h
+++ b/plugins/amdgpu/kfd_ioctl.h
@@ -23,7 +23,7 @@
 #ifndef KFD_IOCTL_H_INCLUDED
 #define KFD_IOCTL_H_INCLUDED
 
-#include <drm/drm.h>
+#include <libdrm/drm.h>
 #include <linux/ioctl.h>
 
 /*

@rajbhar
Copy link
Contributor Author

rajbhar commented Jul 12, 2021

@rajbhar I can replicate the Alpine build error locally as follows:

docker build  -t criu-alpine -f scripts/build/Dockerfile.alpine .
kfd_ioctl.h:26:10: fatal error: drm/drm.h: No such file or directory
   26 | #include <drm/drm.h>
      |          ^~~~~~~~~~~
compilation terminated.

It can be fixed with:

--- a/plugins/amdgpu/kfd_ioctl.h
+++ b/plugins/amdgpu/kfd_ioctl.h
@@ -23,7 +23,7 @@
 #ifndef KFD_IOCTL_H_INCLUDED
 #define KFD_IOCTL_H_INCLUDED
 
-#include <drm/drm.h>
+#include <libdrm/drm.h>
 #include <linux/ioctl.h>
 
 /*

Thanks. On Alpine, I'll also need to include limits.h header in addition to this. The two drm.h headers are slightly different but it probably won't matter for our use case. Ideally, the kernel src package should have brought in the drm/drm.h like the way kernel expects it and other distros too have it but alpine seems to differ to I have copied the libdrm/* using the Dockerfile for alpine. We can't change the include in this header as it can't be accepted upstream.

@rst0git
Copy link
Member

rst0git commented Jul 14, 2021

Ideally, the kernel src package should have brought in the drm/drm.h like the way kernel expects it and other distros too have it

The linux-lts-dev package includes drm.h, but usually the kernel package is not installed inside containers.

This was referenced Jul 15, 2021
@rajbhar
Copy link
Contributor Author

rajbhar commented Jul 16, 2021

@adrianreber , @Snorch @rst0git - I really appreciate your feedback on this PR. I have tried to address your suggestions in two new PRs #1556 and #1557 so closing this PR. Thanks again for your time and valuable feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants