Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve invalidation mechanism and harden security #60

Merged
merged 19 commits into from
Jul 30, 2019

Conversation

pakmarkthub
Copy link
Collaborator

@pakmarkthub pakmarkthub commented Jun 5, 2019

Problems:

This change:

  • fixes the bug that causes unmap_mapping_range to do nothing. We change the address passed to that API to vm_pgoff instead of vm_start.
  • prevents copy of vma when fork. Hence, the child processes cannot access the parent's mappings anymore. This behavior aligns with CUDA since CUDA contexts are not transferred to the child processes.
  • checks "struct pid *" to ensure that the caller is the same as the process that opens gdrdrv (calls to gdr_open). Hence, the use of opened gdrdrv fd is local to the creator process.
  • separates the mapping address space between each process. Two processes map to different GPU memory even with the same mmap offset.
  • makes the current gdr_mh_t.h sit next to the previous mapped range. It also guarantees that no two ranges will be overlapped due to the offsets.

Presubmit testings:

@pakmarkthub
Copy link
Collaborator Author

@Akshay-Venkatesh Can I ask you to review this PR as well? I would like to make sure that it does not break UCX.

@pakmarkthub
Copy link
Collaborator Author

@drossetti As discussed offline, unmap_mapping_range unmaps all mappings based on the specified file offset and range (see https://elixir.bootlin.com/linux/v4.15.18/source/mm/memory.c#L2811). If we have two mappings that have overlapped file-offset range, unmap_mapping_range will remove both of them.

An example undesirable situation:

  • Do cuMemAlloc(&d_A, ...) and cuMemAlloc(&d_B, ...).
  • Do gdr_pin_buffer for both d_A and d_B. The handle numbers returned from gdrdrv are very close. This situation rarely occurs but random_get_entropy() cannot guarantee that it will never occur.
  • Do gdr_map(..., &bar_ptr_A, ...) on d_A, and gdr_map(..., &bar_ptr_B, ...) on d_B.
  • Call cuMemFree(d_A).
  • This API triggers call back from nvidia_p2p_get_pages to destroy the bar_ptr_A mapping.
  • gdrdrv calls unmap_mapping_range with the offset based on handle_A.
  • unmap_mapping_range removes bar_ptr_B mapping as well because the offset based on handle_B is between handle_A and handle_A + size_A, which is a bug.

gdrdrv/gdrdrv.c Show resolved Hide resolved
gdrdrv/gdrdrv.c Outdated
mutex_lock(&info->lock);
list_add(&mr->node, &info->mr_list);

// The user treats handle as the offset value for mmap. To guarantee no
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like:
// the user-space library passes the memory handle as the mmap offset,
// and offsets are used to determine the VMAs to delete during invalidation.
// Hence, we need [handle,handle+size-1] to correspond to a unique VMA.
// Note that size here must match the original mmap size

gdrdrv/gdrdrv.c Outdated
@@ -811,7 +856,7 @@ static int gdrdrv_mmap(struct file *filp, struct vm_area_struct *vma)
// this also works as the mapped flag for this mr
mr->cpu_mapping_type = GDR_MR_CACHING;
vma->vm_ops = &gdrdrv_vm_ops;
gdr_dbg("overwriting vma->vm_private_data=%px with mr=%px\n", vma->vm_private_data, mr);
gdr_dbg("overwriting vma->vm_private_data=0x%p with mr=0x%p\n", vma->vm_private_data, mr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that %px is not the same as 0x%p, check the kernel doc

@@ -0,0 +1,1138 @@
/*
* Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2019

read_fd = filedes_1[0];
write_fd = filedes_0[1];

srand(rand());
Copy link
Member

@drossetti drossetti Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is srand() needed?
Also, I'm not sure srand(rand()) makes sense.

test_invalidation.cpp Outdated Show resolved Hide resolved
gdrdrv/gdrdrv.c Outdated Show resolved Hide resolved
gdrdrv/gdrdrv.c Show resolved Hide resolved
@pakmarkthub
Copy link
Collaborator Author

@drossetti Please review

gdrdrv/gdrdrv.c Outdated
// The end of this range is wrapped around. The calculation uses PAGE_SHIFT
// because handle is equal vm_pgoff (aka file offset << PAGE_SHIFT). We
// indicate this case as -ENOMEM error.
if ((next_handle & ((gdr_hnd_t)(-1) >> PAGE_SHIFT)) < info->next_handle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like failing eagerly. It should fail when next handle is going to be used.

gdrdrv/gdrdrv.c Show resolved Hide resolved
gdrdrv/gdrdrv.c Show resolved Hide resolved
@pakmarkthub
Copy link
Collaborator Author

@drossetti Thank you. I have fixed the bug.

Makefile Outdated
@@ -29,7 +29,7 @@ endif

LIBOBJS := $(LIBSRCS:.c=.o)

SRCS := basic.cpp validate.cpp copybw.cpp
SRCS := basic.cpp validate.cpp copybw.cpp test_invalidation.cpp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use this new test to have all sanity tests? in that case, can we rename it as sanity.cpp ?
Also drop test_ prefix as it does not match the other test names.
We can add a proper prefix during the make install phase.

Makefile Outdated
@@ -86,6 +87,9 @@ validate: validate.o $(LIB)
copybw: copybw.o $(LIB)
$(LINK.cc) -o $@ $^ $(LIBS)

test_invalidation: test_invalidation.o $(LIB)
$(LINK.cc) -o $@ $^ -lcheck -lcudart -lcuda -lpthread -lrt -lsubunit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you really using subunit?
In case https://libcheck.github.io/check/doc/check_html/check_4.html#Subunit-Support shows you'd need to use:

srunner_run_all (sr, CK_SUBUNIT);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least on Ubuntu 18.04 with libcheck installed from "apt install check", I cannot compile without "-lsubunit". This seems to be a dependency from libcheck.a itself. The error is as below:

g++ -O2 -I /usr/local/cuda/include -I gdrdrv/ -I /usr/local/cuda/include -D GDRAPI_ARCH=X86 -L /usr/local/cuda/lib64 -L /usr/local/cuda/lib -L /usr/lib64/nvidia -L /usr/lib/nvidia -L /usr/local/cuda/lib64   -o sanity sanity.o libgdrapi.so.1.2 -lcheck -lcudart -lcuda -lpthread -lrt
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/libcheck.a(check_log.o): In function `subunit_lfun':
(.text+0x5f4): undefined reference to `subunit_test_start'
(.text+0x6bf): undefined reference to `subunit_test_fail'
(.text+0x6d4): undefined reference to `subunit_test_pass'
(.text+0x6ef): undefined reference to `subunit_test_error'
collect2: error: ld returned 1 exit status

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we use pkg-config to get the right libraries ?

gdrdrv/gdrdrv.c Show resolved Hide resolved
gdrdrv/gdrdrv.c Outdated
// because filp->f_mapping usually points to inode->i_mapping.
struct address_space mapping;

// The handle number of mmap's offset are equivalent. However, the mmap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sentence sounds weird. maybe you meant "and" instead of "of"

gdrdrv/gdrdrv.c Outdated
@@ -273,6 +303,19 @@ static int gdrdrv_open(struct inode *inode, struct file *filp)
INIT_LIST_HEAD(&info->mr_list);
mutex_init(&info->lock);

info->pid = task_pid(current);

// Create mapping per opened file. By preventing sharing of this file, the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this comment one line above.
also drop "create mapping per opened file" and reword similar to:
"GPU driver does not support sharing GPU allocations at fork time. Hence here we track the process owning the driver fd and prevent other process to use it."
Here we store the current pid to prevent

gdrdrv/gdrdrv.c Show resolved Hide resolved
@drossetti
Copy link
Member

@pakmarkthub
is the comment in gdrapi.h copied below still true ?

// Note that altough BAR mappings of GPU memory are destroyed, user-space                                                                                                                                                                                                       
// mappings are not. So therefore user code is responsible of calling                                                                                                                                                                                                           
// gdr_unmap on all mappings before calling gdr_close.                                                                                                                                                                                                                          
int gdr_close(gdr_t g);

@pakmarkthub
Copy link
Collaborator Author

@drossetti Thanks Davide. This revision is ready for review.

@pakmarkthub
Copy link
Collaborator Author

@e-ago Could you also take a look? I want to make sure that this PR does not break what you are working on.

@pakmarkthub
Copy link
Collaborator Author

Please help reviewing this PR. It is one of the blockers for the automated test.

Makefile Outdated
@@ -86,6 +87,9 @@ validate: validate.o $(LIB)
copybw: copybw.o $(LIB)
$(LINK.cc) -o $@ $^ $(LIBS)

test_invalidation: test_invalidation.o $(LIB)
$(LINK.cc) -o $@ $^ -lcheck -lcudart -lcuda -lpthread -lrt -lsubunit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we use pkg-config to get the right libraries ?

Makefile Outdated
@@ -96,7 +92,7 @@ drv_install:


clean:
rm -f *.o $(EXES) lib*.{a,so}* *~ core.* && \
rm -f *.o $(EXES) lib*.{a,so}* *~ core.* gdrdrv/.cache.mk libgdrapi.so* && \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we defer to the gdrdrv/ Makefile for cleanup there?

gdrdrv/gdrdrv.c Outdated
// numerical pid here to avoid issues from pid reuse.
struct pid *pid;

// Address space uniqued to this opened file. We need to create a new one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'uniqued' should be 'unique'

gdrdrv/gdrdrv.c Show resolved Hide resolved
gdrdrv/gdrdrv.c Outdated

// Disallow mmapped VMA to propagate to child processes
vma->vm_flags |= VM_DONTCOPY;

#if LINUX_VERSION_CODE <= KERNEL_VERSION(2,6,9)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to simply drop support of kernel versions earlier than 2.6.9 here, mainly because the GPU driver has a more stringent requirement.

gdrdrv/gdrdrv.c Outdated
mr->vma = NULL;
mr->mapping = NULL;
mr->cpu_mapping_type = GDR_MR_NONE;
// TODO: tear down stale partial mappings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment might have never been true, i.e. in case of error, the vma is destroyed and all partial mappings are automatically removed.
Still we should verify that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed that if gdrdrv_mmap returns an error, Linux kernel will clean up any partial mappings for us. See https://elixir.bootlin.com/linux/latest/source/mm/mmap.c#L1788 for reference.

sanity.cpp Outdated
return fd;
}

typedef struct gdr_memh_t {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than duplicating these type definitions here, I'd rather have a shared header file.

sanity.cpp Outdated

gdr_mh_t mh = {0};
if (mh == null_mh)
print_dbg("miao\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like stale code

sanity.cpp Outdated
gdr_mh_t mh = {0};
if (mh == null_mh)
print_dbg("miao\n");
CUdeviceptr d_ptr = d_A;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation problem

sanity.cpp Outdated Show resolved Hide resolved
@pakmarkthub
Copy link
Collaborator Author

@drossetti Thank you for the comments. I have fixed the issues. Please re-review the PR.

@@ -0,0 +1,12 @@
typedef struct gdr_memh_t {
Copy link
Member

@drossetti drossetti Jul 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 not strictly necessary but missing things:

  • license header
  • protection from multiple inclusions
  • extern "C"

@drossetti
Copy link
Member

drossetti commented Jul 16, 2019

The sanity test is a fairly large piece of code. Would it be possible to move that to a separate PR?
I would rename the current validation test in this PR.

Or at least fix the PR cover letter which still refers to "test_validation.cpp"

@pakmarkthub
Copy link
Collaborator Author

@drossetti As requested, I separated out the sanity test suite to PR #71. This PR now focuses on the invalidation issues only. Please re-review.

Copy link
Member

@drossetti drossetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
just a couple of nits, no need to rereview

gdrdrv/gdrdrv.c Outdated
mutex_unlock(&info->lock);

params.handle = mr->handle;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purely for style, I'd rather access mr->handle when it has a sane value only, i.e. when there are no errors so after list_add().

gdrdrv/gdrdrv.c Outdated
}
#else

// Disallow mmapped VMA to propagate to child processes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'child' -> 'children'

@drossetti drossetti merged commit 8a6985b into NVIDIA:invalidate Jul 30, 2019
@drossetti
Copy link
Member

merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants