Implement strncmp #778

Merged
merged 2 commits into from Nov 29, 2019

@jgkamat jgkamat commented Jun 17, 2019

I'll be honest and say I mostly have no idea what I'm doing here, so I'd appreciate it if I could get some extra review and guidance for this. This is my first time working with any serious compiler stuff, and my first real exposure to LLVM. I know you're probably busy for the 0.9.1 release, so feel free to put this on the backburner until you have more time.

For some reason:

BEGIN { if (strncmp("hhvm", str($1), 4)) { printf("hello!"); } }

results in

bpftrace: /usr/include/llvm/IR/Instructions.h:1117: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.

but, if(1) or if(strncmp("hhvm", str($1), -1) != 0) work fine. I think I'm missing something in the semantic analyzer but I'm not sure. It might need some changes to the if logic, but I'm not sure what/where those would be.

Fixes #459

@@ -77,6 +77,7 @@
#include "codegen/pred_binop.cpp"
#include "codegen/string_equal_comparison.cpp"
#include "codegen/string_not_equal_comparison.cpp"
#include "codegen/strncmp.cpp"

This is not needed anymore :)

@mmarchini

Overall code lgtm. Will do some more testing later.


For some reason:

BEGIN { if (strncmp("hhvm", str($1), 4)) { printf("hello!"); } }

results in

bpftrace: /usr/include/llvm/IR/Instructions.h:1117: void llvm::ICmpInst::AssertOK(): Assertion `getOperand(0)->getType() == getOperand(1)->getType() && "Both operands to ICmp instruction are not of the same type!"' failed.

but, if(1) or if(strncmp("hhvm", str($1), -1) != 0) work fine.

LLVM is pretty strict with integer types: it won't let you use most binary operators between two integers of different sizes. CreateStrncmp saves the result of the comparison as an Int8 (irbuilderbpf.cpp#L376, irbuilderbpf.cpp#L430), and if then tries to compare it to an Int64 (which is always zero, codegen_llvm.cpp#L1127). So why does an explicit == or != comparison work? It turns out we upcast our Binop results to Int64 to avoid this issue (codegen_llvm.cpp#L796). Changing the type used to store the result of the string comparison to Int64 fixes the issue:

diff --git a/src/ast/irbuilderbpf.cpp b/src/ast/irbuilderbpf.cpp
index c8aa040..9778aed 100644
--- a/src/ast/irbuilderbpf.cpp
+++ b/src/ast/irbuilderbpf.cpp
@@ -378,7 +378,7 @@ Value *IRBuilderBPF::CreateStrcmp(Value* val, std::string str, bool inverse) {
 Value *IRBuilderBPF::CreateStrncmp(Value* val, std::string str, uint64_t n, bool inverse) {
   Function *parent = GetInsertBlock()->getParent();
   BasicBlock *str_ne = BasicBlock::Create(module_.getContext(), "strcmp.false", parent);
-  AllocaInst *store = CreateAllocaBPF(getInt8Ty(), "strcmp.result");
+  AllocaInst *store = CreateAllocaBPF(getInt64Ty(), "strcmp.result");
 
   CreateStore(getInt1(inverse), store);
 
@@ -436,7 +436,7 @@ Value *IRBuilderBPF::CreateStrncmp(Value* val1, Value* val2, uint64_t n, bool in
   */
   Function *parent = GetInsertBlock()->getParent();
   BasicBlock *str_ne = BasicBlock::Create(module_.getContext(), "strcmp.false", parent);
-  AllocaInst *store = CreateAllocaBPF(getInt8Ty(), "strcmp.result");
+  AllocaInst *store = CreateAllocaBPF(getInt64Ty(), "strcmp.result");
   BasicBlock *done = BasicBlock::Create(module_.getContext(), "strcmp.done", parent);
 
   CreateStore(getInt1(inverse), store);
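
To make the type mismatch above concrete, here is a minimal sketch that models the "same width" rule without using LLVM at all. `TypedInt`, `icmp_ne`, `zext` and `if_condition` are hypothetical names invented for this illustration, and the assumption that `if` lowers to a not-equal compare against a 64-bit zero is taken from the description above rather than from the bpftrace source.

```cpp
#include <cstdint>
#include <stdexcept>

// Toy stand-in for an LLVM integer value: a payload plus a bit width.
// This is NOT LLVM's API; it only models the "operands must match" rule.
struct TypedInt {
  unsigned bits;   // e.g. 8 for i8, 64 for i64
  uint64_t value;
};

// Models the check behind ICmpInst::AssertOK(): both operands must have
// the same type. Throws instead of aborting so the failure is observable.
bool icmp_ne(const TypedInt &lhs, const TypedInt &rhs) {
  if (lhs.bits != rhs.bits)
    throw std::invalid_argument(
        "Both operands to ICmp instruction are not of the same type!");
  return lhs.value != rhs.value;
}

// Models a zero-extending widening cast (the "upcast" applied to Binop results).
TypedInt zext(const TypedInt &v, unsigned bits) {
  if (bits < v.bits)
    throw std::invalid_argument("zext cannot narrow a value");
  return TypedInt{bits, v.value};
}

// Models the assumed `if (expr)` lowering: compare against a 64-bit zero.
bool if_condition(const TypedInt &expr) {
  return icmp_ne(expr, TypedInt{64, 0});
}
```

In this model, passing an 8-bit result straight to `if_condition` throws, while widening it first with `zext(result, 64)`, or storing it as 64 bits from the start as the diff above does, lets the comparison go through.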

@jgkamat jgkamat force-pushed the jay/strncmp branch 2 times, most recently from 8dc5bc5 to bb9d422 Compare July 29, 2019 01:01
jgkamat commented Jul 29, 2019

Thanks for the pointers, that was very helpful! While trying to fix this test:

bpftrace -e 'BEGIN { if (!strncmp(str($1), str($2), 7)) { printf("not equal");} exit();}' "python3-proc" "hhvm"

I discovered that ! doesn't work well inside if statements. For example, this fails:

bpftrace -e 'BEGIN { $x = 5; if (!$x) { printf("not equal");} exit();}'

I'll ignore that for now because it seems like a separate issue. I'm still not sure why the LLVM 5 debug build is failing (it seems to be another similar issue), but otherwise I think this should be in relatively good shape.

jgkamat commented Oct 30, 2019

Any chance I could get another round of review? The only other issue I'm aware of is the LLVM 5 debug build failing. I'm not sure exactly why; as far as I know, this error occurs when actually running the code, not in the IR verification. Unfortunately, getting LLVM 5 is a little tough for me at the moment, but if the answer isn't obvious I can try to find a VM to debug further.

Thanks in advance! :)

[ RUN      ] codegen.strncmp_test
Assertion failed: BitWidth && "bitwidth too small" (/usr/include/llvm5/llvm/ADT/APInt.h: APInt: 273)
Aborted (core dumped)
The command "sudo docker run --privileged --rm -it -v $(pwd):$(pwd) -v /sys/kernel/debug:/sys/kernel/debug:rw -e STATIC_LINKING=$STATIC_LINKING -e TEST_ARGS=$TEST_ARGS bpftrace-builder-$BASE-llvm-$LLVM_VERSION $(pwd)/build-$TYPE-$BASE $TYPE -j`getconf _NPROCESSORS_ONLN`" exited with 134.

@jgkamat jgkamat changed the title [wip] Implement strncmp Implement strncmp Oct 30, 2019
%strcmp.cmp16 = icmp eq i8 %12, 0
br i1 %strcmp.cmp16, label %pred_true, label %pred_false

lookup_success: ; preds = %pred_true

Why is this whole section disappearing? That can't be OK.

vagrant@ubuntu-cosmic:~/bpftrace$ sudo bpftrace_master -e 't:syscalls:sys_enter_openat /comm == "cat"/ { @[comm]++ }' -c "cat xxx"
Attaching 1 probe...
/bin/cat: xxx: No such file or directory


@[cat]: 25

vagrant@ubuntu-cosmic:~/bpftrace$ sudo bpftrace -e 't:syscalls:sys_enter_openat /comm == "cat"/ { @[comm]++ }' -c "cat xxx"
Attaching 1 probe...
/bin/cat: xxx: No such file or directory

With your patch the whole function body seems to be optimized away. The optimized IR doesn't contain the map lookup and store code.

@@ -427,14 +436,14 @@ Value *IRBuilderBPF::CreateStrcmp(Value* val1, Value* val2, bool inverse) {
*/
Function *parent = GetInsertBlock()->getParent();
BasicBlock *str_ne = BasicBlock::Create(module_.getContext(), "strcmp.false", parent);
-  AllocaInst *store = CreateAllocaBPF(getInt8Ty(), "strcmp.result");
+  AllocaInst *store = CreateAllocaBPF(getInt64Ty(), "strcmp.result");

why this change?

This was added due to this comment: essentially, doing a simple compare in an if statement was failing because of a type mismatch. I'm not sure why this causes the map lookup code to be removed (reverting this does fix that). I'll try to figure out why that's happening...

@jgkamat jgkamat force-pushed the jay/strncmp branch 2 times, most recently from 0c102f1 to 468d260 Compare October 31, 2019 23:33
@fbs fbs mentioned this pull request Nov 7, 2019
4 tasks
@fbs fbs added this to the 0.9.4 milestone Nov 20, 2019
@fbs fbs left a comment

Almost there :) thanks for working on this. Sorry for the lag.

Can you add some documentation too?

@@ -559,6 +559,17 @@ void SemanticAnalyser::visit(Call &call)
else if (call.func == "ustack") {
check_stack_call(call, Type::ustack);
}
else if (call.func == "strncmp") {
check_nargs(call, 3);

To prevent going out of range:

    if (check_nargs(call, 3)) {
      check_arg(call, Type::string, 0);
      check_arg(call, Type::string, 1);
      check_arg(call, Type::integer, 2, true);

      Integer &size = static_cast<Integer&>(*call.vargs->at(2));
      if (size.n < 0)
        err_ << "Builtin strncmp requires a non-negative size" << std::endl;
    }
    call.type = SizedType(Type::integer, 8);
  }

bpftrace -e 'i:s:1 { strncmp("a"); }'
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
Aborted

else if (call.func == "strncmp") {
check_nargs(call, 3);
check_arg(call, Type::string, 0);
check_arg(call, Type::string, 1);

Can you add a few semantic analyser tests for this? Especially the "wrong" cases are good to test, e.g.:

TEST(semantic_analyser, strncmp)
{
  // Each call below is expected to be rejected by the semantic analyser.
  test("i:s:1 { strncmp(1) }", 1);
  test("i:s:1 { strncmp(1,1,1) }", 1);
  test("i:s:1 { strncmp(\"a\",1,1) }", 1);
  test("i:s:1 { strncmp(\"a\",\"a\",-1) }", 1);
}

@jgkamat jgkamat force-pushed the jay/strncmp branch 3 times, most recently from b98ce06 to 1a8bdf7 Compare November 26, 2019 02:44
@ajor ajor left a comment

Just selecting "request changes" so this doesn't get merged while this is discussed:

Shouldn't strncmp return 0 on a string match instead? That behaviour would be in line with the standard C version of this function.

jgkamat commented Nov 27, 2019

Sure, that seems fine to me. It's a little less convenient, but it is better to follow the standard strncmp return values. I don't think I can emulate the greater-than/less-than-zero behavior without a little more work (which I'd rather leave for a separate diff), but zero/non-zero should be easy.
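
As a rough illustration of the zero/non-zero convention discussed here, the sketch below returns 0 when the first n bytes match and 1 otherwise. The function name `strncmp_zero_nonzero` is hypothetical and this is plain C++ written for this note, not the code bpftrace generates.

```cpp
#include <cstddef>
#include <cstdint>

// Returns 0 when the first n bytes of a and b match (stopping early at a
// shared NUL terminator) and 1 otherwise. Unlike C's strncmp, the sign of
// a mismatch is not reported, only zero versus non-zero.
int64_t strncmp_zero_nonzero(const char *a, const char *b, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i])
      return 1;  // first differing byte: not equal
    if (a[i] == '\0')
      return 0;  // both strings ended together: equal
  }
  return 0;  // the first n bytes all matched
}
```

Under this convention a prefix test reads as `strncmp_zero_nonzero("hhvm", comm, 4) == 0`, which mirrors the usual C idiom.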

@ajor ajor left a comment

Looks good to me, just one minor change in the semantic analyser needed

check_arg(call, Type::string, 1);
check_arg(call, Type::integer, 2, true);

Integer &size = static_cast<Integer&>(*call.vargs->at(2));

At this point it's not guaranteed that the third argument is an integer literal, so this could lead to some dodgy memory accesses. This section needs to only run if check_arg(call, Type::integer, 2, true) returns true
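
A minimal sketch of the guard being requested, using a hypothetical miniature AST (`ToyExpr`, `ToyInteger`, `ToyString`) rather than bpftrace's real classes: the downcast to the integer node only happens after the literal check has succeeded. The two messages other than the non-negative-size one are invented for this illustration.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature AST; the names echo the review but are not
// bpftrace's real classes.
struct ToyExpr {
  virtual ~ToyExpr() = default;
  virtual bool is_integer_literal() const { return false; }
};

struct ToyInteger : ToyExpr {
  explicit ToyInteger(int64_t v) : n(v) {}
  bool is_integer_literal() const override { return true; }
  int64_t n;
};

struct ToyString : ToyExpr {
  explicit ToyString(std::string v) : str(std::move(v)) {}
  std::string str;
};

// Returns an error message, or an empty string when the size argument is fine.
std::string toy_check_size_arg(const std::vector<std::unique_ptr<ToyExpr>> &vargs) {
  if (vargs.size() != 3)
    return "strncmp() takes 3 arguments";
  const ToyExpr &arg = *vargs.at(2);
  if (!arg.is_integer_literal())
    return "strncmp() expects an integer literal as its third argument";
  // The downcast is reached only once the literal check has passed.
  const auto &size = static_cast<const ToyInteger &>(arg);
  if (size.n < 0)
    return "Builtin strncmp requires a non-negative size";
  return "";
}
```

If the third argument is a string node, the function returns before the `static_cast`, so the node is never reinterpreted as an integer.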

While this is slightly less convenient for quick scripting, it's
in-line with strncmp in the stdlib.
@ajor ajor left a comment

Nice - thank you!

@ajor ajor merged commit 384640e into bpftrace:master Nov 29, 2019
@jgkamat jgkamat deleted the jay/strncmp branch November 30, 2019 18:07
Successfully merging this pull request may close these issues.

Substring/partial string comparison