tryBinarySearch backtrack failure #175
Pharos commit cd2861a made some changes with problematic rules that caused errors like yours in the past. But I can't tell what commit you were on from the docker ID. If you run --help on any pharos command, it should print the commit all the way at the bottom. If you can tell me which commit you were on, it will be a big help.
Sorry, this is wrong: I'm on docker a2cb749f49c2 right now, which is git 2becf22. I'm pretty sure I used the version before the latest batch commit, so I'll rerun everything with debug enabled.
Was the ooprolog step run with 2becf22? Or was that from before the latest batch commit too? I would recommend using …
It was from before, too. Are the following commands good?
Is there a debug flag for ooprolog? Also I'm currently partitioning and …
Yes, those look good.
No, just the log level.
I'll see if I have the same problem.
I was able to reproduce your problem by running the latest version of ooanalyzer on your facts file. I should have enough information to start investigating.
Here is the end of the log at log-level 5:
This seems incorrect, and must come from reasonVFTableSizeGTE_B, which has to do with inherited classes. Specifically, one of these:
It's hard to tell where the bad size 0x6c is coming from, so I'm rerunning with …
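The size contradiction above, where reasonVFTableSizeGTE_B insists on 0x6c for a table that is physically much smaller, can be illustrated with a toy version of that kind of lower-bound reasoning. All names, offsets, and the pointer size here are illustrative, not taken from the Pharos source:

```python
# Hypothetical sketch of the constraint a rule like reasonVFTableSizeGTE
# expresses: a vftable must be at least large enough to cover the highest
# entry offset actually read through it.

POINTER_SIZE = 4  # assuming a 32-bit target

def minimum_vftable_size(used_entry_offsets):
    """Lower bound on a vftable's size given the entry offsets read from it."""
    if not used_entry_offsets:
        return 0
    return max(used_entry_offsets) + POINTER_SIZE

# A virtual call through offset 0x68 forces the table to be at least 0x6c
# bytes, even if the table as laid out in memory is only 0x18 bytes large --
# producing exactly the kind of contradiction described above.
lower_bound = minimum_vftable_size([0x0, 0x4, 0x68])
print(hex(lower_bound))  # 0x6c
```

If a fact wrongly attributes a call through a large offset to the wrong table, the inferred lower bound conflicts with the table's real extent.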
This is great, thank you! I created a new facts file, this time with no function timeout, an apidb entry for setupapi.dll and the latest version (2becf22): Logs
ooanalyzer claims that no delete() methods were found but the facts file contains insnCallsDelete lines that seem identical to the previous facts file (except the last column). I'll run …
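One way to confirm that the insnCallsDelete lines really are identical apart from the last column is to normalize the facts before comparing. A sketch with made-up fact arguments; the real argument layout in OOAnalyzer facts files may differ:

```python
# Compare Prolog facts from two facts files while ignoring the final
# argument, which is the only column reported to differ.
import re

def normalize(fact):
    """Drop the final argument of a Prolog fact like name(a, b, c)."""
    m = re.match(r"(\w+)\((.*)\)\.\s*$", fact)
    name = m.group(1)
    args = [a.strip() for a in m.group(2).split(",")]
    return (name, tuple(args[:-1]))

# Hypothetical fact lines; only the last column differs.
old = ["insnCallsDelete(0x401000, 0x402000, single).",
       "insnCallsDelete(0x401010, 0x402000, single)."]
new = ["insnCallsDelete(0x401000, 0x402000, certain).",
       "insnCallsDelete(0x401010, 0x402000, certain)."]

same = {normalize(f) for f in old} == {normalize(f) for f in new}
print(same)  # True
```

If the normalized sets match, the difference in behavior has to come from how the last column is interpreted, not from the facts themselves.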
Using the new facts file from the previous post and ooprolog from 2becf22 produced this ooprog.log. The command output is:
The end of the log with log level 7 is:
And the previous run did not say that? @sei-ccohen sounds like the warning might be incorrect for the new tag-based system... I'm going to create a separate issue for that.
0x192389c looks like it is 0x5c to me, but there appears to be RTTI after the vftable. OOAnalyzer might be getting that confused? I don't think that is the root problem of this issue, but it is concerning. 0x194d444 looks to be only 0x18 large. So that is probably the root of the problem. According to IDA, there is another vftable at 0x194d460. But OOAnalyzer does not think so:
VFTable 0x194d460 is installed at 0x8c8984. I just looked at the facts file I generated, which just completed overnight, and:
And looking at your new facts file:
So for some reason, you are not finding the vftable install of 0x194d460. This may not be a prolog problem after all.
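Checking whether a facts file records the install of a given vftable can be sketched as a small scan. The fact name possibleVFTableWrite and its argument order are assumed from context here, not confirmed against the OOAnalyzer fact format, and the sample lines are invented:

```python
# Scan facts lines for instructions that install a particular vftable address.
import re

def vftable_installs(facts, table_addr):
    """Return the instruction addresses recorded as installing table_addr."""
    pat = re.compile(r"possibleVFTableWrite\((0x[0-9a-f]+),.*" + table_addr + r"\)\.")
    hits = []
    for line in facts:
        m = pat.match(line)
        if m:
            hits.append(m.group(1))
    return hits

# Two hypothetical facts files: one records the install at 0x8c8984,
# the other does not mention 0x194d460 at all.
mine  = ["possibleVFTableWrite(0x8c8984, 0x8c8980, 0x0, 0x194d460)."]
yours = ["possibleVFTableWrite(0x8c8990, 0x8c898c, 0x0, 0x192389c)."]

print(vftable_installs(mine, "0x194d460"))   # ['0x8c8984']
print(vftable_installs(yours, "0x194d460"))  # []
```

An empty result on one file but not the other points at fact generation (the partitioner or disassembly options) rather than the Prolog reasoning.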
My guess right now is that one of the options you are using is causing us to miss the vftable install. Maybe … FTR, I was using only … Here is the …
Just guessing here, but --partitioner=rose seems like the more likely culprit in this case than --no-semantics. Both options are expensive in resources & run-time but improve the results. We don't test the multi-threaded option as often and as well as we should, but if you've got the resources for it, that should help speed up the fact generation after partitioning. Sadly, the partitioner is still single-threaded.
Rerunning with …
Correct, the previous run did not say that.
I'll run …
Should I still run ooprolog on your facts file?
Yes, the facts file I posted earlier is different and as far as I know should work. We still have to figure out what to do about the facts issue. Is the problem in the fact generator? Prolog? |
I got a seg fault with it. I accidentally used log level 5, the log is available here.
Should I rerun with log level 7 after a restart? |
A seg fault in ooprolog is probably a SWI prolog bug. The first thing is to figure out how reproducible it is. How long did it take to crash? I would try running again with log level 5 and see if it will crash consistently. I have also been running it on my machine and it did not crash. But I was also (accidentally) using log level 7. I have restarted with log level 5 to see if I can reproduce the crash.
It took about nine hours. I'll rerun the same command again tonight. How important is the log file? Is it ok if I only provide the command output? |
Log file is not important. It might be interesting to know the last few lines to see if it crashed at the same spot, but at this point I mostly want to know if you can trigger it consistently. |
Also, I have been running my facts file for 27 hours so far with no crash. Though I should probably mention I'm not running it in our docker environment. I should probably try that next. |
Good to know, that fits inside a normal reply. Usually when any step of ooanalyzer runs, I pretty much let it have exclusive use of my machine. Not this time, though: it had been running non-stop for the past couple of days, I got tired of waiting around, and I just played some resource-intensive games. That might have something to do with it. We'll see in about nine hours, because I just kicked off another run with your facts file and log level 5.
If you let that run finish please send me the results file :P
It's been running for about twelve hours now and seems to be stuck. I remember it got stuck last time too but continued after a while. It's at the same spot as last time; I have a script that reports the log file size, and both times it got stuck it was at 2.2GB. Here's the last 200 lines of the log:
There is a lot of … htop shows:
No CPU usage for a while now.
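The check described above, watching whether the log file size keeps growing, can be sketched as a small test over periodic size samples. The sampling loop itself is elided and the sizes below are canned; a real monitor would read os.path.getsize on an interval:

```python
# Flag a stall when the monitored log file stops growing across
# several consecutive size samples.

def is_stalled(size_samples, window=3):
    """True if the last `window` samples show no growth."""
    if len(size_samples) < window:
        return False
    tail = size_samples[-window:]
    return len(set(tail)) == 1  # identical sizes -> no growth observed

# Canned samples in GB: growth up to 2.2, then flat -- the situation
# described in the comment above.
samples_gb = [1.8, 2.0, 2.1, 2.2, 2.2, 2.2]
print(is_stalled(samples_gb))  # True
```

Pairing this with a CPU-usage check (as htop showed here) distinguishes "slow but working" from "truly stuck".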
I don't think I've ever seen SWI have 0 cpu while it is running. That is not a good sign. When is the last time you checked your ram? Maybe try strace -p 3362 or gdb -p 3362 and try to see what it is doing? Mine is still running, though as your log showed, the output is very intermittent. This is not uncommon for very large programs, but it suggests that it will take a very long time to run. I will try to open it in a profiler when I have a chance to see if there is anything we can speed up.
Never. I can try Memtest86+ if that's what you mean.
strace:
gdb:
Mine has been running for 20 hours now and waiting for 15 of those. Seems like it's truly stuck. How long should I let it run for?
Try running … After that I think you can kill it. I think running memtest86 is probably a good idea...
Okay, memtest86 is next. So if I understand correctly, the crash and now the indefinite wait look like they're caused by faulty memory and not a bug in ooanalyzer, but the prolog facts issue looks like it is a bug. Maybe it was caused by faulty memory during facts generation, though; maybe the facts file I posted, which you used to reproduce the facts issue, is corrupted.
Looks like the main thread is waiting for garbage collection (which presumably got stuck for some reason). I think bad memory is a possibility. But it's also possible that there is some bug in SWI that triggers memory corruption. We have seen this a number of times, but never the failure mode you are seeing. I was able to reproduce the prolog facts issue. @sei-ccohen is still investigating, but it is a problem or shortcoming with the ROSE partitioner. That is not caused by a hardware issue.
Ah okay, thanks for the explanation! I'll report back with the memtest results sometime this week. |
Just to add my own two cents here... Part of this problem is pretty clearly that using --partitioner=rose produces a set of facts that confuses the later Prolog analysis step. It would be nice if that option always worked as intended, but the truth is that it's primarily useful when the default Pharos partitioner doesn't complete in a reasonable amount of time. The ROSE partitioner should never produce more accurate or reliable results, just faster results. Especially given how large the sample is, and how long the Prolog step takes, it seems like a good idea to just not use that option in this case. As for the memory corruption and SWI Prolog crashing issue, I'll phrase things slightly differently. We've not seen the crashes that you have, and the problem appears to be some kind of corruption in SWI Prolog itself, and not in "our" code. Since SWI Prolog is usually very reliable, that means something odd is happening on your end. sei-eschwartz has suggested bad memory. I thought maybe it was a "low-memory" bug caused when memory allocation failed, but I would have expected the process to have been killed a short while later for running out of memory, not just to get stuck in a loop. Whatever this bug is... It's weird.
Thank you for the additional info! I always used the ROSE partitioner without semantics because that's what the guide uses, but I'll switch to trying your partitioner with semantics first and letting it run for a day or two before switching to ROSE and no semantics. I don't think I'm running out of memory: I have 24GB RAM and a 128GB swap file (using swap slows down compression and the partitioner report by a lot, though). I tried memtest86 v9.0 free and let it run for one out of four passes without any errors. I'll do all four passes overnight, though, and will let you know.
We had a discussion about this yesterday internally. The guidance was mildly contradictory, and we've fixed that for the next release. The current recommendation is to use --no-semantics but NOT --partitioner=rose. Both options can affect the results, so it's hard to just say that this or that set of options is better than the other. Using --partitioner=rose is known to miss a lot of functions (especially those that are only called virtually) and that's a pretty serious problem for OO analysis. Without that option you get the longer-running but more complete Pharos partitioner. The --no-semantics option turns OFF the use of advanced semantics when disassembling, and that can miss stuff too, but it mostly affects analysis when the program has obfuscated control flow, anti-reverse engineering, etc. Since those features are less common than virtual functions, you're probably ok with --no-semantics. Semantic analysis is also particularly expensive, because it emulates each instruction at least once during disassembly to identify branches that are never taken, compute advanced jump targets, etc. If you're willing to be patient, you will get the best results without either option.
My run has been going nowhere. I'm going to do some profiling to see if there are any places we can speed things up. |
The main problem seems to be concludeNOTMergeClasses taking ~93% of the runtime. The problem is that there is simply an enormous number of reasonNOTMergeClass answers; there are so many that it takes a long time to simply iterate through them and see which are new. We are working on a long term change called monotonic tabling that would help with this. But that's not ready for use yet. I will need to think more about short term solutions. |
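The cost described above, iterating through an enormous answer set just to find the new entries, can be illustrated with a toy comparison. This is not Pharos code; it only contrasts the naive rescan with the delta-only processing that something like monotonic tabling would enable:

```python
# Why rescanning all reasonNOTMergeClass-style answers is slow: the naive
# approach walks every known answer each round, while an incremental
# approach only examines the newly derived ones.

def new_answers_naive(all_answers, seen):
    # O(len(all_answers)) every round, even if almost nothing changed.
    return [a for a in all_answers if a not in seen]

def new_answers_incremental(delta, seen):
    # Only the newly derived answers are examined.
    return [a for a in delta if a not in seen]

seen = set(range(1_000_000))             # answers processed in earlier rounds
all_answers = list(range(1_000_002))     # two new answers since last round
delta = [1_000_000, 1_000_001]           # what a monotonic engine would hand us

print(new_answers_naive(all_answers, seen) == new_answers_incremental(delta, seen))  # True
```

Both return the same two new answers, but the naive version touches a million stale entries to find them, which is roughly why concludeNOTMergeClasses dominates the runtime here.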
I'm experimenting with the following branch. Instead of trying to proactively assert all …
I left the original run that was not making any progress going, and it just terminated. Unfortunately, it terminated in an error, but I guess that is still progress. We managed to get to:
Here is the end of the log:
These are some great insights into what these options do; please add these explanations to the guide.
I'm fine with waiting for monotonic tabling; can you approximate when it will land? But I can also test with your changes if that helps. Are they available in a premade docker container?
Aw, bummer. But from the other issue it sounds like the problem was identified, is it easy to fix it? |
There is no ETA for monotonic tabling, and it's not certain to be successful. There's no need to test with my changes though. I am experimenting with a few different options. But the first priority is fixing #181
Yes and no. I think we have a pretty good understanding of the problem. But because it takes so long to reach the problem in your program, I've been trying to trigger the problem in a small testcase to make sure we handle it correctly. This in turn revealed some other problems in OOAnalyzer (namely that OO calls can be inlined into non-OO functions, and OOAnalyzer doesn't handle these). That's probably just an issue with my test executable, and I doubt it is a problem with your executable. But I want to make sure my fix works before waiting another 16 days or however long it took to crash. |
I'm not sure if this is what you mean but I've seen inlined constructors and destructors before, can't remember if they were inside a thiscall or not. I suspect that I've also seen regular functions being inlined, that was definitely in a thiscall though. |
I got this failure when analyzing a 35MB executable. It took forever, so I didn't experiment at all. I'd be happy to do so with your guidance, though. I also uploaded the executable and generated files here.
Before running this I forgot to pull the latest Docker image. I remembered before step 2, so fn2hash was run with 365a528011b5, everything else with da72c9ddd247.
Logs
fn2hash to create ooprog.ser
ooanalyzer to create ooprog-facts.pl
Info about ooprog-facts.pl
ooprolog with failure
Other
- --no-semantics flag for step 2, causing a warning. This warning uses the wrong variable, see here (should use semantics_were_disabled).
- --no-compression flag to the partitioner.
- --no-report flag to the partitioner.