Skip to content
This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

sslyze appearing to stall out on some domains #138

Closed
konklone opened this issue Jul 1, 2017 · 8 comments
Closed

sslyze appearing to stall out on some domains #138

konklone opened this issue Jul 1, 2017 · 8 comments

Comments

@konklone
Copy link
Contributor

konklone commented Jul 1, 2017

I don't have enough information to report a bug yet. When I checked on our server, the sslyze scans had stalled out with 9 in-flight, with a bunch of defunct sslyze processes. These were the domains:

nces.ed.gov
autodiscover.ors.od.nih.gov
stg-reg2.hcia.cms.gov
vpn1.cjis.gov
portcullis.nlrb.gov
safesupportivelearning.ed.gov
my.uscis.gov
www.educationusa.state.gov
pittsburgh.feb.gov

But when I ran a scan using sslyze on all 9 of those in a row, using --serial, none of them stalled out. So I'm not totally sure how to reproduce this.

@konklone
Copy link
Contributor Author

konklone commented Jul 1, 2017

cc @h-m-f-t if you've seen anything like this in your own work

@h-m-f-t
Copy link
Collaborator

h-m-f-t commented Jul 4, 2017

portcullis.nlrb.gov
my.uscis.gov

These two look familiar; I'm pretty certain they threw occasional errors in the early days of pshtt (which probably would have relied on now-defunct capabilities of sslyze). I don't think anything was done client-side to remediate, other than a later re-scan.

@jsf9k
Copy link
Collaborator

jsf9k commented Oct 30, 2017

@konklone and @h-m-f-t, I submitted #148 because I was seeing something like this with pshtt scans. When running lots of pshtt scans via domain-scan I would every once in a while see one or more scans hang indefinitely. I was able to attach to the hung processes using gdb and the debug version of python. The hanging seems to be related to the public suffix data being being pulled for every pshtt scan being run. I guess if you pull too often it starts hanging.

I don't know if something similar could be happening inside of sslyze runs. The problem I was seeing would happen with different domains each time, so it was definitely not particular to any given domain.

@konklone
Copy link
Contributor Author

konklone commented Nov 1, 2017

I think I have a decent fix for this in #151. While it doesn't totally identify the root cause, it refactors the tool to use the sslyze library directly, in Python, which cuts down our level of process spawning and indirection. Early signs show that it appears to be making a significant impact on reliability and mitigating zombie processes.

@konklone
Copy link
Contributor Author

konklone commented Nov 2, 2017

Unfortunately, even using #151, there are still stuck processes. They manifest differently -- not as defunct processes, but just stuck and not doing anything. It eventually halts execution of the scan.

I attached to one of them, which had been stuck for over a day, with gdb and ran backtrace and got the below. I'm new to gdb and not experienced at reviewing C backtraces, and don't know what conclusions to draw. It looks like it's waiting on a mutex of some sort.

#0  0x00007f7aaaca0827 in futex_abstimed_wait_cancelable (private=0,
    abstime=0x0, expected=0, futex_word=0x7f7a8c000c10)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f7a8c000c10, abstime=0x0)
    at sem_waitcommon.c:111
#2  0x00007f7aaaca08d4 in __new_sem_wait_slow (sem=0x7f7a8c000c10, abstime=0x0)
    at sem_waitcommon.c:181
#3  0x00007f7aaaca097a in __new_sem_wait (sem=sem@entry=0x7f7a8c000c10)
    at sem_wait.c:29
#4  0x0000000000438acf in PyThread_acquire_lock_timed (
    lock=lock@entry=0x7f7a8c000c10, microseconds=microseconds@entry=-1000000,
    intr_flag=intr_flag@entry=1) at Python/thread_pthread.h:352
#5  0x0000000000579515 in acquire_timed (timeout=-1000000000,
    lock=0x7f7a8c000c10) at ./Modules/_threadmodule.c:68
#6  lock_PyThread_acquire_lock (self=0x7f7a6ce45e90, args=<optimized out>,
    kwds=<optimized out>) at ./Modules/_threadmodule.c:151
#7  0x00000000004ad56c in _PyCFunction_FastCallDict (kwargs=0x0,
    nargs=<optimized out>, args=0x7f7aa44e4af8, func_obj=0x7f7aa3f2c8b8)
    at Objects/methodobject.c:231
#8  _PyCFunction_FastCallKeywords (func=func@entry=0x7f7aa3f2c8b8,
    stack=stack@entry=0x7f7aa44e4af8, nargs=<optimized out>,
    kwnames=kwnames@entry=0x0) at Objects/methodobject.c:295
#9  0x000000000053d80e in call_function (
    pp_stack=pp_stack@entry=0x7ffc58d8e0b0, oparg=oparg@entry=2,
    kwnames=kwnames@entry=0x0) at Python/ceval.c:4788
#10 0x0000000000542057 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3275
#11 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7aa44e4958)
    at Python/ceval.c:718
#12 _PyEval_EvalCodeWithName (_co=0x7f7aa9d7d4b0,
    globals=globals@entry=0x7f7aa9d5c8b8, locals=locals@entry=0x0,
    args=<optimized out>, argcount=1, kwnames=kwnames@entry=0x0,
    kwargs=0x7f7aa3fbdac0, kwcount=0, kwstep=1, defs=0x7f7aa9d6d320,
    defcount=2, kwdefs=0x0, closure=0x0, name=0x7f7aa9d796a8,
    qualname=0x7f7aa9d802b0) at Python/ceval.c:4119
#13 0x000000000053d727 in fast_function (kwnames=0x0, nargs=<optimized out>,
    stack=<optimized out>, func=0x7f7aa97129d8) at Python/ceval.c:4929
#14 call_function (pp_stack=pp_stack@entry=0x7ffc58d8e310,
    oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4809
#15 0x0000000000542057 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3275
#16 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7aa3fbd930)
    at Python/ceval.c:718
#17 _PyEval_EvalCodeWithName (_co=0x7f7aa9d7d420,
    globals=globals@entry=0x7f7aa9d5c8b8, locals=locals@entry=0x0,
    args=<optimized out>, argcount=1, kwnames=kwnames@entry=0x0,
    kwargs=0x7f7aa3fb03e0, kwcount=0, kwstep=1, defs=0x7f7aa9d7f7d0,
    defcount=1, kwdefs=0x0, closure=0x0, name=0x7f7aaaef3d18,
    qualname=0x7f7aa9d77930) at Python/ceval.c:4119
#18 0x000000000053d727 in fast_function (kwnames=0x0, nargs=<optimized out>,
    stack=<optimized out>, func=0x7f7aa9712950) at Python/ceval.c:4929
#19 call_function (pp_stack=pp_stack@entry=0x7ffc58d8e570,
    oparg=oparg@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:4809
#20 0x0000000000542057 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3275
#21 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7aa3fb0240)
    at Python/ceval.c:718
#22 _PyEval_EvalCodeWithName (_co=0x7f7aa4fef1e0,
    globals=globals@entry=0x7f7aa504e480, locals=locals@entry=0x0,
    args=<optimized out>, argcount=1, kwnames=kwnames@entry=0x7f7aa504b178,
    kwargs=0x7f7aa44e4588, kwcount=1, kwstep=1, defs=0x7f7aa4b7e220,
    defcount=1, kwdefs=0x0, closure=0x0, name=0x7f7aa89f54f0,
    qualname=0x7f7aa4b85850) at Python/ceval.c:4119
#23 0x000000000053d727 in fast_function (kwnames=0x7f7aa504b178,
    nargs=<optimized out>, stack=<optimized out>, func=0x7f7aa4b90840)
    at Python/ceval.c:4929
#24 call_function (pp_stack=pp_stack@entry=0x7ffc58d8e7d8,
    oparg=oparg@entry=1, kwnames=kwnames@entry=0x7f7aa504b160)
    at Python/ceval.c:4809
#25 0x00000000005432a4 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3291
#26 0x000000000053cad1 in PyEval_EvalFrameEx (throwflag=0, f=0x7f7aa44e43e8)
    at Python/ceval.c:718
#27 _PyFunction_FastCall (co=co@entry=0x7f7aa50494b0, args=<optimized out>,
    args@entry=0x7ffc58d8e980, nargs=nargs@entry=4,
    globals=globals@entry=0x7f7aa5039af8) at Python/ceval.c:4870
#28 0x0000000000546349 in _PyFunction_FastCallDict (
    func=func@entry=0x7f7aa5056a60, args=args@entry=0x7ffc58d8e980,
    nargs=nargs@entry=4, kwargs=kwargs@entry=0x0) at Python/ceval.c:4972
#29 0x0000000000452e50 in _PyObject_FastCallDict (func=0x7f7aa5056a60,
    args=0x7ffc58d8e980, nargs=4, kwargs=0x0) at Objects/abstract.c:2295
#30 0x0000000000452f4a in _PyObject_Call_Prepend (func=0x7f7aa5056a60,
    obj=<optimized out>, args=0x7f7a6af251b0, kwargs=0x0)
    at Objects/abstract.c:2358
#31 0x0000000000452d12 in _PyObject_FastCallDict (func=0x7f7aa97042c8,
    args=0x7ffc58d8ea80, nargs=3, kwargs=0x0) at Objects/abstract.c:2316
#32 0x0000000000454242 in PyObject_CallFunctionObjArgs (
    callable=callable@entry=0x7f7aa97042c8) at Objects/abstract.c:2811
#33 0x00000000005409df in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3233
#34 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0x1211908)
    at Python/ceval.c:718
#35 _PyEval_EvalCodeWithName (_co=0x7f7aaa0ece40,
    globals=globals@entry=0x7f7aaa170168, locals=locals@entry=0x0,
    args=<optimized out>, argcount=2, kwnames=kwnames@entry=0x0,
    kwargs=0x1178ad8, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0, name=0x7f7aaa08dab0, qualname=0x7f7aaa08dab0)
    at Python/ceval.c:4119
#36 0x000000000053d727 in fast_function (kwnames=0x0, nargs=<optimized out>,
    stack=<optimized out>, func=0x7f7aa513e488) at Python/ceval.c:4929
#37 call_function (pp_stack=pp_stack@entry=0x7ffc58d8ee60,
    oparg=oparg@entry=2, kwnames=kwnames@entry=0x0) at Python/ceval.c:4809
#38 0x0000000000542057 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3275
#39 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0x11788f8)
    at Python/ceval.c:718
#40 _PyEval_EvalCodeWithName (_co=0x7f7aaa128b70,
    globals=globals@entry=0x7f7aaa170168, locals=locals@entry=0x0,
    args=<optimized out>, argcount=1, kwnames=kwnames@entry=0x0,
    kwargs=0xd97500, kwcount=0, kwstep=1, defs=0x7f7aa9fd24f8, defcount=1,
    kwdefs=0x0, closure=0x0, name=0x7f7aaa08ab20, qualname=0x7f7aaa08ab20)
    at Python/ceval.c:4119
#41 0x000000000053d727 in fast_function (kwnames=0x0, nargs=<optimized out>,
    stack=<optimized out>, func=0x7f7aaa159e18) at Python/ceval.c:4929
#42 call_function (pp_stack=pp_stack@entry=0x7ffc58d8f0c0,
    oparg=oparg@entry=1, kwnames=kwnames@entry=0x0) at Python/ceval.c:4809
#43 0x0000000000542057 in _PyEval_EvalFrameDefault (f=<optimized out>,
    throwflag=<optimized out>) at Python/ceval.c:3275
#44 0x000000000053d475 in PyEval_EvalFrameEx (throwflag=0, f=0xd97378)
    at Python/ceval.c:718
#45 _PyEval_EvalCodeWithName (_co=_co@entry=0x7f7aaa0ecf60,
    globals=globals@entry=0x7f7aaa188168, locals=locals@entry=0x7f7aaa0ecf60,
    args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0,
    kwargs=0x8, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0,
    closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4119
#46 0x000000000053e2a3 in PyEval_EvalCodeEx (closure=0x0, kwdefs=0x0,
    defcount=0, defs=0x0, kwcount=0, kws=0x0, argcount=0, args=0x0,
    locals=locals@entry=0x7f7aaa0ecf60, globals=globals@entry=0x7f7aaa188168,
    _co=_co@entry=0x7f7aaa0ecf60) at Python/ceval.c:4140
#47 PyEval_EvalCode (co=co@entry=0x7f7aaa0ecf60,
    globals=globals@entry=0x7f7aaa170168, locals=locals@entry=0x7f7aaa170168)
    at Python/ceval.c:695
#48 0x000000000042737f in run_mod (arena=0x7f7aaa188168, flags=0x7ffc58d8f3a0,
    locals=0x7f7aaa170168, globals=0x7f7aaa170168, filename=0x7f7aaa07df48,
    mod=0xde81b8) at Python/pythonrun.c:980
#49 PyRun_FileExFlags (fp=0xd93660, filename_str=<optimized out>,
    start=<optimized out>, globals=0x7f7aaa170168, locals=0x7f7aaa170168,
    closeit=1, flags=0x7ffc58d8f3a0) at Python/pythonrun.c:933
#50 0x00000000004275ac in PyRun_SimpleFileExFlags (fp=0xd93660,
    filename=<optimized out>, closeit=1, flags=0x7ffc58d8f3a0)
    at Python/pythonrun.c:396
#51 0x000000000043b919 in run_file (p_cf=0x7ffc58d8f3a0,
    filename=0xd522a0 L"./scan", fp=0xd93660) at Modules/main.c:320
#52 Py_Main (argc=argc@entry=5, argv=argv@entry=0xd51010) at Modules/main.c:780
#53 0x000000000041db82 in main (argc=5, argv=<optimized out>)
    at ./Programs/python.c:69

@jsf9k
Copy link
Collaborator

jsf9k commented Nov 2, 2017

@konklone, I learned recently that gdb can walk python stack traces. I don't know what OS you're using, but on most linuxes you install the debug version of python, and then run gdb /path/to/debug/python PID_of_process_to_be_debugged. At that point you should be able to run the py-* functions like py-bt and py-list at the gdb prompt.

You can read more here. I found that I didn't need to do the add-auto-load-safe-path bit at the top of the page. It seemed to work automagically out of the box once I installed the debug version of python.

I hope that helps some. It looks like you're stuck waiting on a mutex, like you said. Using the py-* commands in gdb should let you see what lines of python code lead to the hang.

@konklone
Copy link
Contributor Author

konklone commented Nov 5, 2017

After trying this with synchronous and concurrent sslyze scanners, and adjusting worker counts, it looks like one of the problems (now and historically) has simply been running out of memory on the machine, and the kernel killing processes.

When I use a synchronous scanner, which runs sslyze in-process, the main domain-scan process/workers are what get killed, which surfaces the error more directly and results in a full cessation of the overall scan. But there are logs in /var/log/syslog that show this has been happening for a while, so the kernel was probably killing some of the spawned subprocesses when I was shelling out to sslyze (and then when I was using the concurrent scanner, which triggers sslyze to do its own multiprocess work) and that was generating zombie processes.

I'm not sure what the next step is here yet, though I may just run sslyze in --serial mode (1 worker), which should hopefully keep memory under control (I have a EC2 instance with 8GB of RAM...) for now, at the cost of speed.

@konklone
Copy link
Contributor Author

konklone commented Nov 5, 2017

I'm closing this with the merge of #151, and will re-open if I continue to see this behavior.

@konklone konklone closed this as completed Nov 5, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants