Skip to content

Commit

Permalink
Detect aname loops (draios#753)
Browse files Browse the repository at this point in the history
* Whitespace diffs.

Checking in separate from other changes.

* Combine parent state traversal w/ loop detection

Replace the ad-hoc parent thread state traversal that was in several
filterchecks as well as in the mesos/coreos code with a central way to
traverse parent thread state and detect potential loops at the same
time. A new method traverse_parent_state traverses the parent state from
the current thead and takes a function that is called for each thread
while traversing.

This prevents infinite loops like observed in
falcosecurity/falco#208.

This doesn't address the underlying cause of what caused the thread
state to get corrupted in the first place. That's tracked by a separate
issue draios#752.

In the 4 filterchecks that used to traverse parent state (proc.sname,
proc.loginshellid, proc.aname, proc.apid), replace the direct traversal
with a call to traverse_parent_state + an appropriate visitor
function.

Update mesos's get_env_mesos_task_id, which used to do a combination of
recursion and get_parent_task_id to traverse parent state, with a visitor
and traverse_parent_state. It stops as soon as any of the environment
variables for a thread are found.

This version doesn't explicitly skip pid 1, but I don't think that was
strictly necessary as init wouldn't have those environment variables
anyway.

Also replace a similar process in coreos to find rkt pods.

* Add regression tests for parent state loops

Add a new trace file parent_state_loop.scap to the traces zip that has a
series of processes with malformed parent state containing a loop.

Add 3 new sysdig command lines that test filterchecks/outputs that are
known to traverse parent thread state.

Although they should *not* cause an infinite loop, add a timeout to the
sysdig command line just to make sure it is terminated somewhat quickly.
  • Loading branch information
mstemm authored and Damian Myerscough committed Mar 3, 2017
1 parent 5c525fc commit 7b3be99
Show file tree
Hide file tree
Showing 6 changed files with 154 additions and 58 deletions.
2 changes: 1 addition & 1 deletion test/sysdig_batch_parser.sh
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ do
echo "Corresponding reference file $ref does not exist--skipping"
else
echo "Processing $f"
TZ=UTC eval $SYSDIG -N -r $f $ARGS > $DIRNAME/$(basename $f).output
TZ=UTC eval timeout 60 $SYSDIG -N -r $f $ARGS > $DIRNAME/$(basename $f).output
fi
done

Expand Down
5 changes: 5 additions & 0 deletions test/sysdig_trace_regression.sh
Original file line number Diff line number Diff line change
Expand Up @@ -114,5 +114,10 @@ $BASEDIR/sysdig_batch_parser.sh $SYSDIG $CHISELS "-p '*%evt.num %evt.outputtime
# Cwd
$BASEDIR/sysdig_batch_parser.sh $SYSDIG $CHISELS "-pc -p\"*%evt.num %evt.outputtime %evt.cpu %container.name (%container.id) %proc.name (%thread.tid:%thread.vtid) %evt.dir %evt.type %evt.info %proc.cwd\"" $TRACEDIR $RESULTDIR/cwd $BASELINEDIR/cwd || ret=1

# Testing filters/outputs that can traverse parent thread state
$BASEDIR/sysdig_batch_parser.sh $SYSDIG $CHISELS "-p\"*%evt.num %evt.outputtime %evt.cpu %proc.name (%thread.tid) %evt.dir %evt.type %evt.info LS=%proc.loginshellid\"" $TRACEDIR $RESULTDIR/loginshell-parent-loop $BASELINEDIR/loginshell-parent-loop || ret=1
$BASEDIR/sysdig_batch_parser.sh $SYSDIG $CHISELS "proc.apid=10 or proc.apid=26890" $TRACEDIR $RESULTDIR/apid-parent-loop $BASELINEDIR/apid-parent-loop || ret=1
$BASEDIR/sysdig_batch_parser.sh $SYSDIG $CHISELS "proc.aname=foo or proc.aname=sh" $TRACEDIR $RESULTDIR/aname-parent-loop $BASELINEDIR/aname-parent-loop || ret=1

rm -rf "${TMPBASE}"
exit $ret
48 changes: 31 additions & 17 deletions userspace/libsinsp/container.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -157,22 +157,29 @@ sinsp_container_info* sinsp_container_manager::get_container(const string& conta
string sinsp_container_manager::get_env_mesos_task_id(sinsp_threadinfo* tinfo)
{
string mtid;
if(tinfo)

sinsp_threadinfo::visitor_func_t visitor = [&mtid] (sinsp_threadinfo *ptinfo)
{
// Mesos task ID detection is not a straightforward task;
// this list may have to be extended.
mtid = tinfo->get_env("MESOS_TASK_ID"); // Marathon
if(!mtid.empty()) { return mtid; }
mtid = tinfo->get_env("mesos_task_id"); // Chronos
if(!mtid.empty()) { return mtid; }
mtid = tinfo->get_env("MESOS_EXECUTOR_ID"); // others
if(!mtid.empty()) { return mtid; }
sinsp_threadinfo* ptinfo = tinfo->get_parent_thread();
if(ptinfo && ptinfo->m_tid > 1)
{
mtid = get_env_mesos_task_id(ptinfo);
}
mtid = ptinfo->get_env("MESOS_TASK_ID"); // Marathon
if(!mtid.empty()) { return false; }
mtid = ptinfo->get_env("mesos_task_id"); // Chronos
if(!mtid.empty()) { return false; }
mtid = ptinfo->get_env("MESOS_EXECUTOR_ID"); // others
if(!mtid.empty()) { return false; }

return true;
};

// Try the current thread first. visitor returns true if mtid
// was not filled in. In this case we should traverse the
// parents.
if(tinfo && visitor(tinfo))
{
tinfo->traverse_parent_state(visitor);
}

return mtid;
}

Expand Down Expand Up @@ -361,10 +368,10 @@ bool sinsp_container_manager::resolve_container(sinsp_threadinfo* tinfo, bool qu
{
rkt_appname = tinfo->m_root.substr(prefix + COREOS_PREFIX.size(), suffix - prefix - COREOS_PREFIX.size());
// It is a rkt pod with stage1-coreos
sinsp_threadinfo* tinfo_it = tinfo;
while(!valid_id && tinfo_it != nullptr)

sinsp_threadinfo::visitor_func_t visitor = [&rkt_podid, &container_info, &rkt_appname, &valid_id] (sinsp_threadinfo *ptinfo)
{
for(const auto& env_var : tinfo_it->m_env)
for(const auto& env_var : ptinfo->m_env)
{
auto container_uuid_pos = env_var.find(COREOS_PODID_VAR);
if(container_uuid_pos == 0)
Expand All @@ -374,10 +381,17 @@ bool sinsp_container_manager::resolve_container(sinsp_threadinfo* tinfo, bool qu
container_info.m_id = rkt_podid + ":" + rkt_appname;
container_info.m_name = rkt_appname;
valid_id = true;
break;
return false;
}
}
tinfo_it = tinfo_it->get_parent_thread();
return true;
};

// Try the current thread first. visitor returns true if no coreos pid
// info was found. In this case we traverse the parents.
if (visitor(tinfo))
{
tinfo->traverse_parent_state(visitor);
}
}
}
Expand Down
100 changes: 62 additions & 38 deletions userspace/libsinsp/filterchecks.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1635,18 +1635,26 @@ uint8_t* sinsp_filter_check_thread::extract(sinsp_evt *evt, OUT uint32_t* len, b
// Find the highest ancestor process that has the same session id and
// declare it to be the session leader.
sinsp_threadinfo* mt = tinfo->get_main_thread();
sinsp_threadinfo* pt = NULL;

if(mt == NULL)
{
return NULL;
}

for(pt = mt->get_parent_thread();
pt != NULL && pt->m_sid == mt->m_sid;
mt = pt, pt = pt->get_parent_thread());
int64_t sid = mt->m_sid;
sinsp_threadinfo::visitor_func_t visitor = [sid, &mt] (sinsp_threadinfo *pt)
{
if(pt->m_sid != sid)
{
return false;
}
mt = pt;
return true;
};

mt->traverse_parent_state(visitor);

// At this point pt either doesn't exist or has a different session id.
// mt has been updated to the highest process that has the same session id.
// mt's comm is considered the session leader.
m_tstr = mt->get_comm();
*len = m_tstr.size();
Expand Down Expand Up @@ -1916,15 +1924,23 @@ uint8_t* sinsp_filter_check_thread::extract(sinsp_evt *evt, OUT uint32_t* len, b
}
}

for(; mt != NULL; mt = mt->get_parent_thread())
sinsp_threadinfo::visitor_func_t check_thread_for_shell = [&res] (sinsp_threadinfo *pt)
{
size_t len = mt->m_comm.size();
size_t len = pt->m_comm.size();

if(len >= 2 && mt->m_comm[len - 2] == 's' && mt->m_comm[len - 1] == 'h')
if(len >= 2 && pt->m_comm[len - 2] == 's' && pt->m_comm[len - 1] == 'h')
{
res = &mt->m_pid;
res = &pt->m_pid;
}
}

return true;
};

// First call the visitor on the main thread.
check_thread_for_shell(mt);

// Then check all its parents to see if they are shells
mt->traverse_parent_state(check_thread_for_shell);

return (uint8_t*)res;
}
Expand Down Expand Up @@ -2140,9 +2156,6 @@ uint8_t* sinsp_filter_check_thread::extract(sinsp_evt *evt, OUT uint32_t* len, b

bool sinsp_filter_check_thread::compare_full_apid(sinsp_evt *evt)
{
bool res;
uint32_t j;

sinsp_threadinfo* tinfo = evt->get_thread_info();

if(tinfo == NULL)
Expand All @@ -2169,29 +2182,33 @@ bool sinsp_filter_check_thread::compare_full_apid(sinsp_evt *evt)
//
// No id specified, search in all of the ancestors
//
for(j = 0; mt != NULL; mt = mt->get_parent_thread(), j++)
bool found = false;
sinsp_threadinfo::visitor_func_t visitor = [this, &found] (sinsp_threadinfo *pt)
{
if(j > 0)
bool res;

res = flt_compare(m_cmpop,
PT_PID,
&pt->m_pid);

if(res == true)
{
res = flt_compare(m_cmpop,
PT_PID,
&mt->m_pid);
found = true;

if(res == true)
{
return true;
}
// Can stop traversing parent state
return false;
}
}

return false;
return true;
};

mt->traverse_parent_state(visitor);

return found;
}

bool sinsp_filter_check_thread::compare_full_aname(sinsp_evt *evt)
{
bool res;
uint32_t j;

sinsp_threadinfo* tinfo = evt->get_thread_info();

if(tinfo == NULL)
Expand All @@ -2218,22 +2235,29 @@ bool sinsp_filter_check_thread::compare_full_aname(sinsp_evt *evt)
//
// No id specified, search in all of the ancestors
//
for(j = 0; mt != NULL; mt = mt->get_parent_thread(), j++)
bool found = false;
sinsp_threadinfo::visitor_func_t visitor = [this, &found] (sinsp_threadinfo *pt)
{
if(j > 0)
bool res;

res = flt_compare(m_cmpop,
PT_CHARBUF,
(void*)pt->m_comm.c_str());

if(res == true)
{
res = flt_compare(m_cmpop,
PT_CHARBUF,
(void*)mt->m_comm.c_str());
found = true;

if(res == true)
{
return true;
}
// Can stop traversing parent state
return false;
}
}

return false;
return true;
};

mt->traverse_parent_state(visitor);

return found;
}

bool sinsp_filter_check_thread::compare(sinsp_evt *evt)
Expand Down
48 changes: 46 additions & 2 deletions userspace/libsinsp/threadinfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ void sinsp_threadinfo::init()
m_program_hash = 0;
m_program_hash_falco = 0;
m_lastevent_data = NULL;
m_parent_loop_detected = false;
}

sinsp_threadinfo::~sinsp_threadinfo()
Expand Down Expand Up @@ -368,7 +369,7 @@ void sinsp_threadinfo::init(scap_threadinfo* pi)
m_vtid = pi->vtid;
m_vpid = pi->vpid;
m_clone_ts = pi->clone_ts;

set_cgroups(pi->cgroups, pi->cgroups_len);
m_root = pi->root;
ASSERT(m_inspector);
Expand Down Expand Up @@ -640,7 +641,7 @@ string sinsp_threadinfo::get_cwd()
{
// Ideally we should use get_cwd_root()
// but scap does not read CLONE_FS from /proc
// Also glibc and muslc use always
// Also glibc and muslc use always
// CLONE_THREAD|CLONE_FS so let's use
// get_main_thread() for now
sinsp_threadinfo* tinfo = get_main_thread();
Expand Down Expand Up @@ -767,6 +768,49 @@ uint64_t sinsp_threadinfo::get_fd_limit()
return get_main_thread()->m_fdlimit;
}

void sinsp_threadinfo::traverse_parent_state(visitor_func_t &visitor)
{
// Use two pointers starting at this, traversing the parent
// state, at different rates. If they ever equal each other
// before slow is NULL there's a loop.

sinsp_threadinfo *slow=this->get_parent_thread(), *fast=slow;

// Move fast to its parent
fast = (fast ? fast->get_parent_thread() : fast);

while(slow)
{
if(!visitor(slow))
{
break;
}

// Advance slow one step and advance fast two steps
slow = slow->get_parent_thread();

// advance fast 2 steps, checking to see if we meet
// slow after each step.
for (uint32_t i = 0; i < 2; i++) {
fast = (fast ? fast->get_parent_thread() : fast);

// If not at the end but fast == slow, there's a loop
// in the thread state.
if(slow && (slow == fast))
{
// Note we only log a loop once for a given main thread, to avoid flooding logs.
if(!m_parent_loop_detected)
{
g_logger.log(string("Loop in parent thread state detected for pid ") +
std::to_string(m_pid), sinsp_logger::SEV_WARNING);
m_parent_loop_detected = true;
}
return;
}
}
}
}

sinsp_threadinfo* sinsp_threadinfo::lookup_thread()
{
return m_inspector->get_thread(m_pid, true, true);
Expand Down
9 changes: 9 additions & 0 deletions userspace/libsinsp/threadinfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,14 @@ class SINSP_PUBLIC sinsp_threadinfo
*/
uint64_t get_fd_limit();

//
// Walk up the parent process heirarchy, calling the provided
// function for each node. If the function returns false, the
// traversal stops.
//
typedef std::function<bool (sinsp_threadinfo *)> visitor_func_t;
void traverse_parent_state(visitor_func_t &visitor);

//
// Core state
//
Expand Down Expand Up @@ -325,6 +333,7 @@ VISIBILITY_PRIVATE
uint16_t m_lastevent_type;
uint16_t m_lastevent_cpuid;
sinsp_evt::category m_lastevent_category;
bool m_parent_loop_detected;

friend class sinsp;
friend class sinsp_parser;
Expand Down

0 comments on commit 7b3be99

Please sign in to comment.