
Async container info #1326

Merged: mstemm merged 15 commits into dev from async-container-info on Apr 19, 2019

Conversation

@mstemm (Contributor) commented Mar 7, 2019

Changes to stop looking up docker container metadata inline with event processing and instead perform the lookups in the background. A new event, PPME_CONTAINER_JSON_E (*), signals when the metadata lookup is complete and full information about the container exists.

While the metadata lookup is underway, a "stub" container is created with name="incomplete" and image="incomplete". This ensures there is a valid container object for processes running in the container. The downside is that while the lookup is underway, initial activity in the container can't report anything about the container it's running in other than the container id.

This fixes #1321.

(*) The event previously existed, but had the EC_INTERNAL flag, so it was hidden and never returned by the inspector. It was used to save container information to trace files.
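As a rough sketch of the stub described above (field and method names follow sinsp_container_info / sinsp_container_manager as used elsewhere in this PR; treat this as illustrative, not the exact code):

// Illustrative sketch of the placeholder container created while the async
// metadata lookup is in flight. Only the container id is known for certain.
sinsp_container_info stub;
stub.m_id = container_id;          // parsed from the process's cgroups
stub.m_type = CT_DOCKER;
stub.m_name = "incomplete";        // placeholder until the lookup finishes
stub.m_image = "incomplete";
stub.m_metadata_complete = false;  // flipped to true once the async fetch completes
manager->add_container(stub, tinfo);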

@mstemm mstemm force-pushed the async-container-info branch 2 times, most recently from 843d0ea to 859bafa Mar 7, 2019

@mstemm mstemm force-pushed the async-container-info branch from 0ab8777 to aa5c232 Mar 7, 2019

@mstemm mstemm requested review from gnosek and mattpag Mar 7, 2019

@mstemm (Contributor, Author) commented Mar 7, 2019

I think this is in pretty good shape, although I still want to do some perf tests. Could you take a look?

@mstemm (Contributor, Author) commented Mar 8, 2019

Ok, could you take another look? One pending source of failures on OSX is tbb. I could just change it to protect all that stuff with HAS_CAPTURE, but I'd like to avoid that if possible.

@adalton (Contributor) left a comment

The only remaining comment is non-blocking.

@mstemm mstemm force-pushed the async-container-info branch from 09ca9ec to 60e4ce5 Mar 11, 2019

@mstemm (Contributor, Author) commented Mar 11, 2019

Any opinions on getting rid of m_enabled = false on the first failed container metadata lookup? I think it's more robust this way: in busy environments I found it was pretty easy to hit a single lookup failure, which would effectively disable all future lookups.

@mstemm mstemm force-pushed the async-container-info branch 4 times, most recently from b14a0ec to 8fe1adb Mar 12, 2019

@mstemm mstemm force-pushed the async-container-info branch from d332d31 to 0eb84d4 Mar 15, 2019

Resolved review thread on userspace/libsinsp/parsers.cpp (outdated)
@@ -195,7 +156,6 @@ docker::docker_response libsinsp::container_engine::docker::get_docker(sinsp_con
 {
 case 0: /* connection failed, apparently */
 g_logger.format(sinsp_logger::SEV_NOTICE, "Docker connection failed, disabling Docker container engine");
-m_enabled = false;

@gnosek (Contributor) commented Mar 25, 2019

where does this happen now? or do we keep trying to hit docker forever now?

@mstemm (Author, Contributor) commented Mar 25, 2019

Yeah, that was my last general comment on the PR. We always try to fetch forever now. The general consensus was that this was better than giving up permanently after the first failure.

@mstemm mstemm force-pushed the async-container-info branch from 6a5a176 to 6e888f6 Mar 26, 2019

-/* PPME_CONTAINER_JSON_E */{"container", EC_INTERNAL, EF_SKIPPARSERESET | EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } },
-/* PPME_CONTAINER_JSON_X */{"container", EC_INTERNAL, EF_UNUSED, 0},
+/* PPME_CONTAINER_JSON_E */{"container", EC_PROCESS, EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } },
+/* PPME_CONTAINER_JSON_X */{"container", EC_PROCESS, EF_UNUSED, 0},

@gnosek (Contributor) commented Mar 28, 2019

Why the flags change? (not saying it's unneeded, trying to understand)

@mstemm (Author, Contributor) commented Mar 28, 2019

EC_INTERNAL means they aren't returned from sinsp::next(), but we now rely on the container event to indicate that a container has been created. EF_SKIPPARSERESET means the event would be skipped by falco, and also that the thread state isn't properly created in sinsp_parser::reset(), but we want the thread info for the event.

return Json::FastWriter().write(obj);
}

-bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt)
+bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt, int64_t tid)

@gnosek (Contributor) commented Mar 28, 2019

aren't tids unsigned?


-container_info.m_type = s_cri_runtime_type;
+// It might also be a docker container, so set the type
+// to UNKNOWN for now.
+container_info.m_type = CT_UNKNOWN;

@gnosek (Contributor) commented Mar 28, 2019

What happens when both docker and cri fail to return metadata for a container? Are we stuck with CT_UNKNOWN?

@mstemm (Author, Contributor) commented Mar 28, 2019

Yes. I suppose we could have a combined type or just pick one of them, but I think it'd be better to have this explicit type for when we don't know the metadata.

@mstemm (Author, Contributor) commented Mar 28, 2019

@gnosek and I chatted offline and we're going to get rid of CT_UNKNOWN and rely on m_metadata_complete to internally denote that container info is incomplete. Since docker runs first, it will set the type to CT_DOCKER. If cri looks up the info successfully, it will set the type properly. If both fail, the type will remain CT_DOCKER, which is as good a guess as you can make given that the cgroup layouts are the same.

// use this to ensure that the tid of the CONTAINER_JSON event
// we eventually emit is the top running thread in the
// container.
typedef tbb::concurrent_hash_map<std::string, int64_t> top_tid_table;

@gnosek (Contributor) commented Mar 28, 2019

Why do we need to track the top threadinfo (or rather, why do we need a special thread chosen from the container at all)?

What happens if we fail to find an event from the toppest-of-top tids during the async lookup (and presumably use a child tid instead)?

@mstemm (Author, Contributor) commented Mar 28, 2019

We track the threadinfo so the container event has some thread info associated with it. We need to update this on the fly because it's very common for the initial process that created the container to exit once the actual entrypoint is started; without any extra steps, you'd have a container event with a zombie threadinfo.

It's not the end of the world if it's not exactly the top thread, but it is useful for it to be some thread in the container.
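A minimal sketch of that replace-on-exit bookkeeping, using the tbb map from the diff above (the helper function is hypothetical; tbb's accessor locks the entry while we write):

#include <tbb/concurrent_hash_map.h>
#include <cstdint>
#include <string>

typedef tbb::concurrent_hash_map<std::string, int64_t> top_tid_table;

// Hypothetical helper: record the current "top" tid for a container so the
// eventual CONTAINER_JSON event can carry a thread that is still alive.
void update_top_tid(top_tid_table& tbl, const std::string& container_id, int64_t tid)
{
	top_tid_table::accessor acc;
	tbl.insert(acc, container_id); // inserts the key if absent, locks the row
	acc->second = tid;             // e.g. replace a tid that has since exited
}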

@gnosek (Contributor) commented Apr 17, 2019

I'm still not convinced we need a tid in the container event. Where do we use it?

@mattpag (Contributor) left a comment

Since we aren't doing any filtering of container infos with incomplete metadata in the container.* filterchecks code, use cases like sysdig 'container.id != host' -p "%container.image" now return incomplete until the fetch is completed.
Can we go ahead and do it? Or have you already thought about this, and something worse happens if we do?

This also means that Falco rules with "negative predicates" (!=, not in, ...) based on most container fields will misbehave unless we address it in some way, right?

@@ -134,7 +134,8 @@ class sinsp_container_info
 m_cpu_period(100000),
 m_has_healthcheck(false),
 m_healthcheck_exe(""),
-m_is_pod_sandbox(false)
+m_is_pod_sandbox(false),
+m_metadata_complete(true)

@mattpag (Contributor) commented Apr 1, 2019

Why isn't the default value false? Can't we set it to true immediately if the metadata fetch is synchronous and when we have all the data for the async case? (this is also related to the "top-level" PR review comment)

@mstemm (Author, Contributor) commented Apr 1, 2019

I kept the default at true so container engines that don't have any notion of background metadata lookups can just create a container_info and know the metadata will be flagged as complete. It's only docker (and later cri) that set the flag to false and then back to true once the metadata is fully fetched.
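Roughly the intended lifecycle, as a sketch (only the m_metadata_complete flag comes from the diff above; the surrounding flow is paraphrased from this PR):

// Engines with synchronous lookups: nothing to do, the default is complete.
sinsp_container_info info;           // m_metadata_complete == true

// Docker/cri async path: mark the stub incomplete while the fetch runs.
info.m_metadata_complete = false;    // stub container, lookup scheduled
// ... async lookup finishes, callback fills in the real metadata ...
info.m_metadata_complete = true;     // now safe for container.* filterchecks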

@mstemm (Contributor, Author) commented Apr 1, 2019

About the container info with incomplete metadata: I'll check, but I thought the way I had it set up, the container event (i.e. an event that can be returned via sinsp::next()) is only emitted when the info is complete. The container manager does create a container info object with image, etc. == incomplete, but I intended that no sinsp event is emitted for it.

Yeah, double-checked with the following: sudo ./userspace/sysdig/sysdig evt.type=container while running docker run busybox:latest. Only got 1 container event:

56033 16:37:10.038194346 0 <NA> (3091) > container json={"container":{"Mounts":[],"cpu_period":100000,"cpu_quota":0,"cpu_shares":1024,"env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"id":"59447b0e4b9e","image":"busybox:latest","imagedigest":"sha256:061ca9704a714ee3e8b80523ec720c64f6209ad3f97c0ff7cb9ec7d19f15149f","imageid":"d8233ab899d419c58cf3634c0df54ff5d8acc28f8173f09c21df4a07229e1205","imagerepo":"busybox","imagetag":"latest","ip":"172.17.0.2","is_pod_sandbox":false,"labels":null,"memory_limit":0,"name":"determined_mclean","port_mappings":[],"privileged":false,"swap_limit":0,"type":0}}
@mattpag (Contributor) commented Apr 2, 2019

Yes, the container event is emitted only when the metadata fetching is completed, but that's not what I'm talking about.

Since the container_info is available from the container_manager as soon as we start the async fetch, container filterchecks will be able to pick it up and use it, but will return wrong results.
Just try the example command I pasted above and you'll see that incomplete is returned for all the syscalls processed while the async fetch is still in progress (since the filtercheck handling code will pick up the "placeholder" container_info); then, when it completes, you'll start seeing the correct image name.

And let me reiterate: this means that Falco rules with "negative predicates" (!=, not in, ...) based on most container fields will misbehave unless we address it in some way, because you'll have false positives (if we return incomplete for the first few hundred syscalls, a rule like container.image not in (foo,bar) will incorrectly trigger for all of them).

@mstemm mstemm force-pushed the async-container-info branch from 34d7859 to d3cea6d Apr 2, 2019

@mstemm (Contributor, Author) commented Apr 2, 2019

Ok, here's a falco rules change that should make the container_started macro only trigger when there is complete metadata: falcosecurity/falco#570.

@mstemm mstemm force-pushed the async-container-info branch from ccaa47c to bdcd391 Apr 2, 2019

const Json::Value& mesos_task_id = container["mesos_task_id"];
if(!mesos_task_id.isNull() && mesos_task_id.isConvertibleTo(Json::stringValue))
{
container_info.m_mesos_task_id = mesos_task_id.asString();
}

#ifdef HAS_ANALYZER

@gnosek (Contributor) commented Apr 17, 2019

nit: can we drop the #ifdef? I'd hope we eventually remove the flag completely

@mstemm (Author, Contributor) commented Apr 17, 2019

Sure, I'll remove it, but I'll remove it around all references of m_metadata_deadline, not just this one, right?

mstemm added some commits Mar 7, 2019

Read from a queue of pending events in sinsp::next
Add a non-blocking queue to the inspector and in sinsp::next() try to
read events from that queue, similar to how we look at m_metaevt.

If any events are found, they are returned from sinsp::next() instead of reading from libscap.

In order to support truly portable sinsp_evt objects, we need (optional) storage for the matching libscap event in m_pevt. This lives in m_pevt_storage, which is NULL by default and freed when non-NULL. This will be used to pass container events to the inspector via the queue.
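A sketch of that read path, assuming a tbb::concurrent_queue for the pending events (the stand-in event type and the queue/function names are illustrative, not the PR's actual members):

#include <tbb/concurrent_queue.h>

struct pending_evt { const char* json; };      // stand-in for sinsp_evt

tbb::concurrent_queue<pending_evt*> g_pending; // illustrative queue member

// Mirrors the m_metaevt-style check described above: drain queued
// (container) events before falling back to the normal libscap read.
pending_evt* next_event()
{
	pending_evt* evt = nullptr;
	if(g_pending.try_pop(evt))  // non-blocking
	{
		return evt;         // the event owns its backing storage
	}                           // (m_pevt_storage), freed after delivery
	return nullptr;             // here: read from libscap instead
}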
Add tid for container events, pass to inspector
Make sure container_json events have a real thread id by passing it as
an argument to container_to_sinsp_event.

Modify notify_new_container to use the non-blocking queue to pass events
to the inspector instead of m_meta_evt.
Add all info to parsing/dumping of json events
Previously, CONTAINER_JSON events had some, but not all, of the info
that was in a sinsp_container_info object. Fix this so all important
info is both dumped to json in container_to_json and parsed in
parse_container_json_evt.
Refactor docker info fetches to be async
Refactor docker metadata fetches to be asynchronous, using the framework
in sysdig::async_key_value_source.

docker_async_source (a global static so it can be long-lived) is now
responsible for looking up docker metadata. Some methods that used to be
in docker::, like parse_docker(), get_docker(), etc., move to
docker_async_source. It dequeues requests, calling parse_docker() to
look up the information as needed, and calls store_value() to pass the
metadata back to the caller.

docker::resolve now uses parse_docker_async to schedule the lookup with
docker_async_source. Before scheduling the lookup it creates a stub
container with type UNKNOWN (UNKNOWN because you can't tell the
difference between cri and docker containers only from the cgroup), with
the id set, and with a name/image set to "incomplete". This ensures that
threads have an associated container info object. In the
callback once the full metadata is available, it calls
notify_new_container, which creates the CONTAINER_JSON event and pushes
it to the inspector.

There's a fair amount of bookkeeping to make sure that the container
metadata has a valid tid. The very first tid that creates the container
often exits after forking off the real container entrypoint, so you need
to keep track of the "top tid" in the container for every call to
docker::resolve() and replace it if you find it's exited.

Previously, on the first error fetching container metadata, a flag
m_enabled would be set to false, and all subsequent attempts to fetch
container metadata would be skipped. Now that lookups are done in the
background, I think it makes sense to always try a lookup for every
container, even after failures. So remove m_enabled entirely from the
docker engine.

Also, as a part of this change, reorganize the way the async lookups are
done to better support different OSes (linux, windows) and build
configurations (HAS_CAPTURE). Instead of compiling out the curl handles when
HAS_CAPTURE is false, always compile the code for them but don't even
attempt any lookups in docker::resolve. Note that
docker_async_source::get_docker no longer changes behavior based on
HAS_CAPTURE.
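The docker_async_source shape this commit message describes, as a minimal sketch (run_impl, dequeue_next_key, and store_value follow sysdig::async_key_value_source as named above; the parsing body is elided and illustrative):

class docker_async_source : public sysdig::async_key_value_source<std::string, sinsp_container_info>
{
protected:
	// Runs on the background thread: service queued lookups until idle.
	void run_impl() override
	{
		std::string container_id;
		while(dequeue_next_key(container_id))
		{
			sinsp_container_info info;
			info.m_id = container_id;
			parse_docker(container_id, &info); // queries the docker API
			store_value(container_id, info);   // delivered back to the caller
		}
	}
};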
Fix cri::resolve to work w UNKNOWN container infos
cri::resolve might be called after docker::resolve, so there could be a
container_info object with type == UNKNOWN.

Update it to handle this, i.e. do the lookup anyway.
Use tbb, libcurl on osx, disable libidn2
We need them now that tbb is a core part of the inspector and the curl
part of the docker container engine isn't #ifdef'd out with HAS_CAPTURE.

Both are already downloaded/built as dependencies of sysdig so it's just
a matter of CMakeLists.txt config.

Also, disable libidn2 explicitly when building libcurl. On OSX builds
this gets picked up from an autodetected library, while it does not get
picked up on linux. We already disable libidn1, so also disable libidn2.
Serialize/unserialize is_pod_sandbox
Instead of inferring is_pod_sandbox from the container name only for
docker, save/load it to json directly from the container info. This
ensures it works for other container engines than docker.
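A sketch of the symmetric save/load with JsonCpp, mirroring the mesos_task_id pattern quoted earlier in this review (the is_pod_sandbox key matches the container event JSON shown above; surrounding context is assumed):

// In container_to_json(): write the flag directly.
container["is_pod_sandbox"] = container_info.m_is_pod_sandbox;

// In parse_container_json_evt(): read it back defensively.
const Json::Value& sandbox = container["is_pod_sandbox"];
if(!sandbox.isNull() && sandbox.isConvertibleTo(Json::booleanValue))
{
	container_info.m_is_pod_sandbox = sandbox.asBool();
}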
Get rid of CT_UNKNOWN
Instead of having a CT_UNKNOWN type, initially set the type to CT_DOCKER
when starting the async lookup. Cri runs next, and if the grpc lookup
completes successfully, this will be replaced with CT_CRIO. If both grpc
and docker metadata lookups fail, the type will remain CT_DOCKER.
Just include sinsp class
It's a bit simpler than including sinsp.h, which has a great many dependent
header files.
Add note on why the container evt needs a tid
It's required for the container.* filterchecks to work.
Only start async lookup thread if docker found
Call detect_docker, which identifies a docker container, before starting
the async lookup thread. If no docker containers are ever used, the
async thread won't be started.
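A sketch of that gating (detect_docker is named in the commit message; the member name, signatures, and lazy-init details are illustrative):

// In docker::resolve(): bail out before touching the async machinery
// unless this thread actually looks like a docker container.
std::string container_id;
if(!detect_docker(tinfo, container_id))  // cgroup-based detection
{
	return false;                        // not docker: no thread started
}
if(!m_docker_info_source)                // illustrative lazy-init member
{
	m_docker_info_source.reset(new docker_async_source());
}
// ... schedule the async metadata lookup for container_id ...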
Set all container metadata fields to incomplete
Instead of just setting the image to incomplete, set all container
metadata fields to incomplete.
Don't protect m_metadata_deadline w HAS_ANALYZER
It's filled in when using the analyzer, but we can define it in both
cases and just not use it when there is no analyzer.

@mstemm mstemm merged commit 199d5a6 into dev Apr 19, 2019

2 checks passed: Travis CI - Branch Build (passed), Travis CI - Pull Request Build (passed)

@mstemm mstemm deleted the async-container-info branch Apr 19, 2019

@davideschiera davideschiera restored the async-container-info branch May 23, 2019
