Async container info #1326

Merged: mstemm merged 15 commits into dev on Apr 19, 2019

Conversation

@mstemm (Contributor) commented Mar 7, 2019

Changes to stop looking up docker container metadata inline with event processing and perform the lookups in the background instead. A new event PPME_CONTAINER_JSON_E (*) signals when the metadata lookup is complete and full information about the container exists.

While the metadata lookup is underway, a "stub" container is created with name="incomplete" and image="incomplete". This ensures that there is a valid container object for processes running in a container. The obvious downside is that, while the lookup is underway, nothing is known about the container that initial activity is running in, other than the container id.

This fixes #1321.

(*) The event previously existed, but was marked EC_INTERNAL so it was hidden and never returned by the inspector. It was used to save container information to trace files.
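
To make the flow concrete, here is a rough sketch of the stub-container step described above (field names follow sinsp_container_info as it appears elsewhere in this PR; add_container and the scheduling call are simplified stand-ins, not the exact code):

// Sketch only: illustrates the "incomplete" stub described above.
sinsp_container_info stub;
stub.m_id = container_id;          // the only thing known up front
stub.m_type = CT_DOCKER;           // docker runs first; cri may replace this later
stub.m_name = "incomplete";        // placeholder values until metadata arrives
stub.m_image = "incomplete";
stub.m_metadata_complete = false;  // flipped back to true when the async lookup finishes

// Register the stub so threads in the container resolve to a valid object,
// then kick off the background metadata fetch (hypothetical helper call).
manager->add_container(stub, tinfo);
schedule_async_lookup(container_id);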

@mstemm (Contributor, Author) commented Mar 7, 2019

I think this is in pretty good shape, although I still want to do some perf tests. Could you take a look?

@mstemm (Contributor, Author) commented Mar 8, 2019

Ok, could you take another look? One pending source of failures on OSX is tbb. I could just protect all of that with HAS_CAPTURE, but I'd like to avoid it if possible.

@adalton (Contributor) left a comment:

The only remaining comment is non-blocking.

@mstemm (Contributor, Author) commented Mar 11, 2019

Any opinions on getting rid of m_enabled = false on the first failed container metadata lookup? I think it's more robust this way: in busy environments I found it was pretty easy to hit a single lookup failure, which would effectively disable all future lookups.

@@ -195,7 +156,6 @@ docker::docker_response libsinsp::container_engine::docker::get_docker(sinsp_con
{
case 0: /* connection failed, apparently */
g_logger.format(sinsp_logger::SEV_NOTICE, "Docker connection failed, disabling Docker container engine");
m_enabled = false;
Contributor:

where does this happen now? or do we keep trying to hit docker forever now?

@mstemm (Author):

Yeah that was my last general comment on the PR. We always try to fetch forever now. The general consensus was that this was better than giving up forever after the first failure.

/* PPME_CONTAINER_JSON_E */{"container", EC_INTERNAL, EF_SKIPPARSERESET | EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } },
/* PPME_CONTAINER_JSON_X */{"container", EC_INTERNAL, EF_UNUSED, 0},
/* PPME_CONTAINER_JSON_E */{"container", EC_PROCESS, EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } },
/* PPME_CONTAINER_JSON_X */{"container", EC_PROCESS, EF_UNUSED, 0},
Contributor:

Why the flags change? (not saying it's unneeded, trying to understand)

@mstemm (Author):

EC_INTERNAL means they aren't returned from sinsp::next(), but we rely on the container event now to indicate that a container has been created. EF_SKIPPARSERESET means the event would be skipped by falco and also that the thread state isn't properly created in sinsp_parser::reset(), but we want the thread info for the event.

return Json::FastWriter().write(obj);
}

bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt)
bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt, int64_t tid)
Contributor:

aren't tids unsigned?

container_info.m_type = s_cri_runtime_type;
// It might also be a docker container, so set the type
// to UNKNOWN for now.
container_info.m_type = CT_UNKNOWN;
Contributor:

What happens when both docker and cri fail to return metadata for a container? Are we stuck with CT_UNKNOWN?

@mstemm (Author):

Yes. I suppose we could have a combined type or just pick one of them, but I think it'd be better to have this explicit type for when we don't know the metadata.

@mstemm (Author) commented Mar 28, 2019:

@gnosek and I chatted offline and we're going to get rid of CT_UNKNOWN and rely on m_metadata_complete to internally denote that container info is incomplete. Since docker runs first, it will set the type to CT_DOCKER. If cri looks up the info successfully, it will set the type properly. If both fail, the type will remain at CT_DOCKER, which seems as good as you can do given that the cgroup layouts are the same.

// use this to ensure that the tid of the CONTAINER_JSON event
// we eventually emit is the top running thread in the
// container.
typedef tbb::concurrent_hash_map<std::string, int64_t> top_tid_table;
Contributor:

Why do we need to track the top threadinfo (or rather, why do we need a special thread chosen from the container at all)?

What happens if we fail to find an event from the toppest-of-top tids during the async lookup (and presumably use a child tid instead)?

@mstemm (Author):

We track the threadinfo so the container event has some thread info associated with it. And we need to update this on the fly as it's very common that the initial process that created the container exits once the actual entrypoint is started, so without any extra steps you'd have a container event with a zombie threadinfo.

It's not the end of the world if it's not exactly the top thread, but it is useful for it to be some thread in the container.
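
For reference, a minimal, self-contained sketch of how a tbb::concurrent_hash_map can track the current "top tid" per container (illustrative only; update_top_tid/lookup_top_tid are hypothetical helpers, not the PR's exact code):

#include <cstdint>
#include <string>
#include <tbb/concurrent_hash_map.h>

typedef tbb::concurrent_hash_map<std::string, int64_t> top_tid_table;

// Record (or overwrite) the best-known tid for a container. The accessor holds
// a per-bucket lock, so concurrent updates from the capture and async lookup
// threads are safe.
void update_top_tid(top_tid_table& table, const std::string& container_id, int64_t tid)
{
	top_tid_table::accessor acc;
	table.insert(acc, container_id);  // inserts a default value if the key is new
	acc->second = tid;
}

// Find the tid to attach to the eventual CONTAINER_JSON event; -1 means no
// thread is known for this container.
int64_t lookup_top_tid(const top_tid_table& table, const std::string& container_id)
{
	top_tid_table::const_accessor acc;
	return table.find(acc, container_id) ? acc->second : -1;
}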

Contributor:

I'm still not convinced we need a tid in the container event. Where do we use it?

@mattpag (Contributor) left a comment:

Since we aren't doing any filtering of containerinfos with incomplete metadata in the container.* filterchecks code, use cases like sysdig 'container.id != host' -p "%container.image" are now returning incomplete until the fetching is completed.
Can we go ahead and do it? Or have you already thought about this and something worse happens if we do?

This also means that Falco rules with "negative predicates" (!=, not in, ...) based on most container fields will misbehave unless we address it in some way, right?

@@ -134,7 +134,8 @@ class sinsp_container_info
m_cpu_period(100000),
m_has_healthcheck(false),
m_healthcheck_exe(""),
m_is_pod_sandbox(false)
m_is_pod_sandbox(false),
m_metadata_complete(true)
@mattpag (Contributor) commented Apr 1, 2019:

Why isn't the default value false? Can't we set it to true immediately if the metadata fetch is synchronous and when we have all the data for the async case? (this is also related to the "top-level" PR review comment)

@mstemm (Author):

I kept the default at true so container engines that don't have any notion of background metadata lookups can just create a container_info and know the metadata will be flagged as complete. It's only docker (and later cri) that will set the flag to false and then true when the metadata is all fetched.
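
In other words, the intended lifecycle is roughly the following (a sketch, not the exact PR code):

// Engines with synchronous lookups just use the default:
sinsp_container_info info;             // m_metadata_complete == true by default

// The docker (and later cri) engine flips it around the async fetch:
stub.m_metadata_complete = false;      // when the "incomplete" stub is created
// ... background lookup runs ...
full_info.m_metadata_complete = true;  // when the full metadata is stored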

@mstemm (Contributor, Author) commented Apr 1, 2019

About the container info with incomplete metadata: I'll check, but I thought the way I had it set up, the container event (i.e. an event that can be returned via sinsp::next()) is only emitted when the info is complete. The container manager does create a container info object with image, etc. == incomplete, but I intended that no sinsp event is emitted for it.

Yeah, double-checked with the following: sudo ./userspace/sysdig/sysdig evt.type=container while running docker run busybox:latest. Only got 1 container event:

56033 16:37:10.038194346 0 <NA> (3091) > container json={"container":{"Mounts":[],"cpu_period":100000,"cpu_quota":0,"cpu_shares":1024,"env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"id":"59447b0e4b9e","image":"busybox:latest","imagedigest":"sha256:061ca9704a714ee3e8b80523ec720c64f6209ad3f97c0ff7cb9ec7d19f15149f","imageid":"d8233ab899d419c58cf3634c0df54ff5d8acc28f8173f09c21df4a07229e1205","imagerepo":"busybox","imagetag":"latest","ip":"172.17.0.2","is_pod_sandbox":false,"labels":null,"memory_limit":0,"name":"determined_mclean","port_mappings":[],"privileged":false,"swap_limit":0,"type":0}}

@mattpag (Contributor) commented Apr 2, 2019

Yes the container event is emitted only when the metadata fetching is completed, but that's not what I'm talking about.

Since the container_info is available from the container_manager as soon as we start the async fetch, container filterchecks will be able to fetch it and use it but will return wrong results.
Just try the example command I pasted above and you'll see that incomplete is returned for all the syscalls processed while the async fetch is still in progress (since the filtercheck handling code will pick up the "placeholder" container_info), then when it completes you'll start seeing the correct image name.

And let me reiterate: this means that Falco rules with "negative predicates" (!=, not in, ...) based on most container fields will misbehave unless we address it in some way, because you'll have false positives (if we return incomplete for the first few hundred syscalls, a rule like container.image not in (foo, bar) will incorrectly trigger for all of them).

@mstemm (Contributor, Author) commented Apr 2, 2019

Ok, here's a falco rules change that should make the container_started macro only trigger when there is complete metadata: falcosecurity/falco#570.

const Json::Value& mesos_task_id = container["mesos_task_id"];
if(!mesos_task_id.isNull() && mesos_task_id.isConvertibleTo(Json::stringValue))
{
container_info.m_mesos_task_id = mesos_task_id.asString();
}

#ifdef HAS_ANALYZER
Contributor:

nit: can we drop the #ifdef? I'd hope we eventually remove the flag completely

@mstemm (Author):

Sure, I'll remove it, but I'll remove it around all references of m_metadata_deadline, not just this one, right?

Commits

Add a non-blocking queue to the inspector, and in sinsp::next() try to read events from that queue, similar to how we look at m_metaevt. If any events are found, they are returned from sinsp::next() instead of being read from libscap.

In order to support truly portable sinsp_evt objects, we need (optional) storage for the matching libscap event in m_pevt. This is m_pevt_storage, which is NULL by default and freed when non-NULL. It will be used to pass container events to the inspector via the queue.
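
A self-contained sketch of the non-blocking queue pattern this commit describes (using tbb::concurrent_queue; the struct and names are illustrative, not the inspector's actual members):

#include <cstdio>
#include <string>
#include <tbb/concurrent_queue.h>

// Stand-in for a queued container event; in the real code the queued object
// carries its own copy of the scap event buffer (m_pevt_storage) so it stays
// valid until sinsp::next() hands it out.
struct pending_container_evt { std::string json; };

int main()
{
	tbb::concurrent_queue<pending_container_evt*> pending;

	// Producer side (async lookup callback): push a fully formed event.
	pending.push(new pending_container_evt{"{\"container\":{}}"});

	// Consumer side (sinsp::next()): drain the queue without blocking before
	// falling back to the normal libscap read path.
	pending_container_evt* evt = nullptr;
	while(pending.try_pop(evt))
	{
		std::printf("dispatching container event: %s\n", evt->json.c_str());
		delete evt;
	}
	return 0;
}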
Make sure container_json events have a real thread id by passing it as an argument to container_to_sinsp_event.

Modify notify_new_container to use the non-blocking queue to pass events to the inspector instead of m_meta_evt.

Previously, CONTAINER_JSON events had some, but not all, of the info that was in a sinsp_container_info object. Fix this so all important info is both dumped to json in container_to_json and parsed in parse_container_json_evt.

Refactor docker metadata fetches to be asynchronous, using the framework
in sysdig::async_key_value_source.

docker_async_source (a global static, so it can be long-lived) is now responsible for looking up docker metadata. Some methods that used to be in docker::, like parse_docker() and get_docker(), move to docker_async_source. It dequeues requests, calls parse_docker() to look up the information as needed, and calls store_value() to pass the metadata back to the caller.
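
Roughly, the async_key_value_source pattern this describes looks like the following (a sketch based on the description above; the framework's exact signatures live in sysdig's async_key_value_source header and may differ in detail, and the base-class constructor arguments are omitted):

// Sketch only: a docker_async_source along the lines described above.
class docker_async_source : public sysdig::async_key_value_source<std::string, sinsp_container_info>
{
protected:
	// Runs on the background thread managed by the framework.
	void run_impl() override
	{
		std::string container_id;

		// Pull queued lookup requests and resolve each one off the
		// event-processing path (dequeue method name approximate).
		while(dequeue_next_key(container_id))
		{
			sinsp_container_info info;
			info.m_id = container_id;

			// Blocking HTTP fetch + JSON parse, now safely off the hot path
			// (argument list simplified here).
			parse_docker(container_id, info);

			// Hand the result back to the framework; this triggers the caller's
			// callback, which ultimately emits the PPME_CONTAINER_JSON_E event.
			store_value(container_id, info);
		}
	}
};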

docker::resolve now uses parse_docker_async to schedule the lookup with docker_async_source. Before scheduling the lookup it creates a stub container with type UNKNOWN (UNKNOWN because you can't tell the difference between cri and docker containers from the cgroup alone), with the id set and the name/image set to "incomplete". This ensures that threads have some associated container info object. In the callback, once the full metadata is available, it calls notify_new_container, which creates the CONTAINER_JSON event and pushes it to the inspector.

There's a fair amount of bookkeeping to make sure that the container metadata has a valid tid. The very first tid that creates the container often exits after forking off the real container entrypoint, so you need to keep track of the "top tid" in the container for every call to docker::resolve() and replace it if you find it has exited.

Previously, on the first error fetching container metadata, a flag
m_enabled would be set to false, and all subsequent attempts to fetch
container metadata would be skipped. Now that lookups are done in the
background, I think it makes sense to always try a lookup for every
container, even after failures. So remove m_enabled entirely from the
docker engine.

Also, as a part of this change, reorganize the way the async lookups are done to better support the different OS and build choices (Linux, Windows, and HAS_CAPTURE). Instead of compiling out the curl handles when HAS_CAPTURE is false, always compile the code for them but don't even attempt any lookups in docker::resolve. Note that docker_async_source::get_docker no longer changes behavior based on HAS_CAPTURE.

cri::resolve might be called after docker::resolve, so there could be a container_info object with type == UNKNOWN.

Update it to handle this, i.e. do the lookup anyway.

We need this now that tbb is a core part of the inspector and the curl part of the docker container engine isn't #ifdef'd with HAS_CAPTURE.

Both are already downloaded/built as dependencies of sysdig, so it's just a matter of CMakeLists.txt config.

Also, disable libidn2 explicitly when building libcurl. On OSX builds this gets picked up from an autodetected library, while it does not get picked up on Linux. We already disable libidn1, so also disable libidn2.

Instead of inferring is_pod_sandbox from the container name (which was only done for docker), save/load it to json directly from the container info. This ensures it works for container engines other than docker.

Instead of having a CT_UNKNOWN type, initially set the type to CT_DOCKER when starting the async lookup. Cri runs next, and if the grpc lookup completes successfully this will be replaced with CT_CRIO. If both the grpc and docker metadata lookups fail, the type will remain at CT_DOCKER.

It's a bit simpler than including sinsp.h, which has many, many dependent header files.

It's required for the container.* filterchecks to work.

Call detect_docker, which identifies a docker container, before starting the async lookup thread. If no docker containers are ever used, the async thread won't be started.

Instead of just setting the image to incomplete, set all container metadata fields to incomplete.

It's filled in when using the analyzer, but we can define it in both cases and just not use it when there is no analyzer.

@mstemm merged commit 199d5a6 into dev on Apr 19, 2019
@mstemm deleted the async-container-info branch on April 19, 2019 01:25
@davideschiera restored the async-container-info branch on May 23, 2019 03:30

Closes: Docker container metadata fetching is blocking, which can result in dropped system calls (#1321)