Skip to content

Conversation

@mstemm
Copy link
Contributor

@mstemm mstemm commented Apr 30, 2019

Instead of having container engine objects be short-lived, only created
at the instant resolve() is called, have them be long-lived and a part
of the sinsp_container_manager object. This involves:

  • create a stub base class libsinsp::container_engine::resolver that
    defines virtual void resolve() and virtual cleanup() methods.
  • making all the classes in container_engine/* derive from the base
    class.
  • in sinsp_container_manager, create a list of
    container_engine::resolver objects.
  • when resolving containers, iterate over the list instead of using the
    templated functions resolve_container_impl().

This fixes SMAGENT-1569, because there is no global state related to the
container engines.

@mstemm mstemm requested a review from gnosek April 30, 2019 01:24
Copy link
Contributor

@adalton adalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All nits.

Copy link
Contributor

@gnosek gnosek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the code changes look okay at a glance, I still fail to understand how that helps with locks held over a fork.

@mstemm
Copy link
Contributor Author

mstemm commented Apr 30, 2019

The reason this helps with forks is that some tests did a exit() in a forked child, which close all FILEs, call the destructor of all globals, etc. Since it's a forked child, the destructors were being called twice (in parent + child), and that caused a hang due to one of the semaphores. The corresponding agent PR had a test that explicitly triggers this--aside from this change, I could make the test fail/pass by switching between exit() and _exit() (which skips all that teardown stuff).

@mstemm mstemm force-pushed the SMAGENT-1569-stateful-container-resolvers branch from 91f5561 to e9d73bf Compare April 30, 2019 16:33
Copy link
Contributor

@gnosek gnosek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized exit() only calls destructors of statics/globals, not of local variables, so moving from one to the other will indeed help.

Instead of having container engine objects be short-lived, only created
at the instant resolve() is called, have them be long-lived and a part
of the sinsp_container_manager object. This involves:

 - create a stub base class libsinsp::container_engine::resolver that
   defines virtual void resolve() and virtual cleanup() methods.
 - making all the classes in container_engine/* derive from the base
   class.
 - in sinsp_container_manager, create a list of
   container_engine::resolver objects.
 - when resolving containers, iterate over the list instead of using the
   templated functions resolve_container_impl().

This fixes SMAGENT-1569, because there is no global state related to the
container engines.
@mstemm mstemm force-pushed the SMAGENT-1569-stateful-container-resolvers branch from e9d73bf to 5d8eeb2 Compare April 30, 2019 20:55
mstemm added 2 commits April 30, 2019 16:39
Instead of creating the container engines in the constructor, create
them at the first call to resolve_container(). This gives time to set
things like the cri socket, timeout, etc to alternate values.
Add additional debug logging and slightly modify existing logging to add
a consistent "cri: xxx" or "cri (<container id>)" prefix, like we do for
docker.
@mstemm
Copy link
Contributor Author

mstemm commented Apr 30, 2019

@adalton and @gnosek, I made some additional changes to defer creating the container engines until the first call to resolve_container. This allows time to specify an alternate cri socket path, timeout, etc. I'd like to get your feedback. I didn't go the full step of moving the static values like the socket path, timeout, etc out of libsinsp/cri.cpp, but if you feel like it's better to move them into the cri engine, let me know.

@mstemm mstemm merged commit 72a94bd into dev May 1, 2019
@mstemm mstemm deleted the SMAGENT-1569-stateful-container-resolvers branch May 1, 2019 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants