Skip to content
This repository has been archived by the owner on Nov 30, 2021. It is now read-only.

ref(controller): query fleet state #2993

Merged
merged 4 commits into from
Feb 26, 2015

Conversation

bacongobbler
Copy link
Member

Before, we relied on django-fsm to give us a general idea of what state
the container object in the model is at. This did not give users an
accurate idea of the state of their containers. Introducing a new
.state() method to the scheduler as well as changing Container.state to
call it's own scheduler's .state() method allows users to directly
understand what state their containers are in.

closes #2766
closes #2987

This was thoroughly tested by hand since the controller does not wrap unit tests around fleet.py, but this should cover it :)

Points: PTAL at the systemdActiveStateMap in fleet.py and let me know if the mappings between the dbus API's ActiveState and what used to be DjangoFSM's JobState is correct.

@bacongobbler
Copy link
Member Author

Something that @gabrtv brought up was that we should have a benchmark test for the old deis ps and the new deis ps. This one will definitely take longer since it's a direct http call... Might be better to cache the request somewhere, somehow. Open to ideas!

@bacongobbler bacongobbler added this to the v1.4 milestone Feb 2, 2015
@bacongobbler bacongobbler self-assigned this Feb 2, 2015
@bacongobbler
Copy link
Member Author

I had to add a new close_db_connections decorator to make up for the post_transition decorator that was replaced. This is necessary in multi-threaded function calls which modify a model's state. Source code courtesy of https://code.djangoproject.com/ticket/22420#comment:17

@aledbf
Copy link
Contributor

aledbf commented Feb 3, 2015

Might be better to cache the request somewhere, somehow. Open to ideas!

@bacongobbler why the cache? I know that it's going to be "slower" than a sql query but I need the real state not something from 60 seconds ago

@gabrtv
Copy link
Member

gabrtv commented Feb 3, 2015

@aledbf when i discussed with @bacongobbler I was referring to an in-memory cache per request that avoided N Fleet UnitState queries for N containers (as currently implemented). The result would be "real state", just a more efficient query.

@bacongobbler
Copy link
Member Author

We could optimize by querying the global unit state instead of querying for each individual job state, then filtering for the specific job. That would allow us to retrieve the state of all containers in one request rather than N requests, as @gabrtv mentions :)

@bacongobbler bacongobbler force-pushed the 2766-query-fleet-state branch 2 times, most recently from 9cbafa0 to bec2991 Compare February 3, 2015 22:41
@bacongobbler
Copy link
Member Author

After looking through fleet's API with @gabrtv, it seems like the global unit state request is paginated. That may be more trouble than it's worth; it could potentially hurt our common use case (between 1-5 containers). I'll have to do some benchmarks, but as long as it's not terribly slow when dealing with 20+ containers then we should be okay.

@bacongobbler
Copy link
Member Author

while fixing this up, I'm finding that deis destroy takes forever, but that's because we destroy each container in series rather than parallel. deis ps is actually quite fast!

Shall I fix that up, similar to how we scale containers?

@bacongobbler
Copy link
Member Author

As an example:

><> deis destroy

 !    WARNING: Potentially Destructive Action
 !    This command will destroy the application: foo
 !    To proceed, type "foo" or re-run this command with --confirm=foo

> foo
Destroying foo...
done in 99s
Git remote deis removed

@bacongobbler bacongobbler force-pushed the 2766-query-fleet-state branch 2 times, most recently from 72321d6 to 212f1b0 Compare February 6, 2015 20:37
@bacongobbler
Copy link
Member Author

@gabrtv I was testing deis ps with 20 containers scaled up. The response time was < 1 second. I've also made App.destroy() delete its containers in parallel so we should be good now for manual testing :)

EDIT: never mind... looks like I'll have to come up with a better mapping between systemd's active/sub states and Deis' container states.

Matthew Fisher added 3 commits February 25, 2015 14:27
Before, we relied on django-fsm to give us a general idea of what state
the container object in the model is at. This did not give users an
accurate idea of the state of their containers. Introducing a new
.state() method to the scheduler as well as changing Container.state to
call it's own scheduler's .state() method allows users to directly
understand what state their containers are in.
When we call app.destroy(), each container is destroyed in series. Since
we now rely on Fleet's state to respond, we are getting a more accurate
idea of what state the container is actually in. Because of this,
c.destroy() was taking a very long time waiting for a response from
fleet. iThis wasn't a problem before because we were waiting for
django-fsm, which basically said "yup, I told fleet to destroy the
container. It's dead now", which wasn't entirely true.

Switching to destroying containers in parallel makes this operation much
faster.
when fleet loads a job, sometimes it'll automatically start and stop
the container, which in our case will return as 'failed', even though
the container is perfectly fine.
# determine if the job no longer exists (will raise a RuntimeError on 404)
unit = self._get_unit(name)
state = self._wait_for_container_state(name)
activeState = state['systemdActiveState']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, ActiveState.

@mboersma
Copy link
Member

This should fix container state disagreement that we have seen from time to time. Code LGTM. Nice work @bacongobbler.

@carmstrong
Copy link
Contributor

Code LGTM. We can also remove the last note from http://docs.deis.io/en/latest/managing_deis/backing_up_data/ but in the sake of getting this PR merged, that can be done in a follow-up PR.

bacongobbler pushed a commit that referenced this pull request Feb 26, 2015
@bacongobbler bacongobbler merged commit 447f1dc into deis:master Feb 26, 2015
@bacongobbler bacongobbler deleted the 2766-query-fleet-state branch February 26, 2015 23:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants