ref(controller): query fleet state #2993

bacongobbler · 2015-01-29T20:35:14Z

Before, we relied on django-fsm to give us a general idea of what state
the container object in the model is in. This did not give users an
accurate picture of the state of their containers. Introducing a new
.state() method to the scheduler, and changing Container.state to
call its own scheduler's .state() method, allows users to see directly
what state their containers are in.

closes #2766
closes #2987

This was thoroughly tested by hand since the controller does not wrap unit tests around fleet.py, but this should cover it :)

Points: PTAL at the systemdActiveStateMap in fleet.py and let me know if the mappings between the dbus API's ActiveState and what used to be DjangoFSM's JobState are correct.
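
For anyone reviewing the mapping, here is a rough sketch of the shape such a table could take. The keys are systemd's documented ActiveState values; the container state names on the right and the `container_state` helper are assumptions made for illustration, not the contents of fleet.py.

```python
# Illustrative only: the right-hand values are assumed container state
# names, not the mapping that this PR actually ships in fleet.py.
SYSTEMD_ACTIVE_STATE_MAP = {
    "active": "up",
    "reloading": "down",
    "inactive": "created",
    "failed": "crashed",
    "activating": "down",
    "deactivating": "down",
}


def container_state(active_state, default="error"):
    """Translate a systemd ActiveState string into a container state.

    Unknown values fall back to `default` instead of raising, so an
    unexpected ActiveState cannot break a status listing.
    """
    return SYSTEMD_ACTIVE_STATE_MAP.get(active_state, default)
```

Keeping the table as plain data means a reviewer can check it line by line against the systemd documentation.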

bacongobbler · 2015-01-30T18:45:53Z

Something that @gabrtv brought up was that we should have a benchmark test for the old deis ps and the new deis ps. This one will definitely take longer since it's a direct http call... Might be better to cache the request somewhere, somehow. Open to ideas!

bacongobbler · 2015-02-03T19:31:27Z

I had to add a new close_db_connections decorator to make up for the post_transition decorator that was replaced. This is necessary in multi-threaded function calls which modify a model's state. Source code courtesy of https://code.djangoproject.com/ticket/22420#comment:17
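
A minimal sketch of the idea, not the code from the ticket or from this PR: wrap the function so that every connection is closed once it returns, whether it succeeded or raised. `close_connections_after` and its `get_connections` hook are hypothetical names; with Django the hook could be something like `lambda: django.db.connections.all()`.

```python
import functools


def close_connections_after(get_connections):
    """Build a decorator that closes every connection after the call.

    `get_connections` is a hypothetical hook that returns the connection
    objects to close; each one only needs a .close() method.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            finally:
                # Runs on success and on exception, so a worker thread
                # never leaves its connection open behind it.
                for conn in get_connections():
                    conn.close()
        return wrapper
    return decorator
```

Injecting the hook keeps the sketch independent of Django, which also makes it easy to exercise with fake connection objects.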

aledbf · 2015-02-03T19:44:41Z

> Might be better to cache the request somewhere, somehow. Open to ideas!

@bacongobbler why the cache? I know that it's going to be "slower" than a sql query but I need the real state not something from 60 seconds ago

gabrtv · 2015-02-03T19:51:18Z

@aledbf when i discussed with @bacongobbler I was referring to an in-memory cache per request that avoided N Fleet UnitState queries for N containers (as currently implemented). The result would be "real state", just a more efficient query.

bacongobbler · 2015-02-03T19:53:48Z

We could optimize by querying the global unit state instead of querying for each individual job state, then filtering for the specific job. That would allow us to retrieve the state of all containers in one request rather than N requests, as @gabrtv mentions :)
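
Roughly, the optimization could look like the sketch below: fetch every unit state once, then index the ones we care about by name. `fetch_all_states` is a hypothetical callable standing in for the fleet request, and the `name` key on each state dict is an assumption about the response shape.

```python
def states_by_unit(fetch_all_states, wanted_names):
    """Return {unit name: state dict} for the requested units.

    `fetch_all_states` is called exactly once, regardless of how many
    names are requested, so N containers cost one request instead of N.
    """
    wanted = set(wanted_names)
    return {
        state["name"]: state
        for state in fetch_all_states()
        if state["name"] in wanted
    }
```

Units that fleet does not report are simply missing from the result, so the caller decides how to treat a container with no unit state.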

bacongobbler · 2015-02-03T23:27:17Z

After looking through fleet's API with @gabrtv, it seems like the global unit state request is paginated. That may be more trouble than it's worth; it could potentially hurt our common use case (between 1-5 containers). I'll have to do some benchmarks, but as long as it's not terribly slow when dealing with 20+ containers then we should be okay.
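
To make the pagination cost concrete, here is a sketch of what walking the pages might involve. `fetch_page` is a hypothetical callable that takes a page token (or None for the first page) and returns one decoded response; the `states` and `nextPageToken` keys are assumptions about that response and should be checked against the fleet API docs.

```python
def iter_all_states(fetch_page):
    """Yield every unit state, following page tokens until none is left.

    Each loop iteration is one round trip, which is the overhead that
    could outweigh the savings for apps with only a few containers.
    """
    token = None
    while True:
        page = fetch_page(token)
        for state in page.get("states", []):
            yield state
        token = page.get("nextPageToken")
        if not token:
            break
```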

bacongobbler · 2015-02-06T19:04:07Z

While fixing this up, I'm finding that deis destroy takes forever, but that's because we destroy each container in series rather than in parallel. deis ps is actually quite fast!

Shall I fix that up, similar to how we scale containers?

bacongobbler · 2015-02-06T19:04:26Z

As an example:

><> deis destroy

 !    WARNING: Potentially Destructive Action
 !    This command will destroy the application: foo
 !    To proceed, type "foo" or re-run this command with --confirm=foo

> foo
Destroying foo...
done in 99s
Git remote deis removed

bacongobbler · 2015-02-06T20:45:44Z

@gabrtv I was testing deis ps with 20 containers scaled up. The response time was < 1 second. I've also made App.destroy() delete its containers in parallel ~~so we should be good now for manual testing :)~~

EDIT: never mind... looks like I'll have to come up with a better mapping between systemd's active/sub states and Deis' container states.

When we call app.destroy(), each container is destroyed in series. Since we now rely on Fleet's state to respond, we are getting a more accurate idea of what state the container is actually in. Because of this, c.destroy() was taking a very long time waiting for a response from fleet. This wasn't a problem before because we were waiting for django-fsm, which basically said "yup, I told fleet to destroy the container. It's dead now", which wasn't entirely true. Switching to destroying containers in parallel makes this operation much faster.
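
As a sketch of the parallel approach (not the code in this PR), the destroy calls can be fanned out to a thread pool so the total wait is close to the slowest container rather than the sum of all of them. `destroy_all` and the `max_workers` default are hypothetical; each container only needs a `destroy()` method.

```python
from concurrent.futures import ThreadPoolExecutor


def destroy_all(containers, max_workers=8):
    """Call .destroy() on every container concurrently.

    Blocks until all calls have finished. If any destroy() raised, the
    first such exception (in submission order) is re-raised here.
    """
    containers = list(containers)
    if not containers:
        return
    workers = min(max_workers, len(containers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(c.destroy) for c in containers]
        for future in futures:
            # result() re-raises whatever the worker thread raised.
            future.result()
```

Collecting the futures before calling `result()` matters: every destroy is already in flight by the time the first one is waited on.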

When fleet loads a job, it sometimes starts and stops the container automatically, which in our case is reported as 'failed' even though the container is perfectly fine.

mboersma · 2015-02-26T21:41:01Z

controller/scheduler/fleet.py

+            # determine if the job no longer exists (will raise a RuntimeError on 404)
+            unit = self._get_unit(name)
+            state = self._wait_for_container_state(name)
+            activeState = state['systemdActiveState']


Heh, ActiveState.

mboersma · 2015-02-26T21:51:44Z

This should fix container state disagreement that we have seen from time to time. Code LGTM. Nice work @bacongobbler.

carmstrong · 2015-02-26T23:28:51Z

Code LGTM. We can also remove the last note from http://docs.deis.io/en/latest/managing_deis/backing_up_data/ but for the sake of getting this PR merged, that can be done in a follow-up PR.

ref(controller): query fleet state

gabrtv added the requires-manual-testing label Jan 29, 2015

bacongobbler added this to the v1.4 milestone Feb 2, 2015

bacongobbler self-assigned this Feb 2, 2015

bacongobbler force-pushed the 2766-query-fleet-state branch from 04b36fd to bf73ca6 Compare February 3, 2015 19:27

bacongobbler force-pushed the 2766-query-fleet-state branch 2 times, most recently from 9cbafa0 to bec2991 Compare February 3, 2015 22:41

bacongobbler force-pushed the 2766-query-fleet-state branch 2 times, most recently from 72321d6 to 212f1b0 Compare February 6, 2015 20:37

This was referenced Feb 6, 2015

Deis hangs in launching state though launced successfully #2987

Closed

Remove scheduler timeouts / Taking too long to pull a run container fails with 'failed to create container' #3054

Closed

bacongobbler force-pushed the 2766-query-fleet-state branch 5 times, most recently from 299f2df to 9683c34 Compare February 12, 2015 21:47

bacongobbler force-pushed the 2766-query-fleet-state branch from 9683c34 to 370551f Compare February 25, 2015 06:23

Matthew Fisher added 3 commits February 25, 2015 14:27

fix(controller): remove print calls

9ee7525

bacongobbler force-pushed the 2766-query-fleet-state branch from 370551f to 9ee7525 Compare February 26, 2015 06:34

fix(controller): fixup fleet reporting failed state

3c462b7

bacongobbler force-pushed the 2766-query-fleet-state branch from 23f151f to 3c462b7 Compare February 26, 2015 17:38

mboersma reviewed Feb 26, 2015

bacongobbler pushed a commit that referenced this pull request Feb 26, 2015

Merge pull request #2993 from bacongobbler/2766-query-fleet-state

447f1dc

ref(controller): query fleet state

bacongobbler merged commit 447f1dc into deis:master Feb 26, 2015

bacongobbler deleted the 2766-query-fleet-state branch February 26, 2015 23:33

carmstrong mentioned this pull request Mar 2, 2015

docs(managing_deis): remove scale note on backup/restore #3170

Merged

mboersma mentioned this pull request Apr 5, 2016

chore(build.sh): remove unused git install #5012

Merged
