ref(controller): query fleet state #2993
Conversation
Something that @gabrtv brought up was that we should have a benchmark test for the old
Force-pushed from 04b36fd to bf73ca6
I had to add a new
@bacongobbler why the cache? I know that it's going to be "slower" than a SQL query, but I need the real state, not something from 60 seconds ago.
@aledbf when I discussed this with @bacongobbler, I was referring to an in-memory cache per request that avoided N Fleet UnitState queries for N containers (as currently implemented). The result would still be "real state", just a more efficient query.
We could optimize by querying the global unit state instead of querying for each individual job state, then filtering for the specific job. That would allow us to retrieve the state of all containers in one request rather than N requests, as @gabrtv mentions :)
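The one-global-query-then-filter idea above can be sketched as follows. This is a hedged illustration, not the controller's actual code: `unit_states` stands in for the body of a single fleet state request, and the function names are assumptions.

```python
# Sketch: instead of one fleet API call per container, take the result of a
# single global unit-state query and resolve every container's state from it.

def index_states(unit_states):
    """Build a unit-name -> state lookup from one global state response."""
    return {s['name']: s for s in unit_states}

def states_for(unit_names, unit_states):
    """Resolve many unit states from one global query instead of N requests."""
    by_name = index_states(unit_states)
    # Units missing from the global response map to None.
    return {name: by_name.get(name) for name in unit_names}
```

The trade-off discussed below is that fleet paginates the global state endpoint, so one "global" query may itself fan out into several requests for large clusters.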
Force-pushed from 9cbafa0 to bec2991
After looking through fleet's API with @gabrtv, it seems the global unit state request is paginated. That may be more trouble than it's worth; it could potentially hurt our common use case (between 1 and 5 containers). I'll have to do some benchmarks, but as long as it's not terribly slow when dealing with 20+ containers, we should be okay.
While fixing this up, I'm finding that Shall I fix that up, similar to how we scale containers?
As an example:
Force-pushed from 72321d6 to 212f1b0
@gabrtv I was testing
EDIT: never mind... looks like I'll have to come up with a better mapping between systemd's active/sub states and Deis' container states.
Force-pushed from 299f2df to 9683c34
Force-pushed from 9683c34 to 370551f
Before, we relied on django-fsm to give us a general idea of what state the container object in the model is in. This did not give users an accurate picture of the state of their containers. Introducing a new .state() method on the scheduler, and changing Container.state to call its own scheduler's .state() method, lets users see directly what state their containers are in.
When we call app.destroy(), each container is destroyed in series. Since we now rely on fleet's state for the response, we get a more accurate idea of the state the container is actually in. Because of this, c.destroy() was taking a very long time waiting for a response from fleet. This wasn't a problem before, because we were waiting on django-fsm, which basically said "yup, I told fleet to destroy the container. It's dead now", which wasn't entirely true. Switching to destroying containers in parallel makes this operation much faster.
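The serial-to-parallel change described above can be sketched like this. This is an illustration only: `destroy_fn` stands in for each container's destroy call (c.destroy() in the controller), and the pool size is an arbitrary assumption.

```python
# Sketch: destroy containers concurrently instead of one at a time, so that
# waiting on fleet for each container's final state overlaps across containers.
from concurrent.futures import ThreadPoolExecutor

def destroy_all(containers, destroy_fn):
    """Issue destroy calls in parallel and block until all complete."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # list() forces iteration so exceptions from workers are raised here.
        list(pool.map(destroy_fn, containers))
```

With N containers each waiting on fleet, total wall-clock time drops from roughly the sum of the waits to roughly the longest single wait.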
Force-pushed from 370551f to 9ee7525
When fleet loads a job, it sometimes automatically starts and stops the container, which in our case is reported as 'failed' even though the container is perfectly fine.
Force-pushed from 23f151f to 3c462b7
# determine if the job no longer exists (will raise a RuntimeError on 404)
unit = self._get_unit(name)
state = self._wait_for_container_state(name)
activeState = state['systemdActiveState']
Heh, ActiveState.
This should fix the container state disagreement we have seen from time to time. Code LGTM. Nice work @bacongobbler.
Code LGTM. We can also remove the last note from http://docs.deis.io/en/latest/managing_deis/backing_up_data/, but in the interest of getting this PR merged, that can be done in a follow-up PR.
ref(controller): query fleet state
Before, we relied on django-fsm to give us a general idea of what state
the container object in the model is in. This did not give users an
accurate idea of the state of their containers. Introducing a new
.state() method to the scheduler as well as changing Container.state to
call its own scheduler's .state() method allows users to directly
understand what state their containers are in.
closes #2766
closes #2987
This was thoroughly tested by hand since the controller does not wrap unit tests around fleet.py, but this should cover it :)
Points: PTAL at the systemdActiveStateMap in fleet.py and let me know if the mappings between the dbus API's ActiveState and what used to be django-fsm's JobState are correct.
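A mapping like the systemdActiveStateMap mentioned above might look roughly like this. The pairs below are assumptions based on systemd's documented ActiveState values, not the exact map merged in fleet.py; the container state labels are likewise illustrative.

```python
# Sketch: translate systemd ActiveState values into container state labels.
# The exact mapping in fleet.py may differ; these pairs are assumptions.
SYSTEMD_ACTIVE_STATE_MAP = {
    'active': 'up',
    'activating': 'created',
    'deactivating': 'destroyed',
    'inactive': 'created',
    'failed': 'crashed',
}

def container_state(systemd_active_state):
    """Map a systemd ActiveState onto a container state, defaulting to 'error'."""
    return SYSTEMD_ACTIVE_STATE_MAP.get(systemd_active_state, 'error')
```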