Skip to content

Commit

Permalink
Merge pull request #102 from eclipse-ankaios/fix_new_workloads_always…
Browse files Browse the repository at this point in the history
…_state_removed

Fix new workload always having the state "removed"
  • Loading branch information
krucod3 authored Nov 23, 2023
2 parents afbb44c + 580e753 commit 4beb916
Show file tree
Hide file tree
Showing 4 changed files with 326 additions and 12 deletions.
132 changes: 122 additions & 10 deletions agent/doc/swdesign/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ The PodmanKubeRuntime connector implements the runtime connector trait for 'podm

### GenericPollingStateChecker

The GenericPollingStateChecker is a general purpose StateChecker (and implements the state checker trait) that can be used by a runtime connector to make polling requests for workload state as predefined intervals.
The `GenericPollingStateChecker` is a general purpose `StateChecker` (and implements the state checker trait) that can be used by a runtime connector to make polling requests for workload state as predefined intervals.

### External Libraries

Expand Down Expand Up @@ -947,7 +947,7 @@ When the podman runtime connector is called to create workload, the podman runti
* pull the workload image specified in the runtime configuration if the image is not already available locally
* create the container
* start the container in the detached mode
* start a GenericPollingStateChecker to check the workload state
* start a `GenericPollingStateChecker` to check the workload state

Tags:
- PodmanRuntimeConnector
Expand Down Expand Up @@ -1326,7 +1326,7 @@ This section describes how workload states are sampled inside the Ankaios agent
It is required that each runtime connector delivers a state checker when a workload is created. Additionally, the runtime connector provides an extra method for starting a checker for workloads that are resumed by the WorkloadFacade.

How the state checker is implemented is up to the specific runtime connector, given that the state checker trait is implemented. The state checker trait requires a state getter object to be provided. The object must implement the runtime state getter trait and is specific to the runtime connector. The provided state getter object is called inside the state checker.
The extra complexity introduced by having two traits is needed in order to provide common state checker implementations that can be reused among runtime connectors. One of these checkers is the GenericPollingStateChecker.
The extra complexity introduced by having two traits is needed in order to provide common state checker implementations that can be reused among runtime connectors. One of these checkers is the `GenericPollingStateChecker`.

#### General state checker interface
`swdd~agent-general-state-checker-interface~1`
Expand Down Expand Up @@ -1383,10 +1383,10 @@ Needs:

Status: approved

A GenericPollingStateChecker implementation is provided that polls the workload state every second via the provided runtime state getter.
A `GenericPollingStateChecker` implementation is provided that polls the workload state every second via the provided runtime state getter.

Rationale:
The GenericPollingStateChecker helps avoiding code duplication.
The `GenericPollingStateChecker` helps avoiding code duplication.

Tags:
- GenericPollingStateChecker
Expand All @@ -1400,7 +1400,7 @@ Needs:

Status: approved

When the Workload State of a Workload changes on a workload, the GenericPollingStateChecker shall send an UpdateWorkloadState message to the Ankaios Server, containing the new Workload State.
When the Workload State of a Workload changes on a workload, the `GenericPollingStateChecker` shall send an UpdateWorkloadState message to the Ankaios Server, containing the new Workload State.

Tags:
- GenericPollingStateChecker
Expand All @@ -1409,6 +1409,59 @@ Needs:
- impl
- utest

#### PodmanCli container state cache

##### PodmanCli container state cache contains all containers
`swdd~podmancli-container-state-cache-all-containers~1`

Status: approved

The PodmanCli container state cache shall store the state of all Podman containers.

Rationale:
Calling podman for each workload to get its current state uses unnecessary system resources.
Using this cache only one Podman call is needed to get the states of all Podman workloads (podman runtime and podman-kube runtime).

Tags:
- PodmanCli

Needs:
- impl
- utest

##### PodmanCli uses container state cache
`swdd~podmancli-uses-container-state-cache~1`

Status: approved

When the PodmanCli is called to get container states,
the PodmanCli shall use the PodmanCli container state cache for returning the requested states.

Tags:
- PodmanCli

Needs:
- impl
- utest

##### PodmanCli container state cache refresh
`swdd~podmancli-container-state-cache-refresh~1`

Status: approved

When the PodmanCli is called to get container states
and the cache is empty or the content is older than a second,
the PodmanCli shall request Podman for the current container states
and refresh the PodmanCli container state cache with the result
before returning the requested states.

Tags:
- PodmanCli

Needs:
- impl
- utest

#### Podman runtime connector specific state getter

###### Podman runtime implements the runtime state getter trait
Expand Down Expand Up @@ -1454,13 +1507,51 @@ Needs:
- impl
- utest


##### PodmanStateGetter uses PodmanCli
`swdd~podman-state-getter-uses-podmancli~1`

Status: approved

When the `PodmanStateGetter` is called to get the current state of a workload over the state getter interface
the `PodmanStateGetter` shall use the `PodmanCli`.

Tags:
- PodmanRuntimeConnector

Needs:
- impl
- utest

##### PodmanStateGetter reset Podman container state cache
`swdd~podman-state-getter-reset-cache~1`

Status: approved

When the `PodmanStateGetter` is created for a new workload,
the `PodmanStateGetter` shall reset the Podman container state cache.

Rationale:
After a new workload is created,
the Podman container state cache will not contain containers of this workload,
the `PodmanStateGetter` will return `removed` and
the `GenericPollingStateChecker` will stop updating the state of this workload.

Tags:
- PodmanRuntimeConnector

Needs:
- impl
- utest

##### PodmanStateGetter returns removed state
`swdd~podman-state-getter-returns-removed-state~1`

Status: approved

When the `PodmanStateGetter` is called to get the current state over the state getter interface and
the `PodmanStateGetter` gets empty list of current states, the `PodmanStateGetter` shall return the state removed.
When the `PodmanStateGetter` is called to get the current state of a workload over the state getter interface
and the `PodmanStateGetter` gets no state for this workload,
the `PodmanStateGetter` shall return the state removed.

Rationale:
This happens when the container has been removed and the Agent meanwhile triggers status check of the workload.
Expand All @@ -1478,7 +1569,7 @@ Needs:
Status: approved

When the `PodmanStateGetter` is called to get the current state over the state getter interface and
the `PodmanStateGetter` is unable to read the container state (error or too many current states read), the unknown state shall be returned.
the `PodmanStateGetter` is unable to read the container state, the unknown state shall be returned.

Comment:
In other words the unknown state shall be the default state.
Expand Down Expand Up @@ -1515,7 +1606,7 @@ Needs:
Status: approved

When the `PodmanKubeStateGetter` is called to get the current state of a workload,
the `PodmanKubeStateGetter` requests Podman for the state of all containers of the pods listed in the workload ID.
the `PodmanKubeStateGetter` requests PodmanCli for the state of all containers of the pods listed in the workload ID.

Tags:
- PodmanKubeRuntimeConnector
Expand All @@ -1524,6 +1615,27 @@ Needs:
- impl
- utest

##### PodmanKubeStateGetter reset Podman container state cache
`swdd~podman-kube-state-getter-reset-cache~1`

Status: approved

When the `PodmanKubeStateGetter` is created for a new workload,
the `PodmanKubeStateGetter` shall reset the Podman container state cache.

Rationale:
After a new workload is created,
the Podman container state cache will not contain containers of this workload,
the `PodmanKubeStateGetter` will return `removed` and
the `GenericPollingStateChecker` will stop updating the state of this workload.

Tags:
- PodmanRuntimeConnector

Needs:
- impl
- utest

##### PodmanKubeStateGetter returns `removed` if no container exists
`swdd~podman-kube-state-getter-removed-if-no-container~1`

Expand Down
77 changes: 75 additions & 2 deletions agent/src/runtime_connectors/podman/podman_runtime.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@ impl RuntimeStateGetter<PodmanWorkloadId> for PodmanStateGetter {
log::trace!("Getting the state for the workload '{}'", workload_id.id);

// [impl->swdd~podman-state-getter-returns-unknown-state~1]
// [impl->swdd~podman-state-getter-uses-podmancli~1]
// [impl->swdd~podman-state-getter-returns-removed-state~1]
let exec_state = match PodmanCli::list_states_by_id(workload_id.id.as_str()).await {
Ok(state) => {
if let Some(state) = state {
Expand Down Expand Up @@ -158,6 +160,9 @@ impl RuntimeConnector<PodmanWorkloadId, GenericPollingStateChecker> for PodmanRu
workload_spec: WorkloadSpec,
update_state_tx: StateChangeSender,
) -> Result<GenericPollingStateChecker, RuntimeError> {
// [impl->swdd~podman-state-getter-reset-cache~1]
PodmanCli::reset_ps_cache().await;

log::debug!(
"Starting the checker for the workload '{}' with id '{}'",
workload_spec.name,
Expand Down Expand Up @@ -199,6 +204,8 @@ mod tests {
state_change_interface::StateChangeCommand,
test_utils::generate_test_workload_spec_with_param,
};
use futures_util::task::Spawn;
use mockall::Sequence;

use super::PodmanCli;
use super::PodmanRuntime;
Expand Down Expand Up @@ -289,8 +296,11 @@ mod tests {
async fn utest_create_workload_success() {
let _guard = MOCKALL_CONTEXT_SYNC.get_lock_async().await;

let context = PodmanCli::podman_run_context();
context.expect().return_const(Ok("test_id".into()));
let run_context = PodmanCli::podman_run_context();
run_context.expect().return_const(Ok("test_id".into()));

let resest_cache_context = PodmanCli::reset_ps_cache_context();
resest_cache_context.expect().return_const(());

let workload_spec = generate_test_workload_spec_with_param(
AGENT_NAME.to_string(),
Expand All @@ -311,6 +321,68 @@ mod tests {
assert_eq!(workload_id.id, "test_id".to_string());
}

// [utest->swdd~podman-state-getter-reset-cache~1]
#[tokio::test]
async fn utest_state_getter_resets_cache() {
let _guard = MOCKALL_CONTEXT_SYNC.get_lock_async().await;

let run_context = PodmanCli::podman_run_context();
run_context.expect().return_const(Ok("test_id".into()));

let mut seq = Sequence::new();

let resest_cache_context = PodmanCli::reset_ps_cache_context();
resest_cache_context
.expect()
.once()
.return_const(())
.in_sequence(&mut seq);

let list_states_context = PodmanCli::list_states_by_id_context();
list_states_context
.expect()
.once()
.return_const(Ok(Some(ExecutionState::ExecRunning)))
.in_sequence(&mut seq);

let workload_spec = generate_test_workload_spec_with_param(
AGENT_NAME.to_string(),
WORKLOAD_1_NAME.to_string(),
PODMAN_RUNTIME_NAME.to_string(),
);
let (to_server, mut from_agent) =
tokio::sync::mpsc::channel::<StateChangeCommand>(BUFFER_SIZE);

let podman_runtime = PodmanRuntime {};
let res = podman_runtime
.create_workload(workload_spec, Some(PathBuf::from("run_folder")), to_server)
.await;

let (_workload_id, _checker) = res.unwrap();

from_agent.recv().await;
}

// [utest->swdd~podman-state-getter-uses-podmancli~1]
#[tokio::test]
async fn utest_state_getter_uses_podman_cli() {
let _guard = MOCKALL_CONTEXT_SYNC.get_lock_async().await;

let list_states_context = PodmanCli::list_states_by_id_context();
list_states_context
.expect()
.return_const(Ok(Some(ExecutionState::ExecRunning)));

let state_getter = PodmanStateGetter {};
let execution_state = state_getter
.get_state(&PodmanWorkloadId {
id: "test_workload_id".into(),
})
.await;

assert_eq!(execution_state, ExecutionState::ExecRunning);
}

#[tokio::test]
async fn utest_create_workload_run_failed() {
let _guard = MOCKALL_CONTEXT_SYNC.get_lock_async().await;
Expand Down Expand Up @@ -419,6 +491,7 @@ mod tests {
assert_eq!(res, Err(RuntimeError::List("simulated error".to_owned())))
}

// [utest->podman-state-getter-uses-podmancli~1]
#[tokio::test]
async fn utest_get_state_returns_state() {
let _guard = MOCKALL_CONTEXT_SYNC.get_lock_async().await;
Expand Down
Loading

0 comments on commit 4beb916

Please sign in to comment.