docs updates -- traces, hooks, execute commands
drmorr0 committed Jun 26, 2024
1 parent fe18047 commit 9811d98
Showing 15 changed files with 187 additions and 292 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -36,6 +36,7 @@ rstest = { version = "0.18.2", optional = true }

anyhow = { version = "1.0.75", features = ["backtrace"] }
async-recursion = "1.0.5"
async-trait = "0.1.80"
bytes = "1.5.0"
chrono = "0.4.26"
clap = { version = "4.3.21", features = ["cargo", "derive", "string"] }
@@ -62,7 +63,6 @@ tracing = "0.1.37"
tracing-log = "0.1.3"
tracing-subscriber = { version = "0.3.17", features = ["env-filter"] }
url = "2.4.1"
async-trait = "0.1.80"

[dependencies.kube]
version = "0.85.0"
12 changes: 6 additions & 6 deletions README.md
@@ -36,14 +36,14 @@ This package provides the following components:

## Documentation

Full [documentation for SimKube](https://appliedcomputing.io/simkube/index.html) is available on Applied
Full [documentation for SimKube](https://appliedcomputing.io/simkube/) is available on Applied
Computing's website. Here are some quick links to select topics:

- [Installation](https://appliedcomputing.io/simkube/docs/intro/installation.html)
- [Autoscaling](http://appliedcomputing.io/simkube/docs/adv/autoscaling.html)
- [Metrics Collection](http://appliedcomputing.io/simkube/docs/adv/metrics.html)
- [Component Reference](http://appliedcomputing.io/simkube/docs/components/sk-ctrl.html)
- [Developing SimKube](http://appliedcomputing.io/simkube/docs/dev/contributing.html)
- [Installation](https://appliedcomputing.io/simkube/docs/intro/installation/)
- [Autoscaling](http://appliedcomputing.io/simkube/docs/adv/autoscaling/)
- [Metrics Collection](http://appliedcomputing.io/simkube/docs/adv/metrics/)
- [Component Reference](http://appliedcomputing.io/simkube/docs/components/skctl/)
- [Developing SimKube](http://appliedcomputing.io/simkube/docs/dev/contributing/)

## Contributing

10 changes: 5 additions & 5 deletions docs/adv/autoscaling.md
@@ -85,8 +85,8 @@ are some initial instructions in there for installing the karpenter+KWOK binary
it will automatically use KWOK to scale up nodes in the cluster just like Cluster Autoscaler. As with Cluster
Autoscaler, KWOK applies the `kwok-provider=true:NoSchedule` taint to the nodes it creates.

> [!NOTE]
> Unlike Cluster Autoscaler, karpenter does not take in a list of Kubernetes Node specs to determine what instances it
> launches. Instead, it uses a hard-coded list of "generic" instance types which roughly map to standard instance
> offerings by the major cloud providers. There is an [open PR](https://github.com/kubernetes-sigs/karpenter/pull/1048)
> to enable configuring node types via an injected file.
Unlike Cluster Autoscaler, Karpenter does not take in a list of Kubernetes Node specs to determine what instances it
launches. Instead, it uses a hard-coded list of "generic" instance types which roughly map to standard instance
offerings by the major cloud providers. If you want to run Karpenter with a different set of configured instances, you
need to modify the [embedded `instance_types.json`](https://github.com/kubernetes-sigs/karpenter/blob/main/kwok/cloudprovider/instance_types.json)
file and rebuild Karpenter.
60 changes: 60 additions & 0 deletions docs/adv/hooks.md
@@ -0,0 +1,60 @@
<!--
project: SimKube
template: docs.html
-->

# Simulation hooks

SimKube supports running arbitrary setup or cleanup scripts at a number of different points during the simulation
process. The general method for configuring hooks is the same at each extension point: add a hook definition like the
following to the Simulation custom resource:

```yaml
- cmd: echo  # required
  args: ["foo"]  # required
  ignoreFailure: true  # optional, will not abort the simulation on failure
  sendSim: true  # optional, will send the Simulation resource to the hook as JSON over stdin
```
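To make the semantics of these fields concrete, here is a minimal Rust sketch of how a runner could interpret one hook entry. It is not SimKube's actual implementation: the `Hook` struct, `should_abort`, and `run_hook` are hypothetical names. The only behaviour taken from the description above is that `ignoreFailure` suppresses aborting and `sendSim` pipes the Simulation JSON to the hook's stdin.

```rust
use std::io::{ErrorKind, Write};
use std::process::{Command, Stdio};

/// Hypothetical in-memory form of one hook entry; fields mirror the YAML keys.
#[derive(Debug, Clone)]
struct Hook {
    cmd: String,
    args: Vec<String>,
    ignore_failure: bool, // `ignoreFailure`
    send_sim: bool,       // `sendSim`
}

/// A failed hook aborts the simulation unless `ignoreFailure` is set.
fn should_abort(hook: &Hook, succeeded: bool) -> bool {
    !succeeded && !hook.ignore_failure
}

/// Run one hook, optionally writing the Simulation JSON to its stdin.
/// Returns Ok(true) if the simulation may continue.
fn run_hook(hook: &Hook, sim_json: &str) -> std::io::Result<bool> {
    let stdin_cfg = if hook.send_sim { Stdio::piped() } else { Stdio::null() };
    let mut child = Command::new(&hook.cmd)
        .args(&hook.args)
        .stdin(stdin_cfg)
        .spawn()?;

    // Taking stdin means it is dropped (closed) after the write, which signals EOF.
    if let Some(mut stdin) = child.stdin.take() {
        if let Err(err) = stdin.write_all(sim_json.as_bytes()) {
            // A hook that never reads stdin (like `echo`) may exit first; tolerate that.
            if err.kind() != ErrorKind::BrokenPipe {
                return Err(err);
            }
        }
    }

    let status = child.wait()?;
    Ok(!should_abort(hook, status.success()))
}

fn main() -> std::io::Result<()> {
    let hook = Hook {
        cmd: "echo".into(),
        args: vec!["foo".into()],
        ignore_failure: true,
        send_sim: true,
    };
    let proceed = run_hook(&hook, r#"{"kind":"Simulation"}"#)?;
    println!("continue simulation: {proceed}");
    Ok(())
}
```

In this sketch, a command that cannot be spawned at all is returned as an error regardless of `ignore_failure`. Whether SimKube treats that case the same way is not stated here.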

## Extension points

There are four places where hooks can be injected:

### preStartHooks

Pre-start hooks run once before any other simulation setup; you can use these hooks to create additional namespaces, set
up monitoring, etc.

### postStopHooks

Similarly, post-stop hooks run once after _all_ simulation iterations have completed and after all other cleanup tasks
are complete. They can be used to clean up any resources or do additional reporting on the simulation results
(extracting logs from relevant pods, for example).

### preRunHooks

Pre-run hooks run before _every_ iteration of the simulation, and can be used to re-create resources that should be
"fresh" at the beginning of each iteration. They are the first thing the SimKube driver runs, before executing any
other setup.

### postRunHooks

Lastly, post-run hooks run at the end of _every_ simulation iteration, and can be used to clean up resources that might
pollute future simulation iterations. They are the last thing the SimKube driver runs.

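The ordering of the four extension points can be summarised with a small Rust sketch. It only illustrates the sequence described above; `HookPhase` and `hook_schedule` are hypothetical names and the iteration count is an illustrative parameter.

```rust
/// The four extension points, named after the hook lists above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HookPhase {
    PreStart,
    PreRun,
    PostRun,
    PostStop,
}

/// Order in which hook phases fire for a simulation with `iterations` runs:
/// preStart once, then preRun/postRun around every iteration, then postStop once.
fn hook_schedule(iterations: usize) -> Vec<HookPhase> {
    let mut order = vec![HookPhase::PreStart];
    for _ in 0..iterations {
        order.push(HookPhase::PreRun);
        order.push(HookPhase::PostRun);
    }
    order.push(HookPhase::PostStop);
    order
}

fn main() {
    for phase in hook_schedule(2) {
        println!("{phase:?}");
    }
}
```

For two iterations this yields pre-start, then two pre-run/post-run pairs, then post-stop.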
## Injecting hooks

If you are using `skctl` to run your simulation, you can provide a set of hooks via a YAML file similar to the
following, using the `--hooks` CLI argument:

```bash exec="on" result="yaml"
cat simkube/examples/hooks/example.yml
```

Otherwise, you can specify the hooks as part of the Simulation custom resource object.

## Running hooks

All executables needed to run hooks must be present and on the `PATH` in the `sk-ctrl` pod (for pre-start and post-stop
hooks) or in the `sk-driver` pod (for pre-run and post-run hooks). The standard Docker images built for SimKube include
`kubectl`, `curl`, and `jq` for this purpose.
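
If you build a custom image, it can be useful to check that the executables your hooks call are actually discoverable. The following Rust sketch shows one way to do that using only the standard library. `find_on_path` is a hypothetical helper, not part of SimKube, and it only checks that a file with the given name exists in a `PATH` directory (it does not check the executable bit).

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Return the first directory entry in `path_var` that contains a file named `exe`.
/// Note: this checks for file existence only, not execute permission.
fn find_on_path(exe: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(exe))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    // The tools named above as shipped in the standard SimKube images.
    for exe in ["kubectl", "curl", "jq"] {
        match find_on_path(exe, &path_var) {
            Some(p) => println!("{exe}: found at {}", p.display()),
            None => println!("{exe}: not on PATH; hooks calling it would fail"),
        }
    }
}
```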
46 changes: 46 additions & 0 deletions docs/adv/traces.md
@@ -0,0 +1,46 @@
<!--
project: SimKube
template: docs.html
-->

# Traces

The SimKube tracer collects timeseries data about the events happening in a live Kubernetes cluster and exports that
data to a trace file for future replay and analysis. These trace files can be stored in cloud object storage or saved
to local disk. We describe configuration options for each of these use cases below.

## Cloud storage

We support exporting traces to Amazon S3, Google Cloud Storage, and Microsoft Azure Storage through the
[object\_store](https://docs.rs/object_store/latest/object_store/) crate. The `sk-tracer` and `sk-driver` pods need to
be configured with the correct permissions to read and write data in your chosen cloud storage. One option is to inject
environment variables that object\_store understands into the pods:

- Amazon S3: use the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables
- Google Cloud Storage: use the `GOOGLE_SERVICE_ACCOUNT` environment variable and inject your service account JSON file
into the `sk-tracer` pod
- Microsoft Azure: use the `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY` environment variables

The object\_store crate will try other authentication/authorization methods if these environment variables are not set
(for example, it will try to get credentials from the instance metadata endpoint for AWS), so these are not the only
ways to grant permissions to the tracer and the driver. Configuring these permissions is beyond the scope of this
documentation, and we encourage you to consult the IAM documentation for your chosen cloud provider(s).
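
As a quick pre-flight check, the variable names listed above can be encoded in a small lookup. This is a Rust sketch only: `Provider`, `credential_vars`, and `missing_vars` are hypothetical names, and a missing variable does not necessarily mean access will fail, since object\_store may fall back to other credential sources as noted above.

```rust
/// The three storage backends named in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    AmazonS3,
    GoogleCloudStorage,
    MicrosoftAzure,
}

/// Environment variables listed in this document for each provider.
fn credential_vars(provider: Provider) -> &'static [&'static str] {
    match provider {
        Provider::AmazonS3 => &["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
        Provider::GoogleCloudStorage => &["GOOGLE_SERVICE_ACCOUNT"],
        Provider::MicrosoftAzure => &["AZURE_STORAGE_ACCOUNT_NAME", "AZURE_STORAGE_ACCOUNT_KEY"],
    }
}

/// Return the listed variables that `lookup` reports as unset.
/// `lookup` is injected so the check can be tested without touching the real environment.
fn missing_vars<F>(provider: Provider, lookup: F) -> Vec<&'static str>
where
    F: Fn(&str) -> Option<String>,
{
    credential_vars(provider)
        .iter()
        .copied()
        .filter(|name| lookup(*name).is_none())
        .collect()
}

fn main() {
    for provider in [
        Provider::AmazonS3,
        Provider::GoogleCloudStorage,
        Provider::MicrosoftAzure,
    ] {
        let missing = missing_vars(provider, |name| std::env::var(name).ok());
        println!("{provider:?}: unset variables: {missing:?}");
    }
}
```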

## Local storage

If you do not have access to (or do not want to use) cloud storage, you can also save a trace file to local storage
using, for example, `skctl export -o file:///path/to/trace`. However, using this trace file in the simulator is a bit
more complicated; it will need to be injected into the node(s) where your Simulation driver pods will run, and then
volume-mounted into the driver pod. If you are running locally via `kind`, you can add the following block to your
`kind` config to mount the trace file directory on your laptop into the kind nodes:

```yaml
- role: worker
  extraMounts:
    - hostPath: /tmp/kind-node-data
      containerPath: /data
```

From there, when you run a simulation, you need to specify the trace data using `skctl run --trace-path
file:///data/trace`. This path is the location _inside the kind node's Docker container_, not inside the driver pod.
SimKube will automatically construct the appropriate volume mounts so that the driver pod can reference the trace.
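
The path translation implied by the mount above is easy to get wrong, so here is a small Rust sketch of it. `ExtraMount` and `trace_url_for` are hypothetical names used for illustration; the paths are the ones from the example config.

```rust
use std::path::Path;

/// One `extraMounts` entry from the kind config.
struct ExtraMount<'a> {
    host_path: &'a str,
    container_path: &'a str,
}

/// Translate a trace file path on the host into the `file://` URL as seen
/// from inside the kind node. Returns None if the file is outside the mount.
fn trace_url_for(host_file: &str, mount: &ExtraMount) -> Option<String> {
    // strip_prefix compares whole path components, so `/tmp/kind-node-data2` does not match.
    let relative = Path::new(host_file).strip_prefix(mount.host_path).ok()?;
    let node_path = Path::new(mount.container_path).join(relative);
    Some(format!("file://{}", node_path.display()))
}

fn main() {
    let mount = ExtraMount {
        host_path: "/tmp/kind-node-data",
        container_path: "/data",
    };
    match trace_url_for("/tmp/kind-node-data/trace", &mount) {
        Some(url) => println!("skctl run --trace-path {url}"),
        None => println!("trace file is not under the mounted host directory"),
    }
}
```

A trace exported to `/tmp/kind-node-data/trace` on the host therefore corresponds to `file:///data/trace` on the node.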
60 changes: 21 additions & 39 deletions docs/components/sk-ctrl.md
@@ -12,56 +12,34 @@ actually perform the Simulation.

## Usage

```
Usage: sk-ctrl [OPTIONS]
Options:
--use-cert-manager
--cert-manager-issuer <CERT_MANAGER_ISSUER> [default: ]
-v, --verbosity <VERBOSITY> [default: info]
-h, --help Print help
```bash exec="on" result="plain"
sk-ctrl --help
```

## Details

The Simulation Controller does the following on receipt of a new Simulation:

0. Verifies that all the expected pre-existing objects are present in the cluster
1. Creates a SimulationRoot object to hang all of the simulated objects off of
2. Creates the namespace for the simulation driver to run in
3. Creates custom resources for the [Prometheus operator](https://prometheus-operator.dev) to configure metrics
0. Runs all preStart hooks
1. Verifies that all the expected pre-existing objects are present in the cluster
2. Creates a SimulationRoot "meta" object that owns the objects which should persist for the whole simulation
3. Creates the namespace for the simulation driver to run in
4. Creates custom resources for the [Prometheus operator](https://prometheus-operator.dev) to configure metrics
collection
4. Creates a MutatingWebhookConfiguration for the simulation driver
5. Creates a Service for the simulation driver
6. Sets up certificates for the simulation driver mutating webhook (currently requires the use of
5. Creates a MutatingWebhookConfiguration for the simulation driver
6. Creates a Service for the simulation driver
7. Sets up certificates for the simulation driver mutating webhook (currently requires the use of
[cert-manager](https://cert-manager.io)).
7. Creates the simulation driver Job
8. Creates the simulation driver Job
9. Waits for the driver to complete
10. Cleans up all "meta" resources
11. Runs all postStop hooks

## Simulation Custom Resource

Here is an example Simulation object:

```yaml
apiVersion: simkube.io/v1
kind: Simulation
metadata:
name: testing
spec:
driverNamespace: simkube
metricsConfig:
namespace: monitoring
serviceAccount: prometheus-k8s
remoteWriteConfigs:
- url: http://prom2parquet-svc.monitoring:1234/receive
trace: file:///data/trace
```
The `SimulationSpec` contains three fields, the location of the trace file which we want to use for the simulation,
configuration for metrics collection, and the namespace to launch the driver into. Currently the only trace location
supported is `file:///`, i.e., the trace file already has to be present on the driver node at the specified location.
In the future we will support downloading from an S3 bucket or other persistent storage.

The Simulation CR is cluster-namespaced, because it must create SimulationRoots.
Simulations are controlled by a Simulation custom resource object, which specifies, among other things, how to configure
the Simulation driver, metrics collection, and any hooks. The Simulation CR is cluster-scoped, because it must
create SimulationRoots.

## SimulationRoot Custom Resource

@@ -73,6 +51,10 @@ owned by the SimulationRoot, so that users can still see the results and logs fr

## Configuring Metrics Collection

> [!NOTE] In the future we may move metrics collection out of SimKube proper and instead run it as a standard "hook".
> If you do not want to use Prometheus for metrics collection, or wish to configure it differently, you can disable
> metrics collection using `skctl --disable-metrics` and configure your own metrics solution with a preStart hook.

SimKube depends on the [Prometheus operator](https://prometheus-operator.dev) being installed in your simulation
cluster, as it creates custom resources understood by this operator. The `metricsConfig` section of the Simulation spec
controls how this is set up. The `namespace` and the `serviceAccount` fields are the namespace and service account that
40 changes: 18 additions & 22 deletions docs/components/sk-driver.md
@@ -12,32 +12,28 @@ selectors, and tolerations to ensure that the simulated pods end up on virtual n

## Usage

```
Usage: sk-driver [OPTIONS] --sim-name <SIM_NAME> --sim-root <SIM_ROOT> --virtual-ns-prefix <VIRTUAL_NS_PREFIX> \
--cert-path <CERT_PATH> --key-path <KEY_PATH> --trace-mount-path <TRACE_MOUNT_PATH>
Options:
--sim-name <SIM_NAME>
--sim-root <SIM_ROOT>
--virtual-ns-prefix <VIRTUAL_NS_PREFIX>
--admission-webhook-port <ADMISSION_WEBHOOK_PORT> [default: 8888]
--cert-path <CERT_PATH>
--key-path <KEY_PATH>
--trace-mount-path <TRACE_MOUNT_PATH>
-v, --verbosity <VERBOSITY> [default: info]
-h, --help Print help
```bash exec="on" result="plain"
sk-driver --help
```

## Details

The driver is launched by the [Simulation Controller](./sk-ctrl.md) when a new simulation is started. On startup, it
reads the cluster trace from the specified `--trace-path` and then replays all the events in the trace. The driver
shuts down when the trace is finished.

The driver also exposes a `/mutate` endpoint on the specified `--admission-webhook-port`, which is called by the
Kubernetes control plane whenever a new pod is created. The mutation endpoint checks to see if the Pod is owned by any
of the simulated resources, and if so, adds the following mutations to the object to ensure that it is scheduled on the
virtual cluster:
The driver is launched by the [Simulation Controller](./sk-ctrl.md) when a new simulation is started. The driver
performs the following steps:

0. Runs all preRun hooks
1. Creates the mutating webhook listener endpoint
2. Creates a SimulationRoot object to hang all simulation objects off of
3. Reads the trace from the specified path
4. Replays the trace events
5. Cleans up the SimulationRoot
6. Shuts down the mutating webhook listener
7. Runs all postRun hooks

The driver exposes a `/mutate` endpoint on the specified `--admission-webhook-port`, which is called by the Kubernetes
control plane whenever a new pod is created. The mutation endpoint checks to see if the Pod is owned by any of the
simulated resources, and if so, adds the following mutations to the object to ensure that it is scheduled on the virtual
cluster:

```yaml
labels:
10 changes: 2 additions & 8 deletions docs/components/sk-tracer.md
@@ -11,14 +11,8 @@ all or a portion of the trace to persistent storage so that it can be replayed l

## Usage

```
Usage: sk-tracer [OPTIONS] --config-file <CONFIG_FILE> --server-port <SERVER_PORT>
Options:
-c, --config-file <CONFIG_FILE>
--server-port <SERVER_PORT>
-v, --verbosity <VERBOSITY> [default: info]
-h, --help Print help
```bash exec="on" result="plain"
sk-tracer --help
```

## Config File Format