From 56d4b5abae1885a1cf70e0f8c5445ca35995ae95 Mon Sep 17 00:00:00 2001 From: Si Beaumont Date: Tue, 2 Dec 2014 13:32:48 +0000 Subject: [PATCH] Fix up code markdown. Renders better in browser and editor Inline code should be backticked _once_ i.e. `function_name`. Code blocks should use three backticks and a newline and a language where appropriate... e.g. ```ocaml let f () = () ``` Carry on here... Signed-off-by: Si Beaumont --- README.md | 2 + features/HA/HA.md | 55 ++++++++++--------- features/XSM/XSM.md | 4 +- features/backtraces.md | 34 ++++++------ getting-started/architecture.md | 4 +- squeezed/architecture/architecture.md | 6 +- xapi/architecture.md | 16 +++--- xapi/design/XenAPI-evolution.md | 2 +- xapi/futures/database/distributed-database.md | 20 +++---- xenopsd/design/Tasks.md | 21 ++++--- xenopsd/walkthroughs/VM.start.md | 40 ++++++++------ 11 files changed, 109 insertions(+), 95 deletions(-) diff --git a/README.md b/README.md index 8a4565fa2..aa904e121 100644 --- a/README.md +++ b/README.md @@ -48,6 +48,7 @@ $ gem install jekyll Then you can host the site locally with the following command from the root directory of this repository: + ``` $ jekyll serve -w --baseurl '/xapi-project' ``` @@ -57,6 +58,7 @@ You will then be able to view the page at `localhost:4000/xapi-project`. ## A note on images If you are contributing images, consider compressing them to keep this repo as slim as possible: + ``` convert -resize 900 -background white -colors 256 [input.png] [output.png] ``` diff --git a/features/HA/HA.md b/features/HA/HA.md index 011eacc01..d161d5ca6 100644 --- a/features/HA/HA.md +++ b/features/HA/HA.md @@ -100,7 +100,7 @@ continue to fail then xhad will consider the host to have failed and self-fence. xhad is configured via a simple config file written on each host in -```/etc/xensource/xhad.conf```. The file must be identical on each host +`/etc/xensource/xhad.conf`. The file must be identical on each host in the cluster. To make changes to the file, HA must be disabled and then re-enabled afterwards. Note it may not be possible to re-enable HA depending on the configuration change (e.g. if a host has been added but that host has @@ -114,6 +114,7 @@ The xhad.conf file is written in XML and contains which local network interface and block device to use for heartbeating. The following is an example xhad.conf file: + ``` @@ -186,29 +187,29 @@ The fields have the following meaning: xapi's notion of a management network. - HeartbeatTimeout: if a heartbeat packet is not received for this many seconds, then xhad considers the heartbeat to have failed. This is - the user-supplied "HA timeout" value, represented below as ```T```. - ```T``` must be bigger than 10; we would normally use 60s. + the user-supplied "HA timeout" value, represented below as `T`. + `T` must be bigger than 10; we would normally use 60s. - StateFileTimeout: if a storage update is not seen for a host for this many seconds, then xhad considers the storage heartbeat to have failed. - We would normally use the same value as the HeartbeatTimeout ```T```. + We would normally use the same value as the HeartbeatTimeout `T`. - HeartbeatInterval: interval between heartbeat packets sent. We would - normally use a value ```2 <= t <= 6```, derived from the user-supplied - HA timeout via ```t = (T + 10) / 10``` + normally use a value `2 <= t <= 6`, derived from the user-supplied + HA timeout via `t = (T + 10) / 10` - StateFileInterval: interval betwen storage updates (also known as "statefile updates"). 
This would normally be set to the same value as HeartbeatInterval. - HeartbeatWatchdogTimeout: If the host does not send a heartbeat for this amount of time then the host self-fences via the Xen watchdog. We normally - set this to ```T```. + set this to `T`. - StateFileWatchdogTimeout: If the host does not update the statefile for this amount of time then the host self-fences via the Xen watchdog. We - normally set this to ```T+15```. + normally set this to `T+15`. - BootJoinTimeout: When the host is booting and joining the liveset (i.e. the cluster), consider the join a failure if it takes longer than this - amount of time. We would normally set this to ```T+60```. + amount of time. We would normally set this to `T+60`. - EnableJoinTimeout: When the host is enabling HA for the first time, consider the enable a failure if it takes longer than this amount of time. - We would normally set this to ```T+60```. + We would normally set this to `T+60`. - XapiHealthCheckInterval: Interval between "health checks" where we run a script to check whether Xapi is responding or not. - XapiHealthCheckTimeout: Number of seconds to wait before assuming that @@ -223,20 +224,20 @@ The fields have the following meaning: In addition to the config file, Xhad exposes a simple control API which is exposed as scripts: -- ```ha_set_pool_state (Init | Invalid)```: sets the global pool state to "Init" (before starting +- `ha_set_pool_state (Init | Invalid)`: sets the global pool state to "Init" (before starting HA) or "Invalid" (causing all other daemons who can see the statefile to shutdown) "Invalid" -- ```ha_start_daemon```: if the pool state is "Init" then the daemon will +- `ha_start_daemon`: if the pool state is "Init" then the daemon will attempt to contact other daemons and enable HA. If the pool state is "Active" then the host will attempt to join the existing liveset. -- ```ha_query_liveset```: returns the current state of the cluster. -- ```ha_propose_master```: returns whether the current node has been +- `ha_query_liveset`: returns the current state of the cluster. +- `ha_propose_master`: returns whether the current node has been elected pool master. -- ```ha_stop_daemon```: shuts down the xhad on the local host. Note this +- `ha_stop_daemon`: shuts down the xhad on the local host. Note this will not disarm the Xen watchdog by itself. -- ```ha_disarm_fencing```: disables fencing on the local host. -- ```ha_set_excluded```: when a host is being shutdown cleanly, record the +- `ha_disarm_fencing`: disables fencing on the local host. +- `ha_set_excluded`: when a host is being shutdown cleanly, record the fact that the VMs have all been shutdown so that this host can be ignored in future cluster membership calculations. @@ -288,7 +289,7 @@ Starting up a host ![Starting up a host](HA.start.svg) -First Xapi starts up the xhad via the ```ha_start_daemon``` script. The +First Xapi starts up the xhad via the `ha_start_daemon` script. The daemons read their config files and start exchanging heartbeats over the network and storage. All hosts must be online and all heartbeats must be working for HA to be enabled -- it is not sensible to enable HA when @@ -297,11 +298,11 @@ join the liveset then it clears the "excluded" flag which would have been set if the host had been shutdown cleanly before -- this is only needed when a host is shutdown cleanly and then restarted. -Xapi periodically queries the state of xhad via the ```ha_query_liveset``` -command. 
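
To make the control flow concrete, here is a minimal sketch (not xapi's actual implementation) of driving the enable sequence with the xhad control scripts and polling `ha_query_liveset`. The real liveset output is structured and parsed properly by Xapi; grepping for a state keyword here is purely for readability, and the polling interval is arbitrary.

```ocaml
(* Illustrative only: enable HA via the control scripts, then wait for the
   liveset to come up. Requires the unix library. *)

let run cmd =
  (* Run a command and capture its stdout as a string. *)
  let ic = Unix.open_process_in cmd in
  let b = Buffer.create 256 in
  (try while true do Buffer.add_channel b ic 1 done with End_of_file -> ());
  ignore (Unix.close_process_in ic);
  Buffer.contents b

let contains s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

let enable_ha () =
  ignore (Sys.command "ha_set_pool_state Init");
  ignore (Sys.command "ha_start_daemon");
  (* Poll the liveset until it is fully formed. *)
  let rec wait () =
    if contains (run "ha_query_liveset") "Online" then ()
    else (Unix.sleep 5; wait ())
  in
  wait ()
```
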
The state will be ```Starting``` until the liveset is fully -formed at which point the state will be ```Online```. +Xapi periodically queries the state of xhad via the `ha_query_liveset` +command. The state will be `Starting` until the liveset is fully +formed at which point the state will be `Online`. -When the ```ha_start_daemon``` script returns then Xapi will decide +When the `ha_start_daemon` script returns then Xapi will decide whether to stand for master election or not. Initially when HA is being enabled and there is a master already, this node will be expected to stand unopposed. Later when HA notices that the master host has been @@ -322,9 +323,9 @@ running on it. An excluded host will never allow itself to form part of a "split brain". Once a host has given up its master role and shutdown any VMs, it is safe -to disable fencing with ```ha_disarm_fencing``` and stop xhad with -```ha_stop_daemon```. Once the daemon has been stopped the "excluded" -bit can be set in the statefile via ```ha_set_excluded``` and the +to disable fencing with `ha_disarm_fencing` and stop xhad with +`ha_stop_daemon`. Once the daemon has been stopped the "excluded" +bit can be set in the statefile via `ha_set_excluded` and the host safely rebooted. Disabling HA cleanly @@ -334,7 +335,7 @@ Disabling HA cleanly HA can be shutdown cleanly when the statefile is working i.e. when hosts are alive because of survival rule 1. First the master Xapi tells the local -Xhad to mark the pool state as "invalid" using ```ha_set_pool_state```. +Xhad to mark the pool state as "invalid" using `ha_set_pool_state`. Every xhad instance will notice this state change the next time it performs a storage heartbeat. The Xhad instances will shutdown and Xapi will notice that HA has been disabled the next time it attempts to query the liveset. @@ -353,7 +354,7 @@ on every host. Since all hosts are online thanks to survival rule 2, the Xapi master is able to tell all Xapi instances to disable their recovery logic. Once the Xapis have been disabled -- and there is no possibility of split brain -- each host is asked to disable the watchdog -with ```ha_disarm_fencing``` and then to stop Xhad with ```ha_stop_daemon```. +with `ha_disarm_fencing` and then to stop Xhad with `ha_stop_daemon`. Add a host to the pool ---------------------- diff --git a/features/XSM/XSM.md b/features/XSM/XSM.md index 15498fe59..0915d838b 100644 --- a/features/XSM/XSM.md +++ b/features/XSM/XSM.md @@ -16,8 +16,8 @@ The following diagram shows how XSM works at a high level: The slowest part of a storage migration is migrating the storage, since virtual disks can be very large. Xapi starts by taking a snapshot and copying that to the destination as a background task. Before the datapath connecting the VM -to the disk is re-established, xapi tells ```tapdisk``` to start mirroring all -writes to a remote ```tapdisk``` over NBD. From this point on all VM disk writes +to the disk is re-established, xapi tells `tapdisk` to start mirroring all +writes to a remote `tapdisk` over NBD. From this point on all VM disk writes are written to both the old and the new disk. When the background snapshot copy is complete, xapi can migrate the VM memory across. 
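
The ordering is the important part: writes are mirrored before the datapath is re-established, and the memory is only moved once the background copy has finished. A rough sketch of that ordering follows; every function is a stub standing in for real xapi/SMAPIv2/tapdisk machinery and the names are invented for the example.

```ocaml
(* Sketch only: the phase ordering of a storage migration. *)
let take_snapshot vdi = print_endline ("snapshot " ^ vdi); vdi ^ ".snapshot"
let copy_in_background snap dest = print_endline ("copy " ^ snap ^ " -> " ^ dest)
let start_nbd_mirror vdi dest = print_endline ("mirror writes on " ^ vdi ^ " -> " ^ dest)
let wait_for_copy () = print_endline "background copy complete"
let migrate_memory vm dest = print_endline ("migrate memory of " ^ vm ^ " -> " ^ dest)

let migrate_vm_and_storage ~vm ~vdi ~dest =
  let snap = take_snapshot vdi in   (* 1. snapshot the disk                       *)
  copy_in_background snap dest;     (* 2. copy the snapshot as a background task  *)
  start_nbd_mirror vdi dest;        (* 3. mirror all new writes over NBD          *)
  wait_for_copy ();                 (* 4. wait for the background copy to finish  *)
  migrate_memory vm dest            (* 5. only then move the memory image         *)

let () = migrate_vm_and_storage ~vm:"vm0" ~vdi:"vdi0" ~dest:"host2"
```
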
Once the VM memory image has been received, the destination VM is diff --git a/features/backtraces.md b/features/backtraces.md index ed8ab3754..2f5062af3 100644 --- a/features/backtraces.md +++ b/features/backtraces.md @@ -41,18 +41,18 @@ let finally f cleanup = raise e (* <-- backtrace starts here now *) ``` -This function performs some action (i.e. ```f ()```) and guarantees to -perform some cleanup action (```cleanup ()```) whether or not an exception +This function performs some action (i.e. `f ()`) and guarantees to +perform some cleanup action (`cleanup ()`) whether or not an exception is thrown. This is a common pattern to ensure resources are freed (e.g. -closing a socket or file descriptor). Unfortunately the ```raise e``` in +closing a socket or file descriptor). Unfortunately the `raise e` in the exception handler loses the backtrace context: when the exception -gets to the toplevel, ```Printexc.get_backtrace ()``` will point at the -```finally``` rather than the real cause of the error. +gets to the toplevel, `Printexc.get_backtrace ()` will point at the +`finally` rather than the real cause of the error. We will use a variant of the solution proposed by [Jacques-Henri Jourdan](http://gallium.inria.fr/blog/a-library-to-record-ocaml-backtraces/) where we will record backtraces when we catch exceptions, before the -buffer is reinitialised. Our ```finally``` function will now look like this: +buffer is reinitialised. Our `finally` function will now look like this: ``` let finally f cleanup = @@ -66,7 +66,7 @@ let finally f cleanup = raise e ``` -The function ```Backtrace.is_important e``` associates the exception ```e``` +The function `Backtrace.is_important e` associates the exception `e` with the current backtrace before it gets deleted. Xapi always has high-level exception handlers or other wrappers around all the @@ -74,8 +74,8 @@ threads it spawns. In particular Xapi tries really hard to associate threads with active tasks, so it can prefix all log lines with a task id. This helps admins see the related log lines even when there is lots of concurrent activity. Xapi also tries very hard to label other threads with names for the same reason -(e.g. ```db_gc```). Every thread should end up being wrapped in ```with_thread_named``` -which allows us to catch exceptions and log stacktraces from ```Backtrace.get``` +(e.g. `db_gc`). Every thread should end up being wrapped in `with_thread_named` +which allows us to catch exceptions and log stacktraces from `Backtrace.get` on the way out. OCaml design guidelines @@ -85,21 +85,21 @@ Making nice backtraces requires us to think when we write our exception raising and handling code. In particular: - If a function handles an exception and re-raise it, you must call - ```Backtrace.is_important e``` with the exception to capture the backtrace first. -- If a function raises a different exception (e.g. ```Not_found``` becoming a XenAPI - ```INTERNAL_ERROR```) then you must use ```Backtrace.reraise ``` to + `Backtrace.is_important e` with the exception to capture the backtrace first. +- If a function raises a different exception (e.g. `Not_found` becoming a XenAPI + `INTERNAL_ERROR`) then you must use `Backtrace.reraise ` to ensure the backtrace is preserved. - All exceptions should be printable -- if the generic printer doesn't do a good enough job then register a custom printer. 
- If you are the last person who will see an exception (because you aren't going - to rethow it) then you *may* log the backtrace via ```Debug.log_backtrace e``` + to rethow it) then you *may* log the backtrace via `Debug.log_backtrace e` *if and only if* you reasonably expect the resulting backtrace to be helpful and not spammy. - If you aren't the last person who will see an exception (because you are going to rethrow it or another exception), then *do not* log the backtrace; the next handler will do that. - All threads should have a final exception handler at the outermost level - for example ```Debug.with_thread_named``` will do this for you. + for example `Debug.with_thread_named` will do this for you. Backtraces in python @@ -269,7 +269,7 @@ The SMAPIv1 API Errors in SMAPIv1 are returned as XMLRPC "Faults" containing a code and a status line. Xapi transforms these into XenAPI exceptions usually of the -form ```SR_BACKEND_FAILURE_```. We can extend the SM backends to use the +form `SR_BACKEND_FAILURE_`. We can extend the SM backends to use the XenAPI exception type directly: i.e. to marshal exceptions as dictionaries: ```python @@ -281,12 +281,12 @@ XenAPI exception type directly: i.e. to marshal exceptions as dictionaries: We can then define a new backtrace-carrying error: -- code = ```SR_BACKEND_FAILURE_WITH_BACKTRACE``` +- code = `SR_BACKEND_FAILURE_WITH_BACKTRACE` - param1 = json-encoded backtrace - param2 = code - param3 = reason -which is internally transformed into ```SR_BACKEND_FAILURE_``` and +which is internally transformed into `SR_BACKEND_FAILURE_` and the backtrace is appended to the current Task backtrace. From the client's point of view the final exception should look the same, but Xapi will have a chance to see and log the whole backtrace. diff --git a/getting-started/architecture.md b/getting-started/architecture.md index 0fe2910a6..6ed527037 100644 --- a/getting-started/architecture.md +++ b/getting-started/architecture.md @@ -19,8 +19,8 @@ running xapi, all sharing some storage: At any time, at most one host is known as the *pool master* and is responsible for co-ordination and locking resources within the pool. When a pool is first created a master host is chosen. The master role can be transferred -- on user request in an orderly fashion (```xe pool-designate-new-master```) -- on user request in an emergency (```xe pool-emergency-transition-to-master```) +- on user request in an orderly fashion (`xe pool-designate-new-master`) +- on user request in an emergency (`xe pool-emergency-transition-to-master`) - automatically if HA is enabled on the cluster. All hosts expose an HTTP and XML/RPC interface running on port 80 and with TLS/SSL diff --git a/squeezed/architecture/architecture.md b/squeezed/architecture/architecture.md index f387f3e1d..f58c88140 100644 --- a/squeezed/architecture/architecture.md +++ b/squeezed/architecture/architecture.md @@ -13,7 +13,7 @@ The following diagram shows the internals of Squeezed: At the center of squeezed is an abstract model of a Xen host. 
The model includes - the amount of already-used host memory (used by fixed overheads such as Xen and the crash kernel) -- per-domain memory policy specifically ```dynamic-min``` and ```dynamic-max``` which +- per-domain memory policy specifically `dynamic-min` and `dynamic-max` which together describe a range, within which the domain's actual used memory should remain - per-domain calibration data which allows us to compute the necessary balloon target value to achive a particular memory usage value. @@ -32,7 +32,7 @@ their targets. Note that ballooning is fundamentally a co-operative process, so must handle cases where the domains refuse to obey commands. The "output" of squeezed is a list of "actions" which include: -- set domain x's ```memory/target``` to a new value -- set the ```maxmem``` of a domain to a new value (as a hard limit beyond which the domain +- set domain x's `memory/target` to a new value +- set the `maxmem` of a domain to a new value (as a hard limit beyond which the domain cannot allocate) diff --git a/xapi/architecture.md b/xapi/architecture.md index 45cad5b8e..ead021081 100644 --- a/xapi/architecture.md +++ b/xapi/architecture.md @@ -38,11 +38,11 @@ The APIs are classified into categories: efficient access to the data. * emergency: these deal with scenarios where the master is offline -If the incoming API call should be resent to the master than a XenAPI ```HOST_IS_SLAVE``` +If the incoming API call should be resent to the master than a XenAPI `HOST_IS_SLAVE` error message containing the master's IP is sent to the client. Once past the initial checks, API calls enter the "message forwarding" layer which -- locks resources (via the ```current_operations``` mechanism) +- locks resources (via the `current_operations` mechanism) - decides which host should execute the request. If the request should run locally then a direct function call is used; otherwise @@ -63,7 +63,7 @@ If the XenAPI call is a storage operation then the "storage access" layer - invokes the relevant operation in the Storage Manager API (SMAPI) v2 interface; - uses the SMAPIv2 to SMAPIv1 converter to generate the necessary command-line to talk to the SMAPIv1 plugin (EXT, NFS, LVM etc) and to execute it -- persists the state of the storage objects (including the result of a ```VDI.attach``` +- persists the state of the storage objects (including the result of a `VDI.attach` call) to persistent storage Internally the SMAPIv1 plugins use privileged access to the Xapi database to directly @@ -76,19 +76,19 @@ The SMAPIv1 plugins also rely on Xapi for The Xapi database contains Host and VM metadata and is shared pool-wide. The master keeps a copy in memory, and all other nodes remote queries to the master. The database associates each object with a generation count which is used to implement the XenAPI -```event.next``` and ```event.from``` APIs. The database is routinely asynchronously flushed to disk +`event.next` and `event.from` APIs. The database is routinely asynchronously flushed to disk in XML format. If the "redo-log" is enabled then all database writes are made synchronously as deltas to a shared block device. Without the redo-log, recent updates may be lost if Xapi is killed before a flush. High-Availability refers to planning for host failure, monitoring host liveness and then following-through on the plans. Xapi defers to an external host liveness monitor -called ```xhad```. When ```xhad``` confirms that a host has failed -- and has been +called `xhad`. 
When `xhad` confirms that a host has failed -- and has been isolated from the storage -- then Xapi will restart any VMs which have failed and which have been marked as "protected" by HA. Xapi can also impose admission control to prevent -the pool becoming too overloaded to cope with ```n``` arbitrary host failures. +the pool becoming too overloaded to cope with `n` arbitrary host failures. -The ```xe``` CLI is implemented in terms of the XenAPI, but for efficiency the implementation -is linked directly into Xapi. The ```xe``` program remotes its command-line to Xapi, +The `xe` CLI is implemented in terms of the XenAPI, but for efficiency the implementation +is linked directly into Xapi. The `xe` program remotes its command-line to Xapi, and Xapi sends back a series of simple commands (prompt for input; print line; fetch file; exit etc). diff --git a/xapi/design/XenAPI-evolution.md b/xapi/design/XenAPI-evolution.md index a4de57469..84660d917 100644 --- a/xapi/design/XenAPI-evolution.md +++ b/xapi/design/XenAPI-evolution.md @@ -87,6 +87,6 @@ The XenAPI documentation will contain its complete lifecycle history for each XenAPI element. Only the elements described in the documentation are "official" and supported. -Each object, message and field in ```datamodel.ml``` will have lifecycle +Each object, message and field in `datamodel.ml` will have lifecycle metadata attached to it, which is a list of transitions (transition type * release * explanation string) as described above. Release notes are automatically generated from this data. diff --git a/xapi/futures/database/distributed-database.md b/xapi/futures/database/distributed-database.md index fabb0dc68..0b30e3922 100644 --- a/xapi/futures/database/distributed-database.md +++ b/xapi/futures/database/distributed-database.md @@ -28,8 +28,8 @@ Using git via Irmin ------------------- A git repository is a database of key=value pairs with branching history. -If we placed our host and VM metadata in git then we could ```commit``` -changes and ```pull``` and ```push``` them between replicas. The +If we placed our host and VM metadata in git then we could `commit` +changes and `pull` and `push` them between replicas. The [Irmin](https://github.com/mirage/irmin) library provides an easy programming interface on top of git which we could link with the Xapi database layer. @@ -40,7 +40,7 @@ Proposed new architecture The diagram above shows two hosts: one a master and the other a regular host. The XenAPI client has sent a request to the wrong host; normally this would -result in a ```HOST_IS_SLAVE``` error being sent to the client. In the new +result in a `HOST_IS_SLAVE` error being sent to the client. In the new world, the host is able to process the request, only contacting the master if it is necessary to acquire a lock. Starting a VM would require a lock; but rebooting or migrating an existing VM would not. Assuming the lock can @@ -85,22 +85,22 @@ We will lose the following - the ability to use the Xapi database as a "lock" - coherence between hosts: there will be no guarantee that an effect seen by host 'A' will be seen immediately by host 'B'. 
In particular this means - that clients should send all their commands and ```event.from``` calls to + that clients should send all their commands and `event.from` calls to the same host (although any host will do) Stuff we need to build ---------------------- -- A ```pull```/```push``` replicator: this would have to monitor the list +- A `pull`/`push` replicator: this would have to monitor the list of hosts in the pool and distribute updates to them in some vaguely efficient manner. Ideally we would avoid hassling the pool master and use some more efficient topology: perhaps a tree? -- A ```git diff``` to XenAPI event converter: whenever a host ```pull```s +- A `git diff` to XenAPI event converter: whenever a host `pull`s updates from another it needs to convert the diff into a set of touched - objects for any ```event.from``` to read. We could send the changeset hash - as the ```event.from``` token. + objects for any `event.from` to read. We could send the changeset hash + as the `event.from` token. - Irmin nested views: since Tasks can be nested (and git branches can be nested) we need to make sure that Irmin views can be nested. @@ -114,7 +114,7 @@ Stuff we need to build without triggering an early merge (which would harm efficiency) - We need to create a first-class locking API to use instead of the - ```VDI.sm_config``` locks. + `VDI.sm_config` locks. Prototype --------- @@ -127,7 +127,7 @@ opam pin add xapi-database git://github.com/djs55/xapi-database opam pin add xapi git://github.com/djs55/xen-api#schema-sexp ``` -The ```xapi-database``` is clone of the existing Xapi database code +The `xapi-database` is clone of the existing Xapi database code configured to run as a separate process. There is [code to convert from XML to git](https://github.com/djs55/xapi-database/blob/master/core/db_git.ml#L55) and diff --git a/xenopsd/design/Tasks.md b/xenopsd/design/Tasks.md index 82519533f..4cb85833e 100644 --- a/xenopsd/design/Tasks.md +++ b/xenopsd/design/Tasks.md @@ -28,6 +28,7 @@ Types ----- A task has a state, which may be Pending, Completed or failed: + ``` type async_result = unit @@ -49,7 +50,8 @@ To see how they are marshalled, see [Xenops_server](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L564). From the point of view of a client, a Task has the immutable type (which can be - queried with a ```Task.stat```): +queried with a `Task.stat`): + ``` type t = { id: id; @@ -60,6 +62,7 @@ From the point of view of a client, a Task has the immutable type (which can be debug_info: (string * string) list; } ``` + where - id is a unique (integer) id generated by Xenopsd. This is how a Task is represented to clients @@ -96,7 +99,7 @@ Lifecycle of a Task ------------------- All Tasks returned by API functions are created as part of the enqueue functions: -[queue_operation_*](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L1451). +[queue_operation\_\*](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L1451). Even operations which are performed internally are normally wrapped in Tasks by the function [immediate_operation](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L1451). 
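
From a client's point of view the interesting field is `state`: a client polls `Task.stat` (or waits for Task events) until the state stops being `Pending`. A minimal client-side sketch, with a stubbed-out `task_stat` in place of the real RPC call and the `Failed` payload simplified to a string, might look like this:

```ocaml
(* Client-side sketch. [task_stat] is a stand-in for the real Task.stat RPC;
   in Xenopsd the Failed case carries an Rpc.t error value. *)
type async_result = unit

type completion_t = { duration : float; result : async_result option }

type state =
  | Pending of float            (* carries the task's progress value *)
  | Completed of completion_t
  | Failed of string

let task_stat (_id : string) : state =
  Completed { duration = 1.0; result = None }

let rec wait_for_task id =
  match task_stat id with
  | Pending p ->
      Printf.printf "task %s: %.0f%% done\n" id (100. *. p);
      Unix.sleep 1;
      wait_for_task id
  | Completed { duration; _ } ->
      Printf.printf "task %s completed in %.1fs\n" id duration
  | Failed msg ->
      Printf.printf "task %s failed: %s\n" id msg

let () = wait_for_task "1"
```
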
@@ -105,18 +108,18 @@ A queued operation will be processed by one of the [queue worker threads](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L554). It will - set the thread-local debug key to the Task.dbg -- call ```task.Xenops_task.run```, taking care to catch exceptions and update - the ```task.Xenops_task.state``` +- call `task.Xenops_task.run`, taking care to catch exceptions and update + the `task.Xenops_task.state` - unset the thread-local debug key - generate an event on the Task to provoke clients to query the current state. Task implementations must update their progress as they work. For the common -case of a compound operation like ```VM_start''' which is decomposed into -multiple "micro-ops" (e.g. ```VM_create``` ```VM_build```) there is a useful +case of a compound operation like `VM_start` which is decomposed into +multiple "micro-ops" (e.g. `VM_create` `VM_build`) there is a useful helper function [perform_atomics](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L1092) which divides the progress 'bar' into sections, where each "micro-op" can have -a different size (```weight```). A progress callback function is passed into +a different size (`weight`). A progress callback function is passed into each Xenopsd backend function so it can be updated with fine granulatiry. For example note the arguments to [B.VM.save](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/xenops_server.ml#L1092) @@ -169,7 +172,7 @@ a way to cancel the watch. The easiest way to cancel a watch is to watch an additional path (a "cancel path") and delete it, see [cancellable_watch](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/xc/cancel_utils.ml#L117). The "cancel paths" are placed within the VM's Xenstore directory to ensure that -cleanup code which does ```xenstore-rm``` will automatically "cancel" all outstanding +cleanup code which does `xenstore-rm` will automatically "cancel" all outstanding watches. Note that we trigger a cancel by deleting rather than creating, to avoid racing with delete and creating orphaned Xenstore entries. @@ -183,7 +186,7 @@ Testing with cancel points Cancellation is difficult to test, as it is completely asynchronous. Therefore Xenopsd has some built-in cancellation testing infrastructure known as "cancel points". -A "cancel point" is a point in the code where a ```Cancelled``` exception could +A "cancel point" is a point in the code where a `Cancelled` exception could be thrown, either by checking the cancelling boolean or as a side-effect of a cancel callback. The [check_cancelling](https://github.com/xapi-project/xenopsd/blob/f876f9029cf53f14a52bf42a4a3a03265e048926/lib/task_server.ml#L216) diff --git a/xenopsd/walkthroughs/VM.start.md b/xenopsd/walkthroughs/VM.start.md index 5042e4b3c..70d085d43 100644 --- a/xenopsd/walkthroughs/VM.start.md +++ b/xenopsd/walkthroughs/VM.start.md @@ -40,6 +40,7 @@ users: of dealing with Xen from other components. The Xenopsd "VM.add" function has code like this: + ``` let add' x = debug "VM.add %s" (Jsonrpc.to_string (rpc_of_t x)); @@ -48,6 +49,7 @@ The Xenopsd "VM.add" function has code like this: B.VM.add x; x.id ``` + This function does 2 things: - it stores the VM configuration in the "database" - it tells the "backend" that the VM exists @@ -56,6 +58,7 @@ The Xenopsd database is really a set of config files in the filesystem. 
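
As a toy illustration of that "database" idea (paths and helper names below are invented for the example, not Xenopsd's actual DB code), each object serialises to its own small file, with one directory per VM:

```ocaml
(* Toy filesystem "database": one file per object, one directory per VM. *)
let root = "/tmp/xenopsd-example"   (* the real daemon uses /run/nonpersistent/... *)

let rec mkdir_p dir =
  if not (Sys.file_exists dir) then begin
    mkdir_p (Filename.dirname dir);
    Unix.mkdir dir 0o755
  end

let write_object ~vm ~name ~contents =
  let dir = List.fold_left Filename.concat root [ "VM"; vm ] in
  mkdir_p dir;
  let oc = open_out (Filename.concat dir name) in
  output_string oc contents;
  close_out oc

let () =
  (* A VM's config and its VBDs end up side by side in the VM's directory. *)
  write_object ~vm:"7b719ce6" ~name:"config" ~contents:"{ \"name\": \"example\" }";
  write_object ~vm:"7b719ce6" ~name:"vbd.xvda" ~contents:"{ \"id\": \"xvda\" }"
```
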
All objects belonging to a VM (recall we only have VMs, VBDs, VIFs, PCIs and not stand-alone entities like disks) and are placed into a subdirectory named after the VM e.g.: + ``` # ls /run/nonpersistent/xenopsd/xenlight/VM/7b719ce6-0b17-9733-e8ee-dbc1e6e7b701 config vbd.xvda vbd.xvdb @@ -64,6 +67,7 @@ config vbd.xvda vbd.xvdb ... } ``` + Xenopsd doesn't have as persistent a notion of a VM as xapi, it is expected that all objects are deleted when the host is rebooted. However the objects should be persisted over a simple Xenopsd restart, which is why the objects are stored @@ -169,12 +173,14 @@ To understand this and other internal details of Xenopsd, consult the [architecture description](../architecture/README.md). The [queue_operation_int](https://github.com/xapi-project/xenopsd/blob/30cc9a72e8726d1e7501cd01ddb27ced6d53b9be/lib/xenops_server.ml#L1451) function looks like this: + ``` let queue_operation_int dbg id op = let task = Xenops_task.add tasks dbg (fun t -> perform op t; None) in Redirector.push id (op, task); task ``` + The "task" is a record containing Task metadata plus a "do it now" function which will be executed by a thread from the thread pool. The [module Redirector](https://github.com/xapi-project/xenopsd/blob/30cc9a72e8726d1e7501cd01ddb27ced6d53b9be/lib/xenops_server.ml#L395) @@ -185,10 +191,11 @@ takes care of: - providing a diagnostics interface Once a thread from the worker pool becomes free, it will execute the "do it now" -function. In the example above this is ```perform op t``` where ```op``` is -```VM_start vm``` and ```t``` is the Task. The function +function. In the example above this is `perform op t` where `op` is +`VM_start vm` and `t` is the Task. The function [perform](https://github.com/xapi-project/xenopsd/blob/30cc9a72e8726d1e7501cd01ddb27ced6d53b9be/lib/xenops_server.ml#L1194) has fragments like this: + ``` | VM_start id -> debug "VM.start %s" id; @@ -196,27 +203,27 @@ has fragments like this: VM_DB.signal id ``` -Each "operation" (e.g. ```VM_start vm```) is decomposed into "micro-ops" by the +Each "operation" (e.g. `VM_start vm`) is decomposed into "micro-ops" by the function [atomics_of_operation](https://github.com/xapi-project/xenopsd/blob/30cc9a72e8726d1e7501cd01ddb27ced6d53b9be/lib/xenops_server.ml#L736) where the micro-ops are small building-block actions common to the higher-level operations. Each operation corresponds to a list of "micro-ops", where there is no if/then/else. Some of the "micro-ops" may be a no-op depending on the VM configuration (for example a PV domain may not need a qemu). In the case of -```VM_start vm``` this decomposes into the sequence: +`VM_start vm` this decomposes into the sequence: 1. run the "VM_pre_start" scripts --------------------------------- -The ```VM_hook_script``` micro-op runs the corresponding "hook" scripts. The +The `VM_hook_script` micro-op runs the corresponding "hook" scripts. The code is all in the [Xenops_hooks](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/lib/xenops_hooks.ml) -module and looks for scripts in the hardcoded path ```/etc/xapi.d```. +module and looks for scripts in the hardcoded path `/etc/xapi.d`. 2. create a Xen domain ---------------------- -The ```VM_create``` micro-op calls the ```VM.create``` function in the backend. +The `VM_create` micro-op calls the `VM.create` function in the backend. 
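
The shape of this call is roughly "pattern-match on the micro-op, then call the corresponding function of whichever backend module is in use". A highly simplified sketch follows; the real `perform_atomic` handles many more cases and threads a Task through every call, and the module and constructor names here are illustrative only.

```ocaml
(* Simplified sketch of dispatching micro-ops to a backend module. *)
module type BACKEND = sig
  val vm_create : string -> unit
  val vm_build : string -> unit
end

module Dummy_backend : BACKEND = struct
  let vm_create id = Printf.printf "backend: create domain for %s\n" id
  let vm_build id = Printf.printf "backend: build domain for %s\n" id
end

type atomic = VM_create of string | VM_build of string

let perform_atomic (module B : BACKEND) = function
  | VM_create id -> B.vm_create id
  | VM_build id -> B.vm_build id

let () =
  List.iter
    (perform_atomic (module Dummy_backend : BACKEND))
    [ VM_create "vm0"; VM_build "vm0" ]
```
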
In the classic Xenopsd backend the [VM.create_exn](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L633) function must @@ -253,7 +260,7 @@ function must 1. run pygrub (or eliloader) to extract the kernel and initrd, if necessary 2. invoke the *xenguest* binary to interact with libxenguest. -3. apply the ```cpuid``` configuration +3. apply the `cpuid` configuration 4. store the current domain configuration on disk -- it's important to know the difference between the configuration you started with and the configuration you would use after a reboot because some properties (such as maximum memory @@ -284,7 +291,7 @@ and configures VBDs and VIFs are said to be "active" when they are intended to be used by a particular VM, even if the backend/frontend connection hasn't been established, -or has been closed. If someone calls ```VBD.stat``` or ```VIF.stat``` then +or has been closed. If someone calls `VBD.stat` or `VIF.stat` then the result includes both "active" and "plugged", where "plugged" is true if the frontend/backend connection is established. For example xapi will @@ -305,12 +312,12 @@ can be assured that the disks are now free to be reassigned. ------------------------------ A non-persistent disk is one which is reset to a known-good state on every -VM start. The ```VBD_epoch_begin``` is the signal to perform any necessary reset. +VM start. The `VBD_epoch_begin` is the signal to perform any necessary reset. 6. plug VBDs ------------ -The ```VBD_plug``` micro-op will plug the VBD into the VM. Every VBD is plugged +The `VBD_plug` micro-op will plug the VBD into the VM. Every VBD is plugged in a carefully-chosen order. Generally, plug order is important for all types of devices. For VBDs, we must work around the deficiency in the storage interface where a VDI, once attached @@ -323,11 +330,11 @@ The function [VBD.plug](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L1631) will -- call ```VDI.attach``` and ```VDI.activate``` in the storage API to make the +- call `VDI.attach` and `VDI.activate` in the storage API to make the devices ready (start the tapdisk processes etc) - add the Xenstore frontend/backend directories containing the block device info -- add the extra xenstore keys returned by the ```VDI.attach``` call that are +- add the extra xenstore keys returned by the `VDI.attach` call that are needed for SCSIid passthrough which is needed to support VSS - write the VBD information to the Xenopsd database so that future calls to *VBD.stat* can be told about the associated disk (this is needed so clients @@ -363,7 +370,7 @@ Again, the order matters. Unlike VBDs, there is no read/write read/only constraint and the devices have unique indices (0, 1, 2, ...) *but* Linux kernels have often (always?) ignored the actual index and instead relied on the order of results from the -```xenstore-ls``` listing. The order that xenstored returns the items happens +`xenstore-ls` listing. The order that xenstored returns the items happens to be the order the nodes were created so this means that (i) xenstored must continue to store directories as ordered lists rather than maps (which would be more efficient); and (ii) Xenopsd must make sure to plug the vifs in @@ -397,7 +404,7 @@ the VM might execute without all the port locking properly configured. 9. 
create the device model -------------------------- -The ```VM_create_device_model``` micro-op will create a qemu device model if +The `VM_create_device_model` micro-op will create a qemu device model if - the VM is HVM; or - the VM uses a PV keyboard or mouse (since only qemu currently has backend support for these devices). @@ -447,7 +454,7 @@ suddenly restarted. It guarantees to always leave the system in a valid state, in particular there should never be any "half-created VMs". We achieve this for VM start by exploiting the mechanism which is necessary for reboot. When a VM wishes to reboot it causes the domain to exist (via SCHEDOP_shutdown) with a -"reason code" of "reboot". When Xenopsd sees this event ```VM_check_state``` +"reason code" of "reboot". When Xenopsd sees this event `VM_check_state` operation is queued. This operation calls [VM.get_domain_action_request](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L1443) to ask the question, "what needs to be done to make this VM happy now?". The @@ -466,6 +473,7 @@ this is a separate "operation" queued by the client (such as xapi) after the VM.start has completed. The function [VM.unpause](https://github.com/xapi-project/xenopsd/blob/b33bab13080cea91e2fd59d5088622cd68152339/xc/xenops_server_xen.ml#L808) is reassuringly simple: + ``` if di.Xenctrl.total_memory_pages = 0n then raise (Domain_not_built); Domain.unpause ~xc di.Xenctrl.domid;