Commits

ibm-io-acceler…

Commits on Sep 17, 2013

  1. Guest-side Paravirtual posted interrupts support

    Posted interrupts allow KVM running on one core (root mode) to
    inject a virtual interrupt into a guest running on another core (guest mode)
    without forcing a guest exit.
    
    Posted interrupts are not yet available in current hardware, but
    this patch implements this future hardware virtualization feature in
    software (para-virtual guest).
    
    As described in a previous patch, we extended the ELI mechanism to deliver
    exitless virtual interrupts. This patch implements the guest-side logic.
    When a pre-specified IPI (fixed vector number) is received by the guest,
    the guest kernel checks in a memory descriptor which interrupt
    handler (virtual vector number) needs to be called. KVM pre-specifies
    the virtual interrupt vector number in the shared descriptor before
    sending the posted interrupt IPI.
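
    The guest-side flow described above can be sketched in user-space C. This is
    only an illustration under stated assumptions: the descriptor layout
    (struct pv_pi_desc), the handler table and pv_pi_dispatch() are hypothetical
    names, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_VECTORS 256

/* Hypothetical layout of the descriptor shared between KVM and the guest. */
struct pv_pi_desc {
    volatile uint32_t virt_vector; /* written by the host before it sends the IPI */
};

typedef void (*irq_handler_t)(void);

/* Hypothetical per-vector handler table standing in for the guest's handlers. */
static irq_handler_t guest_handlers[NR_VECTORS];

/* Sample handler used only to show that dispatch happened. */
static int sample_hits;
static void sample_handler(void) { sample_hits++; }

/*
 * Called when the fixed posted-interrupt IPI vector fires in the guest.
 * Reads the virtual vector from the shared descriptor and runs its handler.
 * Returns the dispatched vector, or -1 if the vector is out of range or has
 * no registered handler.
 */
static int pv_pi_dispatch(const struct pv_pi_desc *desc)
{
    uint32_t vec = desc->virt_vector;

    if (vec >= NR_VECTORS || guest_handlers[vec] == NULL)
        return -1;
    guest_handlers[vec]();
    return (int)vec;
}
```

    The point of the sketch is that the IPI itself carries no payload: the
    vector to run is always read from memory that the host filled in first.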
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    96ec830
  2. ELI guest kernel module

    With ELI, the guest is run using a "shadow IDT" instead of the guest's
    requested IDT. This shadow IDT is built by KVM in a way that causes exits
    for some interrupts while running the guest's normal handlers on others.
    
    The processor requires that the IDT location be given as a
    virtual address; thus, the shadow IDT must always be mapped in the guest
    address space. To solve this issue, this patch introduces a new kernel
    module. When this module is loaded, we allocate a page (in the guest)
    and pass the corresponding address (GVA) to KVM. ELI can then use
    this address to prepare the shadow IDT.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    2d6b7c8
  3. Make vhost.ko a standalone module

    Previously, vhost.o was combined with net.o to form vhost_net.ko
    and with blk.o to form vhost_blk.ko. With this patch vhost.ko is
    a standalone module on which vhost_net.ko and vhost_blk.ko depend.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    6305e91
  4. Virtio in-kernel accelerator for block devices

    Vhost blk implementation written by Asias He.
    Code taken from https://github.com/asias/linux/tree/blk.vhost-blk
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    70427d1
  5. Add statistics to monitor vhost performance

    This patch introduces a set of statistics to monitor different performance
    metrics of vhost and our polling and I/O scheduling mechanisms.
    
    The statistics are exposed using debugfs and can be easily displayed with a
    Python script (vhost_stat)
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    ac14206
  6. Do not add a work item if a queue is being polled

    vhost stops processing a virtqueue once it has consumed all the descriptors (processed
    all the pending data). To avoid starvation, vhost also limits the amount of data
    that can be processed for a virtqueue. When the limit is reached, vhost stops
    processing the current virtqueue and switches to another virtqueue. vhost may
    not receive further notifications for the queue that was limited. Thus, to
    ensure that pending data on this queue will be processed later, vhost
    adds a new item to the work queue.
    
    If a queue is being polled by our mechanism, we don't need to add a new work
    item when we limit the queue. We just switch queues and continue polling
    as usual.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    b04e898
  7. Add heuristics to improve I/O scheduling

    This patch enhances the round-robin mechanism with a set of heuristics to
    decide when to leave a virtqueue and proceed to the next. The patch also
    introduces a set of kernel module parameters to configure these heuristics.
    
    The vhost generic layer exposes a new function that implements the heuristics.
    The concrete code (e.g. vhost-net or vhost-blk) is required to call this function
    to check when it should stop processing data.
    
    The heuristics work as follows:
    
    (1) We always leave a queue after doing a certain maximum
    amount of work on it, even if it is not yet empty. This limit is required
    to avoid starvation.
    (2) We may leave a queue earlier if we recognize that another non-empty queue
    or the work queue is stuck and therefore likely to be latency-sensitive.
    We call a queue stuck if a certain time has passed since it last received new
    work. A latency-sensitive workload which waits for replies before sending further
    requests will get "stuck" in this sense, while a high-throughput workload which
    continuously creates new requests will not be found stuck.
    (3) If we leave a queue early, before processing half of the maximum data, we
    move the queue to the head of the round-robin list. We use this condition to
    avoid degrading queues that were limited because other queues were stuck.
    (4) A bursty queue usually adds many items per burst. The time between
    bursts may be longer than the time we specified to detect stuck queues.
    If a queue has more items than the number specified in a module parameter, we
    never consider the queue stuck. We use this condition to avoid detecting a
    bursty queue as a stuck queue.
    (5) We do not leave a queue before we have done some minimum amount of work on it.
    This technique improves cache efficiency and limits the number of queue switches
    (important to improve throughput).
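
    The five heuristics can be expressed as three small predicates. The sketch
    below is a user-space illustration, not the patch's code: the parameter
    names in struct sched_params, the queue_state fields and all function names
    are assumptions, and the units (work items, nanoseconds) are chosen only for
    the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel module parameters. */
struct sched_params {
    uint32_t max_work;     /* (1) hard cap on work per visit to a queue */
    uint32_t min_work;     /* (5) minimum work before any switch is allowed */
    uint64_t stuck_ns;     /* (2) idle time after which a waiting queue is "stuck" */
    uint32_t bursty_items; /* (4) more pending items than this means bursty, never stuck */
};

struct queue_state {
    uint32_t pending;          /* items currently waiting in the queue */
    uint64_t last_new_work_ns; /* when the queue last received new work */
};

/* (2) + (4): non-empty, not bursty, and no new work for longer than stuck_ns.
 * Assumes a monotonic clock, i.e. now_ns >= q->last_new_work_ns. */
static bool queue_is_stuck(const struct queue_state *q, uint64_t now_ns,
                           const struct sched_params *p)
{
    if (q->pending == 0 || q->pending > p->bursty_items)
        return false;
    return now_ns - q->last_new_work_ns > p->stuck_ns;
}

/* (1), (5), (2): decide whether to stop processing the current queue. */
static bool should_leave_queue(uint32_t work_done, bool other_queue_stuck,
                               const struct sched_params *p)
{
    if (work_done >= p->max_work)
        return true;          /* (1) always leave at the cap */
    if (work_done < p->min_work)
        return false;         /* (5) never leave too early */
    return other_queue_stuck; /* (2) leave early only for a stuck queue */
}

/* (3): a queue left before half the cap goes back to the head of the list. */
static bool requeue_at_head(uint32_t work_done, const struct sched_params *p)
{
    return work_done < p->max_work / 2;
}
```

    A backend would call should_leave_queue() after each unit of work and, when
    it returns true, consult requeue_at_head() to pick the queue's new position.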
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    f6a4f1a
  8. Add virtqueue polling mode to vhost

    When vhost is waiting for buffers from the guest driver (e.g., more packets
    to send in vhost-net's transmit queue), it normally goes to sleep and waits
    for the guest to "kick" it. This kick involves a PIO in the guest, and
    therefore an exit (and possibly userspace involvement in translating this PIO
    exit into a file descriptor event), all of which hurts performance.
    If we can dedicate a core to vhost, we can have it continuously poll the
    virtqueues for new buffers, and avoid asking the guest to kick us.
    
    Polling on all virtqueues happens on the same worker thread, in round-robin
    fashion. Thanks to the previous patch, the virtqueues of multiple VMs may be
    polled on the same worker thread, which allows dedicating only one core to
    servicing the I/O from multiple vcpus.
    
    When polling is active for one of the virtqueues, the guest is asked to
    disable notification (kicks), and the worker thread continuously checks for
    new buffers. When it does discover new buffers, it simulates a "kick" by
    invoking the underlying backend driver (such as vhost-net), which thinks it
    got a real kick from the guest, and acts accordingly. If the underlying
    driver asks not to be kicked, we disable polling on this virtqueue.
    
    In this version, we start polling on a virtqueue when we notice it has
    work to do. Polling on this virtqueue is later disabled if 3 seconds of
    polling turn up no new work, as in this case we are better off returning
    to the exit-based notification mechanism. The default timeout of 3 seconds
    can be changed with the "poll_stop_idle" kernel module parameter.
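
    The start/stop policy for polling can be sketched as a small state machine.
    Only the parameter name poll_stop_idle and its 3-second default come from
    the text above; struct vq_poll, vq_poll_tick() and the nanosecond clock are
    assumptions made for this user-space illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Idle timeout in seconds; mirrors the "poll_stop_idle" module parameter. */
static unsigned int poll_stop_idle = 3;

/* Hypothetical per-virtqueue polling state. */
struct vq_poll {
    bool     polling;      /* true while the worker thread polls this queue */
    uint64_t last_work_ns; /* last time the queue was seen to have work */
};

/*
 * One polling step for a virtqueue. Returns true when the caller should
 * simulate a "kick" to the backend because new buffers were found.
 * Polling starts when work is seen and stops after poll_stop_idle seconds
 * without work, at which point guest notifications would be re-enabled.
 */
static bool vq_poll_tick(struct vq_poll *vq, bool has_work, uint64_t now_ns)
{
    if (has_work) {
        vq->polling = true;
        vq->last_work_ns = now_ns;
        return true;
    }
    if (vq->polling &&
        now_ns - vq->last_work_ns >= (uint64_t)poll_stop_idle * NSEC_PER_SEC)
        vq->polling = false;
    return false;
}
```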
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    2661613
  9. Share vhost thread between multiple devices

    Vanilla vhost creates a worker thread and worker queue per virtio device.
    This patch creates a worker thread and worker queue shared across multiple
    virtio devices (VMs). The maximum number of virtio devices per worker can be
    specified using a kernel module parameter.
    
    Every time a new virtio device is created, we search for a running worker
    thread that is serving fewer than the maximum number of devices. If we find
    such a worker thread, we bind the new virtio device to this thread. Otherwise,
    we create a new worker thread to serve the new virtio device and any further
    devices that may be created in the future.
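
    The binding policy amounts to a first-fit search. The following user-space
    sketch assumes a fixed-size worker table and a parameter named
    devs_per_worker; both are hypothetical, as is bind_device() itself.

```c
#include <stdbool.h>

#define MAX_WORKERS 16

/* Hypothetical bookkeeping for one shared worker thread. */
struct worker {
    bool running; /* a thread exists for this slot */
    int  ndevs;   /* devices currently bound to it */
};

static struct worker workers[MAX_WORKERS];

/* Hypothetical stand-in for the "maximum devices per worker" module parameter. */
static int devs_per_worker = 4;

/*
 * Bind a newly created device to a worker.
 * First fit: reuse a running worker that still has room; otherwise start a
 * new worker in the first unused slot. Returns the worker index, or -1 if
 * every slot is running and full.
 */
static int bind_device(void)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_WORKERS; i++) {
        if (workers[i].running && workers[i].ndevs < devs_per_worker) {
            workers[i].ndevs++;
            return i;
        }
        if (!workers[i].running && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    workers[free_slot].running = true; /* "create" the new worker thread */
    workers[free_slot].ndevs = 1;
    return free_slot;
}
```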
    
    Currently, once a device is bound to a specific worker thread it cannot be
    migrated to another worker thread. In the future, we should improve the
    mechanism to balance the devices across threads (migrate a device to a
    different worker thread) and to create/destroy threads dynamically during
    runtime based on the I/O activity.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    3dc6a3c
  10. Handle EOIs for injected virtual interrupts

    KVM traps and emulates EOIs for every virtual interrupt it injects.
    When we inject interrupts using the paravirtual posted interrupts mechanism,
    KVM still needs to emulate EOIs unless we enable exitless EOIs (requires
    x2apic).
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    5ae5620
  11. Enable paravirtual posted interrupts support

    Add a hypercall enabling/disabling virtual interrupt injection via
    paravirtual posted interrupts.
    
    When the enable hypercall is called, we check that the paravirtual posted
    interrupts mechanism was properly initialized and in this case we
    enable the mechanism for every VCPU.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    9e844af
  12. VMX implementation of PV posted interrupts

    This patch implements the two kvm_x86_ops introduced above:
    send_posted_interrupt and has_posted_interrupts.
    
    Every time KVM wishes to inject a virtual interrupt into a given VCPU,
    we check whether the VCPU is currently running on some core. If so,
    to avoid unnecessary exits, we try to inject the virtual interrupt using
    the paravirtual posted interrupts mechanism. If the VCPU is not running,
    we proceed as usual (queue the interrupt).
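
    The host-side decision can be sketched as follows. This is a user-space
    illustration only: struct vcpu_sketch, its fields and inject_virtual_irq()
    are hypothetical, and the IPI is represented by a counter rather than a real
    inter-processor interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

enum inject_path { INJECT_POSTED, INJECT_QUEUED };

/* Hypothetical per-VCPU state; not the fields used by the patch. */
struct vcpu_sketch {
    bool in_guest_mode;                /* VCPU is currently running on some core */
    bool pv_pi_enabled;                /* guest enabled PV posted interrupts */
    volatile uint32_t *injection_slot; /* shared memory the guest reads on the IPI */
    uint32_t queued_vector;            /* stand-in for the regular pending-interrupt state */
    unsigned int ipis_sent;            /* stand-in for sending the notification IPI */
};

/*
 * Inject a virtual interrupt. If the VCPU is running and the mechanism is
 * enabled, publish the vector in shared memory and "send" the IPI; otherwise
 * fall back to queueing the interrupt for delivery at the next entry.
 */
static enum inject_path inject_virtual_irq(struct vcpu_sketch *v, uint32_t vector)
{
    if (v->in_guest_mode && v->pv_pi_enabled && v->injection_slot) {
        *v->injection_slot = vector; /* must be visible before the IPI arrives */
        v->ipis_sent++;
        return INJECT_POSTED;
    }
    v->queued_vector = vector;
    return INJECT_QUEUED;
}
```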
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    bf0c33a
  13. Initialize paravirtual posted interrupts support

    The patch adds a hypercall for the guest to pass a pointer to a notification
    vector variable and an injection page, and initializes paravirtual posted
    interrupts support for all vcpus of this guest.
    It does not yet enable exitless injection with PV posted interrupts (this
    will be done in the next patch).
    
    The notification vector variable is used by the host to tell the guest
    which vector number will be used to deliver posted interrupts (set
    once during the initialization phase).
    The injection page is used by the host to tell the guest which virtual
    interrupt KVM wishes to inject each time a posted interrupt is sent
    (set every time a virtual interrupt is delivered using the paravirtual
    posted interrupts mechanism).
    
    Once the guest receives an interrupt that corresponds to the
    number stored in the notification vector variable, the guest reads
    the injection page to identify the virtual interrupt the host is
    asking to inject and calls the corresponding handler.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    17c2982
  14. Re-queue posted interrupts received in root mode

    Paravirtual posted interrupts allow injecting virtual interrupts into a
    running guest (without causing it to exit to root mode first).
    However, in rare cases, at the time we send a posted interrupt from core A
    the guest running on core B may exit for an unrelated reason, and the IPI
    we sent (from core A to core B) to signal the posted interrupt
    arrives in the host instead of the guest. In a previous patch we already
    reserved this vector in the host. This patch introduces the handler, which
    re-queues the injection using the traditional KVM injection mechanism
    (because the guest is not running anymore).
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    02a7783
  15. VMX fields for PV posted interrupts

    This patch defines the per-vcpu (vmx) fields required for virtual interrupt
    injection via Exitless IPIs.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    1c77bff
  16. Injection using PV posted interrupts

    This patch uses paravirtual posted interrupts for injecting virtual
    interrupts into guests running on another core, instead of using the
    traditional exit-causing IPI (kvm_vcpu_kick).
    The patch introduces new kvm_x86_ops methods which will be implemented in
    later patches.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    b2a5b63
  17. Reserve vector for paravirtual posted interrupts

    This patch is the first in a series of patches enabling exit-less
    virtual interrupt injection, by implementing paravirtual Posted-Interrupts
    using ELI.
    
    This feature allows KVM running on one core to inject an interrupt into
    a guest running on another core, without the guest exiting. This is done
    by choosing one interrupt vector (the "posted interrupts" PI vector) and a
    memory location. The host writes a vector to be injected to this memory
    location, and sends the PI vector with an IPI to the core running the guest.
    This guest is run with ELI to have this specific PI vector delivered
    directly at the guest, without causing an exit. However, now we need some
    cooperation at the guest (hence we call this feature paravirtual posted
    interrupts):
    The interrupt the guest received is the fixed PI vector, but upon getting
    it, it needs to read the agreed memory location, and run the handler for
    the vector written there.
    
    This first patch of the series reserves a vector to send Exitless IPIs.
    It also adds a new counter to KVM_STATS which shows how many virtual
    interrupt injections were done using Exitless IPIs. If we send an Exitless
    IPI to a running vcpu, in rare cases, the IPI might arrive in root mode
    (host) because an exit occurred while the IPI was being sent. Thus, we also
    add a handler in the host for the IPI and a counter in /proc/interrupts.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    d3f2797
  18. Add support for disabling halt exits

    If we dedicate a physical core per VCPU (to improve performance), there
    is no reason to force an exit when the guest executes HLT because the host
    will not schedule another VCPU on the same core. HLT exits increase
    the time the CPU spends in root mode (host context) and thus increase
    the chances that an assigned physical interrupt will arrive in root mode
    and not in guest mode. To achieve maximum performance, we strive to maximize the
    number of assigned physical interrupts delivered in guest mode, so it is better to
    avoid HLT exits if we dedicate a core per VCPU.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    73ed255
  19. Use VMX preemption timer

    With ELI, if a guest disables interrupts for a long time, it can also prevent
    the host from receiving the timer interrupt - and thus prevent the host
    from ever giving time slices to other guests.
    
    To avoid this security problem, we use the VMX Preemption Timer feature
    to force an exit after some time has elapsed, regardless of what the guest does
    (including disabling interrupts). To avoid unnecessary exits, the value of the
    preempt_timer kernel module parameter should be slightly higher than the
    host timer interrupt period (in cycles) - exits due to the preemption timer should
    not occur during normal execution, only in misbehaving guests (in the future,
    such guests should have ELI disabled for them).
    
    If preemption timer exits occur frequently (log warnings), then the specified
    preemption value might be too short or something might be wrong with the
    guest. Note the current default value corresponds to 200 timer interrupts per
    second on a 2.93GHz processor. In the future we should calculate this
    automatically.
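
    As a worked example of the arithmetic behind that default: a 2.93GHz
    processor taking 200 timer interrupts per second executes
    2,930,000,000 / 200 = 14,650,000 cycles between ticks. The helper names
    below and the 10% margin are assumptions for illustration; the patch's
    actual default value is not shown here, and any scaling the hardware
    applies to the preemption timer count is not modeled.

```c
#include <stdint.h>

/* Cycles between two host timer interrupts at the given clock and tick rate. */
static uint64_t cycles_per_tick(uint64_t cpu_hz, uint32_t ticks_per_sec)
{
    return cpu_hz / ticks_per_sec;
}

/* Add a safety margin so the timer fires only if a host tick was missed. */
static uint64_t add_margin(uint64_t cycles, uint32_t margin_percent)
{
    return cycles + cycles * margin_percent / 100;
}
```

    With these helpers, cycles_per_tick(2930000000ULL, 200) yields 14650000,
    and a 10% margin raises that to 16115000 cycles.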
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    88a8190
  20. Handle descriptor-table exits

    ELI needs to trap guest changes to the IDTR, to prevent the guest from replacing
    the Shadow IDT enforced by the host. To do this, it must enable the
    "Descriptor-table exiting" bit of the VM-execution controls. Unfortunately,
    this bit enables exiting not only on IDTR changes, but also on changes of
    the unrelated LDTR, GDTR and TR registers, and when any of these exits come,
    we need to handle them appropriately.
    
    The appropriate way to handle IDTR changes is to rebuild the shadow IDT
    based on the instruction's operand, and the appropriate way to handle LDTR,
    GDTR and TR exits is to turn off descriptor-table exiting (i.e., go to
    injection mode), and run the guest for a while until it exits - letting it
    handle these instructions normally (which is fine - KVM doesn't normally
    need to trap and emulate these instructions).
    
    In this version, however, we didn't do all of this yet. LDTR and
    TR changes are handled correctly (via injection mode), but for GDTR and
    IDTR (which share the same exit reason, so separating them requires further
    decoding), we just disable ELI, which is a valid, although drastic, solution.
    We note that Linux guests rarely change any of these registers after boot,
    so provided that ELI is only enabled after boot (by a command doing the
    hypercall enabling ELI), this case will rarely happen anyway, and if it
    does, a warning message can be noticed.
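
    The policy implemented in this version can be summarized as a two-way
    dispatch on the exit reason. The numeric values 46 and 47 are the
    architectural basic exit reasons for GDTR/IDTR and LDTR/TR accesses; the
    macro names follow later mainline kernels and may differ from the names
    this patch series adds to vmx.h. The enum and function are hypothetical.

```c
#include <stdint.h>

/* Architectural VMX basic exit reasons (names assumed, values per the SDM). */
#define EXIT_REASON_GDTR_IDTR 46
#define EXIT_REASON_LDTR_TR   47

enum desc_exit_action {
    DESC_EXIT_INJECTION_MODE, /* let the guest run the instruction natively */
    DESC_EXIT_DISABLE_ELI,    /* drastic fallback: turn ELI off for this guest */
    DESC_EXIT_NOT_DESCRIPTOR  /* some other exit reason; not handled here */
};

/*
 * Policy of this version: LDTR/TR accesses are handled through injection
 * mode, while GDTR/IDTR accesses (not decoded further) disable ELI.
 */
static enum desc_exit_action on_descriptor_table_exit(uint32_t exit_reason)
{
    switch (exit_reason) {
    case EXIT_REASON_LDTR_TR:
        return DESC_EXIT_INJECTION_MODE;
    case EXIT_REASON_GDTR_IDTR:
        return DESC_EXIT_DISABLE_ELI;
    default:
        return DESC_EXIT_NOT_DESCRIPTOR;
    }
}
```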
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    8b2d824
  21. Hypercalls for enabling/disabling ELI

    This patch introduces hypercalls to start and stop ExitLess Interrupt (ELI)
    delivery. In a previous patch we already introduced hypercalls for
    starting and stopping ExitLess Interrupt completion.
    
    When ELI is enabled, we prepare the shadow IDT and set all the entries
    corresponding to non-assigned interrupts as not present.
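
    Preparing the shadow IDT can be sketched as a copy of the guest IDT with the
    present bit cleared for non-assigned interrupt vectors. The struct follows
    the 64-bit x86 gate descriptor layout; build_shadow_idt(), the assigned[]
    array and the choice to leave exception vectors 0-31 untouched are
    assumptions for this user-space illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS        256
#define FIRST_EXTERNAL_VECTOR 32 /* vectors below this are CPU exceptions */
#define GATE_PRESENT      0x80u  /* P bit in the gate's type/attribute byte */

/* 64-bit mode IDT gate descriptor (16 bytes). */
struct idt_gate {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
};

/*
 * Build the shadow IDT: copy every guest gate, then clear the present bit
 * for external-interrupt vectors that are not assigned to the guest, so
 * that delivering them raises an NP exception instead of running a handler.
 */
static void build_shadow_idt(struct idt_gate *shadow,
                             const struct idt_gate *guest,
                             const bool *assigned)
{
    for (int v = 0; v < NR_VECTORS; v++) {
        shadow[v] = guest[v];
        if (v >= FIRST_EXTERNAL_VECTOR && !assigned[v])
            shadow[v].type_attr &= (uint8_t)~GATE_PRESENT;
    }
}
```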
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    65de636
  22. Exit-less EOI using x2APIC

    After previous patches introduced exit-less interrupt delivery (the guest
    receives assigned interrupts without exit), this patch introduces exit-less
    interrupt completion - i.e., allowing a guest to EOI an assigned interrupt
    without an exit.
    
    Our implementation relies on x2APIC, in which the APIC EOI register is a
    separate MSR, so we can use the MSR-Bitmap feature to avoid exits on EOI,
    while still having exits on a guest attempt to use other APIC registers.
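
    The MSR-bitmap manipulation can be illustrated in user-space C. The x2APIC
    EOI register is MSR 0x80B, and in the 4KB VMX MSR bitmap the write bitmap
    for MSRs 0x0-0x1FFF starts at byte offset 2048; a set bit means the access
    exits. The helper names are hypothetical and only low MSRs are handled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE       4096
#define MSR_BITMAP_WRITE_LOW  2048   /* write bitmap for MSRs 0x0..0x1FFF */
#define X2APIC_MSR_EOI        0x80Bu
#define X2APIC_MSR_ICR        0x830u

/* Start from "every MSR access exits". */
static void msr_bitmap_trap_all(uint8_t *bitmap)
{
    memset(bitmap, 0xFF, MSR_BITMAP_SIZE);
}

/* Clear the write-intercept bit so guest WRMSR to a low MSR does not exit. */
static void msr_bitmap_allow_write(uint8_t *bitmap, uint32_t msr)
{
    bitmap[MSR_BITMAP_WRITE_LOW + msr / 8] &= (uint8_t)~(1u << (msr % 8));
}

/* True if a guest WRMSR to this low MSR would cause a VM exit. */
static bool msr_write_exits(const uint8_t *bitmap, uint32_t msr)
{
    return (bitmap[MSR_BITMAP_WRITE_LOW + msr / 8] >> (msr % 8)) & 1u;
}
```

    Clearing only the EOI bit leaves writes to other APIC MSRs, such as the ICR
    at 0x830, intercepted as before.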
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    0ce6906
  23. EOI handling

    As described in a previous patch, ELI switches to injection mode every time
    KVM needs to inject a virtual interrupt. We need to switch back to the
    regular ELI mode once we finish the injection. Thus, right after the guest
    acknowledges a virtual interrupt that was delivered in injection mode, we
    switch back to ELI mode.
    
    Outside injection mode, i.e., on acknowledgment of an assigned interrupt, we
    have nothing special to do (we EOI on any exit anyway), and we just need
    to circumvent the regular EOI handling of KVM. Moreover, we'll show later
    a mechanism (x2APIC) for avoiding exits on EOI in this case.
    
    IMPORTANT NOTE: PV-EOI must be disabled (-cpu -kvm_pv_eoi); otherwise ELI will
    remain in injection mode for longer periods of time, causing external
    interrupt exits for any physical interrupt.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    ca433f2
  24. IDTR get and set

    With ELI enabled, the VMCS contains not the guest's chosen IDTR, but rather
    a shadow IDTR that points to our shadow IDT. In this patch we override the
    IDTR get and set functions to get/save the guest IDTR and not the shadow
    IDTR.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    36a7a78
  25. Handle NP exceptions

    In the previous patch, we saw that in ELI, outside injection mode,
    non-assigned interrupts cause an exit with an NP exception (while assigned
    interrupts do not cause an exit at all).
    
    In this patch, we handle these NP exceptions: We verify that it was really
    caused by an external interrupt, and if it was we re-generate the same
    interrupt, causing the host (Linux) to handle this interrupt as usual.
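
    One way to perform such a check is to decode the NP error code, sketched
    below. In the x86 error-code format, bit 0 (EXT) marks an event external to
    the program, bit 1 (IDT) marks that the index refers to an IDT gate, and
    bits 3 and up hold the index, which for an IDT reference is the vector.
    Whether the patch verifies the cause this way is an assumption;
    np_external_vector() is a hypothetical helper.

```c
#include <stdint.h>

#define NP_ERR_EXT 0x1u /* event was external to the program */
#define NP_ERR_IDT 0x2u /* index refers to a gate in the IDT */

/*
 * Decode an NP exception error code. Returns the interrupt vector if the
 * exception was raised while delivering an external interrupt through a
 * non-present IDT gate, or -1 for any other NP exception.
 */
static int np_external_vector(uint32_t error_code)
{
    const uint32_t want = NP_ERR_EXT | NP_ERR_IDT;

    if ((error_code & want) != want)
        return -1;
    return (int)((error_code >> 3) & 0xFFu);
}
```

    A return value of -1 would mean the NP exception is unrelated to ELI and
    should be reflected to the guest rather than replayed in the host.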
    
    To regenerate the interrupt that caused the NP exception exit, we
    use the INT X instruction (software interrupt).
    
    Alternatively, it is also possible to send a self-IPI to regenerate
    the interrupt.
    
    ELI changes the way KVM handles interrupt completion (EOIs). With ELI
    enabled, assigned interrupts could be delivered during guest mode execution
    and might remain in service (pending EOI) after an exit. Thus, we always
    acknowledge (EOI/ack_APIC_irq) on exit to complete pending interrupts and
    let the hardware continue raising new interrupts.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    a505da5
  26. When not in injection mode, trap NP exceptions

    When in ELI but not in injection mode (defined in the previous patch),
    all interrupts arrive at the guest, but the ones not assigned to the
    guest cause an NP exception (because a handler for it is not present in
    the shadow IDT).
    
    In this patch, when needed we have these NP exceptions cause exits, so
    the host can handle the non-assigned interrupts. Note that NP exceptions
    completely unrelated to ELI are also possible, and they will cause
    unnecessary exits, but the number of these is usually negligible.
    
    In the next patch, we will handle these NP exceptions.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    24ab2d5
  27. ELI injection mode

    ELI runs the guest using the shadow IDT and configures the processor to
    deliver physical interrupts directly to the guest (no exit). The shadow
    IDT causes NP exceptions (exits) for non-assigned interrupts. With ELI
    enabled, KVM cannot inject virtual interrupts from emulated devices
    (e.g. keyboard) because the injection will cause an NP exception.
    
    Thus, we introduce in this patch a special mode of operation we call
    "injection mode". During this mode, ELI configures the processor to exit
    on external interrupts and uses the original guest IDT for guest mode
    execution. ELI temporarily switches to injection mode each time KVM needs to
    inject a virtual interrupt. In a later patch we'll see that we switch
    off injection mode on the next exit, typically an EOI.
    
    IMPORTANT NOTE: PV-EOI must be disabled (-cpu -kvm_pv_eoi); otherwise ELI will
    remain in injection mode for longer periods of time, causing external
    interrupt exits for any physical interrupt.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    49b8b4d
  28. In device assignment, use eli_remap_vector

    This patch modifies device assignment to call kvm_arch_eli_remap_vector()
    telling it which host IRQ should be mapped to each interrupt vector that the
    guest chose for the device.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    e5d43ba
  29. Remap host-vector to guest-vector

    In device assignment, the guest believes it configures which interrupt
    vectors the device generates. However, what really happens is that the host
    chooses different vectors, ones that are available in the host, and when the
    host receives the *host* vector, it injects (forwards) the different *guest*
    vector into the guest.
    
    With ELI, the interrupt will arrive directly at the guest, but the interrupt
    vector received will be the *host* vector. So we need the shadow IDT to
    contain, in position host-vector, the handler that the guest IDT set for
    guest-vector.
    
    This patch provides a function, kvm_arch_eli_remap_vector(), which KVM's
    device assignment code will call (see the next patch) to remap a given
    host-chosen IRQ to the guest-chosen vector. The host IRQ is specified,
    not the vector, because device assignment only chooses the IRQ and the
    actual vector will only be chosen by the kernel later.
    
    If ELI is not yet enabled, kvm_arch_eli_remap_vector() only remembers the
    mapping but doesn't actually create the shadow IDT. This will be done when
    the eli_remap() function, also provided in this patch, is called after
    enabling ELI.
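
    The remember-then-apply behavior can be sketched as below. This simplified
    user-space model takes a host vector directly, whereas the real
    kvm_arch_eli_remap_vector() is described as taking a host IRQ; handler
    addresses stand in for full IDT gates. struct eli_sketch and the sketch_*
    functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 256
#define NO_MAPPING (-1)

/* Hypothetical, simplified ELI state. */
struct eli_sketch {
    bool      enabled;
    int       guest_vector_for[NR_VECTORS]; /* host vector -> guest vector */
    uintptr_t guest_idt[NR_VECTORS];        /* guest handler addresses */
    uintptr_t shadow_idt[NR_VECTORS];       /* what the CPU actually uses */
};

static void eli_sketch_init(struct eli_sketch *e)
{
    e->enabled = false;
    for (int v = 0; v < NR_VECTORS; v++) {
        e->guest_vector_for[v] = NO_MAPPING;
        e->guest_idt[v] = 0;
        e->shadow_idt[v] = 0;
    }
}

/* Apply all remembered mappings; meant to run once ELI has been enabled. */
static void sketch_apply_remaps(struct eli_sketch *e)
{
    for (int host = 0; host < NR_VECTORS; host++) {
        int guest = e->guest_vector_for[host];
        if (guest != NO_MAPPING)
            e->shadow_idt[host] = e->guest_idt[guest];
    }
}

/* Remember the mapping; touch the shadow IDT only if ELI is already on. */
static void sketch_remap_vector(struct eli_sketch *e,
                                uint8_t host_vector, uint8_t guest_vector)
{
    e->guest_vector_for[host_vector] = guest_vector;
    if (e->enabled)
        e->shadow_idt[host_vector] = e->guest_idt[guest_vector];
}
```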
    
    Note that ELI assumes the KVM device-assignment implementation. VFIO is not
    supported by this patch.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    9aeab0a
  30. Expose functions for accessing guest memory

    The functions to read and write to guest virtual addresses are currently
    private to x86.c, but in the next patch we will also need to use them in
    vmx.c, where ELI will need to read and write the shadow IDT.
    So this patch exports the necessary functions.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    6c6466d
  31. Shadow IDT initialization

    With ELI, the guest is run using a "shadow IDT" instead of the guest's
    requested IDT. This shadow IDT is built by KVM in a way that causes exits
    for some interrupts while running the guest's normal handlers on others.
    
    Unfortunately, the processor requires that the IDT location be given using a
    virtual address, so we need to cause the guest to map a spare page which can
    be used to hold the shadow IDT.
    
    This patch allows a guest to specify, via a new hypercall, a location for
    the shadow IDT in its own virtual memory space. When a guest calls this
    hypercall, ELI is initialized for this guest, and the shadow IDT is located
    in the given guest virtual address (GVA).
    
    Note that this only initializes ELI, but does *not* enable it. To enable
    ELI, it must be specified which physical interrupts should be assigned
    to the guest (arrive directly in the guest, without exit), and of course
    the shadow IDT needs to be built. This will be done in more patches below.
    
    IMPORTANT NOTE: In this implementation we offer no protection against a
    guest modifying the shadow IDT directly: The guest knows the GVA and GPA
    of the shadow IDT and can modify it maliciously in one of two ways - either
    by modifying its contents (writing to the page itself), or by changing its
    page tables (assuming EPT is being used) to make the same GVA point to a
    different GPA with different content. The ELI paper explains how both holes
    can be avoided - the first by trapping writes to the shadow IDT, and the
    second by using the IDTR limit or by periodically verifying that the guest
    hasn't changed the mapping - but none of these solutions is built into
    this patch set. Instead, it is recommended to either limit ELI's use to
    non-malicious guests, or make this a non-issue by pinning critical host
    interrupts to a core not running guests so that even malicious guests cannot
    cause these critical interrupts to be lost.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    cc92638
  32. Add missing bit and exit reasons to vmx.h

    This patch is the first in a series of patches enabling the "Exit-Less
    Interrupts" (ELI) feature for KVM on VMX. While ordinarily VMX causes
    an exit for every physical interrupt, ELI allows KVM to determine that
    certain interrupts should be handled directly in the guest, without exit.
    This can significantly improve device-assignment I/O performance, as
    the only remaining causes of exits - the interrupts and their completion -
    are eliminated. The ELI technique is explained and evaluated in more detail
    in the ASPLOS 2012 paper "ELI: Bare-Metal Performance for I/O Virtualization".
    
    ELI needs to enable the "Descriptor-table exiting" bit of the VM-execution
    controls, which will cause exits when the guest attempts to change the
    IDT pointer through LIDT (or makes changes to other descriptor tables).
    
    In this patch, we give a name in vmx.h to this bit, and to the two exit
    reasons which it leads to.
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    9d41fe3
  33. Add IBM copyright notice

    Add IBM copyright notice to all the files we modified
    to implement our Virtual I/O acceleration technologies
    
    Signed-off-by: Abel Gordon <abelg@il.ibm.com>
    abelg committed Sep 17, 2013
    bd00dfc

Commits on Apr 29, 2013

  1. Linux 3.9

    torvalds committed Apr 29, 2013
    c1be5a5

Commits on Apr 27, 2013

  1. Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
    
    Pull ARM SoC fix from Olof Johansson:
     "A late-arriving fix for musb on OMAP4, resolving an issue where the
      musb IP won't be clocked and thus not functional.  Small in scope,
      most of the lines changed is a longish comment."
    
    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
      ARM: OMAP4: hwmod data: make 'ocp2scp_usb_phy_phy_48m" as the main clock
    torvalds committed Apr 27, 2013
    4cbbd1d