Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Hardware Devices #5523

Closed
wants to merge 4 commits into from

Conversation

Projects
None yet
8 participants
@mareklibra
Copy link
Contributor

commented Dec 1, 2016

New 'devices' plugin is introduced.

The user can list hardware devices (like network or graphics
cards) installed on the system and perform basic management
tasks like (un)binding a driver.

Feature page: https://github.com/cockpit-project/cockpit/wiki/Feature:-Hardware-Devices

  • event-driven refresh
  • IOMMU Group
  • Localization
  • Integration tests, image refresh
  • USB
  • design

Depends on:

@mareklibra mareklibra changed the title WIP: Devices WIP: Hardware Devices Dec 1, 2016

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Dec 1, 2016

First screenshot:
hardwaredevices

@mareklibra mareklibra force-pushed the mareklibra:devices branch 2 times, most recently from 2af2ef1 to 8806a6e Dec 1, 2016

@stefwalter

This comment has been minimized.

Copy link
Contributor

commented Dec 1, 2016

This is awesome work.

@andreasn Do you have time to look over the design here and give rough guidance.

I've done some code review, but some fundamental things this needs:

  • A way to monitor for changes without polling, either using udev, a socket or file descriptor.
  • Unit tests for the various middleware
  • Integration tests for making sure things work across the operating systems.
Makefile.am Outdated

WEBPACK_MAKE = NODE_ENV=$(NODE_ENV) SRCDIR=$(srcdir) BUILDDIR=$(builddir) \
$(srcdir)/tools/missing $(srcdir)/tools/webpack-make
$(srcdir)/tools/missing $(srcdir)/tools/webpack-make

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

This change is unnecessary.

<div className='btn-group'>
<button className={`btn btn-default ${isActive('pci')}`} onClick={() => deviceActions.onPciClassSelected(null)}>PCI</button>
<button className={`btn btn-default ${isActive('scsi')}`}>SCSI</button>
<button className={`btn btn-default ${isActive('usb')}`}>USB</button>

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

These words need to be translated like this >{_("USB")}</button>

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

I'll mark each block where I find untranslated strings. At the top of the file you can have:

const _ = cockpit.gettext

or equivalent.

};

export default CONFIG;

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

Extra new line.

display: block;
}

.content-header-extra {

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

This is old style CSS. Could you use the .content-filter stuff you see in pkg/docker/docker.css instead? Sorry for the confusion. We should soon migrate that to a reusable component with reusable CSS, but for now lets just copy and paste it.

return deferred.promise;
}

export function spawnScript(script, failCallback) {

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

Fundamental question. How are we notified when we need to spawn the scirpt again or run a command again. How do we know when something changes. Do we listen for udev events or similar? Do we need a way of being notified when an FD polls readable?

@@ -0,0 +1 @@
../../lib/react-lite-cockpit/dist/react-lite.js

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

This file should not be necessary.

// --- compatibility hack for PhantomJS
if (typeof Object.assign != 'function') {
Object.assign = function (target) {
'use strict';

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

All javascript compiled with webpack should be strict. Why is this line needed?

* @param doneCallback
*/
function readSysfsFile({ path, mustExist = false, doneCallback = content => {}}) {
cockpit.script(`cat ${path}`, [], {superuser: 'try', err: 'out'})

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

Again ... needs LC_ALL=C or LANG=C


readSysfsFile({path: `${DEVDIR}/vendor`, mustExist: true, doneCallback: vendor => {
readSysfsFile({path: `${DEVDIR}/class`, mustExist: true, doneCallback: devClass => {
readSysfsFile({path: `${DEVDIR}/device`, mustExist: true, doneCallback: device => {

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

Can you read these in parallel rather than one after the other. See cockpit.all() or ES6 Promise.all

@@ -41,14 +41,16 @@ require('./listing.css');
* to view expanded item details, if not set, navigation isn't available
* listingDetail optional: text rendered next to action buttons, similar style to the tab headers
* listingActions optional: buttons that are presented as actions for the expanded item
* expanded optional: the entry will be initially rendered as expanded, changes after component creation do not take effect

This comment has been minimized.

Copy link
@stefwalter

stefwalter Dec 1, 2016

Contributor

These changes to the shared component should be a separate pull request, reviewed separately. Could you split them out?

This comment has been minimized.

Copy link
@mareklibra

mareklibra Dec 2, 2016

Author Contributor

Sure, I'll just wait a little bit to be sure I'll not need anything else

@stefwalter stefwalter added the needswork label Dec 1, 2016

@andreasn

This comment has been minimized.

Copy link
Contributor

commented Dec 2, 2016

Do you have time to look over the design here and give rough guidance.

Sure, I'll take a look.

@andreasn andreasn self-assigned this Dec 2, 2016

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Dec 5, 2016

There's an open question how to deal with the data refresh or reaction on active user actions, like (un)bind.

First of all, these changes are expected to be rare and full data reload is not an expensive operation when performed with reasonable delay.
From this perspective, simple polling with interval set to something between 30 and 120 seconds together with refresh initiated by an active operation, shall be working well - reasonable UI reaction time and without any significant system overhead.

Anyway, event-driven refresh will bring even higher responsiveness and efficiency.
There are multiple options how to achieve that on udev based system, some of them:

"udevadm monitor -u"
Seems like best option.

The plugin would spawn new 'udevadm monitor -u' process and keep listening for its output.
The output does not need to be parsed but signals full refresh using lspci, lsusb and sysfs, etc.

No more than one refresh per a time-period is allowed, i.e. at most once per 5 seconds.

The udevadm is restarted if exited for whatever reason.

DBus
Can anyone recommend (stable) DBus for udev?
It would be great if one can be reused. Otherwise disadvantages of its new implementation outweigh advantages.

udev rule on netlink message
Either stack messages to a file and monitor for changes from JS. So that is what 'udevadm monitor' does.
Or run a process - which is probably not handy for our case.

Listen on netlink socket
Too low-level for use in Cockpit, I think.

Monitor for sysfs changes
Complicated in comparision to other approaches.

Whatever "event provider" will be used, there must be involved manipulation with the hardware database.
For this reason, the 'lspci' or 'lsusb' are recently used for majority of data retrieval (plus sysfs for not provided data).

Please let me know your thoughts before I'm going to implement it.
Thanks.

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Dec 7, 2016

Still WIP but added:

  • listening to udev messages for changes
  • localization via gettext + localized device class names
  • IOMMU Groups are working
  • bunch of smaller UI enhancements and fixes
  • first integration test covering rendering and real data load
@stefwalter

This comment has been minimized.

Copy link
Contributor

commented Dec 12, 2016

"udevadm monitor -u" Seems like best option.

Agree. Could you combine it with: cockpit.hidden and cockpit.onvisibilitychange

http://cockpit-project.org/guide/latest/cockpit-location.html#cockpit-jump-hidden

So that the udevadm monitor is only running while the page is visible?

@mareklibra mareklibra force-pushed the mareklibra:devices branch from 6a05c49 to 623cd74 Jan 6, 2017

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Jan 6, 2017

Support for the USB bus is added.

Due to the nature of USB, the devices are rendered in a tree.
Recently not many attributes are identified to be displayed for the user, so they fit as columns in the tree-grid component.

Any idea for different UI data structure?

If the device is a hub and has connected sub-devices, corresponding row can be expanded.
Initially, tree is expanded up to the 2nd level, the rest is collapsed by default.

Please see following screenshot.

Any other attributes are worth to display?

@andreasn or @garrett, can you please have a look from the UX perspective?
Some notes:

  • Definitely, recent grid-tree can look better.
  • Expandable rows shall be highlighted - recently 'path' is in bold.
  • What about the PCI-USB switch on the top? (Additional buses will follow)

Data retrieval is paused if the component is not visible (cockpit.hidden).

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Jan 6, 2017

cockpit_devices_usb

@mareklibra mareklibra force-pushed the mareklibra:devices branch from 623cd74 to 9c70c1d Jan 30, 2017

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Jan 30, 2017

Rebased

@mareklibra mareklibra force-pushed the mareklibra:devices branch 3 times, most recently from f33f02c to 402c5d8 Jan 30, 2017

mareklibra added some commits Nov 4, 2016

devices: Initial commit of new 'devices' package
New 'devices' plugin is introduced.

The user can list hardware devices (like network or graphics
cards) installed on the system and perform basic management
tasks like (un)binding a driver.

Feature page: https://github.com/cockpit-project/cockpit/wiki/Feature:-Hardware-Devices
devices: Integration tests
Initial commit for integration tests of the 'devices' package.

@mareklibra mareklibra force-pushed the mareklibra:devices branch from 402c5d8 to baaa579 Jan 30, 2017

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Jan 31, 2017

@andreasn, any feedback, please?

@garrett

This comment has been minimized.

Copy link
Contributor

commented Jan 31, 2017

Hi @mareklibra, Andreas and I were looking over the features document and were trying to understand the various personas written on the page. I have attempted to summarize each with a heading, but really did not understand the last one. (I think I might've guessed the main task each person was attempting to accomplish for the rest of the stories, however?)

This is the one in particular I really do not understand: https://github.com/cockpit-project/cockpit/wiki/Feature:-Hardware-Devices#make-the-device-lists-in-the-vm-look-like-the-host-

Please also look at the other headline summaries in the user stories too @ https://github.com/cockpit-project/cockpit/wiki/Feature:-Hardware-Devices#user-stories — thanks!

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2017

@garrett , Please have a look at the Use Cases written bellow the User Stories section, they are derived from them and might help you to better understand.
I have updated the Feature Page to conform the following notes:

In the Frank's story, the user needs to see how devices are classified to particular IOMMU Groups (if supported on the system).
With Adam and Frank's stories, we have "View a list of recognized HW by Class or by IOMMU".
In the recent implementation, there's selection of 'Class | IOMMU Group | List' in the top right corner switching the views on the same data.

In the James story, the headline is correct based on the story.
But it's good to consider more general side effect - the user is able to change driver for the device. Not only to (un)bind just the VFIO driver.

The Suzan's story shows that some devices might be configured. In her particular case, she has SR-IOV networkcard and wants to spawn virtual functions.
It means, she requests the device (network card with this capability) to spawn N virtual functions leading to creation of N new (virtual) devices - after this operation the system will have N new network cards. Each of them might be either assigned to a VM or used more or less as any other network card in the system.
The corresponding Use Case is about "configuration of a device". There are not many options how/what to configure, but SR-IOV network cards are the most visible and useful example in the case of server environment.
Another example are vGPUs which are "nice-to-have". Just mentioning it because they cover similar UI flow.

The B.J.'s story is about user's understanding of NUMA topology of the (physical) machine - what devices/memory/CPUs are collocated.
With this knowledge, the user is able to better pin the HW to a particular VM, mostly in high-performance environment.
The User Story is read-only from the cockpit:devices perspective.
I would leave the actual HW pining on other tooling for this moment, the value of this feature is in presentation of the actual NUMA topology.
This feature is useful but it's userbase is not so big as in the previous use cases, so I skipped that from the initial implementation.

The stories above are related to PCI devices.

There's already implementation of USB bus view and SCSI will follow later.
Nature of PCI and USB is completely different and I think it's worth not to mix these two kinds in a single list but provide sort of selection of particular bus.

In case of the USB bus, I would really appreciate help with designing it's representation. So far I think about a tree, which is the way how the USB devices are connected to each other in the real system.
To achieve that, so far I used a tree-grid representation allowing me to show the few of useful attributes coming with an USB device. The tree-grid rows can be collapsed to (un)hide particular branches.

For sure, the design of "branches" is not good enough now, designer's help is really required and I would like to ask you for it.
Anyway, I don't stick to the tree-grid - I'm open to any other (means original :-) ) suggestion.

Thanks for your help.

@martinpitt

This comment has been minimized.

Copy link
Member

commented Feb 7, 2017

I hate to be the new guy who carps at your work, but IMHO the scope of this should be reduced to viewing devices, at least in the first iteration, for the following reasons:

  • This is dangerously widely scoped to become a generic "device manager" for Linux, an approach that already has failed several times (does anyone still remember HAL and hal/gnome-device-manager?)This was already a futile attempt as a local desktop application, and it won't get any easier as a (potentially remote) web app. Device handling under Linux is still far from being consistent and uniform, and every now and then new types of hardware or buses arrive which need different treating once again.

  • However, displaying a nicely formatted list/tree of PCI/USB devices is narrow enough in scope for one PR, and devices in each class are reasonably similar wrt. their properties. It's also useful to display (if possible) if a device is currently being used by the host, or if it's available for access from a VM. All that and listening to uevents works without special privileges and is safe. Other classes of devices should either get their own module (e. g. Cockpit already handles block devices, and we have network management), or should then be designed/added as a separate step.

  • IMO offering buttons for driver unbinding/rebinding or possibly even driver assignment is way too dangerous for a web UI. One must really know what one is doing with this, it's very risky and will quite possibly result in kernel or hardware crashes, or at least losing your network connection. Someone with that skill should rather do this on the command line. This is by far not a common thing to do under Linux, and we shouldn't give the impression in Cockpit that this was a sanctioned thing in general. This kind of operation could maybe become a third-party plugin in some extra repository?

  • Assigning devices to a VM (if they aren't being used by the host) is an useful, common, and reasonably safe operation. However, it works differently with every VM manager (QEMU, libvirt, VirtualBox, VMWare), you can't actually do this from outside (e. g. via D-Bus) at least in most cases, and this potentially conflicts with configuration from within the VM manager. Assigning a device to a container is much more uniform provided that they use the devices cgroup controller for this, but this is also not a given -- e. g. at least old LXC versions instead bind-mount the device into the container instead (not sure about recent versions or docker). In either case this seems to be too complex and brittle to do it from within Cockpit IMHO.

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Feb 9, 2017

@garrett

This comment has been minimized.

Copy link
Contributor

commented Feb 15, 2017

@mareklibra: Do you have an example list of the devices on a computer? I'd like to work on a design with actual, typical data that's used on a server.

Would the outputs of lspci and lsusb be enough (I don't see a way to get SCSI info)? Or is there a better way to get the hardware info (like poking around in /proc/)?

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Feb 16, 2017

Hi @garrett , the best source is probably the /sys filesystem. Anyway - it's not well human readable.

Reasonable subset of /sys can be retrieved via lsusb or lspci, try to call them without arguments or with either -t or -vvv (do not mix -t and -vvv in one call).

Or try lshw -short .
For SCSI try lsscsi .

Driver info can be retrieved partially from the above commands as well or better from /sys/bus/pci/drivers/ or lsmod .

@fabiand

This comment has been minimized.

Copy link
Contributor

commented Feb 16, 2017

I think that /sys would be a good low-level source.
But most easy to parse and pretty complete seems to be lshw -xml, it also provides a host cnetric view of the hardware.
Just my 2ct.

@martinpitt

This comment has been minimized.

Copy link
Member

commented Feb 16, 2017

lspci and lsusb are tailored for these buses and more readable than plain /sys. If you want to look at all devices as Linux sees them and userspace picks them up, udevadm info --export-db is also fairly useful.

@michalskrivanek

This comment has been minimized.

Copy link
Contributor

commented Feb 23, 2017

design-wise, it's important to keep in mind that on enterprise-level hosts the list of devices may be enormous, with SR-IOV getting more popular we can have hundreds or thousands of block devices and NICs, SCSI endpoints. They may not necessarily make sense to show, but we need to count with that internally to be able to handle them.

@garrett

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2017

@michalskrivanek: I agree with you about how it's important to make a design flexible enough to work on computers with a few devices all the way to one with a bunch of devices. Do you have access to one (or more) of these enterprise-level machines? If so, I would really appreciate it if you could install the lshw command and running the following (and send me the output):

sudo lshw -sanitize -html > hardware-`hostname`.html

The above command outputs hardware in a (scrubbed for serial numbers) list that I can use in building the design. (The XML version is too verbose; the HTML output has all the important information without having to deal with dozens of levels of nodes, and I built a small tool to parse the HTML output.)

BTW: I'm also happy to receive the output of anyone else's computer to make sure the design I'm working on is flexible enough.

@garrett

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2017

@michalskrivanek, @mareklibra: Do you think it makes sense to separate real hardware from virtual hardware used for virtual machines, and treat each a little differently?

@michalskrivanek

This comment has been minimized.

Copy link
Contributor

commented Feb 28, 2017

@garrett: it would, but I do not think it is possible from host point of view. With additional metadata from oVirt or other management system you can know which devices are configured for passthrough, but on the host you don't, SR-IOV has legitimate use case on real hardware.
It can perhaps be differentiated by analyzing running/defined VMs on this host....not sure if it helps much and also not sure if it wouldn't be too expensive to get.

But it would make sense in general to differentiate devices with virtual functions and without them. (maybe it is confusing, virtual functions in SR-IOV doesn't have anything to do with machine virtualization)

@mpolednik

This comment has been minimized.

Copy link

commented Feb 28, 2017

@garrett attaching few lshw's from somewhat advanced machines:

  • sriov-host - relatively modern enterprise host with SR-IOV VFs spawned ,
  • casual-host - quite old enterprise box,
  • big-host - very modern workstation w/ multiple GPUs.

They're html files masked as .txt because github.

big-host.txt
casual-host.txt
sriov-host.txt

None of these is able to spawn thousands of devices, I don't have access to such machines... but it's possible to fake it by copying e.g. the VFs on text level.

@garrett

This comment has been minimized.

Copy link
Contributor

commented Feb 28, 2017

@mpolednik: Thanks! I've been adapting the script to adjust for the new data and playing with the output.

It's kind of silly to show this information in screenshots, but I don't have the output ready to share in any other way yet... so here's a glimpse of my progress with displaying the output on a first-pass. (BTW: This is NOT considered to be a mockup or design reference.)

So, here's how to read the output: The left column is all the hardware information grouped by the description, which is basically the class of hardware. The middle column is grouped by a normalized ID (everything before the colon, then humanized for the output). The right column is only the USB, SCSI, and PCI devices.

The current information shown is NOT going to be the information shown in Cockpit (for example. the bus info probably won't be shown like this). I also don't have any expanded way to view the information, so I'm just simply showing it in the tables.

This experiment's purpose is to try to determine:

  1. what types of hardware information we want to show for each device
  2. what's the most relevant up-front (without drilling into an item to see extended details)
  3. how well this might scale from small computers to large computers
  4. are we interested just in bus-based devices, or do we want to show more?

I'll work to make this available in a non-cruddy way (again: PNGs are not the right way to show this) and start moving it toward a design we can use in Cockpit now that I have the data, can parse it, and am able to style things up in a Cockpit way.

One additional big benefit of posting screenshots of the HTML like this is that we can look at the "texture" of the data and get an idea which grouping might make sense, especially across computers.

My Banana Pi

Let's start with the simplest...

banana

Lenovo X230

This is my laptop.

x230

Casual

Results from the "Casual" host...

casual

Big

The hardware list is getting longer.

big

SRIOV

And here's the largest example.

sriov

@garrett

This comment has been minimized.

Copy link
Contributor

commented Mar 1, 2017

I've played around with the data a bunch since the screedshots, but it's still not in the right state naturally. Also, I worked on getting a URL to see resulting HTML (which is also not final) — but it's better to look at this stuff on a web page.

http://garrettlesage.com/cockpit-mockweb/hardware/

I still have three columns of different presentation and plan on removing two soon and then start to work on:

  1. trimming the data we don't need
  2. look into the expanded rows
  3. work on the presentation
@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Mar 14, 2017

Hi @garrett , any progress, please?

@andreasn

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2017

So I looked over the wiki page again and I was thinking that with regards to the Adam persona, it might make sense to refocus it so that it becomes necessary for him to use the GUI (right now it sounds like he knows the CLI, but just want a nicer looking system). One of the big target audiences for Cockpit is people who knows Windows Server, but doesn't know Linux, so it would be good to do something that is very focused on them.
I also want to make his task one very specific task, so what about changing a network card?

What about something like this?
"Adam is an administrator at a mid-size company. He is part of a small team responsible for the company IT infrastructure. Every once in a while they need to add or replace hardware in their servers, such as network cards, hard drives and (what is a third thing that is common to replace?). He has several years of experience doing administration on Windows Server machines, but he doesn't have a lot of Linux experience.
He needs to replace a network card in one of the servers. Once he plugged it in and replaced the card, he need to verify it's properly recognized by the system."

@andreasn

This comment has been minimized.

Copy link
Contributor

commented Mar 14, 2017

Oh, and what is a third thing that is common to replace that is not a hard drive or a network card on a server?

@garrett

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2017

@mareklibra: I was waiting for feedback, so I focused work on other tasks meanwhile.

One big sticking point is that the personas do not address what end goal people are actually trying to achieve. Sure, they're using technology to do something — what why? (Surely not just to play around with IOMMU or SR-IOV.)

And what hardware are people looking at and why are they looking at it? In the lists I shared, there's a LOT of hardware information; we certainly don't want to show all of that. (There's even more available too, but I pruned it down a bit.) There would be a disclosure to expand the item to view more information, but there's an info overload in the list as-is, even on each item.

Do people really care about all the hubs and busses or do are they mainly interested in the actual devices that do things (network, graphics, disks, etc.)? Does the bus even matter or is it really just an implementation detail? (You're never going to have a keyboard over SCSI or PCI... and on a server, you wouldn't think of using a USB disk.) Most non-admins think of hardware as what it can do for them — that is: you store something on a "disk" (whether that is a hard disk, SSD, a virtual disk over iSCSI, etc.), view something on a monitor, use a keyboard for input, CPU/GPU for processing, and so on. From my decades-old experience being an admin (almost a lifetime ago), the admins I knew also thought similarly... and if busses come up, it's just how a certain piece of hardware is attached. What about current admins today — does this still hold true?

And then, at what point is some of the information listed here better served in another part of Cockpit? (Example: We can have partition information here on the disks/volumes area... but perhaps it should just show an overview with a link to another part of Cockpit that's tailor made for storage.)

@garrett

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2017

@andreasn: On a server, I could envision people also upgrading RAM (filling banks, replacing old DIMMs with higher capacity ones) and adding/swapping graphics cards (more cards for parallel processing, upgrading to faster GPUs).

@andreasn

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2017

Is the consensus that we should reduce this PR to only the first use case of listing hardware for now? @mareklibra @martinpitt @stefwalter @garrett ?

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Mar 20, 2017

Sorry for not responding to the comments above earlier.

A couple of thoughts:

  • @andreasn, thanks for rephrasing the story of Adam. Please correct me, there's no semantical difference - the story leads to "List devices" use-case.
  • @garret, the story about SR-IOV is not about "play" but about "spawning virtual functions". In other words: sort of creating new HW (but virtual one). I can rephrase, to be more obvious. And sure, I can explicitely start from the user goal: To spawn virtual functions with more human-intelligible description.
  • @garret, in case of USB: I believe it makes sense to get understanding of whole hub-tree when resolving potential issues.

The HW info in the recent implementation is structured by Class. The classification is standardized and commonly used by admins same as by HW manufacturers.

I understand doubts about level of details to not discourage not-so-skilled users/admins. Anyway, clearly providing information whether particular HW is connected to PCI, USB or SCSI seems to be important to even non-advanced users. And in case of cockpit, we are talking about admins, mostly with "root" access.

I think it's fine to implement in small steps like with the first use case (list recognized HW). But I would like to see agreement about next steps already in early phase.
In more particular, we need the active operations in oVirt use cases. Of course, I can implement them elsewhere but I'm convinced they can be used by wider audience as well.

Active operations like the driver (un)bind are useful. And since their semantics on command-line is error prone, it makes sense to have them in GUI.
Sure, driver (un)bind is a very dangerous thing and in hindsight I would allow it for selected device classes only and with more descriptive waring what might happen.

In general, I don't see Cockpit just like a "display" but like a tool for administrators. And yes, administrators (should) have access to dangerous functionality. The role of the UI is to lead them and provide sufficient information for their decisions. Responsibility relies on a human.

@andreasn

This comment has been minimized.

Copy link
Contributor

commented Mar 20, 2017

@mareklibra yes, it does lead to the conclusion of "I want to list hardware", but the critical thing for me is how will it be listed. The rewritten story takes a couple of things into consideration:

  1. It can't rely on having certain parts of the workflow met by CLI-tools, since Adam in this case is completely unfamiliar with the CLI.
  2. It focuses only on servers, and not workstations, as I think addressing the needs of laptop users is outside the scope of Cockpit.
  3. Vocabulary needs to be adapted to what people are used to from other platforms.

These are considerations that are critical to how it will be designed.

@martinpitt

This comment has been minimized.

Copy link
Member

commented Mar 24, 2017

From my side, splitting this up into smaller steps would indeed be better for the actual landing and code review. But I think it makes sense to first discuss and agree to the whole plan and design here -- otherwise we may get stuck in a dead end.

Active operations like the driver (un)bind are useful. And since their semantics on command-line is error prone, it makes sense to have them in GUI.

That's the bit I still disagree with. IMHO exposing it in the GUI is much more error prone, as it's much easier to trigger it on a device you should't be doing this on. The chance that you "accidentally" unbind a device from the shell is virtually zero; the only relevant danger there is to use the wrong sysfs path, I guess that's what you mean? I still don't see a good general use case for that -- if it's a regular operation for VFIO, then perhaps let's limit un/rebinding to those?

@michalskrivanek

This comment has been minimized.

Copy link
Contributor

commented Mar 24, 2017

I think a concept of allowing/disallowing operations per device class makes sense. As long as it is configurable somehow...we do want to eventually provide even more fine grained control (e.g. allow it for NIC PCI device with no IP configured, or in our use case simply defined by oVirt-supplied metadata of which devices are designated for VM use and which are system's)
Many devices do not need anything but display, some are useful in general (e.g. virtual function generic setting), some are "special" where we would wrap it with oVirt functionality for e.g. extra config of https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt

I agree the chance of people screwing things up by mistake is real - safe starting point would definitely be to disallow the operations.

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2017

From the discussion above, following might be generally acceptable consensus:

  • meet requirements for the very basic user story of Adam (as rephrased by @andreasn)
  • for devices with (already) vfio driver bound, allow driver change as a single-step operation:
    • user selects new driver
    • user clicks "Change driver" button, leading to
      • unbind the vfio
      • bind new driver
      • rollback if failure: bind the vfio back (or prompt if error, highly unlikely)
    • this is not only more convenient for the user but safer as well

Follow-ups

There will be decision mechanism whether given device might be configured via

  • vGPU configuration
  • Virtual Functions configuration (spawning)
  • (or nothing for majority of devices)

The sysfs shall provide sufficient information for this decision without the need for additional configuration, I hope.

List Devices

It's still unclear how the list of devices will be represented (Adams story).

From my perspective, I would go the per-class split (see screenshot on the top of this PR).
The classification used in original implementation is commonly accepted and understood and reflects common sense like "See all network cards".

There shall be different views for various buses (PCI, USB, SCSI, ...)
The nature of buses is so different, leading to different sets of properties to render. It would be confusing to mix them.
Or any other option allowing to separate them at the first glance is fine.

In case of USB, the USB tree should be represented somehow since its important for

  • trouble shooting
  • performance optimization
  • power-related issues

The USB tree structure is the very nature of this bus which most users understand.
Example: Why is my USB-3 device which is connected to the USB-3 hub so slow? Because the root hub is 2.0.

Additions

List drivers/modules

I've also implemented list of drivers and related kernel modules.
This is mostly for advanced users for brief look, related troubleshooting most probably requires root shell.

As an use case for less-advanced admins, it's simple to get understanding what drivers are actually used and which devices are using it (some drivers are not bound to particular device class).

What about this functionality?

IOMMU Groups

See story of Frank.

IOMMU Groups are especially related to servers and their understanding is important for further performance tuning.

I would consider it as a more advanced use case, anyway already implemented.

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented May 4, 2017

Any comments, please? If no objection, I can adjust the PR accordingly.

@mareklibra

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2018

Leaving for now. It will be reimplemented in context of VMs, maybe taking snippets from here.

@mareklibra mareklibra closed this Feb 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.