Commit e4847e4

kernel modules: add a quick scull port from LDD3
Also:
* fix fops.c on both kernels:
  * 5.9: the out of space error code was 1 not 8
  * 6.6: for whatever reason we can't read the user buffer as before on the diagnostic print, it leads to segfault and oops
* create memfile.c which is like fops.c but of unlimited size
1 parent 3d84ecc commit e4847e4

File tree

16 files changed

+2133
-50
lines changed

README.adoc

Lines changed: 118 additions & 8 deletions
@@ -338,6 +338,14 @@ insmod /mnt/9p/out_rootfs_overlay/lkmc/hello.ko
338338

339339
and the new `pr_info` message should now show on the terminal at the end of the boot.
340340

341+
If you are developing the test script and the kernel module at the same time, some test scripts accept the kernel module path as their first argument, so you can directly run:
342+
343+
....
344+
/mnt/9p/rootfs_overlay/lkmc/scull.sh /mnt/9p/out_rootfs_overlay/lkmc/scull.ko
345+
....
346+
347+
and it will pick up both the test script and the kernel module from the host.
348+
341349
This works because we have a <<9p>> mount there setup by default, which mounts the host directory that contains the build outputs on the guest:
342350

343351
....
@@ -7682,11 +7690,19 @@ Bibliography: https://stackoverflow.com/questions/5970595/how-to-create-a-device
76827690

76837691
==== File operations
76847692

7685-
File operations are the main method of userland driver communication.
7693+
File operations are the main method of communication between userland and drivers. They rely on common filesystem system calls such as `read` and `write`.
7694+
7695+
Through `struct file_operations` drivers tell the kernel what it should do on filesystem system calls of <<pseudo-filesystems>>.
7696+
7697+
[[fops]]
7698+
===== fops.c
7699+
7700+
This example illustrates the most basic system calls: `open`, `read`, `write`, `close` and `lseek`.
76867701

7687-
`struct file_operations` determines what the kernel will do on filesystem system calls of <<pseudo-filesystems>>.
7702+
* link:kernel_modules/fops.c[]
7703+
* link:rootfs_overlay/lkmc/fops.sh[]
76887704

7689-
This example illustrates the most basic system calls: `open`, `read`, `write`, `close` and `lseek`:
7705+
In it we create a debugfs special file that behaves like a regular file, except that it is stored in memory for as long as the kernel module is loaded, and it has a fixed length of 4 bytes. Any longer `write` attempt simply gets truncated at the end:
76907706

76917707
....
76927708
./fops.sh
@@ -7699,11 +7715,6 @@ Outcome: the test passes:
76997715
0
77007716
....
77017717

7702-
Sources:
7703-
7704-
* link:kernel_modules/fops.c[]
7705-
* link:rootfs_overlay/lkmc/fops.sh[]
7706-
77077718
Then give this a try:
77087719

77097720
....
@@ -7714,6 +7725,14 @@ We have put printks on each fop, so this allows you to see which system calls ar
77147725

77157726
No, there is no official documentation: https://stackoverflow.com/questions/15213932/what-are-the-struct-file-operations-arguments
77167727

7728+
[[memfile]]
7729+
====== memfile.c
7730+
7731+
This example behaves the same as <<fops>>, except that the in-memory virtual file has unlimited size. In the kernel module we therefore have to do a bit of memory management and grow the buffer as needed.
7732+
7733+
* link:kernel_modules/memfile.c[]
7734+
* link:rootfs_overlay/lkmc/memfile.sh[]
7735+
77177736
[[seq-file]]
77187737
==== seq_file
77197738

@@ -9994,6 +10013,89 @@ See also:
999410013
* https://stackoverflow.com/questions/5429137/how-to-print-register-values-in-gdb/31340294#31340294
999510014
* https://stackoverflow.com/questions/24169614/how-to-show-all-x86-control-registers-when-debugging-the-linux-kernel-in-gdb-thr/59311764#59311764
999610015

10016+
[[scull]]
10017+
==== scull
10018+
10019+
This kernel module is a port of the scull example from LDD3. It was tested on LKMC e1834763088b8a7532b5fae800039de880471f2d + 1 with Linux kernel 6.8.12.
10020+
10021+
"Scull" is an acronym for "Simple Character Utility for Loading Localities". This expansion is mostly meaningless, but there you are.
10022+
10023+
Source code:
10024+
10025+
* link:kernel_modules/scull.c[]
10026+
* link:rootfs_overlay/lkmc/scull.sh[]
10027+
10028+
Create the devices and test them:
10029+
10030+
....
10031+
scull.sh
10032+
....
10033+
10034+
scull creates several character devices.
10035+
10036+
The most "basic" one is `/dev/scull0`, which acts a bit like an in-memory file, except that it has weird quantizations applied to it, so that you can't append as normal and it doesn't really look like a regular file. It is actually more like an object pool.
10037+
10038+
The original scull interface is very weird and would erase all data on write-only `O_WRONLY`, but not on read/write `O_RDWR`, which doesn't make much sense:
10039+
10040+
....
10041+
int scull_open(struct inode *inode, struct file *filp) {
10042+
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY)
10043+
scull_trim(dev); /* ignore errors */
10044+
....
10045+
10046+
We have modified that to the much more reasonable:
10047+
10048+
....
10049+
if ((filp->f_flags & O_TRUNC)) {
10050+
....
10051+
10052+
The old weird truncation condition makes the code hard to test: there is no way to write to two different blocks and keep them both in memory, unless you find a CLI tool that opens files with `O_RDWR` or you write a C program to test things.
10053+
10054+
With our new interface, we can differentiate clear all vs don't clear all in the usual manner, e.g. this clears:
10055+
10056+
....
10057+
echo asdf > /dev/scull0
10058+
....
10059+
10060+
but this doesn't:
10061+
10062+
....
10063+
echo asdf >> /dev/scull0
10064+
....
10065+
10066+
The examples from our test should make its weird behavior clearer, e.g.:
10067+
10068+
....
10069+
# Append starts writing from the start of the 4k block, unlike the usual append semantics.
10070+
printf asdf > "$f"
10071+
printf qw >> "$f"
10072+
[ "$(cat "$f")" = qwdf ]
10073+
10074+
# Overwrite first clears everything, then writes to start of 4k block.
10075+
printf asdf > /dev/${module}0
10076+
printf qw > /dev/${module}0
10077+
[ "$(cat "$f")" = qw ]
10078+
10079+
# Read from the middle
10080+
printf asdf > /dev/${module}0
10081+
[ "$(dd if="$f" bs=1 count=2 skip=2 status=none)" = df ]
10082+
10083+
# Write to the middle
10084+
printf asdf > /dev/${module}0
10085+
printf qw | dd of="$f" bs=1 seek=1 conv=notrunc status=none
10086+
[ "$(cat "$f")" = aqwf ]
10087+
....
10088+
10089+
It is also worth noting that the implementation of scull is meant to be "readable" but not optimal:
10090+
10091+
____
10092+
kmalloc is not the most efficient way to allocate large areas of memory (see Chapter 8), so the implementation chosen for scull is not a particularly smart one. The source code for a smart implementation would be more difficult to read, and the aim of this section is to show read and write, not memory management. That’s why the code just uses kmalloc and kfree without resorting to allocation of whole pages, although that approach would be more efficient.
10093+
____
10094+
10095+
Another shortcoming of the example is that it uses mutexes, where rwsem would be the clearly superior choice.
10096+
10097+
This module was derived from https://github.com/martinezjavier/ldd3/tree/30f801cd0157e8dfb41193f471dc00d8ca10239f/scull which had already ported it to much more recent kernel versions for us. Ideally we should just use that repo as a submodule, but we were too lazy to set up Buildroot properly for now, and decided to dump it all into a single file to start with.
10098+
999710099
== FreeBSD
999810100

999910101
https://en.wikipedia.org/wiki/FreeBSD
@@ -28112,6 +28214,14 @@ The `--linux-build-id` option should be passed to all scripts that support it, m
2811228214

2811328215
To run both kernels simultaneously, one on each QEMU instance, see: xref:simultaneous-runs[xrefstyle=full].
2811428216

28217+
You can also build <<kernel-modules>> against a specific prebuilt kernel with:
28218+
28219+
....
28220+
./build-modules --linux-build-id v4.16
28221+
....
28222+
28223+
This will then allow you to insmod the kernel modules on your newly built kernel.
28224+
2811528225
==== QEMU build variants
2811628226

2811728227
Analogous to the <<linux-kernel-build-variants>> but with the `--qemu-build-id` option instead:

kernel_modules.code-workspace

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
// This workspace exists to work on C files formatted like the Linux kernel,
2+
// notably using tabs instead of spaces. This is unlike the C files in our userland
3+
// programs, and we couldn't find a better way to make this distinction
4+
// https://stackoverflow.com/questions/47405315/visual-studio-code-and-subfolder-specific-settings
5+
{
6+
"folders": [
7+
{
8+
"path": "."
9+
},
10+
{
11+
"path": "submodules/linux"
12+
}
13+
],
14+
"settings": {
15+
"files.watcherExclude": {
16+
"data/**": true,
17+
".git/**": true,
18+
"out.docker/**": true,
19+
"out/**": true,
20+
"submodules/**": true,
21+
},
22+
"search.exclude": {
23+
"data/**": true,
24+
".git/**": true,
25+
"out.docker/**": true,
26+
"out/**": true,
27+
"submodules/**": true,
28+
},
29+
"[c]": {
30+
"editor.tabSize": 8,
31+
"editor.insertSpaces": false
32+
},
33+
"files.associations": {
34+
"rwsem.h": "c"
35+
}
36+
}
37+
}

kernel_modules/.vscode/settings.json

Lines changed: 0 additions & 6 deletions
This file was deleted.

kernel_modules/fops.c

Lines changed: 31 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
/* https://cirosantilli.com/linux-kernel-module-cheat#file-operations */
1+
/* https://cirosantilli.com/linux-kernel-module-cheat#fops */
22

33
#include <linux/debugfs.h>
44
#include <linux/errno.h> /* EFAULT */
@@ -10,7 +10,15 @@
1010
#include <uapi/linux/stat.h> /* S_IRUSR */
1111

1212
static struct dentry *debugfs_file;
13+
// The buffer can be stored in two ways: static module data or kmalloc.
14+
#define STATIC 1
15+
#if STATIC
1316
static char data[] = {'a', 'b', 'c', 'd'};
17+
#define BUFLEN sizeof(data)
18+
#else
19+
static char *data;
20+
#define BUFLEN 4
21+
#endif
1422

1523
static int open(struct inode *inode, struct file *filp)
1624
{
@@ -19,29 +27,26 @@ static int open(struct inode *inode, struct file *filp)
1927
}
2028

2129
/* @param[in,out] off: gives the initial position into the buffer.
22-
* We must increment this by the ammount of bytes read.
30+
* We must increment this by the amount of bytes read.
2331
* Then when userland reads the same file descriptor again,
2432
* we start from that point instead.
2533
*/
2634
static ssize_t read(struct file *filp, char __user *buf, size_t len, loff_t *off)
2735
{
2836
ssize_t ret;
2937

30-
pr_info("read\n");
31-
pr_info("len = %zu\n", len);
32-
pr_info("off = %lld\n", (long long)*off);
33-
if (sizeof(data) <= *off) {
38+
pr_info("read len=%zu off=%lld\n", len, (long long)*off);
39+
if (BUFLEN <= *off) {
3440
ret = 0;
3541
} else {
36-
ret = min(len, sizeof(data) - (size_t)*off);
42+
ret = min(len, BUFLEN - (size_t)*off);
3743
if (copy_to_user(buf, data + *off, ret)) {
3844
ret = -EFAULT;
3945
} else {
4046
*off += ret;
4147
}
4248
}
43-
pr_info("buf = %.*s\n", (int)len, buf);
44-
pr_info("ret = %lld\n", (long long)ret);
49+
pr_info("ret=%lld\n", (long long)ret);
4550
return ret;
4651
}
4752

@@ -54,13 +59,11 @@ static ssize_t write(struct file *filp, const char __user *buf, size_t len, loff
5459
{
5560
ssize_t ret;
5661

57-
pr_info("write\n");
58-
pr_info("len = %zu\n", len);
59-
pr_info("off = %lld\n", (long long)*off);
60-
if (sizeof(data) <= *off) {
62+
pr_info("write len=%zu off=%lld\n", len, (long long)*off);
63+
if (BUFLEN <= *off) {
6164
ret = 0;
6265
} else {
63-
if (sizeof(data) - (size_t)*off < len) {
66+
if (BUFLEN - (size_t)*off < len) {
6467
ret = -ENOSPC;
6568
} else {
6669
if (copy_from_user(data + *off, buf, len)) {
@@ -89,9 +92,7 @@ static loff_t llseek(struct file *filp, loff_t off, int whence)
8992
{
9093
loff_t newpos;
9194

92-
pr_info("llseek\n");
93-
pr_info("off = %lld\n", (long long)off);
94-
pr_info("whence = %lld\n", (long long)whence);
95+
pr_info("llseek off=%lld whence=%lld\n", (long long)off, (long long)whence);
9596
switch(whence) {
9697
case SEEK_SET:
9798
newpos = off;
@@ -100,7 +101,7 @@ static loff_t llseek(struct file *filp, loff_t off, int whence)
100101
newpos = filp->f_pos + off;
101102
break;
102103
case SEEK_END:
103-
newpos = sizeof(data) + off;
104+
newpos = BUFLEN + off;
104105
break;
105106
default:
106107
return -EINVAL;
@@ -124,12 +125,24 @@ static const struct file_operations fops = {
124125

125126
static int myinit(void)
126127
{
128+
#if STATIC == 0
129+
data = kmalloc(BUFLEN, GFP_KERNEL);
130+
if (!data)
131+
return -ENOMEM;
132+
data[0] = 'a';
133+
data[1] = 'b';
134+
data[2] = 'c';
135+
data[3] = 'd';
136+
#endif
127137
debugfs_file = debugfs_create_file("lkmc_fops", S_IRUSR | S_IWUSR, NULL, NULL, &fops);
128138
return 0;
129139
}
130140

131141
static void myexit(void)
132142
{
143+
#if STATIC == 0
144+
kfree(data);
145+
#endif
133146
debugfs_remove_recursive(debugfs_file);
134147
}
135148

kernel_modules/kernel_modules.code-workspace

Lines changed: 0 additions & 7 deletions
This file was deleted.
