Skip to content

cdo256/cdoedit

Repository files navigation

cdoedit - a really simple graphical text editor

Overview
--------
cdoedit is a simple text file editor for X for my (cdo256's) personal use. It comes with default keybindings that will
be most familiar to those who grew up with Windows. Ctrl+C is copy, Ctrl+V is paste, arrow keys to move the
cursor, Ctrl+Arrows to move by word/paragraph, hold down shift when you move and it selects. All the rest of
this is in the configuration file, config.h. If you edit this file, make sure to rebuild (sudo make install)
to get the changes, since the config file is compiled in to the executable. I haven't tested this on any
other platform so this could corrupt files you try to edit, pick up files from the wrong directory or
otherwise misbehave so keep a keen eye on it and be sure you back up the original file before you save.


Requirements
------------
In order to build cdoedit you need the Xlib header files.


Installation
------------
Edit config.mk to match your local setup (cdoedit is installed into
the /usr/local namespace by default).

Afterwards enter the following command to build and install cdoedit (if
necessary as root):

    make clean install


Credits
-------
cdoedit is based on Suckless' Simple Terminal (st).
st in turn is based on Aurélien APTEL <aurelien dot aptel at gmail dot com> bt source code.


Implementation
--------------

Files
=====
cdoedit.c handles just the grid of glyphys that form the display buffer.
editor.c manages the gap buffer and has convenient edit operations. This file is brand new since the st fork.
x.c does all the interaction with the xserver. This file is mostly unchanged since the st fork.

The Gap Buffer
==============
The core of the Document structure in editor.c is a gap buffer. This stores all the characters in the document
in a way that's more efficient than a compact array (with no gap) while still maintaining a lot of simplicity.
It works by splitting a single buffer into three regions: the left section, the gap, and the right section like
so:

      __left___  ______gap______  _________right__________
     /         \/               \/                        \
     [Hello wor]/////////////////[ld! This is a text file.]
     ^         ^                 ^                        ^
  bufstart  gapleft           gapright                  bufend

The concrete document is simply gotten by concatenating the left and right buffers. The contents of the gap
section is never read (unless one of the gap pointers is first moved). To grow the document around the gap, the
text is inserted at gapleft and the gapleft pointer is moved to point to the end of the newly inserted string.

This is a simple data structure for sequences of values expected to have most edits clustered close together.
Text editors are a perfect example of having edits close together because very often when you're editing a text
document, you're either navigating or typing sequentially or editing/deleting for long sequences in a small
part of the document. So a gap buffer should be a fast (and relatively small) method of representing an editable
text document. Even when doing a find-and- replace (a feature not supported in cdoedit) on something spread out
throughout the document works (as long as it's sequential) it only has to move as many characters as the
document is long.

Gap placement
=============
One consideration of this structure is where to put the gap. There are two main options:
 1. Have the gap always be where the cursor is.
 2. Have the gap be where the last edit was.

There are more options for example using some huristic to work out where the next edit point will be but we want
to keep things simple. #1 would slow down large navigations but #2 would put the delay on the first edit after a
navigation. Because text editors are designed to be used by humans, a large navigation isn't usually immediately
followed by an edit because it takes a little time for the user to see where they are in the the document and move
the cursor more finely to where they want to type. Whereas when typing starts, it's usually lots of little edits
in quick succession and a delay to the user to the first edit could hurt the user experience. Because of these
reasons I've opted to have the gap always follow the cursor precisely.

One obvious exception here is that jumping to the start or end of the document doesn't really require the user
looking at the surrounding context, because they know they just want to quickly add something to the end of the
document. So they can Ctrl+End, press enter and immediately start typing. Thankfully for these kinds of cases
user input is buffered by the X server so no key strokes are lost.

Realloc spike
=============
Another big performance consideration is the situation when the buffer needs to grow. This can happen at any time
when the user is in the middle of editing some text and may require copying over the entire document. When this
happens, the user experiences a delay proportional to the size of the document. This is because, the gap has got
too small and the whole buffer must be reallocated and the right section must be moved to the end of the new
buffer. This can be mitigated by doing the reallocation during idle time or eleminated entirely by taking
advantage of virtual memory.

The simplest way to mitigate this is to preemptively grow the buffer during idle time before the buffer is actually
needed. This could be done after a keypress in anticipation of the next, before calling select() in the main loop,
or after a certain period of inactivity.

Eliminating the reallocation spike
==================================
The reallocation spike is fundamental to the data-structure and forces us to make a trade off between space and
worst-case latency of a key-press. Either we start with a buffer much larger to accomodate most of the expansion
and hog much more memory than we need or we keep the buffer tight but take the ux hit of more frequent reallocation
(or somewhere between these two extremes).

All is not lost however. A solution exists that only involves minimal code changes, but gives us both sides of the
trade-off for free with basically no downside. The only problem is that this requires a reallocation method that
grows _downward_, which cannot be written using just the POSIX C standard library.

The solution works by taking advantage of virtual memory. We know that chunks can be mapped from anywhere in memory
and will only be mapped when we ask for them. This gives us the option of making a very large gap but only
allocating pages that actually contain data.

[MMMMMM] = Mapped page
[//////] = Unmapped page

Pages: [MMMMMM][MMMMMM][//////][//////][//////][//////]........[//////][//////][//////][MMMMMM][MMMMMM]
Data:  [This side ]////////////////////////////////////........///////////////////////////[is far away]
       \___left___/\____________________________(very large) gap_________________________/\___right___/

This not only solves the reallocation spike problem but actually reduces the memory footprint of the document to
be at most file_len + page_size * 2 instead of file_len * 2 which could make a difference on memory-constrained
systems but won't matter at all in most cases.

This leaves us with two related questions:
 1. Can we get this to play nicely with malloc?
 2. How do we decide size and location in memory?

This unfortunately doesn't appear to be currently possible in the current version of Linux (as of December 2019).
Since there's no way to reserve the gap pages so that (A) the pages cannot be reused by malloc and (B) these pages
don't count toward the total system memory usage. These points could be remedied by simply telling malloc to not
allocate on our turf but there doesn't seem to be an easy way to do this that's also robust enough to not break
with some internal change to libc's malloc. I've looked for allocation libraries that can do this but they appear
to either not exist anywhere or just not exist on GitHub. Point B is a problem because otherwise the kernel will
just refuse to allocate the buffer spitting out ENOMEM.

Alternatively, a reverse reallocator could be written that would allow for reallocation which would keep the
pages at the end of the allocated block instead of at the beginning (as realloc() currently does). This currently
also doesn't appear to exist anywhere so it'd have to be a project for a future data.

The story on Windows is a lot more optimistic. It's a simple case of calling VirtualAlloc() with MEM_RESERVE but
delay committing the memory until it's actually needed. This appears to do exactly what we want with no caveats!
One of the few cases where Windows is actually better than Linux.

Because of the difficulty in implementing this on Linux, and because the gain is so slight with the kind of files
I usually edit, I've decided to leave this as an exercise for the reader.

The Update Set
==============
Because I found I was repeating a lot of the same code to update various utility pointers in the Document structure,
I decided to write a structure of pointers to keep track of them and functions to update the pointers when the
document four key pointers (bufstart, gapleft, gapright, and bufend) changes. This can currently happen in one of
four situations:
 1. The buffer is reallocated to grow the gap.
 2. The gap is moved due to navigation (navigation means cursor position change).
 3. Some text is inserted into the buffer.
 4. Some text is deleted from the buffer.

It's worth mentioning at this point using document indexes instead of pointers would allow situations 1 and 2 to
not require any additional updates. Specifically these indexes would have to be indexes to left ++ right (ie. the
the buffer with the gap removed). This means that when the gap changes size (situation 1) or moves (situation 2),
the index will still refer to the same point. The reason why I chose pointers instead of indexes is for simplicity,
looking up the character from a pointer address just requires a bounds check. Indexes would likely be similar in
most other ways.

The solution I employed is a (multi-)set of all the active pointers to the document. Whenever one of the above four
situations occurs. All of the pointers to the document are updated. In order for this to work, the following data-
structure invariants must be maintained:
 1. All pointers in the update set must be valid and point to pointers that point to the document that owns the
  update set.
 2. Whenever one of the four situations occurs, the corresponding dupdateon[a-z]*() functions should be called
  before any reads/writes to the pointers pointed to by pointers in the update set.
 
The reason why multiset instead of set is to make add and remove symmetric. For example in a normal (not multi) set:

 ({1,2,3} u {1}) \ {1} = {1,2,3} \ {1} = {2,3}

But in a multiset union 'u' and set difference '\':

 ({1,2,3} u {1}) \ {1} = {1,1,2,3} \ {1} = {1,2,3}