- Work with C strings as
chararrays with null terminator. - Understand equivalence of C pointers and array references.
- Use & operator to obtain a reference to an array element (or equivalently, to a subarray).
- Revisit Java's pass-by-value with references (pass-by-value-of-reference) as it manifests in C (pointers and pass-by-pointer).
- Recognize that the signature of a function or API is not sufficient to specify or understand how to use the API safely.
- Understand the importance of mutability and ownership of reference/pointer arguments to use an API safely.
- Expose yourself to space allocation in its simplest form: a log abstraction.
- Give an excuse to take another look at facileASM.
Follow the same basic steps as for all C coding labs.
-
Fork this repository and mark it private.
-
Clone (
git clone) your newly forked repository into your development directory (folder) for your Linux container. -
Change directory (
cd) into that folder andmake. -
Edit the source file (
kvlog_student.c). You may ignore the main program (kvlog.c).
Remember: always save your changes and make again to test your code. When you are done:
git add kvlog_student.c statement.txt; git commit -m "your message"; git pushIn CPS 201 you explored the map interface and abstraction, and compared various implementations of the interface.
In this lab, we have a main program that reads comma-separated map pairs (string key, integer value) from standard input,
one pair per line, and inserts the pairs into a map. The map is partially implemented in kvlog_student.c.
Your assignment is to finish it. Let's get started:
make
That builds a binary executable called kvlog.
Your can run kvlog in the usual way. Don't forget the ./, and be sure you understand why it is needed.
./kvlogWhat is it doing? It's just sitting there.
We've seen this before: this Linux process is waiting for input from its standard input. Since the program is
running (in the foreground), every line you type now goes to its standard input, until you finish with ctrl-d,
which causes the process to read an end-of-file (EOF) from its standard input and finish (exit).
You can type some pairs. Just make it up. The string keys should not have spaces. Maybe:
hullo,123
seeya,456
These pairs are inserted into the kvlog map. You can also query:
?hullo
123
?seeya
123
Wait, what? That value for seeya should have been 456.
End your input with ctrl-d. At exit, kvlog prints out its full map.
seeya,123
seeya,456
Not what you expected? Something is wrong. Run it again and try some more pairs and queries to gain some insight.
When something goes haywire, we take a deep breath and look at the code.
In fact, kv_student.c has a complete and reasonable-looking implementation. But it doesn't work.
It saves each pair in a struct pair (pair_t) record. It allocates a new pair record for each new pair. How does it allocate
the space? It uses a static array of pairs and just takes the next one for each new insert. That's called a log: each new entry is appended on the end,
so the entries appear in temporal order. Then lookup searches the whole log.
Simple! It has some weaknesses, but it gets it done.
Think about it. What if it runs out of space in the log? What does lookup cost? What
happens if you enter a new value for an old key? Are these behaviors reasonable? Where they are not, can you think of a fix?
What if you want to delete a key or pair from the map: how would you do it?
There is lots to critique in this implementation, but the real problem is deeper. A comment gives you a hint.
You don't need to look at kvlog.c. It has some hairball code to deal with processing the input safely. That is not our
concern. But if you did look at it, you would see one important thing: it uses a static char array as a string buffer. (Actually there are two
buffers.) Each new line of input overwrites the previous line in the buffer. Then it passes the key string input to kvlog insert or lookup
as a reference into its buffer. But it overwrites the buffer to handle the next line after insert or lookup returns. The referenced string
value is mutable: its value changes while the map still holds a reference to it.
We tell you
that to save you from having to look at kvlog.c yourself. Safe input processing is important, but you don't need to understand
it now.
Your mission is to fix kvlog_student.c so that it works for mutable key strings. The best way to do that is to save your own copy of each inserted key string before returning. We recommend to copy the key string with a simple loop through the key string one character at a time. If you prefer you may use the C library function strncpy, but we have not gotten to that yet.
Where to get space to save the string?
-
Not in a local variable of
insert, because local variables are destroyed after a function returns. -
Add space for a key string buffer in
pair_t? Maybe. But you do not know how big each key is. You don't want to waste space for small keys or run out of space for long keys. And the string does not have to reside in thepair_trecord: we can put it somewhere else and just put a reference (pointer) in thepair_t, like the default code. You don't have to changepair_tat all. -
In Java you might just say
new StringBuffer. And you can do something like that in C (malloc), but we have not talked about that yet. Java string operations and evenStringBufferdo some crazy and expensive stuff to handle variable length strings. But Java hides it from you, so you can ignore it until you have to figure out why your program is so heavy and slow. Here we want to bring you into closer contact with managing space efficiently.
We recommend that you use a simple form of the most efficient approach. Use a static char array as a buffer of some reasonable size. 10,000
characters is enough to pass all the tests. You can use the log idea to allocate space in the buffer as needed to save each string, similarly to
the pair_t pool (the pairs array).
Multiple strings can reside back-to-back in the same buffer like this, tight-packed like a can of sardines. Efficient! But be sure to separate them: terminate each string with its null terminator character.
Run make to rebuild kvlog after making your changes. Changing the C code has no
effect until you save your changes and recompile/rebuild.
You can test by typing pairs and queries to standard input. If that becomes tiresome, you can
use cat or a text editor to put them in a file, and then pass it to the standard input using <:
./kvlog <myinput.txt
Add, commit, and push kvlog_student.c to your git repo.
Also create, add, and commit a text file called statement.txt disclosing any sources of assistance for the lab.
git add kvlog_student.c statement.txt
git commit -m "finished kvlog"
git pushIn future labs, you will be responsible for typing the commands to add, commit, and push your files.
Note: The commit message, which comes after -m, can be whatever you
want. It is a short string describing the changes in this commit.
Finally, submit your completed lab to Gradescope via the GitLab submission button.
This code can be used as an extension to the facile assembler. Consider the branch instructions in facile. (See Section 3 of the Facile doc.) The target of a branch instruction is a constant: a code address of the instruction to jump to. If a programmer puts the wrong number in a branch instruction, the code jumps to the wrong place. Nothing good can come of it.
Real assemblers make it easier for ASM programmers to use branch instructions safely. They permit the ASM code to add instruction labels. If an instruction is used as a branch target, the programmer adds a new line above the instruction with a string label, e.g.:
addit:
add r0,r1
The programmer can then use a label in a branch instruction, e.g.:
bnz r0,addit
To handle labels, the assembler must keep a log or map of label names and the corresponding instruction address. Then, when it encounters a branch with a label, it looks up the label in the map to retrieve the target instruction address. Then it fills it into the branch instruction as the branch target.
The ASM code works even if the programmer adds or removes instructions before the target, changing its address. It saves the programmer from having to update all the branch targets to get their code to work.
That is why we call the kvlog a label log. But of course maps are useful for many other purposes as well.