Skip to content

HemanB/kvlog

Repository files navigation

Kvlog lab

Objectives

  • Work with C strings as char arrays with null terminator.
  • Understand equivalence of C pointers and array references.
  • Use & operator to obtain a reference to an array element (or equivalently, to a subarray).
  • Revisit Java's pass-by-value with references (pass-by-value-of-reference) as it manifests in C (pointers and pass-by-pointer).
  • Recognize that the signature of a function or API is not sufficient to specify or understand how to use the API safely.
  • Understand the importance of mutability and ownership of reference/pointer arguments to use an API safely.
  • Expose yourself to space allocation in its simplest form: a log abstraction.
  • Give an excuse to take another look at facileASM.

Preliminaries

Follow the same basic steps as for all C coding labs.

  1. Fork this repository and mark it private.

  2. Clone (git clone) your newly forked repository into your development directory (folder) for your Linux container.

  3. Change directory (cd) into that folder and make.

  4. Edit the source file (kvlog_student.c). You may ignore the main program (kvlog.c).

Remember: always save your changes and make again to test your code. When you are done:

git add kvlog_student.c statement.txt; git commit -m "your message"; git push

Get started

In CPS 201 you explored the map interface and abstraction, and compared various implementations of the interface.

In this lab, we have a main program that reads comma-separated map pairs (string key, integer value) from standard input, one pair per line, and inserts the pairs into a map. The map is partially implemented in kvlog_student.c. Your assignment is to finish it. Let's get started:

make

That builds a binary executable called kvlog.

Your can run kvlog in the usual way. Don't forget the ./, and be sure you understand why it is needed.

./kvlog

What is it doing? It's just sitting there.

We've seen this before: this Linux process is waiting for input from its standard input. Since the program is running (in the foreground), every line you type now goes to its standard input, until you finish with ctrl-d, which causes the process to read an end-of-file (EOF) from its standard input and finish (exit).

You can type some pairs. Just make it up. The string keys should not have spaces. Maybe:

hullo,123
seeya,456

These pairs are inserted into the kvlog map. You can also query:

?hullo
123
?seeya
123

Wait, what? That value for seeya should have been 456.

End your input with ctrl-d. At exit, kvlog prints out its full map.

seeya,123
seeya,456

Not what you expected? Something is wrong. Run it again and try some more pairs and queries to gain some insight.

What now?

When something goes haywire, we take a deep breath and look at the code.

In fact, kv_student.c has a complete and reasonable-looking implementation. But it doesn't work.

It saves each pair in a struct pair (pair_t) record. It allocates a new pair record for each new pair. How does it allocate the space? It uses a static array of pairs and just takes the next one for each new insert. That's called a log: each new entry is appended on the end, so the entries appear in temporal order. Then lookup searches the whole log.

Simple! It has some weaknesses, but it gets it done.

Think about it. What if it runs out of space in the log? What does lookup cost? What happens if you enter a new value for an old key? Are these behaviors reasonable? Where they are not, can you think of a fix? What if you want to delete a key or pair from the map: how would you do it?

There is lots to critique in this implementation, but the real problem is deeper. A comment gives you a hint.

You don't need to look at kvlog.c. It has some hairball code to deal with processing the input safely. That is not our concern. But if you did look at it, you would see one important thing: it uses a static char array as a string buffer. (Actually there are two buffers.) Each new line of input overwrites the previous line in the buffer. Then it passes the key string input to kvlog insert or lookup as a reference into its buffer. But it overwrites the buffer to handle the next line after insert or lookup returns. The referenced string value is mutable: its value changes while the map still holds a reference to it.

We tell you that to save you from having to look at kvlog.c yourself. Safe input processing is important, but you don't need to understand it now.

Fix it

Your mission is to fix kvlog_student.c so that it works for mutable key strings. The best way to do that is to save your own copy of each inserted key string before returning. We recommend to copy the key string with a simple loop through the key string one character at a time. If you prefer you may use the C library function strncpy, but we have not gotten to that yet.

Where to get space to save the string?

  • Not in a local variable of insert, because local variables are destroyed after a function returns.

  • Add space for a key string buffer in pair_t? Maybe. But you do not know how big each key is. You don't want to waste space for small keys or run out of space for long keys. And the string does not have to reside in the pair_t record: we can put it somewhere else and just put a reference (pointer) in the pair_t, like the default code. You don't have to change pair_t at all.

  • In Java you might just say new StringBuffer. And you can do something like that in C (malloc), but we have not talked about that yet. Java string operations and even StringBuffer do some crazy and expensive stuff to handle variable length strings. But Java hides it from you, so you can ignore it until you have to figure out why your program is so heavy and slow. Here we want to bring you into closer contact with managing space efficiently.

We recommend that you use a simple form of the most efficient approach. Use a static char array as a buffer of some reasonable size. 10,000 characters is enough to pass all the tests. You can use the log idea to allocate space in the buffer as needed to save each string, similarly to the pair_t pool (the pairs array).

Multiple strings can reside back-to-back in the same buffer like this, tight-packed like a can of sardines. Efficient! But be sure to separate them: terminate each string with its null terminator character.

Testing

Run make to rebuild kvlog after making your changes. Changing the C code has no effect until you save your changes and recompile/rebuild.

You can test by typing pairs and queries to standard input. If that becomes tiresome, you can use cat or a text editor to put them in a file, and then pass it to the standard input using <:

./kvlog <myinput.txt

Submission

Add, commit, and push kvlog_student.c to your git repo.

Also create, add, and commit a text file called statement.txt disclosing any sources of assistance for the lab.

git add kvlog_student.c statement.txt
git commit -m "finished kvlog"
git push

In future labs, you will be responsible for typing the commands to add, commit, and push your files.

Note: The commit message, which comes after -m, can be whatever you want. It is a short string describing the changes in this commit.

Finally, submit your completed lab to Gradescope via the GitLab submission button.

Why do we call it Label Log?

This code can be used as an extension to the facile assembler. Consider the branch instructions in facile. (See Section 3 of the Facile doc.) The target of a branch instruction is a constant: a code address of the instruction to jump to. If a programmer puts the wrong number in a branch instruction, the code jumps to the wrong place. Nothing good can come of it.

Real assemblers make it easier for ASM programmers to use branch instructions safely. They permit the ASM code to add instruction labels. If an instruction is used as a branch target, the programmer adds a new line above the instruction with a string label, e.g.:

addit:
   add r0,r1

The programmer can then use a label in a branch instruction, e.g.:

   bnz r0,addit

To handle labels, the assembler must keep a log or map of label names and the corresponding instruction address. Then, when it encounters a branch with a label, it looks up the label in the map to retrieve the target instruction address. Then it fills it into the branch instruction as the branch target.

The ASM code works even if the programmer adds or removes instructions before the target, changing its address. It saves the programmer from having to update all the branch targets to get their code to work.

That is why we call the kvlog a label log. But of course maps are useful for many other purposes as well.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors