Strings and String Constants

We already know what a string is. It is an array of non-null characters terminated by a null byte.

"xxx" is called a string literal or a string constant
- do not confuse it with a character constant, e.g. 'A', as it uses single quotes. In contrast to Python, for example, single and double quotes are two different things in C.
a string constant internally initializes an array of characters, with a null character appended.
we already know that a string is a contiguous sequence of characters terminated by and including the first null character
- so, a string literal may include multiple null characters, thus defining multiple strings. In other words, a string literal and a string are two different things.

$ cat main.c
#include <stdio.h>

int
main(void)
{
	printf("%s\n", "hello\0world.");
}
$ cc main.c
$ ./a.out
hello

note that '\0' is just a special case of the octal escape sequence '\ooo', where each o is an octal digit (0-7). You can write any ASCII character that way.
- the full syntax is '\o', '\oo', or '\ooo'

$ cat main.c
/*
 * The code assumes that our environment uses ASCII but that is not mandatory.
 * See 5.2.1 Character Sets in the C99 standard for more information.
 */
#include <stdio.h>

int
main(void)
{
	/* Check the ascii manual page that 132 is 'Z', and 71 is '9'. */
	printf("%c\n", '\132');
	printf("%c\n", '\71');
	/* Will print "a<tab>b" */
	printf("a\11b\n");
}
$ ./a.out
Z
9
a	b

a string constant may be used to initialize a char array and usually that is how the string initialization is used (in contrast to { 'a', 'b', ... })

int
main(void)
{
	char s[] = "hello, world.";

	printf("%s\n", s);
}

$ gcc -Wall -Wextra main.c
$ ./a.out
hello, world.

remember that if {} is used for the initialization, you must add the terminating zero yourself unless you use the size of the array and the string was shorter (in which case the rest would be initialized to zero as usual):

char s[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

so, the following is still a properly terminated string but do not use it like that

char s[10] = { 'h', 'e', 'l', 'l', 'o' };

Anyway, you would probably never do it this way. Just use = "hello".

👀 array-fill.c 👀 array-fill2.c

string is printed via printf() as %s

printf("%s\n", "hello, world");

experiment with what %5s and %.5s do (use any reasonable number)

👀 string-format.c

Warm-up

🔧 extend the program 👀 tr-d-chars.c from last time to translate character input, e.g.

tail /etc/passwd | tr 'abcdefgh' '123#'

use character arrays defined with string literals to represent the 2 strings
- see the tr(1) man page on what needs to happen if the 1st string is longer than the 2nd
  - do not store the expanded 2nd string as literal in your program ! (i.e. not 123#####)

🔑 tr.c

🔧 (bonus): refactor the code into a function(s)

remember that arrays are passed into a function as a pointer (to be explained soon, not needed now) which can be used inside the function with an array subscript.

The `for` loop

Formally defined as:

for (<init>; <predicate>; <update>)
	<body>

It is the same as:

<init>
while (<predicate>) {
	<body>
	<update>
}

Using the for loop is very often easier and more readable.

Example:

int i;
for (i = 0; i < 10; ++i) {
	printf("%d\n", i);
}

Or in C99:

for (int i = 0; i < 10; ++i) {
	printf("%d\n", i);
}

the break statement terminates execution of the loop (i.e. it jumps right after the enclosing })
with the continue statement, the execution jumps to the end of the loop body (i.e. it jumps to the enclosing }). In a for loop, that means the <update> is executed next, followed by the <predicate> test (so the while rewrite above is not exact if <body> contains continue).
both break and continue statements relate to the smallest enclosing loop they are executed in

👀 for.c

🔧 task: compute the minimum of the averages of those lines in a 2-D integer array (of arbitrary dimensions) that have only positive values (if there is a negative value in a given line, do not use the line for the computation).

👀 2darray-min-avg.c

Expressions

"In mathematics, an expression or mathematical expression is a finite combination of symbols that is well-formed according to rules that depend on the context."

In C, expressions are, amongst others, variables, constants, strings, and expressions in parentheses. Also arithmetic expressions, relational expressions, function calls, sizeof, assignments, and ternary expressions.

http://en.cppreference.com/w/c/language/expressions

In C99, an expression can produce results (2+2 gets 4) or generate side effects (printf("foo") sends a string literal to the standard output).

👀 expression-statement.c

🔧 task: make the warning an error with your choice of compiler (would be a variant of -W in GCC)

Statements

Statements are (only from what we already learned), expressions, selections (if, switch (not introduced yet)), {} blocks (known as compounds), iterations, and jumps (goto (not introduced yet), continue, break, return).

http://en.cppreference.com/w/c/language/statements

Basically, statements are pieces of code executed in sequence. The function body is a compound statement. The compound statement (a.k.a. block) is a sequence of statements and declarations. Blocks can be nested. Blocks inside a function body are good for variable reuse.

A semicolon is not required after a compound statement but it is allowed (it is then a separate null statement). The following is valid code then:

int
main(void)
{
	{ }
	{ };
}

A declaration is not a statement (there are subtle consequences, we can show them later).

Some statements must end with ;. For example, expression statements. The following are all valid expression statements. They do not make much sense though and may generate a warning about an unused result.

/* this one is not a statement */
char c;

/* these are all expression statements */
c;
1 + 1;
1000;
"hello";

👀 null-statement.c 👀 compound-statement.c

w.r.t. compound statement vs. expression:

It is not allowed to have a compound statement within an expression.
- That said, GCC has a language extension (gnu99) that can be used to allow this - the reason is to protect against multiple evaluation within macros.
- The C99 standard does not define it.
- gotcha: the following code has to be compiled with gcc with the -pedantic-errors option in order to reveal the problem

gcc -std=c99 -Wall -Wextra -pedantic-errors compound-statement-invalid.c

compound-statement-invalid.c: In function ‘main’:
compound-statement-invalid.c:8:6: error: ISO C forbids braced-groups within expressions [-Wpedantic]
    8 |  if (({ i *= 2; puts("doubled");}), i % 2 == 0) {
      |      ^

Our recommendation is to always use these options.

👀 compound-statement-invalid.c

Pointers

Motivation:

memory allocation / shared memory
protocol buffer parsing
pointer arithmetic
the value stored in a pointer variable is the address of the memory that stores an object of the given type
declared like this, e.g.

int *p; // pointer to an int

note on style:

int * p;
int *p;  // preferred in this lecture

a pointer is always associated with a type
to access the object the pointer points to, use a dereference operator *:

printf("%d", *p);

the dereference is also used to write a value to the variable pointed to:

*p = 5;

in a declaration, you may assign like this:

int i = 5;
int *p = &i;

good practice is to prefix pointers with the 'p' letter, e.g.:

int val;
int *pval = &val;

the & is an address-of operator and gets the address of the variable (i.e. the memory address of where the value is stored)
the pointer itself is obviously stored in memory too (it's just another variable). With the declarations above, it looks as follows:

     p
     +---------------+
     |     addr2     |
     +---------------+        i
     ^                        +-------+
     |                        | 5     |
   addr1                      +-------+
                              ^
                              |
                            addr2

the size of the pointer depends on the architecture and the way the program was compiled (see -m32 / -m64 command line switches of gcc)
sizeof (p) will return the amount of memory to store the address of the object the pointer points to
sizeof (*p) will return the amount needed to store the object the pointer points to

🔧 write a program to print:

address of the pointer
the address where it points to
the value of the pointed to variable

Use the %p formatting for the first two.

🔑 ptr-basics.c

Null pointer

the real danger with pointers is that invalid memory access results in a crash (the program is terminated by kernel)
you can assign a number directly to a pointer (that should trigger a warning though; we will get to casting and how to fix that later).

int *p = 0x1234;

a zero pointer value is called a null pointer constant and is available as the macro NULL. NULL is converted to a null pointer, which is guaranteed in C not to point to any object or function. Dereferencing a null pointer is undefined behavior according to the standard; on common systems with memory protection, the program is terminated.
- this is because zero address is left unmapped on purpose, or a page that cannot be accessed maps to the address.
- the C specification says that the macro NULL must be defined in <stddef.h>

🔧 create the null pointer and try to read from it / write to it

🔑 null-ptr.c

Basic operations

notice the difference:

int i;
int *p = &i;

vs.

int i;
int *p;

// set value of the pointer (i.e. the address where it points to).
p = &i;

operator precedence gotcha:

*p		// value of the pointed to variable
*p + 1		// the value + 1
*(p + 1)	// the value on the address + 1 (see below for what it means)

note that the & address-of operator can be used only on an object that has an address (an lvalue), such as a variable
- thus this is invalid:

p = &(i + 1);

store value to the address pointed to by the pointer:

*p = 1;

Changing pointers:

Pointers can be moved forward and backward

p = p + 1;
p++;
p--;

The pointer is moved by the size of the underlying data type when using arithmetic.

🔧 create a pointer to an int, print it out, create a new pointer that points to p + 1. See what is the difference between the 2 pointers.

🔑 ptr-diff.c

Operator gotchas

* has higher precedence than + so:

i = *p + 1;

is equal to

i = (*p) + 1;

postfix ++ has higher precedence than *:

i = *p++;

is evaluated as *(p++) but it still does this

i = *p; p++;

because ++ is used in postfix mode, i.e. the value of the expression p++ is the original value of p, and p is incremented as a side effect.

File API

Part of the standard since C90.

Opening/closing

fopen opens a file and returns an opaque handle (pointer)
- getting NULL means an error
- the mode argument controls the behavior: read, write, append
  - the + adds the other mode (write for read and vice versa, read for append)
- write mode creates the file if it does not exist
- the b binary mode usually does not have any effect (see the standard)
fclose closes the handle
- important to avoid resource leak (fopen can allocate both memory and file descriptor)
freopen can be used to associate the standard streams (stderr, stdin, or stdout) with a file

🔧 write code that opens the same file in a loop (until fopen() fails) without calling fclose() on the handle. After how many iterations does it fail on your system?

👀 fopen-leak.c

I/O

fprintf - printf to a stream
fscanf
- basically parses text input from a stream according to format string
- apart from the stream and the format string, all the arguments must be pointers
fputs/fgets - send/read string to/from a stream
fputc/fgetc - send/read char to/from a stream
fwrite/fread - for writing/reading binary data (such as structures or raw numeric types)

Read a file

fread() reads a selected number of items of a given size to memory. We can use either an array or we can directly read to a variable through an operator address-of. In our case, we will be reading a file byte by byte, so we can give fread() just an address of a character variable.

char c;
FILE *fp;

/* Choose any other file you have on your system. */
if ((fp = fopen("/etc/passwd", "r")) == NULL)
	err(1, "fopen");

/*
 * fread() returns a number of *items* read.  In our case, it's the same
 * as number of bytes as we read it one byte at a time.
 */
while (fread(&c, sizeof (c), 1, fp) == 1) {
	putchar(c);
}

fclose(fp);

👀 read-file.c

🔧 Note that you could read more characters at a time. However, keep in mind the 2nd argument is size of the element read, and the 3rd argument is how many elements we read in one call. For example:

char a[16];
size_t n;
...
while ((n = fread(a, 1, sizeof (a), fp)) > 0) {
	/* process the bytes here */

	/* if we read less than requested, we hit end of file */
	if (n < sizeof (a))
		break;
}

Check the solution here: 🔑 read-file2.c

Also check manual page for fread() and ignore for now that the 1st argument is of type void *, we will get there later. As mentioned above, you can safely put there an array or an address of a variable.

🔧 Check the man page for fwrite() and modify the code so that whatever is read from the file is written to some other file. Do not forget to open the output file for writing. All the details are in the man page.

Seeking

When reading/writing to a file using the above functions, the current position changes accordingly. However, the position can be manipulated without performing any I/O.

fseek - moves the position
- the whence parameter has 3 possible values and makes the offset parameter relative to:
  - SEEK_SET - the beginning of the file
  - SEEK_END - the end of the file
  - SEEK_CUR - the current location of the cursor in the file
ftell - get current position in the file

`err`() family of functions

not defined by any standard however handy
present in BSD, glibc (= Linux distros), Solaris, macOS, etc.
use the <err.h> include, see the err(3) man page
this is especially handy when working with I/O
instead of writing:

if (error) {
	fprintf(stderr, "error occurred: %s\n", strerror(errno));
	exit(1);
}

write the following

if (error)
	err(1, "error occurred");

notice that err() appends ": ", the strerror(errno) text, and a newline to the message automatically, so the format string needs none of them.
or for functions that do not modify the errno value, use the x variant:

if (some_error)
	errx(1, "ERROR: %d", some_error);

there's also warn()/warnx() that do not exit the program

🔧 Home assignment

Note that home assignments are entirely voluntary but writing code is the only way to learn a programming language.

File of integers

🔧 get a file size using the standard IO API (that is, lseek(2) is prohibited even if you know it).

🔧 create array of int values (of arbitrary positive length with values ranging from INT_MAX to 0), write the array to a file, read the values into another array and print them to the standard error. Between the writing and reading the file handle has to remain open. Use the same file handle for reading and writing. Use od(1) to verify the content of the file (thus it is handy to start with INT_MAX and e.g. divide by 2 for each successive value).

🔑 fopen-binary.c

🔧 use the file created by the previous program. Read the values from the end of the file to the beginning of the file one by one without knowing the file size and print the numbers to the standard error output.

File read

🔧 create a text file where each line begins with an integer followed by a space and a string, e.g.:

42 towel
13 dwarfs and Snow White

Read the file using fscanf() and print the values (i.e. integer and a string) from each line to standard output.
