# Input and output using streams

<div class="alert alert-block alert-info">
    You can find all of the C programs in this notebook in the subdirectory containing this notebook:
    <code>./src/io</code>
</div>


Most programs need to perform some sort of input (reading data), or output (writing data), or both.
Rather than dealing with the very low level details of transferring information between various hardware devices, 
C abstracts input and/or output (I/O) operations using streams.

Recall that a stream can be thought of as simply a source or sink of characters or bytes.
A stream of characters represents text-based data, whereas a
stream of bytes represents binary data.

The standard library provides functions for managing streams, formatted I/O (such as the
`printf` family of functions), as well as I/O functions for individual characters and entire lines.

Operating systems typically support lower level forms of I/O for programmers that require high performance
I/O operations.

The I/O operations discussed in this notebook are declared in `<stdio.h>`.

## The `FILE` type

The data structure that represents a stream in C is an opaque type called `FILE`.
A `FILE` object stores all of the information needed to control an I/O stream, and precisely what
information is stored in the data structure is not readily available to the programmer nor is it specified
by the C standard.

The programmer should never attempt to create a `FILE` object directly:

> "FILE objects are allocated and managed internally by the input/output library functions. 
> Don’t try to create your own objects of type FILE; let the library do it. 
> Your programs should deal only with pointers to these objects (that is, FILE * values) 
> rather than the objects themselves." <https://www.gnu.org/software/libc/manual/html_node/Streams.html>

Recall that when working with opaque types, it is common to always use a pointer to the type instead of using
the type directly. All of the standard library functions that deal with streams use a `FILE *` pointer when
specifying a stream.

### Standard streams

There are three standard streams opened and ready for use when the `main` function runs on a hosted environment
(defined in `<stdio.h>`):

```c
FILE *stdin;
FILE *stdout;
FILE *stderr;
```

In Bash, these streams correspond to standard input, standard output, and standard error.

### Buffering

I/O operations are often buffered. Data is stored in a temporary buffer until the buffer is full
(or manually flushed) and then the full contents of the buffer are transferred. This is
done because I/O operations can have high latency. Individual streams can have their own buffers.

A stream can be in one of three buffered states:

* unbuffered
    * data is transferred as soon as possible when it appears
    * on Linux systems, `stderr` is typically unbuffered because it is assumed that the system user wants
    to see error and warning messages as soon as they are written
* fully buffered
    * data is transferred when the buffer is full
    * streams that read/write to files are usually fully buffered because writing to an external storage device
    is typically orders of magnitude slower than CPU and memory operations
* line buffered
    * data is transferred as entire lines (delimited by a newline character)
    * in Linux, `stdout` and `stdin` are typically line buffered when they are connected to a terminal
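
The buffering state of a stream can be changed with the standard `setvbuf` function, and a buffer can be flushed manually with `fflush`. The following is a minimal sketch rather than a recommended design; the helper names `use_line_buffering` and `write_now` are hypothetical and exist only for illustration. Note that `setvbuf` must be called after the stream is opened and before any other operation is performed on it.

```c
#include <stdio.h>

// Hypothetical helper: ask the library to line buffer an open stream.
// Must be called before any other I/O operation on the stream.
// Returns 0 on success and a nonzero value on failure.
int use_line_buffering(FILE *stream) {
    // NULL lets the library allocate the buffer; BUFSIZ is a suggested size
    return setvbuf(stream, NULL, _IOLBF, BUFSIZ);
}

// Hypothetical helper: write a message and flush the buffer immediately.
// Returns 0 on success and EOF on failure.
int write_now(FILE *stream, const char *msg) {
    if (fputs(msg, stream) == EOF) {
        return EOF;
    }
    return fflush(stream);
}
```

Passing `_IONBF` or `_IOFBF` instead of `_IOLBF` requests an unbuffered or fully buffered stream, respectively.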
    


## I/O overview

The process of reading or writing the contents of a file follows three main steps:

1. Create a connection to the file by *opening* the file.
2. Perform I/O operations via the connection.
3. Disconnect the connection to the file by *closing* the file.

When using streams, opening the file connects a stream to the file. Information regarding the stream is
stored in a `FILE` object and the programmer is given a pointer to the `FILE` object. Connecting a stream
to a file allocates operating system resources to the file, which must be returned to the operating
system when the stream is no longer needed.

I/O operations expect a pointer to a `FILE` object to specify the file on which the operation should be performed.

When all required I/O operations have been performed with a stream, the stream must be closed
to release the operating system resources allocated to it.

## Opening a file

A stream is attached to a file using the `fopen` function:

```c
FILE *fopen( const char *filename, const char *mode );
```

`fopen` opens the file having the indicated `filename` and returns a pointer to the stream associated with the file.
It returns `NULL` if the file could not be opened. On POSIX systems, `errno` is also set if the file could not
be opened.

`filename` is the pathname of the file to be opened. The pathname can be a relative path in which case the
path is relative to the current working directory.

`mode` is a string indicating if the file should be opened for reading, writing, or both operations.

The following program opens a file for reading and then immediately closes the file:

In [None]:
// open.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char pathname[] = "./src/io/a_file.txt";
    FILE *f = fopen(pathname, "r");
    if (!f) {
        perror("fopen() failed");
        exit(EXIT_FAILURE);
    }
    printf("Successfully opened %s for reading\n", pathname);
    
    fclose(f);
    return 0;
}

There are six basic file access modes:

| Mode | Meaning | Explanation | Action if file exists | Action if file does not exist |
| :---- | :---- | :---- | :---- | :---- |
| `r` | read | open a file for reading | read from start of file | failure |
| `w` | write | create a file for writing | destroy contents | create new file |
| `a` | append | append to end of file | write at end of file | create new file |
| `r+` | read extended | open a file for reading/writing | read from start of file | failure |
| `w+` | write extended | create a file for reading/writing | destroy contents | create new file |
| `a+` | append extended | open a file for reading/writing | write at end of file | create new file |

If the mode is `w` or `w+` then the file contents are destroyed when the file is opened, or a new file
is created if the file does not already exist.

If the mode is `a` or `a+` then all write operations always append to the end of the file.

The modes with a `+` are *extended* or *update* modes. They allow both read and write operations, but
a file positioning function (or, after a write, `fflush`) must be called between input and output operations.
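
The difference between the `w` and `a` modes can be illustrated with a short sketch. The helper names below (`write_with_mode`, `overwrite_file`, `append_to_file`) are hypothetical and are not part of the standard library; they only wrap `fopen`, `fputs`, and `fclose`.

```c
#include <stdio.h>

// Hypothetical helper: open path in the given mode, write text, and close.
// Returns 0 on success and -1 on failure.
static int write_with_mode(const char *path, const char *mode, const char *text) {
    FILE *f = fopen(path, mode);
    if (!f) {
        return -1;
    }
    int failed = (fputs(text, f) == EOF);
    if (fclose(f) == EOF) {
        failed = 1;
    }
    return failed ? -1 : 0;
}

// "w" destroys any existing contents before writing
int overwrite_file(const char *path, const char *text) {
    return write_with_mode(path, "w", text);
}

// "a" keeps the existing contents and writes at the end of the file
int append_to_file(const char *path, const char *text) {
    return write_with_mode(path, "a", text);
}
```

Calling `overwrite_file` and then `append_to_file` on the same path leaves both pieces of text in the file, whereas calling `overwrite_file` twice leaves only the second piece.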

## Closing a file

Opening a file in Linux allocates system resources to manage the opened file. Closing a file releases
any system resources allocated to the opened file.
If you repeatedly open a file without closing it, you will eventually run out of the resources 
needed for maintaining an open file.
To close a file use the function `fclose`:

```c
int fclose( FILE *stream );
```

Calling `fclose` causes 
any unwritten buffered data to be delivered to the OS so that it can be written, and 
any unread buffered data is discarded. After calling `fclose` the stream is no longer associated with
its file, even if `fclose` fails for some reason. The `stream` pointer can no longer be safely used.

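`fclose` can fail, in which case it returns the constant value `EOF`.
A program usually cannot recover from the failure, so the return value is often ignored. For a stream that was written to, however, a failure may mean that buffered data never reached the file, so checking the return value is worthwhile.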
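
When the outcome of closing a stream matters, the return value of `fclose` can be tested. The sketch below assumes a hypothetical helper named `close_checked`; the `name` parameter is used only to label the error message.

```c
#include <stdio.h>

// Hypothetical helper: close a stream and report a failure on stderr.
// Returns 0 on success and -1 on failure. The stream must not be used
// after this call, whether or not the close succeeded.
int close_checked(FILE *stream, const char *name) {
    if (fclose(stream) == EOF) {
        fprintf(stderr, "error closing %s\n", name);
        return -1;
    }
    return 0;
}
```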

## File position

An open file has a file position that points to where the next character will be read or written.
In Linux, the file position is equal to the number of bytes from the beginning of the file (the
first character in the file has position 0, the second character in the file has position 1, and so on).

Opening a file in `r`, `w`, `r+`, or `w+` mode sets the file position to zero. Reading or writing a
single character advances the file position by one.

Opening a file in `a` or `a+` mode causes the file position to be treated specially. All write operations
always append to the end of the file (possibly ignoring the current file position). Read operations
use the file position to determine which character to read.

The file position is a property of the stream and not of the file. The same file can have multiple streams
connected to it and each stream may have a different file position.

Functions that directly manipulate the file position are discussed later in this notebook.
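
To illustrate that the file position belongs to the stream rather than to the file, the following sketch opens two streams on the same file and reads from only one of them. It uses `ftell`, which is described later in this notebook, and the helper name `compare_positions` is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: open two independent streams on the same file, read
// up to n characters from the first, and report both file positions.
// Returns 0 on success and -1 on failure.
int compare_positions(const char *path, int n, long *pos_a, long *pos_b) {
    FILE *a = fopen(path, "r");
    if (!a) {
        return -1;
    }
    FILE *b = fopen(path, "r");
    if (!b) {
        fclose(a);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (fgetc(a) == EOF) {
            break;
        }
    }
    *pos_a = ftell(a);   // advanced by the reads performed on stream a
    *pos_b = ftell(b);   // unchanged: stream b has its own file position
    fclose(a);
    fclose(b);
    return (*pos_a < 0 || *pos_b < 0) ? -1 : 0;
}
```

On Linux, reading three characters from the first stream of a file that contains at least three characters should leave `*pos_a` at 3 and `*pos_b` at 0.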

## Unformatted I/O

Unformatted I/O reads or writes individual characters or strings using a stream.

### Reading a single character

Use `fgetc` to read a character from an input stream:

```c
int fgetc( FILE *stream );
```

`fgetc` reads the next character from the input stream, advances the file position, and returns the character as an 
`unsigned char` converted to an `int`. The return type is `int` rather than `char` so that `EOF`, which is a negative value, can be distinguished from every valid character.
The constant `EOF` is returned on failure.

If the read failure is caused by attempting to read past the end of the file, then `fgetc` sets the end-of-file
indicator for the stream. If the read failure occurs for some other reason, then `fgetc` sets the error
indicator for the stream.

The following program reads a file one character at a time printing each character to standard output:

In [None]:
// readfile1.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int status = EXIT_FAILURE;
    char pathname[] = "./src/io/a_file.txt";
    FILE *f = fopen(pathname, "r");
    if (!f) {
        perror("fopen() failed");
        exit(status);
    }
    
    int c;  // int rather than char, so that EOF is distinct from every character
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            printf("<NEWLINE CHAR>");
            putchar(c);
        }
        else {
            putchar(c);
        }
    }
    if (ferror(f)) {
        fprintf(stderr, "I/O error");
    }
    else if (feof(f)) {
        status = EXIT_SUCCESS;
    }
    
    fclose(f);
    return status;
}

Note that if the file has multiple lines, then each line will typically end with a newline character `\n`
in Linux. If the file originates from some other operating system, then other line ending character
combinations are possible.

The `ferror` function tests if the error indicator has been set on the specified stream. It returns a non-zero
value (true) if the error indicator is set, and zero (false) otherwise.

The `feof` function tests if the end-of-file indicator has been set on the specified stream. It returns a non-zero
value (true) if the end-of-file indicator is set, and zero (false) otherwise.

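The `clearerr` function clears both the error and end-of-file indicators on a stream. The `rewind` function also clears both indicators, and a successful call to `fseek` clears the end-of-file indicator (both functions are described later in this notebook).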
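
The read loop and the two indicator tests can be packaged into a small function. This is only a sketch, and the name `count_chars` is hypothetical:

```c
#include <stdio.h>

// Hypothetical helper: count the characters remaining in a stream.
// Returns the count, or -1 if reading stopped because of an error
// rather than because the end of the file was reached.
long count_chars(FILE *stream) {
    long count = 0;
    int c;  // int, not char, so that EOF is distinct from every character
    while ((c = fgetc(stream)) != EOF) {
        count++;
    }
    if (ferror(stream)) {
        return -1;
    }
    return count;  // the end-of-file indicator is now set
}
```

After a successful call, `feof(stream)` stays true until the indicator is cleared, for example by calling `clearerr(stream)`.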

### Reading lines of text

If you have a line-oriented file, then you can consider using `fgets` to read lines of text:

```c
char *fgets( char *str, int count, FILE *stream );
```

`fgets` reads at most `(count - 1)` characters from `stream`, stores them in `str`, and writes a null 
terminator after the last character read. The file position is advanced by the number of characters read.
Reading stops after a newline character is read or at the end of the file.
If a newline character is read, it is stored in the string.
`str` is returned on success, and a null pointer is returned on failure.

If the read failure is caused by attempting to read past the end of the file, then `fgets` sets the end-of-file
indicator for the stream. If the read failure occurs for some other reason, then `fgets` sets the error
indicator for the stream.

The following program reads a file one line at a time printing each line to standard output:

In [None]:
// readfile2.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int status = EXIT_FAILURE;
    char pathname[] = "./src/io/a_file.txt";
    FILE *f = fopen(pathname, "r");
    if (!f) {
        perror("fopen() failed");
        exit(status);
    }
    
    char buf[100];
    while (fgets(buf, sizeof buf, f) != NULL) {
        // NOTE: buf will include the new line character \n at the end of each line.
        // In many applications, you will want to remove the \n character before
        // processing the string.
        //
        // Here, the newline character is useful because after printing the string
        // subsequent output will start on the next line.
        printf("%s", buf);
    }
    if (ferror(f)) {
        fprintf(stderr, "I/O error");
    }
    else if (feof(f)) {
        status = EXIT_SUCCESS;
    }
    
    fclose(f);
    return status;
}
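
When the trailing newline is not wanted, one common approach is to overwrite it with a null terminator using `strcspn` from `<string.h>`. The helper name `read_line` below is hypothetical; the sketch simply wraps `fgets`.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read one line with fgets and remove the trailing
// newline, if any. Returns buf on success and NULL at end of file or on error.
char *read_line(char *buf, int size, FILE *stream) {
    if (fgets(buf, size, stream) == NULL) {
        return NULL;
    }
    // strcspn returns the index of the first '\n', or the string length if
    // there is none (a long line, or a final line without a newline)
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Because `strcspn` returns the length of the string when no newline is present, the assignment is safe even for a final line that lacks a newline.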

### Writing a single character

Use `fputc` to write a character to an output stream:

```c
int fputc( int ch, FILE *stream );
```

`fputc` writes a character to an output stream and advances the file position.
It returns the written character on success, or `EOF` on failure. On failure, the function also
sets the error indicator on the stream.

The following program writes the lowercase English alphabet to a file:

In [None]:
// write_alphabet.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int status = EXIT_FAILURE;
    char pathname[] = "./src/io/alphabet.txt";
    FILE *f = fopen(pathname, "w");
    if (!f) {
        perror("fopen() failed");
        exit(status);
    }
    status = EXIT_SUCCESS;
    for (char c = 'a'; c <= 'z'; c++) {
        fputc(c, f);
        if (ferror(f)) {
            status = EXIT_FAILURE;
            fprintf(stderr, "I/O error");
            break;
        }
    }
    if (!ferror(f)) {
        // make sure line ends with a newline character
        if (fputc('\n', f) == EOF) {
            status = EXIT_FAILURE;
        }
    }
    
    fclose(f);
    return status;
}

### Writing strings

Use `fputs` to write a string to an output stream:

```c
int fputs( const char *str, FILE *stream );
```

`fputs` writes a null-terminated string to an output stream (the terminating null character is not written).
It returns a non-negative value on success, or `EOF` on failure. On failure, the function also sets the
error indicator on the stream.

The following program reads its own source code file one line at a time. It prepends a line number at the
beginning of each line and writes the prepended line to a new file:

In [None]:
// catn.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int status = EXIT_FAILURE;
    
    // input file
    char in_name[] = "./src/io/catn.c";
    FILE *f_in = fopen(in_name, "r");
    if (!f_in) {
        perror("fopen() failed");
        exit(status);
    }
    
    // output file
    char out_name[] = "./src/io/catn_with_line_numbers.txt";
    FILE *f_out = fopen(out_name, "w");
    if (!f_out) {
        perror("fopen() failed");
        exit(status);
    }
    
    // current line number (starting at 1)
    int line_num = 1;
    
    // input and output line buffers
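    char str_in[100];
    // room for a 10-digit line number, a space, and the input line
    char str_out[112];
    while (fgets(str_in, sizeof str_in, f_in) != NULL) {
        // snprintf never writes past the end of str_out
        snprintf(str_out, sizeof str_out, "%03d %s", line_num, str_in);
        fputs(str_out, f_out);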
        if (ferror(f_out)) {
            break;
        }
        line_num++;
    }
    if (ferror(f_in) || ferror(f_out)) {
        fprintf(stderr, "I/O error");
    }
    else if (feof(f_in)) {
        status = EXIT_SUCCESS;
    }
    
    fclose(f_in);
    fclose(f_out);
    return status;
}

## Reading and writing structured data

`fscanf` is identical to `sscanf` except that it reads formatted data from a stream instead of a string.

`fprintf` is identical to `printf` except that it writes formatted data to a stream instead of to standard output.
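
As a minimal sketch of formatted stream I/O, the functions below write a record as a line of text with `fprintf` and read it back with `fscanf`. The `struct reading` type and the function names are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

// Hypothetical record type used only for this illustration
struct reading {
    int id;
    double value;
};

// Write one record as a line of text. Returns 0 on success, -1 on failure.
int write_reading(FILE *stream, const struct reading *r) {
    return fprintf(stream, "%d %.3f\n", r->id, r->value) < 0 ? -1 : 0;
}

// Read one record. Returns 1 on success, 0 at end of file, and -1 if the
// input is malformed or a read error occurs.
int read_reading(FILE *stream, struct reading *r) {
    int n = fscanf(stream, "%d %lf", &r->id, &r->value);
    if (n == 2) {
        return 1;
    }
    if (n == EOF && !ferror(stream)) {
        return 0;   // clean end of file
    }
    return -1;      // malformed input or read error
}
```

`fscanf` returns the number of items successfully converted, so comparing the return value with the expected number of items is the usual way to detect malformed input.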

## Working with the file position

There are several standard library functions that manipulate the file position. This notebook describes
only three of these functions.

### Getting the current file position

The `ftell` function returns the current file position for a stream:

```c
long ftell( FILE *stream );
```

In Linux, the returned value is the number of bytes from the beginning of the file.

Consider a file (`tell.txt`) having the following contents:

```
ABCDE
FGHIJ
KLMNO
PQRST
```

Note that each line is actually 6 characters long because of the newline character at the end of each line.

The following program reads and prints the contents of the file one character at a time, printing
the file position as each character is read:

In [None]:
// tell.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("./src/io/tell.txt", "r");
    if (!f) {
        exit(EXIT_FAILURE);
    }
    int c;  // fgetc returns an int so that EOF can be detected
    long pos = ftell(f);
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            printf("\\n : position = %ld\n", pos);
        }
        else {
            printf("%c  : position = %ld\n", c, pos);
        }
        pos = ftell(f);
    }
    fclose(f);
    
    return 0;
}

### Resetting the file position to the beginning of the file

The `rewind` function resets the file position to the beginning of the file:

```c
void rewind( FILE *stream );
```

A small modification to the previous example prints the contents of the file `MAX` times:

In [None]:
// telln.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("./src/io/tell.txt", "r");
    if (!f) {
        exit(EXIT_FAILURE);
    }
    
    const size_t MAX = 2;
    size_t count = 0;
    
    int c;  // fgetc returns an int so that EOF can be detected
    long pos = ftell(f);
    while ((c = fgetc(f)) != EOF && count < MAX) {
        if (c == '\n') {
            printf("\\n : position = %ld\n", pos);
        }
        else {
            printf("%c  : position = %ld\n", c, pos);
        }
        if (pos == 23) {
            printf("rewinding...\n");
            rewind(f);
            count++;
        }
        pos = ftell(f);
    }
    fclose(f);
    
    return 0;
}

### Moving to a specified file position

The `fseek` function sets the file position for a stream:

```c
int fseek( FILE *stream, long offset, int origin );
```

`stream` is a pointer to a currently open stream.

`offset` is the number of file positions to move relative to `origin`.

`origin` may have one of three possible values:

* `SEEK_CUR` is the current file position
* `SEEK_SET` is the beginning of the file
* `SEEK_END` is the end of the file

The return value is zero on success and nonzero on failure (`-1` on POSIX systems).
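
A common use of `fseek` together with `ftell` is finding the size of a file. The sketch below assumes a Linux (POSIX) system, where the file position is a byte count; the C standard itself does not guarantee this for every kind of stream. The helper name `file_size` is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: return the size of an open file in bytes, or -1 on
// failure. The original file position is restored before returning.
long file_size(FILE *stream) {
    long saved = ftell(stream);
    if (saved < 0) {
        return -1;
    }
    // move to the end of the file and ask where that is
    if (fseek(stream, 0, SEEK_END) != 0) {
        return -1;
    }
    long size = ftell(stream);
    // go back to where the caller left the stream
    if (fseek(stream, saved, SEEK_SET) != 0) {
        return -1;
    }
    return size;
}
```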

The following program (which should be run from the command line) prints the character in the
file `tell.txt` at a file position specified by the user:

```c
// seek.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void discard(void) {
    int c;
    while ((c = getchar()) != '\n' && c != EOF) {
    }
}

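int main(void) {
    FILE *f = fopen("tell.txt", "r");
    if (!f) {
        perror("fopen() failed");
        exit(EXIT_FAILURE);
    }
    while (1) {
        char buf[3] = { 0 };
        puts("Enter a position to seek to: ");
        // stop on end of input or a read error instead of looping forever
        if (!fgets(buf, sizeof buf, stdin)) {
            break;
        }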
        // discard the rest of stdin in case the user typed a long string
        if (!strchr(buf, '\n')) {
            discard();
        }
        int pos = atoi(buf);
        if (pos < 0) {
            break;
        }
        if (pos >= 0 && pos < 24) {
            fseek(f, pos, SEEK_SET);
            int c = fgetc(f);
            printf("found : %c at position : %d\n", c, pos);
        }
    }
    fclose(f);
    return 0;
}
```

The following program replaces all occurrences of a user-specified character with a replacement character.
It is an example of a program that opens a file for both reading and writing:

```c
// replace_char.c

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc != 4) {
        fprintf(stderr, "Usage: replace_char file targetchar replacementchar\n");
        exit(1);
    }
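    FILE *f = fopen(argv[1], "r+");
    if (!f) {
        perror("fopen() failed");
        exit(EXIT_FAILURE);
    }
    char target = argv[2][0];             // third argument, first character
    char replace = argv[3][0];            // fourth argument, first character
    int c;                                // int so that EOF can be detected
    while ((c = fgetc(f)) != EOF) {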
        if (c == target) {
            fseek(f, -1, SEEK_CUR);
            fputc(replace, f);
            fflush(f);                    // see explanation below
        }
    }
    fclose(f);
}
```

When opening a file in update mode (for reading and writing) there are some rules that the programmer 
must keep in mind:

* a file position operation must be called between a read operation and a subsequent write operation (unless the read operation reached the end of the file)
* a file position operation or `fflush` must be called between a write operation and a subsequent read operation

The second rule is why `fflush` is called in the example above.
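
Both rules can also be satisfied with file positioning calls alone. The following sketch, which assumes a hypothetical helper named `uppercase_in_place`, converts lowercase letters to uppercase in a stream opened in update mode and uses `fseek` with an offset of zero in place of `fflush` when switching from writing back to reading.

```c
#include <ctype.h>
#include <stdio.h>

// Hypothetical helper: convert every lowercase letter in a stream opened in
// update mode to uppercase, in place. Returns 0 on success, -1 on failure.
int uppercase_in_place(FILE *stream) {
    int c;
    while ((c = fgetc(stream)) != EOF) {
        if (!islower(c)) {
            continue;
        }
        // read -> write: a file positioning call is required in between
        if (fseek(stream, -1, SEEK_CUR) != 0) {
            return -1;
        }
        if (fputc(toupper(c), stream) == EOF) {
            return -1;
        }
        // write -> read: a positioning call (or fflush) is required again;
        // seeking by zero leaves the file position unchanged
        if (fseek(stream, 0, SEEK_CUR) != 0) {
            return -1;
        }
    }
    return ferror(stream) ? -1 : 0;
}
```

The stream passed to the function is expected to have been opened in `r+` mode (or another update mode) and to be positioned where the conversion should start.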