# 02: File I/O in C

Reading and writing text-formatted data to files is handled in C using a special data type called FILE, which is defined in <stdio.h>.

There are some handy functions that manipulate objects of type FILE; I give examples below.



```c
#define _POSIX_C_SOURCE 200809L  //getline() and ssize_t are POSIX, not ISO C: this asks <stdio.h> to declare them
#include <stdio.h>     //input and output to file or screen
#include <stdlib.h>    //standard library: malloc(), free(), EXIT_SUCCESS, EXIT_FAILURE

int main( void ){
    
    //we never need to declare an actual FILE, we only ever need a pointer to one.
    FILE *f;
    int   i, j, count;
    char  some_chars[1024], more_chars[1024];
    char *pt_chars_1, *pt_chars_2;
    
    //open a file in "w" (write) mode.
    f = fopen("test_output.txt", "w");
    //Available modes are:
    //
    // "r"  (read; the file must already exist)
    // "w"  (write; creates the file, or truncates an existing one to zero length)
    // "a"  (append; creates the file if needed, and every write goes to the end)
    // "r+" (read and write; the file must already exist, and is not truncated)
    // "w+" (write and read; creates the file, or truncates an existing one on opening)
    // "a+" (read and append; writes always go to the end, the initial read position varies by system)
    
    // I rarely use "r+" (it is for modifying an existing file in place);
    // "r" or "w" are enough for almost every use.
    
    /////////////////////////////////
    //Did it work?  Check, or we end up passing a NULL pointer to fprintf() later on.
    if( f == NULL ){
        fprintf(stderr, "Error opening file\n");
        return EXIT_FAILURE;
    }
    for( i = 0; i < 10; i++ ){
        fprintf(f, "Hello world %i\n", i);  //fprintf() works like printf(), but writes to f.
    }
    for( i = 0; i < 10; i++ ){
        fprintf(f, "Just some stuff. ");
    }
    fprintf(f, "\n");
    fclose( f ); //Flushes buffered output to the file and frees the FILE object.
    /////////////////////////////////
    
    
    /////////////////////////////////
    //fscanf() is the basic way to grab info from a file.
    f = fopen("test_output.txt", "r");
    if( f == NULL ){
        fprintf(stderr, "Error re-opening file\n");
        return EXIT_FAILURE;
    }
    for( i = 0; i < 10; i++ ){
        //fscanf needs the *addresses* of the variables (an array name already is one).
        //The width 1023 stops %s writing past the end of a 1024-char array (1 byte is kept for the '\0').
        count = fscanf(f, "%1023s %1023s %i\n", some_chars, more_chars, &j);
        if( count != 3 ){
            break; //the input did not match the expected format
        }
        printf("read from file, a string: _%s_, a string: _%s_, and an integer: %i\n", 
                some_chars, more_chars, j);
    }
    //move the file position back to the beginning of the file
    rewind( f );
    /////////////////////////////////
    
    
    /*!!!
    fscanf() is actually a terrible way to process input: a bare %s (no width) happily
    writes past the end of its buffer, and bugs like that have cost billions of dollars in 
    broken computer systems and world-wide security breaches over the years.
    
    With care, the POSIX function getline() can be used securely.
    
    The key to secure I/O in C is to always make sure you have enough space to accept
    what you are being fed.
    
    */
    
    //I don't like to declare variables in the middle of code, but these 
    //are relevant for getline():
    char    *line_buf = NULL;   //starts as NULL, so getline() allocates the buffer itself
    size_t   line_buf_size = 0; //size of line_buf; getline() updates it whenever it (re)allocates
    
    ssize_t  line_size; //"signed size type": a size, but can be -1 to signal end of file or failure
    
    /////////////////////////////////
    for( i = 0; i < 11; i++ ){ //the file has 11 lines: 10 "Hello world" lines and 1 "Just some stuff" line
        //read a line, returning the number of chars read: this counts the '\n' but not the terminating '\0'.
        //getline() allocates (or grows) a buffer big enough for the line: we pass a pointer-to-a-pointer
        //as &line_buf, and the address of the size_t that holds the buffer's size.
        line_size = getline(&line_buf, &line_buf_size, f);
        if( line_size == -1 ){
            printf("Reached end of file, or otherwise failed to read. Returned: %zd\n", line_size);
            break; //exit the loop
        }
        
        //%zd is the printf format for ssize_t, %zu the one for size_t.
        printf("Got a line of size: %zd\n", line_size); 
        printf("Into a buffer of size: %zu\n", line_buf_size);
        printf("Looks like: _%s_\n", line_buf);
        
        //now that we know the size of the line, we can allocate enough memory to scan some
        //strings out of it: no word can be longer than the whole line.
        //Need (line_size+1) in order to store the terminating '\0'.
        pt_chars_1 = (char *)malloc((size_t)line_size + 1);
        pt_chars_2 = (char *)malloc((size_t)line_size + 1);
        if( pt_chars_1 == NULL || pt_chars_2 == NULL ){
            fprintf(stderr, "Out of memory\n");
            free(pt_chars_1); //free(NULL) is allowed and does nothing
            free(pt_chars_2);
            break;
        }
        count = sscanf(line_buf, "%s %s %i\n", pt_chars_1, pt_chars_2, &j);
        if( count != 3 ){
            printf("\n**\n");
            printf("WARNING: sscanf() did not unpack three variables, expected format did not match actual input line.\n");
            printf("Input line was:\n%s", line_buf);
            printf("**\n");
        }else{
            printf("sscanf() successfully unpacked %i variables\n", count);
            printf("Read from file to a buffer, then unpacked: a string: _%s_, a string: _%s_, and an integer: %i\n\n\n", 
                pt_chars_1, pt_chars_2, j);
        }
        free(pt_chars_1);
        free(pt_chars_2);
    }
    
    //We could free line_buf (and reset it to NULL and 0) after every call to getline(),
    //but it is simpler to let getline() reuse the buffer it allocated on the first pass,
    //growing it if needed, and free it once here.
    free(line_buf);
    line_buf = NULL;
    line_buf_size = 0;
        
    fclose( f ); //close the file.
    /////////////////////////////////
    
    return( EXIT_SUCCESS );
    
}
```

Calling main() from a notebook cell prints:

read from file, a string: _Hello_, a string: _world_, and an integer: 0
read from file, a string: _Hello_, a string: _world_, and an integer: 1
read from file, a string: _Hello_, a string: _world_, and an integer: 2
read from file, a string: _Hello_, a string: _world_, and an integer: 3
read from file, a string: _Hello_, a string: _world_, and an integer: 4
read from file, a string: _Hello_, a string: _world_, and an integer: 5
read from file, a string: _Hello_, a string: _world_, and an integer: 6
read from file, a string: _Hello_, a string: _world_, and an integer: 7
read from file, a string: _Hello_, a string: _world_, and an integer: 8
read from file, a string: _Hello_, a string: _world_, and an integer: 9
Got a line of size: 14
Into a buffer of size: 120
Looks like: _Hello world 0
_
sscanf() successfully unpacked 3 variables
Read from file to a buffer, then unpacked: a string: _Hello_, a string: _world_, and an integer: 0


Got a line of size: 14
Into a buffer of size: 120
Looks l
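
The mode list in the comments of the program above is easier to trust once you see it in action. Below is a minimal sketch; the function name modes_demo and whatever file name you pass it are hypothetical, chosen only for illustration. It creates a file with "w", adds to it with "a", then reopens it with "r+" and overwrites its first byte in place:

```c
#include <stdio.h>

/* Hypothetical demo: build a two-line file with "w" then "a",
   then reopen it with "r+" and overwrite its first byte in place.
   Returns 0 on success, -1 if any fopen() fails. */
int modes_demo( const char *path ){
    FILE *f;

    f = fopen(path, "w");            //creates the file, or truncates it to zero length
    if( f == NULL ) return -1;
    fputs("first line\n", f);
    fclose(f);

    f = fopen(path, "a");            //existing contents are kept; writes go to the end
    if( f == NULL ) return -1;
    fputs("second line\n", f);
    fclose(f);

    f = fopen(path, "r+");           //the file must exist; it is not truncated
    if( f == NULL ) return -1;
    fputc('F', f);                   //the position starts at byte 0, so this replaces 'f'
    fclose(f);

    return 0;
}
```

After modes_demo("modes_demo.txt") returns 0, the file should contain "First line" followed by "second line". "a" added to the end instead of truncating, and "r+" changed one byte without touching the rest.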


OK, that was a basic demonstration of how to read from and write to a file in C without giving away your real name and home address.

It turned out to be mostly a lesson in memory management; in C, that is how it is.

The moment of enlightenment comes when you learn to program by thinking first about how to move and structure your data, and only second about writing the actual code.

That is when C programming starts to feel natural and efficient.


## Assignment, week 02 : Sort a File

The assignment this week is to *safely* read a multiline file (making no assumptions about what is in it), define a numerical key for each line, and write the lines out again, in order of that key, to a new file.
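
The "safely read" part can reuse the getline() pattern from the demonstration program: let getline() size its buffer, copy each line into its own exactly-sized allocation, and grow the array of line pointers by doubling it with realloc(). This is a minimal sketch, assuming a POSIX system (for getline()); the function name read_all_lines is hypothetical, and the casts on malloc()/realloc() just follow this notebook's style.

```c
#define _POSIX_C_SOURCE 200809L   /* getline() and ssize_t are POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read every line of f into a growing array of separately
   malloc'd strings, without assuming anything about line length or line count.
   On success returns the array and stores the line count in *n_lines; the
   caller frees each lines[i] and then the array itself.
   Returns NULL with *n_lines == 0 for an empty file or if memory runs out. */
char **read_all_lines( FILE *f, size_t *n_lines ){
    char    *buf = NULL;         //getline() allocates and grows this for us
    size_t   buf_size = 0;
    char   **lines = NULL, **bigger;
    size_t   n = 0, capacity = 0;
    ssize_t  len;

    while( (len = getline(&buf, &buf_size, f)) != -1 ){
        if( n == capacity ){                                  //array full: double its size
            capacity = capacity ? 2 * capacity : 16;
            bigger   = (char **)realloc(lines, capacity * sizeof *lines);
            if( bigger == NULL ) goto fail;                   //old array is still valid, freed below
            lines = bigger;
        }
        lines[n] = (char *)malloc((size_t)len + 1);           //exact fit, +1 for the '\0'
        if( lines[n] == NULL ) goto fail;
        memcpy(lines[n], buf, (size_t)len + 1);               //copy the line and its '\0'
        n++;
    }
    free(buf);
    *n_lines = n;
    return lines;

fail:
    while( n > 0 ) free(lines[--n]);
    free(lines);
    free(buf);
    *n_lines = 0;
    return NULL;
}
```

Each stored line keeps its '\n' (the last line may not have one). A line containing a '\0' byte will look shorter than it really is when treated as a C string; storing len next to each line would cover that case.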

To convert a line into a number, we can cast its first few characters to integers and combine them into one big unsigned integer, something like:


```c
#include <limits.h>

unsigned long long assignKey( const char *some_line ){
    /*
       Function to convert the first 8 characters of a string
       into a numerical key to sort it by.
       
       If there are fewer than eight, it uses those and treats the missing
       positions as zero, so "Just s" sorts before "Just some".

       The key reads the characters as an 8-digit number in base 256
       (UCHAR_MAX + 1, assuming 8-bit chars), first character most significant.
       That needs 64 bits, which unsigned long long guarantees
       (unsigned long is only 32 bits on some systems, e.g. Windows).
    */
    const char         *p;
    unsigned long long  line_val, posn_val;
    int                 i, use_chars;

    //scan along to the end of the string or for 8 chars, whichever comes first.
    p = &some_line[0];
    for( i = 0; i < 8 && *p != '\0'; i++ ){ 
        p++;  //p++ "advance the pointer p".
    } 
    use_chars = i;
    //p now points just past the last character we will use
    //(it never steps before the start of the string, even when the string is empty).
    
    posn_val = 1;  //hundreds, tens, units etc but in base 256.
    line_val = 0;
    for( i = 7; i >= 0; i-- ){ //scan back again, from digit 7 (least significant) to digit 0
        if( i < use_chars ){
            line_val += posn_val * (unsigned char)*--p; //*--p : "move p one to the left, then read the value there".
        }
        posn_val *= (unsigned long long)UCHAR_MAX + 1; //UCHAR_MAX (usually 255) comes from limits.h
    } 
    
    return( line_val );
}
```

```c
//THIS IS JUST A TEST OF THE ABOVE FUNCTION
char               pretend_input_line[1024];
unsigned long long line_val;

//just put some data into the line using snprintf(), which never writes past the buffer
snprintf(pretend_input_line, sizeof pretend_input_line, "Just some chars\n");

//test the function
line_val = assignKey( pretend_input_line );
printf("'%s' has a key: %llu\n\n", pretend_input_line, line_val);

//test the function with a shorter string
pretend_input_line[6] = '\0';
line_val = assignKey( pretend_input_line );
printf("'%s' has a key: %llu\n\n", pretend_input_line, line_val);

//test the function with an empty string
pretend_input_line[0] = '\0';
line_val = assignKey( pretend_input_line );
printf("'%s' has a key: %llu\n\n", pretend_input_line, line_val);
```

'Just some chars
' has a key: 5365321473679650669

'Just s' has a key: 5365321473679622144

'' has a key: 0

Don't worry about an efficient sorting algorithm; that is a different exercise.

For this case you can use something crude, like the one below:


### Crude sorting algorithm

This is not how to sort things in general: for $N$ lines, its running time is $\mathcal{O}(N^2)$.

 1) Allocate an array of int flags, one for each line, and set them all to 0.
 
 2) For each line with a zero flag, does it have the smallest key of all the lines with a zero flag?
 
 3) If yes, write it out and set the corresponding flag to 1.
 
 4) Back to (2) until there is nothing left with a zero flag.
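
Where the $N^2$ comes from: with $N$ lines, steps (2) and (3) run once for every line written out, and each run looks at all $N$ flags, comparing keys among the lines whose flag is still 0. On the $k$-th run there are $N-k+1$ such lines, so

$$
\underbrace{N \times N}_{\text{flag checks}} = N^2,
\qquad
\sum_{k=1}^{N} (N-k+1) = \frac{N(N+1)}{2} \ \text{candidate lines examined},
$$

and both grow as $\mathcal{O}(N^2)$. For $N = 1000$ lines that is $10^6$ flag checks and $500{,}500$ candidates.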

This sort is inefficient, but easy to implement.
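
The steps above can be sketched in C as follows. This is a minimal sketch under my own assumptions: the function name crude_sort_order is hypothetical, the keys are assumed to be already computed (for example with assignKey()) into an array, and instead of writing lines directly the function records the order in which they would be written.

```c
#include <stdlib.h>   //calloc(), free(), size_t

/* Hypothetical helper: fill order[0..n-1] with line indices in ascending key
   order, using the flag-based O(N^2) method. Ties keep their file order.
   Returns 0 on success, -1 if the flag array cannot be allocated. */
int crude_sort_order( const unsigned long long *keys, size_t n, size_t *order ){
    int    *done = (int *)calloc(n, sizeof *done);  //step 1: one flag per line, all 0
    size_t  out, i, best;
    int     found;

    if( n > 0 && done == NULL ) return -1;
    for( out = 0; out < n; out++ ){                //step 4: one pass per line written
        found = 0;
        best  = 0;
        for( i = 0; i < n; i++ ){                  //step 2: smallest key among zero-flag lines
            if( !done[i] && ( !found || keys[i] < keys[best] ) ){
                best  = i;
                found = 1;
            }
        }
        order[out] = best;                         //step 3: "write it out", then flag it
        done[best] = 1;
    }
    free(done);
    return 0;
}
```

To produce the sorted file, open a new file with fopen(..., "w") and call fputs(lines[order[k]], out) for k = 0 to n-1, where lines is a hypothetical array holding the text of each input line.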


When you have finished, save your notebook as an .ipynb file, then upload it to Moodle. [ File->Download as->Notebook ] is one series of menu options that does this.