<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

# Reading and writing binary data

<hr style="border-width:4px; border-color:coral; border-style:solid"/>

This notebook shows you how to write binary data in C and read it in Python.  The advantage of binary output is that it stores the full precision of the data in a compact, fixed-size form (8 bytes per `double`).  Binary data is also easy to read, and considerably faster to read than data stored in ASCII (human-readable) format.
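
To make the precision claim concrete, here is a small sketch using Python's standard `struct` module. It is not part of the workflow in this notebook, and the sample value and the `{:.4e}` text format are chosen only for illustration. It compares a binary round trip of one double with a typical formatted text write.

```python
import struct

value = 0.0019569396681622386   # sample double, chosen for illustration

# Binary: the 8 bytes of the IEEE 754 double, exactly as stored in memory.
packed = struct.pack('<d', value)
restored = struct.unpack('<d', packed)[0]

# Text: a typical formatted write keeps only a few significant digits.
text = '{:.4e}'.format(value)
from_text = float(text)

print(len(packed), restored == value)   # 8 bytes, exact round trip
print(len(text), from_text == value)    # more characters, digits lost
```

Text output can preserve full precision if enough digits are written (17 significant digits for a double), but the file then grows well beyond 8 bytes per value.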

In [1]:
%matplotlib notebook
%pylab

Using matplotlib backend: nbAgg
Populating the interactive namespace from numpy and matplotlib


<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

## Create a binary data file in C

<hr style="border-width:4px; border-color:coral; border-style:solid"/>

As an example, we first create a data file containing the error in computing an integral

\begin{equation*}
I(a,b) = \int_a^b \frac{1}{x} \; dx
\end{equation*}

For this example, we will compute $I(1,2) = \log(2)$.  We will use the "left endpoint" rule you may have learned in Calculus to approximate the integral. 
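
The left endpoint rule approximates the integral as $h \sum_{i=0}^{N-1} f(a + ih)$ with $h = (b-a)/N$. The following Python sketch applies the same rule as the C routine in `integral.c`; the function name `left_endpoint` is ours and is only meant as a quick cross-check.

```python
import math

def left_endpoint(f, a, b, N):
    """Approximate the integral of f over [a, b] with N left-endpoint rectangles."""
    h = (b - a) / N
    return h * sum(f(a + i * h) for i in range(N))

approx = left_endpoint(lambda x: 1.0 / x, 1.0, 2.0, 128)
error = abs(approx - math.log(2.0))
print(approx, error)
```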

In [2]:
%%file integral.c

#include <math.h>

double integrand(double x)
{
    // Solution : I(1,2) = log(2.0)
    return 1/x;
}

double integral(double a, double b, int N)
{
    double h = (b-a)/N;

    double I = 0;
    for(int i = 0; i < N; i++)
    {
        double x = a + h*i;
        I += integrand(x)*h;
    }
    return I;
}

Overwriting integral.c


In [3]:
%%file binary_scalar.c

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

double integral(double a, double b, int N);

int main(int argc, char** argv)
{
    if (argc < 2)
    {
        fprintf(stderr,"Usage: %s N\n",argv[0]);
        return 1;
    }
    int N = atoi(argv[1]);

    double a = 1;
    double b = 2;
    
    double I = integral(a,b,N);

    double error = fabs(I - log(2.0));
        
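    const char *fname = "errors.out";

    /* "wb" requests binary mode; this matters on platforms such as Windows */
    FILE *fout = fopen(fname,"wb");
    /* fwrite arguments: pointer, size of one item, number of items, stream */
    fwrite(&N,sizeof(int),1,fout);
    fwrite(&error,sizeof(double),1,fout);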
    fclose(fout);

    return 0;
}

Overwriting binary_scalar.c


In [4]:
%%bash

rm -rf binary_scalar binary_scalar.o integral.o

gcc -o binary_scalar binary_scalar.c integral.c -lm

./binary_scalar 128

This code writes the value `N` as an `int`, followed by the error as a single `double`.  On a platform with a 4-byte `int` and an 8-byte `double`, the resulting binary file should have a size of exactly

                     sizeof(double) + sizeof(int) = 12 bytes
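
The 12-byte figure assumes that the two values are written back to back with no padding, which is what two separate `fwrite` calls produce. As a sketch, the same layout can be reproduced with Python's `struct` module; the file name `demo.bin` is hypothetical and is used here so that `errors.out` is left untouched.

```python
import os
import struct
import tempfile

# '<' selects little-endian byte order, standard sizes, and no padding:
# 'i' is a 4-byte int, 'd' is an 8-byte double.
record = struct.pack('<id', 128, 0.0019569396681622386)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')      # hypothetical file name
    with open(path, 'wb') as f:
        f.write(record)
    size = os.path.getsize(path)

print(struct.calcsize('<id'), size)           # both are 12
```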

In [5]:
ls -l errors.out

-rw-r--r-- 1 calhoun 12 Mar  9 11:17 errors.out


The file is not "human readable", though.

In [6]:
%cat errors.out

�     ��`?

<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

## Read binary data in Python

<hr style="border-width:4px; border-color:coral; border-style:solid"/>

To read this data in Python, we create a NumPy data type that describes the layout of one record, open the file in binary mode, and read the records into a NumPy array.

First, we create a data type to store our error data. 

In [7]:
dt_error = dtype([('N','int32'), ('err','d')])  

dt_error

dtype([('N', '<i4'), ('err', '<f8')])

From this, we see that our data type `dt_error` consists of one 4-byte `int` (`<i4`) and one 8-byte double (`<f8`).  The `<` indicates little-endian byte order.
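
The byte layout of a structured data type can be inspected directly. The sketch below uses the explicit `np.` prefix rather than the `%pylab` namespace. It checks that the record size matches the 12 bytes in the file, and shows the `align=True` option, which pads the record the way a C `struct` would typically be padded. The name `dt_aligned` is ours.

```python
import numpy as np

dt_error = np.dtype([('N', 'int32'), ('err', 'd')])

print(dt_error.itemsize)            # bytes per record
print(dt_error.fields['N'][1])      # byte offset of N within a record
print(dt_error.fields['err'][1])    # byte offset of err within a record

# With align=True NumPy inserts C-struct style padding. On most platforms
# this gives a larger record that would not match a file written with two
# separate fwrite calls.
dt_aligned = np.dtype([('N', 'int32'), ('err', 'd')], align=True)
print(dt_aligned.itemsize)
```

The default (packed) layout is the one that matches a file written field by field.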

The following code reads our binary file and prints the result, an array of length 1 whose single entry is a record of type `dt_error`.

In [8]:
output_file = 'errors.out'

fout = open(output_file,"rb")
d = fromfile(fout,dtype=dt_error, count=1)
fout.close()

print(d)

[(128, 0.00195694)]


### (1) Access data in the datatype tuple directly

We can access the record with fields `('N','err')` as

In [9]:
print(d[0])

(128, 0.00195694)


We need the `[0]` index to access the first (and only) record in the array that `fromfile` returns.

We can also extract individual values `(N,err)` by accessing individual entries in the tuple.

In [10]:
N = d[0][0]
err = d[0][1]
print(N,err)

128 0.0019569396681622386
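
Fields of a structured array can also be accessed by name, which avoids relying on their position in the record. A minimal sketch follows; the array `d` is built by hand from the values printed above as a stand-in for the result of `fromfile`.

```python
import numpy as np

dt_error = np.dtype([('N', 'int32'), ('err', 'd')])

# Stand-in for the array returned by fromfile (values copied from the output above).
d = np.array([(128, 0.0019569396681622386)], dtype=dt_error)

N = d['N'][0]        # same value as d[0][0]
err = d['err'][0]    # same value as d[0][1]
print(N, err)
```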


### (2) Assign values directly in one statement

We can assign values to `N` and `err` in one statement:

In [11]:
N,err = d[0]
print(N,err)

128 0.0019569396681622386


### (3) Read directly into a datatype from a file

Or, we can unpack the values directly from the `fromfile` call.

*NOTE* : The trailing `[0]` is needed to extract the tuple from the array. 

In [12]:
fout = open(output_file,"rb")
N,err = fromfile(fout,dtype=dt_error, count=1)[0]
fout.close()

print('{:10d} {:12.4e}'.format(N,err))

       128   1.9569e-03
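
`fromfile` also accepts a file name, in which case it opens and closes the file itself. The sketch below assumes a stand-in file `demo.bin` (a hypothetical name), written here with `struct` in native byte order so that the example does not depend on `errors.out`.

```python
import os
import struct
import tempfile

import numpy as np

dt_error = np.dtype([('N', 'int32'), ('err', 'd')])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')     # hypothetical stand-in for errors.out
    with open(path, 'wb') as f:
        # '=' means native byte order with no padding, matching dt_error
        f.write(struct.pack('=id', 128, 0.0019569396681622386))

    # Passing the file name instead of an open file object
    N, err = np.fromfile(path, dtype=dt_error, count=1)[0]

print('{:10d} {:12.4e}'.format(N, err))
```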


<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

## Reading a table of error data

<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

The above Python code can also be used to read a table of values from a binary file. 

In [13]:
%%file binary_table.c

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double integral(double a, double b, int N);

int main(int argc, char* argv[])
{
    int rows = 5;
    if (argc < 2)
    {
        fprintf(stderr,"Usage: %s N0\n",argv[0]);
        return 1;
    }
    int N0 = atoi(argv[1]);
    
    double a = 1;
    double b = 2;
    
    /* Binary output: one (N, err) record per row for the left endpoint rule */
    FILE *fout = fopen("errors.out","wb");
    int N = N0;
    for(int i = 0; i < rows; i++)
    {
        double I = integral(a,b,N);
        
        double err = fabs(I - log(2.0));
        fwrite(&N,sizeof(int),1,fout);
        fwrite(&err,sizeof(double),1,fout);
        
        N *=2;
    }
    fclose(fout);
    
    return 0;
}

Overwriting binary_table.c


In [14]:
%%bash

rm -rf binary_table binary_table.o integral.o

gcc -o binary_table binary_table.c integral.c -lm

./binary_table 32

To read the table of values, we omit the `count` argument so that `fromfile` returns every record in the file.

In [15]:
output_file = 'errors.out'

fout = open(output_file,"rb")
error_table = fromfile(fout,dtype=dt_error)
fout.close()

print(error_table)

[( 32, 0.00787353) ( 64, 0.00392151) (128, 0.00195694) (256, 0.00097752)
 (512, 0.00048852)]


We can access individual rows of the table as

In [16]:
print(error_table[3])

(256, 0.00097752)
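
Whole columns can be extracted by field name, which makes it easy to estimate the convergence rate from the table. In the sketch below, `error_table` is rebuilt by hand from the rounded values printed above as a stand-in for the array read from the file, and the name `order` is ours.

```python
import numpy as np

dt_error = np.dtype([('N', 'int32'), ('err', 'd')])

# Stand-in for error_table, using the rounded values printed above.
error_table = np.array([(32, 0.00787353), (64, 0.00392151), (128, 0.00195694),
                        (256, 0.00097752), (512, 0.00048852)], dtype=dt_error)

N = error_table['N']        # all N values as one array
err = error_table['err']    # all errors as one array

# N doubles from row to row, so log2 of successive error ratios
# estimates the order of convergence.
order = np.log2(err[:-1] / err[1:])
print(order)
```

Values near 1 indicate first-order convergence, and values near 2 indicate second-order convergence.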


<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

## Converting table to a Pandas DataFrame object

<hr style="border-width:4px; border-color:coral; border-style:solid"/></br>

Storing the data as a Pandas DataFrame makes it easy to extract our data for plotting. 

In [17]:
import pandas

fout = open(output_file,"rb")
data = fromfile(fout,dtype=dt_error)
fout.close()

df_error = pandas.DataFrame(data)
fstr = {'N' : '{:d}'.format, 'err' : '{:.4e}'.format}
df_error.style.format(fstr)

|   | N   | err        |
|---|-----|------------|
| 0 | 32  | 0.0078735  |
| 1 | 64  | 0.0039215  |
| 2 | 128 | 0.0019569  |
| 3 | 256 | 0.00097752 |
| 4 | 512 | 0.00048852 |


<hr style="border-style:solid; border-width:4px; border-color:coral"/>

## Plotting error results

<hr style="border-style:solid; border-width:4px; border-color:coral"/>

We can extract the N values and error values from the DataFrame, and plot the error results.  Because the left endpoint rule is first-order accurate, the error should halve each time N doubles, so the slope of the best-fit line should be close to -1.  This is a good test to see that you are doing your discretization and communication in your MPI code correctly.
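
A log-log plot is used because, if the error behaves like $E(N) \approx C N^{-p}$ for some constant $C$ and order $p$ (an assumption that holds only once $N$ is large enough), then

\begin{equation*}
\log E(N) \approx \log C - p \log N,
\end{equation*}

so the points fall on a line of slope $-p$.  Equivalently, each doubling of $N$ gives the estimate

\begin{equation*}
p \approx \log_2 \frac{E(N)}{E(2N)}.
\end{equation*}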

In [18]:
figure(1)
clf()

loglog(df_error['N'],df_error['err'],'b.-',markersize=20,label="Computed error")

<IPython.core.display.Javascript object>

[<matplotlib.lines.Line2D at 0x11fd92cd0>]

In [None]:
# Add slope to get best fit line
figure(1)

c = polyfit(log(df_error['N']),log(df_error['err']),1)
loglog(df_error['N'],exp(polyval(c,log(df_error['N']))),'r*', markersize=10,\
         label='Best-fit line (slope={:.2f})'.format(c[0]),linewidth=1)

In [19]:
# Add title, xlabel, ylabel, xticks and a legend

figure(1)

def fix_xticks(Nvec):
    p0 = log2(Nvec[0])
    p1 = log2(Nvec[-1])
    xlim([2**(p0-0.5), 2**(p1+0.5)])
    
    # Make nice tick marks
    pstr = (['{:d}'.format(int(N)) for N in Nvec])
    xticks(Nvec,pstr)

fix_xticks(df_error['N'].values)  # Need numpy array, not a Pandas 'Series'
xlabel("N",fontsize=16)
ylabel("Error",fontsize=16)
title("Error in integral method",fontsize=18)
legend()

<matplotlib.legend.Legend at 0x11fda4b50>