# Bits and bytes and binary

In this notebook we will be looking at how computers represent information and how you can access that information in C.

But first:

In the space below write a C program that compiles without warnings or errors with the -ansi -Wall -pedantic flags turned on.  Include your name, login-id, and student ID number.

In [None]:
%%file l02_works.c

/* put code here */


/* end of program */


In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_works.c -o works

The smallest unit of information in a digital computer is a bit.  A bit is represented by a voltage level in the circuits of a computer.  Each bit has two possible values:  0 and 1.  The voltage levels in the computer will be low to represent 0 and high to represent 1 (but they won't necessarily be 0 and 1 volts).

We can collect multiple bits together to represent more information (more than two values).  Each additional bit doubles the number of possible values, so a byte, which is 8 bits, can take any one of $2^8=256$ possible values.

In the C programming language bytes are called <tt>char</tt>.  A <tt>char</tt> can be used to represent at least 3 different things:
<ul>
    <li> an ASCII character (ASCII itself only needs 7 of the 8 bits), </li>
    <li> a signed integer in the range -128 to 127, or </li>
    <li> an unsigned integer in the range 0 to 255. </li>
</ul>

In [None]:
%%file l02_ex00.c
#include <stdio.h>

int main()
{
    char i;
    unsigned char j;
    
    /* sizeof yields a size_t, so cast it to match the %lu format */
    printf( "sizeof(char)=%lu sizeof(unsigned char)=%lu\n\n", (unsigned long)sizeof(char), (unsigned long)sizeof(unsigned char) );
    
    
    printf( "Printing out chars:\n" );
    for (i=1;i!=0;i++)  /* loop from one, increment by one, stop when you hit zero */
        printf( "%d\n", i );
    printf( "%d\n", i );  /* print the value of i that made it stop */
    
    printf( "Printing out unsigned chars:\n" );
    for (j=1;j!=0;j++)  /* loop from one, increment by one, stop when you hit zero */
        printf( "%d\n", j );
    printf( "%d\n", j );  /* print the value of j that made it stop */
    
    return 0;
}

In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_ex00.c -o l02_ex00

In [None]:
%%bash
./l02_ex00

Things to notice:
<ul>
    <li> Both <tt> char </tt> and <tt> unsigned char </tt> are one byte in length. </li>
    <li> If you keep adding 1, they eventually wrap around. </li>
    <li> No warning or error is generated when they wrap around! (Invisible bug!)</li>
</ul>

Let's look at bits.

There are a number of bit-wise operators (see K&R Section 2.9).  Let's look at 3 in particular.
<ol>
    <li> <tt> &amp; </tt> represents bit-wise AND, </li>
    <li> <tt> &lt;&lt; </tt> represents left shift, and </li>
    <li> <tt> &gt;&gt; </tt> represents right shift. </li>
</ol>


Let's write a program that shows the bits in an unsigned character.

In [None]:
%%file l02_ex01.c

#include <stdio.h>
#include <stdlib.h>

int main( int argc, char **argv )
{
    int number, bitno, bit;
    unsigned char uc;
    
    if (argc!=2)
    {
        fprintf( stderr, "Usage: %s <n>\n", argv[0] );  /* n is required */
        exit( EXIT_FAILURE );
    }
    number = atoi( argv[1] );
    
    uc = number;
    
    printf( "%d is ", uc );
    for (bitno=sizeof(unsigned char)*8-1;bitno>=0;bitno--)
    {
        bit = (uc >> bitno)&1;
        printf( "%d", bit );
    }
    printf( "\n" );
    return 0;
}

In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_ex01.c -o l02_ex01

In [None]:
%%bash
./l02_ex01

In [None]:
%%bash
./l02_ex01 0
./l02_ex01 1
./l02_ex01 2
./l02_ex01 3
./l02_ex01 4
./l02_ex01 5
./l02_ex01 6
./l02_ex01 7

Copy l02_ex01.c into the cell below and modify it to create a new program that displays the bits of signed <tt>char</tt>s.

In [None]:
%%file l02_ex02.c

/* add code here */

In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_ex02.c -o l02_ex02

In [None]:
%%bash
./l02_ex02 -7
./l02_ex02 -6
./l02_ex02 -5
./l02_ex02 -4
./l02_ex02 -3
./l02_ex02 -2
./l02_ex02 -1
./l02_ex02 0
./l02_ex02 1
./l02_ex02 2
./l02_ex02 3
./l02_ex02 4
./l02_ex02 5
./l02_ex02 6
./l02_ex02 7

Notice how the negative numbers are represented.  This is called two's complement (you will see this in CIS*1910 in the second week of February).

Repeat the above experiment with:
<ul>
    <li> <tt>short</tt>, </li>
    <li> <tt>int</tt>, and </li>
    <li> <tt>long</tt>. </li>
</ul>

Floating point numbers are a whole other kettle of fish that we won't get into.

In [None]:
%%file l02_ex03.c

/* add code here */

In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_ex03.c -o l02_ex03

In [None]:
%%bash
./l02_ex03 -7
./l02_ex03 -6
./l02_ex03 -5
./l02_ex03 -4
./l02_ex03 -3
./l02_ex03 -2
./l02_ex03 -1
./l02_ex03 0
./l02_ex03 1
./l02_ex03 2
./l02_ex03 3
./l02_ex03 4
./l02_ex03 5
./l02_ex03 6
./l02_ex03 7

Now let's look at characters.  On virtually every system you will use, characters in C are represented in the American Standard Code for Information Interchange (ASCII, a.k.a. ISO 646).  This is an old character encoding system that uses only 7 bits and therefore can't include many international characters (nowadays we have Unicode, which addresses these limitations).

There is a handy <tt>man</tt> page for ASCII.

In [None]:
%%bash
man ascii

Some useful things to note:
<ul>
    <li> characters below 32 are generally "unprintable", and so is the last one (127), </li>
    <li> <tt> SPACE </tt> is 32, </li>
    <li> <tt> 0 </tt> is 48 and the rest of the digits follow, </li>
    <li> <tt> A </tt> is 65 and the rest of the uppercase alphabet follows, and </li>
    <li> <tt> a </tt> is 97 and the rest of the lowercase alphabet follows. </li>
</ul>
We can do math with characters.

Write a program that prints the characters from ' ' (a.k.a. 32) to '~' (a.k.a. 126) using a for loop.

In [None]:
%%file l02_ex04.c

/* add code here */

In [None]:
%%bash
gcc -ansi -Wall -pedantic l02_ex04.c -o l02_ex04

In [None]:
%%bash
./l02_ex04

Write a program called <tt>l02_yeller</tt> that takes a single command line argument, a single word consisting of lowercase letters, converts it to uppercase letters, and prints the result.