## Integer Representation

_burton rosenberg, 24 june 2023_

Among the basic operations built into a computer is the ability to add and subtract integers. 

Recall the simplified hardware model,

<div style="float:right;margin:2em;width:450px;border:1px solid green;padding:1em;">
<a title="MCS-80/85 Family User's Manual, Intel, October 1979" href="https://archive.org/details/Mcs80_85FamilyUsersManual/">
<img src="https://www.cs.miami.edu/home/burt/learning/csc421.241/images/databuses_8085.png">
<br>MCS-80/85 Family User's Manual, Intel, October 1979.
</a>
</div>

The address is a non-negative integer, and the data can be one or multiple bytes. If multiple bytes, for example the 4 bytes of a 32-bit integer, we have explained that the address will be a multiple of 4, and the bytes will be the 4 bytes at that address and at one, two, and three plus that address. Given an integer variable in C, `int i`, the address-of operator `&i` returns a slightly abstract item called a pointer-to-int, or int star, or `int *`, that can be used to find the integer `i` in memory.

We coerced the pointer to a 64-bit integer, which can work, and then we have something that implements this simple model of RAM: a byte array indexed by a non-negative integer.
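As a minimal sketch of this coercion, the functions below convert a pointer to an integer using the type `uintptr_t` from `<stdint.h>`. The helper names `is_multiple_of` and `address_of_byte` are made up for illustration. That an `int` sits at an address that is a multiple of `sizeof(int)` is an assumption that holds on common platforms, not a guarantee of the language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: treat the pointer as a plain integer address
   and report whether that address is a multiple of n. */
int is_multiple_of(const void *p, size_t n) {
    uintptr_t address = (uintptr_t) p ;   /* pointer coerced to an integer */
    assert(n != 0) ;
    return (address % n) == 0 ;
}

/* Hypothetical helper: the bytes of an object sit at consecutive
   addresses, so byte k is found at the object's address plus k. */
uintptr_t address_of_byte(const void *p, size_t k) {
    return (uintptr_t) ((const unsigned char *) p + k) ;
}
```

Given `int i`, the call `is_multiple_of(&i, sizeof(int))` would be expected to return 1 on typical machines.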


To perform an add, typically the two addends are brought from RAM to a CPU memory called a _register_. The number of registers on a CPU differs substantially by processor family. The add is implemented by gating the values of two registers to an integer addition unit located in the Arithmetic Logic Unit (ALU). The result is captured in a third register. Then the register contents are written to RAM at the location of the output variable.

We can later create an addition circuit out of logic gates, but first we will familiarize ourselves more with integers in C.

### Integer specification

When a number is written in a program, it is better thought of as a picture of a number than a number itself. These pictures of numbers are called _literals_, and the language specifies their syntax. For integers there are four types of literals, which the compiler turns into an integer value and places into the compiled program,

- Decimal representation. This is the number as generally written.
- Octal representation. This is the integer written in base 8. When the integer starts with a leading `0`, the number is in octal representation.
- Hexadecimal representation. This is the integer written in base 16. When a number begins with `0x`, it is in hexadecimal representation.
- Binary representation. This is the integer written in base 2. When a number begins with `0b`, it is in binary representation. Binary literals were standardized only in C23; GCC and Clang accept them in earlier modes as an extension.
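A small sketch can make the point that the prefix changes only the picture, not the value. The function name `same_value_three_ways` is made up for this illustration; it compares one number written in decimal, octal, and hexadecimal. The `0b` form is left out of the code because it needs C23 or a compiler extension.

```c
#include <assert.h>

/* Hypothetical check: the same integer written three ways.
   26 = 3*8 + 2 = 1*16 + 10 */
int same_value_three_ways(void) {
    int decimal = 26 ;
    int octal   = 032 ;    /* leading 0: base 8   */
    int hex     = 0x1a ;   /* leading 0x: base 16 */

    assert(010 == 8) ;     /* a leading zero is not harmless padding */
    return decimal == octal && octal == hex ;
}
```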

#### Hexadecimal representation

Numbers written in hexadecimal are very useful, and you should get familiar with this representation.

Any integer can be represented in hexadecimal. It does not change the number's value or how it is stored in the RAM or registers. It is a way of making a picture of the number. Hexadecimal is a very good picture since we can visualize the exact bits in the binary representation of the number: each hexit stands for exactly four bits.

Sometimes information is encoded in the individual bits of a number, and to think about this we need to think about the number in hexadecimal. We will use this representation shortly to think about negative integers.

Hexadecimal expresses the number $n$ in hex digits (hexits) $h_0, h_1, \dots, h_k$ such that,

$$
n = \sum_i h_i 16^i
$$

The hexits $h_i$ are from the set 

$$
\{\,0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f\,\}
$$
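The sum can be evaluated directly in code. The sketch below, with the made-up names `hexit_value` and `hex_value`, walks a string of hexits from the left and uses Horner's rule, which computes the same sum $\sum_i h_i 16^i$ without forming the powers of 16 explicitly. It assumes lowercase hexits as in the set above, assumes the letters `a` through `f` have consecutive character codes (true for ASCII), and does not guard against overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hexit; assumes c is in 0-9 or a-f. */
static unsigned int hexit_value(char c) {
    if (c >= '0' && c <= '9') return (unsigned int) (c - '0') ;
    assert(c >= 'a' && c <= 'f') ;
    return (unsigned int) (c - 'a') + 10 ;
}

/* Hypothetical helper: the integer pictured by a string of hexits,
   most significant hexit first, e.g. "1a" is 1*16 + 10. */
unsigned int hex_value(const char *s) {
    unsigned int n = 0 ;
    size_t i ;
    for (i = 0 ; s[i] != '\0' ; i++) {
        n = 16 * n + hexit_value(s[i]) ;
    }
    return n ;
}
```

For example, `hex_value("1a")` gives 26 and `hex_value("ff")` gives 255.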

The program here counts from 0 up to $n-1$ and displays each number in decimal, binary, and hexadecimal.


In [1]:
def various_bases(n):
    # print 0 .. n-1 in decimal, binary (0b prefix) and hexadecimal (0x prefix)
    print('decimal\tbinary\thexadecimal')
    for i in range(n):
        print(f'{i}\t{i:#b}\t{i:#x}')

various_bases(24)

decimal	binary	hexadecimal
0	0b0	0x0
1	0b1	0x1
2	0b10	0x2
3	0b11	0x3
4	0b100	0x4
5	0b101	0x5
6	0b110	0x6
7	0b111	0x7
8	0b1000	0x8
9	0b1001	0x9
10	0b1010	0xa
11	0b1011	0xb
12	0b1100	0xc
13	0b1101	0xd
14	0b1110	0xe
15	0b1111	0xf
16	0b10000	0x10
17	0b10001	0x11
18	0b10010	0x12
19	0b10011	0x13
20	0b10100	0x14
21	0b10101	0x15
22	0b10110	0x16
23	0b10111	0x17



###  <a name=intrepr>Representations of integers</a>


The bit patterns can also be associated with non-negative integers by the formula,

$$
\mathcal{N}(b_l, b_{l-1}, \ldots, b_0) = \sum_i 2^i b_i
$$

That is, write $n$ in binary, and make a sequence out of the bits in the representation.



In [2]:
%%file string-to-int.c

#include<stdio.h>
#include<string.h>

int main(int argc, char * argv[]){
    int i ;
    int sum = 0 ;
    int two_to_the_i = 1 ;
    char * s ;

    if (argc<2) {
        printf("usage: %s bitstring\n", argv[0]) ;
        return 1 ;
    }
    s = argv[1] ;
    printf("%s\t", s) ;
    
    // scan right to left: the last character is the 2^0 place
    for (i=strlen(s);i>0;i--){
        if (s[i-1]=='1') {
            sum = sum + two_to_the_i ;
        }
        two_to_the_i = 2 * two_to_the_i ;
    }
    
    printf("%d\n", sum) ;
    return 0 ;
}

Writing string-to-int.c


In [3]:
#
# a python program to enumerate all the bit sequences on i bits
# it uses recursion to create a list for i-1 bits, then adds one more 
# bit.
#

def ennumerate_zero_one_patterns(i):
    
    def ennumerate_zero_one_patterns_aux(i):
        if i==1:
            return ['0','1']
        l = ennumerate_zero_one_patterns_aux(i-1)
        r = l[:]
        # k indexes the shorter patterns; prefix each with '0' and with '1'
        for k in range(len(l)):
            r[k] = '1'+l[k]
            l[k] = '0'+l[k]
        return l+r
    
    assert i>0, 'input must be greater than zero'
    return ennumerate_zero_one_patterns_aux(i)
    

In [None]:
!cc -o string-to-int string-to-int.c
int_representations = ennumerate_zero_one_patterns(3)
for a_representation in int_representations:
    !./string-to-int {a_representation}
!rm string-to-int string-to-int.c

000	0
001	1
010	2
011	3
100	4
101	5
110	6
111	7


### The int and long int datatypes



We have shown that the computer can represent integers in binary, but so far we have discussed only bytes. Since a byte has only 256 bit patterns, it can store only a small range of integers. So far we have shown how it can store the integers 0 through 255. There are two deficiencies,

- We must be able to store much larger integers.
- We must be able to represent both positive and negative integers.

C Language has two kinds of integer data types, _signed_ and _unsigned_. The type _unsigned char_ is one byte and the various bit patterns are used to represent the integers 0 through 255, using the obvious binary representation. 

We set aside for now the representation of negative numbers, and first address the need for a much larger range of positive numbers.

To store larger numbers the computer uses more bytes, and collects them so that they have consecutive addresses in RAM. This way, the location of the integer remains a single address. The number of bytes is known because the reference has a type that includes the number of bytes. 
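As a sketch of how several bytes act as one number, the function below combines four consecutive bytes into a 32-bit value. The name `combine_four_bytes` is made up, and the byte order is an assumption: the byte at the lowest address is taken as the least significant, which is the little-endian convention. The text above does not say which order a given machine uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine the 4 bytes at b, b+1, b+2, b+3 into one
   32-bit integer, assuming the lowest address holds the least
   significant byte (little-endian order). */
uint32_t combine_four_bytes(const unsigned char *b) {
    uint32_t n = 0 ;
    int k ;
    assert(b != NULL) ;
    for (k = 3 ; k >= 0 ; k--) {
        n = 256 * n + b[k] ;   /* each byte is one base-256 digit */
    }
    return n ;
}
```

With the bytes `0x78, 0x56, 0x34, 0x12` at increasing addresses, the function returns `0x12345678`.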

<div style="float:right;margin:2em;">
<img width="512" src="https://www.cs.miami.edu/home/burt/learning/csc421.241/images/TCPL-1ed-bytesize.png">
</div>

It is a fact that C Language did not lay down the law about the number of bytes for each integer datatype, except that a char is one byte, and "larger" data types must have at least as many bytes as smaller ones. However, on most current machines the standard integer is 32 bits, with type names `int` and `unsigned int`. The image is from TCPL first edition, where the authors give the number of bits in the various integer and byte types on computers of that time.

There were then two variants of `int`, the `short int` and the `long int`. The actual number of bytes is not defined in the C Language, except that a short int cannot be longer than an int, and a long int cannot be shorter than an int. Let's say, as the typical case, that a short is 16 bits and a long is 64 bits. Beware though, this will depend on the computer and the compiler.

The built-in operator `sizeof` gives the number of bytes of the object mentioned as its argument. The argument can be a data type or a variable. Although `sizeof` looks like a function call, it is not. If it were a function call, we would have to wait until the program ran before the value of `sizeof` was known. Instead, it is already known at compile time.
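One way to see that `sizeof` is settled at compile time is to use it where the language demands a constant. The sketch below uses `_Static_assert`, available since C11, which the compiler checks without running anything. The names `int_sized_buffer` and `sizes_are_ordered` are made up for illustration.

```c
#include <assert.h>

/* Checked by the compiler, not at run time (C11 _Static_assert). */
_Static_assert(sizeof(char) == 1, "a char is one byte by definition") ;

/* The length of a file-scope array must be a compile-time constant,
   so sizeof(int) has to be known before the program ever runs. */
static unsigned char int_sized_buffer[sizeof(int)] ;

/* Hypothetical check of the ordering short <= int <= long. */
int sizes_are_ordered(void) {
    assert(sizeof(int_sized_buffer) == sizeof(int)) ;
    return sizeof(short int) <= sizeof(int)
        && sizeof(int) <= sizeof(long int) ;
}
```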


In [None]:
%%file sizeof-wow.c

#include<stdio.h>

int main(int argc, char * argv[]) {
    printf("type:\tbytes\n") ;
    printf("char:\t%zu\n", sizeof(char)) ;        // %zu is the format for size_t
    printf("short:\t%zu\n", sizeof(short int)) ;
    printf("int:\t%zu\n",  sizeof(int)) ;
    printf("long:\t%zu\n",  sizeof(long int)) ;
    return 0 ;
}

In [None]:
%%bash
cc -o sizeof-wow sizeof-wow.c
./sizeof-wow
rm sizeof-wow sizeof-wow.c

## Negative Integers

From the representation of numbers as some zero-one pattern in multiple bytes, we now have a direct understanding of non-negative integers and their addition by the ALU. Now we look at negative numbers.

The essence of a negative number, for this discussion, is its arithmetic properties. Given an $x$, the negative $y$ has the defining property that $x+y=0$. For a literal picture of $x$, the literal picture of $y$ is created by adding the minus sign in the front.

This is called __sign-magnitude__ notation of a negative number. We do not really do any calculation, we just post a symbol at the front of the number for use later. 

We will use the __modular arithmetic__ nature of computer integers to calculate negative numbers, and hence do subtraction, while reusing the addition unit. We will subtract using addition.

Note the following arithmetic sequence,

$$
9, 18, 27, 36, 45, 54, 63, 72, 81, 90
$$

Just looking at the ones digit, we have the falling sequence from 9 to 0. If we ignore the tens digit, we have achieved subtraction by one by adding 9. Looked at modulo ten, this makes perfect sense, as,

$$
9 = 10 - 1 \equiv -1 \pmod{10}
$$
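The decimal trick can be sketched in a few lines. The made-up function `add_mod_ten` keeps only the ones digit of a sum, so adding 9 behaves like subtracting 1, and `steps_to_zero` counts down with it.

```c
#include <assert.h>

/* Hypothetical helper: add two digits and keep only the ones digit,
   that is, add modulo ten. */
unsigned int add_mod_ten(unsigned int a, unsigned int b) {
    assert(a < 10 && b < 10) ;
    return (a + b) % 10 ;
}

/* Count down from a starting digit by repeatedly adding 9; returns how
   many additions it took to reach zero. */
unsigned int steps_to_zero(unsigned int start) {
    unsigned int steps = 0 ;
    while (start != 0) {
        start = add_mod_ten(start, 9) ;   /* acts as start - 1 */
        steps++ ;
    }
    return steps ;
}
```

For example, `add_mod_ten(5, 9)` is 4, and `steps_to_zero(7)` is 7.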

We will do the same with our integers and the ALU addition unit, noting that for a 32-bit integer, the addition is modulo $2^{32}$,

$$
-1 \equiv 2^{32}-1  \pmod{2^{32}}
$$

Using hexadecimal notation we can verify this easily. In hex, $2^{32}=$ 0x100000000. You have to count those zeros carefully. There are 8 of them, each representing 4 bits, for 32 bits of zeros with a 1 in front. We can do this subtraction mentally, $2^{32}-1=$ 0xffffffff. Think of adding a one to that string of f's: like a domino reaction, each place carries a one and leaves a zero, until in the 33rd place the carry becomes a one.

We check this with an experiment.

In [None]:
%%file count-down.c

#include<stdio.h>
#include<stdlib.h>
#include<assert.h>

int main(int argc, char * argv[]) {
    int minus_one = 0xffffffff ;
    int i = 5 ;
    
    assert(sizeof(int)==4) ; // else the minus_one is not correct
    
    while (i!=0) {
        printf("%d ", i) ;
        i = i + minus_one ;
    }
    printf("%d\n", i) ;
    return 0 ;
}

In [None]:
%%bash
S=count-down
cc -o $S $S.c
./$S
rm $S $S.c

### Two's Complement

Given the success of that experiment, we wish to calculate explicitly the negative number $-x$.

We had noted that the ALU could do addition. We tell you that it can also carry out other operations, such as bitwise logical operations. We shall use the bit complementation unit in the ALU along with the addition unit in the ALU to calculate negative numbers and to subtract.

#### C Language bit operations

C Language has bit operations, such as `~` and `|`. These operators treat the byte or bytes as a collection of bits and apply the operation to the bits, place by place. The `~` is bit complementation: the bits are flipped, zero to one or one to zero. The `|` is the or operator, taking two integers (etc.) and giving an integer whose bit is 1 in each position where either one or both of the inputs have a 1 in that position. There are also an and operator and an exclusive-or operator, both binary, that you will find in the exercises.

What we are interested in right now is that `x | (~x)` will equal `0xffffffff`.

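Note that in this case `x + (~x)` will equal `x | (~x)`. No bit position holds a 1 in both `x` and `~x`, so the addition never generates a carry and each place simply takes the or of the two bits.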
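A quick sketch can check both claims for any 32-bit value. It uses the fixed-width type `uint32_t` from `<stdint.h>` so that the 32-bit assumption is explicit; the function name `complement_fills_all_bits` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: for a 32-bit x, OR-ing or adding x and its
   complement both give all ones. No bit position has a 1 in both x
   and ~x, so the addition never produces a carry. */
int complement_fills_all_bits(uint32_t x) {
    uint32_t flipped = ~x ;
    assert((x & flipped) == 0) ;   /* no position is 1 in both */
    return (x | flipped) == 0xffffffffu
        && (uint32_t) (x + flipped) == 0xffffffffu ;
}
```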

Here is an example for a single hexit.


In [None]:
%%file flip-out.c

#include<stdio.h>

int main(int argc, char * argv[]) {
    int i, j, k ;
    for (i=0;i<0x10;i++) {
        j = 0xf&(~i) ;
        k = 0xf&( i | j ) ;
        printf("0x%x | 0x%x  = 0x%x\n", i, j, k) ;
    }
    return 0 ;
}

In [None]:
%%bash
S=flip-out
cc -o $S $S.c
./$S
rm $S $S.c

Hence, 

`0 == 0xffffffff + 1 == ( x | (~x) ) + 1  == ( x + ~x ) + 1 == x + ( ~x + 1 )`

Therefore negative x equals one more than the bitwise complement of x. This is how we make negative numbers, and it is also how we do subtraction. The ALU only needs bitwise complement and addition, and it can use those to do subtraction.
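Putting the pieces together, subtraction can be sketched with only complement and add. The names `negate` and `subtract` are made up for illustration, and `uint32_t` is used so that the arithmetic is modulo $2^{32}$ as in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two's complement negative, one more than the
   bitwise complement. */
uint32_t negate(uint32_t x) {
    uint32_t result = (uint32_t) (~x + 1u) ;
    assert((uint32_t) (x + result) == 0) ;   /* defining property x + (-x) = 0 */
    return result ;
}

/* Hypothetical helper: a - b computed without a subtraction operator. */
uint32_t subtract(uint32_t a, uint32_t b) {
    return (uint32_t) (a + negate(b)) ;
}
```

For example, `subtract(10, 3)` is 7, and `negate(1)` is `0xffffffff`.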


In [None]:
%%file two-complement.c

#include<stdio.h>
#include<stdlib.h>

int main(int argc, char * argv[]) {
    unsigned int n = atoi(argv[1]) ;
    unsigned int nn ;
    
    nn = 1 + ~n ; // twos complement homemade negatives
    
    printf("%u:\t%u\t%d\n", n, nn, (int) nn) ; // same bits, read as unsigned then as signed
    
    return 0 ;
}

In [None]:
%%bash
S=two-complement
cc -o $S $S.c
./$S 10
./$S 0
./$S -45
rm $S $S.c