# **Chapter 03: Data and C**  

<br>

In this chapter, you're going to learn about:  
- Keywords:  
`int`, `short`, `long`, `unsigned`, `char`, `float`, `double`, `_Bool`, `_Complex`, `_Imaginary`
- Operator:  
`sizeof`
- Function:  
`scanf()`
- The basic data types that C uses
- The distinctions between integer types and floating-point types
- Writing constants and declaring variables of those types
- How to use the `printf()` and `scanf()` functions to read and write values of different types  

<br>

Programs work with data.  
You feed numbers, letters, and words to the computer, and you expect it to do something with that data.  
In this chapter, you'll learn how to read data and how to manipulate it.  


This chapter explores the two great families of data types: integer and floating point.  
C offers several varieties of these types.  
This chapter tells you what the types are, how to declare them, and how and when to use them.  
It also introduces the difference between constants and variables.  

<br>
<br>  

## **A Sample Program**  
---  

```c
// Listing 3.1 The platinum.c Program

/* platinum.c -- your weight in platinum */

#include <stdio.h>

int main(void){
    float weight;   /* user weight */
    float value;    /* platinum equivalent */

    printf("Are you worth your weight in platinum?\n");
    printf("Let's check it out.\n");
    printf("Please enter your weight in pounds: \n");

    /* get input from the user */
    scanf("%f", &weight);
    /* assume platinum is $1700 per ounce */
    /* 14.5833 converts pounds avd. to ounces troy */
    value = 1700.0 * weight * 14.5833;
    printf("Your weight in platinum is worth $%.2f.\n", value);
    printf("You are easily worth that! If platinum prices drop,\n");
    printf("eat more to maintain your value.\n");

    return 0;
}
```

Output:

```text
Are you worth your weight in platinum?
Let's check it out.
Please enter your weight in pounds: 
Your weight in platinum is worth $3825841.50.
You are easily worth that! If platinum prices drop,
eat more to maintain your value.
```


Note that "entering" your weight means to type your weight and then press the Enter or Return key.  
Pressing Enter informs the computer that you have finished typing your response.  
The program expects you to enter a number, not words.  

<br>
<br>

### What's New in This Program?  
---  
There are several new elements of C in this program:  
- the program uses a new kind of variable declaration  
    - `float weight;`
    - previous programs used an integer variable type (`int`)
    - but this program adds a floating-point variable type (`float`)  
    $\rightarrow$ you can handle a wider variety of data, because  
    the `float` type can hold numbers with decimal points  
- the program uses some new ways of writing constants  
    - `value = 1700.0 * weight * 14.5833;`
    - you have numbers with decimal points  
- the program prints a new kind of variable  
    - `printf("Your weight in platinum is worth $%.2f.\n", value);`
    - the program uses the `%f` specifier in the `printf()` to handle a floating-point value
    - the `.2` modifier to the `%f` specifier fine-tunes the appearance of the output  
    $\rightarrow$ it displays two places to the right of the decimal  
- the program reads values with the `scanf()` function  
    - `scanf("%f", &weight);`
    - the `scanf()` function provides keyboard input to the program
    - the `%f` instructs `scanf()` to read a floating-point number
    - the `&weight` tells `scanf()` to assign the input value to the variable named `weight`
    - the `scanf()` function uses the `&` notation to indicate where it can find the `weight` variable
- these new features make the program interactive  
    - the computer asks you for information and uses it  
    - the interactive approach makes programs more flexible  
        - ex) Listing 3.1 can be used for any reasonable weight, not just for a sample weight  
        $\therefore$ you don't need to rewrite the program every time you want to try it on a new weight  
    - the `scanf()` and `printf()` functions make this interactivity possible  
        - `scanf()` reads data from the keyboard and delivers it to the program  
        - `printf()` takes data from a program and delivers it to the screen  
        - `scanf()` and `printf()` enable you to establish a two-way communication with your computer  


<div class="mermaid">
graph LR
    subgraph "printf()"
        direction LR
        id4(printf function) --displaying program output--> id3(screen)
    end
    
    subgraph "scanf()"
        direction RL
        id1(keyboard) --getting keyboard input--> id2(scanf function)
    end
</div>

<br>

This chapter explains the first two items in this list:  
variables and constants of various data types

<br>
<br>

## **Data Variables and Constants**  
---  
A computer, under the guidance of a program, can do many things.  
But, to do these tasks, the program needs to work with data.  

<br>

constants and variables:
- *constants*:  
data that are preset before a program is used  
and keep their values unchanged throughout the life of the program  
- *variables*:  
data that may change or be assigned values as the program runs  
- examples of them in the Listing 3.1:  
    - `14.5833` is a constant
    - `weight` is a variable
- difference:  
    - a variable can have its value assigned or changed while the program runs  
    - a constant can't be changed or assigned a new value while the program runs

<br>
<br>

## **Data: Data-Type Keywords**  
---  
There is a distinction between different types of data.  
Some types of data are numbers; some are letters or, more generally, characters.  
The computer needs a way to identify and use these different kinds.  
C does this by recognizing several fundamental *data types*.  

<br>

The following table (Table 3.1) shows the fundamental data-type keywords recognized by C:  

|**Original K&R Keywords**|**C90 K&R Keywords**|**C99 Keywords**|
|:-|:-|:-|
|`int`|`signed`|`_Bool`|
|`long`|`void`|`_Complex`|
|`short`||`_Imaginary`|
|`unsigned`|||
|`char`|||
|`float`|||
|`double`|||  

- K&R C recognized seven keywords  
- the C90 standard added two more keywords  
- the C99 standard added three more keywords  

what the keywords are used for:  
- `int`: provides the basic class of integers  
- `long`, `short`, `unsigned`, `signed`: provide variations of the basic type  
(ex. `unsigned short int`, `long long int`, ...)   
- `char`: designates the type used for letters of the alphabet and other characters  
(ex. `#`, `$`, `%`, ...)  
    - can also be used to represent small integers  
- `float`, `double`, `long double`: used to represent numbers with decimal points  
- `_Bool`: for Boolean values  
(it holds 1 or 0; `<stdbool.h>` names these `true` and `false`)  
- `_Complex`, `_Imaginary`: represent complex and imaginary numbers, respectively  
- the types created with these keywords can be divided into two families, based on the way the computer stores them:  
    - *integer* types 
    - *floating-point* types 

<br>

> #### Bits, Bytes, and Words  
> *bit*, *byte*, *word*:  
> - used to describe units of computer data  
> - also used to describe units of computer memory  
> (we concentrate on this usage here)  
> - *bit*:  
> the smallest unit of memory  
>   - can hold one of two values: 0 or 1  
>   - you can say that the bit is set to "off" or "on"
>   - it is the basic building block of computer memory
> - *byte*:  
> the usual unit of computer memory  
>   - 1 byte = 8 bits  
> $\rightarrow$ this is the standard definition, at least when used to measure storage
>   - there are $2^8 = 256$ possible bit patterns  
> $\because$ each of the 8 bits can be either 0 or 1  
> $\rightarrow$ a byte can represent the integers from 0 to 255 or a set of characters  
> - *word*:  
> the natural unit of memory for a given computer design  
>   - ex) for early 8-bit microcomputers, a word is just 8 bits  
>   - at present, a word on personal computers is usually 64 bits  
>   - larger word sizes enable faster transfer of data and allow more memory to be accessed

<br>
<br>

### Integer Versus Floating-Point Types  
---  
You don't need to know the details of how the computer stores or manipulates data.  
However, knowing a little about it can occasionally help.  


For a human, the difference between integers and floating-point numbers is reflected in the way they can be written.  
For a computer, the difference is reflected in the way they are stored.  
Let's look at each of the two classes in turn.

<br>

### The Integer  
*integer*:  
- a number with no fractional part  
- in C, an integer is never written with a decimal point  
- ex) integers: 2, -23, 2456 / not integers: 3.14, 0.22, 2.000
- stored as binary numbers  
    - ex) the integer 7 $\rightarrow$ 111 in binary  
    $\therefore$ stored in an 8-bit byte as 00000111  
    $\because$ $111_2 = 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 4 + 2 + 1 = 7$  

<br>  

### The Floating-Point Number  
*floating-point*:  
- a number that more or less corresponds to what mathematicians call a real number  
- real numbers include the numbers between the integers  
- ex) 2.75, 3.16E7, 7.00, 2e-8, ...  
- notice that adding a decimal point makes a value a floating-point value  
    $\rightarrow$ 7: integer type, 7.00: floating-point type
- there are other ways to write a floating-point number besides using a decimal point  
    - e-notation:  
    ex) 3.16E7 = $3.16 \times 10^7$  
- storing floating-point numbers is more complex than storing integers  
    - floating-point representation: breaking up a number into a fractional part and an exponent part  
    $\rightarrow$ storing the parts separately  
    - example for storing the floating-point:  
    let's store 7.00  
        - 7.00 = 0.7E1 = $0.7 \times 10^1$  
        $\rightarrow$ 0.7: fractional part, 1: exponent part  
        - a computer would use binary numbers and powers of 2 instead of powers of 10 for internal storage  
- the differences between the integer and floating-point:  
    - an integer has no fractional part;  
    a floating-point number can have a fractional part  
    - the range of values represented by:  
    floating-point numbers $\gt$ integer numbers  
    - for some arithmetic operations, floating-point numbers are subject to greater loss of precision  
    - computer floating-point numbers can't represent all real numbers  
    $\because$ there is an infinite number of real numbers in any range  
    $\rightarrow$ floating-point values are often approximations of a true value  
    - floating-point operations were once much slower than integer operations

<br>
<br>

## **Basic C Data Types**  
---  
Let's look at the specifics of the basic data types used by C.  


For each type, we describe:  
- how to declare a variable 
- how to represent a constant with a literal value  
- what a typical use would be.  

<br>
<br>

### The `int` Type  
---  
integer types in C:
- C offers many integer types
$\because$ C gives the programmer the option of matching a type to a particular use
- the C integer types vary in:
    - the range of values offered  
    - whether negative numbers can be used  

<br>

`int` type:  
- the basic choice  
- signed integer type  
$\rightarrow$ it must be an integer and it can be positive or negative or zero  
- the range in possible values depends on the computer system  
    - historically, `int` used one machine word for storage (16 bits on older IBM PC compatibles)  
    $\rightarrow$ current personal computers typically use a 32-bit `int`, even on machines with 64-bit words  
    - ISO C specifies that the minimum range of `int` should be -32767 to 32767

<br>

##### Declaring an `int` Variable  
- keyword `int` is used to declare the basic integer variable  
- ex) `int num;`  
- to declare more than one variable,  
    - you can declare each variable separately  
    ex) `int num1; int num2; int num3;`
    - you can follow the `int` with a list of names separated by commas  
    ex) `int num1, num2, num3;`
- a declaration does two things:  
    - it associates a name with each variable  
    - it arranges storage space for `int`-sized variables  

<br>

##### Initializing a Variable  
A declaration alone doesn't supply a value to a variable.  
There are 3 ways to supply values to variables:  
- assignment  
ex) `num = 5;`  
- use `scanf()` function  
ex) `scanf("%d", &num);`
- initialization  
    - it means to assign it a starting, or initial, value  
    - in C, initialization can be done as part of declaration  
    - ex) `int num = 5;`, `int num1 = 5, num2 = 10;`, `int num1, num2 = 10;`  
    - in the last example, only `num2` is initialized to 10  
    but it can look as if `num1` is also initialized to 10  
    $\therefore$ it is best to avoid putting initialized and noninitialized variables in the same declaration  

<br>

##### Type `int` Constants  
- *integer constants*: 1, 2, 21, 32, 14, 94, ...  
- it is also called *integer literals*  
- a number without:  
    - a decimal point  
    - an exponent  
    $\rightarrow$ C recognizes it as an integer  
- C treats most integer constants as type `int`  
(but, very large integers can be treated differently)  

<br>  

##### Printing `int` Values  
- use the `printf()` function to print `int` types  
- use the `%d` notation to indicate just where in a line the integer is to be printed  
    - `%d` is called a *format specifier*  
    $\because$ it indicates the form that `printf()` uses to display a value  
    - every `%d` in the format string must be matched by a corresponding `int` value in the list of items to be printed  
    - the `int` value can be:  
        - `int` variable  
        - `int` constant  
        - an expression having an `int` value  
    - you have to make sure the number of format specifiers matches the number of values  

<br>

##### Octal and Hexadecimal  
Normally, C assumes that integer constants are decimal, or base 10, numbers.  
However, octal (base 8) and hexadecimal (base 16) numbers are popular with many programmers.  


octal and hexadecimal:  
- 8 and 16 are powers of 2  
$\rightarrow$ these number systems occasionally offer a more convenient way for expressing computer-related values  
- ex) the number 65536, which often comes up with 16-bit machines, is 10000 in hexadecimal  
- each digit in a hexadecimal number corresponds to exactly 4 bits  
    - the hexadecimal digit 3 = 0011
    - the hexadecimal digit 5 = 0101  
    - the hexadecimal number 53 = 0101 0011  
    - the hexadecimal number 35 = 0011 0101  
    $\rightarrow$ this correspondence makes it easy to go back and forth between hexadecimal and binary notation  
- in C, special prefixes indicate which number base you are using  
    - 0x, 0X: hexadecimal  
    (ex. 16 is written as 0x10 or 0X10)
    - 0: octal  
    (ex. 16 is written as 020)  
- using different number systems is optional  
    - it doesn't affect how the computer stores the value  
    - values are stored in the same way (binary)

<br>
<br>

##### Displaying Octal and Hexadecimal  
C enables you to display a number in any of the three number systems: decimal, octal, and hexadecimal.  
- decimal: `%d`
- octal: `%o`
- hexadecimal: `%x`  
- you can display the C prefixes by using:  
    - `%#o`: octal  
    - `%#x`: hexadecimal(0x)
    - `%#X`: hexadecimal(0X)  

<br>

Listing 3.3 shows a short example of how to display numbers in octal and hexadecimal.

```c
// Listing 3.3 The bases.c Program

/* bases.c -- prints 100 in decimal, octal, and hex */

#include <stdio.h>

int main(void){
    int x = 100;

    printf("dec = %d; octal = %o; hex = %x\n", x, x, x);
    printf("dec = %d; octal = %#o; hex = %#x\n", x, x, x);

    return 0;
}
```

Output:

```text
dec = 100; octal = 144; hex = 64
dec = 100; octal = 0144; hex = 0x64
```


You can see the same value displayed in 3 different number systems.  

<br>
<br>

### Other Integer Types  
---  
When you are just learning the language, the `int` type will probably meet most of your integer needs.  
However, we'll cover the other forms.  


C offers three adjective keywords to modify the basic integer type:  
`short`, `long`, and `unsigned`  
- `short`  
    - `short int` or `short` (briefly)  
    - may use less storage than `int`  
    $\rightarrow$ saving space when only small numbers are needed  
    - signed type  
- `long`  
    - `long int` or `long` (briefly)  
    - may use more storage than `int`  
    $\rightarrow$ it enables you to express larger integer values  
    - signed type  
- `long long`  
    - `long long int` or `long long` (briefly)  
    - may use more storage than `long`  
    $\rightarrow$ it must use at least 64 bits  
    - signed type  
- `unsigned`  
    - `unsigned int` or `unsigned` (briefly)  
    - used for variables that have only nonnegative values  
    - shifts the range of numbers that can be stored  
    (ex. a 16-bit `unsigned int` allows a range from 0 to 65535 instead of the signed range -32768 to 32767)  
    - the bit used to indicate the sign of signed numbers becomes another binary digit  
    $\therefore$ allows a larger range of positive numbers  
- `unsigned long int` (`unsigned long`), `unsigned short int` (`unsigned short`), `unsigned long long int` (`unsigned long long`)  
- `signed`  
    - used with any of the signed types to make your intent explicit  

<br>
<br>

##### Declaring Other Integer Types  
Other integer types are declared in the same manner as the `int` type.  
The following list shows several examples:  
- `long int estine;`
- `long johns;` 
- `short int erns;`
- `short ribs;`
- `unsigned int s_count;`
- `unsigned players;`
- `unsigned long headcount;`
- `unsigned short yesvotes;`
- `long long ago;`

<br>
<br>

##### Why Multiple Integer Types?  
C guarantees only the relative sizes of the integer types (`short` is no longer than `int`, and `long` is no shorter than `int`),  
because the idea is to fit the types to the machine.  
The most common practice today on personal computers is to set up:
- `long long` $\rightarrow$ 64 bits
- `long` $\rightarrow$ 32 bits (64-bit Linux and macOS systems typically use 64 bits)
- `short` $\rightarrow$ 16 bits
- `int` $\rightarrow$ 16 or 32 bits  

$\rightarrow$ The exact choice depends on the machine's natural word size.  
In principle, these types could represent four distinct sizes,  
but in practice at least some of the types normally overlap.  

<br>

The C standard provides guidelines specifying the minimum allowable size for each basic data type:  
- both `short` and `int`:  
-32,767 ~ 32,767  
(corresponding to 16 bit unit)
- `long`:  
-2,147,483,647 ~ 2,147,483,647  
(corresponding to 32 bit unit)  
- `long long`:  
-9,223,372,036,854,775,807 ~ 9,223,372,036,854,775,807  
(corresponding to 64 bit unit)  
- both `unsigned short` and `unsigned int`:  
0 ~ 65,535  
- `unsigned long`:  
0 ~ 4,294,967,295  
- `unsigned long long`:  
0 ~ 18,446,744,073,709,551,615

<br>

use of various integer types:  
- `unsigned` types:  
    - use for counting  
        - when we count things, we don't need negative numbers  
        - we can reach higher positive numbers than signed types  
- `long` type:  
    - use for treating large ranges of numbers  
    (which can't be handled by `int` type)  
    - on systems for which `long` is larger than `int`,  
    $\rightarrow$ using `long` can slow down the calculations  
    $\therefore$ do not use `long` if it is not essential  
    - on systems for which `long` is the same size as `int`,  
    if you do need 32-bit integers,  
    $\rightarrow$ use `long` rather than `int`, so the program still works if moved to a system with 16-bit `int`  
- `short` type:  
    - use for saving storage space  
    - saving storage space is important only if your program uses arrays of integers  
    that are large in relation to a system's available memory
    - `short` may correspond in size to hardware registers used by particular components in a computer

<br>

> #### Integer Overflow  
> What happens if an integer tries to get too big for its type?  
>  
> 
> Let's find out with an example:  
> - assign the largest possible `int` value to an `int` variable
> - assign the largest possible `unsigned int` value to an `unsigned int` variable
> - add 1 and 2 to each variable
> - print the results



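```c
/* toobig.c - exceeds maximum int size on our system */

#include <stdio.h>

int main(void){
    int i = 2147483647;             /* largest value of a 32-bit int */
    unsigned int j = 4294967295;    /* largest value of a 32-bit unsigned int */

    /* i + 1 and i + 2 overflow a signed int: undefined behavior;
       the wraparound shown below is typical but not guaranteed */
    printf("%d %d %d\n", i, i + 1, i + 2);
    /* unsigned arithmetic is defined to wrap around to 0 */
    printf("%u %u %u\n", j, j + 1, j + 2);

    return 0;
}
```

Output:

```text
2147483647 -2147483648 -2147483647
4294967295 0 1
```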


> example explanation:  
> - `unsigned int` variable `j`:  
>   when it reaches its maximum value,  
>   it starts over at the beginning, which for an unsigned type is 0  
> - `int` variable `i`:  
>   when it reaches its maximum value,  
>   it starts over at the beginning, which here is -2147483648  
> - you are not informed that the variables have exceeded their maximum values  
>  
>  
> The behavior described here for unsigned types is mandated by the rules of C.  
> The standard does not define how signed types should behave on overflow.  
> The behavior shown here is typical, but you could encounter something different.  

<br>

##### `long` Constants and `long long` Constants  
use of `long` and `long long` constants:  
- decimal:
    - normally, the compiler uses the `int` type for integer constants  
    - if you use a constant that is bigger than the largest `int` value,  
    $\rightarrow$ the compiler treats it as a `long` constant  
    - if the number is larger than the `long` maximum,  
    $\rightarrow$ the compiler treats it as a `long long` constant  
    - under the C99 rules, an unsuffixed decimal constant is never given an unsigned type  
    (C90 compilers used `unsigned long` for values too large for `long`)  
- octal and hexadecimal:  
    - octal and hexadecimal constants are treated as `int`  
    unless the value is too large  
    - then the compiler tries, in order:
        - `unsigned int`
        - `long`
        - `unsigned long`
        - `long long`
        - `unsigned long long`
- sometimes, you might want the compiler to store a small number as a `long` integer:  
    - you can cause a small constant to be treated as `long` by appending an `l` or `L` as a suffix
        - ex) `7` $\rightarrow$ `7l` or `7L`
    - the `l` and `L` suffixes can also be used with octal and hexadecimal constants  
        - ex) `020`, `0x10` $\rightarrow$ `020l`, `0x10L`
- you can also use the `long long` type to store a small number  
    - use the `ll` or `LL` suffixes  
        - ex) `7` $\rightarrow$ `7ll` or `7LL`
    - add a `u` or `U` suffix as well to get the `unsigned long long` type  
        - ex) `7` $\rightarrow$ `7ull` or `7ULL` or `7llu` or `7LLU`

<br>

##### Printing `short`, `long`, `long long`, and `unsigned` Types  
To print various types of integers, you can use the following notations: 
- `short` number in decimal: `%hd`
- `unsigned int` number in decimal: `%u`
- `long` number in decimal: `%ld`
    - if `int` and `long` are the same size on the system,  
    just `%d` will suffice  
    - but, it will not work properly when transferred to a system on which the two types are different  
    $\rightarrow$ use the `%ld` specifier for `long`
- `long long` number in decimal: `%lld`
- you can combine these modifiers with `o` or `x` instead of `d` for octal and hexadecimal output:
    - octal: `o` (ex. `%lo`)
    - hexadecimal: `x` (ex. `%lx`)
    - although C allows both uppercase and lowercase letters for constant suffixes,  
    these format specifiers use just lowercase

<br>

some examples:  
- print octal and hexadecimal `short` numbers:  
`%ho`, `%hx`
- print `unsigned long` numbers:  
`%lu`
- print octal and hexadecimal `long long` numbers:  
`%llo`, `%llx`

<br>

Listing 3.4 provides an example of formatting.

```c
// Listing 3.4 The print2.c Program

/* print2.c - more printf() properties */

#include <stdio.h>

int main(void){
    unsigned int un = 3000000000;   /* system with 32-bit int */
    short end = 200;    /* and 16-bit short */
    long big = 65537;
    long long verybig = 12345678908642;

    printf("un = %u and not %d\n", un, un);
    printf("end = %hd and %d\n", end, end);
    printf("big = %ld and not %hd\n", big, big);
    printf("verybig = %lld and not %ld\n", verybig, verybig);

    return 0;
}
```

Compiler warnings:

```text
/tmp/tmp2jk07ghd.c: In function ‘main’:
   15 |     printf("big = %ld and not %hd\n", big, big);
      |                               ~~^          ~~~
      |                                 |          |
      |                                 int        long int
      |                               %ld
   16 |     printf("verybig = %lld and not %ld\n", verybig, verybig);
      |                                    ~~^              ~~~~~~~
      |                                      |              |
      |                                      long int       long long int
      |                                    %lld
```

Output:

```text
un = 3000000000 and not -1294967296
end = 200 and 200
big = 65537 and not 1
verybig = 12345678908642 and not 12345678908642
```


example explanation:  
- points out that using the wrong specification can produce unexpected results  
- note that using the `%d` for the unsigned variable `un` produces a negative number  
    - $\because$ the unsigned value `3000000000` and the signed value `-1294967296` have exactly the same internal representation in memory  
    - it shows up with values larger than the maximum signed value  
    - if the number were smaller, such as 96, it would be stored and displayed the same way for both signed and unsigned types
- note that the `short` variable `end` is displayed the same whether you use `%hd` or `%d`  
    - $\because$ C automatically expands a `short` value to an `int` value when it is passed as an argument to a function such as `printf()`
    - why does this conversion take place?:  
    `int` type is intended to be the integer size that the computer handles most efficiently
    - what is the use of the `h` modifier?:  
    you can show how a longer integer would look if truncated to the size of `short`
- the 3rd output line illustrates the use of the `h` modifier  
    - the value `65537` expressed in binary format as a 32-bit number:  
    00000000 00000001 00000000 00000001  
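    - using `%hd` persuaded `printf()` to look at just the last 16 bits  
    $\therefore$ it displayed the value as `1`
- similarly, the final output line shows the use of the `%ld` specifier
    - it shows the full value of `verybig` and then, on a system with a 32-bit `long`, only the value stored in the last 32 bits (1942899938); on the 64-bit system used for this run, `long` is also 64 bits, so both conversions print `12345678908642`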
    
<br>

In the section on printing `int` values, you saw that it is your responsibility to make sure the number of specifiers matches the number of values to be displayed.  
Here you see that it is also your responsibility to use the correct specifier for the type of value to be displayed.  

