In [None]:
// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;

<a id="notebook_id"></a>
# `char`

The primitive type `char` is used to represent single characters in Java. 

Technically, a `char` is a 16-bit value that holds one UTF-16 code unit of a [Unicode](http://www.unicode.org) character. Quoting the [Unicode FAQ](https://www.unicode.org/faq/basic_q.html):

> Unicode is the universal character encoding, maintained by the Unicode Consortium. This encoding standard provides the basis for processing, storage and interchange of text data in any language in all modern software and information technology protocols.

The Unicode standard and the use of Unicode characters are beyond the scope of this notebook. Curious readers can use the following links to learn more about Unicode:

* https://www.unicode.org/standard/WhatIsUnicode.html
* [Joel on Software](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)

This notebook restricts the discussion of characters to those that can be typed using a standard North American keyboard; i.e., upper and lowercase English letters, numeric digits, common symbols, and the small number of special literals [described in a following section](#special_literals).

## Literals

A `char` literal is a single character enclosed by single quotes; for example, `'a'` is the `char` literal for the lowercase letter *a*. 


### Special literals
<a id="special_literals"></a>

Because the single quote is used to delimit a `char` literal, a special technique is required to write the literal corresponding to a single quote. Similarly, the double quote is used to delimit a `String` literal, so it also requires special treatment. Finally, a small number of whitespace characters cannot be typed directly into a literal and also require special treatment.

The special `char` literals begin with a backslash `\` character, which is followed by a single character:

* `'\b'` backspace
* `'\t'` tab
* `'\n'` line feed
* `'\f'` form feed
* `'\r'` carriage return
* `'\"'` double quote
* `'\''` single quote
* `'\\'` backslash
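
The quote and backslash literals in the list above could be used as in the following minimal sketch. The class name `EscapeDemo` and its methods `quoted` and `path` are made up for illustration, and the code is wrapped in a class only so that it can also be compiled outside of the notebook.

```java
public class EscapeDemo {

    // Builds the text: He said "it's"
    static String quoted() {
        char doubleQuote = '\"';   // '"' without the backslash is also a legal char literal
        char singleQuote = '\'';   // here the backslash is required
        return "He said " + doubleQuote + "it" + singleQuote + "s" + doubleQuote;
    }

    // Builds the text: C:\temp
    static String path() {
        return "C:" + '\\' + "temp";   // '\\' is a single backslash character
    }

    public static void main(String[] args) {
        System.out.println(quoted());
        System.out.println(path());
        System.out.println(path().length());   // the escaped backslash counts as one character
    }
}
```

Although each special literal is written with two characters between the quotes, it represents exactly one `char` value, which is why the length printed on the last line is 7.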

The tab character `'\t'` is a whitespace character. Its effect when printed is to move to the next tab stop. On many computer systems, tab stops occur every 8 spaces, but the exact spacing is not universal.

In [None]:
System.out.println("1" + '\t' + "2" + '\t' + "3"); // concatenate '\t' with strings
System.out.println("12345678\t9");                 // or embed \t inside a string

The line feed character `'\n'` is a whitespace character and is often incorrectly referred to as the newline character. On many computer systems, its effect is to insert a new line when printed, but this is not universal.

In [None]:
System.out.println("1\n2\n3");

The form feed character `'\f'` is a whitespace character that indicates a page break. Its effect when printed depends on the system interpreting the printed text.

The carriage return character `'\r'` is a whitespace character. Its effect when printed is typically to move to the start of the current line, so that any text printed afterwards overwrites the text already on that line.

In [None]:
System.out.println("abc" + '\r' + "efg");

## `char` as integer values

`char` values are integer values between 0 and 65,535 ($2^{16} - 1$), inclusive. The limits of the range of `char` values are available from the `Character` class:

In [None]:
System.out.println(Character.MIN_VALUE);
System.out.println(Character.MAX_VALUE);

Printing the value `Character.MIN_VALUE` prints the `char` corresponding to 0, which is called the *null character* (and which, somewhat confusingly, has nothing to do with the Java value `null`). Printing the value `Character.MAX_VALUE` prints the `char` corresponding to 65,535, which is typically displayed as a square containing four `F`s.

To print the integer numeric value of a `char` value, a cast to `int` or `long` is required:

In [None]:
System.out.println((int) Character.MIN_VALUE);
System.out.println((int) Character.MAX_VALUE);

It may be useful to know that the most familiar English printable characters that can be easily typed on a typical keyboard start at the integer value of 32 (the space character) and end at the integer value of 126 (the tilde `'~'`). A simple loop can be used to print these integer values and the corresponding character:

In [None]:
for (char c = 32; c <= 126; c++) {
    System.out.println("" + ((int) c) + '\t' + c);
}

Arithmetic can be performed with `char` values. Beware, however, that Java converts `char` values to `int` values when performing arithmetic, so a cast back to `char` is often required; see [the Arithmetic notebook](./arithmetic.ipynb#byte_char_short) for details.

Subtracting `char` values can be used to find the "distance" between characters; for example, the distance between `'a'` and `'d'` can be computed as:

In [None]:
int dist = 'd' - 'a';
System.out.println(dist);

which indicates that we need to move 3 characters starting from `'a'` to get to `'d'`; i.e.:

In [None]:
char ch = (char) ('a' + 3);   // cast required because ('a' + 3) is an int value
System.out.println(ch);

English letters *of the same case* can be compared to determine their lexicographical, or dictionary, order. Run the following cell, entering letters of the same case at each prompt, and verify the results for a few different sets of inputs:

In [None]:
import java.util.Scanner;

Scanner s = new Scanner(System.in);

System.out.println("Enter a first single character followed by the enter key: ");
char c1 = s.next().charAt(0);

System.out.println("Enter a second single character followed by the enter key: ");
char c2 = s.next().charAt(0);

if (c1 < c2) {
    System.out.println("" + c1 + " is before " + c2);
}
else if (c1 > c2) {
    System.out.println("" + c1 + " is after " + c2);
}
else {
    System.out.println("" + c1 + " is equal to " + c2);
}


Now re-run the previous cell entering an uppercase `Z` at the first prompt and a lowercase `a` at the second prompt. The character `Z` has the numeric value 90 and the character `a` has the numeric value 97; thus comparing them using a comparison operator results in `Z` being less than `a`.
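
The numeric values quoted above can be checked with a cast to `int`. The following is a small sketch showing that the uppercase English letters all have smaller values than the lowercase English letters. The class name `CaseOrderDemo` and the method `isBefore` are hypothetical and chosen only for this illustration.

```java
public class CaseOrderDemo {

    // Returns true if c1 has a smaller numeric value than c2.
    static boolean isBefore(char c1, char c2) {
        return c1 < c2;
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');            // 65
        System.out.println((int) 'Z');            // 90
        System.out.println((int) 'a');            // 97
        System.out.println((int) 'z');            // 122
        System.out.println(isBefore('Z', 'a'));   // true
        System.out.println(isBefore('a', 'Z'));   // false
    }
}
```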

## Wrapper class `Character`



The wrapper class `java.lang.Character` contains fields and methods that are useful when working with character values. Many of the fields and methods are related to working with Unicode values.

For our purposes, the interesting methods tend to have names that begin with `is` (for example, `isDigit`, `isLetter`, `isLowerCase`, and `isUpperCase`). Other useful methods include `compare`, `toLowerCase`, and `toUpperCase`.
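
As a rough illustration of the methods named above, the sketch below classifies a character and compares two letters while ignoring case. The class `CharacterMethodsDemo` and its methods `describe` and `compareIgnoringCase` are invented for this example; only the `Character` methods that they call come from the Java library.

```java
public class CharacterMethodsDemo {

    // Returns a short description of the kind of character c is.
    static String describe(char c) {
        if (Character.isDigit(c)) {
            return "digit";
        }
        else if (Character.isLetter(c) && Character.isUpperCase(c)) {
            return "uppercase letter";
        }
        else if (Character.isLetter(c) && Character.isLowerCase(c)) {
            return "lowercase letter";
        }
        else {
            return "other";
        }
    }

    // Compares two characters after converting both to lowercase;
    // the result is negative, zero, or positive.
    static int compareIgnoringCase(char c1, char c2) {
        return Character.compare(Character.toLowerCase(c1), Character.toLowerCase(c2));
    }

    public static void main(String[] args) {
        System.out.println(describe('7'));                      // digit
        System.out.println(describe('Q'));                      // uppercase letter
        System.out.println(describe('q'));                      // lowercase letter
        System.out.println(describe('~'));                      // other
        System.out.println(compareIgnoringCase('Z', 'a') > 0);  // true
        System.out.println(Character.toUpperCase('g'));         // G
    }
}
```

Converting both characters to the same case before comparing them is one possible way to obtain dictionary order for English letters of mixed case.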

The `Character` class is [documented here](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Character.html).

## Exercises

1. An empty string (a string having no characters) can be written as `""` (two double quotes with no space in between). Is there such a thing as an empty `char`? Try creating an empty character in the next cell:

In [None]:
// Exercise 1
char c;       // try assigning an empty char here; note that a space char is not an empty char

2. What is the significance of the symbol printed using `System.out.println(Character.MAX_VALUE);`? Do not spend too much time trying to solve this question; the answer really is not that significant.

3. With a little bit of research you can discover how to enter Unicode literals in Java. Use the following cell to print a heart character.

In [None]:
// Exercise 3


4. The following loop was used in this notebook to print the character associated with each integer value between 32 and 126:
```java
for (char c = 32; c <= 126; c++) {
    System.out.println("" + ((int) c) + '\t' + c);
}
```
A funny thing happens if you remove the empty string just inside the left `(`. Run the following cell and try to explain the output.

In [None]:
// Exercise 4
for (char c = 32; c <= 126; c++) {
    System.out.println(((int) c) + '\t' + c);   // removed empty string ""
}

5. Use subtraction to compute the distance to the character `z` starting from the character `a`.

In [None]:
// Exercise 5


6. Use addition to compute the character `z` starting from the character `a`.

In [None]:
// Exercise 6


7. The following code results in a compilation error:
```java
char c = 'g';
c = c + 1;
```
whereas the following code compiles and runs:
```java
int c = 'g';
c += 1;
```
Explain why the first example fails to compile and why the second example does compile (and run).