In [None]:
// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;

<a id="notebook_id"></a>
# Strings

A Java `String` instance represents a sequence of characters.

## String literals

The `String` class is unusual in that it is the only class in Java that has literals. A Java string literal is a sequence of characters enclosed by double quotes; for example:

In [None]:
String greeting = "Hello, world!";
System.out.println(greeting);

Observe that the double quotes that delimit a string literal are *not* part of the sequence of characters in the string.

Because the double quote character delimits a string literal, you need to use the escape sequence `\"` if you want a double quote inside a string literal. For example, to print a quotation from the Pratchett and Gaiman book *Good Omens* we could write the following:

In [None]:
String quote = "\"The kraken stirs. And ten billion sushi dinners cry out for vengeance.\"";
System.out.println(quote);

The empty string is the string having no characters. It can be written as the string literal `""` (two double quotes with nothing in between).
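
As a quick check of the two points above, here is a minimal sketch (the class and method names are hypothetical). It shows that each escape sequence `\"` contributes exactly one character to a string, and that the empty string has no characters. It uses the `length` method, which is described later in this notebook.

```java
class LiteralDemo {
    // The characters of this string are: " h i "
    static String quoted() {
        return "\"hi\"";
    }

    // The empty string: a string with no characters
    static String empty() {
        return "";
    }

    public static void main(String[] args) {
        System.out.println(quoted());           // prints "hi" including the quotes
        System.out.println(quoted().length());  // 4: each \" counts as one character
        System.out.println(empty().length());   // 0
    }
}
```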

### Exercises

1. Java string literals are unusual in another fashion. Consider the following code:
```java
String s = "hello";
String t = "hello";
```
How many `String` objects do you think are in memory? Write some code in the following cell to prove or disprove your answer. *Hint: Use the `==` operator.*

In [None]:
// Exercise 1


2. Sketch the memory diagram for the code in Exercise 1.

3. Write some code in the following cell that generates the following output:
    ```
    "She said, "He said, "Lucinda said, "Enough already.""""
    ```

In [None]:
// Exercise 3


4. Write some code in the following cell that generates the following output:
    ```
    That is my sister's university's presidents' son's cousin's bicycle.
    ```

In [None]:
// Exercise 4


## Immutability

Java strings are immutable; once a string is created the sequence of characters in the string cannot be changed. Consider the statement:

```java
String greeting = "Hello, world!";
```

Because the statement contains the literal `"Hello, world!"`, Java creates a `String` object containing the character sequence `Hello, world!` (but see [the Interning](#interning) section at the end of this notebook). This `String` object is immutable.

The variable `greeting` does not store a `String` object; it stores a reference to a `String` object (see the []() notebook). We can assign a different reference to the variable `greeting`:

In [None]:
String greeting = "Hello, world!";
System.out.println(greeting);

greeting = "Bonjour le monde";
System.out.println(greeting);

In the above example, the object corresponding to the literal `"Hello, world!"` has not changed; instead a second `String` object is created containing the sequence `Bonjour le monde` and the variable `greeting` now refers to this new object. 
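
One way to convince yourself that reassignment does not modify the original object is to keep a second reference to it. The following minimal sketch (class and method names are hypothetical) does exactly that: after `greeting` is reassigned, the second reference still sees the original, unchanged string.

```java
class ImmutabilityDemo {
    // Returns {the object greeting first referred to, the object it refers to after reassignment}
    static String[] reassign() {
        String greeting = "Hello, world!";
        String original = greeting;      // a second reference to the same String object
        greeting = "Bonjour le monde";   // greeting now refers to a different object
        return new String[] {original, greeting};
    }

    public static void main(String[] args) {
        String[] r = reassign();
        System.out.println(r[0]);          // Hello, world!
        System.out.println(r[1]);          // Bonjour le monde
        System.out.println(r[0] == r[1]);  // false: two different String objects
    }
}
```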

### Exercise

5. Sketch the memory diagram for the example above.

## Concatenation

Another unusual feature of the `String` class is that it is the only class in Java that has its own operator. The `+` operator is the *string concatenation* operator. The concatenation of two strings is the new string containing the characters of the two strings joined end to end. For example, the concatenation of the strings `"hair"` and `"ball"` is the string `"hairball"`.

For two `String` references `s` and `t`, the expression `s + t` is a reference to the `String` object that is formed by the concatenation of the strings referenced by `s` and `t`; for example:

In [None]:
String s = "hair";
String t = "ball";
String concat = s + t;
System.out.println(concat);

If only one of `s` and `t` is a `String` reference, then the non-`String` operand is converted to a `String` and then string concatenation is performed. For example, a string and an `int` can be concatenated like so:

In [None]:
// string + int
int x = 2;
String s = "the value of x is: ";
String msg = s + x;
System.out.println(msg);


A list and a string can be concatenated like so:

In [None]:
%classpath add jar ../resources/jar/notes.jar

import java.util.List;
import ca.queensu.cs.cisc124.notes.util.Utils;

List<Character> t = Utils.listOf('1', '2', '3', '!');
String s = " I\'m living here on the third speck from the sun";
String z = t + s;
System.out.println(z);
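
In the cell above, the list is not a `String`, so it is converted to a string by calling its `toString` method before the concatenation is performed. If the course's `notes.jar` (which provides `Utils.listOf`) is not available, here is a minimal sketch of the same idea using the standard `List.of` factory method instead; this assumes Java 9 or later, and the class name is hypothetical.

```java
import java.util.List;

class ListConcatDemo {
    static String concat() {
        List<Character> t = List.of('1', '2', '3', '!');
        String s = " I'm living here on the third speck from the sun";
        // t is not a String, so t.toString() supplies its characters
        return t + s;
    }

    public static void main(String[] args) {
        System.out.println(concat());
    }
}
```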

Be careful when concatenating more than two values. The `+` operator is left-associative, which means that the expression `a + b + c` is evaluated as `(a + b) + c`. Run the following cell

In [None]:
System.out.println(1 + 2 + " fiddlers");

and observe the difference compared to running the following cell:

In [None]:
System.out.println("fiddlers " + 1 + 2);

To ensure that string concatenation is performed for every `+`, use the empty string as the first operand:

In [None]:
String s = "" + 1 + false + 3.1415 + " this string is gibberish";
System.out.println(s);
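
Parentheses can also be used to control the order of evaluation. The minimal sketch below (class and method names are hypothetical) puts the three cases side by side: numbers first, string first, and string first with the arithmetic grouped in parentheses so that it happens before any concatenation.

```java
class ConcatOrderDemo {
    static String numbersFirst()  { return 1 + 2 + " fiddlers"; }    // (1 + 2) + " fiddlers"
    static String stringFirst()   { return "fiddlers " + 1 + 2; }    // ("fiddlers " + 1) + 2
    static String parenthesized() { return "fiddlers " + (1 + 2); }  // integer addition happens first

    public static void main(String[] args) {
        System.out.println(numbersFirst());   // 3 fiddlers
        System.out.println(stringFirst());    // fiddlers 12
        System.out.println(parenthesized());  // fiddlers 3
    }
}
```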

### Exercises

6. In the following cell use string concatenation of the variables initialized for you to print the string `"CISC124 Fall 2020"`.

In [None]:
// Exercise 6
String dept = "CISC";
String num = "124";
String term = "Fall";
String year = "2020";


7. What does the following print?
```java
System.out.print("M" + "a");     // print() does not go to the next line
System.out.println('M' + 'a');
```
Explain the output. [From 'Java Puzzlers' by Bloch and Gafter]

In [None]:
// Exercise 7


8. What does the following print?
```java
System.out.println("2 + 2 = " + 2 + 2);
```
Explain the output. [From 'Java Puzzlers' by Bloch and Gafter]

In [None]:
// Exercise 8


9. What does the following print?
```java
String pig = "length: 10";
String dog = "length: " + 10;
System.out.println("Animals are equal: " + pig == dog);
```
Explain the output. [Adapted from 'Java Puzzlers' by Bloch and Gafter]

In [None]:
// Exercise 9


## Constructors and methods

`String` instances, including those created from string literals, are objects; like all objects they have methods, and the `String` class provides constructors for creating new instances.

The `String` class has a [fairly large API](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html). It is not practical or particularly enlightening to demonstrate every method. Instead, this notebook focuses on some of the more commonly used methods.

### Constructors

The `String` class is unusual in that many programmers will never call a `String` constructor in their careers. This is because `String` is the only class in the Java language that has literals (as described earlier in the notebook), the `String` class has a number of methods that return `String` references, and many (most?) other classes also have methods that return `String` references.
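
The minimal sketch below (the class name is hypothetical) shows several common ways to obtain a `String` without calling a constructor, next to one of the constructors that does exist, `String(char[])`.

```java
class MakingStrings {
    static String[] examples() {
        String fromLiteral = "abc";                                       // a string literal
        String fromValueOf = String.valueOf(42);                          // a String method returning a String
        String fromToString = Integer.toString(42);                       // another class's method returning a String
        String fromConstructor = new String(new char[] {'a', 'b', 'c'});  // a String constructor
        return new String[] {fromLiteral, fromValueOf, fromToString, fromConstructor};
    }

    public static void main(String[] args) {
        for (String s : examples()) {
            System.out.println(s);
        }
    }
}
```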

### Methods

#### `equals`

The `equals(Object obj)` method tests whether this string is equal to another object; it returns `true` if and only if `obj` is a reference to a `String` object having exactly the same sequence of characters as this string. For example:

In [None]:
String t = "tee";
String b = "bee";
System.out.println("\"" + t + "\"" + ".equals(\"" + b + "\") : " + t.equals(b));

String alsoT = new String("tee");
System.out.println("\"" + t + "\"" + ".equals(\"" + alsoT + "\") : " + t.equals(alsoT));

You should always use `equals` to test strings for equality instead of using `==`. Changing the previous example to use `==` instead of `equals` produces a possibly surprising result:

In [None]:
String t = "tee";
String b = "bee";
System.out.println("\"" + t + "\"" + " == \"" + b + "\" : " + (t == b));

String alsoT = new String("tee");
System.out.println("\"" + t + "\"" + " == \"" + alsoT + "\" : " + (t == alsoT));

The reason that `t == alsoT` is `false` is that `==` tests for equality of identity; i.e., it asks the question "are `t` and `alsoT` the same object?" In this case, the answer is "no" because `alsoT` was initialized using the `new` operator, which always creates a new object.

`equals` is case-sensitive: `"a"` is not equal to `"A"`, for example. If you want to compare two strings for equality while ignoring case, use the `equalsIgnoreCase` method.
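
A minimal sketch contrasting the two methods (class and method names are hypothetical):

```java
class IgnoreCaseDemo {
    static boolean caseSensitive(String a, String b)   { return a.equals(b); }
    static boolean caseInsensitive(String a, String b) { return a.equalsIgnoreCase(b); }

    public static void main(String[] args) {
        System.out.println(caseSensitive("Java", "JAVA"));    // false: 'a' and 'A' differ
        System.out.println(caseInsensitive("Java", "JAVA"));  // true: only the case differs
        System.out.println(caseInsensitive("Java", "Jive"));  // false: the letters themselves differ
    }
}
```

Note that `equalsIgnoreCase` still requires the same letters in the same order; it only relaxes the comparison of uppercase and lowercase forms.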

#### `length`

The `length` method returns the number of characters in a string.

In [None]:
String s = "";
int n = s.length();
System.out.println("length of \"" + s + "\" : " + n);

s = "A";
n = s.length();
System.out.println("length of \"" + s + "\" : " + n);

s = "AT";
n = s.length();
System.out.println("length of \"" + s + "\" : " + n);

s = "ATE";
n = s.length();
System.out.println("length of \"" + s + "\" : " + n);

#### `isEmpty`

The `isEmpty` method returns `true` if a string is empty (has zero characters) and `false` otherwise. Note that a string containing only whitespace is not empty.

In [None]:
String s = "";
System.out.println(s.isEmpty());

String space = " ";
System.out.println(space.isEmpty());
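
`s.isEmpty()` is `true` exactly when `s.length()` is 0. If you also want to treat whitespace-only strings as "empty", one option is to remove leading and trailing whitespace with `trim()` first. The following is a minimal sketch of both checks (class and method names are hypothetical):

```java
class EmptyDemo {
    // Same result as s.isEmpty()
    static boolean emptyByLength(String s) {
        return s.length() == 0;
    }

    // trim() returns a copy without leading and trailing whitespace; s itself is unchanged
    static boolean emptyAfterTrim(String s) {
        return s.trim().isEmpty();
    }

    public static void main(String[] args) {
        String[] tests = {"", " ", "\t\n", "a"};
        for (String s : tests) {
            System.out.println("isEmpty: " + s.isEmpty()
                    + ", length() == 0: " + emptyByLength(s)
                    + ", trim().isEmpty(): " + emptyAfterTrim(s));
        }
    }
}
```

Since Java 11 there is also an `isBlank` method that returns `true` for strings that are empty or contain only whitespace.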

#### `charAt(int)`

A string represents a sequence of characters where each character can be read using an integer index. String indexes are zero-based; the first character in the string has index 0, the second character has index 1, and so on, up to index `length() - 1`. Run the next cell to print the indexes and corresponding characters for a string:

In [None]:
String s = "The big dwarf only jumps";
System.out.println("index\tcharacter");
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    System.out.println("" + i + "\t" + c);
}

Using an invalid index causes an exception to be thrown; try running the next two cells to see what type of exception is thrown.

In [None]:
String s = "abc";
s.charAt(-1);     // index must not be negative

In [None]:
String s = "abc";
s.charAt(3);      // index must be less than s.length()
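
Because the valid indexes run from 0 to `length() - 1`, the first and last characters of a non-empty string are at indexes `0` and `s.length() - 1`. A minimal sketch (class and method names are hypothetical) that reads them and checks an index before using it:

```java
class CharAtDemo {
    // First character of s; s must not be empty
    static char first(String s) {
        return s.charAt(0);
    }

    // Last character of s; the last valid index is length() - 1
    static char last(String s) {
        return s.charAt(s.length() - 1);
    }

    // true if i can be passed to charAt without an exception being thrown
    static boolean isValidIndex(String s, int i) {
        return i >= 0 && i < s.length();
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(first(s));            // a
        System.out.println(last(s));             // c
        System.out.println(isValidIndex(s, 3));  // false: 3 equals s.length()
    }
}
```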

#### `indexOf(char)`

The method `indexOf(char)` searches a string for a specified character. It returns the index of the first occurrence of the specified character, or `-1` if the character does not occur in the string. There is also a `lastIndexOf(char)` method if the index of the last occurrence is required.

In [None]:
import java.util.Scanner;

String s = "abcdefghijklmnopqrstuvwxyz";

Scanner scanner = new Scanner(System.in);
System.out.println("Type a character followed by the enter key: ");
char c = scanner.next().charAt(0);
int index = s.indexOf(c);
if (index != -1){
    System.out.println("" + c + " has index " + index);
}
else {
    System.out.println("" + c + " is not a lowercase English letter");
}
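
The cell above reads its input interactively. Here is a non-interactive minimal sketch (class and method names are hypothetical) that uses both `indexOf` and `lastIndexOf` on a string with repeated characters:

```java
class IndexOfDemo {
    // Describes where c first and last occurs in s, or reports that it does not occur
    static String describe(String s, char c) {
        int first = s.indexOf(c);
        if (first == -1) {
            return c + " does not occur";
        }
        int last = s.lastIndexOf(c);
        return c + " first at " + first + ", last at " + last;
    }

    public static void main(String[] args) {
        String s = "banana";
        System.out.println(describe(s, 'a'));  // a first at 1, last at 5
        System.out.println(describe(s, 'n'));  // n first at 2, last at 4
        System.out.println(describe(s, 'z'));  // z does not occur
    }
}
```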

#### `startsWith(String)` and `endsWith(String)`

The methods `startsWith(String)` and `endsWith(String)` test whether a string starts with or ends with, respectively, a specified substring. For example, you can test whether a string might be a single-line Java comment like so:

In [None]:
%classpath add jar ../resources/jar/notes.jar

import java.util.List;
import ca.queensu.cs.cisc124.notes.util.Utils;

List<String> t = Utils.listOf("// this is a Java comment",
                             "# this is a Python comment",
                             "-- this is a Haskell comment");

for (String s : t) {     // for each string s in t
    boolean isJavaComment = s.startsWith("//");
    String result = " might be ";
    if (!isJavaComment) {
        result = " is not ";
    }
    System.out.println("\"" + s + "\"" + result + "a Java comment");
} 


Similarly you can test if a string might be a multiline Java comment like so:

In [None]:
%classpath add jar ../resources/jar/notes.jar

import java.util.List;
import ca.queensu.cs.cisc124.notes.util.Utils;

List<String> t = Utils.listOf("/* this is a Java comment */",
                             "=begin this is a Ruby comment =end",
                             "--[[ this is a Lua comment ]]");

for (String s : t) {     // for each string s in t
    boolean isJavaComment = s.startsWith("/*") && s.endsWith("*/");
    String result = " might be ";
    if (!isJavaComment) {
        result = " is not ";
    }
    System.out.println("\"" + s + "\"" + result + "a Java comment");
} 
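
The cells above depend on `Utils.listOf` from the course's `notes.jar`. As an alternative that needs no extra jar, the minimal sketch below uses a plain array; the class and method names are hypothetical, and the checks are deliberately rough.

```java
class PrefixSuffixDemo {
    // Rough check: Java source file names end with ".java"
    static boolean looksLikeJavaSource(String fileName) {
        return fileName.endsWith(".java");
    }

    // Rough check: a single-line Java comment starts with "//"
    static boolean looksLikeLineComment(String line) {
        return line.startsWith("//");
    }

    public static void main(String[] args) {
        String[] names = {"Main.java", "notes.txt", "Main.java.bak"};
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeJavaSource(name));
        }
    }
}
```

Both methods compare characters exactly, so a comment line with leading spaces such as `"  // hi"` fails the `startsWith("//")` test unless the spaces are removed first (for example with `trim()`).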

#### `toUpperCase()` and `toLowerCase()`

The `toUpperCase` and `toLowerCase` methods return uppercase and lowercase copies of a string, respectively; they *do not* change the case of the string that was used to call the method (because strings are immutable).

Run the following cell to see a string, its uppercase version, and its lowercase version.

In [None]:
String s = "aBcDeFgHijKlMnOpQrStUvWxYz";
String up = s.toUpperCase();
String low = s.toLowerCase();
System.out.println("original : " + s);
System.out.println("uppercase: " + up);
System.out.println("lowercase: " + low);
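
Because these methods return new strings, a variable only "changes case" if you assign the returned reference back to it. A minimal sketch (class and method names are hypothetical):

```java
class CaseDemo {
    // Returns an uppercase copy; the caller's string is not modified
    static String upper(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String s = "Hello";
        String up = upper(s);
        System.out.println(s);    // Hello: the original is unchanged
        System.out.println(up);   // HELLO
        s = s.toUpperCase();      // s now refers to the new uppercase string
        System.out.println(s);    // HELLO
    }
}
```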

#### `substring(int)` and `substring(int, int)`

The `substring` methods return a string equal to a selected part of this string.

The `substring(int beginIndex)` method returns a string formed from the characters of this string starting from `beginIndex` and going to the end of this string. `beginIndex` must be non-negative and less than *or equal to* the length of the string. This is a little unusual because `s.length()` is a valid index for the method *but is not* a valid index for `charAt(int)`.

Run the following cell to see what `substring(int beginIndex)` returns for all acceptable values of `beginIndex` for the string `"happy"`:

In [None]:
String s = "happy";
for (int i = 0; i <= s.length(); i++) {
    String t = s.substring(i);
    System.out.printf("\"%s\".substring(%d) returns \"%s\"%n", s, i, t);
}

The `substring(int beginIndex, int endIndex)` method returns a string formed from the characters of this string starting at index `beginIndex` and going up to index `endIndex - 1`; notice that the character at index `endIndex` is *not* included in the returned string. One reason for excluding the character at index `endIndex` is that the returned string then has exactly `endIndex - beginIndex` characters. Another is that an expression of the form `s.substring(i, s.length() - i)` returns the string equal to `s` with the first `i` and last `i` characters removed; for example:

In [None]:
String s = "kayak";
for (int i = 0; i < 3; i++) {
    String t = s.substring(i, s.length() - i);
    System.out.printf("\"%s\".substring(%d, %d) returns \"%s\"%n", s, i, s.length() - i, t);
}

For `substring(int beginIndex)` it is an error if `beginIndex` is negative or greater than the length of the string.

For `substring(int beginIndex, int endIndex)` it is an error if `beginIndex` is negative or greater than `endIndex`, or if `endIndex` is greater than the length of the string.

Try running the next three cells to see what type of exception is thrown when an invalid index is used.

In [None]:
String s = "oops";
s.substring(-1);

In [None]:
String s = "oops";
s.substring(0, s.length() + 1);

In [None]:
String s = "oops";
s.substring(3, 2);
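
The rules above fit together neatly when a string is split at an index. The minimal sketch below (class and method names are hypothetical) splits a string into the part before index `i` and the part starting at index `i`. This works for every `i` from `0` to `s.length()` inclusive, and joining the two parts gives back the original characters.

```java
class SubstringDemo {
    // Splits s into s.substring(0, i) and s.substring(i); requires 0 <= i <= s.length()
    static String[] splitAt(String s, int i) {
        return new String[] {s.substring(0, i), s.substring(i)};
    }

    public static void main(String[] args) {
        String s = "hairball";
        String[] parts = splitAt(s, 4);
        System.out.println(parts[0]);                        // hair
        System.out.println(parts[1]);                        // ball
        System.out.println(s.substring(2, 6).length());      // 4: always endIndex - beginIndex
        System.out.println((parts[0] + parts[1]).equals(s)); // true
    }
}
```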

### Exercises

1. If `s` is equal to the empty string is there any way to call the `charAt` method using `s` without causing an exception to be thrown?

2. In the cell below use the `length` method to get the length of the string `s`. Print the length of the string.

In [None]:
// Exercise 2
String s = "sparring with a purple porpoise";


3. In the cell below use the `charAt` method to get the character at index `1`. Print the character.

In [None]:
// Exercise 3
String s = "sparring with a purple porpoise";


4. In the cell below use the `indexOf` method to find the index of the first `'w'` character. Print the index.

In [None]:
// Exercise 4
String s = "sparring with a purple porpoise";


5. In the cell below use the `lastIndexOf` method to find the index of the last `'h'` character. Print the index.

In [None]:
// Exercise 5
String s = "sparring with a purple porpoise";


6. In the cell below use the `indexOf` and `lastIndexOf` methods to find the indexes of the first and last space characters. Then use the `substring` method to get the string containing all of the characters between the first and last indexes that you found (not including the characters at the first and last index). Print the substring.

In [None]:
// Exercise 6
String s = "sparring with a purple porpoise";


7. Does `s.substring(0)` return a new string or does it return a reference to the string `s`? Write some code in the following cell to prove your answer.

In [None]:
// Exercise 7


8. There is a `String` method that replaces all occurrences of a character in a string with another character, returning a new string. In the cell below use the method to replace all of the `'p'` characters with a `'t'`. Print the new string.

In [None]:
// Exercise 8
String s = "sparring with a purple porpoise";


9. There is a `String` method that replaces all occurrences of a substring in a string with another string, returning a new string. In the cell below use the method to replace all of the `"hiho"` substrings with the string `"oh, no"`. Print the new string.

In [None]:
// Exercise 9
String s = "hiho, hiho, it\'s off to work we go";


10. One way to find the number of digits in an integer is to convert the integer to its string representation and then get the length of the resulting string. You can use the `Integer.toString` method to perform the conversion or you can use string concatenation (how?). In the cell below write some code that prints the number of digits in the integer value `val`.

In [None]:
// Exercise 10
int val = 12345;   // try several different values for val


11. A path name is the full operating system name for a file including all of the directories. For example, in Windows the path name for a file named `secret.docx` might be `C:\Users\Homer Simpson\Documents\secret.docx`; written as a Java string literal, each backslash must be escaped as `\\` (as in the cell below). Use some string methods to get from a path name:
    1. just the file name (`"secret.docx"` in the example)
    2. just the file name minus the file extension (`"secret"` in the example)
    3. just the extension (`"docx"` in the example)

In [None]:
// Exercise 11
String path = "C:\\Users\\Homer Simpson\\Documents\\secret.docx";  // try different path names
