# 13. Strings and Text Processing

## Manipulating Strings
---

Strings are **immutable**! 
   
Any operation that appears to modify a variable of the `string` type actually creates a **new** `string` in which the result is stored.   
   
Therefore, operations applied to a `string` leave the original untouched and return a **reference** to the result.
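
To make this concrete, here is a minimal sketch (the variable names are illustrative only) that calls `ToUpper()`, a method covered later in this section, and then checks that the original value is unchanged:

```csharp
using System;

string original = "hello";

// ToUpper() does not modify `original`; it returns a new string.
string shouted = original.ToUpper();

Console.WriteLine(original); // hello
Console.WriteLine(shouted);  // HELLO
```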

<br>

### Concatenation

Suppose we have declared the following `string` variables: 

In [13]:
string greet    = "Hello, ",
       subject  = "User!";

<br>

Gluing two `string` values together and obtaining a new one as a result is referred to as **concatenation**.   

It can be done in **three ways**: 
1. using the `string.Concat(…)` method
2. using the `+` operator
3. using the `+=` operator

In [14]:
string.Concat( greet, subject )

Hello, User!

In [15]:
greet + subject

Hello, User!

In [16]:
greet + (greet += subject)

Hello, Hello, User!

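Please note that `string` **concatenation** does not change the existing `string`,   
but rather returns a **new** `string` as a result.   
The `+=` operator is no exception: it builds a new `string` and reassigns the variable to it, which is why `greet` refers to `Hello, User!` after the last cell.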
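
The difference between changing a `string` and reassigning a variable can be seen in the following sketch. The names `greeting` and `alias` are hypothetical and chosen only for this illustration:

```csharp
using System;

string greeting = "Hello, ";
string alias    = greeting;  // both variables refer to the same string

greeting += "User!";         // builds a new string and reassigns `greeting`

Console.WriteLine($"[{greeting}]"); // [Hello, User!]
Console.WriteLine($"[{alias}]");    // [Hello, ]
```

If `+=` had modified the string in place, `alias` would have changed as well.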

<br>

### Switching to Uppercase and Lowercase Letters

Sometimes we need to **change the case** of a `string` so that all of its characters are entirely **uppercase** or **lowercase**. 
   
The **two methods** that would work best in this case are:
1. `ToLower(…)`
2. `ToUpper(…)`

<br>

Let's say we have the following `string` with wildly inconsistent case:

In [17]:
string whackyCase = "iS tHiS hARd tO rEaD?";

<br>

We can use either `ToLower(…)` or `ToUpper(…)` to normalize the inconsistent casing in our textual data:

In [18]:
whackyCase.ToUpper()

IS THIS HARD TO READ?

In [19]:
whackyCase.ToLower()


is this hard to read?
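
One place where this normalization is useful is comparing text regardless of case. The sketch below assumes two hypothetical values, `typed` and `stored`. It also shows `string.Equals` with `StringComparison.OrdinalIgnoreCase`, which compares without creating intermediate strings:

```csharp
using System;

string typed  = "YES";
string stored = "yes";

// A plain comparison is case-sensitive.
bool exactMatch = typed == stored;

// Normalizing both sides to the same case makes the comparison case-insensitive.
bool normalizedMatch = typed.ToLower() == stored.ToLower();

// Alternative that avoids allocating the lowercase copies.
bool ignoreCaseMatch = string.Equals(typed, stored, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(exactMatch);      // False
Console.WriteLine(normalizedMatch); // True
Console.WriteLine(ignoreCaseMatch); // True
```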

<br>

### Searching for a String within Another String

When working with a `string`, it is often necessary to
process only a *part* of its value.    

The $.NET\,\,Framework$ provides us with **two methods** to **search for a** `string` **within another** `string`: 
1. `IndexOf(…)`
2. `LastIndexOf(…)`

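Both methods search the text for the specified substring, but in opposite directions: `IndexOf(…)` scans forward from the beginning and returns the index of the **first** occurrence, while `LastIndexOf(…)` scans backward from the end and returns the index of the **last** occurrence. 

If the substring is found, the method **returns the index where it was found**.  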
Otherwise, it will **return** $-1$ to indicate that the **substring was not found**:   

In [20]:
// searching forwards from the beginning of the string
// for the substring "_"
"__Where's that____underscore at?_".IndexOf("_")

In [21]:
// searching forwards from the beginning of the string
// for the substring "Not Found"
"__Where's that____underscore at?_".IndexOf("Not Found")

In [22]:
// searching backwards from the end of the string
// for the substring "_"
"__Where's that____underscore at?_".LastIndexOf("_")

In [23]:
// searching backwards from the end of the string
// for the substring "Not Found"
"__Where's that____underscore at?_".LastIndexOf("Not Found")

<br>
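
As a quick check of the four searches above, the following sketch stores each result in a variable (the variable names are illustrative) and prints it. The text is 33 characters long, so the last underscore sits at index 32:

```csharp
using System;

string text = "__Where's that____underscore at?_";

int first        = text.IndexOf("_");             // first underscore, at the very start
int last         = text.LastIndexOf("_");         // last underscore, at text.Length - 1
int missing      = text.IndexOf("Not Found");     // no match when searching forwards
int missingAtEnd = text.LastIndexOf("Not Found"); // no match when searching backwards

Console.WriteLine(first);        // 0
Console.WriteLine(last);         // 32
Console.WriteLine(missing);      // -1
Console.WriteLine(missingAtEnd); // -1
```

<br>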

#### Finding All Occurrences of a Substring

Take the following quote, for example:

In [24]:
string quote = 

    "How much wood would a woodchuck chuck, "
    +
    "if a woodchuck could chuck wood?";

<br>

Let's say we wanted to see how many times the **substring** `wood` appears in the quote.     

In [25]:
string subString = "wood"; 

<br>

Our first step is to *find the first index* that **matches the substring**:

In [27]:
int matchIndex = quote.IndexOf( subString );

<br>

We can then **iterate through consecutive substring matches** using a `while` loop, as demonstrated below:

In [28]:
while( matchIndex != -1 )
{
 
    // Print the index at which the substring was found
    Console.WriteLine(
        $"{ subString } found at { matchIndex }"
    );


    // Get the subsequent index corresponding to the next 
    // match of the substring, if any exists, or a -1 otherwise
    matchIndex = quote.IndexOf( subString, matchIndex + 1 );

}

wood found at 9
wood found at 22
wood found at 44
wood found at 66
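
The loop above prints each position. To answer the original question of *how many times* the substring appears, the same loop can keep a counter instead. The sketch below wraps it in a hypothetical helper named `CountOccurrences`, which is not a built-in method and assumes a non-empty `subString`:

```csharp
using System;

// Hypothetical helper: counts the matches of `subString` in `text`
// using the same IndexOf loop. Assumes `subString` is not empty.
int CountOccurrences(string text, string subString)
{
    int count = 0;
    int matchIndex = text.IndexOf(subString);

    while (matchIndex != -1)
    {
        count++;

        // Resume the search one position after the current match.
        matchIndex = text.IndexOf(subString, matchIndex + 1);
    }

    return count;
}

string quote =
    "How much wood would a woodchuck chuck, "
    +
    "if a woodchuck could chuck wood?";

int woodCount = CountOccurrences(quote, "wood");
Console.WriteLine(woodCount); // 4
```

Because the search resumes at `matchIndex + 1`, overlapping matches are counted too; resuming at `matchIndex + subString.Length` would skip them.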


<br>

### Extracting a Portion of a String

We know how to *check whether a substring occurs in a text* and at which positions it occurs. But how can we **extract the substring itself?**  

The solution to this problem is the `Substring(…)` method.    
    
By using it, we can **extract a part of the string** (a **substring**) given a *starting position in the text* and a *length*.   
    
If the *length* is omitted, the substring runs from the starting position to the end of the string.
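
Both forms of the call can be compared side by side. This is a minimal sketch; the sample word and variable names are chosen only for illustration:

```csharp
using System;

string word = "woodchuck";

// Start at index 0 and take 4 characters.
string firstPart = word.Substring(0, 4);

// Length omitted: take everything from index 4 to the end.
string secondPart = word.Substring(4);

Console.WriteLine(firstPart);  // wood
Console.WriteLine(secondPart); // chuck
```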

<br>

Let's say we wanted to **extract the file name** for a cool pic from the following file path:

In [29]:
string filePath = "C:\\Pics\\CoolPic.jpg";

<br>

Since the `filePath` behaves like an **array of characters**, we may visualize it as follows:

<table style="background: white; color: black;">
    <thead>
        <th style="border: 1px solid black;">0</th>
        <th style="border: 1px solid black;">1</th>
        <th style="border: 1px solid black;">2</th>
        <th style="border: 1px solid black;">3</th>
        <th style="border: 1px solid black;">4</th>
        <th style="border: 1px solid black;">5</th>
        <th style="border: 1px solid black;">6</th>
        <th style="border: 1px solid black;">7</th>
        <th style="border: 1px solid black; font-style: italic;">8</th>
        <th style="border: 1px solid black; font-style: italic;">9</th>
        <th style="border: 1px solid black; font-style: italic;">10</th>
        <th style="border: 1px solid black; font-style: italic;">11</th>
        <th style="border: 1px solid black; font-style: italic;">12</th>
        <th style="border: 1px solid black; font-style: italic;">13</th>
        <th style="border: 1px solid black; font-style: italic;">14</th>
        <th style="border: 1px solid black;">15</th>
        <th style="border: 1px solid black;">16</th>
        <th style="border: 1px solid black;">17</th>
        <th style="border: 1px solid black;">18</th>
    </thead>
    <tbody>
        <tr>
            <td style="border: 1px solid black;">C</td>
            <td style="border: 1px solid black;">:</td>
            <td style="border: 1px solid black;">\</td>
            <td style="border: 1px solid black;">P</td>
            <td style="border: 1px solid black;">i</td>
            <td style="border: 1px solid black;">c</td>
            <td style="border: 1px solid black;">s</td>
            <td style="border: 1px solid black;">\</td>
            <td style="border: 1px solid black; font-weight: bold;">C</td>
            <td style="border: 1px solid black; font-weight: bold;">o</td>
            <td style="border: 1px solid black; font-weight: bold;">o</td>
            <td style="border: 1px solid black; font-weight: bold;">l</td>
            <td style="border: 1px solid black; font-weight: bold;">P</td>
            <td style="border: 1px solid black; font-weight: bold;">i</td>
            <td style="border: 1px solid black; font-weight: bold;">c</td>
            <td style="border: 1px solid black;">.</td>
            <td style="border: 1px solid black;">j</td>
            <td style="border: 1px solid black;">p</td>
            <td style="border: 1px solid black;">g</td>
        </tr>
    </tbody>
</table>

<br>

As such, to **extract the file name** from the `filePath`, we should **start at index** $8$, and then **extract the next** $7$ **characters**:

In [30]:
filePath.Substring( 8, 7 )

CoolPic
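
Hard-coding $8$ and $7$ only works for this exact path. As a sketch of how the searching methods from earlier can supply those numbers, the example below locates the last backslash and the last dot with `LastIndexOf`. It assumes the path contains a dot after the final backslash, and the variable names are illustrative:

```csharp
using System;

string filePath = "C:\\Pics\\CoolPic.jpg";

// The file name starts right after the last backslash (index 7 + 1 = 8).
int start = filePath.LastIndexOf('\\') + 1;

// The extension starts at the last dot (index 15).
int dot = filePath.LastIndexOf('.');

// The length is the distance between the two positions (15 - 8 = 7).
string fileName = filePath.Substring(start, dot - start);

Console.WriteLine(fileName); // CoolPic
```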

<br>