# 13. Strings and Text Processing

## The Anatomy of a String
---

In $C\#$, a **string** is a **sequence of characters** stored at a certain address in memory.

In $.NET\,Framework$, each character is represented by a code number from the $Unicode$ table. The $Unicode$ standard's predecessor, $ASCII$, can encode only $128$ characters in its *7-bit* table, or $256$ in the *8-bit* extended tables. Unfortunately, this often does not meet user needs: $128$ characters leave room only for *digits*, *uppercase* and *lowercase* *Latin letters*, and a few other special characters. When you have to work with text in Cyrillic or another script (e.g. Chinese or Arabic), $128$ or $256$ characters are far from sufficient.

As such, $.NET$ uses a *16-bit* code table for characters, which can hold $2^{16} = 65{,}536$ distinct values.
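The 16-bit representation can be checked directly. The following is a minimal sketch, assuming a C# notebook cell or a .NET 6+ console project with top-level statements; the variable names `latin` and `cyrillic` are illustrative and do not come from the text above.

```csharp
using System;

// Each char holds one 16-bit code number from the Unicode table.
char latin = 'A';
char cyrillic = '\u042F'; // Cyrillic capital letter "Ya", written as an escape

Console.WriteLine(sizeof(char));        // 2 (bytes, i.e. 16 bits)
Console.WriteLine((int)latin);          // 65
Console.WriteLine((int)cyrillic);       // 1071
Console.WriteLine((int)char.MaxValue);  // 65535, the largest 16-bit value
```

Casting a `char` to `int` exposes its code number, which is why a Cyrillic letter fits in a single `char` even though it lies far outside the $ASCII$ range.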

What's more, some characters are encoded using **two** 16-bit values from the $Unicode$ table (a *surrogate pair*) that together form a single character, which raises the number of representable characters to well over 100,000.
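To see a two-value character in practice, here is a small sketch that again assumes a C# notebook cell or top-level statements. The emoji U+1F600 is just one convenient example of a character outside the 16-bit range, and the variable names are hypothetical.

```csharp
using System;

// U+1F600 does not fit in 16 bits, so the string stores it as two char values.
string smiley = "\U0001F600";

int length = smiley.Length;                               // counts char values, not characters
bool isPair = char.IsSurrogatePair(smiley[0], smiley[1]); // do the two values form one character?
int codePoint = char.ConvertToUtf32(smiley, 0);           // combined code number

Console.WriteLine(length);     // 2
Console.WriteLine(isPair);     // True
Console.WriteLine(codePoint);  // 128512 (0x1F600)
```

One practical consequence is that `Length` reports the number of 16-bit values in a string, which can be larger than the number of characters a reader would see.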

<br>

### The `System.String` Class

The `System.String` class is what enables us to handle strings directly in $C\#$.

To declare strings, we will continue using the keyword `string`, which in $C\#$ is an alias for the `System.String` class from $.NET\,Framework$.
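That `string` and `System.String` name the same type can be confirmed with a short sketch like the one below (the variable names are illustrative only, and top-level statements or a notebook cell are assumed).

```csharp
using System;

// Both declarations produce a value of exactly the same type.
string viaKeyword = "Hello, C#";
System.String viaClassName = "Hello, C#";

bool sameType = typeof(string) == typeof(System.String);
string typeName = viaKeyword.GetType().FullName;

Console.WriteLine(sameType);                    // True
Console.WriteLine(typeName);                    // System.String
Console.WriteLine(viaKeyword == viaClassName);  // True (same text content)
```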

<br>

#### Declaring a `string`

In [1]:
string greeting = "Hello, C#";

Above, we have just declared the variable `greeting` of type `string`, whose content is the text phrase "Hello, C#".   
The content of the string can be pictured roughly like this:

<table style="margin: auto;">
    <thead>
        <th style="border: 1px solid black;">H</th>
        <th style="border: 1px solid black;">e</th>
        <th style="border: 1px solid black;">l</th>
        <th style="border: 1px solid black;">l</th>
        <th style="border: 1px solid black;">o</th>
        <th style="border: 1px solid black;">,</th>
        <th style="border: 1px solid black;"> </th>
        <th style="border: 1px solid black;">C</th>
        <th style="border: 1px solid black;">#</th>
    </thead>
</table>

The internal representation of the class is quite simple – an **array of characters**. 
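Because the content behaves like an array of characters, individual characters can be read by position. A minimal sketch, assuming the same `greeting` value as in the cell above (the helper variables `first` and `last` are illustrative):

```csharp
using System;

string greeting = "Hello, C#";

// Positions start at 0, exactly as with arrays.
char first = greeting[0];
char last = greeting[greeting.Length - 1];

Console.WriteLine(greeting.Length);  // 9
Console.WriteLine(first);            // H
Console.WriteLine(last);             // #

// Walk over the characters one position at a time.
for (int i = 0; i < greeting.Length; i++)
{
    Console.WriteLine($"greeting[{i}] = '{greeting[i]}'");
}
```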

<br>

##### What's the Difference Between a `string` and a `char[]`?

We *could*, alternatively, declare a variable of type `char[]`, and fill in the array’s elements character by character:   

In [2]:
char[] stringIsh = new char[]{'H','e','l','l','o',' ','C','#'};

In [3]:
stringIsh

index,value
0,H
1,e
2,l
3,l
4,o
5,
6,C
7,#


However, there are some considerable *disadvantages* to doing so:
1. *Filling in the array happens character by character, not at once*.
2. *We must know the length of the text in advance to be sure it fits in the space already allocated for the array*.
3. *The text processing is manual*.
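The contrast can be sketched with one small task, converting the text to upper case. This is only an illustration under the assumption of a notebook cell or top-level statements; names other than `stringIsh` are hypothetical.

```csharp
using System;

char[] stringIsh = new char[] { 'H', 'e', 'l', 'l', 'o', ' ', 'C', '#' };

// With a char[], the processing is manual: allocate a result array of the
// right length and convert every element in a loop.
char[] upperChars = new char[stringIsh.Length];
for (int i = 0; i < stringIsh.Length; i++)
{
    upperChars[i] = char.ToUpperInvariant(stringIsh[i]);
}

// With a string, the text is handled as a whole and the class does the work.
string text = new string(stringIsh);         // build a string from the array
string upperText = text.ToUpperInvariant();  // one call, no loop, no length bookkeeping

Console.WriteLine(new string(upperChars));   // HELLO C#
Console.WriteLine(upperText);                // HELLO C#
```

Both approaches produce the same text, but the `string` version needs no explicit length tracking or element-by-element code.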