# 13. Strings and Text Processing

## Comparing Strings
---

There are many ways to compare strings, and depending on what exactly we need in the particular case, we can take advantage of the various features of the `string` class.

<br>

#### Comparison for Equality

If we want to **compare two strings for equality**, the most convenient method is `Equals(…)`, which behaves the same as the `==` operator but is called as a static method of the `string` class:

In [1]:
string.Equals("duck", "duck")

In [2]:
string.Equals("duck", "goose")

In [3]:
"duck" == "duck"

In [4]:
"duck" == "goose"

<br>

##### `.Equals()` Is Case Sensitive (By Default)

In [5]:
string.Equals(
    "even bigger", 
    "EVEN BIGGER"
)

In practice, we are often interested only in the actual text content when comparing two strings, regardless of the character casing (uppercase / lowercase).
   
To **ignore case** in string comparison, we can use the `Equals(…)` method with the parameter `StringComparison.CurrentCultureIgnoreCase`:

In [6]:
string.Equals(
    "even bigger", 
    "EVEN BIGGER", 
    StringComparison.CurrentCultureIgnoreCase
)

<br>

#### Comparison For Alphabetical Order

The `<`, `<=`, `>`, and `>=` operators work well for numeric types such as `int`, `long`, `float`, and `double`, but they are not defined for the `string` type:

In [7]:
"Apple" < "Banana"

Error: (1,1): error CS0019: Operator '<' cannot be applied to operands of type 'string' and 'string'

<br>

The `.CompareTo(…)` method from the `string` class returns a **negative value**, **zero**, or **a positive value** depending on the **lexical order of the two compared strings**.    

- A **negative value** means that the first string is lexicographically *before* the second
- **Zero** means that the two strings are *equal*
- A **positive value** means that the first string is lexicographically *after* the second
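
Because only the sign of the result is specified, it is safer to test for `< 0`, `== 0`, or `> 0` than to compare against specific values such as `-1` or `1`. The following is a minimal sketch of that idea; `Describe` is a hypothetical helper introduced here only for illustration and is not part of the `string` class:

```csharp
using System;

// Hypothetical helper (not part of .NET): turns the sign of CompareTo
// into a readable description of the relative order.
string Describe(string first, string second)
{
    int result = first.CompareTo(second);

    // Only the sign is meaningful, so we never test for exactly -1 or 1.
    if (result < 0) return $"{first} comes before {second}";
    if (result > 0) return $"{first} comes after {second}";
    return $"{first} equals {second}";
}

Console.WriteLine(Describe("Apple", "Banana"));
Console.WriteLine(Describe("Apple", "Apple"));
Console.WriteLine(Describe("Banana", "Apple"));
```

The same pattern gives us a replacement for the missing `<` operator: `first.CompareTo(second) < 0` is true exactly when `first` is ordered before `second`.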

In [8]:
// Apple is BEFORE Banana
"Apple".CompareTo( "Banana" )

In [9]:
// Apple is THE SAME THING AS Apple
"Apple".CompareTo( "Apple" )

In [10]:
// Banana is AFTER Apple
"Banana".CompareTo( "Apple" )

<br>

##### `.CompareTo()` Is Case Sensitive...But `string.Compare()` Has An Option To Ignore Case

In [11]:
// A horse is a Horse? 
// Not so fast....
"horse".CompareTo("Horse")

If we have to compare the strings lexicographically, but also **ignore the case**, then we can use either of the following overloads:
- `string.Compare(string strA, string strB, bool ignoreCase)`
- `string.Compare(string strA, string strB, StringComparison comparisonType)`, passing `StringComparison.CurrentCultureIgnoreCase`

`Compare(…)` is a static method of the `string` class, and its return value is interpreted in the same way as that of `CompareTo(…)`:

In [12]:
// A horse is a Horse? OF COURSE, OF COURSE!
string.Compare("horse", "Horse", true)

In [13]:
// It can also accept a StringComparison.CurrentCultureIgnoreCase argument,
// which works the same way as in the .Equals() method
string.Compare("horse", "Horse", StringComparison.CurrentCultureIgnoreCase)

<br>

##### Lexicographical Comparison Does Not Follow The Arrangement in the Unicode Table.

Please note that `Compare(…)` and `CompareTo(…)` perform a culture-sensitive comparison by default, under which **the small letters are lexicographically before the capital ones**:

In [14]:
// apple is BEFORE Apple
"apple".CompareTo("Apple")

The correctness of this rule is debatable, because in the Unicode table the capital letters come before the small ones. For example, in Unicode the letter `A` has code 65, which is smaller than the code of the letter `a`, which is 97.
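
When ordering by Unicode code values is what we actually need, the `string` class also provides an ordinal comparison. The sketch below contrasts the two approaches. The variable names are illustrative, and the culture-sensitive result is deliberately left without an expected value, since it depends on the culture and globalization settings of the runtime:

```csharp
using System;

// Culture-sensitive comparison, the default used by CompareTo(...).
// The result depends on the current culture settings.
int cultureResult = "apple".CompareTo("Apple");

// Ordinal comparison: compares the numeric character codes,
// so 'A' (65) is ordered before 'a' (97).
int ordinalResult = string.CompareOrdinal("apple", "Apple");

Console.WriteLine(Math.Sign(cultureResult));
Console.WriteLine(Math.Sign(ordinalResult)); // prints 1: "apple" is AFTER "Apple"
```

The same ordinal behavior is available through `string.Compare("apple", "Apple", StringComparison.Ordinal)`.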