# Strings in Ruby

The **`String`** class has many methods, including:

- **`reverse`** returns a backwards version of the string (it does not change the original string).
- **`length`** returns the number of characters (including spaces) in the string.
- **`upcase`** changes every lowercase letter to uppercase, and **`downcase`** changes every uppercase letter to lowercase.
- **`swapcase`** switches the case of every letter in the string.
- **`capitalize`** is just like `downcase`, except that it switches the first character to uppercase (if it is a letter).
- **`slice`** returns a substring of a larger string.

The methods `upcase`, `downcase`, `swapcase` and `capitalize` have corresponding methods that modify a string in place rather than creating a new one: `upcase!`, `downcase!`, `swapcase!` and `capitalize!`. If you don't need the original string, these methods avoid allocating a second copy, which matters most when the string is large.
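
As a quick illustration of the methods listed above, here is a minimal sketch. The sample strings and the variable names `greeting` and `shout` are made up for this example; `dup` is used only so the string is safe to modify in place.

```ruby
greeting = 'Hello World'   # hypothetical sample string

puts greeting.reverse      # dlroW olleH
puts greeting.length       # 11
puts greeting.upcase       # HELLO WORLD
puts greeting.downcase     # hello world
puts greeting.swapcase     # hELLO wORLD
puts greeting.capitalize   # Hello world
puts greeting.slice(0, 5)  # Hello
puts greeting              # Hello World (the original is unchanged)

# Bang variants modify the receiver in place.
shout = 'quiet please'.dup # dup gives us a copy we are free to modify
shout.upcase!
puts shout                 # QUIET PLEASE

# A bang method returns nil when there was nothing to change.
p shout.upcase!            # nil
```

Because a bang method can return `nil`, it is usually safer not to chain further calls onto its return value.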

`String` literals are sequences of characters between single or double quotation marks. The difference between the two forms is the amount of processing Ruby does on the string while constructing the literal.

In the _**single-quoted**_ case, Ruby does very little:

  - A backslash escapes another backslash, so that the second backslash is not itself interpreted as an escape character. A backslash can also escape a single quote.
  - A backslash is not special if the character that follows it is anything other than a quote or a backslash. For example, `'a\b'` and `'a\\b'` are equal.
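
The two rules above can be checked with a short sketch. The variable names are made up for this example.

```ruby
# In single quotes, only \\ and \' are treated as escapes.
single_backslash = 'a\b'
double_backslash = 'a\\b'

puts single_backslash                      # a\b
puts double_backslash                      # a\b
puts single_backslash == double_backslash  # true
puts single_backslash.length               # 3

# \n is not an escape here: it stays a backslash followed by the letter n.
puts 'line\nbreak'.length                  # 11

# \' lets a single quote appear inside a single-quoted literal.
puts 'it\'s'                               # it's
```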

In the _**double-quoted**_ case, Ruby does more work:

  1. It looks for substitutions (escape sequences that start with a backslash character, such as `\n`) and replaces each one with the character it stands for.
  2. It performs expression interpolation. Within the string, the sequence `#{expression}` is replaced by the value of `expression`.

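```ruby
# Explicit return of an interpolated string
def say_goodnight(name)
  result = "Good night, #{name}"
  return result
end
puts say_goodnight 'Satish'

# Same behavior: a method returns the value of its last expression
def say_goodnight2(name)
  "Good night, #{name}"
end
puts say_goodnight2 'Talim'
```

Output:

```
Good night, Satish
Good night, Talim
```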
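
The example above covers interpolation. The backslash substitution step can be sketched the same way, with single-quoted literals shown alongside for contrast. The variable `count` and the sample strings are made up for this example.

```ruby
# Escape substitution: \n becomes a single newline character.
puts "line\nbreak".length     # 10
puts 'line\nbreak'.length     # 11 (single quotes keep the backslash)

# \t becomes a tab character (code point 9).
puts "\t".ord                 # 9

# Interpolation accepts any expression, not just a variable name.
count = 3                     # hypothetical value
puts "#{count} items cost #{count * 2} dollars"   # 3 items cost 6 dollars

# Single quotes leave the #{...} sequence untouched.
puts '#{count} items'         # #{count} items
```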


Note that every time a string literal is used in an assignment or as a parameter, a new `String` object is created (unless the file opts in to frozen string literals with the `# frozen_string_literal: true` magic comment).
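
A minimal sketch of this behavior follows. The `greeting` method is a hypothetical helper, and the result assumes that frozen string literals are not enabled for the file.

```ruby
# Hypothetical helper: evaluates the same string literal on every call.
def greeting
  'hello'
end

first = greeting
second = greeting

puts first == second        # true  (same content)
puts first.equal?(second)   # false (two distinct objects)
```

The literal appears only once in the source, yet each evaluation produces a separate object.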

Strings are objects of class `String`. The `String` class has more than 75 standard methods.

## Table of Contents

- [Listing all methods of a class object](#Listing-all-methods-of-a-class-object)
- [Comparing two strings for equality](#Comparing-two-strings-for-equality)
- [Using `%w`](#Using-%w)
- [Character Set](#Character-Set)
- [Character Encoding](#Character-Encoding)
- [Encoding class](#Encoding-class)

## Listing all methods of a class object

- `String.methods.sort`

Shows you a list of methods that the `Class` object `String` responds to.

- `String.instance_methods.sort`

This method tells you all the instance methods that instances of `String` are endowed with.

- `String.instance_methods(false).sort`

With this method, you can view a class's instance methods without those of the class's ancestors.
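
The difference between the three calls can be sketched by checking for a few well-known method names. The choice of `:new`, `:upcase` and `:object_id` as probes is just for illustration.

```ruby
# Class-level methods: things you call on String itself.
puts String.methods.include?(:new)                  # true

# Instance methods, including those inherited from ancestors such as Object.
puts String.instance_methods.include?(:upcase)      # true
puts String.instance_methods.include?(:object_id)   # true

# Passing false restricts the list to methods defined in String itself.
own = String.instance_methods(false)
puts own.include?(:upcase)                          # true
puts own.include?(:object_id)                       # false

# A peek at the first few names; the exact list depends on the Ruby version.
puts own.sort.first(5).inspect
```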

## Comparing two strings for equality

Strings have several methods for testing equality.

- The most common one is `==`.
- Another equality-test instance method, `String#eql?`, tests two strings for identical content. For two `String` objects it returns the same result as `==`.
- A third instance method, `equal?` (inherited from `BasicObject`), tests whether two strings are the same object.

```ruby
s1 = 'Jonathan'
s2 = 'Jonathan'
s3 = s1

# == compares content
if s1 == s2
  puts 'Both Strings (s1, s2) have identical content'
else
  puts 'Both Strings (s1, s2) do not have identical content'
end

# eql? also compares content
if s1.eql?(s2)
  puts 'Both Strings (s1, s2) have identical content'
else
  puts 'Both Strings (s1, s2) do not have identical content'
end

# equal? compares object identity
if s1.equal?(s2)
  puts 'Two Strings (s1, s2) are identical objects'
else
  puts 'Two Strings (s1, s2) are not identical objects'
end

if s1.equal?(s3)
  puts 'Two Strings (s1, s3) are identical objects'
else
  puts 'Two Strings (s1, s3) are not identical objects'
end
```

Output:

```
Both Strings (s1, s2) have identical content
Both Strings (s1, s2) have identical content
Two Strings (s1, s2) are not identical objects
Two Strings (s1, s3) are identical objects
```


## Using `%w`

Sometimes creating arrays of words can be a pain, what with all the quotes and commas. Fortunately, Ruby has a shortcut: `%w` does just what we want.

```ruby
names1 = ['ann', 'richard', 'william', 'susan', 'pat']
puts names1[0].capitalize
puts names1[3].capitalize

# %w builds the same array from whitespace-separated words
names2 = %w{ ann richard william susan pat }
puts names2[0].capitalize
puts names2[3].capitalize
```

Output:

```
Ann
Susan
Ann
Susan
```


## Character Set

A character set, or more specifically a coded character set, is a set of character symbols, each of which has a unique numerical ID called the character's code point.

An example of a character set is the 128-character ASCII character set, which is mostly made up of the letters, numbers, and punctuation used in the English language. The most expansive character set in common use is the _Universal Character Set_ (UCS), as defined in the Unicode standard, which contains over 1.1 million code points.

The letter A, for example, is assigned a magic number by the Unicode Consortium, which is written like this: `U+0041`. The string `"Hello"` corresponds, in Unicode, to these five code points:

`U+0048 U+0065 U+006C U+006C U+006F`

Just a bunch of code points. Numbers, really. We haven't yet said anything about how to store this in memory. That's where encodings come in.
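
Ruby can show these numbers directly. The following is a minimal sketch using `String#codepoints`; the variable names `word` and `labels` are made up for this example.

```ruby
word = 'Hello'

# String#codepoints returns the code point of each character as an Integer.
puts word.codepoints.inspect    # [72, 101, 108, 108, 111]

# Format each code point in the conventional U+XXXX notation.
labels = word.codepoints.map { |cp| format('U+%04X', cp) }
puts labels.join(' ')           # U+0048 U+0065 U+006C U+006C U+006F

# Integer#chr goes the other way, from code point to character.
puts 0x41.chr(Encoding::UTF_8)  # A
```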

## Character Encoding

UTF-8 is one way of storing your string of Unicode code points, those magic U+ numbers, in memory using 8-bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or 4 bytes (the original UTF-8 design allowed up to 6 bytes, but the current standard, RFC 3629, stops at 4). This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII.
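
The variable-width storage can be observed with `String#bytesize`. This is a minimal sketch; the sample characters are chosen for illustration and written as `\u` escapes so the snippet does not depend on how the file itself is saved.

```ruby
# Hypothetical samples: one character from each UTF-8 length class.
samples = {
  'A'         => 'U+0041, ASCII range',
  "\u03BB"    => 'U+03BB, Greek lambda',
  "\u20AC"    => 'U+20AC, euro sign',
  "\u{1F600}" => 'U+1F600, emoji'
}

# Prints 1, 2, 3 and 4 bytes respectively.
samples.each do |char, note|
  puts "#{char.bytesize} byte(s): #{note}"
end

# Plain English text has the same bytes in ASCII and in UTF-8.
ascii_bytes = 'Hello'.encode('US-ASCII').bytes
utf8_bytes  = 'Hello'.encode('UTF-8').bytes
puts ascii_bytes == utf8_bytes   # true
```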

**It does not make sense to have a string without knowing what encoding it uses**. Thus, if you have a string, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.
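
To make this concrete, here is a minimal sketch in which the same two bytes are read under two different encodings. The variable names are made up for this example; `force_encoding` relabels the bytes without changing them.

```ruby
# Two raw bytes with no text encoding attached (ASCII-8BIT means "binary").
raw = "\xCE\xBB".b

as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_latin1 = raw.dup.force_encoding(Encoding::ISO_8859_1)

puts as_utf8.length      # 1 (one character: the Greek lambda)
puts as_latin1.length    # 2 (two characters)

# Same bytes, different text.
puts as_utf8.bytes == as_latin1.bytes                     # true
puts as_utf8 == "\u03BB"                                  # true
puts as_latin1.encode(Encoding::UTF_8) == "\u00CE\u00BB"  # true
```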

Ruby supports the idea of character encodings.

## Encoding class

Objects of class **`Encoding`** each represent a different character encoding. The **`Encoding.list`** method returns a list of the built-in encodings.
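
A minimal sketch of working with these objects follows. The variable name `encodings` is made up for this example, and the number of built-in encodings varies between Ruby versions.

```ruby
encodings = Encoding.list

puts encodings.size                               # varies by Ruby version
puts encodings.include?(Encoding::UTF_8)          # true
puts Encoding::UTF_8.name                         # UTF-8

# Encoding.find looks an encoding up by its (case-insensitive) name.
puts Encoding.find('utf-8') == Encoding::UTF_8    # true

# Every string carries one of these objects.
puts "\u03BB".encoding == Encoding::UTF_8         # true
```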

Ruby has a way of setting the encoding on a file-by-file basis using a magic comment. If the first line of a file is a comment (or the second line if the first line is a `#!` shebang line), Ruby scans it looking for the string `coding:`. If it finds it, Ruby then skips any spaces and looks for the (case-insensitive) name of an encoding. Thus, to specify that a source file is in UTF-8 encoding, you can write this:

```ruby
# coding: utf-8
```

As Ruby is just scanning for `coding:`, you could also write the following:

```ruby
# encoding: utf-8
```

**Note**: If a source file starts with the UTF-8 byte order mark, the byte sequence `\xEF\xBB\xBF`, Ruby treats the file as _utf-8_.

If nothing overrides the setting, the default encoding for source is UTF-8 (since Ruby 2.0; in Ruby 1.9 it was US-ASCII).
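
To check which source encoding is actually in effect, you can inspect the `__ENCODING__` keyword. This is a minimal sketch; the variable names are made up for this example, and the first line of output depends on how the script is run.

```ruby
# __ENCODING__ evaluates to the Encoding object for the current source file.
puts __ENCODING__

# A plain string literal takes on the source encoding.
literal = 'plain text'
puts literal.encoding == __ENCODING__   # true

# force_encoding relabels one string; the source encoding is unaffected.
relabelled = literal.dup.force_encoding(Encoding::US_ASCII)
puts relabelled.encoding                # US-ASCII
```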

```ruby
# encoding: utf-8

# λ is the Greek character Lambda here
puts 'λ'.length    # number of characters
puts 'λ'.bytesize  # number of bytes
puts 'λ'.encoding
```

Output:

```
1
2
UTF-8
```


For full details of the `String` class, see the [Ruby documentation](https://ruby-doc.org/core-3.0.0/String.html).