# 2. Character

Get ready to master strings in R! This section will teach you how to create, manipulate, and work with text data effectively.

Strings in R store text data and have the type `character`. Every string is an element of a character vector, so even a single string such as `"hello"` is a character vector of length 1. Unlike Python, R does not let you index the individual characters of a string with square brackets; instead, you extract or modify characters with functions such as `substr()`. Character positions in R start from 1, whereas Python starts from 0.

We'll learn about the following topics:

   - [2.1. Creating Strings](#Creating_Strings)
   - [2.2. Printing Strings](#Printing_Strings)
   - [2.3. Built-in String Functions](#Builtin_String_Functions)
   - [2.4. String Indexing](#String_Indexing)
   - [2.5. String Formatting](#String_Formatting)

<table>
  <thead>
    <tr>
      <th>Name</th>
      <th>Type in R</th>
      <th>Description</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Strings</td>
      <td>character</td>
      <td>Ordered sequence of characters, represented using single or double quotes.</td>
      <td>"hello"  'apple'  "I don't do that"  "2000f"</td>  
    </tr>
  </tbody>
</table>


In [1]:
class('hello')
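Because every string lives inside a character vector, it helps to keep the number of elements apart from the number of characters. The following is a minimal sketch; the vector `fruits` is a made-up example and not part of the original notebook.

```r
# Hypothetical example vector (not from the original notebook)
fruits <- c("apple", "kiwi", "banana")

class(fruits)    # "character"
length(fruits)   # 3: number of strings in the vector
nchar(fruits)    # 5 4 6: number of characters in each string

# Square brackets select whole strings, not single characters
second <- fruits[2]                 # "kiwi"
first_char <- substr(second, 1, 1)  # "k"
```

Here `length()` counts strings while `nchar()` counts characters, which is a common source of confusion for readers coming from Python's `len()`.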

<a name='Creating_Strings'></a>

## 2.1. Creating Strings:

In R, strings are used to store text data. You can create strings using either single quotes or double quotes; both produce values of type `character`. Strings are sequences of characters, but they do not support direct indexing as in Python. Instead, R provides functions to manipulate and extract characters from strings.

In [2]:
#Single Quote
'hello'

In [3]:
#phrase
'beauty is in the eye of the beholder'

In [4]:
#Double Quotes
"beauty is in the eye of the beholder"

Please note that a string that starts with `'` must also end with `'`, not `"` (and vice versa). Moreover, if your text itself contains `'`, wrap the string in `"` so that the inner quote does not end it.

In [5]:
"I'm happy"

In [6]:
'I'm happy'

ERROR: Error in parse(text = input): <text>:1:4: unexpected symbol
1: 'I'm
       ^


This error occurred because the `'` in `I'm` ended the string. R treats the second `'` as the closing quote, so the rest of the line can no longer be parsed.
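If you prefer to keep the same kind of quote on the outside, R also lets you escape a quote character inside the string with a backslash. A short sketch of that option (the variable names `s1`, `s2`, and `s3` are only for illustration):

```r
s1 <- "I'm happy"            # double quotes outside, plain ' inside
s2 <- 'I\'m happy'           # single quotes outside, escaped ' inside
s3 <- "She said \"hi\""      # escaped double quotes inside double quotes

identical(s1, s2)  # TRUE: both spell the same string
nchar(s3)          # 13: the backslashes are not part of the string
cat(s3, "\n")      # She said "hi"
```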

<a name='Printing_Strings'></a>

## 2.2. Printing Strings:

In R, strings (or any other objects) are displayed automatically when you evaluate them at the top level of the R console or a notebook cell. Inside functions and loops nothing is printed automatically, so the reliable way to print a string explicitly is to use the `print()` function.

In [7]:
#Using Print
print('Hello World 1')
print('Hello World 2')

[1] "Hello World 1"
[1] "Hello World 2"


<table>
<tr>
    <th>Special Escape Character</th>
    <th>Result</th>
</tr>

<tr>
    <td>\n</td>
    <td>New Line</td>
</tr>

<tr>
    <td>\t</td>
    <td>Tab</td>
</tr>
</table>


The behavior of escape characters like \n and \t in R can be slightly different from Python depending on how you print them. In R, to see the actual effect of escape characters like newlines and tabs, you generally need to use the `cat()` function, which displays the output more like Python's `print()` function. If you use `print()` or just type the string directly, R will escape the characters and show them in the output literally.

In [8]:
text <- "This is a line.\nThis is another line.\tAnd this is a tabbed line."
print(text)

[1] "This is a line.\nThis is another line.\tAnd this is a tabbed line."


In [9]:
cat(text)

This is a line.
This is another line.	And this is a tabbed line.
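Note that `cat()` does not add a newline at the end of its output. As a small sketch of two common ways to handle this, the example below uses base R's `writeLines()`, which does end the output with a newline, and `cat()` with an explicit `"\n"`. The value of `msg` is a shortened stand-in for the text above.

```r
msg <- "Line one.\nLine two.\tTabbed."

# cat() prints the text as is; add "\n" yourself to end the line
cat(msg, "\n", sep = "")

# writeLines() interprets the escapes and ends with a newline
writeLines(msg)

# capture.output() returns what was printed, one element per line
printed <- capture.output(writeLines(msg))
length(printed)  # 2
```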

<a name='Builtin_String_Functions'></a>

## 2.3. Built-in String Functions:

| **Function**      | **Description**                                       | **Example**                                      |
|--------------------|-------------------------------------------------------|--------------------------------------------------|
| `nchar()`          | Returns the number of characters in a string.        | `nchar("Hello")` output `5`.                   |
| `toupper()`        | Converts a string to uppercase.                       | `toupper("hello")` output `"HELLO"`.           |
| `tolower()`        | Converts a string to lowercase.                       | `tolower("HELLO")` output `"hello"`.           |
| `substr()`         | Extracts a substring from a string.                  | `substr("Hello World", 1, 5)` output `"Hello"`.|
| `strsplit()`       | Splits a string into substrings based on a delimiter; returns a list.| `strsplit("a,b,c", ",")` output `list(c("a", "b", "c"))`. |
| `paste()`          | Concatenates strings together.                        | `paste("Hello", "World")` output `"Hello World"`. |
| `trimws()`         | Removes leading and trailing whitespace from a string.| `trimws("  Hello  ")` output `"Hello"`.        |
| `sub()`            | Replaces the first occurrence of a pattern in a string.| `sub("o", "x", "Hello World")` output `"Hellx World"`. |
| `gsub()`           | Replaces all occurrences of a pattern in a string.   | `gsub("o", "x", "Hello World")` output `"Hellx Wxrld"`. |
| `grep()`           | Returns the indices of the elements in a vector of strings that match a pattern.| `grep("o", "Hello World")` output `1`.         |
| `grepl()`          | Returns `TRUE` or `FALSE` for each string, depending on whether the pattern matches.| `grepl("o", "Hello World")` output `TRUE`.     |
| `regexpr()`        | Returns the starting position of the first match of the pattern (`-1` if there is no match).| `regexpr("world", "Hello, world!")` output `8`. |


**`nchar()`**: Returns the number of characters in a string.

In [10]:
nchar('Hello World')

**`toupper()`**: Converts a string to uppercase.

In [11]:
toupper('Hello World')

**`tolower()`**: Converts a string to lowercase.

In [12]:
tolower('Hello World')

**`substr(x, start, stop)`**: Extracts a substring from a string.

In [13]:
substr("Hello World", 1, 5)

**`strsplit()`**: Splits a string into substrings based on a delimiter.

In [14]:
strsplit('Hello World', " ")

In [15]:
strsplit("a,b,c", ",")
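`strsplit()` returns a list with one element per input string, so you usually take the first element with `[[1]]` to get a plain character vector. The delimiter is treated as a regular expression unless you pass `fixed = TRUE`. A brief sketch of both points (the variable names are illustrative):

```r
parts <- strsplit("a,b,c", ",")
class(parts)          # "list"
letters_vec <- parts[[1]]
letters_vec           # "a" "b" "c"
length(letters_vec)   # 3

# One list element per input string
both <- strsplit(c("a,b", "c,d,e"), ",")
lengths(both)         # 2 3

# "." is a regex metacharacter; fixed = TRUE treats it literally
dotted <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
dotted                # "a" "b" "c"
```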

**`paste()`**: Concatenates strings together.

In [16]:
paste("Hello", "World", "1", sep = " ")

**`trimws()`**: Removes leading and trailing whitespace from a string.

In [17]:
trimws(" Hello World ")

**`sub()`**: Replaces the first occurrence of a pattern in a string.

In [18]:
sub("o", "x", "Hello World")

**`gsub()`**: Replaces all occurrences of a pattern in a string.

In [19]:
gsub("o", "x", "Hello World")

In [20]:
gsub("hel", "x", "Hello World", ignore.case = TRUE)
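The pattern passed to `sub()` and `gsub()` is a regular expression by default. The sketch below shows a character class, a whitespace pattern, and the `fixed = TRUE` argument for matching a pattern literally; the input strings are made up for illustration.

```r
no_vowels <- gsub("[aeiou]", "", "Hello World")
no_vowels   # "Hll Wrld"

squeezed <- gsub("\\s+", " ", "Hello    World")
squeezed    # "Hello World"

# As a regex, "." matches any character, so the first character is replaced
regex_dot <- sub(".", "-", "a.b.c")
regex_dot   # "-.b.c"

# With fixed = TRUE, "." matches only a literal dot
literal_dot <- gsub(".", "-", "a.b.c", fixed = TRUE)
literal_dot # "a-b-c"
```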

**`grep()`**: Searches for a specific pattern within a vector of strings.

In [21]:
grep("o", "Hello World")

In [22]:
x <- c("Hello, world!", "Goodbye", "Another world")
grep("world", x)

**`grepl()`**: Returns TRUE or FALSE if a pattern matches a string.

In [23]:
grepl("o", "Hello World")
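`grep()` and `grepl()` are most useful on vectors with several strings. As a short sketch using the same vector `x` as above, `grep()` can return the matching strings themselves with `value = TRUE`, and the logical vector from `grepl()` can be used to filter:

```r
x <- c("Hello, world!", "Goodbye", "Another world")

idx <- grep("world", x)
idx       # 1 3

hits <- grep("world", x, value = TRUE)
hits      # "Hello, world!" "Another world"

flags <- grepl("world", x)
flags     # TRUE FALSE TRUE

# Logical indexing keeps the elements where the pattern matched
filtered <- x[flags]
```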

**`regexpr()`**: Returns the starting position of the first match of the pattern, or `-1` if there is no match.

In [24]:
regexpr("world", "Hello, world!")
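The value returned by `regexpr()` carries extra information as attributes, including the length of the match. A minimal sketch of how to read it, and how to pull out the matched text with base R's `regmatches()` (the variable names are illustrative):

```r
s <- "Hello, world!"
m <- regexpr("world", s)

start <- as.integer(m)           # 8: position of the first match
len <- attr(m, "match.length")   # 5: number of characters matched

matched <- regmatches(s, m)      # "world"

# No match is reported as -1
missing <- as.integer(regexpr("xyz", s))  # -1
```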

<a name='String_Indexing'></a>

## 2.4. String Indexing:

Strings are ordered sequences of characters, so we can extract specific parts by position. Positions start from **1** in R.

<p align="center">
  <img width="700" height="300" src="https://www.sharpsightlabs.com/wp-content/uploads/2018/07/string-with-index-example-768x185.png">
</p>

The most common way to extract a specific character or substring from a string in R is to use the `substr()` function:

`substr(x, start, stop)`

- x: The string you want to extract from.
- start: The starting index of the substring.
- stop: The ending index of the substring (inclusive).


**Comparison with Python:**

- Starting index: R starts from **1**, while Python starts from **0**.
- Slicing: Python offers a more flexible slicing syntax with negative indices for accessing characters from the end.

In [25]:
a <- "Hello World"

In [26]:
substr(a, 3, 7)

In [27]:
substr(a, 3, nchar(a))

In [28]:
substr(a, 3, 3)
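R has no negative string indices, but the same effect can be had by computing positions from `nchar()`. Another option is to split the string into single characters with `strsplit(a, "")` and index the resulting vector. A sketch of both approaches, reusing `a` from above:

```r
a <- "Hello World"
n <- nchar(a)                     # 11

last_char <- substr(a, n, n)      # "d"
last_five <- substr(a, n - 4, n)  # "World"

# Split into a vector of single characters, then index as usual
chars <- strsplit(a, "")[[1]]
chars[1]                          # "H"
chars[c(1, 7)]                    # "H" "W"

# Reverse the string by reversing the character vector
reversed <- paste(rev(chars), collapse = "")  # "dlroW olleH"
```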

<a name='String_Formatting'></a>

## 2.5. String Formatting:

**`sprintf()`**: We can use the `sprintf()` function to insert formatted values into a string.

In [29]:
name <- "John"
age <- 30

sprintf("My name is %s and I am %d years old.", name, age)

`%s` is used for strings, `%d` for integers (including whole-valued numbers such as `30`), and `%f` for floating-point numbers.
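The format specifiers also accept width and precision options. The following sketch shows a few common ones; the values are made up for illustration. It also shows that `%d` rejects a number with a fractional part, which is caught here with `tryCatch()`.

```r
two_dec <- sprintf("%.2f", pi)           # "3.14"
padded  <- sprintf("%5d", 42L)           # "   42"
left    <- sprintf("%-10s|", "left")     # "left      |"
percent <- sprintf("%d%%", 50L)          # "50%"

# sprintf() is vectorized over its arguments
labels <- sprintf("%s is %d", c("A", "B"), c(1L, 2L))
labels                                   # "A is 1" "B is 2"

# %d with a non-whole number is an error; use %f instead
bad <- tryCatch(sprintf("%d", 30.5), error = function(e) "error")
bad                                      # "error"
```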

**`paste()`** separates elements with a space by default:

In [30]:
name <- "John"
age <- 30

paste("My name is", name, "and I am", age, "years old.")

`paste0()` concatenates without any separator, so you have to include any spaces you want yourself.

In [31]:
paste0("My name is", name, "and I am", age, "years old.")
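The call above runs the words together because nothing is inserted between the pieces. A short sketch of the result and of the related `sep` and `collapse` arguments of `paste()`, reusing `name` and `age`:

```r
name <- "John"
age <- 30

squashed <- paste0("My name is", name, "and I am", age, "years old.")
squashed  # "My name isJohnand I am30years old."

# Put the spaces inside the literal strings when using paste0()
spaced <- paste0("My name is ", name, " and I am ", age, " years old.")
spaced    # "My name is John and I am 30 years old."

# sep sets the separator between arguments
dashed <- paste("a", "b", "c", sep = "-")          # "a-b-c"

# collapse joins the elements of one vector into a single string
joined <- paste(c("a", "b", "c"), collapse = ", ") # "a, b, c"
```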