# String Manipulation

## String Literals

Typing string values in Python code is fairly straightforward: they begin and end with a single quote. But then how can you use a quote inside a string? Typing `'That is Alice's cat.'` won’t work, because Python thinks the string ends after `Alice`, and the rest (`s cat.'`) is invalid Python code. Fortunately, there are multiple ways to type strings.

In [2]:
spam = 'That is Alice's Cat'

SyntaxError: unterminated string literal (detected at line 1) (2887093883.py, line 1)

### Double Quotes

Strings can begin and end with double quotes, just as they do with single quotes. One benefit of using double quotes is that the string can have a single quote character in it.

In [3]:
spam = "That is Alice's Cat"

Since the string begins with a double quote, Python knows that the single quote is part of the string and not marking the end of the string. However, if you need to use both single quotes and double quotes in the string, you’ll need to use escape characters.

### Escape Characters

An escape character lets you use characters that are otherwise impossible to put into a string. An escape character consists of a backslash (`\`) followed by the character you want to add to the string. (Despite consisting of two characters, it is commonly referred to as a singular escape character.) For example, the escape character for a single quote is `\'`. You can use this inside a string that begins and ends with single quotes:

In [None]:
spam = 'That is Alice\'s cat'

Python knows that since the single quote in `Alice\'s` has a backslash, it is not a single quote meant to end the string value. The escape characters `\'` and `\"` let you put single quotes and double quotes inside your strings, respectively.

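As a small sketch of the case mentioned earlier, where a string needs both kinds of quotes: only the quote character that matches the string's delimiters has to be escaped. The sentence and the variable names here are made up for illustration.

```python
# Delimited by single quotes: escape the apostrophe, leave the double quotes alone.
single_delimited = 'She said, "That is Alice\'s cat."'

# Delimited by double quotes: escape the double quotes, leave the apostrophe alone.
double_delimited = "She said, \"That is Alice's cat.\""

print(single_delimited)                      # She said, "That is Alice's cat."
print(single_delimited == double_delimited)  # True
```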
#### Common Escape Characters

| Escape Character | Description              |
|------------------|--------------------------|
| `\\`             | Backslash                |
| `\'`             | Single quote             |
| `\"`             | Double quote             |
| `\n`             | Newline (line break)     |
| `\r`             | Carriage return          |
| `\t`             | Horizontal tab           |
| `\b`             | Backspace                |


#### Question

What will be the output of the following code?

In [4]:
print("Hello there!\nHow are you?\nI\'m doing fine.")

Hello there!
How are you?
I'm doing fine.


### Raw Strings

You can place an r before the beginning quotation mark of a string to make it a raw string. A raw string completely ignores all escape characters and prints any backslash that appears in the string. For example:

In [5]:
print(r'That is Carol\'s cat.')

That is Carol\'s cat.


Because this is a raw string, Python considers the backslash as part of the string and not as the start of an escape character. Raw strings are helpful if you are typing string values that contain many backslashes, such as the strings used for Windows file paths like `r'C:\Users\Adam\Desktop'`.

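To make the difference concrete, the sketch below writes the same characters once as a normal string and once as a raw string. The path `C:\temp\new` is a hypothetical example chosen because it contains `\t` and `\n`.

```python
normal = 'C:\temp\new'  # \t becomes a tab and \n becomes a newline
raw = r'C:\temp\new'    # every backslash is kept as a literal character

print(len(normal))  # 9
print(len(raw))     # 11
print(raw)          # C:\temp\new
```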
### Multiline Strings with Triple Quotes

While you can use the `\n` escape character to put a newline into a string, it is often easier to use multiline strings. A multiline string in Python begins and ends with either three single quotes or three double quotes. Any quotes, tabs, or newlines in between the “triple quotes” are considered part of the string. Python’s indentation rules for blocks do not apply to lines inside a multiline string:

In [6]:
print('''Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob''')

Dear Alice,

Eve's cat has been arrested for catnapping, cat burglary, and extortion.

Sincerely,
Bob


Notice that the single quote character in Eve's does not need to be escaped. Escaping single and double quotes is optional in multiline strings. The following print() call would print identical text but doesn’t use a multiline string:

### Indexing and Slicing Strings

Strings use indexes and slices the same way lists do. You can think of the string 'Hello, world!' as a list and each character in the string as an item with a corresponding index.

| H | e | l | l | o | , |   | w | o | r | l | d | ! |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12|

The space and exclamation point are included in the character count, so 'Hello, world!' is 13 characters long, from H at index 0 to ! at index 12.

In [7]:
spam = "Hello, world!"

In [8]:
spam[0]

'H'

In [9]:
spam[4]

'o'

In [10]:
spam[-1]

'!'

In [11]:
spam[0:5]

'Hello'

In [12]:
spam[:5]

'Hello'

#### Question

What will be the output of the following code?

In [14]:
spam[:4]

'Hell'

If you specify an index, you’ll get the character at that position in the string. If you specify a range from one index to another, the starting index is included and the ending index is not. That’s why, if spam is 'Hello, world!', spam[0:5] is 'Hello'. The substring you get from spam[0:5] will include everything from spam[0] to spam[4], leaving out the comma at index 5 and the space at index 6. This is similar to how range(5) will cause a for loop to iterate up to, but not including, 5.

Note that slicing a string does not modify the original string. You can capture a slice from one variable in a separate variable.

By slicing and storing the resulting substring in another variable, you can have both the whole string and the substring handy for quick, easy access.

In [16]:
spam = "Hello, world!"
fizz = spam[:5]
print(fizz)
print(spam)

Hello
Hello, world!

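The slicing rules described above also cover slices that start in the middle of the string or count from the end. A few more cases, as a quick sketch (the variable names are illustrative):

```python
spam = "Hello, world!"

middle = spam[7:12]  # indexes 7 through 11; index 12 is excluded
to_end = spam[7:]    # omitting the end index slices to the end of the string
last_six = spam[-6:] # negative indexes count back from the end

print(middle)     # world
print(to_end)     # world!
print(last_six)   # world!
print(len(spam))  # 13
```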

### The `in` and `not in` operators in strings

The `in` and `not in` operators can be used with strings just like with list values. An expression with two strings joined using `in` or `not in` will evaluate to a Boolean `True` or `False`:

In [None]:
"Hello" in "Hello world"

In [None]:
"Hello" in "Hello"

In [None]:
"HELLO" in "Hello"

In [None]:
"" in "Hello"

In [None]:
"Hello" not in "Hello world"

The expressions above test whether the first string (the exact string, case-sensitive) can be found within the second string.

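For reference, here are the same five expressions with the values they evaluate to. The names on the left are only labels for this sketch.

```python
substring = "Hello" in "Hello world"       # True
whole_string = "Hello" in "Hello"          # True: a string contains itself
wrong_case = "HELLO" in "Hello"            # False: the test is case-sensitive
empty = "" in "Hello"                      # True: the empty string is found in every string
negated = "Hello" not in "Hello world"     # False

print(substring, whole_string, wrong_case, empty, negated)
```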
### String Interpolation

Concatenating strings with the `+` operator is tedious, and you have to remember to add spaces manually. A simpler approach is string interpolation, in which the `%s` format specifier inside the string acts as a placeholder to be replaced by the values following the string. One benefit of string interpolation is that `str()` doesn’t have to be called to convert values to strings:

In [17]:
name = "Adam"
employer = "General Motors"

print("My name is %s. I work for %s." % (name, employer))

My name is Adam. I work for General Motors.

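The point about `str()` is easier to see with a non-string value. In this sketch, `age` is a hypothetical integer added for illustration; `%s` converts it to text automatically, whereas concatenation with `+` needs an explicit `str()` call.

```python
name = "Adam"
age = 30  # hypothetical value, not from the example above

interpolated = "%s is %s years old." % (name, age)
concatenated = name + " is " + str(age) + " years old."  # str() is required here

print(interpolated)                  # Adam is 30 years old.
print(interpolated == concatenated)  # True
```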

### `f-strings`

Python 3.6 introduced f-strings, which are similar to string interpolation except that braces are used instead of `%s`, with the expressions placed directly inside the braces. Like raw strings, f-strings have a prefix before the starting quotation mark, in this case `f`:

In [18]:
name = "Adam"
team = "ASG"

print(f"My name is {name}. I am on the {team} team.")

My name is Adam. I am on the ASG team.


Remember to include the f prefix; otherwise, the braces and their contents will be a part of the string value:

In [19]:
print("My name is {name}. I am on the {team} team")

My name is {name}. I am on the {team} team

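Because the braces hold expressions rather than just variable names, an f-string can also call methods, do arithmetic, or apply a format specification. A brief sketch, where `hours` is a made-up value:

```python
name = "Adam"
hours = 7.5  # hypothetical value for illustration

line = f"{name.upper()} worked {hours * 2} hours ({hours:.1f} per day)."
print(line)  # ADAM worked 15.0 hours (7.5 per day).
```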

### The `upper()`, `lower()`, `isupper()`, and `islower()` methods

The `upper()` and `lower()` string methods return a new string where all the letters in the original string have been converted to uppercase or lowercase, respectively. Nonletter characters in the string remain unchanged:

In [20]:
spam = "Hello"
spam = spam.upper()
print(spam)

HELLO


In [21]:
spam = "HELLO, WORLD"
spam = spam.lower()
print(spam)

hello, world


Note that these methods do not change the string itself but return new string values. If you want to change the original string, you have to call `upper()` or `lower()` on the string and then assign the new string to the variable where the original was stored. This is why you must use `spam = spam.upper()` to change the string in `spam` instead of simply `spam.upper()`.

The `upper()` and `lower()` methods are helpful if you need to make a case-insensitive comparison. For example, the strings `'great'` and `'GREat'` are not equal to each other. But if the input is first converted to a single case before comparing, it does not matter whether the user types `Great`, `GREAT`, or `grEAT`. The first program below compares the raw input and therefore matches only `Great`:

In [22]:
"great" == "GREAT"

False

In [None]:
feeling = input("How do you feel?")
if feeling == "Great":
    print("I feel great too.")

Converting the `feeling` variable to uppercase for comparison:

In [24]:
feeling = input("How do you feel?")
if feeling.upper() == "GREAT":
    print("I feel great too.")

I feel great too.

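The same comparison can be tried without typing input by wrapping it in a function. This is only a sketch: `feels_great` is a hypothetical helper, and it normalizes with `lower()` instead of `upper()`, which works equally well here.

```python
def feels_great(answer):
    """Return True if answer spells 'great' in any mix of upper and lower case."""
    return answer.lower() == "great"

for answer in ["Great", "GREAT", "grEAT", "good"]:
    print(answer, feels_great(answer))
```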

### String transformation methods

| Method         | Description                                                     |
|----------------|-----------------------------------------------------------------|
| `str.upper()`  | Converts all cased characters in the string to uppercase.       |
| `str.lower()`  | Converts all cased characters in the string to lowercase.       |
| `str.title()`  | Converts the first character of each word to uppercase and the remaining characters to lowercase. |
| `str.capitalize()` | Converts the first character to uppercase and the rest to lowercase. |
| `str.swapcase()` | Converts uppercase characters to lowercase and lowercase characters to uppercase. |
| `str.casefold()` | Similar to `str.lower()`, but more aggressive in converting text to lowercase. Used for caseless matching. |


In [25]:
text = 'groß'

print(text.lower())
print(text.casefold())

groß
gross

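The other methods in the table can be compared side by side on a single string. The sample text below is arbitrary.

```python
text = "hello WORLD from python"

upper = text.upper()
title = text.title()
capitalized = text.capitalize()
swapped = text.swapcase()

print(upper)        # HELLO WORLD FROM PYTHON
print(title)        # Hello World From Python
print(capitalized)  # Hello world from python
print(swapped)      # HELLO world FROM PYTHON
```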

### Useful string check methods

| Method           | Description                                                                                   |
|------------------|-----------------------------------------------------------------------------------------------|
| `isalpha()`      | Returns `True` if the string consists only of letters and isn't blank                         |
| `isalnum()`      | Returns `True` if the string consists only of letters and numbers and isn't blank             |
| `isdecimal()`    | Returns `True` if the string consists only of decimal characters (such as `0` through `9`) and isn't blank |
| `isdigit()`      | Returns `True` if the string consists only of digits, including digit-like characters such as the superscript `²`, and isn't blank |
| `isnumeric()`    | Returns `True` if the string consists only of numeric characters, including fractions such as `½`, and isn't blank |
| `isspace()`      | Returns `True` if the string consists only of spaces, tabs, and newlines and isn't blank      |
| `istitle()`      | Returns `True` if the string is in title case                                                 |
| `islower()`      | Returns `True` if all cased characters in the string are lowercase and there is at least one |
| `isupper()`      | Returns `True` if all cased characters in the string are uppercase and there is at least one |
| `startswith()`   | Returns `True` if the string starts with the specified prefix                                 |
| `endswith()`     | Returns `True` if the string ends with the specified suffix                                   |

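A few of these checks in action, as a sketch with arbitrary sample strings. The three numeric checks differ only on unusual characters: `isdecimal()` is the strictest and `isnumeric()` the most permissive.

```python
# Each tuple is (isdecimal, isdigit, isnumeric) for one sample string.
numeric_checks = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in ["123", "²", "½"]}
for s, result in numeric_checks.items():
    print(repr(s), result)
# '123' (True, True, True)
# '²' (False, True, True)
# '½' (False, False, True)

print("abc123".isalnum())               # True
print("abc123".isalpha())               # False: the string contains digits
print("".isalpha())                     # False: blank strings fail these checks
print("Hello World".istitle())          # True
print("report.txt".endswith(".txt"))    # True
```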


### Justifying Text with the `rjust()`,`ljust()`, and `center()` Methods

The `rjust()` and `ljust()` string methods return a padded version of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string:

In [26]:
'Hello'.rjust(10)

'     Hello'

`'Hello'.rjust(10)` says that we want to right-justify 'Hello' in a string of total length 10. 'Hello' is five characters, so five spaces will be added to its left, giving us a string of 10 characters with 'Hello' justified right.

An optional second argument to `rjust()` and `ljust()` will specify a fill character other than a space character. Enter the following into the interactive shell:

In [27]:
'Hello'.rjust(10, "*")

'*****Hello'

The `center()` string method works like `ljust()` and `rjust()` but centers the text rather than justifying it to the left or right.

In [30]:
'Hello'.center(20, "+")

'+++++++Hello++++++++'

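These methods are handy for printing data in aligned columns. The following is a minimal sketch that combines `ljust()` and `rjust()`; the item names, counts, and column widths are made up for illustration.

```python
rows = [("sandwiches", 4), ("apples", 12), ("cups", 4)]  # hypothetical data

# Left-justify the name in 12 columns (padded with dots),
# then right-justify the count in 4 columns (padded with spaces).
lines = [item.ljust(12, ".") + str(count).rjust(4) for item, count in rows]
for line in lines:
    print(line)
# sandwiches..   4
# apples......  12
# cups........   4

# If the requested length is not longer than the string, it is returned unchanged.
print('Hello'.rjust(3))  # Hello
```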
### Removing Whitespace with the `strip()`, `rstrip()`, and `lstrip()` Methods

Sometimes you may want to strip off whitespace characters (space, tab, and newline) from the left side, right side, or both sides of a string. The `strip()` string method will return a new string without any whitespace characters at the beginning or end. The `lstrip()` and `rstrip()` methods will remove whitespace characters from the left and right ends, respectively.

In [34]:
spam = '    Hello  '
print(spam.strip())

Hello

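To see what `lstrip()` and `rstrip()` leave behind, it helps to print the results with `repr()` so the remaining spaces are visible. A short sketch using the same `spam` value:

```python
spam = '    Hello  '

left_stripped = spam.lstrip()   # removes the four leading spaces only
right_stripped = spam.rstrip()  # removes the two trailing spaces only
both_stripped = spam.strip()    # removes whitespace from both ends

print(repr(left_stripped))   # 'Hello  '
print(repr(right_stripped))  # '    Hello'
print(repr(both_stripped))   # 'Hello'
```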