# Manipulating Strings

## Introduction

In this chapter you will learn how to manipulate strings so you can work with text more flexibly. There are several ways to adjust strings, such as removing parts of them, converting letters from uppercase to lowercase and much more.

This notebook covers some parts of [chapter 6](https://automatetheboringstuff.com/2e/chapter6/) of the book.

For more information, see [Python Tutorial: Strings](https://docs.python.org/3/tutorial/introduction.html#strings) and [W3Schools: Python Strings](https://www.w3schools.com/python/python_strings.asp).

## Summary

Strings are sequences of characters (Unicode code points in Python 3), much like in other programming languages. You can access a single character of a string with square brackets. Python also has many helpful string methods for processing text.

### Single and Double Quotes

In Python you can write string literals in two ways: by surrounding the whole text with single quotation marks (`'text'`) or with double quotation marks (`"text"`).

A problem arises if you surround the text with single quotes and also use a single quote inside the text itself.

```python
print('That's a lot of fun.')
```

As you can see, the line doesn't work because Python reads the string only up to the second single quote. It therefore sees just `That` as the string and raises a syntax error because it cannot interpret the remaining part `s a lot of fun.'`. The same happens if you use double quotes inside a text that is surrounded by double quotes.

```python
print("That's a lot of fun.")
print('That\'s a lot of fun.')
```

If you have a single quote in the text itself, you can solve the problem in two ways. First, you can surround the text with double quotes. Second, you can put an escape character (a backslash) before the single quote.

```python
# print("He asked, "What?", and left.")  # fails: the inner double quotes end the string early
print("He asked, \"What?\", and left.")
```

If you want double quotes inside the text itself, either put an escape character before each of them or surround the text with single quotes.
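
Surrounding the text with single quotes is another way to include double quotes. As a small sketch, the following compares both spellings; the variable names `escaped` and `single_quoted` are illustrative only.

```python
# Both literals describe the same string; only the quoting style differs.
escaped = "He asked, \"What?\", and left."
single_quoted = 'He asked, "What?", and left.'

print(single_quoted)             # He asked, "What?", and left.
print(escaped == single_quoted)  # True
```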

### Escape Characters

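An escape character lets you put characters into a string that would otherwise be impossible or awkward to type, such as a quote that matches the surrounding quotes. An escape sequence consists of a backslash followed by the character you want to add. You can also use escape sequences such as tab and newline to structure your text.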

| Escape character | Prints as            |
| :--------------- | :------------------- |
| `\'`             | Single quote         |
| `\"`             | Double quote         |
| `\t`             | Tab                  |
| `\n`             | Newline (line break) |
| `\\`             | Backslash            |
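
To illustrate the table, here is a minimal sketch that uses tab, newline and backslash escapes. The variable names and sample values are made up for this example.

```python
# \t and \n structure the text into columns and lines.
table = 'Name\tAge\nBob\t41'
print(table)

# A doubled backslash produces one literal backslash.
path = 'C:\\Users'
print(path)       # C:\Users

# Each escape sequence counts as a single character.
print(len('\n'))  # 1
print(len(path))  # 8
```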

### Raw Strings

Let's say you want to print a whole Windows file path as a string. Normally you would have to escape every backslash. An easier and faster way is to use a raw string, which treats backslashes as literal characters. You only need to put an `r` before the opening quote.

```python
print(r'C:\Users\Bob\Desktop')
```
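
The following sketch compares a raw string with its escaped equivalent and shows what can go wrong without the `r` prefix. The path `C:\temp` and the variable names are hypothetical examples.

```python
raw = r'C:\Users\Bob\Desktop'
escaped = 'C:\\Users\\Bob\\Desktop'
print(raw == escaped)  # True

# Without the r prefix, \t is interpreted as a tab character.
risky = 'C:\temp'
safe = r'C:\temp'
print(len(risky), len(safe))  # 6 7
```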

### Multiline Strings

It would be really time-consuming if you had to use `\n` several times to write a longer text as a string. To simplify this, you can use multiline strings with triple quotes (using either single quotes or double quotes). Another advantage in the example below is that you don't need an escape character before the apostrophe in `haven't`.

```python
# easier & faster to write
print('''Dear Bob,

You still haven't answered my question.

Sincerely,
Ross''')

# other possibility, but takes a while to write;
# the literal has to stay on one line because a plain quoted string cannot span lines
print('Dear Bob,\n\nYou still haven\'t answered my question.\n\nSincerely,\nRoss')
```
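
As a quick check, a triple-quoted string and a single-line string with `\n` escapes can describe exactly the same text. The variable names in this sketch are illustrative.

```python
multiline = '''Dear Bob,

You still haven't answered my question.

Sincerely,
Ross'''

single_line = 'Dear Bob,\n\nYou still haven\'t answered my question.\n\nSincerely,\nRoss'

print(multiline == single_line)  # True
print(multiline.count('\n'))     # 5
```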

### Indexing and Slicing Strings

Just like with lists, you can access parts of a string by using indices and slices.

```python
text = 'Hello, world!'
text[0]   # output = 'H'
text[0:5] # output = 'Hello' 
text[-1]  # output = '!' 
text[:5]  # output = 'Hello' 
text[7:]  # output = 'world!' 
```
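
Slices also accept negative indices and an optional step. Unlike lists, strings are immutable, so assigning to an index fails. A short sketch using the same `text` value:

```python
text = 'Hello, world!'

print(text[-6:])   # world!
print(text[::2])   # Hlo ol!  (every second character)
print(text[::-1])  # !dlrow ,olleH  (reversed)

# Strings cannot be changed in place; build a new string instead.
try:
    text[0] = 'J'
except TypeError as error:
    print('Strings are immutable:', error)

changed = 'J' + text[1:]
print(changed)     # Jello, world!
```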

### Formatting Strings

Strings can contain placeholders written as curly braces (`{}`). They are replaced with the provided values when you call `format` or use an f-string:

```python
greeting = 'Hello'
person = 'Peter'
text = '{}, {}!'.format(greeting, person)
text = f'{greeting}, {person}!'
```

These placeholders can optionally contain instructions on how to format the provided values:

```python
value = 1.2345
print(f'{value:.3}')  # 3 significant digits: 1.23
```
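
A few more format instructions, as a sketch: a fixed number of decimals, a minimum width with alignment, and zero padding. The variable names and values are chosen for illustration only.

```python
value = 1.2345
name = 'Bob'
number = 7

two_decimals = f'{value:.2f}'  # fixed-point notation with 2 decimals
left = f'{name:<6}|'           # left-aligned in a field of width 6
right = f'{name:>6}|'          # right-aligned in a field of width 6
padded = f'{number:03d}'       # integer padded with zeros to width 3

print(two_decimals)  # 1.23
print(left)          # Bob   |
print(right)         #    Bob|
print(padded)        # 007
```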

A good overview of all possibilities, including instructions, is provided on [pyformat.info](https://pyformat.info/).

### String Methods

The full list of available string methods is [documented in the Python documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).
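
As a brief sketch, here are a few commonly used methods from that list. The sample text is made up for this example. Every method returns a new string (or list) and leaves the original unchanged.

```python
text = '  Hello, World!  '

stripped = text.strip()      # remove leading and trailing whitespace
print(stripped)              # Hello, World!
print(stripped.lower())      # hello, world!
print(stripped.upper())      # HELLO, WORLD!
print(stripped.replace('World', 'Python'))  # Hello, Python!

words = stripped.split()     # split on whitespace
print(words)                 # ['Hello,', 'World!']
print('-'.join(words))       # Hello,-World!
print(stripped.startswith('Hello'))  # True
```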

## Exercises

### Exercise 1: Defining Strings
Rewrite the following text using only one line. Use single quotes. Make sure your version equals the given one.

```python
text = r"""
I'm a text containing some "other text" and some \n!
"""

# todo: rewrite
```

### Exercise 2: String Functions

#### a) Cleaning up
Clean up the following text by removing all leading and trailing whitespace characters. How many characters are removed from the left and right ends of the string?

```python
text = "   \tThis is a messy text\n\t"
# todo: cleanup
# todo: count cleaned up characters
```

Oftentimes files contain different line endings: `\r\n` (Windows) or `\n` (Mac, Linux).
- Replace all `\r\n` endings with `\n` in the following text.
- Replace all `\n` endings with `\r\n` in the following text.

```python
text = " \nThis text has different\r\nline\nendings"
# todo: replace endings
```

#### b) Splitting and Joining
Split the following text
- into words
- into texts between tabs: 'This', 'is a text', 'with tabs'

Note that you may need to do some cleaning up by removing whitespace.

```python
text = "This \tis a text\t with tabs."
# todo: split
```

Split the following sentence into words and join it back together.

```python
text = "This is just a text."
# todo: split and join
```

There is also a splitting function which handles the different line endings for you. Use it to manipulate the following string so that:
- Uppercase words are lowercase afterwards
- Lowercase words are capitalized
- Capitalized words are uppercase

`My HOUSE is Burning` becomes `MY house Is BURNING`.

Hint: Use `enumerate` to modify list items in place.
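
To illustrate the hint without giving away the solution, here is a minimal sketch on an unrelated list of numbers. `enumerate` yields each index together with its item, so you can assign back to the list through the index.

```python
numbers = [1, 2, 3]

for index, number in enumerate(numbers):
    # Assigning to numbers[index] changes the list itself;
    # reassigning the loop variable `number` would not.
    numbers[index] = number * 10

print(numbers)  # [10, 20, 30]
```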

```python
text = "My HOUSE is Burning"
# todo: manipulate and print
```

### Exercise 3: Indexing and Slicing
Here are the first few sentences of Oscar Wilde's 'The Selfish Giant':

> Every afternoon, as they were coming from school, the children used to go and play in the Giant's garden.
>
> It was a large lovely garden, with soft green grass. Here and there over the grass stood beautiful flowers like stars, and there were twelve peach-trees that in the spring-time broke out into delicate blossoms of pink and pearl, and in the autumn bore rich fruit. The birds sat on the trees and sang so sweetly that the children used to stop their games in order to listen to them. 'How happy we are here!' they cried to each other.

```python
text = """
Every afternoon, as they were coming from school, the children used to go and play in the Giant's garden.

It was a large lovely garden, with soft green grass. Here and there over the grass stood beautiful flowers like stars, and there were twelve peach-trees that in the spring-time broke out into delicate blossoms of pink and pearl, and in the autumn bore rich fruit. The birds sat on the trees and sang so sweetly that the children used to stop their games in order to listen to them. 'How happy we are here!' they cried to each other.
"""
```

Print the first and the last letter (E, r) of the text.

```python
# todo: first and last character
```

Now print the first and last word (Every, other) of the text.

```python
# todo: first and last word
```

Finally, take the second paragraph of the text. For each sentence, print the second-to-last character of each of its last two words.

Example: "It was a large lovely garden, with soft gre**e**n gra**s**s."

Your cell should output the highlighted characters: ... gre**e**n gra**s**s ... ri**c**h fru**i**t ... **t**o th**e**m ... ea**c**h oth**e**r

```python
# todo: use slicing and for loops
```

### Exercise 4: String Formatting

Print `Bob Solo from New york is 41 years old.` by using the following variables.

```python
age = 41.25
first_name = "bob "
last_name = " SOLO"
location = "   New York"

# todo: print
```

Print the content of the following dictionary like this:
```
A: 101.0
B: 120.3
C: 130.2
```

```python
values = {
    "a": 101,
    "b": 120.3,
    "c": 130.223,
}

# todo: print values
```

### Exercise 5: Counting Things

Create a dictionary containing the number of lines, commas, periods, words, unique words and whitespace characters of the following text.

The cell should return:

```python
{'lines': 4, 'commas': 4, 'periods': 3, 'words': 50, 'unique words': 41, 'whitespace characters': 54}
```

```python
text = """ Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,
sed diam voluptua. At vero eos et \taccusam et justo duo dolores et ea rebum.
Stet clita kasd gubergren, no sea takimata sanctus est Lorem   ipsum dolor sit amet.
"""

# todo: count
```