# Strings in Python

In previous weeks, we have already encountered strings in Python, for example:

```python
name = "John"
print(name)
```

```plaintext {.output}
John
```

We have also seen that most operations between strings and numbers are not allowed:

```python
5 + '1'
```

```python {.error}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 5 + '1'

TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

To convert from another data type to a string, we can use the `str()` function:

```python
str(5), str(5.0), str(True)
```

```plaintext {.output}
('5', '5.0', 'True')
```
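
Converting with `str()` is what lets a number be joined to text with `+`. The following is a minimal sketch; the variable names `count`, `label`, and `total` are illustrative only:

```python
count = 5

# str() turns the integer into text, so + joins the two pieces of text
label = "Items: " + str(count)
print(label)  # Items: 5

# int() goes the other way, from text to a number
total = 5 + int('1')
print(total)  # 6
```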


# String Operations

In Python, **string operations** allow you to **manipulate and transform text data**. Just as arithmetic operations are used with numbers and list operations with lists, string operations enable you to concatenate, slice, find substrings, and more.

## Concatenation
Concatenation is the process of joining two or more strings together to form a new string.

In Python, you can concatenate strings using the `+` operator:

```python
greeting = "Hello"
name = "John"
message = greeting + ", " + name
print(message)
```

```plaintext {.output}
Hello, John
```

## Comparison
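Strings can be compared using comparison operators. Python compares strings character by character using each character's Unicode code. For strings made up of lowercase letters only, this matches alphabetical order, but the comparison is case-sensitive and every uppercase letter sorts before every lowercase letter: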

```python
print('abc' < 'abd')  # True
print('abc' == 'abc')  # True
```
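
Because comparison is based on character codes, mixed-case strings can give results that differ from dictionary order. The sketch below uses the built-in `ord()` to show the codes and `.lower()` for a case-insensitive comparison; the example words are arbitrary:

```python
print(ord('Z'), ord('a'))  # 90 97

case_sensitive = 'Zebra' < 'apple'
print(case_sensitive)      # True, because 'Z' (90) comes before 'a' (97)

case_insensitive = 'Zebra'.lower() < 'apple'.lower()
print(case_insensitive)    # False, because 'zebra' comes after 'apple'

print('apple' < 'apples')  # True, a prefix sorts before the longer string
```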

## Sequence operations

Many operations that work on lists also work on strings. Use each of the tabs below to see some examples of these operations.

::::: tabs

- [Indexing](#indexing)
- [Slicing](#slicing)
- [Length](#length)
- [Membership](#membership)
- [Repetition](#repetition)
- [Iteration](#iteration)

:::: tab-content | indexing

Similar to lists, you can access individual characters in a string using **indexing** via square brackets `[]`. 

```python
text = "Hello World"
print(text[0])  # H
print(text[-1]) # d
```

```plaintext {.output}
H
d
```

::::

:::: tab-content | slicing

You can also extract a substring from a string using **slicing**. Slicing is done using the `start:stop:step` syntax.

```python
text = "Good Morning"
print(text[0:4]) # Good
print(text[5:])  # Morning
```

```plaintext {.output}
Good
Morning
```

::: note | The `stop` index is exclusive

Similar to lists, the `stop` index is exclusive, meaning the character at the `stop` index is not included in the slice. For example, `text[0:3]` will return the characters from index 0 to 2 (`G`, `o`, `o`), not including the character at index 3 (`d`).

:::
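
The `step` part of `start:stop:step` is optional and was left out of the examples above. As a small sketch of what it does, reusing the same `text`:

```python
text = "Good Morning"

every_second = text[::2]    # start and stop omitted, step of 2
reversed_text = text[::-1]  # a negative step walks backwards

print(every_second)   # Go onn
print(reversed_text)  # gninroM dooG
print(text[:4])       # Good (start defaults to 0)
```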

::::

:::: tab-content | length

You can find the length of a string using the `len()` function:

```python
text = "Hello World"
print(len(text))  # 11
```

```plaintext {.output}
11
```

::::

:::: tab-content | membership

You can check if a substring is present in a string using the `in` operator:

```python
text = "Hello World"
print("Hello" in text)    # True
print("Goodbye" in text)  # False
```

```plaintext {.output}
True
False
```

::::

:::: tab-content | repetition

You can repeat a string using the `*` operator:

```python
text = "ha"
text * 3  # hahaha
```

```plaintext {.output}
'hahaha'
```

::::

:::: tab-content | iteration

You can iterate over each character in a string using a `for` loop:

```python
text = "Hello"
for char in text:
    print(char)
```

```plaintext {.output}
H
e
l
l
o
```

::::

:::::

# String Methods
Besides the sequence operations above that work on strings and other sequences like lists, strings have their own set of methods that can be used for text processing.

All these methods are called using the `.` operator after the string variable, and return a new value without modifying the original string. For example:

```python
unit = 'EGD103'
unit.lower()
```

```plaintext {.output}
'egd103'
```

Note that the original string `unit` is not modified:

```python
unit
```

```plaintext {.output}
'EGD103'
```
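
Strings cannot be changed in place at all, which is why methods return new values. The sketch below shows one way to observe this, assuming you are comfortable with `try`/`except`; the name `unit_lower` is illustrative:

```python
unit = 'EGD103'

try:
    unit[0] = 'e'  # strings do not support item assignment
except TypeError as err:
    print("TypeError:", err)

# To keep the result of a method, assign it to a name
unit_lower = unit.lower()
print(unit, unit_lower)  # EGD103 egd103
```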

Use each of the tabs below to see some examples of string methods.

::::: tabs {.vertical} |

- [`.lower()`](#lower)
- [`.upper()`](#upper)
- [`.split()`](#split)
- [`.join()`](#join)
- [`.find()`](#find)
- [`.replace()`](#replace)
- [`.startswith()`](#startswith)
- [`.endswith()`](#endswith)


:::: tab-content | lower

The `lower()` method converts all characters in a string to lowercase:

```python
text = "Hello World"
text.lower()
```

```plaintext {.output}
'hello world'
```

::::

:::: tab-content | upper

The `upper()` method converts all characters in a string to uppercase:

```python
text = "Hello World"
text.upper()
```

```plaintext {.output}
'HELLO WORLD'
```

::::

:::: tab-content | split

The `split()` method splits a string into a list of substrings based on a delimiter:

```python
text = "Welcome to EGD103 - Computing and Data for Engineers"
text.split("-")
```

```plaintext {.output}
['Welcome to EGD103 ', ' Computing and Data for Engineers']
```

Without any arguments, `split()` splits the string based on whitespace:

```python
parts = text.split("-")
parts[1].split()
```

```plaintext {.output}
['Computing', 'and', 'Data', 'for', 'Engineers']
```
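
Splitting on `"-"` leaves the surrounding spaces attached to each part. One possible way to tidy this up is the `strip()` method, which removes leading and trailing whitespace. The sketch below also shows the optional second argument of `split()`, which limits the number of splits; the variable names are illustrative:

```python
text = "Welcome to EGD103 - Computing and Data for Engineers"

parts = text.split("-")
cleaned = [part.strip() for part in parts]  # remove spaces around each part
print(cleaned)  # ['Welcome to EGD103', 'Computing and Data for Engineers']

# Split at most once, on the first space
first_word, rest = text.split(" ", 1)
print(first_word)  # Welcome
print(rest)        # to EGD103 - Computing and Data for Engineers
```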

::::

:::: tab-content | join

The `join()` method joins the elements of a list into a single string:

```python
parts = ['Computing', 'and', 'Data', 'for', 'Engineers']
" ".join(parts)
```

```plaintext {.output}
'Computing and Data for Engineers'
```

Here, `" "` is the **delimiter** that is inserted between each element of the list:

```python
"-".join(parts)
```

```plaintext {.output}
'Computing-and-Data-for-Engineers'
```

::::

:::: tab-content | find

The `find()` method returns the index of the first occurrence of a substring in a string:

```python
text = "Hello World"
text.find("World")
```

```plaintext {.output}
6
```

If the substring is not found, `find()` returns `-1`:

```python
text.find("Goodbye")
```

```plaintext {.output}
-1
```
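
The index returned by `find()` can be combined with slicing to cut a string around a match. This is only a sketch; checking for `-1` first avoids slicing at the wrong position when there is no match:

```python
text = "Hello World"
position = text.find("World")

if position != -1:              # -1 would mean "not found"
    before = text[:position]    # everything before the match
    after = text[position:]     # the match and everything after it
    print(repr(before))         # 'Hello '
    print(after)                # World
```

If you only need to know whether the substring is present, the `in` operator is usually simpler.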

::::

:::: tab-content | replace

The `replace()` method replaces all occurrences of a substring with another substring:

```python
text = "Hello World"
text.replace("World", "EGD103")
```

```plaintext {.output}
'Hello EGD103'
```
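
`replace()` also accepts an optional third argument that limits how many occurrences are replaced. A short sketch with an arbitrary example string:

```python
text = "ha ha ha"

replaced_all = text.replace("ha", "ho")       # every occurrence
replaced_one = text.replace("ha", "ho", 1)    # only the first occurrence
no_match = text.replace("xyz", "ho")          # nothing to replace

print(replaced_all)  # ho ho ho
print(replaced_one)  # ho ha ha
print(no_match)      # ha ha ha
```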

::::

:::: tab-content | startswith

The `startswith()` method returns `True` if a string starts with a specified substring:

```python
text = "EGD103"
text.startswith("EGD")
```

```plaintext {.output}
True
```

```python
text.startswith("ITD")
```


```plaintext {.output}
False
```

::::

:::: tab-content | endswith

The `endswith()` method returns `True` if a string ends with a specified substring:

```python
text = "EGD103"
text.endswith("103")
```

```plaintext {.output}
True
```

```python
text.endswith("104")
```

```plaintext {.output}
False
```
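
Both `startswith()` and `endswith()` are case-sensitive, and both accept a tuple of alternatives. The sketch below uses a hypothetical file name to illustrate:

```python
filename = "report.CSV"  # hypothetical file name

case_sensitive_check = filename.endswith(".csv")
print(case_sensitive_check)  # False, because "CSV" is uppercase

# Lowercase first, then accept either of two endings
is_data_file = filename.lower().endswith((".csv", ".txt"))
print(is_data_file)  # True
```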

::::

:::::

Since many of the methods above return a new string, you can chain multiple methods together:

```python
text = "EGD103 - Computing and Data for Engineers"
"-".join(text.replace("-", "").split()).lower()
```

```plaintext {.output}
'egd103-computing-and-data-for-engineers'
```

Of course, the above code can be broken down into multiple lines for readability:

```python
text = "EGD103 - Computing and Data for Engineers"
unhyphen_text = text.replace("-", "")
words = unhyphen_text.split()
text = "-".join(words)
text = text.lower()
text
```

```plaintext {.output}
'egd103-computing-and-data-for-engineers'
```
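
If the same chain is needed more than once, it could be wrapped in a function. The helper name `make_slug` below is hypothetical and simply packages the steps shown above:

```python
def make_slug(title):
    """Hypothetical helper: lowercase the title and join its words with hyphens."""
    words = title.replace("-", "").split()
    return "-".join(words).lower()

print(make_slug("EGD103 - Computing and Data for Engineers"))
# egd103-computing-and-data-for-engineers

print(make_slug("  String   Methods  "))
# string-methods
```

Because `split()` without arguments ignores repeated whitespace, extra spaces in the input do not produce empty words.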

# Escape Characters
To insert characters that would otherwise be illegal in a string (for example, a double quote inside a string delimited by double quotes), use an escape character. An escape character is a backslash `\` followed by the character you want to insert:

```python
txt = "We are the so-called \"Vikings\" from the north."
```

Besides the double quote (or single quote), there are other escape characters that can be used in strings:

| Escape Character | Description |
|------------------|-------------|
| `\\` | Backslash |
| `\n` | New Line |
| `\t` | Tab |

For example:

```python
print('Life\'s like a box of chocolates.\nYou never know what you\'re gonna get.\n\t\t\t- Forrest Gump')
```

```plaintext {.output}
Life's like a box of chocolates.
You never know what you're gonna get.
                        - Forrest Gump
```
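
Although an escape character is typed as two characters, it is stored as a single character in the string. The sketch below checks this with `len()` and `repr()`; the Windows-style `path` is a made-up example:

```python
text = "a\tb\nc"
print(len(text))   # 5: a, tab, b, newline, c
print(repr(text))  # 'a\tb\nc'

path = "C:\\temp\\new"  # hypothetical path; each \\ is one backslash
print(path)        # C:\temp\new
print(len(path))   # 11
```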


# Multi-line Strings
You can create a multi-line string using triple quotes (`'''` or `"""`):

```python
quote = '''Life's like a box of chocolates.
You never know what you're gonna get.
                        - Forrest Gump'''
print(quote)
```

```plaintext {.output}
Life's like a box of chocolates.
You never know what you're gonna get.
                        - Forrest Gump
```

Triple-quoted strings are also sometimes used like comments. A string that is not assigned to a variable has no effect when the code runs, so there is no need to put a `#` on each line:

```python
'''
Created on Wed Jun 28 10:02:19 2024.

@author: rovere
'''
print("Hello World")
```

```plaintext {.output}
Hello World
```
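
A multi-line string is an ordinary string in which each line break is stored as `\n`. A short sketch with a made-up two-line string, using the `splitlines()` method to get the lines back as a list:

```python
quote = '''Line one
Line two'''

print(repr(quote))  # 'Line one\nLine two'

same_text = quote == "Line one\nLine two"
print(same_text)    # True

lines = quote.splitlines()
print(lines)        # ['Line one', 'Line two']
```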

# String Formatting

Strings are often used to display information to users, such as the result of a calculation. It is possible to concatenate strings and variables together like so:

```python
import math
radius = 3
area = math.pi * radius ** 2
print("The area of a circle with radius " + str(radius) + " is " + str(area) + ".")
```

```plaintext {.output}
The area of a circle with radius 3 is 28.274333882308138.
```

This can be cumbersome to write and read. Instead, we can use a **formatted string** (or `f-string`) to insert variables directly into a string:

```python
print(f"The area of a circle with radius {radius} is {area}.")
```

```plaintext {.output}
The area of a circle with radius 3 is 28.274333882308138.
```

The `f` before the string indicates that it is a formatted string. The curly braces `{}` are used to insert variables, or even expressions, into the string.

```python
def circle_circumference(radius):
    return 2 * math.pi * radius

def circle_area(radius):
    return math.pi * radius ** 2

radius = 3
print(f"The area of a circle with radius {radius} is {circle_area(radius)} and its circumference is {circle_circumference(radius)}.")
```

```plaintext {.output}
The area of a circle with radius 3 is 28.274333882308138 and its circumference is 18.84955592153876.
```

::: note | Format Specifiers

In the examples above, the `area` and `circumference` are printed with their full precision (with many decimal places). If you want to limit the number of decimal places, you can use **format specifiers**, which are placed after the colon `:` in the curly braces `{}`:

```python
print(f"The area of a circle with radius {radius} is {area:.2f}.")
```

```plaintext {.output}
The area of a circle with radius 3 is 28.27.
```

It is possible to allocate a certain number of characters for the output, padding with spaces if necessary:

```python
# Allocate 10 characters for the radius and 30 characters for the area
print(f"The area of a circle with radius {radius:10} is {area:30}.")
```

```plaintext {.output}
The area of a circle with radius          3 is             28.274333882308138.
```

Note that in the above example, the variables `radius` and `area` are right-aligned, which is the default for numbers (strings are left-aligned by default). To left-align them, use the `<` character:

```python
print(f"The area of a circle with radius {radius:<10} is {area:<30}.")
```

```plaintext {.output}
The area of a circle with radius 3          is 28.274333882308138             .
```

It is also possible to combine multiple format specifiers:

```python
# Allocate 10 characters for the radius and 30 characters for the area, with 2 decimal places
print(f"The area of a circle with radius {radius:<10.2f} is {area:<30.2f}.")
```

```plaintext {.output}
The area of a circle with radius 3.00       is 28.27                         .
```
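
A few other specifiers are worth knowing. The sketch below is not a complete list; the square brackets are only there to make the padding visible, and `value` is an arbitrary number:

```python
value = 1234.5678

right = f"[{value:>12.2f}]"   # > right-aligns explicitly
centre = f"[{value:^12.1f}]"  # ^ centres within the width
grouped = f"[{value:,.2f}]"   # , adds a thousands separator
percent = f"[{0.256:.1%}]"    # % multiplies by 100 and appends a percent sign
word = f"[{'abc':>6}]"        # strings can be right-aligned too

print(right)    # [     1234.57]
print(centre)   # [   1234.6   ]
print(grouped)  # [1,234.57]
print(percent)  # [25.6%]
print(word)     # [   abc]
```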

:::

# User Input

While we have been using the `print` function to display information to the user, in most applications, we would also like to receive input from the user.

You can **prompt for user input** with the `input()` function:

```python
name = input("Enter your name:")
print("Hello, " + name)
```

```plaintext {.output}
Enter your name: Dan
Hello, Dan
```

::: note {.warning} | User Input is a String

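The `input()` function always returns a string, even if the user enters a number. As a result, attempting to perform mathematical operations on the input will usually **result in an error** (and operators that strings do support, such as `*`, will repeat the text instead of multiplying):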

```python
number = input("Enter a number to square:")
square = number ** 2
print(f"The square of {number} is {square}.")
```

```plaintext {.output}
Enter a number to square: 4.2
```

```python {.error}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[16], line 2
      1 number = input("Enter a number to square:")
----> 2 square = number ** 2
      3 print(f"The square of {number} is {square}.")

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
```

Instead, you may need to convert the input to the appropriate data type:

```python
number = input("Enter a number to square:")
number = float(number)
square = number ** 2
print(f"The square of {number} is {square}.")
```

```plaintext {.output}
Enter a number to square: 4.2
The square of 4.2 is 17.64.
```
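
The conversion itself fails with a `ValueError` if the text is not a valid number. One possible way to guard against this is sketched below; `parse_number` is a hypothetical helper, and it takes the text as an argument so that it can be tried without typing anything in:

```python
def parse_number(text):
    """Hypothetical helper: return text as a float, or None if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None

print(parse_number("4.2"))   # 4.2
print(parse_number("four"))  # None
print(parse_number(" 7 "))   # 7.0 (surrounding spaces are ignored by float())
```

In a program, the result of `input()` could be passed to such a helper, and the user asked again whenever it returns `None`.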

:::