# Strings

## A string is a sequence

In Python, a string is a sequence of characters. It is one of the built-in data types and represents textual data, written as characters enclosed in single quotes `'` or double quotes `"`.

Strings in Python support indexing, slicing, iteration, and various sequence operations. This means you can access individual characters of a string using index positions, extract substrings using slicing, iterate over the characters using loops, and perform operations like concatenation and repetition.

Here are some examples of string operations that demonstrate their sequence-like behavior {cite:p}`downey2015think,PythonDocumentation`:

### Indexing

```{figure} Fig5.01.png
---
width: 600px
align: left
---
Visual representation of "Hello, World!".
```

```python
text = "Hello, World!"
print(text[0])  # Output: 'H'
print(text[7])  # Output: 'W'
```

```
H
W
```
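Indexes start at 0, so `text[7]` is the eighth character. Python also accepts negative indexes, which count backwards from the end, and raises an `IndexError` for an index past the end. A short sketch on the same string:

```python
text = "Hello, World!"

# Negative indexes count from the end: -1 is the last character
print(text[-1])  # Output: !
print(text[-6])  # Output: W  (same character as text[7])

# The last valid positive index is len(text) - 1
print(text[len(text) - 1])  # Output: !

# Indexing past the end raises IndexError
try:
    print(text[len(text)])
except IndexError as err:
    print("IndexError:", err)
```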


### Slicing

```python
text = "Hello, World!"
print(text[0:5])  # Output: 'Hello'
print(text[7:])   # Output: 'World!'
```

```
Hello
World!
```


### Iteration using a loop

```python
text = "Hello"
for char in text:
    print(char)
```

```
H
e
l
l
o
```


### Concatenation

```python
str1 = "Hello"
str2 = "World"
result = str1 + ", " + str2
print(result)  # Output: 'Hello, World'
```

```
Hello, World
```


### Repetition

```python
str1 = "Hello "
result = str1 * 3
print(result)  # Output: 'Hello Hello Hello '
```

```
Hello Hello Hello 
```


Because strings are sequences, they share many properties with other sequence types in Python, such as lists and tuples: `len()`, indexing, slicing, `for` loops, concatenation, and repetition all work the same way. This makes strings versatile for text manipulation tasks.
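As a quick illustration (the variable names are just examples), the same `len()`, indexing, slicing, and repetition operations work unchanged on a string, a list, and a tuple:

```python
text = "abc"
letters = ["a", "b", "c"]
chars = ("a", "b", "c")

# Each sequence type supports the same basic operations
for seq in (text, letters, chars):
    print(len(seq), seq[0], seq[1:], seq * 2)
```

Slicing and repetition return an object of the same type as the original: a string, a list, or a tuple.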

## len

In Python, the `len()` function is used to find the length of a string or any other sequence (e.g., list, tuple). The `len()` function returns the number of elements (characters in the case of a string) in the given sequence.

Here's how you can use the `len()` function to find the length of a string:

```python
text = "Hello, World!"
length = len(text)
print(length)  # Output: 13 (length of the string 'Hello, World!')
```

```
13
```


In this example, the `len()` function is applied to the string variable `text`, and it returns the length of the string, which is 13 characters, including spaces and punctuation.

The `len()` function is a handy tool for performing various operations on strings, such as checking if a string is empty, iterating over characters, or controlling loops based on the length of a string. It's a simple and commonly used function in Python for working with sequences of data.
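Two of these uses in a short sketch: testing for an empty string, and combining `len()` with `range()` to loop over every valid index position. The helper name `is_empty` is only for illustration.

```python
def is_empty(s):
    """Return True if s has no characters (illustrative helper)."""
    return len(s) == 0

print(is_empty(""))       # Output: True
print(is_empty("Hello"))  # Output: False

text = "Hello"
# range(len(text)) produces 0, 1, ..., len(text) - 1: every valid index
for i in range(len(text)):
    print(i, text[i])
```

In everyday Python, `if not s:` is the more common way to test for an empty string, since an empty string is falsy.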

## Traversal with a for loop

In Python, you can traverse (iterate over) a sequence such as a string, list, or tuple, or another collection such as a dictionary, using a `for` loop. The `for` loop gives you each element one at a time so that you can perform operations on it {cite:p}`downey2015think,PythonDocumentation`.

Let's look at how to use a `for` loop to traverse a string:

```python
text = "Hello, World!"
for char in text:
    print(char)
```

```
H
e
l
l
o
,
 
W
o
r
l
d
!
```


In this example, the `for` loop iterates over each character in the string `text`. For each iteration, the variable `char` holds the current character of the string, and we print it on a separate line. The loop continues until all characters in the string have been processed.
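When you need each character's position as well as the character itself, the built-in `enumerate()` yields `(index, character)` pairs, so you do not have to manage an index variable by hand:

```python
text = "Hi!"

# enumerate() pairs each character with its index, starting at 0
for index, char in enumerate(text):
    print(index, char)
# Output:
# 0 H
# 1 i
# 2 !
```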

You can also use a `for` loop to traverse other sequences, such as lists and tuples:

```python
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    print(num)
```

```
1
2
3
4
5
```


Similarly, you can use a `for` loop to traverse the keys, values, or items of a dictionary:

```python
student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}
for name, score in student_scores.items():
    print(f"{name}: {score}")
```

```
Alice: 85
Bob: 92
Charlie: 78
```
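The example above uses `.items()`. Looping over the dictionary itself visits its keys, and `.values()` visits only the values; for instance, summing the scores:

```python
student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}

# Looping over a dictionary directly visits its keys
for name in student_scores:
    print(name)

# .values() visits only the values
total = 0
for score in student_scores.values():
    total += score
print(total)  # Output: 255
```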


`````{admonition} Remark
:class: important

We will delve into the topic of dictionaries during our upcoming lectures.
`````


The `for` loop is a versatile construct that allows you to process each element in a sequence efficiently. It is a fundamental part of Python and is commonly used in various programming tasks to work with collections of data.


## String slices

In Python, string slicing allows you to extract a substring from a given string by specifying a range of indices. The general syntax for slicing a string is as follows:

```python
string[start_index:stop_index]
```

Here's what each part of the syntax means:
- `start_index`: The index of the first character of the substring (inclusive).

- `stop_index`: The index of the first character after the end of the substring (exclusive).

The result of slicing will be a new string containing the characters from the `start_index` up to, but not including, the `stop_index`.

Let's see some examples of string slicing:

```python
text = "Hello, World!"

# Slice from index 0 to 5 (exclusive)
substring1 = text[0:5]
print(substring1)  # Output: "Hello"

# Slice from index 7 to the end of the string
substring2 = text[7:]
print(substring2)  # Output: "World!"

# Slice from index 2 to 8 (exclusive)
substring3 = text[2:8]
print(substring3)  # Output: "llo, W"

# Slice from the beginning to index 5 (exclusive)
substring4 = text[:5]
print(substring4)  # Output: "Hello"

# Slice the entire string (returns a copy of the original string)
substring5 = text[:]
print(substring5)  # Output: "Hello, World!"

# Negative indices can also be used for slicing (counting from the end of the string)
substring6 = text[-6:-1]
print(substring6)  # Output: "World"
```

```
Hello
World!
llo, W
Hello
Hello, World!
World
```


As you can see, the `start_index` and `stop_index` define the range of characters to include in the substring. If `start_index` is not specified, it defaults to 0, and if `stop_index` is not specified, it defaults to the end of the string.
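A few consequences of these rules, sketched on the same string: omitted indexes fall back to their defaults, out-of-range slice indexes are clamped rather than raising an error (unlike plain indexing), and splitting a string at any index `k` and rejoining the two slices gives back the original.

```python
text = "Hello, World!"

# Omitted indexes default to 0 and len(text)
print(text[:5] == text[0:5])          # Output: True
print(text[7:] == text[7:len(text)])  # Output: True

# Slice indexes beyond the end are clamped, so no IndexError is raised
print(repr(text[7:100]))  # Output: 'World!'
print(repr(text[20:]))    # Output: ''

# text[:k] + text[k:] rebuilds the original for any k
k = 4
print(text[:k] + text[k:] == text)  # Output: True
```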

String slicing is a useful feature for extracting specific parts of a string and working with substrings in Python. It allows you to manipulate strings and obtain the information you need from a larger string.

## Strings are immutable
In Python, strings are immutable objects. This means that once a string is created, its contents cannot be changed or modified. If you want to modify a string, you must create a new string with the desired changes. Let's see some examples to demonstrate the immutability of strings {cite:p}`downey2015think,PythonDocumentation`:

```python
text = "Hello, World!"

# Attempting to change a character at a specific index raises a TypeError
try:
    text[0] = 'h'
except TypeError as err:
    print(err)  # Output: 'str' object does not support item assignment

# Slicing to create a new string with changes
modified_text = text[:7] + 'Python!'
print(modified_text)  # Output: "Hello, Python!"
```

In the first example, we attempt to change the first character of the string `text` from 'H' to 'h'. This raises a `TypeError` because strings do not support item assignment; the `try`/`except` block catches the error and prints its message.

To modify the string, we can use slicing to build a new string with the desired changes. In the second example, we slice the original string up to index 7 (exclusive), which gives `"Hello, "`, and concatenate the new substring `'Python!'`. This creates a new string, `"Hello, Python!"`.

The immutability of strings is an essential property in Python, as it ensures the integrity of strings and prevents unintended changes to their content. If you need to perform modifications on strings, you can use string methods and string concatenation to create new strings with the desired changes while keeping the original string unchanged.
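The sketch below shows both points: a method such as `upper()` returns a new string and leaves the original untouched, and reassigning a variable makes it refer to a new string object rather than changing the old one (the name `alias` is just illustrative).

```python
text = "Hello, World!"

# upper() builds and returns a new string; text is unchanged
upper_text = text.upper()
print(text)        # Output: Hello, World!
print(upper_text)  # Output: HELLO, WORLD!

# alias refers to the same string object as text
alias = text

# Concatenation creates a new string, and text is rebound to it
text = text + "!!"
print(text)   # Output: Hello, World!!!
print(alias)  # Output: Hello, World!  (the original string is unchanged)
```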

## Searching

### Finding a character in a string
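Here's a `find` function. It returns the index of the first occurrence of `letter` in `word`, starting the search at `start_index` (0 by default), or -1 if the letter does not occur: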


```python
def find(word, letter, start_index=0):
    index = start_index
    while index < len(word):
        if word[index] == letter:
            return index
        index += 1
    return -1
```

``````{admonition} Questions
1. What would be the output of the following?
```python
find('Hello World', 'W')
```
2. What would be the output of the following?
```python
find('Hello World', 'w')
```

``````

The optional `start_index` parameter lets the search begin partway through the string:

```python
text = "Hello, World!"

# Start the search from index 3
result1 = find(text, 'l', 3)
print(result1)  # Output: 3 (index of the first 'l' at or after index 3)

# Start the search from the beginning (default behavior)
result2 = find(text, 'o')
print(result2)  # Output: 4 (index of the first 'o' in the string)

# Search for a character that doesn't exist in the string
result3 = find(text, 'z')
print(result3)  # Output: -1 (character 'z' not found in the string)
```

```
3
4
-1
```
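The same loop pattern can count characters instead of locating them. The function name `count` below is illustrative; Python's built-in `str.count()` method gives the same result for single characters.

```python
def count(word, letter):
    """Return how many times letter occurs in word (illustrative helper)."""
    total = 0
    for char in word:
        if char == letter:
            total += 1
    return total

text = "Hello, World!"
print(count(text, 'l'))  # Output: 3
print(text.count('l'))   # Output: 3 (built-in method)
print(count(text, 'z'))  # Output: 0
```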


### Finding a character/word in a string
In Python, you can search for substrings within a string using various methods and operations. Here are some common approaches for searching in a string:

#### Using the `in` keyword:
The `in` keyword is used to check if a substring exists within a given string. It returns a Boolean value `True` if the substring is found and `False` otherwise.


```python
text = "Hello, World!"

if "Hello" in text:
    print("Substring found.")
else:
    print("Substring not found.")
```

```
Substring found.
```


#### Using the `find()` method:

The `find()` method returns the index of the first occurrence of the substring within the string. If the substring is not found, it returns -1.

```python
text = "Hello, World!"

index = text.find("World")
if index != -1:
    print("Substring found at index:", index)
else:
    print("Substring not found.")
```

```
Substring found at index: 7
```
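`find()` also accepts an optional start position, as in `text.find(sub, start)`. Calling it repeatedly, each time starting one character after the previous match, collects every position of a substring. `find_all` is a hypothetical helper name used only for this sketch:

```python
def find_all(text, sub):
    """Return every index where sub starts in text (illustrative helper)."""
    positions = []
    index = text.find(sub)
    while index != -1:
        positions.append(index)
        # Resume the search one character after the last match
        index = text.find(sub, index + 1)
    return positions

text = "Hello, World!"
print(find_all(text, "l"))  # Output: [2, 3, 10]
print(find_all(text, "o"))  # Output: [4, 8]
print(find_all(text, "z"))  # Output: []
```

Because the search resumes just one character after each match, overlapping matches are included, for example `find_all("aaa", "aa")` gives `[0, 1]`.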


#### Using the `index()` method:
Similar to `find()`, the `index()` method returns the index of the first occurrence of the substring within the string. However, if the substring is not found, it raises a `ValueError`.


```python
text = "Hello, World!"

try:
    index = text.index("World")
    print("Substring found at index:", index)
except ValueError:
    print("Substring not found.")
```

```
Substring found at index: 7
```


#### Using regular expressions (with the `re` module):

For more advanced and flexible searching, you can use regular expressions with the `re` module.

```python
import re

text = "Hello, World!"

matches = re.findall(r"\b\w{5}\b", text)
if matches:
    print("Substring found:", matches)
else:
    print("Substring not found.")
```

```
Substring found: ['Hello', 'World']
```


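In this example, the regular expression `\b\w{5}\b` matches any run of exactly five word characters (letters, digits, or underscores) between word boundaries, which here finds the two five-letter words. The `findall()` function returns a list of all matches found in the string.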
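When you only need the first match and its position, `re.search()` returns a match object (or `None` if nothing matches), whose `group()` and `start()` methods give the matched text and its index. The pattern here, a capital `W` followed by word characters, is just an example:

```python
import re

text = "Hello, World!"

# Search for a capital W followed by one or more word characters
match = re.search(r"W\w+", text)
if match:
    print(match.group(), match.start())  # Output: World 7
else:
    print("Substring not found.")
```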

These are some of the common ways to search for substrings within a string in Python. Depending on your specific needs, you can choose the appropriate method for your use case.

## String methods
In Python, strings are objects that have several built-in methods to perform various operations and manipulations on strings. These methods are used to transform, search, split, and perform other tasks on strings. Here are some commonly used string methods {cite:p}`downey2015think,PythonDocumentation`:

### `upper()`
Converts all characters in the string to uppercase.


```python
text = "hello, world!"
print(text.upper())  # Output: "HELLO, WORLD!"
```

```
HELLO, WORLD!
```


### `lower()`
Converts all characters in the string to lowercase.

```python
text = "Hello, World!"
print(text.lower())  # Output: "hello, world!"
```

```
hello, world!
```


### `capitalize()`
Capitalizes the first character of the string and makes the rest lowercase.

```python
text = "hello, world!"
print(text.capitalize())  # Output: "Hello, world!"
```

```
Hello, world!
```


### `strip()`
Removes leading and trailing whitespace characters (spaces, tabs, newlines) from the string.

```python
text = "   hello, world!   "
print(text.strip())  # Output: "hello, world!"
```

```
hello, world!
```


See also `rstrip()` and `lstrip()`, which remove whitespace from only the right end or only the left end of the string, respectively.
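A quick sketch of the one-sided variants, using `repr()` so that the remaining spaces are visible. The `strip()` family also accepts an optional string of characters to remove instead of whitespace:

```python
text = "   hello, world!   "

# repr() makes the remaining spaces visible
print(repr(text.lstrip()))  # Output: 'hello, world!   '
print(repr(text.rstrip()))  # Output: '   hello, world!'

# With an argument, strip() removes those characters instead of whitespace
print("xxhixx".strip("x"))  # Output: hi
```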

### `split()`
Splits the string into a list of substrings based on a given delimiter.

```python
text = "apple,banana,orange"
fruits = text.split(",")
print(fruits)  # Output: ['apple', 'banana', 'orange']
```

```
['apple', 'banana', 'orange']
```
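Called without an argument, `split()` splits on runs of whitespace and ignores leading and trailing whitespace, which differs from splitting on a single space. An optional second argument limits the number of splits:

```python
sentence = "  the quick   brown fox "

# No argument: split on any run of whitespace
print(sentence.split())  # Output: ['the', 'quick', 'brown', 'fox']

# An explicit " " separator keeps empty strings between consecutive spaces
print(sentence.split(" "))

# maxsplit=1: split only at the first comma
print("apple,banana,orange".split(",", 1))  # Output: ['apple', 'banana,orange']
```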


### `join()`
Joins a list of strings into a single string, using the calling string as the separator.

```python
fruits = ['apple', 'banana', 'orange']
text = ",".join(fruits)
print(text)  # Output: "apple,banana,orange"
```

```
apple,banana,orange
```
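The separator can be any string. Every item must already be a string (passing numbers raises a `TypeError`), and `join()` reverses `split()` when the same separator is used:

```python
fruits = ['apple', 'banana', 'orange']
print(" - ".join(fruits))  # Output: apple - banana - orange

# join() needs strings, so convert other values first
numbers = [1, 2, 3]
print(", ".join(str(n) for n in numbers))  # Output: 1, 2, 3

# join() undoes split() when the same separator is used
text = "apple,banana,orange"
print(",".join(text.split(",")) == text)  # Output: True
```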


### `replace()`
Replaces occurrences of a substring with another substring.

```python
text = "Hello, World!"
modified_text = text.replace("Hello", "Hi")
print(modified_text)  # Output: "Hi, World!"
```

```
Hi, World!
```
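`replace()` replaces every occurrence by default; an optional third argument limits how many are replaced. Because strings are immutable, the original string is left unchanged:

```python
text = "one fish, two fish"

print(text.replace("fish", "cat"))     # Output: one cat, two cat
print(text.replace("fish", "cat", 1))  # Output: one cat, two fish
print(text)                            # Output: one fish, two fish
```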


### `find()`
Finds the index of the first occurrence of a substring in the string. Returns -1 if not found.

```python
text = "Hello, World!"
index = text.find("World")
print(index)  # Output: 7
```

```
7
```


These are just a few examples of the many string methods available in Python. String methods are powerful tools for handling and manipulating text data, and they make it easier to work with strings in Python.

The following table summarizes these methods:

| Method            | Description                                                                                                         |
|-------------------|---------------------------------------------------------------------------------------------------------------------|
| `upper()`         | Converts all characters in the string to uppercase.                                                                |
| `lower()`         | Converts all characters in the string to lowercase.                                                                |
| `capitalize()`    | Capitalizes the first character of the string and makes the rest lowercase.                                        |
| `strip()`         | Removes leading and trailing whitespace characters (spaces, tabs, newlines) from the string.                      |
| `split()`         | Splits the string into a list of substrings based on a given delimiter.                                            |
| `join()`          | Joins a list of strings into a single string, using the calling string as the separator.                          |
| `replace()`       | Replaces occurrences of a substring with another substring.                                                        |
| `find()`          | Finds the index of the first occurrence of a substring in the string. Returns -1 if not found.                    |

## The in operator
In Python, the `in` operator checks whether a value exists within a sequence or collection, such as a string, list, tuple, or dictionary. It returns the Boolean value `True` if the value is found and `False` otherwise. For strings, `in` checks for a substring of any length, not just a single character.

Here are some examples of using the `in` operator:

### Using `in` with a string

```python
text = "Hello, World!"
print('o' in text)  # Output: True
print('z' in text)  # Output: False
```

```
True
False
```


### Using `in` with a list

```python
fruits = ['apple', 'banana', 'orange']
print('banana' in fruits)  # Output: True
print('grapes' in fruits)   # Output: False
```

```
True
False
```


### Using `in` with a tuple

```python
numbers = (1, 2, 3, 4, 5)
print(3 in numbers)  # Output: True
print(6 in numbers)  # Output: False
```

```
True
False
```


### Using `in` with a dictionary (checks for keys, not values):

```python
student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}
print("Bob" in student_scores)   # Output: True
print("Eve" in student_scores)   # Output: False
```

```
True
False
```


The `in` operator is commonly used in conditional statements to check whether a value is present before acting on it. It is more concise than searching manually with a loop or with methods like `find()` or `index()`.

Keep in mind that what `in` checks depends on the type of collection: for strings it checks for substrings, for lists and tuples it checks whether any element equals the value, and for dictionaries it checks keys, not values.
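The sketch below contrasts these cases: on a string, `in` looks for a substring of any length and is case-sensitive; on a dictionary, it looks at keys, so checking values requires `.values()`:

```python
text = "Hello, World!"
print("World" in text)  # Output: True (substring, not just one character)
print("world" in text)  # Output: False (the check is case-sensitive)

student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}
print(92 in student_scores)           # Output: False (92 is a value, not a key)
print(92 in student_scores.values())  # Output: True
```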

## String comparison

In Python, you can compare strings using various comparison operators to
check if they are equal, not equal, greater than, or less than each
other. Here are the commonly used string comparison operators in Python {cite:p}`downey2015think,PythonDocumentation`:

1.  **Equality (`==`):** It checks if two strings have the same content.

2.  **Inequality (`!=`):** It checks if two strings have different
    content.

3.  **Greater than (`>`):** It checks if one string comes after the other
    in lexicographic (dictionary) order.

4.  **Less than (`<`):** It checks if one string comes before the other
    in lexicographic order.

5.  **Greater than or equal to (`>=`):** It checks if one string comes
    after or is equal to the other in lexicographic order.

6.  **Less than or equal to (`<=`):** It checks if one string comes
    before or is equal to the other in lexicographic order.

Here are some examples to illustrate these comparisons:

```python
# Equality check
str1 = "hello"
str2 = "Hello"
print(str1 == str2)  # Output: False

# Inequality check
str3 = "world"
str4 = "world"
print(str3 != str4)  # Output: False

# Greater than and Less than check
str5 = "apple"
str6 = "banana"
print(str5 > str6)   # Output: False
print(str5 < str6)   # Output: True

# Greater than or equal to and Less than or equal to check
str7 = "python"
str8 = "java"
print(str7 >= str8)  # Output: True
print(str7 <= str8)  # Output: False
```

```
False
False
False
True
True
False
```


Note that string comparisons are case-sensitive. For case-insensitive comparisons, you can convert the strings to lowercase or uppercase before performing the comparison.

```python
str1 = "hello"
str2 = "Hello"
print(str1.lower() == str2.lower())  # Output: True (case-insensitive comparison)
```

```
True
```


Also, keep in mind that Python compares strings lexicographically, character by character, based on their Unicode code points. So `"a"` is less than `"b"`, and `"Z"` is less than `"a"`, because every uppercase ASCII letter has a smaller code point than every lowercase one.
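The built-in `ord()` function shows the code point behind each comparison. Because all uppercase ASCII letters come before all lowercase ones, `sorted()` puts `"Cherry"` first unless you sort with a case-insensitive key such as `str.lower` (the word list is just an example):

```python
print(ord("Z"), ord("a"))  # Output: 90 97
print("Z" < "a")           # Output: True

# Strings are compared character by character until they differ
print("apple" < "apricot")  # Output: True ('p' < 'r')
print("app" < "apple")      # Output: True (a prefix comes first)

words = ["banana", "apple", "Cherry"]
print(sorted(words))                 # Output: ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))  # Output: ['apple', 'banana', 'Cherry']
```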