# Strings
*PY4E Textbook*

### A string is a sequence

A string is a sequence of characters. You can access the characters one at a time with the bracket operator:

In the code below, the second statement extracts the character at index position 1 from the `fruit` variable and assigns it to the `letter` variable.

The expression in brackets is called an *index*. The index indicates which character in the sequence you want (hence the name).

In [2]:
fruit = 'banana123'  # a string
letter = fruit[1]  # indexing starts at zero, so index 1 is the second character, 'a'
letter

'a'

We can index a string value in a variable by using square brackets. Notice the key components: On the left is the name of the variable or the string literal value. Next, we have an opening square bracket. Then, we have a number, which is called the index. Finally, we end with a closing square bracket.

For most people, the first letter of “banana” is “b”, not “a”. But in Python, the index is an offset from the beginning of the string, and the offset of the first letter is zero.

In [None]:
letter = fruit[0]
letter

'b'

So “b” is the 0th letter (“zero-th”) of “banana”, “a” is the 1th letter (“one-th”), and “n” is the 2th (“two-th”) letter.

You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get:

In [None]:
letter = fruit[1.5]
letter

TypeError: string indices must be integers

### Getting the length of a string using len

`len` is a built-in function that returns the number of characters in a string:

In [4]:
fruit = 'banana'
len(fruit)

6

Negative indices count backward from the end of the string. The expression `fruit[-1]` yields the last letter, `fruit[-2]` yields the second to last, and so on.

In [6]:
print(fruit[-1])  # last character
print(fruit[-2])  # second-to-last character

a
n
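
To connect `len` with indexing, here is a small sketch (the variable names `length`, `last`, `same_last`, and `out_of_range` are ours, chosen for illustration). It shows that the last valid index is `len(fruit) - 1`, that `fruit[-1]` picks the same character, and that indexing at `len(fruit)` itself fails.

```python
fruit = 'banana'
length = len(fruit)

# The last valid index is length - 1, because indexing starts at zero.
last = fruit[length - 1]

# A negative index counts from the end, so fruit[-1] is the same character.
same_last = fruit[-1]
print(last, same_last)

# Indexing at len(fruit) is one past the end and raises IndexError.
try:
    fruit[length]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)
```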


### Traversal through a string with a loop

A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a *traversal*. One way to write a traversal is with a `while` loop:

In [8]:
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index = index + 1 #increment

b
a
n
a
n
a


This loop traverses the string and displays each letter on a line by itself. The loop condition is `index < len(fruit)`, so when `index` is equal to the length of the string, the condition is false, and the body of the loop is not executed. The last character accessed is the one with the index `len(fruit)-1`, which is the last character in the string.
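
For comparison, the same traversal can be sketched with a `for` loop, which hands us each character in turn so there is no index variable to maintain. The list `letters` is only there so we can inspect what the loop visited; it is not part of the original example.

```python
fruit = 'banana'

# Collect the characters as we visit them (illustrative only).
letters = []
for letter in fruit:
    letters.append(letter)
    print(letter)  # one character per line, as in the while version
```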

### String slices
A segment of a string is called a *slice*. Selecting a slice is similar to selecting a character:

In [None]:
s = 'Monty Python'
print(s[0:5])
print(s[6:12])

Monty
Python


The operator `[n:m]` returns the part of the string from the “n-th” character to the “m-th” character, including the first but excluding the last.

If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string:

In [None]:
fruit = 'banana'
print(fruit[:3])
print(fruit[3:])

ban
ana


If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks:

In [None]:
fruit = 'banana'
fruit[3:3]


''
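
A few more slice behaviors are worth trying for yourself. The sketch below (variable names are ours) checks that a slice `s[n:m]` with both indices in range has length `m - n`, that an end index past the end of the string is quietly clipped rather than raising an error, and that negative indices work in slices too.

```python
s = 'Monty Python'
n, m = 6, 12

piece = s[n:m]
print(piece, len(piece))  # the length equals m - n

# Unlike indexing, slicing past the end does not raise an error.
clipped = s[6:100]

# Negative indices count from the end: the last six characters.
tail = s[-6:]

# First index greater than the second gives an empty string.
empty = s[5:2]
print(clipped, tail, repr(empty))
```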

### Strings are immutable

It is tempting to use the `[]` operator on the left side of an assignment, with the intention of changing a character in a string. For example:

In [None]:
greeting = 'Hello, world!'
greeting[0] = 'J'

TypeError: 'str' object does not support item assignment

The reason for the error is that strings are immutable, which means you can’t change an existing string. The best you can do is create a new string that is a variation on the original:

In [None]:
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]
print(new_greeting)

Jello, world!


This example concatenates a new first letter onto a slice of `greeting`. It has no effect on the original string.
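
The same idea extends to any position in the string. As a minimal sketch, the hypothetical helper `replace_char` below (our own name, not a built-in) builds a new string from the slice before the position, the new character, and the slice after it.

```python
def replace_char(text, position, new_char):
    """Return a copy of text with the character at position replaced."""
    return text[:position] + new_char + text[position + 1:]

greeting = 'Hello, world!'
changed = replace_char(greeting, 7, 'W')

print(changed)   # the new string
print(greeting)  # the original string is unchanged
```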

### Looping and counting

The following program counts the number of times the letter “a” appears in a string:

In [9]:
word = 'bananaaaaa'
count = 0
for letter in word:
    # On each iteration, 'letter' takes the value of the next character in word:
    # 'b' → 'a' → 'n' → 'a' → 'n' → 'a' → 'a' → 'a' → 'a' → 'a'
    if letter == 'a':
        # The current letter is 'a', so increment the counter by 1
        count = count + 1
print(count)

7


This program demonstrates another pattern of computation called a *counter*. The variable `count` is initialized to 0 and then incremented each time an “a” is found. When the loop exits, `count` contains the result: the total number of a’s.

In [10]:
# Using a while Loop
word = 'bananaaaaaaaa'
count = 0
index = 0  # Start at the first character

# Loop through each character using a while loop
while index < len(word):  # Continue until the end of the string
    if word[index] == 'a':  # Check if the current character is 'a'
        count += 1  # Increment the counter
    index += 1  # Move to the next character

print(count)  # Output: 10

10


You can also write the counter with a `while` loop, as shown above, but it is more verbose and typically slower for this task because you have to manage the index yourself.

The `for` loop is the preferred approach for iterating over sequences like strings.
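
The counter pattern can also be wrapped in a function so that it works for any word and any letter. The sketch below uses a hypothetical function name, `count_letter`, and compares its result with the built-in string method `count`, which does the same job.

```python
def count_letter(word, target):
    """Count how many times the single character target appears in word."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

total = count_letter('banana', 'a')

# The built-in str.count method gives the same answer.
builtin_total = 'banana'.count('a')
print(total, builtin_total)
```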

### The `in` operator

The word `in` is a boolean operator that takes two strings and returns `True` if the first appears as a substring in the second:

In [13]:
'a' in 'banana'


True

In [None]:
'seed' in 'banana'

False
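
Because `in` produces a boolean, it is often used directly in an `if` statement. The following sketch (variable names are ours) also shows that the test is case-sensitive and that `not in` reverses it.

```python
fruit = 'banana'

has_nan = 'nan' in fruit     # 'nan' appears starting at index 2
has_upper_a = 'A' in fruit   # uppercase 'A' does not match lowercase 'a'

if 'seed' not in fruit:
    message = 'No seeds here.'
else:
    message = 'Found a seed.'

print(has_nan, has_upper_a, message)
```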

### String comparison

The comparison operators work on strings.

In [None]:
# Get user input
word = input("Enter a word: ")

# Compare the word to 'banana'
if word == 'banana':
    print('All right, bananas.')
elif word < 'banana':
    print(f'Your word, "{word}", comes before "banana".')
else:
    print(f'Your word, "{word}", comes after "banana".')


Enter a word: Pineapple
Your word, "Pineapple", comes before "banana".


In [None]:
word = input("Enter a word: ")

if word.lower() == 'banana':
    print('All right, bananas.')
elif word.lower() < 'banana':
    print(f'Your word, "{word}", comes before "banana".')
else:
    print(f'Your word, "{word}", comes after "banana".')


Enter a word: Pineapple
Your word, "Pineapple", comes after "banana".


Explanation of the Comparison
- Lexicographical Comparison: In Python, strings are compared character by character, based on the Unicode values of their characters. The expression `word < 'banana'` checks whether `word` comes before `"banana"` in that ordering.
- Case Sensitivity: This comparison is case-sensitive. For example, `"apple" < "banana"` is `True`, as you would expect, but `"Pineapple" < "banana"` is also `True`, because all uppercase letters have lower Unicode values than lowercase letters. You can use `word.lower()` if you want to make the comparison case-insensitive.
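
To see the Unicode values behind these comparisons, we can use the built-in `ord` function. The sketch below (the list `words` and the other variable names are ours) also shows how the same ordering affects sorting, and how passing `key=str.lower` to `sorted` gives a case-insensitive order.

```python
# ord returns the Unicode value (code point) of a single character.
print(ord('P'), ord('b'))  # uppercase 'P' has the smaller value

capital_first = 'Pineapple' < 'banana'   # compares 'P' with 'b'
lower_first = 'pineapple' < 'banana'     # compares 'p' with 'b'

words = ['banana', 'Pineapple', 'apple']
case_sensitive = sorted(words)
case_insensitive = sorted(words, key=str.lower)
print(case_sensitive)
print(case_insensitive)
```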

### String methods
Strings are an example of Python *objects*. An object contains both data (the actual string itself) and methods, which are effectively functions that are built into the object and are available to any instance of the object.

Python has two built-in functions that help us inspect an object: the `type` function shows the type of an object, and the `dir` function lists the methods available for it.

In [None]:
stuff = 'Hello world'
type(stuff)

str

In [None]:
dir(stuff)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [None]:
help(str.capitalize)

Help on method_descriptor:

capitalize(self, /)
    Return a capitalized version of the string.
    
    More specifically, make the first character have upper case and the rest lower
    case.



In [None]:
help(str.strip)

Help on method_descriptor:

strip(self, chars=None, /)
    Return a copy of the string with leading and trailing whitespace removed.
    
    If chars is given and not None, remove characters in chars instead.



In [None]:
help(str.translate)

Help on method_descriptor:

translate(self, table, /)
    Replace each character in the string using the given translation table.
    
      table
        Translation table, which must be a mapping of Unicode ordinals to
        Unicode ordinals, strings, or None.
    
    The table must implement lookup/indexing via __getitem__, for instance a
    dictionary or list.  If this operation raises LookupError, the character is
    left untouched.  Characters mapped to None are deleted.



While the `dir` function lists the methods, and you can use `help` to get some simple documentation on a method, a better source of documentation for string methods would be

https://docs.python.org/library/stdtypes.html#string-methods.

Calling a *method* is similar to calling a function (it takes arguments and returns a value) but the syntax is different. We call a method by appending the method name to the variable name using the period as a delimiter.

For example, the method `upper` takes a string and returns a new string with all uppercase letters:


Instead of the function syntax `upper(word)`, it uses the method syntax `word.upper()`.

In [None]:
word = 'banana'
new_word = word.upper()
print(new_word)

BANANA


This form of dot notation specifies the name of the method, `upper`, and the name of the string to apply the method to, `word`. The empty parentheses indicate that this method takes no arguments.

A method call is called an *invocation*; in this case, we would say that we are invoking `upper` on the `word`.

For example, there is a string method named `find` that searches for the position of one string within another:

In [None]:
word = 'banana'
index = word.find('a')
print(index)

1


Explanation:

- This code assigns the string `'banana'` to the variable `word`.
- The `find('a')` method searches for the first occurrence of the substring `'a'` in `word`.
- In `'banana'`, the first `'a'` appears at index `1` (remember, indexing in Python starts at `0`).
- The output is `1`.

In this example, we invoke `find` on `word` and pass the letter we are looking for as an argument.

The `find` method can find substrings as well as characters:

In [None]:
word.find('na')

2

Explanation:

- `find('na')` looks for the substring `'na'` in `'banana'`.
- The first occurrence of `'na'` in `'banana'` starts at index `2`.
- So, the output of this would be `2`.

It can take as a second argument the index where it should start:

In [None]:
word.find('na', 3)

4

Explanation:

- This is a more advanced use of `find()`. Here, `word.find('na', 3)` is searching for the substring `'na'` starting from index `3` onwards.
- Starting the search from index `3` means that it will skip the initial `'na'` at index `2` and search for any occurrences from index `3` to the end of the string.
- In `'banana'`, the second `'na'` starts at index `4`.
- So, the output would be `4`.
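
Two further details of `find` are useful in practice: it returns `-1` when the substring is not present, and the start argument lets us walk through every occurrence. The loop below is a minimal sketch of that idea; the variable names `missing`, `positions`, and `start` are ours.

```python
word = 'banana'

# find returns -1 when the substring does not occur at all.
missing = word.find('z')
print(missing)

# Collect the index of every occurrence of 'na'.
positions = []
start = word.find('na')
while start != -1:
    positions.append(start)
    # Resume the search one position past the match we just found.
    start = word.find('na', start + 1)
print(positions)
```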

One common task is to remove white space (spaces, tabs, or newlines) from the beginning and end of a string using the `strip` method:

In [None]:
line = '  Here we go  '
line.strip()

'Here we go'
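
The related methods `lstrip` and `rstrip` (both listed in the `dir` output above) remove white space from only one side, and the `help` text for `strip` notes that an optional `chars` argument removes those characters instead. A short sketch, with variable names of our own choosing:

```python
line = '  Here we go  '

left_only = line.lstrip()    # leading white space removed
right_only = line.rstrip()   # trailing white space removed

# With an argument, strip removes those characters instead of white space.
no_stars = '***Here we go***'.strip('*')

# repr makes the remaining spaces visible when printing.
print(repr(left_only), repr(right_only), repr(no_stars))
print(repr(line))  # the original string is unchanged
```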

Some methods such as `startswith` return boolean values.

In [None]:
line = 'Have a nice day'
line.startswith('Have')

True

You will note that `startswith` requires case to match, so sometimes we take a line and map it all to lowercase before we do any checking using the `lower` method.

In [None]:
line = 'Have a nice day'
line.startswith('h')

False

In [None]:
print(line.lower())
line.lower().startswith('h')

have a nice day


True

### Parsing strings
Often, we want to look into a string and find a substring. For example, suppose we were presented with a series of lines formatted as follows:

`From stephen.marquard@augusta.edu Sat Jan  5 09:14:16 2024`

If we wanted to pull out only the second half of the address (i.e., `augusta.edu`) from each line, we could do this by using the `find` method and string slicing.

First, we will find the position of the at-sign in the string. Then we will find the position of the first space *after* the at-sign. And then we will use string slicing to extract the portion of the string which we are looking for.

In [None]:
# Step 1: Find the Position of the @ Symbol
data = 'From stephen.marquard@augusta.edu Sat Jan  5 09:14:16 2024'
atpos = data.find('@')
print(atpos)

21


In [None]:
# Step 2: Find the Position of the First Space After the @ Symbol
sppos = data.find(' ',atpos)
print(sppos)

33


In [None]:
# Step 3: Use String Slicing to Extract the Domain
host = data[atpos+1:sppos]
print(host)

augusta.edu


Explanation:

- `data[atpos+1 : sppos]` uses string slicing to extract the characters starting from one position after `@` (i.e., `atpos + 1`) up to, but not including, the position of the first space after `@` (i.e., `sppos`).
- This slice effectively captures the substring between `@` and the next space.
- In this case, the slice `data[22:33]` extracts the substring 'augusta.edu', which is the domain part of the email address.
- The variable `host` is assigned this value.

We use a version of the `find` method which allows us to specify a position in the string where we want `find` to start looking. When we slice, we extract the characters from one position beyond the at-sign up to, but not including, the space character.
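
The three steps can be gathered into one function. This is only a sketch: the name `extract_host` and the decisions to return `None` when there is no at-sign, and to read to the end of the line when no space follows the address, are our own assumptions rather than part of the example above.

```python
def extract_host(line):
    """Return the text between '@' and the next space, or None if no '@'."""
    atpos = line.find('@')
    if atpos == -1:
        return None  # assumption: signal "no address" with None
    sppos = line.find(' ', atpos)
    if sppos == -1:
        sppos = len(line)  # assumption: the address runs to the end of the line
    return line[atpos + 1:sppos]

data = 'From stephen.marquard@augusta.edu Sat Jan  5 09:14:16 2024'
host = extract_host(data)
print(host)
print(extract_host('No address on this line'))
```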

The documentation for the `find` method is available at

https://docs.python.org/library/stdtypes.html#string-methods.

### Formatted String Literals
A formatted string literal (often referred to simply as an f-string) allows Python expressions to be used within string literals. This is accomplished by prepending an `f` to the string literal and enclosing expressions in curly braces `{}`.

For example, wrapping a variable name in curly braces inside an f-string will cause it to be replaced by its value:

In [None]:
camels = 42
f'I have spotted {camels} camels.'

'I have spotted 42 camels.'

Several expressions can be included within a single string literal in order to create more complex strings.

In [None]:
years = 3
count = 0.1
species = 'camels'
f'In {years} years I have spotted {count} {species}.'

'In 3 years I have spotted 0.1 camels.'
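
The curly braces can hold any expression, not just a variable name, and a format specifier after a colon controls how the value is displayed. The sketch below reuses `camels`, `years`, and `species` from the examples above; `summary` and `rounded` are names we introduce for illustration.

```python
camels = 42
years = 3
species = 'camels'

# Method calls and arithmetic are evaluated inside the braces.
summary = f'{species.upper()}: {camels / years} per year'
print(summary)

# ':.2f' formats the value as a float with two digits after the decimal point.
rounded = f'{2 / 3:.2f}'
print(rounded)
```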

Formatted string literals are powerful, and they can do even more than is covered here. You can read more about them at

https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals.