# Chapter 8 - Strings

A string is a <b>sequence</b>, which means it is an ordered collection of other values.

***

## A string is a sequence

The characters of a string can be accessed one at a time with the bracket operator. The expression in brackets is called the <b>index</b>.

In [1]:
fruit = 'banana'
letter = fruit[1]

letter

'a'

Indices start at $0$, so `fruit[1]` is the second character. To get the first character, use index $0$:

In [2]:
letter = fruit[0]

letter

'b'

Instead of a number, the index can be any expression that contains variables and operators:

In [3]:
i = 1
fruit[i]

'a'

In [4]:
fruit[i + 1]

'n'

The value of the expression has to be an integer.

In [5]:
letter = fruit[1.5]

TypeError: string indices must be integers

## `len`

`len` is a built-in function that returns the number of characters in a string.

In [6]:
fruit = 'banana'
len(fruit)

6

Getting the last character of a string:

In [7]:
length = len(fruit)

### Wrong

In [8]:
last = fruit[length]

IndexError: string index out of range

Because indexing starts at 0, the six letters of `'banana'` are numbered 0 to 5, so there is no letter at index 6.

### Right

In [9]:
last = fruit[length - 1]

last

'a'
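
As a side note to the `length - 1` approach, Python also accepts negative indices, which count backward from the end of the string. The following is a minimal sketch; the variable names `last` and `second_to_last` are chosen only for illustration.

```python
fruit = 'banana'

# Negative indices count backward from the end of the string.
last = fruit[-1]            # same character as fruit[len(fruit) - 1]
second_to_last = fruit[-2]  # same character as fruit[len(fruit) - 2]

print(last)            # a
print(second_to_last)  # n
```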

***

## Traversal with a `for` loop

Often you want to access one letter at a time and perform some action on each letter. This pattern is called <b>traversal</b>. It can be done in two different ways. The first way is the traditional `while` loop:

In [10]:
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index += 1

b
a
n
a
n
a


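A `while` loop like this can be written in almost any programming language. Python also offers a more concise way to traverse a string, the `for` loop, which visits the characters directly without an index (many other languages have a similar for-each construct):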

In [11]:
for letter in fruit:
    print(letter)

b
a
n
a
n
a
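
If both the index and the letter are needed, the two traversal styles above can be combined with the built-in `enumerate` function. This is only a sketch; the list `pairs` is an assumption added here so the result can be inspected afterwards.

```python
fruit = 'banana'

# enumerate yields (index, letter) pairs, so no manual counter is needed.
pairs = []
for index, letter in enumerate(fruit):
    pairs.append((index, letter))
    print(index, letter)
```

The first line printed is `0 b` and the last is `5 a`, matching the indices used by the `while` loop.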


Using the `for` loop with concatenation (string addition)

In [12]:
prefixes = 'JKLMNOPQ'
suffix = 'ack'

for letter in prefixes:
    print(letter + suffix)

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack


The output is not quite right because "Ouack" and "Quack" are misspelled. Fixed version:

In [13]:
prefixes = 'JKLMNOPQ'
suffix = 'ack'

for letter in prefixes:
    if letter == 'O' or letter == 'Q':
        print(letter + 'u' + suffix)
    else:
        print(letter + suffix)

Jack
Kack
Lack
Mack
Nack
Ouack
Pack
Quack


***

## String slices

A segment of a string is called a slice. Selecting a slice is similar to selecting a character:

In [14]:
s = 'Monty Python'

In [15]:
s[0:5]

'Monty'

In [16]:
s[6:12]

'Python'

The operator `[n:m]` returns the part of the string from index `n` to index `m`, including the character at `n` but excluding the character at `m`. It might help to imagine the indices pointing <i>between</i> the characters, as shown in the image below.<br>
<br>
<img src="https://i.imgur.com/9Ujmme3.png" alt="String Slices">

Omitting the first index returns the slice from the beginning of the string up to (but not including) index `m`:

In [17]:
fruit = 'banana'
fruit[:3]

'ban'

Omitting the second index returns the slice from index `n` to the end of the string:

In [18]:
fruit = 'banana'
fruit[3:]

'ana'

If the first index is greater than or equal to the second index the result is an <b>empty string</b>.

In [19]:
fruit = 'banana'
fruit[3:3]

''

In [20]:
type(fruit[3:3])

str

In [21]:
len(fruit[3:3])

0

Omitting both the first and the last index will return the full string

In [22]:
fruit[:]

'banana'
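
Two properties follow from the slicing rules in this section and can be checked with a short sketch. The names `n`, `m`, `k`, `piece` and `rejoined` are chosen for illustration, and the length rule assumes `0 <= n <= m <= len(s)`.

```python
s = 'Monty Python'
n, m = 6, 12

# For 0 <= n <= m <= len(s), the slice s[n:m] contains m - n characters.
piece = s[n:m]
print(piece, len(piece))  # Python 6

# Splitting at any index k and joining the halves gives back the original.
k = 5
rejoined = s[:k] + s[k:]
print(rejoined == s)      # True
```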

***

## Strings are immutable

In [23]:
greeting = 'Hello, world!'
greeting[0] = 'J'

TypeError: 'str' object does not support item assignment

You cannot change existing strings because they are immutable. The best you can do is to create a new string.

In [24]:
greeting = 'Hello world!'
new_greeting = 'J' + greeting[1:]

new_greeting

'Jello world!'
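
The same idea can be wrapped in a small function. `replace_at` is a hypothetical helper, not a built-in, and this sketch assumes a non-negative `index` that lies within the string.

```python
def replace_at(text, index, new_char):
    """Return a new string with the character at index replaced.

    Hypothetical helper for illustration; text itself is never modified.
    Assumes 0 <= index < len(text).
    """
    return text[:index] + new_char + text[index + 1:]


greeting = 'Hello world!'
new_greeting = replace_at(greeting, 0, 'J')

print(new_greeting)  # Jello world!
print(greeting)      # Hello world!
```

The original string is unchanged; only a new string is created from slices of the old one.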

***

## Searching

In [25]:
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1

The function `find` is, in a sense, the inverse of the bracket operator `[]`. Instead of taking an index and extracting the corresponding character, it takes a string and a character and returns the first index where the character appears. If the character is not found in the string, the function returns `-1`.<br>
<br>
This pattern of computation is called a <b>search</b>.
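
A short usage sketch shows the inverse relationship: feeding the result of `find` back into the bracket operator returns the letter that was searched for. The function is repeated so that the block runs on its own.

```python
def find(word, letter):
    """Return the first index of letter in word, or -1 if it is absent."""
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1


index = find('banana', 'n')
print(index)                # 2
print('banana'[index])      # n, the bracket operator undoes the search
print(find('banana', 'x'))  # -1
```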

***

## Looping and counting

The following code counts the number of times the letter <i>a</i> appears in a string.

In [26]:
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)

3


The variable `count` is initialized to `0` and then incremented each time an <i>a</i> is found. When the loop exits, `count` contains the result: the total number of <i>a</i>'s.
<br>
This pattern of computation is called a <b>counter</b>.
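
The counter can be generalized by putting it in a function that accepts the string and the letter as arguments. `count_letter` is a hypothetical name used only for this sketch; the built-in string method `count` is shown for comparison.

```python
def count_letter(word, letter):
    """Count how often letter occurs in word (hypothetical helper)."""
    count = 0
    for character in word:
        if character == letter:
            count = count + 1
    return count


print(count_letter('banana', 'a'))  # 3
print('banana'.count('a'))          # 3, the built-in method agrees
```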

***

## String methods

Strings provide <b>methods</b> that perform a variety of useful operations. A method is similar to a function, but the syntax is different. See, for example, the `upper` method:

In [27]:
word = 'banana'
new_word = word.upper()

new_word

'BANANA'

This form of <b>dot notation</b> specifies the name of the method, `upper`, and the name of the string to apply the method to, `word`. The empty parentheses indicate that this method takes no arguments.<br>
<br>
A method call is called an <b>invocation</b>; in this case, we would say that we are invoking `upper` on `word`.<br>
<br>
Further string methods can be found in the official documentation: https://docs.python.org/3/library/stdtypes.html#string-methods<br>
There are also other online sources like w3schools that offer more explanation and examples: https://www.w3schools.com/python/python_ref_string.asp
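
As a small sample of what the documentation describes, here are a few more invocations on `'banana'`. The selection of methods is an assumption about what is useful at this point, not a complete overview.

```python
word = 'banana'

print(word.find('a'))        # 1, index of the first 'a'
print(word.find('na'))       # 2, find also accepts substrings
print(word.find('na', 3))    # 4, start searching at index 3
print(word.find('b', 1, 2))  # -1, search only from index 1 up to (not including) 2
print(word.count('a'))       # 3
print('  banana  '.strip())  # banana
```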

***

## The `in` operator

The word `in` is a boolean operator that takes two strings and returns `True` if the first appears as a substring of the second:

In [28]:
'a' in 'banana'

True

In [29]:
'seed' in 'banana'

False

For example, the following function prints all the letters from `word1` that also appear in `word2`

In [30]:
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)

With well-chosen variable names, Python sometimes reads like English. You could read this loop, "for (each) letter in (the first) word, if (the) letter (appears) in (the second) word, print (the) letter."<br>
<br>
Here is what you get if you compare apples and oranges:

In [31]:
in_both('apples', 'oranges')

a
e
s
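
Printing is handy for exploring, but a returned value is easier to reuse and to test. The following sketch is a hypothetical variant of `in_both`, here called `common_letters`, that collects the matching letters in a new string by concatenation.

```python
def common_letters(word1, word2):
    """Return the letters of word1 that also appear in word2.

    Hypothetical variant of in_both that returns a string instead of printing.
    """
    result = ''
    for letter in word1:
        if letter in word2:
            result = result + letter
    return result


print(common_letters('apples', 'oranges'))  # aes
```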


***

## String comparison

The relational operators work on strings. To see whether two strings are equal, use `==`:

In [32]:
word = 'pineapple'

if word == 'banana':
    print('All right, bananas.')

In [33]:
word = 'banana'

if word == 'banana':
    print('All right, bananas.')

All right, bananas.


Other relational operations are useful for putting words in alphabetical order:

In [34]:
def comes_before_banana(word):
    if word < 'banana':
        print('Your word, ' + word + ', comes before banana.')
    elif word > 'banana':
        print('Your word, ' + word + ', comes after banana.')
    else:
        print('All right, bananas.')

In [35]:
comes_before_banana('pineapple')

Your word, pineapple, comes after banana.


In [36]:
comes_before_banana('banana')

All right, bananas.


All the uppercase letters come before all the lowercase letters, so:

In [37]:
comes_before_banana('Pineapple')

Your word, Pineapple, comes before banana.


A common way to mitigate this issue is to convert strings to a standard format, such as all lowercase, before performing the comparison.
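
A minimal sketch of this idea, assuming that lowercasing both strings is sufficient for the words being compared. `comes_before` is a hypothetical helper that returns a boolean instead of printing.

```python
def comes_before(word, reference):
    """Return True if word sorts before reference, ignoring case.

    Hypothetical helper: both strings are lowercased before comparing.
    """
    return word.lower() < reference.lower()


print('Pineapple' < 'banana')               # True, because 'P' < 'b'
print(comes_before('Pineapple', 'banana'))  # False
print(comes_before('Apple', 'banana'))      # True
```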

***

## Debugging

The function below is supposed to return `True` if one word is the reverse of the other, but it contains two errors:

In [38]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2)
    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

The first `if` statement checks whether the words are the same length. If not, we can return `False` immediately. Otherwise, for the rest of the function, we can assume that the words are the same length. This is an example of the <b>guardian pattern</b>.<br>
<br>
`i` and `j` are indices: `i` traverses `word1` forward while `j` traverses `word2` backward. If we find two letters that do not match, we can return `False` immediately.
If we get through the whole loop and all the letters match, we return `True`.<br>
<br>
If we test this function with the words "pots" and "stop", we expect the return value `True`, but we get an `IndexError`:

In [39]:
is_reverse('pots', 'stop')

IndexError: string index out of range

For debugging this kind of error, my first move is to print the values of the indices immediately before the line where the error appears.

In [40]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2)
    while j > 0:
        print(i, j) # print here
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

Now when I run the program again, I get more information:

In [41]:
is_reverse('pots', 'stop')

0 4


IndexError: string index out of range

The first time through the loop, the value of `j` is 4, which is out of range for `word2`, the string `'stop'`. The index of the last character is 3, so the initial value for `j` should be `len(word2) - 1`.<br>
<br>
If I fix that error and run the program again, I get:

In [42]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j > 0:
        print(i, j) # print here
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

is_reverse('pots', 'stop')

0 3
1 2
2 1


True

This time we get the right answer, but it looks like the loop only ran three times, which is suspicious. To get a better idea of what is happening, it is useful to draw a state diagram. During the first iteration, the frame for `is_reverse` is shown below:<br>
<br>
<img src="https://i.imgur.com/zj1JRv7.png" alt="State Diagram for is_reverse"><br>
<br>
Starting with this diagram, run the program on paper, changing the values of `i` and `j` during each iteration. Find and fix the second error in this function.<br>
<br>
<img src="https://i.imgur.com/cZmxMbH.jpg" alt="State Diagram for is_reverse">

In [43]:
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j >= 0:
        print(i, j) # print here
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

is_reverse('pots', 'stop')

0 3
1 2
2 1
3 0


True
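
With both errors fixed, the debugging `print` can be removed. The sketch below shows the cleaned-up function together with a shorter alternative, `is_reverse_slice`, that relies on a slice with step `-1`. That slice form is not covered in this chapter, so treat it as an assumption. The call with `'xtop'` shows why the second fix matters: with the condition `j > 0` the first letter of `word2` was never checked, so that call would have returned `True`.

```python
def is_reverse(word1, word2):
    """Return True if word2 is word1 spelled backward."""
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j >= 0:
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True


def is_reverse_slice(word1, word2):
    """Same check using a slice with step -1 (not covered in this chapter)."""
    return word1 == word2[::-1]


print(is_reverse('pots', 'stop'))        # True
print(is_reverse('pots', 'xtop'))        # False
print(is_reverse_slice('pots', 'stop'))  # True
```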

***

## Glossary

Tinycards will no longer work after 2020-09-01. The best alternative that was recommended is Anki:<br>
<br>
<img src="https://upload.wikimedia.org/wikipedia/commons/d/d9/Screenshot_von_Anki_v2.0.31_unter_LinuxMint.png" alt="Screenshot of Anki"><br>
<br>
There is both a desktop and a mobile version; however, the cards are not available online. You can download the glossary manually here: https://1drv.ms/u/s!Ak_yuM9-jft-g4AE2d1e7hAxMUM4Zw?e=1Qs00F<br>
<br>
And follow the instructions here: https://docs.ankiweb.net/#/contrib?id=sharing-decks-privately<br>