# Chapter 8: Strings

Data types that are composed of smaller pieces are called **compound data types**. Python lets you treat these data types either as a single unit or by their component parts. Strings are one example of a compound data type, and their constituent units are called **characters**.

In Python, strings are also **immutable**, meaning that you _cannot_ change any character in an existing string. To change a string, you need to create an entirely new one.
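As a minimal sketch of what immutability means in practice, the snippet below tries to assign to one character and then builds a new string instead. The variable names are illustrative, and it uses slicing, which is covered later in this chapter.

```python
greeting = 'hello, World!'

# Item assignment is not supported on strings, so this raises a TypeError
try:
    greeting[0] = 'H'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Build a new string from pieces of the old one instead
new_greeting = 'H' + greeting[1:]

print(assignment_worked)  # False
print(greeting)           # hello, World!
print(new_greeting)       # Hello, World!
```

The original string is left untouched; only the new name refers to the modified text.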

## Strings as Single Units

Strings have several methods that operate on the string as a whole. Because strings are immutable, each method returns a new string and leaves the original untouched. For example:

* `capitalize`: Changes the first character to upper case and the rest to lower case.
* `swapcase`: Changes all upper case characters to lower case and vice versa.
* `upper`: Changes the entire string to upper case.
* `lower`: Changes the entire string to lower case.

In [1]:
string = 'hello, World!'

print(f'Original string: "{string}"')
print(f'Capitalized: "{string.capitalize()}"')
print(f'Swap case: "{string.swapcase()}"')
print(f'Upper case: "{string.upper()}"')
print(f'Lower case: "{string.lower()}"')

Original string: "hello, World!"
Capitalized: "Hello, world!"
Swap case: "HELLO, wORLD!"
Upper case: "HELLO, WORLD!"
Lower case: "hello, world!"


## Working with Characters in a String

The **indexing operator** selects a single unit from a compound type based on its position. The expression used to select it is called an **index**. In Python:

1. it's represented by square brackets,
2. you can use an integer to select a single element,
3. the first element is at index `0`,
4. you can use negative numbers as indexes, which is equivalent to _going backwards_ from the end of the string.

The `enumerate` function pairs each unit with its index, which lets you see both at once.

In [2]:
string = 'hello, World!'
char_num = 4

print(f'Original string: "{string}"')
print(f'Character at position {char_num}: "{string[char_num]}"')
print(f'Character at position -{char_num}: "{string[-char_num]}"')
print(f'Enumeration of string:\n{list(enumerate(string))}')

Original string: "hello, World!"
Character at position 4: "o"
Character at position -4: "r"
Enumeration of string:
[(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o'), (5, ','), (6, ' '), (7, 'W'), (8, 'o'), (9, 'r'), (10, 'l'), (11, 'd'), (12, '!')]


## String Traversal

Traversing strings refers to iterating through each of their characters. That's exactly what Python does when you use them in a loop. For example:

In [3]:
prefixes = "JKLMNOPQ"
suffix = "ack"

for p in prefixes:
    print(p + suffix)

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack
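The `for` loop above is roughly equivalent to walking the indexes by hand. The sketch below shows that equivalence with a `while` loop; the names `index` and `collected` are illustrative only.

```python
prefixes = "JKLMNOPQ"
suffix = "ack"

collected = []
index = 0

# Visit indexes 0, 1, ..., len(prefixes) - 1 in order
while index < len(prefixes):
    collected.append(prefixes[index] + suffix)
    index += 1

print(collected)
```

The `for` version is usually preferred because it needs no manual bookkeeping of the index.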


## Useful Functions and Statements

Next, we'll review some useful functions and statements. Some work on any compound type and others are specific to strings; the wording of each section indicates which.

### Length

The `len` function computes how many units are in a compound type. For example:

In [4]:
string = 'hello, World!'

print(f'Original string: "{string}"')
print(f'Number of characters in original string: {len(string):,}')

Original string: "hello, World!"
Number of characters in original string: 13


### Slicing

**Slicing** refers to extracting a subsection of a compound type. In Python, this operation is denoted by a colon (`:`) inside the indexing brackets. It behaves as follows:

1. `[m:n]`: Extracts the units from index `m` (inclusive) through index `n` (exclusive). If `n` is greater than the length, it returns everything up to the end.
2. `[:n]`: Extracts from the first item (inclusive) up to index `n` (exclusive).
3. `[m:]`: Extracts from index `m` (inclusive) through the last item (inclusive).

Examples:

In [5]:
string = 'hello, World!'

print(f'Original string: "{string}"')
print(f'Substring [2:5]: "{string[2:5]}"')
print(f'Substring [2:500]: "{string[2:500]}"')
print(f'Substring [:5]: "{string[:5]}"')
print(f'Substring [2:]: "{string[2:]}"')

Original string: "hello, World!"
Substring [2:5]: "llo"
Substring [2:500]: "llo, World!"
Substring [:5]: "hello"
Substring [2:]: "llo, World!"
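Slicing is more forgiving than indexing when a position falls outside the string. The following sketch contrasts the two behaviours; the variable names are illustrative.

```python
string = 'hello, World!'

# A slice that runs past the end is clamped to the string's length
clamped = string[2:500]

# A slice that starts past the end is simply empty
empty = string[500:]

# A single index past the end raises an IndexError instead
try:
    string[500]
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(repr(clamped))       # 'llo, World!'
print(repr(empty))         # ''
print(index_error_raised)  # True
```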


### String Comparison

Two strings are equal if they match character by character, including case. This can be tested with the equality operator (`==`):

In [6]:
string_1 = 'hello, world!'
string_2 = 'hello, World!'

print(f'String 1: "{string_1}"')
print(f'String 2: "{string_2}"')
print('')
print(
    ('Since the "w" in "world" is '
     'capitalized differently, they are different:\n'
     f'string_1 == string_2 returns {string_1 == string_2}')
)
print('')
print(
    ('However, if we change them to lower case '
     'they become equal:\n'
     'string_1.lower() == string_2.lower() returns '
     f'{string_1.lower() == string_2.lower()}')
)

String 1: "hello, world!"
String 2: "hello, World!"

Since the "w" in "world" is capitalized differently, they are different:
string_1 == string_2 returns False

However, if we change them to lower case they become equal:
string_1.lower() == string_2.lower() returns True
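The exercises at the end of this chapter also compare strings with `<` and `>`. Python orders strings character by character, from left to right, using each character's Unicode code point (available through `ord`). One consequence is that upper-case ASCII letters sort before lower-case ones. A small sketch, with example words chosen only for illustration:

```python
# Code points decide the order: 'P' is 80 and 'p' is 112
code_points = (ord('P'), ord('p'))
print(code_points)  # (80, 112)

# 'a' comes before 'b', so the first character settles the comparison
first_letter_decides = 'apple' < 'banana'

# 'Z' (90) precedes 'a' (97), so case matters for ordering
upper_before_lower = 'Zebra' < 'apple'

# When one string is a prefix of the other, the shorter one sorts first
prefix_sorts_first = 'app' < 'apple'

print(first_letter_decides, upper_before_lower, prefix_sorts_first)  # True True True
```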


### The in Operator

The boolean `in` operator checks for membership of a slice within a compound type. Its structure is:

``` <slice> in <test type>```

It returns `True` if `<test type>` contains a slice equal to the target `<slice>`. The empty slice is considered to be contained in every value:

In [7]:
print(f'Test "p" in "apple": {"p" in "apple"}')
print(f'Test "i" in "apple": {"i" in "apple"}')
print(f'Test "ap" in "apple": {"ap" in "apple"}')
print(f'Test "pa" in "apple": {"pa" in "apple"}')
print(f'Test "a" in "a": {"a" in "a"}')
print(f'Test "apple" in "apple": {"apple" in "apple"}')
print(f'Test "" in "apple": {"" in "apple"}')

Test "p" in "apple": True
Test "i" in "apple": False
Test "ap" in "apple": True
Test "pa" in "apple": False
Test "a" in "a": True
Test "apple" in "apple": True
Test "" in "apple": True


### The split Method

The `split` method separates a string into a list of words, using whitespace as the separator and discarding it.

In [8]:
string = "Well I never did say 'Alice'."

print(f'Original string: "{string}"')
print(f'Split text: {string.split()}')

Original string: "Well I never did say 'Alice'."
Split text: ['Well', 'I', 'never', 'did', 'say', "'Alice'."]


Actually, `split` can also receive a parameter that allows you to split by any substring you want:

In [9]:
string = "12:11:00"
print(f'Original string: "{string}"')
print(f'Split text (splitting by ":"): {string.split(":")}')

print('')

string = "I<token>want<token>to<token>break<token>free"
print(f'Original string: "{string}"')
print(f'Split text (splitting by "<token>"): {string.split("<token>")}')

Original string: "12:11:00"
Split text (splitting by ":"): ['12', '11', '00']

Original string: "I<token>want<token>to<token>break<token>free"
Split text (splitting by "<token>"): ['I', 'want', 'to', 'break', 'free']
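Calling `split()` with no argument is not quite the same as calling `split(" ")`. The sketch below, using a made-up string with a double space, illustrates the difference.

```python
text = "one  two three"  # note the two spaces after "one"

# No argument: a run of whitespace counts as a single separator
default_split = text.split()

# Explicit separator: every single space splits, leaving an empty string
space_split = text.split(" ")

print(default_split)  # ['one', 'two', 'three']
print(space_split)    # ['one', '', 'two', 'three']
```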


### The find Method

The `find` method returns the first index at which a substring occurs within a string, or `-1` if the substring does not occur.

In [10]:
print(f'Index of the first "na" in "banana": {"banana".find("na")}')

Index of the first "na" in "banana": 2
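`find` also accepts an optional start index, which is handy for locating later occurrences. A short sketch (the variable names are illustrative):

```python
word = "banana"

first = word.find("na")              # first occurrence
second = word.find("na", first + 1)  # resume the search just after it
missing = word.find("xy")            # no match at all

print(first, second, missing)  # 2 4 -1
```

Checking for `-1` is the usual way to detect that the search has run out of matches.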


### The format Method

The format method has two functions:

1. It allows you to replace place holders in strings with variable contents.
2. It allows you to specify the way in which something is printed.

Place holders are defined using curly brackets within strings. They can be filled in four ways; automatic (`{}`) and explicitly numbered (`{0}`) place holders _cannot_ be mixed in the same string:

1. Ordinally: The inputs for the `format` method are in the same order as they appear in the string.

    ```'The first place holder {} and the second one {}'.format(first, second)```

2. Positionally: Each place holder contains the position (starting at `0`) of the `format` argument it should use.

    ```'The second place holder {1} and the first one {0}'.format(first, second)```
    
3. Through variable names: Each place holder contains a name, and the `format` call assigns a value to each name as a keyword argument.

    ```'The second place holder {second} and the first one {first}'.format(first=first, second=second)```
    
4. Through **f-strings**: f-strings were introduced in Python 3.6 and make the `format` call unnecessary. They're defined by placing an `f` prefix _before_ the opening quote, and place holders contain the variables or expressions to be evaluated:

   ```f'The second place holder {second} and the first one {first}'```
   
Example:

In [11]:
layout = "{0:>4}{1:>6}{2:>6}{3:>8}{4:>13}{5:>24}"

print(layout.format("i", "i**2", "i**3", "i**5", "i**10", "i**20"))
for i in range(1, 11):
    print(layout.format(i, i**2, i**3, i**5, i**10, i**20))

   i  i**2  i**3    i**5        i**10                   i**20
   1     1     1       1            1                       1
   2     4     8      32         1024                 1048576
   3     9    27     243        59049              3486784401
   4    16    64    1024      1048576           1099511627776
   5    25   125    3125      9765625          95367431640625
   6    36   216    7776     60466176        3656158440062976
   7    49   343   16807    282475249       79792266297612001
   8    64   512   32768   1073741824     1152921504606846976
   9    81   729   59049   3486784401    12157665459056928801
  10   100  1000  100000  10000000000   100000000000000000000
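The four place holder styles listed earlier in this section can be compared side by side. In the sketch below, the values `'A'` and `'B'` are arbitrary and chosen only for illustration; the last lines show a few format specifications on their own.

```python
first, second = 'A', 'B'

ordinal = 'The first place holder {} and the second one {}'.format(first, second)
positional = 'The second place holder {1} and the first one {0}'.format(first, second)
named = 'The second place holder {second} and the first one {first}'.format(
    first=first, second=second)
f_string = f'The second place holder {second} and the first one {first}'

print(ordinal)     # The first place holder A and the second one B
print(positional)  # The second place holder B and the first one A
print(named)       # The second place holder B and the first one A
print(f_string)    # The second place holder B and the first one A

# Format specifications go after a colon inside the place holder
two_decimals = f'{3.14159:.2f}'  # fixed-point, two decimals
right_aligned = f'{42:>6}'       # right-aligned in a field of width 6
thousands = f'{1234567:,}'       # comma as thousands separator

print(two_decimals, right_aligned, thousands)
```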


## Exercises

### 1

What is the result of each of the following:

1. `"Python"[1]`
2. `"Strings are sequences of characters."[5]`
3. `len("wonderful")`
4. `"Mystery"[:4]`
5. `"p" in "Pineapple"`
6. `"apple" in "Pineapple"`
7. `"pear" not in "Pineapple"`
8. `"apple" > "pineapple"`
9. `"pineapple" < "Peach"`

#### Answer:

1. `"y"`
2. `"g"`
3. 9
4. `"Myst"`
5. `True`
6. `True`
7. `True`
8. `False`
9. `False`

In [12]:
print(f'Answer to 1: {"Python"[1]}')
print(f'Answer to 2: {"Strings are sequences of characters."[5]}')
print(f'Answer to 3: {len("wonderful")}')
print(f'Answer to 4: {"Mystery"[:4]}')
print(f'Answer to 5: {"p" in "Pineapple"}')
print(f'Answer to 6: {"apple" in "Pineapple"}')
print(f'Answer to 7: {"pear" not in "Pineapple"}')
print(f'Answer to 8: {"apple" > "pineapple"}')
print(f'Answer to 9: {"pineapple" < "Peach"}')

Answer to 1: y
Answer to 2: g
Answer to 3: 9
Answer to 4: Myst
Answer to 5: True
Answer to 6: True
Answer to 7: True
Answer to 8: False
Answer to 9: False


### 2

Modify

```
prefixes = "JKLMNOPQ"
suffix = "ack"

for letter in prefixes:
    print(letter + suffix)

```

so that "Ouack" and "Quack" are spelled correctly.

In [13]:
prefixes = "JKLMNOPQ"
suffix = "ack"

for letter in prefixes:
    if letter in 'OQ':
        letter += 'u'
    print(letter + suffix)

Jack
Kack
Lack
Mack
Nack
Ouack
Pack
Quack


### 3

Encapsulate

```
fruit = "banana"
count = 0
for char in fruit:
    if char == "a":
        count += 1
print(count)
```

in a function named `count_letters`, and generalize it so that it accepts the string and the letter as arguments. Make the function return the number of characters, rather than print the answer. The caller should do the printing.

In [14]:
def count_letters(string, letter):
    """
    Count the number of occurrences of a letter in a string.
    """
    # Initialize the counter
    count = 0
    
    # Traverse the string
    for char in string:
        # Update count if current character is the target
        if char == letter:
            count += 1

    return count


assert count_letters('banana', 'a') == 3
assert count_letters('abanana', 'a') == 4
assert count_letters('BANANA', 'a') == 0
assert count_letters('banana', '') == 0

### 4

Now rewrite the `count_letters` function so that instead of traversing the string, it repeatedly calls the `find` method, with the optional third parameter to locate new occurrences of the letter being counted.

In [15]:
def count_letters(string, letter):
    """
    Count the number of occurrences of a letter in a string.
    """
    # Initialize the counter
    count = 0

    # An empty letter would match at every position, so count it as zero
    if not letter:
        return count

    # Look for the first instance
    found = string.find(letter)

    # While we still find instances of the letter in the string
    while found != -1:
        # Update count
        count += 1

        # Search for the next instance, starting just after the last one
        found = string.find(letter, found + 1)

    return count


assert count_letters('banana', 'a') == 3
assert count_letters('abanana', 'a') == 4
assert count_letters('BANANA', 'a') == 0
assert count_letters('banana', '') == 0

### 5

Assign to a variable in your program a triple-quoted string that contains your favourite paragraph of text — perhaps a poem, a speech, instructions to bake a cake, some inspirational verses, etc.

Write a function which removes all punctuation from the string, breaks the string into a list of words, and counts the number of words in your text that contain the letter “e”. Your program should print an analysis of the text like this:

> `Your text contains 243 words, of which 109 (44.8%) contain an "e".`

In [16]:
import string

def remove_punctuation(s):
    """
    Remove punctuation from a string.
    """
    s_without_punct = ""
    for letter in s:
        if letter not in string.punctuation:
            s_without_punct += letter
    return s_without_punct

assert remove_punctuation(
    '"Well, I never did!", said Alice.') == "Well I never did said Alice"
assert remove_punctuation(
    "Are you very, very, sure?") == "Are you very very sure"

In [17]:
a_specials = ('á', 'à', 'â')
e_specials = ('é', 'è', 'ê')
i_specials = ('í', 'ì', 'î')
o_specials = ('ó', 'ò', 'ô')
u_specials = ('ú', 'ù', 'û')

def homologate_special_characters(s):
    """
    Remove accents from a string and change it to lower case.
    """
    result = s.lower()
    
    for special in a_specials:
        result = result.replace(special, 'a')
    for special in e_specials:
        result = result.replace(special, 'e')
    for special in i_specials:
        result = result.replace(special, 'i')
    for special in o_specials:
        result = result.replace(special, 'o')
    for special in u_specials:
        result = result.replace(special, 'u')
    
    return result

assert homologate_special_characters('á') == 'a'
assert homologate_special_characters('ê') == 'e'
assert homologate_special_characters('ì') == 'i'
assert homologate_special_characters('Ó') == 'o'
assert homologate_special_characters('Û') == 'u'
assert homologate_special_characters('hello') == 'hello'
assert homologate_special_characters('heLLo') == 'hello'
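As an alternative to listing the accented vowels by hand, the standard library's `unicodedata` module can decompose characters and drop their combining marks. The helper name `strip_accents` is hypothetical, and this is only a sketch: unlike `homologate_special_characters`, it also turns `ñ` into `n`, which may or may not be what you want.

```python
import unicodedata


def strip_accents(s):
    """
    Hypothetical helper: lower-case a string and drop combining marks.
    """
    # NFD normalization splits 'é' into 'e' plus a combining acute accent
    decomposed = unicodedata.normalize('NFD', s.lower())
    # Keep only the characters that are not combining marks
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


print(strip_accents('Óscar está aquí'))  # oscar esta aqui
```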

In [18]:
def analyze_paragraph(paragraph, letter='e'):
    """
    Print an analysis of a paragraph that contains the number of words and 
    how many contain a letter.
    """
    
    # Process the paragraph
    processed = homologate_special_characters(remove_punctuation(paragraph))
    
    # Create the list of words
    processed = processed.split()
    
    # Count number of words
    words = len(processed)
    
    # Count number of words with letter
    letter_words = 0
    for word in processed:
        if letter in word:
            letter_words += 1
    
    # Print analysis
    print(
        f'Your text contains {words:,} words, '
        f'of which {letter_words:,} ({letter_words/words:.2%}) '
        f'contain an "{letter}".'
    )

In [19]:
paragraph = """
Las tierras, las tierras, las tierras de España,
las grandes, las solas, desiertas llanuras.
Galopa, caballo cuatralbo,
jinete del pueblo,
al sol y a la luna. 
"""

analyze_paragraph(paragraph)

Your text contains 26 words, of which 10 (38.46%) contain an "e".


### 6

Print a neat looking multiplication table.

In [20]:
def print_multiples(n, high):
    """
    Print a row of a multiplication table.
    """
    string = f'{n:5}: '
    for i in range(1, high+1):
        # Append the product n * i to the row
        string += f'{n*i:>5}'
    print(string)
    print()

def print_mult_table(max_num):
    """
    Build a multiplication table.
    """
    print_multiples(1, max_num)
    print(f'{"":5}: ' + ' - '* max_num * 2)
    print('')
    for i in range(1, max_num+1):
        print_multiples(i, max_num)
        
print_mult_table(12)

    1:     1    2    3    4    5    6    7    8    9   10   11   12

     :  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  - 

    1:     1    2    3    4    5    6    7    8    9   10   11   12

    2:     2    4    6    8   10   12   14   16   18   20   22   24

    3:     3    6    9   12   15   18   21   24   27   30   33   36

    4:     4    8   12   16   20   24   28   32   36   40   44   48

    5:     5   10   15   20   25   30   35   40   45   50   55   60

    6:     6   12   18   24   30   36   42   48   54   60   66   72

    7:     7   14   21   28   35   42   49   56   63   70   77   84

    8:     8   16   24   32   40   48   56   64   72   80   88   96

    9:     9   18   27   36   45   54   63   72   81   90   99  108

   10:    10   20   30   40   50   60   70   80   90  100  110  120

   11:    11   22   33   44   55   66   77   88   99  110  121  132

   12:    12   24   36   48   60   72   84   96  108  120  132  144



### 7

Write a function that reverses its string argument, and satisfies these tests:

```
test(reverse("happy") == "yppah")
test(reverse("Python") == "nohtyP")
test(reverse("") == "")
test(reverse("a") == "a")
```

In [21]:
def reverse(s):
    """
    Reverse the order of a string.
    """
    result = ''
    for char in s:
        result = char + result
    return result
    
assert reverse("happy") == "yppah"
assert reverse("Python") == "nohtyP"
assert reverse("") == ""
assert reverse("a") == "a"
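The loop above builds the result one character at a time. A shorter alternative, shown here as a sketch, relies on a slice with a step of `-1`, which walks the string backwards. The function name `reverse_with_slice` is hypothetical and only used to keep it separate from `reverse`.

```python
def reverse_with_slice(s):
    """
    Hypothetical alternative to reverse: use a slice with a step of -1.
    """
    # Omitting start and stop with a negative step covers the whole string
    return s[::-1]


print(reverse_with_slice("happy"))  # yppah
```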

### 8

Write a function that mirrors its argument:

```
test(mirror("good") == "gooddoog")
test(mirror("Python") == "PythonnohtyP")
test(mirror("") == "")
test(mirror("a") == "aa")
```

In [22]:
def mirror(s):
    """
    Concatenate a string with its mirror image.
    """
    return s + reverse(s)

assert mirror("good") == "gooddoog"
assert mirror("Python") == "PythonnohtyP"
assert mirror("") == ""
assert mirror("a") == "aa"

### 9

Write a function that removes all occurrences of a given letter from a string:

```
test(remove_letter("a", "apple") == "pple")
test(remove_letter("a", "banana") == "bnn")
test(remove_letter("z", "banana") == "banana")
test(remove_letter("i", "Mississippi") == "Msssspp")
test(remove_letter("b", "") == "")
test(remove_letter("b", "c") == "c")

```

In [23]:
def remove_letter(letter, string):
    """
    Remove all instances of a letter from string.
    """
    result = ''
    for char in string:
        if char != letter:
            result += char
    return result

assert remove_letter("a", "apple") == "pple"
assert remove_letter("a", "banana") == "bnn"
assert remove_letter("z", "banana") == "banana"
assert remove_letter("i", "Mississippi") == "Msssspp"
assert remove_letter("b", "") == ""
assert remove_letter("b", "c") == "c"

### 10

Write a function that recognizes palindromes.

```
test(is_palindrome("abba"))
test(not is_palindrome("abab"))
test(is_palindrome("tenet"))
test(not is_palindrome("banana"))
test(is_palindrome("straw warts"))
test(is_palindrome("a"))
```

In [24]:
def is_palindrome(s):
    """
    Return whether a string is a palindrome or not.
    """
    return s == reverse(s)

assert is_palindrome("abba")
assert not is_palindrome("abab")
assert is_palindrome("tenet")
assert not is_palindrome("banana")
assert is_palindrome("straw warts")
assert is_palindrome("a")
assert is_palindrome("")

### 11

Write a function that counts how many times a substring occurs in a string.

```
test(count("is", "Mississippi") == 2)
test(count("an", "banana") == 2)
test(count("ana", "banana") == 2)
test(count("nana", "banana") == 1)
test(count("nanan", "banana") == 0)
test(count("aaa", "aaaaaa") == 4)
```

In [25]:
def count_substrings(sub, string):
    """
    Count the number of occurrences of a substring in a string.
    """
    result = 0
    sub_length = len(sub)
    for ix in range(len(string)):
        if string[ix: ix + sub_length] == sub:
            result += 1
    return result
        

assert count_substrings("is", "Mississippi") == 2
assert count_substrings("an", "banana") == 2
assert count_substrings("ana", "banana") == 2
assert count_substrings("nana", "banana") == 1
assert count_substrings("nanan", "banana") == 0
assert count_substrings("aaa", "aaaaaa") == 4
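Note that the built-in `str.count` method is not a drop-in replacement for `count_substrings`: it counts only non-overlapping occurrences, whereas the tests above expect overlapping matches to be counted. A brief sketch of the difference:

```python
# str.count skips past each match, so overlapping matches are missed
non_overlapping_a = "aaaaaa".count("aaa")
non_overlapping_ana = "banana".count("ana")

print(non_overlapping_a)    # 2, whereas the exercise expects 4
print(non_overlapping_ana)  # 1, whereas the exercise expects 2
```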

### 12

Write a function that removes the first occurrence of a string from another string.

```
test(remove("an", "banana") == "bana")
test(remove("cyc", "bicycle") == "bile")
test(remove("iss", "Mississippi") == "Missippi")
test(remove("eggs", "bicycle") == "bicycle")
```

In [26]:
def remove(sub, string):
    """
    Remove the first occurrence of a substring in a string.
    """
    # Split at most once, so only the first occurrence of sub is dropped
    parts = string.split(sub, 1)

    # Glue the piece before and the piece after back together
    return ''.join(parts)
            

assert remove("an", "banana") == "bana"
assert remove("cyc", "bicycle") == "bile"
assert remove("iss", "Mississippi") == "Missippi"
assert remove("eggs", "bicycle") == "bicycle"

### 13

Write a function that removes all occurrences of a string from another string.

```
test(remove_all("an", "banana") == "ba")
test(remove_all("cyc", "bicycle") == "bile")
test(remove_all("iss", "Mississippi") == "Mippi")
test(remove_all("eggs", "bicycle") == "bicycle")
```

In [27]:
def remove_all(sub, string):
    """
    Remove all instances of sub in string.
    """
    # Splitting on sub drops every occurrence; joining glues the rest together
    return ''.join(string.split(sub))
            

assert remove_all("an", "banana") == "ba"
assert remove_all("cyc", "bicycle") == "bile"
assert remove_all("iss", "Mississippi") == "Mippi"
assert remove_all("eggs", "bicycle") == "bicycle"
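Both `remove` and `remove_all` can also be sketched with the built-in `str.replace` method, whose optional third argument limits how many occurrences are replaced. The function names below are hypothetical, chosen to avoid clashing with the solutions above.

```python
def remove_with_replace(sub, string):
    """
    Hypothetical variant of remove: drop only the first occurrence.
    """
    # The count argument of 1 stops after the first replacement
    return string.replace(sub, '', 1)


def remove_all_with_replace(sub, string):
    """
    Hypothetical variant of remove_all: drop every occurrence.
    """
    return string.replace(sub, '')


print(remove_with_replace("an", "banana"))      # bana
print(remove_all_with_replace("an", "banana"))  # ba
```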