# Strings

## Goals

By the end of this class, the student should be able to:


- Describe how to work with strings as single things

- Describe how to work with the parts of a string

- Enumerate the main methods available to work with strings

- Describe how to format strings


# Data types: Strings

## 5.1.1 A compound data type

### A compound data type

- So far we have seen built-in types like `int`, `float`, `bool`,
    `str` and we've seen lists and pairs

- Strings, lists, and pairs are qualitatively different from the
    others because they are made up of smaller pieces

- In the case of strings, they're made up of smaller strings each
    containing one **character** in a particular order from left to
    right

- Types that comprise smaller pieces are called **collection or
    compound data types**

- Depending on what we are doing, we may want to treat a compound data
    type as a single thing

- A string that contains no characters, often referred to as the **empty string**, is still considered to be a string


## 5.1.2 Working with strings as single things

### Working with strings as single things

- Just like a turtle, a string is also an object

- So each string instance has its own attributes and methods (around
    70!)

- For example:

```
  >>> our_string = "Hello, World!"
  >>> all_caps = our_string.upper()
  >>> all_caps
  'HELLO, WORLD!'
```


### Built-in Functions

- See the Python Standard Library for a list of
    built-in functions:
    
<https://docs.python.org/3.7/library/functions.html>


### Common string operations

- See the Python Standard Library for a comprehensive list of
    operations on strings:
    
<https://docs.python.org/3.7/library/string.html>


In [None]:
import string
print(string.ascii_lowercase)
print(string.ascii_uppercase)
print(string.digits)
print(string.punctuation)

In [None]:

our_string = "Hello, World!"
all_caps = our_string.upper()

print()
print(our_string)
print(all_caps)

# all_caps.<TAB>

## some operations
# message = "Oi!"
# message - 1
# type(message)
# type(1)
# message + "1"
# message * 3
# message + " " * 3

### Operations on Strings

In general, you cannot perform mathematical operations on strings, even if the strings look like numbers. 

The following are illegal:

In [None]:
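fruit = "banana"
fruit - 1          # TypeError: strings do not support subtraction
fruit * "Hello"    # TypeError as well: a string cannot be multiplied by a string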

Interestingly, the + operator does work with strings, but for strings, the + operator represents **concatenation**, not addition. 

Concatenation means joining the two operands by linking them end-to-end. 

For example:

In [None]:
fruit = "banana"
bakedGood = " nut bread"
print(fruit + bakedGood)
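
As a small sketch of the operators that do work on strings, the cell below shows concatenation with `+` and repetition with `*` (a string times an integer). The variable `count` is a made-up value used only for illustration.

```python
# Concatenation (+) and repetition (*) both build new strings.
message = "Oi!"

joined = message + " " + message   # join the operands end-to-end
repeated = message * 3             # three copies, back to back
print(joined)
print(repeated)

# Mixing a string and a number with + raises TypeError,
# so convert the number to a string first.
count = 3                          # hypothetical value for illustration
label = message + " x" + str(count)
print(label)
```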

## 5.1.3 Working with the parts of a string

### Working with the parts of a string

- The **indexing operator** selects a single-character substring from
    a string:

```
  >>> fruit = "banana"   # a string
  >>> letter = fruit[0]  # this is also a string
  >>> print(letter)
```


In [None]:
fruit = "banana"
#
#print()
#print(fruit)
#print(list(enumerate(fruit)))
##
#letter = fruit[0]
#
#print()
#print(letter)

other_letter = fruit[len(fruit)-2]

print()
#print(len(fruit))
# other_letter = fruit[len(fruit)]
#other_letter = fruit[len(fruit)-1]
#print(other_letter)
other_letter = fruit[-2]
print(other_letter)

## indexing with lists is the same!
#prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
#
#print()
#print(prime_numbers)
#print(prime_numbers[4])

#friends = ["Joe", "Zoe", "Brad", "Angelina", "Zuki", "Thandi", "Paris"]
#
#print()
#print(friends)
#print(friends[3])

The characters are accessed by their *position* or *index value* (the expression in brackets). 

For example, in the string shown below, the 14 characters are indexed left to right from position 0 to position 13.

![string](images/07/s-index.png)

In [None]:
mice = "Master in Informatics and Computing Engineering"
m = mice[0]
print(m)

In [None]:
lastchar = mice[-1]
print(lastchar)
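
The relationship between positive and negative indices can be sketched as follows. This is only an illustration on the short string `"banana"`; the variable names are not part of the original material.

```python
fruit = "banana"

# Non-negative indices count from the left, starting at 0.
first = fruit[0]
# Negative indices count from the right, starting at -1.
last = fruit[-1]
second_to_last = fruit[-2]

# fruit[-k] is the same character as fruit[len(fruit) - k].
same = fruit[-2] == fruit[len(fruit) - 2]
print(first, last, second_to_last, same)
```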

## 5.1.4 Length

### Length

- The `len` function, when applied to a string, returns the number of
    characters in a string:

```
  size = len(word)
  last = word[size-1]
```

Try it:

In [None]:
word = "Monty"
size = len(word)
word[size-1]

In [None]:
print(size)
print(word[size-1])

In [None]:
last = word[size]       # ERROR!
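
To see why `word[size]` fails, the sketch below catches the error instead of letting it stop the cell. It assumes nothing beyond the built-in `IndexError` exception; the fallback to `size - 1` is just one way to recover.

```python
word = "Monty"
size = len(word)

try:
    last = word[size]          # one past the end: raises IndexError
except IndexError as error:
    print("Invalid index:", error)
    last = word[size - 1]      # the last valid index is size - 1

print(last)
```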

## 5.1.5 Traversal and the `for` loop

### Traversal and the `for` loop

- A lot of computations involve processing a *string one character at
    a time*

- Often they start at the beginning, select each character in turn, do
    something to it, and continue until the end

- This pattern of processing is called a **traversal**

```
  word = "Banana"
  for letter in word:
      print(letter)
```


Since a string is simply a sequence of characters, the for loop iterates over each character automatically.

In [None]:
words = "Monty Python"
for l in words:
    print(l)

Recall that the loop variable takes on each value in the sequence of names. The body is performed once for each name.

In [None]:
for aname in ["Eric", "Graham", "John", "Michael", "Terry"]:
    invitation = "Hi " + aname + ".  Please come to my party on Saturday!"
    print(invitation)

### Traversal and looping: By Index

- It is also possible to use the range function to systematically generate the indices of the characters. 

- The for loop can then be used to iterate over these positions. 

- These positions can be used together with the indexing operator to access the individual characters in the string.


In [None]:
fruit = "apple"
for idx in range(5):
    currentChar = fruit[idx]
    print(currentChar)

Likewise:


In [None]:
fruit = "apple"

position = 0
while position < len(fruit):
    print(fruit[position])
    position = position + 1

In [None]:
fruit = "apple"

for letter in fruit:
    print(letter)
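
The index-based loop above hard-codes `range(5)`, which only fits a five-letter word. A possible variation, sketched here with a hypothetical longer word, derives the range from `len` so the loop works for any string.

```python
fruit = "pineapple"   # hypothetical longer word, so nothing is hard-coded

collected = []
for idx in range(len(fruit)):      # indices 0 .. len(fruit) - 1
    collected.append((idx, fruit[idx]))

print(collected)
```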

## 5.1.6 Slices

### Slices

- A substring of a string is obtained by taking *a slice*

- Similarly, we can slice a list to refer to some sublist of the items
    in the list


In [None]:
singers = "Eric, Graham, John, Michael, Terry"
print(singers[0:4])
print(singers[14:18])
print(singers[:-22])
print()
print(singers[:])
print(singers[4:])
print(singers[-5:-3])
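
As a smaller worked example of the slice notation `[start:end]` (start included, end excluded), the sketch below uses the string `"banana"`. The variable names are illustrative only.

```python
fruit = "banana"

head = fruit[0:3]        # characters at indices 0, 1, 2
tail = fruit[3:]         # from index 3 to the end
last_three = fruit[-3:]  # negative indices work in slices too
clamped = fruit[2:100]   # an out-of-range end is clamped, no error

print(head, tail, last_three, clamped)
print(head + tail == fruit)   # the two halves rebuild the original
```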

## 5.1.7 String comparison

### String comparison

- The comparison operators work on strings

- To see if two strings are equal:

```
  if word == "banana":
      print("Yes, we have bananas!")
```


In [None]:
word = "zebra"

if word < "banana":
    print("Your word, " + word + ", comes before banana.")
elif word > "banana":
    print("Your word, " + word + ", comes after banana.")
else:
    print("Yes, we have no bananas!")

Strings are compared lexicographically by comparing corresponding elements. Lexicographical order is similar to alphabetical order, so A comes before B comes before C.

There are sometimes unexpected results, for example case-sensitive ordering where "B" comes before "a" because the capital letters are ordered before lowercase ones.

In [None]:
print("zebra" < "banana")

In [None]:
print("Zebra" < "banana")

In [None]:
print("zebra" < "Banana")
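
The case-sensitive ordering can be made visible with the built-in `ord` function, which returns the code of a character. The helper `comes_before` below is a hypothetical name, sketching one common way to compare while ignoring case.

```python
# ord() gives the character code that the comparison is based on.
print(ord("Z"), ord("a"), ord("b"))   # uppercase letters have smaller codes

def comes_before(first, second):
    """Case-insensitive 'comes before' test (hypothetical helper)."""
    return first.lower() < second.lower()

print("Zebra" < "banana")               # True: "Z" sorts before "b"
print(comes_before("Zebra", "banana"))  # False once case is ignored
```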

## 5.1.8 Strings are immutable

### Strings are immutable

- Strings are **immutable**, which means you can't change an existing
    string

- The best you can do is create a new string that is a variation on
    the original

```
  greeting = "Hello, world!"
  greeting[0] = 'h'            # ERROR!

  greeting = "h" + greeting[1:]
  print(greeting)
```

In [None]:
greeting = "Hello, world!"
greeting[0] = 'h'

In [None]:
greeting = "Hello, world!"
greeting = "h" + greeting[1:]
print(greeting)
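
The two cells above can be combined into one sketch that catches the error, which makes it easier to see that the original string is never modified. The name `lowered` is introduced here only for illustration.

```python
greeting = "Hello, world!"

try:
    greeting[0] = "h"              # item assignment is not supported
except TypeError as error:
    print("Cannot modify in place:", error)

# Build a new string instead of changing the old one.
lowered = "h" + greeting[1:]
print(greeting)   # the original string is unchanged
print(lowered)
```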

## 5.1.9 The `in` and `not in` operators

### The `in` and `not in` operators

- The `in` operator tests for membership

- The `not in` operator returns the logical opposite results of `in`

```
  >>> "p" in "apple"
  True
  >>> "i" in "apple"
  False
  >>> "apple" in "apple"
  True
  >>> "" in "a"
  True
  >>> "x" not in "apple"
  True
```



In [None]:
print('p' in 'apple')
print('i' in 'apple')
print('ap' in 'apple')
print('pa' in 'apple')
print('apple' in 'apple')  # a string is a substring of itself

In [None]:
def remove_vowels(phrase):
    vowels = "aeiou"
    string_sans_vowels = ""
    for letter in phrase:
        if letter.lower() not in vowels:
            string_sans_vowels += letter
    return string_sans_vowels


phrase = "Programming"
print()
print(phrase)
print(remove_vowels(phrase))

## 5.1.10 A `find` function

### `enumerate` function

- Enumerate is a built-in function of Python


```
 enumerate(iterable, start=0)
```

- Return an enumerate object

- iterable must be a sequence, an iterator, or some other object which supports iteration



In [None]:
some_list = ["spam", "eggs", "ham"]   # example data so the cell runs
for counter, value in enumerate(some_list):
    print(counter, value)

In [None]:
my_list = ['apple', 'banana', 'grapes', 'pear']
for c, item in enumerate(my_list, 1):
    print(c, item)

In [None]:
words = "Eric Idle, Michael Palin"
for i, ch in enumerate(words):
    print(i, ":", ch)
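
Since `enumerate` returns an enumerate object rather than a list, it can help to wrap it in `list` to look at the pairs it produces. A minimal sketch:

```python
# Each item is an (index, value) pair.
pairs = list(enumerate("abc"))
print(pairs)

# The optional start argument changes the first counter value.
numbered = list(enumerate(["apple", "banana"], start=1))
print(numbered)
```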

### A `find` function

- In a sense, find is the opposite of the indexing operator

- What does the following function do?

```
  def my_find(haystack, needle):
      """
      No clues here!
      """
      for index, letter in enumerate(haystack):
          if letter == needle:
              return index
      return -1
```


In [None]:
def my_find(haystack, needle):
    """
    Find and return the index of needle in haystack.
    Return -1 if needle does not occur in haystack.
    """
    for index, letter in enumerate(haystack):
        if letter == needle:
            return index          # early return: stop at the first match
    return -1


haystack = "Bananarama!"
#
print()
print(my_find(haystack,'a'))
#
## the standard find method
#print(haystack.find('a'))

## 5.1.11 Looping and counting

### Looping and counting

- Another example of the **counter** pattern (introduced in *Counting
    digits*)

```
  def count_a(text):
      count = 0
      for letter in text:
          if letter == "a":
              count += 1
      return count
  
  print(count_a("banana") == 3)
```



Try this one:

In [None]:
def count(text, aChar):
    lettercount = 0
    for c in text:
        if c == aChar:
            lettercount = lettercount + 1
    return lettercount

print(count("banana","a"))
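
Python strings also provide a built-in `count` method. The sketch below compares it with a hand-written traversal; `count_char` is a hypothetical name chosen to avoid confusion with the method.

```python
def count_char(text, a_char):
    """Count occurrences of a_char in text by traversal (hypothetical name)."""
    total = 0
    for c in text:
        if c == a_char:
            total += 1
    return total

# Both approaches should agree on every input.
for text in ["banana", "", "Monty Python"]:
    print(repr(text), count_char(text, "a"), text.count("a"))
```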

## 5.1.12 Optional parameters

### Optional parameters

- To find the location of the second or third occurrence of a
    character in a string, we can modify the `find` function, adding a
    third parameter for the starting position in the search string

- Better still, we can combine `find` and this three-parameter version
    (`find2`) using an **optional parameter**:

```
  def find(haystack, needle, start=0):
      for index,letter in enumerate(haystack[start:]):
          if letter == needle:
              return index + start
      return -1
```


In [None]:
def find(astring, achar, start=0):
    """
    Find and return the index of achar in astring.
    Return -1 if achar does not occur in astring.
    """
    ix = start
    found = False
    while ix < len(astring) and not found:
        if astring[ix] == achar:
            found = True
        else:
            ix = ix + 1
    if found:
        return ix
    else:
        return -1

print(find('banana', 'a', 2))

## 5.1.13 The built-in `find` method

### The built-in `find` method

- The built-in `find` method is more general

- It can find substrings, not just single characters:

```
  >>> "banana".find("nan")
  2
  >>> "banana".find("na", 3)
  4
```
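
The optional start argument of the built-in `find` makes it possible to locate every occurrence, not just the first. The helper `find_all` below is a hypothetical name, and the loop is only one possible way to do this.

```python
def find_all(haystack, needle):
    """Return every index where needle starts in haystack (hypothetical helper)."""
    positions = []
    start = haystack.find(needle)
    while start != -1:                            # -1 means "no more matches"
        positions.append(start)
        start = haystack.find(needle, start + 1)  # resume just after the last match
    return positions

print(find_all("banana", "a"))
print(find_all("banana", "na"))
print(find_all("banana", "x"))
```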

## 5.1.14 The `split` method

### The `split` method

- One of the most useful methods on strings is the split method

- It splits a single multi-word string into a list of individual
    words, removing all the whitespace between them<sup>1</sup>

```
  >>> phrase = "Oh, that's jolly good. Well, off you go then"
  >>> words = phrase.split()
  >>> words
  ['Oh,', "that's", 'jolly', 'good.', 'Well,', 'off', 'you', 'go', 'then']
```

<sup>1</sup> Whitespace means any tabs, newlines, or spaces.

In [None]:
phrase = "Oh, that's jolly good. Well, off you go then"
print(phrase.split())
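
`split` also accepts an explicit separator, and the string method `join` goes in the opposite direction. A short sketch, with made-up example strings:

```python
line = "Eric, Graham, John"
names = line.split(", ")        # split on an explicit separator
print(names)

rejoined = " & ".join(names)    # join glues the pieces back together
print(rejoined)

# With no argument, a run of whitespace counts as one separator.
pieces = "  jolly   good\tshow\n".split()
print(pieces)
```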

## 5.1.15 Cleaning up your strings

### Cleaning up your strings

- We'll often work with strings that contain punctuation, or tab and
    newline characters

- But if we're writing a program, say, to count word frequency, we'd
    prefer to strip off these unwanted characters.

- We'll show just one example of how to strip punctuation from a
    string

    - we need to traverse the original string and create a new string,
        omitting any punctuation


In [None]:
s = "  I'm not        clean.  "
print(s)
print(s.strip())
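
The traversal described above, which builds a new string and omits punctuation, might look like the sketch below. It relies on `string.punctuation` from the standard library; the function name `remove_punctuation` is an assumption made for this example.

```python
import string

def remove_punctuation(text):
    """Return a copy of text without punctuation (hypothetical helper)."""
    cleaned = ""
    for ch in text:
        if ch not in string.punctuation:   # keep everything that is not punctuation
            cleaned += ch
    return cleaned

s = "  I'm not        clean.  "
words = remove_punctuation(s).split()      # split also discards the extra whitespace
print(words)
```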

## 5.1.16 The string `format` method

### The string `format` method

- The easiest and most powerful way to format a string in Python 3 is
    to use the `format` method

- The template string contains place holders, ... `{0}` ...`{1}`
    ...`{2}` ...etc

- The format method substitutes its arguments into the place holders

- To see how this works, let's start with a few examples:

```
  phrase = "His name is {0}!".format("Arthur")
  print(phrase)

```




In [None]:
name = "Alice"
age = 10
phrase = "I am {0} and I am {1} years old.".format(name, age)
print(phrase)

### Format specification

- Each of the replacement fields can also contain a **format
    specification**

- This modifies how the substitutions are made into the template, and
    can control things like:

    - whether the field is aligned to the left **`<`**, center
        **`^`**, or right **`>`**

    - the width allocated to the field within the result string (a
        number like 10)

    - the type of conversion: we'll initially only force conversion to
        float, **`f`**, or perhaps ask for integer numbers to be
        converted to hexadecimal using **`x`**

    - if the type conversion is a float, you can also specify how many
        decimal places are wanted: typically, **`.2f`** is useful for
        working with currencies to two decimal places



## f-strings

### f-strings

[Python 3's f-Strings: An Improved String Formatting Syntax (Guide)](https://realpython.com/python-f-strings/)


```
name = "Eric"
age = 74
f"Hello, {name}. You are {age}."
```


Because f-strings are evaluated at runtime, you can put any and all valid Python expressions in them:

In [None]:
f"{2 * 37}"

But you could also call functions. Here’s an example:

In [None]:
def to_lowercase(input):
    return input.lower()

name = "Eric Idle"
f"{to_lowercase(name)} is funny."

You also have the option of calling a method directly:

In [None]:
f"{name.lower()} is funny."

f-strings are faster than both %-formatting and str.format(). 

As you already saw, f-strings are expressions evaluated at runtime rather than constant values.
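
The format specifications used with the `format` method also work inside f-strings, placed after a colon within the braces. A short sketch, where `price` is a made-up value:

```python
name = "Eric"
age = 74
price = 1234.5   # hypothetical value for illustration

greeting = f"Hello, {name}. You are {age}."
aligned = f"[{name:>8}]"            # right-aligned in a field of width 8
money = f"{price:.2f}"              # two decimal places
next_year = f"{name} will be {age + 1} next year."   # any expression is allowed

print(greeting)
print(aligned, money)
print(next_year)
```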

[PEP 498 -- Literal String Interpolation](https://www.python.org/dev/peps/pep-0498/#abstract)
