# Class 26-27: Strings and String Processing Examples
**COMP130 - Introduction to Computing**  
**Dickinson College**  

### `String`s and Offsets (indices)

- A `String` is a *sequence* of characters.  
- Each *item* (or *element*) in a *sequence* has an *offset* (or an *index*).  
- The offsets (or indices) start with `0`.
- Individual items in a sequence can be accessed using square brackets `[ ]`.

In [None]:
s = 'Hello COMP 130!'
print(s[0])
print(s[4])

In [None]:
funny = s[2] + s[4] + s[3]
print(funny)

In [None]:
okay = s[14]    # the last valid offset, since len(s) is 15
oopsie = s[15]  # past the end of the string: raises an IndexError
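The valid offsets for a `String` of length *n* run from `0` to *n - 1*, so the last character can always be reached with `len(s) - 1`. A minimal sketch using the same string:

```python
s = 'Hello COMP 130!'

# len(s) is 15, so the valid offsets are 0 through 14
last_index = len(s) - 1
print(last_index)      # 14
print(s[last_index])   # !
```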

### `String` Traversal

Offsets (indices) can be used to *traverse* a sequence (i.e., to go through its items one at a time). The loop *setup*, *condition*, and *update* determine the order in which the sequence is traversed.

In [None]:
s = "Traverse this."
index = 0
while index < len(s):
    letter = s[index]
    print(str(index) + " : " + letter)
    index = index + 1
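Changing the setup, condition, and update changes the order of the traversal. For example, starting at the last offset and counting down visits the characters from last to first. A sketch (the `backwards` variable is added here only to collect the characters in the order they are visited):

```python
s = "Traverse this."
backwards = ''
index = len(s) - 1              # setup: start at the last offset
while index >= 0:               # condition: keep going through offset 0
    letter = s[index]
    print(str(index) + " : " + letter)
    backwards = backwards + letter
    index = index - 1           # update: step one offset toward the front
print(backwards)                # .siht esrevarT
```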

### Traversal with `for in`

Traversing a sequence from the first element to the last element is a very common operation.  Thus, many languages provide a shorthand way to express this. In Python the *`for in`* loop traverses a sequence from the first element to the last element.  The *loop variable* is assigned to each element in turn and can be used to process the elements.
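For comparison, the `while` traversal of `"Traverse this."` from the previous section can be written with `for in`. This sketch counts the characters as it goes (the `count` variable is only there to show that every character is visited exactly once):

```python
s = "Traverse this."
count = 0

# The loop variable letter takes on each character of s, first to last.
# No index variable is needed: the for in loop manages the traversal.
for letter in s:
    print(letter)
    count = count + 1

print(count)  # 14
```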

In [None]:
def get_acronym(phrase):
    """ Get an acronym by returning a string containing
        all of the uppercase letters in the phrase.
    """
    
    acronym = ''
    for letter in phrase:
        if letter >= 'A' and letter <= 'Z':
            acronym = acronym + letter
        
    return acronym

acronym = get_acronym("As Soon As Possible!")
print(acronym)

# Note: Writing an is_cap_letter boolean function would help with readability here!
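Following the note above, one possible version of the hypothetical `is_cap_letter` helper is sketched below, along with `get_acronym` rewritten to use it. The behavior is unchanged; only the `if` condition becomes easier to read.

```python
def is_cap_letter(char):
    """ Return True if char is one of the uppercase letters 'A' through 'Z'.
    """
    # Hypothetical helper: same comparison as the original if condition
    return char >= 'A' and char <= 'Z'

def get_acronym(phrase):
    """ Get an acronym by returning a string containing
        all of the uppercase letters in the phrase.
    """
    acronym = ''
    for letter in phrase:
        if is_cap_letter(letter):
            acronym = acronym + letter
    return acronym

print(get_acronym("As Soon As Possible!"))  # ASAP
```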

In [None]:
assert get_acronym("As Soon As Possible!") == "ASAP", "Incorrect acronym"
assert get_acronym("fuNnY bUt Should be okAy") == "NYUSA", "Incorrect acronym"
assert get_acronym("should be empty") == "", "Did not handle empty case correctly"
assert get_acronym("SHOULD BE ALL") == "SHOULDBEALL", "Did not handle all-uppercase case correctly"
print("Success!")

### `String` Slices

The `[ ]` operator for accessing an element of a sequence can also be used to access *sub-sequences* consisting of more than one element.

In [None]:
# The ruler below the string marks offsets 0, 10, 20 and 30 (read each column top to bottom)
motto='A useful education for the common good.'
#      0         1         2         3
#                0         0         0 

sliceIt=motto[2:8]
print(sliceIt)

diceIt=motto[34:38]
print(diceIt)

diceIt=motto[len(motto)-5:len(motto)-1]
print(diceIt)

In [None]:
start=motto[:18]
print(start)

end=motto[23:]
print(end)
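Python also accepts negative offsets, which count back from the end of the sequence: `-1` is the last element, `-2` the one before it, and so on. So `motto[len(motto)-5:len(motto)-1]` can also be written `motto[-5:-1]`. Because a slice `[start:end]` includes `start` but stops just before `end`, it contains `end - start` elements. A short sketch:

```python
motto = 'A useful education for the common good.'

# Negative offsets count back from the end: -1 is the final '.'
print(motto[-1])        # .
print(motto[-5:-1])     # good

# A slice [start:end] includes start but stops just before end,
# so it contains end - start characters
piece = motto[2:8]
print(piece, len(piece))  # useful 6
```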

### `String`s are Immutable

A `String` in Python is *immutable*, meaning that the characters in the `String` cannot be changed. That is, assigning new characters to an index or a slice of a `String` is an error (Python raises a `TypeError`). Instead, a new `String` must be created by concatenating parts of the original `String` with the new character(s).

In [None]:
s = 'Hello COMP 130!'
s[0] = 'M'    # raises a TypeError: a String cannot be changed

In [None]:
motto='A useful education for the common good.'
motto[27:33] = 'greater'    # also raises a TypeError: slices cannot be assigned either

In [None]:
new_motto=motto[:27] + 'greater' + motto[33:]
print(new_motto)
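The same slice-and-concatenate approach can be wrapped in a function. `replace_slice` below is a hypothetical helper for illustration, not something built into Python: it builds and returns a new `String`, and the original is left unchanged.

```python
def replace_slice(text, start, end, new_text):
    """ Return a new string in which text[start:end] is replaced by new_text.
        The original string is not modified (strings are immutable).
    """
    return text[:start] + new_text + text[end:]

motto = 'A useful education for the common good.'
new_motto = replace_slice(motto, 27, 33, 'greater')
print(new_motto)   # A useful education for the greater good.
print(motto)       # A useful education for the common good.
```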

![Stop sign](stop.png)
End of Class 26 material.

### A Search Pattern

A common pattern that arises with sequences is to *search* through the elements of the sequence for a particular value or for a value that meets some criterion. As with a traversal, the loop *setup*, *condition*, and *update* determine the order in which the sequence is searched. The `if` *condition* detects when an item is the one being searched for.

In [37]:
def find_digit(line):
    """ Return the index of the first numeric digit in the
        given string, or -1 if line contains no digits.
    """
    index = 0
    while index < len(line):
        char = line[index]
        if char >= '0' and char <= '9':
            return index
        
        index = index + 1
        
    return -1

print(find_digit("abc 123"))

# Note: Writing an is_digit boolean function would help with readability here!

4
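Following the note in the cell above, a sketch of a hypothetical `is_digit` helper and a version of `find_digit` that uses it. The search logic is the same: return as soon as a match is found, and return `-1` only after every character has been checked.

```python
def is_digit(char):
    """ Return True if char is one of the digits '0' through '9'. """
    # Hypothetical helper: same comparison as the original if condition
    return char >= '0' and char <= '9'

def find_digit(line):
    """ Return the index of the first numeric digit in the
        given string, or -1 if line contains no digits.
    """
    index = 0
    while index < len(line):
        if is_digit(line[index]):
            return index    # found a match: stop searching immediately
        index = index + 1
    return -1               # checked every character without a match

print(find_digit("abc 123"))  # 4
```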


In [None]:
assert find_digit("abc 123") == 4, "Incorrect index of digit."
assert find_digit("123 abc") == 0, "Missed digit in first spot"
assert find_digit("last 1") == 5, "Missed digit in last spot"
assert find_digit("no digits for you") == -1, "Incorrect result with no digits"
assert find_digit("") == -1, "Does not work with empty string"
print("Success!")

### Aggregate Pattern

Another common pattern that arises with sequences is the computation of aggregate data. We saw this pattern earlier when working with simple `for` loops and computing totals, averages, and maximum and minimum values. The same techniques we used for those computations also work with sequences.

In [None]:
def count_vowels(line):
    """ Count the number of uppercase vowels (A, E, I, O, U) that appear in the line.
    """
    count = 0
    
    for char in line:
        if char=='A' or char=='E' or char=='I' or char=='O' or char=='U':
            count = count + 1
    
    return count

print(count_vowels('THIS IS A TEST'))

# Note: Writing an is_vowel boolean function would help with readability here!
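Following the note above, a sketch of a hypothetical `is_vowel` helper and `count_vowels` rewritten to use it. Like the original, it only counts uppercase vowels; the aggregate (the running `count`) is updated exactly as before.

```python
def is_vowel(char):
    """ Return True if char is one of the uppercase vowels A, E, I, O, U. """
    # Hypothetical helper: same test as the original if condition
    return char == 'A' or char == 'E' or char == 'I' or char == 'O' or char == 'U'

def count_vowels(line):
    """ Count the number of uppercase vowels that appear in the line. """
    count = 0
    for char in line:
        if is_vowel(char):
            count = count + 1
    return count

print(count_vowels('THIS IS A TEST'))  # 4
```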

In [None]:
assert count_vowels("THIS IS A TEST")==4, "Incorrect vowel count"
assert count_vowels("NT NY VWLS HR")==0, "Incorrect vowel count when no vowels"
assert count_vowels("AEIOU")==5, "Incorrect vowel count when all vowels"
print("Success!")