# Week 06, Worksheet 0: Strings

<div class="alert alert-block alert-info">
    This worksheet uses to-do markers where work needs to be completed. In some cases, this means that you'll need to add a line or two to an example. In other cases (such as the final exercise), you may need to solve an entire problem.
</div>

## `string`ing you along

To this point in the semester, we've worked with `string` objects largely through `print` statements. While we will still do some of that, this week, we're exploring the more advanced world of what lies beneath the surface.

### What is a string?

The "textbook" definition of the `string` is that it's simply a collection of _symbols_. These symbols can be anything that is computer-recognizable. Typically this means letters (`qweituortrgdafgdg`), symbols (`##@*!#!` -- again, I'm not _that_ upset about it), or numbers-as-symbols. This last one is a little strange. Just remember that:

$$ 4 \neq \text{"4"} $$

One of the above, `4`, is the _integer_ representation -- a number. The other, `"4"`, is the `string` representation of the symbol `4`. This is why we get the lovely `TypeError` when we attempt to add the two together:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-ebbc32cdfe85> in <module>
----> 1 s = 4  + "4"

TypeError: unsupported operand type(s) for +: 'int' and 'str'
```
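One way around the error is to convert one side so that both operands share a type. Here's a minimal sketch using the built-in `int` and `str` functions (the variable names `total` and `combined` are just for illustration):

```python
# Convert the string to an integer to do arithmetic
total = 4 + int("4")
print(total)  # 8

# Convert the integer to a string to glue the symbols together
combined = str(4) + "4"
print(combined)  # 44
```

Which conversion is "right" depends on whether we want math or text.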

Back to the definition: we can then say that a `string` is simply a group of symbols known as _characters_ which can display nearly anything that language can provide.

But, I can also tell you that your computer doesn't actually care about the symbol. That's just a human convenience. All a system really cares about is that _each symbol is stored at a different place_ in a gigantic set of characters called the Unicode Coded Character Set (or just "Unicode"). The majority of English-language characters are stored in a _subset_ (that is, a smaller portion) of this standard that we call [ASCII (the American Standard Code for Information Interchange)](http://www.asciitable.com/).

This table, known as the "ASCII Table," pairs each symbol with a decimal (or `DEC`) number that identifies it. Here's what that looks like from a programming perspective:

In [None]:
# Notice that they're different

print("The ASCII character code for 'a':",ord("a"))
print("The ASCII character code for 'A':",ord("A"))

Here, we use a built-in function called `ord` to get the _ordinal_ (that is, the Unicode "code point," or address) of a given character. Again, because the _symbol_ `A` looks and behaves differently from the symbol `a`, their addresses in the table are _different_. Try it for yourself below.

#### 1a. `print` the ordinal values of at least `4` different characters or symbols.

In [None]:
#

If we have _ordinal_ values (numbers), we can turn them back into characters using `ord`'s opposite, `chr`, with the same syntax as we would use for `ord`.
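As a quick sketch of the round trip (using values that aren't part of the exercise, with variable names chosen only for illustration):

```python
# ord goes from character to number; chr goes back again
code = ord("A")
letter = chr(code)
print(code, letter)  # 65 A

# Characters outside of ASCII have code points above 127
accented = ord("é")
print(accented)  # 233
```

Every ASCII character has a code point between `0` and `127`; anything larger belongs to the wider Unicode set.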

#### 1b. `print` the character values of the following `5` ordinal values:

* `71`
* `46`
* `87`
* `105`
* `122`

In [15]:
# 

W


### Full disclosure

I am required by the Ethical Code of All Computer Science Teachers<sup>TM</sup> to make you aware of the fact that I've been telling you a convenient lie all semester.

<img src = "https://i.imgflip.com/4i1d15.jpg">

This is because `string` objects, while appearing like any other _data type_ (`float`, `int`, `boolean`, et al.) really _aren't_. We can treat `string`s like a _data type_ **and** a _data structure_. 

Let's take a look at an example.

In [None]:
cat_name = "Ulysses"

for letter in cat_name:
    print(letter)

Notice that _each `character`_ `print`s on a separate line. This is because `string`s function a lot like a _data structure_ -- specifically `list`s. 

### Similarities to `list`s

* a _known length_
* access via _indexes_
* the ability to _slice_ them
* the ability to _iterate_ over them

In [None]:
# known length
print("cat_name length:", len(cat_name))

# access via index
print("cat_name index 4:", cat_name[4])

# ability to slice
print("cat_name sliced:", cat_name[2:5])

print("Iteration:")
for letter in cat_name:
    print(letter)

### Differences from `list`s

* `string`s are _immutable_: we can't change them _directly_
* different _methods_ than a `list`
* `string`s only accept `string` data (we can't mix data types)
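To make the first difference concrete, here's a small sketch comparing the same assignment on a `list` and on a `string` (the names `letters` and `assignment_failed` are made up for this example):

```python
cat_name = "Ulysses"

# A list lets us swap out an item in place
letters = list(cat_name)
letters[0] = "u"
print(letters)

# A string refuses the same kind of assignment
assignment_failed = False
try:
    cat_name[0] = "u"
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

print(cat_name)  # still "Ulysses"
```

The `try`/`except` is only there so the example keeps running after the error.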

#### Methods

We've learned a bit about _methods_ in this course, though we haven't spent a lot of time with them. Specifically, we know things like `append` and `remove` from our use of them with `list`s. Regardless of the object type we're working with, the syntax is always the same:

$$ variable\_name.method\_name(arguments) $$

As concerns `string`s, we have quite a few things we can do with them (this is a _small sampling_):

| Method | Argument(s) | Effect |
|--------|-------------|--------|
|`.lower()`| None | Returns a copy of the `string` in lower case |
|`.upper()`| None | Returns a copy of the `string` in upper case |
|`.count()`|`string` to count instances of| Counts the number of times a given substring appears in a `string` |
|`.endswith()`|`string`/`tuple` of `string`s to look for | Returns `True` if the `string` ends with the argument, otherwise `False` |
|`.startswith()`|`string`/`tuple` of `string`s to look for| Returns `True` if the `string` starts with the argument, otherwise `False` |
|`.replace()`|`string` to find, `string` to replace it with, optional `integer` number of replacements| Returns a copy with the first `N` matches replaced (all matches if `N` is omitted) |
|`.split()`|`string` on which to "split" the `string` (default: whitespace) | Splits a `string` into parts (a `list`) |
|`.join()`|`list` of `string`s to "glue" together | Fuses the items of the `list` into one `string` |
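A few of the less obvious arguments are easier to see in action. This is just a sketch; the `phrase` below and the variable names are invented for the example:

```python
phrase = "the cat sat on the mat"

# count: how many times does "at" appear? (cat, sat, mat)
at_count = phrase.count("at")
print(at_count)  # 3

# replace with a limit: only the first "the" changes
first_only = phrase.replace("the", "a", 1)
print(first_only)  # a cat sat on the mat

# endswith with a tuple: True if any one of the endings matches
ends_ok = phrase.endswith(("mat", "rug"))
print(ends_ok)  # True

# split on something other than whitespace
parts = "a,b,c".split(",")
print(parts)  # ['a', 'b', 'c']
```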

In the above table, `join()` behaves a bit differently than the others. See the following:

In [5]:
sentences = ["It was the best of times","it was the worst of times."]
", ".join(sentences)

'It was the best of times, it was the worst of times.'

The "glue" (the `string` that joins the items of the `list` together) comes _first_.

#### Detour into `string` immutability

We say a `string` is "immutable," though some of the methods above certainly look like they alter the contents of a `string`. Let's observe this hands on.

In [6]:
introduction = "My cat's name is Ulysses."
print(introduction.replace("Ulysses","The Boss"))
print(introduction)

My cat's name is The Boss.
My cat's name is Ulysses.


As we can see in the above example, calling the `replace` method on the `string` _doesn't change the underlying `string`_ (as shown when we `print` it again at the end). Instead, it creates a new `string` which contains the replacement. This is what we mean when we say that `string`s are _immutable_ -- unless we reassign the variable, the original data doesn't change.
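If we do want to keep the change, one common approach is to assign the result back to the same variable. A minimal sketch:

```python
introduction = "My cat's name is Ulysses."

# Assigning the result back to the same name keeps the new string
introduction = introduction.replace("Ulysses", "The Boss")
print(introduction)  # My cat's name is The Boss.
```

The original `string` was never edited; the variable simply points at the new one now.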

#### 2. Complete the following examples.

In [7]:
# Example 1

while True:
    choice = input("Tell me something (enter [N]o to quit): ")
    # use the upper method on choice to make either "n" or "N" valid choices to end the loop
    if choice.upper() == "N":
        print("Stop telling me stuff!")
        break

Tell me something (enter [N]o to quit):  N


Stop telling me stuff!


In [8]:
# Example 2

animals = ["cat","rat","antelope","bat","anteater","ant","abalone","python","Ulysses"]
for animal in animals:
    # Finish the following statement to print `True` if an animal's name starts with the letter "a"
    print(animal + ":",animal.startswith("a"))

cat: False
rat: False
antelope: True
bat: False
anteater: True
ant: True
abalone: True
python: False
Ulysses: False


In [9]:
# Example 3

# This is a lie
bad_message = "My cat Ulysses is the worst."
# Use the replace method to replace the word "worst" with the word "best"
good_message = bad_message.replace("worst","best")
print(good_message)

My cat Ulysses is the best.


In [10]:
# Example 4

quote = "A string divided cannot stand."

# Split quote on spaces
words = quote.split()
print(words)

# Join quote together using spaces
joined = " ".join(words)
print(joined)

['A', 'string', 'divided', 'cannot', 'stand.']
A string divided cannot stand.


In [11]:
# Example 5

verse = "Betty bought some butter, but the butter was bitter, so Betty bought some better butter to make the bitter butter better."

# Count the number of words that start with the letter b, display those words in a list
letter_b = 0
words = []

for word in verse.split():
    if word.lower().startswith("b"):
        letter_b += 1
        words.append(word)

print("Words starting with b:", letter_b)
print(words)

Words starting with b: 13
['Betty', 'bought', 'butter,', 'but', 'butter', 'bitter,', 'Betty', 'bought', 'better', 'butter', 'bitter', 'butter', 'better.']


### Final exercise

Knowing what we now know about strings, your final task is to combine a bit of knowledge we already have with a bit more that we gained in this worksheet. Your task is to write code that:

* takes `input` from the keyboard
* counts the total number of vowels in the `string` (vowels: `a`,`e`,`i`,`o`,`u`)
* counts the number of each individual vowel
* `print`s the total count of vowels
* `print`s the total count for _each vowel_
  * The letter should be separated from the number by a tab (`\t`)
* produces output that looks like the example below:

```
Enter a string:

Your input string: THE STRING INPUT ABOVE

The string has # vowels:
a     #
e     #
i     #
o     #
u     #
```

This must pass the test case input:

```
Come now, Roy, we will first make them bad hot coffee some snow would cool.
```

Keep in mind:

* capital `A` and lowercase `a` are _different_
* there are at least 3 different ways to do this, but they almost all involve:
   * a `for` loop
   * at least one method (likely more)
* this is the perfect application for a `dictionary` whose `keys` are the individual vowels
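The counting pattern itself can be sketched on a different problem: tallying how often each word appears in a sentence. This version assumes the dictionary method `get`, which we may not have used before; it returns a default (here `0`) when a key is missing. The `sentence` and variable names are invented for the example.

```python
sentence = "the cat and the hat and the bat"

word_counts = {}
for word in sentence.split():
    # Start from 0 the first time we see a word, then add 1
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
# {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```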

In [12]:
# One possible solution

# Start every vowel at 0 so all five appear in the output
vowels = {"a": 0, "e": 0, "i": 0, "o": 0, "u": 0}

input_str = input("Enter a string:")
total_vowels = 0

for char in input_str:
    # Normalize case so "A" and "a" are counted together
    char = char.lower()
    if char in vowels:
        vowels[char] += 1
        total_vowels += 1

print("Your input string:",input_str)
print("The string has",total_vowels,"vowels:")
for vowel in vowels:
    print(f"{vowel}\t{vowels[vowel]}")

Enter a string: This is a string.


Your input string: This is a string.
The string has 4 vowels:
a	1
e	0
i	3
o	0
u	0
