# Solutions: Practice Session 

## Built-in and custom functions

**Exercise.**
Recall one of the classifier programs from previous assignments (either the D&D classifier or the animal classifier).
Remember having to write several different `if`-statements to handle whether the user typed `Yes`, `yes`, `y` or `Y`?
Rewrite one of those programs using `string.lower` to simplify those `if`-statements.
Note that this also allows you to accept input like `YES`, `yEs`, and so on.

### Solution.

This exercise is almost entirely dependent on the way you chose to implement your classifier, but the important takeaway is that you can use `str.lower()` on the inputs to avoid having to write many `if` statements handling different capitalization on the inputs "yes" or "y". We can also further shorten it by checking if the input is in a list containing the possible answers "yes" and "y". This allows for all sorts of inputs ranging from `YeS` and `yES` to `Y` and `y`, which we know all mean the same thing.

In [5]:
eats_meat, has_hair = False, False

def yes(string_in):
    # True for any capitalization of "yes" or "y", otherwise False
    return str.lower(string_in) in ["yes","y"]

print("Is it an insect?")
if yes(input()):
    print("The animal is a beetle.")
else:    
    print("Does it hop?")   
    if yes(input()):
        print("The animal is a rabbit.")
    else:
        print("Does it eat meat?")
        if yes(input()):
            eats_meat = True

        print("Does it have hair?")
        if yes(input()):
            has_hair = True
            
        if eats_meat and has_hair:
            print("The animal is a cat.")
        elif eats_meat and not has_hair:
            print("The animal is a lizard.")
        elif not eats_meat and has_hair:
            print("The animal is a panda.")
        else:
            print("Stop messing around!")

Is it an insect?
n
Does it hop?
n
Does it eat meat?
n
Does it have hair?
y
The animal is a panda.
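
As a hedged variation on the `yes` helper above, the sketch below also strips surrounding whitespace before lowercasing, so an answer like `" Y "` is accepted as well. The name `is_yes` and the whitespace handling are assumptions added for illustration; they are not part of the assignment.

```python
def is_yes(answer):
    # Normalize the answer: drop surrounding whitespace, then lowercase it.
    normalized = answer.strip().lower()
    # The membership test already evaluates to True or False.
    return normalized in ["yes", "y"]

# Try a few spellings without needing keyboard input.
for answer in ["Yes", "yEs", " Y ", "no", "yep"]:
    print(repr(answer), is_yes(answer))
```

Because the function returns the result of the membership test directly, it always gives back a Boolean rather than falling off the end of the function.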


**Exercise.**
Many functions combine `for`, `if`, and `return` to accomplish their task.
When working with these tools, it is easy to introduce mistakes that do not cause the program to crash but make it behave in undesired ways.
This is a *semantic bug*.

The code below contains a crucial mistake on a single line.
Fix this line, and add a comment that explains what was wrong with it.

### Solution.

See block comment in the code for explanation.

In [None]:
def extract_odd_digits(some_string):
    # return list of all odd digits in string
    # empty list of found digits
    digits = []
    
    # check for each character if it is an odd digit
    for char in some_string:
        if char in ["1", "3", "5", "7", "9"]:
            list.append(digits, char)
    return digits # THIS LINE HAD THE MISTAKE: it was indented one level too far.
    # With the return statement inside the for loop, the function
    # returns during the first iteration, so it only ever looks at
    # the first character of the string. Unindenting it makes the
    # function return after the loop has processed every character.

# test the function
print(extract_odd_digits("1 hello i was born in 1823"))
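
To see the effect of the indentation, here is a small side-by-side sketch. It assumes the original bug placed `return` at the indentation level of the `for`-loop body; the name `first_iteration_only` is made up for this comparison.

```python
def first_iteration_only(some_string):
    # Buggy variant: the return sits inside the for-loop,
    # so the function exits after the first character.
    digits = []
    for char in some_string:
        if char in ["1", "3", "5", "7", "9"]:
            digits.append(char)
        return digits

def extract_odd_digits(some_string):
    # Fixed variant: the return runs after the loop has finished.
    digits = []
    for char in some_string:
        if char in ["1", "3", "5", "7", "9"]:
            digits.append(char)
    return digits

text = "1 hello i was born in 1823"
print(first_iteration_only(text))  # ['1']
print(extract_odd_digits(text))    # ['1', '1', '3']
```

Neither version crashes, which is what makes this kind of semantic bug easy to miss.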

**Exercise.**
As with the previous exercise, the code below contains a hidden mistake on a single line.
On another line, it also contains a different mistake that will crash the program.
Spot the mistakes, fix them, and add comments that explain the respective issue.

### Solution.

See lines marked with `***ERROR # HERE***`. 

**Error 1** occurs because a `Counter` stores its data as key-value pairs. In other words, every element of a `Counter` is a character we can look up (like a word in a dictionary), associated with a numerical value representing the count of that character. The `for`-loop iterates over `char_counter` with a variable named `count`, but naming the variable `count` doesn't mean we're actually accessing a number. Iterating over a `Counter` yields its keys, so `count` holds the characters in `char_counter`, not their counts. So if we just say `if count > 5`, Python tries to compare a string to an integer. There is no way to determine whether `"h" > 5`, for example, so the program crashes with a `TypeError`.

To fix this issue, we need to actually address the numerical count associated with a given letter. The characters of `char_counter` act as keys, so in order to get their values we can say `char_counter[count]`. This will fix the problem of the program crashing. If it helps, try renaming `count` to something else, like `c`, since the variable itself is a bit misleading.

As for **Error 2**, the problem is that the `if` statement is outside of the `for`-loop, so the message about having a lot of tokens for one character might never print even if the string has characters with more than 5 tokens. Why? Because the `for`-loop resets `message` at each iteration. Unless the last character it loops over in `char_counter` has more than 5 tokens, `message` will not be `True` by the time an `if` statement outside of the loop checks it. By moving the check inside the loop, it is done at each iteration, so the message prints every time a character has more than 5 tokens.

In [None]:
from collections import Counter

def char_counter(some_string):
    # count tokens for each character
    char_counter = Counter(some_string)

    # test if some character occurs very often
    for count in char_counter:
        message = False
        if char_counter[count] > 5: # ***ERROR 1 HERE***
            # you must access by "char_counter[count]" rather than "count"
            message = True
        # print message if at least one character occurred very often
        if message == True: # ***ERROR 2 HERE*** 
            # indent so that this is happening inside for loop
            print("Wow, that's a lot of tokens for one character")

# test the function
char_counter("hhhhhhelllo worllld")
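
The behaviour behind Error 1 can be checked in isolation. The following sketch uses the same test string; the variable names `counts` and `frequent` are chosen here for illustration only.

```python
from collections import Counter

counts = Counter("hhhhhhelllo worllld")

# Iterating over a Counter yields its keys (the characters), not the counts.
for char in counts:
    print(repr(char), counts[char])

# Comparing a key directly with a number fails, as described for Error 1.
try:
    "h" > 5
except TypeError as error:
    print("TypeError:", error)

# items() hands out each character together with its count.
frequent = [char for char, n in counts.items() if n > 5]
print(frequent)  # ['h', 'l']
```

Looping over `counts.items()` is one way to avoid the misleading variable name altogether, since the character and its count each get their own name.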

## `for`-loops

**Exercise.**
Write a function that takes a string as its input and prints each character that is an English vowel (a, e, i, o, u).
For example, with `toy` the function prints `o`, whereas `banana` results in `a a a` being printed to the screen.
See below for how you can have multiple `print` statements appear on the same line.

### Solution.

We want to print only the vowels of a given string. So, using a `for`-loop, look at every letter in the string. If the letter is a vowel (as in, it is in the list of vowels _aeiou_), print the letter. That's it!

In [None]:
def vowels_only(string):
    for letter in string:
        if letter in ['a','e','i','o','u']:
            print(letter, end=" ")

# test the function
vowels_only(input())

**Optional exercise.**
This continues the previous exercise.
Modify the function so that instead of printing the string, it returns it.
Depending on what solution you came up with for the previous exercise, this may require a major redesign of your function.

### Solution.

One way to do this is to create an empty string `vowels`, to which we will add the vowels we encounter. This is essentially the same as the function in the previous exercise, but for each letter in the string, if it is a vowel, we add it to the `vowels` string instead of printing it. A whitespace can also be added to maintain the same output as the previous function. Then, after looking at all the letters in the string, we return the `vowels` string.

In [None]:
def vowels_only(string):
    vowels = ""
    for letter in string:
        if letter in ['a','e','i','o','u']:
            vowels = vowels + letter + " "
    return vowels

# test the function
print(vowels_only(input()))
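
One side effect of appending `letter + " "` is a trailing space at the end of the result. As a possible alternative (not the required solution), the sketch below collects the vowels in a list and joins them afterwards; `vowels_joined` is a hypothetical name.

```python
def vowels_joined(string):
    # Collect the vowels in a list first.
    found = []
    for letter in string:
        if letter in "aeiou":
            found.append(letter)
    # join() puts a space between the items, so there is no trailing space.
    return " ".join(found)

print(vowels_joined("banana"))  # a a a
print(vowels_joined("toy"))     # o
```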

**Exercise.**
Write a program that looks at these three strings:

- *supercalifragilisticexpialidocious*,
- *sesquipedalianism*, and
- *squirreled*.

The program should print the length of each string and the average character length of all three strings.

### Solution.

Although the exercise specifies three particular strings, we can generalize our program by writing a function that computes the average length of a list of strings. To do this, we need to initialize a variable that will store the total number of characters of all the strings combined, to be used later for computing the average.

Using a `for`-loop, we look at each string in the list of strings, print the string and its length, and then add the length of the string to the `total_len`. After looking at all strings and adding up their lengths, we return the value given by dividing `total_len` by the length of the `strings` list, which is the number of strings we are working with. This gives us the average character length across the strings in the list.

A common mistake here is initializing `total_len` inside of the `for`-loop, which resets it to 0 at each string, so the final total only reflects the last string.

In [None]:
def compute_avg_len(strings):
    total_len = 0
    for s in strings:
        print(s, len(s))
        total_len += len(s)
    return (total_len / len(strings))

strings = ["supercalifragilisticexpialidocious", 
           "sesquipedalianism", 
           "squirreled"]
print("Average character length of strings:", 
      compute_avg_len(strings))
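
The function above divides by `len(strings)`, so it would fail on an empty list. A more defensive sketch might look like the following; returning `None` for an empty list is an assumption made here, as is the name `compute_avg_len_safe`.

```python
def compute_avg_len_safe(strings):
    # An empty list has no average; return None instead of dividing by zero.
    if len(strings) == 0:
        return None
    # sum() adds up the individual lengths in one step.
    total_len = sum(len(s) for s in strings)
    return total_len / len(strings)

strings = ["supercalifragilisticexpialidocious",
           "sesquipedalianism",
           "squirreled"]
print(compute_avg_len_safe(strings))
print(compute_avg_len_safe([]))  # None
```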

**Exercise.**
Write a function that takes a list of classes and counts the number of LIN classes.
For instance, the input `["LIN 120", "LIN 347", "AMS 161", "LIN 405", "EST 581"]` should yield the output `3`.

### Solution.
We want to write a function that takes a list of classes and counts the number of LIN classes. We start by initializing a counter to store the number of times we encounter "LIN". Since the function takes in a list of classes, we can use a `for`-loop to look at each class in the list. If we find the string "LIN" in the class we're currently looking at, we add 1 to our counter. After looking at all the classes in the list, we return the total count.

In [None]:
def count_lin(classes):
    count = 0
    for c in classes:
        if "LIN" in str.upper(c):
            count += 1
    return count

# test the function
classes = ["LIN 120", "LIN 347", "AMS 161", "LIN 405", "EST 581"]
count_lin(classes)

There are a couple of places where this could go wrong. One would be initializing the counter inside of the `for`-loop, which will cause the final count to be at most 1 since it is being reset for each class in the list. Alternatively, a newbie may forget to initialize the counter at all, in which case the program will crash because it doesn't know what to do with this `count` thing you are trying to increment.

One must also be careful not to return the count within the `for`-loop or `if` statement. Doing so in the former case would mean the function only ever looks at the first class in the list, whereas in the latter case the loop will stop and return the count as soon as it encounters the first "LIN" in the list.
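
Two of these pitfalls can be reproduced with deliberately broken variants of the function. The names `count_lin_reset` and `count_lin_early_return` are invented for this sketch.

```python
def count_lin_reset(classes):
    # Pitfall: the counter is re-created on every pass through the loop.
    count = 0  # only here so the function still works on an empty list
    for c in classes:
        count = 0
        if "LIN" in str.upper(c):
            count += 1
    return count

def count_lin_early_return(classes):
    # Pitfall: return inside the loop, so only the first class is inspected.
    count = 0
    for c in classes:
        if "LIN" in str.upper(c):
            count += 1
        return count

classes = ["LIN 120", "LIN 347", "AMS 161", "LIN 405", "EST 581"]
print(count_lin_reset(classes))         # 0, because the last class is not LIN
print(count_lin_early_return(classes))  # 1, only "LIN 120" was seen
```

Both variants run without errors, yet neither produces the expected `3`.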

## Regular Expressions

**Exercise.**
Suppose you are creating a chatbot, but you do not want it developing a potty-mouth.
Knowing that people can be pretty inventive with their profanity, you decide to ban all 4-letter words from your data.
Write a regular expression that will delete all 4 letter words from a string.

For example, `Hello, I want you to know my name is Bill` should become `Hello, I  you to  my  is ` (removing the extra space is optional).

### Solution.

In order to specify the pattern that matches a string of certain length, we need to:
1. include the possible things that might come before the string
2. define the properties of the string we're looking for (i.e., any word of length 4)
3. include the possible things that might come after the string

So, if we are looking for a string that is only length 4, not just any string containing at least 4 characters, we need to distinguish that we are looking for word boundaries (`\b`) before and after the string. We don't necessarily want to look for whitespace (`\s`) because this will cause the first and last words to be ignored - first word isn't preceded by a whitespace, last word isn't followed by a whitespace. If we try to use `\W` we'll encounter a similar problem. To specify that we want 4 characters, we can use `\w{n}`, where `n` is the number of characters, in this case 4.

In [None]:
import re 

def ban4(some_string):
    some_string = re.sub(r'\b\w{4}\b', r'', some_string) # COMPLETED LINE
    return some_string

# code for testing your regular expression
data = input("Enter some words: ")
while data.lower() != 'quit':
    print('Censored data: ', ban4(data))
    data = input("Enter some words: ")
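
The difference between `\s` and `\b` is easiest to see by running both patterns on the example sentence from the exercise. This is only a comparison sketch; the variable names are chosen for illustration.

```python
import re

sentence = "Hello, I want you to know my name is Bill"

# Whitespace on both sides: "Bill" has no following whitespace, so it
# survives, and the matched spaces are deleted along with the words.
with_whitespace = re.sub(r'\s\w{4}\s', '', sentence)
print(repr(with_whitespace))

# Word boundaries match positions rather than characters, so the last
# word is handled too and the surrounding spaces stay in place.
with_boundaries = re.sub(r'\b\w{4}\b', '', sentence)
print(repr(with_boundaries))
```

The first call prints `'Hello, Iyou tomyis Bill'`, while the second prints `'Hello, I  you to  my  is '`, which is the output the exercise asks for.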

**Exercise.**
Write a custom function that takes a single word in the form of a string and returns whether it is a *well-formatted token*.
For the purposes of this exercise, a well-formatted token is one that only contains lowercase letters, digits, and/or apostrophes (`'`).
In other words, it **must not contain**

1. uppercase letters (`ABC...`)
1. punctuation (`!?.,;-`)
1. special characters (`@#%...`)
1. white space (space or tab)

For example: `Hello!` is not well-formatted, nor is `why?`, `Hi`, or `John's`.
But `me`, `you`, `all`, `hello`, `why` are all well-formatted tokens.

Your function should return `True` if the string is a well-formatted token and `False` if it is not.

### Solution.

We are looking for any character that a well-formatted token must not contain: uppercase letters, punctuation, special characters, and white space. It is tempting to write this as `([A-Z]|\W)`, i.e. "an uppercase letter OR a non-word character", but `\W` also matches the apostrophe, which the exercise explicitly allows, and its counterpart `\w` accepts the underscore, which is not on the list of allowed characters. It is simpler to negate the **set of allowed characters**: as a regex, `[^a-z0-9']` matches any single character that is **not** a lowercase letter, a digit, or an apostrophe.

Now that we've concocted the pattern we're trying to match, we can simply search for it using `re.search()`. If the pattern is matched anywhere in the token, the token contains a forbidden character, so we return `False` (the token is not well formatted); otherwise we return `True`.

In [None]:
import re

def is_well_formatted(token):
    # return True if the token is well formatted, otherwise False
    # the pattern matches any character that is NOT a-z, 0-9, or an apostrophe
    if re.search(r"[^a-z0-9']",token):
        return False
    else:
        return True

# test the function
print(is_well_formatted(input()))
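
Since the exercise defines a well-formatted token by the characters it may contain, the check can also be phrased positively. The sketch below is one possible formulation using `re.fullmatch()`; the name `is_well_formatted_strict`, the restriction to ASCII letters, and the decision to reject the empty string are all assumptions rather than part of the exercise.

```python
import re

def is_well_formatted_strict(token):
    # fullmatch() succeeds only if the ENTIRE token consists of
    # lowercase ASCII letters, digits, and apostrophes.
    return re.fullmatch(r"[a-z0-9']+", token) is not None

# Non-interactive test with the examples from the exercise and a few extras.
for token in ["hello", "why?", "Hi", "John's", "don't", "route66"]:
    print(token, is_well_formatted_strict(token))
```

Here `John's` is rejected because of the uppercase `J`, not because of the apostrophe, whereas `don't` is accepted.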

**Optional exercise.**
This continues the previous exercise.
Write another function that takes a list of strings and determines if all of them are well formatted tokens.
You can copy the same code for the function you wrote above and call that function from within your new function, or you can rewrite the code in the new function. 

For example: 

- `['hello', 'world', 'this', 'is', 'a', 'tokenized', 'string']` is well-formatted because every string in the list is a well-formatted token.
- `['Hello,', 'world!', 'This', 'is', 'a', 'tokenized', 'String!']` is not well-formatted; even though some of the strings in the list are well-formatted, at least one is not.

Your function should return `True` if all the words are well formatted and `False` if they are not.

### Solution.

We want to check each token in the list of tokens for well formattedness. Using the function defined in the previous exercise, this function will return `False` if a token in the list is not well formatted, i.e. if `is_well_formatted()` is **not** returning `True` for the token. This way, once we encounter a token that is not well formatted, we don't need to check the others - we already know that the list of strings contains at least one string that isn't well formatted. Otherwise, the function will return `True` since it hasn't encountered any tokens that were not well formatted.


In [None]:
# return True if ALL tokens are well formatted and False otherwise

def all_well_formatted(tokens):
    for token in tokens:
        if not is_well_formatted(token):
            return False
    return True
        
# test the function
print(all_well_formatted(['hello', 'world', 'this', 'is', 'a', 'tokenized', 'string']))
print(all_well_formatted(['Hello,', 'world!', 'This', 'is', 'a', 'tokenized', 'String!']))

**Expansion Exercise.** 

Now that you have an idea of how `all()` works, rewrite your function from the optional exercise using `all()`.

### Solution.

`all()` we need to do here (har har) is apply our first function, `is_well_formatted()`, to each token in the list of tokens given to the function, and then have `all()` evaluate the resulting truth values. We can do this in one line by passing a generator expression directly to `all()`.

In [None]:
import re

def is_well_formatted(token):
    # return True if the token is well formatted, otherwise False
    # the pattern matches any character that is NOT a-z, 0-9, or an apostrophe
    if re.search(r"[^a-z0-9']",token):
        return False
    else:
        return True
    
def all_well_formatted(tokens):
    return all(is_well_formatted(token) for token in tokens)

# test the function
print(all_well_formatted(['hello', 'world', 'this', 'is', 'a', 'tokenized', 'string']))
print(all_well_formatted(['Hello,', 'world!', 'This', 'is', 'a', 'tokenized', 'String!']))
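
Like the explicit loop with an early `return False`, `all()` stops as soon as it meets the first false value when it is given a generator expression. The sketch below makes this visible by logging which tokens are actually checked; `is_lowercase_logged` is a hypothetical stand-in for `is_well_formatted()` that uses `str.islower()` to keep the example short.

```python
checked = []

def is_lowercase_logged(token):
    # Record every token that actually gets checked.
    checked.append(token)
    return token.islower()

tokens = ["hello", "World", "this", "is"]
result = all(is_lowercase_logged(token) for token in tokens)

print(result)   # False
print(checked)  # ['hello', 'World']
```

The tokens after `World` are never inspected, so the one-line version does no more work than the loop from the optional exercise.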