# Functions for Word Counts

This unit is a recommended supplementary reading for unit 11.
It is **not** an expansion unit.
All the material covered here is also part of unit 11 or unit 12.
The purpose of this unit is to provide an easier entrance point to custom functions.
Unit 11 attempts to give a fairly realistic example of why custom functions are useful in real-world programs such as chatbots: they greatly simplify the code.
But in order to get this point across, unit 11 has to use a fairly complex program that is gradually simplified by using functions.
This means that you have to spend a fair amount of mental effort on understanding the code before you can start to learn about functions.
This unit instead presents a very simple program where functions already pay off.

## Counting words

Suppose we want to determine for some string how often each word type occurs in it.
To do so, we have to first tokenize the string with a regular expression, and then ask Python to determine the number of word tokens for each word type.
The code for this is shown below.
Do not worry too much about the specifics for now; those will be covered in unit 12.

In [None]:
# load the re module, we need regular expressions for tokenization
import re
# then load the Counter class from the collections module;
# we need this to count word tokens
from collections import Counter

# our example string
string1 = "The sun shone, having no alternative, on the nothing new."
print(string1)

# re.findall takes a string as input and computes a list of all the matches;
# since \w+ matches 1 or more word characters, this will split the string into a list of words;
# we also use str.lower because capitalization is misleading for word counts ("The" and "the" are the same word type)
words1 = re.findall(r"\w+", str.lower(string1))
print(words1)

# now we just feed the list words1 into Counter
counts1 = Counter(words1)
print(counts1)

But now suppose that we want to print the counts for five different strings.
This gets rather tedious.

In [None]:
import re
from collections import Counter

string1 = "The sun shone, having no alternative, on the nothing new."
string2 = "Murphy sat out of it, as though he were free, in a mew in West Brompton."
string3 = "Here for what might have been six months he had eaten, drunk, slept, and put his clothes on and off, in a medium-sized cage of north-western aspect commanding an unbroken view of medium-sized cages of south-eastern aspect."
string4 = "Soon he would have to make other arrangements, for the mew had been condemned."
string5 = "Soon he would have to buckle to and start eating, drinking, sleeping, and putting his clothes on and off, in quite alien surroundings."

# tokenize the normalized strings, count the words, and print to screen
words1 = re.findall(r"\w+", str.lower(string1))
counts1 = Counter(words1)
print(counts1)

words2 = re.findall(r"\w+", str.lower(string2))
counts2 = Counter(words2)
print(counts2)

words3 = re.findall(r"\w+", str.lower(string3))
counts3 = Counter(words3)
print(counts3)

words4 = re.findall(r"\w+", str.lower(string4))
counts4 = Counter(words4)
print(counts4)

words5 = re.findall(r"\w+", str.lower(string5))
counts5 = Counter(words5)
print(counts5)

# and now do counts for everything together
passage = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5
words = re.findall(r"\w+", str.lower(passage))
counts = Counter(words)
print(counts)

As you can see, we are repeating the same code over and over again, changing only the variable names.
It would be much nicer if we could attach a name `foo` to those chunks of code and just tell Python to execute `foo` with `string1`, then with `string2`, then with `string3`, and so on.
Custom functions allow us to do just that.

In [None]:
import re
from collections import Counter

string1 = "The sun shone, having no alternative, on the nothing new."
string2 = "Murphy sat out of it, as though he were free, in a mew in West Brompton."
string3 = "Here for what might have been six months he had eaten, drunk, slept, and put his clothes on and off, in a medium-sized cage of north-western aspect commanding an unbroken view of medium-sized cages of south-eastern aspect."
string4 = "Soon he would have to make other arrangements, for the mew had been condemned."
string5 = "Soon he would have to buckle to and start eating, drinking, sleeping, and putting his clothes on and off, in quite alien surroundings."


# define a custom function for counting words
def count_words(string):
    words = re.findall(r"\w+", str.lower(string))
    counts = Counter(words)
    return counts

# tokenize the normalized strings, count the words, and print to screen;
# the normalization, tokenization, and word counting now all happen inside the count_words function
print(count_words(string1))
print(count_words(string2))
print(count_words(string3))
print(count_words(string4))
print(count_words(string5))

# and now do counts for everything together
passage = string1 + " " + string2 + " " + string3 + " " + string4 + " " + string5
print(count_words(passage))

## How functions work

A custom function is similar to a variable in that it allows us to attach a name to a specific thing.
But whereas variables attach names to values, custom functions attach names to chunks of code.
Functions always follow the same format:

```
def function_name(argument1, argument2, ..., argument999):
    # do something with the arguments
    # and return an output
    return output
```

1. they start with `def`, which is short for *define*
1. then we get the function name; as usual, names should be lowercase and use `_` instead of spaces
1. after that we have opening and closing parentheses, with zero or more arguments in between, followed by a colon
1. inside the indented function code, arguments can be used like variables that have already been defined
1. once the function is done computing, the `return` statement hands back a specific output based on the previous computations
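As a minimal sketch of this format, here is a small function with two arguments. The name `join_with_hyphen` and its argument names are made up for this illustration and do not appear elsewhere in this unit.

```python
# a hypothetical example function with two arguments
def join_with_hyphen(first_word, second_word):
    # the arguments can be used like variables inside the function
    compound = first_word + "-" + second_word
    # return the computed value as the output of the function
    return compound

# call the function with two strings and print its output
print(join_with_hyphen("north", "western"))
```

Running this prints `north-western`.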

Let us look again at the function `count_words`.

In [None]:
# we define the function count_words with a single argument, called string
def count_words(string):
    # since string is an argument, we can use it like a variable name on the next line
    words = re.findall(r"\w+", str.lower(string))
    counts = Counter(words)
    # we have computed a value, stored as the variable counts
    # we now return this as the output of the function
    return counts

And this is what happens internally in Python when we run `count_words(string1)`:

1. `string1` refers to `"The sun shone, having no alternative, on the nothing new."`, so Python runs `count_words("The sun shone, having no alternative, on the nothing new.")`.
1.  The definition of `count_words` tells Python that it should compute `re.findall(r"\w+", str.lower("The sun shone, having no alternative, on the nothing new."))`.
1.  It then stores this as a **local** variable `words`.
    Local means that the variable cannot be used outside the function.
    And if you already defined a variable `words` outside the function, its value will not be overwritten by what happens inside the function.
    When it comes to variables, you can think of functions as a black hole: nothing gets out except the value that the function returns.
1.  It then computes `Counter(words)` using the local variable `words`.
    The value of `Counter(words)` is stored as the local variable `counts`.
    Again this does not conflict with any variable `counts` that you might have defined outside the function.
1.  Finally, the function returns the value of `counts` as its output.

We can see each step by adding `print` statements to the function:

In [None]:
import re
from collections import Counter

string1 = "The sun shone, having no alternative, on the nothing new."

# we define the function count_words with a single argument, called string
def count_words(string):
    print("Value of string is:", string)
    words = re.findall(r"\w+", str.lower(string))
    print("Value of words is:", words)
    counts = Counter(words)
    print("Value of counts is:", counts)
    # we have computed a value, stored as the variable counts
    # we now return this as the output of the function
    return counts

print("Output of function is:", count_words(string1))
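Because `count_words` returns its result instead of merely printing it, the output can also be stored in a variable and used later on. The following sketch assumes the version of `count_words` without `print` statements; the variable name `counts1` is chosen only for illustration.

```python
import re
from collections import Counter

string1 = "The sun shone, having no alternative, on the nothing new."

def count_words(string):
    words = re.findall(r"\w+", str.lower(string))
    counts = Counter(words)
    return counts

# store the output of the function in a variable
counts1 = count_words(string1)

# the stored value can now be used like any other value,
# for example to look up how often a specific word occurs
print(counts1["the"])
print(counts1["sun"])
```

This prints `2` and then `1`, since *the* occurs twice in the lowercased string and *sun* occurs once.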

And here is a code snippet that shows that local variables within a function do not conflict with variables outside the function.

In [None]:
words = "This is not a list of words"

# the count_words function uses a local variable words
count_words(string1)

print("Value of words outside of function is:", words)
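The reverse direction can be checked in a similar way: a local variable is not available outside the function. The sketch below uses a made-up function `shout` with a local variable `loud_string`; neither name is part of this unit. It also relies on `try` and `except`, which have not been introduced here, only so that the error message is printed instead of stopping the program.

```python
# a hypothetical function with a local variable called loud_string
def shout(string):
    loud_string = str.upper(string) + "!"
    return loud_string

# the output of the function is available outside the function
print(shout("nothing new"))

# but the local variable loud_string is not;
# trying to use it outside the function causes a NameError
try:
    print(loud_string)
except NameError as error:
    print("Python reports an error:", error)
```

The first `print` shows the returned string, whereas the second one fails because Python does not know any variable `loud_string` outside of `shout`.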

## More examples

Here are a few more examples of custom functions.
Most of them aren't very useful, but they are short and simple so that you can focus on figuring out what they do, and **how** they do it.
Pay close attention to how arguments are used inside the function, and where a `return` statement may occur.

In [None]:
def double_string(string):
    return string + " " + string

print(double_string("I don't want to repeat myself!"))

In [None]:
def politics_filter(string):
    if "Trump" in string or "Hillary" in string:
        return "censored"
    else:
        return string
    
print(politics_filter("Vote Trump!"))
print(politics_filter("Hillary should have won!"))
print(politics_filter("Politics have no relation to morals."))

In [None]:
import random

def random_greeting(names):
    greeting = random.choice(["Hi, ", "Hello, "])
    name = random.choice(names)
    return greeting + name + "!"

print(random_greeting(["John", "Mary", "Sue", "Bill", "Paul", "Anne"]))

In [None]:
def madlibs(adjective, verb, noun):
    string = "An " + adjective + " man was " + verb + "ing his " + noun
    return string

print(madlibs("expensive", "fail", "tardiness"))

In [None]:
def always_do_the_same():
    return "The output of this function never changes"

print(always_do_the_same())

Don't worry too much about where you should or should not use custom functions; that is something you will naturally figure out as we add more and more functions to our code.
For now, the important thing is that you learn to write a correct function definition.