## Let's start with some Python review

As our initial data, we are going to use a phrase from Charles Dickens' "A Tale of Two Cities".

In [None]:
sentence = 'It was the best of times, it was the worst of times'

Q: What is this object?

A: A string

We can double-check the datatype of any variable by using Python's built-in `type` function.

In [None]:
type(sentence)

`type` is like `print` -- both of them are functions that we can use whenever we want without having to write any additional code.

In [None]:
print(sentence)

In [None]:
type(1)

In [None]:
type(1.5)

There are many built-in functions.  In fact, I ask you to use some in the most recent homework assignment, like `len`: https://python.readthedocs.io/en/stable/library/functions.html#len

In [None]:
len(sentence)

The page that's linked above is the documentation for Python. This reference manual contains WAY more information than you may want at the moment, but it is comprehensive and authoritative for any given version of Python.

Since `sentence` is a string, we can also look at particular characters in the string using indexing and slicing.

In [None]:
# retrieves the character at index 0
sentence[0]

In [None]:
# retrieves the characters starting at index 0 and 
# going up to (but not including) index 5
sentence[0:5]
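
As a further sketch of the same idea, a slice can start at any index, and negative indices count from the end of the string. The index values below were worked out by hand for this particular sentence, and the names `best`, `last_char`, and `last_word` are made up for the illustration.

```python
sentence = 'It was the best of times, it was the worst of times'

# a slice can start anywhere: indices 11 through 14 spell "best"
best = sentence[11:15]

# negative indices count backwards from the end of the string
last_char = sentence[-1]

# leaving out the end index means "go all the way to the end"
last_word = sentence[-5:]

print(best, last_char, last_word)
```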

In contrast to built-in functions, strings have functions of their own.  In fact, most types of data have special functions of their own.  These are called methods, and we use `.` after the variable name to denote that what follows is a method specific to that variable's type.

Example: `str.split()` will split a string up into individual elements

In [None]:
sentence.split(',')

* `str.split()` will split a string up into individual elements, based on the value between parentheses.
* https://python.readthedocs.io/en/stable/library/stdtypes.html#str.split
* No value between parentheses?  Then the method uses a default value.  For `split`, the default is to break a string up based on the location of whitespace.

In [None]:
sentence.split()
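
To compare the two calls side by side, here is a minimal sketch that stores each result in a variable (`comma_pieces` and `space_pieces` are names introduced only for this example) and inspects it.

```python
sentence = 'It was the best of times, it was the worst of times'

# split on commas: the separator is removed, everything else is kept
comma_pieces = sentence.split(',')

# split with no argument: break the string wherever there is whitespace
space_pieces = sentence.split()

print(len(comma_pieces))   # number of pieces when splitting on ','
print(comma_pieces[1])     # second piece, including its leading space
print(len(space_pieces))   # number of pieces when splitting on whitespace
```

Splitting on `','` removes only the comma, so the second piece still begins with the space that followed it.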

## Exercise

Let's try to find the number of times "times" occurs in this phrase.

To do this, we are going to make use of **for** loops, **if** statements, and **functions**.

In [None]:
sentence

In [None]:
# how to count the number of times a word occurs?
# start building this gradually:

for word in sentence:
    print(word)

What happened?
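
A `for` loop over a string visits one character at a time, not one word at a time, no matter what we call the loop variable. A small sketch that makes this visible (the names `first_five` and `loop_count` are made up for this illustration):

```python
sentence = 'It was the best of times, it was the worst of times'

first_five = []   # the first few values the loop variable takes
loop_count = 0    # how many times the loop body runs

for word in sentence:
    loop_count = loop_count + 1
    if loop_count <= 5:
        first_five.append(word)

print(first_five)
# the loop runs once per character, so this prints True
print(loop_count == len(sentence))
```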

In [None]:
sentence.split()

In [None]:
words = sentence.split()

In [None]:
words

In [None]:
# how to count the number of times a word occurs?

#for word in sentence:
for word in words:
    print(word)

In [None]:
# how to count the number of times a word occurs?

#for word in sentence:
for word in words:
    if word == 'times':
        print('yay! I found "times"')

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + timescount + ' times.')

How do we make sense of error messages?
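
One way to read a traceback is from the bottom up: the last line names the type of error and gives a short description. The sketch below triggers the same kind of error on purpose and catches it so that we can look at those two pieces separately. `try`/`except` is used here only for illustration, and the exact wording of the message may differ between Python versions.

```python
timescount = 1
caught = None

try:
    # joining a string and an integer with + is not allowed
    message = 'I have now found it ' + timescount + ' times.'
except TypeError as err:
    caught = err
    print(type(err).__name__)   # the kind of error
    print(err)                  # the short description

# converting the number to a string first avoids the error
fixed = 'I have now found it ' + str(timescount) + ' times.'
print(fixed)
```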

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')

Why not 2?

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    
    # new line
    print('yay! I found ' + word)
    # also note the use of space and comments
    # and the extension of the block of code across many lines
    
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')
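
Printing every word shows what the loop is really comparing. As a more compact check (a sketch; `matches` is a name introduced only for this example), we can collect every element of the list that contains the letters `times` and look at each one exactly as Python stores it.

```python
sentence = 'It was the best of times, it was the worst of times'
words = sentence.split()

matches = []
for word in words:
    # "in" asks whether 'times' appears anywhere inside the word
    if 'times' in word:
        matches.append(word)

print(matches)

# == requires the two strings to be identical, punctuation included
print('times,' == 'times')
```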

In [None]:
specialchar = ','

# remove every occurrence of the special character (here, the comma)
# "replace" is a string method that replaces the first argument with the second argument

# fails because replace is for strings not lists
# remember that functions and methods are particular to the type of data they operate on
# words.replace(',', '')

# fails because you are changing the value of the iteration variable
# but not the list itself
# for word in words:
    
#     print(word,words)
    
#     if ',' in word:
#         word = word.replace(',', '')
    
#     print(word,words)

# success because we operate on the initial string and only then break it up into words
# sentence.replace(',', '')
# n = sentence.replace(',', '')
# n.split()
# we can chain these together because the output of "sentence.replace(',', '')" is itself a string
# on which we can use ".split()"
# sentence.replace(',', '').split()
# and now:
words = sentence.replace(',', '').split()

In [None]:
words
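
To see why reassigning the loop variable leaves the list untouched, the sketch below tries it and then builds a new list instead. The names `raw_words` and `cleaned_words` are made up for this illustration.

```python
sentence = 'It was the best of times, it was the worst of times'
raw_words = sentence.split()

# attempt 1: reassign the loop variable
for word in raw_words:
    word = word.replace(',', '')   # changes only the name "word"

print(raw_words[5])   # the list element still has its comma

# attempt 2: put each cleaned word into a new list
cleaned_words = []
for word in raw_words:
    cleaned_words.append(word.replace(',', ''))

print(cleaned_words[5])

# same result as cleaning the whole string first and then splitting
print(cleaned_words == sentence.replace(',', '').split())
```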

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    
    # new line
    print('yay! I found ' + word)
    # also note the use of space and comments
    # and the extension of the block of code across many lines
    
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    
    # new line
    print('yay! I found ' + word)
    # also note the use of space and comments
    # and the extension of the block of code across many lines
    
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')
        
print('"times" occurs ' + str(timescount) + ' times.')

It can be useful to have this **function**ality for every word.

In [None]:
# write the function and generalize it so that "times" is now a variable `wordtofindvar`

def wordcount(listofwordsvar, wordtofindvar):

    timescount = 0
    for word in listofwordsvar:
    
        # new line
        print('yay! I found ' + word)
        # also note the use of space and comments
        # and the extension of the block of code across many lines

        if word == wordtofindvar:
            print('yay! I found ' + word)
            timescount = timescount + 1
            print('I have now found it ' + str(timescount) + ' times.')

    # report the word we searched for, not the last word in the list
    print(wordtofindvar + ' occurs ' + str(timescount) + ' times.')

In [None]:
wordcount(words, 'it')

Something else to snag us:  Capitalization!
* to be addressed later
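
As a preview only, here is one possible approach (not necessarily the one we will use later): convert each word to lowercase with the string method `lower` before comparing. The names `exact_count` and `lowercase_count` are introduced just for this sketch.

```python
sentence = 'It was the best of times, it was the worst of times'
words = sentence.replace(',', '').split()

exact_count = 0       # counts only words spelled exactly 'it'
lowercase_count = 0   # counts 'it' regardless of capitalization

for word in words:
    if word == 'it':
        exact_count = exact_count + 1
    if word.lower() == 'it':
        lowercase_count = lowercase_count + 1

print(exact_count, lowercase_count)
```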

In [None]:
wordcount(words, 'times')

In [None]:
# one more thing:  return value

def wordcount(listofwordsvar, wordtofindvar):

    timescount = 0
    for word in listofwordsvar:
    
        # new line
        # print('yay! I found ' + word)
        # also note the use of space and comments
        # and the extension of the block of code across many lines

        if word == wordtofindvar:
            # print('yay! I found ' + word)
            timescount = timescount + 1
            # print('I have now found it ' + str(timescount) + ' times.')

    # print(wordtofindvar + ' occurs ' + str(timescount) + ' times.')
    return timescount

In [None]:
wordcount(words, 'times')

In [None]:
n = wordcount(words, 'times')

In [None]:
print('times occurs ' + str(n) + ' times.')
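
Because the function now returns a number instead of printing it, the result can be stored, compared, or reused. The sketch below repeats the function without the commented-out lines so that it runs on its own, calls it for several words, and cross-checks one result against the built-in list method `count`. The loop variable `target` is a name made up for this example.

```python
def wordcount(listofwordsvar, wordtofindvar):
    # count how many elements of the list equal the word we want
    timescount = 0
    for word in listofwordsvar:
        if word == wordtofindvar:
            timescount = timescount + 1
    return timescount


sentence = 'It was the best of times, it was the worst of times'
words = sentence.replace(',', '').split()

# the same function works for any word we pass in
for target in ['times', 'was', 'best']:
    n = wordcount(words, target)
    print(target + ' occurs ' + str(n) + ' times.')

# lists also have a count method that should give the same answer
print(wordcount(words, 'times') == words.count('times'))
```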

# Exercise time