## Learning Python with Dickens

We are going to work through an applied problem as a way to explore what we need to do with Python.

As our initial data, we are going to use a phrase from Charles Dickens' "A Tale of Two Cities":

In [None]:
sentence = 'It was the best of times, it was the worst of times'

* This object is a string.
* `sentence` is our name for a variable that stores the string.
* `=` is Python's way to say "Take the quantity on the right and store it in the variable on the left"

In [None]:
print(sentence)

# Or, within Jupyter notebooks, 
# we can put the variable by itself in a cell, 
# and then run the cell to output the variable's value.
## 
## uncomment the line below
## 
# sentence

In the cell below, how can we find the number of characters in the sentence?

How about the number of phrases? Or the number of words?

In [None]:
sentence.split()

In [None]:
len(sentence.split())

We're actually using `len` on a non-string object here.
* `sentence.split()` gives us a collection of strings.
* Each individual element is contained in `''`
* All of the elements are contained within `[]`

This is a Python **list**.
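
The three questions above can be sketched in one cell. This is a minimal sketch that assumes a "phrase" is whatever lies between commas (an assumption made only for illustration); the names `n_chars`, `n_phrases`, and `n_words` are hypothetical.

```python
sentence = 'It was the best of times, it was the worst of times'

# len works on strings (counting characters) and on lists (counting elements)
n_chars = len(sentence)
n_phrases = len(sentence.split(','))  # assumption: phrases are separated by commas
n_words = len(sentence.split())       # split() with no argument splits on whitespace

print(n_chars, n_phrases, n_words)

# the string and the result of split() are different types of object
print(type(sentence), type(sentence.split()))
```

Note that `split(',')` takes the separator as an argument, while `split()` with no argument splits on any run of whitespace.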

---
### Back to Fundamentals notebook
---

Since `sentence` is a string, we can also look at particular characters in the string using indexing and slicing.
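
As a warm-up for the exercises, here is a small sketch of indexing and slicing on a different string. The variable `title` and the other names are hypothetical and chosen only for illustration.

```python
title = 'A Tale of Two Cities'  # hypothetical example string

first_char = title[0]           # indexing starts at 0
first_six = title[0:6]          # slice: index 0 up to, but not including, index 6
last_char = title[-1]           # negative indices count from the end
second_word = title.split()[1]  # lists are indexed the same way as strings

print(first_char)
print(first_six)
print(last_char)
print(second_word)
```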

In [None]:
# retrieve the character at index 0



In [None]:
# retrieve the characters starting at index 0 and 
# going up to (but not including) index 5



In [None]:
# retrieve the 4th word of the sentence

????

print('Our sentence is:')
print(???)

print('The 4th word is:')
print(???)

## Number of occurrences of "times"

Let's try to find the number of times "times" occurs in our sentence.

In [None]:
sentence

We need to be able to (1) get the words, (2) check if any word is "times", and (3) keep track of the count.

We'll build this up gradually.  First, it will be useful to know how to repeat an action.  For example, when we check if the words are "times", we'll check the 1st word to see if it is "times", check the 2nd word to see if it is "times", check the 3rd word, and so on.

To do this, we are going to make use of **for** loops, **if** statements, and **functions**.

---
### Back to Fundamentals notebook
---

In [None]:
for word in sentence:
    print(word)

What is the loop above doing? Note that `word` is just a variable name; it does not tell Python to look at words specifically.
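
To see the behavior on a smaller scale, here is a sketch that loops over a short string and collects what the loop variable holds on each pass. The string `'best'` and the list name `pieces` are hypothetical choices for illustration.

```python
pieces = []

# iterating over a string yields one character at a time,
# whatever name we give the loop variable
for word in 'best':
    pieces.append(word)

print(pieces)
```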

In [None]:
sentence.split()

In [None]:
words = sentence.split()

In [None]:
words

In [None]:
# how to count the number of times a word occurs?

# for word in sentence:
for word in words:
    print(word)

In order to count the number of times "times" occurs, we need to be able to check whether any given word is "times".
* Conditional expressions can be used to check whether a statement is True or False.
* `if` statements can be used to carry out actions depending on whether something is True or False.
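
A minimal sketch of both ideas, using hypothetical variable names (`word`, `is_times`, `message`) that are not part of the original notebook:

```python
word = 'worst'

# == compares two values and produces True or False
is_times = (word == 'times')
print(is_times)

# an if statement runs a block only when the condition is True;
# the else block runs otherwise
if is_times:
    message = 'this word is "times"'
else:
    message = 'this word is not "times"'

print(message)
```

The single `=` assigns a value, while the double `==` asks whether two values are equal.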

---
### Back to Fundamentals notebook
---

In [None]:
# how to count the number of times a word occurs?

# for word in sentence:
for word in words:
    if word == 'times':
        print('yay! I found "times"')

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + timescount + ' times.')

How do we make sense of error messages?
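
One way to read an error message is to look at its last line, which names the error type and gives a short reason. The sketch below reproduces the same kind of error on a small scale and catches it with `try`/`except` so the cell keeps running; the names `error_name` and `message` are hypothetical.

```python
timescount = 1

try:
    # joining a string and an integer with + is not allowed
    message = 'I have now found it ' + timescount + ' times.'
except TypeError as err:
    # the error type and its explanation are what the traceback's last line shows
    error_name = type(err).__name__
    print(error_name + ': ' + str(err))
    # converting the integer to a string first avoids the error
    message = 'I have now found it ' + str(timescount) + ' times.'

print(message)
```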

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')

Why is the final count 1 and not 2?

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    
    # new line
    print('yay! I found ' + word)
    # also note the use of space and comments
    # and the extension of the block of code across many lines
    
    if word == 'times':
        print('yay! I found "times"')
        timescount = timescount + 1
        print('I have now found it ' + str(timescount) + ' times.')

How can we get rid of the commas?

We'll try a couple of ways that will mostly fail. Failure is just a way to navigate towards good code!

In [None]:
specialchar = ','

`replace` is a string method that replaces the first argument with the second argument:

In [None]:
'Hi my name is Ben.'.replace('Ben','John')

Let's try that.

In [None]:
words.replace(',', '')

This fails with an `AttributeError` because `replace` is a method for strings, not lists!

Remember that functions and methods are very particular about the type of data they operate on.
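
A quick way to check whether an object has a given method is the built-in `hasattr`. The sketch below uses a hypothetical two-element list and catches the error so that the cell keeps running.

```python
words = ['times,', 'times']  # hypothetical short list for illustration

# strings have a replace method, lists do not
string_has_replace = hasattr('times,', 'replace')
list_has_replace = hasattr(words, 'replace')
print(string_has_replace, list_has_replace)

try:
    words.replace(',', '')
except AttributeError as err:
    # the message names the type and the missing attribute
    error_text = str(err)
    print(error_text)
```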

How about using a for loop to iterate over all the words, and then use `replace`?

In [None]:
for word in words:
    
    print(word,words)
    
    if ',' in word:
        word = word.replace(',', '')
    
    print(word,words)

This fails because we are only changing the value of the iteration variable!  The iteration variable holds a value for use inside the loop (the iterative process); assigning a new string to it does not change what is stored in the list.
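
A loop can still do the job if the cleaned strings are stored somewhere. This is a sketch of one option, assuming we are happy to build a second list; the name `cleaned` is hypothetical.

```python
words = 'It was the best of times, it was the worst of times'.split()

cleaned = []
for word in words:
    # replace returns a new string; append stores it in the new list
    cleaned.append(word.replace(',', ''))

print(words[5])    # the original list is unchanged
print(cleaned[5])  # the new list holds the cleaned word
```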

Let's take a step back:
* first, use `replace` on the entire sentence
* only after doing that, create a list of words using `split`

In [None]:
n = sentence.replace(',', '')
n.split()

That's better: now two elements are exactly `'times'`, rather than one being `'times,'` and the other `'times'`.

In [None]:
# we can chain these together because the output 
# of "sentence.replace(',', '')" is itself a string

words = sentence.replace(',', '').split()

In [None]:
words

In [None]:
# how to count the number of times a word occurs?

timescount = 0
for word in words:
    
    # new line
    print('I found ' + word)
    # also note the use of space and comments
    # and the extension of the block of code across many lines
    
    if word == 'times':
        timescount = timescount + 1
        print('Yay! I have now found it ' + str(timescount) + ' times.')
        
print('"times" occurs ' + str(timescount) + ' times.')

One more step:  it can be useful to have this **function**ality for every word.
* Not only can we use Python functions, we can write *our own* functions to carry out actions.

---
### Back to Fundamentals notebook
---

In [None]:
# write the function and generalize it so that "times" is now a variable `wordtofindvar`

def wordcount(listofwordsvar, wordtofindvar):

    timescount = 0
    for word in listofwordsvar:
    
        # new line
        print('I found ' + word)
        # also note the use of space and comments
        # and the extension of the block of code across many lines

        if word == wordtofindvar:
            timescount = timescount + 1
            print('Yay! I have now found it ' + str(timescount) + ' times.')

    print(wordtofindvar + ' occurs ' + str(timescount) + ' times.')

In [None]:
wordcount(words, 'it')

Something else to snag us:  Capitalization!
* to be addressed later
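
As a preview, one possible approach is to convert both sides of the comparison to lowercase with the string method `lower`. This is only a sketch under that assumption; the function name `wordcount_ignorecase` is hypothetical and is not the version developed in this notebook.

```python
def wordcount_ignorecase(listofwordsvar, wordtofindvar):
    # hypothetical variant: compare lowercase versions of both words
    count = 0
    for word in listofwordsvar:
        if word.lower() == wordtofindvar.lower():
            count = count + 1
    return count

words = 'It was the best of times, it was the worst of times'.replace(',', '').split()

n_it = wordcount_ignorecase(words, 'it')
print(n_it)
```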

In [None]:
wordcount(words, 'times')

In [None]:
# one more thing:  return value

def wordcount(listofwordsvar, wordtofindvar):

    timescount = 0
    for word in listofwordsvar:
    
        # new line
        # print('I found ' + word)
        # also note the use of space and comments
        # and the extension of the block of code across many lines

        if word == wordtofindvar:
            timescount = timescount + 1
            # print('Yay! I have now found it ' + str(timescount) + ' times.')

    # print(wordtofindvar + ' occurs ' + str(timescount) + ' times.')
    return timescount

In [None]:
wordcount(words, 'times')

In [None]:
n = wordcount(words, 'times')

In [None]:
print('times occurs ' + str(n) + ' times.')
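
As a cross-check, Python lists have a built-in `count` method that returns how many elements equal a given value. The sketch below rebuilds `words` so it can run on its own; the name `builtin_count` is hypothetical.

```python
sentence = 'It was the best of times, it was the worst of times'
words = sentence.replace(',', '').split()

# list.count returns the number of elements equal to its argument
builtin_count = words.count('times')

print('times occurs ' + str(builtin_count) + ' times.')
```

Writing the loop by hand is still worthwhile, since it shows how loops, conditionals, and functions fit together.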

---
### Back to Fundamentals notebook
---