# Strings

## We can do things with strings

We've already seen in Data 8 some operations that can be done with strings.

In [None]:
first_name = "Franz"
last_name = "Kafka"
full_name = first_name + last_name
print(full_name)

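Remember that computers don't understand context: Python joined the two names exactly as given, without a space between them, so we have to add the space ourselves.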

In [None]:
full_name = first_name + " " + last_name
print(full_name)

## Strings are made up of sub-strings

You can think of strings as a [sequence](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#sequence) of smaller strings or characters. We can access a piece of that sequence using square brackets `[]`.

In [None]:
full_name[1]

<div class="alert alert-danger">
Don't forget, Python (like many other languages) starts counting from 0.
</div>

In [None]:
full_name[0]

In [None]:
full_name[4]
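
To see how each position maps to a character, the short sketch below lists every index next to its character. The variable `name` is made up for illustration and simply mirrors the value of `full_name` used above.

```python
# Illustrative sketch: `name` is a hypothetical variable, not part of the lesson's cells.
name = "Franz Kafka"

# enumerate pairs each character with its zero-based index.
index_pairs = list(enumerate(name))
for index, character in index_pairs:
    print(index, repr(character))

first_character = name[0]   # index 0 is the first character
fifth_character = name[4]   # index 4 is the fifth character
space_character = name[5]   # the space counts as a character too
```

Note that the space between the two names occupies its own index, just like a letter.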

## You can slice strings using `[ : ]`

If you want a range (or "slice") of a sequence, you get everything from the first index up to, but not including, the second index; that is, Python slicing is *exclusive*:

In [None]:
full_name[0:4]

In [None]:
full_name[0:5]

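You can see some of the logic for this when we consider implicit indices: leaving out the first index means "start at the beginning," and leaving out the second means "go to the end."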

In [None]:
full_name[:5]

In [None]:
full_name[5:]
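
One way to check the logic of exclusive slicing is to split a string at a single index and confirm that the two pieces fit back together with nothing lost or repeated. This is a minimal sketch; `name` and `split_point` are made-up names for illustration.

```python
# Illustrative sketch: `name` and `split_point` are hypothetical names.
name = "Franz Kafka"
split_point = 5

head = name[:split_point]   # everything before index 5
tail = name[split_point:]   # everything from index 5 onward
rejoined = head + tail      # the two halves share no characters

# The length of a slice is the end index minus the start index.
slice_length = len(name[0:4])
```

Because the end index is excluded, `name[:n] + name[n:]` gives back the original string for any `n`.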

If we want to find out how long a string is, we can use the `len` function:

In [None]:
len(full_name)

## Strings have methods

* There are other operations defined on string data. These are called **string [methods](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#method)**. 
* Jupyter Notebooks let you use tab-completion after a dot (`.`) to see what methods an [object](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#object) (i.e., a defined variable) has to offer. Try it now!

In [None]:
str.

Let's look at the `upper` method. What does it do? Let's take a look at the documentation. Jupyter Notebooks let us do this with a question mark ('?') before *or* after an object (again, a defined variable).

In [None]:
str.upper?

So we can use it to convert a string to upper case.

In [None]:
full_name.upper()

You have to use the parentheses at the end because `upper` is a method of the string class, and a method only runs when it is called.
<p></p>
<div class="alert alert-danger">
Don't forget: simply calling the method does not change the original variable. You must *reassign* the variable:
</div>

In [None]:
print(full_name)

In [None]:
full_name = full_name.upper()
print(full_name)

For what it's worth, you don't need a variable to use the `upper()` method; you can call it on the string itself.

In [None]:
"Franz Kafka".upper()

What do you think should happen when you call `upper` on an int? What about a string representation of an int?
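
As a hedged aside: written literally, `1.upper()` is rejected as a syntax error, because Python reads `1.` as the start of a decimal number. The sketch below stores the int in a variable first so the method lookup itself can be observed. The names `number`, `has_upper`, `converted`, and `message` are made up for illustration.

```python
# Illustrative sketch: all variable names here are hypothetical.
number = 1

# Ints do not define an `upper` method, so the attribute lookup fails.
has_upper = hasattr(number, "upper")

message = ""
try:
    number.upper()
except AttributeError as error:
    message = str(error)   # keep the error text for inspection

# Converting to a string first works; digits have no upper-case form.
converted = str(number).upper()
print(has_upper, message, converted)
```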

In [None]:
1.upper()

In [None]:
"1".upper()

## Challenge 1: Write your name

1. Make two string variables, one with your first name and one with your last name.
2. Concatenate both strings to form your full name and [assign](https://github.com/dlab-berkeley/python-intensive/blob/master/Glossary.md#assign) it to a variable.
3. Assign a new variable that has your full name in all upper case.
4. Slice that string to get your first name again.

In [11]:
first_name = "Bernice"
last_name = "Wong"
full_name = first_name + " " + last_name
full_name_upper = full_name.upper()
full_name[:7]

'Bernice'

## Challenge 2: Try seeing what the following string methods do:

* `split`
* `join`
* `replace`
* `strip`
* `find`

In [None]:
my_string = "It was a Sunday morning at the height of spring."

In [None]:
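split = "With no argument, split breaks the string into a list of substrings wherever there is whitespace; with an argument, it splits on that separator instead."
join = "join concatenates the strings in its argument (a list or other iterable), using the string it is called on as the separator."
replace = "replace returns a new string in which every occurrence of the first argument is replaced by the second argument."
strip = "strip removes the characters given in the argument from the beginning and end of the string; with no argument, it removes leading and trailing whitespace."
find = "find returns the index at which the first occurrence of the substring begins, or -1 if the substring is not found."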
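
The descriptions above can be checked against the example sentence. This is a small sketch, not the only way to explore these methods; every variable name other than `my_string` is made up for illustration.

```python
my_string = "It was a Sunday morning at the height of spring."

words = my_string.split()                         # split on whitespace
rejoined = "-".join(words)                        # glue the words with '-'
swapped = my_string.replace("Sunday", "Monday")   # returns a new string
trimmed = my_string.strip(".")                    # drops the trailing period
position = my_string.find("Sunday")               # index where 'Sunday' starts
missing = my_string.find("Tuesday")               # -1 means "not found"

print(words, rejoined, swapped, trimmed, position, missing, sep="\n")
```

None of these calls changes `my_string`; each one returns a new value.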

## Challenge 3: Working with strings

Below is a string of Edgar Allan Poe's "A Dream Within a Dream":

In [25]:
poem = '''Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow —
You are not wrong, who deem
That my days have been a dream;
Yet if hope has flown away
In a night, or in a day,
In a vision, or in none,
Is it therefore the less gone?  
All that we see or seem
Is but a dream within a dream.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand —
How few! yet how they creep
Through my fingers to the deep,
While I weep — while I weep!
O God! Can I not grasp 
Them with a tighter clasp?
O God! can I not save
One from the pitiless wave?
Is all that we see or seem
But a dream within a dream?'''

poem.rfind("and")
    

407

What is the difference between `poem.strip("?")` and `poem.replace("?", "")`?
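
A short string makes the contrast easy to see. The sketch below uses a made-up `sample` string rather than the poem itself.

```python
# Illustrative sketch: `sample` is a hypothetical string, not the poem.
sample = "?Is it a dream? Or not?"

# strip only touches the two ends of the string.
stripped = sample.strip("?")

# replace touches every occurrence, wherever it is.
replaced = sample.replace("?", "")

print(stripped)
print(replaced)
```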

In [17]:
"poem.strip('?') removes '?' characters only from the beginning and end of the poem. poem.replace('?', '') replaces every '?' with an empty string (that is, deletes it), wherever it appears in the poem."

"poem.strip('?') removes '?' characters only from the beginning and end of the poem. poem.replace('?', '') replaces every '?' with an empty string (that is, deletes it), wherever it appears in the poem."

At what index does the word "*and*" first appear? Where does it last appear?

In [None]:
"The substring 'and' first appears at index 314 (poem.find('and'), inside the word 'stand') and last appears at index 407 (poem.rfind('and'), inside the word 'sand'). Index 381 is the second occurrence, inside 'hand'."

How can you answer the above accounting for upper- and lowercase?
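
One common approach is to convert the text to a single case before searching. The sketch below illustrates this on one line of the poem; `sample` and the other variable names are made up for illustration.

```python
# Illustrative sketch: `sample` is a single line borrowed from the poem.
sample = "And I hold within my hand"

# find is case-sensitive, so "And" at the start does not match "and".
case_sensitive = sample.find("and")

# Lower-casing first lets "And" and "and" both match.
case_insensitive = sample.lower().find("and")
last_match = sample.lower().rfind("and")

print(case_sensitive, case_insensitive, last_match)
```

Lower-casing does not change the length of this text, so the indices still line up with the original string.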

In [None]:
"Since find is case-sensitive, we can convert the poem to a single case first, for example poem.lower().find('and') and poem.lower().rfind('and'), so that 'And' and 'and' both match."

## Challenge 4: Counting Text

Below is a string of Robert Frost's "The Road Not Taken":

In [71]:
poem = '''Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.'''

len(set(poem.replace(',', "")))

39

Using the `len` function and the string methods, answer the following questions:

How many characters (letters) are in the poem?

In [36]:
"Using len(poem), there are 729 characters, counting letters, spaces, punctuation, and line breaks."

729

How many words?

In [None]:
"Using len(poem.split()), there are 144 words. With no argument, split already treats line breaks as whitespace."

How many lines? (HINT: A line break is represented as `\n`)
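
When splitting on line breaks, it helps to remember that the blank line between stanzas produces an empty string of its own. The sketch below uses a made-up two-stanza `sample` rather than the poem.

```python
# Illustrative sketch: `sample` is a hypothetical two-stanza text.
sample = "line one\nline two\n\nline three\nline four"

all_lines = sample.split("\n")                               # includes the empty line
verse_lines = [line for line in all_lines if line.strip()]   # non-blank lines only
line_breaks = sample.count("\n")                             # number of '\n' characters
stanzas = sample.split("\n\n")                               # a blank line separates stanzas

print(len(all_lines), len(verse_lines), line_breaks, len(stanzas))
```

Splitting on a separator always returns one more piece than the number of separators found.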

In [None]:
r"Using len(poem.split('\n')), there are 23 lines, counting the 3 blank lines between stanzas. That corresponds to 22 line breaks and 20 lines of verse."

How many stanzas?

In [None]:
r"Using len(poem.split('\n\n')), there are 4 stanzas."

How many unique words? (HINT: look up what a `set` is)
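
A set keeps one copy of each distinct element, so what gets counted depends on what the set is built from. The sketch below, using a made-up `sample` string rather than the poem, contrasts a set of characters with a set of words and shows how attached punctuation affects the count.

```python
# Illustrative sketch: `sample` is a hypothetical string, not the poem.
sample = "to way, leads on to way"

# Building a set directly from a string gives its unique characters.
unique_characters = set(sample)

# Splitting first gives unique words; "way," and "way" differ here.
unique_words = set(sample.split())

# Removing commas before splitting merges "way," with "way".
unique_words_no_commas = set(sample.replace(",", "").split())

print(len(unique_words), len(unique_words_no_commas))
```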

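In [None]:
"len(set(poem)) returns 40, but that counts unique characters, not words, because a set built from a string contains its individual characters. To count unique words, split the poem first: len(set(poem.split()))."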

Remove commas and check the number of unique words again. Why is it different?

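In [None]:
"Using len(set(poem.replace(',', ''))), the result is 39 unique characters, one fewer than before, because the comma was one of the 40 unique characters. For unique words, removing commas before splitting makes 'way,' and 'way' count as the same word, so the count drops."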