# Mystery Friend

You've received an anonymous postcard from a friend you haven't seen in years. Your friend did not leave a name, but the card is definitely addressed to you. So far, you've narrowed your search down to three friends, based on handwriting:
- Emma Goldman
- Matthew Henson
- TingFang Wu

But which one sent you the card?

Just as a spam filter classifies a message as spam or not spam, you can build a classifier that attributes a piece of writing to one friend or another. You have past writing from all three friends stored in the variable `friends_docs`, which means you can use scikit-learn's bag-of-words vectorizer and Naive Bayes classifier to determine who the mystery friend is!

Ready?

## Feature Vectors Are in the Bag with Scikit-Learn

1. In the code block below, import `CountVectorizer` from `sklearn.feature_extraction.text`. Below it, import `MultinomialNB` from `sklearn.naive_bayes`.

In [1]:
# import sklearn modules here:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

2. Define `bow_vectorizer` as an instance of `CountVectorizer`.

In [2]:
# Create bow_vectorizer:
bow_vectorizer = CountVectorizer()

3. Use your newly minted `bow_vectorizer` to both `fit` (train) and `transform` (vectorize) all your friends' writing (stored in the variable `friends_docs`). Save the resulting vector object as `friends_vectors`.

In [3]:
import import_ipynb

from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs

friends_docs = goldman_docs + henson_docs + wu_docs

# Define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)

importing Jupyter notebook from goldman_emma_raw.ipynb
importing Jupyter notebook from henson_matthew_raw.ipynb
importing Jupyter notebook from wu_tingfang_raw.ipynb


In [4]:
friends_vectors.shape

(461, 3375)
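
The shape above reads as 461 documents by 3,375 distinct vocabulary terms. As a minimal sketch of where those two numbers come from, the cell below runs the same `fit_transform` call on a tiny corpus. The names `toy_docs`, `toy_vectorizer`, and `toy_vectors` and the sentences themselves are made up for illustration and are not part of `friends_docs`.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, not taken from friends_docs.
toy_docs = [
    "the ice broke over the ship",
    "the ship sailed north",
    "names are important",
]

toy_vectorizer = CountVectorizer()
toy_vectors = toy_vectorizer.fit_transform(toy_docs)

# One row per document, one column per distinct token in the vocabulary.
print(toy_vectors.shape)

# vocabulary_ maps each token to its column index.
print(sorted(toy_vectorizer.vocabulary_))

# Dense view of the counts; "the" occurs twice in the first document.
print(toy_vectors.toarray())
```

Each row is a bag of words: word order is discarded and only the per-document counts remain.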

4. Create a new variable `mystery_vector`. Assign to it the vectorized form of `[mystery_postcard]` using the vectorizer's `.transform()` method.

   (`mystery_postcard` is a single string, while the vectorizer expects an iterable of documents, so wrap it in a list.)

In [11]:
mystery_postcard = """
My friend,
From the 10th of July to the 13th, a fierce storm raged, clouds of
freeing spray broke over the ship, incasing her in a coat of icy mail,
and the tempest forced all of the ice out of the lower end of the
channel and beyond as far as the eye could see, but the _Roosevelt_
still remained surrounded by ice.
Hope to see you soon.
"""

# Define mystery_vector:
mystery_vector = bow_vectorizer.transform([mystery_postcard])
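
Two details of `.transform()` are easy to miss: it reuses the vocabulary learned during fitting, so words it has never seen are simply dropped, and it rejects a bare string. The sketch below illustrates both on made-up sentences; `toy_vectorizer`, `new_vector`, and `string_rejected` are hypothetical names used only here.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_vectorizer = CountVectorizer()
toy_vectorizer.fit(["the ice broke over the ship", "names are important"])

# transform() reuses the fitted vocabulary; it never adds new columns.
# "met" and "glacier" were not seen during fit, so they are ignored.
new_vector = toy_vectorizer.transform(["the ship met a glacier"])
print(new_vector.shape)
print(new_vector.toarray())

# A bare string instead of a list of documents raises a ValueError.
string_rejected = False
try:
    toy_vectorizer.transform("the ship met a glacier")
except ValueError as error:
    string_rejected = True
    print("ValueError:", error)
```

This is why the postcard is passed as `[mystery_postcard]`: the result is a one-row matrix with the same columns as `friends_vectors`.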


## This Mystery Friend Gets Classified

5. You've vectorized and prepared all the documents. Let's take a look at your friends' writing samples to get a sense of how they write.

   Print one document from each friend's writing; try any index between `0` and `140`. (Your friends' documents are stored in `goldman_docs`, `henson_docs`, and `wu_docs`.)

In [18]:
# Print out a document from each friend:

print("Emma Goldman's writing:\n", goldman_docs[0])
print("\nMatthew Henson's writing:\n", henson_docs[46])
print("\nWu Tingfang's writing:\n", wu_docs[100])


Emma Goldman's writing:
 
The history of human growth and development is at the same time the
history of the terrible struggle of every new idea heralding the
approach of a brighter dawn

Matthew Henson's writing:
 He
learned to speak English and slept underneath my bunk.

This expedition was larger in numbers than the previous one, but the
results, owing to the impossible weather conditions, were by no means
successful, and the following season all of the expedition returned to
the United States except Commander Peary, Hugh J

Wu Tingfang's writing:
  On several occasions lady friends
from Washington, Philadelphia, and New York have visited me in Peking.
This is one of the Americans' strong points


6. Have an inkling about which friend wrote the mystery card? We can use a classifier to confirm those suspicions...

   Create a Naive Bayes classifier by instantiating `MultinomialNB`. Save the result to `friends_classifier`.

In [14]:
# Define friends_classifier:
friends_classifier = MultinomialNB()

7. Train `friends_classifier` on `friends_vectors` and `friends_labels` using the classifier's `.fit()` method.

In [29]:
friends_labels = ["Emma"] * len(goldman_docs) + ["Matthew"] * len(henson_docs) + ["Tingfang"] * len(wu_docs)

# Train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)
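
The labels list works because it is built in the same order as the concatenated documents, giving exactly one label per row of the feature matrix. Below is a minimal sketch of that alignment on made-up data; `docs_a`, `docs_b`, and the labels `"A"` and `"B"` are hypothetical stand-ins for the friends' documents and names.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpora standing in for two friends' documents.
docs_a = ["ice and snow", "the ship in the ice"]
docs_b = ["names are important"]

toy_docs = docs_a + docs_b
# Labels must follow the same order as the concatenated documents.
toy_labels = ["A"] * len(docs_a) + ["B"] * len(docs_b)

toy_vectors = CountVectorizer().fit_transform(toy_docs)

# Sanity check: one label per row of the feature matrix.
assert toy_vectors.shape[0] == len(toy_labels)

toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectors, toy_labels)

# classes_ holds the distinct labels in sorted order.
print(toy_classifier.classes_)
```

If the documents were concatenated in a different order than the labels, the classifier would still train without error, but on mislabeled data.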

8. Change the value of `predictions` from `["None Yet"]` to the classifier's prediction about which friend wrote the postcard. You can do this by calling the classifier's `.predict()` method on `mystery_vector`.

In [30]:
# Change predictions:
predictions = friends_classifier.predict(mystery_vector)

## Mystery Revealed!

9. Uncomment the final print statement and run the code block below to see who your mystery friend was all along!

In [31]:
mystery_friend = predictions[0] if predictions[0] else "someone else"

# Uncomment the print statement:
print("The postcard was from {}!".format(mystery_friend))

The postcard was from Matthew!


10. But does it really work? Find some lines by Emma Goldman, Matthew Henson, and TingFang Wu on <a href="http://www.gutenberg.org" target="_blank">gutenberg.org</a> and save them to `mystery_postcard` to see how the classifier holds up!

    Try using the `.predict_proba()` method instead of `.predict()` and print out `predictions` to see the estimated probabilities that the `mystery_postcard` was written by each person.
   
    What happens when you add in a recent email or text instead?

In [35]:
predictions2 = friends_classifier.predict_proba(mystery_vector)

# predict_proba returns one column per class, ordered like classes_,
# so pair each probability with its class rather than with friends_labels.
for label, probability in zip(friends_classifier.classes_, predictions2[0]):
    print(f"{label}: {probability*100:.2f}%")


Emma: 1.10%
Matthew: 98.90%
Tingfang: 0.00%
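
The columns returned by `.predict_proba()` follow the order of the classifier's `classes_` attribute (one column per distinct class), not the order or length of the training labels list. The sketch below shows one way to pair them up, using made-up data; the documents, the labels `"Explorer"` and `"Diplomat"`, and all `toy_` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, not the friends' documents.
toy_docs = [
    "ice and snow on the ship",
    "the ship in the ice",
    "names are important",
    "choose names carefully",
]
toy_labels = ["Explorer", "Explorer", "Diplomat", "Diplomat"]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_docs), toy_labels)

toy_vector = toy_vectorizer.transform(["the ship is stuck in the ice"])
toy_probabilities = toy_classifier.predict_proba(toy_vector)

# Row 0 holds one probability per class, in the order given by classes_.
for label, probability in zip(toy_classifier.classes_, toy_probabilities[0]):
    print(f"{label}: {probability * 100:.2f}%")

# The most probable class is the one predict() returns.
best_label = toy_classifier.classes_[np.argmax(toy_probabilities[0])]
print(best_label)
```

Iterating over the full labels list instead would repeat the first label and eventually index past the last probability column.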

In [26]:
print("Goldman docs: \n\n")
print(goldman_docs[:3])




Goldman docs: 


['\nThe history of human growth and development is at the same time the\nhistory of the terrible struggle of every new idea heralding the\napproach of a brighter dawn', ' In its tenacious hold on tradition, the\nOld has never hesitated to make use of the foulest and cruelest means\nto stay the advent of the New, in whatever form or period the latter\nmay have asserted itself', ' Nor need we retrace our steps into the\ndistant past to realize the enormity of opposition, difficulties, and\nhardships placed in the path of every progressive idea']

In [27]:
print("Henson docs: \n\n")
print(henson_docs[:3])

Henson docs: 


['\nWhen the news of the discovery of the North Pole, by Commander Peary,\nwas first sent to the world, a distinguished citizen of New York City,\nwell versed in the affairs of the Peary Arctic Club, made the statement,\nthat he was sure that Matt Henson had been with Commander Peary on the\nday of the discovery', "\n\nThere were not many people who knew who Henson\nwas, or the reason why the gentleman had made the remark, and, when\nasked why he was so certain, he explained that, for the best part of the\ntwenty years of Commander Peary's Arctic work, his faithful and often\nonly companion was Matthew Alexander Henson.\n\nTo-day there is a more general knowledge of Commander Peary, his work\nand his success, and a vague understanding of the fact that Commander\nPeary's sole companion from the realm of civilization, when he stood at\nthe North Pole, was Matthew A", 'Henson, a Colored Man.\n\nTo satisfy the demand of perfectly natural curiosity, I have undertaken\nto wri

In [28]:
print("Wu docs: \n\n")
print(wu_docs[:3])

Wu docs: 


['\nThe Importance of Names\n\n  "What\'s in a name?  That which we call a rose\n  By any other name would smell as sweet."\n\n\nNotwithstanding these lines, I maintain that the selection of names is\nimportant', ' They should always be carefully chosen', ' They are apt to\ninfluence friendships or to excite prejudices according to their\nsignificance']
