# Mystery Friend

You've received an anonymous postcard from a friend who you haven't seen in years. Your friend did not leave a name, but the card is definitely addressed to you. So far, you've narrowed your search down to three friends, based on handwriting:
- Emma Goldman
- Matthew Henson
- TingFang Wu

But which one sent you the card?


## Creating Feature Vectors and other variables

1. `CountVectorizer` from `sklearn.feature_extraction.text` and `MultinomialNB` from `sklearn.naive_bayes` are imported.

In [76]:
# import sklearn modules here:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

2. Define `bow_vectorizer` as an instance of `CountVectorizer`.

In [77]:
# Create bow_vectorizer:
bow_vectorizer = CountVectorizer()
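To make the bag-of-words idea concrete, here is a minimal pure-Python sketch of what a count vectorizer computes. The helper names (`tokenize`, `fit_vocabulary`, `to_count_vector`) and the toy sentences are hypothetical and not part of this project. The tokenizer imitates `CountVectorizer`'s default behavior (lowercasing, tokens of two or more word characters), but the real class has many more options.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase, then keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_vocabulary(docs):
    # Map every distinct token to a column index, in alphabetical order.
    terms = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {term: idx for idx, term in enumerate(terms)}


def to_count_vector(doc, vocabulary):
    # Count each known token; tokens outside the vocabulary are ignored.
    counts = Counter(tokenize(doc))
    vector = [0] * len(vocabulary)
    for term, n in counts.items():
        if term in vocabulary:
            vector[vocabulary[term]] = n
    return vector


# Hypothetical toy documents, used only for illustration.
toy_docs = ["The ice was thick", "The ship left the harbor"]
toy_vocab = fit_vocabulary(toy_docs)
toy_vectors = [to_count_vector(doc, toy_vocab) for doc in toy_docs]

print(toy_vocab)
print(toy_vectors)
```

Each document becomes a row of counts with one column per vocabulary word, which is the kind of input a Naive Bayes classifier works with.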

3. `bow_vectorizer` is fitted on all of my friends' writing (stored in the variable `friends_docs`) and vectorizes it in the same step using `.fit_transform()`. The resulting matrix of word counts is saved as `friends_vectors`.

In [78]:
import import_ipynb

from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs


friends_docs = goldman_docs + henson_docs + wu_docs

# Define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)

4. A new variable, `mystery_vector`, is created. It holds the vectorized form of `[mystery_postcard]`, produced with the vectorizer's `.transform()` method.

In [93]:
# Define mystery_vector:
from mystery_postcard import mystery_postcard
mystery_vector = bow_vectorizer.transform([mystery_postcard])

# mystery_postcard is a single string, but .transform() expects an iterable of documents, hence the surrounding [].
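The difference between `.fit_transform()` and `.transform()` can be checked on a small example. The sketch below uses hypothetical toy sentences and variable names (`toy_vectorizer`, `toy_train`, and so on) rather than the project data. It shows that `.transform()` reuses the vocabulary learned during fitting, silently drops unseen words, and rejects a bare string.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data, used only for illustration.
toy_train = ["the ice was thick", "the ship left the harbor"]

toy_vectorizer = CountVectorizer()

# fit_transform() learns the vocabulary and vectorizes in one step.
toy_train_vectors = toy_vectorizer.fit_transform(toy_train)
print(toy_train_vectors.shape)  # (number of documents, vocabulary size)

# transform() reuses that vocabulary; "cracked" was never seen, so it is dropped.
toy_new_vector = toy_vectorizer.transform(["the ice cracked"])
print(toy_new_vector.toarray())

# Passing a bare string instead of a list of documents raises a ValueError.
string_rejected = False
try:
    toy_vectorizer.transform("the ice cracked")
except ValueError:
    string_rejected = True
print(string_rejected)
```

Because the new document is mapped onto the same columns as the training documents, its vector can be fed directly to a classifier trained on them.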

## Mystery Friend Gets Classified

5. All the documents have been vectorized. Let's now look at my friends' writing samples to get a sense of how each of them writes by printing one document per friend. Any document will do; I used the 31st (index 30) here.


In [94]:
# Print out a document from each friend:

print("Goldman's Document:"+ goldman_docs[30])
print("Henson's Document:"+ henson_docs[30])
print("Wu's Document:"+ wu_docs[30])


Goldman's Document: God is everything, man is nothing, says religion
Henson's Document:On the first and the last of
Peary's expeditions, success was marred by tragedy
Wu's Document: There were a few Chinese children among the students,
and one of them was pointed out to me as the police superintendent.
This not only eloquently spoke of his popularity, but showed goodwill
and harmony among the several hundred children, and the entire absence
of race feeling


6. We now have an inkling about which friend wrote the mystery card. We can use a classifier to confirm these suspicions.

   A Naive Bayes classifier is created as an instance of `MultinomialNB`. The result is saved to `friends_classifier`.

In [95]:
# Define friends_classifier:
friends_classifier = MultinomialNB()

7. `friends_classifier` is fitted (trained) on `friends_vectors` and `friends_labels` using the classifier's `.fit()` method.

In [96]:
# One label per document, in the same order as friends_docs:
friends_labels = ["Emma"] * len(goldman_docs) + ["Matthew"] * len(henson_docs) + ["Tingfang"] * len(wu_docs)

# Train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)

MultinomialNB()
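For intuition about what fitting does, here is a rough pure-Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not scikit-learn's implementation; the functions `train_nb` and `predict_nb` and the toy token lists are hypothetical. It does follow the same basic recipe: a log prior per class plus a sum of smoothed log word likelihoods.

```python
import math
from collections import Counter, defaultdict


def train_nb(token_lists, labels, alpha=1.0):
    # Collect the vocabulary, the documents per class, and the word counts per class.
    vocab = {tok for tokens in token_lists for tok in tokens}
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(token_lists, labels):
        word_counts[label].update(tokens)

    model = {}
    for label, n_docs in docs_per_class.items():
        # Smoothed denominator: all words in the class plus alpha per vocabulary word.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                tok: math.log((word_counts[label][tok] + alpha) / denom)
                for tok in vocab
            },
        }
    return model


def predict_nb(model, tokens):
    # Score each class in log space; words outside the vocabulary are skipped.
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            if tok in params["log_likelihood"]:
                score += params["log_likelihood"][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: already-tokenized documents and their labels.
toy_tokens = [
    ["ice", "sled", "arctic"],
    ["arctic", "expedition", "ice"],
    ["liberty", "anarchism"],
    ["freedom", "liberty", "speech"],
]
toy_labels = ["Matthew", "Matthew", "Emma", "Emma"]

toy_model = train_nb(toy_tokens, toy_labels)
print(predict_nb(toy_model, ["arctic", "ice"]))
print(predict_nb(toy_model, ["liberty", "speech"]))
```

The smoothing term keeps a word that never appeared in one friend's writing from forcing that friend's probability to zero.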

8. To predict which friend wrote the postcard, the classifier's `.predict()` method is called on `mystery_vector`.

In [97]:
predictions = friends_classifier.predict(mystery_vector)
print(predictions)

['Matthew']
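A single predicted label does not say how confident the classifier is. As a small sketch on hypothetical toy data (the `toy_*` names are not part of this project), `.predict_proba()` returns one probability per class, in the order given by the classifier's `classes_` attribute.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data, used only for illustration.
toy_docs = [
    "ice sled arctic",
    "arctic expedition ice",
    "liberty anarchism",
    "freedom liberty speech",
]
toy_labels = ["Matthew", "Matthew", "Emma", "Emma"]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_docs), toy_labels)

# One row per input document, one column per class in classes_ order.
toy_probabilities = toy_classifier.predict_proba(
    toy_vectorizer.transform(["arctic ice"])
)[0]

for label, probability in zip(toy_classifier.classes_, toy_probabilities):
    print("{}: {:.3f}".format(label, probability))
```

The same call on `mystery_vector` would show how far ahead the predicted friend is compared with the other two.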


## Mystery Revealed!

9. We can now see who my mystery friend was all along!

In [98]:
mystery_friend = predictions[0] if predictions[0] else "someone else"

print("The postcard was from {}!".format(mystery_friend))

The postcard was from Matthew!
