# Mystery Friend

You’ve received an anonymous postcard from a friend who you haven’t seen in years. Your friend did not leave a name, but the card is definitely addressed to you. So far, you’ve narrowed your search down to three friends, based on handwriting:

* _Emma Goldman_
* _Matthew Henson_
* _TingFang Wu_

But which one sent you the card?

Just like you can classify a message as spam or not spam with a spam filter, you can classify writing as belonging to one friend or another by building a kind of friend writing classifier. You have past writing from all three friends stored in the variable `friends_docs`, which means you can use scikit-learn’s bag-of-words vectorizer and Naive Bayes classifier to determine who the mystery friend is!

Ready?

## Feature vectors are in the bag with scikit-learn

Near the top of `script.py`, we import `CountVectorizer` from `sklearn.feature_extraction.text`. Below it, we import `MultinomialNB` from `sklearn.naive_bayes`.

After that, we define `bow_vectorizer` as an instance of `CountVectorizer`.

Then, we use the newly minted `bow_vectorizer` to both fit (train) and transform (vectorize) all our friends’ writing (stored in the variable `friends_docs`). The resulting vector object is saved as `friends_vectors`.

Last, we create a new variable `mystery_vector` and assign to it the vectorized form of `[mystery_postcard]`, using the vectorizer’s `.transform()` method.

(`mystery_postcard` is a string, but the vectorizer expects a list of documents as its argument, so we wrap the string in a list.)

In [1]:
from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# setting up the combined list of friends' writing samples
friends_docs = goldman_docs + henson_docs + wu_docs

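# setting up labels for three friends (1 = Goldman, 2 = Henson, 3 = Wu);
# one label per document: 154, 141, and 166 documents respectively
friends_labels = [1] * len(goldman_docs) + [2] * len(henson_docs) + [3] * len(wu_docs)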

mystery_postcard = """
My friend,
From the 10th of July to the 13th, a #fierce storm raged, clouds of
freeing spray broke over the ship, #incasing her in a coat of icy mail,
and the tempest forced all of the ice out of the lower end of the
channel and beyond as far as the eye could see, but the _Roosevelt_
still remained surrounded by ice.
Hope to see you soon.
"""

# create bow_vectorizer:
bow_vectorizer = CountVectorizer()

# define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)

# define mystery_vector: 
mystery_vector = bow_vectorizer.transform([mystery_postcard])
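
To make the fit and transform steps more concrete, here is a minimal sketch on a hypothetical two-document corpus (`toy_docs`, `toy_vectorizer`, and the sentences themselves are made up for illustration and are not part of `script.py`). It shows that `.fit_transform()` learns a vocabulary and returns one row of word counts per document, and that `.transform()` takes a list of documents and silently ignores words it has not seen before.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only for illustration
toy_docs = [
    "the ice and the storm",
    "the state and the law",
]

toy_vectorizer = CountVectorizer()
# fit_transform learns the vocabulary and returns one count row per document
toy_vectors = toy_vectorizer.fit_transform(toy_docs)

# vocabulary_ maps each learned word to its column index
print(sorted(toy_vectorizer.vocabulary_))
print(toy_vectors.toarray())

# transform expects an iterable of documents, so a single string is wrapped in a list;
# "sea" is not in the learned vocabulary, so it contributes nothing to the counts
new_vector = toy_vectorizer.transform(["the storm and the sea"])
print(new_vector.toarray())
```

The same pattern applies to `mystery_postcard`: it is vectorized with the vocabulary learned from `friends_docs`, so only words that appear in the friends’ writing are counted.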

## This mystery friend gets classified

We’ve vectorized and prepared all the documents. Let’s take a look at our friends’ writing samples to get a sense of how they write.

Let's print out one document from each friend’s writing, using any index between 0 and 140. (They're stored in `goldman_docs`, `henson_docs`, and `wu_docs`.)

In [2]:
# print out a document from each friend:
print(goldman_docs[56])
print(henson_docs[34])
print(wu_docs[99])

 Anarchism,
however, also recognizes the right of the individual, or numbers of
individuals, to arrange at all times for other forms of work, in
harmony with their tastes and desires.

Such free display of human energy being possible only under complete
individual and social freedom, Anarchism directs its forces against
the third and greatest foe of all social equality; namely, the State,
organized authority, or statutory law,--the dominion of human
conduct.

Just as religion has fettered the human mind, and as property, or the
monopoly of things, has subdued and stifled man's needs, so has the
State enslaved his spirit, dictating every phase of conduct
I was to live with a people who, the scientists
stated, represented the earliest form of human life, living in what is
known as the Stone Age, and I was to revert to that stage of life by
leaps and bounds, and to emerge from it by the same sudden means
 But it is not only the men who go abroad; in many cases
ladies also travel by themse

Have an inkling about which friend wrote the mystery card? We can use a classifier to confirm those suspicions…

We create a Naive Bayes classifier as an instance of `MultinomialNB`, and save the result to `friends_classifier`.

Then, we train `friends_classifier` on `friends_vectors` and `friends_labels` using the classifier’s `.fit()` method.

Let's change the value of `predictions` from `["None Yet"]` to the classifier’s prediction about which friend wrote the postcard. We can do this by calling the classifier’s `.predict()` method on `mystery_vector`.

In [3]:
# define friends_classifier:
friends_classifier = MultinomialNB()

# train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)

# change predictions:
#predictions = ["None Yet"]
predictions = friends_classifier.predict(mystery_vector)

mystery_friend = predictions[0] if predictions[0] else "someone else"
# print the predicted label:
print("The postcard was from {}!".format(mystery_friend))
print(predictions)

The postcard was from 2!
[2]
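
The classifier returns a numeric label, and the labels follow the order in which `friends_labels` was built: 1 for `goldman_docs`, 2 for `henson_docs`, 3 for `wu_docs`. A small sketch of turning that label back into a name follows; the `label_names` dictionary and `friend_name` helper are hypothetical additions, and `example_predictions` stands in for the array returned by `.predict()`.

```python
# Mapping implied by the order in which friends_labels was built
# (1 = goldman_docs, 2 = henson_docs, 3 = wu_docs); names here are hypothetical helpers
label_names = {1: "Emma Goldman", 2: "Matthew Henson", 3: "TingFang Wu"}


def friend_name(label):
    """Return the friend's name for a numeric label, or a fallback string."""
    return label_names.get(int(label), "someone else")


# Stand-in for the array returned by friends_classifier.predict(mystery_vector)
example_predictions = [2]
message = "The postcard was from {}!".format(friend_name(example_predictions[0]))
print(message)
```

With the prediction of `2` shown above, this reads as a postcard from Matthew Henson.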


## Mystery revealed!

Does it really work? 

Let's find some lines by Emma Goldman, Matthew Henson, and TingFang Wu on [gutenberg.org](https://www.gutenberg.org) and save them to `mystery_postcard` to see how the classifier holds up!

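We'll also use the `.predict_proba()` method instead of `.predict()` and print out `predictions` to see the estimated probabilities that the `mystery_postcard` was written by each person. Because the text of `mystery_postcard` changes, it has to be passed through `bow_vectorizer.transform()` again before predicting.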

What happens when we add in a recent email or text instead?

In [4]:
mystery_postcard = """
Free love? As if love is anything but free! Man has bought brains, but all the millions in the world have failed 
to buy love. Man has subdued bodies, but all the power on earth has been unable to subdue love. Man has conquered 
whole nations, but all his armies could not conquer love. Man has chained and fettered the spirit, but he has been 
utterly helpless before love. High on a throne, with all the splendor and pomp his gold can command, man is yet poor 
and desolate, if love passes him by. And if it stays, the poorest hovel is radiant with warmth, with life and color. 
Thus love has the magic power to make of a beggar a king. Yes, love is free; it can dwell in no other atmosphere. 
In freedom it gives itself unreservedly, abundantly, completely. All the laws on the statutes, all the courts in the 
universe, cannot tear it from the soil, once love has taken root. If, however, the soil is sterile, how can marriage 
make it bear fruit? It is like the last desperate struggle of fleeting life against death.
"""

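# re-vectorize the new text; otherwise mystery_vector still holds the first postcard
mystery_vector = bow_vectorizer.transform([mystery_postcard])

predictions = friends_classifier.predict_proba(mystery_vector)
print(predictions)

[[1.10199321e-02 9.88977727e-01 2.34054697e-06]]

The probabilities shown above were recorded before `mystery_vector` was re-computed, so they still describe the first postcard: the second column (label 2, Matthew Henson) dominates at about 0.989. Re-running the cell with the re-vectorized Goldman passage should produce different values.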
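
Each column of the `.predict_proba()` output lines up with an entry of the classifier’s `classes_` attribute, so the two can be paired to see which probability belongs to which label. The sketch below demonstrates this on a hypothetical three-document corpus (`toy_docs`, `toy_labels`, and the sample sentence are invented for illustration), not on the friends’ actual writing.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, for illustration only
toy_docs = [
    "ice storm ship ice",
    "state law freedom state",
    "travel abroad ladies travel",
]
toy_labels = [1, 2, 3]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_docs), toy_labels)

# Vectorize the new text first, then ask for per-class probabilities
new_vector = toy_vectorizer.transform(["the ship was caught in the ice"])
probabilities = toy_classifier.predict_proba(new_vector)[0]

# classes_ gives the label that each probability column refers to
for label, probability in zip(toy_classifier.classes_, probabilities):
    print(label, round(float(probability), 3))
```

The probabilities in each row sum to 1, and the label with the largest value is the one `.predict()` would return.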
