# Mystery Friend

You've received an anonymous postcard from a friend you haven't seen in years. Your friend did not leave a name, but the card is definitely addressed to you. Based on the handwriting, you've narrowed your search down to three friends:
- Emma Goldman
- Matthew Henson
- TingFang Wu

But which one sent you the card?

Just as a spam filter classifies a message as spam or not spam, a friend writing classifier can classify a piece of writing as belonging to one friend or another. You have past writing from all three friends stored in the variable `friends_docs`, which means you can use scikit-learn's bag-of-words vectorizer and Naive Bayes classifier to determine who the mystery friend is!

Ready?

## Feature Vectors Are in the Bag with Scikit-Learn

1. In the code block below, import `CountVectorizer` from `sklearn.feature_extraction.text`. Below it, import `MultinomialNB` from `sklearn.naive_bayes`.

In [8]:
# import sklearn modules here:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB


2. Define `bow_vectorizer` as an instance of `CountVectorizer`.

In [9]:
# Create bow_vectorizer:
bow_vectorizer = CountVectorizer()

3. Use your newly minted `bow_vectorizer` to both `fit` (train) and `transform` (vectorize) all your friends' writing (stored in the variable `friends_docs`). Save the resulting vector object as `friends_vectors`.

In [15]:
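# import_ipynb allows importing from .ipynb notebooks
# (needed here if the *_raw sources are notebooks rather than .py files):
!pip install import_ipynb

import import_ipynb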

from goldman_emma_raw import goldman_docs
from henson_matthew_raw import henson_docs
from wu_tingfang_raw import wu_docs

friends_docs = goldman_docs + henson_docs + wu_docs

# Define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)
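To see what `.fit_transform()` produces, here is a minimal sketch on a toy corpus. The names `toy_docs`, `toy_vectorizer`, and `toy_vectors` are made up for illustration and are not part of the project data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical, not the friends' writing).
toy_docs = ["the ship sailed north", "the ice held the ship"]

toy_vectorizer = CountVectorizer()
# fit learns the vocabulary; transform counts each word per document.
toy_vectors = toy_vectorizer.fit_transform(toy_docs)

# vocabulary_ maps each token to a column index (tokens are sorted alphabetically).
toy_vocab = sorted(toy_vectorizer.vocabulary_)
toy_counts = toy_vectors.toarray()

print(toy_vocab)
print(toy_counts)
```

Each row of `toy_counts` is one document and each column is one vocabulary word, so `friends_vectors` has one row per document in `friends_docs`.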



4. Create a new variable `mystery_vector`. Assign to it the vectorized form of `[mystery_postcard]` using the vectorizer's `.transform()` method.

   (`mystery_postcard` is a string, while the vectorizer expects a list as an argument.)

In [16]:
mystery_postcard = """
My friend,
From the 10th of July to the 13th, a fierce storm raged, clouds of
freeing spray broke over the ship, incasing her in a coat of icy mail,
and the tempest forced all of the ice out of the lower end of the
channel and beyond as far as the eye could see, but the _Roosevelt_
still remained surrounded by ice.
Hope to see you soon.
"""

# Define mystery_vector:
mystery_vector = bow_vectorizer.transform([mystery_postcard])
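Note that `.transform()` reuses the vocabulary learned during fitting, so words the vectorizer has never seen are simply not counted. A small sketch of that behavior, using a hypothetical `demo_vectorizer` and made-up sentences:

```python
from sklearn.feature_extraction.text import CountVectorizer

demo_vectorizer = CountVectorizer()
demo_vectorizer.fit(["storm and ice", "ice around the ship"])

# "hello", "postcard", and "from" were never seen during fit,
# so they have no column and are ignored.
new_vector = demo_vectorizer.transform(["hello ship, postcard from the ice"])

n_columns = new_vector.shape[1]          # size of the fitted vocabulary
counted_tokens = int(new_vector.sum())   # tokens that matched the vocabulary

print(n_columns, counted_tokens)
```

This is why `mystery_vector` has the same number of columns as `friends_vectors`, whatever words the postcard contains.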


## This Mystery Friend Gets Classified

5. You've vectorized and prepared all the documents. Let's take a look at your friends' writing samples to get a sense of how they write.

   Print out one document of each friend's writing. Try any index between `0` and `140`. (Your friends' documents are stored in `goldman_docs`, `henson_docs`, and `wu_docs`.)

In [17]:
# Print out a document from each friend:


print(goldman_docs[20])
print(henson_docs[25])
print(wu_docs[45])

 All the
early sagas rest on that idea, which continues to be the LEIT-MOTIF
of the biblical tales dealing with the relation of man to God, to the
State, to society
Eivind Astrp, of
Christiania, Norway, who had the honor of being the companion of
Commander Peary in the first crossing of North Greenland--and of having
an Esquimo at Cape York become so fond of him that he named his son for
him! It was on this voyage north that Peary's leg was broken.

Mr
 The Sovereign of Great Britain
appoints only the Governor General who acts in his name, but the
Dominion is governed by a responsible Ministry, and all domestic
affairs are managed by local officials, without interference from the
Home Government


6. Have an inkling about which friend wrote the mystery card? We can use a classifier to confirm those suspicions...

   Implement a Naive Bayes classifier using `MultinomialNB`. Save the result to `friends_classifier`.

In [18]:
# Define friends_classifier:
friends_classifier = MultinomialNB()

7. Train `friends_classifier` on `friends_vectors` and `friends_labels` using the classifier's `.fit()` method.

In [19]:
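# One label per document; taking lengths from the lists keeps labels aligned with friends_docs:
friends_labels = ["Emma"] * len(goldman_docs) + ["Matthew"] * len(henson_docs) + ["Tingfang"] * len(wu_docs)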

# Train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)


MultinomialNB()

8. Change the value of `predictions` from `["None Yet"]` to the classifier's prediction about which friend wrote the postcard. You can do this by calling the classifier's `.predict()` method on `mystery_vector`.

In [20]:
# Change predictions:
predictions = friends_classifier.predict(mystery_vector)
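The whole fit-then-predict flow can be tried end to end on a tiny example. This is only a sketch: `polar_docs`, `legal_docs`, and the `"Polar"`/`"Legal"` labels are hypothetical stand-ins for the friends' document lists.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-ins for two friends' document lists.
polar_docs = ["ice and snow on the ship", "the ship met ice"]
legal_docs = ["the law of the state", "courts and the law"]

toy_docs = polar_docs + legal_docs
# One label per document, in the same order as toy_docs.
toy_labels = ["Polar"] * len(polar_docs) + ["Legal"] * len(legal_docs)

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_docs), toy_labels)

# New text must go through the same fitted vectorizer before prediction.
toy_prediction = toy_classifier.predict(toy_vectorizer.transform(["snow on the ice"]))
print(toy_prediction)
```

`.predict()` returns an array with one label per input document, which is why the next step reads `predictions[0]`.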


## Mystery Revealed!

9. Uncomment the final print statement and run the code block below to see who your mystery friend was all along!

In [21]:
mystery_friend = predictions[0] if predictions[0] else "someone else"

# Uncomment the print statement:
print("The postcard was from {}!".format(mystery_friend))

The postcard was from Matthew!


10. But does it really work? Find some lines by Emma Goldman, Matthew Henson, and TingFang Wu on <a href="http://www.gutenberg.org" target="_blank">gutenberg.org</a> and save them to `mystery_postcard` to see how the classifier holds up!

    Try using the `.predict_proba()` method instead of `.predict()` and print out `predictions` to see the estimated probabilities that the `mystery_postcard` was written by each person.
   
    What happens when you add in a recent email or text instead?

In [22]:
Mystery_Emma = "Free love? As if love is anything but free! Man has bought brains, but all the millions in the world have failed to buy love. Man has subdued bodies, but all the power on earth has been unable to subdue love. Man has conquered whole nations, but all his armies could not conquer love. Man has chained and fettered the spirit, but he has been utterly helpless before love. High on a throne, with all the splendor and pomp his gold can command, man is yet poor and desolate, if love passes him by. And if it stays, the poorest hovel is radiant with warmth, with life and color. Thus love has the magic power to make of a beggar a king. Yes, love is free; it can dwell in no other atmosphere. In freedom it gives itself unreservedly, abundantly, completely. All the laws on the statutes, all the courts in the universe, cannot tear it from the soil, once love has taken root. If, however, the soil is sterile, how can marriage make it bear fruit? It is like the last desperate struggle of fleeting life against death."
Mystery_Emma_vector = bow_vectorizer.transform([Mystery_Emma])
pred_2 = friends_classifier.predict_proba(Mystery_Emma_vector)
print(pred_2)

[[1.00000000e+00 2.83485427e-40 3.34556633e-29]]
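The columns returned by `.predict_proba()` follow the order of the classifier's `classes_` attribute, which scikit-learn stores in sorted order. A minimal sketch of pairing each probability with its label, using made-up one-line documents and hypothetical `demo_` names:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data, deliberately listed out of alphabetical order.
demo_docs = ["ice and snow", "law and courts", "love and freedom"]
demo_labels = ["Matthew", "Tingfang", "Emma"]

demo_vectorizer = CountVectorizer()
demo_classifier = MultinomialNB().fit(demo_vectorizer.fit_transform(demo_docs), demo_labels)

probabilities = demo_classifier.predict_proba(demo_vectorizer.transform(["snow and ice"]))[0]

# classes_ holds the labels in sorted order; predict_proba columns follow it.
by_friend = dict(zip(demo_classifier.classes_, probabilities))
for friend, probability in by_friend.items():
    print(f"{friend}: {probability:.3f}")
```

Applied to the output above, the first column would then correspond to Emma, the second to Matthew, and the third to Tingfang.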


In [24]:
mystery_email = """Dear Investor,

As you all are aware, SEBI along with all MIIs had celebrated the globally celebrated IOSCO World Investor Week 2022 (#IOSCOWIW2022) in India from Oct 10-16, 2022.

BSE Investors Protection Fund had also participated in WIW 2022 with slew of programs.

One of our initiative was to share specially created videos on the occasion of WIW 2022 with investors which will help in enhancing knowledge about the Securities Market.

We are sharing these videos periodically starting with first Investor Awareness Video shared during WIW 2022.

Please find herewith a video on the topic – Start Investing Early

Link for the Video is : https://youtu.be/Df1wbUiuOZ8

Hope the above will add value to your knowledge.

Regards

BSE Investors Protection Fund"""
mystery_email_vector = bow_vectorizer.transform([mystery_email])
pred_3 = friends_classifier.predict(mystery_email_vector)
print(pred_3)

['Matthew']
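The classifier can only answer with one of the three labels it was trained on, so unrelated text such as this email still gets assigned to a friend. One rough way to judge how much a prediction can be trusted is to check how many of the text's tokens the vectorizer actually knows. The helper `vocabulary_coverage` below is a hypothetical sketch, not part of scikit-learn, and the sentences are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer

def vocabulary_coverage(vectorizer, text):
    """Return the fraction of tokens in `text` found in the fitted vocabulary."""
    # build_analyzer() applies the same preprocessing and tokenization as transform().
    tokens = vectorizer.build_analyzer()(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vectorizer.vocabulary_)
    return known / len(tokens)

coverage_vectorizer = CountVectorizer()
coverage_vectorizer.fit(["the ship was surrounded by ice", "a fierce storm raged"])

familiar = vocabulary_coverage(coverage_vectorizer, "the storm raged by the ship")
unfamiliar = vocabulary_coverage(coverage_vectorizer, "please find the investor video")

print(familiar, unfamiliar)
```

In the notebook, the same helper could be called as `vocabulary_coverage(bow_vectorizer, mystery_email)`. Low coverage suggests the prediction rests on only a few shared words.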
