# Naive Bayes

The Naive Bayes classifier uses Bayes' theorem:

$P(artist|text) = \frac{P(text|artist)P(artist)}{P(text)}$

to calculate the conditional probability $P(artist|text)$ for each artist (class), and then assigns the text to the artist with the highest probability. Because $P(text)$ is the same for every artist, it does not change which artist scores highest.

#### Example: text = 'love' and artist = Beatles
We want to calculate $P(Beatles|love)$ from:

- $P(love|Beatles)$ := probability of 'love' appearing in Beatles' songs
- $P(Beatles)$ := proportion of songs in the data set that are Beatles' songs
- $P(love)$ := probability of 'love' appearing in any song in the data set
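The three quantities above can be estimated by simple counting. The sketch below is only an illustration: the `songs` list is made-up toy data (not the corpus used later), each song is reduced to the set of words it contains, and `posterior` is a hypothetical helper name.

```python
# Hypothetical toy data: (artist, set of words in the song).
songs = [
    ("Beatles", {"all", "you", "need", "is", "love"}),
    ("Beatles", {"yellow", "submarine"}),
    ("Beatles", {"love", "me", "do"}),
    ("Spice Girls", {"spice", "up", "your", "life"}),
    ("Spice Girls", {"if", "you", "wanna", "be", "my", "lover"}),
    ("Spice Girls", {"love", "thing"}),
]


def posterior(artist, word, songs):
    """Estimate P(artist | word) with Bayes' theorem from song counts."""
    n_songs = len(songs)
    artist_songs = [words for a, words in songs if a == artist]

    # P(artist): share of songs in the data set by this artist
    p_artist = len(artist_songs) / n_songs
    # P(word | artist): share of this artist's songs containing the word
    p_word_given_artist = sum(word in w for w in artist_songs) / len(artist_songs)
    # P(word): share of all songs containing the word
    p_word = sum(word in w for _, w in songs) / n_songs

    return p_word_given_artist * p_artist / p_word


p_beatles = posterior("Beatles", "love", songs)
p_spice = posterior("Spice Girls", "love", songs)
print(round(p_beatles, 3), round(p_spice, 3))  # 0.667 0.333
```

In this toy data 'love' appears in two of the three Beatles songs and one of the three Spice Girls songs, so the posteriors are 2/3 and 1/3 and sum to 1.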

#### For more than one word, as our text corpus contains many words

$P(text|artist) = P(word_1|artist) \cdot P(word_2|artist) \cdot \ldots \cdot P(word_N|artist)$

This is where the word *naive* comes in: the classifier assumes that the words are independent of each other given the artist, so it can simply multiply their probabilities.
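The product above can be sketched in a few lines of plain Python. This is a simplified illustration, not what scikit-learn does internally line for line: it sums log probabilities instead of multiplying (a product of many small numbers underflows quickly), and it uses add-one (Laplace) smoothing so that a word never seen for an artist does not force the whole product to zero. The function name `log_likelihood` and the two tiny corpora are assumptions made for the example.

```python
import math
from collections import Counter


def log_likelihood(text, artist_lines, vocabulary, alpha=1.0):
    """Return log P(text | artist) under the naive independence assumption."""
    counts = Counter(word for line in artist_lines for word in line.split())
    # Smoothed denominator: all word occurrences plus alpha per vocabulary word
    denom = sum(counts.values()) + alpha * len(vocabulary)
    # Sum of logs == log of the product P(word_1|artist) * ... * P(word_N|artist).
    # Words outside the vocabulary are skipped.
    return sum(
        math.log((counts[word] + alpha) / denom)
        for word in text.split()
        if word in vocabulary
    )


# Hypothetical mini corpora, two lines per artist
beatles = ["all you need is love", "we all live in a yellow submarine"]
spice = ["spice up your life", "if you wanna be my lover"]
vocabulary = {word for line in beatles + spice for word in line.split()}

ll_beatles = log_likelihood("yellow submarine", beatles, vocabulary)
ll_spice = log_likelihood("yellow submarine", spice, vocabulary)
print(ll_beatles > ll_spice)  # True
```

To turn these likelihoods into a classification, one would add the log prior $\log P(artist)$ to each value and pick the artist with the largest total.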

### Practical example from yesterday

In [1]:
ARTIST_1 = 'Spice Girls'
ARTIST_2 = 'Beatles'

# here: balance your text data

TEXT_CORPUS = ['last time that we had this conversation',
               'if you wanna be my lover',
               'spice up your life',
               'so baby come round, come round come round',
               'stop right now thank you very much',
               'ill tell you what i want',
               'what i really really want',
               'you cant do nothing for me baby',
               'hey you always on the run',
               'silly games that you were playing',
               'when I find myself in times of trouble',
               'speaking words of wisdom',
               'and in my hour of darkness she is standing right in front of me',
               'whisper words of wisdom',
               'we all live in a yellow submarine',
               'all you need is love',
               'eleanor rigby puts on a face that she keeps in a jar by the door',
               'yesterday all my troubles seemed so far away',
               'lucy in the sky with diamonds',
               'i am the walrus']

LABELS = [ARTIST_1] * 10 + [ARTIST_2] * 10

In [18]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB  # suited to count data, such as word counts in texts

from sklearn.pipeline import make_pipeline

def train_model(text, labels):
    """
    Trains a scikit-learn classification model on text.
    
    Parameters
    ----------
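    text : list of str
        Training documents, one string per lyric line.
    labels : list of str
        Artist name for each document, in the same order as text.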
    
    Returns
    -------
    model : Trained scikit-learn model.
    
    """
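    cv = CountVectorizer(stop_words='english')  # word counts, English stop words removed
    # here: balance the transformed numerical data
    tf = TfidfTransformer()  # reweight the counts by TF-IDF
    nb = MultinomialNB(alpha=1)  # alpha=1 is add-one (Laplace) smoothing
    model = make_pipeline(cv, tf, nb)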
    model.fit(text, labels)
    
    return model
 

def predict(model, new_text):
    """
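    Takes the pre-trained model pipeline and predicts the artist for unseen text.
    
    Parameters
    ----------
    model : Trained scikit-learn model pipeline.
    new_text : str
    
    Returns
    -------
    prediction : str
        Predicted artist.
    probabilities : numpy.ndarray
        Class probabilities, shape (1, n_classes), ordered as in model.classes_.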
    
    """
    new_text = [new_text]
    prediction = model.predict(new_text)
    probabilities = model.predict_proba(new_text)
    
    return prediction[0], probabilities


In [19]:
model = train_model(TEXT_CORPUS, LABELS)

In [20]:
predict(model, 'funny cat')

('Beatles', array([[0.5, 0.5]]))
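The `[0.5, 0.5]` result has a simple explanation: neither 'funny' nor 'cat' occurs in the training corpus, so the vectorizer turns the text into an all-zero vector and the model falls back on the class priors, which are equal because each artist has 10 lines. The tie goes to the first entry in `model.classes_`, which is sorted alphabetically, hence 'Beatles'. The sketch below reproduces this on a shortened, hypothetical corpus so that it runs on its own; `corpus`, `labels` and `unseen` are names chosen for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Shortened, hypothetical corpus: two lines per artist
corpus = ['spice up your life',
          'if you wanna be my lover',
          'all you need is love',
          'we all live in a yellow submarine']
labels = ['Spice Girls'] * 2 + ['Beatles'] * 2

model = make_pipeline(CountVectorizer(stop_words='english'),
                      TfidfTransformer(),
                      MultinomialNB(alpha=1))
model.fit(corpus, labels)

# make_pipeline names each step after its lowercased class name
vocabulary = model.named_steps['countvectorizer'].vocabulary_
unseen = [word for word in 'funny cat'.split() if word not in vocabulary]

print(model.classes_)                         # column order of predict_proba
print(unseen)                                 # words the model has never seen
print(model.predict_proba(['funny cat']))     # equal to the class priors
print(model.predict(['yellow submarine']))    # words seen only in Beatles lines
```

With words that do appear in the training data, such as 'yellow submarine', the probabilities move away from the priors and the prediction becomes informative.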

## Links from the warmup:
- Bayesian updating: http://www.statisticalengineering.com/bayesian.htm
- Bayes theorem with lego: https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego
- For the visual ones among you: https://setosa.io/ev/conditional-probability/