# A simple text classifier using the Naive Bayes Algorithm to categorize text as either 'Sports' or 'Not Sports'.

Naive Bayes is a probabilistic algorithm based on Bayes' Theorem, which gives the probability
of a label given a set of features.<br>
The 'Naive' part comes from the assumption that all features are independent of each other given the label, which is often not the case in real life but simplifies the calculations.<br>
Bayes' Theorem: $P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$ where:<br>
$P(A|B)$ is the probability of $A$ happening, given that $B$ has occurred. <br>
$P(A)$ and $P(B)$ are the probabilities of $A$ and $B$ occurring on their own, without any knowledge of the other. <br>
$P(B|A)$ is the probability of $B$ occurring, given that $A$ has occurred.
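
To make the formula concrete, here is a minimal sketch that plugs numbers into Bayes' Theorem. The function name `bayes_posterior` and all of the probabilities are hypothetical, chosen only for illustration; they are not derived from the data used later in this notebook.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) using Bayes' Theorem: P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers (assumptions for illustration only):
# A = "the text is about sports", B = "the text contains the word 'ball'"
p_a = 0.5              # P(A): half of all texts are about sports
p_b_given_a = 0.4      # P(B|A): 40% of sports texts contain 'ball'
p_b_given_not_a = 0.1  # P(B|not A): 10% of other texts contain 'ball'

# P(B) via the law of total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(f"P(sports | 'ball') = {posterior:.2f}")
```

Under these made-up numbers, seeing the word 'ball' raises the probability of the 'sports' label from the prior of 0.5 to 0.8.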

## Imports:

In [8]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

## Generate Data:

In [9]:
texts = ["Football is great", "I love tennis", "Chess is a board game", 
         "Basketball is fast", "Programming is not a sport", "I hate sports"]
labels = ["Sports", "Sports", "Not Sports", "Sports", "Not Sports", "Not Sports"]

## Text Vectorization:

In [10]:
# convert the text into word counts, a numeric format the model can understand, using CountVectorizer
vectorizer = CountVectorizer()

## Create and Train the Model:

In [11]:
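# build a pipeline that vectorizes the text and passes the word counts to a Multinomial Naive Bayes model
myModel = make_pipeline(vectorizer, MultinomialNB())

# train the pipeline on the labelled examples
myModel.fit(texts, labels)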
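
To see how the fitted model behaves on words it has never seen, the following sketch rebuilds the same pipeline under the hypothetical name `model` and asks for class probabilities on a sentence whose words are all outside the training vocabulary. The sentence "Quantum physics lecture" is an assumption made for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Football is great", "I love tennis", "Chess is a board game",
         "Basketball is fast", "Programming is not a sport", "I hate sports"]
labels = ["Sports", "Sports", "Not Sports", "Sports", "Not Sports", "Not Sports"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# class order used by predict_proba (sorted alphabetically)
print(model.classes_)

# a sentence made only of words the vectorizer has never seen
unseen = ["Quantum physics lecture"]
probs = model.predict_proba(unseen)[0]
for cls, p in zip(model.classes_, probs):
    print(f"P({cls}) = {p:.2f}")
```

Because none of the words are in the vocabulary, the text contributes no evidence and the model falls back on the class priors, which are equal here since the training set has three examples of each label. In a tie, `predict` returns the first class in `classes_`.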

## Evaluate:

In [12]:
test_texts = ["I enjoy playing soccer", "Video games are awesome", "Is golf a sport?", "Coding is fun"]

# predict a label for each unseen text
predicted_labels = myModel.predict(test_texts)

for text, label in zip(test_texts, predicted_labels):
    print(f"'{text}' is {label}")

'I enjoy playing soccer' is Not Sports
'Video games are awesome' is Not Sports
'Is golf a sport?' is Not Sports
'Coding is fun' is Sports
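
The notebook imports `accuracy_score` but never calls it. A minimal sketch of how it could be used follows. The `true_labels` list is an assumption: hand-assigned labels for the four test sentences that are not part of the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["Football is great", "I love tennis", "Chess is a board game",
         "Basketball is fast", "Programming is not a sport", "I hate sports"]
labels = ["Sports", "Sports", "Not Sports", "Sports", "Not Sports", "Not Sports"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

test_texts = ["I enjoy playing soccer", "Video games are awesome",
              "Is golf a sport?", "Coding is fun"]

# ASSUMPTION: ground-truth labels assigned by hand for this illustration
true_labels = ["Sports", "Not Sports", "Sports", "Not Sports"]

predicted_labels = model.predict(test_texts)

# fraction of predictions that match the assumed true labels
accuracy = accuracy_score(true_labels, predicted_labels)
print(f"Accuracy: {accuracy:.2f}")
```

Only one of the four predictions shown above matches these assumed labels. That is unsurprising with six training sentences: words such as "soccer", "golf", and "coding" never occur in the training data, so the model has almost nothing to base its decision on.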
