# Naive Bayes Classifier

I am building a Naive Bayes classifier with the `scikit-learn` library, and I will be using the `pandas` and `plotly` libraries for data handling and visualization. Naive Bayes is built on Bayes' Theorem, which is as follows:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Bayes' Theorem gives the probability of a class given a set of features. Naive Bayes applies it under the assumption that the features are conditionally independent of one another given the class. Despite this "naive" assumption of feature independence, it performs well in many practical applications, especially on high-dimensional data. The algorithm computes the posterior probability of each class and assigns the class label with the highest posterior probability. It is commonly used in text classification, such as spam detection and sentiment analysis. Naive Bayes is efficient, easy to implement, and works well with small datasets.
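To make the posterior computation concrete, here is a minimal sketch in plain Python that counts frequencies and applies Bayes' Theorem by hand. The toy rows, the helper names `fit` and `posterior`, and the Laplace smoothing parameter `alpha` are all assumptions made for illustration; they are not taken from `weather.csv` or from `scikit-learn`.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (windy, humid) -> label. Not taken from weather.csv.
rows = [
    ((False, False), "yes"),
    ((False, True), "yes"),
    ((True, False), "yes"),
    ((True, True), "no"),
    ((True, True), "no"),
    ((False, True), "no"),
]

def fit(rows):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(label for _, label in rows)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for index, value in enumerate(features):
            feature_counts[label][index][value] += 1
    return class_counts, feature_counts

def posterior(features, class_counts, feature_counts, alpha=1.0):
    """Return normalized P(class | features) under the naive independence assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for index, value in enumerate(features):
            # Laplace-smoothed likelihood P(feature = value | class);
            # the 2 is the number of possible values of a boolean feature
            score *= (feature_counts[label][index][value] + alpha) / (count + 2 * alpha)
        scores[label] = score
    evidence = sum(scores.values())  # P(features), the denominator in Bayes' Theorem
    return {label: score / evidence for label, score in scores.items()}

class_counts, feature_counts = fit(rows)
probabilities = posterior((True, True), class_counts, feature_counts)
prediction = max(probabilities, key=probabilities.get)  # class with the highest posterior
print(probabilities, prediction)
```

Multiplying the per-feature likelihoods is exactly where the "naive" independence assumption enters: each feature contributes its own factor, regardless of the values of the other features.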

I will be using the Naive Bayes classifier to predict whether the weather allows tennis players to play tennis, with the attributes being the following:

 - **weather**: (sunny, overcast, rainy)
 - **outlook**: (True/False)
 - **temperature**: (True/False)
 - **humidity**: (True/False)
 - **windy**: (True/False)
 - **play_tennis**: (yes/no) (this will be the target variable)

The data was generated by *Google Gemini*.

**Let the fun begin!!**

In [45]:
# importing the libraries

import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.naive_bayes import GaussianNB

In [46]:
# Loading the dataset and converting into pandas dataframe

df = pd.read_csv('weather.csv')
df.head()

Unnamed: 0,weather,outlook,temperature,humidity,windy,play_tennis
0,sunny,True,True,False,False,yes
1,overcast,True,False,True,True,no
2,rainy,False,False,True,False,no
3,sunny,False,False,False,False,no
4,overcast,True,True,False,False,yes


In [47]:
# Cleaning the data

df = df.drop(['weather'], axis=1)
df.head()

Unnamed: 0,outlook,temperature,humidity,windy,play_tennis
0,True,True,False,False,yes
1,True,False,True,True,no
2,False,False,True,False,no
3,False,False,False,False,no
4,True,True,False,False,yes


In [48]:
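# One Hot Encoding

# Encode only the feature columns; 'play_tennis' stays as the target
feature_columns = ['outlook', 'temperature', 'humidity', 'windy']

# Recent scikit-learn releases use `sparse_output` (the older `sparse`
# keyword raises a TypeError there). drop='if_binary' keeps a single
# 0/1 column for each two-valued feature, so the column count is unchanged.
encoder = OneHotEncoder(sparse_output=False, drop='if_binary')
df[feature_columns] = encoder.fit_transform(df[feature_columns])

df.head()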
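For a column with more than two categories, such as `weather`, one-hot encoding produces one indicator column per category. The sketch below uses `pandas.get_dummies` as an alternative to `OneHotEncoder`; the `sample` frame is a hypothetical stand-in that only mirrors the column names above, not the real `weather.csv`.

```python
import pandas as pd

# Hypothetical rows that mirror the notebook's column names; not the real weather.csv
sample = pd.DataFrame({
    "weather": ["sunny", "overcast", "rainy", "sunny"],
    "windy": [False, True, False, False],
    "play_tennis": ["yes", "no", "no", "no"],
})

# One indicator column per category of `weather`; other columns pass through unchanged
encoded = pd.get_dummies(sample, columns=["weather"], dtype=int)

# Boolean features can simply be cast to 0/1
encoded["windy"] = encoded["windy"].astype(int)

print(encoded.columns.tolist())
print(encoded)
```

Whether to keep `weather` at all is a modelling choice; this only shows what the encoding would look like if it were kept.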

In [26]:
# Visualization

# Count rows per temperature value, with one bar per play_tennis outcome
fig = px.histogram(df,
                   x = 'temperature',
                   color = 'play_tennis',
                   barmode = 'group',
                   color_discrete_sequence=px.colors.sequential.Bluered)
fig.update_layout(title = "Distribution of Temperature",
                  xaxis_title = "Temperature",
                  yaxis_title = "Count",
                  legend_title = "Able to Play Tennis")
fig.show()
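The counts behind a chart like this can also be checked as a table. This is a small sketch using `pandas.crosstab` on the `temperature` and `play_tennis` values from the five rows shown by `df.head()` earlier; the `sample` frame is built by hand here as an assumption, so the numbers describe those five rows only, not the full dataset.

```python
import pandas as pd

# The temperature / play_tennis values from the five rows shown by df.head(),
# typed in by hand for illustration
sample = pd.DataFrame({
    "temperature": [True, False, False, False, True],
    "play_tennis": ["yes", "no", "no", "no", "yes"],
})

# Rows are temperature values, columns are play_tennis outcomes, cells are row counts
counts = pd.crosstab(sample["temperature"], sample["play_tennis"])
print(counts)
```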

### Making the model from here on out.