# Naive Bayes Classifier

I am building a Naive Bayes classifier with the `scikit-learn` library, using `pandas` for data handling and `plotly` for visualization. Naive Bayes is based on Bayes' theorem, which is as follows:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$
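
To make the formula concrete, here is a small worked calculation. The probabilities are made-up numbers for a spam filter, chosen only for illustration (they do not come from any dataset), and the function name `posterior` is hypothetical.

```python
def posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, chosen only for illustration:
p_spam = 0.2               # P(A): prior probability that an email is spam
p_word_given_spam = 0.6    # P(B|A): the word "free" appears in a spam email
p_word_given_ham = 0.05    # the word "free" appears in a non-spam email

# P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 2))  # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75 under these assumed numbers.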

Naive Bayes applies Bayes' theorem to calculate the probability of each class given a set of features, assuming that the features are conditionally independent of one another given the class. Despite this "naive" independence assumption, it performs well in many practical applications, especially on high-dimensional data. The algorithm computes the posterior probability of each class and assigns the class label with the highest posterior probability. It is commonly used in text classification, such as spam detection and sentiment analysis. Naive Bayes is efficient, easy to implement, and works well with small datasets.
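
Written out for a class $y$ and features $x_1, \dots, x_n$, the independence assumption turns Bayes' theorem into the rule below. The symbols $y$, $x_i$, and $\hat{y}$ are notation introduced here for illustration. The denominator $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when comparing classes.

$$
P(y|x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i|y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i|y)
$$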

I will be using the Naive Bayes classifier to predict the weather, with the following attributes:

 - **Outlook**: (Overcast, Sunny, Rainy)
 - **Temperature**: (Cool, Warm, Hot, Mild)
 - **Humidity**: (High, Normal)
 - **Windy**: (True/False)
 - **Play**: (Yes/No) (this will be the target variable)

The data was generated by *Google Gemini*.
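
For categorical attributes like the ones listed above, one possible approach is to encode each column as integer codes and fit scikit-learn's `CategoricalNB`. The sketch below is only an illustration: the six rows are hypothetical (they are not the generated dataset), and the names `weather`, `feature_cols`, `encoder`, and `weather_model` are assumptions.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows, written only to illustrate the column layout.
weather = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Temperature": ["Hot", "Mild", "Hot", "Cool", "Mild", "Warm"],
    "Humidity":    ["High", "High", "Normal", "Normal", "High", "Normal"],
    "Windy":       [False, True, False, True, True, False],
    "Play":        ["No", "No", "Yes", "No", "Yes", "Yes"],
})

feature_cols = ["Outlook", "Temperature", "Humidity", "Windy"]

# CategoricalNB expects every feature as non-negative integer codes,
# so each column is cast to strings and then ordinal-encoded.
encoder = OrdinalEncoder()
X_weather = encoder.fit_transform(weather[feature_cols].astype(str)).astype(int)
y_weather = weather["Play"]

weather_model = CategoricalNB()
weather_model.fit(X_weather, y_weather)

# Predict for one new day built only from categories seen during fitting.
new_day = pd.DataFrame([["Overcast", "Mild", "Normal", "False"]], columns=feature_cols)
prediction = weather_model.predict(encoder.transform(new_day).astype(int))
print(prediction)
```

`CategoricalNB` models each feature as a discrete distribution per class, which suits label-like attributes better than the Gaussian assumption made by `GaussianNB`.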

**Let the fun begin!!**

In [1]:
# importing the libraries

import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from sklearn.naive_bayes import GaussianNB

In [2]:
# Loading the dataset and converting into pandas dataframe
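
The body of this cell is not shown. Judging from the `load_breast_cancer` import and the column names in the output, it presumably looked something like the following sketch. The variable name `data` is an assumption; `df` is the name used by the later cells.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load scikit-learn's built-in breast cancer dataset (569 rows, 30 features).
data = load_breast_cancer()

# Wrap the feature matrix in a DataFrame, using the dataset's feature names as columns.
df = pd.DataFrame(data.data, columns=data.feature_names)

df.head()
```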



| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


In [3]:
df.describe()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst radius | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | ... | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 | 569.0 |
| mean | 14.127292 | 19.289649 | 91.969033 | 654.889104 | 0.09636 | 0.104341 | 0.088799 | 0.048919 | 0.181162 | 0.062798 | ... | 16.26919 | 25.677223 | 107.261213 | 880.583128 | 0.132369 | 0.254265 | 0.272188 | 0.114606 | 0.290076 | 0.083946 |
| std | 3.524049 | 4.301036 | 24.298981 | 351.914129 | 0.014064 | 0.052813 | 0.07972 | 0.038803 | 0.027414 | 0.00706 | ... | 4.833242 | 6.146258 | 33.602542 | 569.356993 | 0.022832 | 0.157336 | 0.208624 | 0.065732 | 0.061867 | 0.018061 |
| min | 6.981 | 9.71 | 43.79 | 143.5 | 0.05263 | 0.01938 | 0.0 | 0.0 | 0.106 | 0.04996 | ... | 7.93 | 12.02 | 50.41 | 185.2 | 0.07117 | 0.02729 | 0.0 | 0.0 | 0.1565 | 0.05504 |
| 25% | 11.7 | 16.17 | 75.17 | 420.3 | 0.08637 | 0.06492 | 0.02956 | 0.02031 | 0.1619 | 0.0577 | ... | 13.01 | 21.08 | 84.11 | 515.3 | 0.1166 | 0.1472 | 0.1145 | 0.06493 | 0.2504 | 0.07146 |
| 50% | 13.37 | 18.84 | 86.24 | 551.1 | 0.09587 | 0.09263 | 0.06154 | 0.0335 | 0.1792 | 0.06154 | ... | 14.97 | 25.41 | 97.66 | 686.5 | 0.1313 | 0.2119 | 0.2267 | 0.09993 | 0.2822 | 0.08004 |
| 75% | 15.78 | 21.8 | 104.1 | 782.7 | 0.1053 | 0.1304 | 0.1307 | 0.074 | 0.1957 | 0.06612 | ... | 18.79 | 29.72 | 125.4 | 1084.0 | 0.146 | 0.3391 | 0.3829 | 0.1614 | 0.3179 | 0.09208 |
| max | 28.11 | 39.28 | 188.5 | 2501.0 | 0.1634 | 0.3454 | 0.4268 | 0.2012 | 0.304 | 0.09744 | ... | 36.04 | 49.54 | 251.2 | 4254.0 | 0.2226 | 1.058 | 1.252 | 0.291 | 0.6638 | 0.2075 |


In [4]:
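# Visualization

fig = px.histogram(df,
                   x = 'worst concavity',
                   y = 'worst fractal dimension',
                   color_discrete_sequence=px.colors.sequential.Bluered)
# With both x and y given, px.histogram sums y within each x bin,
# so the labels below describe what is actually plotted.
fig.update_layout(title = "Worst Fractal Dimension Summed by Worst Concavity Bin",
                  xaxis_title = "Worst Concavity",
                  yaxis_title = "Sum of Worst Fractal Dimension")
fig.show()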

### Making the model from here on out.

In [5]:
# Feature/ Label Selection

# Features: the 30 numeric measurement columns.
X = df.iloc[:, 0:30]
# Target: GaussianNB needs discrete class labels. Using the continuous column
# 'worst fractal dimension' raises "ValueError: Unknown label type", so the
# dataset's diagnosis labels (0 = malignant, 1 = benign) are used instead.
# This assumes df holds the load_breast_cancer rows in their original order.
y = load_breast_cancer().target

In [6]:
# Splitting the data and making the model

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

bayes = GaussianNB()
bayes.fit(X_train, y_train)
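
After fitting, the model can be checked on the held-out test set with the `confusion_matrix` and `classification_report` functions imported at the top. The sketch below is self-contained and rests on one assumption: that the target is the dataset's diagnosis label (`load_breast_cancer().target`), since `GaussianNB` needs discrete class labels rather than a continuous column. The names `y_pred` and `cm` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target  # assumption: discrete diagnosis labels (0 = malignant, 1 = benign)

# Same split settings as in the notebook.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=20)

bayes = GaussianNB()
bayes.fit(X_train, y_train)

# Predict the class of each held-out sample.
y_pred = bayes.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

The diagonal of the confusion matrix counts correct predictions, and the report adds per-class precision, recall, and F1 scores.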