In [21]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

- **numpy**: A library used for numerical operations, particularly for handling arrays and matrices.
- **train_test_split** from **sklearn.model_selection**: This function is used to split the dataset into training and testing subsets.
- **GaussianNB** from **sklearn.naive_bayes**: This is the Naïve Bayes classifier based on the Gaussian (normal) distribution. It is commonly used for classification tasks when the features are continuous.
- **accuracy_score** from **sklearn.metrics**: This function calculates the accuracy of the model by comparing the predicted labels with the true labels.

In [22]:
np.random.seed(42)
X = np.random.rand(150, 4)  # 150 samples, 4 features
y = np.random.randint(0, 3, 150)  # 3 classes

- **np.random.seed(42)**: Sets a random seed to ensure reproducibility of results. If you run the code multiple times, the same random numbers will be generated.
- **X = np.random.rand(150, 4)**: Generates a dataset X with 150 samples, where each sample has 4 features (dimensions). The values are drawn uniformly at random from the interval [0, 1).
- **y = np.random.randint(0, 3, 150)**: Generates the target labels y for the 150 samples. Each label is an integer between 0 and 2 (the upper bound 3 is exclusive), representing 3 different classes (class 0, class 1, and class 2). Note that the labels are drawn independently of X, so this synthetic dataset contains no real relationship between features and labels.
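
As a quick sanity check on the points above, the following sketch regenerates the data with the same seed and inspects its shape, value range, and class counts. The names `X_again`, `y_again`, and `class_counts` are introduced here for illustration only.

```python
import numpy as np

# Generate the synthetic data exactly as in the cell above
np.random.seed(42)
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)

# Re-seed and regenerate: the arrays should be identical
np.random.seed(42)
X_again = np.random.rand(150, 4)
y_again = np.random.randint(0, 3, 150)
print("Same X:", np.array_equal(X, X_again))
print("Same y:", np.array_equal(y, y_again))

# Shapes and value range of the features
print("Shapes:", X.shape, y.shape)
print("Features in [0, 1):", bool(X.min() >= 0.0 and X.max() < 1.0))

# Number of samples per class (index = class label)
class_counts = np.bincount(y, minlength=3)
print("Samples per class:", class_counts)
```

The class counts will not be exactly equal, because the labels are sampled at random rather than assigned in equal blocks.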

In [23]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

- **train_test_split(X, y, test_size=0.2, random_state=42)**:
  - **X**: The feature matrix (input data).
  - **y**: The target labels (output data).
  - **test_size=0.2**: Specifies that 20% of the data will be used for testing, and the remaining 80% will be used for training.
  - **random_state=42**: Ensures reproducibility of the split. The same split will occur every time you run the code with this seed.
  
  This function returns four arrays:
  - **X_train**: The feature matrix for the training set.
  - **X_test**: The feature matrix for the testing set.
  - **y_train**: The target labels for the training set.
  - **y_test**: The target labels for the testing set.
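
To make the 80/20 split concrete, this minimal sketch prints the shapes of the four arrays and checks that repeating the call with the same `random_state` gives the same test set. The names `X_test_again` and `y_test_again` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same synthetic data as in the notebook
np.random.seed(42)
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 20% of 150 samples is 30 for testing, leaving 120 for training
print("Train:", X_train.shape, y_train.shape)
print("Test: ", X_test.shape, y_test.shape)

# Repeating the call with the same random_state reproduces the split
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Same test set:", np.array_equal(X_test, X_test_again))
```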

In [24]:
model = GaussianNB()
model.fit(X_train, y_train)

- **GaussianNB()**: Initializes a Gaussian Naïve Bayes classifier. This classifier assumes that, within each class, every feature follows a Gaussian (normal) distribution and that the features are conditionally independent given the class. It uses Bayes' theorem to calculate the probability of each class given the input features.
- **model.fit(X_train, y_train)**: Trains the Naïve Bayes model using the training data (X_train and y_train). For each class, the model estimates the class prior from the class frequencies and the mean and variance of every feature.
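
To see what fitting actually stores, the sketch below compares two fitted attributes of the scikit-learn estimator, `theta_` (per-class feature means) and `class_prior_` (class probabilities), against values computed directly with NumPy. The names `manual_means` and `manual_priors` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data and split as in the notebook
np.random.seed(42)
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GaussianNB()
model.fit(X_train, y_train)

# Per-class mean of each feature, computed by hand
manual_means = np.array(
    [X_train[y_train == c].mean(axis=0) for c in model.classes_]
)
# Class priors as relative class frequencies in the training set
manual_priors = np.array([(y_train == c).mean() for c in model.classes_])

print("Classes:", model.classes_)
print("Means match: ", np.allclose(model.theta_, manual_means))
print("Priors match:", np.allclose(model.class_prior_, manual_priors))
```

With no `priors` argument passed to `GaussianNB`, the priors are taken from the training labels, which is why the hand-computed frequencies agree.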

In [25]:
y_pred = model.predict(X_test)

In [26]:
y_pred

array([0, 2, 2, 2, 0, 1, 0, 1, 0, 1, 2, 1, 1, 1, 0, 1, 1, 1, 0, 2, 1, 0,
       0, 0, 0, 2, 0, 1, 1, 0])

- **model.predict(X_test)**: Uses the trained model to predict the class labels for the test set (X_test). The result is stored in **y_pred**, which contains the predicted class labels for each sample in the test set.

In [27]:
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')

Accuracy: 36.67%


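- **accuracy_score(y_test, y_pred)**: Compares the true labels (**y_test**) with the predicted labels (**y_pred**) and returns the accuracy as a fraction between 0 and 1: the number of correctly predicted labels divided by the total number of predictions. Because the labels in this example were generated at random, independently of the features, a score near 1/3 (chance level for three roughly balanced classes) is expected; 36.67% corresponds to 11 correct predictions out of 30.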
- **print(f'Accuracy: {accuracy * 100:.2f}%')**: Prints the accuracy of the model as a percentage, formatted to two decimal places.
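
The accuracy calculation can be reproduced by hand, as in this minimal sketch. It compares `accuracy_score` with the mean of an element-wise comparison and uses `normalize=False` to get the raw count of correct predictions. The names `manual_accuracy` and `n_correct` are introduced here for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Same data, split, and model as in the notebook
np.random.seed(42)
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Library value: fraction of correct predictions
accuracy = accuracy_score(y_test, y_pred)

# Same quantity by hand: mean of the element-wise matches
manual_accuracy = np.mean(y_test == y_pred)

# normalize=False returns the number of correct predictions instead
n_correct = accuracy_score(y_test, y_pred, normalize=False)

print(f"accuracy_score:  {accuracy:.4f}")
print(f"manual accuracy: {manual_accuracy:.4f}")
print(f"correct: {n_correct} of {len(y_test)}")
```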

In [28]:
# Example of an unknown data point (4 features)
unknown_sample = np.array([[0.5, 0.3, 0.7, 0.2]])  # Shape: (1, 4)

In [29]:
# Predict the class label for the unknown sample
predicted_class = model.predict(unknown_sample)

# Print the predicted class
print(f'Predicted Class: {predicted_class[0]}')

Predicted Class: 1


In [30]:
# Get the probabilities for each class
class_probabilities = model.predict_proba(unknown_sample)

# Print the probabilities
print(f'Class Probabilities: {class_probabilities}')

Class Probabilities: [[0.2752146  0.42874978 0.29603562]]
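
- **model.predict(unknown_sample)**: Returns an array with one predicted label per row, so `predicted_class[0]` is the label for the single sample.
- **model.predict_proba(unknown_sample)**: Returns one row of posterior probabilities per sample, one column per class in the order of `model.classes_`. The row sums to 1, and the predicted class is the one with the highest probability.

The sketch below recomputes these probabilities from scratch with Bayes' theorem, as a rough illustration of what happens inside the classifier. The helper `manual_gnb_proba` is hypothetical and written for this example; it ignores the tiny variance smoothing term (`var_smoothing`) that scikit-learn adds, so agreement is checked only up to a small tolerance.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and model as in the notebook
np.random.seed(42)
X = np.random.rand(150, 4)
y = np.random.randint(0, 3, 150)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = GaussianNB().fit(X_train, y_train)
unknown_sample = np.array([[0.5, 0.3, 0.7, 0.2]])


def manual_gnb_proba(X_train, y_train, x):
    """Posterior class probabilities for one sample x (hypothetical helper)."""
    classes = np.unique(y_train)
    log_post = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)   # P(class)
        mu = Xc.mean(axis=0)             # per-feature mean
        var = Xc.var(axis=0)             # per-feature variance (ddof=0)
        # Sum of per-feature Gaussian log-densities (independence assumption)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        log_post.append(np.log(prior) + log_lik)
    log_post = np.array(log_post)
    log_post -= log_post.max()           # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()             # normalise so probabilities sum to 1


manual_proba = manual_gnb_proba(X_train, y_train, unknown_sample[0])
sklearn_proba = model.predict_proba(unknown_sample)[0]

print("Manual: ", manual_proba)
print("sklearn:", sklearn_proba)
print("Close:", np.allclose(manual_proba, sklearn_proba, atol=1e-6))

# The predicted label is the class with the highest posterior probability
print("Argmax class:", model.classes_[np.argmax(sklearn_proba)])
print("predict():   ", model.predict(unknown_sample)[0])
```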
