# QUIZ: NAIVE BAYES
---

## Q1. Which type of Naive Bayes is best suited for text classification with word count features? 
1. Gaussian Naive Bayes 
2. Bernoulli Naive Bayes 
3. Multinomial Naive Bayes 
4. Complement Naive Bayes

The best-suited type of Naive Bayes for text classification with **word count features** is:

**3. Multinomial Naive Bayes**

---

### Explanation:

* **Multinomial Naive Bayes** models discrete features such as word counts or frequencies, which makes it ideal for bag-of-words text classification; fractional TF-IDF features also tend to work well in practice.
* **Bernoulli Naive Bayes** is suitable for binary/boolean features (word presence/absence).
* **Gaussian Naive Bayes** assumes continuous features with a normal distribution, so it is less suitable for text data.
* **Complement Naive Bayes** is a variant designed to handle imbalanced datasets, but it is still based on the multinomial distribution.
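A minimal sketch of Multinomial NB on a word-count matrix, assuming scikit-learn and NumPy are installed. The three-word vocabulary, the counts and the two topic labels are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: columns are counts of "goal", "vote", "match"
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 3, 2],
])
y = np.array(["sports", "sports", "politics", "politics"])

clf = MultinomialNB()  # alpha=1.0 (Laplace smoothing) by default
clf.fit(X, y)

# A new document that mentions "goal" twice and "match" once
print(clf.predict(np.array([[2, 0, 1]])))  # ['sports']
```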


## Q2. What is the main advantage of using scikit-learn's implementation of Naive Bayes over a custom implementation? 
1. It's always more accurate 
2. It's optimized for performance 
3. It can handle any type of data 
4. It doesn't require data preprocessing

The main advantage of using scikit-learn’s implementation of Naive Bayes over a custom implementation is:

**2. It’s optimized for performance**

---

### Explanation:

* Scikit-learn’s Naive Bayes implementations are highly optimized, efficient, and tested for performance and scalability.
* It’s not necessarily *always more accurate* (option 1), since accuracy depends on the data and model tuning.
* It **cannot handle any type of data** automatically (option 3): each variant assumes a particular feature distribution, and data preprocessing is often still required.
* It **does require data preprocessing** like feature extraction and cleaning (option 4 is incorrect).


## Q3. What does the 'np.unique' function do in the context of Naive Bayes implementation? 
1. It normalizes the feature values 
2. It calculates the mean of features 
3. It finds the unique classes or feature values 
4. It splits the dataset

The correct answer is:

**3. It finds the unique classes or feature values**

---

### Explanation:

* `np.unique` returns the sorted unique elements of an array.
* In Naive Bayes implementation, it's often used to find the unique class labels or unique feature values.
* It does **not** normalize features (option 1), calculate means (option 2), or split datasets (option 4).
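A short illustration of both uses, assuming NumPy; the labels and feature values are made up.

```python
import numpy as np

y = np.array(["spam", "ham", "spam", "ham", "ham"])

# Sorted unique class labels
classes = np.unique(y)
print(classes)  # ['ham' 'spam']

# return_counts=True also reports how often each label occurs
classes, counts = np.unique(y, return_counts=True)
print(counts)  # [3 2]

# Unique values of a single (categorical) feature column
X = np.array([[0, 2], [1, 2], [0, 1]])
print(np.unique(X[:, 1]))  # [1 2]
```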



## Q4. What is the purpose of the '_predict_sample' method in the custom Naive Bayes implementation? 
1. To train the model on a single sample 
2. To calculate the probability for a single instance 
3. To evaluate the model's performance 
4. To preprocess a single data point

The correct answer is:

**2. To calculate the probability for a single instance**

---

### Explanation:

* In a custom Naive Bayes implementation, the `_predict_sample` method typically computes the predicted class probabilities (or the most likely class) for **one input sample** using the learned probabilities from training.
* It is **not** for training (option 1), evaluating performance (option 3), or preprocessing (option 4).
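The custom implementation itself is not shown here, so the following is a hypothetical stand-alone sketch of what such a per-sample method might do. The function name `predict_sample`, the `priors` dict and the `feature_probs[c][feature, value]` layout are assumptions chosen for illustration.

```python
import numpy as np

def predict_sample(x, classes, priors, feature_probs):
    """Return the most probable class for one sample x (hypothetical helper)."""
    best_class, best_score = None, -np.inf
    for c in classes:
        # Start from the log prior, then add one log-likelihood per feature
        score = np.log(priors[c])
        for feature, value in enumerate(x):
            score += np.log(feature_probs[c][feature, value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

classes = [0, 1]
priors = {0: 0.5, 1: 0.5}
# feature_probs[c][feature, value] = P(feature = value | class = c)
feature_probs = {
    0: np.array([[0.8, 0.2], [0.7, 0.3]]),
    1: np.array([[0.1, 0.9], [0.4, 0.6]]),
}

print(predict_sample([1, 1], classes, priors, feature_probs))  # 1
print(predict_sample([0, 0], classes, priors, feature_probs))  # 0
```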



## Q5. What does the 'ComplementNB' class in scikit-learn implement? 
1. A variant of Gaussian Naive Bayes 
2. A variant of Multinomial Naive Bayes for imbalanced datasets 
3. A combination of Multinomial and Bernoulli Naive Bayes 
4. A Naive Bayes classifier for continuous features


The correct answer is:

**2. A variant of Multinomial Naive Bayes for imbalanced datasets**

---

### Explanation:

* The `ComplementNB` class in scikit-learn is designed to improve the performance of Multinomial Naive Bayes, especially on **imbalanced datasets**.
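* It estimates each class's feature weights from the **complement** of that class (all training samples *not* in the class), which tends to give more stable estimates when some classes have few samples.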
* It is **not** a variant of Gaussian Naive Bayes (option 1), nor a combination of Multinomial and Bernoulli (option 3), nor meant for continuous features (option 4).
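A minimal usage sketch, assuming scikit-learn and NumPy; the imbalanced count data are made up. A toy this small only shows the API and does not by itself demonstrate the accuracy benefit on imbalanced data.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Imbalanced toy count data: 6 samples of class 0, only 2 of class 1
X = np.array([[3, 1, 0]] * 6 + [[0, 1, 3]] * 2)
y = np.array([0] * 6 + [1] * 2)

clf = ComplementNB()  # alpha=1.0 by default
clf.fit(X, y)

# Weights come from each class's complement, e.g. class 1's from class 0's counts
print(clf.predict(np.array([[0, 0, 5], [5, 0, 0]])))  # [1 0]
```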



## Q6. In the context of Naive Bayes, what is the 'prior probability'? 
1. The probability of a feature given a class 
2. The probability of a class before seeing any features 
3. The final probability after classification 
4. The probability of a feature occurring in the dataset

The correct answer is:

**2. The probability of a class before seeing any features**

---

### Explanation:

* The **prior probability** in Naive Bayes refers to the initial probability of each class based on the training data, *before* considering any feature values.
* Option 1 describes the likelihood $P(\text{feature}|\text{class})$.
* Option 3 is the posterior probability after applying Bayes' theorem.
* Option 4 describes the marginal probability of a feature across the whole dataset (related to the evidence term), not the prior probability of a class.
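A small sketch of estimating priors as class frequencies, assuming NumPy and scikit-learn; the labels and counts are made up. `MultinomialNB` stores the priors it estimated (with the default `fit_prior=True`) as logs in `class_log_prior_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

y = np.array([0, 0, 0, 1])
X = np.array([[2, 0], [1, 1], [3, 0], [0, 2]])  # hypothetical counts

# Prior P(class) = fraction of training samples in that class
priors = np.bincount(y) / len(y)
print(priors)  # [0.75 0.25]

clf = MultinomialNB().fit(X, y)
# The priors scikit-learn estimated, converted back from log space
print(np.exp(clf.class_log_prior_))
```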



## Q7. Which Naive Bayes variant would be most appropriate for classifying emails based on the frequency of certain words? 
1. Gaussian Naive Bayes 
2. Bernoulli Naive Bayes 
3. Multinomial Naive Bayes 
4. Complement Naive Bayes

The best choice is:

**3. Multinomial Naive Bayes**

---

### Why?

* For classifying emails based on **word frequency counts**, Multinomial Naive Bayes is ideal because it models discrete count data.
* **Bernoulli Naive Bayes** works on binary features (presence/absence of words), not frequencies.
* **Gaussian Naive Bayes** assumes continuous, normally distributed features, which doesn't fit word counts.
* **Complement Naive Bayes** is a variant of Multinomial NB that is especially useful for imbalanced data, but Multinomial NB is the standard first choice.


## Q8. In Multinomial Naive Bayes, what do the features typically represent? 
1. Continuous measurements 
2. Binary indicators 
3. Word frequencies 
4. Image pixels

The correct answer is:

**3. Word frequencies**

---

### Explanation:

* In **Multinomial Naive Bayes**, features usually represent **counts or frequencies of words** (or tokens) in text classification.
* Continuous measurements (option 1) are more suited for Gaussian NB.
* Binary indicators (option 2) are typical for Bernoulli NB.
* Image pixels (option 4) are not typical features for Multinomial NB.



## Q9. What assumption does Gaussian Naive Bayes make about the features? 
1. They follow a Bernoulli distribution 
2. They follow a Gaussian distribution 
3. They are binary 
4. They are discrete counts

The correct answer is:

**2. They follow a Gaussian distribution**

---

### Explanation:

* Gaussian Naive Bayes assumes that the features are **continuous and normally (Gaussian) distributed** within each class.
* Bernoulli distribution (option 1) applies to Bernoulli NB.
* Binary features (option 3) are for Bernoulli NB.
* Discrete counts (option 4) are modeled by Multinomial NB.


## Q10. Which Naive Bayes variant is designed for binary/Boolean features? 
1. Multinomial Naive Bayes 
2. Gaussian Naive Bayes 
3. Bernoulli Naive Bayes 
4. Complement Naive Bayes

The correct answer is:

**3. Bernoulli Naive Bayes**

---

### Explanation:

* Bernoulli Naive Bayes is specifically designed for **binary/Boolean features** (e.g., word presence or absence).
* Multinomial NB (option 1) is for count/frequency features.
* Gaussian NB (option 2) is for continuous features.
* Complement NB (option 4) is a variant of Multinomial NB, mainly for imbalanced data.
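A minimal sketch with presence/absence features, assuming NumPy and scikit-learn; the three "words" and labels are made up. Unlike Multinomial NB, BernoulliNB also scores the *absence* of a word, using $1 - P(\text{word present} \mid \text{class})$.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical presence (1) / absence (0) of three words per document
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
])
y = np.array([1, 1, 0, 0])

clf = BernoulliNB()  # binarize=0.0: any value > 0 is treated as "present"
clf.fit(X, y)
print(clf.predict(np.array([[1, 0, 0]])))  # [1]
```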


## Q11. What is the primary purpose of Complement Naive Bayes? 
1. To handle continuous features 
2. To work with imbalanced datasets 
3. To process binary features 
4. To improve accuracy on balanced datasets

The correct answer is:

**2. To work with imbalanced datasets**

---

### Explanation:

* Complement Naive Bayes was developed to improve performance specifically on **imbalanced datasets** by adjusting how feature weights are calculated using the complement of each class.
* It is **not** designed for continuous features (option 1), binary features (option 3), or balanced datasets (option 4).


## Q12. In scikit-learn, which class is used to implement Multinomial Naive Bayes? 
1. GaussianNB 
2. BernoulliNB 
3. MultinomialNB 
4. ComplementNB

The correct answer is:

**3. MultinomialNB**

---

### Explanation:

* In scikit-learn, the **MultinomialNB** class implements the Multinomial Naive Bayes algorithm.
* **GaussianNB** is for Gaussian Naive Bayes.
* **BernoulliNB** is for Bernoulli Naive Bayes.
* **ComplementNB** is for Complement Naive Bayes.


## Q13. What preprocessing step is typically required before using Multinomial Naive Bayes for text classification? 
1. Normalization 
2. Vectorization 
3. Standardization 
4. Binarization

The correct answer is:

**2. Vectorization**

---

### Explanation:

* Before using Multinomial Naive Bayes for text classification, you typically need to **convert text data into numerical features** using a **vectorization** method such as scikit-learn's **CountVectorizer** or **TfidfVectorizer**.
* Normalization (option 1) and standardization (option 3) are more common for continuous features, not counts.
* Binarization (option 4) is more relevant for Bernoulli Naive Bayes.
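A minimal end-to-end sketch, assuming scikit-learn: a pipeline vectorizes raw text with `CountVectorizer` before `MultinomialNB` sees it. The tiny sentiment corpus is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "great movie loved it",
    "loved the acting great fun",
    "terrible plot boring",
    "boring and terrible acting",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Vectorization step (text -> token counts) followed by the classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["loved it great", "boring terrible plot"]))  # ['pos' 'neg']
```

Wrapping both steps in one pipeline makes sure the vocabulary learned during `fit` is reused unchanged at prediction time.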


## Q14. Which Naive Bayes variant would you use for classifying data points based on their coordinates? 
1. Multinomial Naive Bayes 
2. Gaussian Naive Bayes 
3. Bernoulli Naive Bayes 
4. Complement Naive Bayes

The correct answer is:

**2. Gaussian Naive Bayes**

---

### Explanation:

* When classifying data points based on **continuous features** like coordinates (e.g., x and y values), **Gaussian Naive Bayes** is appropriate because it assumes features follow a Gaussian (normal) distribution.
* Multinomial and Complement Naive Bayes are for discrete/count data.
* Bernoulli Naive Bayes is for binary features.
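A minimal sketch on 2-D coordinates, assuming NumPy and scikit-learn; the two point clusters are made up.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two hypothetical clusters of (x, y) coordinates
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [5, 5], [5, 6], [6, 5], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = GaussianNB().fit(X, y)
print(clf.predict(np.array([[0.5, 0.5], [5.5, 5.5]])))  # [0 1]
```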

## Q15. What is Laplace smoothing used for in Naive Bayes? 
1. To normalize features 
2. To handle missing data 
3. To prevent zero probabilities 
4. To improve computational efficiency

The correct answer is:

**3. To prevent zero probabilities**

---

### Explanation:

* **Laplace smoothing** (also called add-one smoothing) is used in Naive Bayes to avoid zero probabilities for features that don’t appear in the training data for a given class.
* It adds 1 to every count (more generally a constant $\alpha$, exposed as the `alpha` parameter of scikit-learn's discrete Naive Bayes classes) so that every feature has a non-zero probability.
* It does **not** normalize features (option 1), handle missing data (option 2), or directly improve computational efficiency (option 4).
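A small numeric sketch, assuming NumPy and scikit-learn; the counts are made up. With $\alpha = 1$ and a vocabulary of size $V$, the smoothed estimate is $(\text{count} + 1) / (\text{total} + V)$.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Word counts for one class over a 3-word vocabulary; the second word never appears
counts = np.array([3, 0, 1])
alpha = 1.0  # Laplace (add-one) smoothing

unsmoothed = counts / counts.sum()
smoothed = (counts + alpha) / (counts.sum() + alpha * len(counts))
print(unsmoothed)  # 0.75, 0.0, 0.25: a zero probability for the second word
print(smoothed)    # 4/7, 1/7, 2/7: every word now has non-zero probability

# MultinomialNB applies the same formula through its alpha parameter
X = np.array([[3, 0, 1], [0, 2, 2]])
clf = MultinomialNB(alpha=1.0).fit(X, [0, 1])
print(np.exp(clf.feature_log_prob_[0]))
```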



## Q16. Which Naive Bayes variant is most suitable for sentiment analysis of short text messages? 
1. Gaussian Naive Bayes 
2. Bernoulli Naive Bayes 
3. Multinomial Naive Bayes 
4. Complement Naive Bayes

The best choice is:

**3. Multinomial Naive Bayes**

---

### Explanation:

* For **sentiment analysis of short text messages**, which usually involves word counts or frequencies, Multinomial Naive Bayes is the most suitable.
* Bernoulli NB (option 2) can be used but works better with binary presence/absence features.
* Gaussian NB (option 1) is for continuous data, so less appropriate here.
* Complement NB (option 4) is useful for imbalanced datasets but is still based on the multinomial model.


## Q17. What does the CountVectorizer in scikit-learn do? 
1. Normalizes the input data 
2. Converts text to a matrix of token counts 
3. Applies Gaussian distribution to features 
4. Binarizes the input features

The correct answer is:

**2. Converts text to a matrix of token counts**

---

### Explanation:

* **CountVectorizer** in scikit-learn converts a collection of text documents into a matrix of **token counts** (i.e., how many times each word appears).
* It does **not** normalize data (option 1), apply Gaussian distribution (option 3), or binarize features (option 4).


## Q18. In the custom Naive Bayes implementation, what does the line 'self.feature_probs[c][feature, value] = (np.sum(X_c[:,feature] == value) + 1)/feature_sum' calculate? 
1. Prior probability 
2. Likelihood 
3. Posterior probability 
4. Evidence

The correct answer is:

**2. Likelihood**

---

### Explanation:

* This line calculates the **likelihood** $P(\text{feature} = \text{value} | \text{class} = c)$, i.e., the probability of a specific feature value given a class.
* The `+1` is Laplace smoothing to avoid zero probabilities.
* Prior probability (option 1) refers to $P(\text{class} = c)$.
* Posterior probability (option 3) is computed after combining prior and likelihood.
* Evidence (option 4) is the overall probability of the data, used for normalization.
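A sketch of the same computation outside a class, assuming NumPy and made-up categorical data. The quoted line does not show how `feature_sum` is defined; here it is *assumed* to be the number of class-$c$ samples plus the number of distinct values of the feature, which makes the smoothed probabilities sum to 1.

```python
import numpy as np

# Hypothetical categorical data: column 0 takes the values 0, 1 or 2
X = np.array([[0, 1], [0, 0], [1, 1], [0, 1], [2, 0], [2, 1]])
y = np.array([1, 1, 1, 1, 0, 0])

c, feature = 1, 0
X_c = X[y == c]                       # samples belonging to class c
values = np.unique(X[:, feature])     # all values this feature takes
# Assumption: feature_sum = class-c sample count + number of distinct values
feature_sum = len(X_c) + len(values)

# Likelihood P(feature = value | class = c) with the +1 Laplace term
probs = [(np.sum(X_c[:, feature] == value) + 1) / feature_sum for value in values]
print(probs)  # 4/7, 2/7, 1/7
```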


## Q19. What is the purpose of using 'np.log' in the custom Naive Bayes implementation? 
1. To normalize probabilities 
2. To prevent underflow 
3. To speed up calculations 
4. To handle negative values 

The correct answer is:

**2. To prevent underflow**

---

### Explanation:

* Taking the **logarithm (np.log)** of probabilities helps prevent numerical **underflow**, which occurs when multiplying many small probabilities together results in values too close to zero for the computer to represent.
* Logarithms convert multiplication into addition, which is numerically more stable.
* It doesn’t normalize probabilities (option 1), directly speed up calculations (option 3), or handle negative values (option 4).
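A small demonstration of the underflow, assuming NumPy; the 100 likelihoods of `1e-5` are made up.

```python
import numpy as np

# 100 hypothetical per-feature likelihoods, each 1e-5
likelihoods = np.full(100, 1e-5)

product = np.prod(likelihoods)           # true value 1e-500 is below float64's range
log_score = np.sum(np.log(likelihoods))  # about -1151.3, easily representable

print(product)  # 0.0 (underflow)
print(log_score)
```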


## Q20. Which Naive Bayes variant would be most appropriate for spam detection based on the presence or absence of certain words? 
1. Gaussian Naive Bayes 
2. Bernoulli Naive Bayes 
3. Multinomial Naive Bayes 
4. Complement Naive Bayes

The best choice is:

**2. Bernoulli Naive Bayes**

---

### Explanation:

* Bernoulli Naive Bayes works well when features are **binary indicators** — i.e., presence or absence of words — which fits spam detection based on whether certain words appear or not.
* Multinomial NB (option 3) models counts/frequencies, not just presence.
* Gaussian NB (option 1) is for continuous features.
* Complement NB (option 4) is a variant of Multinomial NB, mainly for imbalanced data.


## Q21. What does the 'fit' method do in scikit-learn's Naive Bayes implementations? 
1. Makes predictions 
2. Evaluates the model 
3. Trains the model 
4. Preprocesses the data

The correct answer is:

**3. Trains the model**

---

### Explanation:

* The `fit` method in scikit-learn is used to **train the model** on the given data by learning the parameters (like probabilities in Naive Bayes).
* It does **not** make predictions (option 1), evaluate the model (option 2), or preprocess data (option 4).
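A minimal sketch of `fit` as the training step, assuming NumPy and scikit-learn; the data are made up. Calling `predict` on an unfitted estimator raises `NotFittedError`, and the learned attributes only exist after `fit`.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([0, 0, 1, 1])

clf = GaussianNB()
error_raised = False
try:
    clf.predict(X)  # no parameters learned yet
except NotFittedError:
    error_raised = True
print(error_raised)  # True

clf.fit(X, y)                # learns priors, per-class means and variances
print(clf.class_count_)      # samples seen per class: 2 and 2
print(clf.predict([[1.5]]))  # [0]
```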


## Q22. In the context of Naive Bayes, what does the term 'naive' refer to? 
1. The simplicity of the algorithm 
2. The assumption of feature independence 
3. The use of prior probabilities 
4. The speed of the algorithm

The correct answer is:

**2. The assumption of feature independence**

---

### Explanation:

* The term **"naive"** in Naive Bayes refers to the **strong assumption that all features are independent of each other given the class label**.
* This assumption is often not true in real-world data, but it simplifies computation and often works well in practice.
* It’s not just about simplicity (option 1), use of priors (option 3), or speed (option 4).
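Written out with Bayes' theorem, the assumption lets the joint likelihood of features $x_1, \dots, x_n$ factor into one term per feature. The denominator is the same for every class $c$, so it can be dropped when picking the most probable class:

$$
P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$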


## Q23. Which of the following is NOT a common application of Naive Bayes? 
1. Text classification 
2. Spam filtering 
3. Sentiment analysis 
4. Image segmentation

The correct answer is:

**4. Image segmentation**

---

### Explanation:

* Naive Bayes is commonly used for text classification, spam filtering, and sentiment analysis — all tasks involving categorical or text data.
* Image segmentation (option 4) usually requires more complex models like convolutional neural networks (CNNs), not Naive Bayes.


## Q24. What is the main advantage of Naive Bayes classifiers? 
1. They always provide the highest accuracy 
2. They can handle non-linear relationships 
3. They are simple and fast 
4. They don't require feature scaling

The correct answer is:

**3. They are simple and fast**

---

### Explanation:

* Naive Bayes classifiers are **simple to implement and computationally efficient**, which makes them fast, especially on large datasets.
* They do **not** always provide the highest accuracy (option 1).
* Because they assume features are conditionally independent given the class, they cannot model interactions between features, so they generally handle complex non-linear relationships poorly (option 2).
* While they often don’t require feature scaling (option 4), the main advantage is simplicity and speed.


## Q25. What is the purpose of the 'predict' method in Naive Bayes implementations? 
1. To train the model 
2. To calculate probabilities 
3. To evaluate the model 
4. To classify new instances

The correct answer is:

**4. To classify new instances**

---

### Explanation:

* The `predict` method in Naive Bayes is used to **classify new, unseen data points** based on the learned model.
* It does **not** train the model (option 1) or evaluate it (option 3). It uses probabilities internally but returns only the predicted class labels (option 2); in scikit-learn, `predict_proba` is the method that returns the class probabilities.
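A short sketch contrasting `predict` and `predict_proba`, assuming NumPy and scikit-learn; the data and labels are made up. `predict` returns the class whose probability column is largest.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array(["low", "low", "high", "high"])
clf = GaussianNB().fit(X, y)

new_points = np.array([[1.5], [8.5]])
proba = clf.predict_proba(new_points)  # one column per class in clf.classes_
labels = clf.predict(new_points)       # the most probable class for each row

print(clf.classes_)  # ['high' 'low']
print(labels)        # ['low' 'high']
```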


## Q26. What does the 'random_state' parameter in 'train_test_split' control? 
1. The randomness of the Naive Bayes algorithm 
2. The seed for random number generation in splitting the data 
3. The number of features to consider 
4. The balance of classes in the split

The correct answer is:

**2. The seed for random number generation in splitting the data**

---

### Explanation:

* The `random_state` parameter in `train_test_split` sets the **seed for the random number generator** to ensure reproducible splits of data into training and testing sets.
* It does **not** control the randomness of the Naive Bayes algorithm (option 1) or the number of features (option 3). Class balance across the split (option 4) is controlled by the separate `stratify` parameter.
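A small sketch, assuming NumPy and scikit-learn; the data are made up. The same `random_state` reproduces the same split, while `stratify` is what keeps the class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# Same random_state -> identical split on every call
split_a = train_test_split(X, y, test_size=0.4, random_state=42)
split_b = train_test_split(X, y, test_size=0.4, random_state=42)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))  # True

# Class balance in the split comes from stratify, not random_state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)
print(np.bincount(y_test))  # [2 2]
```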


## Q27. What is the main difference between Multinomial and Bernoulli Naive Bayes? 
1. Multinomial is faster 
2. Bernoulli handles continuous features 
3. Multinomial uses word frequencies, Bernoulli uses word presence 
4. Bernoulli is more accurate for text classification

The correct answer is:

**3. Multinomial uses word frequencies, Bernoulli uses word presence**

---

### Explanation:

* **Multinomial Naive Bayes** models **counts or frequencies** of features (e.g., how many times a word appears).
* **Bernoulli Naive Bayes** models **binary features** representing **presence or absence** of a feature.
* Speed (option 1) and accuracy (option 4) depend on the dataset and are not absolute.
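* Bernoulli NB does **not** model continuous features (option 2); scikit-learn's `BernoulliNB` thresholds non-binary input into 0/1 values via its `binarize` parameter.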
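The difference shows up in the feature matrix each model expects. A minimal sketch, assuming scikit-learn; the one-document corpus is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer

doc = ["free free free money"]

counts = CountVectorizer().fit_transform(doc).toarray()
presence = CountVectorizer(binary=True).fit_transform(doc).toarray()

print(counts)    # [[3 1]]  word frequencies, as modeled by Multinomial NB
print(presence)  # [[1 1]]  word presence, as modeled by Bernoulli NB
```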


## Q28. In the custom Naive Bayes implementation, what does 'defaultdict(lambda: defaultdict(lambda: 1))' achieve? 
1. It sets all probabilities to 1 
2. It implements Laplace smoothing 
3. It creates a nested dictionary with default value 1 
4. It normalizes the probabilities 

The correct answer is:

**3. It creates a nested dictionary with default value 1**

---

### Explanation:

* `defaultdict(lambda: defaultdict(lambda: 1))` creates a **nested dictionary**: a missing outer key yields a new inner `defaultdict`, and a missing inner key yields `1`, instead of raising a `KeyError`.
* This is often used to simplify counting and **implement Laplace smoothing** by starting counts at 1 (related to option 2, but the actual smoothing happens in the calculation).
* It does **not** set all probabilities to 1 (option 1) or normalize probabilities (option 4).
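A short demonstration using only the standard library; the `counts[class][word]` layout and the "spam"/"ham" keys are hypothetical. Starting each count at 1 supplies the "+1" numerator; proper Laplace smoothing still needs the matching term in the denominator when counts are turned into probabilities.

```python
from collections import defaultdict

# Hypothetical word-count table: counts[class][word], every count starting at 1
counts = defaultdict(lambda: defaultdict(lambda: 1))

print(counts["spam"]["free"])  # 1 (missing key falls back to the default)
counts["spam"]["free"] += 1    # first real occurrence of "free" in spam
print(counts["spam"]["free"])  # 2

# A missing outer key gives a fresh inner defaultdict, not the number 1
print(type(counts["ham"]))
```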


## Q29. Which of the following is true about the 'GaussianNB' class in scikit-learn? 
1. It's suitable for text classification 
2. It assumes features follow a Gaussian distribution 
3. It's best for binary features 
4. It's designed for imbalanced datasets

The correct answer is:

**2. It assumes features follow a Gaussian distribution**

---

### Explanation:

* The **GaussianNB** class assumes that features are continuous and **normally (Gaussian) distributed** within each class.
* It is **not** ideal for text classification (option 1), which typically involves discrete features.
* It’s **not** designed for binary features (option 3) — that’s BernoulliNB.
* It is **not** specifically designed for imbalanced datasets (option 4).



## Q30. In the custom Naive Bayes implementation, why is 'np.log' used instead of direct multiplication of probabilities? 
1. To speed up calculations 
2. To normalize probabilities 
3. To avoid underflow errors 
4. To implement Laplace smoothing

The correct answer is:

**3. To avoid underflow errors**

---

### Explanation:

* Multiplying many small probabilities can result in values too close to zero for the computer to represent (underflow).
* Using `np.log` transforms multiplication into addition, making calculations numerically stable.
* It’s not primarily for speeding up calculations (option 1), normalizing probabilities (option 2), or Laplace smoothing (option 4).
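A sketch of why this matters for the final decision, assuming NumPy; the per-feature likelihoods for the two classes are made up. With direct multiplication both class scores underflow to `0.0` and tie; in log space the comparison still works, and normalized posteriors can be recovered stably by subtracting the largest log score before exponentiating (the log-sum-exp trick).

```python
import numpy as np

# Hypothetical per-feature likelihoods for two classes over 100 features
lik_a = np.full(100, 1e-5)
lik_b = np.full(100, 2e-5)
priors = np.array([0.5, 0.5])

# Direct multiplication: both scores underflow to 0.0, so the classes tie
direct = priors * np.array([np.prod(lik_a), np.prod(lik_b)])
print(direct)  # [0. 0.]

# Log space: sums stay finite and the comparison still works
log_scores = np.log(priors) + np.array([np.log(lik_a).sum(), np.log(lik_b).sum()])
print(np.argmax(log_scores))  # 1

# Normalized posteriors: subtract the max log score before exponentiating
shifted = np.exp(log_scores - log_scores.max())
posteriors = shifted / shifted.sum()
print(posteriors)
```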
