Question 1: What is Information Gain, and how is it used in Decision Trees?

Answer:


Information Gain is a metric used in Decision Trees to decide which feature should be used to split the data at each node.
It measures how much uncertainty (entropy) a split removes: the entropy of the parent node minus the weighted average entropy of the child nodes, where each child is weighted by its fraction of the samples.

The feature with the highest Information Gain is selected for the split because it produces the purest child nodes, that is, it removes the most uncertainty about the class label.
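The definition can be checked on a small example. This is a minimal sketch, assuming Shannon entropy in bits (base-2 logarithm) and a hypothetical node with 5 samples of each class. The helper names `entropy` and `information_gain` are illustrative and do not come from any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # sum of p * log2(1/p) equals -sum of p * log2(p)
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node: 5 samples of class 0 and 5 samples of class 1
parent = np.array([0] * 5 + [1] * 5)

# Split A separates the classes perfectly; split B leaves both children mixed
split_a = [np.array([0] * 5), np.array([1] * 5)]
split_b = [np.array([0, 0, 0, 1, 1]), np.array([0, 0, 1, 1, 1])]

ig_a = information_gain(parent, split_a)
ig_b = information_gain(parent, split_b)
print("Split A:", ig_a)  # 1.0
print("Split B:", ig_b)  # roughly 0.029
```

A tree using Information Gain would choose split A here, because it removes all of the parent's uncertainty.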
____

Question 2: What is the difference between Gini Impurity and Entropy?
Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

Answer

Gini Impurity

Gini Impurity is a metric used in Decision Trees to measure how mixed the classes are within a node. It is the probability that a randomly selected data point would be misclassified if it were labeled at random according to the class distribution of that node, and it is computed as Gini = 1 − Σ p_k², where p_k is the proportion of class k in the node. Because it needs only squares and no logarithms, Gini Impurity is slightly cheaper to compute, which is convenient for large datasets and practical machine learning applications. It is the splitting criterion of the CART algorithm and the default criterion of scikit-learn's DecisionTreeClassifier. A Gini value of 0 means the node is pure, and lower values indicate purer nodes.

Entropy

Entropy is a measure of uncertainty or randomness in the class distribution of a node and comes from information theory. It is computed as Entropy = −Σ p_k log2(p_k), which can be read as the average number of bits needed to encode the class of a randomly drawn sample. Because it involves logarithms, it is slightly slower to compute than Gini Impurity. Entropy is the basis of Information Gain, which is used in the ID3 algorithm, and C4.5 uses a normalized version of it called Gain Ratio. A lower entropy value signifies higher purity (0 for a pure node), while a higher value indicates more disorder in the dataset. In practice, Gini Impurity and Entropy usually lead to very similar trees, so the choice between them rarely has a large effect on accuracy.
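To compare the two measures side by side, the sketch below evaluates both on a few class distributions, using the standard definitions Gini = 1 − Σ p_k² and Entropy = −Σ p_k log2(p_k). The function names `gini` and `entropy` are illustrative helpers, not library functions.

```python
import numpy as np

def gini(p):
    """Gini Impurity of a class-probability vector."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Entropy in bits of a class-probability vector (0 * log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# Evenly mixed node, mostly pure node, and fully pure node
for dist in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(dist, "Gini =", round(gini(dist), 4), "Entropy =", round(entropy(dist), 4))
```

Both measures are 0 for a pure node and largest for an evenly mixed one (0.5 and 1.0 respectively for two classes), which is why they tend to rank candidate splits similarly.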
___

Question 3: What is Pre-Pruning in Decision Trees?

Answer

Pre-Pruning is a technique used in Decision Trees to prevent the model from becoming too complex and overfitting the training data. In this method, the growth of the decision tree is stopped early by setting conditions such as the maximum depth of the tree, the minimum number of samples required to split a node, or the minimum number of samples required at a leaf node (max_depth, min_samples_split and min_samples_leaf in scikit-learn). By applying these limits during training, Pre-Pruning helps the model generalize better to unseen data. It also reduces computation time and makes the model easier to interpret, although excessive pre-pruning can lead to underfitting.
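A minimal sketch of pre-pruning with scikit-learn, comparing a fully grown tree with one whose growth is limited by the conditions described above. The dataset (Iris), the 70/30 split and the specific limits (`max_depth=2`, `min_samples_split=10`, `min_samples_leaf=5`) are illustrative assumptions, not recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fully grown tree: keeps splitting until the leaves are pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early because of the limits below
pruned_tree = DecisionTreeClassifier(
    max_depth=2,           # maximum depth of the tree
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("Full", full_tree), ("Pre-pruned", pruned_tree)]:
    print(name, "depth:", tree.get_depth(),
          "leaves:", tree.get_n_leaves(),
          "test accuracy:", tree.score(X_test, y_test))
```

In practice these limits are usually tuned with cross-validation rather than fixed by hand.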
___

In [1]:
#Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity and print feature importances
# Answer
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the dataset
data = load_iris()
X = data.data
y = data.target

# Create Decision Tree model using Gini Impurity
model = DecisionTreeClassifier(criterion='gini')

# Train the model
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
print(model.feature_importances_)


Feature Importances:
[0.         0.01333333 0.06405596 0.92261071]
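The raw array above is easier to read when each value is paired with its feature name. The sketch below assumes `random_state=42` (not in the original cell); without a fixed seed, scikit-learn may break ties between equally good splits differently from run to run, so the printed importances can shift slightly.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()

# Fixing random_state makes tie-breaking between equal splits reproducible
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(data.data, data.target)

# Pair each importance with its feature name, highest first
ranked = sorted(
    zip(data.feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.4f}")
```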



_____
Question 5: What is a Support Vector Machine (SVM)?

Answer

- Support Vector Machine (SVM) is a supervised machine learning algorithm.

- It is used for classification and regression problems.

- SVM works by finding the best separating boundary, called a hyperplane.

- The hyperplane separates different classes with the maximum margin.

- Data points that lie closest to the hyperplane are called support vectors.

- With kernel functions, SVM can also learn non-linear decision boundaries, so it works for both linearly and non-linearly separable data.

- It is effective in high-dimensional spaces.
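The ideas in the list above (hyperplane, margin, support vectors) can be inspected directly on a fitted model. This sketch assumes a synthetic two-cluster dataset from `make_blobs` and a large `C` so that the linear SVM behaves close to a hard-margin classifier; both choices are for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data)
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Linear SVM; a large C approximates a hard margin
clf = SVC(kernel='linear', C=1000)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane w . x + b = 0
b = clf.intercept_[0]  # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin lines

print("Hyperplane: w =", w, ", b =", b)
print("Margin width:", margin_width)
print("Number of support vectors:", len(clf.support_vectors_))
print("Support vectors:\n", clf.support_vectors_)
```

Only the few points in `support_vectors_` determine the hyperplane; moving any other training point (without crossing the margin) would leave the model unchanged.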

____
Question 6: What is the Kernel Trick in SVM?
Answer:

- The Kernel Trick is a technique used in Support Vector Machines (SVM).

- It helps SVM handle non-linear data.

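- Conceptually, a kernel maps the data into a higher-dimensional feature space.

- In that space, data that is not linearly separable in the original space can become linearly separable.

- The mapping is never computed explicitly: the kernel function K(x, z) returns the dot product of the mapped points directly, which is all the SVM needs.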

- Common kernel functions include:

  - Linear Kernel

  - Polynomial Kernel

  - Radial Basis Function (RBF) Kernel

- The Kernel Trick lets SVM learn non-linear decision boundaries on complex datasets while still working only with the original input features.
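A small numeric check of the "implicit transformation" idea, using the degree-2 polynomial kernel K(x, z) = (x · z)² on 2-D inputs. The explicit feature map `phi` below is the standard one that matches this kernel; the input vectors are arbitrary examples.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (x1, x2) -> 3-D."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # map to 3-D first, then dot product
implicit = poly_kernel(x, z)              # same value, computed in 2-D only

print("Explicit:", explicit)
print("Implicit:", implicit)
```

Both values are 121 (up to floating-point rounding), but the kernel never builds the 3-D vectors. In scikit-learn, `SVC(kernel='poly', degree=2, gamma=1, coef0=0)` uses this same kernel, since its polynomial kernel is (gamma · x · z + coef0)^degree.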
____

In [2]:
#Question 7: Train two SVM classifiers with Linear and RBF kernels and compare their accuracies
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split the dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
linear_pred = svm_linear.predict(X_test)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
rbf_pred = svm_rbf.predict(X_test)

# Calculate accuracies
linear_accuracy = accuracy_score(y_test, linear_pred)
rbf_accuracy = accuracy_score(y_test, rbf_pred)

print("Linear Kernel Accuracy:", linear_accuracy)
print("RBF Kernel Accuracy:", rbf_accuracy)



Linear Kernel Accuracy: 1.0
RBF Kernel Accuracy: 0.8055555555555556
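The lower RBF score above is likely due, at least in part, to unscaled features: the RBF kernel depends on Euclidean distances, and in the Wine data some features (such as proline, with values in the hundreds and thousands) have far larger ranges than others (such as hue, which stays near 1). The sketch below repeats the comparison with standardized features; the use of `StandardScaler` in a pipeline is an addition, not part of the original exercise.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fit on the training split only, inside each pipeline,
# so no statistics from the test set leak into training
scaled_linear = make_pipeline(StandardScaler(), SVC(kernel='linear')).fit(X_train, y_train)
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel='rbf')).fit(X_train, y_train)

scaled_linear_accuracy = accuracy_score(y_test, scaled_linear.predict(X_test))
scaled_rbf_accuracy = accuracy_score(y_test, scaled_rbf.predict(X_test))

print("Scaled Linear Kernel Accuracy:", scaled_linear_accuracy)
print("Scaled RBF Kernel Accuracy:", scaled_rbf_accuracy)
```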


______
Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer:

- Naïve Bayes is a supervised machine learning classifier.

- It is based on Bayes’ Theorem.

- It works on probability concepts to make predictions.

- It assumes that all features are conditionally independent of each other given the class label.

- This assumption is usually not true in real-life data.

- Because of this strong assumption, it is called "Naïve".

- Naïve Bayes is fast, simple, and efficient.

- It performs well on large datasets and text classification problems.
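The "naïve" independence assumption is what makes the calculation simple: the likelihood of several features is just the product of the per-feature likelihoods. The worked example below uses made-up probabilities for a toy spam filter that looks only at the words "free" and "win"; none of the numbers come from real data.

```python
# Hypothetical probabilities for a toy spam filter (illustration only)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "win": 0.3},
    "ham": {"free": 0.1, "win": 0.05},
}
observed_words = ["free", "win"]

# Naive assumption: P(free, win | class) = P(free | class) * P(win | class)
scores = {}
for cls, prior in priors.items():
    score = prior
    for word in observed_words:
        score *= likelihoods[cls][word]
    scores[cls] = score

# Normalize so the posteriors sum to 1 (this divides by P(free, win))
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
print(posteriors)  # spam ≈ 0.952, ham ≈ 0.048
```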
_______

Question 9: Explain the differences between Gaussian, Multinomial, and Bernoulli Naïve Bayes
✅ Answer:

- Three common variants of Naïve Bayes differ in the type of feature data they model:

1. Gaussian Naïve Bayes (GNB)

- Used for continuous numeric data.

- Assumes that features follow a normal (Gaussian) distribution.

- Example: Height, weight, or other measurements.

2. Multinomial Naïve Bayes (MNB)

- Used for discrete/count data, often in text classification.

- Works with word counts or frequencies.

- Example: Spam email detection, document classification.

3. Bernoulli Naïve Bayes (BNB)

- Used for binary data (0 or 1 features).

- Suitable for presence or absence of a feature.

- Example: Whether a word appears in a document (Yes/No).

Key Differences:

- Gaussian → continuous features

- Multinomial → count-based features

- Bernoulli → binary features
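A minimal sketch of the three variants in scikit-learn, each applied to the kind of data it is designed for. The four-document corpus and the height/weight measurements are invented for illustration. `BernoulliNB(binarize=0.0)` turns every count greater than 0 into 1, so it sees only word presence or absence.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
docs = [
    "win free money now",
    "free prize win win",
    "meeting schedule for monday",
    "project report and meeting notes",
]
labels = np.array([1, 1, 0, 0])

# Word-count matrix: one column per vocabulary word
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)

# Multinomial NB models the word counts directly
mnb = MultinomialNB().fit(X_counts, labels)

# Bernoulli NB binarizes the counts and models presence/absence of each word
bnb = BernoulliNB(binarize=0.0).fit(X_counts, labels)

new_doc = vectorizer.transform(["free money prize"])
mnb_pred = mnb.predict(new_doc)[0]
bnb_pred = bnb.predict(new_doc)[0]

# Gaussian NB fits a normal distribution per feature and class
# Hypothetical continuous measurements: [height_cm, weight_kg]
X_cont = np.array([[150.0, 50.0], [155.0, 55.0], [180.0, 80.0], [185.0, 90.0]])
y_cont = np.array([0, 0, 1, 1])
gnb = GaussianNB().fit(X_cont, y_cont)
gnb_pred = gnb.predict([[182.0, 85.0]])[0]

print("Multinomial prediction:", mnb_pred)
print("Bernoulli prediction:", bnb_pred)
print("Gaussian prediction:", gnb_pred)
```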
_____

In [3]:
#Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create Gaussian Naive Bayes model
model = GaussianNB()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.9736842105263158
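A single train/test split gives one accuracy number that depends on which samples landed in the test set. As an optional extra check (not part of the original exercise), 5-fold cross-validation reports the accuracy on five different held-out folds; for classifiers, `cv=5` in scikit-learn uses stratified folds.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# 5-fold stratified cross-validation of the same Gaussian Naive Bayes model
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring='accuracy')

print("Fold accuracies:", scores)
print("Mean accuracy: %.4f (std %.4f)" % (scores.mean(), scores.std()))
```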
