
# **Question 1: What is a Support Vector Machine (SVM), and how does it work?**

# **Ans1.**
# Support Vector Machine (SVM) Overview
- Definition : A Support Vector Machine is a supervised learning algorithm used for classification and regression tasks. SVMs are particularly known for their effectiveness in high-dimensional spaces.
- Goal : In classification, SVM finds the hyperplane that best separates the classes in the feature space.

How SVM Works
1. Finding the Hyperplane : SVM aims to find the hyperplane that maximally separates the classes. In a two-dimensional space, this is a line; in higher dimensions, it's a hyperplane.
2. Support Vectors : The data points closest to the hyperplane are called support vectors. These points are crucial in defining the hyperplane.
3. Margin Maximization : SVM maximizes the margin (distance) between the hyperplane and the support vectors of each class.
4. Kernel Trick : For non-linearly separable data, SVM uses kernel functions to transform the data into a higher-dimensional space where a hyperplane can separate the classes.

Key Aspects of SVM
- Kernel Choice : Common kernels include linear, polynomial, and radial basis function (RBF). The choice of kernel determines which decision boundaries the SVM can represent.
- Regularization Parameter (C) : Controls the trade-off between a wide margin and few training errors. A large \(C\) penalizes misclassified training points heavily and can overfit; a small \(C\) tolerates more training errors in exchange for a wider margin that often generalizes better.

Example
In a binary classification problem with two features, SVM would find a line (or hyperplane in higher dimensions) that best separates the two classes with the maximum margin.
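
As a rough illustration of the geometry described above, the sketch below uses a hand-picked hyperplane rather than a trained model. The weights `w`, the bias `b`, and the toy points are assumptions made up for this example; for this data the chosen hyperplane happens to be the maximum-margin one.

```python
import math

# Hypothetical separating hyperplane w·x + b = 0, chosen by hand for this toy data.
w = (1.0, 1.0)
b = -3.0

# Toy 2-D points with labels in {-1, +1}.
points = [
    ((0.0, 0.0), -1), ((0.0, 1.0), -1), ((1.0, 1.0), -1),
    ((2.0, 2.0), 1), ((3.0, 2.0), 1), ((3.0, 3.0), 1),
]

def decision_function(x):
    """Signed score w·x + b; its sign gives the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b

def predict(x):
    """Classify a point by the side of the hyperplane it falls on."""
    return 1 if decision_function(x) >= 0 else -1

# Support vectors sit exactly on the margin, where y * (w·x + b) == 1.
support_vectors = [
    x for x, y in points if math.isclose(y * decision_function(x), 1.0)
]

# The distance between the two margin boundaries is 2 / ||w||.
margin_width = 2.0 / math.hypot(w[0], w[1])

print("Support vectors:", support_vectors)
print(f"Margin width: {margin_width:.3f}")
```

Only the two points on the margin determine the hyperplane; moving any of the other points (without crossing the margin) would leave it unchanged.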

# **Question 2: Explain the difference between Hard Margin and Soft Margin SVM.**

# **Ans2.**
# Hard Margin vs. Soft Margin SVM
- Hard Margin SVM :
    - Definition : A Hard Margin SVM is used when the data is linearly separable. It aims to find the hyperplane that separates the classes with the maximum margin, without allowing any data points to be on the wrong side of the margin.
    - Issue : A Hard Margin SVM has no solution when the data is not perfectly linearly separable, and even on separable data a single outlier can shift the hyperplane drastically.

- Soft Margin SVM :
    - Definition : A Soft Margin SVM introduces slack variables that allow some data points to fall inside the margin or on the wrong side of the hyperplane. This makes SVM more robust to noise and outliers.
    - Controlled by C : The regularization parameter \(C\) controls the trade-off between maximizing the margin and minimizing the classification error. A smaller \(C\) allows for a wider margin (softer margin), potentially at the cost of more misclassifications.

Key Difference
- Flexibility : Soft Margin SVM is more flexible and commonly used because real-world data is often not perfectly separable.
- Usage : Soft Margin (via introducing slack variables and using \(C\)) is more practical for most applications.
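
To make the role of the slack variables and \(C\) concrete, here is a minimal sketch that evaluates the soft-margin objective \(\frac{1}{2}\|w\|^2 + C \sum_i \xi_i\) for a fixed hyperplane. The hyperplane, the points, and the function names are assumptions for illustration; a real solver would also optimize `w` and `b`.

```python
# Hypothetical hyperplane w·x + b = 0 and toy labeled points (labels in {-1, +1}).
w = (1.0, 1.0)
b = -3.0
points = [
    ((1.0, 1.0), -1),  # exactly on the margin
    ((2.0, 2.0), 1),   # exactly on the margin
    ((1.5, 1.0), -1),  # inside the margin, still correctly classified
    ((2.5, 1.0), -1),  # on the wrong side of the hyperplane
]

def slack(x, y):
    """Slack variable xi = max(0, 1 - y * (w·x + b))."""
    score = w[0] * x[0] + w[1] * x[1] + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(C):
    """Soft-margin objective: 0.5 * ||w||^2 + C * (sum of slacks)."""
    norm_sq = w[0] ** 2 + w[1] ** 2
    return 0.5 * norm_sq + C * sum(slack(x, y) for x, y in points)

slacks = [slack(x, y) for x, y in points]
print("Slacks:", slacks)
for C in (0.1, 1.0, 10.0):
    print(f"C={C}: objective={soft_margin_objective(C):.2f}")
```

With a small \(C\) the margin term dominates the objective, so violations are cheap; with a large \(C\) the same violations dominate, pushing the optimizer toward a narrower margin that fits the training points more tightly.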

# **Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.**


# **Ans3.**
# Kernel Trick in SVM
- Definition : The kernel trick in SVM allows the algorithm to operate in a higher-dimensional feature space without explicitly transforming the data into that space. This enables SVM to find a separating hyperplane in a space where the data might be linearly separable, even if it's not in the original space.
- How it works : By using a kernel function, SVM computes the inner product of two vectors in the higher-dimensional space without actually mapping the vectors to that space, thus saving computational resources.
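
The equivalence described above can be checked by hand for a simple case. The sketch below uses a degree-2 polynomial kernel \(K(x, z) = (x \cdot z)^2\) on 2-D inputs, whose explicit feature map is \(\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\). The function names and sample vectors are chosen for this example only.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel (x·z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def explicit_map(x):
    """Explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) into 3-D space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, z)                       # no mapping needed
mapped_value = dot(explicit_map(x), explicit_map(z))   # maps first, then dots

print(f"Kernel in input space:       {kernel_value:.4f}")
print(f"Inner product after mapping: {mapped_value:.4f}")
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors. For kernels such as RBF the implicit space is infinite-dimensional, so the kernel route is the only practical one.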

Example of a Kernel: Radial Basis Function (RBF)
- RBF Kernel : \(K(x, x') = \exp(-\gamma \|x - x'\|^2)\)
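- Use Case : The RBF kernel is widely used for non-linear classification problems. It's effective when the relationship between features is complex and non-linear. \(\gamma\) controls how far the influence of a single training point reaches: a large \(\gamma\) makes that influence very local (risking overfitting), while a small \(\gamma\) produces a smoother decision boundary.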
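
A small sketch of the RBF formula above, written in plain Python so the arithmetic is visible. The helper name `rbf_similarity` and the sample points are assumptions for illustration.

```python
import math

def rbf_similarity(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (0.0, 0.0), (1.0, 1.0)  # squared distance between them is 2

# A point is always maximally similar to itself.
print(f"K(x, x) = {rbf_similarity(x, x, gamma=0.5):.4f}")

# Larger gamma makes similarity decay faster with distance.
for gamma in (0.1, 0.5, 5.0):
    print(f"gamma={gamma}: K(x, z) = {rbf_similarity(x, z, gamma):.4f}")
```

The kernel value acts as a similarity score between 0 and 1, and the loop shows how increasing \(\gamma\) shrinks the neighborhood in which two points still count as similar.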

# **Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?**

# **Ans4.**
# Naive Bayes Classifier Overview
- Definition : A Naive Bayes classifier is a probabilistic machine learning model based on Bayes' theorem. It's used for classification tasks.
- How it works : Naive Bayes calculates the probability of each class given the input features and predicts the class with the highest probability.

Why is it Called "Naive"?
- Assumption of Independence : The "naive" part comes from the assumption that all features are independent of each other given the class. This is often not true in real-world data but simplifies calculations and often works surprisingly well in practice.

Characteristics of Naive Bayes
- Simple and Fast : Naive Bayes is easy to implement and computationally efficient.
- Performs Well with High-Dimensional Data : Despite its simplicity and "naive" assumption, it can perform well, especially with a large number of features.
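
The independence assumption means the class score is just the prior multiplied by one likelihood per feature: \(P(y \mid x_1, \dots, x_n) \propto P(y) \prod_i P(x_i \mid y)\). The sketch below works through that product by hand. All probabilities in it are invented for illustration and do not come from any dataset.

```python
# Hypothetical probabilities, made up for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},  # P(word appears | spam)
    "ham": {"free": 0.1, "meeting": 0.5},   # P(word appears | ham)
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Independence lets us multiply per-word likelihoods directly.
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())  # normalizing constant P(words)
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "meeting"])
predicted = max(result, key=result.get)
print(result)
print("Predicted class:", predicted)
```

Each word contributes its own factor regardless of which other words are present, which is exactly the simplification that makes the model "naive" and cheap to train.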


# **Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?**

# **Ans5.**

# Gaussian, Multinomial, and Bernoulli Naive Bayes Variants
1. Gaussian Naive Bayes
- Assumption : Features are continuous and follow a Gaussian distribution for each class.
- Use Case : Suitable for datasets with continuous features like the Iris dataset (`sklearn.datasets.load_iris()`).

2. Multinomial Naive Bayes
- Assumption : Features represent counts (like word counts in a document).
- Use Case : Commonly used in text classification tasks.

3. Bernoulli Naive Bayes
- Assumption : Features are binary (like presence/absence of a word).
- Use Case : Used when features are binary, like in some text classification scenarios focusing on presence/absence.

Example with Datasets
- Iris with Gaussian NB : Gaussian Naive Bayes can be applied to the Iris dataset for classification based on sepal and petal measurements.
- Text data with Multinomial/Bernoulli NB : For text classification tasks, Multinomial (for word counts) or Bernoulli (for binary word presence) Naive Bayes can be used.
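
The sketch below fits each variant on the kind of feature it assumes, using scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The tiny arrays are toy data invented for this example, chosen so that the two classes are clearly separated.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Shared toy labels: first three samples are class 0, last three are class 1.
y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> Gaussian Naive Bayes.
X_cont = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
gaussian_pred = GaussianNB().fit(X_cont, y).predict(np.array([[1.1], [5.1]]))

# Word counts per document -> Multinomial Naive Bayes.
X_counts = np.array([[3, 0], [2, 0], [4, 1], [0, 3], [1, 4], [0, 2]])
multinomial_pred = MultinomialNB().fit(X_counts, y).predict(np.array([[5, 0], [0, 5]]))

# Word presence/absence flags -> Bernoulli Naive Bayes.
X_binary = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict(np.array([[1, 0], [0, 1]]))

print("GaussianNB:   ", gaussian_pred)
print("MultinomialNB:", multinomial_pred)
print("BernoulliNB:  ", bernoulli_pred)
```

The three estimators share the same `fit`/`predict` interface, so switching variants is mostly a matter of matching the estimator to how the features are encoded.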

# **Question 6: Write a Python program to:**
# **● Load the Iris dataset**
# **● Train an SVM Classifier with a linear kernel**
# **● Print the model's accuracy and support vectors.**


In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM classifier with linear kernel
svm_clf = SVC(kernel='linear')
svm_clf.fit(X_train, y_train)

# Predict on the test set
y_pred = svm_clf.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Print support vectors
print("Support Vectors:")
print(svm_clf.support_vectors_)

Model Accuracy: 1.00
Support Vectors:
[[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]


# **Question 7: Write a Python program to:**
# **● Load the Breast Cancer dataset**
# **● Train a Gaussian Naïve Bayes model**
# **● Print its classification report including precision, recall, and F1-score.**

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on the test set
y_pred = gnb.predict(X_test)

# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Classification Report:
              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



# **Question 8: Write a Python program to:**
# **● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma.**
# **● Print the best hyperparameters and accuracy.**

In [None]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid for C and gamma
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}

# Create SVM model and perform GridSearchCV
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Predict on test set with best model
y_pred = grid.predict(X_test)

# Print best hyperparameters and accuracy
print("Best Hyperparameters:")
print(grid.best_params_)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

Best Hyperparameters:
{'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
Model Accuracy: 0.83


# **Question 9: Write a Python program to:**

# **● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups).**

# **● Print the model's ROC-AUC score for its predictions.**


In [None]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load a subset of the 20 Newsgroups dataset for binary classification
categories = ['sci.space', 'rec.sport.hockey']
data = fetch_20newsgroups(subset='all', categories=categories)

# Convert text to TF-IDF features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(data.data)
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naïve Bayes Classifier
nb = MultinomialNB()
nb.fit(X_train, y_train)

# Predict probabilities
y_proba = nb.predict_proba(X_test)[:, 1]  # Probabilities for class 1

# Compute ROC-AUC score
roc_auc = roc_auc_score(y_test, y_proba)
print(f"ROC-AUC Score: {roc_auc:.2f}")

ROC-AUC Score: 1.00


# **Question 10: Imagine you’re working as a data scientist for a company that handles email communications.**

# **Your task is to automatically classify emails as Spam or Not Spam. The emails may contain:**

# **● Text with diverse vocabulary**

# **● Potential class imbalance (far more legitimate emails than spam)**

# **● Some incomplete or missing data**
# **Explain the approach you would take to:**

# **● Preprocess the data (e.g. text vectorization, handling missing data)**

# **● Choose and justify an appropriate model (SVM vs. Naïve Bayes)**

# **● Address class imbalance**

# **● Evaluate the performance of your solution with suitable metrics, and explain the business impact of your solution.**

# **Ans10.**
# Approach to Classifying Emails as Spam or Not Spam
Preprocess the Data
- Text Vectorization : Use techniques like Bag-of-Words or TF-IDF to convert text into numerical vectors. TF-IDF down-weights words that appear in most emails, which often helps separate spam-specific vocabulary from common words.
- Handling Missing Data : Replace a missing subject or body with an empty string so the vectorizer can still process the email. If there are missing values in non-text features (such as sender metadata), impute them with the most frequent value or a model-based method.

Choose and Justify an Appropriate Model
- Naïve Bayes vs. SVM :
    - Naïve Bayes : Often a good choice for text classification due to its simplicity and efficiency with high-dimensional sparse data like text.
    - SVM : A linear SVM is also effective on sparse text features, but kernel SVMs become computationally expensive on large datasets, and an SVM does not output class probabilities by default.
- Choice : Naïve Bayes (Multinomial) is often preferred for text classification tasks like spam detection due to its efficiency and effectiveness.

Address Class Imbalance
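- Class Imbalance Handling : Use techniques like SMOTE, class weights in the model (for example `class_weight='balanced'` in `SVC` or `LogisticRegression`), or adjust the decision threshold. `MultinomialNB` has no `class_weight` parameter, so for Naïve Bayes adjust the class priors or pass `sample_weight` to `fit` instead.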
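
For reference, the "balanced" weighting follows the formula `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn documents for `class_weight='balanced'`. The sketch below reproduces it in plain Python; the helper name and the 90/10 label split are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }

# Hypothetical label distribution: 90 legitimate emails (0), 10 spam emails (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items()):
    print(f"class {label}: weight {weight:.3f}")
```

The rare class receives a proportionally larger weight, so each spam email counts about as much in training as nine legitimate ones.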

Evaluate Performance with Suitable Metrics
- Metrics : Use precision, recall, F1-score (especially for the minority class), and ROC AUC or PR AUC for imbalanced data. Accuracy alone can be misleading with class imbalance.
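
The sketch below shows why accuracy can mislead on imbalanced data by comparing a baseline that never flags spam with a filter that catches most of it. The label counts and both prediction lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical test set: 90 legitimate emails (0) followed by 10 spam emails (1).
y_true = [0] * 90 + [1] * 10

# Baseline that labels every email as legitimate.
always_ham = [0] * 100
# Filter that flags 2 legitimate emails and catches 8 of the 10 spam emails.
filter_pred = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

baseline_scores = binary_metrics(y_true, always_ham)
filter_scores = binary_metrics(y_true, filter_pred)
print("Always 'Not Spam':", baseline_scores)
print("Spam filter:      ", filter_scores)
```

The baseline reaches 90% accuracy while catching no spam at all, which only the recall and F1 figures reveal.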

Business Impact of the Solution
- Reduced Manual Effort : Automatically classifying spam reduces the need for manual filtering, saving time.
- Improved User Experience : More accurate spam filtering means less spam in inboxes and fewer important emails lost to false positives.
- Security : Effective spam detection can reduce phishing and malicious email risks.

Example Implementation

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

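# Assumes emails_df is an existing pandas DataFrame with 'text' and 'label' columns (0=Not Spam, 1=Spam)
texts = emails_df['text'].fillna('')  # Treat missing email bodies as empty strings

# Stratify so both splits keep the original spam / not-spam ratio
X_train, X_test, y_train, y_test = train_test_split(
    texts, emails_df['label'], test_size=0.2, stratify=emails_df['label'], random_state=42
)

# Fit the vectorizer on the training split only, then reuse it on the test split
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Uniform class priors keep the majority class from dominating predictions
nb = MultinomialNB(fit_prior=False)
nb.fit(X_train_vec, y_train)
y_pred = nb.predict(X_test_vec)
print(classification_report(y_test, y_pred, target_names=['Not Spam', 'Spam']))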