In [17]:
import pandas as pd

# Replace with the correct file path if needed
df = pd.read_csv("in-vehicle-coupon-recommendation.csv")
df.head()

print("Shape of DataFrame:", df.shape)
df.info()



Shape of DataFrame: (12684, 26)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12684 entries, 0 to 12683
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   destination           12684 non-null  object
 1   passanger             12684 non-null  object
 2   weather               12684 non-null  object
 3   temperature           12684 non-null  int64 
 4   time                  12684 non-null  object
 5   coupon                12684 non-null  object
 6   expiration            12684 non-null  object
 7   gender                12684 non-null  object
 8   age                   12684 non-null  object
 9   maritalStatus         12684 non-null  object
 10  has_children          12684 non-null  int64 
 11  education             12684 non-null  object
 12  occupation            12684 non-null  object
 13  income                12684 non-null  object
 14  car                   108 non-null    object
 15  Bar 

### **Code Explanation**

- **Imports the `pandas` library:**
    - Enables data manipulation and analysis using DataFrame structures.

- **Reads the CSV file into a DataFrame (`df`):**
    - Loads the "in-vehicle-coupon-recommendation.csv" file.
    - This provides a structured way to access and process the dataset.

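- **Previews the first few rows (`df.head()`):**
    - Gives a quick preview of the data to verify that it has been loaded correctly. Because `df.head()` is not the last expression in the cell, Jupyter does not render it here; only the printed shape and the `df.info()` summary appear in the output.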

- **Prints the shape of the DataFrame and its info:**
    - `df.shape` shows the number of rows and columns, which helps you understand the dataset's size.
    - `df.info()` provides details on the data types and non-null counts for each column, which is crucial for identifying missing values or data type issues early in the preprocessing stage.
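
As a follow-up to `df.info()`, missing values can also be counted directly. The sketch below is illustrative only: `toy` is a small made-up frame (not the coupon data) whose column names merely mirror the dataset, so that the snippet runs on its own.

```python
import pandas as pd

# Hypothetical toy frame standing in for the coupon data.
toy = pd.DataFrame({
    "destination": ["Home", "Work", "Home", "No Urgent Place"],
    "temperature": [55, 80, 30, 80],
    "car": [None, "Scooter and motorcycle", None, None],
})

# Number of missing values per column, largest first.
missing_counts = toy.isna().sum().sort_values(ascending=False)

# Share of missing values per column, as a percentage.
missing_pct = (toy.isna().mean() * 100).round(1)

print(missing_counts)
print(missing_pct)
```

Applied to `df`, the same two lines would make sparse columns stand out immediately, such as `car`, which has only 108 non-null values in the `df.info()` output above.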

In [18]:
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Y"])
y = df["Y"]

df['Y'].value_counts(normalize=True) * 100




Y
1    56.843267
0    43.156733
Name: proportion, dtype: float64

### **Code Explanation**

- **Import from scikit-learn:**
    - `train_test_split` is imported to split the dataset into training and testing sets, which is crucial for evaluating model performance.

- **Separates Features and Target:**
    - `X = df.drop(columns=["Y"])` creates a feature matrix by removing the target column (`Y`).
    - `y = df["Y"]` isolates the target variable, which shows whether the coupon is accepted.

- **Displays Class Distribution:**
    - `df['Y'].value_counts(normalize=True) * 100` computes the percentage distribution of the target classes.
    - This shows how balanced the classes are, which matters when choosing modelling strategies and evaluation metrics. Here the split is about 57% accepted versus 43% not accepted, which is only a mild imbalance.
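
One optional refinement, not used in this notebook, is to keep the class proportions identical in the training and test sets. The sketch below uses made-up labels (`X_toy` and `y_toy` are placeholders, not the coupon data) to show how the `stratify` argument of `train_test_split` preserves a 57/43 ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels with the same 57/43 balance as the target `Y`.
y_toy = np.array([1] * 570 + [0] * 430)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)  # placeholder feature

# stratify=y_toy keeps the class ratio the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("train positive share:", y_tr.mean())
print("test positive share:", y_te.mean())
```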

In [None]:
from sklearn.preprocessing import StandardScaler

# One-hot encode categorical columns
X_encoded = pd.get_dummies(X, drop_first=True)
X_encoded.head()

X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y, 
    test_size=0.2, 
    random_state=42,
)

scaler = StandardScaler()

# Fit on training data, then transform both train and test
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

### **Code Explanation**

- **Imports from scikit-learn:**
    - `StandardScaler` is imported for scaling numeric features, ensuring that each feature contributes equally during model training.

- **One-Hot Encoding of Categorical Features:**
    - The code uses `pd.get_dummies` with `drop_first=True` to convert all categorical variables in the features (`X`) into dummy/indicator variables.
        - `pd.get_dummies` transforms each categorical variable into a set of binary (0 or 1) columns, one per category. With `drop_first=True`, the first category of each variable is dropped, since it is implied whenever all the remaining columns are 0. Each categorical feature in `X` therefore becomes several numeric features the model can work with, without introducing redundant columns.
    - This is necessary because SVM models require numeric input, and one-hot encoding ensures that each category is represented as a separate binary feature.

- **Train/Test Split:**
    - `train_test_split` splits the one-hot encoded data and target variable (`y`) into training (80%) and testing (20%) sets.
    - The `random_state=42` parameter ensures reproducibility of the split.

- **Feature Scaling with StandardScaler:**
    - A `StandardScaler` object is created to normalize the feature values, which is important for SVM as it relies on distance calculations between data points.
    - The scaler is **fitted on the training data** (`X_train`) to compute the mean and standard deviation, and then used to transform both the training and test data. This prevents data leakage and ensures consistent scaling across both sets.
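
The encoding and scaling steps above can be seen on a miniature example. This is a sketch on made-up values (`toy` is hypothetical, not the coupon data); `dtype=int` is added here only so the dummy columns print as 0/1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up miniature of the feature table (values are illustrative only).
toy = pd.DataFrame({
    "weather": ["Sunny", "Rainy", "Snowy", "Sunny"],
    "temperature": [80, 55, 30, 80],
})

# Each category becomes a 0/1 column; the first level ("Rainy") is dropped.
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)
print(list(encoded.columns))

# Fit the scaler on the "training" rows only, then reuse it for the rest.
train, test = encoded.iloc[:3], encoded.iloc[3:]
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# Training columns have mean ~0 after scaling; test rows reuse the same statistics.
print(train_scaled.mean(axis=0).round(6))
print(test_scaled)
```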

In [6]:
from sklearn.svm import SVC

svm_linear = SVC(kernel="linear", C=1.0, random_state=42)
svm_linear.fit(X_train_scaled, y_train)


### **Code Explanation**

- **Import and Initialise the SVM Classifier:**
    - The `SVC` class from `sklearn.svm` is used to create a Support Vector Machine model.
    - The SVM is initialized with a **linear kernel**, which means it will try to find a linear hyperplane that best separates the classes.
        - A **linear kernel** sets the model up to find a straight line (in two dimensions), a flat plane (in three dimensions), or a hyperplane (in higher-dimensional space) that best separates the data points belonging to different classes.
            - **Linear Separability:** The model works best when the classes can be separated by a straight boundary. It looks for a hyperplane that divides the feature space so that points on one side belong mainly to one class and points on the other side belong mainly to the other class.
            - **Decision Boundary:**  
                The hyperplane is defined by a linear equation in the features $x_1, \dots, x_n$, with weights $w_1, \dots, w_n$ and bias $b$:

                $$w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0$$
		    
- **Set Hyperparameters:**
    - **`C=1.0`:** This is the regularization parameter. A moderate value of 1.0 balances between having a smooth decision boundary and correctly classifying the training examples.
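    - **`random_state=42`:** Fixes the random seed. In scikit-learn's `SVC` it only affects the data shuffling used for probability estimates (`probability=True`), so it has no effect on this model but is harmless to keep.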

- **Model Training:**
    - The `svm_linear.fit(X_train_scaled, y_train)` line trains the SVM on the scaled training data, allowing the model to learn the decision boundary that separates the classes based on the features.
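
To connect the hyperplane equation to the fitted model: for a linear kernel, scikit-learn exposes the learned weights as `coef_` and the bias as `intercept_`. The sketch below uses a tiny made-up data set (`X_toy`, `y_toy` are assumptions, not the coupon data) and checks that `w · x + b` matches the model's own decision scores.

```python
import numpy as np
from sklearn.svm import SVC

# Small made-up, linearly separable data set (not the coupon data).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)

w = clf.coef_[0]        # weights w_1 ... w_n of the hyperplane
b = clf.intercept_[0]   # bias term b

# Signed score w . x + b; its sign decides the predicted class.
manual_scores = X_toy @ w + b
manual_labels = (manual_scores > 0).astype(int)

print(np.allclose(manual_scores, clf.decision_function(X_toy)))
print(manual_labels)
```

The same attributes are available on `svm_linear`, where each weight corresponds to one column of the scaled, one-hot encoded feature matrix.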

In [11]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred_linear = svm_linear.predict(X_test_scaled)

print("Linear SVM Results")
print(f"Accuracy: {accuracy_score(y_test, y_pred_linear):.4f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_linear))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_linear))


Linear SVM Results
Accuracy: 0.6858
Classification Report:
              precision    recall  f1-score   support

           0       0.64      0.59      0.61      1021
           1       0.72      0.75      0.73      1395

    accuracy                           0.69      2416
   macro avg       0.68      0.67      0.67      2416
weighted avg       0.68      0.69      0.68      2416

Confusion Matrix:
[[ 606  415]
 [ 344 1051]]


### **Code Explanation:**

1. **Importing Metrics:**
    - The code imports `accuracy_score`, `classification_report`, and `confusion_matrix` from `sklearn.metrics` to evaluate the performance of the trained SVM model.

2. **Prediction on Test Data:**
    - `y_pred_linear = svm_linear.predict(X_test_scaled)`
        - This line uses the linear SVM model (trained earlier) to predict the target values for the scaled test data.

3. **Printing the Results:**
    - The code prints:
        - **Accuracy:** Overall fraction of correct predictions.
        - **Classification Report:** Detailed metrics including precision, recall, and F1-score for each class.
        - **Confusion Matrix:** A table showing the counts of true negatives, false positives, false negatives, and true positives.

### **Result Analysis:**

- **Accuracy: 0.69**
    - Approximately **69%** of the test instances were correctly classified by the model.

- **Classification Report:**
    - **Class 0:**
        - **Precision:** 0.64
            - Out of all instances predicted as class 0, 64% were actually class 0.
        - **Recall:** 0.59
            - The model correctly identified 59% of all actual class 0 instances.
        - **F1-score:** 0.61
            - This is the harmonic mean of precision and recall, indicating a moderate balance.
    - **Class 1:**
        - **Precision:** 0.72
            - Out of all instances predicted as class 1, 72% were actually class 1.
        - **Recall:** 0.75
            - The model correctly identified 75% of all actual class 1 instances.
        - **F1-score:** 0.73
            - Indicates a good balance between precision and recall for class 1.
    - **Overall:**
        - **Macro Avg:** Precision 0.68, Recall 0.67, F1 0.67 (equal weight for each class).
        - **Weighted Avg:** Precision 0.68, Recall 0.69, F1 0.68 (each class weighted by its support, so the larger class 1 counts for more).

- **Confusion Matrix:**
    - **For Class 0 (First Row):**
        - **True Negatives (606):** Correctly predicted as class 0.
        - **False Positives (415):** Class 0 instances that were incorrectly predicted as class 1.
    - **For Class 1 (Second Row):**
        - **False Negatives (344):** Class 1 instances that were incorrectly predicted as class 0.
        - **True Positives (1051):** Correctly predicted as class 1.

### **Interpretation:**
- The **linear SVM** performs moderately well with an overall accuracy of ~68.6%.
- **Class 1** (coupon accepted) shows better recall and F1-score compared to **Class 0**.
- The **confusion matrix** reveals that the model makes a significant number of errors:
    - **415 false positives:** Instances predicted as class 1 while actually class 0.
    - **344 false negatives:** Instances predicted as class 0 while actually class 1.
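
The figures in the classification report can be reproduced by hand from the confusion matrix. The sketch below uses only the four counts printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported for the linear SVM (rows: actual, columns: predicted).
cm = np.array([[606, 415],
               [344, 1051]])

tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()
precision_1 = tp / (tp + fp)   # of everything predicted as class 1, share that is class 1
recall_1 = tp / (tp + fn)      # of all actual class 1 instances, share that was found
precision_0 = tn / (tn + fn)   # same idea, with class 0 treated as the class of interest
recall_0 = tn / (tn + fp)

print(f"accuracy    {accuracy:.4f}")
print(f"class 0     precision {precision_0:.2f}  recall {recall_0:.2f}")
print(f"class 1     precision {precision_1:.2f}  recall {recall_1:.2f}")
```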

In [8]:
svm_poly = SVC(kernel="poly", degree=3, C=1.0, gamma="scale", random_state=42)
svm_poly.fit(X_train_scaled, y_train)


### **Code Explanation:**

- **Model Initialisation with Polynomial Kernel:**
    - Here, an SVM model is created with a **polynomial kernel** using `SVC(kernel="poly", degree=3, C=1.0, gamma="scale", random_state=42)`.
    - The **polynomial kernel** enables the SVM to capture non-linear relationships by mapping the input features into a higher-dimensional space.
    - **Degree 3** indicates that the polynomial used to transform the data is cubic, meaning it considers interactions among features up to the third power.

- **Hyperparameter Details:**
    - **C=1.0:** Regularisation parameter that balances the trade-off between maximising the margin and minimising classification errors.
    - **gamma="scale":** Automatically calculates the gamma value based on the number of features and their variance, which helps in controlling the influence of individual training samples.
    - **random_state=42:** Fixes the random seed. In scikit-learn's `SVC` it is only used when `probability=True`, so it does not change this model's results.

- **Model Training:**
    - The model is trained on the scaled training data using `svm_poly.fit(X_train_scaled, y_train)`.
    - This step allows the polynomial SVM to learn a decision boundary that can potentially model more complex relationships than a linear hyperplane.
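
For reference, scikit-learn documents `gamma="scale"` as `1 / (n_features * X.var())`. The sketch below computes that value by hand on synthetic data (`X_toy`, `y_toy` are stand-ins generated with `make_classification`, not the coupon data) and checks that passing it explicitly gives the same decision scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the scaled training data.
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)

# Documented rule for gamma="scale": 1 / (n_features * X.var()).
gamma_scale = 1.0 / (X_toy.shape[1] * X_toy.var())

svm_auto = SVC(kernel="poly", degree=3, C=1.0, gamma="scale").fit(X_toy, y_toy)
svm_manual = SVC(kernel="poly", degree=3, C=1.0, gamma=gamma_scale).fit(X_toy, y_toy)

# Both models should produce the same decision scores.
same_scores = np.allclose(svm_auto.decision_function(X_toy),
                          svm_manual.decision_function(X_toy))
print(gamma_scale, same_scores)
```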


**Why Use a Polynomial Kernel?**

- **Capturing Non-linear Relationships:**  
    Unlike the linear SVM, the polynomial SVM can handle data where the classes are not linearly separable by implicitly mapping data to a higher-dimensional space.
    
- **Complex Decision Boundaries:**  
    The cubic transformation (degree=3) allows the model to capture interactions between features that are not evident in the original feature space.
    
- **Flexibility vs. Complexity:**  
    While a polynomial kernel can provide more flexibility in modeling, it may also increase the risk of overfitting. The parameter `C` and the chosen degree help balance this risk.

### **Explanation of the Polynomial Kernel Function**

The polynomial kernel function is given by:

$$K(x, y) = (x^\top y + c)^d$$

- **$x^\top y$:**  
    This is the dot product between two feature vectors $x$ and $y$. It measures the similarity between the two vectors in the original feature space.
    
- **$c$:**  
    This is a constant (`coef0` in scikit-learn, where it defaults to 0). It controls how much the lower-degree terms contribute relative to the highest-degree terms.
    
- **$d$:**  
    The degree of the polynomial. This determines how complex the relationship between features can be. A higher degree means a more flexible decision boundary, which can capture more complex patterns.
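
A quick numerical check of the kernel formula, as a sketch with made-up vectors. scikit-learn's polynomial kernel additionally multiplies the dot product by `gamma`, so `gamma=1.0` is passed here to match the formula exactly; the values of `c` and `d` are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
c, d = 1.0, 3   # illustrative constant and degree

# Direct evaluation of (x^T y + c)^d: the dot product is 4, so (4 + 1)^3.
k_manual = (x @ y + c) ** d

# Same value from scikit-learn; gamma=1 removes the extra scaling factor
# that scikit-learn applies to the dot product.
k_sklearn = polynomial_kernel(x.reshape(1, -1), y.reshape(1, -1),
                              degree=d, gamma=1.0, coef0=c)[0, 0]

print(k_manual, k_sklearn)
```

Note that `svm_poly` above uses `gamma="scale"` and the default `coef0`, so its kernel is a scaled version of this formula with $c = 0$.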

In [10]:
y_pred_poly = svm_poly.predict(X_test_scaled)

print("Polynomial SVM Results")
print(f"Accuracy: {accuracy_score(y_test, y_pred_poly):.4f}")
print("Classification Report:")
print(classification_report(y_test, y_pred_poly))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_poly))


Polynomial SVM Results
Accuracy: 0.7268
Classification Report:
              precision    recall  f1-score   support

           0       0.72      0.57      0.64      1021
           1       0.73      0.84      0.78      1395

    accuracy                           0.73      2416
   macro avg       0.73      0.71      0.71      2416
weighted avg       0.73      0.73      0.72      2416

Confusion Matrix:
[[ 585  436]
 [ 224 1171]]


### **Code Explanation:**

1. **Prediction on Test Data:**
    - The line `y_pred_poly = svm_poly.predict(X_test_scaled)` uses the trained polynomial SVM model to predict class labels on the scaled test set.

2. **Evaluating the Model:**
    - **Accuracy:**
        - `accuracy_score(y_test, y_pred_poly)` computes the overall fraction of correctly predicted instances.
    - **Classification Report:**
        - `classification_report(y_test, y_pred_poly)` displays detailed metrics—precision, recall, and F1-score—for each class.
    - **Confusion Matrix:**
        - `confusion_matrix(y_test, y_pred_poly)` shows the counts of true positives, false positives, false negatives, and true negatives, which helps visualise the model’s errors.

3. **Printed Results:**
    - **Accuracy:** 0.73, which is higher than the linear SVM.
    - **For Class 0:**
        - Precision is 0.72 and Recall is 0.57, resulting in an F1-score of 0.64.
    - **For Class 1:**
        - Precision is 0.73 and Recall is 0.84, resulting in an F1-score of 0.78.
    - **Confusion Matrix:**
        - The matrix shows that 585 instances of class 0 were correctly classified, while 436 were incorrectly classified as class 1.
        - For class 1, 1171 were correctly identified and 224 were misclassified as class 0.

---

### **Comparison to Linear SVM Results**

| Model          | Accuracy | Precision(Class 0/Class 1) | Recall(Class 0/Class 1) | F1-Score(Class 0/Class 1) |
| -------------- | -------- | -------------------------- | ----------------------- | ------------------------- |
| Linear SVM     | 69%      | 0.64/0.72                  | 0.59/0.75               | 0.61/0.73                 |
| Polynomial SVM | 73%      | 0.72/0.73                  | 0.57/0.84               | 0.64/0.78                 |


---

### **Analysis & Comparison:**

- **Overall Accuracy:**
    - The **polynomial SVM** achieves a higher overall accuracy (73%) compared to the **linear SVM** (69%).

- **Class 0 (Negative Class):**
    - **Linear SVM:**
        - Precision: 0.64, Recall: 0.59
    - **Polynomial SVM:**
        - Precision improves to 0.72, but Recall drops slightly to 0.57.
    - **Interpretation:**
        - The polynomial model is more precise for class 0 (fewer class 1 instances are wrongly labelled as class 0: 224 versus 344) but misses slightly more actual class 0 cases (436 versus 415), hence the lower recall.

- **Class 1 (Positive Class):**
    - **Linear SVM:**
        - Precision: 0.72, Recall: 0.75
    - **Polynomial SVM:**
        - Precision remains similar (0.73), but Recall significantly increases to 0.84.
    - **Interpretation:**
        - The polynomial SVM is better at identifying class 1 instances (higher recall), which is often critical if class 1 represents a more important outcome (e.g., coupon acceptance).
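
The comparison can also be read directly off the two confusion matrices. The sketch below rebuilds a small summary table from the counts printed earlier; the column names are illustrative.

```python
import pandas as pd

# Confusion matrices as printed above (rows: actual, columns: predicted).
matrices = {
    "Linear SVM": [[606, 415], [344, 1051]],
    "Polynomial SVM": [[585, 436], [224, 1171]],
}

rows = []
for name, ((tn, fp), (fn, tp)) in matrices.items():
    total = tn + fp + fn + tp
    rows.append({
        "model": name,
        "accuracy": (tn + tp) / total,
        "false_positives": fp,
        "false_negatives": fn,
        "total_errors": fp + fn,
    })

summary = pd.DataFrame(rows).set_index("model")
print(summary.round(4))
```

In these counts the polynomial model accepts 21 more false positives in exchange for 120 fewer false negatives.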

---

### **Conclusion:**

The **polynomial SVM** shows an improvement in overall accuracy and a notable increase in recall for class 1, which suggests that it captures non-linear relationships that the linear SVM misses. This makes it particularly valuable if the goal is to correctly identify as many positive cases as possible, even at the cost of slightly more misclassifications in class 0.

## **Data Sources**

- **[UCI Machine Learning Repository - In-Vehicle Coupon Recommendation Dataset](https://archive.ics.uci.edu/dataset/603/in+vehicle+coupon+recommendation)**
    - This dataset contains **12,684 instances** with **25 features**.
    - Features include various driving and demographic factors such as `destination`, `passanger` (spelled this way in the dataset), `weather`, `temperature`, `time`, `coupon`, `expiration`, `gender`, `age`, `maritalStatus`, and several additional behavioural and economic attributes.
    - The target variable (`Y`) indicates whether the coupon is accepted (`1`) or not (`0`).


---

## **Pre-Processing**

### **In-Vehicle Coupon Recommendation Dataset Preprocessing**

- **Categorical Feature Encoding:**
    - Categorical variables were transformed into numeric format using one-hot encoding with `pd.get_dummies` (using `drop_first=True` to avoid the dummy variable trap).

- **Feature Scaling:**
    - All features, including the one-hot encoded columns, were standardised using `StandardScaler` (fitted on the training set only) so that they contribute on a comparable scale when training distance-based models such as SVM.

- **Train/Test Split:**
    - The data was split into training (80%) and testing (20%) sets, with reproducibility ensured by setting a fixed random state.
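
One way to bundle the scaling and model steps, shown here only as a sketch and not as the approach used in this notebook, is a scikit-learn pipeline. The data below is synthetic (`X_toy`, `y_toy` come from `make_classification`) and stands in for the one-hot encoded features and target.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the one-hot encoded features and target.
X_toy, y_toy = make_classification(n_samples=300, n_features=8, random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only and applies
# the same transformation automatically at prediction time.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the test set can never be used for fitting it by accident.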

---

## **Data Understanding & Visualisation**

- **Data Inspection:**
    - The dataset was initially inspected using Pandas to confirm its dimensions and to review data types.
- **Class Distribution:**
    - The target variable (`Y`) was examined to understand the balance between accepted (1) and rejected (0) coupons.

---

## **Algorithms**

Support Vector Machines (SVMs) were utilised for the classification task. Two types were implemented:

- **Linear SVM:**
    - Finds a straight-line (or, in higher dimensions, hyperplane) decision boundary, which works best when the classes are approximately linearly separable.
- **Polynomial SVM:**
    - Uses a polynomial kernel to capture non-linear relationships.

---

## **Model Training and Evaluation**

### **Training**

- **Linear SVM:**
    - The linear SVM was trained on the standardised training data.
    - Model parameters included `kernel="linear"`, `C=1.0`, and `random_state=42`.
- **Polynomial SVM:**
    - The polynomial SVM was also trained on the same standardised training data.
    - Model parameters were set as `kernel="poly"`, `degree=3`, `C=1.0`, `gamma="scale"`, and `random_state=42`.

### **Evaluation**

- **Linear SVM Results:**
    
    - **Accuracy:** 0.69
        
    - **Classification Report:**
        
        |Class|Precision|Recall|F1-Score|Support|
        |---|---|---|---|---|
        |0|0.64|0.59|0.61|1021|
        |1|0.72|0.75|0.73|1395|
        
    - **Observations:**  
        The linear SVM correctly classified approximately 69% of the instances. It shows moderate performance for both classes, although there are notable misclassifications.
        
- **Polynomial SVM Results:**
    
    - **Accuracy:** 0.73
        
    - **Classification Report:**
        
        |Class|Precision|Recall|F1-Score|Support|
        |---|---|---|---|---|
        |0|0.72|0.57|0.64|1021|
        |1|0.73|0.84|0.78|1395|
        
    - **Observations:**  
        The polynomial SVM achieved a higher overall accuracy (73%). It demonstrates an improved ability to identify class 1 (coupon accepted), as evidenced by a higher recall (0.84 compared to 0.75 in the linear SVM), though with a slight reduction in recall for class 0.
        

### **Comparison:**

- The **Polynomial SVM** outperforms the **Linear SVM** by capturing non-linear relationships, resulting in higher overall accuracy and a significant improvement in identifying positive cases (coupon acceptance).
- While the linear model shows more balanced performance across the classes, the polynomial model makes a trade-off: a slight decrease in recall for class 0 in exchange for a clear improvement for class 1, which is the better trade when identifying accepted coupons is the priority.

---

## **Online Resources & Sources**

- **[UCI Machine Learning Repository - In-Vehicle Coupon Recommendation Dataset](https://archive.ics.uci.edu/dataset/603/in+vehicle+coupon+recommendation)**
    - Provides the dataset and detailed variable information.

- **Learning SVM:**  
    [SVM Machine Learning Tutorial – FreeCodeCamp](https://www.freecodecamp.org/news/svm-machine-learning-tutorial-what-is-the-support-vector-machine-algorithm-explained-with-code-examples/)
    
- **Coding and Parameter Reference:**  
    [Scikit-learn SVM Documentation](https://scikit-learn.org/stable/modules/svm.html)
    
- **Understanding Kernel Functions:**  
    [GeeksforGeeks – Major Kernel Functions in SVM](https://www.geeksforgeeks.org/major-kernel-functions-in-support-vector-machine-svm/)
    

---

## **Tools & Technologies Used**

- **Python Libraries:**
    - `Pandas` and `NumPy` for data manipulation.
    - `Scikit-learn` for implementing SVMs, preprocessing, and performance evaluation.
    - `Matplotlib` and `Seaborn` for visualisation of results.
- **Development Environment:**
    - **Jupyter Notebook** for interactive code development and analysis.

---

## **Challenges Faced**

1. **High Dimensionality:**
    - One-hot encoding of numerous categorical features increased the dimensionality, which needed to be managed effectively.
2. **Model Selection:**
    - Deciding between a linear and a polynomial SVM involved evaluating trade-offs between model complexity, accuracy, and generalisation.
3. **Understanding the SVM Models:**
    - Understanding the differences between linear and non-linear SVM models, and the kernel functions behind the linear and polynomial SVMs.

---

## **Conclusion**

- The In-Vehicle Coupon Recommendation dataset was effectively used to evaluate two SVM models.
- The **Polynomial SVM** demonstrated superior performance, with an overall accuracy of 73% and improved recall for the critical positive class, compared to 69% accuracy with the **Linear SVM**.
- These results suggest that incorporating non-linear kernel functions can better capture the complex relationships within the data, thereby enhancing the model’s predictive capabilities.
- Future work may include further hyperparameter tuning and exploring additional kernel functions or ensemble methods to further boost performance.
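
As a pointer for the tuning mentioned above, a cross-validated grid search might look like the sketch below. Everything here is an assumption for illustration: the data is synthetic, and the grid values are arbitrary examples rather than tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training data; kept small so the search is fast.
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)

pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid only; the values are not tuned recommendations.
param_grid = {
    "svc__kernel": ["linear", "poly", "rbf"],
    "svc__C": [0.1, 1.0, 10.0],
}

# 3-fold cross-validation over every kernel/C combination.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_toy, y_toy)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```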