# **Predicting Car Acceptability Based on Price and Technical Features: Gradient Boosting Classification**

## **Objective**  
This project trains a **Gradient Boosted Trees** classification model, using `GradientBoostingClassifier` from the **scikit-learn** library, to predict whether a car is acceptable given its price and technical characteristics. The data is the Car Evaluation dataset from **UCI’s Machine Learning Repository**.

---

## **Dataset**  
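The dataset contains 1,728 labeled cars described by six categorical features: buying price (`buying`), maintenance cost (`maint`), number of doors (`doors`), passenger capacity (`persons`), luggage boot size (`lug_boot`), and safety rating (`safety`). The original label has four classes (`unacc`, `acc`, `good`, `vgood`); this project collapses it to a binary target, 0 for `unacc` and 1 for any other rating.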
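
Because every feature is categorical, the features need a numeric encoding before scikit-learn's tree ensembles can use them. The sketch below mirrors the encoding and target binarization used in this project, but on a few hand-written rows (hypothetical sample values, not read from the dataset file) so that it runs offline.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows for illustration; only two of the six features are shown.
sample = pd.DataFrame({
    'safety': ['low', 'med', 'high', 'high'],
    'lug_boot': ['small', 'big', 'med', 'small'],
    'accep': ['unacc', 'acc', 'vgood', 'unacc'],
})

# One-hot encode the features; drop_first=True drops one level per feature
# (the alphabetically first one) so that no column is redundant.
X_sample = pd.get_dummies(sample[['safety', 'lug_boot']], drop_first=True)

# Binary target: 0 for 'unacc', 1 for any other rating.
y_sample = np.where(sample['accep'] == 'unacc', 0, 1)

print(list(X_sample.columns))
print(y_sample.tolist())
```

Each remaining column answers a yes/no question such as "is `safety` equal to `low`?", and a row with all of a feature's columns set to false belongs to the dropped level.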

---

## **Implementation Steps**

### 1. **Model Creation**  
- A `GradientBoostingClassifier` is instantiated with **15 estimators** (`n_estimators=15`) and all other parameters set to default.  
- The model parameters are printed using the `.get_params()` method to confirm the configuration.  

### 2. **Model Training and Prediction**  
- The model is trained on the training dataset (`X_train` and `y_train`).
- Predictions are made on the testing dataset (`X_test`), and the results are stored in a variable named `y_pred`.
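
Steps 1 and 2 can be tried without network access using a minimal sketch on synthetic data. The `make_classification` data and the `*_demo` names are assumptions made for illustration; they stand in for the car dataset and are not part of the project code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration, not the car dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=123)

# Step 1: 15 boosting stages, everything else left at its default.
demo_clf = GradientBoostingClassifier(n_estimators=15)
print(demo_clf.get_params()['n_estimators'])

# Step 2: fit on the training split, then predict one label per test row.
demo_clf.fit(X_tr, y_tr)
demo_pred = demo_clf.predict(X_te)
print(demo_pred.shape)
```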

### 3. **Model Evaluation**  
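The performance of the model is evaluated on the test set using key classification metrics, with acceptable cars (label 1) as the positive class:
- **Accuracy**: The fraction of all test cars that are classified correctly.
- **Precision**: Of the cars predicted acceptable, the fraction that actually are acceptable; it drops as false positives increase.
- **Recall**: Of the cars that actually are acceptable, the fraction the model identifies; it drops as false negatives increase.
- **F1-Score**: The harmonic mean of precision and recall.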

These metrics are printed for analysis.

### 4. **Confusion Matrix**  
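- A **confusion matrix** is generated to break down correct and incorrect predictions by class. Rows are actual classes and columns are predicted classes; passing `labels=[1, 0]` lists the positive class first, so the top-left cell counts true positives.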
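
The effect of the label ordering is easy to misread, so here is a small sketch on toy labels (hypothetical values, not model output) comparing scikit-learn's default ordering with `labels=[1, 0]`.

```python
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only (hypothetical, not model output).
y_true_demo = [1, 1, 0, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1]

# Default ordering sorts the labels, so class 0 comes first.
cm_default = confusion_matrix(y_true_demo, y_pred_demo)

# labels=[1, 0] puts the positive class first: rows are actual, columns are predicted.
cm_pos_first = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 0])

# With the positive class first, the cells unpack as TP, FN / FP, TN.
tp_demo, fn_demo = cm_pos_first[0]
fp_demo, tn_demo = cm_pos_first[1]

print(cm_default.tolist())    # [[1, 1], [1, 2]]
print(cm_pos_first.tolist())  # [[2, 1], [1, 1]]
```

The counts are the same in both matrices; only the row and column order changes.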

---

## **Outcome**  
On the 519-car test set, the Gradient Boosted Trees model reaches about 89.8% accuracy, 78.9% precision, 89.6% recall, and an F1-score of 0.839. The confusion matrix shows 37 false positives against 16 false negatives, which is why precision is lower than recall. Together, the metrics and the confusion matrix show both how often the model is right and which kind of error it makes more often.

---

## **Key Highlights**  
- Gradient Boosting implementation using `GradientBoostingClassifier`.  
- Evaluation of model performance using standard metrics.  
- Application of a real-world dataset to solve a classification problem.  


```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Load the dataset into a pandas DataFrame (the file has no header row)
path_to_data = 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
column_names = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'accep']

df = pd.read_csv(path_to_data, names=column_names)
target_column = 'accep'
raw_feature_columns = [col for col in column_names if col != target_column]

# Create dummy variables from the categorical feature columns
X = pd.get_dummies(df[raw_feature_columns], drop_first=True)

# Convert the target column to a binary variable: 0 if 'unacc', 1 otherwise
df[target_column] = np.where(df[target_column] == 'unacc', 0, 1)
y = df[target_column]

# Split the full dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123, test_size=0.3)

# 1. Create a Gradient Boosting Classifier and print its parameters
grad_classifier = GradientBoostingClassifier(n_estimators=15)

print(grad_classifier.get_params())

# 2. Fit the classifier to the training data and predict on the testing data
grad_classifier.fit(X_train, y_train)
y_pred = grad_classifier.predict(X_test)

# 3. Accuracy, precision, recall, and f1-score on the testing data
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f'Test set accuracy:\t{accuracy}')
print(f'Test set precision:\t{precision}')
print(f'Test set recall:\t{recall}')
print(f'Test set f1-score:\t{f1}')

# 4. Confusion matrix with the positive class (1) listed first
test_conf_matrix = pd.DataFrame(
    confusion_matrix(y_test, y_pred, labels=[1, 0]),
    index=['actual yes', 'actual no'],
    columns=['predicted yes', 'predicted no']
)

print(f'Confusion Matrix:\n{test_conf_matrix.to_string()}')
```

```text
{'ccp_alpha': 0.0, 'criterion': 'friedman_mse', 'init': None, 'learning_rate': 0.1, 'loss': 'log_loss', 'max_depth': 3, 'max_features': None, 'max_leaf_nodes': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 15, 'n_iter_no_change': None, 'random_state': None, 'subsample': 1.0, 'tol': 0.0001, 'validation_fraction': 0.1, 'verbose': 0, 'warm_start': False}
Test set accuracy:	0.8978805394990366
Test set precision:	0.7885714285714286
Test set recall:	0.8961038961038961
Test set f1-score:	0.8389057750759878
Confusion Matrix:
            predicted yes  predicted no
actual yes            138            16
actual no              37           328
```
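
As a sanity check, the four reported metrics can be re-derived by hand from the confusion matrix counts printed above. The following is a minimal sketch in plain Python; the variable names are chosen for illustration.

```python
# Counts copied from the confusion matrix printed above.
tp, fn = 138, 16   # actual yes: predicted yes, predicted no
fp, tn = 37, 328   # actual no:  predicted yes, predicted no

accuracy = (tp + tn) / (tp + fn + fp + tn)          # correct / all test rows
precision = tp / (tp + fp)                          # correct among predicted yes
recall = tp / (tp + fn)                             # found among actual yes
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f'accuracy:  {accuracy:.4f}')   # 0.8979
print(f'precision: {precision:.4f}')  # 0.7886
print(f'recall:    {recall:.4f}')     # 0.8961
print(f'f1-score:  {f1:.4f}')         # 0.8389
```

The values agree with the scikit-learn output, which confirms that the scores are computed for the positive class (label 1) and that the matrix rows and columns are read in the intended order.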
