## Here I use the built-in Iris dataset, loaded with the "load_iris" function from the sklearn.datasets module, to demonstrate building a decision tree classifier. This dataset is not directly related to predicting customer purchases, but it shows the same core workflow.

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score

In [5]:
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

In [8]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
# Initialize and train the decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

In [10]:
# Predict on the testing set
y_pred = clf.predict(X_test)

In [14]:
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 1.0
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
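
A perfect score on a 30-sample test set comes from a single split, so it is worth a second look. The sketch below is not part of the original notebook. It uses scikit-learn's `cross_val_score` with 5 folds (the fold count is an arbitrary choice for illustration) to see how the same classifier scores across several splits.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=42)

# 5-fold cross-validation: each fold is held out once while the rest trains the tree.
# cv=5 is an assumption chosen for illustration, not a value from the notebook.
scores = cross_val_score(clf, iris.data, iris.target, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

If the fold accuracies vary noticeably, the single-split figure above is likely an optimistic estimate.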



## In this example, we use the Iris dataset to build a decision tree classifier. The dataset contains three classes of iris plants (setosa, versicolor, and virginica), and we predict the class from four features: sepal length, sepal width, petal length, and petal width. The code covers the core process: loading the dataset, splitting it, training the decision tree classifier, making predictions, and evaluating the model's performance.
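
To see what the classifier learned from those four features, the fitted tree can be printed as text rules. This is a minimal sketch, not part of the original notebook: it repeats the same split and training settings and then calls scikit-learn's `export_text`. The variable name `rules` is introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Render the learned splits as indented if/else rules, labeled with the feature names.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
print("Tree depth:", clf.get_depth())
```

Each indented line is a threshold test on one feature, and each leaf line reports the predicted class index.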

# Another Example

## Here I use the "UCI Adult Income" dataset, which is often used to predict whether a person's income is above or below a fixed threshold. It can serve as a reasonable proxy for a binary classification task like the Bank Marketing dataset. Here's how you can use it:

In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score


In [16]:
# Load the UCI Adult Income dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
column_names = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status", "occupation",
                "relationship", "race", "sex", "capital-gain", "capital-loss", "hours-per-week", "native-country", "income"]

In [17]:
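# The regex separator strips the space after each comma; the raw string avoids an invalid-escape warning for \s
data = pd.read_csv(url, names=column_names, sep=r',\s*', engine='python')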


In [18]:
# Preprocessing
data['income'] = data['income'].apply(lambda x: 1 if x == '>50K' else 0)  # Convert income to binary


In [19]:
# Drop fnlwgt (a census sampling weight) and education (already encoded numerically in education-num)
data = data.drop(['fnlwgt', 'education'], axis=1)

In [20]:
# Convert categorical variables to one-hot encoding
data = pd.get_dummies(data, columns=['workclass', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country'])


In [21]:
# Split into features (X) and target (y)
X = data.drop('income', axis=1)
y = data['income']


In [22]:
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [23]:
# Initialize and train the decision tree classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

In [24]:
# Predict on the testing set
y_pred = clf.predict(X_test)

In [25]:
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.822201750345463
              precision    recall  f1-score   support

           0       0.88      0.88      0.88      4942
           1       0.63      0.63      0.63      1571

    accuracy                           0.82      6513
   macro avg       0.76      0.76      0.76      6513
weighted avg       0.82      0.82      0.82      6513



## In this example, the "UCI Adult Income" dataset serves as an alternative for demonstrating a decision tree classifier on a binary target. The task is to predict whether a person earns more than $50,000 a year from demographic features. The code follows the same steps as the earlier example, with an added preprocessing stage: preprocess, split, train, and evaluate the model.
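
The preprocessing steps are easier to follow on a few rows. The sketch below applies the same two transformations (binary target, one-hot encoding with `pd.get_dummies`) to a small hand-made frame. The frame `toy`, its values, and the names `X_toy` and `y_toy` are hypothetical and only mimic the shape of the Adult data.

```python
import pandas as pd

# Hypothetical rows in the shape of the Adult data; the values are made up for illustration.
toy = pd.DataFrame({
    "age": [39, 50, 28, 44],
    "sex": ["Male", "Female", "Female", "Male"],
    "workclass": ["Private", "State-gov", "Private", "Private"],
    "income": ["<=50K", ">50K", "<=50K", ">50K"],
})

# Binary target: 1 for '>50K', 0 otherwise.
toy["income"] = (toy["income"] == ">50K").astype(int)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy, columns=["sex", "workclass"])

# Separate features from the target, as in the notebook.
X_toy = encoded.drop("income", axis=1)
y_toy = encoded["income"]

print(X_toy.columns.tolist())
print(y_toy.tolist())
```

Each categorical column becomes one indicator column per distinct value, which is why the full dataset ends up with many more feature columns than it started with.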