# Decision Tree for Banknote Authentication Dataset

This notebook demonstrates the application of a decision tree classifier on the Banknote Authentication Dataset from the UCI Machine Learning Repository. Below are the detailed steps and explanations for Question 2.

## Step 1: Image Statistical Measures

### Variance
- Measures the spread of pixel intensity values around the mean.
- High variance: wide spread; Low variance: values clustered near the mean.

### Skewness
- Measures the asymmetry of the intensity distribution.
- Positive skew: longer tail on the right (most values are low, with a few unusually high ones).
- Negative skew: longer tail on the left (most values are high, with a few unusually low ones).

### Kurtosis
- Measures the 'tailedness' of the distribution, i.e. how much weight sits in the tails compared with a normal distribution.
- High kurtosis: heavy tails (more outliers); Low kurtosis: light tails (fewer outliers).

### Entropy
- Measures the randomness or complexity of the image.
- Higher entropy: more detail and variation; Lower entropy: simpler, more uniform image.
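
The four measures above can be sketched with NumPy on a toy array. This is only an illustration: the function name `describe_intensities`, the histogram bin count, and the toy values are assumptions, and the dataset's own features were extracted upstream (the first three from wavelet-transformed images), not by this code.

```python
import numpy as np

def describe_intensities(values, bins=8):
    """Return variance, skewness, excess kurtosis, and Shannon entropy (bits).

    Assumes the input is not constant (otherwise the standard deviation is 0).
    """
    x = np.asarray(values, dtype=float).ravel()
    variance = x.var()                         # population variance (ddof=0)
    z = (x - x.mean()) / np.sqrt(variance)     # standardized values
    skewness = np.mean(z ** 3)                 # third standardized moment
    excess_kurtosis = np.mean(z ** 4) - 3.0    # 0 for a normal distribution
    counts, _ = np.histogram(x, bins=bins)     # empirical distribution
    p = counts[counts > 0] / counts.sum()
    entropy_bits = -np.sum(p * np.log2(p))     # Shannon entropy of the histogram
    return {
        "variance": float(variance),
        "skewness": float(skewness),
        "excess_kurtosis": float(excess_kurtosis),
        "entropy_bits": float(entropy_bits),
    }

# Toy 'image' with four equally frequent intensity levels (made-up values)
toy = np.array([[0, 1, 2, 3], [3, 2, 1, 0]])
stats = describe_intensities(toy, bins=4)
print(stats)
```

Because the toy values are symmetric and uniformly spread over four levels, the skewness is 0 and the entropy is 2 bits, the maximum for four equally likely bins.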

## Step 2: Load and Explore the Dataset

The dataset contains 1372 instances, 4 numerical features (variance, skewness, and kurtosis of the wavelet-transformed banknote image, plus the entropy of the image), and a binary class label (0: fake, 1: authentic). In this section, the data is loaded from the provided ZIP file, followed by a comment on whether the decision tree algorithm suits these features.

**Decision Tree Suitability:**
- Because the features are numerical and the target is binary, a decision tree is a suitable choice.
- Decision trees do not require feature scaling and provide interpretable decision rules.
- However, overfitting may occur without proper tuning, so hyperparameter adjustments are necessary.
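
To make the "no feature scaling" point concrete, here is a minimal pure-Python sketch of how a single split on a numerical feature can be chosen with Gini impurity. The helper names `gini` and `best_threshold` and the sample values are hypothetical; scikit-learn's actual implementation is more elaborate, but the idea of scanning candidate thresholds is the same.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2.0  # midpoint between neighbouring values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Made-up feature values and labels, for illustration only
feature = [-3.0, -1.5, -0.5, 1.5, 2.0, 4.0]
labels = [1, 1, 1, 0, 0, 0]

threshold, score = best_threshold(feature, labels)
scaled_threshold, scaled_score = best_threshold([10 * v for v in feature], labels)
print(threshold, score)                # 0.5 0.0
print(scaled_threshold, scaled_score)  # 5.0 0.0
```

Multiplying the feature by 10 moves the threshold from 0.5 to 5.0 but leaves the partition of the samples, and therefore the impurity, unchanged. This is why rescaling features does not change what a tree learns.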

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import zipfile

# For model building and evaluation
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import classification_report, ConfusionMatrixDisplay

# Load the dataset from the zip file
# The .zip file is named 'banknote+authentication.zip' and contains a file named 'data_banknote_authentication.txt'
zip_file = 'banknote+authentication.zip'
with zipfile.ZipFile(zip_file, 'r') as z:
    # List files to see the content
    print(z.namelist())
    # Adjust the filename below if needed
    with z.open('data_banknote_authentication.txt') as f:
        df = pd.read_csv(f, header=None, names=['variance', 'skewness', 'kurtosis', 'entropy', 'class'])

# Display the first few rows of the dataframe
df.head()
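
Beyond `df.head()`, a few quick checks help confirm the data loaded as expected: shape, missing values, and class balance. The sketch below runs on a tiny stand-in frame so that it is self-contained; `toy_df` and all of its values are made up, and in the notebook the same calls would be applied to `df`.

```python
import pandas as pd

# Tiny stand-in frame with the notebook's column names; the values are made up
toy_df = pd.DataFrame({
    "variance": [3.6, -1.4, 0.3, -2.5, 4.5, -0.9],
    "skewness": [8.7, -4.2, 2.1, -6.3, 9.0, 0.4],
    "kurtosis": [-2.8, 6.1, 0.5, 7.9, -4.0, 1.2],
    "entropy": [-0.4, 0.1, -1.2, 0.6, -0.9, 0.3],
    "class": [0, 1, 0, 1, 0, 1],
})

n_rows, n_cols = toy_df.shape                                 # instances and columns
missing_per_column = toy_df.isna().sum()                      # missing values per column
class_counts = toy_df["class"].value_counts().sort_index()    # class balance
summary = toy_df.drop(columns="class").describe()             # per-feature statistics

print(n_rows, n_cols)
print(missing_per_column)
print(class_counts)
print(summary)
```

On the real data, the shape should match the 1372 instances and 5 columns described above, and the class counts show whether stratified splitting is worthwhile.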

### Comment on Decision Tree Suitability

The dataset has a relatively small number of features (4 numerical features) and a binary target. Decision trees are well suited for such datasets because:

- They can handle numerical features easily.
- They do not require feature scaling.
- They provide interpretable decision rules that can be visualized.

However, decision trees can be prone to overfitting if not properly tuned. To limit this, we can experiment with different hyperparameters (such as `max_depth` and `min_samples_split`) to balance model complexity and performance.
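
One way to run such an experiment is a cross-validated grid search over `max_depth` and `min_samples_split`. The sketch below is an assumption about how this could be done, not part of the original analysis: it uses `GridSearchCV` on a synthetic stand-in dataset from `make_classification` (4 numerical features, binary target), and the grid values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the banknote dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Arbitrary example grid; adjust the candidate values to the problem at hand
param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_split": [2, 10, 20],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation on the training split only
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy:", round(search.best_score_, 3))
print("Held-out accuracy:", round(search.score(X_te, y_te), 3))
```

Selecting hyperparameters by cross-validation on the training split keeps the test set untouched until the final evaluation.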

## Step 3: Data Splitting, Model Training, and Evaluation

We split the data into training (80%) and testing (20%) sets, stratified by class, then train a `DecisionTreeClassifier` with an initial set of hyperparameters (`max_depth=5`, `min_samples_split=10`, Gini criterion). The model is evaluated using accuracy, precision, recall, and F1-score, and a confusion matrix is also plotted.

In [None]:
# Split the data into features and target
X = df.drop('class', axis=1)
y = df['class']

# Split into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42, stratify=y)

# Instantiate the DecisionTreeClassifier with initial hyperparameters
clf = DecisionTreeClassifier(max_depth=5, min_samples_split=10, criterion='gini', random_state=42)

# Train the model
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Evaluate the model
print('Classification Report:')
print(classification_report(y_test, y_pred))

# Plot the confusion matrix
disp = ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test, cmap=plt.cm.Blues)
plt.title('Confusion Matrix')
plt.show()
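
The numbers in the classification report follow directly from the confusion matrix. As a sketch, the helper below (the hypothetical `binary_metrics`) recomputes them for class 1 from the four cell counts. The counts used here are invented for illustration and are not results from this notebook. For binary labels 0 and 1, scikit-learn's `confusion_matrix(y_test, y_pred).ravel()` yields the cells in the order `tn, fp, fn, tp`.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual positives, how many are found
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, NOT taken from the model above
metrics = binary_metrics(tn=90, fp=10, fn=5, tp=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```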

## Step 4: Visualize the Decision Tree and Discuss Interpretability

We use `plot_tree()` to visualize the decision tree. The depth of the tree is crucial for interpretability:
- **Shallow Tree:** Easier to interpret but might miss complex patterns.
- **Deep Tree:** Can capture complex relationships but may be too complex to understand.

In [None]:
# Visualize the decision tree
plt.figure(figsize=(20,10))
# Pass feature names as a plain list; some scikit-learn versions reject a pandas Index here
plot_tree(clf, feature_names=list(X.columns), class_names=['Fake', 'Authentic'], filled=True, rounded=True, fontsize=10)
plt.title('Decision Tree Visualization')
plt.show()

# It's possible to experiment with different max_depth values and observe the trade-off between model complexity and interpretability.
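
The depth experiment suggested above could look like the following sketch. It is an assumption-laden illustration: the data is a synthetic stand-in from `make_classification`, the depth values are arbitrary, and the feature names `f0` to `f3` are placeholders. It records tree size alongside train and test accuracy, then prints a shallow tree as text rules with `export_text`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (NOT the banknote dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

depth_results = []
for depth in [1, 2, 3, 5, 8]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    depth_results.append({
        "max_depth": depth,
        "actual_depth": tree.get_depth(),   # may be smaller than the limit
        "n_leaves": tree.get_n_leaves(),    # number of distinct decision rules
        "train_acc": tree.score(X_tr, y_tr),
        "test_acc": tree.score(X_te, y_te),
    })

for row in depth_results:
    print(row)

# A depth-2 tree has at most four leaves, so its rules fit on a few lines
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
print(export_text(shallow, feature_names=["f0", "f1", "f2", "f3"]))
```

A widening gap between `train_acc` and `test_acc` as depth grows is the usual sign that the extra leaves are fitting noise rather than structure.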

## Step 5: Feature Importance

We extract and plot the feature importances from the trained model. This shows which features contribute most to the classification decision.

In [None]:
# Extract feature importances
importances = clf.feature_importances_
features = X.columns

# Create a DataFrame for visualization
feat_importances = pd.DataFrame({'Feature': features, 'Importance':importances}).sort_values(by='Importance', ascending=False)
print(feat_importances)

# Plot feature importances
plt.figure(figsize=(8,6))
sns.barplot(x='Importance', y='Feature', data=feat_importances, palette='viridis')
plt.title('Feature Importances')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.show()
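
The `feature_importances_` attribute is impurity-based: each feature's score is its normalized share of the total impurity reduction, so the scores sum to 1. A possible cross-check, swapped in here as an additional technique and not part of the original analysis, is permutation importance on held-out data via `sklearn.inspection.permutation_importance`. The sketch uses synthetic stand-in data, so the numbers say nothing about the banknote features.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the banknote dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

# Impurity-based importances, computed from the training data
impurity_importances = tree.feature_importances_

# Permutation importances: drop in test accuracy when one feature is shuffled
perm = permutation_importance(tree, X_te, y_te, n_repeats=10, random_state=0)

for i in range(X_demo.shape[1]):
    print(
        f"feature {i}: impurity={impurity_importances[i]:.3f}, "
        f"permutation={perm.importances_mean[i]:.3f}"
    )
```

When the two rankings agree, the importance plot is more trustworthy; when they differ, the impurity-based scores may be reflecting how the tree was built rather than predictive value on unseen data.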

## Step 6: Remarks

After training and evaluating the decision tree, here are some reflections:

- **Interpretability:** The decision tree provides clear decision rules, making the model transparent.
- **Performance:** With proper hyperparameter tuning, the decision tree performs well on the dataset. However, an unconstrained tree is flexible enough to memorize the training data, so its depth and split sizes must be controlled to avoid overfitting.
- **Overall Suitability:** Given the dataset's small number of features and binary classification task, the decision tree is a good choice. For higher accuracy or more complex data patterns, ensemble methods like Random Forests or Gradient Boosting might be explored.

**Answer:** I believe the decision tree is a good model for the Banknote Authentication dataset because it handles numerical features well, requires minimal data preprocessing, and offers an interpretable set of decision rules. However, its tendency to overfit makes it necessary to carefully tune hyperparameters. For further improvement, ensemble methods might be considered.
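
If ensemble methods are explored, a simple starting point is to compare a single tree with a random forest under the same cross-validation scheme. The sketch below is an assumption about how such a comparison might be set up. It uses synthetic stand-in data and arbitrary settings (`max_depth=5`, `n_estimators=50`), so its scores do not reflect the banknote dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the banknote dataset)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# 5-fold cross-validated accuracy for each model on the same data
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The trade-off is that a forest averages many trees and so gives up the single readable set of rules that motivated the decision tree in the first place.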

## Conclusion

This notebook has walked through the entire process of loading the Banknote Authentication dataset, training and evaluating a decision tree classifier, visualizing the model, and extracting feature importances. It also offers remarks on the model's suitability for this task.