## **Decision Trees**

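A decision tree classifies text by learning a hierarchy of threshold tests on word features. With TF-IDF features, each split checks whether a word's weight in the document exceeds a learned threshold, which in practice often amounts to testing whether that word is present or absent.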

**Imports**

In [6]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

**Sample Dataset**

In [11]:
documents = [
    "I love programming in Python", "Python is great for data science",
    "Machine learning is fascinating", "AI is the future",
    "The movie was fantastic", "I really enjoyed the film",
    "The food was terrible", "I hated the bad service",
    "The experience was awful", "I will never go there again"
]

labels = [1, 1, 1, 1,  # Positive: technology statements
          1, 1,        # Positive: movie reviews
          0, 0, 0, 0]  # Negative: complaints

# Convert text to TF-IDF features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

**Train and Evaluate the Decision Tree Classifier**

In [12]:
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict on test data
y_pred = clf.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Show important words
feature_names = vectorizer.get_feature_names_out()
important_words = sorted(zip(clf.feature_importances_, feature_names), reverse=True)[:10]
print("\nTop 10 Important Words in Decision Tree:")
for importance, word in important_words:
    print(f"{word}: {importance:.4f}")

Accuracy: 0.50

Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         1
           1       0.50      1.00      0.67         1

    accuracy                           0.50         2
   macro avg       0.25      0.50      0.33         2
weighted avg       0.25      0.50      0.33         2


Top 10 Important Words in Decision Tree:
terrible: 0.4444
there: 0.3175
hated: 0.2381
will: 0.0000
was: 0.0000
the: 0.0000
service: 0.0000
science: 0.0000
really: 0.0000
python: 0.0000


Scikit-learn also raised `UndefinedMetricWarning` while building this report: the classifier never predicted class 0 on the test set, so precision for that class is undefined and is reported as 0.00.
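
A two-document test set gives only a coarse accuracy estimate, since each prediction moves the score by 50 percentage points. One possible alternative, sketched here as an assumption rather than as part of the original notebook, is stratified cross-validation over all ten documents. The choice of `make_pipeline`, `StratifiedKFold`, and four folds is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Same documents and labels as the sample dataset above.
documents = [
    "I love programming in Python", "Python is great for data science",
    "Machine learning is fascinating", "AI is the future",
    "The movie was fantastic", "I really enjoyed the film",
    "The food was terrible", "I hated the bad service",
    "The experience was awful", "I will never go there again",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

# The pipeline refits the vectorizer inside every fold, so held-out
# documents never influence the vocabulary or the IDF weights.
model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=42))

# Four folds is the maximum here: the smaller class has only four documents.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(model, documents, labels, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.2f}")
```

Every document is used for testing exactly once, so the mean score rests on ten predictions instead of two. With a dataset this small the estimate is still noisy and should be read as a rough indication only.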
