## Decision Tree

##### Decision trees belong to traditional machine learning and are most often used for supervised learning. They recursively split the data into subsets based on feature values, making a binary decision at each node until a stopping criterion is reached (for example, a maximum depth or a pure leaf). For tasks such as image recognition and natural language processing, deep learning models like neural networks are often preferred because they can learn complex patterns from large datasets.
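
To make the splitting behaviour concrete, here is a minimal sketch on a made-up four-row dataset. The feature names and values are purely illustrative and do not come from `nlp_data1.csv`. `export_text` prints the learned rules, so the threshold test at each node can be read directly.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (hypothetical): two numeric features and a binary label.
# The label depends only on the first feature.
X_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_toy = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_toy, y_toy)

# Each internal node compares one feature against a threshold.
rules = export_text(tree, feature_names=['feature_a', 'feature_b'])
print(rules)
```

Because a single split on `feature_a` already separates the two classes, the fitted tree needs only one decision node.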

In [1]:
import pandas as pd

#### Read the CSV file

In [2]:
nlp_data1=pd.read_csv('nlp_data1.csv')

#### Classification with TF-IDF Vectorization

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.feature_extraction.text import TfidfVectorizer
import joblib

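# Features are the lemmatized text; labels are the target column
X = nlp_data1['lemmatized_token']
y = nlp_data1['target']

# Convert the text into a sparse TF-IDF feature matrix
tfidf_vectorizer = TfidfVectorizer()
X = tfidf_vectorizer.fit_transform(X)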

#### Split the data into training and test sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
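
Note that the vectorizer in this notebook is fitted on the full corpus before the split, so vocabulary and IDF statistics from the test documents influence the training features. One possible alternative, sketched here on a hypothetical toy corpus so that it runs standalone, is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn pipeline. The texts, labels, and variable names below are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus standing in for nlp_data1['lemmatized_token']
texts = ['good movie', 'great film', 'bad movie', 'awful film',
         'good plot', 'great cast', 'bad plot', 'awful cast',
         'good acting', 'bad acting']
labels = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]

# Split the raw text first, so the vectorizer never sees test documents
text_train, text_test, lab_train, lab_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

# The pipeline fits TF-IDF on the training text only, then fits the tree
pipe = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=42))
pipe.fit(text_train, lab_train)
print("Held-out accuracy:", pipe.score(text_test, lab_test))
```

A pipeline like this can also be passed to `cross_validate` in place of the bare classifier, in which case the vectorizer is refitted inside each fold.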

In [7]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate
import numpy as np

In [8]:
# Define models
models = {
    'Decision Tree': DecisionTreeClassifier(),
}

#### Perform cross-validation for the model

In [9]:
for name, model in models.items():
    print("Model:", name)
    cv_results = cross_validate(model, X_train, y_train, cv=5,
                                scoring=['accuracy', 'precision', 'recall', 'f1'])
    
    # Access the cross-validation results
    accuracy_scores = cv_results['test_accuracy']
    precision_scores = cv_results['test_precision']
    recall_scores = cv_results['test_recall']
    f1_scores = cv_results['test_f1']

    # Print the mean and standard deviation of each metric
    print("Accuracy: mean =", np.mean(accuracy_scores), ", std =", np.std(accuracy_scores))
    print("Precision: mean =", np.mean(precision_scores), ", std =", np.std(precision_scores))
    print("Recall: mean =", np.mean(recall_scores), ", std =", np.std(recall_scores))
    print("F1 Score: mean =", np.mean(f1_scores), ", std =", np.std(f1_scores))
    print("\n")

Model: Decision Tree
Accuracy: mean = 0.7361247947454844 , std = 0.01660487465380244
Precision: mean = 0.701801004892428 , std = 0.020344724545970198
Recall: mean = 0.6735238095238095 , std = 0.02703326188318311
F1 Score: mean = 0.6871592944889591 , std = 0.020803135478775645
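
The scores above come from cross-validation on the training split only. The metric functions imported earlier (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`) and the held-out test split are not used in the cells shown. A sketch of what a final held-out evaluation might look like follows, using a hypothetical toy corpus so that it runs on its own. The data and the names `clf` and `test_metrics` are assumptions, not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus and binary labels
texts = ['good movie', 'great film', 'bad movie', 'awful film',
         'good plot', 'great cast', 'bad plot', 'awful cast',
         'good acting', 'bad acting']
labels = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]

text_tr, text_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels)

# Fit the vectorizer on training text only, then reuse it for the test text
vec = TfidfVectorizer()
X_tr = vec.fit_transform(text_tr)
X_te = vec.transform(text_te)

clf = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# zero_division=0 avoids warnings if a class is never predicted
test_metrics = {
    'accuracy': accuracy_score(y_te, y_pred),
    'precision': precision_score(y_te, y_pred, zero_division=0),
    'recall': recall_score(y_te, y_pred, zero_division=0),
    'f1': f1_score(y_te, y_pred, zero_division=0),
}
for metric, value in test_metrics.items():
    print(f"{metric}: {value:.3f}")
```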




#### Saving Trained Model Using Joblib

In [10]:
# cross_validate only fits clones, so fit `model` itself before saving
model.fit(X_train, y_train)
model_path = f'{name.replace(" ", "_")}_model.joblib'
joblib.dump(model, model_path)
print("Model saved as", model_path)
print("\n")

Model saved as Decision_Tree_model.joblib
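
To reuse a saved model on new text, the fitted vectorizer is needed as well, because new documents must be mapped onto the same feature columns the tree was trained on. The sketch below shows one way to do this by storing both objects in a single file. The toy data, the dictionary layout, and the file name `decision_tree_bundle.joblib` are hypothetical choices for illustration.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data
texts = ['good movie', 'great film', 'bad movie', 'awful film']
labels = [1, 1, 0, 0]

vec = TfidfVectorizer()
clf = DecisionTreeClassifier(random_state=42).fit(vec.fit_transform(texts), labels)

# A temporary directory keeps the example from leaving files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'decision_tree_bundle.joblib')
    joblib.dump({'vectorizer': vec, 'model': clf}, path)
    bundle = joblib.load(path)

# Transform new text with the reloaded vectorizer, then predict
new_X = bundle['vectorizer'].transform(['good film'])
reloaded_pred = bundle['model'].predict(new_X)
original_pred = clf.predict(vec.transform(['good film']))
print("Reloaded prediction:", reloaded_pred)
```

Files created with `joblib.dump` should only be loaded from trusted sources, since loading them can execute arbitrary code.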


