## Random Forest Classifier

Random Forest is a popular ensemble learning technique from traditional machine learning. Although it is not a deep learning method, it is included here as a point of comparison.

Random Forest is an ensemble method that constructs multiple decision trees and combines their predictions. Each tree is trained independently on a bootstrap sample of the training data (rows drawn with replacement), and at each split only a random subset of the features is considered. This randomness decorrelates the trees, which reduces overfitting and improves generalization.
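
The mechanism described above can be sketched by hand with plain decision trees. This is an illustration under stated assumptions, not how the notebook trains its model: the data comes from `make_classification`, and `n_trees`, `rows`, `votes`, and `ensemble_pred` are hypothetical names. It also uses a hard majority vote for clarity, whereas scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for a real feature matrix (an assumption).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw row indices with replacement.
    rows = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" makes each split consider a random subset of features.
    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree.fit(X[rows], y[rows])
    trees.append(tree)

# Majority vote: labels are 0/1, so the mean vote thresholded at 0.5 is the majority.
votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Ensemble training accuracy:", (ensemble_pred == y).mean())
```

The accuracy printed here is measured on the training rows, so it only shows that the vote works, not how well the ensemble generalizes.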

In [1]:
import pandas as pd

#### Read the CSV file

In [2]:
nlp_data1=pd.read_csv('nlp_data1.csv')

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.feature_extraction.text import TfidfVectorizer
import joblib

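# Features are the lemmatized text; labels are the target column
X = nlp_data1['lemmatized_token']
y = nlp_data1['target']

# Convert the text to TF-IDF features (fit on the full dataset, before the split)
tfidf_vectorizer = TfidfVectorizer()
X = tfidf_vectorizer.fit_transform(X)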

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
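
Because `tfidf_vectorizer` is fit on every row before `train_test_split`, the held-out rows contribute to the vocabulary and IDF weights. One common way to avoid this is to split the raw text first and wrap the vectorizer and classifier in a pipeline, so the vectorizer is refit on the training portion of each fold. The following is a sketch only: the toy `texts` and `labels` are invented stand-ins for `nlp_data1['lemmatized_token']` and `nlp_data1['target']`, and the scores it prints say nothing about the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical) standing in for the lemmatized text and target columns.
texts = [
    "flood destroy home", "love sunny day", "fire burn forest", "enjoy good coffee",
    "earthquake hit city", "watch movie tonight", "storm damage road", "happy birthday friend",
] * 5
labels = [1, 0, 1, 0, 1, 0, 1, 0] * 5

# Split the raw text first, so test rows never influence the vocabulary or IDF weights.
text_train, text_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Inside cross_validate, the whole pipeline (vectorizer included) is refit per fold.
pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=42))
cv_results = cross_validate(pipeline, text_train, y_train, cv=5,
                            scoring=["accuracy", "f1"])
print("CV accuracy per fold:", cv_results["test_accuracy"])
```

A side benefit of this design is that the fitted pipeline accepts raw text directly, so the vectorizer and the classifier cannot drift apart.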

In [5]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
import numpy as np

In [6]:
# Define models
models = {
    'Random Forest': RandomForestClassifier(),
}

#### Perform cross-validation for the model

In [7]:

for name, model in models.items():
    print("Model:", name)
    cv_results = cross_validate(model, X_train, y_train, cv=5,
                                scoring=['accuracy', 'precision', 'recall', 'f1'])
    
    # Access the cross-validation results
    accuracy_scores = cv_results['test_accuracy']
    precision_scores = cv_results['test_precision']
    recall_scores = cv_results['test_recall']
    f1_scores = cv_results['test_f1']

    # Print the mean and standard deviation of each metric
    print("Accuracy: mean =", np.mean(accuracy_scores), ", std =", np.std(accuracy_scores))
    print("Precision: mean =", np.mean(precision_scores), ", std =", np.std(precision_scores))
    print("Recall: mean =", np.mean(recall_scores), ", std =", np.std(recall_scores))
    print("F1 Score: mean =", np.mean(f1_scores), ", std =", np.std(f1_scores))
    print("\n")

Model: Random Forest
Accuracy: mean = 0.7832512315270936 , std = 0.011459012362803233
Precision: mean = 0.8221843977594278 , std = 0.01423455577882111
Recall: mean = 0.6334918211559433 , std = 0.018528972801276643
F1 Score: mean = 0.7155511741786925 , std = 0.016516671629865174
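
The notebook splits off `X_test` and imports `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`, but the cells shown report only cross-validation scores. A sketch of scoring the held-out split follows. It runs on synthetic data from `make_classification`, which is an assumption standing in for the TF-IDF matrix, and `test_metrics` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption) in place of the notebook's TF-IDF features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit once on the training split, then predict the rows the model has never seen.
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

test_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for metric_name, value in test_metrics.items():
    print(f"{metric_name}: {value:.3f}")
```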




#### Saving Trained Model Using Joblib

In [11]:
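# cross_validate fits clones, so `model` is still unfitted here; fit it before saving
model.fit(X_train, y_train)
filename = f'{name.replace(" ", "_")}_model.joblib'
joblib.dump(model, filename)
print("Model saved as", filename)
print("\n")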

Model saved as Random_Forest_model.joblib
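
The saved classifier expects TF-IDF features, so predicting on new text also requires the fitted vectorizer. A minimal sketch of persisting both together and loading them back is shown here. The toy corpus, the `Random_Forest_bundle.joblib` filename, and the dictionary layout are all assumptions for illustration, not something the notebook does.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical) standing in for the lemmatized text and target columns.
texts = ["flood destroy home", "love sunny day", "fire burn forest", "enjoy good coffee",
         "earthquake hit city", "watch movie tonight", "storm damage road", "happy birthday friend"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

vectorizer = TfidfVectorizer()
model = RandomForestClassifier(random_state=42).fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Random_Forest_bundle.joblib")
    # Store the vectorizer with the classifier; the model alone cannot featurize raw text.
    joblib.dump({"vectorizer": vectorizer, "model": model}, path)
    bundle = joblib.load(path)

# New text must go through the same fitted vectorizer (transform, not fit_transform).
new_text = ["storm flood city"]
prediction = bundle["model"].predict(bundle["vectorizer"].transform(new_text))
print("Predicted label:", prediction[0])
```

Files written by `joblib.dump` should be loaded only from trusted sources and, ideally, with the same scikit-learn version that produced them.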


