# Naive Bayes Classifier
A Naive Bayes classifier is a probabilistic machine learning algorithm based on Bayes' theorem. It is widely used for classification tasks and copes well with high-dimensional data such as text, where every distinct word can be a feature.

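The fundamental formula for Naive Bayes classification is derived from Bayes' theorem:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

### Where:

$P(C \mid X)$: The posterior probability of class $C$ given the feature vector $X$ (what we want to calculate).

$P(X \mid C)$: The likelihood of observing feature vector $X$ given class $C$.

$P(C)$: The prior probability of class $C$ (how likely the class is in general).

$P(X)$: The evidence, i.e. the overall probability of observing feature vector $X$. It is the same for every class, so it is usually ignored when comparing classes.

The "naive" part of the classifier is the assumption that the individual features $x_1, \dots, x_n$ of $X$ (for text, the words) are conditionally independent given the class, so that $P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)$ and the predicted class is $\hat{C} = \arg\max_{C} P(C) \prod_{i=1}^{n} P(x_i \mid C)$.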
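
To make the formula concrete, here is a small worked calculation with made-up numbers (hypothetical, not taken from the dataset): suppose 30% of documents are negative, and the word "broken" appears in 20% of negative documents and in 2% of positive ones. The evidence $P(X)$ is obtained with the law of total probability, i.e. by summing $P(X \mid C)\,P(C)$ over all classes. The helper `posterior` is a hypothetical name used only for this sketch.

```python
# Hypothetical numbers for illustration only (not taken from the dataset)
priors = {"negative": 0.3, "positive": 0.7}               # P(C)
likelihood_broken = {"negative": 0.2, "positive": 0.02}   # P(X | C), X = "contains 'broken'"

def posterior(priors, likelihoods):
    """Apply Bayes' theorem, P(C|X) = P(X|C) P(C) / P(X), to every class C."""
    # Unnormalised scores P(X|C) * P(C)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(X) via the law of total probability (sum over all classes)
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

post = posterior(priors, likelihood_broken)
for c, p in post.items():
    print(f"P({c} | 'broken') = {p:.3f}")
# prints P(negative | 'broken') = 0.811 and P(positive | 'broken') = 0.189
```

Because $P(X)$ divides every class's score by the same amount, dropping it does not change which class has the highest score; it is only needed when calibrated probabilities are wanted.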

# Problem statement

## Naive Bayes Classifier for Text Classification
(Assuming a set of documents that need to be classified, use the Naive Bayes classifier model to perform this task. Built-in Java classes/APIs can be used to write the program. Calculate the accuracy, precision and recall for your datasets.)


# Implementation


```python
# Import the necessary libraries
import string  # punctuation removal

import nltk
import pandas as pd
from nltk.corpus import stopwords  # optional stop-word removal
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

# Make sure the NLTK stop-word corpus is available (skipped if already up to date)
nltk.download('stopwords', quiet=True)
```

```python
# Load the English stop-word list once instead of on every call
STOP_WORDS = set(stopwords.words('english'))

def preprocess_text(text, remove_stopwords=False):
    # Lowercase and remove ASCII punctuation
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    words = text.split()

    # Optional stop-word removal (off by default, as in the original notebook)
    if remove_stopwords:
        words = [word for word in words if word not in STOP_WORDS]

    # Join the words back into a single string, which TfidfVectorizer expects
    return ' '.join(words)
```
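
To see what the preprocessing step produces, the sketch below reapplies the same lowercase and punctuation-stripping logic to the example sentence classified later in the notebook. `SMALL_STOP_WORDS` is a hypothetical stand-in for NLTK's list so the block runs without downloading a corpus. Two caveats: `string.punctuation` covers only ASCII punctuation, so curly quotes and other Unicode symbols survive; and NLTK's English list contains negations such as "not" and "no", so removing stop words can hurt sentiment classification, which is why removal is kept optional.

```python
import string

# A tiny hypothetical stop-word list; the notebook uses NLTK's English list instead
SMALL_STOP_WORDS = {"my", "is", "because", "the", "a"}

def preprocess(text, stop_words=None):
    """Lowercase, strip ASCII punctuation, optionally drop stop words, rejoin with spaces."""
    text = text.lower().translate(str.maketrans('', '', string.punctuation))
    words = text.split()
    if stop_words is not None:
        words = [w for w in words if w not in stop_words]
    return ' '.join(words)

sample = "My heart is broken because my love is failure ."
print(preprocess(sample))                    # my heart is broken because my love is failure
print(preprocess(sample, SMALL_STOP_WORDS))  # heart broken love failure
```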


```python
class Document:
    """A labelled document: its raw text and its category (class label)."""

    def __init__(self, text, category):  # constructor
        self.text = text          # raw document text
        self.category = category  # class label
```

```python
# load_data reads a CSV file with "Text" and "Label" columns and returns a list of Document objects
def load_data(filename):
    data = pd.read_csv(filename)
    # One Document per row (zipping the two columns is faster than iterrows())
    return [Document(text, label) for text, label in zip(data["Text"], data["Label"])]
```
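
`load_data` assumes the CSV has a header row with columns named exactly `Text` and `Label`. The stdlib-only sketch below illustrates that layout with two hypothetical rows (not from the notebook's dataset) and shows that a quoted field may contain commas. `load_documents` is a hypothetical helper for illustration; the notebook itself reads the file with `pd.read_csv`.

```python
import csv
import io

class Document:
    """Same shape as the notebook's Document: raw text plus its category label."""
    def __init__(self, text, category):
        self.text = text
        self.category = category

# Hypothetical sample file contents with the two column names load_data expects
SAMPLE_CSV = """Text,Label
"I love this movie, it was wonderful",positive
my heart is broken,negative
"""

def load_documents(fileobj):
    """Read a CSV with 'Text' and 'Label' columns into Document objects (stdlib only)."""
    reader = csv.DictReader(fileobj)
    return [Document(row["Text"], row["Label"]) for row in reader]

docs = load_documents(io.StringIO(SAMPLE_CSV))
for d in docs:
    print(d.category, "|", d.text)
```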

# Train the model

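```python
def train_model(documents):
    # Prepare data: preprocessed texts and their labels
    X = [preprocess_text(doc.text) for doc in documents]
    y = [doc.category for doc in documents]

    # Feature extraction with TF-IDF; min_df=1 (the default) keeps every term,
    # raise it to ignore terms that appear in fewer documents
    vectorizer = TfidfVectorizer(min_df=1)
    X_features = vectorizer.fit_transform(X)

    # Train the multinomial Naive Bayes model on the TF-IDF features
    model = MultinomialNB()
    model.fit(X_features, y)

    # Return the vectorizer too: new text must be transformed with the same vocabulary
    return model, vectorizer
```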
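
`TfidfVectorizer` turns each preprocessed document into a vector of term weights. As we understand scikit-learn's defaults (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`, and a token pattern that keeps only words of two or more characters), the weight of term t in a document is its raw count multiplied by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number containing t; each document vector is then scaled to unit Euclidean length. The pure-Python sketch below (hypothetical helper `tfidf`, toy documents) reproduces that computation for illustration only; it is not the library's code. Note that `MultinomialNB` is formally a model of word counts; scikit-learn's documentation says fractional counts such as tf-idf may also work in practice, which is what `train_model` relies on.

```python
import math
import re
from collections import Counter

# Same pattern as scikit-learn's default token_pattern: tokens of 2+ word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

def tfidf(docs):
    """Return (sorted vocabulary, rows) where rows[i][term] is the tf-idf weight in docs[i].

    Assumes TfidfVectorizer-like defaults: raw counts, smoothed idf, L2-normalised rows.
    """
    tokenized = [TOKEN_RE.findall(d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df_t)) + 1 for t, df_t in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Scale to unit length (leave empty documents as an empty row)
        rows.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return sorted(df), rows

vocab, rows = tfidf(["good movie", "bad movie", "good good plot"])
print(vocab)
for row in rows:
    print({t: round(w, 3) for t, w in sorted(row.items())})
```

Rare terms ("bad", "plot") get a larger idf than terms shared by several documents ("good", "movie"), and repeated terms ("good" twice in the third document) get a larger weight within their document.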

# Predict the category of a new document using the trained model and vectorizer

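```python
def predict_category(model, vectorizer, new_doc):
    # Preprocess the new document and map it into the training TF-IDF space
    # (transform, not fit_transform, so the vocabulary learned in training is reused)
    new_doc_features = vectorizer.transform([preprocess_text(new_doc)])

    # Predict the category; predict() returns an array, so take its single element
    predicted_category = model.predict(new_doc_features)[0]
    return predicted_category
```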
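
`MultinomialNB` handles the probability arithmetic internally. The pure-Python sketch below shows the same idea with raw word counts on invented toy data; `train_multinomial_nb` and `predict_nb` are hypothetical helpers, not scikit-learn functions. It applies the naive independence assumption by adding per-word log-probabilities (summing logs instead of multiplying probabilities avoids floating-point underflow on long documents) and uses additive (Laplace) smoothing with `alpha=1.0`, which matches `MultinomialNB`'s default `alpha`, so a word never seen in a class does not force that class's probability to zero. Words outside the training vocabulary are skipped, much as the fitted vectorizer silently drops unknown words.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log class priors and Laplace-smoothed log word likelihoods from raw word counts."""
    vocab = {w for d in docs for w in d.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # word_counts[c][w] = count of word w in class c
    for d, c in zip(docs, labels):
        word_counts[c].update(d.split())
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Additive smoothing: (count + alpha) / (total + alpha * |V|)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return log_prior, log_lik

def predict_nb(log_prior, log_lik, doc):
    """Return the class maximising log P(C) + sum of log P(w | C) over known words."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in doc.split() if w in log_lik[c])
    return max(scores, key=scores.get)

# Invented toy training data
train_docs = ["i love this film", "what a wonderful day", "my heart is broken", "this is a sad failure"]
train_labels = ["positive", "positive", "negative", "negative"]
log_prior, log_lik = train_multinomial_nb(train_docs, train_labels)
print(predict_nb(log_prior, log_lik, "my love is failure"))  # negative
```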

# Evaluate the model

```python
def evaluate_model(model, vectorizer, test_data):
    # Prepare data: preprocessed texts and true labels
    X_test = [preprocess_text(doc.text) for doc in test_data]
    y_test = [doc.category for doc in test_data]

    # Reuse the fitted vectorizer (transform only)
    X_test_features = vectorizer.transform(X_test)

    # Predict categories
    predicted_categories = model.predict(X_test_features)

    # Calculate metrics; 'weighted' averages the per-class scores by class support, and
    # zero_division=0 returns 0 for an undefined per-class ratio instead of emitting a warning
    accuracy = accuracy_score(y_test, predicted_categories)
    precision = precision_score(y_test, predicted_categories, average='weighted', zero_division=0)
    recall = recall_score(y_test, predicted_categories, average='weighted', zero_division=0)

    return accuracy, precision, recall
```
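
With `average='weighted'`, scikit-learn computes precision and recall separately for each class and then averages them, weighting each class by its support (its number of true instances). The stdlib-only sketch below reproduces that calculation; `weighted_scores` is a hypothetical helper and the labels are invented. It illustrates two points. A class that is never predicted has an undefined precision (0/0), which is the case where scikit-learn warns unless `zero_division` is set. And weighted recall always equals accuracy in single-label classification: weighting each class's recall TP_c / support_c by support_c / N leaves the sum of TP_c over N, so the identical Accuracy and Recall values printed at the end of the notebook are expected, not a coincidence.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision and recall; undefined ratios count as 0."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)
        # Per-class ratios; a never-predicted class gets precision 0 (like zero_division=0)
        prec_c = tp / predicted_c if predicted_c else 0.0
        rec_c = tp / support[c] if support[c] else 0.0
        # Weight each class by its share of the true labels
        precision += prec_c * support[c] / n
        recall += rec_c * support[c] / n
    return accuracy, precision, recall

# Invented labels: everything is predicted "pos", so "neg" is never predicted
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "pos", "pos"]
acc, prec, rec = weighted_scores(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
# prints accuracy=0.60 precision=0.36 recall=0.60
```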


```python
# Load data from CSV
filename = "/content/dataset.csv"  # Replace with your actual CSV file path
documents = load_data(filename)
```


```python
# Train the model on all loaded documents (no data is held out here)
model, vectorizer = train_model(documents)
```


```python
# New document to classify
new_doc = "my heart is broken because my love is failure ."
```

```python
# Predict the class
predicted_category = predict_category(model, vectorizer, new_doc)
print("Predicted class:", predicted_category)

# For a fair evaluation, split the data into training and test sets before training,
# e.g. with scikit-learn's train_test_split
```

```
Predicted class: negative
```


```python
# Evaluate on the first 80% of the loaded documents.
# Caution: the model above was trained on all documents, so this slice is training data
# and the scores below measure fit to the training set, not performance on unseen text.
test_data = documents[:int(0.8 * len(documents))]
accuracy, precision, recall = evaluate_model(model, vectorizer, test_data)
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

```
Accuracy: 0.9
Precision: 0.8166666666666668
Recall: 0.9
```
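
For an unbiased estimate, the model should be fitted only on a training portion and scored on documents it has never seen. In this notebook that would typically mean scikit-learn's `train_test_split(documents, test_size=0.2, random_state=42)`, then `train_model(train_docs)` and `evaluate_model(model, vectorizer, test_docs)`. The dependency-free sketch below (`split_documents` is a hypothetical helper, and the string items stand in for `Document` objects) shows the core idea: shuffle with a fixed seed so the split is reproducible, then hold out a fraction. Unlike `train_test_split` with its `stratify` argument, it does not preserve class proportions, which matters for small or imbalanced datasets.

```python
import random

def split_documents(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items with a fixed seed and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

docs = [f"doc{i}" for i in range(10)]  # stand-ins for Document objects
train_docs, test_docs = split_documents(docs)
print(len(train_docs), len(test_docs))  # 8 2
```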


