📧 Spam Email Classification

The aim of this project is to flag emails that contain offensive or anti-social content and block them, which helps identify suspicious users.

Suspicious email detection is a mailing system in which suspicious users are identified by the keywords they use. Keywords such as "bomb" or "RDX" are searched for in the mails a user sends. All blocked mails are then reviewed by the administrator, who identifies the users who sent them.

The given code implements a spam email classifier using the Multinomial Naive Bayes algorithm. The training dataset is "spam.csv", which contains messages labeled as either "spam" or "ham" (not spam). The dataset is read into a pandas DataFrame and preprocessed by dropping unnecessary columns and mapping the "ham" and "spam" labels to the numerical values 0 and 1.
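
The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not the project's exact code: the column names `label`, `message`, and `unused` are assumptions, and a small in-memory CSV stands in for "spam.csv" so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for spam.csv; the real file's column names may differ.
raw_csv = io.StringIO(
    "label,message,unused\n"
    "ham,See you at lunch tomorrow,\n"
    "spam,WIN a FREE prize now,\n"
    "ham,Can you call me later,\n"
    "spam,Claim your free cash reward,\n"
)

df = pd.read_csv(raw_csv)

# Drop columns that carry no information for the classifier.
df = df.drop(columns=["unused"])

# Map the text labels to numbers: ham -> 0, spam -> 1.
df["label"] = df["label"].map({"ham": 0, "spam": 1})

print(df)
```

With the real dataset, `io.StringIO(...)` would be replaced by the path to "spam.csv", and the dropped columns would be whichever ones the file actually contains.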

The text data is then transformed into numerical features using CountVectorizer, which converts each message into a vector of word counts. The data is then split into training and testing sets using train_test_split, with a test size of 20% and a random state of 42 to ensure reproducibility.
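
As a rough sketch of this step, the snippet below vectorizes a handful of made-up messages and splits them 80/20. The messages and labels are illustrative placeholders, not rows from "spam.csv".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Made-up example messages (1 = spam, 0 = ham); not taken from spam.csv.
messages = [
    "win a free prize now",
    "free cash reward claim now",
    "urgent claim your free prize",
    "cheap loans approved today",
    "exclusive offer just for you",
    "see you at lunch tomorrow",
    "can you call me later",
    "meeting moved to tomorrow morning",
    "thanks for the notes",
    "happy birthday to your sister",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Each row of X is one message; each column counts one vocabulary word.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

print("Feature matrix shape:", X.shape)
print("Training rows:", X_train.shape[0], "Test rows:", X_test.shape[0])
```

`X` is a sparse matrix, which keeps memory use low because most messages contain only a few of the vocabulary words.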

A Multinomial Naive Bayes model is then trained on the training data using the fit method, and its accuracy is evaluated on the test data using the score method. The model achieves an accuracy of around 98%, indicating that it classifies the spam and ham messages in this dataset well.
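
Training and scoring might look like the sketch below. It reuses the same kind of made-up toy messages as a stand-in for "spam.csv", so the accuracy it prints says nothing about the 98% figure reported for the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up example messages (1 = spam, 0 = ham); not taken from spam.csv.
messages = [
    "win a free prize now",
    "free cash reward claim now",
    "urgent claim your free prize",
    "cheap loans approved today",
    "exclusive offer just for you",
    "see you at lunch tomorrow",
    "can you call me later",
    "meeting moved to tomorrow morning",
    "thanks for the notes",
    "happy birthday to your sister",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

X = CountVectorizer().fit_transform(messages)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

# Fit the classifier on the training rows only.
model = MultinomialNB()
model.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```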

Finally, a function named "result" is defined that takes a message as input, transforms it using the fitted CountVectorizer, and predicts whether it is spam using the trained model. A prediction of 1 means spam and a prediction of 0 means ham. The function then prints the corresponding output message.
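
A function of this shape could be sketched as follows. The training messages are made up, the printed wording is a placeholder rather than the project's actual output text, and returning the prediction in addition to printing it is an assumption made here so the result can be checked programmatically.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training messages (1 = spam, 0 = ham); not taken from spam.csv.
train_messages = [
    "win a free prize now",
    "free cash reward claim now",
    "urgent claim your free prize",
    "see you at lunch tomorrow",
    "can you call me later",
    "meeting moved to tomorrow morning",
]
train_labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_messages), train_labels)


def result(message):
    """Print whether a message looks like spam and return the prediction."""
    # Use transform, not fit_transform, so the training vocabulary is reused.
    features = vectorizer.transform([message])
    prediction = int(model.predict(features)[0])
    if prediction == 1:
        print("This is a spam email.")  # placeholder wording
    else:
        print("This is a ham (not spam) email.")  # placeholder wording
    return prediction


result("claim your free prize")
result("see you at lunch")
```

Note that the message is wrapped in a list before being passed to `transform`, because the vectorizer expects an iterable of documents rather than a single string.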

Overall, the code provides a simple and effective implementation of a spam email classifier using the Multinomial Naive Bayes algorithm. It can be easily adapted to other datasets and classification tasks by modifying the preprocessing steps and the choice of machine learning algorithm.

