# <center> Applications in Natural Language Processing </center>

Natural Language Processing (NLP) is concerned with interaction between humans and 
machines through language. It helps computers read, interpret, and understand human 
language so that machines can carry out repetitive, high-volume tasks. It is the field of 
Artificial Intelligence (AI) that makes human language intelligible to machines by 
combining linguistics and computer science to study the rules and structure of language 
and build intelligent systems.

### Problem Statement:
NLP lets us analyse a given text programmatically. The steps involved in such an analysis are 
tokenization, preprocessing each word, and finally vectorising the result. One of the 
most common and easiest-to-implement vectorisation algorithms is Bag-of-Words (BoW). Using BoW, 
with NLTK for preprocessing, implement a simple spam filter that marks spam texts as dangerous.

### Procedure:
- Importing Libraries
- Data Preparation
- Text Preprocessing
- Creating Bag-of-Words (BoW) Vectors
- Training the Model
- Classifying a New Message


### Importing Libraries

In [3]:
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

### Data Preparation:
Prepare the data with both spam and non-spam (ham) messages.

In [4]:

messages = [
    ("Win a free iPhone now!", "spam"),
    ("Hey, how's it going?", "ham"),
    ("Congratulations, you've won a prize!", "spam"),
    ("Can we meet tomorrow?", "ham"),
]


### Text Preprocessing

In [7]:
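# Tokenizer models and the stopword list used below.
# Newer NLTK releases may additionally need nltk.download('punkt_tab').
nltk.download('punkt')
nltk.download('stopwords')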

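def preprocess_text(text):
    # Tokenization (lowercased so matching is case-insensitive)
    words = word_tokenize(text.lower())

    # Build the stopword set once; a set lookup avoids rescanning the list for every word
    stop_words = set(stopwords.words('english'))

    # Keep alphabetic tokens only (drops punctuation and numbers), then remove stopwords
    words = [word for word in words if word.isalpha() and word not in stop_words]

    return ' '.join(words)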

# Preprocess messages
preprocessed_messages = [(preprocess_text(text), label) for text, label in messages]


[nltk_data] Downloading package punkt to /usr/share/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
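
To make the preprocessing step concrete without any NLTK downloads, here is a minimal standard-library sketch of the same idea: lowercase, keep alphabetic tokens, drop stopwords. The names `TOY_STOPWORDS` and `simple_preprocess` are hypothetical, the stopword set is a tiny hand-picked assumption, and the regular expression is only a crude stand-in for `word_tokenize` (it splits contractions differently), so results will not always match the NLTK version.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# NLTK's English list is much longer.
TOY_STOPWORDS = {"a", "it", "we", "can", "you", "now", "how"}


def simple_preprocess(text, stop_words=TOY_STOPWORDS):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    # [a-z]+ keeps letters only, so punctuation and digits disappear here
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(token for token in tokens if token not in stop_words)


print(simple_preprocess("Win a free iPhone now!"))  # win free iphone
print(simple_preprocess("Can we meet tomorrow?"))   # meet tomorrow
```

Note that digits and symbols such as `$1000` are discarded by this kind of filter, so they carry no signal for the classifier.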


### Creating Bag-of-Words (BoW) Vectors:
Convert the preprocessed text data into BoW vectors using 'CountVectorizer' from scikit-learn.

In [9]:
# Create the BoW vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([message for message, label in preprocessed_messages])
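
As a rough illustration of what a BoW vector is, the sketch below builds a vocabulary and per-document word counts by hand using only the standard library. The function name `bag_of_words` and the toy documents are assumptions made for this example; `CountVectorizer` does more (its own tokenization, sparse output), so this is a simplified picture rather than a description of its internals.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, count_vectors) for whitespace-tokenized documents."""
    tokenized = [doc.split() for doc in documents]
    # A sorted vocabulary gives every word a stable column index
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One column per vocabulary word; absent words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = ["free prize free", "meet tomorrow"]  # toy inputs, not the notebook's data
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['free', 'meet', 'prize', 'tomorrow']
print(vectors)  # [[2, 0, 1, 0], [0, 1, 0, 1]]
```

Each row is one message and each column is one vocabulary word; word order within a message is lost, which is why the representation is called a "bag".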

### Training the Model:
Train a Naive Bayes classifier using the BoW vectors.

In [10]:
# Prepare labels
y = [label for message, label in preprocessed_messages]

# Train a Naive Bayes Classifier
classifier = MultinomialNB()
classifier.fit(X, y)
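
To show what a multinomial Naive Bayes classifier computes, here is a small from-scratch sketch with add-one (Laplace) smoothing, using only the standard library. The helpers `train_nb` and `predict_nb` and the toy token lists are hypothetical and chosen for illustration; this is not scikit-learn's implementation, only the same idea in miniature.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, alpha=1.0):
    """samples: list of (tokens, label). Returns (log_prior, log_likelihood)."""
    vocab = sorted({t for tokens, _ in samples for t in tokens})
    doc_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    for tokens, label in samples:
        word_counts[label].update(tokens)

    # Class prior: fraction of training documents in each class
    log_prior = {c: math.log(n / len(samples)) for c, n in doc_counts.items()}
    log_likelihood = {}
    for c in doc_counts:
        # Smoothed estimate: (count + alpha) / (class total + alpha * |vocab|)
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict_nb(tokens, log_prior, log_likelihood):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        # Words unseen in training are skipped, much as a fitted vectorizer ignores them
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in tokens if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy token lists (assumed for illustration, not actual NLTK output)
train = [
    (["win", "free", "iphone"], "spam"),
    (["hey", "going"], "ham"),
    (["congratulations", "prize"], "spam"),
    (["meet", "tomorrow"], "ham"),
]
log_prior, log_likelihood = train_nb(train)
print(predict_nb(["congratulations", "free"], log_prior, log_likelihood))  # spam
print(predict_nb(["meet"], log_prior, log_likelihood))                     # ham
```

Smoothing matters with so little data: without it, any word that never appeared in a class would give that class a probability of zero.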

### Classify a new message

In [12]:
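def classify_message(text):
    # Apply the same preprocessing used on the training messages
    preprocessed_text = preprocess_text(text)
    # Reuse the fitted vectorizer (transform, not fit_transform) so columns match training
    vectorized_text = vectorizer.transform([preprocessed_text])
    prediction = classifier.predict(vectorized_text)
    # predict returns an array; take the single label
    return prediction[0]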

### Test the classifier

In [13]:
# Test the classifier
new_message = "Congratulations, you've won $1000!"
classification = classify_message(new_message)
if classification == "spam":
    print("This message is marked as spam.")
else:
    print("This message is not spam.")


This message is marked as spam.
