This code imports the NLTK and spaCy libraries and prints their versions.
It is a basic setup for natural language processing tasks in Python.

### **Import NLTK library**

In [1]:

import nltk
print(nltk.__version__)

3.8.1


### **Import SpaCy library**

In [2]:
import spacy
print(spacy.__version__)

3.7.5


## **Sentence tokenization**

In [3]:

import nltk
nltk.download('punkt') # Download the Punkt tokenizer models
from nltk.tokenize import sent_tokenize  # Import the sent_tokenize function from the nltk.tokenize module

# Let us use a quote by Benjamin Franklin as an example
doc = """Tell me and I forget. Teach me and I remember. Involve me and I learn."""

# Split the text into sentences
sent_tokenize(doc)

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\aswathyr\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


['Tell me and I forget.',
 'Teach me and I remember.',
 'Involve me and I learn.']

## **Word tokenization**

In [4]:
from nltk.tokenize import word_tokenize
# Let us use a simple sentence here
doc = """How are you doing?"""
word_tokenize(doc) # Tokenize the text into words

['How', 'are', 'you', 'doing', '?']

## 3.8.2 SpaCy

### **Tokenization**

In [5]:
import spacy
# Download the English model if not already installed
#!python -m spacy download en_core_web_sm 
nlp = spacy.load('en_core_web_sm')

# Create a Doc object
doc = nlp('How are you doing?')

# Print each token separately
for token in doc:
    print(token.text)

How
are
you
doing
?


In [6]:
import spacy
nlp = spacy.load('en_core_web_sm')

# Create a Doc object
doc = nlp('Tell me and I forget. Teach me and I remember. Involve me and I learn.')

# Print each sentence separately
for sent in doc.sents:
    print(sent.text)

Tell me and I forget.
Teach me and I remember.
Involve me and I learn.


## **Stemming**

In [7]:
from nltk.stem import PorterStemmer
porter = PorterStemmer()
print(porter.stem("connecting"))


connect


In [8]:
import spacy

# Load the English language model in Spacy
nlp = spacy.load("en_core_web_sm")
# Word to lemmatize
word = "connecting"
# Create a single-token document
doc = nlp(word)
# spaCy does not provide a stemmer; the lemma is its closest equivalent
lemma = doc[0].lemma_
print("Lemma:", lemma)

Lemma: connect


In [9]:
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer
porter = PorterStemmer()  # Porter algorithm: relatively conservative suffix stripping
lancaster = LancasterStemmer()  # Lancaster algorithm: more aggressive suffix stripping
print(porter.stem("caring"))
print(lancaster.stem("caring"))

care
car
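
The difference between the two stemmers is easier to see side by side. The following is a minimal sketch, assuming NLTK is installed; the word list is an illustrative choice (only "connecting" and "caring" come from the cells above), and no output is shown because the stems for the extra words were not run here.

```python
from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

# "running" and "studies" are illustrative additions, not from the original cells
words = ["connecting", "caring", "running", "studies"]

# Print each word next to its Porter and Lancaster stems
print(f"{'word':<12} {'porter':<10} lancaster")
for word in words:
    print(f"{word:<12} {porter.stem(word):<10} {lancaster.stem(word)}")
```

Neither stemmer needs a model download, so this cell runs without any call to `nltk.download`.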


In [10]:
import spacy

# Load the English language model in Spacy
nlp = spacy.load("en_core_web_sm")

# Word to lemmatize
word = "caring"

# Create a single-token document
doc = nlp(word)

# spaCy does not provide a stemmer; the lemma is its closest equivalent.
# The '-PRON-' placeholder existed only in spaCy v2, so no special case is needed in v3.
lemma = doc[0].lemma_

# Print the results
print("Original Word:", word)
print("Lemma:", lemma)

Original Word: caring
Lemma: care


## **Stop words**

In [11]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

sentence = "What is NLP?" # Example sentence
words = word_tokenize(sentence) # Tokenize the sentence into words
stop_words = set(stopwords.words('english')) # Get the English stopwords from NLTK
filtered_words = [word for word in words if word.lower() not in stop_words] # Remove stopwords from the sentence
print("Words after removing stopwords:", filtered_words) # Print the filtered words


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\aswathyr\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


Words after removing stopwords: ['NLP', '?']


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\aswathyr\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [12]:
import spacy

nlp = spacy.load('en_core_web_sm') # Load the English language model in spaCy
sentence = "What is NLP?" # Example sentence
doc = nlp(sentence) # Tokenize the sentence using spaCy
filtered_words = [token.text for token in doc if not token.is_stop] # Remove stopwords from the sentence
print("Words after removing stopwords:", filtered_words) # Print the filtered words


Words after removing stopwords: ['NLP', '?']


## Example 3.8.1

### **Sentiment Analysis**

Sentiment analysis is a sub-field of Natural Language Processing (NLP) that tries to identify and extract the attitudes, sentiments, evaluations, and emotions within a given text. It helps determine whether data is positive, negative, or neutral. It has many applications in healthcare, customer service, banking, etc.

In Python, this can be implemented using VADER (Valence Aware Dictionary and sEntiment Reasoner), which is available in the NLTK package. VADER is a simple rule-based model for sentiment analysis that handles informal vocabulary (slang), abbreviations, capitalization, repeated punctuation, emoticons, etc. It has the advantage of assessing the sentiment of any given text without prior training.

The result generated by VADER is a dictionary with four keys: neg, neu, pos, and compound. Here neg, neu, and pos are the proportions of the text rated negative, neutral, and positive, respectively; their sum should equal 1, or be very close to it because of floating-point rounding. The compound score is the sum of all the lexicon ratings, normalized to lie between -1 (most extreme negative) and +1 (most extreme positive). The compound score alone is often enough to determine the underlying sentiment of a text, using these thresholds:

* positive sentiment: compound ≥ 0.05
* negative sentiment: compound ≤ -0.05
* neutral sentiment: -0.05 < compound < 0.05
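
The thresholds above can be sketched as a small helper. This is a minimal illustration, not part of NLTK: `label_from_compound` and its `threshold` parameter are hypothetical names, and the score dictionaries are copied from the VADER output shown in this example so that the snippet runs without downloading the lexicon.

```python
import math

def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label (hypothetical helper, not an NLTK function)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Score dictionaries copied from the VADER output in this example
scores = [
    {"neg": 0.55, "neu": 0.45, "pos": 0.0, "compound": -0.7269},
    {"neg": 0.0, "neu": 0.291, "pos": 0.709, "compound": 0.8087},
]

for s in scores:
    # neg, neu, and pos are proportions, so they should sum to 1 up to rounding
    total = s["neg"] + s["neu"] + s["pos"]
    print(label_from_compound(s["compound"]), math.isclose(total, 1.0, abs_tol=1e-3))
# → negative True
# → positive True
```

In practice the dictionary would come from `SentimentIntensityAnalyzer().polarity_scores(text)` rather than being typed in by hand.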


In [13]:
import nltk

# Download the lexicon
nltk.download("vader_lexicon")

# Import the analyzer class
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Create an instance of SentimentIntensityAnalyzer
sent_analyzer = SentimentIntensityAnalyzer()

def sentiment_analyzer_scores(sentence):
    score = sent_analyzer.polarity_scores(sentence)
    print("{:-<40} {}".format(sentence, str(score)))

sentiment_analyzer_scores("The hotel stay was horrible and uncomfortable.")
sentiment_analyzer_scores("Always :) and be :D !")

The hotel stay was horrible and uncomfortable. {'neg': 0.55, 'neu': 0.45, 'pos': 0.0, 'compound': -0.7269}
Always :) and be :D !------------------- {'neg': 0.0, 'neu': 0.291, 'pos': 0.709, 'compound': 0.8087}


[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\aswathyr\AppData\Roaming\nltk_data...
