<a href="https://colab.research.google.com/github/KODURISRIHARI/Text_Blob_NLP_Tool/blob/main/Text_Blob_NLP_Library.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import textblob

In [2]:
from textblob import TextBlob

# TextBlob
### Overview
- TextBlob is a Python library built on top of NLTK (Natural Language Toolkit).
- It simplifies common NLP tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.

Key Sentiment Properties:
- Polarity: a float from -1.0 (negative) to 1.0 (positive).
- Subjectivity: a float from 0.0 (objective) to 1.0 (subjective).

In [4]:
txb = TextBlob("Python is a high-level, general-purpose programming language.")


In [9]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

### Sentiment Analysis


In [18]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")
testimonial.sentiment

Sentiment(polarity=0.39166666666666666, subjectivity=0.4357142857142857)

In [19]:
testimonial.sentiment.polarity

0.39166666666666666

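TextBlob also lets you plug in a different sentiment analyzer. For instance, the NaiveBayesAnalyzer, a Naive Bayes classifier trained on the NLTK movie reviews corpus, returns its result as a namedtuple of the form Sentiment(classification, p_pos, p_neg).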

In [25]:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
nltk.download('movie_reviews')


blob = TextBlob("I love this library", analyzer=NaiveBayesAnalyzer())
blob.sentiment

[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.


Sentiment(classification='pos', p_pos=0.7996209910191279, p_neg=0.2003790089808724)

## Tokenization
Break a TextBlob into words or sentences.

In [20]:
zen = TextBlob("Beautiful is better than ugly. "
                "Explicit is better than implicit. "
                "Simple is better than complex.")
zen.words

WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [21]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]

### TabTokenizer()
- If the text is separated by tab characters (\t), use TabTokenizer to split it at each tab.

In [26]:
from textblob import TextBlob
from nltk.tokenize import TabTokenizer

tokenizer = TabTokenizer()
blob = TextBlob("This is\ta rather tabby\tblob.", tokenizer=tokenizer)
blob.tokens
#WordList(['This is', 'a rather tabby', 'blob.'])

WordList(['This is', 'a rather tabby', 'blob.'])
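TabTokenizer behaves like a plain string split on "\t". The check below compares it with str.split on the same text. A consequence of the plain split is that two consecutive tabs produce an empty-string token.

```python
from nltk.tokenize import TabTokenizer

text = "This is\ta rather tabby\tblob."

# TabTokenizer splits on every tab character, like str.split("\t").
print(TabTokenizer().tokenize(text))
print(text.split("\t"))

# Consecutive tabs leave an empty token between them.
print(TabTokenizer().tokenize("a\t\tb"))
```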

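### BlanklineTokenizer()
- If the text is separated by blank lines (e.g. \n\n), use BlanklineTokenizer to split it into those blocks. A tokenizer can also be passed to a blob's tokenize() method instead of the TextBlob constructor.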

In [30]:
from textblob import TextBlob
from nltk.tokenize import BlanklineTokenizer

tokenizer = BlanklineTokenizer()
blob = TextBlob("A token\n\nof appreciation")
blob.tokenize(tokenizer)

WordList(['A token', 'of appreciation'])
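Only a blank line counts as a boundary; a single newline stays inside a token. The sketch uses a sample string written for this example and also shows a standard-library regex split that appears to match BlanklineTokenizer's pattern. Treat that equivalence as an assumption checked on this input, not a guarantee.

```python
import re
from nltk.tokenize import BlanklineTokenizer

text = "First line\nstill the first paragraph\n\n  Second paragraph"

# A single newline is not a boundary; a blank line (plus surrounding whitespace) is.
print(BlanklineTokenizer().tokenize(text))

# Roughly the same split with the standard library (assumed equivalent pattern),
# dropping any empty pieces.
print([piece for piece in re.split(r"\s*\n\s*\n\s*", text) if piece])
```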

### Sentence objects have the same properties and methods as TextBlobs.



In [23]:
for sentence in zen.sentences:
  print(sentence.sentiment)

Sentiment(polarity=0.2166666666666667, subjectivity=0.8333333333333334)
Sentiment(polarity=0.5, subjectivity=0.5)
Sentiment(polarity=0.06666666666666667, subjectivity=0.41904761904761906)


# Word Inflection and Lemmatization

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str) with useful methods, e.g. for word inflection.

In [32]:
sentence = TextBlob('Use 4 spaces per indentation level.')
print(sentence.words)
print(sentence.words[2].singularize())
print(sentence.words[-1].pluralize())

['Use', '4', 'spaces', 'per', 'indentation', 'level']
space
levels


Words can be lemmatized by calling the lemmatize method.

In [35]:
from textblob import Word
import nltk
nltk.download('wordnet')

w = Word("octopi")
print(w.lemmatize())

w = Word("went")
print(w.lemmatize("v")) # Pass in WordNet part of speech (verb)

octopus
go


[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


# WordNet Integration
- You can access the synsets for a Word via the synsets property or the get_synsets() method, optionally passing in a part of speech.

In [38]:
from textblob import Word
from textblob.wordnet import VERB

word1 = Word("octopus")
print(word1.synsets)
print(Word("hack").get_synsets(pos=VERB))

[Synset('octopus.n.01'), Synset('octopus.n.02')]
[Synset('chop.v.05'), Synset('hack.v.02'), Synset('hack.v.03'), Synset('hack.v.04'), Synset('hack.v.05'), Synset('hack.v.06'), Synset('hack.v.07'), Synset('hack.v.08')]


You can access the definitions for each synset via the definitions property or the define() method, which can also take an optional part-of-speech argument.

In [40]:
Word("octopus").definitions

['tentacles of octopus prepared as food',
 'bottom-living cephalopod having a soft oval body with eight long tentacles']

In [42]:
#You can also create synsets directly
from textblob.wordnet import Synset


octopus = Synset('octopus.n.02')
shrimp = Synset('shrimp.n.03')
octopus.path_similarity(shrimp)

0.1111111111111111

## WordLists
A WordList is just a Python list with additional methods.

In [43]:
animals = TextBlob("cat dog octopus")
print(animals.words)
print(animals.words.pluralize())

['cat', 'dog', 'octopus']
['cats', 'dogs', 'octopodes']


# Spelling Correction
- Use the correct() method to attempt spelling correction.

In [48]:
b = TextBlob("I havv goood speling!")
print(b.correct())
b = TextBlob("I havv goood spailing!")
print(b.correct())
b = TextBlob("I havv goood spiling!")
print(b.correct())
b = TextBlob("I havv goood spoling!")
print(b.correct())

I have good spelling!
I have good sailing!
I have good smiling!
I have good spoiling!


Word objects have a Word.spellcheck() method that returns a list of (word, confidence) tuples with spelling suggestions.

In [57]:
from textblob import Word

b = Word("spoling!")
print(b.spellcheck())
print("\n")
b = Word("spealing!")
print(b.spellcheck())
print("\n")
b = Word("spailing!")
print(b.spellcheck())
print("\n")
b = Word("spiling!")
print(b.spellcheck())

[('spoiling', 0.875), ('sporing', 0.125)]


[('speaking', 0.925), ('sealing', 0.04), ('spelling', 0.02), ('stealing', 0.015)]


[('sailing', 0.6190476190476191), ('spoiling', 0.3333333333333333), ('sailings', 0.047619047619047616)]


[('smiling', 0.8655913978494624), ('sailing', 0.06989247311827956), ('spoiling', 0.03763440860215054), ('piling', 0.010752688172043012), ('spiking', 0.005376344086021506), ('soiling', 0.005376344086021506), ('sailings', 0.005376344086021506)]


Spelling correction is based on Peter Norvig’s “How to Write a Spelling Corrector”, as implemented in the pattern library. It is about 70% accurate.
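Norvig's idea is to generate every string one edit away from the input, and then two edits away if needed. An edit is a delete, transpose, replace, or insert. Among the candidates found in a word-frequency table, the most frequent one is chosen. The sketch below is a standalone illustration of that approach, not TextBlob's or pattern's actual code. The tiny WORD_COUNTS table is hypothetical; a real corrector counts words in a large corpus. The hypothetical suggest() helper normalizes candidate frequencies into confidences, in the spirit of the (word, confidence) pairs shown above.

```python
import string
from collections import Counter

# Hypothetical toy frequency table; a real corrector counts words in a large corpus.
WORD_COUNTS = Counter({"spelling": 50, "smiling": 40, "sailing": 30,
                       "spoiling": 20, "have": 500, "good": 300})
LETTERS = string.ascii_lowercase


def edits1(word):
    """Every string one edit away: a delete, transpose, replace, or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Every string two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words):
    """Keep only candidates that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def suggest(word):
    """(candidate, confidence) pairs, with frequencies normalized over the candidates."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    total = sum(WORD_COUNTS[w] for w in candidates) or 1
    return sorted(((w, WORD_COUNTS[w] / total) for w in candidates),
                  key=lambda pair: pair[1], reverse=True)


def correct(word):
    """Return the most likely candidate."""
    return suggest(word)[0][0]


for typo in ["havv", "goood", "speling", "spiling"]:
    print(typo, "->", correct(typo), suggest(typo))
```

The ordered fallback (the word itself, then one edit, then two edits) is what keeps the search cheap for most typos. The two-edit set is only built when no one-edit candidate is known.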

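## Get Word and Noun Phrase Frequencies
There are two ways to get the frequency of a word or noun phrase in a TextBlob.

The first is through the word_counts dictionary. Words are lowercased before counting, so look them up in lowercase; words that do not appear have a frequency of 0.

The second is the count() method, where you can specify whether or not the search should be case-sensitive (default is False).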

In [63]:
monty = TextBlob("We are no longer the Knights who say Ni. "
                     "We are now the Knights who say Ekki ekki ekki PTANG.")
print(monty.word_counts['we'])
print(monty.words.count('ekki', case_sensitive=True))
print(monty.words.count('ekki', case_sensitive=False))

2
2
3


In [64]:
import nltk
nltk.download('brown')

print(monty.noun_phrases.count('ekki'))

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.


1


In [65]:
# The second way is to use the count() method.

monty.words.count('ekki')

3

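# Parsing
- Use the parse() method to parse the text. By default, TextBlob uses the pattern library's parser, which annotates each token as word/part-of-speech/chunk/prepositional-noun-phrase.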



In [66]:
b = TextBlob("And now for something completely different.")
print(b.parse())

And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
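Each space-separated token in the parse string above has four slash-separated fields:

- the word
- its part-of-speech tag
- its chunk tag
- its prepositional-noun-phrase (PNP) tag

The chunk and PNP tags use IOB prefixes: B- begins a chunk, I- continues it, and O is outside any chunk. This sketch splits the output string copied from the cell above. The naive split assumes no token itself contains a slash.

```python
# Parse string copied from the output above (word/POS/chunk/PNP per token).
parsed = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP "
          "completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O")

# Split each token into its four fields (assumes no token contains a slash).
tokens = [tuple(token.split("/")) for token in parsed.split()]
for word, pos, chunk, pnp in tokens:
    print(f"{word:<12}{pos:<5}{chunk:<8}{pnp}")
```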


# n-grams
The TextBlob.ngrams() method returns a list of WordList objects, each holding n successive words.

In [67]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]
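The idea behind n-grams fits in one line of plain Python: zip together n staggered views of the word list. The hypothetical ngrams() helper below is an illustration, not TextBlob's implementation, and it returns tuples rather than WordList objects. With n=3 on the same words it gives the same three groups as the cell above.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a tuple (illustrative sketch)."""
    # zip(words[0:], words[1:], ..., words[n-1:]) stops at the shortest view.
    return list(zip(*(words[i:] for i in range(n))))


words = ["Now", "is", "better", "than", "never"]
print(ngrams(words, 3))
print(ngrams(words, 2))
print(ngrams(words, 6))  # more than len(words): no n-grams at all
```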