
# **TextBlob Operations - NLP**

This notebook demonstrates TextBlob functionality for tasks such as:
1. Data preprocessing tasks (tokenization, word inflection and lemmatization, language detection and translation, spelling correction)

### **Setup in Colab Notebook**

In [0]:
import nltk
nltk.download('punkt')
nltk.download('brown')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

from textblob import TextBlob
from textblob import Word

text = "Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages. In particular, how to program computers to process and analyze large amounts of natural language data. Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation."
paragraph = TextBlob(text)
#printing the paragraph
print("\nParagraph - " + str(paragraph))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!

Paragraph - Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages. In particular, how to program computers to process and analyze large amounts of natural language data. Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.

### **Tokenization**

In [0]:
#printing all sentences
print("\nAll Sentences - " + str(paragraph.sentences))
#printing first sentence
print("\nFirst Sentence - " + str(paragraph.sentences[0]) + "\nLast Sentence - " + str(paragraph.sentences[-1]))
#printing all words
print("\nAll Words - " + str(paragraph.words))
#printing first word
print("\nFirst Word - " + paragraph.words[0] + "\nLast Word - " + paragraph.words[-1])
#printing words of first sentence
print("\nWords of One Sentence (one by one) :-")
for word in paragraph.sentences[0].words:
    print(word)


All Sentences - [Sentence("Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages."), Sentence("In particular, how to program computers to process and analyze large amounts of natural language data."), Sentence("Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.")]

First Sentence - Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages.
Last Sentence - Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.

All Words - ['Natural', 'language', 'processing', 'NLP', 'is', 'a', 'subfield', '
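
To make the idea behind `sentences` and `words` concrete, here is a rough, regex-only stand-in. It is only a sketch: TextBlob relies on NLTK tokenizers, which handle abbreviations and other edge cases that this version does not, and the helper names `split_sentences` and `split_words` are made up for illustration.

```python
import re

def split_sentences(text):
    """Naively split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(text):
    """Return runs of letters, digits and apostrophes, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)

sample = "Tokenization splits text. It yields sentences and words!"
sentences = split_sentences(sample)
print(sentences)
print(split_words(sentences[0]))
```

A sentence such as "Dr. Smith arrived." would be split after "Dr." by this version, which is one reason to prefer a trained tokenizer in practice.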

### **Noun Phrase Extraction**

In [0]:
print("All Noun Phrases - " + str(paragraph.noun_phrases))
print("\nNoun Phrases of One Sentence (one by one) :-")
for np in paragraph.sentences[0].noun_phrases:
    print(np)

All Noun Phrases - ['natural language processing', 'nlp', 'computer science', 'information engineering', 'artificial intelligence', 'program computers', 'large amounts', 'natural language data', 'challenges', 'natural language processing', 'speech recognition', 'natural language understanding', 'natural language generation']

Noun Phrases of One Sentence (one by one) :-
natural language processing
nlp
computer science
information engineering
artificial intelligence


### **Part-of-speech Tagging**

In [0]:
print("All Parts-of-speech - " + str(paragraph.tags))
print("\nParts-of-speech of One Sentence (one by one) :-")
for words, tag in paragraph.sentences[0].tags:
    print(words +" - "+ tag)

All Parts-of-speech - [('Natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('NLP', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('subfield', 'NN'), ('of', 'IN'), ('linguistics', 'NNS'), ('computer', 'NN'), ('science', 'NN'), ('information', 'NN'), ('engineering', 'NN'), ('and', 'CC'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('concerned', 'VBN'), ('with', 'IN'), ('the', 'DT'), ('interactions', 'NNS'), ('between', 'IN'), ('computers', 'NNS'), ('and', 'CC'), ('human', 'JJ'), ('natural', 'JJ'), ('languages', 'NNS'), ('In', 'IN'), ('particular', 'JJ'), ('how', 'WRB'), ('to', 'TO'), ('program', 'NN'), ('computers', 'NNS'), ('to', 'TO'), ('process', 'VB'), ('and', 'CC'), ('analyze', 'VB'), ('large', 'JJ'), ('amounts', 'NNS'), ('of', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('Challenges', 'NNS'), ('in', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('frequently', 'RB'), ('involve', 'VBP'), ('speech', 'NN'), ('recognition', 'NN'), ('natural', 'J

### **Word Inflection and Lemmatization**

In [0]:
word = paragraph.sentences[0].words[8]
print("Actual Word - " + word)
print("Singular Form - " + word.singularize())

# Alternative - construct a Word directly
word = Word("Language")
print("\nActual Word - " + word)
print("Plural Form - " + word.pluralize())

word = Word("gone")
print("\nActual Word - " + word)
# Pass "v" so WordNet treats the word as a verb; the default is noun
print("Lemmatized Form - " + word.lemmatize("v"))

Actual Word - linguistics
Singular Form - linguistic

Actual Word - Language
Plural Form - Languages

Actual Word - gone
Lemmatized Form - go


In [0]:
from textblob import Word
w = Word('Platform')
w.pluralize()

'Platforms'

In [0]:
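# Sample text used by the cells that follow
blob = TextBlob("Analytics Vidhya is a great platform to learn data science. \n It helps community through blogs, hackathons, discussions,etc.")
for word, pos in blob.tags:
    if pos == 'NN':
        print(word.pluralize())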

platforms
sciences
communities
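
For intuition about what pluralization involves, here is a deliberately small rule-based sketch. It is not TextBlob's implementation, which covers far more rules and irregular forms; `naive_pluralize` is a hypothetical helper that only handles a few regular English patterns.

```python
def naive_pluralize(word):
    """Apply a handful of regular English plural rules (no irregular forms)."""
    vowels = "aeiou"
    if word.endswith("y") and len(word) > 1 and word[-2] not in vowels:
        return word[:-1] + "ies"          # community -> communities
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"                # box -> boxes
    return word + "s"                     # platform -> platforms

for noun in ["platform", "science", "community"]:
    print(noun, "->", naive_pluralize(noun))
```

Irregular nouns such as "child" or "mouse" would come out wrong here, so treat this only as an illustration of the suffix rules.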


In [0]:
w = Word('running')
w.lemmatize("v")  # "v" tells WordNet to treat the word as a verb

'run'

# N-GRAMS

In [0]:
for ngram in blob.ngrams(2):
    print(ngram)

['Analytics', 'Vidhya']
['Vidhya', 'is']
['is', 'a']
['a', 'great']
['great', 'platform']
['platform', 'to']
['to', 'learn']
['learn', 'data']
['data', 'science']
['science', 'It']
['It', 'helps']
['helps', 'community']
['community', 'through']
['through', 'blogs']
['blogs', 'hackathons']
['hackathons', 'discussions']
['discussions', 'etc']
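
The bigrams above are simply every run of two consecutive words. A minimal pure-Python sketch of the same sliding-window idea, assuming the text has already been split into tokens (the function name `ngrams` here is my own, not the TextBlob method):

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a list."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "Analytics Vidhya is a great platform".split()
for gram in ngrams(tokens, 2):
    print(gram)
```

Passing `n=3` yields trigrams, and a token list shorter than `n` yields an empty list.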



# SENTIMENT ANALYSIS

Sentiment analysis is the process of determining the attitude or emotion of the writer, i.e., whether the text is positive, negative, or neutral.

The `sentiment` property of TextBlob returns two values: polarity and subjectivity.

Polarity is a float in the range [-1, 1], where 1 means a positive statement and -1 means a negative statement. Subjectivity is also a float, in the range [0, 1], where 0 is very objective and 1 is very subjective. Subjective sentences generally express personal opinion, emotion, or judgment, whereas objective sentences convey factual information.

Let’s check the sentiment of our blob.

In [0]:
print(blob)

Analytics Vidhya is a great platform to learn data science. 
 It helps community through blogs, hackathons, discussions,etc.


In [0]:
blob.sentiment

Sentiment(polarity=0.8, subjectivity=0.75)

The polarity is 0.8, which means the statement is positive, and the subjectivity of 0.75 indicates that it is mostly opinion rather than factual information.
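
As far as I understand, TextBlob's default analyzer is lexicon-based: it looks up opinion words and combines their scores. The sketch below shows only that core averaging idea, with a hypothetical three-word lexicon whose values are chosen for illustration; the real analyzer also handles negation, intensifiers, and a much larger word list.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity). Values are illustrative.
LEXICON = {
    "great": (0.8, 0.75),
    "terrible": (-1.0, 1.0),
    "good": (0.7, 0.6),
}

def sentiment(text):
    """Average the scores of the words that appear in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return (0.0, 0.0)  # no opinion words: neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return (polarity, subjectivity)

print(sentiment("Analytics Vidhya is a great platform to learn data science."))
```

Because "great" is the only lexicon word in the sentence, its scores become the scores of the whole sentence in this sketch.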

In [0]:
text='''The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.'''

In [0]:
blob1=TextBlob(text)

In [0]:
print(blob1)

The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.


In [0]:
blob1.sentiment

Sentiment(polarity=-0.1590909090909091, subjectivity=0.6931818181818182)

# SPELLING CORRECTION

In [0]:
blob = TextBlob('Analytics Vidhya is a gret platfrm to learn data scence')
blob.correct()

TextBlob("Analytics Vidhya is a great platform to learn data science")
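
The TextBlob documentation credits Peter Norvig's "How to Write a Spelling Corrector" as the basis for `correct()`. The sketch below illustrates that general approach (generate all strings one edit away, keep the known ones, pick the most frequent). The vocabulary and its counts are hypothetical, and this is not TextBlob's actual code.

```python
import string

# Hypothetical vocabulary with made-up frequency counts.
WORD_COUNTS = {"great": 50, "platform": 20, "science": 30, "data": 40, "grit": 5}

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Keep known words; otherwise pick the most frequent one-edit candidate."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print([correct(w) for w in ["gret", "platfrm", "scence", "data"]])
```

Words with no known candidate, such as a proper noun missing from the vocabulary, are returned unchanged.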

# CREATING A SHORT SUMMARY OF TEXT

In [0]:
import random

In [0]:
blob = TextBlob('Analytics Vidhya is a thriving community for data driven industry. This platform allows \
people to know more about analytics from its articles, Q&A forum, and learning paths. Also, we help \
professionals & amateurs to sharpen their skillsets by providing a platform to participate in Hackathons.')

In [0]:
nouns = []
for word, tag in blob.tags:
    if tag == 'NN':
        nouns.append(word.lemmatize())

print("This text is about...")
for item in random.sample(nouns, 5):
    word = Word(item)
    print(word.pluralize())


**********************************************************************
  Resource wordnet not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('wordnet')
  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************



MissingCorpusError: ignored

In [0]:
nouns = list()
for word, tag in blob.tags:
    if tag == 'NN':
        print(word, word.lemmatize())

community community
industry industry
platform platform
forum forum
platform platform


In [0]:
type(nouns)

list
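
Sampling nouns at random gives a different "summary" on every run. One deterministic alternative is to report the most frequent nouns instead. The sketch below assumes `(word, tag)` pairs shaped like `blob.tags`; the pairs are written by hand for illustration and are not real tagger output, and `top_nouns` is a hypothetical helper.

```python
from collections import Counter

# Hand-written (word, tag) pairs standing in for blob.tags; not real tagger output.
tagged = [("community", "NN"), ("thriving", "JJ"), ("industry", "NN"),
          ("platform", "NN"), ("forum", "NN"), ("platform", "NN")]

def top_nouns(tagged_words, k):
    """Return the k most frequent singular nouns (tag 'NN')."""
    counts = Counter(word for word, tag in tagged_words if tag == "NN")
    return [word for word, _ in counts.most_common(k)]

print(top_nouns(tagged, 3))
```

With this input "platform" ranks first because it is the only noun that occurs twice.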

# TRANSLATION AND LANGUAGE DETECTION

In [0]:
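# detect_language() and translate() were deprecated in TextBlob 0.16.0 and removed in later releases
blob.detect_language()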

'en'

In [0]:
blob.translate(from_lang='en', to='ar')

TextBlob("تحليلات Vidhya هو مجتمع مزدهر لصناعة تعتمد على البيانات. تتيح هذه المنصة للأشخاص معرفة المزيد عن التحليلات من مقالاتها ومنتدى الأسئلة والأجوبة ومسارات التعلم. أيضا ، نحن نساعد المحترفين والهواة على شحذ مهاراتهم من خلال توفير منصة للمشاركة في Hackathons.")

In [0]:
blob.translate(to='hi')

TextBlob("एनालिटिक्स विधा डेटा संचालित उद्योग के लिए एक संपन्न समुदाय है। यह मंच लोगों को अपने लेखों, क्यू एंड ए फोरम, और सीखने के रास्तों से विश्लेषण के बारे में अधिक जानने की अनुमति देता है। इसके अलावा, हम पेशेवरों और एमेच्योर को हैकथॉन में भाग लेने के लिए एक मंच प्रदान करके अपने कौशल को तेज करने में मदद करते हैं।")

# TEXT CLASSIFICATION USING TextBlob

In [0]:
training = [
('Tom Holland is a terrible spiderman.','pos'),
('a terrible Javert (Russell Crowe) ruined Les Miserables for me...','pos'),
('The Dark Knight Rises is the greatest superhero movie ever!','neg'),
('Fantastic Four should have never been made.','pos'),
('Wes Anderson is my favorite director!','neg'),
('Captain America 2 is pretty awesome.','neg'),
('Let\'s pretend "Batman and Robin" never happened..','pos'),
]
testing = [
('Superman was never an interesting character.','pos'),
('Fantastic Mr Fox is an awesome film!','neg'),
('Dragonball Evolution is simply terrible!!','pos')
]

In [0]:
from textblob import classifiers

In [0]:
classifier = classifiers.NaiveBayesClassifier(training)

As you can see above, we passed the training data to the classifier.

We used a Naive Bayes classifier here, but TextBlob also offers a decision tree classifier, as shown below.

In [0]:
dt_classifier = classifiers.DecisionTreeClassifier(training)

In [0]:
print(classifier.accuracy(testing))
classifier.show_informative_features(3)

1.0
Most Informative Features
            contains(is) = True              neg : pos    =      2.9 : 1.0
             contains(a) = False             neg : pos    =      1.8 : 1.0
      contains(terrible) = False             neg : pos    =      1.8 : 1.0
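
The ratios in this output can be reproduced by hand. The sketch below assumes boolean `contains(word)` features and add-0.5 (expected likelihood) smoothing over two outcomes, which is my understanding of the default in the NLTK Naive Bayes trainer that TextBlob wraps. The regex tokenizer and the helper names are stand-ins, not TextBlob code.

```python
import re

training = [
    ('Tom Holland is a terrible spiderman.', 'pos'),
    ('a terrible Javert (Russell Crowe) ruined Les Miserables for me...', 'pos'),
    ('The Dark Knight Rises is the greatest superhero movie ever!', 'neg'),
    ('Fantastic Four should have never been made.', 'pos'),
    ('Wes Anderson is my favorite director!', 'neg'),
    ('Captain America 2 is pretty awesome.', 'neg'),
    ('Let\'s pretend "Batman and Robin" never happened..', 'pos'),
]

def tokens(text):
    """Case-sensitive set of word tokens (a stand-in for the real tokenizer)."""
    return set(re.findall(r"[A-Za-z0-9']+", text))

def likelihood(word, present, label):
    """P(contains(word) == present | label) with add-0.5 smoothing, 2 outcomes."""
    docs = [tokens(text) for text, lab in training if lab == label]
    count = sum((word in doc) == present for doc in docs)
    return (count + 0.5) / (len(docs) + 1.0)

def ratio(word, present):
    """How much more likely the feature value is under 'neg' than under 'pos'."""
    return likelihood(word, present, "neg") / likelihood(word, present, "pos")

print(round(ratio("is", True), 1))
print(round(ratio("terrible", False), 1))
```

For example, "is" appears in all three `neg` sentences but only one of the four `pos` sentences, which gives (3.5 / 4) / (1.5 / 5) under this smoothing.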


In [0]:
blob = TextBlob('the weather is terrible!', classifier=classifier)
print(blob.classify())

neg


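On this toy dataset the classifier labels the new sentence `neg`. With only seven training sentences the prediction rests on a handful of word features, such as the presence of "is" listed among the informative features above, so treat it as an illustration rather than a dependable result.

Note that we could have done some preprocessing and data cleaning here, but my aim was to give you an intuition for how text classification works with TextBlob.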