<a href="https://colab.research.google.com/github/aravindb212/NLP/blob/main/Textblob_Functionalities.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is TextBlob?

* TextBlob is an open-source Python library for common NLP tasks such as lemmatization, stemming, tokenization, noun phrase extraction, POS tagging, n-grams, and sentiment analysis.

* It is simpler to use than NLTK, on which it is built; however, it does not provide functionality such as vectorization or dependency parsing.

* Text classification and sentiment analysis can be performed with TextBlob.
* Official documentation: https://textblob.readthedocs.io/en/dev/

* Installation: pip install textblob

In [None]:
### Install Textblob
!pip install nltk
!pip install textblob



In [None]:
import nltk
nltk.download('popular')

[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/cmudict.zip.
[nltk_data]    | Downloading package gazetteers to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gazetteers.zip.
[nltk_data]    | Downloading package genesis to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/genesis.zip.
[nltk_data]    | Downloading package gutenberg to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gutenberg.zip.
[nltk_data]    | Downloading package inaugural to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/inaugural.zip.
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping corpora/movie_reviews.zip.
[nltk_data]    | Downloading package names to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/names.zip.
[nltk_data]    | Downloading package shakespeare to /root/nltk_data...
[nlt

True

In [None]:
import nltk
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

### Functionalities of TextBlob
* Language Detection
* Spelling Correction
* Word Count
* Phrase Extraction
* POS Tagging
* Tokenization
* Pluralization of words using TextBlob
* Lemmatization using TextBlob
* n-grams in TextBlob

#### Language Detection
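* TextBlob detects the language of the input text by calling Google Translate, so this feature needs an internet connection.
* TextBlob can also translate text from one language to another.
* Both methods were deprecated in TextBlob 0.16.0 and removed in 0.18.0, so the cell below only runs on older versions.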

In [67]:
from textblob import TextBlob

blob = TextBlob("Hey John, How are You")

print("Detected Language is:",blob.detect_language())

print("Input text in Spanish:",blob.translate(to='es'))


**Note:**
Google has changed its API, and TextBlob still uses the older one, so you may get a 404 error. To avoid this, change the URL in translate.py in your environment.


Updated URL:
url = "http://translate.google.com/translate_a/t?client=te&format=html&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&dt=at&ie=UTF-8&oe=UTF-8&otf=2&ssel=0&tsel=0&kc=1"


Location of the translate.py file: C:\Users\<user name>\Anaconda3\envs\Rython\Lib\site-packages\textblob\translate.py



#### Spelling Correction

In [30]:
from textblob import TextBlob
text=""" ABCD Corp alays values
 ttheir employees!!!"""

In [18]:
print(text)

 ABCD Corp alays values ttheir employees!!!


In [19]:
blob=TextBlob(text)

In [20]:
blob

TextBlob(" ABCD Corp alays values ttheir employees!!!")

In [21]:
blob.correct()

TextBlob(" ABCD For always values their employees!!!")

In [23]:
TextBlob('hasss').correct()

TextBlob("has")

In [31]:
# Sometimes the correction fails as well: 'ur' becomes 'or' rather than 'your' or 'our'
TextBlob('ur').correct()

TextBlob("or")
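According to the TextBlob documentation, correct() is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that idea in a heavily simplified form and is not TextBlob's implementation. The frequency table WORD_FREQ and the helper names edits1 and correct_word are made up for illustration, and only candidates exactly one edit away are considered.

```python
import string

# Toy word frequencies (hypothetical numbers); a real corrector learns
# these counts from a large text corpus.
WORD_FREQ = {"always": 50, "their": 120, "values": 30,
             "or": 900, "our": 400, "has": 300}


def edits1(word):
    """Return all strings that are one edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word):
    """Pick the most frequent known word within one edit of `word`."""
    word = word.lower()
    if word in WORD_FREQ:
        return word
    candidates = [w for w in edits1(word) if w in WORD_FREQ]
    if not candidates:
        return word  # nothing better known, keep the input
    return max(candidates, key=WORD_FREQ.get)


print(correct_word("alays"))   # always
print(correct_word("ttheir"))  # their
print(correct_word("ur"))      # or
```

In this toy table "or" is more frequent than "our", so "ur" is corrected to "or". A frequency-based corrector picks the most common nearby word, which is not always the word the writer meant.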

#### Word Count
With word_counts, we can count how often a word occurs in a given text. The keys are stored in lowercase, which is why the capitalized lookups below return 0.

In [24]:
text="Sentiment Analysis is a process by which we can find the sentiment of a text. Sentiment can be Positive, Negative or Neutral"

In [25]:
blob=TextBlob(text)

In [26]:
blob.word_counts["analysis"]

1

In [40]:
blob.word_counts["Sentiment"]

0

In [28]:
blob.word_counts["sentiment"]

3

In [29]:
blob.word_counts["Analysis"]

0
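The zero counts for "Sentiment" and "Analysis" come from case handling: the words are lowercased before counting, so only lowercase keys exist. A library-free sketch of the same behaviour, using collections.Counter as a stand-in for word_counts (the regular expression is a simplifying assumption, not TextBlob's tokenizer):

```python
import re
from collections import Counter

text = ("Sentiment Analysis is a process by which we can find the sentiment "
        "of a text. Sentiment can be Positive, Negative or Neutral")

# Lowercase first, then split into words, so that "Sentiment" and
# "sentiment" are counted under the same key.
counts = Counter(re.findall(r"[a-z']+", text.lower()))

print(counts["sentiment"])          # 3
print(counts["Sentiment"])          # 0, no key contains an uppercase letter
print(counts["Sentiment".lower()])  # 3, lowercase the query before looking it up
```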

#### POS Tagging
With the tags property of TextBlob, we can label each word of a sentence with its part of speech, such as noun, pronoun, verb, adverb, or adjective.

In [41]:
from textblob import TextBlob

text = TextBlob("My name is Adam. I like to read about NLP. I work at ABCD Corp.")
print(text.tags)


[('My', 'PRP$'), ('name', 'NN'), ('is', 'VBZ'), ('Adam', 'NNP'), ('I', 'PRP'), ('like', 'VBP'), ('to', 'TO'), ('read', 'VB'), ('about', 'IN'), ('NLP', 'NNP'), ('I', 'PRP'), ('work', 'VBP'), ('at', 'IN'), ('ABCD', 'NNP'), ('Corp', 'NNP')]


In [42]:
# Keep every (word, tag) pair except those tagged VBP
new_tuple = []
for i in text.tags:
    print(i)
    if i[1] != 'VBP':
        new_tuple.append(i)

('My', 'PRP$')
('name', 'NN')
('is', 'VBZ')
('Adam', 'NNP')
('I', 'PRP')
('like', 'VBP')
('to', 'TO')
('read', 'VB')
('about', 'IN')
('NLP', 'NNP')
('I', 'PRP')
('work', 'VBP')
('at', 'IN')
('ABCD', 'NNP')
('Corp', 'NNP')


In [43]:
new_tuple

[('My', 'PRP$'),
 ('name', 'NN'),
 ('is', 'VBZ'),
 ('Adam', 'NNP'),
 ('I', 'PRP'),
 ('to', 'TO'),
 ('read', 'VB'),
 ('about', 'IN'),
 ('NLP', 'NNP'),
 ('I', 'PRP'),
 ('at', 'IN'),
 ('ABCD', 'NNP'),
 ('Corp', 'NNP')]

In [44]:
# Join the remaining words back into a single string
value = " ".join(word for word, tag in new_tuple)

In [45]:
value

'My name is Adam I to read about NLP I at ABCD Corp'
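The tags printed by text.tags follow the Penn Treebank tag set. As a reading aid, here is a small glossary for the tags that occur in the output above. The names TAG_NAMES and describe are hypothetical helpers, not part of TextBlob.

```python
# Short descriptions for the Penn Treebank tags seen in this section.
TAG_NAMES = {
    "PRP$": "possessive pronoun",
    "PRP": "personal pronoun",
    "NN": "noun, singular",
    "NNP": "proper noun, singular",
    "VB": "verb, base form",
    "VBP": "verb, present tense, not 3rd person singular",
    "VBZ": "verb, present tense, 3rd person singular",
    "TO": "to",
    "IN": "preposition or subordinating conjunction",
}


def describe(tagged_words):
    """Return (word, tag, description) triples; unknown tags get '?'."""
    return [(word, tag, TAG_NAMES.get(tag, "?")) for word, tag in tagged_words]


# A few pairs copied from the tagger output shown earlier.
sample = [("My", "PRP$"), ("name", "NN"), ("is", "VBZ"), ("Adam", "NNP")]
for word, tag, meaning in describe(sample):
    print(f"{word:5} {tag:5} {meaning}")
```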

#### Tokenization

* Corpus (plural: corpora) - A corpus is a collection of text data. The text may be in one language or in a combination of two or more.

* Token - A token is a single occurrence of a word or other unit in a text; the token count of a text or corpus is its total number of words, regardless of how often each one occurs. Tokens are strings of contiguous characters that lie between two spaces, or between a space and a punctuation mark. For example, if you split the string "abc_123_defg" on underscores "_", you obtain three tokens: "abc", "123", and "defg".

**What is tokenization?**

Tokenization is the process of splitting a sentence or corpus into its smallest units, i.e. tokens.
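Before using TextBlob, the idea can be sketched with the standard library alone. The regular expressions below are simplifying assumptions; real tokenizers also handle abbreviations, hyphens, and other edge cases.

```python
import re

# Splitting on a fixed delimiter, as in the underscore example.
print("abc_123_defg".split("_"))  # ['abc', '123', 'defg']

text = "It is free. It runs on Windows, Unix, and macOS!"

# Word tokens: runs of letters, digits, or apostrophes; punctuation is dropped.
words = re.findall(r"[A-Za-z0-9']+", text)

# Sentence tokens: split after ., ! or ? when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

print(words)
print(len(words), len(sentences))  # 10 2
```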

In [46]:
text="""
R is a comprehensive statistical and graphical programming language, which is fast gaining popularity among data analysts. It is free and runs on a variety of platforms, including Windows, Unix, and macOS. It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.
"""

In [47]:
blob_object = TextBlob(text)

In [48]:
# Word tokenization of the sample corpus
corpus_words = blob_object.words

In [49]:
corpus_words

WordList(['R', 'is', 'a', 'comprehensive', 'statistical', 'and', 'graphical', 'programming', 'language', 'which', 'is', 'fast', 'gaining', 'popularity', 'among', 'data', 'analysts', 'It', 'is', 'free', 'and', 'runs', 'on', 'a', 'variety', 'of', 'platforms', 'including', 'Windows', 'Unix', 'and', 'macOS', 'It', 'provides', 'an', 'unparalleled', 'platform', 'for', 'programming', 'new', 'statistical', 'methods', 'in', 'an', 'easy', 'and', 'straightforward', 'manner'])

In [50]:
print(len(corpus_words))

48


In [51]:
# Sentence tokenization of the sample corpus
corpus_sentences = blob_object.sentences

In [52]:
corpus_sentences

[Sentence("
 R is a comprehensive statistical and graphical programming language, which is fast gaining popularity among data analysts."),
 Sentence("It is free and runs on a variety of platforms, including Windows, Unix, and macOS."),
 Sentence("It provides an unparalleled platform for programming new statistical methods in an easy and straightforward manner.")]

In [None]:
print(len(corpus_sentences))

3


#### Pluralization of words using Textblob

In [None]:
from textblob import Word
w = Word('Platform')
w.pluralize()

'Platforms'

In [57]:
from textblob import Word
# pluralize() does not check whether the word is already plural
w = Word('Platforms')
w.pluralize()

'Platformss'

In [53]:
blob = TextBlob("Great Learning is a great platform to learn data science. \n It helps community through blogs, Youtube, GLA,etc.")
# Pluralize every word tagged as a singular common noun (NN)
for word, pos in blob.tags:
    if pos == 'NN':
        print(word.pluralize())

platforms
sciences
communities
etcs


#### Lemmatization using Textblob

In [None]:
blob = TextBlob("Great Learning is a great platform to learn data science. \n It helps community through blogs, Youtube, GLA,etc.")
words = blob.words

for word in words:
    print("ORIGINAL:", word, "| LEMMA:", word.lemmatize(), "| STEM:", word.stem())

ORIGINAL: Great | LEMMA: Great | STEM: great
ORIGINAL: Learning | LEMMA: Learning | STEM: learn
ORIGINAL: is | LEMMA: is | STEM: is
ORIGINAL: a | LEMMA: a | STEM: a
ORIGINAL: great | LEMMA: great | STEM: great
ORIGINAL: platform | LEMMA: platform | STEM: platform
ORIGINAL: to | LEMMA: to | STEM: to
ORIGINAL: learn | LEMMA: learn | STEM: learn
ORIGINAL: data | LEMMA: data | STEM: data
ORIGINAL: science | LEMMA: science | STEM: scienc
ORIGINAL: It | LEMMA: It | STEM: it
ORIGINAL: helps | LEMMA: help | STEM: help
ORIGINAL: community | LEMMA: community | STEM: commun
ORIGINAL: through | LEMMA: through | STEM: through
ORIGINAL: blogs | LEMMA: blog | STEM: blog
ORIGINAL: Youtube | LEMMA: Youtube | STEM: youtub
ORIGINAL: GLA | LEMMA: GLA | STEM: gla
ORIGINAL: etc | LEMMA: etc | STEM: etc


In [58]:
w = Word('learning')
w.lemmatize("n")  # "n" here represents noun

'learning'

In [59]:
w = Word('learning')
w.lemmatize("v")  # "v" here represents verb

'learn'

In [63]:
w = Word('peoples')
w.lemmatize("n")  # "n" here represents noun

'people'
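The "n" and "v" arguments are WordNet part-of-speech codes, and the result depends on which one is passed, as 'learning' shows. One way to choose the code automatically is to derive it from the Penn Treebank tag produced in the POS tagging section. The helper below is a hypothetical sketch of such a mapping (the name penn_to_wordnet_pos is an assumption, and the first-letter rule is a common simplification):

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'r' or 'n')."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback


for tag in ["VBG", "NNS", "JJ", "RB", "IN"]:
    print(tag, "->", penn_to_wordnet_pos(tag))
```

The returned code could then be passed on for each (word, tag) pair in blob.tags, for example word.lemmatize(penn_to_wordnet_pos(tag)).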

#### n-gram in Textblob

An n-gram is a sequence of n consecutive tokens (here, words): a 2-gram (more commonly called a bigram) is a two-word sequence such as “really good”, “not good”, or “your homework”, and a 3-gram (more commonly called a trigram) is a three-word sequence such as “not at all” or “turn off light”.
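The definition amounts to sliding a window of n words across the text. A minimal library-free sketch (the function name ngrams here is our own plain-Python helper, not the TextBlob method):

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = "Great Learning is a great platform".split()

print(ngrams(words, 2))
print(len(ngrams(words, 3)))  # 4
```

A text of k words yields k - n + 1 n-grams, so the six words above give five bigrams and four trigrams; if n exceeds k the result is an empty list.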

In [64]:
blob

TextBlob("Great Learning is a great platform to learn data science. 
 It helps community through blogs, Youtube, GLA,etc.")

In [65]:
blob.ngrams(n=1)

[WordList(['Great']),
 WordList(['Learning']),
 WordList(['is']),
 WordList(['a']),
 WordList(['great']),
 WordList(['platform']),
 WordList(['to']),
 WordList(['learn']),
 WordList(['data']),
 WordList(['science']),
 WordList(['It']),
 WordList(['helps']),
 WordList(['community']),
 WordList(['through']),
 WordList(['blogs']),
 WordList(['Youtube']),
 WordList(['GLA']),
 WordList(['etc'])]

In [66]:
blob.ngrams(n=2)

[WordList(['Great', 'Learning']),
 WordList(['Learning', 'is']),
 WordList(['is', 'a']),
 WordList(['a', 'great']),
 WordList(['great', 'platform']),
 WordList(['platform', 'to']),
 WordList(['to', 'learn']),
 WordList(['learn', 'data']),
 WordList(['data', 'science']),
 WordList(['science', 'It']),
 WordList(['It', 'helps']),
 WordList(['helps', 'community']),
 WordList(['community', 'through']),
 WordList(['through', 'blogs']),
 WordList(['blogs', 'Youtube']),
 WordList(['Youtube', 'GLA']),
 WordList(['GLA', 'etc'])]

In [None]:
blob.ngrams(n=3)

In [67]:
blob.ngrams(n=4)

[WordList(['Great', 'Learning', 'is', 'a']),
 WordList(['Learning', 'is', 'a', 'great']),
 WordList(['is', 'a', 'great', 'platform']),
 WordList(['a', 'great', 'platform', 'to']),
 WordList(['great', 'platform', 'to', 'learn']),
 WordList(['platform', 'to', 'learn', 'data']),
 WordList(['to', 'learn', 'data', 'science']),
 WordList(['learn', 'data', 'science', 'It']),
 WordList(['data', 'science', 'It', 'helps']),
 WordList(['science', 'It', 'helps', 'community']),
 WordList(['It', 'helps', 'community', 'through']),
 WordList(['helps', 'community', 'through', 'blogs']),
 WordList(['community', 'through', 'blogs', 'Youtube']),
 WordList(['through', 'blogs', 'Youtube', 'GLA']),
 WordList(['blogs', 'Youtube', 'GLA', 'etc'])]