# TextBlob
TextBlob is a Python library for common natural language processing (NLP) tasks such as part-of-speech tagging, sentiment analysis, tokenization, and spelling correction. See the [TextBlob documentation](https://textblob.readthedocs.io/en/dev/) for details.

In [59]:
%%capture
# Install textblob
!pip install -U textblob


In [60]:
from textblob import TextBlob


## Corpora

In [61]:
%%capture
# Download corpora
!python -m textblob.download_corpora


In [None]:
import transformers
import sentencepiece
import nltk
nltk.download('omw-1.4')


## TextBlobs

In [63]:
my_blob = TextBlob("There is more than one way to skin a cat.")


In [64]:
my_blob


## Tagging Parts of Speech
A list of the different part-of-speech tags can be found [here](https://www.geeksforgeeks.org/python-part-of-speech-tagging-using-textblob/).

code | meaning | example
--- | --- | ---
CC | coordinating conjunction |
CD | cardinal number |
DT | determiner |
EX | existential there | “there is” (think of it as “there exists”)
FW | foreign word |
IN | preposition/subordinating conjunction |
JJ | adjective | ‘big’
JJR | adjective, comparative | ‘bigger’
JJS | adjective, superlative | ‘biggest’
LS | list marker | 1)
MD | modal | could, will
NN | noun, singular | ‘desk’
NNS | noun, plural | ‘desks’
NNP | proper noun, singular | ‘Harrison’
NNPS | proper noun, plural | ‘Americans’
PDT | predeterminer | ‘all the kids’
POS | possessive ending | parent’s
PRP | personal pronoun | I, he, she
PRP\$ | possessive pronoun | my, his, hers
RB | adverb | very, silently
RBR | adverb, comparative | better
RBS | adverb, superlative | best
RP | particle | give up
TO | to | go ‘to’ the store
UH | interjection | errrrrrrrm
VB | verb, base form | take
VBD | verb, past tense | took
VBG | verb, gerund/present participle | taking
VBN | verb, past participle | taken
VBP | verb, non-3rd person singular present | take
VBZ | verb, 3rd person singular present | takes
WDT | wh-determiner | which
WP | wh-pronoun | who, what
WP\$ | possessive wh-pronoun | whose
WRB | wh-adverb | where, when






In [65]:
# Use the .tags attribute to see parts of speech
my_blob.tags
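
The `.tags` attribute returns a list of `(word, tag)` tuples, so the tag table can be used to filter words by part of speech. The sketch below works on hand-written pairs for the example sentence (an illustration, not actual tagger output), and `words_with_tag_prefix` is a hypothetical helper name.

```python
# Hand-written (word, tag) pairs in the shape that .tags returns.
# These are illustrative assumptions, not real tagger output.
example_tags = [
    ("There", "EX"), ("is", "VBZ"), ("more", "JJR"), ("than", "IN"),
    ("one", "CD"), ("way", "NN"), ("to", "TO"), ("skin", "VB"),
    ("a", "DT"), ("cat", "NN"),
]

def words_with_tag_prefix(tagged_words, prefix):
    """Return words whose tag starts with `prefix` ('NN' matches NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]

nouns = words_with_tag_prefix(example_tags, "NN")
verbs = words_with_tag_prefix(example_tags, "VB")
print(nouns)  # ['way', 'cat']
print(verbs)  # ['is', 'skin']
```

Matching on a prefix is a convenience: it groups all noun tags or all verb tags without listing each variant.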


## Sentiment Analysis

Sentiment analysis can be used to understand the feeling or emotion tied to the text. The sentiment attribute in TextBlob will return two values:
1. The **polarity score** (a float between -1.0 and 1.0). -1 is negative, 1 is positive.
2. The **subjectivity** (a float between 0.0 and 1.0). 0 is very objective, while 1 is very subjective.
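
To make the two scores easier to read, they can be mapped to labels. The sketch below is one possible mapping; the function name `describe_sentiment` and both thresholds are assumptions chosen for illustration and are not defined by TextBlob.

```python
def describe_sentiment(polarity, subjectivity, neutral_band=0.05, subjective_cutoff=0.5):
    """Turn a (polarity, subjectivity) pair into a readable label.

    neutral_band and subjective_cutoff are arbitrary illustrative
    thresholds; TextBlob itself does not define them.
    """
    if polarity > neutral_band:
        tone = "positive"
    elif polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "subjective" if subjectivity >= subjective_cutoff else "objective"
    return f"{tone}, {stance}"

print(describe_sentiment(-0.6, 0.9))  # negative, subjective
print(describe_sentiment(0.4, 0.2))   # positive, objective
print(describe_sentiment(0.0, 0.0))   # neutral, objective
```

Because `.sentiment` is a named tuple of `(polarity, subjectivity)`, a call such as `describe_sentiment(*some_blob.sentiment)` should work with any TextBlob.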

In [66]:
neg_blob = TextBlob("I am so tired. Today was a long, hard day.")
neg_blob.sentiment


In [67]:
pos_blob = TextBlob("Today was a great day. I am so happy.")
pos_blob.sentiment


In [68]:
obj_blob = TextBlob("The cat is gray.")
obj_blob.sentiment


In [69]:
subj_blob = TextBlob("The cat is so cute and sweet.")
print(subj_blob.sentiment)
print(subj_blob.sentiment.subjectivity) # Only get the subjectivity


Sentiment analysis of multiple sentences

In [70]:
my_poem = TextBlob('''
  Python is a great language to learn.
  You can easily do NLP; it's fab.
  It might take some getting used to.
  But it's definitely more gooder than Matlab.
''')


In [71]:
my_poem

In [72]:
my_poem.sentiment


In [73]:
my_poem.sentences


In [74]:
for sentence in my_poem.sentences:
  print(sentence.sentiment)


### Your Turn
Create three TextBlobs with the following sentiments:
1. Negative, subjective
2. Positive, objective
3. Neutral


In [75]:
# Solution 1
text_ns = "It's a cruddy day."
neg_sub = TextBlob(text_ns)
neg_sub.sentiment


In [76]:
# Solution 1 (another attempt)
text_ns = "Hitler was a terrible man."
neg_sub = TextBlob(text_ns)
neg_sub.sentiment


In [77]:
# Solution 2
text_po = "Bill is a nice guy. He won the race."
pos_obj = TextBlob(text_po)
pos_obj.sentiment
# No luck: this attempt did not score as both positive and objective


In [78]:
# Solution 2 (another attempt)
text_po = "My best friend had a baby boy."
pos_obj = TextBlob(text_po)
pos_obj.sentiment
# No luck: this attempt did not score as both positive and objective


In [79]:
# Solution 3
text_n = "One plus one is two."
neut = TextBlob(text_n)
neut.sentiment


## Tokenization
Tokenization is the process of splitting long strings of text into small pieces (tokens).
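
As a rough illustration of the idea, the sketch below splits text into sentences and words using only regular expressions. It is a simplified stand-in, not how TextBlob tokenizes: real tokenizers handle abbreviations, contractions, and punctuation more carefully. The function names are hypothetical.

```python
import re

def split_sentences(text):
    """Naively split after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def split_words(text):
    """Keep runs of letters, digits, and apostrophes; drop other punctuation."""
    return re.findall(r"[A-Za-z0-9']+", text)

text = "Python is great. You can easily do NLP; it's fab."
print(split_sentences(text))
# ['Python is great.', "You can easily do NLP; it's fab."]
print(split_words(text))
# ['Python', 'is', 'great', 'You', 'can', 'easily', 'do', 'NLP', "it's", 'fab']
```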

In [80]:
my_poem.sentences


In [81]:
my_poem.sentences[0].words


In [82]:
my_poem.words


In [83]:
sorted(my_poem.word_counts.items(), key = lambda x: x[1], reverse=True)

## Singular & Plural

In [84]:
my_sent = TextBlob("The octopi went swimming in the dark ocean waters.")


In [85]:
my_sent.words

In [86]:
# Singularize
my_sent.words[-1].singularize()


In [87]:
my_sent.words[1].singularize()


In [88]:
# Pluralize
my_sent.words[-2].pluralize()


In [89]:
foo = my_sent.words[-2]
foo == foo.singularize()

In [90]:
TextBlob("corpus").words.singularize(), TextBlob("corpus").words.pluralize()

In [91]:
my_sent.words[2:5]

In [92]:
import numpy as np

np.array(my_sent.words)

## Stemming & Lemmatization

Stemming is the process of stripping affixes (mostly suffixes) from a word, leaving only the word “stem”, which may not be a real word. Lemmatization is similar, but instead of chopping off characters it maps the word to its dictionary form (the lemma), so the result is always a valid word.
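
The difference can be seen in a toy sketch: a stemmer applies string rules, while a lemmatizer looks words up. Both functions and the tiny `TOY_LEMMAS` dictionary below are hypothetical simplifications, not the algorithms TextBlob uses.

```python
def toy_stem(word, suffixes=("ing", "ed", "es", "s")):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary mapping inflected forms to lemmas.
TOY_LEMMAS = {"caring": "care", "went": "go", "waters": "water"}

def toy_lemmatize(word):
    """Look the word up; fall back to the word itself when it is unknown."""
    return TOY_LEMMAS.get(word, word)

for w in ["caring", "went", "waters"]:
    print(w, toy_stem(w), toy_lemmatize(w))
# caring car care
# went went go
# waters water water
```

Note how the rule-based stem of "caring" is "car", which changes the meaning, while the lookup returns "care".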

In [93]:
my_sent


In [94]:
# Find the index of 'swimming'
my_sent.words.index('swimming')


In [95]:
# Stemming
print(my_sent.words[3].stem())
print(my_sent.words[1].stem())


In [96]:
# Lemmatization
print(my_sent.words[3].lemmatize())
print(my_sent.words[1].lemmatize())


In [97]:
care = TextBlob("caring")

(
  care.words.stem(),
  care.words.lemmatize()
)


## WordNet

In [98]:
my_sent

In [99]:
{ my_sent.words[-2] : my_sent.words[-2].definitions }


In [100]:
{"swimming", "tennis"} - set(my_sent.words)

## Spelling Correction

In [101]:
my_bad_spelling = TextBlob('Helllo, today is my birfday.')
my_bad_spelling.correct()


## Counting Words

In [102]:
my_cheer = TextBlob('Data science is the best, data science is the coolest.')
my_cheer.words.count('data')


In [103]:
my_cheer.word_counts
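
For comparison, a similar count can be sketched with the standard library alone. This assumes a simple regex tokenizer and lowercases the text first so that "Data" and "data" are counted together; it is an approximation of the idea, not TextBlob's implementation.

```python
from collections import Counter
import re

text = "Data science is the best, data science is the coolest."

# Lowercase first so 'Data' and 'data' land in the same bucket.
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)

print(counts["data"])  # 2
print(counts["best"])  # 1

# Top three words, ties broken alphabetically for a stable order.
top_three = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_three)  # [('data', 2), ('is', 2), ('science', 2)]
```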


### Your Turn
1. Create a TextBlob called `message` and set it equal to `Good morning, todayy is going to be a fantastic day!`.
2. Correct the spelling in your TextBlob and set it equal to a new variable called `message_sp`.
3. Find the index of the word `fantastic`.
4. Look up the definition of the word `fantastic`.
5. Stem and lemmatize the word `fantastic`.

In [104]:
# Solution 1
message = TextBlob("Good morning, todayy is going to be a fantastic day!")
message

In [105]:
# Solution 2
message_sp = message.correct()
message_sp

In [106]:
list(zip(message.words, message_sp.words ))

In [107]:
[ (i,t) for i, t in enumerate(zip(message.words, message_sp.words )) if t[0] != t[1] ]

In [108]:
# Solution 3
(
    message.index("fantastic"),
    message.words.index("fantastic")
)

In [109]:
# Solution 4
TextBlob("fantastic").words[0].definitions
message.words[ message.words.index("fantastic") ].definitions

In [110]:
# Solution 5
fan = message.words[ message.words.index("fantastic") ]
(
  fan.stem(),
  fan.lemmatize()
)

## TextBlobs as Strings
TextBlobs behave like Python strings: you can slice and index them and call common string methods such as `upper()` and `lower()`.

In [111]:
my_cheer


In [112]:
my_cheer[0:6]


In [113]:
my_cheer.upper()


In [114]:
my_cheer.lower()


## **n**-grams
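An n-gram is a sequence of *n* consecutive words. Sliding a window of *n* words across the text one word at a time produces a list of overlapping n-grams.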
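
The sliding window can be sketched in a few lines of plain Python. The function name `ngrams` here is a standalone helper written for illustration, separate from the TextBlob method of the same name.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a list of lists."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]

words = "Data science is the best".split()
for gram in ngrams(words, 3):
    print(" ".join(gram))
# Data science is
# science is the
# is the best
```

A text with `k` words yields `k - n + 1` n-grams, or none when `n` is larger than `k`.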

In [115]:
my_cheer


In [116]:
my_cheer.words

In [117]:
my_cheer.ngrams(n=3)


In [118]:
[ " ".join(i) for i in my_cheer.ngrams(n=3) ]


In [119]:
my_cheer.split(",")


In [120]:
my_cheer.words


In [121]:
[ " ".join(i) for i in TextBlob("italian pop rock").ngrams(n=2) ]


## Translation


In [122]:
# %%capture
!pip install googletrans==3.1.0a0 transformers sacremoses


### Google translate

In [123]:
from googletrans import Translator


In [124]:
translator = Translator()


In [125]:
result = translator.translate(
    'Hello, how are you?',
    src='en',
    dest='es',
)
print(result.text)


In [126]:
result = translator.translate(
    'Hello, how are you?',
    src='en',
    dest='fr',
)
print(result.text)


In [127]:
result = translator.translate(
    'Hello, how are you?',
    src='en',
    dest='ar',
)
print(result.text)


In [128]:
result = translator.translate(
    'Hello, how are you?',
    src='en',
    dest='de',
)
print(result.text)


All four languages in one loop:

In [129]:
langs = 'es fr ar de'.split()

for lang in langs:
  result = translator.translate(
      'Hello, how are you?',
      src='en',
      dest=lang,
  )
  print(result.text)


### Hugging Face Transformers (via pre-trained models)


In [130]:
from transformers import MarianMTModel, MarianTokenizer


In [131]:
# Load pre-trained MarianMT model
model_name = 'Helsinki-NLP/opus-mt-en-es'
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=True )

text = "Hello, how are you?"
translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
result = tokenizer.decode(translated[0], skip_special_tokens=True)

print(result)


In [None]:
# Load pre-trained MarianMT model
model_name = 'Helsinki-NLP/opus-mt-en-fr'
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=True)

text = "Hello, how are you?"
translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
result = tokenizer.decode(translated[0], skip_special_tokens=True)

print(result)


In [None]:
# Load pre-trained MarianMT model
model_name = 'Helsinki-NLP/opus-mt-en-ar'
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=True)

text = "Hello, how are you?"
translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
result = tokenizer.decode(translated[0], skip_special_tokens=True)

print(result)


In [None]:
# Load pre-trained MarianMT model
model_name = 'Helsinki-NLP/opus-mt-en-de'
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name, clean_up_tokenization_spaces=True)

text = "Hello, how are you?"
translated = model.generate(**tokenizer(text, return_tensors="pt", padding=True))
result = tokenizer.decode(translated[0], skip_special_tokens=True)

print(result)
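
The four MarianMT cells differ only in the language code at the end of the model name, so they could be folded into one helper. The sketch below is one way to do that; `marian_model_name` and `translate` are hypothetical names, and the naming pattern is only assumed to hold for the language pairs used in this notebook (not every pair has a published checkpoint).

```python
def marian_model_name(target_lang, source_lang="en"):
    """Build a checkpoint name following the pattern used in this notebook."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

def translate(text, target_lang, source_lang="en"):
    """Translate `text` with a pre-trained MarianMT model.

    Downloads the model on first use, so this needs network access.
    """
    # Imported here so the name helper works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    model_name = marian_model_name(target_lang, source_lang)
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(text, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

print(marian_model_name("es"))  # Helsinki-NLP/opus-mt-en-es

# Example call (commented out because it downloads a model per language):
# for lang in "es fr ar de".split():
#     print(lang, translate("Hello, how are you?", lang))
```

Loading the model inside `translate` keeps the sketch short but reloads it on every call; caching the model and tokenizer per language would be a natural next step.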
