# **TextBlob**

TextBlob is a Python library for common natural language processing (NLP) tasks, built on top of NLTK and pattern. The sections below walk through its key methods and properties:

In [1]:
!pip install textblob



In [2]:
import textblob
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [3]:
str = '''SHOULD WE COLONISE SPACE?

        In conclusion, I return to Einstein. If we find a planet in the
        Alpha Centauri system, its image, captured by a camera travelling
        at a fifth of light speed, will be slightly distorted due to the
        effects of special relativity. It would be the first time a
        spacecraft has flown fast enough to see such effects. In fact,
        Einstein’s theory is central to the whole mission. Without it
        we would have neither lasers nor the ability to perform the
        calculations necessary for guidance, imaging and data transmission
        over twenty-five trillion miles at a fifth of light speed. We can
        see a pathway between that sixteen-year-old boy dreaming of riding
        on a light beam and our own dream, which we are planning to turn
        into a reality, of riding our own light beam to the stars. We are
        standing at the threshold of a new era. Human colonisation on other
        planets is no longer science fiction. It can be science fact. The
        human race has existed as a separate species for about two million
        years. Civilisation began about 10,000 years ago, and the rate of
        development has been steadily increasing. If humanity is to continue
        for another million years, our future lies in boldly going where no one
        else has gone before. I hope for the best. I have to. We have no other
        option. The era of civilian space travel is coming. What do you think
        it means to us? I look forward to space travel. I would be one of the
        first to buy a ticket. I expect that within the next hundred years we
        will be able to travel anywhere in the solar system, except maybe the
        outer planets. But travel to the stars will take a bit longer. I reckon
        in 500 years, we will have visited some of the nearby stars. It won’t be
        like Star Trek . We won’t be able to travel at warp speed. So a round trip
        will take at least ten years and probably much longer.

        From the book: Brief Answers to the Big Questions – Stephen Hawking
'''

## **1. Tokenization**
   - `.words`: Splits the text into individual words, dropping punctuation.
   - `.sentences`: Splits the text into a list of `Sentence` objects, one per sentence.
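
To make the two splits concrete, here is a rough plain-Python approximation using regular expressions. It is only a sketch of the idea: the helper names `split_words` and `split_sentences` are hypothetical, and TextBlob relies on NLTK's tokenizers, which handle abbreviations and other edge cases that these patterns do not.

```python
import re

def split_words(text):
    # Keep runs of letters, digits, apostrophes and hyphens; drop other punctuation.
    return re.findall(r"[A-Za-z0-9'-]+", text)

def split_sentences(text):
    # Split on whitespace that follows ".", "?" or "!".
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

sample = "Should we colonise space? I hope for the best. I have to."
words = split_words(sample)
sentences = split_sentences(sample)

print(words)
print(sentences)
```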



In [4]:
blob = textblob.TextBlob(str)
a = blob.words
print(a)

['SHOULD', 'WE', 'COLONISE', 'SPACE', 'In', 'conclusion', 'I', 'return', 'to', 'Einstein', 'If', 'we', 'find', 'a', 'planet', 'in', 'the', 'Alpha', 'Centauri', 'system', 'its', 'image', 'captured', 'by', 'a', 'camera', 'travelling', 'at', 'a', 'fifth', 'of', 'light', 'speed', 'will', 'be', 'slightly', 'distorted', 'due', 'to', 'the', 'effects', 'of', 'special', 'relativity', 'It', 'would', 'be', 'the', 'first', 'time', 'a', 'spacecraft', 'has', 'flown', 'fast', 'enough', 'to', 'see', 'such', 'effects', 'In', 'fact', 'Einstein', '’', 's', 'theory', 'is', 'central', 'to', 'the', 'whole', 'mission', 'Without', 'it', 'we', 'would', 'have', 'neither', 'lasers', 'nor', 'the', 'ability', 'to', 'perform', 'the', 'calculations', 'necessary', 'for', 'guidance', 'imaging', 'and', 'data', 'transmission', 'over', 'twenty-five', 'trillion', 'miles', 'at', 'a', 'fifth', 'of', 'light', 'speed', 'We', 'can', 'see', 'a', 'pathway', 'between', 'that', 'sixteen-year-old', 'boy', 'dreaming', 'of', 'riding'

In [5]:
a = blob.sentences
print(a)

[Sentence("SHOULD WE COLONISE SPACE?"), Sentence("In conclusion, I return to Einstein."), Sentence("If we find a planet in the
        Alpha Centauri system, its image, captured by a camera travelling
        at a fifth of light speed, will be slightly distorted due to the
        effects of special relativity."), Sentence("It would be the first time a
        spacecraft has flown fast enough to see such effects."), Sentence("In fact,
        Einstein’s theory is central to the whole mission."), Sentence("Without it
        we would have neither lasers nor the ability to perform the
        calculations necessary for guidance, imaging and data transmission
        over twenty-five trillion miles at a fifth of light speed."), Sentence("We can
        see a pathway between that sixteen-year-old boy dreaming of riding
        on a light beam and our own dream, which we are planning to turn
        into a reality, of riding our own light beam to the stars."), Sentence("We are
        stand

## **2. N-grams**

The `.ngrams(n)` method in **TextBlob** generates **n-grams** from the text, where an n-gram is a contiguous sequence of `n` items (typically words) from a given text.

### How It Works:
- A **unigram** (1-gram) contains individual words.
- A **bigram** (2-gram) contains pairs of consecutive words.
- A **trigram** (3-gram) contains sequences of three consecutive words, and so on.

### Usage:

You can use `.ngrams(n)` for:
- **Text analysis**: Capture relationships between adjacent words.
- **Feature extraction**: Create features for machine learning models based on sequences of words.
- **Language modeling**: Identify common phrases or word patterns in a corpus.

You can adjust `n` to get trigrams (`n=3`), four-grams (`n=4`), and so on depending on how long the sequences need to be.
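
The sliding-window idea behind n-grams can be sketched in a few lines of plain Python. This is an illustration of the concept, not TextBlob's own code: the `ngrams` function and the sample sentence are assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    # Slide a window of width n across the token list, one step at a time.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "we will travel to the stars and we will travel far".split()
bigrams = ngrams(tokens, 2)

# Counting repeated n-grams is one simple way to surface common phrases.
bigram_counts = Counter(tuple(gram) for gram in bigrams)

print(bigrams[:3])
print(bigram_counts[("we", "will")])
```

A text with fewer than `n` tokens yields an empty list, because no full window fits.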

In [6]:
a = blob.ngrams(n=4)
print(a)

[WordList(['SHOULD', 'WE', 'COLONISE', 'SPACE']), WordList(['WE', 'COLONISE', 'SPACE', 'In']), WordList(['COLONISE', 'SPACE', 'In', 'conclusion']), WordList(['SPACE', 'In', 'conclusion', 'I']), WordList(['In', 'conclusion', 'I', 'return']), WordList(['conclusion', 'I', 'return', 'to']), WordList(['I', 'return', 'to', 'Einstein']), WordList(['return', 'to', 'Einstein', 'If']), WordList(['to', 'Einstein', 'If', 'we']), WordList(['Einstein', 'If', 'we', 'find']), WordList(['If', 'we', 'find', 'a']), WordList(['we', 'find', 'a', 'planet']), WordList(['find', 'a', 'planet', 'in']), WordList(['a', 'planet', 'in', 'the']), WordList(['planet', 'in', 'the', 'Alpha']), WordList(['in', 'the', 'Alpha', 'Centauri']), WordList(['the', 'Alpha', 'Centauri', 'system']), WordList(['Alpha', 'Centauri', 'system', 'its']), WordList(['Centauri', 'system', 'its', 'image']), WordList(['system', 'its', 'image', 'captured']), WordList(['its', 'image', 'captured', 'by']), WordList(['image', 'captured', 'by', 'a'

## **3. Word Definitions and Synonyms (via WordNet)**

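- `.definitions`: Returns a list of definitions for a word from WordNet.

- `.synsets`: Returns a list of WordNet `Synset` objects for a word. Each synset represents one sense of the word and groups the words that share that sense.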

In [7]:
from textblob import Word
nltk.download('wordnet')
nltk.download('omw-1.4')


a = Word("book")
print(a.definitions)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


['a written work or composition that has been published (printed on pages bound together)', 'physical objects consisting of a number of pages bound together', 'a compilation of the known facts regarding something or someone', 'a written version of a play or other dramatic composition; used in preparing for a performance', 'a record in which commercial accounts are recorded', 'a collection of playing cards satisfying the rules of a card game', 'a collection of rules or prescribed standards on the basis of which decisions are made', 'the sacred writings of Islam revealed by God to the prophet Muhammad during his life at Mecca and Medina', 'the sacred writings of the Christian religions', 'a major division of a long written composition', 'a number of sheets (ticket or stamps etc.) bound together on one edge', 'engage for a performance', 'arrange for and reserve (something for someone else) in advance', 'record a charge in a police register', 'register in a hotel booker']


In [8]:
a.synsets

[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11'),
 Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]

## **4. Word and Sentence Counts**


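- `.word_counts`: Returns a dictionary of word frequencies. Words are lowercased before counting, so `"SPACE"` and `"space"` share one entry.

- `len(blob.sentences)`: Returns the number of sentences in the text. TextBlob does not appear to provide a separate `.sentence_count` attribute.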
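
The counting itself is easy to reproduce without any library. The sketch below assumes a simple regex tokenizer and a hypothetical `word_counts` helper; it only illustrates case-insensitive counting and is not how TextBlob tokenizes.

```python
import re
from collections import Counter

def word_counts(text):
    # Lowercase first so that "SPACE" and "space" are counted together.
    return Counter(re.findall(r"[a-z0-9'-]+", text.lower()))

sample = (
    "SHOULD WE COLONISE SPACE? I look forward to space travel. "
    "The era of civilian space travel is coming."
)
counts = word_counts(sample)

# Rough sentence count: split on whitespace that follows ".", "?" or "!".
n_sentences = len(re.split(r"(?<=[.?!])\s+", sample.strip()))

print(counts["space"], counts["travel"], n_sentences)
```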

In [9]:
print(blob.word_counts["space"])

3


## **5. Sentiment Analysis**
   - `.sentiment`: Returns a named tuple `(polarity, subjectivity)` where:
     - **Polarity** ranges from `-1` (negative) to `1` (positive).
     - **Subjectivity** ranges from `0` (objective) to `1` (subjective).
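
TextBlob's default analyzer is lexicon-based, so a score is roughly an average over the words the lexicon knows. The toy sketch below shows that averaging step only. The lexicon values are made up for illustration, `toy_sentiment` is a hypothetical helper, and the real analyzer also accounts for things like negation and intensifiers.

```python
# Hypothetical toy lexicon: word -> (polarity, subjectivity). Values are invented.
TOY_LEXICON = {
    "good": (0.5, 0.5),
    "best": (1.0, 0.25),
    "bad": (-0.5, 0.5),
    "new": (0.25, 0.5),
}

def toy_sentiment(text):
    # Score only the words that appear in the lexicon, then average them.
    scores = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    if not scores:
        return 0.0, 0.0  # no known words: treat the text as neutral and objective
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return polarity, subjectivity

print(toy_sentiment("I hope for the best"))
print(toy_sentiment("good news and bad news"))
```

Averaging explains why a long, mostly factual passage tends to land near zero polarity: a few positive and negative words cancel out or are diluted.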


In [10]:
a = blob.sentiment

print(a)

Sentiment(polarity=0.1725525664811379, subjectivity=0.44312306740878166)


## **6. Part-of-Speech (POS) Tagging**

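- `.tags`: Returns a list of `(word, tag)` tuples, pairing each word with its part-of-speech (POS) tag. The tags follow the Penn Treebank tag set, for example `NN` for a singular noun and `VB` for a base-form verb.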
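
Because the result is an ordinary list of tuples, it can be filtered or grouped with plain Python. The sketch below uses a hand-copied sample of `(word, tag)` pairs in place of a live `blob.tags` call, so it runs without TextBlob installed.

```python
from collections import defaultdict

# Sample (word, tag) pairs in the shape that .tags returns.
tagged = [
    ("a", "DT"), ("planet", "NN"), ("in", "IN"), ("the", "DT"),
    ("Alpha", "NNP"), ("Centauri", "NNP"), ("system", "NN"),
]

# All noun tags start with "NN" (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith("NN")]

# Group words by tag to see which parts of speech dominate.
by_tag = defaultdict(list)
for word, tag in tagged:
    by_tag[tag].append(word)

print(nouns)
print(dict(by_tag))
```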

In [11]:
# textblob and nltk were imported in an earlier cell.
# Download the tagger model that .tags depends on.
nltk.download('averaged_perceptron_tagger')

blob.tags

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


[('SHOULD', 'MD'),
 ('WE', 'NNP'),
 ('COLONISE', 'NNP'),
 ('SPACE', 'NNP'),
 ('In', 'IN'),
 ('conclusion', 'NN'),
 ('I', 'PRP'),
 ('return', 'VBP'),
 ('to', 'TO'),
 ('Einstein', 'NNP'),
 ('If', 'IN'),
 ('we', 'PRP'),
 ('find', 'VBP'),
 ('a', 'DT'),
 ('planet', 'NN'),
 ('in', 'IN'),
 ('the', 'DT'),
 ('Alpha', 'NNP'),
 ('Centauri', 'NNP'),
 ('system', 'NN'),
 ('its', 'PRP$'),
 ('image', 'NN'),
 ('captured', 'VBN'),
 ('by', 'IN'),
 ('a', 'DT'),
 ('camera', 'NN'),
 ('travelling', 'NN'),
 ('at', 'IN'),
 ('a', 'DT'),
 ('fifth', 'NN'),
 ('of', 'IN'),
 ('light', 'JJ'),
 ('speed', 'NN'),
 ('will', 'MD'),
 ('be', 'VB'),
 ('slightly', 'RB'),
 ('distorted', 'VBN'),
 ('due', 'JJ'),
 ('to', 'TO'),
 ('the', 'DT'),
 ('effects', 'NNS'),
 ('of', 'IN'),
 ('special', 'JJ'),
 ('relativity', 'NN'),
 ('It', 'PRP'),
 ('would', 'MD'),
 ('be', 'VB'),
 ('the', 'DT'),
 ('first', 'JJ'),
 ('time', 'NN'),
 ('a', 'DT'),
 ('spacecraft', 'NN'),
 ('has', 'VBZ'),
 ('flown', 'VBN'),
 ('fast', 'RB'),
 ('enough', 'JJ'),
 ('

## **7. Word Inflection and Lemmatization**

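- `.words.pluralize()`: Pluralizes every word in the list. The rules are applied to each token regardless of its part of speech, so non-nouns and words that are already plural come out wrong (for example `'SHOULDs'` and `'effectss'` in the output below).

- `.words.singularize()`: Converts plural words to singular.

- `.words.lemmatize()`: Returns the base form (lemma) of each word. A single `Word` also has `.lemmatize()`.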
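
Inflection gives more sensible results when it is limited to words that can actually take it. The sketch below pluralizes only tokens tagged `NN` (singular noun) and leaves everything else alone. `naive_plural` is a hypothetical, deliberately simple stand-in for a real pluralizer and is not TextBlob's algorithm; the tagged sample is written by hand.

```python
def naive_plural(word):
    # Hypothetical simplified rules, for illustration only.
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"  # ability -> abilities
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"        # box -> boxes
    return word + "s"             # camera -> cameras

# Hand-written (word, tag) pairs in the shape that .tags returns.
tagged = [
    ("the", "DT"), ("ability", "NN"), ("of", "IN"),
    ("a", "DT"), ("camera", "NN"), ("effects", "NNS"),
]

# Pluralize singular nouns only; other tokens pass through unchanged.
result = [naive_plural(w) if tag == "NN" else w for w, tag in tagged]
print(result)
```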

In [12]:
blob.words.pluralize()

WordList(['SHOULDs', 'WEs', 'COLONISEs', 'SPACEs', 'Ins', 'conclusions', 'we', 'returns', 'toes', 'Einsteins', 'Ifs', 'wes', 'finds', 'some', 'planets', 'ins', 'thes', 'Alphas', 'Centauris', 'systems', 'their', 'images', 'captureds', 'bies', 'some', 'cameras', 'travellings', 'ats', 'some', 'fifths', 'ofs', 'lights', 'speeds', 'wills', 'bes', 'slightlies', 'distorteds', 'dues', 'toes', 'thes', 'effectss', 'ofs', 'specials', 'relativities', 'Its', 'woulds', 'bes', 'thes', 'firsts', 'times', 'some', 'spacecrafts', 'hass', 'flowns', 'fasts', 'enoughs', 'toes', 'sees', 'suches', 'effectss', 'Ins', 'facts', 'Einsteins', '’s', 'ss', 'theories', 'iss', 'centrals', 'toes', 'thes', 'wholes', 'missions', 'Withouts', 'they', 'wes', 'woulds', 'haves', 'neithers', 'laserss', 'nors', 'thes', 'abilities', 'toes', 'performs', 'thes', 'calculationss', 'necessaries', 'fors', 'guidances', 'imagings', 'ands', 'datas', 'transmissions', 'overs', 'twenty-fives', 'trillions', 'miless', 'ats', 'some', 'fifths',

In [13]:
blob.words.singularize()

WordList(['SHOULD', 'WE', 'COLONISE', 'SPACE', 'In', 'conclusion', 'I', 'return', 'to', 'Einstein', 'If', 'we', 'find', 'a', 'planet', 'in', 'the', 'Alpha', 'Centaurus', 'system', 'it', 'image', 'captured', 'by', 'a', 'camera', 'travelling', 'at', 'a', 'fifth', 'of', 'light', 'speed', 'will', 'be', 'slightly', 'distorted', 'due', 'to', 'the', 'effect', 'of', 'special', 'relativity', 'It', 'would', 'be', 'the', 'first', 'time', 'a', 'spacecraft', 'ha', 'flown', 'fast', 'enough', 'to', 'see', 'such', 'effect', 'In', 'fact', 'Einstein', '’', 's', 'theory', 'is', 'central', 'to', 'the', 'whole', 'mission', 'Without', 'it', 'we', 'would', 'have', 'neither', 'laser', 'nor', 'the', 'ability', 'to', 'perform', 'the', 'calculation', 'necessary', 'for', 'guidance', 'imaging', 'and', 'datum', 'transmission', 'over', 'twenty-five', 'trillion', 'mile', 'at', 'a', 'fifth', 'of', 'light', 'speed', 'We', 'can', 'see', 'a', 'pathway', 'between', 'that', 'sixteen-year-old', 'boy', 'dreaming', 'of', 'rid

## **8. Noun Phrase Extraction**

- `.noun_phrases`: Returns a `WordList` of the noun phrases found in the text, lowercased.

In [14]:
# nltk was imported in an earlier cell.
# Download the Brown corpus, which noun phrase extraction depends on.
nltk.download('brown')

blob.noun_phrases

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Package brown is already up-to-date!


WordList(['should we colonise space', 'einstein', 'alpha centauri', 'light speed', 'special relativity', 'such effects', 'einstein', '’ s theory', 'whole mission', 'data transmission', 'twenty-five trillion miles', 'light speed', 'sixteen-year-old boy', 'light beam', 'own dream', 'own light beam', 'new era', 'human colonisation', 'science fiction', 'science fact', 'human race', 'separate species', 'civilisation', 'civilian space travel', 'space travel', 'solar system', 'outer planets', 'nearby stars', '’ t', 'trek', '’ t', 'warp speed', 'round trip', 'brief answers', 'stephen hawking'])