# Basic

Create a TextBlob         

Part-of-speech Tagging

Noun Phrase Extraction

Sentiment Analysis

Tokenization

Words Inflection and Lemmatization

WordNet Integration

WordLists

Spelling Correction

Get Word and Noun Phrase Frequencies

Parsing

TextBlobs Are Like Python Strings!

n-grams


# End-to-End Projects

Sentiment Analysis on Customer Reviews

Language Translation Tool

Spell Checker for User Input

Keyword Extraction from Text

Text Summarization by Sentence Polarity

Formality Checker for Text

Subjectivity Analysis of Statements

Sentence Tokenizer for Paragraphs

Text Similarity Checker

Text Complexity Scorer

Parts of Speech (POS) Tagging

Synonym Replacement Tool

Custom Word Frequency Counter

Named Entity Recognition (NER) using Custom Keywords

Politeness Detector for Customer Support

In [14]:
!pip install textblob
!python -m textblob.download_corpora

Finished.

[nltk_data] Downloading package brown to C:\Users\Noor
[nltk_data]     Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\brown.zip.
[nltk_data] Downloading package punkt to C:\Users\Noor
[nltk_data]     Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Noor
[nltk_data]     Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Noor Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to C:\Users\Noor
[nltk_data]     Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\conll2000.zip.
[nltk_data] Downloading package movie_reviews to C:\Users\Noor
[nltk_data]     Saeed\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\movie_reviews.zip.





# Create a TextBlob

In [15]:
from textblob import TextBlob

In [16]:
wiki = TextBlob("Python is a high-level, general-purpose programming language.")
wiki

TextBlob("Python is a high-level, general-purpose programming language.")

# Part-of-Speech Tagging

Each POS tag represents the grammatical role of a word in the sentence. The tags that appear in the output below stand for:


NNP: Proper Noun, Singular (e.g., "Python")

VBZ: Verb, 3rd person singular present (e.g., "is")

DT: Determiner (e.g., "a")

JJ: Adjective (e.g., "high-level", "general-purpose")

NN: Noun, Singular or Mass (e.g., "programming," "language")

In [17]:
wiki.tags

[('Python', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('high-level', 'JJ'),
 ('general-purpose', 'JJ'),
 ('programming', 'NN'),
 ('language', 'NN')]
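
To make the tag list easier to read, the pairs can be grouped by tag. The snippet below is a minimal sketch that works on the plain list of `(word, tag)` tuples shown above, so it does not need TextBlob installed; `group_by_tag` is a hypothetical helper name, not part of the TextBlob API.

```python
from collections import defaultdict

# The (word, tag) pairs copied from the wiki.tags output above
tags = [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'),
        ('high-level', 'JJ'), ('general-purpose', 'JJ'),
        ('programming', 'NN'), ('language', 'NN')]


def group_by_tag(tagged):
    """Collect words under their POS tag, keeping sentence order."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


groups = group_by_tag(tags)
print(groups['JJ'])  # the adjectives
print(groups['NN'])  # the common nouns
```

In a live session the same helper could be called as `group_by_tag(wiki.tags)`.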

# Noun Phrase Extraction
Similarly, noun phrases are accessed through the `noun_phrases` property.

In [18]:
wiki.noun_phrases

WordList(['python'])

# Sentiment Analysis

In [19]:
testimonial = TextBlob("Textblob is amazingly simple to use. What great fun!")

if testimonial.sentiment[0] >=0.1:
    print("Positive")
else:
    print("Negative")

Positive


In [20]:
testimonial = TextBlob("I hate to start learning complex concepts first")

if testimonial.sentiment[0] >=0.1:
    print("Positive")
else:
    print("Negative")

Negative
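
Both cells above compare the polarity score (the first element of `sentiment`, which ranges from -1.0 to 1.0) against 0.1. The sketch below wraps that comparison in a function that takes the score as a plain number. The name `label_polarity` and the symmetric "Neutral" band are assumptions added for illustration; the cells above only distinguish Positive and Negative.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a text label.

    Assumption: scores between -threshold and +threshold count as Neutral.
    """
    if polarity >= threshold:
        return "Positive"
    if polarity <= -threshold:
        return "Negative"
    return "Neutral"


print(label_polarity(0.5))
print(label_polarity(-0.8))
print(label_polarity(0.0))
```

With TextBlob available, it could be used as `label_polarity(testimonial.sentiment.polarity)`.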


# Tokenization

Word-level tokenization (`words`)

Sentence-level tokenization (`sentences`)

In [22]:
zen = TextBlob(
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)
zen.words

WordList(['Beautiful', 'is', 'better', 'than', 'ugly', 'Explicit', 'is', 'better', 'than', 'implicit', 'Simple', 'is', 'better', 'than', 'complex'])

In [23]:
zen.sentences

[Sentence("Beautiful is better than ugly."),
 Sentence("Explicit is better than implicit."),
 Sentence("Simple is better than complex.")]
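
To illustrate the difference between the two levels, here is a rough sketch using only regular expressions. It is not how TextBlob tokenizes (TextBlob relies on NLTK tokenizers and handles many more cases); `simple_words` and `simple_sentences` are hypothetical helpers that only work for plain, well-punctuated text like the example above.

```python
import re


def simple_words(text):
    """Word level: return runs of letters, digits, and underscores."""
    return re.findall(r"\w+", text)


def simple_sentences(text):
    """Sentence level: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


zen_text = (
    "Beautiful is better than ugly. "
    "Explicit is better than implicit. "
    "Simple is better than complex."
)
print(simple_words(zen_text))
print(simple_sentences(zen_text))
```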

# Words Inflection and Lemmatization

In [24]:
sentence = TextBlob("Use 4 spaces per indentation level.")
sentence.words

WordList(['Use', '4', 'spaces', 'per', 'indentation', 'level'])

In [25]:
sentence.words[2].singularize()

'space'

In [26]:
sentence.words[-1].pluralize()

'levels'

In [39]:
from textblob import Word
w = Word("places")
w.lemmatize('n')

'place'

In [37]:
w = Word("went")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)

'go'

In [38]:
w = Word("loving")
w.lemmatize("v")  # Pass in WordNet part of speech (verb)

'love'

# WordNet Integration

In [40]:
from textblob import Word
from textblob.wordnet import VERB
word = Word("places")
word.synsets

[Synset('topographic_point.n.01'),
 Synset('place.n.02'),
 Synset('place.n.03'),
 Synset('place.n.04'),
 Synset('stead.n.01'),
 Synset('place.n.06'),
 Synset('home.n.01'),
 Synset('position.n.06'),
 Synset('position.n.01'),
 Synset('place.n.10'),
 Synset('seat.n.01'),
 Synset('place.n.12'),
 Synset('place.n.13'),
 Synset('plaza.n.01'),
 Synset('place.n.15'),
 Synset('space.n.07'),
 Synset('put.v.01'),
 Synset('place.v.02'),
 Synset('rate.v.01'),
 Synset('locate.v.03'),
 Synset('place.v.05'),
 Synset('place.v.06'),
 Synset('target.v.01'),
 Synset('identify.v.01'),
 Synset('place.v.09'),
 Synset('set.v.09'),
 Synset('place.v.11'),
 Synset('place.v.12'),
 Synset('invest.v.01'),
 Synset('station.v.01'),
 Synset('place.v.15'),
 Synset('place.v.16')]

In [41]:
Word("hack").get_synsets(pos=VERB)

[Synset('chop.v.05'),
 Synset('hack.v.02'),
 Synset('hack.v.03'),
 Synset('hack.v.04'),
 Synset('hack.v.05'),
 Synset('hack.v.06'),
 Synset('hack.v.07'),
 Synset('hack.v.08')]

In [42]:
Word("Place").definitions

['a point located with respect to surface features of some region',
 'any area set aside for a particular purpose',
 'an abstract mental location',
 'a general vicinity',
 'the post or function properly or customarily occupied or served by another',
 'a particular situation',
 'where you live at a particular time',
 'a job in an organization',
 'the particular portion of space occupied by something',
 'proper or designated social situation',
 'a space reserved for sitting (as in a theater or on a train or airplane)',
 'the passage that is being read',
 'proper or appropriate position or location',
 'a public square with room for pedestrians',
 'an item on a list or in a sequence',
 'a blank area',
 'put into a certain place or abstract location',
 'place somebody in a particular situation or location',
 'assign a rank or rating to',
 'assign a location to',
 'to arrange for',
 'take a place in a competition; often followed by an ordinal',
 'intend (something) to move towards a certain 

# Spelling Correction & Checking

In [43]:
b = TextBlob("I havv goood speling!")
print(b.correct())

I have good spelling!


In [46]:
w = Word("lovery")
w.spellcheck()

[('lovely', 0.44776119402985076),
 ('lover', 0.3880597014925373),
 ('lovers', 0.13432835820895522),
 ('livery', 0.029850746268656716)]

In [45]:
w = Word("nica")
w.spellcheck()

[('nice', 0.9636363636363636),
 ('nina', 0.01818181818181818),
 ('nick', 0.01818181818181818)]
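
`spellcheck()` returns a list of `(word, confidence)` pairs, as the two outputs above show. A small sketch of how such a list might be used: accept the top suggestion only when its confidence is high enough. The helper name `best_suggestion` and the 0.5 cutoff are assumptions chosen for illustration.

```python
def best_suggestion(candidates, min_confidence=0.5):
    """Return the most confident suggestion, or None if it is too uncertain."""
    if not candidates:
        return None
    word, confidence = max(candidates, key=lambda pair: pair[1])
    return word if confidence >= min_confidence else None


# Suggestion lists copied from the outputs above
lovery = [('lovely', 0.44776119402985076), ('lover', 0.3880597014925373),
          ('lovers', 0.13432835820895522), ('livery', 0.029850746268656716)]
nica = [('nice', 0.9636363636363636), ('nina', 0.01818181818181818),
        ('nick', 0.01818181818181818)]

print(best_suggestion(lovery))  # top candidate is below the cutoff
print(best_suggestion(nica))
```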

# Parsing

Each token in the parsed output has four slash-separated fields:

Word: The actual word from the sentence.

POS-tag: Part-of-speech tag, indicating the grammatical role of the word (e.g., "CC" for coordinating conjunction, "RB" for adverb).

Chunk-tag: Indicates the beginning or continuation of a chunk (e.g., noun phrases, prepositional phrases). This uses B (beginning) and I (inside) prefixes to show chunk boundaries:

B-NP: Beginning of a noun phrase

I-NP: Inside a noun phrase

B-PP: Beginning of a prepositional phrase

B-ADVP: Beginning of an adverbial phrase

B-ADJP / I-ADJP: Beginning of / inside an adjective phrase

O: Outside of any chunk

PNP-tag: Marks prepositional noun phrases (a preposition together with the noun phrase that follows it), using B-PNP, I-PNP, and O. This field is not a named-entity tag. In the example below, "for something" is tagged B-PNP/I-PNP and every other token is O.

In [47]:
b = TextBlob("And now for something completely different.")
print(b.parse())

And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP something/NN/B-NP/I-PNP completely/RB/B-ADJP/O different/JJ/I-ADJP/O ././O/O
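
The parsed string is easier to work with once it is split back into fields. The sketch below does this on the output above with plain string operations, so it runs without TextBlob. `ParsedToken` and `split_parse_output` are hypothetical names, and the fourth field is called `pnp` here after the B-PNP/I-PNP values it holds.

```python
from typing import List, NamedTuple


class ParsedToken(NamedTuple):
    word: str
    pos: str
    chunk: str
    pnp: str


def split_parse_output(parsed: str) -> List[ParsedToken]:
    """Split 'word/POS/chunk/PNP' items separated by whitespace."""
    tokens = []
    for item in parsed.split():
        # Split from the right so only the last three slashes act as separators
        word, pos, chunk, pnp = item.rsplit("/", 3)
        tokens.append(ParsedToken(word, pos, chunk, pnp))
    return tokens


parsed = ("And/CC/O/O now/RB/B-ADVP/O for/IN/B-PP/B-PNP "
          "something/NN/B-NP/I-PNP completely/RB/B-ADJP/O "
          "different/JJ/I-ADJP/O ././O/O")
tokens = split_parse_output(parsed)
for token in tokens:
    print(token)
```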


# n-grams (unigram, bigram, trigram)

In [48]:
blob = TextBlob("Now is better than never.")
blob.ngrams(n=3)

[WordList(['Now', 'is', 'better']),
 WordList(['is', 'better', 'than']),
 WordList(['better', 'than', 'never'])]

In [49]:
blob.ngrams(n=1)

[WordList(['Now']),
 WordList(['is']),
 WordList(['better']),
 WordList(['than']),
 WordList(['never'])]

In [50]:
blob.ngrams(n=2)

[WordList(['Now', 'is']),
 WordList(['is', 'better']),
 WordList(['better', 'than']),
 WordList(['than', 'never'])]
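
An n-gram is simply a sliding window of `n` consecutive words. The sketch below reproduces the three outputs above with list slicing on an already tokenized sentence. It illustrates the idea only and returns plain lists rather than `WordList` objects; `sliding_ngrams` is a hypothetical name.

```python
def sliding_ngrams(words, n=3):
    """Return every run of n consecutive words as a list."""
    if n <= 0:
        return []
    return [words[i:i + n] for i in range(len(words) - n + 1)]


words = ['Now', 'is', 'better', 'than', 'never']
print(sliding_ngrams(words, n=3))
print(sliding_ngrams(words, n=2))
print(sliding_ngrams(words, n=1))
```

A sentence with `k` words yields `k - n + 1` n-grams, and none when `n` is larger than `k`.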

# Project 1: Sentiment Analysis on Customer Reviews


Two possibilities:

Real-time sentiment detection (interactive input)

Sentiment detection on custom text data (a CSV file of reviews)

In [25]:
# Real-time detection
from textblob import TextBlob
while True:
    input_text = input("Enter your text......or press exit to leave   ")

    # Check for the exit command first so that "exit" itself is not analyzed
    if input_text.strip().lower() == "exit":
        print("Good Bye")
        break

    print("\n")
    text = TextBlob(input_text)
    sentiment = text.sentiment.polarity
    if sentiment > 0:
        print("Positive")
    elif sentiment == 0:
        print("Neutral")
    else:
        print("negative")

Enter your text......or press exit to leave   I love to watch movies


Positive
Enter your text......or press exit to leave   we don't like this movies its so ugly


negative
Enter your text......or press exit to leave   i wanna go to USA


Neutral
Enter your text......or press exit to leave   we loves each other


negative
Enter your text......or press exit to leave   we love each other


Positive
Enter your text......or press exit to leave   exit
Good Bye


In [26]:
# on custom data
import pandas as pd

In [28]:
def sentiment_detection(text):
    text = text.lower()
    
    text = TextBlob(text)
    
    sentiment = text.sentiment.polarity
    
    if sentiment > 0:
        return "positive"
    elif sentiment < 0:
        return "negative"
    else:
        return 'neutral'

In [29]:
df = pd.read_csv("redmi6.csv", encoding='ISO-8859-1')
df = df[['Customer name','Comments']]
df.head()

Unnamed: 0,Customer name,Comments
0,Rishikumar Thakur,Another Midrange killer Smartphone by Xiaomi\n...
1,Raza ji,All ok but vry small size mobile
2,Vaibhav Patel,Quite good
3,Amazon Customer,Redmi has always have been the the king of bud...
4,Sudhakaran Wadakkancheri,worst product from MI. I am a hardcore fan of ...


In [30]:
df['label'] = df['Comments'].apply(sentiment_detection)

In [31]:
df

Unnamed: 0,Customer name,Comments,label
0,Rishikumar Thakur,Another Midrange killer Smartphone by Xiaomi\n...,positive
1,Raza ji,All ok but vry small size mobile,positive
2,Vaibhav Patel,Quite good,positive
3,Amazon Customer,Redmi has always have been the the king of bud...,positive
4,Sudhakaran Wadakkancheri,worst product from MI. I am a hardcore fan of ...,negative
...,...,...,...
275,Rahul,"I like This Phone, Awesome look and design.\nI...",positive
276,Sunil Soni,Product is avasome but invoice is note include...,neutral
277,D.C.Padhi,"Redmi Note4, Note5, now 6pro..It seems the old...",positive
278,Mahesh,I love mi,positive


# Project 2: Language Translation Tool

In [7]:
# TextBlob's built-in translation is deprecated (and removed in recent releases),
# so we use an alternative library instead: deep-translator (pip install deep-translator)

In [8]:
# from textblob import TextBlob

# blob = TextBlob('TextBlob is a great tool for developers')
# print(blob.translate(to='hi'))

# Deep Translator (Google Translator)

In [8]:
from deep_translator import GoogleTranslator

text = "my name is noor saeed"
print(GoogleTranslator(source='en', target='fr').translate(text))

je m'appelle noor saeed


In [11]:
def trans(text, target_lang):
    trans_text = GoogleTranslator(target=target_lang).translate(text)
    return trans_text

In [None]:
while True:
    print("\n Language Translation tool...")
    input_text = input("Enter your text....")
    
    if input_text == "exit":
        break
        
    target_lang = input("To translate lang....")
    
    trans_text = trans(input_text,target_lang)
    print("Translation :", trans_text)
    
    


 Language Translation tool...
Enter your text....we appreciate each other
To translate lang....ur
Translation : ہم ایک دوسرے کی تعریف کرتے ہیں

 Language Translation tool...
Enter your text....this is a cat
To translate lang....hi
Translation : यह एक बिल्ली है

 Language Translation tool...


# Project 3: Spell Checker

In [5]:
from textblob import TextBlob

mytxt = "i lov you"
text = TextBlob(mytxt)

print(text.correct())

i love you


In [None]:
# Import necessary libraries
# (save this cell as app.py and launch it from a terminal with: streamlit run app.py)
import streamlit as st
from textblob import TextBlob

# Streamlit app setup
st.title("Real-Time Spell Checker")
st.write("Enter text with spelling errors below, and see the corrected version in real time.")

# Text input
text = st.text_area("Type your text here:", "")

# Check if the input text is not empty
if text:
    # Use TextBlob for spell correction
    blob = TextBlob(text)
    # correct() returns a TextBlob, so convert it to a plain string for display
    corrected_text = str(blob.correct())
    
    # Display the results
    st.subheader("Corrected Text")
    st.write(corrected_text)

# Info
st.info("This app uses TextBlob for spell checking and correction.")


# Project 4: Auto Keyword Extraction from Articles Text Using TextBlob

In [1]:
import pandas as pd

df = pd.read_csv("dblp-v10.csv")
df = df[['abstract','authors']]
df.dropna(inplace=True)
df = df.sample(n=20)
df.to_csv("small_paper_data.csv")



In [59]:
df

Unnamed: 0,abstract,authors
910336,"In this paper, we explore the use of Maximum L...","['Diana I. Escalona-Vargas', 'Pamela Murphy', ..."
849061,Even with the recent advances in the area of d...,"['Andreas Seekircher', 'Ubbo Visser']"
872318,It is challenging to support multimedia transm...,"['Chungui Liu', 'Yantai Shu', 'Lianfang Zhang'..."
85096,Evaluations on the quality of liquid crystal d...,"['Eui Chul Lee', 'Si Mong Lee', 'Chee Sun Won'..."
551574,As three-dimensional (3D) environments become ...,"['Hiep Phuc Luong', 'Dipesh Gautam', 'John Gau..."
132534,"Discusses the development of a single, multifu...",['Forouzan Golshani']
25538,We used the finite-element method (FEM) to mod...,"['Hong Cao', 'Michael A. Speidel', 'Jang-Zern ..."
550160,This paper presents a formal specification in ...,['Jonathan Jacky']
106606,The subjective quality achieved by most audio ...,"['Claus Bauer', 'Matt Fellers', 'Grant Allen D..."
740819,This paper presents a supervised bayesian appr...,"['Jose San Pedro', 'Alexandros Karatzoglou']"


# Clean Text

In [60]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
import re
import string

# If the stopword list is missing, run nltk.download('stopwords') once

# Stemmer and lemmatizer are created here but are not used by clean() below
stmer = PorterStemmer()
ltzr = WordNetLemmatizer()
# Use a separate name so the imported stopwords module is not shadowed
stop_words = set(stopwords.words('english'))

def clean(text):
    # lower casing
    text = text.lower()
    
    # Remove everything except letters, digits, and whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    
    # tokenization
    text = word_tokenize(text)
    
    # remove stopwords
    text = [word for word in text if word not in stop_words]
    
    return " ".join(text)


In [61]:
# Sample text
text = "Machine! learning enables 890808 @#$#@$#@ computers to? learn from data without being explicitly programmed."

text = clean(text)
text

'machine learning enables 890808 computers learn data without explicitly programmed'
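
The same three steps (lowercase, strip symbols, drop stopwords) can be sketched without NLTK for quick experiments. This version is only an approximation: it splits on whitespace instead of using `word_tokenize`, and `SMALL_STOPWORDS` is a tiny hand-picked set assumed for illustration, not NLTK's English stopword list.

```python
import re

# Assumption: a tiny illustrative stopword set, far smaller than NLTK's list
SMALL_STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "from", "being"}


def clean_simple(text, stop_words=SMALL_STOPWORDS):
    """Lowercase, remove non-alphanumeric symbols, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    tokens = text.split()  # whitespace split instead of word_tokenize
    return " ".join(token for token in tokens if token not in stop_words)


sample = ("Machine! learning enables 890808 @#$#@$#@ computers to? "
          "learn from data without being explicitly programmed.")
print(clean_simple(sample))
```

On this sample sentence the result matches the output of `clean` shown above; on other text the two will differ wherever the stopword lists or tokenization disagree.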

In [62]:
# apply on data
df['clean_abstract'] = df['abstract'].apply(clean)

In [63]:
df

Unnamed: 0,abstract,authors,clean_abstract
910336,"In this paper, we explore the use of Maximum L...","['Diana I. Escalona-Vargas', 'Pamela Murphy', ...",paper explore use maximum likelihood ml method...
849061,Even with the recent advances in the area of d...,"['Andreas Seekircher', 'Ubbo Visser']",even recent advances area dynamic walking huma...
872318,It is challenging to support multimedia transm...,"['Chungui Liu', 'Yantai Shu', 'Lianfang Zhang'...",challenging support multimedia transmissions w...
85096,Evaluations on the quality of liquid crystal d...,"['Eui Chul Lee', 'Si Mong Lee', 'Chee Sun Won'...",evaluations quality liquid crystal display lcd...
551574,As three-dimensional (3D) environments become ...,"['Hiep Phuc Luong', 'Dipesh Gautam', 'John Gau...",threedimensional 3d environments become preval...
132534,"Discusses the development of a single, multifu...",['Forouzan Golshani'],discusses development single multifunctional d...
25538,We used the finite-element method (FEM) to mod...,"['Hong Cao', 'Michael A. Speidel', 'Jang-Zern ...",used finiteelement method fem model analyze re...
550160,This paper presents a formal specification in ...,['Jonathan Jacky'],paper presents formal specification z notation...
106606,The subjective quality achieved by most audio ...,"['Claus Bauer', 'Matt Fellers', 'Grant Allen D...",subjective quality achieved audio codecs inclu...
740819,This paper presents a supervised bayesian appr...,"['Jose San Pedro', 'Alexandros Karatzoglou']",paper presents supervised bayesian approach mo...


Explanation (for the code below):

Nouns: Captures both common and proper nouns, singular and plural (NN, NNS, NNP, NNPS).
    
Adjectives: Includes comparative (JJR) and superlative (JJS) forms in addition to base adjectives (JJ).
    
Verbs: Not included in the tag filter below. To keep verbs as well, add the verb tags (VB, VBD, VBG, VBN, VBP, VBZ) to the tuple.
    
Adverbs: Includes comparative (RBR) and superlative (RBS) adverbs as well as standard adverbs (RB).

In [67]:
from textblob import TextBlob
from collections import Counter


def get_keywords(text):
    # Create a TextBlob object
    blob = TextBlob(text)

    # Extract a broader range of keywords based on POS tagging
    keywords = [word for word, tag in blob.tags if tag in ('NN', 'NNS', 'NNP', 'NNPS',  # Nouns
                                                           'JJ', 'JJR', 'JJS',          # Adjectives
                                                           'RB', 'RBR', 'RBS')]         # Adverbs

    # Count the most common keywords
    keyword_counts = Counter(keywords)
    most_common_keywords = keyword_counts.most_common(5)  # Get the top 5 most common keywords
    
    return most_common_keywords
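
The filter-then-count logic in `get_keywords` can be tried without TextBlob by passing in `(word, tag)` pairs directly. In the sketch below, `top_keywords` is a hypothetical helper and the `tagged` list is hand-written sample data, not real tagger output.

```python
from collections import Counter

# The same noun, adjective, and adverb tags used in get_keywords
KEEP_TAGS = {'NN', 'NNS', 'NNP', 'NNPS',
             'JJ', 'JJR', 'JJS',
             'RB', 'RBR', 'RBS'}


def top_keywords(tagged, keep_tags=KEEP_TAGS, k=5):
    """Keep words whose tag is in keep_tags and return the k most frequent."""
    words = [word for word, tag in tagged if tag in keep_tags]
    return Counter(words).most_common(k)


# Hand-written sample pairs (an assumption for illustration)
tagged = [('wireless', 'JJ'), ('networks', 'NNS'), ('support', 'VBP'),
          ('wireless', 'JJ'), ('networks', 'NNS'), ('video', 'NN')]
print(top_keywords(tagged, k=2))
```

The verb `support` is dropped because its tag is not in `KEEP_TAGS`, which mirrors how `get_keywords` ignores verbs.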

In [68]:
# get keywords
df['keywords'] = df['clean_abstract'].apply(get_keywords)
df

Unnamed: 0,abstract,authors,clean_abstract,keywords
910336,"In this paper, we explore the use of Maximum L...","['Diana I. Escalona-Vargas', 'Pamela Murphy', ...",paper explore use maximum likelihood ml method...,"[(ga, 2), (optimization, 2), (fmcg, 2), (data,..."
849061,Even with the recent advances in the area of d...,"['Andreas Seekircher', 'Ubbo Visser']",even recent advances area dynamic walking huma...,"[(model, 3), (even, 2), (robots, 2), (signific..."
872318,It is challenging to support multimedia transm...,"['Chungui Liu', 'Yantai Shu', 'Lianfang Zhang'...",challenging support multimedia transmissions w...,"[(networks, 5), (wireless, 5), (support, 3), (..."
85096,Evaluations on the quality of liquid crystal d...,"['Eui Chul Lee', 'Si Mong Lee', 'Chee Sun Won'...",evaluations quality liquid crystal display lcd...,"[(lcd, 4), (tv, 4), (scene, 4), (video, 3), (d..."
551574,As three-dimensional (3D) environments become ...,"['Hiep Phuc Luong', 'Dipesh Gautam', 'John Gau...",threedimensional 3d environments become preval...,"[(virtual, 5), (data, 3), (world, 3), (collect..."
132534,"Discusses the development of a single, multifu...",['Forouzan Golshani'],discusses development single multifunctional d...,"[(discusses, 1), (development, 1), (single, 1)..."
25538,We used the finite-element method (FEM) to mod...,"['Hong Cao', 'Michael A. Speidel', 'Jang-Zern ...",used finiteelement method fem model analyze re...,"[(catheter, 5), (fem, 3), (resistance, 3), (de..."
550160,This paper presents a formal specification in ...,['Jonathan Jacky'],paper presents formal specification z notation...,"[(specification, 6), (system, 5), (z, 3), (con..."
106606,The subjective quality achieved by most audio ...,"['Claus Bauer', 'Matt Fellers', 'Grant Allen D...",subjective quality achieved audio codecs inclu...,"[(mpeg4, 2), (conventional, 2), (procedure, 2)..."
740819,This paper presents a supervised bayesian appr...,"['Jose San Pedro', 'Alexandros Karatzoglou']",paper presents supervised bayesian approach mo...,"[(model, 3), (paper, 1), (presents, 1), (bayes..."


In [69]:
df['keywords']

910336    [(ga, 2), (optimization, 2), (fmcg, 2), (data,...
849061    [(model, 3), (even, 2), (robots, 2), (signific...
872318    [(networks, 5), (wireless, 5), (support, 3), (...
85096     [(lcd, 4), (tv, 4), (scene, 4), (video, 3), (d...
551574    [(virtual, 5), (data, 3), (world, 3), (collect...
132534    [(discusses, 1), (development, 1), (single, 1)...
25538     [(catheter, 5), (fem, 3), (resistance, 3), (de...
550160    [(specification, 6), (system, 5), (z, 3), (con...
106606    [(mpeg4, 2), (conventional, 2), (procedure, 2)...
740819    [(model, 3), (paper, 1), (presents, 1), (bayes...
Name: keywords, dtype: object

# app.py (do not run it in the notebook; save it as a file and launch it with `streamlit run app.py` from VS Code or PyCharm)

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re
import streamlit as st
import pandas as pd
from textblob import TextBlob
from collections import Counter

# Alternatively, download everything once from a terminal: python -m nltk.downloader all
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')



stopwords = set(stopwords.words('english'))


def clean(text):
    # lower casing
    text = text.lower()

    # Remove everything except letters, digits, and whitespace
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)

    # tokenization
    text = word_tokenize(text)

    # remove stopwords
    text = [word for word in text if word not in stopwords]

    return " ".join(text)



def get_keywords(text):
    # Create a TextBlob object
    blob = TextBlob(text)

    # Extract a broader range of keywords based on POS tagging
    keywords = [word for word, tag in blob.tags if tag in ('NN', 'NNS', 'NNP', 'NNPS',  # Nouns
                                                           'JJ', 'JJR', 'JJS',  # Adjectives
                                                           'RB', 'RBR', 'RBS')]  # Adverbs

    # Count the most common keywords
    keyword_counts = Counter(keywords)
    most_common_keywords = keyword_counts.most_common(5)  # Get the top 5 most common keywords

    return most_common_keywords


# UI code=================
st.title("Auto Keyword Extraction from Articles Text Using TextBlob")
uploaded_file = st.sidebar.file_uploader("Upload a file", type="csv")

if uploaded_file is not None:
    # read dataset
    df = pd.read_csv(uploaded_file)

    # Validate the required columns before selecting them (avoids a KeyError)
    if {'abstract', 'authors'}.issubset(df.columns):
        df = df[['abstract', 'authors']].dropna()
        # Sample at most 20 rows so that small files do not raise an error
        df = df.sample(n=min(20, len(df)))

        st.write("Uploaded File Preview")
        st.dataframe(df.head(3))

        # apply on data
        df['clean_abstract'] = df['abstract'].apply(clean)
        df['keywords'] = df['clean_abstract'].apply(get_keywords)

        # Convert the keyword lists to strings so they display cleanly
        df['keywords'] = df['keywords'].astype(str)
        st.write("Extracted Keywords Preview")
        st.dataframe(df)
    else:
        st.write("Uploaded data must have 'abstract' and 'authors' columns.")