### Quickly establish a baseline text-classifier accuracy with [TextBlob](https://textblob.readthedocs.io/en/dev/) before building a more complex model

#### i.e. simplified text processing

#### Load 'tweet' dataset

In [46]:
import pandas as pd
df = pd.read_csv("sts_gold_tweet.csv", delimiter=";")
print(df.columns)

# Make a list of all the tweets.

tweets_text_collection = list(df['tweet'])

Index(['id', 'polarity', 'tweet'], dtype='object')


In [47]:
df.head()

|   | id | polarity | tweet |
|---|----|----------|-------|
| 0 | 1467933112 | 0 | the angel is going to miss the athlete this we... |
| 1 | 2323395086 | 0 | It looks as though Shaq is getting traded to C... |
| 2 | 1467968979 | 0 | @clarianne APRIL 9TH ISN'T COMING SOON ENOUGH |
| 3 | 1990283756 | 0 | drinking a McDonalds coffee and not understand... |
| 4 | 1988884918 | 0 | So dissapointed Taylor Swift doesnt have a Twi... |


#### sentiment analysis

In [48]:
# !pip install textblob
from textblob import TextBlob

for tweet_text in tweets_text_collection[:20]:

    print(tweet_text)
    analysis = TextBlob(tweet_text)
    
    # Analyse the sentiment.
    
    print(analysis.sentiment)
    
    # Polarity lies in the range [-1.0, 1.0]: negative values indicate negative text, positive values positive text.
    # Subjectivity lies in the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
    
    print("-"*20)

the angel is going to miss the athlete this weekend 
Sentiment(polarity=0.0, subjectivity=0.0)
--------------------
It looks as though Shaq is getting traded to Cleveland to play w/ LeBron... Too bad for Suns' fans. The Big Cactus is no more 
Sentiment(polarity=-0.3166666666666666, subjectivity=0.4222222222222222)
--------------------
@clarianne APRIL 9TH ISN'T COMING SOON ENOUGH 
Sentiment(polarity=0.0, subjectivity=0.5)
--------------------
drinking a McDonalds coffee and not understanding why someone would hurt me for no apparent reason. 
Sentiment(polarity=-0.025, subjectivity=0.35)
--------------------
So dissapointed Taylor Swift doesnt have a Twitter 
Sentiment(polarity=0.0, subjectivity=0.0)
--------------------
Wishes I was on the Spring Fling Tour with Dawn &amp; neecee Sigh  G'knight
Sentiment(polarity=0.0, subjectivity=0.0)
--------------------
got a sniffle, got the kids and hubby just left to work in Sydney for the weekend, boo hoo 
Sentiment(polarity=0.0, subjectivity=0.
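
To turn polarity scores like these into a baseline accuracy, each score has to be mapped to a class label and compared with the dataset's `polarity` column. The sketch below does this for the five tweets whose full output is shown above. It rests on two assumptions that this notebook does not confirm: that a dataset `polarity` of 0 means negative and 4 means positive, and that a TextBlob score of exactly 0.0 should count as positive. The helper names `to_label` and `baseline_accuracy` are illustrative.

```python
# TextBlob polarity scores copied from the first five outputs above.
textblob_polarity = [0.0, -0.3166666666666666, 0.0, -0.025, 0.0]
# Gold labels for the same rows, taken from the `polarity` column of df.head().
gold_polarity = [0, 0, 0, 0, 0]

NEGATIVE, POSITIVE = 0, 4  # assumed label encoding of the dataset


def to_label(score, threshold=0.0):
    """Map a TextBlob polarity score to a dataset label.

    Scores strictly below the threshold are treated as negative.
    """
    return NEGATIVE if score < threshold else POSITIVE


def baseline_accuracy(scores, gold):
    """Fraction of scores whose mapped label equals the gold label."""
    predicted = [to_label(s) for s in scores]
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


print(baseline_accuracy(textblob_polarity, gold_polarity))  # 0.4 on these five rows
```

For the whole dataset, the scores could be collected with `[TextBlob(t).sentiment.polarity for t in tweets_text_collection]` and the gold labels with `list(df['polarity'])`. Three of the five tweets score exactly 0.0, so the way neutral scores are handled has a large effect on the result.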

### create a custom classifier

TextBlob has many more features; refer to the official documentation for them.

In [49]:
train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg')
]

In [50]:
test = [
    ('the beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]

In [51]:
from textblob.classifiers import NaiveBayesClassifier

In [52]:
import nltk
from nltk.data import load

#### manually download the tokenizer
The tokenizer has to be installed by hand because the following command can't access https://www.nltk.org/nltk_data/:
nltk.download('punkt')

In [53]:
# !mkdir -p /root/nltk_data/tokenizers
# upload punkt.zip (punkt.zip is available from https://www.nltk.org/nltk_data/)
# !cp /home/jupyter/projects/knowledge-sharing-sessions/2022.10.04.NLP/punkt.zip /root/nltk_data/tokenizers
# extract into the tokenizers directory (without -d, unzip extracts into the current directory)
# !unzip /root/nltk_data/tokenizers/punkt.zip -d /root/nltk_data/tokenizers
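
The same manual installation can be done from Python with the standard library, which avoids depending on shell tools. This is a minimal sketch: the function name `install_nltk_zip` and the paths in the usage comment are hypothetical, and it assumes the archive unpacks to a `punkt/` subdirectory.

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, nltk_data_dir, category="tokenizers"):
    """Extract a manually downloaded NLTK package zip into <nltk_data_dir>/<category>/."""
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    with zipfile.ZipFile(zip_path) as archive:
        # punkt.zip is assumed to contain a top-level punkt/ folder
        archive.extractall(target)
    return target


# Hypothetical usage; adjust both paths to your environment:
# install_nltk_zip("punkt.zip", Path.home() / "nltk_data")
```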

#### create model

In [54]:
cl = NaiveBayesClassifier(train)

#### classify text

In [56]:
cl.classify("This is an amazing library!")

'pos'

In [57]:
cl.classify("enemy")

'neg'

#### classify sentences within a blob

In [59]:
blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=cl)

In [60]:
for s in blob.sentences:
    print(s)
    print(s.classify())

The beer is good.
pos
But the hangover is horrible.
neg


#### evaluate performance

In [61]:
cl.accuracy(test)

0.8333333333333334
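
An accuracy of 0.8333 on six examples corresponds to five correct predictions and one error, but the score alone does not say which example was missed. The sketch below lists the misclassified examples for any function that maps a text to a label. The helper names are illustrative, and `stand_in_classify` is a hypothetical rule-based stand-in used only so the sketch runs without a trained model; it is not the classifier trained above.

```python
test = [
    ('the beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg'),
]


def misclassified(classify, labeled_examples):
    """Return (text, expected, predicted) for every example the classifier gets wrong."""
    errors = []
    for text, expected in labeled_examples:
        predicted = classify(text)
        if predicted != expected:
            errors.append((text, expected, predicted))
    return errors


def accuracy(classify, labeled_examples):
    """Fraction of examples whose predicted label equals the expected label."""
    wrong = len(misclassified(classify, labeled_examples))
    return (len(labeled_examples) - wrong) / len(labeled_examples)


def stand_in_classify(text):
    """Hypothetical stand-in: 'neg' if the word 'not' occurs, otherwise 'pos'."""
    return "neg" if "not" in text.split() else "pos"


for text, expected, predicted in misclassified(stand_in_classify, test):
    print(f"expected {expected}, got {predicted}: {text}")
```

In the notebook itself, `misclassified(cl.classify, test)` would show the example behind the missing accuracy.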

#### show informative features

In [62]:
cl.show_informative_features(5) 

Most Informative Features
            contains(my) = True              neg : pos    =      1.7 : 1.0
            contains(an) = False             neg : pos    =      1.6 : 1.0
             contains(I) = False             pos : neg    =      1.4 : 1.0
             contains(I) = True              neg : pos    =      1.4 : 1.0
            contains(my) = False             pos : neg    =      1.3 : 1.0
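
The ratios in this table can be reproduced by hand, which helps in reading them. The sketch below assumes that each feature is a presence flag for one word of the training vocabulary, and that each probability is smoothed by adding 0.5 to the count of each of the two outcomes. That assumption matches the numbers printed above, but it is a reading of the output, not a statement about TextBlob's internals. The regex tokenizer is also a simplification.

```python
import re

train = [
    ('I love this sandwich.', 'pos'),
    ('this is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('this is my best work.', 'pos'),
    ("what an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('he is my sworn enemy!', 'neg'),
    ('my boss is horrible.', 'neg'),
]


def tokens(text):
    """Rough case-sensitive tokenizer (a simplification of real word tokenization)."""
    return set(re.findall(r"[A-Za-z']+", text))


def smoothed_prob(word, present, label, data, gamma=0.5):
    """Estimate P(contains(word) == present | label) with add-gamma smoothing."""
    docs = [tokens(text) for text, lab in data if lab == label]
    hits = sum((word in doc) == present for doc in docs)
    return (hits + gamma) / (len(docs) + 2 * gamma)  # two outcomes: True, False


def likelihood_ratio(word, present, label_a, label_b, data):
    """How much more likely the feature value is under label_a than under label_b."""
    return (smoothed_prob(word, present, label_a, data)
            / smoothed_prob(word, present, label_b, data))


print(round(likelihood_ratio("my", True, "neg", "pos", train), 1))   # 1.7
print(round(likelihood_ratio("an", False, "neg", "pos", train), 1))  # 1.6
print(round(likelihood_ratio("my", False, "pos", "neg", train), 1))  # 1.3
```

For example, "my" occurs in two of the five negative sentences and one of the five positive ones, giving (2 + 0.5) / (5 + 1) against (1 + 0.5) / (5 + 1), a ratio of about 1.7. With only ten training sentences, such ratios mostly reflect chance word usage.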


#### update classifier with new information

In [63]:
new_data = [('She is my best friend.', 'pos'),
    ("I'm happy to have a new friend.", 'pos'),
    ("Stay thirsty, my friend.", 'pos'),
    ("He ain't from around here.", 'neg')]

In [64]:
cl.update(new_data)

True

In [65]:
cl.accuracy(test)

1.0