**Installing BERTopic**

We start by installing BERTopic from PyPI:

In [None]:
%%capture
!pip install bertopic

**Data**

In [None]:
import re
import pandas as pd
from datetime import datetime

# Load the two SMS datasets; sms.csv is read with latin1 encoding
sms_one = pd.read_csv('new.csv')
sms_two = pd.read_csv('sms.csv', encoding='latin1')

In [None]:
del sms_one['Label']
sms_one

Unnamed: 0,Message
0,"Dear Student, Its never too late-clear your ba..."
1,Join V-STUDY and score excellent marks in clas...
2,"Join crash courses for B.ST,A/C'S,ECO,ENG,&IP ..."
3,CRASH COURSES by BEST POOL OF FACULTY. ENGLISH...
4,"Dear Ola Shuttle user, get 60% Off on your nex..."
...,...
179,"SPOT ADMISSIONS FOR FORENSIC SCIENCE, Cardiac ..."
180,"CBSE Private Exam 2018, Forms are starting for..."
181,If you receive offer of lottery winnings or ch...
182,Gokul sent you a Blue Packet which expires in ...


In [None]:
sms_two = sms_two.drop(['Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3'], axis=1)
sms_two = sms_two.rename(columns={'sms': 'Message'})
sms_two

Unnamed: 0,Message
0,"Go until jurong point, crazy.. Available only ..."
1,Ok lar... Joking wif u oni...
2,Free entry in 2 a wkly comp to win FA Cup fina...
3,U dun say so early hor... U c already then say...
4,"Nah I don't think he goes to usf, he lives aro..."
...,...
5567,This is the 2nd time we have tried 2 contact u...
5568,Will Ì_ b going to esplanade fr home?
5569,"Pity, * was in mood for that. So...any other s..."
5570,The guy did some bitching but I acted like i'd...


In [None]:
sms = pd.concat([sms_one, sms_two])
sms

Unnamed: 0,Message
0,"Dear Student, Its never too late-clear your ba..."
1,Join V-STUDY and score excellent marks in clas...
2,"Join crash courses for B.ST,A/C'S,ECO,ENG,&IP ..."
3,CRASH COURSES by BEST POOL OF FACULTY. ENGLISH...
4,"Dear Ola Shuttle user, get 60% Off on your nex..."
...,...
5567,This is the 2nd time we have tried 2 contact u...
5568,Will Ì_ b going to esplanade fr home?
5569,"Pity, * was in mood for that. So...any other s..."
5570,The guy did some bitching but I acted like i'd...


# Aspect-Based Sentiment Analysis Using spaCy & TextBlob

In [None]:
# We get started by importing spacy
import spacy
nlp = spacy.load("en_core_web_sm")



Our first goal is to split each message into the target aspects (e.g. food) and the sentiment descriptions attached to them (e.g. delicious).

In [None]:
for sentence in sms.Message:
  doc = nlp(sentence)
  for token in doc:
    print(token.text, token.dep_, token.head.text, token.head.pos_,
      token.pos_,[child for child in token.children])

Streaming output truncated to the last 5000 lines.
) punct hate VERB PUNCT []
Hey intj Congrats NOUN INTJ []
! punct Congrats NOUN PUNCT []
Congrats ROOT Congrats NOUN NOUN [Hey, !, 2u2, .]
2u2 nummod Congrats NOUN NUM []
. punct Congrats NOUN PUNCT []
i nsubj had VERB NOUN [luv]
d compound luv PROPN PROPN []
luv appos i NOUN PROPN [d, 2]
2 nummod luv PROPN NUM []
but cc had VERB CCONJ []
i nsubj had VERB PRON []
ve aux had VERB AUX []
had ROOT had VERB VERB [i, but, i, ve, go, !]
2 nsubj go VERB NUM []
go ccomp had VERB VERB [2, home]
home advmod go VERB ADV []
! punct had VERB PUNCT []
Dear ROOT Dear ADJ ADJ [you, .]
where advmod you PRON SCONJ []
you ccomp Dear ADJ PRON [where]
. punct Dear ADJ PUNCT []
Call ROOT Call VERB VERB [me]
me dobj Call VERB PRON []
Xy nsubj trying VERB PROPN []
trying ROOT trying VERB VERB [Xy, smth, now, .]
smth dobj trying VERB NOUN []
now advmod trying VERB ADV []
. punct trying VERB PUNCT []
U nsubj eat VERB NOUN []
eat ROOT eat VERB VERB

For each token in a message, spaCy’s dependency parser gives us the dependency label and head, and its tagger gives us the POS (Part-Of-Speech) tag. We also print the child tokens, because they let us pick up intensifiers such as “very” and “quite”.

**Disclaimer**: Our current simplistic algorithm may not be able to pick up semantically important information such as the “not” in “not great” at the moment. That would be crucial to account for in a real-life application.
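
One possible way to keep the “not” is to look for tokens with the `neg` dependency label, both among the adjective’s own children and among the children of its head. The sketch below is only an illustration: `Tok` and `attach` are hypothetical stand-ins for spaCy tokens, and the parse of “The food is not great” is built by hand on the assumption that “not” and “great” both attach to “is”. Real parser output on SMS text may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Tok:
    """Hypothetical stand-in exposing the spaCy Token attributes used here."""
    text: str
    pos_: str
    dep_: str
    head: Optional["Tok"] = None
    children: List["Tok"] = field(default_factory=list)


def attach(child: Tok, head: Tok) -> None:
    """Link a child to its head the way a dependency parse would."""
    child.head = head
    head.children.append(child)


def describe(adj: Tok) -> str:
    """Return the adjective, prefixed by any negation on it or on its head."""
    candidates = list(adj.children)
    if adj.head is not None and adj.head is not adj:
        candidates += adj.head.children
    negations = [t.text for t in candidates if t.dep_ == "neg"]
    return " ".join(negations + [adj.text])


# Hand-built parse of "The food is not great" (assumed shape, not parser output).
is_ = Tok("is", "AUX", "ROOT")
food = Tok("food", "NOUN", "nsubj")
not_ = Tok("not", "PART", "neg")
great = Tok("great", "ADJ", "acomp")
for child in (food, not_, great):
    attach(child, is_)

print(describe(great))  # not great
```

With real spaCy tokens the same `describe` logic should carry over, since `dep_`, `head`, and `children` are the attributes already printed in the loop above.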

Let’s see how to pick up the sentiment descriptions first.

In [None]:
for sentence in sms.Message:
  doc = nlp(sentence)
  descriptive_term = ''
  for token in doc:
    if token.pos_ == 'ADJ':
      descriptive_term = token
  print(sentence)
  print(descriptive_term)

Streaming output truncated to the last 5000 lines.
Somebody should go to andros and steal ice

Don know. I did't msg him recently.

Take us out shopping and Mark will distract Isaiah.=D

Mum, hope you are having a great day. Hoping this text meets you well and full of life. Have a great day. Abiola
great
There is no sense in my foot and penis.

Okay but i thought you were the expert

*deep sigh* ... I miss you :-( ... I am really surprised you haven't gone to the net cafe yet to get to me ... Don't you miss me?
net
S.s:)i thinl role is like sachin.just standing. Others have to hit.

Have a great trip to India. And bring the light to everyone not just with the project but with everyone that is lucky to see you smile. Bye. Abiola
lucky
And very importantly, all we discuss is between u and i only.

K..k:)how about your training process?

Ok lor. I ned 2 go toa payoh 4 a while 2 return smth u wan 2 send me there or wat?

In da car park 
da
I wish that I was with you. Holding 

Our simplistic algorithm picks up descriptive adjectives such as “great” and “lucky”. It keeps only the last adjective in each message, and it inherits tagging mistakes on informal text (“net” and “da” above). What’s still missing are intensifiers such as “very”.

In [None]:
for sentence in sms.Message:
  doc = nlp(sentence)
  descriptive_term = ''
  for token in doc:
    if token.pos_ == 'ADJ':
      prepend = ''
      for child in token.children:
        if child.pos_ != 'ADV':
          continue
        prepend += child.text + ' '
      descriptive_term = prepend + token.text
  print(sentence)
  print(descriptive_term)

Streaming output truncated to the last 5000 lines.
Somebody should go to andros and steal ice

Don know. I did't msg him recently.

Take us out shopping and Mark will distract Isaiah.=D

Mum, hope you are having a great day. Hoping this text meets you well and full of life. Have a great day. Abiola
great
There is no sense in my foot and penis.

Okay but i thought you were the expert

*deep sigh* ... I miss you :-( ... I am really surprised you haven't gone to the net cafe yet to get to me ... Don't you miss me?
net
S.s:)i thinl role is like sachin.just standing. Others have to hit.

Have a great trip to India. And bring the light to everyone not just with the project but with everyone that is lucky to see you smile. Bye. Abiola
lucky
And very importantly, all we discuss is between u and i only.

K..k:)how about your training process?

Ok lor. I ned 2 go toa payoh 4 a while 2 return smth u wan 2 send me there or wat?

In da car park 
da
I wish that I was with you. Holding 

The algorithm now checks the child tokens of each adjective and prepends any adverbs it finds, such as “very” or “actually”, so a phrase like “half dead” is kept together. The messages visible in the truncated output above have no adverb attached to their last adjective, so their results are unchanged.
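
Because `descriptive_term` is overwritten on every adjective, only the last one in a message survives. A minimal sketch of one alternative is to collect a phrase per adjective and to prepend only the adverbs that precede it. `Tok` is again a hypothetical stand-in for spaCy tokens (its `i` field mirrors the token position), and the tags for “very slow but cheap” are assigned by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Tok:
    """Hypothetical stand-in for the spaCy Token attributes used here."""
    i: int      # position of the token in the message
    text: str
    pos_: str
    children: List["Tok"] = field(default_factory=list)


def adjective_phrases(doc: List[Tok]) -> List[str]:
    """Return one phrase per adjective, prefixed by its preceding adverbs."""
    phrases = []
    for token in doc:
        if token.pos_ != "ADJ":
            continue
        adverbs = [c.text for c in token.children
                   if c.pos_ == "ADV" and c.i < token.i]
        phrases.append(" ".join(adverbs + [token.text]))
    return phrases


# Hand-built tokens for "very slow but cheap" (assumed tags, not parser output).
very = Tok(0, "very", "ADV")
slow = Tok(1, "slow", "ADJ", children=[very])
but = Tok(2, "but", "CCONJ")
cheap = Tok(3, "cheap", "ADJ")

print(adjective_phrases([very, slow, but, cheap]))  # ['very slow', 'cheap']
```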

We’re now ready to identify the targets that are being described.

In [None]:
aspects = []
for sentence in sms.Message:
  doc = nlp(sentence)
  descriptive_term = ''
  target = ''
  for token in doc:
    if token.dep_ == 'nsubj' and token.pos_ == 'NOUN':
      target = token.text
    if token.pos_ == 'ADJ':
      prepend = ''
      for child in token.children:
        if child.pos_ != 'ADV':
          continue
        prepend += child.text + ' '
      descriptive_term = prepend + token.text
  aspects.append({'sentence': sentence, 'aspect': target,
    'description': descriptive_term})
print(aspects)



Now our solution is starting to look more complete. We’re able to pick up aspects, even though our application doesn’t “know” anything beforehand. We haven’t hardcoded the aspects such as “schedule”, “Students”, or “user”. And we also haven’t hardcoded the adjectives such as “daily”, “excellent”, or “first”.
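
Many messages yield no noun subject or no adjective, so their records come back with an empty aspect or description. As a rough sketch, such records can be filtered out before scoring. The sample list below is hypothetical and only mirrors a few records from this notebook's output; the name `sample_aspects` is chosen so it does not overwrite the real `aspects` list.

```python
# Hypothetical sample mirroring a few extracted records.
sample_aspects = [
    {"aspect": "schedule", "description": "daily"},
    {"aspect": "Students", "description": "excellent"},
    {"aspect": "", "description": ""},
    {"aspect": "SURI", "description": ""},
    {"aspect": "", "description": "DEAD"},
]

# Keep only records where both an aspect and a description were found.
complete = [a for a in sample_aspects if a["aspect"] and a["description"]]

print(len(sample_aspects), len(complete))  # 5 2
```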

Now that we successfully extracted the aspects and descriptions, it’s time to classify them as positive or negative. The goal here is to help the computer understand that tasty food is positive, while slow internet is negative. Computers don’t understand English, so we will need to try a few things before we have a working solution.

We will start off by using the default TextBlob sentiment analysis.

In [None]:
aspects

[{'aspect': 'schedule',
  'description': 'daily',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': 'Students',
  'description': 'excellent',
  'sentiment': Sentiment(polarity=1.0, subjectivity=1.0)},
 {'aspect': '',
  'description': '',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': 'SURI',
  'description': '',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': 'user',
  'description': 'first',
  'sentiment': Sentiment(polarity=0.25, subjectivity=0.3333333333333333)},
 {'aspect': '',
  'description': '',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': '',
  'description': 'DEAD',
  'sentiment': Sentiment(polarity=-0.2, subjectivity=0.4)},
 {'aspect': 'Beauties',
  'description': '',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': 'tunes',
  'description': 'bhi',
  'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)},
 {'aspect': '',
  'description': '',
  'sentiment': Sent

In [None]:
from textblob import TextBlob

# Score each extracted description with TextBlob's default analyzer
for aspect in aspects:
  aspect['sentiment'] = TextBlob(aspect['description']).sentiment
print(aspects)

[{'aspect': 'schedule', 'description': 'daily', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'Students', 'description': 'excellent', 'sentiment': Sentiment(polarity=1.0, subjectivity=1.0)}, {'aspect': '', 'description': '', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'SURI', 'description': '', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'user', 'description': 'first', 'sentiment': Sentiment(polarity=0.25, subjectivity=0.3333333333333333)}, {'aspect': '', 'description': '', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': '', 'description': 'DEAD', 'sentiment': Sentiment(polarity=-0.2, subjectivity=0.4)}, {'aspect': 'Beauties', 'description': '', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': 'tunes', 'description': 'bhi', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect': '', 'description': '', 'sentiment': Sentiment(polarity=0.0, subjectivity=0.0)}, {'aspect'
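
TextBlob returns a polarity score between -1 and 1 rather than a positive or negative class, so a thresholding step is still needed. The sketch below shows one way to do it. The `polarity_label` helper and its 0.1 threshold are assumptions made for illustration, and `Sentiment` is a stand-in namedtuple with the same two fields as the tuples printed above.

```python
from collections import namedtuple

# Stand-in with the same fields as the Sentiment tuples printed above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def polarity_label(sentiment, threshold=0.1):
    """Map a polarity score to a label; the threshold is an arbitrary assumption."""
    if sentiment.polarity > threshold:
        return "positive"
    if sentiment.polarity < -threshold:
        return "negative"
    return "neutral"


# Scores copied from the output above.
samples = {
    "daily": Sentiment(0.0, 0.0),
    "excellent": Sentiment(1.0, 1.0),
    "first": Sentiment(0.25, 0.3333333333333333),
    "DEAD": Sentiment(-0.2, 0.4),
}
for description, sentiment in samples.items():
    print(description, polarity_label(sentiment))
```

Note that “first” ends up labelled positive here, which illustrates how a lexicon score on a single word can mislead.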

In [1]:
#pip install pyabsa

In [None]:
from pyabsa import AspectTermExtraction as ATEPC, available_checkpoints

# you can view all available checkpoints by calling available_checkpoints()
checkpoint_map = available_checkpoints()

No CUDA GPU found in your device




[2023-03-08 12:13:38] (2.1.2) PyABSA(2.1.2): 
[New Feature] Aspect Sentiment Triplet Extraction from v2.1.0 test version (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration)

If you find any problems, please report them on GitHub. Thanks!
The v2.x versions are not compatible with Google Colab. Please downgrade to 1.16.27.

[2023-03-08 12:13:38] (2.1.2) Please specify the task code, e.g. from pyabsa import TaskCodeOption


  _warn(f"unclosed running multiprocessing pool {self!r}",


In [None]:
aspect_extractor = ATEPC.AspectExtractor('multilingual',
                                         auto_device=False,  # False means load model on CPU
                                         cal_perplexity=True,
                                         )

[2023-03-08 12:13:46] (2.1.2) ********** Available ATEPC model checkpoints for Version:2.1.2 (this version) **********
[2023-03-08 12:13:46] (2.1.2) Downloading checkpoint:multilingual 
[2023-03-08 12:13:46] (2.1.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets


Downloading checkpoint: 809MB [00:39, 20.72MB/s]                         

Find zipped checkpoint: ./checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc_Multilingual_cdw_apcacc_80.81_apcf1_73.75_atef1_76.01.zip, unzipping





Done.
[2023-03-08 12:14:43] (2.1.2) If the auto-downloading failed, please download it via browser: https://huggingface.co/spaces/yangheng/PyABSA/resolve/main/checkpoints/Multilingual/ATEPC/fast_lcf_atepc_Multilingual_cdw_apcacc_80.81_apcf1_73.75_atef1_76.01.zip 
[2023-03-08 12:14:43] (2.1.2) Load aspect extractor from ./checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT
[2023-03-08 12:14:43] (2.1.2) config: ./checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.config
[2023-03-08 12:14:43] (2.1.2) state_dict: ./checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.state_dict
[2023-03-08 12:14:43] (2.1.2) model: None
[2023-03-08 12:14:43] (2.1.2) tokenizer: ./checkpoints/ATEPC_MULTILINGUAL_CHECKPOINT/fast_lcf_atepc.tokenizer
[2023-03-08 12:14:46] (2.1.2) Set Model Device: cpu
[2023-03-08 12:14:46] (2.1.2) Device Name: Unknown



Some weights of the model checkpoint at microsoft/mdeberta-v3-base were not used when initializing DebertaV2Model: ['lm_predictions.lm_head.LayerNorm.weight', 'mask_predictions.LayerNorm.weight', 'lm_predictions.lm_head.dense.bias', 'mask_predictions.classifier.weight', 'lm_predictions.lm_head.LayerNorm.bias', 'mask_predictions.classifier.bias', 'mask_predictions.dense.bias', 'lm_predictions.lm_head.dense.weight', 'lm_predictions.lm_head.bias', 'mask_predictions.dense.weight', 'mask_predictions.LayerNorm.bias']
- This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# instance inference
aspect_extractor.predict(['I love this movie, it is so great!'],
                         save_result=True,
                         print_result=True,  # print the result
                         ignore_error=True,  # ignore the error when the model cannot predict the input
                         )

  lcf_cdm_vec = torch.tensor(


[2023-03-08 12:16:48] (2.1.2) The results of aspect term extraction have been saved in /content/Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json
[2023-03-08 12:16:48] (2.1.2) Example 0: I love this <movie:Positive Confidence:0.9811> , it is so great !


  float(x) for x in F.softmax(i_apc_logits).cpu().numpy().tolist()


[{'sentence': 'I love this movie , it is so great !',
  'IOB': ['O', 'O', 'O', 'B-ASP', 'O', 'O', 'O', 'O', 'O', 'O'],
  'tokens': ['I',
   'love',
   'this',
   'movie',
   ',',
   'it',
   'is',
   'so',
   'great',
   '!'],
  'aspect': ['movie'],
  'position': [[4]],
  'sentiment': ['Positive'],
  'probs': [[0.004690156318247318, 0.014222261495888233, 0.9810876250267029]],
  'confidence': [0.9811]}]
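
Each result stores its aspects, sentiments, and confidences as parallel lists, so a message with several aspects needs to be unpacked before it can be tabulated. A minimal sketch follows, using the result above trimmed to the fields it needs; the `flatten` helper is hypothetical and not part of PyABSA.

```python
# Result copied from the output above, trimmed to the fields used here.
results = [{
    "sentence": "I love this movie , it is so great !",
    "aspect": ["movie"],
    "sentiment": ["Positive"],
    "confidence": [0.9811],
}]


def flatten(results):
    """Yield one (sentence, aspect, sentiment, confidence) row per aspect."""
    for result in results:
        for aspect, sentiment, confidence in zip(
            result["aspect"], result["sentiment"], result["confidence"]
        ):
            yield result["sentence"], aspect, sentiment, confidence


rows = list(flatten(results))
print(rows)
```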

In [None]:
# Unused below: the SMS messages are passed to batch_predict directly
inference_source = ATEPC.ATEPCDatasetList.Restaurant16
atepc_result = aspect_extractor.batch_predict(target_file=sms.Message.to_list(),  # list of raw SMS texts
                                              save_result=True,
                                              print_result=True,  # print the result
                                              pred_sentiment=True,  # Predict the sentiment of extracted aspect terms
                                              )
print(atepc_result)

preparing apc inference dataloader: 100%|██████████| 5756/5756 [00:17<00:00, 336.38it/s]
extracting aspect terms: 100%|██████████| 180/180 [40:06<00:00, 13.37s/it]
preparing apc inference dataloader: 100%|██████████| 2468/2468 [00:05<00:00, 440.85it/s]
  float(x) for x in F.softmax(i_apc_logits).cpu().numpy().tolist()
classifying aspect sentiments: 100%|██████████| 78/78 [17:26<00:00, 13.42s/it]


Streaming output truncated to the last 5000 lines.
[2023-03-08 13:30:09] (2.1.2) Example 687: Lolnice . I went from a <fish:Neutral Confidence:0.9901> to . . <water:Neutral Confidence:0.9763> . ?
[2023-03-08 13:30:09] (2.1.2) Example 688: + 123 Congratulations - in this week ' s competition draw u have won the å£1450 prize to claim just call 09050002311 b4280703 . T & Cs / stop SMS 08718727868 . Over 18 only 150ppm
[2023-03-08 13:30:09] (2.1.2) Example 689: No it ' s <waiting:Negative Confidence:0.9623> in e car dat ' s bored wat . Cos wait outside got nothing 2 do . At home can do my stuff or watch tv wat .
[2023-03-08 13:30:09] (2.1.2) Example 690: Maybe westshore or <hyde:Neutral Confidence:0.9691> park village , the place near my house ?
[2023-03-08 13:30:09] (2.1.2) Example 691: You should know now . So how ' s <anthony:Neutral Confidence:0.6199> . Are you bringing money . I ' ve school fees to pay and rent and stuff like that . Thats why i need your help . A friend 
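
Printing every example for thousands of messages produces a very large output, so a compact summary can be easier to read. The sketch below tallies the predicted labels. It assumes that `batch_predict` returns a list of dicts with the same `sentiment` and `confidence` keys as the single-sentence result shown earlier, which is not verified here. The `sentiment_counts` helper is hypothetical, and the sample data is hand-written to mirror the logged examples above.

```python
from collections import Counter


def sentiment_counts(results, min_confidence=0.0):
    """Count predicted labels, skipping aspects below min_confidence."""
    counts = Counter()
    for result in results:
        for label, confidence in zip(result["sentiment"], result["confidence"]):
            if confidence >= min_confidence:
                counts[label] += 1
    return counts


# Hand-written sample mirroring the logged examples above.
sample = [
    {"aspect": ["fish", "water"], "sentiment": ["Neutral", "Neutral"],
     "confidence": [0.9901, 0.9763]},
    {"aspect": [], "sentiment": [], "confidence": []},
    {"aspect": ["waiting"], "sentiment": ["Negative"], "confidence": [0.9623]},
    {"aspect": ["anthony"], "sentiment": ["Neutral"], "confidence": [0.6199]},
]

print(sentiment_counts(sample))
print(sentiment_counts(sample, min_confidence=0.9))
```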


