# Sentiment Analysis of Tweets Using Natural Language Processing (NLP)

Import the required libraries and load the data, which was scraped from Twitter for a set of online retailers.

In [1]:
import pandas as pd 
df = pd.read_csv("Twitter_Retailer_data.csv")
df.head(10)

Unnamed: 0,Datetime,Tweet Id,Name,tweet
0,2021-12-26 23:59:58+00:00,1475255097483579402,JSOH13597,If you are buying shirts on eBay avoid this se...
1,2021-12-26 23:59:58+00:00,1475255095730348032,SACellPhonesPro,"3-in-1 Charging Dock for Phone, Apple Watch &a..."
2,2021-12-26 23:59:54+00:00,1475255078537863171,primitivebowls,Check out BIG DAIRY Milking COW Wall PICTURE*P...
3,2021-12-26 23:59:48+00:00,1475255055980843008,StagnoAmanda,Look what I found on @eBay! https://t.co/wQPB5...
4,2021-12-26 23:59:48+00:00,1475255053657247748,BenBestDeals,Universal Nutrition Buffered Vitamin C Pills 1...
5,2021-12-26 23:59:45+00:00,1475255041116323844,tsukiisms,@puppyboylix mwah! also literally drop shipper...
6,2021-12-26 23:59:33+00:00,1475254992252850178,koolest77,New Listing! Code Geass Lelouch of the Rebelli...
7,2021-12-26 23:59:33+00:00,1475254990855974917,Lisasebaystore,Check out Vintage Amber Glass Candy Nut Dish S...
8,2021-12-26 23:59:31+00:00,1475254982270230532,MMBSports,Check out 1992 Pacific Detroit Lions Dan Owens...
9,2021-12-26 23:59:26+00:00,1475254963077066762,GHCardSales,Next up are a couple of pickups from eBay! Rea...


Extracting the tweet text for the online retailers.

In [2]:
df["tweet"]

0        If you are buying shirts on eBay avoid this se...
1        3-in-1 Charging Dock for Phone, Apple Watch &a...
2        Check out BIG DAIRY Milking COW Wall PICTURE*P...
3        Look what I found on @eBay! https://t.co/wQPB5...
4        Universal Nutrition Buffered Vitamin C Pills 1...
                               ...                        
30977    @Meesho_Official Order I'd #5190930308. please...
30978    @Meesho_Official #meeshosupport #meeshoteam u ...
30979    @Meesho_Official #Meeshoteam #Meeshosupport #M...
30980    hello Meesho Supoort\n\ni have buy sum product...
30981    @meeshoapp \n@meeshosupport\n@ceomeesho\n\nMee...
Name: tweet, Length: 30982, dtype: object

## Data cleaning

Tweets contain many links, @mentions, hashtags, and similar noise. Once the text is cleaned, we will predict a review-style rating for each tweet.

In [3]:
import re
def cleaner(tweet):
    # Remove @mentions (letters and digits only, so an underscore ends the match)
    tweet = re.sub("@[A-Za-z0-9]+","",tweet)
    # Remove any remaining @handles and http/https/www links
    tweet = re.sub(r"(?:\@|http?\://|https?\://|www)\S+", "", tweet)
    tweet = " ".join(tweet.split()) # Collapse repeated whitespace and newlines
    tweet = tweet.replace("#", "").replace("_", " ") # Remove hashtag sign but keep the text
    return tweet
df['tweet'] = df['tweet'].map(cleaner)
df.to_csv('clean.csv', index = False) # specify location
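
One detail of the cleaning step is easy to miss: Twitter handles may contain underscores, but the first pattern stops at one, so a handle such as `@Meesho_Official` leaves the fragment `Official` behind. The sketch below reproduces the notebook's steps next to a hypothetical variant (`cleaner_variant` and the sample tweet are illustrative assumptions, not part of the original notebook) so the difference can be seen on one input:

```python
import re

def cleaner_original(tweet):
    """Same steps as the notebook's cleaner, reproduced for comparison."""
    tweet = re.sub("@[A-Za-z0-9]+", "", tweet)
    tweet = re.sub(r"(?:\@|http?\://|https?\://|www)\S+", "", tweet)
    tweet = " ".join(tweet.split())
    return tweet.replace("#", "").replace("_", " ")

def cleaner_variant(tweet):
    """Hypothetical variant: handles may include underscores."""
    tweet = re.sub(r"@\w+", "", tweet)                    # whole handle, underscores included
    tweet = re.sub(r"(?:https?://|www\.)\S+", "", tweet)  # links
    tweet = tweet.replace("#", "").replace("_", " ")      # keep hashtag text
    return " ".join(tweet.split())                        # collapse whitespace last

# Made-up sample tweet for illustration only.
sample = "@Meesho_Official my #order_status is late https://t.co/abc123"
print(repr(cleaner_original(sample)))
print(repr(cleaner_variant(sample)))
```

Switching to the variant would change the cleaned text, so the outputs shown in the rest of this notebook correspond to the original `cleaner`.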

In [4]:
clean_df = pd.read_csv("clean.csv")
clean_df.head()

Unnamed: 0,Datetime,Tweet Id,Name,tweet
0,2021-12-26 23:59:58+00:00,1475255097483579402,JSOH13597,If you are buying shirts on eBay avoid this se...
1,2021-12-26 23:59:58+00:00,1475255095730348032,SACellPhonesPro,"3-in-1 Charging Dock for Phone, Apple Watch &a..."
2,2021-12-26 23:59:54+00:00,1475255078537863171,primitivebowls,Check out BIG DAIRY Milking COW Wall PICTURE*P...
3,2021-12-26 23:59:48+00:00,1475255055980843008,StagnoAmanda,Look what I found on ! via
4,2021-12-26 23:59:48+00:00,1475255053657247748,BenBestDeals,Universal Nutrition Buffered Vitamin C Pills 1...


With the cleaned dataset in hand, we move on to the NLP part. We use a pretrained Hugging Face model to predict a rating for each online-retailer tweet.

### Model Building

In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

In [6]:
tokenizer = AutoTokenizer.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

model = AutoModelForSequenceClassification.from_pretrained('nlptown/bert-base-multilingual-uncased-sentiment')

In [7]:
tokens = tokenizer.encode('It was good but couldve been better. Great', return_tensors='pt')
result = model(tokens)

In [8]:
result.logits

tensor([[-2.7768, -1.2353,  1.4419,  1.9804,  0.4584]],
       grad_fn=<AddmmBackward>)

In [9]:
int(torch.argmax(result.logits))+1

4
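
The five logits correspond to the model's five classes, which the `nlptown/bert-base-multilingual-uncased-sentiment` model card describes as ratings from 1 to 5 stars, so `argmax + 1` gives the predicted rating. As an illustrative sketch in plain Python, with the logits copied by hand from the output above, a softmax turns them into probabilities and shows how confident the prediction is:

```python
import math

# Logits copied from the output above (one value per star rating, 1 to 5).
logits = [-2.7768, -1.2353, 1.4419, 1.9804, 0.4584]

# Numerically stable softmax: subtract the max before exponentiating.
peak = max(logits)
exps = [math.exp(x - peak) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# argmax + 1 converts the zero-based class index to a star rating.
stars = probs.index(max(probs)) + 1

for rating, p in enumerate(probs, start=1):
    print(f"{rating} stars: {p:.3f}")
print("predicted rating:", stars)
```

The 4-star class receives a little over half of the probability mass, with most of the remainder on 3 stars, which fits the mixed wording of the test sentence.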

Creating a function to call the model:

In [10]:
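def sentiment_score(tweet):
    # Tokenize, truncating to BERT's 512-token input limit
    tokens = tokenizer.encode(tweet, return_tensors='pt', truncation=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        result = model(tokens)
    # Class indices 0-4 map to ratings 1-5
    return int(torch.argmax(result.logits))+1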

In [11]:
clean_df["tweet"].iloc[0]

"If you are buying shirts on eBay avoid this seller, finally refunded me after trying to avoid the return. Selling fake 'match worns' etc. Avoid, avoid, avoid. Tried to offer replacement shirts but only offered one shirt, can't believe it's sorted. The hunt continues for this one."

In [12]:
sentiment_score(clean_df["tweet"].iloc[0])

1

In [13]:
clean_df.isna().sum()

Datetime      0
Tweet Id      0
Name          0
tweet       196
dtype: int64
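
These 196 missing values do not necessarily mean the scraped tweets were empty. One plausible explanation (an assumption, not something the notebook verifies) is that some tweets consisted only of mentions and links, so `cleaner` reduced them to empty strings, and pandas reads empty CSV fields back as NaN. A minimal sketch with made-up rows:

```python
import io
import pandas as pd

# Made-up rows: the second tweet is empty, as it would be after cleaning
# a tweet that contained only mentions and links.
demo = pd.DataFrame({"Name": ["a", "b"], "tweet": ["great seller", ""]})

# Round-trip through CSV in memory, mirroring to_csv / read_csv in the notebook.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(demo["tweet"].isna().sum())      # missing values before the round trip
print(reloaded["tweet"].isna().sum())  # missing values after the round trip
```

Passing `keep_default_na=False` to `pd.read_csv` would keep such fields as empty strings instead. Dropping the rows is the simpler choice here, since an empty tweet has no sentiment to score.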

In [14]:
clean_df.dropna(inplace=True)

In [15]:
clean_df.isna().sum()

Datetime    0
Tweet Id    0
Name        0
tweet       0
dtype: int64

In [16]:
clean_df["sentiment"] = clean_df["tweet"].apply(sentiment_score)

In [17]:
clean_df.head()

Unnamed: 0,Datetime,Tweet Id,Name,tweet,sentiment
0,2021-12-26 23:59:58+00:00,1475255097483579402,JSOH13597,If you are buying shirts on eBay avoid this se...,1
1,2021-12-26 23:59:58+00:00,1475255095730348032,SACellPhonesPro,"3-in-1 Charging Dock for Phone, Apple Watch &a...",5
2,2021-12-26 23:59:54+00:00,1475255078537863171,primitivebowls,Check out BIG DAIRY Milking COW Wall PICTURE*P...,4
3,2021-12-26 23:59:48+00:00,1475255055980843008,StagnoAmanda,Look what I found on ! via,5
4,2021-12-26 23:59:48+00:00,1475255053657247748,BenBestDeals,Universal Nutrition Buffered Vitamin C Pills 1...,5


In [18]:
clean_df["sentiment"].value_counts()

5    11821
1     9217
2     3856
3     3048
4     2844
Name: sentiment, dtype: int64
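
Raw counts are easier to interpret as shares. The following sketch uses only the counts printed above (copied by hand) to compute each rating's share of the tweets and the mean predicted rating:

```python
# Counts copied from the value_counts() output above.
counts = {5: 11821, 1: 9217, 2: 3856, 3: 3048, 4: 2844}

total = sum(counts.values())
shares = {stars: n / total for stars, n in counts.items()}
mean_rating = sum(stars * n for stars, n in counts.items()) / total

for stars in sorted(counts):
    print(f"{stars} stars: {counts[stars]:>6} ({shares[stars]:.1%})")
print(f"total tweets: {total}")
print(f"mean rating: {mean_rating:.2f}")
```

The total of 30,786 matches the 30,982 loaded rows minus the 196 dropped ones. The distribution is polarized: 5-star and 1-star predictions together account for roughly two thirds of the tweets.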

### Data visualization

In [19]:
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline
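
No plot follows these imports. As a sketch of one chart that would fit here (an assumption about what to visualize, not the author's original figure), a bar chart of the predicted ratings can be drawn with matplotlib from the counts reported earlier:

```python
import matplotlib.pyplot as plt

# Counts copied by hand from the value_counts() output earlier in the notebook.
counts = {1: 9217, 2: 3856, 3: 3048, 4: 2844, 5: 11821}

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar([str(k) for k in counts], list(counts.values()))
ax.set_xlabel("Predicted rating (stars)")
ax.set_ylabel("Number of tweets")
ax.set_title("Predicted sentiment distribution")
fig.tight_layout()
```

In the notebook itself, the hard-coded dictionary could be replaced with `clean_df["sentiment"].value_counts().sort_index()`. With `%matplotlib inline` the figure renders at the end of the cell; in a script, call `plt.show()`.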

In [22]:
clean_df.to_csv('Sentiment_added.csv', index = False)