# Sentiment Analysis of Twitter Reviews

### Tasks
+ Text Preprocessing
+ Sentiment Analysis
+ Keyword Extraction
+ Entity Extraction

### Dataset:
+ Tweets about various UK telecom companies, extracted via the Twitter API

#### Importing Modules

In [1]:
import pandas as pd
import neattext as nfx
import re

# Hide warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load the dataset

df = pd.read_csv('predicted.csv')

In [3]:
df.iloc[0:3,:]

| | created_at | id | author_id | text | name | location | username | clean_text | label | predicted_labels |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-03-01T14:24:59.000Z | 184833230.0 | 184833230.0 | @bt_uk @mikeburnieactor I am having the same p... | Michael Hutchings | Bristol, England | mikehutchings9 | i am having the same problem online support bo... | -1 | -1 |
| 1 | 2022-03-01T14:20:06.000Z | 412154216.0 | 412154216.0 | @henrystewartdam @TheresaRegli @McDonalds @Yal... | Preservica | Worldwide | Preservica | we cant wait | -1 | 1 |
| 2 | 2022-03-01T14:16:42.000Z | 311957759.0 | 311957759.0 | Why's does the TV listings in my @bt_uk box an... | DennisThynne | County Durham | DennisThynne | whys does the tv listings in my box and the pl... | -1 | -1 |


### Preprocessing

In [4]:
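# Drop rows with missing values, then confirm that none remain.
# Only the last expression of a cell is displayed, so the null counts are not shown.
df.dropna(inplace=True)
df.isnull().sum()
df.info()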

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 347 entries, 0 to 346
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   created_at        347 non-null    object 
 1   id                347 non-null    float64
 2   author_id         347 non-null    float64
 3   text              347 non-null    object 
 4   name              347 non-null    object 
 5   location          347 non-null    object 
 6   username          347 non-null    object 
 7   clean_text        347 non-null    object 
 8   label             347 non-null    int64  
 9   predicted_labels  347 non-null    int64  
dtypes: float64(2), int64(2), object(6)
memory usage: 27.2+ KB


## Text Preprocessing
### Tasks 
+ Remove mentions / hashtags
+ Replace contractions (words with an apostrophe) with their expanded forms
+ Apply the neattext cleaning functions: remove_emails, remove_emojis, remove_multiple_spaces, remove_puncts, remove_special_characters, remove_stopwords, remove_urls, remove_userhandles

#### Functions to clean the text

In [5]:
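import json

# Load the contraction -> expansion mapping (read-only access is enough)
with open('contractions.json', 'r') as file:
    contraction_dict = json.load(file)


def _get_contractions(contraction_dict):
    # Escape each key so any regex metacharacters in it are matched literally
    contraction_re = re.compile('(%s)' % '|'.join(re.escape(key) for key in contraction_dict))
    return contraction_dict, contraction_re

contractions, contractions_re = _get_contractions(contraction_dict)

def replace_contractions(text):
    # Lowercase first: the dictionary keys are assumed to be lowercase
    text = text.lower()
    def replace(match):
        return contractions[match.group(0)]
    # Normalize curly apostrophes so they match the straight ones in the keys
    return contractions_re.sub(replace, text.replace("’","'"))

# Usage
df['text'] = df['text'].apply(replace_contractions)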
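
The contents of contractions.json are not shown, so the following is only a minimal, self-contained sketch of the same idea. The `CONTRACTIONS` mapping and the `expand_contractions` name are hypothetical stand-ins, not the notebook's actual data or function. It also sorts the keys longest-first, which is an assumption about how overlapping keys should be resolved.

```python
import re

# Hypothetical, tiny stand-in for contractions.json (the real file is not shown).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "it's": "it is",
    "i'm": "i am",
}

# Longest keys first so a longer key wins over a shorter one that shares a prefix;
# re.escape guards against keys that contain regex metacharacters.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(CONTRACTIONS, key=len, reverse=True))
)

def expand_contractions(text):
    """Lowercase the text, normalize curly apostrophes, and expand known contractions."""
    text = text.lower().replace("\u2019", "'")
    return _PATTERN.sub(lambda match: CONTRACTIONS[match.group(0)], text)

print(expand_contractions("I\u2019m sure it's fine, but it won't record"))
# prints: i am sure it is fine, but it will not record
```

The pattern has no word boundaries, so a key can also match inside a longer token. That is acceptable for a sketch but worth checking against real tweets.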

In [6]:
# print(df.iloc[2].text.lower())
# clean_txt = replace_contractions(df.iloc[2].text)
# clean_txt

In [7]:
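def cleanTxt(text):
    text = text.lower()
    # Strip Twitter-specific noise: handles, emojis, and links
    text = nfx.remove_userhandles(text)
    text = nfx.remove_emojis(text)
    text = nfx.remove_urls(text)
    # Keep the hashtag word but drop the '#'; flatten newlines
    text = re.sub(r'#',' ',text)
    text = re.sub(r'\n',' ',text)
    # Decode the HTML entity for '&', including its trailing ';' when present
    text = re.sub(r'&amp;?','&',text)
    # Remove stopwords first, then punctuation, special characters, and digits
    text = nfx.remove_stopwords(text)
    text = nfx.remove_puncts(text)
    text = nfx.remove_special_characters(text)
    text = nfx.remove_numbers(text)
    text = nfx.remove_multiple_spaces(text)
    text = text.strip()

    return text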
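
To show what these steps do to a single tweet without needing neattext installed, here is a rough standard-library approximation. It is not neattext's implementation: the regular expressions, the `clean_tweet` name, and the tiny `STOPWORDS` set are all assumptions made for this sketch, and its output will differ from `cleanTxt` on real data.

```python
import re

# Hypothetical, deliberately tiny stopword list; neattext ships a much larger one.
STOPWORDS = {"i", "am", "the", "a", "an", "is", "it", "to", "my", "in", "and", "at", "on"}

def clean_tweet(text):
    """Approximate the cleaning pipeline with plain regular expressions."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)                    # user handles
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"&\w+;", " ", text)                   # HTML entities such as &amp;
    text = re.sub(r"[^a-z\s]", " ", text)                # '#', punctuation, digits, emojis
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)                              # join also collapses extra spaces

sample = "@bt_uk I am having the same problem at 9pm! https://t.co/abc #broadband &amp; TV"
print(clean_tweet(sample))
# prints: having same problem pm broadband tv
```

Stripping digits leaves fragments such as "pm" behind, the same kind of residue ("th", "pm", "st") that appears in the cleaned tweets in this notebook.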

In [8]:
# sample = df.iloc[345]
# clean_txt = cleanTxt(sample.text)
df['clean2_text'] = df['text'].apply(cleanTxt)
# df['clean_text'] = df['text'].apply(cleanTxt)


In [9]:
df.keys()

Index(['created_at', 'id', 'author_id', 'text', 'name', 'location', 'username',
       'clean_text', 'label', 'predicted_labels', 'clean2_text'],
      dtype='object')

In [10]:
# print(sample.text)
# print('-------------------------------------------------------')
# print(clean_txt)
# df2 = df[['text','clean2_text','label', 'predicted_labels']]
df2 = df[['text','clean2_text','label']]


In [11]:
df2.head()

| | text | clean2_text | label |
|---|---|---|---|
| 0 | @bt_uk @mikeburnieactor i am having the same p... | problem online support booked engineer likely end | -1 |
| 1 | @henrystewartdam @theresaregli @mcdonalds @yal... | wait | -1 |
| 2 | why is does the tv listings in my @bt_uk box a... | tv listings box planner actual tv think friday... | -1 |
| 3 | @bill626 @bt_uk try setting it to record quest... | try setting record quest midnight tells goes | 1 |
| 4 | @bill626 @bt_uk i figured it out it will not r... | figured record stuff midnight recording snooke... | -1 |


In [12]:
df2.to_csv('without_appost.csv')

In [13]:
data = pd.read_csv('without_appost.csv', index_col=0)
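
One thing to watch in this save-and-reload step: by default `read_csv` parses empty fields as NaN, so a tweet whose cleaned text ended up empty would come back as a float rather than a string. Whether such rows exist in this dataset is not shown. The sketch below uses a small made-up frame and an in-memory buffer instead of the real file to illustrate the behaviour and one possible workaround, `keep_default_na=False`.

```python
import io
import pandas as pd

# Made-up two-row frame; the second row mimics a tweet that cleaned down to nothing.
frame = pd.DataFrame({"clean2_text": ["wait", ""], "label": [-1, 1]})

# Write to an in-memory buffer instead of 'without_appost.csv'.
buffer = io.StringIO()
frame.to_csv(buffer)

# Default parsing: the empty string is read back as NaN.
buffer.seek(0)
default = pd.read_csv(buffer, index_col=0)

# keep_default_na=False: the empty string survives the round trip.
buffer.seek(0)
kept = pd.read_csv(buffer, index_col=0, keep_default_na=False)

print(default["clean2_text"].isna().sum())
print(kept["clean2_text"].tolist())
```

An alternative is to keep the default parsing and call `fillna('')` on the text column after loading.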

In [14]:
data.head()

| | text | clean2_text | label |
|---|---|---|---|
| 0 | @bt_uk @mikeburnieactor i am having the same p... | problem online support booked engineer likely end | -1 |
| 1 | @henrystewartdam @theresaregli @mcdonalds @yal... | wait | -1 |
| 2 | why is does the tv listings in my @bt_uk box a... | tv listings box planner actual tv think friday... | -1 |
| 3 | @bill626 @bt_uk try setting it to record quest... | try setting record quest midnight tells goes | 1 |
| 4 | @bill626 @bt_uk i figured it out it will not r... | figured record stuff midnight recording snooke... | -1 |


## Extracting keywords

In [16]:
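import yake

# Extractor with YAKE's default settings
simple_extractor = yake.KeywordExtractor()

def get_keywords(text):
    print(text)  # debug: echo each cleaned tweet as it is processed
    # extract_keywords returns (keyword, score) pairs; keep only the keywords
    post_kw = simple_extractor.extract_keywords(text)
    return [word for word, score in post_kw]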
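
YAKE returns `(keyword, score)` pairs, and a lower score means a more relevant keyword. A slightly more defensive variant of the helper could therefore sort by score, cap the number of keywords, and skip empty or non-string inputs. The sketch below is one such variant under those assumptions. The `keywords_only` name and the `top_k` parameter are hypothetical, and the extractor is passed in as an argument so the function can be exercised without YAKE installed.

```python
def keywords_only(extractor, text, top_k=5):
    """Return up to top_k keyword strings, most relevant (lowest score) first."""
    # Empty strings or NaN values (e.g. after a CSV round trip) yield no keywords.
    if not isinstance(text, str) or not text.strip():
        return []
    # Sort the (keyword, score) pairs so the lowest scores come first.
    pairs = sorted(extractor.extract_keywords(text), key=lambda pair: pair[1])
    return [keyword for keyword, _score in pairs[:top_k]]

# Intended use with YAKE (not executed here):
#   import yake
#   extractor = yake.KeywordExtractor(lan="en", n=3, top=20)
#   data['keywords'] = data['clean2_text'].apply(lambda t: keywords_only(extractor, t))
```

Passing the extractor in also makes it easy to try different `n` (maximum n-gram size) or `top` settings without redefining the function.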


In [17]:
data['keywords'] = data['clean2_text'].apply(get_keywords)

problem online support booked engineer likely end
wait
tv listings box planner actual tv think friday th february lost plot issue
try setting record quest midnight tells goes
figured record stuff midnight recording snooker quest ch right exception snooker thing want rec today spot this
showing march th supposed fix unplugging restarting joy
fiasco upgrading broadband engineer booked today pm arrived lucky around new router required delivered engineer delivered broadband old router disconnected outside oh dear
nope shows scheduling problem todays recordings scheduled fixed
tell simply rebuff complaint tell talk bt attempting provide explanation update nice little system worked zeroaccountability
customerservice apprentice help develop interpersonal skills knowledge ability help customer whilst studying level apprenticeship find liverpool role gt
letter confirming massive price hikes month wait finish contract cancel line forced super slow internet glad use starlink connection st century

In [18]:
data.head()

| | text | clean2_text | label | keywords |
|---|---|---|---|---|
| 0 | @bt_uk @mikeburnieactor i am having the same p... | problem online support booked engineer likely end | -1 | [problem online support, online support booked... |
| 1 | @henrystewartdam @theresaregli @mcdonalds @yal... | wait | -1 | [wait] |
| 2 | why is does the tv listings in my @bt_uk box a... | tv listings box planner actual tv think friday... | -1 | [lost plot issue, listings box planner, box pl... |
| 3 | @bill626 @bt_uk try setting it to record quest... | try setting record quest midnight tells goes | 1 | [setting record quest, record quest midnight, ... |
| 4 | @bill626 @bt_uk i figured it out it will not r... | figured record stuff midnight recording snooke... | -1 | [figured record stuff, record stuff midnight, ... |
