In [2]:
from transformers import pipeline


## Initialise the sentiment analysis pipeline

We use the `twitter-roberta-base-sentiment-latest` model: a RoBERTa-base model trained on ~124M tweets and fine-tuned for sentiment analysis.

Further information can be found here:

https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest

In [3]:
sentiment_pipeline = pipeline(model="cardiffnlp/twitter-roberta-base-sentiment-latest")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Load and pre process the data

The data was extracted from Twitter in the `collectTweets.ipynb` notebook, where tweets mentioning (...) during the last week were extracted (limit ...)

In [4]:
import pickle

#### This loads the tweets in dictionary format

```
{12345: ['tweet 1','blah blah blah','some words'],
4567: ['tweet tweet','a sentence', 'some other words']}
```

where 12345 and 4567 are the account ids of the Twitter users; these are the _keys_ of the dictionary.

The tweets for each account are the _values_, stored in list format `[this,is,a,list]`
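
As a minimal sketch of this layout, the cell below builds a small dictionary of the same shape, saves it with `pickle`, and loads it back. The account ids, tweet texts and the file name `example_tweets.pk` are made up for illustration and are not part of the real data set.

```python
import os
import pickle
import tempfile

# Hypothetical data in the same shape as the real file:
# account id (int) -> list of tweet texts (str).
example_tweets = {
    12345: ['tweet 1', 'blah blah blah', 'some words'],
    4567: ['tweet tweet', 'a sentence', 'some other words'],
}

# Write the dictionary to a temporary pickle file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example_tweets.pk')  # hypothetical file name
    with open(path, 'wb') as f:
        pickle.dump(example_tweets, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(list(loaded.keys()))  # the account ids
print(loaded[4567][0])      # first tweet of account 4567
```

Opening the file in a `with` block closes it automatically once loading has finished.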

In [7]:
# Open the pickle file in a with-block so it is closed after loading
with open('../tweets_LD.pk', 'rb') as f:
    dt = pickle.load(f)

In [8]:
#This is an example of how to read from a dictionary

for key in dt.keys():
    tweet = dt[key]
    #we just want the first item in the list
    tweet1 = tweet[0]
    print(key)
    print(tweet1)
    break

1034780506523676672
RT @AdoptionsUk: Please retweet to help Peaches find a home #CUMBRIA #UK Aged 1-2, Peaches can live with children aged 10+ and with anoth…


### This is the NLP pre-processing section

This section 'cleans' the text, i.e. it removes unwanted characters and standardises the text.

In [9]:
import re

In [10]:
def strip_names(x):
    #This removes named mentions in tweets, i.e. it would remove @user_name
    x = re.sub(r'@[^\s]+', '', x)
    return x

In [11]:
def remove_url(x):
    #removes urls from the tweets
    x = re.sub(r'http\S+', '', x)
    return x

In [12]:
def strip_chars(x):
    #removes the specific characters from the tweets
    #HINT: this function can be modified to remove more than the characters mentioned
    x = x.replace("#",'')
    x = x.replace(":",'')
    return x
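
The three cleaning steps can be chained into one function. The sketch below repeats the functions so that it runs on its own; `clean_tweet` is a hypothetical helper name, the example tweet is invented, and the final whitespace clean-up is an extra step that the notebook itself does not perform.

```python
import re

def strip_names(x):
    # Remove @mentions such as @user_name
    return re.sub(r'@\S+', '', x)

def remove_url(x):
    # Remove anything that starts with http (covers https as well)
    return re.sub(r'http\S+', '', x)

def strip_chars(x):
    # Remove hash and colon characters
    return x.replace('#', '').replace(':', '')

def clean_tweet(x):
    # Hypothetical helper: apply the steps in the same order as the notebook
    x = strip_names(x)
    x = remove_url(x)
    x = strip_chars(x)
    # Extra step (not in the notebook): collapse the gaps left by removals
    return ' '.join(x.split())

example = 'RT @some_user: Lovely day in #Cumbria https://t.co/abc123'
print(clean_tweet(example))  # RT Lovely day in Cumbria
```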
    

## Sentiment Analysis

Attach a sentiment label to each tweet for each account id in the dictionary `dt`, and build a new dictionary that maps each account id to its list of labels

```
{12345: [pos,neg,neg],
4567: [neu,neu,pos]}
```

In [14]:
feelings = {}

for key in dt.keys():
    text = dt[key]
    author_feeling =[]
    for t in text:
        t = strip_names(t)
        t = remove_url(t)
        t = strip_chars(t)
        scores = sentiment_pipeline(t)
        score = scores[0]['label']
        author_feeling.append(score)
    feelings[key] = author_feeling
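
For a single string, a text-classification pipeline returns a list containing one dictionary with a `label` and a `score`, which is why the loop reads `scores[0]['label']`. The sketch below shows the same loop without downloading the model. `fake_sentiment_pipeline` is a hypothetical stand-in that only mimics the return shape, and its keyword rule, the example tweets and the dummy score are assumptions for illustration, not the behaviour of the real model.

```python
def fake_sentiment_pipeline(text):
    """Hypothetical stand-in that mimics the return shape of a
    text-classification pipeline: a list with one dict per input."""
    label = 'Positive' if 'love' in text.lower() else 'Neutral'
    return [{'label': label, 'score': 1.0}]  # score is a dummy value

# Hypothetical tweets keyed by account id
example_tweets = {
    12345: ['I love the Lake District', 'Bus timetable for Monday'],
    4567: ['Roadworks on the A66'],
}

# Same pattern as the notebook loop: keep only the label of each result
example_feelings = {}
for account_id, tweets in example_tweets.items():
    example_feelings[account_id] = [
        fake_sentiment_pipeline(t)[0]['label'] for t in tweets
    ]

print(example_feelings)
# {12345: ['Positive', 'Neutral'], 4567: ['Neutral']}
```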

In [15]:
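# Count every tweet labelled 'Positive' across all accounts
i = 0
# feelings2 keeps only the label of the first tweet for each account
feelings2 = {}
for key in feelings.keys():
    f = feelings[key]
    for ff in f:
        if ff == 'Positive':
            i+=1
    feelings2[key] = feelings[key][0]
print(i)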

50


In [16]:
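# Count every tweet labelled 'Negative' across all accounts
i = 0
# feelings2 is rebuilt with the label of the first tweet for each account
feelings2 = {}
for key in feelings.keys():
    f = feelings[key]
    for ff in f:
        if ff == 'Negative':
            i+=1
    feelings2[key] = feelings[key][0]
print(i)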

7
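
The two counting cells above differ only in the label they test. As a possible alternative, `collections.Counter` tallies every label in one pass. The sketch uses a small made-up `example_feelings` dictionary rather than the real `feelings`, and `label_counts` and `first_labels` are hypothetical names.

```python
from collections import Counter

# Hypothetical labels in the same shape as `feelings`
example_feelings = {
    12345: ['Positive', 'Negative', 'Negative'],
    4567: ['Neutral', 'Neutral', 'Positive'],
}

# Tally every label across all accounts in a single pass
label_counts = Counter(
    label for labels in example_feelings.values() for label in labels
)

# Label of the first tweet per account (the role of feelings2),
# skipping any account whose list is empty
first_labels = {k: v[0] for k, v in example_feelings.items() if v}

print(label_counts['Positive'], label_counts['Negative'])  # 2 2
print(first_labels)  # {12345: 'Positive', 4567: 'Neutral'}
```

A `Counter` returns 0 for a label that never occurs, so no separate check is needed before reading a count.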


Check whether Qashqai is even mentioned, by printing the tweets of accounts whose first tweet was labelled negative

In [17]:
for key in feelings2.keys():
    feel = feelings2[key]
    if feel =='Negative':
        print(dt[key])

['RT @bradfordgreens: :astonished_face:The government are unbelievably looking at opening a new #coalmine in #Cumbria.:writing_hand: Want to take action? Ask your MP to…']
['So EVERYTHING #tories say about #ClimateCrisis is bollocks #coal in #Cumbria too!!!!! Wtaf one of our most beautiful #NationalParks #ToryIncompetence #BorisJohnsonMustResign #BorisJohnson is an IDIOT #FossilFuels https://t.co/4kDrcrWjkI']
['RT @bradfordgreens: :astonished_face:The government are unbelievably looking at opening a new #coalmine in #Cumbria.:writing_hand: Want to take action? Ask your MP to…']
['RT @bradfordgreens: :astonished_face:The government are unbelievably looking at opening a new #coalmine in #Cumbria.:writing_hand: Want to take action? Ask your MP to…']
['RT @bradfordgreens: :astonished_face:The government are unbelievably looking at opening a new #coalmine in #Cumbria.:writing_hand: Want to take action? Ask your MP to…']
['How can @michaelgove make a decision on the #Cumbria #coal mine given

In [18]:
for key in feelings2.keys():
    feel = feelings2[key]
    if feel =='Positive':
        print(dt[key])

['Two of Colin Halliday’s paintings which he brought in at the weekend, now arranged over two floors..@VisitKeswick @FeatureCumbria @CumbriaWeather @LakesCumbria #keswick #LakeDistrict #cumbria #art #paintings #beautiful #Creative @ownartscheme https://t.co/OO0i6a6aao']
['RT @BeepDoctors: @BeepDoctors are on hand across #Cumbria 24/7 365 days a year. A huge thanks to our coperate sponsors and local giving it…']
['John Corran, hopes to raise £50k for the restoration of the Grade II listed organ at St. Paul’s, Irton in #Cumbria – by #cycling to every @churchofengland church dedicated to St Paul. 250 churches &amp; 4,000 miles, here he is at @stpaulsspenny. Donate :backhand_index_pointing_right: https://t.co/7iu1g6bhvF https://t.co/RVJmKWXzJQ']
['RT @BeepDoctors: This week marks the start of our summer road show, thanks to our  sponsors we look forward to seeing you around #Cumbria.…']
['@KarynRN4 @UHMBT @windermerenurse Hi Karyn! We are based in a small corner of the UK.. #LakeDistrict #