# 3.6 Sentiment Analysis

With the sentiment analysis we want to examine how the sentiment of politicians from different parties changes between social media and the Bundestag as audiences, and to compare the sentiment used by female and male politicians. Since we work in Python, we start by importing some commonly used packages and then load our preprocessed corpus for analysis.

In [1]:
#import packages
import pickle
import re

import numpy as np
import pandas as pd
import spacy
from spacy.language import Language
from spacy.tokens.doc import Doc
from spacy.vocab import Vocab
from spacy_langdetect import LanguageDetector
from spacy_sentiws import spaCySentiWS
from textblob_de import TextBlobDE as TextBlob
from tqdm.notebook import tqdm

# disable pandas' SettingWithCopyWarning (default='warn'); it gave false positives here
pd.options.mode.chained_assignment = None

# register .progress_apply() on pandas objects
tqdm.pandas()

#load in the preprocessed data (only the first 100 rows of each corpus)
with open('../data/processed/tweets_processed.p', 'rb') as f:
    pre_data_twitter = pickle.load(f)[0:100]
with open('../data/processed/speeches_processed.p', 'rb') as f:
    pre_data_speeches = pickle.load(f)[0:100]
pre_data_twitter.head()

|   | full_name | date | party | text_preprocessed | text_preprocessed_sentence | like_count |
|---|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | 2021-06-15 | CDU | [fußballfans, freuen, spiel, nationalmannschaf... | fußballfans freuen spiel nationalmannschaft dr... | 32.0 |
| 1 | Ralph Brinkhaus | 2021-06-11 | CDU | [außenpolitik, wirtschaftlich, souveränität, d... | außenpolitik wirtschaftlich souveränität digit... | 5.0 |
| 2 | Ralph Brinkhaus | 2021-06-11 | CDU | [nachhaltig, klimawandel, kämpfen, brauchen, a... | nachhaltig klimawandel kämpfen brauchen akzept... | 4.0 |
| 3 | Ralph Brinkhaus | 2021-06-11 | CDU | [brauchen, pandemie, bezahlen, arbeitsplätze, ... | brauchen pandemie bezahlen arbeitsplätze digit... | 2.0 |
| 4 | Ralph Brinkhaus | 2021-06-11 | CDU | [wahldebatte, thema, zukunft, passieren, coron... | wahldebatte thema zukunft passieren corona sta... | 24.0 |


## 3.6.1 Sentiment Analysis with TextBlob

For our first approach to sentiment analysis, we use the package TextBlob, which can be used for preprocessing textual data and provides an API for natural language processing tasks such as sentiment analysis. As our corpus is in German, we need the German version TextBlobDE, which has fewer features than its English counterpart but was sufficient for a first approach. For sentiment analysis it returns the polarity of a given sentence on a scale from -1 (very negative) to 1 (very positive). The scores are generated with a dictionary approach based on the German polarity lexicon by Clematide and Klenner.
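
To make the dictionary approach concrete, here is a deliberately simplified sketch of lexicon-based polarity scoring: every word found in a polarity lexicon contributes its score, and the polarity of the text is the mean of those scores, kept inside [-1, 1]. The mini lexicon `TOY_LEXICON` and its values are hypothetical. This is not TextBlobDE's implementation, which uses the Clematide and Klenner lexicon and also handles intensifiers and negation.

```python
# Simplified lexicon-based polarity scoring (illustration only, not TextBlobDE's code).
# TOY_LEXICON is a hypothetical mini lexicon; its values are made up for the example.
TOY_LEXICON = {
    "freuen": 1.0,
    "nachhaltig": 0.75,
    "akzeptanz": 0.5,
    "pandemie": -0.75,
    "kämpfen": -0.25,
}


def polarity(text: str, lexicon=TOY_LEXICON) -> float:
    """Mean score of all lexicon words in `text`; 0.0 if no lexicon word occurs."""
    scores = [lexicon[word] for word in text.lower().split() if word in lexicon]
    if not scores:
        return 0.0  # no polar word found: the text counts as neutral
    mean = sum(scores) / len(scores)
    return max(-1.0, min(1.0, mean))  # keep the result inside [-1, 1]


print(polarity("fußballfans freuen spiel nationalmannschaft"))  # 1.0
print(polarity("nachhaltig klimawandel kämpfen brauchen"))     # 0.25
```

Texts without any lexicon word end up exactly at 0.0, which is why the counts below treat a score of 0 as neutral.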

### 3.6.1.1 Sentiment Analysis for Twitter Data

We start with the Twitter data. Since we want to look at each politician in our corpus individually, we loop over the politicians. TextBlob is applied to the preprocessed tweets in sentence format, i.e. one string per tweet. The `sentiment` attribute of each resulting blob yields the polarity score of the tweet; we ignore the second output, subjectivity, as it has no meaning in the German version of the package. We then calculate the mean polarity for each politician. In addition, we count the number of positive, negative, and neutral tweets per politician, regardless of how positive or negative they are.
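
The counting logic of the loop can be expressed as a small helper. This is a sketch that assumes the per-tweet polarity scores are already available as a plain list of floats; the name `summarize_polarity` is hypothetical. Returning NaN for an empty list mirrors what `np.mean` returns for politicians without tweets, but without the "Mean of empty slice" runtime warning.

```python
import math


def summarize_polarity(scores: list[float]) -> tuple[float, int, int, int]:
    """Return (mean, n_positive, n_neutral, n_negative) for a list of polarity scores.

    Hypothetical helper: an empty list yields a NaN mean and zero counts.
    """
    if not scores:
        return math.nan, 0, 0, 0
    mean = sum(scores) / len(scores)
    n_pos = sum(1 for s in scores if s > 0)
    n_neg = sum(1 for s in scores if s < 0)
    n_neu = len(scores) - n_pos - n_neg  # a score of exactly 0.0 counts as neutral
    return mean, n_pos, n_neu, n_neg


print(summarize_polarity([0.5, 0.0, -1.0, 1.0]))  # (0.125, 2, 1, 1)
```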

In [2]:
#loop through all the politicians we want to analyze
data=[]
for name in tqdm(['Ralph Brinkhaus','Hermann Gröhe', 'Nadine Schön' ,'Norbert Röttgen' , 'Peter Altmaier' , 'Jens Spahn' , 'Matthias Hauer',
            'Christian Lindner' , 'Marco Buschmann' , 'Bettina Stark-Watzinger', 'Alexander Graf Lambsdorff' , 'Johannes Vogel' , 'Konstantin Kuhle' , 'Marie-Agnes Strack-Zimmermann',
            'Lars Klingbeil' , 'Saskia Esken' , 'Hubertus Heil' , 'Heiko Maas' , 'Martin Schulz' , 'Karamba Diaby' , 'Karl Lauterbach',
            'Steffi Lemke' , 'Cem Özdemir' , 'Katrin Göring-Eckardt' , 'Konstantin von Notz' , 'Britta Haßelmann' , 'Sven Lehmann' , 'Annalena Baerbock',
            'Sahra Wagenknecht' , 'Bernd Riexinger' , 'Niema Movassat' , 'Jan Korte' , 'Dietmar Bartsch' , 'Gregor Gysi' , 'Sevim Dağdelen',
            'Alice Weidel' , 'Beatrix von Storch' , 'Joana Cotar' , 'Stephan Brandner' , 'Tino Chrupalla' , 'Götz Frömming' , 'Leif-Erik Holm']):
    #get tweets from the specific politician
    tweets_analyzing = pre_data_twitter.loc[pre_data_twitter['full_name']==name]
    #create sentiment scores and keep only the polarity (subjectivity is not meaningful in TextBlobDE)
    polarity = [TextBlob(text).sentiment.polarity for text in tweets_analyzing['text_preprocessed_sentence']]
    #get the mean of the scores (NaN if the politician has no tweets in the sample)
    p_mean = np.mean(polarity) if polarity else np.nan
    #get the number of positive, neutral and negative tweets
    positive_p = sum(1 for p in polarity if p > 0)
    negative_p = sum(1 for p in polarity if p < 0)
    neutral_p = len(polarity) - positive_p - negative_p
    #set up list to secure the values generated
    data.append([name, p_mean, positive_p, neutral_p, negative_p])

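This gives us a DataFrame with the mean polarity and the tweet counts for every politician, a first overview of the sentiment of their social media presence. Since only the first 100 tweets of the corpus are loaded here, all of which belong to Ralph Brinkhaus, the other politicians have no tweets in this sample, so their mean is NaN and all their counts are zero.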

In [3]:
#set up dataframe with all values and save it into a csv file
dataf = pd.DataFrame(data, columns=['Name','Polarity_mean','Num_pos_tweets','Num_neutral_tweets','Num_neg_tweets'])
dataf.to_csv('../data/processed/sentiment_scores_twitter_01.csv')
dataf.head()

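|   | Name | Polarity_mean | Num_pos_tweets | Num_neutral_tweets | Num_neg_tweets |
|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | 0.265083 | 42 | 48 | 10 |
| 1 | Hermann Gröhe | NaN | 0 | 0 | 0 |
| 2 | Nadine Schön | NaN | 0 | 0 | 0 |
| 3 | Norbert Röttgen | NaN | 0 | 0 | 0 |
| 4 | Peter Altmaier | NaN | 0 | 0 | 0 |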


We can now add a column with the TextBlob polarity score to our DataFrame by applying the same scoring as in the loop to the whole corpus.

In [4]:
#create a polarity column for our dataset (polarity of each tweet, subjectivity is dropped)
pre_data_twitter['polarity_textblob'] = pre_data_twitter['text_preprocessed_sentence'].progress_apply(
    lambda text: TextBlob(text).sentiment.polarity)

In [5]:
pre_data_twitter.head()
    

|   | full_name | date | party | text_preprocessed | text_preprocessed_sentence | like_count | polarity_textblob |
|---|---|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | 2021-06-15 | CDU | [fußballfans, freuen, spiel, nationalmannschaf... | fußballfans freuen spiel nationalmannschaft dr... | 32.0 | 0.0 |
| 1 | Ralph Brinkhaus | 2021-06-11 | CDU | [außenpolitik, wirtschaftlich, souveränität, d... | außenpolitik wirtschaftlich souveränität digit... | 5.0 | 0.0 |
| 2 | Ralph Brinkhaus | 2021-06-11 | CDU | [nachhaltig, klimawandel, kämpfen, brauchen, a... | nachhaltig klimawandel kämpfen brauchen akzept... | 4.0 | 1.0 |
| 3 | Ralph Brinkhaus | 2021-06-11 | CDU | [brauchen, pandemie, bezahlen, arbeitsplätze, ... | brauchen pandemie bezahlen arbeitsplätze digit... | 2.0 | -1.0 |
| 4 | Ralph Brinkhaus | 2021-06-11 | CDU | [wahldebatte, thema, zukunft, passieren, coron... | wahldebatte thema zukunft passieren corona sta... | 24.0 | 0.0 |


### 3.6.1.2 Sentiment Analysis for Bundestag Speeches

Next up are the Bundestag speeches of the same politicians we analyzed in the previous step. We take our preprocessed speeches and apply TextBlob in the same way as to the tweets, again looping through the politicians individually.
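
Filtering the DataFrame once per politician scans the whole corpus 42 times. A sketch of an alternative: score every text once and group the resulting (name, polarity) pairs in a single pass. Plain Python is used to keep the example self-contained; in pandas the same idea corresponds to a `groupby` on `full_name`. The function name, the input format, and the scores in the example are hypothetical.

```python
import math
from collections import defaultdict


def mean_polarity_by_politician(rows, politicians):
    """Group (name, polarity) pairs in one pass and return {name: mean polarity}.

    Hypothetical helper: politicians without any text get NaN, as in the loop.
    """
    grouped = defaultdict(list)
    for name, score in rows:
        grouped[name].append(score)
    result = {}
    for name in politicians:
        scores = grouped.get(name, [])
        result[name] = sum(scores) / len(scores) if scores else math.nan
    return result


rows = [("Jan Korte", 0.5), ("Marco Buschmann", 0.25), ("Jan Korte", -0.25)]
print(mean_polarity_by_politician(rows, ["Jan Korte", "Marco Buschmann", "Hermann Gröhe"]))
# {'Jan Korte': 0.125, 'Marco Buschmann': 0.25, 'Hermann Gröhe': nan}
```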

In [6]:
#loop through all the politicians we want to analyze
data=[]
for name in tqdm(['Ralph Brinkhaus','Hermann Gröhe', 'Nadine Schön' ,'Norbert Röttgen' , 'Peter Altmaier' , 'Jens Spahn' , 'Matthias Hauer',
            'Christian Lindner' , 'Marco Buschmann' , 'Bettina Stark-Watzinger', 'Alexander Graf Lambsdorff' , 'Johannes Vogel' , 'Konstantin Kuhle' , 'Marie-Agnes Strack-Zimmermann',
            'Lars Klingbeil' , 'Saskia Esken' , 'Hubertus Heil' , 'Heiko Maas' , 'Martin Schulz' , 'Karamba Diaby' , 'Karl Lauterbach',
            'Steffi Lemke' , 'Cem Özdemir' , 'Katrin Göring-Eckardt' , 'Konstantin von Notz' , 'Britta Haßelmann' , 'Sven Lehmann' , 'Annalena Baerbock',
            'Sahra Wagenknecht' , 'Bernd Riexinger' , 'Niema Movassat' , 'Jan Korte' , 'Dietmar Bartsch' , 'Gregor Gysi' , 'Sevim Dağdelen',
            'Alice Weidel' , 'Beatrix von Storch' , 'Joana Cotar' , 'Stephan Brandner' , 'Tino Chrupalla' , 'Götz Frömming' , 'Leif-Erik Holm']):
    #get speeches from the specific politician
    speeches_analyzing = pre_data_speeches.loc[pre_data_speeches['full_name']==name]
    #create sentiment scores and keep only the polarity (subjectivity is not meaningful in TextBlobDE)
    polarity = [TextBlob(text).sentiment.polarity for text in speeches_analyzing['text_preprocessed_sentence']]
    #get the mean of the polarity values (NaN if the politician has no speeches in the sample)
    p_mean = np.mean(polarity) if polarity else np.nan
    #get the number of positive, neutral and negative speeches
    positive_p = sum(1 for p in polarity if p > 0)
    negative_p = sum(1 for p in polarity if p < 0)
    neutral_p = len(polarity) - positive_p - negative_p
    #set up list to secure the values generated
    data.append([name, p_mean, positive_p, neutral_p, negative_p])

Again, we end up with a list containing the mean sentiment scores and the counts of positive, negative, and neutral speeches, which we transform into a DataFrame for further analysis.

In [7]:
#set up dataframe with all values
dataf = pd.DataFrame(data, columns=['Name','Polarity_mean','Num_pos_speeches','Num_neutral_speeches','Num_neg_speeches'])
dataf.to_csv('../data/processed/sentiment_scores_speeches_01.csv')
dataf.head()

|   | Name | Polarity_mean | Num_pos_speeches | Num_neutral_speeches | Num_neg_speeches |
|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | 0.435714 | 1 | 0 | 0 |
| 1 | Hermann Gröhe | NaN | 0 | 0 | 0 |
| 2 | Nadine Schön | 0.207771 | 3 | 1 | 1 |
| 3 | Norbert Röttgen | -0.421739 | 0 | 1 | 2 |
| 4 | Peter Altmaier | 0.532353 | 1 | 0 | 0 |


Here we also add a column with the sentiment scores to get an overview.

In [8]:
#create a polarity column for the speeches (polarity of each speech, subjectivity is dropped)
pre_data_speeches['polarity_textblob'] = pre_data_speeches['text_preprocessed_sentence'].apply(
    lambda text: TextBlob(text).sentiment.polarity)

In [9]:
pre_data_speeches.head()

|   | full_name | date | party | text_preprocessed | text_preprocessed_sentence | polarity_textblob |
|---|---|---|---|---|---|---|
| 1114 | Jan Korte | 2017-10-24 | Linke | [herr, präsident, lieben, kollegin, kollege, g... | herr präsident lieben kollegin kollege geehrt ... | 0.512121 |
| 1115 | Marco Buschmann | 2017-10-24 | FDP | [herr, präsident, lieb, kollegin, kollege, kon... | herr präsident lieb kollegin kollege konstitui... | 0.061111 |
| 1116 | Britta Haßelmann | 2017-10-24 | Grüne | [geehrt, herr, präsident, dame, herr, kern, de... | geehrt herr präsident dame herr kern debatte t... | 0.231579 |
| 1117 | Marco Buschmann | 2017-11-21 | FDP | [herr, präsident, geehrt, kollegin, kollege, f... | herr präsident geehrt kollegin kollege fraktio... | 0.75 |
| 1118 | Jan Korte | 2017-11-21 | Linke | [geehrt, herr, präsident, dame, herr, ernst, z... | geehrt herr präsident dame herr ernst zeit hum... | 0.293333 |


## 3.6.2 Sentiment Analysis with SentiWS

As a second approach to sentiment analysis, we use SentiWS, a widely used German sentiment dictionary. It contains over 3,000 base words and over 30,000 word forms, each with a polarity weight between -1 and 1, and it covers not only adjectives and adverbs but also nouns and verbs. Since the weights are assigned to individual words, the sentiment of a sentence has to be aggregated from the scores of its tokens. For the implementation we use spaCySentiWS, an extension for the spaCy pipeline we already used in preprocessing, which lets us apply the dictionary directly within the pipeline. We therefore build a new pipeline, slightly modified from the original preprocessing pipeline, that also returns the sentiment scores of the tokens.
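
A minimal sketch of the dictionary lookup step, assuming the tab-separated layout of the SentiWS text files (`word|POS`, weight, comma-separated inflected forms): every base and inflected form is mapped to the weight of its base form, and tokens without an entry get `None`, analogous to `token._.sentiws` being `None` in our loops. The helper names and the sample lines with their weights are invented for illustration; this is not how `spaCySentiWS` works internally, and unlike a POS-aware lookup it ignores the part-of-speech tag.

```python
def parse_sentiws_lines(lines):
    """Map every base form and inflected form (lowercased) to its weight.

    Assumed line format: "<word>|<POS>\t<weight>\t<form1>,<form2>,..."
    where the third column may be missing. Hypothetical helper.
    """
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        columns = line.split("\t")
        base_form = columns[0].split("|")[0]  # drop the POS tag
        weight = float(columns[1])
        forms = [base_form]
        if len(columns) > 2 and columns[2]:
            forms += columns[2].split(",")
        for form in forms:
            lexicon[form.lower()] = weight  # later entries overwrite earlier ones
    return lexicon


def token_scores(tokens, lexicon):
    """Weight for each token, or None if the token is not in the lexicon."""
    return [lexicon.get(token.lower()) for token in tokens]


# Invented sample lines in the assumed format (weights are made up).
SAMPLE = [
    "Freude|NN\t0.5\tFreuden",
    "schlecht|ADJX\t-0.75\tschlechte,schlechter,schlechtes",
]
lexicon = parse_sentiws_lines(SAMPLE)
print(token_scores(["Freuden", "sind", "schlechte", "Nachrichten"], lexicon))
# [0.5, None, -0.75, None]
```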

In [11]:
#reload the raw data, since the SentiWS pipeline does its own preprocessing
pre_data_twitter= pd.read_csv("../data/raw/tweets_explored.csv")[0:100]
pre_data_speeches= pd.read_csv("../data/raw/speeches_explored.csv")[0:100]

In [12]:
@Language.component("Remove non alphabetic words")
def remove_non_alpha(doc):
    return [token for token in doc if token.is_alpha]

In [13]:
@Language.factory("Detect languages")
def create_language_detector(nlp, name):
    return LanguageDetector(language_detection_function=None)

In [14]:
@Language.factory("Sentiment Appplication")
def create_sentiment_dictionary(nlp, name):
    return spaCySentiWS(sentiws_path = "../data/raw/Sentiment/")

In [15]:
@Language.component("Keep only German documents")
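def remove_non_german(doc):
    #keep only the tokens of sentences detected as German
    res = [sent for sent in doc.sents if sent._.language["language"] == "de"]
    if res:
        return [token for sent in res for token in sent]
    else:
        #no German sentence: return an empty Doc that shares the pipeline's vocab
        return Doc(doc.vocab, words=[], spaces=[])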

In [16]:
@Language.component("Remove stopwords")
def remove_stopwords(doc): 
    return [token for token in doc if not token.is_stop]

In [17]:
@Language.component("Lemmatize text")
def lemmatize_text(doc):
    return [token.lemma_ for token in doc]

In [18]:
@Language.component("Lowercase Text")
def lowercase(doc):
    return [token.lower() for token in doc]

In [19]:
# Unicode ranges of emojis and other pictographic symbols
emoji_codes = re.compile("["
                         u"\U0001F600-\U0001F64F"
                         u"\U0001F300-\U0001F5FF"
                         u"\U0001F680-\U0001F6FF"
                         u"\U0001F1E0-\U0001F1FF"
                         u"\U00002500-\U00002BEF"
                         u"\U00002702-\U000027B0"
                         u"\U000024C2-\U0001F251"
                         u"\U0001f926-\U0001f937"
                         u"\U00010000-\U0010ffff"
                         u"\u2640-\u2642"
                         u"\u2600-\u2B55"
                         u"\u200d"
                         u"\u23cf"
                         u"\u23e9"
                         u"\u231a"
                         u"\ufe0f"
                         u"\u3030"
                         "]+", re.UNICODE)

@Language.component("Remove emojis")
def remove_emojis(doc):
    #drop tokens that start with an emoji and rebuild a Doc from the remaining words
    words = [token.text for token in doc if not emoji_codes.match(token.text)]
    return Doc(doc.vocab, words=words)

In [20]:
@Language.component("Remove URLs")
def remove_urls(doc):
    #drop URL-like tokens and rebuild a Doc with the pipeline's vocab
    words = [token.text for token in doc if not token.like_url]
    return Doc(doc.vocab, words=words)

In [21]:
@Language.component("Remove mentions")
def remove_mentions(doc):
    #drop @mentions and rebuild a Doc with the pipeline's vocab
    words = [token.text for token in doc if not token.text.startswith("@")]
    return Doc(doc.vocab, words=words)

In [22]:
@Language.component("Remove stopwords and punctuation")
def remove_stopwords_and_punct(doc):
    #return the texts of all tokens that are neither stopwords nor punctuation
    return [token.text for token in doc if not token.is_stop and not token.is_punct]

In [23]:
# Create spacy pipeline
nlp_tweets_sentiws = spacy.load('de_core_news_sm')
nlp_tweets_sentiws.Defaults.stop_words |= {"amp", "rt"}

# The add_pipe function appends our functions to the default pipeline.
nlp_tweets_sentiws.add_pipe("sentencizer", last=True)
nlp_tweets_sentiws.add_pipe("Detect languages", name='Detect languages', last=True)
nlp_tweets_sentiws.add_pipe("Keep only German documents", name='Keep only German documents', last=True)
nlp_tweets_sentiws.add_pipe("Remove non alphabetic words", name="Remove non alphabetic words", last=True)
nlp_tweets_sentiws.add_pipe("Remove stopwords", name="Remove stopwords", last=True)
# nlp_tweets_sentiws.add_pipe("Lemmatize text", name="Lemmatize text", last=True)
# nlp_tweets_sentiws.add_pipe("Lowercase Text", name="Lowercase Text", last=True)
nlp_tweets_sentiws.add_pipe("Sentiment Appplication", name="Sentiment Appplication", last=True)

### 3.6.2.1 Sentiment Analysis for Twitter Data

First, we look at our Twitter data again. As with the TextBlob analysis, we want to go through all individual politicians and therefore use a loop. Unlike in the first approach, we use the raw data here, since our new pipeline does the preprocessing itself. After applying the pipeline with the sentiment component, we go through the processed tweets and take the SentiWS score of each token, skipping tokens that are not in the dictionary. We then average the token scores for each tweet and average the tweet scores for each politician. Again, we also count the number of positive, negative, and neutral tweets.
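
The two-level averaging described above can be sketched in plain Python. The sketch assumes each text has already been reduced to a list of token scores, with `None` for tokens that SentiWS does not cover; the function names and the example scores are hypothetical.

```python
import math


def text_score(token_scores):
    """Mean of the scored tokens of one text; NaN if no token has a score (None)."""
    scored = [s for s in token_scores if s is not None]
    return sum(scored) / len(scored) if scored else math.nan


def politician_summary(texts):
    """Return (mean over scored texts, n_positive, n_neutral, n_negative).

    Texts without any scored token get NaN: they are excluded from the mean
    and counted as neutral, like in the np.nanmean-based loop.
    """
    per_text = [text_score(t) for t in texts]
    valid = [s for s in per_text if not math.isnan(s)]
    mean = sum(valid) / len(valid) if valid else math.nan
    n_pos = sum(1 for s in per_text if s > 0)  # NaN > 0 is False
    n_neg = sum(1 for s in per_text if s < 0)  # NaN < 0 is False
    return mean, n_pos, len(per_text) - n_pos - n_neg, n_neg


# Three hypothetical tweets: mixed scores, no scored token, one negative token.
print(politician_summary([[0.5, None, -0.25], [None, None], [-0.5]]))
# (-0.1875, 1, 1, 1)
```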

In [24]:
#Apply the sentiment analysis to the Twitter accounts of the politicians
data=[]
for name in tqdm(['Ralph Brinkhaus','Hermann Gröhe', 'Nadine Schön' ,'Norbert Röttgen' , 'Peter Altmaier' , 'Jens Spahn' , 'Matthias Hauer',
            'Christian Lindner' , 'Marco Buschmann' , 'Bettina Stark-Watzinger', 'Alexander Graf Lambsdorff' , 'Johannes Vogel' , 'Konstantin Kuhle' , 'Marie-Agnes Strack-Zimmermann',
            'Lars Klingbeil' , 'Saskia Esken' , 'Hubertus Heil' , 'Heiko Maas' , 'Martin Schulz' , 'Karamba Diaby' , 'Karl Lauterbach',
            'Steffi Lemke' , 'Cem Özdemir' , 'Katrin Göring-Eckardt' , 'Konstantin von Notz' , 'Britta Haßelmann' , 'Sven Lehmann' , 'Annalena Baerbock',
            'Sahra Wagenknecht' , 'Bernd Riexinger' , 'Niema Movassat' , 'Jan Korte' , 'Dietmar Bartsch' , 'Gregor Gysi' , 'Sevim Dağdelen',
            'Alice Weidel' , 'Beatrix von Storch' , 'Joana Cotar' , 'Stephan Brandner' , 'Tino Chrupalla' , 'Götz Frömming' , 'Leif-Erik Holm']):
    #get tweets from the specific politician and run them through the SentiWS pipeline
    tweets_analyzing = pre_data_twitter.loc[pre_data_twitter['full_name']==name]
    tweets_analyzing1 = tweets_analyzing.text.apply(nlp_tweets_sentiws)
    #mean SentiWS score per tweet; tokens without a dictionary entry (None) are skipped,
    #tweets without any scored token get NaN
    politician_sum=[]
    for tweet in tweets_analyzing1:
        token_scores = [token._.sentiws for token in tweet if token._.sentiws is not None]
        politician_sum.append(np.nanmean(token_scores) if token_scores else np.nan)
    #mean over all scored tweets of the politician (NaN if there are none)
    scored_tweets = [score for score in politician_sum if not np.isnan(score)]
    politician_score = np.mean(scored_tweets) if scored_tweets else np.nan
    #get the number of positive, neutral and negative tweets (NaN tweets count as neutral)
    positive_p = sum(1 for score in politician_sum if score > 0)
    negative_p = sum(1 for score in politician_sum if score < 0)
    neutral_p = len(politician_sum) - positive_p - negative_p
    #set up list to secure the values generated
    data.append([name,politician_score,positive_p,neutral_p,negative_p])

We transform the list into a dataframe that we can again analyze further.

In [25]:
#set up dataframe with all values
dataf = pd.DataFrame(data, columns=['Name','Polarity_mean','Num_pos_tweets','Num_neutral_tweets','Num_neg_tweets'])
dataf.to_csv('../data/processed/sentiment_scores_tweets_sentiws_01.csv')
dataf.head()

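|   | Name | Polarity_mean | Num_pos_tweets | Num_neutral_tweets | Num_neg_tweets |
|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | 0.02611 | 48 | 27 | 25 |
| 1 | Hermann Gröhe | NaN | 0 | 0 | 0 |
| 2 | Nadine Schön | NaN | 0 | 0 | 0 |
| 3 | Norbert Röttgen | NaN | 0 | 0 | 0 |
| 4 | Peter Altmaier | NaN | 0 | 0 | 0 |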


### 3.6.2.2 Sentiment Analysis for Bundestag Speeches

Next, we look at the Bundestag speeches and see how the SentiWS dictionary classifies their sentiment. We use the same procedure as for the tweets to calculate the scores and the counts.

In [26]:
#Apply the sentiment analysis to the speeches of the politicians
data=[]
for name in tqdm(['Ralph Brinkhaus','Hermann Gröhe', 'Nadine Schön' ,'Norbert Röttgen' , 'Peter Altmaier' , 'Jens Spahn' , 'Matthias Hauer',
            'Christian Lindner' , 'Marco Buschmann' , 'Bettina Stark-Watzinger', 'Alexander Graf Lambsdorff' , 'Johannes Vogel' , 'Konstantin Kuhle' , 'Marie-Agnes Strack-Zimmermann',
            'Lars Klingbeil' , 'Saskia Esken' , 'Hubertus Heil' , 'Heiko Maas' , 'Martin Schulz' , 'Karamba Diaby' , 'Karl Lauterbach',
            'Steffi Lemke' , 'Cem Özdemir' , 'Katrin Göring-Eckardt' , 'Konstantin von Notz' , 'Britta Haßelmann' , 'Sven Lehmann' , 'Annalena Baerbock',
            'Sahra Wagenknecht' , 'Bernd Riexinger' , 'Niema Movassat' , 'Jan Korte' , 'Dietmar Bartsch' , 'Gregor Gysi' , 'Sevim Dağdelen',
            'Alice Weidel' , 'Beatrix von Storch' , 'Joana Cotar' , 'Stephan Brandner' , 'Tino Chrupalla' , 'Götz Frömming' , 'Leif-Erik Holm']):
    #get speeches from the specific politician and run them through the SentiWS pipeline
    speeches_analyzing = pre_data_speeches.loc[pre_data_speeches['full_name']==name]
    speeches_analyzing1 = speeches_analyzing.text.apply(nlp_tweets_sentiws)
    #mean SentiWS score per speech; tokens without a dictionary entry (None) are skipped,
    #speeches without any scored token get NaN
    politician_sum=[]
    for speech in speeches_analyzing1:
        token_scores = [token._.sentiws for token in speech if token._.sentiws is not None]
        politician_sum.append(np.nanmean(token_scores) if token_scores else np.nan)
    #mean over all scored speeches of the politician (NaN if there are none)
    scored_speeches = [score for score in politician_sum if not np.isnan(score)]
    politician_score = np.mean(scored_speeches) if scored_speeches else np.nan
    #get the number of positive, neutral and negative speeches (NaN speeches count as neutral)
    positive_p = sum(1 for score in politician_sum if score > 0)
    negative_p = sum(1 for score in politician_sum if score < 0)
    neutral_p = len(politician_sum) - positive_p - negative_p
    #set up list to secure the values generated
    data.append([name,politician_score,positive_p,neutral_p,negative_p])

Afterwards, we create a DataFrame from the data for analysis.

In [27]:
#set up dataframe with all values
dataf = pd.DataFrame(data, columns=['Name','Polarity_mean','Num_pos_speeches','Num_neutral_speeches','Num_neg_speeches'])
dataf.to_csv('../data/processed/sentiment_scores_speeches_sentiws_01.csv')
dataf.head()

|   | Name | Polarity_mean | Num_pos_speeches | Num_neutral_speeches | Num_neg_speeches |
|---|---|---|---|---|---|
| 0 | Ralph Brinkhaus | -0.024537 | 0 | 0 | 2 |
| 1 | Hermann Gröhe | 0.082749 | 3 | 0 | 1 |
| 2 | Nadine Schön | -0.042718 | 1 | 0 | 1 |
| 3 | Norbert Röttgen | -0.099983 | 0 | 1 | 2 |
| 4 | Peter Altmaier | 0.140338 | 3 | 2 | 0 |


Because the SentiWS polarity scores are smaller in absolute value and therefore seem less pronounced, we decided to conduct the further in-depth sentiment analysis with the TextBlob model. The smaller SentiWS values could result from the way our loop aggregates the scores: the means may not be robust enough to separate neutral from polar texts. Another possible explanation is that there are no strong outliers in the tweet or speech sentiments, as polarity only ranges from -1 to 1.
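
One way to check whether averaging is responsible for the small SentiWS values is to compare the mean score with the mean absolute score: if texts are clearly positive or negative but cancel each other out, the mean is small while the mean absolute score stays large. A sketch with invented per-text scores; the function name is hypothetical and this check was not part of our analysis.

```python
def mean_and_intensity(scores):
    """Return (mean score, mean absolute score) for a non-empty list of scores."""
    mean = sum(scores) / len(scores)
    intensity = sum(abs(s) for s in scores) / len(scores)
    return mean, intensity


# Invented per-text scores: clearly polar texts of both signs.
print(mean_and_intensity([0.5, -0.5, 0.25, -0.25]))  # (0.0, 0.375)
```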