# Sentiment Analysis on Swiss Newspaper Jupyter Notebook

_Giorgio Bakhiet Derias_
_I3a, Bachelor thesis_

This notebook analyses the sentiment of the articles published by the different newspapers that can be read in Switzerland.

# Setup

### Install from requirements
Before anything else, I install the libraries that the notebook imports.
I saved every library I used in a text file called *requirementsDashboard.txt*.
This file is useful when I move to a new environment, because all packages can be installed at once by simply typing:

In [67]:
#%conda install --file requirementsDashboard.txt

In [2]:
!python -m pip install --upgrade pip



In [1]:
!pip install plotly-express
!pip install voila --user
!pip install voila-gridstack
!pip install voila-material
!pip install nbconvert

Collecting nbconvert<7,>=6.0.0
  Using cached nbconvert-6.0.7-py3-none-any.whl (552 kB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 4.1.4 requires pyqt5<5.13; python_version >= "3", which is not installed.
spyder 4.1.4 requires pyqtwebengine<5.13; python_version >= "3", which is not installed.



Installing collected packages: nbconvert
Successfully installed nbconvert-6.0.7




In [3]:
!pip3 freeze > requirementsDashboard.txt

# Imports

In [71]:
# Numpy and Pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
import re

# Plotly
import plotly.express as px
from matplotlib import rc
import plotly.graph_objects as go

# KTrain
import ktrain
from ktrain import text

# Util
import logging
from datetime import date
import time
from IPython.core.display import HTML

In [72]:
from traitlets.config import Config
import nbformat as nbf
from nbconvert.exporters import HTMLExporter

c = Config()
# Configure our tag removal
c.TagRemovePreprocessor.remove_cell_tags = ("hide",)
c.TagRemovePreprocessor.remove_all_outputs_tags = ('hide',)
c.TagRemovePreprocessor.remove_input_tags = ('hide',)

# Newspaper
## Import predicted dataset

In [73]:
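# error_bad_lines / warn_bad_lines were deprecated in pandas 1.3 and removed in 2.0;
# on newer versions use on_bad_lines='warn' instead.
news_concat = pd.read_csv("news_concat.csv", parse_dates=['date_parsed'], encoding='utf8', error_bad_lines=False, warn_bad_lines=True, header=0)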

In [74]:
news_concat = news_concat.drop(columns='Unnamed: 0')

### Count predictions

In [75]:
news_concat['sentiment'].value_counts()

0    2663
1    1823
Name: sentiment, dtype: int64

In [76]:
len(news_concat)

4486

In [77]:
pd.set_option('display.max_colwidth', None)

### How many sources?

In [78]:
news_concat.source.unique()

array(['20 minuten', 'blick', 'neue schweizer zeitung', 'srf',
       'neue zürcher zeitung', 'speedweek.com', 'watson',
       'tages-anzeiger', 'landbote.ch', 'merkur.de', 'tagblatt.ch',
       'telebasel', 'aerotelegraph', 'auto motor und sport',
       'inside paradeplatz', 'btc-echo', 'inside digital',
       'chinahandys.net', 'chip online', ' cash', 'www.rtl.de',
       'schweizer-illustrierte.ch', 'bluewin.ch', 'nau.ch', 'gmx.ch',
       'real total', 'der spiegel', 'futurezone.at', 't-online',
       'faz - frankfurter allgemeine zeitung', 'nzzas.nzz.ch',
       'hardwareluxx.de', 'bild', 'die welt', 'frankfurter rundschau',
       'oe24', 'scinexx | das wissensmagazin', 'heilpraxisnet.de',
       'infranken.de', 'heidelberg24.de', 'presseportal.de',
       'frankfurt-live.com', 'www.swr.de', 'insuedthueringen.de',
       'chemie-zeitschrift.at', 'frankenpost.de', 'np-coburg.de',
       'hna.de', 'kreisbote', 'aargauer zeitung',
       'schiffe und kreuzfahrten - das kreuzfahr

In [79]:
news_concat

Unnamed: 0,source,content,category,sentiment,date_parsed
0,20 minuten,Verletzung im Schädelinneren : Frau lief nach Corona-Test Hirnwasser aus dem Kopf. In Osnabrück ist eine Frau beim Corona-Schnelltest im Inneren ihres Schädels verletzt worden. Danach lief ihr wochenlang Hirnwasser aus dem Kopf.,world,0,2021-05-01
1,blick,"USA: Freizeitpark wieder auf. 13 Monate lang war Disneyland wegen der Corona-Pandemie stillgelegt, nun hat der beliebte Freizeitpark in Kalifornien wieder auf.",world,1,2021-05-01
2,20 minuten,"Verdacht auf Menschenschmuggel : US-Polizei findet 91 Menschen ohne Papiere in Wohnhaus. Auf Hinweis einer Entführung finden Polizeibeamte in Houston, im US-Bundesstaat Texas, 91 Frauen und Männer ohne gültige Aufenthaltspapiere.",world,0,2021-05-01
3,20 minuten,"Australien macht ernst : Bis zu fünf Jahre Gefängnis für Heimkehrer aus Hochrisikogebieten. Australien plant radikale Massnahmen für Personen, die illegal aus Corona-Hochrisikogebieten wie Indien einreisen: Ihnen könnte künftig bis zu fünf Jahren Gefängnis drohen.",world,0,2021-04-30
4,blick,Indonesien: Veronika Troshina droht Knast wegen Porno-Dreh auf Bali. Für den Dreh eines Amateur-Sexclips haben sich die Russin Veronika Troshina (22) und ihr Partner ausgerechnet einen heiligen Berg auf Bali ausgesucht. Dafür sucht sie nun die Polizei,world,0,2021-04-30
...,...,...,...,...,...
4481,t-online,RKI-Zahlen in Deutschland: Bundesweite Sieben-Tage-Inzidenz sinkt auf unter 100. Erstmals seit dem 20. März vermeldet das RKI eine Sieben-Tage-Inzidenz unter dem kritischen Schwellenwert. Auch die Zahl der gemeldeten Neuinfektionen liegt deutlich unter der Vorwoche.,science,0,2021-05-14
4482,bild,Thüringen: Wie ein Wirt ganz legal die Corona-Regeln umgeht. Gotha (Thüringen) – Allein in den letzten Tagen wurden über 600 Gäste bekocht – Trotz Notbremse und Inzidenz weit über 200.,science,0,2021-05-13
4483,augsburger allgemeine,"Vorsicht: Ausgekugelte Schulter nie selbst behandeln. Eine ausgekugelte Schulter ist ausgesprochen schmerzhaft und das Einkugeln mitunter abenteuerlich. Damit alles wieder dahin kommt, wo es hingehört, hat der Arzt...",health,0,2021-05-13
4484,www.rtl.de,"Covid-19 ist doch keine Atemwegserkrankung - Lauterbach: ""wichtige Studie"" - RTL Online. Eine neue Studie zeigt nun, dass die besonderen Spike-Proteine auch bei der durch das Coronavirus ausgelösten Covid-19-Erkrankung eine Schlüsselrolle spielen.",health,1,2021-05-12


# Plot the data
Now that the data has been imported, predicted and cleaned, I can start to analyse it. To do this I will use Plotly.
To display the data correctly I first have to normalise it, so I have written two functions for this purpose: `normalize` computes each row's share of the whole table, while `norm` is meant to be applied per group.

In [80]:
def normalize(df):
    """Return a copy of df with a 'sentiment %' column: each row's share of the total count."""
    df_max_scal = df.copy()

    # share of each row in the overall count, in percent
    df_max_scal['sentiment %'] = (df_max_scal['count'] / df_max_scal['count'].sum()) * 100
    df_max_scal['sentiment %'] = df_max_scal['sentiment %'].round(decimals=2)
    return df_max_scal
    

In [81]:
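def norm(x):
    """Add a 'sentiment %' column with each row's share of x's total count (used per group)."""
    x = x.copy()  # work on a copy so the caller's group is not modified in place
    x['sentiment %'] = (x['count'] / x['count'].sum()) * 100
    x['sentiment %'] = x['sentiment %'].round(decimals=2)
    return x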
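
To make the difference between the two kinds of normalisation concrete, here is a minimal sketch on a hypothetical two-newspaper table (the sources `A` and `B` and their counts are invented for illustration). It uses `groupby(...).transform('sum')` as a stand-in for applying `norm` per group; that is an assumption made for the example, not the code used in this notebook.

```python
import pandas as pd

# Hypothetical counts, invented for illustration only
toy = pd.DataFrame({
    "source": ["A", "A", "B", "B"],
    "sentiment": ["0", "1", "0", "1"],
    "count": [3, 1, 2, 2],
})

# Global normalisation (the idea behind normalize): share of the whole table
toy["global %"] = (toy["count"] / toy["count"].sum() * 100).round(2)

# Per-group normalisation (the idea behind norm applied per source):
# share within each newspaper, so every source sums to 100
per_source_total = toy.groupby("source")["count"].transform("sum")
toy["per-source %"] = (toy["count"] / per_source_total * 100).round(2)

print(toy)
```

With the global version the four rows sum to 100 together; with the per-source version each newspaper sums to 100 on its own, which is what the per-newspaper plots need.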

## Plot of sentiment positive vs negative

In [82]:
tot= news_concat.groupby(['sentiment']).size().reset_index()
tot['sentiment'] = tot['sentiment'].astype(str)
tot = tot.rename(columns={0:'count'})
tot = normalize(tot)

In [83]:
tot

Unnamed: 0,sentiment,count,sentiment %
0,0,2663,59.36
1,1,1823,40.64
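
As a cross-check of the percentages above, the same shares can probably be obtained directly with `value_counts(normalize=True)`. The sketch below rebuilds a label column from the two class counts reported earlier (2663 and 1823) rather than loading `news_concat.csv`, so it is only an illustration of the idea.

```python
import pandas as pd

# Rebuild a label column from the class counts shown above
# (2663 articles with label 0, 1823 articles with label 1)
sentiment = pd.Series([0] * 2663 + [1] * 1823, name="sentiment")

# normalize=True returns relative frequencies instead of absolute counts
share = sentiment.value_counts(normalize=True).mul(100).round(2)
print(share)
```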


In [84]:
figTotal = px.bar(tot,
                  x="sentiment",
                  y="sentiment %",
                  #barmode="group",
                  color="sentiment",
                  color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                    },
                  labels={
                      "sentiment": "Sentiment",
                      "sentiment %": "# of articles (%)"
                  },
                  title="Total Positive vs Negative"
                  )

figTotal.show()

## Plot all newspapers positive vs negative as a share of the total count

In [85]:
grouped= news_concat.groupby(['source','sentiment']).size().reset_index()
grouped['sentiment'] = grouped['sentiment'].astype(str)
grouped = grouped.rename(columns={0:'count'})
grouped = normalize(grouped)
grouped

Unnamed: 0,source,sentiment,count,sentiment %
0,cash,0,29,0.65
1,cash,1,21,0.47
2,technik smartphone news,0,1,0.02
3,11freunde.de,0,1,0.02
4,20 minuten,0,461,10.28
...,...,...,...,...
392,xboxdynasty,1,1,0.02
393,xboxdynasty.de,1,1,0.02
394,youtube,0,1,0.02
395,zofingertagblatt.ch,0,1,0.02


In [86]:
figNews = px.bar(grouped,
                 x="source",
                 y="sentiment %",
                 text = "sentiment %",
                 barmode="group",
                 color="sentiment",
                 color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                    },
                 #facet_col='source', facet_col_wrap=4
                 #facet_row="targetTitle", 
                 #facet_col="category",
                  )
figNews.update_layout(xaxis={'categoryorder':'total descending'})
figNews.show()

## Plot of all newspapers positive vs negative

In [87]:
grouped.source.value_counts()

neue schweizer zeitung     2
futurezone.at              2
deavita                    2
finews.ch                  2
wallstreet-online          2
                          ..
tv aktuell                 1
vestors capital magazin    1
myheimat.de                1
ingame.de                  1
soaktuell.ch               1
Name: source, Length: 282, dtype: int64

In [88]:
# Keep only the (source, sentiment) rows with more than 5 articles.
# A source whose other sentiment class has 5 or fewer articles is left with a single row.
clean = grouped.loc[grouped['count'] > 5]

In [89]:
# uncomment this to keep only newspapers that have rows for both sentiments
#clean = clean[clean['source'].map(clean['source'].value_counts()) > 1]
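
The row-wise filter has a side effect worth keeping in mind: when only one sentiment row of a newspaper passes the threshold, the per-source normalisation later reports 100 % for that row. The sketch below shows the effect on hypothetical counts (sources `a` and `b` are invented) and compares it with one possible alternative, filtering on the total number of articles per source. The alternative and the helper `share_per_source` are assumptions for illustration, not what this notebook does.

```python
import pandas as pd

# Hypothetical (source, sentiment) counts, invented for illustration only
toy = pd.DataFrame({
    "source": ["a", "a", "b", "b"],
    "sentiment": ["0", "1", "0", "1"],
    "count": [20, 3, 10, 8],
})

def share_per_source(df):
    """Percentage of each row within its source, rounded to 2 decimals."""
    total = df.groupby("source")["count"].transform("sum")
    return (df["count"] / total * 100).round(2)

# Row-wise filter, as in the cell above: source "a" loses its positive row
by_row = toy.loc[toy["count"] > 5].copy()
by_row["sentiment %"] = share_per_source(by_row)

# Alternative: filter on the total number of articles per source
totals = toy.groupby("source")["count"].transform("sum")
by_source = toy.loc[totals > 5].copy()
by_source["sentiment %"] = share_per_source(by_source)

print(by_row)
print(by_source)
```

In the first table source `a` appears as 100 % negative, while in the second it keeps both rows and the shares reflect all 23 of its articles.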

In [90]:
df20 = clean.loc[clean['source'] == '20 minuten']
df20 = normalize(df20)
display(df20)

Unnamed: 0,source,sentiment,count,sentiment %
4,20 minuten,0,461,64.57
5,20 minuten,1,253,35.43


In [91]:
clean = clean.groupby(['source']).apply(norm).reset_index(drop=True)

In [92]:
clean

Unnamed: 0,source,sentiment,count,sentiment %
0,cash,0,29,58.00
1,cash,1,21,42.00
2,20 minuten,0,461,64.57
3,20 minuten,1,253,35.43
4,aargauer zeitung,0,12,66.67
...,...,...,...,...
88,watson,0,52,44.07
89,watson,1,66,55.93
90,wirtschaftsblatt-bg.com,0,6,100.00
91,www.rtl.de,0,8,57.14


In [93]:
figNews = px.bar(clean,
                 x="source",
                 y="sentiment %",
                 text = "sentiment %",
                 barmode="group",
                 color="sentiment",
                 color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                    },
                 labels={
                      "source": "Sources",
                      "sentiment %": "# of articles (%)",
                      "sentiment": "Sentiment"
                  },
                  title="Newspaper Positive vs Negative"
                  )
                  
figNews.show()

## Plot top 10 positive vs top 10 negative 

In [94]:
# top 10 newspapers by share of positive articles
gr10Pos = clean.loc[clean['sentiment'] == '1']
gr10Pos = gr10Pos.sort_values(by=['sentiment %'], ascending=False)

In [95]:
gr10Pos = gr10Pos.head(10)
gr10Pos

Unnamed: 0,source,sentiment,count,sentiment %
44,games.ch,1,9,100.0
74,schweizer-illustrierte.ch,1,17,100.0
67,neue schweizer zeitung,1,13,100.0
70,nzzas.nzz.ch,1,7,100.0
32,eurosport de,1,12,100.0
31,ecomento.de,1,6,100.0
20,chinahandys.net,1,8,100.0
10,auto motor und sport,1,19,100.0
58,it magazine,1,12,100.0
89,watson,1,66,55.93


In [96]:
# top 10 newspapers by share of negative articles
gr10Neg = clean.loc[clean['sentiment'] == '0']
gr10Neg = gr10Neg.sort_values(by=['sentiment %'], ascending=False)

In [97]:
gr10Neg = gr10Neg.head(10)
gr10Neg

Unnamed: 0,source,sentiment,count,sentiment %
83,tagblatt.ch,0,19,100.0
30,echo24.de,0,11,100.0
52,infranken.de,0,8,100.0
90,wirtschaftsblatt-bg.com,0,6,100.0
61,luzerner zeitung,0,9,100.0
47,heidelberg24.de,0,16,100.0
39,fr.de,0,6,100.0
19,business insider deutschland,0,8,100.0
71,ok! magazin,0,8,100.0
25,cryptoticker.io,0,11,100.0


In [98]:
figNews10 = px.bar(gr10Pos,
                   x="source",
                   y="sentiment %",
                   text = "sentiment %", 
                   barmode="group",
                   color="sentiment",
                   color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                    },
                   labels={
                      "source": "Sources",
                      "sentiment %": "# of articles (%)",
                      "sentiment": "Sentiment"
                  },
                  title="Top 10 Positive Newspaper"
                  )

figNews10.show()

In [99]:
figNews10 = px.bar(gr10Neg,
                   x="source",
                   y="sentiment %",
                   text = "sentiment %", 
                   barmode="group",
                   color="sentiment",
                   color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                    },
                   labels={
                      "source": "Sources",
                      "sentiment %": "# of articles (%)",
                      "sentiment": "Sentiment"
                   },
                   title="Top 10 Negative Newspaper"
                  )

figNews10.show()

## Plot per category positive vs negative

In [100]:
category = news_concat.groupby(['category','sentiment']).size().reset_index()
category['sentiment'] = category['sentiment'].astype(str)
category = category.rename(columns={0:'count'})
# normalize over the whole dataset
category = normalize(category)
category

Unnamed: 0,category,sentiment,count,sentiment %
0,business,0,185,4.12
1,business,1,190,4.24
2,entertainment,0,279,6.22
3,entertainment,1,227,5.06
4,health,0,360,8.02
5,health,1,180,4.01
6,nation,0,235,5.24
7,nation,1,64,1.43
8,science,0,396,8.83
9,science,1,244,5.44


In [101]:
# normalize within each category
category_clean = category.groupby(['category']).apply(norm).reset_index(drop=True)

In [102]:
category_clean

Unnamed: 0,category,sentiment,count,sentiment %
0,business,0,185,49.33
1,business,1,190,50.67
2,entertainment,0,279,55.14
3,entertainment,1,227,44.86
4,health,0,360,66.67
5,health,1,180,33.33
6,nation,0,235,78.6
7,nation,1,64,21.4
8,science,0,396,61.88
9,science,1,244,38.12


In [103]:
figCat = px.bar(category_clean, 
                x="category", 
                y="sentiment %",
                text = "sentiment %", 
                barmode="group",
                color="sentiment",
                color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                },
                labels={
                      "category": "Category",
                      "sentiment %": "# of articles (%)",
                      "sentiment": "Sentiment"
                  },
                  title="Positive vs Negative Category"
                )

figCat.show()


## Plot per category of the top 3 newspapers
**I will only consider the three largest newspapers by number of articles.**

In [104]:
newspaper_source = [
    '20 minuten',
    'blick',
    'srf',
]

In [105]:
news_small = news_concat[news_concat.source.isin(newspaper_source)]

In [106]:
sourceCat = news_small.groupby(['source','category','sentiment']).size().reset_index()
sourceCat['sentiment'] = sourceCat['sentiment'].astype(str)
sourceCat = sourceCat.rename(columns={0:'count'})
# normalized only within each category (across all three sources)
sourceCat = sourceCat.groupby(['category']).apply(norm).reset_index(drop=True)
sourceCat

Unnamed: 0,source,category,sentiment,count,sentiment %
0,20 minuten,business,0,31,20.95
1,20 minuten,business,1,33,22.3
2,20 minuten,entertainment,0,56,22.4
3,20 minuten,entertainment,1,65,26.0
4,20 minuten,health,0,2,100.0
5,20 minuten,nation,0,106,40.3
6,20 minuten,nation,1,21,7.98
7,20 minuten,science,0,10,20.41
8,20 minuten,science,1,10,20.41
9,20 minuten,sport,0,41,10.73


In [107]:
# the right normalization: within each (source, category) pair
sourceClean = sourceCat.groupby(['source','category']).apply(norm).reset_index(drop=True)

In [108]:
sourceClean

Unnamed: 0,source,category,sentiment,count,sentiment %
0,20 minuten,business,0,31,48.44
1,20 minuten,business,1,33,51.56
2,20 minuten,entertainment,0,56,46.28
3,20 minuten,entertainment,1,65,53.72
4,20 minuten,health,0,2,100.0
5,20 minuten,nation,0,106,83.46
6,20 minuten,nation,1,21,16.54
7,20 minuten,science,0,10,50.0
8,20 minuten,science,1,10,50.0
9,20 minuten,sport,0,41,65.08


In [109]:
figCat = px.bar(sourceClean,
                x="source",
                y="sentiment %",
                text = "sentiment %", 
                #barmode="group",
                color="sentiment",
                color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                },
                #facet_row='sentiment', 
                facet_col="category",
                #facet_col_wrap=4
                #facet_row="targetTitle", 
                #facet_col="category",
                labels={
                    "source": "Sources",
                    "sentiment %": "# of articles (%)",
                    "sentiment": "Sentiment"
                  },
                title="Top 3 in CH"
                )

figCat.show()

For visualisation, I make the negative-sentiment percentages negative so that they are drawn below the zero line.

In [110]:
sourceClean2 = sourceClean.copy()

In [111]:
sourceClean2['sentiment %'] = sourceClean2['sentiment %'] * (2 * sourceClean2['sentiment'].astype(int) - 1)
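
The expression `2 * sentiment - 1` in the cell above maps the label 0 to -1 and the label 1 to +1. A minimal sketch on the two `20 minuten` business rows from the `sourceClean` table (the column name `signed %` is introduced here only for the example):

```python
import pandas as pd

# Two rows copied from the sourceClean table above (20 minuten, business)
rows = pd.DataFrame({
    "sentiment": ["0", "1"],
    "sentiment %": [48.44, 51.56],
})

# 2 * s - 1 maps the label 0 to -1 and the label 1 to +1
sign = 2 * rows["sentiment"].astype(int) - 1
rows["signed %"] = rows["sentiment %"] * sign
print(rows)
```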

In [112]:
sourceClean2

Unnamed: 0,source,category,sentiment,count,sentiment %
0,20 minuten,business,0,31,-48.44
1,20 minuten,business,1,33,51.56
2,20 minuten,entertainment,0,56,-46.28
3,20 minuten,entertainment,1,65,53.72
4,20 minuten,health,0,2,-100.0
5,20 minuten,nation,0,106,-83.46
6,20 minuten,nation,1,21,16.54
7,20 minuten,science,0,10,-50.0
8,20 minuten,science,1,10,50.0
9,20 minuten,sport,0,41,-65.08


In [113]:
figCat = px.bar(sourceClean2,
                x="source",
                y="sentiment %",
                text = "sentiment %", 
                #barmode="group",
                color="sentiment",
                color_discrete_map={
                    '0': '#ef553b',
                    '1': '#00cc96'
                },
                #facet_row='sentiment', 
                facet_col="category",
                #facet_col_wrap=4
                #facet_row="targetTitle", 
                #facet_col="category",
                labels={
                    "source": "Sources",
                    "sentiment %": "# of articles (%)",
                    "sentiment": "Sentiment"
                  },
                title="Top 3 in CH Optimized"
                )

figCat.show()

## Spider plot per category of a single newspaper


In [114]:
sourceCat = news_concat.groupby(['source','category','sentiment']).size().reset_index()
sourceCat['sentiment'] = sourceCat['sentiment'].astype(str)
sourceCat = sourceCat.rename(columns={0:'count'})
sourceCat = sourceCat.groupby(['category']).apply(norm).reset_index(drop=True)
sourceCat

Unnamed: 0,source,category,sentiment,count,sentiment %
0,cash,science,0,1,0.16
1,cash,technology,0,28,9.56
2,cash,technology,1,21,7.17
3,technik smartphone news,health,0,1,0.19
4,11freunde.de,sport,0,1,0.15
...,...,...,...,...,...
607,xboxdynasty,technology,1,1,0.34
608,xboxdynasty.de,world,1,1,0.09
609,youtube,science,0,1,0.16
610,zofingertagblatt.ch,health,0,1,0.19


## Function for plotting the spider chart

In [115]:
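def plot_spider(source_name):
    """Draw a spider (radar) chart of the positive and negative share per category for one source."""
    d_name = str(source_name)
    # rows of the global sourceCat table that belong to this source
    df_name = sourceCat.loc[sourceCat['source'] == d_name]
    # re-normalize so that positive and negative sum to 100 % within each category
    df_name = df_name.groupby(['category']).apply(norm).reset_index(drop=True)
    df_name_pos = df_name.loc[df_name['sentiment'] == '1']
    df_name_neg = df_name.loc[df_name['sentiment'] == '0']

    label_neg = d_name + " NEG %"
    label_pos = d_name + " POS %"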
    
    fig = go.Figure()
    fig.add_trace(go.Scatterpolar(
        r=df_name_neg['sentiment %'],
        theta=df_name_neg['category'],
        fill='toself',
        mode='markers',
        name=label_neg,
        line_color='#ef553b'
    ))
    fig.add_trace(go.Scatterpolar(
        r=df_name_pos['sentiment %'],
        theta=df_name_pos['category'],
        fill='toself',
        mode='markers',
        name=label_pos,
        line_color='#00cc96'
    ))


    fig.update_layout(
        title = 'Spider Comparison: '+ d_name,
        showlegend = True
    )

    fig.show()

### Plot spider

In [116]:
list_newspaper = [
    '20 minuten',
    'blick',
    'bluewin.ch',
    'finews.ch',
    'nau.ch',
    'neue zürcher zeitung',
    'srf',
    'telebasel',
    'tages-anzeiger',
    'watson'
]

In [117]:
for x in list_newspaper:
    plot_spider(x)

# Plot in Time

In [118]:
def create_df_time(df_filter, subject):
    """Build a wide table: one row per subject and sentiment, one column per day, values in %."""
    df = df_filter.groupby(['date_parsed', subject, 'sentiment']).size().reset_index()
    df['sentiment'] = df['sentiment'].astype(str)
    df = df.rename(columns={0: 'count'})

    df_pos = df.loc[df['sentiment'] == '1']
    df_neg = df.loc[df['sentiment'] == '0']
    # per day: each subject's share of that day's positive (or negative) articles, in %
    df_pos = df_pos.groupby(['date_parsed']).apply(norm).reset_index(drop=True)
    df_neg = df_neg.groupby(['date_parsed']).apply(norm).reset_index(drop=True)
    df_pos[subject] = df_pos[subject].astype(str) + '_pos'
    df_neg[subject] = df_neg[subject].astype(str) + '_neg'
    df_concat = pd.concat([df_pos, df_neg], ignore_index=True)

    # pivot to wide format; days without articles for a subject become 0
    pivoted = df_concat.pivot(index=subject, columns='date_parsed', values='sentiment %')
    pivoted = pivoted.fillna(0)
    pivoted = pivoted.reset_index()
    return pivoted
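
The last step of the function, reshaping from long to wide format, can be sketched in isolation. The table below is hypothetical (the shares and dates are invented for illustration); it only shows how `pivot` turns one row per day and subject into one column per day, and why `fillna(0)` is needed.

```python
import pandas as pd

# Hypothetical long-format shares, invented for illustration only
long_df = pd.DataFrame({
    "date_parsed": pd.to_datetime(["2021-05-01", "2021-05-01", "2021-05-02"]),
    "source": ["blick_pos", "srf_pos", "blick_pos"],
    "sentiment %": [60.0, 40.0, 100.0],
})

# One row per source, one column per day
wide = long_df.pivot(index="source", columns="date_parsed", values="sentiment %")

# srf_pos has no article on 2021-05-02, so pivot leaves NaN there; fill it with 0
wide = wide.fillna(0).reset_index()
print(wide)
```

A wide table of this shape, with one column per date, is what the function returns for the time-based charts.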

## Plot newspaper in time

In [119]:
newspaper_source = [
    '20 minuten',
    'blick',
    #'bluewin.ch',
    #'finews.ch',
    #'nau.ch',
    #'neue zürcher zeitung',
    'srf',
    #'telebasel',
    #'tages-anzeiger',
    #'watson'
]

In [120]:
news_small = news_concat[news_concat.source.isin(newspaper_source)]

In [121]:
# Filter data between two dates
filtered_df = news_small.loc[(news_small['date_parsed'] >= '2021-05-01') & (news_small['date_parsed']<= '2021-05-31')]

In [122]:
filtered_df

Unnamed: 0,source,content,category,sentiment,date_parsed
0,20 minuten,Verletzung im Schädelinneren : Frau lief nach Corona-Test Hirnwasser aus dem Kopf. In Osnabrück ist eine Frau beim Corona-Schnelltest im Inneren ihres Schädels verletzt worden. Danach lief ihr wochenlang Hirnwasser aus dem Kopf.,world,0,2021-05-01
1,blick,"USA: Freizeitpark wieder auf. 13 Monate lang war Disneyland wegen der Corona-Pandemie stillgelegt, nun hat der beliebte Freizeitpark in Kalifornien wieder auf.",world,1,2021-05-01
2,20 minuten,"Verdacht auf Menschenschmuggel : US-Polizei findet 91 Menschen ohne Papiere in Wohnhaus. Auf Hinweis einer Entführung finden Polizeibeamte in Houston, im US-Bundesstaat Texas, 91 Frauen und Männer ohne gültige Aufenthaltspapiere.",world,0,2021-05-01
10,blick,"Ironman: Ryf läuft mit Streckenrekord zu Sieg in St. George. In St. George feiert Daniela Ryf ihren zweiten Saisonsieg in einem 70.3-Ironman. Ein gutes Omen für die Mitteldistanz-WM, die im September auf der gleichen Strecke durchgeführt wird.",world,1,2021-05-01
12,srf,32. Runde der Super League - Luzern verschafft sich weiter Luft im Abstiegskampf. Der FC Luzern gewinnt bei Vaduz 2:1 und baut den Vorsprung auf den Barrageplatz auf 9 Punkte aus.,world,1,2021-05-01
...,...,...,...,...,...
4475,blick,Premier League: Liverpool gewinnt Nachholspiel gegen ManUtd. Liverpool holt sich im Nachholspiel gegen Manchester United einen wichtigen Sieg. Vor der Partie kommts allerdings erneut zu Fan-Protesten.,sport,1,2021-05-13
4476,blick,"Nach Fotos von Kobe Bryants Absturz - zwei Feuerwehrmänner gefeuert. Sie waren bei Kobe Bryants (†41) Helikopterabsturz im Einsatz: Zwei Feuerwehrleute in Los Angeles verlieren ihren Job, weil sie Fotos von der Unfallstelle gemacht haben.",sport,0,2021-05-13
4477,blick,"Radsport: Gino Mäder gewinnt Bergankunft am Giro. Drei Kilometer vor dem Ziel lässt der Schweizer Gino Mäder (24, Bahrain Victorious) seine letzten beiden Begleiter stehen und gewinnt die 6. Etappe des Giro d'Italia in Ascoli Piceno.",sport,1,2021-05-13
4478,srf,Schweizer Sieg beim Giro - Paukenschlag beim Giro: Gino Mäder siegt auf der 6. Etappe. Der Fahrer vom Team Bahrain Victorious siegt in Ascoli Pieno. Zuvor hatte er sich von einer Ausreissergruppe abgesetzt.,sport,1,2021-05-13


In [123]:
source_time = create_df_time(filtered_df, 'source')

In [124]:
source_time

date_parsed,source,2021-05-01 00:00:00,2021-05-02 00:00:00,2021-05-03 00:00:00,2021-05-04 00:00:00,2021-05-05 00:00:00,2021-05-06 00:00:00,2021-05-07 00:00:00,2021-05-08 00:00:00,2021-05-09 00:00:00,...,2021-05-22 00:00:00,2021-05-23 00:00:00,2021-05-24 00:00:00,2021-05-25 00:00:00,2021-05-26 00:00:00,2021-05-27 00:00:00,2021-05-28 00:00:00,2021-05-29 00:00:00,2021-05-30 00:00:00,2021-05-31 00:00:00
0,20 minuten_neg,34.38,57.58,39.39,42.5,45.0,38.64,29.03,24.24,41.18,...,29.41,20.69,31.25,41.07,30.23,50.94,36.11,22.58,53.85,56.25
1,20 minuten_pos,9.09,35.29,43.75,35.0,41.18,39.29,47.83,33.33,44.44,...,23.81,36.84,11.76,32.35,44.44,33.33,28.0,38.1,37.5,23.53
2,blick_neg,56.25,36.36,42.42,42.5,40.0,50.0,61.29,51.52,35.29,...,52.94,37.93,59.38,35.71,55.81,33.96,44.44,54.84,30.77,31.25
3,blick_pos,63.64,47.06,37.5,40.0,38.24,35.71,30.43,33.33,22.22,...,23.81,36.84,64.71,44.12,33.33,36.67,32.0,47.62,37.5,23.53
4,srf_neg,9.38,6.06,18.18,15.0,15.0,11.36,9.68,24.24,23.53,...,17.65,41.38,9.38,23.21,13.95,15.09,19.44,22.58,15.38,12.5
5,srf_pos,27.27,17.65,18.75,25.0,20.59,25.0,21.74,33.33,33.33,...,52.38,26.32,23.53,23.53,22.22,30.0,40.0,14.29,25.0,52.94


In [125]:
HTML('''<div class="flourish-embed flourish-chart" data-src="visualisation/6126744"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

## Plot category in time


In [126]:
big_filtered_df = news_concat.loc[(news_concat['date_parsed'] >= '2021-05-01') & (news_concat['date_parsed']<= '2021-05-31')]

In [127]:
cat_time = create_df_time(big_filtered_df, 'category')

In [128]:
cat_time

date_parsed,category,2021-05-01 00:00:00,2021-05-02 00:00:00,2021-05-03 00:00:00,2021-05-04 00:00:00,2021-05-05 00:00:00,2021-05-06 00:00:00,2021-05-07 00:00:00,2021-05-08 00:00:00,2021-05-09 00:00:00,...,2021-05-22 00:00:00,2021-05-23 00:00:00,2021-05-24 00:00:00,2021-05-25 00:00:00,2021-05-26 00:00:00,2021-05-27 00:00:00,2021-05-28 00:00:00,2021-05-29 00:00:00,2021-05-30 00:00:00,2021-05-31 00:00:00
0,business_neg,3.7,4.55,11.84,8.14,10.11,8.33,4.23,0.0,4.11,...,5.71,0.0,2.99,5.22,11.7,6.03,6.02,8.11,1.79,5.88
1,business_pos,4.35,4.76,8.0,16.07,21.31,14.29,3.45,6.12,2.5,...,0.0,7.69,2.04,19.77,9.52,17.65,4.84,6.52,4.44,2.13
2,entertainment_neg,9.26,10.61,3.95,11.63,10.11,6.25,11.27,5.56,8.22,...,8.57,7.14,10.45,12.17,14.89,11.21,14.46,9.46,10.71,11.76
3,entertainment_pos,4.35,7.14,8.0,10.71,21.31,17.46,10.34,2.04,5.0,...,10.87,20.51,22.45,11.63,17.46,5.88,17.74,10.87,13.33,14.89
4,health_neg,9.26,9.09,22.37,15.12,13.48,17.71,15.49,26.39,20.55,...,11.43,11.43,5.97,12.17,6.38,12.07,9.64,22.97,8.93,12.94
5,health_pos,6.52,7.14,12.0,10.71,3.28,15.87,8.62,24.49,17.5,...,8.7,5.13,6.12,6.98,11.11,0.0,8.06,8.7,8.89,6.38
6,nation_neg,12.96,9.09,9.21,10.47,7.87,6.25,8.45,8.33,12.33,...,11.43,4.29,11.94,7.83,6.38,10.34,13.25,5.41,16.07,9.41
7,nation_pos,4.35,0.0,6.0,0.0,3.28,4.76,10.34,4.08,2.5,...,4.35,0.0,4.08,0.0,7.94,2.94,4.84,0.0,2.22,0.0
8,science_neg,14.81,15.15,14.47,13.95,16.85,17.71,18.31,5.56,12.33,...,12.86,18.57,13.43,13.04,15.96,13.79,13.25,9.46,19.64,21.18
9,science_pos,19.57,14.29,24.0,17.86,13.11,11.11,20.69,16.33,10.0,...,19.57,2.56,12.24,8.14,9.52,14.71,17.74,13.04,11.11,12.77


In [129]:
HTML('''<div class="flourish-embed flourish-chart" data-src="visualisation/6167385"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')