# Get user info 

In this notebook we'll explore some user groups of interest. We will take two approaches.
1. Look at user groups around specific channels and look into their histories.
2. Look at user groups around certain topics and look into their histories.

Let's start by importing the required Python libraries.

In [2]:
import pandas as pd 
import matplotlib.pyplot as plt 
import glob 
import csv 
import re
import datetime as dt
import sys
import os
import config
import scenariofunctions as sf

%matplotlib inline
csv.field_size_limit(sys.maxsize)


131072

Set the paths to the data. It is better to keep these paths in a separate config file than to hard-code them in the notebook.

In [3]:
path_nl = config.PATH_NL
path_right = config.PATH_RIGHT
path_left = config.PATH_LEFT
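
A minimal sketch of what such a config module could look like. Only the names `PATH_NL`, `PATH_RIGHT` and `PATH_LEFT` come from this notebook; the `YT_DATA_ROOT` environment variable and the folder layout are assumptions made for illustration.

```python
# config.py: hypothetical contents, only the three PATH_* names are used by this notebook
import os

# Root folder of the data. The environment variable name is an assumption.
DATA_ROOT = os.environ.get("YT_DATA_ROOT", "data")

# The trailing "" makes each path end with a separator,
# so that `path + 'file.csv'` style concatenation keeps working.
PATH_NL = os.path.join(DATA_ROOT, "nl", "")
PATH_RIGHT = os.path.join(DATA_ROOT, "right", "")
PATH_LEFT = os.path.join(DATA_ROOT, "left", "")
```

Keeping the paths in one module means the notebook itself does not need to change when the data moves.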

IMPORTANT NOTE: some of the data is really messy, especially the comment data, and I had a hard time parsing it correctly. One solution was to use rare characters as the separator and the quote character.
In this case, I mostly used:

- sep='¶'
- quotechar='þ'

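You need to use the Python engine when reading these CSV files, because the C engine doesn't accept these kinds of delimiters: it does not support a separator that takes more than one byte when encoded, and '¶' takes two bytes in UTF-8.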
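
As a small illustration of the separator and quote character described above, the sketch below parses a toy string instead of the real comment files. The column names and sample rows are made up for the example.

```python
import io
import pandas as pd

# Toy data: '¶' separates fields and 'þ' quotes them, so commas,
# double quotes and even '¶' itself can appear inside the comment text.
raw = (
    "video_id¶comment_text\n"
    "abc123¶þHe said \"hi\", then left ¶ for goodþ\n"
    "def456¶þplain commentþ\n"
)

demo = pd.read_csv(io.StringIO(raw), sep="¶", quotechar="þ", engine="python")
print(demo)
```

Because ordinary quotes and commas are no longer special, they pass through as plain text.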

## Merge channels and comments so we can select the users of interest

In [4]:
nl_comments = pd.read_csv(path_nl + 'comments_nl_right.csv',
                        sep='¶',
                        quotechar='þ',
                        engine='python')

In [5]:
nl_videos = pd.read_csv(path_nl + 'videos_nl_right.csv')

In [6]:
nl_comment_sphere = pd.merge(nl_comments, nl_videos, on='video_id', how='left')
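
A left merge keeps every comment, including those whose video is missing from the video table. The sketch below, on made-up toy frames, uses the `indicator=True` option of `pd.merge` to find such comments; it is a possible sanity check, not something the original analysis ran.

```python
import pandas as pd

# Toy stand-ins for the comment and video tables (made-up values)
comments_demo = pd.DataFrame({
    "video_id": ["v1", "v1", "v2", "v9"],
    "comment_text": ["a", "b", "c", "d"],
})
videos_demo = pd.DataFrame({
    "video_id": ["v1", "v2"],
    "video_channel_title": ["Channel A", "Channel B"],
})

# how='left' keeps every comment; indicator=True adds a `_merge` column
merged_demo = pd.merge(comments_demo, videos_demo,
                       on="video_id", how="left", indicator=True)

# Comments whose video_id does not occur in the video table
unmatched = merged_demo[merged_demo["_merge"] == "left_only"]
print(len(merged_demo), len(unmatched))
```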

Below we first drop a number of channels, and then list the remaining channels in our Dutch right infosphere with the number of comments we found per channel.

In [7]:
to_delete = ['Omroep PowNed', 
             'GeenStijl', 
             'Cafe Weltschmerz', 
             'Voice of Europe', 
             'Matthew & Doris',
             'ThePostOnline TPO',
             'Al Stankard aka HAarlem VEnison']            

In [8]:
nl_comments = nl_comment_sphere[~nl_comment_sphere['video_channel_title'].isin(to_delete)]
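
The filter above combines `isin` with `~` (negation). A minimal sketch on made-up data shows how it behaves, including for rows where the channel title is missing:

```python
import pandas as pd

# Toy frame; the last row has no channel title (e.g. no matching video)
sphere_demo = pd.DataFrame({
    "video_channel_title": ["Channel A", "Channel B", "Channel A", None],
    "comment_text": ["a", "b", "c", "d"],
})
drop_demo = ["Channel B"]

# isin() builds a boolean mask; ~ inverts it, keeping rows NOT in the list
kept = sphere_demo[~sphere_demo["video_channel_title"].isin(drop_demo)]

# value_counts() leaves missing titles out of the tally by default
print(kept["video_channel_title"].value_counts())
```

Rows with a missing title survive the filter, because `isin` returns `False` for them, but they do not show up in `value_counts()`.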

In [9]:
nl_comments.video_channel_title.value_counts()

Rafiek de Bruin                44645
Forum Democratie               30290
Paul Nielsen                   19938
Linkse Moskee                  15463
Laurens                        15378
Leukste YouTube fragmenten     15251
TheLvkrijger                   15140
LaVieJanRoos                   11060
Brave New World                10010
Deweycheatumnhowe               9995
PolitiekincorrectTV             8083
AvariceUntied                   7722
GeertWildersMedia               4738
Keihard Producties              3621
PVVpers                         3181
PVV Media                       3098
Pim Fortuyn                     2375
Politiekman                     1207
K                                993
Piet Zwarte                      702
De saneer-meneer                 502
Batavieren Podcast               490
PVVep                            467
De Dagelijkse Standaard DDS      383
Hollands Post                    364
Erkenbrand Kanaal                304
voorpostmedia                    288
R

In [10]:
poi = ['A Stuijt',
        'Adrie Van Dijk',
        'Akka Fietje',
        'Wouter Lensvelt',
        'Willem Sterk',
        'Michael Groenendijk',
        'Milo Overzicht',
        'Mike De Jong',
        'Mike Brink',
        'Nellie Rutten',
        'Paul van Dijck',
        'Peter Jongsma',
        'Piet Hein',
        'Pieter van der Meer',
        'Polder Cannabis Olie team',
        'Politiekman',
        'Raymond Doetjes',
        'Willem Pasterkamp',
        'Wimpiethe3',
        'Willie van het Kerkhof',
        'Vincent Vermeer',
        'Mark Tak',
        'Melvin Jansen',
        'Mark Kamphuis',
        'Tristan van Oosten',
        'Tom dGe-lugs-pa',
        'Tom Van de Pol',
        'Tom Van Gool',
        'Marcel Bruinsma',
        'Maarten van der Poel',
        'Maciano Van der Laan',
        'Tiemen Weistra',
        'TheRdamterror',
        'TheCitroenman1',
        'The flying dutchman',
        'Teun de Heer',
        'Stijn van de Ven',
        'Sjaak v Koten',
        'Sev Vermeer',
        'Tanya De Beer',
        'Tim Pietersen',
        'Alan Holland',
        'Bennie Leip',
        'Bert Prins',
        'Bestheftig',
        'Borisje Boef',
        'Chris Van Bekkum',
        'Coen Bijpost',
        'Cornelis van der Heijden',
        'David Teunissen',
        'David Van der Tweel',
        'De Veelvraat',
        'Dennis Bouma',
        'Dennis Eijs',
        'Donald gekkehenkie',
        'peter van',
        'onbekende telefoon',
        'nick van achthoven',
        'mikedehoogh black flag race photos',
        'kristof verbruggen',
        'jan holdijk',
        'jan Yup',
        'iwan munnikes',
        'hans van de mortel',
        'geroestetumor',
        'geheimschriver',
        'gaatje niksaan',
        'dutchmountainsnake',
        'dutch menneer',
        'donder bliksem',
        'boereriem',
        'appie D',
        'adam willems',
        'Yuri Klaver',
         'zuigdoos',
        'yvonneforsmanatyahoo',
        'vanhetgoor',
        'theflyingdutchboi',
        'r juttemeijer',
        'rutger houtdijk',
        'Dutch Patriot',
        'Dutch Whitey',
        'DutchFurnace',
        'Esias Lubbe',
        'Ewalds Eiland',
        'Joey Kuijs',
        'Faust',
        'Hollandia777',
        'Johan van Oldenbarnevelt',
        'Keescanadees',
        'Geert Kok',
        'Haasenpad',
        'Henk Damster',
        'Henk van der Laak',
        'Henri Zwols',
        'Haat Praat',
        'Gerard Mulder',    
        'Grootmeester Jan',
        'H. v. Heeswijk',
        'B. Hagen',
        '1234Daan4321',
        'Daniella Thoelen', 
        'Diederik',
        'Linda Bostoen', 
        'Christiaan Baron', 
        'Matthijs van Guilder',
        'Johannes Roose',
        'Deon Van der Westhuizen', 
        'Remko Jerphanion', 
        'Roosje Keizer',
        'Dennis Durkop',
        'ivar olsen',
        'Pete de pad',
        'georgio jansen',
        'Joel Peter',
        'Antonie de Vry',
        'Stijn Voorhoeve', 
        'liefhebber179',
        'Walter Taljaard',
        'joe van gogh',
        'Edo Peter', 
        'Ad Lockhorst',
        'kay hoorn',
        'Erik Bottema',
        'Deplorable Data',
        'JESSEverything',
        'Harry Balzak', 
        'Bokkepruiker Records',
        'zonnekat',
        'Peter-john De Jong',
        'marco mac',
        'Joubert x',
        'Natasja van Dijk',
        'Voornaam Achternaam',
        'hermanPla', 
        'M. van der Scheer',
        'gerald polyak',
        'Robbie Retro',
        'Johannes DeMoravia',
        'Wouter Vos',
        'AwoudeX',
        'carolineleiden',
        'A-dutch-Z',
        'piet ikke',
        'kutbleat',
        'David of Yorkshire',
        'Gert Tjildsen',
        'Flying Dutchman',
        'Visko Van Der Merwe',
        'Blobbejaan Blob',
        'TheBergbok',
        'jknochel76',
        'Olleke Bolleke',
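        'Nayako Sadashi',
        # NOTE: there is no comma after 'demarcation', so Python joins it with the
        # next literal into the single string 'demarcationer zaal'. Left unchanged
        # because the counts recorded below (len(poi) == 179) come from this list.
        'demarcation'
        'er zaal',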
        'jhon jansen',
        '-____-',
        'Brummie Brink',
        'reindeerkid ',
        'Pagan Cloak',
        'NDY',
        'Karel de Kale',
        'top top',
        'Chris Veenendaal ',
        'MijnheerlijkeBuitenlandse befkut ,',
        'Kevin Zilverberg',
        'Rick Dekker ',
        'Adrie Van Dijk ',
        'miep miep',
        'pronto ',
        'TheUnTrustable0',
        'danny schaap',
        'Mark Mathieu',
        'Raysboss302',
        'Ruud Hooreman',
        'Willie W',
        'Barend Borrelworst',
        'theo breytenbach',
         'coinmaster1000 coinmaster1000'  ]

In [11]:
len(poi)

179

In [12]:
nl_right_comments = nl_comments[nl_comments['author_display_name'].isin(poi)]

In [41]:
len(nl_right_comments)

11757

In [13]:
nl_right_comments.video_channel_title.value_counts()

Rafiek de Bruin                3667
Forum Democratie               1899
Leukste YouTube fragmenten     1146
TheLvkrijger                   1052
Paul Nielsen                    824
LaVieJanRoos                    641
Linkse Moskee                   378
Laurens                         370
Politiekman                     343
Deweycheatumnhowe               300
Keihard Producties              237
PVV Media                       172
PolitiekincorrectTV             101
PVVpers                          82
Piet Zwarte                      72
K                                64
GeertWildersMedia                59
Brave New World                  58
De saneer-meneer                 45
Res cogitans                     43
De Dagelijkse Standaard DDS      40
Batavieren Podcast               36
Pim Fortuyn                      33
Hollands Post                    24
Erkenbrand Kanaal                18
PVVep                             8
Ruben FVD                         7
LavendelTV                  

## Our users in the international right infosphere

In [14]:
poi_unique = nl_right_comments.author_channel_id.unique()

In [15]:

iter_csv = pd.read_csv(path_right + 'comments_right.csv', 
                        chunksize=1000000, 
                        sep='¶',
                        quotechar='þ',
                        engine='python')
int_right_comments = pd.concat([chunk[chunk['author_channel_id'].isin(poi_unique)] for chunk in iter_csv])

# See how many comments by our users we found
len(int_right_comments)

34555
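
The cell above reads the large comment file in chunks and keeps only the rows of our users, so the full file never has to fit in memory at once. A minimal sketch of the same pattern on a small in-memory string (the data and the `wanted_ids` list are made up):

```python
import io
import pandas as pd

# Ten toy comments written by four toy users (user0 .. user3)
raw = "author_channel_id¶comment_text\n" + "".join(
    f"user{i % 4}¶þcomment {i}þ\n" for i in range(10)
)
wanted_ids = ["user1", "user3"]

# chunksize makes read_csv return an iterator of DataFrames
reader = pd.read_csv(io.StringIO(raw), chunksize=3,
                     sep="¶", quotechar="þ", engine="python")

# Filter each chunk before concatenating, so only matching rows are kept
filtered = pd.concat(
    [chunk[chunk["author_channel_id"].isin(wanted_ids)] for chunk in reader],
    ignore_index=True,
)
print(len(filtered))
```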

In [17]:
videos_right = pd.read_csv(path_right + 'videos_right.csv', encoding='latin-1')

Unnamed: 0,video_id,video_published,channel_id,video_title,video_description,video_channel_title,video_tags,video_category_id,video_default_language,video_duration,video_view_count,video_comment_count,video_likes_count,video_dislikes_count,video_topic_ids,video_topic_categories
0,bSx-WpcSvh0,2018-08-08T19:49:03.000Z,UCJIAT4v6irhZChsrB4hzsPA,Trailer for Interview on HPANWO Radio 9 Aug 20...,Programme notes https://hpanwo-radio.blogspot...,Cosmic Claire,"['CS Lewis', 'Out of the Silent Planet', 'Voya...",27.0,not set,PT2M45S,48.0,8.0,4.0,1.0,"['/m/02jjt', '/m/02jjt']",['https://en.wikipedia.org/wiki/Entertainment']
1,SGKcO-NMLb0,2018-07-21T23:05:12.000Z,UCJIAT4v6irhZChsrB4hzsPA,Claire Rae Randall ~ Waking the Monkey! Interv...,Claire Rae Randall on The Hundredth Monkey Rad...,Cosmic Claire,"['Hundredth Monkey', 'Waking the Monkey', 'Cla...",27.0,not set,PT1H58M50S,35.0,2.0,2.0,0.0,"['/m/02jjt', '/m/02jjt']",['https://en.wikipedia.org/wiki/Entertainment']
2,mWWjMGYJRGo,2018-07-17T22:36:44.000Z,UCJIAT4v6irhZChsrB4hzsPA,Channel Update ~ The War on Gender,Update on the progress of my forthcoming book ...,Cosmic Claire,"['Transgender', 'War on Gender', 'Waking The M...",27.0,not set,PT26M26S,38.0,7.0,4.0,0.0,"['/m/019_rr', '/m/0kt51', '/m/019_rr', '/m/0kt...",['https://en.wikipedia.org/wiki/Lifestyle_(soc...
3,ssJJAmpv8AY,2018-03-29T02:55:36.000Z,UCJIAT4v6irhZChsrB4hzsPA,False Allegations Made Against Me,Further to the cancellation of my talk 'The Wa...,Cosmic Claire,not set,27.0,not set,PT33M57S,137.0,11.0,6.0,1.0,"['/m/019_rr', '/m/019_rr']",['https://en.wikipedia.org/wiki/Lifestyle_(soc...
4,JgmPirHGe9Q,2018-03-28T22:29:46.000Z,UCJIAT4v6irhZChsrB4hzsPA,Someone Doesn't Like Me,Following the cancellation of my pre-launch ev...,Cosmic Claire,"['Transgender', 'Transsexual', 'War on Gender'...",27.0,not set,PT4M22S,134.0,9.0,10.0,0.0,not set,not set
5,oYfCKUh4XSg,2018-03-26T21:47:20.000Z,UCJIAT4v6irhZChsrB4hzsPA,The Jeremy Clarkson Postulate,I mean no disrespect to Jeremy Clarkson by thi...,Cosmic Claire,"['Transsexual', 'Transgender', 'Jeremy Clarkso...",27.0,not set,PT18M47S,96.0,8.0,5.0,1.0,"['/m/02jjt', '/m/02jjt']",['https://en.wikipedia.org/wiki/Entertainment']
6,ahpVnHXzXEg,2018-03-26T20:32:40.000Z,UCJIAT4v6irhZChsrB4hzsPA,The War on Gender ~ Pre Book Launch talk Censored,I am a transsexual woman who has just been shu...,Cosmic Claire,"['Transgender', 'Transsexual', 'Headingley Lit...",27.0,not set,PT1H19M18S,125.0,3.0,6.0,0.0,"['/m/019_rr', '/m/0kt51', '/m/019_rr', '/m/0kt...","['https://en.wikipedia.org/wiki/Health', 'http..."
7,EvqQrK0GiUo,2018-03-21T00:00:32.000Z,UCJIAT4v6irhZChsrB4hzsPA,Tolkien in Leeds,This is my presentation for the Headingley Lit...,Cosmic Claire,"['Tolkien', 'Headingley', 'Shire Oak', 'Skyrac...",27.0,not set,PT40M49S,77.0,12.0,5.0,0.0,"['/m/098wr', '/m/098wr']",['https://en.wikipedia.org/wiki/Society']
8,C2Q_ujM4wJ8,2018-01-01T23:34:55.000Z,UCJIAT4v6irhZChsrB4hzsPA,In Memory of Randy California ~ Hero Died 2 Ja...,In memory of Randy California 20 Feb 1951 -2 J...,Cosmic Claire,"['Randy California', 'Spirit', 'Hendrix', 'Psy...",27.0,not set,PT12M34S,737.0,10.0,13.0,0.0,"['/m/02jjt', '/m/02jjt']",['https://en.wikipedia.org/wiki/Entertainment']
9,Ozns2xXH3VM,2017-12-22T00:56:40.000Z,UCJIAT4v6irhZChsrB4hzsPA,Cliche Came Out Of Its Cage by CS Lewis,My reading of the poem Cliche Came Out Of Its ...,Cosmic Claire,"['Cliche Came Out Of Its Cage', 'CS Lewis', 'O...",27.0,not set,PT3M56S,51.0,2.0,3.0,0.0,"['/m/02jjt', '/m/02jjt']",['https://en.wikipedia.org/wiki/Entertainment']


In [18]:

int_right_comments = pd.merge(int_right_comments, videos_right, on='video_id', how='left')


In [40]:
len(int_right_comments)

35923
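
The row count goes from 34555 before the merge to 35923 after it. A left merge only adds rows when a key occurs more than once in the right-hand table, so one possible explanation, which this notebook does not verify, is that `videos_right` contains repeated `video_id` values. The sketch below reproduces the effect on made-up data and shows one way to guard against it with `drop_duplicates` and the `validate` option of `pd.merge`.

```python
import pandas as pd

comments_demo = pd.DataFrame({"video_id": ["v1", "v2", "v2"],
                              "comment_id": [1, 2, 3]})
# 'v2' appears twice in the toy video table
videos_demo = pd.DataFrame({
    "video_id": ["v1", "v2", "v2"],
    "video_title": ["first", "second", "second"],
})

# Each comment on 'v2' is matched to both 'v2' rows, so the result grows
inflated = pd.merge(comments_demo, videos_demo, on="video_id", how="left")
print(len(comments_demo), len(inflated))

# Deduplicate the lookup table on the key first; validate='many_to_one'
# makes pandas raise a MergeError if the right-hand keys are not unique
videos_unique = videos_demo.drop_duplicates(subset="video_id")
clean = pd.merge(comments_demo, videos_unique, on="video_id",
                 how="left", validate="many_to_one")
print(len(clean))
```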

In [21]:
iter_csv = pd.read_csv(path_left + 'comments_left.csv', 
                        chunksize=1000000, 
                        sep='¶',
                        quotechar='þ',
                        engine='python')
int_left_comments = pd.concat([chunk[chunk['author_channel_id'].isin(poi_unique)] for chunk in iter_csv])

# See how many comments by our users we found
len(int_left_comments)

14008

In [25]:
videos_left = pd.read_csv(path_left + 'videos_left.csv', encoding='latin-1')
len(videos_left)

400206

In [23]:
int_left_comments = pd.merge(int_left_comments, videos_left, on='video_id', how='left')


In [24]:
len(int_left_comments)

14534

In [28]:
nl_comments.columns

Index(['video_id', 'comment_id', 'author_display_name', 'author_channel_url',
       'author_channel_id', 'comment_text', 'comment_like_count',
       'comment_dislike_count', 'comment_time', 'reply_count',
       'video_published', 'channel_id', 'video_title', 'video_description',
       'video_channel_title', 'video_tags', 'video_category_id',
       'video_default_language', 'video_duration', 'video_view_count',
       'video_comment_count', 'video_likes_count', 'video_dislikes_count',
       'video_topic_ids', 'video_topic_categories'],
      dtype='object')

In [31]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
comments = pd.concat([nl_right_comments, int_left_comments, int_right_comments], sort=False)

In [33]:
len(comments)

277401

In [34]:
comments = comments.drop_duplicates()

In [35]:
len(comments)

277353
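
Stacking frames and then calling `drop_duplicates()` only removes rows that are identical in every column. A minimal sketch on made-up data shows the difference with deduplicating on a key column; whether `comment_id` is the right key for the real data is an assumption.

```python
import pandas as pd

a = pd.DataFrame({"comment_id": [1, 2], "comment_text": ["x", "y"]})
# Same comment 2, but this frame carries an extra column
b = pd.DataFrame({"comment_id": [2, 3], "comment_text": ["y", "z"],
                  "extra": ["p", "q"]})

# Columns missing in one frame are filled with NaN in the stacked result
stacked = pd.concat([a, b], sort=False)

# Full-row comparison: comment 2 survives twice, because `extra` differs
full_rows = stacked.drop_duplicates()

# Key-based comparison: one row per comment_id (the first one is kept)
by_id = stacked.drop_duplicates(subset="comment_id")
print(len(stacked), len(full_rows), len(by_id))
```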

In [36]:
comments.to_csv(path_right + 'comments.csv')

In [39]:
comments.author_display_name.nunique()

59435