# Case Study:  Reddit Social Network Analysis Against Influence Operation
Adel Abu Hashim - Oct 2021

## Table of Contents
<ul>
<li><a href="#intro"><b><mark>Introduction<mark/><b/></a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

>This case study aims to help **Amber Heard** <br>
> 
> By analyzing new accounts posting/ commenting against a victim of a Social Bot Disinformation/Influence Operation. 
> 
> **We have three main datasets**: <br>
>(The datasets screaped from **reddit**).
> - 1- A dataset with submissions & comments data (2018).
> - 2- Users Data (from 2006 to 2018).
> - 3- A merged dataset (submissions & comments data, users data).
> - 4- Daily creation data 
> (# of accounts created per day from 2006 to 2018)

In [1]:
#import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import helpers
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objects as go
import re
import warnings
warnings.filterwarnings('ignore')
sb.set_style("darkgrid")
%matplotlib inline


In [2]:
# load data
df = pd.read_csv("cleaned_data/reddit_cleaned_2018.csv")
df_merged = pd.read_csv("cleaned_data/reddit_merged_2018.csv")

df_users = pd.read_csv("cleaned_data/users_cleaned.csv")

In [3]:
# convert to datetime
df.created_at = pd.to_datetime(df.created_at)
df_merged.created_at = pd.to_datetime(df_merged.created_at)
df_merged.user_created_at = pd.to_datetime(df_merged.user_created_at)
df_users.user_created_at = pd.to_datetime(df_users.user_created_at)

<a id='eda'></a>
## Exploratory Data Analysis
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#eda"><b><mark>Exploratory Data Analysis</mark></b></a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>


<ul>
<li><a href="#explore_reddit"><b><mark>Reddit Contributions (Comments / Submissions)</mark></b></a></li>
<li><a href="#reddit_comments">Reddit Comments</a></li>
<li><a href="#reddit_submissions">Reddit Submissions</a></li>
<li><a href="#subredits">Subredits</a></li>
<li><a href="#explore_merged">Merged Users Data with Comments/Submissions Data</a></li>
</ul>

<li><a href="#peak_days">Peak Days</a></li>

<a id='explore_reddit'></a>
> ### Reddit Contributions (Comments / Submissions)

<a id='submissions_comments'></a>
>>### The # of submissions VS the # of comments

<ul>
<li><a href="#submissions_comments"><b><mark>The # of submissions VS the # of comments</mark></b></a></li>   
<br>
<li><a href="#mostcommented_user">Most contributed user</a></li>
<li><a href="#NegativeCommented_state">Check wether the users contributing the most to comments/submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#text">Investigate the text column</a></li>
<li><a href="#text_words">Check the number of text words</a></li>
</ul>

In [197]:
df.shape[0]

6993

In [198]:
px.bar(data_frame=df['submission_comment'].value_counts().to_frame().reset_index(),
       x="index", y="submission_comment", color='submission_comment', text = 'submission_comment' ).update_layout(title='Comment or Submission',
                   xaxis_title='contribution category',
                   yaxis_title='number of contributions').update_traces(marker_color='#5296dd')

In [199]:
fig = px.pie(data_frame=df['submission_comment'].value_counts().to_frame().reset_index(), values='submission_comment', names='index',
             title='Comment or Submission')
fig.show()

<a id='mostcommented_user'></a>
>>### Most contributed user

<ul>
<li><a href="#submissions_comments">The # of submissions VS the # of comments</a></li>
<br>
<li><a href="#mostcommented_user"><b><mark>Most contributed user</mark></b></a></li>
<li><a href="#NegativeCommented_state">Check wether the users contributing the most to comments/submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#text">Investigate the text column</a></li>
<li><a href="#text_words">Check the number of text words</a></li>
</ul>

In [5]:
df.author.value_counts().to_frame().head(10).reset_index()

Unnamed: 0,index,author
0,-banned-,1666
1,Night_Chicken,135
2,AutoModerator,50
3,emilyguy,38
4,AutoNewsAdmin,35
5,ccrraapp,34
6,Rednaxela117,33
7,AutoNewspaperAdmin,33
8,ZorakLocust,27
9,tenchineuro,22


In [6]:
fig = px.bar(df.author.value_counts().to_frame().head(10).reset_index(), x="author", y="index",
             height=500,
             title='Most commented user in 2018').update_traces(marker_color='#5296dd',).update_layout(
                   xaxis_title='number of comments',
                   yaxis_title='user name').update_traces(marker_color='#5296dd')

fig.update_yaxes(autorange="reversed")    

In [7]:
df.author.value_counts().to_frame().head(10).reset_index()

Unnamed: 0,index,author
0,-banned-,1666
1,Night_Chicken,135
2,AutoModerator,50
3,emilyguy,38
4,AutoNewsAdmin,35
5,ccrraapp,34
6,Rednaxela117,33
7,AutoNewspaperAdmin,33
8,ZorakLocust,27
9,tenchineuro,22


### Further Investigate The Most Commented Users In 2018

<a id='AutoModerator'></a>
#### AutoModerator

AutoModerator is a system built into reddit that allows moderators to define "rules" (consisting of checks and actions) to be automatically applied to posts in their subreddit.

<ul>
<li><a href="#AutoModerator"><b><mark>AutoModerator</mark></b></a></li>
<li><a href="#Night_Chicken">Night_Chicken</a></li>
<li><a href="#emilyguy	"><font color=''>emilyguy</font></a></li>
<li><a href="#ccrraapp	"><font color=''>ccrraapp</font></a></li>
<li><a href="#AutoNewspaperAdmin"><font>AutoNewspaperAdmin </font></a></li>
<li><a href="#Rednaxela117"><font color=''>Rednaxela117</font></a></li>
</ul>

In [8]:
df_auto_moderator = df.query(" author == 'AutoModerator' ").reset_index(drop=True)
print(df_auto_moderator.shape)
df_auto_moderator.head(1) 

(50, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t1_dsmgq7k,/r/SubredditDrama/comments/7q5eif/rharrypotter...,"We're sorry, but accounts under 14 days old ma...",t3_7q5eif,r/SubredditDrama,AutoModerator,2018-01-13 16:30:08,Negative,Positive,1.0,submission,comment,82,rharrypotter_duel_on_the_topic_of_domestic_abuse,8,[],0


In [9]:
df_auto_moderator.subreddit.value_counts().head(10)

r/DC_Cinematic        12
r/JerkOffToCelebs      6
r/AskReddit            4
r/worldnews            4
r/Celebs               3
r/youtube              2
r/unpopularopinion     2
r/videos               2
r/politics             1
r/photoshopbattles     1
Name: subreddit, dtype: int64

In [203]:
df_auto_moderator.created_at.dt.date.value_counts()

2018-11-20    3
2018-11-19    3
2018-12-20    3
2018-11-27    2
2018-12-14    2
2018-12-21    2
2018-12-26    2
2018-01-13    1
2018-12-17    1
2018-06-03    1
2018-12-24    1
2018-12-22    1
2018-02-27    1
2018-12-03    1
2018-09-27    1
2018-10-09    1
2018-10-24    1
2018-11-22    1
2018-11-07    1
2018-10-01    1
2018-06-17    1
2018-05-28    1
2018-10-10    1
2018-12-27    1
2018-12-06    1
2018-10-04    1
2018-01-15    1
2018-06-06    1
2018-01-16    1
2018-07-26    1
2018-08-17    1
2018-07-04    1
2018-08-18    1
2018-12-29    1
2018-04-10    1
2018-07-24    1
2018-11-29    1
2018-11-26    1
2018-11-18    1
2018-12-15    1
Name: created_at, dtype: int64

In [10]:
df_auto_moderator['permalink'].iloc[26]

'/r/worldnews/comments/9ymw4g/the_real_reason_amber_heard_hesitated_to_take_her/ea2jaoy/'

In [204]:
df_auto_moderator.text.value_counts().head()

Your submission has been automatically removed for not including a valid category/subcategory tag. Tags are essential to an optimal browsing experience for our users.\n\nSince your post was removed automatically, you are free to resubmit it with an appropriate tag. You can find the tagging guide [here](/r/DC_Cinematic/wiki/linkflair#wiki_automated_tagging). Add a valid and appropriate tag in your submission title. Choose wisely, as posts with misleading tags are subject to removal.\n\n**Message the moderators if your post was removed despite being tagged with an input from the category list.**\n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/DC_Cinematic) if you have any questions or concerns.*                                                                                                     12
Your comment has been removed due to your account not meeting the required (> 1 day old) limit necessary 

<a id='Night_Chicken'></a>
#### Night_Chicken

<ul>
<li><a href="#AutoModerator">AutoModerator</a></li>
<li><a href="#Night_Chicken"><b><mark>Night_Chicken</b></mark></a></li>
<li><a href="#emilyguy	"><font color=''>emilyguy</font></a></li>
<li><a href="#ccrraapp	"><font color=''>ccrraapp</font></a></li>
<li><a href="#AutoNewspaperAdmin"><font>AutoNewspaperAdmin </font></a></li>
<li><a href="#Rednaxela117"><font color=''>Rednaxela117</font></a></li>
</ul>

In [12]:
df_night_chicken = df.query(" author == 'Night_Chicken' ").reset_index(drop=True)
print(df_night_chicken.shape)
df_night_chicken.head() 

(135, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t1_dslw4u6,/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/,What? What did she hear?,t3_7pysyp,r/Celebs,Night_Chicken,2018-01-13 04:48:41,Neutral,Neutral,-6.0,submission,comment,5,amber_heard,2,[],0
1,t1_dvufzj9,/r/Celebs/comments/844rw7/amber_heard/dvufzj9/,What? Thank you!,t1_dvn0nfy,r/Celebs,Night_Chicken,2018-03-17 13:22:00,Neutral,Positive,1.0,comment,comment,3,amber_heard,2,[],0
2,t1_dvug20l,/r/Celebs/comments/81y6lx/amber_heard/dvug20l/,What? What did she hear?,t3_81y6lx,r/Celebs,Night_Chicken,2018-03-17 13:23:50,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0
3,t1_e3uftxn,/r/Celebs/comments/94w2gm/amber_heard/e3uftxn/,What? What did she hear?,t3_94w2gm,r/Celebs,Night_Chicken,2018-08-08 20:07:11,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0
4,t1_e3ufvla,/r/Celebs/comments/94tmn2/amber_heard/e3ufvla/,What? What did she hear?,t3_94tmn2,r/Celebs,Night_Chicken,2018-08-08 20:07:50,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0


In [13]:
df_night_chicken['permalink'].iloc[0]

'/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/'

In [14]:
df_night_chicken.subreddit.value_counts().head(10)

r/Celebs    135
Name: subreddit, dtype: int64

In [205]:
df_night_chicken.submission_comment.value_counts()

comment    135
Name: submission_comment, dtype: int64

In [15]:
df_night_chicken.text.value_counts().head()

What?  What did she hear?                                                                                           121
What? What did she hear?                                                                                              4
I want to know as well.                                                                                               1
What?  Thank you!                                                                                                     1
Excellent!  Not as exciting as the crunching and jostling sounds in Elon's full pockets which escaped her grasp.      1
Name: text, dtype: int64

<a id='emilyguy'></a>
#### emilyguy
<font color=''> </font>

<ul>
<li><a href="#AutoModerator">AutoModerator</a></li>
<li><a href="#Night_Chicken">Night_Chicken</a></li>
<li><a href="#emilyguy	"><font color=''><b><mark>emilyguy</b></mark></font></a></li>
<li><a href="#ccrraapp	"><font color=''>ccrraapp</font></a></li>
<li><a href="#AutoNewspaperAdmin"><font>AutoNewspaperAdmin </font></a></li>
<li><a href="#Rednaxela117"><font color=''>Rednaxela117</font></a></li>
</ul>

In [16]:
df_emily = df.query(" author == 'emilyguy' ").reset_index(drop=True)
print(df_emily.shape)
df_emily.head() 

(38, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t3_82wa5c,/r/gentlemanboners/comments/82wa5c/amber_heard/,Amber Heard,,r/gentlemanboners,emilyguy,2018-03-08 09:30:44,Neutral,Neutral,3814.0,,submission,2,amber_heard,2,[],0
1,t3_842r8g,/r/gentlemanboners/comments/842r8g/amber_heard/,Amber Heard,,r/gentlemanboners,emilyguy,2018-03-13 09:09:02,Neutral,Neutral,442.0,,submission,2,amber_heard,2,[],0
2,t3_85j0o6,/r/gentlemanboners/comments/85j0o6/amber_heard/,Amber Heard,,r/gentlemanboners,emilyguy,2018-03-19 12:18:13,Neutral,Neutral,595.0,,submission,2,amber_heard,2,[],0
3,t3_8694b7,/r/WatchItForThePlot/comments/8694b7/amber_hea...,Amber Heard has an amazing body,,r/WatchItForThePlot,emilyguy,2018-03-22 05:16:10,Positive,Positive,3165.0,,submission,6,amber_heard_has_an_amazing_body,6,[],0
4,t3_8694cl,/r/celebsnaked/comments/8694cl/amber_heard/,Amber Heard,,r/celebsnaked,emilyguy,2018-03-22 05:16:21,Neutral,Neutral,316.0,,submission,2,amber_heard,2,[],0


In [17]:
df_emily.submission_comment.value_counts()

submission    34
comment        4
Name: submission_comment, dtype: int64

In [18]:
df_emily['permalink'].iloc[0]

'/r/gentlemanboners/comments/82wa5c/amber_heard/'

In [19]:
df_emily.subreddit.value_counts().head(10)

r/gentlemanboners      19
r/Celebs                9
r/celebsnaked           4
r/celebnsfw             3
r/WatchItForThePlot     2
r/CaraDelevingne        1
Name: subreddit, dtype: int64

In [20]:
df_emily.text.value_counts().head(10)

Amber Heard                        31
The Informers (2008)                2
with Amber Heard                    1
The Informers                       1
The Playboy Club (2011)             1
Amber Heard and Jessica Alba        1
Amber Heard has an amazing body     1
Name: text, dtype: int64

<a id='ccrraapp'></a>
#### ccrraapp
<font color=''></font>

<ul>
<li><a href="#AutoModerator">AutoModerator</a></li>
<li><a href="#Night_Chicken">Night_Chicken</a></li>
<li><a href="#emilyguy	"><font color=''>emilyguy</font></a></li>
<li><a href="#ccrraapp	"><font color=''><b><mark>ccrraapp</b></mark></font></a></li>
<li><a href="#AutoNewspaperAdmin"><font>AutoNewspaperAdmin </font></a></li>
<li><a href="#Rednaxela117"><font color=''>Rednaxela117</font></a></li>
</ul>

In [21]:
df_users[df_users.user_name == 'ccrraapp']

Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
5162,ccrraapp,True,True,False,False,34499.0,114824.0,2012-03-11 19:58:27,others,others


In [22]:
df_crap = df.query(" author == 'ccrraapp' ").reset_index(drop=True)
print(df_crap.shape)
df_crap.head() 

(34, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t1_e0pfa56,/r/DCEUboners/comments/8r6fjx/amber_heard/e0pf...,Oh wow!,t3_8r6fjx,r/DCEUboners,ccrraapp,2018-06-15 07:40:04,Positive,Positive,2.0,submission,comment,2,amber_heard,2,[],0
1,t1_e0pfbb5,/r/DCEUboners/comments/8r6bq6/nicole_kidman_am...,Nicole Kidman looks stunning. At this point on...,t3_8r6bq6,r/DCEUboners,ccrraapp,2018-06-15 07:41:12,Positive,Positive,1.0,submission,comment,15,nicole_kidman_amber_heard,4,[],0
2,t1_e0x3uz0,/r/DCEUboners/comments/8s4e6n/amber_heard/e0x3...,Her dress is so perfectly wrapping her.,t3_8s4e6n,r/DCEUboners,ccrraapp,2018-06-19 09:09:46,Positive,Positive,1.0,submission,comment,7,amber_heard,2,[],0
3,t1_e2pq776,/r/DCEUboners/comments/90dxcj/amber_heard/e2pq...,Did she lose a lot of weight for this role? T...,t3_90dxcj,r/DCEUboners,ccrraapp,2018-07-20 08:23:49,Positive,Neutral,2.0,submission,comment,15,amber_heard,2,[],0
4,t1_e2psowf,/r/DCEUboners/comments/90dxcj/amber_heard/e2ps...,Yes but she looks slim than before.,t1_e2prpct,r/DCEUboners,ccrraapp,2018-07-20 09:51:55,Neutral,Positive,2.0,comment,comment,7,amber_heard,2,[],0


In [23]:
df_crap.subreddit.value_counts().head(10)

r/geekboners         19
r/DCEUboners         14
r/gentlemanboners     1
Name: subreddit, dtype: int64

In [24]:
df_crap.submission_comment.value_counts()

submission    27
comment        7
Name: submission_comment, dtype: int64

In [25]:
df_crap.text.value_counts().head(10)

[Aquaman] Amber Heard                                                                                                                                    16
Amber Heard                                                                                                                                              10
Idk what to tell you but she slimmed down a bit and is more athletic shaped since she started for Aquaman. Maybe thats why you don't see the old her.     1
Did she lose a lot of weight for this role?  To fit in that suit?                                                                                         1
Oh wow!                                                                                                                                                   1
Yes but she looks slim than before.                                                                                                                       1
Her dress is so perfectly wrapping her.                         

In [26]:
df_crap_contributions = df_crap.groupby(df_crap.created_at.dt.date).size().reset_index(name='n_contributions')
df_crap_contributions.sort_values('n_contributions', ascending=False).head(10)

Unnamed: 0,created_at,n_contributions
17,2018-12-06,3
4,2018-09-10,3
0,2018-06-15,2
9,2018-11-02,2
18,2018-12-09,2
15,2018-11-28,2
14,2018-11-27,2
13,2018-11-20,2
11,2018-11-13,2
10,2018-11-08,2


<a id='AutoNewspaperAdmin'></a>
#### AutoNewspaperAdmin
<font color=''> </font>

<ul>
<li><a href="#AutoModerator">AutoModerator</a></li>
<li><a href="#Night_Chicken">Night_Chicken</a></li>
<li><a href="#emilyguy	"><font color=''>emilyguy</font></a></li>
<li><a href="#ccrraapp	"><font color=''>ccrraapp</font></a></li>
<li><a href="#AutoNewspaperAdmin"><font><b><mark>AutoNewspaperAdmin </b></mark></font></a></li>
<li><a href="#Rednaxela117"><font color=''>Rednaxela117</font></a></li>
</ul>

In [27]:
# check the date this account was creted
df_users[df_users.user_name == 'AutoNewspaperAdmin']

Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
25721,AutoNewspaperAdmin,True,True,False,False,1.0,249122.0,2016-10-29 06:43:28,others,others


In [28]:
df_auto = df.query(" author == 'AutoNewspaperAdmin' ").reset_index(drop=True)
print(df_auto.shape)
df_auto.head() 

(33, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t3_8a394f,/r/AutoNewspaper/comments/8a394f/entertainment...,[Entertainment] - Amber Heard says meeting Syr...,,r/AutoNewspaper,AutoNewspaperAdmin,2018-04-05 20:31:39,Neutral,Neutral,1.0,,submission,14,entertainment_amber_heard_says_meeting_syria,6,[],0
1,t3_8a3i6o,/r/AutoNewspaper/comments/8a3i6o/entertainment...,[Entertainment] - Amber Heard says meeting Syr...,,r/AutoNewspaper,AutoNewspaperAdmin,2018-04-05 21:01:37,Neutral,Neutral,1.0,,submission,14,entertainment_amber_heard_says_meeting_syria,6,[],0
2,t3_8jp0vd,/r/AutoNewspaper/comments/8jp0vd/lifestyle_in_...,"[Lifestyle] - In Cannes, Amber Heard chats abo...",,r/AutoNewspaper,AutoNewspaperAdmin,2018-05-15 20:47:12,Neutral,Neutral,1.0,,submission,16,lifestyle_in_cannes_amber_heard_chats_about_big,8,[],0
3,t3_8ttimo,/r/AutoNewspaper/comments/8ttimo/national_ambe...,"[National] - Amber Heard, other celebs travel ...",,r/AutoNewspaper,AutoNewspaperAdmin,2018-06-25 19:46:01,Negative,Negative,1.0,,submission,17,national_amber_heard_other_celebs_travel_to_texas,8,[],0
4,t3_8vur4r,/r/AutoNewspaper/comments/8vur4r/video_amber_h...,[Video] - Amber Heard upsets fans with offensi...,,r/AutoNewspaper,AutoNewspaperAdmin,2018-07-03 18:48:38,Neutral,Negative,1.0,,submission,11,video_amber_heard_upsets_fans_with_offensive,7,[],0


In [29]:
df_auto.subreddit.value_counts().head(10)

r/AutoNewspaper    33
Name: subreddit, dtype: int64

In [30]:
df_auto.text.value_counts().head(10)

[Entertainment] - Amber Heard says she is happy to have moved on with her life | ABC                                       2
[Entertainment] - Amber Heard: Time's Up movement 'has made incredible gains' | USA Today                                  1
[Entertainment] - Amber Heard says meeting Syria refugees left indelible mark | Miami Herald                               1
[Entertainment] - WATCH: Amber Heard dishes on Jason Momoa's pranks on 'Aquaman' set | ABC                                 1
[Entertainment] - Amber Heard: Style Diary | USA Today                                                                     1
[Entertainment] - Janelle Monae, Hillary Clinton, Amber Heard hit the 2018 Glamour Women of the Year Awards | USA Today    1
[Video] - Amber Heard talks superhero roles for women | FOX                                                                1
[Entertainment] - A couture swim cap? Amber Heard rocks an eye-popping headpiece at 'Aquaman' premiere | USA Today         1


In [31]:
df_auto = df_auto.groupby(df_auto.created_at.dt.date).size().reset_index(name='n_contributions')
df_auto

Unnamed: 0,created_at,n_contributions
0,2018-04-05,2
1,2018-05-15,1
2,2018-06-25,1
3,2018-07-03,3
4,2018-09-14,1
5,2018-10-03,2
6,2018-10-10,1
7,2018-10-22,4
8,2018-10-25,5
9,2018-10-29,1


<a id='Rednaxela117'></a>
#### Rednaxela117
<font color=''>news bot</font>

<ul>
<li><a href="#AutoModerator">AutoModerator</a></li>
<li><a href="#Night_Chicken">Night_Chicken</a></li>
<li><a href="#emilyguy	"><font color=''>emilyguy</font></a></li>
<li><a href="#ccrraapp	"><font color=''>ccrraapp</font></a></li>
<li><a href="#AutoNewspaperAdmin"><font>AutoNewspaperAdmin </font></a></li>
<li><a href="#Rednaxela117"><font color=''><b><mark>Rednaxela117</b></mark></font></a></li>
</ul>

In [32]:
df_rad = df.query(" author == 'Rednaxela117' ").reset_index(drop=True)
print(df_rad.shape)
df_rad.head() 

(33, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
0,t3_7zgadd,/r/SexyWomanOfTheDay/comments/7zgadd/amber_hea...,Amber Heard: easy on the eyes,,r/SexyWomanOfTheDay,Rednaxela117,2018-02-22 16:47:57,Positive,Positive,73.0,,submission,6,amber_heard_easy_on_the_eyes,6,[],0
1,t3_7zgcdi,/r/SexyWomanOfTheDay/comments/7zgcdi/amber_hea...,Amber Heard: interesting swimwear,,r/SexyWomanOfTheDay,Rednaxela117,2018-02-22 16:54:44,Positive,Positive,72.0,,submission,4,amber_heard_interesting_swimwear,4,[],0
2,t1_dunqfh1,/r/SexyWomanOfTheDay/comments/7zg975/amber_hea...,Apparently she's the most beautiful woman in t...,t3_7zg975,r/SexyWomanOfTheDay,Rednaxela117,2018-02-22 16:57:06,Positive,Positive,2.0,submission,comment,13,amber_heard_is_todays_sexy_woman_of_the_day,9,['https://www.maxim.com'],1
3,t3_7zgdf3,/r/SexyWomanOfTheDay/comments/7zgdf3/amber_hea...,Amber Heard: as Mera,,r/SexyWomanOfTheDay,Rednaxela117,2018-02-22 16:58:11,Neutral,Neutral,114.0,,submission,4,amber_heard_as_mera,4,[],0
4,t3_7zgh46,/r/SexyWomanOfTheDay/comments/7zgh46/amber_hea...,Amber Heard: lounging around,,r/SexyWomanOfTheDay,Rednaxela117,2018-02-22 17:11:13,Neutral,Neutral,63.0,,submission,4,amber_heard_lounging_around,4,[],0


In [33]:
df_rad.subreddit.value_counts().head(10)

r/Celebs               11
r/SexyWomanOfTheDay    10
r/gentlemanboners       8
r/PrettyGirls           2
r/goddesses             1
r/celebritylegs         1
Name: subreddit, dtype: int64

In [34]:
df_rad['permalink'].iloc[5]

'/r/SexyWomanOfTheDay/comments/7zgozg/amber_heard_legs_for_days/'

In [35]:
df_rad.text.value_counts().head(10)

Amber Heard                          18
Crazy, not dumb.                      1
Don't we all haha                     1
Amber Heard: lounging around          1
Amber Heard: easy on the eyes         1
She's as hot as she is crazy haha     1
Red heads, you gotta love them.       1
Gorgeous with red hair in Aquaman     1
Amber Heard: as Mera                  1
Amber Heard: business casual          1
Name: text, dtype: int64

In [36]:
df_rad_contributions = df_rad.groupby(df_rad.created_at.dt.date).size().reset_index(name='n_contributions')
df_rad_contributions.sort_values('n_contributions', ascending=False).head(10)

Unnamed: 0,created_at,n_contributions
0,2018-02-22,8
3,2018-05-01,8
1,2018-02-23,2
2,2018-03-28,2
4,2018-05-02,2
13,2018-12-29,2
5,2018-05-03,1
6,2018-05-25,1
7,2018-07-10,1
8,2018-07-22,1


<a id='NegativeCommented_state'></a>

>>### Check wether the users with the most contributions are mod, gold or having a verified email

<ul>
<li><a href="#submissions_comments">The # of submissions VS the # of comments</a></li>   
<br>
<li><a href="#mostcommented_user">Most contributed user</a></li>
<li><a href="#NegativeCommented_state"><b><mark>Check wether the users contributing the most <br> are mod, gold or having a verified email</mark></b></a></li>
<br>
<li><a href="#text">Investigate the text column</a></li>
<li><a href="#text_words">Check the number of text words</a></li>
</ul>

In [37]:
df.author.value_counts().nlargest(n=25)

-banned-              1666
Night_Chicken          135
AutoModerator           50
emilyguy                38
AutoNewsAdmin           35
ccrraapp                34
Rednaxela117            33
AutoNewspaperAdmin      33
ZorakLocust             27
tenchineuro             22
RuleIV                  22
jeff98379               21
vonmark955              20
InfiniTitans            20
Chronos2016             20
bundt_trundler          18
ZadocPaet               18
nobodycares65           18
Queen1110               17
MightUlt-7              16
Count_Fapula1           16
FlexOutlaw              15
sagar7854               15
NaveHarder              14
AngelaStettner69        13
Name: author, dtype: int64

In [38]:
check_list = df.author.value_counts().nlargest(n=25).index.tolist()[1:]
check_list

['Night_Chicken',
 'AutoModerator',
 'emilyguy',
 'AutoNewsAdmin',
 'ccrraapp',
 'Rednaxela117',
 'AutoNewspaperAdmin',
 'ZorakLocust',
 'tenchineuro',
 'RuleIV',
 'jeff98379',
 'vonmark955',
 'InfiniTitans',
 'Chronos2016',
 'bundt_trundler',
 'ZadocPaet',
 'nobodycares65',
 'Queen1110',
 'MightUlt-7',
 'Count_Fapula1',
 'FlexOutlaw',
 'sagar7854',
 'NaveHarder',
 'AngelaStettner69']

In [39]:
# get a data frame with the most negative-comments users
df_check = df_users[df_users['user_name'].isin(check_list)]
print(df_check.shape)
df_check.head(2)

(24, 10)


Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
1520,FlexOutlaw,True,False,False,False,5217.0,154864.0,2010-09-25 18:36:46,others,others
4432,AutoModerator,True,True,True,False,1000.0,1000.0,2012-01-05 05:24:28,others,others


In [40]:
df_check['user_name'].nunique()

24

In [41]:
def get_stats(df):
    for col in df.columns:
        if col not in ['user_name', 'user_created_at']:
            if col not in ['link_karma', 'comment_karma']:
                print('The value counts of the users with the most contributions: ' + col)
                print(df_check[col].value_counts())
                print('\n')
                
            else:
                print("The min of {}".format(col), round(df_check[col].min(),2))
                print('\n')
                print("The max of {}".format(col), round(df_check[col].max(),2))
                print('\n')
                print("The mean of {}".format(col), round(df_check[col].mean(),2))
                print('\n')
                print("The median of {}".format(col), round(df_check[col].mean(),2))
                print('\n')


In [42]:
get_stats(df_check)

The value counts of the users with the most contributions: has_verified_email
True     23
False     1
Name: has_verified_email, dtype: int64


The value counts of the users with the most contributions: is_mod
True     14
False    10
Name: is_mod, dtype: int64


The value counts of the users with the most contributions: is_gold
False    19
True      5
Name: is_gold, dtype: int64


The value counts of the users with the most contributions: is_banned
False    22
True      2
Name: is_banned, dtype: int64


The min of comment_karma -1.0


The max of comment_karma 280938.0


The mean of comment_karma 32688.95


The median of comment_karma 32688.95


The min of link_karma 860.0


The max of link_karma 2838485.0


The mean of link_karma 386394.59


The median of link_karma 386394.59


The value counts of the users with the most contributions: banned_unverified
others        21
banned         2
unverified     1
Name: banned_unverified, dtype: int64


The value counts of the users with the most 

<a id='text'></a>
>>### Investigate the text column


<ul>
<li><a href="#submissions_comments">The # of submissions VS the # of comments</a></li>   
<br>
<li><a href="#mostcommented_user">Most contributed user</a></li>
<li><a href="#NegativeCommented_state">Check wether the users contributing the most to comments/submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#text"><b><mark>Investigate the text column</mark></b></a></li>
<li><a href="#text_words">Check the number of text words</a></li>
</ul>

In [43]:
# pd.set_option('display.max_colwidth', None)
suspected_dict = {}

## Investigate The Most Negative key Words Used (from the wordcloud map)

### Lets first check for the users using the word 'f*ck'

In [44]:
df_fuc = df[df.text.str.lower().str.contains('fuck')]
print(df_fuc.shape)
df_fuc.head()

(225, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
5,t1_ds0ylr9,/r/JerkOffToCelebs/comments/7nbhsf/amber_heard...,"Tits look primed to get fucked, though yeah, I...",t3_7nbhsf,r/JerkOffToCelebs,-banned-,2018-01-01 04:44:31,Negative,Neutral,2.0,submission,comment,14,amber_heard_would_be_such_a_dirty_slut_in_bed,10,[],0
6,t1_ds1ghu7,/r/elonmusk/comments/7n76bc/amber_heard_and_el...,"He's asking for it - he's living a soap opera,...",t1_ds0339s,r/elonmusk,-banned-,2018-01-01 16:38:18,Negative,Neutral,2.0,comment,comment,81,amber_heard_and_elon_musk_spotted_vacationing_in,8,[],0
13,t1_ds2hdi8,/r/elonmusk/comments/7mbv6q/amber_heard_and_el...,She's a fucking spider.,t3_7mbv6q,r/elonmusk,BoracayBatCave,2018-01-02 05:29:24,Negative,Neutral,2.0,submission,comment,4,amber_heard_and_elon_musk_are_reportedly_back,8,[],0
21,t1_ds4unfi,/r/gentlemanboners/comments/7nolyx/amber_heard...,Damn Elon you fucked up,t3_7nolyx,r/gentlemanboners,-banned-,2018-01-03 17:01:17,Negative,Negative,1.0,submission,comment,5,amber_heard,2,[],0
35,t3_7ohkn2,/r/u_NeverOwnedAnyone/comments/7ohkn2/while_el...,While Elin Musk is riding high on his free pat...,,u/NeverOwnedAnyone,-banned-,2018-01-06 06:14:59,Positive,Positive,1.0,,submission,29,u_NeverOwnedAnyone,2,[],0


In [45]:
# get the authors of these submissions having 'fuck_amber_heard'

mask = (df['submission_text'] == 'fuck_amber_heard') & (df['submission_comment'] == 'submission')
df_sub = df[mask]
print(df_sub.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_sub.head())

(0, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count


In [46]:
# get the authors of these submissions having the same submission text

mask = (df['submission_text'].str.contains('fuck')) & (df['submission_comment'] == 'submission')
df_sub_fuc = df[mask]
print(df_sub_fuc.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_sub_fuc.head())

(15, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
107,t3_7rq9ap,/r/dirtypenpals/comments/7rq9ap/22f4a_anyone_with_a_fucked_up_mind_have_a_crush/,"22f4a - Anyone with a fucked up mind have a crush on Amber Heard, Lexi Belle, Lauren Cohan, Shay Mitchell?",,r/dirtypenpals,-banned-,2018-01-20 13:02:08,Negative,Negative,1.0,,submission,20,22f4a_anyone_with_a_fucked_up_mind_have_a_crush,10,[],0
265,t3_7wiqf8,/r/JerkOffToCelebs/comments/7wiqf8/amber_heard_deserves_a_good_rough_fuck/,Amber Heard deserves a good rough fuck.,,r/JerkOffToCelebs,arsenalmat,2018-02-10 02:40:27,Positive,Neutral,20.0,,submission,7,amber_heard_deserves_a_good_rough_fuck,7,[],0
433,t3_82lvbs,/r/gentlemanboners/comments/82lvbs/amber_heard_left_fuck_it_id_still_tap_her/,Amber heard (left): fuck it I’d still tap her!$!!,,r/gentlemanboners,-banned-,2018-03-07 05:21:31,Negative,Negative,0.0,,submission,9,amber_heard_left_fuck_it_id_still_tap_her,9,[],0
1055,t3_8cflf9,/r/JerkOffToCelebs/comments/8cflf9/amber_heard_is_ready_to_get_fucked_and_swallow_cum/,Amber Heard is ready to get fucked and swallow cum,,r/JerkOffToCelebs,-banned-,2018-04-15 14:51:48,Negative,Neutral,21.0,,submission,10,amber_heard_is_ready_to_get_fucked_and_swallow_cum,10,[],0
1769,t3_8p6csr,/r/JerkOffToCelebs/comments/8p6csr/amber_heard_is_so_fucking_hot/,Amber Heard is so fucking hot,,r/JerkOffToCelebs,arsenalmat,2018-06-07 01:42:34,Positive,Neutral,34.0,,submission,6,amber_heard_is_so_fucking_hot,6,[],0


In [47]:
df_sub_fuc_contributions = df_sub_fuc.groupby(df_sub_fuc.created_at.dt.date).size().reset_index(name='n_contributions')

fig = px.bar(df_sub_fuc_contributions,
             x='created_at', 
             y='n_contributions', title='The number of submissions with the word "F*CK" in 2018')
fig.update_traces(marker_color='red', marker_line_width=.5, opacity=1, textposition='auto') 
# , marker_line_color='#5296dd'

fig.show()


In [48]:
df_sub_fuc_contributions.sort_values('n_contributions', ascending=False)

Unnamed: 0,created_at,n_contributions
11,2018-12-28,3
9,2018-09-27,2
0,2018-01-20,1
1,2018-02-10,1
2,2018-03-07,1
3,2018-04-15,1
4,2018-06-07,1
5,2018-06-16,1
6,2018-06-28,1
7,2018-07-08,1


In [49]:
df_fuc.author.value_counts().head(10)

-banned-            65
Count_Fapula1        4
DariJC               3
TEDHARDYLEEN         2
TocTheElder          2
dirtydegrading       2
RedditZacuzzi        2
cornylamygilbert     2
chi_dist90           2
arsenalmat           2
Name: author, dtype: int64

In [50]:
df_fuc.submission_comment.value_counts()

comment       206
submission     19
Name: submission_comment, dtype: int64

In [51]:
df_fuc.subreddit.value_counts().head(10)

r/JerkOffToCelebs      54
r/gentlemanboners      18
r/WatchItForThePlot    17
r/movies               14
r/Celebs               12
r/celebJObuds          11
r/celebnsfw            10
r/MensRights           10
r/DC_Cinematic          9
r/goddesses             8
Name: subreddit, dtype: int64

In [52]:
df_fuc.created_at.dt.date.value_counts().head(10)

2018-12-19    14
2018-12-20    10
2018-05-02     5
2018-06-06     5
2018-08-15     5
2018-12-28     5
2018-08-11     5
2018-07-03     4
2018-11-19     4
2018-08-10     4
Name: created_at, dtype: int64

In [53]:
df_fuc_contributions = df_fuc.groupby(df_fuc.created_at.dt.date).size().reset_index(name='n_contributions')

fig = px.bar(df_fuc_contributions,
             x='created_at', 
             y='n_contributions', title='The number of "F*CK" contributions in 2018')
fig.update_traces(marker_color='red', marker_line_width=1, opacity=1, textposition='auto') 
# , marker_line_color='#5296dd'

fig.show()


In [54]:
df_fuc_contributions.sort_values('n_contributions', ascending=False).head(10)

Unnamed: 0,created_at,n_contributions
96,2018-12-19,14
97,2018-12-20,10
24,2018-05-02,5
61,2018-08-15,5
103,2018-12-28,5
59,2018-08-11,5
35,2018-06-06,5
88,2018-11-19,4
27,2018-05-08,4
58,2018-08-10,4


Top users who used f*ck word
- Count_Fapula1        (4)
- DariJC               (3)

#### Count_Fapula1
used the word f*ck 4 times <br>
<font color='red'>Negative Comments</font>


In [55]:
df_count = df.query(" author == 'Count_Fapula1' ")
print(df_count.shape)
df_count.head()

(16, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
5439,t3_a37wmy,/r/celebJObuds/comments/a37wmy/need_a_bud_to_r...,Need a bud to RP as Amber Heard for me,,r/celebJObuds,Count_Fapula1,2018-12-05 02:40:39,Neutral,Neutral,19.0,,submission,10,need_a_bud_to_rp_as_amber_heard_for_me,10,[],0
5440,t3_a381td,/r/JerkOffToCelebs/comments/a381td/what_id_giv...,What I'd give to have Amber Heard worshipping ...,,r/JerkOffToCelebs,Count_Fapula1,2018-12-05 02:57:38,Neutral,Neutral,52.0,,submission,10,what_id_give_to_have_amber_heard_worshipping_my,9,[],0
5463,t3_a3fhl3,/r/celebJObuds/comments/a3fhl3/still_looking_f...,Still looking for a bud to RP as Amber Heard a...,,r/celebJObuds,Count_Fapula1,2018-12-05 19:14:12,Neutral,Positive,5.0,,submission,14,still_looking_for_a_bud_to_rp_as_amber_heard_and,11,[],0
5465,t1_eb5q31k,/r/JerkOffToCelebs/comments/a381td/what_id_giv...,Right? She looks like the type that just *love...,t1_eb5pyp4,r/JerkOffToCelebs,Count_Fapula1,2018-12-05 19:15:59,Positive,Neutral,4.0,comment,comment,11,what_id_give_to_have_amber_heard_worshipping_my,9,[],0
5796,t1_ebqzlr1,/r/JerkOffToCelebs/comments/a5twvl/amber_heard...,Right? Sometimes I browse r/Amber_Heard and I'...,t3_a5twvl,r/JerkOffToCelebs,Count_Fapula1,2018-12-14 04:57:01,Positive,Neutral,3.0,submission,comment,11,amber_heard_is_so_damn_hot,6,[],0


In [56]:
df_count.created_at.dt.date.value_counts()

2018-12-05    4
2018-12-20    4
2018-12-26    2
2018-12-18    2
2018-12-23    1
2018-12-14    1
2018-12-29    1
2018-12-24    1
Name: created_at, dtype: int64

In [57]:
df_count.text.value_counts().head(3)

Amber, because I’d rather fuck Scarlett’s sweet pussy.               1
I can’t wait to see it. God, she looks so good with that red hair    1
Still looking for a bud to RP as Amber Heard and help me cum         1
Name: text, dtype: int64

In [58]:
df_count[df_count.author == 'Count_Fapula1'].submission_comment.value_counts()

comment       13
submission     3
Name: submission_comment, dtype: int64

In [59]:
df_count[df_count.author == 'Count_Fapula1'].text.value_counts()

Amber, because I’d rather fuck Scarlett’s sweet pussy.                                                  1
I can’t wait to see it. God, she looks so good with that red hair                                       1
Still looking for a bud to RP as Amber Heard and help me cum                                            1
Fuck, she's gorgeous.                                                                                   1
Oh geez... I need to see this fucking movie.                                                            1
Right? Sometimes I browse r/Amber_Heard and I'm totally smitten. Jesus Christ...                        1
God, she looks so sexy in this suit...                                                                  1
Is she on top of me like she is in the gif? Because if that's the case, we're going full on cowgirl.    1
What I'd give to have Amber Heard worshipping my cock                                                   1
Need a bud to RP as Amber Heard for me        

#### DariJC
used the word f*ck 3 times <br>
<font color='red'>Negative Comments</font>


In [60]:
df_dari = df.query(" author == 'DariJC' ")
print(df_dari.shape)
df_dari.head()

(11, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
104,t1_dsygyxg,/r/JerkOffToCelebs/comments/7roj2h/amber_heard...,She would be a really fun fuck...,t3_7roj2h,r/JerkOffToCelebs,DariJC,2018-01-20 05:38:22,Negative,Positive,13.0,submission,comment,7,amber_heardtight_and_wet,4,[],0
3888,t1_e4xbxip,/r/JerkOffToCelebs/comments/9aqp5j/amber_heard...,Giving her a rough fuck would be incredibly fu...,t3_9aqp5j,r/JerkOffToCelebs,DariJC,2018-08-27 17:03:46,Negative,Neutral,2.0,submission,comment,35,amber_heard,2,[],0
4011,t1_e52f18p,/r/JerkOffToCelebs/comments/9be7uy/amber_heard...,Needa tear off those panties and take her righ...,t3_9be7uy,r/JerkOffToCelebs,DariJC,2018-08-29 23:42:09,Positive,Neutral,2.0,submission,comment,10,amber_heard_is_freshening_up_before_the_next_r...,9,[],0
4145,t1_e6rez2z,/r/celebJObuds/comments/9jh06j/id_fuck_the_shi...,Lightly push her forward and lift that dress u...,t3_9jh06j,r/celebJObuds,DariJC,2018-09-27 22:06:24,Positive,Neutral,3.0,submission,comment,12,id_fuck_the_shit_out_of_amber_heard_in_that,10,[],0
4148,t1_e6rf3qo,/r/celebJObuds/comments/9jh06j/id_fuck_the_shi...,She def looks like she’s into the rough stuff,t1_e6rf0x6,r/celebJObuds,DariJC,2018-09-27 22:08:17,Negative,Positive,5.0,comment,comment,9,id_fuck_the_shit_out_of_amber_heard_in_that,10,[],0


In [61]:
df_dari.created_at.dt.date.value_counts()

2018-11-19    2
2018-09-27    2
2018-11-04    2
2018-08-27    1
2018-11-14    1
2018-12-31    1
2018-08-29    1
2018-01-20    1
Name: created_at, dtype: int64

In [62]:
df_dari.text.value_counts().head(3)

Unzip her and have her bend over slightly, bracing against the window.  Put on a show for everyone going by...                                                                                                                   2
She def looks like she’s into the rough stuff                                                                                                                                                                                    1
Giving her a rough fuck would be incredibly fun.  Doggystyle, smacking her ass making her moan and scream.  Seeing her lay on her stomach, breathing hard recovering afterwards, her blonde hair messily covering her face...    1
Name: text, dtype: int64

In [63]:
df_dari[df_dari.author == 'DariJC'].submission_comment.value_counts()

comment    11
Name: submission_comment, dtype: int64

In [64]:
df_dari[df_dari.author == 'DariJC'].text.value_counts()

Unzip her and have her bend over slightly, bracing against the window.  Put on a show for everyone going by...                                                                                                                   2
She def looks like she’s into the rough stuff                                                                                                                                                                                    1
Giving her a rough fuck would be incredibly fun.  Doggystyle, smacking her ass making her moan and scream.  Seeing her lay on her stomach, breathing hard recovering afterwards, her blonde hair messily covering her face...    1
Needa tear off those panties and take her right there                                                                                                                                                                            1
She’d be an incredible fuck, especially looking like she does in this pic                   

## Check for the repeated text

In [65]:
df.text.value_counts().head(25)

Amber Heard                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             

**Notes:**<br>
- "**Amber Heard**" only repeated more than 1000 times as a full contribution.
- A lot of contrbutions were deleted or removed.
- **What?  What did she hear?**  repeaated 121 times we will also find **What did she hear?**   .
- **[Aquaman] Amber Heard**    repeated 47 times
- **Amber Heard - The Informers** repeated 20 times.

There are a lot of weired textes repeated a lot like:

- "**Well I'm going to start doing preliminary emails to voice my displeasure that a DV perp is even coming back to make more millions.\n\nAnd I won't be watching that shit or buying it either\n\nEdit: ok so this is who I emailed\n\njessica.zacholl@warnerbros.com\n\nShe's in WB's Media and Press Dept.\n\nAnd this is what I said \n\nSubject: Amber Heard in Aquaman 2, why?\n\nJust curious why a studio such as WB would employ a domestic violence perpetrator for their big release? I've been told Hollywood is progressive, but apparently you can abuse your spouse and that's ok? Well I will not be seeing this movie. I get so tired of the talking about how domestic violence is bad, but it seems all of the movie industry just sits silent. #MeToo indeed."**

<a id='What? What did she hear?'></a>
### What? What did she hear?


<ul>
<li><a href="#What? What did she hear?">What? What did she hear?</a></li>
<li><a href="#[Aquaman] Amber Heard">[Aquaman] Amber Heard</a></li>
<li><a href="#Amber Heard - The Informers">Amber Heard - The Informers</a></li>
<li><a href="#Amber Heard">Amber Heard</a></li>    
<li><a href="#fuck"><b><mark>F*ck amber heard</mark></b></a></li>
</ul>

In [66]:
text = "what did she hear"

df_hear = df[df.text.str.lower().str.contains(text)]

print(df_hear.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_hear.head())

(148, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
68,t1_dslw4u6,/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/,What? What did she hear?,t3_7pysyp,r/Celebs,Night_Chicken,2018-01-13 04:48:41,Neutral,Neutral,-6.0,submission,comment,5,amber_heard,2,[],0
119,t1_dt561qg,/r/goddesses/comments/7sgl66/amber_heard/dt561qg/,Well..? What did she hear?!,t3_7sgl66,r/goddesses,HEYL1STEN,2018-01-24 00:46:00,Neutral,Neutral,14.0,submission,comment,5,amber_heard,2,[],0
357,t1_duozpap,/r/WtSSTaDaMiT/comments/7zjjab/amber_heard/duozpap/,What did she hear?,t3_7zjjab,r/WtSSTaDaMiT,PigeonMan45,2018-02-23 05:39:55,Neutral,Neutral,1.0,submission,comment,4,amber_heard,2,[],0
369,t1_dus4aaq,/r/BeautifulFemales/comments/7zzek8/amber_heard/dus4aaq/,What did she hear?,t3_7zzek8,r/BeautifulFemales,predictablePosts,2018-02-25 00:27:04,Neutral,Neutral,1.0,submission,comment,4,amber_heard,2,[],0
381,t1_duwuoob,/r/DCEUboners/comments/80f3q9/amber_heard/duwuoob/,What did she hear?,t3_80f3q9,r/DCEUboners,iplayinv3rtd,2018-02-27 16:41:33,Neutral,Neutral,1.0,submission,comment,4,amber_heard,2,[],0


In [67]:
df_hear.subreddit.value_counts()

r/Celebs              131
r/goddesses             6
r/celebnsfw             2
r/sketches              1
r/CelebrityArmpits      1
r/WtSSTaDaMiT           1
r/CelebrityButts        1
r/Celebhub              1
r/BeautifulFemales      1
r/DCEUboners            1
r/PrettyGirls           1
r/CelebrityFeet         1
Name: subreddit, dtype: int64

In [68]:
df_hear.author.value_counts()

Night_Chicken       128
MyKey18               1
murph420              1
VVombatCombat         1
brownsatin            1
christpie             1
Matty_tt              1
PigeonMan45           1
omre16                1
iplayinv3rtd          1
HollowedHunter        1
no_di                 1
Augustus420           1
cynicaldotes          1
moemoolah37           1
Sonmeisterbank        1
HEYL1STEN             1
ArtemisSkrivey        1
ImJumentous           1
predictablePosts      1
empifer               1
Name: author, dtype: int64

In [69]:
df_hear[df_hear.author == 'Night_Chicken'].submission_comment.value_counts()

comment    128
Name: submission_comment, dtype: int64

In [70]:
df_hear[df_hear.author == 'Night_Chicken'].subreddit.value_counts()

r/Celebs    128
Name: subreddit, dtype: int64

128 contribution from 148 from only one user **Night_Chicken**, all of them are comments, in one subreddit **r/Celebs**

In [71]:
df_hear_contributions = df_hear.groupby(df_hear.created_at.dt.date).size().reset_index(name='n_contributions')


fig = px.bar(df_hear_contributions,
             x='created_at', 
             y='n_contributions', title='The number of "What? What did she hear?" contributions in 2018')
fig.update_traces(marker_color='red', marker_line_width=1, opacity=1, textposition='auto') 
# , marker_line_color='#5296dd'

fig.show()


In [72]:
df_hear_contributions.sort_values('n_contributions', ascending=False).head(5)

Unnamed: 0,created_at,n_contributions
17,2018-08-29,103
22,2018-09-24,4
28,2018-10-14,4
6,2018-03-13,2
15,2018-08-08,2


In [73]:
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].author.value_counts()

Night_Chicken    103
Name: author, dtype: int64

In [74]:
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].submission_comment.value_counts()

comment    103
Name: submission_comment, dtype: int64

In [75]:
df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].subreddit.value_counts()

r/Celebs    103
Name: subreddit, dtype: int64

In [76]:
max_t = df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].created_at.dt.time.max() 
max_t

datetime.time(4, 8, 48)

In [77]:
min_t = df_hear[df_hear.created_at.dt.date.astype(str) == '2018-08-29'].created_at.dt.time.min()
min_t

datetime.time(3, 41, 58)

In [78]:
from datetime import datetime, date

datetime.combine(date.today(), max_t) - datetime.combine(date.today(), min_t)

datetime.timedelta(seconds=1610)

In [79]:
1610 / 60

26.833333333333332

**103 contributions**, all of them are **comments**, in with the same text in one day **29-08-2018** by one user **Night_Chicken** in one subreddit **r/Celebs** in only **27 minutes**.

<a id='[Aquaman] Amber Heard'></a>
### [Aquaman] Amber Heard


<ul>
<li><a href="#What? What did she hear?">What? What did she hear?</a></li>
<li><a href="#[Aquaman] Amber Heard">[Aquaman] Amber Heard</a></li>
<li><a href="#Amber Heard - The Informers">Amber Heard - The Informers</a></li>
<li><a href="#Amber Heard">Amber Heard</a></li>   
<li><a href="#fuck">F*ck amber heard</a></li>
</ul>

In [80]:
text = "[aquaman] amber heard"

df_aqua = df[df.text.str.lower().str.contains(text)]

print(df_aqua.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_aqua.head())

(30, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
107,t3_7rq9ap,/r/dirtypenpals/comments/7rq9ap/22f4a_anyone_with_a_fucked_up_mind_have_a_crush/,"22f4a - Anyone with a fucked up mind have a crush on Amber Heard, Lexi Belle, Lauren Cohan, Shay Mitchell?",,r/dirtypenpals,-banned-,2018-01-20 13:02:08,Negative,Negative,1.0,,submission,20,22f4a_anyone_with_a_fucked_up_mind_have_a_crush,10,[],0
153,t3_7ur89i,/r/REPORT_ITALIANO/comments/7ur89i/il_nerd_elon_musk_e_la_modella_amber_heard_si/,Il nerd Elon Musk e la modella Amber Heard si sono lasciati (ancora),,r/REPORT_ITALIANO,report_italiano,2018-02-02 13:19:18,Neutral,Negative,1.0,,submission,13,REPORT_ITALIANO,2,[],0
229,t1_dtuxiz0,/r/moviescirclejerk/comments/7vt6wh/amber_heard/dtuxiz0/,">I’m not referring to Amber Heard not having the acting chops to play Mera (although she’s not exactly the most memorable actress out there), I’m more so referring to the fact that I honestly don’t think Heard is a very good person. It’s pretty ironic to me that people are outraged over Johnny Depp playing Grindelwald, when I honestly think that if any actor in a 2018 WB blockbuster deserves outrage, it’s Amber Heard.\n\n>I’m going to go ahead and say that I’m convinced at this point that she has been lying about being abused by Johnny Depp. I know that not believing the accuser often comes across as a bad thing to do, but there is evidence to suggest that she has been trying to frame Depp. Not only did the police apparently find zero evidence to suggest that Depp assaulted her, but the supposed video that Heard presented to “prove” that she was assaulted didn’t even show him doing anything of the sort. I’ve watched the video several times, and while there’s no denying that Depp is angry in it, none of it looks like it’s directed towards Heard. At no point in the video is he shown being physically or verbally abusive to her, nor does she seem like she’s scared of him in the slightest. This would of course explain why Heard hasn’t presented that “evidence” to court.\n\n>Furthermore, Heard demanded money from Depp from this whole ordeal. I know that people respond to abuse differently, but if she is truly someone who wanted out of an abusive relationship because she feared for her life, why exactly was her first instinct to try and get a piece of Johnny Depp’s money? This is not even getting into the fact that between the two of them, the only one who’s actually been arrested for domestic violence has been Amber Heard herself.\n\n>Frankly, I think that Amber Heard is a very shady, self-centered, and possibly narcissistic individual who only married Johnny Depp because she wanted his money, and wanted to enjoy the publicity of being someone who was “abused” by an A-lister.\n\n>With all this in mind, I really don’t like that she’s going to be a major character in Aquaman. I’ll still see the movie regardless, but I really don’t like that it has Amber Heard playing the protagonist’s love interest.",t3_7vt6wh,r/moviescirclejerk,Baramos_,2018-02-07 03:23:41,Positive,Neutral,9.0,submission,comment,394,amber_heard,2,[],0
247,t1_dtwk5f6,/r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/dtwk5f6/,"It kind of seemed like he was minding his own business at first. Also, according to reports, Heard was apparently “egging him on” and heavily edited the video: http://www.tmz.com/2016/08/12/johnny-depp-amber-heard-throws-wine-glass-domestic-violence-video/. \n\nAlso, need I once again point out that between the two of them, the only one who’s been arrested for domestic violence has been Amber Heard herself?",t1_dtwi26a,r/DC_Cinematic,ZorakLocust,2018-02-07 23:44:55,Positive,Negative,2.0,comment,comment,56,DC_Cinematic,2,['http://www.tmz.com'],1
253,t1_dtwzyeo,/r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/dtwzyeo/,Ben Affleck’s received plenty of backlash for his actions. I don’t see much of anyone bringing up what a lousy person Amber Heard is though.,t1_dtwut2v,r/DC_Cinematic,ZorakLocust,2018-02-08 04:39:04,Negative,Negative,2.0,comment,comment,25,DC_Cinematic,2,[],0


In [81]:
df_aqua.subreddit.value_counts()

r/DC_Cinematic        5
r/news                3
r/newstweetfeed       2
r/worldnews           2
r/Elon_musketeers     2
r/movies              2
r/AutoNewspaper       1
r/Amber_Heard         1
r/dirtypenpals        1
r/EnoughMuskSpam      1
r/morganwade          1
r/pickoneceleb        1
r/goddesses           1
r/FOXauto             1
r/removalbot          1
r/Spillthetea         1
r/entertainment       1
r/TwoXChromosomes     1
r/REPORT_ITALIANO     1
r/moviescirclejerk    1
Name: subreddit, dtype: int64

In [82]:
df_aqua.author.value_counts()

-banned-               5
BSRussell              2
ZorakLocust            2
jeff98379              2
iDevice_Help           2
trendynewsupdate       1
PHANTOMCREEPER         1
report_italiano        1
OwlWayneOwlwards       1
illegitimatemexican    1
Chronos2016            1
KrishAndChips          1
Nuggetry               1
Baramos_               1
BrkntKlc               1
morganwade             1
removalbot             1
viralreportnow         1
AutoNewspaperAdmin     1
AutoNewsAdmin          1
nomnomnomhangry        1
worldwide__master      1
Name: author, dtype: int64

<a id='Amber Heard - The Informers'></a>
### Amber Heard - The Informers


<ul>
<li><a href="#What? What did she hear?">What? What did she hear?</a></li>
<li><a href="#[Aquaman] Amber Heard">[Aquaman] Amber Heard</a></li>
<li><a href="#Amber Heard - The Informers">Amber Heard - The Informers</a></li>
<li><a href="#Amber Heard">Amber Heard</a></li>   
<li><a href="#fuck">F*ck amber heard</a></li>
</ul>

In [83]:
text = "Amber Heard - The Informers".lower()

df_inf = df[df.text.str.lower().str.contains(text)]

print(df_inf.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_inf.head())

(27, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
450,t3_82xah3,/r/WatchItForThePlot/comments/82xah3/amber_heard_the_informers_2008/,Amber Heard - The Informers (2008),,r/WatchItForThePlot,-banned-,2018-03-08 12:58:59,Neutral,Neutral,264.0,,submission,6,amber_heard_the_informers_2008,5,[],0
1264,t3_8hx1bz,/r/WatchItForThePlot/comments/8hx1bz/amber_heard_the_informers/,Amber Heard - The Informers,,r/WatchItForThePlot,-banned-,2018-05-08 14:18:26,Neutral,Neutral,3749.0,,submission,5,amber_heard_the_informers,4,[],0
1282,t3_8hzcyg,/r/CelebsGW/comments/8hzcyg/amber_heard_the_informers/,Amber Heard - The Informers,,r/CelebsGW,CelebsGW,2018-05-08 19:06:20,Neutral,Neutral,91.0,,submission,5,amber_heard_the_informers,4,[],0
2225,t3_8vslyy,/r/WatchItForThePlot/comments/8vslyy/amber_heard_the_informers_2008/,Amber Heard - The Informers (2008),,r/WatchItForThePlot,-banned-,2018-07-03 14:26:24,Neutral,Neutral,279.0,,submission,6,amber_heard_the_informers_2008,5,[],0
2825,t3_91crfc,/r/WatchItForThePlot/comments/91crfc/amber_heard_the_informers/,Amber Heard - The Informers,,r/WatchItForThePlot,Ezio9619,2018-07-24 00:51:55,Neutral,Neutral,459.0,,submission,5,amber_heard_the_informers,4,[],0


In [84]:
df_inf.subreddit.value_counts()

r/WatchItForThePlot    8
r/celebnsfw            3
r/celebsnaked          3
r/CelebSexScenes       2
r/Celebsnudess         2
r/Amber_Heard          2
r/PopCultureGifs       2
r/CelebsGW             1
r/nsfwcelebgifs        1
r/adultgifs            1
u/pccpux               1
r/Celebhub             1
Name: subreddit, dtype: int64

In [85]:
df_inf.author.value_counts()

-banned-      9
Ezio9619      8
vonmark955    4
vonjobi951    2
GRJR721       2
CelebsGW      1
vonjobi956    1
Name: author, dtype: int64

In [86]:
df_inf[df_inf.author == 'Ezio9619'].subreddit.value_counts()

r/celebsnaked          3
r/WatchItForThePlot    3
r/celebnsfw            1
r/Celebhub             1
Name: subreddit, dtype: int64

In [87]:
df_inf[df_inf.author == 'Ezio9619'].created_at.dt.date.value_counts()

2018-07-24    4
2018-08-14    4
Name: created_at, dtype: int64

<a id='Amber Heard'></a>
### Amber Heard


<ul>
<li><a href="#What? What did she hear?">What? What did she hear?</a></li>
<li><a href="#[Aquaman] Amber Heard">[Aquaman] Amber Heard</a></li>
<li><a href="#Amber Heard - The Informers">Amber Heard - The Informers</a></li>
<li><a href="#Amber Heard">Amber Heard</a></li>   
<li><a href="#fuck">F*ck amber heard</a></li>
</ul>

In [88]:
text = "Amber Heard".lower()

df_ah = df[df.text.str.lower().str.contains(text)]

print(df_ah.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_ah.head())

(2213, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
10,t3_7nkbt3,/r/gentlemanboners/comments/7nkbt3/amber_heard/,Amber Heard,,r/gentlemanboners,ZadocPaet,2018-01-02 05:07:43,Neutral,Neutral,5.0,,submission,2,amber_heard,2,[],0
11,t3_7nkbua,/r/DCEUboners/comments/7nkbua/amber_heard/,Amber Heard,,r/DCEUboners,ZadocPaet,2018-01-02 05:07:55,Neutral,Neutral,45.0,,submission,2,amber_heard,2,[],0
16,t3_7nolyx,/r/gentlemanboners/comments/7nolyx/amber_heard/,Amber Heard,,r/gentlemanboners,-banned-,2018-01-02 19:17:12,Neutral,Neutral,330.0,,submission,2,amber_heard,2,[],0
20,t3_7nuh77,/r/gentlemanboners/comments/7nuh77/lovely_duo_amber_heard_jessica_alba/,Lovely Duo. Amber Heard & Jessica Alba,,r/gentlemanboners,-banned-,2018-01-03 12:58:23,Positive,Positive,570.0,,submission,7,lovely_duo_amber_heard_jessica_alba,6,[],0
24,t3_7nxwrq,/r/gentlemanboners/comments/7nxwrq/amber_heard/,Amber Heard,,r/gentlemanboners,-banned-,2018-01-03 21:34:14,Neutral,Neutral,2.0,,submission,2,amber_heard,2,[],0


In [89]:
df_ah.subreddit.value_counts()

r/Celebs                 350
r/gentlemanboners        296
r/DCEUboners              87
r/DC_Cinematic            85
r/goddesses               79
                        ... 
r/jessicaalba              1
r/OnOff                    1
r/PickHerOutfit            1
r/JerkOffToDesiCelebs      1
r/celebdominatrixs         1
Name: subreddit, Length: 301, dtype: int64

In [90]:
df_ah.author.value_counts()

-banned-              829
AutoNewsAdmin          35
emilyguy               34
AutoNewspaperAdmin     33
ccrraapp               27
                     ... 
sonofcross              1
filmitalkies            1
JoLuGaLo                1
s0mnambulance           1
Clip_Dirtblade          1
Name: author, Length: 605, dtype: int64

In [91]:
df_ah[df_ah.author == 'AutoNewsAdmin'].subreddit.value_counts()

r/USATODAYauto        9
r/REUTERSauto         4
r/ABCauto             4
r/FOXauto             3
r/LATIMESauto         2
r/MIAMIHERALDauto     2
r/RTauto              1
r/SCMPauto            1
r/WAPOauto            1
r/NYTauto             1
r/INDEPENDENTauto     1
r/NEWSDAYauto         1
r/TWTauto             1
r/CBSauto             1
r/HOUSTONCHRONauto    1
r/BBCauto             1
r/NZHauto             1
Name: subreddit, dtype: int64

In [92]:
df_ah[df_ah.author == 'AutoNewsAdmin'].created_at.dt.date.value_counts()

2018-10-25    5
2018-10-22    4
2018-07-03    3
2018-12-06    2
2018-10-03    2
2018-04-05    2
2018-12-05    2
2018-12-19    2
2018-11-26    2
2018-06-25    1
2018-10-29    1
2018-09-14    1
2018-10-10    1
2018-12-03    1
2018-10-04    1
2018-11-27    1
2018-05-15    1
2018-12-20    1
2018-11-13    1
2018-12-12    1
Name: created_at, dtype: int64

In [93]:
df_ah[df_ah.author == 'emilyguy'].subreddit.value_counts()

r/gentlemanboners      19
r/Celebs                7
r/celebsnaked           4
r/celebnsfw             2
r/WatchItForThePlot     1
r/CaraDelevingne        1
Name: subreddit, dtype: int64

In [94]:
df_ah[df_ah.author == 'emilyguy'].created_at.dt.date.value_counts()

2018-03-22    3
2018-05-05    3
2018-10-05    2
2018-11-15    1
2018-11-01    1
2018-03-08    1
2018-06-11    1
2018-09-25    1
2018-12-22    1
2018-03-13    1
2018-05-16    1
2018-12-16    1
2018-05-01    1
2018-09-17    1
2018-05-25    1
2018-12-29    1
2018-08-26    1
2018-03-19    1
2018-07-04    1
2018-07-05    1
2018-03-24    1
2018-08-18    1
2018-07-23    1
2018-07-03    1
2018-06-28    1
2018-07-12    1
2018-04-13    1
2018-06-19    1
2018-10-17    1
Name: created_at, dtype: int64

<a id='Amber Heard'></a>
### Amber Heard


<ul>
<li><a href="#What? What did she hear?">What? What did she hear?</a></li>
<li><a href="#[Aquaman] Amber Heard">[Aquaman] Amber Heard</a></li>
<li><a href="#Amber Heard - The Informers">Amber Heard - The Informers</a></li>
<li><a href="#Amber Heard">Amber Heard</a></li>   
<li><a href="#fuck">F*ck amber heard</a></li>
</ul>

In [95]:
text = "fuck amber heard"

df_fuc2 = df[df.text.str.lower().str.contains(text)]

print(df_fuc2.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_fuc2.head())

(4, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
1726,t1_e0830mk,/r/movies/comments/8ozhus/first_poster_london_fields_amber_heard_billy_bob/e0830mk/,"Fuck Amber Heard, she's a lier and a gold digger.",t3_8ozhus,r/movies,-banned-,2018-06-06 18:59:57,Negative,Negative,-6.0,submission,comment,10,first_poster_london_fields_amber_heard_billy_bob,8,[],0
1776,t1_e091jem,/r/movies/comments/8p6bah/london_fields_official_trailer_2018_amber_heard/e091jem/,Fuck Amber Heard!,t3_8p6bah,r/movies,-banned-,2018-06-07 04:15:22,Negative,Negative,1.0,submission,comment,3,london_fields_official_trailer_2018_amber_heard,7,[],0
4287,t1_e79113k,/r/goddesses/comments/9ljspc/amber_heard/e79113k/,Fuck Amber Heard,t3_9ljspc,r/goddesses,-banned-,2018-10-06 01:25:58,Negative,Negative,3.0,submission,comment,3,amber_heard,2,[],0
4293,t1_e7dgj5g,/r/goddesses/comments/9maij0/amber_heard/e7dgj5g/,Fuck Amber Heard,t3_9maij0,r/goddesses,-banned-,2018-10-08 03:33:40,Negative,Negative,0.0,submission,comment,3,amber_heard,2,[],0


In [96]:
df_fuc2.author.value_counts()

-banned-    4
Name: author, dtype: int64

In [97]:
df_fuc2.subreddit.value_counts()

r/goddesses    2
r/movies       2
Name: subreddit, dtype: int64

In [98]:
df_fuc2.submission_comment.value_counts()

comment    4
Name: submission_comment, dtype: int64

### Further check Lecrapface contributions on 2018

In [99]:
df_nc = df.query(" author == 'Night_Chicken'")
df_nc = df_nc.sort_values('created_at')

print(df_nc.shape)
df_nc.head(2) 

(135, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
68,t1_dslw4u6,/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/,What? What did she hear?,t3_7pysyp,r/Celebs,Night_Chicken,2018-01-13 04:48:41,Neutral,Neutral,-6.0,submission,comment,5,amber_heard,2,[],0
659,t1_dvufzj9,/r/Celebs/comments/844rw7/amber_heard/dvufzj9/,What? Thank you!,t1_dvn0nfy,r/Celebs,Night_Chicken,2018-03-17 13:22:00,Neutral,Positive,1.0,comment,comment,3,amber_heard,2,[],0


In [100]:
df_nc.text.value_counts()

What?  What did she hear?                                                                                           121
What? What did she hear?                                                                                              4
I want to know as well.                                                                                               1
What?  Thank you!                                                                                                     1
Excellent!  Not as exciting as the crunching and jostling sounds in Elon's full pockets which escaped her grasp.      1
Exactly.\n\n&#x200B;                                                                                                  1
What?   What did she hear?                                                                                            1
What?  What did she hear?                                                                                             1
Good question.\n\n&#x200B;              

<a id='text_words'></a>
>>### Check the number of text words <br>
Of course few words are easier for bots to create

<ul>
<li><a href="#submissions_comments">The # of submissions VS the # of comments</a></li>   
<br>
<li><a href="#mostcommented_user">Most contributed user</a></li>
<li><a href="#NegativeCommented_state">Check wether the users contributing the most to comments/submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#text">Investigate the text column</a></li>
<li><a href="#text_words"><b><mark>Check the number of text words</mark></b></a></li>
</ul>

In [101]:
df['text_words'].value_counts().head(10);

In [102]:
px.histogram(df['text_words'].to_frame(), x="text_words",title='number of words in each contribution',
            nbins=200).update_traces(marker_color='#5296dd')


<ul>
<li><a href="#explore_reddit">Reddit Contributions (Comments / Submissions)</a></li>
<li><a href="#reddit_comments"><b><mark>Reddit Comments</mark></b></a></li>
<li><a href="#reddit_submissions">Reddit Submissions</a></li>
<li><a href="#subredits">Subredits</a></li>
<li><a href="#explore_merged">Merged Users Data with Comments/Submissions Data</a></li>
</ul>

<li><a href="#peak_days">Peak Days</a></li>

<a id='reddit_comments'></a>
> ### Reddit Comments

<a id='parent_comments'></a>
>>### The number of parent comments on submissions

In [103]:
px.bar(data_frame=df['top_level'].value_counts().to_frame().reset_index(),
       x="index", y="top_level").update_layout(title='Comment or Submission (Top Level of Contrbution "Parent")',
                   xaxis_title='contribution top level (parent) category',
                   yaxis_title='number of contributions').update_traces(marker_color='#5296dd')

<ul>
<li><a href="#explore_reddit">Reddit Contributions (Comments / Submissions)</a></li>
<li><a href="#reddit_comments">Reddit Comments</a></li>
<li><a href="#reddit_submissions"><b><mark>Reddit Submissions</mark></b></a></li>
<li><a href="#subredits">Subredits</a></li>
<li><a href="#explore_merged">Merged Users Data with Comments/Submissions Data</a></li>
</ul>

<li><a href="#peak_days">Peak Days</a></li>

<a id='reddit_submissions'></a>
> ### Reddit Submission Data

<a id='submission_text'></a>
>>### Investigating the Submission Text <br> (Submissions with the most comments and replies)

<ul>
<li><a href="#submission_text"><b><mark>Investigating the Submission Text</mark></b></a></li>
<li><a href="#most_comments_submissions">Invesigating the sumbissions with most comments</a></li>
<br>
<li><a href="#NegativeSubmitted_users">Invesigating authors with the most submissions</a></li>
<li><a href="#users_state">Check wether the users with the most submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#submissions_urls">Submission URLS</a></li>
<li><a href="#submission_words">Check the number of submission text words</a></li>
</ul>

We can get the number of different submissions by looking only at the submissions dataframe <br>
Also we can look at submission_text with the most interactions (repeated submission_text)

In [104]:
df['submission_text'].value_counts().head(20)

amber_heard                                          2490
DC_Cinematic                                          760
amber_heard_received_death_threats_and_was            254
amber_heard_the_informers                             167
just_a_friendly_reminder_that_domestic_abuse_is       102
amber_heard_fans_upset_after_she_posted_a_racist       96
johnny_depp_claims_ex_amber_heard_punched_him          95
amber_heard_mirin_jason_momoa                          77
johnny_depp_accuses_amber_heard_of_shitting_in         76
amber_heard_london_fields_2018                         71
first_poster_london_fields_amber_heard_billy_bob       69
aquaman_actress_amber_heard_gets_called_racist         65
aquaman_amber_heard                                    64
new_aquaman_images_shows_jason_momoa_and_amber         63
amber_heard_film_london_fields_suffers_one_of_the      61
amber_heard_has_an_amazing_body                        60
2018_comiccon_red_carpet_gal_gadot_melissa             51
amber_heard_st

<a id='amber_heard'></a>
## amber_heard

                                          

<ul>
<li><a href="#amber_heard"><b><mark>amber_heard</mark></b></a></li>
<li><a href="#DC_Cinematic">DC_Cinematic</a></li>
<li><a href="#amber_heard_received_death_threats_and_was">amber_heard_received_death_threats_and_was</a></li>
<li><a href="#amber_heard_the_informers">amber_heard_the_informers</a></li>
<li><a href="#just_a_friendly_reminder_that_domestic_abuse_is">just_a_friendly_reminder_that_domestic_abuse_is</a></li>
</ul>

In [105]:
df_amber = df.query(" submission_text == 'amber_heard' & \
                           submission_comment == 'submission' ")
print(df_amber.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_amber.head())



(1044, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
10,t3_7nkbt3,/r/gentlemanboners/comments/7nkbt3/amber_heard/,Amber Heard,,r/gentlemanboners,ZadocPaet,2018-01-02 05:07:43,Neutral,Neutral,5.0,,submission,2,amber_heard,2,[],0
11,t3_7nkbua,/r/DCEUboners/comments/7nkbua/amber_heard/,Amber Heard,,r/DCEUboners,ZadocPaet,2018-01-02 05:07:55,Neutral,Neutral,45.0,,submission,2,amber_heard,2,[],0
16,t3_7nolyx,/r/gentlemanboners/comments/7nolyx/amber_heard/,Amber Heard,,r/gentlemanboners,-banned-,2018-01-02 19:17:12,Neutral,Neutral,330.0,,submission,2,amber_heard,2,[],0
24,t3_7nxwrq,/r/gentlemanboners/comments/7nxwrq/amber_heard/,Amber Heard,,r/gentlemanboners,-banned-,2018-01-03 21:34:14,Neutral,Neutral,2.0,,submission,2,amber_heard,2,[],0
29,t3_7o2ahb,/r/Celebs/comments/7o2ahb/amber_heard/,Amber Heard,,r/Celebs,-banned-,2018-01-04 11:03:29,Neutral,Neutral,55.0,,submission,2,amber_heard,2,[],0


1044 Different Submissions

In [106]:
li = list(df_amber.author.unique())
li.remove('-banned-')
df_users[df_users.user_name.isin(li)]

Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
29,RyanSmith,True,True,False,False,263666.0,5989322.0,2006-08-07 14:18:44,others,others
230,soundsoul,True,True,False,False,4156.0,238286.0,2008-01-04 02:52:49,others,others
1488,littlemisfit,True,True,False,False,9523.0,602311.0,2010-09-17 22:11:14,others,others
1520,FlexOutlaw,True,False,False,False,5217.0,154864.0,2010-09-25 18:36:46,others,others
2974,jarakacha,True,True,False,False,142.0,153129.0,2011-07-20 21:10:48,others,others
...,...,...,...,...,...,...,...,...,...,...
66206,jasontheblogger2018,True,True,True,True,,,NaT,banned,banned
66258,Snappleman87,True,True,True,True,,,NaT,banned,banned
66342,88MPH1,True,True,True,True,,,NaT,banned,banned
68878,armpit-lover,True,True,True,True,,,NaT,banned,banned


In [107]:
df_mera_comments = df.query(" submission_text == 'amber_heard' ")
print(df_mera_comments.shape)
df_mera_comments.head(1)                          

(2490, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
4,t1_ds0ylh7,/r/Celebs/comments/7naj20/amber_heard/ds0ylh7/,The arm pit looks like a badly shaved vag!,t3_7naj20,r/Celebs,vastio67,2018-01-01 04:44:17,Negative,Neutral,0.0,submission,comment,9,amber_heard,2,[],0


In [108]:
df_mera_contributions = df_mera_comments.groupby(df_mera_comments.created_at.dt.date).size().reset_index(name='n_contributions')


fig = px.bar(df_mera_contributions.head(7),
             x='created_at', 
             y='n_contributions', title='The number of contributions/date on these submissions')

fig.update_layout(
    xaxis = dict(
        title='Contribution Date',
        tickmode = 'array',
        tickvals = df_mera_contributions.head(7).created_at,
    )
)

clrs = ['red' if (y > 200) else '#5296dd' for y in df_mera_contributions.n_contributions] 

fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')

fig.show()

In [109]:
df_mera_authors = df_mera_comments.groupby(df_mera_comments.author).size().reset_index(name='n_contributions')


fig = px.bar(df_mera_authors,
             x='author', 
             y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto') 
# , marker_line_color='#5296dd', marker_line_width=2

fig.update_yaxes(range = [0,25])

fig.show()


In [110]:
df_mera_comments[df_mera_comments.author == 'Night_Chicken']

Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
68,t1_dslw4u6,/r/Celebs/comments/7pysyp/amber_heard/dslw4u6/,What? What did she hear?,t3_7pysyp,r/Celebs,Night_Chicken,2018-01-13 04:48:41,Neutral,Neutral,-6.0,submission,comment,5,amber_heard,2,[],0
659,t1_dvufzj9,/r/Celebs/comments/844rw7/amber_heard/dvufzj9/,What? Thank you!,t1_dvn0nfy,r/Celebs,Night_Chicken,2018-03-17 13:22:00,Neutral,Positive,1.0,comment,comment,3,amber_heard,2,[],0
660,t1_dvug20l,/r/Celebs/comments/81y6lx/amber_heard/dvug20l/,What? What did she hear?,t3_81y6lx,r/Celebs,Night_Chicken,2018-03-17 13:23:50,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0
3297,t1_e3uftxn,/r/Celebs/comments/94w2gm/amber_heard/e3uftxn/,What? What did she hear?,t3_94w2gm,r/Celebs,Night_Chicken,2018-08-08 20:07:11,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0
3298,t1_e3ufvla,/r/Celebs/comments/94tmn2/amber_heard/e3ufvla/,What? What did she hear?,t3_94tmn2,r/Celebs,Night_Chicken,2018-08-08 20:07:50,Neutral,Neutral,1.0,submission,comment,5,amber_heard,2,[],0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4446,t1_e834yvb,/r/Celebs/comments/9pcp29/amber_heard/e834yvb/,What? What did she hear?,t3_9pcp29,r/Celebs,Night_Chicken,2018-10-19 21:47:03,Neutral,Neutral,0.0,submission,comment,5,amber_heard,2,[],0
4644,t1_e8vhnfn,/r/Celebs/comments/9t9agd/amber_heard/e8vhnfn/,Yes. I want to know.,t1_e8utt9m,r/Celebs,Night_Chicken,2018-11-01 21:28:46,Neutral,Positive,1.0,comment,comment,5,amber_heard,2,[],0
4931,t1_e9jg966,/r/Celebs/comments/9vyy48/amber_heard/e9jg966/,I want to know as well.,t1_e9h3ijf,r/Celebs,Night_Chicken,2018-11-12 06:36:51,Neutral,Positive,1.0,comment,comment,6,amber_heard,2,[],0
5371,t1_eatt7s2,/r/Celebs/comments/a1u9h8/amber_heard/eatt7s2/,Good question.\n\n&#x200B;,t1_easxyfi,r/Celebs,Night_Chicken,2018-12-01 01:52:52,Positive,Positive,0.0,comment,comment,3,amber_heard,2,[],0


In [111]:
df_mera_comments[df_mera_comments.author == 'emilyguy'].shape

(34, 17)

<a id='DC_Cinematic'></a>
## DC_Cinematic

                                          

<ul>
<li><a href="#amber_heard">amber_heard</a></li>
<li><a href="#DC_Cinematic"><b><mark>DC_Cinematic</mark></b></a></li>
<li><a href="#amber_heard_received_death_threats_and_was">amber_heard_received_death_threats_and_was</a></li>
<li><a href="#amber_heard_the_informers">amber_heard_the_informers</a></li>
<li><a href="#just_a_friendly_reminder_that_domestic_abuse_is">just_a_friendly_reminder_that_domestic_abuse_is</a></li>
</ul>

In [112]:
df_dc = df.query(" submission_text == 'DC_Cinematic' & \
                           submission_comment == 'submission' ")
print(df_dc.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_dc.head())

(47, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
190,t3_7vr5ia,/r/DC_Cinematic/comments/7vr5ia/discussion_amber_heard_playing_mera_rubs_me_the/,DISCUSSION: Amber Heard playing Mera rubs me the wrong way,,r/DC_Cinematic,ZorakLocust,2018-02-06 22:16:01,Negative,Neutral,0.0,,submission,10,DC_Cinematic,2,[],0
479,t3_82z9f0,/r/DC_Cinematic/comments/82z9f0/rumor_aquaman_press_tour_with_amber_heard_and/,RUMOR: Aquaman press tour with Amber Heard and Jason Momoa is starting SOON in Europe!,,r/DC_Cinematic,Sabya2kMukherjee,2018-03-08 17:30:32,Neutral,Neutral,125.0,,submission,15,DC_Cinematic,2,[],0
859,t3_89jugi,/r/DC_Cinematic/comments/89jugi/other_amber_heard_visits_children_in_syrian/,OTHER: Amber Heard visits children in Syrian refugee camp of Zaatari,,r/DC_Cinematic,-banned-,2018-04-03 23:21:48,Negative,Neutral,1.0,,submission,11,DC_Cinematic,2,[],0
887,t3_8bbokl,/r/DC_Cinematic/comments/8bbokl/news_heroes_onscreen_and_off_amber_heard_donates/,NEWS: Heroes onscreen and off: Amber Heard donates to children's hospital,,r/DC_Cinematic,-banned-,2018-04-10 21:40:38,Neutral,Positive,81.0,,submission,11,DC_Cinematic,2,[],0
925,t3_8c2e0d,/r/DC_Cinematic/comments/8c2e0d/social_media_amber_heard_aquafied_yet_again/,Social Media: Amber Heard: Aquafied yet again,,r/DC_Cinematic,Mohamed_Todd,2018-04-13 20:33:56,Positive,Neutral,163.0,,submission,7,DC_Cinematic,2,[],0


47 Different Submissions

In [113]:
li = list(df_dc.author.unique())
df_users[df_users.user_name.isin(li)]

Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
2981,indian22,True,False,False,False,6300.0,69586.0,2011-07-21 10:57:48,others,others
3846,-banned-,True,False,False,False,77234.0,82.0,2011-11-06 21:08:20,others,others
10847,boumtjeboo,True,False,False,False,89482.0,116498.0,2013-08-13 21:48:05,others,others
11664,GlowInThe,True,False,False,False,27992.0,87943.0,2013-10-29 06:50:39,others,others
13934,Rugby11,True,True,False,False,2397.0,114562.0,2014-05-19 17:41:26,others,others
19407,Richardrumeo,True,False,False,False,14716.0,40662.0,2015-08-18 18:01:48,others,others
20990,Mohamed_Todd,True,False,False,False,7017.0,28735.0,2016-01-01 21:31:20,others,others
21016,BeenTryin,True,False,False,False,29570.0,313003.0,2016-01-03 17:29:07,others,others
21386,ZorakLocust,True,False,False,False,29289.0,10356.0,2016-01-30 19:10:14,others,others
24215,Sabya2kMukherjee,True,False,False,False,55889.0,37522.0,2016-07-31 18:15:40,others,others


In [114]:
df_dc_comments = df.query(" submission_text == 'DC_Cinematic' & \
                           submission_comment == 'comment' ")
print(df_dc_comments.shape)
df_dc_comments.head(1)                          

(713, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
191,t1_dtug451,/r/DC_Cinematic/comments/7vr5ia/discussion_amb...,She's hot af though lol,t3_7vr5ia,r/DC_Cinematic,Speekeazyyyy,2018-02-06 22:21:06,Positive,Positive,23.0,submission,comment,5,DC_Cinematic,2,[],0


In [115]:
df_dc_comments = df_dc_comments.groupby(df_dc_comments.created_at.dt.date).size().reset_index(name='n_contributions')


fig = px.bar(df_dc_comments.head(4),
             x='created_at', 
             y='n_contributions', title='The number of contributions/date on these submissions')

fig.update_layout(
    xaxis = dict(
        title='Contribution Date',
        tickmode = 'array',
        tickvals = df_dc_comments.head(4).created_at,
    )
)

clrs = ['red' if (y > 200) else '#5296dd' for y in df_dc_comments.n_contributions] 

fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')

fig.show()

In [116]:
df_dc_comments = df.query(" submission_text == 'DC_Cinematic' & \
                           submission_comment == 'comment' ")

In [117]:
df_dc_authors = df_dc_comments.groupby(df_dc_comments.author).size().reset_index(name='n_contributions')
df_dc_authors.sort_values('n_contributions', ascending=False).head(10)

Unnamed: 0,author,n_contributions
0,-banned-,122
184,ZorakLocust,16
31,Chronos2016,15
115,NaveHarder,14
14,AutoModerator,12
93,KAIZOKUGARI23,12
270,serjon_arryn,10
78,Hydrostorm9,9
22,BatmanNewsChris,8
34,CliffordMoreau,8


In [118]:
fig = px.bar(df_dc_authors,
             x='author', 
             y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto') 
# , marker_line_color='#5296dd', marker_line_width=2

fig.update_yaxes(range = [0,25])

fig.show()

In [119]:
df_dc_comments[df_dc_comments.author == 'ZorakLocust'].text.head(5).values

array(['Could you perhaps elaborate on what’s so dumb about it? ',
       'That’s not what her Wikipedia page says. ',
       'Notice that I put “abused” in quotations. It’s because I don’t believe she’s an actual abuse victim. I think she’s a manipulative gold digger who wanted some easy publicity. This is in no way meant to discredit abuse victims but anyone who blindly believes that Heard is one of them is not doing anyone any favors. ',
       'I would only be victim blaming if Amber Heard actually was a victim. ',
       'I addressed your point about how no one would want to be an abuse victim. My response was that while that’s true, there is evidence to suggest that Amber Heard is not a victim of anything. \n\nAlso, if you want an answer as to why she would wait a year (because that’s how long they were married) before accusing Depp of domestic abuse, I don’t know the answer to that. Still, it seems suspicious as hell that she was apparently demanding money out of him. \n\nAnyway

In [120]:
df_dc_comments[df_dc_comments.author == 'Chronos2016'].text.head(5).values

array(['Not a big change acting wise but she would look cool as her too. ',
       "no it's a wig.",
       "I remember being in like 9th grade when Amber Heard came out as bisexual. It was 2008 and it was a pretty major thing for a celebrity to say. That was my first introduction to her.\n\nI really liked her makeup looks and her fashion back then and she's probably one of the celebs who got me into makeup. Her ad campaign for Guess was also iconic and the campaign video is often used in fan videos for leaked Lana Del Rey songs. \n\nShe's one of the more glamorous celebs out there and she super well loved amongst the sad indie girl circles online.",
       "I think she played Seth Rogen's gf in Pineapple Express. It was a pretty small and thankless role. ",
       "Don't do that. They both put out a joint statement saying that both sides told the truth. Johnny is being petty and going back on that statement. \n\nWe may never know what happened between the two of them, clearly it was a

<a id='amber_heard_received_death_threats_and_was'></a>
## amber_heard_received_death_threats_and_was

                                          

<ul>
<li><a href="#amber_heard">amber_heard</a></li>
<li><a href="#DC_Cinematic">DC_Cinematic</a></li>
<li><a href="#amber_heard_received_death_threats_and_was"><b><mark>amber_heard_received_death_threats_and_was</mark></b></a></li>
<li><a href="#amber_heard_the_informers">amber_heard_the_informers</a></li>
<li><a href="#just_a_friendly_reminder_that_domestic_abuse_is">just_a_friendly_reminder_that_domestic_abuse_is</a></li>
</ul>

In [121]:
df_death = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
                           submission_comment == 'submission' ")
print(df_death.shape)

with pd.option_context('display.max_colwidth', None):
  display(df_death.head())

(1, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
6103,t3_a7lypo,/r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/,"Amber Heard received death threats and was 'blacklisted' after accusing Johnny Depp of abuse | Actress who went public with a false accusation is shocked to discover her lies have negative consequences - for her, not for Depp.",,r/MensRights,EricAllonde,2018-12-19 12:26:42,Negative,Negative,2605.0,,submission,38,amber_heard_received_death_threats_and_was,7,[],0


one Submission

In [122]:
df_death.permalink.values[0]

'/r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/'

In [123]:
li = list(df_death.author.unique())
df_users[df_users.user_name.isin(li)]

Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
23311,EricAllonde,True,False,False,False,117967.0,409733.0,2016-06-12 01:05:21,others,others


In [124]:
df_death_comments = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
                           submission_comment == 'comment' ")
print(df_death_comments.shape)
df_death_comments.head(1)                          

(253, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
6106,t1_ec3z6h3,/r/MensRights/comments/a7lypo/amber_heard_rece...,This article doesn't indicate whether these al...,t3_a7lypo,r/MensRights,DownvotedByShitters,2018-12-19 13:15:12,Positive,Neutral,413.0,submission,comment,416,amber_heard_received_death_threats_and_was,7,['https://www.theguardian.com'],1


In [125]:
df_death_comments = df_death_comments.groupby(df_death_comments.created_at.dt.date).size().reset_index(name='n_contributions')


fig = px.bar(df_death_comments.head(4),
             x='created_at', 
             y='n_contributions', title='The number of contributions/date on these submissions')

fig.update_layout(
    xaxis = dict(
        title='Contribution Date',
        tickmode = 'array',
        tickvals = df_death_comments.head(4).created_at,
    )
)

clrs = ['red' if (y > 200) else '#5296dd' for y in df_death_comments.n_contributions] 

fig.update_traces(marker_color=clrs, marker_line_width=1.5, opacity=1, textposition='auto')

fig.show()

In [126]:
df_death_comments = df.query(" submission_text == 'amber_heard_received_death_threats_and_was' & \
                           submission_comment == 'comment' ")

In [127]:
df_death_authors = df_death_comments.groupby(df_death_comments.author).size().reset_index(name='n_contributions')
df_death_authors.sort_values('n_contributions', ascending=False).head(10)

Unnamed: 0,author,n_contributions
0,-banned-,30
91,tenchineuro,22
67,j3utton,13
37,Rogdozz,12
87,scyth3s,10
20,LateNightTestPattern,8
15,GreyFox860,7
52,_pseudodragon,6
82,purpleblossom,5
57,chambertlo,5


In [128]:
fig = px.bar(df_death_authors,
             x='author', 
             y='n_contributions', title='The number of contributions per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto') 
# , marker_line_color='#5296dd', marker_line_width=2

fig.update_yaxes(range = [0,25])

fig.show()

In [129]:
df_death_comments[df_death_comments.author == 'tenchineuro'].text.head(5).values

array(["> Explaining that she rarely left her apartment for fear of being hounded, she added: **'I felt as though I was on trial in the court of public opinion** - and my life and livelihood depended on myriad judgments far beyond my control.'\n\nWelcome to the world #metoo created.\n\nI suspect that she's exaggerating the career effects though, apparently it's OK for female actresses to sexually assault or assault their husbands/BFs.",
       "> Yes, the burden of proof is on the accuser, but not having proof does not mean her allegations are false.\n\nIs that your default assumption? For how long have you been a feminist?\n\n> However, when someone qualifies those allegations as false, they are now making a new allegation about the initial allegations. The burden of proof - of proving the allegations are actually false, is now on them.\n\nSo if you deny the allegations then the burden is now on you to prove your innocence? You'd make a fine feminist lawyer, or maybe you are already?"

In [130]:
df_death_comments[df_death_comments.author == 'tenchineuro'].created_at.dt.date.value_counts()

2018-12-19    22
Name: created_at, dtype: int64

In [131]:
df_death_comments[df_death_comments.author == 'tenchineuro'].subreddit.value_counts()

r/MensRights    22
Name: subreddit, dtype: int64

**tenchineuro** made 22 comment in one day **19-12-2018** in one subreddit **r/MensRights**

<a id='most_comments_submissions'></a>
>>### Invesigating the sumbissions with most comments <br> (Top Level Comments)

<ul>
<li><a href="#submission_text">Investigating the Submission Text</a></li>
<li><a href="#most_comments_submissions"><b><mark>Invesigating the sumbissions with most comments</mark></b></a></li>
<br>
<li><a href="#NegativeSubmitted_users">Invesigating authors with the most submissions</a></li>
<li><a href="#users_state">Check wether the users with the most submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#submissions_urls">Submission URLS</a></li>
<li><a href="#submission_words">Check the number of submission text words</a></li>
</ul>

In [132]:
df.parent_id.value_counts().head()

t3_91hqrc    49
t3_a6i1j4    40
t3_a7lypo    36
t3_9uzs60    34
t3_8hx1bz    30
Name: parent_id, dtype: int64

In [133]:
fig = px.bar(df.parent_id.value_counts().to_frame().head(25).reset_index(), x="parent_id", y="index",
             height=500,
             title='sumbissions with most comments (Top Level Comments)').update_layout(
                   xaxis_title='Number of comments',
                   yaxis_title='subbredit').update_traces(marker_color='#5296dd')

fig.update_yaxes(autorange="reversed")

### Check the top 5 submissions in 2018

In [134]:
df_top5 = df_merged[df_merged.child_id.isin(df_merged.parent_id.value_counts().head().index)]

with pd.option_context('display.max_colwidth', None):
  display(df_top5)

Unnamed: 0,child_id,permalink,text,parent_id,subreddit,created_at,sentiment_blob,sentiment_nltk,score,top_level,...,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year,diff,days_after_creation
369,t3_8hx1bz,/r/WatchItForThePlot/comments/8hx1bz/amber_heard_the_informers/,Amber Heard - The Informers,,r/WatchItForThePlot,2018-05-08 14:18:26,Neutral,Neutral,3749.0,,...,False,False,False,77234.0,82.0,2011-11-06 21:08:20,others,others,2374 days 17:10:06,2374.0
1134,t3_9uzs60,/r/WatchItForThePlot/comments/9uzs60/amber_heard_london_fields_2018/,Amber Heard - London Fields [2018],,r/WatchItForThePlot,2018-11-07 14:17:00,Neutral,Neutral,4015.0,,...,False,False,False,77234.0,82.0,2011-11-06 21:08:20,others,others,2557 days 17:08:40,2557.0
1408,t3_a6i1j4,/r/DC_Cinematic/comments/a6i1j4/other_amber_heard_is_painfully_beautiful_as_mera/,OTHER: Amber Heard is painfully beautiful as Mera,,r/DC_Cinematic,2018-12-15 19:31:20,Positive,Neutral,2082.0,,...,False,False,False,77234.0,82.0,2011-11-06 21:08:20,others,others,2595 days 22:23:00,2595.0
4469,t3_91hqrc,/r/TrueFMK/comments/91hqrc/2018_comiccon_red_carpet_gal_gadot_melissa/,"2018 Comic-Con Red Carpet: Gal Gadot, Melissa Benoist, Amber Heard",,r/TrueFMK,2018-07-24 14:13:54,Neutral,Neutral,41.0,,...,True,False,False,24502.0,92265.0,2015-12-05 21:18:19,others,others,961 days 16:55:35,961.0
6430,t3_a7lypo,/r/MensRights/comments/a7lypo/amber_heard_received_death_threats_and_was/,"Amber Heard received death threats and was 'blacklisted' after accusing Johnny Depp of abuse | Actress who went public with a false accusation is shocked to discover her lies have negative consequences - for her, not for Depp.",,r/MensRights,2018-12-19 12:26:42,Negative,Negative,2605.0,,...,False,False,False,117967.0,409733.0,2016-06-12 01:05:21,others,others,920 days 11:21:21,920.0


In [135]:
df_merged.submission_text = df_merged.submission_text.str.replace('_', ' ')

In [136]:
# get a list with the top 5 submission text
top5_text = list(df_top5.text)

#### Define a function to create the mask and check if a submission text is part of the top 5 submissions

In [137]:
def compare(str):
    for text in top5_text:
        if str in text:
            return True
        else: return False

In [138]:
mask = df_merged.submission_text.apply(compare)

df_top5_contributions1 = df_merged[mask]

df_top5_authors = df_top5_contributions1.groupby(df_top5_contributions1.user_name).size().reset_index(name='n_contributions')

fig = px.bar(df_top5_authors,
             x='user_name', 
             y='n_contributions', title='The number comments per author on these submissions')

fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto') 

fig.show()


In [139]:
df_top5_contributions1.shape

(33, 24)

### Check the authors of parent comments 

In [140]:
df_top5_contributions2 = df_merged[df_merged.parent_id.isin(df_merged.parent_id.value_counts().head().index)]

df_top5_authors = df_top5_contributions2.groupby(df_top5_contributions2.user_name).size().reset_index(name='n_contributions')


fig = px.bar(df_top5_authors,
             x='user_name', 
             y='n_contributions', title='The number of parent comments per author on these submissions')
fig.update_traces(marker_color='#5296dd', opacity=1, textposition='auto') 
# , marker_line_color='#5296dd', marker_line_width=2

fig.update_yaxes(range = [0,5])

fig.show()

**NOTE:** There are 32 parent comments from banned accounts

In [141]:
df_top5_contributions2.shape

(189, 24)

<a id='NegativeSubmitted_users'></a>
>>### Invesigating authors with the most submissions 


<ul>
<li><a href="#submission_text">Investigating the Submission Text</a></li>
<li><a href="#most_comments_submissions">Invesigating the sumbissions with most comments</a></li>
<br>
<li><a href="#NegativeSubmitted_users"><b><mark>Invesigating authors with the most submissions </mark></b></a></li>
<li><a href="#users_state">Check wether the users with the most submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#submissions_urls">Submission URLS</a></li>
<li><a href="#submission_words">Check the number of submission text words</a></li>
</ul>

### Let us first Investigate authors with the most submissions

In [142]:
df_submissions = df[df.submission_comment == 'submission']
print(df_submissions.shape)
df_submissions.head(2)

(2000, 17)


Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
10,t3_7nkbt3,/r/gentlemanboners/comments/7nkbt3/amber_heard/,Amber Heard,,r/gentlemanboners,ZadocPaet,2018-01-02 05:07:43,Neutral,Neutral,5.0,,submission,2,amber_heard,2,[],0
11,t3_7nkbua,/r/DCEUboners/comments/7nkbua/amber_heard/,Amber Heard,,r/DCEUboners,ZadocPaet,2018-01-02 05:07:55,Neutral,Neutral,45.0,,submission,2,amber_heard,2,[],0


In [143]:
df_submissions.author.value_counts().nlargest(n=10)

-banned-              794
AutoNewsAdmin          35
emilyguy               34
AutoNewspaperAdmin     33
ccrraapp               27
Rednaxela117           25
jeff98379              21
vonmark955             20
InfiniTitans           18
ZadocPaet              17
Name: author, dtype: int64

In [144]:
df_submissions.author.value_counts().to_frame().head(10)

Unnamed: 0,author
-banned-,794
AutoNewsAdmin,35
emilyguy,34
AutoNewspaperAdmin,33
ccrraapp,27
Rednaxela117,25
jeff98379,21
vonmark955,20
InfiniTitans,18
ZadocPaet,17


In [145]:
fig = px.bar(df_submissions.author.value_counts().to_frame().head(10).reset_index(), x="author", y="index",
             height=500,
             title='Authors with most Submissions').update_traces(marker_color='#5296dd',).update_layout(
                   xaxis_title='Number of Negative Submissions',
                   yaxis_title='Author_Name').update_traces(marker_color='#5296dd')

fig.update_yaxes(autorange="reversed")

<a id='users_state'></a>

>>### Check wether the users with the most submissions are mod, gold or having a verified email

<ul>
<li><a href="#submission_text">Investigating the Submission Text</a></li>
<li><a href="#most_comments_submissions">Invesigating the sumbissions with most comments</a></li>
<br>
<li><a href="#NegativeSubmitted_users">Invesigating authors with the most submissions</a></li>
<li><a href="#users_state"><b><mark>Check wether the users with the most submissions <br> are mod, gold or having a verified email</mark></b></a></li>
<br>
<li><a href="#submissions_urls">Submission URLS</a></li>
<li><a href="#submission_words">Check the number of submission text words</a></li>
</ul>

In [146]:
df_submissions.author.value_counts().head()

-banned-              794
AutoNewsAdmin          35
emilyguy               34
AutoNewspaperAdmin     33
ccrraapp               27
Name: author, dtype: int64

In [147]:
check_list = df_submissions.author.value_counts().nlargest(n=25).index.tolist()[1:]
check_list

['AutoNewsAdmin',
 'emilyguy',
 'AutoNewspaperAdmin',
 'ccrraapp',
 'Rednaxela117',
 'jeff98379',
 'vonmark955',
 'InfiniTitans',
 'ZadocPaet',
 'MightUlt-7',
 'FlexOutlaw',
 'Queen1110',
 'AngelaStettner69',
 'Pm-me-your-ass-photo',
 'GRJR721',
 'naughtytwd',
 'vonjobi951',
 'Ezio9619',
 'horny_fuckers',
 'Luke5to1',
 'iDevice_Help',
 'pitsnbush',
 'windowmedia',
 'sagar7854']

In [148]:
# get a data frame with the most negative-comments users
df_check = df_users[df_users['user_name'].isin(check_list)]
print(df_check.shape)
df_check.head(2)

(24, 10)


Unnamed: 0,user_name,has_verified_email,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year
1520,FlexOutlaw,True,False,False,False,5217.0,154864.0,2010-09-25 18:36:46,others,others
3937,GRJR721,True,True,False,False,891.0,167792.0,2011-11-15 20:48:23,others,others


In [149]:
df_check['user_name'].nunique()

24

In [150]:
get_stats(df_check)

The value counts of the users with the most contributions: has_verified_email
True     22
False     2
Name: has_verified_email, dtype: int64


The value counts of the users with the most contributions: is_mod
True     18
False     6
Name: is_mod, dtype: int64


The value counts of the users with the most contributions: is_gold
False    15
True      9
Name: is_gold, dtype: int64


The value counts of the users with the most contributions: is_banned
False    17
True      7
Name: is_banned, dtype: int64


The min of comment_karma -1.0


The max of comment_karma 280938.0


The mean of comment_karma 23877.53


The median of comment_karma 23877.53


The min of link_karma 242.0


The max of link_karma 2838485.0


The mean of link_karma 529420.82


The median of link_karma 529420.82


The value counts of the users with the most contributions: banned_unverified
others        15
banned         7
unverified     2
Name: banned_unverified, dtype: int64


The value counts of the users with the most 

<a id='submissions_urls'></a>
>>### Submission URLS


<ul>
<li><a href="#submission_text">Investigating the Submission Text</a></li>
<li><a href="#most_comments_submissions">Invesigating the sumbissions with most comments</a></li>
<br>
<li><a href="#NegativeSubmitted_users">Invesigating authors with the most submissions</a></li>
<li><a href="#users_state">Check wether the users with the most submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#submissions_urls"><b><mark>Submission URLS</mark></b></a></li>
<li><a href="#submission_words">Check the number of submission text words</a></li>
</ul>

In [151]:
df['urls'].nunique()

113

In [152]:
df[df.astype(str)['urls'] != '[]'].head(2)  

Unnamed: 0,child_id,permalink,text,parent_id,subreddit,author,created_at,sentiment_blob,sentiment_nltk,score,top_level,submission_comment,text_words,submission_text,submission_words,urls,urls_count
2,t1_ds0x0jx,/r/elonmusk/comments/7n76bc/amber_heard_and_el...,Here's a sneak peek of /r/MGTOW using the [top...,t1_ds0x0d9,r/elonmusk,sneakpeekbot,2018-01-01 03:55:47,Negative,Negative,3.0,comment,comment,64,amber_heard_and_elon_musk_spotted_vacationing_in,8,"['https://np.reddit.com', 'https://i.imgur.com...",9
14,t1_ds2okme,/r/elonmusk/comments/7n76bc/amber_heard_and_el...,Apparently common sense has escaped Elon Musk ...,t1_ds1q00a,r/elonmusk,-banned-,2018-01-02 10:01:07,Negative,Neutral,3.0,comment,comment,62,amber_heard_and_elon_musk_spotted_vacationing_in,8,['http://docs.cpuc.ca.gov'],1


In [153]:
df['urls'].astype('str').value_counts().head()

[]                             6761
['https://t.co']                 22
['https://www.reddit.com']       21
['https://youtu.be']             10
['https://www.youtube.com']       9
Name: urls, dtype: int64

In [154]:
# the value counts of the # of urls
df['urls_count'].value_counts();

In [155]:
fig = px.histogram(df['urls_count'].to_frame(), x="urls_count",title='Count of the number of URLS in each Contribution',
            nbins=130).update_traces(marker_color='#5296dd')

fig.update_layout(
    xaxis = dict(
        tickmode = 'array',
        tickvals = df['urls_count'],
    )
)

fig.show()

In [156]:
px.histogram(df[~(df['urls_count'].isin([0,60,20,16]))], x="urls_count",title='Count of the number of URLS in each Contribution',
            nbins=20).update_traces(marker_color='#5296dd')



<a id='submission_words'></a>
>>### Check the number of submission text words <br>
Of course few words are easier for bots to create

<ul>
<li><a href="#submission_text">Investigating the Submission Text</a></li>
<li><a href="#most_comments_submissions">Invesigating the sumbissions with most comments</a></li>
<br>
<li><a href="#NegativeSubmitted_users">Invesigating authors with the most submissions</a></li>
<li><a href="#users_state">Check wether the users with the most submissions <br> are mod, gold or having a verified email</a></li>
<br>
<li><a href="#submissions_urls">Submission URLS</a></li>
<li><a href="#submission_words"><b><mark>Check the number of submission text words</mark></b></a></li>
</ul>

In [157]:
fig = px.histogram(df['submission_words'].to_frame(), x="submission_words",
                   title='number of words in submission text',
                   nbins=50).update_traces(marker_color='#5296dd')

fig.update_layout(
    xaxis = dict(
        title='Number of submission words',
        tickmode = 'linear',
    )
)

<ul>
<li><a href="#explore_reddit">Reddit Contributions (Comments / Submissions)</a></li>
<li><a href="#reddit_comments">Reddit Comments</a></li>
<li><a href="#reddit_submissions">Reddit Submissions</a></li>
<li><a href="#subredits"><b><mark>Subredits</mark></b></a></li>
<li><a href="#explore_merged">Merged Users Data with Comments/Submissions Data</a></li>
</ul>

<li><a href="#peak_days">Peak Days</a></li>

<a id='subredits'></a>
>### Subredits

>>### Most used Subreddits

In [158]:
df['subreddit'].nunique()

302

In [159]:
df['subreddit'] = df['subreddit'].str[:]

In [160]:
df.subreddit.value_counts().to_frame().head(20).reset_index()

Unnamed: 0,index,subreddit
0,r/gentlemanboners,950
1,r/Celebs,849
2,r/DC_Cinematic,760
3,r/movies,386
4,r/WatchItForThePlot,368
5,r/JerkOffToCelebs,285
6,r/celebnsfw,258
7,r/MensRights,255
8,r/entertainment,171
9,r/goddesses,145


In [161]:
fig = px.bar(df.subreddit.value_counts().to_frame().head(20).reset_index(), x="subreddit", y="index",
             height=500,
             title='Most used subbredits').update_traces(marker_color='#5296dd',).update_layout(
                   xaxis_title='Number of comments',
                   yaxis_title='subbredit').update_traces(marker_color='#5296dd')

fig.update_yaxes(autorange="reversed")

<ul>
<li><a href="#explore_reddit">Reddit Contributions (Comments / Submissions)</a></li>
<li><a href="#reddit_comments">Reddit Comments</a></li>
<li><a href="#reddit_submissions">Reddit Submissions</a></li>
<li><a href="#subredits">Subredits</a></li>
<li><a href="#explore_merged"><b><mark>Merged Users Data with Comments/Submissions Data</mark></b></a></li>
</ul>

<li><a href="#peak_days">Peak Days</a></li>

<a id='explore_merged'></a>
>### Merged Users Data with Comments & Submissions Data

<ul>
<li><a href="#diff"><b><mark>Difference in time between creating the account and posting</mark></b></a></li>
<li><a href="#posting_duration">Posting Duration After Account Creation</a></li>
<li><a href="#contribution_creation">The Number of Accounts Created in each year / having contributions in 2018</a></li>
</ul>

<a id='diff'></a>
>>### Difference in time between creating the account and posting

In [162]:
# note that value_counts() neglect Zeros
df_merged["days_after_creation"].value_counts()

1582.0    104
2599.0     37
2430.0     36
2449.0     35
487.0      31
         ... 
3422.0      1
1740.0      1
919.0       1
923.0       1
1174.0      1
Name: days_after_creation, Length: 2115, dtype: int64

In [163]:
px.histogram(df_merged, x="days_after_creation",title='days_after_creation',
            nbins=250).update_traces(marker_color='#5296dd',).update_layout(
                   xaxis_title='number of days',)

<a id='posting_duration'></a>
>>### Posting Duration After Account Creation

<ul>
<li><a href="#diff">Difference in time between creating the account and posting</a></li>
<li><a href="#posting_duration"><b><mark>Posting Duration After Account Creation</mark></b></a></li>
<li><a href="#contribution_creation">The Number of Accounts Created in each year / having contributions in 2018</a></li>
</ul>

In [164]:
print('The number of accounts posted the same day they was created!')
df_merged[df_merged['days_after_creation'] == 0].shape[0]

The number of accounts posted the same day they was created!


24

In [165]:
print('The number of accounts posted the same week they was created!')
df_merged[df_merged['days_after_creation'] <= 7].shape[0]

The number of accounts posted the same week they was created!


62

In [166]:
print('The number of accounts posted the same month they was created!')
df_merged[df_merged['days_after_creation'] <= 30].shape[0]

The number of accounts posted the same month they was created!


171

In [167]:
df_merged[df_merged['days_after_creation'] <= 30]['user_created_at'].dt.year.value_counts()

2018    171
Name: user_created_at, dtype: int64

### THE SAME MONTH:

In [168]:
mask = (df_merged['days_after_creation'] <= 30) & (df_merged['user_created_at'].dt.year == 2018)
df_merged[mask]['user_created_at'].dt.strftime('%b').value_counts()

Jun    28
Nov    26
Dec    20
Jul    19
Oct    17
Feb    13
Mar    10
Apr    10
Aug     9
May     7
Sep     7
Jan     5
Name: user_created_at, dtype: int64

In [169]:
months = df_merged[df_merged['days_after_creation'] <= 30]['user_created_at'].dt.strftime('%b')
months_sorted = months.value_counts()[['Jan', 'Feb', 'Mar', 'Apr', 'May']]
months_sorted

Jan     5
Feb    13
Mar    10
Apr    10
May     7
Name: user_created_at, dtype: int64

In [170]:
fig = px.bar(months_sorted,
             x=months_sorted.index, y=months_sorted.values, text=months_sorted.values)

fig.update_layout(
            title={
        'text': "contributions of the accounts posted/commented <br> the same month they were created",
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
        })

# fig.update_layout(
#     xaxis = dict(
#         title='Month(2021)',
#         tickmode = 'array',
#         tickvals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
#         ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
#         )
# )

clrs = ['red' if (y > 100) else '#5296dd' for y in months_sorted.values] 

fig.update_traces(marker_color=clrs,
                  marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()

In [171]:
# THE SAME MONTH:
# check for the date these accounts posted/commented 
reddit_30 = df_merged[df_merged['days_after_creation'] <= 30]

dates_count = df_merged.groupby(reddit_30['created_at'].dt.date).size().reset_index(name='contributions')
dates_count.sort_values('contributions', ascending=False);   

In [172]:
fig = px.bar(dates_count,
             x='created_at', 
             y='contributions', title = 'contributions of the accounts posted/commented the same month they were created')
fig.update_traces(marker_color='#5296dd',
                  marker_line_width=1, opacity=1, textposition='auto').update_layout()
fig.show()

### THE SAME WEEK

In [173]:
# THE SAME WEEK
# check for the date these accounts posted/commented
reddit_7 = df_merged[df_merged['days_after_creation'] <= 7]

dates_count_7 = df_merged.groupby(reddit_7['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_7.sort_values('contributions', ascending=False);   

In [174]:
fig = px.bar(dates_count_7,
             x='created_at', 
             y='contributions', title = 'contributions of the accounts posted/commented the same week they were created')
fig.update_traces(marker_color='#5296dd',
                  marker_line_width=1, opacity=1, textposition='auto').update_layout()
fig.show()

### THE SAME DAY

In [175]:
# THE SAME DAY
# check for the date these accounts posted/commented
reddit_1 = df_merged[df_merged['days_after_creation'] <= 0]

dates_count_1 = df_merged.groupby(reddit_1['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_1.sort_values('contributions', ascending=False); 

In [176]:
fig = px.bar(dates_count_1,
             x='created_at', 
             y='contributions', title = 'contributions of the accounts posted/commented the same day they were created')
fig.update_traces(marker_color='#5296dd',
                  marker_line_width=.5, opacity=1, textposition='auto').update_layout()
fig.show()

In [177]:
# get the author names that commented in a negative way the same month the account was created
# to add to the suspected list
df_merged_30 = df_merged.query("days_after_creation <= 30 & sentiment_blob == sentiment_nltk == 'Negative' ")
df_merged_30.head()

Unnamed: 0,child_id,permalink,text,parent_id,subreddit,created_at,sentiment_blob,sentiment_nltk,score,top_level,...,is_mod,is_gold,is_banned,comment_karma,link_karma,user_created_at,banned_unverified,creation_year,diff,days_after_creation
3144,t1_dyc2my4,/r/celebJObuds/comments/8gdrmz/any_buds_want_t...,I'd fuck the crazy right out of her,t3_8gdrmz,r/celebJObuds,2018-05-02 17:32:46,Negative,Negative,2.0,submission,...,False,False,False,1022.0,2649.0,2018-04-04 04:06:28,others,2018,28 days 13:26:18,28.0
3262,t1_dytrknx,/r/premed/comments/8iiisc/scrolling_through_my...,"That makes me so mad! Damn you, Caduceus!",t1_dys31hq,r/premed,2018-05-11 19:17:10,Negative,Negative,1.0,comment,...,False,False,False,8516.0,2057.0,2018-04-27 18:57:47,others,2018,14 days 00:19:23,14.0
4394,t1_e2ulws0,/r/movies/comments/910nqx/just_a_friendly_remi...,OP really picked the wrong example to prove th...,t1_e2ujdpj,r/movies,2018-07-22 20:58:35,Negative,Negative,24.0,comment,...,False,False,False,34616.0,42684.0,2018-07-15 17:03:43,unverified,2018,7 days 03:54:52,7.0
4545,t1_e2zfi6i,/r/JerkOffToCelebs/comments/91mf8f/which_redhe...,ScarJo i mean dat ass,t3_91mf8f,r/JerkOffToCelebs,2018-07-25 02:22:00,Negative,Negative,11.0,submission,...,False,False,False,309.0,1.0,2018-07-21 15:43:24,unverified,2018,3 days 10:38:36,3.0
5474,t1_e8ffubo,/r/CelebAssPussyMouth/comments/9r48bq/another_...,G and K all day long.\n\nIt was a difficult ch...,t3_9r48bq,r/CelebAssPussyMouth,2018-10-25 14:56:22,Negative,Negative,4.0,submission,...,False,False,False,684.0,600.0,2018-10-01 12:42:45,others,2018,24 days 02:13:37,24.0


<a id='contribution_creation'></a>
>>### Estimation of Number of User Accounts Created in each year / having contributions in 2021

<ul>
<li><a href="#diff">Difference in time between creating the account and posting</a></li>
<li><a href="#posting_duration">Posting Duration After Account Creation</a></li>
<li><a href="#contribution_creation"><b><mark>The Number of Accounts Created in each year / having contributions in 2018</mark></b></a></li>
</ul>

In [178]:
# group by creation year and count 
df_contributions = df_merged.groupby(df_merged['user_created_at'].dt.year).size().reset_index(name='n_accounts')

fig = px.bar(df_contributions,
             x='user_created_at', y='n_accounts', text='n_accounts', title='Number of User Accounts Created in each year / having contributions in 2018')
fig.update_traces(marker_color='#5296dd',
                  marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()

<ul>
<li><a href="#explore_reddit">Reddit Contributions (Comments / Submissions)</a></li>
<li><a href="#reddit_comments">Reddit Comments</a></li>
<li><a href="#reddit_submissions">Reddit Submissions</a></li>
<li><a href="#subredits">Subredits</a></li>
<li><a href="#explore_merged">Merged Users Data with Comments/Submissions Data</a></li>
</ul>

<li><a href="#peak_days"><b><mark>Peak Days</mark></b></a></li>
<a id='peak_days'></a>

># Peaks
<ul>
<li><a href="#peak_months"><b><mark>Peak Months</mark></b></a></li>
<li><a href="#DaysOfMonth">Peak Days of Month</a></li>
<li><a href="#DaysOfWeek">Peak Days of Week</a></li>
<li><a href="#peak_hrs">Peak Hours</a></li>
<li><a href="#peak_dates">Peak Dates</a></li>
</ul>

<a id='peak_months'></a>
>>### Contributions count over Months in 2018

In [179]:
fig = px.bar(df.groupby(df['created_at'].dt.month).size().reset_index(name='contribution_count'),
             x='created_at', y='contribution_count', text='contribution_count')

fig.update_layout(
            title={
        'text': "Estimation of the number contributions created in each month of 2018",
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
        })

fig.update_layout(
    xaxis = dict(
        title='Month(2018)',
        tickmode = 'array',
        tickvals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        ticktext = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        )
)

clrs = ['red' if (y > 4000) else '#5296dd' for y in df.groupby(df['created_at'].dt.month).size()] 

fig.update_traces(marker_color=clrs,
                  marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()

<a id='DaysOfMonth'></a>
>>### Contributions count over Days of month in 2018

<ul>
<li><a href="#peak_months">Peak Months</a></li>
<li><a href="#DaysOfMonth"><b><mark>Peak Days of Month</mark></b></a></li>
<li><a href="#DaysOfWeek">Peak Days of Week</a></li>
<li><a href="#peak_hrs">Peak Hours</a></li>
<li><a href="#peak_dates">Peak Dates</a></li>
</ul>

In [180]:
fig = px.bar(df.groupby(df['created_at'].dt.day).size().reset_index(name='contribution_count'),
             x='created_at', y='contribution_count', text='contribution_count')

fig.update_layout(
            title={
        'text': "Estimation of the number contributions created in each DayOfMonth in 2018",
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
        })

fig.update_layout(
    xaxis = dict(
        title='Month Days(2018)',
        tickmode = 'linear',
    )
)



clrs = ['red' if (y > 1500) else '#5296dd' for y in df.groupby(df['created_at'].dt.day).size()] 

fig.update_traces(marker_color=clrs,
                  marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()

<a id='DaysOfWeek'></a>
>>### In Which DayOfWeek users created more?

<ul>
<li><a href="#peak_months">Peak Months</a></li>
<li><a href="#DaysOfMonth">Peak Days of Month</a></li>
<li><a href="#DaysOfWeek"><b><mark>Peak Days of Week</mark></b></a></li>
<li><a href="#peak_hrs">Peak Hours</a></li>
<li><a href="#peak_dates">Peak Dates</a></li>
</ul>

In [181]:
week_day = df['created_at'].dt.strftime('%a')
# one can sort by any order by providing a custom index explicitely :
# https://stackoverflow.com/questions/43855474/changing-sort-in-value-counts/43855492
week_sorted = week_day.value_counts()[['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']]
week_sorted

Mon     811
Tue    1111
Wed    1406
Thu    1185
Fri     865
Sat     741
Sun     874
Name: created_at, dtype: int64

https://realpython.com/pandas-sort-python/

https://www.py4u.net/discuss/11286

In [182]:
fig = px.bar(df.groupby(df['created_at'].dt.dayofweek).size().reset_index(name='contribution_count'),
             x='created_at', y='contribution_count', text='contribution_count')

fig.update_layout(
            title={
        'text': "Estimation of the number contributions created in each DayOfWeek (2018)",
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
        })

fig.update_layout(
    xaxis = dict(
        title='DayOfWeek(2018)',
        tickmode = 'array',
        tickvals = [0, 1, 2, 3, 4, 5, 6],
        ticktext = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
        )
)

clrs = ['red' if (y > 4000) else '#5296dd' for y in df.groupby(df['created_at'].dt.dayofweek).size()] 

fig.update_traces(marker_color=clrs,
                  marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()

<a id='peak_hrs'></a>
>>### check for the hour  the contributions were made (2018)

<ul>
<li><a href="#peak_months">Peak Months</a></li>
<li><a href="#DaysOfMonth">Peak Days of Month</a></li>
<li><a href="#DaysOfWeek">Peak Days of Week</a></li>
<li><a href="#peak_hrs"><b><mark>Peak Hours</mark></b></a></li>
<li><a href="#peak_dates">Peak Dates</a></li>
</ul>

In [183]:
# check for the hour the contributions were made
df_hours = df.groupby(df['created_at'].dt.hour).size().reset_index(name='contribution_count')
# df_hours.sort_values('contribution_count', ascending=False);   


fig = px.bar(df_hours,
             x='created_at', y='contribution_count', 
             title='Number of contrbutions Comment/Submission in day hours (2018)')

fig.update_layout(
    xaxis = dict(
        title='Hours of Day',
        tickmode = 'linear',
        dtick = 1
    )
)

fig.update_traces(marker_color='#5296dd',
                  marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()


It's weird to have high contributions all the day!!

<a id='peak_dates'><a/>
>>### Which dates has the highest contrbitions for users?

<ul>
<li><a href="#peak_months">Peak Months</a></li>
<li><a href="#DaysOfMonth">Peak Days of Month</a></li>
<li><a href="#DaysOfWeek">Peak Days of Week</a></li>
<li><a href="#peak_hrs">Peak Hours</a></li>
<li><a href="#peak_dates"><b><mark>Peak Dates</mark></b></a></li>
</ul>

In [184]:
df.created_at.dt.date.value_counts().head()

2018-12-19    244
2018-07-03    190
2018-12-20    150
2018-07-22    137
2018-07-24    128
Name: created_at, dtype: int64

In [185]:
trendy_dates = df.groupby(df['created_at'].dt.date).size().reset_index(name='contribution_count')

fig = px.bar(trendy_dates,
             x='created_at', y='contribution_count')

fig.update_layout(
            title={
        'text': "The number of contributions created in each date",
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'
        })

fig.update_traces(marker_color='#5296dd',
                  marker_line_width=0.5, opacity=1, textposition='auto').update_layout()
fig.show()

In [186]:
trendy_dates.sort_values('contribution_count', ascending=False)

Unnamed: 0,created_at,contribution_count
339,2018-12-19,244
173,2018-07-03,190
340,2018-12-20,150
192,2018-07-22,137
194,2018-07-24,128
...,...,...
50,2018-02-26,1
87,2018-04-04,1
266,2018-10-07,1
43,2018-02-19,1


In [187]:
# get the top 5 trendy dates first, then sort them by date
top_trendy_dates = trendy_dates.sort_values('contribution_count', ascending=False).head(5)
top_trendy_dates.sort_values('created_at', inplace=True)

In [188]:
top_trendy_dates

Unnamed: 0,created_at,contribution_count
173,2018-07-03,190
192,2018-07-22,137
194,2018-07-24,128
339,2018-12-19,244
340,2018-12-20,150


In [189]:
top_trendy_dates.reset_index(inplace=True)

In [190]:
fig = px.bar(top_trendy_dates,
             x='created_at', y='contribution_count', title='Number of contrbutions Comment/Submission in trendy dates')

fig.update_layout(
    xaxis = dict(
        title='Contribution Date',
        tickmode = 'array',
        tickvals = top_trendy_dates.created_at,
    )
)

clrs = ['red' if (y > 400) else '#5296dd' for y in top_trendy_dates.contribution_count] 

fig.update_traces(marker_color=clrs,
                  opacity=1, textposition='auto').update_layout()

# marker_line_width=1.5,

fig.show()


In [191]:
def peack_days(date):
    print(f'How many users contributed on the peak day ({date})')
    print(df_merged[df_merged.created_at.dt.strftime('%Y-%m-%d') == date].user_name.nunique())
    
    print('Years')
    df_merged_peak1 = df_merged[df_merged.created_at.dt.strftime('%Y-%m-%d') == date]


    # check for the year the accounts were created
    df_user_year1 = df_merged_peak1.groupby(df_merged_peak1['user_created_at'].dt.year).size().reset_index(name='contribution_count')

    fig = px.bar(df_user_year1,
                 x='user_created_at', y='contribution_count', 
                 title=f'The creation year of the accounts contributed on the peak day ({date})')

    fig.update_layout(
        xaxis = dict(
            title='Accout Creation Year',
            tickmode = 'linear',
            dtick = 1
        )
    )

    clrs = ['red' if (y > 250) else '#5296dd' for y in df_user_year1.contribution_count] 

    fig.update_traces(marker_color=clrs,
                      marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
    fig.show()
    
    print('hours')
    
    df_peak_1 = df[df.created_at.dt.strftime('%Y-%m-%d') == f'{date}']

    # check for the hour the contributions were made
    df_hours = df_peak_1.groupby(df['created_at'].dt.hour).size().reset_index(name='contribution_count')
    # df_hours.sort_values('contribution_count', ascending=False);   


    fig2 = px.bar(df_hours,
                 x='created_at', y='contribution_count', 
                 title='Number of contrbutions Comment/Submission in day hours')

    fig2.update_layout(
        xaxis = dict(
            title='Hours of Day',
            tickmode = 'linear',
            dtick = 1
        )
    )

    clrs = ['red' if (y > 80) else '#5296dd' for y in df_hours.contribution_count] 

    fig2.update_traces(marker_color=clrs,
                      marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
    fig2.show()



    

### How many users contributed on the peak day (Jul 03,2018)

In [192]:
peack_days('2018-07-03')

How many users contributed on the peak day (2018-07-03)
98
Years


hours


### How many users contributed on the peak day (Jul 22,2018)

In [193]:
peack_days('2018-07-22')

How many users contributed on the peak day (2018-07-22)
50
Years


hours


### How many users contributed on the peak day (Jul 24,2018)

In [194]:
peack_days('2018-07-24')

How many users contributed on the peak day (2018-07-24)
79
Years


hours


### How many users contributed on the peak day (Dec 19,2018)

In [195]:
peack_days('2018-12-19')

How many users contributed on the peak day (2018-12-19)
96
Years


hours


### How many users contributed on the peak day (Dec 20,2018)

In [196]:
peack_days('2018-12-20')

How many users contributed on the peak day (2018-12-20)
94
Years


hours


<a id='conclusions'></a>
## Conclusions

<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions"><b><mark>Conclusions</mark></b></a></li>
</ul>