# Access to data.

In [1]:
# pip install psycopg2-binary.
!pip install psycopg2-binary

Collecting psycopg2-binary
  Downloading https://files.pythonhosted.org/packages/1e/c0/16303cef8d54fdcfae7be7880cf471f21449225687f61cc3be2a7ef4e6e5/psycopg2_binary-2.8.4-cp36-cp36m-manylinux1_x86_64.whl (2.9MB)
Installing collected packages: psycopg2-binary
Successfully installed psycopg2-binary-2.8.4


In [0]:
# imports.
import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from google.cloud import bigquery
from google.oauth2 import service_account

In [0]:
# connection to the postgres database ('postgresql://' is the scheme SQLAlchemy 1.4+ accepts).
engine = create_engine('postgresql://ibnzqkfl:rYgeprTJq6jD_eR0bxEXwAnYX7fM-yRD@rajje.db.elephantsql.com:5432/ibnzqkfl')

In [0]:
pg_conn = engine.connect()
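The connection cell above embeds the database password in the notebook. A minimal sketch of one alternative, assuming the URL is exported in an environment variable: the name `DATABASE_URL` and the helper `database_url` are hypothetical, not part of the original notebook. The helper also rewrites the legacy `postgres://` scheme, which SQLAlchemy stopped accepting in 1.4.

```python
import os


def database_url(env_var="DATABASE_URL", environ=None):
    """Read a connection URL from the environment and normalise its scheme.

    `env_var` and this helper are illustrative assumptions.
    """
    environ = os.environ if environ is None else environ
    url = environ[env_var]  # raises KeyError when the variable is not set
    legacy = "postgres://"
    if url.startswith(legacy):
        # SQLAlchemy 1.4+ only recognises the 'postgresql' dialect name.
        url = "postgresql://" + url[len(legacy):]
    return url


# Placeholder values, not real credentials.
example = database_url(
    environ={"DATABASE_URL": "postgres://user:secret@host:5432/dbname"}
)
```

The result could then be passed to `create_engine(...)` in place of the literal string.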

In [0]:
# google service account credentials.
credentials = service_account.Credentials.from_service_account_file('HackerNews-a13892bba4db.json')
# label the project.
project_id = 'SaltyNews-HackerNews'

In [0]:
# set the bigquery client.
client = bigquery.Client(credentials=credentials, project=project_id)

In [0]:
# set the referenced dataset and project from bigquery.
dp_ref = client.dataset('hacker_news', project='bigquery-public-data')

In [0]:
# set the referenced table 'comments' from the bigquery hacker news dataset.
table_ref = dp_ref.table('comments')
# get the table from bigquery.
comments_table = client.get_table(table_ref)
# create the dataframe with 30,000 rows to stay within ElephantSQL's 20MB limit.
HNcommentsDB = client.list_rows(comments_table, max_results=30000).to_dataframe()

In [9]:
print(HNcommentsDB.shape)
HNcommentsDB.head()

(30000, 10)


Unnamed: 0,id,by,author,time,time_ts,text,parent,deleted,dead,ranking
0,2701393,5l,5l,1309184881,2011-06-27 14:28:01+00:00,And the glazier who fixed all the broken windo...,2701243,,,0
1,5811403,99,99,1370234048,2013-06-03 04:34:08+00:00,Does canada have the equivalent of H1B/Green c...,5804452,,,0
2,21623,AF,AF,1178992400,2007-05-12 17:53:20+00:00,"Speaking of Rails, there are other options in ...",21611,,,0
3,10159727,EA,EA,1441206574,2015-09-02 15:09:34+00:00,Humans and large livestock (and maybe even pet...,10159396,,,0
4,2988424,Iv,Iv,1315853580,2011-09-12 18:53:00+00:00,I must say I reacted in the same way when I re...,2988179,,,0


In [10]:
# drop unnecessary columns.
HNcommentsDB = HNcommentsDB.drop(columns=['dead', 'deleted', 'by'])
HNcommentsDB = HNcommentsDB.rename(columns={"time":"order"})
# convert the time_ts column to datetime.
HNcommentsDB['time_ts'] = pd.to_datetime(HNcommentsDB['time_ts'], infer_datetime_format=True)
# separate the date and time from the time_ts column.
HNcommentsDB['date'] = [d.date() for d in HNcommentsDB['time_ts']]
HNcommentsDB['time'] = [d.time() for d in HNcommentsDB['time_ts']]
# separate the year, month, and day from the date column.
HNcommentsDB['year'] = HNcommentsDB['date'].map(lambda x: x.year)
HNcommentsDB['month'] = HNcommentsDB['date'].map(lambda x: x.month)
HNcommentsDB['day'] = HNcommentsDB['date'].map(lambda x: x.day)
# drop the time_ts and date columns.
HNcommentsDB = HNcommentsDB.drop(columns=['time_ts', 'date'])
# reorganize the columns.
HNcommentsDB = HNcommentsDB[['year', 'month', 'day', 'time', 'order', 'author', 'id', 'text', 'ranking']]
# drop duplicate rows based on the id column.
HNcommentsDB.drop_duplicates(subset ="id", keep = "first", inplace = True)
# keep the 1000 most frequent authors and their comments.
n = 1000
frequent_list = HNcommentsDB['author'].value_counts()[:n].index.tolist()
top_commentors = HNcommentsDB[HNcommentsDB['author'].isin(frequent_list)]
# show data frame shape.
print(top_commentors.shape)
# show the data frame with headers.
top_commentors.head(10)

(14610, 9)


Unnamed: 0,year,month,day,time,order,author,id,text,ranking
9,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I ...",0
10,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing s...,0
11,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he ...",0
12,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being gener...",0
13,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted i...,0
14,2013,6,5,06:35:40,1370414140,Jd,5824036,"Sadly doesn't provide any filtering on tags, w...",0
15,2007,10,27,06:36:41,1193467001,Jd,73111,Feferman usefully explores the presuppositions...,0
16,2012,9,25,08:38:22,1348562302,Jd,4569290,Here are my take aways:<p>(1) Say that you can...,0
19,2012,12,9,19:55:35,1355082935,Mz,4895850,"So, basically, you think I have Munchausen the...",0
20,2015,10,1,18:56:55,1443725815,Mz,10313701,One way to test your hypothesis is to start re...,0
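The date handling in the cell above walks the timestamps with list comprehensions and lambdas. As a sketch of an equivalent vectorised form, the pandas `.dt` accessor can produce the same four columns directly. The frame `toy` is a stand-in for `HNcommentsDB`; its two timestamps are copied from the first preview table.

```python
import pandas as pd

# Stand-in for HNcommentsDB, with timestamps taken from the preview output.
toy = pd.DataFrame({
    "id": [2701393, 5811403],
    "time_ts": ["2011-06-27 14:28:01+00:00", "2013-06-03 04:34:08+00:00"],
})

# Parse once, then read the components through the .dt accessor.
ts = pd.to_datetime(toy["time_ts"], utc=True)
toy["year"] = ts.dt.year
toy["month"] = ts.dt.month
toy["day"] = ts.dt.day
toy["time"] = ts.dt.time  # datetime.time objects, as in the original cell

toy = toy.drop(columns="time_ts")
```

This avoids the intermediate `date` column, so only `time_ts` has to be dropped afterwards.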


In [11]:
top_commentors.isnull().sum()

year       0
month      0
day        0
time       0
order      0
author     0
id         0
text       0
ranking    0
dtype: int64

In [12]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading https://files.pythonhosted.org/packages/86/9e/c53e1fc61aac5ee490a6ac5e21b1ac04e55a7c2aba647bb8411c9aadf24e/vaderSentiment-3.2.1-py2.py3-none-any.whl (125kB)

In [13]:
!pip install paralleldots

Collecting paralleldots
  Downloading https://files.pythonhosted.org/packages/6a/69/f1d3dede5bdf9a25d683e031ce5214328e9575d4b097db0247800b3b80db/ParallelDots-3.2.13-py3-none-any.whl
Installing collected packages: paralleldots
Successfully installed paralleldots-3.2.13


In [0]:
import html
import pandas as pd
from html.parser import HTMLParser
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import json
import requests
import numpy as np

In [0]:
# show full column contents (pandas 1.0+ expects None here; -1 is deprecated).
pd.set_option('display.max_colwidth', None)

In [0]:
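class HTMLStripper(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""
    def __init__(self):
        super().__init__()  # HTMLParser.__init__ already calls reset().
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ' '.join(self.fed)

def strip_tags(markup):
    # the argument is named `markup` so it does not shadow the html module.
    s = HTMLStripper()
    s.feed(markup)
    s.close()  # flush any text the parser is still buffering.
    return s.get_data()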
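To make the stripping step concrete, here is a small standalone sketch of the same approach applied to one comment fragment from the earlier preview (the `<p>` tag in the "take aways" comment). The class mirrors the cell above; the `close()` call is an addition that asks the parser to flush anything it is still buffering.

```python
import html
from html.parser import HTMLParser


class HTMLStripper(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return " ".join(self.fed)


def strip_tags(markup):
    s = HTMLStripper()
    s.feed(markup)
    s.close()  # force processing of any buffered text
    return s.get_data()


raw = "Here are my take aways:<p>(1) Say that you can do everything"
clean = strip_tags(html.unescape(raw))
print(clean)
```

Each text run becomes one list entry, and the entries are joined with a single space, which is why the paragraph break turns into a space in the cleaned text.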

In [0]:
# copy the filtered rows so later column assignments do not act on a slice.
df = top_commentors[top_commentors['text'].notnull()].copy()

In [18]:
df['text']  = df['text'].apply(html.unescape)
df['text'] = df['text'].apply(strip_tags)
df.head(10)

Unnamed: 0,year,month,day,time,order,author,id,text,ranking
9,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",0
10,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,0
11,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",0
12,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",0
13,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),0
14,2013,6,5,06:35:40,1370414140,Jd,5824036,"Sadly doesn't provide any filtering on tags, which is the only way to assign priority. In fact, all I really get is a list of all open issues.",0
15,2007,10,27,06:36:41,1193467001,Jd,73111,"Feferman usefully explores the presuppositions and equivocations of both Godel and Nagel in their exchanges over the mathematical mind. In the end Feferman advocates for recognizing mathematics as part of a broader open-ended domain instead of the reductive and mechanistic sense described by Nagel and others. Consequently, he ends up concurring with Godel over the issue of AI, claiming our first goal should be formulating a coherent, systematic account of how the mathematical mind works. AI is secondary and currently unapproachable.",0
16,2012,9,25,08:38:22,1348562302,Jd,4569290,"Here are my take aways: (1) Say that you can do everything (2) Be cheap (he starts at $200/day) (3) Offer something for free (4) Talk about ""shipping"" a lot I'm not sure I would want to attract the sort of clients he gets.",0
19,2012,12,9,19:55:35,1355082935,Mz,4895850,"So, basically, you think I have Munchausen therefore I have it and there is no need to actually engage me in discussion. Color me surprised.",0
20,2015,10,1,18:56:55,1443725815,Mz,10313701,"One way to test your hypothesis is to start reducing your exposure to soft plastics and materials that offgas VOCs at high rates. See if that makes any difference. However, do realize that common wisdom in alternative medicine groups (and South American tribal medicine men, etc) is that first the bad stuff has to come out. Kind of like with drug withdrawal, if this helps, it will get worse before it gets better. Much like drug withdrawal, that interim transition stage can be hellish. I do consume sodas sold in 2 liter bottles. So I don't completely avoid all plastics. But I try like hell to avoid styrofoam and other soft plastics. When offered a choice at the deli, I ask for the hard plastic container instead of the styrofoam one. /unsolicited advice",0


In [0]:
analyzer = SentimentIntensityAnalyzer()

In [0]:
def analyze(sentence):
    return analyzer.polarity_scores(sentence)

In [21]:
df['scores'] = df['text'].apply(analyze)
# expand the score dicts into columns; VADER's 'compound' score is stored as 'rating'.
df[['neg', 'neu', 'pos', 'rating']] = df.scores.apply(pd.Series)
df = df.drop(columns="scores")
df.head(10)

Unnamed: 0,year,month,day,time,order,author,id,text,ranking,neg,neu,pos,rating
9,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",0,0.106,0.824,0.069,-0.547
10,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,0,0.161,0.329,0.511,0.8623
11,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",0,0.0,0.931,0.069,0.4404
12,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",0,0.0,0.91,0.09,0.6722
13,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),0,0.0,0.56,0.44,0.6705
14,2013,6,5,06:35:40,1370414140,Jd,5824036,"Sadly doesn't provide any filtering on tags, which is the only way to assign priority. In fact, all I really get is a list of all open issues.",0,0.101,0.899,0.0,-0.4215
15,2007,10,27,06:36:41,1193467001,Jd,73111,"Feferman usefully explores the presuppositions and equivocations of both Godel and Nagel in their exchanges over the mathematical mind. In the end Feferman advocates for recognizing mathematics as part of a broader open-ended domain instead of the reductive and mechanistic sense described by Nagel and others. Consequently, he ends up concurring with Godel over the issue of AI, claiming our first goal should be formulating a coherent, systematic account of how the mathematical mind works. AI is secondary and currently unapproachable.",0,0.0,0.965,0.035,0.4215
16,2012,9,25,08:38:22,1348562302,Jd,4569290,"Here are my take aways: (1) Say that you can do everything (2) Be cheap (he starts at $200/day) (3) Offer something for free (4) Talk about ""shipping"" a lot I'm not sure I would want to attract the sort of clients he gets.",0,0.068,0.809,0.123,0.5597
19,2012,12,9,19:55:35,1355082935,Mz,4895850,"So, basically, you think I have Munchausen therefore I have it and there is no need to actually engage me in discussion. Color me surprised.",0,0.083,0.755,0.162,0.2732
20,2015,10,1,18:56:55,1443725815,Mz,10313701,"One way to test your hypothesis is to start reducing your exposure to soft plastics and materials that offgas VOCs at high rates. See if that makes any difference. However, do realize that common wisdom in alternative medicine groups (and South American tribal medicine men, etc) is that first the bad stuff has to come out. Kind of like with drug withdrawal, if this helps, it will get worse before it gets better. Much like drug withdrawal, that interim transition stage can be hellish. I do consume sodas sold in 2 liter bottles. So I don't completely avoid all plastics. But I try like hell to avoid styrofoam and other soft plastics. When offered a choice at the deli, I ask for the hard plastic container instead of the styrofoam one. /unsolicited advice",0,0.102,0.769,0.129,-0.2722
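The `rating` column above holds what VADER's `polarity_scores` returns under the key `compound`. A minimal sketch of the expansion step with the rename made explicit: the two score dicts are hard-coded stand-ins copied from the first two rows of the output, so the example runs without vaderSentiment installed.

```python
import pandas as pd

# Stand-in for df['text'].apply(analyze); values copied from the output above.
scores = pd.Series(
    [
        {"neg": 0.106, "neu": 0.824, "pos": 0.069, "compound": -0.547},
        {"neg": 0.161, "neu": 0.329, "pos": 0.511, "compound": 0.8623},
    ],
    index=[9, 10],
)

# Build the columns from the dict keys, then rename 'compound' explicitly.
expanded = pd.DataFrame(scores.tolist(), index=scores.index)
expanded = expanded.rename(columns={"compound": "rating"})
```

Naming the columns from the dict keys, rather than by position, keeps the mapping visible if the key order ever differs.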


In [22]:
df = df.drop(columns=["neu", "pos"])
df.head(10)

Unnamed: 0,year,month,day,time,order,author,id,text,ranking,neg,rating
9,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",0,0.106,-0.547
10,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,0,0.161,0.8623
11,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",0,0.0,0.4404
12,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",0,0.0,0.6722
13,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),0,0.0,0.6705
14,2013,6,5,06:35:40,1370414140,Jd,5824036,"Sadly doesn't provide any filtering on tags, which is the only way to assign priority. In fact, all I really get is a list of all open issues.",0,0.101,-0.4215
15,2007,10,27,06:36:41,1193467001,Jd,73111,"Feferman usefully explores the presuppositions and equivocations of both Godel and Nagel in their exchanges over the mathematical mind. In the end Feferman advocates for recognizing mathematics as part of a broader open-ended domain instead of the reductive and mechanistic sense described by Nagel and others. Consequently, he ends up concurring with Godel over the issue of AI, claiming our first goal should be formulating a coherent, systematic account of how the mathematical mind works. AI is secondary and currently unapproachable.",0,0.0,0.4215
16,2012,9,25,08:38:22,1348562302,Jd,4569290,"Here are my take aways: (1) Say that you can do everything (2) Be cheap (he starts at $200/day) (3) Offer something for free (4) Talk about ""shipping"" a lot I'm not sure I would want to attract the sort of clients he gets.",0,0.068,0.5597
19,2012,12,9,19:55:35,1355082935,Mz,4895850,"So, basically, you think I have Munchausen therefore I have it and there is no need to actually engage me in discussion. Color me surprised.",0,0.083,0.2732
20,2015,10,1,18:56:55,1443725815,Mz,10313701,"One way to test your hypothesis is to start reducing your exposure to soft plastics and materials that offgas VOCs at high rates. See if that makes any difference. However, do realize that common wisdom in alternative medicine groups (and South American tribal medicine men, etc) is that first the bad stuff has to come out. Kind of like with drug withdrawal, if this helps, it will get worse before it gets better. Much like drug withdrawal, that interim transition stage can be hellish. I do consume sodas sold in 2 liter bottles. So I don't completely avoid all plastics. But I try like hell to avoid styrofoam and other soft plastics. When offered a choice at the deli, I ask for the hard plastic container instead of the styrofoam one. /unsolicited advice",0,0.102,-0.2722


In [0]:
# compute each author's mean negative score.
author = df.author.unique()
top = []
names = ['author', 'average']
for a in author:
  top1 = []
  top1.append(a)
  x = df[df['author']== a]
  value = x['neg'].mean()
  top1.append(value)
  top.append(top1)

df1 = pd.DataFrame(columns = names, data = top)
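The loop above computes one mean per author by filtering the frame once for every author. A sketch of the same aggregation with `groupby`, on a toy frame whose `neg` values are taken from the Jd and Mz rows shown earlier:

```python
import pandas as pd

# Toy stand-in for df with only the two columns the aggregation needs.
toy = pd.DataFrame({
    "author": ["Jd", "Jd", "Mz", "Mz"],
    "neg": [0.106, 0.161, 0.083, 0.102],
})

# One pass over the data; sort=False keeps authors in first-seen order,
# matching the order produced by df.author.unique().
df1 = (
    toy.groupby("author", sort=False)["neg"]
    .mean()
    .rename("average")
    .reset_index()
)
```

The result has the same `author` and `average` columns as the hand-built `df1`.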

In [24]:
sorted_df = df1.sort_values(by='average', ascending=False)
sorted_df.reset_index()

Unnamed: 0,index,author,average
0,332,iLoch,0.277333
1,14,Nux,0.178143
2,856,hartror,0.172500
3,766,alan_cx,0.162909
4,782,ballard,0.158714
...,...,...,...
995,326,gtani,0.007364
996,614,mentat,0.006667
997,757,Revisor,0.005833
998,702,troymc,0.000000


In [0]:
df = df.reset_index()

In [28]:
df = df.drop(columns='index')
df.head()

Unnamed: 0,year,month,day,time,order,author,id,text,ranking,neg,rating
0,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",0,0.106,-0.547
1,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,0,0.161,0.8623
2,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",0,0.0,0.4404
3,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",0,0.0,0.6722
4,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),0,0.0,0.6705


In [32]:
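# NOTE: df[mask]['col'] = value is chained indexing. It assigns to a temporary
# copy, so df is left unchanged and pandas emits the warning shown below.
# sorted_df.loc[x] also selects by index label, not by sorted position.
for x in range(len(sorted_df.values)):
  entry = sorted_df.loc[x]
  average = entry['average']
  author = entry['author']
  rank = x + 1
  df[df['author'] == author]['average'] = average
  df[df['author'] == author]['rank'] = rank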

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  import sys
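The warning above appears because `df[df['author'] == author]` returns a new object, so the assignment lands on that temporary copy and `df` itself never changes. A minimal sketch of the single-step `.loc` form the warning recommends, on a toy frame with made-up values:

```python
import pandas as pd

# Toy frame; the numbers are illustrative only.
toy = pd.DataFrame({"author": ["Jd", "Mz", "Jd"], "neg": [0.1, 0.2, 0.3]})
toy["average"] = 0.0

# Row selection and column selection happen in one .loc call,
# so the assignment writes into `toy` itself.
mask = toy["author"] == "Jd"
toy.loc[mask, "average"] = 0.2
```

Only the rows matched by `mask` are updated; the others keep their initial value.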


In [34]:
df['average'] = 0
df.head()

Unnamed: 0,year,month,day,time,order,author,id,text,ranking,neg,rating,average
0,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",0,0.106,-0.547,0
1,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,0,0.161,0.8623,0
2,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",0,0.0,0.4404,0
3,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",0,0.0,0.6722,0
4,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),0,0.0,0.6705,0


In [0]:
sorted_df['ranking'] = 0 

In [0]:
sorted_df = sorted_df.reset_index()

In [57]:
sorted_df = sorted_df.drop(columns='index')
sdf = sorted_df
sdf.head()

Unnamed: 0,author,average,ranking
0,iLoch,0.277333,0
1,Nux,0.178143,0
2,hartror,0.1725,0
3,alan_cx,0.162909,0
4,ballard,0.158714,0


In [0]:
for index, data in sdf.iterrows():
  sdf.loc[index, 'ranking'] = int(index + 1)

In [59]:
sdf.head()

Unnamed: 0,author,average,ranking
0,iLoch,0.277333,1
1,Nux,0.178143,2
2,hartror,0.1725,3
3,alan_cx,0.162909,4
4,ballard,0.158714,5
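Because the frame is already sorted and re-indexed from 0, the ranking assigned by the `iterrows` loop is simply the position plus one. A sketch of that as a single column assignment, using the first three authors and averages from the output above in shuffled order:

```python
import pandas as pd

# Three authors from the output above, deliberately out of order.
sdf = pd.DataFrame({
    "author": ["hartror", "iLoch", "Nux"],
    "average": [0.1725, 0.277333, 0.178143],
})

# Sort by average (highest first) and renumber the index from 0.
sdf = sdf.sort_values(by="average", ascending=False).reset_index(drop=True)

# Rank 1 goes to the highest average.
sdf["ranking"] = range(1, len(sdf) + 1)
```

`reset_index(drop=True)` also removes the need for the separate `drop(columns='index')` step.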


In [0]:
for index, data in df.iterrows():
  author = data['author']
  entry = sdf[sdf['author'] == author]
  rank = entry['ranking'].values[0]
  average = entry['average'].values[0]
  df.loc[index, 'average'] = average
  df.loc[index, 'ranking'] = rank

In [61]:
df.head(20)

Unnamed: 0,year,month,day,time,order,author,id,text,ranking,neg,rating,average
0,2011,8,7,04:17:26,1312690646,Jd,2855741,"Yep, I didn't find Rescuetime very helpful. I tend to think employee monitoring is evil and if I am personally not engaged in the work I am doing I would rather find engaging work than look at the time I waste on other things (HN, for example). A proposed startup: web interface that would block all devices totally (or simply from the internet) for certain period of time. For instance, you could click a button when you get home from work that says, ""I'm done for the evening"" that removes all electronic distractions until some set time in the morning. Possibilities exist on different devices for this, but nothing that ties them all together. Of course, it may not be that popular since not all that many people that are addicted to their devices treat it as a serious problem to be remedied (HN addicts included).",628,0.106,-0.547,0.0545
1,2007,9,5,17:04:05,1189011845,Jd,50570,It was a risky joke. Looks like I am losing some karma for it. Like I fucking care! :P,628,0.161,0.8623,0.0545
2,2011,5,30,22:34:14,1306794854,Jd,2600618,"Looks good, there are a bunch of questions he answered in his reddit interview that you have up there that probably don't need to be there (e.g. Resig knows Alexis and briefly did a YCombinator startup in Cambridge). Reddit link here: http://www.reddit.com/r/IAmA/related/h42ak/i_am_john_resig_c...",628,0.0,0.4404,0.0545
3,2011,5,30,21:00:05,1306789205,Jd,2600423,"A bit, but so much for me ended up being generic browser usage (I also use a web-based browser) that not all that much was useful. I think it might have been a bug in RescueTime that didn't always capture the information on the site visited in the browser. Anyways, I don't think about it much, I don't use it, and I don't miss it.",628,0.0,0.6722,0.0545
4,2010,12,8,18:12:25,1291831945,Jd,1983932,I also agree with your rejoinder and upvoted it :),628,0.0,0.6705,0.0545
5,2013,6,5,06:35:40,1370414140,Jd,5824036,"Sadly doesn't provide any filtering on tags, which is the only way to assign priority. In fact, all I really get is a list of all open issues.",628,0.101,-0.4215,0.0545
6,2007,10,27,06:36:41,1193467001,Jd,73111,"Feferman usefully explores the presuppositions and equivocations of both Godel and Nagel in their exchanges over the mathematical mind. In the end Feferman advocates for recognizing mathematics as part of a broader open-ended domain instead of the reductive and mechanistic sense described by Nagel and others. Consequently, he ends up concurring with Godel over the issue of AI, claiming our first goal should be formulating a coherent, systematic account of how the mathematical mind works. AI is secondary and currently unapproachable.",628,0.0,0.4215,0.0545
7,2012,9,25,08:38:22,1348562302,Jd,4569290,"Here are my take aways: (1) Say that you can do everything (2) Be cheap (he starts at $200/day) (3) Offer something for free (4) Talk about ""shipping"" a lot I'm not sure I would want to attract the sort of clients he gets.",628,0.068,0.5597,0.0545
8,2012,12,9,19:55:35,1355082935,Mz,4895850,"So, basically, you think I have Munchausen therefore I have it and there is no need to actually engage me in discussion. Color me surprised.",266,0.083,0.2732,0.076469
9,2015,10,1,18:56:55,1443725815,Mz,10313701,"One way to test your hypothesis is to start reducing your exposure to soft plastics and materials that offgas VOCs at high rates. See if that makes any difference. However, do realize that common wisdom in alternative medicine groups (and South American tribal medicine men, etc) is that first the bad stuff has to come out. Kind of like with drug withdrawal, if this helps, it will get worse before it gets better. Much like drug withdrawal, that interim transition stage can be hellish. I do consume sodas sold in 2 liter bottles. So I don't completely avoid all plastics. But I try like hell to avoid styrofoam and other soft plastics. When offered a choice at the deli, I ask for the hard plastic container instead of the styrofoam one. /unsolicited advice",266,0.102,-0.2722,0.076469
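The row-by-row loop above looks up each comment's author in `sdf` and copies `average` and `ranking` across. A sketch of the same lookup as a left merge on `author`: the toy frames reuse the Jd and Mz values from the output, and `validate="many_to_one"` makes pandas raise if an author appears more than once in the per-author table.

```python
import pandas as pd

# Toy stand-ins: a few comment rows and the per-author summary.
comments = pd.DataFrame({
    "author": ["Jd", "Jd", "Mz"],
    "neg": [0.106, 0.161, 0.083],
})
per_author = pd.DataFrame({
    "author": ["Mz", "Jd"],
    "average": [0.076469, 0.0545],
    "ranking": [266, 628],
})

# A left merge keeps every comment row and attaches its author's summary.
merged = comments.merge(
    per_author, on="author", how="left", validate="many_to_one"
)
```

In the notebook, `df` already carries `ranking` and `average` columns at this point, so they would need to be dropped before merging to avoid suffixed duplicates.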


In [0]:
# convert dataframe to SQL, push to ElephantSQL.
df.to_sql('HNTopCommentors', con=engine, if_exists='replace', index=False)
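A quick way to confirm the upload is to count the rows in the new table. The sketch below uses an in-memory SQLite connection purely as a stand-in for the ElephantSQL engine, so it runs without a server; the toy rows reuse ids from the output above. Note the double quotes around the table name in the query: on PostgreSQL a mixed-case name such as `HNTopCommentors` generally has to be quoted to be found.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the final df.
toy = pd.DataFrame({
    "author": ["Jd", "Mz"],
    "id": [2855741, 4895850],
    "neg": [0.106, 0.083],
})

# Assumption: SQLite replaces the Postgres engine for illustration only.
conn = sqlite3.connect(":memory:")
toy.to_sql("HNTopCommentors", con=conn, if_exists="replace", index=False)

# Read the row count back from the table that was just written.
stored = pd.read_sql_query('SELECT COUNT(*) AS n FROM "HNTopCommentors"', conn)
row_count = int(stored.loc[0, "n"])
conn.close()
```

Against the real database, the same query could be run through `engine` instead of `conn`.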