# Examining Ask HN and Show HN postings on Hacker News

In this project, we compare 'Ask HN' and 'Show HN' posts on the website Hacker News to see which type receives more comments on average. We also look at how the number of comments on Ask HN posts varies with the hour at which a post was created.

In [1]:
import csv

# Read every row of the CSV into a list of lists.
# The with-block closes the file once reading is done.
with open('hacker_news.csv', newline='') as f:
    hn = list(csv.reader(f))

print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [3]:
import re
ask_posts = []
show_posts = []
other_posts = []

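# Note: ask_patt is anchored to the start of the title with ^,
# while show_patt matches 'Show HN' anywhere in the title.
ask_patt = r'^Ask HN'
show_patt = r'Show HN'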

for row in hn:
    title = row[1]
    match1 = re.search(ask_patt, title, re.I)
    match2 = re.search(show_patt, title, re.I)
    if match1:
        ask_posts.append(row)
    elif match2:
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('Num Ask: ',len(ask_posts)) 
print('Num Show: ',len(show_posts)) 
print('Num other: ',len(other_posts)) 

Num Ask:  1744
Num Show:  1165
Num other:  17191
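The two patterns in the cell above are not symmetric: `ask_patt` only matches at the start of a title, while `show_patt` matches anywhere. The sketch below illustrates that difference on a few hypothetical titles (they are made up for illustration and are not rows from `hacker_news.csv`). The `classify` helper is also an assumption introduced here; it mirrors the `if`/`elif`/`else` logic of the loop.

```python
import re

ask_patt = r'^Ask HN'    # anchored: must appear at the start of the title
show_patt = r'Show HN'   # unanchored: may appear anywhere in the title

# Hypothetical titles, invented for this example.
titles = [
    'Ask HN: How do you learn a new language?',
    'show hn: My weekend project',
    'Why I stopped posting Show HN threads',
    'Please do not Ask HN about this',
]

def classify(title):
    """Return 'ask', 'show', or 'other' using the same tests as the loop."""
    if re.search(ask_patt, title, re.I):
        return 'ask'
    if re.search(show_patt, title, re.I):
        return 'show'
    return 'other'

labels = [classify(t) for t in titles]
for title, label in zip(titles, labels):
    print(label, '|', title)
```

The third title is counted as a Show HN post even though it only mentions the phrase mid-title, whereas the fourth is not counted as an Ask HN post. Anchoring `show_patt` with `^` would make the two tests consistent, though it could change the counts reported above.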


In [4]:
ask_comments = [int(row[4]) for row in ask_posts]
show_comments = [int(row[4]) for row in show_posts]

avg_ask_comments = sum(ask_comments) / len(ask_comments)
avg_show_comments = sum(show_comments) / len(show_comments)

print('Avg ask comments: ', avg_ask_comments)
print('Avg show comments: ', avg_show_comments)

Avg ask comments:  14.038417431192661
Avg show comments:  10.302145922746782


On average, Ask HN posts receive more comments (about 14 per post) than Show HN posts (about 10 per post).

In [5]:
import datetime as dt

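# Creation timestamps for Ask HN posts only (column 6 is created_at).
created_date = [row[6] for row in ask_posts]

# Number of posts and total number of comments, keyed by hour of day.
counts_by_hour = {}
comments_by_hour = {}

# Pair each timestamp with the comment count of the same post.
result_list = zip(created_date, ask_comments)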

for date, comments in result_list:
    dtdate = dt.datetime.strptime(date, '%m/%d/%Y %H:%M')
    hr = dtdate.hour
    if hr in counts_by_hour:
        counts_by_hour[hr] += 1
        comments_by_hour[hr] += comments
    else:
        counts_by_hour[hr] = 1
        comments_by_hour[hr] = comments
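The per-hour tally above can also be sketched as a single function built on `collections.defaultdict`, which removes the `if`/`else` branch. This is only an alternative sketch, not the approach used in this notebook: the function name `average_comments_by_hour` and the `sample` pairs are hypothetical, and the date format is assumed to match the `created_at` column.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (created_at, num_comments) pairs in the dataset's date format.
sample = [
    ('8/4/2016 11:52', 6),
    ('1/26/2016 19:30', 10),
    ('6/23/2016 11:05', 2),
]

def average_comments_by_hour(pairs):
    """Map each posting hour to the mean number of comments for that hour."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [post count, comment sum]
    for created_at, comments in pairs:
        hour = datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
        totals[hour][0] += 1
        totals[hour][1] += comments
    return {hour: total / count for hour, (count, total) in totals.items()}

averages = average_comments_by_hour(sample)
print(averages)
```

Keeping the post count next to the comment sum also makes it easy to check how many posts each hourly average is based on.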

In [6]:
avg_by_hour = [[h, comments_by_hour[h]/counts_by_hour[h]] for h in counts_by_hour]

print(avg_by_hour)

[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]


In [7]:
avg_by_hour = sorted(avg_by_hour, key=lambda row:row[1], reverse=True)


In [8]:
# Print the sorted list of most popular hours to post
for hr, comments in avg_by_hour:
    hr_dt = dt.datetime.strptime(str(hr),'%H')
    hr_str = hr_dt.strftime("%H:%M")
    print("{}: {:.2f} average comments per post".format(hr_str, comments))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
18:00: 13.20 average comments per post
17:00: 11.46 average comments per post
01:00: 11.38 average comments per post
11:00: 11.05 average comments per post
19:00: 10.80 average comments per post
08:00: 10.25 average comments per post
05:00: 10.09 average comments per post
12:00: 9.41 average comments per post
06:00: 9.02 average comments per post
00:00: 8.13 average comments per post
23:00: 7.99 average comments per post
07:00: 7.85 average comments per post
03:00: 7.80 average comments per post
04:00: 7.17 average comments per post
22:00: 6.75 average comments per post
09:00: 5.58 average comments per post


The best hours to post an Ask HN question to get the most comments are 15:00, 02:00, and 20:00 UTC. Those correspond to 7am, 6pm, and noon Pacific Standard Time on the West Coast of the US. The first two fall at the edges of the workday here, and the third falls at midday. I wonder what percentage of HN readers and commenters are Californians.
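The hour conversion above can be checked with a short sketch. It assumes, as the text does, that the timestamps are in UTC, and it uses fixed offsets for Pacific Standard Time (UTC-8) and Pacific Daylight Time (UTC-7) rather than a full time zone database. The helper name `utc_hour_to_local` and the placeholder date are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the two Pacific offsets (an assumption;
# a real time zone database would switch between them automatically).
PST = timezone(timedelta(hours=-8), 'PST')
PDT = timezone(timedelta(hours=-7), 'PDT')

def utc_hour_to_local(hour, tz):
    """Convert an hour of the day in UTC to HH:MM in the given zone."""
    # The date is an arbitrary placeholder; only the hour matters here.
    utc_time = datetime(2016, 1, 1, hour, tzinfo=timezone.utc)
    return utc_time.astimezone(tz).strftime('%H:%M')

for hour in (15, 2, 20):
    print('{:02d}:00 UTC = {} PST = {} PDT'.format(
        hour, utc_hour_to_local(hour, PST), utc_hour_to_local(hour, PDT)))
```

Under standard time the three hours map to 07:00, 18:00, and 12:00. During daylight time each one falls an hour later.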