# Ask HN Posts vs. Show HN Posts on Hacker News

**Project Description:**

In this project, I work with a dataset of submissions to Hacker News, a popular technology site. Hacker News is a social news website focusing on computer science and entrepreneurship, run by Paul Graham's investment fund and startup incubator, Y Combinator. It is similar to Reddit: user-submitted stories (known as "posts") are voted on and commented on. Hacker News is extremely popular in technology and startup circles, and posts that reach the top of its listings can get hundreds of thousands of visitors as a result.

This project focuses on two types of posts, each with its own page on the site: "Ask HN" and "Show HN". Users submit Ask HN posts to ask the Hacker News community a specific question, and Show HN posts to show the community a project or a product.

The mission of this project is to explore data on Ask HN posts and Show HN posts to find out:

1. Which type of post receives more comments on average?
2. At what time of day is a new post likely to receive the most comments on average?

**I. Importing and preparing data for analysis**

In [34]:
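from csv import reader

# Read the whole CSV into a list of rows; the with-block closes the file afterwards
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

hn_header = hn[0]   # column names
hn_data = hn[1:]    # one list per post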
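
The loading step can also be wrapped in a small function that accepts any open file object, which makes it easy to try on a few rows before reading the full file. The sketch below is illustrative only: the helper name `load_rows` is hypothetical, and it reads an in-memory sample (two rows taken from this dataset) through `io.StringIO` rather than `hacker_news.csv`.

```python
import csv
import io

def load_rows(file_obj):
    """Return (header, rows) from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]

# Hypothetical in-memory stand-in for hacker_news.csv.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "12296411,Ask HN: How to improve my personal website?,,2,6,ahmedbaracat,8/16/2016 9:55\n"
)

header, rows = load_rows(sample_csv)
print(header)     # column names
print(len(rows))  # number of data rows
```

With the real file, the same function could be called inside `with open('hacker_news.csv', newline='') as f:`. Every value comes back as a string, which is why the comment counts are converted with `int()` later on.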

In [35]:
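def view_data(dataset, start, end, print_row_column = False):
    # Print rows dataset[start:end], each followed by blank lines;
    # optionally print the size of the full dataset. Returns None.
    dataset_subset = dataset[start:end]
    for row in dataset_subset:
        print(row)
        print('\n')

    if print_row_column:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))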

In [36]:
print(hn_header)
print('\n')
view_data(hn_data, 0, 9, True)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']


['10482257

**II. Extracting relevant data**

First of all, I am going to extract data on Ask HN posts and Show HN posts from the whole dataset, since this project is targeted at these two types of posts.

Note: The titles of Ask HN posts start with "Ask HN:", and the titles of Show HN posts start with "Show HN:".

In [37]:
ask_hn_posts = []
show_hn_posts = []
other_posts = []

for row in hn_data:
    ask_or_show = row[1]
    ask_or_show = ask_or_show.lower()
    
    if ask_or_show.startswith('ask hn'):
        ask_hn_posts.append(row)
    elif ask_or_show.startswith('show hn'):
        show_hn_posts.append(row)
    else:
        other_posts.append(row)
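
The prefix rule used above can be checked in isolation on a handful of titles. This is a minimal sketch: the function name `post_type` and the last sample title are hypothetical and only serve to show that the comparison is case-insensitive.

```python
def post_type(title):
    """Classify a title as 'ask', 'show', or 'other' by its prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

sample_titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
    'ASK HN: upper-case prefix',  # hypothetical title
]

types = [post_type(title) for title in sample_titles]
print(types)  # ['ask', 'show', 'other', 'ask']
```

Because the check does not require the trailing colon, a title that merely begins with the words "Ask HN" would also be counted as an Ask HN post.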

In [38]:
view_data(ask_hn_posts, 0, 4, True)
print('\n')
view_data(show_hn_posts, 0, 4, True)

['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55']


['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43']


['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']


['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20']


Number of rows: 1744
Number of columns: 7


['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03']


['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46']


['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']


['12178806', 'Show HN: Webscope  Easy way for web devel

The dataset contains 1,744 Ask HN posts and 1,162 Show HN posts.

**III. Comparing the average number of comments on Ask HN and Show HN posts**

In [39]:
ask_total_comments = 0

for row in ask_hn_posts:
    ask_comments = row[4]
    ask_total_comments += int(ask_comments)

# Compute the average once, after all comments have been summed
ask_avg_comments = ask_total_comments/len(ask_hn_posts)

print("Total number of Ask HN comments:", ask_total_comments)
print("Average number of Ask HN comments:", ask_avg_comments)

Total number of Ask HN comments: 24483
Average number of Ask HN comments: 14.038417431192661


In [40]:
show_total_comments = 0

for row in show_hn_posts:
    show_comments = row[4]
    show_total_comments += int(show_comments)

# Compute the average once, after all comments have been summed
show_avg_comments = show_total_comments/len(show_hn_posts)

print("Total number of Show HN comments:", show_total_comments)
print("Average number of Show HN comments:", show_avg_comments)

Total number of Show HN comments: 11988
Average number of Show HN comments: 10.31669535283993


On average, Ask HN posts receive more comments than Show HN posts: about 14.04 comments per post versus about 10.32.
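
Each average is simply the total number of comments divided by the number of posts, for example 24483 / 1744 ≈ 14.04 for Ask HN. Since the same calculation is done twice, it could be shared through one helper. The sketch below is one possible way to do that; the name `average_comments` and the choice to return `0.0` for an empty list are assumptions, and the sample rows are three Ask HN posts taken from the dataset.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column; returns 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

sample_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

sample_avg = average_comments(sample_posts)  # (6 + 29 + 1) / 3
print(sample_avg)  # 12.0
```

Calling `average_comments(ask_hn_posts)` and `average_comments(show_hn_posts)` would then give the two averages reported above.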

**IV. Calculating the number of comments for Ask HN posts during each hour of the day**

In [41]:
import datetime as dt

ask_posts_by_hour = {}
ask_comments_by_hour = {}

for row in ask_hn_posts:
    ask_post_time = row[6]
    ask_post_comments = row[4]
    ask_post_comments = int(ask_post_comments)
    
    ask_post_time = dt.datetime.strptime(ask_post_time, "%m/%d/%Y %H:%M")
    ask_post_hour = ask_post_time.strftime("%H")
    
    if ask_post_hour not in ask_posts_by_hour:
        ask_posts_by_hour[ask_post_hour] = 1
        ask_comments_by_hour[ask_post_hour] = ask_post_comments
    else: 
        ask_posts_by_hour[ask_post_hour] += 1
        ask_comments_by_hour[ask_post_hour] += ask_post_comments
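
The same grouping can be written without the explicit `if`/`else` by using `collections.defaultdict`, which starts every missing key at zero. This is only a sketch of an alternative: the function name `comments_by_hour` is hypothetical, and the last sample row is invented so that one hour contains two posts.

```python
import datetime as dt
from collections import defaultdict

def comments_by_hour(posts):
    """Return (post_counts, comment_totals) keyed by two-digit hour strings."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for row in posts:
        created = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
        hour = created.strftime("%H")
        counts[hour] += 1
        totals[hour] += int(row[4])
    return dict(counts), dict(totals)

sample_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['0', 'Ask HN: hypothetical post', '', '1', '1', 'someone', '11/22/2015 13:05'],  # invented row
]

counts, totals = comments_by_hour(sample_posts)
print(counts)  # {'09': 1, '13': 2, '14': 1}
print(totals)  # {'09': 6, '13': 30, '14': 3}
```

Note that `strftime("%H")` zero-pads the hour, so a post created at 9:55 is stored under the key `'09'`.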

In [42]:
# The number of Ask HN posts created during each hour of the day

ask_posts_by_hour

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}

In [43]:
# The total number of comments on Ask HN posts created during each hour of the day

ask_comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

In [44]:
# The average number of comments per post during each hour of the day

avg_askhn_by_hour = []

for hour in ask_posts_by_hour:
    avg_askhn_by_hour.append([hour, ask_comments_by_hour[hour]/ask_posts_by_hour[hour]])

In [45]:
view_data(avg_askhn_by_hour, 0, 5, True)

['09', 5.5777777777777775]


['13', 14.741176470588234]


['10', 13.440677966101696]


['14', 13.233644859813085]


['16', 16.796296296296298]


Number of rows: 24
Number of columns: 2


In [46]:
# Sorting data

sort_avg_askhn_by_hour = []

for row in avg_askhn_by_hour:
    sort_avg_askhn_by_hour.append([row[1], row[0]])
      
sort_avg_askhn_by_hour = sorted(sort_avg_askhn_by_hour, reverse = True)
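
Swapping the columns works because `sorted()` compares lists element by element, starting with the first. An alternative that keeps the original `[hour, average]` layout is to pass a `key` function. The sketch below applies it to the five rows printed earlier; the variable names `avg_by_hour`, `ranked`, and `top_hours` are introduced here for illustration.

```python
avg_by_hour = [
    ['09', 5.5777777777777775],
    ['13', 14.741176470588234],
    ['10', 13.440677966101696],
    ['14', 13.233644859813085],
    ['16', 16.796296296296298],
]

# Sort by the average (second element), highest first
ranked = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)
top_hours = [hour for hour, _ in ranked]
print(top_hours)  # ['16', '13', '10', '14', '09']
```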

In [47]:
view_data(sort_avg_askhn_by_hour, 0, 24, True)

[38.5948275862069, '15']


[23.810344827586206, '02']


[21.525, '20']


[16.796296296296298, '16']


[16.009174311926607, '21']


[14.741176470588234, '13']


[13.440677966101696, '10']


[13.233644859813085, '14']


[13.20183486238532, '18']


[11.46, '17']


[11.383333333333333, '01']


[11.051724137931034, '11']


[10.8, '19']


[10.25, '08']


[10.08695652173913, '05']


[9.41095890410959, '12']


[9.022727272727273, '06']


[8.127272727272727, '00']


[7.985294117647059, '23']


[7.852941176470588, '07']


[7.796296296296297, '03']


[7.170212765957447, '04']


[6.746478873239437, '22']


[5.5777777777777775, '09']


Number of rows: 24
Number of columns: 2


In [48]:
# Top 5 best times for creating Ask HN posts

print("Top 5 Best Times for Creating Ask HN Posts:")
print('\n')

for row in sort_avg_askhn_by_hour[0:5]:
    average = row[0]
    hour = row[1]

    # Reformat the two-digit hour string, e.g. '15' -> '15:00'
    hour_format = "%H"
    hour_ob = dt.datetime.strptime(hour, hour_format)
    hour = hour_ob.strftime("%H:%M")

    hour_string = "{h}: {a:.2f} average comments per post".format(h = hour, a = average)
    print(hour_string)

Top 5 Best Times for Creating Ask HN Posts:


15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


Ask HN posts created at 15:00, 02:00, 20:00, 16:00, and 21:00 (EST) receive the most comments on average, so these hours look like the better times to create an Ask HN post. The 15:00 hour stands out at 38.59 comments per post.
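
An average can be pulled up by a small number of heavily commented posts, so a large hourly mean does not by itself show that a typical post in that hour does well. Comparing the mean with the median is one quick check. The sketch below uses invented comment counts, not values from the dataset, purely to show how far the two can diverge.

```python
from statistics import mean, median

# Hypothetical comment counts for posts created in a single hour:
# most posts get a handful of comments, one gets several hundred.
comment_counts = [2, 3, 5, 1, 4, 400]

hour_mean = mean(comment_counts)      # 415 / 6, dominated by the outlier
hour_median = median(comment_counts)  # middle of the sorted values
print(hour_mean)
print(hour_median)  # 3.5
```

Running the same comparison on the real per-hour comment counts would show whether the 15:00 figure reflects typical posts or a few exceptional ones.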

**V. Calculating the number of comments for Show HN posts during each hour of the day**

In [49]:
show_posts_by_hour = {}
show_comments_by_hour = {}

for row in show_hn_posts:
    show_post_time = row[6]
    show_post_comments = row[4]
    show_post_comments = int(show_post_comments)
    
    show_post_time = dt.datetime.strptime(show_post_time, "%m/%d/%Y %H:%M")
    show_post_hour = show_post_time.strftime("%H")
    
    if show_post_hour not in show_posts_by_hour:
        show_posts_by_hour[show_post_hour] = 1
        show_comments_by_hour[show_post_hour] = show_post_comments
    else: 
        show_posts_by_hour[show_post_hour] += 1
        show_comments_by_hour[show_post_hour] += show_post_comments

In [50]:
show_posts_by_hour

{'14': 86,
 '22': 46,
 '18': 61,
 '07': 26,
 '20': 60,
 '05': 19,
 '16': 93,
 '19': 55,
 '15': 78,
 '03': 27,
 '17': 93,
 '06': 16,
 '02': 30,
 '13': 99,
 '08': 34,
 '21': 47,
 '04': 26,
 '11': 44,
 '12': 61,
 '23': 36,
 '09': 30,
 '01': 28,
 '10': 36,
 '00': 31}

In [51]:
show_comments_by_hour

{'14': 1156,
 '22': 570,
 '18': 962,
 '07': 299,
 '20': 612,
 '05': 58,
 '16': 1084,
 '19': 539,
 '15': 632,
 '03': 287,
 '17': 911,
 '06': 142,
 '02': 127,
 '13': 946,
 '08': 165,
 '21': 272,
 '04': 247,
 '11': 491,
 '12': 720,
 '23': 447,
 '09': 291,
 '01': 246,
 '10': 297,
 '00': 487}

In [52]:
avg_showhn_by_hour = []

for hour in show_posts_by_hour:
    avg_showhn_by_hour.append([hour, show_comments_by_hour[hour]/show_posts_by_hour[hour]])

# Put the average first so that sorted() orders the rows by it
sort_avg_showhn_by_hour = []

for row in avg_showhn_by_hour:
    sort_avg_showhn_by_hour.append([row[1], row[0]])

sort_avg_showhn_by_hour = sorted(sort_avg_showhn_by_hour, reverse = True)

print("Top 5 Best Times for Creating Show HN Posts:")
print('\n')

for row in sort_avg_showhn_by_hour[0:5]:
    average = row[0]
    hour = row[1]

    # Reformat the two-digit hour string, e.g. '18' -> '18:00'
    hour_format = "%H"
    hour_ob = dt.datetime.strptime(hour, hour_format)
    hour = hour_ob.strftime("%H:%M")

    hour_string = "{h}: {a:.2f} average comments per post".format(h = hour, a = average)
    print(hour_string)

Top 5 Best Times for Creating Show HN Posts:


18:00: 15.77 average comments per post
00:00: 15.71 average comments per post
14:00: 13.44 average comments per post
23:00: 12.42 average comments per post
22:00: 12.39 average comments per post


Show HN posts created at 18:00, 00:00, 14:00, 23:00, and 22:00 (EST) receive the most comments on average, so these hours look like the better times to create a Show HN post.
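
The Ask HN and Show HN analyses follow the same steps: group by hour, average, sort, and take the top five. They could therefore share one function. The following is a sketch of how that might look, not the code used to produce the results above; the name `top_hours` is hypothetical, and the last sample row is invented so that one hour contains two posts.

```python
import datetime as dt

def top_hours(posts, n=5):
    """Return the n hours with the highest average comments per post.

    Each result is an (hour, average) tuple such as ('14:00', 12.0).
    """
    counts = {}
    totals = {}
    for row in posts:
        created = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
        hour = created.strftime("%H:00")
        counts[hour] = counts.get(hour, 0) + 1
        totals[hour] = totals.get(hour, 0) + int(row[4])
    averages = [(hour, totals[hour] / counts[hour]) for hour in counts]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)[:n]

sample_posts = [
    ['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'],
    ['0', 'Show HN: hypothetical post', '', '1', '2', 'someone', '11/25/2015 14:30'],  # invented row
]

best = top_hours(sample_posts, n=2)
print(best)  # [('22:00', 102.0), ('14:00', 12.0)]
```

With the full lists, `top_hours(ask_hn_posts)` and `top_hours(show_hn_posts)` would replace the two near-identical blocks of code.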

**Summary**

In this project, I've discovered that:

1. On Hacker News, users seem to be more active on the "Ask HN" page than on the "Show HN" page: the dataset contains more Ask HN posts (1,744 vs. 1,162), and they receive more comments per post on average (about 14.04 vs. 10.32).
2. To aim for higher engagement, create an Ask HN post at 15:00, 02:00, 20:00, 16:00, or 21:00 (EST), or a Show HN post at 18:00, 00:00, 14:00, 23:00, or 22:00 (EST), since posts created during these hours receive more comments on average.
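
Readers outside the Eastern time zone need to convert these hours before using them. The sketch below converts the recommended Ask HN hours to UTC. It takes the timestamps to be in EST as stated above and assumes a fixed offset of UTC-5, so it ignores daylight saving time; the function name `est_hour_to_utc` is hypothetical.

```python
import datetime as dt

# EST is UTC-5; a fixed offset is an assumption that ignores daylight saving time.
EST = dt.timezone(dt.timedelta(hours=-5), name="EST")

def est_hour_to_utc(hour):
    """Convert an hour of the day in EST (0-23) to the matching UTC hour."""
    # The date is arbitrary; only the hour matters for this conversion.
    moment = dt.datetime(2016, 1, 15, hour, 0, tzinfo=EST)
    return moment.astimezone(dt.timezone.utc).hour

utc_hours = [est_hour_to_utc(h) for h in (15, 2, 20, 16, 21)]
print(utc_hours)  # [20, 7, 1, 21, 2]
```

Hours that pass midnight, such as 20:00 EST becoming 01:00 UTC, fall on the following day.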