# Determining The Best Posts To Make On Hacker News

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted on and commented on, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

We can take advantage of that popularity by finding out which types of posts receive the most comments and which time of day is best for posting them. Posts we make in the future should then have a better chance of ranking high in the Hacker News listings.

In [1]:
# Read in the data.
import csv

with open("hacker_news.csv") as f:
    reader = csv.reader(f)
    hn = list(reader)

In [2]:
# Remove 1st row from data and save it into headers.
headers = hn[0]
del hn[0]

In [3]:
# Filter posts starting with "Ask HN" or "Show HN" 
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [4]:
# Compute the average number of comments "Ask HN" posts receive.
total_ask_comments = 0

for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

14.038417431192661


In [5]:
# Compute the average number of comments "Show HN" posts receive.
total_show_comments = 0

for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)

10.31669535283993


"Ask HN" posts receive more comments on average than "Show HN" posts (about 14.04 versus 10.32 per post). Because of this, we will only analyze "Ask HN" posts from here on.
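The two averaging cells above run the same loop twice. As a minimal sketch, that loop could be factored into a helper. The name `average_comments` and the sample rows are hypothetical and purely illustrative; the only assumption carried over from the cells above is that the comment count sits at index 4 of each row.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count of `posts`, or 0.0 for an empty list."""
    if not posts:
        # Avoid a ZeroDivisionError when a category has no posts.
        return 0.0
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# Made-up rows for illustration only (not taken from hacker_news.csv).
# Title at index 1, comment count at index 4, timestamp at index 6.
sample_posts = [
    ["1", "Ask HN: example one", "", "5", "10", "user_a", "8/4/2016 11:52"],
    ["2", "Ask HN: example two", "", "3", "4", "user_b", "8/4/2016 15:10"],
]

sample_avg = average_comments(sample_posts)
print(sample_avg)
```

With the real data, `average_comments(ask_posts)` and `average_comments(show_posts)` would be expected to reproduce the two averages printed above.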

In [6]:
# Import the datetime module.
import datetime as dt

In [7]:
# Create a list of lists from posts in ask_posts. 
# [[Date and Time, Number of Comments]]
result_list = []

for post in ask_posts:
    num_comments = int(post[4])
    datetime_created = post[6]
    result_list.append([datetime_created, num_comments])

# For each hour in result_list, find the number of posts and comments
# made for that hour.
# {Hour: Count}
counts_by_hour = {}
comments_by_hour = {}

for result in result_list:
    # The field after the colon is minutes (%M), not seconds (%S).
    date = dt.datetime.strptime(result[0], "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = result[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += result[1]

# Calculate the average number of comments for posts created for
# each hour.
# [[Hour, Average Number of Comments]]
avg_by_hour = []

for hour in counts_by_hour:
    post_count = counts_by_hour[hour]
    comment_count = comments_by_hour[hour]
    
    avg_comments = comment_count / post_count
    avg_by_hour.append([hour, avg_comments])

In [8]:
# Swap the columns of each row in avg_by_hour.
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

# Sort the rows by average comments in descending order.
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask Posts Comments\n")
# Print the first five rows in the following format:
# "Hour:Minute 00.00 average comments per post"
for row in sorted_swap[:5]:
    output = "{}: {:.2f} average comments per post"
    date_obj = dt.datetime.strptime(row[1], "%H")
    # Format as hour:minute (%M); %S would mean seconds.
    hour = date_obj.strftime("%H:%M")
    output_formatted = output.format(hour, row[0])
    print(output_formatted + "\n")

Top 5 Hours for Ask Posts Comments

15:00: 38.59 average comments per post

02:00: 23.81 average comments per post

20:00: 21.52 average comments per post

16:00: 16.80 average comments per post

21:00: 16.01 average comments per post



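Assuming the timestamps are in US Eastern Standard Time, "Ask HN" posts created during the 3:00 PM hour (15:00) receive the most comments on average, about 38.59 per post. Posting at that time should therefore give an "Ask HN" post the best chance of attracting comments.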
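Readers in other time zones need to translate these hours before acting on them. The following is a rough sketch of that conversion to UTC. It assumes, as above, that the timestamps are Eastern Standard Time (a fixed UTC-5 offset) and it ignores daylight saving time. The helper name `est_hour_to_utc` is hypothetical.

```python
import datetime as dt

# Assumption: timestamps are US Eastern Standard Time, a fixed UTC-5 offset.
# Daylight saving time is deliberately not handled in this sketch.
EST = dt.timezone(dt.timedelta(hours=-5), name="EST")


def est_hour_to_utc(hour):
    """Convert an hour of the day in EST to "HH:MM" in UTC."""
    # The date is arbitrary; only the hour of day matters here.
    local = dt.datetime(2016, 1, 1, hour, 0, tzinfo=EST)
    return local.astimezone(dt.timezone.utc).strftime("%H:%M")


best_hour_utc = est_hour_to_utc(15)
print(best_hour_utc)
```

Under that assumption, the 15:00 EST slot corresponds to 20:00 UTC.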