# Hacker News - Maximising Engagement

Hacker News is a site started by the startup incubator [Y Combinator](https://www.ycombinator.com/). Much like Reddit, users submit posts, which can receive votes and comments. The site is popular in technology and startup circles, and posts that reach the top of the Hacker News listings can reach hundreds of thousands of visitors.

In this project, we compare two types of posts, `Ask HN` and `Show HN`. Users submit `Ask HN` posts to request guidance from the Hacker News community, and `Show HN` posts to share projects, products or anything else of interest.

Our aim is to determine which type of post receives more comments and to explore which times of day are best for posting to maximise the number of comments a post receives. To do so, we must answer the following questions:
- Between `Ask HN` and `Show HN` posts, which receives more comments on average?
- How many comments do posts receive on average, depending on the hour at which they were created?

It is important to note that the dataset we are working with was reduced from almost 300,000 rows to approximately 20,000 rows by removing submissions that did not receive any comments, and then randomly sampling from the remaining submissions. 

In [1]:
# Read in Data and remove headers

from csv import reader

# Use a context manager so the file is closed once it has been read
with open('hacker_news.csv') as f:
    hn = list(reader(f))
headers = hn[0]
hn = hn[1:]

In [2]:
# Identify posts that begin with `Ask HN` and `Show HN`, separate into lists and tally totals

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
    
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [3]:
# Compare the average number of comments on Ask HN posts and Show HN posts

total_ask_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print(avg_ask_comments)

total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4])
    
avg_show_comments = total_show_comments/len(show_posts)
print(avg_show_comments)

14.038417431192661
10.31669535283993


Based on our sample, `Ask HN` posts receive approximately 14 comments per post on average, whereas `Show HN` posts receive approximately 10. Because `Ask HN` posts attract more comments on average, we focus the rest of the analysis on them.
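The two loops in the cell above perform the same calculation on different lists. As a minimal sketch, the calculation could be factored into one helper. The function name `average_comments` and the sample rows are hypothetical, and the sketch assumes, as the cells above do, that the comment count is stored at index 4 of each row.

```python
def average_comments(posts, comment_index=4):
    """Return the mean comment count for a list of rows (hypothetical helper)."""
    if not posts:
        # Avoid dividing by zero when a category has no posts.
        return 0.0
    return sum(int(row[comment_index]) for row in posts) / len(posts)


# Made-up rows for illustration only; they are not taken from the dataset.
sample_posts = [
    ["1", "Ask HN: Example A", "", "5", "10", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: Example B", "", "2", "4", "user_b", "8/16/2016 15:10"],
]

print(average_comments(sample_posts))
```

With the real data, the same helper would be called once with `ask_posts` and once with `show_posts`.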

In [4]:
# Calculate number of posts created in each hour of the day and average number of comments received by hour created

import datetime as dt

result_list = []

for row in ask_posts:
    result_list.append([row[6], int(row[4])])
    
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for row in result_list:
    date = row[0]
    comment = row[1]
    time = dt.datetime.strptime(date, date_format).strftime("%H")
    if time in counts_by_hour:
        counts_by_hour[time] += 1
        comments_by_hour[time] += comment
    else:
        counts_by_hour[time] = 1
        comments_by_hour[time] = comment

avg_by_hour = []
for time in comments_by_hour:
    avg_by_hour.append([time, comments_by_hour[time]/counts_by_hour[time]])
        
print(avg_by_hour)

[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]
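The grouping step above keeps two dictionaries in sync by hand. One possible alternative, sketched here under the assumption that each row is a `[created_at, num_comments]` pair in the same date format, uses `collections.defaultdict` so that the `if`/`else` branch is not needed. The function name `average_comments_by_hour` and the sample rows are hypothetical.

```python
import datetime as dt
from collections import defaultdict


def average_comments_by_hour(rows, date_format="%m/%d/%Y %H:%M"):
    """Map each creation hour ("00".."23") to its mean comment count."""
    counts = defaultdict(int)   # number of posts created in each hour
    totals = defaultdict(int)   # total comments on posts created in each hour
    for created_at, num_comments in rows:
        hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
        counts[hour] += 1
        totals[hour] += num_comments
    return {hour: totals[hour] / counts[hour] for hour in counts}


# Made-up rows for illustration only; they are not taken from the dataset.
sample_rows = [
    ["8/16/2016 9:55", 6],
    ["8/17/2016 9:05", 2],
    ["8/16/2016 15:10", 10],
]

print(average_comments_by_hour(sample_rows))
```

Returning a dictionary rather than a list of lists is a design choice for this sketch; `list(result.items())` would give pairs similar to `avg_by_hour`.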


In [5]:
# Sorting the list and showing 5 best hours to post

swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)    

print("Top 5 Hours for Ask Posts Comments")

for avg, hr in sorted_swap[:5]:
    print(
        "{}, {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"), avg
        )
    )

Top 5 Hours for Ask Posts Comments
15:00, 38.59 average comments per post
02:00, 23.81 average comments per post
20:00, 21.52 average comments per post
16:00, 16.80 average comments per post
21:00, 16.01 average comments per post
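Swapping the columns before sorting works, but `sorted` also accepts a `key` function, which avoids building a second list. The sketch below uses a small hand-copied subset of the averages printed earlier (rounded approximately), so the variable names and values are for illustration only.

```python
import datetime as dt

# Approximate values copied from the earlier output, for illustration only.
avg_by_hour_sample = [["09", 5.58], ["15", 38.59], ["02", 23.81], ["20", 21.52]]

# Sort on the average (the second element of each pair), highest first.
top_hours = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)

for hr, avg in top_hours[:3]:
    label = dt.datetime.strptime(hr, "%H").strftime("%H:%M")
    print("{}, {:.2f} average comments per post".format(label, avg))
```

One difference from the swap approach: if two hours had exactly the same average, the swapped list would break the tie on the hour string, whereas sorting with `key` keeps tied items in their original order.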


### Conclusion

In this project, we analysed a sample of `Ask HN` and `Show HN` posts on Hacker News to determine which category receives more comments on average and at what time of day posts attract the most comments. `Ask HN` posts averaged about 14 comments per post, compared with about 10 for `Show HN` posts. Among `Ask HN` posts, those created between 15:00 and 16:00 (3pm to 4pm EST) received the most comments on average, at about 38.6 per post. Adjusted to GMT, the timezone used in the UK, that is between 8pm and 9pm.

However, as our sample excluded posts without any comments, it would be more accurate to say that the above holds true *of the posts that received comments*. 
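The EST to GMT adjustment quoted in the conclusion can be checked with the standard library. This is a minimal sketch that assumes a fixed offset of UTC-5 for EST and therefore ignores daylight saving time. The helper name `est_hour_to_gmt` and the date used are arbitrary choices for illustration.

```python
import datetime as dt

# Assumption: EST is treated as a fixed UTC-5 offset (no daylight saving).
EST = dt.timezone(dt.timedelta(hours=-5), name="EST")
GMT = dt.timezone.utc


def est_hour_to_gmt(hour):
    """Convert a whole hour in EST to the matching GMT time as "HH:MM"."""
    # The date is arbitrary; only the hour matters with fixed offsets.
    local = dt.datetime(2016, 1, 15, hour, 0, tzinfo=EST)
    return local.astimezone(GMT).strftime("%H:%M")


print(est_hour_to_gmt(15))  # 20:00
print(est_hour_to_gmt(16))  # 21:00
```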