# Hacker News Post Popularity Analysis

Hacker News is a site where user-submitted stories are voted and commented on, in a similar fashion to Reddit.

In this project, we analyse historical Hacker News post data to identify the characteristics of popular posts.

In [1]:
from csv import reader

# Read every row of the dataset into a list; the with-block closes the file afterwards
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

In [2]:
print(hn[0:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [3]:
# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(hn[:10])
# The first row printed is now a post, so the header row has been removed

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'], ['10482257', 'Title II kills investment? Comcast and other ISPs are now spending more', 'http

In [4]:
ask_posts = []
show_posts = []
other_posts = []

In [5]:
for post in hn:
    title = post[1]
    if title.startswith('Ask HN'):
        ask_posts.append(post)
    elif title.startswith('Show HN'):
        show_posts.append(post)
    else:
        other_posts.append(post)
print(f'ask_posts has length {len(ask_posts)}.')
print(f'show_posts has length {len(show_posts)}.')
print(f'other_posts has length {len(other_posts)}.')

ask_posts has length 1742.
show_posts has length 1161.
other_posts has length 17197.


In [6]:
# Next, we determine whether ask posts or show posts receive more comments on average.

In [7]:
def avg_comment_nums(input_list, index):
    """Print and return the average number of comments for the rows in input_list.

    index is the position of the comment-count column in each row.
    """
    total_num_comments = 0
    for row in input_list:
        comment_num = int(row[index])
        total_num_comments += comment_num
    avg_num_comments = total_num_comments / len(input_list)
    print(f'The average number of comments for the type of posts is {avg_num_comments:.2f}')
    return avg_num_comments

In [8]:
avg_comment_nums(ask_posts,4)

The average number of comments for the type of posts is 14.04


14.044776119402986

In [9]:
avg_comment_nums(show_posts,4)

The average number of comments for the type of posts is 10.32


10.324720068906116

We can see that, on average, Ask Posts receive more comments than Show Posts (14.04 versus 10.32 comments per post).

Next, we will determine how the creation time for the Ask Posts may affect the number of comments the post receives. This will be done through the following steps:

1. Calculate the total number of Ask Posts created in each hour of the day.
2. Calculate the total number of comments made on these posts.
3. Calculate the average number of comments the Ask Posts receive, grouped by the creation time.
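Before running these steps on the full dataset, the logic can be checked on a handful of rows. The following is only a minimal sketch: the rows in `sample_posts` are made up for illustration (they are not taken from the dataset), and the names `posts_per_hour`, `comments_per_hour` and `avg_per_hour` are hypothetical. It uses `collections.defaultdict`, a different choice from the plain dictionaries used in the analysis itself.

```python
from collections import defaultdict
import datetime as dt

# Made-up sample rows of (created_at, num_comments); NOT taken from the dataset.
sample_posts = [
    ('8/4/2016 11:52', 6),
    ('1/26/2016 11:05', 2),
    ('6/23/2016 22:20', 10),
]

posts_per_hour = defaultdict(int)     # step 1: posts created in each hour
comments_per_hour = defaultdict(int)  # step 2: comments on those posts

for created_at, num_comments in sample_posts:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
    posts_per_hour[hour] += 1
    comments_per_hour[hour] += num_comments

# Step 3: average comments per post, grouped by creation hour.
avg_per_hour = {
    hour: comments_per_hour[hour] / posts_per_hour[hour]
    for hour in posts_per_hour
}
print(avg_per_hour)  # {11: 4.0, 22: 10.0}
```

With `defaultdict(int)`, a missing hour starts at zero, so no `if`/`else` branch is needed when an hour is seen for the first time.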

In [10]:
import datetime as dt
result_list = []
strp_format = '%m/%d/%Y %H:%M'
for post in ask_posts:
    created_at = dt.datetime.strptime(post[6],strp_format)
    comment_num = int(post[4])
    result_list.append([created_at,comment_num])

posts_by_hour = dict()
comments_by_hour = dict()

for row in result_list:
    post_time = row[0].hour
    comment_num = row[1]
    if post_time in posts_by_hour:
        posts_by_hour[post_time] += 1
        comments_by_hour[post_time] += comment_num
    else:
        posts_by_hour[post_time] = 1
        comments_by_hour[post_time] = comment_num

print(posts_by_hour)
# posts_by_hour and comments_by_hour now hold the number of posts and comments grouped by hour.

{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 108, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 54, 6: 44, 7: 34, 11: 58}


In [11]:
avg_comments_by_hour = []

for hour in posts_by_hour:
    post_num = posts_by_hour[hour]
    comment_num = comments_by_hour[hour]
    avg_comments_by_hour.append([comment_num/post_num,hour])

avg_comments_by_hour.sort(reverse = True)

for row in avg_comments_by_hour:
    hour = dt.datetime.strptime(str(row[1]),'%H')
    hour = hour.strftime('%H:%M')
    print(f'{hour}: {row[0]:.2f} average comments per post.')

15:00: 38.59 average comments per post.
02:00: 23.81 average comments per post.
20:00: 21.52 average comments per post.
16:00: 16.80 average comments per post.
21:00: 16.01 average comments per post.
13:00: 14.74 average comments per post.
10:00: 13.44 average comments per post.
18:00: 13.24 average comments per post.
14:00: 13.23 average comments per post.
17:00: 11.46 average comments per post.
01:00: 11.38 average comments per post.
11:00: 11.05 average comments per post.
19:00: 10.80 average comments per post.
08:00: 10.25 average comments per post.
05:00: 10.09 average comments per post.
12:00: 9.41 average comments per post.
06:00: 9.02 average comments per post.
00:00: 8.13 average comments per post.
23:00: 7.99 average comments per post.
07:00: 7.85 average comments per post.
03:00: 7.80 average comments per post.
04:00: 7.17 average comments per post.
22:00: 6.75 average comments per post.
09:00: 5.58 average comments per post.


The result above shows that Ask Posts created during the 15:00 hour receive the highest number of comments on average (38.59 comments per post), followed by 02:00 posts at 23.81 comments per post and 20:00 posts at 21.52 comments per post.

The [source](https://www.kaggle.com/hacker-news/hacker-news-posts) notes that the timestamps in the dataset are in US Eastern Standard Time (EST), which is 15 hours behind Australian Eastern Standard Time (AEST).

In this project, we have analysed post popularity on Hacker News, measured as the average number of comments, for different types of posts and for posts made at different times of day. Based on our findings, Ask Hacker News (Ask HN) Posts receive more comments on average than Show HN Posts. Among the Ask HN Posts, those posted at 15:00 EST (06:00 AEST the following day) receive the most comments on average.
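The 15-hour offset can be applied with `datetime` rather than by hand. The following is a minimal sketch: it assumes fixed offsets of UTC-5 for EST and UTC+10 for AEST and ignores daylight saving time, and the helper `est_hour_to_aest` is a hypothetical name introduced only for this illustration.

```python
import datetime as dt

# Fixed UTC offsets; daylight saving time is ignored (an assumption).
EST = dt.timezone(dt.timedelta(hours=-5), 'EST')
AEST = dt.timezone(dt.timedelta(hours=10), 'AEST')

def est_hour_to_aest(hour):
    """Return the AEST clock time ('HH:MM') for a whole hour given in EST."""
    # The date is arbitrary; only the time of day matters here.
    est_time = dt.datetime(2016, 1, 1, hour, 0, tzinfo=EST)
    return est_time.astimezone(AEST).strftime('%H:%M')

# Convert the three hours with the highest average comment counts.
for hour in (15, 2, 20):
    print(f'{hour:02d}:00 EST -> {est_hour_to_aest(hour)} AEST')
```

Note that the converted time can fall on the following calendar day: 15:00 EST corresponds to 06:00 AEST on the next day.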