# Hacker News: Ask HN v. Show HN

This is a review of Hacker News data to determine whether Ask HN or Show HN posts receive more comments. It also examines whether creating a post at a certain time of day leads to more comments than average.


In [4]:
from csv import reader

In [52]:
# open, read, and convert the CSV file into a list
# the with block closes the file once it has been read
with open('hacker_news.csv') as open_file:
    hn = list(reader(open_file))

The Hacker News data contains a header row. For ease of use, the header is stored in the headers list and removed from the rest of the data. The data is then split into three buckets: ask, show, and other. The ask and show posts are the focus of this analysis; the other bucket holds everything outside its scope.

In [6]:
# split the header row into its own list
headers = hn[:1]
hn = hn[1:]
print(headers)
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


## Extracting Ask HN and Show HN Posts

In [7]:
# these lists will be used to bucket the data into manageable lists
ask_posts = []
show_posts = []
other_posts = []

In [8]:
# divide the data, list of lists, into one of three buckets: ask, show, other
for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


## Calculate the Average Number of Comments

In [9]:
total_ask_comments = 0
for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments

avg_ask_comments = total_ask_comments/len(ask_posts)

print(avg_ask_comments)

14.038417431192661


In [10]:
total_show_comments = 0
for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments/len(show_posts)

print(avg_show_comments)

10.31669535283993


The data indicates that ask posts receive more comments, on average, than show posts.
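The two cells above repeat the same loop for each bucket. A minimal sketch of one way to factor it out, assuming a hypothetical helper named average_comments (not part of the original notebook) and that the comment count sits at index 4 of each row, as in the header printed earlier:

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments for a list of post rows.

    average_comments and comments_index are illustrative names,
    not part of the original notebook.
    """
    if not posts:
        return 0.0  # assumption: report 0 for an empty bucket
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# made-up rows in the same column order as the dataset
sample_posts = [
    ['1', 'Ask HN: example one', '', '5', '10', 'user_a', '8/4/2016 11:52'],
    ['2', 'Ask HN: example two', '', '3', '20', 'user_b', '8/4/2016 12:10'],
]
print(average_comments(sample_posts))
```

With the notebook's own lists, the same helper could be called as average_comments(ask_posts) and average_comments(show_posts).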

## Finding the Number of Ask Comments by Hour

In [19]:
# import datetime to isolate hour of the day for posts
import datetime as dt

In [20]:
# make a list holding the datetime of the post and number of comments
result_list = []
for post in ask_posts:
    created = post[6]
    comments = int(post[4])
    result_list.append([created, comments])

In [21]:
# dictionaries are easier to use for sorting and reviewing
counts_by_hour = {}
comments_by_hour = {}

In [22]:
# keys in both dictionaries are the hour in which the user created the post
# counts_by_hour values are the number of posts created in that hour
# comments_by_hour values are the total number of comments those posts received
for result in result_list:
    created_dt = dt.datetime.strptime(result[0], '%m/%d/%Y %H:%M')
    created_hour = created_dt.hour
    if created_hour not in counts_by_hour:
        counts_by_hour[created_hour] = 1
        comments_by_hour[created_hour] = int(result[1])
    else:
        counts_by_hour[created_hour] += 1
        comments_by_hour[created_hour] += int(result[1])
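
The if/else bookkeeping above can also be written with collections.defaultdict, which starts every missing key at zero. This is only a sketch of an alternative, not the notebook's method; tally_by_hour and the sample rows are hypothetical names and made-up values used for illustration:

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(results, fmt='%m/%d/%Y %H:%M'):
    """Count posts and sum comments per creation hour.

    results is a list of [created_at_string, num_comments] pairs,
    the same shape as result_list in the notebook.
    """
    posts_per_hour = defaultdict(int)
    comments_per_hour = defaultdict(int)
    for created, comments in results:
        hour = dt.datetime.strptime(created, fmt).hour
        posts_per_hour[hour] += 1
        comments_per_hour[hour] += int(comments)
    return dict(posts_per_hour), dict(comments_per_hour)


# made-up sample in the dataset's date format
sample_results = [
    ['8/4/2016 11:52', 52],
    ['1/26/2016 19:30', 10],
    ['9/30/2015 11:05', 2],
]
posts, comments = tally_by_hour(sample_results)
print(posts)
print(comments)
```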
    

## Calculate the Average Number of Comments by Hour

In [24]:
# make a list holding the hour and the average number of comments
avg_by_hour = []
for hr in counts_by_hour:
    avg_by_hour.append([hr, (comments_by_hour[hr]/counts_by_hour[hr])])

In [48]:
# swap the two columns so the sorted() function will sort on the avg num of comments
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
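
Swapping the columns works because sorted() compares lists element by element. An alternative sketch, assuming the same [hour, average] layout as avg_by_hour, passes a key to sorted() so the list can stay in its original order. The sample values below are made up for illustration:

```python
from operator import itemgetter

# made-up [hour, average comments] pairs in the same layout as avg_by_hour
sample_avg_by_hour = [[9, 5.5], [15, 38.5], [2, 23.5]]

# sort on the second element (the average), highest first
top = sorted(sample_avg_by_hour, key=itemgetter(1), reverse=True)

for hr, avg in top:
    print('{}: {:.2f}'.format(hr, avg))
```

Note that the loop unpacks hr before avg here, the reverse of the swapped list.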

## Printing Sorted Values from a List of Lists

In [51]:
# loop through the list of lists and identify the top 5 times a post will on average see the most comments
# convert from EST to MST, because I live in MST
print("Top 5 Hours for Ask Posts Comments.")
for avg, hr in sorted_swap[:5]:
    print(
        '{}: {:.2f} average comments per post.'.format(
            dt.datetime.strptime(str((hr - 2) % 24), "%H").strftime("%H:%M"), avg
            )
        )

Top 5 Hours for Ask Posts Comments.
13:00: 38.59 average comments per post.
00:00: 23.81 average comments per post.
18:00: 21.52 average comments per post.
14:00: 16.80 average comments per post.
19:00: 16.01 average comments per post.


## Conclusion

The times above are in Mountain Standard Time, which is two hours behind Eastern time. The data shows that, on average, a post created at 1:00 in the afternoon receives more comments than a post created at any other hour. Posts created in that hour average 38.59 comments, roughly 60% more than the next-best hour (23.81).
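
One reading of the roughly 60% figure is the gap between the top hour and the runner-up in the printed output. A quick check under that assumption, using the two averages copied from the output above:

```python
top_avg = 38.59        # 13:00 MST, from the printed output
runner_up_avg = 23.81  # 00:00 MST, from the printed output

# relative increase of the top hour over the runner-up, in percent
pct_more = (top_avg / runner_up_avg - 1) * 100
print('{:.0f}% more comments than the runner-up hour'.format(pct_more))
```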

On a broader note, the top times fall at 1-2 pm, 6-7 pm, and midnight, which roughly lines up with after lunch, after dinner, and late-night browsing. It should also be noted that these are only the posts that received comments.