# Hacker News - Ask vs Show
This notebook works through the public Hacker News data and analyses whether Ask HN or Show HN posts receive more comments.
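
The Ask/Show split below depends on a case-insensitive test of each title's prefix. Here is a minimal sketch of that rule as a standalone helper. `classify_post` is a hypothetical name that does not appear in the notebook, and the example titles are made up.

```python
def classify_post(title):
    """Return 'ask', 'show' or 'other' from the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Hypothetical example titles, not rows taken from hacker_news.csv
for example in ['Ask HN: How do you back up your laptop?',
                'SHOW HN: My weekend project',
                'A regular link post']:
    print(example, '->', classify_post(example))
```

Lower-casing before `startswith` means "Ask HN", "ASK HN" and "ask hn" are all treated the same, which matches the loop in the notebook.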

In [11]:
from csv import reader
import datetime as dt

In [12]:
# Using a context manager closes the file once the rows have been read
with open('hacker_news.csv') as file:
    hn = list(reader(file))

In [13]:
headers = hn[0]
hn = hn[1:]
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [14]:
print(hn[:4])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [15]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    if row[1].lower().startswith('ask hn'):
        ask_posts.append(row)
    elif row[1].lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('ask_posts: ',len(ask_posts), ' show_posts: ', len(show_posts), ' other_posts: ', len(other_posts))


ask_posts:  1744  show_posts:  1162  other_posts:  17194


In [16]:
total_ask_comments = 0

for q in ask_posts:
    total_ask_comments += int(q[4])
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

total_show_comments = 0
for s in show_posts:
    total_show_comments += int(s[4])
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)


14.038417431192661
10.31669535283993


### Analysis Thought
It appears that Ask HN posts receive more comments on average than Show HN posts: about 14.0 comments per post compared with 10.3.
In addition, there are more Ask HN posts than Show HN posts in the dataset (1,744 compared with 1,162).
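
The two averaging loops above follow the same pattern. The sketch below folds that pattern into one reusable function. `average_comments` is a hypothetical helper, it assumes the `hacker_news.csv` column layout (`num_comments` at index 4), and the toy rows contain made-up values.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column across rows; 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Toy rows with the same column layout as hacker_news.csv (values are invented)
toy_ask = [['1', 'Ask HN: a', '', '5', '10', 'u1', '1/1/2016 10:00'],
           ['2', 'Ask HN: b', '', '3', '20', 'u2', '1/1/2016 11:00']]
toy_show = [['3', 'Show HN: c', '', '8', '6', 'u3', '1/2/2016 9:00']]

ask_avg = average_comments(toy_ask)
show_avg = average_comments(toy_show)
print(ask_avg, show_avg, ask_avg / show_avg)
```

Applied to the real `ask_posts` and `show_posts` lists, the ratio of the two averages printed above is roughly 14.04 / 10.32 ≈ 1.36. Returning 0.0 for an empty list is a design choice that avoids a `ZeroDivisionError`.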

# At which hour of creation do Ask HN posts attract the most comments?
This next section builds a frequency table of the hour at which each Ask HN post was created, along with the total number of comments for each hour. The per-hour averages suggest when to post to collect the most comments quickly and move up in the "hot lists".
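
The frequency table can also be built with `collections.defaultdict`, which removes the need for the `if d in ...: / else:` branches. This is a minimal sketch. `hourly_totals` is a hypothetical helper, it assumes the column layout and `created_at` format of `hacker_news.csv`, and the toy rows are invented.

```python
from collections import defaultdict
import datetime as dt

DT_FORMAT = '%m/%d/%Y %H:%M'  # matches created_at values such as '8/4/2016 11:52'


def hourly_totals(posts, comments_index=4, created_index=6):
    """Return (posts_per_hour, comments_per_hour) dicts keyed by hour 0-23."""
    counts = defaultdict(int)    # missing hours start at 0
    comments = defaultdict(int)
    for row in posts:
        hour = dt.datetime.strptime(row[created_index], DT_FORMAT).hour
        counts[hour] += 1
        comments[hour] += int(row[comments_index])
    return dict(counts), dict(comments)


# Invented rows; strptime accepts unpadded values such as '0:01'
toy_rows = [['1', 'Ask HN: a', '', '5', '10', 'u1', '8/4/2016 11:52'],
            ['2', 'Ask HN: b', '', '3', '4', 'u2', '8/4/2016 11:05'],
            ['3', 'Ask HN: c', '', '1', '7', 'u3', '6/17/2016 0:01']]
counts, comments = hourly_totals(toy_rows)
print(counts, comments)
```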

In [17]:
result_list = []
for post in ask_posts:
    result_list.append([int(post[4]), post[6]])

In [49]:
counts_by_hour, comments_by_hour = {},{}
dtformat = '%m/%d/%Y %H:%M'

for r in result_list:
    d = dt.datetime.strptime(r[1], dtformat).hour
    
    if d in counts_by_hour:
        counts_by_hour[d] += 1
    else:
        counts_by_hour[d] = 1
        
    if d in comments_by_hour:
        comments_by_hour[d] += r[0]
    else:
        comments_by_hour[d] = r[0]

Next, we calculate the average number of comments per post for each hour of the day.
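
Computing the averages and ranking them can also be done without swapping each pair. You can sort with a `key` function instead. This is a minimal sketch. `average_by_hour` and `top_hours` are hypothetical helpers, and the input dicts reuse a few of the hourly totals printed by the cell below.

```python
def average_by_hour(counts, comments):
    """Average comments per post for every hour present in counts."""
    return {hour: comments[hour] / counts[hour] for hour in counts}


def top_hours(averages, n=5):
    """(hour, average) pairs sorted by average, highest first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


# A subset of the per-hour totals shown in this notebook's output
counts = {0: 55, 2: 58, 15: 116, 20: 80}
comments = {0: 447, 2: 1381, 15: 4477, 20: 1722}

averages = average_by_hour(counts, comments)
for hour, avg in top_hours(averages, n=3):
    print(f'{hour:02d}:00: {avg:.2f} average comments per post')
```

Sorting on `item[1]` ranks the pairs by average directly. The `[avg, hour]` list used further down is then unnecessary.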

In [62]:
avg_by_hour = []

for h in counts_by_hour:
    print('Hour', h, ': ', comments_by_hour[h], ' / ', counts_by_hour[h], ' = ', comments_by_hour[h] / counts_by_hour[h])
    avg_by_hour.append([h, comments_by_hour[h] / counts_by_hour[h]])    

Hour 0 :  447  /  55  =  8.127272727272727
Hour 1 :  683  /  60  =  11.383333333333333
Hour 2 :  1381  /  58  =  23.810344827586206
Hour 3 :  421  /  54  =  7.796296296296297
Hour 4 :  337  /  47  =  7.170212765957447
Hour 5 :  464  /  46  =  10.08695652173913
Hour 6 :  397  /  44  =  9.022727272727273
Hour 7 :  267  /  34  =  7.852941176470588
Hour 8 :  492  /  48  =  10.25
Hour 9 :  251  /  45  =  5.5777777777777775
Hour 10 :  793  /  59  =  13.440677966101696
Hour 11 :  641  /  58  =  11.051724137931034
Hour 12 :  687  /  73  =  9.41095890410959
Hour 13 :  1253  /  85  =  14.741176470588234
Hour 14 :  1416  /  107  =  13.233644859813085
Hour 15 :  4477  /  116  =  38.5948275862069
Hour 16 :  1814  /  108  =  16.796296296296298
Hour 17 :  1146  /  100  =  11.46
Hour 18 :  1439  /  109  =  13.20183486238532
Hour 19 :  1188  /  110  =  10.8
Hour 20 :  1722  /  80  =  21.525
Hour 21 :  1745  /  109  =  16.009174311926607
Hour 22 :  479  /  71  =  6.746478873239437
Hour 23 :  543  /  68  =  7.985294117647059

In [63]:
swap_avg_by_hour = []

for hour in avg_by_hour:
    swap_avg_by_hour.append([hour[1],hour[0]])

In [66]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [78]:
print("Top 5 Hours for Ask Posts Comments")
template = '{h}: {avg:.2f} average comments per post'
for avg, hour in sorted_swap[:5]:
    # Format the integer hour as HH:MM, e.g. 2 -> '02:00'
    print(template.format(h=f'{hour:02d}:00', avg=avg))

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
