# Exploring Hacker News Posts
For this project, we'll compare two types of posts from Hacker News, a website where technology-related stories are voted and commented on. The two types of posts we'll analyze are those whose titles begin with Ask HN or Show HN.

We're working with approximately 20,000 rows from this dataset. Posts that did not receive any comments were removed.

In [3]:
import csv

# Use a context manager so the file is closed once it has been read
with open('hacker_news.csv') as open_file:
    hn = list(csv.reader(open_file))
hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [4]:
# Separate the header row from the data rows
headers = hn[0]
hn = hn[1:]
print(headers[:5])

['id', 'title', 'url', 'num_points', 'num_comments']


We'll need to identify posts that begin with either Ask HN or Show HN and separate them into different lists:

In [5]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()  # lowercase once so the prefix match is case-insensitive
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


In [7]:
#Calculate the average number of comments 'Ask HN' posts receive.
total_ask_comments = 0

for i in ask_posts:
    total_ask_comments += int(i[4])
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

14.038417431192661


In [8]:
#Same process for 'Show HN'
total_show_comments = 0

for i in show_posts:
    total_show_comments += int(i[4])
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)

10.31669535283993


On average (in this sample), ask posts receive approximately 14 comments, while show posts receive approximately 10. Because ask posts receive more comments on average, we'll focus the rest of the analysis on them.
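The two averaging cells above repeat the same loop, so the calculation could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original analysis: the function name `average_comments`, its `comments_index` parameter, and the rows in `sample_posts` are all hypothetical and only mirror the column layout of `hacker_news.csv`.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the num_comments column, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical rows (not taken from the dataset), same column order as the CSV:
# id, title, url, num_points, num_comments, author, created_at
sample_posts = [
    ['1', 'Ask HN: Example question?', '', '5', '12', 'user_a', '1/1/2016 9:30'],
    ['2', 'Ask HN: Another example?', '', '3', '4', 'user_b', '1/2/2016 15:05'],
]

print(average_comments(sample_posts))  # 8.0
```

With the real data, `average_comments(ask_posts)` and `average_comments(show_posts)` would replace the two loops, which also avoids accidentally reusing one variable name for both results.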

In [11]:
# Count the ask posts created during each hour of the day and total the comments they received
import datetime as dt

result_list = []
for i in ask_posts:
    result_list.append([i[6], int(i[4])])
    
counts_by_hour = {}
comments_by_hour = {}
date_format = '%m/%d/%Y %H:%M'

for i in result_list:
    date = i[0]
    comment = i[1]
    time = dt.datetime.strptime(date, date_format).strftime("%H") #just select hour
    if time in counts_by_hour:
        comments_by_hour[time] += comment
        counts_by_hour[time] += 1
    else:
        comments_by_hour[time] = comment
        counts_by_hour[time] = 1

comments_by_hour
counts_by_hour

{'00': 55,
 '01': 60,
 '02': 58,
 '03': 54,
 '04': 47,
 '05': 46,
 '06': 44,
 '07': 34,
 '08': 48,
 '09': 45,
 '10': 59,
 '11': 58,
 '12': 73,
 '13': 85,
 '14': 107,
 '15': 116,
 '16': 108,
 '17': 100,
 '18': 109,
 '19': 110,
 '20': 80,
 '21': 109,
 '22': 71,
 '23': 68}
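The two-digit hour keys in the dictionary above come from parsing each `created_at` string and formatting the result back out with `%H`. As a minimal illustration of that single step, the sketch below parses one timestamp from the sample rows shown at the top of the notebook; the variable names `created_at`, `parsed`, and `hour_key` are chosen here for illustration only.

```python
import datetime as dt

date_format = '%m/%d/%Y %H:%M'

# Timestamp copied from one of the sample rows displayed earlier
created_at = '6/17/2016 0:01'

# strptime accepts month, day, and hour values without zero-padding
parsed = dt.datetime.strptime(created_at, date_format)

# strftime('%H') always returns a zero-padded, two-character hour string
hour_key = parsed.strftime('%H')

print(parsed)    # 2016-06-17 00:01:00
print(hour_key)  # 00
```

Using the zero-padded string as the dictionary key is what lets the hours sort correctly as text later on; `parsed.hour` would give the same information as an integer.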

In [14]:
# Calculate the average number of comments per 'Ask HN' post for each hour of the day

avg_by_hour = []

for i in comments_by_hour:
    avg_by_hour.append([i, comments_by_hour[i] / counts_by_hour[i]])

avg_by_hour #read as: first element is hour, second is comments per post

[['06', 9.022727272727273],
 ['03', 7.796296296296297],
 ['09', 5.5777777777777775],
 ['00', 8.127272727272727],
 ['08', 10.25],
 ['22', 6.746478873239437],
 ['17', 11.46],
 ['19', 10.8],
 ['04', 7.170212765957447],
 ['13', 14.741176470588234],
 ['05', 10.08695652173913],
 ['12', 9.41095890410959],
 ['01', 11.383333333333333],
 ['14', 13.233644859813085],
 ['02', 23.810344827586206],
 ['10', 13.440677966101696],
 ['21', 16.009174311926607],
 ['15', 38.5948275862069],
 ['11', 11.051724137931034],
 ['07', 7.852941176470588],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['18', 13.20183486238532],
 ['20', 21.525]]

In [15]:
swap_avg_by_hour = []

for i in avg_by_hour:
    swap_avg_by_hour.append([i[1], i[0]])
    
print(swap_avg_by_hour)

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

sorted_swap

[[9.022727272727273, '06'], [7.796296296296297, '03'], [5.5777777777777775, '09'], [8.127272727272727, '00'], [10.25, '08'], [6.746478873239437, '22'], [11.46, '17'], [10.8, '19'], [7.170212765957447, '04'], [14.741176470588234, '13'], [10.08695652173913, '05'], [9.41095890410959, '12'], [11.383333333333333, '01'], [13.233644859813085, '14'], [23.810344827586206, '02'], [13.440677966101696, '10'], [16.009174311926607, '21'], [38.5948275862069, '15'], [11.051724137931034, '11'], [7.852941176470588, '07'], [16.796296296296298, '16'], [7.985294117647059, '23'], [13.20183486238532, '18'], [21.525, '20']]


[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
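Swapping the columns is one way to sort by the average. An alternative sketch, assuming the same `[hour, average]` pair layout, passes a `key` function to `sorted` so the pairs keep their original order. The list `avg_by_hour_sample` below holds three pairs copied from the output above and stands in for the full `avg_by_hour` list.

```python
# Three [hour, average] pairs copied from the avg_by_hour output
avg_by_hour_sample = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
]

# Sort by the second element (the average), highest first
sorted_by_avg = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in sorted_by_avg:
    print("{}:00 {:.2f}".format(hour, avg))
```

One difference from the swap approach: when two hours have the same average, `sorted` with a `key` keeps them in their input order, whereas sorting the swapped lists with `reverse=True` breaks the tie on the hour string in descending order.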

In [16]:
# Print the 5 hours with the highest average comments per post (sorted_swap is already in descending order)

print("Top 5 Hours for 'Ask HN' Comments")
for avg, hr in sorted_swap[:5]:
    print(
        "{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"), avg
        )
    )

Top 5 Hours for 'Ask HN' Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


# Conclusion
In this project, we analyzed ask posts and show posts to determine which type of post, and which posting hour, receives the most comments on average. Based on this sample, to maximize the number of comments a post receives, we recommend submitting it as an Ask HN post between 15:00 and 16:00 (3:00 pm to 4:00 pm EST).
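Readers in other time zones may want to translate the recommended hour. The sketch below is an illustration under two assumptions that the notebook does not verify: that the `created_at` timestamps are in Eastern Standard Time, as the conclusion states, and that EST can be treated as a fixed UTC-5 offset (daylight saving time is ignored). The names `EASTERN_STANDARD`, `best_hour`, and `as_utc` are hypothetical, and the date used is arbitrary.

```python
import datetime as dt

# Assumption: a fixed UTC-5 offset for EST, with no daylight saving adjustment
EASTERN_STANDARD = dt.timezone(dt.timedelta(hours=-5), 'EST')

# The top hour from the analysis (15:00), attached to an arbitrary date
best_hour = dt.datetime(2016, 1, 1, 15, 0, tzinfo=EASTERN_STANDARD)

# Convert to UTC; any other tzinfo object could be passed instead
as_utc = best_hour.astimezone(dt.timezone.utc)

print(as_utc.strftime('%H:%M'))  # 20:00
```

For dates when daylight saving time applies, a region-aware time zone (for example from the standard library's `zoneinfo` module) would be more accurate than a fixed offset.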