# Guided Project - Exploring Hacker News Posts

This project investigates which posts on Hacker News attract the most engagement. We compare whether Ask HN or Show HN posts receive more comments on average, and then check whether Ask HN posts created at certain hours receive more comments on average.

Conclusion: Ask HN posts receive more comments on average than Show HN posts (about 14.04 vs. 10.32 per post), and Ask HN posts created around 15:00 - 16:00 ET, 20:00 - 21:00 ET, and 02:00 ET receive the most comments on average.

In [1]:
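from csv import reader

# Read the whole CSV into a list of rows; the with block closes the file afterwards.
with open("hacker_news.csv") as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

# Keep the header row separately (as a one-row list of lists), then drop it from the data.
headers = hn[:1]
print(headers)
print('\n')
hn = hn[1:]
print(hn[:5])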

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
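
The cells that follow pick columns by position (`row[1]`, `row[4]`, `row[6]`). As a minimal sketch, the header row printed above can be turned into a name-to-index lookup so those positions are easier to read. The names `column_index` and `sample_row` are illustrative and not part of the original notebook.

```python
# Header row exactly as printed above.
header = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position in a row (illustrative helper name).
column_index = {name: position for position, name in enumerate(header)}

# First data row from the output above.
sample_row = ['12224879', 'Interactive Dynamic Video',
              'http://www.interactivedynamicvideo.com/', '386', '52',
              'ne0phyte', '8/4/2016 11:52']

title = sample_row[column_index['title']]
num_comments = int(sample_row[column_index['num_comments']])
print(title, num_comments)
```

With this lookup, `column_index['title']`, `column_index['num_comments']`, and `column_index['created_at']` correspond to the indices 1, 4, and 6 used in the rest of the notebook.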


### Extracting Ask HN and Show HN Posts

In [2]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print("# of Ask Posts: " + str(len(ask_posts)))
print("# of Show Posts: " + str(len(show_posts)))
print("# of Other Posts: " + str(len(other_posts)))

print(ask_posts[:3])
print(show_posts[:3])
print(other_posts[:3])

# of Ask Posts: 1744
# of Show Posts: 1162
# of Other Posts: 17194
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']]
[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']]
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How t
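
The prefix test in the loop above could also be wrapped in a small function, which makes the rule easy to check on individual titles. This is only a sketch; `classify_title` is a hypothetical helper name, and the sample titles are taken from the outputs above.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on a case-insensitive title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Titles copied from the printed rows above.
sample_titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

labels = [classify_title(title) for title in sample_titles]
print(labels)
```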

### Clean Up Data

In [3]:
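# Left commented out: csv.reader returns every field as a string, so the type
# check below would be true for every row, and `del row` only unbinds the loop
# variable without removing anything from ask_posts.

# print(len(ask_posts))

# for row in ask_posts:
#     num_comments = row[4]
#     if type(num_comments) != int:
#         del row

# print(len(ask_posts))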
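
One way a cleanup step like the commented-out cell above could work, as a sketch: since `csv.reader` yields every field as a string, a row can be kept only if its `num_comments` field parses as an integer. Building a new list avoids deleting from a list while iterating over it. The helper `has_integer_comments` and the malformed row are invented for illustration and do not come from the dataset.

```python
def has_integer_comments(row):
    """Return True if the num_comments field (index 4) parses as an int."""
    try:
        int(row[4])
    except ValueError:
        return False
    return True

sample_rows = [
    # Real row from the Ask HN output above.
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6',
     'ahmedbaracat', '8/16/2016 9:55'],
    # Hypothetical malformed row, invented to exercise the filter.
    ['0', 'Ask HN: placeholder', '', '1', 'n/a', 'nobody', '1/1/2016 0:00'],
]

# Keep only rows whose comment count is a valid integer.
clean_rows = [row for row in sample_rows if has_integer_comments(row)]
print(len(sample_rows), len(clean_rows))
```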

### Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [4]:
total_ask_comments = 0

for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments

avg_ask_comments = total_ask_comments / len(ask_posts)
print("Avg Ask Comments: ")
print(avg_ask_comments)

total_show_comments = 0

for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments

avg_show_comments = total_show_comments / len(show_posts)
print("Avg Show Comments: ")
print(avg_show_comments)

Avg Ask Comments: 
14.038417431192661
Avg Show Comments: 
10.31669535283993


Based on the averages above, Ask HN posts tend to receive more comments per post than Show HN posts (about 14.04 vs. 10.32). This may be because a question invites replies more directly than a post that showcases a project.
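
The two loops above repeat the same computation for each list. A minimal sketch of a shared helper follows; `average_comments` is a hypothetical name, and the three sample rows are the first Ask HN rows printed earlier, so the result here is only the mean of that small sample, not of the full dataset.

```python
def average_comments(posts):
    """Mean of the num_comments column (index 4); returns 0.0 for an empty list."""
    if not posts:
        return 0.0
    return sum(int(post[4]) for post in posts) / len(posts)

# First three Ask HN rows from the output earlier in this notebook.
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6',
     'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1',
     'polskibus', '5/2/2016 10:14'],
]

# (6 + 29 + 1) / 3 for the sample rows.
print(average_comments(sample_ask))
```

Calling the same function on the full `ask_posts` and `show_posts` lists should reproduce the two averages printed above.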

### Finding the Number of Ask Posts and Comments by Hour Created

In [5]:
import datetime as dt

result_list = []

for row in ask_posts:
    create_date = row[6]
    num_comments = int(row[4])
    result_list.append([create_date, num_comments])
    
print(result_list[:5])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]


In [6]:
counts_by_hour = {}
comments_by_hour = {}

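for row in result_list:
    created_at = row[0]
    # Parse the timestamp string into a datetime, then keep only the zero-padded hour.
    created_dt = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    hour = created_dt.strftime("%H")

    # Track the number of posts and the total comments for each hour.
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]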

print("# of Posts Each Hour")
print(counts_by_hour)
print('\n')
print("# of Comments for Ask Posts Each Hour")
print(comments_by_hour)


# of Posts Each Hour
{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}


# of Comments for Ask Posts Each Hour
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
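
As an alternative sketch of the same tally, `collections.defaultdict` removes the need for the `if hour not in ...` branch, and the per-hour average can be derived in one step. The variable names are illustrative, and the input is only the first five pairs printed from `result_list`, so the numbers are not the full-dataset results.

```python
import datetime as dt
from collections import defaultdict

# First five [created_at, num_comments] pairs from result_list above.
sample_results = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1],
                  ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]

posts_per_hour = defaultdict(int)     # missing hours start at 0
comments_per_hour = defaultdict(int)

for created_at, num_comments in sample_results:
    # Parse the timestamp and keep the zero-padded hour, e.g. '09'.
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    posts_per_hour[hour] += 1
    comments_per_hour[hour] += num_comments

# Average comments per post for each hour that has at least one post.
average_per_hour = {hour: comments_per_hour[hour] / posts_per_hour[hour]
                    for hour in posts_per_hour}

print(dict(posts_per_hour))
print(average_per_hour)
```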


### Calculating the Average Number of Comments for Ask HN Posts by Hour

In [7]:
avg_by_hour = []

# Iterating over the dictionary yields its keys, which are the hour strings.
for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

print(avg_by_hour)

[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]


### Sorting and Printing Values from a List of Lists

In [8]:
swap_avg_by_hour = []

for row in avg_by_hour:
    hour = row[0]
    avg_comments = row[1]
    swap_avg_by_hour.append([avg_comments, hour])

print("Swap Avg: ")
print(swap_avg_by_hour)
print("\n")

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print("Sorted Swap Avg: ")
print(sorted_swap)

Swap Avg: 
[[5.5777777777777775, '09'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [16.796296296296298, '16'], [7.985294117647059, '23'], [9.41095890410959, '12'], [11.46, '17'], [38.5948275862069, '15'], [16.009174311926607, '21'], [21.525, '20'], [23.810344827586206, '02'], [13.20183486238532, '18'], [7.796296296296297, '03'], [10.08695652173913, '05'], [10.8, '19'], [11.383333333333333, '01'], [6.746478873239437, '22'], [10.25, '08'], [7.170212765957447, '04'], [8.127272727272727, '00'], [9.022727272727273, '06'], [7.852941176470588, '07'], [11.051724137931034, '11']]


Sorted Swap Avg: 
[[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [13.20183486238532, '18'], [11.46, '17'], [11.383333333333333, '01'], [11.051724137931034, '11'], [10.8, '19'], [10.25, '08'], [10.08695652173913, '05
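
Swapping the columns is one way to sort by the average. As a sketch of another option, `sorted` accepts a `key` function, so the `[hour, average]` pairs can be ranked directly. The sample below uses four pairs copied from `avg_by_hour`; `ranked` is an illustrative name.

```python
# A few [hour, average] pairs copied from avg_by_hour above.
sample_avg_by_hour = [['09', 5.5777777777777775], ['15', 38.5948275862069],
                      ['20', 21.525], ['02', 23.810344827586206]]

# Sort by the average (second element), highest first, without swapping columns.
ranked = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg_comments in ranked:
    print("{}:00: {:.2f} average comments per post".format(hour, avg_comments))
```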

In [9]:
print("Top 5 Hours for Ask Posts Comments")

for avg in sorted_swap[:5]:
    time = avg[1]
    avg_comments = avg[0]
    print(time + ":00: " + "{:.2f}".format(avg_comments) + " average comments per post")
    

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


I've found that Ask HN posts created at 15:00, 02:00, and 20:00 receive the most comments on average, in that order. Based on the top 5 hours, I would group my recommendations into 3 ranges:

1) 15:00 - 16:00 (Afternoon)
2) 20:00 - 21:00 (Evening)
3) 02:00 (Early Morning)

If you post around these times (ET), your Ask HN post may have a better chance of getting engagement.

### Possible Next Steps:
- Determine if Show HN or Ask HN posts receive more points on average.
- Determine if posts created at a certain time are more likely to receive more points.
- Compare your results to the average number of comments and points other posts receive.
- Use Dataquest's data science project style guide to format your project.
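
As a starting point for the first next step, here is a hedged sketch that averages the `num_points` column (index 3) instead of `num_comments`. The helper `average_points` is hypothetical, and the sample rows are the first three Ask HN and Show HN rows printed earlier, so the output says nothing about the full dataset; the full `ask_posts` and `show_posts` lists would need to be passed in for a real comparison.

```python
def average_points(posts):
    """Mean of the num_points column (index 3); returns 0.0 for an empty list."""
    if not posts:
        return 0.0
    return sum(int(post[3]) for post in posts) / len(posts)

# First three rows of each category, copied from the outputs earlier in the notebook.
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6',
     'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1',
     'polskibus', '5/2/2016 10:14'],
]
sample_show = [
    ['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform',
     'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'],
    ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/',
     '747', '102', 'dhotson', '11/29/2015 22:46'],
    ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm',
     'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'],
]

print("Sample Ask avg points:", average_points(sample_ask))
print("Sample Show avg points:", average_points(sample_show))
```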