# Hacker News Post Popularity

We compare Ask HN and Show HN posts on Hacker News to answer two questions: (a) which of the two post types receives more comments on average, and (b) which posting hours attract the most comments.

### Set-up

First we open the file to be analyzed and separate the header row from the data rows. Please pardon the language in some of the post titles!

In [2]:
from csv import reader
openfile = open('hacker_news.csv')
readfile = reader(openfile)
hn = list(readfile)
print(hn[1:6])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [3]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
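
The loading step above can also be written with a context manager, so the file handle is closed automatically. The sketch below is one possible way to do it; the helper name `load_rows` and the `utf-8` encoding are assumptions for illustration and are not part of the original notebook.

```python
from csv import reader

def load_rows(path, encoding="utf-8"):
    """Return (headers, rows) from a CSV file. Hypothetical helper."""
    # newline="" is the setting the csv module documentation recommends
    # for files handed to csv.reader
    with open(path, newline="", encoding=encoding) as f:
        rows = list(reader(f))
    # The first row holds the column names; the rest is data
    return rows[0], rows[1:]
```

With the dataset in the working directory, `headers, hn = load_rows('hacker_news.csv')` would give the same two variables as the cells above.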


### Separating Post Types

Next we separate post types into three lists: ask posts, show posts, other posts. We generate a count of posts for each category.

In [16]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):  # lowercase first so the match is case-insensitive
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print('ask posts:',len(ask_posts))
print('show posts:',len(show_posts))
print('other posts:',len(other_posts))

ask posts: 1744
show posts: 1162
other posts: 17194
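
The classification rule can be pulled into a small function so it is easy to check on a few titles. This is only a sketch: `post_type` is a hypothetical helper name, and the sample titles (apart from the first dataset row) are made up for illustration.

```python
def post_type(title):
    """Classify a title as 'ask', 'show', or 'other' (hypothetical helper)."""
    lowered = title.lower()  # case-insensitive prefix match
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Illustrative titles only; the last one is the first row of the dataset
sample_titles = [
    "Ask HN: How do you learn?",
    "SHOW HN: My side project",
    "Interactive Dynamic Video",
]
sample_types = [post_type(t) for t in sample_titles]
print(sample_types)  # ['ask', 'show', 'other']
```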


### Calculating Average Comments

Now we calculate the average number of comments for the two categories (ask and show). For each category, we initialize a counter, accumulate the total number of comments in a for loop, and then divide that total by the number of posts in the category.

On average, ask posts generate more comments than show posts: about 14 vs. 10 per post.

In [14]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments = total_ask_comments + num_comments
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print('Average ask comments: ',avg_ask_comments)

total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments = total_show_comments + num_comments

avg_show_comments = total_show_comments/len(show_posts)
print('Average show comments: ', avg_show_comments)

Average ask comments:  14.038417431192661
Average show comments:  10.31669535283993
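
Because the same calculation runs once per category, it could be factored into one function. The following is a minimal sketch under assumptions: `average_comments` is a hypothetical helper, and the two sample rows are invented (only their dates and comment counts echo values seen in the dataset).

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column; 0.0 for an empty list (hypothetical helper)."""
    if not posts:
        return 0.0  # avoid dividing by zero when a category has no posts
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Made-up rows in the dataset's column order
sample_posts = [
    ['1', 'Ask HN: A', '', '5', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: B', '', '3', '29', 'user_b', '11/22/2015 13:43'],
]
print(average_comments(sample_posts))  # 17.5
```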


### Get Comments per Date for Ask Posts

This is the first step toward calculating posts and comments per hour. For each ask post, we append its creation date and its number of comments to a list.

In [20]:
import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at,num_comments])

print(result_list[0:10])


[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17], ['9/26/2015 23:23', 1], ['4/22/2016 12:24', 4], ['11/16/2015 9:22', 1], ['2/24/2016 17:57', 1], ['6/4/2016 17:17', 2]]


### Calculate Posts and Comments per Hour

Now that we have each post's creation date and comment count in a list, we parse the hour from each date and build two dictionaries: one counting posts per hour and one totaling comments per hour.

In [27]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = row[0]
    date = dt.datetime.strptime(date,'%m/%d/%Y %H:%M')
    hour = date.strftime('%H')
    comments = row[1]
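    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        # else, not a second if: a second if would also run right after the
        # branch above and count the first post of each hour twice.
        # The averages printed later in this notebook came from the earlier
        # double-counting version, so rerunning with this fix shifts them slightly.
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments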
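
The same tallies can be built with `collections.defaultdict`, which removes the need to check whether an hour key already exists. This is a sketch rather than the notebook's own method; the five sample pairs are copied from the `result_list` output above, and the variable names are assumptions.

```python
import datetime as dt
from collections import defaultdict

# Five [created_at, num_comments] pairs taken from the printed result_list
sample = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
    ['8/2/2016 14:20', 3],
    ['11/16/2015 9:22', 1],
]

counts = defaultdict(int)    # posts per hour, missing keys start at 0
comments = defaultdict(int)  # total comments per hour

for created_at, n in sample:
    parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    hour = parsed.strftime('%H')  # zero-padded hour, e.g. '09'
    counts[hour] += 1
    comments[hour] += n

print(dict(counts))    # {'09': 2, '13': 1, '10': 1, '14': 1}
print(dict(comments))  # {'09': 7, '13': 29, '10': 1, '14': 3}
```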

### Calculate Average Comments per Hour

Now we calculate the average number of comments per post for each hour and append each [hour, average] pair to a list.

In [31]:
avg_by_hour = []

for hour in comments_by_hour:
    average = comments_by_hour[hour]/counts_by_hour[hour]
    avg_by_hour.append([hour,average])

print(avg_by_hour)

[['09', 5.586956521739131], ['13', 14.906976744186046], ['10', 13.233333333333333], ['14', 13.13888888888889], ['16', 16.798165137614678], ['23', 7.884057971014493], ['12', 9.337837837837839], ['17', 11.356435643564357], ['15', 38.27350427350427], ['21', 15.9], ['20', 21.28395061728395], ['02', 23.45762711864407], ['18', 13.1], ['03', 7.672727272727273], ['05', 10.48936170212766], ['19', 10.72972972972973], ['01', 11.737704918032787], ['22', 6.680555555555555], ['08', 10.142857142857142], ['04', 7.083333333333333], ['00', 8.160714285714286], ['06', 8.844444444444445], ['07', 7.685714285714286], ['11', 10.898305084745763]]


### Swap Column Positions

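We swap the columns so that the average comes first in each pair, which lets us sort the list by average. We do this with a for loop that appends the swapped columns to a new list.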

In [None]:
swap_avg_by_hour = []

for lst in avg_by_hour:
    avg = lst[1]
    hr = lst[0]
    swap_avg_by_hour.append([avg,hr])
    
print(swap_avg_by_hour)
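
As an alternative to swapping columns, `sorted` accepts a `key` function that picks the value to sort on. The sketch below shows that approach on three rows copied from the printed `avg_by_hour` output; the names `sample_avg_by_hour` and `ranked` are assumptions for illustration.

```python
# Three [hour, average] rows copied from the avg_by_hour output above
sample_avg_by_hour = [
    ['09', 5.586956521739131],
    ['15', 38.27350427350427],
    ['02', 23.45762711864407],
]

# Sort on the second element (the average), largest first
ranked = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)
print(ranked[0])  # ['15', 38.27350427350427]
```

Sorting on a key leaves each row in its original [hour, average] order, so no second list is needed.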


### Review Top 5 Hours to Post

Now we sort our swapped list with averages descending, then restrict this sorted list to the first five rows to review the best hours to post.

In [42]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

print("Top 5 Hours for Ask Posts Comments:")

for row in sorted_swap[:5]:
    hour = row[1]
    avg = row[0]
    hour_dt = dt.datetime.strptime(hour,'%H')
    hour_final = hour_dt.strftime('%H:%M')
    string = '{}: {:.2f} average comments per post'
    print(string.format(hour_final,avg))
    

Top 5 Hours for Ask Posts Comments:
15:00: 38.27 average comments per post
02:00: 23.46 average comments per post
20:00: 21.28 average comments per post
16:00: 16.80 average comments per post
21:00: 15.90 average comments per post
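
The 24-hour labels in the output above can be converted to 12-hour clock labels before they are quoted in prose. This is a small sketch; `to_12_hour` is a hypothetical helper, and the arithmetic is done by hand to avoid locale-dependent am/pm formatting.

```python
def to_12_hour(hour_str):
    """Convert a zero-padded 24-hour string such as '15' to '3pm' (hypothetical helper)."""
    hour = int(hour_str)
    suffix = 'am' if hour < 12 else 'pm'
    # hour % 12 is 0 at midnight and noon, which should read as 12
    return '{}{}'.format(hour % 12 or 12, suffix)

# The five top hours, in the order printed above
top_hours = ['15', '02', '20', '16', '21']
labels = [to_12_hour(h) for h in top_hours]
print(labels)  # ['3pm', '2am', '8pm', '4pm', '9pm']
```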


### Findings

Ask posts are more popular than show posts, averaging about 14 comments per post vs. about 10.

Ask posts created at 3pm, 2am, 8pm, 4pm, and 9pm (in the time zone of the dataset's `created_at` column) receive the most comments per post on average.