# Hacker News Post Project

***Hacker News is a collection of user-submitted posts that can attract hundreds of thousands of visitors and commenters. This project loads and analyzes a dataset of posts from the site in order to answer a couple of basic questions:***

#### 1.) Do Ask HN or Show HN Posts Receive More Comments?

*(Ask HN posts seek insight from the community, while Show HN posts aim to inform it.)*

#### 2.) Does Time of Post Affect Comment Count?

In [1]:
import csv

# Read the CSV into a list of lists; the with block closes the file automatically
with open('hacker_news.csv') as file:
    read_file = csv.reader(file)
    hn = list(read_file)

hn[:5]


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [2]:
headers = hn[0]
hn = hn[1:]

print(headers)
print(hn[:5])


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


In [3]:
ask_posts, show_posts, other_posts = [],[],[]

for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('Ask Posts: ', "{:,}".format(len(ask_posts)))
print('Show Posts: ', "{:,}".format(len(show_posts)))
print('Other Posts: ', "{:,}".format(len(other_posts)))

Ask Posts:  1,744
Show Posts:  1,162
Other Posts:  17,194


In [4]:
def total_a_avg_com(a_list):
    total_coms = 0
    for row in a_list:
        n_com = int(row[4])
        total_coms = total_coms + n_com
    avg_com = total_coms / len(a_list)
    return total_coms, avg_com

total_ask_comments,avg_ask_comments = total_a_avg_com(ask_posts)
total_show_comments,avg_show_comments = total_a_avg_com(show_posts)
total_other_comments,avg_other_comments = total_a_avg_com(other_posts)

print("\nTotal Ask Comments: ", "{:,}".format(total_ask_comments))
print("Average Ask Comments: ","{0:.2f}".format(avg_ask_comments))  
print("\nTotal Show Comments: ", "{:,}".format(total_show_comments))
print("Average Show Comments: ","{0:.2f}".format(avg_show_comments))  
print("\nTotal Other Comments: ", "{:,}".format(total_other_comments))
print("Average Other Comments: ","{0:.2f}".format(avg_other_comments))  




Total Ask Comments:  24,483
Average Ask Comments:  14.04

Total Show Comments:  11,988
Average Show Comments:  10.32

Total Other Comments:  462,055
Average Other Comments:  26.87


**On average, Ask HN posts receive about 36% more comments than Show HN posts (14.04 vs. 10.32).**

Using the Ask HN posts, let's look at the effect of posting time on comment counts.
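
As a quick check on the comparison above, the relative difference can be computed directly from the two averages. This is a minimal sketch: the values are copied from the printed output, and `percent_more` is a hypothetical helper that is not part of the original notebook.

```python
# Averages copied from the printed output above
avg_ask_comments = 14.04
avg_show_comments = 10.32


def percent_more(a, b):
    """Return how much larger a is than b, as a percentage of b (hypothetical helper)."""
    return (a - b) / b * 100


diff = percent_more(avg_ask_comments, avg_show_comments)
print("Ask HN posts average {:.0f}% more comments than Show HN posts".format(diff))
```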

In [9]:
import datetime as dt

result_list = []
for row in ask_posts:
    created_at = row[6]
    count = int(row[4])
    result_list.append([created_at,count])
    
result_list[:1]


[['8/16/2016 9:55', 6]]

In [18]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_str = row[0]
    date_dt = dt.datetime.strptime(date_str,"%m/%d/%Y %H:%M")
    hour_dt = date_dt.hour
    
    if hour_dt in counts_by_hour:
        counts_by_hour[hour_dt] += 1
        comments_by_hour[hour_dt] += row[1]
    else:
        counts_by_hour[hour_dt] = 1
        comments_by_hour[hour_dt] = row[1]
        
print("\n# of Posts by Hour:  \n" , counts_by_hour)
print("\n# of Comments by Hour \n",comments_by_hour)


# of Posts by Hour:  
 {9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}

# of Comments by Hour 
 {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
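
The same per-hour tallies could also be built without the `if`/`else` branch by using `collections.defaultdict`. The sketch below is one possible variant, not the notebook's method; it runs on a small hypothetical sample in the same `[created_at, num_comments]` shape as `result_list`, and the names `sample_rows`, `posts_per_hour`, and `comments_per_hour` are assumptions made for illustration.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical sample rows; only the first one appears in the notebook output,
# the other two are made up for illustration.
sample_rows = [
    ['8/16/2016 9:55', 6],
    ['8/17/2016 9:10', 4],
    ['8/17/2016 15:30', 20],
]

# Missing keys start at 0, so no membership check is needed
posts_per_hour = defaultdict(int)
comments_per_hour = defaultdict(int)

for created_at, n_comments in sample_rows:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    posts_per_hour[hour] += 1
    comments_per_hour[hour] += n_comments

print(dict(posts_per_hour))     # {9: 2, 15: 1}
print(dict(comments_per_hour))  # {9: 10, 15: 20}
```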


Let's create a list of lists containing the hours during which posts were created and the average number of comments those posts received.

In [45]:
avg_by_hour = []

for hour in counts_by_hour:
    # Average comments per post for this hour, rounded to two decimals
    avg = round(comments_by_hour[hour] / counts_by_hour[hour], 2)
    avg_by_hour.append([hour, avg])

# Sort key: first element of each pair.
# Named first_item so it does not shadow the built-in sorted().
def first_item(pair):
    return pair[0]

avg_by_hour.sort(key=first_item)
print("\nAverage Comment Count per Post by Hour:")
avg_by_hour


Average Comment Count per Post by Hour:


[[0, 8.13],
 [1, 11.38],
 [2, 23.81],
 [3, 7.8],
 [4, 7.17],
 [5, 10.09],
 [6, 9.02],
 [7, 7.85],
 [8, 10.25],
 [9, 5.58],
 [10, 13.44],
 [11, 11.05],
 [12, 9.41],
 [13, 14.74],
 [14, 13.23],
 [15, 38.59],
 [16, 16.8],
 [17, 11.46],
 [18, 13.2],
 [19, 10.8],
 [20, 21.52],
 [21, 16.01],
 [22, 6.75],
 [23, 7.99]]

This is hard to scan for the best hours, so let's swap the columns and sort by average comments per post to find the top 5 hours.

In [58]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])

print(swap_avg_by_hour)

# Sort in place by average comments (now the first element), highest first
swap_avg_by_hour.sort(key=first_item, reverse=True)


[[8.13, 0], [11.38, 1], [23.81, 2], [7.8, 3], [7.17, 4], [10.09, 5], [9.02, 6], [7.85, 7], [10.25, 8], [5.58, 9], [13.44, 10], [11.05, 11], [9.41, 12], [14.74, 13], [13.23, 14], [38.59, 15], [16.8, 16], [11.46, 17], [13.2, 18], [10.8, 19], [21.52, 20], [16.01, 21], [6.75, 22], [7.99, 23]]


In [64]:
# swap_avg_by_hour was sorted in descending order above, so the first five rows are the top 5
sorted_swap_top_5 = swap_avg_by_hour[:5]

# The "ET" label assumes created_at is recorded in US Eastern Time
for row in sorted_swap_top_5:
    hour = dt.time(row[1])
    hour_str = hour.strftime("%H:00")
    print(hour_str,"ET: ",row[0]," average comments per post")

15:00 ET:  38.59  average comments per post
02:00 ET:  23.81  average comments per post
20:00 ET:  21.52  average comments per post
16:00 ET:  16.8  average comments per post
21:00 ET:  16.01  average comments per post
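
The swap-and-sort steps above could also be collapsed into a single call to `heapq.nlargest`, which selects the highest-ranked rows by a key without reordering the original list. The sketch below is an alternative under stated assumptions, not the notebook's method: the `avg_by_hour` values are copied from the output shown earlier, `top_5` is a hypothetical name, and the "ET" label again assumes the timestamps are in US Eastern Time.

```python
import heapq

# [hour, average comments per post], copied from the notebook output
avg_by_hour = [
    [0, 8.13], [1, 11.38], [2, 23.81], [3, 7.8], [4, 7.17], [5, 10.09],
    [6, 9.02], [7, 7.85], [8, 10.25], [9, 5.58], [10, 13.44], [11, 11.05],
    [12, 9.41], [13, 14.74], [14, 13.23], [15, 38.59], [16, 16.8], [17, 11.46],
    [18, 13.2], [19, 10.8], [20, 21.52], [21, 16.01], [22, 6.75], [23, 7.99],
]

# Pick the five rows with the highest average, in descending order
top_5 = heapq.nlargest(5, avg_by_hour, key=lambda pair: pair[1])

for hour, avg in top_5:
    print("{:02d}:00 ET: {:.2f} average comments per post".format(hour, avg))
```

Because `avg_by_hour` keeps its hour-first layout, no swapped copy of the list is needed.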


### On average, Ask HN posts created during the hours listed above received the most comments, with 15:00 ET the clear leader