## Posts with maximum user engagement in 'Hacker News Posts'

__Hacker News__ is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. _Hacker News_ is extremely popular in _technology_ and _startup circles_, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

We will work with a dataset of submissions to _Hacker News_.

__It includes the following columns:__

>_id_: the unique identifier of the post
>
>_title_: the title of the post
>
>_url_: the URL of the item being linked to
>
>_num_points_: the number of upvotes the post received
>
>_num_comments_: the number of comments the post received
>
>_author_: the name of the account that made the post
>
>_created_at_: the date and time the post was made (the time zone is Eastern Time in the US)

We're specifically interested in posts with titles that begin with either _Ask HN_ or _Show HN_. 

Users submit _Ask HN_ posts to ask the _Hacker News_ community a specific question. Below are a few examples:

>Ask HN: How to improve my personal website?
>
>Ask HN: Am I the only one outraged by Twitter shutting down share counts?
>
>Ask HN: Aby recent changes to CSS that broke mobile?

Likewise, users submit _Show HN_ posts to show the _Hacker News_ community a project, product, or just something interesting. Below are a few examples:

>Show HN: Wio Link ESP8266 Based Web of Things Hardware Development Platform
>
>Show HN: Something pointless I made
>
>Show HN: Shanhu.io, a programming playground powered by e8vm

We'll compare these two types of posts to determine the following:

 - Do Ask HN or Show HN posts receive more comments on average?
 - Do posts created at a certain time of day receive more comments on average?


```python
from csv import reader

# Read the whole CSV file (header row included) into a list of lists.
# The `with` block closes the file automatically once it has been read.
with open("hacker_news.csv", newline="") as opened_file:
    hn = list(reader(opened_file))

print(hn[:5])
```

```
[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
```


```python
headers = hn[0]
print(headers)
hn = hn[1:]
print(hn[:5])
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
```
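
As an alternative to slicing off `hn[0]` by hand, the standard library's `csv.DictReader` consumes the header row itself and lets each row be indexed by column name (for example `row["num_comments"]`) instead of by position (`row[4]`). The sketch below assumes a two-row in-memory sample, copied from the output above, in place of `hacker_news.csv`, so that it runs on its own.

```python
import csv
import io

# Header row plus two data rows copied from the output above.
# io.StringIO stands in for open("hacker_news.csv", newline="").
sample_csv = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12\n"
)

# DictReader takes the first row as field names, so no manual header slicing is needed.
rows = list(csv.DictReader(io.StringIO(sample_csv)))

for row in rows:
    # Columns are looked up by name; values are still strings, so convert before arithmetic.
    print(row["title"], "->", int(row["num_comments"]), "comments")
```

The trade-off is readability versus memory: each row becomes a dict, which is clearer to read but heavier than a plain list.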


```python
# Split the posts into three lists based on the (case-insensitive) title prefix.
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('show hn'):
        show_posts.append(row)
    elif title.lower().startswith('ask hn'):
        ask_posts.append(row)
    else:
        other_posts.append(row)

print("ask_posts no: ", len(ask_posts))
print("show_posts no: ", len(show_posts))
print("other_posts no: ", len(other_posts))
```

```
ask_posts no:  1744
show_posts no:  1162
other_posts no:  17194
```
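
The same split can be written as a small function that returns a label for each title, which makes the prefix rule easy to test in isolation, with `collections.Counter` doing the tallying. `classify_post` is a hypothetical helper name. The titles are taken from the examples earlier in this section, with one lowercased on purpose to show that the match ignores case.

```python
from collections import Counter

def classify_post(title):
    """Hypothetical helper: label a title as 'ask', 'show' or 'other'.

    Matching is case-insensitive, like the loop above.
    """
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

titles = [
    "Ask HN: How to improve my personal website?",
    "Show HN: Something pointless I made",
    "show hn: Shanhu.io, a programming playground powered by e8vm",
    "Interactive Dynamic Video",
]

# Tally how many titles fall into each category.
counts = Counter(classify_post(t) for t in titles)
print(counts)
```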


```python
# row[4] is the num_comments column; convert it from str to int before summing.
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
print("avg_ask_comments : ", avg_ask_comments)
```

```
avg_ask_comments :  14.038417431192661
```


```python
# Same calculation for the Show HN posts.
total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])

avg_show_comments = total_show_comments / len(show_posts)
print("avg_show_comments : ", avg_show_comments)
```

```
avg_show_comments :  10.31669535283993
```
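
The two averaging steps above repeat the same sum-and-divide pattern. Below is a sketch of a shared helper, `avg_comments` (a hypothetical name), assuming the same row layout with `num_comments` at index 4. It returns 0.0 for an empty list instead of raising `ZeroDivisionError`. The two demo rows are made up for illustration only.

```python
def avg_comments(posts, comments_index=4):
    """Hypothetical helper: mean of the num_comments column (index 4 by default).

    Returns 0.0 for an empty list so callers don't hit ZeroDivisionError.
    """
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Two made-up rows in the same column layout as the dataset (illustrative values only).
demo_posts = [
    ["1", "Ask HN: example one", "", "5", "10", "someone", "1/1/2016 15:00"],
    ["2", "Ask HN: example two", "", "3", "4", "someone", "1/1/2016 16:00"],
]
print(avg_comments(demo_posts))
print(avg_comments([]))
```

With the real lists, `avg_comments(ask_posts)` and `avg_comments(show_posts)` should reproduce the two averages printed above.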


#### The average number of comments for 'ask posts': 14.04
#### The average number of comments for 'show posts': 10.32

> #### From the above calculations on the 'Hacker News Posts' dataset, we found that 'ask posts' receive more comments on average than 'show posts' (about 14.04 vs. 10.32 comments per post).


```python
import datetime as dt

# Pair each Ask HN post's creation time (row[6]) with its number of comments (row[4]).
result_list = []
for row in ask_posts:
    created_at_dt = dt.datetime.strptime(row[6], "%m/%d/%Y %H:%M")
    result_list.append([created_at_dt, int(row[4])])

# For each hour of the day (0-23), count the posts and sum their comments.
counts_by_hour = {}
comments_by_hour = {}

for created_at_dt, num_comments in result_list:
    post_hour = created_at_dt.hour
    if post_hour not in counts_by_hour:
        counts_by_hour[post_hour] = 1
        comments_by_hour[post_hour] = num_comments
    else:
        counts_by_hour[post_hour] += 1
        comments_by_hour[post_hour] += num_comments

print(counts_by_hour, "length : ", len(counts_by_hour))
print("****************************")
print(comments_by_hour, "length : ", len(comments_by_hour))
```

```
{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58} length :  24
****************************
{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641} length :  24
```
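
The if/else bookkeeping in the cell above can be shortened with `collections.defaultdict(int)`, which starts every missing key at 0. The sketch groups a few `(created_at, num_comments)` pairs by hour. The pairs are copied from the first rows printed earlier and are just sample input, not Ask HN posts. The names `demo_counts` and `demo_comments` are chosen so they don't overwrite the notebook's real dictionaries.

```python
import datetime as dt
from collections import defaultdict

# (created_at, num_comments) pairs copied from the first rows printed earlier.
pairs = [
    ("8/4/2016 11:52", 52),
    ("1/26/2016 19:30", 10),
    ("6/23/2016 22:20", 1),
    ("6/17/2016 0:01", 1),
    ("9/30/2015 4:12", 2),
]

demo_counts = defaultdict(int)    # number of posts per hour
demo_comments = defaultdict(int)  # total comments per hour

for created_at, n_comments in pairs:
    # "%m/%d/%Y %H:%M" also accepts values without zero padding, such as "0:01".
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    demo_counts[hour] += 1
    demo_comments[hour] += n_comments

print(dict(demo_counts))
print(dict(demo_comments))
```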


```python
# Average number of comments per post for each posting hour.
avg_by_hour = []
for post_hour in counts_by_hour:
    total_posts = counts_by_hour[post_hour]
    total_comments = comments_by_hour[post_hour]
    avg_comments_per_post = total_comments / total_posts
    avg_by_hour.append([post_hour, avg_comments_per_post])

print(avg_by_hour)
```

```
[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]
```
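
Because `counts_by_hour` and `comments_by_hour` share the same keys, the loop above could also be written as one dict comprehension mapping each hour to its average. The sketch runs it on three hours' totals copied from the `counts_by_hour` and `comments_by_hour` output. The `sample_*` names are illustrative.

```python
# Post counts and comment totals for three hours, copied from the output above.
sample_counts = {15: 116, 2: 58, 20: 80}
sample_comments = {15: 4477, 2: 1381, 20: 1722}

# One dict comprehension replaces the loop: hour -> average comments per post.
sample_avg = {hour: sample_comments[hour] / sample_counts[hour] for hour in sample_counts}
print(sample_avg)
```

A dict is convenient for lookups by hour. To rank the hours afterwards, iterate over `sample_avg.items()`, which yields `(hour, average)` pairs.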


```python
# Build `swap_avg_by_hour` by reversing each [hour, average] pair in
# `avg_by_hour`, so that the average number of comments per post comes
# first and the posting hour comes second.
swap_avg_by_hour = []
for entry in avg_by_hour:
    swap_avg_by_hour.append([entry[1], entry[0]])
print(swap_avg_by_hour)

# Sort `swap_avg_by_hour` in descending order. Lists are compared element
# by element, so this orders the rows by their first element, the average
# number of comments.
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

# Print only the top 10 posting hours by average number of comments.
print("\nTop 10 Hours for 'Ask Posts' Comments:\n")

prnt_format = "{0}: {1:.2f} average comments per post."
for entry in sorted_swap[:10]:
    # Turn an hour such as 15 into the string "15:00".
    post_hour = dt.datetime.strptime(str(entry[1]), "%H").strftime("%H:%M")
    print(prnt_format.format(post_hour, entry[0]))
```

```
[[5.5777777777777775, 9], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [16.796296296296298, 16], [7.985294117647059, 23], [9.41095890410959, 12], [11.46, 17], [38.5948275862069, 15], [16.009174311926607, 21], [21.525, 20], [23.810344827586206, 2], [13.20183486238532, 18], [7.796296296296297, 3], [10.08695652173913, 5], [10.8, 19], [11.383333333333333, 1], [6.746478873239437, 22], [10.25, 8], [7.170212765957447, 4], [8.127272727272727, 0], [9.022727272727273, 6], [7.852941176470588, 7], [11.051724137931034, 11]]

Top 10 Hours for 'Ask Posts' Comments:

15:00: 38.59 average comments per post.
02:00: 23.81 average comments per post.
20:00: 21.52 average comments per post.
16:00: 16.80 average comments per post.
21:00: 16.01 average comments per post.
13:00: 14.74 average comments per post.
10:00: 13.44 average comments per post.
14:00: 13.23 average comments per post.
18:00: 13.20 average comments per post.
17:00: 11.46 average comments per post.
```
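
Swapping the pair order is one way to sort by the average. `sorted` also accepts a `key` function, which ranks the original `[hour, average]` rows directly without building a second list. The sketch below also formats the hour with an f-string instead of the `strptime`/`strftime` round trip. The four rows are copied from the `avg_by_hour` output above.

```python
# Four [hour, average] rows copied from the avg_by_hour output above.
demo_avg_by_hour = [
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [2, 23.810344827586206],
    [20, 21.525],
]

# Sort by the second element (the average), largest first; no swapping needed.
ranked = sorted(demo_avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in ranked[:3]:
    # :02d zero-pads the hour, so 2 becomes "02".
    print(f"{hour:02d}:00: {avg:.2f} average comments per post.")
```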


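_Ask HN posts created at 15:00 (3:00 P.M.) US Eastern Time received the most comments on average: 38.59 per post, well ahead of 02:00 (23.81) and 20:00 (21.52). Eight of the ten best hours fall between 13:00 and 21:00, so the afternoon and evening are generally the best window for posting. These figures are means, so a handful of heavily commented posts can noticeably raise a single hour's average._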

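__US Eastern Time is 9.5 hours behind India Standard Time (IST, UTC+5:30) while daylight saving time is in effect (EDT, UTC-4) and 10.5 hours behind otherwise (EST, UTC-5). So 15:00 ET, the peak hour, corresponds to 00:30 IST the next day during daylight saving time and 01:30 IST otherwise; in general, add 9.5 or 10.5 hours to an Eastern time to get the time in India.__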
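
The Eastern-to-India conversion can be checked with the standard library's `datetime.timezone`. This sketch assumes fixed UTC offsets (EDT = UTC-4, EST = UTC-5, IST = UTC+5:30). The two dates are arbitrary examples, borrowed from rows shown earlier, that fall inside and outside US daylight saving time. A production conversion would more likely use `zoneinfo.ZoneInfo("America/New_York")` so that daylight saving time is handled automatically.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption: avoids needing a tz database for this check).
EDT = timezone(timedelta(hours=-4))              # US Eastern, daylight saving time
EST = timezone(timedelta(hours=-5))              # US Eastern, standard time
IST = timezone(timedelta(hours=5, minutes=30))   # India Standard Time

# 15:00 Eastern on a summer date (EDT) and on a winter date (EST).
summer = datetime(2016, 8, 4, 15, 0, tzinfo=EDT)
winter = datetime(2016, 1, 26, 15, 0, tzinfo=EST)

for label, eastern in [("EDT", summer), ("EST", winter)]:
    india = eastern.astimezone(IST)
    print(label, eastern.strftime("%Y-%m-%d %H:%M"), "->", india.strftime("%Y-%m-%d %H:%M"), "IST")
```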