# Introduction
Hacker News is a site where users can submit posts that are voted on and commented upon by other users. I will focus on two types of posts from Hacker News: Ask HN and Show HN. Ask HN posts ask the Hacker News community specific questions, while Show HN posts show the community a user's project. The top Hacker News posts, which attract the most comments, can bring a large number of visitors to the linked sites. In this project, I will analyze a data set (https://www.kaggle.com/hacker-news/hacker-news-posts) from Kaggle to determine whether Ask HN or Show HN posts receive more comments on average. The data set has been reduced from ~300,000 rows to ~20,000 rows by removing all submissions that did not receive any comments.

The first steps are:
1. Import the data.
2. Separate the header row from the rest of the list.

In [32]:
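from csv import reader

# Read the whole CSV into a list of rows; the with-block closes the file afterwards
with open('hacker_news.csv') as open_file:
    hn = list(reader(open_file))

# Split the header row from the data rows
hn_header = hn[0]
hn = hn[1:]
print(hn_header)
print(hn[:5])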


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
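
The cells that follow access columns by position (index 1 for the title, 4 for the comment count, 6 for the creation time). As an alternative, the standard library's `csv.DictReader` keys each row by the header names. The sketch below is not the notebook's own code: the inline `SAMPLE_CSV` string is a two-row stand-in for hacker_news.csv, used only so the example runs on its own.

```python
import csv
import io

# Assumption: a two-row sample standing in for hacker_news.csv.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12
"""

# DictReader takes the first row as the header and maps each later row to it.
dict_reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
posts = list(dict_reader)

first = posts[0]
print(first["title"], int(first["num_comments"]))
```

Named access avoids having to remember column positions, at the cost of slightly more memory per row.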


# Extracting Ask HN and Show HN Posts
Since we're only interested in post titles that begin with Ask HN or Show HN, we'll create new lists of lists containing only the data for those titles. Separating the data this way makes the later analysis easier.


In [47]:
ask_posts = []
show_posts = []
other_posts = []

# Sort each post into a list based on how its title starts (case-insensitive)
for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)

print('ask posts:', len(ask_posts))
print('show posts:', len(show_posts))
print('other posts:', len(other_posts))


ask posts: 1744
show posts: 1162
other posts: 17194
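
The prefix check used to split the posts can be isolated in a small helper, which makes the case-insensitive matching easy to try on individual titles. This is only a sketch: the function name `classify_title` and the sample titles are made up for illustration and do not come from the data set.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    lowered = title.lower()  # lowercasing makes the prefix check case-insensitive
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical titles, not taken from the data set.
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "Interactive Dynamic Video",
]
labels = [classify_title(title) for title in sample_titles]
print(labels)  # ['ask', 'show', 'other']
```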


# Calculating the average number of comments for ask posts and show posts
Let's find the average number of comments on the posts in ask_posts and show_posts and see which of the two types of posts receives more comments on average.

In [54]:
# Finding average number of comments in ask_posts
total_ask_comments = 0
for item in ask_posts:
    total_ask_comments += int(item[4])
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print('ask comments:',  avg_ask_comments)
print('\n')

# Finding average number of comments in show_posts
total_show_comments = 0
for item in show_posts:
    total_show_comments += int(item[4])
    
avg_show_comments = total_show_comments/len(show_posts)
print('show comments:',   avg_show_comments)

ask comments: 14.038417431192661


show comments: 10.31669535283993


From the above results we can see that show posts receive fewer comments on average than ask posts: posts whose titles start with 'Ask HN' average about 14 comments, while posts whose titles start with 'Show HN' average about 10. This conclusion makes sense, since posts that invite a discussion are more likely to receive comments than posts whose goal is to display a user's work.
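
Since the same total-divided-by-count calculation is applied to both lists, it could be factored into one function. The sketch below assumes rows shaped like those in the data set, with the comment count stored as a string at index 4. The name `average_comments` and the two sample rows are hypothetical.

```python
def average_comments(posts, comments_index=4):
    """Average number of comments over a list of rows; 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)

# Hypothetical rows in the same column order as the data set.
sample_posts = [
    ['1', 'Ask HN: A', '', '3', '6', 'user_a', '8/4/2016 11:52'],
    ['2', 'Ask HN: B', '', '1', '10', 'user_b', '8/4/2016 15:10'],
]
print(average_comments(sample_posts))  # 8.0
```

Guarding against an empty list avoids a ZeroDivisionError if a category happens to contain no posts.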



# Calculating the number of ask posts and total number of comments created per hour


In [56]:
import datetime as dt

result_list = []
counts_by_hour = {}
comments_by_hour = {}

# The created_at value is at index 6 of each row
# and the number of comments is at index 4
for post in ask_posts:
    result_list.append([post[6], int(post[4])])
                        
for row in result_list:
    row[0] = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    if row[0].hour not in counts_by_hour:
        counts_by_hour[row[0].hour] = 1
        comments_by_hour[row[0].hour] = row[1]
    else:
        counts_by_hour[row[0].hour] += 1
        comments_by_hour[row[0].hour] += row[1]
print("Number of ask HN posts created by hour ->", counts_by_hour)
print("\n")
print("Number of ask HN comments created by hour ->", comments_by_hour)

Number of ask HN posts created by hour -> {9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


Number of ask HN comments created by hour -> {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
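
The per-hour tallies can also be built with `collections.defaultdict`, which removes the need for the "is the hour already a key?" branch. This is a sketch of that alternative rather than the notebook's approach. The three (created_at, comments) pairs are invented sample values in the same date format as the data set.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs in the data set's date format.
sample = [("8/4/2016 15:10", 6), ("9/30/2015 15:45", 10), ("1/26/2016 2:30", 4)]

counts = defaultdict(int)    # hour -> number of posts
comments = defaultdict(int)  # hour -> total comments

for created_at, num_comments in sample:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    counts[hour] += 1        # missing keys start at 0
    comments[hour] += num_comments

print(dict(counts))    # {15: 2, 2: 1}
print(dict(comments))  # {15: 16, 2: 4}
```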


# Calculating the average number of comments for posts created during each hour of the day


In [66]:
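# avg_by_hour holds [hour, average comments per post] pairs,
# computed from the two dictionaries built in the previous cell
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

# Swap the columns so the average comes first and the list can be sorted by it
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)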


[[5.5777777777777775, 9], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [16.796296296296298, 16], [7.985294117647059, 23], [9.41095890410959, 12], [11.46, 17], [38.5948275862069, 15], [16.009174311926607, 21], [21.525, 20], [23.810344827586206, 2], [13.20183486238532, 18], [7.796296296296297, 3], [10.08695652173913, 5], [10.8, 19], [11.383333333333333, 1], [6.746478873239437, 22], [10.25, 8], [7.170212765957447, 4], [8.127272727272727, 0], [9.022727272727273, 6], [7.852941176470588, 7], [11.051724137931034, 11]]


In [67]:
# Sort from the highest average to the lowest
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print(sorted_swap)

[[38.5948275862069, 15], [23.810344827586206, 2], [21.525, 20], [16.796296296296298, 16], [16.009174311926607, 21], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [13.20183486238532, 18], [11.46, 17], [11.383333333333333, 1], [11.051724137931034, 11], [10.8, 19], [10.25, 8], [10.08695652173913, 5], [9.41095890410959, 12], [9.022727272727273, 6], [8.127272727272727, 0], [7.985294117647059, 23], [7.852941176470588, 7], [7.796296296296297, 3], [7.170212765957447, 4], [6.746478873239437, 22], [5.5777777777777775, 9]]
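
Swapping the columns is one way to sort by the average. Another is to pass a `key` function to `sorted`, which leaves the [hour, average] order untouched. The sketch below uses three pairs rounded from the output above as sample input. It is an alternative for comparison, not the method used in this notebook.

```python
# Sample [hour, average] pairs, rounded from the notebook output.
avg_by_hour_sample = [[9, 5.58], [15, 38.59], [2, 23.81]]

# Sort on the second element (the average), highest first.
top_hours = sorted(avg_by_hour_sample, key=lambda row: row[1], reverse=True)
print(top_hours)  # [[15, 38.59], [2, 23.81], [9, 5.58]]
```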


# Printing the five hours of the day with the highest average number of comments per post

In [74]:
# Print the 5 hours with the highest average comments from the sorted list.

print("5 highest average number of comments")

for avg, hr in sorted_swap[:5]:
    hour = dt.datetime.strptime(str(hr), "%H").strftime("%H:%M")
    print("{}: {:.2f} average comments per post".format(hour, avg))

5 highest average number of comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


# Conclusion

Earlier in this project, it was established that Ask HN posts receive more comments on average than Show HN posts. A user may therefore be more likely to get their post to the top of the Hacker News listing if they create an Ask HN post. After examining the hour of each post and averaging the number of comments per hour, the data suggests that the top 5 hours for Ask HN posts to receive comments are 15:00, 02:00, 20:00, 16:00, and 21:00 ET.
From the results above, we see that Ask HN posts created at 3 PM ET (10 PM in UTC+3 during daylight saving time) receive the most comments on average, at about 38.59 per post.
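
The conversion from 3 PM ET to 10 PM UTC+3 can be checked with fixed-offset timezones from the standard library. This sketch assumes that ET means Eastern Daylight Time (UTC-4), which holds for the arbitrary sample date used here. During standard time (UTC-5) the result would be one hour later.

```python
import datetime as dt

# Assumption: Eastern Daylight Time as a fixed UTC-4 offset.
EDT = dt.timezone(dt.timedelta(hours=-4))
UTC_PLUS_3 = dt.timezone(dt.timedelta(hours=3))

# Arbitrary sample date in August, when daylight saving time applies.
posted = dt.datetime(2016, 8, 4, 15, 0, tzinfo=EDT)
local = posted.astimezone(UTC_PLUS_3)
print(local.strftime("%H:%M"))  # 22:00
```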