# Looking for Ways to Get the Most Comments on Hacker News Posts
Hacker News is a site started by the startup incubator [Y Combinator](https://www.ycombinator.com), where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

* We're specifically interested in posts with titles that begin with either `Ask HN` or `Show HN`. Users submit Ask HN posts to ask the Hacker News community a specific question.

* Likewise, users submit Show HN posts to show the Hacker News community a project, a product, or just something interesting.

```python
# Import the necessary libraries
from csv import reader
import datetime as dt

# Read the CSV file into a list of lists (one inner list per row)
with open("hacker_news.csv", encoding="utf-8", newline="") as opened_file:
    hn = list(reader(opened_file))

# Separate the header row from the data rows
headers = hn[0]
del hn[0]
hn[:5]
```


```
[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]
```
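Indexing rows by position (`row[4]`, `row[6]`) works, but it hides which column is which. A minimal sketch using `csv.DictReader`, which takes the header row as dictionary keys, is shown below. The column names `title`, `num_comments`, and `created_at` are assumptions about the header of `hacker_news.csv` (the header is not printed above), so check them against `headers` first. The sample uses an in-memory string built from two of the rows above so it runs on its own.

```python
import csv
import io

# Hypothetical stand-in for hacker_news.csv: the header names are assumed,
# the two data rows are copied from the output above.
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
10301696,Note by Note: The Making of Steinway L1037 (2007),http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,8,2,walterbell,9/30/2015 4:12
"""

def load_posts(file_obj):
    """Return one dict per data row, keyed by the (assumed) header names."""
    # DictReader consumes the first row as field names, so no `del hn[0]` is needed.
    return list(csv.DictReader(file_obj))

posts = load_posts(io.StringIO(SAMPLE_CSV))
for post in posts:
    print(post["title"], int(post["num_comments"]))
```

With the real file, the same helper would be called as `load_posts(f)` inside `with open("hacker_news.csv", encoding="utf-8", newline="") as f:`.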

# Extracting Ask HN and Show HN Posts

```python
# Split the posts into Ask HN, Show HN, and all other posts
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()  # lower-case so the prefix match ignores case
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Verify that every post landed in exactly one list
print(len(other_posts) + len(show_posts) + len(ask_posts))
print(len(hn))
```

```
20100
20100
```
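The same split can be written as a small classification function combined with `collections.Counter`, which gives the per-category totals whose sum the length check above verifies. This is a sketch: `classify` is a hypothetical helper, the titles are made-up examples, and, like the original loop, it only matches titles that *start* with the prefix.

```python
from collections import Counter

def classify(title):
    """Label a title 'ask', 'show', or 'other' by its case-insensitive prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical titles, for illustration only.
titles = [
    "Ask HN: How do you back up your laptop?",
    "Show HN: A tiny static site generator",
    "Interactive Dynamic Video",
    "ASK HN: Best books on compilers?",
]

# Count how many titles fall into each category.
counts = Counter(classify(t) for t in titles)
print(counts)
```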


# Calculating the Average Number of Comments for Ask HN and Show HN Posts

```python
# Average number of comments per Ask HN post
total_ask_comments = 0

for post in ask_posts:
    num_comments = int(post[4])  # column 4 holds the comment count
    total_ask_comments += num_comments

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_ask_comments
```

```
14.038417431192661
```

```python
# Average number of comments per Show HN post
total_show_comments = 0

for post in show_posts:
    num_comments = int(post[4])  # column 4 holds the comment count
    total_show_comments += num_comments

avg_show_comments = total_show_comments / len(show_posts)
avg_show_comments
```


```
10.31669535283993
```

The analysis above shows that Ask HN posts receive more comments on average than Show HN posts (about 14.0 versus 10.3 comments per post). In other words, questions tend to attract more discussion than project showcases.

The rest of the analysis therefore focuses on Ask HN posts, since they are likely to receive more comments.
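Because the two cells above repeat the same loop, the calculation can be factored into one helper. This is a sketch: `average_comments` is a hypothetical name, it assumes rows laid out as in `hacker_news.csv` with the comment count at index 4, and returning `0.0` for an empty list is an arbitrary choice made here to avoid dividing by zero.

```python
def average_comments(posts, comments_index=4):
    """Mean of the comment-count column (index 4 in hacker_news.csv rows)."""
    if not posts:
        return 0.0  # arbitrary choice: avoid ZeroDivisionError on an empty list
    return sum(int(post[comments_index]) for post in posts) / len(posts)

# Hypothetical rows shaped like hacker_news.csv; only index 4 is used here.
sample_ask = [
    ["1", "Ask HN: a", "", "5", "20", "u", "1/1/2016 15:00"],
    ["2", "Ask HN: b", "", "1", "10", "u", "1/1/2016 02:00"],
]
print(average_comments(sample_ask))
```

Called as `average_comments(ask_posts)` and `average_comments(show_posts)`, it performs the same arithmetic as the two cells above.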

# Finding the Number of Ask Posts and Comments by Hour Created

```python
# [created_at, num_comments] pairs for every Ask HN post
result_list = []
# Number of Ask HN posts created in each hour of the day
counts_by_hour = {}
# Total number of comments on Ask HN posts created in each hour of the day
comments_by_hour = {}

date_format = "%m/%d/%Y %H:%M"

for post in ask_posts:
    created_at = post[6]
    num_comments = int(post[4])
    result_list.append([created_at, num_comments])

for row in result_list:
    created_at, num_comments = row
    date = dt.datetime.strptime(created_at, date_format)
    hour = date.strftime("%H")  # two-digit hour, e.g. "15"
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments

print(counts_by_hour)
print("--------")
print(comments_by_hour)
```


```
{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
--------
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
```
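The two passes above (first building `result_list`, then parsing the dates) can be folded into a single loop, and `collections.defaultdict(int)` removes the need for the `if hour not in ...` branch. A sketch, assuming the same row layout (comments at index 4, timestamp at index 6) and the same `date_format`; `comments_per_hour` is a hypothetical helper and the sample rows are invented.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = "%m/%d/%Y %H:%M"

def comments_per_hour(posts):
    """Return (counts_by_hour, comments_by_hour) keyed by two-digit hour strings."""
    counts = defaultdict(int)    # missing hours start at 0
    comments = defaultdict(int)
    for post in posts:
        created_at = dt.datetime.strptime(post[6], DATE_FORMAT)
        hour = created_at.strftime("%H")  # e.g. "15" or "02"
        counts[hour] += 1
        comments[hour] += int(post[4])
    return dict(counts), dict(comments)

# Hypothetical rows: index 4 = comment count, index 6 = creation time.
sample = [
    ["1", "Ask HN: a", "", "1", "30", "u", "8/4/2016 15:10"],
    ["2", "Ask HN: b", "", "1", "10", "u", "8/4/2016 15:45"],
    ["3", "Ask HN: c", "", "1", "4", "u", "8/5/2016 2:05"],
]
counts_by_hour, comments_by_hour = comments_per_hour(sample)
print(counts_by_hour)
print(comments_by_hour)
```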


Now we can calculate the **average number of comments per Ask HN post for each hour**, by dividing each value in `comments_by_hour` by the matching value in `counts_by_hour`.

```python
# Average comments per post for each hour: total comments / number of posts
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

avg_by_hour
```

```
[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]
```
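Keeping the averages in a dictionary rather than a list of pairs makes lookups by hour direct. A sketch using a dict comprehension; `avg_lookup` is a hypothetical name (a dict, unlike the list `avg_by_hour` above), and only two hours from the dictionaries printed earlier are included to keep it short.

```python
# Two entries copied from counts_by_hour and comments_by_hour above.
counts_by_hour = {"15": 116, "09": 45}
comments_by_hour = {"15": 4477, "09": 251}

# Map each hour to its average number of comments per post.
avg_lookup = {
    hour: comments_by_hour[hour] / counts_by_hour[hour]
    for hour in counts_by_hour
}
print(round(avg_lookup["15"], 2), round(avg_lookup["09"], 2))
```

For hour 15 this is 4477 / 116 ≈ 38.59, matching the list above.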


This list is hard to draw conclusions from, because it is not sorted by average.

Next, we sort `avg_by_hour` by average and take the five hours with the highest average number of comments per post.

```python
# Swap each [hour, average] pair to [average, hour] so sorting orders by average
swap_avg_by_hour = []
for hour, avg in avg_by_hour:
    swap_avg_by_hour.append([avg, hour])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top five hours by average comments per post")
sorted_swap[:5]
```

```
Top five hours by average comments per post

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21']]
```
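Swapping each pair so the average comes first is one way to make `sorted` order by average; passing a `key` function does the same without building `swap_avg_by_hour`. A sketch on a subset of the pairs printed above; `heapq.nlargest` is shown as an alternative when only the top few entries are needed.

```python
import heapq

# A subset of the [hour, average] pairs copied from avg_by_hour above.
avg_by_hour = [
    ["09", 5.5777777777777775],
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
    ["16", 16.796296296296298],
    ["21", 16.009174311926607],
    ["13", 14.741176470588234],
]

# Sort by the average (element 1) without swapping the pairs first.
by_average = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)
print(by_average[:5])

# nlargest returns the same top five without sorting the whole list.
top_five = heapq.nlargest(5, avg_by_hour, key=lambda pair: pair[1])
print(top_five)
```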

* Let us now format the result so that it is easier to read and to draw conclusions from.

```python
# Print every hour, highest average first, as "HH:MM: average"
for average, hour in sorted_swap:
    date = dt.datetime.strptime(hour, "%H")
    time_label = date.strftime("%H:%M")
    print("{}: {:.2f} average comments per post".format(time_label, average))
```

```
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
18:00: 13.20 average comments per post
17:00: 11.46 average comments per post
01:00: 11.38 average comments per post
11:00: 11.05 average comments per post
19:00: 10.80 average comments per post
08:00: 10.25 average comments per post
05:00: 10.09 average comments per post
12:00: 9.41 average comments per post
06:00: 9.02 average comments per post
00:00: 8.13 average comments per post
23:00: 7.99 average comments per post
07:00: 7.85 average comments per post
03:00: 7.80 average comments per post
04:00: 7.17 average comments per post
22:00: 6.75 average comments per post
09:00: 5.58 average comments per post
```
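The hours can also be labelled in 12-hour form (for example 3:00 PM). A sketch that reuses the `strptime(hour, "%H")` step from the cell above and formats with `strftime("%I:%M %p")`; `twelve_hour_label` is a hypothetical helper, and because `%p` follows the current locale, the AM/PM text may differ outside the default C locale.

```python
import datetime as dt

def twelve_hour_label(hour):
    """Turn a two-digit hour string such as '15' into a 12-hour label like '03:00 PM'."""
    return dt.datetime.strptime(hour, "%H").strftime("%I:%M %p")

# Two [hour, average] pairs copied from the output above.
for hour, average in [["15", 38.5948275862069], ["02", 23.810344827586206]]:
    print(f"{twelve_hour_label(hour)}: {average:.2f} average comments per post")
```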


# Conclusion

From our analysis, the hour with the highest average number of comments per Ask HN post is **15:00 (3:00 pm)**, followed by **02:00 (2:00 am)** and then **20:00 (8:00 pm)**.

These results suggest that Ask HN posts created at ***15:00*** tend to receive the most comments.
