# What Makes a Successful Hacker News Post? 
## The Most Successful Hacker News Post Engagement Profiles

**Analyzing 12 months of posts across 2016 (about 293k entries)**

The goal of this project is to see which type of post ("Ask HN" or "Show HN") and which times of day get the most engagement on Hacker News, measured by comment count. We will be using [this](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts?resource=download) dataset, which contains roughly 293k entries once the header row is removed.

### Prerequisites

Opening and Reading Our Data

In [1]:
from csv import reader

# Read the whole CSV into a list of rows; the with-block closes the file for us
with open("hn_posts.csv", encoding="utf8") as openfile:
    hn = list(reader(openfile))

Separating The Header Row From The Main Data

In [2]:
header = hn[0]
hn = hn[1:]

In [3]:
print(header)
print("\n")
print(hn[0:3])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']]
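
As an aside, the positional indexes used in this notebook (`row[1]` for the title, `row[4]` for the comment count) could be swapped for column names. Below is a minimal sketch using `csv.DictReader`, assuming the same header shown above. The in-memory `sample_csv` is an illustration only (one row copied from the output above) so the snippet runs without `hn_posts.csv`.

```python
import csv
import io

# Illustration only: the header plus one row copied from the output above.
# The real notebook would pass the opened hn_posts.csv file instead.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12579005,SQLAR  the SQLite Archiver,"
    "https://www.sqlite.org/sqlar/doc/trunk/README.md,1,0,blacksqr,9/26/2016 3:24\n"
)

# DictReader uses the first line as keys, so no manual header split is needed
rows = list(csv.DictReader(sample_csv))

print(rows[0]["title"])
print(int(rows[0]["num_comments"]))  # values are still strings until converted
```

The rest of this notebook keeps the list-of-lists form, so this is only an alternative way to read the same file.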


### Finding The Total and Average Comment Counts Between "Ask HN" and "Show HN" Posts

Separating "Ask HN" and "Show HN" Posts Into Their Own Lists

In [4]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

Checking List Lengths

In [6]:
print("\"Ask HN\" Posts", len(ask_posts))
print("\"Show HN\" Posts", len(show_posts))
print("\"Other\" Posts", len(other_posts))

"Ask HN" Posts 9139
"Show HN" Posts 10158
"Other" Posts 273822
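
The prefix check used above can also be pulled into a small function, which makes the rule easy to test on its own. This is a sketch only: `classify_title` and the example titles are hypothetical names and data, not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title's prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical titles, used only to exercise the three branches
examples = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "SQLAR  the SQLite Archiver",
]

for title in examples:
    print(classify_title(title), "<-", title)
```

Note that the check only looks at the start of the title, so a post that mentions "Ask HN" later in its title lands in the "other" group, just as in the loop above.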


Finding The Total and Average Comment Counts

In [7]:
#Ask HN
total_ask_comments = 0

for row in ask_posts:
    commentnum = int(row[4])
    total_ask_comments += commentnum
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Total \"Ask HN\" Comments", (total_ask_comments))
print("Average \"Ask HN\" Comments", avg_ask_comments)

print("\n")

#Show HN
total_show_comments = 0

for row in show_posts:
    commentnum = int(row[4])
    total_show_comments += commentnum

avg_show_comments = total_show_comments / len(show_posts)
print("Total \"Show HN\" Comments", (total_show_comments))
print("Average \"Show HN\" Comments", avg_show_comments)


Total "Ask HN" Comments 94986
Average "Ask HN" Comments 10.393478498741656


Total "Show HN" Comments 49633
Average "Show HN" Comments 4.886099625910612
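
The two loops above do the same work on different lists, so they could share one helper. Here is a minimal sketch, assuming the comment count sits at index 4 as in this dataset. `comment_stats` and `sample_posts` are hypothetical and only illustrate the idea.

```python
def comment_stats(posts, comment_index=4):
    """Return (total, average) comment counts for a list of post rows."""
    total = sum(int(row[comment_index]) for row in posts)
    # Guard against an empty list so we never divide by zero
    average = total / len(posts) if posts else 0.0
    return total, average

# Hypothetical rows in the dataset's column order:
# id, title, url, num_points, num_comments, author, created_at
sample_posts = [
    ["1", "Ask HN: Example one", "", "5", "7", "user_a", "9/26/2016 2:53"],
    ["2", "Ask HN: Example two", "", "2", "3", "user_b", "9/26/2016 1:17"],
    ["3", "Ask HN: Example three", "", "1", "0", "user_c", "9/25/2016 22:57"],
]

total, average = comment_stats(sample_posts)
print("Total comments:", total)
print("Average comments: {:.2f}".format(average))
```

In the notebook, calling `comment_stats(ask_posts)` and `comment_stats(show_posts)` should give the same totals and averages as the two loops.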


We see above that the "Ask HN" category has nearly double the total comments and more than double the average comments per post. With this in mind, we'll focus only on the "Ask HN" category and look into the best time(s) of day to post.

### Finding The Most Successful Times Of Day
**(Focusing Only On "Ask HN" Posts)**

Separating The Time and Comment Count Columns Into Their Own List

In [8]:
import datetime as dt
result_list = []

for row in ask_posts:
    time_posted = row[6]
    comment_num = int(row[4])
    result_list.append([time_posted, comment_num])
    
print(result_list[0:3])

[['9/26/2016 2:53', 7], ['9/26/2016 1:17', 3], ['9/25/2016 22:57', 0]]


Grouping Posts by Hour in Dictionaries and Finding Each Hour's Total Comment Count

In [9]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    time_dt = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = time_dt.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]     

Total Post Counts Across Each Hour

In [10]:
print(counts_by_hour) #not sorted

{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}


Total Comment Counts Across Each Hour

In [14]:
print(comments_by_hour) #not sorted

{'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
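
The per-hour grouping can also be written without the `if hour not in ...` branch by using `collections.defaultdict`. The sketch below is one alternative, not the method used in this notebook. `result_sample` holds only the first three rows of `result_list` shown earlier.

```python
import datetime as dt
from collections import defaultdict

# First three rows of result_list, copied from the output shown earlier
result_sample = [["9/26/2016 2:53", 7], ["9/26/2016 1:17", 3], ["9/25/2016 22:57", 0]]

# Missing keys start at 0, so no explicit "first time seen" branch is needed
counts = defaultdict(int)
comments = defaultdict(int)

for created_at, num_comments in result_sample:
    # Parse the timestamp and keep only the zero-padded hour, e.g. "02"
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```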


Calculating The Average Comment Counts of Each Hour and Ranking

In [12]:
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([comments_by_hour[hour]/counts_by_hour[hour], hour])
    
sorted_data = sorted(avg_by_hour, reverse = True)

In [13]:
print("Top 5 Hacker News Posting Hours (EST): ")

for avg,h in sorted_data[:5]:
    time = dt.datetime.strptime(h,"%H").strftime("%H:%M")
    print(time, "{:.2f} avg comments".format(avg))

Top 5 Hacker News Posting Hours (EST): 
15:00 28.68 avg comments
13:00 16.32 avg comments
12:00 12.38 avg comments
02:00 11.14 avg comments
10:00 10.68 avg comments
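
As a quick sanity check, the averages can be recomputed directly from the hourly totals printed earlier. The sketch below copies six hours from those outputs (the top five plus 14:00) and ranks them with an explicit sort key. `counts_sample`, `comments_sample`, and `ranked` are hypothetical names.

```python
# Values copied from the counts_by_hour and comments_by_hour outputs above
counts_sample = {"15": 646, "13": 444, "12": 342, "02": 269, "10": 282, "14": 513}
comments_sample = {"15": 18525, "13": 7245, "12": 4234, "02": 2996, "10": 3013, "14": 4972}

# Average comments per post for each hour
averages = {hour: comments_sample[hour] / counts_sample[hour] for hour in counts_sample}

# Sort by the average only, highest first
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)

for hour, avg in ranked:
    print("{}:00 {:.2f} avg comments".format(hour, avg))
```

Sorting on an explicit key keeps the ranking based on the average alone, rather than falling back to comparing hour strings when two averages tie.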


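With our final ranking, we can see that "Ask HN" posts created in the early afternoon (12:00, 13:00, and especially 15:00, or 12 PM, 1 PM, and 3 PM) receive the most comments on average, with 2:00 and 10:00 as the runners-up in 4th and 5th. Note that 14:00 does not make the top five, so the strong window is not continuous.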

## Summary and Conclusion


In this project we found that the "Ask HN" category has nearly double the total comment count, with 94,986 comments compared to 49,633 in the "Show HN" category. It also averages 10.39 comments per post, more than double the 4.89 average of the "Show HN" category. By comment count, "Ask HN" posts are significantly more successful than "Show HN" posts on Hacker News.

We also found that the hours of the day with the highest average comment counts for "Ask HN" posts are 15:00 (3 PM) with 28.68 average comments, 13:00 (1 PM) with 16.32 average comments, 12:00 (12 PM) with 12.38 average comments, 2:00 (2 AM) with 11.14 average comments, and 10:00 (10 AM) with 10.68 average comments.

This leads to our conclusion: the type of post that draws the most comments on Hacker News is an "Ask HN" post created in the early afternoon (12:00, 13:00, or 15:00 EST). Posts created at 2:00 are also quite successful. This could reflect the hours some Hacker News users prefer to be awake, but it is more likely just a regional time difference.
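
To make the regional time difference concrete, the top hours can be converted to another time zone. The sketch below assumes the timestamps are in EST treated as a fixed UTC-5 offset with no daylight saving adjustment. That is a simplification, and `eastern_hour_to_utc` is a hypothetical helper.

```python
import datetime as dt

# Assumption: the dataset's times are EST at a fixed UTC-5 offset (daylight saving ignored)
EASTERN = dt.timezone(dt.timedelta(hours=-5), "EST")

def eastern_hour_to_utc(hour):
    """Convert an hour of the day in fixed-offset EST to the matching UTC hour."""
    # The date is arbitrary; only the hour matters for a fixed offset
    local = dt.datetime(2016, 1, 1, hour, tzinfo=EASTERN)
    return local.astimezone(dt.timezone.utc).hour

# The five top-ranked posting hours from the analysis above
for h in (15, 13, 12, 2, 10):
    print("{:02d}:00 EST -> {:02d}:00 UTC".format(h, eastern_hour_to_utc(h)))
```

Under this assumption, the 2:00 EST slot corresponds to 07:00 UTC, which is morning across much of Europe.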