Exploring Hacker News Posts

On Hacker News users submit Ask HN posts to ask the "Hacker News" community a specific question and "Show HN" posts to show a project, product, or just something interesting.

We'll compare these two types of posts to determine the following:

Do Ask HN or Show HN receive more comments on average?
Do posts created at a certain time receive more comments on average?

In [58]:
import csv

with open('hacker_news.csv', 'r') as file:
    reader = csv.reader(file)
    hn = list(reader)
headers = hn[0]
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [59]:
del hn[0]
    
print(hn[:5])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


Number of posts by type

In [60]:
ask_posts = list()
show_posts = list()
other_posts = list()



for post in hn:
    title = post[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(post)
        """ print(title, 'ask') """
    elif title.lower().startswith('show hn'):
        show_posts.append(post)
        """ print(title, 'show') """
    else:
        other_posts.append(post)
        """ print(title, 'other') """

print("Number of Ask HN posts:", len(ask_posts))
print("Number of Show HN posts:", len(show_posts))
print("Number of other posts:", len(other_posts))

Number of Ask HN posts: 1744
Number of Show HN posts: 1162
Number of other posts: 17194


Type of post with most comments on average

In [61]:
total_ask_comments = 0
total_show_comments = 0

for askcomm in ask_posts:
    comments: int = int(askcomm[4])
    total_ask_comments += comments

avg_ask_comments = total_ask_comments / len(ask_posts)

for showcomm in show_posts:
    comments: int = int(showcomm[4])
    total_show_comments += comments

avg_show_comments = total_show_comments / len(show_posts)

most_commented = ''

if avg_ask_comments != avg_show_comments:
    if avg_ask_comments > avg_show_comments:
        most_commented = 'Ask'
    elif avg_show_comments > avg_ask_comments:
        most_commented = 'Show'

comment = ''
if most_commented:
    comment = 'On average, ' + most_commented + ' posts have more comments.'
else:
    comment = 'On average, Show and Ask posts have the same number of comments.'

print(comment)


On average, Ask posts have more comments.


Number of posts and number of comments by hour

In [62]:
'''importing the datetime library'''
try:
    from _datetime import *
    from _datetime import __doc__
except ImportError:
    from _pydatetime import *
    from _pydatetime import __doc__

__all__ = ("date", "datetime", "time", "timedelta", "timezone", "tzinfo",
           "MINYEAR", "MAXYEAR", "UTC")

In [63]:
from zoneinfo import ZoneInfo

result_list = list()

for post in ask_posts:
    result_list.append([post[6], post[4]])

counts_by_hour = dict()
comments_by_hour = dict()

source_tz = ZoneInfo('US/Eastern')  
target_tz = ZoneInfo('Europe/Rome')

for res in result_list:
    naive_dt = datetime.strptime(res[0], '%m/%d/%Y %H:%M')
    date_object = naive_dt.replace(tzinfo=source_tz)
    converted_dt = date_object.astimezone(target_tz)
    hour = converted_dt.hour

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = res[1]
    else: 
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += res[1]

counts_by_hour_sorted = dict(sorted(counts_by_hour.items(), key=lambda x: x[0]))
comments_by_hour_sorted = dict(sorted(comments_by_hour.items(), key=lambda x: x[0]))

print(counts_by_hour_sorted)
print(comments_by_hour_sorted)

{0: 108, 1: 108, 2: 82, 3: 113, 4: 63, 5: 72, 6: 53, 7: 66, 8: 54, 9: 52, 10: 47, 11: 44, 12: 45, 13: 33, 14: 50, 15: 44, 16: 57, 17: 66, 18: 75, 19: 77, 20: 113, 21: 118, 22: 100, 23: 104}
{0: '235615661722041111328332210116141733121119667223134812117797327310116136111134738343312711121327518722422323301071641122121921327225912304199', 1: '311527623433131851321525532142446181531229172223106434124172733537525354111143437215427122211224124216182365111351151726131111812', 2: '2374212224111183813264226406737151446712424034774242132237411723358331912839521063122168221315532869', 3: '443208312571211191713321010119411062162321612134126232520217211951517241314711313410292221754432121610633150322818125115371110745433014238118', 4: '2194931311672439202361213121655246721349131655236316291222191619162229341052', 5: '14521712215218291511271118765791438665631411329494241044614614562651333351202151111282', 6: '1013611244332289139334211121414431043381463531012142724321295215', 7: '3223341241543313192

Average number of comments per posts created during each hour of the day

In [64]:
avg_by_hour = list()

for hour in counts_by_hour_sorted:
    avg_comments = int(comments_by_hour_sorted[hour]) / counts_by_hour_sorted[hour]
    avg_by_hour.append([hour, avg_comments])

for row in avg_by_hour:
    print('At ', row[0], ' people commented an average of ', row[1], ' times.')

At  0  people commented an average of  2.1816264974263065e+138  times.
At  1  people commented an average of  2.884515031788258e+126  times.
At  2  people commented an average of  2.8953807611111997e+97  times.
At  3  people commented an average of  3.9221974563824e+138  times.
At  4  people commented an average of  3.484017955035618e+73  times.
At  5  people commented an average of  2.016904474335874e+83  times.
At  6  people commented an average of  1.9124740459099795e+61  times.
At  7  people commented an average of  4.883850365974717e+85  times.
At  8  people commented an average of  5.687256372429336e+66  times.
At  9  people commented an average of  2.3559965676329293e+62  times.
At  10  people commented an average of  7.179002662166407e+53  times.
At  11  people commented an average of  6.64114843414173e+53  times.
At  12  people commented an average of  2.497581500493558e+52  times.
At  13  people commented an average of  7.006776098581887e+38  times.
At  14  people commented a

Sorting values by hours with the highest average of commments

In [65]:
swap_avg_by_hour = list()

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, key=lambda x: x[0], reverse = True)
print(sorted_swap)

[[1.4170781085805876e+159, 21], [2.857730192231992e+143, 20], [3.9221974563824e+138, 3], [2.1816264974263065e+138, 0], [1.2241459827098222e+127, 23], [2.884515031788258e+126, 1], [1.7719414023109172e+121, 22], [2.8953807611111997e+97, 2], [3.789262883328773e+93, 19], [5.534900711443899e+85, 18], [4.883850365974717e+85, 7], [2.016904474335874e+83, 5], [1.0951674867070504e+79, 17], [3.484017955035618e+73, 4], [2.301703721440942e+72, 16], [5.687256372429336e+66, 8], [2.3559965676329293e+62, 9], [1.9124740459099795e+61, 6], [1.0261668458232248e+57, 14], [7.179002662166407e+53, 10], [6.64114843414173e+53, 11], [2.497581500493558e+52, 12], [1.3990937327050943e+48, 15], [7.006776098581887e+38, 13]]


Top 5 Hours for Ask Posts Comments

In [66]:
big5 = list()

for i in range(5):
    time = datetime.strptime(str(sorted_swap[i][1]), '%H').strftime('%H:%M')
    avg = f"{sorted_swap[i][0]:.2f}"
    big5.append([time, avg])
for big in big5:
    print(big[0], big[1], ' average comments per post.')

21:00 1417078108580587559331530285601711789305210008340469493440947513285722557243569572081472639376278577604942602338673396105190074063888741559898159866500383506432.00  average comments per post.
20:00 285773019223199216212462136407634615093392036717772879101300308554594451925356879859397557538447687611376123947012367031732089941252331343446016.00  average comments per post.
03:00 3922197456382399818927584308708211153819729838525062176033039584422287631539354769978997561981457666551429685825476016391094317340761784320.00  average comments per post.
00:00 2181626497426306517684847801335785275057143782629969853391748526577949275464865044954255795325704469747090230886151336501215097007969927168.00  average comments per post.
23:00 12241459827098221858677546334088342641118661528658632186569104565696526417018813635319639353227830446082295937753131695957082112.00  average comments per post.


During which hours should you create a post to have a higher chance of receiving comments?

In [67]:
print('In descendant order, in order to receive a higher number of comments, the best time possible to post on Hacker News are: ')

for big in big5:
    print ('- ', big[0])

In descendant order, in order to receive a higher number of comments, the best time possible to post on Hacker News are: 
-  21:00
-  20:00
-  03:00
-  00:00
-  23:00
