# Hacker News
## Ask HN vs Show HN

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

We're going to determine which hour of the day an author should create a post so that it attracts the most comments.

You can find the data set [here](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts).

We start with importing our dataset.

In [119]:
from csv import reader

# Use a context manager so the file is closed once it has been read
with open("datasets/HackerNews.csv") as opened_file:
    dataset = list(reader(opened_file))

These are the first 5 rows. 
The first row of the dataset (`dataset[0]`) is the header, which we'll extract into its own variable.

In [120]:
header = dataset.pop(0)

for row in dataset[:5]:
    print(row)
    print("---")

['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']
---
['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']
---
['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']
---
['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']
---
['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']
---


Our dataset contains these columns:

* <span style='color:red'>id</span>: the unique identifier from Hacker News for the post
* <span style='color:red'>title</span>: the title of the post
* <span style='color:red'>url</span>: the URL that the post links to, if the post has a URL
* <span style='color:red'>num_points</span>: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
* <span style='color:red'>num_comments</span>: the number of comments on the post
* <span style='color:red'>author</span>: the username of the person who submitted the post
* <span style='color:red'>created_at</span>: the date and time of the post's submission

In [121]:
print(header)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


Now that we've removed the headers from the dataset, we're ready to filter our data. Since we're only concerned with post titles beginning with "Ask HN" or "Show HN", we'll create new lists of lists containing just the data for those titles.

In [122]:
ask_posts = []
show_posts = []
other_posts = []

for row in dataset:
    # Compare in lowercase so "Ask HN", "ask hn", etc. all match
    title = row[1].lower()

    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Print the number of posts in each group
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

9139
10158
273822
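
As a hedged sketch, the same prefix filter can be packaged as a small reusable function. The name `split_posts`, its `title_index` parameter, and the sample rows are illustrative assumptions, not part of the original notebook.

```python
def split_posts(rows, title_index=1):
    """Group rows into ask, show, and other posts by title prefix."""
    groups = {"ask": [], "show": [], "other": []}
    for row in rows:
        title = row[title_index].lower()
        if title.startswith("ask hn"):
            groups["ask"].append(row)
        elif title.startswith("show hn"):
            groups["show"].append(row)
        else:
            groups["other"].append(row)
    return groups


# Made-up rows that follow the dataset's column order
sample = [
    ["1", "Ask HN: Example question?", "", "4", "7", "user_a", "9/26/2016 2:53"],
    ["2", "Show HN: Example project", "http://example.com", "2", "0", "user_b", "9/26/2016 0:36"],
    ["3", "algorithmic music", "http://example.com/music", "1", "0", "user_c", "9/26/2016 3:16"],
]
groups = split_posts(sample)
print({name: len(posts) for name, posts in groups.items()})
```

Returning a dictionary keeps the three groups together, which can be convenient if more categories are added later.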


Examples of ask posts

In [123]:
print(ask_posts[:5])

[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'], ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']]


Examples of show posts

In [124]:
print(show_posts[:5])

[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44'], ['12577991', 'Show HN: Pomodoro-centric, heirarchical project management with ES6 modules', 'https://github.com/jakebian/zeal', '2', '0', 'dbranes', '9/25/2016 23:17'], ['12577142', 'Show HN: Jumble  Essays on the go #PaulInYourPocket', 'https://itunes.apple.com/us/app/jumble-find-startup-essay/id1150939197?ls=1&mt=8', '1', '1', 'ryderj', '9/25/2016 20:06']]


Examples of other posts

In [125]:
print(other_posts[:5])

[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]


## Number of comments for ask vs show
Let's determine if ask posts or show posts receive more comments on average.

In [126]:
total_ask_comments = 0

for row in ask_posts:
    
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Ask HN posts get {:.2f} comments on average".format(avg_ask_comments))

Ask HN posts get 10.39 comments on average


In [127]:
total_show_comments = 0

for row in show_posts:
    
    num_comments = int(row[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)
print("Show HN posts get {:.2f} comments on average".format(avg_show_comments))

Show HN posts get 4.89 comments on average


On average, posts that start with *"Ask HN"* see more interaction from the user base. The prefix signals that the author is asking the community a question and expects users to reply with a comment. This could be the reason *"Ask HN"* posts get more comments (**10.39** on average) than *"Show HN"* posts (**4.89** on average).
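
The two averaging loops above follow the same pattern, so they could be folded into one helper. This is a minimal sketch: the function name `average_comments`, its `comments_index` parameter, and the sample rows are assumptions made for illustration.

```python
def average_comments(posts, comments_index=4):
    """Return the mean of the num_comments column, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Made-up rows; only the num_comments column (index 4) matters here
sample_ask = [["", "", "", "", "7"], ["", "", "", "", "3"]]
print("{:.2f}".format(average_comments(sample_ask)))
```

The guard for an empty list avoids a `ZeroDivisionError` if a filter ever matches no posts.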

## Finding the Number of Ask Posts and Comments by Hour Created

We've concluded that, on average, ask posts receive more comments than show posts. Since ask posts are more likely to receive comments, we'll focus our remaining analysis just on these posts.

Let's determine if ask posts created at a certain time are more likely to attract comments. We will:

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
2. Calculate the average number of comments ask posts receive by hour created.

In [128]:
from datetime import datetime as datetime_module

result_list = []

for row in ask_posts:
    
    created_at = datetime_module.strptime(row[6], "%m/%d/%Y %H:%M")
    comments_count = int(row[4])
    item = [created_at, comments_count]
    result_list.append(item)
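
To illustrate how the format string `"%m/%d/%Y %H:%M"` maps onto the `created_at` values, here is a small sketch that parses one timestamp taken from the first rows of the dataset. It imports `datetime` under its usual name rather than the alias used in the notebook.

```python
from datetime import datetime

# created_at value copied from the first data row shown earlier
stamp = datetime.strptime("9/26/2016 3:26", "%m/%d/%Y %H:%M")

print(stamp)                 # full parsed datetime
print(stamp.hour)            # hour as an integer
print(stamp.strftime("%H"))  # hour as a zero-padded string
```

`strptime` accepts the non-padded month and hour in the data, while `strftime("%H")` always returns two digits, which is why the hour keys later sort correctly as strings.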

In [129]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    # Extract the hour as a zero-padded string, e.g. "03"
    hour = row[0].strftime("%H")

    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

* **counts_by_hour**: contains the number of ask posts created during each hour of the day.
* **comments_by_hour**: contains the total number of comments received by ask posts created during each hour of the day.
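
The same tally can be sketched with `collections.defaultdict`, which removes the need to check whether an hour has been seen before. The function name `tally_by_hour` and the sample pairs are hypothetical and only mirror the shape of `result_list`.

```python
from collections import defaultdict
from datetime import datetime


def tally_by_hour(pairs):
    """Count posts and sum comments per hour from [created_at, n_comments] pairs."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, n_comments in pairs:
        hour = created_at.strftime("%H")
        counts[hour] += 1
        comments[hour] += n_comments
    return dict(counts), dict(comments)


# Made-up pairs in the same shape as result_list
sample = [
    [datetime(2016, 9, 26, 2, 53), 7],
    [datetime(2016, 9, 26, 2, 10), 3],
    [datetime(2016, 9, 25, 22, 57), 0],
]
counts, comments = tally_by_hour(sample)
print(counts)
print(comments)
```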

In [135]:
for i in sorted(counts_by_hour):
    print("hour: {} - posts: {}".format(i, counts_by_hour[i]))

hour: 00 - posts: 301
hour: 01 - posts: 282
hour: 02 - posts: 269
hour: 03 - posts: 271
hour: 04 - posts: 243
hour: 05 - posts: 209
hour: 06 - posts: 234
hour: 07 - posts: 226
hour: 08 - posts: 257
hour: 09 - posts: 222
hour: 10 - posts: 282
hour: 11 - posts: 312
hour: 12 - posts: 342
hour: 13 - posts: 444
hour: 14 - posts: 513
hour: 15 - posts: 646
hour: 16 - posts: 579
hour: 17 - posts: 587
hour: 18 - posts: 614
hour: 19 - posts: 552
hour: 20 - posts: 510
hour: 21 - posts: 518
hour: 22 - posts: 383
hour: 23 - posts: 343


In [136]:
for i in sorted(comments_by_hour):
    print("hour: {} : comments: {}".format(i, comments_by_hour[i]))

hour: 00 : comments: 2277
hour: 01 : comments: 2089
hour: 02 : comments: 2996
hour: 03 : comments: 2154
hour: 04 : comments: 2360
hour: 05 : comments: 1838
hour: 06 : comments: 1587
hour: 07 : comments: 1585
hour: 08 : comments: 2362
hour: 09 : comments: 1477
hour: 10 : comments: 3013
hour: 11 : comments: 2797
hour: 12 : comments: 4234
hour: 13 : comments: 7245
hour: 14 : comments: 4972
hour: 15 : comments: 18525
hour: 16 : comments: 4466
hour: 17 : comments: 5547
hour: 18 : comments: 4877
hour: 19 : comments: 3954
hour: 20 : comments: 4462
hour: 21 : comments: 4500
hour: 22 : comments: 3372
hour: 23 : comments: 2297


### Calculating the average number of comments per post for posts created during each hour of the day

In [132]:
avg_by_hour = []

for key in counts_by_hour:
    
    post_count = counts_by_hour[key]
    comment_count = comments_by_hour[key]
    avg_by_hour.append([key, comment_count / post_count])

Let's sort the averages from highest to lowest and print the top five hours in a readable format.

In [133]:
swap_avg_by_hour = []

# Put the average first so that sorting orders the pairs by average
for hour, avg in avg_by_hour:
    swap_avg_by_hour.append([avg, hour])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask Posts Comments")

for avg, hour in sorted_swap[:5]:
    print("{hour}:00: {count:.2f} average comments per post".format(hour=hour, count=avg))

Top 5 Hours for Ask Posts Comments
15:00: 28.68 average comments per post
13:00: 16.32 average comments per post
12:00: 12.38 average comments per post
02:00: 11.14 average comments per post
10:00: 10.68 average comments per post
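
As an alternative to swapping the columns, `sorted` can rank the pairs directly through its `key` argument. The sketch below uses only the post and comment totals for five hours taken from the tables above; the variable names are illustrative.

```python
# Post counts and comment totals for five hours, copied from the output above
counts = {"02": 269, "10": 282, "12": 342, "13": 444, "15": 646}
comments = {"02": 2996, "10": 3013, "12": 4234, "13": 7245, "15": 18525}

averages = [(hour, comments[hour] / counts[hour]) for hour in counts]

# Sort by the average (second element) without reordering the pair
ranked = sorted(averages, key=lambda pair: pair[1], reverse=True)

for hour, avg in ranked:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```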


We can conclude that creating an ask post during the **15:00** or **13:00** hour yields the best results when we want to maximize the chance of receiving comments.

A plausible explanation is that these hours fall at the end of the workday or during a lunch break for many of the users visiting the Hacker News website.
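
Readers who want to apply these hours locally need to shift them into their own timezone. The notebook does not state which timezone `created_at` uses, so this sketch treats both offsets as placeholders: `SOURCE_TZ` (UTC-5) and `TARGET_TZ` (UTC+1) are assumptions to be replaced with real values, and `convert_hour` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: placeholder offsets, not confirmed by the dataset
SOURCE_TZ = timezone(timedelta(hours=-5))
TARGET_TZ = timezone(timedelta(hours=1))


def convert_hour(hour, source_tz=SOURCE_TZ, target_tz=TARGET_TZ):
    """Convert an hour-of-day string such as "15" between two fixed offsets."""
    # The date is arbitrary; with fixed offsets only the hour matters
    stamp = datetime(2016, 9, 26, int(hour), tzinfo=source_tz)
    return stamp.astimezone(target_tz).strftime("%H")


print(convert_hour("15"))
```

Fixed offsets ignore daylight saving time, so this is only a rough conversion.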