# Hacker News website: Ask HN vs Show HN

-----------------------------------------
Hacker News is a site started by the startup incubator [Y Combinator](https://www.ycombinator.com/), where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that reach the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

Some posts on the website are categorized as either 'Ask HN' or 'Show HN'. Users submit 'Ask HN' posts to ask the Hacker News community a question. Below are a few examples:

> Ask HN: How to improve my personal website?
>
> Ask HN: Am I the only one outraged by Twitter shutting down share counts?
>
> Ask HN: Aby recent changes to CSS that broke mobile?

Likewise, users submit 'Show HN' posts to show the community a project, news, or anything they find interesting:

> Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform
>
> Show HN: Something pointless I made
>
> Show HN: Shanhu.io, a programming playground powered by e8vm

In this project, I want to explore these two categories of posts and determine:
- Do Ask HN or Show HN posts receive more comments on average?
- Do posts created at a certain time receive more comments on average?

The dataset used for this project is available [here](https://www.kaggle.com/hacker-news/hacker-news-posts). However, the version used here has been reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments and then randomly sampling from the remaining submissions. Below are descriptions of the columns:

- `id`: the unique identifier from Hacker News for the post
- `title`: the title of the post
- `url`: the URL that the post links to, if the post has a URL
- `num_points`: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- `num_comments`: the number of comments on the post
- `author`: the username of the person who submitted the post
- `created_at`: the date and time of the post's submission
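The reduction described above (drop posts without comments, then sample randomly) can be sketched as follows. This is a minimal illustration, not the script actually used to produce `hacker_news.csv`: the function name `reduce_dataset`, the sample size, and the fixed seed are assumptions, and rows are assumed to follow the column order listed above, so `num_comments` sits at index 4.

```python
import random

NUM_COMMENTS = 4  # index of the num_comments column in each row (assumed column order)

def reduce_dataset(rows, sample_size, seed=1):
    """Hypothetical sketch: drop posts without comments, then sample the rest."""
    # Keep only submissions that received at least one comment
    commented = [row for row in rows if int(row[NUM_COMMENTS]) > 0]
    # Sample without replacement; a fixed seed makes the sample reproducible
    rng = random.Random(seed)
    return rng.sample(commented, min(sample_size, len(commented)))

# Illustrative rows only, not taken from the dataset
rows = [
    ['1', 'Post A', '', '5', '0', 'alice', '1/1/2016 10:00'],
    ['2', 'Post B', '', '3', '2', 'bob', '1/1/2016 11:00'],
    ['3', 'Post C', '', '8', '7', 'carol', '1/2/2016 9:30'],
]
sample = reduce_dataset(rows, sample_size=2)
print(len(sample))  # 2
```

Capping the sample size with `min` keeps `random.Random.sample` from raising `ValueError` when fewer commented posts remain than requested.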

In [63]:
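from csv import reader
# Read the whole CSV file into a list of rows; the with block closes the file afterwards
with open("hacker_news.csv") as opened_file:
    hn = list(reader(opened_file))
# Separate the header row from the data rows
hn_header = hn[0]
hn = hn[1:]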

In [64]:
display(hn[:5])

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

## Extracting the 'Ask HN' posts and 'Show HN' posts
--------------------------------------------------------------------------------
Our dataset currently mixes all types of posts. We will separate the 'Ask HN' and 'Show HN' posts from the rest of the dataset based on the beginning of each title.
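The key rule is that each post should land in exactly one group, decided by its title prefix. A minimal sketch of that rule as a standalone function (the name `classify_post` is hypothetical and not part of the original notebook) uses a single `if`/`elif`/`else` chain so that no title is counted in two groups:

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on a post title's prefix."""
    # Compare case-insensitively, matching the .lower() call in the notebook
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    elif lowered.startswith('show hn'):
        return 'show'
    else:
        return 'other'

titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]
for t in titles:
    print(classify_post(t), '-', t)
```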

In [65]:
show_posts = []
ask_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    # Use elif so each post lands in exactly one list
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print("There are {} 'ask hn' posts, {} 'show hn' posts, and {} other posts".format(len(ask_posts), len(show_posts), len(other_posts)))

There are 1744 'ask hn' posts, 1162 'show hn' posts, and 17194 other posts


In [66]:
print("examples of 'ask hn' posts:" + '\n' + str(ask_posts[:5]))
print('\n')
print("examples of 'show hn' posts:" + '\n' + str(show_posts[:5]))

examples of 'ask hn' posts:
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]


examples of 'show hn' posts:
[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming pla

## Total and average number of comments
------------------------------------------

In this dataset, 'ask HN' posts receive more comments than 'show HN' posts. Below, we calculate the total number of comments for each type of post and the average number of comments per post.
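Both calculations follow the same pattern: sum the `num_comments` column (index 4) and divide by the number of posts. As a compact sketch, a hypothetical helper `comment_stats` could compute the total and average for any list of rows; the three sample rows are the first 'ask HN' posts shown earlier.

```python
def comment_stats(posts):
    """Return (total_comments, average_comments) for a list of post rows."""
    # num_comments is stored as a string at index 4 of each row
    total = sum(int(row[4]) for row in posts)
    # Guard against an empty list to avoid ZeroDivisionError
    average = total / len(posts) if posts else 0.0
    return total, average

sample_posts = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]
total, average = comment_stats(sample_posts)
print(total, round(average, 2))  # 36 12.0
```

Applied to the full lists, `comment_stats(ask_posts)` and `comment_stats(show_posts)` should reproduce the totals and averages computed in the cells that follow.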

In [67]:
total_ask_comments = 0
for row in ask_posts:
    # num_comments is stored as a string at index 4
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print("'ask HN' posts have {:,} total comments and an average of {:.0f} comments per post.".format(total_ask_comments, avg_ask_comments))

'ask HN' posts have 24,483 total comments and an average of 14 comments per post.


In [68]:
total_show_comments = 0
for row in show_posts:
    # num_comments is stored as a string at index 4
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)
print("'show HN' posts have {:,} total comments and an average of {:.0f} comments per post.".format(total_show_comments, avg_show_comments))

'show HN' posts have 11,988 total comments and an average of 10 comments per post.


## Number of 'ask HN' posts and comments by hour
-------------------------------------------------------------

Below, we take each post's `created_at` value and extract the hour. We then count the number of posts created in each hour and add up the comments those posts received, giving the total number of comments per hour.
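The same aggregation can be written more compactly with `collections.Counter`, a dictionary subclass whose missing keys default to 0. This sketch (the helper name `aggregate_by_hour` is hypothetical) assumes each input pair is `[created_at, num_comments]` with timestamps in the `%m/%d/%Y %H:%M` format used by the dataset; the sample pairs are the first five rows of `result_list` shown below.

```python
import datetime as dt
from collections import Counter

def aggregate_by_hour(pairs):
    """Return (posts per hour, total comments per hour) as two Counters."""
    counts = Counter()
    comments = Counter()
    for created_at, num_comments in pairs:
        # Parse the timestamp and keep only the hour of day (0-23)
        hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
        counts[hour] += 1
        comments[hour] += int(num_comments)
    return counts, comments

pairs = [['8/16/2016 9:55', '6'], ['11/22/2015 13:43', '29'], ['5/2/2016 10:14', '1'],
         ['8/2/2016 14:20', '3'], ['10/15/2015 16:38', '17']]
counts, comments = aggregate_by_hour(pairs)
print(dict(counts))    # {9: 1, 13: 1, 10: 1, 14: 1, 16: 1}
print(dict(comments))  # {9: 6, 13: 29, 10: 1, 14: 3, 16: 17}
```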

In [69]:
import datetime as dt
result_list = []
for row in ask_posts:
    result_list.append([row[6], row[4]])
result_list[:5]

[['8/16/2016 9:55', '6'],
 ['11/22/2015 13:43', '29'],
 ['5/2/2016 10:14', '1'],
 ['8/2/2016 14:20', '3'],
 ['10/15/2015 16:38', '17']]

In [71]:
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    created_at, num_comments = row[0], row[1]
    # Parse the timestamp and keep only the hour of day (0-23)
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    # Count the post and add its comments to that hour's totals
    counts_by_hour[hour] = counts_by_hour.get(hour, 0) + 1
    comments_by_hour[hour] = comments_by_hour.get(hour, 0) + int(num_comments)
print(counts_by_hour)
comments_by_hour


{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


{9: 251,
 13: 1253,
 10: 793,
 14: 1416,
 16: 1814,
 23: 543,
 12: 687,
 17: 1146,
 15: 4477,
 21: 1745,
 20: 1722,
 2: 1381,
 18: 1439,
 3: 421,
 5: 464,
 19: 1188,
 1: 683,
 22: 479,
 8: 492,
 4: 337,
 0: 447,
 6: 397,
 7: 267,
 11: 641}

## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [76]:
avg_by_hour = []
for hour in comments_by_hour:
    avg_by_hour.append([hour, round(comments_by_hour[hour]/counts_by_hour[hour], 2)])
avg_by_hour

[[9, 5.58],
 [13, 14.74],
 [10, 13.44],
 [14, 13.23],
 [16, 16.8],
 [23, 7.99],
 [12, 9.41],
 [17, 11.46],
 [15, 38.59],
 [21, 16.01],
 [20, 21.52],
 [2, 23.81],
 [18, 13.2],
 [3, 7.8],
 [5, 10.09],
 [19, 10.8],
 [1, 11.38],
 [22, 6.75],
 [8, 10.25],
 [4, 7.17],
 [0, 8.13],
 [6, 9.02],
 [7, 7.85],
 [11, 11.05]]

In [84]:
swap = []
for hour, avg in avg_by_hour:
    # Put the average first so that sorting orders rows by average comments
    swap.append([avg, dt.datetime.strptime(str(hour), "%H").strftime("%H:00")])
sorted_swap = sorted(swap, reverse=True)
for avg, hour_label in sorted_swap:
    print('{}: {} average comments per post'.format(hour_label, avg))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.8 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
18:00: 13.2 average comments per post
17:00: 11.46 average comments per post
01:00: 11.38 average comments per post
11:00: 11.05 average comments per post
19:00: 10.8 average comments per post
08:00: 10.25 average comments per post
05:00: 10.09 average comments per post
12:00: 9.41 average comments per post
06:00: 9.02 average comments per post
00:00: 8.13 average comments per post
23:00: 7.99 average comments per post
07:00: 7.85 average comments per post
03:00: 7.8 average comments per post
04:00: 7.17 average comments per post
22:00: 6.75 average comments per post
09:00: 5.58 average comments per post


Posts created at 15:00 receive the highest average number of comments, 38.59 per post. The next highest is 02:00, with 23.81 comments per post on average, about 38% fewer than at 15:00.
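The gap between the top two hours is a relative difference: (38.59 - 23.81) / 38.59 is roughly 0.383. A small sketch that ranks hours with `sorted` and a `key` function, then computes this gap, is shown below; the helper names `rank_hours` and `relative_gap` are hypothetical, and `averages` is a subset of the hourly averages computed above.

```python
def rank_hours(avg_by_hour):
    """Sort [hour, average] pairs from highest to lowest average."""
    return sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)

def relative_gap(higher, lower):
    """Fraction by which `lower` falls short of `higher`."""
    return (higher - lower) / higher

# A subset of the hourly averages computed above
averages = [[9, 5.58], [15, 38.59], [20, 21.52], [2, 23.81]]
ranked = rank_hours(averages)
print(ranked[:2])  # [[15, 38.59], [2, 23.81]]
top_avg, second_avg = ranked[0][1], ranked[1][1]
print('{:.0%}'.format(relative_gap(top_avg, second_avg)))  # 38%
```

Sorting with a `key` function avoids building a separate list with the average moved to the front.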

## Conclusion
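In this project, we analyzed 'ask HN' and 'show HN' posts to determine which type of post, and which posting time, receives the most comments on average. 'ask HN' posts receive more comments on average than 'show HN' posts (about 14 versus 10 per post), and 'ask HN' posts created at 15:00 receive the most comments on average (38.59 per post). Based on this sample, creating an 'ask HN' post at 15:00 appears to give it the best chance of receiving a high number of comments.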
