# Hacker News Posts
**Hacker News** is a site started by the startup incubator [Y Combinator](https://www.ycombinator.com/), where user-submitted stories (known as "posts") are voted and commented upon, similar to **reddit**. Hacker News is extremely popular in **technology and startup circles**, and posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result.
I am specifically interested in posts whose titles begin with either **Ask HN** or **Show HN**. Users submit **Ask HN** posts to ask the Hacker News community a specific question, and they submit **Show HN** posts to show the community a project, a product, or just something interesting.

**I'll compare these two types of posts to determine the following:**
1. Do Ask HN or Show HN receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

### Let's start

In [14]:
from csv import reader
opened_file = open('HN_posts.csv')
read_file = reader(opened_file)
hn = list(read_file)
header = hn[:1]
hn = hn[1:]

print(header)
print('\n')
print(hn[:10])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-أ‚آ“the-data-vaultأ‚آ”', '1', '0', 'markgainor1', '9/26/2016 3:14'], ['12578975', '

##### As I said, I am only concerned with post titles beginning with Ask HN or Show HN.

In [27]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if (title.lower()).startswith('ask hn'):
        ask_posts.append(row)
    elif (title.lower()).startswith('show hn:'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('The number of ASK-posts is: ', len(ask_posts))
print(ask_posts[:3])
print('\n')
print('The number of SHOW-posts is: ', len(show_posts))
print(show_posts[:3])
print('\n')
print('The number of OTHER posts is: ', len(other_posts))
print(other_posts[:3])

The number of ASK-posts is:  9139
[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57']]


The number of SHOW-posts is:  10146
[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44']]


The number of OTHER posts is:  273834
[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http:/

So we have **9139** ASK-posts, **10146** SHOW-posts and **273834** OTHER posts.
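
The prefix check used in the cell above can be isolated in a small helper. This is only a minimal sketch: the function name `classify_title` is hypothetical, the prefixes are the same ones the cell uses (`'ask hn'` and `'show hn:'`), and the sample titles are copied from the output above.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # make the prefix check case-insensitive
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn:'):
        return 'show'
    return 'other'


# Sample titles taken from the printed rows above
sample_titles = [
    'Ask HN: What TLD do you use for local development?',
    'Show HN: Finding puns computationally',
    'SQLAR  the SQLite Archiver',
]

for title in sample_titles:
    print(classify_title(title), '-', title)
```

Keeping the rule in one function makes it easier to change the prefixes later, for example to require the colon after `ask hn` as well.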

#### Calculating the average number of comments

In [36]:
total_ask_comments = 0
for row in ask_posts:
    ask_comments_num = int(row[4])
    total_ask_comments += ask_comments_num
    
total_show_comments = 0
for row in show_posts:
    show_comments_num = int(row[4])
    total_show_comments += show_comments_num
    
avg_ask_comments = round(total_ask_comments / len(ask_posts))
avg_show_comments = round(total_show_comments / len(show_posts))
    
print('The total number of ask-posts comments is: ', total_ask_comments, 'comments')
print('The total number of show-posts comments is: ', total_show_comments, 'comments')
print('The average number of comments in ask-posts is: ', avg_ask_comments, 'comments')
print('The average number of comments in show-posts is: ', avg_show_comments, 'comments')    

The total number of ask-posts comments is:  94986 comments
The total number of show-posts comments is:  49627 comments
The average number of comments in ask-posts is:  10 comments
The average number of comments in show-posts is:  5 comments


We can see that Ask HN posts receive about twice as many comments on average as Show HN posts (roughly 10 versus 5 per post).
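
Because the cell above rounds the averages to whole numbers, it can help to look at the unrounded values too. The sketch below is illustrative only: the helper name `average_comments` is hypothetical, the three sample rows are the Ask HN rows printed earlier, and the totals and counts are the ones reported above.

```python
def average_comments(rows, comments_index=4):
    """Mean of the num_comments column; returns 0.0 for an empty list."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# The three Ask HN rows printed earlier (7, 3 and 0 comments)
sample_ask = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'],
    ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'],
]
print(round(average_comments(sample_ask), 2))  # 3.33

# Unrounded averages from the totals and counts reported above
print(round(94986 / 9139, 2))   # 10.39 comments per Ask HN post
print(round(49627 / 10146, 2))  # 4.89 comments per Show HN post
```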

#### Next, I'll determine if ask posts created at a certain time are more likely to attract comments.

In [52]:
import datetime as dt
result_list = []

for row in ask_posts:
    created_at = row[-1]
    num_comments = int(row[-3])
    result_list.append([created_at, num_comments])
    
counts_per_hour = {}
comments_per_hour = {}

for result in result_list:
    created_at_result = result[0]
    comments_result = result[1]
    created_at_result = dt.datetime.strptime(created_at_result, "%m/%d/%Y %H:%M")
    creation_hour = created_at_result.strftime('%H')
    if creation_hour not in counts_per_hour:
        counts_per_hour[creation_hour] = 1
        comments_per_hour[creation_hour] = comments_result
    else:
        counts_per_hour[creation_hour] += 1
        comments_per_hour[creation_hour] += comments_result
    
print(counts_per_hour)
print('\n')
print(comments_per_hour)

{'02': 269, '01': 282, '22': 383, '21': 518, '19': 552, '17': 587, '15': 646, '14': 513, '13': 444, '11': 312, '10': 282, '09': 222, '07': 226, '03': 271, '23': 343, '20': 510, '16': 579, '08': 257, '00': 301, '18': 614, '12': 342, '04': 243, '06': 234, '05': 209}


{'02': 2996, '01': 2089, '22': 3372, '21': 4500, '19': 3954, '17': 5547, '15': 18525, '14': 4972, '13': 7245, '11': 2797, '10': 3013, '09': 1477, '07': 1585, '03': 2154, '23': 2297, '20': 4462, '16': 4466, '08': 2362, '00': 2277, '18': 4877, '12': 4234, '04': 2360, '06': 1587, '05': 1838}
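
The same per-hour tally can be written a little more compactly with `collections.defaultdict`, which removes the need for the `if ... not in` branch. This is a sketch under assumptions, not the notebook's code: the function name `tally_by_hour` is hypothetical, and the last sample pair is made up to show two posts landing in the same hour.

```python
import datetime as dt
from collections import defaultdict


def tally_by_hour(pairs):
    """Count posts and sum comments per creation hour ('00' to '23')."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, num_comments in pairs:
        parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
        hour = parsed.strftime("%H")
        counts[hour] += 1
        comments[hour] += num_comments
    return dict(counts), dict(comments)


sample_pairs = [
    ('9/26/2016 2:53', 7),   # from the Ask HN rows printed earlier
    ('9/26/2016 1:17', 3),   # from the Ask HN rows printed earlier
    ('9/25/2016 22:57', 0),  # from the Ask HN rows printed earlier
    ('9/26/2016 2:10', 5),   # hypothetical row, added for illustration
]

counts, comments = tally_by_hour(sample_pairs)
print(counts)    # {'02': 2, '01': 1, '22': 1}
print(comments)  # {'02': 12, '01': 3, '22': 0}
```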


In [57]:
avg_per_hour = []

for hour in comments_per_hour:
    average = round(comments_per_hour[hour] / counts_per_hour[hour])
    avg_per_hour.append([hour, average])
    
print(avg_per_hour)

[['02', 11], ['01', 7], ['22', 9], ['21', 9], ['19', 7], ['17', 9], ['15', 29], ['14', 10], ['13', 16], ['11', 9], ['10', 11], ['09', 7], ['07', 7], ['03', 8], ['23', 7], ['20', 9], ['16', 8], ['08', 9], ['00', 8], ['18', 8], ['12', 12], ['04', 10], ['06', 7], ['05', 9]]
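
These averages can also be ranked directly with a `key` function, without first building a swapped copy of the list. The sketch below is one possible approach, assuming ties should be broken by the later hour first (which is what sorting swapped `[average, hour]` pairs in reverse does). The data is the list printed above.

```python
# Average comments per hour, copied from the output above
avg_per_hour = [
    ['02', 11], ['01', 7], ['22', 9], ['21', 9], ['19', 7], ['17', 9],
    ['15', 29], ['14', 10], ['13', 16], ['11', 9], ['10', 11], ['09', 7],
    ['07', 7], ['03', 8], ['23', 7], ['20', 9], ['16', 8], ['08', 9],
    ['00', 8], ['18', 8], ['12', 12], ['04', 10], ['06', 7], ['05', 9],
]

# Sort by average first, then by hour, both descending
ranked = sorted(avg_per_hour, key=lambda pair: (pair[1], pair[0]), reverse=True)

for hour, average in ranked[:5]:
    print(hour, average)
# 15 29
# 13 16
# 12 12
# 10 11
# 02 11
```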


In [64]:
swap_avg_per_hour = []

for row in avg_per_hour:
    swap_avg_per_hour.append([row[1], row[0]])
    
sorted_swap = sorted(swap_avg_per_hour, reverse=True)

print("The Top Five Hours for Ask Posts Comments:")
for row in sorted_swap[:5]:
    # US/Eastern timezone (EST) is UTC-05
    est_hour_dt = dt.datetime.strptime(row[1], '%H')
    est_hour_str = est_hour_dt.strftime('%H:%M')
    
    # Our timezone (labelled WAT here) is UTC+02: 7 hours ahead of EST
    # Converting the `Hour` from EST to WAT
    our_hour_dt = dt.datetime.strptime(row[1], '%H') + dt.timedelta(hours=7)
    our_hour_str = our_hour_dt.strftime('%H:%M')
    
    print('   ', '{est_time} EST or {our_time} WAT:    {avg:.1f} average comments per post'.format(est_time=est_hour_str, our_time=our_hour_str, avg=row[0]))

The Top Five Hours for Ask Posts Comments:
    15:00 EST or 22:00 WAT:    29.0 average comments per post
    13:00 EST or 20:00 WAT:    16.0 average comments per post
    12:00 EST or 19:00 WAT:    12.0 average comments per post
    10:00 EST or 17:00 WAT:    11.0 average comments per post
    02:00 EST or 09:00 WAT:    11.0 average comments per post
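
The 7-hour shift can also be expressed with explicit time zone objects instead of a bare `timedelta`. The following is a minimal sketch under stated assumptions: the timestamps are in Eastern Standard Time (UTC-05), the local zone is a fixed UTC+02, and daylight saving time is ignored. The names `EST_TZ`, `LOCAL_TZ` and `convert_hour` are hypothetical.

```python
import datetime as dt

# Fixed-offset zones (assumptions; daylight saving time is not modelled)
EST_TZ = dt.timezone(dt.timedelta(hours=-5), 'EST')
LOCAL_TZ = dt.timezone(dt.timedelta(hours=2), 'UTC+02')


def convert_hour(hour_str, src=EST_TZ, dst=LOCAL_TZ):
    """Convert an hour string such as '15' from src to dst, as 'HH:MM'."""
    naive = dt.datetime.strptime(hour_str, '%H')  # date part defaults to 1900-01-01
    aware = naive.replace(tzinfo=src)             # attach the source zone
    return aware.astimezone(dst).strftime('%H:%M')


for hour in ['15', '13', '12', '10', '02']:
    print(hour + ':00 EST ->', convert_hour(hour), 'local')
```

With these offsets, `convert_hour('15')` returns `'22:00'`, matching the first line of the output above. Hours that cross midnight wrap around, because only the time of day is formatted.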


## Conclusion:
Our results show that Ask HN posts created between 15:00 and 16:00 EST receive the most comments on average (about 29 per post). One possible explanation is that 15:00 EST is a time when users in both North America and Europe are active. This rests on our assumption that most Hacker News users are from these two continents. For this reason, the best time for us (in Egypt) to submit a post is 22:00 in our time zone, followed by 20:00 and 19:00 (the times labelled WAT above).