# Hacker News

The following is an analysis of the most popular posts on Hacker News by number of comments, type of post, and hour of posting.

### Opening, reading and exploring the dataset

In [44]:
from csv import reader
import datetime as dt
import pytz

# Read the whole CSV into a list of rows; the with block closes the file afterwards
with open('hackers_news.csv', encoding='utf8', newline='') as opened_file:
    hn = list(reader(opened_file))

In [45]:
for posts in hn[0:4]:
    print(posts)
    print('\n')

print(hn[4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']


['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']


['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']


['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']


### Separating the header

In [46]:
headers = hn[0]
hn = hn[1:]

print(headers)
print('\n')
print(hn[0:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]


### Ask or Show posts? Finding out which generates more comments

We are only concerned with post titles starting with 'Ask HN' or 'Show HN'. The objective is to see which kind of posts, whether questions or display posts, are more popular.

In [47]:
ask_posts = []
show_posts = []
other_posts = []

for rows in hn:
    title = rows[1]
    title_low = title.lower()
    if title_low.startswith('ask hn'):
        ask_posts.append(rows)
    elif title_low.startswith('show hn'):
        show_posts.append(rows)
    else:
        other_posts.append(rows)
        
print("The number of ask posts is: " + str(len(ask_posts)))
print("The number of show posts is: " + str(len(show_posts)))
print("The number of other posts is: " + str(len(other_posts)))

The number of ask posts is: 9139
The number of show posts is: 10158
The number of other posts is: 273822
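
The prefix test used above can also be wrapped in a small helper, which makes the rule easy to check on a few titles before running it over the whole dataset. This is a minimal sketch: `classify_title` is a hypothetical name, and the first two sample titles are made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    title_low = title.lower()  # compare case-insensitively, as in the cell above
    if title_low.startswith('ask hn'):
        return 'ask'
    if title_low.startswith('show hn'):
        return 'show'
    return 'other'

# The first two titles are invented examples, not rows from the dataset
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: A tiny CSV viewer',
    'algorithmic music',
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Lowercasing first means that titles such as 'ASK HN' or 'ask hn' land in the same group.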


In [48]:
print("Average comments:")
total_ask_comments = 0

for elements in ask_posts:
    total_ask_comments += int(elements[4])
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print(str(round(avg_ask_comments, 2)) + " for ask posts")

total_show_comments = 0

for elements in show_posts:
    total_show_comments += int(elements[4])
    
avg_show_comments = total_show_comments/len(show_posts)
print(str(round(avg_show_comments, 2)) + " for show posts")

Average comments:
10.39 for ask posts
4.89 for show posts
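
Since the same averaging loop runs twice, once per post type, it could be factored into a function. The sketch below assumes the column layout shown in the header (`num_comments` at index 4); `average_comments` and the sample rows are hypothetical and only illustrate the arithmetic.

```python
def average_comments(posts, comments_index=4):
    """Average the num_comments column over a list of rows."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Invented rows in the dataset's column order, used only to check the math
sample_posts = [
    ['1', 'Ask HN: example one', '', '1', '6', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: example two', '', '1', '1', 'user_b', '9/26/2016 3:24'],
    ['3', 'Ask HN: example three', '', '1', '29', 'user_c', '9/26/2016 3:19'],
]
avg = average_comments(sample_posts)
print(round(avg, 2))  # (6 + 1 + 29) / 3 = 12.0
```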


### Most comment-effective hours for ask posts

In [49]:
# Number of ask posts and total comments, keyed by hour of creation
counts_by_hour = {}
comments_by_hour = {}

for posts in ask_posts:
    created_at = posts[6]
    comments = int(posts[4])
    stripped_time = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = stripped_time.hour
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments

With the counts and comments grouped by hour in two dictionaries, I will now find the average number of comments per post for each hour.

In [50]:
avg_by_hour = []

for hours in comments_by_hour:
    avg_by_hour.append([hours, comments_by_hour[hours]/counts_by_hour[hours]])

swapped_avg = []
for elements in avg_by_hour:
    swapped_avg.append([elements[1], elements[0]])
    
ordered_avg = sorted(swapped_avg, reverse=True)

print("Top 5 Hours for Ask Posts Comments:")
for avg, hr in ordered_avg[0:5]:
    time = dt.datetime.strptime(str(hr), "%H").strftime("%H:%M")
    print('{}: {:.2f} average comments per post'.format(time, avg))

Top 5 Hours for Ask Posts Comments:
15:00: 28.68 average comments per post
13:00: 16.32 average comments per post
12:00: 12.38 average comments per post
02:00: 11.14 average comments per post
10:00: 10.68 average comments per post


The timestamps in the dataset are in US Eastern time. What do these hours mean in my time zone (Uruguay)?

In [51]:
eastern = pytz.timezone('US/Eastern')
uruguay = pytz.timezone('America/Montevideo')

# Parsing only "%H" yields a date in 1900, where pytz uses historical offsets
# that are not whole hours. Anchor each hour to a real date instead;
# 2016-09-26, the date of the first rows shown above, is used here.
# On that date Eastern time is on daylight saving time; in winter the gap is one hour larger.
print("Top 5 Hours for Ask Posts Comments (in Uruguay's time):")
for avg, hr in ordered_avg[0:5]:
    time = eastern.localize(dt.datetime(2016, 9, 26, hr))
    time = time.astimezone(uruguay).strftime("%H:%M")
    print('{}: {:.2f} average comments per post'.format(time, avg))

Top 5 Hours for Ask Posts Comments (in Uruguay's time):
16:00: 28.68 average comments per post
14:00: 16.32 average comments per post
13:00: 12.38 average comments per post
03:00: 11.14 average comments per post
11:00: 10.68 average comments per post
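
As a cross-check that does not depend on a time zone database, the conversion can be sketched with fixed UTC offsets from the standard library. The offsets are assumptions: UTC-4 for Eastern daylight time (in effect on the reference date 2016-09-26) and UTC-3 for Uruguay. During Eastern standard time (UTC-5) the difference would be two hours rather than one. `eastern_hour_to_uruguay` is a hypothetical helper name.

```python
import datetime as dt

# Assumed fixed offsets; real zones change with daylight saving rules
EASTERN_DAYLIGHT = dt.timezone(dt.timedelta(hours=-4))
URUGUAY = dt.timezone(dt.timedelta(hours=-3))

def eastern_hour_to_uruguay(hour):
    """Convert an Eastern (daylight time) hour to Uruguay time as HH:MM."""
    # A real calendar date is used so the datetime is unambiguous
    eastern_time = dt.datetime(2016, 9, 26, hour, tzinfo=EASTERN_DAYLIGHT)
    return eastern_time.astimezone(URUGUAY).strftime("%H:%M")

converted = [eastern_hour_to_uruguay(h) for h in (15, 13, 12, 2, 10)]
print(converted)  # ['16:00', '14:00', '13:00', '03:00', '11:00']
```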


## Conclusion

Ask posts are more popular than show posts when measured by comments: on average they get 10.39 comments, versus 4.89 for show posts. The best time to submit an ask post is between 15:00 and 16:00 Eastern time (16:00 to 17:00 in Uruguay), which is the most comment-productive hour for ask posts at 28.68 comments per post.

People might be more inclined to comment on an 'ask' post, whether to show off their knowledge or to help, than on a 'show' post. A higher comment count therefore does not necessarily mean that readers were more interested in the 'ask' posts or that those posts generated more buzz. This is a possible limitation of the analysis. Additional data that could add value includes the number of views of each post or the number of times posts were cited or shared outside of the site.