# Analysing Hacker News Posts

In this notebook we answer the two questions below:

* Do Ask HN or Show HN posts receive more comments on average?
* Do posts created at a certain time receive more comments on average?

## Steps

* Opening and reading the CSV file, converting it to a list of lists and displaying the first five rows;

In [1]:
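from csv import reader

# Read the whole CSV file into a list of lists (one list per row).
with open('hacker_news.csv', encoding='utf-8') as opened_file:
    hn = list(reader(opened_file))

# Display the first five rows (the first one is the header).
for i in range(0, 5):
    print(hn[i])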

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
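The cells below index columns by position (`row[4]` for `num_comments`, `row[6]` for `created_at`). As an alternative sketch, not the approach this notebook uses, `csv.DictReader` maps each row to the header names so columns can be read by name. The two data rows are copied from the output above, and `io.StringIO` stands in for the real `hacker_news.csv` file.

```python
import csv
import io

# Header plus two data rows copied from the output above.
# io.StringIO stands in for open('hacker_news.csv', encoding='utf-8').
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01\n"
)

# DictReader consumes the header row itself, so there is no header to slice off.
rows = list(csv.DictReader(sample_csv))

for row in rows:
    # Columns are looked up by name instead of by position.
    print(row['title'], int(row['num_comments']), row['created_at'])
```

Values still arrive as strings, so `int()` is needed before any arithmetic, exactly as with `reader`.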


* Separating the header row from the data;

In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
print()
for i in range(0,5):
    print(hn[i])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']
['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']
['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']
['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']
['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']


* Separating the Ask HN and Show HN posts from the other posts;

In [3]:
ask_posts = list()
show_posts = list()
other_posts = list()
for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
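The branching above can be wrapped in a small function, which makes the case-insensitive prefix match easy to check on single titles. `classify_title` is a hypothetical helper, not part of the notebook; like the loop, it lower-cases the title and tests the `ask hn` and `show hn` prefixes. The first two example titles are made up for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show' or 'other' for a Hacker News post title.

    Mirrors the loop above: lower-case the title, then test the prefixes.
    """
    title = title.lower()
    if title.startswith('ask hn'):
        return 'ask'
    if title.startswith('show hn'):
        return 'show'
    return 'other'


# Made-up titles plus one real title from the dataset.
for example in ['Ask HN: How do you learn?', 'SHOW HN: My side project',
                'Interactive Dynamic Video']:
    print(example, '->', classify_title(example))
```

Because the test is a plain prefix match, a title such as "Ask HNers ..." would also count as an Ask post.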

* Calculating the average number of comments for ask and show posts;

In [4]:
total_ask_comments = 0
for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
num_ask_posts = len(ask_posts)
avg_ask_comments = total_ask_comments / num_ask_posts
print('{:.2f}'.format(avg_ask_comments))

total_show_comments = 0
for row in show_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
num_show_posts = len(show_posts)
avg_show_comments = total_show_comments / num_show_posts
print('{:.2f}'.format(avg_show_comments))

14.04
10.32
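The two loops in the cell above differ only in the list they read. A sketch of a shared helper follows; `average_of_column` is a hypothetical name, and the column index is a parameter so the same function could also average `num_points` (index 3) for the points questions listed under the next steps. Returning 0.0 for an empty list instead of raising `ZeroDivisionError` is a design choice of this sketch.

```python
def average_of_column(posts, column):
    """Average of an integer-valued column over a list of CSV rows."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError on an empty list
    total = sum(int(row[column]) for row in posts)
    return total / len(posts)


NUM_POINTS, NUM_COMMENTS = 3, 4  # column positions in the header row

# Two rows copied from the output of cell [2].
sample = [
    ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'],
]
print('{:.2f}'.format(average_of_column(sample, NUM_COMMENTS)))  # 26.50
print('{:.2f}'.format(average_of_column(sample, NUM_POINTS)))    # 194.50
```

Called on `ask_posts` and `show_posts` with `NUM_COMMENTS`, it performs the same computation as the cell above.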


**First conclusion:** *Ask HN posts receive more comments on average (14.04) than Show HN posts (10.32).*

## More steps...

- Counting the Ask HN posts and summing their comments for each hour of creation;

In [5]:
import datetime as dt

# Pair each Ask HN post's creation time with its comment count.
result_list = list()
for row in ask_posts:
    result_list.append([row[6], int(row[4])])

counts_by_hour = dict()    # hour -> number of posts
comments_by_hour = dict()  # hour -> total comments
for row in result_list:
    created_at = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = created_at.strftime("%H")  # zero-padded hour, e.g. '00'
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
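The same grouping can be written without the `if hour not in ...` branch by using `collections.defaultdict`, which creates a zero entry the first time a key is seen. This is an alternative sketch, not the notebook's code. The timestamps and comment counts come from the rows printed in cell [2]; they are not Ask HN posts and only exercise the parsing. The names `sample_pairs`, `counts` and `comments` are chosen so that running the sketch does not overwrite the notebook's variables.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs taken from the rows printed in cell [2].
sample_pairs = [
    ['8/4/2016 11:52', 52],
    ['1/26/2016 19:30', 10],
    ['6/23/2016 22:20', 1],
    ['6/17/2016 0:01', 1],
    ['9/30/2015 4:12', 2],
]

counts = defaultdict(int)    # hour -> number of posts
comments = defaultdict(int)  # hour -> total comments

for created_at, num_comments in sample_pairs:
    # strptime accepts the non-padded month, day and hour used in this file;
    # strftime('%H') then returns a zero-padded key such as '00'.
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```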

- Calculating the average number of comments per post for each hour;

In [6]:
avg_by_hour = list()
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
for average in avg_by_hour:
    print(average)

['18', 13.20183486238532]
['12', 9.41095890410959]
['20', 21.525]
['22', 6.746478873239437]
['01', 11.383333333333333]
['14', 13.233644859813085]
['03', 7.796296296296297]
['16', 16.796296296296298]
['11', 11.051724137931034]
['19', 10.8]
['08', 10.25]
['23', 7.985294117647059]
['02', 23.810344827586206]
['15', 38.5948275862069]
['04', 7.170212765957447]
['09', 5.5777777777777775]
['13', 14.741176470588234]
['05', 10.08695652173913]
['17', 11.46]
['06', 9.022727272727273]
['07', 7.852941176470588]
['21', 16.009174311926607]
['10', 13.440677966101696]
['00', 8.127272727272727]
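Given the two dictionaries, the per-hour average can also be built with a single dictionary comprehension. The sketch below uses small hypothetical totals (not values from the dataset) so it runs on its own; in the notebook, `counts_by_hour` and `comments_by_hour` from cell [5] would take their place. Unlike cell [6], it produces a dictionary rather than a list of lists, and it prints the number of posts next to each average, since an hour with few posts can have its average pulled up by a single heavily commented post.

```python
# Hypothetical totals for two hours; in the notebook these come from cell [5].
sample_counts = {'14': 4, '15': 2}
sample_comments = {'14': 30, '15': 25}

# Average comments per post for each hour.
averages = {hour: sample_comments[hour] / sample_counts[hour]
            for hour in sample_comments}

# Show how many posts each average rests on.
for hour in sorted(averages):
    print('{}: {:.2f} comments/post over {} posts'.format(
        hour, averages[hour], sample_counts[hour]))
```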


- Swapping the hour and average columns, sorting in descending order of average comments and printing the top five hours;

In [12]:
swap_avg_by_hour = list()
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])  # [average, hour]
for row in swap_avg_by_hour:
    print(row)
print()
# Lists compare element by element, so this sorts by average, highest first.
sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print('Top 5 Hours for Ask Post Comments\n')
for i in range(0, 5):
    hour_object = dt.datetime.strptime(sorted_swap[i][1], "%H")
    hour_formatted = hour_object.strftime("%H:%M")
    print('{}: {:.2f} average comments per post'.format(hour_formatted, sorted_swap[i][0]))

[13.20183486238532, '18']
[9.41095890410959, '12']
[21.525, '20']
[6.746478873239437, '22']
[11.383333333333333, '01']
[13.233644859813085, '14']
[7.796296296296297, '03']
[16.796296296296298, '16']
[11.051724137931034, '11']
[10.8, '19']
[10.25, '08']
[7.985294117647059, '23']
[23.810344827586206, '02']
[38.5948275862069, '15']
[7.170212765957447, '04']
[5.5777777777777775, '09']
[14.741176470588234, '13']
[10.08695652173913, '05']
[11.46, '17']
[9.022727272727273, '06']
[7.852941176470588, '07']
[16.009174311926607, '21']
[13.440677966101696, '10']
[8.127272727272727, '00']

Top 5 Hours for Ask Post Comments

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
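Swapping the columns is needed only because `sorted` compares lists element by element, starting with the first. As an alternative sketch, a `key` function sorts the `[hour, average]` pairs by their second element directly. The four pairs are copied from the output of cell [6], and `sample_avg` is named so it does not overwrite the notebook's `avg_by_hour`.

```python
# A few [hour, average] pairs copied from the output of cell [6].
sample_avg = [
    ['20', 21.525],
    ['02', 23.810344827586206],
    ['15', 38.5948275862069],
    ['09', 5.5777777777777775],
]

# Sort by the average (index 1), highest first, then keep the top three.
top_hours = sorted(sample_avg, key=lambda pair: pair[1], reverse=True)[:3]

for hour, average in top_hours:
    # '15' -> '15:00' with plain string formatting instead of strptime/strftime.
    print('{}:00: {:.2f} average comments per post'.format(hour, average))
```

`operator.itemgetter(1)` would work as the key just as well as the lambda.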


**Second conclusion:** *Ask HN posts created at 15:00 receive more comments on average (38.59) than posts created at any other hour.*

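P.S.: The times above are in US Eastern Time. Under current rules, Brasília Time (BRT, UTC-3) is 1 hour ahead of Eastern Daylight Time (EDT, UTC-4) and 2 hours ahead of Eastern Standard Time (EST, UTC-5), so 15:00 Eastern corresponds to 16:00 BRT in the US summer and 17:00 BRT in the US winter.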
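Because the offset between US Eastern Time and BRT depends on daylight saving time, a single fixed offset is only right for part of the year. A sketch using the standard-library `zoneinfo` module (Python 3.9+) lets the IANA time-zone database handle this; it assumes that database is available on the system (or through the `tzdata` package), and `eastern_to_brt` is a hypothetical helper name. `America/New_York` and `America/Sao_Paulo` are the IANA zone names for the two regions.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo('America/New_York')
SAO_PAULO = ZoneInfo('America/Sao_Paulo')


def eastern_to_brt(created_at):
    """Convert an 'm/d/YYYY H:MM' US Eastern timestamp to Sao Paulo local time."""
    naive = datetime.strptime(created_at, '%m/%d/%Y %H:%M')
    # Attach the Eastern zone, then convert; zoneinfo applies the rules in
    # force on that date for both zones.
    return naive.replace(tzinfo=EASTERN).astimezone(SAO_PAULO)


# August 2016: New York on EDT (UTC-4), Sao Paulo on UTC-3, a 1-hour gap.
print(eastern_to_brt('8/4/2016 15:00').strftime('%H:%M'))
# January 2024: New York on EST (UTC-5), Sao Paulo on UTC-3, a 2-hour gap.
print(eastern_to_brt('1/10/2024 15:00').strftime('%H:%M'))
```

For dates inside Brazil's former daylight-saving period (abolished in 2019), such as January 2016, the gap is three hours, which is another reason to let the database do the conversion.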

## Conclusion

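After this analysis, we can conclude that Ask HN posts receive more comments on average than Show HN posts, and that Ask HN posts created at 15:00 US Eastern Time (16:00 BRT while the US observes daylight saving time) receive the most comments on average, so that is the hour to post for the best chance of attracting comments.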

## Next steps...

- Determine if show or ask posts receive more points on average.
- Determine if posts created at a certain time are more likely to receive more points.
- Compare your results to the average number of comments and points other posts receive.
- Use Dataquest's data science project style guide to format your project.