# Hacker News data set

In this notebook, we analyze the Hacker News data set to find out which kinds of posts receive the most comments, and at which hours of the day posts attract the most comments.

In [1]:
from csv import reader

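# Use a context manager so the file is closed once it has been read.
with open('hacker_rank.csv') as open_file:
    read_file = reader(open_file)
    hn = list(read_file)

header = hn[0]  # first row holds the column names
hn = hn[1:]     # remaining rows are the posts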

# for row in hn:
#     print(row)
#     print('\n')

Now that we have our data imported, we're going to start filtering it. We will be analyzing two kinds of posts from Hacker News: the *Ask Hacker News* and *Show Hacker News* posts, whose titles start with `Ask HN` and `Show HN`. For now, we're going to split the data set into separate lists so that we can manipulate and analyze them separately.

In [2]:
ask_hn = []
show_hn = []
rest_hn = []
for row in hn:
    check = row[1].lower()
    if check.startswith('ask hn'):
        ask_hn.append(row)
    elif check.startswith('show hn'):
        show_hn.append(row)
    else:
        rest_hn.append(row)
        
print(len(ask_hn))
print(len(show_hn))
print(len(rest_hn))



1744
1162
17194


Great! Now we can see how many rows we have for each type of post. Next, we are going to check which kind of post gets more engagement, using the number of comments as our metric.

In [3]:
ask_hn_comments = 0
show_hn_comments = 0
rest_hn_comments = 0

for row in ask_hn:
    num = int(row[4])
    ask_hn_comments += num
    
ask_avg_comments = ask_hn_comments / len(ask_hn)
print('Ask Hacker News average comments: ' + str(ask_avg_comments))

for row in show_hn:
    num = int(row[4])
    show_hn_comments += num
    
show_avg_comments = show_hn_comments / len(show_hn)
print('Show Hacker News average comments: ' + str(show_avg_comments))

for row in rest_hn:
    num = int(row[4])
    rest_hn_comments += num
    
rest_avg_comments = rest_hn_comments / len(rest_hn)
print('Rest of Hacker News average comments: ' + str(rest_avg_comments))


Ask Hacker News average comments: 14.038417431192661
Show Hacker News average comments: 10.31669535283993
Rest of Hacker News average comments: 26.8730371059672
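
The three loops above repeat the same steps for each list. As a minimal sketch, the computation could be wrapped in a helper. The function name `average_comments` and the sample rows are hypothetical and not part of the original notebook; the only thing taken from the notebook is that the comment count sits at index 4 of each row.

```python
def average_comments(rows, comments_index=4):
    """Return the mean comment count over rows (0.0 for an empty list)."""
    if not rows:
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Hypothetical rows: only the value at index 4 (the comment count) matters here.
sample_rows = [
    ['1', 'Ask HN: first example', 'url', '5', '10'],
    ['2', 'Ask HN: second example', 'url', '3', '4'],
]

print(average_comments(sample_rows))
```

With the real lists, the same call would be `average_comments(ask_hn)`, `average_comments(show_hn)`, and `average_comments(rest_hn)`.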


By analyzing our findings, we can see that ask posts receive more comments on average than show posts (about 14.0 versus 10.3). Both types trail the rest of the posts, which average about 26.9 comments. Let's now focus on Ask Hacker News posts and see what else we can find about them.

We are now going to analyze Ask Hacker News posts by the hour they were created. Let's see at which hours posts receive the most comments.

In [4]:
import datetime as dt

ask_hn_time = []

for row in ask_hn:
    # Assumes the timestamp is written month/day/year with a 24-hour time.
    date = dt.datetime.strptime(row[6], '%m/%d/%Y %H:%M')
    comment = row[4]
    new_list = [date, comment]
    ask_hn_time.append(new_list)

    
post_hour = {}
comments_hour = {}

for row in ask_hn_time:
    hour = row[0].hour
    comment = int(row[1])
    if hour not in post_hour:
        post_hour[hour] = 1
        comments_hour[hour] = comment
    else:
        post_hour[hour] += 1
        comments_hour[hour] += comment
    
print(comments_hour)

{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
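
To make the parsing step concrete, here is a minimal sketch of how `strptime` turns a timestamp string into a `datetime` object whose `hour` attribute can be used as a dictionary key. It assumes timestamps are written month/day/year followed by a 24-hour time; the sample string is hypothetical and not taken from the data set.

```python
import datetime as dt

# Hypothetical timestamp in the assumed month/day/year layout.
sample_created_at = '8/4/2016 15:30'

parsed = dt.datetime.strptime(sample_created_at, '%m/%d/%Y %H:%M')

print(parsed.hour)               # hour of the day on a 24-hour clock
print(parsed.month, parsed.day)  # month and day read from the string
```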


The loop over <code>ask_hn_time</code> builds two dictionaries: <code>post_hour</code>, which holds the number of ask posts created in each hour of the day, and <code>comments_hour</code>, which holds the total number of comments those posts received. Now, we are going to create a list with the average number of comments per post for each hour of the day.

In [36]:
avg_comments_per_hour = []

    
for post in post_hour:
    avg_comments_per_hour.append([post, comments_hour[post]/post_hour[post]])
    
swap_avg = []

for row in avg_comments_per_hour:
    first = row[0]
    second = row[1]
    swap_avg.append([second,first])
    
swap_avg.sort(reverse=True)

print('Average comments per post by hour (highest first):')
for row in swap_avg:
    comment = round(row[0],0)
    time = row[1]
    print('# of comments : ' + str(comment) + "        " + 'Time: ' + str(time))

Average comments per post by hour (highest first):
# of comments : 39.0        Time: 15
# of comments : 24.0        Time: 2
# of comments : 22.0        Time: 20
# of comments : 17.0        Time: 16
# of comments : 16.0        Time: 21
# of comments : 15.0        Time: 13
# of comments : 13.0        Time: 10
# of comments : 13.0        Time: 14
# of comments : 13.0        Time: 18
# of comments : 11.0        Time: 17
# of comments : 11.0        Time: 1
# of comments : 11.0        Time: 11
# of comments : 11.0        Time: 19
# of comments : 10.0        Time: 8
# of comments : 10.0        Time: 5
# of comments : 9.0        Time: 12
# of comments : 9.0        Time: 6
# of comments : 8.0        Time: 0
# of comments : 8.0        Time: 23
# of comments : 8.0        Time: 7
# of comments : 8.0        Time: 3
# of comments : 7.0        Time: 4
# of comments : 7.0        Time: 22
# of comments : 6.0        Time: 9
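
The `Time` values in this output are hours on a 24-hour clock. As a minimal sketch, they could be turned into 12-hour labels for easier reading. The helper name `hour_label` is hypothetical and not part of the original notebook.

```python
def hour_label(hour):
    """Format an hour of the day (0-23) as a 12-hour clock label."""
    suffix = 'AM' if hour < 12 else 'PM'
    hour_12 = hour % 12 or 12  # 0 and 12 both display as 12
    return '{}:00 {}'.format(hour_12, suffix)


print(hour_label(15))  # 3:00 PM
print(hour_label(9))   # 9:00 AM
```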


As we can see, ask posts created during the 3:00 PM hour (15) receive the most comments on average, about 39 per post, while posts created during the 9:00 AM hour receive the fewest, about 6 per post. This will be everything for this project. Thanks for reading!