**Exploring Hacker News Posts**

In this project, we will analyse posts from the Hacker News website.

Hacker News is a site started by the startup incubator Y Combinator.

Users submit posts that other users can vote and comment on, similar to Reddit.

The aim of the analysis is to find which post type, Ask HN or Show HN, receives more comments on average, and at what time of day posts receive the most comments.

In [2]:
# Read the CSV file into a list of rows (the file is closed automatically)
from csv import reader

with open('hacker_news.csv', newline='') as open_file:
    hn = list(reader(open_file))

# Split the header row from the data rows
hn_header = hn[0]
hn = hn[1:]

print(hn_header)
print(hn[1:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
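Indexing rows by position (`row[1]`, `row[4]`, `row[6]`) works, but it is easy to pick the wrong column. A minimal alternative sketch, assuming the same header printed above, uses the standard library's `csv.DictReader` so each row becomes a dictionary keyed by column name. The helper `load_posts` is hypothetical, and `io.StringIO` stands in for the real file.

```python
import csv
import io


def load_posts(file_obj):
    """Return a list of dicts, one per data row, keyed by the CSV header.

    `load_posts` is a hypothetical helper for illustration; pass it any open
    text file (or file-like object) containing the Hacker News CSV.
    """
    return list(csv.DictReader(file_obj))


# Small in-memory sample that mimics the header printed above (made-up row)
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,5,3,alice,8/4/2016 15:05\n"
)
posts = load_posts(sample)
print(posts[0]["title"], int(posts[0]["num_comments"]))
```

With the real file, `with open('hacker_news.csv', newline='') as f: posts = load_posts(f)` would let later cells write `post['num_comments']` instead of `row[4]`.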


**Identifying the Number of Comments per Post Type**

In [3]:
# Separate posts whose titles begin with 'Ask HN' or 'Show HN' (case-insensitive) into their own lists

ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194


We are going to look at two types of posts: 'Ask HN' and 'Show HN'.

Ask HN posts ask the Hacker News community a specific question.

For example: 'Cheapest/ easiest way to host a static site?'

Show HN posts show the Hacker News community a project.

For example: 'I built a VS Code Theme Creator – easily make VS Code themes in browser'
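The classification rule used in the cell above can be wrapped in a small function, which makes it easy to check on individual titles. This is a sketch of the same case-insensitive prefix test; the name `classify_title` is hypothetical, and the "Ask HN:"/"Show HN:" prefixes are added to the example titles for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show' or 'other' for a post title.

    Mirrors the notebook's rule: a case-insensitive check of whether the
    title starts with 'Ask HN' or 'Show HN'. Hypothetical helper name.
    """
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


print(classify_title("Ask HN: Cheapest/ easiest way to host a static site?"))
print(classify_title("Show HN: I built a VS Code Theme Creator"))
print(classify_title("Technology ventures: From Idea to Enterprise"))
```

Because this is a plain prefix test, a title such as "Ask HNers: ..." would also count as an Ask HN post.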




In [4]:
# Calculating the average number of comments `Ask HN` posts receive
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments/len(ask_posts)
print(avg_ask_comments)

# Calculating the average number of comments `Show HN` posts receive
total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments/len(show_posts)
print(avg_show_comments)

14.038417431192661
10.31669535283993


Ask HN posts receive more comments on average than Show HN posts (about 14.0 versus 10.3 comments per post), so we will concentrate on Ask HN posts for the rest of the analysis.
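The two averaging loops above repeat the same logic. A sketch that factors it into one function is below; the name `average_comments` is hypothetical, `comment_index=4` assumes the column order printed earlier, and the sample rows are made up.

```python
def average_comments(posts, comment_index=4):
    """Mean number of comments over `posts` (rows from the CSV).

    `comment_index=4` assumes num_comments is the fifth field, as in the
    header printed earlier. Returns 0.0 for an empty list instead of raising
    ZeroDivisionError. Hypothetical helper.
    """
    if not posts:
        return 0.0
    total = sum(int(row[comment_index]) for row in posts)
    return total / len(posts)


# Made-up rows; only the num_comments column (index 4) matters here
ask_sample = [["1", "Ask HN: a", "", "1", "10", "u", "1/1/2016 15:00"],
              ["2", "Ask HN: b", "", "1", "20", "u", "1/1/2016 16:00"]]
print(average_comments(ask_sample))  # 15.0
print(average_comments([]))          # 0.0
```

With the notebook's lists, `average_comments(ask_posts)` and `average_comments(show_posts)` should reproduce the two values printed above.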

**Finding the Best Time of Day to Post**

First, for each hour of the day, we will count how many Ask HN posts were created and how many comments they received in total. Then we will calculate the average number of comments per post for each hour.

In [5]:
import datetime as dt

# Pair each Ask HN post's creation time with its comment count
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_of_comments = int(row[4])
    result_list.append([created_at, num_of_comments])

# Count posts and total comments for each hour of the day
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    created = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M')
    hour = created.strftime('%H')  # two-digit hour string, e.g. '15'

    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]

print(comments_by_hour)

{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
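The counting and averaging steps can also be written with `collections.defaultdict`, which removes the "is this hour already a key?" branch. A minimal sketch, assuming input shaped like `result_list` (a `created_at` string plus an integer comment count); the function name and sample rows are hypothetical. Unlike the notebook, it uses the integer `.hour` attribute rather than a `'%H'` string as the key.

```python
import datetime as dt
from collections import defaultdict


def average_comments_by_hour(rows):
    """Return {hour: average comments per post} for [created_at, n] pairs.

    `rows` has the same shape as `result_list`: a created_at string in
    '%m/%d/%Y %H:%M' format followed by an integer comment count.
    Hypothetical helper name.
    """
    counts = defaultdict(int)    # number of posts created in each hour
    comments = defaultdict(int)  # total comments on those posts
    for created_at, num_comments in rows:
        hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
        counts[hour] += 1
        comments[hour] += num_comments
    return {hour: comments[hour] / counts[hour] for hour in counts}


sample_rows = [["8/4/2016 15:05", 30], ["8/5/2016 15:40", 10], ["8/6/2016 2:10", 5]]
print(average_comments_by_hour(sample_rows))  # {15: 20.0, 2: 5.0}
```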


In [6]:
# Average number of comments per post for each hour
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

print(avg_by_hour)

[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]


**Sorting Average Comments from Highest to Lowest**

In [22]:
# Swap each pair to [average, hour] so sorting compares the averages first
swap_avg_by_hour = []

for row in avg_by_hour:
    hour = row[0]
    avg_hour = row[1]
    swap_avg_by_hour.append([avg_hour, hour])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)
print(sorted_swap)

[[38.5948275862069, '15'], [23.810344827586206, '02'], [21.525, '20'], [16.796296296296298, '16'], [16.009174311926607, '21'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [13.20183486238532, '18'], [11.46, '17'], [11.383333333333333, '01'], [11.051724137931034, '11'], [10.8, '19'], [10.25, '08'], [10.08695652173913, '05'], [9.41095890410959, '12'], [9.022727272727273, '06'], [8.127272727272727, '00'], [7.985294117647059, '23'], [7.852941176470588, '07'], [7.796296296296297, '03'], [7.170212765957447, '04'], [6.746478873239437, '22'], [5.5777777777777775, '09']]
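Swapping each pair is only needed because `sorted` compares lists element by element. Passing a `key` function sorts the original `[hour, average]` pairs by their average directly. The sample values below are rounded from the output above.

```python
avg_by_hour_sample = [["09", 5.58], ["15", 38.59], ["02", 23.81]]

# Sort [hour, average] pairs by the average (index 1), largest first,
# without building a swapped copy of the list.
ranked = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [['15', 38.59], ['02', 23.81], ['09', 5.58]]
```

One difference: `sorted` is stable, so hours with equal averages keep their original order, whereas the swapped version breaks ties by comparing the hour strings.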


In [20]:
print("Hours Ranked by Average Comments per Ask HN Post")

template = "{}: {:,.2f} average comments per post"
for avg, hour in sorted_swap:
    # Turn the two-digit hour string (e.g. '15') into an 'HH:MM' label
    time_label = dt.datetime.strptime(hour, '%H').strftime('%H:%M')
    print(template.format(time_label, avg))

Hours Ranked by Average Comments per Ask HN Post
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
18:00: 13.20 average comments per post
17:00: 11.46 average comments per post
01:00: 11.38 average comments per post
11:00: 11.05 average comments per post
19:00: 10.80 average comments per post
08:00: 10.25 average comments per post
05:00: 10.09 average comments per post
12:00: 9.41 average comments per post
06:00: 9.02 average comments per post
00:00: 8.13 average comments per post
23:00: 7.99 average comments per post
07:00: 7.85 average comments per post
03:00: 7.80 average comments per post
04:00: 7.17 average comments per post
22:00: 6.75 average comments per post
09:00: 5.58 average comments per post
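If only the highest-ranked hours are wanted, for example a top-five list, `heapq.nlargest` returns them already ordered from largest to smallest without sorting the whole list (slicing `sorted_swap[:5]` would also work). The sample values are rounded from the output above.

```python
import heapq

avg_by_hour_sample = [["09", 5.58], ["13", 14.74], ["15", 38.59],
                      ["02", 23.81], ["20", 21.52], ["16", 16.80], ["21", 16.01]]

# nlargest returns the 5 pairs with the largest average (index 1),
# ordered from largest to smallest.
top_five = heapq.nlargest(5, avg_by_hour_sample, key=lambda pair: pair[1])
for hour, avg in top_five:
    print(f"{hour}:00: {avg:.2f} average comments per post")
```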


Ask HN posts created during the 15:00 hour receive the most comments, averaging 38.59 comments per post. That is roughly 14.8 more comments per post than the second-best hour, 02:00 (23.81).

**Conclusion**

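We first compared Ask HN and Show HN posts and found that Ask HN posts receive roughly 3.7 more comments per post on average (14.04 versus 10.32). We then concentrated on Ask HN posts and found that those created during the 15:00 hour receive the most comments on average, in the time zone used by the dataset's `created_at` timestamps.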
