# Exploring Hacker News Posts 


Hacker News is a website where users submit posts that other users can vote and comment on. It is well known within technology circles, so a post that becomes popular can receive numerous views and responses.

We have a data set that contains the following information:
- id: The unique identifier from Hacker News for the post
- title: The title of the post
- url: The URL that the post links to, if the post has a URL
- num_points: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- num_comments: The number of comments that were made on the post
- author: The username of the person who submitted the post
- created_at: The date and time at which the post was submitted
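
As a rough illustration of how these columns map onto a parsed row, the sketch below reads a single record (the first one shown in the data preview) with `csv.DictReader` and converts the two count columns to integers. The in-memory `sample_csv` string and the helper name `parse_row` are assumptions made for illustration; they are not part of the original analysis.

```python
import csv
import io

# Hypothetical in-memory sample: the header plus one record from the data set.
sample_csv = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

def parse_row(row):
    """Return a copy of the row with the count columns converted to int."""
    parsed = dict(row)
    parsed["num_points"] = int(row["num_points"])
    parsed["num_comments"] = int(row["num_comments"])
    return parsed

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(sample_csv))]
print(rows[0]["title"], rows[0]["num_points"], rows[0]["num_comments"])
```

Every value comes out of the CSV reader as a string, which is why the analysis converts `num_comments` with `int()` before summing.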

Posts with Ask HN in the title are those where users ask the Hacker News community a question; posts with Show HN in the title are those where users show an interesting project or product.

We would like to explore the Hacker News data set and answer the following questions:
- Do Ask HN or Show HN posts receive more comments on average?
- On average, does the number of comments depend on when a post was created?

## Extracting Ask HN and Show HN posts

In [27]:
from csv import reader

# Read the whole file into a list of rows; the with block closes the file.
with open('hacker_news.csv') as opened_file:
    hacker = list(reader(opened_file))

hacker_header = hacker[0]     # column names
hacker_noheader = hacker[1:]  # data rows

print(hacker_header)

for item in hacker_noheader[0:5]:
    print(item)
    print('\n')

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']




In [2]:
ask_posts = []
show_posts = []
other_posts = []

for row in hacker_noheader:
    title = row[1]
    lowercase_title = title.lower()
    
    if lowercase_title.startswith('ask hn'):
        ask_posts.append(row)
    elif lowercase_title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
num_ask = len(ask_posts)
num_show = len(show_posts)
num_other = len(other_posts)
print('Number of ask HN posts: ', num_ask)
print('Number of show HN posts: ', num_show)
print('Number of other posts: ', num_other)    

Number of ask HN posts:  1744
Number of show HN posts:  1162
Number of other posts:  17194
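
For context, the counts above can be expressed as shares of the whole data set. A minimal sketch, with the three counts hard-coded from the printed output rather than recomputed from the file:

```python
# Counts copied from the printed output of the classification step.
num_ask, num_show, num_other = 1744, 1162, 17194
total = num_ask + num_show + num_other

shares = {}
for label, count in [("Ask HN", num_ask), ("Show HN", num_show), ("Other", num_other)]:
    shares[label] = 100 * count / total
    print("{}: {:.2f}% of {} posts".format(label, shares[label], total))
```

Ask HN and Show HN posts together make up a small minority of the data set, which is worth keeping in mind when comparing their averages.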


## Calculating the average number of comments for Ask HN and Show HN posts

In [3]:
total_ask_comments = 0 

for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / num_ask
print('The average number of comments of ask HN post is: ', avg_ask_comments)


total_show_comments = 0 

for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / num_show
print('The average number of comments of show HN post is: ', avg_show_comments)
    


The average number of comments of ask HN post is:  14.038417431192661
The average number of comments of show HN post is:  10.31669535283993


On average, Ask HN posts receive about 4 more comments than Show HN posts (roughly 14 versus 10).
This is reasonable: the author of an Ask HN post is asking a question and seeking answers, so other users are more likely to comment.
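
The two averaging loops in this section follow the same pattern, so one option is to factor them into a small helper. The sketch below is only an illustration: the function name `average_comments`, its `comments_index` parameter, and the toy rows are hypothetical and do not come from the data set.

```python
def average_comments(posts, comments_index=4):
    """Mean of the comment counts stored as strings at comments_index."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)

# Toy rows in the same column order as the data set (values are made up).
toy_posts = [
    ["1", "Ask HN: example A", "", "5", "10", "user_a", "1/1/2016 9:00"],
    ["2", "Ask HN: example B", "", "3", "4", "user_b", "1/1/2016 15:30"],
]
print(average_comments(toy_posts))  # (10 + 4) / 2
```

With such a helper, `average_comments(ask_posts)` and `average_comments(show_posts)` would replace the two explicit loops.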

Now that we have established that Ask HN posts receive more comments on average, the rest of the analysis will focus on these posts.

## Counting Ask HN posts and comments by hour of creation

Now we would like to investigate whether posts created at a certain time of day are more likely to attract comments.
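
The `created_at` values are strings such as `8/4/2016 11:52`, so the hour has to be parsed out of each one. A minimal sketch of that single step, using two timestamps from the data preview: `strptime` accepts the unpadded month, day, and hour, and `strftime("%H")` returns the hour as a zero-padded two-digit string.

```python
import datetime as dt

# Timestamps copied from the data preview; month, day and hour are not zero-padded.
samples = ["8/4/2016 11:52", "6/17/2016 0:01"]

hours = []
for date_str in samples:
    parsed = dt.datetime.strptime(date_str, "%m/%d/%Y %H:%M")
    hours.append(parsed.strftime("%H"))  # zero-padded two-digit hour

print(hours)
```

Because the hour is kept as a string, the dictionary keys used for grouping are `'00'` through `'23'`.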

In [21]:
import datetime as dt 

result_list = []

for post in ask_posts:
    time_comment = [post[6], int(post[4])]
    result_list.append(time_comment)
    
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_str = row[0]
    date_time = dt.datetime.strptime(date_str, "%m/%d/%Y %H:%M")
    hour_str = date_time.strftime("%H")
    
    if hour_str not in counts_by_hour:
        counts_by_hour[hour_str] = 1
        comments_by_hour[hour_str] = row[1]
    else:
        counts_by_hour[hour_str] += 1
        comments_by_hour[hour_str] += row[1]

avg_hour = []

for key in comments_by_hour:
    avg_comments_hour = comments_by_hour[key] / counts_by_hour[key]
    avg_hour.append([key,avg_comments_hour])    

for i in avg_hour:
    print(i)

['09', 5.5777777777777775]
['13', 14.741176470588234]
['10', 13.440677966101696]
['14', 13.233644859813085]
['16', 16.796296296296298]
['23', 7.985294117647059]
['12', 9.41095890410959]
['17', 11.46]
['15', 38.5948275862069]
['21', 16.009174311926607]
['20', 21.525]
['02', 23.810344827586206]
['18', 13.20183486238532]
['03', 7.796296296296297]
['05', 10.08695652173913]
['19', 10.8]
['01', 11.383333333333333]
['22', 6.746478873239437]
['08', 10.25]
['04', 7.170212765957447]
['00', 8.127272727272727]
['06', 9.022727272727273]
['07', 7.852941176470588]
['11', 11.051724137931034]


In [22]:
swap_avg_by_hour = []

for row in avg_hour:
    swap_avg_by_hour.append([row[1],row[0]])

sorted_avg_hour = sorted(swap_avg_by_hour,reverse = True)

print('Top 5 hours for Ask HN post comments')
for_str = "{}:00: {:.2f} average comments per post."

for row in sorted_avg_hour[0:5]:
    print(for_str.format(row[1],row[0]))

Top 5 hours for Ask HN post comments
15:00: 38.59 average comments per post.
02:00: 23.81 average comments per post.
20:00: 21.52 average comments per post.
16:00: 16.80 average comments per post.
21:00: 16.01 average comments per post.
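
The swap-then-sort step can also be written with a `key` function, which avoids building a second list. The sketch below is an alternative formulation, not the approach used in the cell above; it hard-codes a handful of `[hour, average]` pairs copied from the per-hour output so that it runs on its own.

```python
# A few [hour, average] pairs copied from the per-hour output.
avg_hour = [
    ["09", 5.5777777777777775],
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
    ["16", 16.796296296296298],
    ["21", 16.009174311926607],
]

# Sort by the average (second element) without building a swapped copy.
top_hours = sorted(avg_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_hours:
    print("{}:00: {:.2f} average comments per post.".format(hour, avg))
```

Sorting on `pair[1]` alone also means that ties between averages are never broken by comparing the hour strings, which the swapped-list version would do.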
