# Exploring Hacker News Posts

In this project we'll be working with a dataset of submissions to the popular technology site Hacker News.

Column Descriptions:
- id: the unique identifier from Hacker News for the post
- title: the title of the post
- url: the URL that the posts links to, if the post has a URL
- num_points: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
- num_comments: the number of comments on the post
- author: the username of the person who submitted the post
- created_at: the date and time of the post's submission

We are specifically interested in the Ask HN and Show HN posts. We want to determine if these types of posts attract more interactions.

As a bonus we also want to evaluate if posts created at certain times respond to more comments on average

In [1]:
# read in the data
from csv import reader
opened_file = open('hacker_news.csv')
read_file = reader(opened_file)
hn = list(read_file)
headers = hn[0]
hn = hn[1:]

## Extracting Ask HN and Show HN Posts

In [2]:
# identify categories of posts and append to corresponding lists
ask_posts = []
show_posts =[]
other_posts = []

for post in hn:
    title = post[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)
        
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))


1744
1162
17194


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [3]:
# Calculate the average number of comments 'Ask HN' posts receive.
total_ask_comments = 0

for post in ask_posts:
    total_ask_comments += int(post[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

14.038417431192661


In [4]:
# Calculate average number of 'Show HN' posts and find average
total_show_comments = 0

for post in show_posts:
    total_show_comments += int(post[4])
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)

10.31669535283993


From our results we can see that on average Ask HN posts are receiving more comments on average with 14.03, in comparison to 10.31 average from Show HN.

## Finding the Number of Ask Posts and Comments by Hour Created

In [5]:
# add the time and date of the post along with the number of comments
import datetime as dt

result_list = []
for post in ask_posts:
    result_list.append([post[6], int(post[4])])

counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"

for result in result_list:
    date = result[0]
    comment = result[1]
    hour = dt.datetime.strptime(date, date_format).strftime("%H")
    
    if hour not in result_list:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment
    else:
        counts_by_hour += 1
        comments_by_hour += comment


## Calculating the Average Number of Comments for Ask HN Posts by Hour

Use the dictionaries created above to calculate the average number of comments for posts created during each hour of the day

In [7]:
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
    
print(avg_by_hour)

[['09', 2.0], ['13', 13.0], ['10', 1.0], ['14', 18.0], ['16', 2.0], ['23', 2.0], ['12', 3.0], ['17', 5.0], ['15', 1.0], ['21', 8.0], ['20', 9.0], ['02', 6.0], ['18', 199.0], ['03', 1.0], ['05', 2.0], ['19', 2.0], ['01', 4.0], ['22', 1.0], ['08', 2.0], ['04', 2.0], ['00', 15.0], ['06', 22.0], ['07', 1.0], ['11', 29.0]]


## Sorting and Printing Values from a List of Lists

Need to sort our list of lists to be in a more readable format to express our conclusions on the dataset

In [13]:
# swapping values to sort so that we can compare comment nums
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for 'Ask HN' Comments")
for avg, hr in sorted_swap[:5]:
    print(
        "{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"),avg
        )
    )
    
    

[[2.0, '09'], [13.0, '13'], [1.0, '10'], [18.0, '14'], [2.0, '16'], [2.0, '23'], [3.0, '12'], [5.0, '17'], [1.0, '15'], [8.0, '21'], [9.0, '20'], [6.0, '02'], [199.0, '18'], [1.0, '03'], [2.0, '05'], [2.0, '19'], [4.0, '01'], [1.0, '22'], [2.0, '08'], [2.0, '04'], [15.0, '00'], [22.0, '06'], [1.0, '07'], [29.0, '11']]
Top 5 Hours for 'Ask HN' Comments
18:00: 199.00 average comments per post
11:00: 29.00 average comments per post
06:00: 22.00 average comments per post
14:00: 18.00 average comments per post
00:00: 15.00 average comments per post


The hour that receives the most comments per post on average is 15:00, with an average of 38.59 comments per post.

The timezone is set to Eastern Time in the US, so to get this level of attention we would need to post at 11pm UK time.