# Hacker News Commenting

This project analyzes Hacker News posts whose titles begin with `Ask HN` or `Show HN`. It aims to answer two questions: "Do `Ask HN` or `Show HN` posts receive more comments on average?" and "Do posts created at a certain time receive more comments on average?"

The original dataset for this project can be downloaded from [Kaggle](https://www.kaggle.com/hacker-news/hacker-news-posts); the project uses a reduced dataset of 20,000 posts from the original 300,000. The columns include the post title, its URL, the number of upvotes, the number of received comments, the author name, and the date and time the post was created in US Eastern Standard Time (EST).

We begin by opening the CSV file and reading it as a list of lists, `hn`. The header row is stored separately in `headers` and removed from `hn`, so that only posts remain for filtering.

In [32]:
from csv import reader
opened_file = open("hacker_news.csv", encoding="utf8")
read_file = reader(opened_file)
hn = list(read_file)

# removing the header row
headers = hn[0]
hn = hn[1:]
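
The cell above leaves the file handle open after reading. As a minimal sketch, the same read can be wrapped in a context manager so the handle is closed automatically. To stay runnable on its own, the example parses a small in-memory sample instead of `hacker_news.csv`; the header names and the two rows are invented for illustration and are not taken from the dataset.

```python
import io
from csv import reader

# Hypothetical stand-in for hacker_news.csv; header names and rows are made up.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,2,6,alice,8/16/2016 9:55\n"
    "2,Show HN: Example project,http://example.com,5,3,bob,11/22/2015 13:43\n"
)

# The with block closes the file-like object as soon as reading is done.
with sample_csv as opened_file:
    rows = list(reader(opened_file))

sample_headers = rows[0]  # header row
sample_hn = rows[1:]      # data rows only

print(sample_headers)
print(len(sample_hn))
```

With the real file, the first line of the block would become `with open("hacker_news.csv", encoding="utf8") as opened_file:` and the rest would stay the same.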

## Extracting Ask HN and Show HN posts

We only want to examine posts beginning with `Ask HN` and `Show HN`, so we'll create three new lists, one for each category: `ask_posts`, `show_posts`, and `other_posts`. To do this, we'll lowercase each title and use the `startswith` string method to check whether it begins with `ask hn` or `show hn`.

In [33]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
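
The prefix check can also be isolated in a small function, which makes it easy to try on individual titles. This is only a sketch: `classify_title` is a hypothetical helper name, and the sample titles are invented rather than drawn from the dataset.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles, not taken from the dataset.
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "Why I still use plain text",
    "Please ask HN before posting",
]
labels = [classify_title(t) for t in sample_titles]
print(labels)  # ['ask', 'show', 'other', 'other']
```

The last title shows why `startswith` is used instead of a substring test: a title that merely mentions `ask HN` somewhere in the middle is not counted as an ask post.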

## Average Number of Comments for Ask HN and Show HN posts

Now we'll see whether ask or show posts receive more comments on average. We'll sum the `num_comments` column for each category and divide by the number of posts in that category to get the average.

In [34]:
total_ask_comments = 0
total_show_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_show_comments = total_show_comments / len(show_posts)

print(f"An ask HN post receives an average of {avg_ask_comments:.2f} comments.")
print(f"A show HN post receives an average of {avg_show_comments:.2f} comments.")

An ask HN post receives an average of 14.04 comments.
A show HN post receives an average of 10.32 comments.


Ask posts receive more comments than show posts on average. This might be because the community is more receptive to people who are willing to post questions, or because the questions in ask posts allow for more discussion. We'll focus on ask posts from here on since they receive more comments on average.
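
The two averaging loops follow the same pattern, so they could be folded into one function. The sketch below assumes the comment count sits at index 4, as in the cells above; `average_comments` is a hypothetical helper name and the sample rows are invented.

```python
def average_comments(posts, comments_index=4):
    """Mean of the comment counts in `posts`; 0.0 for an empty list."""
    if not posts:
        return 0.0
    return sum(int(row[comments_index]) for row in posts) / len(posts)


# Hypothetical rows in the same column order as the dataset.
sample_posts = [
    ["1", "Ask HN: A?", "", "2", "6", "alice", "8/16/2016 9:55"],
    ["2", "Ask HN: B?", "", "1", "10", "bob", "8/16/2016 10:05"],
]
print(f"{average_comments(sample_posts):.2f}")  # 8.00
print(average_comments([]))                     # 0.0
```

Returning `0.0` for an empty list is a design choice that avoids a `ZeroDivisionError` if a category has no posts; raising an error would be an equally reasonable alternative.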

## Number of Ask Posts and Comments by Hour Created

Now we'll see if ask posts created at a certain time receive more comments. For each hour of the day, we'll count the number of ask posts created and the comments they received. Then we'll calculate the average number of comments per ask post for each hour of creation.

In [35]:
# number of ask posts created and number of comments
import datetime as dt
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = row[0]
    hour_min = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    just_hour = hour_min.strftime("%H")
    if just_hour not in counts_by_hour:
        counts_by_hour[just_hour] = 1
        comments_by_hour[just_hour] = row[1]
    else:
        counts_by_hour[just_hour] += 1
        comments_by_hour[just_hour] += row[1]
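
The `if`/`else` bookkeeping above can be shortened with `collections.defaultdict`, which supplies a starting value of 0 for any hour not seen yet. The following is a sketch on invented `[created_at, num_comments]` pairs that use the same date format as the dataset; the `sample_*` names are hypothetical.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical [created_at, num_comments] pairs in the dataset's date format.
sample_results = [
    ["8/16/2016 9:55", 6],
    ["11/22/2015 13:43", 29],
    ["5/2/2016 9:14", 4],
]

sample_counts = defaultdict(int)    # missing hours start at 0
sample_comments = defaultdict(int)

for created_at, num_comments in sample_results:
    # Parse the full timestamp, then keep only the zero-padded hour.
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    sample_counts[hour] += 1
    sample_comments[hour] += num_comments

print(dict(sample_counts))    # {'09': 2, '13': 1}
print(dict(sample_comments))  # {'09': 10, '13': 29}
```

Note that `strptime` accepts the unpadded `9:55`, while `strftime("%H")` always returns a two-digit string such as `09`, which is why the dictionary keys are zero-padded.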

In [36]:
# getting the average number of 
# comments for ask posts by hour

avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

print(f"Average Number of Comments per Post for Each Hour: {avg_by_hour}")

Average Number of Comments per Post for Each Hour: [['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]


## Top 5 Hours for Posting

To make it easier to identify the hours with the highest average number of comments, we'll swap the columns so that the average comes first and then sort the list of lists in descending order.

In [37]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(f"Hour and Average Swapped: {swap_avg_by_hour}")

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

Hour and Average Swapped: [[5.5777777777777775, '09'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [16.796296296296298, '16'], [7.985294117647059, '23'], [9.41095890410959, '12'], [11.46, '17'], [38.5948275862069, '15'], [16.009174311926607, '21'], [21.525, '20'], [23.810344827586206, '02'], [13.20183486238532, '18'], [7.796296296296297, '03'], [10.08695652173913, '05'], [10.8, '19'], [11.383333333333333, '01'], [6.746478873239437, '22'], [10.25, '08'], [7.170212765957447, '04'], [8.127272727272727, '00'], [9.022727272727273, '06'], [7.852941176470588, '07'], [11.051724137931034, '11']]
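
Swapping the columns is one way to sort by the average. As an alternative sketch, `sorted` accepts a `key` function, so the `[hour, average]` pairs can be ordered directly. The four pairs below are copied from the output above; `sample_avg_by_hour` and `sample_sorted` are hypothetical names.

```python
# A few [hour, average] pairs copied from the output above.
sample_avg_by_hour = [
    ["09", 5.5777777777777775],
    ["15", 38.5948275862069],
    ["02", 23.810344827586206],
    ["20", 21.525],
]

# Sort by the average (index 1), highest first, without swapping columns.
sample_sorted = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in sample_sorted:
    print(f"{hour}:00 {avg:.2f}")
```

The two approaches differ only when averages tie: the swapped version breaks ties by comparing the hour strings, whereas the `key` version keeps tied items in their original order.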


In [38]:
print("Top 5 Hours for Ask Posts Comments: ")

for row in sorted_swap[:5]:
    hour = row[1]
    hour_format = dt.datetime.strptime(hour, "%H")
    hour = hour_format.strftime("%H:%M")
    print(f"{hour}: {row[0]:.2f} average comments per post")

Top 5 Hours for Ask Posts Comments: 
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


To receive the most comments, I should post at 3 PM EST. Although 2 AM is the runner-up, it is too late at night for me to post. Posting at 8 PM would be better: it's not too late, and its average (21.52 comments) is close to the 2 AM average (23.81).
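
Because the timestamps are in EST, the recommended hour needs converting for anyone posting from another time zone. The sketch below models EST as a fixed UTC-5 offset, which is an assumption that ignores daylight saving time, and converts 3 PM EST to UTC. The calendar date is arbitrary and chosen only to build a complete `datetime`.

```python
import datetime as dt

# EST modeled as a fixed UTC-5 offset (assumption: no daylight saving adjustment).
EST = dt.timezone(dt.timedelta(hours=-5), "EST")

# The calendar date is arbitrary; only the time of day matters here.
post_time_est = dt.datetime(2016, 1, 15, 15, 0, tzinfo=EST)
post_time_utc = post_time_est.astimezone(dt.timezone.utc)

print(post_time_est.strftime("%H:%M %Z"))  # 15:00 EST
print(post_time_utc.strftime("%H:%M %Z"))  # 20:00 UTC
```

Passing a different `tzinfo` to `astimezone` would give the equivalent local posting time for another zone.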