# Ask HN vs Show HN: What gets more attention?

In this project, we explore a dataset of posts submitted to the well-known technology news website [Hacker News](https://news.ycombinator.com/).

The data consists of the following fields:

- *id* - A unique identifier for the post
- *title* - The title of the post
- *url* - The URL the post links to
- *num_points* - The difference between the upvotes and the downvotes the post acquired
- *num_comments* - The number of comments the post acquired
- *author* - The writer of the post
- *created_at* - The date and time of the post

We want to answer two questions:

1. Do Ask HN or Show HN posts receive more comments on average?
2. Do posts created at a certain time receive more comments on average?

Let's first read the file in and print the first five rows.

In [2]:
from csv import reader

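# Read the whole CSV into a list of rows; the with-block closes the file
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

# Print the first five rows (header included), each followed by a blank line
for post in hn[:5]:
    print(post)
    print("")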

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']



We'll remove the header row from the list so that we don't have to worry about it later. We keep it in a separate variable, so we still know what the column names are.

In [3]:
headers = hn[0]
hn = hn[1:]
print(headers)

# Print the first five data rows, each followed by a blank line
for post in hn[:5]:
    print(post)
    print("")

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']

['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']

['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']

['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']

['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
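
As an alternative to slicing the list, the header can be consumed from the reader before the remaining rows are collected. The sketch below is only an illustration: it reads from an in-memory string (an assumption, standing in for `hacker_news.csv`) that holds the header and two rows shown above, and the names `sample_csv`, `sample_headers`, and `sample_posts` are hypothetical.

```python
import csv
import io

# In-memory stand-in for hacker_news.csv (assumption: same column layout)
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,"
    "https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,"
    "3,1,hswarna,6/17/2016 0:01\n"
)

rows = csv.reader(sample_csv)
sample_headers = next(rows)  # the first row is the header
sample_posts = list(rows)    # everything left is data

print(sample_headers)
print(len(sample_posts))
```

Either approach leaves the header and the data in separate variables; this one simply avoids copying the list.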



### Splitting the Data Up

We will now split the data into Ask HN posts, Show HN posts, and all other posts, so that we can focus on the first two groups.

In [4]:
ask_posts = []
show_posts = []
other_posts = []

for post in hn:
    title = post[1]
    
    if title.lower().startswith("ask hn"):
        ask_posts.append(post)
    elif title.lower().startswith("show hn"):
        show_posts.append(post)
    else:
        other_posts.append(post)

print("Number of Ask HN posts: ")
print(len(ask_posts))
print("\n")

print("Number of Show HN posts: ")
print(len(show_posts))
print("\n")

print("Number of other posts: ")
print(len(other_posts))

Number of Ask HN posts: 
1744


Number of Show HN posts: 
1162


Number of other posts: 
17194


As can be seen, Ask HN and Show HN posts are far outnumbered by other posts.

Even so, with 1,744 Ask HN posts and 1,162 Show HN posts, we have a focused subset that is still large enough to answer our questions.
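
The prefix check used in the split can also be wrapped in a small function, which makes the rule easy to test on its own. This is a minimal sketch rather than part of the original analysis: `classify_post` is a hypothetical helper, and the two "Ask HN" and "Show HN" titles are invented for illustration.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# The first two titles are hypothetical; the third appears in the data above
print(classify_post("Ask HN: How do you stay focused?"))
print(classify_post("SHOW HN: My weekend project"))
print(classify_post("Interactive Dynamic Video"))
```

Lowercasing the title first makes the match case-insensitive, which is the same choice the loop above makes.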

## Question 1: Do Ask HN or Show HN posts receive more comments on average?

In [13]:
total_ask_comments = 0

for post in ask_posts:
    comments = post[4]
    comments = int(comments)
    total_ask_comments += comments

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_ask = "Average number of comments on Ask HN posts: {:.2f}".format(avg_ask_comments)
print(avg_ask)

Average number of comments on Ask HN posts: 14.04


In [14]:
total_show_comments = 0

for post in show_posts:
    comments = post[4]
    comments = int(comments)
    total_show_comments += comments

avg_show_comments = total_show_comments / len(show_posts)
avg_show = "Average number of comments on Show HN posts: {:.2f}".format(avg_show_comments)
print(avg_show)

Average number of comments on Show HN posts: 10.32


From this output, we can see that Ask HN posts receive more comments on average (14.04) than Show HN posts (10.32).

This makes intuitive sense: an Ask HN post poses a question, and the answers arrive as comments on the post.

By contrast, a plausible explanation for the lower Show HN average is that the author is presenting their work. Feedback is welcome, but the post mainly seeks attention through upvotes and views rather than comments.
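
The two averaging cells above share the same logic, so it could be factored into one function. The sketch below assumes rows shaped like those in `hn`, with the comment count stored as a string at index 4. `average_comments` is a hypothetical helper, and the two sample rows are taken from the data printed earlier purely to exercise it (they are not Ask HN or Show HN posts).

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column across posts; 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)


# Two rows from the dataset, with 52 and 1 comments respectively
sample_posts = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte',
     '8/4/2016 11:52'],
    ['11919867', 'Technology ventures: From Idea to Enterprise',
     'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
     '3', '1', 'hswarna', '6/17/2016 0:01'],
]

print("{:.2f}".format(average_comments(sample_posts)))  # prints 26.50
```

Returning 0.0 for an empty list is a design choice made here to avoid a `ZeroDivisionError`; raising an error instead would be equally defensible.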

## Focusing on Ask HN: What timeframe has the most comments on average?

### Creating the timeframes by hour

Now that we know Ask HN posts receive more comments on average than Show HN posts, we will focus on them.

We want to find the hour of the day at which Ask HN posts receive the most comments on average. To do this, we'll parse each *created_at* string into a datetime object (a value that holds both a date and a time) and extract its hour. We'll then use the hour as a dictionary key to build two frequency tables: the number of posts created in each hour and the total number of comments those posts received.
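
To make the parsing step concrete, here is a small sketch that converts two *created_at* values from the data above into datetime objects and extracts the hour. The variable names are illustrative only.

```python
import datetime as dt

# Format of the created_at column: month/day/year hour:minute
DATE_FORMAT = "%m/%d/%Y %H:%M"

midday = dt.datetime.strptime("8/4/2016 11:52", DATE_FORMAT)
midnight = dt.datetime.strptime("6/17/2016 0:01", DATE_FORMAT)

print(midday)                   # prints 2016-08-04 11:52:00
print(midday.hour)              # the hour as an integer: 11
print(midnight.strftime("%H"))  # the hour as a zero-padded string: 00
```

Using `strftime("%H")` yields zero-padded strings such as `'00'` and `'08'`, which is why the dictionary keys in the next cell are strings rather than integers.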

In [19]:
import datetime as dt

result_list = []

for post in ask_posts:
    comments = post[4]
    comments = int(comments)
    result_list.append([post[6], comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = row[0]
    date = dt.datetime.strptime(date, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

In [24]:
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])
    
for post in avg_by_hour:
    print(post)

['22', 6.746478873239437]
['23', 7.985294117647059]
['21', 16.009174311926607]
['08', 10.25]
['18', 13.20183486238532]
['19', 10.8]
['16', 16.796296296296298]
['20', 21.525]
['00', 8.127272727272727]
['04', 7.170212765957447]
['14', 13.233644859813085]
['01', 11.383333333333333]
['05', 10.08695652173913]
['06', 9.022727272727273]
['09', 5.5777777777777775]
['10', 13.440677966101696]
['03', 7.796296296296297]
['13', 14.741176470588234]
['15', 38.5948275862069]
['11', 11.051724137931034]
['12', 9.41095890410959]
['07', 7.852941176470588]
['02', 23.810344827586206]
['17', 11.46]
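
The counting and averaging steps can be written more compactly with `collections.defaultdict`, which removes the need for the `if hour not in ...` branch. This is a sketch of an alternative, not the approach used above, and the `sample` pairs are invented for illustration so that two posts fall in the same hour.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical (created_at, num_comments) pairs, invented for illustration
sample = [
    ("8/4/2016 11:52", 52),
    ("8/5/2016 11:10", 8),
    ("6/17/2016 0:01", 1),
]

counts = defaultdict(int)  # posts per hour
totals = defaultdict(int)  # comments per hour

for created_at, num_comments in sample:
    parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = parsed.strftime("%H")
    counts[hour] += 1
    totals[hour] += num_comments

# Average comments per post for each hour
averages = {hour: totals[hour] / counts[hour] for hour in counts}
print(averages)  # prints {'11': 30.0, '00': 1.0}
```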


With this, we have created a list of lists in which each inner list holds an hour followed by the average number of comments on posts created during that hour. This is easy to follow, but it would be much more readable if it were sorted. We'll first swap the two elements of each inner list in *avg_by_hour* and store the result in a new list of lists.

In [26]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
print(swap_avg_by_hour)

[[6.746478873239437, '22'], [7.985294117647059, '23'], [16.009174311926607, '21'], [10.25, '08'], [13.20183486238532, '18'], [10.8, '19'], [16.796296296296298, '16'], [21.525, '20'], [8.127272727272727, '00'], [7.170212765957447, '04'], [13.233644859813085, '14'], [11.383333333333333, '01'], [10.08695652173913, '05'], [9.022727272727273, '06'], [5.5777777777777775, '09'], [13.440677966101696, '10'], [7.796296296296297, '03'], [14.741176470588234, '13'], [38.5948275862069, '15'], [11.051724137931034, '11'], [9.41095890410959, '12'], [7.852941176470588, '07'], [23.810344827586206, '02'], [11.46, '17']]


Now, we will sort them. We swapped the elements because *sorted* compares lists element by element, so putting the average first makes it sort by comments instead of by hour.

In [31]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask HN Posts Comments")

for avg, hour in sorted_swap[:5]:
    date = dt.datetime.strptime(hour, "%H")
    hour_of_post = date.strftime("%H:%M")
    print_string = "{}: {:.2f} average comments per post".format(hour_of_post, avg)
    print(print_string)

Top 5 Hours for Ask HN Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
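
The same ranking can be produced without swapping the columns by passing a `key` function to `sorted`. The sketch below is an alternative to the approach above; `avg_by_hour_sample` is a hypothetical name holding four of the pairs printed earlier.

```python
# Four [hour, average] pairs copied from the avg_by_hour output above
avg_by_hour_sample = [
    ['09', 5.5777777777777775],
    ['20', 21.525],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
]

# Sort on the second element (the average), highest first
ranked = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in ranked:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```

With a `key`, the original `[hour, average]` layout is preserved, so no second list is needed.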


The *created_at* field does not include a time zone; this analysis treats the timestamps as US Eastern Time. Under that assumption, the five hours listed above are the best times to create an Ask HN post if you want a lot of responses.

The best time to post is 3:00 PM Eastern Time. Here is the same moment in other time zones, converted while US daylight saving time is in effect (EDT, UTC-4):

- **China**: 3:00 AM (next day)
- **India**: 12:30 AM (next day)
- **France**: 9:00 PM
- **UK**: 8:00 PM
- **Australia** (eastern states): 5:00 AM (next day)
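
The conversions in this list can be reproduced with fixed UTC offsets. The sketch below rests on several assumptions: 3:00 PM is taken as US Eastern Daylight Time (UTC-4), each region is represented by a single fixed offset for the northern summer, and the date is arbitrary. Real time zones shift with daylight saving, so the results differ by an hour for some regions at other times of year.

```python
import datetime as dt

# Assumption: the best hour is expressed in Eastern Daylight Time (UTC-4)
EDT = dt.timezone(dt.timedelta(hours=-4), "EDT")

# Assumed fixed UTC offsets during the northern summer
offsets = {
    "China": dt.timedelta(hours=8),
    "India": dt.timedelta(hours=5, minutes=30),
    "France": dt.timedelta(hours=2),
    "UK": dt.timedelta(hours=1),
    "Australia (east)": dt.timedelta(hours=10),
}

# Arbitrary summer date, chosen only for illustration
best_time = dt.datetime(2016, 8, 4, 15, 0, tzinfo=EDT)

converted = {}
for place, offset in offsets.items():
    converted[place] = best_time.astimezone(dt.timezone(offset))
    print("{}: {}".format(place, converted[place].strftime("%H:%M")))
```

For production use, named zones from the standard `zoneinfo` module would handle daylight saving automatically; fixed offsets are used here only to keep the example self-contained.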

# Conclusion: Ask Hacker News your Question at 3:00 PM!

We began this project by describing the fields of the data and posing two questions to answer. After that, we split the data into three types of Hacker News posts:

1. Ask HN posts
2. Show HN posts
3. Other posts

From there, we compared the average number of comments on Ask HN and Show HN posts. Ask HN posts drew more participation, averaging 14.04 comments per post against 10.32 for Show HN posts.

After that, we focused on the Ask HN posts and drilled down further to find the hour in which posts receive the most comments on average. Posts created during the 3:00 PM hour (3:00 PM to 3:59 PM Eastern Time) received the most, at about 38.59 comments per post.

I hope you find this project useful! If you ever have a question to ask Hacker News, now you know when the best time to post would be!

Other questions we could explore would be:

1. Do Show or Ask HN posts receive more points on average?
2. During what hour timeframe do posts receive the most points on average?
3. Which author has posted the most, and during which hour?