# What's the best way and time to get comments on Hacker News?

The goal of this project is to find out which kind of post attracts the most comments on Hacker News and when to post it.  We compare Ask HN and Show HN posts to see which receive more comments.  Then we look at the hour each post was created to see which posting times draw the most comments.  This should tell us how best to get feedback when we bring a question to Hacker News.  You can download the file [here](https://dq-content.s3.amazonaws.com/356/hacker_news.csv).

In [1]:
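# Importing the csv file and looking at the first five rows (header included)
from csv import reader

# Opening with a context manager so the file is closed after reading
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)
hn[:5]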

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [2]:
# Getting header from data
header = hn[0]

# Removing header from data
hn = hn[1:]
hn[:5]

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

## Separating the Ask and Show posts from the dataset

In [3]:
# Creating empty list for each type of post
ask_posts = []
show_posts = []
other_posts = []

# Sorting each post into a list by type, based on how its title starts
for x in hn:
    title = x[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(x)
    elif title.startswith('show hn'):
        show_posts.append(x)
    else:
        other_posts.append(x)
        
print('The Number of Ask Posts:', len(ask_posts))
print('The Number of Show Posts:', len(show_posts))
print('The Number of Other Posts:', len(other_posts))

The Number of Ask Posts: 1744
The Number of Show Posts: 1162
The Number of Other Posts: 17194


Once the posts are sorted, Ask and Show posts turn out to be a small share of the dataset: 1,744 and 1,162 out of 20,100 posts.  Although this leaves us with far less data, the two groups are clearly defined and easier to compare.  The other posts can be about anything, so sorting them into categories would be difficult and time consuming.
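
To put those counts in proportion, here is a small sketch that turns them into percentages of the whole dataset.  The numbers are copied by hand from the output above, and the `counts` and `shares` names are only illustrative; they are not part of the original notebook.

```python
# Counts copied from the printed output above (not recomputed from the file)
counts = {'Ask HN': 1744, 'Show HN': 1162, 'Other': 17194}
total = sum(counts.values())

# Share of each post type as a percentage of all posts
shares = {}
for post_type, count in counts.items():
    shares[post_type] = round(100 * count / total, 2)
    print(post_type, ':', shares[post_type], '%')
```

Under these numbers, Ask and Show posts together make up roughly 14% of the dataset.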

## Getting the Average number of comments for Ask and Show

In [4]:
# Creating variables to get the number of comments for ask and show post
total_ask_comments = 0
total_show_comments = 0

# Getting the sum of comments for each type of post
for x in ask_posts:
    total_ask_comments += int(x[4])
    
for x in show_posts:
    total_show_comments += int(x[4])

# Getting the average comment per post
avg_ask_comments = total_ask_comments/len(ask_posts)
avg_show_comments = total_show_comments/len(show_posts)

print('The average number of comments for ask post:',round(avg_ask_comments,2))
print('The average number of comments for show post:', round(avg_show_comments,2))

The average number of comments for ask post: 14.04
The average number of comments for show post: 10.32


On average, Ask posts get more comments than Show posts.  If a user wants to get a conversation going, asking is the way to go.  Show posts still get a good number of comments, so choosing a Show post does not mean going without comments.
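
Both averages follow the same pattern: sum one column, then divide by the number of posts.  The sketch below shows one way that could be wrapped in a helper.  `average_column` and the sample rows are hypothetical; the rows are made up and only mimic the dataset's column order.

```python
def average_column(posts, column_index):
    """Return the mean of an integer column, or 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = 0
    for row in posts:
        total += int(row[column_index])
    return total / len(posts)

# Made-up rows: id, title, url, num_points, num_comments, author, created_at
sample_posts = [
    ['1', 'Ask HN: example one', '', '5', '10', 'user_a', '1/1/2016 9:00'],
    ['2', 'Ask HN: example two', '', '7', '20', 'user_b', '1/2/2016 9:30'],
]

print(average_column(sample_posts, 4))  # column 4 is num_comments
print(average_column(sample_posts, 3))  # column 3 is num_points
```

With the real data, the same helper could be called as `average_column(ask_posts, 4)` and `average_column(show_posts, 4)`.  The empty-list guard is an addition that the notebook's own loops do not need, since both lists are non-empty.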

## What Hours do Ask Posts get the Most Comments

Now we will take a look at the Ask posts and see what time they get the most comments on average.

In [5]:
import datetime as dt

# Creating empty list to get time and number of comments
result_list = []

# Creating a list for time created and number of comments and appending to list
for x in ask_posts:
    created_at = x[6]
    num_of_comments = int(x[4])
    result_list.append([created_at,num_of_comments])

In [6]:
# Creating empty dict
counts_by_hour = {}
comments_by_hour = {}

# Getting number of comments for every hour
for x in result_list:
    # Converting time to datetime and selecting the hour
    date = x[0]
    date_time = dt.datetime.strptime(date,'%m/%d/%Y %H:%M')
    hour = date_time.strftime('%H')
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = x[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += x[1]
        


In [7]:
avg_by_hour = []

# Getting average number of comments per hour
for x in counts_by_hour:
    avg = round((comments_by_hour[x]/counts_by_hour[x]),2)
    avg_by_hour.append([avg,x])
    
avg_by_hour.sort(reverse = True)

In [8]:
print('Top 5 Hours for Ask Posts Comments')
for x in avg_by_hour[:5]:
    time = dt.datetime.strptime(x[1],'%H')
    time = time.strftime('%H:%M')
    print(time,':',x[0])

Top 5 Hours for Ask Posts Comments
15:00 : 38.59
02:00 : 23.81
20:00 : 21.52
16:00 : 16.8
21:00 : 16.01


Looking at the top 5 hours, Ask posts created at 3 pm get the most comments by a wide margin, averaging 38.59 per post.  The 2 am hour comes second, and the 4 pm, 8 pm and 9 pm hours fill the rest of the list, so posts made between 3-5 pm and again between 8-10 pm do well.  These are the target times for an Ask post if you want people to comment on it.

## Getting Average number of Points per Ask and Show Post

In [9]:
# Creating variable to get the value of points
total_ask_points = 0
total_show_points = 0

# Getting the total sum of points
for x in ask_posts:
    total_ask_points += int(x[3])
    
for x in show_posts:
    total_show_points += int(x[3])

# Getting the average number of points per post
avg_ask_points = total_ask_points/len(ask_posts)
avg_show_points = total_show_points/len(show_posts)

print('The average number of points for ask post:',round(avg_ask_points,2))
print('The average number of points for show post:', round(avg_show_points,2))

The average number of points for ask post: 15.06
The average number of points for show post: 27.56


Looking at the average number of points per post, Show posts earn close to double the points of Ask posts.  Points reflect how popular a post is, so Show posts are more popular than Ask posts.

## What hours do Show posts get the Most Points

Since Show posts got more points on average, we will take a closer look at which posting hours earn them the most points.

In [10]:
# Creating empty list for time and points for show post
show_points = []

# Getting the time and points for each show post and appending to show_points
for x in show_posts:
    created_at = x[6]
    num_of_points = int(x[3])
    show_points.append([created_at,num_of_points])

In [11]:
# Creating empty dicts
counts_show = {}
points_show = {}

# Getting number of points per hour
for x in show_points:
    # Converting time to a datetime and getting hour
    date = x[0]
    date_time = dt.datetime.strptime(date,'%m/%d/%Y %H:%M')
    hour = date_time.strftime('%H')
    if hour not in counts_show:
        counts_show[hour] = 1
        points_show[hour] = x[1]
    else:
        counts_show[hour] += 1
        points_show[hour] += x[1]

In [12]:
avg_show = []

# Getting average points per hour
for x in counts_show:
    avg = round((points_show[x]/counts_show[x]),2)
    avg_show.append([avg,x])
    
avg_show.sort(reverse = True)

In [13]:
print('Top 5 Hours for Show Posts based on Points')
for x in avg_show[:5]:
    time = dt.datetime.strptime(x[1],'%H')
    time = time.strftime('%H:%M')
    print(time,':',x[0])

Top 5 Hours for Show Posts based on Points
23:00 : 42.39
12:00 : 41.69
22:00 : 40.35
00:00 : 37.84
18:00 : 36.31


Looking at the top averages, Show posts created at 11 pm get the most points.  The 10 pm and midnight hours are also in the top five, so the late evening from 10 pm to 1 am looks like the strongest window.  A likely explanation is that people are reading posts on the site right before bed.  Noon is another good time to post.  Many working adults take lunch around then and may be relaxing for a moment with Show posts before going back to work.
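
The hourly breakdown for Show points repeats the steps used earlier for Ask comments: parse `created_at`, group by hour, then divide each hour's total by its post count.  As a sketch of how the two passes could share one function, the block below uses a hypothetical `average_by_hour` helper and made-up sample rows.  It assumes the same column order and date format as the dataset.

```python
import datetime as dt
from collections import defaultdict

def average_by_hour(posts, value_index, date_index=6):
    """Return [average, hour] pairs sorted from highest to lowest average."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in posts:
        # Same date format as the created_at column in the dataset
        created = dt.datetime.strptime(row[date_index], '%m/%d/%Y %H:%M')
        hour = created.strftime('%H')
        totals[hour] += int(row[value_index])
        counts[hour] += 1
    averages = [[round(totals[h] / counts[h], 2), h] for h in counts]
    averages.sort(reverse=True)
    return averages

# Made-up rows: id, title, url, num_points, num_comments, author, created_at
sample_posts = [
    ['1', 'Show HN: a', '', '10', '2', 'u1', '8/4/2016 11:52'],
    ['2', 'Show HN: b', '', '30', '4', 'u2', '8/5/2016 11:05'],
    ['3', 'Show HN: c', '', '8', '6', 'u3', '9/30/2015 4:12'],
]

points_by_hour = average_by_hour(sample_posts, 3)    # num_points
comments_by_hour = average_by_hour(sample_posts, 4)  # num_comments
print(points_by_hour)
print(comments_by_hour)
```

With the real lists, `average_by_hour(ask_posts, 4)` and `average_by_hour(show_posts, 3)` should give the same rankings as the separate loops, since the logic is unchanged.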

## Conclusion

Based on the average comments and points, I would recommend using Show HN to get responses to your posts.  On average a Show post gets about 26% fewer comments than an Ask post (10.32 versus 14.04), but about 83% more points (27.56 versus 15.06).  This trade-off is favorable because the post draws far more upvotes while still getting a good comment response.
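
The percentage comparison between the two post types can be checked directly from the averages printed earlier.  This is a minimal sketch: the four values are copied by hand from the outputs above, and `percent_change` is a hypothetical helper that is not part of the notebook.

```python
# Averages copied from the printed outputs above
avg_ask_comments, avg_show_comments = 14.04, 10.32
avg_ask_points, avg_show_points = 15.06, 27.56

def percent_change(baseline, value):
    """Percentage difference of value relative to baseline."""
    return 100 * (value - baseline) / baseline

# Show posts measured against Ask posts as the baseline
comment_change = percent_change(avg_ask_comments, avg_show_comments)
point_change = percent_change(avg_ask_points, avg_show_points)

print('Show vs. Ask comments:', round(comment_change, 1), '%')  # about -26.5
print('Show vs. Ask points:', round(point_change, 1), '%')      # about 83.0
```

The choice of baseline matters.  Measured the other way round, Ask posts get about 36% more comments than Show posts.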