# Project Title: Exploring Hacker News Posts


Hacker News (HN) is a site where users post news articles, mainly about technology. Users can reply to posts with comments and rate posts by voting.

Users can ask the HN community questions in posts whose titles begin with 'Ask HN'. Users can also show the community a project, product, or anything else interesting in posts whose titles begin with 'Show HN'.

The goals of this project are to (1) find out which of these two post types receives more comments on average, and (2) find out whether posts created at certain times receive more comments.

## Exploring the data 

In [1]:
#Read the dataset CSV file and create a list of lists
from csv import reader
#The with block closes the file once it has been read
with open('hacker_news.csv') as open_file:
    read_file = reader(open_file)
    hn = list(read_file)

In [2]:
#Print the first five rows of the dataset
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


## Removing Headers from a List of Lists

In [3]:
#Extract first row of the dataset
headers = hn[0]

In [4]:
#Remove first row from the dataset
del hn[0]

In [5]:
#Print the first five rows to confirm the header row has been removed
print(hn[:5])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


## Extracting Ask HN and Show HN Posts


In [6]:
#Create three empty lists
ask_posts = []
show_posts = []
other_posts = []

In [7]:
#Iterate over hn dataset to separate posts in a list
for row in hn:
    title = row[1]
    lower_title = title.lower()
    if lower_title.startswith('ask hn'):
        ask_posts.append(row)
    elif lower_title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)


In [8]:
#Print the number of posts in each list
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
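The three counts add up to 1744 + 1162 + 17194 = 20100 rows, so every post lands in exactly one list. As a minimal sketch of the same title check, the function below classifies a single title. The name `classify_title` and the first two sample titles are hypothetical and are not part of the original notebook or dataset.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    # Lowercase first so that 'Ask HN', 'ASK HN', and 'ask hn' all match
    lower_title = title.lower()
    if lower_title.startswith('ask hn'):
        return 'ask'
    if lower_title.startswith('show hn'):
        return 'show'
    return 'other'

# The first two titles are made up for illustration
sample_titles = [
    'Ask HN: How do you learn Python?',
    'SHOW HN: My weekend project',
    'Interactive Dynamic Video',
]

labels = [classify_title(title) for title in sample_titles]
print(labels)
```

Because the check uses `startswith`, a title that only mentions 'Ask HN' somewhere in the middle is still classified as 'other'.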


## Calculating the Average Number of Comments for Ask HN and Show HN Posts

In [9]:
#Initialize total number of ask posts comments
total_ask_comments = 0

In [10]:
#Iterate over ask posts to sum the number of comments
for row in ask_posts:
    total_ask_comments += int(row[4])

In [11]:
#Compute average number of ask posts comment
avg_ask_comments = total_ask_comments / len(ask_posts)

In [12]:
#Print average number of ask posts comment
print(avg_ask_comments)

14.038417431192661


In [13]:
#Initialize total number of show posts comments
total_show_comments = 0

In [14]:
#Iterate over show posts to sum the number of comments
for row in show_posts:
    total_show_comments += int(row[4])

In [15]:
#Compute average number of show posts comment
avg_show_comments = total_show_comments / len(show_posts)

In [16]:
#Print average number of show posts comment
print(avg_show_comments)

10.31669535283993


Ask HN posts receive more comments on average than Show HN posts (about 14.04 versus 10.32). One possible explanation is that users are more inclined to answer technology-related questions than to comment on someone else's product or research work. Another is that a single comment on a question can spark an argument that draws many more users into the discussion.
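The two averages are computed with identical steps, so the calculation could be wrapped in one helper. This is only a sketch: the name `average_comments` is hypothetical, and it assumes `num_comments` sits at index 4, as in the rows printed earlier. The two sample rows are taken from that printed output and are used here purely to exercise the function.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments for a list of post rows."""
    # Guard against an empty list to avoid dividing by zero
    if not posts:
        return 0.0
    total = 0
    for row in posts:
        total += int(row[comments_index])
    return total / len(posts)

sample_posts = [
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/',
     '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke",
     'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
     '2', '1', 'vezycash', '6/23/2016 22:20'],
]

print(average_comments(sample_posts))  # (52 + 1) / 2 = 26.5
```

With such a helper, the comparison would reduce to calling it once with `ask_posts` and once with `show_posts`.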

## Finding the Number of Ask Posts and Comments by Hour Created

In [151]:
#Import datetime module
import datetime as dt

In [152]:
#create an empty list to store lists 
result_list = []

In [153]:
#Iterate over ask_posts to extract number of comments and date created
for row in ask_posts: 
    result_list.append([row[6], int(row[4])])

In [154]:
#Create two empty dictionaries 
counts_by_hour = {}
comments_by_hour = {}

In [156]:
#Iterate over result list to count posts and sum comments for each hour
for row in result_list:
    date_time = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = date_time.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
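To make the parsing and tallying step concrete, the sketch below applies the same format string to a few `created_at` values. It uses `dict.get` with a default of 0 as an alternative to the explicit membership check. The first three entries come from rows printed earlier; the last one is hypothetical and is included only to show two posts falling in the same hour.

```python
import datetime as dt

sample = [
    ['8/4/2016 11:52', 52],
    ['1/26/2016 19:30', 10],
    ['6/17/2016 0:01', 1],
    ['8/5/2016 11:10', 8],  # hypothetical entry, not from the dataset
]

counts = {}
comments = {}
for created_at, num_comments in sample:
    # Parse the timestamp, then keep only the zero-padded hour, e.g. '00' or '11'
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    # dict.get returns 0 the first time an hour is seen
    counts[hour] = counts.get(hour, 0) + 1
    comments[hour] = comments.get(hour, 0) + num_comments

print(counts)    # {'11': 2, '19': 1, '00': 1}
print(comments)  # {'11': 60, '19': 10, '00': 1}
```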

## Calculating the Average Number of Comments for Ask HN Posts by Hour

In [55]:
#Create an empty list for the average number of comments per hour
avg_by_hour = []

In [56]:
#Iterate over dictionary to calculate average number of comments per post for each hour
for key in comments_by_hour:
    avg_by_hour.append([key, comments_by_hour[key] / counts_by_hour[key]])

In [57]:
#Print average number of comments
print(avg_by_hour)

[['09', 5.5777777777777775], ['13', 14.741176470588234], ['10', 13.440677966101696], ['14', 13.233644859813085], ['16', 16.796296296296298], ['23', 7.985294117647059], ['12', 9.41095890410959], ['17', 11.46], ['15', 38.5948275862069], ['21', 16.009174311926607], ['20', 21.525], ['02', 23.810344827586206], ['18', 13.20183486238532], ['03', 7.796296296296297], ['05', 10.08695652173913], ['19', 10.8], ['01', 11.383333333333333], ['22', 6.746478873239437], ['08', 10.25], ['04', 7.170212765957447], ['00', 8.127272727272727], ['06', 9.022727272727273], ['07', 7.852941176470588], ['11', 11.051724137931034]]


## Sorting and Printing Values from a List of Lists

In [58]:
#Create an empty list to store swapped columns
swap_avg_by_hour = []

In [59]:
#Iterate over list to create new list with swapped columns
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

In [61]:
#Print new list with swapped columns
print(swap_avg_by_hour)

[[5.5777777777777775, '09'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [16.796296296296298, '16'], [7.985294117647059, '23'], [9.41095890410959, '12'], [11.46, '17'], [38.5948275862069, '15'], [16.009174311926607, '21'], [21.525, '20'], [23.810344827586206, '02'], [13.20183486238532, '18'], [7.796296296296297, '03'], [10.08695652173913, '05'], [10.8, '19'], [11.383333333333333, '01'], [6.746478873239437, '22'], [10.25, '08'], [7.170212765957447, '04'], [8.127272727272727, '00'], [9.022727272727273, '06'], [7.852941176470588, '07'], [11.051724137931034, '11']]


In [64]:
#Sort new list in descending order
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [65]:
#Print string 
print('Top 5 Hours for Ask Posts Comments')

Top 5 Hours for Ask Posts Comments


In [157]:
#Print the five hours with the highest average number of comments
for avg, hour in sorted_swap[:5]:
    parsed_hour = dt.datetime.strptime(hour, '%H')
    f_time = parsed_hour.strftime('%H:%M')
    print(f_time + ': {:.2f} average comments per post'.format(avg))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
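Swapping the columns is one way to sort by the average. As an alternative sketch, `sorted` accepts a `key` function, so the `[hour, average]` pairs can be ordered by their second element without building a swapped list. The sample below is a subset of the `avg_by_hour` values printed above, and the name `avg_by_hour_sample` is introduced here only for illustration.

```python
avg_by_hour_sample = [
    ['09', 5.5777777777777775],
    ['16', 16.796296296296298],
    ['15', 38.5948275862069],
    ['21', 16.009174311926607],
    ['20', 21.525],
    ['02', 23.810344827586206],
]

# Sort by the average (index 1), highest first, and keep the top three
top_three = sorted(avg_by_hour_sample, key=lambda row: row[1], reverse=True)[:3]

for hour, avg in top_three:
    print('{}:00: {:.2f} average comments per post'.format(hour, avg))
```

This keeps each pair in its original `[hour, average]` order, which avoids the intermediate `swap_avg_by_hour` list.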


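From the analysis, Ask HN posts created at 15:00 (3 pm) receive the most comments on average, at 38.59 per post, well ahead of the next best hours, 02:00 and 20:00. Four of the top five hours fall in the afternoon or evening, although 02:00 ranks second, so the pattern is not strictly a daytime one. To receive a high number of comments, it's best to create an Ask HN post at around 3 pm, measured in the time zone of the dataset's created_at column.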