# Hacker News
In this project, we'll work with a data set of submissions to the popular technology site [Hacker News](https://news.ycombinator.com/).

![](https://s3.amazonaws.com/dq-content/354/hacker_news.jpg)

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") are voted and commented upon, similar to Reddit.

Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result.


In [1]:
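from csv import reader

# Read the whole CSV into a list of rows (each row is a list of strings).
# The with-block closes the file once the rows have been read.
with open("hacker_news.csv") as opened_file:
    hn=list(reader(opened_file))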

print(hn[0:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


Next, we remove the header row so that `hn` contains only data rows.

In [2]:
headers=hn[0]
hn=hn[1:]
print(headers)
print("")
print(hn[0:5])


['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


Since we're only concerned with post titles beginning with __Ask HN__ or __Show HN__, we'll create new lists of lists containing just the data for those titles.

In [6]:
ask_posts=[]
show_posts=[]
other_posts=[]
for item in hn:
    title=item[1]
    if (title.lower()).startswith("ask hn"):
        ask_posts.append(item)
    elif (title.lower()).startswith("show hn"):
        show_posts.append(item)
    else:
        other_posts.append(item)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))


1744
1162
17194


Next, we'll determine if ask posts or show posts receive more comments on average.
 

In [7]:
total_ask_comments=0
for item in ask_posts:
    num_comments=item[4]
    total_ask_comments+=int(num_comments)
average_ask_comments=  total_ask_comments/len(ask_posts) 

total_show_comments=0
for item in show_posts:
    num_comments=item[4]
    total_show_comments+=int(num_comments)
average_show_comments=  total_show_comments/len(show_posts) 

print(average_ask_comments)
print(average_show_comments)


14.038417431192661
10.31669535283993


On average, Ask HN posts receive about 14.04 comments, roughly 3.72 more than Show HN posts (about 10.32).
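The two loops in the cell above differ only in the list they iterate over, so the calculation can be sketched as a single helper. The function name `average_comments` and the sample rows are illustrative assumptions, not part of the original notebook; the only detail taken from the data set is that the comment count is stored as a string at index 4.

```python
def average_comments(posts, comments_index=4):
    """Return the mean number of comments over a list of rows."""
    if not posts:
        # Avoid dividing by zero for an empty list
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Hypothetical rows in the same column order as hacker_news.csv
sample_posts = [
    ["1", "Ask HN: first example", "", "5", "10", "user_a", "8/4/2016 11:52"],
    ["2", "Ask HN: second example", "", "3", "4", "user_b", "1/26/2016 19:30"],
]
sample_average = average_comments(sample_posts)
print(sample_average)
```

With such a helper, the two averages would come from `average_comments(ask_posts)` and `average_comments(show_posts)`.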

Since ask posts are more likely to receive comments, we'll focus our remaining analysis just on these posts.

We'll determine if ask posts created at a certain time are more likely to attract comments. We'll use the following steps to perform this analysis:

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
2. Calculate the average number of comments ask posts receive by hour created.
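Grouping by hour relies on `datetime.strptime` to parse the `created_at` string and on `strftime` to format the result. Here is a minimal sketch on one timestamp taken from the first data row shown earlier; the variable names are illustrative.

```python
import datetime as dt

created_at = "8/4/2016 11:52"  # created_at value of the first data row

# %m and %d also accept months and days without a leading zero when parsing
parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")

# strftime("%H") returns the hour as a zero-padded two-character string
hour_label = parsed.strftime("%H")
print(hour_label)   # 11
print(parsed.hour)  # the same hour as an integer
```

Using the zero-padded string as the dictionary key is why the hours appear as `'09'` or `'02'` in the results.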

In [15]:
import datetime as dt

# Pair each ask post's creation timestamp with its comment count
result_list=[]
for item in ask_posts:
    created=item[6]
    comment=int(item[4])
    result_list.append([created,comment])

counts_by_hour={}    # hour -> number of ask posts
comments_by_hour={}  # hour -> total comments on those posts
for created,comment in result_list:
    # Parse e.g. "8/4/2016 11:52" and keep the zero-padded hour, e.g. "11"
    hour=dt.datetime.strptime(created,"%m/%d/%Y %H:%M").strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour]=1
        comments_by_hour[hour]=comment
    else:
        counts_by_hour[hour]+=1
        comments_by_hour[hour]+=comment

        
print(counts_by_hour) 
print(comments_by_hour) 



{'09': 45, '20': 80, '02': 58, '21': 109, '19': 110, '07': 34, '16': 108, '13': 85, '23': 68, '15': 116, '22': 71, '10': 59, '00': 55, '06': 44, '11': 58, '14': 107, '03': 54, '05': 46, '18': 109, '12': 73, '08': 48, '01': 60, '17': 100, '04': 47}
{'09': 251, '20': 1722, '02': 1381, '21': 1745, '19': 1188, '07': 267, '16': 1814, '13': 1253, '23': 543, '15': 4477, '22': 479, '10': 793, '00': 447, '06': 397, '11': 641, '14': 1416, '03': 421, '05': 464, '18': 1439, '12': 687, '08': 492, '01': 683, '17': 1146, '04': 337}


__counts_by_hour__: contains the number of ask posts created during each hour of the day.

__comments_by_hour__: contains the total number of comments received by ask posts created during each hour of the day.
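The same two tallies can be built without an explicit membership check. The sketch below uses `collections.defaultdict` on a few made-up (hour, comments) pairs; the sample values are hypothetical and only illustrate the accumulation pattern.

```python
from collections import defaultdict

# Hypothetical (hour, comment count) pairs, not taken from the data set
sample = [("15", 10), ("02", 4), ("15", 6)]

counts = defaultdict(int)    # missing keys start at 0
comments = defaultdict(int)
for hour, num_comments in sample:
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```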

We'll use these two dictionaries to calculate the average number of comments for posts created during each hour of the day.

In [17]:
avg_by_hour=[]
for hour in comments_by_hour:
    avg_by_hour.append([hour,(comments_by_hour[hour]/counts_by_hour[hour])])
print(avg_by_hour)                       

[['09', 5.5777777777777775], ['20', 21.525], ['02', 23.810344827586206], ['21', 16.009174311926607], ['19', 10.8], ['07', 7.852941176470588], ['16', 16.796296296296298], ['13', 14.741176470588234], ['23', 7.985294117647059], ['15', 38.5948275862069], ['22', 6.746478873239437], ['10', 13.440677966101696], ['00', 8.127272727272727], ['06', 9.022727272727273], ['11', 11.051724137931034], ['14', 13.233644859813085], ['03', 7.796296296296297], ['05', 10.08695652173913], ['18', 13.20183486238532], ['12', 9.41095890410959], ['08', 10.25], ['01', 11.383333333333333], ['17', 11.46], ['04', 7.170212765957447]]


Let's finish by sorting the list of lists and printing the five highest values in a format that's easier to read.


In [19]:
swap_avg_by_hour=[]
for item in avg_by_hour:
    swap_avg_by_hour.append([item[1],item[0]])
print( swap_avg_by_hour)   
sorted_swap=sorted(swap_avg_by_hour,reverse=True)
print("Top 5 Hours for Ask Posts Comments")
for item in sorted_swap[0:5]:
    line="{}:00:  {:.2f} average comments per post".format(item[1],item[0])
    print(line)

[[5.5777777777777775, '09'], [21.525, '20'], [23.810344827586206, '02'], [16.009174311926607, '21'], [10.8, '19'], [7.852941176470588, '07'], [16.796296296296298, '16'], [14.741176470588234, '13'], [7.985294117647059, '23'], [38.5948275862069, '15'], [6.746478873239437, '22'], [13.440677966101696, '10'], [8.127272727272727, '00'], [9.022727272727273, '06'], [11.051724137931034, '11'], [13.233644859813085, '14'], [7.796296296296297, '03'], [10.08695652173913, '05'], [13.20183486238532, '18'], [9.41095890410959, '12'], [10.25, '08'], [11.383333333333333, '01'], [11.46, '17'], [7.170212765957447, '04']]
Top 5 Hours for Ask Posts Comments
15:00:  38.59 average comments per post
02:00:  23.81 average comments per post
20:00:  21.52 average comments per post
16:00:  16.80 average comments per post
21:00:  16.01 average comments per post


Ask posts created between 15:00 and 16:00 receive the most comments on average (38.59 per post), followed by posts created between 02:00 and 03:00 (23.81) and between 20:00 and 21:00 (21.52).
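The ranking can also be sketched without building a swapped list, by passing a key function to `sorted`. The example uses a few `[hour, average]` pairs copied from the `avg_by_hour` output; `avg_sample` and `top_hours` are illustrative names.

```python
# A few [hour, average] pairs copied from the avg_by_hour output
avg_sample = [
    ["09", 5.5777777777777775],
    ["20", 21.525],
    ["02", 23.810344827586206],
    ["15", 38.5948275862069],
    ["16", 16.796296296296298],
]

# Sort by the average (index 1), highest first
top_hours = sorted(avg_sample, key=lambda pair: pair[1], reverse=True)

for hour, average in top_hours[:3]:
    print("{}:00:  {:.2f} average comments per post".format(hour, average))
```

Sorting on the average alone also avoids comparing the hour strings, which the swapped-list approach only does when two averages tie.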