# Hacker News Post Evaluation
This notebook evaluates Hacker News posts, focusing on those titled as Ask Hacker News (Ask HN) or Show Hacker News (Show HN). We compare the popularity of the two types, measured by comments per post. We then analyze post time to determine whether posts created at certain hours attract more comments.

In [1]:
from csv import reader

# Read every row into a list; the with-block closes the file once reading is done
with open('hacker_news.csv') as opened_file:
    hn = list(reader(opened_file))

print(hn[:4])


[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']]


In [2]:
headers = hn[0]  # keep the header row in its own list

In [3]:
hn = hn[1:]  # drop the header row from the data

print(hn[0:4])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [4]:
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [5]:
print(hn[:4])

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


In [6]:
ask_posts = []
show_posts = []
other_posts = []

In [7]:
# startswith() is case-sensitive, so lowercase the title before matching
for row in hn:
    title = row[1]
    if (title.lower()).startswith('ask hn'):
        ask_posts.append(title)
    elif (title.lower()).startswith('show hn'):
        show_posts.append(title)
    else:
        other_posts.append(title)

In [8]:
print(len(ask_posts), len(show_posts), len(other_posts))

1744 1162 17194


In [9]:
print(ask_posts[:4])

['Ask HN: How to improve my personal website?', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', 'Ask HN: Aby recent changes to CSS that broke mobile?', 'Ask HN: Looking for Employee #3 How do I do it?']


### Starting analysis on the 'ask posts' content 

In [10]:
total_ask_comments = 0 

In [11]:
for row in hn:
    title = row[1]
    if (title.lower()).startswith('ask hn'):
        total_ask_comments += int(row[4])

In [12]:
print(total_ask_comments)

24483


In [13]:
#determining the avg comment/post
avg_ask_comments = total_ask_comments/len(ask_posts)

In [14]:
print(avg_ask_comments)

14.038417431192661


### Starting analysis on the 'show posts'

In [15]:
total_show_comments = 0 

In [16]:
for row in hn:
    title = row[1]
    if (title.lower()).startswith('show hn'):
        total_show_comments += int(row[4])

In [17]:
print(total_show_comments)

11988


In [18]:
#determining the avg comment/post
avg_show_comments = total_show_comments/len(show_posts)

In [19]:
print(avg_show_comments)

10.31669535283993


Ask posts average about 14.0 comments per post versus about 10.3 for Show posts, a ratio of roughly 7 to 5. Ask posts may draw more comments because they actively seek feedback. However, a mean is easily skewed by a few heavily discussed posts, so a median may be more insightful.
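As a rough illustration of why the median may differ from the mean, the sketch below compares the two on a handful of Ask HN rows copied from this notebook's output. The helper name `comment_counts` is hypothetical, and the four sample rows are far too few to say anything about the full dataset; in the notebook itself the full `hn` list would be passed instead.

```python
from statistics import mean, median

# Four Ask HN rows copied from the notebook output; a stand-in for the full `hn` list.
sample_rows = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
]


def comment_counts(rows, prefix):
    """Return num_comments (column 4) as ints for titles starting with prefix."""
    prefix = prefix.lower()
    return [int(row[4]) for row in rows if row[1].lower().startswith(prefix)]


ask_counts = comment_counts(sample_rows, 'ask hn')
ask_mean = mean(ask_counts)      # pulled upward by the single 29-comment post
ask_median = median(ask_counts)  # middle of the sorted counts
print(ask_mean, ask_median)
```

On these four rows the single 29-comment post lifts the mean well above the median, which is the kind of skew the paragraph above is concerned about.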

#### Calculating post frequencies and comment totals on an hourly basis

In [20]:
ask_posts_all = []

In [21]:
for row in hn:
    title = row[1]
    if (title.lower()).startswith('ask hn'):
        ask_posts_all.append(row[:])
        

In [22]:
print(ask_posts_all[:4])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20']]


In [23]:
import datetime as dt

In [24]:
result_list = []

In [25]:
counts_by_hour = {}

In [26]:
comments_by_hour = {}

In [27]:
for row in ask_posts_all:
    created_at = row[6]
    comments = row[4]
    result_list.append([created_at, comments])

In [28]:
print(result_list[:4])

[['8/16/2016 9:55', '6'], ['11/22/2015 13:43', '29'], ['5/2/2016 10:14', '1'], ['8/2/2016 14:20', '3']]


In [29]:
type(result_list[0])

list

In [30]:
for row in result_list:
    date_1_str = row[0]
    date_1_str = dt.datetime.strptime(date_1_str, 
                                      "%m/%d/%Y %H:%M")
    row[0] = date_1_str

In [31]:
for row in result_list:
    dt_obj = row[0]
    comments = int(row[1])
    hour = dt_obj.strftime("%H")
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments

In [32]:
# num_comments is read from the CSV as a str, so the loop above casts it with int() before adding

In [33]:
print(comments_by_hour)

{'00': 447, '23': 543, '14': 1416, '02': 1381, '18': 1439, '10': 793, '09': 251, '01': 683, '20': 1722, '06': 397, '19': 1188, '22': 479, '11': 641, '21': 1745, '12': 687, '13': 1253, '03': 421, '07': 267, '04': 337, '05': 464, '16': 1814, '17': 1146, '08': 492, '15': 4477}


In [34]:
print(counts_by_hour)

{'00': 55, '23': 68, '14': 107, '02': 58, '18': 109, '10': 59, '09': 45, '01': 60, '20': 80, '06': 44, '19': 110, '22': 71, '11': 58, '21': 109, '12': 73, '13': 85, '03': 54, '07': 34, '04': 47, '05': 46, '16': 108, '17': 100, '08': 48, '15': 116}


In [35]:
avg_by_hour = []

for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/
                           counts_by_hour[hour]])

In [36]:
print(avg_by_hour)

[['00', 8.127272727272727], ['23', 7.985294117647059], ['14', 13.233644859813085], ['02', 23.810344827586206], ['18', 13.20183486238532], ['10', 13.440677966101696], ['09', 5.5777777777777775], ['01', 11.383333333333333], ['20', 21.525], ['06', 9.022727272727273], ['19', 10.8], ['22', 6.746478873239437], ['11', 11.051724137931034], ['21', 16.009174311926607], ['12', 9.41095890410959], ['13', 14.741176470588234], ['03', 7.796296296296297], ['07', 7.852941176470588], ['04', 7.170212765957447], ['05', 10.08695652173913], ['16', 16.796296296296298], ['17', 11.46], ['08', 10.25], ['15', 38.5948275862069]]


In [37]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])

In [38]:
print(swap_avg_by_hour)

[[8.127272727272727, '00'], [7.985294117647059, '23'], [13.233644859813085, '14'], [23.810344827586206, '02'], [13.20183486238532, '18'], [13.440677966101696, '10'], [5.5777777777777775, '09'], [11.383333333333333, '01'], [21.525, '20'], [9.022727272727273, '06'], [10.8, '19'], [6.746478873239437, '22'], [11.051724137931034, '11'], [16.009174311926607, '21'], [9.41095890410959, '12'], [14.741176470588234, '13'], [7.796296296296297, '03'], [7.852941176470588, '07'], [7.170212765957447, '04'], [10.08695652173913, '05'], [16.796296296296298, '16'], [11.46, '17'], [10.25, '08'], [38.5948275862069, '15']]


In [39]:
sorted_swap = sorted(swap_avg_by_hour,reverse=True)

In [40]:
print("Top 5 Hours for Ask Posts Comments")

Top 5 Hours for Ask Posts Comments


In [43]:
for row in sorted_swap[:5]:
    print("{0}:00: {1:.2f} average comments per post".format(row[1], row[0]))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
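
The swap step above exists only so that `sorted()` compares the averages before the hours. A minimal sketch of an alternative, assuming the same `[hour, average]` pairs that `avg_by_hour` holds, is to sort on the second element directly with a `key` function. The values below are a subset copied from the notebook output, and `top_hours` is a hypothetical name.

```python
# A subset of the [hour, average] pairs printed earlier in the notebook.
avg_by_hour = [
    ['00', 8.127272727272727],
    ['02', 23.810344827586206],
    ['20', 21.525],
    ['21', 16.009174311926607],
    ['16', 16.796296296296298],
    ['15', 38.5948275862069],
]

# Sort by the average (index 1), highest first, without building a swapped list.
top_hours = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_hours:
    print("{0}:00: {1:.2f} average comments per post".format(hour, avg))
```

Sorting with a key leaves the original pair order intact, so the same list can be reused for printing without swapping columns back.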
