## Exploring Hacker News Posts

#### Introduction

This project extracts and compares the engagement of two types of Hacker News posts: **Ask HN** and **Show HN**.

The following concepts were used in this project:

- Functions
- String functions
- Datetime functions

#### Reading the dataset

In [1]:
from csv import reader

def read_csv(filename):
    # Use a context manager so the file is closed once it has been read
    with open(filename, encoding="utf8") as opened_file:
        csv_file = list(reader(opened_file))
    # Split the header row from the data rows
    return csv_file[0], csv_file[1:]

header, hn = read_csv("hacker_news.csv")

for row in hn[:5]:
    print(row)
    print("\n")

['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26']


['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24']


['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19']


['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16']


['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']
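
The cells that follow address columns by position (for example `row[1]` for the title and `row[4]` for the comment count). As a minimal sketch, the positions could instead be looked up by name from the header row. The column names below are an assumption, since the notebook never prints `header`, and `column_index` is a hypothetical helper.

```python
def column_index(header):
    """Map each column name to its position in a row."""
    return {name: position for position, name in enumerate(header)}

# Assumed header; the real one is whatever read_csv() returned as `header`.
sample_header = ["id", "title", "url", "num_points", "num_comments", "author", "created_at"]

# One row copied from the Ask HN sample printed later in this notebook.
sample_row = ['12578908', 'Ask HN: What TLD do you use for local development?',
              '', '4', '7', 'Sevrene', '9/26/2016 2:53']

cols = column_index(sample_header)
title = sample_row[cols["title"]]
num_comments = int(sample_row[cols["num_comments"]])
print(title, num_comments)
```

Looking columns up by name keeps the later loops readable if the column order ever changes.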




#### Extracting Hacker News Posts

In [2]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
message = "There are {} {} posts"

print(message.format(len(ask_posts), "Ask HN"))
print(message.format(len(show_posts), "Show HN"))
print(message.format(len(other_posts), "other"))

There are 9139 Ask HN posts
There are 10158 Show HN posts
There are 273822 other posts


In [3]:
print(ask_posts[:5])

[['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'], ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'], ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'], ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'], ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50']]


In [4]:
print(show_posts[:5])

[['12578335', 'Show HN: Finding puns computationally', 'http://puns.samueltaylor.org/', '2', '0', 'saamm', '9/26/2016 0:36'], ['12578182', 'Show HN: A simple library for complicated animations', 'https://christinecha.github.io/choreographer-js/', '1', '0', 'christinecha', '9/26/2016 0:01'], ['12578098', 'Show HN: WebGL visualization of DNA sequences', 'http://grondilu.github.io/dna.html', '1', '0', 'grondilu', '9/25/2016 23:44'], ['12577991', 'Show HN: Pomodoro-centric, heirarchical project management with ES6 modules', 'https://github.com/jakebian/zeal', '2', '0', 'dbranes', '9/25/2016 23:17'], ['12577142', 'Show HN: Jumble  Essays on the go #PaulInYourPocket', 'https://itunes.apple.com/us/app/jumble-find-startup-essay/id1150939197?ls=1&mt=8', '1', '1', 'ryderj', '9/25/2016 20:06']]
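
The prefix check used to split the posts can be isolated in a small function, which makes it easy to try on individual titles. This is only a sketch of the same rule; `classify_post` is a hypothetical name, not part of the notebook.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Titles copied from the sample rows printed above.
sample_titles = [
    "Ask HN: What TLD do you use for local development?",
    "Show HN: Finding puns computationally",
    "algorithmic music",
]
labels = [classify_post(t) for t in sample_titles]
print(labels)
```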


#### Computing the Average Number of Comments per Post

##### Ask Hacker News Posts

In [5]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)

comments_message = "On average, there are ~{} comments in {}."
print(comments_message.format(round(avg_ask_comments), "Ask HN posts"))

On average, there are ~10 comments in Ask HN posts.


##### Show Hacker News Posts

In [6]:
total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)
print(comments_message.format(round(avg_show_comments), "Show HN posts"))

On average, there are ~5 comments in Show HN posts.


The findings above show that Ask Hacker News posts tend to receive more engagement (comments) on average than Show Hacker News posts.
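
The two averaging loops above differ only in the list they iterate over. A minimal sketch of a shared helper, assuming every value in the chosen column parses as an integer; `average_column` is a hypothetical name.

```python
def average_column(rows, index):
    """Average the integer values found at position `index` across `rows`."""
    if not rows:
        return 0.0  # avoid dividing by zero on an empty list
    return sum(int(row[index]) for row in rows) / len(rows)

# Two rows copied from the Ask HN sample printed earlier.
sample_ask = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'],
]

avg_comments = average_column(sample_ask, 4)  # (7 + 3) / 2
avg_points = average_column(sample_ask, 3)    # (4 + 6) / 2
print(avg_comments, avg_points)
```

With the full lists, `average_column(ask_posts, 4)` and `average_column(show_posts, 4)` would be expected to reproduce the two averages computed above.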

#### Determining the Number of Posts and Comments per Hour for Ask Posts

In [7]:
import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    comments = int(row[4])
    result_list.append([created_at, comments])
    
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    hr = row[0]
    hr = dt.datetime.strptime(hr, "%m/%d/%Y %H:%M")
    hr = dt.datetime.strftime(hr, "%H")
    if hr not in counts_by_hour:
        counts_by_hour[hr] = 1
        comments_by_hour[hr] = row[1]
    else:
        counts_by_hour[hr] += 1
        comments_by_hour[hr] += row[1]

In [8]:
counts_by_hour

{'02': 269,
 '01': 282,
 '22': 383,
 '21': 518,
 '19': 552,
 '17': 587,
 '15': 646,
 '14': 513,
 '13': 444,
 '11': 312,
 '10': 282,
 '09': 222,
 '07': 226,
 '03': 271,
 '23': 343,
 '20': 510,
 '16': 579,
 '08': 257,
 '00': 301,
 '18': 614,
 '12': 342,
 '04': 243,
 '06': 234,
 '05': 209}

In [9]:
comments_by_hour

{'02': 2996,
 '01': 2089,
 '22': 3372,
 '21': 4500,
 '19': 3954,
 '17': 5547,
 '15': 18525,
 '14': 4972,
 '13': 7245,
 '11': 2797,
 '10': 3013,
 '09': 1477,
 '07': 1585,
 '03': 2154,
 '23': 2297,
 '20': 4462,
 '16': 4466,
 '08': 2362,
 '00': 2277,
 '18': 4877,
 '12': 4234,
 '04': 2360,
 '06': 1587,
 '05': 1838}
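
The same tally can be written without the `if`/`else` branch by using `collections.defaultdict`. This is a sketch of an alternative, not the notebook's method; `tally_by_hour` is a hypothetical helper, and the sample pairs are the timestamps and comment counts of the five Ask HN rows printed earlier.

```python
import datetime as dt
from collections import defaultdict

def tally_by_hour(pairs, date_format="%m/%d/%Y %H:%M"):
    """Return (posts per hour, summed values per hour) for (timestamp, value) pairs."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for created_at, value in pairs:
        # Parse the timestamp and keep only the zero-padded hour, e.g. "02"
        hour = dt.datetime.strptime(created_at, date_format).strftime("%H")
        counts[hour] += 1
        totals[hour] += value
    return dict(counts), dict(totals)

sample_pairs = [
    ("9/26/2016 2:53", 7),
    ("9/26/2016 1:17", 3),
    ("9/25/2016 22:57", 0),
    ("9/25/2016 22:48", 3),
    ("9/25/2016 21:50", 2),
]
sample_counts, sample_comments = tally_by_hour(sample_pairs)
print(sample_counts)
print(sample_comments)
```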

#### Determining the Average Number of Comments per Hour

In [10]:
avg_by_hour = []

for hr in comments_by_hour:
    avg_comments = comments_by_hour[hr]/counts_by_hour[hr]
    avg_by_hour.append([hr, avg_comments])

In [11]:
avg_by_hour

[['02', 11.137546468401487],
 ['01', 7.407801418439717],
 ['22', 8.804177545691905],
 ['21', 8.687258687258687],
 ['19', 7.163043478260869],
 ['17', 9.449744463373083],
 ['15', 28.676470588235293],
 ['14', 9.692007797270955],
 ['13', 16.31756756756757],
 ['11', 8.96474358974359],
 ['10', 10.684397163120567],
 ['09', 6.653153153153153],
 ['07', 7.013274336283186],
 ['03', 7.948339483394834],
 ['23', 6.696793002915452],
 ['20', 8.749019607843136],
 ['16', 7.713298791018998],
 ['08', 9.190661478599221],
 ['00', 7.5647840531561465],
 ['18', 7.94299674267101],
 ['12', 12.380116959064328],
 ['04', 9.7119341563786],
 ['06', 6.782051282051282],
 ['05', 8.794258373205741]]

#### Sorting the Average Number of Comments per Hour

In [12]:
swap_avg_by_hour = []

for hr in avg_by_hour:
    swap_avg_by_hour.append([hr[1], hr[0]])
    
print(swap_avg_by_hour)

[[11.137546468401487, '02'], [7.407801418439717, '01'], [8.804177545691905, '22'], [8.687258687258687, '21'], [7.163043478260869, '19'], [9.449744463373083, '17'], [28.676470588235293, '15'], [9.692007797270955, '14'], [16.31756756756757, '13'], [8.96474358974359, '11'], [10.684397163120567, '10'], [6.653153153153153, '09'], [7.013274336283186, '07'], [7.948339483394834, '03'], [6.696793002915452, '23'], [8.749019607843136, '20'], [7.713298791018998, '16'], [9.190661478599221, '08'], [7.5647840531561465, '00'], [7.94299674267101, '18'], [12.380116959064328, '12'], [9.7119341563786, '04'], [6.782051282051282, '06'], [8.794258373205741, '05']]
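
Swapping the columns is one way to sort by the average. As a sketch of an alternative, `sorted` also accepts a `key` argument, so the `[hour, average]` pairs can be ranked in place. The sample below reuses four of the pairs shown above.

```python
from operator import itemgetter

# A few [hour, average] pairs copied from avg_by_hour above.
sample_avg_by_hour = [
    ['02', 11.137546468401487],
    ['15', 28.676470588235293],
    ['13', 16.31756756756757],
    ['09', 6.653153153153153],
]

# Sort on the second element (the average), highest first.
ranked = sorted(sample_avg_by_hour, key=itemgetter(1), reverse=True)

for hour, avg in ranked[:3]:
    print("{}:00 {:.2f} average comments per post".format(hour, avg))
```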


#### At which hours should you create an Ask HN post to have a higher chance of receiving comments?

In [13]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

print("Top 5 Hours for Ask Posts Comments")

top_hrs_message = "{}:00 {:.2f} average comments per post"
for sorted_avg in sorted_swap[:5]:
    print(top_hrs_message.format(sorted_avg[1], sorted_avg[0]))

Top 5 Hours for Ask Posts Comments
15:00 28.68 average comments per post
13:00 16.32 average comments per post
12:00 12.38 average comments per post
02:00 11.14 average comments per post
10:00 10.68 average comments per post


Based on the results above, the best time to create an Ask HN post is around 3 PM U.S. Eastern Time. This corresponds to 7 PM Greenwich Mean Time while daylight saving time is in effect in the U.S., and 8 PM otherwise.

Other recommended times are early morning (2 AM U.S. Eastern Time) and late morning to early afternoon (10 AM to 1 PM).
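
The conversion from Eastern Time to GMT can be checked with fixed UTC offsets. This is a minimal sketch, assuming the dataset's timestamps are U.S. Eastern as stated above; the fixed offsets (UTC-4 for daylight time, UTC-5 for standard time) stand in for a full time zone database, and `eastern_hour_to_utc` is a hypothetical helper.

```python
import datetime as dt

# Fixed offsets used as an assumption instead of a time zone database.
EDT = dt.timezone(dt.timedelta(hours=-4), "EDT")  # daylight saving time
EST = dt.timezone(dt.timedelta(hours=-5), "EST")  # standard time

def eastern_hour_to_utc(hour, tz):
    """Convert an hour of the day in the given Eastern offset to the UTC hour."""
    # The date is arbitrary because the offsets are fixed.
    local = dt.datetime(2016, 9, 26, hour, 0, tzinfo=tz)
    return local.astimezone(dt.timezone.utc).hour

utc_hour_edt = eastern_hour_to_utc(15, EDT)
utc_hour_est = eastern_hour_to_utc(15, EST)
print(utc_hour_edt, utc_hour_est)
```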

### Additional Analysis

#### Determining whether Show or Ask posts receive more points on average

##### Ask Hacker News Posts

In [14]:
total_ask_points = 0

for row in ask_posts:
    num_points = int(row[3])
    total_ask_points += num_points
    
avg_ask_points = total_ask_points / len(ask_posts)

points_message = "On average, there are ~{} points in {}."
print(points_message.format(round(avg_ask_points), "Ask HN posts"))

On average, there are ~11 points in Ask HN posts.


##### Show Hacker News Posts

In [15]:
total_show_points = 0

for row in show_posts:
    num_points = int(row[3])
    total_show_points += num_points
    
avg_show_points = total_show_points / len(show_posts)
print(points_message.format(round(avg_show_points), "Show HN posts"))

On average, there are ~15 points in Show HN posts.


The findings above show that Show Hacker News posts tend to receive more points (votes) on average than Ask Hacker News posts.

#### At which hours do Ask posts receive the most points?

In [16]:
import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    points = int(row[3])
    result_list.append([created_at, points])
    
counts_by_hour = {}
points_by_hour = {}

for row in result_list:
    hr = row[0]
    hr = dt.datetime.strptime(hr, "%m/%d/%Y %H:%M")
    hr = dt.datetime.strftime(hr, "%H")
    if hr not in counts_by_hour:
        counts_by_hour[hr] = 1
        points_by_hour[hr] = row[1]
    else:
        counts_by_hour[hr] += 1
        points_by_hour[hr] += row[1]
        
points_list = []

# Build [total points, hour] pairs so that sorting orders by total points
for hr in points_by_hour:
    points_list.append([points_by_hour[hr], hr])

swapped_points_per_hour = sorted(points_list, reverse=True)
swapped_points_per_hour[:5]

[[13978, '15'], [7962, '13'], [7155, '17'], [6850, '18'], [5970, '16']]

Ask posts created in the afternoon (1 PM to 6 PM U.S. Eastern Time) collect the most points in total. These figures are totals per hour rather than per-post averages, so they also reflect how many posts are created during those hours.
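
To compare hours on a per-post basis, the totals can be divided by the number of posts created in each hour, as was done for comments. A minimal sketch under the same column layout as the rest of the notebook (points at index 3, timestamp at index 6); `average_by_hour` is a hypothetical helper, and the sample rows are the five Ask HN rows printed earlier.

```python
import datetime as dt

def average_by_hour(rows, value_index, date_index=6):
    """Return {hour: average of the integer column `value_index`} for `rows`."""
    totals, counts = {}, {}
    for row in rows:
        hour = dt.datetime.strptime(row[date_index], "%m/%d/%Y %H:%M").strftime("%H")
        totals[hour] = totals.get(hour, 0) + int(row[value_index])
        counts[hour] = counts.get(hour, 0) + 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

sample_ask = [
    ['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53'],
    ['12578522', 'Ask HN: How do you pass on your work when you die?', '', '6', '3', 'PascLeRasc', '9/26/2016 1:17'],
    ['12577908', 'Ask HN: How a DNS problem can be limited to a geographic region?', '', '1', '0', 'kuon', '9/25/2016 22:57'],
    ['12577870', 'Ask HN: Why join a fund when you can be an angel?', '', '1', '3', 'anthony_james', '9/25/2016 22:48'],
    ['12577647', 'Ask HN: Someone uses stock trading as passive income?', '', '5', '2', '00taffe', '9/25/2016 21:50'],
]

sample_avg_points = average_by_hour(sample_ask, value_index=3)
print(sample_avg_points)
```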

##### Other Posts' Average Number of Comments

In [17]:
total_other_comments = 0

for row in other_posts:
    num_comments = int(row[4])
    total_other_comments += num_comments
    
avg_other_comments = total_other_comments / len(other_posts)
print(comments_message.format(round(avg_other_comments), "Other HN posts"))

On average, there are ~6 comments in Other HN posts.


Other posts receive slightly more comments on average (~6) than Show Hacker News posts (~5), but fewer than Ask Hacker News posts (~10).

##### Other Posts' Average Number of Points

In [19]:
total_other_points = 0

for row in other_posts:
    num_points = int(row[3])
    total_other_points += num_points
    
avg_other_points = total_other_points / len(other_posts)
print(points_message.format(round(avg_other_points), "Other HN posts"))

On average, there are ~15 points in Other HN posts.


Other posts receive a similar average number of points (~15) to Show HN posts.