### Exploring Hacker News Posts

In this project, we explore Hacker News posts and compare 'Ask HN' and 'Show HN' posts that received comments, to determine which type of post, and which hour of posting, receives the most comments on average.
Note: The dataset excludes posts with no comments.

In [1]:
from csv import reader

# Read every row into a list; the with block closes the file afterwards.
with open('hackernews.csv', encoding='utf8') as opened_file:
    hn = list(reader(opened_file))
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12578822', 'Amazons Algorithms Dont Find You the Best Deals', 'https://www.technologyreview.com/s/602442/amazons-algorithms-dont-find-you-the-best-deals/', '1', '1', 'yarapavan', '9/26/2016 2:26'], ['12578694', 'Emergency dose of epinephrine that does not cost an arm and a leg', 'http://m.imgur.com/gallery/th6Ua', '2', '1', 'dredmorbius', '9/26/2016 1:54'], ['12578624', 'Phone Makers Could Cut Off Drivers. So Why Dont They?', 'http://www.nytimes.com/2016/09/25/technology/phone-makers-could-cut-off-drivers-so-why-dont-they.html', '4', '1', 'danso', '9/26/2016 1:37'], ['12578311', 'Americas Lost Boys: Men who choose video games over work', 'https://www.firstthings.com/blogs/firstthoughts/2016/08/americas-lost-boys', '5', '1', 'jseliger', '9/26/2016 0:31']]


In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12578822', 'Amazons Algorithms Dont Find You the Best Deals', 'https://www.technologyreview.com/s/602442/amazons-algorithms-dont-find-you-the-best-deals/', '1', '1', 'yarapavan', '9/26/2016 2:26'], ['12578694', 'Emergency dose of epinephrine that does not cost an arm and a leg', 'http://m.imgur.com/gallery/th6Ua', '2', '1', 'dredmorbius', '9/26/2016 1:54'], ['12578624', 'Phone Makers Could Cut Off Drivers. So Why Dont They?', 'http://www.nytimes.com/2016/09/25/technology/phone-makers-could-cut-off-drivers-so-why-dont-they.html', '4', '1', 'danso', '9/26/2016 1:37'], ['12578311', 'Americas Lost Boys: Men who choose video games over work', 'https://www.firstthings.com/blogs/firstthoughts/2016/08/americas-lost-boys', '5', '1', 'jseliger', '9/26/2016 0:31'], ['12578212', 'A Walking Tour of New Yorks Massive Surveillance Network', 'https://theintercept.com/2016/09/24/a-walking-tour-of-new-yorks-massive-surveilla

**Filtering The Data For Post Titles Beginning With 'Ask HN' or 'Show HN'**

In [3]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):      
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

6911
5059
68430
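
The same prefix rule can be pulled into a small helper so it can be checked on its own. This is only a sketch: `classify_title` is a hypothetical name that does not appear in the notebook, and the sample titles are there purely for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'


# Sample titles for illustration; the helper name is an assumption, not notebook code.
sample_titles = [
    'Ask HN: Is Riak a viable alternative to Cassandra?',
    'SHOW HN: Geto, a mobile local compass',
    'Phone Makers Could Cut Off Drivers. So Why Dont They?',
]
labels = [classify_title(t) for t in sample_titles]
print(labels)
```

Lowercasing before the prefix check is what lets titles such as 'SHOW HN: ...' land in the show group.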


**First Five Rows in ask_posts**

In [4]:
print(ask_posts[:5])

[['12576946', 'Ask HN: How hard would it be to make a cheap, hackable phone?', '', '2', '1', 'hkt', '9/25/2016 19:30'], ['12573681', 'Ask HN: Where can I learn more about and contribute to the AI singularity?', '', '1', '1', 'DSteinmann', '9/25/2016 3:00'], ['12572353', 'Ask HN: Is Riak a viable alternative to Cassandra?', '', '5', '1', 'nvarsj', '9/24/2016 19:57'], ['12571744', 'Ask HN: What are the best (free if possible) Wordpress themes for coding blogs?', '', '2', '1', 'kexari', '9/24/2016 17:27'], ['12570947', "Ask HN: If you've successfully outsourced software dev work, how did you do it?", '', '3', '1', 'Mattasher', '9/24/2016 14:03']]


**First Five Rows in show_posts**

In [5]:
print(show_posts[:5])

[['12577142', 'Show HN: Jumble  Essays on the go #PaulInYourPocket', 'https://itunes.apple.com/us/app/jumble-find-startup-essay/id1150939197?ls=1&mt=8', '1', '1', 'ryderj', '9/25/2016 20:06'], ['12576813', 'Show HN: Learn Japanese Vocab via multiple choice questions', 'http://japanese.vul.io/', '1', '1', 'soulchild37', '9/25/2016 19:06'], ['12576090', 'Show HN: Markov chain Twitter bot. Trained on comments left on Pornhub', 'https://twitter.com/botsonasty', '3', '1', 'keepingscore', '9/25/2016 16:50'], ['12575471', 'Show HN: Project-Okot: Novel, CODE-FREE data-apps in mere seconds', 'https://studio.nuchwezi.com/', '3', '1', 'nfixx', '9/25/2016 14:30'], ['12574556', 'Show HN: Geto, a mobile local compass', 'https://andreapaiola.name/geto/', '2', '1', 'andreapaiola', '9/25/2016 9:19']]


**Calculating the Average Number of Comments For Ask HN and Show HN Posts**

In [6]:
total_ask_comments = 0
for row in ask_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_ask_comments += num_comments
avg_ask_comments = total_ask_comments/len(ask_posts)
print(avg_ask_comments)

13.744175951381855


In [7]:
total_show_comments = 0
for row in show_posts:
    num_comments = row[4]
    num_comments = int(num_comments)
    total_show_comments += num_comments
avg_show_comments = total_show_comments/len(show_posts)
print(avg_show_comments)

9.810832180272781


On average, ask posts in our sample receive approximately 14 comments, whereas show posts receive approximately 10. Since ask posts receive more comments on average, we'll focus the remaining analysis on these posts.
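
Cells 6 and 7 repeat the same loop, so the averaging step could be factored into one function. The sketch below assumes the comment count sits at index 4, as in the dataset's header row; `average_comments` and the sample rows are hypothetical and only illustrate the arithmetic.

```python
def average_comments(rows, comments_index=4):
    """Mean of the integer comment counts stored at comments_index in each row."""
    if not rows:
        # Assumption: report 0.0 for an empty group instead of raising an error.
        return 0.0
    total = sum(int(row[comments_index]) for row in rows)
    return total / len(rows)


# Made-up rows in the dataset's column order, for illustration only.
sample_rows = [
    ['1', 'Ask HN: example one', '', '2', '4', 'user_a', '9/25/2016 19:30'],
    ['2', 'Ask HN: example two', '', '1', '10', 'user_b', '9/25/2016 3:00'],
]
sample_avg = average_comments(sample_rows)
print(sample_avg)  # 7.0
```

With the real data, `average_comments(ask_posts)` and `average_comments(show_posts)` would be expected to reproduce the two averages printed above.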

#### Finding the Number of Ask Posts and Comments by Hour Created

In [8]:
import datetime as dt

result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])
    
counts_by_hour = {}
comments_by_hour = {}
date_format = '%m/%d/%Y %H:%M'

for c, n in result_list:
    time = dt.datetime.strptime(c, date_format).strftime('%H')
    if time not in counts_by_hour:
        counts_by_hour[time] = 1
        comments_by_hour[time] = n
    else:
        counts_by_hour[time] += 1
        comments_by_hour[time] += n
comments_by_hour

{'19': 3954,
 '03': 2154,
 '17': 5547,
 '14': 4972,
 '08': 2362,
 '20': 4462,
 '09': 1477,
 '01': 2089,
 '18': 4877,
 '15': 18525,
 '06': 1587,
 '21': 4500,
 '12': 4234,
 '04': 2360,
 '00': 2277,
 '16': 4466,
 '23': 2297,
 '05': 1838,
 '10': 3013,
 '07': 1585,
 '11': 2797,
 '22': 3372,
 '13': 7245,
 '02': 2996}
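
As an alternative to the explicit `if`/`else` bookkeeping, the per-hour tallies can be built with `collections.defaultdict`. This is a sketch under the same assumptions as the cell above (timestamp at index 6, comment count at index 4, format `'%m/%d/%Y %H:%M'`); `tally_by_hour` and the sample rows are hypothetical.

```python
import datetime as dt
from collections import defaultdict

DATE_FORMAT = '%m/%d/%Y %H:%M'


def tally_by_hour(rows, created_index=6, comments_index=4):
    """Return (posts per hour, comments per hour), keyed by zero-padded hour strings."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for row in rows:
        hour = dt.datetime.strptime(row[created_index], DATE_FORMAT).strftime('%H')
        counts[hour] += 1
        comments[hour] += int(row[comments_index])
    return dict(counts), dict(comments)


# Made-up rows in the dataset's column order, for illustration only.
sample_rows = [
    ['1', 'Ask HN: a', '', '1', '3', 'u', '9/25/2016 15:05'],
    ['2', 'Ask HN: b', '', '1', '5', 'u', '9/26/2016 15:40'],
    ['3', 'Ask HN: c', '', '1', '2', 'u', '9/26/2016 2:26'],
]
counts, comments = tally_by_hour(sample_rows)
print(counts)    # {'15': 2, '02': 1}
print(comments)  # {'15': 8, '02': 2}
```

A `defaultdict(int)` starts every missing key at 0, which removes the need to special-case the first post seen for each hour.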

**Calculating the Average Number of Comments For Ask HN Posts by Hour**

In [9]:
avg_by_hour = []
for hr in comments_by_hour:
    avg_by_hour.append([hr, comments_by_hour[hr]/counts_by_hour[hr]])
    
avg_by_hour

[['19', 9.414285714285715],
 ['03', 10.160377358490566],
 ['17', 13.73019801980198],
 ['14', 13.153439153439153],
 ['08', 12.43157894736842],
 ['20', 11.38265306122449],
 ['09', 8.392045454545455],
 ['01', 9.367713004484305],
 ['18', 10.789823008849558],
 ['15', 39.66809421841542],
 ['06', 9.017045454545455],
 ['21', 11.056511056511056],
 ['12', 15.452554744525548],
 ['04', 12.688172043010752],
 ['00', 9.857142857142858],
 ['16', 10.76144578313253],
 ['23', 8.322463768115941],
 ['05', 11.139393939393939],
 ['10', 13.757990867579908],
 ['07', 10.095541401273886],
 ['11', 11.143426294820717],
 ['22', 11.749128919860627],
 ['13', 22.2239263803681],
 ['02', 13.198237885462555]]

**Sorting and Printing Values**

In [10]:
swap_avg_by_hour = []
for row in avg_by_hour:
    # Put the average first so that sorting orders the rows by average.
    a = row[1]
    b = row[0]
    swap_avg_by_hour.append([a, b])
# Print once, after the loop has finished building the list.
print(swap_avg_by_hour)

[[9.414285714285715, '19'], [10.160377358490566, '03'], [13.73019801980198, '17'], [13.153439153439153, '14'], [12.43157894736842, '08'], [11.38265306122449, '20'], [8.392045454545455, '09'], [9.367713004484305, '01'], [10.789823008849558, '18'], [39.66809421841542, '15'], [9.017045454545455, '06'], [11.056511056511056, '21'], [15.452554744525548, '12'], [12.688172043010752, '04'], [9.857142857142858, '00'], [10.76144578313253, '16'], [8.322463768115941, '23'], [11.139393939393939, '05'], [13.757990867579908, '10'], [10.095541401273886, '07'], [11.143426294820717, '11'], [11.749128919860627, '22'], [22.2239263803681, '13'], [13.198237885462555, '02']]

In [11]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)
print(sorted_swap)

[[39.66809421841542, '15'], [22.2239263803681, '13'], [15.452554744525548, '12'], [13.757990867579908, '10'], [13.73019801980198, '17'], [13.198237885462555, '02'], [13.153439153439153, '14'], [12.688172043010752, '04'], [12.43157894736842, '08'], [11.749128919860627, '22'], [11.38265306122449, '20'], [11.143426294820717, '11'], [11.139393939393939, '05'], [11.056511056511056, '21'], [10.789823008849558, '18'], [10.76144578313253, '16'], [10.160377358490566, '03'], [10.095541401273886, '07'], [9.857142857142858, '00'], [9.414285714285715, '19'], [9.367713004484305, '01'], [9.017045454545455, '06'], [8.392045454545455, '09'], [8.322463768115941, '23']]
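
Swapping the columns is one way to sort by average. Another is to pass a `key` function to `sorted`, which leaves the `[hour, average]` layout untouched. A minimal sketch, using a handful of pairs copied from `avg_by_hour` above (the variable names here are illustrative):

```python
# A few [hour, average] pairs copied from avg_by_hour.
avg_by_hour_sample = [
    ['19', 9.414285714285715],
    ['15', 39.66809421841542],
    ['13', 22.2239263803681],
    ['12', 15.452554744525548],
]

# Sort on the second element (the average), highest first.
sorted_by_avg = sorted(avg_by_hour_sample, key=lambda pair: pair[1], reverse=True)
top_hours = [hour for hour, _ in sorted_by_avg]
print(top_hours)  # ['15', '13', '12', '19']
```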


In [12]:
print('Top 5 hours for ASK HN Comments')

for avg, hr in sorted_swap[:5]:
    print('{}: {:.2f} average comments per post'.format(dt.datetime.strptime(hr, '%H').strftime('%H:%M'), avg))

Top 5 hours for ASK HN Comments
15:00: 39.67 average comments per post
13:00: 22.22 average comments per post
12:00: 15.45 average comments per post
10:00: 13.76 average comments per post
17:00: 13.73 average comments per post


**The hour that receives the most comments per post on average is 15:00 (3 p.m. EST), with an average of 39.67 comments per post.**

### Conclusion

Based on our analysis of posts that received comments, Ask HN posts received more comments on average than Show HN posts, and Ask HN posts created between 15:00 and 16:00 (3 p.m. and 4 p.m. EST) received the most comments on average. Therefore, to maximize the number of comments a post receives, we recommend framing it as an Ask HN post and publishing it between 15:00 and 16:00 (3 p.m. and 4 p.m. EST).