# Hacker News Project

**Exercise Description**: This is a quick analysis of the types of user posts on the Hacker News website. The goal is to understand the volume of *Ask* and *Show* posts and to compare engagement between the two, measured by the number of comments they received. To get more practice manipulating lists of lists in Python, the exercise was done without NumPy or pandas. The data set explored here is available on Kaggle.

In [20]:
from csv import reader
import datetime as dt

Reading the Hacker News CSV file in as a list of lists:

In [2]:
# Use a context manager so the file is closed once it has been read.
with open("hacker_news.csv", newline="") as opened:
    hn = list(reader(opened))

In [6]:
headers = hn[0]
hn = hn[1:]

In [7]:
headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
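
The later cells index rows by position (`i[1]` for the title, `i[4]` for the comment count). As a minimal sketch, one way to make those positions self-documenting is to build a name-to-index lookup from the header row. The `col` dictionary is a hypothetical helper and is not part of the original notebook.

```python
# Header row as printed above.
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Hypothetical helper: map each column name to its position in a row.
col = {name: idx for idx, name in enumerate(headers)}

# One row from the data set, used here only for illustration.
row = ['12578908', 'Ask HN: What TLD do you use for local development?',
       '', '4', '7', 'Sevrene', '9/26/2016 2:53']

print(row[col['title']])              # same value as row[1]
print(int(row[col['num_comments']]))  # same value as int(row[4])
```

With a lookup like this, `row[col['num_comments']]` could stand in for `row[4]` wherever readability matters more than brevity.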

Separating user posts into **Ask**, **Show**, and **Other**:

In [11]:
ask_posts = []
show_posts = []
other_posts = []

for i in hn:
    if i[1].lower().startswith('ask hn'):
        ask_posts.append(i)
    elif i[1].lower().startswith('show hn'):
        show_posts.append(i)
    else:
        other_posts.append(i)

print(str(len(ask_posts)) + ' Ask Posts')
print(str(len(show_posts)) + ' Show Posts')
print(str(len(other_posts)) + ' Other Posts')

9139 Ask Posts
10158 Show Posts
273822 Other Posts
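
To put these volumes in perspective, the counts can be turned into shares of all posts. The snippet below is a small sketch that hard-codes the three counts printed above; the `counts` dictionary is introduced only for illustration.

```python
# Counts copied from the output above.
counts = {'Ask': 9139, 'Show': 10158, 'Other': 273822}
total = sum(counts.values())

for label, n in counts.items():
    share = 100 * n / total
    print('{} Posts: {:.2f}% of all posts'.format(label, share))
```

Ask and Show posts together make up under 7% of the data set, so both groups are small relative to everything else.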


Determining which of the two post types receives more comments on average:

In [19]:
total_ask_comments = 0
total_show_comments = 0

for i in ask_posts:
    total_ask_comments += int(i[4])

for i in show_posts:
    total_show_comments += int(i[4])

avg_ask_comments = total_ask_comments / len(ask_posts)
avg_show_comments = total_show_comments / len(show_posts)

print('Ask Comments Avg ' + str(round(avg_ask_comments,2)))
print('Show Comments Avg ' + str(round(avg_show_comments,2)))

Ask Comments Avg 10.39
Show Comments Avg 4.89


Ask posts received more comments on average than Show posts on Hacker News.
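
The two loops above do the same work on different lists, so the calculation could be factored into one function. This is a sketch only: `average_comments` and its `comments_idx` parameter are hypothetical names, the sample rows are made up, and returning `0.0` for an empty list is an assumption rather than something the notebook specifies.

```python
def average_comments(posts, comments_idx=4):
    """Return the mean of the num_comments column for a list of rows."""
    if not posts:
        return 0.0  # assumption: treat an empty group as 0 instead of raising
    return sum(int(row[comments_idx]) for row in posts) / len(posts)

# Made-up rows in the same column layout as the data set.
sample = [
    ['1', 'Ask HN: example A', '', '3', '7', 'user_a', '9/26/2016 2:53'],
    ['2', 'Ask HN: example B', '', '1', '3', 'user_b', '9/26/2016 3:10'],
]

print(round(average_comments(sample), 2))  # (7 + 3) / 2 = 5.0
```

Calling it as `average_comments(ask_posts)` and `average_comments(show_posts)` would give the same two averages as the cell above.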

Exploring whether Ask posts received more comments at certain times of the day as compared to others:

In [24]:
print(headers)
print(ask_posts[0])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
['12578908', 'Ask HN: What TLD do you use for local development?', '', '4', '7', 'Sevrene', '9/26/2016 2:53']
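
The `created_at` value in this row is not zero-padded (`9/26/2016 2:53`). As a quick check, the sketch below parses that exact string with the format `'%m/%d/%Y %H:%M'` and pulls out the hour; `strptime` accepts the unpadded month and hour.

```python
import datetime as dt

created_at = '9/26/2016 2:53'  # value from the sample row above
parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')

print(parsed.hour)               # 2
print(parsed.strftime('%H:%M'))  # 02:53
```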


In [46]:
result_list = []

for i in ask_posts:
    result_list.append([i[6],int(i[4])])
    
counts_by_hour = {}
comments_by_hour = {}

for i in result_list:
    hour = dt.datetime.strptime(i[0],'%m/%d/%Y %H:%M').hour
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += i[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = i[1]
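
The `if`/`else` branch above exists only to initialise missing keys. One alternative, sketched here on made-up data, is `collections.defaultdict`, which starts every new key at 0. The `sample_results` list is hypothetical and simply mirrors the shape of `result_list`.

```python
import datetime as dt
from collections import defaultdict

# Made-up [created_at, num_comments] pairs in the same shape as result_list.
sample_results = [
    ['9/26/2016 2:53', 7],
    ['9/26/2016 2:10', 3],
    ['9/26/2016 15:30', 20],
]

counts_by_hour = defaultdict(int)    # posts created in each hour
comments_by_hour = defaultdict(int)  # total comments on those posts

for created_at, num_comments in sample_results:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').hour
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += num_comments

print(dict(counts_by_hour))    # {2: 2, 15: 1}
print(dict(comments_by_hour))  # {2: 10, 15: 20}
```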

In [61]:
avg_comments_hr = []

for i in comments_by_hour:
    avg = round(comments_by_hour[i]/counts_by_hour[i],2)
    avg_comments_hr.append([avg, i])

avg_comments_hr = sorted(avg_comments_hr,reverse=True)

In [62]:
avg_comments_hr

[[28.68, 15],
 [16.32, 13],
 [12.38, 12],
 [11.14, 2],
 [10.68, 10],
 [9.71, 4],
 [9.69, 14],
 [9.45, 17],
 [9.19, 8],
 [8.96, 11],
 [8.8, 22],
 [8.79, 5],
 [8.75, 20],
 [8.69, 21],
 [7.95, 3],
 [7.94, 18],
 [7.71, 16],
 [7.56, 0],
 [7.41, 1],
 [7.16, 19],
 [7.01, 7],
 [6.78, 6],
 [6.7, 23],
 [6.65, 9]]
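
The list above stores `[average, hour]` so that a plain `sorted` call orders by the average. A possible alternative, shown as a sketch, is to keep the natural `(hour, average)` order and sort with a `key` function. The `avg_by_hour` dictionary is hypothetical and holds three of the values printed above.

```python
# Three hour -> average pairs copied from the output above.
avg_by_hour = {2: 11.14, 15: 28.68, 13: 16.32}

# Sort (hour, average) pairs by the average, highest first.
ranked = sorted(avg_by_hour.items(), key=lambda pair: pair[1], reverse=True)

print(ranked)  # [(15, 28.68), (13, 16.32), (2, 11.14)]
```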

In [66]:
print("Top 5 Hours for Ask Posts Comments")

for row in avg_comments_hr[:5]:
    template = "{}: {} average comments per post."
    hour = str(row[1])
    hour = dt.datetime.strptime(hour,'%H').strftime("%H:%M")
    output = template.format(hour,row[0])
    print(output)

Top 5 Hours for Ask Posts Comments
15:00: 28.68 average comments per post.
13:00: 16.32 average comments per post.
12:00: 12.38 average comments per post.
02:00: 11.14 average comments per post.
10:00: 10.68 average comments per post.


If you are planning to share an Ask post on Hacker News and want to maximize engagement through comments, **3pm** is probably a good time to post: Ask posts created in that hour averaged 28.68 comments, well ahead of the next-best hour (13:00, at 16.32).