# Exploring Hacker News Posts

In this project we will be analyzing a data set that contains information about posts on the popular site Hacker News. Created by startup incubator Y Combinator, Hacker News is a site where users submit posts which are then upvoted and commented on, similar to Reddit. The data set can be found [here](https://www.kaggle.com/hacker-news/hacker-news-posts). 

For our analysis, we are interested in two types of posts specifically: Ask HN posts, where users submit a question, and Show HN posts, where users share a project, product, or something else of interest. The questions we will look to answer are:
- Do Ask HN or Show HN posts receive more comments on average?
- Do posts created at a certain time receive more comments on average?

Using the answers to these two questions, we will determine which type of post, created at which time of day, attracts the largest number of comments.

We will begin by reading in our data set and separating the header row:

In [1]:
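from csv import reader

# Read the whole file into a list of rows, closing the file when done
with open('HN_posts_year_to_Sep_26_2016.csv') as f:
    hn = list(reader(f))

# Separate the header row from the data rows
header = hn[0]
hn = hn[1:]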

print(header)
print('\n')
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12579008', 'You have two days to comment if you want stem cells to be classified as your own', 'http://www.regulations.gov/document?D=FDA-2015-D-3719-0018', '1', '0', 'altstar', '9/26/2016 3:26'], ['12579005', 'SQLAR  the SQLite Archiver', 'https://www.sqlite.org/sqlar/doc/trunk/README.md', '1', '0', 'blacksqr', '9/26/2016 3:24'], ['12578997', 'What if we just printed a flatscreen television on the side of our boxes?', 'https://medium.com/vanmoof/our-secrets-out-f21c1f03fdc8#.ietxmez43', '1', '0', 'pavel_lishin', '9/26/2016 3:19'], ['12578989', 'algorithmic music', 'http://cacm.acm.org/magazines/2011/7/109891-algorithmic-composition/fulltext', '1', '0', 'poindontcare', '9/26/2016 3:16'], ['12578979', 'How the Data Vault Enables the Next-Gen Data Warehouse and Data Lake', 'https://www.talend.com/blog/2016/05/12/talend-and-Â\x93the-data-vaultÂ\x94', '1', '0', 'markgainor1', '9/26/2016 3:14']]


## Extracting Ask HN and Show HN Posts

Now that the header has been separated from the data set, it's time to filter our data prior to analysis. As stated before, we are only interested in Ask HN and Show HN posts, so our next step is to extract those posts from our data set. We will create three subsets of the data: one containing only Ask HN posts, one containing only Show HN posts, and a third containing all other posts:

In [2]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('Ask HN Posts: ', len(ask_posts))
print('Show HN Posts: ', len(show_posts))
print('Other Posts: ', len(other_posts))

Ask HN Posts:  9139
Show HN Posts:  10158
Other Posts:  273822
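To make the prefix check concrete, here is a minimal sketch of the same case-insensitive test applied to a few titles. The `classify_title` helper and the sample titles are hypothetical and exist only for illustration; they are not part of the data set or the analysis above.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on a case-insensitive title prefix."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Hypothetical sample titles, for illustration only
sample_titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: My weekend project',
    'Why I ask HN for advice',
]

labels = [classify_title(t) for t in sample_titles]
print(labels)  # ['ask', 'show', 'other']
```

Note that the check only matches titles that begin with the prefix, so a title that merely mentions "ask HN" somewhere in the middle falls into the "other" group.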


## Calculating Average Number of Comments for Ask HN and Show HN Posts

Next we will use our new data subsets to answer our first question: do Ask HN or Show HN posts receive more comments on average?

### Average Number of Comments for Ask HN Posts

First we calculate the average number of comments for Ask HN posts:

In [3]:
total_ask_comments = 0

for post in ask_posts:
    num_comments = int(post[4])
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)

print('Average Number of Comments for Ask HN Posts: ', avg_ask_comments)

Average Number of Comments for Ask HN Posts:  10.393478498741656


### Average Number of Comments for Show HN Posts

Now we will repeat our calculation for Show HN posts:

In [4]:
total_show_comments = 0

for post in show_posts:
    num_comments = int(post[4])
    total_show_comments += num_comments
    
avg_show_comments = total_show_comments / len(show_posts)

print('Average Number of Comments for Show HN Posts: ', avg_show_comments)

Average Number of Comments for Show HN Posts:  4.886099625910612


From our analysis we can see that, on average, Ask HN posts receive around twice as many comments as Show HN posts (roughly 10.4 versus 4.9). Since Ask HN posts receive more comments, we will continue our analysis by focusing on these posts. The next question we will answer is: what is the best time of day to create an Ask HN post in order to generate the most comments?
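Since the two averaging cells differ only in the list they loop over, the calculation could be factored into one function. The sketch below is one possible way to do that, not the approach used in the cells above: `average_comments` and the two sample rows are hypothetical, and only the two averages in the ratio are taken from the printed results.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column (index 4 in this data set) across posts."""
    if not posts:
        return 0.0  # avoid dividing by zero on an empty list
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)

# Hypothetical rows in the data set's column order, for illustration only
sample_posts = [
    ['1', 'Ask HN: Example A', '', '5', '10', 'user_a', '9/26/2016 3:26'],
    ['2', 'Ask HN: Example B', '', '2', '4', 'user_b', '9/26/2016 4:10'],
]
sample_avg = average_comments(sample_posts)
print(sample_avg)  # 7.0

# Ratio of the two averages printed earlier (Ask HN / Show HN)
ratio = 10.393478498741656 / 4.886099625910612
print(round(ratio, 2))  # 2.13
```

With the real data, `average_comments(ask_posts)` and `average_comments(show_posts)` would be expected to reproduce the two averages printed above.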

## Determining Number of Ask HN Posts and Comments by Hour Created

We will now determine the hour of the day in which the most Ask HN posts are created and the hour in which the posts created attract the most comments. To accomplish this we will work with the Python datetime module:
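As a quick standalone illustration of the parsing involved, the sketch below converts one `created_at` value from the sample rows shown earlier into a `datetime` object and extracts its hour. The format string is an assumption inferred from those sample rows (month/day/year followed by hour:minute on a 24-hour clock).

```python
import datetime as dt

# Sample value taken from the created_at column in the rows printed earlier
created_at = '9/26/2016 3:26'

# Assumed format: month/day/year hour:minute (24-hour clock)
parsed = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M')

# Zero-padded hour as a string, suitable as a dictionary key
hour = parsed.strftime('%H')

print(parsed)  # 2016-09-26 03:26:00
print(hour)    # 03
```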

In [None]:
import datetime as dt

result_list = []

for post in ask_posts:
    # Extract post creation time and number of comments
    result_list.append([post[6], int(post[4])])
    
counts_by_hour = {}
comments_by_hour = {}