# Hacker News Data Analysis
## Site Info
---
Hacker News is a site started by the startup incubator Y Combinator, where user-submitted posts are voted and commented on, much like on Reddit. It is popular in technology and startup circles, and posts that reach the top of the Hacker News listings can attract a large audience.

## Data Info
---
The dataset contains approximately 20,000 rows. It was reduced from approximately 300,000 rows by first removing posts that received no comments and then randomly sampling from the remaining posts.

The column descriptions are as follows:

* id: The unique identifier from Hacker News for the post
* title: The title of the post
* url: The URL that the post links to, if the post has a URL
* num_points: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
* num_comments: The number of comments that were made on the post
* author: The username of the person who submitted the post
* created_at: The date and time at which the post was submitted
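
The reduction described above (dropping posts with no comments, then random sampling) could be reproduced roughly as follows. This is only a sketch, not how the dataset was actually prepared: the helper name `reduce_posts`, its `sample_size` and `seed` parameters, and the demo rows are all assumptions made for illustration.

```python
import random

NUM_COMMENTS = 4  # position of num_comments in the column list above


def reduce_posts(rows, sample_size, seed=0):
    """Keep rows with at least one comment, then draw a random sample.

    Hypothetical helper: the name, parameters, and seed are assumptions.
    """
    commented = [row for row in rows if int(row[NUM_COMMENTS]) > 0]
    rng = random.Random(seed)  # fixed seed makes the sample repeatable
    # Never request more rows than survived the filter.
    return rng.sample(commented, min(sample_size, len(commented)))


# Made-up rows in the same column order as the dataset.
demo_rows = [
    ["1", "Post A", "", "3", "0", "alice", ""],
    ["2", "Post B", "", "5", "2", "bob", ""],
    ["3", "Post C", "", "1", "7", "carol", ""],
]
sample = reduce_posts(demo_rows, sample_size=2)
print(len(sample))
```

Sampling after filtering, rather than before, keeps the final row count predictable, since no sampled rows are later discarded.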

In [6]:
from csv import reader

# Read the whole CSV into a list of rows.
with open("hacker_news.csv") as f:
    hn = list(reader(f))

# Split the header row off from the data rows.
hn_header = hn[0]
hn = hn[1:]

In [17]:
for v in enumerate(hn_header):
    print(v)
# print('\n')
# for i in range(5):
#     print(hn[i], '\n')

(0, 'id')
(1, 'title')
(2, 'url')
(3, 'num_points')
(4, 'num_comments')
(5, 'author')
(6, 'created_at')
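
Since the rows are plain lists, columns are accessed by position. One optional way to avoid hard-coded indexes is to build a name-to-index lookup from the header. The names `header` and `col` and the demo row below are illustrative assumptions, not part of the original notebook.

```python
# Header as printed above; in the notebook this would be hn_header.
header = ["id", "title", "url", "num_points", "num_comments", "author", "created_at"]

# Map each column name to its position in a row.
col = {name: index for index, name in enumerate(header)}

# Made-up row for illustration only.
row = ["1", "Ask HN: Example?", "", "4", "12", "someone", ""]
print(row[col["title"]], int(row[col["num_comments"]]))
```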


## Filtering Data
---
For this analysis, we will look only at posts whose titles begin with 'Ask HN' or 'Show HN'. We will filter the data and find which of the two post types is more popular on Hacker News, in terms of number of posts and average number of comments per post.

First, we will split the dataset into three lists: `ask_posts`, `show_posts`, and `other_posts`.

In [21]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
print('Number of \'Ask HN\' posts:', len(ask_posts))
print('Number of \'Show HN\' posts:', len(show_posts))
print('Number of other posts:', len(other_posts))

Number of 'Ask HN' posts: 1744
Number of 'Show HN' posts: 1162
Number of other posts: 17194


Next, we will sum the comments in each list into `total_ask_comments` and `total_show_comments`, then divide each total by the number of posts in its list to get the average number of comments per post.

In [28]:
total_ask_comments = 0
for row in ask_posts:
    total_ask_comments += int(row[4])
avg_ask_comments = total_ask_comments / len(ask_posts)

total_show_comments = 0
for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)

print("Avg. number of comments on 'Ask HN' posts:", round(avg_ask_comments, 2))
print("Avg. number of comments on 'Show HN' posts:", round(avg_show_comments, 2))

Avg. number of comments on 'Ask HN' posts: 14.04
Avg. number of comments on 'Show HN' posts: 10.32


From this data, we can see that Ask HN posts outnumber Show HN posts (1,744 versus 1,162) and that the average number of comments on Ask HN posts (14.04) is also higher than on Show HN posts (10.32). This suggests that Ask HN posts tend to be more popular with Hacker News users, at least within this sample of posts that received comments.
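
Both comparisons (post count and average comments per post) could also be computed with one small helper, which avoids repeating the loop for each list. This is a minimal sketch: `summarize`, its `comments_index` parameter, and the demo rows are hypothetical and not part of the original analysis.

```python
def summarize(posts, comments_index=4):
    """Return (number of posts, average comments per post).

    Hypothetical helper; comments_index=4 matches the num_comments column.
    """
    count = len(posts)
    if count == 0:
        return 0, 0.0  # avoid dividing by zero for an empty list
    total = sum(int(row[comments_index]) for row in posts)
    return count, total / count


# Made-up rows in the dataset's column order, for illustration only.
demo_ask = [
    ["1", "Ask HN: First?", "", "2", "10", "alice", ""],
    ["2", "Ask HN: Second?", "", "3", "5", "bob", ""],
]
demo_show = [
    ["3", "Show HN: Thing", "", "8", "4", "carol", ""],
]

for label, posts in (("Ask HN", demo_ask), ("Show HN", demo_show)):
    count, average = summarize(posts)
    print(label, count, round(average, 2))
```

In the notebook, the same call would be made with `ask_posts` and `show_posts` in place of the demo lists.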