<h1>Exploring Hackers News Posts</h1>

<p>This project will compare two different types of posts from Hacker News website which specializes in technology related stories, it's very similar to Reddit. 

Two types of posts that will be explore in this notebook begin with either ***Ask HN*** or ***Show HN***.

Users submit Ask HN posts to ask the Hacker News community a specific question, such as "What is the best online course you've ever taken?" Likewise, users submit Show HN posts to show the Hacker News community a project, product, or just generally something interesting.

This notebook will compare these two types of posts to determine the following:

- *Do Ask HN or Show HN receive more comments on average?*
- *Do posts created at a certain time receive more comments on average?*

**Note** *This data set was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions.*

<h3>Import Dataset</h3>

In [163]:
# Read in the data.
import csv

f = open('hacker_news.csv')
hn = list(csv.reader(f))
hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

<h3>Separate Header from Data</h3>

In [164]:
# Remove the headers.
hn_header = hn[0]
hn = hn[1:]
print(hn_header)
print(hn[:5])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


<h3>Filtering 'Ask HN' and 'Show HN' Posts</h3>

Identify posts that begin with either `Ask HN` or `Show HN` and separate the data into different lists.



In [165]:
# Identify posts that begin with either `Ask HN` or `Show HN` and separate the data into different lists.


ask_posts = []
show_posts = []
other_posts = []

for post in hn:
    title = post[1]
    if title.lower().startswith("Ask HN".lower()):
        ask_posts.append(post)
    elif title.lower().startswith("Show HN".lower()):
        show_posts.append(posts)
    else:
        other_posts.append(posts)
        
        

print(f'Len of whole dataset: {len(hn)}')
print(f'Len of ask_posts: {len(ask_posts)}')
print(f'Len of show_posts: {len(show_posts)}')
print(f'Len of other_posts: {len(other_posts)}')
print(f'\n* Data checking:\nLen of ask_posts + show_posts + other_posts = {len(ask_posts)+len(show_posts)+len(other_posts)}')
        

Len of whole dataset: 20100
Len of ask_posts: 1744
Len of show_posts: 1162
Len of other_posts: 17194

* Data checking:
Len of ask_posts + show_posts + other_posts = 20100


<h3>Compare Number of Comments for 'Ask HN' and 'Show HN'</h3>


In [167]:
print(f"Header: \n{hn_header}")

# Comments found in column 4

# Sample from row 4 of 'ask_posts' list
print(f"\nSample from row 4 of 'ask_posts' list:\n{ask_posts[4]}")
print(f'\nComment = {ask_posts[4][4]}')

Header: 
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

Sample from row 4 of 'ask_posts' list:
['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']

Comment = 17


In [182]:
# def count_comments(comlist):
#     num_comments = 0
#     for row in comlist:
#         comments = int(row[4])
#         num_comments += comments
    
#     return num_comments

# def avg_comments(comlist):
#     num_comments = count_comments(comlist)
#     return round(num_comments / len(comlist))


# print(f'Ask HN (Total Comments): {count_comments(ask_posts)}')
# print(f'avg comments def: {avg_comments(show_)}')
# print(f'Ask HN (Avg): {round(count_comments(ask_posts)/len(ask_posts),2)}')

# print(f'\nShow HN (Total Comments): {count_comments(show_posts)}')
# print(f'Show HN (Avg): {round(count_comments(show_posts)/len(show_posts),2)}')

# print(f'\nOther Posts (Total Comments): {count_comments(other_posts)}')
# print(f'Other Posts (Avg): {round(count_comments(other_posts)/len(other_posts),2)}')

total_show_comments = 0

for post in show_posts:
    total_show_comments += int(post[4])
    
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)
    

58.0
