# Exploring Hacker News Posts

## Introduction

Hacker News is a popular social news website focused on computer science and entrepreneurship. We will be working with a dataset containing a sample of user submissions.

This project explores the posts submitted by users of [Hacker News](https://news.ycombinator.com/) and seeks to answer the following questions:

* Do `Ask HN` or `Show HN` posts receive more comments on average?
* Do posts created at a certain time receive more comments on average?

Let's begin by importing the library we need to read the dataset. While we're at it, we should also extract the header row from the dataset and assign it to a new variable, `headers`.

In [15]:
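from csv import reader

# Read every row of the CSV file; the context manager closes the file afterwards.
with open('hacker_news.csv') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

headers = hn[0]  # the first row holds the column names
hn = hn[1:]      # the remaining rows are the posts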
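
The load-and-split step can also be wrapped in a small helper. The sketch below is one possible way to do it, not the only one: the function name `load_rows` is an assumption introduced for illustration, and the in-memory sample (two rows taken from the dataset) stands in for `hacker_news.csv` so that the example runs without the file.

```python
import csv
import io


def load_rows(file_obj):
    """Return (header, rows) from an open CSV file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]


# In-memory stand-in for hacker_news.csv (two rows taken from the dataset).
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    '10301696,"Note by Note: The Making of Steinway L1037 (2007)",'
    "http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0,"
    "8,2,walterbell,9/30/2015 4:12\n"
)

sample_headers, sample_rows = load_rows(sample_csv)
print(sample_headers)
print(len(sample_rows))
```

With the real file, the same helper could be called inside a `with open('hacker_news.csv') as f:` block, which also closes the file once the rows have been read.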

Let's print the first 5 rows of the dataset.

In [18]:
print("HEADER")
print(headers)
for idx in range(0, 5):
    print("\n")
    print(hn[idx])


HEADER
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']


Next, we need to separate the posts whose titles start with "Ask HN" or "Show HN". We will create three empty lists to store the posts whose titles start with Ask HN, the posts whose titles start with Show HN, and the posts that start with neither.

In [26]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
        
    else:
        other_posts.append(row)
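
The prefix test used in the loop can be tried on its own. The sketch below is a minimal illustration, assuming a hypothetical helper named `classify_title` (it is not part of the notebook); the sample titles are taken from the dataset.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    lowered = title.lower()  # make the comparison case-insensitive
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


sample_titles = [
    "Ask HN: How to improve my personal website?",
    "Show HN: Something pointless I made",
    "Interactive Dynamic Video",
]

for sample_title in sample_titles:
    print(classify_title(sample_title), "-", sample_title)
```

Lowercasing the title first means that variants such as "ASK HN" or "ask hn" end up in the same group.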

Let's see how many posts are stored in each list.

In [25]:
print("Number of posts in ask_posts: ", len(ask_posts))
print("Number of posts in show_posts: ", len(show_posts))
print("Number of posts in other_posts: ", len(other_posts))

Number of posts in ask_posts:  1744
Number of posts in show_posts:  1162
Number of posts in other_posts:  17194


In [59]:
total_ask_comments = 0

for p in ask_posts:
    total_ask_comments += int(p[4])
    
avg_ask_comments = round(total_ask_comments / len(ask_posts), 2)

print("\033[1;4m" + "Ask Posts" + "\033[0m")
print("Total Comments:", total_ask_comments)
print("Average:", avg_ask_comments)

Ask Posts
Total Comments: 24483
Average: 14.04


In [58]:
total_show_comments = 0

for p in show_posts:
    total_show_comments += int(p[4])
    
avg_show_comments = round(total_show_comments / len(show_posts),2)

print("\033[1;4m" + "Show Posts" + "\033[0m")
print("Total Comments:", total_show_comments)
print("Average:", avg_show_comments)

Show Posts
Total Comments: 11988
Average: 10.32
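
The two cells above repeat the same total-and-average logic. One way to avoid the duplication is sketched below, assuming a hypothetical helper named `average_comments` (not part of the notebook). The three sample rows come from the dataset preview, with the URLs left empty for brevity; only index 4 (`num_comments`) matters here.

```python
def average_comments(posts, comments_index=4):
    """Return (total, average) of the comment counts in a list of rows."""
    total = sum(int(row[comments_index]) for row in posts)
    # Guard against an empty list to avoid dividing by zero.
    average = round(total / len(posts), 2) if posts else 0.0
    return total, average


# Rows from the dataset preview (URLs omitted for brevity).
sample_posts = [
    ['12224879', 'Interactive Dynamic Video', '', '386', '52', 'ne0phyte', '8/4/2016 11:52'],
    ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', '', '39', '10', 'josep2', '1/26/2016 19:30'],
    ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", '', '2', '1', 'vezycash', '6/23/2016 22:20'],
]

sample_total, sample_average = average_comments(sample_posts)
print("Total Comments:", sample_total)
print("Average:", sample_average)
```

In the notebook, the same function could be called once with `ask_posts` and once with `show_posts`.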


Let's see how many comments each post in `ask_posts` and `show_posts` received.

In [38]:
for row in ask_posts:
    print("title: ", row[1])
    print("comments:", row[4])

for row in show_posts:
    print("title: ", row[1])
    print("comments:", row[4])

title:  Ask HN: How to improve my personal website?
comments: 6
title:  Ask HN: Am I the only one outraged by Twitter shutting down share counts?
comments: 29
title:  Ask HN: Aby recent changes to CSS that broke mobile?
comments: 1
title:  Ask HN: Looking for Employee #3 How do I do it?
comments: 3
title:  Ask HN: Someone offered to buy my browser extension from me. What now?
comments: 17
title:  Ask HN: Limiting CPU, memory, and I/O usage on a program for testing
comments: 1
title:  Ask HN: Which framework for a CRUD app in 2016?
comments: 4
title:  Ask HN: Enter market with a well-funded competitor?
comments: 1
title:  Ask HN: Do you use any realtime PaaS/framework and in case you so which one?
comments: 1
title:  Ask HN: Is there a home Dropbox-style solution?  (better explanation inside)
comments: 2
title:  Ask HN: How would you sell open source software?
comments: 7
title:  Ask HN: Chat-App based on Mail and PGP?
comments: 1
title:  Ask HN: What you wish you knew before launching 

comments: 1
title:  Ask HN: Best way to learn 'modern' C++?
comments: 7
title:  Ask HN: Cheap databases for new projects?
comments: 8
title:  Ask HN: Dell pricing of XPS 13 developer notebooks
comments: 4
title:  Ask HN: Worth working on a project that you have no direct relationship with?
comments: 5
title:  Ask HN: What are the best books on creating a programming language?
comments: 18
title:  Ask HN: My infosec auditor rejects open source. What now?
comments: 17
title:  Ask HN: Life coach idea, need your input
comments: 7
title:  Ask HN: How big effort do you think building a top quality OS would be?
comments: 3
title:  Ask HN: How to stop procrastination?
comments: 1
title:  Ask HN: Making the switch from physics to industry?
comments: 10
title:  Ask HN: How do I start an analytics consulting company?
comments: 4
title:  Ask HN: Why WordPress is still using SVN?
comments: 3
title:  Ask HN: Are there any auctioning algorithms?
comments: 1
title:  Ask HN: Can/should we resubmit link

comments: 8
title:  Ask HN: Has anyone here successfully applied to Stripe Atlas?
comments: 7
title:  Ask HN: Org charts and job titles
comments: 2
title:  Ask HN: What are possible drawbacks of using a company in Singapore?
comments: 2
title:  Ask HN: Did anyone's life ever gotten more comfortable after accepting funding?
comments: 53
title:  Ask HN: Am I just burnt out or should I find a new career?
comments: 9
title:  Ask HN: What research areas would you consider if you were to start a CS PhD?
comments: 7
title:  Ask HN: Why does Etsy have so many items titled DO Not PURCHASE?
comments: 2
title:  Ask HN: What do you wish someone would build?
comments: 477
title:  Ask HN: Why are e-books sold for the same price as printed books
comments: 3
title:  Ask HN: GitHub vs. Gitlab?
comments: 111
title:  Ask HN: I hate working alone, but I want to pursue my startup
comments: 18
title:  Ask HN: What are the recruitment tools/plugins used by startups?
comments: 4
title:  Ask HN: Should I use m

title:  Ask HN: How can we fight the pesticide issue in my state
comments: 1
title:  Ask HN: What's most important when creating software?
comments: 5
title:  Ask HN: Do recruiters follow up emails ever work?
comments: 5
title:  Ask HN: Suggestions for my master's degree dissertation
comments: 3
title:  Ask HN: How long did it take you to launch?
comments: 1
title:  Ask HN: What's the right subset of the C++ language?
comments: 2
title:  Ask HN: Showcasing side projects on resume
comments: 11
title:  Ask HN: So two tiny and speedy browsers today. which?
comments: 1
title:  Ask HN: Brexit  Should I vote in or out?
comments: 8
title:  Ask HN: Do you use an alternative keyboard layout like Dvorak?
comments: 7
title:  Ask HN: Templates for startup legal agreements
comments: 1
title:  Ask HN: How would you improve Twitter?
comments: 6
title:  Ask HN: Do Valley VC's ever lose money?
comments: 1
title:  Ask HN: How do you keep your (always online) Windows pc safe?
comments: 3
title:  Ask HN: 

comments: 2
title:  Ask HN: Have you faced any racism in selection process of ALLOW REMOTEcompanies
comments: 6
title:  Ask HN: Imposing a feedback on a school.
comments: 3
title:  Ask HN: What are your favorite scaffolding tools?
comments: 2
title:  Ask HN: How can a back-end engineer find a front-end engineer to collaborate?
comments: 2
title:  Ask HN: Best book on topography?
comments: 5
title:  Ask HN: Do you still use IRC?
comments: 19
title:  Ask HN: Can an idea be independent (at least enough), and for how long?
comments: 1
title:  Ask HN: Should I quit graduate school to avoid a bad advisor?
comments: 45
title:  Ask HN: What should I be aware of when open sourcing code from my company?
comments: 4
title:  Ask HN: How to stay fit?
comments: 10
title:  Ask HN: Moving to US from the UK
comments: 2
title:  Ask HN: Why HN website still uses center tag and tables?
comments: 2
title:  Ask HN: Showing unread comments in Chrome
comments: 1
title:  Ask HN: Any open AR library out there f

comments: 22
title:  Show HN: Something pointless I made
comments: 102
title:  Show HN: Shanhu.io, a programming playground powered by e8vm
comments: 1
title:  Show HN: Webscope  Easy way for web developers to communicate with Clients
comments: 3
title:  Show HN: GeoScreenshot  Easily test Geo-IP based web pages
comments: 9
title:  Show HN: Run with Mark (Runkeeper only)
comments: 3
title:  Show HN: Send an email from your shell to yourself without pain
comments: 1
title:  Show HN: Underline.js is like underscore.js but using modern ES7 syntax
comments: 1
title:  Show HN: Real-Time Stats for an iOS MMORPG Game in a Wordpress Front End
comments: 1
title:  Show HN: Bild  A collection of image processing functions in Go
comments: 2
title:  Show HN: Automated coach for programming interviews
comments: 3
title:  Show HN: /frink, a Slack app for Simpsons gifs
comments: 1
title:  Show HN: Vector Toy  Visualize and manipulate vector field functions
comments: 4
title:  Show HN: HTML5 and Canvas

comments: 1
title:  Show HN: Web app health directly on GitHub pull requests
comments: 1
title:  Show HN: OctaveWealth  Smart, flat-fee 401k
comments: 17
title:  Show HN: CloudParty  play games with friends in the cloud
comments: 1
title:  Show HN: A community run, aggregate news source
comments: 1
title:  Show HN: Measure ad blocking rate by device type and country
comments: 1
title:  Show HN: Xcode 8 Source Code Extension to Generate Swift Initializers
comments: 3
title:  Show HN: The Second Issue of Compelling Science Fiction
comments: 3
title:  Show HN: HammerJS for React Native
comments: 3
title:  Show HN: Kemal  Lightning Fast, Super Simple Crystal Web Framework
comments: 2
title:  Show HN: PouchDB Bindings for PureScript
comments: 3
title:  Show HN: Calories based food search
comments: 1
title:  Show HN: Node.js minimal Dataflow programming engine
comments: 1
title:  Show HN: MapHub  A Google My Maps Alternative, Based on OpenStreetMap Data
comments: 81
title:  Show HN: Commandc

comments: 3
title:  Show HN: Multi New Tab for Chrome  Your favorite sites on new tab pages
comments: 1
title:  Show HN: Rawcode.io  a place to find and store code snippets
comments: 2
title:  Show HN: NBLAS, Node C++ bindings to CBLAS
comments: 2
title:  Show HN: Search the web like a ninja
comments: 1
title:  Show HN: PHP serialization/deserialization library in Go
comments: 1
title:  Show HN: UTClock  Super Simple UTC Clock for Mac OS X Menu Bar
comments: 2
title:  Show HN: Chevrotain  Fault-Tolerant JavaScript Parsing DSL
comments: 16
title:  Show HN: Tumblestone  We're a team of 4 and just launched on Steam and consoles
comments: 1
title:  Show HN: How we static site
comments: 1
title:  Show HN: A foursquare client written entirely in kotlin
comments: 1
title:  Show HN: My site 15 years ago is still online on the original host
comments: 4
title:  Show HN: Another convenient web app for reading whoishiring
comments: 10
title:  Show HN: TVQue.com -a mailbox for TV. Send photos/video

comments: 50
title:  Show HN: The first issue of Compelling Science Fiction
comments: 70
title:  Show HN: Emoji-js
comments: 1
title:  Show HN: Wired Logic  a pixel-based logic simulator
comments: 15
title:  Show HN: Linux on a Poster
comments: 4
title:  Show HN: We transform Excel sheets into APIs to make complex computations easy
comments: 2
title:  Show HN: WikiPop  Endangered Species Population Tracking and Crowdfunding
comments: 1
title:  Show HN: Lufo, Last Used First Out  jQuery plugin to improve long select menus
comments: 2
title:  Show HN: Twitter Lists Redux, Chrome extension that makes lists more convenient
comments: 1
title:  Show HN: LawPatch  JQuery for Law Using Git
comments: 6
title:  Show HN: GPemu  A Chrome App to play SNES games
comments: 3
title:  Show HN: Micro web framework for low-resource systems  live example on ESP8266
comments: 40
title:  Show HN: Barter Hack  trade your technical skills for other people's
comments: 55
title:  Show HN: Mrrobot.io
comments: 1

The results suggest that, on average, Ask HN posts receive more comments than Show HN posts (14.04 versus 10.32 comments per post). The two groups differ in size, with 1744 Ask HN posts against 1162 Show HN posts, but an average already accounts for group size, so the gap is more likely influenced by a few heavily commented posts, such as the Ask HN post with 477 comments. It is also worth mentioning that no particular topic stands out as attracting more comments. This may be because many readers only engage with the topics they are really interested in.
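
Because a mean can be pulled upward by a handful of heavily commented posts, it can help to look at the median as well. The sketch below uses Python's `statistics` module on a short, hand-picked list of comment counts taken from the Ask HN output above (including the 477-comment post). It illustrates the idea only and is not a result for the full dataset.

```python
from statistics import mean, median

# Hand-picked comment counts from the Ask HN listing above, not the full data.
picked_counts = [6, 29, 1, 3, 17, 477]

picked_mean = round(mean(picked_counts), 2)
picked_median = median(picked_counts)

print("Mean:", picked_mean)
print("Median:", picked_median)
```

In this small sample, the single 477-comment post lifts the mean far above the median, which is why comparing both numbers for `ask_posts` and `show_posts` could be a useful extra check.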

We will now determine whether Ask HN posts created at a particular hour of the day tend to attract more comments.

In [78]:
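import datetime as dt

# Keep only the creation time and the comment count of each Ask HN post.
result_list = []
for p in ask_posts:
    result_list.append([p[6], p[4]])

counts_by_hour = {}    # hour -> number of posts created during that hour
comments_by_hour = {}  # hour -> total comments on those posts

for row in result_list:
    created_at = row[0]
    date = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = date.strftime("%H")
    comments = int(row[1])
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments

# List the hours that appear in the data.
print(sorted(counts_by_hour))
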
['00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']
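
The `if`/`else` bookkeeping in the loop above can be shortened with `collections.defaultdict`, which starts every missing key at zero. The sketch below is an alternative under stated assumptions, not the notebook's own method: the names `sample_results`, `sample_counts_by_hour`, and `sample_comments_by_hour` are hypothetical, the first five pairs come from the dataset preview, and the last pair is invented so that one hour appears twice.

```python
import datetime as dt
from collections import defaultdict

# [created_at, num_comments] pairs standing in for result_list.
sample_results = [
    ["8/4/2016 11:52", "52"],
    ["1/26/2016 19:30", "10"],
    ["6/23/2016 22:20", "1"],
    ["6/17/2016 0:01", "1"],
    ["9/30/2015 4:12", "2"],
    ["8/5/2016 11:05", "4"],  # hypothetical row, added to repeat hour 11
]

sample_counts_by_hour = defaultdict(int)    # hour -> number of posts
sample_comments_by_hour = defaultdict(int)  # hour -> total comments

for created_at, num_comments in sample_results:
    parsed = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M")
    hour = parsed.strftime("%H")  # zero-padded hour, e.g. "04"
    sample_counts_by_hour[hour] += 1
    sample_comments_by_hour[hour] += int(num_comments)

print(sorted(sample_counts_by_hour))
print(dict(sample_comments_by_hour))
```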


We will calculate the average number of comments per post for every hour of the day.

In [83]:
avg_by_hour = []
for h in counts_by_hour:
    avg_by_hour.append([h, round(comments_by_hour[h]/counts_by_hour[h],2)])
    
print(avg_by_hour)

[['09', 5.58], ['13', 14.74], ['10', 13.44], ['14', 13.23], ['16', 16.8], ['23', 7.99], ['12', 9.41], ['17', 11.46], ['15', 38.59], ['21', 16.01], ['20', 21.52], ['02', 23.81], ['18', 13.2], ['03', 7.8], ['05', 10.09], ['19', 10.8], ['01', 11.38], ['22', 6.75], ['08', 10.25], ['04', 7.17], ['00', 8.13], ['06', 9.02], ['07', 7.85], ['11', 11.05]]


In [112]:
swap_avg_by_hour = []

for r in avg_by_hour:
    swap_avg_by_hour.append([r[1],r[0]])
    
print(swap_avg_by_hour)

sorted_swap = sorted(swap_avg_by_hour, reverse = True)    

[[5.58, '09'], [14.74, '13'], [13.44, '10'], [13.23, '14'], [16.8, '16'], [7.99, '23'], [9.41, '12'], [11.46, '17'], [38.59, '15'], [16.01, '21'], [21.52, '20'], [23.81, '02'], [13.2, '18'], [7.8, '03'], [10.09, '05'], [10.8, '19'], [11.38, '01'], [6.75, '22'], [10.25, '08'], [7.17, '04'], [8.13, '00'], [9.02, '06'], [7.85, '07'], [11.05, '11']]
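
Swapping the columns is one way to sort by the average. Another option, sketched below, is to pass a `key` function to `sorted` so that the `[hour, average]` pairs can stay in their original order. The name `sample_avg_by_hour` is hypothetical, and its values are a subset copied from the `avg_by_hour` output above.

```python
# A few [hour, average] pairs copied from the avg_by_hour output.
sample_avg_by_hour = [
    ['09', 5.58], ['13', 14.74], ['16', 16.8], ['15', 38.59],
    ['21', 16.01], ['20', 21.52], ['02', 23.81],
]

# Sort by the second element (the average), highest first, and keep five.
top_five = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, avg in top_five:
    print(f"{hour}:00 : {avg:.2f} average comments per post.")
```

The `:.2f` format specifier always prints two decimal places, so an average such as 16.8 is displayed as 16.80.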


In [111]:
print("Top 5 Hours for Ask Posts Comments")

for row in sorted_swap[:5]:
    frmat = "{hour} : {value} average comments per post."
    time1 = dt.datetime.strptime(str(row[1]), "%H")
    time2 = time1.strftime("%H:%M")
    avg = row[0]
    display = frmat.format(hour = time2, value = avg)
    print(display)
    

Top 5 Hours for Ask Posts Comments
15:00 : 38.59 average comments per post.
02:00 : 23.81 average comments per post.
20:00 : 21.52 average comments per post.
16:00 : 16.8 average comments per post.
21:00 : 16.01 average comments per post.


As you can see, the hour with the highest chance of receiving comments is 3 p.m. Eastern Time (US), with an average of 38.59 comments per post. In my own time zone, however, that would be 3 a.m. in my country.
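
The time zone conversion can be checked with the standard `datetime` module. The sketch below rests on two assumptions that are not stated in the notebook: that US Eastern daylight time (UTC-4) applies, and that the reader is in a hypothetical UTC+8 zone. The date is arbitrary; only the 15:00 hour matters.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: Eastern daylight time (UTC-4) and a hypothetical reader at UTC+8.
eastern_daylight = timezone(timedelta(hours=-4), "EDT")
reader_zone = timezone(timedelta(hours=8), "UTC+8")

# Arbitrary date; the peak hour from the analysis is 15:00 Eastern.
peak_eastern = datetime(2016, 8, 4, 15, 0, tzinfo=eastern_daylight)
peak_local = peak_eastern.astimezone(reader_zone)

print(peak_local.strftime("%H:%M"))
```

Fixed offsets ignore daylight saving changes. For named zones, `zoneinfo.ZoneInfo` (Python 3.9 and later) would be the more robust choice.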

# END

# Other Questions to Solve
* Determine if show or ask posts receive more points on average.
* Determine if posts created at a certain time are more likely to receive more points.
* Compare your results to the average number of comments and points other posts receive.
* Use Dataquest's data science project style guide to format your project.
