## Analyze Hacker News "Ask HN" and "Show HN" posts

### Introduction

1. The purpose of this project is to analyze posts from [Hacker News](https://news.ycombinator.com/), the site run by Y Combinator.
2. The analysis uses a readily available, filtered-down dataset from [Kaggle.com](https://www.kaggle.com/hacker-news/hacker-news-posts) and focuses on posts whose titles begin with `Ask HN` or `Show HN`.
3. Determine whether `Ask HN` or `Show HN` posts receive more comments on average.
4. Determine whether posts created at certain times of day receive more comments on average.

In [1]:
```python
# 1. Import the hacker_news.csv dataset as a list of lists
from csv import reader

# The with-block closes the file once all rows have been read
with open('hacker_news.csv', newline='') as opened_file:
    hn = list(reader(opened_file))
hn[:5]
```

```
[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]
```

In [2]:
```python
# 2. Separate the header row from the data rows in hn
headers = hn[0]
hn = hn[1:]
print(headers)
print(hn[:5])
```

```
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
```
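Loading the whole file and then slicing off row 0 works, but the header can also be separated while reading. A minimal sketch, assuming only the standard `csv` module: calling `next()` on the reader consumes the header row, so the data rows never contain it. `load_rows` is a hypothetical helper name, and the in-memory sample reuses the first data row shown above instead of reading `hacker_news.csv`.

```python
import csv
import io

def load_rows(file_obj):
    """Return (header, rows) from an open CSV file object."""
    csv_rows = csv.reader(file_obj)
    header = next(csv_rows)        # first row holds the column names
    return header, list(csv_rows)  # everything after it is data

# Illustrative in-memory sample using the first data row shown above
sample = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)
header, rows = load_rows(sample)
print(header)
print(rows)
```

Against the real file this would be `with open('hacker_news.csv', newline='') as f: headers, hn = load_rows(f)`; opening with `newline=''` follows the `csv` module documentation's recommendation for files passed to a reader.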


### Identify posts that begin with Ask HN or Show HN and put them into separate lists
1. Create three separate lists: one for `Ask HN` posts, one for `Show HN` posts and one for all other posts
2. Loop through the hn dataset, check whether each title begins with `Ask HN` or `Show HN` (ignoring case), and append the row to the matching list
3. Display the total number of posts in each list
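The classification steps above can also be expressed as a small function. This is a hypothetical refactor rather than the notebook's own code: `classify_title` and `split_posts` are illustrative names, the title is assumed to sit at index 1 as in the hn rows, and the sample titles are made up.

```python
def classify_title(title):
    """Return 'ask', 'show' or 'other' from the title prefix, ignoring case."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

def split_posts(rows, title_index=1):
    """Group rows into ask/show/other lists using the title column."""
    groups = {'ask': [], 'show': [], 'other': []}
    for row in rows:
        groups[classify_title(row[title_index])].append(row)
    return groups

# Hypothetical rows: only the title column (index 1) matters here
sample_rows = [
    ['1', 'Ask HN: How do you learn a new codebase?'],
    ['2', 'Show HN: A tiny CSV viewer'],
    ['3', 'Interactive Dynamic Video'],
    ['4', 'ask hn: lowercase prefixes count too'],
]
groups = split_posts(sample_rows)
print({name: len(posts) for name, posts in groups.items()})
```

Keeping the prefix test in one function means the same rule applies everywhere the posts are split, and it can be checked on single titles in isolation.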

In [7]:
```python
# Create three lists called ask_posts, show_posts and other_posts
ask_posts = []
show_posts = []
other_posts = []

# Loop through each row in hn and sort it by its (lowercased) title prefix
for eachrow in hn:
    title = eachrow[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(eachrow)
    elif title.startswith('show hn'):
        show_posts.append(eachrow)
    else:
        other_posts.append(eachrow)

# Check the number of posts in ask_posts, show_posts and other_posts
print('Number of posts in ask posts: ' + str(len(ask_posts)))
print('Number of posts in show posts: ' + str(len(show_posts)))
print('Number of posts in other posts: ' + str(len(other_posts)))
```

```
Number of posts in ask posts: 1744
Number of posts in show posts: 1162
Number of posts in other posts: 17194
```


### Analysis of `Ask HN` and `Show HN` posts
Compare the average number of comments received by `Ask HN` posts with the average received by `Show HN` posts.
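The averaging repeats the same loop for each list of posts. A minimal sketch of a reusable helper, assuming `num_comments` sits at index 4 as in the header row; `average_comments` is a hypothetical name and the sample rows are invented for illustration.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column; 0.0 for an empty list."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError when a group has no posts
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Hypothetical rows in the hn column layout (num_comments at index 4)
sample_ask = [
    ['1', 'Ask HN: a', '', '5', '10', 'user1', '1/1/2016 9:00'],
    ['2', 'Ask HN: b', '', '3', '20', 'user2', '1/1/2016 15:00'],
]
print(average_comments(sample_ask))
print(average_comments([]))
```

With the real lists this would read `avg_ask_comments = average_comments(ask_posts)` and `avg_show_comments = average_comments(show_posts)`.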

In [9]:
```python
# Sum the num_comments column (index 4) of ask_posts and calculate the average
total_ask_comments = 0
for eachrow in ask_posts:
    total_ask_comments += int(eachrow[4])
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

# Sum the num_comments column of show_posts and calculate the average
total_show_comments = 0
for eachrow in show_posts:
    total_show_comments += int(eachrow[4])
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)
```

```
14.038417431192661
10.31669535283993
```


### Findings
1. On average, `Ask HN` posts received about 14 comments per post, while `Show HN` posts received about 10.
2. `Ask HN` posts therefore receive roughly 3.7 more comments per post than `Show HN` posts.
3. Because ask posts receive more comments, the rest of this analysis focuses on them.
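The size of the gap can be checked directly from the two printed averages. This sketch only redoes that arithmetic; it hard-codes the values printed by the averaging cell rather than recomputing them from the data.

```python
# Averages as printed by the averaging cell
avg_ask_comments = 14.038417431192661
avg_show_comments = 10.31669535283993

difference = avg_ask_comments - avg_show_comments  # absolute gap in comments
ratio = avg_ask_comments / avg_show_comments        # relative size of the gap
print(f'Ask HN posts get {difference:.2f} more comments on average')
print(f'That is {ratio:.2f} times the Show HN average')
```

This gives a gap of about 3.72 comments, so `Ask HN` posts attract roughly 1.36 times as many comments as `Show HN` posts (about 36% more).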

#### Analysis of ask posts
1. This analysis is broken down into two phases.
2. In the first phase, we find the number of ask posts created in each hour of the day and the number of comments those posts received.
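Grouping posts by hour requires parsing `created_at`. Judging from the sample rows, timestamps look like `8/4/2016 11:52` (month/day/year, 24-hour clock, no zero padding); the format string below is an assumption based on those rows, and `post_hour` is a hypothetical helper. `datetime.strptime` accepts the unpadded month, day and hour values for `%m`, `%d` and `%H`.

```python
import datetime as dt

def post_hour(created_at, fmt='%m/%d/%Y %H:%M'):
    """Return the hour (0-23) of a created_at string such as '8/4/2016 11:52'."""
    return dt.datetime.strptime(created_at, fmt).hour

for stamp in ['8/4/2016 11:52', '1/26/2016 19:30', '6/17/2016 0:01']:
    print(stamp, '->', post_hour(stamp))
```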

In [None]:
```python
# Import the datetime module as dt (used to parse the created_at timestamps)
import datetime as dt

# result_list stores [created_at, num_comments] for every ask post as a list of lists
result_list = []
for eachrow in ask_posts:
    each_row_list = [eachrow[6], int(eachrow[4])]  # created_at, num_comments
    result_list.append(each_row_list)

# Create two empty dictionaries: post counts and comment totals keyed by hour
counts_by_hour = {}
comments_by_hour = {}
```
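Once `result_list` holds `[created_at, num_comments]` pairs, the two dictionaries can be filled in one pass, and dividing one by the other gives the average number of comments per post for each hour, which is the question raised in the introduction. This is a sketch, not the author's code: `comments_per_hour` is a hypothetical function, timestamps are assumed to look like `8/4/2016 11:52`, hours are keyed as zero-padded strings (`'00'` to `'23'`), and the sample input is illustrative.

```python
import datetime as dt

def comments_per_hour(result_list, fmt='%m/%d/%Y %H:%M'):
    """Count posts and comments per hour, then average comments per post."""
    counts_by_hour = {}
    comments_by_hour = {}
    for created_at, num_comments in result_list:
        # Zero-padded hour string, e.g. '00' ... '23'
        hour = dt.datetime.strptime(created_at, fmt).strftime('%H')
        counts_by_hour[hour] = counts_by_hour.get(hour, 0) + 1
        comments_by_hour[hour] = comments_by_hour.get(hour, 0) + num_comments
    avg_by_hour = {hour: comments_by_hour[hour] / counts_by_hour[hour]
                   for hour in counts_by_hour}
    return counts_by_hour, comments_by_hour, avg_by_hour

# Illustrative input in the same [created_at, num_comments] shape as result_list
sample = [['8/4/2016 11:52', 52], ['1/26/2016 11:30', 10], ['6/17/2016 0:01', 1]]
counts, comments, averages = comments_per_hour(sample)

# Hours sorted by average comments per post, highest first
for hour, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f'{hour}:00 -> {avg:.2f} average comments per post')
```

Using `dict.get(hour, 0)` avoids a separate membership check before incrementing, and keeping zero-padded keys means the hours also sort correctly as strings.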
