### Introduction
![Hacker news Logo](hacker_news.jpg)
Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

This project is specifically interested in posts with titles that begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question. Likewise, users submit Show HN posts to show the Hacker News community a project, product, or just something interesting. The project will compare these two types of posts to determine the following:

Do Ask HN or Show HN receive more comments on average?
Do posts created at a certain time receive more comments on average?

The dataset used has 20,000 rows which is reduced from 300,000 rows by removing all submissions that didn't receive any comments and then randomly sampling from the remaining submissions. The following are key column(variable) names which were considered during this analysis:

<li> id: the unique identifier from Hacker News for the post
<li> title: the title of the post
<li> url: the URL that the posts links to, if the post has a URL
<li> num_points: the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
<li> num_comments: the number of comments on the post
<li>author: the username of the person who submitted the post
<li> created_at: the date and time of the post's submission
    


In [1]:
from csv import reader
import datetime as dt

#Read the hacker_news.csv file in as a list of lists
opened_file = open('hacker_news.csv')
read_file = reader(opened_file)
hn = list(read_file)
hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [2]:
headers = hn[0]
hn = hn[1:]
print(headers)
hn[:5]

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

In [3]:
ask_posts =[]
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    lowercase_title = title.lower()  # Convert title to lowercase for case-insensitive comparison
    
    # Categorize posts based on title
    if lowercase_title.startswith("ask hn"):
        ask_posts.append(row)
    elif lowercase_title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Print the number of posts in each category
print("Number of Ask HN posts:", len(ask_posts))
print("Number of Show HN posts:", len(show_posts))
print("Number of Other posts:", len(other_posts))

Number of Ask HN posts: 1744
Number of Show HN posts: 1162
Number of Other posts: 17194


In [4]:
total_ask_comments = 0
for rw in ask_posts:
    num_comments = int(row[4]) 
    total_ask_comments += num_comments
    
avg_ask_comments = total_ask_comments / len(ask_posts)
print('Average number of comments on Ask HN posts:',avg_ask_comments)

Average number of comments on Ask HN posts: 58.0


In [5]:
total_show_comments = 0
for post in show_posts:
    num_comments = int(post[4])  # Convert the value to an integer
    total_show_comments += num_comments

# Calculate the average number of comments on show posts
avg_show_comments = total_show_comments / len(show_posts)
print("Average number of comments on Show HN posts:", avg_show_comments)

Average number of comments on Show HN posts: 10.31669535283993


Based on the analysis of the dataset, the Average number of comments on Ask HN posts is 58.0, while the Average number of comments on Show HN posts is 10.3. This shows that on the average, Ask HN posts are commented on 5.8 times the Show HN posts.

In [7]:
result_list = []
for post in ask_posts:
    created_at = post[6]
    num_comments = int(row[4]) 
    result_list.append([created_at, num_comments])
print(result_list[:10])

[['8/16/2016 9:55', 58], ['11/22/2015 13:43', 58], ['5/2/2016 10:14', 58], ['8/2/2016 14:20', 58], ['10/15/2015 16:38', 58], ['9/26/2015 23:23', 58], ['4/22/2016 12:24', 58], ['11/16/2015 9:22', 58], ['2/24/2016 17:57', 58], ['6/4/2016 17:17', 58]]


In [8]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_str = row[0]
    comment_count = row[1]
    
    # Parse the date string and extract the hour
    date_obj = dt.datetime.strptime(date_str, "%m/%d/%Y %H:%M")
    hour = date_obj.strftime("%H")
    
    # Update counts and comments dictionaries
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment_count
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment_count

In [9]:
avg_by_hour = []

for hour in counts_by_hour:
    average_comments = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, average_comments])


In [10]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

# Print the list with swapped columns
print("Swap Avg by Hour:", swap_avg_by_hour)

# Sort the list in descending order of average comments
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

# Print the top 5 hours for ask posts comments
print("Top 5 Hours for Ask Posts Comments:")
for avg, hour in sorted_swap[:5]:
    hour_obj = dt.datetime.strptime(hour, "%H")
    formatted_hour = hour_obj.strftime("%H:%M")
    print("{}: {:.2f} average comments per post".format(formatted_hour, avg))

Swap Avg by Hour: [[58.0, '09'], [58.0, '13'], [58.0, '10'], [58.0, '14'], [58.0, '16'], [58.0, '23'], [58.0, '12'], [58.0, '17'], [58.0, '15'], [58.0, '21'], [58.0, '20'], [58.0, '02'], [58.0, '18'], [58.0, '03'], [58.0, '05'], [58.0, '19'], [58.0, '01'], [58.0, '22'], [58.0, '08'], [58.0, '04'], [58.0, '00'], [58.0, '06'], [58.0, '07'], [58.0, '11']]
Top 5 Hours for Ask Posts Comments:
23:00: 58.00 average comments per post
22:00: 58.00 average comments per post
21:00: 58.00 average comments per post
20:00: 58.00 average comments per post
19:00: 58.00 average comments per post
