# Hacker News Posts

This project analyzes a data set of Hacker News posts, looking specifically at posts whose titles begin with "Ask HN" or "Show HN".

We are going to compare these two types of posts to determine the following:
  - Do "Ask HN" or "Show HN" posts receive more comments on average?
  - Do posts created at a certain time receive more comments on average?
  

Below are descriptions of the columns:

  - id: The unique identifier from Hacker News for the post
  - title: The title of the post
  - url: The URL that the post links to, if the post has a URL
  - num_points: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes
  - num_comments: The number of comments that were made on the post
  - author: The username of the person who submitted the post
  - created_at: The date and time at which the post was submitted
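
Every value in the CSV file is read as a string, so the numeric and date columns need converting before any arithmetic. The following is a minimal sketch, assuming the column order listed above; the helper name `parse_row` is hypothetical and not part of the original notebook:

```python
import datetime as dt

COLUMNS = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

def parse_row(row):
    """Turn one raw CSV row (a list of strings) into a dict with typed values."""
    record = dict(zip(COLUMNS, row))
    record['num_points'] = int(record['num_points'])
    record['num_comments'] = int(record['num_comments'])
    # Dates in this data set look like '8/4/2016 11:52'
    record['created_at'] = dt.datetime.strptime(record['created_at'], '%m/%d/%Y %H:%M')
    return record

sample = ['12224879', 'Interactive Dynamic Video',
          'http://www.interactivedynamicvideo.com/', '386', '52',
          'ne0phyte', '8/4/2016 11:52']
post = parse_row(sample)
print(post['num_comments'], post['created_at'].hour)
```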

### 1: Introduction - Start

In [1]:
from csv import reader

# Use the open() function to open the csv
# file hacker_news.csv
opened_file = open('hacker_news.csv')

# Use the reader() function to read the 
# opened file.
read_file = reader(opened_file)

# Use the list() function to convert the 
# read file into a list of lists format.
hn = list(read_file)

# close the opened file
opened_file.close()
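
The same loading step can also be written with a `with` block, which closes the file automatically even if reading fails partway. This is a sketch rather than the notebook's method: the helper name `load_rows` and the explicit `encoding='utf-8'` are assumptions, and the demo writes a two-line temporary file so that it runs without `hacker_news.csv`:

```python
import csv
import os
import tempfile

def load_rows(path):
    """Read a CSV file into a list of lists, closing the file automatically."""
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.reader(f))

# Write a tiny stand-in file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 newline='', encoding='utf-8') as tmp:
    tmp.write('id,title\n')
    tmp.write('1,"Ask HN: Example, with a comma"\n')
    tmp_path = tmp.name

rows = load_rows(tmp_path)
os.remove(tmp_path)
print(rows)
```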

In [2]:
# Display the first 5 rows of hn
# (the header row plus 4 data rows)
hn[:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

### 1: Introduction - End

### 2: Removing Headers from a List of Lists - Start

In [3]:
# Extract the first row of data and assign it
# to the variable "headers"
headers = hn[0]

# Display headers
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [4]:
# Remove the first row (the headers) from hn
hn = hn[1:]
# Display the first 5 records again to
# verify that the headers are not included
hn[:5]

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]
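
The header and the data rows can also be separated in a single step with extended unpacking. A small sketch on a two-row sample (the names `sample`, `header`, and `rows` are illustrative only):

```python
sample = [
    ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
    ['12224879', 'Interactive Dynamic Video',
     'http://www.interactivedynamicvideo.com/', '386', '52',
     'ne0phyte', '8/4/2016 11:52'],
]

# The first element goes to `header`; everything else is collected in `rows`
header, *rows = sample
print(header[4])
print(len(rows))
```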

### 2: Removing Headers from a List of Lists - End

### 3: Extracting Ask HN and Show HN Posts - Start

In [5]:
# create 3 empty lists 
ask_posts = []
show_posts = []
other_posts = []

In [6]:
# Loop through the rows in hn and place each post in exactly
# one list, based on how its lowercased title starts
for row in hn:
    title = row[1].lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [7]:
# Check the number of posts in each list
template = 'Number of titles in {list_name}: {num}'

print(template.format(list_name='ask_posts', num=len(ask_posts)))
print(template.format(list_name='show_posts', num=len(show_posts)))
print(template.format(list_name='other_posts', num=len(other_posts)))


Number of titles in ask_posts: 1744
Number of titles in show_posts: 1162
Number of titles in other_posts: 17194
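
The categorization logic can be checked on a small sample. The sketch below assumes each post belongs to exactly one category, so it uses a single `if`/`elif`/`else` chain; the function name `categorize` and the sample titles are made up for illustration:

```python
def categorize(title):
    """Return 'ask', 'show', or 'other' based on how the title starts."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    elif lowered.startswith('show hn'):
        return 'show'
    else:
        return 'other'

titles = [
    'Ask HN: How do you learn new languages?',
    'Show HN: A tiny CSV viewer',
    'Interactive Dynamic Video',
    'ASK HN: Does capitalization matter?',
]

category_counts = {'ask': 0, 'show': 0, 'other': 0}
for title in titles:
    category_counts[categorize(title)] += 1

# Every title lands in exactly one bucket, so the counts sum to len(titles)
print(category_counts)
```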


In [8]:
#print(ask_posts)

### 3: Extracting Ask HN and Show HN Posts - End

### 4: Calculating the Average Number of Comments for Ask HN and Show HN Posts - Start

In [9]:
total_ask_comments = 0
total_show_comments = 0

tot_template = 'Total number of comments for {list_name}: {num}'
avg_template = 'Average number of comments for {list_name}: {num}'

# Sum the comments (column index 4) across all Ask HN posts
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments

print(tot_template.format(list_name='ask_posts', num=total_ask_comments))
print(avg_template.format(list_name='ask_posts', num=total_ask_comments/len(ask_posts)))

# Repeat the same calculation for Show HN posts
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments

print(tot_template.format(list_name='show_posts', num=total_show_comments))
print(avg_template.format(list_name='show_posts', num=total_show_comments/len(show_posts)))



Total number of comments for ask_posts: 24483
Average number of comments for ask_posts: 14.038417431192661
Total number of comments for show_posts: 11988
Average number of comments for show_posts: 10.31669535283993


On average, Ask HN posts receive about 14 comments per post, while Show HN posts receive about 10. Ask HN posts therefore attract more comments, so the rest of the analysis focuses on them.
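
The two loops differ only in the list they iterate over, so the calculation could be factored into one function. A sketch, assuming `num_comments` sits at index 4 as in the column list; `average_comments` is a hypothetical helper and the sample rows are invented for the demo:

```python
def average_comments(posts, comments_index=4):
    """Mean number of comments per post; returns 0.0 for an empty list."""
    if not posts:
        return 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)

# Invented rows in the same column order as the data set
sample_posts = [
    ['1', 'Ask HN: first example', '', '3', '6', 'user_a', '8/16/2016 9:55'],
    ['2', 'Ask HN: second example', '', '5', '29', 'user_b', '11/22/2015 13:43'],
    ['3', 'Ask HN: third example', '', '1', '1', 'user_c', '5/2/2016 10:14'],
]

print(average_comments(sample_posts))
print(average_comments([]))
```

Guarding against an empty list avoids a `ZeroDivisionError` if a category happens to contain no posts.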

### 4: Calculating the Average Number of Comments for Ask HN and Show HN Posts - End


### 5: Finding the Amount of Ask Posts and Comments by Hour Created - Start

In [10]:
import datetime as dt

# Create an empty list and append [created_at, num_comments]
# for each Ask HN post

result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = dt.datetime.strftime(date, "%H")
    
    if hour in counts_by_hour:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    else:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]

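# date and hour still hold the values from the last row processed
print(date)
print(hour)
# Number of Ask HN posts and total comments, keyed by hour created
print(counts_by_hour)
print(comments_by_hour)
print(result_list[:5])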

2016-05-21 09:22:00
09
{'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
{'09': 251, '13': 1253, '10': 793, '14': 1416, '16': 1814, '23': 543, '12': 687, '17': 1146, '15': 4477, '21': 1745, '20': 1722, '02': 1381, '18': 1439, '03': 421, '05': 464, '19': 1188, '01': 683, '22': 479, '08': 492, '04': 337, '00': 447, '06': 397, '07': 267, '11': 641}
[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]
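
The `if`/`else` bookkeeping can be avoided with `collections.defaultdict`, which supplies 0 for a key that has not been seen yet. A sketch using the five sample rows printed above; this is an alternative, not the approach used in the notebook, and the `sample_` names are illustrative:

```python
import datetime as dt
from collections import defaultdict

sample = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1],
          ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]

sample_counts = defaultdict(int)
sample_comments = defaultdict(int)

for created_at, num_comments in sample:
    # Parse the timestamp, then keep only the zero-padded hour
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    sample_counts[hour] += 1
    sample_comments[hour] += num_comments

print(dict(sample_counts))
print(dict(sample_comments))
```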


### 5: Finding the Amount of Ask Posts and Comments by Hour Created - End

### 6: Calculating the Average Number of Comments for Ask HN Posts by Hour - Start

In [11]:
 
# Calculate the average number of comments per post for each hour.
# Each entry has the form [hour, counts, comments, avg]
avg_by_hour = []

for hour in counts_by_hour:
    # Number of Ask HN posts created during this hour
    counts = counts_by_hour[hour]
    # Total comments on those posts (same key in comments_by_hour)
    comments = comments_by_hour[hour]
    # Both values are already ints, so plain division gives the average
    avg = comments / counts
    avg_by_hour.append([hour, counts, comments, avg])

avg_by_hour

[['09', 45, 251, 5.5777777777777775],
 ['13', 85, 1253, 14.741176470588234],
 ['10', 59, 793, 13.440677966101696],
 ['14', 107, 1416, 13.233644859813085],
 ['16', 108, 1814, 16.796296296296298],
 ['23', 68, 543, 7.985294117647059],
 ['12', 73, 687, 9.41095890410959],
 ['17', 100, 1146, 11.46],
 ['15', 116, 4477, 38.5948275862069],
 ['21', 109, 1745, 16.009174311926607],
 ['20', 80, 1722, 21.525],
 ['02', 58, 1381, 23.810344827586206],
 ['18', 109, 1439, 13.20183486238532],
 ['03', 54, 421, 7.796296296296297],
 ['05', 46, 464, 10.08695652173913],
 ['19', 110, 1188, 10.8],
 ['01', 60, 683, 11.383333333333333],
 ['22', 71, 479, 6.746478873239437],
 ['08', 48, 492, 10.25],
 ['04', 47, 337, 7.170212765957447],
 ['00', 55, 447, 8.127272727272727],
 ['06', 44, 397, 9.022727272727273],
 ['07', 34, 267, 7.852941176470588],
 ['11', 58, 641, 11.051724137931034]]
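
Since both dictionaries share the same keys, the averages could also be built in a single list comprehension. A sketch using three of the hourly totals shown above (the `sample_` names are illustrative):

```python
sample_counts = {'15': 116, '02': 58, '20': 80}
sample_comments = {'15': 4477, '02': 1381, '20': 1722}

# One [hour, average] pair per hour, in the dictionary's insertion order
avg_sample = [[hour, sample_comments[hour] / sample_counts[hour]]
              for hour in sample_counts]

for hour, avg in avg_sample:
    print(hour, avg)
```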

### 6: Calculating the Average Number of Comments for Ask HN Posts by Hour - End

### 7: Sorting and Printing Values from a List of Lists - Start

In [22]:
# Put the average first in each pair so that sorted() orders by it
swap_avg_by_hour = []

for rec in avg_by_hour:
    hour = rec[0]
    avg = rec[3]
    swap_avg_by_hour.append([avg, hour])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

template = "{time}: {avg:.2f} average comments per post"

print("Top 5 Hours for Ask Posts Comments")

for avg, hour in sorted_swap[:5]:
    # Convert the two-digit hour string to HH:MM for display
    time = dt.datetime.strptime(hour, "%H")
    hr_out = time.strftime("%H:%M")
    print(template.format(time=hr_out, avg=avg))
    


Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


Ask HN posts created during the 15:00 (3:00 PM) hour receive the most comments on average, at 38.59 per post, well ahead of the next best hour, 02:00, at 23.81.
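
Swapping the columns is one way to sort by average; `sorted()` also accepts a `key` function, which avoids building a second list. A sketch on four rows copied from `avg_by_hour` above:

```python
from operator import itemgetter

sample = [
    ['09', 45, 251, 5.5777777777777775],
    ['15', 116, 4477, 38.5948275862069],
    ['20', 80, 1722, 21.525],
    ['02', 58, 1381, 23.810344827586206],
]

# Sort by the average (index 3), highest first
top = sorted(sample, key=itemgetter(3), reverse=True)

for hour, count, comments, avg in top[:3]:
    print(f"{hour}:00: {avg:.2f} average comments per post")
```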

### 7: Sorting and Printing Values from a List of Lists - End