# Guided Project: Exploring Hacker News Posts

This guided project will bring the following skills together for real-world practice:

1) How to work with strings  
2) Object-oriented programming  
3) Dates and times  

A [downsampled dataset](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts) from `Hacker News` is used in this project. `Hacker News` is an extremely popular website in technology and startup circles, and posts that make it to the top of the listings can get hundreds of thousands of visitors.

We're specifically interested in posts with titles that begin with either `Ask HN` or `Show HN`. Users submit `Ask HN` posts to ask the `Hacker News` community a specific question, while users submit `Show HN` posts to show the `Hacker News` community a project, product, or just something interesting.

## Description of data columns

| Column       | Description |
|:-------------|:------------|
| id           | the unique identifier from Hacker News for the post |
| title        | the title of the post |
| url          | the URL that the post links to |
| num_points   | the number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes |
| num_comments | the number of comments on the post |
| author       | the name of the account that made the post |
| created_at   | the date and time the post was made (the time zone is Eastern Time in the US) |
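The `created_at` values are written as `month/day/year hour:minute` without zero padding (for example `8/4/2016 11:52` or `6/17/2016 0:01`). A minimal standard-library check that `datetime.strptime` with the `'%m/%d/%Y %H:%M'` format used later in this project accepts such strings; the sample values are copied from rows shown further down:

```python
from datetime import datetime

# Format string used throughout this project for the created_at column
DATE_FORMAT = '%m/%d/%Y %H:%M'

# Sample created_at values from the dataset; note the missing zero padding
samples = ['8/4/2016 11:52', '6/17/2016 0:01', '11/22/2015 13:43']

for raw in samples:
    parsed = datetime.strptime(raw, DATE_FORMAT)
    # strftime('%H') gives the zero-padded hour string used later as a dictionary key
    print(raw, '->', parsed.isoformat(), '| hour key:', parsed.strftime('%H'))
```

`strptime` accepts one- or two-digit values for `%m`, `%d` and `%H`, so no preprocessing of the strings is needed.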

## Project Objective

This project will compare the two types of posts, `Ask HN` and `Show HN`, to determine:

- Do `Ask HN` or `Show HN` posts receive more comments on average?
- Do posts created at a certain time receive more comments on average?

## Read in the CSV file as a list of lists

In [1]:
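from csv import reader

# A with block closes the file automatically once it has been read;
# newline='' is what the csv module documentation recommends for open()
with open('hacker_news.csv', newline='') as opened_file:
    read_file = reader(opened_file)
    hn = list(read_file)

print(hn[:5])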

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
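Reading the file into a list of lists means every column is later accessed by a numeric index (`row[4]`, `row[6]`). As an alternative sketch, not the approach this project uses, `csv.DictReader` keys each row by the header names and consumes the header row itself. To keep the example self-contained, an in-memory `io.StringIO` sample (two rows copied from the output above) stands in for `hacker_news.csv`:

```python
import csv
import io

# In-memory stand-in for hacker_news.csv: the header plus two rows from the output above
SAMPLE_CSV = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01
"""

# The with block closes the file-like object; DictReader uses the first row as keys
with io.StringIO(SAMPLE_CSV) as f:
    rows = list(csv.DictReader(f))

for row in rows:
    # Columns are accessed by name; values are still strings and need int() for arithmetic
    print(row['title'], int(row['num_comments']))
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open('hacker_news.csv', newline='')`.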


### 1. Extract the first row of data as headers

In [2]:
# Extract headers from data
headers = hn[0]

print('headers:')
print(headers)

headers:
['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']
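The later cells hard-code column positions such as `4` for `num_comments` and `6` for `created_at`. A small sketch, using the `headers` list printed above, of looking those positions up by name instead (the `col` dictionary is a hypothetical helper, not part of the original project):

```python
# Header row as printed above
headers = ['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

# Map each column name to its position in a row
col = {name: index for index, name in enumerate(headers)}

# A data row copied from the output above
row = ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/',
       '386', '52', 'ne0phyte', '8/4/2016 11:52']

# These are the positions the project later uses as row[4] and row[6]
print(col['num_comments'], col['created_at'])
print(int(row[col['num_comments']]), row[col['created_at']])
```

If the column order in the file ever changed, only `headers` would need to be re-read; expressions like `row[col['num_comments']]` would keep working.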


### 2. Remove headers

In [3]:
# Remove the header row from hn
hn = hn[1:]

print('First 5 rows of hn data without headers:')
print(hn[:5])

First 5 rows of hn data without headers:
[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]


## Find the posts with titles beginning with `Ask HN` or `Show HN`

In [4]:
# Create 3 empty lists
ask_posts = []
show_posts = []
other_posts = []

In [5]:
# Separate the posts by title prefix (case-insensitive)
for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
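The loop above can also be expressed as a small function that returns a label for each title. This is a sketch with a hypothetical `post_type` helper; like the original loop, it lowercases the title before calling `str.startswith`, so `Ask HN:` and `ask hn:` are treated the same. The titles are copied from the dataset output in this project:

```python
def post_type(title):
    """Return 'ask', 'show' or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

titles = [
    'Ask HN: How to improve my personal website?',
    'Show HN: Something pointless I made',
    'Interactive Dynamic Video',
]

# Group titles into lists keyed by post type
groups = {'ask': [], 'show': [], 'other': []}
for title in titles:
    groups[post_type(title)].append(title)

print({kind: len(items) for kind, items in groups.items()})
```

Keeping the classification in one function makes it easy to test on its own and to reuse if other prefixes are added later.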

In [6]:
# Count the number of posts in each list
length_ask_posts = len(ask_posts)
print(f'Number of "Ask HN" posts: {length_ask_posts}')

print()

length_show_posts = len(show_posts)
print(f'Number of "Show HN" posts: {length_show_posts}')

print()

length_other_posts = len(other_posts)
print(f'Number of other posts: {length_other_posts}')

Number of "Ask HN" posts: 1744

Number of "Show HN" posts: 1162

Number of other posts: 17194
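As a quick worked example, the three counts above add up to 20,100 posts, so `Ask HN` and `Show HN` posts together make up only about 14.5% of the sample. A short sketch of that arithmetic, with the counts copied from the output above:

```python
# Post counts printed by the cell above
counts = {'Ask HN': 1744, 'Show HN': 1162, 'other': 17194}
total = sum(counts.values())

for kind, n in counts.items():
    # :.1% multiplies by 100 and appends a percent sign
    print(f'{kind}: {n} posts ({n / total:.1%} of {total})')
```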


## Display the first 5 `Ask HN` and `Show HN` posts

In [7]:
# Display first 5 rows of data for 'ask_posts' list
print('First 5 rows from ask_posts:')
print(ask_posts[:5])

First 5 rows from ask_posts:
[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'], ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'], ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38']]


In [8]:
# Display first 5 rows of data for 'show_posts' list
print('First 5 rows from show_posts:')
print(show_posts[:5])

First 5 rows from show_posts:
[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05'], ['12178806', 'Show HN: Webscope  Easy way for web developers to communicate with Clients', 'http://webscopeapp.com', '3', '3', 'fastbrick', '7/28/2016 7:11'], ['10872799', 'Show HN: GeoScreenshot  Easily test Geo-IP based web pages', 'https://www.geoscreenshot.com/', '1', '9', 'kpsychwave', '1/9/2016 20:45']]


## Calculate the average number of comments for `Ask HN` and `Show HN` posts

In [9]:
# Find the total number of comments on ask posts

total_ask_comments = 0
for row in ask_posts:
    num_of_ask_comments = int(row[4])
    total_ask_comments += num_of_ask_comments

# Find the average number of comments on ask posts (// is floor division, so the result is rounded down)
avg_ask_comments = total_ask_comments // length_ask_posts

print(f'The average number of comments on ask posts is {avg_ask_comments}')

The average number of comments on ask posts is 14


In [10]:
# Find the total number of comments on show posts

total_show_comments = 0
for row in show_posts:
    num_of_show_comments = int(row[4])
    total_show_comments += num_of_show_comments

# Find the average number of comments on show posts (// is floor division, so the result is rounded down)
avg_show_comments = total_show_comments // length_show_posts

print(f'The average number of comments on show posts is {avg_show_comments}')

The average number of comments on show posts is 10
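The two cells above use floor division (`//`), which truncates each average to a whole number. For example, the ask-post total of 24,483 comments (the sum of the hourly totals in `comments_by_hour` further down) divided by 1,744 posts is about 14.04, which prints as 14. A sketch of a reusable helper that uses true division instead; `average_column` is a hypothetical name, and the sample rows are the five ask posts shown above:

```python
def average_column(rows, index):
    """Mean of the integer values at position `index` of each row, using true division."""
    if not rows:
        return 0.0  # return 0.0 for an empty list instead of raising ZeroDivisionError
    total = sum(int(row[index]) for row in rows)
    return total / len(rows)

# The five ask posts shown earlier (num_comments is at index 4)
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'],
]

comments = [int(row[4]) for row in sample_ask]
print(f'True division:  {average_column(sample_ask, 4):.2f}')
print(f'Floor division: {sum(comments) // len(comments)}')

# Ask-post totals from this project: 24483 comments across 1744 posts
print(f'All ask posts:  {24483 / 1744:.2f} vs {24483 // 1744}')
```

The same helper works for points by passing index 3 (`num_points`).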


## Find the Number of `Ask HN` Posts and Comments by Hour Created

We'll determine whether ask posts created at a certain time are more likely to attract comments. To perform this analysis:

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received.

2. Calculate the average number of comments ask posts receive by hour created.
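Step 1 can be sketched with `collections.defaultdict`, which removes the explicit `if`/`else` branching used in the cells below because a missing key starts at 0. The `[created_at, num_comments]` pairs are taken from the five ask posts shown above; `counts_by_hour` and `comments_by_hour` mirror the names the project uses:

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = '%m/%d/%Y %H:%M'

# [created_at, num_comments] pairs from the first five ask posts shown above
sample = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
    ['8/2/2016 14:20', 3],
    ['10/15/2015 16:38', 17],
]

counts_by_hour = defaultdict(int)    # number of posts created in each hour
comments_by_hour = defaultdict(int)  # total comments on posts created in each hour

for created_at, n_comments in sample:
    hour = datetime.strptime(created_at, DATE_FORMAT).strftime('%H')
    counts_by_hour[hour] += 1
    comments_by_hour[hour] += n_comments

print(dict(counts_by_hour))
print(dict(comments_by_hour))
```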

### Step 1 - Calculate the number of `Ask HN` posts created in each hour of the day, along with the number of comments received.

In [11]:
# Import the datetime module
import datetime as dt

# Create an empty list 'result_list'
result_list = []

# Iterate over 'ask_posts' list to append number of comments and created date
# number of comments in index 4 and created date in index 6
for post in ask_posts:
    result_list.append([post[6], int(post[4])])

# Create dictionaries 'counts_by_hour' and 'comments_by_hour'
counts_by_hour = {}
comments_by_hour = {}

for item in result_list:
    date = item[0]
    comment = item[1]
    date_format = '%m/%d/%Y %H:%M'
    
    # Parse the string into a datetime object with strptime, then extract the hour with strftime
    time = dt.datetime.strptime(date, date_format).strftime('%H')
    
    if time in counts_by_hour:
        comments_by_hour[time] += comment
        counts_by_hour[time] += 1
    else:
        comments_by_hour[time] = comment
        counts_by_hour[time] = 1

comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

### Step 2 - Calculate the Average Number of Comments for `Ask HN` Posts by Hour

Next, we use the two dictionaries `comments_by_hour` and `counts_by_hour` to calculate the average number of comments for ask posts created during each hour of the day.
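Step 2 divides each hour's comment total by that hour's post count. A sketch using a dictionary comprehension, with three hours of real totals: the comment totals come from `comments_by_hour` above, and the post counts come from `ask_counts_by_hour`, printed later in this project, which counts the same ask posts:

```python
# Totals for three hours, taken from the outputs in this project
comments_by_hour = {'15': 4477, '02': 1381, '09': 251}
counts_by_hour = {'15': 116, '02': 58, '09': 45}

# Average comments per post for each hour; both dicts share the same keys
avg_by_hour = {hour: comments_by_hour[hour] / counts_by_hour[hour] for hour in comments_by_hour}

for hour, avg in avg_by_hour.items():
    print(f'{hour}:00 -> {avg:.2f}')
```

The results match the corresponding entries in the `avg_by_hour` list below (38.59, 23.81 and 5.58 after rounding).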

In [12]:
avg_by_hour = []

for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

avg_by_hour

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

## Sorting and Printing Values

Sort the results to identify the hours with the highest average number of comments.
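Swapping each pair so the average comes first is one way to make `sorted` order by the average. An alternative sketch keeps the `[hour, average]` order and passes a `key` function instead; the values are a subset of `avg_by_hour` above, rounded to two decimals for brevity:

```python
# A few [hour, average comments] pairs from avg_by_hour above (rounded)
avg_by_hour = [['09', 5.58], ['15', 38.59], ['02', 23.81], ['20', 21.52], ['13', 14.74]]

# Sort by the average (second element), highest first, without swapping the pairs
ranked = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in ranked[:3]:
    print(f'{hour}:00 {avg:.2f} average comments per post')
```

Using `key` avoids building the intermediate swapped list and keeps the hour first, which reads more naturally when printing.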

In [13]:
# Create empty list 'swap_avg_by_hour'
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

sorted_swap

[[5.5777777777777775, '09'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [16.796296296296298, '16'], [7.985294117647059, '23'], [9.41095890410959, '12'], [11.46, '17'], [38.5948275862069, '15'], [16.009174311926607, '21'], [21.525, '20'], [23.810344827586206, '02'], [13.20183486238532, '18'], [7.796296296296297, '03'], [10.08695652173913, '05'], [10.8, '19'], [11.383333333333333, '01'], [6.746478873239437, '22'], [10.25, '08'], [7.170212765957447, '04'], [8.127272727272727, '00'], [9.022727272727273, '06'], [7.852941176470588, '07'], [11.051724137931034, '11']]


[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]

In [14]:
# Print the top 5 hours with the highest average number of comments (sorted_swap is already in descending order)
print('Top 5 Hours for Ask Posts Comments')
for avg, hr in sorted_swap[:5]:
    print(
        '{hour}: {avg_comment:.2f} average comments per post'.format(
            hour = dt.datetime.strptime(hr,'%H').strftime('%H:%M'), avg_comment = avg))

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


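The `created_at` times in this dataset are in US Eastern Time (ET).

Based on the above analysis, ask posts created at 15:00 hrs had the highest average number of comments per post (38.59), followed by 02:00 hrs (23.81), 20:00 hrs (21.52), 16:00 hrs (16.80) and 21:00 hrs (16.01).

Therefore, creating an ask post at 15:00 hrs ET (04:00 hrs SGT the next day) or 02:00 hrs ET (15:00 hrs SGT) may give it a better chance of receiving comments. The SGT times assume Eastern Standard Time (UTC-5); during daylight saving time (UTC-4) they are one hour earlier. Each hourly figure is an average over roughly 34 to 116 posts, so a few heavily discussed posts can raise it noticeably.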
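Converting Eastern Time to Singapore Time (SGT, UTC+8) means adding 13 hours while the US is on standard time (EST, UTC-5), but only 12 hours during daylight saving time (EDT, UTC-4); several dates shown above, such as 8/4/2016, fall in the daylight saving period. A sketch using fixed UTC offsets from the standard library; the two dates are illustrative. With Python 3.9+ and a time zone database installed, `zoneinfo.ZoneInfo('America/New_York')` would choose the correct offset for each date automatically.

```python
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5), 'EST')  # US Eastern standard time
EDT = timezone(timedelta(hours=-4), 'EDT')  # US Eastern daylight time
SGT = timezone(timedelta(hours=8), 'SGT')   # Singapore Time (no daylight saving)

def to_sgt(naive_eastern, eastern_tz):
    """Attach an Eastern offset to a naive datetime and convert it to SGT."""
    return naive_eastern.replace(tzinfo=eastern_tz).astimezone(SGT)

# 15:00 Eastern on a winter date (EST) and on a summer date (EDT)
winter = to_sgt(datetime(2016, 1, 26, 15, 0), EST)
summer = to_sgt(datetime(2016, 8, 4, 15, 0), EDT)

print(winter.strftime('%Y-%m-%d %H:%M %Z'))
print(summer.strftime('%Y-%m-%d %H:%M %Z'))
```

The winter post lands at 04:00 SGT the next day and the summer post at 03:00 SGT the next day.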

## Determine if show or ask posts receive more points on average

### Calculate average number of points received by show posts

In [15]:
print(f'Total number of show posts: {length_show_posts}')

print()

# Find the total number of points received by show posts
total_points_show_posts = 0

for row in show_posts:
    points_show_post = int(row[3])
    total_points_show_posts += points_show_post

print(f"Total number of points received by show posts: {total_points_show_posts}")

print()

# Find average number of points received by show posts
avg_points_show_posts = total_points_show_posts // length_show_posts

print(f'Average number of points received by show posts: {avg_points_show_posts}')

Total number of show posts: 1162

Total number of points received by show posts: 32019

Average number of points received by show posts: 27


### Calculate average number of points received by ask posts

In [16]:
print(f'Total number of ask posts: {length_ask_posts}')

print()

# Find the total number of points received by ask posts
total_points_ask_posts = 0

for row in ask_posts:
    points_ask_post = int(row[3])
    total_points_ask_posts += points_ask_post

print(f"Total number of points received by ask posts: {total_points_ask_posts}")

print()

# Find average number of points received by ask posts
avg_points_ask_posts = total_points_ask_posts // length_ask_posts

print(f'Average number of points received by ask posts: {avg_points_ask_posts}')

Total number of ask posts: 1744

Total number of points received by ask posts: 26268

Average number of points received by ask posts: 15


Based on the above calculation, show posts receive more points on average (27) than ask posts (15).

## Determine if `Ask HN` posts created at a certain time are more likely to receive more points

### Step 1 - Calculate the number of `Ask HN` posts created in each hour of the day, along with the number of points received.

In [17]:
# Import the datetime module
import datetime as dt

# Create an empty list 'ask_point_date_list'
ask_point_date_list = []

# Iterate over 'ask_posts' list to append number of points and created date
# number of points in index 3 and created date in index 6
for post in ask_posts:
    ask_point_date_list.append([post[6], int(post[3])])

# Create dictionaries 'ask_counts_by_hour' and 'ask_points_by_hour'
# ask_counts_by_hour tracks number of occurrences for each hour of the day
# ask_points_by_hour tracks number of points for each hour of the day
ask_counts_by_hour = {}
ask_points_by_hour = {}

for item in ask_point_date_list:
    date = item[0]
    point = item[1]
    date_format = '%m/%d/%Y %H:%M'
    
    # Parse the string into a datetime object with strptime, then extract the hour with strftime
    time = dt.datetime.strptime(date, date_format).strftime('%H')
    
    if time in ask_counts_by_hour:
        ask_points_by_hour[time] += point
        ask_counts_by_hour[time] += 1
    else:
        ask_points_by_hour[time] = point
        ask_counts_by_hour[time] = 1

print(f'Number of ask points by hour: \n{ask_points_by_hour}')

print()

print(f'Ask counts by hour: {ask_counts_by_hour}')

Number of ask points by hour: 
{'09': 329, '13': 2062, '10': 1102, '14': 1282, '16': 2522, '23': 581, '12': 782, '17': 1941, '15': 3479, '21': 1721, '20': 1151, '02': 793, '18': 1741, '03': 374, '05': 552, '19': 1513, '01': 700, '22': 511, '08': 515, '04': 389, '00': 451, '06': 591, '07': 361, '11': 825}

Ask counts by hour: {'09': 45, '13': 85, '10': 59, '14': 107, '16': 108, '23': 68, '12': 73, '17': 100, '15': 116, '21': 109, '20': 80, '02': 58, '18': 109, '03': 54, '05': 46, '19': 110, '01': 60, '22': 71, '08': 48, '04': 47, '00': 55, '06': 44, '07': 34, '11': 58}
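The per-hour aggregation has now been written twice, once for comments and once for points, and it is repeated again for `Show HN` posts below. A sketch of a single hypothetical function, `average_by_hour`, that takes a list of posts and the column index to average (4 for `num_comments`, 3 for `num_points`) and returns the hourly averages sorted from highest to lowest. The sample rows are the five ask posts shown earlier:

```python
from datetime import datetime

DATE_FORMAT = '%m/%d/%Y %H:%M'

def average_by_hour(posts, value_index, date_index=6):
    """Return [(hour, average), ...] for the given column, highest average first."""
    totals = {}
    counts = {}
    for post in posts:
        hour = datetime.strptime(post[date_index], DATE_FORMAT).strftime('%H')
        totals[hour] = totals.get(hour, 0) + int(post[value_index])
        counts[hour] = counts.get(hour, 0) + 1
    averages = [(hour, totals[hour] / counts[hour]) for hour in totals]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)

sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
    ['12210105', 'Ask HN: Looking for Employee #3 How do I do it?', '', '1', '3', 'sph130', '8/2/2016 14:20'],
    ['10394168', 'Ask HN: Someone offered to buy my browser extension from me. What now?', '', '28', '17', 'roykolak', '10/15/2015 16:38'],
]

print(average_by_hour(sample_ask, 3))  # average points by hour
print(average_by_hour(sample_ask, 4))  # average comments by hour
```

On the full data, `average_by_hour(ask_posts, 3)[:5]` and `average_by_hour(show_posts, 3)[:5]` would cover the top-5 tables in this section and the next with one function.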


### Step 2 - Calculate the average number of points for `Ask HN` Posts by Hour

In [18]:
# Create empty list 'avg_ask_points_by_hour'
avg_ask_points_by_hour = []

# Iterate through the dictionary 'ask_points_by_hour'
# Append the hour of the day and the average points per post to the list 'avg_ask_points_by_hour'
for hour in ask_points_by_hour:
    avg_ask_points_by_hour.append([hour, ask_points_by_hour[hour] / ask_counts_by_hour[hour]])

avg_ask_points_by_hour

[['09', 7.311111111111111],
 ['13', 24.258823529411764],
 ['10', 18.677966101694917],
 ['14', 11.981308411214954],
 ['16', 23.35185185185185],
 ['23', 8.544117647058824],
 ['12', 10.712328767123287],
 ['17', 19.41],
 ['15', 29.99137931034483],
 ['21', 15.788990825688073],
 ['20', 14.3875],
 ['02', 13.672413793103448],
 ['18', 15.972477064220184],
 ['03', 6.925925925925926],
 ['05', 12.0],
 ['19', 13.754545454545454],
 ['01', 11.666666666666666],
 ['22', 7.197183098591549],
 ['08', 10.729166666666666],
 ['04', 8.27659574468085],
 ['00', 8.2],
 ['06', 13.431818181818182],
 ['07', 10.617647058823529],
 ['11', 14.224137931034482]]

### Step 3 - Sort the obtained values to identify the top 5 results with highest average number of points and the corresponding hour created 

In [19]:
# Create empty list 'swap_avg_ask_points_by_hour'
swap_avg_ask_points_by_hour = []

# Swap the position of average points per hour and the hour of the day
for row in avg_ask_points_by_hour:
    swap_avg_ask_points_by_hour.append([row[1], row[0]])

print(swap_avg_ask_points_by_hour)

# Sort the list 'swap_avg_ask_points_by_hour' in descending order
sorted_swap = sorted(swap_avg_ask_points_by_hour, reverse=True)

sorted_swap

[[7.311111111111111, '09'], [24.258823529411764, '13'], [18.677966101694917, '10'], [11.981308411214954, '14'], [23.35185185185185, '16'], [8.544117647058824, '23'], [10.712328767123287, '12'], [19.41, '17'], [29.99137931034483, '15'], [15.788990825688073, '21'], [14.3875, '20'], [13.672413793103448, '02'], [15.972477064220184, '18'], [6.925925925925926, '03'], [12.0, '05'], [13.754545454545454, '19'], [11.666666666666666, '01'], [7.197183098591549, '22'], [10.729166666666666, '08'], [8.27659574468085, '04'], [8.2, '00'], [13.431818181818182, '06'], [10.617647058823529, '07'], [14.224137931034482, '11']]


[[29.99137931034483, '15'],
 [24.258823529411764, '13'],
 [23.35185185185185, '16'],
 [19.41, '17'],
 [18.677966101694917, '10'],
 [15.972477064220184, '18'],
 [15.788990825688073, '21'],
 [14.3875, '20'],
 [14.224137931034482, '11'],
 [13.754545454545454, '19'],
 [13.672413793103448, '02'],
 [13.431818181818182, '06'],
 [12.0, '05'],
 [11.981308411214954, '14'],
 [11.666666666666666, '01'],
 [10.729166666666666, '08'],
 [10.712328767123287, '12'],
 [10.617647058823529, '07'],
 [8.544117647058824, '23'],
 [8.27659574468085, '04'],
 [8.2, '00'],
 [7.311111111111111, '09'],
 [7.197183098591549, '22'],
 [6.925925925925926, '03']]

In [20]:
# Print out the top 5 hours with highest average number of points
print('Top 5 Hours for Ask Posts average number of points')
for avg, hr in sorted_swap[:5]:
    print(
        '{hour}: {avg_points:.2f} average points per post'.format(
            hour = dt.datetime.strptime(hr,'%H').strftime('%H:%M'), avg_points = avg))

Top 5 Hours for Ask Posts average number of points
15:00: 29.99 average points per post
13:00: 24.26 average points per post
16:00: 23.35 average points per post
17:00: 19.41 average points per post
10:00: 18.68 average points per post


The `created_at` times in this dataset are in US Eastern Time (ET).

Based on the above analysis, ask posts created at 15:00 hrs had the highest average number of points per post (29.99), followed by 13:00 hrs (24.26), 16:00 hrs (23.35), 17:00 hrs (19.41) and 10:00 hrs (18.68).

Therefore, creating an ask post at 15:00 hrs ET (04:00 hrs SGT the next day) or 13:00 hrs ET (02:00 hrs SGT the next day) may give it a better chance of scoring points. The SGT times assume Eastern Standard Time (UTC-5) and are one hour earlier during daylight saving time.

## Determine if `Show HN` posts created at a certain time are more likely to receive more points

### Step 1 - Calculate the number of `Show HN` posts created in each hour of the day, along with the number of points received.

In [21]:
# Import the datetime module
import datetime as dt

# Create an empty list 'show_point_date_list'
show_point_date_list = []

# Iterate over 'show_posts' list to append number of points and created date
# number of points in index 3 and created date in index 6
for post in show_posts:
    show_point_date_list.append([post[6], int(post[3])])

# Create dictionaries 'show_counts_by_hour' and 'show_points_by_hour'
# show_counts_by_hour tracks the number of posts created in each hour of the day
# show_points_by_hour tracks the total points for posts created in each hour of the day
show_counts_by_hour = {}
show_points_by_hour = {}

for item in show_point_date_list:
    date = item[0]
    point = item[1]
    date_format = '%m/%d/%Y %H:%M'
    
    # Parse the string into a datetime object with strptime, then extract the hour with strftime
    time = dt.datetime.strptime(date, date_format).strftime('%H')
    
    if time in show_counts_by_hour:
        show_points_by_hour[time] += point
        show_counts_by_hour[time] += 1
    else:
        show_points_by_hour[time] = point
        show_counts_by_hour[time] = 1

print(f'Number of show points by hour: \n{show_points_by_hour}')

print()

print(f'Show counts by hour: {show_counts_by_hour}')

Number of show points by hour: 
{'14': 2187, '22': 1856, '18': 2215, '07': 494, '20': 1819, '05': 104, '16': 2634, '19': 1702, '15': 2228, '03': 679, '17': 2521, '06': 375, '02': 340, '13': 2438, '08': 519, '21': 866, '04': 386, '11': 1480, '12': 2543, '23': 1526, '09': 553, '01': 700, '10': 681, '00': 1173}

Show counts by hour: {'14': 86, '22': 46, '18': 61, '07': 26, '20': 60, '05': 19, '16': 93, '19': 55, '15': 78, '03': 27, '17': 93, '06': 16, '02': 30, '13': 99, '08': 34, '21': 47, '04': 26, '11': 44, '12': 61, '23': 36, '09': 30, '01': 28, '10': 36, '00': 31}


### Step 2 - Calculate the average number of points for `Show HN` Posts by Hour

In [22]:
# Create empty list 'avg_show_points_by_hour'
avg_show_points_by_hour = []

# Iterate through the dictionary 'show_points_by_hour'
# Append the hour of the day and the average points per post to the list 'avg_show_points_by_hour'
for hour in show_points_by_hour:
    avg_show_points_by_hour.append([hour, show_points_by_hour[hour] / show_counts_by_hour[hour]])

avg_show_points_by_hour

[['14', 25.430232558139537],
 ['22', 40.34782608695652],
 ['18', 36.31147540983606],
 ['07', 19.0],
 ['20', 30.316666666666666],
 ['05', 5.473684210526316],
 ['16', 28.322580645161292],
 ['19', 30.945454545454545],
 ['15', 28.564102564102566],
 ['03', 25.14814814814815],
 ['17', 27.107526881720432],
 ['06', 23.4375],
 ['02', 11.333333333333334],
 ['13', 24.626262626262626],
 ['08', 15.264705882352942],
 ['21', 18.425531914893618],
 ['04', 14.846153846153847],
 ['11', 33.63636363636363],
 ['12', 41.68852459016394],
 ['23', 42.388888888888886],
 ['09', 18.433333333333334],
 ['01', 25.0],
 ['10', 18.916666666666668],
 ['00', 37.83870967741935]]

### Step 3 - Sort the obtained values to identify the top 5 results with highest average number of points and the corresponding hour created 

In [23]:
# Create empty list 'swap_avg_show_points_by_hour'
swap_avg_show_points_by_hour = []

# Swap the position of average points per hour and the hour of the day
for row in avg_show_points_by_hour:
    swap_avg_show_points_by_hour.append([row[1], row[0]])

print(swap_avg_show_points_by_hour)

# Sort the list 'swap_avg_show_points_by_hour' in descending order
sorted_show_swap = sorted(swap_avg_show_points_by_hour, reverse=True)

sorted_show_swap

[[25.430232558139537, '14'], [40.34782608695652, '22'], [36.31147540983606, '18'], [19.0, '07'], [30.316666666666666, '20'], [5.473684210526316, '05'], [28.322580645161292, '16'], [30.945454545454545, '19'], [28.564102564102566, '15'], [25.14814814814815, '03'], [27.107526881720432, '17'], [23.4375, '06'], [11.333333333333334, '02'], [24.626262626262626, '13'], [15.264705882352942, '08'], [18.425531914893618, '21'], [14.846153846153847, '04'], [33.63636363636363, '11'], [41.68852459016394, '12'], [42.388888888888886, '23'], [18.433333333333334, '09'], [25.0, '01'], [18.916666666666668, '10'], [37.83870967741935, '00']]


[[42.388888888888886, '23'],
 [41.68852459016394, '12'],
 [40.34782608695652, '22'],
 [37.83870967741935, '00'],
 [36.31147540983606, '18'],
 [33.63636363636363, '11'],
 [30.945454545454545, '19'],
 [30.316666666666666, '20'],
 [28.564102564102566, '15'],
 [28.322580645161292, '16'],
 [27.107526881720432, '17'],
 [25.430232558139537, '14'],
 [25.14814814814815, '03'],
 [25.0, '01'],
 [24.626262626262626, '13'],
 [23.4375, '06'],
 [19.0, '07'],
 [18.916666666666668, '10'],
 [18.433333333333334, '09'],
 [18.425531914893618, '21'],
 [15.264705882352942, '08'],
 [14.846153846153847, '04'],
 [11.333333333333334, '02'],
 [5.473684210526316, '05']]

In [24]:
# Print out the top 5 hours with highest average number of points
print('Top 5 Hours for Show Posts average number of points')
for avg, hr in sorted_show_swap[:5]:
    print(
        '{hour}: {avg_points:.2f} average points per post'.format(
            hour = dt.datetime.strptime(hr,'%H').strftime('%H:%M'), avg_points = avg))

Top 5 Hours for Show Posts average number of points
23:00: 42.39 average points per post
12:00: 41.69 average points per post
22:00: 40.35 average points per post
00:00: 37.84 average points per post
18:00: 36.31 average points per post


The `created_at` times in this dataset are in US Eastern Time (ET).

Based on the above analysis, show posts created at 23:00 hrs had the highest average number of points per post (42.39), followed by 12:00 hrs (41.69), 22:00 hrs (40.35), 00:00 hrs (37.84) and 18:00 hrs (36.31).

Therefore, creating a show post at 23:00 hrs ET (12:00 hrs SGT the next day) or 12:00 hrs ET (01:00 hrs SGT the next day) may give it a better chance of scoring points. The SGT times assume Eastern Standard Time (UTC-5) and are one hour earlier during daylight saving time.

## Compare the average number of comments received by other posts to `Ask HN` and `Show HN` posts

### Calculate the average number of comments received by other posts

In [25]:
# Find the total number of comments on other posts

total_other_comments = 0
for row in other_posts:
    num_of_other_comments = int(row[4])
    total_other_comments += num_of_other_comments

# Find the average number of comments on other posts
avg_other_comments = total_other_comments // length_other_posts

print(f'The average number of comments on other posts is {avg_other_comments}')

The average number of comments on other posts is 26


From the previous analysis, the average number of comments is 14 on `Ask HN` posts and 10 on `Show HN` posts. Hence, other posts received a higher average number of comments (26) than both `Ask HN` and `Show HN` posts. All three averages are computed with floor division (`//`) and are therefore rounded down.

## Compare the average number of points received by other posts to `Ask HN` and `Show HN` posts

### Calculate the average number of points received by other posts

In [26]:
print(f'Total number of other posts: {length_other_posts}')

print()

# Find the total number of points received by other posts
total_points_other_posts = 0

for row in other_posts:
    points_other_post = int(row[3])
    total_points_other_posts += points_other_post

print(f"Total number of points received by other posts: {total_points_other_posts}")

print()

# Find average number of points received by other posts
avg_points_other_posts = total_points_other_posts // length_other_posts

print(f'Average number of points received by other posts: {avg_points_other_posts}')

Total number of other posts: 17194

Total number of points received by other posts: 952664

Average number of points received by other posts: 55


From the previous analysis, the average number of points is 15 on `Ask HN` posts and 27 on `Show HN` posts. Hence, other posts received a higher average number of points (55) than both `Ask HN` and `Show HN` posts, with all three averages rounded down by floor division.
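Because every average in this project uses floor division (`//`), the printed values are rounded down. A sketch that recomputes the average points with true division from the totals and post counts printed above (no new data is involved):

```python
# (total points, number of posts) printed by the cells above
totals = {
    'Ask HN': (26268, 1744),
    'Show HN': (32019, 1162),
    'other': (952664, 17194),
}

# Average points per post with true division
averages = {kind: points / posts for kind, (points, posts) in totals.items()}

# Print from highest to lowest average
for kind, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f'{kind}: {avg:.2f} average points per post')
```

The ranking is unchanged (other posts, then `Show HN`, then `Ask HN`), but the unrounded values are about 55.41, 27.56 and 15.06.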