# Timing Engagement: Analyzing the Hourly Impact on Social Media Comment Volume

### Abstract

This analysis examines a sample of roughly 20,000 posts from the Hacker News (HN) forum, scraped from the site between September 2015 and September 2016. Posts were separated into three categories (`Ask HN`, `Show HN` and `Other`), and the data was explored to identify how the hour of submission affects comment volume. For a complete picture of the analysis process and conclusions, refer to the full analysis.

*The main findings are outlined below:*

* Posts falling into the `Ask HN` category generally accrued a greater average number of comments per post (**cpp**). `Ask` posts accrued an average of 14.04 cpp, compared to `Show` posts, receiving 10.32 cpp. 

* `Ask HN` posts with the greatest comment volume were typically posted between 3pm - 4pm EST, acquiring an average of 38.59 cpp. This is roughly 62% more than the second most active posting hour, 2am - 3am EST (with an average of 23.81 cpp).

* `Show` posts typically received a smaller comment volume than `Ask` posts. `Show` posts showing the largest comment volume (15.77 cpp) were posted between 6pm - 7pm EST.

* Although `Other` posts received a smaller comment volume than some of the top performing posts in the `Ask` category, it was noted that `Other` posts often received a large number of comments regardless of the time of day the post was published.

>### Purpose of this Project
>
>The following portfolio project is designed to apply the skills listed below into a real-world context:
>
>* Manipulating strings for data analysis
>* Object-oriented programming (OOP)
>* Handling dates and times in the data analysis process
>* Functional programming (specifically, passing existing functions as arguments to other functions)

## Project Background

Hacker News (HN) is a website started by the startup incubator [Y Combinator](https://www.ycombinator.com), where users submit posts, receiving votes and comments (similar to how [Reddit](https://www.reddit.com) operates). HN is extremely popular in technology and startup circles, with posts making it to the top of the listings often receiving hundreds of thousands of visitors and impressions as a result. 

Users are able to submit posts to the HN community. Two of the common categories of post include:
* `Ask HN` - where users ask the HN community a question or introduce a topic of discussion 
* `Show HN` - where users show the HN community a product, project or an interesting piece of media they have found

>The goal of this data analysis process is to compare the above types of post to determine the following:
>* Do `Ask HN` or `Show HN` posts receive more comments on average?
>* Do posts created at a certain time of day receive more comments on average?

## About the Dataset

The original dataset can be found [here](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts), a scrape of a year's worth of HN posts from September 2015 - September 2016. The raw dataset contains over 300,000 records.

For the purposes of this portfolio project, the original dataset has been downsampled to roughly 20,000 records by removing all posts that received no comments and taking a random sample of the remaining records.

#### Dataset Schema

The following table summarises the fields found in the dataset:

|Column name |Description                                                                                                   |
|------------|--------------------------------------------------------------------------------------------------------------|
|id          |The unique identifier for each post                                                                           |
|title       |The title of the post                                                                                         |
|url         |The url the post links to (if post has a linked url)                                                          |
|num_points  |The number of points the post acquired (total number of downvotes subtracted from the total number of upvotes)|
|num_comments|The number of comments on the post                                                                            |
|author      |The username of the person who submitted the post                                                             |
|created_at  |The date and time (**EST Timezone**) of a post's creation.                                                    |


## Reading in the data

We begin by opening and reading the contents of the associated csv file. A **context manager** handles opening and closing the file, so the file handle is released even if an error occurs while reading. The `reader()` function is imported from the `csv` module of the [Python Standard Library](https://docs.python.org/3/library/csv.html), and the dataset is stored as a list of lists assigned to the variable `hn`.

In [1]:
# open the csv file and read in the data, converting to a list of lists.
from csv import reader

with open("hacker_news.csv") as file:
  read_file = reader(file)
  hn = list(read_file)
  
hn[:4]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20']]

The header row is assigned to the `headers` variable, and `hn` is reassigned so that it contains only the data rows.

In [2]:
headers = hn[0]
hn = hn[1:]  # remove the header row from the data

headers

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

In [3]:
hn[:3]  # show the first three rows

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20']]
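The header can also be separated while reading, by pulling the first row off the reader with `next()`. The sketch below is an alternative, not the approach used in this analysis. It reads from an in-memory string (rows copied from the output above) so that it runs without the `hacker_news.csv` file; the names `sample_csv`, `sample_headers` and `sample_rows` are introduced here for illustration only.

```python
import io
from csv import reader

# In-memory stand-in for hacker_news.csv, using rows shown in the output above.
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,"
    "http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11964716,Florida DJs May Face Felony for April Fools' Water Joke,"
    "http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/,"
    "2,1,vezycash,6/23/2016 22:20\n"
)

read_file = reader(sample_csv)
sample_headers = next(read_file)  # the first row is the header
sample_rows = list(read_file)     # the remaining rows are the data

print(sample_headers)
print(len(sample_rows))
```

Either approach leaves the header and the data in separate variables; slicing, as used in this notebook, keeps the reading step and the header step visibly distinct.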

## Organising the Dataset

Since we are only concerned with posts that have titles beginning with `Ask HN` and `Show HN`, we can filter the dataset to separate each of these types of posts into their own lists. 

The string methods `.startswith()` and `.lower()` can be used to achieve this.

In [4]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
  title = row[1]
  
  if title.lower().startswith('ask hn'): # add any posts starting with 'ask hn' to the ask_posts list
    ask_posts.append(row)
    
  elif title.lower().startswith('show hn'): # add any posts starting with 'show hn' to the show_posts list
    show_posts.append(row)
    
  else:
    other_posts.append(row)
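The same prefix test can be sketched as a small helper that is easy to check on individual titles. The function name `categorise_title` is hypothetical and is not part of the notebook; the last title is an invented example showing why `.lower()` is applied before `.startswith()`.

```python
def categorise_title(title):
    """Return 'ask', 'show' or 'other' based on a case-insensitive title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# The first three titles appear in the dataset; the last one is hypothetical.
print(categorise_title("Ask HN: How to improve my personal website?"))
print(categorise_title("Show HN: Something pointless I made"))
print(categorise_title("Interactive Dynamic Video"))
print(categorise_title("ASK HN: a hypothetical upper-case title"))
```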

In [5]:
ask_posts[:2]

[['12296411',
  'Ask HN: How to improve my personal website?',
  '',
  '2',
  '6',
  'ahmedbaracat',
  '8/16/2016 9:55'],
 ['10610020',
  'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
  '',
  '28',
  '29',
  'tkfx',
  '11/22/2015 13:43']]

In [6]:
show_posts[:2]

[['10627194',
  'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform',
  'https://iot.seeed.cc',
  '26',
  '22',
  'kfihihc',
  '11/25/2015 14:03'],
 ['10646440',
  'Show HN: Something pointless I made',
  'http://dn.ht/picklecat/',
  '747',
  '102',
  'dhotson',
  '11/29/2015 22:46']]

In [7]:
other_posts[:2]

[['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30']]

In [8]:
print("Number of ask posts:", len(ask_posts))
print("\n")
print("Number of show posts:", len(show_posts))
print("\n")
print("Number of other posts:", len(other_posts))

Number of ask posts: 1744


Number of show posts: 1162


Number of other posts: 17194


## Determining Average Number of Comments for each Category of Post

In [9]:
total_ask_comments = 0

for row in ask_posts:
  num_comments = int(row[-3])
  total_ask_comments += num_comments # iterate through all the ask posts and add the number of comments to the total

avg_ask_comments = total_ask_comments / len(ask_posts) # calculate the average

print(f"The average number of comments for Ask HN posts is {avg_ask_comments:.2f}")

The average number of comments for Ask HN posts is 14.04


In [10]:
total_show_comments = 0

for row in show_posts:
  num_comments = int(row[-3])
  total_show_comments += num_comments

avg_show_comments = total_show_comments / len(show_posts)

print(f"The average number of comments for Show HN posts is {avg_show_comments:.2f}")

The average number of comments for Show HN posts is 10.32


On average, of the posts receiving at least one comment, we see **engagement is greater for Ask HN posts** (14.04 versus 10.32 comments per post, roughly 36% higher), indicating the user base may be more inclined to comment on posts when the author of the post is looking for advice or poses a question.

It is important to consider that this conclusion has been drawn from a relatively small sample of the available dataset, so there is likely some variability in these calculated averages when compared to the true population means.

Despite this, it is fair to expect the trend to be similar for the entire population, with ask posts receiving a larger number of comments on average.
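The averaging loop is repeated for each category, so it could be factored into one helper. The sketch below assumes the same row layout as the dataset (with `num_comments` third from the end); the name `average_comments` is hypothetical, and the two sample rows are copied from the `ask_posts` output above.

```python
def average_comments(rows, comments_index=-3):
    """Mean of the num_comments column; returns 0.0 for an empty list."""
    if not rows:
        return 0.0
    return sum(int(row[comments_index]) for row in rows) / len(rows)

# Two Ask HN rows copied from the ask_posts output shown earlier.
sample_ask = [
    ['12296411', 'Ask HN: How to improve my personal website?',
     '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?',
     '', '28', '29', 'tkfx', '11/22/2015 13:43'],
]

print(average_comments(sample_ask))  # (6 + 29) / 2
```

Guarding against an empty list avoids a `ZeroDivisionError` if a category happens to contain no posts.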

---

*Note, since we have removed posts from the dataset that did not receive any comments, we have lost some of the finer detail. Including these would skew the mean values, hence their removal, but it does prevent us from making a more complete conclusion.*

*For instance, we cannot confidently conclude that **all** Ask HN posts are more successful on average than Show HN posts; the full dataset could contain many more Ask HN posts with zero comments than Show HN posts with zero comments (which would suggest that writing an engaging Ask post is more challenging than writing an engaging Show post).*

*Further analysis would be required to make a more informed conclusion here, but this lies outside of the initial scope of the project, so will not be covered here.*
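To make the caveat concrete, the sketch below uses entirely hypothetical comment counts (none of these figures come from the dataset) to show how restoring zero-comment posts could reverse the ranking of two categories.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Hypothetical comment counts for posts with at least one comment.
ask_commented = [6, 29, 1, 3, 17]
show_commented = [10, 8, 12]

# Hypothetical numbers of zero-comment posts removed during downsampling.
ask_zero_count = 5
show_zero_count = 1

ask_all = ask_commented + [0] * ask_zero_count
show_all = show_commented + [0] * show_zero_count

print(mean(ask_commented), mean(show_commented))  # commented posts only
print(mean(ask_all), mean(show_all))              # zero-comment posts included
```

In this invented scenario the Ask average is higher among commented posts but lower once zero-comment posts are included, which is why the conclusion is restricted to posts that received at least one comment.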

In [11]:
total_other_comments = 0

for row in other_posts:
  num_comments = int(row[-3])
  total_other_comments += num_comments

avg_other_comments = total_other_comments / len(other_posts)

print(f"The average number of comments for 'Other' posts is {avg_other_comments:.2f}")

The average number of comments for 'Other' posts is 26.87


Briefly looking at the average for posts falling into the `other` category, we see a much greater average than either Ask or Show posts.

## Frequency of Posts and Comments by Hour Created

To get an insight into whether posts created at a specific time of day have a higher number of comments on average, we can first determine the frequency of posts created at each hour during the day (alongside the total number of comments received from all posts for each hour of the day).

To do this, we first import the `datetime` module from the [Python Standard Library](https://docs.python.org/3/library/datetime.html). The alias `dt` is used to refer to the module in the script for improved legibility and to ensure the module name is distinct from the `datetime` class contained within the module. 

The `datetime` module provides several classes for handling date and time data, including `date`, `time`, `datetime` and `timedelta`. In the code block below, we use the `.strptime()` method of the `datetime` class to parse the date and time from the `created_at` field in the dataset, then extract the hour (in 24-hour format) from each parsed value using the `.strftime()` method.

Parsing the hour that each post was created from the dataset allows us to create two dictionaries with keys corresponding to all hours in a given day (in 24-hour format). The `posts_by_hour` dictionary stores the number of posts created for each hour in a given day (hour : number_of_posts). The `comments_by_hour` dictionary is similar, but showcases the total number of comments received by all posts created during that hour of the day (hour : total_number_of_comments_for_all_posts).
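As a minimal sketch of the parsing step on its own, the example below converts two `created_at` values copied from the dataset into hours. The `.hour` attribute is shown as an integer alternative to `.strftime("%H")`; the notebook itself uses the string form.

```python
import datetime as dt

date_format = "%m/%d/%Y %H:%M"

# created_at values copied from rows shown earlier in the notebook.
first = dt.datetime.strptime("8/4/2016 11:52", date_format)
second = dt.datetime.strptime("8/16/2016 9:55", date_format)

hour_as_string = first.strftime("%H")    # zero-padded string
hour_as_int = first.hour                 # integer alternative
padded_hour = second.strftime("%H")      # single-digit hours are padded

print(hour_as_string, hour_as_int, padded_hour)
```

Zero-padded strings are convenient as dictionary keys here because they sort in chronological order when sorted as text.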

---

*Ideas from functional programming were implemented in the code below. The `post_creation_and_comments` function is supplied as the default value of the `func` keyword argument of the following `posts_by_hour` function, enabling `posts_by_hour` to be used directly on the original dataset to summarise either posts by hour or comments by hour without the need to write additional code.*

In [12]:
import datetime as dt

def post_creation_and_comments(dataset):  # a function that returns a list of lists containing only the post creation date and number of comments for a given dataset
  results_list =  []
  
  for row in dataset:
    creation_date = row[-1]
    num_comments = int(row[-3])
    
    results_list.append([creation_date, num_comments])
    
  return results_list 

The first five creation dates and times and number of comments for posts in the `ask_posts` dataset outputted from the `post_creation_and_comments` function:

In [13]:
post_creation_and_comments(ask_posts)[:5]

[['8/16/2016 9:55', 6],
 ['11/22/2015 13:43', 29],
 ['5/2/2016 10:14', 1],
 ['8/2/2016 14:20', 3],
 ['10/15/2015 16:38', 17]]

In [14]:
# the posts_by_hour function returns the number of posts created at each hour of the day. 
# if the 'comments' parameter is set to True, the function returns the number of comments received on all posts created at that hour
def posts_by_hour(dataset, func = post_creation_and_comments, comments = False): 
  counts_by_hour = {}
  comments_by_hour = {}
  
  post_info = func(dataset) # calls the function passed as 'func' (post_creation_and_comments by default) to get the creation date/time and number of comments for each post in the specified dataset
  
  for element in post_info:
    creation_date = element[0]
    num_comments = element[1]
    date_format = "%m/%d/%Y %H:%M"
    creation_hour = dt.datetime.strptime(creation_date, date_format).strftime("%H") # parses the creation date string into a datetime object and extracts the hour as a zero-padded string
    
    if creation_hour not in counts_by_hour: # iterate through the creation hours for all posts and count the number of posts and number of comments by hour
      counts_by_hour[creation_hour] = 1 
      comments_by_hour[creation_hour] = num_comments
      
    else:
      counts_by_hour[creation_hour] += 1
      comments_by_hour[creation_hour] += num_comments
      
  if comments: # if the 'comments' parameter is set to True, function outputs the number of comments by hour
    return comments_by_hour
  
  else: # if the 'comments' parameter is set to False, function outputs the number of posts by hour
    return counts_by_hour
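The idea of passing a function as an argument can be shown in isolation. In the sketch below, `count_by`, `month_of` and `hour_of` are hypothetical names introduced for illustration and are not part of the notebook; the sample pairs are copied from the `post_creation_and_comments(ask_posts)[:5]` output above.

```python
def count_by(pairs, key_func):
    """Count entries per key, where key_func maps an entry to its key."""
    counts = {}
    for pair in pairs:
        key = key_func(pair)
        counts[key] = counts.get(key, 0) + 1
    return counts

def month_of(pair):
    """Month portion of a 'm/d/Y H:M' creation string."""
    return pair[0].split("/")[0]

def hour_of(pair):
    """Hour portion of a 'm/d/Y H:M' creation string (not zero-padded)."""
    return pair[0].split()[1].split(":")[0]

sample_pairs = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 10:14', 1],
    ['8/2/2016 14:20', 3],
    ['10/15/2015 16:38', 17],
]

# The same counting function is reused with two different key functions.
print(count_by(sample_pairs, month_of))
print(count_by(sample_pairs, hour_of))
```

Because the grouping rule is an argument, the counting logic is written once and reused, which is the same motivation behind the `func` parameter above.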


The result of passing the `ask_posts` dataset to the `posts_by_hour` function. The 'comments' parameter is set to `False`, meaning the dictionary returned shows the number of posts created at the specified hour of the day.

In [15]:
posts_by_hour(ask_posts, comments=False)

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}

As above, but the 'comments' parameter is set to `True`, meaning the dictionary returned shows the total number of comments for all posts created during the specified hour of the day.

In [16]:
posts_by_hour(ask_posts, comments=True)

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

## Organising the Results

To improve the presentation of these results, further functions are defined to organise the `key : value` pairs (with the midnight hour `00` appearing at the top of the output through to 11 pm `23` at the bottom).

Dictionaries preserve insertion order (guaranteed since Python 3.7), but they are not sorted by key, so the hours appear in the order in which they were first encountered in the dataset. The `key : value` pairs from each dictionary are therefore collected into a list of tuples, and the `sorted()` function is applied to arrange the results by hour.

In [17]:
def ordered_posts_by_hour(dataset, func = posts_by_hour): # outputs the number of posts by hour in an ordered list
  posts_by_hour = func(dataset) # calls the posts_by_hour function on the specified dataset, returning a dictionary like the one above
  posts_by_hour_display = []
  
  for key in posts_by_hour: # converts key:value pairs from the posts_by_hour dictionary into tuples and adds them to a list
    key_val_as_tuple = (key, posts_by_hour[key]) 
    posts_by_hour_display.append(key_val_as_tuple)
    
  posts_by_hour_sorted = sorted(posts_by_hour_display, reverse = False) # orders the list of tuples by hour; done once, after the loop, so an empty dataset does not raise a NameError
    
  for entry in posts_by_hour_sorted: # iterates over the ordered list of tuples and outputs the key and value in the specified format
    print(entry[0], ":", entry[1])
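The ordering step can also be sketched more compactly with `sorted()` applied to `dict.items()`, which already yields `(key, value)` tuples. This is an alternative to the function above rather than the notebook's own method; the sample values are copied from the Ask HN post counts shown earlier.

```python
# A few hour : post-count pairs copied from the Ask HN output above.
sample_counts = {'09': 45, '13': 85, '10': 59, '02': 58, '00': 55}

# items() yields (hour, count) tuples; tuples sort by their first element.
sorted_pairs = sorted(sample_counts.items())

for hour, count in sorted_pairs:
    print(hour, ":", count)
```

This relies on the hours being zero-padded strings, so that text ordering matches chronological ordering.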

### Ask HN - Daily Post Frequency and Comments Received

In [18]:
print("A breakdown of the number of Ask HN posts created at each hour of the day: \n")

ordered_posts_by_hour(ask_posts)

A breakdown of the number of Ask HN posts created at each hour of the day: 

00 : 55
01 : 60
02 : 58
03 : 54
04 : 47
05 : 46
06 : 44
07 : 34
08 : 48
09 : 45
10 : 59
11 : 58
12 : 73
13 : 85
14 : 107
15 : 116
16 : 108
17 : 100
18 : 109
19 : 110
20 : 80
21 : 109
22 : 71
23 : 68


These results indicate a greater number of Ask HN posts are submitted onto the site during the early afternoon to the late evening (between the hours of 2pm - 9pm EST).

To provide the same functionality for comments by hour, a new function `comments_by_hour` is defined as a thin wrapper around `posts_by_hour`, with the `comments` parameter defaulting to `True`. The ordering logic from the code block above can then be reused to generate an ordered list of comments by hour.

In [19]:
def comments_by_hour(dataset, comments = True): # outputs the number of comments by hour
  totals_by_hour = posts_by_hour(dataset, comments = comments) # calls the posts_by_hour function on the specified dataset, with the comments parameter defaulting to True
  
  return totals_by_hour

In [20]:
def ordered_comments_by_hour(dataset, func = comments_by_hour): # outputs the number of comments by hour in an ordered list using the same logic as previous
  posts_by_hour = func(dataset)
  posts_by_hour_display = []
  
  for key in posts_by_hour:
    key_val_as_tuple = (key, posts_by_hour[key])
    posts_by_hour_display.append(key_val_as_tuple)
    
  posts_by_hour_sorted = sorted(posts_by_hour_display, reverse = False) # sort once, after all tuples have been collected
    
  for entry in posts_by_hour_sorted:
    print(entry[0], ":", entry[1])

In [21]:
print("A breakdown of the number of comments Ask HN posts created at each hour of the day received: \n")

ordered_comments_by_hour(ask_posts)

A breakdown of the number of comments Ask HN posts created at each hour of the day received: 

00 : 447
01 : 683
02 : 1381
03 : 421
04 : 337
05 : 464
06 : 397
07 : 267
08 : 492
09 : 251
10 : 793
11 : 641
12 : 687
13 : 1253
14 : 1416
15 : 4477
16 : 1814
17 : 1146
18 : 1439
19 : 1188
20 : 1722
21 : 1745
22 : 479
23 : 543


A similar pattern seems to exist when looking at the total number of comments `Ask HN` posts received. Those that were authored in the early afternoon to late evening (1pm - 9pm EST) received the greatest number of comments.

This is partly because of the elevated number of `Ask HN` posts created during these hours, but it likely also coincides with the times when the greatest number of users are engaging with content on the site.


### Show HN - Daily Post Frequency and Comments Received

In [22]:
print("A breakdown of the number of Show HN posts created at each hour of the day: \n")

ordered_posts_by_hour(show_posts)

A breakdown of the number of Show HN posts created at each hour of the day: 

00 : 31
01 : 28
02 : 30
03 : 27
04 : 26
05 : 19
06 : 16
07 : 26
08 : 34
09 : 30
10 : 36
11 : 44
12 : 61
13 : 99
14 : 86
15 : 78
16 : 93
17 : 93
18 : 61
19 : 55
20 : 60
21 : 47
22 : 46
23 : 36


Comparing the `Show HN` posts with the `Ask HN` posts, a similar trend is uncovered, with the majority of posts being authored in the early afternoon to early evening (1pm - 5pm EST).

In [23]:
print("A breakdown of the number of comments Show HN posts created at each hour of the day received: \n")

ordered_comments_by_hour(show_posts)

A breakdown of the number of comments Show HN posts created at each hour of the day received: 

00 : 487
01 : 246
02 : 127
03 : 287
04 : 247
05 : 58
06 : 142
07 : 299
08 : 165
09 : 291
10 : 297
11 : 491
12 : 720
13 : 946
14 : 1156
15 : 632
16 : 1084
17 : 911
18 : 962
19 : 539
20 : 612
21 : 272
22 : 570
23 : 447


Extracting the total number of comments for `Show HN` posts authored each hour suggests there are fewer hours during the day in which `Show HN` posts attract a larger-than-average number of comments. Posts submitted at 2pm and 4pm EST in particular acquire a larger share of the total comments, with more notable drop-offs outside of this range.

This is partly because there are a greater number of `Show HN` posts being authored during these times, but is likely also because these times are when the largest number of users are engaging with content on the site.

### Calculating the Average Number of Comments for Posts by Hour

The `avg_number_comments_by_hour` function takes a `dataset` parameter and accepts the `comments_by_hour` function as a default keyword argument. The dictionary returned by calling `comments_by_hour` on the given dataset is assigned to the variable `comments_by_hour`.

The function then iterates over the `comments_by_hour` dictionary, appending each hour together with its average number of comments (the total comments for that hour divided by the number of posts for that hour, i.e. `comments_by_hour[key] / posts_by_hour(dataset)[key]`).

In [24]:
def avg_number_comments_by_hour(dataset, func=comments_by_hour): # outputs the average number of comments for each hour
  comments_by_hour = func(dataset) # calls the comments_by_hour function on the specified dataset
  avg_by_hour = []
  
  counts_by_hour = posts_by_hour(dataset) # number of posts per hour, computed once rather than on every iteration
  
  for key in comments_by_hour:
    avg_by_hour.append([key, comments_by_hour[key] / counts_by_hour[key]]) # iterates through the comments_by_hour dictionary and calculates the average number of comments for each hour
    
  return avg_by_hour

avg_number_comments_by_hour(ask_posts)

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]
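The averaging step can be checked by hand on a few hours. The sketch below copies three hours from the two Ask HN dictionaries shown earlier and divides total comments by post count; the variable names are introduced here for illustration only.

```python
# Totals copied from the Ask HN comments-by-hour and posts-by-hour outputs.
sample_comments = {'15': 4477, '02': 1381, '09': 251}
sample_posts = {'15': 116, '02': 58, '09': 45}

# Average comments per post for each hour present in both dictionaries.
sample_avgs = {
    hour: sample_comments[hour] / sample_posts[hour]
    for hour in sample_comments
}

for hour, avg in sample_avgs.items():
    print(hour, f"{avg:.2f}")
```

The results agree with the corresponding entries in the output above (38.59, 23.81 and 5.58 when rounded to two decimal places).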

The output generated by the above `avg_number_comments_by_hour` function is then formatted by reversing the items indexed within each inner list and sorting the inner lists such that the greatest average is placed at the start of the list, down to the smallest average at the end of the list.

In [25]:
def ordered_avg_comments_by_hour(dataset, func=avg_number_comments_by_hour): # outputs the average number of comments for each hour in an ordered list, again using the sorted() function
  avg_by_hour = func(dataset)
  reversed_avg_by_hour = []
  
  for entry in avg_by_hour:
    hour = entry[0]
    avg_comments = entry[1]
    reversed_avg_by_hour.append([avg_comments, hour])
    
  sorted_avg_by_hour = sorted(reversed_avg_by_hour, reverse = True) # sort once, after the loop, so an empty dataset returns an empty list
    
  return sorted_avg_by_hour
    
ordered_avg_comments_by_hour(ask_posts)

[[38.5948275862069, '15'],
 [23.810344827586206, '02'],
 [21.525, '20'],
 [16.796296296296298, '16'],
 [16.009174311926607, '21'],
 [14.741176470588234, '13'],
 [13.440677966101696, '10'],
 [13.233644859813085, '14'],
 [13.20183486238532, '18'],
 [11.46, '17'],
 [11.383333333333333, '01'],
 [11.051724137931034, '11'],
 [10.8, '19'],
 [10.25, '08'],
 [10.08695652173913, '05'],
 [9.41095890410959, '12'],
 [9.022727272727273, '06'],
 [8.127272727272727, '00'],
 [7.985294117647059, '23'],
 [7.852941176470588, '07'],
 [7.796296296296297, '03'],
 [7.170212765957447, '04'],
 [6.746478873239437, '22'],
 [5.5777777777777775, '09']]
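Swapping the items in each inner list is one way to sort by the average. As an alternative sketch (not the approach used above), `sorted()` accepts a `key` function, which ranks the `[hour, average]` lists directly without reversing them. The sample entries are copied from the `avg_number_comments_by_hour(ask_posts)` output.

```python
# [hour, average] entries copied from the Ask HN output above.
sample_avg_by_hour = [
    ['09', 5.5777777777777775],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
    ['20', 21.525],
]

# Sort by the second item (the average), largest first.
ranked = sorted(sample_avg_by_hour, key=lambda entry: entry[1], reverse=True)

for hour, avg in ranked:
    print(hour, f"{avg:.2f}")
```

With this form the inner lists keep their `[hour, average]` layout, so later code would unpack them as `for hour, avg in ...` rather than `for avg, hour in ...`.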

The final results for the `Ask HN` and `Show HN` datasets are summarised below, returning the top 5 most active hours for comments on each type of post.

In [26]:
print("The top 5 hours for comments on 'Ask HN' posts: \n")
for avg, hour in ordered_avg_comments_by_hour(ask_posts)[:5]:
  print(f"- {dt.datetime.strptime(hour, '%H').strftime('%H:%M')}: {avg:.2f} average comments per post") # converts the hour to a string in the desired format and prints the associated average
  print("\n")

The top 5 hours for comments on 'Ask HN' posts: 

- 15:00: 38.59 average comments per post


- 02:00: 23.81 average comments per post


- 20:00: 21.52 average comments per post


- 16:00: 16.80 average comments per post


- 21:00: 16.01 average comments per post




Posts created at 3pm EST generate the highest number of comments on average (at 38.59) for posts falling in the `Ask HN` category. This is roughly 62% more than the average for the second most active hour (2am EST, at 23.81 average comments per post).
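The relative difference between the two hours can be verified from the totals reported earlier (4477 comments across 116 posts at 3pm, and 1381 comments across 58 posts at 2am). The variable names below are introduced for illustration only.

```python
# Totals copied from the Ask HN comments-by-hour and posts-by-hour outputs.
top_avg = 4477 / 116      # 3pm EST
second_avg = 1381 / 58    # 2am EST

# Percentage increase of the top hour over the second most active hour.
pct_increase = (top_avg - second_avg) / second_avg * 100

print(f"{top_avg:.2f} vs {second_avg:.2f}: {pct_increase:.1f}% higher")
```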

In [27]:
print("The top 5 hours for comments on 'Show HN' posts: \n")
for avg, hour in ordered_avg_comments_by_hour(show_posts)[:5]:
  print(f"- {dt.datetime.strptime(hour, '%H').strftime('%H:%M')}: {avg:.2f} average comments per post")
  print("\n")

The top 5 hours for comments on 'Show HN' posts: 

- 18:00: 15.77 average comments per post


- 00:00: 15.71 average comments per post


- 14:00: 13.44 average comments per post


- 23:00: 12.42 average comments per post


- 22:00: 12.39 average comments per post




Turning our attention to the `Show HN` posts, we see a marked decrease in the average number of comments per post when compared to the `Ask HN` category.

Unlike `Ask HN` posts, the highest average number of comments for `Show HN` posts appears on posts created at 6pm EST (15.77 average comments per post). This is very closely followed by posts created at 12am EST (15.71).

The reasoning for this difference between the two styles of post is not clear based on this simple analysis alone, but the pattern itself is interesting.

In [28]:
print("Hourly breakdown of average comments on 'other' posts: \n")
for avg, hour in ordered_avg_comments_by_hour(other_posts):
  print(f"- {dt.datetime.strptime(hour, '%H').strftime('%H:%M')}: {avg:.2f} average comments per post")
  print("\n")

Hourly breakdown of average comments on 'other' posts: 

- 14:00: 32.33 average comments per post


- 13:00: 30.90 average comments per post


- 12:00: 30.35 average comments per post


- 11:00: 29.59 average comments per post


- 15:00: 29.52 average comments per post


- 17:00: 28.00 average comments per post


- 02:00: 27.79 average comments per post


- 09:00: 27.59 average comments per post


- 00:00: 27.08 average comments per post


- 08:00: 27.03 average comments per post


- 18:00: 26.92 average comments per post


- 03:00: 26.83 average comments per post


- 07:00: 26.81 average comments per post


- 19:00: 26.70 average comments per post


- 10:00: 26.61 average comments per post


- 16:00: 25.39 average comments per post


- 05:00: 25.18 average comments per post


- 23:00: 24.62 average comments per post


- 04:00: 24.13 average comments per post


- 21:00: 23.61 average comments per post


- 22:00: 23.27 average comments per post


- 20:00: 23.14 average comments per post

Finally, briefly examining the roughly 17,000 posts belonging to the `other` dataset (posts that do not fall into either Ask or Show), we see that the average number of comments remains fairly consistent across the hours of the day, ranging from 23.14 to 32.33 comments per post in the breakdown above.
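One rough way to quantify this consistency is to compare the ratio of the highest to the lowest hourly average in each category. The sketch below uses rounded values copied from the outputs above; the max-to-min ratio is a simple illustrative measure chosen here, not one used elsewhere in the analysis.

```python
# Highest and lowest hourly averages, copied (rounded) from the outputs above.
ask_max, ask_min = 38.59, 5.58        # Ask HN: 3pm and 9am
other_max, other_min = 32.33, 23.14   # other posts: 2pm and 8pm

ask_ratio = ask_max / ask_min
other_ratio = other_max / other_min

print(f"Ask HN max/min ratio: {ask_ratio:.1f}")
print(f"Other max/min ratio:  {other_ratio:.1f}")
```

A ratio close to 1 means the hour of posting makes little difference, so the much smaller ratio for `other` posts supports the observation that their engagement is steadier across the day.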

### Conclusions

Based on this analysis, **of the posts that received comments**, the best way to maximize the likelihood of a large number of comments on an HN post appears to be authoring an `Ask HN` post and publishing it between 3pm - 4pm EST.

`Show HN` posts appear to produce a lower number of comments on average regardless of the hour the post is published.

Additionally, posts that do not fall into either the Ask or Show categories appear to receive more consistent engagement, regardless of the time of day they are published.