![image](https://computerworld.com.br/wp-content/uploads/2020/02/Setor-de-Health-Tech-cresce-no-Brasil.jpg)
# Best Time to Post Technology Content Online

## Introduction

Technology evolves quickly, and those who follow the subject need to keep up to date constantly. Professionals and journalists who produce technology content should not only provide relevant information, but also choose the right medium and time to share it.

The objective of this analysis is to verify the best time of the day to post technology content online, in order to maximize the level of reach, engagement and comments.

We analyzed a dataset of posts from the technology website Hacker News, which allows authors to post questions, projects and products. The website's users, mainly technology and start-up professionals, can comment and vote on the content. The dataset was downloaded from [Kaggle](https://www.kaggle.com/) and can be found [here](https://www.kaggle.com/keplaxo/hacker-news).

The analysis reveals that new posts tend to reach the highest level of engagement at 3 p.m. (Eastern Time in the U.S.). At this hour, posts tend to receive the most comments on average from the community.

## Data preparation

We start the analysis by opening and exploring the datafile.

In [1]:
# open the Hacker News file and read it into a list of lists
from csv import reader

# the with block closes the file once it has been read
with open('hacker_news.csv') as opened:
    hn = list(reader(opened))

In the code below, we show the dataset columns.

In [2]:
# store the first row as the header and print it
hn_header = hn[0]
print(hn_header)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


We have 7 columns:
- **'id'**: the identification of the post.
- **'title'**: the title of the post.
- **'url'**: the url of the post.
- **'num_points'**: the total number of points the post received from votes.
- **'num_comments'**: number of comments received by the post.
- **'author'**: the author name.
- **'created_at'**: the date and time of the post.

Then, we remove the column names and verify the total number of entries.

In [3]:
# remove the header row from the dataset
hn = hn[1:]

In [4]:
# verify the number of rows in the dataset
num_rows = len(hn)
print(num_rows)

20100


The datafile includes a total of 20,100 rows. In the code below, we can see the first five rows of the file. All values are of string type.

In [5]:
# display the first 5 rows of the dataset
for row in hn[:5]:
    print(row)
    print("\n")

['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52']


['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30']


['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']


['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']


['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']
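
Because every field is read as a string, the numeric and date columns must be converted before any arithmetic. The sketch below is illustrative only: it hard-codes the first row printed above and converts it into typed values. The `parse_row` helper is a hypothetical name and is not used elsewhere in this notebook.

```python
import datetime as dt

# First data row from the output above, hard-coded for illustration
sample_row = ['12224879', 'Interactive Dynamic Video',
              'http://www.interactivedynamicvideo.com/',
              '386', '52', 'ne0phyte', '8/4/2016 11:52']

def parse_row(row):
    """Return a dict with numeric and date fields converted from strings."""
    return {
        'id': int(row[0]),
        'title': row[1],
        'url': row[2],
        'num_points': int(row[3]),
        'num_comments': int(row[4]),
        'author': row[5],
        # The dataset stores dates as month/day/year hour:minute
        'created_at': dt.datetime.strptime(row[6], '%m/%d/%Y %H:%M'),
    }

parsed = parse_row(sample_row)
print(parsed['num_comments'], parsed['created_at'].hour)
```

The rest of the analysis converts fields one at a time with `int()` and `strptime`, which amounts to the same thing.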




We then check whether any post has no comments. The output below shows that the datafile includes only posts that have at least one comment.

In [6]:
# verify the number of entries with no comments
num_zero_comment = 0
for row in hn:
    comment = int(row[4])
    if comment == 0:
        num_zero_comment += 1
        
print(num_zero_comment)

0


## Analysis

The Hacker News website allows for different types of posts.
- **Ask Posts** are questions put to the community.
- **Show Posts** include projects, news or products.
- **Other Posts** comprise any other type of post.

Next, we split the dataset in three, according to the type of post.

In [7]:
# create three separate lists with different types of entries: Ask HN, Show HN and other rows.
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
        
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
        
    else:
        other_posts.append(row)
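
The prefix test in the loop above can be isolated in a small function, which makes the case-insensitive matching easier to check by hand. This is a sketch under assumptions: `post_type` is a hypothetical helper, and the three titles are made up for illustration rather than taken from the dataset.

```python
def post_type(title):
    """Classify a title as 'ask', 'show' or 'other' by its prefix."""
    lowered = title.lower()  # makes the prefix check case-insensitive
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Made-up titles, for illustration only
titles = ['Ask HN: How do you learn?',
          'SHOW HN: My side project',
          'Interactive Dynamic Video']
labels = [post_type(t) for t in titles]
print(labels)
```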

In [8]:
# checking the number of posts in ask_posts, show_posts and other_posts
total_ask = len(ask_posts)
total_show = len(show_posts)
total_other = len(other_posts)

print(total_ask, total_show, total_other)  # number of entries in each list
print(total_ask + total_show + total_other)  # the three lists should add up to 20,100

1744 1162 17194
20100


The output shows 1,744 Ask Posts, 1,162 Show Posts and 17,194 Other Posts, so Ask Posts outnumber Show Posts.

We continue our analysis calculating the average number of comments for Ask and Show posts.

### Average Number of Comments per Post Type

In [9]:
# Calculate the average number of comments in ask posts
total_ask_comments = 0
length = 0
for row in ask_posts:
    length += 1
    num_comments = int(row[4])
    total_ask_comments += num_comments

avg_ask_comments =  total_ask_comments / length 

# Calculate the average number of comments in show posts
total_show_comments = 0
length_show = 0
for row in show_posts:
    length_show += 1
    num_comments_show = int(row[4])
    total_show_comments += num_comments_show

avg_show_comments = total_show_comments / length_show

print('The average number of comments for Ask HN is ' + str(round(avg_ask_comments,2)))
print('The average number of comments for Show HN is ' + str(round(avg_show_comments,2)))

The average number of comments for Ask HN is 14.04
The average number of comments for Show HN is 10.32


As we can see above, Ask Posts receive on average more comments than Show Posts. We will focus on Ask Posts to continue our analysis. 
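
The two averages above follow the same pattern, so they could also be computed with one small function built on `sum` and `len`. The sketch below is an alternative, not the code used in this notebook: `average_comments` is a hypothetical helper, and the sample rows are made up, with only the `num_comments` field (index 4) filled in.

```python
def average_comments(rows):
    """Mean of the num_comments column (index 4); 0.0 for an empty list."""
    if not rows:
        return 0.0  # avoid dividing by zero when there are no posts
    return sum(int(row[4]) for row in rows) / len(rows)

# Made-up rows: only the num_comments field matters here
sample = [
    ['', '', '', '', '6', '', ''],
    ['', '', '', '', '29', '', ''],
    ['', '', '', '', '1', '', ''],
]
print(round(average_comments(sample), 2))
```

Calling `average_comments(ask_posts)` and `average_comments(show_posts)` would then replace the two loops.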

Next, we create a new list with two pieces of information per post: the creation date and the number of comments.

In [10]:
# create a list of lists of two elements: the date and the number of comments
import datetime as dt
result_list = []
for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    result_list.append([created_at, num_comments])

Then, we calculate the total number of posts per hour and also the total number of comments per hour.

### Number of Posts and Comments per Time of the Day

In [11]:
# create two dictionaries: number of posts per hour and number of comments per hour
counts_by_hour = {}
comments_by_hour = {}
for row in result_list:
    date = row[0]
    date_dt = dt.datetime.strptime(date, '%m/%d/%Y %H:%M')
    hour = date_dt.strftime('%H')
    comments = int(row[1])
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comments
        
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comments
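
The same aggregation can be written with `collections.defaultdict`, which removes the explicit check for whether an hour has been seen before. This is a minimal sketch rather than the notebook's own code: the `[created_at, num_comments]` pairs are made up, and the names `counts` and `comments` are chosen only for this example.

```python
import datetime as dt
from collections import defaultdict

# Made-up [created_at, num_comments] pairs in the dataset's date format
sample_results = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 9:14', 1],
]

counts = defaultdict(int)    # number of posts per hour
comments = defaultdict(int)  # number of comments per hour

for created_at, num_comments in sample_results:
    # Parse the timestamp and keep the zero-padded hour as the key
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```

Missing keys start at 0, so both dictionaries can be updated with `+=` from the first post of each hour.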

In [12]:
# create a list of lists with two elements: first the hour and second the average number of comments
avg_by_hour = []
for hour in counts_by_hour:
    avg_by_hour.append([hour, round((comments_by_hour[hour] / counts_by_hour[hour]), 2)])

Below, we print the total number of Ask Posts per hour and the total number of comments on Ask Posts per hour. We see that 15:00 is the hour with the highest number of posts, 116.

#### Posts per Time of the Day

In [13]:
# print counts by hour
counts_by_hour

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}

The 15:00 hour also has the highest total number of comments, 4,477.

#### Comments per Time of the Day

In [14]:
# print comments by hour
comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

Below, we print the average number of comments per post for each hour. The highest average is reached at 15:00, with 38.59 comments per post.

### Average Number of Comments per Time of the Day

In [15]:
# print avg_by_hour
for hour in avg_by_hour:
    print(hour)

['09', 5.58]
['13', 14.74]
['10', 13.44]
['14', 13.23]
['16', 16.8]
['23', 7.99]
['12', 9.41]
['17', 11.46]
['15', 38.59]
['21', 16.01]
['20', 21.52]
['02', 23.81]
['18', 13.2]
['03', 7.8]
['05', 10.09]
['19', 10.8]
['01', 11.38]
['22', 6.75]
['08', 10.25]
['04', 7.17]
['00', 8.13]
['06', 9.02]
['07', 7.85]
['11', 11.05]


We can also sort the hours according to the average number of comments. To do that, we start by creating a new list including two elements, the average number of comments and the hour.

In [16]:
# swap the columns so the average comes first, which lets us sort by it
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
print(swap_avg_by_hour)

[[5.58, '09'], [14.74, '13'], [13.44, '10'], [13.23, '14'], [16.8, '16'], [7.99, '23'], [9.41, '12'], [11.46, '17'], [38.59, '15'], [16.01, '21'], [21.52, '20'], [23.81, '02'], [13.2, '18'], [7.8, '03'], [10.09, '05'], [10.8, '19'], [11.38, '01'], [6.75, '22'], [10.25, '08'], [7.17, '04'], [8.13, '00'], [9.02, '06'], [7.85, '07'], [11.05, '11']]


Then, we can order this list according to the average number of comments.

In [17]:
# sort the swapped list by average number of comments, highest first
sorted_swap = sorted(swap_avg_by_hour, reverse = True)
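
Swapping the columns is one way to sort by the average. An alternative sketch, not used in this notebook, passes a `key` function to `sorted` so the `[hour, average]` pairs keep their original order. The three pairs are copied from the averages printed earlier; `avg_sample` and `sorted_by_avg` are names chosen only for this example.

```python
# A few [hour, average] pairs copied from the averages printed earlier
avg_sample = [['09', 5.58], ['15', 38.59], ['02', 23.81]]

# Sort by the second element (the average), highest first
sorted_by_avg = sorted(avg_sample, key=lambda pair: pair[1], reverse=True)

for hour, avg in sorted_by_avg:
    print('{}:00: {:.2f} average comments per post'.format(hour, avg))
```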

In [18]:
print("Top 5 Hours for Ask Posts Comments")

Top 5 Hours for Ask Posts Comments


Finally, we print the five hours with the highest average number of comments. As expected, 15:00 ranks first with 38.59 comments per post on average, followed by 02:00 and 20:00 with 23.81 and 21.52, respectively.

In [19]:
# show the top 5 hours
# each line follows the format '15:00: 38.59 average comments per post'
for item in sorted_swap[:5]:
    hour = item[1]
    avg = item[0]
    format_str = '{}:00: {:.2f} average comments per post'
    print(format_str.format(hour, avg))

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post


## Conclusion

The aim of this project was to shed light on the best moments to post technology content online. This includes finding the best time of the day to capture users' attention and generate engagement. Our approach was to analyze a dataset from Hacker News, a popular technology website on which users can post questions and projects.

The analysis showed that Ask Posts created at 15:00 (Eastern Time in the US) tend to generate the highest average number of comments. Therefore, this seems to be a good time of the day for posting content online to engage with the technology community.