## Top Three Best Times to Get YOUR Message Out on Hacker News

In this project, we'll find the top three times to post on Hacker News for optimal engagement. Our data set consists of Hacker News submissions.

The working data set was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments. Most of our courses are on data analytics/data science, and I have expanded learning to include cloud interaction and Power BI. 
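The notebook starts from the already-reduced file, so the filtering step itself isn't shown. A minimal sketch of how that reduction could be done, assuming `num_comments` is the fifth column (index 4) as in the header printed below. The sample rows and the `drop_uncommented` helper are hypothetical, for illustration only.

```python
import csv
import io

# Hypothetical sample in the same column layout as hacker_news.csv
# (id, title, url, num_points, num_comments, author, created_at).
SAMPLE = """id,title,url,num_points,num_comments,author,created_at
1,Ask HN: Example one,,2,6,alice,8/16/2016 9:55
2,Show HN: Example two,http://example.com,1,0,bob,4/28/2016 18:05
3,Some article,http://example.org,5,3,carol,1/26/2016 19:30
"""

def drop_uncommented(rows):
    """Hypothetical helper: keep only rows whose num_comments (index 4) is above zero."""
    return [row for row in rows if int(row[4]) > 0]

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, body = rows[0], rows[1:]
commented = drop_uncommented(body)
print(len(body), "->", len(commented))  # 3 -> 2
```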

The goal is to analyze posts that already attracted comments to find the best placement for optimal engagement. To make our recommendation, we'll compare Ask HN and Show HN posts to determine the following:
* Do Ask HN or Show HN receive more comments on average? 
* Do posts created at a specific time receive more comments on average? 
 
I collected and sorted the data for the two engagement topics, Ask HN and Show HN, to:
* Determine which engagement topic receives more comments on average.
* Determine whether posts created at a particular time receive more comments on average.
* Propose a strategy for increased exposure based on analyzed data. 
 
#### Summary of Results 

Among Ask HN posts that received comments, those created at 3:00 p.m., 2:00 a.m., and 8:00 p.m. Eastern Time drew the most engagement, averaging 38.59, 23.81, and 21.52 comments per post, compared with 14.04 for Ask HN overall. Posting at these times, combined with backlinks, announcing the posting schedule ahead of time, and publishing in a series, may help maximize engagement.

### Exploring the raw data before cleaning

Since we're only concerned with post titles beginning with Ask HN or Show HN, we'll:
* Create new lists of lists containing just the data for those titles
* Determine the average number of comments for each title type
* Determine which of the two to analyze further



In [1]:
from csv import reader

# Read the whole CSV into a list of lists; the with-block closes the file afterwards
with open('hacker_news.csv') as opened_file:
    hn=list(reader(opened_file))

# Extract the header row from the raw data
headers=hn[0]
hn=hn[1:]

# Display the header and the first three rows to verify the header was removed
print(headers)
print()
print(hn[:3])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20']]
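As an alternative sketch (not the approach used in this notebook), `csv.DictReader` treats the header row as field names, so columns can be read by name instead of by index. The two sample rows are copied from the output above; a real run would open `hacker_news.csv` instead of the in-memory string.

```python
import csv
import io

# Two rows copied from the output above, with the original header.
SAMPLE = """id,title,url,num_points,num_comments,author,created_at
12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52
12296411,Ask HN: How to improve my personal website?,,2,6,ahmedbaracat,8/16/2016 9:55
"""

# DictReader consumes the header row itself, so no manual hn[0] / hn[1:] split is needed.
posts = list(csv.DictReader(io.StringIO(SAMPLE)))

for post in posts:
    print(post["title"], "->", int(post["num_comments"]), "comments")
```

Reading by name (`post["num_comments"]`) avoids silent mistakes if a column is ever added or reordered.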


### Separate posts beginning with Ask HN and Show HN

In [2]:
ask_posts=[]
show_posts=[]
other_posts=[]

# Sort each post by its lowercase title prefix; elif keeps the three groups separate
for row in hn:
    title=row[1]
    if title.lower().startswith('ask hn'):
        ask_posts.append(row)
    elif title.lower().startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)

# Verify the number of posts in each group
print("Number of Ask Posts: ", len(ask_posts))
print("Number of Show Posts: ", len(show_posts))
print("Other posts total: ", len(other_posts))

Number of Ask Posts:  1744
Number of Show Posts:  1162
Other posts total:  17194
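With two independent `if` statements, an Ask HN post also fails the Show HN test and falls into the `else` branch, so it is counted twice. A minimal sketch of a classifier that returns exactly one label per title; `classify` is a hypothetical helper name, and the titles are taken from the sample rows in this notebook.

```python
def classify(title):
    """Hypothetical helper: return exactly one label per title."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    elif lowered.startswith("show hn"):
        return "show"
    else:
        return "other"

titles = [
    "Ask HN: How to improve my personal website?",
    "Show HN: Something pointless I made",
    "Interactive Dynamic Video",
]

counts = {}
for t in titles:
    label = classify(t)
    counts[label] = counts.get(label, 0) + 1

print(counts)  # {'ask': 1, 'show': 1, 'other': 1}
```

Because the labels are mutually exclusive, the group counts always add up to the total number of rows.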


In [3]:
print(ask_posts[:3])
print()
print(show_posts[:3])

[['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'], ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'], ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14']]

[['10627194', 'Show HN: Wio Link  ESP8266 Based Web of Things Hardware Development Platform', 'https://iot.seeed.cc', '26', '22', 'kfihihc', '11/25/2015 14:03'], ['10646440', 'Show HN: Something pointless I made', 'http://dn.ht/picklecat/', '747', '102', 'dhotson', '11/29/2015 22:46'], ['11590768', 'Show HN: Shanhu.io, a programming playground powered by e8vm', 'https://shanhu.io', '1', '1', 'h8liu', '4/28/2016 18:05']]


### Finding the Average number of comments for the two selected titles

In [4]:
total_ask_comments=0

for row in ask_posts:
    num_comments=int(row[4])
    total_ask_comments+=num_comments
      
avg_ask_comments=total_ask_comments/len(ask_posts)
avg_ask_comments=round(avg_ask_comments, 2)
print('Average number of comments for Ask HN: ',avg_ask_comments)


Average number of comments for Ask HN:  14.04


In [5]:
total_show_comments=0

for row in show_posts:
    num_comments=int(row[4])
    total_show_comments+=num_comments
      
avg_show_comments=total_show_comments/len(show_posts)
avg_show_comments=round(avg_show_comments, 2)
print('Average number of comments for Show HN: ',avg_show_comments)


Average number of comments for Show HN:  10.32
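The two cells above repeat the same loop for each list. A minimal sketch that factors it into one function, assuming the comment count sits at index 4 as in this data set; `average_comments` is a hypothetical name. The result is a count of comments per post, not a percentage.

```python
def average_comments(posts, comments_index=4):
    """Hypothetical helper: mean comments per post, rounded to two decimals."""
    total = sum(int(row[comments_index]) for row in posts)
    return round(total / len(posts), 2)

# Three Ask HN rows copied from the sample output above.
ask_sample = [
    ['12296411', 'Ask HN: How to improve my personal website?', '', '2', '6', 'ahmedbaracat', '8/16/2016 9:55'],
    ['10610020', 'Ask HN: Am I the only one outraged by Twitter shutting down share counts?', '', '28', '29', 'tkfx', '11/22/2015 13:43'],
    ['11610310', 'Ask HN: Aby recent changes to CSS that broke mobile?', '', '1', '1', 'polskibus', '5/2/2016 10:14'],
]

print(average_comments(ask_sample))  # 12.0
```

With the full lists, `average_comments(ask_posts)` and `average_comments(show_posts)` would replace the two loops.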


At this point, the data shows that Ask HN (14.04 comments per post) is the better choice to explore further than Show HN (10.32). Next, we'll group the Ask HN posts by the hour they were created, count the posts and comments in each hour, and average them to find the top five times.

### Determine if ask posts created at a certain time are more likely to attract comments. 

In [6]:
import datetime as dt
results_list=[]
for row in ask_posts:
    created_at=row[6]
    comments=int(row[4])
    results_list.append([created_at, comments])

print(results_list[:9])
    
    

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17], ['9/26/2015 23:23', 1], ['4/22/2016 12:24', 4], ['11/16/2015 9:22', 1], ['2/24/2016 17:57', 1]]


In [7]:
# Two dictionaries: the number of ask posts created in each hour and the total comments those posts received

import datetime as dt
counts_by_hour={}
comments_by_hour={}
date_format= "%m/%d/%Y %H:%M"

# Count the ask posts created during each hour of the day and total their comments
for row in results_list:
    date_hour=row[0]
    comment=row[1]
    time=dt.datetime.strptime(date_hour, date_format).strftime("%H")
    
    if time in counts_by_hour:
        counts_by_hour[time]+=1
        comments_by_hour[time]+=comment
    else:
        counts_by_hour[time]=1
        comments_by_hour[time]=comment

comments_by_hour

{'00': 447,
 '01': 683,
 '02': 1381,
 '03': 421,
 '04': 337,
 '05': 464,
 '06': 397,
 '07': 267,
 '08': 492,
 '09': 251,
 '10': 793,
 '11': 641,
 '12': 687,
 '13': 1253,
 '14': 1416,
 '15': 4477,
 '16': 1814,
 '17': 1146,
 '18': 1439,
 '19': 1188,
 '20': 1722,
 '21': 1745,
 '22': 479,
 '23': 543}
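The same grouping can be sketched with `collections.defaultdict`, which removes the `if`/`else` initialization. One difference from the cell above, labeled here as a design choice: this sketch keys on the integer `datetime.hour` (for example `9`) rather than the zero-padded string from `strftime("%H")` (for example `'09'`). `comments_by_hour_of_day` is a hypothetical name, and the sample pairs come from `results_list` above.

```python
from collections import defaultdict
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H:%M"

def comments_by_hour_of_day(results):
    """Group [created_at, comments] pairs by hour; return {hour: (post_count, comment_total)}."""
    counts = defaultdict(int)
    comments = defaultdict(int)
    for created_at, n_comments in results:
        hour = datetime.strptime(created_at, DATE_FORMAT).hour  # int from 0 to 23
        counts[hour] += 1
        comments[hour] += n_comments
    return {hour: (counts[hour], comments[hour]) for hour in counts}

# A few pairs copied from results_list above.
sample = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29],
          ['5/2/2016 10:14', 1], ['11/16/2015 9:22', 1]]

print(comments_by_hour_of_day(sample))
# {9: (2, 7), 13: (1, 29), 10: (1, 1)}
```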

In [8]:
# Calculate the average number of comments per post for each hour of the day
avg_by_hour=[]
for hour in comments_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour]/counts_by_hour[hour]])
    
avg_by_hour

[['13', 14.741176470588234],
 ['18', 13.20183486238532],
 ['14', 13.233644859813085],
 ['03', 7.796296296296297],
 ['04', 7.170212765957447],
 ['01', 11.383333333333333],
 ['15', 38.5948275862069],
 ['19', 10.8],
 ['11', 11.051724137931034],
 ['16', 16.796296296296298],
 ['22', 6.746478873239437],
 ['09', 5.5777777777777775],
 ['06', 9.022727272727273],
 ['23', 7.985294117647059],
 ['10', 13.440677966101696],
 ['07', 7.852941176470588],
 ['05', 10.08695652173913],
 ['17', 11.46],
 ['21', 16.009174311926607],
 ['08', 10.25],
 ['20', 21.525],
 ['12', 9.41095890410959],
 ['02', 23.810344827586206],
 ['00', 8.127272727272727]]

#### Sorting the list of lists and printing the five highest values in a format that's easier to read

In [9]:
# Swap each pair to [average, hour] so that sorting orders the list by average
swap_avg_by_hour=[]
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])
    
print(swap_avg_by_hour[:3])

[[14.741176470588234, '13'], [13.20183486238532, '18'], [13.233644859813085, '14']]


In [10]:
sorted_swap=sorted(swap_avg_by_hour, reverse=True)
print("The Top 5 Hours for Ask Posts Comments: ")
print()
for avg, hr in sorted_swap[:5]:
    print(
        "{}: {:.2f} average comments per post".format(
            dt.datetime.strptime(hr, "%H").strftime("%H:%M"),avg
        )
    )

The Top 5 Hours for Ask Posts Comments: 

15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
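Swapping the pairs works, but `sorted` can also sort on the average directly through its `key` argument, so no swapped copy is needed. A minimal sketch using a few `[hour, average]` pairs copied from `avg_by_hour` above; `operator.itemgetter(1)` selects the average.

```python
from datetime import time
from operator import itemgetter

# A few [hour, average] pairs copied from avg_by_hour above.
avg_by_hour_sample = [['13', 14.741176470588234], ['15', 38.5948275862069],
                      ['02', 23.810344827586206], ['20', 21.525],
                      ['09', 5.5777777777777775]]

# Sort on index 1 (the average), highest first, and keep the top three.
top = sorted(avg_by_hour_sample, key=itemgetter(1), reverse=True)[:3]

for hour, avg in top:
    label = time(hour=int(hour)).strftime("%H:%M")
    print(f"{label}: {avg:.2f} average comments per post")
```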


Ask posts created during the 15:00 hour (3:00 p.m.) received the most engagement, averaging 38.59 comments per post, about 2.7 times the overall Ask HN average of 14.04. The next best hours are 02:00 (2:00 a.m.) at 23.81 and 20:00 (8:00 p.m.) at 21.52 comments per post. All times are Eastern Time.
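The hourly figures are average comment counts per post, not probabilities, so they cannot be added together into a percentage. One simple way to compare them is the ratio of each hour's average to the overall Ask HN average. A minimal sketch using the averages printed in this notebook; `lift` is a hypothetical helper name.

```python
# Averages reported in this notebook (comments per post, Ask HN only).
overall_ask_avg = 14.04
top_hours = {"15:00": 38.59, "02:00": 23.81, "20:00": 21.52}

def lift(hour_avg, baseline):
    """Hypothetical helper: how many times the baseline average an hour's average is."""
    return hour_avg / baseline

for hour, avg in top_hours.items():
    print(f"{hour}: {lift(avg, overall_ask_avg):.2f}x the overall Ask HN average")
# 15:00: 2.75x the overall Ask HN average
# 02:00: 1.70x the overall Ask HN average
# 20:00: 1.53x the overall Ask HN average
```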

Source: [data set](https://www.kaggle.com/hacker-news/hacker-news-posts)

## Conclusion
The problem: which type of Hacker News post gets the highest engagement per post, and how should posts be timed to achieve it?
The solution: of the two post types compared, Ask HN leads Show HN by about 3.7 comments per post (14.04 vs. 10.32). Within Ask HN, the hours with the highest average comments per post are 3:00 p.m. (38.59), 2:00 a.m. (23.81), and 8:00 p.m. (21.52), Eastern Time.

Note that the data set excludes posts without comments, so these averages describe only posts that received at least one comment.
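To see why this matters, a minimal sketch with hypothetical comment counts for a single hour: dropping the zero-comment posts raises the average, so the figures above likely overstate what a typical new post at that hour would receive.

```python
# Hypothetical comment counts for one hour, including posts with no comments.
all_posts = [0, 0, 0, 4, 8]

def average(values):
    return sum(values) / len(values)

commented_only = [n for n in all_posts if n > 0]

print(average(all_posts))       # 2.4
print(average(commented_only))  # 6.0
```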

