# Exploring Hacker News Posts

This project compares two types of user-submitted stories (known as "posts") on Hacker News: **Ask HN** and **Show HN**. **Ask HN** posts are submitted by users to ask the Hacker News community a specific question. **Show HN** posts are submitted by users to show the Hacker News community a project, product, or generally something interesting.

In [2]:
# hacker_news.csv is read in

from csv import reader

# The with block closes the file automatically once the rows are read
with open('hacker_news.csv') as file:
    hn = list(reader(file)) # Convert csv to a list of lists

# The header is removed from the data set and assigned to a variable
headers = hn[0]
hn = hn[1:]
print(headers)
print("\n")
print(hn[:4])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]
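The later cells address columns by position (`row[1]`, `row[4]`, `row[6]`). As a minimal sketch, the positions could instead be looked up from the header row so the numbers are not hard-coded. The names `sample_csv`, `sample_headers`, `sample_rows`, `comments_idx`, and `created_idx` are illustrative and not part of the project, and the in-memory sample reuses two rows from the output above in place of the real file.

```python
import csv
import io

# In-memory stand-in for hacker_news.csv (two rows copied from the output above)
sample_csv = io.StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,386,52,ne0phyte,8/4/2016 11:52\n"
    "11919867,Technology ventures: From Idea to Enterprise,https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429,3,1,hswarna,6/17/2016 0:01\n"
)

rows = list(csv.reader(sample_csv))
sample_headers, sample_rows = rows[0], rows[1:]

# Look up column positions by name instead of hard-coding 4 and 6
comments_idx = sample_headers.index('num_comments')
created_idx = sample_headers.index('created_at')

for row in sample_rows:
    print(row[created_idx], int(row[comments_idx]))
```

With the header shown above, `comments_idx` is 4 and `created_idx` is 6, which matches the indexes used in the rest of the notebook.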


### Filter Data for only Ask HN & Show HN

In [3]:
# Three empty lists created
ask_posts = []
show_posts = []
other_posts = []

# Loop through data set to sort titles into Ask HN and Show HN
for row in hn:
    title = row[1] 
    title = title.lower() # Convert the title to lowercase so the prefix match is case-insensitive
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print("# Ask HN Posts:", len(ask_posts))
print("# Show HN Posts:", len(show_posts))
print("# Other Posts:", len(other_posts))

# Ask HN Posts: 1744
# Show HN Posts: 1162
# Other Posts: 17194


### Calculate the Average Number of Comments for Ask HN and Show HN Posts

In [4]:
# Total number of comments on Ask HN posts
total_ask_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])

# Average number of comments per Ask HN post
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Average # of comments per Ask HN post:", avg_ask_comments)

# Total number of comments on Show HN posts
total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4])

# Average number of comments per Show HN post
avg_show_comments = total_show_comments / len(show_posts)
print("Average # of comments per Show HN post:", avg_show_comments)

Average # of comments per Ask HN post: 14.038417431192661
Average # of comments per Show HN post: 10.31669535283993


On average, **Ask HN** posts receive about 14 comments per post and **Show HN** posts receive about 10. Because **Ask HN** posts attract more comments on average, the rest of the analysis focuses on them.
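The two averaging loops follow the same pattern, so they could be folded into one helper. The sketch below is one possible way to do that, not the notebook's own code: `average_comments` and `demo_posts` are hypothetical names, and the demo rows are made up purely to show the arithmetic.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count, reading counts from comments_index."""
    if not posts:
        return 0.0  # Avoid dividing by zero on an empty list
    return sum(int(row[comments_index]) for row in posts) / len(posts)

# Made-up rows in the same column layout as the data set (illustrative only)
demo_posts = [
    ['1', 'Ask HN: example A', '', '5', '10', 'user_a', '1/1/2016 10:00'],
    ['2', 'Ask HN: example B', '', '3', '4', 'user_b', '1/1/2016 11:00'],
]

print(average_comments(demo_posts))  # 7.0
```

Calling the helper on `ask_posts` and `show_posts` should give the same two averages as the loops above.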

### Find the Number of Ask Posts and Comments by Hour Created

In [32]:
# Import the datetime module
import datetime as dt

result_list = []
counts_by_hour = {}
comments_by_hour = {}
date_format = "%m/%d/%Y %H:%M"


# Append the time the post was created and the number of comments to a list
for row in ask_posts:
    result_list.append([row[6], int(row[4])])

# Find the number of posts created and comments the post received per hour
for row in result_list:
    date = row[0]
    comment = row[1]
    time = dt.datetime.strptime(date, date_format) # Parse the timestamp using date_format
    time = time.strftime("%H") # Only hour selected
    if time not in counts_by_hour:
        counts_by_hour[time] = 1
        comments_by_hour[time] = comment
    else:
        counts_by_hour[time] += 1
        comments_by_hour[time] += comment

print(comments_by_hour)

{'17': 1146, '04': 337, '12': 687, '20': 1722, '09': 251, '08': 492, '05': 464, '13': 1253, '07': 267, '23': 543, '06': 397, '10': 793, '18': 1439, '01': 683, '16': 1814, '14': 1416, '22': 479, '11': 641, '15': 4477, '02': 1381, '19': 1188, '21': 1745, '00': 447, '03': 421}
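To make the hour extraction concrete, here is a small sketch that parses two `created_at` values taken from the sample rows printed earlier. The variable names are illustrative only. It shows why the dictionary keys are zero-padded strings such as `'00'` even though the raw timestamps are not padded.

```python
import datetime as dt

date_format = "%m/%d/%Y %H:%M"

# Two created_at values from the sample rows shown at the top of the notebook
samples = ['8/4/2016 11:52', '6/17/2016 0:01']

hours = []
for created_at in samples:
    parsed = dt.datetime.strptime(created_at, date_format)  # string -> datetime
    hours.append(parsed.strftime("%H"))                     # datetime -> 'HH'

print(hours)  # ['11', '00']
```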


### Calculate the Average Number of Comments for Ask HN Posts by Hour

In [51]:
avg_by_hour = []

for hr, num in comments_by_hour.items():
    avg_by_hour.append([hr, num / counts_by_hour[hr]])
    
print(avg_by_hour)

[['17', 11.46], ['04', 7.170212765957447], ['12', 9.41095890410959], ['20', 21.525], ['09', 5.5777777777777775], ['08', 10.25], ['05', 10.08695652173913], ['13', 14.741176470588234], ['07', 7.852941176470588], ['23', 7.985294117647059], ['06', 9.022727272727273], ['10', 13.440677966101696], ['18', 13.20183486238532], ['01', 11.383333333333333], ['16', 16.796296296296298], ['14', 13.233644859813085], ['22', 6.746478873239437], ['11', 11.051724137931034], ['15', 38.5948275862069], ['02', 23.810344827586206], ['19', 10.8], ['21', 16.009174311926607], ['00', 8.127272727272727], ['03', 7.796296296296297]]


### Sorting and Printing Values from a List of Lists

In [63]:
swap_avg_by_hour = []

# The hour and avg # of comments columns are swapped
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse = True) # Sorts the list in descending order (highest average first)
print("Top 5 Hours for Ask posts Comments")

for avg, hr in sorted_swap[:5]:
    print("{}: {:.2f} average comments per post".format(dt.datetime.strptime(hr, "%H").strftime("%H:%M"),avg))

Top 5 Hours for Ask posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
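Swapping the columns works because `sorted` compares lists element by element. An alternative, sketched below, is to leave the `[hour, average]` pairs as they are and pass a `key` function that selects the average. `demo_avg_by_hour` and `top_hours` are hypothetical names, and the values are a four-entry subset of the averages printed above.

```python
# Subset of the [hour, average] pairs printed earlier (illustrative)
demo_avg_by_hour = [
    ['17', 11.46],
    ['20', 21.525],
    ['15', 38.5948275862069],
    ['02', 23.810344827586206],
]

# Sort by the second element (the average), highest first, without swapping
top_hours = sorted(demo_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hr, avg in top_hours[:3]:
    print("{}:00: {:.2f} average comments per post".format(hr, avg))
```

On the full `avg_by_hour` list this should produce the same ranking as the swapped version.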


The hour that receives the most comments per post on average is 15:00, with an average of 38.59 comments per post. That is about 62% higher than the second-highest hour, 02:00, which averages 23.81 comments per post.
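The size of the gap between the top two hours can be checked directly from the averages printed above. This is a small worked calculation with illustrative variable names:

```python
highest = 38.5948275862069    # average for 15:00
second = 23.810344827586206   # average for 02:00

# Relative increase of the highest average over the second highest
pct_increase = (highest - second) / second * 100

print("{:.1f}%".format(pct_increase))  # 62.1%
```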

According to the data set documentation, the timestamps are in US Eastern Time, so 15:00 can also be written as 3:00 pm EST.

### Conclusion

In this project, we analyzed Ask HN and Show HN posts to determine which type of post and which hour of creation receive the most comments on average. Based on our analysis, to maximize the number of comments a post receives, we'd recommend submitting it as an Ask HN post and creating it between 15:00 and 16:00 (3:00 pm EST - 4:00 pm EST).

However, it should be noted that the data set we analyzed excluded posts without any comments. Given that, it's more accurate to say that, of the posts that received comments, Ask HN posts received more comments on average, and Ask HN posts created between 15:00 and 16:00 (3:00 pm EST - 4:00 pm EST) received the most comments on average.