# Exploring Hacker News Posts
Hacker News is a site that is extremely popular in technology and startup circles. Posts that make it to the top of Hacker News' listings can get hundreds of thousands of visitors as a result.

Posts whose titles begin with "Ask HN" or "Show HN" are particularly interesting. "Ask HN" posts usually put a question to the Hacker News community, while "Show HN" posts present a project, a product, or just something interesting to the community.

So it is interesting to find out which of these post types receives more comments on average: the "Ask HN" posts or the "Show HN" posts? And do posts created at a certain time receive more comments on average?

Let's explore together! If you are interested in learning more about the dataset, please visit [here](https://www.kaggle.com/hacker-news/hacker-news-posts).

## Import the Hacker News dataset

In [1]:
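import csv

# Read the whole CSV into a list of rows; the context manager closes the file.
with open('hacker_news.csv', newline='') as opened_file:
    all_rows = list(csv.reader(opened_file))

headers = all_rows[0]  # First row holds the column names
hn = all_rows[1:]      # Remaining rows are the posts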

FileNotFoundError: [Errno 2] No such file or directory: 'hacker_news.csv'

In [None]:
print(headers)
print(hn[:5])

In [None]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    if title.startswith('ask hn'):
        ask_posts.append(row)
    elif title.startswith('show hn'):
        show_posts.append(row)
    else:
        other_posts.append(row)
        
print('There are {} ask posts'.format(len(ask_posts)))
print('There are {} show posts'.format(len(show_posts)))
print('There are {} other posts'.format(len(other_posts)))
    

## Header information

id: The unique identifier from Hacker News for the post

title: The title of the post

url: The URL that the post links to, if the post has a URL

num_points: The number of points the post acquired, calculated as the total number of upvotes minus the total number of downvotes

num_comments: The number of comments that were made on the post

author: The username of the person who submitted the post

created_at: The date and time at which the post was submitted
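
The code in this notebook addresses columns by position (for example `row[4]` for `num_comments`). As a minimal sketch, assuming the header row follows the field order listed above, a name-to-index lookup makes those positions explicit. The `sample_headers` and `sample_row` values are made up for illustration and are not taken from the dataset.

```python
# Hypothetical header row and data row, in the column order listed above.
sample_headers = ['id', 'title', 'url', 'num_points',
                  'num_comments', 'author', 'created_at']
sample_row = ['1', 'Ask HN: Example question?', '', '3',
              '7', 'someone', '8/16/2016 9:55']

# Map each column name to its position in a row.
col = {name: index for index, name in enumerate(sample_headers)}

num_comments = int(sample_row[col['num_comments']])
created_at = sample_row[col['created_at']]
print(col['num_comments'], num_comments, created_at)
```

With such a lookup, `row[col['num_comments']]` could stand in for `row[4]` wherever the index would otherwise be hard-coded.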

## Let's find the average number of comments in ask posts and show posts

In [None]:
def cal_num(posts, col):
    """Return the sum of the integer values in column `col` across `posts`."""
    total_comments = 0

    for row in posts:
        total_comments += int(row[col])
    return total_comments

In [None]:
total_ask_comments = cal_num(ask_posts, 4)
avg_ask_comments = total_ask_comments / len(ask_posts)
print(avg_ask_comments)

total_show_comments = cal_num(show_posts, 4)
avg_show_comments = total_show_comments / len(show_posts)
print(avg_show_comments)


Comparing the two averages above, ask posts receive more comments on average than show posts.
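
The total-then-divide steps above could also be folded into one helper. This is a minimal sketch, assuming rows shaped like those in `hn`; the function name `average_comments` and the sample rows are hypothetical.

```python
def average_comments(posts, comment_col=4):
    """Return the mean comment count in `posts`, or 0.0 if `posts` is empty."""
    if not posts:
        return 0.0
    return sum(int(row[comment_col]) for row in posts) / len(posts)

# Made-up rows: only the num_comments column (index 4) matters here.
sample_ask = [['1', 'Ask HN: a?', '', '2', '10', 'u1', '8/16/2016 9:55'],
              ['2', 'Ask HN: b?', '', '5', '4', 'u2', '8/16/2016 10:05']]
print(average_comments(sample_ask))  # 7.0
```

Returning 0.0 for an empty list is a design choice made here to avoid a `ZeroDivisionError`; raising an error would be equally reasonable.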

## To analyze the behavior of ask post comments, we would like to study:
Whether ask posts created at a certain time of day are more likely to attract comments. We will:
- Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
- Calculate the average number of comments ask posts receive by hour created.
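
The two steps above can be sketched on a handful of made-up rows. This version assumes the `created_at` values use the `month/day/year hour:minute` layout and swaps the explicit "key not in dictionary" check for `collections.defaultdict`; the `sample` data is invented for illustration.

```python
import datetime as dt
from collections import defaultdict

# Made-up (created_at, num_comments) pairs in the assumed date format.
sample = [['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29],
          ['5/2/2016 9:14', 2]]

counts = defaultdict(int)    # hour -> number of posts
comments = defaultdict(int)  # hour -> total comments

for created_at, num_comments in sample:
    hour = dt.datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts[hour] += 1
    comments[hour] += num_comments

# Average comments per post for each hour that appears in the sample.
averages = {hour: comments[hour] / counts[hour] for hour in counts}
print(averages)
```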

In [None]:
import datetime as dt

result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comment = row[4]
    result_list.append([created_at, int(num_comment)])

Find the number of posts and number of comments for each hour

In [None]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    row_dt = dt.datetime.strptime(row[0],'%m/%d/%Y %H:%M')
    Hour = row_dt.strftime('%H') # Extract hours from datetime

    if Hour not in counts_by_hour:
        counts_by_hour[Hour] = 1
        comments_by_hour[Hour] = row[1]
    else:
        counts_by_hour[Hour] += 1
        comments_by_hour[Hour] += row[1]

print(counts_by_hour)
print(comments_by_hour)

In [None]:
avg_by_hour = []
for key in counts_by_hour:
    avg = comments_by_hour[key] / counts_by_hour[key]
    avg_by_hour.append([key, avg])

Inspect the list of average comments per post for each hour

In [None]:
print(avg_by_hour)

Swap the two columns in the average-by-hour list so that the average comes first and the list can be sorted by it.

In [None]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

print(swap_avg_by_hour)

Sort the swapped list in descending order to locate the hours with the highest average number of comments per post

In [None]:
sorted_swap = sorted(swap_avg_by_hour, reverse = True)
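
Swapping the columns is one way to make `sorted` order the rows by average. An alternative sketch, using made-up values, keeps the `[hour, average]` layout and passes a `key` function instead; `sample_avg_by_hour` is hypothetical data, not output from the dataset.

```python
# Made-up [hour, average] pairs in the same layout as avg_by_hour.
sample_avg_by_hour = [['09', 5.0], ['13', 14.5], ['15', 38.0]]

# Sort by the second element (the average), highest first.
by_average = sorted(sample_avg_by_hour, key=lambda row: row[1], reverse=True)

for hour, avg in by_average:
    print('{}:00: {:.2f} average comments per post'.format(hour, avg))
```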

In [None]:
template = '{hour}: {avg:.2f} average comments per post'
for row in sorted_swap:
    hour = dt.datetime.strptime(row[1], '%H')  # Parse the hour string into a datetime
    print(template.format(hour=hour.strftime('%H:00'), avg=row[0]))


## Conclusion
Note that the times above are in Eastern Time (EST) in the US. As we live in Europe, it is useful to convert them from EST to Central European Time (CET), which is 6 hours ahead.


In [None]:
template = '{hour}: {avg:.2f} average comments per post'
for row in sorted_swap:
    hour = dt.datetime.strptime(row[1], '%H')  # Parse the hour string into a datetime
    hour = hour + dt.timedelta(hours=6)        # Shift from EST to CET (+6 hours)
    print(template.format(hour=hour.strftime('%H:00'), avg=row[0]))
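
The fixed 6-hour shift can also be written with explicit time zone objects, which makes the direction of the conversion and the wrap past midnight visible. This is a minimal sketch that assumes fixed offsets (EST as UTC-5, CET as UTC+1) and ignores daylight saving time; the helper name `est_hour_to_cet` is hypothetical.

```python
import datetime as dt

# Fixed offsets: EST is UTC-5 and CET is UTC+1 (daylight saving time ignored).
EST = dt.timezone(dt.timedelta(hours=-5), 'EST')
CET = dt.timezone(dt.timedelta(hours=1), 'CET')

def est_hour_to_cet(hour_str):
    """Convert an hour string such as '15' from EST to CET, as 'HH:00'."""
    # strptime defaults the date to 1900-01-01; only the hour matters here.
    est_time = dt.datetime.strptime(hour_str, '%H').replace(tzinfo=EST)
    return est_time.astimezone(CET).strftime('%H:00')

print(est_hour_to_cet('15'))  # 21:00
print(est_hour_to_cet('20'))  # 02:00 (the following day in CET)
```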

Referring to the average comments per post for each hour, we found that posts created at around 9pm, 10pm, 2am and 3am have a higher chance of receiving comments. Posts created in the middle of the night (3am) also have a good chance of receiving comments because
