# Analyzing Comments on Hacker News

**In this project, we analyze comments on the popular technology news aggregator [Hacker News](https://news.ycombinator.com/).  
Specifically, we explore two questions:**  

* Which type of post, `Ask HN` or `Show HN`, receives more comments on average?  
* Do posts created at a certain time of day receive more comments on average?  

**The dataset used for this analysis was obtained from the website's official API and includes over 300,000 posts from the past several years; the `hacker_news.csv` file loaded in this notebook contains 20,100 posts, which is the sum of the three category counts printed further down.  
Each post records fields such as `title`, `author`, `created_at` (creation time), and `num_comments` (number of comments received).  
By examining this data, we can gain insight into how post type and posting time relate to user engagement on the site.**


The following code reads `hacker_news.csv` from the `C:\DataQuest_Projects\data_files` directory and converts it into a list of lists called `hn` using `reader()` from the `csv` module. Each row of the file becomes one inner list, and each element of an inner list holds the value of one column as a string.


In [57]:
from csv import reader

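# Raw strings keep the backslashes in the Windows path literal
path = r"C:\DataQuest_Projects\data_files"

# The with-block closes the file once it has been read;
# newline="" is what the csv module documentation recommends for file objects
with open(path + r"\hacker_news.csv", newline="") as opened_file:
    hn = list(reader(opened_file))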
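
To make the list-of-lists shape concrete without needing the file on disk, here is a minimal sketch that feeds `csv.reader` an in-memory string instead. The names `sample_csv` and `rows` are illustrative assumptions; the single data row is copied from the output shown in this notebook.

```python
import csv
import io

# Header plus one data row, in the same column layout as hacker_news.csv.
# (sample_csv and rows are illustrative names, not part of the notebook.)
sample_csv = (
    "id,title,url,num_points,num_comments,author,created_at\n"
    "12224879,Interactive Dynamic Video,http://www.interactivedynamicvideo.com/,"
    "386,52,ne0phyte,8/4/2016 11:52\n"
)

# csv.reader accepts any iterable of lines, so a StringIO stands in for the file.
rows = list(csv.reader(io.StringIO(sample_csv)))

print(rows[0])     # the header row as a list of column names
print(rows[1][4])  # num_comments for the first post, still a string
```

Every value comes back as a string, which is why the later cells call `int()` before summing comment counts.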

**The following code separates the first row of `hn`, which holds the column headers, from the rest of the data. The header row is assigned to `headers`, and `hn` is reassigned to the remaining rows.**  
  
**To confirm, we print `headers` and then the first six rows of `hn` using list slicing.**

In [60]:
headers = hn[0]
hn = hn[1:]

print(headers)
print("")
print(hn[:6])

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']

[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12'], ['10482257', 'T

**The following code sorts the rows of `hn` into three lists based on their titles: `ask_posts`, `show_posts`, and `other_posts`.**

**For each row, the title is lowercased and checked with `startswith()`. Rows whose title starts with `ask hn` go to `ask_posts`, rows whose title starts with `show hn` go to `show_posts`, and all remaining rows go to `other_posts`. Lowercasing first makes the prefix check case-insensitive.**

**Finally, the code prints the number of posts in each list using `len()`.**

In [79]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1].lower()
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)


print("# of posts in ask hn list",len(ask_posts))
print("# of posts in show hn list",len(show_posts))
print("# of posts in other list",len(other_posts))
        

# of posts in ask hn list 1744
# of posts in show hn list 1162
# of posts in other list 17194
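
The prefix check can also be isolated in a small function so it is easy to try on individual titles. This is only a sketch: `classify_title` and the example titles are hypothetical and not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles for illustration only.
examples = [
    "Ask HN: How do you learn?",
    "SHOW HN: My side project",
    "Interactive Dynamic Video",
]
labels = [classify_title(t) for t in examples]
print(labels)  # ['ask', 'show', 'other']
```

Because this is a plain prefix match, any title that merely begins with the letters `ask hn` is counted as an Ask post, whatever follows them.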


**The following code calculates the average number of comments per post for `ask_posts`.**

**The code initializes `total_ask_comments` to zero, then loops over `ask_posts`. For each row it reads the comment count from the fifth element (index 4, the `num_comments` column), converts it from a string to an integer with `int()`, and adds it to the running total.**

**After the loop, dividing `total_ask_comments` by `len(ask_posts)` gives the average, which is stored in `ave_ask_comments` and printed rounded to two decimal places with `round()`.**

In [88]:
total_ask_comments = 0

for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments
    
ave_ask_comments = total_ask_comments / len(ask_posts)
print("Average number of comments per post for 'Ask HN'", round(ave_ask_comments,2))

Average number of comments per post for 'Ask HN' 14.04


**The following code repeats the same calculation for `show_posts`.**

**The code initializes `total_show_comments` to zero, then loops over `show_posts`. For each row it reads the comment count from index 4, converts it to an integer with `int()`, and adds it to the running total.**

**After the loop, dividing `total_show_comments` by `len(show_posts)` gives the average, which is stored in `ave_show_comments` and printed rounded to two decimal places with `round()`.**

In [91]:
total_show_comments = 0

for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
    
ave_show_comments = total_show_comments / len(show_posts)
print("Average number of comments per post for 'Show HN'", round(ave_show_comments,2))

Average number of comments per post for 'Show HN' 10.32
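
The two averaging cells differ only in the list they loop over, so the shared logic could be factored into one function. The sketch below assumes a hypothetical helper named `average_comments` and uses made-up rows; only the `num_comments` column (index 4) matters for the calculation.

```python
def average_comments(posts, comments_index=4):
    """Mean of the num_comments column for a list of rows; None if the list is empty."""
    if not posts:
        return None
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical rows in the notebook's column order; the values are made up.
sample_posts = [
    ["1", "Ask HN: a", "", "10", "6", "alice", "8/4/2016 11:52"],
    ["2", "Ask HN: b", "", "3", "10", "bob", "8/4/2016 12:10"],
]
print(round(average_comments(sample_posts), 2))  # 8.0
```

In the notebook, the same function could be called once with `ask_posts` and once with `show_posts`. Returning `None` for an empty list is a design choice made here to avoid a `ZeroDivisionError`.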


# Analysis of Average Number of Comments for 'Ask HN' and 'Show HN' Posts

**`Ask HN` posts receive more comments on average than `Show HN` posts. 
The average number of comments per `Ask HN` post is `14.04`, while the average per `Show HN` post is `10.32`.**

**This finding suggests that `Ask HN` posts tend to generate more engagement and participation from users compared to `Show HN` posts. One possible explanation for this is that `Ask HN` posts invite users to share their opinions, experiences, and knowledge, which can stimulate discussion and debate. On the other hand, `Show HN` posts typically showcase a product or project, which may not necessarily encourage users to engage in a conversation or share their thoughts.**


# Determining Whether Ask Posts Created at a Certain Time Attract More Comments

**We'll use the following steps to perform this analysis:**

* Calculate the number of ask posts created in each hour of the day, along with the number of comments received.
* Calculate the average number of comments ask posts receive by hour created.


In [139]:
import datetime as dt

result_list = []

for row in ask_posts:
    date_created = row[6]
    num_comments = int(row[4])
    result_list.append([date_created, num_comments])
    
    
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    num_comments = row[1]
    # Parse the timestamp (e.g. "8/4/2016 11:52") and keep only the hour as "HH"
    created = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M")
    hour = created.strftime("%H")
    if hour not in counts_by_hour:
        # First post seen for this hour
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = num_comments
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += num_comments
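
The per-hour tally can be sketched more compactly with `collections.defaultdict`, which removes the need for the `if hour not in ...` branch. This is an alternative formulation rather than the notebook's own code: the function name `comments_by_hour_of_day` is hypothetical, and the third sample pair is made up (the first two timestamps come from the output shown earlier).

```python
import datetime as dt
from collections import defaultdict


def comments_by_hour_of_day(pairs, fmt="%m/%d/%Y %H:%M"):
    """Return (counts, totals) dicts keyed by two-digit hour string."""
    counts = defaultdict(int)  # number of posts created in each hour
    totals = defaultdict(int)  # total comments on posts created in each hour
    for created_at, num_comments in pairs:
        hour = dt.datetime.strptime(created_at, fmt).strftime("%H")
        counts[hour] += 1
        totals[hour] += num_comments
    return dict(counts), dict(totals)


# (created_at, num_comments) pairs; the last one is a made-up example.
sample_pairs = [("8/4/2016 11:52", 52), ("1/26/2016 19:30", 10), ("6/17/2016 11:05", 4)]
counts, totals = comments_by_hour_of_day(sample_pairs)
print(counts)  # {'11': 2, '19': 1}
print(totals)  # {'11': 56, '19': 10}
```

Missing keys in a `defaultdict(int)` start at zero, so the first post for an hour and every later post are handled by the same `+=` statement.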
        
        

In [144]:
avg_by_hour = []

# Average comments per post for each hour: total comments / number of posts
for hour in counts_by_hour:
    avg_by_hour.append([hour, comments_by_hour[hour] / counts_by_hour[hour]])

avg_by_hour

[['09', 5.5777777777777775],
 ['13', 14.741176470588234],
 ['10', 13.440677966101696],
 ['14', 13.233644859813085],
 ['16', 16.796296296296298],
 ['23', 7.985294117647059],
 ['12', 9.41095890410959],
 ['17', 11.46],
 ['15', 38.5948275862069],
 ['21', 16.009174311926607],
 ['20', 21.525],
 ['02', 23.810344827586206],
 ['18', 13.20183486238532],
 ['03', 7.796296296296297],
 ['05', 10.08695652173913],
 ['19', 10.8],
 ['01', 11.383333333333333],
 ['22', 6.746478873239437],
 ['08', 10.25],
 ['04', 7.170212765957447],
 ['00', 8.127272727272727],
 ['06', 9.022727272727273],
 ['07', 7.852941176470588],
 ['11', 11.051724137931034]]

In [165]:
swap_avg_by_hour = []

for row in avg_by_hour:
    swap_avg_by_hour.append([row[1], row[0]])

sorted_swap = sorted(swap_avg_by_hour, reverse=True)

print("Top 5 Hours for Ask Posts Comments")

for row in sorted_swap[:5]:
    hour = dt.datetime.strptime(row[1], "%H")
    hour = hour.strftime("%H:%M")
    average = row[0]
    print("{time}: {avg:.2f} average comments per post".format(time=hour, avg=average))

Top 5 Hours for Ask Posts Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
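
Swapping the columns before sorting works, but `sorted()` can also order the original `[hour, average]` pairs directly through its `key` argument. The sketch below is one possible alternative; `avg_sample` holds three of the pairs from the output above, rounded to two decimals for brevity.

```python
from operator import itemgetter

# Three [hour, average] pairs taken from the notebook output, rounded for brevity.
avg_sample = [["09", 5.58], ["15", 38.59], ["02", 23.81]]

# Sort by the second element (the average), highest first, without swapping columns.
top = sorted(avg_sample, key=itemgetter(1), reverse=True)

for hour, average in top[:2]:
    print("{}:00: {:.2f} average comments per post".format(hour, average))
```

Sorting with a key leaves the pair layout untouched, so the same list can still be indexed as `[hour, average]` afterwards.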


# Optimal Hours for Ask Posts: Analysis of Top 5 Hours for Maximum Comments

**The top 5 hours for Ask post comments are 15:00, 02:00, 20:00, 16:00, and 21:00. Posts created at 15:00 receive the most comments, averaging 38.59 per post, well ahead of the 23.81 average for 02:00.**

**Therefore, to increase the chances of receiving comments, it is recommended to create an Ask post during these peak hours, expressed in the time zone of the dataset's `created_at` column.**

**Note that this result is based on Ask posts only, and the peak hours may differ for other types of posts. Still, it can serve as a general guideline for maximizing the chances of receiving a high number of comments on Ask posts.**