# Exploring Hacker News Posts

Hacker News is a prominent platform where people in the technology community share and discuss the latest developments, projects, and ideas. In this data analysis project, we explore a CSV dataset of posts sourced from Hacker News.

With this dataset at our disposal, we aim to uncover trends, patterns, and valuable information regarding the discussions and user interactions on this platform.

### Storing the CSV Data in a List

In [1]:
# Import the 'reader' function from the 'csv' module
from csv import reader

# Open 'hacker_news.csv' in read mode; the 'with' block closes the file
# automatically, and newline='' is what the csv module recommends
with open('hacker_news.csv', newline='') as open_file:
    # Create a 'reader' object and store every row in the 'hn' list
    read_file = reader(open_file)
    hn = list(read_file)

# Remove the header row, which contains column names
hn = hn[1:]

# Now, 'hn' contains the CSV data without the header row
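
As a small variation on the cell above, the header row can be kept in its own variable instead of being discarded, which makes it possible to look up column positions by name. This is only a sketch: it reads from an in-memory string rather than hacker_news.csv, and the column names and sample rows are assumptions chosen to match the indexes used in this notebook, not values taken from the dataset.

```python
from csv import reader
from io import StringIO

# Hypothetical sample standing in for hacker_news.csv; the column names
# are assumptions chosen to match the indexes used in this notebook.
sample_csv = StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,5,3,alice,8/16/2016 9:55\n"
    "2,Show HN: Example project,http://example.com,7,1,bob,8/16/2016 15:10\n"
)

rows = list(reader(sample_csv))
header = rows[0]   # column names
data = rows[1:]    # every row after the header

# Look up a column position by name instead of hard-coding the index
title_index = header.index("title")
comments_index = header.index("num_comments")

print(header)
print(title_index, comments_index)  # 1 4
print(len(data))                    # 2
```

Looking up indexes this way keeps later cells readable if the column order ever changes.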

### Extracting Ask HN and Show HN Posts
Since we're only concerned with post titles beginning with Ask HN or Show HN, we'll create new lists of lists containing just the data for those titles.

In [2]:
ask_posts = []     # List to store "Ask HN" posts
show_posts = []    # List to store "Show HN" posts
other_posts = []   # List to store other posts

# Iterate through each row in the 'hn' list
for row in hn:
    title = row[1]  # Extract the title from the current row
    title = title.lower()  # Convert the title to lowercase for case-insensitive comparison
    
    # Check if the title starts with "ask hn"
    if title.startswith('ask hn'):
        ask_posts.append(row)  # Append the row to the 'ask_posts' list
    
    # Check if the title starts with "show hn"
    elif title.startswith('show hn'):
        show_posts.append(row)  # Append the row to the 'show_posts' list
    
    # If the title doesn't match either "Ask HN" or "Show HN", categorize it as 'other'
    else:
        other_posts.append(row)  # Append the row to the 'other_posts' list
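
To see how the case-insensitive prefix check behaves on its own, here is a minimal sketch that applies the same logic to a few titles. The function name classify_title and the sample titles are hypothetical and exist only for illustration.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # lowercase so 'ASK HN' and 'Ask HN' both match
    if lowered.startswith('ask hn'):
        return 'ask'
    if lowered.startswith('show hn'):
        return 'show'
    return 'other'

# Hypothetical titles, not taken from the dataset
titles = [
    'Ask HN: How do you learn a new language?',
    'SHOW HN: My weekend project',
    'Why I ask HN for advice',
]
for title in titles:
    print(classify_title(title), '|', title)
```

The third title contains "ask HN" but does not start with it, so it falls into the 'other' category, just as it would in the loop above.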

### Calculating the Average Number of Comments for Ask HN and Show HN Posts
Let's determine whether ask posts or show posts receive more comments on average.

In [3]:
total_ask_comments = 0

for post in ask_posts:
    num_comments = int(post[4])  # Comments count is in the 5th column (index 4)
    total_ask_comments += num_comments

# Calculate the average number of comments
avg_ask_comments = total_ask_comments / len(ask_posts)

# Now, avg_ask_comments contains the average number of comments on ask posts
print('Average num of comments for ask_posts:', avg_ask_comments)

#-------------show_posts-----------------
total_show_comments = 0

for post in show_posts:
    num_comments = int(post[4])  # Comments count is in the 5th column (index 4)
    total_show_comments += num_comments

# Calculate the average number of comments
avg_show_comments = total_show_comments / len(show_posts)

# Now, avg_show_comments contains the average number of comments on the show posts
print('Average num of comments for show_posts:', avg_show_comments)

Average num of comments for ask_posts: 14.038417431192661
Average num of comments for show_posts: 10.31669535283993
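
The two loops above repeat the same logic, so they could be folded into one helper. The following is a minimal sketch, assuming the comment count sits at index 4 as in the cells above; average_comments is a hypothetical name and the sample rows are made up.

```python
def average_comments(posts, comments_index=4):
    """Return the mean comment count for a list of rows, or 0.0 if empty."""
    if not posts:
        return 0.0  # avoid ZeroDivisionError on an empty list
    total = sum(int(post[comments_index]) for post in posts)
    return total / len(posts)

# Made-up rows with the comment count in the 5th column (index 4)
sample_posts = [
    ['1', 'Ask HN: A?', '', '5', '10', 'alice', '8/16/2016 9:55'],
    ['2', 'Ask HN: B?', '', '2', '4', 'bob', '8/16/2016 10:05'],
]
print(average_comments(sample_posts))  # 7.0
print(average_comments([]))            # 0.0
```

With this helper, the averages above would come from average_comments(ask_posts) and average_comments(show_posts).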


The results above show that ask posts receive more comments on average than show posts (about 14.0 versus 10.3).

### Finding the Number of Ask Posts and Comments by Hour Created

Next, we want to determine whether ask posts created at a certain time of day are more likely to attract comments. To do that, we count the number of ask posts created during each hour and the total number of comments those posts received.

In [4]:
import datetime as dt

result_list = []
for post in ask_posts:
    created_at = post[6]
    num_comments = int(post[4])
    result_list.append([created_at, num_comments])

counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_string = row[0]
    comment_count = row[1]
    
    # Parse the date string into a datetime object
    date_obj = dt.datetime.strptime(date_string, "%m/%d/%Y %H:%M")
    
    # Extract the hour from the datetime object
    hour = date_obj.strftime("%H")
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = comment_count
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += comment_count
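
The same tally can be written more compactly with collections.defaultdict, which removes the need for the "hour not in counts_by_hour" check. This is a sketch on made-up rows in the same format as result_list, not a replacement for the cell above.

```python
from collections import defaultdict
from datetime import datetime

# Made-up [created_at, num_comments] pairs in the same format as result_list
sample_rows = [
    ['8/16/2016 9:55', 6],
    ['11/22/2015 13:43', 29],
    ['5/2/2016 9:14', 2],
]

counts = defaultdict(int)    # missing hours start at 0
comments = defaultdict(int)

for created_at, num_comments in sample_rows:
    # Parse the timestamp, then format the hour as a zero-padded string
    hour = datetime.strptime(created_at, '%m/%d/%Y %H:%M').strftime('%H')
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))    # {'09': 2, '13': 1}
print(dict(comments))  # {'09': 8, '13': 29}
```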


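The counts_by_hour and comments_by_hour dictionaries now hold the number of ask posts created during each hour of the day and the total number of comments those posts received. Next, we use them to calculate the average number of comments per post for each hour.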

In [5]:
from datetime import datetime

avg_by_hour = []

for hour in counts_by_hour:
    average_comments = comments_by_hour[hour] / counts_by_hour[hour]
    hour_datetime = datetime.strptime(hour, "%H")
    formatted_hour = hour_datetime.strftime("%H:%M")
    avg_by_hour.append([formatted_hour, average_comments])

# Display the results
for hour_avg in avg_by_hour:
    print(f"{hour_avg[0]}: {hour_avg[1]:.2f} average comments per post")

09:00: 5.58 average comments per post
13:00: 14.74 average comments per post
10:00: 13.44 average comments per post
14:00: 13.23 average comments per post
16:00: 16.80 average comments per post
23:00: 7.99 average comments per post
12:00: 9.41 average comments per post
17:00: 11.46 average comments per post
15:00: 38.59 average comments per post
21:00: 16.01 average comments per post
20:00: 21.52 average comments per post
02:00: 23.81 average comments per post
18:00: 13.20 average comments per post
03:00: 7.80 average comments per post
05:00: 10.09 average comments per post
19:00: 10.80 average comments per post
01:00: 11.38 average comments per post
22:00: 6.75 average comments per post
08:00: 10.25 average comments per post
04:00: 7.17 average comments per post
00:00: 8.13 average comments per post
06:00: 9.02 average comments per post
07:00: 7.85 average comments per post
11:00: 11.05 average comments per post


### Top 5 Hours for 'Ask HN' Comments
- 15:00: 38.59 average comments per post
- 02:00: 23.81 average comments per post
- 20:00: 21.52 average comments per post
- 16:00: 16.80 average comments per post
- 21:00: 16.01 average comments per post
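
The list above can be derived by sorting the hourly averages in descending order and keeping the first five entries. To stay self-contained, the sketch below uses a subset of the averages printed earlier rather than the full avg_by_hour list built in the notebook.

```python
# A subset of the [hour, average] pairs printed earlier in this notebook
avg_by_hour = [
    ['09:00', 5.58],
    ['13:00', 14.74],
    ['16:00', 16.80],
    ['15:00', 38.59],
    ['21:00', 16.01],
    ['20:00', 21.52],
    ['02:00', 23.81],
]

# Sort by the average (second element), highest first, and keep five rows
top_five = sorted(avg_by_hour, key=lambda pair: pair[1], reverse=True)[:5]

for hour, average in top_five:
    print(f"{hour}: {average:.2f} average comments per post")
```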

The hour with the highest average is 15:00, with 38.59 comments per post. That is roughly 62% more than the second-highest hour, 02:00, which averages 23.81 comments per post.

According to the dataset's documentation, the timestamps are in US Eastern Time, so 15:00 can also be expressed as 3:00 PM EST.
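
The gap between the top two hours can be checked directly from the averages printed above. A minimal sketch of the arithmetic:

```python
peak = 38.59       # average for 15:00, from the output above
runner_up = 23.81  # average for 02:00, from the output above

# Relative increase of the peak over the second-highest hour
increase = (peak - runner_up) / runner_up
print(f"{increase:.1%}")  # 62.1%
```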

## Conclusion
This project analyzed ask and show posts to identify which type of post, and which posting time, receives the most comments on average. Based on our analysis, to maximize the number of comments a post receives, we recommend making it an 'ask' post and creating it between 3:00 PM and 4:00 PM EST.

However, it's important to note that this analysis excluded posts without any comments. Therefore, it's more accurate to state that among posts with comments, 'ask' posts received more comments on average, particularly those created between 3:00 PM and 4:00 PM EST.