# Is My Online Post Likely to Receive A Response?

## Synopsis

In this project, I will analyze posts made to the **Hacker News** community to derive insights into what makes a post popular. Specifically, I'll look at which types of posts receive more comments on average, and whether the average number of comments depends on the time of posting.

Let's begin by importing the modules for our analysis

In [1]:
import datetime as dt
from csv import reader
import pprint as pp

Open the CSV file with the Hacker News data and quickly view some of the data

In [2]:
with open('hacker_news.csv') as f:
    hn = list(reader(f))

In [3]:
pp.pprint(hn[0:5],indent=3)

Isolate the header information

In [4]:
headers = hn[0]
del hn[0]

In [5]:
headers

In [6]:
hn[0:5]

In [7]:
# Separate the "Ask HN" and "Show HN" posts
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    title = row[1]
    title = title.lower()
    
    if title.startswith("ask hn"):
        ask_posts.append(row)
    elif title.startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)

In [8]:
len(ask_posts), len(show_posts), len(other_posts)

In [9]:
ask_posts[0:5]

In [10]:
show_posts[0:5]

In [11]:
# find the total number of ask comments and its average

# Ask comments
total_ask_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])

avg_ask_comments = total_ask_comments/len(ask_posts)
total_ask_comments, avg_ask_comments

In [12]:
# find the total number of show comments and its average

# Show comments
total_show_comments = 0

for row in show_posts:
    total_show_comments += int(row[4])

avg_show_comments = total_show_comments/len(show_posts)
total_show_comments, avg_show_comments

On average, posts that ask a question ("Ask HN") receive **more** comments than posts that show the Hacker News community a project, product, etc. ("Show HN"). We will focus on ask posts for the remainder of the analysis.
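The two averaging cells above repeat the same logic, so they could be folded into one helper. The following is a minimal sketch, not the code used in this notebook: the function name `average_comments` and the sample rows are made up for illustration, and the comment count is assumed to sit at index 4, as in the cells above.

```python
def average_comments(posts, comments_index=4):
    """Return (total, average) comments for a list of post rows."""
    if not posts:
        # Avoid dividing by zero when a category has no posts
        return 0, 0.0
    total = sum(int(row[comments_index]) for row in posts)
    return total, total / len(posts)


# Made-up rows for illustration only; they are not taken from hacker_news.csv
sample_posts = [
    ["1", "Ask HN: example one", "", "5", "10", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: example two", "", "3", "4", "user_b", "11/22/2015 13:43"],
]

total, average = average_comments(sample_posts)
print(total, average)
```

With the real data, the same call could be made once for `ask_posts` and once for `show_posts`.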

The next question of interest is: **Does time play a role in determining how many comments a post gets?** That is, are ask posts created at a certain time of day more likely to attract comments? How can we determine this?

1. Calculate the number of ask posts created in each hour of the day, along with the number of comments received
2. Calculate the average number of comments ask posts receive by hour created.

To obtain the hourly posts, I will parse each creation time as a datetime object and extract its hour component
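As a small illustration of that parsing step, the sketch below converts one timestamp string into a `datetime` object and pulls out the hour. The timestamp is made up; the format string is the one used later in this notebook.

```python
import datetime as dt

# Made-up timestamp written in the same month/day/year hour:minute layout
sample_created_at = "8/16/2016 9:55"

parsed = dt.datetime.strptime(sample_created_at, "%m/%d/%Y %H:%M")

hour_label = parsed.strftime("%H")  # zero-padded string, handy as a dictionary key
hour_number = parsed.hour           # plain integer, handy for arithmetic

print(hour_label, hour_number)
```

Using the zero-padded string as the key keeps hours such as `09` and `15` in chronological order when the keys are sorted as text.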

In [13]:
# Extract the time and number of comments for each post
result_list = []

for row in ask_posts:
    dates = [row[6], int(row[4])]
    result_list.append(dates)


In [14]:
result_list[1]

In [15]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    hour = row[0]
    hour = dt.datetime.strptime(hour,'%m/%d/%Y %H:%M')
    
    hour = hour.strftime('%H')
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]
    
    

I have segmented the posts by the hour of the day in which they were created. The results are saved in two dictionaries keyed by hour: `counts_by_hour` holds the number of posts and `comments_by_hour` holds the total number of comments. Let's go ahead and calculate the average number of comments for each hour of the day
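The same tally could be written a little more compactly with `collections.defaultdict`, which removes the need for the `if hour not in ...` branch. This is only a sketch on made-up `[created_at, num_comments]` pairs, not the notebook's actual data.

```python
import datetime as dt
from collections import defaultdict

# Made-up [created_at, num_comments] pairs for illustration
sample_results = [
    ["8/16/2016 9:55", 6],
    ["11/22/2015 13:43", 29],
    ["5/2/2016 9:14", 4],
]

counts = defaultdict(int)    # hour -> number of posts
comments = defaultdict(int)  # hour -> total number of comments

for created_at, num_comments in sample_results:
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    counts[hour] += 1
    comments[hour] += num_comments

print(dict(counts))
print(dict(comments))
```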

In [16]:
avg_by_hour = []

for hour in counts_by_hour:
    avg = comments_by_hour[hour]/counts_by_hour[hour]
    avg_by_hour.append([hour, avg])

In [17]:
avg_by_hour

Our list contains the average number of comments received for each hour, but it's not easy to read in this form. Let's sort it. To make sorting easy, we first swap the two values in each pair so that the average comes first
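As an aside, `sorted` also accepts a `key` function, so the list could be ordered by average without swapping the columns at all. The sketch below uses made-up `[hour, average]` pairs to show the idea.

```python
# Made-up [hour, average] pairs for illustration
sample_avg_by_hour = [["09", 5.0], ["13", 29.0], ["15", 12.5]]

# Sort on the second element (the average), highest first
sorted_by_avg = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in sorted_by_avg:
    print("{}:00 : {:.2f} average comments per post".format(hour, avg))
```

Both approaches give the same ordering here; the swap keeps the code closer to basic list operations, while `key` avoids building a second list.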

In [18]:
swap_avg_by_hours = []
for row in avg_by_hour:
    swap_avg_by_hours.append([row[1],row[0]])
   

In [19]:
swap_avg_by_hours

In [20]:
# Sort by average number of comments
sorted_swap = sorted(swap_avg_by_hours,reverse=True)

In [21]:
print("Top 5 Hours for Ask Posts Comments")

for i in range(5):
    print("{} : {:.2f} average comments per post".format(sorted_swap[i][1],
                                                     sorted_swap[i][0]))

From our analysis, ask posts created at **3 pm Eastern time** receive the highest average number of comments per post. That would be 12 pm Pacific time. So posting around this time may raise the chance of getting lots of replies
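The Eastern-to-Pacific conversion can be checked with a few lines of code. This sketch assumes fixed standard-time offsets (UTC-5 for Eastern, UTC-8 for Pacific) and an arbitrary made-up date; it ignores daylight saving time, which shifts both zones together and so leaves the three-hour gap unchanged.

```python
import datetime as dt

# Assumed fixed offsets for standard time; daylight saving is not modeled
EASTERN_STANDARD = dt.timezone(dt.timedelta(hours=-5), "EST")
PACIFIC_STANDARD = dt.timezone(dt.timedelta(hours=-8), "PST")

# Arbitrary date; only the 15:00 (3 pm) time of day matters here
eastern_time = dt.datetime(2016, 1, 15, 15, 0, tzinfo=EASTERN_STANDARD)
pacific_time = eastern_time.astimezone(PACIFIC_STANDARD)

print(pacific_time.strftime("%H:%M %Z"))
```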

## Conclusion
I have performed a simple analysis of social media data, specifically from the **Hacker News** platform. My analysis showed that `Ask` posts had more user engagement than `Show` posts. Additionally, ask posts created around mid-afternoon (Eastern time) had the highest average number of comments per post.

Of course, we could always go further. I focused only on the Ask posts and tallied the number of comments. Future work could also look at other attributes, such as the number of points received per post, or create a frequency table of the most popular website domains posted.
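As a starting point for the domain idea, a frequency table could be built with `urllib.parse.urlparse` and `collections.Counter`. This is a rough sketch on made-up rows. The position of the URL column (`URL_INDEX = 2`) is an assumption that should be checked against `headers` before running it on the real data.

```python
from collections import Counter
from urllib.parse import urlparse

URL_INDEX = 2  # assumed position of the URL column; verify against `headers`

# Made-up rows for illustration only
sample_rows = [
    ["1", "Post A", "https://github.com/example/repo-one"],
    ["2", "Post B", "https://example.com/article"],
    ["3", "Ask HN: a question", ""],  # posts without a link have an empty URL
    ["4", "Post C", "https://github.com/example/repo-two"],
]

domain_counts = Counter()
for row in sample_rows:
    url = row[URL_INDEX]
    if not url:
        continue  # skip posts without a link
    domain_counts[urlparse(url).netloc] += 1

print(domain_counts.most_common(2))
```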