# Project - HN
In this project, we'll work with a dataset of submissions to the popular technology site Hacker News.
The goals are to compare the average number of comments on two types of post (Ask HN and Show HN) and to find the best times to post.

### 1. Introduction - Import the dataset.

In [8]:
from csv import reader

with open("hacker_news.csv") as f:
    red_file = reader(f, delimiter = ',')
    dataset = list(red_file)
    print(dataset[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'], ['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01']]


### 2. Removing Headers from a List of Lists.

In [9]:
#Removing the header row from the main dataset.
headers = dataset[:1]
hn = dataset[1:]
print(headers)
print("\n")
print(hn[:5])

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']]


[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', "Florida DJs May Face Felony for April Fools' Water Joke", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]
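As an aside, here is a minimal sketch of another way to separate the header: calling `next()` on the reader consumes the first row before the rest is read. The in-memory sample and the names `sample_csv`, `header_row`, and `data_rows` are hypothetical and stand in for `hacker_news.csv`.

```python
from csv import reader
from io import StringIO

# Hypothetical two-row sample standing in for hacker_news.csv.
sample_csv = StringIO(
    "id,title,url,num_points,num_comments,author,created_at\n"
    "1,Ask HN: Example question?,,5,3,alice,8/4/2016 11:52\n"
    "2,Show HN: Example project,http://example.com,10,2,bob,1/26/2016 19:30\n"
)

rows = reader(sample_csv)
header_row = next(rows)   # first row: a flat list of column names
data_rows = list(rows)    # everything after the header

print(header_row)
print(len(data_rows))
```

Unlike `dataset[:1]`, which returns a list containing the header list, `next()` returns the header as a flat list.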


Next, we will separate the Ask HN and Show HN posts from the rest of the posts, as these kinds of posts are different from the usual Hacker News posts.

### 3. Extracting Ask HN and Show HN Posts.

In [10]:
ask_posts = []
show_posts = []
other_posts = []

for row in hn:
    if row[1].lower().startswith("ask hn"):
        ask_posts.append(row)
    elif row[1].lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
print(len(ask_posts))
print(len(show_posts))
print(len(other_posts))

1744
1162
17194
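The prefix check used above can also be wrapped in a small helper, which makes the rule easy to try on individual titles. This is only a sketch: `classify_title` and the sample titles are hypothetical and not part of the original notebook.

```python
def classify_title(title):
    """Return 'ask', 'show', or 'other' based on the title prefix."""
    lowered = title.lower()  # makes the prefix match case-insensitive
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"

# Hypothetical titles, for illustration only.
sample_titles = [
    "Ask HN: How do you learn a new language?",
    "SHOW HN: My weekend project",
    "Interactive Dynamic Video",
]
print([classify_title(t) for t in sample_titles])
```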


### 4. Calculating the Average Number of Comments for Ask HN and Show HN Posts.

In [11]:
total_ask_comments = 0
total_show_comments = 0

for row in ask_posts:
    total_ask_comments += int(row[4])  # index 4 holds num_comments
# Compute the average once, after the loop has finished summing.
avg_ask_comments = total_ask_comments / len(ask_posts)
print("Average comments count for Ask HN posts is:", round(avg_ask_comments, 2))

for row in show_posts:
    total_show_comments += int(row[4])
avg_show_comments = total_show_comments / len(show_posts)
print("Average number of comments for the Show HN posts is:", round(avg_show_comments, 2))

Average comments count for Ask HN posts is: 14.04
Average number of comments for the Show HN posts is: 10.32


Above, we see that **Ask HN** posts receive more comments on average than Show HN posts: about **14.04 comments per post**, compared with 10.32 for Show HN posts.

Next, we'll determine if ask posts created at a certain time are more likely to attract comments. We'll use the following steps to perform this analysis:
<ol>
   <li> Calculate the number of ask posts created in each hour of the day, along with the number of comments received.</li>
    <li> Calculate the average number of comments ask posts receive by hour created.</li>
</ol>

### 5. Finding the Number of Ask Posts and Comments by Hour Created.

In [12]:
# To calculate the number of ask posts created per hour, along with the total number of comments.

import datetime as dt

result_list = []
for row in ask_posts:
    
    # appending as a list of list to result_list.
    result_list.append(
        [row[6], int(row[4])]
    )
    #ind 6 is the Created date, and ind 4 is the num_comments.
    
counts_by_hour = {} # to count the num. of ask posts created for the hour.
comments_by_hour = {} # to count the num. of comments created for the hour.

for row in result_list: 
    
    #Converting to dt obj to extract only the Hour.
    time = dt.datetime.strptime(row[0], "%m/%d/%Y %H:%M").strftime("%H")
    comment = row[1]
    
    # Populate the dicts with the number of posts and the total comments for each hour.
    if time in counts_by_hour:        
        comments_by_hour[time] += comment
        counts_by_hour[time] += 1  
        #print(comments_by_hour)
    else:
        comments_by_hour[time] = comment 
        counts_by_hour[time] = 1        
        
comments_by_hour

{'09': 251,
 '13': 1253,
 '10': 793,
 '14': 1416,
 '16': 1814,
 '23': 543,
 '12': 687,
 '17': 1146,
 '15': 4477,
 '21': 1745,
 '20': 1722,
 '02': 1381,
 '18': 1439,
 '03': 421,
 '05': 464,
 '19': 1188,
 '01': 683,
 '22': 479,
 '08': 492,
 '04': 337,
 '00': 447,
 '06': 397,
 '07': 267,
 '11': 641}

In [14]:
counts_by_hour #num. of posts by hour created.

{'09': 45,
 '13': 85,
 '10': 59,
 '14': 107,
 '16': 108,
 '23': 68,
 '12': 73,
 '17': 100,
 '15': 116,
 '21': 109,
 '20': 80,
 '02': 58,
 '18': 109,
 '03': 54,
 '05': 46,
 '19': 110,
 '01': 60,
 '22': 71,
 '08': 48,
 '04': 47,
 '00': 55,
 '06': 44,
 '07': 34,
 '11': 58}
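The same two tallies can be built without the `if`/`else` branch by using `collections.defaultdict`, which starts every missing key at zero. This is a sketch on hypothetical data: `sample_pairs`, `sample_counts`, and `sample_comments` are illustrative names, not part of the notebook.

```python
import datetime as dt
from collections import defaultdict

# Hypothetical [created_at, num_comments] pairs, shaped like result_list.
sample_pairs = [
    ["8/4/2016 15:52", 10],
    ["1/26/2016 15:30", 4],
    ["6/23/2016 02:20", 7],
]

sample_counts = defaultdict(int)    # hour -> number of posts
sample_comments = defaultdict(int)  # hour -> total comments

for created_at, num_comments in sample_pairs:
    # Parse the timestamp and keep only the zero-padded hour, e.g. "15".
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    sample_counts[hour] += 1
    sample_comments[hour] += num_comments

print(dict(sample_counts))
print(dict(sample_comments))
```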

In [18]:
print(type(comments_by_hour['10']))
print(type(len(comments_by_hour)))

<class 'int'>
<class 'int'>


### 6. Calculating the Average Number of Comments for Ask HN Posts by Hour.

Here we will proceed to calculate the average number of comments per post for posts created during each hour of the day.

In [21]:
avg_by_hour = []

for hour in comments_by_hour:
    # Both dicts already hold per-hour totals, so one division gives the average.
    avg = comments_by_hour[hour] / counts_by_hour[hour]
    avg_by_hour.append([hour, avg])

for row in avg_by_hour:  # avg_by_hour is a list of [hour, average] lists
    print(row[0], ':', row[1])

09 : 5.5777777777777775
13 : 14.741176470588234
10 : 13.440677966101696
14 : 13.233644859813085
16 : 16.796296296296298
23 : 7.985294117647059
12 : 9.41095890410959
17 : 11.46
15 : 38.5948275862069
21 : 16.009174311926607
20 : 21.525
02 : 23.810344827586206
18 : 13.20183486238532
03 : 7.796296296296297
05 : 10.08695652173913
19 : 10.8
01 : 11.383333333333333
22 : 6.746478873239437
08 : 10.25
04 : 7.170212765957447
00 : 8.127272727272727
06 : 9.022727272727273
07 : 7.852941176470588
11 : 11.051724137931034


Because the goal is the average number of comments for each hour, the comments and posts for each hour were first totalled in the two frequency dictionaries.
Dividing each hour's comment total by its post count then gives the average number of comments per post for that hour.
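As a quick check of that division, the figures for hour 15 can be worked through by hand. The two totals are copied from the `comments_by_hour` and `counts_by_hour` outputs above; the variable names are illustrative.

```python
# Totals for hour 15, copied from the dictionary outputs above.
comments_at_15 = 4477
posts_at_15 = 116

avg_at_15 = comments_at_15 / posts_at_15
print(round(avg_at_15, 2))  # 38.59
```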

### 7. Sorting and Printing Values from a List of Lists.

In [22]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_avg_by_hour.append([row[1],row[0]])
sorted_swap = sorted(swap_avg_by_hour, reverse = True)

print("Top 5 Hours for Ask Post Comments")
for i in sorted_swap[:5]:
    print("{}: {:.2f} average comments per post".format(dt.datetime.strptime(i[1], "%H").strftime("%H:%M"),i[0]))

Top 5 Hours for Ask Post Comments
15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
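Swapping the columns is one way to sort by the average. A sketch of an alternative is to leave the pairs as they are and pass a `key` function to `sorted()`. The sample values below are rounded from the output above, and `sample_avg_by_hour` and `ranked` are illustrative names.

```python
# [hour, average] pairs shaped like avg_by_hour (values rounded for brevity).
sample_avg_by_hour = [["09", 5.58], ["15", 38.59], ["02", 23.81]]

# Sort on the second element (the average), largest first.
ranked = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

for hour, avg in ranked[:2]:
    print("{}:00: {:.2f} average comments per post".format(hour, avg))
```

With a `key`, the original list keeps its `[hour, average]` order, so no second swapped list is needed.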



The hour that receives the most comments per post on average is **15:00**, with an average of **38.59 comments per post**. That is about 62% higher than the second-highest hour, 02:00, which averages 23.81 comments per post.
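The size of the gap between the top two hours can be worked out from the printed averages. This is a small sketch using the rounded values 38.59 and 23.81 from the output above; the variable names are illustrative.

```python
top_avg = 38.59      # 15:00, from the output above
second_avg = 23.81   # 02:00, from the output above

# Relative increase of the top hour over the second-highest hour.
pct_higher = (top_avg - second_avg) / second_avg * 100
print(round(pct_higher, 1))  # 62.1
```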


## **Conclusion.**

In this project, we analyzed Ask HN and Show HN posts to determine which type of post and which posting time receive the most comments on average. Based on our analysis, to maximize the number of comments a post receives, we'd recommend that the post be **categorized as an Ask HN post** and created **between 15:00 and 16:00 (3:00 pm EST - 4:00 pm EST).**

However, it should be noted that the dataset we analyzed excluded posts without any comments. Given that, it's more accurate to say that, of the posts that received comments, Ask HN posts received more comments on average than Show HN posts, and Ask HN posts created between 15:00 and 16:00 (3:00 pm EST - 4:00 pm EST) received the most comments on average.


#### **Some next steps to consider:**
<ol>
    <li>Determine if show or ask posts receive more points on average.</li>
    <li>Determine if posts created at a certain time are more likely to receive more points.</li>
    <li>Compare the results to the average number of comments and points other posts receive. </li>
</ol>    

#### 1. Determine if show or ask posts receive more points on average.

In [61]:
ask_points = 0
show_points = 0
other_points = 0
other_comments = 0

for row in ask_posts:
    ask_points += int(row[3])  # index 3 holds num_points
print("Average points for Ask posts are:", round(ask_points/len(ask_posts),2))

for row in show_posts:
    show_points += int(row[3])
print("Average points for Show posts are:", round(show_points/len(show_posts),2))

# Sum points and comments for the remaining posts in a single pass.
for row in other_posts:
    other_points += int(row[3])
    other_comments += int(row[4])  # index 4 holds num_comments
print("Average points for Other posts are:", round(other_points/len(other_posts),2))
print("Average comments for Other posts are:", round(other_comments/len(other_posts),2))



Average points for Ask posts are: 15.06
Average points for Show posts are: 27.56
Average points for Other posts are: 55.41
Average comments for Other posts are: 26.87


Above, we see that Show HN posts average 27.56 points, compared with 15.06 for Ask HN posts. We also see that **Other posts** (those that are neither Ask HN nor Show HN) have the **highest averages** for both points (55.41) and comments (26.87) among all three types of posts.
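The repeated sum-then-divide loops could be folded into one helper that averages any integer column. This is a sketch on hypothetical rows: `column_average` and `sample_groups` are illustrative names, and the rows only mimic the dataset's column layout.

```python
def column_average(posts, index):
    """Mean of an integer column; returns 0.0 when posts is empty."""
    if not posts:
        return 0.0
    return sum(int(row[index]) for row in posts) / len(posts)

# Hypothetical rows in the dataset's column layout
# (index 3 = num_points, index 4 = num_comments).
sample_groups = {
    "Ask": [["1", "Ask HN: A?", "", "4", "10", "alice", "8/4/2016 11:52"]],
    "Show": [
        ["2", "Show HN: B", "", "6", "2", "bob", "8/4/2016 12:10"],
        ["3", "Show HN: C", "", "10", "4", "carol", "8/4/2016 13:00"],
    ],
}

for name, posts in sample_groups.items():
    avg_points = round(column_average(posts, 3), 2)
    avg_comments = round(column_average(posts, 4), 2)
    print(name, "points:", avg_points, "comments:", avg_comments)
```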

#### 2. Determine if posts created at a certain time are more likely to receive more points.

In [47]:
def best_postpoints_time(posts):
    """Print the five hours with the highest total points for the given posts."""
    points_at_hours = {}

    for row in posts:

        time = dt.datetime.strptime(row[-1], "%m/%d/%Y %H:%M").strftime("%H")

        if time in points_at_hours:
            points_at_hours[time] += int(row[3])
        else:
            points_at_hours[time] = int(row[3])

    #Next, we will try to sort the points by hours in a descending order to find the highest point hours.    
    points_list = []
    for key in points_at_hours:
        points_list.append([points_at_hours[key], key])
    points_list = sorted(points_list, reverse = True)

    # The below loop will print the points.
    for row in points_list[:5]:
        print(dt.datetime.strptime(row[1], "%H").strftime("%H:%M"),":",row[0])

In [59]:
print("Show posts and their best time for highest points")
best_postpoints_time(show_posts)
print("\nAsk posts and their best time for highest points")
best_postpoints_time(ask_posts)
print("\nOther posts and their best time for highest points")
best_postpoints_time(other_posts)

Show posts and their best time for highest points
16:00 : 2634
12:00 : 2543
17:00 : 2521
13:00 : 2438
15:00 : 2228

Ask posts and their best time for highest points
15:00 : 3479
16:00 : 2522
13:00 : 2062
17:00 : 1941
18:00 : 1741

Other posts and their best time for highest points
17:00 : 67777
15:00 : 62964
16:00 : 59655
14:00 : 59191
19:00 : 58811


Thus, by **total points**, the best hour is **16:00 EST** for Show HN posts, **15:00 EST** for Ask HN posts, and **17:00 EST** for all other posts. Note that these are totals rather than per-post averages, so hours with more posts tend to rank higher.
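The function above ranks hours by the sum of points, which favours hours with many posts. A sketch of a per-post version, mirroring the earlier comments-by-hour approach, is shown below. `average_points_by_hour` and the sample rows are hypothetical and not part of the original analysis.

```python
import datetime as dt
from collections import defaultdict

def average_points_by_hour(posts):
    """Return {hour: average points per post} for rows in the dataset layout."""
    totals = defaultdict(int)  # hour -> summed points
    counts = defaultdict(int)  # hour -> number of posts
    for row in posts:
        hour = dt.datetime.strptime(row[-1], "%m/%d/%Y %H:%M").strftime("%H")
        totals[hour] += int(row[3])  # index 3 holds num_points
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

# Hypothetical rows: hour 16 has the larger total, hour 09 the larger average.
sample_posts = [
    ["1", "Show HN: A", "", "10", "1", "alice", "8/4/2016 16:05"],
    ["2", "Show HN: B", "", "20", "2", "bob", "8/5/2016 16:40"],
    ["3", "Show HN: C", "", "25", "3", "carol", "8/6/2016 09:15"],
]
print(average_points_by_hour(sample_posts))
```

In this sample, hour 16 leads on total points (30 against 25) but trails on points per post (15.0 against 25.0), which is why the two rankings can differ.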

#### 3. Conclusion on Extras.
To conclude this extra exploration of the dataset: **Other posts** (those that are neither Ask HN nor Show HN) have the **highest average** points and comments among all three types of posts.