### Analyzing Hacker News Posts

In this project we analyze a dataset of posts from [Hacker News](https://news.ycombinator.com/), a site popular in technology circles where users submit posts and comment on them. The dataset can be found [here](https://www.kaggle.com/datasets/hacker-news/hacker-news-posts).

In [1]:
from csv import reader

# read the whole CSV into a list of rows; the with block closes the file afterwards
with open("hacker_news.csv") as opened_file:
    hn = list(reader(opened_file))

In [2]:
# display the first 5 rows of the file (the header plus four posts)
hn[0:5]

[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],
 ['12224879',
  'Interactive Dynamic Video',
  'http://www.interactivedynamicvideo.com/',
  '386',
  '52',
  'ne0phyte',
  '8/4/2016 11:52'],
 ['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01']]

In [3]:
# separate the header row from the data, then display the header
headers = hn[0]
hn = hn[1:]
print(headers)

['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']


In [4]:
# display data rows 1 through 4 (the slice starts at index 1, so the row at index 0 is skipped)
hn[1:5]

[['10975351',
  'How to Use Open Source and Shut the Fuck Up at the Same Time',
  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',
  '39',
  '10',
  'josep2',
  '1/26/2016 19:30'],
 ['11964716',
  "Florida DJs May Face Felony for April Fools' Water Joke",
  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',
  '2',
  '1',
  'vezycash',
  '6/23/2016 22:20'],
 ['11919867',
  'Technology ventures: From Idea to Enterprise',
  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',
  '3',
  '1',
  'hswarna',
  '6/17/2016 0:01'],
 ['10301696',
  'Note by Note: The Making of Steinway L1037 (2007)',
  'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0',
  '8',
  '2',
  'walterbell',
  '9/30/2015 4:12']]

In [5]:
ask_posts = []
show_posts = []
other_posts = []
for row in hn:
    title = row[1]
    if title.lower().startswith("ask hn"):
        ask_posts.append(row)
    elif title.lower().startswith("show hn"):
        show_posts.append(row)
    else:
        other_posts.append(row)
# check the number of posts in each group
print("Number of posts in ask_posts:",len(ask_posts))
print("Number of posts in show_posts:",len(show_posts))
print("Number of posts in other_posts:",len(other_posts))

Number of posts in ask_posts: 1744
Number of posts in show_posts: 1162
Number of posts in other_posts: 17194
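The prefix test used in the loop above can also be pulled into a small function, which makes the case-insensitive matching easy to check on its own. This is only a sketch: the name `classify_post` and the sample titles are illustrative assumptions, not part of the original notebook.

```python
def classify_post(title):
    """Return 'ask', 'show', or 'other' based on the title prefix (case-insensitive)."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return "other"


# Hypothetical titles, used only to exercise the three branches.
sample_titles = [
    "Ask HN: How do you learn a new codebase?",
    "SHOW HN: A tiny CSV viewer",
    "Interactive Dynamic Video",
]
labels = [classify_post(title) for title in sample_titles]
print(labels)
```

Because the title is lowercased before the comparison, prefixes such as `SHOW HN` and `Show HN` land in the same group.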


**Calculating the Average Number of Comments for Ask HN and Show HN Posts**

In [6]:
# total number of comments across all ask posts
total_ask_comments = 0
for row in ask_posts:
    num_comments = int(row[4])
    total_ask_comments += num_comments

Calculating the average number of comments

In [7]:
avg_ask_comments = (total_ask_comments / len(ask_posts))
print("Average number of comments on ask posts: {:.4f}".format(avg_ask_comments))

Average number of comments on ask posts: 14.0384


In [8]:
# total number of comments across all show posts
total_show_comments = 0
for row in show_posts:
    num_comments = int(row[4])
    total_show_comments += num_comments
# average number of comments per show post
avg_show_comments = total_show_comments / len(show_posts)
print("Average number of comments on show : {:.4f}".format(avg_show_comments))

Average number of comments on show : 10.3167


On average, ask posts receive more comments than show posts: about 14 comments per post versus about 10.
This suggests that ask posts attract more engagement than show posts, so the rest of the analysis focuses on ask posts.
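The two averages above are computed with nearly identical loops. A minimal sketch of one way to share that logic follows; the function name `average_comments`, its `comments_index` parameter, and the sample rows are assumptions made for illustration.

```python
def average_comments(posts, comments_index=4):
    """Mean of the comment counts stored as strings at comments_index."""
    if not posts:
        raise ValueError("posts must not be empty")
    total = sum(int(row[comments_index]) for row in posts)
    return total / len(posts)


# Hypothetical rows in the dataset's column order; only index 4 matters here.
sample_posts = [
    ["1", "Ask HN: A?", "", "5", "6", "user_a", "8/16/2016 9:55"],
    ["2", "Ask HN: B?", "", "9", "29", "user_b", "11/22/2015 13:43"],
    ["3", "Ask HN: C?", "", "2", "1", "user_c", "5/2/2016 10:14"],
]
sample_average = average_comments(sample_posts)
print("{:.4f}".format(sample_average))
```

In the notebook, the same function could be called as `average_comments(ask_posts)` and `average_comments(show_posts)`.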

### Finding the Number of Ask Posts and Comments by Hour Created

In [9]:
import datetime as dt
result_list = []

for row in ask_posts:
    created_at = row[6]
    num_comments = int(row[4])
    dates_and_num_comments = [created_at, num_comments]
    result_list.append(dates_and_num_comments)
# print the first 5 rows of result_list
print(result_list[:5])

[['8/16/2016 9:55', 6], ['11/22/2015 13:43', 29], ['5/2/2016 10:14', 1], ['8/2/2016 14:20', 3], ['10/15/2015 16:38', 17]]


In [10]:
counts_by_hour = {}
comments_by_hour = {}

for row in result_list:
    date_time = row[0]
    # parse date and time from string
    date_obj = dt.datetime.strptime(date_time, '%m/%d/%Y %H:%M')
    
    # extract time and then hours
    time_hrs_min = date_obj.time()
    hour = time_hrs_min.hour
    
    if hour not in counts_by_hour:
        counts_by_hour[hour] = 1
        comments_by_hour[hour] = row[1]
    else:
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += row[1]

In [11]:
print(counts_by_hour)

{9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73, 17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54, 5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44, 7: 34, 11: 58}


In [12]:
print(comments_by_hour)

{9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543, 12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381, 18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479, 8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}
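The `if hour not in ...` branching above can be avoided with `collections.defaultdict`, which starts every missing key at zero. The sketch below applies it to the five rows printed from `result_list` earlier; the `sample_` names are introduced here for illustration only.

```python
import datetime as dt
from collections import defaultdict

# The five rows shown in the result_list output above.
sample_results = [
    ["8/16/2016 9:55", 6],
    ["11/22/2015 13:43", 29],
    ["5/2/2016 10:14", 1],
    ["8/2/2016 14:20", 3],
    ["10/15/2015 16:38", 17],
]

sample_counts = defaultdict(int)    # posts created in each hour
sample_comments = defaultdict(int)  # comments received by posts created in each hour

for created_at, num_comments in sample_results:
    # parse the timestamp and keep only the hour (0-23)
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
    sample_counts[hour] += 1
    sample_comments[hour] += num_comments

print(dict(sample_counts))
print(dict(sample_comments))
```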


### Calculating the Average Number of Comments for Ask HN Posts by Hour

In [13]:
avg_by_hour = []

for hour in counts_by_hour:
    # both dictionaries share the same hour keys, so the total can be looked up directly
    avg_comment_hour = [hour, comments_by_hour[hour] / counts_by_hour[hour]]
    avg_by_hour.append(avg_comment_hour)

In [14]:
print(avg_by_hour)

[[9, 5.5777777777777775], [13, 14.741176470588234], [10, 13.440677966101696], [14, 13.233644859813085], [16, 16.796296296296298], [23, 7.985294117647059], [12, 9.41095890410959], [17, 11.46], [15, 38.5948275862069], [21, 16.009174311926607], [20, 21.525], [2, 23.810344827586206], [18, 13.20183486238532], [3, 7.796296296296297], [5, 10.08695652173913], [19, 10.8], [1, 11.383333333333333], [22, 6.746478873239437], [8, 10.25], [4, 7.170212765957447], [0, 8.127272727272727], [6, 9.022727272727273], [7, 7.852941176470588], [11, 11.051724137931034]]
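As a consistency check on these per-hour figures, the hourly counts and comment totals should add back up to the overall ask-post numbers reported earlier (1744 posts, an average of 14.0384 comments). The sketch below copies both dictionaries from the outputs above and recomputes the overall average from them.

```python
# Per-hour dictionaries copied from the notebook outputs above.
counts_by_hour = {9: 45, 13: 85, 10: 59, 14: 107, 16: 108, 23: 68, 12: 73,
                  17: 100, 15: 116, 21: 109, 20: 80, 2: 58, 18: 109, 3: 54,
                  5: 46, 19: 110, 1: 60, 22: 71, 8: 48, 4: 47, 0: 55, 6: 44,
                  7: 34, 11: 58}
comments_by_hour = {9: 251, 13: 1253, 10: 793, 14: 1416, 16: 1814, 23: 543,
                    12: 687, 17: 1146, 15: 4477, 21: 1745, 20: 1722, 2: 1381,
                    18: 1439, 3: 421, 5: 464, 19: 1188, 1: 683, 22: 479,
                    8: 492, 4: 337, 0: 447, 6: 397, 7: 267, 11: 641}

total_posts = sum(counts_by_hour.values())
total_comments = sum(comments_by_hour.values())
overall_avg = total_comments / total_posts

print(total_posts, total_comments)
print("{:.4f}".format(overall_avg))
```

The overall average is a weighted mean of the hourly averages, weighted by the number of posts in each hour, so a single busy hour can pull it noticeably.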


### Sorting and Printing Values from a List of Lists

In [15]:
swap_avg_by_hour = []
for row in avg_by_hour:
    swap_comment_hour = [row[1], row[0]]
    swap_avg_by_hour.append(swap_comment_hour)

In [16]:
print(swap_avg_by_hour)

[[5.5777777777777775, 9], [14.741176470588234, 13], [13.440677966101696, 10], [13.233644859813085, 14], [16.796296296296298, 16], [7.985294117647059, 23], [9.41095890410959, 12], [11.46, 17], [38.5948275862069, 15], [16.009174311926607, 21], [21.525, 20], [23.810344827586206, 2], [13.20183486238532, 18], [7.796296296296297, 3], [10.08695652173913, 5], [10.8, 19], [11.383333333333333, 1], [6.746478873239437, 22], [10.25, 8], [7.170212765957447, 4], [8.127272727272727, 0], [9.022727272727273, 6], [7.852941176470588, 7], [11.051724137931034, 11]]


In [17]:
sorted_swap = sorted(swap_avg_by_hour, reverse=True)

In [18]:
print(sorted_swap[:5])

[[38.5948275862069, 15], [23.810344827586206, 2], [21.525, 20], [16.796296296296298, 16], [16.009174311926607, 21]]


In [19]:
for item in sorted_swap[:6]:
    avg_comment = item[0]
    hour = dt.datetime.strptime(str(item[1]), "%H")
    hour_format = hour.strftime("%H:%M")
    template = "{}: {:.2f} average comments per post"
    print(template.format(hour_format, avg_comment))


15:00: 38.59 average comments per post
02:00: 23.81 average comments per post
20:00: 21.52 average comments per post
16:00: 16.80 average comments per post
21:00: 16.01 average comments per post
13:00: 14.74 average comments per post
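The swap-then-sort approach above works because lists compare element by element. An alternative sketch sorts the `[hour, average]` pairs directly with a `key` function, so no swapped copy is needed. The four sample pairs are taken from the `avg_by_hour` output; the names `sample_avg_by_hour`, `top_hours`, and `lines` are illustrative.

```python
# A few [hour, average] pairs taken from the avg_by_hour output above.
sample_avg_by_hour = [
    [9, 5.5777777777777775],
    [15, 38.5948275862069],
    [2, 23.810344827586206],
    [20, 21.525],
]

# Sort by the average (index 1), highest first, without swapping columns.
top_hours = sorted(sample_avg_by_hour, key=lambda pair: pair[1], reverse=True)

lines = []
for hour, avg in top_hours[:3]:
    # {:02d} zero-pads the hour, matching the %H:%M formatting used above
    lines.append("{:02d}:00: {:.2f} average comments per post".format(hour, avg))

print("\n".join(lines))
```

One difference to keep in mind: when two hours have the same average, the swapped version breaks the tie by hour, while the `key` version keeps the original order.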


**Conclusion**

Based on the hourly averages, 15:00 (3 pm) is the best time to create an ask post to attract engagement, with an average of about 38.6 comments per post.
02:00 (2 am) also shows high engagement and ranks second among the top five hours, at about 23.8 comments per post. Overall, evenings and nights are the most favourable times to post to attract comments an